Finding highest value in a HashMap issue:

Discussion in 'Programming General' started by SuF, Oct 7, 2011.

Finding highest value in a HashMap issue:
  1. Unread #1 - Oct 7, 2011 at 3:40 PM
  2. SuF
    Joined:
    Jan 21, 2007
    Posts:
    14,212
    Referrals:
    28
    Sythe Gold:
    1,234
    Discord Unique ID:
    203283096668340224
    <3 n4n0 Two Factor Authentication User Community Participant Spam Forum Participant Sythe's 10th Anniversary

    SuF Legend
    Pirate Retired Global Moderator

    Finding highest value in a HashMap issue:

    I made this method and it works but is slow (I've provided a full test class for ya). With a million Strings stored, it takes 229 seconds to find the 10,000th place highest number. If you reduce the number of stored values to like 100k, it works much faster but I still want to streamline it anyways. Thoughts?

    Code:
    import java.util.HashMap;
    
    public class MethodTest
    {
    	private HashMap<String, Integer> wordCount = new HashMap<String, Integer>();
    	
    	public MethodTest()
    	{
    		for (int a = 0; a < 1000000; a++)
    		{
    			wordCount.put(Integer.toString(a), a);
    		}
    	}
    	
    	public String getTopWord(int index)
    	{
    		String[] list = new String[index];
    		for (int a = 0; a < list.length; a++)
    		{
    			list[a] = null;
    		}
    		for (String word : wordCount.keySet())
    		{
    			for (int a = 0; a < list.length; a++)
    			{
    				if (list[a] == null || wordCount.get(word) > wordCount.get(list[a]))
    				{
    					for (int b = list.length - 1; b > a; b--)
    					{
    						list[b] = list[b - 1];
    					}
    					list[a] = word;
    					break;
    				}
    			}
    		}
    		return list[index - 1];
    	}
    	
    	public static void main(String[] args)
    	{
    		long a = System.currentTimeMillis();
    		System.out.println(new MethodTest().getTopWord(10000));
    		System.out.println((System.currentTimeMillis() - a));
    		System.out.println((System.currentTimeMillis() - a) / 1000);
    	}
    }
    
     
  3. Unread #2 - Oct 7, 2011 at 6:50 PM
  4. blindkilla
    Joined:
    Jun 22, 2005
    Posts:
    1,896
    Referrals:
    0
    Sythe Gold:
    6
    Discord Unique ID:
    282000633404456960
    Discord Username:
    sogord

    blindkilla Guru
    $25 USD Donor New

    Finding highest value in a HashMap issue:

    Seems to me that most of your inefficiency comes from the fact that you are swapping null values for a huge portion of the iterations. It's almost a bubble sort (worst/slowest sorting algorithm).

    I don't think that for loop is needed, there has to be a better way to do it.

    Anyway, I'm going out soon so I'll take another look tomorrow.

    Trying looking into some other sorting algorithms like merge sort, quick sort, heap sort.
     
  5. Unread #3 - Oct 7, 2011 at 7:35 PM
  6. SuF
    Joined:
    Jan 21, 2007
    Posts:
    14,212
    Referrals:
    28
    Sythe Gold:
    1,234
    Discord Unique ID:
    203283096668340224
    <3 n4n0 Two Factor Authentication User Community Participant Spam Forum Participant Sythe's 10th Anniversary

    SuF Legend
    Pirate Retired Global Moderator

    Finding highest value in a HashMap issue:

    I didn't feel like having to read about stuff... >_<. Maybe I will. I could just forget the null and just do the first what ever rank you want interations slightly differently.
     
  7. Unread #4 - Oct 7, 2011 at 7:41 PM
  8. itsboom
    Joined:
    May 7, 2010
    Posts:
    290
    Referrals:
    0
    Sythe Gold:
    0

    itsboom Forum Addict

    Finding highest value in a HashMap issue:

    Why do you have
    Code:
    for (int a = 0; a < list.length; a++)
    {
    	list[a] = null;
    }
    
    by default shouldnt every element be null when you create the array?

    Also I believe you have currently implemented an insertion sort(which still has the same Big O as bubble :())

    You are also sorting from high to low so the last value in your array is actually the lowest value in the hashmap
     
  9. Unread #5 - Oct 7, 2011 at 7:58 PM
  10. SuF
    Joined:
    Jan 21, 2007
    Posts:
    14,212
    Referrals:
    28
    Sythe Gold:
    1,234
    Discord Unique ID:
    203283096668340224
    <3 n4n0 Two Factor Authentication User Community Participant Spam Forum Participant Sythe's 10th Anniversary

    SuF Legend
    Pirate Retired Global Moderator

    Finding highest value in a HashMap issue:

    I dunno. I thought so but I read something that said if you tried to access an element that you haven't put a value in, it fucks your program. What ever stupid java book I have to read for class. >_<. They are null, it works fine without it.

    I thought it up myself so what everrrr. And yes. Its sorting it into rankings. 1 is the highest count, 2 is the next, etc.
     
  11. Unread #6 - Oct 7, 2011 at 8:03 PM
  12. itsboom
    Joined:
    May 7, 2010
    Posts:
    290
    Referrals:
    0
    Sythe Gold:
    0

    itsboom Forum Addict

    Finding highest value in a HashMap issue:

    I hate to be the one to tell you this, but this will not work and needs to be rewritten. You sort the values. great. that doesnt tell you anything. I'm trying to think of the best way to do it, but so far all I have is sorting every value in the hashmap and then returning the value at index index.

    I could be wrong, but from where I'm sitting I dont see any easy way of doing it in the current form.

    On the null thing. You can access a null value but if you try to do anything to it your program will shit bricks.
     
  13. Unread #7 - Oct 7, 2011 at 10:18 PM
  14. SuF
    Joined:
    Jan 21, 2007
    Posts:
    14,212
    Referrals:
    28
    Sythe Gold:
    1,234
    Discord Unique ID:
    203283096668340224
    <3 n4n0 Two Factor Authentication User Community Participant Spam Forum Participant Sythe's 10th Anniversary

    SuF Legend
    Pirate Retired Global Moderator

    Finding highest value in a HashMap issue:

    It works just fine....... It's just slow. I sort the entire hashmap and get index index. This is a faster one: (5 times faster)

    Code:
    public String getTopWord(int index)
    	{
    		String[] list = new String[index];
    		String lastWord = null;
    		for (String word : wordCount.keySet())
    		{
    			if (lastWord == null || wordCount.get(word) > wordCount.get(lastWord))
    			{
    				for (int a = 0; a < list.length; a++)
    				{
    					if (list[a] == null || wordCount.get(word) > wordCount.get(list[a]))
    					{
    						lastWord = list[list.length - 1];
    						for (int b = list.length - 1; b > a; b--)
    						{
    							list[b] = list[b - 1];
    						}
    						list[a] = word;
    						break;
    					}
    				}
    			}
    		}
    		return list[index - 1];
    	}
     
  15. Unread #8 - Oct 8, 2011 at 3:10 PM
  16. Nullware
    Joined:
    Jan 30, 2007
    Posts:
    1,761
    Referrals:
    4
    Sythe Gold:
    0

    Nullware Guru

    Finding highest value in a HashMap issue:

    Do you need a data structure that offers O(1) access to elements like HashMap? If you do then a better way of doing it would probably be using a HashSet instead of a HashMap since it will let you use Collections.sort(). Their built-in sort will run in n*log(n) time instead of the approximately n^2 solution you currently have.

    For what you're doing a sorted set (I used TreeSet) will offer better performance than what I described above while losing the ability to retrieve arbitrary elements.
    I tried with 100k, 1M, and 10M below.:)
    Code:
    import java.util.Iterator;
    import java.util.TreeSet;
    
    public class MethodTest
    {
       class Word implements Comparable
       {
          public String word;
          public Integer count;
          public Word(String w, Integer c)
          {
             this.word = w;
             this.count = c;
          }
    
          public int compareTo(Object o1)
          {
             if(this.count >= ((Word)o1).count)
                return -1;
             else
                return 1;
          }
       }
    
       private TreeSet<Word> wordCount = new TreeSet<Word>();
    
       public MethodTest()
       {
          for (int a = 0; a < 10000000; a++)
          {
             wordCount.add(new Word(Integer.toString(a), a));
          }
       }
    
       public String getTopWord(int index)
       {
          Iterator<Word> words = wordCount.iterator();
          String previous = "";
          String result = "";
          for(int i = 0; i < index; i++)
          {
             result = previous;
             previous = words.next().word;
          }
          return result;
       }
    
       public static void main(String[] args)
       {
          long a = System.currentTimeMillis();
          System.out.println(new MethodTest().getTopWord(10000));
          System.out.println((System.currentTimeMillis() - a));
          System.out.println((System.currentTimeMillis() - a) / 1000);
       }
    }
    
    P.S. I would recommend changing your editor to replace tabs with spaces (3-4) since most people do this and spaces typically stay formatted better than tabs in different places.
     
  17. Unread #9 - Oct 8, 2011 at 6:43 PM
  18. SuF
    Joined:
    Jan 21, 2007
    Posts:
    14,212
    Referrals:
    28
    Sythe Gold:
    1,234
    Discord Unique ID:
    203283096668340224
    <3 n4n0 Two Factor Authentication User Community Participant Spam Forum Participant Sythe's 10th Anniversary

    SuF Legend
    Pirate Retired Global Moderator

    Finding highest value in a HashMap issue:

    Well, essentially I am counting the number of times words are said in IRC and I need to be able to easily access any word if someone asks to see the count (or any other command). I'l just make a Word class and work off of that. I thought it was kind of silly to do just to store the word and the number but if it will make sorting and stuff faster, I'l do it.
     
  19. Unread #10 - Oct 8, 2011 at 6:47 PM
  20. SuF
    Joined:
    Jan 21, 2007
    Posts:
    14,212
    Referrals:
    28
    Sythe Gold:
    1,234
    Discord Unique ID:
    203283096668340224
    <3 n4n0 Two Factor Authentication User Community Participant Spam Forum Participant Sythe's 10th Anniversary

    SuF Legend
    Pirate Retired Global Moderator

    Finding highest value in a HashMap issue:

    Okay. That doesn't seem to work. >_>.
     
  21. Unread #11 - Oct 9, 2011 at 11:41 AM
  22. Nullware
    Joined:
    Jan 30, 2007
    Posts:
    1,761
    Referrals:
    4
    Sythe Gold:
    0

    Nullware Guru

    Finding highest value in a HashMap issue:

    Benchmarks:

    I ended up combining a Hashtable and a Treeset to essentially model a Hashtable that also has its elements sorted by value.
    Unfortunately it takes double the memory but I think it offers the performance you need based on my testing.

    Access functions

    getValueByKey() which finds the value (word count) based on a key will always be super fast since it uses the hashtable for lookups.

    getKeyByRank() which finds the word based on the ranking of its value will always be the bottleneck when looking for high rankings since it is using a sorted set which has to be iterated over one element at a time.


    Possible optimizations of getKeyByRank()
    • Not allow for rankings above a certain number to be queried (only allowing something like 100000 or less would keep this very fast)
    • If you do still want to allow any ranking to be searchable you could make it smarter by using a descendingIterator() if the rank you're looking for is in the latter half of the sorted set

    Code:
    Code:
    import java.util.*;
    
    public class SortedHashTable<K, V> {
    
       class KeyValueComparator implements Comparator<KeyValue<K, V>>
       {
          public int compare(KeyValue<K, V> item1, KeyValue<K, V> item2)
          {
             int v1 = (Integer) item1.value;
             int v2 = (Integer) item2.value;
    
             if(v1 >= v2)
                return -1;
             else
                return 1;
          }
       }
    
       class KeyValue<K, V>
       {
          public K key;
          public V value;
    
          public KeyValue(K k, V v)
          {
             this.key = k;
             this.value = v;
          }
       }
    
       private Hashtable<K, V> hashtable;
       private TreeSet<KeyValue<K, V>> values;
    
       public SortedHashTable()
       {
          this.hashtable = new Hashtable<K, V>();
          this.values = new TreeSet<KeyValue<K, V>>(new KeyValueComparator());
       }
    
       public void add(K key, V value)
       {
          KeyValue<K, V> kv = new KeyValue<K, V>(key, value);
          this.values.add(kv);
          this.hashtable.put(key, value);
       }
    
       public V getValueByKey(K key)
       {
          return this.hashtable.get(key);
       }
    
       public K getKeyByRank(int rank)
       {
          if(rank > values.size())
             return null;
    
          Iterator<KeyValue<K, V>> words = this.values.iterator();
          
          K previous = null;
          K result = null;
          for(int i = 0; i < rank; i++)
          {
             result = previous;
             previous = words.next().key;
          }
          return result;
       }
    
       public static void main(String[] args)
       {
          int n = Integer.parseInt(args[0]);
    
          SortedHashTable<String, Integer> wordCount = new SortedHashTable<String, Integer>();
    
          long a = System.currentTimeMillis();
          for (int i = 0; i < n; i++)
          {
             wordCount.add(Integer.toString(i), i);
          }
          System.out.println("Inserts took: " + (System.currentTimeMillis() - a) + "ms");
          System.out.println();
    
          a = System.currentTimeMillis();
          System.out.println("Rank 10000: " + wordCount.getKeyByRank(10000));
          System.out.println("getKeyByRank took: " + (System.currentTimeMillis() - a) + "ms");
          System.out.println();
    
          a = System.currentTimeMillis();
          System.out.println("Rank 1000000: " + wordCount.getKeyByRank(1000000));
          System.out.println("getKeyByRank took: " + (System.currentTimeMillis() - a) + "ms");
          System.out.println();
    
          a = System.currentTimeMillis();
          System.out.println("Key '10000': " + wordCount.getValueByKey("10000"));
          System.out.println("getValueByKey took: " + (System.currentTimeMillis() - a) + "ms");
          System.out.println();
    
          a = System.currentTimeMillis();
          System.out.println("Key '999999': " + wordCount.getValueByKey("999999"));
          System.out.println("getValueByKey took: " + (System.currentTimeMillis() - a) + "ms");
          System.out.println();
       }
    }
    
    I didn't bother with the optimizations because I thought it would be a good opportunity for you to try it if you even want to use this code.
    Also, wouldn't take up any more of my time if you decide you don't want to use this after all...^_^
     
  23. Unread #12 - Oct 9, 2011 at 1:19 PM
  24. SuF
    Joined:
    Jan 21, 2007
    Posts:
    14,212
    Referrals:
    28
    Sythe Gold:
    1,234
    Discord Unique ID:
    203283096668340224
    <3 n4n0 Two Factor Authentication User Community Participant Spam Forum Participant Sythe's 10th Anniversary

    SuF Legend
    Pirate Retired Global Moderator

    Finding highest value in a HashMap issue:

    I'l look through that to see if I can figure out what your doing as it is most likely better than what I am doing. Regardless, I did manage to get a binary sort working which its speed depends solely on how many elements there are, not the rank it is looking for which made me feel smart.
     
  25. Unread #13 - Oct 9, 2011 at 11:15 PM
  26. Nullware
    Joined:
    Jan 30, 2007
    Posts:
    1,761
    Referrals:
    4
    Sythe Gold:
    0

    Nullware Guru

    Finding highest value in a HashMap issue:

    Do you mean Binary Tree and not Binary Sort? I'd like to see what you did too.
     
  27. Unread #14 - Oct 10, 2011 at 4:26 PM
  28. SuF
    Joined:
    Jan 21, 2007
    Posts:
    14,212
    Referrals:
    28
    Sythe Gold:
    1,234
    Discord Unique ID:
    203283096668340224
    <3 n4n0 Two Factor Authentication User Community Participant Spam Forum Participant Sythe's 10th Anniversary

    SuF Legend
    Pirate Retired Global Moderator

    Finding highest value in a HashMap issue:

    Its just an insertion sort using a binary search to find where it should be the next value, but I decided it should be called binary sort...... >_>. I know I could make it faster, but I haven't gotten that far yet. I'm having fun playing around with a bunch of different sorting methods to try and get it faster. lol. Like with the below method, it doesn't have to sift the ENTIRE array over every time it moves a value, only the ones up to the index (since that's all we care about). That could improve its speed some (depending on the size of the array).

    Code:
    public String getTopWord3(int index)
        {
            String[] list = wordCount.keySet().toArray(new String[0]);
    
            // loops through all words
            for (int a = 1; a < list.length; a++)
            {
                String temp = list[a];
                int low = 0;
                int high = a;
                int mid;
                do
                {
                    mid = low + ((high - low) / 2);
                    if (wordCount.get(temp) > wordCount.get(list[mid]))
                    {
                        high = mid;
                    }
                    else if (wordCount.get(temp) < wordCount.get(list[mid]))
                    {
                        low = mid + 1;
                        if (low == high)
                        {
                            mid = low;
                            break;
                        }
                    }
                    else
                    {
                        break;
                    }
    
                } while (low < high);
                System.arraycopy(list, mid, list, mid + 1, a - mid);
                list[mid] = temp;
            }
    
            return list[index - 1];
        }
     
  29. Unread #15 - Oct 10, 2011 at 5:12 PM
  30. Nullware
    Joined:
    Jan 30, 2007
    Posts:
    1,761
    Referrals:
    4
    Sythe Gold:
    0

    Nullware Guru

    Finding highest value in a HashMap issue:

    Code:
    String[] list = wordCount.keySet().toArray(new String[0]);
    This copy is really expensive and it occurs every time you call getTopWord3().
    If you want to go ahead with your way of doing things you should at the very least maintain an up-to-date copy of the hashtable's data in an array at all times so you don't have to do this.

    Your binary search gives the same performance as the treeset I used because they are both really just sorted sets with O(log n) lookup. Only real difference is that I always maintain a copy of the hashtable's data in the treeset so it never has to do the full copy operation.
     
  31. Unread #16 - Oct 10, 2011 at 5:55 PM
  32. SuF
    Joined:
    Jan 21, 2007
    Posts:
    14,212
    Referrals:
    28
    Sythe Gold:
    1,234
    Discord Unique ID:
    203283096668340224
    <3 n4n0 Two Factor Authentication User Community Participant Spam Forum Participant Sythe's 10th Anniversary

    SuF Legend
    Pirate Retired Global Moderator

    Finding highest value in a HashMap issue:

    I didn't want to hold a sorted copy at all times just because it needs to add things extremely quickly (since it counts IRC words) but now I'm thinking that is the way to go since I have a fairly fast method and if I did that, I'd only have to put one item in at once and shift the array.

    And the copy (compared to the speed I'm getting) isn't that bad. And I've updated the method one more time to make it take even less time when the rank is low. D:
     
< CodingBay! | [Shell Scripting] Syntax Error >

Users viewing this thread
1 guest


 
 
Adblock breaks this site