Key Words: Hash tables, hash function, hash buckets, string hash, PJW algorithm to find string hashes.
Topics at a glance:
- Understanding Hash Tables, hash functions, hash buckets
- How to implement a custom Hash table that can store integers and strings.
In this chapter we’ll see another popular data structure known as the “Hash Table”. The core philosophy behind hash table is, as the name suggests, a unique hash key, associated with each of the values entered in the table. So how values and keys are associated? A hash table uses a unique key for identifying the storage location of the data it stores. Hash table finds the key dynamically and will not generally stores the key itself. Whenever a new data has to be entered, then and there a key is determined and using this key, a location will be decided for storing this data. When we want to search or delete this value/data then also hash table determines the key and finds the location and manipulates the data. That means, hash table necessarily do not require to store this key-value mapping information internally.
The limitation of a Hash table can be related to the limitation of the hashing algorithm itself. Hashing algorithm can often generates the same key for different values and will lead to a hash collision. To avoid this there are two popular methods:
- Chain hashing
- Open address hashing
In this chapter we’ll see what’s chain hashing. Chain hashing is relatively less complex than open address hashing. If it requires to frequently add and delete keys/values from the hash table, then it is better to use chain hashing. Also, it is safe from the issue of clustering that is inherent with open address hashing algorithm. The main disadvantage of chain hashing is its poor cache performance and inefficient space utilization.
We will adopt a technique using hash buckets for illustrating a hash table implementation using chain hashing. Also, to avoid the issue with inefficient space utilization, we’ll use std::lists rather than fixed size arrays to implement the storage strategy. ‘std::lists‘ are more efficient than std::vectors when it requires frequent insertion and deletion.
In our implementation, we will use a custom hash function that generates a hash key, which in turn, will be used as the bucket identifier. Storage location is an array of lists. The key computed by the hashing function will be the index to this array of lists. Data will be pushed back into a list found at an index. Here, key itself is not the location in the list. Also, it doesn’t make any sense to use indexing operation on a list which is not supported in C++ anyway. Instead of storing data in a single list that grows with every new data added, we use buckets of lists. In each bucket, a smaller list will be present and will be relatively faster to search and find the data from a smaller list.
Now, coming to custom hashing function. For integer values, we will use value modulo bucket size as the key computation algorithm. We’ll support one more datatype in our list. i.e. strings. For strings we will use the PJW hash function, used in Unix ELF formats.
Please refer to Wikipedia link:
https://en.wikipedia.org/wiki/PJW_hash_function for more information on PJW hash function.
Now, let us see the implementation of our custom Hash table.
You can see that Hash_Table class is a template class. I have defined the class invariant that the data type should be either an integer data type or a string data type. Any other data type will throw an exception from the constructor.
Let us see the result of executing the following code:
Testing integer hash... integer hash has been created Val : 5 added into hash table Val : 2 added into hash table Val : 6 added into hash table Val : 7 added into hash table Val : 1 added into hash table Val : 10 added into hash table Val : 11 added into hash table Val : 4 added into hash table Val : 8 added into hash table Testing deletion... 2 has been deleted from hash table 10 has been deleted from hash table 10 not deleted from hash table Testing string hash... string hash has been created Hello_String_1 added into Hash_Table Hello_Another_String_1 is added into Hash_Table Testing deletion... Hello_String_1 is deleted from Hash_Table Hello_Another_String_1 is deleted from Hash_Table Main ends string hash has been destroyed integer hash has been destroyed
See how our Hash table works for both integer and strings? It is possible to further improvise our Hash_Table class to support unsigned integers and floats as well. Also, we can add functions to print the data stored in the hash table. I’ll let you guys to look into those aspects of improvisation.
For now, we can use this Hash_Table implementation for our understanding purposes.
Enjoyed the chapter? Let me know in the comments below. Thanks! 😊