Chapter 22 : Data Structures With C++ – Hash Tables

Key Words: Hash tables, hash function, hash buckets, string hash, PJW algorithm to find string hashes.

Topics at a glance:

  • Understanding Hash Tables, hash functions, hash buckets
  • How to implement a custom Hash table that can store integers and strings.

In this chapter we’ll see another popular data structure known as the “Hash Table”. The core philosophy behind hash table is, as the name suggests, a unique hash key, associated with each of the values entered in the table. So how values and keys are associated? A hash table uses a unique key for identifying the storage location of the data it stores. Hash table finds the key dynamically and will not generally stores the key itself. Whenever a new data has to be entered, then and there a key is determined and using this key, a location will be decided for storing this data. When we want to search or delete this value/data then also hash table determines the key and finds the location and manipulates the data. That means, hash table necessarily do not require to store this key-value mapping information internally.

The limitation of a Hash table can be related to the limitation of the hashing algorithm itself. Hashing algorithm can often generates the same key for different values and will lead to a hash collision. To avoid this there are two popular methods:

  1. Chain hashing
  2. Open address hashing

In this chapter we’ll see what’s chain hashing. Chain hashing is relatively less complex than open address hashing. If it requires to frequently add and delete keys/values from the hash table, then it is better to use chain hashing. Also, it is safe from the issue of clustering that is inherent with open address hashing algorithm. The main disadvantage of chain hashing is its poor cache performance and inefficient space utilization.

We will adopt a technique using hash buckets for illustrating a hash table implementation using chain hashing. Also, to avoid the issue with inefficient space utilization, we’ll use std::lists rather than fixed size arrays to implement the storage strategy. ‘std::lists‘ are more efficient than std::vectors when it requires frequent insertion and deletion.

In our implementation, we will use a custom hash function that generates a hash key, which in turn, will be used as the bucket identifier. Storage location is an array of lists. The key computed by the hashing function will be the index to this array of lists. Data will be pushed back into a list found at an index. Here, key itself is not the location in the list. Also, it doesn’t make any sense to use indexing operation on a list which is not supported in C++ anyway. Instead of storing data in a single list that grows with every new data added, we use buckets of lists. In each bucket, a smaller list will be present and will be relatively faster to search and find the data from a smaller list.

Now, coming to custom hashing function. For integer values, we will use value modulo bucket size as the key computation algorithm. We’ll support one more datatype in our list. i.e. strings. For strings we will use the PJW hash function, used in Unix ELF formats.

Please refer to Wikipedia link:

https://en.wikipedia.org/wiki/PJW_hash_function for more information on PJW hash function.

Now, let us see the implementation of our custom Hash table.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
#include <iostream>
#include <list>
#include <string>

using namespace std;

template <typename T, int N>
class Hash_Table
{
public:
    enum class error_codes{invalid_key_type = 0, invalid_key = 1};
private:
    list<T> h_tables[N];
    string hash_type;
    
    int hash_key(T val)
    {
        int key = -1;
        unsigned int h = 0U;
        unsigned int g = 0U;
        
        if(hash_type.compare("integer hash") == 0)
        {
            int *temp_val_ptr = reinterpret_cast<int *>(&val);
            int temp_val = *temp_val_ptr;
            if(temp_val < 0){temp_val *= -1;}
            key = temp_val % N;
        }
        else
        {
            // string hash uses PJW hash algorithm 
            // in UNIX ELF string hashing 
            string *temp = reinterpret_cast<string*>(&val);
            char *str = const_cast<char *>(temp->c_str());
            char *p;
            int int_eq = 0;
            
            for(p = str; *p; ++p)
            {
                int_eq = static_cast<int>(*p);
                h = (h << 4) + int_eq; 
                g = h & 0xF0000000U;
                
                if(g != 0x0U) // i.e. high nibble is set 
                {
                    h ^= (g >> 24);
                }
                
                h &= ~g;
            }
            
            key = static_cast<int>(h) % N;
        }
        
        return key;
        
    }
    
public:
    Hash_Table()
    {
        hash_type = "integer hash";
        constexpr bool integer_type = is_same<T, int>::value;
        constexpr bool string_type = is_same<T, string>::value;
        
        // check for key type. class invariance 
        if(   (integer_type == false) 
            &&(string_type == false) )
        {
            cout << "invalid key type" << endl;
            throw error_codes::invalid_key_type;
        }
        
        if(string_type == true)
        {
            hash_type = "string hash";
        }
        
        cout << hash_type << " has been created" << endl;
    }
    
    ~Hash_Table()
    {
        cout << hash_type << " has been destroyed" << endl;
    }
    
    int insert_value(T val)
    {
        int status = -1;
        int key = hash_key(val);
        
        if(key != -1)
        {
            // found a valid key
            h_tables[key].push_back(val);
            status = 0;
        }
        
        return status;
        
    }
    
    int delete_value(T val)
    {
        int status = -1;
        int key = hash_key(val);
        
        if(key != -1)
        {
            auto itr_start = h_tables[key].begin();
            auto itr_end = h_tables[key].end();
            
            for(auto itr = itr_start; itr != itr_end; ++itr)
            {
                if(*itr == val)
                {
                    // found the val
                    h_tables[key].erase(itr);
                    status = 0;
                    break;
                }
            }
        }
        
        return status;
    }
};

You can see that Hash_Table class is a template class. I have defined the class invariant that the data type should be either an integer data type or a string data type. Any other data type will throw an exception from the constructor.

Let us see the result of executing the following code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
int main()
{
    int unsorted_array[] = {5, 2, 6, 7, 1, 10, 11, 4, 8};
    int status = 0;
    
    cout << "\nTesting integer hash..." << endl;
    Hash_Table<int, 5> ht_1;
    
    for (auto val : unsorted_array)
    {
        status = ht_1.insert_value(val);
        if(status == 0)
        {
            cout << "Val : " << val << " added into hash table" << endl;
        }
        else
        {
            cout << "Val : " << val << " not added into hash table" << endl;
        }
    }
    
    cout << "\nTesting deletion..." << endl;
    //delete 2
    if( 0 == ht_1.delete_value(2))
    {
        cout << "2 has been deleted from hash table" << endl;
    }
    else
    {
        cout << "2 not deleted from hash table" << endl;
    }
    
    //delete 10
    if( 0 == ht_1.delete_value(10))
    {
        cout << "10 has been deleted from hash table" << endl;
    }
    else
    {
        cout << "10 not deleted from hash table" << endl;
    }
    
    //delete 10 again
    if( 0 == ht_1.delete_value(10))
    {
        cout << "10 has been deleted from hash table" << endl;
    }
    else
    {
        cout << "10 not deleted from hash table" << endl;
    }
    
    // Now testing string hash
    cout << "\nTesting string hash..." << endl;
    Hash_Table<string, 5> ht_2;
    
    if(0 == ht_2.insert_value("Hello_String_1"))
    {
        cout << "Hello_String_1 added into Hash_Table" << endl;
    }
    else
    {
        cout << "Hello_String_1 is not added into Hash_Table" << endl;
    }
    
    if(0 ==    ht_2.insert_value("Hello_Another_String_1"))
    {
        cout << "Hello_Another_String_1 is added into Hash_Table" << endl;
    }
    else
    {
        cout << "Hello_Another_String_1 is deleted from Hash_Table" << endl;
    }
    
    cout << "\nTesting deletion..." << endl;
    
    // delete both strings 
    if(0 == ht_2.delete_value("Hello_String_1"))
    {
        cout << "Hello_String_1 is deleted from Hash_Table" << endl;
    }
    else
    {
        cout << "Hello_String_1 is not deleted from Hash_Table" << endl;
    }
    
    if(0 ==    ht_2.delete_value("Hello_Another_String_1"))
    {
        cout << "Hello_Another_String_1 is deleted from Hash_Table " << endl;
    }
    else
    {
        cout << "Hello_Another_String_1 is not deleted from Hash_Table" << endl;
    }
    
    cout << "\nMain ends" << endl;
    
    return 0;
}

Result:

Testing integer hash...
integer hash has been created
Val : 5 added into hash table
Val : 2 added into hash table
Val : 6 added into hash table
Val : 7 added into hash table
Val : 1 added into hash table
Val : 10 added into hash table
Val : 11 added into hash table
Val : 4 added into hash table
Val : 8 added into hash table

Testing deletion...
2 has been deleted from hash table
10 has been deleted from hash table
10 not deleted from hash table

Testing string hash...
string hash has been created
Hello_String_1 added into Hash_Table
Hello_Another_String_1 is added into Hash_Table

Testing deletion...
Hello_String_1 is deleted from Hash_Table
Hello_Another_String_1 is deleted from Hash_Table

Main ends
string hash has been destroyed
integer hash has been destroyed

See how our Hash table works for both integer and strings? It is possible to further improvise our Hash_Table class to support unsigned integers and floats as well. Also, we can add functions to print the data stored in the hash table. I’ll let you guys to look into those aspects of improvisation.

For now, we can use this Hash_Table implementation for our understanding purposes.

Enjoyed the chapter? Let me know in the comments below. Thanks! 😊

0

Leave a Reply