C++ Tutorial: Intro to Hash Tables

June 21st, 2014

In this tutorial, we will be taking an introductory look at hash tables in C++. Hash tables are useful for storing large amounts of data in an efficient manner. They are also an essential aspect of most undergraduate computer science courses, so study up! Near the bottom of this post, you can find everything you need to create a C++ program that implements a hash table. (You can also download the source files you need from the Github link here).The example program will create a simple table for storing words.

Note: This tutorial assumes that you are familiar with arrays, pointers, modular arithmetic, algorithm efficiency, and linked lists especially. To learn more about linked lists, please see the previous tutorial on linked lists. With that in mind, let’s begin!

Contents:

  1. What is a Hash Table?
  2. What is a Hash Function?
  3. What are Hash Collisions?
  4. Designing the Hash Table
  5. Coding the Hash Table
    • LinkedList.h
    • LinkedList.cpp
    • HashTable.h
    • HashTable.cpp
    • main.cpp
  6. References

What is a Hash Table?

A hash table is a data structure for storing key-value pairs. Unlike a basic array, which uses index numbers for accessing elements, a hash table uses keys to look up table entries. This makes data management more manageable for the human user since its easier to catalog data entries by their attributes rather than their count in a giant list.

Hash_Tab_2

In C++, we implement a hash table as an array of linked lists. Its sort of like a multidimensional array. In a two-dimensional array, for instance, the elements consist of rows of a fixed length. In a hash table, however, the elements (a.k.a. buckets) can expand or shrink to accommodate a virtually infinite number of table entries.

In terms of efficiency, a hash table is a compromise between an array and a linked list. It uses both indexing and list traversal to store and retrieve data elements.

Looking up elements by index makes an array very efficient. No matter where an item is stored in the array, it always takes the same amount of time to retrieve it. In technical terms, getting an item from an array is an O(1) or “constant time operation.

Looking up elements in a linked list is a lot less efficient. You can’t just directly access any node in the list. Instead, you have to traverse down the list until you find the target item. If the item you are looking for happens to be at the front of the list, the retrieval is an O(1) operation since you only traversed down one node. If the item is at the end of the list, retrieving it would be an O(n) operation, where n is the total number of nodes in the list.

To summarize, as the number of elements increases in an array, the runtime to access an element by its index remains constant. With a linked list, the time it takes to access a particular element increases linearly with the number of elements.

What is a Hash Function?

A hash function decides where to store and retrieve items in a hash table. It takes in an item key as its parameter and returns an index location for that particular item. Generally, a hash function uses modulo arithmetic. In particular, a key value is divided by the table length to generate an index number in the table. This index number refers to a location, or bucket, in the hash table.

HashFunction_7

Here’s an example: Suppose the hash function takes in a string as its parameter, and then it adds up the ASCII values of all of the characters in that string to get an integer. If the key is pumpkin, then the sum of the ASCII values would be 772. After that, the hash function takes the modulus of this number by the table length to get an index number. If the table length is 13, then 772 modulo 13 is 5. So the item with the key pumpkin, would go into bucket # 5 in the hash table.

int hash( string key )
{
    int value = 0;
    for ( int i = 0; i < key.length(); i++ )
        value += key[i];
    return value % tableLength;
}

What are Hash Collisions?

Occasionally, different words will generate the same hash function result. For example, the keys “tar” and “rat” will both hash to the index 2 if there are 13 buckets total in the hash table. When two keys hash to the same index, we get a hash collision.

To maximize the efficiency of storing and retrieving items in the hash table, we need to minimize the number of collisions. Each collision is another item in the bucket; another node in the linked list. We know that storing and retrieving items in a linked list grows more inefficient as the list grows longer. So it makes sense to keep these linked lists as short as possible.

There are several ways to write a good hash function. Making the table size a prime number is a good way to minimize collisions. In general, a good hash function fills out a hash table fairly uniformly. The printHistogram method of the HashTable class in the code example below lets you visualize the hash table. Ideally, the distribution of items should be fairly balanced.

Hash Table Contains 25 Items total
1:	 X X X X
2:	 X
3:	 X
4:	 X
5:	 X
6:	 X
7:	 X X X
8:	 X X X
9:	 X X X
10:	 X
11:	 X X X
12:	
13:	 X X X

Designing the Hash Table

Before we begin coding, its helpful to design the classes we will need for the program and how they will all work together. In the image below, we have UML (Universal Modeling Language) diagrams for the objects we will need in the program. First we start with a simple struct for representing items, called Item. From there, we can build a LinkedList from Items. Finally, we build a HashTable from a dynamic array of LinkedList objects. UML_HashTable

The Code

Copy and paste this code into your project:

**(June 1st 2015 Update: Thank you, David, for detecting my errors and helping me improve the code!)***

//*****************************************************************
// HashTable.cpp
// HashTable
//
// Created by Kar Beringer on June 18, 2014.
//
// This header file contains the Hash Table class definition.
// Hash Table array elements consist of Linked List objects.
//*****************************************************************
#include HashTable.h
// Constructs the empty Hash Table object.
// Array length is set to 13 by default.
HashTable::HashTable( int tableLength )
{
if (tableLength <= 0) tableLength = 13;
array = new LinkedList[ tableLength ];
length = tableLength;
}
// Returns an array location for a given item key.
int HashTable::hash( string itemKey )
{
int value = 0;
for ( int i = 0; i < itemKey.length(); i++ )
value += itemKey[i];
return (value * itemKey.length() ) % length;
}
// Adds an item to the Hash Table.
void HashTable::insertItem( Item * newItem )
{
int index = hash( newItem -> key );
array[ index ].insertItem( newItem );
}
// Deletes an Item by key from the Hash Table.
// Returns true if the operation is successful.
bool HashTable::removeItem( string itemKey )
{
int index = hash( itemKey );
return array[ index ].removeItem( itemKey );
}
// Returns an item from the Hash Table by key.
// If the item isn’t found, a null pointer is returned.
Item * HashTable::getItemByKey( string itemKey )
{
int index = hash( itemKey );
return array[ index ].getItem( itemKey );
}
// Display the contents of the Hash Table to console window.
void HashTable::printTable()
{
cout << \n\nHash Table:\n;
for ( int i = 0; i < length; i++ )
{
cout << Bucket << i + 1 << : ;
array[i].printList();
}
}
// Prints a histogram illustrating the Item distribution.
void HashTable::printHistogram()
{
cout << \n\nHash Table Contains ;
cout << getNumberOfItems() << Items total\n;
for ( int i = 0; i < length; i++ )
{
cout << i + 1 << :\t;
for ( int j = 0; j < array[i].getLength(); j++ )
cout << X;
cout << \n;
}
}
// Returns the number of locations in the Hash Table.
int HashTable::getLength()
{
return length;
}
// Returns the number of Items in the Hash Table.
int HashTable::getNumberOfItems()
{
int itemCount = 0;
for ( int i = 0; i < length; i++ )
{
itemCount += array[i].getLength();
}
return itemCount;
}
// De-allocates all memory used for the Hash Table.
HashTable::~HashTable()
{
delete [] array;
}
//*****************************************************************
// End of File
//*****************************************************************
view raw HashTable.cpp hosted with ❤ by GitHub
//*****************************************************************
// HashTable.h
// HashTable
//
// Created by Karlina Beringer on June 18, 2014.
//
// This header file contains the Hash Table class declaration.
// Hash Table array elements consist of Linked List objects.
//*****************************************************************
#ifndef HashTable_h
#define HashTable_h
#include LinkedList.h
//*****************************************************************
// Hash Table objects store a fixed number of Linked Lists.
//*****************************************************************
class HashTable
{
private:
// Array is a reference to an array of Linked Lists.
LinkedList * array;
// Length is the size of the Hash Table array.
int length;
// Returns an array location for a given item key.
int hash( string itemKey );
public:
// Constructs the empty Hash Table object.
// Array length is set to 13 by default.
HashTable( int tableLength = 13 );
// Adds an item to the Hash Table.
void insertItem( Item * newItem );
// Deletes an Item by key from the Hash Table.
// Returns true if the operation is successful.
bool removeItem( string itemKey );
// Returns an item from the Hash Table by key.
// If the item isn’t found, a null pointer is returned.
Item * getItemByKey( string itemKey );
// Display the contents of the Hash Table to console window.
void printTable();
// Prints a histogram illustrating the Item distribution.
void printHistogram();
// Returns the number of locations in the Hash Table.
int getLength();
// Returns the number of Items in the Hash Table.
int getNumberOfItems();
// De-allocates all memory used for the Hash Table.
~HashTable();
};
#endif
//*****************************************************************
// End of File
//*****************************************************************
view raw HashTable.h hosted with ❤ by GitHub
//*****************************************************************
// LinkedList.cpp
// HashTable
//
// Created by Karlina Beringer on June 16, 2014.
//
// This header file contains the Linked List class declaration.
// Hash Table array elements consist of Linked List objects.
//*****************************************************************
#include LinkedList.h
// Constructs the empty linked list object.
// Creates the head node and sets length to zero.
LinkedList::LinkedList()
{
head = new Item;
head -> next = NULL;
length = 0;
}
// Inserts an item at the end of the list.
void LinkedList::insertItem( Item * newItem )
{
if (!head -> next)
{
head -> next = newItem;
length++;
return;
}
Item * p = head;
Item * q = head;
while (q)
{
p = q;
q = p -> next;
}
p -> next = newItem;
newItem -> next = NULL;
length++;
}
// Removes an item from the list by item key.
// Returns true if the operation is successful.
bool LinkedList::removeItem( string itemKey )
{
if (!head -> next) return false;
Item * p = head;
Item * q = head;
while (q)
{
if (q -> key == itemKey)
{
p -> next = q -> next;
delete q;
length–;
return true;
}
p = q;
q = p -> next;
}
return false;
}
// Searches for an item by its key.
// Returns a reference to first match.
// Returns a NULL pointer if no match is found.
Item * LinkedList::getItem( string itemKey )
{
Item * p = head;
Item * q = head;
while (q)
{
p = q;
if ((p != head) && (p -> key == itemKey))
return p;
q = p -> next;
}
return NULL;
}
// Displays list contents to the console window.
void LinkedList::printList()
{
if (length == 0)
{
cout << \n{ }\n;
return;
}
Item * p = head;
Item * q = head;
cout << \n{ ;
while (q)
{
p = q;
if (p != head)
{
cout << p -> key;
if (p -> next) cout << , ;
else cout << ;
}
q = p -> next;
}
cout << }\n;
}
// Returns the length of the list.
int LinkedList::getLength()
{
return length;
}
// De-allocates list memory when the program terminates.
LinkedList::~LinkedList()
{
Item * p = head;
Item * q = head;
while (q)
{
p = q;
q = p -> next;
if (q) delete p;
}
}
//*****************************************************************
// End of File
//*****************************************************************
view raw LinkedList.cpp hosted with ❤ by GitHub
//*****************************************************************
// LinkedList.h
// HashTable
//
// Created by Karlina Beringer on June 16, 2014.
//
// This header file contains the Linked List class declaration.
// Hash Table array elements consist of Linked List objects.
//*****************************************************************
#ifndef LinkedList_h
#define LinkedList_h
#include <iostream>
#include <string>
using namespace std;
//*****************************************************************
// List items are keys with pointers to the next item.
//*****************************************************************
struct Item
{
string key;
Item * next;
};
//*****************************************************************
// Linked lists store a variable number of items.
//*****************************************************************
class LinkedList
{
private:
// Head is a reference to a list of data nodes.
Item * head;
// Length is the number of data nodes.
int length;
public:
// Constructs the empty linked list object.
// Creates the head node and sets length to zero.
LinkedList();
// Inserts an item at the end of the list.
void insertItem( Item * newItem );
// Removes an item from the list by item key.
// Returns true if the operation is successful.
bool removeItem( string itemKey );
// Searches for an item by its key.
// Returns a reference to first match.
// Returns a NULL pointer if no match is found.
Item * getItem( string itemKey );
// Displays list contents to the console window.
void printList();
// Returns the length of the list.
int getLength();
// De-allocates list memory when the program terminates.
~LinkedList();
};
#endif
//*****************************************************************
// End of File
//*****************************************************************
view raw LinkedList.h hosted with ❤ by GitHub
//**************************************************************
// main.cpp
// HashTable
//
// Created by Kar Beringer on June 19, 2014.
// Demonstrate a simple Hash Table in C++.
// Implements a Linked List class.
//**************************************************************
#include HashTable.h
int main()
{
// Create 26 Items to store in the Hash Table.
Item * A = new Item {Apple, NULL};
Item * B = new Item {Banana, NULL};
Item * C = new Item {Caterpillar, NULL};
Item * D = new Item {Dog, NULL};
Item * E = new Item {Elephant, NULL};
Item * F = new Item {Fedora, NULL};
Item * G = new Item {Goosebumps, NULL};
Item * H = new Item {House, NULL};
Item * I = new Item {Insects, NULL};
Item * J = new Item {Jam, NULL};
Item * K = new Item {Kite, NULL};
Item * L = new Item {Limestone, NULL};
Item * M = new Item {Mountaineering, NULL};
Item * N = new Item {Night, NULL};
Item * O = new Item {Open Sesame, NULL};
Item * P = new Item {Potatoes, NULL};
Item * Q = new Item {Quantum Mechanics, NULL};
Item * R = new Item {Rrrrrrrrrrawr, NULL};
Item * S = new Item {Snakes, NULL};
Item * T = new Item {Tizzy Tube, NULL};
Item * U = new Item {Underworld, NULL};
Item * V = new Item {Volcanic Ash, NULL};
Item * W = new Item {Who When What Why, NULL};
Item * X = new Item {XXX, NULL};
Item * Y = new Item {Yellow, NULL};
Item * Z = new Item {Zest of Lemon, NULL};
// Create a Hash Table of 13 Linked List elements.
HashTable table;
// Add 3 Items to Hash Table.
table.insertItem(A);
table.insertItem(B);
table.insertItem(C);
table.printTable();
table.printHistogram();
// Remove one item from Hash Table.
table.removeItem(Apple);
table.printTable();
table.printHistogram();
// Add 23 items to Hash Table.
table.insertItem(D);
table.insertItem(E);
table.insertItem(F);
table.insertItem(G);
table.insertItem(H);
table.insertItem(I);
table.insertItem(J);
table.insertItem(K);
table.insertItem(L);
table.insertItem(M);
table.insertItem(N);
table.insertItem(O);
table.insertItem(P);
table.insertItem(Q);
table.insertItem(R);
table.insertItem(S);
table.insertItem(T);
table.insertItem(U);
table.insertItem(V);
table.insertItem(W);
table.insertItem(X);
table.insertItem(Y);
table.insertItem(Z);
table.printTable();
table.printHistogram();
// Look up an item in the hash table
Item * result = table.getItemByKey(Snakes);
cout << result -> key << endl;
return 0;
}
view raw main.cpp hosted with ❤ by GitHub

References:

  1. Carrano, Frank. Timothy, Henry. Data Abstraction & Problem Solving with C : Walls and Mirrors. Boston: Pearson, 2013. Print.
  2. “Hash Functions.” SparkNotes. 2014. Web. http://www.sparknotes.com/cs/searching/hashtables/section2.rhtml
  3. Suh, Eric. “Hash Tables.” Cprogramming.com. 2011. Web. http://www.cprogramming.com/tutorial/computersciencetheory/hash-table.html
  4. Walker, Julienne. “The Art of Hashing.” Eternally Confused. 2014. Web. http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx

Categories: mathtechnologytutorials