Hash Collision

Abstract

Occurs when running Hash Function on two different inputs and we obtain the same output
In the above example, $12836 % 100$ and $20336 % 100$ result in the same value $36$
There 3 approaches to handle - Hash Map Expansion, Separate Chaining and Open Addressing

Hash Map Expansion

Expand the output space to avoid Hash Collision totally if expansion is big enough
Reduce the chance of hash collision if expansion isn’t big enough

In the above example, when $136 % 100$ and $236 % 100$ result in the same output $36$ , we can double the output space to avoid the collision
After expansion, $136 % 200 = 136$ , $236 % 200 = 36$

Intense Computation Needed

With the change of the output space size, it requires us to re-run the Hash Function on all keys to obtain the new Bucket

It is very time consuming to recalculate hashes and move all key-value pairs from old arrays to new arrays

So the best practice is to reserve a pretty decent space to avoid frequent expanding

Load Factor

A faction of total elements inside Hash Map and total buckets inside the hash map
Measures the severity of Hash Collision, acts as a triggering condition for Hash Map Expansion

Java

When Load Factor exceeds 0.75, Hash Map Expansion will be triggered to expand twice the original size

Separate Chaining

Use each Bucket in the underlying output space of Hash Map to hold a Linked List node. We put all the conflicted key-value pairs as nodes in the Linked List. This allows us to live with a certain degree of Hash Collision. Thus, reducing the frequency of Hash Map Expansion
You can find the code implementation of separate chaining here

More Memory Usage

Each key-value pair node needs space to store a pointer to point to the next node

Linear Efficient

Operation like searching, insertion, and deletion takes $O (n)$ time on Linked List.

We can converted the long Linked List to AVL Tree or Red Black Tree to optimise the search efficiency from $O (n)$ to $O (l o g n)$ .

Implementation in Programming Languages

Java

Starting from JDK 1.8, when Hash Map length reaches 64 & the Linked List length is 8. Linked List will be converted to Red Black Tree to improve performance

Golang

Every Bucket has maximum 8 key-value pairs. Once exceeded, it will be linked to a overflowing bucket

When there is too many overflowing buckets, Hash Map Expansion will be performed to ensure performance

Mechanism of Common Operations

Search key-value pair with key

Obtain the index of bucket by passing the key to the Hash Function

Obtain the starting node of the Linked List with the index

Iterate through the Linked List to find key-value pair node that matches with the given key

Add key-value pair

We perform Search key-value pair with key to ensure it is a new key-value pair we want to add

If unable to find, we add a new node with the key-value pair to the Linked List. Else we return duplicated key error

Delete key-value pair

We perform Search key-value pair with key to ensure the key we want to delete exists

Delete the key-value pair node if there is one

Open Addressing

Similar to Separate Chaining, it is an approach to live with Hash Collision, so as to reduce the frequency of Hash Map Expansion
But unlike Separate Chaining, open addressing doesn’t leverage on extra data structures like Linked List to live with. There are mainly 2 approaches - Linear Probing and Double Hashing

Delete key-value pair

We need a special indicator to remove a key-value pair. If we just make it empty, the key-value pairs stored after it will be ignored when we are trying to search a key-value pair that has Hash Collision with the deleted key-value pair. Because the key-value pair may be stored right after the deleted key-value pair

Linear Probing

The idea is to find the next empty Bucket for the conflicted key-value pair linearly

Key-value pair clustering

When the array is filled up with key-value pairs in a continuous manner. We may need to iterate through many key-value pairs, including key-value pairs that has a hash index to get/delete the desired key-value pair

Implementation in Programming Language

Python

Uses Pseudorandomness to find the next empty bucket for conflicted key-value pair

Mechanism of Common Operations

Search key-value pair with key

Obtain the Array index by passing the key to the Hash Function

If the key-value pair in the bucket doesn’t match with the key, iterate through the buckets linearly until the correct key-value is found

Else, if the bucket is empty, then key-value pair isn’t inside, return None

Add key-value pair

Obtain the Array index by passing the key to the Hash Function

If the bucket already has a key-value pair, then iterating linearly until a empty bucket is found, and then insert the key-value pair

Delete key-value pair

Obtain the Array index by passing the key to the Hash Function

Iterating through linearly until the desired key-value pair is found

Deleting the key-value pair by placing a special indicator in the bucket to denote the bucket is empty and it shouldn’t break the iteration flow

Double Hashing

The idea here to find the next empty bucket for the conflicted key-value pair by executing the Hash Function multiple times

More Distributed

Not easy to have key-value pair clustering

More Computational Demanding

More Hash Function performed means more computation is required

Mechanism of Common Operations

Search key-value pair with key

Obtain the Array index by passing the key to the Hash Function

If the desired key-value pair isn’t inside the bucket, perform the Hash Function again until we find the correct key-value pair

If empty bucket is countered, then the desired key-value isn’t stored in the Hash Map

Add key-value pair

Obtain the Array index by passing the key to the Hash Function

If the bucket already has a key-value pair, then keep performing Hash Function until an empty bucket is found, then insert the key-value pair

CS Notes

Recent Updates

Subtyping

Datatype

Binary Search in Rotated Sorted Array

Explorer

Hash Collision

Abstract

Hash Map Expansion

Load Factor

Separate Chaining

Open Addressing

Linear Probing

Double Hashing

Table of Contents

Backlinks

Graph View