Abstract


Hash Map Expansion


  • Expand the output space to avoid Hash Collision totally if expansion is big enough
  • Reduce the chance of hash collision if expansion isn’t big enough

  • In the above example, when and result in the same output , we can double the output space to avoid the collision
  • After expansion, ,

Intense Computation Needed

With the change of the output space size, it requires us to re-run the Hash Function on all keys to obtain the new Bucket

It is very time consuming to recalculate hashes and move all key-value pairs from old arrays to new arrays

So the best practice is to reserve a pretty decent space to avoid frequent expanding

Load Factor

Separate Chaining


  • Use each Bucket in the underlying output space of Hash Map to hold a Linked List node. We put all the conflicted key-value pairs as nodes in the Linked List. This allows us to live with a certain degree of Hash Collision. Thus, reducing the frequency of Hash Map Expansion
  • You can find the code implementation of separate chaining here

More Memory Usage

Each key-value pair node needs space to store a pointer to point to the next node

Linear Efficient

Operation like searching, insertion, and deletion takes time on Linked List.

We can converted the long Linked List to AVL Tree or Red Black Tree to optimise the search efficiency from to .

Mechanism of Common Operations

Open Addressing


Delete key-value pair

We need a special indicator to remove a key-value pair. If we just make it empty, the key-value pairs stored after it will be ignored when we are trying to search a key-value pair that has Hash Collision with the deleted key-value pair. Because the key-value pair may be stored right after the deleted key-value pair

Linear Probing

  • The idea is to find the next empty Bucket for the conflicted key-value pair linearly

Key-value pair clustering

When the array is filled up with key-value pairs in a continuous manner. We may need to iterate through many key-value pairs, including key-value pairs that has a hash index to get/delete the desired key-value pair

Mechanism of Common Operations

Double Hashing

  • The idea here to find the next empty bucket for the conflicted key-value pair by executing the Hash Function multiple times

More Distributed

Not easy to have key-value pair clustering

More Computational Demanding

More Hash Function performed means more computation is required

Mechanism of Common Operations