📅 Sat, Apr 6, 2024 ⏲️ ~ 5 minutes Intermediate

What are and how to use HashSet

HASH SET, also called HashTables or simply Sets, are unordered collections of unique elements. This means that there can be no duplicates in a set and the order of the elements does not matter.

Therefore, the only operations allowed by a HASH SET are adding, removing, and checking the existence of an element in the collection.

It uses a technique called hashing to internally organize the elements, allowing for quick retrieval and very fast element searches.

Properties

Property	Set
How often you’ll use it	🔻
Is mutable	✔️
Is ordered	❌
Is indexable	❌
Allows duplicates	❌

When to use a HashSet

The main use of a HASH SET is for efficient searching of elements. Imagine you have a series of collectible cards, and you want to know if you already have that card in your collection. The famous “I have it, I don’t have it” of children.

hashset-cuando-usar

With a HASH SET, you add each card to the collection. This tells you very quickly (much faster than an Array) if you already have that card added previously.

The HASH SET allows you to store elements in a collection know if you already have that element and retrieve it if necessary. Therefore, the main use is the search for elements, duplicates, etc.

Finally, the HASH SET has a “sibling”, the DICTIONARY. This operates quite similarly to a HashSet, but with key-value pairs.

HashSet Examples in Different Languages

The syntax for using a HashSet may vary depending on the programming language being used.

Next, we will see examples in some popular languages:

HashSet<string> names = new HashSet<string>();
names.Add("John");
names.Add("Mary");
names.Add("Peter");

foreach (string name in names)
{
    Console.WriteLine(name);
}

C++

#include <iostream>
#include <unordered_set>
using namespace std;

int main()
{
    unordered_set<string> names;
    names.insert("John");
    names.insert("Mary");
    names.insert("Peter");

    for (const string& name : names)
    {
        cout << name << endl;
    }

    return 0;
}

Javascript

const names = new Set();
names.add("John");
names.add("Mary");
names.add("Peter");

for (const name of names)
{
    console.log(name);
}

Python

names = set()
names.add("John")
names.add("Mary")
names.add("Peter")

for name in names:
    print(name)

In the previous examples, we create a HashSet (using the HashSet class in C# and C++, and the Set object in JavaScript and Python) and add several elements to it. Then, we iterate through the elements of the HashSet and display them on the screen.

Internal Operation Advanced

Internally, a HASH SET works similarly to a dynamic array, but uses a hash function to generate numerical indices for the elements.

A hash function is a function that takes an element as input and returns a unique value associated with that element. This unique value is used as an index to store and retrieve the object in the HashSet.

curso-programacion-hash

The “trick” of a hash function is that it is designed to provide unique and evenly distributed results for each element. It seems very complicated, but a simple example of a hash function could simply be to calculate the modulo of the element value (in binary) with respect to the size of the HASH SET.

When an element is added to the HASH SET, the hash function is applied to the object to determine its storage index. In that position, the element is saved in the position given by its hash function.

curso-programacion-hashset

Now, if we have to check if our HASH SET contains an element, it simply calculates the hash function of the element and checks if the position is occupied or not. If it is occupied, it already contains it, and if it is empty, it does not.

If by adding elements our HASH SET becomes full, as it is a dynamic array, it expands dynamically transparently for the user. In addition, the hash function is changed for the new size, and the hashes of the stored elements must be recalculated.

This is also the reason why a HashSet is unordered. The elements are internally saved “where they belong”, which has nothing to do with the order in which we introduced them.

The hash function ensures that the elements are stored and searched for efficiently, as the search is performed directly in the position calculated by the hash function, without the need to go through all the elements.

Efficiency

For a HASH SET, which is a collection in which the elements are not ordered and does not allow duplicates, the table would look like this:

Operation	Dictionary
Sequential access	⚫
Random access	🟢
Add at the beginning	⚫
Remove at the beginning	⚫
Add at the end	⚫
Remove at the end	⚫
Random insertion	🟡
Random removal	🟢
Search	🟢

In the case of the HASH SET, it does not make sense to talk about sequential access, adding at the beginning or removing at the beginning, since the elements are not in a specific order and cannot be accessed directly by index. It also does not make sense to talk about adding or removing at the end, since the elements are not in a specific order.

It is possible to add elements “in the order they belong”. That operation is O(1), unless we have to expand the HashSet, in which case it is O(n).

The most relevant operation for a HASH SET is the search, which is based on hashing. This allows for quick and constant access to search for elements by their value, which is O(1).

Read more about collection efficiency read more ⯈