Dipty's Diary : September 2018

Monday, September 10, 2018

Implementation of Hash Table with chaining

In my last post about hash table I have explained how hash table with chaining works. Now let's try to implement that.

First thing we'll do is - create a class called HashTable.

In this class at first we'll declare a private variable called HASH_MAX that will be the size of hash table. Next we'll create a class called linkedList, it works same as the Node class we made in the linked list implementation. And after this will declare the hashTable array that is linkedList type.

Now will declare the constructor and functions we need. First function is the hash function that we've talked about in last post. Next we'll declare the insertItem() function through which we'll insert our hash table values. And the last two functions are numberOfItemsInIndex() that returns size of every index and printTable() will print the values of the hash table.

Now let's see in details how these constructors and functions work.

HashTable():

In this constructor we're just putting default values to every index of the hashTable. First we are creating a new linkedList and assigning it in hashTable's every index, then we are giving a default data to the linked list of that index and giving default value to the next pointer.

hashFunction(string key):

This is the same hashFunction we used in the last post. we're just adding up all the ASCII values of the string and returning sum mod HASH_MAX.

insertItem(string key):

For insertion first we'll pass the string through the hashFucntion and initialize the returned value in the index variable. Then we'll check if any value is saved in that index or not. If the data element of the first element is equal to "empty"(that is out default value) then there is no value saved in this index, so we'll just initialize the data of the first element of that list as the string we provided. But if there is one or more values saved in that linked list, then we'll create a pointer called tmp and initialize it with hashTable[index] linkedList and then create a new node for this linked list called x and will initialize data of this node with the provided string and the next pointer of this node with NULL. After that we'll run a loop till end of the list and when we'll go to the last element, we'll just initialize it's next pointer with the new node (i.,e x) we created.

numberOfItemInIndex(int index):

Thais fuction takes a index number as parameter and returns how many strings are saved in that linked list. First we check if the list is empty of not, if it is empty we'll return 0 else we'll increase the count then declare a linkedList type poiter called tmp and initialize it with linkedList of the provided index. Then we'll just run loop till end of the list and increase the count variable and then return the value.

printTable():

This function is pretty self explanatory. We've just print the size and elements of the linked list of every index. I've already explained linked list print before you can see that.
Full Source Code

Saturday, September 8, 2018

Intro to Hash Table with Chaining

Hash table is a data structure that combines an array with a linked list. It takes the random access ability of an array and combines it with dynamism of linked list.

If implemented well we are basically taking all the advantages of both the data structure to get insertion,deletion and search in constant time.

Hash table is a combination of two things:

Hash function: it returns a positive integer that is called hash code.
An Array: it is capable of storing data of the type we want to store

The basic idea of hash table is we take our data and run that data through the hash function, so the data is processed and return a number (called hash code) and with that number we just store the data we want to store in the data structure.

But how can we define our hash function? There is no one particular way to define a hash function. There are lot of good and bad hash function. And as we're trying to get the operation close to constant time we'll have to implement good hash function.

A good hash function should have:

Use only the data being hashed -> we don't want to incorporate anything else other than the data
Use all the data being hashed -> we want to use all of the data
Be deterministic -> every time we pass the same data we should get the same hash code
Uniformly distribute data -> data should be spread out through out the able
Generate very different hash code for very similar data

Example of a hash table is:

In this we are taking a string ans adding ASCII value of each of the character and returning the sum mod the size of hashTable(i.e HASH_MAX) and that is our hash code and we are saving the string to that position of hash table.

But if there is two input first- "aaop" and second- "paoa" both of them will return the hash code 7 if we use the above hash function. So, which ever data will save in the hash table that will be overwritten by the later one. We call this problem collision.

Collision occurs when two pieces of data run through the same hash function and returns the same hash code.

But we want to save both the data into the hash table. So, how do we do it? one way can be linear probing.

In linear probing, if we have a collision, we try to place data in the next consecutive element in the array until we find a vacancy.So, if we can not put any of the two strings in hash code 7 then we'll try to put it in 8 and if that is also not vacant we'll increase hash code until we find it. That means we are stretching away from constant time and leaning toward order of n with this process, and this problem is called clustering.

So, linear probing is subject to a problem called clustering. That means once there is a miss, two adjacent cells will contain data, making more likely that in the future cluster will grow.

The other problem is in our code we are still have room for specific amount of string(i.e HASH_MAX) that means we are still limited. We can only store as much data as we have location in the array.

So, how can we solve these problems? This is where chaining comes in play. And this is where we bring the linked list back to the picture. What if each element of array holding one piece of data, it held multiple pieces of data?

But that doesn't make sense, we know each element of array can only one piece of data of a data type. But what if that data type is a linked list?So, what if every element of an array is a pointer to the head of a linked list? Then we can build these linked list and grow them arbitrarily.

So, we start to grow these chains out of the array locations. Now we can store an arbitrary amount of data into the hash table without ever running into collision.

And we know that when we insert something in the front of the linked list, that is a O(1) operation.

But we know search or find function in linked list takes O(n) time. But instead of having one linked list, we now have HASH_MAX(i.e size of array) number of linked list where HASH_MAX can be 10 or 1000 whatever the size of array is, then time complexity is O(n/10) or O(n/1000).

We know theoretically we disregard constants when we measure complexity, but in real world these things matter. We will notice this happens to run 10 times or 1000 times faster, because we're distributing one long chain (linked list) across the 1000 smaller chains.So, each time we search through one of those chains we can ignore the other 999 chains we don't care about.

As we are focusing on only one linked list which on average going to be thousand times shorter. So, we still are tending toward the average case of being constant time.

Chaining works like this:

We take the hash code and make it point to the node which contains the string. And in collision i.e if we try to put "paoa" in the hash table:

We make a new node with that data and put it in end of the list that was pointed by 7.

I will show the implementation in my next post.

Friday, September 7, 2018

Linked List Functions Implementation

In my last post about linked list I talked about how we can insert elements in linked list and how to print those elements. But there are a lot of other functions that we can perform on linked list, I am listing some them below and their functionalities:

pushFront() -> adds element to the front of the list
topFront() -> returns the data of the first element
popFront() -> removes first element of the list
pushBack() -> adds element at the end of the list
popBack() -> removes the last element from the list
find(data) -> returns true or false depending on weather the value exists in the list or not
erase(data) -> removes element from the list
empty() -> returns boolean value depending on weather the list is empty or not
addsBefore(node,data) -> adds element before a node

Now I'll implement some of these functions.

pushFront():

To push an element in the front we'll just have to create a new node insert the data and initialize it's next pointer with head that means new node will point to previous first elements and then initialize head with currentNode because now this is the first element. As all of these operations takes constant time, time complexity of pushFront() is order of 1 i.e. O(1).

topFront():

To get the data of the list we'll just have to return data of head. It takes constant time.

popFront():

To delete the first element we'll just have to delete the first element and update the head pointer with the next elements reference. It takes constant time.

pushBack():

To push elements at the end of the list is similar to pushFront only difference is this time we'll update tmp because that is what we're using to keep track of the tail of the list.

find(data):

To find any data we'll have to traverse the list until we find it or if it is not on the list then till the end of list. In worst case this takes order of n i.e. O(n) time.
Full Source Code in C++
Full Source Code in Python3

Wednesday, September 5, 2018

Intro to Linked List

Linked list is linear data structure where each element consists of two things: data and information for next node.

To implement Linked list in C++ first we'll make a class called Node that will have two fields.

*next pointer will contain the reference for next node and data will have the value. After this will create three Node type global pointers.

Now to insert nodes at the end of linked list we'll create a function called insert_elements(). This function will take the input value as the parameter.

We have created a new node by writing: "new Node" and initialized it to newNode. That means now the pointer newNode is pointing to the new node.

At first we have to check if head is equal to NULL or not. If there is no element in linked list, head will be NULL. So, we'll initialize it with newNode and we'll also initialize tmp with newNode. Now all three pointers is pointing to the same node and as we're not initializing the *next pointer it points to NULL.

If the linked list is not empty that means the head is not NULL, then we'll just insert the value in the variable data and initialize the previous nodes *next pointer to the current node and initialze *tmp to now point at current node.

If we want to print the values inserted in the linked list we can create a function print_elements().

we'll make a node type pointer that points to the head that means the first node of the linked list and we'll iterate loop until currentNode is NULL and print the data of each node. currentNode will be NULL when we reach the last node.
Full Source code

Dipty's Diary