Wednesday, November 21, 2018

Introduction To Binary Search Tree

A tree data structure is a way to hold data that looks like a tree when it's visualized. For example:


All the data points in a tree are called nodes. The top of the tree is called the root node, and from the root it branches out into additional nodes called child nodes. Each of these child nodes may have more child nodes. A node with branches leading to other nodes is called a parent, the nodes branching out from a parent are called its children, and each child that is itself the parent of other nodes forms its own subtree. Nodes at the end of the tree that don't have any children are called leaf nodes.

What is Binary Search Tree?

A general tree data structure can have any number of branches from every node, but in a binary tree each node has at most two branches. A binary search tree is a binary tree that is ordered: every node in the left subtree is less than or equal to the parent node, and every node in the right subtree is greater than or equal to the parent node. In fact, the picture above is an example of a binary search tree.

Why Use Binary Search Tree?

Because the operations use the principle of binary search, on average each operation is able to skip about half of the tree. So search, insertion and deletion each take time proportional to the logarithm of the number of items stored in the tree. For example, if we are looking for a small value, we only have to search through the left nodes and can ignore the right children.

A binary search tree is also memory efficient. With an array or a hash table we have to reserve a specific, fixed amount of memory up front, and some of the indexes may remain empty. In a binary search tree we store the data by reference: as we add new data we take a chunk of memory and link to it.

When to Use Binary Search Tree?

A binary search tree is useful when elements can be compared in a less-than or greater-than manner. For example, if we have a bunch of names saved and we want to search for a particular name.

A binary search tree will store the names in alphabetical order:

If we search for the name "orin", we'll start at the root. As the search key is less than the root (orin < paul), we'll go to the left child (nasrin). The left child is less than the search key (orin > nasrin), so we'll go to its right child, which is the search key.
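To make this concrete, here is a minimal C++ sketch of a binary search tree over names, using the example names above. The Node class and the recursive insert/search helpers are my own illustration of the idea, not code from the original post.

#include <iostream>
#include <string>
using namespace std;

class Node {
public:
    string data;
    Node *left, *right;
    Node(string value) : data(value), left(NULL), right(NULL) {}
};

// Insert keeps smaller (alphabetically earlier) names in the left subtree.
Node* insert(Node *root, string value) {
    if (root == NULL) return new Node(value);
    if (value <= root->data)
        root->left = insert(root->left, value);
    else
        root->right = insert(root->right, value);
    return root;
}

// Search walks left or right depending on how the key compares to the current node.
bool search(Node *root, string value) {
    if (root == NULL) return false;
    if (value == root->data) return true;
    if (value < root->data)
        return search(root->left, value);
    return search(root->right, value);
}

int main() {
    Node *root = NULL;
    root = insert(root, "paul");
    root = insert(root, "nasrin");
    root = insert(root, "orin");
    cout << (search(root, "orin") ? "found" : "not found") << endl;  // prints "found"
    return 0;
}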


Time complexity

  • Insertion: O(log n) on average; O(n) in the worst case (a skewed tree)
  • Deletion: O(log n) on average; O(n) in the worst case
  • Search: O(log n) on average; O(n) in the worst case

Monday, September 10, 2018

Implementation of Hash Table with chaining

In my last post about hash tables I explained how a hash table with chaining works. Now let's try to implement it.

The first thing we'll do is create a class called HashTable.
In this class we'll first declare a private variable called HASH_MAX that will be the size of the hash table. Next we'll create a class called linkedList, which works the same as the Node class we made in the linked list implementation. And after this we'll declare the hashTable array, whose elements are of the linkedList type.

Now we'll declare the constructor and the functions we need. The first function is the hash function that we talked about in the last post. Next we'll declare the insertItem() function, through which we'll insert values into the hash table. The last two functions are numberOfItemsInIndex(), which returns the size of the chain at a given index, and printTable(), which prints the values of the hash table.
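Here is a minimal sketch of what that class declaration might look like. The table size of 10 and the use of "empty" as the default value are assumptions based on this and the previous post, not the original code.

#include <iostream>
#include <string>
using namespace std;

class HashTable {
    static const int HASH_MAX = 10;   // size of the hash table (10 is an assumed value)

    class linkedList {                // same idea as the Node class from the linked list post
    public:
        string data;
        linkedList *next;
    };

    linkedList *hashTable[HASH_MAX];  // each index holds the head of a chain

public:
    HashTable();
    int hashFunction(string key);
    void insertItem(string key);
    int numberOfItemsInIndex(int index);
    void printTable();
};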

Now let's see in detail how this constructor and these functions work.

HashTable():
In this constructor we're just putting default values into every index of the hashTable. First we create a new linkedList and assign it to each index of hashTable, then we give the linked list at that index a default data value and set its next pointer to a default value.
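Assuming the class sketched above, the constructor might look like this, with "empty" as the assumed default data and NULL as the default next pointer:

HashTable::HashTable() {
    for (int i = 0; i < HASH_MAX; i++) {
        hashTable[i] = new linkedList;    // one chain head per index
        hashTable[i]->data = "empty";     // default data meaning "nothing stored here yet"
        hashTable[i]->next = NULL;        // default next pointer
    }
}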

hashFunction(string key):
This is the same hash function we used in the last post: we just add up all the ASCII values of the string and return the sum mod HASH_MAX.
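A sketch of that hash function as a member of the class above:

int HashTable::hashFunction(string key) {
    int sum = 0;
    for (int i = 0; i < (int)key.size(); i++)
        sum += key[i];        // add up the ASCII values of the characters
    return sum % HASH_MAX;    // hash code in the range [0, HASH_MAX)
}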

insertItem(string key):
For insertion, first we'll pass the string through the hashFunction and store the returned value in the index variable. Then we'll check whether any value is already saved at that index. If the data element of the first node is equal to "empty" (our default value), there is no value saved at this index, so we'll just set the data of the first node of that list to the string we were given. But if there are already one or more values saved in that linked list, we'll create a pointer called tmp and initialize it with the hashTable[index] linkedList, then create a new node for this list called x, initializing its data with the provided string and its next pointer with NULL. After that we'll run a loop to the end of the list, and when we reach the last element we'll just set its next pointer to the new node (i.e. x) we created.
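A sketch of insertItem() following the steps just described, again assuming the class above:

void HashTable::insertItem(string key) {
    int index = hashFunction(key);
    if (hashTable[index]->data == "empty") {
        hashTable[index]->data = key;          // first value saved at this index
    } else {
        linkedList *tmp = hashTable[index];
        linkedList *x = new linkedList;
        x->data = key;
        x->next = NULL;
        while (tmp->next != NULL)              // walk to the last element of the chain
            tmp = tmp->next;
        tmp->next = x;                         // append the new node at the end
    }
}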

numberOfItemsInIndex(int index):
This function takes an index number as a parameter and returns how many strings are saved in that linked list. First we check whether the list is empty or not; if it is empty we return 0, otherwise we increase the count, declare a linkedList type pointer called tmp, initialize it with the linkedList at the provided index, then run a loop to the end of the list, increasing the count variable, and finally return the value.
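A sketch of that function, assuming the class above:

int HashTable::numberOfItemsInIndex(int index) {
    if (hashTable[index]->data == "empty")
        return 0;                        // nothing stored at this index
    int count = 1;
    linkedList *tmp = hashTable[index];
    while (tmp->next != NULL) {          // count the remaining nodes in the chain
        count++;
        tmp = tmp->next;
    }
    return count;
}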

printTable():
This function is pretty self-explanatory: we just print the size and the elements of the linked list at every index. I've already explained how to print a linked list in an earlier post, so you can refer to that.
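A sketch of printTable(), assuming the class above:

void HashTable::printTable() {
    for (int i = 0; i < HASH_MAX; i++) {
        cout << "Index " << i << " (" << numberOfItemsInIndex(i) << " item(s)): ";
        linkedList *tmp = hashTable[i];
        while (tmp != NULL) {
            cout << tmp->data << " ";    // prints "empty" for slots with no values
            tmp = tmp->next;
        }
        cout << endl;
    }
}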
Full Source Code

Saturday, September 8, 2018

Intro to Hash Table with Chaining

A hash table is a data structure that combines an array with linked lists. It takes the random access ability of an array and combines it with the dynamism of a linked list.

If implemented well, we basically get the advantages of both data structures: insertion, deletion and search in close to constant time on average.

A hash table is a combination of two things:

  1. Hash function: it returns a positive integer that is called the hash code.
  2. An array: it is capable of storing data of the type we want to store.

The basic idea of a hash table is that we take our data and run it through the hash function; the data is processed and a number (called the hash code) is returned, and at that number's position we store the data in the data structure.

But how do we define our hash function? There is no one particular way to define a hash function; there are lots of good and bad hash functions. And as we're trying to get the operations close to constant time, we'll have to implement a good hash function.

A good hash function should:
  • Use only the data being hashed -> we don't want to incorporate anything other than the data
  • Use all the data being hashed -> we want to use all of the data
  • Be deterministic -> every time we pass the same data we should get the same hash code
  • Uniformly distribute data -> data should be spread out throughout the table
  • Generate very different hash codes for very similar data
An example of a hash table is:
Here we take a string and add up the ASCII value of each of its characters, returning the sum mod the size of the hash table (i.e. HASH_MAX); that is our hash code, and we save the string at that position of the hash table.
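A minimal sketch of such a hash function; the table size of 10 is an assumption that matches the collision example below:

#include <string>
using namespace std;

const int HASH_MAX = 10;   // assumed size of the hash table

int hashFunction(string key) {
    int sum = 0;
    for (int i = 0; i < (int)key.size(); i++)
        sum += key[i];      // add the ASCII value of each character
    return sum % HASH_MAX;  // e.g. "aaop": 97+97+111+112 = 417, and 417 % 10 = 7
}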

But if there are two inputs, first "aaop" and second "paoa", both of them will return the hash code 7 if we use the above hash function. So whichever value is saved first in the hash table will be overwritten by the later one. We call this problem a collision.

A collision occurs when two pieces of data run through the same hash function and return the same hash code.

But we want to save both pieces of data in the hash table. So how do we do it? One way is linear probing.

In linear probing, if we have a collision, we try to place the data in the next consecutive element of the array until we find a vacancy. So if we cannot put one of the two strings at hash code 7, we'll try to put it at 8, and if that is also not vacant we'll keep increasing the index until we find a free slot. That means we are stretching away from constant time and leaning toward order of n with this process.
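As a rough illustration (not the approach this post ends up using), insertion with linear probing might look like this, continuing from the HASH_MAX constant in the sketch above and using an empty string to mark a vacant slot; the function and variable names here are my own:

// Tries to place key at hashCode, moving to the next slot on a collision.
// Returns the index used, or -1 if the table is full.
int probeInsert(string table[], string key, int hashCode) {
    int index = hashCode;
    while (table[index] != "") {
        index = (index + 1) % HASH_MAX;    // wrap around at the end of the array
        if (index == hashCode) return -1;  // back where we started: the table is full
    }
    table[index] = key;
    return index;
}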

So linear probing is subject to a problem called clustering: once there is a miss, two adjacent cells will contain data, making it more likely that the cluster will keep growing in the future.

The other problem is that our code still only has room for a specific number of strings (i.e. HASH_MAX), which means we are still limited. We can only store as much data as we have locations in the array.

So how can we solve these problems? This is where chaining comes into play, and this is where we bring the linked list back into the picture. What if, instead of each element of the array holding one piece of data, it held multiple pieces of data?

But that doesn't make sense: we know each element of an array can only hold one piece of data of a given type. But what if that data type is a linked list? So what if every element of the array is a pointer to the head of a linked list? Then we can build these linked lists and grow them arbitrarily.

So we start to grow these chains out of the array locations. Now we can store an arbitrary amount of data in the hash table without a collision ever costing us data.

And we know that inserting something at the front of a linked list is an O(1) operation.

But we also know that searching or finding in a linked list takes O(n) time. However, instead of having one linked list we now have HASH_MAX (the size of the array) linked lists, and if HASH_MAX is 10 or 1000, whatever the size of the array is, then the time per search is roughly O(n/10) or O(n/1000), assuming the hash function distributes the data uniformly.

We know that theoretically we disregard constants when we measure complexity, but in the real world these things matter. We will notice this runs 10 or 1000 times faster, because we're distributing one long chain (linked list) across 1000 smaller chains. So each time we search through one of those chains, we can ignore the other 999 chains we don't care about.

As we are focusing on only one linked list, which on average is going to be a thousand times shorter, we are still tending toward the average case of constant time.

Chaining works like this:

We take the hash code and make that array slot point to the node which contains the string. And on a collision, i.e. if we try to put "paoa" into the hash table:

We make a new node with that data and put it at the end of the list pointed to by index 7.
 I will show the implementation in my next post.

Friday, September 7, 2018

Linked List Functions Implementation

In my last post about linked lists I talked about how we can insert elements into a linked list and how to print those elements. But there are a lot of other operations we can perform on a linked list. I am listing some of them below along with their functionality:
                   
      pushFront() -> adds element to the front of the list
      topFront() -> returns the data of the first element
      popFront() -> removes first element of the list
      pushBack() -> adds element at the end of the list
      popBack() -> removes the last element from the list
      find(data) -> returns true or false depending on whether the value exists in the list or not
      erase(data) -> removes an element from the list
      empty() -> returns a boolean value depending on whether the list is empty or not
      addsBefore(node,data) -> adds element before a node

Now I'll implement some of these functions.

pushFront():
To push an element onto the front we just have to create a new node, insert the data, and initialize its next pointer with head, which means the new node points to the previous first element; then we initialize head with the new node, because it is now the first element. As all of these operations take constant time, the time complexity of pushFront() is order of 1, i.e. O(1).
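A minimal C++ sketch of pushFront(), together with the Node class and the head and tmp (tail) pointers the rest of these snippets assume; storing int data, the names and the extra tail update are my own illustration, not the post's original code:

#include <iostream>
using namespace std;

class Node {
public:
    int data;
    Node *next;
};

Node *head = NULL;   // first element of the list
Node *tmp = NULL;    // last element (tail) of the list

void pushFront(int data) {
    Node *currentNode = new Node;
    currentNode->data = data;
    currentNode->next = head;     // new node points to the previous first element
    head = currentNode;           // head now points to the new first element
    if (tmp == NULL)              // the list was empty, so the new node is also the tail
        tmp = currentNode;
}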

topFront():
To get the data of the first element we just have to return the data of head. It takes constant time.
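A sketch of topFront(), assuming the setup above and a non-empty list:

int topFront() {
    return head->data;   // data of the first element
}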

popFront():
To delete the first element we just have to remove it and update the head pointer with the next element's reference. It takes constant time.
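A sketch of popFront(), assuming the setup above:

void popFront() {
    if (head == NULL) return;       // nothing to remove
    Node *oldHead = head;
    head = head->next;              // head now points to the second element
    if (head == NULL) tmp = NULL;   // the list became empty, so there is no tail
    delete oldHead;
}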

pushBack():
Pushing an element onto the end of the list is similar to pushFront; the only difference is that this time we'll also update tmp, because that is what we're using to keep track of the tail of the list.
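A sketch of pushBack(), assuming the setup above:

void pushBack(int data) {
    Node *currentNode = new Node;
    currentNode->data = data;
    currentNode->next = NULL;
    if (head == NULL)
        head = currentNode;          // empty list: the new node is also the first element
    else
        tmp->next = currentNode;     // old tail points to the new node
    tmp = currentNode;               // update the tail pointer
}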

find(data):
To find any data we have to traverse the list until we find it, or, if it is not in the list, until we reach the end. In the worst case this takes order of n, i.e. O(n), time.
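A sketch of find(), assuming the setup above:

bool find(int data) {
    Node *currentNode = head;
    while (currentNode != NULL) {    // traverse until the end of the list
        if (currentNode->data == data)
            return true;
        currentNode = currentNode->next;
    }
    return false;                    // reached the end without finding the value
}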
Full Source Code in C++
Full Source Code in Python3

Wednesday, September 5, 2018

Intro to Linked List

A linked list is a linear data structure where each element consists of two things: the data and a reference to the next node.


To implement a linked list in C++, first we'll make a class called Node that has two fields.

The *next pointer will contain the reference to the next node and data will hold the value. After this we'll create three Node type global pointers.
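A minimal sketch of that Node class and the three global pointers; this is my reconstruction of the idea, not the post's original code:

#include <iostream>
using namespace std;

class Node {
public:
    int data;      // the value stored in this node
    Node *next;    // pointer (reference) to the next node
};

Node *head = NULL;         // first node of the list
Node *tmp = NULL;          // last node inserted so far
Node *currentNode = NULL;  // used while traversing the list for printing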

Now, to insert nodes at the end of the linked list we'll create a function called insert_elements(). This function takes the input value as its parameter.

We create a new node by writing "new Node" and assign it to newNode. That means the pointer newNode is now pointing to the new node.
First we have to check whether head is equal to NULL or not. If there is no element in the linked list, head will be NULL, so we'll initialize it with newNode and we'll also initialize tmp with newNode. Now all three pointers point to the same node, and we'll also set the *next pointer to NULL (an uninitialized pointer is not automatically NULL in C++).


If the linked list is not empty, meaning head is not NULL, then we'll just insert the value into the data field, set the previous node's *next pointer to the current node, and update tmp to point at the current node.
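Putting those two cases together, insert_elements() might look like this sketch, assuming the Node class and pointers above:

void insert_elements(int value) {
    Node *newNode = new Node;    // newNode now points to a freshly created node
    newNode->data = value;
    newNode->next = NULL;        // set explicitly; an uninitialized pointer isn't NULL in C++
    if (head == NULL) {          // the list is empty
        head = newNode;
        tmp = newNode;           // head and tmp both point to the only node
    } else {
        tmp->next = newNode;     // the previous node now points to the new node
        tmp = newNode;           // tmp moves to the new last node
    }
}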

If we want to print the values inserted into the linked list, we can create a function print_elements().
We'll make a Node type pointer that points to head, i.e. the first node of the linked list, and we'll loop until currentNode is NULL, printing the data of each node. currentNode becomes NULL when we move past the last node.
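A sketch of print_elements(), assuming the same setup:

void print_elements() {
    currentNode = head;                      // start from the first node
    while (currentNode != NULL) {            // becomes NULL after the last node
        cout << currentNode->data << " ";
        currentNode = currentNode->next;
    }
    cout << endl;
}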
Full Source code 

Friday, April 27, 2018

Segment Tree




A segment tree is basically what you get by taking the recursion pattern of merge sort and turning it into a data structure. So before we start with segment trees we have to know how merge sort works. Look at the diagram below:



So what we are doing here is recursing down the array by dividing it in half, and then recursing back up and merging the sorted arrays. On each level, recursing down takes a linear amount of time because we loop over the array and divide it in half, and recursing back up, combining the sorted lists, also takes a linear amount of time. We can use this information to analyze merge sort's runtime. If we look at the width of the diagram at the end of splitting the array, the width is n, so that's O(n), and the height is log n, because we split the array in half each time and the log of something basically tells us how many times we can divide it in half. If we combine the width and the height we get an area of n log n, which means the complexity of merge sort is O(n log n).

Now that we know how merge sort works, the question is: how can we turn this into a data structure? What we'll do is consider each array of the merge sort as a different node and use those nodes to form a tree. So I am taking the sequence of the recursion and turning it into a data structure. The tree will look like the one below:

As we can see, our array has 6 elements, so the root node consists of all the elements from 0-5. Then we split the array in half, so the left node consists of elements 0-2 and the right node consists of elements 3-5, and we keep dividing like this until the end, so the tree looks like below:


If you notice, I have drawn the same tree as before, but instead of putting the segments of the array in each node I've just put the range of those segments. A segment tree is useful when we are solving range query type problems. For example, you are given an array and you have to find the minimum element of a particular range; let's say the array is:

If we try to solve this using a linear scan, then for n elements and 1 query the time complexity is O(n*1), i.e. O(n). So if we have q queries the complexity will be O(n*q). Now if n <= 10^5 and q <= 10^5 and the time limit is 1 second, a linear scan will get a TLE. But if we use a segment tree, that will help us reduce the time complexity. For this problem we'll draw a tree similar to the one before.



But in each node we'll save the minimum element of that node's entire range. So the root node will contain the minimum element of the range [0-5], and likewise every other node will contain the minimum element of its range. We'll start storing the minimum values from the leaf nodes, because the range size of a leaf is 1, so its value is just the element at that index itself. First, in node number 8, whose range is 0-0, we just put the element at index 0, which is 1; then in range 1-1 we store the element at index 1, which is 3; likewise we fill all the leaf nodes with their values. Now we move on to the parents of the leaves. First we store the minimum of range [0-1], which is the minimum of ranges [0,0] and [1,1], that is 1; then the minimum of range [3-4], which is the minimum of ranges [3,3] and [4,4], is -2. Then we store the minimum of range [3-5], which is -2, the minimum of range [0-2], which is 1, and the minimum of range [0-5], which is -2. Now we've built our tree: first we filled the leaf nodes, and then in a bottom-up way we filled the rest of the nodes.
Now we'll see how to build this tree in code. We will use an array to store the tree. If you noticed, I have numbered the nodes in the tree, and 10 and 11 are missing because node 5 can't be divided further. The numbering of the nodes is done so that we can establish a relationship between parent and child nodes. Node 1's children are 2 and 3, node 2's children are 4 and 5, node 3's children are 6 and 7. Can you see a pattern? For node i, its children are 2*i and 2*i+1, and if we have a child and want to find its parent, then for child node i the parent node is i/2, because for children 2,3 we get parent 1 and for children 4,5 we get parent 2. Using this numbering we can store the tree nodes in an array. So the array for our tree will look like this:



The missing node numbers are not in the tree, so I have kept those indexes empty. For n elements we need a total of 2n-1 nodes in the tree, but the array we allocate is larger than 2n-1 because we also leave some indexes empty; the exact size needed is 2*(2^ceil(log2 n)) - 1, which is always at most about 4*n, so allocating an array of size 4*n + 1 is enough to build a tree over n elements.
The function for building the tree is:
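A sketch of how the build function might look in C++, assuming a global input array a[], a tree[] array of size about 4*n, and the root stored at index 1; the array names and sizes are my assumptions, not the post's original code:

#include <algorithm>
using namespace std;

const int MAXN = 100005;   // assumed maximum number of elements
int a[MAXN];               // input array
int tree[4 * MAXN];        // segment tree; node 1 is the root

// Builds the minimum segment tree for a[start..end] at position 'node'.
void build(int node, int start, int end) {
    if (start == end) {
        tree[node] = a[start];                 // leaf: range of size 1, the element itself
        return;
    }
    int mid = (start + end) / 2;
    build(2 * node, start, mid);               // left child covers [start, mid]
    build(2 * node + 1, mid + 1, end);         // right child covers [mid+1, end]
    tree[node] = min(tree[2 * node], tree[2 * node + 1]);   // parent = min of its two children
}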

Now that we have made the segment tree, we have to handle queries on it. In this case we are finding the minimum element in a range. When a query comes, there are three cases.


To understand the first case, let's see an example. If the user queries the range [0,5], then the root node of the tree covers the range [0,5] and contains -2, which means we've got the output -2. The case is:

1) This is complete overlap, because the node range [0,5] is completely inside the query range [0,5]. In complete overlap we just return tree[index]; that's how we get -2 as the output.

To understand the second and third cases, let's see another example. If the user queries the range [0,2], then the root node covers the range [0,5], which means we have partial overlap. The case is:


2) Partial overlap, because the query range [0,2] is only partially inside the root node's range [0,5]. In partial overlap we go to both sides, left and right. First we go left; there we get the node range [0,2], and our query range is also [0,2]. This is complete overlap, so we return the value this node contains, which is 1.


Now we go to the right node, whose range is [3-5], while the query range is [0,2], so we've got the third case: no overlap.

3) When the query range is completely outside the node range we have no overlap, and in this case we return infinity.
So the minimum of 1 and infinity is 1. And if you look at our input array, the minimum element in [0,2] is also 1, which means we're getting the right output.

So, the three cases for a segment tree are:
          1) Complete overlap, where the output is the value stored in that node
          2) Partial overlap, where we find the output by going to both sides
          3) No overlap, in which case we return infinity

Now let’s see the code for the query:

Saturday, January 13, 2018

Pointers in C/C++

You may not be a C or C++ programmer and may be thinking there's no need for this, but knowledge of pointers is useful for any programmer irrespective of the language they use. People usually tend to avoid pointers when they are learning the C language because pointers are an abstract way of thinking about memory layout, but they are a foundation of good programming principles. So let's understand what pointers are and how they work.
Before we get into pointers, we have to know how memory works. When we run a program we use memory to store variables and such. The memory a program uses is divided into blocks, and these blocks are 32 bits or 64 bits wide depending on your system. Every block has a specific address associated with it, and that's why we can access every location among these blocks using its address. As these addresses are in hexadecimal, it is difficult to deal with them directly, and that's why we access memory in a program through variables. A variable symbolically represents an address in memory, and when we use variables we don't worry about what the address is; that's why we don't have to deal with hexadecimal codes and can focus on symbolic names.
In most of our programs we instantiate our variables, but how do those values get stored in memory? For example, say we have two variables a and b which we instantiate:

int a = 5;
int b = 10;


The memory layout for these will look something like this:

If we consider the entire memory broken down into blocks like above, the values of a and b will each occupy some of these blocks. As the data type of a and b is integer (int), and an int is 32 bits, i.e. 4 bytes, each value takes up 4 bytes of memory. So we have 10 stored in one place and 5 in another. Each of these blocks has an address, and addresses can range from zero up to the number of blocks our particular machine has or our program uses. One thing to remember here is that the program decides which blocks hold which variables' values; we don't have control over that. So, in short, what is happening here is: 'a' and 'b' symbolically refer to memory locations, 5 and 10 are the values stored at those locations, and those locations live in the memory spaces called a and b. These locations have underlying addresses known to our C program; we simply refer to them by 'a' or 'b'. Now that we understand the memory layout, let's move on to pointers.

What is a pointer?
We've just seen that when we initialize the variable 'a', it is assigned to some location in memory; the C program decides which location it gets, and symbolically we refer to that location as 'a'.

But now we want to point to that location. What I mean by that is, for example, we can create another variable aPtr, which lives in a completely different location in memory, like below, and aPtr holds the address of the variable 'a'.


So we can say 'aPtr' points to 'a': notice that the value of 'aPtr' is 123FF, which is the address of the variable 'a'. This is basically the core of how a pointer works. A pointer just holds the address of another variable.
In short, a pointer is a special type of variable: regular variables store integers, doubles, etc., but a pointer stores an address in memory. We can say a pointer points to another location in memory.
Now that we know what a pointer is, we have to know how to set up 'aPtr' to point to 'a'.

C implementation of pointer
It is just a one-line statement like this:
                                                         int *aPtr = &a;
Here int * means pointer to an integer; int specifies what kind of variable this pointer points to, and * indicates that this is a pointer. The next thing here is the & symbol before the variable 'a': '&' indicates that we want the address of the variable, not its value, so we take the address of 'a' with that symbol (&) and assign it to 'aPtr'.

Type of pointer variables
I have just given an example of an integer type pointer. But what other types can pointer variables have? We can point to any type of variable; it can be a primitive type (integer, double, character, etc.) as well as a user-defined type (a structure in C). The type of a pointer depends on what it points to.
But remember, earlier I said addresses are just hexadecimal values. So now the question comes to mind: if all addresses are just hexadecimal values and a pointer just holds an address, why does it need to know what type of variable it points to?
Different variables occupy different amounts of space in memory (e.g. int: 4 bytes, float: 4 bytes, char: 1 byte), and a pointer needs to know what type of variable it points to so that we can do something called pointer arithmetic with it. I will explain that in my next post.

Accessing the value at the address the pointer holds
It's also just one line, like this:

                                   int valueOfaPtr  =  *aPtr;

Here we have an int type variable 'valueOfaPtr' which is initialized from the pointer variable 'aPtr' that points to the variable 'a', whose value is 5. With '*aPtr' we get the value of the variable whose address the aPtr variable is holding. So if we now print the variable 'valueOfaPtr' we'll see that it has the value 5.

The whole code:
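Since the original screenshot isn't shown here, a minimal sketch of the whole program might look like this (the exact address printed will differ on every run):

#include <iostream>
using namespace std;

int main() {
    int a = 5;
    int *aPtr = &a;              // aPtr holds the address of a
    int valueOfaPtr = *aPtr;     // dereferencing gives the value stored at that address

    cout << "a = " << a << endl;
    cout << "aPtr = " << aPtr << endl;       // prints the address of a in hexadecimal
    cout << "*aPtr = " << *aPtr << endl;
    cout << "valueOfaPtr = " << valueOfaPtr << endl;
    return 0;
}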
 The output:


First we printed 'a', which has the value 5; after that we printed 'aPtr', which gave us a hexadecimal value that is the address of the variable 'a'; and after that we printed '*aPtr', which gives us the value 5, the value of 'a'. Notice how 'aPtr' gives the address and '*aPtr' gives the value; that's why, when we initialized the variable 'valueOfaPtr', we initialized it with '*aPtr', because it is a normal variable and we wanted to initialize it with the value, not the address.
