Algorithms and Data Structures — Part II
Associative Arrays (Dictionaries)
Basics
To insert the key/value pair {‘ADAM’, 65}, we have to create a hash function, h(x), to convert the key into an integer representing an index into the underlying array.
We can represent characters as their numerical equivalent in ASCII
We can now perform any operation we want on these numbers; let’s choose the sum
Having added these values, we have to ensure the result of the hash function is between 0 and (m-1), where m is the size of the underlying array. We can do this with the modulus operator
So now we have 3, which is a valid index into the underlying array. This completes this simple hash function
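The hash function described above can be sketched in Python (a minimal illustration; the table size m = 8 matches the mod-8 example in these notes):

```python
def simple_hash(key, m=8):
    """Sum the ASCII codes of the key's characters, then take the result
    mod m so it is a valid index into an underlying array of size m."""
    return sum(ord(ch) for ch in key) % m

print(simple_hash('ADAM'))  # 65 + 68 + 65 + 77 = 275, and 275 % 8 = 3
```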
Another example:
Here we see the hash function we created (the sum of all ASCII values in the key string, mod 8) produced the same number as a previous key. When a hash function produces the same value for two different keys, this is called a collision. Either we keep the new value and throw away the old (adam, 39), or we keep the old and throw away the new (naac, 21). It is important for hash tables (dictionaries) to deal with these collisions
How to Handle Collisions in Hashtables
There are no perfect hash functions, so we must handle collisions. There are two methods: chaining and open addressing
Chaining: using a linked list as well as an array
Here we see that k2 and k4 collide at slot 2 of the underlying array, so a linked list is created to store the value v4. This linked list is then traversed on lookup, but lookup is still closer to O(1) than O(N), since the length of the linked list is always less than N (the number of key/value pairs).
The worst-case scenario is that every key hashes to the same slot and we end up with a single linked list, reducing our data structure to O(N) lookup, which defeats the purpose of the hashtable to begin with. We like chaining because it is so easy to implement …
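Chaining can be sketched in Python as follows (a minimal illustration, reusing the sum-of-ASCII hash from above; colliding keys simply share a slot's list):

```python
class ChainedHashTable:
    """A minimal hash table that resolves collisions by chaining:
    each slot of the underlying array holds a list of (key, value) pairs."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _hash(self, key):
        return sum(ord(ch) for ch in key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))    # new key: append to this slot's chain

    def lookup(self, key):
        for k, v in self.slots[self._hash(key)]:
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.insert('adam', 39)
table.insert('naac', 21)   # same hash as 'adam', but chaining keeps both
print(table.lookup('adam'), table.lookup('naac'))  # 39 21
```

Notice that, unlike the "keep old or keep new" choice described earlier, chaining lets both colliding pairs survive.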
… but we prefer open addressing
One method of open addressing (look up other methods) is linear probing. Here we try the next index until we get to an empty slot in the underlying array.
Another approach is quadratic probing
Another approach is rehashing
Example of linear probing:
Here we see that k4 and k2 have the same hash value. We must then go to the next available index in the array to find a place for k4.
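Linear probing can be sketched in Python like this (a minimal illustration; it assumes the table never fills up, which dynamic resizing would handle in practice):

```python
class ProbingHashTable:
    """Open addressing with linear probing: on a collision, step to the
    next index (wrapping around) until an empty slot is found."""

    def __init__(self, m=8):
        self.m = m
        self.keys = [None] * m
        self.values = [None] * m

    def _hash(self, key):
        return sum(ord(ch) for ch in key) % self.m

    def insert(self, key, value):
        i = self._hash(key)
        # Probe forward until we find an empty slot or the same key.
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % self.m
        self.keys[i], self.values[i] = key, value

    def lookup(self, key):
        i = self._hash(key)
        while self.keys[i] is not None:
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % self.m      # keep probing past other keys
        return None

table = ProbingHashTable()
table.insert('adam', 39)   # hashes to slot 3
table.insert('naac', 21)   # also hashes to slot 3, so it probes to slot 4
print(table.lookup('naac'))  # 21
```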
The worst-case scenario is when we have many collisions and the linked list (or probe sequence) degenerates into a list-like structure, resulting in O(N) time complexity for search, insertion, and deletion
If we want the search, insertion, and deletion to be guaranteed fast, we should use balanced binary search trees, as the worst case for these operations is O(log N)
Dynamic Resizing
Real World Applications of Hashtables
Graph Theory
Overview
In the directed graph, we see that vertex C is connected to vertex A (meaning we can get to C from A), but vertex A is not connected to C (we cannot get to A from C).
Types of Graphs
Look up how to check if there are cycles in a graph
Graph Representation
The root of the edge is key of the hashtable below. The edge numbers represent weights, the edge going from A to C has a weight of 4, for example.
Here we see the root node is the row, the edge weights are the numbers
Symmetric Graphs (notice the matrix is symmetric around the diagonal). This graph is also undirected, as shown in the matrix.
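Both representations can be sketched in Python. The vertex names and most weights below are illustrative assumptions (the figures are not reproduced here); only the A → C edge with weight 4 comes from the text above:

```python
# Hashtable (adjacency-list) style: outer keys are the source vertices,
# inner keys are destinations, values are edge weights.
graph = {
    'A': {'B': 1, 'C': 4},   # edge A -> C has weight 4, as in the example
    'B': {'C': 2},
    'C': {},                 # no outgoing edges
}

# Adjacency-matrix style for the same graph: row = source, column =
# destination, 0 = no connection. An undirected graph would make this
# matrix symmetric around the diagonal.
vertices = ['A', 'B', 'C']
matrix = [
    [0, 1, 4],   # edges out of A
    [0, 0, 2],   # edges out of B
    [0, 0, 0],   # edges out of C
]
```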
Applications of Graphs
Graph Traversal Algorithms
Breadth First Search (BFS)
Concrete Example:
We then add A to the visited nodes, and pop it from the queue to find its neighbors
We then take the first item we inserted (queue is first in first out), then consider its neighbors and add them to the queue:
And so on, until the queue is empty
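The steps above can be sketched in Python (a minimal illustration; the small graph is an assumption, not the exact one from the figures):

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first traversal: visit the start node, then its neighbors,
    then their neighbors, level by level, using a FIFO queue."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()        # FIFO: take the oldest item first
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(bfs(graph, 'A'))  # ['A', 'B', 'C', 'D'] -- level by level
```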
BFS Applications
Depth First Search
The below shows that when a stack is used as the underlying data structure, recursion can be used instead. Recursion just transfers the memory management to the call stack instead of the explicitly declared stack used in the iterative solution.
Concrete Example:
The order of the nodes placed on the stack doesn’t matter. The last item is considered, in this case B, and popped off the stack (LIFO), and its neighbors are considered, and placed on the stack
The next node to be considered is C (we see how this is now going deeper into the graph, as opposed to ensuring that every node on a given “level” is considered before going on to the next “level”).
C has no children, so the next node to be considered is D, and its children (neighbors) are added to the stack
This continues until all the nodes have been considered and the stack is empty.
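Both versions can be sketched in Python (a minimal illustration; the small graph is an assumption):

```python
def dfs(graph, start):
    """Iterative depth-first traversal with an explicit stack (LIFO):
    always continue from the most recently discovered node."""
    visited = set()
    stack = [start]
    order = []
    while stack:
        node = stack.pop()            # LIFO: take the newest item first
        if node in visited:
            continue                  # may be on the stack more than once
        visited.add(node)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                stack.append(neighbor)
    return order

def dfs_recursive(graph, node, visited=None):
    """Equivalent recursion: the call stack replaces the explicit stack."""
    if visited is None:
        visited = set()
    visited.add(node)
    order = [node]
    for neighbor in graph[node]:
        if neighbor not in visited:
            order += dfs_recursive(graph, neighbor, visited)
    return order

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(dfs(graph, 'A'))  # ['A', 'C', 'D', 'B'] -- a valid depth-first order
```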
Memory Complexity of BFS vs DFS
For BFS we have to store more items in the queue structure, because we are going “level” by “level” and storing all the neighbors. How much memory is this mathematically? If it’s a balanced tree, then there are N/2 leaf nodes to store. So we have to store O(N) items, if we are using a tree with N items (this is the upper bound worst case scenario for a graph).
For DFS we are just storing as many items on the stack as the height of the tree, and for a balanced tree we know this to be O(log N)
Summary
Dijkstra’s Algorithm
Concrete Example:
Using a heap allows us to order the nodes by their tentative distance: retrieving the minimum is O(1), and extracting it is O(log N)
We then go to the node with the shortest path, node B in this case, and look at all its neighbors. If a neighbor has already been reached, as with H, we check whether the path we have currently taken, A -> B -> H, is shorter than the path originally taken, A -> H. In this case it is not, so we do not update the value of H on the heap.
Here we see the next node to be considered is H as it is the shortest path now. And then we calculate a new path to C that is shorter than previously calculated and we update that value on the heap
Next we take the shortest node, in this case E, and consider all of its neighbors. After calculating the weights and comparing them to the values in the heap we move on to node F
After calculating the values for F we move on to C
Finally when the heap is empty the algorithm ends
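The walkthrough above can be sketched in Python using the standard-library binary heap (the sample graph is an illustrative assumption, not the exact one from the figures):

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra's algorithm with a min-heap: repeatedly settle the unvisited
    node with the smallest tentative distance, then relax its outgoing
    edges. Returns the shortest distance from source to every node."""
    distances = {node: float('inf') for node in graph}
    distances[source] = 0
    heap = [(0, source)]
    visited = set()
    while heap:
        dist, node = heapq.heappop(heap)
        if node in visited:
            continue                  # stale entry: a shorter path was found
        visited.add(node)
        for neighbor, weight in graph[node].items():
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist        # shorter path found
                heapq.heappush(heap, (new_dist, neighbor))
    return distances

graph = {
    'A': {'B': 1, 'H': 5},
    'B': {'H': 2, 'E': 4},
    'H': {'C': 1},
    'E': {'F': 1},
    'F': {'C': 3},
    'C': {},
}
# C is reached via A -> B -> H -> C (1 + 2 + 1 = 4), not the direct A -> H.
print(dijkstra(graph, 'A'))
```

Note that instead of updating a value in place on the heap, this sketch pushes a fresh entry and discards stale ones on pop, which is the usual idiom with Python's heapq.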
We now have the ability to create a “shortest path tree” as shown
Adjacency Matrix Representation
In this representation the indexes represent the nodes, and entry [node1, node2] represents a connection, with a positive value indicating the weight (0 meaning no connection). This graph is undirected, as [node1, node2] has the same value as [node2, node1], producing a symmetric matrix.
We implement Dijkstra’s Algorithm by using a shortest path matrix, which we initialize as follows:
After considering the first node, A, we see that the next node to consider is the shortest path, or node D. Here we see for the path to node C, we have a path already from A, which has a weight of 5, but the new path (A -> D -> C) has a total weight of 12, so we stick with the 5. For node F, the path is A -> D -> F, so the total weight is considered.
Next F
Then C
Then B
Finally E
This is much slower than the adjacency list representation, as it takes O(N²) time (with an adjacency list and a binary heap, Dijkstra’s algorithm runs in O(E log N))
Applications of Shortest Path
Critical Path Method (CPM)
Bellman-Ford Algorithm
https://en.wikipedia.org/wiki/Dynamic_programming
The maximum length of a shortest path is N-1 edges (it can touch at most all N nodes).
Negative cycle: here we see a cycle whose edge weights sum to below zero. Each time we iterate around it, the total gets lower and lower (more and more negative), so repeatedly checking whether the new sum is less than the previous value would loop forever.
Pseudocode:
Relaxation:
Implementation Example Graph
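The relaxation-based pseudocode can be sketched in Python as follows (a minimal illustration; the sample graph is an assumption, chosen to include a negative edge weight, which Dijkstra's algorithm cannot handle):

```python
def bellman_ford(graph, source):
    """Bellman-Ford: relax every edge N-1 times, since a shortest path has
    at most N-1 edges. If a further pass still improves a distance, the
    graph contains a negative cycle."""
    distances = {node: float('inf') for node in graph}
    distances[source] = 0
    edges = [(u, v, w) for u in graph for v, w in graph[u].items()]

    for _ in range(len(graph) - 1):
        for u, v, w in edges:
            if distances[u] + w < distances[v]:   # relaxation step
                distances[v] = distances[u] + w

    # One extra pass: any improvement now implies a negative cycle.
    for u, v, w in edges:
        if distances[u] + w < distances[v]:
            raise ValueError('graph contains a negative cycle')
    return distances

graph = {'A': {'B': 4, 'C': 2}, 'B': {'C': -3}, 'C': {}}
print(bellman_ford(graph, 'A'))  # {'A': 0, 'B': 4, 'C': 1}
```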
Disjoint Sets
Making Sets
Find
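The make-set and find operations can be sketched in Python (a minimal parent-pointer forest; path compression in find is a standard optimization, included here as an assumption, along with the usual union operation):

```python
class DisjointSets:
    """Disjoint sets (union-find) as a parent-pointer forest: each set is
    a tree, and the tree's root is the set's representative."""

    def __init__(self):
        self.parent = {}

    def make_set(self, x):
        """Create a new singleton set containing x."""
        self.parent[x] = x            # x is its own representative

    def find(self, x):
        """Return the representative (root) of x's set, compressing the
        path so that later finds are faster."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        """Merge the sets containing x and y."""
        self.parent[self.find(x)] = self.find(y)

ds = DisjointSets()
for node in 'ABC':
    ds.make_set(node)
ds.union('A', 'B')
print(ds.find('A') == ds.find('B'), ds.find('A') == ds.find('C'))  # True False
```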