Algorithms and Data Structures — Part II
Associative Arrays (Dictionaries)
Basics
To insert the key/value pair {‘ADAM’, 65}, we have to create a hash function, h(x), to convert the key into an integer representing an index into the underlying array.
We can represent characters as their numerical equivalent in ASCII
We can now perform any operation we want on these numbers; let’s choose the sum
Having added these values, we have to ensure the result of the hash function is between 0 and (m-1), where m is the size of the underlying array. We can do this with the modulus operator
So now we have 3, which is a valid index into the underlying array. This completes this simple hash function
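The hash function described above can be sketched in Python (a minimal illustration; the table size m = 8 matches the mod-8 example in these notes):

```python
def simple_hash(key, m=8):
    """Sum the ASCII codes of the key's characters, then take the result
    mod m so it is a valid index into an underlying array of size m."""
    return sum(ord(ch) for ch in key) % m

print(simple_hash('ADAM'))  # 65 + 68 + 65 + 77 = 275, and 275 % 8 = 3
```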
Another example:
Here we see the hash function we created (the sum of all ASCII values in the key string, mod 8) produced the same number as a previous key. When a hash function produces the same value for two different keys, this is called a collision. Either we keep the new value and throw away the old (adam, 39), or we keep the old and throw away the new (naac, 21). It is important for hash tables (dictionaries) to deal with these collisions
How to Handle Collisions in Hashtables
There are no perfect hash functions, so we must handle collisions. There are two methods: chaining and open addressing
Chaining: using a linked list as well as an array
Here we see that k2 and k4 collide at slot 2 of the underlying array, so a linked list is created to store the value v4. This linked list is then traversed on lookup, but lookup is still closer to O(1) than O(N), since the length of the linked list is always less than N (the number of key/value pairs).
The worst-case scenario is that every key hashes to the same slot and we end up with a single linked list, reducing our data structure to O(N) lookup, which defeats the purpose of the hashtable to begin with. We like chaining because it is so easy to implement …
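Chaining can be sketched in Python as follows (a minimal illustration, reusing the sum-of-ASCII hash from above; colliding keys simply share a slot's list):

```python
class ChainedHashTable:
    """A minimal hash table that resolves collisions by chaining:
    each slot of the underlying array holds a list of (key, value) pairs."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _hash(self, key):
        return sum(ord(ch) for ch in key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))    # new key: append to this slot's chain

    def lookup(self, key):
        for k, v in self.slots[self._hash(key)]:
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.insert('adam', 39)
table.insert('naac', 21)   # same hash as 'adam', but chaining keeps both
print(table.lookup('adam'), table.lookup('naac'))  # 39 21
```

Notice that, unlike the "keep old or keep new" choice described earlier, chaining lets both colliding pairs survive.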
… but we prefer open addressing
One method of open addressing (look up other methods) is linear probing. Here we try the next index until we get to an empty slot in the underlying array.
Another approach is quadratic probing
Another approach is rehashing
Example of linear probing:
Here we see that k4 and k2 have the same hash value. We must then go to the next available index in the array to find a place for k4.
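Linear probing can be sketched in Python like this (a minimal illustration; it assumes the table never fills up, which dynamic resizing would handle in practice):

```python
class ProbingHashTable:
    """Open addressing with linear probing: on a collision, step to the
    next index (wrapping around) until an empty slot is found."""

    def __init__(self, m=8):
        self.m = m
        self.keys = [None] * m
        self.values = [None] * m

    def _hash(self, key):
        return sum(ord(ch) for ch in key) % self.m

    def insert(self, key, value):
        i = self._hash(key)
        # Probe forward until we find an empty slot or the same key.
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % self.m
        self.keys[i], self.values[i] = key, value

    def lookup(self, key):
        i = self._hash(key)
        while self.keys[i] is not None:
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % self.m      # keep probing past other keys
        return None

table = ProbingHashTable()
table.insert('adam', 39)   # hashes to slot 3
table.insert('naac', 21)   # also hashes to slot 3, so it probes to slot 4
print(table.lookup('naac'))  # 21
```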
The worst-case scenario is when we have many collisions and the linked list (or probe sequence) degenerates into a list-like structure, resulting in O(N) time complexity for search, insertion, and deletion
If we want the search, insertion, and deletion to be guaranteed fast, we should use balanced binary search trees, as the worst case for these operations is O(log N)
Dynamic Resizing
Real World Applications of Hashtables
Graph Theory
Overview
In the directed graph, we see that vertex C is connected to vertex A (meaning we can get to C from A), but vertex A is not connected to C (we cannot get to A from C).
Types of Graphs
Look up how to check if there are cycles in a graph
Graph Representation
The root of the edge is key of the hashtable below. The edge numbers represent weights, the edge going from A to C has a weight of 4, for example.
Here we see the root node is the row, the edge weights are the numbers
Symmetric Graphs (notice the matrix is symmetric around the diagonal). This graph is also undirected, as shown in the matrix.
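Both representations can be sketched in Python. The vertex names and most weights below are illustrative assumptions (the figures are not reproduced here); only the A → C edge with weight 4 comes from the text above:

```python
# Hashtable (adjacency-list) style: outer keys are the source vertices,
# inner keys are destinations, values are edge weights.
graph = {
    'A': {'B': 1, 'C': 4},   # edge A -> C has weight 4, as in the example
    'B': {'C': 2},
    'C': {},                 # no outgoing edges
}

# Adjacency-matrix style for the same graph: row = source, column =
# destination, 0 = no connection. An undirected graph would make this
# matrix symmetric around the diagonal.
vertices = ['A', 'B', 'C']
matrix = [
    [0, 1, 4],   # edges out of A
    [0, 0, 2],   # edges out of B
    [0, 0, 0],   # edges out of C
]
```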
Applications of Graphs
Graph Traversal Algorithms
Breadth First Search (BFS)
Concrete Example:
We then add A to the visited nodes, and pop it from the queue to find its neighbors
We then take the first item we inserted (queue is first in first out), then consider its neighbors and add them to the queue:
And so on, until the queue is empty
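The steps above can be sketched in Python (a minimal illustration; the small graph is an assumption, not the exact one from the figures):

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first traversal: visit the start node, then its neighbors,
    then their neighbors, level by level, using a FIFO queue."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()        # FIFO: take the oldest item first
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(bfs(graph, 'A'))  # ['A', 'B', 'C', 'D'] -- level by level
```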
BFS Applications
Depth First Search
The below shows that when a stack is used as the underlying data structure, recursion can be used instead. Recursion just transfers the memory management to the call stack instead of the explicitly declared stack used in the iterative solution.
Concrete Example:
The order of the nodes placed on the stack doesn’t matter. The last item is considered, in this case B, and popped off the stack (LIFO), and its neighbors are considered, and placed on the stack
The next node to be considered is C (we see how this is now going deeper into the graph, as opposed to ensuring that every node on a given “level” is considered before going on to the next “level”).
C has no children, so the next node to be considered is D, and its children (neighbors) are added to the stack
This continues until all the nodes have been considered and the stack is empty.
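Both versions can be sketched in Python (a minimal illustration; the small graph is an assumption):

```python
def dfs(graph, start):
    """Iterative depth-first traversal with an explicit stack (LIFO):
    always continue from the most recently discovered node."""
    visited = set()
    stack = [start]
    order = []
    while stack:
        node = stack.pop()            # LIFO: take the newest item first
        if node in visited:
            continue                  # may be on the stack more than once
        visited.add(node)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                stack.append(neighbor)
    return order

def dfs_recursive(graph, node, visited=None):
    """Equivalent recursion: the call stack replaces the explicit stack."""
    if visited is None:
        visited = set()
    visited.add(node)
    order = [node]
    for neighbor in graph[node]:
        if neighbor not in visited:
            order += dfs_recursive(graph, neighbor, visited)
    return order

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(dfs(graph, 'A'))  # ['A', 'C', 'D', 'B'] -- a valid depth-first order
```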
Memory Complexity of BFS vs DFS
For BFS we have to store more items in the queue structure, because we are going “level” by “level” and storing all the neighbors. How much memory is this mathematically? If it’s a balanced tree, then there are N/2 leaf nodes to store. So we have to store O(N) items, if we are using a tree with N items (this is the upper bound worst case scenario for a graph).
For DFS we are just storing as many items on the stack as the height of the tree, and for a balanced tree we know this to be O(log N)
Summary
Dijkstra’s Algorithm
Concrete Example:
Using a heap allows us to order the nodes by their tentative distance: retrieving the minimum is O(1), and extracting it is O(log N)
We then go to the node with the shortest path, node B in this case, and look at all its neighbors. If a neighbor has already been reached, as with H, we check whether the path we have currently taken, A -> B -> H, is shorter than the path originally taken, A -> H. In this case it is not, so we do not update the value of H on the heap.
Here we see the next node to be considered is H as it is the shortest path now. And then we calculate a new path to C that is shorter than previously calculated and we update that value on the heap
Next we take the shortest node, in this case E, and consider all of its neighbors. After calculating the weights and comparing them to the values in the heap we move on to node F
After calculating the values for F we move on to C
Finally when the heap is empty the algorithm ends
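The walkthrough above can be sketched in Python using the standard-library binary heap (the sample graph is an illustrative assumption, not the exact one from the figures):

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra's algorithm with a min-heap: repeatedly settle the unvisited
    node with the smallest tentative distance, then relax its outgoing
    edges. Returns the shortest distance from source to every node."""
    distances = {node: float('inf') for node in graph}
    distances[source] = 0
    heap = [(0, source)]
    visited = set()
    while heap:
        dist, node = heapq.heappop(heap)
        if node in visited:
            continue                  # stale entry: a shorter path was found
        visited.add(node)
        for neighbor, weight in graph[node].items():
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist        # shorter path found
                heapq.heappush(heap, (new_dist, neighbor))
    return distances

graph = {
    'A': {'B': 1, 'H': 5},
    'B': {'H': 2, 'E': 4},
    'H': {'C': 1},
    'E': {'F': 1},
    'F': {'C': 3},
    'C': {},
}
# C is reached via A -> B -> H -> C (1 + 2 + 1 = 4), not the direct A -> H.
print(dijkstra(graph, 'A'))
```

Note that instead of updating a value in place on the heap, this sketch pushes a fresh entry and discards stale ones on pop, which is the usual idiom with Python's heapq.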
We now have the ability to create a “shortest path tree” as shown
Adjacency Matrix Representation
In this representation the indexes represent the nodes, and entry [node1, node2] represents a connection, with a positive value indicating the weight (0 meaning no connection). This graph is undirected, as [node1, node2] has the same value as [node2, node1], producing a symmetric matrix.
We implement Dijkstra’s Algorithm by using a shortest path matrix, which we initialize as follows:
After considering the first node, A, we see that the next node to consider is the shortest path, or node D. Here we see for the path to node C, we have a path already from A, which has a weight of 5, but the new path (A -> D -> C) has a total weight of 12, so we stick with the 5. For node F, the path is A -> D -> F, so the total weight is considered.
Next F
Then C
Then B
Finally E
This is much slower than the adjacency list representation, as it takes O(N²) time (with an adjacency list and a binary heap, Dijkstra’s algorithm runs in O(E log N))
Applications of Shortest Path
Critical Path Method (CPM)
Bellman-Ford Algorithm
https://en.wikipedia.org/wiki/Dynamic_programming
The maximum length of a shortest path is N-1 edges (it can touch at most all N nodes).
Negative cycle: here we see a cycle whose edge weights sum to below zero. Each time we iterate around it, the total gets lower and lower (more and more negative), so repeatedly checking whether the new sum is less than the previous value would loop forever.
Pseudocode:
Relaxation:
Implementation Example Graph
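The relaxation-based pseudocode can be sketched in Python as follows (a minimal illustration; the sample graph is an assumption, chosen to include a negative edge weight, which Dijkstra's algorithm cannot handle):

```python
def bellman_ford(graph, source):
    """Bellman-Ford: relax every edge N-1 times, since a shortest path has
    at most N-1 edges. If a further pass still improves a distance, the
    graph contains a negative cycle."""
    distances = {node: float('inf') for node in graph}
    distances[source] = 0
    edges = [(u, v, w) for u in graph for v, w in graph[u].items()]

    for _ in range(len(graph) - 1):
        for u, v, w in edges:
            if distances[u] + w < distances[v]:   # relaxation step
                distances[v] = distances[u] + w

    # One extra pass: any improvement now implies a negative cycle.
    for u, v, w in edges:
        if distances[u] + w < distances[v]:
            raise ValueError('graph contains a negative cycle')
    return distances

graph = {'A': {'B': 4, 'C': 2}, 'B': {'C': -3}, 'C': {}}
print(bellman_ford(graph, 'A'))  # {'A': 0, 'B': 4, 'C': 1}
```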
Disjoint Sets
Making Sets
Find
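The make-set and find operations can be sketched in Python (a minimal parent-pointer forest; path compression in find is a standard optimization, included here as an assumption, along with the usual union operation):

```python
class DisjointSets:
    """Disjoint sets (union-find) as a parent-pointer forest: each set is
    a tree, and the tree's root is the set's representative."""

    def __init__(self):
        self.parent = {}

    def make_set(self, x):
        """Create a new singleton set containing x."""
        self.parent[x] = x            # x is its own representative

    def find(self, x):
        """Return the representative (root) of x's set, compressing the
        path so that later finds are faster."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        """Merge the sets containing x and y."""
        self.parent[self.find(x)] = self.find(y)

ds = DisjointSets()
for node in 'ABC':
    ds.make_set(node)
ds.union('A', 'B')
print(ds.find('A') == ds.find('B'), ds.find('A') == ds.find('C'))  # True False
```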