Isometry and randomization algorithms

Shih-Hsuan(Ben) Lin
7 min read · Mar 16, 2021

Motivate

Randomness in algorithms is a topic I had never stumbled upon before, but I was truly amazed by its beauty once I learned it. I used to disregard randomness in algorithms, since I assumed our algorithms must behave rigorously. It turns out that, with the help of some statistical proofs on probabilities, randomness can in fact become powerful evidence of an algorithm's stability! This is the complete opposite of the "unordered" feeling that randomness radiates. Randomness also proves to be lenient in achieving average cases. For many data structures taught in class, performance depends on the choices we make when starting the algorithm: binary search (manually selecting the middle key), topological sort (selecting any starting vertex), and many more. The problem is that we are terrible at making great choices because we have biases; we tend to select inputs on which the algorithm performs best. The backfire is that we must then do additional asymptotic analysis to account for worst cases. For randomized algorithms, however, the computer chooses where to start, so no particular input is guaranteed to trigger the worst case: bad cases still exist, but each occurs only with small probability. Behaving arbitrarily while still securing stability is what makes randomized algorithms amazing and powerful.

Explain

For each algorithm and data structure discussed, I will give a summary and explain the basic operations: search, construction/insertion, and deletion, along with an asymptotic analysis of those operations. All demos of construction/insertion will use the same dataset:

[9, 13, 30, 39, 41, 48, 51, 53, 55, 60]

with differing randomizing rules (or extra values) for each data structure, explained in its own portion. This dataset is chosen specifically from figure 3 of the Deterministic Skip Lists paper in order to make clearer comparisons between the data structures and the isometry between them, discussed later in Discuss.

Treaps

A treap is a randomized binary search tree. Basically, this data structure is a combination of binary search trees and heaps: each node has two values associated with it, a value and a key. The values follow the construction rules of a binary search tree, while the keys act as priorities that drive the sinking/swimming of nodes (tree rotations, in a treap's case) as in a heap. Keys are randomly assigned to each value. Treap search is exactly the same as search in a BST: only the values are examined.

Construction/Insertion

Given that our insertion dataset is already in sorted order, in addition to randomly assigning keys to each value, I will also randomly choose the order in which values v are inserted, so that the result is not an artificially optimal tree. After a value is randomly selected, a random key k with bounds 0 ≤ k ≤ 100 is assigned to it. I will also assume that keys do not repeat, so there is no need to break ties. The new insertion order, with each value's associated key written as (v, k), is as follows:

[(51, 84), (9, 92), (53, 22), (48, 73), (60, 70),

(41, 65), (13, 7), (55, 63), (30, 26), (39, 17)]

The rules of constructing treaps are simple: first, insert nodes as if constructing a BST by looking at the values, then move nodes to their correct positions based on their keys using tree rotations, as if manipulating a heap (I follow the rules of a min-heap in the demo).

<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vTsx5ujY60dUs4-k2YIeXlS8GTqsRMgI9A3Tf8HhEGzcvCangQ-5P--n_4GeAbIZ8ROde5VhMqTO2ip/embed?start=false&loop=false&delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
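The rules above can be sketched in Python. This is a minimal sketch under the min-heap convention from the demo; the `Node` class and function names are my own, not from any particular library:

```python
import random

class Node:
    def __init__(self, value, key):
        self.value = value            # BST ordering value
        self.key = key                # priority key (min-heap: smaller key stays higher)
        self.left = self.right = None

def rotate_right(node):
    left = node.left
    node.left, left.right = left.right, node
    return left

def rotate_left(node):
    right = node.right
    node.right, right.left = right.left, node
    return right

def insert(root, value, key=None):
    """Insert as in a BST, then rotate the new node up while it
    violates the min-heap order on keys."""
    if key is None:
        key = random.randint(0, 100)  # random priority, as in the demo
    if root is None:
        return Node(value, key)
    if value < root.value:
        root.left = insert(root.left, value, key)
        if root.left.key < root.key:  # child outranks parent: rotate it up
            root = rotate_right(root)
    else:
        root.right = insert(root.right, value, key)
        if root.right.key < root.key:
            root = rotate_left(root)
    return root

# Build the treap from the article's (value, key) pairs.
pairs = [(51, 84), (9, 92), (53, 22), (48, 73), (60, 70),
         (41, 65), (13, 7), (55, 63), (30, 26), (39, 17)]
root = None
for v, k in pairs:
    root = insert(root, v, k)
```

After all insertions, the node with the smallest key, (13, 7), sits at the root, and an in-order traversal returns the values in sorted order.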

Deletion

For deletion, we first sink the node to the bottom-most level so that it becomes a leaf, then simply delete it. The priority ordering must be preserved, so during the sinking process we use reverse tree rotations, rotating up the higher-priority child (the one with the smaller key, under the min-heap rule) whenever the invariant would otherwise be violated. The given demo is a deletion of the root (13, 7):

<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vSaOj3RnD-5qLWF538NEa_Bbk9ht3tBHuBcrg2zvUnMRZMOOfd5-k79_7E0lgW4YfXmyZpcP0eiqXE0/embed?start=false&loop=false&delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

Asymptotic Analysis

A treap builds a tree as if it were a BST and orders nodes by their priority keys. Since the keys are randomly assigned, a treap is essentially a BST with a randomized insertion order. As a result, a treap's expected depth is O(log n) for n nodes, which further implies that the basic operations, including insertion and deletion, have an expected runtime of O(log n).

Skip Lists

Skip lists are sorted linked lists with checkpoints that help improve search time. The base list is duplicated at roughly ½ of its original size, and the duplicated list is strung above the original with pointers down into it. The duplicate can itself be duplicated to ½ of its size, giving ¼ of the original list, and this process can repeat multiple times. Which nodes get duplicated is random: each node has a ½ probability of being promoted into the pointer list above. Promotion continues as long as a true/false randomizer keeps coming up true, and stops the moment it comes up false. Hence a node being promoted to a very high level is very unlikely (probability (½)^n for level n), which is also why each duplicated list is expected to be ½ the size of the one below it. Skip lists' deletion is straightforward: simply search for the node and delete it from the base linked list and from every promoted list it is connected to.
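The promotion rule can be sketched in a few lines of Python (the cap of 16 levels is my own assumption, not something the randomization requires):

```python
import random

def random_level(max_level=16):
    """Keep flipping a fair coin; promote one level per consecutive 'true'.
    P(reaching level k) = (1/2)**k, so tall towers are exponentially rare."""
    level = 0
    while level < max_level and random.random() < 0.5:
        level += 1
    return level
```

The expected number of promotions is (½) + (¼) + (⅛) + … = 1, which matches the claim that each level holds about half as many nodes as the one below it.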

Search

Searching for a value in a skip list works just like linear search in a sorted linked list; however, skip lists have the aid of the duplicated pointer lists, which improve search time. Starting at the top-most level, traverse the list linearly and stop when the desired value is found, when the current value exceeds the desired value, or when the level is exhausted without either happening. If the current value exceeds the desired value, step back to the previous node, go down one level, and continue the same process there. A demo of searching for the value 39 in the skip list taken from the Deterministic Skip Lists paper is shown:

<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vQ2g9gulAbjexf-m65O5GdQLD8EcRohFrlW2G3n-uzuWfzy-dRGNUJ814LF8hnpuHLkKgxe3_DS41-q/embed?start=false&loop=false&delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

Construction/Insertion

Skip lists are constructed with the basic rules of building a sorted linked list, plus an extra step that decides whether the inserted node is promoted and, if so, by how many levels. First use search to find the insertion position, then use a true/false randomizer (or any device that yields ½ probability, say flipping a coin) to determine the number of promotions: a node is promoted n times if the randomizer comes up true n times in a row. Every level of the linked list also has a dummy starting node with the value negative infinity, for convenience when connecting nodes on a new level; whenever a node is promoted to a level whose list is not constructed yet, the dummy node is promoted automatically as well. For the demo, I use the same randomized insertion order as in the treap's construction; however, the promotion levels for each node are identical to figure 3 from the Deterministic Skip Lists paper.

<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vRpJZHTlL79GY9383z68pxlJGDa0zXbprolugJUpGMo3D-M4xlW6PsoVi_HO4Z-JaCfg93MczyMEerf/embed?start=false&loop=false&delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
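The search and insertion procedures above can be sketched as a minimal node-based skip list. The -inf dummy head tower and the 16-level cap are implementation assumptions; the optional `height` argument exists only so a demo can pin the promotion levels instead of flipping coins:

```python
import random

class SkipNode:
    def __init__(self, value, height):
        self.value = value
        self.next = [None] * height   # next[i] = successor at level i

class SkipList:
    MAX = 16

    def __init__(self):
        # Dummy -inf head tower spanning every possible level.
        self.head = SkipNode(float("-inf"), self.MAX)

    def _random_height(self):
        h = 1
        while h < self.MAX and random.random() < 0.5:
            h += 1
        return h

    def search(self, value):
        node = self.head
        for lvl in range(self.MAX - 1, -1, -1):
            # Move right while the next node is still smaller, then drop a level.
            while node.next[lvl] is not None and node.next[lvl].value < value:
                node = node.next[lvl]
        node = node.next[0]
        return node is not None and node.value == value

    def insert(self, value, height=None):
        update = [self.head] * self.MAX   # last node visited at each level
        node = self.head
        for lvl in range(self.MAX - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].value < value:
                node = node.next[lvl]
            update[lvl] = node
        h = height if height is not None else self._random_height()
        new = SkipNode(value, h)
        for lvl in range(h):
            new.next[lvl] = update[lvl].next[lvl]
            update[lvl].next[lvl] = new
```

Inserting the article's dataset and searching for 39 follows exactly the top-down, right-then-down walk from the demo.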

Asymptotic Analysis

As discussed previously, each duplicated linked list has about ½ the size of the one below it with high probability. This means skip lists divide the list much as a BST halves its values level by level. For n nodes, a BST has roughly log(n) levels, and the same holds for skip lists with high probability. This further suggests that the basic operations (search, insertion, and deletion, where the latter two involve search, the most costly step) all have O(log n) expected runtime.

Zip Trees

Zip trees are the binary-tree version of skip lists. The top-most level of a skip list is represented in a zip tree by the path that starts at the root and keeps going to the right child. One difference between the two structures is that a zip tree's top level does not link within itself the way a skip list's does: a zip tree descends a level by connecting a left child to the next smaller value from the level below, rather than to a node on its own level. This makes sense, since a binary-tree node has only two links, right and left, and the left subtree always holds values smaller than the parent. Moreover, since it is difficult to portray the idea of levels within a binary tree, each node carries a key/rank that mimics the skip list's level system. Just like treaps, zip trees' search is identical to a BST's.

Insertion/Unzip

Whenever a value is inserted, first run the same randomizing process as in skip lists to assign the node a rank. Then search from the root and stop temporarily at the first node the new node should replace: the point where the current node's rank is smaller than the inserted node's rank (with rank ties broken so that the node with the greater value goes below). After that, continue searching for the value's insertion position and keep track of the path taken: nodes the search leaves by going right (values smaller than the inserted value) are stitched into the inserted node's left subtree, while nodes it leaves by going left (larger values) go into its right subtree. Once the insertion point is finally discovered, insert the value and connect all related nodes. This process is called unzipping.

<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vTlwbdGFfsGEYp803P8A1FM3HVyPGIA4ArfURnJtiEgZsIr5UbTEHTHOa-L-0n_HvJRytojZBgPxSDP/embed?start=false&loop=false&delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
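Insertion by unzipping can be sketched as follows. This is a hedged sketch, not the article's code: the `ZNode` class is hypothetical, ranks are passed in explicitly (in practice they would come from the coin-flip rule), and ties break toward the smaller value:

```python
class ZNode:
    def __init__(self, value, rank):
        self.value, self.rank = value, rank
        self.left = self.right = None

def unzip(root, value):
    """Split a subtree into the nodes smaller and larger than `value`,
    rewiring pointers along the search path."""
    if root is None:
        return None, None
    if root.value < value:
        small, big = unzip(root.right, value)
        root.right = small
        return root, big
    small, big = unzip(root.left, value)
    root.left = big
    return small, root

def insert(root, value, rank):
    if root is None:
        return ZNode(value, rank)
    # Keep descending while the current node still outranks the new one.
    if rank < root.rank or (rank == root.rank and value > root.value):
        if value < root.value:
            root.left = insert(root.left, value, rank)
        else:
            root.right = insert(root.right, value, rank)
        return root
    # The new node takes this position; unzip the subtree around its value.
    node = ZNode(value, rank)
    node.left, node.right = unzip(root, value)
    return node

# Hypothetical ranks for the article's values (not figure 3's actual levels).
pairs = [(51, 2), (9, 0), (53, 1), (48, 0), (60, 0),
         (41, 1), (13, 0), (55, 0), (30, 1), (39, 0)]
root = None
for v, r in pairs:
    root = insert(root, v, r)
```

Unzipping preserves the BST order: an in-order traversal still lists the values sorted, while the highest-ranked node, 51, stays at the root.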

Deletion/Zip

Deletion of a value involves a process similar to unzipping, but this time the two subtrees are zipped together, meaning the unzipping process is unrolled. First, identify the position of the node being deleted, then look at its left and right children. Zip the two subtrees: at each step, if the left subtree's root has the greater rank, the zipped remainder of the right subtree becomes its right child; otherwise, the zipped remainder of the left subtree becomes the left child of the right subtree's root. Repeat down the spines until one side is exhausted.

<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vTN5DJ2msLP3GUl1ItJ6B_3ULDKSHG1uuPpt5pkzsUwBDpFY15FjYipkg_nAVf_Yj2mFo3jufZBUkZc/embed?start=false&loop=false&delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
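The zipping step can be sketched like this, assuming a hypothetical `ZNode(value, rank)` node class (this is an illustrative sketch, not the article's code):

```python
class ZNode:
    def __init__(self, value, rank):
        self.value, self.rank = value, rank
        self.left = self.right = None

def zip_trees(x, y):
    """Merge two subtrees where every value in x < every value in y,
    letting the higher-ranked root win at each step."""
    if x is None:
        return y
    if y is None:
        return x
    if x.rank >= y.rank:
        x.right = zip_trees(x.right, y)
        return x
    y.left = zip_trees(x, y.left)
    return y

def delete(root, value):
    if root is None:
        return None
    if value < root.value:
        root.left = delete(root.left, value)
    elif value > root.value:
        root.right = delete(root.right, value)
    else:
        # Replace the deleted node by zipping its two children together.
        return zip_trees(root.left, root.right)
    return root

# A small hand-built tree (hypothetical ranks) to show a root deletion.
root = ZNode(30, 3)
root.left = ZNode(13, 1)
root.right = ZNode(51, 2)
root.right.left = ZNode(41, 0)
root.right.right = ZNode(60, 1)
root = delete(root, 30)
```

After deleting the root 30, its children are zipped: 51 (the higher rank) becomes the new root, with 13 and 41 folded into its left spine.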

Asymptotic Analysis

Since zip trees express exactly the same idea as skip lists, all basic operation costs are identical to skip lists'. Therefore, insertion and deletion both have an expected runtime of O(log n).

Address

A 5-way correspondence between skip lists, 2–3–4 trees, zip trees, red-black trees, and triple-pivot quicksort is demonstrated below using the values from figure 3.

https://docs.google.com/presentation/d/1bS5rBczcMlw496NeuJKQ40N02FugZA46MOKhgJ3K0g0/edit?usp=sharing
