i built a skiplist to understand probabilistic data structures

we live in a world so obsessed with structure—so much so that we often forget how much of it depends on chance. we crave predictability—systems we can understand and control. but sometimes randomness isn’t a flaw; it’s a feature.

skiplists are exactly that kind of structure. in today’s post, we’ll explore how they work and build one step by step.

find the full code here.

so…what exactly is a skiplist?#

wikipedia describes it as “a probabilistic data structure that allows \(O(\log n)\) average complexity for searches and insertions.”¹

but let’s simplify that.

imagine you’re in a grocery store, searching for peanut butter. you could slowly walk down every aisle, checking each item, one by one…or you could glance up at the signs above each section and quickly skip straight to where you need to be.

at their core, skiplists are just linked lists—except each node has multiple forward pointers arranged into “levels.” the bottom level links every node in order, like a standard list, while higher levels act as express lanes, letting us skip over large sections of data. it’s a simple trick that gives us performance close to balanced trees or sorted arrays, without the complexity of maintaining balance.

deciding who gets to skip#

but how do we decide which elements get to appear in these higher layers? we simply flip a coin.

each node has a probability \(p\) (often \(\frac{1}{2}\) or \(\frac{1}{4}\)) of being promoted to the next layer. this randomness ensures our structures stay roughly balanced on average—without the overhead of strict balancing algorithms.

skiplists grow logarithmically with the number of elements added.

this helps avoid two extremes:

too few levels, and searches become slow.
too many, and you waste memory on pointers without much speedup.

in fact, a skiplist with \(n\) elements and promotion probability \(p\) will have about \(\log_{1/p}(n)\) levels.

for example, if \(p = \frac{1}{2}\), that’s just \(\log_2(n)\)—the same as a balanced binary search tree.

let’s build it.#

the node#

a skiplist is made up of a single set of nodes—each one pointing forward across multiple levels.

node

struct Node<K, V> {
    key: Option<K>,
    val: Option<V>,
    fwd: Vec<Link<K, V>>, // one forward pointer per level
}

each node holds a key-value pair and a list of forward pointers, one for every level it participates in. the number of levels (i.e. the size of fwd) is determined at insertion time, based on a random promotion rule we’ll look at shortly.

the skiplist#

next, we define the skiplist-a container that holds our nodes and a bit of metadata to keep things running.

the skiplist

struct SkipList<K, V> {
  head: Arc<RwLock<Node<K, V>>>,
  max: usize,
  len: usize,
  p: f64,
}

head is a special node that spans all levels and acts as the entry points for traversals.
max is the maximum number of levels any node can have. it’s not fixed—we can increase it at runtime depending on how many nodes we’ve added, which helps keep the structure efficient as it grows.
len keeps track of how many elements are in the list.
p is the promotion probability.

how many levels? let the coin decide.#

skiplists don’t rebalance themselves like trees. instead, they rely on randomness.

every time we insert a new node, we run a simple function to decide how many levels it should span:

random_level

fn random_level(&self) -> usize {
  let mut rng = rand::thread_rng();
  let mut lvl = 1;
  while rng.gen_bool(self.p) && lvl < self.max {
  lvl += 1;
  }
  lvl
}

we start at level 1 and keep flipping a weighted coin—gen_bool(self.p)—promoting the node each time we get heads. this continues until we either get tails or reach the max level.

most nodes stay at level 1, a few get promoted, and even fewer reach the top. this gives us O(\log n) performance for search and insertions—without ever rebalancing.

it’s not perfect, but it doesn’t need to be. in practice, “good enough” is plenty fast.

growing with the list#

as the skiplist grows, so should its levels.

we don’t want too many levels (that wastes memory), but we don’t want too few either (that slows things down). so we let math decide.

when inserting, we can compute an optimal number of levels using:

\(\text{levels} \approx \log_{1/p}(n)\)

where:

\(n\) is the number of elements
\(p\) is the probability a node gets promoted to the next level

here’s how we do that in code:

calculating optimal levels

fn optimal_levels(&self) -> usize {
    if self.len == 0 {
      return 1; 
    }

    let base = 1.0 / self.p;
    let estimate = (self.len as f64).log(base).ceil() as usize;
    
    estimate + 2
}

we can check this periodically and resize if needed:

resize levels

fn resize_levels(&mut self) {
  let opt = self.optimal_levels();
  
  if opt > self.max {
    self.grow(opt);
  }
}

fn grow(&mut self, new_max: usize) {
  let mut head = self.head.write().unwrap();

  while head.fwd.len() < new_max {
      head.fwd.push(None);
  }

  self.max = new_max;
}

this way, the structure stays efficient as it grows.

inserting a new node#

inserting into a skiplist has two steps:

find where the new key should go.
wire the new node in.

we start from the top layer of the head node and move forward until the next node’s key is greater than or equal to our target. then we drop down a level and repeat. by the time we reach level 0, we know the exact spot to insert.

this “search and drop” behavior is what gives skiplists their efficiency.

once we’ve found the insertion point, we check if the key already exists. if so, we just update the value. if not, we create a new node with a randomly chosen height and wire it in at each of its levels.

insertion

fn put(&mut self, _k: K, _v: V) -> Option {
  let mut update = vec![None; self.max];
  let mut curr = Arc::clone(&self.head);
  // search phase: track nodes whose forward pointers need to be updated
  for level in (0..self.max).rev() {
      loop {
          let next = curr.read().unwrap().fwd[level].clone();
          match next {
              Some(node) => {
                  let node_key = node.read().unwrap().key.as_ref().unwrap();
                  if node_key < &_k {
                      curr = node;
                  } else {
                      break;
                  }
              }
              None => break,
          }
      }
      update[level] = Some(Arc::clone(&curr));
  }

  // duplicate check: if key already exists, update and return
  if let Some(next) = update[0].as_ref().unwrap().read().unwrap().fwd[0].clone() {
      let mut next_ref = next.write().unwrap();
      if let Some(key) = &next_ref.key {
          if key == &_k {
              return next_ref.val.replace(_v);
          }
      }
  }

  // create new node with random level
  let _lvl = self.random_level();
  let entry = Arc::new(RwLock::new(Node::entry(_k, _v, _lvl)));

  // insert new node by wiring it into each level
  for level in 0.._lvl {
      let mut prev = update[level].as_ref().unwrap().write().unwrap();
      entry.write().unwrap().fwd[level] = prev.fwd[level].take();
      prev.fwd[level] = Some(Arc::clone(&entry));
  }

  self.len += 1;

  self.resize();

  None
}

searching#

searching is just like inserting—except we don’t change anything.

we start at the top of the head node and move forward until the next key is greater than or equal to the one we want. if it’s less, we keep going; if it’s equal, we’ve found it; if it’s greater, we drop down a level and repeat.

by level 0, we’ll either find they key-or confirm it’s not there.

searching

  fn get(&self, _k: K) -> Option<V> 
  where 
      K: Ord,
      V: Clone 
  {
      let mut curr = Arc::clone(&self.head);

      for level in (0..self.max).rev() {
          loop {
              let next = {
                  let curr_ref = curr.read().unwrap();
                  curr_ref.fwd[level].clone()
              };

              match next {
                  Some(node) => {
                      match {
                          let node_ref = node.read().unwrap();
                          node_ref.key.as_ref().map(|k| k.cmp(&_k)) 
                      } {
                          Some(Ordering::Less) => {
                              curr = node;
                          }
                          Some(Ordering::Equal) => {
                              return node.read().unwrap().val.clone();
                          }
                          _ => break,
                      }
                  }
                None => break,
              }
          }
      }
      None
  }

a few thoughts on improvement#

right now, every put and get locks nodes with RwLock, which lets multiple readers in-but only one writer at a time. for most cases, that’s fine. but in write-heavy environments, it can lead to contention, or worse: deadlocks, if threads try to lock overlapping nodes in different orders.

closing thoughts#

this was a fun one.

building a skiplist form scratch helped me understand rust better—memory, ownership, concurrency—things it doesn’t let you gloss over.

but more than that, it’s a stepping stone.

thanks for reading.

skip list - wikipedia. https://en.wikipedia.org/wiki/Skip_list ↩︎