natechoe.dev The blog Contact info Other links The github repo

How heaps (priority queues) work

Priority queues use a very simple data structure called a "heap". Heaps are a special type of binary tree with exactly two properties:

  1. Every parent node is smaller than both of its children
  2. Heaps are "almost complete", meaning that their nodes can be represented as an array in reading order without NULL values.

Rule two is a bit confusing, so here's an example: The array [1, 2, 3, 4, 5] can be placed into a tree with all of its nodes in reading order like this:

1
2
4
5
3

If you just read this tree left to right and top to bottom, you get [1, 2, 3, 4, 5]. Not all trees have this property, though. This tree, for example, cannot be represented as an array of values.

1
2
NULL
5
3

This tree would turn into [1, 2, 3, NULL, 5]. With heaps, you're not allowed to have NULL values in the middle of the array like that.

Here are some examples of valid heaps:

1
2
3

The first rule passes. The only parent node (1) is smaller than all of its children (2 and 3). The second rule also passes, because this tree has all of its nodes in reading order. If I read out this tree to you, it would go [1, 2, 3].

Here's another example:

2
5
12,917,322
6
3
4
NULL

This heap is also valid. Every parent node is smaller than its children, and it can be read in reading order as [2, 5, 3, 12917322, 6, 4, NULL]. There are, of course, NULL elements in this array, but they're all at the end so it's fine.

The following heap, on the other hand, is invalid.

2
5
12,917,322
6
3
NULL
4

If I put this tree into reading order, it would go [2, 5, 3, 12917322, 6, NULL, 4]. I hit a NULL before I finish reading all of the real nodes, so it breaks rule two.

This heap is also invalid.

2
1
3

The heap is perfectly balanced, but there is a node that is smaller than its parent (1 is less than 2), so it breaks rule one.

Heaps have a few interesting properties, the biggest one being that the root node of a heap is always the smallest node. If it wasn't, then it wouldn't be a valid heap. This gives an opportunity: If we can figure out how to efficiently add and remove from a heap, we can create an efficient priority queue.

Let's try adding the number 0 to this heap.

3
5
7
NULL
9
NULL
NULL

We don't want to break rule number 2, so we can add our new node to the next spot in reading order.

3
5
7
0
9
NULL
NULL

Now we're breaking rule number 1, because 0 is less than 5. Luckily, we can swap those two nodes without a problem. This operation is guaranteed to be safe. If you make a parent node smaller, you can guarantee that both children will still be bigger than the parent.

3
0
7
5
9
NULL
NULL

Great! We've just made a small section of this heap a little bit more correct. Of course, it's still wrong, 0 is less than 3. We can fix this by swapping the 0 node with its parent again.

0
3
7
5
9
NULL
NULL

Now our heap is entirely correct, with an extra node. This operation is called "bubble-up", because you take a node and you repeatedly swap it upwards until it reaches a safe spot. Let's try it again by adding an 8.

We begin by adding it to the next spot in reading order

0
3
7
5
9
8
NULL

Then we realize that this node's parent is too big, so we swap it upwards

0
3
7
5
8
9
NULL

And we're done (another swap would have broken the heap). An implementation of this algorithm in pseudocode might look like this:

function bubble_up(node):
    // Return if we're safe
    if (node == root)
        return;
    if (node.value > node.parent.value)
        return;

    // Otherwise, swap up
    swap(node, node.parent);
    bubble_up(node.parent);

function add_value(value):
    new_node = add_node_to_tree_in_next_available_spot(new node(value));
    bubble_up(new_node);

We've got adding, how about removing? Let's begin by removing the root node.

NULL
3
7
5
8
9
NULL

Of course, this isn't a valid binary tree anymore, so let's swap the root node with the last node in the tree.

9
3
7
5
8
NULL
NULL

Because we've just taken a node from the bottom and pushed it to the top, this node is definitely going to be too big. We can fix this by swapping it with the smaller of its two children. This operation is also guaranteed to be safe for a similar reason to bubble-up.

3
9
7
5
8
NULL
NULL

It looks like our node is still too big, so we can swap it again.

3
5
7
9
8
NULL
NULL

And we're done. This operation is called "bubble-down". Pseudo-code for this algorithm might look like this:

function bubble_down(node):
    // Return if we're good
    if (node.child1 == NULL)
        return;
    if (node.child2 == NULL && node.child1.value < node.value)
        return;
    if (node.value < node.child1.value && node.value < node.child2.value)
        return;

    smaller = min(node.child1, node.child2);
    swap(node, smaller);

function remove():
    ret = root.value;
    swap(root, last_node());
    bubble_down(root);
    return ret;

There is still one more thing about heaps that I haven't talked about. Because heaps can be represented as an array, we usually just store all of the nodes in an array and don't bother with storing any references. When you run

System.out.println(some_priorityqueue);

What you get is that array, where for any node i, its children are i*2 and 2*i+1.