First off, you may want to see my prior comparisons here.

Relative to my previous, related comparisons, this one uses larger data set sizes, includes sorteddict, and is heavy on writes (sequential once, random another) and light on random reads.

Some notes about the datastructures:

AA tree

a simplified form of a red-black tree.
AVL tree commonly used on disk, but here it is in virtual memory.
binary_tree_dict unbalanced binary tree, meaning it's nice if your keys are in random order, and awful if they are sorted.
B tree also commonly used on disk, but also used in virtual memory here.
dict the usual Python hash table. It is not ordered, and is included only for the sake of comparison. Interestingly, in CPython 2.7 it is more subject to MemoryError exceptions than the other datastructures here - despite being run on a 64 bit system. CPython 3.4, Pypy and Pypy3 did not exhibit the problem. Also, random workload did not exhibit the problem, but sequential workload did.
Red-black tree commonly believed to be the performance champion, but that's not what happened here.
Scapegoat tree believed to give a time tradeoff: time spent self-balancing against time spent searching a deep tree, using a user-specified constant (alpha, also written α) to determine how much time goes to each. Here we see that there is indeed a tradeoff, but it's not a very good one. In fact, at about 67108864 elements, some operations started taking much longer than they should.
Sorteddict has a pretty impressive constant.
Splay trees reorganize themselves on writes like normal, but also on reads. They're doing well in the sequential workload, but that's probably because of the infrequent random reads; I believe this because using a splay tree in order reduces it to a linked list, much like an unbalanced binary tree.
Treap sometimes said to be the fastest (even compared to red-black tree), but to have a greater standard deviation in operation times. Here we see it performing poorly compared to sorteddict.

Methodology:

  • Here's the comparison in graph form
  • Here's the comparison again as a collapsible HTML table


    Hits: 4289
    Timestamp: 2024-12-28 00:09:00 PST

    Back to Dan's tech tidbits

    You can e-mail the author with questions or comments: