Relative to my previous, related comparisons, this one uses larger data set sizes, includes sorteddict, and is heavy on writes (sequential once, random another) and light on random reads.
Some notes about the datastructures:
a simplified form of a red-black tree.
|commonly used on disk, but here it is in virtual memory.
|unbalanced binary tree, meaning it's nice if your keys are in random order, and awful if they are sorted.
|also commonly used on disk, but also used in virtual memory here.
|the usual Python hash table. It is not ordered, and is included only for the sake of comparison. Interestingly, in CPython 2.7 it is more subject to MemoryError exceptions than the other datastructures here - despite being run on a 64 bit system. CPython 3.4, Pypy and Pypy3 did not exhibit the problem. Also, random workload did not exhibit the problem, but sequential workload did.
|commonly believed to be the performance champion, but that's not what happened here.
|believed to give a time tradeoff: time spent self-balancing against time spent searching a deep tree, using a user-specified constant (alpha, also written α) to determine how much time goes to each. Here we see that there is indeed a tradeoff, but it's not a very good one. In fact, at about 67108864 elements, some operations started taking much longer than they should.
|has a pretty impressive constant.
|reorganize themselves on writes like normal, but also on reads. They're doing well in the sequential workload, but that's probably because of the infrequent random reads; I believe this because using a splay tree in order reduces it to a linked list, much like an unbalanced binary tree.
|sometimes said to be the fastest (even compared to red-black tree), but to have a greater standard deviation in operation times. Here we see it performing poorly compared to sorteddict.
You can e-mail the author with questions or comments: