When I wrote it in Rust, it seemed a little strange that the Rust version was slower than the Python version, even though the Python version was doing about the same CPU work and more I/O work (the two algorithms were similar, but not identical).
I profiled the Rust code using a flame graph, and it turned out that most of the time was spent generating the md5 digests.
A blog post I saw said that Rust has more than one md5 implementation, and that it was one particular md5 implementation that was slow, not Rust itself.
So I set out to compare as many Rust md5 implementations as I could find, hoping to find "a good one". And I tossed in Python and C, to see how they compared. The results are below. The timings appear at the end of each "md5er returned" line. The units are seconds, and lower values are better:
$ make
below cmd output started 2018 Sun Nov 18 06:28:21 PM PST
gcc -o smp-gcc -ansi -pedantic -Wall -O3 smp-c.c -lcrypto
clang -o smp-clang -ansi -pedantic -Wall -O3 smp-c.c -lcrypto
./smp-gcc
gcc md5er returned "2995f9ab6976da7997fb14378d5a280e" in 580.059692s
./smp-clang
clang md5er returned "2995f9ab6976da7997fb14378d5a280e" in 581.094604s
cargo run --release
   Compiling compare-md5s v0.1.0 (file:///home/dstromberg/src/home-svn/md5s/trunk)
    Finished release [optimized] target(s) in 0.69s
     Running `target/release/compare-md5s`
crypto::md5 md5er returned "2995f9ab6976da7997fb14378d5a280e" in 870.871029237s
md5::Context md5er returned "2995f9ab6976da7997fb14378d5a280e" in 1003.274510337s
md-5::Md5 md5er returned "2995f9ab6976da7997fb14378d5a280e" in 792.284610801s
/usr/local/cpython-3.7/bin/python3 ./smp.py
cpython md5er returned "2995f9ab6976da7997fb14378d5a280e" in 577.4749364852905s
/usr/local/pypy3-6.0.0/bin/pypy3 ./smp.py
pypy md5er returned "2995f9ab6976da7997fb14378d5a280e" in 573.8880910873413s

As you can see, the two Python versions were the fastest - even faster than the C versions. And Rust, in all 3 cases, was the slowest. One of the Rust implementations is hand-coded assembly language, not pure Rust, interestingly.
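To put the gaps in proportion, the timings can be normalized against the fastest run. What follows is only a small sketch: the numbers are copied (rounded) from the "md5er returned" lines above, and the names `timings` and `relative_to_fastest` are made up for illustration.

```python
# Timings in seconds, copied (rounded) from the "md5er returned" lines above.
timings = {
    "gcc": 580.059692,
    "clang": 581.094604,
    "crypto::md5": 870.871029,
    "md5::Context": 1003.274510,
    "md-5::Md5": 792.284611,
    "cpython": 577.474936,
    "pypy": 573.888091,
}


def relative_to_fastest(times):
    """Return each timing divided by the smallest timing in `times`."""
    fastest = min(times.values())
    return {name: seconds / fastest for name, seconds in times.items()}


ratios = relative_to_fastest(timings)

# Print the implementations from fastest to slowest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:14s} {ratio:.2f}x")
```

By this measure the C and CPython runs land within about 1.5% of PyPy, while the three Rust implementations take roughly 1.4 to 1.75 times as long.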
Interlanguage performance comparisons are often a mixed bag. For example, I wrote a prime number sieve in each of Python, Java and Rust - the Rust version was faster than the Java version, and the Java version was faster than the Python version. So it's far from the case that Python is always faster than Rust.
But for this md5 comparison, it seems like Rust may be having a bit of a problem that goes beyond just one md5 implementation.
It's possible we should be asking not "Why is Rust slow in this case?", but "Why is Python fast in this case?"
It's worth noting that the Python is probably just wrapping OpenSSL; it's not pure Python. But the C is also just calling OpenSSL.
The code for all three languages can be found here.
Update: Apparently gcc and clang are infrequently faster than CPython on this, but PyPy has been consistently faster than gcc and clang. This particular performance gap is rather small.