• Options that come to mind, in no particular order (but see the ordered list below):
    1. You may be able to just select a better algorithm and/or datastructure, especially if you're using lots of nested loops.
    2. You might be able to just optimize your Python: .../PythonSpeed/PerformanceTips
    3. Cython: A dialect of Python that can freely mix Python and C datatypes. Gives a modest speedup with a few type annotations for classes, and can be quite a bit faster if you give your innermost loop all C types to operate on. Pretty much CPython only at this time. Works with 2.x and 3.x!
    4. SWIG: A sort of interface/glue language for matching up a C library with a potentially large list of higher level languages, including CPython.
    5. ctypes: Allows you to call individual C functions and access individual C datatypes. Not blazing in CPython unless you spend a lot of time in the C code. Can be a bit brittle. Also works on Pypy; in fact Pypy uses it a lot today.
    6. C extension module: Pretty standard stuff for CPython - this is how much of the CPython standard library is put together. Some but not all of these work with Pypy if you use cpyext. In short, with this you're writing a module in C, with a bunch of boilerplate, but it can be called from CPython.
    7. CFFI: A new foreign function interface coming from the Pypy project. They have it working with CPython and Pypy now. It is like ctypes, but is less brittle.
    8. Pypy: This isn't really a way of calling C or C++, but it's got almost as much speed as C for a lot of pure python code. If you don't have a lot of C extension modules you require, then Pypy might be a good option for you.
    9. subprocess: A very portable way of interacting with another process, which need not be written in the same language; just about any language can be used in the child process. Not blazing fast, unless you spend little time passing data back and forth. Pretty much all data exchanged has to be serialized to ASCII or something similar, but it is so portable, simple and loosely coupled that it's worth thinking about. Note that although this is a bit of a mess on Windows (slow process creation, unreliable exit statuses, non-columnar tool output), it works quite well on *ix.
    10. Shedskin - experimental translator that creates C++ from implicitly static Python code. It is actively being ported to Python 3.x (2022-12-30).
    11. Boost.Python: I've not tried Boost.Python yet, but it looks kind of like a ctypes or CFFI for C++.
    12. SIP sounds interesting; it's used to automatically generate bindings for Qt/KDE, which means it handles some pretty huge stuff.
    13. Shiboken is also used for some Qt stuff, and may be more modern (?) than SIP (?).
    14. Numba - a JIT for CPython. Note that it has two modes: "object" and "nopython". Object mode is rather unimpressive, and can actually be slower than pure CPython. However, nopython mode can do pretty well. It does require an absence of most Python modules, though; it appears to be for math primarily or entirely.
    15. Pythran translates Python to templatized C++.
    16. Py2C is another Python to C++ converter.
    17. HOPE - a decorator tells HOPE what to convert to C++ for performance (?). Why it's better for astrophysics than anything else is a bit of a mystery. Supports 2.7 and 3.x.
    18. py14 - transpiles Python 2.x to C++14. It's not intended to be a complete transpiler; it's more about showing off C++14's type inference abilities.
    19. Ufora: Automatically parallel Python scaling to thousands of cores for data science and numerical computing
    20. Grumpy transpiles pure Python to Golang.
    21. pybind11 - C++ <-> Python interop.
    22. PyO3 / rust-cpython - speeding up CPython using Rust.
    23. Extending Python 3.x with Golang (might be slow!).
    24. py2cpp - another Python -> C++ project.
    25. mypyc aims to transpile type-annotated Python to C. As of 2018-10-23, it's in very early development.
    26. cppyy - uses cling. Requires C++ headers at runtime?
    27. Using Lua. Lua is actually a small language itself.
    28. HPy
      • A joint effort among some of the developers behind Pypy, CPython and Cython.
      • This aims to make things fast and flexible.
      • It's possible Cython will become able to generate HPy modules.
      • One of the chief ideas is to make it so you can prevent extension modules from digging around in CPython implementation details.
      • Another is that you should be able to build a binary module that runs on multiple versions of Python, unmodified, if you're willing to accept a modest performance penalty. Also, you can build for each Python version you want to target, if you need the best possible performance.
      • It is not yet clear what this will mean for CFFI.
    29. Taichi is a new one that sounds interesting, and claims to be able to compete with C++ in performance. It's apparently a dialect of Python that you can embed in a CPython module. It's apparently somehow parallel, and can run on a GPU?
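
To give a feel for option 5 in the list above, here is a minimal ctypes sketch. It assumes a POSIX system where the C library is already loaded into the Python process, so that `ctypes.CDLL(None)` can find `strlen`; on Windows you would have to load a specific DLL instead. The wrapper name `c_strlen` is made up for illustration.

```python
import ctypes

# Assumption: POSIX. Passing None gives a handle to the symbols already
# loaded into this process, which normally includes the C library.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical wrapper: length of a NUL-terminated byte string, via C."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"hello"))
```

If you leave out the `argtypes` and `restype` declarations, ctypes assumes an `int` return value and guesses at how to convert the arguments, which is one source of the brittleness mentioned above.
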
  • Don't rush to coding in C or other forms of optimization - C is much more expensive in terms of programmer time and is prone to subtle, hard-to-find bugs. Write your code in pure Python first, to get the program producing correct results at whatever speed. Then, if you find that the program is too slow, profile it to discover which part(s) need to be redone in one of the ways listed. Usually you'll only need 0-2% of your program to be done in something other than pure Python - you may as well reap the programmer-time savings for as much of the code as is practical.
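
The "profile first" advice above can be sketched with the standard library's cProfile and pstats modules. The functions `slow_sum_of_squares`, `main` and `profile_report` are hypothetical stand-ins for your own program; only the cProfile and pstats calls are real library API.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical hot spot: a deliberately naive pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    """Hypothetical program entry point that calls the hot spot."""
    return slow_sum_of_squares(200_000)


def profile_report(func, limit=5):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_report(main)
    print(report)
```

The functions near the top of the report are the candidates for one of the listed techniques; everything else can stay in pure Python.
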

  • Options, in order, once you've identified a hotspot through profiling:
    1. Look into an algorithm or datastructure improvement.
    2. Numba is a reasonable way to speed up CPython. I haven't used it much, but I found that "object" mode doesn't help much, and "nopython" mode can help quite a bit. nopython mode appears to require you to put your math in a callable all by itself, without any uses of Python modules.
    3. Pypy is a good next step if you can run your entire program on it, or perhaps use subprocess to call part of it in a separate process. If you don't have a lot of C extension modules being used in your code, it's great. Sometimes it's even appropriate to rewrite some of your C extension modules into pure python to facilitate running your code on Pypy.
    4. Perhaps try converting your hot spot to Cython. This requires modifications to your code, and produces C that you can compile and import. I've had good luck using m4 to automatically generate pure Python and Cython from the same input file.
    5. If even that isn't enough, perhaps try rewriting your hotspot in C and calling that using subprocess, CFFI or the up-and-coming HPy. Again, subprocess will require a new process - if that's too onerous (or slow!), don't do it. This requires C programming, but at least you didn't have to rewrite your entire program in a labor-intensive language.
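
Step 1 above is often the cheapest fix, so here is a small sketch of what a datastructure improvement can look like. The function names are made up, and the example assumes the items are hashable. Both versions return the same result, but the second replaces a repeated list scan with a set lookup.

```python
import timeit


def common_items_slow(a, b):
    """O(len(a) * len(b)): each 'x in b' scans the list b."""
    return [x for x in a if x in b]


def common_items_fast(a, b):
    """O(len(a) + len(b)) on average: build a set once, then do hashed lookups."""
    b_set = set(b)
    return [x for x in a if x in b_set]


if __name__ == "__main__":
    a = list(range(0, 4000, 2))
    b = list(range(0, 4000, 3))
    # Timings vary by machine, so none are quoted here.
    print("list:", timeit.timeit(lambda: common_items_slow(a, b), number=5))
    print("set: ", timeit.timeit(lambda: common_items_fast(a, b), number=5))
```

A change like this keeps the code in pure Python, so it still combines with every other option in the list.
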


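    How well the options combine (each option is paired with the ones listed before it):

    CPython
      • with Algorithm or Datastructure Improvement: Combines well
    Numba
      • with Algorithm or Datastructure Improvement: Combines well
      • with CPython: Combines well
    Pypy
      • with Algorithm or Datastructure Improvement: Combines well
      • with CPython: Does not combine well within a given process, but you could use subprocess
      • with Numba: Probably does not combine well
    Cython
      • with Algorithm or Datastructure Improvement: Combines well
      • with CPython: Combines well
      • with Numba: Combines well, though not on the same callable
      • with Pypy: Does not combine well
    Subprocess
      • with Algorithm or Datastructure Improvement: Combines well
      • with CPython: Combines well
      • with Numba: Combines well
      • with Pypy: Combines well
      • with Cython: Combines well
    CFFI
      • with Algorithm or Datastructure Improvement: Combines well
      • with CPython: Combines well
      • with Numba: Combines well, though not on the same callable
      • with Pypy: Combines well
      • with Cython: Combines well, though not on the same callable
      • with Subprocess: Combines well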
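
Subprocess combines with everything above because the child is a separate process, possibly running under a different interpreter. Here is a minimal sketch of that pattern, assuming JSON over stdin/stdout as the serialization format. `CHILD_SOURCE` and `sum_of_squares_in_child` are hypothetical names, and in practice the child would more likely be a separate script or a compiled program.

```python
import json
import subprocess
import sys

# Hypothetical child program. Here it is Python source run with -c, but it
# could be any executable that reads JSON on stdin and writes JSON on stdout.
CHILD_SOURCE = """
import json, sys
numbers = json.load(sys.stdin)
json.dump({"total": sum(n * n for n in numbers)}, sys.stdout)
"""


def sum_of_squares_in_child(numbers, interpreter=sys.executable):
    """Send numbers to a child process as JSON and return the total it computes."""
    completed = subprocess.run(
        [interpreter, "-c", CHILD_SOURCE],
        input=json.dumps(numbers),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return json.loads(completed.stdout)["total"]


if __name__ == "__main__":
    print(sum_of_squares_in_child([1, 2, 3]))
```

Pointing `interpreter` at a Pypy executable, if you have one installed, is one way to get the "Pypy via subprocess" combination. The process startup and serialization cost is paid on every call, so this only makes sense when the child does a lot of work per call.
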

    Here is a page comparing a C++ microbenchmark to a similar Python microbenchmark. Don't make too much of this - microbenchmarks are not good indicators of overall performance.




    Timestamp: 2024-03-01 11:46:38 PST
