• Options that come to mind, in no particular order (but see the ordered list below):
    1. You may be able to just select a better algorithm and/or datastructure, especially if you're using lots of nested loops.
    2. You might be able to just optimize your Python: .../PythonSpeed/PerformanceTips
    3. Cython: A dialect of Python that can freely mix Python and C datatypes. Gives a modest speedup with a few type annotations for classes, and can be quite a bit faster if you give your innermost loop all C types to operate on. Pretty much CPython only at this time. Works with 2.x and 3.x!
    4. SWIG: A sort of interface/glue language for matching up a C library with a potentially large list of higher level languages, including CPython.
    5. ctypes: Allows you to call individual C functions and access individual C datatypes. Not blazing fast in CPython unless you spend a lot of time in the C code. Can be a bit brittle. Also works on Pypy; in fact Pypy uses it a lot today.
    6. C extension module: Pretty standard stuff for CPython - this is how much of the CPython standard library is put together. Some but not all of these work with Pypy if you use cpyext. In short, with this you're writing a module in C, with a bunch of boilerplate, that can be called from CPython.
    7. CFFI: A new foreign function interface coming from the Pypy project. They have it working with CPython and Pypy now. It is like ctypes, but is less brittle.
    8. Pypy: This isn't really a way of calling C or C++, but it's got almost as much speed as C for a lot of pure python code. If you don't have a lot of C extension modules you require, then Pypy might be a good option for you.
    9. subprocess: A very portable way of interacting with another process, not necessarily written in the same language. Can be used with just about any language in the child process. Not blazing fast, unless you spend little time passing data back and forth. Pretty much all data exchanged has to be serialized to ASCII or some other byte format, but it is so portable, simple and loosely coupled that it's worth thinking about. Note that although this is a bit of a mess on Windows (slow process creation, exit statuses unreliable, non-columnar tool output), it works quite well on *ix.
    10. Shedskin: An experimental translator that creates C++ from implicitly static Python code. Sadly, Shedskin will likely never make the jump from 2.x to 3.x according to the project's author.
    11. Boost.Python: I've not tried Boost.Python yet, but it looks kind of like a ctypes or CFFI for C++.
    12. SIP sounds interesting; it's used to automatically generate bindings for Qt/KDE, which means it handles some pretty huge stuff.
    13. Numba: A JIT compiler for CPython.
    14. Pythran: Translates Python to templatized C++. It is not as complete as Shedskin.
    15. Py2C is another Python to C++ converter.
    16. HOPE - a decorator tells HOPE what to convert to C++ for performance (?). Why it's better for astrophysics than anything else is a bit of a mystery.
    17. py14 - transpiles Python 2.x to C++14. It's not intended to be a complete transpiler; it's more about showing off C++14's type inference abilities.
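
    As a rough illustration of the subprocess option above, here is a minimal sketch of handing work to a child process and exchanging data as text. The child here is a Python one-liner standing in for what would normally be a compiled C or C++ program; CHILD_CODE and sum_in_child are hypothetical names made up for this example.

```python
import subprocess
import sys

# Hypothetical child program. In practice this would be a compiled C/C++
# executable; a Python one-liner stands in here so the sketch runs anywhere.
CHILD_CODE = "import sys; print(sum(int(line) for line in sys.stdin))"


def sum_in_child(numbers):
    """Send numbers to a child process as text lines and parse its text reply."""
    # Everything crossing the process boundary is serialized to text.
    payload = "".join(f"{n}\n" for n in numbers)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a nonzero exit status
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print(sum_in_child(range(10)))
```

    The cost of starting the process and serializing the data is paid on every call, so this only helps when the child does a lot of work per byte exchanged.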
  • Don't rush to coding in C or other forms of optimization - C is much more expensive in terms of programmer time and is prone to subtle, hard-to-find bugs. Write your code in pure Python first, to get the program producing correct results at whatever speed. Then, if you find that the program is too slow, profile it to discover which part(s) need to be redone in one of the ways listed. Usually you'll only need 0-2% of your program to be done in something other than pure Python - you may as well reap the programmer-time savings for as much of the code as is practical.
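
    The profiling step can be sketched with the standard library's cProfile and pstats modules. This is only a minimal example under made-up assumptions: slow_part, fast_part, main and profile_report are hypothetical names, and slow_part is deliberately quadratic so that it shows up as the hotspot.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Deliberately quadratic: the hypothetical hotspot.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def fast_part(n):
    return sum(range(n))


def main():
    return slow_part(200) + fast_part(200)


def profile_report(func, limit=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


report = profile_report(main)
print(report)
```

    For a whole script, running `python -m cProfile -s cumulative yourscript.py` gives a similar report without changing the code (yourscript.py is a placeholder).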

  • Options, in order, once you've identified a hotspot through profiling:
    1. Look into an algorithm or datastructure improvement.
    2. Numba is probably the least burdensome way to speed up CPython: it's just a matter of installing numba and using a decorator to say what to JIT.
    3. Pypy is a good next step if you can run your entire program on it, or perhaps use subprocess to call part of it in a separate process.
    4. Perhaps try converting your hot spot to Cython or Shedskin. These require minimal modifications to your code (if any), and produce C or C++ (respectively) that you can compile and import.
    5. If even that isn't enough, perhaps try rewriting your hotspot in C and calling that using subprocess or CFFI. Again, subprocess will require a new process - if that's too onerous (or slow!), don't do it. This requires C programming, but at least you didn't have to rewrite your entire program in a labor-intensive language.
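
    To make the first step in the list above concrete, here is a small sketch of a datastructure improvement: a nested loop replaced by a set lookup. The function names are hypothetical, and the set version assumes the items are hashable.

```python
def common_items_nested(a, b):
    """O(len(a) * len(b)): for each item of a, scan b until a match is found."""
    result = []
    for x in a:
        for y in b:
            if x == y:
                result.append(x)
                break
    return result


def common_items_set(a, b):
    """O(len(a) + len(b)) on average: build a set once, then probe it."""
    b_set = set(b)
    return [x for x in a if x in b_set]


if __name__ == "__main__":
    import timeit

    a = list(range(2000))
    b = list(range(1000, 3000))
    # Same answer either way; only the running time differs.
    assert common_items_nested(a, b) == common_items_set(a, b)
    print("nested:", timeit.timeit(lambda: common_items_nested(a, b), number=3))
    print("set:   ", timeit.timeit(lambda: common_items_set(a, b), number=3))
```

    A change like this stays in pure Python, which is why it is worth trying before any of the other steps.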

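    How the options combine with one another:

    CPython
        with Algorithm or Datastructure Improvement: Combines well
    Numba
        with Algorithm or Datastructure Improvement: Combines well
        with CPython: Combines well
    Pypy
        with Algorithm or Datastructure Improvement: Combines well
        with CPython: Does not combine well within a given process, but you could use subprocess
        with Numba: Probably does not combine well
    Cython
        with Algorithm or Datastructure Improvement: Combines well
        with CPython: Combines well
        with Numba: Combines well, though not on the same callable
        with Pypy: Does not combine well
    Shedskin
        with Algorithm or Datastructure Improvement: Combines well
        with CPython: Combines well
        with Numba: Combines well, though not on the same callable
        with Pypy: Does not combine well
        with Cython: Combines well, though not on the same callable
    Subprocess
        with Algorithm or Datastructure Improvement: Combines well
        with CPython: Combines well
        with Numba: Combines well
        with Pypy: Combines well
        with Cython: Combines well
        with Shedskin: Combines well
    CFFI
        with Algorithm or Datastructure Improvement: Combines well
        with CPython: Combines well
        with Numba: Combines well, though not on the same callable
        with Pypy: Combines well
        with Cython: Combines well, though not on the same callable
        with Shedskin: Combines well, though not on the same callable
        with Subprocess: Combines well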

    Here is a page comparing a C++ microbenchmark to a similar Python microbenchmark. Don't make too much of this - microbenchmarks are not good indicators of overall performance.

