Extending Python with C or C++
Python has an elegant interface to C and C++ that lets you write the most computation-intensive parts of your code in a low-level language with excellent performance. The following example is due to Matthew Fowler, who shared his code on GitHub.
This example extends Python with a simple program for computing the standard deviation of many small matrices. While numpy is optimized for large matrices, its per-call overhead makes it comparatively slow when the work consists of millions of small calculations. These can be computed more efficiently with a simple C++ program. In this exercise we will write this code in stddev.cpp below.
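For orientation, the quantity being computed can be written as a short pure-Python reference. This is only a sketch of the arithmetic, not the contents of stddev.cpp: the function name stddev and the choice of the population form (dividing by N, which is also the default of numpy.std) are assumptions made for illustration.

```python
import math


def stddev(values):
    """Population standard deviation of a flat sequence of numbers."""
    n = len(values)
    if n == 0:
        raise ValueError("stddev requires at least one value")
    mean = sum(values) / n
    # Average squared deviation from the mean (divide by n, not n - 1).
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)


# A small 2x4 matrix, flattened row by row before the calculation.
small_matrix = [[2.0, 4.0, 4.0, 4.0], [5.0, 5.0, 7.0, 9.0]]
flat = [x for row in small_matrix for x in row]
print(stddev(flat))  # 2.0
```

A reference like this is handy later for checking that the compiled extension returns the same numbers.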
The following code requires Python.h, which supplies the types and functions needed to convert between Python objects and C values. This procedure is described at length in the documentation. Components such as PyObject, PyArg_ParseTuple, and PyMethodDef are all part of this interface.
In [1]:
Overwriting stddev.cpp
The following Python code uses distutils to package the code from stddev.cpp into a module called std. We install the program below.
In [2]:
Overwriting setup.py
In [3]:
/Users/rpb/worker/dev/factory-bleed/conda/envs/dev/lib/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'install_requires'
warnings.warn(msg)
running install
running build
running build_ext
building 'std' extension
x86_64-apple-darwin13.4.0-clang -fno-strict-aliasing -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O3 -Wall -Wstrict-prototypes -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O3 -pipe -fdebug-prefix-map=${SRC_DIR}=/usr/local/src/conda/${PKG_NAME}-${PKG_VERSION} -fdebug-prefix-map=/Users/rpb/worker/dev/factory-bleed/conda/envs/dev=/usr/local/src/conda-prefix -flto -Wl,-export_dynamic -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O3 -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -D_FORTIFY_SOURCE=2 -mmacosx-version-min=10.9 -I/Users/rpb/worker/dev/factory-bleed/conda/envs/dev/include/python3.7m -c stddev.cpp -o build/temp.macosx-10.9-x86_64-3.7/stddev.o
clang-4.0: warning: -Wl,-export_dynamic: 'linker' input unused [-Wunused-command-line-argument]
x86_64-apple-darwin13.4.0-clang++ -bundle -undefined dynamic_lookup -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,/Users/rpb/worker/dev/factory-bleed/conda/envs/dev/lib -L/Users/rpb/worker/dev/factory-bleed/conda/envs/dev/lib -flto -Wl,-export_dynamic -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,/Users/rpb/worker/dev/factory-bleed/conda/envs/dev/lib -L/Users/rpb/worker/dev/factory-bleed/conda/envs/dev/lib -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -D_FORTIFY_SOURCE=2 -mmacosx-version-min=10.9 -arch x86_64 build/temp.macosx-10.9-x86_64-3.7/stddev.o -o build/lib.macosx-10.9-x86_64-3.7/std.cpython-37m-darwin.so
ld: warning: -pie being ignored. It is only used when linking a main executable
ld: warning: ignoring file /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/lib/libSystem.tbd, file was built for unsupported file format ( 0x2D 0x2D 0x2D 0x20 0x21 0x74 0x61 0x70 0x69 0x2D 0x74 0x62 0x64 0x2D 0x76 0x33 ) which is not the architecture being linked (x86_64): /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/lib/libSystem.tbd
running install_lib
copying build/lib.macosx-10.9-x86_64-3.7/std.cpython-37m-darwin.so -> /Users/rpb/worker/dev/factory-bleed/conda/envs/dev/lib/python3.7/site-packages
running install_egg_info
Removing /Users/rpb/worker/dev/factory-bleed/conda/envs/dev/lib/python3.7/site-packages/std_performance-1.0-py3.7.egg-info
Writing /Users/rpb/worker/dev/factory-bleed/conda/envs/dev/lib/python3.7/site-packages/std_performance-1.0-py3.7.egg-info
Now that we have installed this extension module, we are free to use it in an example to determine the performance benefit.
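A comparison of this kind can be sketched with timeit, as below. This is not the original benchmark. It assumes that the extension exposes a function named standard_dev that accepts a Python list, which is a guess about the module's interface. The sketch skips any implementation that cannot be imported and uses statistics.pstdev from the standard library as a baseline that is always available.

```python
import importlib.util
import random
import statistics
import timeit


def available_implementations():
    """Map a label to a callable for every implementation that can be imported."""
    impls = {"statistics.pstdev": statistics.pstdev}

    if importlib.util.find_spec("numpy") is not None:
        import numpy as np
        impls["numpy.std"] = lambda values: float(np.std(values))

    if importlib.util.find_spec("std") is not None:
        import std  # the extension module built from stddev.cpp
        # ASSUMPTION: the exported function is called standard_dev.
        func = getattr(std, "standard_dev", None)
        if callable(func):
            impls["std (C++)"] = func

    return impls


def benchmark(sizes=(4, 16, 64), calls=2000, seed=0):
    """Time `calls` evaluations of each implementation on lists of each size."""
    rng = random.Random(seed)
    impls = available_implementations()
    results = {}
    for size in sizes:
        data = [rng.random() for _ in range(size)]
        for label, func in impls.items():
            seconds = timeit.timeit(lambda: func(data), number=calls)
            results[(label, size)] = seconds
    return results


if __name__ == "__main__":
    for (label, size), seconds in sorted(benchmark().items()):
        print(f"{label:>20}  n={size:<4d}  {seconds:.4f} s")
```

The timings depend on the machine and on the list sizes chosen, so the numbers are only meaningful relative to each other within one run.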
In [4]:
We find that this method outperforms numpy for small matrices.