{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Fast python loops with cython\n", "\n", "Cython is essentially a Python to C translator. Cython allows you to use syntax similar to Python, while achieving speeds near that of C. \n", "\n", "This post describes how to use Cython to speed up a single Python function involving ‘tight loops’. I’ll leave more complicated applications - with many functions and classes - for a later post.\n", "\n", "# Should I use Cython?\n", "\n", "If you’re using Python and need performance there are a variety of options, see [quantecon](http://quant-econ.net/py/need_for_speed.html) for a detailed comparison. And of course you could always choose a different language like Julia, or be brave and learn C itself.\n", "\n", "While the static compilation approach of Cython may not be cutting edge, Cython is mature, well documented and capable of handling large complicated projects. Cython code lies behind many of the big Python scientific libraries including scikit-learn and pandas." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The example\n", "\n", "Our example function evaluates a Radial Basis Function (RBF) approximation scheme. We assume each data point is a ‘center’ and use Gaussian type RBFs\n", "\n", "$$Y_{i}=\\sum_{j=0}^{N} \\beta_{j}~ e^{- ~\\theta ~||~X_{i}-X_{j} ~||^{2} }$$\n", "\n", "so our function takes an input data array X of shape (N, D), a parameter array $\\beta $ of length N and a ‘bandwidth’ parameter $\\theta $ and return an array of values Y of length N." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%load_ext cython" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python 3.5.2 final 0\n", "Cython 0.23.4\n" ] } ], "source": [ "import sys\n", "import Cython\n", "print(\"Python %d.%d.%d %s %s\" % sys.version_info)\n", "print(\"Cython %s\" % Cython.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Python loops\n", "\n", "Here’s the naive Python implementation" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from math import exp\n", "import numpy as np\n", "\n", "def rbf_network(X, beta, theta):\n", "\n", " N = X.shape[0]\n", " D = X.shape[1]\n", " Y = np.zeros(N)\n", "\n", " for i in range(N):\n", " for j in range(N):\n", " r = 0\n", " for d in range(D):\n", " r += (X[j, d] - X[i, d]) ** 2\n", " r = r**0.5\n", " Y[i] += beta[j] * exp(-(r * theta)**2)\n", "\n", " return Y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s make up some data\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "D = 5\n", "N = 1000\n", "X = np.array([np.random.rand(N) for d in range(D)]).T\n", "beta = np.random.rand(N)\n", "theta = 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Timing this in IPython we get\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 loop, best of 3: 4.56 s per loop\n" ] } ], "source": [ "%timeit rbf_network(X, beta, theta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dam those Python loops are slow!\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# scipy.interpolate.Rbf\n", "\n", "So in this case we’re lucky and there’s an external numpy based implementation" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from scipy.interpolate import Rbf\n", "rbf = Rbf(X[:,0], X[:,1], X[:,2], X[:,3], X[:, 4], np.random.rand(N))\n", "Xtuple = tuple([X[:, i] for i in range(D)])" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 loop, best of 3: 194 ms per loop\n" ] } ], "source": [ "%timeit rbf(Xtuple)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Much better. But what if we want to go faster or we don’t have a library we can use.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cython (inline)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we just write %%cython at the beginning of the cell to load the cython commands." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With Cython there are a few ‘tricks’ involved in achieving good performance. Here’s the first one, if we add the option -a to the first line: %%cython -a " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "we generate an inline output where lines highlighted in yellow are still using Python and are slowing our code down. Our goal is get rid of yellow lines, especially any inside of loops." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " Cython: _cython_magic_40e916e82ae86364c79b3425e383bd7d.pyx\n", " \n", " \n", "\n", "\n", "

Generated by Cython 0.23.4

\n", "

\n", " Yellow lines hint at Python interaction.
\n", " Click on a line that starts with a \"+\" to see the C code that Cython generated for it.\n", "

\n", "
+01: from math import exp
\n", "
  __pyx_t_1 = PyList_New(1); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __Pyx_INCREF(__pyx_n_s_exp);\n",
       "  __Pyx_GIVEREF(__pyx_n_s_exp);\n",
       "  PyList_SET_ITEM(__pyx_t_1, 0, __pyx_n_s_exp);\n",
       "  __pyx_t_2 = __Pyx_Import(__pyx_n_s_math, __pyx_t_1, -1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_t_1 = __Pyx_ImportFrom(__pyx_t_2, __pyx_n_s_exp); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_exp, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "
+02: import numpy as np
\n", "
  __pyx_t_2 = __Pyx_Import(__pyx_n_s_numpy, 0, -1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "
 03: 
\n", "
+04: def rbf_network_0(X,beta,theta):
\n", "
/* Python wrapper */\n",
       "static PyObject *__pyx_pw_46_cython_magic_40e916e82ae86364c79b3425e383bd7d_1rbf_network_0(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/\n",
       "static PyMethodDef __pyx_mdef_46_cython_magic_40e916e82ae86364c79b3425e383bd7d_1rbf_network_0 = {\"rbf_network_0\", (PyCFunction)__pyx_pw_46_cython_magic_40e916e82ae86364c79b3425e383bd7d_1rbf_network_0, METH_VARARGS|METH_KEYWORDS, 0};\n",
       "static PyObject *__pyx_pw_46_cython_magic_40e916e82ae86364c79b3425e383bd7d_1rbf_network_0(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {\n",
       "  PyObject *__pyx_v_X = 0;\n",
       "  PyObject *__pyx_v_beta = 0;\n",
       "  PyObject *__pyx_v_theta = 0;\n",
       "  PyObject *__pyx_r = 0;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"rbf_network_0 (wrapper)\", 0);\n",
       "  {\n",
       "    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_X,&__pyx_n_s_beta,&__pyx_n_s_theta,0};\n",
       "    PyObject* values[3] = {0,0,0};\n",
       "    if (unlikely(__pyx_kwds)) {\n",
       "      Py_ssize_t kw_args;\n",
       "      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);\n",
       "      switch (pos_args) {\n",
       "        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);\n",
       "        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "        case  0: break;\n",
       "        default: goto __pyx_L5_argtuple_error;\n",
       "      }\n",
       "      kw_args = PyDict_Size(__pyx_kwds);\n",
       "      switch (pos_args) {\n",
       "        case  0:\n",
       "        if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_X)) != 0)) kw_args--;\n",
       "        else goto __pyx_L5_argtuple_error;\n",
       "        case  1:\n",
       "        if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_beta)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"rbf_network_0\", 1, 3, 3, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "        }\n",
       "        case  2:\n",
       "        if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_theta)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"rbf_network_0\", 1, 3, 3, 2); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "        }\n",
       "      }\n",
       "      if (unlikely(kw_args > 0)) {\n",
       "        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, \"rbf_network_0\") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "      }\n",
       "    } else if (PyTuple_GET_SIZE(__pyx_args) != 3) {\n",
       "      goto __pyx_L5_argtuple_error;\n",
       "    } else {\n",
       "      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);\n",
       "    }\n",
       "    __pyx_v_X = values[0];\n",
       "    __pyx_v_beta = values[1];\n",
       "    __pyx_v_theta = values[2];\n",
       "  }\n",
       "  goto __pyx_L4_argument_unpacking_done;\n",
       "  __pyx_L5_argtuple_error:;\n",
       "  __Pyx_RaiseArgtupleInvalid(\"rbf_network_0\", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "  __pyx_L3_error:;\n",
       "  __Pyx_AddTraceback(\"_cython_magic_40e916e82ae86364c79b3425e383bd7d.rbf_network_0\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return NULL;\n",
       "  __pyx_L4_argument_unpacking_done:;\n",
       "  __pyx_r = __pyx_pf_46_cython_magic_40e916e82ae86364c79b3425e383bd7d_rbf_network_0(__pyx_self, __pyx_v_X, __pyx_v_beta, __pyx_v_theta);\n",
       "  int __pyx_lineno = 0;\n",
       "  const char *__pyx_filename = NULL;\n",
       "  int __pyx_clineno = 0;\n",
       "\n",
       "  /* function exit code */\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "\n",
       "static PyObject *__pyx_pf_46_cython_magic_40e916e82ae86364c79b3425e383bd7d_rbf_network_0(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_X, PyObject *__pyx_v_beta, PyObject *__pyx_v_theta) {\n",
       "  PyObject *__pyx_v_N = NULL;\n",
       "  PyObject *__pyx_v_D = NULL;\n",
       "  PyObject *__pyx_v_Y = NULL;\n",
       "  PyObject *__pyx_v_i = NULL;\n",
       "  PyObject *__pyx_v_j = NULL;\n",
       "  PyObject *__pyx_v_r = NULL;\n",
       "  PyObject *__pyx_v_d = NULL;\n",
       "  PyObject *__pyx_r = NULL;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"rbf_network_0\", 0);\n",
       "/* … */\n",
       "  /* function exit code */\n",
       "  __pyx_L1_error:;\n",
       "  __Pyx_XDECREF(__pyx_t_1);\n",
       "  __Pyx_XDECREF(__pyx_t_2);\n",
       "  __Pyx_XDECREF(__pyx_t_3);\n",
       "  __Pyx_XDECREF(__pyx_t_4);\n",
       "  __Pyx_XDECREF(__pyx_t_11);\n",
       "  __Pyx_XDECREF(__pyx_t_12);\n",
       "  __Pyx_XDECREF(__pyx_t_13);\n",
       "  __Pyx_XDECREF(__pyx_t_14);\n",
       "  __Pyx_XDECREF(__pyx_t_15);\n",
       "  __Pyx_XDECREF(__pyx_t_16);\n",
       "  __Pyx_AddTraceback(\"_cython_magic_40e916e82ae86364c79b3425e383bd7d.rbf_network_0\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __pyx_r = NULL;\n",
       "  __pyx_L0:;\n",
       "  __Pyx_XDECREF(__pyx_v_N);\n",
       "  __Pyx_XDECREF(__pyx_v_D);\n",
       "  __Pyx_XDECREF(__pyx_v_Y);\n",
       "  __Pyx_XDECREF(__pyx_v_i);\n",
       "  __Pyx_XDECREF(__pyx_v_j);\n",
       "  __Pyx_XDECREF(__pyx_v_r);\n",
       "  __Pyx_XDECREF(__pyx_v_d);\n",
       "  __Pyx_XGIVEREF(__pyx_r);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "/* … */\n",
       "  __pyx_tuple_ = PyTuple_Pack(10, __pyx_n_s_X, __pyx_n_s_beta, __pyx_n_s_theta, __pyx_n_s_N, __pyx_n_s_D, __pyx_n_s_Y, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_r, __pyx_n_s_d); if (unlikely(!__pyx_tuple_)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_tuple_);\n",
       "  __Pyx_GIVEREF(__pyx_tuple_);\n",
       "/* … */\n",
       "  __pyx_t_2 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_40e916e82ae86364c79b3425e383bd7d_1rbf_network_0, NULL, __pyx_n_s_cython_magic_40e916e82ae86364c7); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_rbf_network_0, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "
 05: 
\n", "
+06:     N = X.shape[0]
\n", "
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(__pyx_v_X, __pyx_n_s_shape); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __pyx_t_2 = __Pyx_GetItemInt(__pyx_t_1, 0, long, 1, __Pyx_PyInt_From_long, 0, 0, 1); if (unlikely(__pyx_t_2 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;};\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_v_N = __pyx_t_2;\n",
       "  __pyx_t_2 = 0;\n",
       "
+07:     D = X.shape[1]
\n", "
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_v_X, __pyx_n_s_shape); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 7; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_1 = __Pyx_GetItemInt(__pyx_t_2, 1, long, 1, __Pyx_PyInt_From_long, 0, 0, 1); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 7; __pyx_clineno = __LINE__; goto __pyx_L1_error;};\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __pyx_v_D = __pyx_t_1;\n",
       "  __pyx_t_1 = 0;\n",
       "
+08:     Y = np.zeros(N)
\n", "
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_zeros); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __pyx_t_2 = NULL;\n",
       "  if (CYTHON_COMPILING_IN_CPYTHON && unlikely(PyMethod_Check(__pyx_t_3))) {\n",
       "    __pyx_t_2 = PyMethod_GET_SELF(__pyx_t_3);\n",
       "    if (likely(__pyx_t_2)) {\n",
       "      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);\n",
       "      __Pyx_INCREF(__pyx_t_2);\n",
       "      __Pyx_INCREF(function);\n",
       "      __Pyx_DECREF_SET(__pyx_t_3, function);\n",
       "    }\n",
       "  }\n",
       "  if (!__pyx_t_2) {\n",
       "    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_v_N); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "  } else {\n",
       "    __pyx_t_4 = PyTuple_New(1+1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_4);\n",
       "    __Pyx_GIVEREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_2); __pyx_t_2 = NULL;\n",
       "    __Pyx_INCREF(__pyx_v_N);\n",
       "    __Pyx_GIVEREF(__pyx_v_N);\n",
       "    PyTuple_SET_ITEM(__pyx_t_4, 0+1, __pyx_v_N);\n",
       "    __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_4, NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "  __pyx_v_Y = __pyx_t_1;\n",
       "  __pyx_t_1 = 0;\n",
       "
 09: 
\n", "
+10:     for i in range(N):
\n", "
  __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __Pyx_INCREF(__pyx_v_N);\n",
       "  __Pyx_GIVEREF(__pyx_v_N);\n",
       "  PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_v_N);\n",
       "  __pyx_t_3 = __Pyx_PyObject_Call(__pyx_builtin_range, __pyx_t_1, NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  if (likely(PyList_CheckExact(__pyx_t_3)) || PyTuple_CheckExact(__pyx_t_3)) {\n",
       "    __pyx_t_1 = __pyx_t_3; __Pyx_INCREF(__pyx_t_1); __pyx_t_5 = 0;\n",
       "    __pyx_t_6 = NULL;\n",
       "  } else {\n",
       "    __pyx_t_5 = -1; __pyx_t_1 = PyObject_GetIter(__pyx_t_3); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "    __pyx_t_6 = Py_TYPE(__pyx_t_1)->tp_iternext; if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "  for (;;) {\n",
       "    if (likely(!__pyx_t_6)) {\n",
       "      if (likely(PyList_CheckExact(__pyx_t_1))) {\n",
       "        if (__pyx_t_5 >= PyList_GET_SIZE(__pyx_t_1)) break;\n",
       "        #if CYTHON_COMPILING_IN_CPYTHON\n",
       "        __pyx_t_3 = PyList_GET_ITEM(__pyx_t_1, __pyx_t_5); __Pyx_INCREF(__pyx_t_3); __pyx_t_5++; if (unlikely(0 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        #else\n",
       "        __pyx_t_3 = PySequence_ITEM(__pyx_t_1, __pyx_t_5); __pyx_t_5++; if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_3);\n",
       "        #endif\n",
       "      } else {\n",
       "        if (__pyx_t_5 >= PyTuple_GET_SIZE(__pyx_t_1)) break;\n",
       "        #if CYTHON_COMPILING_IN_CPYTHON\n",
       "        __pyx_t_3 = PyTuple_GET_ITEM(__pyx_t_1, __pyx_t_5); __Pyx_INCREF(__pyx_t_3); __pyx_t_5++; if (unlikely(0 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        #else\n",
       "        __pyx_t_3 = PySequence_ITEM(__pyx_t_1, __pyx_t_5); __pyx_t_5++; if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_3);\n",
       "        #endif\n",
       "      }\n",
       "    } else {\n",
       "      __pyx_t_3 = __pyx_t_6(__pyx_t_1);\n",
       "      if (unlikely(!__pyx_t_3)) {\n",
       "        PyObject* exc_type = PyErr_Occurred();\n",
       "        if (exc_type) {\n",
       "          if (likely(exc_type == PyExc_StopIteration || PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();\n",
       "          else {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        }\n",
       "        break;\n",
       "      }\n",
       "      __Pyx_GOTREF(__pyx_t_3);\n",
       "    }\n",
       "    __Pyx_XDECREF_SET(__pyx_v_i, __pyx_t_3);\n",
       "    __pyx_t_3 = 0;\n",
       "/* … */\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
+11:         for j in range(N):
\n", "
    __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_3);\n",
       "    __Pyx_INCREF(__pyx_v_N);\n",
       "    __Pyx_GIVEREF(__pyx_v_N);\n",
       "    PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_v_N);\n",
       "    __pyx_t_4 = __Pyx_PyObject_Call(__pyx_builtin_range, __pyx_t_3, NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_4);\n",
       "    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "    if (likely(PyList_CheckExact(__pyx_t_4)) || PyTuple_CheckExact(__pyx_t_4)) {\n",
       "      __pyx_t_3 = __pyx_t_4; __Pyx_INCREF(__pyx_t_3); __pyx_t_7 = 0;\n",
       "      __pyx_t_8 = NULL;\n",
       "    } else {\n",
       "      __pyx_t_7 = -1; __pyx_t_3 = PyObject_GetIter(__pyx_t_4); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_3);\n",
       "      __pyx_t_8 = Py_TYPE(__pyx_t_3)->tp_iternext; if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    }\n",
       "    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "    for (;;) {\n",
       "      if (likely(!__pyx_t_8)) {\n",
       "        if (likely(PyList_CheckExact(__pyx_t_3))) {\n",
       "          if (__pyx_t_7 >= PyList_GET_SIZE(__pyx_t_3)) break;\n",
       "          #if CYTHON_COMPILING_IN_CPYTHON\n",
       "          __pyx_t_4 = PyList_GET_ITEM(__pyx_t_3, __pyx_t_7); __Pyx_INCREF(__pyx_t_4); __pyx_t_7++; if (unlikely(0 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "          #else\n",
       "          __pyx_t_4 = PySequence_ITEM(__pyx_t_3, __pyx_t_7); __pyx_t_7++; if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "          __Pyx_GOTREF(__pyx_t_4);\n",
       "          #endif\n",
       "        } else {\n",
       "          if (__pyx_t_7 >= PyTuple_GET_SIZE(__pyx_t_3)) break;\n",
       "          #if CYTHON_COMPILING_IN_CPYTHON\n",
       "          __pyx_t_4 = PyTuple_GET_ITEM(__pyx_t_3, __pyx_t_7); __Pyx_INCREF(__pyx_t_4); __pyx_t_7++; if (unlikely(0 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "          #else\n",
       "          __pyx_t_4 = PySequence_ITEM(__pyx_t_3, __pyx_t_7); __pyx_t_7++; if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "          __Pyx_GOTREF(__pyx_t_4);\n",
       "          #endif\n",
       "        }\n",
       "      } else {\n",
       "        __pyx_t_4 = __pyx_t_8(__pyx_t_3);\n",
       "        if (unlikely(!__pyx_t_4)) {\n",
       "          PyObject* exc_type = PyErr_Occurred();\n",
       "          if (exc_type) {\n",
       "            if (likely(exc_type == PyExc_StopIteration || PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();\n",
       "            else {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "          }\n",
       "          break;\n",
       "        }\n",
       "        __Pyx_GOTREF(__pyx_t_4);\n",
       "      }\n",
       "      __Pyx_XDECREF_SET(__pyx_v_j, __pyx_t_4);\n",
       "      __pyx_t_4 = 0;\n",
       "/* … */\n",
       "    }\n",
       "    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "
+12:             r = 0
\n", "
      __Pyx_INCREF(__pyx_int_0);\n",
       "      __Pyx_XDECREF_SET(__pyx_v_r, __pyx_int_0);\n",
       "
+13:             for d in range(D):
\n", "
      __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_4);\n",
       "      __Pyx_INCREF(__pyx_v_D);\n",
       "      __Pyx_GIVEREF(__pyx_v_D);\n",
       "      PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_v_D);\n",
       "      __pyx_t_2 = __Pyx_PyObject_Call(__pyx_builtin_range, __pyx_t_4, NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_2);\n",
       "      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "      if (likely(PyList_CheckExact(__pyx_t_2)) || PyTuple_CheckExact(__pyx_t_2)) {\n",
       "        __pyx_t_4 = __pyx_t_2; __Pyx_INCREF(__pyx_t_4); __pyx_t_9 = 0;\n",
       "        __pyx_t_10 = NULL;\n",
       "      } else {\n",
       "        __pyx_t_9 = -1; __pyx_t_4 = PyObject_GetIter(__pyx_t_2); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_4);\n",
       "        __pyx_t_10 = Py_TYPE(__pyx_t_4)->tp_iternext; if (unlikely(!__pyx_t_10)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      }\n",
       "      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "      for (;;) {\n",
       "        if (likely(!__pyx_t_10)) {\n",
       "          if (likely(PyList_CheckExact(__pyx_t_4))) {\n",
       "            if (__pyx_t_9 >= PyList_GET_SIZE(__pyx_t_4)) break;\n",
       "            #if CYTHON_COMPILING_IN_CPYTHON\n",
       "            __pyx_t_2 = PyList_GET_ITEM(__pyx_t_4, __pyx_t_9); __Pyx_INCREF(__pyx_t_2); __pyx_t_9++; if (unlikely(0 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "            #else\n",
       "            __pyx_t_2 = PySequence_ITEM(__pyx_t_4, __pyx_t_9); __pyx_t_9++; if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "            __Pyx_GOTREF(__pyx_t_2);\n",
       "            #endif\n",
       "          } else {\n",
       "            if (__pyx_t_9 >= PyTuple_GET_SIZE(__pyx_t_4)) break;\n",
       "            #if CYTHON_COMPILING_IN_CPYTHON\n",
       "            __pyx_t_2 = PyTuple_GET_ITEM(__pyx_t_4, __pyx_t_9); __Pyx_INCREF(__pyx_t_2); __pyx_t_9++; if (unlikely(0 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "            #else\n",
       "            __pyx_t_2 = PySequence_ITEM(__pyx_t_4, __pyx_t_9); __pyx_t_9++; if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "            __Pyx_GOTREF(__pyx_t_2);\n",
       "            #endif\n",
       "          }\n",
       "        } else {\n",
       "          __pyx_t_2 = __pyx_t_10(__pyx_t_4);\n",
       "          if (unlikely(!__pyx_t_2)) {\n",
       "            PyObject* exc_type = PyErr_Occurred();\n",
       "            if (exc_type) {\n",
       "              if (likely(exc_type == PyExc_StopIteration || PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();\n",
       "              else {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "            }\n",
       "            break;\n",
       "          }\n",
       "          __Pyx_GOTREF(__pyx_t_2);\n",
       "        }\n",
       "        __Pyx_XDECREF_SET(__pyx_v_d, __pyx_t_2);\n",
       "        __pyx_t_2 = 0;\n",
       "/* … */\n",
       "      }\n",
       "      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "
+14:                 r += (X[j, d] - X[i, d]) ** 2
\n", "
        __pyx_t_2 = PyTuple_New(2); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_2);\n",
       "        __Pyx_INCREF(__pyx_v_j);\n",
       "        __Pyx_GIVEREF(__pyx_v_j);\n",
       "        PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_v_j);\n",
       "        __Pyx_INCREF(__pyx_v_d);\n",
       "        __Pyx_GIVEREF(__pyx_v_d);\n",
       "        PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_v_d);\n",
       "        __pyx_t_11 = PyObject_GetItem(__pyx_v_X, __pyx_t_2); if (unlikely(__pyx_t_11 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;};\n",
       "        __Pyx_GOTREF(__pyx_t_11);\n",
       "        __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "        __pyx_t_2 = PyTuple_New(2); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_2);\n",
       "        __Pyx_INCREF(__pyx_v_i);\n",
       "        __Pyx_GIVEREF(__pyx_v_i);\n",
       "        PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_v_i);\n",
       "        __Pyx_INCREF(__pyx_v_d);\n",
       "        __Pyx_GIVEREF(__pyx_v_d);\n",
       "        PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_v_d);\n",
       "        __pyx_t_12 = PyObject_GetItem(__pyx_v_X, __pyx_t_2); if (unlikely(__pyx_t_12 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;};\n",
       "        __Pyx_GOTREF(__pyx_t_12);\n",
       "        __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "        __pyx_t_2 = PyNumber_Subtract(__pyx_t_11, __pyx_t_12); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_2);\n",
       "        __Pyx_DECREF(__pyx_t_11); __pyx_t_11 = 0;\n",
       "        __Pyx_DECREF(__pyx_t_12); __pyx_t_12 = 0;\n",
       "        __pyx_t_12 = PyNumber_Power(__pyx_t_2, __pyx_int_2, Py_None); if (unlikely(!__pyx_t_12)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_12);\n",
       "        __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "        __pyx_t_2 = PyNumber_InPlaceAdd(__pyx_v_r, __pyx_t_12); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_2);\n",
       "        __Pyx_DECREF(__pyx_t_12); __pyx_t_12 = 0;\n",
       "        __Pyx_DECREF_SET(__pyx_v_r, __pyx_t_2);\n",
       "        __pyx_t_2 = 0;\n",
       "
+15:             r = r**0.5
\n", "
      __pyx_t_4 = PyNumber_Power(__pyx_v_r, __pyx_float_0_5, Py_None); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_4);\n",
       "      __Pyx_DECREF_SET(__pyx_v_r, __pyx_t_4);\n",
       "      __pyx_t_4 = 0;\n",
       "
+16:             Y[i] += beta[j] * exp(-(r * theta)**2)
\n", "
      __Pyx_INCREF(__pyx_v_i);\n",
       "      __pyx_t_4 = __pyx_v_i;\n",
       "      __pyx_t_2 = PyObject_GetItem(__pyx_v_Y, __pyx_t_4); if (unlikely(__pyx_t_2 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;};\n",
       "      __Pyx_GOTREF(__pyx_t_2);\n",
       "      __pyx_t_12 = PyObject_GetItem(__pyx_v_beta, __pyx_v_j); if (unlikely(__pyx_t_12 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;};\n",
       "      __Pyx_GOTREF(__pyx_t_12);\n",
       "      __pyx_t_13 = __Pyx_GetModuleGlobalName(__pyx_n_s_exp); if (unlikely(!__pyx_t_13)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_13);\n",
       "      __pyx_t_14 = PyNumber_Multiply(__pyx_v_r, __pyx_v_theta); if (unlikely(!__pyx_t_14)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_14);\n",
       "      __pyx_t_15 = PyNumber_Power(__pyx_t_14, __pyx_int_2, Py_None); if (unlikely(!__pyx_t_15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_15);\n",
       "      __Pyx_DECREF(__pyx_t_14); __pyx_t_14 = 0;\n",
       "      __pyx_t_14 = PyNumber_Negative(__pyx_t_15); if (unlikely(!__pyx_t_14)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_14);\n",
       "      __Pyx_DECREF(__pyx_t_15); __pyx_t_15 = 0;\n",
       "      __pyx_t_15 = NULL;\n",
       "      if (CYTHON_COMPILING_IN_CPYTHON && unlikely(PyMethod_Check(__pyx_t_13))) {\n",
       "        __pyx_t_15 = PyMethod_GET_SELF(__pyx_t_13);\n",
       "        if (likely(__pyx_t_15)) {\n",
       "          PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_13);\n",
       "          __Pyx_INCREF(__pyx_t_15);\n",
       "          __Pyx_INCREF(function);\n",
       "          __Pyx_DECREF_SET(__pyx_t_13, function);\n",
       "        }\n",
       "      }\n",
       "      if (!__pyx_t_15) {\n",
       "        __pyx_t_11 = __Pyx_PyObject_CallOneArg(__pyx_t_13, __pyx_t_14); if (unlikely(!__pyx_t_11)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_DECREF(__pyx_t_14); __pyx_t_14 = 0;\n",
       "        __Pyx_GOTREF(__pyx_t_11);\n",
       "      } else {\n",
       "        __pyx_t_16 = PyTuple_New(1+1); if (unlikely(!__pyx_t_16)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_16);\n",
       "        __Pyx_GIVEREF(__pyx_t_15); PyTuple_SET_ITEM(__pyx_t_16, 0, __pyx_t_15); __pyx_t_15 = NULL;\n",
       "        __Pyx_GIVEREF(__pyx_t_14);\n",
       "        PyTuple_SET_ITEM(__pyx_t_16, 0+1, __pyx_t_14);\n",
       "        __pyx_t_14 = 0;\n",
       "        __pyx_t_11 = __Pyx_PyObject_Call(__pyx_t_13, __pyx_t_16, NULL); if (unlikely(!__pyx_t_11)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_11);\n",
       "        __Pyx_DECREF(__pyx_t_16); __pyx_t_16 = 0;\n",
       "      }\n",
       "      __Pyx_DECREF(__pyx_t_13); __pyx_t_13 = 0;\n",
       "      __pyx_t_13 = PyNumber_Multiply(__pyx_t_12, __pyx_t_11); if (unlikely(!__pyx_t_13)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_13);\n",
       "      __Pyx_DECREF(__pyx_t_12); __pyx_t_12 = 0;\n",
       "      __Pyx_DECREF(__pyx_t_11); __pyx_t_11 = 0;\n",
       "      __pyx_t_11 = PyNumber_InPlaceAdd(__pyx_t_2, __pyx_t_13); if (unlikely(!__pyx_t_11)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_11);\n",
       "      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "      __Pyx_DECREF(__pyx_t_13); __pyx_t_13 = 0;\n",
       "      if (unlikely(PyObject_SetItem(__pyx_v_Y, __pyx_t_4, __pyx_t_11) < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_DECREF(__pyx_t_11); __pyx_t_11 = 0;\n",
       "      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "
 17: 
\n", "
+18:     return Y
\n", "
  __Pyx_XDECREF(__pyx_r);\n",
       "  __Pyx_INCREF(__pyx_v_Y);\n",
       "  __pyx_r = __pyx_v_Y;\n",
       "  goto __pyx_L0;\n",
       "
" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%cython -a \n", "from math import exp\n", "import numpy as np\n", "\n", "def rbf_network_0(X,beta,theta):\n", "\n", " N = X.shape[0]\n", " D = X.shape[1]\n", " Y = np.zeros(N)\n", "\n", " for i in range(N):\n", " for j in range(N):\n", " r = 0\n", " for d in range(D):\n", " r += (X[j, d] - X[i, d]) ** 2\n", " r = r**0.5\n", " Y[i] += beta[j] * exp(-(r * theta)**2)\n", "\n", " return Y" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 loop, best of 3: 3.85 s per loop\n" ] } ], "source": [ "%timeit rbf_network_0(X, beta, theta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We only get the speedups when we start optimizing the code by typing the variables, avoiding the use of the math lib, and omitting some bounds checking. This allows the function to be run in pure C and not touch the python interpreter." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " Cython: _cython_magic_a7c52605a97d01d30b3f60080cbb0e2e.pyx\n", " \n", " \n", "\n", "\n", "

Generated by Cython 0.23.4

\n", "

\n", " Yellow lines hint at Python interaction.
\n", " Click on a line that starts with a \"+\" to see the C code that Cython generated for it.\n", "

\n", "
+01: from math import exp
\n", "
  __pyx_t_1 = PyList_New(1); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __Pyx_INCREF(__pyx_n_s_exp);\n",
       "  __Pyx_GIVEREF(__pyx_n_s_exp);\n",
       "  PyList_SET_ITEM(__pyx_t_1, 0, __pyx_n_s_exp);\n",
       "  __pyx_t_2 = __Pyx_Import(__pyx_n_s_math, __pyx_t_1, -1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_t_1 = __Pyx_ImportFrom(__pyx_t_2, __pyx_n_s_exp); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_exp, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "/* … */\n",
       "  __pyx_t_2 = PyDict_New(); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "
+02: import numpy as np
\n", "
  __pyx_t_2 = __Pyx_Import(__pyx_n_s_numpy, 0, -1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "
 03: 
\n", "
+04: def rbf_network_1(double[:, :] X,  double[:] beta, double theta):
\n", "
/* Python wrapper */\n",
       "static PyObject *__pyx_pw_46_cython_magic_a7c52605a97d01d30b3f60080cbb0e2e_1rbf_network_1(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/\n",
       "static PyMethodDef __pyx_mdef_46_cython_magic_a7c52605a97d01d30b3f60080cbb0e2e_1rbf_network_1 = {\"rbf_network_1\", (PyCFunction)__pyx_pw_46_cython_magic_a7c52605a97d01d30b3f60080cbb0e2e_1rbf_network_1, METH_VARARGS|METH_KEYWORDS, 0};\n",
       "static PyObject *__pyx_pw_46_cython_magic_a7c52605a97d01d30b3f60080cbb0e2e_1rbf_network_1(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {\n",
       "  __Pyx_memviewslice __pyx_v_X = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  __Pyx_memviewslice __pyx_v_beta = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  double __pyx_v_theta;\n",
       "  PyObject *__pyx_r = 0;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"rbf_network_1 (wrapper)\", 0);\n",
       "  {\n",
       "    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_X,&__pyx_n_s_beta,&__pyx_n_s_theta,0};\n",
       "    PyObject* values[3] = {0,0,0};\n",
       "    if (unlikely(__pyx_kwds)) {\n",
       "      Py_ssize_t kw_args;\n",
       "      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);\n",
       "      switch (pos_args) {\n",
       "        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);\n",
       "        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "        case  0: break;\n",
       "        default: goto __pyx_L5_argtuple_error;\n",
       "      }\n",
       "      kw_args = PyDict_Size(__pyx_kwds);\n",
       "      switch (pos_args) {\n",
       "        case  0:\n",
       "        if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_X)) != 0)) kw_args--;\n",
       "        else goto __pyx_L5_argtuple_error;\n",
       "        case  1:\n",
       "        if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_beta)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"rbf_network_1\", 1, 3, 3, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "        }\n",
       "        case  2:\n",
       "        if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_theta)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"rbf_network_1\", 1, 3, 3, 2); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "        }\n",
       "      }\n",
       "      if (unlikely(kw_args > 0)) {\n",
       "        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, \"rbf_network_1\") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "      }\n",
       "    } else if (PyTuple_GET_SIZE(__pyx_args) != 3) {\n",
       "      goto __pyx_L5_argtuple_error;\n",
       "    } else {\n",
       "      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);\n",
       "    }\n",
       "    __pyx_v_X = __Pyx_PyObject_to_MemoryviewSlice_dsds_double(values[0]); if (unlikely(!__pyx_v_X.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "    __pyx_v_beta = __Pyx_PyObject_to_MemoryviewSlice_ds_double(values[1]); if (unlikely(!__pyx_v_beta.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "    __pyx_v_theta = __pyx_PyFloat_AsDouble(values[2]); if (unlikely((__pyx_v_theta == (double)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "  }\n",
       "  goto __pyx_L4_argument_unpacking_done;\n",
       "  __pyx_L5_argtuple_error:;\n",
       "  __Pyx_RaiseArgtupleInvalid(\"rbf_network_1\", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "  __pyx_L3_error:;\n",
       "  __Pyx_AddTraceback(\"_cython_magic_a7c52605a97d01d30b3f60080cbb0e2e.rbf_network_1\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return NULL;\n",
       "  __pyx_L4_argument_unpacking_done:;\n",
       "  __pyx_r = __pyx_pf_46_cython_magic_a7c52605a97d01d30b3f60080cbb0e2e_rbf_network_1(__pyx_self, __pyx_v_X, __pyx_v_beta, __pyx_v_theta);\n",
       "  int __pyx_lineno = 0;\n",
       "  const char *__pyx_filename = NULL;\n",
       "  int __pyx_clineno = 0;\n",
       "\n",
       "  /* function exit code */\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "\n",
       "static PyObject *__pyx_pf_46_cython_magic_a7c52605a97d01d30b3f60080cbb0e2e_rbf_network_1(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_X, __Pyx_memviewslice __pyx_v_beta, double __pyx_v_theta) {\n",
       "  int __pyx_v_N;\n",
       "  int __pyx_v_D;\n",
       "  __Pyx_memviewslice __pyx_v_Y = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  int __pyx_v_i;\n",
       "  int __pyx_v_j;\n",
       "  int __pyx_v_d;\n",
       "  double __pyx_v_r;\n",
       "  PyObject *__pyx_r = NULL;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"rbf_network_1\", 0);\n",
       "/* … */\n",
       "  /* function exit code */\n",
       "  __pyx_L1_error:;\n",
       "  __Pyx_XDECREF(__pyx_t_1);\n",
       "  __Pyx_XDECREF(__pyx_t_2);\n",
       "  __Pyx_XDECREF(__pyx_t_3);\n",
       "  __Pyx_XDECREF(__pyx_t_4);\n",
       "  __Pyx_XDECREF(__pyx_t_5);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_t_6, 1);\n",
       "  __Pyx_XDECREF(__pyx_t_19);\n",
       "  __Pyx_AddTraceback(\"_cython_magic_a7c52605a97d01d30b3f60080cbb0e2e.rbf_network_1\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __pyx_r = NULL;\n",
       "  __pyx_L0:;\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_Y, 1);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_X, 1);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_beta, 1);\n",
       "  __Pyx_XGIVEREF(__pyx_r);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "/* … */\n",
       "  __pyx_tuple__14 = PyTuple_Pack(10, __pyx_n_s_X, __pyx_n_s_beta, __pyx_n_s_theta, __pyx_n_s_N, __pyx_n_s_D, __pyx_n_s_Y, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_d, __pyx_n_s_r); if (unlikely(!__pyx_tuple__14)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_tuple__14);\n",
       "  __Pyx_GIVEREF(__pyx_tuple__14);\n",
       "/* … */\n",
       "  __pyx_t_2 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_a7c52605a97d01d30b3f60080cbb0e2e_1rbf_network_1, NULL, __pyx_n_s_cython_magic_a7c52605a97d01d30b); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_rbf_network_1, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __pyx_codeobj__15 = (PyObject*)__Pyx_PyCode_New(3, 0, 10, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_home_dcorre_cache_ipython_cytho, __pyx_n_s_rbf_network_1, 4, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "
 05: 
\n", "
+06:     cdef int N = X.shape[0]
\n", "
  __pyx_v_N = (__pyx_v_X.shape[0]);\n",
       "
+07:     cdef int D = X.shape[1]
\n", "
  __pyx_v_D = (__pyx_v_X.shape[1]);\n",
       "
+08:     cdef double[:] Y = np.zeros(N)
\n", "
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_zeros); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_N); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_4 = NULL;\n",
       "  if (CYTHON_COMPILING_IN_CPYTHON && unlikely(PyMethod_Check(__pyx_t_3))) {\n",
       "    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);\n",
       "    if (likely(__pyx_t_4)) {\n",
       "      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);\n",
       "      __Pyx_INCREF(__pyx_t_4);\n",
       "      __Pyx_INCREF(function);\n",
       "      __Pyx_DECREF_SET(__pyx_t_3, function);\n",
       "    }\n",
       "  }\n",
       "  if (!__pyx_t_4) {\n",
       "    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "  } else {\n",
       "    __pyx_t_5 = PyTuple_New(1+1); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_5);\n",
       "    __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_4); __pyx_t_4 = NULL;\n",
       "    __Pyx_GIVEREF(__pyx_t_2);\n",
       "    PyTuple_SET_ITEM(__pyx_t_5, 0+1, __pyx_t_2);\n",
       "    __pyx_t_2 = 0;\n",
       "    __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_5, NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "  __pyx_t_6 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_t_1);\n",
       "  if (unlikely(!__pyx_t_6.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_v_Y = __pyx_t_6;\n",
       "  __pyx_t_6.memview = NULL;\n",
       "  __pyx_t_6.data = NULL;\n",
       "
 09:     cdef int i, j, d
\n", "
+10:     cdef double r = 0
\n", "
  __pyx_v_r = 0.0;\n",
       "
 11: 
\n", "
+12:     for i in range(N):
\n", "
  __pyx_t_7 = __pyx_v_N;\n",
       "  for (__pyx_t_8 = 0; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) {\n",
       "    __pyx_v_i = __pyx_t_8;\n",
       "
+13:         for j in range(N):
\n", "
    __pyx_t_9 = __pyx_v_N;\n",
       "    for (__pyx_t_10 = 0; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {\n",
       "      __pyx_v_j = __pyx_t_10;\n",
       "
+14:             r = 0
\n", "
      __pyx_v_r = 0.0;\n",
       "
+15:             for d in range(D):
\n", "
      __pyx_t_11 = __pyx_v_D;\n",
       "      for (__pyx_t_12 = 0; __pyx_t_12 < __pyx_t_11; __pyx_t_12+=1) {\n",
       "        __pyx_v_d = __pyx_t_12;\n",
       "
+16:                 r += (X[j, d] - X[i, d]) ** 2
\n", "
        __pyx_t_13 = __pyx_v_j;\n",
       "        __pyx_t_14 = __pyx_v_d;\n",
       "        __pyx_t_15 = -1;\n",
       "        if (__pyx_t_13 < 0) {\n",
       "          __pyx_t_13 += __pyx_v_X.shape[0];\n",
       "          if (unlikely(__pyx_t_13 < 0)) __pyx_t_15 = 0;\n",
       "        } else if (unlikely(__pyx_t_13 >= __pyx_v_X.shape[0])) __pyx_t_15 = 0;\n",
       "        if (__pyx_t_14 < 0) {\n",
       "          __pyx_t_14 += __pyx_v_X.shape[1];\n",
       "          if (unlikely(__pyx_t_14 < 0)) __pyx_t_15 = 1;\n",
       "        } else if (unlikely(__pyx_t_14 >= __pyx_v_X.shape[1])) __pyx_t_15 = 1;\n",
       "        if (unlikely(__pyx_t_15 != -1)) {\n",
       "          __Pyx_RaiseBufferIndexError(__pyx_t_15);\n",
       "          {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        }\n",
       "        __pyx_t_16 = __pyx_v_i;\n",
       "        __pyx_t_17 = __pyx_v_d;\n",
       "        __pyx_t_15 = -1;\n",
       "        if (__pyx_t_16 < 0) {\n",
       "          __pyx_t_16 += __pyx_v_X.shape[0];\n",
       "          if (unlikely(__pyx_t_16 < 0)) __pyx_t_15 = 0;\n",
       "        } else if (unlikely(__pyx_t_16 >= __pyx_v_X.shape[0])) __pyx_t_15 = 0;\n",
       "        if (__pyx_t_17 < 0) {\n",
       "          __pyx_t_17 += __pyx_v_X.shape[1];\n",
       "          if (unlikely(__pyx_t_17 < 0)) __pyx_t_15 = 1;\n",
       "        } else if (unlikely(__pyx_t_17 >= __pyx_v_X.shape[1])) __pyx_t_15 = 1;\n",
       "        if (unlikely(__pyx_t_15 != -1)) {\n",
       "          __Pyx_RaiseBufferIndexError(__pyx_t_15);\n",
       "          {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        }\n",
       "        __pyx_v_r = (__pyx_v_r + pow(((*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_X.data + __pyx_t_13 * __pyx_v_X.strides[0]) ) + __pyx_t_14 * __pyx_v_X.strides[1]) ))) - (*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_X.data + __pyx_t_16 * __pyx_v_X.strides[0]) ) + __pyx_t_17 * __pyx_v_X.strides[1]) )))), 2.0));\n",
       "      }\n",
       "
+17:             r = r**0.5
\n", "
      __pyx_v_r = pow(__pyx_v_r, 0.5);\n",
       "
+18:             Y[i] += beta[j] * exp(-(r * theta)**2)
\n", "
      __pyx_t_18 = __pyx_v_j;\n",
       "      __pyx_t_11 = -1;\n",
       "      if (__pyx_t_18 < 0) {\n",
       "        __pyx_t_18 += __pyx_v_beta.shape[0];\n",
       "        if (unlikely(__pyx_t_18 < 0)) __pyx_t_11 = 0;\n",
       "      } else if (unlikely(__pyx_t_18 >= __pyx_v_beta.shape[0])) __pyx_t_11 = 0;\n",
       "      if (unlikely(__pyx_t_11 != -1)) {\n",
       "        __Pyx_RaiseBufferIndexError(__pyx_t_11);\n",
       "        {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      }\n",
       "      __pyx_t_1 = PyFloat_FromDouble((*((double *) ( /* dim=0 */ (__pyx_v_beta.data + __pyx_t_18 * __pyx_v_beta.strides[0]) )))); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __pyx_t_5 = __Pyx_GetModuleGlobalName(__pyx_n_s_exp); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_5);\n",
       "      __pyx_t_2 = PyFloat_FromDouble((-pow((__pyx_v_r * __pyx_v_theta), 2.0))); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_2);\n",
       "      __pyx_t_4 = NULL;\n",
       "      if (CYTHON_COMPILING_IN_CPYTHON && unlikely(PyMethod_Check(__pyx_t_5))) {\n",
       "        __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_5);\n",
       "        if (likely(__pyx_t_4)) {\n",
       "          PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_5);\n",
       "          __Pyx_INCREF(__pyx_t_4);\n",
       "          __Pyx_INCREF(function);\n",
       "          __Pyx_DECREF_SET(__pyx_t_5, function);\n",
       "        }\n",
       "      }\n",
       "      if (!__pyx_t_4) {\n",
       "        __pyx_t_3 = __Pyx_PyObject_CallOneArg(__pyx_t_5, __pyx_t_2); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "        __Pyx_GOTREF(__pyx_t_3);\n",
       "      } else {\n",
       "        __pyx_t_19 = PyTuple_New(1+1); if (unlikely(!__pyx_t_19)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_19);\n",
       "        __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_19, 0, __pyx_t_4); __pyx_t_4 = NULL;\n",
       "        __Pyx_GIVEREF(__pyx_t_2);\n",
       "        PyTuple_SET_ITEM(__pyx_t_19, 0+1, __pyx_t_2);\n",
       "        __pyx_t_2 = 0;\n",
       "        __pyx_t_3 = __Pyx_PyObject_Call(__pyx_t_5, __pyx_t_19, NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        __Pyx_GOTREF(__pyx_t_3);\n",
       "        __Pyx_DECREF(__pyx_t_19); __pyx_t_19 = 0;\n",
       "      }\n",
       "      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "      __pyx_t_5 = PyNumber_Multiply(__pyx_t_1, __pyx_t_3); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_GOTREF(__pyx_t_5);\n",
       "      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "      __pyx_t_20 = __pyx_PyFloat_AsDouble(__pyx_t_5); if (unlikely((__pyx_t_20 == (double)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "      __pyx_t_21 = __pyx_v_i;\n",
       "      __pyx_t_11 = -1;\n",
       "      if (__pyx_t_21 < 0) {\n",
       "        __pyx_t_21 += __pyx_v_Y.shape[0];\n",
       "        if (unlikely(__pyx_t_21 < 0)) __pyx_t_11 = 0;\n",
       "      } else if (unlikely(__pyx_t_21 >= __pyx_v_Y.shape[0])) __pyx_t_11 = 0;\n",
       "      if (unlikely(__pyx_t_11 != -1)) {\n",
       "        __Pyx_RaiseBufferIndexError(__pyx_t_11);\n",
       "        {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      }\n",
       "      *((double *) ( /* dim=0 */ (__pyx_v_Y.data + __pyx_t_21 * __pyx_v_Y.strides[0]) )) += __pyx_t_20;\n",
       "    }\n",
       "  }\n",
       "
 19: 
\n", "
+20:     return Y
\n", "
  __Pyx_XDECREF(__pyx_r);\n",
       "  __pyx_t_5 = __pyx_memoryview_fromslice(__pyx_v_Y, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 20; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_5);\n",
       "  __pyx_r = __pyx_t_5;\n",
       "  __pyx_t_5 = 0;\n",
       "  goto __pyx_L0;\n",
       "
" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%cython -a \n", "from math import exp\n", "import numpy as np\n", "\n", "def rbf_network_1(double[:, :] X, double[:] beta, double theta):\n", "\n", " cdef int N = X.shape[0]\n", " cdef int D = X.shape[1]\n", " cdef double[:] Y = np.zeros(N)\n", " cdef int i, j, d\n", " cdef double r = 0\n", "\n", " for i in range(N):\n", " for j in range(N):\n", " r = 0\n", " for d in range(D):\n", " r += (X[j, d] - X[i, d]) ** 2\n", " r = r**0.5\n", " Y[i] += beta[j] * exp(-(r * theta)**2)\n", "\n", " return Y" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 loop, best of 3: 203 ms per loop\n" ] } ], "source": [ "%timeit rbf_network_1(X, beta, theta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is already a fair improvement given that so far all we’ve done is add some type declarations. For local variables we use the cdef keyword. For arrays we use ‘memoryviews’ which can accept numpy arrays as input." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now our first problem is that we’re still using the Python exponential function. We need to replace this with the C version. The main functions from math.h are included in the Cython libc library, so we just replace from math import exp with" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "from libc.math cimport exp \n" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " Cython: _cython_magic_8e84f5b2985cf43b081b58961850c2e5.pyx\n", " \n", " \n", "\n", "\n", "

Generated by Cython 0.23.4

\n", "

\n", " Yellow lines hint at Python interaction.
\n", " Click on a line that starts with a \"+\" to see the C code that Cython generated for it.\n", "

\n", "
+01: #from math import exp
\n", "
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 02: from libc.math cimport exp
\n", "
 03: 
\n", "
+04: import numpy as np
\n", "
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, -1); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 05: 
\n", "
+06: def rbf_network_2(double[:, :] X,  double[:] beta, double theta):
\n", "
/* Python wrapper */\n",
       "static PyObject *__pyx_pw_46_cython_magic_8e84f5b2985cf43b081b58961850c2e5_1rbf_network_2(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/\n",
       "static PyMethodDef __pyx_mdef_46_cython_magic_8e84f5b2985cf43b081b58961850c2e5_1rbf_network_2 = {\"rbf_network_2\", (PyCFunction)__pyx_pw_46_cython_magic_8e84f5b2985cf43b081b58961850c2e5_1rbf_network_2, METH_VARARGS|METH_KEYWORDS, 0};\n",
       "static PyObject *__pyx_pw_46_cython_magic_8e84f5b2985cf43b081b58961850c2e5_1rbf_network_2(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {\n",
       "  __Pyx_memviewslice __pyx_v_X = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  __Pyx_memviewslice __pyx_v_beta = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  double __pyx_v_theta;\n",
       "  PyObject *__pyx_r = 0;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"rbf_network_2 (wrapper)\", 0);\n",
       "  {\n",
       "    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_X,&__pyx_n_s_beta,&__pyx_n_s_theta,0};\n",
       "    PyObject* values[3] = {0,0,0};\n",
       "    if (unlikely(__pyx_kwds)) {\n",
       "      Py_ssize_t kw_args;\n",
       "      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);\n",
       "      switch (pos_args) {\n",
       "        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);\n",
       "        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "        case  0: break;\n",
       "        default: goto __pyx_L5_argtuple_error;\n",
       "      }\n",
       "      kw_args = PyDict_Size(__pyx_kwds);\n",
       "      switch (pos_args) {\n",
       "        case  0:\n",
       "        if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_X)) != 0)) kw_args--;\n",
       "        else goto __pyx_L5_argtuple_error;\n",
       "        case  1:\n",
       "        if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_beta)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"rbf_network_2\", 1, 3, 3, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "        }\n",
       "        case  2:\n",
       "        if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_theta)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"rbf_network_2\", 1, 3, 3, 2); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "        }\n",
       "      }\n",
       "      if (unlikely(kw_args > 0)) {\n",
       "        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, \"rbf_network_2\") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "      }\n",
       "    } else if (PyTuple_GET_SIZE(__pyx_args) != 3) {\n",
       "      goto __pyx_L5_argtuple_error;\n",
       "    } else {\n",
       "      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);\n",
       "    }\n",
       "    __pyx_v_X = __Pyx_PyObject_to_MemoryviewSlice_dsds_double(values[0]); if (unlikely(!__pyx_v_X.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "    __pyx_v_beta = __Pyx_PyObject_to_MemoryviewSlice_ds_double(values[1]); if (unlikely(!__pyx_v_beta.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "    __pyx_v_theta = __pyx_PyFloat_AsDouble(values[2]); if (unlikely((__pyx_v_theta == (double)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "  }\n",
       "  goto __pyx_L4_argument_unpacking_done;\n",
       "  __pyx_L5_argtuple_error:;\n",
       "  __Pyx_RaiseArgtupleInvalid(\"rbf_network_2\", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "  __pyx_L3_error:;\n",
       "  __Pyx_AddTraceback(\"_cython_magic_8e84f5b2985cf43b081b58961850c2e5.rbf_network_2\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return NULL;\n",
       "  __pyx_L4_argument_unpacking_done:;\n",
       "  __pyx_r = __pyx_pf_46_cython_magic_8e84f5b2985cf43b081b58961850c2e5_rbf_network_2(__pyx_self, __pyx_v_X, __pyx_v_beta, __pyx_v_theta);\n",
       "  int __pyx_lineno = 0;\n",
       "  const char *__pyx_filename = NULL;\n",
       "  int __pyx_clineno = 0;\n",
       "\n",
       "  /* function exit code */\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "\n",
       "static PyObject *__pyx_pf_46_cython_magic_8e84f5b2985cf43b081b58961850c2e5_rbf_network_2(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_X, __Pyx_memviewslice __pyx_v_beta, double __pyx_v_theta) {\n",
       "  int __pyx_v_N;\n",
       "  int __pyx_v_D;\n",
       "  __Pyx_memviewslice __pyx_v_Y = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  int __pyx_v_i;\n",
       "  int __pyx_v_j;\n",
       "  int __pyx_v_d;\n",
       "  double __pyx_v_r;\n",
       "  PyObject *__pyx_r = NULL;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"rbf_network_2\", 0);\n",
       "/* … */\n",
       "  /* function exit code */\n",
       "  __pyx_L1_error:;\n",
       "  __Pyx_XDECREF(__pyx_t_1);\n",
       "  __Pyx_XDECREF(__pyx_t_2);\n",
       "  __Pyx_XDECREF(__pyx_t_3);\n",
       "  __Pyx_XDECREF(__pyx_t_4);\n",
       "  __Pyx_XDECREF(__pyx_t_5);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_t_6, 1);\n",
       "  __Pyx_AddTraceback(\"_cython_magic_8e84f5b2985cf43b081b58961850c2e5.rbf_network_2\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __pyx_r = NULL;\n",
       "  __pyx_L0:;\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_Y, 1);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_X, 1);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_beta, 1);\n",
       "  __Pyx_XGIVEREF(__pyx_r);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "/* … */\n",
       "  __pyx_tuple__14 = PyTuple_Pack(10, __pyx_n_s_X, __pyx_n_s_beta, __pyx_n_s_theta, __pyx_n_s_N, __pyx_n_s_D, __pyx_n_s_Y, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_d, __pyx_n_s_r); if (unlikely(!__pyx_tuple__14)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_tuple__14);\n",
       "  __Pyx_GIVEREF(__pyx_tuple__14);\n",
       "/* … */\n",
       "  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_8e84f5b2985cf43b081b58961850c2e5_1rbf_network_2, NULL, __pyx_n_s_cython_magic_8e84f5b2985cf43b08); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_rbf_network_2, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_codeobj__15 = (PyObject*)__Pyx_PyCode_New(3, 0, 10, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_home_dcorre_cache_ipython_cytho, __pyx_n_s_rbf_network_2, 6, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "
 07: 
\n", "
+08:     cdef int N = X.shape[0]
\n", "
  __pyx_v_N = (__pyx_v_X.shape[0]);\n",
       "
+09:     cdef int D = X.shape[1]
\n", "
  __pyx_v_D = (__pyx_v_X.shape[1]);\n",
       "
+10:     cdef double[:] Y = np.zeros(N)
\n", "
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_zeros); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_N); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_4 = NULL;\n",
       "  if (CYTHON_COMPILING_IN_CPYTHON && unlikely(PyMethod_Check(__pyx_t_3))) {\n",
       "    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);\n",
       "    if (likely(__pyx_t_4)) {\n",
       "      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);\n",
       "      __Pyx_INCREF(__pyx_t_4);\n",
       "      __Pyx_INCREF(function);\n",
       "      __Pyx_DECREF_SET(__pyx_t_3, function);\n",
       "    }\n",
       "  }\n",
       "  if (!__pyx_t_4) {\n",
       "    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "  } else {\n",
       "    __pyx_t_5 = PyTuple_New(1+1); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_5);\n",
       "    __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_4); __pyx_t_4 = NULL;\n",
       "    __Pyx_GIVEREF(__pyx_t_2);\n",
       "    PyTuple_SET_ITEM(__pyx_t_5, 0+1, __pyx_t_2);\n",
       "    __pyx_t_2 = 0;\n",
       "    __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_5, NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "  __pyx_t_6 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_t_1);\n",
       "  if (unlikely(!__pyx_t_6.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_v_Y = __pyx_t_6;\n",
       "  __pyx_t_6.memview = NULL;\n",
       "  __pyx_t_6.data = NULL;\n",
       "
 11:     cdef int i, j, d
\n", "
+12:     cdef double r = 0
\n", "
  __pyx_v_r = 0.0;\n",
       "
 13: 
\n", "
+14:     for i in range(N):
\n", "
  __pyx_t_7 = __pyx_v_N;\n",
       "  for (__pyx_t_8 = 0; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) {\n",
       "    __pyx_v_i = __pyx_t_8;\n",
       "
+15:         for j in range(N):
\n", "
    __pyx_t_9 = __pyx_v_N;\n",
       "    for (__pyx_t_10 = 0; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {\n",
       "      __pyx_v_j = __pyx_t_10;\n",
       "
+16:             r = 0
\n", "
      __pyx_v_r = 0.0;\n",
       "
+17:             for d in range(D):
\n", "
      __pyx_t_11 = __pyx_v_D;\n",
       "      for (__pyx_t_12 = 0; __pyx_t_12 < __pyx_t_11; __pyx_t_12+=1) {\n",
       "        __pyx_v_d = __pyx_t_12;\n",
       "
+18:                 r += (X[j, d] - X[i, d]) ** 2
\n", "
        __pyx_t_13 = __pyx_v_j;\n",
       "        __pyx_t_14 = __pyx_v_d;\n",
       "        __pyx_t_15 = -1;\n",
       "        if (__pyx_t_13 < 0) {\n",
       "          __pyx_t_13 += __pyx_v_X.shape[0];\n",
       "          if (unlikely(__pyx_t_13 < 0)) __pyx_t_15 = 0;\n",
       "        } else if (unlikely(__pyx_t_13 >= __pyx_v_X.shape[0])) __pyx_t_15 = 0;\n",
       "        if (__pyx_t_14 < 0) {\n",
       "          __pyx_t_14 += __pyx_v_X.shape[1];\n",
       "          if (unlikely(__pyx_t_14 < 0)) __pyx_t_15 = 1;\n",
       "        } else if (unlikely(__pyx_t_14 >= __pyx_v_X.shape[1])) __pyx_t_15 = 1;\n",
       "        if (unlikely(__pyx_t_15 != -1)) {\n",
       "          __Pyx_RaiseBufferIndexError(__pyx_t_15);\n",
       "          {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        }\n",
       "        __pyx_t_16 = __pyx_v_i;\n",
       "        __pyx_t_17 = __pyx_v_d;\n",
       "        __pyx_t_15 = -1;\n",
       "        if (__pyx_t_16 < 0) {\n",
       "          __pyx_t_16 += __pyx_v_X.shape[0];\n",
       "          if (unlikely(__pyx_t_16 < 0)) __pyx_t_15 = 0;\n",
       "        } else if (unlikely(__pyx_t_16 >= __pyx_v_X.shape[0])) __pyx_t_15 = 0;\n",
       "        if (__pyx_t_17 < 0) {\n",
       "          __pyx_t_17 += __pyx_v_X.shape[1];\n",
       "          if (unlikely(__pyx_t_17 < 0)) __pyx_t_15 = 1;\n",
       "        } else if (unlikely(__pyx_t_17 >= __pyx_v_X.shape[1])) __pyx_t_15 = 1;\n",
       "        if (unlikely(__pyx_t_15 != -1)) {\n",
       "          __Pyx_RaiseBufferIndexError(__pyx_t_15);\n",
       "          {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "        }\n",
       "        __pyx_v_r = (__pyx_v_r + pow(((*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_X.data + __pyx_t_13 * __pyx_v_X.strides[0]) ) + __pyx_t_14 * __pyx_v_X.strides[1]) ))) - (*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_X.data + __pyx_t_16 * __pyx_v_X.strides[0]) ) + __pyx_t_17 * __pyx_v_X.strides[1]) )))), 2.0));\n",
       "      }\n",
       "
+19:             r = r**0.5
\n", "
      __pyx_v_r = pow(__pyx_v_r, 0.5);\n",
       "
+20:             Y[i] += beta[j] * exp(-(r * theta)**2)
\n", "
      __pyx_t_18 = __pyx_v_j;\n",
       "      __pyx_t_11 = -1;\n",
       "      if (__pyx_t_18 < 0) {\n",
       "        __pyx_t_18 += __pyx_v_beta.shape[0];\n",
       "        if (unlikely(__pyx_t_18 < 0)) __pyx_t_11 = 0;\n",
       "      } else if (unlikely(__pyx_t_18 >= __pyx_v_beta.shape[0])) __pyx_t_11 = 0;\n",
       "      if (unlikely(__pyx_t_11 != -1)) {\n",
       "        __Pyx_RaiseBufferIndexError(__pyx_t_11);\n",
       "        {__pyx_filename = __pyx_f[0]; __pyx_lineno = 20; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      }\n",
       "      __pyx_t_19 = __pyx_v_i;\n",
       "      __pyx_t_11 = -1;\n",
       "      if (__pyx_t_19 < 0) {\n",
       "        __pyx_t_19 += __pyx_v_Y.shape[0];\n",
       "        if (unlikely(__pyx_t_19 < 0)) __pyx_t_11 = 0;\n",
       "      } else if (unlikely(__pyx_t_19 >= __pyx_v_Y.shape[0])) __pyx_t_11 = 0;\n",
       "      if (unlikely(__pyx_t_11 != -1)) {\n",
       "        __Pyx_RaiseBufferIndexError(__pyx_t_11);\n",
       "        {__pyx_filename = __pyx_f[0]; __pyx_lineno = 20; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "      }\n",
       "      *((double *) ( /* dim=0 */ (__pyx_v_Y.data + __pyx_t_19 * __pyx_v_Y.strides[0]) )) += ((*((double *) ( /* dim=0 */ (__pyx_v_beta.data + __pyx_t_18 * __pyx_v_beta.strides[0]) ))) * exp((-pow((__pyx_v_r * __pyx_v_theta), 2.0))));\n",
       "    }\n",
       "  }\n",
       "
 21: 
\n", "
+22:     return Y
\n", "
  __Pyx_XDECREF(__pyx_r);\n",
       "  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_Y, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __pyx_r = __pyx_t_1;\n",
       "  __pyx_t_1 = 0;\n",
       "  goto __pyx_L0;\n",
       "
" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%cython -a\n", "#from math import exp\n", "from libc.math cimport exp\n", "\n", "import numpy as np\n", "\n", "def rbf_network_2(double[:, :] X, double[:] beta, double theta):\n", "\n", " cdef int N = X.shape[0]\n", " cdef int D = X.shape[1]\n", " cdef double[:] Y = np.zeros(N)\n", " cdef int i, j, d\n", " cdef double r = 0\n", "\n", " for i in range(N):\n", " for j in range(N):\n", " r = 0\n", " for d in range(D):\n", " r += (X[j, d] - X[i, d]) ** 2\n", " r = r**0.5\n", " Y[i] += beta[j] * exp(-(r * theta)**2)\n", "\n", " return Y" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10 loops, best of 3: 100 ms per loop\n" ] } ], "source": [ "%timeit rbf_network_2(X, beta, theta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we need to add some [compiler directives](http://docs.cython.org/src/reference/compilation.html#compiler-directives), the easiest way is to add this line to the top of the file" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " Cython: _cython_magic_44cf836a4f843bea06613906e3e97f99.pyx\n", " \n", " \n", "\n", "\n", "

Generated by Cython 0.23.4

\n", "

\n", " Yellow lines hint at Python interaction.
\n", " Click on a line that starts with a \"+\" to see the C code that Cython generated for it.\n", "

\n", "
 01: 
\n", "
+02: #from math import exp
\n", "
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 03: from libc.math cimport exp
\n", "
+04: import numpy as np
\n", "
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, -1); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 05: import cython
\n", "
 06: 
\n", "
 07: @cython.boundscheck(False)
\n", "
 08: @cython.wraparound(False)
\n", "
 09: @cython.nonecheck(False)
\n", "
+10: def rbf_network_3(double[:, :] X,  double[:] beta, double theta):
\n", "
/* Python wrapper */\n",
       "static PyObject *__pyx_pw_46_cython_magic_44cf836a4f843bea06613906e3e97f99_1rbf_network_3(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/\n",
       "static PyMethodDef __pyx_mdef_46_cython_magic_44cf836a4f843bea06613906e3e97f99_1rbf_network_3 = {\"rbf_network_3\", (PyCFunction)__pyx_pw_46_cython_magic_44cf836a4f843bea06613906e3e97f99_1rbf_network_3, METH_VARARGS|METH_KEYWORDS, 0};\n",
       "static PyObject *__pyx_pw_46_cython_magic_44cf836a4f843bea06613906e3e97f99_1rbf_network_3(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {\n",
       "  __Pyx_memviewslice __pyx_v_X = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  __Pyx_memviewslice __pyx_v_beta = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  double __pyx_v_theta;\n",
       "  PyObject *__pyx_r = 0;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"rbf_network_3 (wrapper)\", 0);\n",
       "  {\n",
       "    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_X,&__pyx_n_s_beta,&__pyx_n_s_theta,0};\n",
       "    PyObject* values[3] = {0,0,0};\n",
       "    if (unlikely(__pyx_kwds)) {\n",
       "      Py_ssize_t kw_args;\n",
       "      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);\n",
       "      switch (pos_args) {\n",
       "        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);\n",
       "        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "        case  0: break;\n",
       "        default: goto __pyx_L5_argtuple_error;\n",
       "      }\n",
       "      kw_args = PyDict_Size(__pyx_kwds);\n",
       "      switch (pos_args) {\n",
       "        case  0:\n",
       "        if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_X)) != 0)) kw_args--;\n",
       "        else goto __pyx_L5_argtuple_error;\n",
       "        case  1:\n",
       "        if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_beta)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"rbf_network_3\", 1, 3, 3, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "        }\n",
       "        case  2:\n",
       "        if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_theta)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"rbf_network_3\", 1, 3, 3, 2); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "        }\n",
       "      }\n",
       "      if (unlikely(kw_args > 0)) {\n",
       "        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, \"rbf_network_3\") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "      }\n",
       "    } else if (PyTuple_GET_SIZE(__pyx_args) != 3) {\n",
       "      goto __pyx_L5_argtuple_error;\n",
       "    } else {\n",
       "      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);\n",
       "    }\n",
       "    __pyx_v_X = __Pyx_PyObject_to_MemoryviewSlice_dsds_double(values[0]); if (unlikely(!__pyx_v_X.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "    __pyx_v_beta = __Pyx_PyObject_to_MemoryviewSlice_ds_double(values[1]); if (unlikely(!__pyx_v_beta.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "    __pyx_v_theta = __pyx_PyFloat_AsDouble(values[2]); if (unlikely((__pyx_v_theta == (double)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "  }\n",
       "  goto __pyx_L4_argument_unpacking_done;\n",
       "  __pyx_L5_argtuple_error:;\n",
       "  __Pyx_RaiseArgtupleInvalid(\"rbf_network_3\", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "  __pyx_L3_error:;\n",
       "  __Pyx_AddTraceback(\"_cython_magic_44cf836a4f843bea06613906e3e97f99.rbf_network_3\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return NULL;\n",
       "  __pyx_L4_argument_unpacking_done:;\n",
       "  __pyx_r = __pyx_pf_46_cython_magic_44cf836a4f843bea06613906e3e97f99_rbf_network_3(__pyx_self, __pyx_v_X, __pyx_v_beta, __pyx_v_theta);\n",
       "  int __pyx_lineno = 0;\n",
       "  const char *__pyx_filename = NULL;\n",
       "  int __pyx_clineno = 0;\n",
       "\n",
       "  /* function exit code */\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "\n",
       "static PyObject *__pyx_pf_46_cython_magic_44cf836a4f843bea06613906e3e97f99_rbf_network_3(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_X, __Pyx_memviewslice __pyx_v_beta, double __pyx_v_theta) {\n",
       "  int __pyx_v_N;\n",
       "  int __pyx_v_D;\n",
       "  __Pyx_memviewslice __pyx_v_Y = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  int __pyx_v_i;\n",
       "  int __pyx_v_j;\n",
       "  int __pyx_v_d;\n",
       "  double __pyx_v_r;\n",
       "  PyObject *__pyx_r = NULL;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"rbf_network_3\", 0);\n",
       "/* … */\n",
       "  /* function exit code */\n",
       "  __pyx_L1_error:;\n",
       "  __Pyx_XDECREF(__pyx_t_1);\n",
       "  __Pyx_XDECREF(__pyx_t_2);\n",
       "  __Pyx_XDECREF(__pyx_t_3);\n",
       "  __Pyx_XDECREF(__pyx_t_4);\n",
       "  __Pyx_XDECREF(__pyx_t_5);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_t_6, 1);\n",
       "  __Pyx_AddTraceback(\"_cython_magic_44cf836a4f843bea06613906e3e97f99.rbf_network_3\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __pyx_r = NULL;\n",
       "  __pyx_L0:;\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_Y, 1);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_X, 1);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_beta, 1);\n",
       "  __Pyx_XGIVEREF(__pyx_r);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "/* … */\n",
       "  __pyx_tuple__14 = PyTuple_Pack(10, __pyx_n_s_X, __pyx_n_s_beta, __pyx_n_s_theta, __pyx_n_s_N, __pyx_n_s_D, __pyx_n_s_Y, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_d, __pyx_n_s_r); if (unlikely(!__pyx_tuple__14)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_tuple__14);\n",
       "  __Pyx_GIVEREF(__pyx_tuple__14);\n",
       "/* … */\n",
       "  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_44cf836a4f843bea06613906e3e97f99_1rbf_network_3, NULL, __pyx_n_s_cython_magic_44cf836a4f843bea06); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_rbf_network_3, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_codeobj__15 = (PyObject*)__Pyx_PyCode_New(3, 0, 10, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_home_dcorre_cache_ipython_cytho, __pyx_n_s_rbf_network_3, 10, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "
 11: 
\n", "
+12:     cdef int N = X.shape[0]
\n", "
  __pyx_v_N = (__pyx_v_X.shape[0]);\n",
       "
+13:     cdef int D = X.shape[1]
\n", "
  __pyx_v_D = (__pyx_v_X.shape[1]);\n",
       "
+14:     cdef double[:] Y = np.zeros(N)
\n", "
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_zeros); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_N); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_4 = NULL;\n",
       "  if (CYTHON_COMPILING_IN_CPYTHON && unlikely(PyMethod_Check(__pyx_t_3))) {\n",
       "    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);\n",
       "    if (likely(__pyx_t_4)) {\n",
       "      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);\n",
       "      __Pyx_INCREF(__pyx_t_4);\n",
       "      __Pyx_INCREF(function);\n",
       "      __Pyx_DECREF_SET(__pyx_t_3, function);\n",
       "    }\n",
       "  }\n",
       "  if (!__pyx_t_4) {\n",
       "    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "  } else {\n",
       "    __pyx_t_5 = PyTuple_New(1+1); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_5);\n",
       "    __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_4); __pyx_t_4 = NULL;\n",
       "    __Pyx_GIVEREF(__pyx_t_2);\n",
       "    PyTuple_SET_ITEM(__pyx_t_5, 0+1, __pyx_t_2);\n",
       "    __pyx_t_2 = 0;\n",
       "    __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_5, NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "  __pyx_t_6 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_t_1);\n",
       "  if (unlikely(!__pyx_t_6.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_v_Y = __pyx_t_6;\n",
       "  __pyx_t_6.memview = NULL;\n",
       "  __pyx_t_6.data = NULL;\n",
       "
 15:     cdef int i, j, d
\n", "
+16:     cdef double r = 0
\n", "
  __pyx_v_r = 0.0;\n",
       "
 17: 
\n", "
+18:     for i in range(N):
\n", "
  __pyx_t_7 = __pyx_v_N;\n",
       "  for (__pyx_t_8 = 0; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) {\n",
       "    __pyx_v_i = __pyx_t_8;\n",
       "
+19:         for j in range(N):
\n", "
    __pyx_t_9 = __pyx_v_N;\n",
       "    for (__pyx_t_10 = 0; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {\n",
       "      __pyx_v_j = __pyx_t_10;\n",
       "
+20:             r = 0
\n", "
      __pyx_v_r = 0.0;\n",
       "
+21:             for d in range(D):
\n", "
      __pyx_t_11 = __pyx_v_D;\n",
       "      for (__pyx_t_12 = 0; __pyx_t_12 < __pyx_t_11; __pyx_t_12+=1) {\n",
       "        __pyx_v_d = __pyx_t_12;\n",
       "
+22:                 r += (X[j, d] - X[i, d]) ** 2
\n", "
        __pyx_t_13 = __pyx_v_j;\n",
       "        __pyx_t_14 = __pyx_v_d;\n",
       "        __pyx_t_15 = __pyx_v_i;\n",
       "        __pyx_t_16 = __pyx_v_d;\n",
       "        __pyx_v_r = (__pyx_v_r + pow(((*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_X.data + __pyx_t_13 * __pyx_v_X.strides[0]) ) + __pyx_t_14 * __pyx_v_X.strides[1]) ))) - (*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_X.data + __pyx_t_15 * __pyx_v_X.strides[0]) ) + __pyx_t_16 * __pyx_v_X.strides[1]) )))), 2.0));\n",
       "      }\n",
       "
+23:             r = r**0.5
\n", "
      __pyx_v_r = pow(__pyx_v_r, 0.5);\n",
       "
+24:             Y[i] += beta[j] * exp(-(r * theta)**2)
\n", "
      __pyx_t_17 = __pyx_v_j;\n",
       "      __pyx_t_18 = __pyx_v_i;\n",
       "      *((double *) ( /* dim=0 */ (__pyx_v_Y.data + __pyx_t_18 * __pyx_v_Y.strides[0]) )) += ((*((double *) ( /* dim=0 */ (__pyx_v_beta.data + __pyx_t_17 * __pyx_v_beta.strides[0]) ))) * exp((-pow((__pyx_v_r * __pyx_v_theta), 2.0))));\n",
       "    }\n",
       "  }\n",
       "
 25: 
\n", "
+26:     return Y
\n", "
  __Pyx_XDECREF(__pyx_r);\n",
       "  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_Y, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 26; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __pyx_r = __pyx_t_1;\n",
       "  __pyx_t_1 = 0;\n",
       "  goto __pyx_L0;\n",
       "
" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%cython --compile-args=-ffast-math --link-args=-ffast-math -a\n", "\n", "#from math import exp\n", "from libc.math cimport exp\n", "import numpy as np\n", "import cython\n", "\n", "@cython.boundscheck(False)\n", "@cython.wraparound(False)\n", "@cython.nonecheck(False)\n", "def rbf_network_3(double[:, :] X, double[:] beta, double theta):\n", "\n", " cdef int N = X.shape[0]\n", " cdef int D = X.shape[1]\n", " cdef double[:] Y = np.zeros(N)\n", " cdef int i, j, d\n", " cdef double r = 0\n", "\n", " for i in range(N):\n", " for j in range(N):\n", " r = 0\n", " for d in range(D):\n", " r += (X[j, d] - X[i, d]) ** 2\n", " r = r**0.5\n", " Y[i] += beta[j] * exp(-(r * theta)**2)\n", "\n", " return Y" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10 loops, best of 3: 31 ms per loop\n" ] } ], "source": [ "%timeit rbf_network_3(X, beta, theta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And that's pretty good, but we're still only using a single CPU because of python's [Global Interpretter Lock (GIL)](http://stackoverflow.com/questions/1294382/what-is-a-global-interpreter-lock-gil). Let's use all those cores..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Parallelisation in Cython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cython with OpenMP\n", "Cython supports [parallel processing](http://docs.cython.org/src/userguide/parallelism.html) via threads using the OpenMP backend. What does that look like?\n", "\n", "First of all, we have to add some extra compile flags to enable OpenMP. Next we put the loops in a nogil context which releases the GIL restriction, something we can only safely do when our code is in pure C without any interaction with Python objects.
\n", "Also in the same context block is parallel which sets up the OpenMP threading. Finally we see prange, parallel range, which executes the outer loop in parallel across the number of cores specified.
\n", "\n", "Now to create a multi-threaded version of rbf_network we just need to replace range() in the first loop with the multi-treaded version prange() from cython.parallel. This tells the compiler to run the loop across multiple CPU cores. In this case, we have no concurrency problems: the order in which the loop is executed doesn’t matter.
\n", "\n", "Notice the nogil argument in prange(N, nogil=True). In order to run multi-threaded code we need to turn off the GIL. This means that you can’t have Python code inside your multi-threaded loop or compilation will fail. It also means that any functions called inside the loop need to be defined nogil" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " Cython: _cython_magic_f943c84460c21e5d152873f930413d2a.pyx\n", " \n", " \n", "\n", "\n", "

Generated by Cython 0.23.4

\n", "

\n", " Yellow lines hint at Python interaction.
\n", " Click on a line that starts with a \"+\" to see the C code that Cython generated for it.\n", "

\n", "
 01: 
\n", "
+02: import cython
\n", "
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 03: from cython.parallel import prange, parallel
\n", "
+04: import numpy as np
\n", "
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, -1); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 05: #from math import exp
\n", "
 06: from libc.math cimport exp
\n", "
 07: 
\n", "
 08: @cython.boundscheck(False)
\n", "
 09: @cython.wraparound(False)
\n", "
 10: @cython.nonecheck(False)
\n", "
+11: def rbf_network_multithread(double[:, :] X,  double[:] beta, double theta):
\n", "
/* Python wrapper */\n",
       "static PyObject *__pyx_pw_46_cython_magic_f943c84460c21e5d152873f930413d2a_1rbf_network_multithread(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/\n",
       "static PyMethodDef __pyx_mdef_46_cython_magic_f943c84460c21e5d152873f930413d2a_1rbf_network_multithread = {\"rbf_network_multithread\", (PyCFunction)__pyx_pw_46_cython_magic_f943c84460c21e5d152873f930413d2a_1rbf_network_multithread, METH_VARARGS|METH_KEYWORDS, 0};\n",
       "static PyObject *__pyx_pw_46_cython_magic_f943c84460c21e5d152873f930413d2a_1rbf_network_multithread(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {\n",
       "  __Pyx_memviewslice __pyx_v_X = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  __Pyx_memviewslice __pyx_v_beta = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  double __pyx_v_theta;\n",
       "  PyObject *__pyx_r = 0;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"rbf_network_multithread (wrapper)\", 0);\n",
       "  {\n",
       "    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_X,&__pyx_n_s_beta,&__pyx_n_s_theta,0};\n",
       "    PyObject* values[3] = {0,0,0};\n",
       "    if (unlikely(__pyx_kwds)) {\n",
       "      Py_ssize_t kw_args;\n",
       "      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);\n",
       "      switch (pos_args) {\n",
       "        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);\n",
       "        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "        case  0: break;\n",
       "        default: goto __pyx_L5_argtuple_error;\n",
       "      }\n",
       "      kw_args = PyDict_Size(__pyx_kwds);\n",
       "      switch (pos_args) {\n",
       "        case  0:\n",
       "        if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_X)) != 0)) kw_args--;\n",
       "        else goto __pyx_L5_argtuple_error;\n",
       "        case  1:\n",
       "        if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_beta)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"rbf_network_multithread\", 1, 3, 3, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "        }\n",
       "        case  2:\n",
       "        if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_theta)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"rbf_network_multithread\", 1, 3, 3, 2); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "        }\n",
       "      }\n",
       "      if (unlikely(kw_args > 0)) {\n",
       "        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, \"rbf_network_multithread\") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "      }\n",
       "    } else if (PyTuple_GET_SIZE(__pyx_args) != 3) {\n",
       "      goto __pyx_L5_argtuple_error;\n",
       "    } else {\n",
       "      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);\n",
       "    }\n",
       "    __pyx_v_X = __Pyx_PyObject_to_MemoryviewSlice_dsds_double(values[0]); if (unlikely(!__pyx_v_X.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "    __pyx_v_beta = __Pyx_PyObject_to_MemoryviewSlice_ds_double(values[1]); if (unlikely(!__pyx_v_beta.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "    __pyx_v_theta = __pyx_PyFloat_AsDouble(values[2]); if (unlikely((__pyx_v_theta == (double)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "  }\n",
       "  goto __pyx_L4_argument_unpacking_done;\n",
       "  __pyx_L5_argtuple_error:;\n",
       "  __Pyx_RaiseArgtupleInvalid(\"rbf_network_multithread\", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "  __pyx_L3_error:;\n",
       "  __Pyx_AddTraceback(\"_cython_magic_f943c84460c21e5d152873f930413d2a.rbf_network_multithread\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return NULL;\n",
       "  __pyx_L4_argument_unpacking_done:;\n",
       "  __pyx_r = __pyx_pf_46_cython_magic_f943c84460c21e5d152873f930413d2a_rbf_network_multithread(__pyx_self, __pyx_v_X, __pyx_v_beta, __pyx_v_theta);\n",
       "  int __pyx_lineno = 0;\n",
       "  const char *__pyx_filename = NULL;\n",
       "  int __pyx_clineno = 0;\n",
       "\n",
       "  /* function exit code */\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "\n",
       "static PyObject *__pyx_pf_46_cython_magic_f943c84460c21e5d152873f930413d2a_rbf_network_multithread(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_X, __Pyx_memviewslice __pyx_v_beta, double __pyx_v_theta) {\n",
       "  int __pyx_v_N;\n",
       "  int __pyx_v_D;\n",
       "  __Pyx_memviewslice __pyx_v_Y = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  int __pyx_v_i;\n",
       "  int __pyx_v_j;\n",
       "  int __pyx_v_d;\n",
       "  double __pyx_v_r;\n",
       "  PyObject *__pyx_r = NULL;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"rbf_network_multithread\", 0);\n",
       "/* … */\n",
       "  /* function exit code */\n",
       "  __pyx_L1_error:;\n",
       "  __Pyx_XDECREF(__pyx_t_1);\n",
       "  __Pyx_XDECREF(__pyx_t_2);\n",
       "  __Pyx_XDECREF(__pyx_t_3);\n",
       "  __Pyx_XDECREF(__pyx_t_4);\n",
       "  __Pyx_XDECREF(__pyx_t_5);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_t_6, 1);\n",
       "  __Pyx_AddTraceback(\"_cython_magic_f943c84460c21e5d152873f930413d2a.rbf_network_multithread\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __pyx_r = NULL;\n",
       "  __pyx_L0:;\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_Y, 1);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_X, 1);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_beta, 1);\n",
       "  __Pyx_XGIVEREF(__pyx_r);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "/* … */\n",
       "  __pyx_tuple__14 = PyTuple_Pack(10, __pyx_n_s_X, __pyx_n_s_beta, __pyx_n_s_theta, __pyx_n_s_N, __pyx_n_s_D, __pyx_n_s_Y, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_d, __pyx_n_s_r); if (unlikely(!__pyx_tuple__14)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_tuple__14);\n",
       "  __Pyx_GIVEREF(__pyx_tuple__14);\n",
       "/* … */\n",
       "  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_f943c84460c21e5d152873f930413d2a_1rbf_network_multithread, NULL, __pyx_n_s_cython_magic_f943c84460c21e5d15); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_rbf_network_multithread, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_codeobj__15 = (PyObject*)__Pyx_PyCode_New(3, 0, 10, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_home_dcorre_cache_ipython_cytho, __pyx_n_s_rbf_network_multithread, 11, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "
 12: 
\n", "
+13:     cdef int N = X.shape[0]
\n", "
  __pyx_v_N = (__pyx_v_X.shape[0]);\n",
       "
+14:     cdef int D = X.shape[1]
\n", "
  __pyx_v_D = (__pyx_v_X.shape[1]);\n",
       "
+15:     cdef double[:] Y = np.zeros(N)
\n", "
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_zeros); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_N); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_4 = NULL;\n",
       "  if (CYTHON_COMPILING_IN_CPYTHON && unlikely(PyMethod_Check(__pyx_t_3))) {\n",
       "    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);\n",
       "    if (likely(__pyx_t_4)) {\n",
       "      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);\n",
       "      __Pyx_INCREF(__pyx_t_4);\n",
       "      __Pyx_INCREF(function);\n",
       "      __Pyx_DECREF_SET(__pyx_t_3, function);\n",
       "    }\n",
       "  }\n",
       "  if (!__pyx_t_4) {\n",
       "    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "  } else {\n",
       "    __pyx_t_5 = PyTuple_New(1+1); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_5);\n",
       "    __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_4); __pyx_t_4 = NULL;\n",
       "    __Pyx_GIVEREF(__pyx_t_2);\n",
       "    PyTuple_SET_ITEM(__pyx_t_5, 0+1, __pyx_t_2);\n",
       "    __pyx_t_2 = 0;\n",
       "    __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_5, NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "  __pyx_t_6 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_t_1);\n",
       "  if (unlikely(!__pyx_t_6.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_v_Y = __pyx_t_6;\n",
       "  __pyx_t_6.memview = NULL;\n",
       "  __pyx_t_6.data = NULL;\n",
       "
 16:     cdef int i, j, d
\n", "
+17:     cdef double r = 0
\n", "
  __pyx_v_r = 0.0;\n",
       "
 18: 
\n", "
+19:     for i in prange(N, nogil=True,num_threads=4):
\n", "
  {\n",
       "      #ifdef WITH_THREAD\n",
       "      PyThreadState *_save;\n",
       "      Py_UNBLOCK_THREADS\n",
       "      #endif\n",
       "      /*try:*/ {\n",
       "        __pyx_t_7 = __pyx_v_N;\n",
       "        if (1 == 0) abort();\n",
       "        {\n",
       "            #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))\n",
       "                #undef likely\n",
       "                #undef unlikely\n",
       "                #define likely(x)   (x)\n",
       "                #define unlikely(x) (x)\n",
       "            #endif\n",
       "            __pyx_t_9 = (__pyx_t_7 - 0) / 1;\n",
       "            if (__pyx_t_9 > 0)\n",
       "            {\n",
       "                #ifdef _OPENMP\n",
       "                #pragma omp parallel\n",
       "                #endif /* _OPENMP */\n",
       "                {\n",
       "                    #ifdef _OPENMP\n",
       "                    #pragma omp for lastprivate(__pyx_v_j) firstprivate(__pyx_v_i) lastprivate(__pyx_v_i) lastprivate(__pyx_v_d) lastprivate(__pyx_v_r) num_threads(4)\n",
       "                    #endif /* _OPENMP */\n",
       "                    for (__pyx_t_8 = 0; __pyx_t_8 < __pyx_t_9; __pyx_t_8++){\n",
       "                        {\n",
       "                            __pyx_v_i = 0 + 1 * __pyx_t_8;\n",
       "                            /* Initialize private variables to invalid values */\n",
       "                            __pyx_v_j = ((int)0xbad0bad0);\n",
       "                            __pyx_v_d = ((int)0xbad0bad0);\n",
       "                            __pyx_v_r = ((double)__PYX_NAN());\n",
       "/* … */\n",
       "      /*finally:*/ {\n",
       "        /*normal exit:*/{\n",
       "          #ifdef WITH_THREAD\n",
       "          Py_BLOCK_THREADS\n",
       "          #endif\n",
       "          goto __pyx_L5;\n",
       "        }\n",
       "        __pyx_L5:;\n",
       "      }\n",
       "  }\n",
       "
+20:         for j in range(N):
\n", "
                            __pyx_t_10 = __pyx_v_N;\n",
       "                            for (__pyx_t_11 = 0; __pyx_t_11 < __pyx_t_10; __pyx_t_11+=1) {\n",
       "                              __pyx_v_j = __pyx_t_11;\n",
       "
+21:             r = 0
\n", "
                              __pyx_v_r = 0.0;\n",
       "
+22:             for d in range(D):
\n", "
                              __pyx_t_12 = __pyx_v_D;\n",
       "                              for (__pyx_t_13 = 0; __pyx_t_13 < __pyx_t_12; __pyx_t_13+=1) {\n",
       "                                __pyx_v_d = __pyx_t_13;\n",
       "
+23:                 r += (X[j, d] - X[i, d])**2
\n", "
                                __pyx_t_14 = __pyx_v_j;\n",
       "                                __pyx_t_15 = __pyx_v_d;\n",
       "                                __pyx_t_16 = __pyx_v_i;\n",
       "                                __pyx_t_17 = __pyx_v_d;\n",
       "                                __pyx_v_r = (__pyx_v_r + pow(((*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_X.data + __pyx_t_14 * __pyx_v_X.strides[0]) ) + __pyx_t_15 * __pyx_v_X.strides[1]) ))) - (*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_X.data + __pyx_t_16 * __pyx_v_X.strides[0]) ) + __pyx_t_17 * __pyx_v_X.strides[1]) )))), 2.0));\n",
       "                              }\n",
       "
+24:             r = r**0.5
\n", "
                              __pyx_v_r = pow(__pyx_v_r, 0.5);\n",
       "
+25:             Y[i] += beta[j] * exp(-(r * theta)**2)
\n", "
                              __pyx_t_18 = __pyx_v_j;\n",
       "                              __pyx_t_19 = __pyx_v_i;\n",
       "                              *((double *) ( /* dim=0 */ (__pyx_v_Y.data + __pyx_t_19 * __pyx_v_Y.strides[0]) )) += ((*((double *) ( /* dim=0 */ (__pyx_v_beta.data + __pyx_t_18 * __pyx_v_beta.strides[0]) ))) * exp((-pow((__pyx_v_r * __pyx_v_theta), 2.0))));\n",
       "                            }\n",
       "                        }\n",
       "                    }\n",
       "                }\n",
       "            }\n",
       "        }\n",
       "        #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))\n",
       "            #undef likely\n",
       "            #undef unlikely\n",
       "            #define likely(x)   __builtin_expect(!!(x), 1)\n",
       "            #define unlikely(x) __builtin_expect(!!(x), 0)\n",
       "        #endif\n",
       "      }\n",
       "
 26: 
\n", "
+27:     return Y
\n", "
  __Pyx_XDECREF(__pyx_r);\n",
       "  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_Y, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 27; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __pyx_r = __pyx_t_1;\n",
       "  __pyx_t_1 = 0;\n",
       "  goto __pyx_L0;\n",
       "
" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%cython --compile-args=-fopenmp --link-args=-fopenmp --compile-args=-ffast-math --link-args=-ffast-math --force -a\n", "\n", "import cython\n", "from cython.parallel import prange, parallel\n", "import numpy as np\n", "#from math import exp\n", "from libc.math cimport exp\n", "\n", "@cython.boundscheck(False)\n", "@cython.wraparound(False)\n", "@cython.nonecheck(False)\n", "def rbf_network_multithread(double[:, :] X, double[:] beta, double theta):\n", "\n", " cdef int N = X.shape[0]\n", " cdef int D = X.shape[1]\n", " cdef double[:] Y = np.zeros(N)\n", " cdef int i, j, d\n", " cdef double r = 0\n", "\n", " for i in prange(N, nogil=True,num_threads=4):\n", " for j in range(N):\n", " r = 0\n", " for d in range(D):\n", " r += (X[j, d] - X[i, d])**2 \n", " r = r**0.5\n", " Y[i] += beta[j] * exp(-(r * theta)**2)\n", "\n", " return Y\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [], "source": [ "D = 5\n", "N = 1000\n", "X = np.array([np.random.rand(N) for d in range(D)]).T\n", "beta = np.random.rand(N)\n", "theta = 10" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100 loops, best of 3: 12.5 ms per loop\n" ] } ], "source": [ "%timeit rbf_network_multithread(X, beta, theta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion\n", "\n", "| Method | Time (ms) | Speed up Factor |\n", "| :------------- |:-------------:|:-----:|\n", "| Pure Python | 4560 | - |\n", "| Scipy | 194 | 23 |\n", "| Numpy | 52.9 | 86 |\n", "| Numba | 32.1 | 142 |\n", "| Cython | 31 | 147 |\n", "| Cython with parallelisation | 12.5 | 365 |\n", "\n", "I was able to make this particular array operation 365x faster using code that looks remarkably similar to the original naive python implementation. Also important to note that the function API did not change at all; the speed benefits and multithreading are completely transparent to the user of this function.\n", "\n", "If you are processing an array using loops, you should definitely look at Cython, particularly Cython with OpenMP threading, to speed up your operations. Maybe not as easy as Python, but certainly much better than learning C.\n", "\n", "Note that using the power of numpy you can already speeds up your code by a factor 86, and it holds in one line for our example!!\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial is a mix of some examples which can be found here:\n", "+ [Fast Python loops](http://nealhughes.net/cython1/)\n", "+ [Parallel computing in Cython - threads](http://nealhughes.net/parallelcomp2/)\n", "+ [Parallelizing numpy array loops with Cython and OpenMP](http://www.perrygeo.com/parallelizing-numpy-array-loops-with-cython-and-mpi.html)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cython (in real world)\n", "\n", "To use cython with your own codes you should write a .pyx file and a setup.py as explained below.\n", "\n", "A Cython version - implemented in the file fastloop.pyx - looks something like this" ] }, { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [ "from math import exp\n", "import numpy as np\n", "\n", "def rbf_network(double[:, :] X, double[:] beta, double theta):\n", "\n", " cdef int N = X.shape[0]\n", " cdef int D = X.shape[1]\n", " cdef double[:] Y = np.zeros(N)\n", " cdef int i, j, d\n", " cdef double r = 0\n", "\n", " for i in range(N):\n", " for j in range(N):\n", " r = 0\n", " for d in range(D):\n", " r += (X[j, d] - X[i, d]) ** 2\n", " r = r**0.5\n", " Y[i] += beta[j] * exp(-(r * theta)**2)\n", "\n", " return Y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Note that you don’t have to add type declarations in a *.pyx file. Any lines which use untyped variables will just remain in Python rather than being translated to C.\n", "\n", "To compile we need a setup.py script, that looks something like this" ] }, { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [ "from distutils.core import setup\n", "from Cython.Build import cythonize\n", "\n", "setup(name=\"fastloop\", ext_modules=cythonize('fastloop.pyx'),)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "then we compile from the terminal with\n", "\n" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "python setup.py build_ext --inplace" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which generates a C code file fastloop.c and a compiled Python extension fastloop.so.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With Cython there are a few ‘tricks’ involved in achieving good performance. Here’s the first one, if we type this in the terminal" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "cython fastloop.pyx -a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "we generate a fastloop.html file which we can open in a browser\n", "\n", "\n", "Lines highlighted yellow are still using Python and are slowing our code down. Our goal is get rid of yellow lines, especially any inside of loops.\n", "\n", "Out first problem is that we’re still using the Python exponential function. We need to replace this with the C version. The main functions from math.h are included in the Cython libc library, so we just replace from math import exp with" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "from libc.math cimport exp \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we need to add some [compiler directives](http://docs.cython.org/src/reference/compilation.html#compiler-directives), the easiest way is to add this line to the top of the file" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "#cython: boundscheck=False, wraparound=False, nonecheck=False\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that with these checks turned off you can get segmentation faults rather than nice error messages, so its best to debug your code before putting this line in.\n", "\n", "Next we can consider playing with compiler flags (these are C tricks rather than Cython tricks as such). When using gcc the most important option seems to be -ffast-math. From my limited experience, this can improve speeds a lot, with no noticeable loss of reliability. To implement these changes we need to modify the setup.py file:" ] }, { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [ "from distutils.core import setup\n", "from distutils.extension import Extension\n", "from Cython.Distutils import build_ext\n", "\n", "ext_modules=[ Extension(\"fastloop\",\n", " [\"fastloop.pyx\"],\n", " libraries=[\"m\"],\n", " extra_compile_args = [\"-ffast-math\"])]\n", "\n", "setup(\n", " name = \"fastloop\",\n", " cmdclass = {\"build_ext\": build_ext},\n", " ext_modules = ext_modules)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now if we run cython fastloop.pyx -a again we will see the loops are now free of Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The yellow outside the loops is irrelevant here (but would matter if we needed to call this function many times within another loop).\n", "\n", "Now you can recompile and test it out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, now we’re getting there." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Calling C functions\n", "\n", "So what else can we do? Well it turns out the exponential function is a bit of a bottleneck here, even the C version. One option is to use a [fast approximation to the exponential function](http://www.schraudolph.org/pubs/Schraudolph99.pdf)" ] }, { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [ "#include \n", "\n", "static union\n", "{\n", " double d;\n", " struct\n", " {\n", "\n", "#ifdef LITTLE_ENDIAN\n", " int j, i;\n", "#else\n", " int i, j;\n", "#endif\n", " } n;\n", "} _eco;\n", "\n", "#define EXP_A (1048576/M_LN2) /* use 1512775 for integer version */\n", "#define EXP_C 60801 /* see text for choice of c values */\n", "#define EXP(y) (_eco.n.i = EXP_A*(y) + (1072693248 - EXP_C), _eco.d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From Cython its easy to call C code. Put the above code in vfastexp.h, then just add the following to your fastloop.pyx file" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "cdef extern from \"vfastexp.h\":\n", " double exp_approx \"EXP\" (double)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So now we can just use exp_approx() in place of exp()." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Numba" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Numba](http://numba.pydata.org/) is an LLVM compiler for python code, which allows code written in Python to be converted to highly efficient compiled code in real-time. Due to its dependencies, compiling it can be a challenge. To experiment with Numba, I recommend using a local installation of Anaconda, the free cross-platform Python distribution which includes Numba and all its prerequisites within a single easy-to-install package.\n", "\n", "Numba is extremely simple to use. We just wrap our python function with autojit (JIT stands for \"just in time\" compilation) to automatically create an efficient, compiled version of the function:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from numba import jit" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "rbf_network_numba1=jit(rbf_network)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The slowest run took 8.29 times longer than the fastest. This could mean that an intermediate result is being cached.\n", "1 loop, best of 3: 32.1 ms per loop\n" ] } ], "source": [ "%timeit rbf_network_numba1(X, beta, theta)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from math import exp\n", "import numpy as np\n", "from numba import jit,autojit\n", "\n", "@autojit\n", "def rbf_network_numba2(X, beta, theta):\n", "\n", " N = X.shape[0]\n", " D = X.shape[1]\n", " Y = np.zeros(N)\n", "\n", " for i in range(N):\n", " for j in range(N):\n", " r = 0\n", " for d in range(D):\n", " r += (X[j, d] - X[i, d]) ** 2\n", " r = r**0.5\n", " Y[i] += beta[j] * exp(-(r * theta)**2)\n", "\n", " return Y" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The slowest run took 7.19 times longer than the fastest. This could mean that an intermediate result is being cached.\n", "1 loop, best of 3: 32.1 ms per loop\n" ] } ], "source": [ "%timeit rbf_network_numba2(X, beta, theta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparison between numba and cython here:\n", "\n", "https://jakevdp.github.io/blog/2012/08/24/numba-vs-cython/ \n", "\n", "https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/\n", "\n", "\n", "How to choose:\n", "https://eng.climate.com/2015/04/09/numba-vs-cython-how-to-choose/\n", "\n", "Overview of different optimisation methods:\n", "[quantecon](http://quant-econ.net/py/need_for_speed.html)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Numpy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before considering to use optimisation methods, you should use as much as possible the power of numpy!!!\n", "(Solution provided by Didier Vibert)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "D = 5\n", "N = 1000\n", "X = np.array([np.random.rand(N) for d in range(D)]).T\n", "beta = np.random.rand(N)\n", "theta = 10" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from numpy.linalg import norm\n", "\n", "def rbf_network_vec(X, beta, theta):\n", "\n", " Y = np.sum( beta *np.exp(-(norm(X[np.newaxis,:,:]-X[:,np.newaxis,:],axis=-1) * theta)**2 ), axis=1)\n", " \n", " return Y" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10 loops, best of 3: 52.9 ms per loop\n" ] } ], "source": [ "%timeit rbf_network_vec(X, beta, theta)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }