Python vs NumPy vs Nim

Yesterday I stumbled upon the article Pure Python vs NumPy vs TensorFlow Performance Comparison, in which the author compares the performance of different implementations of the gradient descent algorithm for a simple linear regression example.

Lately I’ve been experimenting with the Nim programming language, which promises a Python-like, easy-to-read syntax while offering C-like speed. This seemed like a nice and simple example for comparing the speed of Nim and Python.

Python results

As everybody would expect, the article showed that the pure Python version is much slower than the other two, but nobody would write numerical code like that anyway.

NumPy allows us to write code that is both more readable and much faster, as it takes advantage of vectorised operations on NumPy arrays, which usually call optimized C or Fortran code under the hood.

The code from the original article is used without modifications; I have just re-run it on my machine (i7-970 @ 3.20 GHz) to get the baseline values for a later comparison:

Python time: 34.62 seconds
NumPy time: 0.71 seconds

Enter Nim

Based on my previous experience of using both Nim and Python, I knew I could expect Nim to be noticeably faster than (pure) Python. The question is – can Nim compete against NumPy’s speed?

We’ll take a “pure Nim” approach – no array/tensor library – meaning we need to iterate through the arrays element by element, something that is known to be very costly in Python.

Let’s go through our program bit by bit:

import random, times, math

randomize(444)

const
  N = 10_000
  sigma = 0.1
  f = 2 / N
  mu = 0.001
  nEpochs = 10_000

Unlike in Python, multiple modules can be imported on the same line in Nim, and importing a module this way is analogous to from foo import * in Python: all of the module’s public symbols become available unqualified.
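For example, after the import line above, procedures from the math module can be called without any prefix (a small illustration, not part of the program):

import math
echo sqrt(2.0)   # no math. prefix needed

# If qualified access is preferred, Nim offers "from math import nil",
# after which only math.sqrt(2.0) would compile.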

We take the same seed as in the original, and we define all the needed constants. (All indented lines are part of the const block.) Nim is a statically typed language, but the types can be inferred from the values.

Next we need to define vectors x and d (x = np.linspace(0, 2, N) and d = 3 + 2 * x + noise in Python), and we need to do it element by element:

var x, d: array[N, float]
for i in 0 ..< N:
  x[i] = f * i.float
  d[i] = 3.0 + 2.0 * x[i] + sigma * randomNormal()

The ..< operator iterates up to, but not including, the upper limit. (The .. operator would iterate up to and including the limit.)
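A quick illustration of the difference:

for i in 0 ..< 3: echo i   # prints 0, 1, 2
for i in 0 .. 3: echo i    # prints 0, 1, 2, 3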

The thing to notice here is that we cannot mix integers and floats in arithmetic – i needs to be converted to float explicitly. (Nim has UFCS support, so i.float is the same as float(i).)

The randomNormal function, which gives us samples from a Gaussian distribution (np.random.randn in the Python version), is taken from Arraymancer, Nim’s tensor library (still at an early stage, but rapidly developed).
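Its definition is not shown here; to make the program self-contained, a minimal sketch based on the Box–Muller transform would do (an assumed stand-in, not Arraymancer’s actual implementation):

proc randomNormal(mean = 0.0, std = 1.0): float =
  # Sample from a Gaussian distribution via the Box-Muller transform.
  # (A sketch, not Arraymancer's exact code.)
  let
    u1 = max(rand(1.0), 1e-12)   # clamp to avoid ln(0)
    u2 = rand(1.0)
  result = mean + std * sqrt(-2.0 * ln(u1)) * cos(2.0 * PI * u2)

Both rand and the math procedures come from the random and math modules imported at the top.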

The remaining thing to do is to define the gradientDescent function:

proc gradientDescent(x, d: array[N, float], mu: float, nEpochs: int):
                     tuple[w0, w1: float] =
  var
    y: array[N, float]   # predictions, zero-initialized
    err: float
    w0, w1: float        # weights, start at 0.0

  for n in 1 .. nEpochs:
    var grad0, grad1: float

    # accumulate the gradients over all elements
    for i in 0 ..< N:
      err = f * (d[i] - y[i])
      grad0 += err
      grad1 += err * x[i]

    # update the weights
    w0 += mu * grad0
    w1 += mu * grad1

    # recompute the predictions with the new weights
    for i in 0 ..< N:
      y[i] = w0 + w1 * x[i]

  return (w0, w1)

When declared, variables are initialized to their default values (0.0 for floats), which means that var y: array[N, float] is similar to y = np.zeros(N, dtype=float) in Python.
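A small illustration:

var y: array[3, float]
echo y   # prints [0.0, 0.0, 0.0]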

Lastly, we print the values of w0 and w1, to see if the values are close to the ones we expect (w0 = 3, w1 = 2) and measure the time needed for the calculation:

let start = cpuTime()
echo gradientDescent(x, d, mu, nEpochs)
echo "Nim time: ", cpuTime() - start, " seconds"

Nim results

We compile the program in release mode, which turns off runtime checks and turns on the optimizer, and run it:

$ nim c -d:release gradDesc.nim
$ ./gradDesc

(w0: 2.968954757075724, w1: 2.02593328759163)
Nim time: 0.226344 seconds

Version   Time (s)   Speedup vs Python   Speedup vs NumPy
Python    34.62      –                   0.02x
NumPy     0.71       48.76x              –
Nim       0.226      153.19x             3.14x

While NumPy offers significant speedups compared to the pure Python version, Nim manages to improve upon that further, and not just marginally – it is two orders of magnitude faster than Python and, more impressively, three times faster than NumPy.

Nim has managed to deliver on its promise – for this example it offers three times the performance of NumPy, while keeping the code readable.


Discussion on Reddit and Hacker News.

Source files (both .py and .nim) are available here.