## The need for an embedded array expression compiler for NumPy

Yesterday I found myself adding some additional Cython methods for doing fast grouped aggregations in pandas. To my disappointment, I ended up duplicating a lot of code, with no real alternative beyond cooking up some kind of ad hoc code generation framework. So here's the problem: given an array of data (possibly with NAs) and an array of integer labels, with the number of distinct labels K known in advance (labels range from 0 to K – 1), aggregate the data by label using some function in {sum, min, max, product, mean, variance, …}. Here is a Cython function to add all the values in a group:

```python
def group_add(ndarray[float64_t] out, ndarray[int32_t] counts,
              ndarray[float64_t] values, ndarray[int32_t] labels):
    cdef:
        Py_ssize_t i, N, lab
        float64_t val, count
        ndarray[float64_t] sumx, nobs

    nobs = np.zeros_like(out)
    sumx = np.zeros_like(out)

    N = len(values)
    for i in range(N):
        lab = labels[i]
        if lab < 0:
            continue

        counts[lab] += 1
        val = values[i]

        # not NA
        if val == val:
            nobs[lab] += 1
            sumx[lab] += val

    for i in range(len(counts)):
        if nobs[i] == 0:
            out[i] = nan
        else:
            out[i] = sumx[i]
```
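For intuition, the same NaN-aware grouped sum can be expressed in pure NumPy with `np.bincount` (this is an illustrative sketch, not the pandas implementation; `group_add_np` is a name I made up):

```python
import numpy as np

def group_add_np(values, labels, ngroups):
    """Grouped sum sketch in pure NumPy (illustrative only).

    values: float64 array, NaN treated as missing
    labels: int array, -1 means "no group"
    """
    # keep only valid labels and non-NaN values
    mask = (labels >= 0) & ~np.isnan(values)
    sums = np.bincount(labels[mask], weights=values[mask], minlength=ngroups)
    nobs = np.bincount(labels[mask], minlength=ngroups)
    # groups with no observations get NaN, matching the Cython code
    return np.where(nobs == 0, np.nan, sums)

values = np.array([1.0, 2.0, np.nan, 4.0])
labels = np.array([0, 0, 1, 1])
result = group_add_np(values, labels, 2)  # sums per group: 1+2 and 4
```

This is clear but makes several passes over the data; the whole point of the Cython version is doing it in one.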

Now suppose we want to take the product of the values:

```python
def group_prod(ndarray[float64_t] out, ndarray[int32_t] counts,
               ndarray[float64_t] values, ndarray[int32_t] labels):
    cdef:
        Py_ssize_t i, N, lab
        float64_t val, count
        ndarray[float64_t] prodx, nobs

    nobs = np.zeros_like(out)
    prodx = np.ones_like(out)

    N = len(values)
    for i in range(N):
        lab = labels[i]
        if lab < 0:
            continue

        counts[lab] += 1
        val = values[i]

        # not nan
        if val == val:
            nobs[lab] += 1
            prodx[lab] *= val

    for i in range(len(counts)):
        if nobs[i] == 0:
            out[i] = nan
        else:
            out[i] = prodx[i]
```

However, the amount of code duplication here is kind of ridiculous; the only differences are the reduction function and the starting values. Ideally I would like to write code that looks like:

```python
def group_reduce(how):
    def reducer(data, labels, out):
        filtered = notna(data)
        grouped = groupby(filtered, labels)
        return reduce(how, grouped, out=out)
    return reducer

group_add = group_reduce(add)
group_prod = group_reduce(multiply)
```
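The `notna`, `groupby`, and `reduce` primitives above are hypothetical, but the closure pattern itself can be approximated today with NumPy's scatter-reduce `ufunc.at` method (a sketch under my own assumptions: I pass `ngroups` instead of a preallocated `out` array, treat NaN as NA, and treat label -1 as "no group"):

```python
import numpy as np

def group_reduce(how, identity):
    """Sketch: `how` is a NumPy ufunc, `identity` its starting value."""
    def reducer(data, labels, ngroups):
        out = np.full(ngroups, identity, dtype=np.float64)
        mask = (labels >= 0) & ~np.isnan(data)
        # scatter-reduce each kept value into its group's slot
        how.at(out, labels[mask], data[mask])
        # groups with no observations become NaN
        nobs = np.bincount(labels[mask], minlength=ngroups)
        out[nobs == 0] = np.nan
        return out
    return reducer

group_add = group_reduce(np.add, 0.0)
group_prod = group_reduce(np.multiply, 1.0)
```

`ufunc.at` is an unbuffered in-place operation, so it handles repeated labels correctly, but it is slow; generating specialized machine code for each `(how, identity)` pair is exactly what's missing.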

IMHO there is plenty of information here to generate C or LLVM IR that is at least as fast as the above code. However, this isn't completely extensible: suppose we wanted to compute the group variance. In that case, what you actually need is:

```python
nobs = reduce(count, grouped)
X = reduce(add, grouped)
XX = reduce(add, grouped * grouped)
variance = (nobs * XX - X**2) / (nobs * (nobs - 1))
```
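That last line is the standard one-pass identity for the sample variance, var = (n·ΣX² − (ΣX)²) / (n(n−1)). It can be checked numerically against NumPy's `var(ddof=1)` per group (a self-contained sketch; the `grouped` pseudocode above is hypothetical, so I use `np.bincount` for the reductions):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=100)
labels = rng.integers(0, 3, size=100)

# the three reductions from the pseudocode, done with bincount
nobs = np.bincount(labels, minlength=3).astype(np.float64)
X = np.bincount(labels, weights=data, minlength=3)
XX = np.bincount(labels, weights=data * data, minlength=3)
variance = (nobs * XX - X**2) / (nobs * (nobs - 1))

# compare with NumPy's per-group sample variance
expected = np.array([data[labels == k].var(ddof=1) for k in range(3)])
assert np.allclose(variance, expected)
```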

The only thing you care about here is the variance result. Ideally, you want the generated code to:

• Make only one pass over the data
• Create no unnecessary temporary variables beyond the X and XX arrays; those full reductions are necessary for computing the variance

Under this framework, something like mean becomes even more trivial:

```python
grouped = groupby(notna(data), labels)
nobs = reduce(count, grouped)
X = reduce(add, grouped)
mean = X / nobs
```
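Concretely, the mean is just the NA-filtered group sum divided by the NA-filtered group count (again a sketch with `np.bincount` standing in for the hypothetical `reduce`/`count` primitives):

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, 5.0, 7.0])
labels = np.array([0, 0, 0, 1, 1])

# notna(data): drop the NaN before both reductions
mask = ~np.isnan(data)
nobs = np.bincount(labels[mask], minlength=2)
X = np.bincount(labels[mask], weights=data[mask], minlength=2)
mean = X / nobs  # group 0: (1 + 3) / 2, group 1: (5 + 7) / 2
```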

Now, I can live with creating a C/Cython code generation framework that lets me write code like the above and get generated code at least as good as something hand-written. But it would be best, of course, if the translation and machine code generation were dynamic, removing the compile step. Is there an existing toolchain that can help with the machine code generation? Perhaps Numba, with its LLVM backend, is the place to do this.

Perhaps the eventual result of something like this is an embedded, JIT-ing, APL-like language with Python syntax running inside CPython. It would be nice to make Numexpr (which is an awesome project!) effectively obsolete while you're at it. In the meantime, I'll keep thinking about the best way forward.

• Stephen O’Gallagher

Numba seems like the right way to go, and your use case could certainly help with its design.

No, I haven’t. Since it mainly just depends on NumPy (which is now part of GAE?), it probably wouldn’t be too difficult for them to include.

• James Bergstra

Hi Wes, what do you think of my Numba + Python hack for this:

https://github.com/ContinuumIO/numba/pull/6

It is about 0.1% feature complete, but I think the eat-cake-and-have-it-too solution is within striking distance.