## Performance quirk: making a 1D object ndarray of tuples

Getting a 1-dimensional ndarray of object dtype containing Python tuples is, unless I’m missing something, rather difficult. Take this simple example:

```
In [1]: tuples = list(zip(range(100000), range(100000)))
In [2]: arr = np.array(tuples, dtype=object, ndmin=1)
In [3]: arr
Out[3]:
array([[0, 0],
       [1, 1],
       [2, 2],
       ...,
       [99997, 99997],
       [99998, 99998],
       [99999, 99999]], dtype=object)

In [5]: arr.ndim
Out[5]: 2
```
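A minimal sketch of what seems to be happening, shown on a 5-element list in Python 3 (hence the `list(...)` around `zip`). `np.array` walks into nested sequences to discover the array's shape, so each 2-tuple becomes a row along a second axis. The `ndmin` argument only sets a *minimum* number of dimensions, so it cannot prevent that:

```python
import numpy as np

# zip() returns an iterator in Python 3; materialize it so it can be reused
tuples = list(zip(range(5), range(5)))

# np.array() recurses into the nested tuples, so they become a second axis.
# ndmin=1 only guarantees *at least* one dimension; it does not cap the count.
arr = np.array(tuples, dtype=object, ndmin=1)

print(arr.shape)     # (5, 2)
print(type(arr[0]))  # <class 'numpy.ndarray'>: a row of the 2-D array, not a tuple
```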

OK, that didn’t work so well. The only way I’ve figured out how to get what I want is:

```
In [6]: arr = np.empty(len(tuples), dtype='O')
In [7]: arr[:] = tuples
In [8]: arr
Out[8]:
array([(0, 0), (1, 1), (2, 2), ..., (99997, 99997), (99998, 99998),
       (99999, 99999)], dtype=object)
```

Yahtzee. But the kids aren’t alright:

```
In [9]: timeit arr[:] = tuples
10 loops, best of 3: 133 ms per loop
```

Maybe it’s just me, but that strikes me as outrageously slow. Someday I’ll look at what’s going on under the hood, but a quickie Cython function comes to the rescue:

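```cython
import numpy as np
cimport numpy as cnp
from numpy cimport ndarray

cnp.import_array()

def list_to_object_array(list obj):
    '''
    Convert list to object ndarray.
    Seriously can't believe I had to write this function
    '''
    cdef:
        Py_ssize_t i, n
        ndarray[object] arr

    n = len(obj)
    arr = np.empty(n, dtype=object)
    # Assign one element per slot so NumPy never tries to
    # unpack the tuples into an extra dimension
    for i in range(n):
        arr[i] = obj[i]
    return arr
```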

You would hope this is faster, and indeed it’s about 85x faster:

```
In [12]: timeit arr = lib.list_to_object_array(tuples)
1000 loops, best of 3: 1.56 ms per loop
```
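To rerun a comparison like this without IPython or a Cython build, the standard `timeit` module is enough. In the sketch below, `fill_by_slice` and `fill_by_loop` are hypothetical helper names. The loop is a pure-Python stand-in for `list_to_object_array`, so it will be slower than the compiled version, and timings depend heavily on the NumPy version and machine; no particular ratio should be expected:

```python
import timeit

import numpy as np

tuples = list(zip(range(100000), range(100000)))


def fill_by_slice(items):
    # Hypothetical helper: the np.empty + slice-assignment approach
    arr = np.empty(len(items), dtype=object)
    arr[:] = items
    return arr


def fill_by_loop(items):
    # Hypothetical helper: pure-Python version of the Cython loop,
    # storing one element per slot
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        arr[i] = item
    return arr


for fn in (fill_by_slice, fill_by_loop):
    # Best of 3 repeats, 5 calls each, reported per call in milliseconds
    best = min(timeit.repeat(lambda: fn(tuples), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```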

Scratching my head here, but I’ll take it. I suspect some object copying is going on under the hood; does anyone know?
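One way to probe the copying question is to compare object identities: if each array slot holds the very same tuple object as the list, nothing was copied. A minimal sketch in Python 3; it checks the element-wise loop (which stores references) and only reports, without assuming an answer, what slice assignment does on the installed NumPy:

```python
import numpy as np

tuples = list(zip(range(1000), range(1000)))

# Element-wise fill, as in the Cython loop: each slot gets a new reference
# to the existing tuple; the tuple itself is not duplicated
looped = np.empty(len(tuples), dtype=object)
for i, item in enumerate(tuples):
    looped[i] = item

# Slice assignment, as in the original approach
sliced = np.empty(len(tuples), dtype=object)
sliced[:] = tuples

print(all(a is b for a, b in zip(looped, tuples)))  # True
print(all(a is b for a, b in zip(sliced, tuples)))  # result depends on the NumPy build
```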

Comment from Min RK (http://twitter.com/minrk):

Typically when one wants something *like* a 1D array of tuples, one uses a recarray.
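A minimal sketch of the structured-array route suggested above: a structured dtype (or its `np.recarray` form) stores each pair as one record in a 1-D array instead of as a Python tuple object. The field names `x` and `y` are arbitrary choices for illustration, and because the fields are stored as native integers, this fits only when every tuple has the same length and element types:

```python
import numpy as np

tuples = list(zip(range(5), range(5)))

# A structured dtype turns each 2-tuple into one record, giving a 1-D array
rec = np.array(tuples, dtype=[("x", np.int64), ("y", np.int64)])
print(rec.shape)       # (5,)
print(rec[3].item())   # (3, 3): the record converted back to a Python tuple

# np.rec.fromrecords builds a recarray, which also allows attribute access
ra = np.rec.fromrecords(tuples, names="x,y")
print(ra.y[4])         # 4
```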