Wes on 2015-10-02: The performance issue I found in NumPy has been fixed, but the pandas workaround is still faster by 2x or more.

Getting a 1-dimensional `ndarray` of object dtype containing Python tuples is, unless I'm missing something, rather difficult. Take this simple example:

```
In [1]: tuples = zip(range(100000), range(100000))

In [2]: arr = np.array(tuples, dtype=object, ndmin=1)

In [3]: arr
Out[3]:
array([[0, 0],
       [1, 1],
       [2, 2],
       ...,
       [99997, 99997],
       [99998, 99998],
       [99999, 99999]], dtype=object)

In [5]: arr.ndim
Out[5]: 2
```
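The shape inference at fault here is easy to see on a small input (sketched below under Python 3, hence the `list(zip(...))`): when every tuple has the same length, `np.array` unpacks the sequence into rows of a 2-D array even with `dtype=object`, and only ragged tuples defeat the inference.

```python
import numpy as np

# Equal-length tuples: np.array infers a 2-D shape and unpacks
# each tuple into a row, even though dtype=object was requested.
tuples = list(zip(range(5), range(5)))
arr = np.array(tuples, dtype=object)
print(arr.shape)  # 2-D: the tuples have been unpacked

# Ragged tuples defeat the shape inference, so each element
# stays an intact Python tuple in a 1-D object array.
ragged = np.array([(0, 0), (1, 1, 1)], dtype=object)
print(ragged.shape)
print(type(ragged[0]))
```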

OK, that didn't work so well. The only way I've figured out how to get what I want is:

```
In [6]: arr = np.empty(len(tuples), dtype='O')

In [7]: arr[:] = tuples

In [8]: arr
Out[8]:
array([(0, 0), (1, 1), (2, 2), ...,
       (99997, 99997), (99998, 99998), (99999, 99999)], dtype=object)
```

Yahtzee. But the kids aren't alright:

```
In [9]: timeit arr[:] = tuples
10 loops, best of 3: 133 ms per loop
```

Maybe it's just me, but that strikes me as outrageously slow. Someday I'll look at what's going on under the hood, but for now a quickie Cython function comes to the rescue:

```cython
def list_to_object_array(list obj):
    '''
    Convert list to object ndarray. Seriously
    can't believe I had to write this function
    '''
    cdef:
        Py_ssize_t i, n
        ndarray[object] arr
    n = len(obj)
    arr = np.empty(n, dtype=object)
    for i from 0 <= i < n:
        arr[i] = obj[i]
    return arr
```
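For readers without Cython handy, the same idea works in plain Python (the function name is just mine here, not part of any library): allocate the object array up front, then fill it one element at a time, so NumPy never gets a chance to reinterpret the tuples as rows.

```python
import numpy as np

def list_to_object_array(obj):
    # Pure-Python equivalent of the Cython loop: pre-allocate an
    # object array, then assign element by element. Each slot ends
    # up holding the tuple itself, and the result stays 1-D.
    arr = np.empty(len(obj), dtype=object)
    for i, item in enumerate(obj):
        arr[i] = item
    return arr

tuples = list(zip(range(5), range(5)))
arr = list_to_object_array(tuples)
print(arr.ndim)      # 1-dimensional, as desired
print(type(arr[0]))  # elements are still tuples
```

This is of course much slower than the Cython version for large lists, since the loop runs in the interpreter, but it produces the same array.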

You would hope this is faster, and indeed it's about **85x** faster:

```
In [12]: timeit arr = lib.list_to_object_array(tuples)
1000 loops, best of 3: 1.56 ms per loop
```

Scratching my head here, but I'll take it. I suspect there might be some object copying going on under the hood, anyone know?
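One quick data point on the copying question: whatever the slow path is doing, the loop-based fill at least stores references rather than copies, so each slot of the object array is the very same tuple object from the input list.

```python
import numpy as np

# Fill an object array in a loop and check identity, not equality:
# each slot holds a reference to the original tuple, so no
# per-element copy is made on this path.
tuples = list(zip(range(5), range(5)))
arr = np.empty(len(tuples), dtype=object)
for i, t in enumerate(tuples):
    arr[i] = t
print(arr[0] is tuples[0])  # identical objects, not copies
```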