Wes on 2015-10-02: The performance issue I found in NumPy has been fixed, but the pandas workaround is still faster by 2x or more.
Getting a 1-dimensional ndarray of object dtype containing Python tuples is, unless I'm missing something, rather difficult. Take this simple example:
    In : tuples = list(zip(range(100000), range(100000)))
    In : arr = np.array(tuples, dtype=object, ndmin=1)
    In : arr
    Out:
    array([[0, 0],
           [1, 1],
           [2, 2],
           ...,
           [99997, 99997],
           [99998, 99998],
           [99999, 99999]], dtype=object)
    In : arr.ndim
    Out: 2
OK, that didn't work so well. The only way I've figured out how to get what I want is:
    In : arr = np.empty(len(tuples), dtype='O')
    In : arr[:] = tuples
    In : arr
    Out:
    array([(0, 0), (1, 1), (2, 2), ..., (99997, 99997), (99998, 99998),
           (99999, 99999)], dtype=object)
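For anyone who wants to try this outside an IPython session, here is a minimal sketch of the same empty-then-assign idiom wrapped in a function. The helper name tuples_to_object_array is hypothetical (it is not a NumPy or pandas function), and the small 5-element input is only there to keep the example quick.

```python
import numpy as np


def tuples_to_object_array(values):
    """Return a 1-D object ndarray whose elements are the items of `values`.

    Hypothetical helper name; it only wraps the empty-then-assign idiom.
    """
    values = list(values)  # also accepts an iterator such as zip() on Python 3
    arr = np.empty(len(values), dtype=object)
    arr[:] = values  # fill the preallocated 1-D array, one item per slot
    return arr


pairs = list(zip(range(5), range(5)))

# np.array treats each tuple as a row, so the result is 2-D even with dtype=object
two_d = np.array(pairs, dtype=object)

# Preallocating fixes the shape up front, so each tuple stays a single element
one_d = tuples_to_object_array(pairs)
```

Preallocating with np.empty pins the array to one dimension before NumPy gets a chance to guess a shape from the nested sequence, which is why the tuples survive as elements.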
Yahtzee. But the kids aren't alright:
    In : timeit arr[:] = tuples
    10 loops, best of 3: 133 ms per loop
Maybe it's just me, but that strikes me as outrageously slow. Someday I'll look at what's going on under the hood, but a quickie Cython function comes to the rescue:
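    import numpy as np
    from numpy cimport ndarray

    def list_to_object_array(list obj):
        '''
        Convert list to object ndarray.
        Seriously can't believe I had to write this function
        '''
        cdef:
            Py_ssize_t i, n
            ndarray[object] arr

        n = len(obj)
        arr = np.empty(n, dtype=object)

        # copy the elements over one at a time; i is a C integer,
        # so Cython compiles this to a plain C loop
        for i in range(n):
            arr[i] = obj[i]

        return arr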
You would hope this is faster, and indeed it's about 85x faster:
    In : timeit arr = lib.list_to_object_array(tuples)
    1000 loops, best of 3: 1.56 ms per loop
Scratching my head here, but I'll take it. I suspect there might be some object copying going on under the hood. Anyone know?
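If you want to check how this behaves on your own NumPy build, a rough timing harness along these lines should do. It is only a sketch: python_loop is a pure-Python stand-in for the Cython function (same logic, no static typing), both function names are made up for this example, and the numbers will depend on your NumPy version and machine, so don't expect them to match the ones above.

```python
import timeit

import numpy as np


def slice_assign(values):
    # The empty-then-assign approach timed earlier in the post
    arr = np.empty(len(values), dtype=object)
    arr[:] = values
    return arr


def python_loop(values):
    # Pure-Python stand-in for the Cython loop: copy one element at a time
    n = len(values)
    arr = np.empty(n, dtype=object)
    for i in range(n):
        arr[i] = values[i]
    return arr


tuples = list(zip(range(100000), range(100000)))

for func in (slice_assign, python_loop):
    # Best of 3 repeats, 5 calls each, reported per call
    best = min(timeit.repeat(lambda: func(tuples), number=5, repeat=3)) / 5
    print("%s: %.2f ms per call" % (func.__name__, best * 1e3))
```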