Performance quirk: making a 1D object ndarray of tuples

Getting a 1-dimensional ndarray of object dtype containing Python tuples is, unless I’m missing something, rather difficult. Take this simple example:

In [1]: tuples = list(zip(range(100000), range(100000)))
In [2]: arr = np.array(tuples, dtype=object, ndmin=1)
In [3]: arr
Out[3]:
array([[0, 0],
       [1, 1],
       [2, 2],
       ...,
       [99997, 99997],
       [99998, 99998],
       [99999, 99999]], dtype=object)

In [5]: arr.ndim
Out[5]: 2

OK, that didn’t work so well: NumPy treats each tuple as a nested sequence and unpacks it into a second dimension. The only way I’ve figured out to get what I want is:

In [6]: arr = np.empty(len(tuples), dtype='O')
In [7]: arr[:] = tuples
In [8]: arr
Out[8]:
array([(0, 0), (1, 1), (2, 2), ..., (99997, 99997), (99998, 99998),
       (99999, 99999)], dtype=object)
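
For anyone who wants to try this outside an IPython session, here is a minimal, self-contained sketch of the two behaviours shown so far. The helper name `tuples_to_object_array` is made up for illustration; it only wraps the empty-then-assign trick and makes no attempt to be fast.

```python
import numpy as np


def tuples_to_object_array(tuples):
    """Hypothetical helper wrapping the np.empty + slice-assignment trick."""
    tuples = list(tuples)  # materialize iterators such as zip() on Python 3
    arr = np.empty(len(tuples), dtype=object)
    # The destination is 1D, so each tuple is stored whole as one element.
    arr[:] = tuples
    return arr


pairs = list(zip(range(5), range(5)))

# np.array unpacks the nested tuples into a second dimension.
unpacked = np.array(pairs, dtype=object, ndmin=1)

# The helper keeps every tuple intact in a 1D object array.
packed = tuples_to_object_array(pairs)

print(unpacked.shape, packed.shape)
```

Note that `ndmin=1` only sets a lower bound on the number of dimensions, so it does nothing to stop the tuples from being unpacked.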

Yahtzee. But the kids aren’t alright:

In [9]: timeit arr[:] = tuples
10 loops, best of 3: 133 ms per loop

Maybe it’s just me, but that strikes me as outrageously slow. Someday I’ll look at what’s going on under the hood, but for now a quick Cython function comes to the rescue:

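import numpy as np
cimport numpy as cnp
from numpy cimport ndarray

cnp.import_array()

def list_to_object_array(list obj):
    '''
    Convert list to object ndarray.
    Seriously can't believe I had to write this function
    '''
    cdef:
        Py_ssize_t i, n
        ndarray[object] arr

    n = len(obj)
    # Allocate the 1D object array up front, then fill it element by
    # element so NumPy never gets the chance to unpack the tuples.
    arr = np.empty(n, dtype=object)
    for i in range(n):
        arr[i] = obj[i]
    return arr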

You would hope this is faster, and indeed it’s about 85x faster:

In [12]: timeit arr = lib.list_to_object_array(tuples)
1000 loops, best of 3: 1.56 ms per loop
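
If you don’t have a compiled extension handy, a rough way to re-run this kind of comparison is with the standard `timeit` module. The sketch below times the slice assignment against a plain Python loop. The loop is my stand-in for the Cython function and is not the `lib.list_to_object_array` timed above. The function names are made up, and the numbers depend on your NumPy version and machine, so none are shown here.

```python
import timeit

import numpy as np


def fill_by_slice(tuples):
    """Fill a 1D object array with one slice assignment."""
    arr = np.empty(len(tuples), dtype=object)
    arr[:] = tuples
    return arr


def fill_by_loop(tuples):
    """Fill a 1D object array one element at a time (pure Python stand-in)."""
    arr = np.empty(len(tuples), dtype=object)
    for i, item in enumerate(tuples):
        arr[i] = item
    return arr


tuples = list(zip(range(100000), range(100000)))

# Best of three single runs for each approach, in seconds.
slice_time = min(timeit.repeat(lambda: fill_by_slice(tuples), number=1, repeat=3))
loop_time = min(timeit.repeat(lambda: fill_by_loop(tuples), number=1, repeat=3))

print("slice assignment: %.2f ms" % (slice_time * 1e3))
print("python loop:      %.2f ms" % (loop_time * 1e3))
```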

Scratching my head here, but I’ll take it. I suspect there might be some object copying going on under the hood. Does anyone know?