Talk at Rice Stats on structured data analysis, pandas, 11/21/2011

I had the privilege of speaking today at Rice at Hadley Wickham's (of R fame) invitation. I talked broadly about the problems faced in structured data manipulation and how I've worked to address them in pandas. I enjoyed speaking with Hadley (whom I'd never met before in person) about these problems, as we've come up with nice solutions to many of them independently, and in different languages (R and Python, respectively). A lot of folks assume that parts of pandas are inspired by his plyr and reshape2 packages, but I've only started playing around with them recently. I think we all have a lot to learn from each other on the road toward building even better high-performance, easy-to-use, and expressive data manipulation tools.

  • John Marino

    Looking forward to your book.

  • Johnlinuxuser

    Pandas sounds interesting for its R-like data frame structure, but I have a simple question about it. With an R data frame, it is easy to append one or more rows, or to set an element at a specific location in a new row. An example is shown below.

    df1 =

    V1 V2  V3
    11 e1  23
     2 e2 232

    and I want to add a new row, df2, like the one below:
    a = 3
    b = e3
    c = 345

    df2 = (a, b, c), or c(a, b, c) in R

    In R, I can simply call rbind(df1, df2), or assign df1[3,1] = a; df1[3,2] = b; df1[3,3] = c. How do I do this with a pandas DataFrame? Thanks.

    John

    Wes McKinney Reply:

    See http://pandas.sourceforge.net/merging.html#appending-dataframe-objects
    or http://pandas.sourceforge.net/dsintro.html#column-selection-addition-deletion
    or http://pandas.sourceforge.net/merging.html#joining-merging-dataframes

    I'm actually going to add more functionality for adding rows in the future; it's all pretty easy to implement.
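    The rbind-style append John asks about can be sketched on current pandas with pd.concat, which replaced DataFrame.append (removed in pandas 2.0). This is a minimal sketch assuming pandas 1.x or later; the names df1, df2, and combined simply follow the question and are illustrative.

```python
import pandas as pd

# df1 from the question above
df1 = pd.DataFrame({"V1": [11, 2], "V2": ["e1", "e2"], "V3": [23, 232]})

# A one-row DataFrame with the same column labels plays the role of df2
df2 = pd.DataFrame([[3, "e3", 345]], columns=df1.columns)

# Equivalent of R's rbind(df1, df2); ignore_index=True renumbers rows 0..n-1
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
```

    Like rbind, pd.concat returns a new object rather than modifying df1 in place.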

    Johnlinuxuser Reply:

    Thanks, but those instructions seem to cover column addition. I am still wondering how to add a single new row. Could you please post some simple code that adds a new row using the example data I posted above (a, b, c)? Thanks again.

    Wes McKinney Reply:

    I wouldn't recommend doing that in practice (you would be better off accumulating a Python list of tuples or lists, then passing it to DataFrame.from_records to convert it to a DataFrame), but assuming you absolutely must append a single record (which is just as costly with pandas as it is in R, since it requires creating an entirely new object), here is a way to do it:

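    from pandas import DataFrame
    def append_record(df, row):
        # wrap the single record in a one-row DataFrame with df's column names
        dummy_df = DataFrame.from_records([row], names=df.columns)
        # append builds a new object; ignore_index renumbers the rows 0..n-1
        return df.append(dummy_df, ignore_index=True)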

    In [52]: df
    Out[52]:
       V1  V2   V3
    0  11  e1   23
    1   2  e2  232

    In [53]: row
    Out[53]: [3, 'e3', 345]

    In [54]: append_record(df, row)
    Out[54]:
       V1  V2   V3
    0  11  e1   23
    1   2  e2  232
    2   3  e3  345

    I'm somewhat hesitant to add something like this to the API, though I might.
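    The list-then-convert pattern recommended above can be sketched on current pandas as follows. This is a minimal sketch assuming pandas 1.x or later, where DataFrame.from_records takes its column labels via columns=; the names records, df, and df2 are illustrative. For the rare one-off case, .loc "setting with enlargement" mirrors R's df1[3,1] = a style of assignment, but it is not cheap either.

```python
import pandas as pd

columns = ["V1", "V2", "V3"]

# Recommended pattern: collect rows in a plain Python list, where appends are cheap...
records = [(11, "e1", 23), (2, "e2", 232)]
records.append((3, "e3", 345))

# ...then convert to a DataFrame once, at the end.
df = pd.DataFrame.from_records(records, columns=columns)

# One-off alternative: assigning to a new row label via .loc enlarges the frame.
# pandas still builds new arrays behind the scenes, so avoid this inside a loop.
df2 = pd.DataFrame.from_records(records[:2], columns=columns)
df2.loc[len(df2)] = [3, "e3", 345]
```

    Appending to a Python list only touches the list, whereas every DataFrame-level append copies the whole table, which is why collecting rows first scales much better when building a frame in a loop.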