The views and opinions on this website are my opinions and do not reflect the views of my current or former employers.
Since 2007, I have been developing data analysis software, mostly for use in the Python programming language. My primary objectives have been improving user productivity, increasing performance and efficiency, and enhancing data interoperability. I am best known for creating the pandas project and writing the book Python for Data Analysis. Since 2015, I have been focused on the Apache Arrow project. I have also contributed to Apache Kudu (incubating) and Apache Parquet (where I am a PMC member). I was the co-founder and CEO of DataPad. I later spent a couple of years leading efforts to bring Python and Hadoop together at Cloudera. In 2018, I founded Ursa Labs, a not-for-profit open source development group in partnership with RStudio. In 2018, I also became a Member of The Apache Software Foundation.
Open source projects
Apache Arrow
- I am a committer and PMC member for Apache Arrow, focusing on the C++ and Python implementations.
pandas (website): Python in-memory data wrangling, preparation, and analytics
- I created pandas and am its Benevolent Dictator for Life.
Ibis
- I created Ibis at Cloudera.
Feather: a language-agnostic data frame file format
- Hadley Wickham and I designed Feather in January 2016 and released it in March.
Apache Kudu (incubating)
- I originally created the Python interface to Kudu, using Cython to wrap the C++ API.
Apache Parquet
- I am a committer and member of the PMC for Apache Parquet. I have been focusing on the C++ implementation.
statsmodels
- I worked on time series models (e.g. VAR) and pandas integration.
Long form biography
I'm an American computer programmer and the Director of Ursa Labs. I studied theoretical mathematics at MIT (graduating in late 2006) before becoming very interested in programming and tools for data analysis, especially for industry use cases, in 2007.
From August 2007 to July 2010, I worked on the front office quant research team at AQR Capital Management, a large quantitative investment manager in Greenwich, CT. During this time, I led a very successful effort to migrate research and production model-building processes to the Python programming language. I started building pandas on April 6, 2008, as part of a skunkworks effort to reproduce some econometric research in Python. As part of my work, we formed a new Research Development team for the global macro group to drive software innovation in the front office.
I joined the PhD program in the Statistical Science Department at Duke University before taking leave in summer 2011 to explore ways to develop open source software (such as pandas) in a sustainable way. I found that entrepreneurship often makes more sense than consulting as a way to fund open source work with more leverage.
From November 2011 through August 2012, I wrote Python for Data Analysis.
In January 2012, I co-founded Lambda Foundry and we explored developing value-add financial software for the Python data stack. Ultimately the team and I went our separate ways.
In January 2013, I co-founded DataPad with Chang She, a fellow MIT grad and former AQR colleague. We were developing a full-stack visual analytics product for business users, using the Python data stack for most of our core technology. We raised venture capital from Accel Partners, Google Ventures, SV Angel, Andreessen Horowitz, and other investors.
In September 2014, DataPad's technology assets were acquired by Cloudera and we joined the engineering team there.
In April 2018, I collaborated with RStudio and Two Sigma to create Ursa Labs, a not-for-profit open source development group focused on shared infrastructure for data science, powered by Apache Arrow.
I was born in 1985. I grew up mostly in Tennessee and Ohio.