Archives for Wes McKinney

Feather format update: Whence and Whither?

Apache Arrow and the "10 Things I Hate About pandas"

Making Smart Phones Dumb Again

Software patents are evil, but BSD+Patents is probably not the solution

Extreme IO performance with parallel Apache Parquet in Python

Streaming Columnar Data with Apache Arrow

Development update: High speed Apache Parquet in Python with Apache Arrow

Native Hadoop file system (HDFS) connectivity in Python

From Arrow to pandas at 10 Gigabytes Per Second

2017 Outlook: pandas, Arrow, Feather, Parquet, Spark, Ibis

Kinesis Advantage2: Impressions

GitHub's one-dimensional view of open source contributions

Kinesis Savant Elite 2 Foot pedals

Feather: it's about metadata

Feather and Apache Arrow: Grokking file formats vs. in-memory representations

Rejoinder: the problem with conda-forge right now

conda-forge and PyData's CentOS moment

On Software Demos and Potemkin Villages

Avoid unsigned integers in C++ if you can

Compiling DataFrame code is harder than it looks

Do average consumers still need Dropbox?

Why pandas users should be excited about Apache Arrow

Analyzing Interactive Brokers XML Flex Statements with pandas

The problem with the data science language wars

Spying on instance methods with Python's mock module

Don't sell on Amazon

What’s changed

Thoughts on joining Cloudera

Strata NYC 2013 and PyData 2013 Talks

PyCon Singapore 2013

I'm moving to San Francisco. And hiring

Whirlwind tour of pandas in 10 minutes

Update on upcoming pandas v0.10, new file parser, other performance wins

A new high performance, memory-efficient file parser engine for pandas

Intro to Python for Financial Data Analysis at General Assembly

Easy, high performance time zone handling in pandas 0.8.0

Mastering high performance data algorithms I: Group By

A O(n log n) NA-friendly time series "as of" using array operations

The need for an embedded array expression compiler for NumPy

vbench Lightning Talk Slides from PyCon 2012

Even easier frequency tables in pandas 0.7.0

Contingency tables and cross-tabulations in pandas

NYCPython 1/10/2012: A look inside pandas design and development

High performance database joins with pandas DataFrame, more benchmarks

Some pandas Database Join (merge) Benchmarks vs. R base::merge

Introducing vbench, new code performance analysis and monitoring tool

Formatting DataFrame as HTML

Talk at Rice Stats on structured data analysis, pandas, 11/21/2011

pandas talk at PyHPC 2011 workshop in SC11, thoughts on hash tables

Filtering out duplicate pandas.DataFrame rows

PyHPC 2011 Pre-print paper on pandas

Fast and easy pivot tables in pandas 0.5.0

Performance quirk: making a 1D object ndarray of tuples

Python for Financial Data Analysis with pandas

Speeding up pandas's file parsers with Cython

Python, R, and the allure of magic

The pandas escaped the zoo: Python's pandas vs. R's zoo benchmarks

Faster time series alignment / joins for pandas, beating R's xts package

NumPy indexing peculiarities

NYC Open Statistical Programming Meetup on 9/14/2011

GroupBy-fu: improvements in grouping and aggregating data in pandas

A Roadmap for Rich Scientific Data Structures in Python

SciPy 2011 Conference Highlights

Adventures in Aggregating Data