Why pandas users should be excited about Apache Arrow

I'm super excited to be involved in the new open-source Apache Arrow community initiative. For Python (and R, too!), it will help enable:

  • Substantially improved data access speeds
  • Closer-to-native performance in Python extensions for big data systems like Apache Spark
  • New in-memory analytics functionality for nested / JSON-like data

There are plenty of places where you can learn more about Arrow, but this post is about how it's specifically relevant to pandas users. See, for example: