## Even easier frequency tables in pandas 0.7.0

I put in a little work on a new crosstab function in the main pandas namespace. It’s basically a convenient shortcut to calling pivot_table to make it easy to compute cross-tabulations for a set of factors using pandas DataFrame or even vanilla NumPy arrays!

Here’s an example:

Python
 12345678910111213141516171819202122232425262728293031323334353637383940 In [1]: from pandas import * In [2]: df = DataFrame({'a' : np.random.randint(0, 2, size=20), ...: 'b' : np.random.randint(0, 5, size=20), ...: 'c' : np.random.randint(0, 3, size=20)}) In [3]: df Out[3]: a b c 0 0 1 2 1 1 4 1 2 1 4 2 3 1 2 2 4 0 0 0 5 0 0 2 6 1 2 2 7 1 2 0 8 1 1 1 9 1 2 0 10 1 4 1 11 1 1 2 12 0 1 2 13 1 1 0 14 1 4 1 15 1 4 0 16 0 4 1 17 0 3 1 18 1 0 1 19 0 4 1 In [4]: crosstab(df['b'], [df['a'], df['c']]) Out[4]: a 0 1 c 0 1 2 0 1 2 b 0 1 0 1 0 1 0 1 0 0 2 1 1 1 2 0 0 0 2 0 2 3 0 1 0 0 0 0 4 0 2 0 1 3 1

This makes it very easy to produce an easy-on-the-eyes frequency table. crosstab can also take NumPy arrays. Suppose we had 1 million draws from a normal distribution, and we wish to produce a histogram-like table showing the number of draws whose absolute values fall into the bins defined by [0, 1, 2, 3]. Also, let’s divide things up by sign. Using crosstab this becomes dead simple.

Python
 1234567891011121314 In [19]: arr = np.random.randn(1000000) In [20]: buckets = [0, 1, 2, 3] In [22]: crosstab(np.digitize(np.abs(arr), buckets), np.sign(arr), ....: rownames=['bucket'], colnames=['sign']) ....: Out[22]: sign -1.0 1.0 bucket 1 341678 340498 2 136104 135999 3 21424 21607 4 1334 1356

Of course since the result is a DataFrame you can customize the indexes:

Python
 12345678910111213141516171819 In [28]: table = crosstab(np.digitize(np.abs(arr), buckets), ....: np.sign(arr), rownames=['bucket'], ....: colnames=['sign']) ....: In [29]: table.index = Index(['[0, 1)', '[1, 2)', '[2, 3)', ....: '[3, inf)'], name="bin") ....: In [30]: table.columns = Index(['negative', 'positive'], name="sign") In [31]: table Out[31]: sign negative positive bin [0, 1) 341678 340498 [1, 2) 136104 135999 [2, 3) 21424 21607 [3, inf) 1334 1356

• Flyinthesky

Is there any way to just get the frequency counts for a single column? For example, a table of column A in the first example that indicated 0: 7 and 1: 13.

Thanks.

Yes, try `df['A'].value_counts()`

• mittenchops

Does this support crosstabs of array objects? Say df['b'] had contained multiple elements.

I don’t think it does at the moment but could be enabled to do so