R is much more magical than Python. What do I mean by this? In R, things like this are a part of everyday life:

```
> a <- rnorm(10)
> b <- rnorm(10)
> cbind(a, b)
               a          b
 [1,]  0.8729978  0.5170078
 [2,] -0.6885048 -0.4430447
 [3,]  0.4017740  1.8985843
 [4,]  2.1088905 -1.4121763
 [5,]  0.9375273  0.4703302
 [6,]  0.5558276 -0.5825152
 [7,] -2.1606252  0.7379874
 [8,] -0.7651046 -0.4534345
 [9,] -4.2604901  0.9561077
[10,]  0.3940632 -0.8331285
```


If you're a seasoned Python programmer, you might have the same visceral negative reaction to this that I do. Seriously, just where in the hell did those column names come from? So when I say magic here, I'm talking about functions that look at the unevaluated expressions passed to them, not just their values: in effect, abusing the language's parser. This behavior owes less to anything special about R than to a fundamentally different design philosophy from, say, Python's. As any Python programmer knows: Explicit is better than implicit. I happen to agree. There is also a semantic difference between R and Python: R has value semantics, so assigning an object to a new name behaves like a copy (R defers the actual copy until one of them is modified), whereas Python variables are simply references (labels) to a particular object. So you could argue that the names a and b above are more strongly linked to the underlying data.
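To make the Python side of that difference concrete, here is a minimal sketch using plain lists (R's copy-on-modify behavior is not shown). Binding a second name to an object does not copy it, so a mutation through one name is visible through the other:

```python
# Python names are labels bound to objects: binding a second name
# does not copy the underlying data.
a = [1, 2, 3]
b = a            # b now refers to the same list object as a
b.append(4)      # mutating through b ...
print(a)         # ... is visible through a: [1, 2, 3, 4]
print(a is b)    # True: one object, two names

# Independent data requires an explicit copy
c = list(a)
c.append(5)
print(a)         # [1, 2, 3, 4]
print(c)         # [1, 2, 3, 4, 5]
```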

While building pandas over the last several years, I have occasionally grappled with issues like this one. Maybe I should just break from the Python ethos and embrace magic? I mean, how hard would it be to get the above behavior in Python? Python gives you stack frames and the ast module, after all. So I went down the rabbit hole and wrote this little code snippet:

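```python
import ast
import inspect

import numpy as np
from pandas import DataFrame, Series, bdate_range


def merge(a, b):
    # Recover the source line that called merge() from the caller's stack
    # frame, and use the argument expressions as column names.
    func, args, _ = parse_stmt(inspect.currentframe().f_back)
    return DataFrame({args[0]: a,
                      args[1]: b})


def parse_stmt(frame):
    # code_context is a list holding the caller's current source line
    info = inspect.getframeinfo(frame)
    line = info.code_context[0].strip()  # drop indentation so ast.parse accepts it
    body = ast.parse(line).body[0]
    # Both `merge(a, b)` and `df = merge(a, b)` keep the call in .value
    if isinstance(body, (ast.Assign, ast.Expr)) and isinstance(body.value, ast.Call):
        return _parse_call(body.value)
    raise ValueError('cannot find a call in %r' % line)


def _parse_call(call):
    func = _maybe_format_attribute(call.func)

    str_args = []
    for arg in call.args:
        if isinstance(arg, ast.Name):
            str_args.append(arg.id)
        elif isinstance(arg, ast.Call):
            str_args.append(_format_call(arg))
        else:
            raise ValueError('unsupported argument: %s' % ast.dump(arg))

    # keyword arguments are not parsed; always return an empty dict
    return func, str_args, {}


def _format_call(call):
    # Rebuild "func(arg1, arg2, key=value)" from a parsed call
    func, args, kwds = _parse_call(call)
    content = ''
    if args:
        content += ', '.join(args)
    if kwds:
        fmt_kwds = ['%s=%s' % item for item in kwds.items()]
        joined_kwds = ', '.join(fmt_kwds)
        if args:
            content = content + ', ' + joined_kwds
        else:
            content += joined_kwds
    return '%s(%s)' % (func, content)


def _maybe_format_attribute(name):
    # A function is either a bare name (merge) or a dotted attribute (np.log)
    if isinstance(name, ast.Attribute):
        return _format_attribute(name)
    return name.id


def _format_attribute(attr):
    # Recursively turn an Attribute node such as np.random.randn into a string
    obj = attr.value
    if isinstance(obj, ast.Attribute):
        obj = _format_attribute(obj)
    else:
        obj = obj.id
    return '.'.join((obj, attr.attr))


# Sample data: two random series on a business-day index.
# (The original used tm.makeTimeSeries() from pandas.util.testing,
# which has since been removed from pandas.)
index = bdate_range('2000-01-03', periods=10)
a = Series(np.random.randn(len(index)), index=index)
b = Series(np.random.randn(len(index)), index=index)
df = merge(a, b)
```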


While this is woefully unpythonic, it's also kind of cool:

```
In [27]: merge(a, b)
Out[27]:
           a         b
2000-01-03 -1.35      0.8398
2000-01-04  0.999    -1.617
2000-01-05  0.2537    1.433
2000-01-06  0.6273   -0.3959
2000-01-07  0.7963   -0.789
2000-01-10  0.004295 -1.446
```
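To see what `parse_stmt` is working with, it helps to look at the AST of the two statement shapes it accepts: a bare expression (as typed at the IPython prompt) and an assignment. In both, the call sits in the statement's `.value`. The `describe` helper below is hypothetical, written only for this illustration:

```python
import ast


def describe(source):
    # Hypothetical helper: report the statement type, the called function's
    # name, and the names of its positional arguments.
    stmt = ast.parse(source).body[0]
    call = stmt.value  # the ast.Call node in both statement shapes
    return type(stmt).__name__, call.func.id, [arg.id for arg in call.args]


print(describe("merge(a, b)"))       # ('Expr', 'merge', ['a', 'b'])
print(describe("df = merge(a, b)"))  # ('Assign', 'merge', ['a', 'b'])
```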


This can even parse and format more complicated expressions (harder than it looks, because you have to walk the whole AST):

```
In [30]: merge(a, np.log(b))
Out[30]:
           a        np.log(b)
2000-01-03  0.6243   0.7953
2000-01-04  0.3593  -1.199
2000-01-05  2.805   -1.059
2000-01-06  0.6369  -0.9067
2000-01-07 -0.2734   NaN
2000-01-10 -1.023    0.3326
```
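On Python 3.9 and later, the standard library can do the AST-walking-and-formatting part for you: `ast.unparse` turns any expression node back into source text, including subscripts and arithmetic that a hand-written Name/Call walker does not handle. A minimal sketch, assuming Python 3.9+; `argument_labels` is a hypothetical name, not a pandas function:

```python
import ast


def argument_labels(source):
    """Return the source text of each positional argument of the call in
    `source`, where `source` is `f(...)` or `x = f(...)`.

    Assumes Python 3.9+ for ast.unparse.
    """
    stmt = ast.parse(source.strip()).body[0]
    if not (isinstance(stmt, (ast.Assign, ast.Expr))
            and isinstance(stmt.value, ast.Call)):
        raise ValueError("expected a call or an assignment of a call: %r" % source)
    # ast.unparse regenerates normalized source for each argument expression
    return [ast.unparse(arg) for arg in stmt.value.args]


print(argument_labels("merge(a, np.log(b))"))       # ['a', 'np.log(b)']
print(argument_labels("df = merge(a[1:], b * 2)"))  # ['a[1:]', 'b * 2']
```

The output is normalized source, so `np.log( b )` would also come back as `np.log(b)`.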


Now, I am *not* suggesting we do this any time soon. I'll take the explicit approach (cf. the Zen of Python) any day of the week:

```
In [32]: DataFrame({'a' : a, 'log(b)' : np.log(b)})
Out[32]:
           a        log(b)
2000-01-03  0.6243   0.7953
2000-01-04  0.3593  -1.199
2000-01-05  2.805   -1.059
2000-01-06  0.6369  -0.9067
2000-01-07 -0.2734   NaN
2000-01-10 -1.023    0.3326
```
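The same explicit labeling carries over to `pd.concat`, which accepts a mapping and uses its keys as the column labels when concatenating Series along `axis=1`. A small sketch with made-up random data, assuming a reasonably recent pandas:

```python
import numpy as np
import pandas as pd

# Made-up sample data on a business-day index
index = pd.bdate_range("2000-01-03", periods=6)
a = pd.Series(np.random.randn(6), index=index)
b = pd.Series(np.random.randn(6), index=index).abs()  # abs() avoids NaN from log of negatives

# The caller chooses the column labels explicitly via the dict keys
df = pd.concat({"a": a, "log(b)": np.log(b)}, axis=1)
print(df.columns.tolist())  # ['a', 'log(b)']
```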