Python, R, and the allure of magic

R is much more magical than Python. What do I mean by this? In R, things like this are a part of everyday life:

> a <- rnorm(10)
> b <- rnorm(10)
> cbind(a, b)
               a          b
 [1,]  0.8729978  0.5170078
 [2,] -0.6885048 -0.4430447
 [3,]  0.4017740  1.8985843
 [4,]  2.1088905 -1.4121763
 [5,]  0.9375273  0.4703302
 [6,]  0.5558276 -0.5825152
 [7,] -2.1606252  0.7379874
 [8,] -0.7651046 -0.4534345
 [9,] -4.2604901  0.9561077
[10,]  0.3940632 -0.8331285

If you’re a seasoned Python programmer, you might have the same visceral negative reaction to this that I do. Seriously, just where in the hell did those variable names come from? So when I say magic here, I’m talking about abusing the language’s parser. Nothing special about R makes the above behavior possible; R simply takes a fundamentally different design philosophy from, say, Python. As any Python programmer knows: Explicit is better than implicit. I happen to agree. There is also a bit of a semantic difference between R and Python: assignment in R typically copies data, whereas variables in Python are simply references (labels) for a particular object. So you could make the argument that in R the names a and b above are more strongly linked to the underlying data.
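To make the "names are just labels" point concrete, here is a small sketch using plain Python lists (the variable names are illustrative only). Assignment binds a second name to the same object, and an independent object only appears when you copy explicitly:

```python
import copy

a = [1.0, 2.0, 3.0]
b = a                  # no copy: b is a second label for the same list
b.append(4.0)          # mutating through b ...
print(a)               # ... is visible through a as well
print(a is b)          # one object, two names

c = copy.copy(a)       # an explicit (shallow) copy is a separate object
c.append(5.0)
print(len(a), len(c))  # a is unaffected by the change to c
```

Since an object can carry any number of names, or none at all, Python has no single "right" name to attach to a column.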

While building pandas over the last several years, I have occasionally grappled with issues like the above. Maybe I should just break from the Python ethos and embrace magic? I mean, how hard would it be to get the above behavior in Python? Python gives you stack frames and the ast module, after all. So I went down the rabbit hole and wrote this little code snippet:
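The sketch below is not the snippet referred to above; it is a minimal illustration of the same idea, assuming CPython (for `inspect.currentframe`) and that the source of the calling line can be read back. The helper name `argument_labels` and the dict return value are hypothetical, chosen to keep the example dependency-free; a real version would presumably build a DataFrame.

```python
import ast
import inspect


def argument_labels(source, func_name):
    """Return the source text of each positional argument of a call to
    `func_name` found in `source`, or None if no such call can be parsed."""
    source = source.strip()
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return None  # e.g. one line of a multi-line statement
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func_name):
            # Slice the original text so "np.log(b)" comes back verbatim.
            return [ast.get_source_segment(source, arg) for arg in node.args]
    return None


def merge(*columns):
    """Hypothetical helper: label each column with the expression the
    caller wrote, falling back to positional labels."""
    caller = inspect.currentframe().f_back
    # code_context is None when the caller's source is unavailable.
    context = inspect.getframeinfo(caller).code_context
    labels = argument_labels(context[0], "merge") if context else None
    if labels is None or len(labels) != len(columns):
        labels = [f"arg{i}" for i in range(len(columns))]
    return dict(zip(labels, columns))
```

Called as `merge(a, np.log(b))` from a script, this should label the columns `a` and `np.log(b)`. When the calling line cannot be recovered or does not parse on its own (a call spread over several lines, for instance), it falls back to `arg0`, `arg1`, and so on, which is a fair hint at how fragile this kind of magic is.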

While this is woefully unpythonic, it’s also kind of cool:

In [27]: merge(a, b)
Out[27]:
            a         b      
2000-01-03 -1.35      0.8398
2000-01-04  0.999    -1.617  
2000-01-05  0.2537    1.433  
2000-01-06  0.6273   -0.3959
2000-01-07  0.7963   -0.789  
2000-01-10  0.004295 -1.446

This can even parse and format more complicated expressions (harder than it looks, because you have to walk the whole AST):

In [30]: merge(a, np.log(b))
Out[30]:
            a        np.log(b)
2000-01-03  0.6243   0.7953  
2000-01-04  0.3593  -1.199    
2000-01-05  2.805   -1.059    
2000-01-06  0.6369  -0.9067  
2000-01-07 -0.2734   NaN      
2000-01-10 -1.023    0.3326

Now, I am *not* suggesting we do this any time soon. I’m going to prefer the explicit approach (cf. the Zen of Python) any day of the week:

In [32]: DataFrame({'a' : a, 'log(b)' : np.log(b)})
Out[32]:
            a        log(b)
2000-01-03  0.6243   0.7953
2000-01-04  0.3593  -1.199  
2000-01-05  2.805   -1.059  
2000-01-06  0.6369  -0.9067
2000-01-07 -0.2734   NaN    
2000-01-10 -1.023    0.3326