R is much more magical than Python. What do I mean by this? In R, things like this are a part of everyday life:
> b <- rnorm(10)
> cbind(a, b)
               a          b
 [1,]  0.8729978  0.5170078
 [2,] -0.6885048 -0.4430447
 [3,]  0.4017740  1.8985843
 [4,]  2.1088905 -1.4121763
 [5,]  0.9375273  0.4703302
 [6,]  0.5558276 -0.5825152
 [7,] -2.1606252  0.7379874
 [8,] -0.7651046 -0.4534345
 [9,] -4.2604901  0.9561077
[10,]  0.3940632 -0.8331285
If you’re a seasoned Python programmer, you might have the same visceral negative reaction to this that I do. Seriously, just where in the hell did those variable names come from? So when I say magic here, I’m talking about abusing the language’s parser. Nothing special about R makes the above behavior possible; R simply takes a fundamentally different design philosophy from, say, Python. As any Python programmer knows: Explicit is better than implicit. I happen to agree.
There is also a semantic difference between R and Python: assignment in R typically copies data, whereas variables in Python are simply references (labels) for a particular object. So you could argue that the names a and b above are more strongly linked to the underlying data.
While building pandas over the last several years, I have occasionally grappled with issues like the above. Maybe I should just break from the Python ethos and embrace magic? I mean, how hard would it be to get the above behavior in Python? Python gives you stack frames and the ast module, after all. So I went down the rabbit hole and wrote this little code snippet:
While this is woefully unpythonic, it’s also kind of cool:
Out[27]:
                   a        b
2000-01-03     -1.35   0.8398
2000-01-04     0.999   -1.617
2000-01-05    0.2537    1.433
2000-01-06    0.6273  -0.3959
2000-01-07    0.7963   -0.789
2000-01-10  0.004295   -1.446
This can even parse and format more complicated expressions (harder than it looks, because you have to walk the whole AST):
Out[30]:
                  a  np.log(b)
2000-01-03   0.6243     0.7953
2000-01-04   0.3593     -1.199
2000-01-05    2.805     -1.059
2000-01-06   0.6369    -0.9067
2000-01-07  -0.2734        NaN
2000-01-10   -1.023     0.3326
Now, I am *not* suggesting we do this any time soon. I’m going to prefer the explicit approach (cf. the Zen of Python) any day of the week:
Out[32]:
                  a   log(b)
2000-01-03   0.6243   0.7953
2000-01-04   0.3593   -1.199
2000-01-05    2.805   -1.059
2000-01-06   0.6369  -0.9067
2000-01-07  -0.2734      NaN
2000-01-10   -1.023   0.3326
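The explicit version behind output like the above amounts to naming the columns yourself. A minimal sketch, assuming pandas and NumPy are installed; the data here is made up (and b is kept positive, so no NaN appears), not the values shown in the output:

```python
import numpy as np
import pandas as pd

# Six business days starting 2000-01-03, matching the index shown above
index = pd.bdate_range("2000-01-03", periods=6)

# Made-up data for illustration only
rng = np.random.default_rng(0)
a = pd.Series(rng.standard_normal(6), index=index)
b = pd.Series(rng.uniform(0.5, 5.0, 6), index=index)

# The column labels are spelled out by hand; nothing is inferred from the caller
df = pd.DataFrame({"a": a, "log(b)": np.log(b)})
print(df)
```

The labels are just strings in a dict, so nothing depends on what the variables happen to be called at the call site.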

Wes McKinney Reply:
September 30th, 2011 at 2:24 am
Indeed, you certainly have the option of being explicit in R. My point was rather that enabling / encouraging “parser abuse” (or “S-expression magic” in Lisp parlance, if you will) leads to badly (or confusingly) designed software. R doesn’t even do a consistent job of it:
> cbind(a, exp(b))
               a
 [1,]  1.3246856 2.3932937
 [2,]  2.2343485 0.6771349
 [3,] -0.3247855 1.3599159
 [4,] -0.2181146 0.3034037
 [5,]  1.5601856 0.9904395
 [6,] -1.3391083 2.1795645
 [7,]  0.7141858 0.2738150
 [8,]  0.7324488 0.6660562
 [9,] -0.9724181 0.6383835
[10,] -0.9690034 0.4292268
So I guess R keeps track of the name when an argument is a plain variable, but not when it is an arbitrary expression like exp(b).
Dirk Eddelbuettel Reply:
September 30th, 2011 at 2:54 am
Still a non-issue, as no experienced R coder would use cbind to create lasting data structures for later (human) consumption. Different strokes for different folks…
Joshua Ulrich Reply:
September 30th, 2011 at 7:26 am
In the case of cbind, you could force explicitness by setting deparse.level=0. “Abuse” is a bit of a harsh description considering the behavior is documented.
Your cbind example is consistent with the documentation, which says deparse.level=1 will only assign a column name if it is “sensible” (a valid name/symbol), which “exp(b)” is not:
> make.names("exp(b)")
[1] "exp.b."
Set deparse.level=2 and R then “does a consistent job of it”. But you will have difficulty using the result of cbind(a,exp(b),deparse.level=2) in a model because the second argument isn’t a valid name. Trade-offs…
Wes McKinney Reply:
September 30th, 2011 at 2:37 pm
Like I said, this kind of stuff is deeply entrenched in the R ethos; you’re welcome to it. I realize that it’s there for domain-specific reasons (making it easier to munge vectors and data.frames together without having to manually assign names to 1d objects). All I was saying is that I reject it as an acceptable design pattern in Python. To me this falls into the category of “give someone an inch and they’ll take a mile”.
I realize I’m being a bit cheeky calling it “parser abuse”, but I mean seriously, using information from the previous stack frame (or worse, the global namespace)? Thinking about it makes me want to wash my hands.