What do serious statisticians use for doing their work? Many of them use R.

R is an interactive programming environment designed for data analysis. It has its own language (which, confusingly, is sometimes called S for historical reasons), its own large library of basic and statistical functions, its own quality-controlled repository of contributed libraries, and its own interactive shell with integrated plotting. In its own domain, it is as complete a working language as Python, Perl or PHP. (It is certainly more mature than JavaScript!)

Here are some features of the language to get programmers excited:

Functions are objects.
rmean = function(x=50) mean(rnorm(x))

Inline anonymous functions are easy.
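library(boot)  # boot() lives in the "boot" package
boot(data, function(d, i) mean(d[i]), R = 1000)  # statistic receives the data and the resampled indices; data is assumed to be a numeric vector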
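
As a small sketch of the two points above: a function can be stored in a variable, handed to another function, or written inline where it is needed. The names `square` and `apply_twice` are invented for this illustration; `sapply` is from base R.

```r
# Store a function in a variable (hypothetical name).
square <- function(x) x^2

# A function that takes another function as an argument (hypothetical name).
apply_twice <- function(f, x) f(f(x))

apply_twice(square, 3)              # 81

# An anonymous function passed inline to base R's sapply().
sapply(1:3, function(i) i * 10)     # 10 20 30
```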

Numbers are always vectors (a single number is just a vector of length one).
mean(1) == 1

Arithmetic is vectorized.
c(1,2,3) * 2 == c(2,4,6)

Boolean operations are vectorized.
(c(1,2,3) == 3) ==  c(FALSE, FALSE, TRUE)
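
One practical consequence of vectorized comparisons, sketched here with made-up variable names (`x`, `big`): the logical vector that a comparison returns can be used directly as an index, and `all()` is needed when a single TRUE/FALSE answer is wanted.

```r
x <- c(1, 2, 3, 4, 5)

# A vectorized comparison yields a logical vector...
big <- x > 2                        # FALSE FALSE TRUE TRUE TRUE

# ...which can index the original vector directly.
x[big]                              # 3 4 5

# all() collapses an element-wise comparison to a single value.
all(c(1, 2, 3) * 2 == c(2, 4, 6))   # TRUE
```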

Object-oriented support with a simple system of generic functions that dispatch on an object's class (known as S3).
df = data.frame(c(1,2,3))
class(df) == "data.frame"
print.data.frame(df)
print(df) #same as previous because of method lookup
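
The same method lookup works for your own classes. Below is a minimal sketch, assuming a made-up class called `temperature`; the constructor and field names are hypothetical, and only `structure`, `cat`, `invisible` and the `print` generic come from base R.

```r
# Constructor for a made-up class "temperature" (hypothetical name).
temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}

# print() is a generic; defining print.temperature adds a method for our class.
print.temperature <- function(x, ...) {
  cat("Temperature:", x$celsius, "C\n")
  invisible(x)
}

t1 <- temperature(21.5)
class(t1)    # "temperature"
print(t1)    # dispatches to print.temperature
```

Nothing has to be registered anywhere: the method is found purely by the `generic.class` naming convention, which is why `print(df)` ends up in `print.data.frame`.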

Code is an object too: a function can see the expression it was called with.
plot(c(1,2,3))
# the y-axis label shows "c(1, 2, 3)"... so cool!
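
The axis label trick can be reproduced in a few lines. This is only a sketch with a hypothetical helper, `show_expr`; it uses base R's `substitute` and `deparse`, which is, as far as I know, the same mechanism base graphics relies on for default labels.

```r
# Hypothetical helper: report the code the caller wrote, not just its value.
show_expr <- function(x) {
  expr_text <- deparse(substitute(x))   # the unevaluated argument, as text
  paste(expr_text, "=", sum(x))
}

show_expr(c(1, 2, 3))    # "c(1, 2, 3) = 6"
```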

More excitement: Django + R graphing.

 


Comments

Nick

Thu, 06 Nov 2008 02:00:51

I'm involved in a bioinformatics research project studying the prediction of transcription factor binding sites in tuberculosis and we use R almost exclusively for all of our statistical analysis. It's easy to use; I don't know why it isn't used in every stats class (even in HS). R is a major boon for stats and I hope more people can find time to use it and we can all get a better perspective of analysis.

 

anonymous

Thu, 06 Nov 2008 04:43:48

I worked a little with javascript (ecmascript) over the years. And believe me, R is nowhere near as mature.
The S dialect is terribly underdefined at times and there is no real specification. It is a language which was developed in a niche without a review process and judging from its semantics I'd say the developers had no previous experience building languages.

 

anon

Thu, 06 Nov 2008 06:05:32

R is easy to learn or use? Heh. I will give you "powerful for stats", that's about it.

 

Quinn

Thu, 06 Nov 2008 08:02:43

We used this in an advanced stats class in uni and it had some students near tears... it's not easy to learn if you don't come from a programming background. However, as I recall, it was fairly powerful.

 

Matt

Thu, 06 Nov 2008 09:27:23

This is the first I'm hearing of "R". Please excuse my ignorance, but how is this better than MATLAB? I'm pretty sure MATLAB could easily be called a standard in the engineering and math worlds, it does everything you've mentioned and so, so much more. Is "R" better because MATLAB is proprietary and not cheap?

 

olli

Thu, 06 Nov 2008 09:27:28

more mature than javascript? be serious.

 

dietbrisk

Thu, 06 Nov 2008 09:35:07

looks kind of like matlab to me...

 

O

Thu, 06 Nov 2008 09:53:36

"Is "R" better because MATLAB is proprietary and not cheap?"

No, that's the reason Octave is better than Matlab. :-)

 

jrcoyle

Thu, 06 Nov 2008 10:24:59

@anonymous "I'd say the developers had no previous experience building languages"

That's pretty rich. Ever heard of Bell Labs?

http://en.wikipedia.org/wiki/Bell_Labs

 

Antonio Di Narzo

Thu, 06 Nov 2008 10:30:32

"Is "R" better because MATLAB is proprietary and not cheap?"

First of all, R is Open Source, not just 'cheap'. This is an evident added value for academic settings. I cannot believe you don't see it.
Moreover, R is gaining a lot of ground in industrial settings too (especially in the pharma industry), where the cost of a bunch of software licenses isn't an issue at all: here the ease with which one can customize, redistribute, etc. the software is a strong strategic advantage over a proprietary solution.
Public institutions too have long-term, strategic advantages in using an open-source solution for the same reasons. But this is a comparison of the merits of open source vs proprietary software, not specifically targeted at Matlab vs R.

Second, R is less "general purpose" than matlab. While this is not a benefit in _general_, it means the computing environment is strongly focused on statistics. So, huge built-in support for statistical modelling, diagnostics and so forth. As far as what matters to a statistician goes, I think Matlab is currently better in machine learning stuff, and has better performance for bootstrap algorithms.

One last word: I can't understand these somewhat fanatic replies. The article just says R is nice, and that a lot of statisticians love to use it. Nobody was saying: "hey, stop using matlab, R is superior!". I definitely think that one can be productive in both environments. If matlab is doing fine for your work, better to keep with it, that's for sure.

Just my opinion.

 

bla

Thu, 06 Nov 2008 10:50:29

R is open source. In academia and some enterprises it is *very* important to be able to prove that you are not just blindly using an off-the-shelf tool and relying on the results, but that your tool is without fault (for scientific/legal reasons).

Open Source allows you to do just that (within certain limits of course).

 

Matt

Thu, 06 Nov 2008 13:26:39

Thanks for the info!

I really never heard of R before and I didn't see where in the article it mentioned that it was open source.

The only reason I brought up MATLAB was to get an honest comparison, not to start a war!

I'll certainly check out R and keep it in mind for future statistical work.

 

Thu, 06 Nov 2008 13:41:48

Nice writeup, check out statsinthewild.wordpress.com, I know that gregg (two g's) is a big fan of R and has done some tremendous work for us at messagesling.com with it.

 

Todd

Thu, 06 Nov 2008 14:00:46

I use R all the time (nearly every day), and have come to the conclusion that it's a fantastic tool for data exploration, analysis, modeling, and plotting but is a horrible programming language. It's inconsistent in everything from naming to functionality and is clunky to coerce into any kind of straightforward OO paradigm. It's missing simple data structures like hashes, though you can sort of get that functionality with lists through some horrible syntax.

At the end of the day, it's just a tool in my toolbox. I use C for things that have to run fast and efficiently, ruby for pretty much everything else, and R to analyze output and draw pretty graphs.

Finally, one useful link: http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html

 

Sat, 08 Nov 2008 07:47:02

Where I see a lot of interest and movement with R is in integration with other products. SPSS has integrated R into their desktop products.

WPS (a SAS/Base alternative) also has an integration model available where you can use WPS's SAS language to handle large volumes of data and R for advanced statistical analysis. Just a note, my company resells WPS and built the R integration module.

 


