le_trombone in statisticians

I'm Not Sure That I Trust My Own Reaction To This

Open source R in commercial Revolution, and Revolution lets R do stats on big data (the second link references the first).

From the first link: "Open source purists probably won't be all too happy to learn that Revolution is going to be employing an "open core" strategy, which means the core R programs will remain open source and be given tech support under a license model, but the key add-ons that make R more scalable will be closed source and sold under a separate license fee."

I'm not a purist, but the license fee described in the second link is ... impressive.

Comments

Yeah ...

... I'd be lying if I said this doesn't make me nervous. There's no question that Red Hat has been good for Linux, but there are other distros which are at least as capable as Red Hat's commercial offerings and which don't cost a dime. Will the same be true for Revolution and R?

In particular, this bit gives me chills:

... Revolution is going to be employing an "open core" strategy, which means the core R programs will remain open source and be given tech support under a license model, but the key add-ons that make R more scalable will be closed source and sold under a separate license fee. Because most of those 2,500 add-ons for R were built by academics and Revolution wants to supplant SPSS and SAS as the tools used by students, Revolution will be giving the full single-user version of the R Enterprise stack away for free to academics.

Oookay. Does this mean CRAN in its current form will be going offline? Because that would (a) most likely be illegal since most CRAN packages are distributed under the GPL, and were put up by authors with the understanding that they would be distributed under that standard open source model, and (b) completely kill the advantages R currently enjoys over SAS, SPSS, etc.

There's a comment from Ross Ihaka attached to the first story. He doesn't sound happy.

Also:

Smith says that there are a number of problems with R that need to be addressed to help it go more mainstream. For one thing, he says that while R has a number of different graphical interfaces available, it is still fundamentally driven through a command line interface.

[Beavis] Uh ... uh ... uh ... [/Beavis]

It's the classic "embrace and extend" strategy.

They'll start by selling enhancements to R that make it easy to process larger data sets. This is an important capability of SAS that R has lacked for a long time; for most academic research it doesn't matter, but for the many newly emerging applications of R in commercial data mining it could be very useful. I don't have any real problem with that part of their strategy, but I'm not really interested in buying their product, since I've never processed a data set so large that it wouldn't fit into the 12 gigabytes of RAM in my desktop machine.

They'll also give their stuff away free to academics so that it gets written into textbooks and used in classes. Once they've got a sufficient portion of the user base hooked on their offerings, they'll be free to jack up their prices.

That strategy will only work to the extent that leading users of R (particularly academics who teach the next generation of R users) treat these proprietary extensions as a standard part of R. If they ignore the proprietary extensions, the company won't be able to gain control over a captive user base.

The best response from those interested in keeping R free and fully open source is to do the hard work of making R capable of handling data sets that are too large to fit into RAM. Unfortunately, I don't think that's very likely to happen.

Another alternative would be for the companies that use R commercially to pay developers to do the hard work of extending open source R to support very large data sets. Some large corporations (Google and IBM, for example) have supported work on Linux along these lines. I'm not sure that the commercial users of R working on large data sets are a large enough community (and include large enough corporations) to support this model.
