Does anyone know a simple formula for the expected value of the logarithm of the determinant of a Wishart-distributed matrix? It seems like there ought to be one, but my search so far has been frustrating.
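
For reference, a commonly cited closed form: if W is p-by-p and Wishart-distributed with scale matrix V and n degrees of freedom (n > p - 1), then E[ln det W] = sum over i = 1..p of psi((n - i + 1)/2) + p ln 2 + ln det V, where psi is the digamma function. A minimal Python sketch of that formula follows. The function names are made up for illustration, the digamma routine is a simple hand-rolled approximation rather than a library call, and ln det V is assumed to be computed elsewhere.

```python
import math

def digamma(x):
    """Approximate digamma psi(x) for x > 0 (recurrence + asymptotic series)."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    # Shift x upward with psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic tail: 1/(12x^2) - 1/(120x^4) + 1/(252x^6) - 1/(240x^8)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 / 240)))
    return result + math.log(x) - 0.5 / x - series

def expected_log_det_wishart(n, p, log_det_v):
    """E[ln det W] for W ~ Wishart_p(V, n), given ln det V (hypothetical helper)."""
    if n <= p - 1:
        raise ValueError("need n > p - 1 degrees of freedom")
    multivariate_digamma = sum(digamma((n - i + 1) / 2.0) for i in range(1, p + 1))
    return multivariate_digamma + p * math.log(2.0) + log_det_v
```

With p = 1 and V = 1 this reduces to the expected log of a chi-square variable, psi(n/2) + ln 2, which is a quick sanity check.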

(x-posted to stat_geeks)

Textbook Topics

I am currently working on an introductory textbook and some additional materials (including textbook/class materials for an intermediate course). I have a question for you.

Set aside any preconceptions about what must be in an introductory (or intermediate) book on statistics. What should be in such a book? Introductory stats books are still very 1950s-oriented in their topics, emphasis, lack of computing, etc. (If you think computing should still play no role in such a book, let me know that, too.) Statistics has changed a lot over the decades; should the books not change, too?

If you are a statistician, what important material is missing? If you are from another field, what do you find missing, or what should not be there? Please also let me know whether or not you are a statistician; that would help.

Any input would be great and really appreciated! Thanks!

Crossposted at stat_geeks

Kaplan Meier Curves in SAS

Is there a way to plot the unadjusted survival in a Kaplan Meier curve, as opposed to the survival estimate from the regression? (I am using Proc LIFETEST.) The plotted curve differs noticeably from the actual survival reported in a table, and a reviewer has dinged us on it.

Should I use the CDFPLOT statement in Proc CAPABILITY instead? (Assuming I have that installed; I had never heard of it before...)
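
For cross-checking, the unadjusted (product-limit) Kaplan-Meier estimate can be computed directly from the raw follow-up times and event indicators, then compared with both the table and the plot. Below is a minimal sketch in Python rather than SAS; the function name and the convention that 1 means an event and 0 means censored are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    survival = 1.0
    curve = []
    # The estimate only steps down at distinct times where an event occurred.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

Censored subjects never produce a step of their own; they only shrink the risk set for later event times, which is the usual reason a Kaplan-Meier curve differs from a crude proportion surviving.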

Help needed!

As you can see, the data points on this scatterplot fall into two groups: one along a horizontal line with Y values between 4 and 8, and another along a slanted line reaching Y=20.
 - Could you suggest any statistical criteria to confirm or reject that we really have two datasets here instead of one?
 - Are there any methods for separating these points into two datasets that are better than "by eye"?
 - Are any of the aforementioned methods implemented in StatSoft Statistica 8?

My best idea so far is to test the distribution of the dependent variable (Y) for the entire dataset and show that it is not normal, then test the two parts of the dataset and show that each is normally distributed.
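
One possible alternative to splitting "by eye" is to alternate between fitting two regression lines and reassigning each point to the nearer line, a hard-assignment cousin of a mixture-of-regressions model. The Python sketch below is only an illustration under that assumption: the function names are invented, the starting split (sign of the residual from a single fitted line) is an arbitrary choice, and the procedure can settle in a local optimum.

```python
def fit_line(points):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0  # flat line if all x are identical
    return slope, mean_y - slope * mean_x

def sse(points, line):
    """Sum of squared vertical residuals of the points around one line."""
    slope, intercept = line
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)

def fit_two_lines(points, max_iter=50):
    """Alternate between refitting two lines and reassigning points."""
    slope, intercept = fit_line(points)
    # Initial split: points above the single fitted line versus the rest.
    labels = [1 if y > slope * x + intercept else 0 for x, y in points]
    lines = [(slope, intercept), (slope, intercept)]
    for _ in range(max_iter):
        for g in (0, 1):
            group = [p for p, lab in zip(points, labels) if lab == g]
            if group:  # keep the previous line if a group empties out
                lines[g] = fit_line(group)
        new_labels = [
            min((0, 1), key=lambda g: (y - (lines[g][0] * x + lines[g][1])) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return lines, labels
```

Two lines will always fit at least as well as one, so a drop in residual error alone proves nothing. A formal comparison would need a likelihood-based criterion from a proper mixture model, which this sketch does not provide.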

How many free parameters are there in a covariance matrix?

I'm trying to calculate BIC for a fairly complicated model which includes several covariance matrices, and it just occurred to me that I don't actually know how many free parameters this represents. If I have a D-by-D covariance matrix, the naive answer is that this represents D(D+1)/2 free parameters, because the matrix must be symmetric -- but the matrix must also be positive definite, which is a stronger condition, so I'm guessing that the actual number is something less than that. Any thoughts?
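
For what it's worth, positive definiteness is an inequality constraint: it restricts the symmetric matrices to an open region but does not remove any dimensions, so the count stays at D(D+1)/2. One way to see this is the Cholesky parameterization, where any vector of D(D+1)/2 unconstrained real numbers maps to a valid covariance matrix. A minimal Python sketch follows; the function names and the log-diagonal convention are choices made for illustration.

```python
import math

def n_free_params(d):
    """Number of free parameters in a d-by-d covariance matrix."""
    return d * (d + 1) // 2

def cov_from_params(theta, d):
    """Map d(d+1)/2 unconstrained reals to a positive definite matrix.

    theta fills a lower-triangular factor L row by row; diagonal entries are
    exponentiated so they are strictly positive. Returns Sigma = L L^T.
    """
    if len(theta) != n_free_params(d):
        raise ValueError("theta must have d(d+1)/2 entries")
    lower = [[0.0] * d for _ in range(d)]
    k = 0
    for i in range(d):
        for j in range(i + 1):
            lower[i][j] = math.exp(theta[k]) if i == j else theta[k]
            k += 1
    return [
        [sum(lower[i][m] * lower[j][m] for m in range(d)) for j in range(d)]
        for i in range(d)
    ]
```

Structured covariances (diagonal, spherical, tied across components) would have fewer parameters, but that comes from the model restriction, not from positive definiteness.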

(x-posted to stat_geeks)

I'm Not Sure That I Trust My Own Reaction To This

Open source R in commercial Revolution, and Revolution lets R do stats on big data (the second link references the first).

From the first link: "Open source purists probably won't be all too happy to learn that Revolution is going to be employing an 'open core' strategy, which means the core R programs will remain open source and be given tech support under a license model, but the key add-ons that make R more scalable will be closed source and sold under a separate license fee."

I'm not a purist, but the license fee described in the second link is ... impressive.

Stats Software for Intro Class (of non-statisticians)

I am slated to teach a class in intro stats to psych majors this upcoming semester, and I would like some software to use consistently throughout the course. I refuse to use commercial software; having myself been trained extensively in SAS only to find it unavailable due to my current job, I don't want to create that situation for my students. In short, I'd like to give them something they can keep using as long as they want to use it.

I'd use R, if they were science students (more comfortable with computing/etc.) or stats majors, but these people are from psychology. So I need something easy to install, and which can be made easy (easier) to use. Don't get me wrong, I have used R in intro work before, but it was not smooth.

I would consider R with a front end. Is R-commander better than it was a couple of years ago?

I have already rejected PSPP, as it has no graphics (right?), and Statistical Lab (Statistiklabor) as, well, I am having trouble getting that working well. It also lacks the needed detailed support in English that my students might require.

Any other ideas? Any users of OpenEpi or Gretl? Would these work for general stats?

Ideally I'd like the following: some tools for simple non-parametrics, resampling (wishful thinking?), the usual suspects in normal theory statistics with both 1 and 2 way ANOVA, Fisher's exact test (wishful thinking again?), lots of good graphics, relatively easy data transformations, and the ability to do simple simulations (this may have to be done elsewhere than the main package). Obviously I'd like it to be Free (beer) and/or free (libre) and/or open source. But I'll settle for anything the students can get for 0.00 USD legally and give to their friends.

I realize that what I want is simplified-R. Any ideas? Thanks in advance for any leads you can give me.

X-post-to: stat_geeks.