
Coach McGuirk

allogenes in statisticians

Stats Software for Intro Class (of non-statisticians)

I am slated to teach a class in intro stats to psych majors this upcoming semester, and I would like some software to use consistently throughout the course. I refuse to use commercial software; having myself been trained extensively in SAS only to find it unavailable due to my current job, I don't want to create that situation for my students. In short, I'd like to give them something they can keep using as long as they want to use it.

I'd use R, if they were science students (more comfortable with computing/etc.) or stats majors, but these people are from psychology. So I need something easy to install, and which can be made easy (easier) to use. Don't get me wrong, I have used R in intro work before, but it was not smooth.

I would consider R with a front end. Is R-commander better than it used to be a couple years ago?

I have already rejected PSPP, as it has no graphics (right?), and Statistical Lab (Statistiklabor) as, well, I am having trouble getting that working well. It also lacks the needed detailed support in English that my students might require.

Any other ideas? Any users of OpenEpi or Gretl? Would these work for general stats?

Ideally I'd like the following: some tools for simple non-parametrics, resampling (wishful thinking?), the usual suspects in normal theory statistics with both 1 and 2 way ANOVA, Fisher's exact test (wishful thinking again?), lots of good graphics, relatively easy data transformations, and the ability to do simple simulations (this may have to be done elsewhere than the main package). Obviously I'd like it to be Free (beer) and/or free (libre) and/or open source. But I'll settle for anything the students can get for 0.00 USD legally and give to their friends.

I realize that what I want is simplified-R. Any ideas? Thanks in advance for any leads you can give me.

X-post-to: stat_geeks.


I guess one question is, how long has it been since you tried using R in an intro class? Installation and documentation have both improved considerably over the last few years, IMO. I've taught "intro to biostats for non-biostatisticians" (mostly medical and nursing students) classes with both SAS and R, and while a few years ago it was easier to get them up and running with SAS, it seems like the reverse is now true. So you might want to at least consider trying them on straight R, since they'll always be able to get it. There is also at least one "introductory statistics with R" book out there which might be useful; I haven't read it, but it gets good reviews on Amazon.
Not long enough, unfortunately. :-) My main problem with your plan is I don't get to pick the textbook as this will be a last minute deal. (I am also teaching a new experimental curriculum that will not follow a standard textbook.) I may end up using R, but I do still believe the students will have more problems than I would like. :-(
Yeah, last-minute-ness always complicates things. Well, best of luck, whatever software you end up deciding to use. I guess what it comes down to is, they'll inevitably have some problems with any software, and it's just a matter of deciding which package's problems will be easiest to deal with.
I would honestly stick with SPSS (PASW); this is what they will have to know how to use for psych in research positions and probably other classes. Maybe you can show them how to use R, but use SPSS the most.
I understand where you are coming from, I really do. But I consider it immoral to use closed-source statistics software. The results of computations must be open to all. I also don't want to stick them with the cost, the obligation to pirate the application, or the intrinsic limits of the reduced versions of the system.

In addition, the limitations of SPSS and now the re-branded PASW (what does that stand for anyway?) are too severe. It leads psych students down the garden path of doing the wrong statistic for the wrong problem. At some point we have to say enough to damaging the next generation because it is what we got.

Yeah this is overly dramatic, I know, but as a professional statistician who has to teach real statistics, there comes a point where I have to put my foot down and teach the right material, not the expected material. SPSS fights that at every turn.

But seriously, I do understand your position and appreciate your input.


As a some-time instructor of stats for psychologists (and other non-stats majors), I have to say you're just going to have to get over your hang up. R is great, but most psychologists are not going to be doing their own programming.

I hate SPSS, and not because it's closed-source. However, it *is* the standard software that psychologists use *and* are expected to know. Additionally, unless you are teaching at a severely strapped school, they will likely have a psych computer lab with SPSS available on those machines (verify that).

Or, if you want to make sure they *really* get it, do what I do and avoid software platforms entirely. Here's why: SPSS, JMP, Minitab, etc. are all menu-driven and incredibly easy to figure out. What you need to focus on in the stats class isn't the point-and-click, but the actual work and reasoning behind the methods. You're going to be the only person (or maybe one of two people) who is going to impress upon these students the importance of statistics in psychological (and, indeed, any scientific) research, and you're not doing them any favors if you just say "well, ANOVA is when you have groups, and then you get this table like so.. and then this big F here means you reject."

Teach them *why* things are supposed to be done, in addition to the (actual) how, and then leave it to them to figure out how to do the analysis in a software program later. It does no one any good to have to learn software when they're supposed to be learning the techniques -- give these kids a leg up and focus on the theory/skills and make them good practitioners.

If, however, you absolutely can't get by without giving them shortcuts, then use Excel. I hate it, too, but its Analysis ToolPak add-on does all the statistical methods (ANOVA, histograms, regression, etc.) that you will probably need in a stats-for-psychology class.

Good luck.

Re: software

I, too, have taught stats to psych majors (and other non-majors) a number of times in the past. (Actually, I have been doing it for the better part of a decade.) That is why I have pretty much given up on the traditional course. That and I find the books on stats written for psych to be so woefully out of sync with reality, number heavy, graph light, and generally stuck in the pre-1970s.

The course I plan on teaching is the basis of a new textbook I have been writing. This is one of the reasons that I need better software. I need to get students past the "which test do I use?" model and into the "what do I really want to know about the data (or experiment, or ...)?" model. I teach (and have taught with some success) a way of thinking and understanding, not a list of recipes. If I do my job right, knowing any specific package won't matter.

I agree that there is too much emphasis on software, but at the same time not using any software in a class is not good--you need it to analyze real sized problems. A class free of real problems does no one any favors. Excel encourages long columns of numbers and that, IMO, helps no one. (I have tried long columns, and it does not work for me. I have also taught with Excel several times. I found it to be problematic for a variety of reasons.) It is very old school to look at software as a short-cut. It is no longer so. Software, for better or worse, is an integral part of the research enterprise.

To the other point, there is a persistent illusion that to do standard analyses one has to "program" R. While raw R does not give you pull-down menus, it is hardly programming to do a t-test:
Or an ANOVA:
fit <- aov(y ~ A, data=mydataframe)
I grant you that there is a little more going on there, but it is not programming. Knowing how to store data correctly is also required by SPSS. There are some GUIs for R that are improving--I just don't know that I trust them yet. :-) But that is why I was asking around about other software options.
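For concreteness, the two calls discussed above can be sketched end to end. This is only a minimal illustration: the data are simulated, and the names `mydataframe`, `y`, and `A` simply follow the ANOVA line above. The group labels and the `two_groups` helper are made up for the example.

```r
# Hypothetical data: 30 simulated scores in three groups (labels are illustrative).
set.seed(1)
mydataframe <- data.frame(
  A = factor(rep(c("control", "low", "high"), each = 10)),
  y = c(rnorm(10, mean = 50, sd = 10),
        rnorm(10, mean = 55, sd = 10),
        rnorm(10, mean = 60, sd = 10))
)

# t-test: the formula interface needs exactly two groups, so keep two of them.
# R runs Welch's unequal-variance test by default.
two_groups <- droplevels(subset(mydataframe, A != "low"))
tt <- t.test(y ~ A, data = two_groups)
print(tt)

# One-way ANOVA across all three groups, as in the line above.
fit <- aov(y ~ A, data = mydataframe)
print(summary(fit))
```

Most of the lines build the data frame; each analysis is a single call plus a `print`.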

By the way--you didn't say why you hate SPSS. What is the problem with it?

Thanks for the input!

Re: software

True, there's not a lot of programming to do the basics, but it is still more than a lot of students are probably used to doing (unless they are at a school like mine, where everyone is required to have at least one semester of computer science).

I don't disagree that software is important to research, and that no one is out there manipulating massive matrices by hand anymore. At the risk of coming off as a total Luddite, teaching someone how to do something by hand (for example, the within sum of squares for ANOVA) not only is illustrative of the math at hand, but reinforces what, exactly, the within sum of squares is (oh! so, it's the squared difference between the points and the group's mean... it's the only "within" there is). I'm not suggesting that you have them partition the total variance or derive expected sums of squares, but on small problems it is useful to have the work done by hand (or to at least provide the parts and have the students put them together in the right way). By going strictly software- or calculator-based, and teaching to the technology, you risk (as I said above) stripping away the "why" behind using a technique... you're giving them a hammer without telling them what a nail looks like to appropriately use it.
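The two approaches need not be exclusive. As a small sketch with made-up toy data (the variable names `y`, `g`, `group_mean`, `ss_within`, and `ss_resid` are just for illustration), the within sum of squares can be built "by hand" from squared distances to each point's own group mean and then checked against what the ANOVA routine reports:

```r
# Hypothetical toy data: three groups of three scores each.
y <- c(4, 5, 6, 7, 9, 11, 1, 2, 3)
g <- factor(rep(c("a", "b", "c"), each = 3))

# Each point's own group mean (ave() applies mean() within groups by default).
group_mean <- ave(y, g)

# Within sum of squares: squared distance of every point from its group mean.
ss_within <- sum((y - group_mean)^2)

# The same quantity as reported in the ANOVA table (residual sum of squares).
ss_resid <- summary(aov(y ~ g))[[1]][["Sum Sq"]][2]

print(ss_within)
print(ss_resid)
```

With these numbers the hand calculation is 2 + 8 + 2 = 12, which should match the residual line of the ANOVA table.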

I don't think that you would go so far as to completely ignore some of the statistical theory by including a bit of a software tutorial in the course, but by becoming too reliant on software too quickly, it's easy to introduce a lot of error into the results. Using variances, again, it's one thing to tell students that a variance can never be negative and have that as a mental check. It's quite another to show them *why* it can't be negative. Not that a package would spit out a negative variance (I hope), but there are other checks one does to see if results make sense, which can be lost by simply relying on software.
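The "why" here is short enough to write out. Assuming the usual sample variance with an n - 1 denominator (the notation is added for illustration and is not from the thread), every term in the sum is a square, so the total cannot drop below zero:

```latex
s^2 \;=\; \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \;\ge\; 0,
\qquad \text{since } \left(x_i - \bar{x}\right)^2 \ge 0 \text{ for every } i \text{ and } n - 1 > 0 .
```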

It'd be like (forgive the hyperbole) giving elementary school students a calculator and telling them that they don't need to understand how 1+1=2, just that, when you type that into a calculator, you get '2' back so that's the answer.

Finally, to answer your question, I hate SPSS for a variety of reasons:

1. It assumes the tail of your tests based on the test statistic. If, heaven forbid, you have a right-tailed alternative, but end up with a negative test statistic (so you looked in the wrong direction -- it happens), it will report the left-tailed p-value. (I hate JMP for a similar reason, but at least there it reports all possible p-values). Right now, this is my biggest pet peeve, as in an intro class, you may be emphasizing how important it is to really understand H0 and HA, and how they relate to p-values and test-statistics, but it is a rare student who will really get that when faced with an easy (wrong) output. Even after pointing out this flaw repeatedly for several weeks, nearly my entire class reported the wrong results from their tests (with real data and hypotheses).

2. It references "p-value" as "significance", which is incredibly misleading.

3. It calculates repeated measures statistics incorrectly (but we are unsure "how" it is incorrect, because of...)

4. It doesn't have in its documentation the formulas used for different calculations... at least not that I've found.
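The issue in point 1 can be made concrete with a short sketch in R, where the tail is chosen through the `alternative` argument of `t.test` rather than inferred from the sign of the statistic. The sample below is made up, and nothing here reproduces SPSS output; it only shows how far apart the two one-sided p-values are when the statistic lands in the "wrong" direction.

```r
# Hypothetical sample whose mean falls below the null value of 0.
x <- c(-1.2, -0.4, -2.1, 0.3, -0.9, -1.5, -0.2, -1.1)

# Right-tailed alternative, H_A: mean > 0 (the hypothesis actually stated).
right <- t.test(x, mu = 0, alternative = "greater")

# Left-tailed alternative, H_A: mean < 0 (the tail suggested by the sign of t).
left <- t.test(x, mu = 0, alternative = "less")

print(right$statistic)  # negative t: the data point the other way
print(right$p.value)    # above 0.5: no support for H_A: mean > 0
print(left$p.value)     # the complement, 1 - right$p.value
```

Because the two one-sided p-values sum to 1, reporting the left-tailed value for a right-tailed hypothesis can turn a clear non-rejection into an apparent rejection.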

Sorry for the lengthy reply. My soapbox and I are going away now.

Re: software

Actually I am glad for the long reply. :-)

Maybe I misstated my position a bit. I am not against all "by hand" work. I used to teach linear algebra and I did make my math students invert or row-reduce a couple of matrices by hand, for practice. I am just against doing an all-by-hand course, which I may have incorrectly assumed you were promoting. I have actually seen that done in several psych departments, and I don't believe that making the students do dozens and dozens of those calculations helps the understanding.

I would never "just" use the computer, I always intended some hand work, but somewhere along the way, the class has to switch to a higher level process--and for me, giving summary statistics (rather than raw data from which the students derive the summaries) is deceptive. I have data on this, actually. So after some demonstrations and exercises for the understanding, I need some way for bulk data to turn into summaries and tests and intervals, and figures ... and there is where the software comes in. But I would consider it crazy to just push buttons. So I think we agree there. I certainly have always done the specific things you mention.

As an example of my point--I always teach the SS formulas using the so-called defining formulas, but (in psych) usually do not teach the so-called short-cut formulas. In such a class those are not well used. I have seen students in psych stats classes who were taught only the short-cut formulas, as the teacher had them doing every problem (with data) by hand. They also spent too much of their time using pre-computed summary stats in that course. Without the data at hand, how do they check the assumptions? Take it on faith in the problem statement? I suppose I am against faith-based statistics. :-) I want them using pictures of data for every problem, and that needs data and a fast way to get good pictures. Checking assumptions is a habit, and it cannot be established by waving our hands as teachers and saying "assume all is well." Which is about half of my justification for computers.
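For readers who have not seen the two side by side, here is a sketch of the distinction for a single sample (the notation is an assumption added for illustration; textbooks differ in symbols). The defining formula says what the quantity is, a sum of squared deviations from the mean, while the short-cut formula is an algebraically identical rearrangement that is quicker on a calculator but hides that meaning:

```latex
\text{Defining:}\quad SS = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
\qquad\qquad
\text{Short-cut:}\quad SS = \sum_{i=1}^{n} x_i^2 \;-\; \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
```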

Thanks also for your comments on SPSS. Getting on my soapbox, that is why I constantly press for open-source software in the sciences. It is exactly problems like your comments 2-4 that make the point that a lot of us are making these days: scientific results must be open to inspection and therefore we need to be able to (at least in principle) know exactly how they were derived. But unfortunately there is a cost to that, and that is being forced to use software that may be built to a different way of doing things (commands rather than menus) than one might prefer. Unfortunate, but life is always balancing costs...

Thanks again!
Yeah, I am currently leaning that way. Now if I could just build my own installer to simplify the setting up... :-)




Minitab is the best software out there for non-stat students due to its simple drop-down menus. We used this program to teach undergraduates when I was a graduate student. Check it out.

November 2011
