
Jun. 7th, 2010



Survey on Short Courses and Tutorials in Biostatistics

If you are a statistician or a biostatistician (student or practicing), please take this survey!


Your responses will play a role in planning short courses and tutorials for a major upcoming conference, and more importantly, they will really help me out! Thanks!

May. 26th, 2010



Survey on Short Courses and Tutorials in Biostatistics

Statisticians and Biostatisticians:

I invite you to take the following survey (it will only take a couple of minutes, I promise):


I am the sole graduate student on a planning committee for a major annual conference in biostatistics, and we are trying to gauge interest in specific topics for short courses and computer tutorials. You'll be doing me a HUGE favor by taking this survey!

Thanks so much, guys!!!

(cross-posted to stat_geeks)

May. 3rd, 2010



yet another "how to do this in R" question

Okay, here's what I'm trying to do. This is a genomics-related question, but it's a general problem. I have a two-column matrix or data frame representing gene start and stop positions. I want a vector of length n, where n is the total number of base pair positions on the chromosome, where each element has a value of 1 if the position is in a gene and 0 otherwise. For a very simple example, suppose n = 10 (surely this organism has the smallest genome ever!) and I have the following data frame "gene":

start stop
2 5
7 8

and I want the vector "isGene":

0 1 1 1 1 0 1 1 0 0

Now, the mindlessly inefficient way to do this would be:

isGene = rep(0, n)
for(i in seq_len(nrow(gene))) {
  # mark every position from this gene's start to its stop
  geneRow = gene[i,]
  isGene[geneRow$start:geneRow$stop] = 1
}

but surely there must be a better way? I'm dealing with real chromosomes here, not toy examples, and this kind of clumsy iteration eats up a lot of computing cycles.
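One loop-free idea is a difference array: add 1 at each start, subtract 1 just past each stop, then take a running sum, so a position is in a gene exactly when the running sum is positive. Below is a minimal sketch of that idea in Python, for illustration only; the function name `gene_mask` and the tuple layout are made up here, and positions are assumed to be 1-based and inclusive as in the toy example. The running sum is what R's cumsum computes, so the same approach should carry over.

```python
from itertools import accumulate

def gene_mask(genes, n):
    """Return a 0/1 list of length n for 1-based, inclusive (start, stop) pairs."""
    # Index 0 is unused; index n + 1 absorbs the decrement for a gene ending at n.
    delta = [0] * (n + 2)
    for start, stop in genes:
        delta[start] += 1      # coverage begins at start
        delta[stop + 1] -= 1   # coverage ends just after stop
    # Running sum = number of genes covering each position.
    depth = list(accumulate(delta))
    return [1 if depth[pos] > 0 else 0 for pos in range(1, n + 1)]

genes = [(2, 5), (7, 8)]       # the toy "gene" data frame from the post
is_gene = gene_mask(genes, 10)
print(is_gene)                 # [0, 1, 1, 1, 1, 0, 1, 1, 0, 0]
```

Each gene costs two updates no matter how long it is, and overlapping genes are handled correctly because the running sum counts coverage depth rather than overwriting.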

(x-posted to stat_geeks)

Mar. 16th, 2010


(no subject)

Good evening everybody,
 How do I fit data to a cumulative normal distribution in R? I tried it this way:

Feb. 18th, 2010



creation of a histogram picture

so here i am on a mac. i want to draw a histogram. having it draw from say a comma-separated file would be nice, but not necessary. i need a jpg or somesuch, with little x's and o's for my two kinds of subjects, and numbers from 0 to 25 for my scores. i have

is there a nice, free, possibly pre-installed tool i can use to do this?

sorry for such a very lamer question, but does anyone know a good tool for this? if all else fails i have linux and the gimp :)

Feb. 4th, 2010



Method of Moments, MLEs, and Standard Errors

Hi, I have a question about standard errors in the context of this problem:

    Suppose X is a discrete random variable with

    P(X=0) = 2y/3
    P(X=1) = y/3
    P(X=2) = 2(1-y)/3
    P(X=3) = (1-y)/3

    Where 0<=y<=1. The following 10 independent observations were taken from such a distribution: (3,0,2,1,3,2,1,0,2,1).

    Find the method of moments estimate of y, an approximate standard error for your estimate, the MLE of y, and an approximate standard error of the MLE.

I have found the method of moments estimate of y (5/12) and the MLE (.5), but I'm not sure how to go about approximating the standard errors. What I initially did for the SE of the first estimate was to calculate the different y's based on the observed probabilities of the X's, then add the squared differences between them and 5/12, divide by 4, and take the square root, but that doesn't seem quite right. Sorry to ask such an elementary question, but I'm really puzzled as to how to do this. Any help would be greatly appreciated. Thanks!
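One standard route, sketched here in Python under the usual large-sample assumptions (this is a plug-in approach, not necessarily the one the course intends): since E[X] = 7/3 - 2y, the method of moments estimate is (7/3 - x̄)/2, so its variance is Var(X)/(4n), with Var(X) evaluated at the estimate. For the MLE, the likelihood depends on the data only through k, the number of observations with X ≤ 1, so it behaves like a binomial proportion with approximate variance y(1-y)/n. The variable names below are illustrative.

```python
from fractions import Fraction
from math import sqrt

data = [3, 0, 2, 1, 3, 2, 1, 0, 2, 1]
n = len(data)

# Method of moments: E[X] = 7/3 - 2y, so y_hat = (7/3 - xbar) / 2.
xbar = Fraction(sum(data), n)
y_mom = (Fraction(7, 3) - xbar) / 2

# Var(y_hat) = Var(X) / (4n). Plug y_hat into Var(X) = E[X^2] - E[X]^2,
# where E[X^2] = (17 - 16y) / 3 from the given probabilities.
var_x = (17 - 16 * y_mom) / 3 - (Fraction(7, 3) - 2 * y_mom) ** 2
se_mom = sqrt(var_x / (4 * n))

# MLE: likelihood is proportional to y^k (1 - y)^(n - k), k = #{x <= 1}.
k = sum(1 for x in data if x <= 1)
y_mle = Fraction(k, n)

# Fisher information for n observations is n / (y (1 - y)).
se_mle = sqrt(y_mle * (1 - y_mle) / n)

print(y_mom, round(se_mom, 3))   # 5/12 0.173
print(y_mle, round(se_mle, 3))   # 1/2 0.158
```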

Jan. 3rd, 2010



Hougaard-Weibull question

Okay, so I have a question about the Hougaard multivariate Weibull distribution that I'm hoping someone can help me answer. The distribution is as given in [1], and is most easily defined by the survival function. Let T = (T1, ..., Tn) be a vector of r.v.s with marginal Weibull distributions, and let t = (t1, ..., tn) be a vector of observations. Then the multivariate survival function is given by:

$$S(t) = P(T_1 > t_1, \ldots, T_n > t_n) = \exp\left\{ -\left( \sum_{i=1}^{n} \varepsilon_i t_i^{\gamma} \right)^{\alpha} \right\}$$

for constants α, γ, ε1, ..., εn > 0.

Hougaard claims that this is only a legitimate survival function with the additional constraint α ≤ 1. What I'm trying to understand is why. It seems to me that for any positive α, the expression obeys all the rules for a proper survival function: S(0, ..., 0) = 1, the limit as any ti goes to infinity is 0, and S is strictly decreasing in the ti's.

Now, I've read the derivation (partly in [1], partly in [2]) and I understand that S was derived via a positive stable frailty distribution, and that this derivation imposes the constraint. I also understand that in general, Archimedean copulas, of which this is an example, require concave generator functions, and although I haven't gone through the math I can guess that α > 1 might violate this requirement for some values of t. But again, looking at the specific S given above, I still don't see how any positive value of α can make it not be a legitimate survival function. Honestly, how it was derived seems kind of irrelevant to its legitimacy; once you've got the function, if it meets the requirements, why not use it?

Any insight that anyone can offer on this will be greatly appreciated.
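One requirement not on the list above is that every rectangle must get non-negative probability; being decreasing in each ti separately does not guarantee that. A minimal numerical sketch in Python, assuming the simplest case n = 2, γ = 1, ε1 = ε2 = 1 (values chosen here purely for illustration), computes P(a < T1 ≤ b, a < T2 ≤ b) by inclusion-exclusion on S:

```python
from math import exp

def surv(t1, t2, alpha, gamma=1.0, eps=(1.0, 1.0)):
    """Bivariate survival function exp{-(eps1*t1^gamma + eps2*t2^gamma)^alpha}."""
    return exp(-((eps[0] * t1 ** gamma + eps[1] * t2 ** gamma) ** alpha))

def rect_prob(a, b, alpha):
    """P(a < T1 <= b, a < T2 <= b) via inclusion-exclusion on the survival function."""
    return (surv(a, a, alpha) - surv(b, a, alpha)
            - surv(a, b, alpha) + surv(b, b, alpha))

for alpha in (0.5, 1.0, 2.0):
    # Probability assigned to the small square (0, 0.1] x (0, 0.1].
    print(alpha, rect_prob(0.0, 0.1, alpha))
```

With α = 0.5 and α = 1 the square gets positive mass, but with α = 2 the value is negative, so S cannot be the survival function of any random vector. In this special case the mixed second derivative of S has the sign of α(t1+t2)^α - (α - 1), which is negative near the origin exactly when α > 1.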

[1] A Class of Multivariate Failure Time Distributions, P. Hougaard (1986), Biometrika 73(3):671-678

[2] Survival Models for Heterogeneous Populations Derived from Stable Distributions, P. Hougaard (1986), Biometrika 73(2):387-396

x-posted to stat_geeks

Dec. 22nd, 2009



bimodal tests

so i have this data which turned out to be bimodal. (i was so blind in scoring it, i was triple-blind!) let's call this score number s.

i want to describe and test this data in terms of one factor (on or off). most of "on" is in one peak, most of "off" is in the other peak. (by pretty large values of "most", especially in the "on" case, as it happens.)

i am sorta thinking about this in two ways:

. if f=(1,0), p(s)= ?


. for s=(0-26), p(f=(0,1)) = ?

are these the reasonable models, or given that i have two kinda gaussian curves with a nice valley of nuthin' in between, is there some other way i should look at it?


what would be reasonable tests to poke this with to try to generate some lovely p-values?

thanks for any thoughts :)

(eta over in my own journal i posted a "disease/exposure" 2x2 matrix analysis, which is SUPER LAME i know but i still got a nice teeny-weeny p-value. ;)

Nov. 15th, 2009



x-posted in stat_geeks: Help :(

Hey everyone,

I am a student currently taking a stats class, and having trouble with test statistics. For some reason this stuff just does not compute in my brain. I was wondering if anyone knew off the top of their heads the equations needed for a couple problems I have to complete. I have a list of equations, but they are all for proportions and I have a feeling they are not the equations that I need (this is what I get for being out of class for a week due to swine flu, eeek!).

Problem 1:
A study found that the mean number of hours of TV watched per day was 4.09 for black (N=101, Standard Error = 0.3616) and 2.59 for white (N=724, Standard Error = 0.0859).
a. What type of test should you run?
b. Construct Hypotheses
c. Conduct a significance test using an alpha-level of 0.01 and interpret.
d. Interpret your P value
e. Construct a confidence interval and interpret.
f. Interpret as a ratio.

Problem 2:
An experiment of responses for noise detection under 2 conditions used a sample of twelve 9-month old children. The study found a sample mean difference of 70.1 and a standard deviation of 49.4 for the difference.
a. What type of test should you run?
b. Construct Hypotheses
c. Conduct a significance test using an alpha-level of 0.01 and interpret.
d. Interpret your P value
e. Construct a confidence interval and interpret.

If anyone could help me - it would be MUCH APPRECIATED!
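For what it's worth, neither problem is about proportions. A rough sketch of the arithmetic in Python, assuming Problem 1 is a comparison of two independent means (large samples, so a normal approximation is used) and Problem 2 is a paired t test on the differences with 11 degrees of freedom; the critical value 3.106 is taken from a t table, and the variable names are illustrative:

```python
from math import sqrt
from statistics import NormalDist

# Problem 1: two independent means, H0: mu1 = mu2 vs Ha: mu1 != mu2.
diff = 4.09 - 2.59
se_diff = sqrt(0.3616 ** 2 + 0.0859 ** 2)    # SEs combine in quadrature
z = diff / se_diff
p_two_sided = 2 * (1 - NormalDist().cdf(z))  # normal approximation
z_crit = NormalDist().inv_cdf(0.995)         # for a 99% interval
ci1 = (diff - z_crit * se_diff, diff + z_crit * se_diff)
ratio = 4.09 / 2.59                          # part f: ratio of the means

# Problem 2: paired differences, H0: mu_d = 0 vs Ha: mu_d != 0.
n = 12
se_mean = 49.4 / sqrt(n)
t_stat = 70.1 / se_mean
t_crit = 3.106                               # table value for 0.995, df = 11
ci2 = (70.1 - t_crit * se_mean, 70.1 + t_crit * se_mean)

print(round(z, 2), round(ratio, 2))          # 4.04 1.58
print(round(t_stat, 2), t_stat > t_crit)     # 4.92 True
```

Both statistics exceed their critical values at the 0.01 level, and both 99% intervals exclude 0, so the two approaches agree.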

Nov. 11th, 2009



SAS Macro Variable Question

Does anyone know how I can assign the result of the ATTRN(NObs) call to a macro variable, so I can run the macro iteratively until the resulting output file has 0 observations? I can save the NObs to a regular variable (_NObs_) but I can't use this in a %DO %UNTIL() statement, and if I try to assign this value to a macro variable it resolves to '_NObs_'. I also tried using the %EVAL function to do this.

Appy polly loggies if this doesn't make sense. I can include my code if anyone is interested, but it's late and I developed a kludge workaround (%DO i = 1 %TO {some huge number} ); it would just be nice to be able to do this efficiently.
