### Help needed!

As you can see, the data points on this scatterplot fall into two groups: one along a horizontal line with Y values between 4 and 8, and another along a slanted line reaching Y=20.

- Could you suggest any statistical criteria to confirm/reject that we really have two datasets here instead of one?

- Are there any methods of separating these points into two datasets better than "by eye"?

- Are any of the aforementioned methods implemented in StatSoft Statistica 8?

My best idea so far is to test the distribution of the dependent variable (Y) for the entire dataset and show that it is not normal, then test the two parts of the dataset separately and show that each is consistent with a normal distribution.
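One way to try this idea outside Statistica is sketched below. It uses the Jarque-Bera statistic, which is only one of several possible normality tests and is my choice for illustration, not something named in the question. The function name `jarque_bera` and the sample values are hypothetical, and the p-value is an asymptotic approximation that is rough for small samples.

```python
import math

def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Central moments in population form (dividing by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose upper tail probability is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Made-up Y values: a flat band around 4-8 plus a few high points.
y_all = [4.1, 5.0, 5.5, 6.2, 6.8, 7.4, 7.9, 12.0, 15.5, 18.0, 20.0]
print(jarque_bera(y_all))
```

A small p-value for the whole dataset argues against normality. A large p-value for each part does not prove normality; it only fails to reject it.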

lyonesse: i have yet to come across a decent test for bimodality (i kind of made one up at one point, but it was a lot of math :). i suggest sorting your data into your circles and your squares, picking what you think is the minimum between them, and using those two axes to do a simple chi-square.

good luck!

_hellmaus_: How do I pick a minimum between these groups better than "by eye"? What kind of sorting criteria should I use?

lyonesse: in terms of picking the minimum, **look** at your graph, and then write that this was the local minimum. it's just like picking the mode.

the test i made up involved calculating all permutations (your actual data, your data with two points switched between square and circle, 4 points switched, 6 points switched, &c. until all switches have been made) and then seeing the probability of getting at least this many squares on one side of the minimum as opposed to the other. i do not recommend this. i think a chi-square is much more comprehensible and will give you a better sense of your data.
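A minimal sketch of the chi-square suggestion, assuming the "two axes" mean a 2x2 table of shape (circle or square) against side of the chosen minimum (below or above a Y cutoff). The function names, the cutoff, and the counts are hypothetical, and no continuity correction is applied.

```python
import math

def count_table(points, cutoff):
    """Count a 2x2 table from (y, shape) pairs; shape is 'circle' or 'square'.

    Returns (circles below, circles at/above, squares below, squares at/above).
    """
    a = b = c = d = 0
    for y, shape in points:
        below = y < cutoff
        if shape == "circle":
            if below:
                a += 1
            else:
                b += 1
        else:
            if below:
                c += 1
            else:
                d += 1
    return a, b, c, d

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one point")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Made-up counts: 10 circles below the cutoff, 20 above; squares reversed.
print(chi_square_2x2(10, 20, 20, 10))
```

Because the cutoff and the circle/square labels are both chosen after looking at the same plot, the resulting p-value is likely to be optimistic.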

tomtomtomtomtom: lyonesse has already given you a good answer, but to get some more I recommend asking this question at http://stats.stackexchange.com/.

nefedor: And yes, I have a statistical criterion to confirm/reject in R^N. It's from an article in one of the Russian statistics periodicals; I'm not sure if it's any use to you if you don't know Russian. However, I can explain the statistic if you like.

budhaboy: It looks like the 'top' set may have a different slope. Create a factor variable for whatever you think it is that is causing the two sets.

Then fit one of these regression models using least squares:

y=a+X1m1+X2m2

(same intercept, different slopes)

or

y=a1+a2+X1m1+X2m2

(different intercepts, different slopes)
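A rough sketch of the "different intercepts, different slopes" model, assuming that X1 and X2 stand for x restricted to group 1 and group 2 and that each intercept applies only to its own group. Under that reading the model is the same as fitting one least-squares line per group. The F statistic comparing it against a single pooled line is my addition for illustration, and the names `fit_line` and `one_line_vs_two` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, residual sum of squares).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, rss

def one_line_vs_two(xs, ys, groups):
    """Compare one pooled line against a separate line per group.

    groups holds a 0/1 label per point (the factor variable).
    Returns ({group: (intercept, slope)}, F statistic).
    """
    _, _, rss_pooled = fit_line(xs, ys)
    rss_separate = 0.0
    fits = {}
    for g in (0, 1):
        gx = [x for x, lab in zip(xs, groups) if lab == g]
        gy = [y for y, lab in zip(ys, groups) if lab == g]
        intercept, slope, rss = fit_line(gx, gy)
        fits[g] = (intercept, slope)
        rss_separate += rss
    n = len(xs)
    # The pooled model has 2 parameters and the separate model has 4,
    # so the numerator has 2 degrees of freedom and the denominator n - 4.
    f_stat = ((rss_pooled - rss_separate) / 2.0) / (rss_separate / (n - 4))
    return fits, f_stat
```

A large F (compared with an F distribution on 2 and n - 4 degrees of freedom) suggests that two lines describe the data better than one. That comparison is only meaningful if the factor was identified independently of the plot.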

It's important to note, though, that you've got to identify the factor you think made them different. You can't just say, 'these look different, so I'm going to give them a different factor'. If you have no clue as to what made them different, you can definitely go with k-means clustering, or my personal favorite, fuzzy clustering, which gives a 'possibility' of each data point being included in a group.
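For the k-means option, a bare-bones sketch in plain Python might look like the following. The function name `kmeans_2d` and the starting centers are hypothetical choices. Plain k-means favors compact, roughly round clusters, so it may not separate two line-shaped groups cleanly where they come close together.

```python
def kmeans_2d(points, centers, max_iter=100):
    """Plain k-means on (x, y) tuples, starting from the given centers.

    Returns (labels, centers), where labels[i] is the index of the
    center closest to points[i].
    """
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(
                range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2
                + (p[1] - centers[k][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centers

# Made-up points: three near the origin, three near (10, 10).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans_2d(pts, [(0, 0), (10, 10)]))
```

The result depends on the starting centers, so it is worth rerunning with a few different starts before trusting the split.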

_hellmaus_: Identification of the factors that made the two groups different is planned; I think that multiple regression analysis should be enough.

budhaboy: http://www.amazon.com/gp/product/047198

It's a damn shame it's so expensive; when I got it in '03, it wasn't nearly that much.