Permutation test for one-way ANOVA?

$\begingroup$

What are the steps for carrying out a permutation test for a one-way ANOVA, and what would the R or SAS code look like?

Would you give me the specific steps?

Thanks, Buddies!

$\endgroup$

1 Answer

$\begingroup$

First: Decide on the 'metric' you are going to use to judge differences among the groups. You might pick the maximum difference between sample means, the variance of the sample means, the standard F-statistic, and so on.
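As a rough sketch of what such metrics might look like in R: the function names `max.diff`, `var.means`, and `f.ratio` below are my own illustrative choices (not from any package), and the tiny data set is made up only to show the calls.

```r
# Hypothetical helper functions; the names are illustrative only.
# y: numeric vector of measurements; grp: factor of group labels.
max.diff  = function(y, grp) diff(range(tapply(y, grp, mean)))      # largest gap between group means
var.means = function(y, grp) var(as.vector(tapply(y, grp, mean)))   # variance of the group means
f.ratio   = function(y, grp) anova(lm(y ~ grp))[1, 4]               # standard F-statistic

# Made-up example: three groups of three observations
y   = c(1, 2, 3,  2, 3, 4,  5, 6, 7)
grp = as.factor(rep(1:3, each = 3))
max.diff(y, grp);  var.means(y, grp);  f.ratio(y, grp)
```

Any of these could serve as the statistic in the permutation loop; only the F-statistic is pursued in what follows.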

Here I will illustrate the standard F-statistic. The motivation for doing a permutation test would be that you doubt the assumptions for the standard ANOVA model: normal distributions or equal variances. Failure of these assumptions does not necessarily mean that the F-statistic is a bad way of measuring differences among means. Rather, the problem is that the F-statistic may not have an F-distribution under the null hypothesis.

Next, for my example, I'll select the dimensions of the data. Suppose we have $g = 3$ treatment groups, with $n = 10$ replications per group.

Second, we need some data. I'll generate these in R. Because all three groups are drawn from the same normal distribution (mean 50, standard deviation 3), we know that the null hypothesis is true and that the assumptions of a standard ANOVA are met. That way we can compare the permutation distribution against the distribution F(2, 27) when we're done, as a reality check on the validity of the program. You can reproduce my data by using the same seed I did, and you can check that the three group means are 48.85053, 49.64549, and 48.83616. The last line does the standard ANOVA, obtaining F = 0.2778 and P-value 0.7596.

    set.seed(1234)
    x1 = rnorm(10, 50, 3);  x2 = rnorm(10, 50, 3);  x3 = rnorm(10, 50, 3)
    Group = as.factor(rep(1:3, each=10));  Meas = c(x1, x2, x3)
    anova(lm(Meas ~ Group))

Third, do the permutation test. Under the null hypothesis, it ought not to matter into which of the three groups each of the 30 observations falls. So the permutation test is done by randomly permuting the data vector 'Meas' and finding the F-statistic for each permutation. In what follows, I will take the lazy way out and use the R functions 'lm' and 'anova' to find each of the F-statistics. Note that the F-statistic can be retrieved as element [1,4] of the output. This runs slowly because R fits the model and builds the whole ANOVA table at each iteration, wasting a lot of time. (It would be much more efficient to write code that finds only F at each iteration, and you should experiment with that and maybe use more iterations than I did.)

With the seed shown, my simulated permutation distribution of the F-statistic had 0.7667 of its values above 0.2778 (the F-value for our data). This is close to the P-value 0.7596 obtained from the standard ANOVA. So our permutation distribution is giving nearly the same P-value as did F(2, 27).

You can also use 'hist(f.stat, prob=T)' to make a histogram of the m values of 'f.stat', and then use 'curve(df(x, 2, 27), n=1000, add=T)' to show that the simulated distribution is very nearly F(2, 27). (Caution: Never use 'F' to represent the F-statistic. In R, the name 'F' is predefined as shorthand for 'FALSE' in logical vectors; reassigning it can result in amazing malfunctions. Would you care to guess how I know this?)

    set.seed(4321)  # just so you can exactly replicate my simulation, if you like
    m = 10000;  f.stat = numeric(m)
    for (i in 1:m) {
      perm.Meas = sample(Meas, 30)
      f.stat[i] = anova(lm(perm.Meas ~ Group))[1,4]
    }
    mean(f.stat > .2778)  # P-val for your data; compare with .7596

Finally: Now to speed things up and perhaps use a larger m (for a balanced design only): You could put the 30 data values into a 3 x 10 matrix 'MAT' after permuting them. Then use 'rowMeans(MAT)' to get the 3 group means and 'apply(MAT, 1, var)' to get the 3 group variances. From there it is trivial arithmetic to get 'f.stat[i]': with equal group sizes, F is n times the variance of the group means, divided by the average of the group variances.
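Here is a minimal sketch of that matrix approach, assuming a balanced design. The function name `f.fast` and the choice m = 10^5 are mine, not part of the original code, and the data are regenerated at the top only so that the block runs on its own.

```r
# Regenerate the example data (same seed and same order of draws as above)
set.seed(1234)
Meas = c(rnorm(10, 50, 3), rnorm(10, 50, 3), rnorm(10, 50, 3))
g = 3;  n = 10                                # groups, replications per group

# Hypothetical helper: F-statistic for a balanced one-way layout
f.fast = function(y) {
  MAT = matrix(y, nrow = g, byrow = TRUE)     # one row per group
  msb = n * var(rowMeans(MAT))                # between-group mean square
  msw = mean(apply(MAT, 1, var))              # within-group mean square (equal n only)
  msb / msw
}

f.obs = f.fast(Meas)                          # should agree with anova(): about 0.2778
set.seed(4321)
m = 10^5                                      # assumed larger number of permutations
f.stat = replicate(m, f.fast(sample(Meas)))
p.perm = mean(f.stat >= f.obs)                # simulated permutation P-value
p.perm
```

Because this skips 'lm' and 'anova' entirely, a much larger m becomes practical. The shortcut for the within-group mean square is valid only when all groups have the same size.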

Of course the next steps would be to simulate data for which the null hypothesis is not true and see if you get the right noncentral F distribution, and then try using some real data.
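As one possible starting point for the first of those steps, here is a sketch that generates data with unequal group means and compares the simulated permutation P-value with the one from the standard ANOVA. The means 50, 53, 56, the seed, the name `Meas.alt`, and the smaller m are all arbitrary choices of mine for illustration.

```r
# Sketch: data for which the null hypothesis is false
set.seed(2024)                                   # arbitrary seed
Meas.alt = c(rnorm(10, 50, 3), rnorm(10, 53, 3), rnorm(10, 56, 3))
Group = as.factor(rep(1:3, each = 10))

tab   = anova(lm(Meas.alt ~ Group))
f.obs = tab[1, 4]                                # observed F-statistic
p.F   = tab[1, 5]                                # P-value based on F(2, 27)

m = 2000;  f.stat = numeric(m)                   # small m because lm/anova is slow
for (i in 1:m) {
  f.stat[i] = anova(lm(sample(Meas.alt) ~ Group))[1, 4]
}
p.perm = mean(f.stat >= f.obs)                   # simulated permutation P-value
c(p.F = p.F, p.perm = p.perm)
```

Studying the distribution of F under the alternative would require repeating the data-generation step many times; this sketch handles only a single data set.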

Note: This is a simulated permutation test. To do a real permutation test one would have to look at all the ways to put 30 observations into 3 groups of 10 and find the F-statistic for each of them (I guess something like $5.5 \times 10^{12}$)--a hopelessly formidable task. Instead we find the F-statistic for m of the possible arrangements and trust that to give a good idea what the true permutation distribution is like. So, usually in practice, except for the most trivial examples with only tiny datasets, 'permutation test' is an abbreviation for 'simulated permutation test'.
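The count guessed at above can be checked directly in R. This is just a quick sketch; the variable names are mine.

```r
# Ways to split 30 observations into three labeled groups of 10:
# choose 10 for group 1, then 10 of the remaining 20 for group 2.
n.arr = choose(30, 10) * choose(20, 10)
n.arr                                   # on the order of 5.5e12

# Equivalent multinomial form, 30! / (10!)^3
n.arr2 = factorial(30) / factorial(10)^3
n.arr2
```

Swapping the labels of the groups does not change the F-statistic, so the number of arrangements that can give distinct F values is smaller by a factor of 3! = 6. That is still far too many to enumerate in practice.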

Addendum: You have requested the data. They were randomly generated in R using the code above. They appear below, unrounded in the first three columns and rounded to one decimal place in the last three:

          x1       x2       x3      x1   x2   x3
    46.37880 48.56842 50.40226    46.4 48.6 50.4
    50.83229 47.00484 48.52794    50.8 47.0 48.5
    53.25332 47.67124 48.67836    53.3 47.7 48.7
    42.96291 50.19338 51.37877    43.0 50.2 51.4
    51.28737 52.87848 47.91884    51.3 52.9 47.9
    51.51817 49.66914 45.65539    51.5 49.7 45.7
    48.27578 48.46697 51.72427    48.3 48.5 51.7
    48.36010 47.26641 46.92903    48.4 47.3 46.9
    48.30664 47.48848 49.95459    48.3 47.5 50.0
    47.32989 57.24751 47.19215    47.3 57.2 47.2
$\endgroup$
