NAME

eutest - exact U test


SYNOPSIS

eutest [-u] [-m [samples]] [-t [seconds]] [files]


DESCRIPTION

eutest compares two datasets for a difference in position by an extension of the Mann-Whitney U test (also known as the Wilcoxon rank sum test). The measurements in the two datasets are pooled and ranked from 1 to N, where N = n1 + n2 is the total number of measurements in the two datasets, then the ranks of the measurements in one dataset are summed. (When measurements are equal, each is given the average of the ranks they jointly occupy.) Statistical significance is then determined by reassorting the measurements between the two datasets and determining what proportion of reassortments give rank sums less than or equal to the actual value, and what proportion give rank sums greater than or equal to the actual value.
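
The ranking and reassortment steps above can be sketched in Python as follows. This is only an illustration of the procedure as described, not eutest's actual source; the function names midranks and exact_tails are hypothetical, and doubling the ranks (so that tied averages such as 1.5 become integers) is a convenience of this sketch.

```python
from itertools import combinations

def midranks(values):
    """Rank values from 1 to N; tied values each get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def exact_tails(a, b):
    """Percentages of reassortments whose rank sum for the first dataset
    is <= the actual rank sum, and >= the actual rank sum."""
    ranks = midranks(list(a) + list(b))
    doubled = [round(2 * r) for r in ranks]  # integers, so comparisons are exact
    actual = sum(doubled[:len(a)])
    le = ge = total = 0
    # Every way of choosing which len(a) measurements form the first dataset.
    for subset in combinations(doubled, len(a)):
        s = sum(subset)
        le += s <= actual
        ge += s >= actual
        total += 1
    return 100.0 * le / total, 100.0 * ge / total
```

Because a reassortment whose rank sum equals the actual value is counted in both tails, the two percentages can add up to more than 100.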

The input consists of a series of lines, each containing one dataset. A line consists of a title followed by numerical values. The title and numerical values are separated from each other by whitespace (blanks or tabs). At present there is no way to include whitespace in a title. Input is read from the file named in the command line if there is one, or from standard input if no file is named. If more than one file is named, they are read in order and logically concatenated. An example of an eutest input file is:

        I5-     8 10 9 8.5 9 8 9
        intact  12 9 11 10 11 8 10
        M3-     11 12 12 10

Every possible pairwise comparison of the datasets is done. (In this case there are three: I5- vs intact, I5- vs M3-, and intact vs M3-.) The output contains one line for each comparison. Each line begins with the titles of the two datasets compared, followed by the percentage of reassortments that give a rank sum less than or equal to the actual rank sum, followed by the percentage that give a rank sum greater than or equal to it. The first dataset is significantly less than the second at level alpha in a one-tailed test if the first percentage is less than alpha. The first dataset is significantly greater than the second in a one-tailed test if the second percentage is less than alpha. The two datasets are different in a two-tailed test if either percentage is less than alpha/2.
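
As a rough illustration of the input format and the decision rules above, the following Python sketch parses the example input, lists the pairwise comparisons in order, and applies the alpha thresholds to a pair of percentages. The names parse_datasets and interpret are hypothetical, skipping blank lines is an assumption of the sketch, and alpha is expressed as a percentage to match the output columns.

```python
from itertools import combinations

def parse_datasets(lines):
    """Each non-blank line: a title, then whitespace-separated numbers."""
    datasets = []
    for line in lines:
        fields = line.split()  # splits on blanks and tabs
        if not fields:
            continue  # assumption: blank lines are ignored
        datasets.append((fields[0], [float(x) for x in fields[1:]]))
    return datasets

def interpret(pct_le, pct_ge, alpha_pct=5.0):
    """Apply the one-tailed and two-tailed rules to the two output percentages."""
    return {
        "first_less_one_tailed": pct_le < alpha_pct,
        "first_greater_one_tailed": pct_ge < alpha_pct,
        "different_two_tailed": pct_le < alpha_pct / 2 or pct_ge < alpha_pct / 2,
    }

sample = """\
I5-     8 10 9 8.5 9 8 9
intact  12 9 11 10 11 8 10
M3-     11 12 12 10
"""

datasets = parse_datasets(sample.splitlines())
# Titles of every pairwise comparison, in input order.
pairs = [(a[0], b[0]) for a, b in combinations(datasets, 2)]
```

Here pairs holds the three comparisons named above: I5- vs intact, I5- vs M3-, and intact vs M3-.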

eutest has two modes: exact and Monte Carlo. Exact mode is the default. In this mode all possible reassortments of the data are examined to determine exact significance levels. The number of such reassortments is N!/(n1! n2!), where n1 and n2 are the sizes of the two datasets, so exact mode becomes impractically slow as the datasets become large. Monte Carlo mode, specified by the -m option, can be used for large datasets. In this mode reassortments are generated at random. If a number is given following the -m option, it is the number of random reassortments to check. If no number is given it defaults to 10000, which is usually sufficient for distinguishing significant, highly significant, and non-significant results.
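
To give a feel for how quickly exact mode grows and what Monte Carlo mode does instead, here is a minimal Python sketch. It is not taken from eutest; the names reassortment_count, doubled_midranks and monte_carlo_tails are hypothetical, and drawing each random reassortment independently with random.sample is an assumption about how the sampling could be done.

```python
import math
import random

def reassortment_count(n1, n2):
    """N!/(n1! n2!) with N = n1 + n2: the work exact mode must do."""
    return math.comb(n1 + n2, n1)

def doubled_midranks(values):
    """Twice the average rank of each value: 2*(count below) + (count equal) + 1.
    Doubling keeps tied ranks as integers."""
    return [2 * sum(w < v for w in values) + sum(w == v for w in values) + 1
            for v in values]

def monte_carlo_tails(a, b, samples=10000, seed=None):
    """Estimate the two tail percentages from random reassortments."""
    rng = random.Random(seed)
    ranks = doubled_midranks(list(a) + list(b))
    actual = sum(ranks[:len(a)])
    le = ge = 0
    for _ in range(samples):
        # Pick which measurements play the role of the first dataset.
        s = sum(rng.sample(ranks, len(a)))
        le += s <= actual
        ge += s >= actual
    return 100.0 * le / samples, 100.0 * ge / samples
```

For two datasets of 7 measurements each, reassortment_count(7, 7) is 3432, which is easily enumerated; for two datasets of 20 each it already exceeds 10^11, which is where a fixed budget of random reassortments becomes attractive.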

The -t option sets a time limit on each comparison. If more than the specified time elapses, that comparison is aborted and eutest goes on to the next. This option is useful when some of the pairwise tests are too time-consuming to be practical. The default (if -t is specified without a number) is 60 seconds.
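
One way a per-comparison time limit could interact with Monte Carlo sampling is sketched below in Python: the loop stops at the deadline and reports percentages based on the samples evaluated so far, which is the behaviour described for -t under OPTIONS. The function timed_monte_carlo and its return convention (None when no sample was completed) are hypothetical and are not taken from eutest.

```python
import random
import time

def timed_monte_carlo(ranks, n1, samples=10000, seconds=60.0, seed=None):
    """ranks: pooled ranks, with the first n1 belonging to the first dataset.
    Returns (pct_le, pct_ge, samples_done), or None if time ran out at once."""
    rng = random.Random(seed)
    actual = sum(ranks[:n1])
    deadline = time.monotonic() + seconds
    le = ge = done = 0
    while done < samples and time.monotonic() < deadline:
        s = sum(rng.sample(ranks, n1))  # one random reassortment
        le += s <= actual
        ge += s >= actual
        done += 1
    if done == 0:
        return None  # nothing evaluated before the limit
    return 100.0 * le / done, 100.0 * ge / done, done
```

An exact enumeration stopped partway through has not visited the reassortments in random order, so its partial counts would not be a fair estimate; that is consistent with exact mode reporting no result when the limit is reached.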


OPTIONS

-m [samples]
Calculate P by the Monte Carlo method, checking samples random reassortments (default 10000).

-t [seconds]
Abort each comparison after seconds seconds (default 60). In exact mode, no results are reported for an aborted comparison. In Monte Carlo mode, P values are reported based on the samples already evaluated.

-u
The -u option tells eutest not to calculate probabilities, but instead to write, for each pair of datasets, numbers n1, n2, and U that can be looked up in a U statistic table.
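
For reference, the conventional Mann-Whitney U statistics can be computed as in the Python sketch below. Which value eutest writes with -u (U for the first dataset, for the second, or the smaller of the two, which is what many tables expect) is not stated here, so the sketch returns both; the function name u_statistics is hypothetical.

```python
def u_statistics(a, b):
    """Return (n1, n2, U1, U2) for datasets a and b.

    U1 counts the pairs (x from a, y from b) with x > y, with ties counted
    as one half. This equals R1 - n1*(n1 + 1)/2, where R1 is the rank sum
    of a in the pooled ranking, and U1 + U2 = n1*n2.
    """
    n1, n2 = len(a), len(b)
    u1 = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    u2 = n1 * n2 - u1
    return n1, n2, u1, u2
```

When looking a result up in a table of critical values, check which convention the table uses before choosing between U1, U2, and the smaller of the two.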