What is a statistical test?

Statistical tests are used to make inferences about a population based on data obtained from a sample of that population. Statistical tests evaluate the relationship between two or more variables measured in a sample. In the context of adverse impact, statistical tests assess the relationship between group membership (e.g., a particular race or sex) and decision outcome (e.g., pass/fail, hired, promoted). One does not know whether adverse impact truly exists in some defined population; therefore, one makes the best decision possible based on the results obtained in a sample. Statistical tests of adverse impact estimate the probability of obtaining the observed sample results assuming there is no relationship between group membership and outcome in the population. Statistical tests of adverse impact test the following hypothesis (or null hypothesis): There is no relationship between group membership and decision outcome (i.e., subgroups do not differ in decision outcome; there is no adverse impact); any observed difference is due to chance. Given this null hypothesis, there are four possible decision outcomes, as shown in the table below:
                              Truth: no adverse impact    Truth: adverse impact
Decide "no adverse impact"    Correct acceptance          Type II error (β)
Decide "adverse impact"       Type I error (α)            Power

Correct acceptance: Correctly accepting the null hypothesis. The truth (which is unknown) is that the population does not have adverse impact, and it is decided based on the results of the statistical test that there is no adverse impact.

Power: Correctly rejecting the null hypothesis. The truth (which is unknown) is that the population does have adverse impact, and it is decided based on the results of the statistical test that there is adverse impact.

Type I error: Incorrectly rejecting the null hypothesis. The truth (which is unknown) is that the population does not have adverse impact, but it is decided based on the results of the statistical test that there is adverse impact. This is sometimes referred to as alpha error (or α).

Type II error: Incorrectly accepting the null hypothesis. The truth (which is unknown) is that the population does have adverse impact, but it is decided based on the results of the statistical test that there is no adverse impact. This is sometimes referred to as beta error (or β).

A statistically significant result is one in which the probability of incorrectly concluding that adverse impact exists (i.e., a Type I error) is less than a specified level; this specified level is referred to as the alpha level (or α). Statistical tests produce a probability value (or p-value) that estimates the probability of obtaining the sample result assuming there is no difference in the population. If the p-value resulting from the statistical test is less than the specified alpha level, the result is statistically significant, and one would decide, based on this test, that there is adverse impact. If an alpha level of .05 is chosen and the p-value resulting from the statistical test is less than .05, then there is less than a 5% probability that the difference is due to chance (i.e., less than a 5% probability of making a Type I error), and the result is statistically significant. Therefore, an advantage of statistical tests, compared to the four-fifths rule, is that tests of statistical significance can control Type I error through the chosen alpha level (the four-fifths rule is not regarded as a statistical test, does not provide a p-value to compare to an alpha level, and cannot control Type I error). The probability of making a Type I error can be decreased by lowering the chosen alpha level or increased by raising it. The choice of alpha level is somewhat arbitrary and should be determined by the question or relationship being analyzed. However, behavioral scientists have historically and consistently chosen an alpha level of .05. In addition, in the context of adverse impact, an alpha level of .05 appears to be the level recommended by the Uniform Guidelines (Question and Answer #24) and the Office of Federal Contract Compliance Programs. One disadvantage of statistical tests, compared to the impact ratio, is that statistical tests only indicate the likelihood that the differences are due to chance; they do not describe the magnitude of the selection rate differences or how meaningful those differences are (e.g., trivial differences can be significant when the sample size is large, and meaningful differences can be non-significant when the sample size is small; see Question and Answer #20).
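To make the p-value-versus-alpha comparison concrete, below is a minimal sketch in Python using entirely hypothetical hiring counts. Fisher's exact test is used here as one common significance test for a 2x2 group-by-outcome table (a Z-test of two proportions is another option); the specific counts, variable names, and choice of test are illustrative assumptions, not the document's prescribed method. The four-fifths impact ratio is computed alongside for contrast.

```python
# Sketch: significance test of adverse impact on a hypothetical 2x2 table.
# Counts are made up for illustration; Fisher's exact test is one common
# choice for small samples (a two-proportion Z-test is another).
from scipy.stats import fisher_exact

# Rows = group, columns = (hired, not hired) -- hypothetical sample
table = [[30, 70],   # minority applicants: 30 hired out of 100
         [50, 50]]   # majority applicants: 50 hired out of 100

alpha = 0.05                       # chosen Type I error rate
_, p_value = fisher_exact(table)   # two-sided test by default

print(f"p-value = {p_value:.4f}")
if p_value < alpha:
    print("Statistically significant: decide adverse impact exists "
          "(risk of a Type I error is below alpha).")
else:
    print("Not significant: decide no adverse impact "
          "(a Type II error remains possible).")

# For contrast, the four-fifths (impact ratio) rule on the same data:
minority_rate = 30 / 100
majority_rate = 50 / 100
impact_ratio = minority_rate / majority_rate
print(f"impact ratio = {impact_ratio:.2f} "
      "(the four-fifths rule flags values below 0.80)")
```

Note how the two approaches answer different questions: the p-value addresses whether the difference is plausibly due to chance, while the impact ratio describes the magnitude of the selection rate difference.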
In addition, although statistical tests can control Type I error, they are less powerful and more prone to Type II error than the 4/5ths rule and practical tests are, especially when sample sizes are relatively small and less balanced (a balanced sample would be one that has 50% minority and 50% majority, as well as 50% not hired and 50% hired). This is problematic because, if a statistical test is not significant, one cannot be certain whether adverse impact truly does not exist or whether a Type II error has occurred.
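The power limitation above can be illustrated with a rough simulation sketch. The population selection rates (30% vs. 50%) and trial counts below are assumptions chosen purely for illustration: a real disparity is built into the simulated population, and the fraction of samples in which the significance test detects it (its power) is compared against the fraction the four-fifths rule flags, at several sample sizes.

```python
# Sketch: with a true disparity in the population (hypothetical 30% vs. 50%
# selection rates), small samples often yield non-significant results
# (Type II errors), while the four-fifths rule flags the disparity more often.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(0)
alpha = 0.05
minority_rate, majority_rate = 0.30, 0.50  # assumed population rates
trials = 2000

for n_per_group in (20, 50, 200):          # applicants per group
    significant = flagged = 0
    for _ in range(trials):
        min_hired = rng.binomial(n_per_group, minority_rate)
        maj_hired = rng.binomial(n_per_group, majority_rate)
        table = [[min_hired, n_per_group - min_hired],
                 [maj_hired, n_per_group - maj_hired]]
        _, p = fisher_exact(table)
        significant += p < alpha
        # Equal group sizes, so the ratio of counts equals the ratio of rates;
        # skip the rare sample where the majority rate is zero.
        if maj_hired > 0:
            flagged += (min_hired / maj_hired) < 0.80
    print(f"n={n_per_group:4d} per group: "
          f"test power = {significant / trials:.2f}, "
          f"4/5ths rule flags = {flagged / trials:.2f}")
```

Under these assumed rates, the significance test detects the disparity in only a minority of small samples but in nearly all large ones, consistent with the point that non-significance in a small sample does not establish the absence of adverse impact.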