Sarah Toton, Psychometrician, Caveon Test Security
June Edition
Fair and Square: DOMC Findings by Demographic Group
The Data-Driven Decisions column helps you use the latest scientific research conducted by Caveon to evaluate your options and make the best decisions for your testing program.
Fairness matters in testing because it ensures that groups or individual test-takers are not systematically disadvantaged. People often ask me about the fairness of DOMC items. In this article, we will explore whether the DOMC item format produces items that are biased against specific demographic groups.

Here at Caveon, we have been doing a series of studies on innovative item types. We have now conducted and analyzed data from more than six research projects covering a whole range of topics, including geography, trivia, and technology. When we designed these experiments, we converted multiple-choice items to DOMC items. Thus, we have multiple-choice (MC) items and DOMC items that are equivalent in content and options and differ only in format. We also asked participants survey questions about their demographics, experience with multiple-choice tests, and preference for multiple-choice versus DOMC.

We directly compared the results from both item types across demographic variables such as gender, age, and ethnicity. Across all of the research projects we’ve completed so far, we see similar results in terms of bias for multiple-choice and DOMC items. Generally, neither item type shows bias, but in a few cases we have seen a bias in scores for both. For example, age is related to geography knowledge such that older participants score higher. Let’s explore this finding.

For background, let me describe the participants of the geography study. There were 250 participants in this study, but two were excluded for completing the test too quickly or taking far longer than would be needed to complete the test. There were 130 men, 105 women, and some participants who chose not to specify the gender they identified with. The survey question for age was asked as a categorical item with age ranges as options. (In later studies, we changed the format of this question to elicit exact age in years.) The options were “Under 12”, “12-18”, “19-35”, “36-55”, “56-70”, “Over 70”, and “Prefer not to answer”. Participants ranged from 12 to over 70 years of age. The number of participants in each represented age category, as well as summary statistics for the test scores in these categories, is provided in Table 1.
Copyright© 2018 Caveon, LLC.
Table 1
When we run an ANOVA with test score as the dependent variable and age category as the independent variable, we see that age is significantly related to test score (see Table 2). This is also true for DOMC score and multiple-choice score, which are simply the test scores split by item type.
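For readers who want to see what such an ANOVA computes, here is a minimal sketch of the F statistic for a one-way design. It assumes a one-way ANOVA with age category as the only factor. The function name `one_way_anova_f` and the scores are made up for illustration; they are not the study's data or analysis code.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` maps a group label to a list of scores.
    """
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = mean(all_scores)
    k = len(groups)        # number of groups
    n = len(all_scores)    # total number of participants

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    # Variation of individual scores around their own group mean.
    ss_within = 0.0
    for scores in groups.values():
        group_mean = mean(scores)
        ss_within += sum((s - group_mean) ** 2 for s in scores)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for illustration only; NOT data from the study.
hypothetical_scores = {
    "19-35": [20, 22, 19, 24, 21],
    "36-55": [25, 27, 24, 26, 28],
    "56-70": [26, 29, 27, 25, 28],
}
f_stat, df_b, df_w = one_way_anova_f(hypothetical_scores)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the spread within groups would suggest. Turning F into a p-value requires the F distribution, which a statistics package would normally supply.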
Table 2
From the descriptive statistics in Table 1, we can see that the average test score seems to increase across our age groups, and this likely accounts for the significant findings in Table 2. It is still not clear, however, which groups differ from which. A series of independent t-tests was run to determine which age groups score significantly differently from others. To ensure sufficient sample size, only age groups with 30 or more participants were assessed. The results of these contrasts can be seen in Table 3.
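The contrast procedure described above can be sketched roughly as follows: keep only the groups that meet the size cutoff, then compare every remaining pair. The article does not say which variant of the independent t-test was used, so this sketch assumes Welch's version (unequal variances allowed). The helper names and all scores are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

MIN_GROUP_SIZE = 30  # cutoff stated in the article


def eligible_pairs(scores_by_group, min_size=MIN_GROUP_SIZE):
    """Return every pair of group labels whose groups meet the size cutoff."""
    eligible = [
        label for label, scores in scores_by_group.items()
        if len(scores) >= min_size
    ]
    return list(combinations(eligible, 2))


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical scores for illustration only; NOT data from the study.
hypothetical = {
    "12-18": [18 + i % 4 for i in range(10)],  # fewer than 30, so skipped
    "19-35": [20 + i % 5 for i in range(40)],
    "36-55": [23 + i % 5 for i in range(35)],
    "56-70": [24 + i % 5 for i in range(30)],
}

for g1, g2 in eligible_pairs(hypothetical):
    t = welch_t(hypothetical[g1], hypothetical[g2])
    print(f"{g1} vs {g2}: t = {t:.2f}")
```

With three eligible groups there are three pairwise contrasts. A negative t here means the first group in the pair scored lower on average than the second.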
For test score and DOMC score, the one significant group difference was between 19-35 year olds and 36-55 year olds. For multiple-choice, the one significant group difference was between 19-35 year olds and 56-70 year olds. However, the difference between 19-35 year olds and 56-70 year olds for test score and the difference between 19-35 year olds and 36-55 year olds for multiple-choice were marginally significant. (Note that no corrections for multiple comparisons were performed. Applying Bonferroni’s correction to the three contrasts gives a p-value threshold of 0.05 / 3 ≈ 0.017, such that only p-values under this would be considered significant.) Although the pattern of results for DOMC and multiple-choice items is slightly different across our specific age categories, both show that 19-35 year olds generally score lower than older participants in the 36-70 year old range.

To investigate these findings further, we looked at how age was related to performance on each item. Chi-squared tests were computed for each item using item score and age category. The majority of the items showed no significant difference in item score based on age (30 of the 40 items, which comprised 20 DOMC and 20 multiple-choice items). However, some items did show statistically significant differences (these are presented in Table 4). Four multiple-choice items showed relationships with age that were not present for the DOMC versions of those items. Two DOMC items showed relationships with age that were not present for the multiple-choice versions of those items.
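As a rough illustration of the item-level test described above, the sketch below computes Pearson's chi-squared statistic for one item from a table of counts, with item score (incorrect or correct) in the rows and age category in the columns. The function name and the counts are hypothetical and are not taken from Table 4.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom.

    `table` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if item score were unrelated to age category.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts for one item; NOT data from the study.
# Columns are three age categories.
hypothetical_item = [
    [30, 15, 10],  # answered incorrectly
    [20, 25, 20],  # answered correctly
]
stat, dof = chi_squared(hypothetical_item)
print(f"chi-squared = {stat:.2f}, dof = {dof}")
```

The statistic grows as the observed counts drift away from what independence between age and item score would predict. Judging significance still requires comparing it against the chi-squared distribution for the given degrees of freedom.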
Table 4
Although the results of this study do suggest bias in some items related to demographics such as age, these findings make sense given that older participants have likely been exposed to more locations around the world, or at least more information about those locations. Also, this bias is often not specific to the DOMC format items, but affects the multiple-choice format items as well. For example, construction of the Chunnel in item 16 began in 1988, and it opened in 1994. If we assume that someone would have to be 16 years of age to hear about (or care about) news such as the opening of the Chunnel, then only people who were born in 1978 or before are likely to have experienced the building of the Chunnel. People born in 1978 or before are 40 or older this year (2018), supporting the general age differences in Table 2.

Across more than six studies, there are few cases in which scores are related to demographic group. However, in the cases when they are, the results generally hold for both multiple-choice and DOMC items, suggesting that this bias is not introduced by the DOMC format.
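The Chunnel reasoning above comes down to two subtractions, spelled out here as a quick check. The constant names are made up for this sketch; the values come from the article, and the minimum age of 16 is the article's own assumption.

```python
CHUNNEL_OPENED = 1994          # opening year given in the article
MIN_AGE_TO_FOLLOW_NEWS = 16    # the article's assumption
ARTICLE_YEAR = 2018            # year the article was written

# Latest birth year for someone who was at least 16 when the Chunnel opened.
latest_birth_year = CHUNNEL_OPENED - MIN_AGE_TO_FOLLOW_NEWS

# Age that person reaches in the article's year.
youngest_age_now = ARTICLE_YEAR - latest_birth_year

print(latest_birth_year, youngest_age_now)
```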
Table 3