The Detection + Reaction Lockbox
Carol Eckerly, Ph.D., Associate Psychometrician, Educational Testing Service (ETS)
Detecting Compromised Items in Computerized Adaptive Testing



The increased use of computerized adaptive testing (CAT) has benefited test sponsors and test takers alike. Because most adaptive tests are offered on a continuous basis, test takers can generally schedule a test at their convenience. Additionally, test takers of varying abilities can be measured with similar precision using CAT, unlike with fixed linear forms. CAT also has the potential to help from a security perspective: given a sufficiently large item bank, there will be very little overlap among test takers' exam forms. This can make it harder for test takers to copy from each other or to benefit from item pre-knowledge (i.e., having access to live test content prior to the test).

However, continuous testing also increases some security risks. Test takers who take the exam early in the test window can share content with test takers later in the window, and organized item theft can severely jeopardize the validity of exam scores. Because of these risks, procedures to monitor the performance of test items over time are necessary to identify potential compromise of test content. Liu, Han, and Li (2019) recently introduced a method built on a model of a hypothesized process in which items become compromised and are subsequently exposed, prior to the exam, to more and more examinees as time passes (a simple illustrative sketch of this kind of process follows the list below). The model is used to:
  • Flag potentially compromised items
  • Estimate the time point at which the item was compromised
  • Estimate the rate at which the examinee population is exposed to the item
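
To make that hypothesized process concrete, the minimal sketch below simulates responses to a single item under assumed parameters: the item is compromised at a point in a testing window scaled to [0, 1], the proportion of examinees exposed to it then grows at some rate, and exposed examinees answer correctly more often than the item's difficulty would predict. The function name, the exposure-growth form, and all numeric values here are illustrative assumptions for intuition only; they are not the functional form or estimation procedure used by Liu, Han, and Li (2019).

```python
import numpy as np

def simulate_item_responses(n_examinees=2000, compromise_time=0.3,
                            exposure_rate=5.0, p_normal=0.55,
                            p_preknowledge=0.95, seed=42):
    """Sketch of a hypothesized compromise-and-exposure process for one item.

    Examinees arrive uniformly over a testing window scaled to [0, 1].
    Before `compromise_time` no one has pre-knowledge; afterwards the
    proportion of exposed examinees rises toward 1 at `exposure_rate`.
    Exposed examinees answer correctly with probability `p_preknowledge`
    instead of the item's normal probability `p_normal` (all assumed values).
    """
    rng = np.random.default_rng(seed)
    t = np.sort(rng.uniform(0.0, 1.0, n_examinees))   # administration times

    # Proportion of the population exposed at each time point: zero before
    # the compromise, then an exponential-style rise toward 1.
    exposed_prop = np.where(
        t < compromise_time,
        0.0,
        1.0 - np.exp(-exposure_rate * (t - compromise_time)),
    )
    has_preknowledge = rng.uniform(size=n_examinees) < exposed_prop
    p_correct = np.where(has_preknowledge, p_preknowledge, p_normal)
    responses = (rng.uniform(size=n_examinees) < p_correct).astype(int)
    return t, responses

# A naive monitoring check: the item looks easier late in the window.
t, responses = simulate_item_responses()
early, late = responses[t < 0.5].mean(), responses[t >= 0.5].mean()
print(f"early proportion correct: {early:.2f}, late: {late:.2f}")
```

A model like the one described above works in the opposite direction: given the observed drift toward easier responses, it estimates when the compromise plausibly began and how quickly exposure spread.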
In test security research, investigators often simulate test response data to gain insight into how proposed methods perform in terms of detection rates and false positive rates across different conditions of interest. For methods that identify compromised items, the detection rate is the percentage of truly compromised items that the method flags, and the false positive rate is the percentage of non-compromised items that the method falsely flags. In the absence of real data sets in which the compromised items are known, simulation studies are a useful way to assess both rates.

Liu, Han, and Li (2019) simulated test data to evaluate the performance of their detection model under conditions of both organized item theft occurring early in the testing window and random item leakage occurring throughout the testing window. Their simulation study showed that detection rates were high and false positive rates were close to the expectation implied by the specified flagging criteria, across a range of simulated speeds at which items become exposed to test takers.

While this study presents a promising method for detecting compromised items in a CAT framework, practitioners should keep in mind that the model represents a simplified hypothesized process of exposure and score gain due to item compromise, which will not fully capture the true process of compromise. Additionally, item compromise may not be the only reason an item is flagged by this procedure. For a variety of reasons, an item's statistical properties may change over time, a process known as "item drift." This method flags items that become easier over time, which may occur because of item compromise or for another reason. However, even when item drift occurs that is not due to compromise, the method could still be a useful quality control tool for monitoring item performance over time.
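
Because the two evaluation rates are defined directly above, they are simple to compute once a simulation records which items were truly compromised and which the method flagged. The short sketch below assumes two hypothetical boolean arrays over the item bank; the function name and toy numbers are illustrative, not taken from the study.

```python
import numpy as np

def detection_and_false_positive_rates(truly_compromised, flagged):
    """Compute the rates as defined above.

    `truly_compromised`: which items were simulated as compromised.
    `flagged`: which items the detection method flagged.
    """
    truly_compromised = np.asarray(truly_compromised, dtype=bool)
    flagged = np.asarray(flagged, dtype=bool)

    # Detection rate: share of truly compromised items that were flagged.
    detection_rate = flagged[truly_compromised].mean()
    # False positive rate: share of non-compromised items that were flagged.
    false_positive_rate = flagged[~truly_compromised].mean()
    return detection_rate, false_positive_rate

# Toy example: a 10-item bank with 3 compromised items, 2 of which are caught
# and 1 clean item falsely flagged.
compromised = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], dtype=bool)
flags       = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0], dtype=bool)
dr, fpr = detection_and_false_positive_rates(compromised, flags)
print(f"detection rate = {dr:.2f}, false positive rate = {fpr:.2f}")
```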
References
Liu, C., Han, K. T., & Li, J. (2019). Compromised item detection for computerized adaptive testing. Frontiers in Psychology, 10, 829. doi:10.3389/fpsyg.2019.00829