The Detection + Reaction Lockbox
Dennis Maynes, CESP, Chief Scientist, Caveon
When Things Go "Bump" in the Night
Dennis Maynes heads up the Data Forensics division at Caveon Test Security. His current interests and emphasis are in the development and usage of testing models to test for change and aberrant patterns. He is also actively pursuing applied research in optimal sequential model selection for pattern recognition. He specializes in linear and non-linear modeling using regression, neural networks, and sequential models. He employed his skills during tenures at Wicat Systems and Wicat Education Institute, Intel, and Fonix, a speech recognition company.
Many of you undoubtedly have toolkits for various routine tasks, as well as for the occasional emergency that surfaces. But what’s in your toolbox for detecting test security problems?
There are some things that worry every testing program manager. In the context of test security, program managers worry about their items being stolen and passed around the Internet. They worry about widespread cheating, about unfriendly media coverage when breaches occur, and about many other nasty test security problems. All these “things that go bump in the night” fuel test security nightmares because they represent losses that can tear any testing program apart. Test security initiatives aim to protect against such losses. Losses that impact the integrity of the test results can occur in many ways, but they are usually the result of cheating on the tests or disclosure of the item bank.



Copyright© 2019 Caveon, LLC.
"Things that go bump in the night"¹ refers to ghosts or other supernatural beings that are believed to be the source of frightening, unexplainable noises heard at night (that often sound like something being struck or bumped). English poet Alfred Noyes includes it in his 1909 anthology The Magic Casement:

“And if that the bowle of curds and creame were not duly set out for Robin Good-fellow, why, then, 'ware of bull-beggars, spirits, etc.
"From Ghoulies and Ghoosties, long-leggety Beasties, and Things that go Bump in the Night, Good Lord, deliver us!”
References
1. The History of 'Going Bump In The Night'. (n.d.). Retrieved August 13, 2019, from https://www.merriam-webster.com/words-at-play/origin-of-things-that-go-bump-in-the-night-history

Introduction
In 2012, a national news organization published the result of an investigation into the use of recalled test questions (i.e., items from the live exams that have been compiled and shared among test takers). The journalists reported that for several years, many candidates taking a national medical certification exam used “recalls” to become board certified, even though the medical certification board had prohibited this behavior.

A test security breach of this magnitude can undermine the public’s confidence in professional certifications and licenses.



At Caveon, we like to envision the test security process as composed of three distinct areas or elements, shown in the figure below:


This article discusses the value of detection, followed by an appropriate reaction, to maintain the security of the assessment program. At Caveon, we believe that detection and reaction are best viewed as a cohesive pair or dynamic duo, and that they should be coordinated into an integrated process. After all, information without action yields no results, and uninformed action is error-prone. By using both detection and reaction hand-in-hand, a testing program can effectively address and treat test security breaches and incidents. The detection/reaction process can be visualized with at least three components: gathering evidence, assessing the evidence, and taking appropriate actions.



This process should be guided by goals, and the results of the process should be managed to ensure goals are being achieved. It is important to realize that evidence differs in quality and credibility, so decision-makers need to carefully balance the strength of the evidence against the potential for making errors. When decisions may be challenged, the testing program should document the decisions so that they meet the standard of “good faith.” In other words, the testing program has the right and responsibility to oversee and enforce testing integrity when those decisions are applied uniformly and fairly, and not in a manner that is capricious or arbitrary.

Risk
Risk represents the potential losses that a testing program may incur when its test security is breached. Risk results from the likelihood of the threats, vulnerabilities, and attacks that a testing program faces, together with the amount of damage they would cause. For example, if the entire item bank were disclosed and shared among the candidate population, it would be rendered useless. The entire item bank might need to be regenerated, at a cost that could easily exceed one hundred thousand dollars. The testing program might also be forced to suspend testing, which could inconvenience many individuals who rely upon its results to make decisions (e.g., candidates might have to postpone career plans, or state assessment officials might not be able to provide proper accountability evidence). Some losses are easily quantified (such as the cost to redevelop the item bank), while others are not measured as easily (such as the inconvenience of placing career advancement on hold).

Unless risk is assessed and understood, it cannot be managed. Detection can identify potential test security issues and quantify how likely they are to occur. It follows that a risk assessment is necessary to understand which detection procedures should be implemented and which actions should be taken when test security problems arise.

The simplest estimate of risk is expressed statistically as “expected loss,” which is the probability of the loss occurring multiplied by the magnitude of the loss. However, risk also involves an organization’s ability to weather or absorb test security losses. If potential losses overwhelm an organization’s resources (e.g., the organization cannot regenerate the item bank), the organization’s existence may be jeopardized. This should certainly be considered when conducting a risk assessment.

Expected losses, while useful when applied to large groups, may not accurately quantify risk for individual organizations. Two examples illustrate the point.

Homeowner’s insurance: Suppose that the probability that a home valued at $500,000 will be destroyed by fire is 1 in 10,000. The expected loss is $500,000 × 1/10,000 = $50. But if the house burns down, the homeowner’s loss of $500,000 will be 10,000 times greater than the expected loss. How much would the homeowner willingly pay for insurance? Certainly much more than $50.

Financial investor: In personal finance, when two investment portfolios have the same expected return, the one with the greater risk is the one whose returns fluctuate more. In this example, less volatility means greater stability and security.
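
The expected-loss arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a risk model: the function names `expected_loss` and `exceeds_capacity` are hypothetical, and the breach probability and reserve figure in the second example are made-up numbers chosen only to show why expected loss alone can understate risk for a single organization.

```python
def expected_loss(probability: float, magnitude: float) -> float:
    """Expected loss: probability of the loss multiplied by its magnitude."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * magnitude


def exceeds_capacity(magnitude: float, reserves: float) -> bool:
    """True when a single loss is larger than what the organization can absorb."""
    return magnitude > reserves


# Homeowner example from the text: $500,000 home, 1-in-10,000 chance of fire.
home_value = 500_000
home_loss = expected_loss(1 / 10_000, home_value)
print(f"Expected loss: ${home_loss:,.2f}")
print(f"Actual loss is {home_value / home_loss:,.0f} times the expected loss")

# Item bank example. The $100,000 regeneration cost comes from the text;
# the 5% breach probability and $60,000 reserve are ASSUMED for illustration.
bank_cost = 100_000
assumed_breach_probability = 0.05
assumed_reserves = 60_000
print(f"Expected loss: ${expected_loss(assumed_breach_probability, bank_cost):,.2f}")
print("Loss would overwhelm reserves:", exceeds_capacity(bank_cost, assumed_reserves))
```

In the second example the expected loss looks modest, yet a single breach would cost more than the assumed reserves. That gap is why a risk assessment should weigh the organization's capacity to absorb a loss as well as the expected loss.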



The above discussion is relevant because test security measures and initiatives require financial support. That financial support can be viewed as insurance or protection against loss that results from test security incidents. The amount of insurance (i.e., the size of the budget that is allocated toward test security) should be enough so that the organization can manage test security risks.
Detection
Often, we think of detection as discovering that test security has been—or is being—threatened. Typical detection initiatives are web searching for disclosed exam content, observing and monitoring a test’s administration (e.g., using a proctor or invigilator), and forensically analyzing test results. Other detection initiatives can also be beneficial, such as employing tip lines or using candidate surveys to gather information. Broadly speaking, detection initiatives seek to gather facts or evidence for answering questions concerning who, what, where, when, and how test security was threatened and/or breached. In this sense, detection procedures may be employed to not only learn about previously unknown incidents, but also to gather additional information about situations which might have raised concerns.



Several years ago, four students missed the final exam of a college course. They requested they be allowed to take the exam anyway. They claimed they had car trouble. They said that their car had a flat tire and its spare tire was missing. Giving them the benefit of the doubt, the professor allowed them to take the test, with one question added. The final question on the exam was, “Which tire?”
In this way, the professor learned whether the proffered excuse was legitimate.



Detection procedures should be selected to obtain information about the threats and vulnerabilities that a testing program may face. Some threats are shared across many types of testing programs; for others, the likelihood varies from program to program. For example, a vulnerability exists with assessments in public schools when teachers administer tests to their own students and the teacher has an incentive to help students while they take the test (e.g., the teacher’s salary may be affected by student test results). This same vulnerability will probably not exist at a test site where professionals take credentialing exams proctored by employees of the testing vendor. On the other hand, when the test results affect an individual’s career progression, the candidate may be motivated to search the Internet, purchase disclosed content, and use that information to pass the test. Elementary school students probably do not have the same motivation.

Test security threats can be broadly split into two areas: item theft and cheating.

In general, item theft is best detected by obtaining a tip from a test taker or by searching for the exam content (e.g., Internet searches of test preparation sites and social media sites). Test-session information encoded into the questions that were presented to a test taker can help identify the individual who was responsible for the theft. Additional evidence obtained by proctors (e.g., confiscated electronic recording devices or visual records of theft-in-progress) can help detect item theft. On occasion, inferences concerning item theft may be drawn from test result data.

Cheating is best detected by obtaining information from test takers, from testing irregularity reports, and by analysis of the test result data. Some types of cheating are more easily detected than others. For example, answer copying and tampering with exam answers are more easily detected than exam preknowledge or the use of a proxy test taker.



The following interaction occurred in 2014:

Testing program manager: “I found a set of our test items on the Internet. Our items are delivered in random order to each test taker. What is the likelihood that the items on the Internet would be in the exact same order as the delivery of the items to a certain test taker?”
Me: “How many questions were given?”
Testing program manager: “Sixty-five questions.”
Me: “The probability is one chance in 65 factorial, which is less than one chance in 10 to the 90th power.”
Testing program manager: “Is that strong enough evidence to infer that the candidate who saw the test in that specific item order was involved in the theft and disclosure of the exam questions?”
Me: “Absolutely!”

In this situation, test session information (i.e., item order) encoded into the exam delivery identified at least one person involved in the theft of the exam questions.
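
The figure quoted in the exchange above can be checked directly. The sketch below assumes that every ordering of the 65 items is equally likely for each test taker, an assumption about the delivery engine that the article does not spell out. The function name `same_order_probability` is hypothetical.

```python
import math


def same_order_probability(n_items: int) -> float:
    """Chance that n items, shuffled uniformly at random, land in one specific order."""
    return 1 / math.factorial(n_items)


n = 65
orderings = math.factorial(n)  # number of distinct orderings of 65 items

print(f"{n}! has {len(str(orderings))} digits")
print("65! exceeds 10^90:", orderings > 10**90)
print(f"Probability of a chance match: {same_order_probability(n):.3e}")
```

Under the same assumption, multiplying this probability by the number of candidates who took the exam gives an upper bound on the chance that any one of them matched the posted order by coincidence. For any realistic candidate pool, that bound remains vanishingly small.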
Reaction
Reactions to test security breaches must be guided by the evidence that was obtained. The following questions are relevant in assessing the evidence of a potential test security breach:

1. How strong or credible is the evidence?
2. What additional evidence should be gathered?
3. What inferences can be drawn from all the evidence?
4. Are the inferences reasonable and legally defensible?

In 2010, a testing vendor conducted a data forensics analysis of state assessment results for the previous year, at the request of the state department of education. The requested report was received and filed without making any decisions. A year later, an education news writer obtained a copy of the report and published the results, suggesting that the report had been “buried” intentionally. Because of this statement and other allegations, the state department of education was impelled to hire several investigators to determine whether improprieties had occurred.
Failure to act upon evidence can be damaging and harmful. Failure to obtain evidence can be damaging and harmful, also.

After compiling the evidence (and obtaining more evidence through investigation, if feasible), decisions should be made concerning appropriate actions. Testing programs can elect to react in several different ways.

For test theft:

1. Seek redress through copyright infringement activities.
2. Retire and replace the compromised items.
3. Identify the thief and impose program sanctions or seek legal remedies.

For cheating:

1. Invalidate test scores.
2. Impose program sanctions such as banning and probation.

It should be remarked that it is not always possible to establish iron-clad inferences or “proof” that an individual cheated or that an individual was responsible for the theft of test items. Even so, the evidence may allow inferences that the test items are no longer reliable, or that the test scores are not valid. Hence, it is important to realize that some reactions can be made to address program elements (e.g., test taker agreements, replacement of test items, and invalidation of test scores), and that other reactions can be made to address the behaviors of individuals (e.g., ethics violations and copyright infringement). It may be that after appropriate review and consideration, no further action will be taken. Even so, review of the evidence followed by any decision is still a reaction.

Once a decision has been made, and a course of action has been selected, the evidence and decisions should be documented. Care should be taken in the documentation of the program’s reaction to the security breach. For example, “cheating” should not be stated as a conclusion, unless the evidence and the decision were based on that inference. To make inferences about an individual’s behavior, you will need information about that behavior; however, inferences about the validity of a test score or the disclosure of a test item can be drawn without behavioral evidence. Of course, decisions that address individual behavior need more care and support than decisions that address test scores and test items.

Conclusion
A testing program that uses detection procedures must define and follow procedures for making decisions about the gathered evidence and taking appropriate action. The only way to deal with test security breaches is through the test security process of detection and reaction; these two activities are intertwined and interconnected, and both are required to ensure test score validity. Specific detectors and reactions should be carefully selected and implemented to address vulnerabilities, threats, and risks to the testing program. Any test security actions or reactions must be supported by reasonable and defensible inferences.

In conclusion, the detection and reaction process is a critical element of test security that should never be neglected or overlooked. Each testing program should determine how they can effectively detect and react to security breaches in a way that achieves their individual test integrity goals.