Nikki Eatchel, Chief Assessment Officer, Scantron
June Edition
Ask an Expert:
Nikki Eatchel
Nikki Eatchel, Chief Assessment Officer for Scantron, has more than 23 years of experience in the assessment industry. She has spent the majority of her career focused in the areas of assessment development and psychometrics and has served in executive leadership positions for these areas in a number of global assessment organizations. In her positions she has worked in various testing segments, including education, certification and licensure, and employment testing. She has also served as an assessment and business consultant for various national and international organizations. She has been personally responsible for large-scale assessment development for international programs as well as state-wide clients, working in some capacity with all 50 states within the U.S. She has added to her well-rounded approach to the assessment industry by serving in leadership positions in program management, client support, and product development. She has also worked in the arena of employment litigation support, preparing testimony for teams representing both the prosecution and the defense. Nikki is currently serving on the Executive Board for the Association of Test Publishers (ATP) and was Chairman of the Board in 2017. She has also served as the Chair of the Security Committee in 2011-2012 and the Co-Chair of the Security Committee for 2013-2014. Additionally, she has contributed on a number of industry committees, including the Operational Best Practices Committee for ATP, and has presented numerous papers at such conferences as ATP, E-ATP, the Council of Chief State School Officers (CCSSO), the International Personnel Management Association (IPMA), and the Council on Licensure, Enforcement, and Regulation (CLEAR).
What are the most pressing issues that contribute to unfairness in testing today?
"As an assessment person I see immense value in testing for all stakeholder groups involved. As with any industry, however, there is certainly the potential for unfairness. Though unfairness could stem from a number of factors, there are two particular issues that are a focus for me. The first is the use of tests for purposes for which they were not intended. Assessment development is a complex process that requires a clear purpose and a specific evaluation target (e.g., an identified content domain, a particular set of knowledge, skills, or abilities). When evaluating the validity and psychometric soundness of an examination, that evaluation is directly linked to the stated purpose of the test (including the knowledge, skills, and abilities intended to be evaluated, and the population for which the exam was developed).  Unfortunately, assumptions are sometimes made about alternative uses for tests (and the resulting data) that are not supported by research. Such a situation can certainly lead to unfairness in testing. For example, using an assessment that was designed to measure student performance as a measure of teacher effectiveness can lead to unfair decisions about educators if the appropriate research has not been conducted to support that use. In addition, requiring a general aptitude test for a specific, skill-based job can be detrimental to candidates if that assessment has not been shown to be directly related to successful job performance. Assessments can be fantastic tools providing valuable information for decision making. When they are used and interpreted incorrectly, however, this creates an unfair testing environment. A second issue that can create unfairness is the reliance on singular data points to make complex and impactful decisions about students and candidates. Assessments used for high stakes decisions (clinical, educational, employment, certification) have to be valid, reliable, and legally defensible. 
Assuming that an assessment meets those requirements, it is still highly risky to assume a singular data point can provide all of the information necessary to make a sound decision. As individuals in assessment and measurement are driven by data, they understand the limitation of a single data point. But there are a variety of stakeholders that purchase, administer, and use assessments that may not have the education or training to understand the need for a holistic view of student and candidate performance. Though portfolio approaches to student evaluation have increased over the last decade, as has the use of different types of candidate assessments (e.g., written, practical, simulation, performance-based, gaming), over-reliance on singular data points (potentially due to time and cost considerations) is still a threat to fair assessment practices."
What audiences are most impacted by a lack of fairness in testing?
"A lack of fairness in testing can cause substantial impact on all audiences, including the assessment owners, the test takers, and the general public."
What can testing programs, both big and small, do to mitigate these issues and protect the populations most affected by them?
"Testing programs of all sizes can focus on a three-pronged approach to test fairness: (1) Design, (2) Documentation, and (3) Education.

1. Design:
If a testing program is responsible for developing assessments (either internally or with an external testing vendor), ensuring the design and development plan followed are appropriate, research-based, and support the use of the assessment is critical. That process includes validating the assessment for the intended purpose and publishing a corresponding Technical Report. Following industry best practices during the development process and the evaluation of the psychometric soundness of the instrument provides the foundation for a fair assessment.

2. Documentation:
Many (if not most) assessment stakeholders will fail to read the full Technical Report available for an assessment. This is particularly true if they are users (versus owners) of the assessment product. Based on this practical consideration, it is particularly critical for assessment development institutions to create user-friendly and abbreviated documentation regarding the intended purpose and appropriate use of the exam. Documentation should be created that is aimed at various stakeholder groups (e.g., administrators versus candidates) to ensure that all major groups associated with an assessment are provided with accurate information about the test.

3. Education:
Documentation in and of itself is not enough. Education campaigns and training specifically designed to educate stakeholder groups regarding the appropriate use of tests and interpretation of test data are an important component of test fairness. Using a variety of communication methods across multiple delivery channels is the recommended approach. In addition, being very specific in those communications about appropriate and inappropriate uses of a test is vital."
How do security weaknesses contribute to unfairness?
"An underlying assumption regarding the assessment process is that it is standardized for candidates and students. Part of that standardization includes the restriction of unauthorized access to exam information prior to and during the test session. If some test takers have access to information regarding exam items prior to the test (due to a security weakness), fairness is compromised as some test takers will have an advantage over others. If other security issues occur (such as proxy testers), fairness to the overall testing population is again compromised. Strong security protocols and ongoing diligence throughout the assessment development, delivery, and reporting process are crucial to a fair and equitable testing environment. There is also another consideration in the assessment process that isn’t always labeled or identified in the same way -- fairness to the general public. Members of the public make an assumption when they interact with someone who is licensed or certified in a given profession. The underlying assumption is that the person who they are relying on for expertise and service is qualified to provide it. Security weaknesses in the assessment process increase the likelihood of false positives (individuals passing an exam when they should have failed based on their knowledge, skills, and abilities). False positives can actually endanger the public, allowing unqualified candidates to work in a profession in which they have not truly demonstrated competency. In that case, it is the general public that is being treated unfairly and put at risk. Whether the security weakness creates an unfair situation for candidates or for the general public, the impacts can be substantial."
What security measures have had the biggest impact on making tests more fair?
"The continued evolution of assessment security has had substantial impact on assessment fairness, creating a more standardized and accurate evaluation for all test takers. Biometrics employed for confirmation of candidate identity have been extremely useful tools during the test registration and administration process. Remote proctoring capabilities have expanded security options for a variety of testing programs unable to take advantage of brick-and-mortar test centers. Data forensics has also provided a wealth of information to testing programs to reduce cheating, enhance the accuracy and reliability of test scores, and improve the fairness of the overall assessment process."
What role do you think technology can play in making tests more fair? Specifically, which technologies do you think have been the most effective?
"From a proctoring standpoint, introduction of biometrics (such as facial recognition) has provided important checks and balances to ensure the security of assessments and the integrity of the test scores.
At a more foundational level, I think the introduction of computer-based assessments (and more specifically, computer-adaptive tests or CAT) created dramatic changes and enhancements in regard to candidate fairness. Though there are many advantages to a CAT exam, one advantage involves the removal of irrelevant questions that are clearly far above or far below the ability level of the test taker. Avoiding such questions not only makes the assessment shorter (reducing test fatigue) but it also helps to minimize extraneous variables such as boredom (with questions that are far too easy) or frustration (with questions that are far too hard) which can lead to less than optimal test taking behaviors and increase error in test scores. In addition, CAT provides increased measurement accuracy for test takers whose performance falls outside the average range within the ability distribution. Just these few advantages of a CAT exam have contributed greatly to the reduction of error in test scores and therefore the fairness of the assessment process."
Looking forward ten years, in what ways do you think that innovation and technology will continue to impact the fairness of tests?
"Technology does not in and of itself create a fair assessment process. Technology, when implemented thoughtfully, can provide valuable tools to support and enhance fairness. As mentioned above, there are a variety of techniques employed in assessment proctoring (on-site and remote) and data analysis that are grounded in technology and are designed to increase security and fairness (and have, over the years, done just that). There have also been other technology advances that are likely to have significant impact on testing fairness in the years to come, including AI, Analytics, and Game-Based Assessments:
Artificial Intelligence:
The use of Artificial Intelligence (AI) within an assessment environment is nothing new from a scoring perspective. Advances in AI and the ongoing evolution of computer technology have resulted in the significant growth of AI use over the last decade. Though the use of AI for constructed response scoring is not without issue, AI provides the advantage of consistent, objective, and reproducible scores across the test taking population (which has certainly had a positive impact on test fairness).
Analytics:
Advances in big data and analytics technology allow for the expansion of traditional assessment information to include more holistic evaluations of student and candidate capabilities. With strong analytics programs, test scores can be easily evaluated in a broader context with a variety of additional data points (other assessments, grades, attendance, surveys, on-the-job performance, etc.). These types of evaluations allow decision makers to utilize more complex (and likely more accurate) evaluations of an individual’s knowledge, skills, and abilities. Combining those capabilities with predictive analytics opens up a world of possibilities in which assessment data can not only be used to inform stakeholders about performance today, but also to provide valuable information and assistance to students and candidates for the future.
Embedded (Game-Based) Assessments:
With the unprecedented advancement in technology over the last 20 years, there is an opportunity for assessment to dramatically evolve as well. One of those exciting areas in our industry is Embedded (Game-Based) Assessments. Game-based assessments provide an opportunity to evaluate test takers within an environment significantly different than a traditional testing session – an environment simultaneously focused on different aspects of a test taker's knowledge and skill, such as recall of domain-specific content knowledge, real-time knowledge application and problem-solving processes, social collaboration, and contextual understanding. From a measurement perspective, the challenge is in identifying the right data sets within the game-based information trails, determining the best way to use the data for purposes of evaluation of particular knowledge or skills, and establishing the process for making psychometrically sound decisions regarding a test taker's ability level. None of these are small challenges, but as we solve those challenges, the resulting increase in accuracy regarding test taker performance will surely enhance the fairness of assessments."
In the past decade, what measures taken by the testing industry have been most effective in promoting fairness? In what ways can we do better?
"The testing industry is appropriately focused on promoting fairness throughout the entire assessment process. Fairness is built into every aspect of test design and development with adherence to industry best practices, involvement of a diverse set of subject-matter experts, the use of bias and sensitivity reviews, and the deployment of security procedures designed to ensure that every test taker has a level playing field and the best opportunity to accurately display their knowledge. That being said, we can always do better. Our industry can sometimes fall down at the point of communication – particularly when it comes to designing that communication in a format and vocabulary that is understood by the test taking population and the general public. We need to get better at providing clear and understandable messaging (through a variety of channels) about our constant vigilance in regard to fairness and accuracy. And as technology continues to advance at lightning speed, we need to find ways to more quickly innovate our approaches to assessment development and delivery, without sacrificing quality and accuracy of the tests."
Moving away from the topic of fairness, what are three of the greatest advances in testing in the past 10 years?
"There is so much great work going on in our industry, it’s terribly hard to pick the top three. As mentioned, I think AI, analytics, and game-based assessments are extremely exciting. They have the capability to change the face of our industry. The question remains whether or not we can find ways to incorporate and evolve these advancements while maintaining the integrity (and interpretability) of assessment scores."
What are the biggest issues, besides fairness, that are facing the testing industry right now and what do you think we should be doing to address them?
"In the past decade, the assessment industry has seen a substantial increase in negative public messaging regarding the value of testing. Unfortunately, it took extreme actions by the general public (such as the Opt-Out movement, the protest of re-certification requirements, etc.) to get assessment and measurement professionals actively engaged in the public conversation. Though the tempo of the negative public messaging ebbs and flows, it is critical for assessment professionals and assessment organizations to contribute more to the education of the general public about the pervasive, global value that this industry brings to the table. If we fail to promote our advances and new assessment approaches that come along with the technology revolution, the era of the Do-It-Yourself/YouTube/Wikipedia generation will no longer appreciate the structured process, the variety of expertise, the measurement precision, and the technology support that are needed for valid and fair assessment."
What are you most excited for in the coming years? What changes are you looking forward to in this industry?
"I love the assessment industry because it is always changing and it’s always providing opportunity for growth. Whether you’re focused on AI, Deep Learning, Predictive Analytics, Game-Based Assessments (or any number of other exciting assessment areas), there is room to learn, to innovate, and to evolve. However, we often struggle in the assessment industry to keep up with innovation due to the high stakes and high risks associated with what we do. But we need to continue to push the envelope and evolve our assessment and measurement approaches. Our evolving world needs evolving assessments. Innovation takes struggle, but I look forward to the struggle."
Copyright © 2018 Caveon, LLC. All rights reserved.