Assessment Integrity in the Age of Generative AI
As generative AI transforms the technological landscape, understanding its impact on psychometric assessments has emerged as a top priority for our industry.
As AI tools become increasingly accessible—essentially available in every candidate’s pocket—organisations are keen to learn what this means for the integrity and effectiveness of psychometric assessments in high-stakes selection decisions.
Has the widespread availability of AI compromised the reliability of these assessments, or have they demonstrated resilience in the face of this technological evolution?
What We’re Seeing Now
AI’s Impact on Assessment Scores
In our ongoing analysis of client data, we’ve compared scores from previous campaigns, run before AI tools became widely available, with scores from more recent campaigns. The good news is that we’ve seen no significant increases in Aptitude or Situational Judgment Test (SJT) scores that would indicate widespread AI-assisted cheating. Our assessments are performing as expected: the absence of large jumps in candidate scores indicates no significant disruption to the integrity of the process.
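For readers curious about what such a comparison can look like in practice, the sketch below shows one simple way to test whether scores have shifted between campaigns. It is illustrative only: the data is simulated and the approach is a generic statistical example, not a description of our actual analysis pipeline.

```python
# Illustrative sketch only: simulated data, not Saville Assessment's actual analysis.
# Compares aptitude scores from a pre-AI campaign with a more recent campaign
# to check for a meaningful upward shift in the mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
pre_ai_scores = rng.normal(loc=24.0, scale=5.0, size=500)   # hypothetical earlier campaign
post_ai_scores = rng.normal(loc=24.3, scale=5.1, size=500)  # hypothetical recent campaign

# Welch's t-test: is the recent campaign's mean score significantly higher?
t_stat, p_value = stats.ttest_ind(post_ai_scores, pre_ai_scores, equal_var=False)

# Cohen's d gives a practical effect size alongside statistical significance.
pooled_sd = np.sqrt((pre_ai_scores.var(ddof=1) + post_ai_scores.var(ddof=1)) / 2)
cohens_d = (post_ai_scores.mean() - pre_ai_scores.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, d = {cohens_d:.2f}")
# A small, non-significant effect is consistent with "no large jumps in scores".
```

In a check like this, the effect size matters as much as the p-value: with large applicant volumes, even trivial differences can reach statistical significance, so a negligible effect size is the more meaningful reassurance.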
As always, we will continue to monitor client data and keep an eye on broader trends in assessment to ensure our tools remain effective as technology continues to evolve.
It’s also important to note that other assessment methods, such as asynchronous video interviews, CVs, and written response questions (like essays), have proven to be more susceptible to AI assistance than psychometric assessments. This further reinforces the resilience of our psychometric tools in maintaining assessment integrity in an age of rapidly advancing AI technology.
Why are our tests showing resilience to AI?
Aptitude
Whilst testing the vulnerability of our assessments, we have found that using AI to complete aptitude assessments is generally unwieldy and difficult to do successfully within the challenging time limits. AI achieves low scores on many of our aptitude item formats, all whilst giving users false confidence in its answers. Furthermore, as candidates do not receive feedback until after the assessment is complete, they cannot be sure whether their use of AI is helping or harming their score.
Where clients are particularly concerned about cheating with AI, our recommended approach is to use non-verbal, interactive formats. We offer a number of non-verbal aptitude assessments, such as Swift Global Aptitude. Non-verbal formats are harder to game because tools such as ChatGPT are large language models, designed primarily to understand text rather than visual content.
Situational Judgment Tests (SJTs)
The response format and scoring mechanism of our SJTs make it difficult for AI to discern a ‘correct’ pattern of answers for any given scenario. The use of a single-item response format instead of multiple choice, with candidates required to rate individual responses separately rather than concurrently, further disrupts the use of AI, as all the relevant information is not available at once.
Workplace Personality Questionnaires (e.g. our Wave assessments)
Behavioural assessments are innately more resilient to AI because there is no single correct answer or solution. There isn’t a specific ‘template’ for success for any one role, or a specific set of answers that would create a desirable behavioural profile; strengths and weaknesses across multiple behaviours can combine in different ways to create a profile that is effective for a particular role. There are also mechanisms within our behavioural assessments that increase their AI resilience. For example, the interactive format of Wave requires candidates to first rate and then rank behavioural items, making AI-assisted completion significantly more challenging. This dual-decision process, in which two independent but related judgments must be made about the same behavioural item, adds complexity and reduces the likelihood that AI can generate consistently strategic or optimised responses.
Our recommendations to fortify assessment processes:
Combined/Multiple assessment measures
Combining different assessment measures can maximise validity and improve DE&I outcomes. A combined approach also helps protect the integrity of the assessment process: making decisions based on a combination of assessment data mitigates the impact of score manipulation on any individual measure.
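As a purely hypothetical illustration of why this works, the short sketch below combines standardised scores from several measures into a weighted composite. The weights and score values are invented for the example and are not a prescription for any real scoring model.

```python
# Hypothetical illustration: inflating one measure has a diluted effect on a weighted composite.
# Weights and score values are invented for the example.
def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of standardised scores across assessment measures."""
    total_weight = sum(weights.values())
    return sum(scores[measure] * weight for measure, weight in weights.items()) / total_weight

weights = {"aptitude": 0.4, "sjt": 0.3, "personality": 0.3}

honest = {"aptitude": 50.0, "sjt": 50.0, "personality": 50.0}
inflated = {"aptitude": 80.0, "sjt": 50.0, "personality": 50.0}  # one measure manipulated

print(composite_score(honest, weights))    # 50.0
print(composite_score(inflated, weights))  # 62.0: a 30-point gain on one test moves the composite only 12
```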
Supervised testing
Another mechanism for protecting the integrity of assessment results is to conduct supervised testing to verify unsupervised scores before making an offer. This involves asking candidates to complete, or re-complete, aptitude assessments in a supervised environment at a later stage in the hiring process to verify that the assessment has been completed honestly. This could be delivered virtually via a one-to-one online video call or in person at an assessment centre.
Proctoring services
For clients with particularly strong concerns about AI cheating, we would recommend our new proctoring service which facilitates supervised testing at scale.
What can clients do?
Clients can take proactive steps to deter cheating by setting clear expectations with candidates before they begin assessments. Communicating these measures transparently can reduce the likelihood of dishonest behaviour:
Feedback & performance discussions
Inform candidates that their assessment results will be discussed later in the selection process. Knowing they may need to explain their responses and demonstrate related competencies in interviews can discourage faking or cheating.
Web cameras
Let candidates know they may be asked to enable their web camera during the assessment, even if formal proctoring is not in place. This added layer of accountability can deter dishonest practices.
Honesty contracts
Require candidates to acknowledge and agree to an honesty statement before starting the assessment. This ethical commitment can serve as a psychological deterrent, reinforcing the importance of integrity in the process.
While AI introduces new possibilities for candidates attempting to manipulate their assessment results, cheating in assessment is not a new challenge. An inherent vulnerability of unsupervised assessments is the inability to control or monitor how a candidate completes them. However, our research suggests that the vast majority of candidates approach assessments honestly, with only 10% indicating a willingness to cheat. Unsupervised assessment continues to provide recruiters with a highly efficient, accessible and cost-effective way of understanding applicant capability, and it remains a vital part of any selection and development campaign. Where further reassurance is needed, supervised assessment provides a useful means of verification.
By leveraging AI-resilient assessment design and working closely with organisations to proactively implement deterrents, test publishers can stay ahead of these challenges. While vigilance is important, the data suggests that AI-assisted cheating is not currently impacting the validity of psychometric assessments. Whilst this is reassuring, we remain dedicated to actively addressing advancements in AI, ensuring our assessments continue to accurately reflect candidates’ true abilities.
Our CEO, Tom Herde, spoke to Rab MacIver, R&D Director of Saville Assessment, on this topic. Watch the conversation below.
AUTHORS:
Laura Stewart
Managing Consultant – Saville Assessment
Jake Smith
Screening Solutions Manager – Saville Assessment
Find Out More
If you want to discuss anything you have read about in this article, or find out more about Saville Assessment’s services, talk to our expert team today.