We get asked by a lot of our students, “How is my UCAT mark scaled?” And to be honest, there’s not a lot of information out there about this. In this article, we’re going to clear up some of the mystery about the likely process used.
You're probably already aware that the UCAT consists of five subtests. They are:
After you’ve sat the UCAT, your results will look a little bit like this:
|Table: Sample UCAT ANZ Subtest Scores|
Each subtest has its results presented as a scaled score between 300 and 900.
The first four subtests are all related – they test different cognitive abilities. The scaled scores from these tests are added together to give a score between 1200 and 3600.
The Situational Judgement subtest – which tests emotional intelligence and interpersonal skills rather than cognitive abilities – is presented separately.
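As an illustration (the subtest scores below are made up), the cognitive total is simply the sum of the four cognitive scaled scores:

```python
# Hypothetical scaled scores for the four cognitive subtests (each 300-900).
cognitive_scores = {
    "Verbal Reasoning": 620,
    "Decision Making": 710,
    "Quantitative Reasoning": 680,
    "Abstract Reasoning": 650,
}

# Total cognitive score: the sum of the four, between 1200 and 3600.
total = sum(cognitive_scores.values())
print(total)  # 2660
```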
In a word, fairness.
Essentially, UCAT scaling achieves two important things:
The five subtests are quite different:
Without scaling, the test results for each would be comparing apples to tuna, not even oranges!
Most UCAT questions are worth one mark (the exceptions are the partially marked questions in Situational Judgement and the two-mark questions in Decision Making). Without scaling, the easiest and the hardest questions would be worth exactly the same, and the test would no longer be meaningful in assessing the skills of a vast cohort.
The UCAT is scaled using a method called Item Response Theory (IRT), a method of psychometric testing. The UCAT Consortium has referred to its use of IRT in previous UCAT Technical Reports, but it hasn't specified which IRT model it uses.
Item Response Theory is used to estimate a student's ability. It takes into account the student's responses and the difficulty of each question relative to the other questions. As a result, the scaled mark doesn't simply reflect the student's raw mark; instead, it represents their estimated ability, showing their performance in comparison to their peers.
In IRT, a correct answer on a harder question is worth more than a correct answer on an easier question. This means that two students with the same raw mark might well receive very different scaled marks!

The difference depends on which questions each student got right or wrong: the student who answered more of the harder questions correctly scores higher.
Each question needs to be ranked by difficulty. To do this, you need to test a wide range of students on a question and see what proportion of students get it right or wrong.
This data is used to develop a statistical model that correlates the student ability to test performance.
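As a minimal sketch of that calibration step, the proportion of students answering each question correctly gives a first-pass difficulty estimate (real IRT calibration fits a full statistical model to this data; the responses below are invented):

```python
# Each row is one student's responses: True = correct, False = incorrect.
# Toy data: four students attempting three questions.
responses = [
    [True,  True,  False],
    [True,  False, False],
    [True,  True,  True ],
    [False, True,  False],
]

num_students = len(responses)

# Proportion correct per question: a lower proportion means a harder item.
proportion_correct = [
    sum(student[q] for student in responses) / num_students
    for q in range(len(responses[0]))
]
print(proportion_correct)  # [0.75, 0.75, 0.25] -> question 3 is the hardest
```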
To quote our affiliates at UCAT Masterclass:
The statistical model will essentially say, “If the student’s ability corresponds to a score of X, then the student will get these Y questions correct, and these Z questions incorrect.”
The model is then applied to a student’s result and attempts to answer the question: “Given the difficulty of each question and this student’s responses, what is their most likely ability score?”
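Here is a toy sketch of that estimation step. Since the Consortium doesn't disclose its model, a two-parameter logistic (2PL) model is used purely for illustration; under 2PL, which questions you get right matters, not just how many. All item parameters below are made up. Two students each answer two of three questions correctly, but the one who got the harder question right ends up with the higher ability estimate:

```python
import math

def p_correct(theta, a, b):
    """2PL model: P(correct) given ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, answers):
    """Log-likelihood of one student's response pattern at ability theta."""
    ll = 0.0
    for (a, b), correct in zip(items, answers):
        p = p_correct(theta, a, b)
        ll += math.log(p if correct else 1.0 - p)
    return ll

def estimate_ability(items, answers):
    """Grid-search MLE: the theta that best explains the response pattern."""
    grid = [i / 100 for i in range(-400, 401)]  # abilities from -4.00 to 4.00
    return max(grid, key=lambda t: log_likelihood(t, items, answers))

# (discrimination, difficulty) per question: easy, medium, hard - made-up values.
items = [(1.0, -1.0), (1.0, 0.0), (2.0, 1.5)]

# Both students answer 2 of 3 correctly, but miss different questions.
student_a = [True, True, False]   # missed the hard question
student_b = [True, False, True]   # got the hard question right

theta_a = estimate_ability(items, student_a)
theta_b = estimate_ability(items, student_b)
print(theta_b > theta_a)  # True: same raw mark, different ability estimates
```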
This process gives the UCAT a scaled score between 300 and 900.
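The exact mapping from ability estimates to the 300–900 scale isn't published. One plausible sketch is a linear transform of the ability estimate, clamped to the reporting range; every constant below is hypothetical:

```python
def to_scaled_score(theta, midpoint=600, spread=100, lo=300, hi=900):
    """Map an IRT ability estimate onto a 300-900 reporting scale.

    All constants are hypothetical: the UCAT Consortium does not publish
    its actual transformation.
    """
    raw = midpoint + spread * theta
    return max(lo, min(hi, round(raw)))

print(to_scaled_score(0.0))   # 600 - an average ability lands mid-scale
print(to_scaled_score(1.6))   # 760
print(to_scaled_score(-4.0))  # 300 - clamped at the floor of the range
```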
Each UCAT has the same method applied to the results:
In addition, the test administrator, Pearson VUE, trials random new questions on the cohort sitting the test. These questions are unscored, and you have no way of knowing which ones they are. Having a cohort attempt these questions is a savvy way of calibrating the weighting of questions for future UCATs.
Furthermore, this process allows for further recalibration until the question difficulties are accurately estimated:
The result of this is fair and equitable scaling. But it also means that:
No, you don’t!
The data-hungry and the curious will like this. The average student doesn’t need to know or worry about IRT or UCAT scaling methods.
Instead, students should focus on acing each subtest.