MeasuresAndEvaluation
strength
ITEM RESPONSE THEORY
An Introduction
Problems with CTT
■ Extremely sample-dependent
■ Item statistics are all on separate scales from the ability score
■ Cannot adequately take guessing into account
■ Does not estimate true scores well
■ Assumes a measurement model, but does not actually fit the model to the data
A breakthrough: the Rasch model
■ Models the probability of a correct answer on a given item as a function of two parameters:
– Participant ability (Theta)
– Item difficulty (b)
■ These two parameters are on the SAME scale
– Represented as z-scores
Rasch Model: Equation form
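$$P(x_{ij} = 1 \mid \theta_i) = \frac{1}{1 + e^{-(\theta_i - b_j)}}$$
$P(x_{ij} = 1 \mid \theta_i)$: probability of answering item j correctly, given participant ability
$\theta_i$: ability of participant i
$b_j$: difficulty of item j
*Also known as the one-parameter logistic (1pl) model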
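The Rasch equation can be tried out directly. The following is a minimal Python sketch, not part of the original slides; the function name `rasch_probability` and the example values are assumptions chosen for illustration.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch (1pl) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the chance of a correct answer is 50%.
p_equal = rasch_probability(theta=0.0, b=0.0)

# Ability one unit above the item difficulty raises that probability.
p_above = rasch_probability(theta=1.0, b=0.0)

print(p_equal, p_above)
```

Because theta and b are on the same scale, only their difference matters: shifting both by the same amount leaves the probability unchanged.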
1pl: characteristic curves
■ The probability of a correct response on a given item can be represented by an item characteristic curve
■ Item difficulty is the level of theta (ability) at which a participant has a 50% likelihood of getting the item correct.
■ Example of curve where b = 0 (perfectly average)
■ The point where the b parameter is located is called the inflection point
Some real examples from JMP
This item is easier because it takes less ability to have a 50% chance of getting it right
This item is more difficult: a greater than average ability is required
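To see how difficulty moves the curve, the sketch below tabulates 1pl characteristic curves for three items. The difficulty values are hypothetical and are not taken from the JMP output shown in the slides.

```python
import math

def icc_1pl(theta: float, b: float) -> float:
    """Item characteristic curve for the 1pl model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical item difficulties, chosen only for illustration.
items = {"easy": -1.0, "average": 0.0, "hard": 1.5}
thetas = [-3, -2, -1, 0, 1, 2, 3]

# Probability of a correct response at each theta, rounded for display.
curves = {
    name: [round(icc_1pl(t, b), 2) for t in thetas]
    for name, b in items.items()
}

for name, probs in curves.items():
    print(f"{name:8s}", probs)
```

Each row crosses 0.5 where theta equals that item's b, so the easy item reaches 50% at a lower ability than the hard item.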
Now you try: Which item is easiest, and which is hardest?
The 2pl model: Adding Discrimination
■ Remember from CTT: not all items discriminate equally!
■ The 2pl IRT model includes a discrimination parameter (a)
$$P(x_{ij} = 1 \mid \theta_i) = \frac{1}{1 + e^{-a_j(\theta_i - b_j)}}$$
Everything is the same as in the 1pl model except for the added discrimination parameter ($a_j$)
2pl Curves: Look for Slope
■ The slope of the curve represents the item’s discrimination
■ Answers the question: how related to theta is this particular item?
In this example, Item 1 is more discriminating than Item 2
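The link between discrimination and slope can be checked numerically. This is a sketch under assumed values (a = 2.0 and a = 0.5 are hypothetical, not the items in the figure); it relies on the standard result that a 2pl curve has slope a/4 at theta = b.

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Item characteristic curve for the 2pl model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def slope_at(theta: float, a: float, b: float, h: float = 1e-5) -> float:
    """Approximate the slope of the curve with a central difference."""
    return (icc_2pl(theta + h, a, b) - icc_2pl(theta - h, a, b)) / (2 * h)

# Two hypothetical items with the same difficulty but different discrimination.
slope_item1 = slope_at(0.0, a=2.0, b=0.0)  # steep curve
slope_item2 = slope_at(0.0, a=0.5, b=0.0)  # flat curve

print(slope_item1, slope_item2)
```

The steeper item separates participants just below b from those just above b more sharply, which is what "more discriminating" means here.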
Now you try: Which item is most and least discriminating?
3pl: Adding guessing
■ One major failing of CTT is we can’t account for guessing
■ Is an item easy because participants can guess it, or is it actually an easy concept?
■ The 3pl model includes the guessing parameter (c) as the lower asymptote of the probability function:
■ The higher the value of c, the more likely a participant is to get the item correct by guessing alone.
$$P(x_{ij} = 1 \mid \theta_i) = c_j + \frac{1 - c_j}{1 + e^{-a_j(\theta_i - b_j)}}$$
$c_j$: guessing parameter of item j
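A short Python sketch of the 3pl curve shows the guessing parameter acting as a floor. The function name and the value c = 0.25 (roughly a four-option multiple-choice item) are assumptions for illustration.

```python
import math

def icc_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Item characteristic curve for the 3pl model (c is the lower asymptote)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Even at very low ability, the probability stays near the guessing level c.
p_low = icc_3pl(theta=-10.0, a=1.0, b=0.0, c=0.25)

# At very high ability, the probability still approaches 1.
p_high = icc_3pl(theta=10.0, a=1.0, b=0.0, c=0.25)

print(p_low, p_high)
```

Setting c = 0 gives back the 2pl curve, so the 2pl model is a special case of the 3pl model.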
3pl: Characteristic Curves
The green curve (V5) shows the highest guessing parameter
Important: Inflection point shift
■ When we add the guessing parameter, the inflection point is no longer the point where a participant has a 50% likelihood of a correct answer.
■ The probability of a correct answer at the new inflection point is calculated simply like this:
$$\frac{1 + c}{2}$$
■ That means we need to change where we look for the difficulty parameter.
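The shifted value follows from evaluating the 3pl curve at theta = b. A short worked derivation, with c = 0.25 as an assumed example value:

```latex
\begin{aligned}
P(x_{ij} = 1 \mid \theta_i = b_j)
  &= c_j + \frac{1 - c_j}{1 + e^{-a_j(b_j - b_j)}}
   = c_j + \frac{1 - c_j}{1 + e^{0}} \\
  &= c_j + \frac{1 - c_j}{2}
   = \frac{1 + c_j}{2} \\
\text{Example: } c_j = 0.25 \;&\Rightarrow\; \frac{1 + 0.25}{2} = 0.625
\end{aligned}
```

So for an item with c = 0.25, the difficulty b is read off the curve where the probability is 62.5%, not 50%.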
Now you try: Which item is easiest and hardest to guess?
Information
■ In IRT, we typically talk about reliability in terms of “information”
■ Answers the question: “How much do we know about a participant’s true ability (theta) based on this item?”
■ BUT the information is not the same for all participants
– This is a major difference from CTT
– Items are more or less informative for different participants, depending on their level of theta
■ Information functions depict the level of theta at which an item (or an entire test) is most informative
Information functions
■ The HEIGHT of the function shows HOW MUCH information is given by an item
■ The LOCATION of the peak of the function shows for what participants the item is informative
■ **Important: In the 1pl and 2pl models, the information function will always peak at the item’s b-parameter (difficulty) and its height is determined directly by the item discrimination
The black curve here seems like the least informative, but it is the MOST informative for participants with theta levels greater than 2
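The peak location and height can be checked with a small sketch. It uses the standard 2pl item information formula, I(theta) = a² · P · (1 − P), which is not written out in the slides; the item parameters a = 1.5 and b = 1.0 are hypothetical.

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Item characteristic curve for the 2pl model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Standard 2pl item information: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

# Evaluate the information function on a grid of theta values from -4 to 4.
thetas = [i / 10 for i in range(-40, 41)]
a, b = 1.5, 1.0  # hypothetical item parameters
info = [item_information(t, a, b) for t in thetas]

peak_height = max(info)
peak_theta = thetas[info.index(peak_height)]

print(peak_theta, peak_height)
```

On this grid the peak sits at theta = b, and its height equals a²/4, so a more discriminating item gives a taller information function.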
JMP examples
■ The information function of each item has been shown on the characteristic curve plots the whole time:
■ Remember: Information function peaks at b (difficulty) and its height is determined by a (discrimination)
Now you try: Which item gives the most or least information?
Test Information
■ Summarizes the amount of information given by all of the items included on the test.
■ Analogous to composite reliability measures like Cronbach’s alpha.
■ Remember: information is different across different levels of theta
For what participants is this test most informative?
Right around the mean
Standard error of measurement in IRT
■ At any given point of theta, the SEM is the inverse of the square root of the test information.
– SEM = 1 / √Information
■ So the SEM is DIFFERENT for participants at different levels of theta!
■ Think about this: how would this change our scenarios from last class?
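In standard IRT, the SEM at a given theta is one over the square root of the test information at that theta. A minimal sketch with assumed information values:

```python
import math

def sem_from_information(information: float) -> float:
    """Standard error of measurement: 1 / sqrt(test information)."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)

# Hypothetical test information near the mean and far out in the tail.
sem_center = sem_from_information(4.0)
sem_tail = sem_from_information(0.25)

print(sem_center, sem_tail)
```

More information means a smaller SEM, so participants at theta levels where the test is informative are measured more precisely than those in the tails.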
Model Assumptions
■ In CTT we had the assumption that the errors were the same magnitude for all the items, and for all participants.
– Everything was at the composite level
■ Now, we don’t have those assumptions anymore.
■ We are still making these important assumptions:
– Unidimensionality: The same true score causes variance in all the items
– Independence: A previous item is not required to get a new item correct
[Diagram: a single true score (T) causes Item 1, Item 2, and Item 3; each item has its own error term (E1, E2, E3)]
Other benefits of IRT
■ True scores (thetas) can be directly estimated by the computer.
– When we deal with these true score estimates, confidence intervals like those we constructed in CTT are not needed
■ Item parameters should be nearly the same across samples
– This can be empirically tested
– This gets into test fairness
■ IRT can be used to:
– construct computer-adaptive tests
– equate scores across tests or grade levels
– pick items for clinical or neurocognitive testing
■ In IRT we have an empirical test of whether our measurement model fits our data (called model fit statistics)
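One simple way a computer can estimate theta from a response pattern is maximum likelihood over a grid of candidate values. The sketch below assumes 2pl items with known parameters; the item values and response patterns are hypothetical, and this is not necessarily the estimation method JMP uses.

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Item characteristic curve for the 2pl model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at a given theta."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = icc_2pl(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def estimate_theta(responses, items):
    """Pick the grid value of theta (from -4 to 4) with the highest likelihood."""
    grid = [i / 100 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical (a, b) parameters for a four-item test.
items = [(1.0, -1.0), (1.2, 0.0), (1.0, 1.0), (0.9, 2.0)]

theta_low = estimate_theta([1, 0, 0, 0], items)   # only the easiest item correct
theta_high = estimate_theta([1, 1, 1, 0], items)  # all but the hardest correct

print(theta_low, theta_high)
```

A participant who answers more of the harder items correctly receives a higher theta estimate. All-correct or all-incorrect patterns have no finite maximum, so on this grid they would simply land at the boundary.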
IRT in JMP
Click on Item analysis
Select your data-set
Check settings and then click import
This dialog box pops up
Highlight the items you want to analyze
Click Test Items
Then Click OK
Change model type here (default is 2pl)
This is the output: just click on the arrows to display the particular plots you are looking for
Ok– now you try!