This report was created by an R program that reads item-scores data from a csv file, such as the Omega-IScores.csv file often used with Lertap 5, and then uses functions from selected R packages to produce a variety of IRT statistics and graphs.
Large sections of the R code used to generate the report are from a paper by Smyth and Johnson, University of Western Ontario, UWO. A copy of their paper, ‘Item Response Theory for Dichotomous Items’, might be available here. Their code was integrated into an Rmd code module assembled by Larry Nelson, Curtin University, Western Australia (l.nelson@curtin.edu.au), last updated 28 June 2018.
‘Omega-IScores.csv’ files are created by Lertap. A reference to the respective Lertap help page is here. It is NOT necessary to use Omega-IScores.csv files – to use another dataset, replace the scored <- read.csv("Omega-IScores.csv") line below with a read.csv() call naming whatever csv file is appropriate.
The csv file used in this analysis was obtained from this folder:
[1] "C:/Users/larry/Dropbox/MyStuff/Lertap/Sites(Dreamweaver)/Lertap/Software"
The files in this folder are listed below:
dir()
[1] "_notes"
[2] "D1_scored.csv"
[3] "eirt-1.3.0.exe"
[4] "IRTmoduleUWO-1.Rmd"
[5] "IRToys_Working_20Jan2018.R"
[6] "IRToys_Working_21Jan2018.R"
[7] "Lertap5.10.99-Folder"
[8] "Lertap5.zip"
[9] "Lertap51099-Folder"
[10] "Lertap51099.zip"
[11] "Lertap5MacroSetA.xlam"
[12] "LRTP5.zip"
[13] "MacLertap5Docs.zip"
[14] "Notes on TAM Rmd files - Shortcut.lnk"
[15] "Notes.txt"
[16] "Omega-From-IScores.html"
[17] "Omega-From-IScores.Rmd"
[18] "Omega-From-IScores.zip"
[19] "Omega-IScores.csv"
[20] "Omega-IScoresProg.R"
[21] "Plots"
[22] "Rasch-Analysis-TAM.docx"
[23] "Rasch-Analysis-TAM.html"
[24] "Rasch-Analysis-TAM.Rmd"
[25] "SetupLertap5105inactive.exe"
[26] "SetupLertap51073.exe"
[27] "SetupLertap5108-bf.exe"
[28] "SetupLertap5108.exe"
[29] "SetupLertap51083.exe"
[30] "SetupLertap5109.exe"
[31] "SetupLertap583.exe"
[32] "TAM-Rmd-Notes-4Jan18.txt"
[33] "TAM_Mod1_item.csv"
[34] "TAM_Mod1_xsi.csv"
[35] "ThreePL-Analysis-TAM.Rmd"
[36] "TwoPL-Analysis-TAM.docx"
[37] "TwoPL-Analysis-TAM.html"
[38] "TwoPL-Analysis-TAM.Rmd"
[39] "TwoPL-Analysis-TAM_files"
[40] "USBLrtp563.zip"
The first step gets R to read the csv file into an internal working dataset called “scored”. Then the first few records of the dataset are displayed as a check. You should confirm that these records are as expected! The numbers in the first column have been inserted by R. The second column should contain scores for the first item being processed.
Note: lines that begin with ## are comments and are ignored by R.
library(ltm)  ## the rasch(), ltm(), and tpm() functions used below come from the ltm package
scored <- read.csv("Omega-IScores.csv")
##scored <- read.csv("D1_scored.csv") ## from the FIMS study
##scored <- read.csv("EIRT-binaries.csv") ## from EIRT's main dataset
##scored <- LSAT ## a standard dataset from the Psych package
head(scored)
I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 I11 I12 I13 I14 I15
1 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1
2 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1
3 1 0 0 0 0 1 1 1 1 1 0 0 1 1 1
4 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
5 0 1 1 1 1 0 0 1 1 1 1 0 0 1 0
6 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1
We’ll try fitting the Rasch model to the data, with item discrimination fixed (‘constrained’) at a value of 1 – this will be referred to in this report as ‘Model1’.
A reference for the Rasch model is here. A basic introduction to IRT, with a focus on item characteristic curves is this one from SAS.
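Before fitting, it may help to see the Rasch item characteristic curve written out. Under the Rasch model the probability of a correct answer depends only on the gap between a person’s ability (z) and the item’s difficulty (b). A small base-R sketch (the difficulty value here is illustrative, not from the data):

```r
## Rasch ICC: P(x = 1 | z) = exp(z - b) / (1 + exp(z - b))
## plogis() is base R's logistic function, so this is just plogis(z - b)
rasch_icc <- function(z, b) plogis(z - b)

b <- 0.5                   # an illustrative item difficulty
rasch_icc(0.5, b)          # ability equal to difficulty gives P = 0.5
rasch_icc(c(-2, 0, 2), b)  # P rises as ability moves above the difficulty
```

When z = b the probability is exactly 0.5, which is why the ‘Dffclt’ estimates below can be read as the ability level at which an examinee has a 50/50 chance of answering the item correctly.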
fit1 <- rasch(scored, constraint=cbind(length(scored)+1,1))
We can use the ‘summary’ function to get fit statistics for an IRT model. Results appear below under ‘Model Summary’. In this case, the results apply to the constrained Rasch model (fit1).
summary(fit1)
Call:
rasch(data = scored, constraint = cbind(length(scored) + 1, 1))
Model Summary:
log.Lik AIC BIC
-7356.09 14742.18 14813.81
Coefficients:
value std.err z.vals
Dffclt.I1 0.0819 0.0844 0.9704
Dffclt.I2 0.2552 0.0848 3.0116
Dffclt.I3 -1.8688 0.1013 -18.4542
Dffclt.I4 0.1653 0.0845 1.9548
Dffclt.I5 -0.4682 0.0851 -5.5014
Dffclt.I6 0.4986 0.0857 5.8184
Dffclt.I7 0.4310 0.0854 5.0475
Dffclt.I8 0.2372 0.0847 2.8003
Dffclt.I9 0.0101 0.0843 0.1195
Dffclt.I10 -0.5721 0.0856 -6.6866
Dffclt.I11 -2.0665 0.1057 -19.5499
Dffclt.I12 0.1952 0.0846 2.3075
Dffclt.I13 0.1294 0.0845 1.5322
Dffclt.I14 -0.9846 0.0884 -11.1390
Dffclt.I15 -0.7022 0.0863 -8.1399
Dscrmn 1.0000 NA NA
Integration:
method: Gauss-Hermite
quadrature points: 21
Optimization:
Convergence: 0
max(|grad|): 0.04
quasi-Newton: BFGS
The ‘coef’ function can be used to get a table of item difficulty, item discrimination (fixed at a value of 1 in this case), and the probability that an ‘average’ individual (z-score of 0) will answer an item correctly.
coef(fit1,prob=TRUE)
Dffclt Dscrmn P(x=1|z=0)
I1 0.08190912 1 0.4795342
I2 0.25524981 1 0.4365318
I3 -1.86880001 1 0.8663194
I4 0.16526234 1 0.4587782
I5 -0.46820779 1 0.6149595
I6 0.49860203 1 0.3778693
I7 0.43095235 1 0.3938989
I8 0.23720213 1 0.4409760
I9 0.01007583 1 0.4974811
I10 -0.57205810 1 0.6392379
I11 -2.06653916 1 0.8876082
I12 0.19522509 1 0.4513482
I13 0.12943724 1 0.4676858
I14 -0.98459658 1 0.7280193
I15 -0.70215471 1 0.6686653
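The P(x=1|z=0) column can be reproduced by hand from the difficulty estimates. With Dscrmn fixed at 1, the probability for an average person is just the logistic function evaluated at minus the difficulty:

```r
## Check the P(x=1|z=0) column: with Dscrmn = 1,
## P(x = 1 | z = 0) = plogis(1 * (0 - Dffclt))
b_I1 <- 0.08190912   # Dffclt for item I1, from the coef() table above
plogis(-b_I1)        # 0.4795342, matching the table's first row
```

Items with negative difficulties (I3, I11, and so on) come out above 0.5 – easy items that an average person is more likely than not to get right.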
A goodness-of-fit measure, ‘GoF’, is next. A significant p-value (such as 0.050 or less) is taken to suggest that the model is not a good fit to the data.
(Note: computing GoF can be demanding, possibly taking a few minutes to finish – for this reason the GoF lines below will sometimes be deactivated by placing ## at their start, which makes R ignore them.)
##GoF.rasch(fit1,B=199)
##GoF.rasch(fit1)
Following the suggestions in the Smyth and Johnson paper referenced above, we can now see what might happen if the constraint on the discrimination parameter in the Rasch model is relaxed (above it was constrained to a value of 1; now it’ll be free to assume any value – the value appears as ‘Dscrmn’ below). This will be referred to here as ‘Model2’.
fit2 <- rasch(scored)
summary(fit2)
Call:
rasch(data = scored)
Model Summary:
log.Lik AIC BIC
-7270.584 14573.17 14649.57
Coefficients:
value std.err z.vals
Dffclt.I1 0.0388 0.0637 0.6087
Dffclt.I2 0.1659 0.0642 2.5837
Dffclt.I3 -1.3445 0.0812 -16.5507
Dffclt.I4 0.1032 0.0639 1.6146
Dffclt.I5 -0.3518 0.0643 -5.4717
Dffclt.I6 0.3435 0.0655 5.2465
Dffclt.I7 0.2937 0.0651 4.5144
Dffclt.I8 0.1523 0.0641 2.3751
Dffclt.I9 -0.0095 0.0636 -0.1494
Dffclt.I10 -0.4277 0.0648 -6.6010
Dffclt.I11 -1.4829 0.0854 -17.3654
Dffclt.I12 0.1225 0.0640 1.9148
Dffclt.I13 0.0749 0.0638 1.1736
Dffclt.I14 -0.7194 0.0679 -10.5873
Dffclt.I15 -0.5202 0.0656 -7.9315
Dscrmn 1.5314 0.0501 30.5411
Integration:
method: Gauss-Hermite
quadrature points: 21
Optimization:
Convergence: 0
max(|grad|): 0.59
quasi-Newton: BFGS
Is Model2 a better fit to the data? The ‘anova’ function in the ltm package may be used to test the hypothesis that there’s no significant difference between models. If the p-value below is less than 0.050, the hypothesis is rejected, and the second model might be regarded as providing a better fit.
A reference for these fit statistics is this Wikipedia page.
anova(fit1,fit2)
Likelihood Ratio Table
AIC BIC log.Lik LRT df p.value
fit1 14742.18 14813.81 -7356.09
fit2 14573.17 14649.57 -7270.58 171.01 1 <0.001
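The LRT statistic in this table can be reproduced directly from the two log-likelihoods, and its p-value from a chi-square with one degree of freedom (Model2 has one extra free parameter, Dscrmn):

```r
## Likelihood-ratio statistic: LRT = 2 * (log.Lik2 - log.Lik1)
ll1 <- -7356.090   # constrained Rasch model (fit1)
ll2 <- -7270.584   # unconstrained Rasch model (fit2)
lrt <- 2 * (ll2 - ll1)                     # 171.012, matching the table
pchisq(lrt, df = 1, lower.tail = FALSE)    # far below 0.001
```

With a p-value this small, the hypothesis of no difference is rejected; freeing the common discrimination parameter has clearly improved the fit.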
Okay, now let’s try a third model by allowing items to have different discrimination values; we’ll refer to it here as ‘Model3’. In the literature this is called the ‘2PL’ model, the ‘two-parameter logistic model’. The table of coefficients (below) has the difficulty estimate (‘Dffclt’) for each item, followed by the discrimination estimate (‘Dscrmn’) for each item.
fit3 <- ltm(scored ~ z1)
summary(fit3)
Call:
ltm(formula = scored ~ z1)
Model Summary:
log.Lik AIC BIC
-7178.181 14416.36 14559.62
Coefficients:
value std.err z.vals
Dffclt.I1 0.0612 0.0681 0.8983
Dffclt.I2 0.1596 0.0601 2.6545
Dffclt.I3 -1.5320 0.1444 -10.6120
Dffclt.I4 0.1505 0.0798 1.8855
Dffclt.I5 -0.3006 0.0581 -5.1755
Dffclt.I6 0.3124 0.0607 5.1498
Dffclt.I7 0.2772 0.0616 4.5026
Dffclt.I8 0.1265 0.0544 2.3238
Dffclt.I9 -0.0099 0.0528 -0.1873
Dffclt.I10 -0.4502 0.0760 -5.9220
Dffclt.I11 -1.7972 0.1829 -9.8252
Dffclt.I12 0.1690 0.0771 2.1906
Dffclt.I13 0.1003 0.0698 1.4363
Dffclt.I14 -0.6639 0.0693 -9.5788
Dffclt.I15 -0.7406 0.1178 -6.2879
Dscrmn.I1 1.4016 0.1255 11.1698
Dscrmn.I2 1.8731 0.1603 11.6867
Dscrmn.I3 1.2178 0.1398 8.7125
Dscrmn.I4 1.0866 0.1066 10.1975
Dscrmn.I5 2.0924 0.1845 11.3433
Dscrmn.I6 1.9552 0.1677 11.6595
Dscrmn.I7 1.8417 0.1581 11.6465
Dscrmn.I8 2.4722 0.2167 11.4067
Dscrmn.I9 2.7300 0.2496 10.9392
Dscrmn.I10 1.2880 0.1224 10.5208
Dscrmn.I11 1.1157 0.1395 7.9968
Dscrmn.I12 1.1539 0.1102 10.4742
Dscrmn.I13 1.3451 0.1218 11.0485
Dscrmn.I14 1.7522 0.1626 10.7755
Dscrmn.I15 0.8297 0.0977 8.4939
Integration:
method: Gauss-Hermite
quadrature points: 21
Optimization:
Convergence: 0
max(|grad|): 0.036
quasi-Newton: BFGS
Is Model3, 2PL, better than Model2, the unconstrained Rasch model? That is, does Model3 appear to fit the data better than Model2? If the p-value below is significant, less than 0.050, then yes, it would seem so.
anova(fit2, fit3)
Likelihood Ratio Table
AIC BIC log.Lik LRT df p.value
fit2 14573.17 14649.57 -7270.58
fit3 14416.36 14559.62 -7178.18 184.81 14 <0.001
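The AIC values in these tables can also be checked by hand: AIC = 2k − 2·log.Lik, where k is the number of free parameters. For the 2PL model k = 30 (15 difficulties plus 15 discriminations):

```r
## AIC = 2k - 2 * log.Lik; the 2PL model estimates 30 parameters
ll3 <- -7178.181   # log-likelihood for fit3, from the summary above
k   <- 30          # 15 Dffclt + 15 Dscrmn
2 * k - 2 * ll3    # 14416.362, matching the AIC reported for fit3
```

BIC works the same way, but replaces the 2k penalty with k·log(n), so it penalises extra parameters more heavily as the sample grows.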
The Rasch and 2PL models, as used above, might be seen as incomplete as they do not reflect the possible effects of guessing. The ‘3PL’ IRT model is used to estimate item difficulty, item discrimination, and an item guessing parameter.
We’ll get 3PL results next, and, since this is the fourth ‘model’ used in this report, we’ll refer to it here as ‘Model4’ (the first model was the constrained Rasch model, the second was the unconstrained Rasch model, the third was the 2PL model, and now we’ll have 3PL, ‘Model4’). In the table below, estimates of item guessing parameters are labelled ‘Gussng’.
fit4 <- tpm(scored, type="latent.trait", max.guessing=1)
summary(fit4)
Call:
tpm(data = scored, type = "latent.trait", max.guessing = 1)
Model Summary:
log.Lik AIC BIC
-7146.978 14383.96 14598.85
Coefficients:
value std.err z.vals
Gussng.I1 0.0128 0.1329 0.0964
Gussng.I2 0.0000 0.0011 0.0139
Gussng.I3 0.0000 0.0019 0.0052
Gussng.I4 0.2458 0.0292 8.4251
Gussng.I5 0.0389 0.0736 0.5284
Gussng.I6 0.0874 0.0251 3.4754
Gussng.I7 0.0892 0.0304 2.9342
Gussng.I8 0.0000 0.0005 0.0108
Gussng.I9 0.0000 0.0001 0.0047
Gussng.I10 0.0000 0.0030 0.0125
Gussng.I11 0.0000 0.0039 0.0084
Gussng.I12 0.2499 0.0344 7.2604
Gussng.I13 0.0000 0.0009 0.0087
Gussng.I14 0.0000 0.0013 0.0075
Gussng.I15 0.3905 0.0700 5.5771
Dffclt.I1 0.1324 0.2742 0.4830
Dffclt.I2 0.2103 0.0580 3.6262
Dffclt.I3 -1.5303 0.1478 -10.3569
Dffclt.I4 0.6788 0.0768 8.8401
Dffclt.I5 -0.1846 0.1337 -1.3803
Dffclt.I6 0.4975 0.0651 7.6467
Dffclt.I7 0.4718 0.0724 6.5203
Dffclt.I8 0.1862 0.0530 3.5142
Dffclt.I9 0.0536 0.0521 1.0299
Dffclt.I10 -0.4081 0.0771 -5.2911
Dffclt.I11 -1.7448 0.1743 -10.0104
Dffclt.I12 0.7095 0.0854 8.3087
Dffclt.I13 0.1444 0.0681 2.1209
Dffclt.I14 -0.6185 0.0712 -8.6906
Dffclt.I15 0.4215 0.2153 1.9578
Dscrmn.I1 1.4319 0.3762 3.8061
Dscrmn.I2 1.9202 0.1655 11.6005
Dscrmn.I3 1.1870 0.1332 8.9102
Dscrmn.I4 2.8430 0.5519 5.1514
Dscrmn.I5 2.2184 0.3579 6.1989
Dscrmn.I6 2.9908 0.4807 6.2218
Dscrmn.I7 2.6497 0.4354 6.0861
Dscrmn.I8 2.4665 0.2174 11.3463
Dscrmn.I9 2.7256 0.2445 11.1460
Dscrmn.I10 1.2841 0.1204 10.6630
Dscrmn.I11 1.1418 0.1364 8.3701
Dscrmn.I12 2.9850 0.7692 3.8805
Dscrmn.I13 1.3668 0.1241 11.0115
Dscrmn.I14 1.7556 0.1592 11.0256
Dscrmn.I15 1.6789 0.4545 3.6943
Integration:
method: Gauss-Hermite
quadrature points: 21
Optimization:
Optimizer: optim (BFGS)
Convergence: 0
max(|grad|): 0.032
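The guessing parameter acts as a lower asymptote: even a person with very low ability answers correctly with probability near ‘Gussng’. Using the estimates for item I15 from the table above (the item with the largest guessing estimate), the 3PL curve can be sketched in base R:

```r
## 3PL ICC: P(x = 1 | z) = c + (1 - c) * plogis(a * (z - b))
## Parameter values for item I15, taken from the tpm() summary above
c15 <- 0.3905   # Gussng: lower asymptote (pseudo-guessing)
b15 <- 0.4215   # Dffclt
a15 <- 1.6789   # Dscrmn
c15 + (1 - c15) * plogis(a15 * ( 0 - b15))  # about 0.59 for an average (z = 0) person
c15 + (1 - c15) * plogis(a15 * (-4 - b15))  # very low ability: P stays close to c15
```

Note that with a non-zero guessing parameter, difficulty (b) is no longer the ability level where P = 0.5; it’s the point where P is halfway between c and 1.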
Let’s now see if the data fit the ‘3PL’ IRT model better than the ‘2PL’ IRT model. If the p-value below is significant (less than 0.050), we will have evidence to suggest that it does.
anova(fit3, fit4)
Likelihood Ratio Table
AIC BIC log.Lik LRT df p.value
fit3 14416.36 14559.62 -7178.18
fit4 14383.96 14598.85 -7146.98 62.4 15 <0.001
How about ability estimates? It’s possible to get them, to be sure, but note: we have used ## markers below to turn off the estimates as the output can be quite lengthy. If you want theta estimates, remove the ## in front of factor.scores(fit3).
##factor.scores(fit3)
The factor.scores(fit1) line below can be used (if wanted) to get ability estimates based on the constrained Rasch model:
##factor.scores(fit1)
Graphics are always great to have – a definite aid in the interpretation of IRT results. Here are some ‘ICCs’, item characteristic curves, for Model1, the Rasch model with discrimination constrained to have a value of 1.
plot(fit1,
legend = TRUE,
cx = "bottomright",
lwd = 3,
cex.main = 1.5,
cex.lab = 1.3,
cex = 0.8)
Next, ICCs for the 2PL model:
plot(fit3,
legend = TRUE,
cx = "bottomright",
lwd = 3,
cex.main = 1.5,
cex.lab = 1.3,
cex = 0.8)
Next: ICCs for the 3PL model:
plot(fit4,
legend = TRUE,
cx = "bottomright",
lwd = 3,
cex.main = 1.5,
cex.lab = 1.3,
cex = 0.8)
Item information curves are shown below. They’re used to indicate where each item best measures the latent trait. The curves are based on results from the 2PL model, ‘fit3’ in this case.
plot(fit3,
legend = TRUE,
cx = "topright",
cex = 0.8,
type = "IIC",
annot = FALSE,
lwd = 3,
cex.main = 1.5,
cex.lab = 1.3)
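For the 2PL model, each item’s information curve has a simple closed form: I(z) = a²·P(z)·(1 − P(z)), which peaks at z = b with maximum a²/4. A base-R sketch, using item I9’s estimates from the 2PL summary above:

```r
## 2PL item information: I(z) = a^2 * P(z) * (1 - P(z)), peaking at z = b
item_info <- function(z, a, b) {
  p <- plogis(a * (z - b))
  a^2 * p * (1 - p)
}
## Item I9 (a = 2.7300, b = -0.0099) has the largest discrimination,
## so it is the most informative item near its difficulty:
item_info(-0.0099, a = 2.7300, b = -0.0099)   # the maximum, a^2 / 4 = 1.863
```

This is why highly discriminating items dominate the information plot: information grows with the square of the discrimination, but only in a narrow ability range around the item’s difficulty.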
Last but not least we come to a graph of the ‘test information function’. It’s used as an indicator of where the items, taken as a whole (that is, as a ‘test’), provide the most precise information on the latent trait (‘ability’). In this example, ‘fit3’ has been employed, the 2PL IRT model.
An excellent and classic reference to test information and item information functions (or ‘curves’) is this one by Frank Baker.
plot(fit3, type="IIC",
items = 0,
lwd = 3,
cex.main = 1.5,
cex.lab = 1.3)