References cited in Lertap 5 documents

Allen, M.J. & Yen, W.M. (1979). Introduction to measurement theory. Monterey, California: Brooks/Cole.

Angoff, W.H. (1993). Perspectives on differential item functioning methodology. In P.W. Holland and H. Wainer (Eds.), Differential Item Functioning (pp. 3-23). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Bandalos, D.L. (2018). Measurement theory and applications. New York, NY: The Guilford Press.

Berk, R.A. (1978). Empirical evaluation of formulae for correction of item-total point-biserial correlations. Educational and Psychological Measurement, 38, 647-652.

Berk, R.A. (1980). A consumer's guide to criterion-referenced test reliability. Journal of Educational Measurement, 17, 323-350.

Berk, R.A. (1984). Selecting the index of reliability. In R.A. Berk (Ed.), A guide to criterion-referenced test construction. Baltimore, Maryland: The Johns Hopkins Press.

Berk, R.A. (2000). Ask Mister Assessment Person. In Teachers: Supply and demand in an age of rising standards. Amherst, MA: National Evaluation Systems, Inc. (Note: in May 2013, the papers in this series could be found as individual PDF files by searching the internet.)

Bond, T.G. & Fox, C.M. (2007). Applying the Rasch Model (2nd ed.). Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.

Bond, T.G. & Fox, C.M. (2015). Applying the Rasch Model (3rd ed.). New York, NY: Routledge.

Brennan, R.L. (1972). A generalized upper-lower discrimination index. Educational and Psychological Measurement, 32, 289-303.

Brennan, R.L. (1984). Estimating the dependability of the scores. In R.A. Berk (Ed.), A guide to criterion-referenced test construction. Baltimore, Maryland: The Johns Hopkins Press.

Brennan, R.L. & Kane, M.T. (1977). An index of dependability for mastery tests. Journal of Educational Measurement, 14, 277-289.

Brown, M.B. (1977). Algorithm AS 116: the tetrachoric correlation and its standard error. Applied Statistics, 26, 343-351.

Camilli, G. & Shepard, L.A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage Publications.

Carr, N.T. (2011). Designing and analyzing language tests. Oxford: Oxford University Press.

Case, S.M. & Swanson, D.B. (1998). Constructing written test questions for the basic and clinical sciences. Philadelphia: National Board of Medical Examiners.

Cattell, R.B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276.

Cizek, G.J. (2001). An overview of issues concerning cheating on large-scale tests. Paper presented at the annual meeting of NCME, the National Council on Measurement in Education, April 2001, Seattle, Washington.

Clauser, B.E., & Mazor, K.M. (1998). An NCME instructional module on using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-44.

Cody, R. & Smith, J.K. (2014). Test scoring and analysis using SAS. Cary, NC: SAS Institute Inc.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.

Crocker, L.M. & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart, and Winston.

Dai, S. (2021). Handling missing responses in psychometrics: methods and software. Psych, 3, 673-693. doi:10.3390/psych3040043

Dawber, T. (2004). Robustness of Lord's formulas for item difficulty and discrimination conversions between classical and item response theory models. Edmonton, Alberta: unpublished doctoral dissertation, University of Alberta (also see following reference).

Dawber, T., Rogers, W.T., & Carbonaro, M. (2004). Robustness of Lord's formulas for item difficulty and discrimination conversions between classical and item response theory models. Paper presented at the annual meeting of AERA, the American Educational Research Association, April 12, 2004, San Diego, California.

de la Harpe, B.I. (1998). Design, implementation, and evaluation of an in-context learning support program for first year education students and its impact on educational outcomes. Perth, Western Australia: unpublished doctoral dissertation, Curtin University of Technology.

DeMars, C. (2010). Item response theory. New York: Oxford University Press, Inc.

Dimitrov, D.M. (2003). Reliability and true-score measures of binary items as a function of their Rasch difficulty parameter. Journal of Applied Measurement, 4(3), 222-233.

Dorans, N.J. & Holland, P.W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P.W. Holland and H. Wainer (Eds.), Differential Item Functioning (pp. 35-66). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Dorans, N.J. & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355-368.

Dorans, N.J. & Kulick, E. (2006). Differential item functioning on the Mini-Mental State Examination: an application of the Mantel-Haenszel and standardization procedures. Medical Care, 44(11), S107-S114.

Dunn, T.J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105, 399-412. doi:10.1111/bjop.12046

Eason, S. (1991). Why generalizability theory yields better results than classical test theory: a primer with concrete examples. In B. Thompson (Ed.), Advances in Educational Research: Substantive findings, methodological developments (Vol. 1, pp. 83-98). Greenwich, CT: JAI.

Ebel, R.L. & Frisbie, D.A. (1986). Essentials of educational measurement (4th ed.). Sydney: Prentice-Hall of Australia.

Fan, X. (1998). Item response theory and classical test theory: an empirical comparison of their item/person statistics. Educational and Psychological Measurement, 58(3), 357-381.

Feldt, L.S. (1984). Some relationships between the binomial error model and classical test theory. Educational and Psychological Measurement, 44, 883-891.

Frederiksen, N., Mislevy, R.J., & Bejar, I.I. (Eds.) (1993). Test theory for a new generation of tests. Hillsdale, NJ: Lawrence Erlbaum Associates.

Garrett, H.E. (1952). Testing for teachers. New York: American Book Company.

Geldhof, G.J., Preacher, K.J., & Zyphur, M.J. (2013). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods. doi:10.1037/a0032138

Gil Escudero, G., Suárez Falcón, J.C., & Martínez Arias, R. (1999). Aplicación de un procedimiento iterativo para la selección de modelos de la Teoría de la Respuesta al Ítem en una prueba de rendimiento lector [Application of an iterative procedure for selecting Item Response Theory models in a reading achievement test]. Revista de Educación, 319, 253-272.

Glass, G.V & Stanley, J.C. (1970). Statistical methods in education and psychology. Englewood Cliffs, NJ: Prentice-Hall.

Glass, G.V & Stanley, J.C. (1974). Métodos estadísticos aplicados a las ciencias sociales [Statistical methods applied to the social sciences]. London: Prentice-Hall Internacional.

Green, J. (1999). Excel 2000 VBA programmer's reference. Birmingham, England: Wrox Press.

Gronlund, N.E. (1985). Measurement and evaluation in teaching (5th ed.). New York: Collier Macmillan Publishers.

Gulliksen, H. (1950). Theory of mental test scores. New York: John Wiley & Sons.

Haladyna, T.M. (2004). Developing and validating multiple-choice test items (3rd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Haladyna, T.M. & Rodriguez, M.C. (2013). Developing and validating test items. New York: Routledge.

Hambleton, R.K. & Jones, R.W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47.

Hambleton, R.K. & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer.

Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory. Newbury Park, California: Sage Publications.

Harpp, D.N. & Hogan, J.J. (1993). Crime in the classroom: detection and prevention of cheating on multiple-choice exams. Journal of Chemical Education, 70(4), 306-311.

Harpp, D.N., Hogan, J.J., & Jennings, J.S. (1996). Crime in the classroom: Part II, an update. Journal of Chemical Education, 73(4), 349-351.

Hattie, J., Jaeger, R.M., & Bond, L. (1999). Persistent methodological questions in educational testing. Review of Research in Education, 24, 393-446.

Hays, W.L. (1973). Statistics for the social sciences. London: Holt, Rinehart and Winston.

Hills, J.R. (1976). Measurement and evaluation in the classroom. Columbus, Ohio: Charles E. Merrill.

Hopkins, K.D. (1998). Educational and psychological measurement and evaluation (8th ed.). Boston: Allyn & Bacon.

Hopkins, K.D. & Glass, G.V (1978). Basic statistics for the behavioral sciences. Englewood Cliffs, NJ: Prentice-Hall.

Hopkins, K.D., Stanley, J.C., & Hopkins, B.R. (1990). Educational and psychological measurement and evaluation (7th ed.). Englewood Cliffs, NJ: Prentice-Hall.

Hoyt, C.J. (1941). Test reliability estimated by analysis of variance. Psychometrika, 6, 153-160.

Kaplan, R.M. & Saccuzzo, D.P. (1993). Psychological testing: principles, applications, and issues. Pacific Grove, California: Brooks/Cole.

Kelley, T.L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30, 17-24.

Kerlinger, F.N. (1973). Foundations of behavioral research (2nd ed.). London: Holt, Rinehart, and Winston.

Kline, P. (2015). A handbook of test construction. New York: Routledge.

Kolen, M.J. & Brennan, R.L. (1995). Test equating: methods and practices. New York: Springer-Verlag.

Lawson, S. (1991). One parameter latent trait measurement: Do the results justify the effort? In B. Thompson (Ed.), Advances in Educational Research: Substantive findings, methodological developments (Vol. 1, pp. 159-168). Greenwich, CT: JAI.

Lindeman, R.H. & Merenda, P.F. (1979). Educational measurement (2nd ed.). London: Scott, Foresman and Company.

Linn, R.L. & Gronlund, N.E. (1995). Measurement and assessment in teaching (7th ed.). Englewood Cliffs, NJ: Prentice-Hall.

Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.

Lord, F.M. (1984). Standard errors of measurement at different ability levels. Journal of Educational Measurement, 21(3), 239-243.

Lord, F.M. & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, Massachusetts: Addison-Wesley.

MacDonald, P. & Paunonen, S.V. (2002). A Monte Carlo comparison of item and person statistics based on item response theory versus classical test theory. Educational and Psychological Measurement, 62(6), 921-943.

Magis, D., Béland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847-862. doi:10.3758/BRM.42.3.847

Magnusson, D. (1967). Test theory. London: Addison-Wesley.

McDonald, R.P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum Associates.

Mehrens, W.A. & Lehmann, I.J. (1991). Measurement and evaluation in education and psychology (4th ed.). London: Holt, Rinehart and Winston.

Meyer, J.P. (2010). Reliability. New York: Oxford University Press, Inc.

Meyer, J.P. (2014). Applied measurement with jMetrik. New York: Routledge.

Michaelides, M.P. (2008). An illustration of a Mantel-Haenszel procedure to flag misbehaving common items in test equating. Practical Assessment, Research & Evaluation, 13(7). Available online: http://pareonline.net/getvn.asp?v=13&n=7

Nandakumar, R. (1994). Assessing dimensionality of a set of item responses—Comparison of different approaches. Journal of Educational Measurement, 31, 17-35.

Nelson, L.R. (1974). Guide to LERTAP use and interpretation. Dunedin, New Zealand: Department of Education, University of Otago.

Nelson, L.R. (1981). PLATISLA, an introduction to applied social science statistical methods. Dunedin, New Zealand: Department of Education, University of Otago.

Nelson, L.R. (1984). Using microcomputers to assess achievement and instruction. Educational Measurement: Issues and Practice, 3(2), 22-26.

Nelson, L.R. (2000). Item analysis for tests and surveys using Lertap 5. Perth, Western Australia: Curtin University of Technology (www.lertap5.com).

Nelson, L.R. (2004). Excel as an aide in teaching measurement and research methods. Thai Journal of Educational Research and Measurement (ISSN 1685-6740), 2(1), 43-55.

Nelson, L.R. (2005). Some observations on the scree test, and on coefficient alpha. Thai Journal of Educational Research and Measurement (ISSN 1685-6740), 3(1), 1-17.

Nelson, L.R. (2006). Using selected indices to monitor cheating on multiple-choice exams. Thai Journal of Educational Research and Measurement (ISSN 1685-6740), 4(1), 1-18.

Nelson, L.R. (2007). Some issues related to the use of cut scores. Thai Journal of Educational Research and Measurement (ISSN 1685-6740), 5(1), 1-16.

Nelson, L.R. (2008). Rasching an achievement test. Thai Journal of Educational Research and Measurement (ISSN 1685-6740), 6(1), 1-22.

Nelson, L.R. (2016). Coefficients alpha and omega, an empirical comparison. doi:10.13140/RG.2.1.1957.5929

Nelson, L.R. (2017). Item analysis software for classes. doi:10.13140/RG.2.2.32532.71049

Nelson, L.R. (2020). Assessing the invariance of cognitive item statistics. doi:10.13140/RG.2.2.36829.97769

Nelson, L.R. (2021). Computing alpha and omega reliability estimates. doi:10.13140/RG.2.2.15727.97440

Online Press, Inc. (1997). Quick course in Microsoft Excel 97. Redmond, Washington: Microsoft Press.

Oosterhof, A.C. (1990). Classroom applications of educational measurement. Columbus, Ohio: Merrill.

Pedhazur, E.J. & Schmelkin, L.P. (1991). Measurement, design, and analysis: an integrated approach. Hillsdale, NJ: Lawrence Erlbaum Associates.

Peng, C-Y.J. & Subkoviak, M.J. (1980). A note on Huynh's normal approximation procedure for estimating criterion-referenced reliability. Journal of Educational Measurement, 17, 359-368.

Pintrich, P.R., Smith, D.A.F., Garcia, T., & McKeachie, W.J. (1991). A manual for the use of the Motivated Strategies for Learning Questionnaire (MSLQ). Ann Arbor, Michigan: The University of Michigan.

Popham, W.J. (1978). Criterion-referenced measurement. Englewood Cliffs, NJ: Prentice-Hall.

Qualls-Payne, A.L. (1992). A comparison of score level estimates of the standard error of measurement. Journal of Educational Measurement, 29(3), 213-225.

Revelle, W. (2016). Using R and the psych package to find ω. (February 2017: seen at http://personality-project.org/r/psych/HowTo/R_for_omega.pdf)

Revelle, W. & Condon, D.M. (2019). Reliability from Alpha to Omega: A tutorial. Psychological Assessment, 31(12):1395-1411.

Revelle, W. & Zinbarg, R.E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154. doi:10.1007/s11336-008-9102-z. (September 2020: seen at http://personality-project.org/revelle/publications/rz09.pdf)

Roussos, L.A., Schnipke, D.L., & Pashley, P.J. (1999). A generalized formula for the Mantel-Haenszel differential item functioning parameter. Journal of Educational and Behavioral Statistics, 24(3), 293-322.

Sanders, D.H. (1981). Computers in society. New York: McGraw-Hill.

Stage, C. (1998). A comparison between item analysis based on item response theory and classical test theory: a study of the SweSAT Subtest READ. Educational Measurement No 30. Umeå, Sweden: University of Umeå, Department of Educational Measurement. (Possibly available at www.umu.se/edmeas/publikationer/index_eng.html.)

Stage, C. (2003). Classical test theory or item response theory: the Swedish experience. Educational Measurement No 42. Umeå, Sweden: University of Umeå, Department of Educational Measurement.  (Possibly available at www.umu.se/edmeas/publikationer/index_eng.html; found at the following address January 2008: www.umu.se/edmeas/publikationer/pdf/em%20no%2042.pdf.)

Stevenson, J. (1998). Performance of the Cognitive Holding Power Questionnaire in schools. Learning and Instruction, 8(5), 393-410.

Stevenson, J.C. & Evans, G.T. (1994). Conceptualization and measurement of cognitive holding power. Journal of Educational Measurement, 31(2), 161-181.

Subkoviak, M.J. (1976). Estimating reliability from a single administration of a criterion-referenced test. Journal of Educational Measurement, 13, 265-276.

Subkoviak, M.J. (1984). Estimating the reliability of mastery-nonmastery classifications. In R.A. Berk (Ed.), A guide to criterion-referenced test construction. Baltimore, Maryland: The Johns Hopkins Press.

Thompson, B. (2004). Exploratory and confirmatory factor analysis: understanding concepts and applications. Washington, DC: The American Psychological Association.

Thompson, B. (2006). Foundations of behavioral statistics: an insight-based approach. New York: The Guilford Press.

Thorndike, R.L. (1982). Educational measurement: Theory and practice. In D. Spearritt (Ed.), The improvement of measurement in education and psychology: Contributions of latent trait theories (pp. 3-13). Princeton, NJ: ERIC Clearinghouse on Tests, Measurement, and Evaluation. (ERIC Document Reproduction Service No. ED 222 545.)

Tukey, J.W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.

Wainer, H. (1989). The future of item analysis. Journal of Educational Measurement, 26, 191-208.

Wainer, H. (2007). A psychometric cicada: Educational Measurement returns. Educational Researcher, 36(8), 485-486. doi:10.3102/0013189X07311288

Wesolowsky, G.O. (2000). Detecting excessive similarity in answers on multiple choice exams. Journal of Applied Statistics, 27(7), 909-921.

Wiersma, W. & Jurs, S.G. (1990). Educational measurement and testing (2nd ed.). Boston: Allyn & Bacon.

Wright, B.D. & Stone, M.H. (1979). Best test design. Chicago: MESA Press.

Wu, M., Tam, H.P., & Jen, T.-H. (2016). Educational measurement for applied researchers: theory into practice. Singapore: Springer Nature Singapore. doi:10.1007/978-981-10-3302-5

Wyse, A.E. & Babcock, B. (2016). Does maximizing information at the cut score always maximize classification accuracy and consistency? Journal of Educational Measurement, 53(1), 23-44.

Zieky, M. (2003). A DIF Primer. Princeton, NJ: Educational Testing Service.

Zwick, R. (2012). A review of ETS differential item functioning assessment procedures: flagging rules, minimum sample size requirements, and criterion refinement. (Research Report No. RR-12-08). Princeton, NJ: Educational Testing Service. (Retrieved 8 January 2019.)