Published Online: https://doi.org/10.1027/1614-2241.5.1.18

Statistical techniques based on logistic regression (LR) are suitable for detecting differential item functioning (DIF) in dichotomous items. Nevertheless, they return more false positives (FPs) than other DIF detection techniques. This paper compares the efficacy of DIF detection using the LR significance test with detection based on the effect size estimate that these procedures provide, Nagelkerke's R². The manipulated variables were sample size, the sample size ratio between focal and reference groups, amount of DIF, test length, and percentage of test items with DIF. In addition, examinee responses were generated to simulate both uniform and nonuniform DIF (symmetric and asymmetric). In all cases, dichotomous response tests were used. The results show that using R² as a strategy for detecting DIF yielded lower correct detection percentages than the significance test. Moreover, the LR significance test showed adequate control of FP rates, close to the nominal 5%, although the rate was slightly higher than nominal when the sample size was smaller. When the effect size measure was used to detect DIF, however, the FP rates were lower, below 1% across a wide range of conditions. In addition, a statistically significant main effect of sample size was obtained: FP percentages were higher when the sample size was small (100/100). These results indicate that using R² as a measure of effect size together with the statistical significance test reduces the FP rate.
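The two decision rules compared in the abstract can be illustrated with a minimal Python sketch. It assumes the usual nested-model setup for LR DIF detection: a baseline model with the total score only, and a full model that adds group and the group-by-score interaction, compared with a 2-df likelihood-ratio test. The function names, the example log-likelihoods, and the 0.035 cutoff for the change in Nagelkerke's R² are illustrative assumptions, not values taken from the paper. The cutoff follows the "negligible" threshold proposed by Jodoin and Gierl (2001), and the study itself may have used a different criterion.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke (1991) R^2 from the intercept-only and fitted log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # upper bound of Cox-Snell R^2
    return cox_snell / max_cox_snell


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms exist for df = 1 and df = 2."""
    x = max(x, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def dif_decision(ll_null, ll_base, ll_full, n, df=2, alpha=0.05, r2_cut=0.035):
    """Flag an item by significance, by effect size, and by both together.

    ll_base: log-likelihood of the model with total score only.
    ll_full: log-likelihood after adding group and group-by-score terms.
    r2_cut:  assumed effect-size cutoff (hypothetical default, see lead-in).
    """
    g2 = 2.0 * (ll_full - ll_base)  # likelihood-ratio statistic
    p_value = chi2_sf(g2, df)
    delta_r2 = nagelkerke_r2(ll_null, ll_full, n) - nagelkerke_r2(ll_null, ll_base, n)
    significant = p_value < alpha
    large_enough = delta_r2 >= r2_cut
    return {
        "p_value": p_value,
        "delta_r2": delta_r2,
        "flag_significance": significant,
        "flag_effect_size": large_enough,
        "flag_combined": significant and large_enough,
    }


# Hypothetical log-likelihoods for one item with 1000 examinees.
result = dif_decision(ll_null=-690.0, ll_base=-550.0, ll_full=-545.0, n=1000)
print(result)
```

With these made-up inputs the item is statistically significant but its change in R² falls below the assumed cutoff, so the combined rule does not flag it. This is the mechanism by which adding an effect size criterion can lower the FP rate.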

References

  • American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.

  • Gómez-Benito, J., & Navas-Ara, M. J. (2000). A comparison of χ², RFA and IRT based procedures in the detection of DIF. Quality & Quantity, 34, 17–31.

  • Gómez-Benito, J., Hidalgo, M. D., Padilla, J. L., & González, A. (2005, September). Desarrollo informático para la utilización de la regresión logística como técnica de detección del DIF [Software for the use of logistic regression as DIF detection procedure]. Paper presented at the IX Congreso de Metodología de las Ciencias Sociales y de la Salud, Granada, Spain.

  • Hambleton, R. K., & Cook, L. (1983). Robustness of item response models and effects of test length and sample size on the precision of ability estimates. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 31–49). New York: Academic Press.

  • Hidalgo, M. D., & Gómez-Benito, J. (2003). Test purification and the evaluation of differential item functioning with multinomial logistic regression. European Journal of Psychological Assessment, 19, 1–11.

  • Hidalgo, M. D., & Gómez-Benito, J. (2006). Nonuniform DIF detection using discriminant logistic analysis and multinomial logistic regression: A comparison for polytomous items. Quality & Quantity, 40, 805–823.

  • Hidalgo, M. D., & López-Pina, J. A. (2004). DIF detection and effect size: A comparison between logistic regression and Mantel-Haenszel variation. Educational and Psychological Measurement, 64, 903–915.

  • Hidalgo, M. D., Gómez-Benito, J., & Padilla, J. L. (2005). Regresión logística: alternativas de análisis en la detección del funcionamiento diferencial del ítem [Logistic regression: Analysis alternatives in the detection of differential item functioning]. Psicothema, 17, 509–515.

  • Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329–349.

  • Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17(4), 297–334.

  • Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78(3), 691–692.

  • Narayanan, P., & Swaminathan, H. (1996). Identification of items that show nonuniform DIF. Applied Psychological Measurement, 20, 257–274.

  • Navas-Ara, M. J., & Gómez-Benito, J. (2002). Effects of ability scale purification on the identification of DIF. European Journal of Psychological Assessment, 18(1), 9–15.

  • Oshima, T. C., Raju, N. S., & Nanda, A. O. (2006). A new method for assessing the statistical significance in the differential functioning of item and tests (DFIT) framework. Journal of Educational Measurement, 43(1), 1–17.

  • Penfield, R. D., & Lam, T. C. M. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19, 5–15.

  • Potenza, M. T., & Dorans, N. J. (1995). DIF assessment for polytomously scored items: A framework for classification and evaluation. Applied Psychological Measurement, 19, 23–37.

  • Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 492–502.

  • Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17, 105–116.

  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.

  • Wilkinson, L., & The Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.

  • Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337–347). Hillsdale, NJ: Lawrence Erlbaum Associates.

  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense.

  • Zumbo, B. D., & Thomas, D. R. (1997). A measure of effect size for a model-based approach for studying DIF. Prince George, Canada: University of Northern British Columbia, Edgeworth Laboratory for Quantitative Behavioral Science.