Efficacy of Effect Size Measures in Logistic Regression
An Application for Detecting DIF
Abstract
Statistical techniques based on logistic regression (LR) are adequate for detecting differential item functioning (DIF) in dichotomous items. Nevertheless, they return more false positives (FPs) than other DIF detection techniques do. This paper compares the efficacy of DIF detection using the LR significance test with that of the effect size estimate these procedures provide, Nagelkerke's R². The manipulated variables were sample size, the ratio of focal to reference group sample sizes, amount of DIF, test length, and percentage of test items with DIF. In addition, examinee responses were generated to simulate both uniform and nonuniform DIF (symmetric and asymmetric). In all cases, dichotomous response tests were used. The results show that using R² as a strategy for detecting DIF yielded lower correct detection percentages than the significance tests did. Moreover, the LR significance test showed adequate control of FP rates, close to the nominal 5%, although the rate slightly exceeded the nominal level when the sample size was smaller. When the effect size measure was used to detect DIF, however, the FP rates were lower, falling below 1% across a wide range of conditions. In addition, a statistically significant main effect of sample size was obtained: the FP percentages were higher when the sample size was small (100/100). These results indicate that using R² as a measure of effect size together with the statistical significance test reduces the FP rate.
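The two decision rules compared in the abstract can be illustrated with a minimal Python sketch. It assumes the common nested-model formulation of LR DIF detection, in which a base model (ability only) is compared with a full model that adds group and ability-by-group terms, giving a 2-df likelihood-ratio test. The function names, the log-likelihood values, and the `r2_threshold` cutoff are hypothetical and chosen only for illustration; the abstract does not state which cutoff or model comparison the study used.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R^2 from intercept-only and fitted-model log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # upper bound of Cox-Snell R^2
    return cox_snell / max_cox_snell


def lr_dif_decision(ll_null, ll_base, ll_full, n, alpha=0.05, r2_threshold=0.035):
    """Compare a significance-only rule with a significance-plus-effect-size rule.

    ll_base: log-likelihood of the ability-only model (assumed formulation).
    ll_full: log-likelihood after adding group and ability-by-group terms.
    r2_threshold: hypothetical effect size cutoff, not taken from the study.
    """
    g2 = max(0.0, 2.0 * (ll_full - ll_base))  # likelihood-ratio statistic
    p_value = math.exp(-g2 / 2.0)  # chi-square survival function, exact for 2 df
    delta_r2 = nagelkerke_r2(ll_null, ll_full, n) - nagelkerke_r2(ll_null, ll_base, n)
    significant = p_value < alpha
    return {
        "g2": g2,
        "p_value": p_value,
        "delta_r2": delta_r2,
        "flag_significance_only": significant,
        "flag_combined": significant and delta_r2 >= r2_threshold,
    }


# Made-up log-likelihoods for one item with 100 focal and 100 reference examinees.
result = lr_dif_decision(ll_null=-135.0, ll_base=-110.0, ll_full=-106.9, n=200)
for key, value in result.items():
    print(key, value)
```

With these made-up inputs the item is flagged by the significance test alone but not by the combined rule, because the change in R² falls below the illustrative cutoff. This is the mechanism by which adding an effect size criterion can lower the FP rate.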