Clinical researchers are often interested in evaluating the effect of covariates on diagnostic measures (sensitivity, specificity, and positive and negative predictive values). Logistic regression (LR) models can be used to predict the diagnostic measures of a screening test. A marginal model framework using generalized estimating equations (GEE) with a logit or log link can be used to compare the diagnostic measures of two or more screening tests. These approaches model each diagnostic measure separately and so ignore the dependency among the measures, which might affect the association of covariates with each measure. The diagnostic measures are computed from the joint distribution of the screening test result and the reference test result, which generates a multinomial response. Multinomial logistic regression (MLR) is therefore a more appropriate approach to modeling these diagnostic measures. In this study, the validity of the LR and GEE approaches for modeling diagnostic measures was assessed in comparison with the MLR model. All methods provided unbiased estimates of the diagnostic measures in the absence of covariates. The LR and GEE methods produced more biased estimates than the MLR approach, especially in studies with small sample sizes. The MLR method showed no bias in predicting sensitivity for a single screening test. The proposed MLR method is robust for modeling the diagnostic measures of a screening test, in contrast to the LR method. The MLR and GEE methods produced similar estimates of the diagnostic measures when comparing two screening tests in studies with large sample sizes. The proposed MLR model for diagnostic measures is simple and available in common statistical software. Our study demonstrates that the MLR method should be preferred as an alternative to the LR and GEE approaches for modeling diagnostic measures.
multinomial logistic regression, predictive values, sensitivity, specificity, acute appendicitis, pulmonary abnormalities, medical diagnostic test.
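The abstract notes that all four diagnostic measures derive from the joint distribution of the screening test result and the reference test result. The sketch below illustrates that dependency under simple assumptions: it takes the four cell counts of a 2x2 cross-classification, treats their proportions as the multinomial cell probabilities, and derives each measure from them. The function name `diagnostic_measures` and the example counts are hypothetical and do not come from the study, and no covariates are modeled here.

```python
from typing import Dict


def diagnostic_measures(tp: float, fp: float, fn: float, tn: float) -> Dict[str, float]:
    """Derive diagnostic measures from the four joint cells of a 2x2 table.

    tp: test positive, reference positive
    fp: test positive, reference negative
    fn: test negative, reference positive
    tn: test negative, reference negative

    Assumes every row and column margin is non-zero (hypothetical helper).
    """
    total = tp + fp + fn + tn
    if total <= 0:
        raise ValueError("cell counts must sum to a positive number")

    # Joint (multinomial) cell probabilities of (test result, reference result).
    p_tp, p_fp, p_fn, p_tn = (count / total for count in (tp, fp, fn, tn))

    # Each measure is a ratio of the same four probabilities, which is why
    # the measures are not independent of one another.
    return {
        "sensitivity": p_tp / (p_tp + p_fn),  # P(test+ | reference+)
        "specificity": p_tn / (p_tn + p_fp),  # P(test- | reference-)
        "ppv": p_tp / (p_tp + p_fp),          # P(reference+ | test+)
        "npv": p_tn / (p_tn + p_fn),          # P(reference- | test-)
    }


# Illustrative counts only, not data from the study.
measures = diagnostic_measures(tp=40, fp=10, fn=10, tn=140)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Because all four measures share the same four cell probabilities, a change in any one cell shifts more than one measure at once, which is the dependency that separate per-measure models leave out.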