Survival data measure the time to an event of interest. Their most important feature is the presence of censored observations. An observation is said to be right-censored if follow-up ends, for whatever reason, before the event occurs, so that only a lower bound on the time to the event is known. If no censoring occurs in the data, standard statistical models can be used to analyse them. Pseudo-observations can stand in for the incompletely observed outcomes and thereby allow standard statistical models to be used when censoring is present.
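As an illustration of the idea, the following minimal sketch computes jackknife pseudo-observations for the survival probability S(t) from the Kaplan-Meier estimator, using the usual leave-one-out construction n·Ŝ(t) − (n − 1)·Ŝ₋ᵢ(t). The function names `kaplan_meier` and `pseudo_observations` and the toy data are assumptions made for this example, not code from the paper.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t) for right-censored data.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    surv = 1.0
    # Distinct observed event times up to and including t, in increasing order.
    event_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        n_events = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        surv *= 1.0 - n_events / at_risk
    return surv


def pseudo_observations(times, events, t):
    """Leave-one-out pseudo-observations for S(t), one value per subject."""
    n = len(times)
    full = kaplan_meier(times, events, t)
    pseudo = []
    for i in range(n):
        # Re-estimate S(t) with subject i removed from the sample.
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        loo = kaplan_meier(loo_times, loo_events, t)
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo


# Toy data (hypothetical): the first subject is censored at time 1.
times = [1, 2, 3, 4]
events = [0, 1, 1, 1]
values = pseudo_observations(times, events, t=2.5)
```

Without censoring, each pseudo-observation reduces to the indicator that the subject's event time exceeds t. With censoring, the values need not lie in [0, 1], but every subject, censored or not, receives a value that can be used as the response in a regression model.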

In this paper, a pseudo-observation approach was applied to single-event and competing-risks analysis, with special attention paid to the properties of the pseudo-observations. In the empirical part of the study, the use of regression models based on pseudo-observations in credit-risk assessment was investigated. Default, defined as a delay in payment, was the event of interest, while prepayment of credit was treated as a possible competing risk. Credits that neither defaulted nor were prepaid during the follow-up were treated as censored observations. Typical application characteristics of the credit and creditor served as the covariates in the regression model. Regression models based on pseudo-observations were built for the single-event and competing-risks approaches on a sample of retail credits provided by a Polish financial institution. The estimates and discriminatory power of these models were compared with those of the Cox proportional hazards (PH) and Fine-Gray models.
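For the competing-risks setting described above, pseudo-observations are typically built from an estimate of the cumulative incidence function rather than the survival function. The sketch below uses the Aalen-Johansen estimator and the same leave-one-out construction. The coding of outcomes (0 for censored, 1 for default, 2 for prepayment), the function names, and the toy data are assumptions made for this example only.

```python
def cumulative_incidence(times, causes, t, cause=1):
    """Aalen-Johansen estimate of P(T <= t, event type == cause).

    causes[i] is 0 for a censored credit, otherwise the event type
    (here assumed: 1 = default, 2 = prepayment).
    """
    surv = 1.0  # probability of no event of any type before the current time
    cif = 0.0
    event_times = sorted({ti for ti, ci in zip(times, causes) if ci != 0 and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        d_cause = sum(1 for ti, ci in zip(times, causes) if ti == u and ci == cause)
        d_all = sum(1 for ti, ci in zip(times, causes) if ti == u and ci != 0)
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_all / at_risk
    return cif


def cif_pseudo_observations(times, causes, t, cause=1):
    """Leave-one-out pseudo-observations for the cumulative incidence at t."""
    n = len(times)
    full = cumulative_incidence(times, causes, t, cause)
    pseudo = []
    for i in range(n):
        # Re-estimate the cumulative incidence with credit i removed.
        loo = cumulative_incidence(
            times[:i] + times[i + 1:], causes[:i] + causes[i + 1:], t, cause
        )
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo


# Toy data (hypothetical): months to outcome and outcome type for four credits.
months = [1, 2, 3, 4]
outcome = [1, 2, 1, 2]  # no censoring in this small example
default_pseudo = cif_pseudo_observations(months, outcome, t=2.5, cause=1)
```

In this uncensored example each value equals the indicator that the credit defaulted by time t. In practice, the resulting values would be passed as the response to a generalised estimating equations fit with the application characteristics as covariates.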

Keywords: generalised estimating equations, cumulative incidence function, probability of default, credit risk, survival analysis

AGRESTI, A., (2007). Logistic Regression, in An Introduction to Categorical Data Analysis, Second Edition, John Wiley & Sons, Inc., Hoboken, NJ, USA.

AKAIKE, H., (1974). A new look at the statistical model identification, IEEE Transactions on Automatic Control, 19 (6), pp. 716–723.

ANDERSEN, P. K., KLEIN, J. P., ROSTHOJ, S., (2003). Generalised linear models for correlated pseudo-observations, with applications to multi-state models, Biometrika, 90 (1), pp. 15–27.

ANDERSEN, P. K., PERME, M., (2010). Pseudo-observations in survival analysis, Statistical Methods in Medical Research, 19 (1), pp. 71–99.

BINDER, N., GERDS, T. A., ANDERSEN, P. K., (2014). Pseudo-observations for competing risks with covariate dependent censoring, Lifetime Data Analysis, 20 (2), pp. 303–315.

COX, D., (1972). Regression Models and Life-Tables, Journal of the Royal Statistical Society, Series B (Methodological), 34 (2), pp. 187–220.

DELONG, E., DELONG, D., CLARKE-PEARSON, D., (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach, Biometrics, 44, pp. 837–845.

DIRICK, L., CLAESKENS, G., BAESENS, B., (2017). Time to default in credit scoring using survival analysis: a benchmark study, Journal of the Operational Research Society, 6, pp. 652–655.

FINE, J., GRAY, R., (1999). A Proportional Hazards Model for the Subdistribution of a Competing Risk, Journal of the American Statistical Association, 94 (446), pp. 496–509.

HALLER, B., SCHMIDT, G., ULM, K., (2013). Applying competing risks regression models: an overview, Lifetime Data Analysis, 19, pp. 33–58.

HAND, D. J., (2009). Measuring classifier performance: a coherent alternative to the area under the ROC curve, Machine Learning, 77, pp. 103–123.

HOJSGAARD, S., HALEKOH, U., YAN, J., (2005). The R Package geepack for Generalized Estimating Equations, Journal of Statistical Software, 15 (2), pp. 1–11.

KLEIN, J., MOESCHBERGER, M., (2003). Survival Analysis: Techniques for Censored and Truncated Data, Statistics for Biology and Health, 2nd ed., Springer, New York.

KLEIN, J. P., ANDERSEN, P. K., (2005). Regression Modelling of Competing Risks Data Based on Pseudovalues of the Cumulative Incidence Function, Biometrics, 61 (1), pp. 223–229.

KUK, D., VARADHAN, R., (2013). Model selection in competing risks regression, Statistics in Medicine, 32, pp. 3077–3088.

LIANG, K., ZEGER, S., (1986). Longitudinal Data Analysis Using Generalized Linear Models, Biometrika, 73 (1), pp. 13–22.

MILLS, M., (2011). Introducing Survival and Event History Analysis, Sage, Los Angeles.

PINTILIE, M., (2006). Competing Risks: A Practical Perspective, Wiley.

VENABLES, W. N., RIPLEY, B. D., (2002). Modern Applied Statistics with S, 4th ed., Springer.

WATKINS, J. G. T., VASNEV, A. L., GERLACH, R., (2014). Multiple Event Incidence and Duration Analysis for Credit Data Incorporating Non-Stochastic Loan Maturity, Journal of Applied Econometrics, 29, pp. 627–648.

Copyright © 2019 Statistics Poland