LINEAR REGRESSION WITH DATA MISSING NOT AT RANDOM: BOOTSTRAP APPROACH
Abstract
OLS regression relies on a set of assumptions for its point and interval estimates to be unbiased and efficient. Data missing not at random (MNAR) can pose serious estimation problems in linear regression. In this study we evaluate the performance of OLS confidence interval estimates with MNAR data. We also suggest bootstrapping as a remedy in such cases and compare the traditional confidence intervals against bootstrap ones. Because the comparison requires knowing the true parameters, we carry out a simulation study. The results indicate that both approaches perform similarly and produce intervals of similar size. Given that the bootstrap requires substantial computation, traditional methods are still recommended even in the case of MNAR data.
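The comparison described above can be illustrated with a minimal sketch. It is not the study's actual simulation design: the sample size, the true coefficients, the logistic missingness rule that depends on the response, the pairs (case-resampling) percentile bootstrap, and the normal-approximation critical value 1.96 are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def simulate_mnar(n=500, beta0=1.0, beta1=2.0, rng=None):
    """Draw (x, y) from a linear model, then drop rows with a probability
    that depends on y itself (an assumed MNAR mechanism)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)
    # Assumption: larger responses are more likely to be missing.
    p_missing = 1.0 / (1.0 + np.exp(-(y - beta0)))
    observed = rng.random(n) > p_missing
    return x[observed], y[observed]


def ols_slope_ci(x, y, z=1.96):
    """OLS slope with a classical interval (normal approximation)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the estimates
    se = np.sqrt(cov[1, 1])
    return beta[1], (beta[1] - z * se, beta[1] + z * se)


def pairs_bootstrap_ci(x, y, n_boot=500, alpha=0.05, rng=None):
    """Percentile interval for the slope from resampling (x, y) pairs."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]
    lo, hi = np.quantile(slopes, [alpha / 2, 1 - alpha / 2])
    return lo, hi


rng = np.random.default_rng(0)
x_obs, y_obs = simulate_mnar(rng=rng)
slope, classical = ols_slope_ci(x_obs, y_obs)
boot = pairs_bootstrap_ci(x_obs, y_obs, rng=rng)
print("complete cases:", len(y_obs))
print("slope estimate:", slope)
print("classical CI width:", classical[1] - classical[0])
print("bootstrap CI width:", boot[1] - boot[0])
```

Repeating this over many simulated data sets and recording how often each interval covers the true slope, along with the average interval width, would give the kind of accuracy and interval-size comparison the abstract refers to.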
Keywords:
linear model, sample size, confidence interval, bootstrap, accuracy, interval size, missing not at random
License

This work is licensed under a Creative Commons Attribution 4.0 International License.