LINEAR REGRESSION WITH DATA MISSING NOT AT RANDOM: BOOTSTRAP APPROACH

Abstract

OLS regression relies on a set of assumptions for its point and interval estimates to be unbiased and efficient. Data missing not at random (MNAR) can pose serious estimation problems in linear regression. In this study we evaluate the performance of OLS confidence interval estimates with MNAR data. We also propose bootstrapping as a remedy in such cases and compare traditional confidence intervals against bootstrap ones. Because the true parameters must be known for this comparison, we carry out a simulation study. The results indicate that the two approaches perform similarly and produce intervals of similar size. Given that the bootstrap requires substantial computation, traditional methods are still recommended even when data are MNAR.
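The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not state the sample size, the missingness mechanism, the bootstrap variant, or the number of resamples, so everything below is an assumption made for illustration. Specifically, the sketch assumes a single regressor, a logistic MNAR mechanism in which larger values of y are more likely to be missing, a pairs (case) bootstrap with percentile intervals, and a normal-approximation critical value of 1.96 for the classical interval. All function names are hypothetical.

```python
import numpy as np


def simulate_mnar(rng, n=500, beta0=1.0, beta1=2.0):
    """Simulate y = beta0 + beta1*x + e, then drop cases under an assumed MNAR rule."""
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)
    # Assumed MNAR mechanism: the probability that a case is missing
    # depends on the value of y itself (larger y is more often missing).
    p_missing = 1.0 / (1.0 + np.exp(-(y - beta0)))
    observed = rng.random(n) > p_missing
    return x[observed], y[observed]


def ols_slope_ci(x, y, z=1.96):
    """Classical OLS slope with a normal-approximation 95% interval."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficients
    se = np.sqrt(cov[1, 1])
    return beta[1], beta[1] - z * se, beta[1] + z * se


def pairs_bootstrap_ci(x, y, rng, n_boot=500, alpha=0.05):
    """Percentile interval for the slope from a pairs (case) bootstrap."""
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample (x, y) pairs with replacement
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]
    lo, hi = np.quantile(slopes, [alpha / 2, 1 - alpha / 2])
    return lo, hi


rng = np.random.default_rng(0)
x_obs, y_obs = simulate_mnar(rng)
slope, ols_lo, ols_hi = ols_slope_ci(x_obs, y_obs)
boot_lo, boot_hi = pairs_bootstrap_ci(x_obs, y_obs, rng)

print(f"observed cases: {len(x_obs)}")
print(f"OLS slope: {slope:.3f}, classical CI width: {ols_hi - ols_lo:.3f}")
print(f"bootstrap percentile CI width: {boot_hi - boot_lo:.3f}")
```

Repeating this over many simulated data sets and recording how often each interval covers the true slope (here the assumed value 2.0) would give coverage and interval-size figures of the kind the study compares.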

Keywords:

linear model, sample size, confidence interval, bootstrap, accuracy, interval size, missing not at random

How to Cite

LINEAR REGRESSION WITH DATA MISSING NOT AT RANDOM: BOOTSTRAP APPROACH. (2024). Economic Development and Analysis, 2(4), 492-502. https://doi.org/10.60078/2992-877X-2024-vol2-iss4-pp492-502