ITACOSM 2017 - ITAlian COnference on Survey Methodology
Bologna, June 14-16
In the general framework of linear regression models (Kutner et al., 2004), prediction intervals for the conditional response are easily obtained under a set of classical assumptions on the stochastic component of the model. In particular, writing the linear model as $y = X \beta + \epsilon$, we usually assume $\epsilon \sim N(0, \sigma^2 I)$, i.e. independent, homoscedastic normal errors. Nevertheless, serious issues may arise with respect to these assumptions in real applications, especially in small samples. The resulting prediction intervals can then suffer in terms of empirical coverage and width.
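For reference, the textbook interval implied by these assumptions can be written as follows. The symbols $n$ (sample size), $p$ (number of columns of $X$), $\hat{\beta}$ (the least squares estimate) and $\hat{\sigma}^2$ (the residual sum of squares divided by $n - p$) are notation introduced here for illustration. For a new observation $y_+$ at $x_+$, the $100(1 - \alpha)$% prediction interval is

$$\hat{y}_+ \pm t_{n-p,\,1-\alpha/2}\;\hat{\sigma}\,\sqrt{1 + x_+^\top (X^\top X)^{-1} x_+}, \qquad \hat{y}_+ = x_+^\top \hat{\beta},$$

where $t_{n-p,\,1-\alpha/2}$ is the corresponding quantile of Student's $t$ distribution with $n - p$ degrees of freedom. Both the $t$ quantile and the constant $\hat{\sigma}$ rest on normality and homoscedasticity, which is why violations of either affect coverage and width.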
Several methods have been proposed in the literature for dealing with violations of the assumptions made on $\epsilon$. They differ in which assumption they target. The homoscedastic residual bootstrap (Davison & Hinkley, 1997) has been proposed for cases in which normality cannot be assumed, even asymptotically. When heteroscedasticity is unpatterned or, at least, difficult to model, the wild bootstrap (Wu, 1986) is one option for estimating the distribution of each error locally. The Generalized Additive Models for Location, Scale and Shape (GAMLSS) (Rigby & Stasinopoulos, 2005) are another possible choice, since they allow different distributions to be specified for the response so as to properly estimate conditional changes in mean, standard deviation, skewness, and kurtosis. However, properly specifying a model in real-life applications is not an easy task, because information on the population is usually limited. Quantile regression (Koenker, 2005) therefore offers a nonparametric alternative for obtaining prediction intervals. Indeed, the two conditional quantiles that leave a proportion $\alpha/2$ in the left and right tails, respectively, directly provide a $100(1 - \alpha)$% prediction interval for a new observation $y_+$ conditional on $x_+$, i.e. a future value of $X$.
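The quantile-based interval described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure used in the study: it handles a single predictor, the toy data and the function names (`check_loss`, `fit_quantile_line`, `qr_prediction_interval`) are hypothetical, and the fit is a brute-force search that relies on the fact that a check-loss minimiser can always be found among the lines interpolating two data points. A real analysis would use a linear programming solver or a dedicated quantile regression routine.

```python
import random
from itertools import combinations


def check_loss(u, tau):
    """Check (pinball) loss of a residual u at quantile level tau."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_quantile_line(x, y, tau):
    """Fit y = a + b*x at quantile level tau by brute force.

    Minimising the check loss is a linear program, so some optimal line
    passes through two data points; every such pair is tried in turn.
    The cost is O(n^3), which is acceptable for a small demonstration only.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid candidate
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(check_loss(yk - (a + b * xk), tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


def qr_prediction_interval(x, y, x_new, alpha=0.10):
    """Interval formed by the alpha/2 and 1 - alpha/2 conditional quantiles."""
    a_lo, b_lo = fit_quantile_line(x, y, alpha / 2)
    a_hi, b_hi = fit_quantile_line(x, y, 1 - alpha / 2)
    return a_lo + b_lo * x_new, a_hi + b_hi * x_new


# Hypothetical toy data: the error standard deviation grows with x,
# so the constant-variance assumption of the linear model is violated.
rng = random.Random(2017)
x = [rng.uniform(1, 10) for _ in range(80)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]

lower, upper = qr_prediction_interval(x, y, x_new=5.0, alpha=0.10)
print(f"90% prediction interval at x = 5: [{lower:.2f}, {upper:.2f}]")
```

Because the two quantile lines are fitted separately, the width of the interval is free to change with $x_+$, which is what makes the approach attractive under heteroscedasticity. The same independence means the fitted lines can cross in regions with few observations, so the ordering of the two limits should be checked.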
This study uses a simulation approach to compare the prediction intervals built through quantile regression with those obtained parametrically via ordinary least squares (OLS) linear regression, the homoscedastic residual bootstrap and the wild bootstrap. A variant of the wild bootstrap is also proposed, since the original method provides a solution for estimation but not for prediction.