On the use of non-parametric tests, probabilistic index models and quantile regression for comparing measures of locations. A simulation study

13 September 2018

ASA 2018 - Statistics and the Assessment, Control and Scenarios of Risks. Applications to Food, Social and Health, Economic and Environmental Fields
Pescara, 12-14 September
Conference of the Associazione Italiana per la Statistica Applicata (Italian Association for Applied Statistics)

The comparison between two independent groups is typically based on some measure of location. When the distributions are skewed or heavy-tailed, the Wilcoxon-Mann-Whitney (WMW) test is the most common choice in applied research, and its widespread (mis)use can be partly ascribed to its availability in almost all statistical software. It is well known, however, that the WMW does not test the equality of the two population medians: it is based instead on the probability that an observation drawn from one population exceeds an observation drawn from the other (Wilcox, 2012a). Only when the two distributions are identical except for their location does the WMW actually test for a difference in medians. Moreover, the WMW can lead to conservative decisions depending on the ratio of the variances of the two distributions and/or on the ratio of the sample sizes (Brunner and Munzel, 2000). Although a variety of methods have been proposed to improve on the standard approach (Wilcox, 2012b), their rather technical nature is an obstacle to wider use by non-statisticians.
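To make the distinction concrete, the following Python sketch computes the Mann-Whitney U statistic (the number of pairs in which the first sample exceeds the second, with ties counted as 1/2) and the corresponding estimate of the probabilistic index P(X > Y) + 0.5 P(X = Y). The two samples are hypothetical toy data chosen so that their medians coincide while the estimated index is far from 1/2; the function names are illustrative and not taken from any particular package.

```python
import statistics


def mann_whitney_u(x, y):
    """Count pairs (xi, yj) with xi > yj; tied pairs count 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def probabilistic_index(x, y):
    """Estimate P(X > Y) + 0.5 * P(X = Y) as U / (n_x * n_y)."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


# Hypothetical toy samples with the same median (5) but different shapes
x = [4, 4.5, 5, 20, 30]  # long right tail
y = [1, 2, 5, 8, 9]

print(statistics.median(x), statistics.median(y))  # 5 5
print(probabilistic_index(x, y))                   # 0.66
```

Because the WMW statistic is simply U = n_x n_y times this estimate, the test responds to departures of the index from 1/2 rather than to differences between medians.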

Quantile regression (Koenker and Bassett, 1978; Koenker, 2005; Davino et al., 2013; Furno and Vistocco, 2018), on the other hand, builds on the familiar regression modelling framework and is currently implemented in the main statistical software packages. Given its distribution-free nature, quantile regression is a natural candidate to address the above-mentioned limitations of the WMW test. Moreover, it can easily account for confounding factors by including them as additional covariates in the model. More recently, Thas et al. (2012) framed the WMW statistic within a modelling approach through Probabilistic Index Models (PIM), which offer a semiparametric statistical model for testing covariate effects in a two-sample design.
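As a minimal illustration of how the two-group comparison fits into quantile regression, the sketch below fits the model Q_tau(Y | g) = b0 + b1 g, where g is a 0/1 group dummy, by minimising the Koenker-Bassett check loss rho_tau(u) = u (tau - 1{u < 0}). It assumes the simplest design with no confounders: in that case the loss separates by group, so b0 is a tau-quantile of group 0, b0 + b1 is a tau-quantile of group 1, and a brute-force search over the observed values finds a minimiser. The data and function names are hypothetical; with additional covariates one would instead use a linear-programming solver such as rq in R's quantreg package or QuantReg in statsmodels.

```python
def check_loss(u, tau):
    """Koenker-Bassett check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_constant_quantile(values, tau):
    """Constant c minimising sum(rho_tau(v - c)); an observed value always attains the minimum.

    Exact ties in the loss are broken in favour of the smaller value.
    """
    return min(sorted(values), key=lambda c: sum(check_loss(v - c, tau) for v in values))


def quantreg_two_groups(y, g, tau=0.5):
    """Quantile regression of y on an intercept and a 0/1 dummy g; returns (b0, b1).

    Valid only for this two-group design without covariates, where the loss separates by group.
    """
    y0 = [yi for yi, gi in zip(y, g) if gi == 0]
    y1 = [yi for yi, gi in zip(y, g) if gi == 1]
    q0 = fit_constant_quantile(y0, tau)
    q1 = fit_constant_quantile(y1, tau)
    return q0, q1 - q0


# Hypothetical data: the two groups share the median but not the upper tail
y = [1, 2, 5, 8, 9, 4, 4.5, 5, 20, 30]
g = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print(quantreg_two_groups(y, g, tau=0.5))   # (5, 0)
print(quantreg_two_groups(y, g, tau=0.75))  # (8, 12)
```

With tau = 0.5 the group coefficient b1 is the difference between the sample medians (here 0), which is the quantity of interest when comparing medians; other values of tau compare other parts of the two distributions.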

In the current research, a simulation study will be conducted to compare the standard approaches with quantile regression in terms of empirical significance level and power. The simulation settings will start from scenarios where the WMW assumptions are respected and gradually move to more realistic ones. They will also include the effect of confounding factors, comparing the quantile regression results with those of the PIM approach.
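One cell of such a simulation design can be sketched as follows, assuming normally distributed data, the large-sample normal approximation to the WMW test, and hypothetical sample sizes, standard deviations and number of replications (none of which are taken from the study). Both groups have median 0, so every rejection counts as a type I error when the aim is to compare medians. The first call mimics a setting where the WMW assumptions hold; the second introduces unequal variances and unequal sample sizes.

```python
import math
import random


def wmw_pvalue(x, y):
    """Two-sided WMW p-value from the normal approximation (no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def empirical_rejection_rate(n1, n2, sd1, sd2, reps=2000, alpha=0.05, seed=1):
    """Share of simulated data sets in which WMW rejects; both groups are normal with median 0."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if wmw_pvalue(x, y) < alpha:
            rejections += 1
    return rejections / reps


# Hypothetical scenario 1: identical distributions (WMW assumptions respected)
print(empirical_rejection_rate(20, 20, 1.0, 1.0))
# Hypothetical scenario 2: equal medians, unequal variances and sample sizes
print(empirical_rejection_rate(10, 40, 4.0, 1.0))
```

In a full study, the same loop with a location shift added to one group would give the empirical power, and the competing methods would be applied to the same simulated data sets.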
