Some Notes on Evaluating the Prediction Error for the Generalized Estimating Equations

Dario Gregori

Abstract

Despite the frequent use of generalized estimating equations (Liang and Zeger, 1986), in particular for modeling correlated binary data, the literature has devoted very little attention to topics such as model checking, outlier detection, and the evaluation of prediction accuracy. This paper focuses on the last of these, discussing the applicability of some common methods to the generalized estimating equation model:
(i) apparent error, naive or adjusted according to several criteria (Cp, AIC, BIC);
(ii) cross-validation;
(iii) bootstrap-based methods.
The main difficulty in using cross-validation and the bootstrap arises from the need to retain the correlation structure in the data. By sampling clusters instead of individual observations, we retain the correlation present among observations belonging to the same cluster. An advantage of this technique over more model-dependent techniques, such as bootstrapping residuals, is that the correlation remains a nuisance term, in line with the spirit of generalized estimating equations, for which a precise assumption about the correlation structure is not needed. Internal and external prediction errors are evaluated using the proposed methods with reference to a case study in public health.
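
As an illustration of the cluster-level resampling described above, the following is a minimal sketch in Python using only the standard library. It is not the implementation used in the paper: the data layout (rows of `(cluster_id, x, y)`), the toy values, and the helper names `group_by_cluster`, `cluster_bootstrap_sample`, and `cluster_folds` are all assumptions made for the example. The sketch shows only the resampling step: whole clusters are drawn with replacement for the bootstrap, or assigned as whole units to folds for cross-validation, so that within-cluster correlation is carried along without being modeled.

```python
import random
from collections import defaultdict


def group_by_cluster(rows):
    """Group (cluster_id, x, y) rows by cluster id, keeping within-cluster order."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[0]].append(row)
    return dict(clusters)


def cluster_bootstrap_sample(clusters, rng):
    """Draw as many clusters as in the data, with replacement.

    Every observation of a drawn cluster is kept. Each draw receives a new
    id, so a cluster drawn twice is treated as two separate clusters.
    """
    ids = list(clusters)
    drawn = [rng.choice(ids) for _ in ids]
    sample = []
    for new_id, old_id in enumerate(drawn):
        for _, x, y in clusters[old_id]:
            sample.append((new_id, x, y))
    return drawn, sample


def cluster_folds(clusters, k, rng):
    """Split cluster ids (not observations) into k folds for cross-validation."""
    ids = list(clusters)
    rng.shuffle(ids)
    return [ids[i::k] for i in range(k)]


# Hypothetical toy data: four clusters of correlated binary responses.
rows = [
    ("a", 0.1, 0), ("a", 0.4, 1), ("a", 0.9, 1),
    ("b", 0.2, 0), ("b", 0.3, 0),
    ("c", 0.7, 1), ("c", 0.8, 1), ("c", 0.5, 0),
    ("d", 0.6, 1), ("d", 0.0, 0),
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
clusters = group_by_cluster(rows)
drawn, sample = cluster_bootstrap_sample(clusters, rng)
folds = cluster_folds(clusters, 2, rng)
```

In a full analysis, the model would be refitted on each bootstrap sample (or on the training folds) and its prediction error computed on the original data (or on the held-out clusters); that fitting step is deliberately left out here, since it depends on the software used.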