Bayesian adjustment for misclassification in mortality data
Mohamad Amin Pourhoseingholi
Gastroenterology and Liver diseases Research Center, Shahid Beheshti University of Medical Sciences, Iran
Correspondence: Mohamad Amin Pourhoseingholi, Gastroenterology and Liver diseases Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran, Tel 982122432515, Fax 982122432517
Received: September 28, 2014 | Published: November 12, 2014
Citation: Pourhoseingholi MA. Bayesian adjustment for misclassification in mortality data. Biom Biostat Int J. 2014;1(2):44-45. DOI: 10.15406/bbij.2014.01.00010
In medical studies, a difficulty in drawing inference from categorical data is the existence of misclassification, that is, disagreement between the observed and the true value. Sick individuals may be diagnosed as healthy, or the causes of disease or death may be misjudged. Misclassification affects estimation and hypothesis testing, often leading to biased estimates, and can therefore cause one to underestimate health risks.1 The effect of misclassification was first noted by Bross,2 and the statistical literature recommends two approaches: the first uses a small validation sample,3 and the second uses Bayesian analysis, in which subjective prior information on at least some subset of the parameters is used to re-estimate the misclassified statistic.4–6
The difficulty with the first approach is that, without additional information beyond the data at hand, it is not possible to account for the effect of misclassification, and re-sampling requires an infallible classifier, which may not exist or may be expensive.7 The Bayesian literature on this topic, on the other hand, is steadily growing. Most importantly, more complex models can now be handled thanks to the development of computational techniques (e.g., Monte Carlo methods).
Among medical indices, mortality is a familiar measure for assessing the burden of disease. This assessment, however, requires reliable death registry systems that report death statistics annually and accurately. Moreover, the analysis of death statistics subject to misclassification is a major problem in epidemiological analysis.1 Although the World Health Organization (WHO) has encouraged member states to introduce death registration systems involving medical certification of the cause of death, misclassification or underestimation of mortality data still occurs in official statistics, mostly in developing countries.
The Bayesian approach has received much attention in the case of misclassification in mortality data. Whittemore and Gong incorporated supplemental data on both true and fallible disease and used this approach to estimate cervical cancer mortality rates in Poisson regression,4 and Sposto et al.5 developed this likelihood to assess the effect of diagnostic misclassification on non-cancer and cancer mortality dose–response. Stamey et al.1 provided a Bayesian approach that extends the models of Whittemore & Gong4 and Sposto et al.5 Their technique, however, does not assume that the misclassification parameters are known; instead, prior information on the misclassification parameters is used in place of validation data.2 They applied this Bayesian approach to data consisting of the number of deaths due to cancer and non-cancer causes among residents of Hiroshima and Nagasaki, Japan. We derived an extension of the models proposed by Stamey et al.1 to correct and account for misclassification in cancer mortality data.8 Suppose there are two sample groups for death classification:
$y_{1r}$ and $y_{2r}$, where $r$ indexes the covariate pattern, $y_{1r}$ is the number of deaths recorded under the exact cause of death, and $y_{2r}$ is the number of deaths in the misclassified group, to which deaths from the first group were incorrectly assigned. Let $\mu_{ir}$ ($i = 1, 2$) denote the observed death rate of group $i$ for covariate pattern $r$. (In the approach of Stamey et al.,1 incorrect labeling is possible in both directions, but in our approach only one group is assumed to be misclassified, because of the nature of the real data.) Let $\theta$ be the probability that an observation from group 1 is incorrectly labeled as group 2. If the actual (unknown) death rate of each group is $\lambda_{ir}$, the relation between the actual and observed rates can be written in the following form:
$$\mu_{1r} = (1-\theta)\,\lambda_{1r} \quad \text{and} \quad \mu_{2r} = \lambda_{2r} + \theta\,\lambda_{1r}.$$
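As a quick numerical illustration of this relation, the sketch below maps actual rates to observed rates and back. It assumes the one-way relation implied by the definition of θ, namely an observed group 1 rate of (1 − θ)λ1 and an observed group 2 rate of λ2 + θλ1. The function names and the example rates are hypothetical and are not taken from the cited papers.

```python
def observed_rates(lam1, lam2, theta):
    """Map actual rates (lam1, lam2) to observed rates under one-way misclassification.

    theta is the probability that a group 1 death is labeled as group 2.
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    mu1 = (1.0 - theta) * lam1      # group 1 loses a fraction theta of its deaths
    mu2 = lam2 + theta * lam1       # group 2 absorbs the mislabeled deaths
    return mu1, mu2


def actual_rates(mu1, mu2, theta):
    """Invert the relation: recover actual rates from observed rates for a known theta."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    lam1 = mu1 / (1.0 - theta)
    lam2 = mu2 - theta * lam1
    return lam1, lam2


if __name__ == "__main__":
    # Hypothetical rates per 100,000 person-years.
    mu1, mu2 = observed_rates(20.0, 50.0, 0.25)
    print(mu1, mu2)                       # 15.0 55.0
    print(actual_rates(mu1, mu2, 0.25))   # (20.0, 50.0)
```

With θ = 0.25, a quarter of the group 1 rate moves to group 2, so the observed group 1 rate understates the actual one while the total rate is unchanged. In practice θ is unknown, which is why a prior distribution is placed on it.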
The joint distribution of the observable mortality data under this misclassification depends on the actual rates $\lambda_{ir}$ and on $\theta$ through the observed rates $\mu_{ir}$.
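One way to write this joint distribution explicitly is sketched below. It assumes, in line with the Poisson models cited above, that the counts are independent Poisson variables with means $P_r\mu_{1r}$ and $P_r\mu_{2r}$, where $P_r$ is a hypothetical known exposure (for example person-years) for covariate pattern $r$, and that $\mu_{1r} = (1-\theta)\lambda_{1r}$ and $\mu_{2r} = \lambda_{2r} + \theta\lambda_{1r}$. Both the Poisson form and the symbol $P_r$ are assumptions made here for illustration.

```latex
\[
\begin{aligned}
y_{1r} \mid \lambda_{1r}, \theta
  &\sim \mathrm{Poisson}\bigl(P_r\,(1-\theta)\,\lambda_{1r}\bigr), \\
y_{2r} \mid \lambda_{1r}, \lambda_{2r}, \theta
  &\sim \mathrm{Poisson}\bigl(P_r\,(\lambda_{2r} + \theta\,\lambda_{1r})\bigr), \\
f(y \mid \lambda, \theta)
  &\propto \prod_{r}
     \bigl[(1-\theta)\,\lambda_{1r}\bigr]^{y_{1r}}
     \bigl[\lambda_{2r} + \theta\,\lambda_{1r}\bigr]^{y_{2r}}
     \exp\bigl\{-P_r\,(\lambda_{1r} + \lambda_{2r})\bigr\}.
\end{aligned}
\]
```

The exponent simplifies because $\mu_{1r} + \mu_{2r} = \lambda_{1r} + \lambda_{2r}$: one-way misclassification moves deaths between groups without changing the total rate. Under these assumptions $\theta$ enters the likelihood only together with the rates, so the data alone do not identify it and the prior information carries that part of the inference.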
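To perform Bayesian inference, one can assume a beta prior distribution for the misclassification parameter, i.e. $\theta \sim \mathrm{Beta}(a, b)$, where the hyperparameters $a$ and $b$ encode the prior information.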
Because $\theta$ is an unknown parameter, we employed a latent variable approach, following Paulino et al.,9,10 Liu et al.11 and Stamey et al.,1 to simplify the full conditional models and to estimate the posterior distribution using a Gibbs sampling algorithm.
In this case, we define a latent variable as the number of counts from the first group incorrectly labeled as being in the misclassified group. The full conditional distributions, and finally the posterior, are then written in terms of this latent count.
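The latent variable scheme can be sketched as a small Gibbs sampler. This is a minimal illustration under stated assumptions, not the implementation used in the cited papers: counts are taken to be Poisson with a known exposure, the actual rates receive independent Gamma priors, and θ receives a Beta(a, b) prior. The latent count `u` is the number of group 2 deaths that truly belong to group 1. Under these assumptions the full conditionals are binomial for `u`, gamma for the two rates, and beta for θ. The function name, argument names, and prior settings are hypothetical.

```python
import numpy as np


def gibbs_one_way_misclassification(y1, y2, exposure, a, b,
                                    rate_shape=1.0, rate_rate=1.0,
                                    n_iter=5000, burn_in=1000, seed=0):
    """Gibbs sampler for one-way misclassified Poisson counts (illustrative sketch).

    y1, y2   : observed death counts per covariate pattern (group 1, group 2)
    exposure : assumed known exposure (e.g. person-years) per covariate pattern
    a, b     : Beta prior parameters for theta
    rate_*   : Gamma(shape, rate) prior shared by all actual rates (an assumption)
    """
    rng = np.random.default_rng(seed)
    y1 = np.asarray(y1, dtype=np.int64)
    y2 = np.asarray(y2, dtype=np.int64)
    exposure = np.asarray(exposure, dtype=float)

    # Starting values: prior mean for theta, crude observed rates for lambda.
    theta = a / (a + b)
    lam1 = (y1 + 1.0) / exposure
    lam2 = (y2 + 1.0) / exposure

    kept = n_iter - burn_in
    draws = {"theta": np.empty(kept),
             "lam1": np.empty((kept, y1.size)),
             "lam2": np.empty((kept, y1.size))}

    for it in range(n_iter):
        # Latent count: group 2 deaths that truly came from group 1.
        p = theta * lam1 / (lam2 + theta * lam1)
        u = rng.binomial(y2, p)

        # Actual rates given the completed counts (NumPy's gamma takes a scale).
        scale = 1.0 / (rate_rate + exposure)
        lam1 = rng.gamma(rate_shape + y1 + u, scale)
        lam2 = rng.gamma(rate_shape + y2 - u, scale)

        # theta given how many true group 1 deaths were mislabeled (u) or not (y1).
        theta = rng.beta(a + u.sum(), b + y1.sum())

        if it >= burn_in:
            k = it - burn_in
            draws["theta"][k] = theta
            draws["lam1"][k] = lam1
            draws["lam2"][k] = lam2
    return draws
```

A call such as `gibbs_one_way_misclassification([150, 300], [550, 900], [1e5, 1e5], a=20.0, b=80.0)` returns posterior draws whose means can be compared with the crude rates `y1 / exposure`. Because θ is informed mainly by its prior in this setup, the choice of `a` and `b` should reflect genuine prior knowledge about the misclassification probability.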
This approach was employed to correct the misclassification in cancer mortality data.12,13 In the absence of double sampling or validation data, the Bayesian approach is a good alternative for correcting the effects of misclassification in mortality data, particularly for death statistics from developing countries, where data are subject to misclassification or under-reporting. The Bayesian technique is flexible and easily handled computationally.
The author declares that there are no conflicts of interest.