Population surveys and demographic studies are the gold standard for estimating HIV prevalence. However, non-response in these surveys is a major concern, especially when it is not random, because complete case analysis then becomes inappropriate. An analysis that accounts for the missing data is therefore needed to obtain unbiased HIV prevalence estimates.
Serological samples were collected from participants who were residents of a Demographic Surveillance System (DSS) in Kisesa, Tanzania. HIV prevalence was estimated using three methods. The first was complete case analysis (CCA), which assumes that data are missing completely at random (MCAR). The other two methods, multiple imputation (MI) and inverse probability weighting (IPW), assume that data are missing at random (MAR). For MI, a logistic regression model adjusting for age, sex, residence, and marital status was used to impute 20 datasets, from which HIV prevalence was re-estimated. For IPW, the propensities for participating in the sero-survey and for being tested for HIV, given age, sex, residence, and marital status, were estimated using logistic regression models. Inverse probability weights were then derived from these propensity scores for participants who were tested for HIV.
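The weighting step can be illustrated with a minimal Python sketch. It is not the study's code: the function name `ipw_prevalence`, the field names `tested` and `hiv`, and the toy records are hypothetical. To stay dependency-free, the propensity is taken as the observed testing rate within each covariate cell, which corresponds to a saturated model rather than the main-effects logistic regression described above.

```python
from collections import defaultdict


def ipw_prevalence(records, strata_keys):
    """Weighted prevalence among tested people, weights = 1 / propensity.

    records: list of dicts holding the covariates named in strata_keys,
    plus 'tested' (bool) and 'hiv' (0/1 for tested people, else None).
    """
    # Count eligible and tested people in each covariate cell.
    eligible = defaultdict(int)
    tested = defaultdict(int)
    for r in records:
        cell = tuple(r[k] for k in strata_keys)
        eligible[cell] += 1
        if r["tested"]:
            tested[cell] += 1

    numerator = 0.0
    denominator = 0.0
    for r in records:
        if not r["tested"]:
            continue  # only people with an HIV result contribute
        cell = tuple(r[k] for k in strata_keys)
        propensity = tested[cell] / eligible[cell]  # estimated P(tested | cell)
        weight = 1.0 / propensity
        numerator += weight * r["hiv"]
        denominator += weight
    return numerator / denominator


# Hypothetical toy data, NOT the Kisesa data:
# 100 men, 50 tested, 5 positive; 100 women, all tested, 5 positive.
records = []
for i in range(100):
    is_tested = i < 50
    result = (1 if i < 5 else 0) if is_tested else None
    records.append({"sex": "M", "tested": is_tested, "hiv": result})
for i in range(100):
    records.append({"sex": "F", "tested": True, "hiv": 1 if i < 5 else 0})

tested_only = [r for r in records if r["tested"]]
cca = sum(r["hiv"] for r in tested_only) / len(tested_only)
ipw = ipw_prevalence(records, ["sex"])
print(f"CCA: {cca:.4f}  IPW: {ipw:.4f}")
```

In this toy data, men are both less likely to be tested and more likely to be positive, so the weighted estimate (7.5%) is higher than the complete case estimate (about 6.7%). The real analysis would use all four covariates and a fitted regression model.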
The overall CCA HIV prevalence estimate was 6.6% (95% CI: 6.0-7.2), with 5.4% (95% CI: 4.6-6.3) in males and 7.3% (95% CI: 6.6-8.1) in females. Using MI, the overall HIV prevalence was 6.8% (95% CI: 6.2-7.5), with 6.2% (95% CI: 5.1-7.3) in males and 7.4% (95% CI: 6.6-8.2) in females. Using IPW, the overall HIV prevalence was 6.7% (95% CI: 6.1-7.4), with 5.5% (95% CI: 4.7-6.5) in males and 7.7% (95% CI: 7.0-8.6) in females. HIV prevalence differed significantly between age groups (p<0.001), with the highest estimates in males aged 35-39 years and females aged 40-44 years, and the lowest in both males and females aged 15-19 years.
Complete case analysis underestimates HIV prevalence compared to methods that adjust for missing data. Of the three methods compared (CCA, MI, and IPW), we found multiple imputation to be the best for adjusting for missing data in population surveys.