Viet Nam tuberculosis prevalence survey: equity analysis – Data dictionary

Persistent Identifier:

Data description

An individual-level analytical dataset generated from two consecutive nationally representative surveys conducted in Viet Nam in 2007 and 2017. The source data used to generate this combined dataset may be requested from the Viet Nam National Lung Hospital by e-mailing

R code and Stata do-files written and used to merge the two consecutive nationally representative surveys into a single individual-level analytical dataset may be obtained from

Data collection methods

Cross-sectional surveys using multistage cluster sampling based on the estimated prevalence of TB in the country at the time of the study. One survey was conducted in 2007 and another in 2017. Each cluster was a district. Individuals living in enumerated households were eligible for inclusion in the study if they were aged >15 years and had lived in the household for >3 months (2007 survey) or >2 weeks (2017 survey). Participants were screened for TB based on self-reported symptoms and chest radiograph results. The tuberculosis case definition was having had a smear test and at least one positive Löwenstein-Jensen (LJ) culture.
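The eligibility rule above differs between the two surveys only in the minimum residence period. A minimal sketch of that rule follows; the function name, the use of days as the unit, and the conversion of "3 months" to 90 days are assumptions made for illustration and are not taken from the survey protocols.

```python
# Minimum residence in the household, in days, keyed by survey year.
# ASSUMPTION: "3 months" is approximated as 90 days and "2 weeks" as 14 days.
MIN_RESIDENCE_DAYS = {2007: 90, 2017: 14}

# Age threshold as written in the description (strictly greater than 15).
# Note that the codebook's youngest age group is labelled 15-24, so the
# boundary may be inclusive in practice; check against the source data.
MIN_AGE_YEARS = 15


def is_eligible(age_years, residence_days, survey_year):
    """Return True if an individual meets the stated inclusion criteria."""
    if survey_year not in MIN_RESIDENCE_DAYS:
        raise ValueError(f"unknown survey year: {survey_year}")
    old_enough = age_years > MIN_AGE_YEARS
    resident_long_enough = residence_days > MIN_RESIDENCE_DAYS[survey_year]
    return old_enough and resident_long_enough
```

For example, an adult who had lived in the household for 30 days would be eligible under the 2017 rule but not under the 2007 rule.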

Data analysis and preparation

An asset index was constructed using principal component analysis (PCA) of six variables: (1) the presence of clay floors, (2) the use of wood as fuel for cooking, and ownership of (3) a stereo system, (4) a television, (5) a motorbike, or (6) a car. Using the index, households were divided into four groups (quartiles) based on their relative wealth. “Neighbourhood” provincial poverty variables were generated by combining clusters into provinces and imputing provincial poverty values based on data from the World Bank.
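The step of dividing households into four groups can be sketched as follows, assuming the continuous PCA scores have already been computed. The function name and the example scores are hypothetical, and the cut-point method (Python's default "exclusive" quantile method) and the convention that group 1 holds the lowest scores are assumptions; the original R and Stata code may differ on both points.

```python
from bisect import bisect_left
from statistics import quantiles


def assign_quartiles(scores):
    """Return a quartile code (1-4) for each asset index score.

    ASSUMPTION: code 1 is the group with the lowest scores. Whether low
    scores mean poorer households depends on the sign of the PCA loadings.
    """
    # Three cut points that split the scores into four groups.
    cuts = quantiles(scores, n=4)
    # A score's group is 1 plus the number of cut points strictly below it.
    return [bisect_left(cuts, score) + 1 for score in scores]


# Hypothetical index scores for eight households.
example_scores = [0.5, -2.0, 1.5, -1.0, 2.0, -0.5, 1.0, 0.0]
example_groups = assign_quartiles(example_scores)
```

Households with tied scores always receive the same code, so the four groups may be of unequal size when many households own the same set of assets.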

Data codebook

Var# Data field Description Answer code Answer label
1 newID Individual study ID in the format A-BB-CCC-DDDD, where A=survey number, BB=cluster, CCC=household, and DDDD=individual.    
2 age_final Age in years    
3 agegroup Categorised age variable    
      1 15-24
      2 25-34
      3 35-44
      4 45-54
      5 55-64
      6 >65
4 awe Absolute wealth estimate generated based on Hruschka et al.    
5 cough Self-reported cough. Coded if a cough (more than 2 weeks) was reported in either the screening or the in-depth interview.    
6 pca_continuous Continuous asset index generated by PCA from the following household assets.    
      1 Clay floor
      2 Cooking with wood
      3 Stereo
      4 TV
      5 Motorbike
      6 Car
7 pca_quartile Quartile of the PCA asset index (households divided into four groups by relative wealth)    
8 poverty_2009 Poverty: GSO-WB poverty headcount (%)    
9 poverty_2013 MOLISA income-based poverty rate    
10 sex_final Participant sex    
      0 Female
      1 Male
11 survey Survey year    
      0 2007 survey
      1 2017 survey
12 symptoms_any Coded 1 if, on the symptom screen, the individual had any of the following: fever, night sweats, weight loss, a cough for more than two weeks, or blood in sputum; otherwise 0.    
      1 Individual had any of fever, night sweats, weight loss, a cough for more than two weeks, or blood in sputum on symptom screen
      0 None of the above on symptom screen
13 tbcase TB case definition    
      0 Not a TB case
      1 TB case
14 Wtotps Total weight accounting for differential cluster size, participation by age and sex, and stratification by area, plus a post-stratification weight for divergence from the general population curve by age.    
15 previoustb Had TB treatment in the past.    
16 stratum_id Stratum ID    
      1 Urban
      2 Remote
      3 Rural
17 zone Geographic zone    
      1 North
      2 Centre
      3 South
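
Several of the codebook fields above follow simple rules that can be checked in code. The sketch below splits a newID into its parts, maps an age to its agegroup code, and derives symptoms_any from the individual symptom flags. All function names are hypothetical, as is the example ID. It assumes that each newID component is numeric with the digit counts implied by A-BB-CCC-DDDD, and that ages of 65 and above fall into agegroup 6.

```python
import re

# ASSUMPTION: digit counts mirror the letter counts in A-BB-CCC-DDDD.
NEW_ID_PATTERN = re.compile(r"^(\d)-(\d{2})-(\d{3})-(\d{4})$")

# Answer labels copied from the codebook entry for agegroup.
AGEGROUP_LABELS = {1: "15-24", 2: "25-34", 3: "35-44", 4: "45-54", 5: "55-64", 6: ">65"}


def parse_new_id(new_id):
    """Split a newID into its components, kept as strings to preserve zeros."""
    match = NEW_ID_PATTERN.match(new_id)
    if match is None:
        raise ValueError(f"unexpected newID format: {new_id!r}")
    survey, cluster, household, individual = match.groups()
    return {
        "survey": survey,
        "cluster": cluster,
        "household": household,
        "individual": individual,
    }


def age_group(age_years):
    """Map age in years to the agegroup code (1-6)."""
    if age_years < 15:
        raise ValueError("ages below 15 have no agegroup code")
    # Ten-year bands starting at 15; ASSUMPTION: 65 and above maps to code 6.
    return min((int(age_years) - 15) // 10 + 1, 6)


def symptoms_any(fever, night_sweats, weight_loss, cough_over_2_weeks, blood_in_sputum):
    """Return 1 if any screened symptom was reported, otherwise 0."""
    flags = (fever, night_sweats, weight_loss, cough_over_2_weeks, blood_in_sputum)
    return int(any(flags))


# Hypothetical ID, used only to illustrate the format.
example_parts = parse_new_id("1-05-012-0003")
```

Keeping the ID components as strings avoids losing leading zeros, which matters if the parts are later re-joined to match records against the source surveys.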