https://doi.org/10.17037/DATA.00000961
Data from a cross-sectional survey on non-communicable diseases and their risk factors carried out between 2013 and 2017 in one rural and one urban site in Malawi by the Malawi Epidemiology Intervention Research Unit (MEIRU). Data also include some anthropometric measures and laboratory test results, and self-report data relating to socio-economic status.
Data were collected electronically using the Open Data Kit (ODK) software on tablets. Interviews were conducted face-to-face by trained local staff at each participant's household, in the local language. Most of the data come from this face-to-face questionnaire; the remainder come from lab test results and anthropometric measurements (taken in the field at the same time as the questionnaire).
Data from questionnaire and lab sources were combined for each of the two sites, and the two sites’ data were then appended together. New variables were created from the raw data to assist analysis and anonymisation (see the ‘privacy’ section below).
Karonga district, northern Malawi (the Karonga Health and Demographic Surveillance Site)
Lilongwe, Central Malawi (Area 25 Research Site)
Data quality checks were carried out prospectively during data collection. These included range (plausibility) checks, restricted data entry, and cross-checking between data tables to ensure that all records were accounted for.
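As an illustration only, the range and cross-table checks described above could be sketched in Python as follows. The field names, plausibility limits and ID values are hypothetical; they are not taken from the MEIRU data dictionary, and this is not the code used by the survey team.

```python
# Hypothetical plausibility limits; the survey's actual limits are not given in this record.
PLAUSIBLE_RANGES = {
    "age_years": (18, 110),
    "systolic_bp_mmhg": (60, 260),
}


def range_violations(record, ranges=PLAUSIBLE_RANGES):
    """Return the fields of one record that are missing or outside their range."""
    problems = []
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            problems.append(field)
    return problems


def unmatched_ids(questionnaire_ids, lab_ids):
    """Cross-check two tables: return IDs present in one table but not the other."""
    questionnaire, lab = set(questionnaire_ids), set(lab_ids)
    return sorted(questionnaire - lab), sorted(lab - questionnaire)


record = {"age_years": 34, "systolic_bp_mmhg": 400}
print(range_violations(record))                    # fields needing review
print(unmatched_ids(["a1", "a2"], ["a2", "a3"]))   # IDs missing from either table
```

A record with an empty list of violations passes the plausibility check; any ID returned by the cross-check would need to be traced before the tables are combined.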
Human
One record is included for each individual, covering the period of the MEIRU NCD survey in 2013. Participant identifying information, including names, locations and study codes, has been scrambled or removed. ID numbers were scrambled using Nesstar Publisher, and the Statistical Disclosure Control (SDC) methods described at https://sdcpractice.readthedocs.io/en/latest/ were applied using the sdcMicro software. Three SDC methods available in sdcMicro were applied to these data: recoding, local suppression and the Post Randomisation Method (PRAM).
In three variables, agegroup5 (5-year age groups), cmarital (marital status) and educat (education level), the level of detail was reduced by recoding values. Table 1 gives an overview of the recoded variables.
Table 1: Overview of recoded variables
Variable | Description | Approach used | Categories before | Categories after |
agegroup5 | 5-year age group | Recoded 5-year intervals to 10-year intervals | 20 | 7 |
cmarital | Marital status | Grouped together the divorced, the separated and the widowed | 5 | 4 |
educat | Education level | Grouped together literate no formal education, literate standard 1 - 5 and literate standard 6 - 8 into literate up to primary school level. Combined secondary and tertiary level into secondary level and higher | 7 | 4 |
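The recoding approach in the table above amounts to mapping several detailed categories onto one broader category. A minimal Python sketch of that idea is shown here for the education variable. The category labels are hypothetical stand-ins for the dataset's actual codes, and the recoding itself was carried out with sdcMicro, not with this code.

```python
# Hypothetical labels standing in for the original educat categories.
EDUCAT_RECODE = {
    "literate, no formal education": "literate, up to primary",
    "literate, standard 1-5": "literate, up to primary",
    "literate, standard 6-8": "literate, up to primary",
    "secondary": "secondary and higher",
    "tertiary": "secondary and higher",
}


def recode(values, mapping):
    """Replace each value found in the mapping; leave all other values unchanged."""
    return [mapping.get(value, value) for value in values]


def n_categories(values):
    """Number of distinct categories present in a list of values."""
    return len(set(values))


before = ["literate, standard 1-5", "tertiary", "secondary", "literate, standard 6-8"]
after = recode(before, EDUCAT_RECODE)
print(n_categories(before), "->", n_categories(after))
```

Fewer, broader categories mean that more respondents share each value, which lowers the chance that a combination of key variables is unique to one person.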
The data were further anonymised by suppressing certain values. Values of the key variables considered to pose an elevated risk of disclosure were deleted. Table 2 gives an overview of the number of suppressed values per variable.
Table 2: Overview of suppressions
Key Variable | Description | Suppressions (number) | Suppressions (%) |
sex | Sex | 0 | 0 |
agegroup10 | 10 year age group | 0 | 0 |
site | Study site | 0 | 0 |
cmarital | Marital status | 48 | 0.15 |
educat | Education level | 216 | 0.716 |
occupation | Occupation | 1208 | 3.95 |
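The counts and percentages in a table like the one above can be derived by counting the suppressed values of each key variable. The Python sketch below assumes that a suppressed value is represented as None. How suppressed values are actually encoded in the released file is not stated in this record, and the example records are invented.

```python
def suppression_summary(records, key_variables):
    """Count suppressed (None) values per key variable, with percentages."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarise")
    summary = {}
    for variable in key_variables:
        suppressed = sum(1 for record in records if record.get(variable) is None)
        summary[variable] = (suppressed, round(100.0 * suppressed / total, 2))
    return summary


# Invented example records; None marks a value removed by local suppression.
records = [
    {"sex": "F", "occupation": "farmer"},
    {"sex": "M", "occupation": None},
    {"sex": "F", "occupation": "trader"},
    {"sex": "M", "occupation": "farmer"},
]
print(suppression_summary(records, ["sex", "occupation"]))
```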
Twenty-eight categorical variables in the data were considered to be spontaneous recognition variables, meaning that they could potentially identify individuals without much searching. These variables are: 'drinker', 'smoker', 'agedibp_cat', 'agediab_cat', 'agechole_cat', 'ageathma_cat', 'agetb_cat', 'electric_mode', 'parlamp_mode', 'telv_mode', 'mopho_mode', 'landpho_mode', 'sofa_mode', 'refrig_mode', 'tabchar_mode', 'egcook_mode', 'watpipe_fmode', 'motcyc_mode', 'car_mode', 'bicyc_mode', 'radio_mode', 'cow_mode', 'oxcart_mode', 'hhwate_fmode', 'wheze', 'hhpossscore_mode', 'hhpossscorek_mode', 'hhpossscorel_mode'.
Local suppression was considered unsuitable for these variables because the resulting information loss would have been too large. Instead, a perturbative method, the Post Randomisation Method (PRAM; Templ, Meindl, & Kowarik, 2013), was used. This method alters some values of the selected variables according to reclassification rules, so that if an intruder attempts to re-identify individuals, an apparent match may in fact be to the wrong individual.
Table 3: Overview of perturbed variables and the number of values changed using PRAM
Variable | Description | Values changed (number) | Values changed (%) |
drinker | summary variable of number of units drunk (D) | 539 | 1.76% |
smoker | summary variable of number of cigarettes smoked (D) | 468 | 1.53% |
agedibp_cat | Age diagnosed high blood pressure | 139 | 0.45% |
agediab_cat | Age diagnosed diabetes | 36 | 0.12% |
agechole_cat | Age diagnosed cholesterol | 9 | 0.03% |
ageathma_cat | Age diagnosed asthma | 125 | 0.41% |
agetb_cat | Age diagnosed TB | 72 | 0.24% |
electric_mode | Whether household has electricity (household mode) (D) | 3604 | 11.79% |
parlamp_mode | Whether household owns a paraffin lamp (household mode) (D) | 1708 | 5.59% |
telv_mode | Whether household owns a tv (household mode) (D) | 3425 | 11.20% |
mopho_mode | Whether household owns a mobile phone (household mode) (D) | 3719 | 12.16% |
landpho_mode | Whether household owns a land phone (household mode) (D) | 981 | 3.21% |
sofa_mode | Whether household owns a sofa (household mode) (D) | 2658 | 8.69% |
refrig_mode | Whether household owns a fridge (household mode) (D) | 1800 | 5.89% |
tabchar_mode | Whether household owns a table and chairs (household mode) (D) | 1967 | 6.43% |
egcook_mode | Whether household owns an electric or gas cooker (household mode) (D) | 2185 | 7.15% |
watpipe_fmode | Whether household has water piped to house (female only mode) (D) | 1641 | 5.37% |
motcyc_mode | Whether household owns a motorcycle (household mode) (D) | 255 | 0.83% |
car_mode | Whether household owns a car (household mode) (D) | 1811 | 5.92% |
bicyc_mode | Whether household owns a bicycle (household mode) (D) | 2010 | 6.57% |
radio_mode | Whether household owns a radio (household mode) (D) | 3723 | 12.18% |
cow_mode | Whether household owns a cow (household mode) (D) | 1128 | 3.69% |
oxcart_mode | Whether household owns an oxcart (household mode) (D) | 782 | 2.56% |
hhwate_fmode | Household water source (female only mode) (D) | 2595 | 8.49% |
wheze | Wheezing in last year | 270 | 0.88% |
hhpossscore_mode | Karonga & Lilongwe: summary possession score using mode values (D) | 3131 | 10.24% |
hhpossscorek_mode | Karonga only: summary possession score using the mode values (D) | 902 | 2.95% |
hhpossscorel_mode | Lilongwe only: summary possession score using the mode values (D) | 1589 | 5.20% |
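The general idea behind PRAM can be sketched in a few lines of Python: each value is either kept or swapped for another category according to a matrix of transition probabilities, and the share of changed values is then reported as in the table above. The transition probabilities, category labels and seed below are hypothetical. The matrices actually used for this dataset were produced by sdcMicro and are not given in this record.

```python
import random


def pram(values, transition, rng):
    """Perturb each value by drawing a replacement from its row of the transition matrix.

    transition[a][b] is the probability that category a is released as category b.
    """
    perturbed = []
    for value in values:
        row = transition[value]
        categories = list(row)
        weights = [row[category] for category in categories]
        perturbed.append(rng.choices(categories, weights=weights, k=1)[0])
    return perturbed


def changed_share(original, perturbed):
    """Return the number and percentage of values that differ after perturbation."""
    changed = sum(1 for old, new in zip(original, perturbed) if old != new)
    return changed, 100.0 * changed / len(original)


# Hypothetical transition matrix for a yes/no ownership variable; each row sums to 1.
TRANSITION = {
    "yes": {"yes": 0.9, "no": 0.1},
    "no": {"yes": 0.1, "no": 0.9},
}

rng = random.Random(2013)  # fixed seed so the sketch is repeatable
original = ["yes", "no", "no", "yes", "no"] * 200
perturbed = pram(original, TRANSITION, rng)
print(changed_share(original, perturbed))
```

Because the replacement is random, an analyst cannot tell which individual values were altered, while large diagonal probabilities keep the overall distribution of each variable close to the original.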
Ethics approval was obtained from LSHTM and the Malawi National Health Sciences Research Committee (NHSRC).
The ethics protocol numbers are: LSHTM #6303 and NHSRC #1072
Cardiovascular diseases, diabetes, risk factors, Africa, Malawi
English
Understanding local determinants of cardiovascular disease and diabetes to inform novel interventional strategies
Current PI - Professor Mia Crampin
Former PIs: Professor Moffat Nyirenda and Professor Shabbar Jaffar.
Wellcome Trust
098610/Z/12/Z
Forename | Surname | Faculty / Dept | Institution | Role |
Mia | Crampin | Epidemiology & Population Health /Population Health | London School of Hygiene & Tropical Medicine | Data Creator |
Estelle | McLean | Epidemiology & Population Health /Population Health | London School of Hygiene & Tropical Medicine | Data Creator |
Alison | Price | Epidemiology & Population Health /Population Health | London School of Hygiene & Tropical Medicine | Data Creator |
Keith | Branson | Epidemiology & Population Health /Population Health | London School of Hygiene & Tropical Medicine | Senior Computer Manager |
Chifundo | Kanjala | Epidemiology & Population Health /Population Health | London School of Hygiene & Tropical Medicine | Data Documentation and Statistical Disclosure Control |
Forename | Surname | Faculty / Dept | Institution | Role |
Elizabeth | Munthali | | Malawi Epidemiology Intervention Research Unit (MEIRU) | Data documentation |
Dominic | Nzundah | | Malawi Epidemiology Intervention Research Unit (MEIRU) | Data documentation |
Filename | Description | Access status | Licence |
meiru_ncdopen.csv | The dataset is a subset of variables from the MEIRU NCD survey for investigating the burden of diabetes, overweight and obesity, hypertension, and multimorbidity, their treatment, and their associations with lifestyle and other factors in Malawi (Price et al., 2018). The topics covered in this subset are: Diabetes, hypertension, BP, asthma, obesity, associated risk factors in the form of demographics, education, occupation, household asset ownership status and associated household assets ownership scores | Available | Creative Commons Attribution (CC-BY) for open data |