Malawi Epidemiology and Intervention Research Unit Non-Communicable Disease Survey data, 2013-2017 - User Guide

Persistent Identifier

Data Description

Data from a cross-sectional survey on non-communicable diseases and their risk factors, carried out between 2013 and 2017 in one rural and one urban site in Malawi by the Malawi Epidemiology and Intervention Research Unit (MEIRU). The data also include some anthropometric measures, laboratory test results, and self-reported data on socio-economic status.

Data Collection Methods

Data were collected electronically on tablets using Open Data Kit (ODK) software. Trained local staff conducted face-to-face interviews in the local language at each participant's household. Most of the data come from this questionnaire; the remainder come from laboratory test results and from anthropometric measurements taken in the field at the same time as the questionnaire.

Data Analysis and Preparation

Questionnaire and laboratory data were combined within each of the two sites, and the two sites' data were then appended together. New variables were derived from the raw data to support analysis and anonymisation (see the ‘privacy’ section below).

Geographic regions

Karonga district, northern Malawi (the Karonga Health and Demographic Surveillance Site)

Lilongwe, Central Malawi (Area 25 Research Site)

Key dates

Quality Controls

Data quality checks were carried out prospectively during data collection. These included range (plausibility) checks, restricted data entry, and cross-checking between data tables to ensure that all records were accounted for.
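As an illustration only, checks of this kind could be sketched in Python as follows. The field names ('id', 'sbp') and the plausible range are hypothetical and are not taken from the MEIRU data collection forms; the original checks were built into the data collection process rather than run with this code.

```python
def out_of_range(records, field, low, high):
    """Return the records whose value for `field` lies outside [low, high]."""
    return [r for r in records if not (low <= r[field] <= high)]


def unmatched_ids(table_a, table_b, id_field="id"):
    """Return IDs that appear in only one of the two tables."""
    ids_a = {r[id_field] for r in table_a}
    ids_b = {r[id_field] for r in table_b}
    return ids_a ^ ids_b


# Hypothetical example: questionnaire and lab tables keyed on 'id'.
questionnaire = [{"id": 1, "sbp": 120}, {"id": 2, "sbp": 400}, {"id": 3, "sbp": 135}]
lab = [{"id": 1}, {"id": 2}]

flagged = out_of_range(questionnaire, "sbp", 60, 300)  # assumed plausible range
missing = unmatched_ids(questionnaire, lab)            # records not accounted for
```

Here the second questionnaire record would be flagged for review, and the third would be reported as having no matching lab record.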




Privacy

One record is included for each individual, covering the period of the MEIRU NCD survey in 2013. Participant identifying information, including names, locations and study codes, has been scrambled or removed. ID numbers were scrambled using Nesstar Publisher, and Statistical Disclosure Control (SDC) methods were applied using sdcMicro software. Three SDC methods available in sdcMicro were applied to these data: recoding, local suppression and the Post Randomisation Method (PRAM).

Reducing detail in variables by recoding

In three variables, agegroup5 (5-year age groups), cmarital (marital status) and educat (education level), the level of detail was reduced by recoding values. Table 1 gives an overview of the recoded variables.

Table 1: Overview of recoded variables

Variable | Description | Approach used | Categories before | Categories after
agegroup5 | 5-year age group | Recoded 5-year intervals to 10-year intervals | 20 | 7
cmarital | Marital status | Grouped together the divorced, the separated and the widowed | 5 | 4
educat | Education level | Grouped together literate (no formal education), literate (standard 1 - 5) and literate (standard 6 - 8) into literate up to primary school level. Combined secondary and tertiary level into secondary level and higher | 7 | 4
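The recoding idea can be illustrated with a short Python sketch. The category labels used here (for example '25-29' or 'divorced') are assumptions made for illustration; the value labels in the dataset may differ, and the original recoding was carried out with sdcMicro rather than with this code.

```python
def recode_age_group(label_5yr):
    """Map a 5-year age band such as '25-29' to its 10-year band '20-29'."""
    low = int(label_5yr.split("-")[0])
    decade_start = (low // 10) * 10
    return f"{decade_start}-{decade_start + 9}"


# Hypothetical labels: only the categories being merged are listed,
# any other category passes through unchanged.
MARITAL_GROUPS = {
    "divorced": "divorced/separated/widowed",
    "separated": "divorced/separated/widowed",
    "widowed": "divorced/separated/widowed",
}


def recode_marital(status):
    """Merge divorced, separated and widowed into one category."""
    return MARITAL_GROUPS.get(status, status)
```

For example, recode_age_group('35-39') gives '30-39', and recode_marital('widowed') gives the merged category while 'married' is returned unchanged.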

Local suppression

We further anonymised the data by suppressing certain values. Values of key variables considered to pose an elevated risk of disclosure were deleted. Table 2 gives an overview of the number of suppressed values per variable.

Table 2: Overview of suppressions

Key variable | Description | Suppressions (number) | Suppressions (%)
sex | Sex | 0 | 0
agegroup10 | 10-year age group | 0 | 0
site | Study site | 0 | 0
cmarital | Marital status | 48 | 0.15
educat | Education level | 216 | 0.716
occupation | Occupation | 1208 | 3.95
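A much simplified sketch of the idea behind local suppression is given below. It blanks one variable wherever a combination of key variables occurs fewer than k times, and then summarises the suppressions in the same form as Table 2. The threshold k, the choice of which variable to blank and the example records are all assumptions; sdcMicro's own algorithm for choosing which values to suppress is more sophisticated than this.

```python
from collections import Counter


def suppress_rare_combinations(records, key_vars, suppress_var, k=2):
    """Return copies of the records with `suppress_var` set to None wherever
    the combination of key-variable values occurs fewer than k times."""
    counts = Counter(tuple(r[v] for v in key_vars) for r in records)
    result = []
    for r in records:
        combo = tuple(r[v] for v in key_vars)
        r = dict(r)  # copy so the input is left untouched
        if counts[combo] < k:
            r[suppress_var] = None
        result.append(r)
    return result


def suppression_summary(records, var):
    """Return (number, percentage) of suppressed values for one variable."""
    n = sum(1 for r in records if r[var] is None)
    return n, round(100 * n / len(records), 2)


# Hypothetical records, not taken from the dataset.
records = [
    {"sex": "F", "site": "Karonga", "occupation": "farmer"},
    {"sex": "F", "site": "Karonga", "occupation": "farmer"},
    {"sex": "M", "site": "Lilongwe", "occupation": "pilot"},
    {"sex": "M", "site": "Lilongwe", "occupation": "trader"},
]
safe = suppress_rare_combinations(records, ["sex", "site", "occupation"], "occupation")
summary = suppression_summary(safe, "occupation")
```

In this toy example the two unique combinations lose their occupation value, while sex and site are left intact.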

Post Randomisation Method

Twenty-eight categorical variables in the data were considered to be spontaneous recognition variables, meaning that they could potentially identify individuals without much searching. These variables are: 'drinker', 'smoker', 'agedibp_cat', 'agediab_cat', 'agechole_cat', 'ageathma_cat', 'agetb_cat', 'electric_mode', 'parlamp_mode', 'telv_mode', 'mopho_mode', 'landpho_mode', 'sofa_mode', 'refrig_mode', 'tabchar_mode', 'egcook_mode', 'watpipe_fmode', 'motcyc_mode', 'car_mode', 'bicyc_mode', 'radio_mode', 'cow_mode', 'oxcart_mode', 'hhwate_fmode', 'wheze', 'hhpossscore_mode', 'hhpossscorek_mode', 'hhpossscorel_mode'.

Local suppression was considered unsuitable for these variables because the information loss would have been too great. Consequently, a perturbative method called the Post Randomisation Method (PRAM; Templ, Meindl, & Kowarik, 2013) was used. This method alters some values of the selected variables according to reclassification rules, so that if an intruder attempts to re-identify individuals, any match they make may point to the wrong individual.

Table 3: Overview of perturbed variables and the number of values changed using PRAM

Variable | Description | Values changed (number) | Values changed (%)
drinker | summary variable of number of units drank (D) | 539 | 1.76%
smoker | summary variable of number of cigarettes smoked (D) | 468 | 1.53%
agedibp_cat | Age diagnosed high blood pressure | 139 | 0.45%
agediab_cat | Age diagnosed diabetes | 36 | 0.12%
agechole_cat | Age diagnosed cholesterol | 9 | 0.03%
ageathma_cat | Age diagnosed asthma | 125 | 0.41%
agetb_cat | Age diagnosed TB | 72 | 0.24%
electric_mode | Whether household has electricity (household mode) (D) | 3604 | 11.79%
parlamp_mode | Whether household owns a paraffin lamp (household mode) (D) | 1708 | 5.59%
telv_mode | Whether household owns a tv (household mode) (D) | 3425 | 11.20%
mopho_mode | Whether household owns a mobile phone (household mode) (D) | 3719 | 12.16%
landpho_mode | Whether household owns a land phone (household mode) (D) | 981 | 3.21%
sofa_mode | Whether household owns a sofa (household mode) (D) | 2658 | 8.69%
refrig_mode | Whether household owns a fridge (household mode) (D) | 1800 | 5.89%
tabchar_mode | Whether household owns a table chairs (household mode) (D) | 1967 | 6.43%
egcook_mode | Whether household owns an electric or gas cooker (household mode) (D) | 2185 | 7.15%
watpipe_fmode | Whether household has water piped to house (female only mode) (D) | 1641 | 5.37%
motcyc_mode | Whether household owns a motorcycle (household mode) (D) | 255 | 0.83%
car_mode | Whether household owns a car (household mode) (D) | 1811 | 5.92%
bicyc_mode | Whether household owns a bicycle (household mode) (D) | 2010 | 6.57%
radio_mode | Whether household owns a radio (household mode) (D) | 3723 | 12.18%
cow_mode | Whether household owns a cow (household mode) (D) | 1128 | 3.69%
oxcart_mode | Whether household owns an oxcart (household mode) (D) | 782 | 2.56%
hhwate_fmode | Household water source (female only mode) (D) | 2595 | 8.49%
wheze | Wheezing in last year | 270 | 0.88%
hhpossscore_mode | Karonga & Lilongwe: summary possession score using mode values (D) | 3131 | 10.24%
hhpossscorek_mode | Karonga only: summary possession score using the mode values (D) | 902 | 2.95%
hhpossscorel_mode | Lilongwe only: summary possession score using the mode values (D) | 1589 | 5.20%
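To show how PRAM works in principle, the following Python sketch redraws each value from a row of a transition matrix, so that most values stay the same and a small share are swapped to another category. The transition probabilities, the 'yes'/'no' categories and the random seed are assumptions chosen for illustration; they are not the parameters used for this dataset, which was processed with sdcMicro.

```python
import random


def pram(values, transition, rng):
    """Apply PRAM: replace each value with a category drawn at random
    according to that value's row in the transition matrix."""
    perturbed = []
    for v in values:
        categories = list(transition[v].keys())
        weights = list(transition[v].values())
        perturbed.append(rng.choices(categories, weights=weights, k=1)[0])
    return perturbed


# Hypothetical transition matrix: each value is kept with probability 0.9.
transition = {
    "yes": {"yes": 0.9, "no": 0.1},
    "no": {"yes": 0.1, "no": 0.9},
}

rng = random.Random(42)  # fixed seed so the sketch is reproducible
original = ["yes"] * 500 + ["no"] * 500
perturbed = pram(original, transition, rng)

# Number and percentage of changed values, as reported in Table 3.
changed = sum(o != p for o, p in zip(original, perturbed))
percent_changed = round(100 * changed / len(original), 2)
```

Because the changes are random, an analyst cannot tell which individual values were altered, but with a known transition matrix the overall distributions remain approximately recoverable.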


Ethics approval was obtained from LSHTM and the Malawi National Health Sciences Research Committee (NHSRC).

The ethics protocol numbers are: LSHTM #6303 and NHSRC #1072


Cardiovascular diseases, diabetes, risk factors, Africa, Malawi

Language of written material


Project title

Understanding local determinants of cardiovascular disease and diabetes to inform novel interventional strategies

Principal Investigator

Current PI - Professor Mia Crampin

Former PIs - Professor Moffat Nyirenda and Professor Shabbar Jaffar


Wellcome Trust

Grant Number


Data Creators

Forename | Surname | Faculty / Dept | Institution | Role
Mia | Crampin | Epidemiology & Population Health / Population Health | London School of Hygiene & Tropical Medicine | Data Creator
Estelle | McLean | Epidemiology & Population Health / Population Health | London School of Hygiene & Tropical Medicine | Data Creator
Alison | Price | Epidemiology & Population Health / Population Health | London School of Hygiene & Tropical Medicine | Data Creator



Epidemiology & Population Health /Population Health

London School of Hygiene & Tropical Medicine

Senior Computer Manager



Epidemiology & Population Health /Population Health

London School of Hygiene & Tropical Medicine Data Documentation and Statistical Disclosure Control

Associated Roles

Forename | Surname | Faculty / Dept | Institution | Role
Elizabeth | Munthali |  | Malawi Epidemiology and Intervention Research Unit (MEIRU) | Data documentation
Dominic | Nzundah |  | Malawi Epidemiology and Intervention Research Unit (MEIRU) | Data documentation

File Description

Filename | Description | Access status | Licence
meiru_ncdopen.csv | The dataset is a subset of variables from the MEIRU NCD survey, for investigating the burden of diabetes, overweight and obesity, hypertension, and multimorbidity, their treatment, and their associations with lifestyle and other factors in Malawi (Price et al., 2018). The topics covered in this subset are diabetes, hypertension, blood pressure (BP), asthma, obesity, and associated risk factors in the form of demographics, education, occupation, household asset ownership status and household asset ownership scores. | Available | Creative Commons Attribution (CC-BY) for open data
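As a starting point for working with the file, the sketch below tabulates one variable using only the Python standard library. It assumes that the file has a header row of variable names and is UTF-8 encoded; neither is stated in this guide, so both should be checked against the downloaded file.

```python
import csv
from collections import Counter


def tabulate(path, column):
    """Count how often each value of `column` occurs in a CSV file
    that has a header row (assumed)."""
    with open(path, newline="", encoding="utf-8") as f:  # encoding is an assumption
        reader = csv.DictReader(f)
        return Counter(row[column] for row in reader)
```

For example, tabulate('meiru_ncdopen.csv', 'site') would return the number of records per study site, using the 'site' variable listed in Table 2.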