Naive Bayes classifiers for verbal autopsies: comparison to physician-based classification for 21,000 child and adult deaths

Miasnikof, P; Giannakeas, V; Gomes, M; Aleksandrowicz, L; Shestopaloff, AY; Alam, D; Tollman, S; Samarikhalaj, A and Jha, P (2015). Naive Bayes classifiers for verbal autopsies: comparison to physician-based classification for 21,000 child and adult deaths. [Dataset]. Figshare. https://doi.org/10.6084/m9.figshare.c.3629489_D1.v1

BACKGROUND: Verbal autopsies (VAs) are increasingly used in low- and middle-income countries, where most deaths occur at home without medical attention and home deaths differ substantially from hospital deaths. Hence, there is no plausible "standard" against which VAs for home deaths may be validated. Previous studies have reported contradictory results on the performance of automated methods compared to physician-based classification of causes of death (CODs). We sought to compare the performance of the classic naive Bayes classifier (NBC) with that of existing automated classifiers, using physician-based classification as the reference.

METHODS: We compared the performance of NBC, an open-source Tariff Method (OTM), and InterVA-4 on three datasets covering about 21,000 child and adult deaths: the ongoing Million Death Study in India, and health and demographic surveillance sites in Agincourt, South Africa, and Matlab, Bangladesh. We applied several training and testing splits of the data to quantify sensitivity and specificity relative to physician coding for individual CODs, and to test the cause-specific mortality fractions at the population level.

RESULTS: Across all CODs and in all three datasets, NBC achieved sensitivity (median 0.51, range 0.48-0.58) comparable to that of OTM (median 0.50, range 0.41-0.51), while InterVA-4 had lower sensitivity (median 0.43, range 0.36-0.47). Consistency of CODs was comparable for NBC and InterVA-4 but lower for OTM. NBC and OTM performed better when trained on a local rather than a non-local dataset. At the population level, NBC scored the highest cause-specific mortality fraction accuracy across the datasets (median 0.88, range 0.87-0.93), followed by InterVA-4 (median 0.66, range 0.62-0.73) and OTM (median 0.57, range 0.42-0.58).

CONCLUSIONS: NBC outperforms current similar COD classifiers at the population level. Nevertheless, no current automated classifier adequately replicates physician classification for individual CODs. Further research on automated classifiers, using local training and test data in diverse settings, is needed before any replacement of physician-based classification of verbal autopsies can be recommended.
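
The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R code (that is in File2-Rcode.zip) and makes several assumptions that the abstract does not state: symptoms are encoded as 0/1 indicators, the classifier is a Bernoulli naive Bayes with Laplace smoothing, and cause-specific mortality fraction (CSMF) accuracy follows the commonly used definition, 1 - sum|true fraction - predicted fraction| / (2 * (1 - smallest true fraction)). All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nbc(records, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on 0/1 symptom indicators.

    records: list of equal-length 0/1 lists, one per death (assumed encoding).
    labels:  reference (physician-assigned) cause for each record.
    alpha:   Laplace smoothing constant (an assumption, not from the study).
    """
    n_features = len(records[0])
    cause_counts = Counter(labels)
    symptom_counts = defaultdict(lambda: [0] * n_features)
    for row, cause in zip(records, labels):
        for j, present in enumerate(row):
            symptom_counts[cause][j] += present
    model = {}
    for cause, n in cause_counts.items():
        log_prior = math.log(n / len(labels))
        # Smoothed P(symptom j present | cause)
        probs = [(symptom_counts[cause][j] + alpha) / (n + 2 * alpha)
                 for j in range(n_features)]
        model[cause] = (log_prior, probs)
    return model


def predict_nbc(model, row):
    """Return the cause with the highest posterior log-score for one record."""
    best_cause, best_score = None, -math.inf
    for cause, (log_prior, probs) in model.items():
        score = log_prior
        for present, p in zip(row, probs):
            score += math.log(p if present else 1.0 - p)
        if score > best_score:
            best_cause, best_score = cause, score
    return best_cause


def sensitivity_specificity(truth, predicted, cause):
    """Individual-level agreement with the reference for a single cause.

    Assumes the cause has at least one true case and one true non-case.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(t == cause and p == cause for t, p in pairs)
    fn = sum(t == cause and p != cause for t, p in pairs)
    tn = sum(t != cause and p != cause for t, p in pairs)
    fp = sum(t != cause and p == cause for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def csmf_accuracy(truth, predicted):
    """Population-level CSMF accuracy (assumed standard definition).

    The minimum is taken over causes observed in the reference data and
    at least two such causes are required.
    """
    n = len(truth)
    true_frac = {c: k / n for c, k in Counter(truth).items()}
    pred_frac = {c: k / n for c, k in Counter(predicted).items()}
    causes = set(true_frac) | set(pred_frac)
    abs_error = sum(abs(true_frac.get(c, 0.0) - pred_frac.get(c, 0.0))
                    for c in causes)
    return 1.0 - abs_error / (2.0 * (1.0 - min(true_frac.values())))
```

In a train/test split, train_nbc would be fitted on the training deaths, predict_nbc applied to each test death, and the two metric functions computed against the physician-assigned causes in the test set. Sensitivity and specificity measure individual-level agreement, whereas CSMF accuracy only compares the overall cause distributions, which is why a classifier can score well on one and poorly on the other.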

Keywords

Cause of death; Computer-coded verbal autopsy; InterVA; Tariff; Naive Bayes classifier; Physician-certified verbal autopsy; Verbal autopsy

Item Type	Dataset
Resource Type	Dataset (Quantitative)
Capture method	Compilation/Synthesis
Date	25 November 2015
Language(s) of written materials	English
Creator(s)	Miasnikof, P; Giannakeas, V; Gomes, M; Aleksandrowicz, L; Shestopaloff, AY; Alam, D; Tollman, S; Samarikhalaj, A and Jha, P
LSHTM Faculty/Department	Faculty of Epidemiology and Population Health > Dept of Population Health (2012- )
Participating Institutions	St. Michael’s Hospital; University of Toronto; University of the Witwatersrand; Ryerson University; London School of Hygiene & Tropical Medicine
Date Deposited	10 Mar 2016 16:11
Last Modified	09 Feb 2022 18:32
Publisher	Figshare


Data / Code

File1-WHO_DeathCauseCategories.doc

subject: Data
license: Available under Creative Commons: Attribution 3.0
description: Mapping of WHO cause of death categories
format: application/msword
size: 51kB

File4-MDS_Cause_of_death.doc

subject: Data
license: Available under Creative Commons: Attribution 3.0
description: Sensitivity and specificity of assignment by cause of death in Indian Million Death Study
format: application/msword
size: 70kB

File6-Matlab_Cause_of_death.doc

subject: Data
license: Available under Creative Commons: Attribution 3.0
description: Sensitivity and specificity of assignment by cause of death, on Matlab data
format: application/msword
size: 70kB

Documentation

File3-InterVA-4_TariffMethods.doc

subject: Documentation
license: Available under Creative Commons: Attribution 3.0
description: Description of InterVA-4 and Open-source Tariff Method
format: application/msword
size: 26kB

File5-Agincourt_Cause_of_death.doc

subject: Documentation
license: Available under Creative Commons: Attribution 3.0
description: Sensitivity and specificity of assignment by cause of death, on Agincourt data
format: application/msword
size: 72kB

Study Instrument

File2-Rcode.zip

subject: Study Instrument
license: Available under Creative Commons: Attribution 3.0
description: R code used to produce the results in this study
format: application/x-zip
size: 103kB
