Phelan, J, Coll, F, Mcnerney, R, Ascher, DB, Pires, DEV, Furnham, N, Coeck, N, Hill-Cawthorne, GA, Nair, MB, Mallard, K, Ramsay, A, Campino, S, Hibberd, M, Pain, A, Rigouts, L and Clark, T. 2015. Whole genome sequences for M.tuberculosis isolates from the TDR strain bank. [Online]. European Nucleotide Archive. Available from: http://www.ebi.ac.uk/ena/data/view/PRJEB11653
Phelan, J, Coll, F, Mcnerney, R, Ascher, DB, Pires, DEV, Furnham, N, Coeck, N, Hill-Cawthorne, GA, Nair, MB, Mallard, K, Ramsay, A, Campino, S, Hibberd, M, Pain, A, Rigouts, L and Clark, T. Whole genome sequences for M.tuberculosis isolates from the TDR strain bank [Internet]. European Nucleotide Archive; 2015. Available from: http://www.ebi.ac.uk/ena/data/view/PRJEB11653
Phelan, J, Coll, F, Mcnerney, R, Ascher, DB, Pires, DEV, Furnham, N, Coeck, N, Hill-Cawthorne, GA, Nair, MB, Mallard, K, Ramsay, A, Campino, S, Hibberd, M, Pain, A, Rigouts, L and Clark, T (2015). Whole genome sequences for M.tuberculosis isolates from the TDR strain bank. [Data Collection]. European Nucleotide Archive. http://www.ebi.ac.uk/ena/data/view/PRJEB11653
Description
Combating the spread of drug-resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure and interaction modelling are used to understand the functional effects of putative mutations and provide insight into the molecular mechanisms leading to resistance. To investigate the potential utility of these approaches, we analysed the genomes of 144 Mycobacterium tuberculosis clinical isolates from The Special Programme for Research and Training in Tropical Diseases (TDR) collection sourced from 20 countries in four continents. A genome-wide approach was applied to 127 isolates to identify polymorphisms associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. In addition, the effect of identified candidate mutations on protein stability and interactions was assessed quantitatively with well-established computational methods. The analysis revealed that mutations in the genes rpoB (rifampicin), katG (isoniazid), inhA-promoter (isoniazid), rpsL (streptomycin) and embB (ethambutol) were responsible for the majority of resistance observed. A subset of the mutations identified in rpoB and katG were predicted to affect protein stability. Further, a strong direct correlation was observed between the minimum inhibitory concentration values and the distance of the mutated residues in the three-dimensional structures of rpoB and katG to their respective drug-binding sites.
Description of data capture | All DNA samples underwent Illumina sequencing on the HiSeq 2000 platform at the KAUST genomic facility, generating paired-end reads of 150 bp (Additional file 1: Table S1, pathogenseq.lshtm.ac.uk/tdr, Additional file 1: Table S2). All raw sequence data can be downloaded from the ENA short read archive (accession number PRJEB11653). For the raw sequence data, Trimmomatic (v0.33) software [42] (parameters: LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36) was used to remove or truncate reads of low quality. High quality reads were then mapped to the H37Rv reference genome (GenBank accession: AL123456.3) using the BWA-MEM (v0.7.12) algorithm [43] (parameters: -c 100 -M -T 50). From the resulting alignments, SAMtools (v1.3) [44] and GATK (v3.5) [45] software (default parameter settings) were used to call SNPs and small indels, and the intersection of the variants called by the two methods was retained. Mappability values were calculated along the reference genome using GEM-Mappability software with a k-mer length of 50 bp and a 0.04 % substitution threshold [46]. Non-unique SNP sites (mappability values greater than one) were removed. Sample genotypes were called using the majority allele (minimum frequency 75 %) in positions supported by at least 20-fold total genome coverage; otherwise they were classified as missing. Isolates or SNPs with more than 10 % missing genotype calls were excluded. The final dataset included 144 isolates and 17,952 genome-wide SNPs. |
---|---|
Data capture method | Experiment |
Date (Date published in a 3rd party system) | 4 November 2015 |
Language(s) of written materials | English |
Data Creators | Phelan, J, Coll, F, Mcnerney, R, Ascher, DB, Pires, DEV, Furnham, N, Coeck, N, Hill-Cawthorne, GA, Nair, MB, Mallard, K, Ramsay, A, Campino, S, Hibberd, M, Pain, A, Rigouts, L and Clark, T |
LSHTM Faculty/Department | Faculty of Epidemiology and Population Health > Dept of Infectious Disease Epidemiology; Faculty of Infectious and Tropical Diseases > Dept of Pathogen Molecular Biology |
Participating Institutions | Study consortium |
Funders | |
Date Deposited | 11 Apr 2016 09:57 |
Last Modified | 09 Jul 2021 11:22 |
Publisher | European Nucleotide Archive |
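
The genotype-calling and filtering rules given under "Description of data capture" can be illustrated with a minimal Python sketch. It is not the authors' code. The thresholds (20-fold coverage, 75 % majority-allele frequency, 10 % missingness) come from the description. The function names, the input layout (per-site allele counts, and a nested dictionary of isolate to site to call) and the choice to compute isolate and SNP missingness in a single pass over the unfiltered matrix are assumptions made for illustration.

```python
# Illustrative sketch only; names and data layout are hypothetical.
MIN_DEPTH = 20      # minimum total coverage at a position (from the description)
MIN_FREQ = 0.75     # minimum majority-allele frequency (from the description)
MAX_MISSING = 0.10  # calls missing in more than 10 % of cases lead to exclusion


def intersect_calls(samtools_sites, gatk_sites):
    """Keep only the variant sites reported by both callers."""
    return set(samtools_sites) & set(gatk_sites)


def call_genotype(allele_counts):
    """Return the majority allele, or None (missing) if thresholds are not met.

    allele_counts maps an allele (e.g. "A") to its read count at one site.
    """
    depth = sum(allele_counts.values())
    if depth < MIN_DEPTH:
        return None
    allele, count = max(allele_counts.items(), key=lambda item: item[1])
    if count / depth < MIN_FREQ:
        return None
    return allele


def _fraction_missing(calls):
    """Fraction of None entries in an iterable of calls (0.0 if empty)."""
    calls = list(calls)
    if not calls:
        return 0.0
    return sum(call is None for call in calls) / len(calls)


def filter_missing(matrix, max_missing=MAX_MISSING):
    """Drop isolates and sites whose missing-call fraction exceeds max_missing.

    matrix maps isolate -> {site -> allele or None}. Assumption: both
    fractions are computed on the unfiltered matrix; the description does
    not state the order in which the two filters were applied.
    """
    isolates = list(matrix)
    sites = sorted({site for calls in matrix.values() for site in calls})
    kept_isolates = [
        iso for iso in isolates
        if _fraction_missing(matrix[iso].get(s) for s in sites) <= max_missing
    ]
    kept_sites = [
        s for s in sites
        if _fraction_missing(matrix[iso].get(s) for iso in isolates) <= max_missing
    ]
    return {iso: {s: matrix[iso].get(s) for s in kept_sites} for iso in kept_isolates}
```

For example, `call_genotype({"A": 18, "G": 2})` yields a call because the site has 20 reads and the majority allele is present in 90 % of them, whereas a site with 19 reads, or one whose majority allele is below 75 %, is returned as missing.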
Downloads
Data / Code
Filename: AdditionalFile1-TableS1.docx
Description: The isolates according to geographic location and phenotypic drug resistance
Content type: Dataset
File size: 57kB
Mime-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document