Dataset for: Developing HIV risk prediction tools in four African settings – User Guide

Permanent identifier



This is a quantitative dataset collected as part of an ongoing HIV vaccine preparedness study, the PrEPVacc Registration Cohort. The study was set up at sites in Masaka, Uganda; Durban, South Africa; Maputo, Mozambique; and Mbeya and Dar es Salaam, Tanzania, to prepare a population of HIV-negative individuals at risk of acquiring HIV for possible participation in the PrEPVacc HIV vaccine efficacy and pre-exposure prophylaxis trial.

It contains information on participants' demographic and HIV risk behavioural characteristics, HIV prevalence at screening, and HIV incidence during follow-up.

Data collection methods

Study data was collected by trained, designated staff and entered directly onto the appropriate Case Report Forms (CRFs) or into source documents.

Data analysis and preparation

A study database was set up in OpenClinica, a web-based data management system, and hosted at the MRC/UVRI clinical research centre. At each study site, data was entered twice, independently, by two data entry personnel. Data was entered as soon as possible after a visit (e.g. within one week), and routine data cleaning queries were raised and resolved. All data analysis was conducted in Stata version 16 (StataCorp, College Station, TX, USA).
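As an illustration of the double data entry comparison described above, the following is a minimal sketch in Python. It is not the OpenClinica implementation: the function name, the record layout (dictionaries keyed by participant ID) and the field names are assumptions made for this example only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: one record per participant ID, each a field -> value mapping.
Record = Dict[str, str]
Discrepancy = Tuple[str, str, Optional[object], Optional[object]]


def find_discrepancies(
    first_pass: Dict[str, Record],
    second_pass: Dict[str, Record],
) -> List[Discrepancy]:
    """Compare two independent entries of the same forms.

    Returns (participant_id, field, first_value, second_value) for every
    mismatch. A record present in only one pass is reported under the
    placeholder field name "<record>".
    """
    discrepancies: List[Discrepancy] = []
    for pid in sorted(set(first_pass) | set(second_pass)):
        rec_a = first_pass.get(pid)
        rec_b = second_pass.get(pid)
        if rec_a is None or rec_b is None:
            discrepancies.append((pid, "<record>", rec_a, rec_b))
            continue
        for field in sorted(set(rec_a) | set(rec_b)):
            if rec_a.get(field) != rec_b.get(field):
                discrepancies.append((pid, field, rec_a.get(field), rec_b.get(field)))
    return discrepancies
```

Each reported mismatch would then be raised as a query and resolved against the paper CRF or source document.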

Geographic regions

Data was captured at the study sites in Masaka, Uganda; Durban, South Africa; Maputo, Mozambique; and Mbeya and Dar es Salaam, Tanzania.

Key dates

Quality controls

Standardised CRFs were used to collect and record all quantitative data. Project staff were trained in the principles of ICH-GCP, as well as on the study database, completion of CRFs and data quality control procedures. The OpenClinica database was designed to quality-control the data at entry, with logic checks such as validation of double data entry through discrepancy note alerts, queries for key missing data, and consistency checks within individual CRFs.

The database has an audit trail, which is used to monitor the timeliness of data entry and to track data corrections and changes in the database. The data was reviewed, cleaned and managed by clinical study monitors and data managers using Stata data quality checking do-files. Queries were routinely identified and forwarded to the study sites, and their resolution was monitored.


We recruited individuals aged 18–45 years who were HIV-negative, willing to provide locator information, available for follow-up, and considered to be at risk of HIV infection according to risk indicators that included: suspected or confirmed sexually transmitted infection (STI), unprotected sex with ≥2 partners, unprotected sex with a new partner in the past 3 months, or unprotected transactional sex (giving or receiving money or goods in exchange for sex) in the past month, among others.
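The recruitment criteria above can be expressed as a simple screening rule. The sketch below is illustrative only: the class, the field names and the functions are hypothetical and are not taken from the study CRFs or the dataset codebook, and it covers only the four risk indicators listed explicitly (the study used others as well).

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """Hypothetical screening record; field names are assumptions."""
    age_years: int
    hiv_negative: bool
    gives_locator_info: bool
    available_for_follow_up: bool
    sti_suspected_or_confirmed: bool = False
    unprotected_sex_two_or_more_partners: bool = False
    unprotected_sex_new_partner_3m: bool = False
    unprotected_transactional_sex_1m: bool = False


def has_risk_indicator(s: Screening) -> bool:
    """True if at least one of the listed risk indicators is present."""
    return any([
        s.sti_suspected_or_confirmed,
        s.unprotected_sex_two_or_more_partners,
        s.unprotected_sex_new_partner_3m,
        s.unprotected_transactional_sex_1m,
    ])


def is_eligible(s: Screening) -> bool:
    """Apply the age, HIV status, availability and risk criteria together."""
    return (
        18 <= s.age_years <= 45
        and s.hiv_negative
        and s.gives_locator_info
        and s.available_for_follow_up
        and has_risk_indicator(s)
    )
```

For example, `is_eligible(Screening(25, True, True, True, sti_suspected_or_confirmed=True))` evaluates to `True`, while the same record with no risk indicator set does not qualify.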


At the sites, CRFs and source documents were kept in secure central locations. All computers were password-protected. All data was collected in a pseudo-anonymised format, with every participant identified by a participant ID (PTID). No names were collected in the study database. For this dataset, participants have also been assigned new PTIDs.
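The guide does not state how the new PTIDs were generated. One possible approach, sketched below under that caveat, is to shuffle the original IDs and number them sequentially so that the new ID carries no information about the original ID or enrolment order. The function name and the "PV" prefix are hypothetical.

```python
import secrets
from typing import Dict, Iterable


def assign_new_ptids(
    original_ids: Iterable[str],
    prefix: str = "PV",  # hypothetical prefix, not the one used in the dataset
    width: int = 5,
) -> Dict[str, str]:
    """Map each distinct original ID to a new sequential ID in random order."""
    # Drop duplicates while keeping one entry per participant.
    unique_ids = list(dict.fromkeys(original_ids))
    # Shuffle with an OS-backed random source so the order is not reproducible.
    secrets.SystemRandom().shuffle(unique_ids)
    return {
        old: f"{prefix}{i:0{width}d}"
        for i, old in enumerate(unique_ids, start=1)
    }
```

The resulting mapping would need to be stored securely and separately from the shared dataset, since anyone holding it could link the new PTIDs back to the original ones.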


Organisation | Ethics ID | Other information
Uganda Virus Research Institute Ethics Committee | Protocol code reference number: GC/127/18/03/637 | Approval version date: 13 March 2018
London School of Hygiene and Tropical Medicine | LSHTM Ethics Ref: 26494 ‑ 1 | Approval version date: 9 September 2021
Uganda National Council for Science and Technology | Ref no.: 2392 | Approval version date: 25 June 2018
National Institute for Medical Research (Tanzania) | Ref no.: GB/152/377/01/174 | Approval version date: 21 March 2018
South African Medical Research Council | Ref no.: 001/20 | Approval version date: 30 September 2021
Comité Nacional de Bioética para a Saúde (CNBS) | Ref no.: 567/CNBS/18 | Approval version date: 20 December 2018


HIV/AIDS; HIV incidence; HIV prevalence; HIV risk score.

Language of written material


Project information

Project name | Funder/sponsor | Grant number
The PrEPVacc registration cohort study | The second European and Developing Countries Clinical Trials Partnership (EDCTP2) | RIA-2016V-1644


Forename | Surname | Faculty / Dept | Institution | Role
Sheila | Kansiime | Statistics | MRC/UVRI & LSHTM Uganda Research Unit | Researcher / Data analysis
Christian Holm | Hansen | Medical Research Council International Statistics and Epidemiology Group | LSHTM, UK | Researcher / Data analysis
Richard | Hayes | Medical Research Council International Statistics and Epidemiology Group | LSHTM, UK | Researcher / Data analysis
Eugene | Ruzagira | HIV Epidemiology and Intervention Programme | MRC/UVRI & LSHTM Uganda Research Unit | Researcher / Project Leader
PrEPVacc study team | N/a | N/a | N/a | Research Group

Associated roles

Forename | Surname | Faculty / Dept | Institution | Role
Gertrude | Mutonyi | Data | MRC/UVRI and LSHTM | Data Manager
Ayoub | Kakande | Data | MRC/UVRI and LSHTM | Data Manager

File description

Filename | Description | Access status | Licence
PrEPVacc_HIV_prognostic_tools_data | Dataset containing demographic and HIV risk behavioural characteristics, HIV prevalence at screening, and HIV incidence among participants in the PrEPVacc HIV vaccine preparedness cohort study | Request access for all | Data Sharing Agreement (this dataset may be requested for the purpose of research verification and use in academic research, subject to evidence of ethics approval being provided and a data sharing agreement being signed; all requests are handled by the MRC/UVRI and LSHTM Uganda Research Unit)
PrEPVacc_HIV_prognostic_tools_data_codebook | Data dictionary for the HIV prognostic tools dataset | Open | Creative Commons Attribution (CC BY)