Diagnostic accuracy of physician review, expert algorithms and data-derived algorithms in adult verbal autopsies

International Epidemiological Association 1999. Printed in Great Britain.
International Journal of Epidemiology 1999;28:1081–1087

Diagnostic accuracy of physician review, expert algorithms and data-derived algorithms in adult verbal autopsies

Maria A Quigley, Daniel Chandramohan and Laura C Rodrigues

Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK. Reprint requests to: Maria Quigley, MRC Tropical Epidemiology Group, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK.

Background
The verbal autopsy (VA) is used to collect information on cause-specific mortality from bereaved relatives. A cause of death may be assigned by physician review of the questionnaires, or by an algorithm. We compared the diagnostic accuracy of physician review, an expert algorithm, and data-derived algorithms.

Methods
Data were drawn from a multicentre validation study of 796 adult deaths that occurred in hospitals in Tanzania, Ethiopia, and Ghana. A gold standard cause of death was assigned using hospital records and death certificates. The VA interviews were carried out by trained fieldworkers 1–21 months after the subject's death. A cause of death was assigned by physician review and an expert algorithm. Data-derived algorithms that most accurately estimated the cause-specific mortality fraction (CSMF) for each cause of death were identified using logistic regression.

Results
The most common causes of death were tuberculosis/AIDS (CSMF = 18.6%), malaria (CSMF = 10.7%), meningitis (CSMF = 8.3%), and cardiovascular disorders (CSMF = 8.2%). The CSMF obtained using physician review was within ±20% of the gold standard value for 12 causes of death, including the four common causes. The CSMF obtained using the expert algorithm was within ±20% of the gold standard for eight causes of death, including tuberculosis/AIDS, malaria, and meningitis. The CSMF obtained using the data-derived algorithms was within ±20% of the gold standard for seven causes of death, including tuberculosis/AIDS, meningitis, and cardiovascular disorders. All three methods yielded a specificity of at least 80% for all causes of death, and a sensitivity of at least 80% for deaths due to injuries and rabies.

Conclusions
For those settings where physician review is not feasible, expert and data-derived algorithms provide an alternative approach for assigning many causes of death. We recommend that the algorithms proposed herein are validated further.

Keywords
Verbal autopsy, mortality, validity, algorithm, data-derived, Africa

Accepted 22 June 1999

The verbal autopsy (VA) is a widely used method for collecting information on cause-specific mortality where the medical certification of deaths is incomplete. Trained fieldworkers interview bereaved relatives using a questionnaire to elicit information on symptoms experienced by the deceased before death. The completed questionnaires are reviewed by one or more physicians who assign probable causes of death (physician review). An alternative means of assigning causes of death is to follow a set of pre-defined diagnostic criteria given in an expert algorithm. Physician review and expert algorithms have been the subject of validation studies in adults [1] and children [2–8], and the results have been varied. In a recent comparison of these methods in adults [1], physician review generally had higher sensitivity and/or specificity than VA diagnosis reached by expert algorithms.

Expert algorithms are based on the symptoms deemed by physicians to be essential, confirmatory or supportive in diagnosing a particular cause of death. However, these symptoms are not necessarily the most discriminating ones. For example, in a child VA study in Kenya [9], fever was regarded as essential in the expert algorithm for malaria, but it had poor discriminating power because 93% of all malaria deaths and 86% of non-malaria deaths had fever.

The symptoms that give rise to the most accurate algorithm may be found by applying standard statistical methods to VA data. Such an algorithm will be referred to as a data-derived algorithm. If data-derived algorithms prove to be more accurate than expert algorithms, then these may be used in preference to expert algorithms, or to improve on the accuracy of expert algorithms. Using data from our multicentre validation study of 796 adult VAs, we identified the algorithm that most accurately estimates the cause-specific mortality fraction (CSMF) for several causes of death. A comparison of the diagnostic accuracy of physician review, expert algorithms and data-derived algorithms is reported in this paper.

Methods

Multicentre validation study

Between April 1993 and April 1995, a VA validation study was conducted in three sites: Ifakara Hospital, Tanzania; Jimma Hospital, Ethiopia; and Bawku Hospital, Ghana. The study population and methods are described in detail elsewhere [1]. In brief, all adult (aged ≥15 years) deaths that occurred at the study hospitals during the study period and for which an address at the time of death was within 60 km of the hospital (Ifakara 414; Jimma 327; Bawku 274) were eligible for inclusion in the study. A VA was completed for 315 subjects (76% of those eligible) in Ifakara, 249 (76%) in Jimma, and 232 (85%) in Bawku. Hospital records and death certificates were reviewed by one of the authors (DC) and a local physician; a single cause of death was assigned to each subject and used as the gold standard.

The VA interviews were carried out by trained fieldworkers who were not medically qualified, but who had 12 years of formal education. Interviews were conducted 1–21 months (median = 288 days) after the subject's death. Three physicians with experience of working in sub-Saharan Africa reviewed the completed questionnaires and, where possible, assigned a primary underlying cause of death and, where appropriate, co-primary and immediate causes of death. A cause of death was assigned if at least two physicians agreed on the primary cause of death. If all three disagreed on the primary cause of death, the questionnaire was reviewed by the panel and, where possible, a diagnosis was reached by consensus. A hierarchical expert algorithm [1] was also employed to assign a single cause of death to each subject.

Data-derived algorithms

Each subject in the study was randomly assigned to one of two groups (the train and test datasets), such that the number of deaths due to each (gold standard) cause was the same in each group. If a cause of death had an odd number of subjects, the extra subject was assigned to the train dataset. The subjects in the train dataset (n = 410) were used to derive the algorithms and those in the test dataset (n = 386) were used to validate the algorithms.

For subjects in the train dataset, each symptom was cross-tabulated by each cause of death and the following statistics were obtained: sensitivity; specificity; expected number of cause-specific deaths using this symptom (E); odds ratio (OR); and Wald test P-value. As most of the symptoms formed categorical variables, potential discriminant functions were identified using logistic regression rather than discriminant analysis.
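The train/test assignment described above can be illustrated with a minimal Python sketch. It implements only the rule as stated (random assignment within each gold standard cause, with the extra subject of an odd-sized cause going to the train dataset); the function name, the `seed` parameter, and the input format are assumptions made for illustration and are not taken from the study.

```python
import random
from collections import defaultdict


def split_train_test(causes, seed=0):
    """Split subject indices into train and test sets, cause by cause.

    `causes` holds one gold standard cause per subject (hypothetical input
    format). Within each cause, subjects are shuffled and divided in half;
    if the cause has an odd number of subjects, the extra one goes to train.
    """
    rng = random.Random(seed)
    by_cause = defaultdict(list)
    for idx, cause in enumerate(causes):
        by_cause[cause].append(idx)

    train, test = [], []
    for cause in sorted(by_cause):
        members = by_cause[cause]
        rng.shuffle(members)
        n_train = (len(members) + 1) // 2  # rounds up, so train gets the extra
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Cause counts for rabies, tetanus and malaria in the combined dataset.
    causes = ["rabies"] * 7 + ["tetanus"] * 13 + ["malaria"] * 85
    train, test = split_train_test(causes)
    print(len(train), len(test))
```

Under this rule, a cause with 7 deaths contributes 4 subjects to the train dataset and 3 to the test dataset, which is the split reported for rabies.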
Table 1 Number of cause-specific deaths in all three sites combined, and in train and test datasets

| Cause of death | Train & test datasets combined | CSMF (a) (%) | Train dataset | Test dataset |
|---|---|---|---|---|
| Tuberculosis/AIDS | 148 | 18.6 | 77 | 71 |
| Malaria | 85 | 10.7 | 43 | 42 |
| Meningitis | 66 | 8.3 | 34 | 32 |
| CVS (b) disorders | 65 | 8.2 | 33 | 32 |
| Acute abdominal conditions | 55 | 6.9 | 28 | 27 |
| Diarrhoeal diseases | 51 | 6.4 | 26 | 25 |
| Direct maternal causes | 50 | 6.3 | 25 | 25 |
| Neoplasms | 34 | 4.3 | 18 | 16 |
| Injuries | 33 | 4.1 | 18 | 15 |
| Hepatitis | 32 | 4.0 | 16 | 16 |
| Chronic liver diseases | 25 | 3.2 | 13 | 12 |
| Anaemia | 24 | 3.0 | 12 | 12 |
| Pneumonia | 23 | 2.9 | 12 | 11 |
| Renal disorders | 21 | 2.6 | 11 | 10 |
| Tetanus | 13 | 1.6 | 7 | 6 |
| Rabies | 7 | 0.9 | 4 | 3 |

(a) Cause-specific mortality fraction. (b) Cardiovascular system.

Those symptoms with OR ≥ 2 (or OR ≤ 0.5) and Wald P-value ≤ 0.10, or with Wald P-value ≤ 0.05, were included in a logistic model. Symptoms that were not statistically significant (Wald P-value > 0.10) were dropped from the model in a backward stepwise manner. A score was obtained for every subject by summing the coefficients of the model over the symptoms for that subject, i.e. $\text{score} = b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots$, where the $X_i$ are the symptoms in the model (coded 1 if the symptom is present and 0 if absent), and the $b_i$ are the log(OR) associated with the symptoms.

For each cause of death, we identified the cutoff that best separated those who died from this cause (score ≥ cutoff) from the remaining subjects (score < cutoff). Such a cutoff was obtained by comparing the sensitivity, specificity, and CSMF for every possible cutoff. We chose the cutoff that corresponded to an algorithm with sensitivity and specificity of at least 90%. If no such algorithm existed (which was usually the case), we chose the cutoff that gave E closest to the true number of cause-specific deaths, such that the sensitivity was at least 50%.

The causes of death and their hierarchy

We obtained data-derived algorithms for all causes of death given in Table 1.
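The scoring and cutoff rules described in the preceding section can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the coefficients and cutoff are the malaria values reported in Table 2, while the dictionary keys, function names, and the tie-breaking rule among cutoffs that reach 90% sensitivity and specificity are assumptions introduced here.

```python
# log(OR_B) coefficients and cutoff for the malaria model, copied from Table 2.
# The dictionary keys are hypothetical variable names chosen for this sketch.
MALARIA_COEFS = {
    "fits_2_3_per_day": 1.82,
    "age_35_44_years": 1.19,
    "confused_or_unconscious_1_7_days": 0.94,
    "no_abdominal_pain": 0.89,
}
MALARIA_CUTOFF = 1.83


def symptom_score(symptoms, coefs):
    """Sum the coefficients of the symptoms that are present (coded 1)."""
    total = sum(b for name, b in coefs.items() if symptoms.get(name, 0) == 1)
    # Round to two decimals so that float sums such as 0.94 + 0.89
    # compare equal to the published cutoff 1.83.
    return round(total, 2)


def evaluate_cutoff(scores, is_case, cutoff):
    """Return (sensitivity, specificity, expected deaths E) for one cutoff."""
    pairs = list(zip(scores, is_case))
    tp = sum(1 for s, c in pairs if c and s >= cutoff)
    fn = sum(1 for s, c in pairs if c and s < cutoff)
    fp = sum(1 for s, c in pairs if not c and s >= cutoff)
    tn = sum(1 for s, c in pairs if not c and s < cutoff)
    return tp / (tp + fn), tn / (tn + fp), tp + fp


def choose_cutoff(scores, is_case):
    """Pick a cutoff using the two-step rule described in the text.

    Assumes at least one case and one non-case are present.
    """
    true_n = sum(1 for c in is_case if c)
    candidates = [(cut,) + evaluate_cutoff(scores, is_case, cut)
                  for cut in sorted(set(scores))]
    # Step 1: prefer a cutoff with sensitivity and specificity both >= 90%.
    strong = [c for c in candidates if c[1] >= 0.90 and c[2] >= 0.90]
    if strong:
        # Tie-break by sensitivity + specificity (an assumption of this sketch).
        return max(strong, key=lambda c: c[1] + c[2])[0]
    # Step 2: otherwise choose E closest to the true count, sensitivity >= 50%.
    eligible = [c for c in candidates if c[1] >= 0.50]
    return min(eligible, key=lambda c: abs(c[3] - true_n))[0]


if __name__ == "__main__":
    subject = {"confused_or_unconscious_1_7_days": 1, "no_abdominal_pain": 1}
    score = symptom_score(subject, MALARIA_COEFS)
    print(score, score >= MALARIA_CUTOFF)
```

The lowest candidate cutoff classifies every subject as positive and therefore always has 100% sensitivity, so the second step always has at least one eligible cutoff to choose from.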
Tuberculosis and AIDS are combined because these two diseases often occur in the same subject and have many symptoms in common, so it is unlikely that the VA can discriminate between them.

We designed the data-derived algorithms to be hierarchical with two levels. The first level uses the questions on injury (where types of injury recorded included assault, road traffic accident, war injury, dog bite, other animal bite, and accidental poisoning) and suicide. We assumed that subjects who answered 'yes' to the question on injuries or suicide would have a cause of death due to injuries (including suicide), rabies, or tetanus. We used all subjects to derive algorithms for causes of death due to injuries, rabies, and tetanus, but we used only the subjects who answered 'no' to the questions on injuries and suicide to derive algorithms for the remaining causes of death. The algorithm for each cause of death was applied in turn, and hence any subject could have more than one cause of death. This is in contrast with the hierarchical expert algorithm.

Validation

The causes of death assigned by physician review, the expert algorithm, and the data-derived algorithm were compared with the gold standard diagnosis. The data-derived algorithms were validated on the test dataset, so as to minimize the bias in the estimates of sensitivity, specificity, and CSMF.

An algorithm with very high sensitivity and specificity could be used at the individual level (to correctly identify subjects with a particular cause of death). We defined a method as having high diagnostic accuracy for use at the individual level if the sensitivity and specificity were at least 90%.

An algorithm is useful at the population level if it accurately estimates the CSMF. It should be noted that the CSMF may be accurately estimated even if many subjects are misclassified (at the individual level), provided that the number of false positives equals the number of false negatives. Conversely, even with reasonably high estimates of sensitivity and specificity, the VA estimates of CSMF can be extremely inaccurate [10,11]. We defined a method as having high diagnostic accuracy for use at the population level if the sensitivity was at least 50%, specificity at least 90%, and CSMF within ±20% of the true value.

Results

Verbal autopsy data were available on 796 adult deaths. Table 2 shows the data-derived algorithms for selected causes of death.
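The individual-level and population-level criteria defined in the Validation section can be made concrete with a short Python sketch. The function names are hypothetical. The first example uses the rabies counts implied by the test dataset (3 true deaths, 4 assigned by the algorithm, 386 subjects in total); the second uses invented counts, chosen only to show that equal numbers of false positives and false negatives leave the CSMF unbiased.

```python
def accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity and relative CSMF error from a 2x2 table."""
    n = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "csmf_true": (tp + fn) / n,
        "csmf_estimated": (tp + fp) / n,
        # (estimated - true) / true simplifies to (fp - fn) / (tp + fn),
        # so the error is zero whenever false positives equal false negatives.
        "csmf_relative_error": (fp - fn) / (tp + fn),
    }


def high_accuracy_individual(m):
    """Sensitivity and specificity both at least 90%."""
    return m["sensitivity"] >= 0.90 and m["specificity"] >= 0.90


def high_accuracy_population(m):
    """Sensitivity >= 50%, specificity >= 90%, CSMF within +/-20% of truth."""
    return (m["sensitivity"] >= 0.50
            and m["specificity"] >= 0.90
            and abs(m["csmf_relative_error"]) <= 0.20)


if __name__ == "__main__":
    # Rabies in the test dataset: 3 true deaths, all detected, 1 false positive.
    rabies = accuracy(tp=3, fp=1, fn=0, tn=382)
    print(rabies, high_accuracy_individual(rabies), high_accuracy_population(rabies))

    # Invented counts: half the true deaths are missed, yet the CSMF is exact.
    balanced = accuracy(tp=10, fp=10, fn=10, tn=170)
    print(balanced, high_accuracy_individual(balanced), high_accuracy_population(balanced))
```

The two cases pull in opposite directions: the rabies algorithm meets the individual-level criterion but overestimates the CSMF by a third because the cause is so rare, whereas the invented example misclassifies many subjects yet estimates the CSMF exactly.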
Table 2 Sensitivity (se), specificity (sp), crude odds ratio (OR_A), and adjusted odds ratio (OR_B) for symptoms in the final logistic models for selected causes of death

| Symptom | se | sp | OR_A | OR_B | log(OR_B) |
|---|---|---|---|---|---|
| Injuries | | | | | |
| Injury AND type of injury NOT dog bite AND had injury for 30 days | 89 | 99 | | | |
| Rabies | | | | | |
| Dog bite | 100 | 99.75 | | | |
| [Model 1a] Tuberculosis/AIDS if NOT suicide/injury AND score ≥ 3.59 | | | | | |
| Had tuberculosis or HIV/AIDS | 65 | 95 | 34.40 | 23.36 | 3.15 |
| No vomiting or vomiting for >1 day | 96 | 14 | 4.17 | 4.19 | 1.43 |
| Pass stools 4 times per day | 34 | 87 | 3.38 | 3.70 | 1.31 |
| Weight loss | 81 | 62 | 6.70 | 3.13 | 1.14 |
| Cough for 3 weeks | 38 | 94 | 8.92 | 2.77 | 1.02 |
| [Model 1b] (excluding 'had tuberculosis') Tuberculosis/AIDS if NOT suicide/injury AND score ≥ 8.48 | | | | | |
| No vomiting or vomiting for >1 day | 96 | 14 | 4.17 | 19.55 | 2.97 |
| No distended abdomen | 91 | 20 | 2.70 | 10.03 | 2.31 |
| Cough | 68 | 79 | 7.39 | 6.87 | 1.93 |
| Ill for 1–4 months | 74 | 76 | 7.76 | 5.93 | 1.78 |
| Blood in stools | 21 | 90 | 2.18 | 5.83 | 1.76 |
| Vomiting | 42 | 72 | 1.73 | 4.28 | 1.45 |
| Pallor | 49 | 77 | 2.93 | 4.21 | 1.44 |
| Fever for 1 month | 19 | 97 | 6.21 | 3.59 | 1.28 |
| [Model 1c] Tuberculosis/AIDS if NOT suicide/injury AND had TB/HIV/AIDS | | | | | |
| Had tuberculosis or HIV/AIDS | 57 | 96 | | | |
| Malaria if NOT suicide/injury AND score ≥ 1.83 | | | | | |
| 2–3 fits per day | 16 | 97 | 6.49 | 6.15 | 1.82 |
| Age 35–44 years | 31 | 84 | 2.44 | 3.29 | 1.19 |
| Confused/loss of consciousness 1–7 days | 56 | 74 | 3.65 | 2.55 | 0.94 |
| No abdominal pain | 71 | 44 | 2.03 | 2.42 | 0.89 |
| Meningitis if NOT suicide/injury AND score ≥ 5.85 | | | | | |
| Stiff neck | 59 | 94 | 19.11 | 14.09 | 2.65 |
| No cough | 94 | 32 | 8.33 | 7.19 | 1.97 |
| No pallor | 94 | 29 | 7.14 | 6.65 | 1.89 |
| Continuous fever | 65 | 73 | 4.38 | 4.42 | 1.49 |
| Stiff body | 32 | 96 | 10.39 | 3.44 | 1.24 |
| CVS (a) disorders if NOT suicide/injury AND score ≥ 6.47 | | | | | |
| Puffiness of face | 39 | 89 | 5.37 | 7.76 | 2.05 |
| Cough 3–14 days | 21 | 94 | 4.31 | 7.11 | 1.96 |
| Abdominal distension 8–30 days | 15 | 96 | 4.53 | 6.73 | 1.91 |
| No weight loss | 72 | 48 | 2.50 | 5.99 | 1.79 |
| No yellow discolouration of eyes | 88 | 27 | 2.78 | 5.40 | 1.69 |
| Had hypertension | 38 | 93 | 7.87 | 4.79 | 1.57 |
| Shortness of breath | 55 | 68 | 2.69 | 4.45 | 1.49 |
| Age 45 years | 70 | 64 | 4.55 | 3.79 | 1.33 |
| Diarrhoeal diseases if NOT suicide/injury AND score ≥ 11.52 | | | | | |
| No cough | 92 | 32 | 5.88 | 34.51 | 3.54 |
| Non-specific (b) abdominal pain | 25 | 93 | 11.20 | 20.92 | 3.04 |
| Change in amount of urine for 2 days | 18 | 97 | 5.80 | 12.48 | 2.52 |
| No distended abdomen | 96 | 19 | 6.25 | 12.42 | 2.52 |
| Age 55 years | 58 | 74 | 3.95 | 10.31 | 2.33 |
| Diarrhoea for 4–30 days | 72 | 87 | 16.77 | 8.76 | 2.17 |
| Diarrhoea | 92 | 71 | 25.34 | 7.39 | 2.00 |
| Cramps in abdomen | 62 | 80 | 6.13 | 3.62 | 1.29 |

The log(OR_B) for symptoms in the model are used to obtain a score. For example, looking at malaria, if the subject has 2–3 fits per day and is aged 35–44 years, then the score is the sum of the log(OR_B) for these symptoms (1.82 + 1.19 = 3.01), which is greater than 1.83, so the subject is classified as a malaria death.
(a) Cardiovascular system. (b) Refers to abdominal pain that is not cramps, dull ache, or burning pain.

Data-derived algorithms

There were seven deaths due to rabies (CSMF = 0.9%). In the univariate analysis, dog bite had a sensitivity of 100% (4/4) and specificity of 99.75% (403/404). We formed an algorithm using only dog bite rather than exploring logistic regression. When validated on the test dataset (Table 3), the algorithm slightly overestimated the true number of deaths due to rabies (4 versus 3).

There were 33 deaths due to injuries (CSMF = 4.1%). In the univariate analysis, the question on injuries had a sensitivity of 94% and specificity of 97%. We formed an algorithm using only the questions on injury rather than exploring logistic regression.
The algorithm assigned a cause of death as injuries if the subject had an injury, the type of injury was not a dog bite, and they had the injury for 30 days before their death. When validated on the test dataset (Table 3), the algorithm slightly overestimated the true number of deaths due to injuries (16 versus 15).

There were 148 deaths due to tuberculosis/AIDS (defined as tuberculosis, AIDS, or tuberculosis and AIDS) (CSMF = 18.6%). In the univariate analysis, cough had a sensitivity of 68% and specificity of 79%. The respondent's report that the deceased had HIV/AIDS had a sensitivity of 15% and specificity of 99.7%. The specificity of 'had tuberculosis' was also very high (96%), although the sensitivity was low (51%). The sensitivity and specificity of these two questions combined were 65% and 95% respectively.

Model 1a (Table 2) was based on five symptoms, most notably 'had tuberculosis or HIV/AIDS'. The algorithm assigned a cause of death as tuberculosis/AIDS if the score was ≥3.59, for example, if the subject had tuberculosis or HIV/AIDS plus any other symptom in the model. When validated on the test dataset (Table 3), the algorithm overestimated the true number of tuberculosis/AIDS deaths (78 versus 71).

Model 1b was obtained without the variable 'had tuberculosis and/or HIV/AIDS', since this question may not be reliable in settings where there is little knowledge about HIV infection, or where few people have undergone an HIV test. When validated on the test dataset, the algorithm overestimated the true number of tuberculosis/AIDS deaths (90 versus 71).

Model 1c was obtained using only the variable 'had tuberculosis and/or HIV/AIDS'. When validated on the test dataset, the algorithm underestimated the true number of tuberculosis/AIDS deaths (38 versus 71).

There were 85 deaths due to malaria (CSMF = 10.7%), but only 42 were confirmed with a positive blood slide.
A data-derived algorithm was obtained that discriminated between the 22 confirmed malaria deaths in the test dataset and the remaining subjects (unconfirmed malaria and all other causes of death). Fever had a sensitivity of 67% and a specificity of 34% and was not strongly or significantly associated with malaria deaths (OR = 1.12, P = 0.9). Fever for 1–7 days was strongly associated with malaria deaths (OR = 2.68, P = 0.04) and had high specificity (83%) but poor sensitivity (33%); it did not remain in the final model. The four symptoms in the final model for confirmed malaria were fitted in a logistic model based on all malaria deaths (Table 2). When validated on the test dataset (Table 3), the algorithm overestimated the true number of malaria deaths (89 versus 42).

There were 66 deaths due to meningitis (CSMF = 8.3%). In the univariate analysis, fever had a high sensitivity (94%) but a low specificity (36%) and did not remain in the final model. The final model for meningitis was based on five symptoms (Table 2). When validated on the test dataset (Table 3), the algorithm slightly overestimated the true number of meningitis deaths (33 versus 32).

Table 3 Number of cause-specific deaths, sensitivity (se), and specificity (sp) for physician review, expert algorithm, and data-derived algorithms (a). Columns O to sp_E refer to the train and test datasets combined; columns O (test) to sp_L refer to the test dataset.

| Cause of death | O | E_P | se_P | sp_P | E_E | se_E | sp_E | O (test) | E_L | se_L | sp_L |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TB/AIDS (model 1a) | 148 | 151** | 76 | 94 | 174* | 68 | 89 | 71 | 78** | 65 | 90 |
| TB/AIDS (model 1b) | 148 | 151** | 76 | 94 | 174* | 68 | 89 | 71 | 90 | 44 | 81 |
| TB/AIDS (model 1c) | 148 | 151** | 76 | 94 | 174* | 68 | 89 | 71 | 38 | 44 | 98 |
| Malaria | 85 | 80** | 33 | 93 | 84** | 19 | 90 | 42 | 89 | 48 | 80 |
| Meningitis | 66 | 60** | 59 | 97 | 68** | 38 | 94 | 32 | 33** | 53 | 95 |
| CVS (b) disorders | 65 | 57* | 48 | 96 | 47 | 25 | 96 | 32 | 32** | 34 | 94 |
| AAC (c) | 55 | 65* | 69 | 97 | 57** | 46 | 96 | 27 | 18 | 37 | 98 |
| Diarrhoeal diseases | 51 | 58* | 61 | 96 | 46* | 43 | 97 | 25 | 30* | 48 | 95 |
| Direct maternal causes | 50 | 58* | 82 | 98 | 48** | 48 | 98 | 25 | 18 | 52 | 97 |
| Neoplasms | 34 | 36** | 50 | 98 | 11 | 6 | 98.8 | 16 | 24 | 19 | 94 |
| Injuries | 33 | 36** | 97 | 99.5 | 42 | 94 | 98.6 | 15 | 16** | 80 | 99 |
| Hepatitis | 32 | 14 | 34 | 99.6 | 24 | 16 | 98 | 16 | 10 | 0 | 97 |
| Chronic liver diseases | 25 | 29* | 40 | 98 | 23** | 28 | 98 | 12 | 14* | 8 | 97 |
| Anaemia | 24 | 12 | 33 | 99.5 | 9 | 17 | 99.4 | 12 | 25 | 25 | 94 |
| Pneumonia | 23 | 19* | 39 | 98.7 | 15 | 9 | 98 | 11 | 14 | 27 | 97 |
| Renal disorders | 21 | 22** | 38 | 98 | 22** | 29 | 98 | 10 | 10** | 10 | 98 |
| Tetanus | 13 | 10 | 77 | 100 | 9 | 23 | 99.2 | 6 | 5* | 17 | 99 |
| Rabies | 7 | 7 | 86 | 99.9 | 9 | 100 | 99.7 | 3 | 4 | 100 | 99.7 |

(a) O = observed number of cause-specific deaths. Expected number of cause-specific deaths using physician review (E_P), expert algorithm (E_E), and logistic regression algorithm (E_L). Sensitivity and specificity of physician review (se_P and sp_P), expert algorithm (se_E and sp_E), and logistic regression algorithm (se_L and sp_L). (b) Cardiovascular system. (c) Acute abdominal conditions.
** CSMF estimated to within 10% of its true value. * CSMF estimated to within 10–20% of its true value.

There were 65 deaths due to CVS (cardiovascular system) disorders (CSMF = 8.2%). The final model for CVS disorders was based on eight symptoms.
When validated on the test dataset (Table 3), the algorithm correctly estimated the true number of CVS deaths (32).

There were 51 deaths due to diarrhoeal diseases (CSMF = 6.4%). In the univariate analysis, diarrhoea was strongly associated with diarrhoeal deaths (OR = 25.34, P < 0.01) and had a sensitivity of 92% and specificity of 71%; it did not remain in the final model. Blood in the stools was strongly associated with diarrhoeal deaths in the univariate analysis (OR = 9.63, P < 0.001) but it did not remain in the final model. The final model for diarrhoeal deaths was based on eight symptoms (Table 2). When validated on the test dataset (Table 3), the algorithm overestimated the true number of diarrhoeal deaths (30 versus 25).

Validation

Table 4 compares the CSMF for different methods of assigning all causes of death, including those causes of death for which data-derived algorithms have not been presented. The number of deaths for some causes was too small to allow precise estimation of diagnostic accuracy. For example, in the test dataset, there were three rabies deaths, and the data-derived algorithm estimated four, which overestimated the true CSMF by 33%, even though sensitivity was 100% and specificity was 99.7% (Table 3). In general, the data-derived algorithms performed as well as the expert algorithms but not as well as physician review. Physician review had high diagnostic accuracy for use at the population level for many causes of death. The expert algorithm had high diagnostic accuracy for use at the population level for tuberculosis/AIDS, and possibly for acute abdominal conditions, diarrhoeal diseases, direct maternal causes, and rabies. The data-derived algorithms had high diagnostic accuracy for use at the population level for tuberculosis/AIDS, meningitis, injuries, and possibly rabies and diarrhoeal diseases.

Table 4 Comparison of cause-specific mortality fractions (CSMF) for physician review, expert algorithm, and data-derived algorithms

| Cause of death | True | Physician review | Expert algorithm | Data-derived algorithm |
|---|---|---|---|---|
| Tuberculosis/AIDS | 18.6 | 19.0 (1) | 21.9 (2) | 20.2 (1) |
| Malaria | 10.7 | 10.1 (1) | 10.6 (1) | 23.1 (4) |
| Meningitis | 8.3 | 7.5 (1) | 8.5 (1) | 8.5 (1) |
| CVS (a) disorders | 8.2 | 7.2 (2) | 5.9 (3) | 8.3 (1) |
| AAC (b) | 6.9 | 8.2 (2) | 7.2 (1) | 4.7 (3) |
| Diarrhoeal diseases | 6.4 | 7.3 (2) | 5.8 (2) | 7.8 (2) |
| Direct maternal causes | 6.3 | 7.3 (2) | 6.0 (2) | 4.7 (3) |
| Neoplasms | 4.3 | 4.5 (1) | 1.4 (4) | 6.2 (3) |
| Injuries | 4.1 | 4.5 (1) | 5.3 (3) | 4.1 (1) |
| Hepatitis | 4.0 | 1.8 (4) | 3.0 (3) | 2.6 (4) |
| Chronic liver diseases | 3.1 | 3.6 (2) | 2.9 (1) | 3.6 (2) |
| Anaemia | 3.0 | 1.5 (4) | 1.1 (4) | 6.5 (4) |
| Pneumonia | 2.9 | 2.4 (2) | 1.9 (3) | 3.6 (3) |
| Renal disorders | 2.6 | 2.8 (1) | 2.8 (1) | 2.6 (1) |
| Tetanus | 1.6 | 1.3 (3) | 1.1 (3) | 1.3 (4) |
| Rabies | 0.9 | 0.9 (1) | 1.1 (3) | 1.0 (3) |

(a) Cardiovascular system. (b) Acute abdominal conditions.
Numbers in parentheses show the relative difference between the CSMF as given by the gold standard and the CSMF as estimated by each method, coded as: 1: <10%, 2: 10–19%, 3: 20–50%, 4: >50%.

Discussion

The present study develops the methodology given in Quigley et al. [9] as a means of assigning causes of adult deaths. Our study is based on a large number of subjects (n = 796), a large number of symptoms ascertained using the VA (n = 88), and has a reliable gold standard diagnosis.

The main limitation of our study is that the data-derived algorithms have not been properly validated. The data-derived algorithms are likely to perform better on the test dataset than they would on other datasets, because the train and test datasets were drawn from the same population. The main shortcoming of data-derived algorithms is that they might only discriminate well between causes of death in the population from which they are derived, and therefore they may have poor repeatability. For example, two symptoms in our data-derived algorithm for meningitis (stiff neck and continuous fever) appear in our expert algorithm, and clinical experience suggests that these symptoms might be associated with meningitis deaths in other settings. However, 'no cough' and 'no pallor' appear in our data-derived algorithm merely because of strong associations between the presence of cough and pallor and other causes of death (i.e. tuberculosis/AIDS and anaemia respectively) in our dataset.
It should be noted that, owing to the hierarchical nature of our expert algorithm, no cough appears in the expert algorithm for meningitis because meningitis is assigned only after the exclusion of tuberculosis/aids. Poor repeatability of the data-derived algorithms is a particular problem for causes of death with small numbers. We identified algorithms for causes of death with small numbers, but none except the algorithm for rabies performed well, even in the test dataset. Furthermore, although the algorithm for rabies is encouraging, the numbers are too small to allow proper validation. Another limitation of our study, which applies to all VA validation studies, is that the performance of the VA has been assessed in subjects who died in hospital. The VA may perform differently when applied to subjects who did not die in hospital, though clearly there will be no reliable gold standard outside the hospital setting. The VA technique is appropriate for causes of death that comprise a well-defined and distinct group of subjects, that experience some symptoms in common with one another, that the rest of the population does not experience. This applies to causes of death due to meningitis and injuries, which tend to perform well under VA in adults 1 and children. 2,3 In our study, the VA also performed well for deaths due to tuberculosis/aids. This result is consistent with findings from a study in Uganda, in which the CSMF for HIV infection was 47% using physician review, compared to 50% using HIV serostatus. 12 Although many of the symptoms associated with tuberculosis and AIDS occur in subjects with other infectious diseases, we found combinations of symptoms that discriminated between deaths due to tuberculosis/aids and other causes of death. However, the overlap in symptoms in subjects with tuberculosis, AIDS, or tuberculosis and AIDS makes it unlikely that the VA can discriminate between these diagnoses. 
Furthermore, our data-derived algorithms for assigning deaths due to tuberculosis only, AIDS only, and tuberculosis with AIDS did not perform well (data not shown). All symptoms in our algorithm (model 1a) except no vomiting or vomiting for 1 day appear in our expert algorithm. The high discriminating power of duration of vomiting arises because only 6% of subjects who vomited for one day died of tuberculosis/AIDS, compared to 29% of subjects who vomited for 1 day and 16% of subjects with no vomiting. Clearly, this symptom needs validation in other settings.

A study in Tanzania used our VA questionnaire in 178 adult deaths with known HIV serostatus. 13 Few respondents mentioned HIV on the VA questionnaire. A wide range of symptoms were associated with HIV serostatus, although neither any single symptom nor the WHO AIDS case definition was able to discriminate effectively between HIV-positive and HIV-negative subjects. However, a simplified AIDS case definition had better discrimination, which suggests that case definitions should be validated and modified using data-derived methods in each setting.

In a study of maternal deaths in Bangladesh, 14 the VA performed better under certain conditions: in a study of maternal mortality rather than under routine surveillance; when the interviewer was female; when the questionnaire combined open and semi-structured parts; and when the diagnosis was assigned by physicians rather than medical assistants. Most of our interviewers were male, and our study did not focus on maternal deaths, but our VA tool performed well for direct maternal deaths.

The poor performance of the VA for deaths due to malaria in our study and in child VA studies 2,7 may be because malaria symptoms tend to be non-specific. Alternatively, the gold standard diagnosis for malaria may be less than perfect. In our study, malaria deaths were confirmed with a positive blood slide for only 50% of cases.
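The distinction drawn in this Discussion between classifying individuals correctly and having adequate accuracy at the population level can be illustrated with a short calculation. The sketch below is only an illustration and is not the analysis code used in the study: the function names and the 2x2 counts are hypothetical. It uses the standard misclassification relation, expected VA CSMF = sensitivity x true CSMF + (1 - specificity) x (1 - true CSMF), to show how false positives can offset false negatives.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity and specificity of the VA diagnosis against the reference diagnosis.

    tp: VA positive, reference positive    fn: VA negative, reference positive
    fp: VA positive, reference negative    tn: VA negative, reference negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def true_csmf(tp, fn, fp, tn):
    """Cause-specific mortality fraction according to the reference diagnosis."""
    return (tp + fn) / (tp + fn + fp + tn)


def observed_csmf(tp, fn, fp, tn):
    """Cause-specific mortality fraction according to the VA diagnosis."""
    return (tp + fp) / (tp + fn + fp + tn)


def expected_va_csmf(csmf, sensitivity, specificity):
    """CSMF the VA is expected to report, given the true CSMF and its error rates."""
    return sensitivity * csmf + (1.0 - specificity) * (1.0 - csmf)


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study: 800 deaths, 100 truly due
    # to the cause of interest, 40 false negatives and 40 false positives.
    counts = dict(tp=60, fn=40, fp=40, tn=660)
    sens, spec = sensitivity_specificity(**counts)
    print("sensitivity:", round(sens, 3), "specificity:", round(spec, 3))
    print("true CSMF:", true_csmf(**counts))
    print("VA CSMF:", observed_csmf(**counts))
    print("expected VA CSMF:", round(expected_va_csmf(true_csmf(**counts), sens, spec), 3))
```

With these made-up counts the VA identifies only 60 of the 100 true cases, yet the VA-based and reference CSMFs coincide because the 40 false positives balance the 40 false negatives. A method can therefore be adequate at the population level while remaining unreliable for individual classification.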
The main advantage of physician review over algorithms is that all sections of the VA questionnaire, even the open-ended questions and comments, are utilized. This worked particularly well for maternal deaths and neoplasms, each of which comprised several specific causes of death with few symptoms in common. Maternal deaths comprised abortion, eclampsia, haemorrhage, sepsis, and obstructed labour; neoplasms comprised all cancers. We did not identify algorithms for these specific subgroups because of small numbers, but instead found algorithms for the broad causes given above.

There are, however, two main disadvantages of physician review compared to algorithms. First, it is very time-consuming: it is estimated that it takes 20–30 physician-minutes to review each VA questionnaire. 1 Second, it is a subjective approach, which means that it may have poor repeatability and it is difficult to compare CSMFs between sites.

Our study suggests that all three methods may be used to correctly classify individuals with causes of death due to injuries and rabies. Physician review may also be used to correctly classify individuals with deaths due to maternal causes and possibly tuberculosis/AIDS. Physician review had high diagnostic accuracy for use at the population level for many causes of
death. The expert algorithms did not perform as well, but might have adequate diagnostic accuracy for use at the population level for tuberculosis/AIDS, acute abdominal conditions, diarrhoeal diseases, and maternal causes. Our proposed data-derived algorithms for tuberculosis/AIDS, meningitis, diarrhoeal diseases, injuries, and rabies look promising for use at the population level. Further validation is required to ensure that the symptoms in the proposed algorithms are associated with the cause of death in other settings, and that the cutoffs for the data-derived algorithms are appropriate.

Acknowledgements

This study was supported by the Department for International Development, UK. We are grateful to Gillian Maude. Maria Quigley is supported by the Medical Research Council, UK.

References

1 Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ. Verbal autopsies for adult deaths: their development and validation in a multicentre study. Trop Med Int Health 1998;3:436–46.
2 Snow RW, Armstrong JRM, Forster D et al. Childhood deaths in Africa: uses and limitations of verbal autopsies. Lancet 1992;8:351–55.
3 Sachdev HPS et al. Cited in: Measurement of overall and cause-specific mortality in infants and children: Memorandum from a WHO/UNICEF meeting. Bull World Health Organ 1994;72:707–13.
4 Mirza NM, Macharia WM, Wafula EM et al. Verbal autopsy: a tool for determining cause of death in a community. E Afr Med J 1990;67:693–98.
5 Kalter HD, Gray RH, Black RE, Gultiano SA. Validation of postmortem interviews to ascertain selected causes of deaths in children. Int J Epidemiol 1990;19:380–86.
6 Osinski P. Cited in: Measurement of overall and cause-specific mortality in infants and children: Memorandum from a WHO/UNICEF meeting. Bull World Health Organ 1994;72:707–13.
7 Mobley CC. Cited in: Ewbank DC, Gribble JN (eds). Effects of Health Programs on Child Mortality in Sub-Saharan Africa. Washington, USA: National Academic Press, 1993, p.20.
8 Pacque-Margolis S, Pacque M, Dukuly Z et al. Application of the verbal autopsy during a clinical trial. Soc Sci Med 1990;5:585–91.
9 Quigley MA, Armstrong Schellenberg JRM, Snow RW. Algorithms for verbal autopsies: a validation study in Kenyan children. Bull World Health Organ 1996;74:147–54.
10 Anker M. The effect of misclassification error on reported cause-specific mortality fractions from verbal autopsy. Int J Epidemiol 1997;26:1090–96.
11 Maude GH, Ross DA. The effect of different sensitivity, specificity and cause-specific mortality fractions on the estimation of differences in cause-specific mortality rates in children from studies using verbal autopsy. Int J Epidemiol 1997;26:1097–06.
12 Kamali A, Wagner H-U, Nakiyingi J, Sabiiti I, Kengeya-Kayondo JF, Mulder DW. Verbal autopsy as a tool for diagnosing HIV-related adult deaths in rural Uganda. Int J Epidemiol 1996;25:679–84.
13 Todd J, Balira R, Grosskurth H et al. HIV-associated mortality in a rural Tanzanian population. AIDS 1997;11:801–07.
14 Ronsmans C, Vanneste AM, Chakraborty J, Van Ginneken J. A comparison of three verbal autopsy methods to ascertain levels and causes of maternal deaths in Matlab, Bangladesh. Int J Epidemiol 1998;27:660–66.