Diagnostic accuracy of physician review, expert algorithms and data-derived algorithms in adult verbal autopsies

International Epidemiological Association 1999. Printed in Great Britain. International Journal of Epidemiology 1999;28:1081–1087

Maria A Quigley, Daniel Chandramohan and Laura C Rodrigues

Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK. Reprint requests to: Maria Quigley, MRC Tropical Epidemiology Group, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK.

Background
The verbal autopsy (VA) is used to collect information on cause-specific mortality from bereaved relatives. A cause of death may be assigned by physician review of the questionnaires, or by an algorithm. We compared the diagnostic accuracy of physician review, an expert algorithm, and data-derived algorithms.

Methods
Data were drawn from a multicentre validation study of 796 adult deaths that occurred in hospitals in Tanzania, Ethiopia, and Ghana. A gold standard cause of death was assigned using hospital records and death certificates. The VA interviews were carried out by trained fieldworkers 1–21 months after the subject's death. A cause of death was assigned by physician review and an expert algorithm. Data-derived algorithms that most accurately estimated the cause-specific mortality fraction (CSMF) for each cause of death were identified using logistic regression.

Results
The most common causes of death were tuberculosis/AIDS (CSMF = 18.6%), malaria (CSMF = 10.7%), meningitis (CSMF = 8.3%), and cardiovascular disorders (CSMF = 8.2%). The CSMF obtained using physician review was within ±20% of the gold standard value for 12 causes of death including the four common causes. The CSMF obtained using the expert algorithm was within ±20% of the gold standard for eight causes of death, including tuberculosis/AIDS, malaria, and meningitis. The CSMF obtained using the data-derived algorithms was within ±20% of the gold standard for seven causes of death, including tuberculosis/AIDS, meningitis, and cardiovascular disorders. All three methods yielded a specificity of at least 80% for all causes of death, and a sensitivity of at least 80% for deaths due to injuries and rabies.

Conclusions
For those settings where physician review is not feasible, expert and data-derived algorithms provide an alternative approach for assigning many causes of death. We recommend that the algorithms proposed herein are validated further.

Keywords
Verbal autopsy, mortality, validity, algorithm, data-derived, Africa

Accepted 22 June 1999

The verbal autopsy (VA) is a widely used method for collecting information on cause-specific mortality where the medical certification of deaths is incomplete. Trained fieldworkers interview bereaved relatives using a questionnaire to elicit information on symptoms experienced by the deceased before death. The completed questionnaires are reviewed by one or more physicians who assign probable causes of death (physician review). An alternative means of assigning causes of death is to follow a set of pre-defined diagnostic criteria given in an expert algorithm. Physician review and expert algorithms have been the subject of validation studies in adults [1] and children [2–8], and the results have been varied. In a recent comparison of these methods in adults [1], physician review generally had higher sensitivity and/or specificity than VA diagnosis reached by expert algorithms.

Expert algorithms are based on the symptoms deemed by physicians to be essential, confirmatory or supportive in diagnosing a particular cause of death. However, these symptoms are not necessarily the most discriminating ones. For example, in a child VA study in Kenya [9], fever was regarded as essential in the expert algorithm for malaria, but it had poor discriminating power because 93% of all malaria deaths and 86% of non-malaria deaths had fever.
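The fever example can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not part of the study: the positive likelihood ratio and Youden index are standard summaries introduced here only for illustration, and the function name is hypothetical. The two input percentages are the ones quoted in the text.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """P(symptom | cause) divided by P(symptom | other causes)."""
    return sensitivity / (1.0 - specificity)


# Figures quoted in the text for the Kenyan child VA study.
fever_in_malaria_deaths = 0.93      # sensitivity of fever for malaria
fever_in_non_malaria_deaths = 0.86  # equals 1 - specificity

specificity = 1.0 - fever_in_non_malaria_deaths
lr_positive = positive_likelihood_ratio(fever_in_malaria_deaths, specificity)
youden_index = fever_in_malaria_deaths + specificity - 1.0

print(f"specificity  = {specificity:.2f}")
print(f"LR+          = {lr_positive:.2f}")
print(f"Youden index = {youden_index:.2f}")
```

A likelihood ratio this close to 1 means that reporting fever barely changes the odds that a death was due to malaria, which is the sense in which the symptom discriminates poorly despite its high sensitivity.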

The symptoms that give rise to the most accurate algorithm may be found by applying standard statistical methods to VA data. Such an algorithm will be referred to as a data-derived algorithm. If data-derived algorithms prove to be more accurate than expert algorithms, then they may be used in preference to expert algorithms, or to improve on the accuracy of expert algorithms. Using data from our multicentre validation study of 796 adult VAs, we identified the algorithm that most accurately estimates the cause-specific mortality fraction (CSMF) for several causes of death. A comparison of the diagnostic accuracy of physician review, expert algorithms and data-derived algorithms is reported in this paper.

Methods

Multicentre validation study

Between April 1993 and April 1995, a VA validation study was conducted in three sites: Ifakara Hospital, Tanzania; Jimma Hospital, Ethiopia; and Bawku Hospital, Ghana. The study population and methods are described in detail elsewhere [1]. In brief, all adult (aged ≥15 years) deaths that occurred at the study hospitals during the study period, and for which the address at the time of death was within 60 km of the hospital (Ifakara 414; Jimma 327; Bawku 274), were eligible for inclusion in the study. A VA was completed for 315 subjects (76% of those eligible) in Ifakara, 249 (76%) in Jimma, and 232 (85%) in Bawku. Hospital records and death certificates were reviewed by one of the authors (DC) and a local physician; a single cause of death was assigned to each subject and used as the gold standard. The VA interviews were carried out by trained fieldworkers who were not medically qualified, but who had 12 years of formal education. Interviews were conducted 1–21 months (median = 288 days) after the subject's death.
Three physicians with experience of working in sub-Saharan Africa reviewed the completed questionnaires and, where possible, assigned a primary underlying cause of death, and where appropriate, co-primary and immediate causes of death. A cause of death was assigned if at least two physicians agreed on the primary cause of death. If all three disagreed on the primary cause of death, the questionnaire was reviewed by the panel, and, where possible, a diagnosis was reached by consensus. A hierarchical expert algorithm [1] was also employed to assign a single cause of death to each subject.

Data-derived algorithms

Each subject in the study was randomly assigned to one of two groups (the train and test datasets), such that the number of deaths due to each (gold standard) cause was the same in each group. If a cause of death had an odd number of subjects, the extra subject was assigned to the train dataset. The subjects in the train dataset (n = 410) were used to derive the algorithms and those in the test dataset (n = 386) were used to validate the algorithms. For subjects in the train dataset, each symptom was cross-tabulated by each cause of death and the following statistics were obtained: sensitivity; specificity; expected number of cause-specific deaths using this symptom (E); odds ratio (OR); and Wald test P-value. As most of the symptoms formed categorical variables, potential discriminant functions were identified using logistic regression rather than discriminant analysis.
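The per-symptom statistics listed above all follow from a single 2×2 cross-tabulation of symptom by cause. The sketch below shows one way to compute them; it is an illustration under stated assumptions rather than the authors' code. The function name is hypothetical, all four cells are assumed to be non-zero, and the counts in the usage example are invented for illustration (they are not study data). For a single binary symptom, the Wald standard error of log(OR) from logistic regression equals the square root of the sum of the reciprocal cell counts, which is what the sketch uses.

```python
import math


def symptom_statistics(tp, fn, fp, tn):
    """Summarise one symptom against one gold-standard cause of death.

    tp: deaths from the cause, symptom present
    fn: deaths from the cause, symptom absent
    fp: deaths from other causes, symptom present
    tn: deaths from other causes, symptom absent
    Assumes every cell is non-zero (otherwise the OR is undefined here).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    expected = tp + fp  # E: deaths the symptom alone would assign to the cause
    odds_ratio = (tp * tn) / (fn * fp)
    se_log_or = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
    z = math.log(odds_ratio) / se_log_or
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "E": expected,
        "OR": odds_ratio,
        "wald_p": p_value,
    }


# Hypothetical counts for illustration only (NOT taken from the study).
stats = symptom_statistics(tp=50, fn=27, fp=17, tn=316)
for name, value in stats.items():
    print(name, round(value, 4))
```

Comparing E with the true number of deaths from the cause (tp + fn) shows at a glance whether a symptom used on its own would over- or under-count that cause.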
Table 1 Number of cause-specific deaths in all three sites combined, and in train and test datasets

Cause of death              Train & test        CSMF(a)   Train     Test
                            datasets combined   (%)       dataset   dataset
Tuberculosis/AIDS           148                 18.6      77        71
Malaria                     85                  10.7      43        42
Meningitis                  66                  8.3       34        32
CVS(b) disorders            65                  8.2       33        32
Acute abdominal conditions  55                  6.9       28        27
Diarrhoeal diseases         51                  6.4       26        25
Direct maternal causes      50                  6.3       25        25
Neoplasms                   34                  4.3       18        16
Injuries                    33                  4.1       18        15
Hepatitis                   32                  4.0       16        16
Chronic liver diseases      25                  3.1       13        12
Anaemia                     24                  3.0       12        12
Pneumonia                   23                  2.9       12        11
Renal disorders             21                  2.6       11        10
Tetanus                     13                  1.6       7         6
Rabies                      7                   0.9       4         3

(a) Cause-specific mortality fraction.
(b) Cardiovascular system.

Those symptoms with OR ≥2 (or OR ≤0.5) and Wald P-value ≤0.10, or with Wald P-value ≤0.05, were included in a logistic model. Symptoms that were not statistically significant (Wald P-value >0.10) were dropped from the model in a backward stepwise manner. A score was obtained for every subject by summing the coefficients of the model over the symptoms for that subject, i.e. score = b_1 X_1 + b_2 X_2 + b_3 X_3 + ..., where X_i are the symptoms in the model (coded 1 if the symptom is present and 0 if absent), and b_i are the log(OR) associated with the symptoms. For each cause of death, we identified the cutoff that best separated those who died from this cause (score ≥ cutoff) from the remaining subjects (score < cutoff). Such a cutoff was obtained by comparing the sensitivity, specificity, and CSMF for every possible cutoff. We chose the cutoff that corresponded to an algorithm with sensitivity and specificity at least 90%. If no such algorithm existed (which was usually the case), we chose the cutoff that gave E closest to the true number of cause-specific deaths, such that the sensitivity was at least 50%.

The causes of death and their hierarchy

We obtained data-derived algorithms for all causes of death given in Table 1.
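The scoring rule and cutoff search described under "Data-derived algorithms" can be sketched as follows. This is a minimal illustration, not the authors' implementation. The malaria coefficients and cutoff are the values reported in Table 2; the dictionary keys, function names, and toy data are hypothetical. Two details are assumptions of the sketch: scores are rounded to two decimals so that floating-point noise cannot push a score just below a cutoff it should reach, and when several cutoffs satisfy the 90%/90% rule the one with the largest sensitivity plus specificity is taken (the text does not say how such ties were resolved).

```python
# log(OR_B) values and cutoff for malaria as reported in Table 2.
MALARIA_LOG_OR = {
    "fits_2_to_3_per_day": 1.82,
    "age_35_to_44": 1.19,
    "confused_or_unconscious_1_to_7_days": 0.94,
    "no_abdominal_pain": 0.89,
}
MALARIA_CUTOFF = 1.83


def score(symptoms, log_odds_ratios):
    """Sum the log(OR) of every model symptom that is present."""
    total = sum(b for name, b in log_odds_ratios.items()
                if symptoms.get(name, False))
    return round(total, 2)  # assumption: guard against float noise


def classify(symptoms, log_odds_ratios, cutoff):
    """Assign the cause when the score reaches the cutoff."""
    return score(symptoms, log_odds_ratios) >= cutoff


def choose_cutoff(scores, is_case):
    """Pick a cutoff following the two-step rule described in the text."""
    n_cases = sum(is_case)
    n_others = len(is_case) - n_cases
    candidates = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
        fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= cutoff)
        candidates.append({"cutoff": cutoff, "se": tp / n_cases,
                           "sp": 1 - fp / n_others, "E": tp + fp})
    # Step 1: prefer a cutoff with sensitivity and specificity >= 90%.
    accurate = [c for c in candidates if c["se"] >= 0.9 and c["sp"] >= 0.9]
    if accurate:
        return max(accurate, key=lambda c: c["se"] + c["sp"])  # assumption
    # Step 2: otherwise E closest to the true count, with sensitivity >= 50%.
    eligible = [c for c in candidates if c["se"] >= 0.5]
    return min(eligible, key=lambda c: abs(c["E"] - n_cases))


subject = {"fits_2_to_3_per_day": True, "age_35_to_44": True}
print(score(subject, MALARIA_LOG_OR), classify(subject, MALARIA_LOG_OR, MALARIA_CUTOFF))

# Toy scores and gold-standard labels, invented for illustration only.
toy_scores = [3.01, 1.83, 2.13, 1.82, 0.89, 0.0]
toy_is_case = [True, True, True, False, False, False]
print(choose_cutoff(toy_scores, toy_is_case))
```

Because the lowest candidate cutoff flags every subject, at least one candidate always has 100% sensitivity, so the fallback step never runs out of eligible cutoffs as long as the cause has at least one death.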
Tuberculosis and AIDS are combined because these two diseases often occur in the same subject and have many symptoms in common, so it is unlikely that the VA can discriminate between them. We designed the data-derived algorithms to be hierarchical with two levels. The first level uses the questions on injury (where types of injury recorded included assault, road traffic accident, war injury, dog bite, other animal bite, and accidental poisoning) and suicide. We assumed that subjects who answered "yes" to the question on injuries or suicide would have a cause of death due to injuries (including suicide), rabies, or tetanus. We used all subjects to derive algorithms for causes of death due to injuries, rabies, and tetanus, but we used only the subjects who answered "no" to the questions on injuries and suicide to derive algorithms for the remaining causes of death. The algorithm for each cause of death was applied in turn, and hence any subject could have more than one cause of death. This is in contrast with the hierarchical expert algorithm.

Validation

The causes of death assigned by physician review, the expert algorithm, and the data-derived algorithm were compared with the gold standard diagnosis. The data-derived algorithms were validated on the test dataset, so as to minimize the bias in the estimates of sensitivity, specificity, and CSMF. An algorithm with very high sensitivity and specificity could be used at the individual level (to correctly identify subjects with a particular cause of death). We defined a method as having high diagnostic accuracy for use at the individual level if the sensitivity and specificity were at least 90%. An algorithm is useful at the population level if it accurately estimates the CSMF. It should be noted that the CSMF may be accurately estimated even if many subjects are misclassified (at the individual level), provided that the number of false positives equals the number of false negatives. Conversely, even with reasonably high estimates of sensitivity and specificity, the VA estimates of CSMF can be extremely inaccurate [10,11]. We defined a method as having high diagnostic accuracy for use at the population level if the sensitivity was at least 50%, specificity at least 90%, and CSMF within ±20% of the true value.

Results

Verbal autopsy data were available on 796 adult deaths. Table 2 shows the data-derived algorithms for selected causes of death.
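The link between sensitivity, specificity, and the estimated number of cause-specific deaths, and the population-level criterion defined under "Validation", can be sketched as follows. The function names are hypothetical. The usage example plugs in the test-dataset figures reported in Table 3 for tuberculosis/AIDS model 1a (386 subjects, 71 true deaths, sensitivity 65%, specificity 90%); because those percentages are rounded, the result is only approximately the published count.

```python
def expected_deaths(n_total, n_true, sensitivity, specificity):
    """Deaths a method assigns to a cause: true positives plus false positives."""
    true_positives = sensitivity * n_true
    false_positives = (1 - specificity) * (n_total - n_true)
    return true_positives + false_positives


def high_accuracy_at_population_level(n_total, n_true, sensitivity, specificity):
    """Sensitivity >= 50%, specificity >= 90%, CSMF within +/-20% of truth."""
    estimate = expected_deaths(n_total, n_true, sensitivity, specificity)
    relative_error = abs(estimate - n_true) / n_true
    return sensitivity >= 0.5 and specificity >= 0.9 and relative_error <= 0.20


# Tuberculosis/AIDS model 1a, test dataset (Table 3).
estimate = expected_deaths(n_total=386, n_true=71, sensitivity=0.65, specificity=0.90)
print(round(estimate, 2))
print(high_accuracy_at_population_level(386, 71, 0.65, 0.90))
```

The estimate is accurate here even though roughly 25 true cases are missed, because about 31 false positives drawn from the much larger pool of other deaths nearly offset them. This is the false-positive and false-negative balance noted in the text.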
Table 2 Sensitivity (se), specificity (sp), crude odds ratio (OR_A), and adjusted odds ratio (OR_B) for symptoms in the final logistic models for selected causes of death

Symptom                                       se      sp      OR_A    OR_B    log(OR_B)

Injuries
  Injury AND type of injury NOT dog bite
  AND had injury for ≤30 days                 89      99

Rabies
  Dog bite                                    100     99.75

[Model 1a] Tuberculosis/AIDS if NOT suicide/injury AND score ≥3.59
  Had tuberculosis or HIV/AIDS                65      95      34.40   23.36   3.15
  No vomiting or vomiting for ≤1 day          96      14      4.17    4.19    1.43
  Pass stools ≥4 times per day                34      87      3.38    3.70    1.31
  Weight loss                                 81      62      6.70    3.13    1.14
  Cough for ≥3 weeks                          38      94      8.92    2.77    1.02

[Model 1b] (excluding "had tuberculosis") Tuberculosis/AIDS if NOT suicide/injury AND score ≥8.48
  No vomiting or vomiting for ≤1 day          96      14      4.17    19.55   2.97
  No distended abdomen                        91      20      2.70    10.03   2.31
  Cough                                       68      79      7.39    6.87    1.93
  Ill for 1–4 months                          74      76      7.76    5.93    1.78
  Blood in stools                             21      90      2.18    5.83    1.76
  Vomiting                                    42      72      1.73    4.28    1.45
  Pallor                                      49      77      2.93    4.21    1.44
  Fever for ≥1 month                          19      97      6.21    3.59    1.28

[Model 1c] Tuberculosis/AIDS if NOT suicide/injury AND had TB/HIV/AIDS
  Had tuberculosis or HIV/AIDS                57      96

Malaria if NOT suicide/injury AND score ≥1.83
  2–3 fits per day                            16      97      6.49    6.15    1.82
  Age 35–44 years                             31      84      2.44    3.29    1.19
  Confused/loss of consciousness 1–7 days     56      74      3.65    2.55    0.94
  No abdominal pain                           71      44      2.03    2.42    0.89

Meningitis if NOT suicide/injury AND score ≥5.85
  Stiff neck                                  59      94      19.11   14.09   2.65
  No cough                                    94      32      8.33    7.19    1.97
  No pallor                                   94      29      7.14    6.65    1.89
  Continuous fever                            65      73      4.38    4.42    1.49
  Stiff body                                  32      96      10.39   3.44    1.24

CVS(a) disorders if NOT suicide/injury AND score ≥6.47
  Puffiness of face                           39      89      5.37    7.76    2.05
  Cough 3–14 days                             21      94      4.31    7.11    1.96
  Abdominal distension 8–30 days              15      96      4.53    6.73    1.91
  No weight loss                              72      48      2.50    5.99    1.79
  No yellow discolouration of eyes            88      27      2.78    5.40    1.69
  Had hypertension                            38      93      7.87    4.79    1.57
  Shortness of breath                         55      68      2.69    4.45    1.49
  Age ≥45 years                               70      64      4.55    3.79    1.33

Diarrhoeal diseases if NOT suicide/injury AND score ≥11.52
  No cough                                    92      32      5.88    34.51   3.54
  Non-specific(b) abdominal pain              25      93      11.20   20.92   3.04
  Change in amount of urine for 2 days        18      97      5.80    12.48   2.52
  No distended abdomen                        96      19      6.25    12.42   2.52
  Age ≥55 years                               58      74      3.95    10.31   2.33
  Diarrhoea for 4–30 days                     72      87      16.77   8.76    2.17
  Diarrhoea                                   92      71      25.34   7.39    2.00
  Cramps in abdomen                           62      80      6.13    3.62    1.29

The log(OR_B) for symptoms in the model are used to obtain a score. For example, looking at malaria, if the subject has 2–3 fits per day and is aged 35–44 then the score is the sum of the log(OR_B) for these symptoms (1.82 + 1.19 = 3.01), which is greater than 1.83, so the subject is classified as a malaria death.
(a) Cardiovascular system.
(b) Refers to abdominal pain that is not cramps, dull ache, or burning pain.

Data-derived algorithms

There were seven deaths due to rabies (CSMF = 0.9%). In the univariate analysis, dog bite had a sensitivity of 100% (4/4) and specificity of 99.75% (403/404). We formed an algorithm using only dog bite rather than exploring logistic regression. When validated on the test dataset (Table 3), the algorithm slightly overestimated the true number of deaths due to rabies (4 versus 3).

There were 33 deaths due to injuries (CSMF = 4.1%). In the univariate analysis, the question on injuries had a sensitivity of 94% and specificity of 97%. We formed an algorithm using only the questions on injury rather than exploring logistic regression.
The algorithm assigned a cause of death as injuries if the subject had an injury, the type of injury was not a dog bite, and they had the injury for ≤30 days before their death. When validated on the test dataset (Table 3), the algorithm slightly overestimated the true number of deaths due to injuries (16 versus 15).

There were 148 deaths due to tuberculosis/AIDS (defined as tuberculosis, AIDS, or tuberculosis and AIDS) (CSMF = 18.6%). In the univariate analysis, cough had a sensitivity of 68% and specificity of 79%. The respondent's report that the deceased had HIV/AIDS had a sensitivity of 15% and specificity of 99.7%. The specificity of "had tuberculosis" was also very high (96%), although the sensitivity was low (51%). The sensitivity and specificity of these two questions combined were 65% and 95% respectively. Model 1a (Table 2) was based on five symptoms, most notably "had tuberculosis or HIV/AIDS". The algorithm assigned a cause of death as tuberculosis/AIDS if the score was ≥3.59, for example, if the subject had tuberculosis or HIV/AIDS plus any other symptom in the model. When validated on the test dataset (Table 3), the algorithm overestimated the true number of tuberculosis/AIDS deaths (78 versus 71). Model 1b was obtained without the variable "had tuberculosis and/or HIV/AIDS", since this question may not be reliable in settings where there is little knowledge about HIV infection, or where few people have undergone an HIV test. When validated on the test dataset, the algorithm overestimated the true number of tuberculosis/AIDS deaths (90 versus 71). Model 1c was obtained using only the variable "had tuberculosis and/or HIV/AIDS". When validated on the test dataset, the algorithm underestimated the true number of tuberculosis/AIDS deaths (38 versus 71).

There were 85 deaths due to malaria (CSMF = 10.7%), but only 42 were confirmed with a positive blood slide.
A data-derived algorithm was obtained that discriminated between the 22 confirmed malaria deaths in the test dataset and the remaining subjects (unconfirmed malaria and all other causes of death). Fever had a sensitivity of 67% and a specificity of 34% and was not strongly or significantly associated with malaria deaths (OR = 1.12, P = 0.9). Fever for 1–7 days was strongly associated with malaria deaths (OR = 2.68, P = 0.04) and had high specificity (83%) but poor sensitivity (33%); it did not remain in the final model. The four symptoms in the final model for confirmed malaria were fitted in a logistic model based on all malaria deaths (Table 2). When validated on the test dataset (Table 3), the algorithm overestimated the true number of malaria deaths (89 versus 42).

Table 3 Number of cause-specific deaths, sensitivity (se), and specificity (sp) for physician review, expert algorithm, and data-derived algorithms(a)

                          Train & test datasets combined                   Test dataset
Cause of death            O      E_P    se_P   sp_P   E_E    se_E   sp_E   O      E_L    se_L   sp_L
TB/AIDS (model 1a)        148    151**  76     94     174*   68     89     71     78**   65     90
TB/AIDS (model 1b)        148    151**  76     94     174*   68     89     71     90     44     81
TB/AIDS (model 1c)        148    151**  76     94     174*   68     89     71     38     44     98
Malaria                   85     80**   33     93     84**   19     90     42     89     48     80
Meningitis                66     60**   59     97     68**   38     94     32     33**   53     95
CVS(b) disorders          65     57*    48     96     47     25     96     32     32**   34     94
AAC(c)                    55     65*    69     97     57**   46     96     27     18     37     98
Diarrhoeal diseases       51     58*    61     96     46*    43     97     25     30*    48     95
Direct maternal causes    50     58*    82     98     48**   48     98     25     18     52     97
Neoplasms                 34     36**   50     98     11     6      98.8   16     24     19     94
Injuries                  33     36**   97     99.5   42     94     98.6   15     16**   80     99
Hepatitis                 32     14     34     99.6   24     16     98     16     10     0      97
Chronic liver diseases    25     29*    40     98     23**   28     98     12     14*    8      97
Anaemia                   24     12     33     99.5   9      17     99.4   12     25     25     94
Pneumonia                 23     19*    39     98.7   15     9      98     11     14     27     97
Renal disorders           21     22**   38     98     22**   29     98     10     10**   10     98
Tetanus                   13     10     77     100    9      23     99.2   6      5*     17     99
Rabies                    7      7      86     99.9   9      100    99.7   3      4      100    99.7

(a) O = observed number of cause-specific deaths. Expected number of cause-specific deaths using physician review (E_P), expert algorithm (E_E), and logistic regression algorithm (E_L). Sensitivity and specificity of physician review (se_P and sp_P), expert algorithm (se_E and sp_E), and logistic regression algorithm (se_L and sp_L).
(b) Cardiovascular system.
(c) Acute abdominal conditions.
** CSMF estimated to within 10% of its true value.
* CSMF estimated to within 10–20% of its true value.

There were 66 deaths due to meningitis (CSMF = 8.3%). In the univariate analysis, fever had a high sensitivity (94%) but a low specificity (36%) and did not remain in the final model. The final model for meningitis was based on five symptoms (Table 2). When validated on the test dataset (Table 3), the algorithm slightly overestimated the true number of meningitis deaths (33 versus 32).

There were 65 deaths due to CVS (cardiovascular system) disorders (CSMF = 8.2%). The final model for CVS disorders was based on eight symptoms.
When validated on the test dataset (Table 3), the algorithm correctly estimated the true number of CVS deaths (32).

There were 51 deaths due to diarrhoeal diseases (CSMF = 6.4%). In the univariate analysis, diarrhoea was strongly associated with diarrhoeal deaths (OR = 25.34, P < 0.01) and had a sensitivity of 92% and specificity of 71%; it remained in the final model (Table 2). Blood in the stools was strongly associated with diarrhoeal deaths in the univariate analysis (OR = 9.63, P < 0.001) but it did not remain in the final model. The final model for diarrhoeal deaths was based on eight symptoms (Table 2). When validated on the test dataset (Table 3), the algorithm overestimated the true number of diarrhoeal deaths (30 versus 25).

Validation

Table 4 compares the CSMF for different methods of assigning all causes of death, including those causes of death for which data-derived algorithms have not been presented. The number of deaths for some causes was too small to allow precise estimation of diagnostic accuracy. For example, in the test dataset, there were three rabies deaths, and the data-derived algorithm estimated four, which overestimated the true CSMF by 33%, even though sensitivity was 100% and specificity was 99.7% (Table 3).
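The rabies example shows how a single misclassified subject dominates the relative error when a cause is rare. A short sketch of the arithmetic, using the test-dataset figures from the text and Table 3 (386 subjects, three rabies deaths, four assigned by the algorithm); the variable and function names are hypothetical:

```python
def relative_error(estimated, true):
    """Signed relative difference between estimated and true counts."""
    return (estimated - true) / true


n_test = 386          # subjects in the test dataset
true_rabies = 3       # gold-standard rabies deaths
assigned_rabies = 4   # deaths assigned to rabies by the algorithm

# With 100% sensitivity all three true cases are found, so the surplus
# assignment is the only false positive.
false_positives = assigned_rabies - true_rabies
specificity = 1 - false_positives / (n_test - true_rabies)

print(f"relative error in CSMF: {relative_error(assigned_rabies, true_rabies):.0%}")
print(f"specificity: {specificity:.1%}")
```

One false positive among 383 non-rabies deaths leaves specificity at about 99.7%, yet it shifts the estimated CSMF by a third because the denominator is only three deaths.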
In general, the data-derived algorithms performed as well as the expert algorithms but not as well as physician review. Physician review had high diagnostic accuracy for use at the population level for many causes of death. The expert algorithm had high diagnostic accuracy for use at the population level for tuberculosis/AIDS, and possibly for acute abdominal conditions, diarrhoeal diseases, direct maternal causes, and rabies. The data-derived algorithms had high diagnostic accuracy for use at the population level for tuberculosis/AIDS, meningitis, injuries, and possibly rabies and diarrhoeal diseases.

Table 4 Comparison of cause-specific mortality fractions (CSMF) for physician review, expert algorithm, and data-derived algorithms

Cause of death            True    Physician   Expert      Data-derived
                                  review      algorithm   algorithm
Tuberculosis/AIDS         18.6    19.0 (1)    21.9 (2)    20.2 (1)
Malaria                   10.7    10.1 (1)    10.6 (1)    23.1 (4)
Meningitis                8.3     7.5 (1)     8.5 (1)     8.5 (1)
CVS(a) disorders          8.2     7.2 (2)     5.9 (3)     8.3 (1)
AAC(b)                    6.9     8.2 (2)     7.2 (1)     4.7 (3)
Diarrhoeal diseases       6.4     7.3 (2)     5.8 (2)     7.8 (2)
Direct maternal causes    6.3     7.3 (2)     6.0 (2)     4.7 (3)
Neoplasms                 4.3     4.5 (1)     1.4 (4)     6.2 (3)
Injuries                  4.1     4.5 (1)     5.3 (3)     4.1 (1)
Hepatitis                 4.0     1.8 (4)     3.0 (3)     2.6 (4)
Chronic liver diseases    3.1     3.6 (2)     2.9 (1)     3.6 (2)
Anaemia                   3.0     1.5 (4)     1.1 (4)     6.5 (4)
Pneumonia                 2.9     2.4 (2)     1.9 (3)     3.6 (3)
Renal disorders           2.6     2.8 (1)     2.8 (1)     2.6 (1)
Tetanus                   1.6     1.3 (3)     1.1 (3)     1.3 (4)
Rabies                    0.9     0.9 (1)     1.1 (3)     1.0 (3)

(a) Cardiovascular system.
(b) Acute abdominal conditions.
Numbers in parentheses show the relative difference between the CSMF as given by the gold standard and the CSMF as estimated by each method, coded as: 1: <10%, 2: 10–19%, 3: 20–50%, 4: >50%.

Discussion

The present study develops the methodology given in Quigley et al. [9] as a means of assigning causes of adult deaths. Our study is based on a large number of subjects (n = 796), a large number of symptoms ascertained using the VA (n = 88), and has a reliable gold standard diagnosis.

The main limitation of our study is that the data-derived algorithms have not been properly validated. The data-derived algorithms are likely to perform better on the test dataset than they would on other datasets, because the train and test datasets were drawn from the same population. The main shortcoming of data-derived algorithms is that they might only discriminate well between causes of death in the population from which they are derived, and therefore, they may have poor repeatability. For example, two symptoms in our data-derived algorithm for meningitis (stiff neck and continuous fever) appear in our expert algorithm, and clinical experience suggests that these symptoms might be associated with meningitis deaths in other settings. However, "no cough" and "no pallor" appear in our data-derived algorithm merely because of strong associations between the presence of these symptoms and other causes of death (i.e. tuberculosis/AIDS and anaemia respectively) in our dataset.
It should be noted that, owing to the hierarchical nature of our expert algorithm, "no cough" appears in the expert algorithm for meningitis because meningitis is assigned only after the exclusion of tuberculosis/AIDS. Poor repeatability of the data-derived algorithms is a particular problem for causes of death with small numbers. We identified algorithms for causes of death with small numbers, but none except the algorithm for rabies performed well, even in the test dataset. Furthermore, although the algorithm for rabies is encouraging, the numbers are too small to allow proper validation.

Another limitation of our study, which applies to all VA validation studies, is that the performance of the VA has been assessed in subjects who died in hospital. The VA may perform differently when applied to subjects who did not die in hospital, though clearly there will be no reliable gold standard outside the hospital setting.

The VA technique is appropriate for causes of death that comprise a well-defined and distinct group of subjects who share symptoms that the rest of the population does not experience. This applies to causes of death due to meningitis and injuries, which tend to perform well under VA in adults [1] and children [2,3]. In our study, the VA also performed well for deaths due to tuberculosis/AIDS. This result is consistent with findings from a study in Uganda, in which the CSMF for HIV infection was 47% using physician review, compared to 50% using HIV serostatus [12]. Although many of the symptoms associated with tuberculosis and AIDS occur in subjects with other infectious diseases, we found combinations of symptoms that discriminated between deaths due to tuberculosis/AIDS and other causes of death. However, the overlap in symptoms in subjects with tuberculosis, AIDS, or tuberculosis and AIDS makes it unlikely that the VA can discriminate between these diagnoses.
Furthermore, our data-derived algorithms for assigning deaths due to tuberculosis only, AIDS only, and tuberculosis with AIDS did not perform well (data not shown). All symptoms in our algorithm (model 1a) except "no vomiting or vomiting for ≤1 day" appear in our expert algorithm. The high discriminating power of duration of vomiting arises because only 6% of subjects who vomited for more than one day died of tuberculosis/AIDS, compared to 29% of subjects who vomited for ≤1 day and 16% of subjects with no vomiting. Clearly, this symptom needs validation in other settings.

A study in Tanzania used our VA questionnaire in 178 adult deaths with known HIV serostatus [13]. Few respondents mentioned HIV on the VA questionnaire. A wide range of symptoms were associated with HIV serostatus, although neither any single symptom nor the WHO AIDS case definition was able to discriminate effectively between HIV-positive and HIV-negative subjects. However, a simplified AIDS case definition had better discrimination, which suggests that case definitions should be validated and modified using data-derived methods in each setting.

In a study of maternal deaths in Bangladesh [14], the VA performed better under certain conditions: in a study of maternal mortality rather than under routine surveillance; when the interviewer was female; when the questionnaire combined open and semi-structured parts; and when the diagnosis was assigned by physicians rather than medical assistants. Most of our interviewers were male, and our study did not focus on maternal deaths, but our VA tool performed well for direct maternal deaths.

The poor performance of the VA for deaths due to malaria in our study and in child VA studies [2,7] may be because malaria symptoms tend to be non-specific. Alternatively, the gold standard diagnosis for malaria may be less than perfect. In our study, malaria deaths were confirmed with a positive blood slide for only 50% of cases.
The main advantage of physician review over algorithms is that all sections of the VA questionnaire, even the open-ended questions and comments, are utilized. This worked particularly well for maternal deaths and neoplasms, each of which comprised several specific causes of death with few symptoms in common. Maternal deaths comprised abortion, eclampsia, haemorrhage, sepsis, and obstructive labour; neoplasms comprised all cancers. We did not identify algorithms for these specific subgroups because of small numbers, but instead found algorithms for the broad causes given above. There are, however, two main disadvantages of physician review compared to algorithms. First, it is very time-consuming: it is estimated that it takes 20–30 physician-minutes to review each VA questionnaire [1]. Second, it is a subjective approach, which means that it may have poor repeatability and it is difficult to compare CSMF between sites.

Our study suggests that all three methods may be used to correctly classify individuals with causes of death due to injuries and rabies. Physician review may also be used to correctly classify individuals with deaths due to maternal causes and possibly tuberculosis/AIDS. Physician review had high diagnostic accuracy for use at the population level for many causes of death. The expert algorithms did not perform as well, but might have adequate diagnostic accuracy for use at the population level for tuberculosis/AIDS, acute abdominal conditions, diarrhoeal diseases, and maternal causes. Our proposed data-derived algorithms for tuberculosis/AIDS, meningitis, diarrhoeal diseases, injuries, and rabies look promising for use at the population level. Further validation is required to ensure that the symptoms in the proposed algorithms are associated with the cause of death in other settings, and that the cutoffs for the data-derived algorithms are appropriate.

Acknowledgements

This study was supported by the Department for International Development, UK. We are grateful to Gillian Maude. Maria Quigley is supported by the Medical Research Council, UK.

References

1. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ. Verbal autopsies for adult deaths: their development and validation in a multicentre study. Trop Med Int Health 1998;3:436–46.
2. Snow RW, Armstrong JRM, Forster D et al. Childhood deaths in Africa: uses and limitations of verbal autopsies. Lancet 1992;8:351–55.
3. Sachdev HPS et al. Cited in: Measurement of overall and cause-specific mortality in infants and children: Memorandum from a WHO/UNICEF meeting. Bull World Health Organ 1994;72:707–13.
4. Mirza NM, Macharia WM, Wafula EM et al. Verbal autopsy: a tool for determining cause of death in a community. E Afr Med J 1990;67:693–98.
5. Kalter HD, Gray RH, Black RE, Gultiano SA. Validation of postmortem interviews to ascertain selected causes of deaths in children. Int J Epidemiol 1990;19:380–86.
6. Osinski P. Cited in: Measurement of overall and cause-specific mortality in infants and children: Memorandum from a WHO/UNICEF meeting. Bull World Health Organ 1994;72:707–13.
7. Mobley CC. Cited in: Ewbank DC, Gribble JN (eds). Effects of Health Programs on Child Mortality in Sub-Saharan Africa. Washington, USA: National Academic Press, 1993, p.20.
8. Pacque-Margolis S, Pacque M, Dukuly Z et al. Application of the verbal autopsy during a clinical trial. Soc Sci Med 1990;5:585–91.
9. Quigley MA, Armstrong Schellenberg JRM, Snow RW. Algorithms for verbal autopsies: a validation study in Kenyan children. Bull World Health Organ 1996;74:147–54.
10. Anker M. The effect of misclassification error on reported cause-specific mortality fractions from verbal autopsy. Int J Epidemiol 1997;26:1090–96.
11. Maude GH, Ross DA. The effect of different sensitivity, specificity and cause-specific mortality fractions on the estimation of differences in cause-specific mortality rates in children from studies using verbal autopsy. Int J Epidemiol 1997;26:1097–06.
12. Kamali A, Wagner H-U, Nakiyingi J, Sabiiti I, Kengeya-Kayondo JF, Mulder DW. Verbal autopsy as a tool for diagnosing HIV-related adult deaths in rural Uganda. Int J Epidemiol 1996;25:679–84.
13. Todd J, Balira R, Grosskurth H et al. HIV-associated mortality in a rural Tanzanian population. AIDS 1997;11:801–07.
14. Ronsmans C, Vanneste AM, Chakraborty J, Van Ginneken J. A comparison of three verbal autopsy methods to ascertain levels and causes of maternal deaths in Matlab, Bangladesh. Int J Epidemiol 1998;27:660–66.