Estimating the distribution of the window period for recent HIV infections: A comparison of statistical methods

Research Article
Received 30 September 2009, Accepted 15 March 2010
Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.3941

Michael J. Sweeting (a), Daniela De Angelis (a, b), John Parry (b) and Barbara Suligoi (c)

(a) MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 0SR, U.K.
(b) Health Protection Agency Centre for Infections, 61 Colindale Avenue, London NW9 5EQ, U.K.
(c) National AIDS Unit, Department of Infectious Diseases, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161 Rome, Italy

Correspondence to: Michael J. Sweeting, MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 0SR, U.K. E-mail: michael.sweeting@mrc-bsu.cam.ac.uk
Contract/grant sponsor: MRC; contract/grant number: U.1052.00.007
Contract/grant sponsor: UK Department of Health; contract/grant number: AIDB 2/29

In the past few years a number of antibody biomarkers have been developed to distinguish between recent and established Human Immunodeficiency Virus (HIV) infection. Typically, a specific threshold/cut-off of the biomarker is chosen, values below which are indicative of recent infections. Such biomarkers have attracted considerable interest as the basis for incidence estimation using a cross-sectional sample. An estimate of HIV incidence can be obtained from the prevalence of recent infection, as measured in the sample, and knowledge of the time spent in the recent infection state, known as the window period. However, such calculations are based on a number of assumptions concerning the distribution of the window period. We compare two statistical methods for estimating the mean and distribution of a window period using data on repeated measurements of an antibody biomarker from a cohort of HIV seroconverters. The methods account for the interval-censored nature of both the date of seroconversion and the date of crossing a specific threshold. We illustrate the methods using repeated measurements of the Avidity Index (AI) and make recommendations about the choice of threshold for this biomarker so that the resulting window period satisfies the assumptions for incidence estimation. Copyright 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 3194–3202

Keywords: HIV incidence; biomarker; mixed-effects models; window period

1. Introduction

Incidence estimation has long been the holy grail of Human Immunodeficiency Virus (HIV) epidemiological research. Estimates of incidence are needed to monitor ongoing transmission, to evaluate interventions aimed to reduce transmission and to plan resource allocation for prevention.

Cohort studies that follow up uninfected individuals, the gold standard for incidence estimation, are expensive to run and can be subject to observational biases due to selection and follow-up adherence [1]. Thus, historically, much effort has been put into the development of indirect methods of estimation [2–6]. Methods based on snapshot, or cross-sectional, sampling [1, 5, 6] have attracted considerable interest in recent years as laboratory methods, based on characteristics of the antibody response soon after infection, are being continuously developed to identify recent infections.

The idea underlying these methods, or at least their simplified version, is as follows. Let $d$ be the date on which a cross-sectional survey is conducted and the sampled individuals are tested for HIV and classified as negative or positive and, among the positive, as recently infected or not according to the measured level of a chosen biomarker. The prevalence of recent infection $P(d)$ at date $d$ can be expressed in terms of the incidence density of HIV at time $t$, $I(t)$, as

$$P(d) = \int_0^d I(t)\, S(d - t)\, dt, \tag{1}$$

where $S(t)$ is the survival function of the time spent in the recent infection state, the so-called window period. HIV incidence $I(t)$ is commonly estimated from (1) under two assumptions. First, there exists a maximum window period $w_m$ such that $S(w_m) = 0$, and second the incidence is constant over the past $w_m$ years, that is over the calendar period $[d - w_m, d]$. Under these assumptions, Equation (1) simplifies as follows:

$$P(d) = I \int_{d - w_m}^{d} S(d - t)\, dt = I \int_0^{w_m} S(x)\, dx = I\mu, \tag{2}$$

where $\mu$ is the mean of the window period distribution (see [1] for more details). The problem of estimating $I$ then becomes that of using a cross-sectional (random) sample to estimate the prevalence of those recently infected, and to acquire the necessary knowledge of $\mu$. Owing to the assumptions underlying Equation (2) it is, therefore, undesirable for $w_m$ to be too large and hence the distribution of the window period to have a long tail.

In the last 10 years a number of assays have been proposed to detect recent infections. The original procedure involved testing individuals using Sensitive/Less Sensitive (S/LS) commercial antibody assays (e.g. 3A11-LS, LS EIA), in order to detect differential HIV titre [7]. More recently a biomarker has been proposed based on the principle that antibodies produced early after infection bind less strongly to the antigen than those produced in established infection [8]. The avidity of the antibodies to bind to the antigen can be measured using the Avidity Index (AI). The AI is calculated by dividing the sample-to-cutoff (S/CO) ratio from a low-avidity sample treated with guanidine by the S/CO ratio from a control sample, more details of which can be found in [9]. For early infection, weak binding causes the level of antibodies in the treated sample to be less than that in the control, and hence the AI takes values less than one. For more established infection, antibody levels in the two samples are similar and hence the AI approaches a value of one.
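As a worked illustration of Equation (2), the relation P = I μ can be turned around to give I = P / μ. The sketch below is only that rearrangement; the function name and the survey counts are hypothetical, and it ignores the sampling uncertainty and denominator adjustments that a real incidence analysis would need.

```python
def incidence_from_prevalence(n_recent: int, n_sampled: int, mean_window_years: float) -> float:
    """Rearranged Equation (2): I = P / mu.

    n_recent / n_sampled is the sample prevalence of recent infection, and
    mean_window_years is the mean window period mu, in years, so the result
    is an incidence per person-year. Hypothetical helper, not from the paper.
    """
    if n_sampled <= 0 or mean_window_years <= 0:
        raise ValueError("sample size and mean window period must be positive")
    prevalence = n_recent / n_sampled
    return prevalence / mean_window_years


# Hypothetical survey: 12 recent infections among 1000 people sampled,
# with an assumed mean window period of half a year.
example_incidence = incidence_from_prevalence(12, 1000, 0.5)
print(example_incidence)
```

The same prevalence with a longer assumed mean window period gives a lower incidence estimate, which is why a misjudged μ feeds directly into the result.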
Conditionally on the choice of a specific threshold, commonly 0.8, individuals with measured AI below the threshold are classified as recently infected and the window period is the time spent below the chosen threshold. It is clear that the window period is a fundamental ingredient in the estimation of HIV incidence. It depends on the rate of antibody response and hence can vary considerably between individuals. By raising or lowering the associated threshold, the window period can be lengthened or shortened, respectively. If it is too short very few individuals are classified as recently infected, resulting in a loss of precision for incidence estimation; too long and the assumption of a constant incidence is no longer viable. Hence knowledge about the distribution of the window period, not just its mean, is essential. Despite this, it is commonly the case that a threshold is chosen based on the diagnostic accuracy of classifying individuals as recently infected, where true recency is defined as a certain period post-seroconversion, rather than based on the resulting distribution of the window period.

The aim of this paper is to illustrate two statistical methods for estimating the distribution of threshold-specific window periods. The first method (Section 2) implements a doubly censored survival analysis approach to obtain a non-parametric estimate of the window period distribution. The second method (Section 3) is based on modelling the individual growth curves of the biomarker using mixed-effects models, and inverting the functional relationship to obtain estimates of the window period distribution. In Section 4, we apply the methods to data from a cohort of HIV infected individuals. For each individual AI measurements are available longitudinally and the dates of the last negative and first positive HIV antibody test are known. Finally in Section 5 we make recommendations for the choice of threshold associated with the AI assay so that the resulting window period distribution is likely to satisfy the assumptions required for Equation (2).

2. Estimation using non-parametric survival analysis

Suppose data on $n$ individuals consist of the dates of the last negative and the first positive test results, as established using the standard enzyme immunoassay, and repeated measurements of an antibody biomarker. For individual $i$ we have dates $d_i^{-ve}$, $d_i^{+ve}$ and a sequence of $m_i$ measurements $y_{ij}$, taken at times $t_{ij}$, measured from the first positive test date $d_i^{+ve}$. Note that the interval $(d_i^{-ve}, d_i^{+ve}]$ is the (seroconversion) interval within which individual $i$ has seroconverted. The aim is, for a given biomarker threshold $\alpha$, to estimate the distribution of the time from seroconversion till the biomarker crosses $\alpha$ (Figure 1).

Let $X_i$ and $Z_i$ denote the unknown date of seroconversion and crossing the threshold, respectively. For individual $i$ we know that $X_i \in (d_i^{-ve}, d_i^{+ve}] \equiv (x_i^L, x_i^U]$. Further, if the growth of the biomarker is assumed to be monotonically increasing with no measurement error, then we also know that $Z_i \in (d_i^{+ve} + t_{i,k-1}, d_i^{+ve} + t_{ik}] \equiv (z_i^L, z_i^U]$, where the $k$th measurement is the first time after testing positive that the biomarker is observed above the threshold. If an individual is observed to be above the threshold on their first measurement, then the only information is that $Z_i \in (d_i^{-ve}, d_i^{+ve} + t_{i1}]$. Conversely, if an individual is never observed to be above the threshold then $Z_i$ is right censored and $Z_i \in (d_i^{+ve} + t_{i m_i}, \infty)$.

Figure 1. Typical data available from an individual with repeated biomarker measurements. The window period is defined as the unknown time from seroconversion to crossing the threshold, $\alpha$.

The window period for threshold $\alpha$ is defined as $T_i = Z_i - X_i$. To estimate the distribution of the window period correctly the bivariate density for $(X, Z)$ needs to be modelled, from which the distribution for $T$ can be derived. Similar techniques have been used to estimate the time from seroconversion to AIDS [10, 11]. A univariate survival analysis of the interval-censored data $T_i \in (z_i^L - x_i^U, z_i^U - x_i^L]$ assumes an incorrect likelihood and hence such an approach should be avoided [12], although Reich et al. [13] find that such an approach can provide reliable estimates of the median. For each individual, the pair $(x_i, z_i)$ is known to lie within the region $R_i = (x_i^L, x_i^U] \times (z_i^L, z_i^U]$ and hence the likelihood of the observations for $n$ individuals is

$$L = \prod_{i=1}^{n} \iint_{R_i} m(x, z; \theta)\, dx\, dz,$$

where $m(x, z; \theta)$ is the joint density of $(X, Z)$ given parameters $\theta$. To obtain the non-parametric maximum likelihood estimate (NPMLE) of $m(x, z)$, De Gruttola and Lagakos [10] generalized the self-consistency algorithm of Turnbull [14] for singly censored univariate data. As an example, Figure 2 shows rectangles $R_i$ from six fictional individuals to illustrate where the NPMLE assigns mass. The shaded regions with bold outline show where the NPMLE mass lies. Gentleman and Vandal [12] used concepts from graph theory to show that all the mass associated with the NPMLE lies within the maximal intersections of the rectangles $R_1, \ldots, R_n$. One difficulty for bivariate interval-censored data is that the NPMLE estimate may be non-unique. Representational non-uniqueness occurs whenever the maximal intersections of the rectangles are not points, as is the case in Figure 2.
The NPMLE does not indicate how the mass within each intersection should be assigned. This can be extremely problematic if much of the data are right censored in one dimension, as occurs with line segment F. The probability mass must then be assigned over an infinite line segment. To overcome this non-uniqueness one must make some parametric assumption. The mass could be distributed parametrically over the intersections (e.g. uniformly), or all mass could be placed at a single point, for example at the midpoint of each region. For estimation of the window period $T = Z - X$ one useful approach is to assign all the mass to either the bottom right-hand corner or the top left-hand corner of each maximal intersection region. Doing this allows us to obtain an upper and lower limit, respectively, for the cumulative distribution function (CDF) of $T$. Such an approach is demonstrated in Section 4 where we obtain upper and lower limits for the window period CDF from an HIV cohort with longitudinal AI measurements.
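The corner-assignment idea can be sketched in a few lines. The code below assumes the maximal intersection rectangles and their NPMLE masses have already been computed elsewhere; it does not compute the NPMLE itself. The rectangles, masses and function name are hypothetical. Placing each mass at the bottom right-hand corner (largest x, smallest z) gives the smallest possible window period and hence the upper CDF limit, while the top left-hand corner gives the lower limit.

```python
import math
from typing import List, Tuple

# A maximal intersection rectangle: (x_lo, x_hi, z_lo, z_hi), dates in years.
Rect = Tuple[float, float, float, float]


def cdf_bounds(rects: List[Rect], masses: List[float], t: float) -> Tuple[float, float]:
    """Lower and upper limits for P(T <= t), where T = Z - X.

    Upper limit: mass at the bottom right-hand corner, so T = z_lo - x_hi.
    Lower limit: mass at the top left-hand corner, so T = z_hi - x_lo.
    """
    upper = sum(p for (x_lo, x_hi, z_lo, z_hi), p in zip(rects, masses) if z_lo - x_hi <= t)
    lower = sum(p for (x_lo, x_hi, z_lo, z_hi), p in zip(rects, masses) if z_hi - x_lo <= t)
    return lower, upper


# Hypothetical example: the second region is right censored in z,
# so its top left-hand corner sits at infinity.
rects = [(0.0, 0.2, 0.5, 0.7), (0.1, 0.3, 1.0, math.inf)]
masses = [0.6, 0.4]

for t in (0.6, 0.8):
    print(t, cdf_bounds(rects, masses, t))
```

A right-censored region never contributes to the lower limit, which is why heavy right censoring can leave the two limits far apart.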

Figure 2. Illustration of 6 individuals with unknown date of seroconversion and unknown date of crossing threshold. Each rectangle represents the region in which the point (x, z) is known to lie for that individual. The shaded areas show the regions where all the mass of the NPMLE estimate lies.

3. Modelling the growth of the biomarker

The survival analysis approach presented in Section 2 does not utilize all the repeated biomarker measurements that are available. Furthermore, it does not allow for measurement error in the biomarker process, since the date of crossing the threshold is assumed to lie between the last date observed to be below and the first date observed to be above the threshold. In addition, the NPMLE is likely to be non-unique and hence a parametric assumption must be made. This warrants an alternative parametric approach where the growth of the biomarker is modelled. To illustrate, we develop a growth model for the AI, but similar parametric models can be developed for other antibody biomarkers.

It shall be assumed, with some biological rationale, that the underlying (latent) antibody response to HIV infection increases monotonically over a period of time. Furthermore, for established infection we assume the AI to approach a value of one. These two observations lead us to consider a non-linear monotonically increasing function with an asymptote in which to model the growth of the AI over time. Let $T_{ij} = \tau_i + t_{ij}$ be the unknown time from seroconversion to the $j$th measurement, where $\tau_i$ is the unknown time from seroconversion to the first positive date for the $i$th individual. Then the growth of the AI can be modelled using the following mixed-effects model:

$$y_{ij} = \phi_{0i} + (\phi_{1i} - \phi_{0i}) \exp(-\exp(\phi_{2i})\, T_{ij}) + \varepsilon_{ij}. \tag{3}$$

This three-parameter non-linear function is monotonically increasing, and approaches an asymptote, $\phi_{0i}$, as time tends to infinity. The parameter $\phi_{1i}$ is the intercept and is interpreted as the value of the AI at seroconversion, and $\phi_{2i}$ is the logarithm of the rate constant. The $\varepsilon_{ij} \sim N(0, \sigma_w^2)$ are normally distributed error terms.
The parameters $\phi_{0i}$, $\phi_{1i}$, and $\phi_{2i}$ are specified as random-effects, to allow between person variability, although in Section 4 we shall consider whether each can be modelled as a fixed-effect. The random-effects are modelled using a multivariate Normal distribution, which allows us to borrow strength between individuals,

$$(\phi_{0i}, \phi_{1i}, \phi_{2i})^T \sim N_3((\mu_0, \mu_1, \mu_2)^T, \Sigma_b),$$

where $\Sigma_b$ is an unstructured variance-covariance matrix.

The final requirement is the specification of $\tau_i$ for each individual. One assumption is that the seroconversion date is exactly at the midpoint of the seroconversion interval, and hence $\tau_i = (d_i^{+ve} - d_i^{-ve})/2$. This however disregards any uncertainty about the date of seroconversion. Hence we will label the model using this assumption as the naïve model. More realistically a distribution can be placed on $\tau_i$, and without further knowledge about testing strategies or AI measurements, a sensible a priori belief is that $\tau_i \sim \mathrm{Uniform}(0, d_i^{+ve} - d_i^{-ve})$ (i.e. that seroconversion is equally likely to occur at any time during the seroconversion interval). We shall label this model the uniform prior model. Since our beliefs about the distribution of $\tau_i$ may change once we model the growth of the AI measurements, it is intuitive to perform such analyses from a Bayesian viewpoint. In Section 4 we shall investigate the use of both the naïve and uniform prior models in this framework.
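To make the roles of the three curve parameters and of τ concrete, the sketch below simulates AI measurements for a single individual under Equation (3). The names phi0, phi1 and phi2 (asymptote, intercept, log-rate) are labels chosen for this sketch, the parameter values are hypothetical rather than fitted, and time is assumed to be measured in years. It only simulates from the model and does not fit it.

```python
import math
import random


def mean_ai(t_since_sc: float, phi0: float, phi1: float, phi2: float) -> float:
    """Latent AI curve of Equation (3), without the measurement error term."""
    return phi0 + (phi1 - phi0) * math.exp(-math.exp(phi2) * t_since_sc)


def simulate_individual(t_obs, interval_len, phi, sigma_w, rng, naive=False):
    """Draw tau and noisy AI values at times t_obs after the first positive test.

    naive=True fixes tau at the midpoint of the seroconversion interval;
    otherwise tau is drawn from Uniform(0, interval_len).
    """
    tau = interval_len / 2 if naive else rng.uniform(0.0, interval_len)
    ys = [mean_ai(tau + t, *phi) + rng.gauss(0.0, sigma_w) for t in t_obs]
    return tau, ys


# Hypothetical individual: asymptote 1.0, AI of 0.35 at seroconversion,
# log-rate 1.0, and a seroconversion interval of 0.3 years.
rng = random.Random(1)
tau, ys = simulate_individual([0.0, 0.5, 1.0], 0.3, (1.0, 0.35, 1.0), 0.08, rng)
print(round(tau, 3), [round(y, 2) for y in ys])
```

Repeating the draw many times with `naive=False` shows how uncertainty in τ alone spreads out the first measurement even when the curve parameters are held fixed.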

3.1. Inverse prediction

The aim is to estimate the window period, i.e. the time from seroconversion to crossing a specified threshold, $\alpha$, using the fitted parametric model. This can be achieved using an inverse prediction technique. For a threshold $\alpha$, the associated window period for individual $i$, $T_i(\alpha)$, can be expressed as a function of the random-effects for patient $i$ using Equation (3):

$$T_i(\alpha; \phi_i) = \log\left(\frac{\phi_{1i} - \phi_{0i}}{\alpha - \phi_{0i}}\right) \exp(-\phi_{2i}). \tag{4}$$

Note that, since we are interested in the time to truly crossing threshold $\alpha$, this prediction does not include the measurement error term. Using the relation (4) the posterior distribution of the window period for individual $i$, $p(T_i(\alpha; \phi_i) \mid y_i)$, can be easily derived using Markov chain Monte Carlo (MCMC) methods, together with the posterior distribution of any function of the $T_i(\alpha)$s. Specifically, the average window period in the given sample can be calculated as $\mu(\alpha; \phi_1, \ldots, \phi_n) = \sum_{i=1}^{n} T_i(\alpha; \phi_i)/n$ and its distribution obtained over the MCMC iterations.

A further use of the parametric model is to obtain a predictive distribution of the window period for a new individual (an out-of-sample prediction). This is achieved by first sampling new random-effects $\phi_{0new}$, $\phi_{1new}$, and $\phi_{2new}$ from the posterior distribution of the random-effects superpopulation. Then for each realization we can calculate the associated window period

$$T_{new}(\alpha; \phi_{new}) = \log\left(\frac{\phi_{1new} - \phi_{0new}}{\alpha - \phi_{0new}}\right) \exp(-\phi_{2new}).$$

The mean, median, and percentiles for this out-of-sample prediction can again be easily estimated from the MCMC sample. The prediction is for a generic new individual and hence automatically accounts for between individual variability.

4. Illustration: repeated AI measurements from a cohort of HIV seroconverters

4.1. Data

Data are available on 175 HIV seroconverters consisting of the last negative and the first positive test dates, using a standard immunoassay (Abbott AxSYM HIV 1/2 gO), together with a series of AI measurements post HIV diagnosis. All individuals were identified as first-time HIV-positives in voluntary counseling and testing centres located in 4 hospitals (2 in Rome, 1 in Turin, and 1 in Brescia).
Of these individuals, 72 had one AI measurement and therefore provide no information on the growth rate of AI. Hence this analysis uses data from the 103 individuals with two or more AI measurements. There are on average four AI measurements per person, with the maximum number of measurements being 10, and the minimum 2. The time between the last negative and the first positive test (the seroconversion interval) is on average 3.6 months, but there is considerable variation between patients, with intervals ranging from 2 days to 18 months. The mean value of the first AI measurement after diagnosis is 0.60, but again there is considerable variation (range 0.19, 1.09), reflecting the fact that individuals are diagnosed at different periods post seroconversion.

Individual growth patterns of AI are shown in Figure 3, where time is plotted from the midpoint of the seroconversion interval. Most of the AI growth occurs within the first 12 months of the seroconversion midpoint, and an asymptote close to 1 is soon reached. However, there is a large variation in measurements taken close to the seroconversion midpoint. This could be due to the natural variation between individuals, and/or because the exact date of seroconversion is unknown.

4.2. Non-parametric window period estimation

The date of crossing the 0.8 threshold is right censored for many individuals, especially those who seroconverted after 2002. This causes non-uniqueness in the NPMLE as discussed in Section 2. Using the MLEcens package in R [15] we calculate the upper and lower bounds of the CDF for the window period. These are shown as dotted lines in Figure 4. Clearly, inferences based on these non-parametric limits are of little practical use, suggesting that the growth of the AI should be modelled parametrically.

4.3. Longitudinal modelling

To fit the growth models we use a Bayesian approach implemented through MCMC in WinBUGS [16], code for which is available from the authors on request. Non-informative Gaussian priors are placed on the means $\mu_0$, $\mu_1$, and $\mu_2$, while $\sigma_w^2$ is given an inverse-gamma(0.001, 0.001) prior.
To ensure that $\Sigma_b$ is positive-definite, an inverse-Wishart prior distribution is used with degrees of freedom equal to one plus the dimension of $\Sigma_b$. This has the effect of placing a uniform distribution on each of the correlation parameters [17].

Results (not shown) from the uniform prior non-linear mixed-effects model indicate that even after accounting for unknown seroconversion date there is still considerable evidence that intercepts should be treated as random-effects (fixed effect vs random-effects deviance information criterion (DIC) [18], −797 vs −892). There is also variation in the individual rates of growth, but no evidence for random asymptotes (fixed vs random DIC, −892 vs −874).

Table I shows estimates from the naïve and uniform prior non-linear mixed-effects models treating the asymptote as a fixed effect. The asymptote is estimated to be just over 1 for both models and the population mean intercept, i.e. the value of the AI at seroconversion, is approximately 0.35. There is a slight difference between the models in the population mean estimate of the log-rate parameter: on average the rate of growth is less in the uniform prior model. However, the main difference between the two models is in the estimate of the between-subject standard deviations (SD). The between-subject SD for the intercept is lower in the uniform prior model, whereas the SD for the log-rates is slightly higher. The reduced variability in the intercepts is explained by properly accounting for the unknown date of seroconversion, allowing for a more homogenous population at seroconversion. The slightly increased variability in the log-rates gives a more honest reflection of our uncertainty in the growth rate of the AI when date of seroconversion is unknown. The uniform prior model also has a lower DIC suggesting a better model for predictive accuracy.

Figure 3. Data from 103 individuals showing the growth of the AI, where the time origin is defined as the midpoint of the seroconversion interval. (Axes: AI against years since seroconversion interval midpoint.)

Figure 4. Cumulative distribution function of the predicted window period for a new individual for an AI threshold of 0.8. The solid line shows the distribution from the non-linear mixed-effects model. The dotted lines show the upper and lower bounds as calculated from the NPMLE. (Axes: CDF of window period against years.)
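The inverse prediction of Section 3.1 can be sketched numerically. The code below is a plug-in illustration only: it uses the uniform prior model's posterior medians as printed in Table I in place of MCMC draws, treats the asymptote as fixed, and assumes time is measured in years. The function names, the handling of curves that start above the threshold, and the use of a bivariate normal draw for the intercept and log-rate are choices made for this sketch, not the authors' WinBUGS implementation.

```python
import math
import random


def window_period(alpha: float, phi0: float, phi1: float, phi2: float) -> float:
    """Equation (4): years from seroconversion until the latent AI curve reaches alpha."""
    if alpha >= phi0:
        return math.inf  # a threshold at or above the asymptote is never reached
    if phi1 >= alpha:
        return 0.0  # sketch assumption: already at or above alpha at seroconversion
    return math.log((phi1 - phi0) / (alpha - phi0)) * math.exp(-phi2)


def draw_new_individual(mu1, mu2, sd1, sd2, rho, rng):
    """Draw (intercept, log-rate) for a new individual from a bivariate normal."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    phi1 = mu1 + sd1 * z1
    phi2 = mu2 + sd2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return phi1, phi2


def prob_crossed_by(alpha, horizon_years, phi0, mu1, mu2, sd1, sd2, rho,
                    n_draws=20000, seed=0):
    """Monte Carlo estimate of P(T_new <= horizon) under the plug-in values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        phi1, phi2 = draw_new_individual(mu1, mu2, sd1, sd2, rho, rng)
        if window_period(alpha, phi0, phi1, phi2) <= horizon_years:
            hits += 1
    return hits / n_draws


# Plug-in values as printed in Table I (uniform prior model).
PHI0, MU1, MU2 = 1.016, 0.349, 0.964
typical_days = 365.25 * window_period(0.8, PHI0, MU1, MU2)
p12 = prob_crossed_by(0.8, 1.0, PHI0, MU1, MU2, sd1=0.125, sd2=0.860, rho=0.59)
print(round(typical_days), round(p12, 2))
```

For an individual at the population means the crossing time for a threshold of 0.8 comes out at roughly 157 days, of the same order as the in-sample medians in Table II. The simulated probability is not expected to reproduce Table II, which averages over the full posterior rather than fixing the parameters at their medians.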

Table I. Parameter estimates from the naïve and uniform prior non-linear mixed-effects models. Entries are posterior median (SD).

Parameter                                       Naïve model      Uniform prior model
Population means
  Asymptote μ0                                  1.017 (0.007)    1.016 (0.006)
  Intercept μ1                                  0.346 (0.023)    0.349 (0.021)
  Log-rate μ2                                   0.934 (0.119)    0.964 (0.122)
Between individual standard deviations
  Intercept                                     0.145 (0.021)    0.125 (0.019)
  Log-rate                                      0.835 (0.109)    0.860 (0.110)
Correlation between intercepts and log-rates    0.53 (0.14)      0.59 (0.14)
Within-individual standard deviation            0.076 (0.003)    0.074 (0.003)
Posterior mean deviance                         −967.3           −991.6
Effective no. of parameters                     96.7             99.6
Deviance information criterion                  −870.6           −892.0

Table II. In-sample window period and out-of-sample probabilities of reaching threshold within given time periods, for the naïve and uniform prior models. Posterior median and 95 per cent credible intervals presented for each quantity.

                 In-sample window period, days                          Predicted out-of-sample probability of reaching threshold within
Threshold        Mean              Median            90th percentile    0–3 months   0–6 months   0–9 months   0–12 months
Naïve model
  0.60           72 (63, 86)       59 (49, 68)       145 (117, 184)     0.73         0.94         0.98         1.00
  0.70           125 (109, 149)    100 (86, 114)     244 (197, 314)     0.46         0.80         0.92         0.96
  0.75           160 (139, 191)    125 (108, 144)    309 (249, 405)     0.34         0.70         0.86         0.93
  0.80           203 (175, 244)    156 (135, 180)    391 (312, 520)     0.24         0.58         0.78         0.88
Uniform prior model
  0.60           71 (61, 85)       56 (46, 66)       139 (112, 180)     0.75         0.95         0.98         0.99
  0.70           125 (108, 149)    97 (82, 112)      242 (193, 319)     0.46         0.81         0.92         0.96
  0.75           160 (138, 192)    122 (104, 141)    311 (246, 413)     0.34         0.70         0.86         0.93
  0.80           202 (174, 245)    152 (129, 176)    395 (310, 529)     0.25         0.59         0.78         0.88

An interesting question is whether it is possible to use the information on the longitudinal values of the biomarker to infer the unknown seroconversion date. This can be assessed by looking at the difference between the prior and posterior distributions for $\tau_i$.
If any learning about τ is taking place, we would expect the difference between the prior and posterior SD to be positive, and the difference between prior and posterior means to give information about the direction of any shift. For the majority of individuals (66 per cent) the difference between the prior and posterior mean for τ is small (within 5 per cent of the length of the seroconversion interval), and for these individuals the prior and posterior SDs are very similar, suggesting that little is learnt about the seroconversion time. However, for 17 of the individuals the prior SD is noticeably greater than the posterior SD (by over 5 per cent of the length of the seroconversion interval). This implies that it is possible to learn about the seroconversion time. For 11/17 (65 per cent) of these individuals we estimate the posterior mean seroconversion time to be closer to the last negative test date (i.e. towards the beginning of the seroconversion interval).

Predictions for the in-sample and out-of-sample window period distributions are given in Table II. In this application both models provide almost identical predictions of these distributions. The estimated mean time for the sample to cross the 0.8 threshold is 202 days (95 per cent CrI 174, 245). However, the distribution of the window period is right-skewed, with the mean considerably larger than the median. The 90th percentile of the window period distribution is estimated with high imprecision for this sample of individuals. For a threshold of 0.8, the model predicts that only 88 per cent of new individuals will cross the threshold within 12 months of seroconversion. This compares with 93 per cent for a threshold of 0.75, 96 per cent for a threshold of 0.7, and 99 per cent for a threshold of 0.6. The complete CDF for a threshold of 0.8 is shown as the solid line in Figure 4. As expected, the prediction of the CDF from this parametric model lies within the non-parametric bounds.

5. Discussion

In this paper we have estimated the distribution of a window period using two statistical methods. For the AI we estimate the window period associated with a number of different thresholds. A threshold of 0.8 has previously been suggested as a cut-off to classify individuals as recently infected, based on sensitivity and specificity of the biomarker 6 months after the midpoint of the seroconversion interval [8, 19]. We find such a threshold to be associated with a mean window period of 202 days (95 per cent CrI 174, 245). The probability that the window period is longer than one year is not negligible, estimated to be 12 per cent. For a period of three years this probability drops to less than 1 per cent. A threshold of 0.75 or 0.7 may be an alternative choice for incidence estimation, since the probability of the window period being greater than one year is low, at 7 and 4 per cent, respectively. A long-tailed distribution for the window period is undesirable because it can violate the assumption of a constant incidence over the duration of the window period, which is needed for Equation (2) to hold [1]. Indeed, with all incidence assays the full distribution of the window period, and not just the mean, should be considered and explored before use in incidence estimation, a fact that is often ignored.

The use of a mixed-effects model to describe the growth of an antibody assay while incorporating uncertainty associated with the seroconversion time is novel. Many previous analyses have assumed seroconversion to be at the midpoint of the seroconversion interval [8, 20--22].
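The two treatments of the seroconversion time can be written side by side. The symbols $l_i$ and $r_i$, denoting the last negative and first positive test dates for individual $i$, are notation assumed here for illustration and may differ from that used elsewhere in the paper; $\tau_i$ is the seroconversion time.

```latex
\begin{align*}
\text{Naïve model:} \quad
  & \tau_i = \tfrac{1}{2}\,(l_i + r_i) \quad \text{(treated as known)} \\[4pt]
\text{Uniform prior model:} \quad
  & \tau_i \sim \mathrm{Uniform}(l_i,\, r_i), \\
  & \mathrm{E}[\tau_i] = \tfrac{1}{2}\,(l_i + r_i), \qquad
    \mathrm{SD}[\tau_i] = \frac{r_i - l_i}{\sqrt{12}}
\end{align*}
```

Both specifications are centred on the midpoint of the interval. They differ only in the spread allowed around it, which grows in proportion to the interval length $r_i - l_i$.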
We have shown that, for our sample, this naïve approach and the uniform prior approach result in almost identical predictions of the window period. This could be due to the relatively short duration between the last negative and the first positive tests in our cohort or the fact that the prior mean for the uniform prior model is the midpoint of the seroconversion interval. However, for a cohort with longer seroconversion intervals the uniform prior model would be more realistic, so that the uncertainty in the data is properly accounted for. This model reflects our a priori belief that seroconversion is equally likely to occur anywhere between the last negative and the first positive test. Other choices of distribution can easily be incorporated to reflect different a priori beliefs. By comparing prior and posterior distributions it is clear that for some individuals we have been able to learn about their date of seroconversion from their longitudinal series of AI measurements. However, for other individuals very little information about their date of seroconversion can be gleaned. The mixed-effects model could therefore in theory be used to predict the date of seroconversion for a new individual given information on their AI measurements and seroconversion interval.

The estimates of the mean window period presented here for the AI now require validation from other studies of seroconverters. It is imperative that selection biases are avoided when recruiting such cohorts in order for the distribution of the window period to be as representative as possible. We have found considerable variation in the growth of the AI between individuals, although the heterogeneity could not be further investigated since no covariate information about these individuals was available. However, a previous study has found no effect of antiretroviral treatment, protease inhibitors, sex, or age on the growth rate of the AI, suggesting that it is potentially a robust biomarker [8]. Nevertheless, further research into the characteristics of the AI is warranted to enable its use for incidence estimation.

References

1. Brookmeyer R. Should biomarker estimates of HIV incidence be adjusted? AIDS 2009; 23(4):485--491.
2. Brookmeyer R, Gail MH. A method for obtaining short-term projections and lower bounds on the size of the AIDS epidemic. Journal of the American Statistical Association 1988; 83:301--308.
3. Isham V. Mathematical-modeling of the transmission dynamics of HIV infection and AIDS: a review. Journal of the Royal Statistical Society Series A Statistics in Society 1988; 151:5--30.
4. Ades AE, Medley GF. Estimates of disease incidence in women based on antenatal or neonatal seroprevalence data: HIV in New York City. Statistics in Medicine 1994; 13(18):1881--1894.
5. Kaplan E. Snapshot samples. Socio-Economic Planning Sciences 1997; 31:281--291.
6. Karon JM, Song R, Brookmeyer R, Kaplan EH, Hall HI. Estimating HIV incidence in the United States from HIV/AIDS surveillance data and biomarker HIV test results. Statistics in Medicine 2008; 27(23):4617--4633.
7. Janssen RS, Satten GA, Stramer SL, Rawal BD, O'Brien TR, Weiblen BJ, Hecht FM, Jack N, Cleghorn FR, Kahn JO, Chesney MA, Busch MP. New testing strategy to detect early HIV-1 infection for use in incidence estimates and for clinical and prevention purposes. Journal of the American Medical Association 1998; 280:42--48.

8. Suligoi B, Massi M, Galli C, Sciandra M, Di Sora F, Pezzotti P, Recchia O, Montella F, Sinicco A, Rezza G. Identifying recent HIV infections using the avidity index and an automated enzyme immunoassay. Journal of Acquired Immune Deficiency Syndromes 2003; 32:424--428.
9. Chawla A, Murphy G, Donnelly C, Booth CL, Johnson M, Parry JV, Phillips A, Geretti AM. Human immunodeficiency virus (HIV) antibody avidity testing to identify recent infection in newly diagnosed HIV type 1 (HIV-1)-seropositive persons infected with diverse HIV-1 subtypes. Journal of Clinical Microbiology 2007; 45:415--420.
10. De Gruttola V, Lagakos SW. Analysis of doubly-censored survival data, with application to AIDS. Biometrics 1989; 45(1):1--11.
11. Kim MY, De Gruttola V, Lagakos SW. Analyzing doubly censored data with covariates, with application to AIDS. Biometrics 1993; 49(1):13--22.
12. Gentleman R, Vandal AC. Nonparametric estimation of the bivariate CDF for arbitrarily censored data. Canadian Journal of Statistics 2002; 30:557--571.
13. Reich NG, Lessler J, Cummings DAT, Brookmeyer R. Estimating incubation period distributions with coarse data. Statistics in Medicine 2009; 28:2769--2784.
14. Turnbull BW. Empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society Series B-Methodological 1976; 38:290--295.
15. Maathuis M. MLEcens: Computation of the MLE for bivariate (interval) censored data. R Package Version 0.1-2, 2007. Available from: http://www.stat.washington.edu/marloes.
16. Spiegelhalter D, Thomas A, Best NDL. WinBUGS Version 1.4 User Manual. MRC Biostatistics Unit, Cambridge, 2003.
17. Gelman A, Hill J. Data Analysis using Regression and Multilevel/Hierarchical Models. Cambridge University Press: Cambridge, 2007.
18. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society Series B-Statistical Methodology 2002; 64:583--616.
19. Galli C, Bossi V, Regine V, Rodella A, Manca N, Camoni L, Suligoi B. Accuracy of different thresholds for the anti-HIV avidity index. Microbiologia Medica 2008; 23:59--63.
20. Parekh BS, Hu DJ, Vanichseni S, Satten GA, Candal D, Young NL, Kitayaporn D, Srisuwanvilai LO, Rakhtam S, Janssen R, Choopanya K, Mastro TD. Evaluation of a sensitive/less-sensitive testing algorithm using the 3A11-LS assay for detecting recent HIV seroconversion among individuals with HIV-1 subtype B or E infection in Thailand. AIDS Research and Human Retroviruses 2001; 17(5):453--458.
21. Parekh BS, Kennedy MS, Dobbs T, Pau CP, Byers R, Green T, Hu DJ, Vanichseni S, Young NL, Choopanya K, Mastro TD, McDougal JS. Quantitative detection of increasing HIV type 1 antibodies after seroconversion: a simple assay for detecting recent HIV infection and estimating incidence. AIDS Research and Human Retroviruses 2002; 18(4):295--307.
22. McDougal JS, Parekh BS, Peterson ML, Branson BM, Dobbs T, Ackers M, Gurwith M. Comparison of HIV type 1 incidence observed during longitudinal follow-up with incidence estimated by cross-sectional analysis using the BED capture enzyme immunoassay. AIDS Research and Human Retroviruses 2006; 22(10):945--952.

Copyright 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 3194--3202