INTER LABORATORY REPRODUCIBILITY AND PROFICIENCY TESTING WITHIN THE HUMAN PAPILLOMAVIRUSES CERVICAL CANCER SCREENING PROGRAM IN CATALONIA, SPAIN

Similar documents
Protecting the underscreened women in developed countries: the value of HPV test

Screening for Cervical Cancer. Grand Rounds 1/16/13 Meggan Linck

Chapter 5. M.G. Dijkstra L. Rozendaal M. van Zummeren F.J. van Kemenade P.J.F. Snijders C.J.L.M. Meijer J. Berkhof. Submitted for publication

Original Article Evaluation of the management of Hr-HPV+/PapTest- women: results at 1-year recall

Comparison of Clinical Performance of Abbott RealTime High Risk HPV Test with That of Hybrid Capture 2 Assay in a Screening Setting

B. Eaton, J Reid, D. Getman, and J Dockter are employees of Hologic, Inc.

Clinical Evaluation of the Cartridge-Based GeneXpert Human Papillomavirus Assay in Women Referred for Colposcopy

Abnormal Cervicovaginal Cytology With Negative Human Papillomavirus Testing

Performance of the Aptima High-Risk Human Papillomavirus mrna Assay in a Referral Population in Comparison with Hybrid Capture 2 and Cytology

Biomarkers and HPV testing: The future of cervical screening

HPV16/18 genotyping for the triage of HPV positive women in primary cervical cancer screening in Chile

Edinburgh Research Explorer

Jean-Christophe Noël and Philippe Simon. 1. Introduction

Clinical Performance of Roche COBAS 4800 HPV Test

Ajmal Akbari 1, Davy Vanden Broeck 1,2,3,4*, Ina Benoy 1,2,4, Elizaveta Padalko 2,5, Johannes Bogers 1,2,3,4 and Marc Arbyn 6

Type-specific HPV E6/E7 mrna detection by real-time PCR improves

Xinxin Du, Shufang Jiang, Aijun Liu, Yurong Fu, Yun Zhang, Yuanguang Meng *

Supplements to the European Guidelines on Prevention of Cervical Cancer

Evidence-based treatment of a positive HPV DNA test. Th. Agorastos Prof. of Obstetrics & Gynaecology Aristotle University Thessaloniki/GR

Woo Dae Kang, Ho Sun Choi, Seok Mo Kim

Human Papillomavirus and Papanicolaou Tests to Screen for Cervical Cancer

The devil is in the details

News. Laboratory NEW GUIDELINES DEMONSTRATE GREATER ROLE FOR HPV TESTING IN CERVICAL CANCER SCREENING TIMOTHY UPHOFF, PHD, DABMG, MLS (ASCP) CM

The BD Onclarity HPV assay on SurePath collected samples meets the. International Guidelines for Human Papillomavirus Test Requirements for

Are 20 human papillomavirus types causing cervical cancer?

Comparative evaluation of different DNA extraction methods for HPV genotyping by Linear Array and INNO-LiPA

Downloaded from:

The Korean Journal of Cytopathology 15 (1) : 17-27, 2004

Michela Iacobellis 1, Cecilia Violante 1, Gabriella Notarachille 1, Angela Simone 2, Rosa Scarfì 2 and Giuseppe Giuffrè 2*

HUMAN PAPILLOMAVIRUS TESTING FOR PRIMARY CERVICAL CANCER SCREENING A Technology Assessment

Human Papillomaviruses: Biology and Laboratory Testing

HUMAN PAPILLOMAVIRUS TESTING

Use of a high-risk human papillomavirus DNA test as the primary test in a cervical cancer screening programme: a population-based cohort study

Triaging borderline/mild dyskaryotic Pap cytology with p16/ki-67 dual-stained cytology testing: cross-sectional and longitudinal outcome study

HPV Molecular Diagnostics and Cervical Cytology. Philip E. Castle, PhD, MPH American Society for Clinical Pathology (ASCP) March 15, 2012

Natural History of HPV Infections 15/06/2015. Squamous cell carcinoma Adenocarcinoma

Evaluation of a Newly Developed GenoArray Human Papillomavirus (HPV) Genotyping Assay and Comparison with the Roche Linear Array HPV Genotyping Assay

The clearest path to the most meaningful results. The cobas HPV Test delivers clinical value with workflow efficiencies every step of the way

Comparison of HPV test versus conventional and automation-assisted Pap screening as potential screening tools for preventing cervical cancer

HPV vaccination in the Netherlands

No HPV High Risk Screening with Genotyping. CPT Code: If Result is NOT DETECTED (x3) If Results is DETECTED (Genotype reported)

Validation of an automated detection platform. for use with the Roche Linear Array HPV Genotyping Test ACCEPTED SEPEHR N.

Promoting Cervical Screening Information for Health Professionals. Cervical Cancer

Cytology history preceding cervical cancer diagnosis: a regional analysis of 286 cases

HPV Genotyping: A New Dimension in Cervical Cancer Screening Tests

HPV TESTING AND UNDERSTANDING VALIDITY: A tough row to hoe. Mark H. Stoler, MD ASC Companion Meeting USCAP 2008

Philip E. Castle, Patti E. Gravitt, Diane Solomon, Cosette M. Wheeler and Mark Schiffman

POET REPORT Perspectives on Emerging Technology

FREQUENCY AND RISK FACTORS OF CERVICAL Human papilloma virus INFECTION

RESEARCH COMMUNICATION. Type-specific Human Papillomavirus Distribution in Invasive Cervical Cancer in Korea,

A Cytologic/Histologic Review of 367 Cases. Original Article. Cancer Cytopathology August 25,

Universal human papillomavirus genotyping by the digene. HPV Genotyping RH and LQ Tests

Can HPV-16 Genotyping Provide a Benchmark for Cervical Biopsy Specimen Interpretation?

Comparison of the DiagCor GenoFlow Human Papillomavirus Array Test and Roche Linear Array HPV Genotyping Test

Clinical Relevance of HPV Genotyping. A New Dimension In Human Papillomavirus Testing. w w w. a u t o g e n o m i c s. c o m

Methods for HPV Detection: Polymerase Chain Reaction Assays

Prevalence and Determinants of High-risk Human Papillomavirus Infection in Women with High Socioeconomic Status in Seoul, Republic of Korea

Original Article Dysregulation of mirna-21 and their potential as biomarkers for the diagnosis of cervical cancer

Human Papillomavirus Genotype Specificity of Hybrid Capture 2. Low-risk Probe Cocktail

Opinion: Cervical cancer a vaccine preventable disease

RESEARCH. Long term predictive values of cytology and human papillomavirus testing in cervical cancer screening: joint European cohort study

Type-Specific Human Papillomavirus E6/E7 mrna Detection by Real-Time PCR Improves Identification of Cervical Neoplasia

Abstract. Ogilvie et al. BMC Cancer 2010, 10:111

Prevalence of unclassified HPV genotypes among women with abnormal cytology

NIH Public Access Author Manuscript Lancet Oncol. Author manuscript; available in PMC 2012 February 6.

The PapilloCheck Assay for the Detection of High Grade Cervical Intraepithelial Neoplasia

Biomed Environ Sci, 2015; 28(1): 80-84

Gynecologic Oncology

Human papillomavirus infections among Japanese

Clinical Policy Title: Fluorescence in situ hybridization for cervical cancer screening

THE NEW ERA OF PRIMARY HPV SCREENING FOR PREVENTION OF INVASIVE CERVICAL CANCER

Philip E. Castle, Diane Solomon, Mark Schiffman, Cosette M. Wheeler for the ALTS Group

VALGENT (validation of HPV genotyping assays) implications for the introduction of primary HPV screening

Screening for Cervical Cancer Reference List

Attitudes to self-sampling of vaginal smear for human papilloma virus analysis among women not attending organized cytological screening

Cross-Sectional Comparison of an Automated Hybrid Capture 2 Assay and the Consensus GP5 /6 PCR Method in a Population-Based Cervical Screening Program

These comments are an attempt to summarise the discussions at the manuscript meeting. They are not an exact transcript.

Int J Clin Exp Pathol 2018;11(11): /ISSN: /IJCEP

A systematic review of the role of human papilloma virus (HPV) testing within a cervical screening programme: summary and conclusions

Received 14 December 2005/Returned for modification 17 February 2006/Accepted 1 May 2006

HPV-Negative Results in Women Developing Cervical Cancer: Implications for Cervical Screening Options

On behalf of the New Technologies for Cervical Cancer Screening Working Group

Primary High Risk HPV Testing with Cytology Triage

Human Papillomavirus (HPV) in Atypical Squamous Cervical Cytology: the Invader HPV Test as a New Screening Assay

Human Papillomaviruses and Cancer: Questions and Answers. Key Points. 1. What are human papillomaviruses, and how are they transmitted?

Abstract. Human papillomavirus (HPV) DNA testing is cost-effective 1-3 (S. Kulasingam, PhD, et al, unpublished Atypical

Chapter 4. Triaging HPV-positive women with normal cytology by p16/ki-67 dual-stained cytology testing: Baseline and longitudinal data

Single and multiple human papillomavirus infections in cervical abnormalities in Portuguese women

Usefulness of p16/ki67 Immunostaining in the Triage of Women Referred to Colposcopy

Human Papillomavirus Genotyping Using an Automated Film-Based Chip Array

Epidemiology and genotype distribution of human papillomavirus (HPV) in Southwest China: a cross-sectional five years study in non-vaccinated women

MTC - MAIN TRAINING COURSE

Cervical cancer prevention: Advances in primary screening and triage system

Introduction. Original Article

Safe, Confident, QIAsure

Innovative molecular markers for diagnosis and prognosis in cervical neoplasia Boers, Aniek

Pushing the Boundaries of the Lab Diagnosis in Asia

Chapter 2. Br J Cancer Mar 18;110(6):

Absolute Risk of a Subsequent Abnormal Pap among Oncogenic Human Papillomavirus DNA-Positive, Cytologically Negative Women

Transcription:

JCM Accepts, published online ahead of print on 26 February 2014 J. Clin. Microbiol. doi:10.1128/jcm.00100-14 Copyright 2014, American Society for Microbiology. All Rights Reserved. 1 2 INTER LABORATORY REPRODUCIBILITY AND PROFICIENCY TESTING WITHIN THE HUMAN PAPILLOMAVIRUSES CERVICAL CANCER SCREENING PROGRAM IN CATALONIA, SPAIN 3 4 5 6 R. Ibáñez 1, M. Felez Sánchez 1,2, J.M. Godínez 2, C. Guardiá 3, E. Caballero 4, R. Juve 5, N. Combalia 6, B. Bellosillo 7, D. Cuevas 8, J. Moreno Crespi 9, Ll. Pons 10, J. Autonell 11, C. Gutierrez 12, J. Ordi 13, S. de Sanjosé 1,2,14, I.G. Bravo 1,2. 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 1 Infections and Cancer Unit, Cancer Epidemiology Research Program, Catalan Institute of Oncology (ICO). L'Hospitalet de Llobregat, Barcelona, Spain 2 Infections and Cancer, Bellvitge Institute of Biomedical Research (IDIBELL) L'Hospitalet de Llobregat, Barcelona, Spain 3 Primary Care Clinical Laboratory Barcelonés Nord y Vallés Oriental. Catalan Institute of Health. 08911 Badalona (Spain) 4 Microbiology Department. University Hospital Vall d Hebron. Catalan Institute of Health. 08035 Barcelona (Spain) 5 Primary Care Clinical Laboratory Bon Pastor. Catalan Institute of Health. 08030 Barcelona (Spain) 6 Pathology Department. UDIAT CD, University Hospital Parc Tauli. Universitat Autònoma de Barcelona. 08208 Barcelona (Spain) 7 Pathology Department. Hospital del Mar. 08003 Barcelona (Spain) 8 Pathology and Molecular Genetics Department. University Hospital Arnau de Vilanova. Catalan Institute of Health. 25198 Lleida (Spain) 9 Pathology Department. University Hospital of Girona Dr. Josep Trueta. Infections and Cancer Unit. Catalan Institute of Oncology. 17007 Girona (Spain) 10 Pathology Deparment, IISPV, URV, Hospital de Tortosa Verge de la Cinta (HTVC). Catalan Institute of Health. 43500 Tortosa (Spain) 11 Pathology Department. Hospital Consortium of Vic. 08500 Vic. Barcelona (Spain) 12 Clinical Laboratory ICS Tarragona, Molecular Biology Section. University Hospital of Tarragona Joan XXIII. Catalan Institute of Health. 43005 Tarragona (Spain) 13 Pathology Department, CRESIB Hospital Clínic, University Barcelona. 08036 Barcelona (Spain) 14 CIBER Epidemiology and Public Health, Spain 1

33 34 35 Corresponding author: 36 Ignacio G. Bravo 37 38 Infections and Cancer Laboratory, Cancer Epidemiology Research Programme, Catalan Institute of Oncology 39 Avda. Gran Vía, 199 203 08908 L Hospitalet de Llobregat, Barcelona; Spain. 40 Phone: +34 93 260 7123 /7812 41 E mail address: igbravo@iconcologia.net 42 43 Keywords: 44 45 Human papillomaviruses, HPV, Hybrid capture, HC2, Reproducibility, Reliability, Cervical cancer screening, Quality control. 46 Running title: 47 Proficiency testing within a cervical cancer screening program 48 49 2

50 ABSTRACT 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 In Catalonia, a screening protocol for cervical cancer including Human Papillomaviruses (HPVs) DNA testing using the Digene Hybrid Capture assay (HC2) was implemented in 2006. In order to monitor inter laboratory reproducibility, a proficiency testing (PT) survey of the HPVs samples was launched in 2008. The aim of this study was to explore the repeatability of HC2 performance. Participant laboratories provided annually twenty samples, five randomly chosen samples from each of the following relative light units (RLU) intervals: <0.5; 0.5 0.99; 1 9.99; and 10. Kappa statistics was used to determine the agreement level between the original and the PT readings. The nature and origin of the discrepant results were calculated by bootstrapping. A total of 946 specimens were retested. Kappa values were 0.91 for positive/negative categorical classification and 0.79 for the four RLU intervals studied. Sample retesting yielded systematically lower RLU values than the original test (p<0.005), independently of the time elapsed between both determinations (median 53 days), possibly due to freeze thaw cycles. The probability for a sample to show clinically discrepant results upon retesting was a function of the RLU value: samples with RLU in the 0.5 5 interval showed 10.80% probability to yield discrepant results (95% CI:7.86 14.33) compared to 0.85% probability for samples outside this interval (95% CI:0.17 1.69,). Globally, HC2 assay shows a high inter laboratory concordance. We have identified differential confidence thresholds and suggested the guidelines for inter laboratory PT in the future, as analytical quality assessment of the HPVs DNA detection remains a central component of the screening program for cervical cancer prevention. 71 3

72 INTRODUCTION 73 74 75 76 77 78 79 80 81 82 Persistent infection with oncogenic Human Papillomaviruses (HPVs) is a necessary cause in the development of invasive cervical cancer (CC) (8, 16, 24). CC screening based on cervical cytology has been instrumental in decreasing incidence and mortality associated to CC in those countries with high screening coverage rates (13). However, the infectious aetyology of this disease lead to the suggestion that the detection of the DNA of HPVs responsible for cellular transformation could provide a strong predictive marker for early detection of women at risk (20). Worldwide studies have established that testing for the presence of HPVs DNA shows a higher sensitivity than cytology for the detection of high grade cervical intraepithelial lesions (2). Despite this demonstrated higher sensitivity, its application to CC screening programs is still not widespread (1, 2, 17, 19, 21). 83 84 85 86 87 88 89 90 Currently one of the most widely used tests for the detection of HPVs DNA is the Digene Hybrid Capture assay (HC2; Qiagen, Gaithersburg, MD), approved by the U.S. Food and Drug Administration (FDA) in 2003 for its use in clinical settings. Large prospective cohort studies and randomized controlled trials have proved this assay as a highly clinical sensitivity test (90 95%) for the detection of high grade CIN (2). The HC2 procedure does not include a PCR amplification step, and uses cocktail of probes designed to detect 13 HPVs classified as carcinogenic (groups 1 and 2A) by the International Agency for Research on Cancer: HPV16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, and 68 (12). 91 92 93 94 95 96 In the last years several tests for detection of DNA or transcripts of HPVs have been developed (7, 18). A number of these tests have been proposed to be applied for CC screening. Beyond sensitivity and specificity, an additional critical factor before their introduction as a screening technique is the need for high reproducibility under the diverse conditions that the different clinical laboratories may face (15). The current guidelines for HPVs test propose an intralaboratory reproducibility and inter laboratory agreement with a lower confidence bound not 4

97 98 less than 87% for both of them (15). Good reproducibility of blinded clinical samples in the laboratories might further warrant results reliability (4). 99 100 101 102 103 104 105 106 107 108 In Catalonia, North East of Spain, a screening protocol including HPVs testing for selected indications was implemented in 2006 (9). Women became eligible for the carcinogenic HPVs testing if they belonged to any of the three following categories: i) inadequate screening history, i.e. women aged over 40 years and with no history of cytology in the previous five years, ii) incident diagnosis of atypical squamous cell of undetermined significance (ASC US); or iii) first follow up visit after a surgical conization. During the 2006 to 2012 period, 116,970 tests for carcinogenic HPVs have been performed within CC screening activities in the Public Health Sector in Catalonia. In order to monitor inter laboratory reproducibility, a proficiency testing (PT) survey of the HPVs samples was launched in 2008, covering the twelve laboratories involved in the screening activities. 109 110 111 112 113 The aim of the present study was to evaluate inter laboratory reproducibility and error rate in the HC2 performance in cervical specimens among the twelve reference laboratories participating in the HPVs screening in Catalonia during the period 2008 2011. Additionally, we aimed to investigate the nature and origin of the discrepant results in order to provide guidelines for the assessment of inter laboratory concordance thresholds in the future. 114 5

115 MATERIALS AND METHODS 116 Design of external proficiency testing program 117 118 119 120 121 All tests (original and PT assays) were performed using HC2 with the high risk probe pool only, following the manufacturer's instructions. According to the FDA approved guidelines, the threshold for positivity is the HC2 response against a control sample containing 1.0 pg ml 1 HPV DNA, roughly equivalent to 5,000 genomic copies. A sample was considered positive if it rendered a relative light unit ratio (RLU) of 1 with respect to the positive control. 122 123 124 125 126 127 128 129 130 The twelve laboratories for HPVs screening in Catalonia (Hospital Hospital Universitari Dr. Josep Trueta, Consoci hospitalari de Vic, Hospital Universitari Joan XXIII, Hospital del Mar, Hospital Universitari de Bellvitge Institut Català d Oncologia (ICO), Hospital Clínic, Hospital Universitari Vall d Hebron, Hospital Universitari Verge de la Cinta, Hospital Universitari Arnau de Vilanova, Consorci sanitari Parc Tauli, and the Laboratoris d Atenció Primària Doctor Robert and Bon Pastor) participated in the PT program. The laboratory at the ICO was designated by the Health Department of the Government of Catalonia as the reference laboratory for PT purposes. The overall project was approved by the ethical committee of the ICO/IDIBELL. All information regarding the identification of patients was anonymized before analysis. 131 132 133 134 135 136 137 138 139 The PT was run annually and blindly. Between 2008 and 2011 each participating laboratory delivered 20 samples to the reference laboratory. The PT of the reference laboratory itself was performed by one of the twelve laboratories. For PT, the laboratories provided the residual aliquot of the original samples. Eleven of the participant laboratories collected the samples in STM medium. One laboratory served liquid cytology based screening and in this case, buffer conversion previous to retesting was performed according to the manufacturer s guidelines. Samples were kept frozen at 20 ºC before delivery. As far as possible, laboratories were requested to deliver samples for PT within the three months after the initial testing. Nevertheless, the possible effect of time elapsed between the original and the PT test was also 6

140 141 142 assessed. In order to ensure a uniform coverage distribution of positive and negative HC2 values, the laboratories were asked to provide annually five randomly chosen samples of each of the following RLU intervals: <0.5; 0.5 0.99; 1 9.99; and 10. 143 Concordance analysis 144 145 146 147 HPV samples were collated and analysed to allow for inter laboratory comparisons. Paired HC2 test results were categorized as Original negative/pt negative, Original positive/pt negative, Original negative/pt positive, and Original positive/pt positive based on the 1.0 RLU/PC positive cutoff. 148 149 150 151 152 153 154 Cohen s Kappa statistics was used to determine the level of agreement between the original and the PT categories of results (positive/negative and RLU intervals). The Kappa statistic is a measure of inter rater agreement, which tests as null hypothesis the inter rater agreement by chance (14). Generally a Kappa score between 0.8 and 1 is considered as an excellent agreement, values between 0.61 and 0.8 substantial agreement; between 0.41 and 0.6 moderate agreement, between 0.21 and 0.4 fair agreement; and between 0 and 0.2 slight agreement (3). 155 156 157 158 159 160 Correlation between original and PT HC2 readouts as well as correlation between changes in signal in both readouts and the time elapsed between both tests were assessed by linear regression using SPSS Statistics17.0 (SPSS Statistics/IBM, Chicago IL, USA). Paired Wilcoxon & Mann Whitney test implemented in R (http://www.r project.org/) was used to test the null hypothesis of the median difference between RLU values in origin and at PT to be nondifferent from zero. 161 162 For all statistical analyses, a p value below 0.05 was considered as a significant for rejection of the null hypothesis. 163 Bootstrapping analysis 7

164 165 166 167 168 169 170 171 172 173 In order to estimate the limits for the expected number of discrepancies between the results at origin and at the PT, a bootstrapping analysis was done. One thousand virtual labs were generated by bootstrapping among the pooled samples, allowing for replacement. All these virtual labs were constructed with the same data structure that was requested for the PT, i.e. 20 samples in total, five samples of each of the intervals above defined. We performed also the same analyses in another hypothetical scenario, with 40 samples per virtual laboratory, ten of each interval. The total numbers of discrepancies and concordances, as well as the Cohen s Kappa statistics were computed for each of the virtual labs. Distribution of discrepancies was fitted to a Poisson model and the 95% confidence interval was calculated based on the fitted model. 174 175 176 177 178 179 180 181 182 We explored further the differential repeatability of the technique for different values of the RLU variable. Each of the intervals 0.5 0.99 and 1.0 10 was divided in 0.1 unit sub intervals. For each sub interval, one thousand virtual laboratories were created by drawing random samples with replacement from the original pooled samples. The size of each virtual lab was equal to the original number of samples within the sub interval. The mean percentage of discrepancies was calculated for each of the defined sub intervals. The 2.5% and 97.5% quantiles of the 1000 bootstrap samples were used to define the lower bound and the upper bound of 95% CI. In all cases bootstrapping analyses were performed, using in house Perl scripts. 183 8

184 RESULTS 185 186 187 During the period 2008 2011, a total of 946 specimens were retested for PT within the framework of the CC screening program in Catalonia, Spain. The results of these paired tests were used to study inter laboratory reproducibility. 188 Comparison between origin PT paired tests showed high correlation 189 190 191 192 193 194 195 196 197 198 199 200 The distribution of the samples selected for PT was chosen to show a flat distribution of RLU values around the clinical cutoff value. The precise distribution of the samples tested in the PT and its comparison with the distribution of the general population is given in Table 1. In the general population in which HC2 was used to assess the presence of DNA of oncogenic HPVs (n=46,949), the distribution of positive and negative results for the period 2006 2009 was 22% and 78%, respectively. In contrast, in the PT samples this distribution was 52.2% and 47.8%, respectively. Regarding the classification by categories of RLU values, in the HPV tested general population the central intervals spanning 0.5 9.99 comprised only the 9.8% of all samples, while 47% of the PT samples were located in these intervals. These differences arise from the focus itself of the PT program, aiming to evaluate the inter laboratory reliability of the technique and not necessarily matching with the distribution of the RLU values in the general population. 201 202 203 204 205 206 207 208 A comparison of RLU values of the HC2 test by the twelve tested laboratories and of the repeated test is shown in Figure 1. The overall correlation coefficient for the original and the PT values was 0.95 (p<0.05), ranging between 0.88 and 0.97 for each of the individual laboratories (data not shown). Correlation restricted to samples interpreted as positive (RLU 1) in both origin and in the PT retest increased up to 0.97 (p<0.05) (data not shown). The PT retests rendered systematically lower RLU values than the reported by origin laboratories, with a pairwise median decrease of 9% in signal. This median difference was significantly different from zero, as determined by a paired Wilcoxon & Mann Whitney test (p<0.005). 9

209 Time elapsed between original and PT test did not influence reproducibility 210 211 212 213 214 215 The mean time elapsed between the first and the PT read was 90.5 days, with a median of 53.0 days, and a range from 8 to 885 days. To determine whether time elapsed between consecutive assays on the same sample could influence the differences between the RLU values reported by HC2, the correlation between the original and the PT readout values was assessed (Figure 2). No significant correlation was found (p=0.83), indicating that the time interval between consecutive tests did not result in a decrease of the RLU readout value. 216 All laboratories exhibited high agreement with the PT results 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 In order to assess the agreement between the two tests, Cohen s Kappa was calculated for the positive/negative categorical classification and RLU intervals (Table 2). Paired comparison between original and PT analyses for positive/negative results rendered an overall excellent agreement (Kappa=0.91). The Kappa values for the individual laboratories ranged between 0.84 and 0.97, being above 0.90 for eight (66.7%) of them (Table S1). The laboratory using liquid cytology based samples did not behave different from the rest of laboratories tested, which used STM as collecting medium. The total number of discordant results was 44 (4.6% of the total samples). A slightly asymmetric distribution of discordant tests was observed, with more discordant samples being positive in origin and negative in the PT than vice versa (25 vs 19 respectively), although this difference was not significant (Z Score=1.28, p value=0.20). The vast majority of discrepant results (97.7%) were reported in samples with RLU values between 0.5 and 10.0 in original tests. Remarkably, 13 of them showed a large difference between the paired tests: ten positive samples with an original RLU between 2.0 and 9.0 produced negative results between 0.08 and 0.77 in the PT tests. In contrast, only two samples with original negative result (RLU of 0.61 and 0.72) produced positive results (RLU of 2.97 and 2.98 respectively) in the PT retest. Only one sample was an outlier following Tukey s criterion (23), showing an original result of 38.57 and a 0.17 result in the PT test. 10

234 235 236 237 238 The reproducibility of the test in each of the RLU intervals here defined showed an overall Kappa value of 0.79 with values ranging from 0.70 to 0.84 for each of the individual laboratories (Table S2). The observed concordance in the extreme intervals, i.e. RLU below 0.5 and RLU above 10, was higher than in the intervals around the cutoff, i.e. RLU between 0.5 and 10, with overall agreement values of 96.4% and 70.1%, respectively. 239 Exploring new alternatives in the design of the PT test 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 Finally, we explored different alternatives for future improvement of the PT design. First, we aimed to provide a threshold for identifying significant deviations from the maximum expected number of discrepancies, under the current sampling scheme for PT. Limits for a discrepancy report were determined under two scenarios: 20 samples analysed for PT per laboratory with five samples per each interval (i.e. the one currently implemented); and 40 samples per laboratory with ten samples for each interval (which could be implemented depending on budget). We generated, by bootstrapping with replacement from the original data, 1,000 data sets that corresponded to virtual laboratories, using the same stratified structure of samples. The results were fitted to a Poisson distribution and the accumulated probability for the detection of the corresponding number of discrepant results was calculated. We aimed to find the number of discrepant results that could be expected to appear by chance with a probability below 5% (Figure 4). Using the stratified structure of samples evaluated in the present study (n=20, five samples of each interval), the probability of having two or more discrepancies was 7.5%, whereas the probability of three or more discrepancies was 1.7%. If 40 samples in total were collected, the probability of finding three or more discrepancies was 13.1%, while retrieving four or more discrepancies presented a probability of 4.7%. Thus, under the present PT structure and retesting 20 samples per laboratory, the analyses should be labelled as significantly discordant when PT yields three or more discrepant samples. If 40 11

258 259 were to be retested using the same structure, the flag for significant discordance should be raised for sets with four or more discrepant samples. 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 Finally we explored whether the results could be used to improve the PT design, by modifying the boundaries for sample stratification. We focused on the readout values with lower reproducibility, identified above by means of Kappa values as the area around the positivity threshold. This gray area comprised two of our RLU categories: 0.5 0.99 and 1 9 99. By using bootstrapping analyses as described above, we explored the expected percentage of discrepancies along both RLU categories (Figure 4). The results indicated that in the 0.5 RLU<1 interval the expected number of samples showing discrepant results did not experiment significant changes, remaining flat around 13.7% (95%CI: 5.5 28.6%). In the 1 RLU<10 interval we observed a significant monotonic decrease (p value for trend<0.001) in the percentage of samples showing discrepant results, starting from 25.9% (95%CI: 14.8 38.9%) for values in the 1 RLU<2 interval and stabilising after RLU 5 interval to reach 9.7% (95%CI: 5.7 13.4%). Moreover, we have explored the differential probability for a sample to show discrepancy upon retesting, as a function of the RLU values. The RLU interval 0.5 5 concentrated the highest probability for discrepancies: the probability for a sample in this interval to show discrepancy was 10.80% (95% CI: 7.86 14.33) compared to 0.85% (95% CI: 0.17 1.69,) for samples outside this interval. The accumulated probabilities for discrepancy for the different intervals of the RLU variable are given in Table 3. These improved intervals will be implemented recommended in the future for the PT activities when using HC2. We will thus ask the participating laboratories to would be asked to provide annually with randomly chosen samples of each of the following RLU intervals with the following structure: RLU<0.5, five samples; 0.5 RLU<1, ten samples; 1 RLU<5, ten samples; and RLU 5, five samples. In this case, the probability of having three or more discrepancies by chance was will be 13.5%, whereas the probability of three four or more discrepancies by chance was will be 4.8%. 12

283 284 DISCUSSION 285 286 287 288 289 290 291 292 293 294 An inter laboratory PT was successfully implemented for the detection of high risk HPV types, within the CC screening activities in the Public Health System in Catalonia. It was considered instrumental to ensure the accuracy of the results obtained by HPV testing. A total of 946 samples were analysed for PT during the 2008 2011 period and a total of 44 (4.6%) discrepancies were found. The present PT study was in agreement with the guidelines proposed by Meijer and co workers for the validation of high risk HPV tests for primary CC screening (15). These guidelines proposed that inter laboratory agreement should be determined by evaluation of at least 500 samples, with a 30% of the samples having tested positive in a reference laboratory using a clinically validated assay, and reaching an agreement with a lower confidence bound not less than 87%. 295 296 297 298 299 300 301 302 It is very important to note that the distribution of the RLU values used for the PT does not follow the one obtained from the general population participating in the screening algorithms in which HC2 was used to assess the presence of DNA of oncogenic HPVs (Table 1). Differences between distributions reflect the analytical nature of our PT program, as we were interested in assessing the overall performance of the laboratories running the HC2 technique. We chose therefore to design a balanced PT sample distribution exploring equally the full range of the readout variable, as otherwise the central values, close to cutoff and more prone to discrepancy, would have been undersampled. 303 304 305 306 307 Paired test demonstrated an almost excellent inter laboratory agreement in all twelve participating laboratories, both for positive/negative agreement (Kappa= 0.91) as well as for the four RLU categories (Kappa= 0.79). In our study, RLU values were slightly but significantly lower in the PT retesting compared to those obtained in the original laboratory, with a median decrease of 9.0% from the original RLU value. One possible explanation for this trend could 13

308 309 310 311 312 313 314 315 316 317 318 have been specimen degradation between the two tests, as it has been reported for HC2 to a certain extent (5, 6). We tested the hypothesis of the influence of time elapsed between the two consecutive tests to account for this difference, but we did not find a significant association between the decrease in the RLU readout variable and the time between the two HC2 tests. An alternative explanation could be the impact of consecutive freeze thaw cycles, as samples were denatured, analysed in origin, frozen, sent, and thawed for PT analysis, especially because the HC2 test does not include an amplification step. Finally, although in absolute terms more samples were found positive in origin and tested negative in PT than the inverse case, the difference between both numbers was not statistically significant. Overall, the small decrease in signal during retesting entails little or none clinical significance because HPV retesting is not part of routine clinical practice (6). 319 320 321 322 323 324 325 326 327 328 329 330 331 Several reports have addressed the use of HC2 in CC screening and its performance compared to other methods for detecting HPVs genetic material (10, 11, 18, 22). However, few data on the reproducibility of HC2 are available. In the ALTS study, inter laboratory reproducibility values across four laboratories were found to be similar to those herein communicated (Kappa=0.84) (6). The inter laboratory reproducibility for HC2 in seven laboratories participating in an Italian clinical trial was also high (4). In both cases RLU values around the cutoff (RLU=1) were more prone to show discrepant results, as observed in our study. Thus, the random variation of RLU values was more influential on the classification as positive or negative for samples rendering values close to the cutoff (4). One fundamental contribution of the present study was the detailed description of the differential reproducibility of the HC2 technique for different intervals of the RLU. We conclude therefore that, in our settings, the 0.5 RLU<5 is the interval in which PT paired values of the readout RLU were more likely to show discrepancy. 14

332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 A central aim of the initial PT assessment after the first four years of the CC screening activities was to provide guidelines for a better PT control itself in the future. As such, we have estimated the number of samples that should test discrepant after retesting, to be considered as a significant discordant. In our case and with our data structure, this critical value is three or more individual discrepant samples in the sets of 20 samples to be retested. Two factors in our PT design are susceptible of changes for improvement: the number of retested samples and its stratified structure. Increasing the number of retested samples would allow us to be more accurate at finding a number of discrepancies significantly higher than expected by chance. This has been exemplified in the simulation described above including 40 instead of 20 samples per laboratory per year. Obviously, this change also doubles the cost of the PT assessment and cost benefit equilibrium must be found. Regarding sampling structure, the aim of the PT program is analytical, and not clinical, and we have thus chosen to equally monitor throughout the dynamic range of the readout variable, instead of randomly sampling from the entire population. This strategy allows placing a special emphasis on the region in which the technique is less reproducible. In order to focus in the sensitive area in which samples were more likely to be discordant (0.5 5) we propose to apply the following sample structure for future PT assessment: five samples with RLU<0.5; ten samples with 0.5 RLU<1; ten samples with 1 RLU<5; and five samples with RLU 5. 350 Conclusions 351 352 353 354 355 356 The results of the PT assessment for the CC screening program in Catalonia have shown that the HC2 assay has a high inter laboratory concordance. The use of a common, standardized protocol with sharp anti contamination measures for samples and processing fluids in the twelve laboratories involved has been instrumental to achieve these excellent results. We have also detected a significant decrease in HC2 signal after retesting that cannot be linked to the time elapsed between consecutive tests, and that may be attributable to the additional freeze 15

357 358 359 360 361 362 363 thaw cycle. We have additionally explored in depth the differential repeatability along the dynamic response of the readout variable and have demonstrated that most discrepant results accumulate in the 0.5 5 RLU interval, with samples in this interval being 12.7 times more likely to show discrepant results than samples outside this interval. With this information we have further defined future recommendations and confidence thresholds for inter laboratory PT in the future when using HC2 as screening test, as analytical quality assessment of the HPVs DNA detection remains a central component of the screening program for CC prevention. 364 16

365 Authors contributions 366 367 368 Conceived the analyses: RI, IGB. Coordinated the screening activities: RI, SdS. Analysed the data: RI, MFS, IGB. Performed experiments: JMG, CG, EC, RJ, NC, BB, CD, JMC, LlP, JA, CG, JO. Drafted the manuscript: RI, MFS, IGB. All authors read and approved the final manuscript. 369 370 Acknowledgments 371 372 373 374 375 This study has been partially supported by the Pla Director d Oncologia of the Health Department in Catalonia and grants from the Instituto de Salud Carlos III (Spanish Government, grants RCESP C03/09, RTICESP C03/10, RTIC RD06/0020/0095, RD12/0036/0056, and CIBERESP) and from the Agència de Gestió d Ajuts Universitaris i de Recerca (Catalan Government, grants AGAUR 2005SGR 00695 and 2009SGR126). 376 377 378 379 SdS has received occasional travel grants from Merck and Qiagen and holds a restricted grant from Qiagen and from Merck not related to the study presented. None of the funding bodies had any role in study design; in collection, analysis, and data interpretation; in the writing of the manuscript; and in the decision to submit the manuscript for publication. 380 381 382 383 384 385 386 387 388 The authors want to thank Prof. Attila Lorincz for his comments and remarks to an initial version of the manuscript. We would like to thank the following persons for their collaboration in this study: Marylène Lejeune and Carlos López, Cristina Callau from Molecular Biology and Research Section, IISPV, URV, Hospital de Tortosa Verge de la Cinta (HTVC); Anna Ferran Gibert, Mercé Rey Ruhi, Ruth Orellana Fernandez and Irmgard Costa Tranchel from UDIAT CD, Hospital Universitari Parc Taulí; Roser Esteve from Barcelona Hospital Clínic; Montserrat Sardà Roca and Àngels Verdaguer Autonell from Consorci Hospitalari de Vic; Jo Ellen Klaustermeier, Vanesa Camón, Ana Esteban, Yolanda Florencia, Isabel Català and Núria Baixeras from Hospital universitari de Bellvitge Institut Català d Oncologia; José Maria Navarro Olivella, Carme 17

389 390 391 392 393 394 395 Perich Alsina, and Belen Viñado from laboratori Clínic Bon Pastor; Cristina Antúnes Vitales, Glòria Oliveras Serrat, Lluís Bernadó Turmo and Pilar Castro Marqueta from Institut Català d'oncologia (Girona); Ana Velasco Sánchez, Anna Serrate López, Marta Romero Fernández, Azahar Romero Jiménez and Nuria Llecha from Hospital Universitari Arnau de Vilanova; Vanessa Guerrero Hormiga and Ignacia Ramos Mora from Hospital Universitari Joan XXIII; Rosa Tenllado Gimenez and Adoración Velilla from Laboratori Barcelonès Nord i Vallès Oriental; and Belén Lloveras, Francesc Alameda and Merced Muset from Hospital del Mar. 396 18

397 REFERENCES 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 1. Anttila, A., L. Kotaniemi Talonen, M. Leinonen, M. Hakama, P. Laurila, J. Tarkkanen, N. Malila, and P. Nieminen. 2010. Rate of cervical cancer, severe intraepithelial neoplasia, and adenocarcinoma in situ in primary HPV DNA screening with cytology triage: randomised study within organised screening programme. BMJ 340:c1804. 2. Arbyn, M., G. Ronco, A. Anttila, C. J. Meijer, M. Poljak, G. Ogilvie, G. Koliopoulos, P. Naucler, R. Sankaranarayanan, and J. Peto. 2012. Evidence regarding human papillomavirus testing in secondary prevention of cervical cancer. Vaccine 30 Suppl 5:F88 99. 3. Brennan, P., and A. Silman. 1992. Statistical methods for assessing observer variability in clinical measures. BMJ 304:1491 4. 4. Carozzi, F. M., A. Del Mistro, M. Confortini, C. Sani, D. Puliti, R. Trevisan, L. De Marco, A. G. Tos, S. Girlando, P. D. Palma, A. Pellegrini, M. L. Schiboni, P. Crucitti, P. Pierotti, A. Vignato, and G. Ronco. 2005. Reproducibility of HPV DNA Testing by Hybrid Capture 2 in a Screening Setting. Am J Clin Pathol 124:716 21. 5. Castle, P. E., A. Hildesheim, M. Schiffman, C. A. Gaydos, A. Cullen, R. Herrero, M. C. Bratti, and E. Freer. 2003. Stability of archived liquid based cytologic specimens. Cancer 99:320 2. 6. Castle, P. E., C. M. Wheeler, D. Solomon, M. Schiffman, and C. L. Peyton. 2004. Interlaboratory reliability of Hybrid Capture 2. Am J Clin Pathol 122:238 45. 7. Cuzick, J., C. Bergeron, M. von Knebel Doeberitz, P. Gravitt, J. Jeronimo, A. T. Lorincz, J. L. M. M. C, R. Sankaranarayanan, J. F. S. P, and A. Szarewski. 2012. New technologies and procedures for cervical cancer screening. Vaccine 30 Suppl 5:F107 16. 8. de Sanjose, S., W. G. Quint, L. Alemany, D. T. Geraets, J. E. Klaustermeier, B. Lloveras, S. Tous, A. Felix, L. E. Bravo, H. R. Shin, C. S. Vallejos, P. A. de Ruiz, M. A. Lima, N. Guimera, O. Clavero, M. Alejo, A. Llombart Bosch, C. Cheng Yang, S. A. Tatti, E. Kasamatsu, E. Iljazovic, M. Odida, R. Prado, M. Seoud, M. Grce, A. Usubutun, A. Jain, G. A. Suarez, L. E. Lombardi, A. Banjo, C. Menendez, E. J. Domingo, J. Velasco, A. Nessa, S. C. Chichareon, Y. L. Qiao, E. Lerma, S. M. Garland, T. Sasagawa, A. Ferrera, D. Hammouda, L. Mariani, A. Pelayo, I. Steiner, E. Oliva, C. J. Meijer, W. F. Al Jassar, E. Cruz, T. C. Wright, A. Puras, C. L. Llave, M. Tzardi, T. Agorastos, V. Garcia Barriola, C. Clavel, J. Ordi, M. Andujar, X. Castellsague, G. I. Sanchez, A. M. Nowakowski, J. Bornstein, N. Munoz, and F. X. Bosch. 2010. Human papillomavirus genotype attribution in invasive cervical cancer: a retrospective cross sectional worldwide study. Lancet Oncol 11:1048 1056. 9. Departament de Salut. Direcció General de Planificació i Avaluació. Generalitat de Catalunya. 2007. Protocol de les Activitats per al Cribratge del Càncer de Coll Uterí a l Atenció Primària., Barcelona. 10. Gravitt, P. E., M. Schiffman, D. Solomon, C. M. Wheeler, and P. E. Castle. 2008. A comparison of linear array and hybrid capture 2 for detection of carcinogenic human papillomavirus and cervical precancer in ASCUS LSIL triage study. Cancer Epidemiol Biomarkers Prev 17:1248 54. 11. Hesselink, A. T., N. W. Bulkmans, J. Berkhof, A. T. Lorincz, C. J. Meijer, and P. J. Snijders. 2006. Cross sectional comparison of an automated hybrid capture 2 assay and the consensus GP5+/6+ PCR method in a population based cervical screening program. J Clin Microbiol 44:3680 5. 19

445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 12. IARC. 2012. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans. Volume 100B: A review on human carcinogens. International Agency for Research on Cancer., Lyon, France. 13. IARC. 2005. IARC Working Group on the Evaluation of Cancer Preventive Strategies. In I. Press (ed.), Cervix Cancer Screening, vol. 10, Lyon, France. 14. Landis, J. R., and G. G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33:159 74. 15. Meijer, C. J., H. Berkhof, D. A. Heideman, A. T. Hesselink, and P. J. Snijders. 2009. Validation of high risk HPV tests for primary cervical screening. J Clin Virol 46 Suppl 3:S1 4. 16. Muñoz, N., F. X. Bosch, S. de Sanjosé, R. Herrero, X. Castellsague, K. V. Shah, P. J. F. Snijders, and C. J. L. M. Meijer. 2003. Epidemiologic classification of human papillomavirus types associated with cervical cancer. N Engl J Med 348:518 527. 17. Naucler, P., W. Ryd, S. Tornberg, A. Strand, G. Wadell, K. Elfgren, T. Radberg, B. Strander, B. Johansson, O. Forslund, B. G. Hansson, E. Rylander, and J. Dillner. 2007. Human papillomavirus and Papanicolaou tests to screen for cervical cancer. N Engl J Med 357:1589 97. 18. Poljak, M., J. Cuzick, B. J. Kocjan, T. Iftner, J. Dillner, and M. Arbyn. 2012. Nucleic acid tests for the detection of alpha human papillomaviruses. Vaccine 30 Suppl 5:F100 6. 19. Rijkaart, D. C., J. Berkhof, L. Rozendaal, F. J. van Kemenade, N. W. Bulkmans, D. A. Heideman, G. G. Kenter, J. Cuzick, P. J. Snijders, and C. J. Meijer. 2012. Human papillomavirus testing for the detection of high grade cervical intraepithelial neoplasia and cancer: final results of the POBASCAM randomised controlled trial. Lancet Oncol 13:78 88. 20. Ronco, G., J. Dillner, K. M. Elfstrom, S. Tunesi, P. J. Snijders, M. Arbyn, H. Kitchener, N. Segnan, C. Gilham, P. Giorgi Rossi, J. Berkhof, J. Peto, and C. J. Meijer. 2013. Efficacy of HPV based screening for prevention of invasive cervical cancer: follow up of four European randomised controlled trials. Lancet. 21. Ronco, G., P. Giorgi Rossi, F. Carozzi, M. Confortini, P. Dalla Palma, A. Del Mistro, B. Ghiringhello, S. Girlando, A. Gillio Tos, L. De Marco, C. Naldoni, P. Pierotti, R. Rizzolo, P. Schincaglia, M. Zorzi, M. Zappa, N. Segnan, and J. Cuzick. 2010. Efficacy of human papillomavirus testing for the detection of invasive cervical cancers and cervical intraepithelial neoplasia: a randomised controlled trial. Lancet Oncol 11:249 57. 22. Sankaranarayanan, R., L. Gaffikin, M. Jacob, J. Sellors, and S. Robles. 2005. A critical assessment of screening methods for cervical neoplasia. Int J Gynaecol Obstet 89 Suppl 2:S4 S12. 23. Tukey, J. W. 1977. Exploratory data analysis. Addison Wesley, Reading. 24. Walboomers, J. M. M., M. V. Jacobs, M. M. Manos, F. X. Bosch, J. A. Kummer, K. V. Shah, P. J. F. Snijders, C. J. L. M. Meifer, and N. Munoz. 1999. Human papillomavirus is a necessary cause of invasive cervical cancer worldwide. J Pathol 189:12 19. 487 20

488 Figure legends 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 Figure 1. Scatterplot of the RLU values for the results of the HC2 test at the origin laboratory and for the paired proficiency test. Results are expressed as logarithmic values of relative luminiscence units (RLU) reads. The correlation coefficient for linear regression was 0.95 (pvalue<0.05). Figure 2. Scatterplot of relative luminiscence units (RLU) ratios between original and proficiency test as a function of time elapsed between the two tests. Results are expressed as logarithmic values of the ratio of RLU and time in days. The correlation coefficient obtained for linear regression was 0.003 (p value=0.219), and therefore no loss in signal can be attributed to the time elapsed between consecutive tests. Figure 3. Distribution of number of clinically discrepant results between the original laboratory and the Proficiency test (circles) and fit to a Poisson distribution (grey line). Accumulated probability for the detection of the corresponding number of discrepancies are shown above each point. By bootstrapping with replacement of the all samples, 1,000 virtual labs were generated and the number of discrepancies was calculated. With the same stratifies structure of samples used in this study (RLU<0.5; 0.5 RLU<1; 1 RLU<10; and RLU 10), two different scenarios were tested: (A) 20 samples, five from each interval; and (B) 40 samples, ten from each interval. Figure 4. Expected percentage of discrepancies along the RLU intervals 0.5 0.99 (A) and 1 9.99 (B). Mean percentage of discrepancies were obtained by bootstrapping. Grey area encompasses 95% confidence interval. RLU: Relative luminiscence units. 510 511 21

Table 1. Data from the general population compared to data from the reference HPV laboratories used for the HPV proficiency testing CATEGORIES OF RESULTS SCREENING GENERAL POPULATION * N (%) PROFICIENCY TESTING** N (%) HPV negative 36,600 (78) 452 (47.8) HPV positive 10,349 (22) 494 (52.2) Total 46,949 (100) 946 (100) RLU<0.5 34,580 (73.7) 253 (26.7) 0.5 RLU<1 2,020 (4.3) 199 (21) 1 RLU<10 2,563 (5.5) 246 (26) RLU 10 7,786 (16.6) 248 (26.2) * Data obtained from the HPV test performed to the screening general population in the period 2007-2009. ** Data obtained from the reference HPV laboratories for the HPV proficiency testing during the period 2008-2011.

Table 2: Comparison of tested laboratories and HPV Proficiency testing Laboratory paired Hybrid Capture 2 (HC2) Test Results Proficiency HC2 results of origin testing HC2 Total Analysis of agreement results NEGATIVE POSITIVE NEGATIVE 433 25 458 Kappa 0.91 POSITIVE 19 469 488 p-value 0.00 Total 452 494 946 RLU<0.5 0.5 RLU<1 1 RLU<10 RLU 10 Total Analysis of agreement RLU<0.5 243 81 11 1 336 0.5 RLU<1 10 99 13 0 122 Kappa 0.79 1 RLU<10 0 19 213 7 239 p-value 0.00 RLU 10 0 0 9 240 249 Total 253 199 246 248 946

Table 3. Comparison between the percentage of discrepancies between results at origin and results after proficiency testing, as a function of the relative luminescence units (RLU). The categories used in the current proficiency testing scheme and those proposed for the improved proficiency testing are shown. Categories of samples Mean probability of discrepancies, % (95% CI) Proficiency testing RLU/CO <0.5 0 (0) 0.5 RLU/CO <1 9.59 (6.02-13.58) 1 RLU/CO <10 9.87 (6.50-13.82) RLU/CO 10 0.41 (0-1.21) Proposed proficiency testing RLU/CO <0.5 0 (0) 0.5 RLU/CO <1 9.59 (6.02-13.58) 1 RLU/CO <5 12.56 (7.64-18.47) RLU/CO 5 1.18 (0.30-2.87)