Examining the prevalence of rheumatoid arthritis in data from the Clinical Practice Research Datalink

Julian Gardiner, Michael Soljak, Department of Primary Care & Public Health
Benjamin Ellis, Arthritis Research UK

Funded by Arthritis Research UK Strategic Grant

Background

ARUK MSK Calculator project
  - Phase 2A: refinement of Phase 1 data
  - Phase 2B: data discovery (including rheumatoid arthritis (RA) prevalence model)
  - Phase 2C: external validation
  - Phase 2D: linked wider analyses
  - Phase 2E: apply prevalence models to other UK countries' data
  - Phase 2F: value estimation
  - Phase 2G: support for web developers

Aims of RA prevalence model development 1

  - to identify cases of RA in the CPRD in three different ways:
      1. doctor diagnosis (NB QOF register)
      2. develop an algorithm for the EULAR / ACR diagnostic criteria [1]
      3. identify patients taking disease-modifying anti-rheumatic drugs (DMARDs) without another disease indication
  - to estimate the prevalence of RA identified in these three ways
  - to model risk factors for RA in a case-control / non-case analysis using logistic regression

[1] Aletaha D, Neogi T, Silman AJ. 2010 Rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Annals of the Rheumatic Diseases. 2010;69(10).

Aims of RA prevalence model development 2

  - to apply the derived odds ratios to risk factor subcategories in general practice and Middle Layer Super Output Area (MLSOA) populations
  - to estimate the time from first RA clinical manifestation, and from the diagnostic algorithm being met, to the time of diagnosis entry [1]
  - to perform a geospatial comparison of observed/expected prevalence of RA

[1] Raza, K. and A. Filer (2014). "The therapeutic window of opportunity in rheumatoid arthritis: does it ever close?" Annals of the Rheumatic Diseases 74(5): 793-794.

Methods: data extraction

We extracted data from the CPRD for all patients with one or more Read/medcodes from a list of RA-related items.

Medcodes were in four categories:
  - doctor diagnoses of RA
  - joint inflammation
  - acute phase reactant (APR) tests: C-reactive protein (CRP), erythrocyte sedimentation rate (ESR)
  - serology tests: rheumatoid factor (RF), anti-citrullinated protein antibody (ACPA)

For example:
  37131   Effusion of PIP joint of finger
  38980   Effusion of DIP joint - finger
  53659   Effusion of hip
  17658   Effusion of knee
  65998   Effusion of tibio-fibular joint
  27746   Effusion of ankle
  94322   Effusion of subtalar joint
  91298   Effusion of talonavicular joint
  73723   Effusion of lesser MTP joint
  62465   Effusion of IP joint of toe
  2695    Synovitis of hip
  43238   Synovitis of knee
  etc. (130 medcodes in all)

Removal of conflicting diagnoses

RA cases with a more recent alternative diagnosis will be excluded:
  - Psoriatic arthropathy
  - Ankylosing spondylitis
  - Sacroiliitis NEC
  - Spondylitis NOS
  - Psoriatic arthritis
  - Psoriatic arthropathy NOS
  - FH: Ankylosing spondylitis
  - Inflammatory spondylopathies
  - Psoriasis spondylitica
  - BASDAI - Bath ankylosing spondylitis disease activity index
  - Juvenile ankylosing spondylitis
  - Spinal enthesopathy
  - Arthritis mutilans
  - Other inflammatory spondylopathies
  - Other inflammatory spondylopathies NOS
  - Marie - Strumpell spondylitis
  - Distal interphalangeal psoriatic arthropathy
  - Inflammatory spondylopathies in diseases EC
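
The exclusion rule above can be sketched as a date comparison. This is only an illustration: the function name and inputs are hypothetical, and it assumes that "more recent" means an alternative diagnosis recorded after the patient's latest RA diagnosis record, which the slides do not spell out.

```python
from datetime import date
from typing import Sequence


def keep_as_ra_case(ra_dates: Sequence[date], alt_dates: Sequence[date]) -> bool:
    """Return True if the patient stays in the RA case group.

    ra_dates:  dates of RA diagnosis records (hypothetical input)
    alt_dates: dates of records for any listed alternative diagnosis
    Assumption: a case is excluded when the latest alternative diagnosis
    is later than the latest RA diagnosis record.
    """
    if not ra_dates:
        return False  # no RA diagnosis record, so not a case to begin with
    if not alt_dates:
        return True   # no conflicting diagnosis recorded
    return max(alt_dates) <= max(ra_dates)


# RA diagnosed in 2005, psoriatic arthropathy recorded in 2009: excluded
print(keep_as_ra_case([date(2005, 3, 1)], [date(2009, 6, 15)]))
# Alternative diagnosis in 2001, RA recorded later in 2005: kept
print(keep_as_ra_case([date(2005, 3, 1)], [date(2001, 1, 10)]))
```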
EULAR/ACR RA diagnostic algorithm

Algorithm steps (score ≥ 6: diagnose as case of RA)

A. Joint involvement
   1 large joint                               Score 0
   2-10 large joints                           Score 1
   1-3 small joints                            Score 2
   4-10 small joints                           Score 3
   >10 joints (at least one small joint)       Score 5

B. Serology
   Negative RF and negative ACPA               Score 0
   Low-positive RF or low-positive ACPA        Score 2
   High-positive RF or high-positive ACPA      Score 3

C. Acute-phase reactants
   Normal CRP and normal ESR                   Score 0
   Abnormal CRP or abnormal ESR                Score 1

D. Duration of symptoms
   <6 weeks                                    Score 0
   ≥6 weeks                                    Score 1

(must occur within 3 months of joint involvement to contribute to the score)
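
The scoring steps above can be written as a small function. This is a minimal sketch, not the project's implementation: all function and parameter names are hypothetical, the 3-month window is not modelled, and treating more than 10 large joints with no small joint as Score 1 is an assumption, since the table does not cover that case.

```python
SEROLOGY_POINTS = {"negative": 0, "low": 2, "high": 3}


def joint_points(large_joints: int, small_joints: int) -> int:
    """Section A: points for joint involvement from joint counts."""
    if large_joints + small_joints > 10 and small_joints >= 1:
        return 5
    if small_joints >= 4:
        return 3
    if small_joints >= 1:
        return 2
    if large_joints >= 2:
        return 1  # assumption: also used for >10 large joints with no small joint
    return 0


def ra_score(large_joints: int, small_joints: int, serology: str,
             apr_abnormal: bool, symptom_weeks: float) -> int:
    """Total score: sections A (joints) + B (serology) + C (APR) + D (duration)."""
    score = joint_points(large_joints, small_joints)
    score += SEROLOGY_POINTS[serology]          # B: highest of RF / ACPA level
    score += 1 if apr_abnormal else 0           # C: abnormal CRP or ESR
    score += 1 if symptom_weeks >= 6 else 0     # D: symptom duration
    return score


def is_ra_case(score: int) -> bool:
    """A total score of 6 or more is classified as RA."""
    return score >= 6


# 4 small joints (3) + high-positive serology (3) reaches the threshold
print(is_ra_case(ra_score(0, 4, "high", False, 2)))
```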

Candidate patients

The CPRD extract contained over 3,000,000 patients. To make the algorithm more tractable, we filtered the patients to produce a list of candidate patients.

The candidate patients for the RA diagnostic algorithm were patients with:
  1. a record of joint involvement
  2. a test result for either an APR or a serology test (or both)

There were 136,036 such patients.
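
The candidate filter is a simple conjunction of the two conditions above. The sketch below uses made-up patient records and hypothetical field names purely to illustrate the logic; it does not reflect the CPRD table layout.

```python
# Hypothetical per-patient flags; real CPRD data would be derived from medcode records.
patients = [
    {"id": 1, "joint_record": True,  "apr_test": True,  "serology_test": False},
    {"id": 2, "joint_record": True,  "apr_test": False, "serology_test": False},
    {"id": 3, "joint_record": True,  "apr_test": True,  "serology_test": True},
    {"id": 4, "joint_record": False, "apr_test": True,  "serology_test": True},
]


def is_candidate(patient: dict) -> bool:
    """Joint involvement record AND at least one APR or serology test result."""
    return patient["joint_record"] and (patient["apr_test"] or patient["serology_test"])


candidate_ids = [p["id"] for p in patients if is_candidate(p)]
print(candidate_ids)  # [1, 3]
```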

Results: records of large joint involvement

Number of records    Frequency        %
0                       99,908    73.44
1                       26,371    19.39
2                        6,541     4.81
3                        1,694     1.25
4                          772     0.57
5                          320     0.24
6                          172     0.13
7                           93     0.07
8                           63     0.05
9                           41     0.03
10 or more                  61     0.04
Total                  136,036   100.00

Results: records of small joint involvement

Number of records    Frequency        %
0                      135,034    99.26
1                          717     0.53
2                          243     0.18
3                           28     0.02
4                           12     0.01
5 or more                    2     0.00
Total                  136,036   100.00

Results: records of site not specified

Number of records    Frequency        %
0                       33,499    24.63
1                       81,631    60.01
2                       14,341    10.54
3                        3,797     2.79
4                        1,480     1.09
5                          604     0.44
6                          255     0.19
7                          171     0.13
8                           89     0.07
9                           46     0.03
10 or more                 123     0.09
Total                  136,036   100.00

Revising the algorithm

Problems with joint counting
  - The laterality of the joints is not specified (except possibly in hard-to-access free text), so, for example, two records of knee involvement could be one joint or two.
  - The number of joints involved is not specified, so, for example, a record of finger joint involvement could mean anything from 1 to 8 joints.
  - In a large majority of cases the joint location is not specified.

Revised algorithm
  Algorithm A (strict): if there is any record of joint involvement, score 1 point
  Algorithm B (lax):    if there is any record of joint involvement, score 2 points

Joint involvement scoring in the original criteria:
   1 large joint                               Score 0
   2-10 large joints                           Score 1
   1-3 small joints                            Score 2
   4-10 small joints                           Score 3
   >10 joints (at least one small)             Score 5
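
One way to read the revised rule is as a floor on the joint score: any record of joint involvement earns at least 1 point (A) or 2 points (B), and a higher count-based score is kept when the records support it. The sketch below encodes that reading; the function name and the `count_based_points` input are hypothetical, and the floor interpretation is an assumption based on the "minimum" wording used for the two versions.

```python
JOINT_FLOOR = {"A": 1, "B": 2}  # A = strict, B = lax


def revised_joint_points(any_joint_record: bool, count_based_points: int,
                         variant: str) -> int:
    """Joint-section points under the revised algorithm.

    count_based_points: points the original joint table would give from the
    coded records (0, 1, 2, 3 or 5); hypothetical input.
    """
    if not any_joint_record:
        return 0
    return max(JOINT_FLOOR[variant], count_based_points)


# A record with unspecified site scores the floor under each variant
print(revised_joint_points(True, 0, "A"), revised_joint_points(True, 0, "B"))  # 1 2
```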

So two versions of the algorithm were used

Algorithm A (strict): score minimum of 1 point for joint involvement
Algorithm B (lax): score minimum of 2 points for joint involvement

Score from joint section     Algorithm A (strict)      Algorithm B (lax)
of algorithm                 Frequency        %        Frequency        %
0                                1,120     0.82            1,120     0.82
1                              133,914    98.44                0     0.00
2                                  988     0.73          134,902    99.17
3                                   14     0.01               14     0.01

Total algorithm scores for strict and lax joint involvement

Total score from     Algorithm A (strict)      Algorithm B (lax)
algorithm            Frequency        %        Frequency        %
0                          966     0.71              966     0.71
1                       51,479    37.84                0     0.00
2                       54,438    40.02           51,920    38.17
3                        1,222     0.90           54,527    40.08
4                       11,346     8.34              701     0.52
5                       16,483    12.12           11,418     8.39
6                           98     0.07           16,493    12.12
7                            4     0.00               11     0.01

Total possible? RA cases after conflicting disease exclusions

Cases of RA                              Before exclusion of patients    After exclusion of patients
                                         with alternative diagnosis      with alternative diagnosis
Doctor diagnosed cases                                         89,675                         88,299
Additional algorithm diagnosed cases
  Algorithm A (strict)                                             70                             68
  Algorithm B (lax)                                            13,321                         12,928

Prevalence of RA in CPRD data: 1960-2014

Prevalence columns are in cases per million.

Year(s)      Doctor        Additional algorithm    Total         Additional algorithm cases
             diagnosed RA  diagnosed RA            prevalence    as % of doctor diagnosed cases
1960-1964         186.4             0                  186.4          0
1965-1969         255.3             0                  255.3          0
1970-1974         342.2             0                  342.2          0
1975-1979         452.5             0.2                452.7          0
1980-1984         586.5             0.5                587            0.1
1985-1989         753.4             1.2                754.6          0.2
1990-1994        1238.4            17.1               1255.5          1.4
1995-1999        1841.6            89.9               1931.5          4.9
2000-2004        2540.3           303.8               2844           12
2005-2009        3461.6           615.1               4076.6         17.8
2010             3948.3           783.8               4732.1         19.9
2011             4094.9           830.2               4925.1         20.3
2012             4272.7           874.8               5147.5         20.5
2013             4552.3           911.9               5464.2         20
2014             4877.4           942.1               5819.4         19.3
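
The derived columns in this table follow from the first two: total prevalence is the sum of the doctor diagnosed and additional algorithm prevalences, and the last column is the additional prevalence as a percentage of the doctor diagnosed prevalence. A short check using the 2012 row (the helper name is hypothetical):

```python
def derived_columns(doctor_per_million: float, additional_per_million: float):
    """Return (total prevalence per million, additional cases as % of doctor diagnosed)."""
    total = doctor_per_million + additional_per_million
    pct_additional = 100.0 * additional_per_million / doctor_per_million
    return round(total, 1), round(pct_additional, 1)


total_2012, pct_2012 = derived_columns(4272.7, 874.8)
print(total_2012, pct_2012)  # 5147.5 20.5
```

Small differences in the last digit of some rows (for example 2000-2004) are consistent with the published figures having been rounded after the calculation.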

Comparison with previous RA prevalence estimates

                                                Men        Women
CPRD (1): doctor diagnosed cases                0.290 %    0.672 %
CPRD (2): doctor diagnosed + lax algorithm      0.353 %    0.795 %
Norfolk study [1]                               0.44 %     1.16 %

[1] Symmons, D., et al. (2002). "The prevalence of rheumatoid arthritis in the United Kingdom: new estimates for a new century." Rheumatology (Oxford) 41(7): 793-800.

Prevalence patterns by age group and time (males) 1

Prevalence of doctor diagnosed RA (cases per million people)

Year      18-44      45-64      65-74        75+
2000      287.3     2486.6     6039.1     7594.3
2001      305.7     2582.9     6356.3     7856.3
2002      320.5     2678.6     6684.2     8091.4
2003      331.9     2799.9     7020.9     8435.9
2004      344.3     2917.2     7399       8847.7
2005      360.6     3018.1     7612.8     9299.9
2006      384.1     3125.4     7878.3     9539.6
2007      394.5     3214.4     8079.4     9718.2
2008      404.8     3259.8     8243.2     9948.1
2009      426.8     3249.3     8483.2    10293.5
2010      437.8     3251.8     8538.6    10407
2011      440.6     3260.7     8622.8    10468.7
2012      452.3     3241.7     8693      10672.5
2013      476.7     3350.9     8991.7    11033
2014      506.3     3465.7     9341.1    11510.3

Prevalence patterns by age group and time (males) 2

Prevalence of additional algorithm diagnosed RA (cases per million people)

Year      18-44      45-64      65-74        75+
2000       50.4      346.1      441.6      202.8
2001       64.2      429.1      566.8      292.1
2002       76.2      524.1      710.8      390
2003       88.6      619.5      858.2      484.2
2004       97.6      714        938.6      614.4
2005      109.1      793.4     1109.1      719.3
2006      120.4      843.4     1242.4      822.3
2007      132.5      896.4     1360.4      896.9
2008      143        958.8     1437       1040.6
2009      147.8      984.2     1529.5     1154.9
2010      159        995.6     1659.5     1291.2
2011      161       1004       1758.6     1373.8
2012      159.4     1021.3     1780.6     1456.5
2013      154.3     1013.7     1835.2     1525
2014      150.1      990.1     1893       1565.6

Prevalence patterns by age group and time (males) 3

Percentage change in the doctor diagnosed prevalence from the addition of algorithm diagnosed cases

Year      18-44      45-64      65-74        75+
2000       17.5       13.9        7.3        2.7
2001       21         16.6        8.9        3.7
2002       23.8       19.6       10.6        4.8
2003       26.7       22.1       12.2        5.7
2004       28.3       24.5       12.7        6.9
2005       30.2       26.3       14.6        7.7
2006       31.3       27         15.8        8.6
2007       33.6       27.9       16.8        9.2
2008       35.3       29.4       17.4       10.5
2009       34.6       30.3       18         11.2
2010       36.3       30.6       19.4       12.4
2011       36.5       30.8       20.4       13.1
2012       35.2       31.5       20.5       13.6
2013       32.4       30.3       20.4       13.8
2014       29.6       28.6       20.3       13.6

Discussion

Are the algorithm cases really cases of RA?

Scoring 2 points for any joint involvement, our algorithm B (lax) cases have:
  (i)  some joint involvement + low-positive RF test + positive APR test within 3 months of joint involvement + symptom duration > 6 weeks
  OR
  (ii) some joint involvement + high-positive RF test + EITHER positive APR test within 3 months of joint involvement OR symptom duration > 6 weeks

We may do better to think of these as high-risk individuals or pre-cases.

The decline in the rate of additional algorithm cases with increasing age (shown in the preceding age-group results) is consistent with algorithm cases being high-risk individuals (or pre-cases), many of whom then go on to develop doctor diagnosed RA.
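
The two routes (i) and (ii) can be checked by enumerating every combination of the remaining sections when the joint section is fixed at 2 points, as in algorithm B (lax). This is a small verification sketch with hypothetical names, using the section scores from the algorithm steps; serology is labelled by level only, whichever of RF or ACPA is positive.

```python
from itertools import product

JOINT_POINTS_LAX = 2
SEROLOGY = {"negative": 0, "low-positive": 2, "high-positive": 3}
THRESHOLD = 6

qualifying = []
for (serology, s_pts), apr_abnormal, long_duration in product(
        SEROLOGY.items(), (False, True), (False, True)):
    total = JOINT_POINTS_LAX + s_pts + int(apr_abnormal) + int(long_duration)
    if total >= THRESHOLD:
        qualifying.append((serology, apr_abnormal, long_duration, total))

for row in qualifying:
    print(row)
# Four combinations qualify: low-positive needs both APR and duration;
# high-positive needs at least one of them. Negative serology never qualifies.
```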

Next steps: DMARDs and regression modelling

Identifying a third group of potential cases: those taking DMARD drugs
  Identification of a further group of (potential) RA cases, in addition to the doctor diagnosed and algorithm groups: those taking DMARD drugs who have no other diagnosis to explain the prescription.

Regression modelling
  We will explore the risk factors for RA in case / control logistic regression models, considering the three groups of cases / potential cases identified:
  - Doctor diagnosed RA cases
  - Algorithm diagnosed RA cases
  - Patients taking DMARDs with no other known cause

Next steps: regression modelling / known risk factors

Known risk factors for RA
  - Sex
  - Age group
  - Smoking
  - Alcohol intake
  - BMI
  - Ethnicity
  - Occupation
  - Parity
  - Deprivation
  - Education

Conclusion

  - Working with CPRD data has the advantage of an extremely large sample size.
  - The potential lack of uniformity in the data presents some challenges when interpreting results.
  - It is important to make use of validation and sensitivity analysis where possible.

Abbreviations

  ACPA     Anti Citrullinated Protein Antibody (test)
  ACR      American College of Rheumatology
  APR      Acute Phase Reactants
  ARUK     Arthritis Research UK
  CPRD     Clinical Practice Research Datalink
  CRP      C-Reactive Protein (test)
  ESR      Erythrocyte Sedimentation Rate (test)
  EULAR    European League Against Rheumatism
  MSK      Musculoskeletal
  QOF      Quality and Outcomes Framework
  RA       Rheumatoid Arthritis
  RF       Rheumatoid Factor (test)

References

Aletaha D, Neogi T, Silman AJ. 2010 Rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Annals of the Rheumatic Diseases. 2010;69(10).

Raza, K. and A. Filer (2014). "The therapeutic window of opportunity in rheumatoid arthritis: does it ever close?" Annals of the Rheumatic Diseases 74(5): 793-794.

Symmons, D., et al. (2002). "The prevalence of rheumatoid arthritis in the United Kingdom: new estimates for a new century." Rheumatology (Oxford) 41(7): 793-800.