Outcome measures in neuroscience clinical trials

Jang-Ho Cha, Head, Translational Medicine Neuroscience
Novartis Institute for BioMedical Research
February 2017

Outline
- What is our problem? Poor success rate in CNS drug development
- Why do we have this problem? Indirect measures, lack of sensitivity to change, placebo effects
- How can we do things differently? Examples of novel approaches
World CNS Summit February 2017 J.Cha

What is our problem? Poor success rate in CNS drug development

Low success rate in CNS drug development
[Figure: likelihood of approval (LOA) from Phase I. Source: Tufts CSDD, 2017]

The time it takes to reach approval once a drug is in clinical trials: 2010-14
[Figure. Source: Tufts CSDD, 2017]

Poor success in CNS drug development
A number of factors:
- Lost in translation: poor predictive value of animal models of disease (e.g. the forced swim test, FST, for depression)
- Diagnostic imprecision: diagnoses are based on DSM clinical criteria
- Disease heterogeneity: most neuroscience indications are not a single disease (e.g. ALS: SOD1, C9orf72, FUS/TLS)
- Weak clinical outcome measures

Why do we have this problem? Lack of sensitivity to change and placebo effects

No fluid biomarkers (BMx) as an outcome measure
- Clinical outcome measures are largely based on paper-and-pencil cognitive tests and subjective rating scales
- Neuroscience lacks informative fluid (blood, CSF) biomarkers of the kind used elsewhere:
  - CV: serum cholesterol
  - ID: viral load
  - Diabetes: HgbA1c

Tests by indication (test, examples of subtests, author/s, year)
Alzheimer's disease
- MMSE (Folstein et al., 1975)
- ADAS-Cog (Rosen et al., 1984)
- NTB (Harrison, 2007). The NTB includes:
  - Items from the Wechsler Memory Scale (Wechsler, 1987)
  - Rey Auditory Verbal Learning Test (Rey, 1964)
  - Category Fluency Test (Reitan, Ralph, 1985)
Parkinson's disease
- UPDRS. The UPDRS includes e.g.:
  - Hoehn and Yahr scale (Hoehn and Yahr, 1967)
  - Schwab and England activities of daily living scale (Schwab and England, 1969)
Schizophrenia
- MATRICS. The MATRICS includes e.g.:
  - Trail Making Test (Reitan, 1944)
  - Wechsler Memory Scale, 3rd Ed. (Wechsler, 1997)
  - Hopkins Verbal Learning Test (Brandt, 1991)
- PANSS (Kay et al., 1987)
Depression
- MADRS (Montgomery and Asberg, 1979)
- HAMD (Hamilton, 1960)

Regulatory endpoints are dated
[Timeline figure, 1940-2010. Endpoints shown: RAVLT, MMSE, ADAS-Cog, ADCS-ADL, WMS subscales, CDR, COWAT; 2007: the NTB; present day: composite developments. Reference points: Colossus, the world's first programmable, electronic, digital computer; first Apple Mac (Macintosh); MCI criteria.]

Regulatory endpoints are dated
[Timeline figure, 1940-2010. Endpoints shown: TMT, HAMD, Schwab & England, Hoehn and Yahr scale, MADRS, PANSS, HVLT, UPDRS; 2008: MCCB. Reference points: Colossus, the world's first programmable, electronic, digital computer; first Apple Mac (Macintosh).]

Limitations result in large and lengthy POCs
- The field relies on paper-pencil tests and rating scales
- Traditional endpoints were developed for detecting disease pathology
- Complex, burdensome and costly
- Major issues:
  - Variability
  - Psychometric limitations
- Large, lengthy and costly POCs that may not translate to larger trials

Variability
- Variability adds noise to clinical trial data
- Numerous sources of variability from the beginning of the trial to the end:
  - Inter- and intraindividual variability
    - Heterogeneous patient populations
    - Day-to-day fluctuations
  - Rater variability
    - Variability in how the test is administered and scored (rater errors and bias)
    - Rater turnover (especially in longer clinical trials)
  - Site and country variability (especially in larger clinical trials)
- The noisier the data, the more difficult it is to detect a drug signal

Psychometric limitations
- Poor intra-rater reliability
- Practice effects
- Lack of sensitivity to change
- Ceiling and floor effects
  - The outcome measures are either too easy or too difficult, making it difficult to demonstrate cognitive enhancement or decline
  - The choice of outcome measure should match the severity of cognitive impairment being studied
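One simple way to screen a candidate outcome measure for ceiling and floor effects is to count how many participants already sit at the scale bounds at baseline. The sketch below is illustrative only: the function names, the example scores, the 0-30 scale with higher scores meaning better performance, and the 0.15 flagging threshold are all assumptions, not values taken from the slides.

```python
def fraction_at_bounds(scores, min_score, max_score):
    """Return (share at floor, share at ceiling) for a list of test scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    at_floor = sum(1 for s in scores if s <= min_score) / n
    at_ceiling = sum(1 for s in scores if s >= max_score) / n
    return at_floor, at_ceiling


def flag_bound_effects(scores, min_score, max_score, threshold=0.15):
    """Flag a scale when too many participants sit at either bound.

    The 0.15 default is an illustrative assumption, not a value from the slides.
    """
    at_floor, at_ceiling = fraction_at_bounds(scores, min_score, max_score)
    return {"floor": at_floor > threshold, "ceiling": at_ceiling > threshold}


# Hypothetical baseline scores on a 0-30 scale where higher is better
baseline = [30, 29, 30, 27, 30, 30, 28, 30, 25, 30]
print(fraction_at_bounds(baseline, 0, 30))
print(flag_bound_effects(baseline, 0, 30))
```

A population in which most participants start at the maximum leaves little room to show enhancement, which is the mismatch between measure and impairment severity described above.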

Ceiling effects are seen on ADAS-Cog
- Items at floor are too difficult (= max possible score)
- Items at ceiling are too easy
[Figure. Source: Grundman et al. (2004)]

Psychometric limitations
- Intra-rater reliability
- Sensitivity to change
- Practice effects
- Ceiling and floor effects
  - The outcome measures are either too easy or too difficult, making it difficult to demonstrate cognitive enhancement or decline
  - The choice of outcome measure should match the severity of cognitive impairment being studied
- Cognition is categorized into different domains
  - No test is a pure measure of memory, language or executive function
  - Some test batteries do not cover all cognitive domains
  - ADAS-Cog does not capture executive function

ADAS-Cog: no attention or executive function
- Primarily a measure of memory, language and praxis
[Figure. Source: Grundman et al. (2004)]

Cognitive testing is surprisingly costly
- Translations, licensing/version control, standardizations ($$$)
- Most traditional paper & pencil tests require:
  - High skill level
  - Extensive administrator training ($$)
  - Monitoring ($$$)
  - Data management
- Burdensome to patients
- Burdensome to site personnel
- Large, lengthy and expensive POCs that may not translate well to larger trials

Increasing placebo response on PANSS in schizophrenia trials
[Figure: axis running from "Greater placebo response" to "Less placebo response". Source: Alphs et al., 2012]

Small effect sizes in Alzheimer's disease: ADAS-Cog primary endpoint
Columns: drug/brand name; approved for; FDA approved; primary endpoint; trial; n; ES
- Donepezil/Aricept; all stages; 1996; ADAS-Cog; DON-302; 303; 0.53
- Galantamine/Razadyne; mild to moderate; 2001; ADAS-Cog; GAL-INT-I; 435; 0.50
- Memantine/Namenda; moderate to severe; 2003
- Rivastigmine/Exelon; all stages; 2000; ADAS-Cog; RIV-B303; 480; 0.23
- Donepezil and memantine/Namzaric; moderate to severe; 2014; ADCS-ADL
Effect sizes: Very Small=0.01, Small=0.20, Medium=0.50, Large=0.80, Very Large=1.20, Huge=2.0 (Cohen, 1988; Sawilowsky, 2009)

Small effect sizes in Alzheimer's disease: ADAS-Cog primary endpoint
[Figure: ADAS-Cog estimates of the size of the treatment effect (Cohen's d, ITT, by dose) for low-dose and high-dose ChEIs. Source: Rockwood, 2004]
Effect sizes: Very Small=0.01, Small=0.20, Medium=0.50, Large=0.80, Very Large=1.20, Huge=2.0 (Cohen, 1988; Sawilowsky, 2009)
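For readers less familiar with Cohen's d, the following is a minimal sketch of how a standardized effect size can be computed from two groups and mapped onto the bands quoted above. It assumes the pooled-standard-deviation form of d and treats each quoted value as the lower bound of its band; the function names, the "negligible" label for values below 0.01, and the example data are hypothetical and do not come from the cited trials.

```python
import math
from statistics import mean, variance


def cohens_d(treated, placebo):
    """Cohen's d: difference in group means divided by the pooled SD."""
    n1, n2 = len(treated), len(placebo)
    pooled_var = (
        (n1 - 1) * variance(treated) + (n2 - 1) * variance(placebo)
    ) / (n1 + n2 - 2)
    return (mean(treated) - mean(placebo)) / math.sqrt(pooled_var)


# Bands as quoted on the slides, read here as lower bounds (an assumption)
THRESHOLDS = [
    (2.0, "huge"),
    (1.20, "very large"),
    (0.80, "large"),
    (0.50, "medium"),
    (0.20, "small"),
    (0.01, "very small"),
]


def label_effect_size(d):
    """Map |d| onto the descriptive bands; 'negligible' is our own label."""
    magnitude = abs(d)
    for cutoff, label in THRESHOLDS:
        if magnitude >= cutoff:
            return label
    return "negligible"


# Hypothetical improvement scores (higher = more improvement)
treated = [2.0, 4.0, 6.0]
placebo = [1.0, 3.0, 5.0]
d = cohens_d(treated, placebo)
print(d, label_effect_size(d))
```

Under this reading, the effect sizes listed for the approved Alzheimer's drugs fall in the small to medium bands.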

To summarize: many of the traditional endpoints are not appropriate for today's clinical trials
- For example: cognitive tests that are too easy and subjective rating scales that result in a high placebo response
- Impact:
  - Small ES drives larger study samples
  - More expensive trials
  - Longer trials
  - NS indications relatively less attractive
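The link between small effect sizes and large samples can be made concrete with a standard normal-approximation formula for a two-arm parallel trial, n per arm = 2 (z_{1-alpha/2} + z_{power})^2 / d^2. The sketch below is an illustration under stated assumptions (two-sided alpha of 0.05, 80% power, equal allocation, normal approximation rather than an exact t-test calculation); the function name and defaults are ours, not the presenter's.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per arm needed to detect Cohen's d = effect_size."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Conventional "small", "medium" and "large" effect sizes
for d in (0.20, 0.50, 0.80):
    print(d, n_per_arm(d))
```

Because the required sample grows with 1/d^2, halving the effect size roughly quadruples the number of participants needed per arm.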

How can we do things differently? Examples of novel approaches

What is being done?
- Improving sensitivity of clinical endpoints by adding or removing test items, e.g. composite endpoint development
- Improving administration, e.g. thorough rater training and use of tablets/iPad
- Optimizing patient selection, e.g. enriched enrollment of drug-responsive patients
- Optimizing the study design, e.g. run-in/training periods and longer trials
- Addressing day-to-day variability, e.g. more frequent administrations

Does repeated computerized testing improve sensitivity?
A study on the cognitive effects of donepezil in AD
Jaeger, J., Hårdemark, H. G., Zettergren, A., Sjögren, N., & Hannesdottir, K. Presented at AAIC 2011.

Does repeated computerized testing improve sensitivity?
- No significant donepezil signal relative to placebo at 4 weeks
- A donepezil memory signal emerged at 8 and 12 weeks on the (averaged) OCL test and on NTB memory subtests
  - Not on ADAS-Cog
- Sensitivity of OCL was further improved if the first two baseline measurements were excluded
Jaeger, J., Soaita, A., Gale, J., & Hannesdottir, K. Presented at CTAD 2013.
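One reason averaging repeated administrations can help is that independent day-to-day noise partly cancels out: if a single score has between-subject SD sd_between and within-subject (day-to-day) SD sd_within, the mean of k administrations has variance sd_between^2 + sd_within^2 / k. The sketch below illustrates that arithmetic only. It assumes independent, equal-variance noise across administrations and ignores practice effects; the function name and numbers are hypothetical and are not results from the study cited above.

```python
import math


def effect_size_with_averaging(mean_diff, sd_between, sd_within, k):
    """Standardized effect size when each score is the mean of k administrations."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Within-subject noise variance shrinks by a factor of k when averaging
    total_sd = math.sqrt(sd_between ** 2 + sd_within ** 2 / k)
    return mean_diff / total_sd


# Hypothetical scale: 3-point drug effect, equal between- and within-subject SD
for k in (1, 2, 4, 8):
    print(k, round(effect_size_with_averaging(3.0, 6.0, 6.0, k), 3))
```

The gain levels off as k grows, because between-subject variability is not reduced by repeated testing.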

An urgent need for innovation
- Improved outcome measures
  - Measures that are objective (physiology, sensors)
  - Require less administrator involvement and training
  - Noninvasive
  - Easy to administer and brief
  - Accurately track things like reaction speed
  - Clearly translate to patient's ability to function in daily life
- Biomarkers to inform clinical development
- Larger effect sizes (smaller sample sizes, shorter trials)
- There is no one-size-fits-all in clinical trials
  - Carefully think about clinical endpoint strategy for each drug, patient population and study design

MRI: a technology that revolutionized progress in multiple sclerosis
Faster, more informative trials
- 1995-2015: Expanded Disability Status Scale (EDSS); traditional clinical endpoint
- 2000-2009: Interferons; MRI (T1, T2 Gd+ lesions); exploratory endpoint, Phase 2 decision making
- 2010-2015: Gilenya, Tysabri, Tecfidera, Lemtrada; black holes -> axonal loss; volume loss -> brain atrophy; additional descriptors of clinical benefit
- From "No therapies" to "Many therapies"

The landscape of digital devices

Improved technology in epilepsy trials
From pen and paper to automatic detection: improve measurement accuracy
- Seizure diary (current primary endpoint)
- Device-assisted seizure eDiary
- Automatic seizure detection

Improved assessments of movement
Using wearable devices to capture objective motor function data
- Short Physical Performance Battery: balance, gait and chair stand ability
- Timed Up & Go Test: time to rise from chair, walk 3 meters, turn, sit down
- 2-minute walk test: distance walked in 2 minutes
Traditional subjective data vs. continuous quantitative sensor data

Computer games in CNS trials? Example: Evo from Akili
- Evo is a fast-paced, self-administered, dual-task tablet game that measures the ability to pay attention, plan and make decisions
  - Deficits in these abilities commonly occur in CNS disorders like Alzheimer's disease, ADHD and depression
- A study of almost 100 healthy elderly subjects with and without amyloid deposits (a biomarker of Alzheimer's disease, confirmed by PET imaging)
  - The goal was to investigate the potential of the game as a biomarker or clinical endpoint for use in future Alzheimer's trials
  - Cognition was assessed at baseline and over the course of one month of game play
  - Evo was able to differentiate between healthy subjects with and without amyloid deposits in their brain

Digital/wearable devices
- Digital devices hold a lot of promise
  - As sensitive and objective clinical endpoints
  - Shortening clinical trials
  - Less burdensome, in-home assessments
  - Allowing continuous monitoring, providing better insight into real-life settings and activities of daily living
- However, limited by lack of:
  - Validation and regulatory approval
  - Understanding of how to summarise/analyze the data
  - Standards for implementing devices in large/global clinical trials

Closing remarks
- Many of the current paper-pencil tests and rating scales have significant limitations
- Replace poor endpoints
  - Reduce patient burden and cost
  - Increase measurement accuracy
- Novel measures are needed to change the game in neuroscience drug development

Thank you