Using Causal Inference To Make Sense of Messy Data
Ilya Shpitser
John C. Malone Assistant Professor of Computer Science
Malone Center for Engineering in Healthcare
The Johns Hopkins University

Health Care: Costs
Absolute expenditures: $3.0 trillion, 17.5% of GDP (2014)
Relative expenditures: 50% increase in the past 10 years
Potential efficiency gains: $750 billion (2009), more than 25% of the total
From Best Care at Lower Costs: The Path to Continuously Learning Health Care in America, Institute of Medicine, 2012

Health Care: Complexity
More conditions, e.g. a 79-year-old patient with 19 meds per day
More clinicians, e.g. 200 other doctors treating the patients of a single primary care doctor
More choices, e.g. hundreds of diagnostic factors; dozens of treatments
More activities, e.g. ICU clinicians with 180 activities per day
From Best Care at Lower Costs: The Path to Continuously Learning Health Care in America, Institute of Medicine, 2012

Malone Center
Mission: To catalyze and accelerate the development, translation, and deployment of research-based innovations that advance the effectiveness and efficiency of health care.
Smart Devices and Systems for Healthcare: creating devices and information analytics that enhance care in the clinical environment
Modeling and Optimization for Healthcare Delivery: exploiting traditional and new sources of data to enhance the efficiency and quality of healthcare
Mobile Health and Healthy Living: developing innovations that support individuals outside traditional care environments, that enhance health in everyday life, and that augment traditional health care approaches

My Work at the Malone Center
Science from biased data: Are poor treatment outcomes due to bad treatment, poor adherence, or confounding?
Decision support: Treatment decisions are a complex combination of medical training and institutional knowledge. Can we use learning algorithms to help?
Dealing with missing data: Most datasets in practice have systematically missing entries. This creates bias if not properly handled. How do we handle complex missing data?

Science from biased data
Better healthcare means making better decisions.
Decisions are about causal efficacy.
Randomized controlled trial data are often not available.
Practical data: confounding bias, selection bias, missing data, measurement error.
The field of causal inference aims to provide answers in this challenging setting.

Adherence in HIV Patients
Setting: longitudinal observational studies of HIV patients (PEPFAR program).
Outcome: viral failure; treatments are antiretroviral therapies.
Question: how are outcomes affected by poor drug choice, or by poor adherence?
Formally, adherence mediates (all? some?) of the effect of the drug.

Adherence as a causal problem
What causes virological failure in patients?
A single slice of a longitudinal study: C (age, gender, etc.), A (HIV drug), D (white blood cell #), M (% pills taken), Y (outcome).
Lots of reasons Y might be poor!

Predicting the hypothetical
Every patient was on some drug, had some toxicity, and some adherence level.
What would have happened to their outcome if toxicity were low? If adherence were high?
RCTs are possible for this, but expensive and lengthy.
Alternative approach for existing, messy data: fit observed data models, then combine them in a particular way to mimic the right RCT.
Hard in general due to confounding and selection bias.

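One way to picture "fit observed data models, then combine" is the mediation formula, E[Y(a, M(a'))] = sum over c, m of E[Y | a, m, c] P(m | a', c) P(c). The sketch below is only an illustration under strong assumptions: all variables are binary, C is the only confounder, toxicity D is ignored, and the data are simulated. The function names `simulate_mediation` and `mediation_formula` and every probability in the simulation are hypothetical, not taken from the HIV studies.

```python
import random
from collections import defaultdict


def simulate_mediation(n, seed=0):
    """Simulate hypothetical (C, A, M, Y) rows; all probabilities are made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = int(rng.random() < 0.5)                      # baseline covariate
        a = int(rng.random() < 0.3 + 0.4 * c)            # drug choice depends on C
        m = int(rng.random() < 0.8 - 0.3 * a - 0.2 * c)  # adherence: drug 1 is harder to take
        y = int(rng.random() < 0.6 - 0.3 * a - 0.2 * m + 0.1 * c)  # viral failure
        rows.append((c, a, m, y))
    return rows


def mediation_formula(rows, a, a_prime):
    """Plug-in estimate of E[Y(a, M(a'))] from observed counts."""
    n = len(rows)
    n_c = defaultdict(int)     # count of C = c
    n_ac = defaultdict(int)    # count of (A, C)
    n_amc = defaultdict(int)   # count of (A, M, C)
    y_sum = defaultdict(int)   # sum of Y within (A, M, C)
    for c, a_i, m, y in rows:
        n_c[c] += 1
        n_ac[(a_i, c)] += 1
        n_amc[(a_i, m, c)] += 1
        y_sum[(a_i, m, c)] += y
    total = 0.0
    for c in (0, 1):
        p_c = n_c[c] / n
        for m in (0, 1):
            p_m = n_amc[(a_prime, m, c)] / n_ac[(a_prime, c)]  # P(m | a', c)
            e_y = y_sum[(a, m, c)] / n_amc[(a, m, c)]          # E[Y | a, m, c]
            total += e_y * p_m * p_c
    return total


if __name__ == "__main__":
    rows = simulate_mediation(200_000)
    # Failure risk on drug 0 with its own adherence vs. with drug 1's adherence.
    print(mediation_formula(rows, 0, 0), mediation_formula(rows, 0, 1))
```

Under this simulated model the true values of E[Y(0, M(0))] and E[Y(0, M(1))] are 0.51 and 0.57, so the two printed estimates should land close to those numbers. The gap between them is the part of the drug effect that flows through adherence.
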
Predictions under counterfactual adherence
How would a less effective treatment do if adherence were that of a more effective treatment?

Baseline     Comparison treatment
treatment    2         3         4         5
1            0.412*    0.210     -0.059    -0.068*
2            -         -0.132    -0.495*   -0.198*
3            -         -         -0.566*   -0.135*
4            -         -         -         -0.027*

Most significant effects are negative.
Meaning: more effective treatments are harder to take. Effectiveness is driven by biochemistry, not adherence.

Clinical decision support
Exploiting patterns in complex data is difficult for (unaided) humans, even those with extensive clinical experience.
Naïve analysis can be misleading. Example: in crashing sepsis patients, treatment is associated with worse outcomes.
Wanted: a tool that can output counterfactual outcomes at a complex decision point.

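A small worked example may help show why the naive analysis misleads. The counts below are entirely hypothetical (not sepsis data): sicker patients are treated more often, so treatment looks harmful overall even though it lowers risk within each severity group. The function names `naive_risk` and `standardized_risk` are illustrative, and standardizing over severity is only valid if severity is the sole confounder.

```python
# Hypothetical counts: (severity, treated) -> (number of patients, number of deaths)
counts = {
    ("severe", 1): (800, 400),
    ("severe", 0): (200, 140),
    ("mild", 1): (200, 20),
    ("mild", 0): (800, 160),
}


def naive_risk(counts, treated):
    """Death risk among treated (or untreated) patients, ignoring severity."""
    n = sum(v[0] for k, v in counts.items() if k[1] == treated)
    deaths = sum(v[1] for k, v in counts.items() if k[1] == treated)
    return deaths / n


def standardized_risk(counts, treated):
    """Death risk if everyone had this treatment status, averaging the
    severity-specific risks over the overall severity distribution."""
    total = sum(n for n, _ in counts.values())
    risk = 0.0
    for severity in sorted({k[0] for k in counts}):
        n_severity = counts[(severity, 1)][0] + counts[(severity, 0)][0]
        n, deaths = counts[(severity, treated)]
        risk += (deaths / n) * (n_severity / total)
    return risk


if __name__ == "__main__":
    print("naive risk difference:",
          naive_risk(counts, 1) - naive_risk(counts, 0))
    print("standardized risk difference:",
          standardized_risk(counts, 1) - standardized_risk(counts, 0))
```

With these counts the naive risk difference is about +0.12 (treatment appears to hurt), while the severity-standardized difference is about -0.15 (treatment helps).
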
Decision Support
[Figure: MAP over time under Treatment 1 and Treatment 2, with a decision point "Treatment 2 or 3?" and risk labels of 60% and 80% for the candidate options.]
Causal inference methods exist for predicting counterfactual outcomes based on factual data.
Work in progress (with Suchi Saria's group) on learning treatment policies.

Missing Data
Ubiquitous problem. Often handled poorly.
Possibility of severe bias (example): HIV prevalence in the Zambia Demographic and Health Survey.
Sick people (severely) underreport.
Complete case analysis underestimates prevalence by as much as 10%.

Dealing With Missing Data
[Figure: comparison of methods labeled K and CR; panels: logistic regression, non-linear complex separation, mean; axes: sample size, missing data.]

New methods for missing data
The most complex setting is data missing not at random (MNAR).
People don't report sexual history due to that history. Voting intent, intermittent dropout, etc.
Easier settings: reweight observed cases based on typicality (recent NYT article on polls about this).
Developed a new extension of this to MNAR data.
More generally: work on a complete theory of when missing data is a solvable problem.

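The reweighting idea in the easier setting can be sketched as inverse probability weighting: each observed case is weighted by one over its estimated chance of being observed. The sketch below covers only that easier case, where missingness depends on a fully observed covariate X. It is not the MNAR extension mentioned above. The data are simulated, and `simulate_survey`, `complete_case_mean`, `ipw_mean`, and all probabilities are hypothetical.

```python
import random


def simulate_survey(n, seed=0):
    """Simulate (x, y) pairs where y is None when the respondent did not answer."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = int(rng.random() < 0.5)                    # fully observed covariate
        y = int(rng.random() < 0.1 + 0.3 * x)          # outcome of interest
        responded = rng.random() < 0.9 - 0.5 * x       # x = 1 responds less often
        data.append((x, y if responded else None))
    return data


def complete_case_mean(data):
    """Mean of y over respondents only; biased when response depends on x."""
    observed = [y for _, y in data if y is not None]
    return sum(observed) / len(observed)


def ipw_mean(data):
    """Weight each respondent by 1 / P(respond | x), estimated from the data."""
    n_x = {0: 0, 1: 0}
    n_observed = {0: 0, 1: 0}
    for x, y in data:
        n_x[x] += 1
        if y is not None:
            n_observed[x] += 1
    p_respond = {x: n_observed[x] / n_x[x] for x in n_x}
    numerator = sum(y / p_respond[x] for x, y in data if y is not None)
    denominator = sum(1 / p_respond[x] for x, y in data if y is not None)
    return numerator / denominator


if __name__ == "__main__":
    data = simulate_survey(100_000)
    print("complete case:", complete_case_mean(data))
    print("reweighted:   ", ipw_mean(data))
```

In this simulation the true mean of y is 0.25. The complete case estimate falls well below it because the group with higher y responds less often, while the reweighted estimate should be close to 0.25.
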
Selected projects
Decision support in the ICU (with Suchi Saria and Katie Henry)
Next generation methods for data missing not at random (with Eric Tchetgen Tchetgen and James Robins)
Mediation analysis for understanding adherence in HIV studies (with Eric Tchetgen Tchetgen and Phyllis Kanki)
Mediation analysis for study of radiation side effects (with Todd McNutt and the Oncospace Consortium)

THANK YOU! Ilya Shpitser (ilyas@cs.jhu.edu)