Lecture 19: Conclusion
CS221 / Autumn 2017 / Liang & Ermon
Outlook

AI is everywhere: IT, transportation, manufacturing, etc.
AI is being used to make decisions about education, credit, employment, advertising, health care, and policing.
Ethical Issues

Algorithms and computers themselves are neutral, but the algorithms and data are created by people.
[Figure: 2D visualization of word vectors]
Analogies [Bolukbasi et al., 2016]

Differences in vectors capture relations:
θ_france − θ_french ≈ θ_mexico − θ_spanish (language)
θ_king − θ_man ≈ θ_queen − θ_woman (gender)
θ_king − θ_man + θ_woman ≈ θ_queen

Data as a social mirror:
θ_computer programmer − θ_man + θ_woman ≈ θ_homemaker
θ_doctor − θ_father + θ_mother ≈ θ_nurse
θ_feminism − θ_woman + θ_man ≈ θ_conservatism
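A minimal sketch of this vector arithmetic, using tiny invented vectors rather than real trained embeddings:

```python
# Minimal sketch of the analogy arithmetic, using tiny invented vectors
# rather than real trained embeddings: theta_king - theta_man +
# theta_woman should land closest to theta_queen under cosine similarity.
import numpy as np

vectors = {
    "king":   np.array([0.9, 0.8, 0.1, 0.0]),
    "queen":  np.array([0.9, 0.1, 0.8, 0.0]),
    "man":    np.array([0.1, 0.9, 0.0, 0.1]),
    "woman":  np.array([0.1, 0.0, 0.9, 0.1]),
    "france": np.array([0.0, 0.1, 0.1, 0.9]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target = vectors["king"] - vectors["man"] + vectors["woman"]
# Exclude the query words themselves, as analogy evaluations usually do.
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, vectors[w]))
print(best)  # queen
```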
Framework

D_train → Learner → f, where f maps an input x to a prediction y.
By design, the learner picks up patterns in the training data, including biases.
Ethical concerns

"...big data analytics have the potential to eclipse longstanding civil rights protections in how personal information is used in housing, credit, employment, health, education and the marketplace. Americans' relationship with data should expand, not diminish, their opportunities..."
Question

When you develop ML systems, what should you be aware of?
Protected attributes

Example task: predict whether a criminal will re-offend.
Available data: [Conviction Type, Number of Priors, Age, Income, Location, Race]
Protected attributes: to avoid discrimination, we should discard the Race attribute.
New data: [Conviction Type, Number of Priors, Age, Income, Location]
Use degree-2 polynomial features: [Conviction Type * Number of Priors, Conviction Type * Age, ..., Income * Location], as sketched below.
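A sketch of the degree-2 expansion; field names mirror the slide, while the values and encodings (e.g., Location as an index) are invented for illustration:

```python
# Sketch of the degree-2 feature crosses above: all pairwise products of
# the remaining attributes. Field names mirror the slide; the values and
# encodings (e.g., Location as an index) are invented for illustration.
from itertools import combinations

record = {"ConvictionType": 1, "NumPriors": 3, "Age": 27,
          "Income": 9000, "Location": 4}  # Race already discarded

crosses = {f"{a}*{b}": record[a] * record[b]
           for a, b in combinations(record, 2)}
print(crosses)  # {'ConvictionType*NumPriors': 3, ..., 'Income*Location': 36000}
```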
Protected attributes (continued)

Example task: predict whether a criminal will re-offend.
Features: [Conviction Type * Number of Priors, Conviction Type * Age, ..., Income * Location]
We now have access to features such as (Location = Oakland) * (Income < 10K).
From such features we can infer the absent attributes, for example race and gender, as the sketch below illustrates.
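A hedged sketch on synthetic data (all variables and correlations are invented) showing that discarding the attribute does not hide it:

```python
# Hedged sketch on synthetic data (all variables and correlations are
# invented): even after Race is dropped, a classifier recovers it from
# the remaining features well above chance.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
race = rng.integers(0, 2, n)                       # the discarded attribute
location = (race + (rng.random(n) < 0.2)) % 2      # correlated proxy
income = 2.0 + 1.5 * race + rng.normal(0, 0.5, n)  # income in $10K units

X = np.column_stack([location, income])
clf = LogisticRegression().fit(X, race)
print("recovery accuracy: %.2f" % clf.score(X, race))  # well above 0.5
```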
Machine Bias

COMPAS: software used across the country to predict which criminals will re-offend (recidivism).
ProPublica: it is biased against blacks (higher false positive rates).
Disputed by Northpointe Inc. (same precision across groups).
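A hedged numeric sketch (confusion matrices invented, not the real COMPAS data) of how both claims can be true at once when base rates differ across groups:

```python
# Invented confusion matrices (not the real COMPAS data) showing that
# both claims can hold at once when base rates differ across groups:
# equal precision, unequal false positive rates.
def precision(tp, fp):
    return tp / (tp + fp)

def false_positive_rate(fp, tn):
    return fp / (fp + tn)

groups = {"group A": dict(tp=60, fp=20, fn=20, tn=100),
          "group B": dict(tp=30, fp=10, fn=10, tn=150)}

for name, m in groups.items():
    print(name,
          "precision=%.2f" % precision(m["tp"], m["fp"]),      # 0.75 for both
          "FPR=%.3f" % false_positive_rate(m["fp"], m["tn"]))  # 0.167 vs 0.062
```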
Approximation and estimation error

[Figure: the set of all predictors contains the hypothesis class F, chosen by feature extraction; approximation error is the gap between the best g in F and the target predictor f*, and estimation error is the gap between the learned ˆf and g.]

Generally, more data means smaller estimation error. By definition, there is less data on minority groups, which can lead to higher error rates on minorities (see the simulation below).
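A minimal simulation of this effect, assuming a simple mean-estimation task with invented group sizes:

```python
# Minimal simulation (task and sizes invented): estimating a group mean
# from n samples. The error of the estimate shrinks like 1/sqrt(n), so a
# group with 10x less data gets a noticeably less accurate estimate.
import numpy as np

rng = np.random.default_rng(0)
for n in [10000, 1000]:  # majority vs. minority sample sizes
    estimates = [rng.normal(0.0, 1.0, n).mean() for _ in range(1000)]
    rmse = float(np.sqrt(np.mean(np.square(estimates))))
    print(f"n={n}: RMSE of estimate = {rmse:.4f}")
# RMSE is roughly 0.010 for n=10000 and 0.032 for n=1000.
```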
Fairness

[Figure from Moritz Hardt]

Most ML training objectives will produce a model that is accurate on the majority group, at the expense of the minority one.
Fairness

[Figure from Moritz Hardt]

Two classifiers with 5% error can distribute that error very differently across groups, as the sketch below illustrates.
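A small sketch with made-up numbers of how a single 5% overall error rate can hide very different group error rates:

```python
# Made-up numbers illustrating the figure's point: two classifiers with
# identical 5% overall error, but very different error across a 90/10
# majority/minority population split.
majority, minority = 900, 100  # group sizes (invented)

# errors committed on each group by each classifier
classifiers = {"even":   {"maj": 45, "min": 5},
               "skewed": {"maj": 0,  "min": 50}}

for name, e in classifiers.items():
    overall = (e["maj"] + e["min"]) / (majority + minority)
    print(name,
          "overall=%.2f" % overall,
          "majority=%.2f" % (e["maj"] / majority),
          "minority=%.2f" % (e["min"] / minority))
# Both print overall=0.05, but "skewed" has 0% majority / 50% minority error.
```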
References

Links:
https://fairmlclass.github.io/
https://www.fatml.org/
https://cdt.org/issue/privacy-data/digital-decisions/
Responsibility

fatml.org, Principles for Accountable Algorithms: there is always a human ultimately responsible for decisions made or informed by an algorithm. "The algorithm did it" is not an acceptable excuse if algorithmic systems make mistakes or have undesired consequences, including from machine-learning processes.
Feedback loops

Data → Hypothesis → Predictions, and the predictions feed back into future data.
Privacy

Goal: do not reveal sensitive information about individuals (income, health, communication), while still being able to compute aggregate statistics (e.g., how many people have cancer?).

[Figure: a population of individual yes/no answers.]
Randomized response [Warner, 1965]

Do you have a sibling?
Method: flip two coins. If both are heads, answer yes/no randomly; otherwise, answer yes/no truthfully.
Analysis: observed-prob = (3/4) * true-prob + (1/4) * (1/2), so
true-prob = (4/3) * (observed-prob − 1/8).
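A quick simulation of the mechanism (the 30% true rate is invented for the demonstration):

```python
# Simulation of the mechanism (the 30% true rate is invented). Each
# respondent flips two coins: both heads -> random answer; otherwise ->
# truthful answer. The analyst inverts the noise with the formula above.
import random

random.seed(0)
true_rate, n = 0.30, 100_000
answers = []
for _ in range(n):
    truth = random.random() < true_rate
    if random.random() < 0.25:                 # both coins came up heads
        answers.append(random.random() < 0.5)  # answer yes/no at random
    else:
        answers.append(truth)                  # answer truthfully

observed = sum(answers) / n
print("recovered rate: %.3f" % ((4 / 3) * (observed - 1 / 8)))  # close to 0.30
```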
Causality

Goal: figure out the effect of a treatment on survival.
Data: 80% of untreated patients survive; 30% of treated patients survive.
Does the treatment help? We can conclude nothing: sick people are more likely to undergo treatment in the first place...
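A hedged numeric sketch of the confounding (all counts invented): severity drives both treatment and survival, so the raw comparison is misleading:

```python
# All counts invented: severity drives both treatment and survival.
# The raw comparison roughly reproduces the slide's numbers (treated
# patients survive less often overall), yet within each severity stratum
# the treated patients survive MORE often.
data = {                                  # (count, survival rate)
    ("sick",    "treated"):   (90, 0.30),
    ("sick",    "untreated"): (10, 0.20),
    ("healthy", "treated"):   (10, 0.90),
    ("healthy", "untreated"): (90, 0.85),
}

for arm in ["treated", "untreated"]:
    total = sum(c for (sev, a), (c, r) in data.items() if a == arm)
    survivors = sum(c * r for (sev, a), (c, r) in data.items() if a == arm)
    print(arm, "raw survival rate: %.2f" % (survivors / total))
# treated ~0.36 vs. untreated ~0.79 overall, despite 0.30 > 0.20 (sick)
# and 0.90 > 0.85 (healthy) stratum by stratum.
```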
Adversaries [Szegedy et al., 2013; Goodfellow et al., 2014]

[Figure: adversarial examples. AlexNet predicts correctly on the images on the left and predicts "ostrich" on the perturbed images on the right.]
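As a sketch of the fast gradient sign method from the cited Goodfellow et al. (2014) paper, applied here to a toy untrained linear model rather than AlexNet (shapes and epsilon are invented for illustration):

```python
# Sketch of the fast gradient sign method from the cited Goodfellow et
# al. (2014) paper, applied to a toy untrained linear model rather than
# AlexNet; shapes and epsilon are invented for illustration.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(784, 10)          # stand-in for a real classifier
x = torch.rand(1, 784, requires_grad=True)
label = torch.tensor([3])

loss = F.cross_entropy(model(x), label)
loss.backward()

epsilon = 0.05                            # small, visually imperceptible step
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()
# x_adv looks the same to a human, but moving along the loss gradient's
# sign is often enough to flip the model's prediction.
```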
Safety guarantees [Mykel Kochenderfer]

For air-traffic control, there is a threshold level of safety: the probability of a catastrophic failure (e.g., a collision) must be below 10^-9 per flight hour.
Should we move from human-designed rules to a numeric Q-value table? Yes.
Other AI-related courses (http://ai.stanford.edu/courses/)

Foundations:
CS228: Probabilistic Graphical Models
CS229: Machine Learning
CS229T: Statistical Learning Theory
CS334A: Convex Optimization
CS238: Decision Making Under Uncertainty
CS257: Logic and Artificial Intelligence
CS246: Mining Massive Data Sets
Other AI-related courses (http://ai.stanford.edu/courses/)

Applications:
CS224N: Natural Language Processing (with Deep Learning)
CS224U: Natural Language Understanding
CS231A: Introduction to Computer Vision
CS231N: Convolutional Neural Networks for Visual Recognition
CS223A: Introduction to Robotics
CS227B: General Game Playing
Probabilistic graphical models (CS228)

Forward-backward, variable elimination, belief propagation, variational inference
Gibbs sampling, Markov chain Monte Carlo (MCMC)
Learning the structure
Machine learning (CS229)

Boosting, bagging, feature selection
From discrete to continuous: K-means → mixture of Gaussians; Q-learning → policy gradient
Statistical learning theory (CS229T)

Question: what are the mathematical principles behind learning?

Uniform convergence: with probability at least 0.95, your learning algorithm will return a predictor h ∈ H such that

TestError(h) ≤ TrainError(h) + √(Complexity(H) / n)
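To make the scale of the bound concrete, a hypothetical instantiation (both numbers invented):

```latex
% Hypothetical instantiation of the bound (both numbers invented):
% with Complexity(H) = 100 and n = 10000 training examples,
\text{TestError}(h) \;\le\; \text{TrainError}(h) + \sqrt{\tfrac{100}{10000}}
                    \;=\; \text{TrainError}(h) + 0.1
```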
Cognitive science

Question: how does the human mind work?
Cognitive science and AI grew up together.
Humans can learn from few examples on many tasks.
Computation and cognitive science (PSYCH204): cognition as Bayesian modeling with probabilistic programs [Tenenbaum, Goodman, Griffiths]
Neuroscience

Neuroscience studies the hardware; cognitive science studies the software.
Artificial neural networks serve as computational models of the brain.
Modern neural networks (GPUs + backpropagation) are not biologically plausible.
Analogy: birds versus airplanes; what are the principles of intelligence?
Online materials

Online courses (Coursera, edX)
Videolectures.net: tons of recorded talks from major leaders of AI (and other fields)
arxiv.org: latest research (pre-prints)
Blog posts, tutorials
Conferences

AI: IJCAI, AAAI
Machine learning: ICML, NIPS, UAI, COLT
Data mining: KDD, CIKM, WWW
Natural language processing: ACL, EMNLP, NAACL
Computer vision: CVPR, ICCV, ECCV
Robotics: RSS, ICRA
[Figure: the course map, from low-level to high-level intelligence. Machine learning underlies it all: reflex models, state-based models (search problems, Markov decision processes, adversarial games), variable-based models (constraint satisfaction problems, Bayesian networks), and logic.]

Please fill out course evaluations on Axess. Thanks for an exciting quarter!