Predicting Depression via Social Media
Munmun De Choudhury, Michael Gamon, Scott Counts and Eric Horvitz
Martin Leginus
Depression
  Lifetime prevalence varies from 3% in Japan to 17% in the USA
  Sometimes people are not aware that they are depressed, e.g., when depression has a slow onset
Can we use social networks to detect mental disorders?
Social media
  Fine-grained signals of user behavior over a longer period of time
  Symptoms of mental disorders are more observable than those of other diseases, i.e., in social engagement, emotion, language, and linguistic style
  The authors focus on Major Depressive Disorder (MDD). Symptoms are: low mood, low self-esteem, loss of interest, negative perception of the world
  Aim: given a user's activity on social media, predict whether he or she is likely to be depressed
How to get ground-truth data?
  Crowdsourcing: Amazon's Mechanical Turk service
  Crowd workers filled in two depression screening tests (CES-D and the Beck Depression Inventory):
    http://www.bcbsm.com/pdf/depression_ces D.pdf
    http://www.thecommunityhouse.org/wp content/uploads/2012/01/beck Depression Inventory and Scoring Key1.pdf
  Self-reported questions:
    Have you been diagnosed with clinical depression? If yes, what was the estimated onset?
    Are you depressed or taking any antidepressants at the moment?
  Participants were required to have a public Twitter profile
How to get ground-truth data?
  1583 participants (cost ~ $1400)
  Only 40% agreed to share their Twitter feeds
  Further cleaning resulted in 476 users (243 men, 233 women)
  171 users scored positive for depression
  Example posts:
    Having a job again makes me happy. Less time to be depressed and eat all day while watching sad movies.
    Are you okay? Yes. I understand that I am upset and hopeless and nothing can help me I'm okay but I am not alright
    empty feelings I WAS JUST TALKING ABOUT HOW I I HAVE EMOTION OH MY GOODNESS I FEEL AWFUL
    I want someone to hold me and be there for me when I'm sad.
    Reloading twitter till I pass out. *lonely* *anxious* *butthurt* *frustrated* *dead*
Characteristic attributes
  Engagement
  Egocentric social graph
  Emotion
  Depression language
  Linguistic style
  43 different attributes, calculated daily over a period of one year
Engagement
  Volume: # of posts per day made by the user
  Reply posts: proportion of the user's @reply posts; indicates social interaction
  Retweets: proportion of the user's retweets; indicates information sharing
  Links: # of shared URLs over a day
  Question-centric: # of posts that try to seek or derive information from other Twitter users
  Insomnia index: difference between the # of posts during the night window and the day window
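The engagement measures above can be sketched for a single day of posts. This is a minimal illustration, not the authors' implementation: the post format (`time`, `text`, `is_retweet`), the 21:00-06:00 night window, and the use of "?" as a proxy for question-centric posts are all assumptions made for the example.

```python
from datetime import datetime

# Assumption (not stated on the slide): the night window is 21:00-06:00 local time.
NIGHT_START_HOUR = 21
NIGHT_END_HOUR = 6


def is_night(ts: datetime) -> bool:
    """True if the timestamp falls inside the assumed night window."""
    return ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR


def engagement_measures(posts):
    """Compute one day's engagement measures.

    `posts` is a list of dicts with hypothetical keys:
    'time' (datetime), 'text' (str), 'is_retweet' (bool).
    """
    volume = len(posts)
    replies = sum(1 for p in posts if p["text"].startswith("@"))
    retweets = sum(1 for p in posts if p["is_retweet"])
    links = sum(
        1
        for p in posts
        for tok in p["text"].split()
        if tok.startswith(("http://", "https://"))
    )
    # Crude proxy (assumption): a post is question-centric if it contains "?".
    questions = sum(1 for p in posts if "?" in p["text"])
    night = sum(1 for p in posts if is_night(p["time"]))
    day = volume - night
    return {
        "volume": volume,
        "reply_proportion": replies / volume if volume else 0.0,
        "retweet_proportion": retweets / volume if volume else 0.0,
        "links": links,
        "question_centric": questions,
        "insomnia_index": night - day,  # raw difference, night minus day
    }


example_posts = [
    {"time": datetime(2013, 1, 1, 23, 30), "text": "@bob can't sleep again", "is_retweet": False},
    {"time": datetime(2013, 1, 1, 2, 15), "text": "anyone awake? http://example.com", "is_retweet": False},
    {"time": datetime(2013, 1, 1, 14, 0), "text": "RT @news: headline", "is_retweet": True},
    {"time": datetime(2013, 1, 1, 3, 0), "text": "still up", "is_retweet": False},
]
print(engagement_measures(example_posts))
```

Running this once per day over a year would yield one time series per measure.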
Engagement
  Volume: # of posts per day made by the user
  Reply posts: proportion of the user's @reply posts; indicates social interaction
Egocentric social graph
  Set of nodes from the user's two-hop neighborhood
  An edge between two users implies communication via @replies during a given day
  (Figure: example graph with nodes user u, user v, user w, user x, user y, user xx, user yy)
  Measured:
    # of incoming or outgoing posts
    Reciprocity: how often the user responds to communication started by another user
    Prestige ratio: ratio of the # of messages sent to user u to the # of messages targeted to user v
    Graph density, clustering coefficient, size of the graph, embeddedness, # of ego components
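A few of these graph measures can be sketched on a list of directed @reply edges. The exact formulas are not given on the slide, so the definitions below are assumptions for illustration: reciprocity is taken as the mean ratio of replies from u to v over replies from v to u, and density uses the standard directed-graph definition, which may differ from the authors' own.

```python
from collections import Counter


def reciprocity(edges, u):
    """Mean over partners v who wrote to u of (# replies u->v) / (# replies v->u).

    Assumed definition; `edges` is a list of (sender, receiver) pairs.
    """
    counts = Counter(edges)
    partners = {s for (s, r) in counts if r == u}
    if not partners:
        return 0.0
    ratios = [counts[(u, v)] / counts[(v, u)] for v in partners]
    return sum(ratios) / len(ratios)


def prestige_ratio(edges, u, v):
    """# of messages sent to u divided by # of messages targeted to v."""
    to_u = sum(1 for _, r in edges if r == u)
    to_v = sum(1 for _, r in edges if r == v)
    return to_u / to_v if to_v else None  # undefined when v receives nothing


def density(edges):
    """Distinct directed edges divided by the n * (n - 1) possible ones."""
    nodes = {n for edge in edges for n in edge}
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(set(edges)) / (n * (n - 1))


# Toy day of @reply traffic around user "u" (made-up data).
example_edges = [
    ("u", "v"), ("v", "u"), ("v", "u"),
    ("w", "u"), ("u", "w"), ("x", "v"),
]
print(reciprocity(example_edges, "u"))
print(prestige_ratio(example_edges, "u", "v"))
print(density(example_edges))
```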
Egocentric social graph
  Egonetwork measure        Depressed class     Non-depressed class
  #followers/inlinks        26.9 (σ=78.3)       45.32 (σ=90.74)
  #followees/outlinks       19.2 (σ=52.4)       40.06 (σ=63.25)
  Reciprocity               0.77 (σ=0.09)       1.364 (σ=0.186)
  Prestige ratio            0.98 (σ=0.13)       0.613 (σ=0.277)
  Graph density             0.01 (σ=0.03)       0.019 (σ=0.051)
  Clustering coefficient    0.02 (σ=0.05)       0.011 (σ=0.072)
  2-hop neighborhood        104 (σ=82.42)       198.4 (σ=110.3)
  Embeddedness              0.38 (σ=0.14)       0.226 (σ=0.192)
  #ego components           15.3 (σ=3.25)       7.851 (σ=6.294)
Emotion
  Psycholinguistic resource LIWC used to measure positive and negative affect
  ANEW lexicon used for computing activation and dominance
  Activation describes the physical intensity of an emotion ("terrified" is higher than "scared")
  Dominance refers to the degree of control in an emotion (anger is dominant, fear is submissive)
Linguistic style
  LIWC is also used to recognize 22 specific linguistic styles, including: articles, auxiliary verbs, conjunctions, adverbs, personal pronouns, prepositions, functional words, assent, negation, certainty and quantifiers
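Lexicon-based activation and dominance scoring can be sketched as a simple average over matched words. The lexicon below is a toy stand-in with made-up values, not the real ANEW ratings, and averaging over matched tokens is an assumption about how the scores are aggregated.

```python
import re

# Hypothetical ratings for illustration only; these are NOT actual ANEW values.
TOY_LEXICON = {
    "terrified": {"activation": 8.0, "dominance": 3.0},
    "scared": {"activation": 6.5, "dominance": 3.5},
    "angry": {"activation": 7.0, "dominance": 6.0},
    "calm": {"activation": 2.5, "dominance": 6.5},
}


def affect_scores(text, lexicon=TOY_LEXICON):
    """Average activation and dominance over the words found in the lexicon.

    Returns None when no word of the text is covered by the lexicon.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None
    return {
        dim: sum(h[dim] for h in hits) / len(hits)
        for dim in ("activation", "dominance")
    }


print(affect_scores("I am scared and angry"))
print(affect_scores("nothing to score here"))
```

Positive and negative affect from LIWC could be counted in the same way, given access to its category word lists.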
Emotion + Linguistic style
Depression language
  Depression lexicon built from Yahoo! Answers on Mental Health (~900k Q&A pairs)
  Association between each word and the regex depress* calculated using:
    Pointwise mutual information
    Log-likelihood ratio
  Top 1000 words with the highest tf-idf
  Antidepressant usage: list of antidepressants from Wikipedia used to construct a drug lexicon
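The association step can be illustrated with pointwise mutual information computed at document level. This is a sketch under stated assumptions: probabilities are estimated as fractions of documents containing the word or a match of `depress*`, and the four documents are made-up examples rather than real Yahoo! Answers data.

```python
import math
import re

DEPRESS = re.compile(r"\bdepress\w*", re.IGNORECASE)


def pmi_with_depress(docs, word):
    """PMI(word, depress*) = log( P(word, depress*) / (P(word) * P(depress*)) ).

    Probabilities are document frequencies divided by the number of documents.
    Returns -inf when the word never co-occurs with a depress* match.
    """
    n = len(docs)
    has_word = [word in set(re.findall(r"[a-z]+", d.lower())) for d in docs]
    has_dep = [bool(DEPRESS.search(d)) for d in docs]
    n_word = sum(has_word)
    n_dep = sum(has_dep)
    n_both = sum(1 for a, b in zip(has_word, has_dep) if a and b)
    if n_both == 0:
        return float("-inf")
    return math.log((n_both / n) / ((n_word / n) * (n_dep / n)))


example_docs = [
    "I feel depressed and have insomnia",
    "insomnia makes my depression worse",
    "the game was fun",
    "insomnia again tonight",
]
print(pmi_with_depress(example_docs, "insomnia"))
print(pmi_with_depress(example_docs, "fun"))
```

A positive score means the word appears alongside depress* more often than chance would predict. Words could then be ranked by this score.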
Depression language
  Theme: unigrams
  Symptoms: anxiety, withdrawal, severe, delusions, adhd, weight, insomnia, drowsiness, suicidal, appetite, dizziness, nausea, episodes, attacks, sleep, seizures, addictive, weaned, swings, dysfunction, blurred, irritability, headache, fatigue, imbalance, nervousness, psychosis, drowsy
  Disclosure: fun, play, helped, god, answer, wants, leave, beautiful, suffer, sorry, tolerance, agree, hate, helpful, haha, enjoy, social, talk, save, win, care, love, like, hold, cope, amazing, discuss
  Treatment: medication, side effects, doctor, doses, effective, prescribed, therapy, inhibitor, stimulant, antidepressant, patients, neurotransmitters, prescriptions, psychotherapy, diagnosis, clinical, pills, chemical, counteract, toxicity, hospitalization, sedative, 150mg, 40mg, drugs
  Relationships, life: home, woman, she, him, girl, game, men, friends, sexual, boy, someone, movie, favorite, jesus, house, music, religion, her, songs, party, bible, relationship, hell, young, style, church, lord, father, season, heaven, dating
Predicting depressive behavior
Feature vectors
  For each attribute, the following four features are computed from its daily time series:
    Mean frequency
    Variance
    Mean momentum
    Entropy
  188 features = (43 attributes + 4 demographic features) x 4
  Principal component analysis to reduce the number of features
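The four per-attribute features can be sketched on one daily time series. The slide does not define them, so the formulas here are assumptions: momentum is the mean difference between each day's value and the average of up to `window` preceding days (a 7-day window is assumed), and entropy is the Shannon entropy of the series normalized to sum to 1, which requires non-negative values.

```python
import math


def mean_frequency(series):
    """Average value of the attribute over the whole period."""
    return sum(series) / len(series)


def variance(series):
    """Population variance of the daily values."""
    m = mean_frequency(series)
    return sum((v - m) ** 2 for v in series) / len(series)


def mean_momentum(series, window=7):
    """Mean of (value on day t) minus (mean of up to `window` previous days)."""
    diffs = []
    for t in range(1, len(series)):
        prev = series[max(0, t - window):t]
        diffs.append(series[t] - sum(prev) / len(prev))
    return sum(diffs) / len(diffs) if diffs else 0.0


def entropy(series):
    """Shannon entropy (natural log) of the series normalized to sum to 1."""
    total = sum(series)
    if total <= 0:
        return 0.0
    probs = [v / total for v in series if v > 0]
    return -sum(p * math.log(p) for p in probs)


def summarize_series(series):
    """Four features for one attribute's daily time series."""
    return {
        "mean_frequency": mean_frequency(series),
        "variance": variance(series),
        "mean_momentum": mean_momentum(series),
        "entropy": entropy(series),
    }


daily_volume = [1, 2, 3]  # toy series: posts per day on three days
print(summarize_series(daily_volume))
```

Concatenating these summaries across all attributes gives one fixed-length vector per user, which is the form needed before dimensionality reduction.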
Classifier
  Support Vector Machine (SVM) classifier
  Radial basis function (RBF) kernel
  10-fold cross-validation and 100 randomized experimental runs
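The evaluation protocol can be sketched without committing to a particular SVM library. The code below shows the RBF kernel and a repeated k-fold loop only; it does not train an SVM. It assumes that the "100 randomized experimental runs" mean repeating the 10-fold split with different random shuffles, and `evaluate` is a hypothetical callback that would fit a classifier on the training indices and return a score on the test indices. The kernel width `gamma` is likewise an assumed parameter.

```python
import math
import random


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kfold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for one shuffled k-fold split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def repeated_cv(n_samples, evaluate, k=10, runs=100):
    """Average the score of `evaluate(train, test)` over runs x k folds."""
    scores = []
    for run in range(runs):
        # A different seed per run gives a different random partition.
        for train, test in kfold_splits(n_samples, k, seed=run):
            scores.append(evaluate(train, test))
    return sum(scores) / len(scores)


# Dummy evaluation: returns the test-fold share instead of a real accuracy.
print(repeated_cv(20, lambda train, test: len(test) / 20, k=10, runs=3))
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
```

In practice `evaluate` would wrap an SVM implementation from a machine-learning library.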
Results
                        precision   recall   acc. (+ve)   acc. (mean)
  engagement            0.542       0.439    53.212%      55.328%
  ego network           0.627       0.495    58.375%      61.246%
  emotion               0.642       0.523    61.249%      64.325%
  linguistic style      0.683       0.576    65.124%      68.415%
  depression language   0.655       0.592    66.256%      69.244%
  demographics          0.452       0.406    47.914%      51.323%
  all features          0.705       0.614    68.247%      71.209%
  reduced dimensions    0.742       0.629    70.351%      72.384%
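For readers less familiar with the metrics in the table, precision, recall, and overall accuracy can be computed from binary labels as sketched below. The labels are made-up, and the slide's "acc. (+ve)" column is not reproduced because its exact definition is not given here.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall and accuracy for labels where 1 = depressed class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of true positives that the classifier found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Toy labels for eight users (made-up data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```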
Results
Discussion
  Implications
  Privacy issues
Conclusion and future work
  43 different attributes that characterize depressed users of social media
  Crowdsourced gold standard
  Forecasting depression before the reported onset
Questions