Data with Python - Examples
- Myron Hines
May 5, 2018

In [57]: #ipython

In [58]: import pandas as pd

1 pandas: load data

In [59]: DATA_PATH = '/usr/lib/python3/dist-packages/pandas/tests/data/tips.csv'

In [60]: #!cat /usr/lib/python3/dist-packages/pandas/tests/data/tips.csv

In [61]: data = pd.read_csv(DATA_PATH)

In [62]: data

Out[62]: [DataFrame preview: columns total_bill, tip, sex, smoker, day, time, size]
         [244 rows x 7 columns]

In [63]: data.sort_values('tip')

Out[63]: [DataFrame preview: the same 7 columns, rows sorted by ascending tip]
         [244 rows x 7 columns]

In [64]: data.head()

Out[64]:    total_bill  tip  sex     smoker  day  time    size
         0         ...  ...  Female  No      Sun  Dinner   ...
         1         ...  ...  Male    No      Sun  Dinner   ...
         2         ...  ...  Male    No      Sun  Dinner   ...
         3         ...  ...  Male    No      Sun  Dinner   ...
         4         ...  ...  Female  No      Sun  Dinner   ...

In [65]: data['tip']

Out[65]:
         [Series preview of the tip values]
         Name: tip, Length: 244, dtype: float64

In [66]: data['tip'] / data['total_bill']

Out[66]: [Series preview of tip / total_bill for each row]
         Length: 244, dtype: float64

In [67]: data['perc_tip'] = data['tip'] / data['total_bill']

In [68]: data.head()

Out[68]:    total_bill  tip  sex     smoker  day  time    size  perc_tip
         0         ...  ...  Female  No      Sun  Dinner   ...       ...
         1         ...  ...  Male    No      Sun  Dinner   ...       ...
         2         ...  ...  Male    No      Sun  Dinner   ...       ...
         3         ...  ...  Male    No      Sun  Dinner   ...       ...
         4         ...  ...  Female  No      Sun  Dinner   ...       ...

In [69]: data[data.sex == 'Female']

Out[69]: [DataFrame preview: only the rows where sex == 'Female'; columns total_bill, tip, sex, smoker, day, time, size, perc_tip]
         [87 rows x 8 columns]

In [70]: data[data.sex == 'Female'].to_csv('waitresses.csv')

In [71]: !cat waitresses.csv

         ,total_bill,tip,sex,smoker,day,time,size,perc_tip
         0,16.99,1.01,Female,No,Sun,Dinner,2,...
         ...,24.59,3.61,Female,No,Sun,Dinner,4,...
         ...,35.26,5.0,Female,No,Sun,Dinner,4,...
         ...,14.83,3.02,Female,No,Sun,Dinner,2,...
         ...,10.33,1.67,Female,No,Sun,Dinner,3,...
         ...,16.97,3.5,Female,No,Sun,Dinner,3,...
         ...,20.29,2.75,Female,No,Sat,Dinner,2,...
         22,15.77,2.23,Female,No,Sat,Dinner,2,...
         ...,19.65,3.0,Female,No,Sat,Dinner,2,...
         ...,15.06,3.0,Female,No,Sat,Dinner,2,...
         ...,20.69,2.45,Female,No,Sat,Dinner,4,...
         ...,16.93,3.07,Female,No,Sat,Dinner,3,...
         ...,10.29,2.6,Female,No,Sun,Dinner,2,...
         ...,34.81,5.2,Female,No,Sun,Dinner,4,...
         ...,26.41,1.5,Female,No,Sat,Dinner,2,...
         ...,16.45,2.47,Female,No,Sat,Dinner,2,...
         ...,3.07,1.0,Female,Yes,Sat,Dinner,1,...
         ...,17.07,3.0,Female,No,Sat,Dinner,3,...
         ...,26.86,3.14,Female,Yes,Sat,Dinner,2,...
         ...,25.28,5.0,Female,Yes,Sat,Dinner,2,...
         ...,14.73,2.2,Female,No,Sat,Dinner,2,...
         ...,10.07,1.83,Female,No,Thur,Lunch,1,...
         ...,34.83,5.17,Female,No,Thur,Lunch,4,...
         ...,5.75,1.0,Female,Yes,Fri,Dinner,2,...
         ...,16.32,4.3,Female,Yes,Fri,Dinner,2,...
         ...,22.75,3.25,Female,No,Fri,Dinner,2,...
         ...,11.35,2.5,Female,Yes,Fri,Dinner,2,...
         ...,15.38,3.0,Female,Yes,Fri,Dinner,2,...
         ...,44.3,2.5,Female,Yes,Sat,Dinner,3,...
         ...,22.42,3.48,Female,Yes,Sat,Dinner,2,...
         ...,20.92,4.08,Female,No,Sat,Dinner,2,...
         ...,14.31,4.0,Female,Yes,Sat,Dinner,2,...
         ...,7.25,1.0,Female,No,Sat,Dinner,1,...
         ...,25.71,4.0,Female,No,Sun,Dinner,3,...
         ...,17.31,3.5,Female,No,Sun,Dinner,2,...
         ...,10.65,1.5,Female,No,Thur,Lunch,2,...
         ...,12.43,1.8,Female,No,Thur,Lunch,2,...
         ...,24.08,2.92,Female,No,Thur,Lunch,4,...
         ...,13.42,1.68,Female,No,Thur,Lunch,2,...
         ...,12.48,2.52,Female,No,Thur,Lunch,2,...
         ...,29.8,4.2,Female,No,Thur,Lunch,6,...
         ...,14.52,2.0,Female,No,Thur,Lunch,2,...
         ...,11.38,2.0,Female,No,Thur,Lunch,2,...
         ...,20.27,2.83,Female,No,Thur,Lunch,2,...
         ...,11.17,1.5,Female,No,Thur,Lunch,2,...
         ...,12.26,2.0,Female,No,Thur,Lunch,2,...
         ...,18.26,3.25,Female,No,Thur,Lunch,2,...
         ...,8.51,1.25,Female,No,Thur,Lunch,2,...
         ...,10.33,2.0,Female,No,Thur,Lunch,2,...
         ...,14.15,2.0,Female,No,Thur,Lunch,2,...
         ...,13.16,2.75,Female,No,Thur,Lunch,2,...
         ...,17.47,3.5,Female,No,Thur,Lunch,2,...
         ...,27.05,5.0,Female,No,Thur,Lunch,6,...
         ...,16.43,2.3,Female,No,Thur,Lunch,2,...
         ...,8.35,1.5,Female,No,Thur,Lunch,2,...
         146,18.64,1.36,Female,No,Thur,Lunch,3,...
         ...,11.87,1.63,Female,No,Thur,Lunch,2,...
         ...,29.85,5.14,Female,No,Sun,Dinner,5,...
         ...,25.0,3.75,Female,No,Sun,Dinner,4,...
         ...,13.39,2.61,Female,No,Sun,Dinner,2,...
         ...,16.21,2.0,Female,No,Sun,Dinner,3,...
         ...,17.51,3.0,Female,Yes,Sun,Dinner,2,...
         ...,10.59,1.61,Female,Yes,Sat,Dinner,2,...
         ...,10.63,2.0,Female,Yes,Sat,Dinner,2,...
         ...,9.6,4.0,Female,Yes,Sun,Dinner,2,...
         ...,20.9,3.5,Female,Yes,Sun,Dinner,3,...
         ...,18.15,3.5,Female,Yes,Sun,Dinner,3,...
         ...,19.81,4.19,Female,Yes,Thur,Lunch,2,...
         ...,43.11,5.0,Female,Yes,Thur,Lunch,4,...
         ...,13.0,2.0,Female,Yes,Thur,Lunch,2,...
         ...,12.74,2.01,Female,Yes,Thur,Lunch,2,...
         ...,13.0,2.0,Female,Yes,Thur,Lunch,2,...
         ...,16.4,2.5,Female,Yes,Thur,Lunch,2,...
         ...,16.47,3.23,Female,Yes,Thur,Lunch,3,...
         ...,12.76,2.23,Female,Yes,Sat,Dinner,2,...
         ...,13.27,2.5,Female,Yes,Sat,Dinner,2,...
         ...,28.17,6.5,Female,Yes,Sat,Dinner,3,...
         ...,12.9,1.1,Female,Yes,Sat,Dinner,2,...
         ...,30.14,3.09,Female,Yes,Sat,Dinner,4,...
         ...,13.42,3.48,Female,Yes,Fri,Lunch,2,...
         ...,15.98,3.0,Female,No,Fri,Lunch,3,...
         ...,16.27,2.5,Female,Yes,Fri,Lunch,2,...
         ...,10.09,2.0,Female,Yes,Fri,Lunch,2,...
         ...,22.12,2.88,Female,Yes,Sat,Dinner,2,...
         ...,35.83,4.67,Female,No,Sat,Dinner,3,...
         ...,27.18,2.0,Female,Yes,Sat,Dinner,2,...
         ...,18.78,3.0,Female,No,Thur,Dinner,2,...

In [72]: data['perc_tip'].mean()

Out[72]: [a single float: the mean of perc_tip]

In [73]: # Slow version:
         #the_sum = 0
         #for _, row in data.iterrows():
         #    the_sum += row['perc_tip']
         #
         #the_mean = the_sum / len(data)

In [74]: data.groupby('size')['perc_tip'].mean()

Out[74]: size
         1    ...
         2    ...
         3    ...
         4    ...
         5    ...
         6    ...
         Name: perc_tip, dtype: float64

In [75]: data.groupby(['size', 'sex'])['perc_tip'].mean()

Out[75]: size  sex
         1     Female    ...
               Male      ...
         2     Female    ...
               Male      ...
         3     Female    ...
               Male      ...
         4     Female    ...
               Male      ...
         5     Female    ...
               Male      ...
         6     Female    ...
               Male      ...
         Name: perc_tip, dtype: float64

In [76]: means = data.groupby(['size', 'sex'])['perc_tip'].mean()

In [77]: means.unstack('sex')

Out[77]: sex   Female  Male
         size
         1        ...   ...
         2        ...   ...
         3        ...   ...
         4        ...   ...
         5        ...   ...
         6        ...   ...

In [78]: data.groupby(['size', 'sex'])['perc_tip'].mean().unstack().to_latex()

Out[78]: '\\begin{tabular}{lrr}\n\\toprule\nsex & Female & Male \\\\\nsize...

In [79]: # "readable" version:
         (data.groupby(['size', 'sex'])
              ['perc_tip'].mean().unstack().to_latex())

Out[79]: '\\begin{tabular}{lrr}\n\\toprule\nsex & Female & Male \\\\\nsize...
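The pandas cells above (In [67] to In [77]) rely on vectorized column arithmetic, boolean masks and `groupby`/`unstack`. As a minimal sketch of those steps, here they are on six made-up rows that only borrow the tips.csv column names (the numbers are invented for illustration), with the In [73] loop written so that it actually visits rows, and checked against the vectorized mean:

```python
import pandas as pd

# Hypothetical toy rows with the same column names as tips.csv
data = pd.DataFrame({
    "total_bill": [10.0, 20.0, 40.0, 25.0, 8.0, 50.0],
    "tip": [1.0, 4.0, 6.0, 5.0, 2.0, 5.0],
    "sex": ["Female", "Male", "Male", "Female", "Female", "Male"],
    "size": [1, 2, 4, 2, 1, 4],
})

# Vectorized: one division over two whole columns (as in In [67])
data["perc_tip"] = data["tip"] / data["total_bill"]

# Boolean mask: keep the rows where sex == "Female" (as in In [69])
female_rows = data[data["sex"] == "Female"]

# Slow version (In [73]): iterating over a DataFrame directly yields
# column labels, so the rows have to come from iterrows()
the_sum = 0.0
for _, row in data.iterrows():
    the_sum += row["perc_tip"]
slow_mean = the_sum / len(data)
fast_mean = data["perc_tip"].mean()

# Mean tip share per (size, sex) pair, one column per sex (In [76]-[77]);
# combinations with no rows show up as NaN
means = data.groupby(["size", "sex"])["perc_tip"].mean().unstack("sex")

print(female_rows)
print(slow_mean, fast_mean)
print(means)
```

Both means agree; the difference is that `.mean()` does the work without a Python-level loop, which is what makes it the fast version on the 244-row table.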
2 Slide

3 matplotlib: visualize data

In [80]: from matplotlib import pyplot as plt
         %matplotlib inline

In [81]: plt.plot([1,3,2])

Out[81]: [<matplotlib.lines.Line2D at 0x7fa9fcc22160>]

In [82]: plt.bar([0,1,2], [1,3,2])

Out[82]: <Container object of 3 artists>
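Out[81] and Out[82] are just the return values Jupyter echoes: `plot` returns a list of `Line2D` objects and `bar` returns a container of bar rectangles. A minimal sketch of the same two plots through the object-oriented `Axes` interface, assuming a plain script rather than a notebook, so the non-interactive Agg backend is selected instead of `%matplotlib inline`:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend: no display or notebook needed
from matplotlib import pyplot as plt

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# A single list is taken as y-values, plotted against x = 0, 1, 2;
# the call returns a list holding one Line2D
lines = ax_line.plot([1, 3, 2])

# Bar chart: x positions first, then bar heights; returns a BarContainer
bars = ax_bar.bar([0, 1, 2], [1, 3, 2])

fig.tight_layout()
```

In a script, `fig.savefig(...)` would write the figure to disk; in the notebook the inline backend displays it automatically.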
In [83]: data['total_bill'].plot()

Out[83]: <matplotlib.axes._subplots.AxesSubplot at 0x7fa9fcb29320>
In [84]: data[['total_bill', 'tip']].plot()

Out[84]: <matplotlib.axes._subplots.AxesSubplot at 0x7fa9fcb26dd8>

In [85]: data.groupby('size')['perc_tip'].mean().plot(kind='bar')

Out[85]: <matplotlib.axes._subplots.AxesSubplot at 0x7fa9fcb1e128>
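`Series.plot()` and `DataFrame.plot()` draw each column against the index and return the matplotlib `Axes`, which is what Out[83] to Out[85] echo. A minimal sketch of the In [85] pattern on hypothetical toy data, passing an explicit `Axes` and labelling it (the axis labels are choices made here, not part of the notebook):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical toy data: tip share for six parties of different sizes
data = pd.DataFrame({
    "size": [1, 2, 2, 3, 3, 3],
    "perc_tip": [0.10, 0.20, 0.16, 0.15, 0.12, 0.18],
})

# groupby + mean gives a Series indexed by party size;
# plot(kind="bar") draws one bar per index value on the given Axes
mean_by_size = data.groupby("size")["perc_tip"].mean()
fig, ax = plt.subplots()
mean_by_size.plot(kind="bar", ax=ax)
ax.set_xlabel("party size")
ax.set_ylabel("mean tip / total bill")
```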
In [86]: data.groupby(['size', 'sex'])['perc_tip'].mean().unstack().plot(kind='bar')

Out[86]: <matplotlib.axes._subplots.AxesSubplot at 0x7fa9fca84e48>

In [87]: female = (data['sex'] == 'Female')
         data.plot(kind='scatter', x='perc_tip', y='total_bill',
                   c=female, edgecolor='r'
                  )

Out[87]: <matplotlib.axes._subplots.AxesSubplot at 0x7fa9fcb264e0>
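In [87] colors the points by handing a boolean Series to `c`. One alternative, sketched here on hypothetical toy data, is one `scatter` call per group with an explicit color and label, which also yields a legend (the colors `tab:red` and `tab:blue` are arbitrary choices):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical toy data with the columns used in In [87]
data = pd.DataFrame({
    "total_bill": [10.0, 20.0, 40.0, 25.0],
    "perc_tip": [0.10, 0.20, 0.15, 0.20],
    "sex": ["Female", "Male", "Male", "Female"],
})

fig, ax = plt.subplots()
# One scatter call per group: each gets a fixed color and a legend label
for sex, color in [("Female", "tab:red"), ("Male", "tab:blue")]:
    group = data[data["sex"] == sex]
    ax.scatter(group["perc_tip"], group["total_bill"], color=color, label=sex)
ax.set_xlabel("perc_tip")
ax.set_ylabel("total_bill")
ax.legend()
```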
4 Slide

5 Statsmodels

In [88]: import statsmodels.api as sm

In [89]: res = sm.OLS.from_formula('tip ~ total_bill + sex + day + size', data=data).fit()

In [90]: res.summary()

Out[90]: <class 'statsmodels.iolib.summary.Summary'>
         """
                                 OLS Regression Results
         ==========================================================================
         Dep. Variable:      tip                R-squared:           0...
         Model:              OLS                Adj. R-squared:      0...
         Method:             Least Squares      F-statistic:         3...
         Date:               Sat, 05 May 2018   Prob (F-statistic):  4.04...
         Time:               10:36:20           Log-Likelihood:      -34...
         No. Observations:   244                AIC:                 7...
         Df Residuals:       237                BIC:                 7...
         Df Model:           6
         Covariance Type:    nonrobust
         ==========================================================================
                            coef    std err    t    P>|t|    [...
         Intercept
         sex[T.Male]
         day[T.Sat]
         day[T.Sun]
         day[T.Thur]
         total_bill
         size
         ==========================================================================
         Omnibus:                               Durbin-Watson:       2...
         Prob(Omnibus):                         Jarque-Bera (JB):    49...
         Skew:                                  Prob(JB):            1.87...
         Kurtosis:                              Cond. No.
         ==========================================================================

         Warnings:
         [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
         """

In [91]: data['day'].unique()

Out[91]: array(['Sun', 'Sat', 'Thur', 'Fri'], dtype=object)

6 Slide

7 scikit-learn

In [92]: from sklearn.neural_network import MLPClassifier

In [93]: clf = MLPClassifier()

In [94]: data.head()

Out[94]:    total_bill  tip  sex     smoker  day  time    size  perc_tip
         0         ...  ...  Female  No      Sun  Dinner   ...       ...
         1         ...  ...  Male    No      Sun  Dinner   ...       ...
         2         ...  ...  Male    No      Sun  Dinner   ...       ...
         3         ...  ...  Male    No      Sun  Dinner   ...       ...
         4         ...  ...  Female  No      Sun  Dinner   ...       ...

In [95]: data['sex'] = data['sex'] == 'Female'
         data['smoker'] = data['smoker'] == 'Yes'
         data['time'] = data['time'] == 'Dinner'

In [96]: data.head()
Out[96]:    total_bill  tip  sex    smoker  day  time  size  perc_tip
         0         ...  ...  True   False   Sun  True   ...       ...
         1         ...  ...  False  False   Sun  True   ...       ...
         2         ...  ...  False  False   Sun  True   ...       ...
         3         ...  ...  False  False   Sun  True   ...       ...
         4         ...  ...  True   False   Sun  True   ...       ...

In [97]: data['good_tip'] = data['perc_tip'] > data['perc_tip'].mean()

In [98]: x = data.drop(['good_tip', 'day', 'perc_tip', 'tip'], axis=1)
         y = data['good_tip']

In [99]: data.head()

Out[99]:    total_bill  tip  sex    smoker  day  time  size  perc_tip  good_tip
         0         ...  ...  True   False   Sun  True   ...       ...     False
         1         ...  ...  False  False   Sun  True   ...       ...     False
         2         ...  ...  False  False   Sun  True   ...       ...      True
         3         ...  ...  False  False   Sun  True   ...       ...     False
         4         ...  ...  True   False   Sun  True   ...       ...     False

In [100]: res = clf.fit(x, y)

In [101]: # I'M CHEATING! I'M CHEATING!
          res.score(x, y)

Out[101]: [a single float: accuracy on the rows used for fitting]

In [102]: from sklearn.tree import DecisionTreeClassifier, export_graphviz

In [103]: tree = DecisionTreeClassifier(max_depth=4)

In [104]: res = tree.fit(x, y)

In [105]: res.score(x, y)

Out[105]: [a single float: accuracy on the rows used for fitting]

In [106]: dot_data = export_graphviz(tree, out_file=None,
                                     feature_names=x.columns,
                                     filled=True, rounded=True,
                                     special_characters=True)

In [107]: import graphviz
          graph = graphviz.Source(dot_data)

In [108]: graph

Out[108]: [Figure: the fitted tree as drawn by graphviz. The root node (samples = 244, value = [137, 107]) splits on total_bill into a True branch (samples = 169, value = [82, 87]) and a False branch (samples = 75, value = [55, 20]); the lower levels split on total_bill, smoker, time and size.]

In [109]: from sklearn.ensemble import RandomForestClassifier

In [110]: forest = RandomForestClassifier(max_depth=4)

In [111]: res = forest.fit(x, y)

In [112]: res.score(x, y)

Out[112]: [a single float: accuracy on the rows used for fitting]
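In [101], In [105] and In [112] score each model on the very rows it was fitted on, which is why the notebook shouts "I'M CHEATING!": that number overstates how well the model does on new bills. A minimal sketch of a held-out evaluation with `train_test_split`, on synthetic data generated here as a stand-in for `x` and `y` (not the tips data), using a tree without `max_depth` so the gap is easy to see:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for x and y (an assumption, not the tips data):
# the label depends on the first feature plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + rng.normal(size=400)) > 0

# Hold back a quarter of the rows; the model never sees them while fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# No max_depth, so the tree can memorise the training rows
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

train_score = tree.score(X_train, y_train)  # the "cheating" number
test_score = tree.score(X_test, y_test)     # estimate for unseen rows
print(f"train accuracy {train_score:.2f}, test accuracy {test_score:.2f}")
```

On the notebook's data the same pattern would be `train_test_split(x, y, ...)` followed by fitting on the training part only; `sklearn.model_selection.cross_val_score` repeats this over several splits.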
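In the OLS summary of Out[90], `sex[T.Male]`, `day[T.Sat]`, `day[T.Sun]` and `day[T.Thur]` are treatment-coded dummies: the formula interface keeps the first level of each categorical column in sorted order (Female, Fri) as the baseline folded into the intercept, and each coefficient is a shift relative to it. A minimal sketch of an equivalent design matrix built by hand with `pd.get_dummies(drop_first=True)` and fitted with `numpy.linalg.lstsq`, on seven hypothetical rows whose tips are generated exactly from chosen coefficients, so the fit should recover them:

```python
import numpy as np
import pandas as pd

# Hypothetical data; tips are generated exactly from chosen coefficients
data = pd.DataFrame({
    "total_bill": [10, 20, 15, 30, 25, 12, 18],
    "sex": ["Female", "Male", "Female", "Male", "Female", "Male", "Male"],
    "day": ["Fri", "Fri", "Sat", "Sat", "Sun", "Sun", "Fri"],
})
true_coef = {"Intercept": 0.5, "total_bill": 0.1, "sex_Male": 0.3,
             "day_Sat": 0.2, "day_Sun": -0.1}

# Treatment coding: drop the first (alphabetical) level of each category,
# so Female and Fri become the baseline absorbed by the intercept
dummies = pd.get_dummies(data[["sex", "day"]], drop_first=True, dtype=float)
X = pd.concat([data[["total_bill"]].astype(float), dummies], axis=1)
X.insert(0, "Intercept", 1.0)

data["tip"] = X.to_numpy() @ np.array([true_coef[c] for c in X.columns])

# Ordinary least squares: the b that minimises ||X b - tip||^2
coef, *_ = np.linalg.lstsq(X.to_numpy(), data["tip"].to_numpy(), rcond=None)
print(dict(zip(X.columns, coef.round(6))))
```

Adding `size` as one more numeric column and using the real tips data should reproduce the `coef` column of the summary, since the least-squares solution is the same whichever tool computes it; only the column names differ.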
Chapter 6 Measures of Bivariate Association 1 A bivariate relationship involves relationship between two variables. Examples: Relationship between GPA and SAT score Relationship between height and weight
More informationGeneralized Mixed Linear Models Practical 2
Generalized Mixed Linear Models Practical 2 Dankmar Böhning December 3, 2014 Prevalence of upper respiratory tract infection The data below are taken from a survey on the prevalence of upper respiratory
More informationChoosing a Significance Test. Student Resource Sheet
Choosing a Significance Test Student Resource Sheet Choosing Your Test Choosing an appropriate type of significance test is a very important consideration in analyzing data. If an inappropriate test is
More informationSTP 231 Example FINAL
STP 231 Example FINAL Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.
More informationSPSS output for 420 midterm study
Ψ Psy Midterm Part In lab (5 points total) Your professor decides that he wants to find out how much impact amount of study time has on the first midterm. He randomly assigns students to study for hours,
More informationData Analysis in the Health Sciences. Final Exam 2010 EPIB 621
Data Analysis in the Health Sciences Final Exam 2010 EPIB 621 Student s Name: Student s Number: INSTRUCTIONS This examination consists of 8 questions on 17 pages, including this one. Tables of the normal
More informationWeek 8 Hour 1: More on polynomial fits. The AIC. Hour 2: Dummy Variables what are they? An NHL Example. Hour 3: Interactions. The stepwise method.
Week 8 Hour 1: More on polynomial fits. The AIC Hour 2: Dummy Variables what are they? An NHL Example Hour 3: Interactions. The stepwise method. Stat 302 Notes. Week 8, Hour 1, Page 1 / 34 Human growth
More informationX Coefficient(s) Std Err of Coef. Coefficient / S.E Difference-Cancers, Males S...
CHAPTER 18 "Difference" Cancers, Males: Relation with Medical Radiation e Part 1. Introduction Difference-Cancers are All-Cancers-Minus-Respiratory-System Cancers. The dramatic increase in Respiratory-System
More informationLeast likely observations in regression models for categorical outcomes
The Stata Journal (2002) 2, Number 3, pp. 296 300 Least likely observations in regression models for categorical outcomes Jeremy Freese University of Wisconsin Madison Abstract. This article presents a
More informationCHAPTER TWO REGRESSION
CHAPTER TWO REGRESSION 2.0 Introduction The second chapter, Regression analysis is an extension of correlation. The aim of the discussion of exercises is to enhance students capability to assess the effect
More informationExample 7.2. Autocorrelation. Pilar González and Susan Orbe. Dpt. Applied Economics III (Econometrics and Statistics)
Example 7.2 Autocorrelation Pilar González and Susan Orbe Dpt. Applied Economics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Example 7.2. Autocorrelation 1 / 17 Questions.
More informationCIS192 Python Programming
CIS192 Python Programming Scientific Computing Eric Kutschera University of Pennsylvania March 20, 2015 Eric Kutschera (University of Pennsylvania) CIS 192 March 20, 2015 1 / 28 Course Feedback Let me
More informationLimited dependent variable regression models
181 11 Limited dependent variable regression models In the logit and probit models we discussed previously the dependent variable assumed values of 0 and 1, 0 representing the absence of an attribute and
More informationAnalysis of Variance: repeated measures
Analysis of Variance: repeated measures Tests for comparing three or more groups or conditions: (a) Nonparametric tests: Independent measures: Kruskal-Wallis. Repeated measures: Friedman s. (b) Parametric
More informationMULTIPLE REGRESSION OF CPS DATA
MULTIPLE REGRESSION OF CPS DATA A further inspection of the relationship between hourly wages and education level can show whether other factors, such as gender and work experience, influence wages. Linear
More informationBasic Biostatistics. Chapter 1. Content
Chapter 1 Basic Biostatistics Jamalludin Ab Rahman MD MPH Department of Community Medicine Kulliyyah of Medicine Content 2 Basic premises variables, level of measurements, probability distribution Descriptive
More information31 days Wed Nov 15, Fri Dec 15, 2017 N/A. Hypoglycemia. Sensor usage (CGM) Mahlon's glucose data was in the target range about 77% of the day.
Mahlon Lovett Overview Report Generated at: Thu, Feb 8, 2018 12:42 PM EST 137 26 N/A 18% 75% 7% Days with CGM data Avg. calibrations per day 3% 1 / 31 0.0 0% Average glucose Standard Hypoglycemia Time
More informationBusiness Research Methods. Introduction to Data Analysis
Business Research Methods Introduction to Data Analysis Data Analysis Process STAGES OF DATA ANALYSIS EDITING CODING DATA ENTRY ERROR CHECKING AND VERIFICATION DATA ANALYSIS Introduction Preparation of
More informationMEA DISCUSSION PAPERS
Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de
More informationQuantitative Methods in Computing Education Research (A brief overview tips and techniques)
Quantitative Methods in Computing Education Research (A brief overview tips and techniques) Dr Judy Sheard Senior Lecturer Co-Director, Computing Education Research Group Monash University judy.sheard@monash.edu
More informationCHAPTER 15. X Coefficient(s) Std Err of Coef. Coefficient / S.E. Constant Std Err of Y Est R Squared
CHAPTER 15 Buccal-Cavity & Pharynx Cancers, Males: Relation with Medical Radiation I ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ l lll Illl Illl I Il Illl~ lll * Part 1. Introduction :i~~~~i~~i~~i~~i:!:!~~~ i :ii:: ~ ~:!i::i!!i
More informationSTATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION XIN SUN. PhD, Kansas State University, 2012
STATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION by XIN SUN PhD, Kansas State University, 2012 A THESIS Submitted in partial fulfillment of the requirements
More informationSubescala D CULTURA ORGANIZACIONAL. Factor Analysis
Subescala D CULTURA ORGANIZACIONAL Factor Analysis Descriptive Statistics Mean Std. Deviation Analysis N 1 3,44 1,244 224 2 3,43 1,258 224 3 4,50,989 224 4 4,38 1,118 224 5 4,30 1,151 224 6 4,27 1,205
More information2017 and Beyond Kill Mode Training Co., Inc. / All Rights Reserved.
PROGRESS TRACKER by Dan Long Legalities Thank you for taking the time to note these important points prior to diving into the program. Copyright Notice No part of this report may be reproduced or transmitted
More informationThe Food Consumption Analysis
The Food Consumption Analysis BACKGROUND FOR STUDY The CHIS 2005 Adult Survey contains data on the individual records from the adult component of 2005 California Health Interview Survey. That is the population
More informationReview: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections
Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections New: Bias-variance decomposition, biasvariance tradeoff, overfitting, regularization, and feature selection Yi
More informationBinary Diagnostic Tests Two Independent Samples
Chapter 537 Binary Diagnostic Tests Two Independent Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary
More informationCross-validation. Miguel Angel Luque Fernandez Faculty of Epidemiology and Population Health Department of Non-communicable Diseases.
Cross-validation Miguel Angel Luque Fernandez Faculty of Epidemiology and Population Health Department of Non-communicable Diseases. October 29, 2015 Cancer Survival Group (LSH&TM) Cross-validation October
More informationIn this module I provide a few illustrations of options within lavaan for handling various situations.
In this module I provide a few illustrations of options within lavaan for handling various situations. An appropriate citation for this material is Yves Rosseel (2012). lavaan: An R Package for Structural
More informationEffects of Nutrients on Shrimp Growth
Data Set 5: Effects of Nutrients on Shrimp Growth Statistical setting This Handout is an example of extreme collinearity of the independent variables, and of the methods used for diagnosing this problem.
More informationItem-Total Statistics
64 Reliability Case Processing Summary N % Cases Valid 46 00.0 Excluded a 0.0 46 00.0 a. Listwise deletion based on all variables in the procedure. Reliability Statistics Cronbach's Alpha N of Items.869
More informationMotor Programs Lab. 1. Record your reaction and movement time in ms for each trial on the individual data Table 1 below. Table I: Individual Data RT
Motor Programs Lab Introduction. This lab will simulate an important experiment performed by Henry and Rogers (1960). The task involved the subject responding to an external signal then executing a simple,
More information