Introduction of Empirical Analysis using Stata: For Beginners
1 WBS seminar ('17/12/23) 1 Introduction of Empirical Analysis using Stata: For Beginners
Lecturer: Tohru Yoshioka-Kobayashi, Project Research Associate, Department of Technology Management for Innovation, Graduate School of Engineering, the University of Tokyo (t-koba@tmi.t.u-tokyo.ac.jp)
Acknowledgements: Mr. Kisa Sugihara and Mr. Akihiro Kawamura made a great contribution to the English translation.
This lecture material may be reused under the Creative Commons Attribution terms. Please note that some parts are not statistically rigorous.
2 0.Introduction 2 Introduction of the lecturer
Researcher in MOT: '15 Ph.D. in Engineering from UTokyo. Studies organizational management in technology and design development.
Researcher in IP policy: '07 Master in Law from Osaka-U. Seeks policy implications in intellectual property law.
Career: Assistant in legal affairs at a university start-up (Signpost, Corp.); policy analyst at a private think tank (Mitsubishi Res. Inst.); Hitotsubashi Univ. & Univ. of Tokyo.
3 0.Introduction 3 Goal
We will learn the basic knowledge and skills needed to reveal (or prove) a causal relationship.
Even those who are not strong in mathematics will be able to run an analysis by themselves after the seminar.
The contents of the lecture are based on statistics, but no formula is used.
The lecture focuses on general-purpose analytical methods.
We use Stata.
4 0.Introduction 4 Agenda I. Preparation for the Analysis: How to Load data II. Descriptive Statistics and Graphs III. Data Processing IV. Regression Analysis V. Reporting of Regression Results
5 0.Introduction 5 Empirical analysis procedures
1 Setting research questions
2 Literature review
3 Causal model design
4 Search for statistics and other data sources
5 Perform simple verification
6 Collecting data
7 Cleaning data (data cleansing)
8 Creating a data set
9 Analysis
10 Discussion (interpretation)
Phase labels on the slide: Carried out in the head / With data / Embodiment
6 I. Preparation for the Analysis: How to Load data 6
7 I. Preparation for the analysis 7 1)Characteristics of statistical analysis software
Features: Stata High; SPSS High; R Medium (High w/add-in); GRETL Medium (High w/add-in)
User experience: Stata Good; SPSS Good; R Bad; GRETL Good
Price: Stata High; SPSS High; R Free; GRETL Free
Support: Stata Official support + a couple of books; SPSS Official support + books; R A variety of information online + books; GRETL Information online
Characteristics: Stata Strong in social science analysis; SPSS Somewhat strong in natural science analysis; R Strong in data processing; GRETL Strong in economic analysis
8 I. Preparation for the analysis 8 2)Data to use
SampleData_OECD.txt: created from the OECD Main Science and Technology Indicators; tab-separated data
Records the following values for 2008 and 2013, and their growth in 2013 (compared with 2008):
Workforce population (thousands)
PCT patent applications (number of patents): patent applications for which the applicant intends to file in foreign countries
Industry value added (US $ million)
Technology trade receipts (US $ million)
Technology trade payments (US $ million)
Technology trade balance (US $ million): receipts minus payments
9 I. Preparation for the analysis 9 2)Data to use Data items (Variable name: Content)
Country: Country
Region_Narrow: Region name
Region_Broad: Continent name
Laborforce_2008_thousands: Workforce population (thousands) (2008)
Laborforce_2013_thousands: Workforce population (thousands) (2013)
Laborforce_growthrate: Growth rate
pctpatentapplication_2008: Number of international patent applications (2008)
pctpatentapplication_2013: Number of international patent applications (2013)
Pct_growthrate: Growth rate
Valueadded_2008_m_usd: Industry value added (US $ million) (2008)
Valueadded_2013_m_usd: Industry value added (US $ million) (2013)
Valueadded_growthrate: Growth rate
ValueAdded_Growth_M_USD: Growth value
Techreceipts_2008_m_usd: Technology trade receipts (US $ million) (2008)
Techreceipts_2013_m_usd: Technology trade receipts (US $ million) (2013)
Techreceipts_growthrate: Growth rate
Techpayments_2008_m_usd: Technology trade payments (US $ million) (2008)
Techpayments_2013_m_usd: Technology trade payments (US $ million) (2013)
Techpayments_growthrate: Growth rate
Techbalance_2008_m_usd: Technology trade balance (US $ million) (2008)
Techbalance_2013_m_usd: Technology trade balance (US $ million) (2013)
Techbalance_growth_m_usd: Growth value (US $ million)
Laborforce_growth_dummy: Dummy variable, takes 1 if labor force population growth rate > 0
Techbalance_growth_dummy: Dummy variable, takes 1 if technology trade balance growth rate > 0
Asiapacific_dummy: Dummy variable, takes 1 if the country is in Asia or the Pacific (including North America)
Europe_dummy: Dummy variable, takes 1 if the country is in Europe
Eu_dummy: Dummy variable, takes 1 if the country is an EU member
10 I. Preparation for the analysis 10 2)Data to use Questions to be solved
What factors increase industry value added?
What factors increase the technology balance of payments?
Important limitation: we examine these questions only within the available data.
11 I. Preparation for the analysis 11 3)Data for the experienced Overview IMPP_Eng_DATA.txt or IMPP_EnglishEdu_En.xlsx Source: Ministry of Education(MEXT) English Skill Survey in 2016 Surveys to public high schools and junior high schools Other government statistics Observed year FY2016
12 I. Preparation for the analysis 12 3)Data for the experienced Items (Classification / Item: Variable name)
Basic information
Prefecture ID: ID
Prefecture: Pref_Str
High School English Teachers' English Skill (MEXT English Skill Survey in 2016)
Number of English teachers in public HS...(a): HS_T_ALL
Those who took an English examination among (a)...(b): HS_T_EXAM
Those graded Eiken Pre-1 or higher, or equivalent, among (b)...(c): HS_T_E1
(c)/(a): HS_T_E1_R
High School Students' English Skill (MEXT English Skill Survey in 2016)
Seniors in public HS...(d): HS_S_ALL
Those who took an English examination among (d)...(e): HS_S_EXAM
Those graded Eiken Pre-2 or higher among (e)...(f): HS_S_E2
Those regarded as equivalent to Eiken Pre-2 or higher, excluding (f)...(g): HS_S_OT
(f)+(g): HS_S_E2OT
((f)+(g))/(d): HS_S_E2OT_R
13 I. Preparation for the analysis 13 3)Data for the experienced Items (Classification / Item: Variable name)
Junior High School English Teachers' English Skill (MEXT English Skill Survey in 2016)
Number of English teachers in public JHS...(h): JH_T_ALL
Those who took an English examination among (h)...(i): JH_T_EXAM
Those graded Eiken Pre-1 or higher, or equivalent, among (i)...(j): JH_T_E1
(j)/(h): JH_T_E1_R
Junior High School Students' English Skill (MEXT English Skill Survey in 2016)
Seniors in public JHS...(k): JH_S_ALL
Those who took an English examination among (k)...(l): JH_S_EXAM
Those graded Eiken Pre-2 or higher among (l)...(m): JH_S_E2
Those regarded as equivalent to Eiken Pre-2 or higher, excluding (m)...(n): JH_S_OT
(m)+(n): JH_S_E2OT
((m)+(n))/(k): JH_S_E2OT_R
14 I. Preparation for the analysis 14 3)Data for the experienced Items (Classification / Item: Variable name)
Num. of High Schools (MEXT Educational Institution Basic Survey)
Num. of high schools...(o): HS_I_ALL
Num. of private high schools...(p): HS_I_PRIV
Num. of public high schools...(q): HS_I_PUBL
Num. of students who newly attend college, university, and junior college (MEXT Educational Institution Basic Survey)
Num. of students who newly attend colleges and universities (by HS location): HS_S_UNIV_ENT
Num. of students who newly attend junior colleges (by HS location): HS_S_JC_ENT
Num. of graduates from JHS in 2013 (by JHS location): JH_S_PREVALL
Percentage of students who go on to colleges, universities, and junior colleges: HS_S_UNJC_R
Percentage of students who go on to colleges and universities: HS_S_UNIV_R
15 I. Preparation for the analysis 15 3)Data for the experienced Questions
What factors influence the English skills of high school students?
16 I. Preparation for the analysis 16 4)Load data
Statistics software expects a fixed data format. The data must be structured as follows:
Individual observations run in the vertical direction (one row per observation).
Variables (indicators) for each observation run in the horizontal direction (one column per variable).
The top line should hold the variable names.
Do not put a line break in variable names.
(Example table on the slide: columns No, Name, Gender, Age, Height; each row is one observation.)
17 I. Preparation for the analysis 17 4)Load data Variable name guidelines
How to name variables:
Using only English letters and _ (underscore) is the safe choice. Avoid other symbols and Japanese characters.
Do not include blanks.
It is better not to use a number as the first character.
Note) The data itself may contain Japanese characters and symbols.
18 I. Preparation for the analysis 18 4)Load data File format
It is best to read the Excel file directly. Stata can do this (though old versions cannot).
If that is not possible, tab-delimited text is better than CSV.
CSV data separates variables with "," (comma).
In numeric data, Excel and other database software may add "," as a digit separator. To keep these values from being split into separate variables, the software wraps them in double quotation marks, like "333,231,298", when the file is saved.
When loading such a file, R and Stata may treat the numeric variables as strings.
If the file is separated by tabs, you can prevent this.
19 I. Preparation for the analysis 19 4)Load data File menu > Import > Text data created by a spreadsheet
20 I. Preparation for the analysis 20 4)Load data [i] Check that tab-delimited data is selected in advance [ii] Click on Browse
21 I. Preparation for the analysis 21 4)Load data On the file open window, choose Text Files (*.txt) and then open the data file Change to Text Files (*.txt)
22 I. Preparation for the analysis 22 4)Load data If the variables appear in the top-right pane, the import succeeded.
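The menu steps above can also be run as a single command. The following is a minimal sketch, assuming SampleData_OECD.txt sits in Stata's current working directory and that its first row holds the variable names:

```stata
* Load the tab-separated OECD sample file (assumed to be in the working directory)
import delimited using "SampleData_OECD.txt", delimiters("\t") varnames(1) clear

* List every variable with its storage type (byte, int, long, double, str...)
describe
```

By default import delimited converts variable names to lowercase, which is consistent with the lowercase names (for example valueadded_growthrate) used in the commands of this material.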
23 I. Preparation for the analysis 23 4)Load data Note the type of each variable in the imported data
int, long, double: number (can be used in calculations)
byte: 0/1 (can be used in calculations)
str: string (cannot be used in calculations). A variable becomes a string when there is garbage in the data or output to a tab-delimited text format with
You can see the type here.
24 I. Preparation for the analysis 24 4)Load data The type of the variable can be confirmed from [Variable Manager] Here
25 I. Preparation for the analysis 25 4)Load data How to correct the type
A variable that is incorrectly treated as a string can be fixed in Data menu > Create or change data > Other variable-transformation commands > Convert variables from string to numeric.
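The same conversion is available from the command line through destring. The sketch below is only an illustration: valueadded_2013_m_usd stands in for whichever variable was read as a string, and the comma is assumed to be the character that caused the problem.

```stata
* Convert a string variable to numeric in place, dropping comma digit separators
* (valueadded_2013_m_usd is only an example of a variable that may be affected)
destring valueadded_2013_m_usd, replace ignore(",")

* Confirm that the storage type is now numeric
describe valueadded_2013_m_usd
```

If the variable is already numeric, destring reports that and leaves it unchanged.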
26 II. Descriptive Statistics and Graphs 26
27 II. Descriptive Statistics and Graphs 27 1) Descriptive statistics View descriptive statistics Statistics Menu >Summaries, tables, and tests >Summary and descriptive statistics >Summary Statistics
28 II. Descriptive Statistics and Graphs 28 1) Descriptive statistics View descriptive statistics [i] In Variables, click the variables you want to summarize [ii] Click OK
29 II. Descriptive Statistics and Graphs 29 1) Descriptive statistics View descriptive statistics
. summarize laborforce_growthrate pct_growthrate valueadded_growthrate techbalance_growth_m_usd
(Output table: Obs, Mean, Std. Dev., Min, Max for laborforce~e, pct_growth~e, valueadded~e, tech~h_m_usd)
Long variable names are abbreviated in the output. Std. Dev. is the standard deviation.
#Command lines for descriptive statistics
summarize laborforce_growthrate pct_growthrate
30 II. Descriptive Statistics and Graphs 30 1) Descriptive statistics View descriptive statistics
In the by/if/in tab, the results can be restricted to a subset or aggregated by group.
[i] Check here [ii] Select the variable that defines the groups (for example Europe_dummy)
31 II. Descriptive Statistics and Graphs 31 1) Descriptive statistics View descriptive statistics (results by group)
-> europe_dummy = 0
(Output table: Obs, Mean, Std. Dev., Min, Max for laborforce~e, pct_growth~e, valueadded~e, tech~h_m_usd)
-> europe_dummy = 1
(Output table: Obs, Mean, Std. Dev., Min, Max for laborforce~e, pct_growth~e, valueadded~e, tech~h_m_usd)
#Descriptive statistics by groups
by europe_dummy, sort : summarize laborforce_growthrate pct_growthrate
32 II. Descriptive Statistics and Graphs 32 1) Descriptive statistics Correlations between variables Statistics > Summaries, tables and tests > Summary and descriptive statistics > Correlations and covariances #Correlations correlate valueadded_growthrate techbalance_growth_m_usd
33 II. Descriptive Statistics and Graphs 33 1) Descriptive statistics Correlations between variables (cont.)
34 II. Descriptive Statistics and Graphs 34 1) Descriptive statistics Correlations between variables (cont.): Results
. correlate valueadded_growthrate techbalance_growth_m_usd techbalance_growth_dummy laborforce_growthrate pct_growthrate eu_dummy
(obs=29)
(Output: correlation matrix with rows and columns valueadded~e, tech~h_m_usd, techbalanc~y, laborforce~e, pct_growth~e, eu_dummy)
35 II. Descriptive Statistics and Graphs 35 2)Graphs Drawing a histogram Graphics Menu > Histogram
36 II. Descriptive Statistics and Graphs 36 2)Graphs Drawing a histogram (cont.) Select a variable
37 II. Descriptive Statistics and Graphs 37 2)Graphs Drawing a histogram (cont.): Results
(Histogram; axes: Density, ValueAdded_GrowthRate)
#Drawing a histogram
histogram valueadded_growthrate
38 II. Descriptive Statistics and Graphs 38 2)Graphs Drawing a histogram by groups
You can create a histogram for each group in the By tab.
[i] Click By [ii] Select the variables to use for grouping
(Histograms; axes: Density, ValueAdded_GrowthRate; Graphs by Europe_Dummy)
39 II. Descriptive Statistics and Graphs 39 2)Graphs Drawing a histogram by groups Command lines
#Drawing a histogram by groups
histogram valueadded_growthrate, by(europe_dummy)
Increase/decrease bins
#Change the number of bins
histogram valueadded_growthrate, bin(12)
(Histogram; axes: Density, ValueAdded_GrowthRate)
40 II. Descriptive Statistics and Graphs 40 2)Graphs Drawing a scatter chart Graphics Menu>Twoway graph (scatter, line, etc.)
41 II. Descriptive Statistics and Graphs 41 2)Graphs Drawing a scatter chart [i]click Create
42 II. Descriptive Statistics and Graphs 42 2)Graphs Drawing a scatter chart (cont.) [i]select the Scatter in the basic plots [ii]select each axis variable [iii] Press accept to return to the previous screen. Then press ok #Drawing a scatter chart twoway (scatter valueadded_growthrate pct_growthrate)
43 II. Descriptive Statistics and Graphs 43 2)Graphs Drawing a scatter chart (cont.): Results (Scatter plot; axes: PCT_GrowthRate, LaborForce_GrowthRate)
44 II. Descriptive Statistics and Graphs 44 2)Graphs Drawing a scatter plot matrix Graphics > Scatterplot matrix
45 II. Descriptive Statistics and Graphs 45 2)Graphs Drawing a scatter plot matrix Select variables ValueAdded_GrowthRate LaborForce_GrowthRate PCT_GrowthRate EU_Dummy #Drawing a scatter plot matrix graph matrix valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy
46 II. Descriptive Statistics and Graphs 46 2)Graphs Drawing a box plot Graphics > Box plot
47 II. Descriptive Statistics and Graphs 47 2)Graphs Drawing a box plot PCT_GrowthRate #Drawing a box plot graph box pct_growthrate
48 II. Descriptive Statistics and Graphs 48 2)Graphs Drawing a box plot by groups [i]click Categories tab [ii]check Group1 [iii]select a variable for grouping #Drawing a box plot by groups graph box pct_growthrate, over(region_broad)
49 II. Descriptive Statistics and Graphs 49 2)Graphs Drawing a box plot by groups: Results (Box plots of PCT_GrowthRate for Asia-Pacific, Europe, and Other)
50 II. Descriptive Statistics and Graphs 50 3)Exercise
Our dataset (SampleData_OECD) includes one variable that contains errors.
Hint: They are obvious errors.
Hint: The errors are in one specific variable among the labor force, PCT, and value added related variables.
Find the variable by using summary statistics, histograms, and scatter plots.
51 II. Descriptive Statistics and Graphs 51 3)Exercise Answer
ValueAdded_Growth_M_USD
The value was calculated as the value in 2008 minus the value in 2013. Thus, there are too many negative growth values!
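One way to confirm and repair the error is sketched below. It assumes the OECD sample is loaded with the lowercase variable names used elsewhere in this material; valueadded_growth_fixed is a hypothetical name for the corrected variable.

```stata
* Count the countries recorded with negative growth in the suspicious variable
count if valueadded_growth_m_usd < 0

* Recompute the growth value as 2013 minus 2008
* (valueadded_growth_fixed is a hypothetical name, not part of the dataset)
generate valueadded_growth_fixed = valueadded_2013_m_usd - valueadded_2008_m_usd

* Compare the original and the recomputed variable side by side
summarize valueadded_growth_m_usd valueadded_growth_fixed
```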
52 III. Data Processing 52
53 III. Data Processing 53 1)Create a new variable How to compute a new variable Data Menu>Create or change data>create new variable
54 III. Data Processing 54 1)Create a new variable How to compute a new variable [i]fill the name of the new variable [ii]click Create
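55 III. Data Processing 55 1)Create a new variable How to compute a new variable (cont.)
Expression: log(techbalance_growth_m_usd)
[i] The mathematical function can be chosen from Functions > Mathematical [ii] You can choose a variable from Variables
#Create a new variable (log_techbalance_growth is an example name; generate needs a name for the new variable)
generate log_techbalance_growth = log(techbalance_growth_m_usd)
Note) log() returns a missing value for zero or negative inputs.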
56 III. Data Processing 56 2)Save the dataset Save the modified dataset [1] File > Export > Textdata (delimited, *.csv)
57 III. Data Processing 57 2)Save the dataset Save the modified dataset [1] Input a file name Check Tab-delimited #Save the dataset in a tab delimited format text file export delimited using "OECD_data_v02.txt", delimiter(tab) replace
58 III. Data Processing 58 2)Save the dataset Save the modified dataset [2] File > Save as... #Save the dataset in a Stata data file(.dta) save "OECD_data.dta"
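When the .dta file already exists, save stops with an error unless replace is given. A minimal sketch of the save and reload cycle, using the file name from the slide:

```stata
* Overwrite OECD_data.dta if it already exists
save "OECD_data.dta", replace

* Reload the saved dataset in a later session, discarding data in memory
use "OECD_data.dta", clear
```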
59 IV. Regression Analysis 59
60 IV. Regression Analysis 60 1) Estimating correlations with multiple variables: Basics
Collect a large amount of data and estimate the influence of each factor.
Regression analysis: Performance = a * Factor 1 + b * Factor 2 + c
(Figure: three-dimensional plot with axes Factor 1, Factor 2, and Performance.) The green plane is the plane that lies closest to all of the data (dots).
(note) In general the green plane is not a triangle; in this example we restrict Factor 1 and Factor 2 (> 0) and Performance (< p).
61 IV. Regression Analysis 61 1) Estimating correlations with multiple variables: Basics Key terms
Dependent variable: the variable to be estimated. In many cases, a performance indicator.
Explanatory variables (independent variables): variables that are thought to affect (or to be strongly correlated with) the dependent variable.
Control variables: variables that are not explanatory variables of interest but are thought to affect (or to be strongly correlated with) the dependent variable. In many cases, the variables used in prior research.
62 IV. Regression Analysis 62 1) Estimating correlations with multiple variables: Basics What can be used as an explanatory variable?
i. Squared term
Estimate it together with the linear (first-order) term and compare the influence of both to find a quadratic effect.
Multicollinearity between the linear term (x) and the squared term (x^2) is often tolerated.
Interpretation (Coefficient of X / Coefficient of X^2: Interpretation)
1 Significantly (+) / Significantly (-): Inverse-U shaped
2 Significantly (-) / Significantly (+): U-shaped
3 Not significant / Significantly (+): The positive impact is non-linear
4 Significantly (+) / Not significant: A linear positive impact
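A squared term can be added in Stata as sketched below. The choice of pct_growthrate as the variable to square and the name pct_growthrate_sq are assumptions made for illustration only.

```stata
* Create the squared term (pct_growthrate_sq is a hypothetical name)
generate pct_growthrate_sq = pct_growthrate^2

* Estimate with both the linear and the squared term
regress valueadded_growthrate pct_growthrate pct_growthrate_sq laborforce_growthrate eu_dummy
```

The same model can be written without creating a new variable by using factor-variable notation: regress valueadded_growthrate c.pct_growthrate##c.pct_growthrate laborforce_growthrate eu_dummy.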
63 IV. Regression Analysis 63 1) Estimating correlations with multiple variables: Basics What can be used as an explanatory variable? (cont.)
ii. Interaction term (cross term)
Use it when the explanatory variable works differently depending on a condition (that is, to check a moderator effect).
Estimate it together with each of the original explanatory variables and compare the influence of all of them.
(Figure: Factor 1 affects Performance; the influence of Factor 1 depends on Factor 2.)
64 IV. Regression Analysis 64 1) Estimating correlations with multiple variables: Basics What can be used as an explanatory variable? (cont.)
ii. Interaction term (cont.) Notes:
Interaction terms often cause multicollinearity with the original explanatory variables: centering or standardization is needed.
Centering: original value - mean value
Standardization: (original value - mean) / standard deviation
If the two explanatory variables differ greatly in scale, the interaction term will have a biased influence: standardization or alignment of the number of digits is needed.
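Centering and building the interaction term can be sketched as follows. The pairing of pct_growthrate with laborforce_growthrate and the new variable names (pct_c, labor_c, pct_x_labor, pct_z) are hypothetical choices for illustration.

```stata
* Centering: subtract the mean from each original value
summarize pct_growthrate, meanonly
generate pct_c = pct_growthrate - r(mean)
summarize laborforce_growthrate, meanonly
generate labor_c = laborforce_growthrate - r(mean)

* Interaction term built from the centered variables
generate pct_x_labor = pct_c * labor_c

* Estimate with both centered variables and their interaction
regress valueadded_growthrate pct_c labor_c pct_x_labor

* Standardization instead of centering: (value - mean) / standard deviation
egen pct_z = std(pct_growthrate)
```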
65 IV. Regression Analysis 65 1) Estimating correlations with multiple variables: Basics What can be used as an explanatory variable? (cont.)
iii. Dummy variable
The variable takes 1 if a specific condition is fulfilled, and 0 otherwise.
Useful for controlling for differences in conditions or affiliations.
(Example) Previous race win dummy: takes 1 if the horse won its previous race.
(Source) JRA; Bolton, R. N., & Chapman, R. G. (1986). Searching for positive returns at the track: A multinomial logit model for handicapping horse races. Management Science, 32(8),
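A dummy variable can be generated from a condition as sketched below. The example rebuilds the labor force growth dummy described in the data item list; labor_up is a hypothetical name, and the dataset already contains laborforce_growth_dummy for comparison.

```stata
* 1 if the labor force grew, 0 otherwise; missing growth rates stay missing
* (labor_up is a hypothetical name used only in this sketch)
generate labor_up = (laborforce_growthrate > 0) if !missing(laborforce_growthrate)

* Cross-check against the dummy that ships with the dataset
tabulate labor_up laborforce_growth_dummy, missing
```

The if !missing() condition matters because Stata treats a missing value as larger than any number, so without it a missing growth rate would be coded as 1.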
66 IV. Regression Analysis 66 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) Conditions that OLS can be used
The number of samples does not have to be large if conditions i to v are met.
i. All explanatory variables are data derived from an experiment (that is, they are not random variables; a random variable is an uncertain value that takes a certain range).
ii. The expected value of the error is 0.
iii. No heteroscedasticity: the error term is not unevenly distributed (see next page).
iv. No correlation between the explanatory variables and the errors: no variable that explains the dependent variable is missing, and there is no variable that affects both the dependent variable and an explanatory variable. This is also expressed as "there is no endogeneity" or "the error terms are uncorrelated".
v. The error follows a normal distribution.
vi. There is no strong correlation between explanatory variables.
What the conditions ensure (annotations on the slide):
The coefficients estimated for each explanatory variable are mathematically optimal solutions.
Bias is not included in the coefficients estimated for each explanatory variable.
It becomes possible to judge appropriately whether the coefficients estimated for each explanatory variable are statistically correct.
67 IV. Regression Analysis Modified the material provided by Dr. Koichi Hasegawa 67 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) Conditions that OLS can be used
iii) No heteroscedasticity
Heteroscedasticity: under the influence of a certain factor, the errors are widely scattered in one area and narrowly scattered in another.
The result is not reliable in the widely scattered area (it is only a value taken in between).
(Figure: errors scattered around the estimated line.)
Check with the Breusch-Pagan test or the LM test.
If there is heteroscedasticity, solutions:
1. Add missing variables to the model
2. Take logarithms of the explanatory variables and the dependent variable
3. Use robust standard errors
4. Estimate by weighted least squares (details and practice omitted) or by the maximum likelihood method
68 IV. Regression Analysis 68 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) Conditions that OLS can be used
iv) No correlation between error and explanatory variable = no endogeneity (or no omitted variable bias)
(Figure: the amount of knowledge (number of papers read) and research time (number of hours spent) are correlated and lead to highly rated research papers (evaluation from instructors, awards, number of citations). Luck and students' smartness cannot be measured, so they appear in the error term.)
Example: a situation in which the seminar instructor's influence works on both the number of accessible articles and the evaluation.
The pure effect of the amount of knowledge cannot be estimated as long as the person's ability cannot be measured.
This must be considered before the analysis. The Durbin-Wu-Hausman test detects endogeneity.
If there is endogeneity, solutions:
Fixed effect model estimation on panel data
Adding control variables
Adopting the method of instrumental variables (IV)
69 IV. Regression Analysis 69 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) Conditions that OLS can be used
iv) No correlation between error and explanatory variable = no endogeneity (or no omitted variable bias)
Phenomena observed when omitted variable bias exists:
R2 is low (the model's explanatory power is weak).
The model lacks explanatory or control variables that previous studies using the same dependent variable have confirmed to have a significant influence. (Such a variable may not be important in the causal model, but it still affects the dependent variable.)
Solution: check the previous research carefully!
70 IV. Regression Analysis 70 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) Conditions that OLS can be used
iv) No correlation between error and explanatory variable = no endogeneity (simultaneity bias or reverse causality)
(Figure: the amount of knowledge (number of papers read) and research time (number of hours spent, being devoted to research) are correlated and lead to highly rated research papers (evaluation from instructors, awards, number of citations). Already published highly rated research papers feed back into research time.)
Example: a situation where you can concentrate on research because you are known for having written a good paper.
If this factor is not among the explanatory variables, its effect will appear in the error term.
Correct estimation is impossible when causality is circular. This must be considered before the analysis. Detectable by the Durbin-Wu-Hausman test.
Solutions:
Add the value of the explanatory variable from one period earlier (a lagged value)
Adopt the method of instrumental variables (IV)
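The OECD sample contains no obvious instrument, so the sketch below uses the auto dataset shipped with Stata purely as a stand-in to show the command pattern for the instrumental variables method and the endogeneity test. The choice of mpg as the endogenous variable and displacement as its instrument is an illustrative assumption, not a substantive claim. Note that sysuse auto, clear replaces the data in memory, so save your work first.

```stata
* Stand-in data: the auto dataset shipped with Stata (not the OECD sample)
sysuse auto, clear

* Two-stage least squares: mpg is treated as endogenous and
* displacement as its instrument (illustrative assumption only)
ivregress 2sls price weight (mpg = displacement)

* Durbin and Wu-Hausman tests; the null hypothesis is that mpg is exogenous
estat endogenous
```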
71 IV. Regression Analysis 71 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) Conditions that OLS can be used
v) Normal distribution of errors
However, if the sample is large enough (about a few hundred), no verification is required.
If the error is not normally distributed, the estimated line does not have the correct slope.
Confirm whether the residuals follow a normal distribution with the skewness/kurtosis test or the Shapiro-Wilk normality test.
(Figure: frequency of the error values in the actual sample.)
If it is not a normal distribution, solutions:
1. Transform the dependent variable and explanatory variables by taking logarithms or squares
2. Estimate by the maximum likelihood method, for example with a Poisson, probit, or tobit model
72 IV. Regression Analysis 72 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) Conditions that OLS can be used
vi) No strong correlation between explanatory variables: absence of multicollinearity
Multicollinearity: among highly correlated explanatory variables, it cannot be determined which variable has the influence, and the estimated coefficients become inaccurate.
Observed phenomena:
Although the coefficient of determination is high, the t value of each explanatory variable is low (not significant).
Abnormally high standard errors.
The sign (+ or -) of a coefficient does not agree with the sign estimated by a model that includes only one of the correlated explanatory variables.
Compute the VIF (variance inflation factor) and check whether any variable shows 4 or more (or 10 or more).
If there is multicollinearity, solutions:
1. Eliminate unnecessary explanatory variables
2. Convert explanatory variables to differences or ratios
3. Apply factor analysis or principal component analysis to the explanatory variables to create uncorrelated composite variables
73 IV. Regression Analysis 73 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) 7 steps in regression analysis
1 Design the causal relationship model and translate it into indicators
Make a model without endogeneity (omitted variable bias, simultaneity bias).
Samples should be large (at least explanatory variable 2 or )
2 Create descriptive statistics and a correlation matrix
Be sure to create a histogram to verify the distribution.
If the dependent variable does not follow a normal distribution, estimation methods other than OLS should also be considered.
If the explanatory variables differ in their number of digits, rescale them (multiply by 1,000, by 1/1,000, etc.).
For explanatory variables whose correlation is too strong, either drop one of them or check for multicollinearity later.
74 IV. Regression Analysis 74 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) 7 steps in regression analysis
3 Make two models: one with only the control variables (no explanatory variables) and one that also includes the explanatory variables
Compare the R2 of both models to see the contribution of the explanatory variables.
4 If the model contains strongly correlated variables, check whether there is multicollinearity
Check the VIF: is it 4 or more (or 10 or more)?
In the case of multicollinearity, drop one variable, convert a variable, aggregate the variables by principal component analysis, etc.
75 IV. Regression Analysis 75 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) 7 steps in regression analysis
5 If the estimation model includes strongly correlated variables, run multiple estimations in which the correlated variables are included or excluded
If the sign (positive or negative) of a variable's estimated coefficient changes depending on the model, the effect of multicollinearity is strong.
If a pair of explanatory variables is highly correlated in the correlation matrix but does not show multicollinearity, these estimations can show that there is no problem in the estimation.
Models when explanatory variables A and B are strongly correlated:
Model 1: A included, B not included
Model 2: A not included, B included
Model 3: A included, B included
76 IV. Regression Analysis 76 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) 7 steps in regression analysis
6 After performing the multiple regression analysis, obtain the errors and verify that they are not heteroscedastic and that they follow a normal distribution
Heteroscedasticity of the error is checked with the Breusch-Pagan test and the LM test.
If the error is heteroscedastic, use robust standard errors, etc.
Whether the error follows the normal distribution is checked with the skewness/kurtosis test and the Shapiro-Wilk normality test.
If the error does not follow the normal distribution, apply a logarithmic transformation to the variables, use the maximum likelihood method, etc. However, this is not necessary if the number of samples is large.
77 IV. Regression Analysis 77 1) Estimating correlations with multiple variables Regression by ordinary least-square method (OLS) 7 steps in regression analysis
7 Verify the robustness of the estimated results
Exclude data that may be outliers.
Estimate separately the data that may differ in nature.
Because OLS estimates an average relationship, observations that take outlying values of the dependent variable have a large influence.
A countermeasure is quantile regression (median, 25th percentile, and 75th percentile estimates).
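Steps 3 to 5 can be sketched in Stata by storing each model and printing them side by side. Treating pct_growthrate and laborforce_growthrate as the correlated pair A and B, and eu_dummy as the control variable, is an assumption made only for this illustration; m1, m2, and m3 are arbitrary names.

```stata
* Model 1: A included, B not included
regress valueadded_growthrate pct_growthrate eu_dummy
estimates store m1

* Model 2: A not included, B included
regress valueadded_growthrate laborforce_growthrate eu_dummy
estimates store m2

* Model 3: A and B both included, followed by the VIF check
regress valueadded_growthrate pct_growthrate laborforce_growthrate eu_dummy
estat vif
estimates store m3

* Coefficients, standard errors, N, R2, and adjusted R2 side by side
estimates table m1 m2 m3, b(%9.3f) se stats(N r2 r2_a)
```

Reading across the table shows whether a coefficient changes sign between models and how much R2 rises when a variable is added.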
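Quantile regression is available in Stata through qreg. A minimal sketch, reusing the variables of the OLS example in this material as an assumed model:

```stata
* Median regression (50th percentile)
qreg valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy, quantile(.5)

* 25th and 75th percentile regressions
qreg valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy, quantile(.25)
qreg valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy, quantile(.75)
```

With a sample as small as this one, the quantile estimates can be imprecise, so they are best read as a robustness check alongside the OLS results.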
78 IV. Regression Analysis 78 2)Exercise
Verify whether the following models are correct by using the OECD data.
(Diagram labels: Activate technology development = Increase ratio of PCT applications; Increase in income = Increase in technical trade balance; Being a European country = European dummy; Increase in added value of industry = Increase ratio of added value. The arrows carry the expected signs (+) (+) (+) and ( ) ( ).)
79 IV. Regression Analysis 79 3) Run OLS Run OLS
Statistics > Linear models and related > Linear regression
The dependent variable comes first; all the rest are explanatory variables.
#Regression analysis
regress valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy
80 IV. Regression Analysis 80 3) Run OLS Run OLS (cont.) Set dependent and explanatory variables (including control variable)
81 IV. Regression Analysis 81 3) Run OLS How to read the output results
. regress valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy
(Output: table of Source, SS, df, MS for Model, Residual, and Total; Number of obs, F(3, 37), Prob > F, R-squared, Adj R-squared, Root MSE; coefficient table with Coef., Std. Err., t, P>|t|, [95% Conf. Interval] for laborforce_growthrate, pct_growthrate, eu_dummy, _cons)
Number of obs: number of observations
F(3, 37), Prob > F: F statistic (whether there is a statistically significant difference between this model and a model that does not include any explanatory variables)
Coef.: estimated coefficient
Std. Err.: standard error
P>|t|: significance probability
[95% Conf. Interval]: confidence interval (the true coefficient may lie within this range)
82 IV. Regression Analysis 82 3) Run OLS Check for multicollinearity After the regression analysis runs: Statistics > Linear models and related > Regression diagnostics > Specification tests, etc. #Compute VIF estat vif
83 IV. Regression Analysis
3) Run OLS
Check for multicollinearity (cont.)
Select "Variance inflation factors".
84 IV. Regression Analysis
3) Run OLS
Check for multicollinearity (cont.): results
. estat vif
[Output table with columns Variable, VIF, 1/VIF for eu_dummy, laborforce~e, and pct_growth~e; only Mean VIF = 1.27 was preserved in this transcription.]
Rule of thumb: a VIF of 4 or more suggests multicollinearity (some use a threshold of 10 or more).
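To show where the VIF comes from, the sketch below computes it by hand for one explanatory variable as 1/(1 - R-squared), where R-squared is taken from a regression of that variable on the other explanatory variables. Stata's bundled auto dataset is used as a stand-in (an assumption), and the scalar name vif_weight is hypothetical.

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption, not the seminar's OECD data)
sysuse auto, clear

* Main regression followed by the built-in VIF table
regress price weight mpg foreign
estat vif

* VIF by hand for weight: regress it on the other explanatory variables
regress weight mpg foreign
scalar vif_weight = 1 / (1 - e(r2))
display "VIF for weight = " scalar(vif_weight)
```

The hand-computed value should agree with the weight row of the estat vif table; a large value means that weight is well explained by the other explanatory variables.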
85 IV. Regression Analysis
3) Run OLS
Check for heteroscedasticity of the error variance
After running the regression: Statistics > Linear models and related > Regression diagnostics > Specification tests, etc.
* Heteroscedasticity test
estat hettest
86 IV. Regression Analysis
3) Run OLS
Check for heteroscedasticity of the error variance (cont.)
Test for heteroscedasticity
87 IV. Regression Analysis
3) Run OLS
Check for heteroscedasticity of the error variance: results
The null hypothesis is "the variance of the errors is uniform".
. estat hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of valueadded_growthrate
chi2(1) = 1.15
Prob > chi2 = [value not preserved]
In this example, the p-value is about 0.28: if the error variance were uniform, a test statistic at least this large would occur about 28% of the time (not very rare). The null hypothesis is therefore not rejected, and we interpret the variance as uniform.
88 IV. Regression Analysis
3) Run OLS
If heteroscedasticity is found: robust standard errors
Statistics > Linear models and related > Linear regression <same as OLS>
[i] Click the SE/Robust tab
[ii] Select Robust
* Regression with robust standard errors
regress valueadded_growthrate pct_growthrate laborforce_growthrate eu_dummy, vce(robust)
89 IV. Regression Analysis
3) Run OLS
If heteroscedasticity is found: robust standard errors (results)
. regress valueadded_growthrate pct_growthrate laborforce_growthrate eu_dummy, vce(robust)
[Output table "Linear regression" with Number of obs = 41 and F(3, 37); the other numeric values were not preserved in this transcription.]
Robust standard errors are shown instead of the ordinary standard errors (column "Robust Std. Err.").
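The heteroscedasticity check and the robust re-estimation can be run back to back. The sketch below does so on Stata's bundled auto dataset (a stand-in, not the seminar data) and illustrates that vce(robust) changes the standard errors but not the coefficients. The scalar names b_ols, se_ols, b_rob, and se_rob are hypothetical.

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption, not the seminar's OECD data)
sysuse auto, clear

* Ordinary OLS, keeping the coefficient and standard error of weight
regress price weight mpg foreign
scalar b_ols  = _b[weight]
scalar se_ols = _se[weight]

* Breusch-Pagan / Cook-Weisberg test; a small p-value suggests heteroscedasticity
estat hettest
display "Breusch-Pagan p-value = " r(p)

* Same model with robust standard errors
regress price weight mpg foreign, vce(robust)
scalar b_rob  = _b[weight]
scalar se_rob = _se[weight]

display "Coefficient: OLS = " scalar(b_ols) "  robust = " scalar(b_rob)
display "Std. Err.:   OLS = " scalar(se_ols) "  robust = " scalar(se_rob)
```

Because only the standard errors change, the t statistics, p-values, and confidence intervals are what may differ between the two tables.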
90 IV. Regression Analysis
3) Run OLS
Check the normality of the errors
First, save the residuals to a new variable.
* Save the residuals to a new variable
predict resd, residuals
91 IV. Regression Analysis
3) Run OLS
Check the normality of the errors (cont.)
Statistics > Summaries, tables, and tests > Distributional plots and tests > Skewness/Kurtosis tests for normality
* Skewness/kurtosis test for normality
sktest resd
92 IV. Regression Analysis
3) Run OLS
Check the normality of the errors (cont.)
Select the variable you just created (it stores the residuals).
93 IV. Regression Analysis
3) Run OLS
Check the normality of the errors (cont.): results
The null hypothesis is "the errors are normally distributed".
. sktest resd
[Output table "Skewness/Kurtosis tests for Normality" with columns Variable, Obs, Pr(Skewness), Pr(Kurtosis), adj chi2(2), Prob>chi2; the numeric values were not preserved in this transcription.]
[Figure: histogram (density) of the residuals.]
In this example, the p-value is about 0.65, so the null hypothesis of normality is not rejected and the errors are interpreted as normally distributed.
When the errors are not normally distributed, consider taking the logarithm of the dependent variable or estimating by the maximum likelihood method, etc.
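The residual check can be run end to end as in the following sketch. It assumes Stata's bundled auto dataset as a stand-in for the seminar data, and stores the residuals in double precision, which is a choice made here rather than something the slides specify.

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption, not the seminar's OECD data)
sysuse auto, clear
regress price weight mpg foreign

* Store the residuals (double precision keeps rounding error small)
predict double resd, residuals

* OLS residuals average to zero when the model includes a constant
summarize resd

* Skewness/kurtosis test; the null hypothesis is that resd is normally distributed
sktest resd
```

A small joint p-value in the sktest output would argue against normality; in that case the remedies listed on the slide (for example a log-transformed dependent variable) are candidates to try.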
94 IV. Regression Analysis
3) Run OLS
Plot estimated results
* Run immediately after the regression: store the fitted values in a new variable
predict p_va_gr
* Plot fitted and actual values (in this example, X-axis: pct_growthrate)
twoway (scatter valueadded_growthrate pct_growthrate, mcolor(gray)) (scatter p_va_gr pct_growthrate, mcolor(red))
The fitted values are red and the actual data are gray.
[Figure: scatter plot of ValueAdded_GrowthRate and Fitted values against PCT_GrowthRate.]
95 IV. Regression Analysis
3) Run OLS
Robustness check (example): drop the top and bottom of the data
* Compute percentiles and flag the observations within a given percentile range
summarize valueadded_growthrate, detail
gen isinuse = inrange(valueadded_growthrate, r(p5), r(p95))
In this example, we create a new variable isinuse, which takes 1 if the value-added growth rate of the observation lies between the 5th and 95th percentiles. To change an existing variable, use replace.
* Change the percentile range (summarize, detail returns only p1, p5, p10, p25, p50, p75, p90, p95, and p99, so use _pctile for other percentiles)
_pctile valueadded_growthrate, percentiles(3 97)
replace isinuse = inrange(valueadded_growthrate, r(r1), r(r2))
* Regression with the selected data
regress valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy if isinuse == 1
The if qualifier specifies the condition that the data must satisfy to be used. Write the equals sign twice (==) to test for equality.
96 V. Reporting of Regression Results
97 V. Reporting of Regression Results
1) Reporting of Regression Results
Common practice
We usually report:
descriptive statistics
correlation matrix
regression results
The descriptive statistics and the correlation matrix can be integrated into one table.
98 V. Reporting of Regression Results
1) Reporting of Regression Results
Common practice
Example of descriptive statistics and a correlation matrix:
Keller, R. T. (2001). Cross-functional project groups in research and new product development: Diversity, communications, job stress, and outcomes. Academy of Management Journal, 44(3),
99 V. Reporting of Regression Results
1) Reporting of Regression Results
Common practice
Example of regression results:
Keller, R. T. (2001). Cross-functional project groups in research and new product development: Diversity, communications, job stress, and outcomes. Academy of Management Journal, 44(3),
100 V. Reporting of Regression Results
1) Reporting of Regression Results
Set up add-ins: outreg2, mkcorr
* Install outreg2 (you need to do this only once)
ssc install outreg2
* Install mkcorr (you need to do this only once)
ssc install mkcorr
101 V. Reporting of Regression Results
1) Reporting of Regression Results
Export descriptive statistics
You can export in MS Word format.
* Create a new file desc_stat.doc and export descriptive statistics
outreg2 using desc_stat.doc, replace sum(log) keep(valueadded_growthrate pct_growthrate laborforce_growthrate eu_dummy)
Select the variables to export in keep().
The file (desc_stat.doc) will be saved in the folder indicated in the status bar.
[Screenshot: exported results.]
102 V. Reporting of Regression Results
1) Reporting of Regression Results
Export correlation matrix
* Export the correlation matrix to a text file
mkcorr valueadded_growthrate pct_growthrate laborforce_growthrate eu_dummy, log(corr_matrix.txt)
103 V. Reporting of Regression Results
1) Reporting of Regression Results
Export regression results
* Regression analysis
regress valueadded_growthrate laborforce_growthrate eu_dummy
* Create a new file regress_res.doc and export the results to it
outreg2 using regress_res.doc, replace ctitle(model 1)
* Another regression analysis
regress valueadded_growthrate pct_growthrate laborforce_growthrate eu_dummy
* Append the results to the file
outreg2 using regress_res.doc, append ctitle(model 2)
104 V. Reporting of Regression Results
1) Reporting of Regression Results
Export regression results: results
105 V. Reporting of Regression Results
2) Visualization of Regression Results
Plot estimated marginal effect
Graphs showing marginal effects with confidence intervals (lfitci fits a simple linear regression of the first variable on the second, so the other explanatory variables are not controlled for)
* Plot the marginal effect with confidence intervals
graph twoway lfitci valueadded_growthrate pct_growthrate
* Plot the marginal effect with confidence intervals and the original data
graph twoway (lfitci valueadded_growthrate pct_growthrate) (scatter valueadded_growthrate pct_growthrate)
106 V. Reporting of Regression Results
2) Visualization of Regression Results
Plot estimated marginal effect
[Figure: linear fit with confidence band; X-axis: PCT_GrowthRate; legend: 95% CI, Fitted values, ValueAdded_GrowthRate.]
107 V. Reporting of Regression Results
2) Visualization of Regression Results
Plot estimated results
The predictions are split by whether the country is European or not, and the other variable is held at its average value.
* Run immediately after the regression: store the predictions in a variable
adjust laborforce_growthrate, by(eu_dummy) gen(p2_va_gr)
Here, we use the mean value of laborforce_growthrate.
* Show the estimates
twoway (scatter p2_va_gr pct_growthrate if eu_dummy==1, mcolor(blue)) (scatter p2_va_gr pct_growthrate if eu_dummy==0, mcolor(red)), legend(order(1 "EU" 2 "Non-EU")) ytitle("Value Added Growth")
EU countries are shown in blue and non-EU countries in red.
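The idea of holding one explanatory variable at its mean can also be written out by hand, which may help on recent Stata versions where adjust has been superseded by margins. The sketch below is a manual version under stated assumptions: it uses Stata's bundled auto dataset as a stand-in, with mpg playing the role of laborforce_growthrate, weight the role of pct_growthrate, and foreign the role of eu_dummy. The names mean_mpg and p2_price are hypothetical.

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption, not the seminar's OECD data)
sysuse auto, clear
regress price weight mpg foreign

* Mean of the variable to hold fixed (mpg stands in for laborforce_growthrate)
summarize mpg, meanonly
scalar mean_mpg = r(mean)

* Prediction with mpg fixed at its mean; weight and foreign keep their observed values
generate double p2_price = _b[_cons] + _b[weight] * weight ///
    + _b[mpg] * scalar(mean_mpg) + _b[foreign] * foreign

* Average prediction by group
tabstat p2_price, by(foreign) statistics(mean)
```

The new variable can then be passed to a twoway scatter command in the same way as p2_va_gr on the slide, with one scatter layer per group.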
108 V. Reporting of Regression Results
2) Visualization of Regression Results
[Figure: predicted values plotted against PCT_GrowthRate; Y-axis title: Value Added Growth; legend: EU, Non-EU.]
The Y-axis title can be changed in ytitle(). The legend labels can be changed in legend(order( )).
109 Appendix
For further improvement
110 Appendix
Variations of regressions for causality analysis
Variations of estimation models corresponding to the characteristics of the dependent variable
Dependent variable = dummy variable
Example: surplus in the technology balance of payments
Logistic regression (logit model)
Probit model
Dependent variable has a cut-off point
Example: longitudinal performance of engineers (which decreases suddenly due to retirement, job rotation, and other life events)
Tobit model
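For orientation, these models correspond to the Stata commands logit, probit, and tobit. The sketch below runs each on Stata's bundled auto dataset, which is a stand-in only; treating mpg as censored at a lower limit of 17 is an assumption made purely to illustrate the tobit syntax.

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption, not the seminar's OECD data)
sysuse auto, clear

* Dummy dependent variable (foreign is 0 or 1): logit and probit models
logit foreign weight mpg
probit foreign weight mpg

* Dependent variable with a cut-off point: tobit with an assumed lower limit of 17
tobit mpg weight, ll(17)
```

The coefficients of these models are not on the same scale as OLS coefficients, so they are usually interpreted through their signs or through marginal effects.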
111 Appendix
Variations of regressions for causality analysis
Variations of estimation models (cont.)
Dependent variable = count (natural number)
Example: number of inventions in an organization (the number of inventors who generate n inventions is 1/n^2 of all inventors (Narin & Breitzman, 1995))
Poisson model
Negative binomial model
112 Appendix
Variations of estimation models to reveal causality
Preventing omitted variable bias
Panel data analysis: use time-series data and exclude the unobservable effects of individuals
Fixed effect model
Random effect model
Difference-in-differences
Regression discontinuity
113 Appendix
Variations of estimation models to reveal causality
Estimating quantities other than the mean
Quantile regression
More informationUnit 7 Comparisons and Relationships
Unit 7 Comparisons and Relationships Objectives: To understand the distinction between making a comparison and describing a relationship To select appropriate graphical displays for making comparisons
More informationSPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.
SPRING GROVE AREA SCHOOL DISTRICT PLANNED COURSE OVERVIEW Course Title: Basic Introductory Statistics Grade Level(s): 11-12 Units of Credit: 1 Classification: Elective Length of Course: 30 cycles Periods
More informationThe North Carolina Health Data Explorer
The North Carolina Health Data Explorer The Health Data Explorer provides access to health data for North Carolina counties in an interactive, user-friendly atlas of maps, tables, and charts. It allows
More informationQuantitative Methods in Computing Education Research (A brief overview tips and techniques)
Quantitative Methods in Computing Education Research (A brief overview tips and techniques) Dr Judy Sheard Senior Lecturer Co-Director, Computing Education Research Group Monash University judy.sheard@monash.edu
More informationStatistical reports Regression, 2010
Statistical reports Regression, 2010 Niels Richard Hansen June 10, 2010 This document gives some guidelines on how to write a report on a statistical analysis. The document is organized into sections that
More informationStudent name: SOCI 420 Advanced Methods of Social Research Fall 2017
SOCI 420 Advanced Methods of Social Research Fall 2017 EXAM 1 RUBRIC Instructor: Ernesto F. L. Amaral, Assistant Professor, Department of Sociology Date: October 12, 2017 (Thursday) Section 904: 2:20 3:35pm
More informationBinary Diagnostic Tests Paired Samples
Chapter 536 Binary Diagnostic Tests Paired Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary measures
More informationApplications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis
DSC 4/5 Multivariate Statistical Methods Applications DSC 4/5 Multivariate Statistical Methods Discriminant Analysis Identify the group to which an object or case (e.g. person, firm, product) belongs:
More informationStepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality
Week 9 Hour 3 Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Stat 302 Notes. Week 9, Hour 3, Page 1 / 39 Stepwise Now that we've introduced interactions,
More informationResults & Statistics: Description and Correlation. I. Scales of Measurement A Review
Results & Statistics: Description and Correlation The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize
More informationANOVA. Thomas Elliott. January 29, 2013
ANOVA Thomas Elliott January 29, 2013 ANOVA stands for analysis of variance and is one of the basic statistical tests we can use to find relationships between two or more variables. ANOVA compares the
More informationBiostatistics II
Biostatistics II 514-5509 Course Description: Modern multivariable statistical analysis based on the concept of generalized linear models. Includes linear, logistic, and Poisson regression, survival analysis,
More informationCRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys
Multiple Regression Analysis 1 CRITERIA FOR USE Multiple regression analysis is used to test the effects of n independent (predictor) variables on a single dependent (criterion) variable. Regression tests
More informationPropensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research
2012 CCPRC Meeting Methodology Presession Workshop October 23, 2012, 2:00-5:00 p.m. Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy
More information