Introduction of Empirical Analysis using Stata: For Beginners


1 WBS seminar ('17/12/23) 1 Introduction of Empirical Analysis using Stata: For Beginners
Lecturer: Tohru Yoshioka-Kobayashi, Project Research Associate, Department of Technology Management for Innovation, Graduate School of Engineering, the University of Tokyo (t-koba@tmi.t.u-tokyo.ac.jp)
Acknowledgements: Mr. Kisa Sugihara and Mr. Akihiro Kawamura made a great contribution to the English translation.
This lecture material may be reused under the Creative Commons Attribution (name display) terms. Please note that some parts do not treat the statistics with full rigor.

2 0.Introduction 2 Introduction of the lecturer
Researcher in MOT: '15 Ph.D. in Engineering from UTokyo. Studying organizational management in technology and design development.
Researcher in IP policy: '07 Master in Law from Osaka-U. Seeking policy implications in intellectual property law.
Career: Assistant in legal affairs at a university start-up (Signpost, Corp.); policy analyst in a private think tank (Mitsubishi Res. Inst.); Hitotsubashi Univ. & Univ. of Tokyo.

3 0.Introduction 3 Goal
We will learn the basic knowledge and skills needed to reveal (or prove) a causal relationship.
Even those who are not strong in mathematics will be able to run an analysis by themselves after the seminar.
The contents of the lecture are based on statistics, but no formulas are used.
We focus on general-purpose analytical methods.
We use Stata.

4 0.Introduction 4 Agenda I. Preparation for the Analysis: How to Load data II. Descriptive Statistics and Graphs III. Data Processing IV. Regression Analysis V. Reporting of Regression Results

5 0.Introduction 5 Empirical analysis procedures
Carried out in the head: 1 Setting research questions; 2 Literature review; 3 Causal model design
With data: 4 Search for statistics and other data sources; 5 Perform simple verification; 6 Collecting data; 7 Cleaning data (data cleansing)
Embodiment: 8 Creating a data set; 9 Analysis; 10 Discussion (interpretation)

6 I. Preparation for the Analysis: How to Load data 6

7 I. Preparation for the analysis 7 1)Characteristics of statistical analysis software
Stata: Features: High. User experience: Good. Price: High. Support: Official support + a couple of books. Characteristics: Strong in social science analysis.
SPSS: Features: High. User experience: Good. Price: High. Support: Official support + books. Characteristics: Somewhat strong in natural science analysis.
R: Features: Medium (High w/add-in). User experience: Bad. Price: Free. Support: A variety of information online + books. Characteristics: Strong in data processing.
GRETL: Features: Medium (High w/add-in). User experience: Good. Price: Free. Support: Information online. Characteristics: Strong in economics analysis.

8 I. Preparation for the analysis 8 2)Data to use
SampleData_OECD.txt: created from OECD, Main Science and Technology Indicators; tab-separated data.
It records the following values for 2008 and 2013, and their growth in 2013 (compared to 2008):
Workforce population (thousands)
PCT patent applications (number of patents) ... the number of patent applications intended to be filed in foreign countries
Industry value added (US $ million)
Technology trade receipts (US $ million)
Technology trade payments (US $ million)
Technology trade balance (US $ million) ... receipts minus payments

9 I. Preparation for the analysis 9 2)Data to use Data item
Variable name: Content
Country: Country
Region_Narrow: Region name
Region_Broad: Continent name
Laborforce_2008_thousands: Workforce population (thousands) (2008)
Laborforce_2013_thousands: Workforce population (thousands) (2013)
Laborforce_growthrate: Growth rate
pctpatentapplication_2008: Number of international patent applications (2008)
pctpatentapplication_2013: Number of international patent applications (2013)
Pct_growthrate: Growth rate
Valueadded_2008_m_usd: Industry value added (US $ million) (2008)
Valueadded_2013_m_usd: Industry value added (US $ million) (2013)
Valueadded_growthrate: Growth rate
ValueAdded_Growth_M_USD: Growth value
Techreceipts_2008_m_usd: Technology trade receipts (US $ million) (2008)
Techreceipts_2013_m_usd: Technology trade receipts (US $ million) (2013)
Techreceipts_growthrate: Growth rate
Techpayments_2008_m_usd: Technology trade payments (US $ million) (2008)
Techpayments_2013_m_usd: Technology trade payments (US $ million) (2013)
Techpayments_growthrate: Growth rate
Techbalance_2008_m_usd: Technology trade balance of payments (US $ million) (2008)
Techbalance_2013_m_usd: Technology trade balance of payments (US $ million) (2013)
Techbalance_growth_m_usd: Growth value (US $ million)
Laborforce_growth_dummy: Dummy variable that takes 1 if the labor force population growth rate > 0
Techbalance_growth_dummy: Dummy variable that takes 1 if the technology trade balance growth > 0
Asiapacific_dummy: Dummy variable that takes 1 if the country is in Asia or the Pacific (including North America)
Europe_dummy: Dummy variable that takes 1 if the country is in Europe
Eu_dummy: Dummy variable that takes 1 if the country is an EU member

10 I. Preparation for the analysis 10 2)Data to use Questions to be solved
What factors increase industry value added?
What factors increase the technology balance of payments?
Important limitation: we examine these questions only within the available data.

11 I. Preparation for the analysis 11 3)Data for the experienced Overview
Files: IMPP_Eng_DATA.txt or IMPP_EnglishEdu_En.xlsx
Sources: Ministry of Education (MEXT) English Skill Survey in 2016 (surveys of public high schools and junior high schools); other government statistics
Observed year: FY2016

12 I. Preparation for the analysis 12 3)Data for the experienced Items
Classification / Item: Variable name
Basic information
Prefecture ID: ID
Prefecture: Pref_Str
High School English Teachers' English Skill (MEXT English Skill Survey in 2016)
Number of English teachers in public HS ...(a): HS_T_ALL
Those who took an English examination among (a) ...(b): HS_T_EXAM
Those graded Eiken Pre-1 or higher, or equivalent, among (b) ...(c): HS_T_E1
(c)/(a): HS_T_E1_R
High School Students' English Skill (MEXT English Skill Survey in 2016)
Seniors in public HS ...(d): HS_S_ALL
Those who took an English examination among (d) ...(e): HS_S_EXAM
Those graded Eiken Pre-2 or higher among (e) ...(f): HS_S_E2
Those regarded as equivalent to Eiken Pre-2 or higher, other than (f) ...(g): HS_S_OT
(f)+(g): HS_S_E2OT
((f)+(g))/(d): HS_S_E2OT_R

13 I. Preparation for the analysis 13 3)Data for the experienced Items
Classification / Item: Variable name
Junior High School English Teachers' English Skill (MEXT English Skill Survey in 2016)
Number of English teachers in public JHS ...(h): JH_T_ALL
Those who took an English examination among (h) ...(i): JH_T_EXAM
Those graded Eiken Pre-1 or higher, or equivalent, among (i) ...(j): JH_T_E1
(j)/(h): JH_T_E1_R
Junior High School Students' English Skill (MEXT English Skill Survey in 2016)
Seniors in public JHS ...(k): JH_S_ALL
Those who took an English examination among (k) ...(l): JH_S_EXAM
Those graded Eiken Pre-2 or higher among (l) ...(m): JH_S_E2
Those regarded as equivalent to Eiken Pre-2 or higher, other than (m) ...(n): JH_S_OT
(m)+(n): JH_S_E2OT
((m)+(n))/(k): JH_S_E2OT_R

14 I. Preparation for the analysis 14 3)Data for the experienced Items
Classification / Item: Variable name
Num. of High Schools (MEXT Educational Institution Basic Survey)
Num. of high schools ...(o): HS_I_ALL
Num. of private high schools ...(p): HS_I_PRIV
Num. of public high schools ...(q): HS_I_PUBL
Num. of students who newly attend college, university, and junior college (MEXT Educational Institution Basic Survey)
Num. of students who newly attend colleges and universities (by HS location): HS_S_UNIV_ENT
Num. of students who newly attend junior colleges (by HS location): HS_S_JC_ENT
Num. of graduates from JHS in 2013 (by JHS location): JH_S_PREVALL
Percentage of students who go on to colleges, universities, and junior colleges: HS_S_UNJC_R
Percentage of students who go on to colleges and universities: HS_S_UNIV_R

15 I. Preparation for the analysis 15 3)Data for the experienced Questions
What factors influence the English skills of high school students?

16 I. Preparation for the analysis 16 4)Load data
Statistics software expects a fixed data format. The data must be structured as follows:
Each observation goes in the vertical direction (one row per observation).
The variables (indicators) of each observation go in the horizontal direction (one column per variable).
The top line should contain the variable names.
Do not put a line break in a variable name.
[Example table: the columns No, Name, Gender, Age, and Height are the variables; each row (for example No. 1, M.Y.) is one observation]

17 I. Preparation for the analysis 17 4)Load data Variable name guidelines
How to name variables:
Using only English letters and _ (underscore) is safe. Avoid other symbols and Japanese characters.
Do not include blanks.
It is better not to begin a name with a number.
Note) The data itself may contain Japanese characters and symbols.

18 I. Preparation for the analysis 18 4)Load data File format
It is best to read the Excel file directly. Stata can do this (though old versions cannot).
If that is not possible, tab-delimited text is better than CSV.
CSV data separate variables by "," (comma). In numeric data, Excel and other database software may add "," as a thousands separator. To keep such values from being split into separate variables, the software wraps them in double quotation marks, like "333,231,298", when the file is saved.
When loading such a file, R and Stata may treat the numeric variable as a string.
If the file is separated by tabs, you can prevent this.
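The menu steps on the following slides can also be done from the command line. The lines below are a minimal sketch, assuming SampleData_OECD.txt sits in the current working directory and that your Stata version provides import delimited (older versions use insheet instead). The options shown are one reasonable choice, not the lecturer's own settings.

```stata
* Sketch: load the tab-delimited sample file (assumed to be in the working directory)
* varnames(1): the first line holds the variable names
* case(lower): variable names are converted to lowercase
import delimited using "SampleData_OECD.txt", delimiters("\t") varnames(1) case(lower) clear

* Show the variable names and storage types of the loaded data
describe

* Alternative for an Excel file (commented out; it would replace the data in memory)
* import excel using "IMPP_EnglishEdu_En.xlsx", firstrow clear
```

Because of case(lower), a column such as Valueadded_growthrate is referred to as valueadded_growthrate in later commands.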

19 I. Preparation for the analysis 19 4)Load data
File menu > Import > choose "Text data created by a spreadsheet"

20 I. Preparation for the analysis 20 4)Load data
[i] Confirm in advance that tab-delimited data is checked
[ii] Click on Browse

21 I. Preparation for the analysis 21 4)Load data
In the file open window, change the file type to Text Files (*.txt) and then open the data file.

22 I. Preparation for the analysis 22 4)Load data
If the variables appear in the top-right pane, the import succeeded.

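23 I. Preparation for the analysis 23 4)Load data
Note the type of each variable in the imported data:
int, long, double: numbers (can be used in calculations)
byte: small integers such as 0/1 (can be used in calculations)
str: string (cannot be used in calculations)
A variable is read as a string when there is garbage in the data or when the tab-delimited text was output with thousands separators or similar symbols in the numbers. You can see the type in the variable properties.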

24 I. Preparation for the analysis 24 4)Load data The type of the variable can be confirmed from [Variable Manager] Here

25 I. Preparation for the analysis 25 4)Load data How to correct the type
A variable that was incorrectly read as a string can be fixed in Data menu > Create or change data > Other variable-transformation commands > Convert variables from string to numeric.
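The same conversion is available as the destring command. The sketch below assumes, purely for illustration, that valueadded_2013_m_usd was read as a string because its values contained commas; if the variable is already numeric, destring only prints a note and changes nothing.

```stata
* Sketch: find string variables and convert one of them to numeric
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* List the variables that are stored as strings
ds, has(type string)

* Convert in place, dropping commas used as thousands separators
* (the choice of variable is a hypothetical example)
destring valueadded_2013_m_usd, replace ignore(",")
```

Genuine text columns such as country should stay as strings.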

26 II. Descriptive Statistics and Graphs 26

27 II. Descriptive Statistics and Graphs 27 1) Descriptive statistics View descriptive statistics Statistics Menu >Summaries, tables, and tests >Summary and descriptive statistics >Summary Statistics

28 II. Descriptive Statistics and Graphs 28 1) Descriptive statistics View descriptive statistics
[i] In Variables, click and choose the variables you want to summarize
[ii] Click OK

29 II. Descriptive Statistics and Graphs 29 1) Descriptive statistics View descriptive statistics: Results
. summarize laborforce_growthrate pct_growthrate valueadded_growthrate techbalance_growth_m_usd
[Output table: columns Variable, Obs, Mean, Std. Dev. (standard deviation), Min, Max; rows laborforce~e, pct_growth~e, valueadded~e, tech~h_m_usd]
Long variable names are abbreviated in the output.
* Command line for descriptive statistics
summarize laborforce_growthrate pct_growthrate

30 II. Descriptive Statistics and Graphs 30 1) Descriptive statistics View descriptive statistics
On the by/if/in tab, the data can be narrowed down and summarized by group.
[i] Check the box here
[ii] Select the variable that defines the groups (for example Europe_dummy)

31 II. Descriptive Statistics and Graphs 31 1) Descriptive statistics View descriptive statistics (results by group)
-> europe_dummy = 0
[Output table: columns Variable, Obs, Mean, Std. Dev., Min, Max; rows laborforce~e, pct_growth~e, valueadded~e, tech~h_m_usd]
-> europe_dummy = 1
[Output table: same columns and rows]
* Descriptive statistics by groups
by europe_dummy, sort : summarize laborforce_growthrate pct_growthrate
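Two variations on group summaries may be useful. This is a sketch assuming the OECD sample file has been imported with lowercase variable names; tabstat and the detail option are standard Stata, but the selection of statistics is only an example.

```stata
* Sketch: compact group summaries and detailed statistics
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* bysort sorts and groups in one step (equivalent to "by ..., sort :")
bysort europe_dummy: summarize laborforce_growthrate pct_growthrate

* One compact table with chosen statistics per group
tabstat laborforce_growthrate pct_growthrate, by(europe_dummy) statistics(n mean sd min max)

* Percentiles, skewness, and kurtosis for a single variable
summarize valueadded_growthrate, detail
```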

32 II. Descriptive Statistics and Graphs 32 1) Descriptive statistics Correlations between variables
Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Correlations and covariances
* Correlations
correlate valueadded_growthrate techbalance_growth_m_usd

33 II. Descriptive Statistics and Graphs 33 1) Descriptive statistics Correlations between variables (cont.)

34 II. Descriptive Statistics and Graphs 34 1) Descriptive statistics Correlations between variables (cont.): Results
. correlate valueadded_growthrate techbalance_growth_m_usd techbalance_growth_dummy laborforce_growthrate pct_growthrate eu_dummy
(obs=29)
[Output: correlation matrix with rows and columns valueadded~e, tech~h_m_usd, techbalanc~y, laborforce~e, pct_growth~e, eu_dummy]
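The correlate command drops every observation that has a missing value in any listed variable, which is why the output above reports (obs=29). As a hedged alternative, pwcorr computes each correlation on all observations available for that pair and can print significance levels. The sketch assumes the sample file is imported as before.

```stata
* Sketch: pairwise correlations with observation counts and p-values
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* obs: number of observations used for each pair
* sig: p-value under each coefficient
* star(0.05): mark coefficients significant at the 5% level
pwcorr valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy, obs sig star(0.05)
```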

35 II. Descriptive Statistics and Graphs 35 2)Graphs Drawing a histogram Graphics Menu > Histogram

36 II. Descriptive Statistics and Graphs 36 2)Graphs Drawing a histogram (cont.) Select a variable

37 II. Descriptive Statistics and Graphs 37 2)Graphs Drawing a histogram (cont.): Results
[Figure: histogram; y-axis Density, x-axis ValueAdded_GrowthRate]
* Drawing a histogram
histogram valueadded_growthrate

38 II. Descriptive Statistics and Graphs 38 2)Graphs Drawing a histogram by groups
You can create a histogram for each group in the By tab.
[i] Click By
[ii] Select the variables to use for grouping
[Figure: histograms by group; y-axis Density, x-axis ValueAdded_GrowthRate; "Graphs by Europe_Dummy"]

39 II. Descriptive Statistics and Graphs 39 2)Graphs Drawing a histogram by groups Command lines
* Drawing a histogram by groups
histogram valueadded_growthrate, by(europe_dummy)
Increase/decrease bins
* Change the number of bins
histogram valueadded_growthrate, bin(12)
[Figure: histogram; y-axis Density, x-axis ValueAdded_GrowthRate]
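A few further histogram options can help when checking distributions. This is a sketch under the assumption that the sample file is imported as before; the output file name hist_valueadded.png is a hypothetical choice.

```stata
* Sketch: histogram variations
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* Show counts instead of densities and overlay a normal curve
histogram valueadded_growthrate, bin(12) frequency normal

* Save the current graph as an image file (hypothetical file name)
graph export "hist_valueadded.png", replace

* Percentages within each group
histogram valueadded_growthrate, by(europe_dummy) percent
```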

40 II. Descriptive Statistics and Graphs 40 2)Graphs Drawing a scatter chart Graphics Menu>Twoway graph (scatter, line, etc.)

41 II. Descriptive Statistics and Graphs 41 2)Graphs Drawing a scatter chart [i]click Create

42 II. Descriptive Statistics and Graphs 42 2)Graphs Drawing a scatter chart (cont.)
[i] Select Scatter in the basic plots
[ii] Select the variable for each axis
[iii] Press Accept to return to the previous screen, then press OK
* Drawing a scatter chart
twoway (scatter valueadded_growthrate pct_growthrate)

43 II. Descriptive Statistics and Graphs 43 2)Graphs Drawing a scatter chart (cont.): Results
[Figure: scatter plot; axis labels PCT_GrowthRate and LaborForce_GrowthRate]
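Overlaying a fitted line makes the direction of a relationship easier to see before running a regression. The sketch below assumes the sample file is imported as before and that country is a string variable, as the data item list suggests.

```stata
* Sketch: scatter plot with a linear fit and country labels
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* The first variable is the y-axis, the second the x-axis
twoway (scatter valueadded_growthrate pct_growthrate, mlabel(country)) ///
       (lfit valueadded_growthrate pct_growthrate)
```

The /// continuation works in a do-file; when typing interactively, put the command on one line.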

44 II. Descriptive Statistics and Graphs 44 2)Graphs Drawing a scatter plot matrix Graphics > Scatterplot matrix

45 II. Descriptive Statistics and Graphs 45 2)Graphs Drawing a scatter plot matrix
Select variables
[Figure: scatter plot matrix of ValueAdded_GrowthRate, LaborForce_GrowthRate, PCT_GrowthRate, and EU_Dummy]
* Drawing a scatter plot matrix
graph matrix valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy

46 II. Descriptive Statistics and Graphs 46 2)Graphs Drawing a box plot Graphics > Box plot

47 II. Descriptive Statistics and Graphs 47 2)Graphs Drawing a box plot
[Figure: box plot of PCT_GrowthRate]
* Drawing a box plot
graph box pct_growthrate

48 II. Descriptive Statistics and Graphs 48 2)Graphs Drawing a box plot by groups
[i] Click the Categories tab
[ii] Check Group 1
[iii] Select a variable for grouping
* Drawing a box plot by groups
graph box pct_growthrate, over(region_broad)

49 II. Descriptive Statistics and Graphs 49 2)Graphs Drawing a box plot by groups: Results
[Figure: box plots of PCT_GrowthRate for the groups Asia-Pacific, Europe, and Other]

50 II. Descriptive Statistics and Graphs 50 3)Exercise
Our dataset (SampleData_OECD) includes one variable that contains errors.
Hint: They are obvious errors.
Hint: The errors are in one specific variable among the labor force, PCT, and value added related variables.
Find the variable by using summary statistics, histograms, and scatter plots.

51 II. Descriptive Statistics and Graphs 51 3)Exercise Answer
ValueAdded_Growth_M_USD
It was calculated as the value in 2008 minus the value in 2013. Thus, there are too many negative growth values!
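One way to confirm the answer is to recompute the growth value and compare it with the stored one. This is a sketch assuming the three value added variables were imported as numeric; va_growth_check is a hypothetical name introduced here.

```stata
* Sketch: check ValueAdded_Growth_M_USD against a recomputed value
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* Growth value computed in the intended direction (2013 minus 2008)
generate va_growth_check = valueadded_2013_m_usd - valueadded_2008_m_usd

* How many stored growth values are negative?
count if valueadded_growth_m_usd < 0

* Look at the first rows side by side, then compare the two variables overall
list country valueadded_growth_m_usd va_growth_check in 1/5
compare valueadded_growth_m_usd va_growth_check
```

If the stored variable has the opposite sign of the recomputed one, the subtraction was done in the wrong direction.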

52 III. Data Processing 52

53 III. Data Processing 53 1)Create a new variable How to compute a new variable Data Menu>Create or change data>create new variable

54 III. Data Processing 54 1)Create a new variable How to compute a new variable [i]fill the name of the new variable [ii]click Create

55 III. Data Processing 55 1)Create a new variable How to compute a new variable (cont.)
Expression: log( techbalance_growth_m_usd )
[i] The mathematical function can be chosen from Functions > Mathematical
[ii] You can choose a variable from Variables
* Create a new variable (newvar is the name entered in the previous step)
generate newvar = log( techbalance_growth_m_usd )
Note that the logarithm of zero or a negative value is stored as missing.
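The generate command covers most of the transformations used later in the regression section. The lines below are a sketch assuming the sample file is imported as before; the new variable names (ln_valueadded_2013, pct_growthrate_sq, pct_up_dummy) are hypothetical.

```stata
* Sketch: typical derived variables
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* Natural logarithm (non-positive values become missing)
generate ln_valueadded_2013 = ln(valueadded_2013_m_usd)

* Squared term
generate pct_growthrate_sq = pct_growthrate^2

* Dummy variable: 1 if the growth rate is positive, 0 otherwise,
* left missing where the growth rate itself is missing
generate pct_up_dummy = (pct_growthrate > 0) if !missing(pct_growthrate)

* Count the observations lost by the log transformation
count if missing(ln_valueadded_2013)
```

The if !missing() condition matters because Stata treats a missing value as larger than any number, so pct_growthrate > 0 would otherwise be true for missing values.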

56 III. Data Processing 56 2)Save the dataset Save the modified dataset [1]
File > Export > Text data (delimited, *.csv)

57 III. Data Processing 57 2)Save the dataset Save the modified dataset [1]
Input a file name
Check Tab-delimited
* Save the dataset in a tab-delimited text file
export delimited using "OECD_data_v02.txt", delimiter(tab) replace

58 III. Data Processing 58 2)Save the dataset Save the modified dataset [2]
File > Save as...
* Save the dataset in a Stata data file (.dta)
save "OECD_data.dta"

59 IV. Regression Analysis 59

60 IV. Regression Analysis 60 1) Estimating correlations with multiple variables: Basics
Collect a large amount of data and estimate the influence of each factor.
Regression analysis: Performance = a * Factor 1 + b * Factor 2 + c
[Figure: three-dimensional plot with axes Factor 1, Factor 2, and Performance. The green plane is the plane closest to all the data points (dots).]
(Note) In general the green plane is not a triangle; in this example Factor 1 and Factor 2 are restricted to values > 0 and Performance to values < p.

61 IV. Regression Analysis 61 1) Estimating correlations with multiple variables: Basics Key terms
Dependent variable: the variable to be estimated. In many cases, a performance indicator.
Explanatory variables (independent variables): variables that affect (or are thought to be strongly correlated with) the dependent variable.
Control variables: variables other than the explanatory variables of interest that affect (or are thought to be strongly correlated with) the dependent variable. In many cases, the variables used in prior research.

62 IV. Regression Analysis 62 1) Estimating correlations with multiple variables: Basics What can be used as an explanatory variable?
i. Squared term
Estimate it together with the original (first-order) term and look at the influence of both to find a quadratic effect.
Multicollinearity between the first-order term (x) and the squared term (x^2) is usually tolerated.
Interpretation (coefficient of X / coefficient of X^2 / interpretation):
1. Significantly (+) / Significantly (-) / Inverse-U shaped
2. Significantly (-) / Significantly (+) / U-shaped
3. Not significant / Significantly (+) / Positive impact is non-linear
4. Significantly (+) / Not significant / A linear positive impact
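In Stata, a squared term can be added with factor-variable notation instead of generating a new variable. The sketch below assumes the sample file is imported as before; choosing pct_growthrate for the squared term is only an illustration, not a model proposed in the lecture.

```stata
* Sketch: regression with a first-order and a squared term
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* c.x##c.x expands to x and x squared (c. marks a continuous variable)
regress valueadded_growthrate c.pct_growthrate##c.pct_growthrate laborforce_growthrate

* Turning point of the fitted curve: -b1 / (2 * b2)
display -_b[pct_growthrate] / (2 * _b[c.pct_growthrate#c.pct_growthrate])
```

The turning point is only meaningful when the squared term is significant and the point lies within the observed range of the variable.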

63 IV. Regression Analysis 63 1) Estimating correlations with multiple variables: Basics What can be used as an explanatory variable? (cont.)
ii. Interaction term (cross term)
Use it when the effect of an explanatory variable differs depending on a condition (to check a moderating effect).
Estimate it together with each of the original explanatory variables and look at the influence of all of them.
[Figure: Factor 1 affects Performance; the influence of Factor 1 depends on Factor 2]

64 IV. Regression Analysis 64 1) Estimating correlations with multiple variables: Basics What can be used as an explanatory variable? (cont.)
ii. Interaction term (cont.) Notes:
An interaction term often causes multicollinearity with the original explanatory variables: centering or standardization is needed.
Centering: original value - mean
Standardization: (original value - mean) / standard deviation
If the two explanatory variables are on very different scales, the interaction term is dominated by one of them: standardize them or align their number of digits.
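Centering and standardization can be done with summarize and egen. The following is a sketch assuming the sample file is imported as before; the pairing of pct_growthrate with laborforce_growthrate and the new variable names are hypothetical choices for illustration.

```stata
* Sketch: centering, standardization, and an interaction term
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* Centering: original value minus the mean
summarize pct_growthrate, meanonly
generate pct_c = pct_growthrate - r(mean)

* Standardization: (original value - mean) / standard deviation
egen pct_z = std(pct_growthrate)
egen labor_z = std(laborforce_growthrate)

* Interaction term built from the standardized variables
generate pct_x_labor = pct_z * labor_z

* Estimate with both original terms and the interaction term
regress valueadded_growthrate pct_z labor_z pct_x_labor
```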

65 IV. Regression Analysis 65 1) Estimating correlations with multiple variables: Basics What can be used as an explanatory variable? (cont.)
iii. Dummy variable
A variable that takes 1 if a specific condition is fulfilled, and 0 otherwise.
Useful to control for differences in conditions or affiliations.
(Example) Previous race win dummy: takes 1 if the horse won the previous race.
(Source) JRA; Bolton, R. N., & Chapman, R. G. (1986). Searching for positive returns at the track: A multinomial logit model for handicapping horse races. Management Science, 32(8),
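When a grouping is stored as text, Stata can build the dummy variables for you. This sketch assumes region_broad was imported as a string variable with values such as Asia-Pacific, Europe, and Other; region_id and the region_d prefix are hypothetical names.

```stata
* Sketch: dummy variables from a categorical string variable
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* Convert the string categories to a labeled numeric variable
encode region_broad, generate(region_id)

* i. includes one dummy per category, omitting a base category
regress valueadded_growthrate pct_growthrate i.region_id

* Alternatively, create explicit 0/1 variables region_d1, region_d2, ...
tabulate region_broad, generate(region_d)
```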

66 IV. Regression Analysis 66 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method
Conditions under which OLS can be used (the sample does not have to be large if conditions i to v are met):
i. All explanatory variables are data derived from an experiment, that is, fixed values rather than random variables (a random variable is an uncertain value that takes a certain range).
ii. The expected value of the error is 0.
iii. No heteroscedasticity: the variance of the error term is not uneven (see next page).
iv. No correlation between the explanatory variables and the error: no variable that explains the dependent variable is missing, and no variable affects both the dependent variable and an explanatory variable. This is also described as "no endogeneity" or "the error term is uncorrelated with the explanatory variables".
v. The error is normally distributed.
vi. There is no strong correlation between explanatory variables.
What the conditions provide: the coefficients estimated for each explanatory variable are mathematically optimal solutions; the coefficients contain no bias; and it becomes possible to judge appropriately whether each estimated coefficient is statistically correct.

67 IV. Regression Analysis 67 (Modified from material provided by Dr. Koichi Hasegawa) 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method Conditions under which OLS can be used
iii) No heteroscedasticity
Heteroscedasticity: under the influence of some factor, the errors are widely scattered in one region and narrowly scattered in another. The result is not reliable in the widely scattered region (the estimate is only a value somewhere in between).
[Figure: errors scattered around the estimated line]
Check with the Breusch-Pagan test or an LM test.
Solutions if heteroscedasticity is present:
1. Add missing variables to the model
2. Take logarithms of the explanatory and dependent variables
3. Use robust standard errors
4. Estimate by weighted least squares (details and practice omitted) or by maximum likelihood

68 IV. Regression Analysis 68 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method Conditions under which OLS can be used
iv) No correlation between error and explanatory variable = no endogeneity (no omitted variable bias)
[Figure: amount of knowledge (indicators: number of papers read; research time, number of hours spent) is correlated with highly rated research papers (indicators: evaluation from instructors, awards, number of citations). Luck and students' smartness cannot be measured and therefore appear in the error term.]
Example: the seminar instructor influences both the number of accessible articles and the evaluation.
The pure effect of the amount of knowledge cannot be estimated as long as the person's intelligence cannot be measured. This must be considered before the analysis.
The Durbin-Wu-Hausman test detects endogeneity.
Solutions if endogeneity is present:
Fixed-effects estimation on panel data
Adding control variables
Adopting the instrumental variables (IV) method

69 IV. Regression Analysis 69 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method Conditions under which OLS can be used
iv) No correlation between error and explanatory variable = no endogeneity (no omitted variable bias)
Phenomena observed when omitted variable bias exists:
R2 is low (the model's explanatory power is weak).
Explanatory or control variables that previous studies using the same dependent variable found to have a significant influence have not been added (such variables may not be important in the causal model, but they affect the dependent variable).
Solution: check the previous research carefully!

70 IV. Regression Analysis 70 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method Conditions under which OLS can be used
iv) No correlation between error and explanatory variable = no endogeneity (no simultaneity bias or reverse causality)
[Figure: amount of knowledge (indicators: number of papers read; research time, number of hours spent) is correlated with highly rated research papers (indicators: evaluation from instructors, awards, number of citations). Having already published highly rated research papers lets a researcher devote more time to research.]
Example: being known for writing good papers allows you to concentrate on research.
If this factor is not among the explanatory variables, its effect appears in the error term.
A correct estimate is impossible when the causality is circular. This must be considered before the analysis. It is detectable by the Durbin-Wu-Hausman test.
Solutions:
Use the one-period lagged value of the explanatory variable
Adopt the instrumental variables (IV) method

71 IV. Regression Analysis 71 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method Conditions under which OLS can be used
v) Normal distribution of errors
However, if the sample is large enough (about a few hundred), no verification is required.
If the error is not normally distributed, the estimated line does not have the correct slope.
Confirm whether the residuals are normally distributed with the skewness/kurtosis test or the Shapiro-Wilk normality test.
[Figure: frequency of the error values in the actual sample]
Solutions if the errors are not normally distributed:
1. Apply a logarithmic (log) or square transformation to the dependent and explanatory variables
2. Estimate by maximum likelihood, for example with a Poisson, probit, or Tobit model
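Both normality tests named above are available as Stata commands and are run on the residuals of a fitted model. This is a sketch assuming the sample file is imported as before and using the regression specification from the Run OLS slides; resid_va is a hypothetical variable name.

```stata
* Sketch: test whether the OLS residuals are normally distributed
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

regress valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy

* Store the residuals of the last regression
predict resid_va, residuals

* Skewness/kurtosis test (null hypothesis: normal distribution)
sktest resid_va

* Shapiro-Wilk test (null hypothesis: normal distribution)
swilk resid_va
```

A small p-value in either test suggests that the residuals are not normally distributed.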

72 IV. Regression Analysis 72 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method Conditions under which OLS can be used
vi) No strong correlation between explanatory variables: absence of multicollinearity
Multicollinearity: among highly correlated explanatory variables it cannot be determined which one has the influence, and the estimated coefficients become inaccurate.
Observed phenomena:
The coefficient of determination is high, but the t value of each explanatory variable is low (not significant).
Standard errors are abnormally high.
The sign (+ or -) of a coefficient does not match the sign estimated by a model that includes only one of the correlated explanatory variables.
Check: compute the VIF (variance inflation factor) and confirm whether any variable shows a value of 4 or more (or 10 or more).
Solutions if multicollinearity is present:
1. Eliminate unnecessary explanatory variables
2. Convert explanatory variables to differences or ratios
3. Apply factor analysis or principal component analysis to the explanatory variables to create uncorrelated composite variables

73 IV. Regression Analysis 73 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method 7 steps in regression analysis
1 Design the causal relationship model and translate it into indicators
Make a model without endogeneity (omitted variable bias, simultaneity bias).
Samples should be large (at least explanatory variable 2 or ).
2 Create descriptive statistics and a correlation matrix
Be sure to create histograms to verify the distributions.
If the dependent variable is not normally distributed, also consider estimators other than OLS.
If the explanatory variables differ greatly in their number of digits, rescale them, for example by multiplying by 1,000 or by 1/1,000.
For explanatory variables whose correlation is too strong, either drop one of them or check for multicollinearity later.

74 IV. Regression Analysis 74 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method 7 steps in regression analysis
3 Make two models: one with only the control variables (no explanatory variables) and one that also includes the explanatory variables
Compare the R2 of both models to see the contribution of the explanatory variables.
4 If the model contains strongly correlated variables, check for multicollinearity
Check the VIF: is it 4 or more (or 10 or more)?
In the case of multicollinearity, drop one variable, transform a variable, or aggregate the variables by principal component analysis, etc.
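Steps 3 and 4 can be sketched with estimates store and estat vif. The example assumes the sample file is imported as before; treating laborforce_growthrate and eu_dummy as control variables and pct_growthrate as the explanatory variable is an assumption made only for illustration.

```stata
* Sketch: compare a controls-only model with the full model
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* Full model: controls plus the explanatory variable
regress valueadded_growthrate laborforce_growthrate eu_dummy pct_growthrate
estimates store m_full

* Controls only, restricted to the same observations as the full model
regress valueadded_growthrate laborforce_growthrate eu_dummy if e(sample)
estimates store m_controls

* Side-by-side coefficients with N, R2, and adjusted R2
estimates table m_controls m_full, stats(N r2 r2_a) b(%9.3f) star

* Variance inflation factors for the full model
estimates restore m_full
estat vif
```

Restricting the second model with if e(sample) keeps the two R2 values comparable when some observations have missing values.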

75 IV. Regression Analysis 75 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method 7 steps in regression analysis
5 If the estimation model includes strongly correlated variables, run multiple estimations in which the correlated variables are included or excluded
If the sign (positive or negative) of a variable's estimated coefficient changes depending on the model, the effect of multicollinearity is strong.
If a pair of explanatory variables is highly correlated in the correlation matrix but does not cause multicollinearity, this comparison can show that there is no problem in the estimation.
Strongly correlated explanatory variables A and B:
Model 1: A included, B not included
Model 2: A not included, B included
Model 3: A included, B included

76 IV. Regression Analysis 76 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method 7 steps in regression analysis
6 After performing the multiple regression analysis, obtain the residuals and verify that their variance is not uneven and that they are normally distributed
Heteroscedasticity of the error is checked with the Breusch-Pagan test or an LM test.
If the error variance is uneven, use robust standard errors, etc.
Whether the error follows the normal distribution is checked with the skewness/kurtosis test or the Shapiro-Wilk normality test.
If the error does not follow the normal distribution, apply a logarithmic transformation to the variables, use the maximum likelihood method, etc. However, this is not necessary if the sample is large.

77 IV. Regression Analysis 77 1) Estimating correlations with multiple variables Regression by the ordinary least squares (OLS) method 7 steps in regression analysis
7 Verify the robustness of the estimated results
Exclude data that may be outliers.
Estimate separately the data that may differ in nature.
Because OLS estimates the average of the dependent variable for given values of the explanatory variables, observations with outlying values of the dependent variable have a large influence.
A countermeasure is quantile regression (estimates for the median, the 25th percentile, and the 75th percentile).
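Quantile regression is available in Stata as qreg. The sketch below assumes the sample file is imported as before and reuses the specification from the Run OLS slides; it is an illustration of the robustness check, not a result from the lecture.

```stata
* Sketch: quantile regressions as a robustness check against outliers
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

* 25th percentile
qreg valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy, quantile(0.25)

* 75th percentile
qreg valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy, quantile(0.75)

* Median regression (50th percentile)
qreg valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy, quantile(0.5)
```

If the median regression coefficients have the same signs and similar sizes as the OLS coefficients, outliers are less likely to drive the OLS result.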

78 IV. Regression Analysis 78 2)Exercise
Verify whether the following models are correct by using the OECD data.
[Figure: path diagrams in which each concept is paired with an indicator: activation of technology development (growth rate of PCT applications); increase in income (increase in technology trade balance); being a European country (European dummy); increase in added value of industry (growth rate of value added). The arrows carry expected signs, shown as (+) or ( ).]

79 IV. Regression Analysis 79 3) Run OLS Run OLS
Statistics > Linear models and related > Linear regression
The dependent variable comes first; all the rest are explanatory variables.
* Regression analysis
regress valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy

80 IV. Regression Analysis 80 3) Run OLS Run OLS (cont.) Set dependent and explanatory variables (including control variable)

81 IV. Regression Analysis 81 3) Run OLS How to read the output results
. regress valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy
[Output: table of Source, SS, df, MS (Model, Residual, Total) with Number of obs, F(3, 37), Prob > F, R-squared, Adj R-squared, and Root MSE, followed by the coefficient table with columns Coef., Std. Err., t, P>|t|, [95% Conf. Interval] and rows laborforce_growthrate, pct_growthrate, eu_dummy, _cons]
Number of obs: number of observations.
F(3, 37) and Prob > F: F statistic (whether there is a statistically significant difference between this model and a model that does not include any explanatory variables).
Coef.: estimated coefficient.
Std. Err.: standard error.
P>|t|: significance probability.
[95% Conf. Interval]: confidence interval (the true coefficient may lie within this range).
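The numbers in the output table are also stored by Stata and can be retrieved after the regression. This is a sketch assuming the sample file is imported as before; e(N), e(r2), _b[], and _se[] are standard stored results of regress.

```stata
* Sketch: retrieve the key results of the last regression
import delimited using "SampleData_OECD.txt", varnames(1) case(lower) clear

regress valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy

* Number of observations and R-squared
display "Number of obs = " e(N)
display "R-squared     = " e(r2)

* Estimated coefficient and standard error of one explanatory variable
display "Coef.     = " _b[pct_growthrate]
display "Std. Err. = " _se[pct_growthrate]

* Test whether the coefficient differs from zero
test pct_growthrate
```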

82 IV. Regression Analysis 82 3) Run OLS Check for multicollinearity
After the regression analysis runs: Statistics > Linear models and related > Regression diagnostics > Specification tests, etc.
* Compute VIF
estat vif

83 IV. Regression Analysis 83 3) Run OLS Check for multicollinearity (cont.) Select Variance inflation factors

84 IV. Regression Analysis 84 3) Run OLS Check for multicollinearity (cont.): Result
. estat vif
[Output table with columns Variable, VIF, 1/VIF and rows eu_dummy, laborforce~e, pct_growth~e; only the summary value survives in this transcription: Mean VIF = 1.27]
If the VIF is 4 or more, multicollinearity is suspected. (Some analysts use a threshold of 10 or more.)
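The VIF of an explanatory variable equals 1 / (1 - R-squared), where R-squared comes from regressing that variable on the other explanatory variables. A minimal sketch of this calculation, assuming the bundled auto dataset as a stand-in and a hypothetical scalar name vif_weight:

```stata
sysuse auto, clear
regress price weight mpg foreign
estat vif

* VIF for weight by hand: regress it on the other explanatory variables
regress weight mpg foreign
scalar vif_weight = 1 / (1 - e(r2))
display "VIF(weight) = " vif_weight
```

The value displayed at the end should match the weight row of the estat vif table.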

85 IV. Regression Analysis 85 3) Run OLS Check for heteroscedasticity of the error variance
After the regression analysis runs: Statistics > Linear models and related > Regression diagnostics > Specification tests, etc.
* Heteroscedasticity test
estat hettest

86 IV. Regression Analysis 86 3) Run OLS Check for heteroscedasticity of the error variance Test for heteroscedasticity

87 IV. Regression Analysis 87 3) Run OLS Check for heteroscedasticity of the error variance: Results
. estat hettest
The null hypothesis is "the variance of the errors is constant".
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity Ho: Constant variance Variables: fitted values of valueadded_growthrate chi2(1) = 1.15 Prob > chi2 =
In this example, the p-value is about 28%: if the variance were constant, a test statistic at least this large would occur about 28% of the time (not very rare). We therefore do not reject the null hypothesis and treat the variance as constant.
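The decision step can be written out as a short script. This is a sketch under two assumptions that are not in the slides: the bundled auto dataset stands in for the OECD data, and 5% is used as the significance level.

```stata
sysuse auto, clear
regress price weight mpg foreign
estat hettest

* r(p) holds the p-value of the test just run; 0.05 is an assumed threshold
if r(p) < 0.05 {
    display "Constant variance is rejected at the 5% level; consider vce(robust)"
}
else {
    display "Constant variance is not rejected at the 5% level"
}
```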

88 IV. Regression Analysis 88 3) Run OLS If heteroscedasticity is found: Robust standard error
Statistics > Linear models and related > Linear regression <Same as OLS> [i] Click the tab SE/Robust [ii] Select Robust
* Regression with robust standard errors
regress valueadded_growthrate pct_growthrate laborforce_growthrate eu_dummy, vce(robust)

89 IV. Regression Analysis 89 3) Run OLS If heteroscedasticity is found: Robust standard error: Results
. regress valueadded_growthrate pct_growthrate laborforce_growthrate eu_dummy, vce(robust)
[Stata output table; most numeric values are not preserved in this transcription. Header block: Linear regression, Number of obs = 41, F(3, 37), Prob > F, R-squared, Root MSE. Coefficient table columns: Coef., Robust Std. Err., t, P>|t|, [95% Conf. Interval]; rows: pct_growthrate, laborforce_growthrate, eu_dummy, _cons.]
Robust standard errors are shown instead of the usual standard errors.
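The robust option changes only the standard errors, not the coefficients. A small sketch that shows this side by side, assuming the bundled auto dataset as a stand-in and hypothetical scalar names:

```stata
sysuse auto, clear

* Usual standard errors
regress price weight mpg foreign
scalar b_ols  = _b[weight]
scalar se_ols = _se[weight]

* Robust standard errors
regress price weight mpg foreign, vce(robust)
scalar b_rob  = _b[weight]
scalar se_rob = _se[weight]

display "Coefficient: " b_ols " vs " b_rob
display "Std. error:  " se_ols " vs " se_rob
```

Because the t values and confidence intervals are built from the standard errors, they change as well.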

90 IV. Regression Analysis 90 3) Run OLS Check the normal distribution of errors
First, save the residuals (the estimated errors) to a new variable.
* Save the residuals to a new variable
predict resd, residuals

91 IV. Regression Analysis 91 3) Run OLS Check the normal distribution of errors (cont.)
Statistics > Summaries > Distributional plots and tests > Skewness/Kurtosis tests for normality
* Skewness/kurtosis test for normality
sktest resd

92 IV. Regression Analysis 92 3) Run OLS Check the normal distribution of errors (cont.) Select the variable you just created (it stores the residuals)

93 IV. Regression Analysis 93 3) Run OLS Check the normal distribution of errors (cont.)
The null hypothesis is "the errors are normally distributed".
. sktest resd
[Output: Skewness/Kurtosis tests for Normality, with columns Variable, Obs, Pr(Skewness), Pr(Kurtosis), adj chi2(2), Prob>chi2 and one row for resd; the numeric values are not preserved in this transcription.]
[Figure: distribution of the residuals (axes: Residuals, Density)]
In this example, the p-value is about 65%, so normality is not rejected and the residuals are treated as normally distributed. When they are not normally distributed, consider taking the logarithm of the dependent variable or estimating by the maximum likelihood method, etc.
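The whole residual check can be run in one pass. The sketch below assumes the bundled auto dataset as a stand-in; the added summarize step is an illustration, not part of the slides, and shows the two quantities that sktest examines.

```stata
sysuse auto, clear
regress price weight mpg foreign

* Residuals of the regression just estimated (resd is the slides' variable name)
predict resd, residuals

* Null hypothesis of sktest: resd is normally distributed
sktest resd

* A normal distribution implies skewness near 0 and kurtosis near 3
summarize resd, detail
display "skewness = " r(skewness) "   kurtosis = " r(kurtosis)
```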

94 IV. Regression Analysis 94 3) Run OLS Plot estimated results
* Run immediately after the regression: store the fitted values in a new variable
predict p_va_gr
In this example, the X-axis is pct_growthrate.
* Plot estimates and actual values
twoway (scatter valueadded_growthrate pct_growthrate, mcolor(gray)) (scatter p_va_gr pct_growthrate, mcolor(red))
The estimates are red and the actual data are gray.
[Figure: scatter plot; x-axis: PCT_GrowthRate; series: ValueAdded_GrowthRate, Fitted values]

95 IV. Regression Analysis 95 3) Run OLS Robustness check: (Example) Drop top/bottom data
* Compute percentiles and flag the data within a given percentile range
summarize valueadded_growthrate, detail
gen isinuse = inrange(valueadded_growthrate, r(p5), r(p95))
In this example, we create a new variable isinuse, which takes 1 if the value-added growth of the observation lies between the 5th and 95th percentiles. To change an existing variable, use replace.
* Change the percentiles (summarize, detail does not return the 3rd or 97th percentile, so compute them with _pctile)
_pctile valueadded_growthrate, percentiles(3 97)
replace isinuse = inrange(valueadded_growthrate, r(r1), r(r2))
* Regression with the selected data
regress valueadded_growthrate laborforce_growthrate pct_growthrate eu_dummy if isinuse == 1
if specifies which observations to use. An equality condition is written with a doubled equals sign (==).
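A robustness check is informative only when the trimmed estimate is compared with the full-sample one. The sketch below does that comparison, assuming the bundled auto dataset as a stand-in. It uses _pctile, which accepts arbitrary percentiles, whereas summarize with the detail option returns only a fixed set (1, 5, 10, 25, 50, 75, 90, 95, 99).

```stata
sysuse auto, clear

* 5th and 95th percentiles of the explained variable
_pctile price, percentiles(5 95)
generate isinuse = inrange(price, r(r1), r(r2))

* Full sample versus trimmed sample
regress price weight mpg foreign
scalar b_full = _b[weight]
regress price weight mpg foreign if isinuse == 1
scalar b_trim = _b[weight]

display "weight coefficient: full = " b_full "  trimmed = " b_trim
count if isinuse == 1
```

If the two coefficients differ greatly in size or sign, the full-sample result depends on a few extreme observations.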

96 V. Reporting of Regression Results 96

97 V. Reporting of Regression Results 97 1) Reporting of Regression Results Common practice We usually report descriptive statistics, a correlation matrix, and regression results. You can integrate them into one table.

98 V. Reporting of Regression Results 98 1) Reporting of Regression Results Common practice Example of descriptive statistics, and correlation matrix Keller, R. T. (2001). Cross-functional project groups in research and new product development: Diversity, communications, job stress, and outcomes. Academy of Management Journal, 44(3),

99 V. Reporting of Regression Results 99 1) Reporting of Regression Results Common practice Examples of regression results Keller, R. T. (2001). Cross-functional project groups in research and new product development: Diversity, communications, job stress, and outcomes. Academy of Management Journal, 44(3),

100 V. Reporting of Regression Results 100 1) Reporting of Regression Results Set up add-ins: outreg2, mkcorr
* Install outreg2 (you need to do this only once)
ssc install outreg2
* Install mkcorr (you need to do this only once)
ssc install mkcorr

101 V. Reporting of Regression Results 101 1) Reporting of Regression Results Export descriptive statistics
You can export in MS Word format.
* Create a new desc_stat.doc file and export descriptive statistics
outreg2 using desc_stat.doc, replace sum(log) keep(valueadded_growthrate pct_growthrate laborforce_growthrate eu_dummy)
Select the variables to export in keep. The file (desc_stat.doc) will be saved in the folder indicated in the status bar.

102 V. Reporting of Regression Results 102 1) Reporting of Regression Results Export correlation matrix
* Export the correlation matrix to a text file
mkcorr valueadded_growthrate pct_growthrate laborforce_growthrate eu_dummy, log(corr_matrix.txt)

103 V. Reporting of Regression Results 103 1) Reporting of Regression Results Export regression results
* Regression analysis
regress valueadded_growthrate laborforce_growthrate eu_dummy
* Create a new file regress_res.doc and export the results to it
outreg2 using regress_res.doc, replace ctitle(model 1)
* Another regression analysis
regress valueadded_growthrate pct_growthrate laborforce_growthrate eu_dummy
* Append the results to the file
outreg2 using regress_res.doc, append ctitle(model 2)

104 V. Reporting of Regression Results 104 1) Reporting of Regression Results Export regression results: Results

105 V. Reporting of Regression Results 105 2) Visualization of Regression Results Plot estimated marginal effect
Graphs showing marginal effects with confidence intervals
* Plot the marginal effect with confidence intervals
graph twoway lfitci valueadded_growthrate pct_growthrate
* Plot the marginal effect with confidence intervals and the original data
graph twoway (lfitci valueadded_growthrate pct_growthrate) (scatter valueadded_growthrate pct_growthrate)

106 V. Reporting of Regression Results 106 2) Visualization of Regression Results Plot estimated marginal effect
[Figure: linear fit with confidence band; x-axis: PCT_GrowthRate; y-axis: ValueAdded_GrowthRate; legend: 95% CI, Fitted values]

107 V. Reporting of Regression Results 107 2) Visualization of Regression Results Plot estimated results
The plot is split by whether the country is European or not; the other variables are held at their mean values.
* Run immediately after the regression: store the estimates in a variable
adjust laborforce_growthrate, by(eu_dummy) gen(p2_va_gr)
Here, we use the mean value of laborforce_growthrate.
* Show estimates
twoway (scatter p2_va_gr pct_growthrate if eu_dummy==1, mcolor(blue)) (scatter p2_va_gr pct_growthrate if eu_dummy==0, mcolor(red)), legend(order(1 "EU" 2 "Non-EU")) ytitle("Value Added Growth")
EU countries are blue and non-EU countries are red.
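In Stata 11 and later, the margins command covers the same ground as adjust. The sketch below is offered as an alternative rather than as the lecturer's method, and it assumes the bundled auto dataset as a stand-in, with foreign in the role of the group dummy.

```stata
sysuse auto, clear

* i.foreign marks the dummy as a factor variable so margins can vary it
regress price weight mpg i.foreign

* Predicted price for each group, other variables held at their means
margins foreign, atmeans
```

The output lists one predicted value per group together with its standard error and confidence interval.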

108 V. Reporting of Regression Results 108 2) Visualization of Regression Results
[Figure: estimates by group; x-axis: PCT_GrowthRate; y-axis title: Value Added Growth; legend: EU, Non-EU]
You can change the y-axis title in ytitle and the legend labels in legend(order( )).

109 109 Appendix For further improvement

110 Appendix 110 Variations of regressions for causality analysis Variations of estimation models corresponding to characteristics of the dependent variable
Dependent variable = dummy variable. Example: surplus in the technology balance of payments. Models: logistic regression (logit model), probit model.
Dependent variable has a cut-off point. Example: longitudinal performance of engineers (which suddenly decreases due to retirement, job rotation, and other life events). Model: Tobit model.
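Each of these models has a built-in Stata command with the same variable order as regress. A minimal sketch, assuming the bundled auto dataset as a stand-in; the censoring limit of 17 is purely illustrative.

```stata
sysuse auto, clear

* Dummy dependent variable: foreign is 0/1
logit foreign weight mpg
probit foreign weight mpg

* Dependent variable with a cut-off: treat mpg as censored from below at 17
tobit mpg weight, ll(17)
```

Coefficients from logit and probit are not directly comparable with OLS coefficients, since they act on a transformed scale.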

111 Appendix 111 Variations of regressions for causality analysis Variations of estimation models (cont.)
Dependent variable = count (natural number). Example: number of inventions in an organization (the number of inventors who generate n inventions is 1/n^2 of all inventors (Narin & Breitzman, 1995)). Models: Poisson model, negative binomial model.
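Both count models can be tried on simulated data. Everything in the sketch below is an assumption made for illustration: the sample size, the seed, the true coefficient of 0.3, and the gamma-distributed multiplier that makes the counts more dispersed than a pure Poisson process.

```stata
clear
set seed 12345
set obs 2000

* Synthetic overdispersed counts (all values are illustrative assumptions)
generate x = rnormal()
generate y = rpoisson(exp(0.5 + 0.3*x) * rgamma(2, 0.5))

* Poisson assumes the variance equals the mean
poisson y x

* Negative binomial adds a dispersion parameter (alpha)
nbreg y x
```

When the counts are overdispersed, as they are here by construction, the negative binomial model is usually the safer choice.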

112 Appendix 112 Variations of estimation models to reveal causality Omitted variable bias prevention
Panel data analysis: use time series data and exclude unobservable effects of individuals (fixed effect model, random effect model).
Difference-in-differences.
Regression discontinuity.
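The effect of an unobserved individual characteristic can be seen in a simulation. The sketch below builds a hypothetical panel in which the true coefficient on x is 2 and an unobserved effect u is correlated with x; all names and numbers are assumptions for illustration.

```stata
clear
set seed 2017
set obs 1000

* Synthetic panel: 200 individuals observed for 5 periods each
generate id = ceil(_n / 5)
bysort id: generate t = _n

* u is an individual effect that the analyst cannot observe
bysort id: generate u = rnormal() if _n == 1
bysort id: replace u = u[1]

generate x = rnormal() + u
generate y = 1 + 2*x + 3*u + rnormal()

* Pooled OLS ignores u and overstates the coefficient on x
regress y x
scalar b_ols = _b[x]

* The fixed effect model removes anything constant within id
xtset id t
xtreg y x, fe
scalar b_fe = _b[x]

display "OLS: " b_ols "   fixed effects: " b_fe
```

In this setup the fixed effect estimate should land near 2, while pooled OLS is pulled upward by the omitted u.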

113 Appendix 113 Variations of estimation models to reveal causality Estimation of quantities other than the mean: quantile regression
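Quantile regression is available through Stata's qreg command. A minimal sketch, assuming the bundled auto dataset as a stand-in for the seminar data:

```stata
sysuse auto, clear

* Median regression (the 0.5 quantile)
qreg price weight mpg foreign, quantile(0.5)

* 25th and 75th percentiles
qreg price weight mpg foreign, quantile(0.25)
qreg price weight mpg foreign, quantile(0.75)
```

Comparing the coefficients across quantiles shows whether an explanatory variable matters more at the low end or the high end of the explained variable.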


More information

10/4/2007 MATH 171 Name: Dr. Lunsford Test Points Possible

10/4/2007 MATH 171 Name: Dr. Lunsford Test Points Possible Pledge: 10/4/2007 MATH 171 Name: Dr. Lunsford Test 1 100 Points Possible I. Short Answer and Multiple Choice. (36 points total) 1. Circle all of the items below that are measures of center of a distribution:

More information

Background Information. Instructions. Problem Statement. HOMEWORK INSTRUCTIONS Homework #2 HIV Statistics Problem

Background Information. Instructions. Problem Statement. HOMEWORK INSTRUCTIONS Homework #2 HIV Statistics Problem Background Information HOMEWORK INSTRUCTIONS The scourge of HIV/AIDS has had an extraordinary impact on the entire world. The spread of the disease has been closely tracked since the discovery of the HIV

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Name Date Per Key Vocabulary: response variable explanatory variable independent variable dependent variable scatterplot positive association negative association linear correlation r-value regression

More information

Q: How do I get the protein concentration in mg/ml from the standard curve if the X-axis is in units of µg.

Q: How do I get the protein concentration in mg/ml from the standard curve if the X-axis is in units of µg. Photometry Frequently Asked Questions Q: How do I get the protein concentration in mg/ml from the standard curve if the X-axis is in units of µg. Protein standard curves are traditionally presented as

More information

1. Objective: analyzing CD4 counts data using GEE marginal model and random effects model. Demonstrate the analysis using SAS and STATA.

1. Objective: analyzing CD4 counts data using GEE marginal model and random effects model. Demonstrate the analysis using SAS and STATA. LDA lab Feb, 6 th, 2002 1 1. Objective: analyzing CD4 counts data using GEE marginal model and random effects model. Demonstrate the analysis using SAS and STATA. 2. Scientific question: estimate the average

More information

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0% Capstone Test (will consist of FOUR quizzes and the FINAL test grade will be an average of the four quizzes). Capstone #1: Review of Chapters 1-3 Capstone #2: Review of Chapter 4 Capstone #3: Review of

More information

ECON Introductory Econometrics Seminar 7

ECON Introductory Econometrics Seminar 7 ECON4150 - Introductory Econometrics Seminar 7 Stock and Watson EE11.2 April 28, 2015 Stock and Watson EE11.2 ECON4150 - Introductory Econometrics Seminar 7 April 28, 2015 1 / 25 E. 11.2 b clear set more

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis Basic Concept: Extend the simple regression model to include additional explanatory variables: Y = β 0 + β1x1 + β2x2 +... + βp-1xp + ε p = (number of independent variables

More information

Chapter 1: Explaining Behavior

Chapter 1: Explaining Behavior Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring

More information

Choosing a Significance Test. Student Resource Sheet

Choosing a Significance Test. Student Resource Sheet Choosing a Significance Test Student Resource Sheet Choosing Your Test Choosing an appropriate type of significance test is a very important consideration in analyzing data. If an inappropriate test is

More information

Clincial Biostatistics. Regression

Clincial Biostatistics. Regression Regression analyses Clincial Biostatistics Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a

More information

Name: emergency please discuss this with the exam proctor. 6. Vanderbilt s academic honor code applies.

Name: emergency please discuss this with the exam proctor. 6. Vanderbilt s academic honor code applies. Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam May 28 th, 2015: 9am to 1pm Instructions: 1. There are seven questions and 12 pages. 2. Read each question carefully. Answer

More information

Analysis and Interpretation of Data Part 1

Analysis and Interpretation of Data Part 1 Analysis and Interpretation of Data Part 1 DATA ANALYSIS: PRELIMINARY STEPS 1. Editing Field Edit Completeness Legibility Comprehensibility Consistency Uniformity Central Office Edit 2. Coding Specifying

More information

Econometric Game 2012: infants birthweight?

Econometric Game 2012: infants birthweight? Econometric Game 2012: How does maternal smoking during pregnancy affect infants birthweight? Case A April 18, 2012 1 Introduction Low birthweight is associated with adverse health related and economic

More information

One-Way Independent ANOVA

One-Way Independent ANOVA One-Way Independent ANOVA Analysis of Variance (ANOVA) is a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment.

More information

Testing Means. Related-Samples t Test With Confidence Intervals. 6. Compute a related-samples t test and interpret the results.

Testing Means. Related-Samples t Test With Confidence Intervals. 6. Compute a related-samples t test and interpret the results. 10 Learning Objectives Testing Means After reading this chapter, you should be able to: Related-Samples t Test With Confidence Intervals 1. Describe two types of research designs used when we select related

More information

4. STATA output of the analysis

4. STATA output of the analysis Biostatistics(1.55) 1. Objective: analyzing epileptic seizures data using GEE marginal model in STATA.. Scientific question: Determine whether the treatment reduces the rate of epileptic seizures. 3. Dataset:

More information

Unit 7 Comparisons and Relationships

Unit 7 Comparisons and Relationships Unit 7 Comparisons and Relationships Objectives: To understand the distinction between making a comparison and describing a relationship To select appropriate graphical displays for making comparisons

More information

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences. SPRING GROVE AREA SCHOOL DISTRICT PLANNED COURSE OVERVIEW Course Title: Basic Introductory Statistics Grade Level(s): 11-12 Units of Credit: 1 Classification: Elective Length of Course: 30 cycles Periods

More information

The North Carolina Health Data Explorer

The North Carolina Health Data Explorer The North Carolina Health Data Explorer The Health Data Explorer provides access to health data for North Carolina counties in an interactive, user-friendly atlas of maps, tables, and charts. It allows

More information

Quantitative Methods in Computing Education Research (A brief overview tips and techniques)

Quantitative Methods in Computing Education Research (A brief overview tips and techniques) Quantitative Methods in Computing Education Research (A brief overview tips and techniques) Dr Judy Sheard Senior Lecturer Co-Director, Computing Education Research Group Monash University judy.sheard@monash.edu

More information

Statistical reports Regression, 2010

Statistical reports Regression, 2010 Statistical reports Regression, 2010 Niels Richard Hansen June 10, 2010 This document gives some guidelines on how to write a report on a statistical analysis. The document is organized into sections that

More information

Student name: SOCI 420 Advanced Methods of Social Research Fall 2017

Student name: SOCI 420 Advanced Methods of Social Research Fall 2017 SOCI 420 Advanced Methods of Social Research Fall 2017 EXAM 1 RUBRIC Instructor: Ernesto F. L. Amaral, Assistant Professor, Department of Sociology Date: October 12, 2017 (Thursday) Section 904: 2:20 3:35pm

More information

Binary Diagnostic Tests Paired Samples

Binary Diagnostic Tests Paired Samples Chapter 536 Binary Diagnostic Tests Paired Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary measures

More information

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis DSC 4/5 Multivariate Statistical Methods Applications DSC 4/5 Multivariate Statistical Methods Discriminant Analysis Identify the group to which an object or case (e.g. person, firm, product) belongs:

More information

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Week 9 Hour 3 Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Stat 302 Notes. Week 9, Hour 3, Page 1 / 39 Stepwise Now that we've introduced interactions,

More information

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review Results & Statistics: Description and Correlation The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize

More information

ANOVA. Thomas Elliott. January 29, 2013

ANOVA. Thomas Elliott. January 29, 2013 ANOVA Thomas Elliott January 29, 2013 ANOVA stands for analysis of variance and is one of the basic statistical tests we can use to find relationships between two or more variables. ANOVA compares the

More information

Biostatistics II

Biostatistics II Biostatistics II 514-5509 Course Description: Modern multivariable statistical analysis based on the concept of generalized linear models. Includes linear, logistic, and Poisson regression, survival analysis,

More information

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys Multiple Regression Analysis 1 CRITERIA FOR USE Multiple regression analysis is used to test the effects of n independent (predictor) variables on a single dependent (criterion) variable. Regression tests

More information

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research 2012 CCPRC Meeting Methodology Presession Workshop October 23, 2012, 2:00-5:00 p.m. Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy

More information