Business Research Methods Introduction to Data Analysis
Data Analysis Process
Stages of Data Analysis: (1) Editing, (2) Coding, (3) Data Entry, (4) Error Checking and Verification, (5) Data Analysis
Introduction
Preparation of Data: editing, handling blank responses, coding, categorization and data entry. These activities ensure the accuracy of the data and its conversion from raw form to reduced data.
Exploring, Displaying and Examining Data: breaking down, inspecting and rearranging data to start the search for meaningful descriptions, patterns and relationships.
Editing: The process of checking and adjusting the data for omissions, legibility and consistency, and readying them for coding and storage.
Editing: Field editing and in-house editing.
Reasons for Editing. Criteria: data should be accurate, consistent, uniformly entered, complete, and arranged for simplification.
Editing Example: Birth year recorded by interviewer as 1873? 1973 is more likely.
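A consistency edit like the birth-year example can be partly automated with a plausibility check. The sketch below is illustrative only: the function name `flag_implausible_birth_years`, the survey year and the age bounds are assumptions, not part of the original slides.

```python
def flag_implausible_birth_years(birth_years, survey_year=2010, min_age=15, max_age=100):
    """Return (index, year) pairs whose implied age falls outside [min_age, max_age]."""
    flagged = []
    for i, year in enumerate(birth_years):
        age = survey_year - year
        if age < min_age or age > max_age:
            flagged.append((i, year))
    return flagged


# Hypothetical recorded values; 1873 implies an age of 137 and is flagged for review.
recorded = [1973, 1873, 1980, 1965]
suspect = flag_implausible_birth_years(recorded)
print(suspect)
```

Flagged values should go back to the editor for a decision; the check only points at likely recording errors and does not correct them.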
Coding: Involves assigning numbers or other symbols to answers so the responses can be grouped into a limited number of classes or categories. Example: M for Male and F for Female (alphanumeric), or 1 for Male and 2 for Female (numeric). Numeric versus alphanumeric codes. Open-ended questions. Check coding accuracy on a 10% sample of responses.
Coding Rules: Categories should be appropriate to the research problem, exhaustive, mutually exclusive, and derived from one classification principle.
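Two of the coding rules, exhaustiveness and mutual exclusivity, can be checked mechanically for numeric brackets. The following is a minimal sketch under assumed names (`AGE_CODES`, `code_age`, `check_scheme`) and with hypothetical age brackets; it is not a procedure from the slides.

```python
# Hypothetical age brackets as (code, low, high); high=None means open-ended.
AGE_CODES = [(1, 15, 25), (2, 26, 35), (3, 36, 45), (4, 46, None)]


def code_age(age):
    """Return the codes of every bracket that contains `age`."""
    matches = []
    for code, low, high in AGE_CODES:
        if age >= low and (high is None or age <= high):
            matches.append(code)
    return matches


def check_scheme(ages):
    """Exhaustive: every age matches at least one bracket.
    Mutually exclusive: no age matches more than one bracket."""
    exhaustive = all(len(code_age(a)) >= 1 for a in ages)
    exclusive = all(len(code_age(a)) <= 1 for a in ages)
    return exhaustive, exclusive


result = check_scheme(range(15, 91))
print(result)
```

An age of 14 would break exhaustiveness under these brackets, which is the kind of gap an "Others" category is meant to close.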
Appropriateness: Let's say your population is students at institutions of higher learning. What is your age group? 15-25 years; 26-35 years; 36-45 years; Above 45 years. Nearly all such respondents would fall into the first bracket, so these categories would not suit the population.
Exhaustiveness: What is your race? Malay; Chinese; Indian; Others.
Mutual Exclusivity: What is your occupation type? Professional; Managerial; Sales; Crafts; Operatives; Unemployed; Clerical; Housewife; Others.
Single Dimension: What is your occupation type? Professional; Crafts; Managerial; Sales; Clerical; Housewife; Operatives; Unemployed; Others.
Coding Open-ended Responses
Coding Open Ended Questions
Handling Blank Responses: How do we take care of missing responses? If more than 25% of the responses are missing, throw out the questionnaire. Other ways of handling: use the midpoint of the scale; ignore (system missing); use the mean of those responding; use the mean of the respondent; assign a random number.
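Two of these options, discarding questionnaires with more than 25% blanks and substituting the mean of those responding, might be sketched as follows. The function name `handle_blanks`, the use of `None` for a blank and the sample data are assumptions made for illustration.

```python
from statistics import mean


def handle_blanks(responses, max_missing_share=0.25):
    """responses: list of questionnaires, each a list of numeric answers or None.

    Drops questionnaires whose share of blanks exceeds `max_missing_share`,
    then fills each remaining blank with the mean of those who answered that item.
    """
    kept = [r for r in responses
            if sum(v is None for v in r) / len(r) <= max_missing_share]
    n_items = len(kept[0]) if kept else 0
    item_means = []
    for j in range(n_items):
        answered = [r[j] for r in kept if r[j] is not None]
        item_means.append(mean(answered) if answered else None)
    return [[item_means[j] if v is None else v for j, v in enumerate(r)]
            for r in kept]


# Hypothetical data: four questionnaires with four items each.
raw = [
    [4, 5, 3, 4],
    [2, None, 4, 4],     # 25% missing: kept, blank replaced by the item mean
    [None, None, 5, 1],  # 50% missing: dropped
    [3, 3, 2, 5],
]
cleaned = handle_blanks(raw)
print(cleaned)
```

Mean substitution reduces the variance of the item, so the choice among the listed options depends on how the variable will be analysed.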
Code Book: Identifies each variable; provides a variable's description; identifies each code name and position on the storage medium.
Sample SPSS Codebook
Data Entry: Keyboarding; database programs; digital/barcodes; optical recognition; voice recognition.
Data Transformation. Weights: assigning numbers to responses on a pre-determined rule. Respecification of the variable: transforming existing data to form new variables or items (Recode, Compute).
Scale Transformation. Reasons for transformation: to improve interpretation and compatibility with other data sets; to enhance symmetry and stabilize spread; to improve the linear relationship between the variables.
Standardized score: $z = \frac{X_i - \bar{X}}{s}$
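The standardized score can be computed directly from its definition. This is a minimal sketch; the function name `standardize` and the raw scores are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed for s.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert raw scores to z-scores: (X_i - mean) / sample standard deviation."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation, n - 1 in the denominator
    return [(x - x_bar) / s for x in values]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores
z = standardize(scores)
print([round(v, 3) for v in z])
```

After the transformation the scores have mean 0 and standard deviation 1, which is what makes variables measured on different scales comparable.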
Characteristics of Distributions
Summarizing Distributions with Shape
Parameter & Statistics
Variable                     Population   Sample
Mean                         µ            X̄
Proportion                   π            p
Variance                     σ²           s²
Standard deviation           σ            s
Size                         N            n
Standard error of the mean   σ_x̄          S_x̄
Statistical Testing Procedures. Stages: (1) State null hypothesis. (2) Choose statistical test. (3) Select level of significance. (4) Compute difference value. (5) Obtain critical test value. (6) Interpret the test.
Hypotheses
Null              Alternate
H0: µ = 50 mpg    HA: µ ≠ 50 mpg
H0: µ ≤ 50 mpg    HA: µ > 50 mpg
H0: µ ≥ 50 mpg    HA: µ < 50 mpg
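The testing stages can be walked through for the two-sided 50 mpg hypothesis with a one-sample t-test. The sketch below rests on assumptions: the mileage sample is invented, the function name `one_sample_t` is hypothetical, and the critical value 2.262 is the standard two-tailed t value for alpha = .05 with 9 degrees of freedom.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Compute t = (sample mean - mu0) / (s / sqrt(n)) and its degrees of freedom."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1


# Stage 1: null hypothesis that the population mean equals 50 mpg.
mu0 = 50
# Stages 2-3: one-sample t-test at alpha = .05, two-tailed.
critical_t = 2.262  # tabled value for df = 9
# Stage 4: compute the test statistic from hypothetical mileage readings.
mpg = [52, 49, 51, 53, 50, 54, 48, 52, 51, 50]
t_value, df = one_sample_t(mpg, mu0)
# Stages 5-6: compare with the critical value and interpret.
reject_null = abs(t_value) > critical_t
print(round(t_value, 3), df, reject_null)
```

With these invented readings the statistic stays below the critical value, so the null hypothesis is not rejected.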
Accept/Reject
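How to Select a Test
Measurement scale: Nominal
  One-sample case: Binomial; χ² one-sample test
  Two-sample tests, related samples: McNemar
  Two-sample tests, independent samples: Fisher exact test; χ² two-samples test
  k-sample tests, related samples: Cochran Q
  k-sample tests, independent samples: χ² for k samples
Measurement scale: Ordinal
  One-sample case: Kolmogorov-Smirnov one-sample test; Runs test
  Two-sample tests, related samples: Sign test; Wilcoxon matched-pairs test
  Two-sample tests, independent samples: Median test; Mann-Whitney U; Kolmogorov-Smirnov; Wald-Wolfowitz
  k-sample tests, related samples: Friedman two-way ANOVA
  k-sample tests, independent samples: Median extension; Kruskal-Wallis one-way ANOVA
Measurement scale: Interval and Ratio
  One-sample case: t-test; Z test
  Two-sample tests, related samples: t-test for paired samples
  Two-sample tests, independent samples: t-test; Z test
  k-sample tests, related samples: Repeated-measures ANOVA
  k-sample tests, independent samples: One-way ANOVA; n-way ANOVA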
Research Model: Attitude (5 items), Subjective norm (4 items) and Perceived Behavioral Control (4 items) -> Intention to Share Information (5 items) -> Actual Sharing of Information (3 items)
Reliability - Command
Reliability (Attitude)
Question: How reliable are our instruments?
Reliability Statistics: Cronbach's Alpha = .977, N of Items = 5
Item-Total Statistics
Item      Scale Mean if   Scale Variance if   Corrected Item-Total   Cronbach's Alpha
          Item Deleted    Item Deleted        Correlation            if Item Deleted
Att1      15.25           6.681               .973                   .965
Att2      15.26           6.560               .925                   .972
Att3      15.24           6.906               .929                   .972
Att4      15.21           6.825               .900                   .975
Att5      15.25           6.555               .935                   .970
Reliability (Subjective norm)
Reliability Statistics: Cronbach's Alpha = .912, N of Items = 4
Item-Total Statistics
Item      Scale Mean if   Scale Variance if   Corrected Item-Total   Cronbach's Alpha
          Item Deleted    Item Deleted        Correlation            if Item Deleted
Sn1       11.20           4.243               .761                   .900
Sn2       11.03           4.135               .855                   .868
Sn3       11.00           4.021               .856                   .867
Sn4       11.21           4.250               .736                   .909
Reliability (Perceived behavioral control)
Reliability Statistics: Cronbach's Alpha = .919, N of Items = 4
Item-Total Statistics
Item      Scale Mean if   Scale Variance if   Corrected Item-Total   Cronbach's Alpha
          Item Deleted    Item Deleted        Correlation            if Item Deleted
Pbc1      10.48           4.984               .814                   .895
Pbc2      10.45           4.793               .826                   .892
Pbc3      10.43           5.042               .809                   .897
Pbc4      10.40           5.246               .814                   .897
Reliability (Intention)
Reliability Statistics: Cronbach's Alpha = .966, N of Items = 5
Item-Total Statistics
Item      Scale Mean if   Scale Variance if   Corrected Item-Total   Cronbach's Alpha
          Item Deleted    Item Deleted        Correlation            if Item Deleted
Intent1   15.28           6.591               .951                   .951
Intent2   15.28           6.612               .888                   .961
Intent3   15.29           6.553               .901                   .959
Intent4   15.28           6.716               .877                   .962
Intent5   15.24           6.445               .904                   .958
Table in Report
Variable    N of Items   Items Deleted   Alpha
Attitude    5            -               0.977
SN          4            -               0.912
Pbcontrol   4            -               0.919
Intention   5            -               0.966
Actual      3            -               0.933
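For readers who want to see what lies behind the alpha values, the usual formula is alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score). The sketch below applies it to invented item scores; the function name `cronbach_alpha` and the data are assumptions and do not reproduce the study's data set.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k lists, one per item, each holding one score per respondent."""
    k = len(items)
    item_variance_sum = sum(variance(scores) for scores in items)
    # Total score per respondent: sum across the k items.
    totals = [sum(answers) for answers in zip(*items)]
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical 3-item scale answered by 5 respondents.
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 1],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Alpha rises as the items move together, because the variance of the total score then grows faster than the sum of the individual item variances.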
Example - Recoding: Perceived Enjoyment (each item rated on a scale of 1 to 7)
PE1   The actual process of using Instant Messenger is pleasant
PE2   I have fun using Instant Messenger
PE3   Using Instant Messenger bores me
PE4   Using Instant Messenger provides me with a lot of enjoyment
PE5   I enjoy using Instant Messenger
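PE3 is worded negatively, so it would normally be reverse-scored before the five items are combined. The sketch below assumes that this is the recoding the example intends; the function name `reverse_code` and the respondent's answers are hypothetical.

```python
def reverse_code(score, scale_min=1, scale_max=7):
    """Reverse a score on a bounded scale, e.g. 1 <-> 7, 2 <-> 6, 3 <-> 5 on a 1-7 scale."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - score


# Hypothetical respondent who enjoys Instant Messenger and so disagrees with PE3.
respondent = {"PE1": 6, "PE2": 7, "PE3": 2, "PE4": 6, "PE5": 7}
recoded = dict(respondent, PE3=reverse_code(respondent["PE3"]))
print(recoded)
```

After recoding, a high score on every item points in the same direction, so the items can be averaged into one Perceived Enjoyment score.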
Recoding
Data before Transformation
Data after Transformation
Frequencies - Command
Frequencies
Questions: 1. Is our sample representative? 2. Are there data entry errors?
Gender
                Frequency   Percent   Valid Percent   Cumulative Percent
Male            144         75.0      75.0            75.0
Female          48          25.0      25.0            100.0
Total           192         100.0     100.0
Current Position
                Frequency   Percent   Valid Percent   Cumulative Percent
Technician      34          17.7      17.7            17.7
Engineer        66          34.4      34.4            52.1
Sr Engineer     54          28.1      28.1            80.2
Manager         32          16.7      16.7            96.9
Above manager   6           3.1       3.1             100.0
Total           192         100.0     100.0
Table in Report
                  Frequency   Percentage
Gender
  Male            144         75.0
  Female          48          25.0
Position
  Technician      34          17.7
  Engineer        66          34.4
  Sr Engineer     54          28.1
  Manager         32          16.7
  Above manager   6           3.1
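The percent and cumulative percent columns follow directly from the counts. As a rough sketch, with `frequency_table` as an assumed helper name, the gender table can be rebuilt from a list of coded responses:

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    counts = Counter(values)  # keeps categories in order of first appearance
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category, n in counts.items():
        percent = 100 * n / total
        cumulative += percent
        rows.append((category, n, round(percent, 1), round(cumulative, 1)))
    return rows


# Responses rebuilt from the reported counts: 144 male and 48 female.
gender = ["Male"] * 144 + ["Female"] * 48
table = frequency_table(gender)
for row in table:
    print(row)
```

An unexpected category in such a table, for example a gender code that is not in the code book, is the usual sign of a data entry error.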
Descriptives - Command
Descriptives
Questions: 1. Is there variation in our data? 2. What is the level of the phenomenon we are measuring?
Descriptive Statistics
                                     N     Minimum   Maximum   Mean     Std. Deviation   Skewness   Std. Error   Kurtosis   Std. Error
Age                                  192   19        53        33.39    8.823            .667       .175         -.557      .349
Years working in the organization    192   1         18        5.36     4.435            1.448      .175         1.333      .349
Total years of working experience    192   1         28        9.04     7.276            1.051      .175         -.025      .349
Attitude                             192   2.00      5.00      3.8104   .64548           -.480      .175         .242       .349
subjective                           192   2.00      5.00      3.7031   .67034           -.101      .175         .755       .349
Pbcontrol                            192   2.00      5.00      3.4792   .73672           .015       .175         -.028      .349
Intention                            192   2.00      5.00      3.8188   .63877           -.528      .175         .687       .349
Actual                               192   2.33      5.00      4.0625   .58349           -.361      .175         -.328      .349
Valid N (listwise)                   192
Table in Report
                     Mean   Std. Deviation
Attitude             3.81   0.65
Subjective Norm      3.70   0.67
Behavioral Control   3.48   0.74
Intention            3.82   0.64
Actual               4.06   0.58
Chi Square Test - Command
Crosstabulation
Question: Is level of sharing dependent on gender?
Gender * Intention Level Crosstabulation
                                      Intention Level
Gender                                Low      High     Total
Male     Count                        110      34       144
         % within Gender              76.4%    23.6%    100.0%
         % within Intention Level     70.5%    94.4%    75.0%
         % of Total                   57.3%    17.7%    75.0%
Female   Count                        46       2        48
         % within Gender              95.8%    4.2%     100.0%
         % within Intention Level     29.5%    5.6%     25.0%
         % of Total                   24.0%    1.0%     25.0%
Total    Count                        156      36       192
         % within Gender              81.3%    18.8%    100.0%
         % within Intention Level     100.0%   100.0%   100.0%
         % of Total                   81.3%    18.8%    100.0%
Chi-Square Tests
                               Value      df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             8.934(b)   1    .003
Continuity Correction(a)       7.704      1    .006
Likelihood Ratio               11.274     1    .001
Fisher's Exact Test                                                    .002                   .001
Linear-by-Linear Association   8.888      1    .003
N of Valid Cases               192
a. Computed only for a 2x2 table.
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.00.
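The Pearson chi-square value in the output can be recomputed by hand from the four cell counts. The sketch below uses an assumed function name, `chi_square_2x2`; its p-value shortcut through the complementary error function holds only for one degree of freedom, that is, for a 2x2 table.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of observed counts.

    Returns (chi-square, p-value for df = 1, smallest expected count).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    # With df = 1, chi-square is a squared standard normal, so P = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, min_expected


# Rows: Male, Female. Columns: Low, High intention level.
chi2, p_value, min_expected = chi_square_2x2([[110, 34], [46, 2]])
print(round(chi2, 3), round(p_value, 3), min_expected)
```

The result agrees with the reported Pearson chi-square of 8.934 and the minimum expected count of 9.00, so the level of intention does depend on gender at the .01 level.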
T-test - Command
t-test (2 Independent)
Question: Does intention to share vary by gender?
Group Statistics (Intention)
Gender   N     Mean     Std. Deviation   Std. Error Mean
Male     144   3.9000   .60302           .05025
Female   48    3.5750   .68619           .09904
Independent Samples Test (Intention)
Levene's Test for Equality of Variances: F = 3.591, Sig. = .060
t-test for Equality of Means
                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       3.122   190      .002              .32500            .10410                  .11965         .53035
Equal variances not assumed   2.926   72.729   .005              .32500            .11106                  .10364         .54636
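Both rows of the independent samples test can be reproduced from the group statistics alone. The sketch below, with `t_from_summary` as an assumed function name, computes the pooled-variance t and the unequal-variance (Welch) t with its approximate degrees of freedom; p-values are left out because they need a t-distribution routine.

```python
import math


def t_from_summary(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistics from means, standard deviations and group sizes."""
    diff = m1 - m2
    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch standard error and Satterthwaite df.
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return {
        "t_pooled": diff / se_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_welch": diff / se_welch,
        "df_welch": df_welch,
    }


# Male and female group statistics for Intention, taken from the output.
result = t_from_summary(3.9000, 0.60302, 144, 3.5750, 0.68619, 48)
print({name: round(value, 3) for name, value in result.items()})
```

Because Levene's test is not significant (Sig. = .060), the equal-variances row is the one usually reported.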
Paired t-test - Command
t-test (2 Dependent)
Question: Are there differences between intention to share and actual sharing behavior?
Paired Samples Statistics
Pair 1      Mean     N     Std. Deviation   Std. Error Mean
Intention   3.8188   192   .63877           .04610
Actual      4.0625   192   .58349           .04211
Paired Samples Correlations
Pair 1 Intention & Actual: N = 192, Correlation = .817, Sig. = .000
Paired Samples Test (Pair 1: Intention - Actual)
Paired Differences: Mean = -.24375, Std. Deviation = .37326, Std. Error Mean = .02694, 95% Confidence Interval of the Difference = -.29688 to -.19062
t = -9.049, df = 191, Sig. (2-tailed) = .000
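The paired t statistic is the mean of the differences divided by its standard error. A small sketch, with `paired_t_from_summary` as an assumed name, recomputes it from the reported paired differences:

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Return (standard error of the mean difference, t statistic, degrees of freedom)."""
    standard_error = sd_diff / math.sqrt(n)
    return standard_error, mean_diff / standard_error, n - 1


# Mean and standard deviation of (Intention - Actual) across 192 respondents.
standard_error, t_value, df = paired_t_from_summary(-0.24375, 0.37326, 192)
print(round(standard_error, 5), round(t_value, 3), df)
```

The negative sign indicates that reported actual sharing is higher than intention to share.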
One Way ANOVA - Command
One way ANOVA (k independent)
Question: Does intention vary by position?
ANOVA (Intention)
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   7.864            4     1.966         5.247   .001
Within Groups    70.068           187   .375
Total            77.933           191
Duncan(a,b): Intention
                          Subset for alpha = .05
Current Position   N      1        2
Engineer           66     3.6424
Manager            32     3.6625
Technician         34     3.8941
Sr Engineer        54     4.0000
Above manager      6               4.5333
Sig.                      .101     1.000
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 19.157.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
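Two figures in this output are simple to verify: the F ratio is the between-groups mean square divided by the within-groups mean square, and the Duncan footnote uses the harmonic mean of the five group sizes. The sketch below uses the assumed helper name `f_ratio`.

```python
from statistics import harmonic_mean


def f_ratio(ss_between, df_between, ss_within, df_within):
    """F = (SS between / df between) / (SS within / df within)."""
    return (ss_between / df_between) / (ss_within / df_within)


f_value = f_ratio(7.864, 4, 70.068, 187)

# Group sizes: Engineer, Manager, Technician, Sr Engineer, Above manager.
group_sizes = [66, 32, 34, 54, 6]
harmonic_n = harmonic_mean(group_sizes)
print(round(f_value, 3), round(harmonic_n, 3))
```

The harmonic mean is pulled down sharply by the smallest group (6 respondents), which is why the footnote warns that Type I error levels are not guaranteed.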
Correlation - Command
Correlation (Interval/ratio)
Question: Are the variables related?
Correlations
                                   Attitude   subjective   Pbcontrol   Intention   Actual
Attitude     Pearson Correlation   1          .697**       .212**      .808**      .606**
             Sig. (2-tailed)                  .000         .003        .000        .000
             N                     192        192          192         192         192
subjective   Pearson Correlation   .697**     1            -.052       .653**      .552**
             Sig. (2-tailed)       .000                    .471        .000        .000
             N                     192        192          192         192         192
Pbcontrol    Pearson Correlation   .212**     -.052        1           .281**      .031
             Sig. (2-tailed)       .003       .471                     .000        .665
             N                     192        192          192         192         192
Intention    Pearson Correlation   .808**     .653**       .281**      1           .817**
             Sig. (2-tailed)       .000       .000         .000                    .000
             N                     192        192          192         192         192
Actual       Pearson Correlation   .606**     .552**       .031        .817**      1
             Sig. (2-tailed)       .000       .000         .665        .000
             N                     192        192          192         192         192
**. Correlation is significant at the 0.01 level (2-tailed).
Table Presentation
             Attitude   subjective   Pbcontrol   Intention   Actual
Attitude     1
subjective   .740**     1
Pbcontrol    .201**     -.047        1
Intention    .885**     .662**       .326**      1
Actual       .660**     .553**       .059        .805**      1
*p< 0.05, **p< 0.01
Command
Multiple Regression
Question: Which variables can explain the intention to share?
Variables Entered/Removed(b)
Model   Variables Entered                    Variables Removed   Method
1       Pbcontrol, subjective, Attitude(a)   .                   Enter
a. All requested variables entered.
b. Dependent Variable: Intention
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .832(a)   .693       .688                .35703                       1.501
a. Predictors: (Constant), Pbcontrol, subjective, Attitude
b. Dependent Variable: Intention
Multiple Regression
ANOVA(b)
Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   53.968           3     17.989        141.127   .000(a)
Residual     23.964           188   .127
Total        77.933           191
a. Predictors: (Constant), Pbcontrol, subjective, Attitude
b. Dependent Variable: Intention
Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients                     Collinearity Statistics
Model 1      B       Std. Error            Beta                        t        Sig.     Tolerance   VIF
(Constant)   .191    .197                                              .971     .333
Attitude     .601    .059                  .607                        10.103   .000     .453        2.210
subjective   .227    .056                  .238                        4.043    .000     .472        2.116
Pbcontrol    .143    .037                  .165                        3.821    .000     .877        1.140
a. Dependent Variable: Intention
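Several of the regression summary figures are linked by simple identities: R Square is the regression sum of squares over the total, Adjusted R Square corrects for the number of predictors, F is the ratio of the two mean squares, and VIF is the reciprocal of Tolerance. The sketch below checks them against the output; the function names are assumptions, and small discrepancies come from the rounding of the printed values.

```python
def regression_summary(ss_regression, ss_residual, n, k):
    """Return (R square, adjusted R square, F) for n cases and k predictors."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_value = (ss_regression / k) / (ss_residual / (n - k - 1))
    return r2, adj_r2, f_value


def vif(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1 / tolerance


r2, adj_r2, f_value = regression_summary(53.968, 23.964, n=192, k=3)
tolerances = {"Attitude": 0.453, "subjective": 0.472, "Pbcontrol": 0.877}
vifs = {name: vif(tol) for name, tol in tolerances.items()}
print(round(r2, 3), round(adj_r2, 3), round(f_value, 2))
print({name: round(value, 2) for name, value in vifs.items()})
```

All three VIF values are well below the commonly quoted cut-off of 10, which points to no serious multicollinearity among the predictors.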
Assumptions (Multicollinearity)
Collinearity Diagnostics(a)
                                                  Variance Proportions
Model 1 Dimension   Eigenvalue   Condition Index   (Constant)   Attitude   subjective   Pbcontrol
1                   3.936        1.000             .00          .00        .00          .00
2                   .043         9.581             .00          .02        .10          .55
3                   .013         17.195            .91          .19        .02          .21
4                   .008         22.890            .09          .79        .88          .24
a. Dependent Variable: Intention
Assumptions (Outliers)
Casewise Diagnostics(a)
Case Number   Std. Residual   Intention   Predicted Value   Residual
70            3.152           5.00        3.8748            1.12520
82            4.042           5.00        3.5570            1.44295
83            3.071           4.20        3.1037            1.09631
166           3.152           5.00        3.8748            1.12520
178           4.042           5.00        3.5570            1.44295
179           3.071           4.20        3.1037            1.09631
a. Dependent Variable: Intention
After Removing Outliers
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .900(a)   .810       .807                .27373                       1.725
a. Predictors: (Constant), Pbcontrol, subjective, Attitude
b. Dependent Variable: Intention
ANOVA(b)
Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   58.261           3     19.420        259.182   .000(a)
Residual     13.637           182   .075
Total        71.898           185
a. Predictors: (Constant), Pbcontrol, subjective, Attitude
b. Dependent Variable: Intention
Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients                     Collinearity Statistics
Model 1      B       Std. Error            Beta                        t        Sig.     Tolerance   VIF
(Constant)   .067    .153                                              .441     .659
Attitude     .758    .050                  .784                        15.281   .000     .396        2.523
subjective   .085    .047                  .091                        1.801    .073     .412        2.426
Pbcontrol    .145    .029                  .173                        5.015    .000     .875        1.143
a. Dependent Variable: Intention
Assumptions: Advanced Diagnostics (Hair et al., 2006)
Residuals Statistics(a)
                                    Minimum   Maximum   Mean     Std. Deviation   N
Predicted Value                     2.1329    4.9380    3.8188   .53156           192
Std. Predicted Value                -3.172    2.106     .000     1.000            192
Standard Error of Predicted Value   .027      .111      .048     .020             192
Adjusted Predicted Value            2.1423    4.9493    3.8179   .53167           192
Residual                            -.96087   1.44295   .00000   .35421           192
Std. Residual                       -2.691    4.042     .000     .992             192
Stud. Residual                      -2.731    4.253     .001     1.012            192
Deleted Residual                    -.98909   1.59761   .00086   .36911           192
Stud. Deleted Residual              -2.779    4.461     .004     1.031            192
Mahal. Distance                     .130      17.495    2.984    3.453            192
Cook's Distance                     .000      .485      .011     .051             192
Centered Leverage Value             .001      .092      .016     .018             192
a. Dependent Variable: Intention
Assumptions (Normality) [Figure: Histogram of Regression Standardized Residual against Frequency. Dependent Variable: Intention. Mean = -1.99E-17, Std. Dev. = 0.992, N = 192]
Assumptions (Normality of the Error Term) [Figure: Normal P-P Plot of Regression Standardized Residual, Observed Cum Prob against Expected Cum Prob. Dependent Variable: Intention]
Assumptions (Constant Variance) [Figure: Scatterplot of Intention against Regression Studentized Residual. Dependent Variable: Intention]
Assumptions (Linearity) [Figure: Partial Regression Plot of Intention against Attitude. Dependent Variable: Intention]
Assumptions (Linearity) [Figure: Partial Regression Plot of Intention against subjective. Dependent Variable: Intention]
Assumptions (Linearity) [Figure: Partial Regression Plot of Intention against Pbcontrol. Dependent Variable: Intention]
Table Presentation (Dependent = Intention)
Variable            Standardized Beta
Attitude            0.607**
Subjective Norm     0.238**
Perceived Control   0.165**
R²                  0.693
Adjusted R²         0.688
F Value             141.13
D-W                 1.501
*p< 0.05, **p< 0.01