Data Analysis Using SPSS By: Akmal Aini Othman
"The key to GOOD descriptive research is knowing exactly what you want to measure and selecting a survey method in which every respondent is willing to cooperate and capable of giving you complete and accurate information efficiently." (Joe Ottaviani)
Uncertainty Influences the Type of Research (Source: Zikmund, 2009)
Absolute ambiguity: exploratory research.
Completely certain: causal or descriptive research.
The Research Process (Source: Zikmund, 2009)
Problem discovery and definition: problem discovery; selection of exploratory research technique (secondary (historical) data, experience survey, pilot study, case study); problem definition (statement of research objectives).
Research design: selection of basic research method (survey: interview or questionnaire; experiment: laboratory or field; observation; secondary data study).
Sampling: probability or nonprobability.
Data gathering: collection of data (fieldwork).
Data processing and analysis: editing and coding data; data processing.
Conclusions and report: interpretation of findings; report.
Thesis Contents: Chap 1, Introduction; Chap 2, Literature Review; Chap 3, Methodology; Chap 4, Findings and Discussion; Chap 5, Conclusion and Recommendation.
Thesis Contents
Introduction: why this research, and what it is about.
Literature review: who has done this research and how, what results they obtained, and what the shortcomings were.
Research framework and data collection: why this framework; the hypotheses; measurements, sample, and how the data can be collected.
Data collection and analysis: which methods are most appropriate; the findings.
Conclusion: have you achieved what you set out to do?
Thesis Contents, Chap 1: Introduction. Background of the study; problem statement; research questions; research objectives; hypotheses; significance of the study; limitations.
Thesis Contents, Chap 4: Findings and Discussion. Descriptive analysis; tests of goodness of data (e.g., normality and multicollinearity); factor analysis; reliability and validity tests; inferential analysis / hypothesis testing.
Data Preparation Process (Source: Malhotra, 2012)
Prepare preliminary plan of data analysis → check questionnaire → edit → code → transcribe → clean data → statistically adjust the data → select data analysis strategy.
Questionnaire Checking
A questionnaire returned from the field may be unacceptable for several reasons: parts of the questionnaire may be incomplete; the pattern of responses may indicate that the respondent did not understand or follow the instructions; one or more pages are missing; the questionnaire is received after the pre-established cutoff date; or the questionnaire is answered by someone who does not qualify for participation.
Editing: Treatment of Unsatisfactory Results
Returning to the field: the questionnaires with unsatisfactory responses may be returned to the field, where the interviewers recontact the respondents.
Assigning missing values: if returning the questionnaires to the field is not feasible, the editor may assign missing values to unsatisfactory responses.
Discarding unsatisfactory respondents: in this approach, the respondents with unsatisfactory responses are simply discarded.
Coding
Data coding involves coding the variables and coding the responses (items) for each variable. For example, the variable for gender is named sex, and its responses are coded 1 = male, 2 = female. Numerical scales can be coded using the actual number circled by the respondent (questions 6 to 21). Random checks should be conducted to ensure that the data are coded correctly.
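A minimal Python sketch of the coding step. It assumes a hypothetical codebook dictionary (`CODEBOOK`) that follows the slide's convention (sex: 1 = male, 2 = female) and the 1-to-7 scale used for questions 6 to 21. The `spot_check` helper draws a seeded random sample of raw/coded pairs for the manual random check. All names and data are illustrative, not part of SPSS.

```python
import random

# Hypothetical codebook following the slide: variable "sex", 1 = male, 2 = female
CODEBOOK = {
    "sex": {"male": 1, "female": 2},
    "shift": {"first": 1, "second": 2, "third": 3},
}


def code_category(variable, answer):
    """Numeric code for a categorical answer, or None if it is not in the codebook."""
    return CODEBOOK[variable].get(answer.strip().lower())


def code_scale_item(circled):
    """Scale items (questions 6 to 21) keep the number circled, if it lies on the 1-7 scale."""
    value = int(circled)
    return value if 1 <= value <= 7 else None


def spot_check(raw_rows, coded_rows, k, seed=0):
    """Draw k (raw, coded) pairs at random for a manual check of the coding."""
    return random.Random(seed).sample(list(zip(raw_rows, coded_rows)), k)


# Hypothetical raw answers from two questionnaires
raw = [
    {"id": 1, "sex": "Male", "shift": "First", "q6": "5"},
    {"id": 2, "sex": "female", "shift": "Third", "q6": "7"},
]
coded = [
    {"id": r["id"], "sex": code_category("sex", r["sex"]),
     "shift": code_category("shift", r["shift"]), "q6": code_scale_item(r["q6"])}
    for r in raw
]
```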
Table 12.1 Coding of Serakan Co. Questionnaires
1. Age (years): [1] Under 25, [2] 25-35, [3] 36-45, [4] 46-55, [5] Over 55
2. Education: [1] High school, [2] Some college, [3] Bachelor's degree, [4] Master's degree, [5] Doctoral degree, [6] Other (specify)
3. Job level: [1] Manager, [2] Supervisor, [3] Clerk, [4] Secretary, [5] Technician, [6] Other (specify)
4. Sex: [1] M, [2] F
5. Work shift: [1] First, [2] Second, [3] Third
5a. Employment status: [1] Part time, [2] Full time
Here are some questions that ask you to tell us how you experience your work life in general. Please circle the appropriate number on the scales below. To what extent would you agree with the following statements, on a scale of 1 to 7, 1 denoting very low agreement and 7 denoting very high agreement? (Each statement is answered on the scale 1 2 3 4 5 6 7.)
6. The major happiness of my life comes from my job.
7. Time at work flies by quickly.
8. I live, eat and breathe my job.
9. My work is fascinating.
10. My work gives me a sense of accomplishment.
11. My supervisor praises good work.
12. The opportunities for advancement are very good here.
13. My coworkers are very stimulating.
14. People can live comfortably with their pay in this organization.
15. I get a lot of cooperation at the workplace.
16. My supervisor is not very capable.
17. Most things in life are more important than work.
18. Working here is a drag.
19. The promotion policies here are very unfair.
20. My pay is barely adequate to take care of my expenses.
21. My work is not the most important part of my life.
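Data Transcription (Fig. 14.4)
Raw data are transcribed into computer-readable form by one of several routes: CATI/CAPI, keypunching via CRT terminal, optical recognition, digital technologies, or bar code and other technologies. Keypunched data go through verification to correct keypunching errors. The transcribed data are then held in computer memory, on disks, or in other storage.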
Data Cleaning: Consistency Checks
Consistency checks identify data that are out of range, logically inconsistent, or have extreme values. Computer packages such as SPSS, SAS, Excel and Minitab can be programmed to identify out-of-range values for each variable and print out the respondent code, variable code, variable name, record number, column number, and out-of-range value. Extreme values should be closely examined.
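The out-of-range check can be sketched in Python as follows. The valid ranges (`VALID_RANGES`, based on the Serakan Co. coding table: sex 1-2, scale items 1-7) and the two records are hypothetical; the output lists respondent, variable and offending value, in the spirit of the printout described above.

```python
# Hypothetical valid ranges per variable
VALID_RANGES = {"sex": (1, 2), "q6": (1, 7), "q7": (1, 7)}


def out_of_range(records, valid_ranges):
    """Yield (respondent id, variable, value) for every value outside its valid range."""
    for rec in records:
        for var, (low, high) in valid_ranges.items():
            value = rec.get(var)
            if value is not None and not (low <= value <= high):
                yield rec["id"], var, value


data = [
    {"id": 101, "sex": 1, "q6": 5, "q7": 6},
    {"id": 102, "sex": 3, "q6": 4, "q7": 77},  # 3 and 77 look like keying errors
]
problems = list(out_of_range(data, VALID_RANGES))
```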
Data Cleaning: Treatment of Missing Responses
Substitute a neutral value: a neutral value, typically the mean response to the variable, is substituted for the missing responses.
Substitute an imputed response: the respondent's pattern of responses to other questions is used to impute, or calculate, a suitable response to the missing questions.
Casewise deletion: cases, or respondents, with any missing responses are discarded from the analysis.
Pairwise deletion: instead of discarding all cases with any missing values, the researcher uses, for each calculation, only the cases or respondents with complete responses on the variables involved.
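A minimal sketch of three of these treatments in plain Python, on hypothetical data where `None` marks a missing response. Imputation from response patterns depends on the chosen model and is not shown.

```python
from statistics import mean

# Hypothetical responses; None marks a missing answer
rows = [
    {"q1": 4, "q2": 5, "q3": None},
    {"q1": 3, "q2": None, "q3": 2},
    {"q1": 5, "q2": 4, "q3": 4},
]


def substitute_mean(rows, var):
    """Neutral-value substitution: replace missing values of var with the mean of observed values."""
    m = mean(r[var] for r in rows if r[var] is not None)
    return [r[var] if r[var] is not None else m for r in rows]


def casewise(rows, variables):
    """Casewise deletion: keep only respondents with no missing value on any variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


def pairwise(rows, var_a, var_b):
    """Pairwise deletion: keep respondents complete on the two variables in this calculation."""
    return [(r[var_a], r[var_b]) for r in rows
            if r[var_a] is not None and r[var_b] is not None]
```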
Basic Terms
Levels of measurement: nominal, ordinal, interval, ratio.
Key terms: variable, dimension, item; definition (dictionary vs. operational).
Variables: independent, dependent, moderating, mediating.
Research Framework
Independent variables: Management (5 items), Advancement (3 items), Salary (4 items), Workload (4 items).
Mediating variable: Job Satisfaction (5 items).
Dependent variable: Productivity.
Moderating variable: Gender.
Scale (Source: Malhotra, 2007)
Nominal: numbers assigned to runners (7, 8, 3).
Ordinal: rank order of winners (third place, second place, first place).
Interval: performance rating on a 0 to 10 scale (8.2, 9.1, 9.6).
Ratio: time to finish, in seconds (15.2, 14.1, 13.4).
What Is Statistics?
The process of making sense of data.
Descriptive statistics: describe the basic features of data using tables, graphs and summary statistics.
Inferential statistics: generalise from samples to populations, performing estimations and hypothesis tests, determining relationships, and making predictions.
Descriptive Statistics
Nominal data: frequencies, percentages, cross-tabulation, mode, pie chart, bar chart.
Ordinal data: frequencies, percentages, cross-tabulation, mode, median, pie chart, bar chart.
Interval and ratio data: mean, variance, standard deviation, skewness, kurtosis, index number, histogram, box plot, stem-and-leaf plot.
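To illustrate how the level of measurement limits the summary statistics, here is a small Python sketch using the standard `statistics` module on made-up values: counts and mode for nominal data, median for ordinal data, and mean, standard deviation and variance for interval data.

```python
from collections import Counter
from statistics import mean, median, mode, stdev, variance

gender = ["M", "F", "M", "M"]          # nominal (hypothetical)
rank = [1, 3, 2, 2, 3]                 # ordinal (hypothetical)
score = [3.2, 4.0, 3.8, 4.4, 3.6]      # interval (hypothetical)

# Only counting operations make sense for nominal categories
nominal_summary = {"frequencies": Counter(gender), "mode": mode(gender)}
# Ordinal data can also be ordered, so the median is meaningful
ordinal_summary = {"frequencies": Counter(rank), "median": median(rank)}
# Interval data support arithmetic, so mean and dispersion are meaningful
interval_summary = {"mean": mean(score), "sd": stdev(score), "variance": variance(score)}
```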
Inferential Statistics
Statistical techniques either explore differences between groups or explore relationships.
Parametric techniques: the data must be interval (or ratio) and the distribution must be normal.
Nonparametric techniques: the data are categorical (nominal/ordinal), or interval but not normally distributed.
Data Analysis
Basic objectives: getting a feel for the data, testing the goodness of data, and testing the hypotheses.
Getting a feel for the data: check the central tendency and the dispersion. If there is little variability, the questions may not have been properly worded. Check whether respondents gave the same response to every question. Remember: if there is no variability in the data, no variance can be explained.
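Two quick variability checks, sketched in Python on hypothetical answers: flagging respondents who gave the same answer to every item, and computing each item's (population) variance across respondents, where values near zero signal little variability.

```python
from statistics import pvariance

# Hypothetical answers to five scale items, keyed by respondent id
responses = {
    1: [5, 6, 4, 5, 6],
    2: [4, 4, 4, 4, 4],
    3: [7, 2, 5, 6, 3],
}


def straight_liners(responses):
    """Respondents who gave the same answer to every item."""
    return [rid for rid, answers in responses.items() if len(set(answers)) == 1]


def item_variances(responses):
    """Variance of each item across respondents."""
    columns = zip(*responses.values())
    return [pvariance(col) for col in columns]
```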
Data Analysis
It is always prudent to obtain: frequency distributions for the demographic variables; the mean, standard deviation, range and variance of the other dependent and independent variables; and an inter-correlation matrix of the variables, regardless of whether the hypotheses relate to these analyses. If the correlation between two variables is high, say over .75, we should ask whether they are really two different concepts or whether we are measuring the same concept.
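A sketch of the inter-correlation screen, on hypothetical variables, using the .75 cut-off from the slide. It flags pairs whose absolute Pearson correlation exceeds the threshold; treating a strong negative correlation the same way is our assumption.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def high_correlations(data, threshold=0.75):
    """List (variable a, variable b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


data = {  # hypothetical scores for six respondents
    "intention": [3, 4, 5, 4, 2, 5],
    "attitude": [3, 4, 5, 5, 2, 5],
    "salary": [2, 5, 3, 1, 4, 3],
}
result = high_correlations(data)
```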
Data Analysis: Testing the Goodness of Data
Reliability: Cronbach's alpha (the closer Cronbach's alpha is to 1, the higher the internal consistency reliability); split-half reliability coefficient. Stability measures include parallel-form reliability and test-retest reliability.
Validity: criterion-related validity, convergent validity, discriminant validity.
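For reference, the usual formula for Cronbach's alpha for a scale of $k$ items, where $\sigma^2_{Y_i}$ is the variance of item $i$ and $\sigma^2_X$ is the variance of the total (summed) score:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right)$$

Alpha rises when the items covary strongly relative to their individual variances, which is why values close to 1 indicate high internal consistency.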
Choosing the Test
The choice depends on: the data (nominal or interval/ratio); the number of samples (one, two, or k samples); and the purpose (describing, comparing two statistics, or looking at relationships).
A Classification of Univariate Techniques (Source: Malhotra, 2012)
Metric data
  One sample: t test; Z test
  Two or more samples, independent: two-group t test; Z test; one-way ANOVA
  Two or more samples, related: paired t test
Non-numeric data
  One sample: frequency; chi-square; K-S; runs; binomial
  Two or more samples, independent: chi-square; Mann-Whitney; median; K-S; K-W ANOVA
  Two or more samples, related: sign; Wilcoxon; McNemar; chi-square
Univariate Analysis
Univariate analysis is the simplest form of data analysis. "Uni" means one: the analysis deals with only one variable at a time. It does not deal with causes or relationships (unlike regression); its major purpose is to describe. It takes data, summarizes them, and finds patterns in them. It explores each variable in a data set separately, looking at the range of values and their central tendency, and describes the pattern of response to that variable. (Source: www.csulb.edu/.../696uni.htm)
A Classification of Multivariate Techniques (Source: Malhotra, 2012)
Dependence techniques
  One dependent variable: cross-tabulation; analysis of variance and covariance; multiple regression; two-group discriminant/logit; conjoint analysis
  More than one dependent variable: multivariate analysis of variance; canonical correlation; multiple discriminant analysis; structural equation modeling and path analysis
Interdependence techniques
  Variable interdependence: factor analysis; confirmatory factor analysis
  Interobject similarity: cluster analysis; multidimensional scaling
Multivariate Analysis
Multivariate data analysis refers to any statistical technique used to analyze data that arise from more than one variable. This mirrors reality, where each situation, product, or decision involves more than a single variable.
Steps Involved in Hypothesis Testing
1. Formulate $H_0$ and $H_1$.
2. Select an appropriate test.
3. Choose the level of significance, $\alpha$.
4. Collect data and calculate the test statistic, $TS_{CAL}$.
5. Either determine the probability associated with the test statistic (the p value), or determine the critical value of the test statistic, $TS_{CR}$.
6. Either compare the p value with the level of significance, or determine whether $TS_{CAL}$ falls into the rejection or non-rejection region.
7. Reject or do not reject $H_0$.
8. Draw the research conclusion.
Hypothesis Testing: $H_0$ and $H_1$
A null hypothesis is a statement of the status quo, one of no difference or no effect. If the null hypothesis is not rejected, no changes will be made. An alternative hypothesis is one in which some difference or effect is expected. Accepting the alternative hypothesis will lead to changes in opinions or actions. The null hypothesis refers to a specified value of the population parameter (e.g., $\mu$, $\sigma$, $\pi$), not a sample statistic (e.g., $\bar{X}$).
Hypothesis Testing: $H_0$ and $H_1$ (continued)
A null hypothesis may be rejected, but it can never be accepted on the basis of a single test. In classical hypothesis testing, there is no way to determine whether the null hypothesis is true. The null hypothesis is formulated in such a way that its rejection leads to the acceptance of the desired conclusion. The alternative hypothesis represents the conclusion for which evidence is sought. For example:
$H_0: \pi \le 0.40$
$H_1: \pi > 0.40$
Hypothesis Testing: $H_0$ and $H_1$ (continued)
The test of the null hypothesis above is a one-tailed test, because the alternative hypothesis is expressed directionally. If it is not, a two-tailed test is required, and the hypotheses are expressed as:
$H_0: \pi = 0.40$
$H_1: \pi \neq 0.40$
One-Tailed & Two-Tailed Test
Test Statistic
The test statistic measures how close the sample has come to the null hypothesis. It often follows a well-known distribution, such as the normal, t, or chi-square distribution. In our example, the z statistic, which follows the standard normal distribution, would be appropriate, where $p$ is the sample proportion and $n$ the sample size:
$$z = \frac{p - \pi}{\sigma_p}, \qquad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$
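A sketch of the whole procedure for this one-sample proportion test, using only the Python standard library. The sample (17 of 30 respondents, tested against $\pi = 0.40$) is hypothetical. The normal tail probabilities come from `math.erfc`, since $P(Z \ge z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$.

```python
from math import erfc, sqrt


def proportion_z_test(successes, n, pi0):
    """One-sample z test of H0 about a proportion, using z = (p - pi) / sigma_p."""
    p = successes / n
    sigma_p = sqrt(pi0 * (1 - pi0) / n)       # standard error under H0
    z = (p - pi0) / sigma_p
    p_one_tailed = 0.5 * erfc(z / sqrt(2))    # P(Z >= z), for H1: pi > pi0
    p_two_tailed = erfc(abs(z) / sqrt(2))     # P(|Z| >= |z|), for H1: pi != pi0
    return z, p_one_tailed, p_two_tailed


# Hypothetical sample: 17 of 30 respondents, tested against pi0 = 0.40
z, p1, p2 = proportion_z_test(17, 30, 0.40)
alpha = 0.05
reject_one_tailed = p1 < alpha
reject_two_tailed = p2 < alpha
```

With these numbers $z \approx 1.86$: the one-tailed test rejects $H_0$ at $\alpha = 0.05$ ($p \approx 0.031$), while the two-tailed test does not ($p \approx 0.062$), which is why the choice between a one-tailed and a two-tailed test matters.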
Type I and Type II error Type I Error Type I error occurs when the sample results lead to the rejection of the null hypothesis when it is in fact true. Type II Error Type II error occurs when, based on the sample results, the null hypothesis is not rejected when it is in fact false.
Descriptive Analysis
Frequencies - Command
Frequencies
Questions: 1. Is our sample representative? 2. Are there data entry errors?

Gender
               Frequency  Percent  Valid Percent  Cumulative Percent
Male           144        75.0     75.0           75.0
Female         48         25.0     25.0           100.0
Total          192        100.0    100.0

Current Position
               Frequency  Percent  Valid Percent  Cumulative Percent
Technician     34         17.7     17.7           17.7
Engineer       66         34.4     34.4           52.1
Sr Engineer    54         28.1     28.1           80.2
Manager        32         16.7     16.7           96.9
Above manager  6          3.1      3.1            100.0
Total          192        100.0    100.0
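The Percent and Cumulative Percent columns can be reproduced with a short Python sketch. The list of positions is rebuilt from the slide's counts; with no missing values, Valid Percent equals Percent, so it is omitted.

```python
from collections import Counter


def frequency_table(values, order):
    """Rows of (category, frequency, percent, cumulative percent), rounded to 1 decimal."""
    counts = Counter(values)
    total = sum(counts[c] for c in order)
    rows, cumulative = [], 0
    for category in order:
        cumulative += counts[category]
        rows.append((category, counts[category],
                     round(100 * counts[category] / total, 1),
                     round(100 * cumulative / total, 1)))
    return rows


order = ["Technician", "Engineer", "Sr Engineer", "Manager", "Above manager"]
positions = (["Technician"] * 34 + ["Engineer"] * 66 + ["Sr Engineer"] * 54
             + ["Manager"] * 32 + ["Above manager"] * 6)
table = frequency_table(positions, order)
```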
Table in Report
Variable  Category       Frequency  Percentage
Gender    Male           144        75.0
          Female         48         25.0
Position  Technician     34         17.7
          Engineer       66         34.4
          Sr Engineer    54         28.1
          Manager        32         16.7
          Above manager  6          3.1
Descriptives - Command
Descriptives
Questions: 1. Is there variation in our data? 2. What is the level of the phenomenon we are measuring?

Descriptive Statistics
     N    Minimum  Maximum  Mean    Std. Deviation  Skewness (Std. Error)  Kurtosis (Std. Error)
JS   192  2.00     5.00     3.8188  .63877          -.528 (.175)           .687 (.349)
Mgt  192  2.00     5.00     3.8104  .64548          -.480 (.175)           .242 (.349)
WL   192  2.00     5.00     3.7031  .67034          -.101 (.175)           .755 (.349)
Slr  192  2.00     5.00     3.4792  .73672          .015 (.175)            -.028 (.349)
Adv  192  2.33     5.00     4.0625  .58349          -.361 (.175)           -.328 (.349)
Valid N (listwise): 192
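A sketch of the skewness and kurtosis statistics and their standard errors. We assume SPSS uses the standard small-sample formulas shown here (adjusted skewness $G_1$ and excess kurtosis $G_2$); the standard-error functions do reproduce the .175 and .349 reported for N = 192.

```python
from math import sqrt


def _mean_sd(x):
    """Mean and sample standard deviation (n - 1 denominator)."""
    n = len(x)
    m = sum(x) / n
    return m, sqrt(sum((v - m) ** 2 for v in x) / (n - 1))


def skewness(x):
    """Adjusted Fisher-Pearson skewness G1."""
    n = len(x)
    m, s = _mean_sd(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)


def kurtosis(x):
    """Excess kurtosis G2 (0 for a normal distribution)."""
    n = len(x)
    m, s = _mean_sd(x)
    term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(((v - m) / s) ** 4 for v in x)
    return term - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


def se_skewness(n):
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    return 2 * se_skewness(n) * sqrt((n * n - 1) / ((n - 3) * (n + 5)))
```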
Table in Report
Variable          Mean  Std. Deviation
Job Satisfaction  3.82  0.64
Management        3.81  0.65
Workload          3.70  0.67
Salary            3.48  0.74
Advancement       4.06  0.58
Research Framework
Independent variables: Management (5 items, H1), Advancement (3 items, H2), Salary (4 items, H3), Workload (4 items, H4).
Dependent variable: Job Satisfaction (5 items).
Factor Analysis (FA)- Command
Assumptions in FA
Question: How valid is our instrument?
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .890
Bartlett's Test of Sphericity: Approx. Chi-Square = 3178.651, df = 120, Sig. = .000
KMO should be > 0.5; Bartlett's test should be significant, i.e., p < 0.05.
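Bartlett's test statistic can be computed from a correlation matrix $R$ of $p$ variables and $n$ cases as $\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|$, with $p(p-1)/2$ degrees of freedom; with the 16 items here that gives the df = 120 in the output. The standard-library sketch below computes the statistic and df only; the p value would still need a chi-square table or a statistics library.

```python
from math import log


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                        # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n):
    """Bartlett's chi-square and df; corr must be a positive-definite correlation matrix."""
    p = len(corr)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * log(determinant(corr))
    return chi_square, p * (p - 1) // 2
```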
Measure of Sampling Adequacy (MSA)
MSA             Comment
0.80 and above  Meritorious
0.70 to 0.80    Middling
0.60 to 0.70    Mediocre
0.50 to 0.60    Miserable
Below 0.50      Unacceptable
Assigning Questions
Rotated Component Matrix (a)
              Component
              1      2      3      4
Management1   .859   .155   .354   .280
Management2   .829   .204   .358   .228
Management3   .851   .137   .360   .191
Management4   .845   .111   .280   .260
Management5   .884   .061   .299   .230
WorkLoad1     .417   -.060  .721   .154
WorkLoad2     .395   -.019  .791   .232
WorkLoad3     .357   -.077  .808   .250
Workload4     .250   -.075  .836   .110
Salary1       .120   .886   .018   .038
Salary2       .108   .886   -.080  .025
Salary3       .065   .894   -.047  -.042
Salary4       .072   .897   -.026  -.032
Advancement1  .355   -.107  .169   .748
Advancement2  .308   -.096  .109   .785
Advancement4  .132   .226   .385   .726
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 6 iterations.

Communalities
              Initial  Extraction
Management1   1.000    .965
Management2   1.000    .909
Management3   1.000    .910
Management4   1.000    .872
Management5   1.000    .927
WorkLoad1     1.000    .721
WorkLoad2     1.000    .836
WorkLoad3     1.000    .848
Workload4     1.000    .779
Salary1       1.000    .802
Salary2       1.000    .804
Salary3       1.000    .807
Salary4       1.000    .811
Advancement1  1.000    .725
Advancement2  1.000    .732
Advancement4  1.000    .744
Extraction Method: Principal Component Analysis.
A communality is the amount of a variable's variance that it shares in common with the other variables. General guideline: all communalities should be above 0.5.
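Because varimax is an orthogonal rotation, each communality equals the sum of an item's squared loadings across the retained components, and each item is usually assigned to the component on which it loads most heavily. A Python sketch on four rows copied from the rotated matrix:

```python
# Four rows copied from the rotated component matrix (components 1 to 4)
loadings = {
    "Management1": [0.859, 0.155, 0.354, 0.280],
    "WorkLoad1": [0.417, -0.060, 0.721, 0.154],
    "Salary1": [0.120, 0.886, 0.018, 0.038],
    "Advancement1": [0.355, -0.107, 0.169, 0.748],
}


def communality(row):
    """Sum of squared loadings (valid for an orthogonal rotation such as varimax)."""
    return sum(value ** 2 for value in row)


def assign_component(row):
    """1-based number of the component with the largest absolute loading."""
    return max(range(len(row)), key=lambda i: abs(row[i])) + 1


for item, row in loadings.items():
    print(item, "component", assign_component(row), "communality", round(communality(row), 3))
```

The recomputed communalities can differ from the Communalities table in the third decimal, because the printed loadings are themselves rounded.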
Significant Loadings
Factor Loading  Sample Size Needed
0.30            350
0.35            250
0.40            200
0.45            150
0.50            120
0.55            100
0.60            85
0.65            70
0.70            60
0.75            50
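A small lookup sketch over this table, returning the smallest loading that counts as significant for a given sample size; the function name is ours. For the 192 respondents here it returns 0.45, since 0.40 would need 200.

```python
# (minimum loading, sample size needed), transcribed from the Significant Loadings table
THRESHOLDS = [(0.30, 350), (0.35, 250), (0.40, 200), (0.45, 150), (0.50, 120),
              (0.55, 100), (0.60, 85), (0.65, 70), (0.70, 60), (0.75, 50)]


def minimum_significant_loading(n):
    """Smallest tabled loading whose required sample size is met by n (None if n < 50)."""
    for loading, needed in THRESHOLDS:
        if n >= needed:
            return loading
    return None
```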
How Many Factors?
Total Variance Explained
           Initial Eigenvalues       Extraction Sums of Sq. Loadings  Rotation Sums of Sq. Loadings
Component  Total  % of Var.  Cum. %  Total  % of Var.  Cum. %         Total  % of Var.  Cum. %
1          7.694  48.085     48.085  7.694  48.085     48.085         4.438  27.739     27.739
2          3.394  21.211     69.296  3.394  21.211     69.296         3.361  21.006     48.745
3          1.120  7.000      76.296  1.120  7.000      76.296         3.246  20.287     69.032
4          .984   6.149      82.445  .984   6.149      82.445         2.146  13.414     82.445
5          .531   3.319      85.765
6          .448   2.799      88.563
7          .423   2.646      91.210
8          .338   2.113      93.323
9          .229   1.430      94.753
10         .199   1.245      95.999
11         .176   1.102      97.101
12         .123   .771       97.873
13         .120   .750       98.623
14         .101   .633       99.256
15         .085   .534       99.790
16         .034   .210       100.000
Extraction Method: Principal Component Analysis.
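A sketch of how the % of Variance column follows from the eigenvalues. With a correlation matrix the total variance equals the number of variables (16 here), so each percentage is eigenvalue / 16 × 100. Counting eigenvalues greater than 1 is the common Kaiser criterion; note that it would keep three components, whereas the output above retains four.

```python
# Initial eigenvalues copied from the Total Variance Explained table
eigenvalues = [7.694, 3.394, 1.120, 0.984, 0.531, 0.448, 0.423, 0.338,
               0.229, 0.199, 0.176, 0.123, 0.120, 0.101, 0.085, 0.034]


def variance_explained(eigs):
    """(eigenvalue, % of variance, cumulative %); total variance = number of variables."""
    total = len(eigs)
    rows, cumulative = [], 0.0
    for e in eigs:
        pct = 100 * e / total
        cumulative += pct
        rows.append((e, pct, cumulative))
    return rows


rows = variance_explained(eigenvalues)
kaiser_count = sum(1 for e in eigenvalues if e > 1)
```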
How many Factors? - Scree Plot
Reliability - Command
Reliability
Question: How reliable are our instruments?
Reliability Statistics: Cronbach's Alpha = .977, N of Items = 5

Item-Total Statistics
             Scale Mean if  Scale Variance if  Corrected Item-Total  Cronbach's Alpha if
             Item Deleted   Item Deleted       Correlation           Item Deleted
Management1  15.25          6.681              .973                  .965
Management2  15.26          6.560              .925                  .972
Management3  15.24          6.906              .929                  .972
Management4  15.21          6.825              .900                  .975
Management5  15.25          6.555              .935                  .970
Corrected item-total correlations should preferably be > 0.3.
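A standard-library sketch of the statistics in this output: Cronbach's alpha, the corrected item-total correlation (each item against the sum of the remaining items), and alpha if the item is deleted. The input layout (one list of scores per item) and the three-item data are assumptions for illustration.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, one score per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_statistics(items):
    """For each item: (corrected item-total correlation, alpha if item deleted)."""
    results = []
    for i, item in enumerate(items):
        rest = [col for j, col in enumerate(items) if j != i]
        rest_total = [sum(scores) for scores in zip(*rest)]
        results.append((pearson(item, rest_total), cronbach_alpha(rest)))
    return results


# Hypothetical 3-item scale answered by 5 respondents
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]
```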
Table in Report
Variable   N of Items  Items Deleted  Alpha
Attitude   5           -              0.977
SN         4           -              0.912
Pbcontrol  4           -              0.919
Intention  5           -              0.966
Actual     3           -              0.933
Computing New Variable - Command
Data after Transformation
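Computing a new variable is commonly done by averaging a construct's items into a composite score. A hedged Python sketch, assuming the new variable (here `Mgt`) is the mean of the answered Management items for each respondent; the two respondents are hypothetical.

```python
from statistics import mean

items = [f"Management{i}" for i in range(1, 6)]
respondents = [  # hypothetical answers; None marks a missing item
    {"Management1": 4, "Management2": 5, "Management3": 4, "Management4": 4, "Management5": 3},
    {"Management1": 2, "Management2": 3, "Management3": 3, "Management4": None, "Management5": 2},
]


def composite(row, items):
    """Mean of the answered items; None if every item is missing."""
    answered = [row[i] for i in items if row[i] is not None]
    return mean(answered) if answered else None


for row in respondents:
    row["Mgt"] = composite(row, items)
```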
Inferential Analysis
Chi Square Test - Command
Crosstabulation
Question: Is the level of sharing dependent on gender?

Gender * Intention Level Crosstabulation
                                    Intention Level
                                    Low     High    Total
Male    Count                       110     34      144
        % within Gender             76.4%   23.6%   100.0%
        % within Intention Level    70.5%   94.4%   75.0%
        % of Total                  57.3%   17.7%   75.0%
Female  Count                       46      2       48
        % within Gender             95.8%   4.2%    100.0%
        % within Intention Level    29.5%   5.6%    25.0%
        % of Total                  24.0%   1.0%    25.0%
Total   Count                       156     36      192
        % within Gender             81.3%   18.8%   100.0%
        % within Intention Level    100.0%  100.0%  100.0%
        % of Total                  81.3%   18.8%   100.0%

Chi-Square Tests
                              Value     df  Asymp. Sig. (2-sided)  Exact Sig. (2-sided)  Exact Sig. (1-sided)
Pearson Chi-Square            8.934 b   1   .003
Continuity Correction a       7.704     1   .006
Likelihood Ratio              11.274    1   .001
Fisher's Exact Test                                                .002                  .001
Linear-by-Linear Association  8.888     1   .003
N of Valid Cases              192
a. Computed only for a 2x2 table.
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.00.
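A sketch that recomputes the Pearson chi-square, the continuity-corrected value, the linear-by-linear association and the minimum expected count from the observed counts in the crosstabulation. For df = 1 the upper-tail probability is $\operatorname{erfc}(\sqrt{\chi^2/2})$, so no statistics library is needed. For a 2x2 table the linear-by-linear association equals $(N-1)/N$ times the Pearson value.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Chi-square statistics for a 2x2 table of observed counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    pearson = corrected = 0.0
    min_expected = float("inf")
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            min_expected = min(min_expected, expected)
            gap = abs(table[i][j] - expected)
            pearson += gap ** 2 / expected
            corrected += max(gap - 0.5, 0.0) ** 2 / expected  # Yates continuity correction
    return {
        "pearson": pearson,
        "p_value": erfc(sqrt(pearson / 2)),   # chi-square upper tail, df = 1
        "continuity_corrected": corrected,
        "linear_by_linear": (n - 1) / n * pearson,
        "min_expected": min_expected,
    }


observed = [[110, 34],   # Male: low, high intention
            [46, 2]]     # Female: low, high intention
result = chi_square_2x2(observed)
```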
T-test - Command
t-Test (2 Independent Samples)
Question: Does intention to share vary by gender?

Group Statistics (Intention)
Gender  N    Mean    Std. Deviation  Std. Error Mean
Male    144  3.9000  .60302          .05025
Female  48   3.5750  .68619          .09904

Independent Samples Test (Intention)
                             Levene's Test  t-test for Equality of Means
                             F      Sig.    t      df      Sig. (2-tailed)  Mean Diff.  Std. Error Diff.  95% CI Lower  95% CI Upper
Equal variances assumed      3.591  .060    3.122  190     .002             .32500      .10410            .11965        .53035
Equal variances not assumed                 2.926  72.729  .005             .32500      .11106            .10364        .54636
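The t statistics in this output can be recomputed from the group statistics alone. A Python sketch: the pooled-variance t for "equal variances assumed" and Welch's t with its Welch-Satterthwaite df for "not assumed". Levene's test and the p values are not reproduced.

```python
from math import sqrt


def independent_t(n1, m1, s1, n2, m2, s2):
    """Pooled and Welch t statistics from group sizes, means and standard deviations."""
    diff = m1 - m2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se_pooled = sqrt(pooled_var * (1 / n1 + 1 / n2))
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se_welch = sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return {
        "t_pooled": diff / se_pooled, "df_pooled": n1 + n2 - 2, "se_pooled": se_pooled,
        "t_welch": diff / se_welch, "df_welch": df_welch, "se_welch": se_welch,
    }


# Group statistics from the output: male vs female
result = independent_t(144, 3.9000, 0.60302, 48, 3.5750, 0.68619)
```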
Paired t-test - Command
Paired t-Test (2 Dependent Samples)
Question: Are there differences between intention to share and actual sharing behavior?

Paired Samples Statistics (Pair 1)
           Mean    N    Std. Deviation  Std. Error Mean
Intention  3.8188  192  .63877          .04610
Actual     4.0625  192  .58349          .04211

Paired Samples Correlations: Intention & Actual, N = 192, Correlation = .817, Sig. = .000

Paired Samples Test (Pair 1: Intention - Actual)
Paired differences: Mean = -.24375, Std. Deviation = .37326, Std. Error Mean = .02694, 95% CI of the difference = -.29688 to -.19062
t = -9.049, df = 191, Sig. (2-tailed) = .000
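The paired t statistic is the mean of the differences divided by its standard error, $t = \bar{d} / (s_d/\sqrt{n})$. A sketch with one function for raw paired scores and one for the summary values in the output:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t from raw scores: returns (t, df, mean difference, standard error)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    se = stdev(d) / sqrt(n)
    return mean(d) / se, n - 1, mean(d), se


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired t from the mean and standard deviation of the differences."""
    se = sd_diff / sqrt(n)
    return mean_diff / se, n - 1
```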
One Way ANOVA - Command
One-Way ANOVA (k Independent Samples)
Question: Does intention vary by position?

ANOVA (Intention)
                Sum of Squares  df   Mean Square  F      Sig.
Between Groups  7.864           4    1.966        5.247  .001
Within Groups   70.068          187  .375
Total           77.933          191

Duncan (a, b): Intention, subsets for alpha = .05
Current Position  N   Subset 1  Subset 2
Engineer          66  3.6424
Manager           32  3.6625
Technician        34  3.8941
Sr Engineer       54  4.0000
Above manager     6             4.5333
Sig.                  .101      1.000
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 19.157.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
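A sketch of the one-way ANOVA F ratio from raw groups, plus the harmonic mean of group sizes that the Duncan footnote reports. The toy groups in the check are hypothetical; the harmonic mean is applied to the five position group sizes from the output.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F) for lists of raw scores."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


def harmonic_mean_n(sizes):
    """Harmonic mean of group sizes, used when groups are unequal."""
    return len(sizes) / sum(1 / n for n in sizes)
```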
Kruskal-Wallis - Command
Kruskal-Wallis (k Independent Samples)
Question: Do the variables vary by position?

Ranks (Intention)
Position       N    Mean Rank
Technician     34   101.32
Engineer       66   79.68
Sr Engineer    54   114.54
Manager        32   81.63
Above manager  6    171.17
Total          192

Test Statistics (a, b): Intention, Chi-Square = 28.179, df = 4, Asymp. Sig. = .000
a. Kruskal Wallis Test. b. Grouping Variable: Position.
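The Kruskal-Wallis H statistic ranks all observations together (tied values get average ranks) and compares mean ranks across groups; SPSS reports it as Chi-Square. A standard-library sketch with the usual tie correction, assuming raw scores for each group are available:

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to N; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1          # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H corrected for ties, df, mean rank of each group)."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    total, start, mean_ranks = 0.0, 0, []
    for g in groups:
        r = ranks[start:start + len(g)]
        start += len(g)
        mean_ranks.append(sum(r) / len(g))
        total += sum(r) ** 2 / len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(values).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction, len(groups) - 1, mean_ranks
```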
Correlation - Command
Correlation (Interval/Ratio)
Question: Are the variables related?

Correlations (Pearson correlation, with Sig. (2-tailed) in parentheses; N = 192 for every pair)
            Attitude       subjective     Pbcontrol      Intention      Actual
Attitude    1              .697** (.000)  .212** (.003)  .808** (.000)  .606** (.000)
subjective  .697** (.000)  1              -.052 (.471)   .653** (.000)  .552** (.000)
Pbcontrol   .212** (.003)  -.052 (.471)   1              .281** (.000)  .031 (.665)
Intention   .808** (.000)  .653** (.000)  .281** (.000)  1              .817** (.000)
Actual      .606** (.000)  .552** (.000)  .031 (.665)    .817** (.000)  1
**. Correlation is significant at the 0.01 level (2-tailed).
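The significance of each Pearson r is tested with $t = r\sqrt{n - 2}/\sqrt{1 - r^2}$ on $n - 2$ degrees of freedom. A one-function sketch; for Attitude and Pbcontrol ($r = .212$, $n = 192$) it gives $t \approx 2.99$, above the two-tailed 0.01 critical value of about 2.6 for 190 df, which is consistent with the ** flag.

```python
from math import sqrt


def r_to_t(r, n):
    """t statistic (df = n - 2) for testing H0: population correlation = 0; needs |r| < 1."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)
```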
Table Presentation
            Attitude  subjective  Pbcontrol  Intention  Actual
Attitude    1
subjective  .740**    1
Pbcontrol   .201**    -.047       1
Intention   .885**    .662**      .326**     1
Actual      .660**    .553**      .059       .805**     1
*p < 0.05, **p < 0.01
Regression - Command
Multiple Regression
Question: Which variables can explain the intention to share?

Variables Entered/Removed (b): Model 1; variables entered: Pbcontrol, subjective, Attitude (a); variables removed: none; method: Enter.
a. All requested variables entered. b. Dependent Variable: Intention.

Model Summary (b)
Model  R       R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .832 a  .693      .688               .35703                      1.501
a. Predictors: (Constant), Pbcontrol, subjective, Attitude. b. Dependent Variable: Intention.
R Square: how much of the variance in the dependent variable is explained by the model.
Multiple Regression (continued)
ANOVA (b), Model 1
            Sum of Squares  df   Mean Square  F        Sig.
Regression  53.968          3    17.989       141.127  .000 a
Residual    23.964          188  .127
Total       77.933          191
a. Predictors: (Constant), Pbcontrol, subjective, Attitude. b. Dependent Variable: Intention.

Coefficients (a), Model 1
            Unstandardized Coefficients  Standardized Coefficients                  Collinearity Statistics
            B      Std. Error            Beta                       t       Sig.    Tolerance  VIF
(Constant)  .191   .197                                             .971    .333
Attitude    .601   .059                  .607                       10.103  .000    .453       2.210
subjective  .227   .056                  .238                       4.043   .000    .472       2.116
Pbcontrol   .143   .037                  .165                       3.821   .000    .877       1.140
a. Dependent Variable: Intention.
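The model-level figures in these tables follow from the ANOVA sums of squares. A Python sketch that recomputes R, R Square, Adjusted R Square, F and the standard error of the estimate, plus VIF as the reciprocal of tolerance; small differences from the output come from the rounded inputs.

```python
from math import sqrt


def regression_summary(ss_reg, ss_res, n, k):
    """Model summary from regression/residual sums of squares, n cases and k predictors."""
    r2 = ss_reg / (ss_reg + ss_res)
    df_res = n - k - 1
    return {
        "R": sqrt(r2),
        "R2": r2,
        "adjR2": 1 - (1 - r2) * (n - 1) / df_res,
        "F": (ss_reg / k) / (ss_res / df_res),
        "SE": sqrt(ss_res / df_res),       # standard error of the estimate
    }


def vif(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1 / tolerance


summary = regression_summary(53.968, 23.964, n=192, k=3)
```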
Regression Equation
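The equation itself is not preserved in this copy of the slide. Assuming it uses the unstandardized coefficients (B) from the Coefficients table, the fitted model would read:

$$\widehat{\text{Intention}} = 0.191 + 0.601\,\text{Attitude} + 0.227\,\text{subjective} + 0.143\,\text{Pbcontrol}$$

and, in standardized form using the Beta coefficients (the constant drops out):

$$z_{\text{Intention}} = 0.607\,z_{\text{Attitude}} + 0.238\,z_{\text{subjective}} + 0.165\,z_{\text{Pbcontrol}}$$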
Thank you a2o@icqmail.com