Analysing and Comparing the Final Grade in Mathematics by Linear Regression Using Excel and SPSS

Similar documents
Simple Linear Regression

Study Guide #2: MULTIPLE REGRESSION in education

Research paper. Split-plot ANOVA. Split-plot design. Split-plot design. SPSS output: between effects. SPSS output: within effects

CHAPTER TWO REGRESSION

Class 7 Everything is Related

SUMMER 2011 RE-EXAM PSYF11STAT - STATISTIK

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

Data Analysis with SPSS

Simple Linear Regression One Categorical Independent Variable with Several Categories

REGRESSION MODELLING IN PREDICTING MILK PRODUCTION DEPENDING ON DAIRY BOVINE LIVESTOCK

Multiple Regression Analysis

F1: Introduction to Econometrics

Prospect Theory and Gain-Framing: The Relationship Between Cost Perception and Satisfaction

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do.

1. You want to find out what factors predict achievement in English. Develop a model that

Chapter 11 Multiple Regression

A COMPARISON BETWEEN MULTIVARIATE AND BIVARIATE ANALYSIS USED IN MARKETING RESEARCH

Answer all three questions. All questions carry equal marks.

Correlation and Regression

Fertility Rate & Infant Mortality Rate

Simple Linear Regression the model, estimation and testing

Regression Including the Interaction Between Quantitative Variables

NORTH SOUTH UNIVERSITY TUTORIAL 2

Statistical Methods and Reasoning for the Clinical Sciences

Regression. Page 1. Variables Entered/Removed b Variables. Variables Removed. Enter. Method. Psycho_Dum

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Business Statistics Probability

SPSS output for 420 midterm study

Two Factor Analysis of Variance

isc ove ring i Statistics sing SPSS

GROSS FIXED CAPITAL FORMATION SUFFICES AS A DETERMINANT FOR GROSS DOMESTIC PRODUCT IN INDIA

Appendix. I. Map of former Car Doctors site. Soil for the greenhouse study was collected in the area indicated by the arrow.

Addendum: Multiple Regression Analysis (DRAFT 8/2/07)

SPSS output for 420 midterm study

THE UNIVERSITY OF SUSSEX. BSc Second Year Examination DISCOVERING STATISTICS SAMPLE PAPER INSTRUCTIONS

WELCOME! Lecture 11 Thommy Perlinger

Dr. Kelly Bradley Final Exam Summer {2 points} Name

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis

A response variable is a variable that. An explanatory variable is a variable that.

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Substantive Significance of Multivariate Regression Coefficients

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

CHAPTER 4: FINDINGS 4.1 Introduction This chapter includes five major sections. The first section reports descriptive statistics and discusses the

Performance of Median and Least Squares Regression for Slightly Skewed Data

Normal Q Q. Residuals vs Fitted. Standardized residuals. Theoretical Quantiles. Fitted values. Scale Location 26. Residuals vs Leverage

Overview of Lecture. Survey Methods & Design in Psychology. Correlational statistics vs tests of differences between groups

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and

What Causes Stress in Malaysian Students and it Effect on Academic Performance: A case Revisited

STAT 135 Introduction to Statistics via Modeling: Midterm II Thursday November 16th, Name:

Still important ideas

3.2A Least-Squares Regression

Daniel Boduszek University of Huddersfield

Regional convergence in Romania: from theory to empirics

Example of Interpreting and Applying a Multiple Regression Model

SPSS Portfolio. Brittany Murray BUSA MWF 1:00pm-1:50pm

Ordinary Least Squares Regression

Chapter 3: Examining Relationships

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto

ANOVA in SPSS (Practical)

Chapter 14: More Powerful Statistical Methods

TEACHING REGRESSION WITH SIMULATION. John H. Walker. Statistics Department California Polytechnic State University San Luis Obispo, CA 93407, U.S.A.

International Journal on Future Revolution in Computer Science & Communication Engineering ISSN: Volume: 4 Issue:

Testing the Predictability of Consumption Growth: Evidence from China

STATISTICS INFORMED DECISIONS USING DATA

Score Tests of Normality in Bivariate Probit Models

Multiple Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Multiple Linear Regression Analysis

Introduction to Quantitative Methods (SR8511) Project Report

Regression CHAPTER SIXTEEN NOTE TO INSTRUCTORS OUTLINE OF RESOURCES

STAT 201 Chapter 3. Association and Regression

MEASURES OF ASSOCIATION AND REGRESSION

Lecture 6B: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

11/24/2017. Do not imply a cause-and-effect relationship

Validation of Coping Styles and Emotions Undergraduate Inventory on Romanian Psychology Students

Use the above variables and any you might need to construct to specify the MODEL A/C comparisons you would use to ask the following questions.

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15)

Chapter 2 Organizing and Summarizing Data. Chapter 3 Numerically Summarizing Data. Chapter 4 Describing the Relation between Two Variables

SPSS Learning Objectives:

INVESTIGATION OF ROUNDOFF NOISE IN IIR DIGITAL FILTERS USING MATLAB

14.1: Inference about the Model

Luci Karbonini. Pmf St. SWAP May 27,2011

d =.20 which means females earn 2/10 a standard deviation more than males

Unit 1 Exploring and Understanding Data

Basic concepts and principles of classical test theory

Bangor University Laboratory Exercise 1, June 2008

Antecedents Concerning Student s Identity and Role Performance in the Development and Promotion of Science and Technology Talents Project (DPST)

2 Assumptions of simple linear regression

Prediction of sheep milk chemical composition using ph, electrical conductivity and refractive index

Problem 1) Match the terms to their definitions. Every term is used exactly once. (In the real midterm, there are fewer terms).

Regression Equation. November 29, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions

STATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION XIN SUN. PhD, Kansas State University, 2012

CHAPTER III RESEARCH METHODOLOGY

Infrastructure Development and Economic Growth Nexus in Nigeria

bivariate analysis: The statistical analysis of the relationship between two variables.

STATS8: Introduction to Biostatistics. Overview. Babak Shahbaba Department of Statistics, UCI

Math 124: Module 2, Part II

Determinants and Status of Vaccination in Bangladesh

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations)

Transcription:

Analysing and Comparing the Final Grade in Mathematics by Linear Regression Using Excel and SPSS Elena Karamazova #, Teuta Jusufi Zenku *, Zoran Trifunov #3 Faculty of Computer Sciences Department of Mathematics and Statistics University "Goce Delcev" Stip Republic of Macedonia Faculty of Technical Sciences University "Mother Teresa" Skopje 000 Skopje, Republic of Macedonia 3 Faculty of Technical Sciences University "Mother Teresa" Skopje 000 Skopje, Republic of Macedonia Abstract: In this paper simple and multiple linear regression model is developed to analyze and compare the final grades of students from two Universities, University "Goce Delcev" - Stip, more specifically a group of students who studied in Kavadarci and students from "Mother Teresa" University in Skopje, for the subject of Mathematics. The models were based on the data of students' scores in three tests, st periodical exam, nd periodical exam and final examination. Statistical significance of the relationship between the variables has been provided. For obtaining our results some solvers like Excel and SPSS were used. Keywords: simple linear regression, multiple linear regression, st periodical exam, nd periodical exam, final examination. I. INTRODUCTION Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous variables: one variable, denoted x, is regarded as the predictor, explanatory, or independent variable. The other variable, denoted y, is regarded as the response, outcome, or dependent variable. Multiple linear regression can be used to analyze data from causal-comparative, correlational, or experimental research. In addition, it provides estimates both of the magnitude and statistical significance of relationships between variables. Multiple linear regression is one of the most widely used statistical techniques in educational research. Multiple linear regression is defined as a multivariate technique for determining the correlation between a response variable y and some combination of two or more predictor variables, X. In [], such a model is developed to predict anomalies of westward-moving intraseasonal precipitable water by utilizing the first through fourth powers of a time series of outgoing longwave radiation that is filtered for eastward propagation and for the temporal and spatial scales of the tropical intraseasonal oscillations. An independent and simpler compositing method is applied to show that the results of this multiple linear regression model provide better description of the actual relationships between eastward and westward moving intraseasonal modes than a regression model that includes only the linear predictor. In [] the application of regression models in macroeconomic analyses is emphasized. The particular situation approached is the influence of final consumption and gross investments on the evolution of Romania s Gross Domestic Product. II. SIMPLE AND MULTIPLE LINEAR REGRESSION MODEL A simple linear regression is carried out to estimate the relationship between a dependent variable, y, and a single explanatory variable, x, given a set of data that includes observations for both of these variables for a particular population. form is: y x () 0 where β s denote the population regression coefficients, and ε is a random error. A multiple linear regression analysis is carried out to predict the values of a dependent variable, y, given a set of k explanatory variables (x, x,.,x k ). form is: y x x x () 0 kk where β s denote the population regression coefficients, and ε is a random error. Excel and SPSS regression computer programs were used to determine the regression coefficients and analyse the data. ISSN: 3-5373 http://www.ijmttjournal.org Page 334

III. PROBLEM AND OBJECT OF STUDY The purpose of this study was to contribute to knowledge relating to the use of simple and multiple linear regression in educational research by establishing an appropriate linear regression model to analyze the relationship between variables as a final grade in the student exam in Mathematics (considered as a dependent or variable Y) depending on the st periodical exam and nd periodical exam (considered as independent or predictive X variables). The data was collected by a sample of 40 students, 0 of University Goce Delcev - Stip (UGD) and 0 from students of Mother Teresa University - Skopje (MTU). In order to determine the regression coefficients and to analyze the data, math applicative software was used. IV. EXCEL ANALYSIS OF DATA FROM MTU The equation of the multiple regression model is: Y.84 3.30X 0.436 X. for the multiple regression model is given in figure. The equation of the simple regression model for the X variable is: Y.538.863 X. for the simple regression model for the X variable is given in figure. The equation of the simple regression model for X variable is: Y.375 3.3 X. for the simple regression model for the X variable is given in figure3. V. SPSS ANALYSIS OF DATA FROM MTU In figure 4, 5 and 6 are given model summary for the simple regression model for the X variable, for simple regression model for the X variable and for multiple regression model respectively with SPSS for MTU. VI. INTERPRETING THE RESULTS FROM MTU From the analysis made using two application softwares we see the same results in the tables obtained. Some individual parameters are interpreted as follows: represents the extent of the relation between the dependent variables and two or more independent variables. The closer to one is, the greater is the connectivity. In our results we see that R> 0.93 in all cases, which means that the correlation between the Y variable and The multiple correlation coefficient R the X and X variables is relatively strong. The coefficient of determination R represents the variance of the variables interpreted by the model. The coefficient of determination serves as a measure of representation of the model. Value R 0. 87, therefore it can be said that found linear regression models are representative. Where, as we can see from tables the most representative is the multiple linear regression model. The statistical significance of the regression model is determined based on the value of the empirical ratio F, i.e. of the corresponding value p (so-called group test of the importance of the regression model). Where for our case of multiple model is 0.000000000003, i.e. with the level of risk p <0.0, we claim that at least one of the regression variables has a statistically significant impact on the final grade in Mathematics, respectively, the multiple linear regression model is statistically significant. In the previous case, the statistical significance of the overall regression model was assessed. The data found in the regression output tables provide information on the statistical significance of the respective regression coefficients (so-called relevant regression model significance test). The statistical significance of the regression coefficients is determined on the basis of t-test i.e., of the corresponding values P. For our cases this value is less than 0.05 so we conclude that in the linear regression models both two variables have an impact of statistical significance on the "final grade in Mathematics" variable. ISSN: 3-5373 http://www.ijmttjournal.org Page 335

SUMMARY OUTPUT Multiple R 0,9777 R 0,9559 0,9507 Error 3,7763 Observations 0,0000 Df SS MS F Significance F Regression 556,37 68,9 84,30 0,000000000003 Residual 7 4,43 4,6 Total 9 5498,80 Error t Stat P-value Intercept,837,35,0696 0,998 -,0 6,7884 -,0 6,7884 X Variable 3,300 0,5650 5,770 0,0000,0380 4,40,0380 4,40 X Variable -0,4359 0,6469-0,6738 0,5095 -,8007 0,990 -,8007 0,990 Fig. Regression Statistic for the multiple regression model SUMMARY OUTPUT Multiple R 0,977 R 0,9547 0,95 Error 3,786 Observations 0,0000 Df SS MS F Significance F Regression 549,8984 549,8984 379,6608 0,000000000000 Residual 8 48,906 3,879 Total 9 5498,8000 Error t Stat P-value Intercept,538,7980 0,8554 0,4036 -,394 5,355 -,394 5,355 X Variable,869 0,469 9,4849 0,0000,554 3,75,554 3,75 Fig. Regression Statistic for the simple regression model for the X variable ISSN: 3-5373 http://www.ijmttjournal.org Page 336

SUMMARY OUTPUT Multiple R 0,9334 R 0,87 0,8640 Error 6,740 Observations 0,0000 df SS MS F Significance F Regression 4790,733 4790,733,696 0,0000000093 Residual 8 708,567 39,366 Total 9 5498,8000 Error t Stat P-value Intercept -,3747 3,384-0,406 0,6894-8,4847 5,7353-8,4847 5,7353 X Variable 3,33 0,838,036 0,0000,5350 3,777,5350 3,777 Fig.3 Regression Statistic for the simple regression model for the X variable Summary R R.977 a,955,95 3,7858 a. Predictors: (Constant), x a Sum of s Df Mean F Sig. Regression 549,898 549,898 379,66.000 b Residual 48,90 8 3,88 Total 5498,800 9 b. Predictors: (Constant), x a ized Unstandardized (Constant),538,798,855,404 x,863,47,977 9,485,000 Fig. 4 Summary with SPSS for the simple regression model for the X variable for MTU ISSN: 3-5373 http://www.ijmttjournal.org Page 337

Summary R R.933 a,87,864 6,7396 a. Predictors: (Constant), x a Sum of s df Mean F Sig. Regression 4790,73 4790,73,696.000 b Residual 708,57 8 39,363 Total 5498,800 9 b. Predictors: (Constant), x a ized Unstandardized (Constant) -,375 3,384 -,406,689 x 3,3,84,933,03,000 Fig. 5 Summary with SPSS for the simple regression model for the X variable for MTU R R.978 a,956,95 3,77630 a. Predictors: (Constant), x, x a Sum of s df Mean F Sig. Regression 556,37 68,86 84,99.000 b Residual 4,48 7 4,60 Total 5498,800 9 b. Predictors: (Constant), x, x ized Unstandardized (Constant),84,35,070,300 x 3,30,565,0 5,77,000 x -,436,647 -,30 -,674,50 Fig. 6 Summary for the multiple regression model with SPSS for MTU VII. EXCEL ANALYSIS OF DATA FROM UGD The equation of the multiple regression model is Y.96.754X.43X. for the multiple regression model is given in figure 7. ISSN: 3-5373 http://www.ijmttjournal.org Page 338

The equation of the simple regression model for the X variable is Y 6.547.673 X. for the simple regression model for the X variable is given in figure 8. The equation of the simple regression model for the X variable is Y 0.947 3.67 X. for the simple regression model for the X variable is given in figure 9. VIII. SPSS ANALYSIS OF DATA FROM UGD In figure 0, and are given model summary for the simple regression model for the X variable, for simple regression model for the X variable and for the multiple regression model respectively with SPSS for U. IX. INTERPRETING THE RESULTS FROM UGD From the analysis made using two application softwares we see the same results in the tables obtained. Individual parameters are interpreted as follows: The multiple correlation coefficient R represents the extent of the relation between the dependent variables and two or more independent variables. The closer to one is, the greater is the connectivity. In our results we see that R> 0.94 in all cases, which means that the correlation between the Y variable and the X and X variables is relatively strong. The coefficient of determination R represents the variance of the variables interpreted by the model. The coefficient of determination serves as a measure of representation of the model. Value R 0. 87, therefore it can be said that found linear regression models are representative. Where, as we can see from tables the most representative is the multiple linear regression model. The statistical significance of the regression model is determined based on the value of the empirical ratio F, i.e. of the corresponding value p (so-called group test of the importance of the regression model). Where for our case of multiple model is 0.000000000005, i.e. with the level of risk p <0.0, we claim that at least one of the regression variables has a statistically significant impact on the final grade in Mathematics, respectively, the multiple linear regression model is statistically significant. In the previous case, the statistical significance of the overall regression model was assessed. The data found in the regression output tables gives us information on the statistical significance of the respective regression coefficients (so-called relevant regression model significance test). The statistical significance of the regression coefficients is determined on the basis of t-test i.e., of the corresponding values P. For our cases this value is less than 0.05 so we conclude that in the linear regression models both two variables have an impact of statistical significance on the "final grade in Mathematics" variable. X. CONCLUSIONS From the analysis above, our multiple regression model as well as simple linear regression models for predicting and analyzing the final grade in Mathematics, Y, from both Universities are useful and adequate. For further work, we can consider developing and studying similar models in the field of education, social sciences, economics, etc., adding different variables. ISSN: 3-5373 http://www.ijmttjournal.org Page 339

SUMMARY OUTPUT Multiple R 0,976 R 0,9530 0,9475 Error 3,765 Observations 0,0000 Df SS MS F Significance F Regression 489,543 445,775 7,583 0,000000000005 Residual 7 4,0069 4,769 Total 9 53,5500 Error t Stat P-value -,5343 7,4555 Intercept,9606,305,3897 0,86 -,5343 7,4555 X Variable,7538 0,3360 5,95 0,000,0449,467,0449,467 X Variable,49 0,409 3,0375 0,0074 0,3796,063 0,3796,063 Fig. 7 for the multiple regression model SUMMARY OUTPUT Multiple R 0,963 R 0,976 0,935 Error 4,5449 Observations 0,0000 Df SS MS F Significance F Regression 4760,7397 4760,7397 30,4759 0,0000000000 Residual 8 37,803 0,656 Total 9 53,5500 Error t Stat P-value Intercept 6,5465,407 3,058 0,0068,049,0440,049,0440 X Variable,673 0,76 5,84 0,0000,3033 3,043,3033 3,043 Fig. 8 for the simple regression model for the X variable ISSN: 3-5373 http://www.ijmttjournal.org Page 340

SUMMARY OUTPUT Multiple R 0,9369 R 0,8778 0,870 Error 5,9030 Observations 0,0000 Df SS MS F Significance F Regression 4505,33 4505,33 9,94 0,0000000095 Residual 8 67,77 34,8460 Total 9 53,5500 Error t Stat P-value Intercept 0,9468 3,849 0,88 0,7765-5,9545 7,848-5,9545 7,848 X Variable 3,670 0,785,3707 0,0000,588 3,75,588 3,75 Fig. 9 for the simple regression model for the X variable Summary R R.963 a,98,94 4,54490 a. Predictors: (Constant), x a Sum of s df Mean F Sig. Regression 4760,740 4760,740 30,476.000 b Residual 37,80 8 0,656 Total 53,550 9 b. Predictors: (Constant), x a Unstandardized ized (Constant) 6,547,4 3,058,007 x,673,76,963 5,8,000 Fig. 0 Summary with SPSS for the simple regression model for the X variable ISSN: 3-5373 http://www.ijmttjournal.org Page 34

Summary R R.937 a,878,87 5,90305 a. Predictors: (Constant), x a Sum of s Df Mean F Sig. Regression 4505,3 4505,3 9,9.000 b Residual 67,8 8 34,846 Total 53,550 9 b. Predictors: (Constant), x a ized Unstandardized (Constant),947 3,85,88,776 x 3,67,79,937,37,000 Fig. Summary with SPSS for the simple regression model for the X variable for UGD Summary R R.976 a,953,948 3,765 a. Predictors: (Constant), x, x a Sum of s df Mean F Sig. Regression 489,543 445,77 7,58.000 b Residual 4,007 7 4,77 Total 53,550 9 b. Predictors: (Constant), x, x a Unstandardized ized (Constant),96,30,390,83 x,754,336,63 5,9,000 x,43,409,368 3,038,007 Fig. Summary with SPSS for for the multiple regression model for UGD ISSN: 3-5373 http://www.ijmttjournal.org Page 34

REFERENCES [] C. Anghelache, M.G. Anghel, L. Prodan, C. Sacală and M. Popovici, Multiple Linear Regression Used in Economic Analyses, Romanian Statistical Review Supplement, Vol. 6. Issue 0, pp. 0-7, (October), 04 [] C. Anghelache, A. Manole and M. G. Anghel, Analysis of final consumption and gross investment influence on GDP multiple linear regression model, Theoretical and Applied Economics, Volume XXII No. 3(604), pp. 37-4, 05 [3] A.C Cameron, and P.K Trivedi, Regression Analysis of Cout Data, Cambridge, 998. [4] F. Hoxha, Metoda tё analizёs numerike, Tiranё, 008. [5] D. Jaksimovic, Zbirka Zadataka iz poslovne statistike, Beograd, 004. [6] Sh. Leka, Teoria e probabiliteteve dhe statistika matematike, Tiranë, 004. [7] J. T. McClove, P. George Benson, Terry Sincich, Statistics for Business and Economics, USA, 998. [8] M. Papiq, Statistika e aplikuar në MS EXEL, Prishtinë, 007. [9] L. Puka, Probabliliteti dhe Statistika e zbatuar, Tiranё, 00. [0] J. M. Rojo, Regresion lineal multiple, Madrid, 007. [] P. E. Roundy and W. M. Frank, Applications of a Multiple Linear Regression to the Analysis of Relationships between Eastward and Westward Moving Intraseasonal Modes, Journal of the atmospheric sciences, pp. 304-3048, 004 [] M. Shakil, A multiple Linear regression to Predict the Student s Final Grade in a Mathematics Class, [3] A. F. Siegel, Pratical Statistics, USA, 996. [4] D. S. Wilks, Statistical Methods in the Atmospheric Sciences: An Introduction, International Geophysical Series, Vol. 59, Academic Press, pp. 467, 995 Appendix Fig. 3 Appendix with Data for MTU ISSN: 3-5373 http://www.ijmttjournal.org Page 343

Fig. 4 Appendix with Data for UGD ISSN: 3-5373 http://www.ijmttjournal.org Page 344