Interpretation of Data and Statistical Fallacies

Similar documents
CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to

The Human Side of Science: I ll Take That Bet! Balancing Risk and Benefit. Uncertainty, Risk and Probability: Fundamental Definitions and Concepts

Psychology Research Process

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology

Statistics Concept, Definition

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

Sheila Barron Statistics Outreach Center 2/8/2011

Descriptive Statistics Lecture

Basic Biostatistics. Dr. Kiran Chaudhary Dr. Mina Chandra

I. Introduction and Data Collection B. Sampling. 1. Bias. In this section Bias Random Sampling Sampling Error

Unit 1 Exploring and Understanding Data

Avoiding Statistical Jabberwocky

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Psychology Research Process

9 research designs likely for PSYC 2100

Chapter 1 Review Questions

Test Validity. What is validity? Types of validity IOP 301-T. Content validity. Content-description Criterion-description Construct-identification

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 1.1-1

Welcome to OSA Training Statistics Part II

Collection of Biostatistics Research Archive

Role of Statistics in Research

Helping Your Asperger s Adult-Child to Eliminate Thinking Errors

Carrying out an Empirical Project

Students will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of

Quality Digest Daily, March 3, 2014 Manuscript 266. Statistics and SPC. Two things sharing a common name can still be different. Donald J.

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

PSY 4960/5960 Science vs. Pseudoscience

Reliability and Validity

Chapter 02. Basic Research Methodology

Thinking Critically with Psychological Science

On the purpose of testing:

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months?

Statistics Mathematics 243

Confidence Intervals On Subsets May Be Misleading

Generalization and Theory-Building in Software Engineering Research

20. Experiments. November 7,

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER

One-Way ANOVAs t-test two statistically significant Type I error alpha null hypothesis dependant variable Independent variable three levels;

First Hourly Quiz. SW 430: Research Methods in Social Work I

Introduction to Statistical Data Analysis I

Political Science 15, Winter 2014 Final Review

Chapter 2--Norms and Basic Statistics for Testing

2 Critical thinking guidelines

Chapter 1: Exploring Data

Handout 16: Opinion Polls, Sampling, and Margin of Error

Chapter 5: Field experimental designs in agriculture

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE

Chapter 11 Nonexperimental Quantitative Research Steps in Nonexperimental Research

AP Statistics Practice Test Ch. 3 and Previous

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points.

Chapter 3: Examining Relationships

Rational Choice Theory I: The Foundations of the Theory

Color naming and color matching: A reply to Kuehni and Hardin

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Are You a Professional or Just an Engineer? By Kenneth E. Arnold WorleyParsons November, 2014

You can t fix by analysis what you bungled by design. Fancy analysis can t fix a poorly designed study.

Prentice Hall Connected Mathematics 2, 8th Grade Units 2006 Correlated to: Michigan Grade Level Content Expectations (GLCE), Mathematics (Grade 8)

Chapter-2 RESEARCH DESIGN

Introduction to biostatistics & Levels of measurement

3.2 Least- Squares Regression

Pitfalls in Linear Regression Analysis

Introduction: Statistics, Data and Statistical Thinking Part II

Book review of Herbert I. Weisberg: Bias and Causation, Models and Judgment for Valid Comparisons Reviewed by Judea Pearl

Statistical Power Sampling Design and sample Size Determination

PSYCHOLOGY 300B (A01) One-sample t test. n = d = ρ 1 ρ 0 δ = d (n 1) d

MBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION

Chapter 3: Describing Relationships

1.4 - Linear Regression and MS Excel

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

STATISTICS AND RESEARCH DESIGN

Objectives. Data Collection 8/25/2017. Section 1-3. Identify the five basic sample techniques

AP Psychology -- Chapter 02 Review Research Methods in Psychology

Psychologist use statistics for 2 things

Lesson 11.1: The Alpha Value

Statistical Methods Exam I Review

An Introduction to Statistical Thinking Dan Schafer Table of Contents

the examples she used with her arguments were good ones because they lead the reader to the answer concerning the thesis statement.

VARIABLES AND MEASUREMENT

Lesson 8 Descriptive Statistics: Measures of Central Tendency and Dispersion

6. Unusual and Influential Data

Problem Set 5 ECN 140 Econometrics Professor Oscar Jorda. DUE: June 6, Name

STA Module 9 Confidence Intervals for One Population Mean

Chapter 2 Psychological Research, Methods and Statistics

7) Find the sum of the first 699 positive even integers. 7)

DAY 2 RESULTS WORKSHOP 7 KEYS TO C HANGING A NYTHING IN Y OUR LIFE TODAY!

EXERCISE: HOW TO DO POWER CALCULATIONS IN OPTIMAL DESIGN SOFTWARE

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

Chapter 13. Experiments and Observational Studies

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

INADEQUACIES OF SIGNIFICANCE TESTS IN

Never P alone: The value of estimates and confidence intervals

Effectiveness of planned teaching programme on knowledge regarding ill effects of alcohol consumption.

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN)

* In math: proof (nothing else accepted) * In Science, medicine, etc.: replicability, randomized control trial, meta-analysis, systematic review

Section 3.2 Least-Squares Regression

Organizational Behaviour

Unit 5. Thinking Statistically

Transcription:

ISSN: 2349-7637 (Online) RESEARCH HUB International Multidisciplinary Research Journal Research Paper Available online at: www.rhimrj.com Interpretation of Data and Statistical Fallacies Prof. Usha Jogi Lecturer, Department of Business Administration, Shri G. K. and C. K. Bosamiya Arts and Commerce College, Jetpur, Gujarat (India) Email: usha.gkck@gmail.com Abstract: Although statistics is essential to almost all sciences-social, physical, and natural and is very widely used in almost all spheres of huminity. It is not without limitations. The most significant limitation of statistics is the misuse of it. The use of statistic by the experts who are well experienced and skilled can analyze and interpreted the statistical data in correct way. Wrong interpretation of the data results in statistical fallacies which may arise in the collection, presentation, analysis, and interpretation of data. This paper throws light on the factors which give rise to misinterpretation of data and statistical fallacies. The study of these factors leads to minimize the statistical fallacies. Utmost care and precaution should be taken for the interpretation of statistical data. Statistics should not be used as a blind man uses a lamp-post for support instead of illumination. Statistical interpretation depends not only on statistical ideas but also on ordinary clear thinking. Clear thinking is not only essential in interpreting statistics but is sufficient in the absence of specific statistical knowledge. Statistics neither proves nor disproves anything. It is merely a tool which, if rightly used may prove extremely useful but if misused by inexperienced, unskilled, and dishonest statisticians might lead to very fallacious conclusions Statistics are like clay of which you can make a god or a devil as you please. The use of statistics by the experts who are well experienced and skilled in the analysis and interpretation of statistical data reduce the chances of mass popularity of this important science. Science of statistic is the useful servant but only of great value to those who understand its proper use. Interpretation of statistical data involves: I. Summarization and analysis of data. II. Prediction or inference. I. MEANING AND NEED Interpretation of data means to draw inference on the basis of colleted data after proper statistical analysis. The interpretation can be true if the collection, presentation analysis is properly done. Sometime wrong interpretation of the data may arise which leads to statistical fallacies. It may happen due to presentation, collection, analysis of the data wrongly. II. IMPORTANT FACTORS FOR MIS-INTERPRETATION OF DATA OR STATISTICAL FALLACIES The factors giving rise to misinterpretation of the data or statistical fallacies are; I. Bias II. Inconsistency in definition III. False generalization IV. Comparison between unlike components V. Wrong interpretation of statistical measures VI. Wrong interpretation of index components of time series VII. Technical errors VIII. Errors in selection of data distrust of statistic A. BIAS Bias is very common in statistical investigation and it leads to wrong conclusions. In any statistical investigation biased errors may be due to, Bias on the part of investigator whose personal prejudices are affecting the results of the enquiry. Bias in the measuring instrument in recording the observations. Bias due to wrong collection of the data and wrong formula used for the analysis of the data. Page 1 of 6

Bias in the technique of approximation: while rounding off each individual value is either approximated to next highest or lowest number so that all the errors move in the same direction. e.g. if the figures are 405 and 495 they are recorded as(rounded off) to 400 and 500 respectively. B. INCONSISTENCY IN DEFINITION In order the comparisons of phenomenon to be valid if it is necessary that the definitions of the object should remain constant throughout. If there is need of change in definition a footnote should be given so as to make valid comparisons. For Example, To construct cost of living index number the proper unit for enumeration is household. It should be clear whether household consist of a family blood relations or people taking food in common kitchen or all the persons living in the house or the persons listed in the ration card only. The concept of household for enquiry is to be decided in advance. So, in order to have meaningful comparison of the results the same pattern of classification should adopt throughout the analysis and also for further enquiries in the same subject. C. FALSE GENERALIZATION Many times we commit errors in drawing conclusions or making generalizations on the basis of data which is not representative of the population. For Instance, a. The number of car accidents committed in a city in a particular year by women drivers is 10 while those committed by man drivers is 40. Hence, women are safe drivers. This statement is obviously wrong because nothing is given about total number of men and women drivers in the city in the given year. b. It has found that 25% of the surgical operations by surgeon X are successful. This conclusion is not true. Out of 4 surgical operations if 3 operations on particular day are unsuccessful the 4th must be success is not true. It may be happen that 4th operation is also unsuccessful or it may happen that all 4 operations are successful. The above statement means that if the numbers of operations are larger we should expect 25% of the operations to be successful. c. Incomplete data causes to fallacious of conclusion. Let us consider the score of these two students A and B. Test-1 Test-2 Test-3 Average Score A s Score 50 60% 70% 60% B s Score 70% 60% 50% 60% If we consider average score which is 60% in each case we will conclude that the level of intelligence of the two students is same at the end of the year. But this is wrong conclusion. If we observe carefully A has improved constantly while B has deteriorated consistently. D. IMPROPER COMPARISONS The comparisons will be meaningful only if the things are really a likeotherwise the conclusions drawn will be wrong. For instance to make valid comparisons of index number it is essential that the commodities are of same quality or grade. Both the index numbers should be calculated by the same formula. If one index number is calculated by Laspeyre s formula then both index numbers are not comparable. E. WRONG INTERPRETATION OF STATISTICAL MEASURES The different type of measures like averages, dispersion, skewness etc. calculated from the collected data are many times misused to present the information for personal motives which misguide the public. a. Wrong interpretation of averages 1. Let us consider the industry the average salary of the workers is say Rs.5, 500 per week. If the salaries of few bosses are added the average wage per worker comes out to Rs. 7,500. Suppose if we say that average salary of workers is Rs.7, 500 Page 2 of 6

per week. It shows very good impression and it shows that the living standard of workers is good but the real picture is entirely different. So, for extreme observations the arithmetic mean gives wrong interpretation which leads to misleading conclusions. 2. Vishal gets pocket money allowance of Rs.12 per day. He felt that this was less amount and he enquired about the pocket money of his friends and he get the following data including Vishal s allowance in Rs. 12, 18, 10, 5, 25, 20, 22, 15, 10, 10, 15, 13, 20, 18, 10, 15, 10, 18, 15, 12, 15, 10, 15, 10, 12, 18, 20, 5, 8, 20. He presented this data to his father and requested for increase in his allowance. His father was statistician replied that Vishal s amount of allowance was more than average amount. Comment on father s statement. SOLUTION The frequently distribution of the daily allowance is as follows. Daily allowance (in Rs.) Tally Marks Frequency fx 5 II 2 10 8 I 1 8 10 IIIIIII 7 70 12 III 3 36 13 I 1 13 15 IIIIII 6 90 18 IIII 4 72 20 IIII 4 80 22 I 1 22 25 I 1 25 The average daily allowances are given by: Total N=30 fx=426 = =14.20 Since maximum frequency is Rs.7 for daily allowances Rs.10 which gives mode. Hence, modal value is Rs.10 Vishal s allowances are Rs.12 which is less than arithmetic mean X. But his father pointed out that his amount was more than average amount by referring to mode at the average which is Rs.10. b. Wrong interpretation of measures of dispersion and skewness Let us have two series of height in meters and weight in kgs for 10 persons the range for series of height is less than that series of weight. It may be concluded that the series of weight is more variable than the series of height. Without calculating S.D. of both the series however this is wrong. Calculate coefficient of variation (C.V.) for both the series. S.D. 100 σ C.V. = 100* ------------ = -------------- Which is pure number independent of units of measurement, distribution with smaller C.V. is said to be more homogenous or less variable than the other and the series with greater C.V. is said to be more variable than the others. Mean After training the average daily wage in a factory had increased from Rs.8 to Rs.12 and S.D. had increased from Rs.2 to Rs.2.50. After training the wage has become higher and more uniform. Comment on the statement. : It is given that after training the average hourly wages of workers have increased from Rs.8 to Rs.12. this implies that the total wages received per hour by all the workers together have increased but we cannot conclude that the wage of each individual has increased. Let us calculate coefficient of variation (C.V.) of the wages of workers to know about uniformity. Page 3 of 6

C.V. of wages before training - C.V. = 100*S.D. 100 *2 ------------ = -------------- = 25 Mean 8 C.V. of wages after training 100*S.D. 100 *2.5 C.V. = ------------ = -------------- = 20.93 Mean 8 Since C.V. wages after training is less than C.V. of wages before training. It may be concluded that the wages before more uniform after the training. c. Wrong interpretation of correlation, co-efficient, and regression coefficients Sometimes correlation coefficient may be wrongly used to study the relationship between two series which are really independent of each other. E.g. correlation may be studied between. 1. The series of height and income of individuals over a period of time. 2. The series of intelligence and size of shoe. If we observe higher degree of correlation coefficient between the size of the shoe and the intelligence which means that larger the shoe size more intelligent the person is. The conclusion is absurd or wrong because two series are really independent such a correlation is called chance correlation or nonsense correlation. Example-1 Suppose that the correlation coefficient between the two series of marks of 10 students in mathematics and statistics is A preferable error of correlation coefficient is: 1-r 2 P.E.(r) = 0.6745 * ---------------- Significance of r: r = 0.9 and 6 P.E. = 6 * 0.0405 = 0.2430 1-0.81 = 0.6745 * ---------------- 10 0.6745 * 0.19 = ------------------------- 10 = 0.0405 Since r is much greater than 6 P.E. (r) if hence r is highly significant. Remarks: r id significant we can say that higher the mark of student in mathematics higher is the score in the statistics and vice-versa. However it does not mean that all the students who are good in mathematics are also good in statistics and vice-versa. From the above example it is clear that The coefficient of correlation expresses the relationship between two series and not between the individual items of the series. Example- 2 If the correlation coefficient between two variables is zero then the variables are independent comment. Page 4 of 6

The statement is wrong rxy = 0 does not mean that x and y are independent. It simply means that there is absence of linear relationship between x and y. the variables may be related in same other form quadratic, logarithmic, exponential or trignometric form. Example-3 A correlation between the variables has a value r = 0.6 and correlation between other two variables is 0.3. Does it mean that the first correlation is twice stronger than the second? Discuss the above closeness of relationship. It is wrong to say that correlation in the first case is twice as strong as correlation in the second case. r1 = 0.6 coefficient of determination (r1)2 = 0.36 r2 = 0.3 coefficient of determination (r2)2 = 0.09 Thus the correlation in the first case (r1 = 0.6) is 4 time a strong as the correlation in second case r2 = 0.3 the closeness of relationship between two variables is not proportional. Example-4 Give the comments on the following statement. rxy = 0.9, byx = 2.04, bxy = -3.2 Regression coefficient cannot have different signs. Here, byx is positive and bxy is negative which is not positive. Again rxy, byx, bxy must have the same sign. Sometime index numbers are wrongly interpreted. F. WRONG INTERPRETATION OF INDEX COMPONENTS OF TIME SERIES For example, If the consumer price index is higher for Delhi than that of Mumbai does it mean that Delhi is more expensive than Mumbai. If the consumer price index is higher for Delhi than that of Mumbai generally it can be said that Delhi is more expensive than Mumbai but this is not absolutely correct because 1. Due to the variations in the conclusion pattern even with the same prices the living at one place may be expensive than the living at the other place. 2. The sources of obtaining the quotations of prices and quantities consumed of variance things may be different. This may make all differences. The various components of time series may be analyzed and interpreted incorrectly. For Example, We may fit linear trend equation to the given set of data by the principle of least squares and use it to and corresponding trend values and future estimates these value and estimates will be valid if the data really exhibits a linear trend. The linear trend is commonly used but it is rarely observed in economic and business data. G. TECHNICAL ERRORS Some of the technical errors in statistical work which result in wrong conclusions are : 1. The errors due to improper choice of sampling technique for the selection of sample e.g. we may use random sampling where specified sampling may be appropriate. 2. Size of the sample is inadequate. 3. Error due to improper choice of the statistical formula. e.g. we may use media where made is more appropriate or the use of arithmetic mean where geometric mean is appropriate. 4. The errors due to bias in the estimation method. This is due to improper choice of estimation technique. 5. The error committed in the use of ratio and percentage. e.g. if the price of commodity increases from Rs. 25 to Rs.75 during a particular period the one generally concludes that there is 300 % Increase in price during the given period. This is wrong the actual increase is 200% Page 5 of 6

Increase in Rs is (Rs.75 Rs.25) = Rs.50 50 Hence increase in Rs.100 = ---------------- * 100 = 200 25 H. EFFECT OF WRONG INTERPRETATION OF DATA - DISTRUST OF STATISTIC The wrong interpretation of data promotes distrust of statistics. The improper use of statistics tools by irresponsible, dishonest people with an improper statistical bend of mind has led to the public distrust in statistics due to which public loses it belief faith and confidence in the science of statistics and starts blaming it. The reason behind blaming may be; a. Figures are innocent and believable but they do not have the lable of quality on their face. b. Arguments are put forward to find certain results which are not true by making use of inaccurate figures or by using incomplete data and hence, distorting the truth. III. CONCLUSION Hence, if statistics and its tools are misused. The fault does not lie with the science of statistics but it is the people who misuse it, are to be blamed. Utmost care and precaution should be taken for the interpretation of statistical data. Statistical should not be used as a blind man uses a lamp-post for support instead of illumination. Page 6 of 6