Dr. SANDHEEP S. (MBBS MD DPH) Dr. BENNY PV (MBBS MD DPH) (DATA ANALYSIS USING SPSS ILLUSTRATED WITH STEP-BY-STEP SCREENSHOTS)
Biostatistics in a Nut Shell For Medical Researchers
Publishing-in-support-of,
EDUCREATION PUBLISHING
RZ 94, Sector - 6, Dwarka, New Delhi - 110075
Shubham Vihar, Mangla, Bilaspur, Chhattisgarh - 495001
Website: www.educreation.in

Copyright, Author

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form by any means, electronic, mechanical, magnetic, optical, chemical, manual, photocopying, recording or otherwise, without the prior written consent of its writer.

ISBN: 978-93-85247-19-4
Price: ₹ 570.00

The opinions/ contents expressed in this book are solely of the author and do not represent the opinions/ standings/ thoughts of Educreation. The book is released by using the services of self-publishing house.

Printed in India
BIOSTATISTICS IN A NUT SHELL FOR MEDICAL RESEARCHERS
(Data analysis using SPSS illustrated with step-by-step screenshots)
Dr SANDHEEP SUGATHAN (MBBS MD DPH) & Dr BENNY PV (MBBS MD DPH)
EDUCREATION PUBLISHING (Since 2011)
www.educreation.in
About The Author

Dr Sandheep Sugathan obtained his MBBS, MD and DPH from Govt. Medical College, Trivandrum, India, and is currently an academic faculty member teaching Epidemiology, Research Methodology and Biostatistics at University Kuala Lumpur Royal College of Medicine Perak, Malaysia. He trains medical researchers from various specialities in proposal development, sample size determination, data management, systematic reviews and journal writing. He is well experienced in systematic reviews and meta-analysis. He is the author of many peer-reviewed research publications and a member of the editorial boards of many journals. Dr Sandheep Sugathan has received training in Biostatistics and Research Methodology from University Science Malaysia and Christian Medical College, Vellore, India.
About The Co-Author

Dr Benny PV obtained his MBBS, MD and DPH from Govt. Medical College, Trivandrum, India, and is currently the Professor of Community Medicine and Epidemiology at Sree Gokulam Medical College, Kerala, India. He is a consultant to the Govt. of Kerala, India, for the Revised National Tuberculosis Control Program and HIV AIDS. He is the Chief Editor of an international journal (International Journal of Preventive and Therapeutic Medicine). He is currently involved in clinical and epidemiological research. Dr Benny is Secretary and Treasurer of the Indian Medical Association's popular public magazine Nammude Arogyam.
Preface

As medical doctors, we prepared this book based on our teaching and research experience over the last 14 years. Medical research has become a vital component of clinical and academic practice. Skills in data processing and analysis can help every medical researcher to become empowered and confident in research. We sensed a growing need and demand for data management skills among researchers from various specialities.

This book consists of 2 parts. Part 1 consists of 8 chapters, in which we have tried to cover all the must-know areas in Biostatistics. Sample size calculation, tests of normality and non-parametric tests of statistical significance are the main areas where many researchers get confused and lost. In this book we have explained the steps of these topics in detail with screenshots. Formulas for sample size determination in different research scenarios are explained, so that researchers who do not have access to sample size software can also calculate sample size confidently. Knowledge of basic biostatistics can help researchers to analyse, interpret and present their research results in a better way.

Part 2 of this book, on managing your data using SPSS, was carefully prepared keeping in mind the steps through which each researcher passes. All the steps are illustrated using screenshots, to make learning easy and enjoyable.

We express our sincere gratitude to all our students, medical researchers, other healthcare researchers and all other stakeholders for helping us to identify the key areas to be explored during the writing of this book.

Screenshots used in this book are from SPSS Version 17, Microsoft Excel 2013 and EpiInfo version 7.

We hope our initiative helps you to acquire skills, confidence and competencies in analysing, presenting and publishing your data. We wish you all good luck.

Dr Sandheep Sugathan & Dr Benny PV
Content List

01. Introduction to Medical Statistics
    Main uses of statistical methods.
02. Data and Scales of Measurements
    Data, types of data, Sources of data. Sources of Primary and Secondary data, Classification of data. Scales of Measurement: Nominal, Ordinal, Interval, Ratio.
03. Descriptive Statistics
    Components. Measures of Central Tendency or Central value. Measures of Dispersion or Measures of spread. Distribution of Data. Normal or Gaussian distribution. Negatively and Positively Skewed Data. Normal distribution curve and Chebyshev's Theorem. Presentation of Data: Tables and diagrams.
04. Standard Error and 95% Confidence Intervals
    Calculation of Standard Error and 95% Confidence Interval of: Mean. Proportion. Difference between Means. Difference between Proportions.
05. Tests of Normality
    Shapiro Wilk Test, Q-Q Plots. Use of Histogram, Normality curve & Box-Whisker Plot. Using Z scores of Skewness and Kurtosis.
06. Inferential Statistics
    Definition. Null & Alternative Hypotheses. Parametric & Non Parametric Tests. P Value, Alpha and Beta errors. T tests and ANOVA. Wilcoxon Signed Rank / Mann Whitney U Test. Kruskal Wallis Test. Chi Square Test with steps in calculation.
07. Sampling and Sample Size Calculation
    Probability & Non-Probability Sampling methods. Random Sampling: Simple, Systematic & Stratified. Multistage Sampling. Cluster Sampling. Various non probability sampling methods. Sample Size Calculation for various study designs. Calculation using EpiInfo (step-by-step screenshots).
08. Linear Regression Analysis
    Simple Linear Regression.
09. Steps in Data Entry for Medical Researchers
    Data Cleansing, Coding of Data. Entering Data Using Various Methods. Editing the Variable Characteristics in Excel. Importing Data to SPSS. Editing the Variable Characteristics in SPSS. Converting Continuous Data to Categorical Data in SPSS. Computing Variables in SPSS.
10. Descriptive Statistics of Continuous Variables Using SPSS
    Different Methods in Descriptive Statistics. Test of Normality: Shapiro Wilk Test & Q-Q Plot Test. Other Methods of Test of Normality.
11. Descriptive Statistics of Categorical Variables Using SPSS
12. Study of Association Between Two Categorical Variables - Chi Square Test Using SPSS
    Pearson's Chi Square Value. Interpretation of findings and Fisher's Exact test.
13. Comparison Between Two or More Means Using SPSS
    One Sample T Test, Independent Samples T Test. Paired T Test. One way ANOVA Test.
14. Non Parametric Tests Using SPSS
    Mann Whitney U Test. Wilcoxon Signed Rank Test. Kruskal Wallis Test.
15. Correlation Analysis Using SPSS
    Types of Correlation Analysis. How to do Correlation Analysis. Drawing Scatter Diagram with line of fit.
16. Linear Regression Analysis Using SPSS
    Simple Linear Regression, Durbin Watson Statistics. Interpretation of Results.
PART 1 ESSENTIAL BIOSTATISTICS
CHAPTER 1 INTRODUCTION TO MEDICAL STATISTICS

Statistics is the science of collection, compilation, analysis, interpretation and presentation of data. The application of the principles of statistics in medical research is known as BIOSTATISTICS or MEDICAL STATISTICS.

BIOSTATISTICS includes the scientific methods of collecting, processing, reducing, presenting, analyzing and interpreting health related data, and of making inferences and drawing conclusions from numerical data. It is a major tool in epidemiology.

MAIN USES OF STATISTICAL METHODS are
a. Improving the quality of medical research by helping in
- Critically appraising the available literature / evidence.
- Selecting an appropriate study design.
- Choosing appropriate sampling methods, which help to get a representative sample from the sampling frame.
- Calculating an adequate sample size.
- Constructing a questionnaire ensuring reliability and validity.
b. Describing data using descriptive statistics.
c. Presenting data using tables and graphs.
d. Analyzing data and drawing conclusions from such analysis: this involves analytical methods using inferential statistics.
e. Calculating rates, ratios and proportions of health data to make it more meaningful and concise.
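As an illustration of item (e), the short Python sketch below computes a ratio, a proportion and a rate. The counts and the variable names are hypothetical and chosen only for illustration. The definitions used (ratio as one count divided by another, proportion as a part divided by the whole, rate as events per 1000 population) are the usual textbook ones and are assumptions here, since this chapter does not define them.

```python
# Hypothetical counts for one district in one year (illustrative only).
population = 50_000
male_cases = 120
female_cases = 80
total_cases = male_cases + female_cases

# Ratio: one quantity divided by another; the numerator is not part of the denominator.
ratio_male_to_female = male_cases / female_cases

# Proportion: a part divided by the whole; always lies between 0 and 1.
proportion_male = male_cases / total_cases

# Rate: number of events related to the population at risk, here per 1000 persons.
rate_per_1000 = total_cases * 1000 / population

print("Male to female ratio:", ratio_male_to_female)
print("Proportion of cases that are male:", proportion_male)
print("Cases per 1000 population:", rate_per_1000)
```

Expressing the 200 raw cases as 4 per 1000 population makes comparison with districts of a different size possible, which is the sense in which such measures make health data more meaningful and concise.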
CHAPTER 2 DATA AND SCALES OF MEASUREMENTS

DATA
Data consists of discrete observations of events collected from health care systems or institutions. Data need to be transformed into INFORMATION by reducing them, summarizing them and adjusting them for variations such as the age and sex composition of the population, so that comparisons over time and place are possible.

TYPES OF DATA
Primary data is collected directly from the patient, by interview, medical examination, laboratory techniques or a combination of these. It is collected by the investigator conducting the research.
Secondary data is taken from records that have already been collected and stored. It is collected by someone other than the researcher.

SOURCES OF PRIMARY DATA IN RESEARCH
A. Population surveys: these include surveys related to any aspect of health such as morbidity, mortality, nutritional status, etc. They can be
- Surveys for evaluating the health status of a population, done for community diagnosis and community need assessment.
- Surveys for investigation of factors affecting health and disease: environment, occupation, income, clinical manifestation, risk factors etc.
- Surveys related to health services: use of health services, health needs of the community.
B. Collection of data directly from patients
- Health interview methods: face to face interview with the patients regarding the factors related to health.
- Health examination survey: carried out by doctors, technicians and interviewers. It provides more valid information, but is more expensive.
- Health records survey: based on health records.
A combination of health interview and health examination is considered the best method of survey.

SOURCES OF SECONDARY DATA are
1. Census data. A census is the collection, compilation and publication of the socio economic data (educational status, occupational status and income) and demographic data (age, sex, marital status, urban and rural distribution) of the total population of a country or area. Census data is collected in most countries at an interval of 10 years. The data is so vast that its publication is often considerably delayed.
2. Registration of vital events such as live births, deaths, marriages, divorces, foetal deaths, adoptions etc. This data can be obtained from the department of statistics.
3. Data based on notification of diseases. Notification means informing the higher centres about the details of patients attending the lower centres with specific diseases. Many diseases such as malaria, tuberculosis and HIV - AIDS are notifiable. Internationally notifiable diseases are Plague, Yellow fever, Cholera and Polio. Information about notifiable diseases is available from the Ministry of Health website and the World Health Organization website.
4. Hospital records are the most common source of secondary data for student research.

Data can be either numerical data or categorical data.

DATA ARE CLASSIFIED as numerical if they are expressed in numbers. Numerical data can be either discrete data or continuous data. Discrete data is counted, but continuous data is measured.

Discrete data can only take certain values. The following are examples:
- number of diabetic patients in a hospital during different months
- number of doctors in different cities in Malaysia
- number of hospital beds in different hospitals in a district

Continuous variables are those which are measured and presented on a continuous scale (e.g. height, weight, blood sugar values, hemoglobin values). Decimal values are possible. Continuous variables are otherwise known as quantitative variables.

Categorical data are arranged in different categories. These are otherwise known as qualitative variables.

SCALES OF MEASUREMENTS OF DATA
There are 4 different scales of measurement of data: Nominal, Ordinal, Interval and Ratio. Categorical data can be of two types, Nominal or Ordinal. Continuous data can be of Interval or Ratio type.

Nominal data refers to categorical data such as gender, marital status or race. In nominal data, the categories cannot be ordered one above another. They can be arranged and presented in any order.

Ordinal data refers to categories that have a natural ordering. The data can be ranked. Examples of ordinal data are the grading of pain (mild, moderate, severe), the staging of tumors (first stage, second stage, third stage, fourth stage) and the order of runners finishing a race. With ordinal data you cannot state with certainty whether the intervals between each value are equal. The Likert scale of agreement (Strongly disagree, Disagree, Neutral, Agree and Strongly agree) is an example of ordinal data.

Interval scale data is like ordinal data, except that the intervals between each value are equal. The difference between 29 & 30 degrees Fahrenheit is of the same magnitude as the difference between 29 & 28 degrees. The characteristics of the interval scale are that
- Order is important
- The interval between successive values is the same

Ratio scale data is data in which different observations can be expressed as ratios of one another. Height, weight and length are examples of ratio data. The weight or height of one person can be expressed as a ratio of another's (e.g. double, half, one third etc). Continuous variables which start from an absolute zero value are examples of ratio data. In other terms, ratio data is interval data with an absolute zero value.
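The difference between the interval and ratio scales can be checked with a little arithmetic. The following Python sketch is only an illustration: the temperatures of 40 and 80 degrees Fahrenheit, the weights of 40 and 80 kg, the helper function fahrenheit_to_celsius and the approximate conversion factor KG_TO_LB are assumptions introduced here, not values from the text.

```python
import math


def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


# Interval scale: equal differences are meaningful.
step_up = 30 - 29      # difference between 29 and 30 degrees Fahrenheit
step_down = 29 - 28    # difference between 28 and 29 degrees Fahrenheit
print("Equal intervals:", step_up == step_down)

# Interval scale: ratios are NOT meaningful, because zero is arbitrary.
# The same two temperatures give different ratios in different units.
ratio_in_fahrenheit = 80 / 40
ratio_in_celsius = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)
print("Temperature ratio in F:", ratio_in_fahrenheit)
print("Temperature ratio in C:", round(ratio_in_celsius, 2))

# Ratio scale: weight has an absolute zero, so the ratio of two weights
# stays the same whatever unit is used.
KG_TO_LB = 2.20462  # approximate conversion factor (assumption)
ratio_in_kg = 80 / 40
ratio_in_lb = (80 * KG_TO_LB) / (40 * KG_TO_LB)
print("Weight ratio unchanged by unit:", math.isclose(ratio_in_kg, ratio_in_lb))
```

The two temperatures have a ratio of 2 in Fahrenheit but about 6 in Celsius, so it is not valid to say that one is "twice as hot" as the other. An 80 kg person is twice as heavy as a 40 kg person in any unit, because weight starts from an absolute zero.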
CHAPTER 3 DESCRIPTIVE STATISTICS

Descriptive statistics is the description of data. It includes measures of central tendency of data, measures of dispersion of data and presentation of data as tables, diagrams and proportions.

COMPONENTS OF DESCRIPTIVE STATISTICS are
- Calculation of central tendency or central value
- Calculation of the variation of the values around the central value (measures of dispersion or measures of spread)
- Presentation of data as tables and diagrams
- Distribution of data
- Tests of normality of data

MEASURES OF CENTRAL TENDENCY OR CENTRAL VALUE
A measure of central tendency or central value is the value around which the other values are distributed. It can be the mean, median or mode.

Mean or Arithmetic Mean or Average is calculated by adding all the individual observations together and then dividing by the total number of observations. The mean is affected by extreme values (very high and very low values). The mean is the preferred measure of central value when there are no extreme values or when the data is normally distributed.

Median is the value of the middle observation. For calculating the median, the data is first arranged in ascending or descending order of magnitude and the middle observation is located. If N is the total number of observations, the median is the observation at position (N+1) x 0.5 or (N+1) / 2 in the ordered data. If the number of observations is even, the average of the two values which come in the middle is taken. The median is least affected by extreme values, so it is the accepted measure of central tendency when extreme values are present or when the data is not normally distributed.

Mode is the most commonly occurring value in a distribution of data.
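The three measures can be worked through on a small data set. The Python sketch below is a minimal illustration, assuming six hypothetical blood sugar readings of which one (310) is an extreme value. It locates the median by hand using the position (N+1)/2 and compares the result with the functions of Python's standard statistics module.

```python
from statistics import mean, median, multimode

# Hypothetical fasting blood sugar values in mg/dL; 310 is an extreme value.
values = [100, 90, 310, 95, 110, 100]

# Step 1: arrange the data in ascending order.
ordered = sorted(values)
n = len(ordered)

# Step 2: locate the middle position, (N + 1) / 2.
position = (n + 1) / 2

# Step 3: read off the median.
if n % 2 == 1:
    # Odd N: the position is a whole number (lists are indexed from 0).
    manual_median = ordered[int(position) - 1]
else:
    # Even N: average the two observations on either side of the position.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    manual_median = (lower + upper) / 2

print("Ordered data:", ordered)
print("Median position:", position)
print("Median by hand:", manual_median, "| statistics.median:", median(values))
print("Mean:", round(mean(values), 1))
print("Mode(s):", multimode(values))
```

With these numbers the single extreme reading pulls the mean up to about 134, while the median and the mode both remain at 100. This is why the median is preferred when extreme values are present.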