Lecture 1 An introduction to statistics in Ichthyology and Fisheries Science

Similar documents
Statistics for Psychology

INTRODUCTION TO STATISTICS SORANA D. BOLBOACĂ

OCW Epidemiology and Biostatistics, 2010 David Tybor, MS, MPH and Kenneth Chui, PhD Tufts University School of Medicine October 27, 2010

C-1: Variables which are measured on a continuous scale are described in terms of three key characteristics central tendency, variability, and shape.

Slide 1 - Introduction to Statistics Tutorial: An Overview Slide notes

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

Introduction to Statistics

9 research designs likely for PSYC 2100

1. Introduction a. Meaning and Role of Statistics b. Descriptive and inferential Statistics c. Variable and Measurement Scales

Unit 1 Exploring and Understanding Data

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data

Statistics Mathematics 243

NORTH SOUTH UNIVERSITY TUTORIAL 1

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months?

Chapter 7: Descriptive Statistics

Reflection Questions for Math 58B

Distributions and Samples. Clicker Question. Review

Estimation. Preliminary: the Normal distribution

Section 1: Exploring Data

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

The random variable must be a numeric measure resulting from the outcome of a random experiment.

VU Biostatistics and Experimental Design PLA.216

Choosing the correct statistical test in research

EXPERIMENTAL DESIGN Page 1 of 11. relationships between certain events in the environment and the occurrence of particular

UNIT 4 ALGEBRA II TEMPLATE CREATED BY REGION 1 ESA UNIT 4

MODULE 2 FOUNDATIONAL DEFINITIONS

The research process is a fascinating method for increasing the understanding of our world.

Clever Hans the horse could do simple math and spell out the answers to simple questions. He wasn t always correct, but he was most of the time.

REVIEW FOR THE PREVIOUS LECTURE

Chapter 8 Estimating with Confidence

Still important ideas

Variability. After reading this chapter, you should be able to do the following:

Basic Statistical Concepts, Research Design, & Notation

Ch. 1 Collecting and Displaying Data

PHP2500: Introduction to Biostatistics. Lecture III: Introduction to Probability

factors that a ect data e.g. study performance of college students taking a statistics course variables include

MTH 225: Introductory Statistics

Descriptive Statistics Lecture

The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016

Essential Question: How do we incorporate good experimental design in investigations? Experiments

AP Psych - Stat 1 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

AP Psych - Stat 2 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Chapter 3: Examining Relationships

Statistics Coursework Free Sample. Statistics Coursework

Still important ideas

in explaning the result of research studies In planning and decision making are supported by data

Section B. Variability in the Normal Distribution: Calculating Normal Scores

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 1.1-1

Chapter 1: Exploring Data

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Outcome Measure Considerations for Clinical Trials Reporting on ClinicalTrials.gov

Vocabulary. Bias. Blinding. Block. Cluster sample

Chapter 02 Developing and Evaluating Theories of Behavior

Introduction to biostatistics & Levels of measurement

Statistics and Probability

Activity: Smart Guessing

Sta 309 (Statistics And Probability for Engineers)

Population. Sample. AP Statistics Notes for Chapter 1 Section 1.0 Making Sense of Data. Statistics: Data Analysis:

Chapter 1 Data Types and Data Collection. Brian Habing Department of Statistics University of South Carolina. Outline

Study Design STUDY DESIGN CASE SERIES AND CROSS-SECTIONAL STUDY DESIGN

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

Psy201 Module 3 Study and Assignment Guide. Using Excel to Calculate Descriptive and Inferential Statistics

Biostatistics for Med Students. Lecture 1

Lesson 8 Descriptive Statistics: Measures of Central Tendency and Dispersion

Statistics. Nur Hidayanto PSP English Education Dept. SStatistics/Nur Hidayanto PSP/PBI

Theory. = an explanation using an integrated set of principles that organizes observations and predicts behaviors or events.

Review for Final Exam

Psychology Research Process

Statistics: Interpreting Data and Making Predictions. Interpreting Data 1/50

RNA-seq. Differential analysis

Advanced/Advanced Subsidiary. You must have: Mathematical Formulae and Statistical Tables (Blue)

Scientific Inquiry Section 1: Length & Measurement ruler or meter stick: equipment used in the lab to measure length in millimeters, centimeters or

Psychology Research Process

Quantitative Literacy: Thinking Between the Lines

STATISTICS AND RESEARCH DESIGN

Welcome back to Science Junior Science. Easy to read Version

Business Statistics Probability

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Making comparisons. Previous sessions looked at how to describe a single group of subjects However, we are often interested in comparing two groups

Types of data and how they can be analysed

Measurement and meaningfulness in Decision Modeling

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

A LEVER PLAYING FIELD 2: NEVER SAY LEVER

Measuring the User Experience

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Chapter 3. Producing Data

POLI 343 Introduction to Political Research

Lecture 4: Research Approaches

UNIVERSITY OF THE FREE STATE DEPARTMENT OF COMPUTER SCIENCE AND INFORMATICS CSIS6813 MODULE TEST 2

Standard Deviation and Standard Error Tutorial. This is significantly important. Get your AP Equations and Formulas sheet

Running head: How large denominators are leading to large errors 1

Research Questions, Variables, and Hypotheses: Part 1. Overview. Research Questions RCS /2/04

Gene Combo SUMMARY KEY CONCEPTS AND PROCESS SKILLS KEY VOCABULARY ACTIVITY OVERVIEW. Teacher s Guide I O N I G AT I N V E S T D-65

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Statistics Lecture 4. Lecture objectives - Definitions of statistics. -Types of statistics -Nature of data -Levels of measurement

I. Introduction and Data Collection B. Sampling. 1. Bias. In this section Bias Random Sampling Sampling Error

Transcription:

Lecture 1 An introduction to statistics in Ichthyology and Fisheries Science What is statistics and why do we need it? Statistics attempts to make inferences about unknown values that are common to a population based on information gleaned from a subset of that population - a sample. These inferences are phrased in two ways; either as estimates of the respective values such as central tendency, spread and confidence intervals, or as tests of hypotheses about their real values. Is statistics necessary? When is statistics used? Question The heights of Rhodes Zoology 3 students are measured and assumed to be representative of the Rhodes student community. These measurements are compared against a sample from the Zoology 3 class that is assumed to be representative of the UCT student body. Are Rhodes students taller than UCT students? A researcher wants to estimate fish abundance in a saltmarsh and see if abundance changes over time. The saltmarsh is block netted at spring-high tide and all the fish collected when it drains at low-tide. This experiment is conducted for each austral season. A researcher is sampling a river and measuring fish to see if fish length changes over time. A researcher is assessing three different diets to see which one provided optimum growth in a particular species of fish. The weights of male and female Ichthyology 3 students are measured to see if males are heavier than females. Answer Yes, statistics is necessary. Only samples of the student bodies are taken and not the entire population. No, statistics is not necessary. The whole saltmarsh population was sampled on each sampling occasion. Yes, statistics is needed. The researcher will never know what the average length of the fish population is and has to use the samples to draw inference about it. Yes, statistics is needed. The researcher will be using samples of fish and cannot realistically test the entire fish population. No, statistics is not necessary. The whole class. If the averages differed by even 0.0001g there was a difference! 1

Data and data generation Data (or singular datum) that we observe come from somewhere be they heights, weights, sizes etc. There are four main categories of data. Categorical: Theses data are simply non-numerical categories without a specific order. Examples include; sexes (male and female) or colours (red, green or blue). Ordinal: These data are a special category of data that are not numeric but have some underlying ordering. Examples are good, average, bad or high, middle, low. Continuous: These are numerical data that can take a decimal in other words decimal data. Examples are; height, weight, distance etc. The decimal fraction can simply be related to the accuracy of the measuring instrument therefore if a measuring board in centimeters only then if a fish was 151 mm then a measurement would say 15 cm and not necessarily 15.1 cm. Discrete: These are integer or count data. Examples are; number of offspring, incidents of fish aggression to a conspecific, the number of spines in a fish s dorsal fin, or the number of growth zones counted from a otolith. These data come from somewhere from some form of data generating function, a black box that takes some known values such as a population average and spread, and then generates observations randomly. If all the population values are known then these values will be known, but as we usually take a subset of the values in the form of a sample then we can make educated guesses only. This function, the data generating function, can be approximated by a mathematical function. In short, a mathematical model of the data generating process. Properties of data generating functions: All data generated from them have a common central tendency (or average) and a common spread (or variance) All of the probabilities of different outcomes must add to one. A data generating function is also known as a probability distribution function or pdf for short. Identifying the data generating function and then trying to estimate its parameters is what we are trying to achieve in statistics. It takes time and practice but it can be done.

The most common data generating function, and the only one examined in this course, is the Normal distribution. The Normal distribution is arguably the most important function. Not only do most features of the natural world tend to follow a normal distribution, but there is good theoretical reason to expect that many things would (see The Central Limit Theorem later). As a result, most traditional statistical tests were designed to work with data that follows a normal distribution. It is the infamous "bellshaped curve." Frequency 3.5 3.0.5.0 1.5 1.0 n 10 0.5 Frequency 0.0 4 6 8 10 7 6 5 4 3 1 n 5 Frequency 0 4 6 8 10 1 10 n 50 8 6 4 0 4 6 8 10 Value Figure 1.1: Three examples of samples of different size from a common Normal distribution. The data have all be generated randomly with a common average of 5 and a common spread of 1. If an extremely large number of observations were collected in a sample, say for instance 100000, then the sample would look like the line superimposed on top of each histogram. 3

4

Populations and Samples As mentioned earlier, when we wish to infer something about a phenomenon in nature, we normally cannot measure every instance of that phenomenon, but instead are limited to a few cases. As a result, we are left trying to infer about the whole from a relatively small subset the sample. Populations vs samples In statistical terms, we refer to the whole group of individuals about which inferences are to be made as the population. Starting any investigation requires us to know if we are sampling the population or collecting all data about the population. We assume that we only have a sample. DO NOT CONFUSE THE TWO POPULATION WORLD SAMPLE WORLD Y1 Y Data generator ( and ) Y3 Y4... Yn Y s 5

Parameters vs estimates Similarly, for every value which we want to estimate has a true value in the population, which we call a parameter. Our best guess of the value that parameter actually takes which we get from our sample is called a statistic. Therefore: Populations Parameters UNOBSERVED VALUES COMMON TO EACH DATUM Samples Statistics DATUM OBSERVED VALUES COMMON TO EACH Following common practice - Greek letters (because it is Greek to us) are used to represent parameters, and roman letters to represent sample statistics. Population parameters Sample statistics Mean y Variance s Standard deviation s The problem of sampling If we take a sample from a population, unless we measure every member of that population, the sample will not exactly have the same properties as the population. This deviation of the sample statistic from the parameter is called sampling error. A very basic concept in statistics is that a smaller sample will, on average, have more sampling error than a larger sample. It is easy to see why this is even if the argument is taken to an extreme. A sample of only one individual is obviously different from the population, in the sense that every individual is different from the mean in some respects. Therefore, a very small sample will usually deviate substantially from the population. In contrast, if we take an extremely large sample, the sample will almost recreate the population, and the sample statistic is unlikely to vary much from the population parameter. This difference between a sample of size one and a huge sample holds true for less extreme cases as well. 6

Properties of a good sample For example, sampling fish 10 000 times from a single dam to ask questions about the South Africa ichthyofauna fauna is like. Each dam has its own attributes, which cause the samples to be more similar than they would otherwise be, if they were randomly and independently chosen. Lastly, a sufficiently large number of fish should be sampled to be answer questions more precisely. Independence The members of the sample should be independent of one another. In other words, the inclusion of one individual in the sample should not change the probability of choosing another particular individual. Random The members of the sample should be a fair representation of all of the population, not just a biased subset. Sufficiently Large As we learned above, larger samples are more reliable, because the sampling error is likely to be much less. As a result, samples must be sufficiently large to be useful. 7

8