Statistical Tools in Biology

Research Methodology
Design the protocol/procedure. There are two types of study:
Cross-sectional study: compares two different groups at one point in time, e.g., comparing LDL levels between athletes and couch potatoes. Easier and cheaper to do.
Longitudinal (prospective) study: follows one group throughout the study, perhaps for years. Expensive and detailed.
Sample size (n) is important; the larger the sample, the better.
Controlled variables: those variables held constant (the same) for all subjects being tested.

Statistics: a mathematical tool used for collecting, analyzing, and interpreting numerical data, e.g., determining the effects of crude oil on migratory bird populations in the Gulf of Mexico.

Reliability: a measure of accuracy, dependability, and consistency. E.g., the reliability of measuring devices or experimental procedures, and the reproducibility of data.

Mean: the average value, found by adding all the values and dividing by the number of values. It simplifies a data set so that one value represents a given population. Example, "Obesity Trends in the United States" (Centers for Disease Control and Prevention): The prevalence of obesity increased dramatically during the past 30 years. Although the prevalence may have stabilized, it remains high. More than one-third of U.S. adults and about 17 percent of children are now obese.

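Median: the number located exactly in the middle of a set of numbers arranged in order. If the set has an even number of values, the median is the average of the two middle values. E.g., if your grade is above the median, then you know you are in the top 50% of your class.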
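As a small illustration of how the median is found (sort the values, then take the middle one, or average the two middle ones when the count is even), here is a minimal Python sketch. The function name median is chosen for illustration; Python's built-in statistics.median applies the same rule.

```python
def median(values):
    """Return the middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```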

Mode: the value that occurs with the greatest frequency, i.e., the most common or most popular value. E.g., the car manufacturer Volvo may want to know the mode when surveying the population's favorite car color.

EXAMPLE DATA: results of a 5-point quiz given to 13 students

Quiz Score    Frequency (number of students)
5             5
4             1
3             2
2             1
1             2
0             2

Find the:
a) Median
b) Mode
c) Mean

Figure: A histogram representing the median, mode, and mean for the 5-point quiz.
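One way to check answers a) to c) is a short Python sketch using the standard-library statistics module. It expands the frequency table into one score per student (13 values) and then applies median, mode, and mean; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Frequency table from the example: quiz score -> number of students
frequencies = {5: 5, 4: 1, 3: 2, 2: 1, 1: 2, 0: 2}

# Expand into one score per student so the standard functions can be applied
scores = [score for score, count in frequencies.items() for _ in range(count)]

print(len(scores))     # 13 students
print(median(scores))  # 3  (the 7th of 13 ordered scores)
print(mode(scores))    # 5  (the most frequent score)
print(mean(scores))    # 3  (39 total points / 13 students)
```

By hand, the ordered scores are 0, 0, 1, 1, 2, 3, 3, 4, 5, 5, 5, 5, 5, so the 7th value (the median) is 3, the mode is 5, and the mean is (5×5 + 4×1 + 3×2 + 2×1 + 1×2 + 0×2) / 13 = 39 / 13 = 3.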

Scatterplots: diagrams that represent two measurements per subject on a pair of axes. They are a good way to show a relationship between two variables. If the points show a clear pattern, the data can serve as a good model (predictor).

Figure 9. Illustration of scatter plots with various properties: (a) 'shotgun' scatter, with low correlation, (b) strong positive correlation, (c) strong negative correlation, (d) and (e) low correlation, with very little change in one variable compared with the other, (f) this scatter would generate a spurious high correlation because of the effect of the five points enclosed by the shaded area

Question: 1) Which diagram(s) above are good models to use as predictors for the data? How do you know?

Regression line (line of best fit): a straight line drawn through the points in a scatterplot so that the points are balanced on either side of it. In the common least-squares method, the line is chosen to minimize the sum of the squared vertical distances between the points and the line. Regression lines give one the ability to predict the value of one measurement when given the value of the other for a particular population.
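A minimal Python sketch of the least-squares line, assuming the standard formulas m = Sxy / Sxx and b = ȳ − m·x̄. The data points are hypothetical, chosen only for illustration (they are not the LJHS sit-up/push-up data). The last line checks that the vertical distances (residuals) above and below the line cancel out.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: sum of products of deviations; Sxx: sum of squared x deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    b = y_bar - m * x_bar  # the line always passes through (x_bar, y_bar)
    return m, b

# Hypothetical illustration data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = least_squares_line(xs, ys)
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

print(round(m, 3), round(b, 3))   # 0.6 2.2
print(abs(sum(residuals)) < 1e-9)  # True: points balance above and below the line
```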

Question: 2) Why does the last scatterplot not have a regression line drawn?

Question: 3) Is this scatterplot a good model to predict the number of push-ups given the number of sit-ups?

Slope of the regression line: using the slope of the regression line, one can predict the value of one measurement when given the known value of the other. The line has the form y = mx + b, where m = Δy/Δx is the slope and b is the y-intercept. For example, y = 0.684x + 1.746 (x = sit-ups, y = push-ups) has a slope of about 0.68, i.e., m = 68/100 = 17/25. For the men at LJHS, this means that every 25 additional sit-ups are associated with about 17 additional push-ups, or about 0.68 push-ups per additional sit-up.
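A minimal Python sketch of using the slide's equation y = 0.684x + 1.746 to make predictions. It assumes x is sit-ups and y is push-ups (the reading implied by question 3); the function name predict_push_ups is hypothetical. Subtracting two predictions shows that the slope describes a change: 25 extra sit-ups add about 17 predicted push-ups.

```python
# Regression equation from the slide; assumes x = sit-ups, y = push-ups
SLOPE = 0.684      # m = Δy/Δx, push-ups per additional sit-up
INTERCEPT = 1.746  # b, predicted push-ups when x = 0

def predict_push_ups(sit_ups: float) -> float:
    """Return the predicted number of push-ups for a given number of sit-ups."""
    return SLOPE * sit_ups + INTERCEPT

# Going from 15 to 40 sit-ups (25 more) raises the prediction by 0.684 * 25
print(round(predict_push_ups(40) - predict_push_ups(15), 1))  # 17.1
print(round(predict_push_ups(40), 1))                         # 29.1
```

Predictions like these are only sensible within the range of sit-up counts actually measured; extending the line far beyond that range (extrapolation) is unreliable.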

Correlation coefficient (r-value): a statistical tool used to determine the strength and direction of the linear relationship between two variables. r ranges from -1.0 to +1.0: values near +1.0 indicate a strong positive correlation, values near -1.0 a strong negative correlation, and values near 0 little or no linear correlation. E.g., is there an association between the number of sit-ups completed in one minute and the number of push-ups completed in one minute for LJHS men? Note: as a rule of thumb, an r-value above 0.5 and closer to 1.0 indicates a good fit, i.e., a strong positive correlation between the two variables.
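A minimal Python sketch of the Pearson correlation coefficient, r = Sxy / √(Sxx · Syy), assuming Pearson's r is the r-value meant here. The data are the same hypothetical illustration points used for the regression sketch, not real measurements. Flipping the sign of y keeps the strength but reverses the direction.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r, which lies between -1 and +1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical illustration data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(round(pearson_r(xs, ys), 3))           # 0.775: fairly strong, positive
print(pearson_r(xs, [-y for y in ys]) < 0)   # True: same strength, negative direction
```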

Question: 4) Is there a correlation between the number of sit-ups completed in one minute and the number of push-ups completed in one minute for LJHS men?

Standard Deviation (SD): a measure of dispersion (spread) relative to the mean. It quantifies how the scores are distributed about the mean and is used to estimate how much the individual measurements in a data set deviate from the mean of the set. A large SD indicates greater dispersion around the mean; a small SD indicates that the values cluster closely around it.
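A minimal Python sketch computing the standard deviation of the 5-point quiz scores, both by the formula and with the standard-library statistics module. The slides do not say which divisor to use; this sketch assumes the sample SD (divide by n − 1) and also shows the population SD (divide by n) for comparison. The function name sample_sd is illustrative.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(data)
    x_bar = sum(data) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))

# Quiz scores from the example table, expanded to one score per student
frequencies = {5: 5, 4: 1, 3: 2, 2: 1, 1: 2, 0: 2}
scores = [score for score, count in frequencies.items() for _ in range(count)]

print(sample_sd(scores))                    # 2.0 (48 / 12 = 4, square root 2)
print(statistics.stdev(scores))             # 2.0 (library check)
print(round(statistics.pstdev(scores), 3))  # 1.922 (population SD, divides by n)
```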

Both error plots and box plots can be used to compare different samples or populations. A chart can include several error plots or box plots, allowing a quick visual comparison of the averages and variabilities of different data sets. The degree of overlap between variabilities is an important initial indicator of whether differences in means or medians are likely to be meaningful; this assessment can then be tested more rigorously using an appropriate statistical test.
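As a rough illustration of the overlap check described above, here is a minimal Python sketch. It assumes the error bars show the mean ± 1 sample SD (error bars can also show standard error or confidence intervals, which the slides do not specify), and the group data are hypothetical. Non-overlap is only an initial hint, not a substitute for a formal test.

```python
import statistics

def mean_sd_bar(sample):
    """Return (low, high) for an error bar drawn at mean ± 1 sample SD."""
    m = statistics.mean(sample)
    sd = statistics.stdev(sample)
    return (m - sd, m + sd)

def bars_overlap(bar_a, bar_b):
    """True if two (low, high) intervals share any values."""
    return bar_a[0] <= bar_b[1] and bar_b[0] <= bar_a[1]

# Hypothetical samples for two groups (illustration only)
group_a = [4, 5, 6, 5, 5]
group_b = [8, 9, 7, 8, 8]

bar_a = mean_sd_bar(group_a)  # about (4.29, 5.71)
bar_b = mean_sd_bar(group_b)  # about (7.29, 8.71)
print(bars_overlap(bar_a, bar_b))  # False: worth testing the difference formally
```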