Research Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process


Research Methods in Forest Sciences: Learning Diary
Yoko Lu 285122
9 December 2016

1. Research process

Research means pursuing and applying knowledge about the natural and social world in a systematic way, based on evidence derived from observation and experiment. This way of thinking is often traced back to Aristotle, who is sometimes called the founder of science. The first step is to observe a problem from a scientific perspective. A hypothesis is then formed as a provisional description that is consistent with the observation. Predictions are derived from the hypothesis, and experiments are designed to test these predictions; if the results contradict the hypothesis, the hypothesis is modified. If the hypothesis and the observations still do not agree, new predictions are made from the modified hypothesis and tested again. Once the experiments consistently support the hypothesis, it can be concluded that the results are repeatable, although the resulting theory can still be falsified. The stages in the research process are: (1) problems, (2) theories, (3) criticism, and (4) new problems.

Statistics is an important tool in research: statistical results show whether a finding is significant and what the numerical values mean. Statistics can also show the historical trend of a subject. Statistics is mainly about concepts, and statistical analysis can provide key answers to important questions. The main software packages used for statistical analysis include Excel, SPSS, and R, to name a few. While all of these tools support statistical analysis, it is important to know when and why one package is used rather than another, because each has its pros and cons, such as how user-friendly it is and whether it offers all the analysis tools needed.

Sampling is one example of statistical analysis. With sampling, conclusions about a whole population are drawn from a part of it, because the whole population cannot be measured; this is the concept of inference. For example, when we study soil composition and its ecosystem, we cannot collect all of the soil and its components. Therefore, we take soil samples and observe or experiment on them, for example in a laboratory.

2. Basic concepts in statistics

Basic concepts in statistics are mean, variance, standard deviation, error, distributions, and degrees of freedom:

Mean: the average of the values, i.e. the sum of the values divided by the number of observations.
Variance: the average squared deviation of the values from the mean. It is a measure of dispersion. A simpler measure of dispersion is the range, the difference between the highest and lowest value in the sample.
Standard deviation: the square root of the variance. It describes how far the values are typically dispersed from the mean.
Error: a concept of variation. Statistics do not match the real world exactly, so there are always differences between estimated and real values.
Distribution: the normal distribution is symmetric about the mean, which coincides with the mode and the median. The inflection points of the curve lie one standard deviation away from the mean. Examples include standard scores on a test in one class and physiological characteristics such as those used in pharmacy.
Degrees of freedom: the number of values that are free to vary when a statistic is calculated. For example, the sample variance of n observations has n - 1 degrees of freedom.

When it comes to sampling, we usually do not know the parameters of the population (i.e. its mean and standard deviation). Therefore, when we take a sample from the population, we calculate the mean and standard deviation of the sample. If we take several samples from the same population, the means and standard deviations differ from sample to sample. Here the normal distribution is useful when we want to estimate the mean of the whole population. The shape of a normal curve depends on the amount of variation: the larger the standard deviation, the wider and flatter the curve.
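The basic concepts listed in section 2 can be illustrated with a short calculation. The sketch below uses Python's standard `statistics` module rather than the Excel, SPSS or R tools used in the course, and the tree heights are made-up illustration values, not course data.

```python
import math
import statistics

# Hypothetical tree heights in metres (made-up illustration data).
heights = [18.2, 21.5, 19.8, 22.1, 20.4, 17.9, 21.0, 19.1]

n = len(heights)
mean = statistics.mean(heights)            # sum of values / number of values
variance = statistics.variance(heights)    # sample variance, divides by n - 1
std_dev = statistics.stdev(heights)        # square root of the sample variance
value_range = max(heights) - min(heights)  # highest minus lowest value
degrees_of_freedom = n - 1                 # values free to vary once the mean is fixed
std_error = std_dev / math.sqrt(n)         # standard error of the sample mean

print(f"mean = {mean:.2f} m, variance = {variance:.2f}, sd = {std_dev:.2f} m")
print(f"range = {value_range:.2f} m, df = {degrees_of_freedom}, se = {std_error:.2f} m")
```

The standard error shows how much the sample mean would be expected to vary if further samples of the same size were taken from the population.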

3. T-test and ANOVA

It is important to formulate hypotheses in order to test whether the experiment supports the aim of the study. The null hypothesis (H0) is the statement that is assumed to be true until the data show otherwise, typically that there is no difference or no effect. The alternative hypothesis (H1) is the statement that there is a difference or an effect. The null hypothesis is therefore the opposite of the alternative hypothesis. When applying a test, it is important to check its assumptions: normality, sufficient sample size, and random selection of the samples.

The t-test compares means. Before the means are compared, the equality of variances is checked with Levene's test, where H0 states that the variances are equal and H1 states that they are not. For the means themselves, H0 states that the means are equal while H1 states that they differ (tested with the t statistic, or with Fisher's F statistic in ANOVA).

The p-value is the key factor when judging the significance of a test:

p > 0.1: H0 is not rejected.
0.05 < p ≤ 0.1: H0 is still not rejected.
0.01 < p ≤ 0.05: H0 is rejected and H1 is accepted; the result is statistically significant.
0.001 < p ≤ 0.01: same conclusion as the previous case, but the result is more significant.
p ≤ 0.001: H1 is accepted and the result is very statistically significant.

The t-test has three types: the one-sample t-test, the independent samples t-test, and the paired samples t-test. The one-sample t-test tests whether the mean of a single variable differs from a specified value. The independent samples t-test compares the means of two groups of cases; it is best when the test subjects are randomly assigned to the two groups. The paired samples t-test compares the means of two variables for one group.
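As a rough illustration of the independent samples t-test and the p-value thresholds described above, the following Python sketch computes the t statistic by hand, assuming equal variances in the two groups. The function names and the sample values are hypothetical. Turning t into a p-value requires the t distribution, which the standard library does not provide, so that step is left to a statistics package.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, df) for an independent two-sample t-test with equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df


def significance_label(p_value):
    """Map a p-value to the verbal levels used in the diary text."""
    if p_value <= 0.001:
        return "H0 rejected, very significant"
    if p_value <= 0.01:
        return "H0 rejected, more significant"
    if p_value <= 0.05:
        return "H0 rejected, significant"
    return "H0 not rejected"


# Hypothetical measurements from two randomly assigned groups.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t_value, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
print(significance_label(0.03))
```

In practice the whole test, including the p-value, would be run with a package function such as `t.test` in R or `scipy.stats.ttest_ind` in SciPy.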

When we compare three or more samples, we use a one-way ANOVA. The means of replicated experiments are compared where one input (factor) is set at different levels. The aim is to find out what proportion of the variability is due to the factor. When H0 is rejected, the variation in the output between levels is larger than can be explained by random error, so at least one level mean differs from the others. When the result is significant, it is important to find out which levels actually differ.

4. Basics of modeling: simple regression

It is important to consider parsimony and simplicity when constructing a model. Parsimony means that variables and parameters should not be included in the model unless they are needed. It is ideal to keep the model as simple as possible, since unnecessary factors only make the model more complex and create more problems.

A linear regression model, as the name suggests, describes the relationship between a dependent variable and one or more independent variables as a straight line. A simple linear regression has only one independent variable, while a multiple linear regression has more than one. The linear regression model assumes that the errors are normally distributed with a mean of 0, that the error variance is constant and does not depend on the values of the independent variables, and that the errors are independent of the variable values and of each other.

To determine the strength of the relationship between the model and the dependent variable, R² is used (R is the Pearson correlation coefficient). The ANOVA table of the regression shows whether the model explains any of the variation in the dependent variable, although this is only an indirect way to determine the strength of the relationship. When the regression and residual sums of squares are almost identical, as in the case analyzed with the RStudio software, only about half of the variation is explained by the model.
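The link between the sums of squares in the ANOVA table and R² can be made concrete with a small least-squares fit. This is a minimal Python sketch, not the R output from the course; the function name and the data points are made up for illustration.

```python
def simple_linear_regression(x, y):
    """Fit y = b0 + b1 * x by least squares and return the ANOVA quantities."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept
    fitted = [b0 + b1 * xi for xi in x]
    # Regression sum of squares: variation explained by the fitted line.
    ss_regression = sum((fi - mean_y) ** 2 for fi in fitted)
    # Residual sum of squares: variation left unexplained.
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    r_squared = ss_regression / (ss_regression + ss_residual)
    # F statistic with 1 and n - 2 degrees of freedom.
    f_statistic = ss_regression / (ss_residual / (n - 2))
    return {
        "b0": b0,
        "b1": b1,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "r_squared": r_squared,
        "f_statistic": f_statistic,
    }


# Hypothetical data: x could be stand age, y a growth measurement.
result = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

Because R² is the regression sum of squares divided by the total sum of squares, two nearly equal sums give an R² close to 0.5, which is the situation described in the text.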

Representativeness (sample size, population), normality (the errors should follow a normal distribution), independence of errors (randomly collected data), and homoscedasticity (equal variance across the data set) are key elements to consider when creating a simple regression model.

5. Advanced models: alternatives to simple regression

It is important to choose an appropriate model based on the purpose of the study and the information we are looking for. Simple regression assumes that the response variable has constant variance, normal errors, and independent errors. The response variable (a continuous measurement, e.g. weight or height; other types of data include counts and proportions) can be transformed and is related to one or more explanatory variables. Transformation means that the formula Y = B0 + B1X is altered; for example, a non-linear function can be transformed so that the model becomes linear. It is also important to look for alternatives to transformation so that models can be built effectively, and it is crucial to note that not all response variables satisfy the assumptions of constant variance and normal errors. Other than transformations, alternatives to simple regression include fitting a different linear model or a non-linear model, to name a few. For the model to represent the results needed for the analysis more effectively and efficiently, it may be wise to remove outliers, to find the best model that includes extra variables, and to use methods that deal with multicollinearity.

6. Validation of models

Models need to be validated to ensure that they are correct and that the statistical results are not biased or misleading. Once the model is created, we need to ask whether it works outside the study area. For example, if the model fits best for one subject in one region, will it fit as well in another region? R is one tool that can be used to validate the model. After a data set has been used to build a model, the model results need to be checked to see whether the model fits the modeling data set and whether the model makes sense. R², p-values, and residuals should be checked. RMSE (root mean squared error) and bias are measures commonly used when validating models. RMSE describes how widely the modeled values scatter around the measured values, which refers to the precision of the model. Bias is the average difference between the modeled and measured values, which refers to the accuracy of the model. Both measures can be given in absolute terms or, as percentages, in relative terms.
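RMSE and bias can be computed in a few lines. The sketch below is in Python and rests on several assumptions that the text does not settle: the differences are taken as modeled minus measured, the sums are divided by n (some sources divide by n - 1), and the relative values are expressed as a percentage of the mean measured value. The function name and the numbers are hypothetical.

```python
import math


def validation_measures(measured, modeled):
    """Return absolute and relative RMSE and bias for paired values."""
    n = len(measured)
    # Assumed sign convention: positive bias means the model overestimates.
    differences = [mod - mea for mea, mod in zip(measured, modeled)]
    rmse = math.sqrt(sum(d * d for d in differences) / n)
    bias = sum(differences) / n
    mean_measured = sum(measured) / n
    return {
        "rmse": rmse,
        "bias": bias,
        "rmse_pct": 100 * rmse / mean_measured,  # relative RMSE, %
        "bias_pct": 100 * bias / mean_measured,  # relative bias, %
    }


# Hypothetical validation data, e.g. measured and modeled stand volumes.
measured = [10, 20, 30, 40]
modeled = [12, 18, 33, 41]
measures = validation_measures(measured, modeled)
for name, value in measures.items():
    print(f"{name} = {value:.2f}")
```

A model can have a small bias and still a large RMSE when overestimates and underestimates cancel out, which is why both measures are usually reported together.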

7. Presentation and interpretation of scientific results

Note: This section is closely related to section 1 (Research process), so I am going to answer this question from a different view: from the perspective of R.

R is one way to present and interpret scientific results through coding. Basic statistical diagrams such as scatter plots, boxplots, and histograms can be created in R. As described in the lecture notes, the ggplot2 package (http://r4stats.com/examples/graphics-ggplot2/) produces graphs with colors, which makes a graph much easier to interpret, especially when there are many plots. Fit lines can be plotted, and axis labels and ranges (for example the range of the y-axis) can be modified. Bar plots and pie charts can also be made with ggplot2. It is also possible to visualize more than one distribution in one graph, as shown in the course notes.

Besides statistical graphics (i.e. plots and charts), when spatial and temporal data are included in the analysis, the data can be analyzed and interpreted with a GIS tool. Raster data can be processed with different tools within the GIS software, such as resampling of the resolution and reclassification. Another tool is ggmap, with which maps can be created by adding plots onto existing maps, as shown in the course material.

8. Qualitative research & surveys

It is important to draw conclusions from observations and surveys of different study subjects and to base research on these. Count data is one example of data that can be part of qualitative research. Counts are frequencies: events are counted to see how many times they have occurred. The number of dead trees after a deadly storm, the number of people visiting a website per day, and the number of microorganisms on a leaf are all examples of count data. In statistical analysis, counts can be used as the response variable to answer questions such as how the occurrence of dead trees in one area affects other areas with dead trees.

Proportion data is another type of data that can be included here. Unlike count data, here we look at two or more counts: in the case of dead trees after a deadly storm, we include the surviving trees as well. Other examples include infection rates of different diseases and emissions of different gases. In statistical analysis, the percentage is used as the response variable.

9. GIS tools (and Remote Sensing)

Before GIS can be combined with statistical analysis and models, data need to be collected first, as mentioned in the previous section on qualitative research & surveys. GIS is an important tool for decision making. Monitoring, field surveys, and remote sensing are the key data sources for GIS.

Maps and models created with GIS fall into three model types: descriptive, predictive, and prescriptive. A descriptive model looks for patterns, processes, and spatial or temporal interactions, e.g. the distribution of insects. A predictive model, as the name suggests, produces simulations, including stochastic models and scenario analyses, e.g. global temperature change. A prescriptive model asks what the best step is, also known as optimization, e.g. the best option for preventing the spread of an insect pest or disease.

One example is the study of snow and wind damage. Plots from past years are collected and entered into ArcGIS for interpretation of the data. The numbers of trees affected and unaffected by snow and wind are included in the analysis, as well as wind and snow data, forest type, and management variables at stand level (i.e. basal area, diameter, and height). Frequency and variation over time, spatial patterns, and average damage to trees are studied. Using all of these variables, the spatial patterns of wind and snow can be mapped by grouping the numerical values into a few categories (high to low); with fewer categories the model is easier to visualize and less confusing. When the numbers need to be analyzed further, the attribute table in ArcGIS can be examined, either through its summary function or by exporting the file for use in other software, such as Excel.

In general, depending on what the study needs to analyze, the data can be vector (points, lines, and polygons) or raster (a grid of cells). Raster data usually come from remote sensing, such as aerial photos, and variables that are not entirely horizontal, such as wind and snow data, are usually rasterized (along with elevation, topography, etc., if these need to be included in the model).
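Grouping numerical raster values into a few categories, as described for the wind and snow damage maps, is what a reclassification tool does. The following Python sketch shows the idea on a tiny grid; it does not use ArcGIS or any GIS library, and the function name, class breaks, and damage percentages are all hypothetical.

```python
import bisect


def reclassify(raster, breaks, labels):
    """Replace each cell value by the label of the class it falls into.

    breaks holds the upper limits of all classes except the last one,
    in ascending order, so labels must have one more entry than breaks.
    """
    if len(labels) != len(breaks) + 1:
        raise ValueError("labels must have exactly one more entry than breaks")
    return [
        [labels[bisect.bisect_left(breaks, value)] for value in row]
        for row in raster
    ]


# Hypothetical raster: share of damaged trees (%) in each grid cell.
damage_pct = [
    [5, 12, 40],
    [0, 30, 31],
    [10, 25, 8],
]

# Assumed classes: up to 10 % is low, above 10 % up to 30 % is medium, above 30 % is high.
classes = reclassify(damage_pct, breaks=[10, 30], labels=["low", "medium", "high"])
for row in classes:
    print(row)
```

With only three classes the map is easier to read than one showing every individual percentage, at the cost of hiding the differences within each class.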