On Regression Analysis Using Bivariate Extreme Ranked Set Sampling

Atsu S. S. Dorvlo (atsu@squ.edu.om), Walid Abu-Dayyeh (walidan@squ.edu.om), Obaid Alsaidy (obaidalsaidy@gmail.com)

Special Issue of the International Journal of the Computer, the Internet and Management, Vol. 19 No. SP1, June, 2011

Abstract- Many forms of ranked set samples have been introduced recently for estimating the population mean and other parameters. Ranked set samples have been shown to be more efficient than simple random samples in many situations. For the estimation of multiple characteristics, the bivariate ranked set sample was shown to be more efficient than both the bivariate simple random sample and the usual univariate ranked set sample for estimating population means. In this paper, the bivariate extreme ranked set sample is introduced and its effect on regression analysis, when the regressors are assumed to be random, is investigated. A simulation study is used to compare the efficiency of the estimators and to study the impact of these sampling plans on regression analysis in general. It is shown that the bivariate extreme ranked set sample gives unbiased and more efficient estimators of the regression model parameters than those obtained from a bivariate simple random sample, a univariate ranked set sample or a bivariate ranked set sample with the same number of quantified observations.

Keywords- Bivariate Extreme Ranked Set Sampling, Bivariate Ranked Set Sampling, Bivariate Simple Random Sampling, Random Regressors, Ranked Set Sampling, Regression Analysis.

I. INTRODUCTION

A linear regression model attempts to explain the relationship between two or more variables using a straight line. In many applications the regressor cannot be treated as fixed. For example, in a chemical process where the yield is to be regressed on the temperature, the temperature cannot be assumed to be constant. When both X and Y are considered random variables, the usual results on estimation, testing and prediction obtained for the constant-X case still hold if (i) the conditional distributions of the $Y_i$ given $X_i$ are normal and independent, with conditional means $\alpha + \beta X_i$ and conditional variance $\sigma^2$, and (ii) the $X_i$ are independent random variables whose probability distribution does not involve the parameters $\alpha$, $\beta$ and $\sigma^2$ [6]. The major modification occurs in the interpretation of confidence coefficients and specified risks of error. When X is random, these refer to repeated sampling of pairs of $(X_i, Y_i)$ values, where the $X_i$ values as well as the $Y_i$ values change from sample to sample. In any case, the observed data are used to obtain estimates of the unknown parameters. The method of estimation used in this paper is ordinary least squares.

In practice, data on these variables are collected on each of a number of units or cases using the simple random sampling (SRS) technique. As an alternative to SRS, the ranked set sampling (RSS) procedure, introduced by McIntyre [3], has been shown to improve the efficiency of estimating a population mean. The RSS procedure can be described as follows. Identify a group of sampling units from the target population. Then randomly partition the group into disjoint subsets, each having a pre-assigned size m. In most practical situations, m will be 2, 3 or 4. Then rank each subset by a suitable method, such as prior information, visual inspection or the experimenter's own judgment. The i-th judgment order statistic from the i-th subset, $i = 1, \ldots, m$, is then quantified, say $X_{i(i)}$. Thus $X_{1(1)}, X_{2(2)}, \ldots, X_{m(m)}$ denote an RSS. This represents one cycle. The whole procedure can be repeated c times to get an RSS of size $n = cm$ [2, 6, 7].

For the estimation of multiple characteristics, the bivariate ranked set sample (BVRSS) was introduced by Al-Saleh and Zheng [1]. They indicated that the BVRSS procedure can easily be extended to multivariate RSS. Based on their procedure, a BVRSS can be obtained as follows. Suppose (X, Y) is a bivariate random vector with joint probability density function (jpdf) f(x, y).

1. A random sample of size $m^4$ is identified from the population and randomly allocated into $m^2$ pools of size $m^2$ each, where each pool is a square matrix with m rows and m columns.
2. In the first pool, identify by judgment the minimum value with respect to the first characteristic for each of the m rows.
3. For the m minima obtained in Step 2, choose for actual quantification the pair that corresponds to the minimum value of the second characteristic, identified by judgment. This pair, which carries the label (1, 1), is the first element of the BVRSS.
4. Repeat Steps 2 and 3 for the second pool, but in Step 3 choose for actual quantification the pair that corresponds to the second minimum value with respect to the second characteristic. This pair carries the label (1, 2).
5. The process continues until the pair with label (m, m) is obtained from the $(m^2)$-th (last) pool. The above procedure produces a BVRSS of size $m^2$, that is, $m^2$ observations denoted by $(X_{[i](j)}, Y_{(i)[j]})$, $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, m$.
6. The procedure can be repeated r times to obtain a sample of size $n = m^2 r$, denoted by $(X_{[i](j)k}, Y_{(i)[j]k})$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, m$ and $k = 1, 2, \ldots, r$.

We now introduce the bivariate extreme ranked set sample. For any m, obtain $(X_{[1](1)}, Y_{(1)[1]})$, $(X_{[1](m)}, Y_{(1)[m]})$, $(X_{[m](1)}, Y_{(m)[1]})$ and $(X_{[m](m)}, Y_{(m)[m]})$, and repeat q times so that $n = 4q$. Denote this sample by $(X_{[i](j)k}, Y_{(i)[j]k})$, $i, j \in \{1, m\}$, $k = 1, 2, \ldots, q$. This is our bivariate extreme ranked set sample (BVERSS).

Muttlak [4] used RSS to estimate the parameters of the simple regression model assuming that the X values are known constants. However, under his model assumptions there is no improvement in estimating the model parameters. To achieve any improvement in the estimated model parameters, both Y and X must be assumed to be random variables. Moreover, he assumed that ranking is done on one variable (say X) only. Therefore, X can be perfectly ranked, but the ranking of Y will be subject to error. Samawi and Ababneh [8] used RSS, ranking only on the variable X, to investigate the effect of RSS on regression analysis in general. They found that RSS is only slightly more efficient than SRS for model parameter estimation. However, Samawi and Abu-Dayyeh [10] showed that the extreme ranked set sampling (ERSS) scheme [9] improves the estimation of the model parameters more than RSS does. Also, Samawi and Al-Saleh [11] used BVRSS to investigate its effect on regression analysis in general. They found that BVRSS is only slightly more efficient than SRS and RSS for model parameter estimation.

In this paper, the effect of BVERSS on regression analysis, assuming that both Y and X are random variables, is investigated. Note that in BVRSS and BVERSS ranking is done on both variables X and Y simultaneously. In Section II, the general setup of the problem, the notation and simple linear regression using BVERSS are discussed. Inference for the simple regression model parameters ($\alpha$ and $\beta$) using asymptotic results is discussed in Section III. In Section IV the effect of using BVERSS on residual analysis and model diagnosis is illustrated using real data.

II. LINEAR REGRESSION

A. Some Notations

The following notation will be used throughout this paper: $E(X) = \mu_X$, [remaining notation missing in the source].

B. Simple Linear Regression Using BVERSS

Let $\{(X_{[i](j)k}, Y_{(i)[j]k}) : i, j \in \{1, m\},\ k = 1, 2, \ldots, q\}$ be a BVERSS from the population, with size $n = 4q$. The simple regression model of the two variables Y and X is defined by

$$Y_{(i)[j]k} = \alpha + \beta X_{[i](j)k} + \epsilon_{ijk},$$

where $\alpha$ is the model intercept, $\beta$ is the model slope and $\epsilon_{ijk}$ is the random error. The assumptions needed for parameter estimation are that the errors have mean zero and finite variance and are uncorrelated, and that $X_{[i](j)k}$ and $\epsilon_{ijk}$ are independent. Then the least squares estimators of $\alpha$ and $\beta$ are given by

$$\hat{\beta} = \frac{\sum_{k=1}^{q} \sum_{i \in \{1, m\}} \sum_{j \in \{1, m\}} (X_{[i](j)k} - \bar{X})(Y_{(i)[j]k} - \bar{Y})}{\sum_{k=1}^{q} \sum_{i \in \{1, m\}} \sum_{j \in \{1, m\}} (X_{[i](j)k} - \bar{X})^2}, \qquad \hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X},$$

where

$$\bar{X} = \frac{1}{4q} \sum_{k=1}^{q} \sum_{i \in \{1, m\}} \sum_{j \in \{1, m\}} X_{[i](j)k}, \qquad \bar{Y} = \frac{1}{4q} \sum_{k=1}^{q} \sum_{i \in \{1, m\}} \sum_{j \in \{1, m\}} Y_{(i)[j]k}.$$

Then the fitted model is

$$\hat{y} = \hat{\alpha} + \hat{\beta} x.$$

Also, a consistent unbiased estimator for [remainder of this sentence missing in the source].
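The BVERSS selection rule from the Introduction and the least squares fit of Section II.B can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: ranking is taken to be perfect (done on the actual values), (X, Y) is drawn from a standard bivariate normal distribution with correlation `rho`, and the function names `select_pair`, `bverss_sample` and `ols_fit` are hypothetical, not taken from the paper.

```python
import numpy as np

def select_pair(pool, i, j):
    """Pick the pair with label (i, j) (1-based) from an m x m pool of (x, y) pairs.

    pool has shape (m, m, 2). In each row the pair with the i-th smallest x is
    kept; among those m pairs the one with the j-th smallest y is returned.
    Perfect ranking is assumed (an assumption of this sketch).
    """
    m = pool.shape[0]
    x_order = np.argsort(pool[:, :, 0], axis=1)        # rank x within each row
    row_picks = pool[np.arange(m), x_order[:, i - 1]]  # shape (m, 2)
    y_order = np.argsort(row_picks[:, 1])              # rank y among the picks
    return row_picks[y_order[j - 1]]

def bverss_sample(rng, m, q, rho):
    """Draw a BVERSS of size n = 4q using labels (1,1), (1,m), (m,1), (m,m)."""
    cov = [[1.0, rho], [rho, 1.0]]
    labels = [(1, 1), (1, m), (m, 1), (m, m)]
    units = []
    for _ in range(q):
        for i, j in labels:
            # a fresh pool of m*m pairs for every quantified unit
            pool = rng.multivariate_normal([0.0, 0.0], cov, size=(m, m))
            units.append(select_pair(pool, i, j))
    return np.array(units)  # shape (4q, 2); columns are x and y

def ols_fit(x, y):
    """Closed-form least squares intercept and slope for y = alpha + beta * x."""
    x_bar, y_bar = x.mean(), y.mean()
    beta = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    alpha = y_bar - beta * x_bar
    return alpha, beta

rng = np.random.default_rng(0)
sample = bverss_sample(rng, m=3, q=5, rho=0.8)
alpha_hat, beta_hat = ols_fit(sample[:, 0], sample[:, 1])
print(sample.shape, alpha_hat, beta_hat)
```

Separating `select_pair` from the random draw keeps the ranking rule easy to check by hand on a small pool. Under the bivariate normal assumption used here, the regression of Y on X has intercept 0 and slope `rho`.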

Using the basic properties of conditional moments and the results of Samawi and Al-Saleh [11], we have the following theorem. The proof is straightforward and is not presented here.

Theorem 1: Assuming the conditions of the regression model above, then [statement missing in the source].

III. NUMERICAL COMPUTATIONS

From Theorem 1 and the above results, we can derive the efficiencies of the estimators of $\alpha$ and $\beta$ using BVERSS relative to the estimators using BVSRS as follows: [expressions missing in the source]. The efficiencies of the estimators of $\alpha$ and $\beta$ using RSS and BVRSS relative to the estimators using BVSRS are defined in the same way.

Assume that (X, Y) follows a bivariate normal distribution. The performance of the simple regression parameter estimators using BVSRS, RSS, BVRSS and BVERSS, for $m = 2, 3, 4$; $r = 1$ to $4$; and $\rho = 0.2, 0.5, 0.6, 0.8, 0.9$, is investigated in an extensive simulation study that evaluates these efficiencies. In this case, the total sample size is $n = m^2 r$ for all sampling schemes used. When using RSS, ranking is done on the concomitant variable X only. The simulation results are given in Tables I and II.

IV. RESULTS AND CONCLUSION

From the simulation results we may conclude the following:

(1) RSS, BVRSS and BVERSS are all more efficient than BVSRS.
(2) BVERSS is the most efficient, followed by BVRSS and then RSS.
(3) For fixed r and fixed $\rho$, the efficiencies of BVRSS and RSS decrease while the efficiency of BVERSS increases as m increases.
(4) For fixed m and fixed $\rho$, the efficiencies decrease as r increases.

TABLE I: THE RELATIVE PRECISION OF THE INTERCEPT AND SLOPE ESTIMATORS

TABLE II: THE RELATIVE PRECISION OF THE INTERCEPT AND SLOPE ESTIMATORS

REFERENCES

[1] Al-Saleh, M. F. and Zheng, G. (2002). Estimation of bivariate characteristics using ranked set sampling, Australian and New Zealand J. of Statistics, 44, 221-232.
[2] Kaur, A., Patil, G., Sinha, A., and Taillie, C. (1995). Ranked set sampling: an annotated bibliography, Environmental and Ecological Statistics, 2, 25-54.
[3] McIntyre, G. A. (1952). A method of unbiased selective sampling, using ranked sets, Australian J. Agricultural Research, 3, 385-390.
[4] Muttlak, H. A. (1995). Parameters estimation in a simple linear regression using rank set sampling, Biometrical J., 37, 7, 799-810.
[5] Muttlak, H. A. and Al-Saleh, M. F. (2000). Recent developments on ranked set sampling, PJS, Vol. 16, Issue 3, pp. 269-290.
[6] Neter, J., Wasserman, W., and Kutner, M. (1985). Applied linear statistical models, Richard D. Irwin, Inc., Homewood, Illinois.
[7] Patil, G. P., Sinha, A. K., and Taillie, C. (1999). Ranked set sampling: a bibliography, Environ. Ecolog. Statist., 6, 91-98.
[8] Samawi, H. M. and Ababneh, F. (2001). On regression analysis using ranked set sample, Journal of Statistical Research (JSR), 35 (2), 93-105.
[9] Samawi, H. M., Ahmed, M. S., and Abu-Dayyeh, W. (1996). Estimating the population mean using extreme ranked set sampling, Biometrical J., 38, 5, 577-586.
[10] Samawi, H. M. and Abu-Dayyeh, W. (2002). On regression analysis using extreme ranked set samples, International J. of Information and Management Sciences, Vol. 13, Number 3, pp. 19-36.
[11] Samawi, H. M. and Al-Saleh, M. F. (2002). On regression analysis using bivariate ranked set samples, Metron, Vol. LX, Issue 3, pp. 29-48.
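As an illustration of the kind of simulation described in Section III, the following Python sketch estimates the efficiency of the BVERSS intercept and slope estimators relative to BVSRS by Monte Carlo. Several choices here are assumptions of this sketch rather than details confirmed by the paper: relative efficiency is taken as the ratio of empirical variances (BVSRS over BVERSS), both schemes use the same number n = 4q of quantified pairs, ranking is perfect, and the helper names (`draw_bverss`, `draw_bvsrs`, `fit_line`, `relative_efficiency`) are hypothetical.

```python
import numpy as np

def draw_bverss(rng, m, q, rho):
    """BVERSS of size 4q from a standard bivariate normal, perfect ranking assumed."""
    cov = [[1.0, rho], [rho, 1.0]]
    units = []
    for _ in range(q):
        for i, j in [(1, 1), (1, m), (m, 1), (m, m)]:
            pool = rng.multivariate_normal([0.0, 0.0], cov, size=(m, m))
            order = np.argsort(pool[:, :, 0], axis=1)      # rank x within rows
            picks = pool[np.arange(m), order[:, i - 1]]    # i-th smallest x per row
            units.append(picks[np.argsort(picks[:, 1])[j - 1]])  # j-th smallest y
    return np.array(units)

def draw_bvsrs(rng, n, rho):
    """Bivariate simple random sample of n pairs."""
    return rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

def fit_line(xy):
    """Least squares (intercept, slope) for the columns x = xy[:, 0], y = xy[:, 1]."""
    x, y = xy[:, 0], xy[:, 1]
    beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - beta * x.mean(), beta

def relative_efficiency(m, q, rho, reps=500, seed=0):
    """Empirical Var(BVSRS estimator) / Var(BVERSS estimator) for (alpha, beta)."""
    rng = np.random.default_rng(seed)
    n = 4 * q  # same number of quantified pairs in both schemes (assumption)
    srs = np.array([fit_line(draw_bvsrs(rng, n, rho)) for _ in range(reps)])
    ers = np.array([fit_line(draw_bverss(rng, m, q, rho)) for _ in range(reps)])
    return srs.var(axis=0, ddof=1) / ers.var(axis=0, ddof=1)

eff_alpha, eff_beta = relative_efficiency(m=3, q=3, rho=0.8)
print(f"eff(alpha) = {eff_alpha:.2f}, eff(beta) = {eff_beta:.2f}")
```

Values above 1 would indicate that BVERSS is the more efficient scheme for that parameter. The printed numbers depend on `seed` and `reps` and are not meant to reproduce the entries of Tables I and II.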