Package StepReg. November 3, 2017

Similar documents
Package speff2trial. February 20, 2015

Package ORIClust. February 19, 2015

All Possible Regressions Using IBM SPSS: A Practitioner s Guide to Automatic Linear Modeling

Package AbsFilterGSEA

The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation Multivariate Analysis of Variance

ivlewbel: Uses heteroscedasticity to estimate mismeasured and endogenous regressor models

Package mdsdt. March 12, 2016

Package appnn. R topics documented: July 12, 2015

Package nopaco. R topics documented: April 13, Type Package Title Non-Parametric Concordance Coefficient Version 1.0.

MODEL SELECTION STRATEGIES. Tony Panzarella

Package frailtyhl. February 19, 2015

Package preference. May 8, 2018

Tutorial 3: MANOVA. Pekka Malo 30E00500 Quantitative Empirical Research Spring 2016

Package ExactCIdiff. February 15, 2013

Package singlecase. April 19, 2009

Package inote. June 8, 2017

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15)

Package wally. May 25, Type Package

Package citccmst. February 19, 2015

Package prognosticroc

Package p3state.msm. R topics documented: February 20, 2015

Package influence.me

Week 8 Hour 1: More on polynomial fits. The AIC. Hour 2: Dummy Variables what are they? An NHL Example. Hour 3: Interactions. The stepwise method.

Package ICCbin. November 14, 2017

The decline of the British Pint: Estimates of the long run on and off-trade beer price elasticities

Package sbpiper. April 20, 2018

Package PELVIS. R topics documented: December 18, Type Package

Small Group Presentations

Package propoverlap. R topics documented: February 20, Type Package

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality

Vessel wall differences between middle cerebral artery and basilar artery. plaques on magnetic resonance imaging

Package xseq. R topics documented: September 11, 2015

Package ordibreadth. R topics documented: December 4, Type Package Title Ordinated Diet Breadth Version 1.0 Date Author James Fordyce

Package CompareTests

Selection and Combination of Markers for Prediction

Package HAP.ROR. R topics documented: February 19, 2015

A macro of building predictive model in PROC LOGISTIC with AIC-optimal variable selection embedded in cross-validation

Package QualInt. February 19, 2015

(C) Jamalludin Ab Rahman

Package clust.bin.pair

Multiple Analysis. Some Nomenclatures. Learning Objectives. A Weight Lifting Analysis. SCHOOL OF NURSING The University of Hong Kong

4Stat Wk 10: Regression

Package seqdesign. April 27, Version 1.1 Date

Package GANPAdata. February 19, 2015

Stepwise Model Fitting and Statistical Inference: Turning Noise into Signal Pollution

National Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2008 Formula Grant

ISIR: Independent Sliced Inverse Regression

isc ove ring i Statistics sing SPSS

Package longroc. November 20, 2017

Score Tests of Normality in Bivariate Probit Models

Package TEQR. R topics documented: February 5, Version Date Title Target Equivalence Range Design Author M.

Part 8 Logistic Regression

Package SAMURAI. February 19, 2015

HOW TO BE A BAYESIAN IN SAS: MODEL SELECTION UNCERTAINTY IN PROC LOGISTIC AND PROC GENMOD

Probabilistic Approach to Estimate the Risk of Being a Cybercrime Victim

Package EpiEstim. R topics documented: February 19, Version Date 2013/03/08

Package pollen. October 7, 2018

Constructing a mixed model using the AIC

3. Model evaluation & selection

Package ega. March 21, 2017

HANDOUTS FOR BST 660 ARE AVAILABLE in ACROBAT PDF FORMAT AT:

1. You want to find out what factors predict achievement in English. Develop a model that

Bayes Factors for t tests and one way Analysis of Variance; in R

TACKLING SIMPSON'S PARADOX IN BIG DATA USING CLASSIFICATION & REGRESSION TREES

A MONTE CARLO STUDY OF MODEL SELECTION PROCEDURES FOR THE ANALYSIS OF CATEGORICAL DATA

Multivariable Systems. Lawrence Hubert. July 31, 2011

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Package rvtdt. February 20, rvtdt... 1 rvtdt.example... 3 rvtdts... 3 TrendWeights Index 6. population control weighted rare-varaints TDT

Package cssam. February 19, 2015

Package BCRA. April 25, 2018

Clustering Autism Cases on Social Functioning

Package RNAsmc. December 11, 2018

Package ClusteredMutations

Package ncar. July 20, 2018

Supplementary Table 1. The distribution of IFNL rs and rs and Hardy-Weinberg equilibrium Genotype Observed Expected X 2 P-value* CHC

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis

Application of Cox Regression in Modeling Survival Rate of Drug Abuse

A COMPARISON BETWEEN MULTIVARIATE AND BIVARIATE ANALYSIS USED IN MARKETING RESEARCH

4. Model evaluation & selection

Package inctools. January 3, 2018

Package NonCompart. August 16, 2017

Bangor University Laboratory Exercise 1, June 2008

Package mediation. September 18, 2009

Package CorMut. January 21, 2019

A Monte Carlo Simulation Study for Comparing Power of the Most Powerful and Regular Bivariate Normality Tests

Package DeconRNASeq. November 19, 2017

Changing expectations about speed alters perceived motion direction

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Package memapp. R topics documented: June 2, Version 2.10 Date

Package xmeta. August 16, 2017

Package noia. February 20, 2015

Prediction Model For Risk Of Breast Cancer Considering Interaction Between The Risk Factors

Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India

Package BUScorrect. September 16, 2018

Package pathifier. August 17, 2018

Measuring Regressor Information Content In The Presence of Collinearity

Cross-Lagged Panel Analysis

The Statistical Analysis of Failure Time Data

Package NarrowPeaks. August 3, Version Date Type Package

Transcription:

Type Package Title Stepwise Regression Analysis Version 1.0.0 Date 2017-10-30 Author Junhui Li,Kun Cheng,Wenxin Liu Maintainer Junhui Li <junhuili@cau.edu.cn> Package StepReg November 3, 2017 Description Stepwise regression analysis for variable selection can be used to get the best candidate final regression model in univariate or multivariate regression analysis with the 'forward' and 'stepwise' steps. Procedure uses Akaike information criterion, the small-samplesize corrected version of Akaike information criterion, Bayesian information criterion, Hannan and Quinn information criterion, the corrected form of Hannan and Quinn information criterion, Schwarz criterion and significance levels as selection criteria, where the significance levels for entry and for stay are set to 0.15 as default. Multicollinearity detection in regression model are performed by checking tolerance value, which is set to 1e-7 as default. Continuous variables nested within class effect are also considered in this package. License GPL (>= 2) Imports Rcpp (>= 0.12.13) LinkingTo Rcpp,RcppEigen Depends R (>= 2.10) NeedsCompilation yes Repository CRAN Date/Publication 2017-11-03 10:17:07 UTC R topics documented: StepReg-package...................................... 2 bestcandidate_rcpp.................................... 3 stepwise........................................... 5 Index 8 1

2 StepReg-package StepReg-package Stepwise Regression Analysis Description Details Stepwise regression analysis for variable selection can be used to get the best candidate final regression model in univariate or multivariate regression analysis with the forward and stepwise steps. Procedure uses Akaike information criterion, the small-sample-size corrected version of Akaike information criterion, Bayesian information criterion, Hannan and Quinn information criterion, the corrected form of Hannan and Quinn information criterion, Schwarz criterion and significance levels as selection criteria, where the significance levels for entry and for stay are set to 0.15 as default. Multicollinearity detection in regression model are performed by checking tolerance value, which is set to 1e-7 as default. Continuous variables nested within class effect are also considered in this package. Package: StepReg Type: Package Version: 1.0.0 Date: 2017-10-30 License: GPL (>= 2) Author(s) Junhui Li,Kun Cheng,Wenxin Liu Maintainer: Junhui Li <junhuili@cau.edu.cn> References Alsubaihi, A. A., Leeuw, J. D., and Zeileis, A. (2002). Variable selection in multivariable regression using sas/iml., 07(i12). Darlington, R. B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69(3), 161. Hannan, E. J., & Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society, 41(2), 190-195. Harold Hotelling. (1992). The Generalization of Student s Ratio. Breakthroughs in Statistics. Springer New York. Hurvich, C. M., & Tsai, C. (1989). Regression and time series model selection in small samples. Biometrika, 76(2), 297-307. Judge, & GeorgeG. (1985). The Theory and practice of econometrics /-2nd ed. The Theory and practice of econometrics /. Wiley.

bestcandidate_rcpp 3 Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. Mathematical Gazette, 37(1), 123-131. Mckeon, J. J. (1974). F approximations to the distribution of hotelling s t20. Biometrika, 61(2), 381-383. Mcquarrie, A. D. R., & Tsai, C. L. (1998). Regression and Time Series Model Selection. Regression and time series model selection /. World Scientific. Pillai, K. C. S. (2006). Pillai s Trace. Encyclopedia of Statistical Sciences. John Wiley & Sons, Inc. R.S. Sparks, W. Zucchini, & D. Coutsourides. (1985). On variable selection in multivariate regression. Communication in Statistics- Theory and Methods, 14(7), 1569-1587. Sawa, T. (1978). Information criteria for discriminating among alternative regression models. Econometrica, 46(6), 1273-1291. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), pags. 15-18. bestcandidate_rcpp Obtain one best candidate variable Description Get best candidate variable with forward or backward direction in only one step Usage bestcandidate_rcpp(findin, p, n, sigma, tolerance, Ftrace, criteria, Y,X1, X0, k) Arguments findin p n sigma tolerance Ftrace criteria Y X1 X0 k Logical value, if FALSE then add independent variable to regression model, otherwise remove independent variable from regression model The number of independent variable entered in regression The sample size Pure error variance from full regressoin model for Bayesian information criterion(bic) Tolerance value for multicollinearity Statistic of multivariate regression including Wilks lambda, Pillai trace and Hotelling-lawley trace Information criterion including AIC, AICc, BIC, SBC, HQ, HQc and SL Data set for dependent variable Data set for independent variables not in regression model Data set for independent variables entered in regression model Forces the first k effects entered in regression model, and the selection methods are performed on the other effects in the data set

4 bestcandidate_rcpp Details Value This function can compute probability value or information criteria statistics with multivariate and univariate regression using least square method PIc seq SSE RkCh P value or Information Criteria statistic value Pointer for independent variable enter or eliminate Maximum or minimum of SSE Rank changed or not Author(s) Junhui Li References Alsubaihi, A. A., Leeuw, J. D., and Zeileis, A. (2002). Variable selection in multivariable regression using sas/iml., 07(i12). Darlington, R. B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69(3), 161. Hannan, E. J., & Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society, 41(2), 190-195. Harold Hotelling. (1992). The Generalization of Student s Ratio. Breakthroughs in Statistics. Springer New York. Hurvich, C. M., & Tsai, C. (1989). Regression and time series model selection in small samples. Biometrika, 76(2), 297-307. Judge, & GeorgeG. (1985). The Theory and practice of econometrics /-2nd ed. The Theory and practice of econometrics /. Wiley. Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. Mathematical Gazette, 37(1), 123-131. Mckeon, J. J. (1974). F approximations to the distribution of hotelling s t20. Biometrika, 61(2), 381-383. Mcquarrie, A. D. R., & Tsai, C. L. (1998). Regression and Time Series Model Selection. Regression and time series model selection /. World Scientific. Pillai, K. C. S. (2006). Pillai s Trace. Encyclopedia of Statistical Sciences. John Wiley & Sons, Inc. R.S. Sparks, W. Zucchini, & D. Coutsourides. (1985). On variable selection in multivariate regression. Communication in Statistics- Theory and Methods, 14(7), 1569-1587. Sawa, T. (1978). Information criteria for discriminating among alternative regression models. Econometrica, 46(6), 1273-1291. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), pags. 15-18.

stepwise 5 stepwise Stepwise Regression Analysis Description Stepwise regression analysis for variable selection can be used to get the best candidate final regression model in univariate or multivariate regression analysis with the forward and stepwise steps. Procedure uses Akaike information criterion, the small-sample-size corrected version of Akaike information criterion, Bayesian information criterion, Hannan and Quinn information criterion, the corrected form of Hannan and Quinn information criterion, Schwarz criterion and significance levels as selection criteria, where the significance levels for entry and for stay are set to 0.15 as default. Multicollinearity detection in regression model are performed by checking tolerance value, which is set to 1e-7 as default. Continuous variables nested within class effect are also considered in this package. Usage stepwise(data, y, notx, include, Class, selection, select, sle, sls, tolerance, Trace, Choose) Arguments data y notx include Class selection select sle sls Data set including dependent and independent variables to be analyzed Numeric or character vector for dependent variables Numeric or character vector for independent variables removed from stepwise regression analysis Forces the effects vector listed in the data to be included in all models. The selection methods are performed on the other effects in the data set Class effect variable Model selection method including "forward" and "stepwise",forward selection starts with no effects in the model and adds effects, while stepwise regression is similar to the forward method except that effects already in the model do not necessarily stay there specifies the criterion that uses to determine the order in which effects enter and/or leave at each step of the specified selection method including Akaike Information Criterion(AIC), the Corrected form of Akaike Information Criterion(AICc),Bayesian Information Criterion(BIC),Schwarz criterion(sbc),hannan and Quinn Information Criterion(HQ), Significant Levels(SL) and so on Specifies the significance level for entry Specifies the significance level for staying in the model tolerance Tolerance value for multicollinearity, default is 1e-7 Trace Statistic for multivariate regression analysis, including Wilks lamda ("Wilks"), Pillai Trace ("Pillai") and Hotelling-Lawley s Trace ("Hotelling")

6 stepwise Choose Chooses from the list of models at the steps of the selection process the model that yields the best value of the specified criterion. If the optimal value of the specified criterion occurs for models at more than one step, then the model with the smallest number of parameters is chosen. If you do not specify the Choose option, then the model selected is the model at the final step in the selection process Details Multivariate regression and univariate regression can be detected by parameter y, where numbers of elements in y is more than 1, then multivariate regression is carried out otherwise univariate regreesion Author(s) Junhui Li References Alsubaihi, A. A., Leeuw, J. D., and Zeileis, A. (2002). Variable selection in multivariable regression using sas/iml., 07(i12). Darlington, R. B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69(3), 161. Hannan, E. J., & Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society, 41(2), 190-195. Harold Hotelling. (1992). The Generalization of Student s Ratio. Breakthroughs in Statistics. Springer New York. Hurvich, C. M., & Tsai, C. (1989). Regression and time series model selection in small samples. Biometrika, 76(2), 297-307. Judge, & GeorgeG. (1985). The Theory and practice of econometrics /-2nd ed. The Theory and practice of econometrics /. Wiley. Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. Mathematical Gazette, 37(1), 123-131. Mckeon, J. J. (1974). F approximations to the distribution of hotelling s t20. Biometrika, 61(2), 381-383. Mcquarrie, A. D. R., & Tsai, C. L. (1998). Regression and Time Series Model Selection. Regression and time series model selection /. World Scientific. Pillai, K. C. S. (2006). Pillai s Trace. Encyclopedia of Statistical Sciences. John Wiley & Sons, Inc. R.S. Sparks, W. Zucchini, & D. Coutsourides. (1985). On variable selection in multivariate regression. Communication in Statistics- Theory and Methods, 14(7), 1569-1587. Sawa, T. (1978). Information criteria for discriminating among alternative regression models. Econometrica, 46(6), 1273-1291. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), pags. 15-18.

stepwise 7 Examples set.seed(4) dfy <- data.frame(matrix(c(rnorm(20,0,2),c(rep(1,10),rep(2,10)),rnorm(20,2,3)),20,3)) colnames(dfy) <- paste("y",1:3,sep="") dfx <- data.frame(matrix(c(rnorm(100,0,2),rnorm(100,2,1)),20,10)) colnames(dfx) <- paste("x",1:10,sep="") dfyx <- cbind(dfy,dfx) #for univariate regression y <- c("y1") notx <- c("y3") #for multivariate regression you can use this ym <- c("y1","y3") notxm <- NULL #* with continuous variable nested in class effect ClassY2 <- c("y2") #* without continuous variable nested in class effect Class0 <- NULL # without forced effect in regression model include0 <- NULL # force the 'Y2' into the regression model includey2 <- c("y2") selection <- 'stepwise' tolerance <- 1e-7 Trace <- "Pillai" sle <- 0.15 sls <- 0.15 #univariate regression for 'SBC' select and 'AIC' choose #without forced effect and continuous variable nested in class effect stepwise(dfyx, y, notx, include0, Class0, selection, "SBC", sle, sls, tolerance, Trace, 'AIC') #univariate regression for 'AICc' select and 'HQc' choose #with forced effect and continuous variable nested in class effect stepwise(dfyx, y, notx, includey2, ClassY2, selection, 'AICc', sle, sls, tolerance, Trace, 'HQc') #multivariate regression for 'HQ' select and 'BIC' choose #with forced effect and continuous variable nested in class effect stepwise(dfyx, ym, notxm, includey2, ClassY2, selection, 'HQ', sle, sls, tolerance, Trace, 'BIC')

Index Topic package StepReg-package, 2 Topic stepwise regression bestcandidate_rcpp, 3 stepwise, 5 bestcandidate_rcpp, 3 StepReg (StepReg-package), 2 StepReg-package, 2 stepwise, 5 8