T.J. Cleophas and A.H. Zwinderman, Machine Learning in Medicine, DOI 10.1007/978-94-007-5824-7, Springer Science+Business Media Dordrecht 2013

A
Abrahamowicz, M., 100
Akaike information criterion (AIC), 141
Analysis of covariance (ANCOVA), 2–4. See also Canonical regression
Analysis of variance (ANOVA) model, 2–4, 255
  canonical regression (see Canonical regression)
  mixed linear models, 66
Artificial intelligence, 4
  definition, 146
  history, 146–147
  multilayer perceptron modeling (see Back propagation (BP) artificial neural networks)
  radial basis function network (see Radial basis function network)
B
Back propagation (BP) artificial neural networks
  activity/inactivity phases, 147
  Gaussian distributions, 145, 148
  Haycock equation, 145, 152
  input, hidden and output layer, 147
  iteration/bootstrapping, 148
  linear regression analysis, 145
  ninety persons physical measurements and body surfaces, 148–151
  non-Gaussian method, 153
  variables nonlinear relationship, 148, 151
  weights matrices, 148
Bastien, T., 206
Binary logistic regression, 19, 173–175
Binary partitioning
  best cut-off level
    entropy method, 82–84
    ROC method, 81–83
  CART, 80
  data-based decision cut-off levels, 80
  decision trees, 85
  peripheral vascular disease, 80
  representative historical data classifications, 80
  splitting procedure, 81
Binomial distribution, 83
Bootstraps, 4
BP artificial neural networks. See Back propagation (BP) artificial neural networks
Breiman, L., 80
C
Canonical regression, 2, 4
  ANOVA/ANCOVA
    add-up sums, 226
    composite variables, 226
    multiple linear regression, 226
  disadvantages, 233
  drug efficacy scores, 227
  elastic net and lasso method, 232
  latent variables, 233
  linear regression, 233
  manifest variables, 233
  MANOVA/MANCOVA
    add-up sums, 226
    canonical weights, 231, 232
    collinearity, 228
    composite variables, 226
    correlation coefficient, 228, 231
    correlation matrix, covariates, 228–230
    multiple linear regression, 226
    multivariate tests, 228, 231
    Pillai's statistic and normal distributions, 227
  microarray gene expression levels, 227
  patients data-file, example, 234–239
Centroid clusters, 192
Chi-square goodness of fit, 88
Classification and regression trees (CART), 80
Collinearity, 9
  canonical regression, 228
  discriminant analysis, 222
  factor analysis, 169
  partial correlations, 57
Components, 4, 172–174
Cox regression
  with segmented time-dependent predictor
    blood pressure study and survival, 108
    cardiovascular event occurrence, 106–108
    logical expression, 106
  with time-dependent predictor
    disproportional hazard, 103
    elevated LDL-cholesterol, 105, 106
    LDL cholesterol level, 103
    multiple Cox regression, 108–109
    non-proportional hazards, 103
    VAR0006, 105, 106
    variables, 103–105
  without time-dependent predictor
    covariates, 103
    exponential models, 101
    hazard ratio, 102
    Kaplan–Meier curves, 101, 102
    risks of dying ratio, 103
C-reactive protein (CRP), 114, 115
Cronbach's alphas, 4–5, 8
  factor analysis, 169, 172, 173
  principal components analysis, 201
Cross-validation, 5
  regularization, 41, 42
Cytostatic drug efficacy, 185
D
Data dimension reduction, 5
Data mining, 5
Defays, D., 185
Density based clusters, 192
Discretization, 5, 40
Discriminant analysis, 5
  ANOVAs, 218
  collinearity, 222
  health recovery, 216
  latent variables, 217
  linear cause-effect, 221
  MANOVA, 216, 218, 221
  mean function score, 219
  multiple linear regression coefficients, 217
  multiple outcome variables, 216
  orthogonal discriminant functions, 218, 219
  orthogonal linear modeling, 216
  outcome variables, 218
  sepsis treatment with multi-organ, 218
  SPSS statistical software, 218
  subgroup analysis, 220–221
  test statistic of functions, 219
  treatment-group plots, 219, 220
Durbin–Watson test, 206
E
Eftekbar, B., 152, 164
Eigenvectors, 6, 171
Elastic net regression, 6
  regularization, 42, 44, 46
Exponential modeling, 135
F
Factor analysis, 5, 6, 8
  add-up scores, 167, 168
  advantages and disadvantages, 176
  binary logistic regression, 173–175
  collinearity, 169
  components, 172–174
  Cronbach's alpha, 169, 172, 173
  eigenvectors, 171
  factor analysis theory, 169–171
  vs. hierarchical cluster analysis, 193–194
  individual patients, health risk profiling, 175
  iterations, 171
  latent factors, 172
  loadings, 170
  magnitude, 170
  multicollinearity, 169
  multidimensional modeling, 172
  multiple logistic regression, 168
  original variables, 168–169
    coefficients, 173, 174
    test-retest reliability, 172, 173
  Pearson's correlation coefficient (R), 169
  sepsis patients data file, example, 177–181
  SPSS's data dimension reduction module, 167
  three factor factor-analysis, 167
  varimax rotation, 171
Factor analysis theory, 6–7, 169–171
Factor loadings, 7, 170
Fisher, R.A., 216
Franke, R., 158
Fuzzy logic modeling
  advantages, 251–252
  fuzzy memberships, 243, 246, 250
  fuzzy plots, 243, 246, 250
  fuzzy statistics, 251
  linguistic membership names, 243
  linguistic rules, 243
  propranolol, time-response effect
    input and output relationships, 247–248, 250–251
    pharmacodynamic effect, single oral dose, 247, 248
    pharmacodynamic relationship, 247, 249
    quadratic regression model, 247
  regression analysis, 241
  thiopental, dose-response effects
    input values, 244, 246–247
    induction dose and number of responders, 244, 245
    quantal pharmacodynamic effects, 243, 247
    statistical distribution, 243–244
  triangular fuzzy sets, 243
  universal space, 243
Fuzzy memberships, 7
Fuzzy modeling, 2, 3, 7
Fuzzy plots, 7
G
Gaussian curves, 12
Gaussian distributions, 83
  BP artificial neural networks, 145, 148
  item response modeling, 88
Gifi, A., 26
Goodness of fit (GoF) value, 204
H
Halekoh, U., 42
Hancock's equation, 162, 163
Haycock equation, 145, 152
Hazard ratio (HR), 102
Henderson, C.R., 66
Hierarchical cluster analysis, 2, 3, 7–8
  centroid clusters and density based clusters, 192
  collinearity, 185
  crystallization optimization, 185
  data analysis
    add-up sum, 186, 189–190
    dendrogram, 189, 191
    icicle plot, 189, 191
    linear regression with progression free interval, 189, 192
  drug efficacy, 184, 185
  explorative data mining, 192
  vs. factor analysis, 193–194
  flexibility, 192–193
  gastric cancer patients, cytostatic treatment
    genes expression levels, 185, 186
    variables correlation matrix, 186–188
  linear regression, 185
  oral thiopurines, 185
  platinum and fluorouracil chemo-resistance, 185
  SPSS statistical software, 183
Hojsgaard, S., 42
Hotelling, H., 227
Huynh–Feldt test, 68, 72
I
Item response modeling
  ceiling effects, 88
  vs. classical linear testing, 88
  clinical and laboratory diagnostic-testing
    analysis results, 93, 94
    item response scores and classical scores, 93, 95
    vascular-laboratory tests, 93
  disadvantages, 96–97
  logistic models, 94
  principles, 89–90
  psychological and intelligence tests, 88
  QOL assessment (see Quality of life (QOL) assessment)
Iterations, 8
  BP artificial neural networks, 148
  factor analysis, 171
K
Kernel frequency distribution modeling, 141
Kessler, R.C., 88
Klecka, W.R., 216
Kolmogorov–Smirnov (KS) goodness of fit test, 88
L
Lasso regression, 8
  optimal scaling regularization, 41, 42, 44, 45
Latent factors, 8, 172
Latent variables (LVs)
  canonical regression, 233
  discriminant analysis, 217
  principal components analysis, 198, 200
Learning, 8
Linear cause-effect, 5
Linear regression analysis
  BP artificial neural networks, 145
  radial basis function network, 159
  seasonality assessments, 114, 115, 124
Linguistic membership names, 9
Linguistic rules, 9
Loess modeling, 139–140
Logistic regression, 1, 3, 9
  binary, 19
  b-values, 20, 22, 23
  calculated odds, 21
  characteristics, 21
  disadvantages, 22
  endometrial cancer example, 22–23
  linear equation transformation, 18–19
  log linear models, 17
  odds of infarction and age, 18, 19
  predictive models, 22
  probability of events, 22
  probability prediction, 17, 18
  p-values, 20
  regression equation, 19
LVs. See Latent variables (LVs)
M
Machine learning, definition, 1, 9
Manifest variables (MVs)
  canonical regression, 233
  principal components analysis, 198, 200
McCulloch, W., 146
McLean, R.A., 66
Minsky, M.A., 146
Mixed linear models, 2, 3
  advantages and disadvantages, 75
  ANOVA model, 66
  placebo-controlled parallel group study, cholesterol treatment
    data adaptation, 68, 70–71
    data file, 67
    general linear model, 68, 69
    Huynh–Feldt test, 68
    hypercholesterolemia treatment, 67, 68
    mixed model analysis, 68, 70
    sphericity test, 68
  three treatment crossover study, sleeping pills effect
    data adaptation, 73, 74
    data file, 69, 71
    mixed model analysis, 73, 74
    p-value, 75
    sphericity test, 72, 73
    treatment effects, single group, 72
Monte Carlo methods, 4, 9
  regularization, 41
Multicollinearity, 9, 169
Multidimensional modeling, 10, 172
Multilayer perceptron modeling, 10
  BP artificial neural networks (see Back propagation (BP) artificial neural networks)
  Gaussian distributions, 145
  Haycock equation, 145
  linear regression analysis, 145
Multi-layer perceptron neural network, 158, 159, 163
Multiple linear regression, 217
Multiple logistic regression, 168
Multivariate analysis of covariance (MANCOVA), 2–4, 198. See also Canonical regression
Multivariate analysis of variance (MANOVA), 198, 203, 255
  canonical regression (see Canonical regression)
  discriminant analysis, 216, 218, 221
Multivariate machine learning methods, 10
Multivariate method, 5, 10
MVs. See Manifest variables (MVs)
N
Network, 10
Neural networks, 2, 10
Non-linear modeling
  Ace/Avas packages, 133–134
  background, 127
  Box–Cox transformation, 133, 134
  disadvantages, 141
  exponential modeling, 135
  Gaussian curves, 141
  kernel frequency distribution modeling, 141
  linearity testing
    curvilinear regression, 128
    non-linear data sets, 128, 129
    quadratic and cubic models, 128, 130
    squared correlation coefficient, 128
    standard models, regression analysis, 128, 130
  Loess modeling, 139–140
  logit and probit transformations, 131–133
  mechanical spline methods, 128
  objective, 127
  sinusoidal data, 134–135
  spline modeling
    computer graphics, 139
    low-order polynomial regression lines, 137, 138
    multidimensional smoothing, 139
    multiple linear regression lines, 137
    non-linear dataset, 136
    third order polynomial functions, 137, 138
    two-dimensional, 139
  trial and error method, 133
Non-linear relationships, 2, 3
O
Optimal scaling, 1–3, 10–11
  discretization
    bouncing betas, 31
    continuous predictor variable, 26, 27
    cross-validation, 28
    disadvantages, 31
    drug efficacy composite score, 29, 30
    elastic net regression, 28
    F-tests, 30
    Lasso regression, 28
    microarray gene expression levels, 29, 30
    Monte Carlo methods, 28
    multiple linear regression analysis, 26, 27
    outcome variable composite scores, 29
    overdispersion, 28
    overfitting, 28
    quadratic approximation, 29
    regularization, 28
    ridge regression, 29
    splines, 29
    SPSS module, 25
    250 subjects datafile, example, 32–37
    without regularization, 30, 31
  regularization
    bouncing betas, 46
    continuous variable conversion, 40
    cross-validation, 41, 42
    discretization, 40
    discretized variable correction, 40
    elastic net regression, 42, 44, 46
    unstable regression coefficients, 46
    k-fold vs. k-1 fold scale, 42
    Lasso regression, 41, 42, 44, 45
    microarray gene expression levels, 42
    Monte Carlo methods, 41
    overdispersion/overfitting, 41
    patients datafile, example, 47–52
    ridge path model, 43, 44
    ridge regression, 41–44
    ridge scale model, 43
    splines, 41
Orthogonal discriminant functions, 218, 219
Overdispersion/overfitting, 11
  regularization, 41
P
Partial correlation analysis, 2, 10
  cardiovascular factors, multiple regression, 56
  exercise and calorie intake effects, weight loss
    with age held constant, 62
    with calorie intake held constant, 61, 62
    clinical outcome, 61
    collinearity, 57
    covariate, 57–59
    data interaction, 63–64
    with exercise held constant, 61, 62
    higher order partial correlation analysis, 62
    interaction variable, 56–59
    linear correlation, 60–61
    linear regression, 59
    multiple linear regression, 57, 60
    r-square values, 60
    subgroups, 63
    variables correlation matrix, 57, 59
  partial regression analysis, 56
Partial least squares analysis, 2, 10
  add-up scores, 198
  advantages and disadvantages, 206
  clusters of variables, 203
  correlation coefficients, 201, 204
  data dimension reduction, 199, 205
  example datafile, 207–212
  GoF value, 204
  MANCOVA, 198
  MANOVA, 198
  multivariate linear regression, 201
  vs. principal components analysis, 204–205
  response variables, 200, 206
  r-values, 204
  square boolean matrix, 203
Partial regression analysis, 56
Pearson's correlation coefficient (R), 11, 169
PLS-Cox model, 206
Principal components analysis, 2, 12
  add-up scores, 203
  advantages and disadvantages, 206
  best fit coefficients, 201, 202
  Cronbach's alphas, 201
  data dimension reduction, 199, 205
  data validation, 201, 202
  example datafile, 207–212
  latent variables, 198, 200
  manifest variables, 198, 200
  MANOVA, 203
  multiple linear regression, 199, 202
  original variables, 201
  outcome variables, 203
  vs. partial least squares analysis, 204–205
  test-retest reliability, 201
Propranolol, time-response effect
  input and output relationships, 247–248, 250–251
  pharmacodynamic effect, single oral dose, 247, 248
  pharmacodynamic relationship, 247, 249
  quadratic regression model, 247
Q
Quality of life (QOL) assessment
  EAP scores, 92, 93
  Gaussian error model, 91
  5-item mobility-domain, 90, 91
  LTA-2 software program, 91
  normal Gaussian frequency distribution curve, 90, 92
R
Radial basis function network, 12
  black box modeling, 165
  correlation coefficient, 162
  Hancock's equation, 162, 163
  kurtosis of age, 163, 164
  linear regression analysis, 159
  multi-layer perceptron neural network, 158, 159, 163, 164
  90 persons body surfaces example, 159–161
  radial distant functions, 158
  sigmoidal function, 158
  skewness of height, 163, 164
  SPSS module neural networks, 159
  symmetric functions, 158
  three layer network, 159, 162
Radial basis functions, 12
Rasch, G., 88
Receiver operating characteristic (ROC) method, 81–83
Regularization, 12, 28
Ridge regression, 12
  discretization, 29
  regularization, 41–44
Ridge scale model, 42
Robinson, G.K., 66
Rosenblatt, F., 146
S
Sample learning, 9
Seasonality assessments
  autocorrelation
    causal factors, 119
    coefficients and standard errors, 116, 118
    C-reactive protein levels, 114, 115
    cross-correlation coefficients, 119
    definition, 114
    first vs. second summer pattern, 119–121
    inconsistent patterns, 119, 122
    lagcurves, 115–116
    linear regression analysis, 114, 115, 124
    original datacurve, 114, 115, 125
    partial autocorrelation, 116
    p-values, 118
  curvilinear regression methods, 124
  monthly CRP values, 116, 117, 123
  objective and methods, 113
  vs. para-seasonality, 124
  seasonal patterns, 113, 114
Segmented time-dependent predictor
  blood pressure study and survival, 108
  cardiovascular event occurrence, 106–108
  logical expression, 106
Serendipities, 256
Sibson, R., 185
Spearman, C., 168
Spline modeling
  computer graphics, 139
  low-order polynomial regression lines, 137, 138
  multidimensional smoothing, 139
  multiple linear regression lines, 137
  non-linear dataset, 136
  third order polynomial functions, 137, 138
  two-dimensional, 139
Splines, 12
  discretization, 29
  regularization, 41
Stampfer, M.J., 57
Stevens, J., 233
Subgroup discriminant analysis, 220–221
Supervised learning, 12
Supervised machine learning. See Discriminant analysis
T
Thiopental, dose-response effects
  input values, 244, 246–247
  induction dose and number of responders, 244, 245
  quantal pharmacodynamic effects, 243, 247
  statistical distribution, 243–244
Thiopurines, 185
Tibshirani, R., 40
Time-dependent predictor
  Cox regression (see Cox regression)
  methods and results, 99–100
  morbidity/mortality, 100
  SPSS program, 109
  SPSS statistical software, 99
  T_*covariate, 109
  time-dependent factor analysis, 99
Training data, 13
Trial and error method, 133
Triangular fuzzy sets, 13, 243
U
Uebersax, J., 87, 89, 97
Universal space, 13
Unsupervised learning, 13
V
Varimax rotation, 13
W
Waaijenberg, S., 232
Weights, 13
Willett, W., 57
Wold, H., 199
Y
Yule, G.U., 56, 114
Z
Zadeh, L.A.
Zwinderman, A.H., 232