Analysis of Hoge Religious Motivation Scale by Means of Combined HAC and PCA Methods

Ana Štambuk
Department of Social Work, Faculty of Law, University of Zagreb, Nazorova 5, HR- Zagreb, Croatia
E-mail: astambuk@inet.hr

Nikola Štambuk, Paško Konjevoda
Ruđer Bošković Institute, Bijenička cesta 5, HR- Zagreb, Croatia
E-mails: stambuk@irb.hr, pkonjev@irb.hr

Abstract. We combined Hierarchical Agglomerative Clustering (HAC) and Principal Components Analysis (PCA) to validate the Hoge Intrinsic Religious Motivation Scale and to investigate whether this consensus procedure provides an efficient technique for component extraction. Our results confirm the validity of the procedure and suggest that it may be a useful exploratory technique for data analysis in the social sciences.

Keywords. Principal components analysis, hierarchical clustering, social sciences, religious motivation.

1. Introduction

One of the main goals of exploratory statistical analysis is to identify relevant features and/or structural patterns in the data [, ]. Principal Components Analysis (PCA) is a popular statistical procedure often used in the social sciences for exploratory statistics, i.e. to reduce the number of variables in the dataset analysed in order to extract a few underlying patterns or groups of variables []. In addition to transforming the many variables of the initial dataset into a few main components, PCA may also help in understanding the data structure [, ]. The Hierarchical Agglomerative Clustering (HAC) procedure may also be used for exploratory analysis of the main components [, ]. Depending on the dataset analysed, HAC and PCA may lead to comparable and complementary component extraction [, ]. In this investigation we combined HAC and PCA analyses in order to evaluate whether this consensus procedure provides an efficient technique for component extraction.
The analysis was done on a standard social sciences example: the assessment of religious motivation.

2. Methods

2.1. Dataset

The Hoge Intrinsic Religious Motivation Scale is a standard instrument for the assessment of religious motivation [-5]. It observes statements about religious beliefs or experience [-5]. The scale is valid for the Croatian population, since the results do not differ from those reported for the USA [5]. The responses of 7 participants with no significant differences in gender and age were used for the analysis (man/woman=/5; age=56.6 ±.8 years, range -9) [5].

The ten items/questions (Q) of the Hoge Intrinsic Religious Motivation Scale are:

1. My faith involves all of my life,
2. Beliefs are less important than living a moral life,
3. One should seek God's guidance when making important decisions,
4. In my life, I experience the presence of the Divine (God),
5. Refuse to let religion influence everyday affairs,
6. Faith sometimes restricts my actions,
7. Nothing is as important as serving God,
8. Many more important things in life than religion,
9. Religious beliefs lie behind my whole approach to life,
10. Try hard to carry religion over into life's dealings.

Proceedings of the ITI 7 9 th Int. Conf. on Information Technology Interfaces, June 5-8, 7, Cavtat, Croatia

The answers are marked on a -5 scale. () denotes the statement that is definitely true and (5) the statement that is definitely not true for the participants. Score of indicates high awareness of spiritual issues and high religious motivation, while a score of 5 indicates no religious/spiritual motivation or understanding [, ]. Participants were also asked to evaluate the importance of religion for them (religiosity) on a -7 scale (=not important, 7=very important) [5]. Scores 6-7 were considered as highly important (A), -5 as moderate (B), and - as of very low or no importance (C).

2.2. PCA and Hierarchical Clustering

Hierarchical agglomerative clustering (HAC) and Principal Components Analysis (PCA) of the Hoge scale questions (Q1-Q10) were done with the Tanagra software (http://eric.univ-lyon.fr/~ricco/tanagra/en/tanagra.html).

Tanagra implements the HAC procedure known as Hybrid Clustering. First, low-level clusters are built with a fast clustering method such as K-Means or SOM; HAC then starts from these clusters and builds the dendrogram (Fig. ). The advantage of HAC is that the user can visualize the tree, guess the right partitioning, and prune the tree between the nodes. The PCA procedure is then carried out on the basis of the HAC results, which enables the HAC subgroups to be explained using PCA factors (Tables -, Fig. ).

Logistic regression based classification with respect to the class attributes gender, age and religiosity (Table 5) was done with the free software Weka [6]. The Weka logistic function classifier is based on a multinomial logistic regression model with a ridge estimator [7]. The algorithm is modified to handle instance weights [7]. For the classifier evaluation, class attributes must be nominal, while the other variables (Qs) may be ordinal or interval [6-8].

Table. Clusters of questions obtained by the analysis of Religious Motivation Scale.

Cluster | Size
cluster n | 5
cluster n | 5
cluster n | 9

Figure. Hierarchical agglomerative clustering of the subjects into main groups.
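The classification step described in this section can be illustrated with a minimal sketch. It assumes synthetic data and a hand-written ridge-penalised multinomial logistic regression in NumPy, fitted by gradient descent; it is not Weka's classifier and does not reproduce the study. The function names (`fit_ridge_logistic`, `predict`, `ten_fold_accuracy`), the ridge value, the learning rate and the simulated respondents are all illustrative assumptions. The sketch reports the two quantities that Table 5 lists: the percentage of correct classification on the fitted data and under ten-fold cross-validation.

```python
import numpy as np


def fit_ridge_logistic(X, y, n_classes, ridge=1e-2, lr=0.1, n_iter=500):
    """Multinomial logistic regression with an L2 (ridge) penalty,
    fitted by plain gradient descent. Returns a (features + 1, classes) matrix."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append intercept column
    Y = np.eye(n_classes)[y]                        # one-hot class targets
    W = np.zeros((Xb.shape[1], n_classes))
    for _ in range(n_iter):
        Z = Xb @ W
        Z -= Z.max(axis=1, keepdims=True)           # numerical stability
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)           # softmax probabilities
        grad = Xb.T @ (P - Y) / len(y)
        grad[:-1] += ridge * W[:-1]                 # penalise weights, not intercept
        W -= lr * grad
    return W


def predict(W, X):
    """Return the most probable class index for every row of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)


def ten_fold_accuracy(X, y, n_classes, n_folds=10, seed=0):
    """Fraction of subjects classified correctly when each fold is held out once."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    correct = 0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        W = fit_ridge_logistic(X[train], y[train], n_classes)
        correct += np.sum(predict(W, X[test]) == y[test])
    return correct / len(y)


# Synthetic stand-in for the questionnaire (NOT the study data): 90 hypothetical
# respondents, ten items scored 1-5, and a three-level class label (0, 1, 2).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 30)
raw = 3 + (y - 1)[:, None] + rng.normal(0, 1, (90, 10))
scores = np.clip(np.rint(raw), 1, 5)
X = scores - 3                                      # centre on the scale midpoint

train_acc = np.mean(predict(fit_ridge_logistic(X, y, 3), X) == y)
cv_acc = ten_fold_accuracy(X, y, 3)
print(f"% correct classification: {100 * train_acc:.1f}")
print(f"% ten-fold CV:            {100 * cv_acc:.1f}")
```

The gap between the two percentages indicates how much the fit on the full data overstates out-of-sample performance, which is why both columns are worth reporting.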

Table. Principal component analysis of the Hoge Religious Motivation Scale.

Axis | Eigenvalue | % variance | % cumulative
1 | .9 | 9.9% | 9.9%
2 | .66 | 6.6% | 65.89%
3 | .8 | 8.% | 7.99%
4 | .58 | 5.8% | 79.8%
5 | .5 | 5.7% | 85.%
6 | . | .97% | 89.7%
7 | . | .9% | 9.6%
8 | . | .6% | 95.%
9 | .5 | .5% | 97.8%
10 | . | .7% | .%
Tot. | | - | -

Table. Factor loadings (communality estimates) of PCA analyses

Attribute | Axis_1 Corr. | Axis_1 % (Tot.) | Axis_2 Corr. | Axis_2 % (Tot.) | Axis_3 Corr. | Axis_3 % (Tot.)
Q1 | .85 | 7 % (7 %) | -.5 | % (7 %) | . | % (7 %)
Q2 | -. | % ( %) | .5 | 5 % (5 %) | .78 | 6 % (97 %)
Q3 | .8 | 66 % (66 %) | -. | % (66 %) | .5 | (68 %)
Q4 | .86 | 7 % (7 %) | . | (75 %) | -. | % (75 %)
Q5 | .7 | % ( %) | .79 | 6 % (66 %) | -.8 | 8 % (7 %)
Q6 | .7 | 5 (5) | .5 | (5 %) | -. | (56 %)
Q7 | .85 | 7 % (7 %) | -.5 | (75 %) | . | % (8 %)
Q8 | -.8 | % ( %) | .8 | 69 % (7 %) | -. | % (7 %)
Q9 | .8 | 7 % (7 %) | .5 | (7 %) | -.5 | % (7 %)
Q10 | .8 | 7 % (7 %) | -.7 | % (7 %) | . | (7 %)
Var. Expl. | .9 | 9 % (9 %) | .66 | 7 % (66 %) | .8 | 8 % (7 %)

Figure. Analysis of Religious Motivation Scale by means of two dimensional PCA (three clusters). Correlation scatterplot (PCA Axis_ vs. PCA Axis_).

3. Results and Discussion

HAC procedure of unsupervised learning, based on Qs scores, extracted clusters of subjects with different religious motivation (Table, Fig. ). Following this, PCA analysis identified questions that discriminate the clusters of subjects identified by HAC (Tables -, Fig. ). The first group of questions explained the intrinsic religious motivation. It consisted of Q, Q, Q, Q6, Q7, Q9 and Q (Fig., Tables -). The second group was characterized by Q5 and Q8 and the third group by Q (Fig., Tables -).
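The combined procedure (fast pre-clustering, HAC on the resulting low-level clusters, then PCA) can be sketched as follows. This is a minimal NumPy illustration on synthetic data, not Tanagra's implementation: the helper names (`kmeans`, `ward_merge`), the choice of 12 pre-clusters, Ward's criterion as the linkage, and the simulated respondents are all assumptions made for the example. The PCA part works on the correlation matrix, so the eigenvalues sum to the number of items and the percentage of variance of each axis is its eigenvalue divided by that sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the questionnaire (NOT the study data):
# 90 hypothetical respondents in three latent groups, ten items each.
group = np.repeat(np.arange(3), 30)
X = np.array([1.5, 3.0, 4.5])[group][:, None] + rng.normal(0, 0.5, (90, 10))
Z = (X - X.mean(axis=0)) / X.std(axis=0)            # standardise every item


def kmeans(data, k, n_iter=20, seed=0):
    """Plain Lloyd k-means; returns centroids and the assignment of each row."""
    r = np.random.default_rng(seed)
    centroids = data[r.choice(len(data), k, replace=False)]
    for _ in range(n_iter):
        dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = data[assign == j].mean(axis=0)
    return centroids, assign


def ward_merge(centroids, sizes, n_groups):
    """Agglomerate weighted centroids with Ward's criterion until n_groups remain.
    Returns one group label per input centroid."""
    members = {i: [i] for i in range(len(centroids))}
    cent = {i: centroids[i].copy() for i in members}
    size = {i: float(sizes[i]) for i in members}
    while len(members) > n_groups:
        best = None
        keys = list(members)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # Ward cost: increase in within-cluster sum of squares on merging
                cost = (size[a] * size[b] / (size[a] + size[b])
                        * np.sum((cent[a] - cent[b]) ** 2))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        cent[a] = (size[a] * cent[a] + size[b] * cent[b]) / (size[a] + size[b])
        size[a] += size[b]
        members[a] += members.pop(b)
        del cent[b], size[b]
    labels = np.empty(len(centroids), dtype=int)
    for g, ids in enumerate(members.values()):
        labels[ids] = g
    return labels


# Step 1: low-level clusters; step 2: HAC on their centroids, cut at three groups.
pre_centroids, pre_assign = kmeans(Z, k=12)
sizes = np.bincount(pre_assign, minlength=12)
keep = np.flatnonzero(sizes > 0)                    # ignore empty pre-clusters
group_of_pre = np.full(12, -1)
group_of_pre[keep] = ward_merge(pre_centroids[keep], sizes[keep], n_groups=3)
subject_cluster = group_of_pre[pre_assign]

# Step 3: PCA on the correlation matrix of the ten items.
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
pct = 100 * eigvals / eigvals.sum()                 # % variance per axis
cum = np.cumsum(pct)                                # % cumulative
loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))  # item-axis correlations
coords = Z @ eigvecs[:, :2]                         # 2-D map of the subjects

print("cluster sizes:", np.bincount(subject_cluster))
for k in range(3):
    print(f"axis {k + 1}: eigenvalue {eigvals[k]:.2f}, "
          f"{pct[k]:.1f}% ({cum[k]:.1f}% cumulative)")
```

Plotting `coords` with one marker per value of `subject_cluster` would give a two-dimensional map of the kind shown in the figures, and the rows of `loadings` indicate which items drive each axis.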

The advantage of the combined HAC and PCA methods is that multidimensional data can be visualized as two-dimensional maps. Moreover, as shown in Fig., different subgroups (e.g. gender, age, religious attitudes) can also be displayed by means of different graphical patterns, which makes their comparison easier.

Table. Eigen vectors - factor scores of PCA.

Attribute | Mean | SD | Axis_1 | Axis_2 | Axis_3
Q1 | .6 | . | .8 | -. | .5
Q2 | .7 | . | -.5 | .9 | .87
Q3 | .7 | .5 | .7 | -. | .7
Q4 | . | . | .9 | . | -.
Q5 | . | . | .8 | .6 | -.
Q6 | .7 | . | . | . | -.
Q7 | . | . | .8 | -. | .
Q8 | .9 | .5 | -. | .65 | -.
Q9 | . | . | .8 | . | -.6
Q10 | .9 | . | .8 | -.5 | .6

Table 5. The results of logistic regression analysis for variables gender, age and religiosity.

Class | % correct classification | % ten-fold CV
Man | . | 8.7
Woman | 7. | 68.
Overall | 57.7 | 5.
Age - | .9 | .9
Age 5-6 | 6. | .8
Age 65 | 67. | 6.
Overall | 8. | 5.5
Very religious | 85. | 8.8
Religious | 5. | 6.
Not religious | 7.6 | 7.7
Overall | 7.7 | 7.7

Figure. Analysis of Religious Motivation Scale by means of two dimensional PCA (three clusters). Panels: a) PCA Axis_ vs. PCA Axis_ by GENDE_AB; b) PCA Axis_ vs. PCA Axis_ by AGE_ABC; c) PCA Axis_ vs. PCA Axis_ by REL_ABC.

The results of the logistic regression analysis (Table 5) confirm the validity of the Hoge scale [, 5] and show that intrinsic and extrinsic religious motivation depend on the person's religiosity. This holds mainly for very religious and non-religious individuals, while the group of moderately religious persons tends to be misclassified with the present set of questions (Q1-Q10).

Logistic regression analysis (Table 5) additionally showed that the variables gender and age do not affect the answers of the subjects (Qs); however, the visual output of HAC and PCA is more intuitive (Fig. ). The Hoge Intrinsic Religious Motivation Scale observes statements about religious beliefs [-5]. The percentage of variance explained by the three extracted principal components is sufficiently high (7.99% cumulative, Table ) and explains the dataset variation better than general factors like gender and age (Fig., Table 5). The second component consists of Q5 and Q8 and the third one of Q only (Fig., Tables -). However, they contribute considerably to the percentage of explained dataset variance (.7% cumulative, Table ).

The combined HAC and PCA method of exploratory data analysis provides useful information for subsequent formal statistical procedures, since it enables the identification of factors important for further statistical modeling based on supervised learning methods.

4. Acknowledgements

The work was supported in part by the Croatian Ministry of Science, Education and Sport (N. Štambuk and P. Konjevoda; Grant No. 98-9899-5).

5. References

[1] Gentle JE. Elements of Computational Statistics. New York: Springer-Verlag;.
[2] Everitt BS, Dunn G. Applied Multivariate Data Analysis. London: Arnold;.
[3] Hoge DR. (97). A validated Intrinsic Religious Motivation Scale. Journal for the Scientific Study of Religion 97; : 69-76.
[4] King M, Speck P, Thomas A. The Royal Free Interview for Spiritual and Religious Beliefs: Development and Validation of a Self-report Version. Psychological Medicine ; : 5-.
[5] Štambuk A. Stavovi Starijih Osoba Prema Smrti i Umiranju. PhD thesis: University of Zagreb;.
[6] Witten IH, Frank E. Data Mining. San Francisco: Morgan Kaufmann; 5.
[7] Le Cessie S, Van Houwelingen JC. Ridge Estimators in Logistic Regression. Applied Statistics 99; : 9-.
[8] Siegel S, Castellan NJ. Nonparametric Statistics for the Behavioral Sciences. Singapore: McGraw-Hill; 988.

Correspondence to: Nikola Štambuk