Analysis of Hoge Religious Motivation Scale by Means of Combined HAC and PCA Methods

Ana Štambuk
Department of Social Work, Faculty of Law, University of Zagreb, Nazorova 5, HR- Zagreb, Croatia
E-mail: astambuk@inet.hr

Nikola Štambuk, Paško Konjevoda
Ruđer Bošković Institute, Bijenička cesta 5, HR- Zagreb, Croatia
E-mails: stambuk@irb.hr, pkonjev@irb.hr

Abstract. We used a method of combined Hierarchical Agglomerative Clustering (HAC) and Principal Components Analysis (PCA) to validate the Hoge Intrinsic Religious Motivation Scale and to investigate whether this consensus procedure provides an efficient technique for component extraction. Our results confirm the validity of the procedure and suggest that it may be a useful exploratory technique for data analysis in the social sciences.

Keywords. Principal components analysis, hierarchical clustering, social sciences, religious motivation.

1. Introduction

One of the main goals of exploratory statistical analysis is to identify relevant features and/or structural patterns in the data [, ]. Principal Components Analysis (PCA) is a popular statistical procedure, often used in the social sciences for exploratory statistics, i.e. to reduce the number of variables in the analysed dataset in order to extract a few underlying patterns or groups of variables []. In addition to transforming the many variables of the initial dataset into a few main components, PCA may also help in understanding the data structure [, ]. The Hierarchical Agglomerative Clustering (HAC) procedure may also be used for exploratory analysis of the main components [, ]. Depending on the dataset analysed, HAC and PCA could lead to comparable and complementary component extraction [, ]. In this investigation we used a method of combined HAC and PCA analyses in order to evaluate whether this consensus procedure provides an efficient technique for component extraction.
The analysis was done on a standard social sciences example, the assessment of religious motivation.

Proceedings of the ITI 7 9 th Int. Conf. on Information Technology Interfaces, June 5-8, 7, Cavtat, Croatia

2. Methods

2.1. Dataset

The Hoge Intrinsic Religious Motivation Scale is a standard instrument for the assessment of religious motivation [-5]. It observes statements about religious beliefs or experience [-5]. The scale is valid for the Croatian population since the results do not differ from the ones reported for the USA [5]. The responses of 7 participants with no significant differences in gender and age were used for the analysis (man/woman=/5; age=56.6 ±.8 years, range -9) [5].

The ten items/questions (Q) of the Hoge Intrinsic Religious Motivation Scale are:

1. My faith involves all of my life,
2. Beliefs are less important than living a moral life,
3. One should seek God's guidance when making important decisions,
4. In my life, I experience the presence of the Divine (God),
5. Refuse to let religion influence everyday affairs,
6. Faith sometimes restricts my actions,
7. Nothing is as important as serving God,
8. Many more important things in life than religion,
9. Religious beliefs lie behind my whole approach to life,
10. Try hard to carry religion over into life's dealings.

The answers are marked on a -5 scale. () denotes the statement that is definitely true and (5) the statement that is definitely not true for the participants. Score of indicates high awareness of spiritual issues and high religious motivation, while a score of 5 indicates no religious/spiritual motivation or understanding [, ]. Participants were also asked to evaluate the importance of religion for them (religiosity) on a -7 scale (=not important, 7=very important) [5]. Scores 6-7 were considered as highly important (A), -5 as moderate (B), and - as of very low or no importance (C).

2.2. PCA and hierarchical clustering

Hierarchical agglomerative clustering (HAC) and Principal Components Analysis (PCA) of the Hoge scale questions (Q-Q) were done with Tanagra software.. (http://eric.univ-lyon.fr/~ricco/tanagra/en/tanagra.html). Tanagra implements the HAC procedure known as Hybrid Clustering. First, low-level clusters are built with a fast clustering method such as K-Means or SOM; then HAC starts from these clusters and builds the dendrogram (Fig. ). The advantage of HAC is that the user can visualize the tree, guess the right partitioning and prune the tree between the nodes. Following this, PCA is done based on the results of HAC. This enables explanation of the HAC subgroups using PCA factors (Tables -, Fig. ).

Logistic regression based classification with respect to the class attributes gender, age and religiosity in Table 5 was done with a free software Weka.. [6]. The Weka logistic function classifier is based on a multinomial logistic regression model with a ridge estimator [7]. The algorithm is modified to handle instance weights [7]. For the classifier evaluation, class attributes must be nominal, while the other variables (Qs) may be ordinal or interval [6-8].

Table. Clusters of questions obtained by the analysis of Religious Motivation Scale.

Cluster      Size
cluster n    5
cluster n    5
cluster n    9

Figure. Hierarchical agglomerative clustering of the subjects into main groups.
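The two-stage hybrid clustering idea described in Section 2.2 (fast low-level clustering, then agglomeration of the resulting clusters) can be illustrated with a minimal pure-Python sketch. It is not Tanagra's implementation: the farthest-point K-Means start, the Ward-style merge cost, the function names and the synthetic five-point answers are all assumptions made for illustration only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(rows):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def kmeans(points, k, iters=20):
    """Lloyd's K-Means with a deterministic farthest-point start (assumed)."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = centroid(members)
    return labels

def hybrid_cluster(points, n_low=6, n_final=3):
    """Stage 1: K-Means low-level clusters. Stage 2: merge them bottom-up."""
    low = kmeans(points, n_low)
    groups = []  # each group: (indices of member points, centroid)
    for j in range(n_low):
        idx = [i for i, lab in enumerate(low) if lab == j]
        if idx:
            groups.append((idx, centroid([points[i] for i in idx])))
    heights = []  # merge costs, i.e. what a dendrogram would display
    while len(groups) > n_final:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                na, nb = len(groups[a][0]), len(groups[b][0])
                # Ward-style cost: increase in within-cluster sum of squares
                cost = na * nb / (na + nb) * sq_dist(groups[a][1], groups[b][1])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        idx = groups[a][0] + groups[b][0]
        merged = (idx, centroid([points[i] for i in idx]))
        groups = [g for i, g in enumerate(groups) if i not in (a, b)] + [merged]
        heights.append(cost)
    labels = [None] * len(points)
    for c, (idx, _) in enumerate(groups):
        for i in idx:
            labels[i] = c
    return labels, heights

# Synthetic answers to ten items (NOT the study data): three groups of eight
# hypothetical respondents answering near 1, near 3 and near 5.
def respondent(base, step, r):
    return [base + (step if (r + q) % 5 == 0 else 0) for q in range(10)]

answers = [respondent(base, step, r)
           for base, step in ((1, 1), (3, 1), (5, -1))
           for r in range(8)]
labels, heights = hybrid_cluster(answers, n_low=6, n_final=3)
```

Cutting the merge sequence at `n_final` plays the role of pruning the tree; in an interactive tool the cut would instead be chosen by inspecting the dendrogram, with `heights` giving the levels at which the low-level clusters were joined.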
Table. Principal component analysis of the Hoge Religious Motivation Scale.

Axis   Eigenvalue   % variance   % cumulative
1      .9           9.9%         9.9%
2      .66          6.6%         65.89%
3      .8           8.%          7.99%
4      .58          5.8%         79.8%
5      .5           5.7%         85.%
6      .            .97%         89.7%
7      .            .9%          9.6%
8      .            .6%          95.%
9      .5           .5%          97.8%
10     .            .7%          .%
Tot.                -            -

Table. Factor loadings (communality estimates) of PCA analyses.

             Axis                   Axis                   Axis
Attribute    Corr.   % (Tot.)       Corr.   % (Tot.)       Corr.   % (Tot.)
Q1           .85     7 % (7 %)      -.5     % (7 %)        .       % (7 %)
Q2           -.      % ( %)         .5      5 % (5 %)      .78     6 % (97 %)
Q3           .8      66 % (66 %)    -.      % (66 %)       .5      (68 %)
Q4           .86     7 % (7 %)      .       (75 %)         -.      % (75 %)
Q5           .7      % ( %)         .79     6 % (66 %)     -.8     8 % (7 %)
Q6           .7      5 (5)          .5      (5 %)          -.      (56 %)
Q7           .85     7 % (7 %)      -.5     (75 %)         .       % (8 %)
Q8           -.8     % ( %)         .8      69 % (7 %)     -.      % (7 %)
Q9           .8      7 % (7 %)      .5      (7 %)          -.5     % (7 %)
Q10          .8      7 % (7 %)      -.7     % (7 %)        .       (7 %)
Var. Expl.   .9      9 % (9 %)      .66     7 % (66 %)     .8      8 % (7 %)

Figure. Analysis of Religious Motivation Scale by means of two dimensional PCA (three clusters). Panel title: Correlation scatterplot (PCA Axis_ vs. PCA Axis_).

3. Results and Discussion

The HAC procedure of unsupervised learning, based on the Q scores, extracted clusters of subjects with different religious motivation (Table, Fig. ). Following this, PCA identified the questions that discriminate the clusters of subjects identified by HAC (Tables -, Fig. ). The first group of questions explained the intrinsic religious motivation. It consisted of Q, Q, Q, Q6, Q7, Q9 and Q (Fig., Tables -). The second group was characterized by Q5 and Q8 and the third group by Q (Fig., Tables -).
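The quantities reported in the PCA tables are linked by simple arithmetic, which the following sketch illustrates on a made-up 4x4 correlation matrix (not the study data). It assumes PCA on the correlation matrix, so that each eigenvalue divided by the number of variables gives the share of variance, and that the correlation of a variable with an axis equals its eigenvector entry times the square root of the eigenvalue. The Jacobi eigenvalue routine and all names are illustrative choices, not the software used in the paper.

```python
import math

def jacobi_eigen(matrix, sweeps=50, tol=1e-24):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # rotation angle that zeroes the (p, q) entry
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # column update: a <- a J, v <- v J
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # row update: a <- J^T a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda i: -a[i][i])
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]  # one vector per axis
    return values, vectors

# Hypothetical correlation matrix of four standardized variables.
corr = [[1.0, 0.7, 0.6, -0.1],
        [0.7, 1.0, 0.5, 0.0],
        [0.6, 0.5, 1.0, 0.2],
        [-0.1, 0.0, 0.2, 1.0]]

values, vectors = jacobi_eigen(corr)
n_vars = len(corr)

# Eigenvalue table: share of total variance per axis and running total.
pct_variance = [100.0 * val / n_vars for val in values]
pct_cumulative = [sum(pct_variance[:i + 1]) for i in range(n_vars)]

# Loadings: correlation of variable j with each axis.
loadings = [[vec[j] * math.sqrt(max(val, 0.0)) for val, vec in zip(values, vectors)]
            for j in range(n_vars)]

# Squared loadings: share of variable j's variance carried by each axis,
# with a running total across axes.
pct_by_axis = [[100.0 * load ** 2 for load in row] for row in loadings]
pct_total = [[sum(row[:i + 1]) for i in range(n_vars)] for row in pct_by_axis]
```

Under these assumptions the eigenvalues sum to the number of variables, the cumulative variance reaches 100% on the last axis, and for every variable the squared loadings over all axes sum to 100%. Truncating `pct_total` at the third axis gives the kind of per-variable running total that a three-axis loading table would report.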
The advantage of combined HAC and PCA methods is that multidimensional data can be visualized as two-dimensional maps. Moreover, different subgroups (e.g. gender, age, religious attitudes, etc.) can also be displayed by means of different graphical patterns, as shown in Fig., which makes their comparison easier.

Figure. Analysis of Religious Motivation Scale by means of two dimensional PCA (three clusters). Panels: a) PCA Axis_ vs. PCA Axis_ by GENDE_AB (classes A, B); b) PCA Axis_ vs. PCA Axis_ by AGE_ABC (classes A, B, C); c) PCA Axis_ vs. PCA Axis_ by REL_ABC (classes A, B, C).

Table. Eigen vectors - factor scores of PCA.

Attribute   Mean   SD    Axis   Axis   Axis
Q1          .6     .     .8     -.     .5
Q2          .7     .     -.5    .9     .87
Q3          .7     .5    .7     -.     .7
Q4          .      .     .9     .      -.
Q5          .      .     .8     .6     -.
Q6          .7     .     .      .      -.
Q7          .      .     .8     -.     .
Q8          .9     .5    -.     .65    -.
Q9          .      .     .8     .      -.6
Q10         .9     .     .8     -.5    .6

Table 5. The results of logistic regression analysis for variables gender, age and religiosity.

                  % correct classification   % ten-fold CV
Man               .                          8.7
Woman             7.                         68.
Overall           57.7                       5.

Age -             .9                         .9
Age 5-6           6.                         .8
Age 65            67.                        6.
Overall           8.                         5.5

Very religious    85.                        8.8
Religious         5.                         6.
Not religious     7.6                        7.7
Overall           7.7                        7.7

The results of logistic regression analysis (Table 5) confirm the validity of the Hoge scale [, 5] and show that intrinsic and extrinsic religious motivation depend on the person's religiosity, mainly for very religious and non-religious individuals, while the group of moderately religious persons tends to be misclassified with the present set of questions (Q-Q).

Logistic regression analysis (Table 5) additionally showed that the variables gender and age do not affect the answers of the subjects
(Qs); however, the visual output of HAC and PCA is more intuitive (Fig. ). The Hoge Intrinsic Religious Motivation Scale observes statements about religious beliefs [-5]. The percentage of variance explained by the three extracted principal components is sufficiently high (7.99% cumulative, Table ) and explains the dataset variation better than general factors like gender and age (Fig., Table 5). The second component consists of Q5 and Q8 and the third one of Q only (Fig., Tables -). Nevertheless, they contribute considerably to the percentage of explained dataset variance (.7% cumulative, Table ). The combined method of HAC and PCA exploratory data analysis provides useful information for subsequent formal statistical procedures, since it enables the identification of factors important for further statistical modeling based on supervised learning methods.

4. Acknowledgements

The work was supported in part by the Croatian Ministry of Science, Education and Sport (N. Štambuk and P. Konjevoda; Grant No. 98-9899-5).

5. References

[1] Gentle JE. Elements of Computational Statistics. New York: Springer-Verlag;.
[2] Everitt BS, Dunn G. Applied Multivariate Data Analysis. London: Arnold;.
[3] Hoge DR. A validated Intrinsic Religious Motivation Scale. Journal for the Scientific Study of Religion 97; : 69-76.
[4] King M, Speck P, Thomas A. The Royal Free Interview for Spiritual and Religious Beliefs: Development and Validation of a Self-report Version. Psychological Medicine ; : 5-.
[5] Štambuk A. Stavovi Starijih Osoba Prema Smrti i Umiranju. PhD thesis: University of Zagreb;.
[6] Witten IH, Frank E. Data Mining. San Francisco: Morgan Kaufmann; 5.
[7] Le Cessie S, Van Houwelingen JC. Ridge Estimators in Logistic Regression. Applied Statistics 99; : 9-.
[8] Siegel S, Castellan NJ. Nonparametric Statistics for the Behavioral Sciences. Singapore: McGraw-Hill; 988.

Correspondence to: Nikola Štambuk