The Comprehensive Approach to Analyzing Multivariate Constructs. Ryne A. Sherman & David G. Serfass. Florida Atlantic University.


Article In Press, Journal of Research in Personality (Subject to Final Copy Editing)

Author Notes

Ryne A. Sherman, Florida Atlantic University; David G. Serfass, Florida Atlantic University. Correspondence regarding this article may be addressed to Ryne Sherman by email at rsherm13@fau.edu. All statistical analyses were conducted using R (R Core Team, 2014). We thank Dustin Wood for comments on a prior draft of this article. All errors and omissions remain our own.

Running Head: ANALYZING MULTIVARIATE CONSTRUCTS

Abstract

Many psychological constructs of interest to personality psychologists, such as personality, behavior, and emotions, are made up of many variables. Moreover, similarity metrics, such as self-other agreement, profile similarity, or behavioral consistency, result from calculations conducted across many variables. When analyzed using a comprehensive approach, such multivariate constructs present unique analytic challenges. Such challenges are not well addressed in standard graduate statistics textbooks, nor are the necessary tools presently available in standard commercial software. This article introduces the multicon package, freely available for the R statistical environment, designed to aid researchers interested in taking a comprehensive approach to analyzing multivariate constructs. Realistic examples from personality psychology are provided to demonstrate the utility of this package.

Keywords: multivariate constructs; profile correlations; R; replicability

Is personality related to behavior? How do extraverts behave differently from introverts? How well do two people agree about what someone else's personality is like? How accurately can we judge someone else's personality? How similar/consistent are people or situations? Personality scientists are often concerned with these sorts of questions and many more like them. However, answering questions such as these can be quite complicated. To see why, compare these questions to another question: What is the relationship between a person's height and weight? A key difference is that the constructs of interest in the first set of questions are multivariate, while the constructs in the latter question are not. Multivariate constructs, as the name implies, refer to psychological constructs that consist of many psychological variables.¹ Many constructs of interest to personality psychologists are multivariate in nature: personality, behavior, emotions, motives, situations, etc. The difficulty with multivariate constructs is that they make answering questions like those posed at the outset challenging. For example, answering the question about the relationship between personality and behavior requires, at minimum, some definition of what is meant by personality and behavior. Depending on one's particular perspective, the multivariate construct of personality might include thousands of traits (Allport & Odbert, 1936), one hundred (Block, 1961), or merely a handful (i.e., 5; McCrae & Costa, 2008). Regardless, most personality scientists recognize that personality is a multivariate construct. Behavior is also a multivariate construct, although arguably psychologists have put less effort into taxonomizing behavior than personality (Furr, 2009). There are roughly two strategies psychologists have used to deal with the problem of multivariate constructs.² The first strategy reduces the construct(s) of interest to a smaller

number (e.g., 1-6) of more mentally tractable, often empirically derived, essential variables. We refer to this strategy as the essential approach. For example, instead of personality (broadly construed) one might focus on just a single trait (e.g., extraversion) or a subset of broad traits (e.g., the Big 5). Likewise, instead of behavior (broadly construed) one might focus on just a single behavior (e.g., talkativeness) or on a subset of broad behaviors (e.g., interpersonal behaviors from the Interpersonal Circumplex).

The second strategy for dealing with the problem of multivariate constructs tries to avoid data reduction as much as possible, preferring to comprehensively assess and analyze the many relationships between the constructs of interest. We refer to this strategy as the comprehensive approach (Sherman & Wood, 2014). A researcher employing this approach may use measures designed such that each item represents a distinct characteristic, such as the California Adult Q-set (CAQ: Block, 1961) or the Inventory for Individual Differences in the Lexicon (IIDL: Wood, Nye, & Saucier, 2010). Alternatively, a comprehensive approach may even employ measures designed to assess essential variables (e.g., the NEO PI-R: Costa & McCrae, 1992; the Big Five Inventory: John & Srivastava, 1999; the HEXACO-PI-R: Lee & Ashton, 2004), but treat each item as if it were to be analyzed separately (cf. Biesanz, 2010; Biesanz & Human, 2010; Human & Biesanz, 2011a, b).

There are strengths and weaknesses to both approaches. The essential approach reduces complex multivariate constructs such as personality and behavior into mentally tractable subsets. This makes the research conceptually easier to transmit to other scientists and beyond.
The comprehensive approach, on the other hand, can be mentally taxing (i.e., who wants to look at a correlation matrix with 100 × 67 = 6700 unique elements?; see the section "Are these two multivariate constructs related?"). An additional advantage of the essential approach is that it can drastically

reduce the number of variables analyzed, resulting in lower Type I error rates. The comprehensive approach often involves computing a large number of correlations and risks identifying noise as signal. However, the essential approach may miss or obscure associations between the constructs of interest (cf. Brown & Sherman, in press; Fast & Funder, 2008; Hirsh, DeYoung, Xu, & Peterson, 2010). The comprehensive approach is less likely to miss or obscure such associations. Lastly, both the essential and comprehensive approaches can be used to answer questions about agreement, similarity, or consistency at the nomothetic (e.g., item) level. However, comprehensive approaches, which include more variables, may be superior for addressing these questions at the idiographic (e.g., person, profile) level because the increased number of variables increases the reliability of such profiles.

A perhaps less well recognized difference between the essential and comprehensive approaches is that the statistical tools for conducting analyses from an essential approach are well described in graduate statistics textbooks, widely available in standard commercial software (e.g., SAS, SPSS, Excel), and easy to implement. The comprehensive approach, on the other hand, comes with a unique set of problems (e.g., how to handle so many variables, how to appropriately test for profile similarity) requiring different data analytic methods. Such methods are not (a) well described in textbooks, (b) widely available in standard commercial software, or (c) easy to implement. This article introduces the multicon package, an R package offering functions designed to deal with the problems inherent in the comprehensive approach to handling multivariate constructs (Sherman, 2014).
In this article, we provide examples of realistic questions a personality scientist may encounter and show how a researcher using a comprehensive approach might use the functions available in the multicon package to address

these questions. Table 1 provides a summary of the types of questions we address in this article along with the functions from the multicon package used to address them. All datasets used in these examples are built into the multicon package, making it easy to follow along.³ Although we refer to differences between the essential and comprehensive approaches to handling multivariate constructs, this article is not meant to create, or resolve, a conflict between these two approaches. Indeed, as noted previously, both approaches have strengths and weaknesses. As such, this article will primarily focus on analytic issues involved in using a comprehensive approach and describe the tools provided by the multicon package to help resolve them.

Are these two multivariate constructs related?

We began by asking what appears to be a simple question: Is personality related to behavior? Let us say that we have measured personality with the 100-item CAQ (Block, 1961) and behavior with the 67-item Riverside Behavioral Q-sort (RBQ: Funder, Furr, & Colvin, 2000; Furr, Wagerman, & Funder, 2010). The essential approach to this question would be to first, for both personality and behavior, reduce the number of items measured to some essential subset. Such subsets could be derived empirically (e.g., factor analysis, principal components) or theoretically (e.g., the interpersonal circumplex; see Markey, Funder, & Ozer, 2003). The second step using the essential approach would then be to examine the associations (correlations) between the resultant subsets of variables. Almost all software packages, commercial or otherwise, are designed to make such analyses easy and convenient. A comprehensive approach to this question, though, would aim to analyze the full set of correlations between all 100 personality items and the 67 behaviors. Calculating such a correlation matrix is usually quite easy in just about any statistical package.
However, as previously noted, perusing a matrix of 6700 correlations will likely prove mentally

intractable. Thus, an alternative method for quantifying the degree of relationship between personality and behavior is needed. One method is to count the total number of statistically significant correlations in the matrix (cf. Block, 1960). Another is to determine if the average magnitude amongst the 6700 correlations is larger than one would expect if the constructs were not related (Sherman & Funder, 2009). Following Sherman and Funder (2009), a randomization test can be used to do both of these simultaneously. The test randomly reassigns CAQ profiles to RBQ profiles, creating a pseudo dataset, and calculates both the total number of statistically significant correlations and the average absolute r amongst the 6700 correlations in this pseudo dataset. To better illustrate this process, imagine picking up each subject's CAQ profile (keeping all 100 scores intact) and randomly reassigning this profile to a subject's RBQ profile. In doing so, one is simulating a random relationship between personality and behavior, while maintaining the dependencies (covariation) within the multivariate constructs. Next, one calculates the correlation matrix on this pseudo dataset and records the number of statistically significant correlations and the average absolute r of this correlation matrix. These numbers represent simulated values under a model of a random relationship between personality and behavior. Repeating this procedure many times allows for the formation of a sampling distribution, to which we can compare the observed results from the original dataset. Calculating the proportion of simulated values greater than or equal to the observed values (for the number statistically significant and the average absolute r, respectively) yields a p-value indicating the probability of obtaining results at least as large as those originally observed by chance alone.
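The logic of this randomization procedure can be sketched in a few lines of base R. This is only an illustration on simulated data, not the implementation used by the multicon package: the matrices x and y, the helper summarize_r, and the choice of 500 simulations are all hypothetical stand-ins chosen to keep the example small.

```r
# Base-R sketch of the randomization test described above.
# Simulated data; NOT the multicon implementation. All names are hypothetical.
set.seed(1)
n <- 100                               # number of subjects
x <- matrix(rnorm(n * 5), n, 5)        # stand-in for one multivariate construct
y <- matrix(rnorm(n * 4), n, 4)        # stand-in for the other construct
y[, 1] <- y[, 1] + 0.5 * x[, 1]        # build in one real association

# Average absolute r and number of significant correlations
# in the cross-correlation matrix between x and y
summarize_r <- function(x, y, alpha = 0.05) {
  r  <- cor(x, y)
  df <- nrow(x) - 2
  p  <- 2 * pt(-abs(r * sqrt(df / (1 - r^2))), df)  # two-sided t-test for each r
  c(abs_r = mean(abs(r)), n_sig = sum(p < alpha))
}

observed <- summarize_r(x, y)

# Reassign whole x profiles (rows kept intact) to y profiles at random,
# recomputing both summaries on each pseudo dataset
sims <- 500
null_dist <- replicate(sims, summarize_r(x[sample(n), , drop = FALSE], y))

# Proportion of simulated values greater than or equal to the observed values
p_values <- rowMeans(null_dist >= observed)
p_values
```

Shuffling entire rows, rather than individual scores, is what preserves the covariation within each construct while breaking any link between the two.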
Conducting such an analysis using standard commercial software is either not possible or would require an arduous amount of programming. The rand.test function in the multicon package conducts such an analysis. In this example, we use the rand.test function to determine

whether personality (as measured by the CAQ) has an overall relationship with behavior (as measured by the RBQ).

    install.packages("multicon") # Only if this is the first time using this package
    library(multicon) # Load the multicon package
    data(caq) # Loading the CAQ dataset
    data(beh.comp) # Loading the behavior dataset
    rand.test(caq, beh.comp, sims=10000) # The analysis; could take a minute or so

It should be noted that because the sims argument is set to 10,000, which is ten times more than the default value, this analysis may take 30 seconds or more. The output from this analysis is a list with two objects: one for the average absolute correlation ($AbsR) and the other for the number of statistically significant results ($Sig).⁴

[Output: two tables, $AbsR (Average Absolute r) and $Sig (Number Significant), each reporting N, Observed, Exp. By Chance, Standard Error, p, upper and lower bounds for p, and a percentile column; the numeric values did not survive extraction.]

There are 205 valid cases (listwise deletion is used). The observed average absolute r was .0699. This can be compared to the value expected by chance alone, which is .0559 with a standard error of […]. The resulting probability (p-value) of observing a value of .0699 under a null model of no association between personality and behavior is < […]. A similar list of findings is reported for the number of statistically significant results, showing 790 observed statistically significant associations, a null expected value of 335, and a p-value of < […]. Given the arbitrariness of statistical significance levels, the results based on the average absolute r values are usually

preferred (Sherman & Funder, 2009). Overall, these results demonstrate that the relationship between the multivariate constructs of personality and behavior is much greater than one would expect by chance alone. In other words, personality really does seem to be related to behavior.

How is a particular variable of interest related to a multivariate construct?

The prior example concerns the case where a researcher is interested in the relationship between two multivariate constructs. However, sometimes researchers are interested in the relationship between a single variable of interest and some other multivariate construct. Notable examples include: who is at risk of abusing drugs (Block, Block, & Keyes, 1988; Shedler & Block, 1990; Walton & Roberts, 2004), how is childhood personality related to adult political orientation (Block & Block, 2006), what kinds of people are liked by others (Wortman & Wood, 2011), what kinds of people are likely to procrastinate (Watson, 2001), or how is a particular personality trait associated with adult behavior (Nave, Sherman, Funder, Hampson, & Goldberg, 2010). To use an example with real data, let us say that we are interested in the association between trait extraversion and the aforementioned 67 behaviors from the RBQ. Using an essential approach, we might first attempt to empirically reduce the 67 behaviors to a more manageable set. Then, after identifying such a set, we might correlate these behavioral dimensions with extraversion.⁶ A comprehensive approach to the question of the relationship between extraversion and behavior would be more interested in the correlations between extraversion and all 67 behaviors. Computing these correlations (and the associated t-tests) is easily done with just about any data analytic software.
Consistent with tradition using this approach (Block, 1961; Funder, 2013), an abbreviated table (i.e., just those reaching some level of statistical significance) of these

correlations is shown in Table 2. Because such tables are common in work with multivariate constructs, but sometimes arduous to put together, the q.cor function in the multicon package generates the information for such tables. Moreover, an object resulting from the q.cor function can be quickly summarized into a tidy table by passing it to R's generic print function.

    data(beh.comp) # Loading the behavioral composites dataset
    data(rspdata) # Loading the RSP data set to get extraversion scores
    ext.obj1 <- q.cor(rspdata$sext, beh.comp, sex=rspdata$ssex, fem=1, male=2, sims=1)
    data(rbqv3.items) # Loading the item content for the RBQ
    print(ext.obj1, rbqv3.items, "RBQ", short=TRUE) # Viewing the results easily

The q.cor function takes several arguments. The first argument is the variable of interest (in this case extraversion). The second argument is the multivariate construct of interest (in this case the RBQ scores). The third argument is a variable denoting the sex of the participants. Traditionally, research using this approach examines the correlations for the full sample and separately by sex (Block, 1961). However, any binary variable can be passed to this argument. The fourth and fifth arguments tell the q.cor function the codes for the aforementioned binary variable for females and males, respectively. Finally, the sims argument tells the function how many randomly simulated datasets to use for the randomization test (discussed shortly). For simplicity, we have set this number to 1 at this point.⁷ This example also passes four arguments to the print function. The first is the object created by q.cor just discussed. The second is a vector containing the item content for the behavioral items. The third argument ("RBQ") is a character string giving a short abbreviation for the list of behavioral items. In practice, neither of these latter two arguments needs to be included.
The print function will create generic item names if these arguments are not specified (e.g., item1, item2). The fourth argument (short=TRUE) returns only an abbreviated list of the results (i.e., the same as those in Table 2) by removing any items that do not have a p-value of less than

[…] for the combined sample or less than .05 for either sex. By default (short=FALSE), the full list of items and their correlations is returned. Executing the code just described generates a table similar to that shown in Table 2. It is worth noting that the vector correlation between the full sets of correlations for women and men (i.e., the 67 correlations for women correlated with the 67 correlations for men) is returned (r = .67) as an indicator of the consistency of the results across sex. The value is reported in the note at the bottom of Table 2. The main results in Table 2 are based on 67 correlations and significance tests. As such, we are bound to find both some large correlations and statistically significant results, even if the data were generated randomly. What is needed is a statistic that can establish whether the pattern of correlations shown in Table 2 is more than just noise. The aforementioned rand.test function in the multicon package does just that. In this case, however, instead of randomly reassigning entire personality profiles to behavioral profiles, only the extraversion scores for each subject are randomly reassigned to behavioral profiles to create pseudo datasets. Otherwise, the procedures (i.e., calculating and recording the average absolute r and the number significant on each pseudo dataset to form a sampling distribution) are the same.

    rand.test(rspdata$sext, beh.comp, sims=10000)

[Output: two tables, $AbsR (Average Absolute r) and $Sig (Number Significant), each reporting N, Observed, Exp. By Chance, Standard Error, p, upper and lower bounds for p, and a percentile column; the numeric values did not survive extraction.]

In this example, the rand.test function takes three arguments. The first is a vector containing the scores for the variable of interest (extraversion), the second is a data.frame or matrix containing the multivariate construct of interest (behavior), and the third is the number of sims (which we have changed from the usual default of 1,000 to 10,000 to increase precision). As before, the results from this analysis are divided into two sections: one for the average absolute correlation ($AbsR) and the other for the number of statistically significant results ($Sig). The observed average absolute r was .0904 between trait-level extraversion and the 67 behavioral composites. This can be compared to the value expected by chance, which is .0559 with a standard error of […]. The resulting probability (p-value) of observing a value of .0904 under a null model of no association between extraversion and behavior is .0016, and the 99.9% confidence interval for this p-value is .0003 to .0029 (indicating our p-value is accurate to within about .0026). A similar list of findings is reported for the number of statistically significant results, showing 15 observed statistically significant associations, a null expected value of 3.32, and a p-value of .0022.
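The confidence interval around a simulation-based p-value reflects the fact that the p-value is itself a proportion estimated from a finite number of simulations. The text does not state how rand.test computes this interval; the sketch below assumes a simple normal approximation to the binomial, which happens to reproduce the reported limits after rounding. The variable names are hypothetical.

```r
# Sketch: 99.9% interval around a Monte Carlo p-value.
# ASSUMPTION: normal approximation to the binomial; the method actually
# used by rand.test is not stated in the text.
p_hat <- 0.0016                      # reported p-value for the average absolute r
sims  <- 10000                       # number of simulated pseudo datasets

se_p <- sqrt(p_hat * (1 - p_hat) / sims)   # standard error of a proportion
z    <- qnorm(1 - 0.001 / 2)               # two-sided critical value for 99.9%
ci   <- p_hat + c(-1, 1) * z * se_p

round(ci, 4)   # lower and upper limits, to compare with .0003 and .0029 above
```

Because the standard error shrinks with the square root of the number of simulations, raising sims from 1,000 to 10,000 narrows this interval by a factor of roughly three.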

Of perhaps most interest, the q.cor function automatically calls the rand.test function so that the two analyses need not be conducted separately:

    ext.obj <- q.cor(rspdata$sext, beh.comp, sex=rspdata$ssex, fem=1, male=2, sims=1000)
    print(ext.obj, rbqv3.items, "RBQ", short=TRUE)

The output from these lines of code is the same as from the q.cor output previously, but this time the number of sims (which is passed to the rand.test function) has been set to 1,000. Perhaps the most important value to most researchers will be the p-value for the average absolute association. These values have been added as the last row in Table 2, labeled Average Absolute r. These p-values indicate that there are meaningful (i.e., non-random) relationships between self-reported extraversion and the behavioral composites. As such, we can proceed with more confidence and justification that what we are interpreting in Table 2 is more than just noise. As Table 2 shows, those who scored high on extraversion were more likely to be talkative, have a high energy level, and speak in a loud voice. Conversely, those who scored low on extraversion were more likely to act reserved, with little expression, and to keep others at a distance.

How replicable are the associations between a variable of interest and a multivariate construct?

Although the randomization test in the previous analysis indicates that the average association between extraversion and behavior is greater than we would expect by chance, it says nothing about the replicability of the results displayed in Table 2. Specifically, how much should we expect the overall observed pattern of associations abbreviated in Table 2 to replicate in new samples? Estimating the replicability of a typical effect in psychology requires, in most cases, conducting the study again on a new sample.
Interestingly, however, the expected replicability of the pattern of results in Table 2 can be estimated without the need to conduct a new study (see Sherman & Wood, 2014 for details). The note at the bottom of Table 2 indicates the estimated replicabilities for the full patterns of correlations between extraversion and the behavioral

composites. These values should be interpreted as the expected correlation between the observed full pattern of correlations between extraversion and behavior and the pattern of correlations one would observe if one were to conduct the study again on a new sample of the same size drawn from the same population (Sherman & Wood, 2014). In other words, these values represent the replicability of the patterns of correlations expressed as an alpha reliability metric. The expected replicabilities for the patterns of correlations are computed using the vector.alpha function in the multicon package.

    round(vector.alpha(rspdata$sext, beh.comp),2) # Full sample, use round to get 2 digits

    # Output below
                Results
    N           205
    Average r   0.01
    Alpha       0.67
    Lower Limit 0.54
    Upper Limit 0.77

The results indicate the sample size (listwise deletion is used), the average correlation amongst the transposed cross-products of Z-scores (see Sherman & Wood, 2014 for technical details), the estimated replicability (Alpha), and the confidence limits (95% by default) for the replicability estimate. In this case we see a replicability value of .67, indicating that we would expect the full pattern of results, abbreviated in Table 2, to correlate approximately .67 [.54, .77] with the results from a new sample of the same size (N = 205) drawn from the same population. Such a value also bolsters our confidence and justification that we can proceed with substantive interpretations of the pattern of results observed in Table 2.

How well do judges agree about a target?

The examples thus far have concerned questions of how a multivariate construct is related to another construct of interest (e.g., another multivariate construct or single variable). At other times researchers are interested in questions of agreement, similarity, or consistency in multivariate constructs rated by different judges or measured across time. Indeed, perhaps one of

the most foundational questions of personality psychology pertains to the agreement between judges. Agreement among independent judgments about what targets are like provides strong evidence for the existence of some real attributes belonging to the targets (Funder & Dobroth, 1987; Norman & Goldberg, 1966). Because consensus among independent judgments of targets' personalities is so well established (Albright, Kenny, & Malloy, 1988; Albright, Malloy, Dong, Kenny, Fang, Winquist, & Yu, 1997; Kenny, Albright, Malloy, & Kashy, 1994), personality researchers are rarely interested in only estimating such effects today. More often, personality scientists are interested in consensus as an indicator of the reliability of a set of informant reports about a target (Vazire, 2006). For example, many studies gather personality reports from multiple acquaintances of a target and average these ratings to form informant composites (e.g., Back, Stopfer, Vazire, Gaddis, Schmukle, Egloff, & Gosling, 2010; Carlson, Vazire, & Furr, 2011; Colvin & Funder, 1991; Funder, Kolar, & Blackman, 1995; Oltmanns & Turkheimer, 2009; Vazire & Mehl, 2008). These informant composites are then used to predict some other outcome of interest. Because the acquaintances are typically not distinguishable judges (i.e., each acquaintance rates only one target and there are no psychologically important differences between acquaintances), the appropriate reliability statistic for such composites comes from the intraclass correlation (ICC: Shrout & Fleiss, 1979). An item-level ICC is easy to compute using popular commercial software. However, when working with a multivariate construct such as personality, researchers may be interested in computing many (e.g., 100, one for each CAQ item) ICCs at a given time, something that can be rather burdensome in popular commercial software. The item.icc function in the multicon package computes such ICCs easily.
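To make concrete what an item-level ICC for indistinguishable judges involves, the following base-R sketch computes the one-way single-rater and composite ICCs (ICC(1,1) and ICC(1,k) in the notation of Shrout & Fleiss, 1979) for each item from two simulated raters. It illustrates the standard one-way ANOVA formulas only; the simulated ratings and the helper icc_oneway are hypothetical and are not part of multicon.

```r
# Sketch: one-way ICCs for each item, two raters per target.
# Simulated data; the helper name icc_oneway is hypothetical.
set.seed(2)
n <- 50; items <- 6
true_score <- matrix(rnorm(n * items), n, items)                   # targets' true standing
rater1 <- true_score + matrix(rnorm(n * items, sd = 2), n, items)  # noisy judgment 1
rater2 <- true_score + matrix(rnorm(n * items, sd = 2), n, items)  # noisy judgment 2

icc_oneway <- function(a, b) {
  ratings <- cbind(a, b)                 # targets in rows, raters in columns
  k <- ncol(ratings); n <- nrow(ratings)
  target_means <- rowMeans(ratings)
  msb <- k * var(target_means)                              # between-target mean square
  msw <- sum((ratings - target_means)^2) / (n * (k - 1))    # within-target mean square
  c(ICC1  = (msb - msw) / (msb + (k - 1) * msw),  # reliability of a single rater
    ICC1k = (msb - msw) / msb)                    # reliability of the k-rater composite
}

# One row per item, one column per ICC type
item_iccs <- t(sapply(seq_len(items), function(j) icc_oneway(rater1[, j], rater2[, j])))
colMeans(item_iccs)   # average single-rater and composite reliabilities across items
```

The composite value is always the Spearman-Brown step-up of the single-rater value, which is why averaging more raters raises the reliability of an informant composite.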

    data(acq1) # A data.frame containing 100 personality judgments from the first acquaintance
    data(acq2) # A data.frame containing 100 personality judgments from the second acquaintance
    item.icc(acq1, acq2)

The item.icc function takes at least two arguments, which must be data.frames of the same size with the columns containing the corresponding items (i.e., the first item is in the first column in both data.frames). In the case of multiple raters or occasions, one can simply add additional data.frames of the same size. The results for this example provide all six possible ICCs (Shrout & Fleiss, 1979) for the pairs of acquaintances across all 100 personality characteristics. By applying the describe function from the psych package (Revelle, 2014; automatically loaded with multicon) to these results, we can obtain a summary of the results across all 100 personality characteristics.

    describe(item.icc(acq1, acq2))

[Output: a describe table with one row per ICC type and the columns var, n, mean, sd, median, trimmed, mad, min, max, range, skew, kurtosis, and se; the numeric values did not survive extraction.]

In this example, the average reliability (across all 100 items rated) for a single rater was .11 (SD = .09) and the average reliability of an item composite was .18 (SD = .15). Moreover, the composite reliabilities (ICC1,k) ranged from a low of -.33 to a high of .53, indicating wide variability across the items in terms of agreement. Functions such as item.icc are well suited to questions about item-level agreement, similarity, or consistency. However, sometimes researchers may be interested in profile-level agreement instead, a particular strength of the comprehensive approach. The Profile.ICC function in the multicon package makes such computations straightforward. Using the aforementioned acquaintance ratings we can do the following:

    Profile.ICC(acq1,acq2) # The profile-level ICCs between the two judges
    describe(Profile.ICC(acq1,acq2)) # Descriptives for the agreements

[Output: a describe table with one row per ICC type and the columns var, n, mean, sd, median, trimmed, mad, min, max, range, skew, kurtosis, and se; the numeric values did not survive extraction.]

Like the item.icc function, the Profile.ICC function takes at least two arguments that must be data.frames of the same size, but this time the analysis is done on the rows rather than the columns. In the case of multiple raters or more occasions, one can simply add additional data.frames of the same size. In this example the average profile-level reliability (across all 205 acquaintance pairs) for a single rater was .35 (SD = .21) and the average reliability of a composite profile was .48 (SD = .27). Such information may be valuable, and worth reporting, when creating composite profiles from two or more raters of a target. In addition, such values also provide individual consensus scores for each target, which may be used to understand which targets are more judgable than others (Colvin, 1993).

How accurate are judgments about a target?

When judges (or time periods) are distinguishable, the usual Pearson's correlation is often the preferred metric for indexing agreement, similarity, or consistency. One particular index of similarity of interest to personality scientists is accuracy of judgments. The question of accuracy in personality judgments has a long history and this is hardly the place to review it (see Funder, 1999; Jussim, 2012; Kenny, 1994 for excellent reviews). Instead, we simply note that accuracy in personality judgment is often quantified via agreement (e.g., self-other) between judges at either the item-level (e.g., Funder & Colvin, 1988; Küfner, Back, Nestler, & Egloff,

[…]; Watson, 1989) or the profile-level (e.g., Biesanz & Human, 2010; Human & Biesanz, 2011, 2012; Letzring, 2008; Letzring, Wells, & Funder, 2006). Assessing agreement, similarity, or consistency at the item-level using basic R software is straightforward. In the next example, self-ratings on the CAQ are correlated with acquaintance composite ratings on the CAQ, creating a correlation matrix. Because the items are in the same order (i.e., corresponding columns in the two datasets), the diagonal of this matrix contains the item-level agreements for each of the 100 CAQ items. If we are interested in the descriptive statistics (e.g., means, medians, SDs) for these 100 correlations, the describe.r function in the multicon package does the appropriate calculations, applying r-to-z transformations and back when necessary. Finally, we may be interested in estimating confidence intervals around the average item-level agreement. We can use R's built-in t.test function to calculate these.

    data(acq.comp) # Acquaintance composites of personality on the 100-item CAQ
    data(caq) # Self-reported personality on the 100-item CAQ
    diag(cor(acq.comp, caq)) # The agreements on the 100 items
    describe.r(diag(cor(acq.comp, caq))) # Describing the agreements
    t.test(fisherz(diag(cor(acq.comp, caq)))) # t-test against zero

R's built-in cor function takes two arguments, the data.frames containing the personality ratings of interest from acquaintances and from the self. Applying the diag function to the resulting correlation matrix returns the correlations of interest (i.e., one accuracy correlation per item). The describe.r function summarizes these 100 correlations, appropriately applying r-to-z transformations (and back). Finally, R's built-in t.test function computes a 95% confidence interval around this average value. In this example we see that the average item-level agreement is r = .17 (SD = .09) with a minimum of -.08 and a maximum of .41.
The 95% confidence interval around this average item-level agreement is [.15,.18] suggesting that the average itemlevel self-other agreement of.17 is well-captured (i.e., accurate) and greater than zero.
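As a rough illustration of the item-level logic just described, the following base R sketch uses simulated ratings rather than the acq.comp and caq data. The object names (self, other, item_r) are hypothetical, and atanh/tanh are used here as the r-to-z transformation and its inverse; the sketch does not reproduce how describe.r is implemented.

```r
# Simulated stand-ins for two rating sources (hypothetical data, not acq.comp/caq)
set.seed(1)
n_subj <- 200
n_items <- 10
true_scores <- matrix(rnorm(n_subj * n_items), n_subj, n_items)
self  <- as.data.frame(true_scores + matrix(rnorm(n_subj * n_items, sd = 2), n_subj, n_items))
other <- as.data.frame(true_scores + matrix(rnorm(n_subj * n_items, sd = 2), n_subj, n_items))

# Item-level agreement: correlate matching columns
# (the diagonal of the cross-correlation matrix)
item_r <- diag(cor(other, self))

# Average on the Fisher z scale, then transform back to the r metric
z <- atanh(item_r)
mean_r <- tanh(mean(z))

# 95% CI for the mean z from a one-sample t-test, back-transformed to r
ci_r <- tanh(as.numeric(t.test(z)$conf.int))
```

Back-transforming the interval limits keeps the reported interval in the same metric as the average correlation, which is one reasonable design choice among several.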

Assessing profile-level agreement (or accuracy) using the basic R statistical software, or any commercially available software package, is somewhat less straightforward. At minimum, it typically involves first transposing one's dataset and then computing correlations on the new columns (formerly the rows). However, the Profile.r function in the multicon package easily computes profile correlations without any extra steps. Using the same acquaintance composites and self-report ratings on the CAQ, the following code can be used to quantify profile-level agreement. Again, describe.r gets the appropriate descriptive statistics for the agreement coefficients:

    Profile.r(acq.comp, caq)              # The profile accuracy scores
    describe.r(Profile.r(acq.comp, caq))  # Describing the accuracy scores
    # Output below (columns: var, n, miss, mean, sd, median, trimmed, mad, min, max, range, skew, kurtosis, se)

In this example the average profile-level agreement between acquaintance CAQ composites and self-reports is .47 (SD = .22) with a minimum of -.04 and a maximum of .82. One complication with using profile-level agreement as an indicator of accuracy, however, is that such correlations are confounded by normativeness (Cronbach, 1955; Furr, 2008). In other words, a positive association between two profiles may not actually reflect agreement about or knowledge of a particular person, but simply knowledge of what people are like in general (i.e., describing the average person). Thus, a t-test of the average profile agreement against zero would not appropriately test the hypothesis that people are accurate in knowing each other above chance levels. Furr (2008) provided two different routes to resolving this issue. The first is to create an empirical estimate of the true baseline level of profile agreement.
This can be done by randomizing the profile pairs so that they are matched with a different profile (e.g., acquaintance ratings for subject 1 are paired with self-ratings from subject 2, etc.), computing the average agreement amongst these randomly paired profiles, and considering it the baseline (Letzring et al., 2006). More ideally, one could calculate the average profile agreement between all non-paired profiles and test the observed average profile agreement against this number. The second solution offered by Furr (2008) is to first remove the normative (i.e., average) profiles from both sets of profiles and then to calculate profile agreement as one normally would on these distinctive profiles. Such agreements, often referred to as distinctive profile agreements, can then be appropriately tested against a baseline of zero. The Profile.r function in the multicon package has an option for easily conducting both of these analyses.⁸

    prof.out <- Profile.r(acq.comp, caq, distinct=TRUE)
    str(prof.out)
    prof.out$agreement                        # The overall and distinctive profile accuracies
    round(describe.r(prof.out$agreement), 2)  # Their descriptives
    round(prof.out$tests, 3)                  # And their appropriate test statistics

By setting the distinct option in the Profile.r function to TRUE we get an object containing (a) the mean (normative) acquaintance composite profile, (b) the mean (normative) self-reported profile, (c) the correlation between the two normative profiles, (d) both the overall and distinctive profile agreements for each subject, and (e) tests of statistical significance for both the average overall and average distinctive profile agreements. Once again, by applying describe.r to the agreements we get their descriptive statistics.

    # Output below (descriptives: columns var, n, miss, mean, sd, median, trimmed, mad, min, max, range, skew, kurtosis, se; rows Overall, Distinctive)
    # (tests: columns N, Mean, baseline, t, p-value; rows Overall, Distinctive)
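The two routes attributed to Furr (2008) can be sketched in base R on simulated data. This is only an illustration under stated assumptions: the data are simulated with a deliberately strong shared (normative) profile, the helper profile_r is hypothetical, and nothing here is claimed to mirror the internals of Profile.r.

```r
# Simulated profiles (hypothetical): rows are targets, columns are items
set.seed(2)
n_subj <- 50
n_items <- 30
norm_profile <- rnorm(n_items, sd = 2)                       # shared "average person" profile
target <- matrix(rnorm(n_subj * n_items), n_subj, n_items)   # each target's distinctive profile
noise <- function() matrix(rnorm(n_subj * n_items), n_subj, n_items)
x <- sweep(target + noise(), 2, norm_profile, "+")           # e.g., acquaintance ratings
y <- sweep(target + noise(), 2, norm_profile, "+")           # e.g., self-ratings

# Profile agreement: correlate matching rows of the two matrices
profile_r <- function(a, b) {
  sapply(seq_len(nrow(a)), function(i) cor(a[i, ], b[i, ]))
}
overall <- profile_r(x, y)

# Route 2: remove the normative profile (item means) first,
# then correlate the remaining distinctive profiles
x_dist <- sweep(x, 2, colMeans(x))
y_dist <- sweep(y, 2, colMeans(y))
distinctive <- profile_r(x_dist, y_dist)

# Route 1: empirical baseline for overall agreement, taken as the
# average correlation among all non-paired profiles
all_pairs <- cor(t(x), t(y))   # n_subj x n_subj; diagonal holds the paired profiles
off_diag <- all_pairs[row(all_pairs) != col(all_pairs)]
baseline <- tanh(mean(atanh(off_diag)))
```

In simulated data of this kind the overall agreements are inflated by the shared normative profile, which is why they would be compared with the non-paired baseline rather than with zero, whereas the distinctive agreements can be compared with zero.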

In this self-acquaintance example, the average overall agreement is r = .47 (SD = .22), which is the same as the average profile agreement indicated previously. In addition, the average distinctive profile agreement is r = .17 (SD = .15). Testing these against their appropriate baselines (.36 and .00 respectively), we see that both results are unlikely to have occurred under the null hypothesis of no association between self-other ratings (ps < .001).

Sometimes researchers are not interested in just assessing the average level of profile agreement and testing it against its baseline. In fact, sometimes predicting profile agreements (e.g., accuracy scores) is the question of interest (e.g., Who is easy to judge?: Colvin, 1993; Who is a good judge?: Letzring, 2008; Who is similar to whom?: Wortman, Wood, Furr, Fanciullo, & Harms, 2014). In such cases, where profile agreements are later correlated with some other variable(s) of interest, it may be important to know the reliability of the profile agreements themselves (Wood & Brumbaugh, 2009). One way of assessing the reliability (or replicability; see Sherman & Wood, 2014) of a pattern of profile agreements relies on the fact that correlations are simply averages of cross-products of standardized scores. Thus, much in the same way as one computes internal consistency for composites from a rating scale, one may apply the same logic to cross-products of standardized scores and compute alpha on these values (see Sherman & Wood, 2014 for details). The R function Profile.r.rep in the multicon package computes the reliabilities (or replicabilities) for both overall and distinctive patterns of profile agreements.

    Profile.r.rep(acq.comp, caq)
    # Output below (columns: Replicability, Lower Limit, Upper Limit; rows: Overall, Distinctive)

In this example, the replicability for the pattern of self-acquaintance overall agreements is .72 [95% CI = .64, .79] while for distinctive agreements it is .46 [.30, .60].
These numbers indicate that if one were to randomly draw another set of 100 items from the population of items from which the 100 CAQ items were generated, and have these same participants rate themselves again, we would expect the patterns of profile agreements to correlate with each other at .72 and .46 for overall and distinctive profile agreements respectively. Such values have implications for researchers using profile similarity scores in subsequent analyses. For example, some researchers may wish to correlate similarity scores with length of acquaintanceship to test the hypothesis that people who know each other longer judge each other more accurately. With replicability values of .72 and .46 respectively, we know that these are the upper bounds on the possible association between length of acquaintance and self-other agreement (much in the same way that reliability is the upper bound on validity).

Although the correlation is a popular choice for quantifying profile agreement, similarity, or consistency, researchers may alternatively be interested in a regression approach providing both an intercept and slope between pairs of profiles. The Profile.reg function in the multicon package makes assessing profile agreement via regression straightforward and includes options for centering the profiles ("group" [default]: within-profile centering; "grand": between-profile centering; and "none": no centering) and standardizing (FALSE [default]: no standardizing; TRUE: standardized, with the level determined by the center argument).

    Profile.reg(acq.comp, caq)                               # Intercepts and slopes, defaults to group mean (within-S) centering
    Profile.reg(acq.comp, caq, std=TRUE)                     # Standardized
    Profile.reg(acq.comp, caq, std=TRUE, center="grand")     # Grand mean standardizing instead

The Profile.reg function takes two arguments. The first argument is a data.frame containing the predictor profiles (i.e., X). The second argument is a data.frame containing the predicted profiles (i.e., Y). As can be seen by running these examples, an intercept and a slope are returned for each pair of profiles, again with various options for how variables should be centered and/or standardized.
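The regression approach can be illustrated with a minimal base R sketch that fits one simple regression per pair of profiles. The data are simulated and the helper profile_reg is hypothetical; in particular, the sketch centers only the predictor profile within each pair, which is an assumption made for illustration and not a statement about how Profile.reg handles centering.

```r
# Simulated profiles (hypothetical): rows are pairs, columns are items
set.seed(3)
n_subj <- 20
n_items <- 25
x <- matrix(rnorm(n_subj * n_items, mean = 5), n_subj, n_items)        # predictor profiles
y <- 1 + 0.5 * x + matrix(rnorm(n_subj * n_items), n_subj, n_items)    # predicted profiles

# One regression per pair of profiles, with within-profile centering of X
profile_reg <- function(x, y) {
  out <- t(sapply(seq_len(nrow(x)), function(i) {
    xc <- x[i, ] - mean(x[i, ])     # center the predictor profile on its own mean
    coef(lm(y[i, ] ~ xc))           # intercept and slope for this pair
  }))
  colnames(out) <- c("intercept", "slope")
  out
}

coefs <- profile_reg(x, y)
head(coefs)
```

With the predictor centered within each profile, the intercept for a pair equals the mean of that pair's predicted profile, which makes the intercepts directly interpretable as profile elevation.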

How do profiles differ?

Although researchers are often interested in agreement, similarity, or consistency amongst pairs of profiles, on some occasions they may be interested in how profiles differ, or how one profile is distinctive from another. For example, in one recent study third-party ratings of a situation were statistically removed from participant-ratings of the same situation in order to retain distinctive self-ratings, or construals: that is, how individuals saw the situation differently from observers, which can be used as a measure of biases in situation perception. These construals were subsequently correlated with personality (Sherman, Nave, & Funder, 2013). Conducting such an analysis using standard commercial software can be a cumbersome task involving multiple transpositions of the raw data, storing of residuals, and recombining data sets. The Profile.resid function in the multicon package makes obtaining distinctive profiles (i.e., residuals) from pairs of profiles easy.

    resid.out <- Profile.resid(acq.comp, caq)
    head(resid.out)

The Profile.resid function takes two arguments. The first is a data.frame containing the predicting profiles (i.e., X) and the second is a data.frame containing the predicted profiles (i.e., Y). In this example, self-reported CAQ profiles are predicted from acquaintance composite CAQ profiles of the target. The resulting residuals for each pair of profiles are retained. Because each CAQ profile contains 100 items and there are 205 subjects in this data set, the resulting object (resid.out) is a data.frame containing the distinctive self-reported CAQ profile scores (residuals) after controlling for acquaintance composite profile scores. Intuitively, one might also think that a difference score approach, wherein acquaintance CAQ composite scores are simply subtracted from self-reported CAQ scores, would yield the same results. While these two approaches are related, they are not identical. Mathematically, if the correlation between the self-reported CAQ profile and the acquaintance composite profile were 1.00, the difference score method and the regression-based method provided by Profile.resid would return identical results. On the other hand, if the correlation were .00, nothing would be removed from the self-reported CAQ profile using Profile.resid. This is not true of the difference score method. Therefore, the size of the relationship between the two profiles is an important aspect of how differences are calculated when using the regression-based approach provided by Profile.resid. This aspect is not captured by the difference score approach, which implicitly assumes all pairs of profiles are equally correlated (Sherman et al., 2012).

In the case where a researcher is interested in examining distinctiveness at the item-level instead of at the profile-level (i.e., statistically removing the effect of one item on another for each pair of items rather than for pairs of profiles), the multicon package also includes the function item.resid.

    head(item.resid(acq.comp, caq))

The output format is the same as with Profile.resid except that the residuals come from item-level regressions rather than profile-level regressions.

Using multivariate constructs to test theoretical predictions

Because using a comprehensive approach to dealing with multivariate constructs often involves many analyses, it would be possible for someone to criticize this approach as being entirely exploratory and atheoretical. In fact, however, by employing template matching (Bem & Funder, 1978), research using a comprehensive approach is often theoretically oriented (e.g., Sherman, Nave, & Funder, 2012; Sherman, Figueredo, & Funder, 2013). Template matching entails correlating (or matching) an observed profile of measured characteristics with a theoretically derived profile of those characteristics.
The resulting template match scores, which indicate the degree to which a particular profile corresponds with the theoretical template, can be used in subsequent analyses. In one recent study, participant self-reported CAQ profiles were correlated with a theoretically derived template for the prototypical slow-life history individual (Sherman et al., 2013). Like many of the other analyses described in this article, standard commercial software does not provide an easy and efficient method for getting template match scores. However, the temp.match function in the multicon package easily computes template match scores.

    data(caq)
    data(opt.temp)
    opt.temp                               # The optimally adjusted person
    temp.match(opt.temp, caq)              # Overall template match scores
    describe.r(temp.match(opt.temp, caq))

The temp.match function takes two arguments. The first is the template itself, which is a vector containing a score for each item in the template. The second is a data.frame containing the profiles of scores to be matched to the template. In this example self-reported CAQ profiles for 205 participants are correlated with the optimally adjusted person template for the CAQ (Block, 1961). The now familiar describe.r function from the multicon package returns the descriptive statistics for these template match scores. Like other profile-level analyses, though, these template match scores include both normative and distinctive components (Furr, 2008). By setting the distinct option in the temp.match function to TRUE, however, both overall and distinctive (controlling for normativeness) template match scores are returned.

    temp.match(opt.temp, caq, distinct=TRUE)
    describe.r(temp.match(opt.temp, caq, distinct=TRUE)$matches)

Interestingly, the results of this analysis reveal that while the average overall template match score with the optimally adjusted template is r = .50, when normativeness (the average personality profile) is removed, the average distinctive template match score is r = .00.
Such a result is in line with a flood of recent empirical evidence indicating that psychological adjustment is highly associated with normativeness (Baird, Le, & Lucas, 2006; Fleeson & Wilt, 2010; Human, Biesanz, Finseth, Pierce, & Le, 2014; Klimstra, Hale, Raaijmakers, & Meeus, 2011; Klimstra, Luyckx, Hale, Goossens, & Meeus, 2010; Letzring, 2008; Sherman et al., 2012; Wood, Gosling, & Potter, 2007).

Because template match scores are often correlated with other measures of interest, researchers may also be interested in knowing the reliability or replicability of the scores themselves. For example, in one study Sherman and colleagues (2013) computed template match scores for the prototypical slow-life history person and then correlated these scores with behavior. The temp.match.rep function in the multicon package computes such replicabilities, with confidence intervals, for both overall and distinctive template match scores following the logic outlined by Sherman and Wood (2014).

    temp.match.rep(opt.temp, caq)
    # Output below (columns: Replicability, Lower Limit, Upper Limit; rows: Overall, Distinctive)

The arguments given to temp.match.rep are identical to those passed to temp.match (i.e., the template followed by the data.frame to be matched to the template). The results of this analysis indicate that while overall template match scores are quite replicable/reliable (.81 [.75, .86]), the distinctive template match scores are not (-.46 [-.89, -.08]). Indeed, the replicability/reliability is so low that the distinctive template match scores reflect little more than random noise. Thus, the functions available in the multicon package can illuminate the reliability of profile similarities using statistics that have not been available until recently. In some cases, as in this example, these statistics can be useful by indicating that one should not proceed with interpreting the correlates of a particular set of profile similarity scores at all.
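The template matching logic discussed in this section can be sketched in a few lines of base R. The profiles and template below are simulated and all object names are hypothetical. Removing normativeness by subtracting item means from the profiles is one simple operationalization assumed here for illustration; the text does not specify how temp.match computes its distinctive scores, so the sketch should not be read as a reimplementation.

```r
# Simulated data (hypothetical): rows are people, columns are items
set.seed(4)
n_subj <- 40
n_items <- 30
profiles <- matrix(rnorm(n_subj * n_items), n_subj, n_items)   # rated profiles
template <- rnorm(n_items)                                      # theoretical template

# Overall template match: correlate each person's profile with the template
overall_match <- apply(profiles, 1, cor, y = template)

# Distinctive template match: remove the normative (mean) profile
# from every person's profile, then correlate with the template
normative <- colMeans(profiles)
distinct_profiles <- sweep(profiles, 2, normative)
distinct_match <- apply(distinct_profiles, 1, cor, y = template)

# Averages computed on the Fisher z scale and transformed back to r
mean_overall <- tanh(mean(atanh(overall_match)))
mean_distinct <- tanh(mean(atanh(distinct_match)))
```

Comparing mean_overall with mean_distinct in real data shows how much of an apparent template match is carried by the average profile rather than by what is distinctive about each person.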
