Descriptive Statistics in SPSS When first looking at a dataset, it is wise to use descriptive statistics to get some idea of what your data look like. Here is a simple dataset, showing three different variables type of psychological problem someone was treated for, type of treatment approach used, and symptom level at the end of treatment. One of these variables (symptom) is on an interval-level scale; the other two are grouping variables (i.e., they are measured on a nominal-level scale). The type of descriptive statistics you can calculate depends on the level of data that you have to work with. Here are the options: For simple description of nominal-level variables (groups) use frequencies For more complex description of nominal-level variables use crosstabs For simple description of interval- or ratio-level variables (items measured on a scale) use the descriptives command For more complex description of interval- or ratio-level variables use the explore command Each of these options will be shown in more detail below. One Important Note: Even though problem and treatment are nominal-level variables, they have to be coded as numbers (not as text) in order to use the following procedures. See the instructions on How to Recode Variables if you need to convert from a text variable to a number variable.
Here are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics :
We will start by looking at frequencies for a nominal-level variable, like problem in this dataset. Open the Frequencies dialog box: Select the variable that you are interested in from the left-hand lists, and use the arrow button to move it from the left-hand list into the right-hand list. Use the Charts button to see a graphical output of the frequencies on your variable. In this case, I have selected a histogram. You can also use pie charts or bar charts. If you aren t happy with the way your data are displayed (ascending vs. descending order, etc.), try some of the options found by clicking on the Format button.
Hit OK in the main dialog box to continue. The output for this variable looks like this: Patient Diagnosis Valid Cumulative Frequency Percent Valid Percent Percent Anxiety 5 31.3 31.3 31.3 Depression 5 31.3 31.3 62.5 Eating Disorder 6 37.5 37.5 100.0 Total 16 100.0 100.0 This is the frequency table, which is the basic output from this procedure. It shows the number of people in each group, the percent of the total in each group, the percent in each group as a proportion of just the people with complete data (that s the valid percent ), and the cumulative percent reached as you add each new group to the previous ones (that s the cumulative percent ). Notice that the numeric values have been replaced with text labels these were entered in the values column in the variable view of the data. Histogram 6 5 4 Frequency 3 2 1 0 0.5 1 1.5 2 2.5 Patient Diagnosis Here s the graphical output: This histogram shows the number of people in each category. 3 3.5 Mean = 2.06 Std. Dev. = 0.854 N = 16
For a more detailed breakdown of nominal-level data, use the crosstabs command in the Analyze/Descriptive Statistics sub-menu. This is appropriate if you want to consider two variables in combination for example, what % of the patients with an anxiety disorder were treated with cognitive-behavioral therapy? The Crosstabs dialog box lets you enter one variable (or more) as rows in a frequency table, and another variable as the columns in the same table. Use the Cells command to get percentages for each row and column:
Hit OK in the main dialog box to continue. The output from this procedure looks like this: Patient Diagnosis * Type of Therapy Crosstabulation Patient Diagnosis Total Anxiety Depression Eating Disorder Count % within Patient Diagnosis % within Type of Therapy % of Total Count % within Patient Diagnosis % within Type of Therapy % of Total Count % within Patient Diagnosis % within Type of Therapy % of Total Count % within Patient Diagnosis % within Type of Therapy % of Total Type of Therapy CBT IPT Total 3 2 5 60.0% 40.0% 100.0% 37.5% 25.0% 31.3% 18.8% 12.5% 31.3% 2 3 5 40.0% 60.0% 100.0% 25.0% 37.5% 31.3% 12.5% 18.8% 31.3% 3 3 6 50.0% 50.0% 100.0% 37.5% 37.5% 37.5% 18.8% 18.8% 37.5% 8 8 16 50.0% 50.0% 100.0% 100.0% 100.0% 100.0% 50.0% 50.0% 100.0% The central area of this table shows you the total number of patients who have each individual combination of the various levels of the two variables. The far-right column shows you the total for each diagnosis, as a percent of all patients. The bottom-most row slices the data the other way, showing you the total for each type of treatment, as a percent of all patients. The interior cells of the table also show you each individual cell s total as a percentage of the patients with that particular diagnosis, as a percentage of the patients who received that particular type of therapy, and as a percentage of the overall total.
For interval- or ratio-level variables (i.e., scales ), use the descriptives sub-command on the Analyze/Descriptive Statistics menu: This opens a dialog box where you can select the variable that you want descriptive statistics on. Make sure that what you are selecting is actually an interval- or ratio-level variable. Because all of the data are entered as numbers, SPSS will actually calculate descriptive statistics on any of these variables; but these results are only meaningful for interval- or ratio-level variables (in this case, just the score variable).
As usual, select the variable(s) that you are interested in from the left-hand column, and move them to the right-hand column. Use the Options button to select the specific descriptive statistics that you are interested in. Usually, good choices include the mean, standard deviation, maximum, and minimum. If you are concerned about the impact of outliers on your data, a measure of skewness may also be appropriate. The output from this procedure looks like this: Descriptive Statistics N Minimum Maximum Mean Std. Deviation Symptom Severity (higher = worse) 16 19.00 33.00 24.9375 4.15482 Valid N (listwise) 16 This shows you the specific results for each variable that you entered into the analysis. It is possible to get a table that gives you basic descriptive statistics for many variables simultaneously, just by moving them all at once from the left-hand to the right-hand list in the main dialog box.
Finally, to get more complex descriptive results for an interval- or ratio-level variable, use the explore command in the Analyze/Descriptive Statistics sub-menu. The following dialog box will appear: The variable that you are interested in goes into the dependent list. For now, we won t use the other lists, but we will use the plots button to select additional options.
The plots dialog box lets you select options for graphical display of the data, including a stem-and-leaf plot like this: Frequency Stem & Leaf 1.00 1. 9 7.00 2. 0112444 5.00 2. 55567 3.00 3. 123 Stem width: 10.00 Each leaf: 1 case(s)
A boxplot like this: 33 30 27 24 21 18 Symptom Severity (higher = worse) A histogram like this: Histogram 7 6 5 Frequency 4 3 2 1 0 18.00 21.00 24.00 27.00 30.00 Symptom Severity (higher = worse) 33.00 Mean = 24.9375 Std. Dev. = 4.15482 N = 16
Or a plot to test whether the data are normally distributed, like this: Normal Q-Q Plot of Symptom Severity (higher = worse) 2 1 Expected Normal 0-1 -2 18 21 24 27 Observed Value (This result is showing a slight deviation from normality, based on the difference between the actual data points and the theoretical line they should fit, but probably nothing to worry about. Some tests of normality are found earlier in the printout, and give you an actual test to determine how far off the data are from a normal distribution. If these tests are statistically significant [p <.05], you may need to do a data transformation to correct for non-normality, before you run any statistical tests): Tests of Normality 30 33 Kolmogorov-Smirnov(a) Shapiro-Wilk Statistic df Sig. Statistic df Sig. Symptom Severity (higher = worse).181 16.166.924 16.196 a Lilliefors Significance Correction
The explore command also gives you descriptive statistics for each variable, with more extensive results (median, interquartile range, etc.) than the descriptives command: Descriptives Symptom Severity (higher = worse) Mean 95% Confidence Interval for Mean Lower Bound Upper Bound Statistic Std. Error 24.9375 1.03870 22.7236 27.1514 5% Trimmed Mean Median Variance Std. Deviation Minimum Maximum Range Interquartile Range Skewness Kurtosis 24.8194 24.5000 17.263 4.15482 19.00 33.00 14.00 5.50.669.564 -.205 1.091 One additional feature of the explore command is the ability to get separate descriptive results for different sub-groups within your dataset. The way to do this is by grouping your data based on some additional variable, which is treated as a factor for breaking down the analyses. Go back to the main dialog box to add one of your nominal-level variables as a factor (the grouping variable always has to be nominal-level): After you add this factor, the results will show you two separate sets of statistics and plots one for just those patients in the CBT group, and a second set for just those patients in the IPT group.