INTRODUCTION TO MEDICAL RESEARCH: ESSENTIAL SKILLS

SCALES OF MEASUREMENT AND WAYS OF SUMMARIZING DATA

Alecsandra IRIMIE-ANA 1
1. Psychiatry Hospital Prof. Dr. Alexandru Obregia

*Corresponding Author: Alecsandra Irimie-Ana, MD, resident in Child and Adolescent Psychiatry, Child and Adolescent Psychiatry Department, Prof. Dr. Al. Obregia Psychiatry Hospital, Bucharest, Romania. Address: Berceni Street, no. 10-12, Postal code: 041914, sector 4, Bucharest, email: alecsandraana1@yahoo.com.

Volume 4, Issue 1-2, January-July 2016

ABSTRACT
Regardless of the type of analysis performed on our data, a prerequisite step in research is choosing the most appropriate way of measuring our variables, namely the scale. Based on the measuring scale, we can distinguish three categories: nominal, ordinal, and numerical, the latter being further subdivided into continuous and discrete. This article is structured in two distinct parts: the first introduces the measuring scales mentioned above, while the second deals with ways of summarizing data.

Keywords: measurement scales, summarizing data, measures of central tendency, measures of spread

NOMINAL SCALES (also known as qualitative/categorical observations) are used to measure characteristics that are neither numerical nor ranked according to an inherent hierarchy [1]. Examples of data measured on such a scale are eye color, the patient's gender, place of residence (urban/rural), etc. When the result of the measurement can take only two values, the observation is called binary or dichotomous [2]. One might argue that measuring on a nominal scale is an oxymoron but, according to Vasilescu, 1992 (apud Opariuc-Dan, 2009), measuring something means, broadly speaking, attaching a number to it while following certain rules [3]. That being said, encoding the patient's response to a treatment as -1 = negative, 0 = stationary, and 1 = positive is a form of measurement.
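The coding scheme described above can be illustrated with a minimal Python sketch. The ten patient responses, the `LABELS` dictionary, and the helper `summarize_nominal` are hypothetical and serve only as an illustration; the sketch restricts itself to operations that make sense for nominal codes (frequencies, percentages, and the mode).

```python
from collections import Counter

# Hypothetical coded treatment responses for ten patients (illustrative data only).
# Coding scheme from the text: -1 = negative, 0 = stationary, 1 = positive.
LABELS = {-1: "negative", 0: "stationary", 1: "positive"}
responses = [1, 0, 1, -1, 1, 0, 1, 1, -1, 0]


def summarize_nominal(values):
    """Return (counts, percentages, mode) for a list of nominal codes."""
    counts = Counter(values)
    n = len(values)
    percentages = {code: 100 * count / n for code, count in counts.items()}
    mode = counts.most_common(1)[0][0]  # the most frequent category
    return counts, percentages, mode


counts, percentages, mode = summarize_nominal(responses)
for code in sorted(counts):
    print(f"{LABELS[code]:>10}: n={counts[code]}, {percentages[code]:.0f}%")
print("mode:", LABELS[mode])
```

The numbers here are only labels: computing their arithmetic mean would run without error, but the result would have no interpretation on a nominal scale.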

Nominal data are usually presented in terms of percentages or proportions. For each type of scale, only certain transformations and statistical operations are allowed. For the nominal scale, the transformations are renaming and permutation, while the statistical operations are the calculation of frequencies, percentages, the mode (the most frequently encountered category), the Chi-square test, and correlation coefficients when the variables have 2 values each [3].

ORDINAL SCALES, as their name implies, assume the existence of an inherent order among the evaluated characteristics. Such scales are frequently encountered in marketing; consider the questionnaires assessing client satisfaction: not at all satisfied, slightly satisfied, somewhat satisfied, very satisfied, and completely satisfied. The best-known such scale in medicine is the universal pain assessment scale, where emoticons are attached to numbers from 0 to 10. In order to understand which operations are allowed for this scale, we must always keep in mind the following: although order exists among categories in ordinal scales, the difference between two adjacent categories is not necessarily the same throughout the scale [2]. The transformations allowed in this case are those that preserve order, such as the calculation of the square root and squaring (for non-negative values). The statistical operations are the calculation of frequencies and percentages, the sign test, the Mann-Whitney, Wilcoxon, Kolmogorov-Smirnov, and Kruskal-Wallis tests, and the Spearman and Kendall correlation coefficients [3].

Observations measured on NUMERICAL SCALES (also known as quantitative observations) are, as we stated earlier, divided into continuous (values on a continuum, which can take an infinity of values, e.g. a patient's weight) and discrete (integer values, e.g. the number of procedures undertaken). In social sciences and psychology, numerical scales are categorized as follows: interval scales (the differences between any two consecutive levels are equal, e.g. the Celsius scale) and proportion scales, also called ratio scales (there is an absolute 0). The difference in classification resulted, most probably, from the need to find the most accurate approach to satisfy the characteristics of each field of research. Furthermore, there is an evident need for powerful and exact scales in the medical field, hence the assiduous use of continuous/proportion scales, whereas this level of precision, though desired in the socio-humanistic sciences, is impossible to attain there.

Romanian Journal of Child and Adolescent Psychiatry

Transformations for interval scales: linear transformations. Statistical operations: calculation of the arithmetic mean, standard deviation, skewness and symmetry, Student's t test, the Fisher test, and all types of correlations (Pearson, regression coefficient b). Transformations for proportion scales: of the multiplying type (multiplication by a positive constant). All types of statistical operations are allowed [3].

The second part of this article presents the most useful and best-known ways of summarizing numerical data, namely central tendency measures (the mean, the median, the mode) and dispersion measures (the range, the standard deviation, the coefficient of variation, percentiles, the interquartile range).

THE MEAN. THE ARITHMETIC MEAN is the sum of the values of a variable divided by the number of measurements. It is sensitive to extreme values, but it can still be used when individual values tend to group around the mean while extreme values tend to cancel each other out. When this is not the case and a large number of outlying observations prevails on one side of the mean, we say that the distribution of the data is skewed and we prefer to report the median. Another type of mean, encountered especially in the socio-humanistic sciences, is the weighted average, where each individual measurement has a different weight that is reflected in the final result. In certain situations, for example when dealing with dilutions measured on a logarithmic scale, another calculation is used to describe the central tendency: the geometric mean, the nth root of the product of the n values.

THE MEDIAN is the value that divides our set of measurements into two equal parts; half the measurements will be situated below the median while the other half will be above it. The median is less sensitive to extreme values and can be used with ordinal and numerical data.
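The measures of central tendency described above can be sketched in a few lines of Python. All data below (lengths of stay, grades with weights, dilution titers) are hypothetical, and the helpers `weighted_mean` and `geometric_mean` are illustrative names, not part of any cited method; the sketch assumes Python 3.8 or later for `math.prod`.

```python
import math
import statistics

# Hypothetical lengths of hospital stay in days; the last value is an outlier.
stays = [2, 3, 3, 4, 5, 6, 40]

mean_stay = statistics.mean(stays)      # pulled upward by the outlier
median_stay = statistics.median(stays)  # barely affected by the outlier


def weighted_mean(values, weights):
    """Weighted average: each value contributes in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def geometric_mean(values):
    """nth root of the product of the n (positive) values."""
    return math.prod(values) ** (1 / len(values))


# Hypothetical scores where the first one counts double.
score = weighted_mean([8, 6, 9], [2, 1, 1])

# Hypothetical reciprocal dilution titers, spread over a logarithmic scale.
titer = geometric_mean([10, 100, 1000])

print("mean:", mean_stay, "median:", median_stay)
print("weighted mean:", score)
print("geometric mean:", round(titer, 2))
```

With these illustrative values the single long stay moves the mean well above the median, which is the situation in which the text recommends reporting the median.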
From the calculation of the mean and the median we can characterize the distribution of our data: for a symmetric distribution, such as the normal distribution, the mean equals the median; for a distribution skewed to the left, the mean is typically lower than the median; and for a distribution skewed to the right, the mean is typically higher than the median.

THE MODE is the value or category with the highest frequency. When the data have 2 such modes, the distribution is called bimodal. The mode is the only measure of central tendency that can be used for data measured on a nominal scale.

Measures of central tendency cannot characterize the distribution of data on their own. This is where the spread indicators come into play.

THE RANGE represents the difference between the maximum and the minimum values.

THE STANDARD DEVIATION is the most frequently reported measure of spread about the mean, the reason being that it gives us some useful guidance: at least 75% of our observations always lie in the interval [mean - 2 SD; mean + 2 SD], regardless of the shape of the distribution. When the shape of the distribution is normal, we will find approximately 95% of our observations in this interval.

THE COEFFICIENT OF VARIATION is calculated by dividing the standard deviation by the mean and multiplying the result by 100. The larger the coefficient of variation, the less accurate the mean will be as an estimator of the central tendency. As a rule of thumb, a coefficient larger than 30% reflects a wide spread (the mean is not a good indicator of the central tendency), a coefficient between 15% and 30% shows a moderate spread (the mean is a satisfactory indicator), while a coefficient lower than 15% is characteristic of a low spread (the mean is a good indicator of the central tendency). An advantage of this coefficient is that it has no unit of measurement; hence it can be used to compare the variability of two distinct distributions [3].

PERCENTILES are the values that divide our observations into 100 equal parts. The quartiles are the percentiles that divide the observations into 4 equal parts, the second quartile (the 50th percentile) being the median. Percentiles are used to determine normal ranges of laboratory values; the normal limits of many laboratory values are set by the 2.5th and 97.5th percentiles, so that the normal limits contain the central 95% of the distribution [2]. We choose to report percentiles whenever we report the median.

THE INTERQUARTILE RANGE is the difference between the third and the first quartiles, and contains the central 50% of our observations.
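The spread measures defined above can be computed with Python's standard `statistics` module, as in the following minimal sketch. The data set is hypothetical, the helper names `classify_cv` and `share_within_2sd` are illustrative, and two choices are assumptions rather than prescriptions of the text: the sample standard deviation (`statistics.stdev`) is used, and quartiles are taken with the default method of `statistics.quantiles`; other conventions give slightly different quartile values.

```python
import statistics

# Hypothetical numerical observations (illustrative data only).
values = [4, 8, 15, 16, 23, 42]

data_range = max(values) - min(values)
mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation (assumption)
cv = 100 * sd / mean           # coefficient of variation, in percent

# Quartiles: three cut points that split the data into 4 equal parts.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1


def classify_cv(cv_percent):
    """Apply the 15% / 30% rule of thumb quoted in the text."""
    if cv_percent > 30:
        return "wide"
    if cv_percent >= 15:
        return "moderate"
    return "low"


def share_within_2sd(data):
    """Fraction of observations inside [mean - 2 SD, mean + 2 SD]."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if m - 2 * s <= x <= m + 2 * s]
    return len(inside) / len(data)


print("range:", data_range)
print("SD:", round(sd, 2), "CV (%):", round(cv, 1), "->", classify_cv(cv))
print("Q1, median, Q3:", q1, q2, q3, "IQR:", iqr)
print("share within 2 SD:", share_within_2sd(values))
```

For any data set the last figure should be at least 0.75, in line with the guidance given for the standard deviation.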
Nevertheless, if the distribution of our data is strongly skewed, with many values around the upper quartile, the interquartile range will not give an accurate picture of the dispersion, and another measure should be reported.

REFERENCES:
1. Florin A. Sava. (2011) - Analiza datelor in cercetarea psihologica. Editura ASCR.
2. Beth Dawson, Robert G. Trapp. (2004) - Basic & Clinical Biostatistics. Lange Medical Books/McGraw-Hill.

3. Cristian Opariuc-Dan. (2009) - Statistica aplicata in stiintele socio-umane. Notiuni de baza-statistici univariate. Editura ASCR.