SOCI 420 Advanced Methods of Social Research
Fall 2017
EXAM 1 RUBRIC

Instructor: Ernesto F. L. Amaral, Assistant Professor, Department of Sociology
Date: October 12, 2017 (Thursday)
Section 904: 2:20-3:35pm
Percent of final grade: 20%

Write your name on the top of all pages of this exam. Return all pages to the professor.

Multiple choice questions

Answer the following 30 multiple choice questions. Each question is worth 0.4 points for a total of 12 points. Mark your responses on a grey and white 8.5 x 11 Scantron testing form. Only No. 2 pencils can be used to bubble in answers (not ink). The correct answers are underlined.

1. Data is the same thing as
   a. information collected in numerical form.
   b. information collected in any form.
   c. statistics.
   d. proof.

2. In terms of the Wheel of Science, a hypothesis is derived from ______ and leads to ______.
   a. statistics, observation
   b. theory, observation
   c. observation, generalizations
   d. theory, generalizations

3. In the research process, the role of statistics is limited because
   a. numbers don't prove anything.
   b. of possible flaws in research design or method.
   c. the researcher may not be a mathematician.
   d. people lie when answering questionnaires.

4. A hypothesis states, in part, that "income increases as education increases". In this statement, income is
   a. the dependent variable.
   b. the independent variable.
   c. the hypothetical variable.
   d. the secondary variable.

5. Ninety percent of dorm residents approved a proposed ban on smoking. This statement is an example of the use of
   a. inferential statistics.
   b. univariate descriptive statistics.
   c. multivariate descriptive statistics.
   d. inductive statistics.

6. If a researcher summarizes the age of 1,000 people by calculating the average age, she is using
   a. a qualitative technique.
   b. an incorrect hypothesis.
   c. data reduction.
   d. non-empirical reasoning.

7. Measures of association are a type of descriptive statistics that allow us to
   a. investigate the causal influence of some variables on others.
   b. predict the score on one variable from the score on another.
   c. know the strength and direction of a relationship between two or more variables.
   d. All of the above

8. Inferential statistics are necessary in social research because
   a. it may be impossible to find all members of a certain population.
   b. social scientists don't have the time or money to test an entire population.
   c. some of the population might not cooperate.
   d. samples are sometimes accurate representations of the population but can't always be used to generalize.

9. Which of the following questions would generate a continuous variable?
   a. How old are you?
   b. How many books do you own?
   c. How many times have you ever changed a flat tire?
   d. How many degrees do you have?

10. In addition to saying that one case is different from another, the ordinal level of measurement allows us to
   a. put cases in general categories.
   b. measure the distance between high and low.
   c. say that one case is more or less than another.
   d. calculate meaningful averages of variables.

11. Proportions and percentages, ratios and rates are all ways of expressing
   a. concise distributions of a variable.
   b. data without leaving out any details.
   c. raw frequencies.
   d. relative frequencies.

12. In Table 1, what percentage of Democrats lives in community B?
   a. (21/64) x 100 = 32.8
   b. (21/328) x 100 = 6.4
   c. (21/156) x 100 = 13.5
   d. (64/264) x 100 = 24.2

Table 1. Political party membership in two communities

Party           Community A   Community B   Total
Republicans             103            17     120
Democrats               135            21     156
Independents             17            15      32
Socialists                9            11      20
Total                   264            64     328

Source: Fictitious data.

13. When working with a very small number of cases, it is usually preferable to report
   a. percentages.
   b. proportions.
   c. fractions.
   d. actual frequencies.

14. If class intervals overlap with one another, there will be issues of
   a. categories not being exhaustive.
   b. categories not being mutually exclusive.
   c. categories being of unequal size.
   d. all of these choices are correct.

15. Open-ended intervals
   a. are always preferable to unequal intervals.
   b. should never be used in actual research.
   c. can be useful when there are a few very high or very low scores in a distribution.
   d. should only be used with ordinal-level variables.

16. When creating a frequency distribution, it is best to focus on
   a. detail (more categories).
   b. clarity (fewer categories).
   c. the decision is irrelevant.
   d. the decision is made by the researcher based upon their data.

17. A city of 1567 people had 34 auto thefts last year. The auto theft rate for this city
   a. cannot be determined from the information given.
   b. is falling.
   c. is (34/1567) x 100,000.
   d. is (100,000 x 1567)/34.

18. The homicide rate for a city is reported as 23.89. This means that
   a. for every homicide, there were 23.89 victims.
   b. the homicide rate is rising.
   c. for every 100,000 people in the population there were 23.89 homicides.
   d. there was an average of 23.89 homicides each month.

19. For a single variable measured at the nominal level, an appropriate graph would be a
   a. pie chart.
   b. histogram.
   c. frequency polygon.
   d. bivariate table.

20. For a single variable at the interval-ratio level, an appropriate graph would be
   a. a pie chart.
   b. a histogram.
   c. a bivariate table.
   d. none of the above. Graphs are never used for interval-ratio level variables.

21. The graphical presentation method that uses midpoints rather than real limits is a
   a. pie chart.
   b. line chart.
   c. histogram.
   d. bar chart.

22. The three commonly used measures of central tendency (mode, median, and mean)
   a. will always have the same value.
   b. will always fall in the same order: the mean will have the highest value, followed by the median and the mode.
   c. define "typical" or "average" in different ways and will usually have different values.
   d. will always fall in the same order: the mean will have the lowest value, the median will always be in the middle and the mode will have the highest value.

23. The median represents the score that is
   a. half of the sum of the other scores.
   b. the most common or frequent.
   c. in the middle.
   d. the average of the highest and lowest scores.

24. The most appropriate measure of central tendency for gender would be the
   a. mode.
   b. median.
   c. mean.
   d. None of the above

25. If you subtracted the mean from each score in a distribution and added the results, the sum would be
   a. zero.
   b. less than zero.
   c. a minimum.
   d. the mode.

26. In a positively skewed distribution, the mean is
   a. equal in value to the median.
   b. greater in value than the median.
   c. less in value than the median.
   d. either a or b, depending on the value of the mode.

27. Measures of dispersion indicate the degree to which a set of scores is
   a. heterogeneous.
   b. ambiguous.
   c. average.
   d. typical.

28. An advantage of the interquartile range (Q) over the range (R) is that it
   a. can be used for nominal level variables.
   b. includes the most extreme scores.
   c. is based on only the middle 50% of the scores.
   d. ignores the first and third quartiles.

29. If a test score lies at the first quartile, it is
   a. higher than 25% of the scores.
   b. the same as the median.
   c. the same as the mode.
   d. higher than 75% of the scores.

30. The age of a sample has been measured in years. Which of the following would be the preferred measure of the dispersion for this variable?
   a. The index of qualitative variation
   b. The average deviation
   c. The standard deviation
   d. The quartile deviation
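
Worked arithmetic for items 12 and 17. The short Python sketch below is an illustration only, not part of the graded material. It recomputes a row percentage from Table 1 and a rate per 100,000 people from the figures in item 17. The function names `row_percentage` and `rate_per` are hypothetical helpers introduced here, and the base of 100,000 is taken from the wording of items 17 and 18.

```python
# Table 1 (fictitious data): party membership by community.
table = {
    "Republicans": {"A": 103, "B": 17},
    "Democrats": {"A": 135, "B": 21},
    "Independents": {"A": 17, "B": 15},
    "Socialists": {"A": 9, "B": 11},
}


def row_percentage(party, community):
    """Percentage of a party's members who live in the given community."""
    row = table[party]
    return 100 * row[community] / sum(row.values())


def rate_per(events, population, base=100_000):
    """Number of events per `base` people in the population."""
    return events / population * base


# Item 12: the denominator is the Democrats row total (135 + 21 = 156).
pct_democrats_in_b = row_percentage("Democrats", "B")

# Item 17: 34 auto thefts in a city of 1567 people.
auto_theft_rate = rate_per(34, 1567)

print(f"Democrats living in community B: {pct_democrats_in_b:.1f}%")
print(f"Auto thefts per 100,000 residents: {auto_theft_rate:.1f}")
```

The point of the sketch is the choice of denominator: a row percentage divides by the row total, while a rate divides events by the population at risk before scaling.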

Essay questions

Please answer the following 4 essay questions. Each question is worth 2 points for a total of 8 points. Answer these questions on the back of the pages of this exam. You should number your answers. Use a black ink pen or a blue ink pen to answer these questions (not pencil).

31. Define and distinguish between the sample distribution, the sampling distribution, and the population distribution. How are these three distributions related to each other in inferential statistics? What symbols are used to identify the means, standard deviations, and proportions of each of the three distributions? Explain the two theorems that define the characteristics of the sampling distribution.

10 items at 0.2 points each
1. Sample distribution definition
2. Sampling distribution definition
3. Population distribution definition
4. How are these different?
5. How are they related?
6. Mean symbols
7. Standard deviation symbols
8. Proportion symbols
9. Theorem 1
10. Theorem 2
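
Illustration for question 31. The relationship between the three distributions can be seen in a small simulation. The Python sketch below is not part of the rubric, and it rests on stated assumptions: the population is a hypothetical, positively skewed set of 10,000 values, and the theorems in question are taken to be the standard ones, under which the sampling distribution of sample means has a mean equal to the population mean and a standard deviation equal to the population standard deviation divided by the square root of N. All variable names are illustrative.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(420)  # fixed seed so the run is repeatable

# Population distribution: hypothetical, positively skewed values.
population = [rng.expovariate(1 / 40_000) for _ in range(10_000)]
mu = fmean(population)
sigma = pstdev(population)

n = 100               # size of each sample
replications = 2_000  # number of samples drawn

# Each draw is one sample distribution; only its mean is kept.
# Sampling is done with replacement.
sample_means = [
    fmean(rng.choices(population, k=n)) for _ in range(replications)
]

# The collection of sample means approximates the sampling distribution.
mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)
theoretical_se = sigma / sqrt(n)

print(f"Population mean:            {mu:,.0f}")
print(f"Mean of sample means:       {mean_of_means:,.0f}")
print(f"Population sd / sqrt(N):    {theoretical_se:,.0f}")
print(f"Std. dev. of sample means:  {sd_of_means:,.0f}")
```

Even though the population here is skewed, the mean and standard deviation of the simulated sample means should land close to the theoretical values, which is what makes inference from a single sample possible.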

32. Explain and distinguish between simple random sampling, systematic random sampling, stratified random sampling, and cluster sampling. How are these techniques designed to guarantee representativeness? Why did the professor mention his research project about visitors of street markets when he was explaining sampling techniques?

7 items at 0.286 points each
1. Simple random definition
2. Systematic random definition
3. Stratified random definition
4. Cluster definition
5. Difference between these sampling techniques
6. How do techniques guarantee representativeness?
7. Why the street market example? It was an example of nonprobability sampling.
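
Illustration for question 32. The four probability sampling techniques differ mainly in how units are selected from the sampling frame. The Python sketch below is not part of the rubric. It uses a hypothetical frame of 1,000 people arbitrarily assigned to 4 strata and 20 clusters, and the function names are illustrative. The stratified version assumes proportionate allocation, which is only one of several possible designs.

```python
import random
from collections import defaultdict

rng = random.Random(904)  # fixed seed so the run is repeatable

# Hypothetical sampling frame: 1,000 people, 4 strata, 20 clusters of 50.
frame = [
    {"id": i, "stratum": i % 4, "cluster": i // 50}
    for i in range(1000)
]


def simple_random(frame, n):
    """Every unit has the same chance of selection."""
    return rng.sample(frame, n)


def systematic_random(frame, n):
    """Random start, then every k-th unit in the list."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]


def stratified_random(frame, n):
    """Random sample within each stratum, proportionate to stratum size."""
    strata = defaultdict(list)
    for unit in frame:
        strata[unit["stratum"]].append(unit)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, share))
    return sample


def cluster_sample(frame, n_clusters):
    """Random selection of whole clusters; all their members are included."""
    cluster_ids = sorted({unit["cluster"] for unit in frame})
    chosen = set(rng.sample(cluster_ids, n_clusters))
    return [unit for unit in frame if unit["cluster"] in chosen]


simple = simple_random(frame, 100)
systematic = systematic_random(frame, 100)
stratified = stratified_random(frame, 100)
clustered = cluster_sample(frame, 4)

print(len(simple), len(systematic), len(stratified), len(clustered))
```

In the stratified sample each stratum contributes exactly its share of cases, while the cluster sample contains every member of the few clusters selected.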

33. Write and explain each component of the formulas for: (1) Z score; and (2) confidence interval for sample means (when standard deviation is unknown for population) with large samples (N>100). Explain what happens to the alpha, Z score, and width of the confidence interval when the confidence level increases. Give an example with fictitious mean, standard deviation, and sample size. Explain what happens to the confidence interval when the sample size increases.

9 items at 0.222 points each
1. Formula: Z score
2. Description of components
3. Formula: Confidence interval for sample means (standard deviation of population is unknown)
4. Description of components
5. Confidence level increases, alpha decreases
6. Confidence level increases, Z score increases
7. Confidence level increases, width of confidence interval increases
8. Example includes mean, standard deviation, sample size
9. Sample size increases, confidence interval is narrower
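
Illustration for question 33. Items 5 to 9 can be checked numerically. The Python sketch below is not part of the rubric and uses fictitious values (mean 50, standard deviation 10). It assumes the large-sample interval of the form mean plus or minus Z times s divided by the square root of N. Some introductory textbooks divide by the square root of N - 1 instead, which makes almost no difference with N > 100. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_score(x, mean, sd):
    """Distance of a score from the mean, in standard deviation units."""
    return (x - mean) / sd


def critical_z(confidence):
    """Z score that leaves alpha / 2 in each tail of the normal curve."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci(mean, sd, n, confidence=0.95):
    """Large-sample confidence interval for a mean (assumes s / sqrt(n))."""
    margin = critical_z(confidence) * sd / sqrt(n)
    return mean - margin, mean + margin


def width(interval):
    low, high = interval
    return high - low


# Fictitious example values.
mean, sd = 50.0, 10.0

for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(mean, sd, n=400, confidence=level)
    print(f"level={level:.2f} alpha={1 - level:.2f} "
          f"Z={critical_z(level):.2f} interval=({low:.2f}, {high:.2f})")

for n in (200, 400, 1600):
    low, high = mean_ci(mean, sd, n=n)
    print(f"N={n} width={high - low:.2f}")
```

The first loop shows alpha falling while Z and the interval width rise as the confidence level increases. The second shows the interval narrowing as the sample size grows.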

34. Explain in detail each of the following outputs from Stata. These tables are about a question from the 2016 General Social Survey: Do you think the number of immigrants to America nowadays should be... (letin1). The symbols in the tables below indicate the possible answers (_prop_1: increased a lot; _prop_2: increased a little; _prop_3: remain the same as it is; _prop_4: reduced a little; _prop_5: reduced a lot). Write and explain the formulas for the proportion, standard error, and confidence interval. Explain the reasons for the different estimates in each output.

. svyset [weight=wtssall], strata(vstrat) psu(vpsu) singleunit(scaled)

. prop letin1

Proportion estimation                  Number of obs = 1,845

      letin1 | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     _prop_1 |   .0585366   .0054668      .0486914    .0702255
     _prop_2 |   .1181572    .007517      .1041922    .1337145
     _prop_3 |    .402168   .0114186      .3799914    .4247522
     _prop_4 |   .2271003   .0097564      .2085362    .2468016
     _prop_5 |   .1940379   .0092092      .1766116    .2127395

. prop letin1 [iweight=wtssall]

Proportion estimation                  Number of obs = 1,841

      letin1 | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     _prop_1 |    .058567   .0054735      .0487104    .0702708
     _prop_2 |   .1162894   .0074725      .1024166    .1317655
     _prop_3 |   .4028475   .0114328       .380642    .4254586
     _prop_4 |   .2305341   .0098176      .2118444    .2503489
     _prop_5 |    .191762   .0091768      .1744047    .2104065

. svy: prop letin1
(running proportion on estimation sample)

Survey: Proportion estimation

Number of strata =  65        Number of obs   =      1,845
Number of PSUs   = 130        Population size = 1,841.4241
                              Design df       =         65

             |             Linearized
      letin1 | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     _prop_1 |    .058567   .0069146      .0461905    .0740025
     _prop_2 |   .1162894   .0091266      .0992656    .1357926
     _prop_3 |   .4028475   .0116896      .3797379    .4263967
     _prop_4 |   .2305341   .0096834      .2117649    .2504382
     _prop_5 |    .191762   .0101151      .1723673    .2127779

11 items at 0.182 points each
1. What is the first command doing? It declares the complex survey design, which takes into account the stratum (vstrat variable), primary sampling unit (vpsu variable), and weight (wtssall variable).
2. What is the second command doing? Interpret results (one proportion is enough). Proportion estimation with no weight, no complex survey design. The second command generates information that is valid only for the sample. It is not representative of the population.
3. What is the third command doing? Interpret results (one proportion is enough). Proportion estimation with weight, no complex survey design. This command corrects the estimation of the proportion. The point estimate (proportion) is representative of the population.
4. What is the fourth command doing? Interpret results (one proportion is enough). Proportion estimation with complex survey design (which also takes into account the weight). This command corrects the estimation of the proportion, standard error, and confidence interval. The point estimate (proportion) and the interval estimate (confidence interval) are representative of the population.
5. Formula for proportion
6. Explanation
7. Formula for standard error
8. Explanation
9. Formula for confidence interval
10. Explanation
11. What is different about the estimates in the three outputs? The first estimation (no weight, no complex survey design) relates only to the sample. The second estimation (weight, no complex survey design) has a point estimate (proportion) that is representative of the population. The third estimation (complex survey design, which also incorporates the weight) has a point estimate (proportion) and an interval estimate (confidence interval) that are representative of the population.
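
Illustration for question 34, items 5 to 10. The Python sketch below is not part of the rubric. It applies the common large-sample formulas (proportion = count / N; standard error = square root of p(1 - p) / N; interval = p plus or minus Z times the standard error) to the first, unweighted output. Two assumptions are made here: the count of 108 respondents for _prop_1 is inferred from .0585366 x 1,845, and the sample proportion is used inside the standard error, whereas some textbooks substitute 0.5. The interval printed by Stata is not symmetric around the estimate, so it is evidently computed by a different method and the limits will not match exactly.

```python
from math import sqrt


def proportion_ci(count, n, z=1.96):
    """Sample proportion, its standard error, and a normal-based interval."""
    p = count / n
    se = sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)


# Unweighted output: _prop_1 = .0585366 with 1,845 observations.
# The count of 108 is inferred from those two numbers (assumption).
p, se, (low, high) = proportion_ci(108, 1845)

print(f"proportion      = {p:.7f}")
print(f"standard error  = {se:.7f}")
print(f"95% interval    = ({low:.7f}, {high:.7f})")
```

These formulas treat the data as a simple random sample, which is why they track the unweighted output closely but not the linearized standard errors from the svy: prefix, where strata, PSUs, and weights all enter the calculation.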