Biostatistics & SAS programming

Determine Sample Size and Power
Kevin Zhang, April 18, 2017

Errors

In practice
When you design a study, you first need to decide how many units (i.e. the sample size) to include: 10, 100, 1000, or more? Which would you trust more: a sample with 10 observations or a sample with 10,000 observations?

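Power
The power of a hypothesis test measures its sensitivity: the probability that the test rejects the null hypothesis when the alternative is true (power = 1 - β, where β is the probability of a Type II error). It indicates how reliable the conclusion of the test is.
Power function
For a fixed significance level, effect size, and variability, power is a function of the sample size. We can increase power by collecting a larger sample.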
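To make the idea of a power function concrete, here is a minimal Python sketch. It is an illustration only, not part of the SAS workflow. It assumes a one-sided one-sample z test with known standard deviation, where power(n) = Φ(δ√n/σ - z_{1-α}). The values δ = 0.5, σ = 1 and α = 0.05 are arbitrary choices, and the helper name is my own.

```python
from statistics import NormalDist

def z_test_power(n, delta, sigma, alpha=0.05):
    """Power of a one-sided one-sample z test (H1: mu > mu0) with known sigma."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)        # rejection cutoff under H0
    shift = delta * n ** 0.5 / sigma     # mean of the test statistic under H1
    return 1 - z.cdf(z_crit - shift)     # P(statistic exceeds cutoff | H1)

# Assumed illustration values: delta = 0.5, sigma = 1, alpha = 0.05
powers = {n: z_test_power(n, delta=0.5, sigma=1.0) for n in (5, 10, 20, 40, 80)}
for n, p in powers.items():
    print(f"n = {n:3d}  power = {p:.3f}")
```

Power climbs toward 1 as n grows. With δ = 0 the same function returns α, the Type I error rate.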

PROC POWER
PROC POWER in SAS performs power analysis. You can compute the power for a given sample size, or determine the sample size needed to reach a desired power. PROC POWER needs to know what kind of analysis you plan to perform. Commonly used statements include:
MULTREG -- Tests of one or more coefficients in multiple linear regression
ONECORR -- Fisher's z test and t test of (partial) correlation
ONESAMPLEFREQ -- Tests, confidence interval precision, and equivalence tests of a single binomial proportion
ONESAMPLEMEANS -- One-sample t test, confidence interval precision, or equivalence test
ONEWAYANOVA -- One-way ANOVA including single-degree-of-freedom contrasts
PAIREDMEANS -- Paired t test, confidence interval precision, or equivalence test
PLOT -- Displays plots for the previous sample size analysis
TWOSAMPLEMEANS -- Two-sample t test, confidence interval precision, or equivalence test

Example 1
A clinical dietician wants to compare two diets, A and B, for diabetic patients. She hypothesizes that diet A (Group 1) will be better than diet B (Group 2), in terms of lower blood glucose. She plans to take a random sample of diabetic patients and randomly assign each of them to one of the two diets. At the end of the experiment, which lasts 6 weeks, a fasting blood glucose test will be conducted on each patient. She expects the average difference in blood glucose between the two groups to be about 10 mg/dl. She also assumes that the standard deviation of blood glucose is 15 for diet A and 17 for diet B. The dietician wants to know the number of subjects needed in each group, assuming equal-sized groups.

Analysis
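The stddev = 16.03 used in the SAS code for this example matches pooling the two assumed standard deviations (15 and 17) for equal group sizes: σ_pooled = √((15² + 17²)/2). The following quick Python check of that arithmetic assumes this pooling rule is how 16.03 was obtained.

```python
import math

sd_a, sd_b = 15.0, 17.0                              # assumed SDs for diet A and diet B
pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)   # pooled SD for equal group sizes
print(round(pooled_sd, 2))                           # 16.03
```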

SAS code explanation
proc power;
  twosamplemeans test=diff    /* two-sample means test: check the difference */
    groupmeans = 0 | 10       /* group averages; 0 and 10 just describe the desired difference */
    stddev = 16.03
    npergroup = .             /* left missing, so SAS calculates the size of each group */
    power = 0.8;              /* desired power of 80% */
run;
By default, PROC POWER uses a two-sided test with alpha = 0.05.

Output
The output first echoes your settings. It then shows that we achieve at least 80% power with 42 patients in each group.
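PROC POWER solves this problem with the t distribution. A textbook normal (z) approximation gives a quick cross-check: n per group = 2((z_{1-α/2} + z_{1-β})σ/Δ)². The sketch below is that approximation, not SAS's exact method, and the helper name is hypothetical. It comes out slightly below 42 because it treats σ as known.

```python
import math
from statistics import NormalDist

def n_per_group(diff, sd, power=0.8, alpha=0.05):
    """Normal-approximation n per group for a two-sided two-sample test of means."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_b = z.inv_cdf(power)           # quantile matching the target power
    return math.ceil(2 * ((z_a + z_b) * sd / diff) ** 2)

print(n_per_group(diff=10, sd=16.03))   # 41
```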

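Evaluate the power of a given sample
What happens if we only have 30 patients in each group?
proc power;
  twosamplemeans test=diff
    groupmeans = 0 | 10
    stddev = 16.03
    npergroup = 30    /* fixed size of each group */
    power = .;        /* left missing, so SAS calculates the power */
run;
What is the power with 30 patients in each group? Because 30 is fewer than the 42 needed for 80% power, the power will be below 80%.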
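For a rough idea of the answer, the sketch below computes the same power with the normal approximation. This is an assumption: SAS's t-based result will typically be a little lower. The helper name is my own.

```python
from statistics import NormalDist

def two_sample_power(diff, sd, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test of means."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    se = sd * (1 / n1 + 1 / n2) ** 0.5   # standard error of the difference in means
    shift = diff / se                    # mean of the z statistic under H1
    # Probability of falling in either rejection tail when the true difference is diff.
    return (1 - z.cdf(z_a - shift)) + z.cdf(-z_a - shift)

p30 = two_sample_power(diff=10, sd=16.03, n1=30, n2=30)
print(f"approximate power with 30 per group: {p30:.3f}")
```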

In practice, how do we evaluate the power of an unbalanced design? Suppose more patients, say 40, are assigned to diet A, and only 20 patients are willing to take diet B.
proc power;
  twosamplemeans test=diff
    groupmeans = 0 | 10
    stddev = 16.03
    groupns = (40 20)   /* unequal group sizes: 40 on diet A, 20 on diet B */
    power = .;
run;
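The sketch below uses the same normal approximation (an assumption, not SAS's exact calculation; helper name hypothetical). It compares the 40/20 split against a balanced 30/30 design with the same total of 60 patients. The standard error depends on 1/n1 + 1/n2, which for a fixed total is smallest when the groups are equal.

```python
from statistics import NormalDist

def two_sample_power(diff, sd, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test of means."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    se = sd * (1 / n1 + 1 / n2) ** 0.5   # grows when the allocation is unequal
    shift = diff / se
    return (1 - z.cdf(z_a - shift)) + z.cdf(-z_a - shift)

unbalanced = two_sample_power(10, 16.03, 40, 20)
balanced = two_sample_power(10, 16.03, 30, 30)   # same total N = 60
print(f"40/20: {unbalanced:.3f}   30/30: {balanced:.3f}")
```

For the same total sample size, the unbalanced design has less power.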

Small simulation study
How does the required sample size change for different mean differences?
proc power;
  twosamplemeans test=diff
    meandiff = 0.2 to 1.2 by 0.2   /* checks differences 0.2, 0.4, 0.6, 0.8, 1.0, 1.2 */
    stddev = 1
    power = 0.8
    npergroup = .;
run;
A larger difference is easier to detect, so a smaller sample size is needed.
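The same trend follows from the normal-approximation formula n = 2((z_{1-α/2} + z_{1-β})σ/Δ)². The sketch below applies it to the six differences above with σ = 1. Because n scales with 1/Δ², doubling the difference cuts the required size roughly by four. These approximate values are not SAS output, and the helper name is my own.

```python
import math
from statistics import NormalDist

def n_per_group(diff, sd=1.0, power=0.8, alpha=0.05):
    """Normal-approximation n per group, two-sided two-sample test of means."""
    z = NormalDist()
    k = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (k * sd / diff) ** 2)

diffs = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
sizes = [n_per_group(d) for d in diffs]
for d, n in zip(diffs, sizes):
    print(f"meandiff = {d:.1f}  n per group = {n}")
```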

Power chart
A plot showing how the required total sample size changes with power:
proc power;
  twosamplemeans test=diff
    meandiff = 0.2 to 1 by 0.2
    stddev = 1
    power = 0.9
    ntotal = .;
  plot x = power min = 0.5 max = 0.95;   /* power from 0.5 to 0.95 on the x-axis */
run;
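The numbers behind such a chart can be tabulated with the normal approximation, as sketched below. Two assumptions apply: total N is taken as twice the rounded-up per-group size (equal allocation), and SAS's t-based values and rounding may differ slightly. The helper name is hypothetical.

```python
import math
from statistics import NormalDist

def n_total(diff, power, sd=1.0, alpha=0.05):
    """Normal-approximation total N (two equal groups), two-sided test of means."""
    z = NormalDist()
    k = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return 2 * math.ceil(2 * (k * sd / diff) ** 2)

powers = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
diffs = [0.2, 0.4, 0.6, 0.8, 1.0]
table = {d: [n_total(d, p) for p in powers] for d in diffs}
print("power:  " + " ".join(f"{p:>5.2f}" for p in powers))
for d, row in table.items():
    print(f"d = {d:.1f}: " + " ".join(f"{n:>5d}" for n in row))
```

Each row rises with power, and each column falls as the difference grows. These are the curves PROC POWER draws.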

Correlation example
A researcher wants to know whether a significant positive correlation exists between reading speed and IQ in adolescents. Before beginning the study, the researcher would like to know the sample size required to detect a positive correlation of 0.5 with 80% power.
Correlation analysis: a hypothesis test of the significance of the correlation, assuming a true correlation of 0.5:
$H_0: \rho = 0$ vs. $H_1: \rho > 0$

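proc power;
  onecorr alpha = 0.05
    sides = 1     /* one-sided test, H1: rho > 0 */
    corr = 0.5    /* assumed correlation */
    ntotal = .    /* left missing, so SAS calculates the total sample size */
    power = 0.8;
run;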
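A common hand calculation uses Fisher's z transformation: n = ((z_{1-α} + z_{1-β}) / atanh(ρ))² + 3. The sketch below applies it as an approximation; it is not necessarily SAS's exact computation, and the helper name is my own.

```python
import math
from statistics import NormalDist

def n_for_correlation(rho, power=0.8, alpha=0.05, sides=1):
    """Fisher z approximation of the sample size to detect correlation rho."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / sides)
    z_b = z.inv_cdf(power)
    return math.ceil(((z_a + z_b) / math.atanh(rho)) ** 2 + 3)

print(n_for_correlation(0.5))   # 24
```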

Proportion example
A survey claims that 90% of dentists recommend a particular brand of toothpaste for their patients with sensitive teeth. A researcher decides to test this claim with a random sample of 80 dentists, but first wants to find out whether this sample size is large enough to achieve 80% power. This is a hypothesis test about a proportion (i.e. a percentage).

proc power;
  onesamplefreq test = exact             /* exact binomial test */
    sides = 2
    nullproportion = 0.9                 /* claimed proportion under H0 */
    proportion = 0.05 to 0.85 by 0.05    /* possible true proportions */
    alpha = 0.05
    ntotal = 80
    power = .;
run;
The true proportion could be any value different from the claimed 90%, so we check the power for a list of possible proportions.
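With n = 80 fixed, the power for each candidate proportion can be computed directly from binomial probabilities. The sketch below assumes an equal-tailed exact test, with each rejection tail holding at most α/2 under H0. That construction is my assumption and may not reproduce SAS's critical values exactly; the helper names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def exact_power(n, p0, p, alpha=0.05):
    """Power of an assumed equal-tailed exact binomial test of H0: p = p0."""
    null = [binom_pmf(k, n, p0) for k in range(n + 1)]
    # Lower critical value: largest k with P(X <= k | p0) <= alpha/2 (-1 if none).
    lower, tail = -1, 0.0
    for k in range(n + 1):
        tail += null[k]
        if tail > alpha / 2:
            break
        lower = k
    # Upper critical value: smallest k with P(X >= k | p0) <= alpha/2 (n + 1 if none).
    upper, tail = n + 1, 0.0
    for k in range(n, -1, -1):
        tail += null[k]
        if tail > alpha / 2:
            break
        upper = k
    alt = [binom_pmf(k, n, p) for k in range(n + 1)]
    # Reject when X <= lower or X >= upper; power = P(reject | true p).
    return sum(alt[: lower + 1]) + sum(alt[upper:])

proportions = [round(0.05 * i, 2) for i in range(1, 18)]   # 0.05, 0.10, ..., 0.85
powers = {p: exact_power(80, 0.9, p) for p in proportions}
for p, pw in powers.items():
    print(f"true proportion = {p:.2f}  power = {pw:.3f}")
```

Whether 80 dentists are enough for 80% power depends on how far the true proportion lies from 0.9. Power is near 1 for small proportions and drops as the proportion approaches 0.9.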

One-sample t test of a mean
A researcher is planning a pharmaceutical study on a new formulation of a drug. The current formulation has an average elimination rate of 0.06. The researcher hypothesizes that the elimination rate for the new formulation is higher than 0.06. To be confident, the researcher would like to know how large the sample must be to achieve 90% power. A standard deviation of 0.02, based on studies of the original formulation, will be used.
Hypothesis test of the average against 0.06: a one-tailed test.

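proc power;
  onesamplemeans sides = 1      /* test structure: one-tailed */
    nullmean = 0.06             /* null hypothesis value */
    mean = 0.01 to 0.1 by 0.01
    stddev = 0.02
    ntotal = .
    power = 0.9;
run;
Note: with sides = 1, the direction of the alternative follows each scenario's mean. Means below 0.06 therefore correspond to a lower-tailed test, and mean = 0.06 leaves no effect to detect. For the hypothesis "higher than 0.06", only the means above 0.06 are relevant; sides = U restricts the test to the upper tail.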
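For the relevant scenarios (means above 0.06), the normal approximation n = ((z_{1-α} + z_{1-β})σ/(μ - μ0))² gives a quick lower-bound check. It is an assumption, not SAS's t-based answer. For large effects the resulting n is tiny, and the exact t calculation will ask for somewhat more. The helper name is my own.

```python
import math
from statistics import NormalDist

def n_one_sample(mean, null_mean, sd, power=0.9, alpha=0.05):
    """Normal-approximation n for a one-sided one-sample test of a mean."""
    z = NormalDist()
    k = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    return math.ceil((k * sd / abs(mean - null_mean)) ** 2)

means = [0.07, 0.08, 0.09, 0.10]   # alternatives above the null value 0.06
sizes = [n_one_sample(m, 0.06, 0.02) for m in means]
print(sizes)
```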

Paired t test
A researcher wants to investigate whether BMI changes in males aged 55-65 years after four weeks on a novel diet and exercise program. The researcher plans to measure BMI in a random sample of men before and after the intervention and test whether there was a change. A power of 80% is desired, and a standard deviation of 2.0, based on past studies of weight loss and BMI change, is used for the calculations.
Comparison between two readings: the SAME subjects are measured twice (before vs. after), so this is a paired design analyzed with a paired t test.

proc power;
  pairedmeans sides = 2
    nulldiff = 0                  /* null hypothesis: no difference */
    meandiff = 0.5 to 3 by 0.5    /* possible differences in the sample */
    corr = 0.5                    /* correlation between before and after readings */
    stddev = 2.0
    npairs = .                    /* paired design: npairs instead of ntotal */
    power = 0.8;
run;
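The correlation matters because the paired test works on within-pair differences, whose standard deviation is σ√(2(1 - ρ)). With σ = 2.0 and ρ = 0.5 this equals 2.0. The sketch below combines that with the normal approximation. This is an assumption, not SAS's exact t-based result, and the helper name is my own.

```python
import math
from statistics import NormalDist

def n_pairs(meandiff, sd, corr, power=0.8, alpha=0.05):
    """Normal-approximation number of pairs for a two-sided paired test."""
    z = NormalDist()
    k = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    sd_diff = math.sqrt(2 * sd ** 2 * (1 - corr))   # SD of the within-pair differences
    return math.ceil((k * sd_diff / meandiff) ** 2)

diffs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
pairs = [n_pairs(d, sd=2.0, corr=0.5) for d in diffs]
print(pairs)
```

A higher before/after correlation shrinks sd_diff and therefore the number of pairs needed.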

Example of ANOVA
A researcher wants to investigate the effects of three different diets on percent weight loss when combined with a 5-day-per-week cardio exercise program. The diets are a low-carbohydrate diet, a high-protein diet, and a control diet (exercise only). Before beginning the study, the sample size must be determined; the researcher would like to achieve 80% power. From a previous study, the average percent weight loss values are 9 for Low, 12 for High, and 8 for Control. Assume the standard deviation is 3.0.
Comparing 3 groups (Low, High, Control): one-way ANOVA.

proc power;
  onewayanova test = overall    /* overall F test of equal group means */
    groupmeans = 9 | 12 | 8     /* Low | High | Control */
    stddev = 3.0
    npergroup = .
    power = 0.8;
run;
There are 3 groups, so we need to know how many subjects each group requires. A balanced design is assumed.
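The power of the overall F test depends on the group means only through their spread relative to σ. Cohen's f summarizes this spread: the standard deviation of the group means around the grand mean (dividing by k), divided by σ. Turning f into a sample size requires the noncentral F distribution, which is not in the Python standard library, so this sketch stops at the effect size.

```python
from statistics import pstdev

group_means = [9, 12, 8]       # Low, High, Control
sigma = 3.0                    # assumed common within-group SD
sigma_m = pstdev(group_means)  # SD of the group means around the grand mean (divides by k)
f = sigma_m / sigma            # Cohen's f effect size
print(round(f, 3))             # 0.567
```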