Today's Agenda, Week 1
- Welcome to Stat 342
- Policy
- Some motivation
- Course schedule
- Installing SAS
- Resources available
Stat 342 Notes. Week 1, Page 1 / 49
Contact
- E-mail: jackd@sfu.ca
- Course website: http://www.sfu.ca/~jackd/stat342
- Office hours: 3-4pm Monday, Wednesday, and Thursday in K9510 (the stats lab), starting in Week 2.
The stats lab (K9510) offers free tutoring 9:30-4:30, Monday to Friday, and sometimes later. This room also has computers with R installed for you to use.
What is this class about? My assumption is that most of you are senior undergraduates in statistics, that you have little to no prior experience with SAS, but that you have some general programming knowledge, likely with R. I also assume that you are taking this course because you want to learn programming, and because it is paired with Stat 341 (R).
My intention is to bring maximal value to that assumed demographic. By the end of this course, successful students will be able to:
1. Write simple SAS programs (e.g. loading and saving datasets, creating derived variables, printing the first few lines of a dataset, or listing its variable names) with only minimal reference to outside sources (e.g. the textbook, the SAS user guide, Stack Exchange).
2. Plan a typical analysis (e.g. regression, a generalized linear model, a contingency table analysis), including the input, formatting, and cleaning of data, and the appropriate use of variables.
3. Write SAS programs to run these full analyses with the aid of outside sources.
4. Produce and explain the standard output from several typical analyses done in SAS.
5. Pass the SAS Base Programmer certification exam with less than half the preparation time of someone learning SAS from scratch.
Grading policy 1/2
Minimum scores for letter grades:
- A+: 90%
- A: 85%
- A-: 80%
- B+: 76%
- B: 72%
- B-: 68%
- C+: 64%
- C: 60%
- C-: 55%
- D: 50%
- F: 0%
Grading policy 2/2
Grades are based on:
- 1 midterm x 40% = 40%
- 1 final exam x 50% = 50%
- 3 assignments x 3% = 9%
- 1% free
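As a quick check of the arithmetic, here is a small sketch (in Python, purely for illustration; the function and constant names are hypothetical) that combines the weights above and maps the total to a letter using the minimum scores from the previous slide. It assumes the free 1% is given to everyone and that each assignment handed in earns its full 3%, since assignments are graded on completion only.

```python
# Hypothetical grade calculator for the Stat 342 grading policy.
# Assumptions: the free 1% goes to everyone, and each assignment handed in
# earns its full 3% because assignments are graded on completion only.

MIDTERM_WEIGHT = 40     # percent of the course grade
FINAL_WEIGHT = 50       # percent of the course grade
ASSIGNMENT_WEIGHT = 3   # percent per assignment (3 assignments)
FREE_PERCENT = 1

# Minimum course percentage for each letter grade, highest first.
CUTOFFS = [
    ("A+", 90), ("A", 85), ("A-", 80),
    ("B+", 76), ("B", 72), ("B-", 68),
    ("C+", 64), ("C", 60), ("C-", 55),
    ("D", 50), ("F", 0),
]


def course_percent(midterm, final, assignments_completed):
    """Combine exam scores (each 0-100) and the number of completed
    assignments (0-3) into a course percentage out of 100."""
    if not 0 <= assignments_completed <= 3:
        raise ValueError("there are only 3 assignments")
    return (MIDTERM_WEIGHT * midterm / 100
            + FINAL_WEIGHT * final / 100
            + ASSIGNMENT_WEIGHT * assignments_completed
            + FREE_PERCENT)


def letter_grade(percent):
    """Return the highest letter whose minimum score is met."""
    for letter, minimum in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"


if __name__ == "__main__":
    # Example: 70% on the midterm, 80% on the final, all 3 assignments done.
    total = course_percent(70, 80, 3)   # 28 + 40 + 9 + 1 = 78
    print(total, letter_grade(total))   # prints: 78.0 B+
```

The loop in `letter_grade` relies on the cutoffs being listed from highest to lowest, so the first minimum that is met gives the correct letter.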
About the assignments
In previous semesters, large groups of assignments came in nearly identical, which defeats the purpose of having practice work. Therefore, this semester, assignments carry very little weight and are graded only on completion, not on correctness. Keys will be made available soon after each assignment's due date. I recommend you do them not for the roughly 10% of the grade they are directly worth, but for the better preparation they will give you for the midterm and the final exam.
Handing in assignments, late policy
- There are drop boxes labelled Stat 342 outside the stats lab, on the main floor of Shrum Science Centre K. All assignments are to be handed in there by 4:30pm on the due date.
- Assignments that are not in the drop box (or handed in in class) when the drop box is emptied will not be graded.
- Assignments are graded by TAs, and solutions will be e-mailed out after the due date.
About the textbook
SAS and R: Data Management, Statistical Analysis, and Graphics, by Ken Kleinman and Nicholas J. Horton, is ABSOLUTELY ESSENTIAL. By which I mean: you cannot succeed in this course without a copy of the textbook. It will be referenced frequently and extensively.
The textbook used in this course and in Stat 341 is a fantastic reference guide. It has hundreds of examples of common tasks done in both SAS and R, with code and minimal commentary. Compared to most textbooks, it feels much more like a supplement. I will be using these examples extensively, so the textbook is required for Stat 342. I will try to cover the material at the same time as Carl Schwarz does in Stat 341, which uses the same book.
The lectures, however, will focus much more on the theory behind the examples in the book. Using the book more directly would involve a lot of memorization and not much understanding. You can think of the textbook as a recipe book, and the lectures as discussions of cooking technique.
Additional readings will include excerpts from:
1. Handbook of SAS DATA Step Programming (by Arthur Li)
2. SAS Certification Prep Guide: Base Programming for SAS 9 (by the SAS Institute)
Finally, some motivation. Why learn SAS when we already have R? R is open source, and as Carl Schwarz will tell you, that means it is free but not cheap. Packages for R are written all over the world by people with varying levels of skill, and varying levels of regard for common styles and documentation. SAS, on the other hand, is more consistent. It is developed mainly at a central campus in Cary, North Carolina.
SAS, being closed source, is also able to provide a guarantee of quality. This is a big deal in situations where the output of an analysis has legal ramifications. SAS is also excellent at handling very large databases, much better than base R. These two aspects make it the program of choice for analysis work done at Stats Canada in Ottawa, which is by far the biggest employer of BSc-level statisticians in Canada. Stats Canada also has an annex at UBC and in the Health Sciences department here at SFU.
Pharmaceutical drug trials in the United States, for example, have very strict controls on their data and registration. SAS is the tool of choice for pharmaceutical work and much other medical work in industry. Emmes Canada, a statistical consulting firm based in IRMACS, works primarily in SAS.
We will talk about the Base Programmer exam a lot in this course, but it is only the first of 13 credentials that SAS offers, including Clinical Trials Programmer. See: https://support.sas.com/certify/creds/ct.html and https://support.sas.com/learn/ap/index.html
SAS Canada, based in Toronto, wants to know whenever someone at SFU passes one of these exams, so that successful candidates can be added to its hiring list. Employers regularly contact the Toronto office looking for people with SAS experience, and those opportunities get passed on to us.
Finally, there is the SAS Institute itself, which employs thousands of statisticians and programmers to develop SAS and JMP. Here is how it places on the Forbes 2016 best employers list:
[Figure: the SAS Institute on the Forbes 2016 best employers list]
Here is SAS among information technology companies:
[Figure: SAS ranked among information technology companies]
Course Schedule
Lectures are on Thursdays, 12:30-2:20 PM, in Education Building room 7618 on the Burnaby campus. The Education Building is attached to the AQ, on the opposite side from the Shrum Science buildings.
Week 1: Thursday, Sept 7. Introduction to Stat 342, policies, schedule. How to install SAS. Introduction to other references. General introduction to SAS.
Week 2: Thursday, Sept 14. Textbook Chapters 1 and 2. Data steps and proc steps. The compile phase and the execution phase. Proc print. Proc contents. Input and output. Proc import, proc export, delimiters, the SET statement, the INPUT statement. DBMS and file formats. (As time permits) Libraries and librefs.
Week 3: Thursday, Sept 21. Spillover from Week 2, finishing input and output. SQL: proc SQL, RMySQL, and MySQL in general. The Artful Software collection of SQL programs. SELECT, FROM, WHERE. Connections in general. Sorts.
Week 4: Thursday, Sept 28. Assignment 1 due. SQL inner/left/right/outer joins, concatenations. (If time permits) Dashboards and PHP. The long format of data, transposing. Making new variables from old ones. Transformations. Probability distributions.
Week 5: Thursday, Oct 6. Textbook Chapter 3. Probability distribution application: computing p-values from scratch with the CDF. Example problems, overview of homework issues. Practice midterm.
Week 6: Thursday, Oct 13. The practice midterm key will be given out over the Thanksgiving break, Oct 7-10. Midterm exam, 90 minutes (possibly 100, depending on room logistics).
Week 7: Thursday, Oct 20. Integer and floating-point operations. The similar subtraction issue (subtracting nearly equal numbers). Matrix inversion example. Tolerance. Random number generation, specifically getting a random sample of rows instead of the first few.
Week 8: Thursday, Oct 27. Assignment 2 due. Means, moments, quantiles, standardizing. Correlation. Proc freq, cross tabs, and row/column totals. The TABLES statement and its CHISQ, CMH, and MISSPRINT options. (As time permits) McNemar's test.
Week 9: Thursday, Nov 3. Hypothesis tests: normality, equal variance, t-test. Proc univariate, proc ttest, proc npar1way. Proc reg for regression in more detail, model specification. Proc glm for a wider range of models. Note that GLM stands for general linear models here, as in multivariable linear regression with interactions and categorical variables.
Week 10: Thursday, Nov 10. Dummy variables and categorical variables. Diagnostics: leverage, Cook's D, residuals and residual plots. Prediction bounds, R-squared. Proc logistic for binary responses, odds ratios. Theory: the sigmoid curve.
Week 11: Thursday, Nov 17. (As time permits) Automated model selection, such as stepwise regression. Control of flow: IF-THEN statements, DO loops.
Week 12: Thursday, Nov 24. Assignment 3 due. Plots: contour plots, density histograms, bar graphs, CDFs. (If time permits) Kaplan-Meier survival plots.
Week 13: Thursday, Dec 1. Buffer time. Final exam prep.
Final exam: Monday, Dec 11, 3:30-6:30pm. Location to be announced.
Note about the drop deadline. Officially, the deadline is Monday, Oct 10, but that is a statutory holiday and all offices will be closed. Consider Friday, Oct 7 the deadline, just in case. Also see: http://www.sfu.ca/students/deadlines/fall2016.html
Regarding notes: many course notes will use a fill-in-the-blank format. Before each lecture, I will e-mail the notes as PDFs, as I did with these, but with blanks to be filled in during class. The missing parts will be written during class using a document camera, and the completed notes will be scanned into a PDF soon after the class.
If you are taking notes on paper, I recommend printing these notes so that 4-6 slides appear on a single page. There is a single break slide after every 10-15 minutes of material. On these break slides, I like to include pictures of cute or funny animals with stupid stats puns. If there are any animals that you feel uncomfortable seeing (mice, reptiles, fish, birds, whatever), please e-mail me a request not to include them.
Regarding collaboration, honesty, and plagiarism
None of the assignments or exams for this course are recycled from previous sources. Anyone claiming to have a test bank for this offering of this course is lying. Please include the names of your collaborators on your assignments. That way, when some solutions look very similar, the markers will understand that there was no blind copying.
You are encouraged to work together on the computational and analytical portions of the assignments. However, all written work is expected to be solely yours. Copying the writing of another student, or using services to write assignments on your behalf, will be considered academically dishonest and will be dealt with according to SFU's academic dishonesty policy. The use of proofreading and essay skills services, such as those in the Student Learning Commons, is perfectly fine.
Resources for installing SAS
- SFU Software Library: https://www.sfu.ca/itservices/technical/software.html
- SAS University Edition: http://www.sas.com/en_us/software/university-edition.html
- VirtualBox for PC, or VMware for PC/Mac
This is VirtualBox, which can be used to make a computer inside a computer (a virtual machine). We'll use it to run a SAS server locally on your computer. Download and install it first.
Next, download SAS University Edition, and open it with VirtualBox.
Open the 2 GB file and choose Import, keeping all the defaults.
When it's imported, you can start the virtual machine you have created.
The virtual machine will start up and show this screen. Leave it running. From your web browser, go to http://localhost:10000
http://localhost:10000 will bring you here. Press the big red button.
A new tab opens with SAS Studio, where you can write, save, and run SAS code!
Other SAS references. The library has several SAS books available, including... SAS Certification Prep Guide: Base Programming for SAS 9
SAS Certification Prep Guide: Base Programming for SAS 9
This book is available as an ebook, and can be found by searching for it at http://www.lib.sfu.ca/ or by following this link directly (using your SFU login):
http://proquest.safaribooksonline.com.proxy.lib.sfu.ca/9781607649243
This book covers data manipulation and the underlying mechanics of SAS in more detail than you'll want, but it is the authoritative guide if you want to take the certification exam.
Stack Exchange
This is a question-and-answer forum for technical and programming questions of all sorts. The statistics part of it is called Cross Validated, but searching for "stack exchange" will get you there. If you have a problem, there's a good chance someone else has had it first. http://stats.stackexchange.com/questions/tagged/sas
SAS knowledge base documentation
These are most useful after you've spent some time with SAS. Like the R help pages that come up when you type ?function, they contain a lot of detail that will be overwhelming until you know what you're looking for. http://support.sas.com/documentation/index.html