Intro to R Professor Clayton Nall h/t Thomas Leeper, Ph.D. (University of Aarhus) and Teppei Yamamoto, Ph.D. (MIT) June 26, 2014 Nall Intro to R 1 / 44
Nall Intro to R 2 / 44
1 Opportunities 2 Challenges 3 The R Language Nall Intro to R 3 / 44
Opportunities 1 Opportunities 2 Challenges 3 The R Language Nall Intro to R 4 / 44
Opportunities Why should you use R? Because you want to do more Because you can create a better scientific workflow Because everyone else is doing it Because it is free Because R is beautiful Nall Intro to R 5 / 44
Opportunities What can you do with R? Everything Everything Statistical analysis (including Monte Carlo simulation) Text processing Online data (web scraping, calling online archives) High-level visualization Maps and geographic data Calling other software and services to do jobs (e.g., SQL, Google Maps) Nall Intro to R 6 / 44
Opportunities What can you do with R? Everything Everything Statistical analysis (including Monte Carlo simulation) Text processing Online data (web scraping, calling online archives) High-level visualization Maps and geographic data Calling other software and services to do jobs (e.g., SQL, Google Maps) Nall Intro to R 6 / 44
Opportunities What can you do with R? Everything But seriously... Everything Statistical analysis (including Monte Carlo simulation) Text processing Online data (web scraping, calling online archives) High-level visualization Maps and geographic data Calling other software and services to do jobs (e.g., SQL, Google Maps) Nall Intro to R 6 / 44
Opportunities What can you do with R? Everything But seriously... Everything Statistical analysis (including Monte Carlo simulation) Text processing Online data (web scraping, calling online archives) High-level visualization Maps and geographic data Calling other software and services to do jobs (e.g., SQL, Google Maps) Nall Intro to R 6 / 44
Opportunities What can you do with R? Everything But seriously... Everything: Statistical analysis (including Monte Carlo simulation) Text processing Online data (web scraping, calling online archives) High-level visualization Maps and geographic data Calling other software and services to do jobs (e.g., SQL, Google Maps) Nall Intro to R 6 / 44
Opportunities What can you do with R? Everything But seriously... Everything: Statistical analysis (including Monte Carlo simulation) Text processing Online data (web scraping, calling online archives) High-level visualization Maps and geographic data Calling other software and services to do jobs (e.g., SQL, Google Maps) Nall Intro to R 6 / 44
Opportunities What can you do with R? Everything But seriously... Everything: Statistical analysis (including Monte Carlo simulation) Text processing Online data (web scraping, calling online archives) High-level visualization Maps and geographic data Calling other software and services to do jobs (e.g., SQL, Google Maps) Nall Intro to R 6 / 44
Opportunities What can you do with R? Everything But seriously... Everything: Statistical analysis (including Monte Carlo simulation) Text processing Online data (web scraping, calling online archives) High-level visualization Maps and geographic data Calling other software and services to do jobs (e.g., SQL, Google Maps) Nall Intro to R 6 / 44
Opportunities What can you do with R? Everything But seriously... Everything: Statistical analysis (including Monte Carlo simulation) Text processing Online data (web scraping, calling online archives) High-level visualization Maps and geographic data Calling other software and services to do jobs (e.g., SQL, Google Maps) Nall Intro to R 6 / 44
Opportunities What can you do with R? Everything But seriously... Everything: Statistical analysis (including Monte Carlo simulation) Text processing Online data (web scraping, calling online archives) High-level visualization Maps and geographic data Calling other software and services to do jobs (e.g., SQL, Google Maps) Nall Intro to R 6 / 44
Opportunities Why should you use R? Because you want to do more Because you can create a better scientific workflow Because everyone else is doing it Because it is free Because R is beautiful Nall Intro to R 7 / 44
Opportunities Why should you use R? Because you want to do more Because you can create a better scientific workflow Because everyone else is doing it Because it is free Because R is beautiful Nall Intro to R 7 / 44
Opportunities Why is R good for science? Scripting Reproducible research tools Scripting Report generation Data sharing Nall Intro to R 8 / 44
Opportunities Reproducible Research All data and replication code is public All results come directly from open-source code You write a script, you don t point and click, so you can trace your steps This makes changes and collaboration easier Nall Intro to R 9 / 44
Opportunities Workflow for Scientific Replication Store data in perpetually readable formats (usually ASCII) Write a complete analysis script Script should run analysis and output results Reports should import results directly Share data, script, and codebooks Nall Intro to R 10 / 44
Opportunities Questions so far? Nall Intro to R 11 / 44
Opportunities Why should you use R? Because you want to do more Because you can create a better scientific workflow Because everyone else is doing it Because it is free Because R is beautiful Nall Intro to R 12 / 44
Opportunities Why should you use R? Because you want to do more Because you can create a better scientific workflow Because everyone else is doing it Because it is free Because R is beautiful Nall Intro to R 12 / 44
Opportunities What is everyone else doing? Nall Intro to R 13 / 44
Opportunities What is everyone else doing? Nall Intro to R 13 / 44
Opportunities What is everyone else doing? Nall Intro to R 13 / 44
Opportunities What is everyone else doing? Nall Intro to R 13 / 44
Opportunities What is everyone else doing? Nall Intro to R 13 / 44
Opportunities What is everyone else doing? Nall Intro to R 13 / 44
Opportunities What is everyone else doing? Nall Intro to R 13 / 44
Opportunities Impact of popularity Huge community support Cutting edge techniques Versatile New packages and approaches constantly being developed Nall Intro to R 14 / 44
Opportunities Why should you use R? Because you want to do more Because you can create a better scientific workflow Because everyone else is doing it Because it is free Because R is beautiful Nall Intro to R 15 / 44
Opportunities Why should you use R? Because you want to do more Because you can create a better scientific workflow Because everyone else is doing it Because it is free Because R is beautiful Nall Intro to R 15 / 44
Opportunities R is FREE No licensing hassles, all packages and downloads FREE Open source. Reverse compatibility always possible. Independence from decisions at IBM, STATA corporate HQ Individual control Nall Intro to R 16 / 44
Opportunities R is FREE No licensing hassles, all packages and downloads FREE Open source. Reverse compatibility always possible. Independence from decisions at IBM, STATA corporate HQ Individual control Nall Intro to R 16 / 44
Opportunities R is FREE No licensing hassles, all packages and downloads FREE Open source. Reverse compatibility always possible. Independence from decisions at IBM, STATA corporate HQ Individual control Nall Intro to R 16 / 44
Opportunities R is FREE No licensing hassles, all packages and downloads FREE Open source. Reverse compatibility always possible. Independence from decisions at IBM, STATA corporate HQ Individual control Nall Intro to R 16 / 44
Opportunities Why should you use R? Because you want to do more Because you can create a better scientific workflow Because everyone else is doing it Because it is free Because R is beautiful Nall Intro to R 17 / 44
Opportunities Why should you use R? Because you want to do more Because you can create a better scientific workflow Because everyone else is doing it Because it is free Because R is beautiful Nall Intro to R 17 / 44
Graphs are beautiful Opportunities You can make superb graphics in R Nall Intro to R 18 / 44
Opportunities Nall Intro to R 19 / 44
Opportunities Nall Intro to R 19 / 44
Opportunities Nall Intro to R 19 / 44
Opportunities Nall Intro to R 19 / 44
Opportunities Nall Intro to R 19 / 44
Opportunities Questions so far? Nall Intro to R 20 / 44
Challenges 1 Opportunities 2 Challenges 3 The R Language Nall Intro to R 21 / 44
Challenges Frustrations you will have with R No graphical user interface (GUI) Lots of add-on packages (lightweight base software) Mistype or misspell commands and you re usually short of luck Missing data handling can be confusing Cryptic error messages Nall Intro to R 22 / 44
Challenges Frustrations you will have with R No graphical user interface (GUI) Lots of add-on packages (lightweight base software) Mistype or misspell commands and you re usually short of luck Missing data handling can be confusing Cryptic error messages Nall Intro to R 22 / 44
Challenges Frustrations you will have with R No graphical user interface (GUI) Lots of add-on packages (lightweight base software) Mistype or misspell commands and you re usually short of luck Missing data handling can be confusing Cryptic error messages Nall Intro to R 22 / 44
Challenges Frustrations you will have with R No graphical user interface (GUI) Lots of add-on packages (lightweight base software) Mistype or misspell commands and you re usually short of luck Missing data handling can be confusing Cryptic error messages Nall Intro to R 22 / 44
Challenges Frustrations you will have with R No graphical user interface (GUI) Lots of add-on packages (lightweight base software) Mistype or misspell commands and you re usually short of luck Missing data handling can be confusing Cryptic error messages Nall Intro to R 22 / 44
Challenges Intro to R Professor Clayton Nall h/t Thomas Leeper, Ph.D. June 26, 2014 Nall Intro to R 24 / 44
Challenges Nall Intro to R 25 / 44
Challenges 1 Opportunities 2 Challenges 3 The R Language Nall Intro to R 26 / 44
The R Language 1 Opportunities 2 Challenges 3 The R Language Nall Intro to R 27 / 44
The R Language Try on your own Use R as a calculator: Do the Basic Math tutorial Nall Intro to R 28 / 44
The R Language Tips to Avoid Frustration Always know where you are with getwd() R is case-sensitive: mean is not the same as Mean and cycle through your command history Page Up and Page Down control your screen > prompt means R is ready for a command + means your last command wasn t complete Nall Intro to R 29 / 44
The R Language Tips to Avoid Frustration Always know where you are with getwd() R is case-sensitive: mean is not the same as Mean and cycle through your command history Page Up and Page Down control your screen > prompt means R is ready for a command + means your last command wasn t complete Nall Intro to R 29 / 44
The R Language Tips to Avoid Frustration Always know where you are with getwd() R is case-sensitive: mean is not the same as Mean and cycle through your command history Page Up and Page Down control your screen > prompt means R is ready for a command + means your last command wasn t complete Nall Intro to R 29 / 44
The R Language Tips to Avoid Frustration Always know where you are with getwd() R is case-sensitive: mean is not the same as Mean and cycle through your command history Page Up and Page Down control your screen > prompt means R is ready for a command + means your last command wasn t complete Nall Intro to R 29 / 44
The R Language Tips to Avoid Frustration Always know where you are with getwd() R is case-sensitive: mean is not the same as Mean and cycle through your command history Page Up and Page Down control your screen > prompt means R is ready for a command + means your last command wasn t complete Nall Intro to R 29 / 44
The R Language Questions so far? Nall Intro to R 30 / 44
The R Language Questions so far? Nall Intro to R 31 / 44
R Language Basics The R Language Everything in R is an object All objects have a class We execute functions on objects Functions work differently (or not at all) on different classes Functions return objects and print to the console We can then do further things with those objects Nall Intro to R 32 / 44
The R Language Try on your own Create variables: Do the Variables tutorial Nall Intro to R 33 / 44
The R Language Questions so far? Nall Intro to R 34 / 44
R Language Basics The R Language Everything in R is an object All objects have a class We execute functions on objects Functions work differently (or not at all) on different classes Functions return objects and print to the console We can then do further things with those objects Nall Intro to R 35 / 44
R Language Basics The R Language Everything in R is an object All objects have a class We execute functions on objects Functions work differently (or not at all) on different classes Functions return objects and print to the console We can then do further things with those objects Nall Intro to R 35 / 44
The R Language Try on your own Understand R data structures: Vectors Vector indexing Matrices Lists Dataframes Nall Intro to R 36 / 44
The R Language Questions so far? Nall Intro to R 37 / 44
R Language Basics The R Language Everything in R is an object All objects have a class We execute functions on objects Functions work differently (or not at all) on different classes Functions return objects and print to the console We can then do further things with those objects Nall Intro to R 38 / 44
R Language Basics The R Language Everything in R is an object All objects have a class We execute functions on objects Functions work differently (or not at all) on different classes Functions return objects and print to the console We can then do further things with those objects Nall Intro to R 38 / 44
R Language Basics The R Language Everything in R is an object All objects have a class We execute functions on objects Functions work differently (or not at all) on different classes Functions return objects and print to the console We can then do further things with those objects Nall Intro to R 38 / 44
R Language Basics The R Language Everything in R is an object All objects have a class We execute functions on objects Functions work differently (or not at all) on different classes Functions return objects and print to the console We can then do further things with those objects Nall Intro to R 38 / 44
The R Language Try on your own Understand objects and printing: Do the Objects tutorial Nall Intro to R 39 / 44
The R Language When things go wrong... Nall Intro to R 40 / 44
The R Language When things go wrong... 1 Don t panic 2 Use?functionname to see the help file for a function 3 Parsing errors versus syntax errors: Error: unexpected ) in "lm(y ~)" Error in eval(expr, envir, enclos) : found object y not 4 Google the error or warning 5 Use StackOverflow Nall Intro to R 40 / 44
The R Language A script editor makes your life easier... Helps you understand code more easily Work interactively to build a script R script is like a Stata do file File extension:.r or.r Write down all of your code Use comments (#) liberally Nall Intro to R 41 / 44
... so let s install one The R Language install.packages("devtools") library("devtools") install_github("rite", "leeper") library("rite") rite() Nall Intro to R 42 / 44
... so let s install one The R Language install.packages("devtools") library("devtools") install_github("rite", "leeper") library("rite") rite() Nall Intro to R 42 / 44
... so let s install one The R Language install.packages("devtools") library("devtools") install_github("rite", "leeper") library("rite") rite() Nall Intro to R 42 / 44
... so let s install one The R Language install.packages("devtools") library("devtools") install_github("rite", "leeper") library("rite") rite() Nall Intro to R 42 / 44
... so let s install one The R Language install.packages("devtools") library("devtools") install_github("rite", "leeper") library("rite") rite() Nall Intro to R 42 / 44
... so let s install one The R Language install.packages("devtools") library("devtools") install_github("rite", "leeper") library("rite") rite() Nall Intro to R 42 / 44
The R Language Questions so far? Nall Intro to R 43 / 44