class: title-slide, center, middle <span class="fa-stack fa-4x"> <i class="fa fa-circle fa-stack-2x" style="color: #ffffffcc;"></i> <strong class="fa-stack-1x" style="color:#e7553c;"></strong> </span> # Let's start at the very beginning ## A very good place to start ### Jen Richmond · UNSW Psychology/R-Ladies Sydney #### [https://tinyurl.com/aimos-workshop-slides](https://tinyurl.com/aimos-workshop-slides) --- layout: true <div class="my-footer"><span>https://tinyurl.com/aimos-workshop-slides</span></div> --- .pull-left[ # The Plan 1. Why R? 2. Navigating RStudio 3. Getting data in 4. Cleaning it up 5. Summary stats & ggplot <br> > You will need a laptop and access to wifi, but we will use RStudio Cloud, so don't worry about installation. ] .pull-right[ <img src="img/r_first_then.png" width="700px" /> Image credit: [Allison Horst](https://github.com/allisonhorst/stats-illustrations) @allison_horst from R-Ladies Santa Barbara ] ] --- class: center, middle # Disclaimer .pull-left[ #### Today we are really unlikely to know the answers to all your questions, but we can help you google! #### Google is every R user's best friend, even Hadley Wickham. <img src="img/google_hadley.png" width="300px" /> ] .pull-right[ <img src="img/google_ResearcHers.png" width="600px" /> ] --- class: center, middle # 1. Why R? Talk to your neighbour, why do you want to learn R?
02
:
00
--- # Why I R... .left-narrow[ ### [Reproducibility](https://www.nature.com/articles/s41559-017-0160) ### [One stop shop](https://twitter.com/kara_woo/status/1100908125396193281) ### [Equity](https://rfortherestofus.com/2019/07/equity/) ### [Community](https://qz.com/work/1661486/r-ladies-made-data-science-inclusive/) ### [Fun!](https://headrush.typepad.com/creating_passionate_users/2005/06/kicking_ass_is_.html) ] .right-wide[ <img src="img/stats.jpeg" width="800px" /> Image credit Darren Dahly @statsepi ] --- class: center # Warning: You will want to quit <img src="img/pit.jpg" width="600px" /> --- class: center <img src="img/mountain.png" width="900px" /> Image credit Dani Navarro [Workflow](https://slides.com/djnavarro/workflow#/) --- class: center <img src="img/climb.png" width="900px" /> Image credit Dani Navarro [Workflow](https://slides.com/djnavarro/workflow#/) --- class: center <img src="img/team.png" width="900px" /> Image credit Dani Navarro [Workflow](https://slides.com/djnavarro/workflow#/) --- class: center, middle # Navigating RStudio https://tinyurl.com/do-re-mi-aimos --- class: inverse # Your turn 1: Set up RStudio Cloud - Go to this link https://tinyurl.com/do-re-mi-aimos - Sign up for an RStudio Cloud account - Once at the Welcome screen, choose Projects, do-re-mi - Let it deploy (it will chug a little bit) <br> <img src="img/cloud.png" width="500px" />
03
:
00
--- class: center # Think of R-Studio as your kitchen <img src="img/studio.jpg" width="700px" /> Image credit to Jessica Ward from @RLadiesNCL --- class: inverse # Your turn 2: Let's start by redecorating .pull-left[ From the top menu choose - Tools - Global Options - Appearance <br> Then try out a few Editor themes and choose one you like. ] .pull-right[ <img src="img/redecorate.png" width="600px" />
04
:
00
] --- class: center # What questions do you have ? <img src="img/questions1.jpg" width="700px" /> --- # Packages A package is a group functions that some kind person has written, tested, bundled together and given away for everyone to use. <br> .pull-left[ To install packages, type into your console. Just do this once. <br> To use a package, load it using the `library()` function. You need to do this every time you want to use the functions in a package. ] .pull-right[ ``` install.packages("packagename") ``` <br> ``` library(packagename) ``` ] --- # What is the tidyverse? .pull-left[ The tidyverse is a mega-package (i.e. a package of other packages) designed to make data wrangling and visualisation easier. ```r library(tidyverse) ``` is the same as... ```r library(readr) library(dplyr) library(tidyr) library(ggplot2) library(purrr) library(tibble) library(stringr) library(forcats) ``` ] .pull-right[ <img src="img/tidy_workflow.png" width="700px" /> ] It's a good idea to put `library(tidyverse)` at the top of every analysis document. You will probably need it. --- class: inverse # Your turn 3: Install some packages Step 1. In your console, install... - janitor - here - beepr - skimr ```r install.packages("packagename") ``` Step 2. Then open the Rmd_demo_babynames.Rmd file and add the following to the "code chunk" under `library(tidyverse)`. ``` library(janitor) library(here) library(beepr) library(skimr) ``` Run the code in the chunk using the little green arrow on the right
04
:
00
--- class: center # What questions do you have ? <br> <br> <img src="img/questions2.jpg" width="700px" /> --- # An aside about RMarkdown .pull-left[ RMarkdown documents (or .Rmd files) are key to reproducible analysis. - notes/explanations + "chunks" of R code - "knit" the document (notes + code + output) into a nice format that other people (who may not know R) can access. ### Tricks and tips - Insert a chunk using shortcut Alt-Cmd-I - Hash to get levels of heading - Asterisks to get bold and italics ] .pull-right[ <img src="img/rmarkdown_wizards.png" width="700px" /> Image credit: [Allison Horst](https://github.com/allisonhorst/stats-illustrations) @allison_horst from R-Ladies Santa Barbara ] --- class: inverse # Your turn 4: "Knit" your RMarkdown document .pull-left[ Change a few things about the text in the demo.Rmd doc - Change the title at the top - Add a few headings using # - Make words bold by putting them in double asterik Then choose "Knit" and check out the output. ] .pull-right[ <iframe src="https://giphy.com/embed/26BRIVreINK0mQdW0" width="480" height="352" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/wind-hat-gnome-26BRIVreINK0mQdW0">via GIPHY</a></p> ] <br>
02
:
00
--- class: center # What questions do you have ? <img src="img/questions6.jpg" width="300px" /> --- class: center, middle # Getting data in --- # Where is the data? .pull-left[ The `here` package will help avoid file path drama. You can use it to tell R where you data is, relative to the top level of your project folder. <img src="img/here.png" width="400px" /> ] .pull-right[ We can use the `read_csv()` from `readr` and `here()` to read a .csv file. #### Template code ``` df <- read_csv(here("data", "file.csv")) ``` <br> > BTW you can read .xls, .sav, .sas, .stata, but you will need to install the `readxl` or `haven` packages. <br> Image credit: [Allison Horst](https://github.com/allisonhorst/stats-illustrations) @allison_horst from @RLadiesSB ] --- class: inverse # Your turn 5: Read in the babynames data - Step 1: Make a new chunk (Cmd-Option-I) - Step 2: Copy the template code ``` df <- read_csv(here("data", "file.csv")) ``` - Step 3: Adapt the template to read in the SAbabynames.csv data - Step 4: Play with the code in each chunk to plot... - How popular is your name? - How popular is your name relative to your neighbours? - Has the popularity of your name changed since you were born? **don't forget to change r eval = TRUE on each chunk** - Step 5: Knit your document
10
:
00
--- class: center, middle # What questions do you have ? <img src="img/questions3.png" width="300px" /> --- class: center, middle # Before we break... a key idea <img src="img/pipe.png" width="600px" /> Image credit Dani Navarro [Tidyverse for Beginners](https://slides.com/djnavarro/tidyverse-for-beginners#/) --- class: center, middle # Time for a break... ### Next up.... your own data
15
:
00
--- class: inverse # Your turn 6: Lets get YOUR data in <br> - Step 1: Upload your data into the data folder - Step 2: Make a new RMarkdown document (File-NewFile-RMarkdown), save as my_analysis.Rmd - Step 3: Make a new chunk (Cmd-Option-I) and load packages using library() - Step 4: Make another new chunk, copy the template code ``` df <- read_csv(here("data", "file.csv")) ``` - Step 5: Adapt the template to match your datafile - Step 6: Run the chunk to read your data in - Step 7: Have a look at your data, any obvious problems?
05
:
00
--- # YAY your data is in R! .pull-left[ ## But its probably "dirty". #### Maybe... #### ... there are no variable names #### ... the variable names are weird #### ... there are empty rows/columns #### ... you don't need all of those variables #### ... you regret your choice of variable names ] .pull-right[ <img src="img/dplyr_wrangling.png" width="600px" /> Image credit: [Allison Horst](https://github.com/allisonhorst/stats-illustrations) @allison_horst from R-Ladies Santa Barbara ] --- ## clean_names() and friends from `janitor` #### Problem 1: there are no variable names ``` betterdata <- yourdata %>% row_to_names() ``` #### Problem 2: the variable names are weird ``` betterdata <- yourdata %>% clean_names() ``` #### Problem 3: there are empty rows/columns ``` betterdata <- yourdata %>% remove_empty(which = c("rows", "columns")) ``` --- ## Problem 1, 2, AND 3: Use the pipe %>% <img src="img/janitor_clean_names.png" width="600px" /> ``` betterdata <- yourdata %>% row_to_names() %>% clean_names() %>% remove_empty(which = c("rows", "columns")) ``` --- ## select() and rename() from `dplyr` ### Problem 4: you don't need all of those variables ``` betterdata <- yourdata %>% select(col1, col2) betterdata <- yourdata %>% select(col1:col5) betterdata <- yourdata %>% select(-col6) ``` ### Problem 5: you regret your variable names ``` betterdata <- yourdata %>% rename(newname1 = oldname1, newname2 = oldname2) ``` --- class: middle, center # My turn: Lets clean up the favourite things data --- class: inverse # Your turn 7: Clean up your own data .pull-left[ No variable names ``` betterdata <- yourdata %>% row_to_names() ``` Weird variable names ``` betterdata <- yourdata %>% clean_names() ``` ] .pull-right[ Get rid of variables ``` betterdata <- yourdata %>% select(col1, col2) ``` Rename variables ``` betterdata <- yourdata %>% rename(newname1 = oldname1) ``` ] Empty rows/columns ``` betterdata <- yourdata %>% remove_empty(which = c("rows", "columns")) ```
15
:
00
--- class: center # What questions do you have ? <img src="img/questions4.jpg" width="300px" /> --- # Well done! Your data is clean-er, but is it "tidy"? ## Three rules for tidy data: - Each variable must have its own column. - Each observation must have its own row. - Each value must have its own cell. <img src="img/tidy-1.png" width="700px" /> Image credit: [R4DS](https://r4ds.had.co.nz/tidy-data.html) <br> <br> <br> Read more about tidy data [Wickham (2014)](https://www.jstatsoft.org/article/view/v059i10) --- # Wide to Long <img src="img/wide_long.png" width="1000px" /> --- # pivot_longer() from `tidyr` ### pivot_longer wants to know 3 things 1. names_to = what you want to call the new column containing the names of the columns 2. values_to = what you want to call the new column containing the values 3. the range of columns that you want to make long (col1:col6) <br> ``` longdata <- wide %>% pivot_longer(names_to = "new_names_col", values_to = "new_values_col", col1:col6) ``` --- class: middle, center # My turn: Make favourite things long --- class: inverse # Your turn 8: make your wide data long Take a look at your data and work out ... 1. which columns need to become long 2. what the values column should be called 3. what the "names" column should be called <br> ``` longdata <- wide %>% pivot_longer(names_to = "new_names_col", values_to = "new_values_col", col1:col6) ```
15
:
00
--- # Before we break... reminding you about a key idea ### The pipe %>% allows you to string together a series of functions and accomplish a LOT in just a few lines of code ``` ready_to_plot <- raw_data %>% select(Just, The, Columns, You, Want) %>% rename(just = Just, the = The, columns = Columns, you = You, want = Want) %>% pivot_longer(names_to = "condition", values = "response", columns:want) ``` --- class: center, middle # Time for a break... ### Next up.... summary stats & plotting
10
:
00
--- # Get a quick summary of... ## ... the structure of your data ``` head(yourdata) str(yourdata) length(yourdata) glimpse(yourdata) names(yourdata) ``` --- # Get some quick summary stats There are lots of ways to get summary stats in R. .pull-left[ #### base R ``` summary(yourdata) ``` #### `skimr` package ``` skim(yourdata) ``` ] .pull-right[ #### `dplyr` package ``` summary_stats <- yourdata %>% group_by(something_interesting) %>% summarise(mean = mean(resp), stdev = sd(resp), n = n(), se = stdev/sqrt(n)) ``` ] --- class: middle, center # My turn: summarise my favourite things --- class: inverse # Your turn 9: Get some summary stats from your data Try out summary() or skim(), or create a dataframe using this code template ``` summary_stats <- yourdata %>% group_by(something_interesting) %>% summarise(mean = mean(resp), stdev = sd(resp), n = n(), se = stdev/sqrt(n)) ``` > Heads Up: do you need na.rm = TRUE?
10
:
00
--- class: center # What questions do you have ? <img src="img/questions5.jpg" width="300px" /> --- class: center, middle # We made it! Lets make a ggplot! --- # The grammar of graphics ### ggplot wants to know 3 things 1. what data you want to plot 2. the "aesthetics" i.e. which variables you want on the x and y, what things look like (colour/shape/fill) 3. which "geom" you want to use (point, line, col, histogram, violin) ``` ggplot(data, aes(x = __, y = __)) + geom_point() ``` #### WATCH OUT! You can pipe data into ggplot (and combine with dplyr functions) but within ggplot you need to ADD LAYERS with + ``` data %>% filter(interesting_variable > z) %>% ggplot(aes(x = __, y = __)) + geom_point() ``` --- class: middle, center # My turn: plot favourite things --- class: inverse # Your turn 10: Plot your data .pull-left[ ``` ggplot(data, aes(x = __, y = __)) + geom_point() ``` ``` data %>% filter(interesting_variable > z) %>% ggplot(aes(x = __, y = __, colour = condition)) + geom_point() + facet_wrap(~ group) ggsave("your_first_ggplot.png") ``` ] .pull-right[ <img src="img/ggplot2_masterpiece.png" width="400px" /> Image credit: [Allison Horst](https://github.com/allisonhorst/stats-illustrations) @allison_horst from R-Ladies Santa Barbara ]
10
:
00
--- # Google: How to I add/get rid of ________ ggplot in R? .pull-left[ ### 1. error bars ### 2. axis labels ### 3. a title ### 4. legend ### 5. colour .... ] .pull-right[ <img src="img/google_hadley.png" width="400px" /> ] Or check out the [R Graph Gallery](https://www.r-graph-gallery.com/index.html) for inspiration and code --- # What next? .pull-left[ ### 1. Unconf session tomorrow!! Come and help us make a "walkthrough" for R beginners ### 2. Check out [About](https://jenrichmond.github.io/AIMOSworkshop/about.html) for links to learning resources ### 3. Find an [R-Ladies chapter](https://gqueiroz.shinyapps.io/rshinylady/) If there isn't an R-Ladies in your town, start one! Or join R-Ladies Remote community. ] .pull-right[ ### 4. Get on Twitter Follow #rstats, try #TidyTuesday, join #r4ds Slack ### 5. Organise a "ShutUpAndCode" group Find people in your school who also want to learn; make new friends and make learning fun. ] --- class: centre # The End <iframe src="https://giphy.com/embed/xIJLgO6rizUJi" width="480" height="367" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/alice-in-wonderland-thank-you-xIJLgO6rizUJi">via GIPHY</a></p> ---