cool stuff from tidytuesdayR

Published: May 16, 2025

Have you tried #TidyTuesday? It is a weekly data challenge: the team from the Data Science Learning Community curates a dataset and posts it to their GitHub repository, data nerds from all over the world have a go at making a cool visualisation with it, and everyone shares what they came up with on social media via the hashtag #tidytuesday.

Organising data initiatives within the #rstats community takes work; TidyTuesday doesn’t just happen. Ted Laderas has talked a lot about burnout among organisers. It is common in volunteer settings for 20% of the people to do 80% of the work, but this can lead to unsustainable communities. Ted points to the importance of expanding your core group of organisers and making it easy for people to contribute as ways to make a data initiative work better for everyone; many hands make light work.

Recently Jon Harmon and the #tidytuesday team have put this philosophy into action by creating some functions within the tidytuesdayR package that make it easy to curate and contribute a dataset to the challenge.

How to use tidytuesdayR as a participant

The package has a number of functions that make it super easy for you to get the data into RStudio. No need to download the csv and read it back in. Just use tt_load() with the date, or with the year and week.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# install.packages("tidytuesdayR")
library(tidytuesdayR)

tues_data <- tidytuesdayR::tt_load("2025-04-29") 
---- Compiling #TidyTuesday Information for 2025-04-29 ----
--- There is 1 file available ---


── Downloading files ───────────────────────────────────────────────────────────

  1 of 1: "user2025.csv"
# OR use year and week
# tues_data <- tidytuesdayR::tt_load(2025, week = 17) 

Your tues_data object will be a list, and it will sometimes contain more than one table, so using str() will give you an idea of which dataframe might be of interest.

str(tues_data)
List of 1
 $ user2025: spc_tbl_ [128 × 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
  ..$ id             : num [1:128] 170 79 30 31 39 169 94 163 13 51 ...
  ..$ session        : chr [1:128] "Virtual" "Virtual" "Virtual" "Virtual" ...
  ..$ date           : Date[1:128], format: "2025-08-01" "2025-08-01" ...
  ..$ time           : chr [1:128] "TBD" "TBD" "TBD" "TBD" ...
  ..$ room           : chr [1:128] "Online" "Online" "Online" "Online" ...
  ..$ title          : chr [1:128] "A Robust and Informative Application for viewing the dataframes in R" "A first look at Positron" "Analyzing Census Data in R: Techniques and Applications" "Automating workflows with webhooks and plumber in R" ...
  ..$ content        : chr [1:128] "In R programming, the View() function from the Utils package provides a basic interface for viewing the datafra"| __truncated__ "Positron is a next generation data science IDE built by the creators of RStudio. It has been available for beta"| __truncated__ "This talk provides an introduction to working with IPUMS Census American Community Survey (ACS) data in R, focu"| __truncated__ "Webhooks have brought to us new possibilities for automating workflows. With such, we can eliminate the need fo"| __truncated__ ...
  ..$ video_recording: chr [1:128] "✅" "✅" "✅" "✅" ...
  ..$ keywords       : chr [1:128] "statistical programming, clinical trials data, dataset interface, workflow" "ide, workflow, tooling" "demography, frameworks, census data, equity ml/ai, anti-discrimination in ml/ai" "automation, event-driven workflows, plumber api, github webhooks" ...
  ..$ speakers       : chr [1:128] "Madhan Kumar Nagaraji" "Julia Silge (Posit PBC)" "Joanne Rodrigues" "CLINTON DAVID" ...
  ..$ co_authors     : chr [1:128] NA NA NA NA ...
  ..- attr(*, "spec")=
  .. .. cols(
  .. ..   id = col_double(),
  .. ..   session = col_character(),
  .. ..   date = col_date(format = ""),
  .. ..   time = col_character(),
  .. ..   room = col_character(),
  .. ..   title = col_character(),
  .. ..   content = col_character(),
  .. ..   video_recording = col_character(),
  .. ..   keywords = col_character(),
  .. ..   speakers = col_character(),
  .. ..   co_authors = col_character()
  .. .. )
  ..- attr(*, "problems")=<externalptr> 
 - attr(*, ".tt")= 'tt' chr "user2025.csv"
  ..- attr(*, ".files")='data.frame':   1 obs. of  3 variables:
  .. ..$ data_files: chr "user2025.csv"
  .. ..$ data_type : chr "csv"
  .. ..$ delim     : chr ","
  ..- attr(*, ".readme")=List of 2
  .. ..$ node:<externalptr> 
  .. ..$ doc :<externalptr> 
  .. ..- attr(*, "class")= chr [1:2] "xml_document" "xml_node"
  ..- attr(*, ".date")= Date[1:1], format: "2025-04-29"
 - attr(*, "class")= chr "tt_data"
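
When the str() output is this long, a shorter summary of what is in the list can be handy. Here is a minimal sketch in base R. It uses a small stand-in list (tues_demo, with made-up contents) rather than the real download, on the assumption that the object from tt_load() behaves like an ordinary named list of dataframes.

```r
# Stand-in for a tt_load() result: a named list of dataframes.
# The contents here are made up purely for illustration.
tues_demo <- list(
  talks = data.frame(id = 1:3, session = c("Virtual", "Virtual", "Poster")),
  rooms = data.frame(room = c("Online", "Hall A"))
)

# Which tables are in the list?
table_names <- names(tues_demo)

# Rows and columns of each table, one row per table
table_dims <- t(sapply(tues_demo, dim))
colnames(table_dims) <- c("rows", "cols")

print(table_names)
print(table_dims)
```

If that assumption holds, swapping tues_demo for tues_data should give the same kind of summary for the real data.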

In this case the data is about the useR! 2025 conference schedule. We can pull out the dataframe from the list using list$dataframe to get started.

user25 <- tues_data$user2025

glimpse(user25)
Rows: 128
Columns: 11
$ id              <dbl> 170, 79, 30, 31, 39, 169, 94, 163, 13, 51, 144, 145, 1…
$ session         <chr> "Virtual", "Virtual", "Virtual", "Virtual", "Virtual",…
$ date            <date> 2025-08-01, 2025-08-01, 2025-08-01, 2025-08-01, 2025-…
$ time            <chr> "TBD", "TBD", "TBD", "TBD", "TBD", "TBD", "TBD", "TBD"…
$ room            <chr> "Online", "Online", "Online", "Online", "Online", "Onl…
$ title           <chr> "A Robust and Informative Application for viewing the …
$ content         <chr> "In R programming, the View() function from the Utils …
$ video_recording <chr> "✅", "✅", "✅", "✅", "✅", "✅", "✅", "✅", "✅", "✅", "✅",…
$ keywords        <chr> "statistical programming, clinical trials data, datase…
$ speakers        <chr> "Madhan Kumar Nagaraji", "Julia Silge (Posit PBC)", "J…
$ co_authors      <chr> NA, NA, NA, NA, "Abbie Brookes (Data Scientist @ Datac…
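
One possible first step with this data is to count how often each keyword appears. The sketch below uses base R on just three keyword strings copied from the str() output above, so it runs without downloading anything. The object names are my own, and on the full data you would pass user25$keywords instead.

```r
# Three keyword strings copied from the str() output above
keywords <- c(
  "statistical programming, clinical trials data, dataset interface, workflow",
  "ide, workflow, tooling",
  "automation, event-driven workflows, plumber api, github webhooks"
)

# Split on commas, trim whitespace, and flatten into one vector
keyword_vec <- trimws(unlist(strsplit(keywords, ",")))

# Count each keyword and sort from most to least common
keyword_counts <- sort(table(keyword_vec), decreasing = TRUE)

head(keyword_counts, 3)
```

Splitting on a plain comma and trimming afterwards keeps multi-word keywords such as "plumber api" intact.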

How to use tidytuesdayR as a contributor

Do you have an idea for a dataset that might be of interest to other tidytuesdayers? Great! The new functions in tidytuesdayR make it super easy to contribute a dataset.

All you have to do is follow these instructions.

Here are some notes I made for myself while I was curating the week 20 water quality dataset. Make sure you have tidytuesdayR installed and loaded, and a GitHub account sorted before you begin.

Step 1: write a cleaning script

This function opens a cleaning.R script that you can use to write the code you need to get your data file from its raw state into a state that other people can use.

tt_clean()
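
What goes into cleaning.R depends entirely on your data. As a rough illustration only, here is the kind of base R code such a script might contain. The raw dataframe, its column names and its values are all hypothetical, not taken from the water quality dataset.

```r
# Hypothetical raw data with untidy column names and text dates
raw <- data.frame(
  `Site Name` = c("Beach A", "Beach B"),
  `Sample Date` = c("01/05/2025", "02/05/2025"),
  `Bacteria (cfu/100ml)` = c("12", "<1"),
  check.names = FALSE
)

# Convert column names to snake_case
clean <- raw
names(clean) <- tolower(names(clean))
names(clean) <- gsub("[^a-z0-9]+", "_", names(clean))
names(clean) <- gsub("^_|_$", "", names(clean))

# Parse day/month/year text into Date objects
clean$sample_date <- as.Date(clean$sample_date, format = "%d/%m/%Y")

# Turn the count into a number; values like "<1" become NA here
clean$bacteria_cfu_100ml <- suppressWarnings(as.numeric(clean$bacteria_cfu_100ml))

str(clean)
```

Whether a value like "<1" should become NA, 0 or something else is a judgement call worth noting in your variable descriptions.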

Step 2: save your clean data to .csv

Once you have written your cleaning.R script and checked that it produces clean dataframes, you can save your datafile. This function will save your dataframes as .csv in your submission folder. It will also open a .md file with a table that you can complete that describes each of the variables in your dataset.

tt_save_dataset(nameofyourdf)

Step 3: introduce your data

This function opens another .md file that you can use to write an introduction to your dataset. You can describe the data, where it comes from and suggest some questions that people might like to explore. You also want to think about an image that can go along with the data.

tt_intro()

Step 4: metadata

This bit was cool: this function walks you through a question and answer session, filling in all the details needed for the metadata. Once you are done answering the questions, it creates a meta.yaml file.

tt_meta()

Step 5: submit

This bit creates a pull request (i.e. a request that the tidytuesday team pull your curated dataset into their repo). Make sure you have a GitHub account sorted before embarking on this last step.

tt_submit()

DONE!