rowwise %>% mean

data wrangling
Published May 24, 2023

When you have data from a survey, the responses for each item are usually stored in separate columns (variables). To score a scale, you generally have to average across the items to get one mean value per participant. But most R functions operate on whole columns, so calculations across rows can be awkward.


load packages + make some data

library(tidyverse)


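# six participants, five items on a 1-7 scale
# sample() is random, so your values will differ unless you set a seed
pID <- c("p1", "p2", "p3", "p4", "p5", "p6")
item1 <- sample(1:7, 6, replace = TRUE)
item2 <- sample(1:7, 6, replace = TRUE)
item3 <- sample(1:7, 6, replace = TRUE)
item4 <- sample(1:7, 6, replace = TRUE)
item5 <- sample(1:7, 6, replace = TRUE)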

survey <- data.frame(pID, item1, item2, item3, item4, item5)

glimpse(survey)
Rows: 6
Columns: 6
$ pID   <chr> "p1", "p2", "p3", "p4", "p5", "p6"
$ item1 <int> 6, 5, 7, 6, 7, 3
$ item2 <int> 5, 7, 3, 2, 5, 3
$ item3 <int> 6, 1, 4, 4, 2, 5
$ item4 <int> 6, 2, 7, 7, 5, 6
$ item5 <int> 4, 2, 4, 5, 5, 2

base R rowMeans

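The rowMeans() function works, but the call is hard to read. Why the x, and what do the dots mean? Here x is the name of rowMeans()'s data argument, and the dot (.) is the pipe's placeholder for the data frame being piped in, which select() then narrows to the item columns.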

survey_means_base <- survey %>%
  mutate(item_mean = rowMeans(x = select(.data = ., starts_with(match = "item"))))
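
If the placeholders are the sticking point, the same calculation can be written without them. The following is a minimal sketch, not the only way to do it. It rebuilds the example data by hand from the glimpse() output above so that it runs on its own, it assumes dplyr 1.1.0 or later for pick(), and the names item_cols, base_means and item_mean_pick are made up for illustration.

```r
library(dplyr)

# Rebuild the example data from the glimpse() output shown earlier
survey <- data.frame(
  pID   = c("p1", "p2", "p3", "p4", "p5", "p6"),
  item1 = c(6, 5, 7, 6, 7, 3),
  item2 = c(5, 7, 3, 2, 5, 3),
  item3 = c(6, 1, 4, 4, 2, 5),
  item4 = c(6, 2, 7, 7, 5, 6),
  item5 = c(4, 2, 4, 5, 5, 2)
)

# Plain base R: find the item columns by name, then average each row
item_cols <- grep("^item", names(survey), value = TRUE)
base_means <- rowMeans(survey[, item_cols])

# Inside mutate(), pick() returns the selected columns as a data frame,
# so rowMeans() can be used without the dot placeholder
# (assumes dplyr >= 1.1.0, where pick() was introduced)
survey_means_pick <- survey %>%
  mutate(item_mean_pick = rowMeans(pick(starts_with("item"))))

survey_means_pick$item_mean_pick
```

Both versions should give the same per-participant means as the piped rowMeans() call above.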

tidyverse rowwise

The tidyverse version involves using rowwise() to tell R that you would like a mean calculated for each row in the dataset. Use c() to tell R which columns to average across.

Without rowwise(), c() concatenates the five columns in full, so mean() returns a single grand mean of every value in those columns. mutate() then recycles that one number, and you end up with the same value in every row.

survey_means_norowwise <- survey %>%
  mutate(item_mean = mean(c(item1, item2, item3, item4, item5))) 

glimpse(survey_means_norowwise)
Rows: 6
Columns: 7
$ pID       <chr> "p1", "p2", "p3", "p4", "p5", "p6"
$ item1     <int> 6, 5, 7, 6, 7, 3
$ item2     <int> 5, 7, 3, 2, 5, 3
$ item3     <int> 6, 1, 4, 4, 2, 5
$ item4     <int> 6, 2, 7, 7, 5, 6
$ item5     <int> 4, 2, 4, 5, 5, 2
$ item_mean <dbl> 4.533333, 4.533333, 4.533333, 4.533333, 4.533333, 4.533333
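
To see where the repeated value comes from, it can help to look at what c() builds when the columns are passed in whole. This is a small sketch in base R, using the values printed above (rebuilt by hand so the block runs on its own). The name all_values is only illustrative.

```r
# Rebuild the example data from the glimpse() output shown earlier
survey <- data.frame(
  pID   = c("p1", "p2", "p3", "p4", "p5", "p6"),
  item1 = c(6, 5, 7, 6, 7, 3),
  item2 = c(5, 7, 3, 2, 5, 3),
  item3 = c(6, 1, 4, 4, 2, 5),
  item4 = c(6, 2, 7, 7, 5, 6),
  item5 = c(4, 2, 4, 5, 5, 2)
)

# Without rowwise(), each item name refers to the entire column,
# so c() glues all five columns into one long vector
all_values <- with(survey, c(item1, item2, item3, item4, item5))

length(all_values)  # 30: six participants x five items
mean(all_values)    # the grand mean, 136 / 30, roughly 4.533333
```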

With rowwise(), mutate() calculates the mean across each row, separately for each participant.

Note

It is important to get into the habit of adding ungroup() after rowwise(), just as you would after group_by(). The data frame stays grouped by row until you ungroup it, which can mess with calculations further down the pipeline.

survey_means_rowwise <- survey %>%
  rowwise() %>%
  mutate(item_mean = mean(c(item1, item2, item3, item4, item5))) %>%
  ungroup()

glimpse(survey_means_rowwise)
Rows: 6
Columns: 7
$ pID       <chr> "p1", "p2", "p3", "p4", "p5", "p6"
$ item1     <int> 6, 5, 7, 6, 7, 3
$ item2     <int> 5, 7, 3, 2, 5, 3
$ item3     <int> 6, 1, 4, 4, 2, 5
$ item4     <int> 6, 2, 7, 7, 5, 6
$ item5     <int> 4, 2, 4, 5, 5, 2
$ item_mean <dbl> 5.4, 3.4, 5.0, 4.8, 4.8, 3.8
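
The note about ungroup() is easier to appreciate with a concrete case. The sketch below, again using hand-rebuilt data so it runs on its own, asks for an overall scale mean with and without ungroup(). The object names still_rowwise, per_row and overall are hypothetical.

```r
library(dplyr)

# Rebuild the example data from the glimpse() output shown earlier
survey <- data.frame(
  pID   = c("p1", "p2", "p3", "p4", "p5", "p6"),
  item1 = c(6, 5, 7, 6, 7, 3),
  item2 = c(5, 7, 3, 2, 5, 3),
  item3 = c(6, 1, 4, 4, 2, 5),
  item4 = c(6, 2, 7, 7, 5, 6),
  item5 = c(4, 2, 4, 5, 5, 2)
)

# Same pipeline as above, but ungroup() is left off on purpose
still_rowwise <- survey %>%
  rowwise() %>%
  mutate(item_mean = mean(c(item1, item2, item3, item4, item5)))

# Still grouped by row: summarise() works one row at a time,
# so this returns six values instead of one overall mean
per_row <- still_rowwise %>%
  summarise(scale_mean = mean(item_mean))

# After ungroup(), summarise() collapses to a single overall mean
overall <- still_rowwise %>%
  ungroup() %>%
  summarise(scale_mean = mean(item_mean))

nrow(per_row)  # 6
nrow(overall)  # 1
```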

If there are a lot of columns to average across, you can avoid typing all of the names using c_across().

survey_means_rowwise_across <- survey %>%
  rowwise() %>%
  mutate(item_mean = mean(c_across(item1:item5))) %>%
  ungroup()

glimpse(survey_means_rowwise_across)
Rows: 6
Columns: 7
$ pID       <chr> "p1", "p2", "p3", "p4", "p5", "p6"
$ item1     <int> 6, 5, 7, 6, 7, 3
$ item2     <int> 5, 7, 3, 2, 5, 3
$ item3     <int> 6, 1, 4, 4, 2, 5
$ item4     <int> 6, 2, 7, 7, 5, 6
$ item5     <int> 4, 2, 4, 5, 5, 2
$ item_mean <dbl> 5.4, 3.4, 5.0, 4.8, 4.8, 3.8
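
The item1:item5 range relies on the item columns sitting next to each other. As a hedged variation, c_across() also accepts tidyselect helpers such as starts_with(), which selects by name instead of position. The sketch below rebuilds the example data by hand so it runs on its own. The object name survey_means_helper is made up for illustration.

```r
library(dplyr)

# Rebuild the example data from the glimpse() output shown earlier
survey <- data.frame(
  pID   = c("p1", "p2", "p3", "p4", "p5", "p6"),
  item1 = c(6, 5, 7, 6, 7, 3),
  item2 = c(5, 7, 3, 2, 5, 3),
  item3 = c(6, 1, 4, 4, 2, 5),
  item4 = c(6, 2, 7, 7, 5, 6),
  item5 = c(4, 2, 4, 5, 5, 2)
)

# Select every column whose name starts with "item", row by row
survey_means_helper <- survey %>%
  rowwise() %>%
  mutate(item_mean = mean(c_across(starts_with("item")))) %>%
  ungroup()

survey_means_helper$item_mean
```

One thing to watch with this approach: if the data already contains another column whose name starts with "item", such as an earlier item_mean, it will be swept into the average too.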