Purrr is a new package that fills in the missing pieces in R’s functional programming tools: it’s designed to make your pure functions purrr. Like many of my recent packages, it works with magrittr to allow you to express complex operations by combining simple pieces in a standard way.

Install it with:

install.packages("purrr")

Purrr wouldn’t be possible without Lionel Henry. He wrote a lot of the package and his insightful comments helped me rapidly iterate towards a stable, useful, and understandable package.

Map functions

The core of purrr is a set of functions for manipulating vectors (atomic vectors, lists, and data frames). The goal is similar to dplyr: help you tackle the most common 90% of data manipulation challenges. But where dplyr focusses on data frames, purrr focusses on vectors. For example, the following code splits the built-in mtcars dataset up by number of cylinders (using the base split() function), fits a linear model to each piece, summarises each model, then extracts the R²:

mtcars %>%
  split(.$cyl) %>%
  map(~lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")
#>     4     6     8
#> 0.509 0.465 0.423

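The first argument to all map functions is the vector to operate on. The second argument, .f, specifies what to do with each piece. As the example above shows, it can be a function (like summary), a formula (like ~ lm(mpg ~ wt, data = .)), or a string (like "r.squared") that extracts the named component from each piece.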
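To make the shorthand concrete, here is a minimal sketch comparing the formula and string forms of .f with explicit anonymous functions. The list `people` and its fields are made up for illustration, and the sketch assumes that `.` inside a formula refers to the current element, as in the example above.

```r
library(purrr)

# Hypothetical data, made up for this sketch
people <- list(
  list(name = "Ann", age = 31),
  list(name = "Bob", age = 27)
)

# A formula: `.` stands for the current element
ages_formula <- map_dbl(people, ~ .$age + 1)
# The same computation with an explicit anonymous function
ages_function <- map_dbl(people, function(p) p$age + 1)

# A string extracts the component with that name from each element
names_string <- map_chr(people, "name")
names_function <- map_chr(people, function(p) p[["name"]])

# map() returns a list; map_dbl() and map_chr() return atomic vectors
ages_list <- map(people, "age")
```

The formula and string forms should give the same results as their longhand counterparts; they just save typing for the most common cases.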

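Map functions come in a few different variations based on their inputs and output. For example, map_if() applies .f only to the elements that satisfy a predicate; the following snippet converts factors into characters: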

iris %>% map_if(is.factor, as.character) %>% str()
#> 'data.frame':    150 obs. of  5 variables:
#>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...

map_at() works similarly to map_if(), but instead of a logical vector or predicate function, it takes an integer vector of element positions.
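As a small sketch of selecting by position, the following applies a function only to the first and third elements of a list. The list `x` is made up for illustration, and I use a plain list rather than a data frame to keep the result easy to inspect.

```r
library(purrr)

# Hypothetical input: a named list with mixed types
x <- list(a = 1, b = "two", c = 3)

# Double only the elements at positions 1 and 3;
# the element at position 2 is returned unchanged
res <- map_at(x, c(1L, 3L), ~ . * 2)
str(res)
```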

map2() iterates over two vectors in parallel, calling .f with one element from each:

map2(1:3, 2:4, c)
#> [[1]]
#> [1] 1 2
#>
#> [[2]]
#> [1] 2 3
#>
#> [[3]]
#> [1] 3 4
map2(1:3, 2:4, ~ .x * (.y - 1))
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 4
#>
#> [[3]]
#> [1] 9

map3() does the same thing for three lists, and map_n() does it in general.
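For a rough picture of what iterating over three inputs in parallel looks like, here is a sketch that uses base R's Map() rather than map3(), so it does not depend on which purrr version is installed. The inputs are made up for illustration.

```r
# Base R only: Map() walks several vectors in parallel and returns a list
sums <- Map(function(x, y, z) x + y + z, 1:3, 4:6, 7:9)
str(sums)
```

Each call receives the i-th element of every input, which is the same idea that map2() applies to a pair of lists.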

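invoke_dbl() calls each function in a list with the supplied arguments and returns the results as a double vector:

list(m1 = mean, m2 = median) %>%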
  invoke_dbl(rcauchy(100))
#>    m1    m2
#> 9.765 0.117

Purrr and dplyr

I’m becoming increasingly enamoured with list-columns in data frames. The following example combines purrr and dplyr to generate 100 random test-training splits in order to compute an unbiased estimate of prediction quality. These tools are still experimental (and currently need quite a bit of extra scaffolding), but I think the basic approach is really appealing.

library(dplyr)
random_group <- function(n, probs) {
  probs <- probs / sum(probs)
  g <- findInterval(seq(0, 1, length.out = n), c(0, cumsum(probs)),
    rightmost.closed = TRUE)
  names(probs)[sample(g)]
}
partition <- function(df, n, probs) {
  n %>%
    replicate(split(df, random_group(nrow(df), probs)), FALSE) %>%
    zip_n() %>%
    as_data_frame()
}

msd <- function(x, y) sqrt(mean((x - y) ^ 2))

# Generate 100 random test-training splits
cv <- mtcars %>%
  partition(100, c(training = 0.8, test = 0.2)) %>%
  mutate(
    # Fit the model
    model = map(training, ~ lm(mpg ~ wt, data = .)),
    # Make predictions on test data
    pred = map2(model, test, predict),
    # Calculate root mean squared difference
    diff = map2(pred, test %>% map("mpg"), msd) %>% flatten()
  )
cv
#> Source: local data frame [100 x 5]
#>
#>                   test             training   model     pred  diff
#>                 (list)               (list)  (list)   (list) (dbl)
#> 1  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.70
#> 2  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  2.03
#> 3  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  2.29
#> 4  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  4.88
#> 5  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.20
#> 6  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  4.68
#> 7  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.39
#> 8  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.82
#> 9  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  2.56
#> 10 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.40
#> ..                 ...                  ...     ...      ...   ...
mean(cv$diff)
#> [1] 3.22

Other functions

There are too many other pieces of purrr to describe in detail here. A few of the most useful functions are noted below:

x <- list(list(a = 1, b = 2), list(a = 2, b = 1))
x %>% str()
#> List of 2
#>  $ :List of 2
#>   ..$ a: num 1
#>   ..$ b: num 2
#>  $ :List of 2
#>   ..$ a: num 2
#>   ..$ b: num 1

x %>%
  zip_n() %>%
  str()
#> List of 2
#>  $ a:List of 2
#>   ..$ : num 1
#>   ..$ : num 2
#>  $ b:List of 2
#>   ..$ : num 2
#>   ..$ : num 1

x %>%
  zip_n(.simplify = TRUE) %>%
  str()
#> List of 2
#>  $ a: num [1:2] 1 2
#>  $ b: num [1:2] 2 1

1:10 %>% keep(~. %% 2 == 0)
#> [1]  2  4  6  8 10
1:10 %>% discard(~. %% 2 == 0)
#> [1] 1 3 5 7 9

list(list(x = TRUE, y = 10), list(x = FALSE, y = 20)) %>%
  keep("x") %>%
  str()
#> List of 1
#>  $ :List of 2
#>   ..$ x: logi TRUE
#>   ..$ y: num 10

list(NULL, 1:3, NULL, 7) %>%
  compact() %>%
  str()
#> List of 2
#>  $ : int [1:3] 1 2 3
#>  $ : num 7

Design philosophy

The goal of purrr is not to try and turn R into Haskell: it does not implement currying, or destructuring binds, or pattern matching. The goal is to give you similar expressiveness to a classical FP language, while allowing you to write code that looks and feels like R.