We are excited to announce that the keras package is now available on CRAN. The package provides an R interface to Keras, a high-level neural networks API developed with a focus on enabling fast experimentation. Keras has the following key features:

If you are already familiar with Keras and want to jump right in, check out https://keras.rstudio.com which has everything you need to get started including over 20 complete examples to learn from.

To learn a bit more about Keras and why we’re so excited to announce the Keras interface for R, read on!

Keras and Deep Learning

Interest in deep learning has been accelerating rapidly over the past few years, and several deep learning frameworks have emerged over the same time frame. Of all the available frameworks, Keras has stood out for its productivity, flexibility and user-friendly API. At the same time, TensorFlow has emerged as a next-generation machine learning platform that is both extremely flexible and well-suited to production deployment.

Not surprisingly, Keras and TensorFlow have of late been pulling away from other deep learning frameworks:

The good news about Keras and TensorFlow is that you don’t need to choose between them! The default backend for Keras is TensorFlow and Keras can be integrated seamlessly with TensorFlow workflows. There is also a pure-TensorFlow implementation of Keras with deeper integration on the roadmap for later this year.

Keras and TensorFlow are the state of the art in deep learning tools and with the keras package you can now access both with a fluent R interface.

Getting Started

Installation

To begin, install the keras R package from CRAN as follows:

install.packages("keras")

The Keras R interface uses the TensorFlow backend engine by default. To install both the core Keras library as well as the TensorFlow backend use the install_keras() function:

library(keras)
install_keras()

This will provide you with default CPU-based installations of Keras and TensorFlow. If you want a more customized installation, e.g. if you want to take advantage of NVIDIA GPUs, see the documentation for install_keras().

MNIST Example

We can learn the basics of Keras by walking through a simple example: recognizing handwritten digits from the MNIST dataset. MNIST consists of 28 x 28 grayscale images of handwritten digits like these:

The dataset also includes labels for each image, telling us which digit it is. For example, the labels for the above images are 5, 0, 4, and 1.

Preparing the Data

The MNIST dataset is included with Keras and can be accessed using the dataset_mnist() function. Here we load the dataset then create variables for our test and training data:

library(keras)
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y

The x data is a 3-d array (images,width,height) of grayscale values. To prepare the data for training we convert the 3-d arrays into matrices by reshaping width and height into a single dimension (28x28 images are flattened into length 784 vectors). Then, we convert the grayscale values from integers ranging between 0 to 255 into floating point values ranging between 0 and 1:

# reshape
dim(x_train) <- c(nrow(x_train), 784)
dim(x_test) <- c(nrow(x_test), 784)
# rescale
x_train <- x_train / 255
x_test <- x_test / 255

The y data is an integer vector with values ranging from 0 to 9. To prepare this data for training we one-hot encode the vectors into binary class matrices using the Keras to_categorical() function:

y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)

Defining the Model

The core data structure of Keras is a model, a way to organize layers. The simplest type of model is the sequential model, a linear stack of layers.

We begin by creating a sequential model and then adding layers using the pipe (%>%) operator:

model <- keras_model_sequential() 
model %>% 
  layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>% 
  layer_dropout(rate = 0.4) %>% 
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = "softmax")

The input_shape argument to the first layer specifies the shape of the input data (a length 784 numeric vector representing a grayscale image). The final layer outputs a length 10 numeric vector (probabilities for each digit) using a softmax activation function.

Use the summary() function to print the details of the model:

summary(model)
Model
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
================================================================================
dense_1 (Dense)                     (None, 256)                     200960      
________________________________________________________________________________
dropout_1 (Dropout)                 (None, 256)                     0           
________________________________________________________________________________
dense_2 (Dense)                     (None, 128)                     32896       
________________________________________________________________________________
dropout_2 (Dropout)                 (None, 128)                     0           
________________________________________________________________________________
dense_3 (Dense)                     (None, 10)                      1290        
================================================================================
Total params: 235,146
Trainable params: 235,146
Non-trainable params: 0
________________________________________________________________________________

Next, compile the model with appropriate loss function, optimizer, and metrics:

model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(),
  metrics = c("accuracy")
)

Training and Evaluation

Use the fit() function to train the model for 30 epochs using batches of 128 images:

history <- model %>% fit(
  x_train, y_train, 
  epochs = 30, batch_size = 128, 
  validation_split = 0.2
)

The history object returned by fit() includes loss and accuracy metrics which we can plot:

plot(history)

Evaluate the model’s performance on the test data:

model %>% evaluate(x_test, y_test,verbose = 0)
$loss
[1] 0.1149

$acc
[1] 0.9807

Generate predictions on new data:

model %>% predict_classes(x_test)
  [1] 7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2
 [40] 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2
 [79] 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6 9
 [ reached getOption("max.print") -- omitted 9900 entries ]

Keras provides a vocabulary for building deep learning models that is simple, elegant, and intuitive. Building a question answering system, an image classification model, a neural Turing machine, or any other model is just as straightforward.

The Guide to the Sequential Model article describes the basics of Keras sequential models in more depth.

Examples

Over 20 complete examples are available (special thanks to @dfalbel for his work on these!). The examples cover image classification, text generation with stacked LSTMs, question-answering with memory networks, transfer learning, variational encoding, and more.

Example Description
addition_rnn Implementation of sequence to sequence learning for performing addition of two numbers (as strings).
babi_memnn Trains a memory network on the bAbI dataset for reading comprehension.
babi_rnn Trains a two-branch recurrent network on the bAbI dataset for reading comprehension.
cifar10_cnn Trains a simple deep CNN on the CIFAR10 small images dataset.
conv_lstm Demonstrates the use of a convolutional LSTM network.
deep_dream Deep Dreams in Keras.
imdb_bidirectional_lstm Trains a Bidirectional LSTM on the IMDB sentiment classification task.
imdb_cnn Demonstrates the use of Convolution1D for text classification.
imdb_cnn_lstm Trains a convolutional stack followed by a recurrent stack network on the IMDB sentiment classification task.
imdb_fasttext Trains a FastText model on the IMDB sentiment classification task.
imdb_lstm Trains a LSTM on the IMDB sentiment classification task.
lstm_text_generation Generates text from Nietzsche’s writings.
mnist_acgan Implementation of AC-GAN (Auxiliary Classifier GAN ) on the MNIST dataset
mnist_antirectifier Demonstrates how to write custom layers for Keras
mnist_cnn Trains a simple convnet on the MNIST dataset.
mnist_irnn Reproduction of the IRNN experiment with pixel-by-pixel sequential MNIST in “A Simple Way to Initialize Recurrent Networks of Rectified Linear Units” by Le et al.
mnist_mlp Trains a simple deep multi-layer perceptron on the MNIST dataset.
mnist_hierarchical_rnn Trains a Hierarchical RNN (HRNN) to classify MNIST digits.
mnist_transfer_cnn Transfer learning toy example.
neural_style_transfer Neural style transfer (generating an image with the same “content” as a base image, but with the “style” of a different picture).
reuters_mlp Trains and evaluates a simple MLP on the Reuters newswire topic classification task.
stateful_lstm Demonstrates how to use stateful RNNs to model long sequences efficiently.
variational_autoencoder Demonstrates how to build a variational autoencoder.
variational_autoencoder_deconv Demonstrates how to build a variational autoencoder with Keras using deconvolution layers.

Learning More

After you’ve become familiar with the basics, these articles are a good next step:

Keras provides a productive, highly flexible framework for developing deep learning models. We can’t wait to see what the R community will do with these tools!