Short term forecast from the California COVID Assessment Tool (CalCAT)

“Things move along so rapidly nowadays that people saying: ‘It can’t be done,’ are always being interrupted by somebody doing it.” – Puck magazine, 1903.

When we at RStudio discuss serious data science, we often field questions about the suitability of R for large-scale production environments. Those questions typically coalesce around:

  1. Speed: Is R fast enough to run production workloads?
  2. Scalability: Can R be used for large-scale production?
  3. Infrastructure: What kind of R infrastructure do administrators need to run production applications?

Rather than debate these questions in theory, in this post we’ll turn to an organization that is not just talking about deploying Shiny dashboards in large-scale production, but is actually “doing it”.

Many definitions exist of what it means for an application to be in large-scale production. For the purposes of this article, we’ll define large-scale production as:

Applications serving thousands of users on a daily basis.

One application that fits this definition nicely is the California COVID Assessment Tool (CalCAT), which serves 32 million Californian citizens. CalCAT is a Shiny dashboard written in R by a group of data scientists within the California Department of Public Health (CDPH) and is hosted on an array of commercial RStudio Team servers.

RStudio recently talked with members of the team who deployed this dashboard to understand how this public, large-scale Shiny app came to be. The following sections present some of our takeaways from those discussions.

CDPH’s First Shiny Dashboard Tracked Opioid Use

CDPH's Opioid Overdose Surveillance application

The CalCAT dashboard project was born out of CDPH’s experience fielding a prior public-facing Shiny dashboard in 2016, namely the CDPH Opioid Overdose Surveillance application. That application evolved largely from:

COVID-19’s Arrival Made Sharing Data Mission Critical

When COVID-19 arrived in the United States in early 2020, many organizations, both inside and outside of the California Department of Public Health, suddenly found themselves wanting data to respond to the pandemic. That demand led to:

Once other departments gained access to the data, the app quickly became a vital source of COVID information throughout the state because it:

Responding to the Emergency: Creating A Public Dashboard for California Citizens

The CalCAT public dashboard

The extranet site helped CDPH and the county health officers understand both the depth and breadth of pandemic infections within California. However, on March 4, 2020, the following announcement spurred the department to build a public site.

“As part of the state’s response to address the global COVID-19 outbreak, Governor Gavin Newsom today declared a State of Emergency to make additional resources available, formalize emergency actions already underway across multiple state agencies and departments, and help the state prepare for broader spread of COVID-19. The proclamation comes as the number of positive California cases rises and following one official COVID-19 death.” – Gavin Newsom, Governor of California, March 4, 2020

In response to the Governor’s mandate, the team:

CDPH’s R Infrastructure Evolved to Support the Pandemic Efforts

As CalCAT gained popularity and the team gained experience, the infrastructure supporting the team evolved to meet the new demands by adding:

CalCAT’s Success Has Encouraged R Use Within CDPH

The project team noted how much the opioid dashboard changed CDPH’s thinking about how R could be used to deliver data to the public by:

Takeaways

The CalCAT experience shows that, despite claims to the contrary, R can be used for large-scale production applications. When we re-examine the three categories of concern about R with which we started the piece, we discover that:

By using a code-based approach, the California Department of Public Health has built a repository of human and intellectual capital around building public health dashboards. This small team’s work and open source code can now be passed on to others both within and outside of California government. Their efforts will likely spawn new projects that will better inform citizens and continue to help them stay safe throughout this unprecedented pandemic.

To Learn More

You can learn about each of RStudio’s commercial products by following the links below.