Spark

Registration open for rstudio::conf 2018!

2017-07-12 Roger Oberg
RStudio is very excited to announce that rstudio::conf 2018 is open for registration! rstudio::conf, the conference on all things R and RStudio, will take place February 2 and 3, 2018 in San Diego, California, preceded by Training Days on January 31 and February 1. This year’s conference will feature keynotes from Di Cook, Monash University Professor and Iowa State University Emeritus Faculty; and J.J. Allaire, RStudio Founder, CEO & Principal Developer, along with talks from Shiny creator Joe Cheng and (no-introduction-necessary) Hadley Wickham.

See RStudio + sparklyr for big data at Strata + Hadoop World

2017-02-13 Roger Oberg
If big data is your thing, you use R, and you’re headed to Strata + Hadoop World in San Jose on March 13 and 14, you can experience in person how easy and practical it is to analyze big data with R and Spark. In a beginner-level talk by RStudio’s Edgar Ruiz and an intermediate-level workshop by Win-Vector’s John Mount, we cover the spectrum: what R is, what Spark is, how sparklyr works, and what is required to set up and tune a Spark cluster.

sparklyr 0.5

2017-01-24 Javier Luraschi
We’re happy to announce that version 0.5 of the sparklyr package is now available on CRAN. The new version comes with many improvements over the first release, including: extended dplyr support through implementations of do() and n_distinct(); new functions, including sdf_quantile(), ft_tokenizer() and ft_regex_tokenizer(); improved compatibility, as sparklyr now respects the value of the ‘na.action’ R option and supports dim(), nrow() and ncol(); and experimental support for Livy, which enables clients, including RStudio, to connect remotely to Apache Spark.

Spark 1.4 for RStudio

2015-07-14 Garrett Grolemund
Today’s guest post is written by Vincent Warmerdam of GoDataDriven and is reposted with Vincent’s permission from blog.godatadriven.com. You can learn more about how to use SparkR with RStudio at the 2015 EARL Conference in Boston, November 2-4, where Vincent will be speaking live. This post is a tutorial on how to provision a Spark cluster for use with RStudio. You will need a machine that can run bash scripts and a functioning AWS account.

SparkR preview by Vincent Warmerdam

2015-05-28 Garrett Grolemund
This is a guest post by Vincent Warmerdam of koaning.io.

SparkR preview in RStudio

Apache Spark is the hip new technology on the block. It allows you to write scripts in a functional style, and the technology behind it lets you run iterative tasks very quickly on a cluster of machines. It has been benchmarked as faster than Hadoop for most machine learning use cases (by a factor of 10 to 100), and Spark will soon support the R language as well.