Spark

sparklyr 0.9

2018-10-01 Javier Luraschi
Today we are excited to share that a new release of sparklyr is available on CRAN! This 0.9 release enables you to: create Spark structured streams to process real-time data from many data sources using dplyr, SQL, pipelines, and arbitrary R code; monitor connection progress with upcoming RStudio Preview 1.2 features and support for properly interrupting Spark jobs from R; and use Kubernetes clusters with sparklyr to simplify deployment and maintenance. Read more →

sparklyr 0.7

2018-01-29 Kevin Kuo
We are excited to share that sparklyr 0.7 is now available on CRAN! sparklyr provides an R interface to Apache Spark. It supports dplyr syntax for working with Spark DataFrames and exposes the full range of machine learning algorithms available in Spark. You can also learn more about Apache Spark and sparklyr at spark.rstudio.com and in our new webinar series on Apache Spark. Features in this release: Adds support for ML Pipelines, which provide a uniform set of high-level APIs to help create, tune, and deploy machine learning pipelines at scale. Read more →

sparklyr 0.6

2017-07-31 Javier Luraschi
We’re excited to announce a new release of the sparklyr package, available on CRAN today! sparklyr 0.6 introduces new features to: Distribute R computations using spark_apply() to execute arbitrary R code across your Spark cluster. You can now use all of your favorite R packages and functions in a distributed context. Connect to External Data Sources using spark_read_source(), spark_write_source(), spark_read_jdbc() and spark_write_jdbc(). Use the Latest Frameworks including dplyr 0.7, DBI 0. Read more →

Registration open for rstudio::conf 2018!

2017-07-12 Roger Oberg
RStudio is very excited to announce that rstudio::conf 2018 is open for registration! rstudio::conf, the conference on all things R and RStudio, will take place February 2 and 3, 2018 in San Diego, California, preceded by Training Days on January 31 and February 1. This year’s conference will feature keynotes from Di Cook, Monash University Professor and Iowa State University Emeritus Faculty; and J.J. Allaire, RStudio Founder, CEO & Principal Developer, along with talks from Shiny creator Joe Cheng and (no-introduction-necessary) Hadley Wickham. Read more →

See RStudio + sparklyr for big data at Strata + Hadoop World

2017-02-13 Roger Oberg
If big data is your thing, you use R, and you’re headed to Strata + Hadoop World in San Jose on March 13 and 14, you can experience in person how easy and practical it is to analyze big data with R and Spark. In a beginner-level talk by RStudio’s Edgar Ruiz and an intermediate-level workshop by Win-Vector’s John Mount, we cover the spectrum: what R is, what Spark is, how sparklyr works, and what is required to set up and tune a Spark cluster. Read more →

sparklyr 0.5

2017-01-24 Javier Luraschi
We’re happy to announce that version 0.5 of the sparklyr package is now available on CRAN. The new version comes with many improvements over the first release, including: Extended dplyr support by implementing do() and n_distinct(). New functions including sdf_quantile(), ft_tokenizer() and ft_regex_tokenizer(). Improved compatibility: sparklyr now respects the value of the ‘na.action’ R option and supports dim(), nrow() and ncol(). Experimental support for Livy to enable clients, including RStudio, to connect remotely to Apache Spark. Read more →

Spark 1.4 for RStudio

2015-07-14 Garrett Grolemund
Today’s guest post is written by Vincent Warmerdam of GoDataDriven and is reposted with Vincent’s permission from blog.godatadriven.com. You can learn more about how to use SparkR with RStudio at the 2015 EARL Conference in Boston, November 2-4, where Vincent will be speaking live. This document contains a tutorial on how to provision a Spark cluster with RStudio. You will need a machine that can run bash scripts and a functioning account on AWS. Read more →

SparkR preview by Vincent Warmerdam

2015-05-28 Garrett Grolemund
This is a guest post by Vincent Warmerdam of koaning.io. SparkR preview in RStudio: Apache Spark is the hip new technology on the block. It allows you to write scripts in a functional style, and the technology behind it will allow you to run iterative tasks very quickly on a cluster of machines. It’s benchmarked to be quicker than Hadoop for most machine learning use cases (by a factor of 10 to 100), and soon Spark will also have support for the R language. Read more →