sparklyr

sparklyr 1.0: Apache Arrow, XGBoost, Broom and TFRecords

2019-03-15 Javier Luraschi
With much excitement built over the past three years, we are thrilled to share that sparklyr 1.0 is now available on CRAN! The sparklyr package provides an R interface to Apache Spark. It supports dplyr, MLlib, streaming, extensions and many other features; however, this particular release enables the following new features: Arrow enables faster and larger data transfers between Spark and R. XGBoost enables training gradient boosting models over distributed datasets. Read more →

sparklyr 0.9: Streams and Kubernetes

2018-10-01 Javier Luraschi
Today we are excited to share that a new release of sparklyr is available on CRAN! This 0.9 release enables you to: Create Spark structured streams to process real-time data from many data sources using dplyr, SQL, pipelines, and arbitrary R code. Monitor connection progress with upcoming RStudio Preview 1.2 features and support for properly interrupting Spark jobs from R. Use Kubernetes clusters with sparklyr to simplify deployment and maintenance. Read more →

sparklyr 0.8: Production pipelines and graphs

2018-05-14 Kevin Kuo
We’re pleased to announce that sparklyr 0.8 is now available on CRAN! Sparklyr provides an R interface to Apache Spark. It supports dplyr syntax for working with Spark DataFrames and exposes the full range of machine learning algorithms available in Spark ML. You can also learn more about Apache Spark and sparklyr at spark.rstudio.com and the sparklyr webinar series. In this version, we added support for Spark 2.3, Livy 0. Read more →

sparklyr 0.7: Spark Pipelines and Machine Learning

2018-01-29 Kevin Kuo
We are excited to share that sparklyr 0.7 is now available on CRAN! Sparklyr provides an R interface to Apache Spark. It supports dplyr syntax for working with Spark DataFrames and exposes the full range of machine learning algorithms available in Spark. You can also learn more about Apache Spark and sparklyr at spark.rstudio.com and in our new webinar series on Apache Spark. Features in this release: Adds support for ML Pipelines, which provide a uniform set of high-level APIs to help create, tune, and deploy machine learning pipelines at scale. Read more →

sparklyr 0.6: Distributed R and external sources

2017-07-31 Javier Luraschi
We’re excited to announce a new release of the sparklyr package, available on CRAN today! sparklyr 0.6 introduces new features to: Distribute R computations using spark_apply() to execute arbitrary R code across your Spark cluster. You can now use all of your favorite R packages and functions in a distributed context. Connect to External Data Sources using spark_read_source(), spark_write_source(), spark_read_jdbc() and spark_write_jdbc(). Use the Latest Frameworks including dplyr 0. Read more →

See RStudio + sparklyr for big data at Strata + Hadoop World

2017-02-13 Roger Oberg
If big data is your thing, you use R, and you’re headed to Strata + Hadoop World in San Jose on March 13 and 14, you can experience in person how easy and practical it is to analyze big data with R and Spark. In a beginner-level talk by RStudio’s Edgar Ruiz and an intermediate-level workshop by Win-Vector’s John Mount, we cover the spectrum: what R is, what Spark is, how sparklyr works, and what is required to set up and tune a Spark cluster. Read more →

sparklyr 0.5: Livy and dplyr improvements

2017-01-24 Javier Luraschi
We’re happy to announce that version 0.5 of the sparklyr package is now available on CRAN. The new version comes with many improvements over the first release, including: Extended dplyr support by implementing do() and n_distinct(). New functions including sdf_quantile(), ft_tokenizer() and ft_regex_tokenizer(). Improved compatibility: sparklyr now respects the value of the ‘na.action’ R option and dim(), nrow() and ncol(). Experimental support for Livy to enable clients, including RStudio, to connect remotely to Apache Spark. Read more →