Distributed Computing

sparklyr 0.7: Spark Pipelines and Machine Learning

2018-01-29 Kevin Kuo
We are excited to share that sparklyr 0.7 is now available on CRAN! sparklyr provides an R interface to Apache Spark. It supports dplyr syntax for working with Spark DataFrames and exposes the full range of machine learning algorithms available in Spark. You can also learn more about Apache Spark and sparklyr at spark.rstudio.com and in our new webinar series on Apache Spark. Features in this release: support for ML Pipelines, which provide a uniform set of high-level APIs to help create, tune, and deploy machine learning pipelines at scale. Read more →
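
For context, a minimal sketch of what an ML Pipeline looks like in sparklyr 0.7; the data, formula, and output path are illustrative rather than taken from the post:

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (a cluster URL would differ in practice)
sc <- spark_connect(master = "local")
mtcars_tbl <- sdf_copy_to(sc, mtcars, overwrite = TRUE)

# Build a pipeline: a feature transformer followed by an estimator
pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(am ~ mpg + cyl) %>%
  ml_logistic_regression()

# Fit the pipeline and persist the fitted model for later scoring or deployment
model <- ml_fit(pipeline, mtcars_tbl)
ml_save(model, "mtcars_pipeline_model", overwrite = TRUE)
```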

sparklyr 0.6: Distributed R and external sources

2017-07-31 Javier Luraschi
We’re excited to announce a new release of the sparklyr package, available on CRAN today! sparklyr 0.6 introduces new features to: Distribute R computations using spark_apply() to execute arbitrary R code across your Spark cluster. You can now use all of your favorite R packages and functions in a distributed context. Connect to external data sources using spark_read_source(), spark_write_source(), spark_read_jdbc(), and spark_write_jdbc(). Use the latest frameworks, including dplyr 0. Read more →
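
A rough sketch of the two headline features follows; the unit conversion and the database connection settings are placeholders, not details from the post:

```r
library(sparklyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- sdf_copy_to(sc, mtcars, overwrite = TRUE)

# spark_apply() runs an R function on each partition of a Spark DataFrame;
# the function receives an ordinary R data frame, and any packages it uses
# must be installed on the worker nodes
fuel_tbl <- spark_apply(mtcars_tbl, function(df) {
  df$kmpl <- df$mpg * 0.425144  # miles per gallon to km per litre
  df
})

# spark_read_jdbc() pulls a table from an external database; the connection
# details below are hypothetical, and the JDBC driver jar must be made
# available to Spark
flights_tbl <- spark_read_jdbc(
  sc,
  name = "flights",
  options = list(
    url      = "jdbc:postgresql://localhost:5432/mydb",
    dbtable  = "flights",
    user     = "user",
    password = "password"
  )
)
```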