Arrow

sparklyr 1.0: Apache Arrow, XGBoost, Broom and TFRecords

2019-03-15 Javier Luraschi
With much excitement built up over the past three years, we are thrilled to share that sparklyr 1.0 is now available on CRAN! The sparklyr package provides an R interface to Apache Spark. It supports dplyr, MLlib, streaming, extensions and many other features, and this release adds the following new features: Arrow enables faster and larger data transfers between Spark and R. XGBoost enables training gradient boosting models over distributed datasets.

Arrow and beyond: Collaborating on next generation tools for open source data science

2018-04-19 JJ Allaire
Two years ago, Wes McKinney and Hadley Wickham got together to discuss some of the systems challenges facing the Python and R communities. Data science teams inevitably work with multiple languages and systems, so it’s critical that data flow seamlessly and efficiently between these environments. Wes and Hadley wanted to explore opportunities to collaborate on tools for improving interoperability between Python, R, and external compute and storage systems. This discussion led to the creation of the feather file format, a very fast on-disk format for storing data frames that can be read from and written to by multiple languages.