.. _sparklyr:

Sparklyr
========

Sparklyr is an R package that provides an interface from R to Apache Spark. It was created in 2016 by the RStudio team and fits into the tidyverse ecosystem, providing a complete dplyr backend for Spark. It makes Spark's APIs accessible from R, including Spark DataFrames and the MLlib machine learning library.

To use sparklyr on the platform you will need to load the sparklyr module (see :ref:`modules`)::

    module load sparklyr

This module includes an Anaconda installation of Python 2.7, R 3.1.5, sparklyr 1.0.5 and all its dependencies, so to use it you only need to start R and load the package::

    R

::

    library(sparklyr)

.. note:: You can check the list of preinstalled packages by typing ``installed.packages()`` in the R console.

After that you will need to connect to the Spark cluster. This is done using the ``spark_connect()`` function::

    sc <- spark_connect(master = "yarn-client", spark_home = Sys.getenv('SPARK_HOME'))

You can then use your Spark connection ``sc`` to access any Spark tool. Finally, execute::

    spark_disconnect(sc)

to disconnect from Spark.

You can also use Sparklyr in a Jupyter Notebook with an R kernel.

.. warning:: Remember to disconnect from Spark and properly shut down the notebook server before logging out.

Sparklyr is not limited to interactive use. You can also use ``spark-submit`` to launch a script as a job::

    spark-submit --class sparklyr.Shell '/opt/cesga/anaconda/Anaconda2-2018.12-sparklyr/lib/R/library/sparklyr/java/sparklyr-2.4-2.11.jar' 8880 1234 --batch example_sparklyr_script.R

For further information on Sparklyr you can check the getting started `Sparklyr Tutorial`_ and take a look at the `Sparklyr workshop`_. There is also the `official documentation`_ by the RStudio Team, including this handy `cheatsheet`_.

.. _Sparklyr Tutorial: https://bigdata.cesga.es/tutorials/sparklyr.html
.. _Sparklyr workshop: https://github.com/aurora-mareviv/sparklyr_test
.. _official documentation: https://spark.rstudio.com/
.. _cheatsheet: https://rstudio.com/resources/cheatsheets/#sparklyrcheatsheet
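
As an illustration of what an interactive session with the connection ``sc`` might look like, the following is a minimal sketch, not a tested recipe for this platform. It assumes that the dplyr package is available in the loaded module and uses R's built-in ``mtcars`` data set; the table name ``mtcars_tbl`` is an arbitrary choice made for this example::

    library(sparklyr)
    library(dplyr)   # assumption: dplyr is installed alongside sparklyr

    # Connect to the cluster with the same call shown in this section
    sc <- spark_connect(master = "yarn-client", spark_home = Sys.getenv('SPARK_HOME'))

    # Copy a small local data frame to Spark; "mtcars_tbl" is a hypothetical name
    cars_tbl <- copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE)

    # dplyr verbs are translated to Spark SQL and evaluated on the cluster;
    # collect() brings the (small) aggregated result back into R
    mpg_by_cyl <- cars_tbl %>%
      group_by(cyl) %>%
      summarise(mean_mpg = mean(mpg, na.rm = TRUE), n_cars = n()) %>%
      collect()

    print(mpg_by_cyl)

    # Always release the cluster resources when done
    spark_disconnect(sc)

Calling ``collect()`` only on the aggregated result keeps the heavy computation in Spark and transfers just a few rows to the R session.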