Работа с кластером на языке R
reysand - 3 июля, 2025 - 13:12
В начале файла инициализируйте Spark-сессию:
library(SparkR) sparkR.session(appName = "SparkR-example")
В конце программы обязательно завершайте сессию:
sparkR.session.stop()
Пример программы на R (./example.R):
library(SparkR) # Initialize SparkSession sparkR.session(appName = "SparkR-DataFrame-example") # Create a simple local data.frame localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18)) # Convert local data frame to a SparkDataFrame df <- createDataFrame(localDF) # Print its schema printSchema(df) # root # |-- name: string (nullable = true) # |-- age: double (nullable = true) # Create a DataFrame from a JSON file path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json") peopleDF <- read.json(path) printSchema(peopleDF) # root # |-- age: long (nullable = true) # |-- name: string (nullable = true) # Register this DataFrame as a table. createOrReplaceTempView(peopleDF, "people") # SQL statements can be run by using the sql methods teenagers <- sql("SELECT name FROM people WHERE age >= 13 AND age <= 19") # Call collect to get a local data.frame teenagersLocalDF <- collect(teenagers) # Print the teenagers in our dataset print(teenagersLocalDF) # Stop the SparkSession now sparkR.session.stop()
Запуск программы на кластере:
spark-submit --jars /opt/spark-plugins/rapids-4-spark_2.12-25.06.0.jar example.R
- 24 просмотра