In the context of Apache Spark, RDD, DataFrame, and Dataset are different abstractions for working with structured and semi-structured data. Here’s a brief definition of each: RDD (Resilient Distributed Dataset): RDD is the basic abstraction in Spark. It represents an immutable, distributed collection of objects that can be processed in parallel across a cluster. RDDs […]
Posts Tagged ‘dataset’