Posts Tagged ‘RDDs’

Spark: RDD vs DataFrame vs Dataset

In Apache Spark, RDD, DataFrame, and Dataset are three abstractions for working with distributed data. Here’s a brief definition of each:

RDD (Resilient Distributed Dataset): The RDD is the basic abstraction in Spark. It represents an immutable, distributed collection of objects that can be processed in parallel across a cluster. RDDs […]