In the realm of databases, SQL (Structured Query Language) serves as the lingua franca, enabling users to interact with data stored in various systems effectively. While SQL encompasses a wide array of commands, understanding the distinctions between Data Manipulation Language (DML), Data Definition Language (DDL), and Data Control Language (DCL) is fundamental for wielding this […]
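To make the DDL/DML split concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table, its columns and the sample rows are made up for illustration. DCL is only shown in comments, because SQLite has no user accounts and does not implement `GRANT`/`REVOKE`.

```python
import sqlite3


def run_demo() -> list[tuple[str, int]]:
    # In-memory database: nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # DDL: define (and later change) the *structure* that holds the data.
    # The "employees" table is a hypothetical example schema.
    cur.execute("CREATE TABLE employees (name TEXT PRIMARY KEY, salary INTEGER NOT NULL)")
    cur.execute("ALTER TABLE employees ADD COLUMN dept TEXT")

    # DML: manipulate the *rows* stored inside that structure.
    cur.executemany(
        "INSERT INTO employees (name, salary, dept) VALUES (?, ?, ?)",
        [("Asha", 50000, "Data"), ("Ben", 42000, "BI"), ("Chen", 61000, "Data")],
    )
    cur.execute("UPDATE employees SET salary = salary + 1000 WHERE dept = ?", ("Data",))
    cur.execute("DELETE FROM employees WHERE name = ?", ("Ben",))
    conn.commit()

    # SELECT reads the result (some texts class it separately as DQL).
    rows = cur.execute("SELECT name, salary FROM employees ORDER BY name").fetchall()
    conn.close()
    return rows


# DCL would control *who* may do the above. It needs a server database with
# users/roles, e.g. in PostgreSQL (the "analyst" role is hypothetical):
#   GRANT SELECT ON employees TO analyst;
#   REVOKE SELECT ON employees FROM analyst;

if __name__ == "__main__":
    print(run_demo())  # [('Asha', 51000), ('Chen', 62000)]
```

The DDL statements change the table itself, while the DML statements only change or read its contents. That is why a schema migration and a data fix are usually reviewed and permissioned differently.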
Saranya Sridhar
Saranya is a Technical Consultant at Perficient, exploring the stages of Big Data. Her experience includes Spark, Scala, SQL, Databricks, Java, and BI tools such as Tableau, Spotfire, and Power BI. She passionately delves into emerging technologies.

Blogs from this Author
Spark: RDD vs DataFrame vs Dataset
In Apache Spark, RDD, DataFrame, and Dataset are different abstractions for working with distributed data, ranging from unstructured (RDD) to structured and semi-structured (DataFrame, Dataset). Here's a brief definition of each: RDD (Resilient Distributed Dataset): RDD is Spark's fundamental, low-level abstraction. It represents an immutable, distributed collection of objects that can be processed in parallel across a cluster. RDDs […]
Scala: mutable data structure
Scala, a programming language that combines object-oriented and functional programming paradigms, provides a variety of mutable data structures. Mutable collections such as ArrayBuffer and HashMap support in-place modification, which makes them well suited to performance-sensitive code that updates data frequently. They offer a conventional alternative to Scala's default immutable collections, acting as mutable counterparts to them. All the mutable Scala collections […]
Scala: Immutable data structure
Scala, a programming language that combines object-oriented and functional programming paradigms, provides a variety of immutable data structures. Immutable data structures cannot be modified after they are created, which helps keep concurrent and parallel programs safe and simple. Here are some commonly used immutable data structures in Scala: […]
Spark: DataFrame joins
In Apache Spark, DataFrame joins are operations that allow you to combine two DataFrames based on a common column or set of columns. Join operations are fundamental for data analysis and manipulation, particularly when dealing with distributed and large-scale datasets. Spark provides a rich set of APIs for performing various types of DataFrame joins. Import […]
Spark Partition: An Overview
In Apache Spark, efficient data management is essential for maximizing performance in distributed computing. Partitioning, repartitioning, and coalescing determine how data is organized and distributed across the cluster. Partitioning divides a dataset into smaller chunks, enabling parallel processing and optimizing operations. Repartitioning allows for the redistribution of data across partitions, adjusting the balance for more […]
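The three operations named in the partitioning excerpt can be pictured with a deliberately simplified toy model. Plain Python lists stand in for partitions, and the functions `repartition`, `partition_by_key` and `coalesce` below are hypothetical helpers, not Spark's API. Spark exposes the real operations as `DataFrame.repartition` and `DataFrame.coalesce`, and uses its own hash function and merge strategy.

```python
import zlib
from typing import Any, Callable


def repartition(partitions: list[list], n: int) -> list[list]:
    """Toy full shuffle: deal every record round-robin into n new partitions."""
    out: list[list] = [[] for _ in range(n)]
    flat = [record for part in partitions for record in part]
    for i, record in enumerate(flat):
        out[i % n].append(record)  # any record may move to any partition
    return out


def partition_by_key(records: list, n: int, key: Callable[[Any], Any]) -> list[list]:
    """Toy hash partitioning: records with equal keys always share a partition."""
    out: list[list] = [[] for _ in range(n)]
    for record in records:
        # crc32 is a stand-in hash that is stable across runs
        # (Python's built-in str hash is randomized per process).
        bucket = zlib.crc32(str(key(record)).encode()) % n
        out[bucket].append(record)
    return out


def coalesce(partitions: list[list], n: int) -> list[list]:
    """Toy coalesce: merge neighbouring partitions into n, never splitting one."""
    if n >= len(partitions):
        # Like Spark's coalesce without a shuffle, it cannot increase the count.
        return [list(p) for p in partitions]
    out: list[list] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Contiguous runs of input partitions are concatenated into one output.
        out[i * n // len(partitions)].extend(part)
    return out


if __name__ == "__main__":
    skewed = [[1, 2, 3, 4, 5, 6, 7, 8], [9], [10], [11, 12]]
    print(repartition(skewed, 2))  # [[1, 3, 5, 7, 9, 11], [2, 4, 6, 8, 10, 12]]
    print(coalesce(skewed, 2))     # [[1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12]]
    print(partition_by_key([("a", 1), ("b", 2), ("a", 3)], 3, key=lambda r: r[0]))
```

The model shows the trade-off the excerpt alludes to. `repartition` evens out the skewed input but touches every record, whereas `coalesce` only glues whole partitions together. That is cheaper, but it can leave the result unbalanced.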