Saranya Sridhar

Saranya is an Associate Technical Consultant at Perficient, exploring the stages of Big Data processing. Her experience includes Spark, Scala, SQL, Databricks, Java, and BI tools such as Tableau, Spotfire, and Power BI. She is passionate about delving into emerging technologies.

Blogs from this Author

SQL: DML, DDL and DCL

In the realm of databases, SQL (Structured Query Language) serves as the lingua franca, enabling users to interact with data stored in various systems effectively. While SQL encompasses a wide array of commands, understanding the distinctions between Data Manipulation Language (DML), Data Definition Language (DDL), and Data Control Language (DCL) is fundamental for wielding this […]

Spark: RDD vs DataFrame vs Dataset

In the context of Apache Spark, RDD, DataFrame, and Dataset are different abstractions for working with distributed data, from unstructured to structured. Here’s a brief definition of each: RDD (Resilient Distributed Dataset): the RDD is Spark’s fundamental, lowest-level abstraction. It represents an immutable, distributed collection of objects that can be processed in parallel across a cluster. RDDs […]

Scala: mutable data structure

Scala, a programming language that combines object-oriented and functional programming paradigms, provides a variety of mutable data structures. Mutable collections such as ArrayBuffer and HashMap support in-place modification, which makes them well suited to performance-sensitive code that updates data frequently. They are the mutable counterparts of Scala's immutable collections. All the mutable Scala collections […]
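
A minimal sketch of in-place updates with `ArrayBuffer` and `HashMap`, assuming Scala 2.13 or Scala 3 run as a script (for example with scala-cli). The numbers and the word list are hypothetical and chosen only for illustration.

```scala
import scala.collection.mutable

// ArrayBuffer: a resizable, indexed sequence that is modified in place.
val buffer = mutable.ArrayBuffer(1, 2, 3)
buffer += 4    // append in place      -> 1, 2, 3, 4
buffer(0) = 10 // overwrite in place   -> 10, 2, 3, 4
buffer -= 2    // remove the first 2   -> 10, 3, 4

// HashMap: a mutable key-value store, used here as a word counter.
val counts = mutable.HashMap.empty[String, Int]
for (word <- Seq("spark", "scala", "spark")) {
  // getOrElse supplies 0 the first time a word is seen
  counts(word) = counts.getOrElse(word, 0) + 1
}

println(buffer)          // ArrayBuffer(10, 3, 4)
println(counts("spark")) // 2
```

Because both collections change in place, every reference to `buffer` or `counts` observes the same updates, which is the trade-off to keep in mind before sharing them across threads.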

Scala: Immutable data structure

Scala, a programming language that combines object-oriented and functional programming paradigms, provides a variety of immutable data structures. Immutable data structures are those that cannot be modified after they are created, which can be beneficial for ensuring safety and simplicity in concurrent or parallel programming. Here are some commonly used immutable data structures in Scala: […]
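
A short sketch, assuming Scala 2.13 or Scala 3 in script mode, showing that operations on immutable collections return new values instead of modifying the original. The `eq` check illustrates structural sharing: prepending to a `List` reuses the existing list as the new list's tail rather than copying it. The values and the `"partitions"` key are hypothetical.

```scala
// Immutable collections are Scala's defaults; no import is needed.
val original  = List(1, 2, 3)
val prepended = 0 :: original       // a new List; `original` is unchanged
val doubled   = original.map(_ * 2) // transformations also return new collections

// "Updating" an immutable Map returns a new Map with the key rebound.
val config  = Map("partitions" -> 8)
val updated = config + ("partitions" -> 16)

// Vector.updated returns a copy with one element replaced.
val letters = Vector("a", "b", "c")
val patched = letters.updated(1, "B")

// Structural sharing: the new list points at `original` as its tail.
println(prepended.tail eq original) // true
println(original)                   // List(1, 2, 3)
println(config("partitions"))       // 8
println(updated("partitions"))      // 16
```

Since none of these values can change after creation, they can be handed to other threads without locks or defensive copies.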

Spark: Dataframe joins

In Apache Spark, DataFrame joins are operations that allow you to combine two DataFrames based on a common column or set of columns. Join operations are fundamental for data analysis and manipulation, particularly when dealing with distributed and large-scale datasets. Spark provides a rich set of APIs for performing various types of DataFrame joins. Import […]
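
The difference between an inner and a left outer join on a common column can be modeled in plain Scala without a Spark cluster. This is a hypothetical illustration, not Spark's API: `Employee`, `Dept`, `innerJoin`, and `leftOuterJoin` are made-up names and the rows are invented. In Spark itself, the comparable operation on two DataFrames would be a call such as `employees.join(depts, Seq("deptId"), "inner")`, with `"left_outer"` as the join type for the second case.

```scala
// Plain Scala model of join semantics (NOT Spark code); deptId is the join key.
final case class Employee(name: String, deptId: Int)
final case class Dept(deptId: Int, deptName: String)

val employees = Seq(Employee("Asha", 1), Employee("Ben", 2), Employee("Chen", 3))
val depts     = Seq(Dept(1, "Data"), Dept(2, "BI"))

// Inner join: keep only employees whose deptId has a match in depts.
def innerJoin(es: Seq[Employee], ds: Seq[Dept]): Seq[(String, String)] = {
  val byId = ds.groupBy(_.deptId) // lookup table keyed on the join column
  for {
    e <- es
    d <- byId.getOrElse(e.deptId, Seq.empty)
  } yield (e.name, d.deptName)
}

// Left outer join: keep every employee; a missing match becomes None
// (Spark would put null in the right-hand columns instead).
def leftOuterJoin(es: Seq[Employee], ds: Seq[Dept]): Seq[(String, Option[String])] = {
  val byId = ds.groupBy(_.deptId)
  es.flatMap { e =>
    byId.get(e.deptId) match {
      case Some(matches) => matches.map(d => (e.name, Option(d.deptName)))
      case None          => Seq((e.name, Option.empty[String]))
    }
  }
}

println(innerJoin(employees, depts))     // List((Asha,Data), (Ben,BI))
println(leftOuterJoin(employees, depts)) // List((Asha,Some(Data)), (Ben,Some(BI)), (Chen,None))
```

Building a lookup table from one side and probing it with the other is roughly the idea behind a hash join; Spark chooses among several physical join strategies depending on data size and configuration.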

Spark Partition: An Overview

In Apache Spark, efficient data management is essential for maximizing performance in distributed computing. Partitioning, repartitioning, and coalescing govern how data is organized and distributed across the cluster. Partitioning divides a dataset into smaller chunks, enabling parallel processing and optimizing operations. Repartitioning redistributes data across partitions, adjusting the balance for more […]
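
A simplified model in plain Scala (not Spark code) of hash partitioning followed by coalescing. The non-negative modulo rule mirrors the one Spark's RDD `HashPartitioner` applies to a key's `hashCode`; DataFrame repartitioning by column uses a different hash function, and Spark's real coalesce also takes data locality into account. The function names `nonNegativeMod`, `hashPartition`, and `coalesce` here are hypothetical, the sample records are invented, and keys are assumed to be non-null.

```scala
// Non-negative modulo, so negative hashCodes still map to a valid partition id.
def nonNegativeMod(x: Int, mod: Int): Int = {
  val raw = x % mod
  if (raw < 0) raw + mod else raw
}

// Hash partitioning: all records with the same key land in the same partition,
// which is what lets a shuffle co-locate keys for joins and aggregations.
def hashPartition[K, V](records: Seq[(K, V)], numPartitions: Int): Vector[Seq[(K, V)]] = {
  require(numPartitions > 0, "numPartitions must be positive")
  val byPartition = records.groupBy { case (k, _) => nonNegativeMod(k.hashCode, numPartitions) }
  Vector.tabulate(numPartitions)(i => byPartition.getOrElse(i, Seq.empty))
}

// Coalesce: fold existing partitions into fewer ones without re-hashing records.
// Partition i goes to target i * target / n (a simplification of Spark's logic).
def coalesce[A](partitions: Vector[Seq[A]], target: Int): Vector[Seq[A]] = {
  val n = partitions.size
  require(target > 0 && target <= n, "coalesce only reduces the partition count")
  Vector.tabulate(target) { t =>
    partitions.indices.filter(i => i * target / n == t).flatMap(i => partitions(i))
  }
}

val records = (1 to 8).map(i => (i, s"row$i"))
val four    = hashPartition(records, 4) // keys by partition: (4,8) (1,5) (2,6) (3,7)
val two     = coalesce(four, 2)         // keys by partition: (4,8,1,5) (2,6,3,7)
```

Coalescing only concatenates whole partitions, so no record is re-hashed or moved between the surviving groups; this is why Spark can reduce the partition count without a full shuffle, whereas repartitioning redistributes every record.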