Aarthii Gurunathan

Aarthii is an Associate Technical Consultant at Perficient, currently specializing as a Databricks Spark Developer. She is proficient in technologies such as SQL, Databricks, Spark, Scala and Java. She enthusiastically explores new technologies and keeps learning to stay productive. She endeavours to contribute in various capacities to give back to the community.

Blogs from this Author

Introduction to Star and Snowflake schema

In the world of data warehousing and business intelligence, two key concepts are fundamental: Snowflake and Star Schema. These concepts play a pivotal role in designing effective data models for analyzing large volumes of data efficiently. Let’s delve into what Snowflake and Star Schema are and how they are used in the realm of data […]

Spark SQL Properties

The spark.sql.* properties are a set of configuration options specific to Spark SQL, a module within Apache Spark designed for processing structured data using SQL queries, DataFrame API, and Datasets. These properties allow users to customize various aspects of Spark SQL’s behavior, optimization strategies, and execution environment. Here’s a brief introduction to some common spark.sql.* […]

Date and Timestamp in Spark SQL

Spark SQL offers a set of built-in standard functions for handling dates and timestamps within the DataFrame API. These functions are valuable for performing operations involving date and time data. They accept inputs in various formats, including Date type, Timestamp type, or String. If the input is provided as a String, it must be in […]

DBFS (Databricks File System) in Apache Spark

In the world of big data processing, efficient and scalable file systems play a crucial role. One such file system that has gained popularity in the Apache Spark ecosystem is DBFS, which stands for Databricks File System. In this blog post, we’ll explore what DBFS is, how it works, and provide examples to illustrate […]

Spark: Persistence Storage Levels

Spark persistence is an optimization technique that saves the results of RDD evaluation. Spark provides a convenient method for working with datasets by storing them in memory throughout various operations. When you persist a dataset, Spark stores the data on disk or in memory, or a combination of the two, so that it can be […]

Understanding Spark Transformations and Actions – Spark RDD Operations

A comprehensive understanding of Spark’s transformations and actions is crucial for writing efficient Spark code. This blog provides a glimpse of the fundamental aspects of Spark. Before we dive deep into Spark’s transformations and actions, let us take a quick look at RDD and DataFrame. Resilient Distributed Dataset (RDD): Usually, Spark tasks operate on RDDs, which is […]