In data engineering and analytics workflows, merging files is a common task when large datasets are spread across multiple files. Databricks, a powerful platform for big data processing, offers first-class support for Scala. In this blog post, we’ll delve into how to merge files efficiently using Scala on Databricks. Introduction: Merging files entails combining the […]
Gowtham Ramadoss Baskaran
Gowtham is a Technical Consultant at Perficient, specializing as a Databricks Spark Developer. He is proficient in technologies such as SQL, Databricks, Spark, Scala, and Java, and he actively pursues new knowledge to improve his productivity. He works diligently in various roles to contribute and give back to the community.

Blogs from this Author
Spark DataFrame: Writing into Files
This blog post explores how to write a Spark DataFrame to various file formats, saving data to external storage for further analysis or sharing. Before diving in, have a look at my other blog posts on creating and manipulating DataFrames and on writing a DataFrame into tables and views. […]
Spark DataFrame: Writing to Tables and Creating Views
In this blog post, we will look at methods for writing a Spark DataFrame into tables and creating views, both essential tasks in data processing and analysis. Before diving in, have a look at my other blog posts on creating and manipulating DataFrames. Creating DataFrame: https://blogs.perficient.com/2024/01/10/spark-scala-approaches-toward-creating-dataframe/ Manipulating DataFrame: https://blogs.perficient.com/2024/02/15/spark-dataframe-basic-methods/ Dataset: The […]
Spark: DataFrame Basic Methods
The DataFrame is a key abstraction in Spark that represents structured data and allows for easy manipulation and analysis. In this blog post, we’ll explore the basic DataFrame methods available in Spark and, through examples, how they can be used for data processing tasks. Dataset: There are many DataFrame methods, which are subclassified into Transformation […]
Spark: Parser Modes
Apache Spark is a powerful open-source distributed computing system widely used for big data processing and analytics. When working with structured data, one common challenge is dealing with parsing errors—malformed or corrupted records that can hinder data processing. Spark provides flexibility in handling these issues through parser modes, allowing users to choose the behavior that […]
Spark Scala: Approaches toward creating Dataframe
In Spark with Scala, creating DataFrames is fundamental for data manipulation and analysis. There are several approaches to creating DataFrames, each with its own advantages. You can create DataFrames from various data sources such as CSV or JSON, or from existing RDDs (Resilient Distributed Datasets). In this blog we will see some approaches to creating a DataFrame […]