Spark SQL offers a set of built-in standard functions for handling dates and timestamps within the DataFrame API. These functions are valuable for performing operations involving date and time data. They accept inputs in various formats, including Date type, Timestamp type, or String. If the input is provided as a String, it must be in a format compatible with date (e.g., yyyy-MM-dd) or timestamp (e.g., yyyy-MM-dd HH:mm:ss.SSSS) representations. The functions return the corresponding date or timestamp values based on the input type. If the input string cannot be successfully converted to a date or timestamp, the functions return null.
Let’s see some Date and Timestamp syntax and examples in Spark SQL:
First Create a Sample Dataset and save it as View, which we can use for seeing the date and timestamp functions in SQL.
To Know how to create dataframe and it methods , check into : https://blogs.perficient.com/2024/01/10/spark-scala-approaches-toward-creating-dataframe/
and also to know about writing into Table , look into : https://blogs.perficient.com/2024/02/25/spark-dataframe-writing-to-tables-and-creating-views/
- Creating a temp view
val df = spark.createDataFrame(Seq( ("2024-02-27", "2024-02-27 15:30:45.123"), ("2024-01-15", "2024-01-15 08:45:30.555"), ("2023-11-20", "2023-11-20 12:00:00.000"), ("invalid date", "invalid timestamp") )).toDF("date_string", "timestamp_string") df.createOrReplaceTempView("my_table")
- Displaying my_table
Date Data Type
The DateType represents a date without a time component. Dates can be created from string using the DATE literal or the TO_DATE() function.
- Example of DATE Literal
- Example of TO_DATE() function
Timestamp Data Type
The TimestampType represents a date and time with millisecond precision. Timestamps can be created using the TIMESTAMP literal or the TO_TIMESTAMP() function.
- Example of TIMESTAMP Literal
- Example of TO_TIMESTAMP() function
Date and Timestamp Functions
Spark SQL provides various functions for working with dates and timestamps:
- Date Functions: year(), month(), dayofmonth(), dayofweek(), dayofyear(), weekofyear(), etc.
- Timestamp Functions: hour(), minute(), second(), date_format(), etc.
Formatting Dates and Timestamps
You can format dates and timestamps using the date_format() function.
- date_format example with formating Date
- date_format example with formating timestamp
Date and Timestamp Arithmetic
You can perform arithmetic operations on dates and timestamps using functions like date_add(), date_sub(), and arithmetic operators.
- Example of adding 1 day to given dates
- Example of adding 1 hour to given timestamp
Filtering Dates and Timestamps
You can filter data based on dates and timestamps using comparison operators.
- Example of filtering with date
- Example of filtering with timestamp
These are some common operations you can perform with dates and timestamps in Spark SQL. They are essential for various analytical tasks, especially when dealing with time-series data.
References:
Spqk SqL build-in function : https://spark.apache.org/docs/2.3.0/api/sql/index.html
Hello
Do you know how to set diffrent time zone to be default one for CURRENT_TIMESTAMP or TIMESTAMP ‘now’.
No it seesm is in UTS comparing to my local time and i want to be in local or CEST.
You can use the function “from_utc_timestamp(CAST(timestamp AS TIMESTAMP), ‘Europe/Paris’) AS cest_timestamp”
This will convert from UTC to CEST.