Step by step guide to secure JDBC SSL connection with Postgres in AWS Glue
Shubham Deshmukh, Perficient Blogs, Sat, 06 Apr 2024

Have you ever tried connecting a database to AWS Glue over a JDBC SSL encrypted connection? It can be quite a puzzle. A few months ago, I faced this exact challenge. I thought it would be easy, but I was wrong! When I searched for help online, I couldn't find much useful guidance. So, I rolled up my sleeves and experimented until I finally figured it out.

Now, I am sharing my learnings with you. In this blog, I’ll break down the steps in a clear, easy-to-follow way. By the end, you’ll know exactly how to connect your database to AWS Glue with SSL encryption. Let’s make this complex task a little simpler together.

Before moving ahead, let's briefly discuss how SSL encryption works:

  1. The client sends a connection request (Client Hello).
  2. The server responds, choosing encryption (Server Hello).
  3. The client verifies the server's identity using the server's certificate and a trusted root certificate.
  4. A key exchange establishes a shared encryption key.
  5. Encrypted data is exchanged securely.
  6. The client may also authenticate with its own certificate before the encrypted data exchange.
  7. Connection terminates upon session end or timeout.
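On the client side, most of these steps are handled by the TLS library rather than by application code. As a rough, Glue-agnostic illustration using Python's standard ssl module, a default client context already enforces the server-verification behavior described in steps 3 and 4:

```python
import ssl

# A default client-side context verifies the server's certificate chain
# against trusted root certificates during the handshake.
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)

print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True: a valid chain is required
print(ctx.check_hostname)                    # True: hostname must match the cert

# For mutual TLS (step 6), the client would additionally load its own
# certificate and private key before connecting, e.g.:
# ctx.load_cert_chain(certfile="client_certificate.pem",
#                     keyfile="certificate_key.pem")
```

The verify-ca and verify-full sslmode settings used later in this post correspond to these two checks: verify-ca requires a valid chain, and verify-full additionally requires a matching hostname.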

SSL encryption

Now that you have a basic understanding of the SSL encryption process, let's configure AWS Glue for it. Before we start the configuration, we need to convert the following into DER format, the format AWS Glue expects:

1) Client Certificate

2) Root Certificate

3) Certificate Key

DER (Distinguished Encoding Rules) is a binary encoding format used in cryptographic protocols like SSL/TLS to represent and exchange data structures defined by ASN.1. It ensures unambiguous and minimal-size encoding of cryptographic data such as certificates.
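For intuition, a PEM file is essentially the same DER bytes, base64-encoded between BEGIN/END banner lines. A minimal stdlib-only sketch of that relationship (the openssl commands below remain the recommended way to do the real conversions; note the private-key step also changes the key's structure to PKCS#8, which this simple re-encoding does not do):

```python
import base64
import textwrap

def pem_to_der(pem_text: str) -> bytes:
    """Strip the PEM armor lines and base64-decode the body into DER bytes."""
    body = "".join(line for line in pem_text.strip().splitlines()
                   if not line.startswith("-----"))
    return base64.b64decode(body)

def der_to_pem(der_bytes: bytes, label: str = "CERTIFICATE") -> str:
    """Wrap DER bytes in base64 PEM armor with the given label."""
    body = base64.b64encode(der_bytes).decode("ascii")
    return (f"-----BEGIN {label}-----\n"
            + "\n".join(textwrap.wrap(body, 64))
            + f"\n-----END {label}-----\n")

# Round trip: DER -> PEM -> DER returns the original bytes.
sample = bytes([0x30, 0x03, 0x01, 0x01, 0x00])
print(pem_to_der(der_to_pem(sample)) == sample)  # True
```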

Here’s how you can do it for each component:

1. Client Certificate (PEM):

This certificate is used by the client (in this case, AWS Glue) to authenticate itself to the server (here, the PostgreSQL database) during the SSL handshake. It includes the client's public key and is usually signed by a trusted Certificate Authority (CA) or an intermediate CA.

If your client certificate is not already in DER format, you can convert it using the OpenSSL command-line tool:

openssl x509 -in client_certificate.pem -outform der -out client_certificate.der

Replace client_certificate.pem with the filename of your client certificate in PEM format, and client_certificate.der with the desired filename for the converted DER-encoded client certificate.

 

2. Root Certificate (PEM):

The root certificate belongs to the Certificate Authority (CA) that signed the server's certificate (here, the PostgreSQL database's certificate). It's used by the client to verify the authenticity of the server's certificate during the SSL handshake.

Convert the root certificate to DER format using the following command:

openssl x509 -in root_certificate.pem -outform der -out root_certificate.der

Replace root_certificate.pem with the filename of your root certificate in PEM format, and root_certificate.der with the desired filename for the converted DER-encoded root certificate.

 

3. Certificate Key (PKCS#8):

This is the private key corresponding to the client certificate. It’s used to prove the ownership of the client certificate during the SSL handshake.

Convert the certificate key to PKCS#8 DER format using the OpenSSL command-line tool:

openssl pkcs8 -topk8 -inform PEM -outform DER -in certificate_key.pem -out certificate_key.pk8 -nocrypt

Replace certificate_key.pem with the filename of your certificate key in PEM format, and certificate_key.pk8 with the desired filename for the converted DER-encoded PKCS#8 key.
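Before uploading, it's worth sanity-checking that each converted file really is binary DER rather than leftover PEM text. A quick, informal check (DER-encoded certificates and PKCS#8 keys begin with an ASN.1 SEQUENCE tag, byte 0x30):

```python
def looks_like_der(data: bytes) -> bool:
    # DER-encoded certificates and PKCS#8 keys start with an
    # ASN.1 SEQUENCE tag (byte 0x30).
    return len(data) > 2 and data[0] == 0x30

def looks_like_pem(data: bytes) -> bool:
    # PEM files are ASCII text starting with a "-----BEGIN ..." banner.
    return data.lstrip().startswith(b"-----BEGIN ")

print(looks_like_der(bytes([0x30, 0x82, 0x01, 0x0A])))      # True
print(looks_like_pem(b"-----BEGIN CERTIFICATE-----\n..."))  # True
print(looks_like_der(b"-----BEGIN CERTIFICATE-----"))       # False
```

You would read each file with `open(path, "rb").read()` and expect the .der and .pk8 files to pass `looks_like_der`.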

 

Store the above certificates and key in an S3 bucket. We will need them while configuring the AWS Glue job.

 

AWS S3 Files

 

To connect AWS Glue to a PostgreSQL database over SSL using PySpark, you’ll need to provide the necessary SSL certificates and configure the connection properly. Here’s an example PySpark script demonstrating how to achieve this:

from pyspark.context import SparkContext
from awsglue.context import GlueContext
from pyspark.sql import SparkSession

# Initialize Spark and Glue contexts
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Define PostgreSQL connection properties
jdbc_url = "jdbc:postgresql://your_postgresql_host:5432/your_database"
connection_properties = {
    "user": "your_username",
    "password": "your_password",
    "ssl": "true",
    "sslmode": "verify-ca",  # SSL mode: verify-ca or verify-full
    # Files added under the job's "Referenced files path" are copied into
    # the job's working directory, so the JDBC driver references them by
    # plain filename rather than by S3 URI:
    "sslrootcert": "root_certificate.der",  # root certificate
    "sslcert": "client_certificate.der",    # client certificate
    "sslkey": "certificate_key.pk8"         # client certificate key
}

# Load data from PostgreSQL table
dataframe = spark.read.jdbc(url=jdbc_url, table="your_table_name", properties=connection_properties)

# Perform data processing or analysis
# For example:
dataframe.show()

# Stop Spark session
spark.stop()

 

Now, inside your Glue job, open the Job details page and scroll down to the Dependent JARs path and Referenced files path options. Under Dependent JARs path, add the S3 path of the PostgreSQL JDBC driver JAR, and under Referenced files path, add the S3 paths of the converted client certificate, root certificate, and certificate key, separated by commas.
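If you create or update the job programmatically instead of through the console, these two console fields map to the Glue special job parameters `--extra-jars` and `--extra-files`. A small sketch of assembling the comma-separated values (the bucket name follows the earlier example, and the driver JAR filename is an assumption for illustration):

```python
bucket = "s3://etl-test-bucket1"

# "Dependent JARs path" / --extra-jars: the PostgreSQL JDBC driver JAR
# (the exact version/filename here is hypothetical)
extra_jars = f"{bucket}/postgresql-42.7.1.jar"

# "Referenced files path" / --extra-files: the converted certificates
# and key, joined with commas
referenced_files = ",".join([
    f"{bucket}/client_certificate.der",
    f"{bucket}/root_certificate.der",
    f"{bucket}/certificate_key.pk8",
])

print(referenced_files)
```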

AWS Glue Job Details

 

Now click Save, and you are ready to go.

 

This concludes the steps to configure a secure JDBC connection with a database in AWS Glue. To summarize, in this blog we:

1) Explained how SSL encryption can be used for secure data exchange between AWS Glue and your database (here, PostgreSQL)

2) Walked through the steps to configure SSL encryption in AWS Glue to secure a JDBC connection with a database

 

You can read my other blogs here

read more about AWS Glue
EXPLORE TIME TRAVEL IN SNOWFLAKE
Shubham Deshmukh, Perficient Blogs, Mon, 25 Jul 2022

Have you ever wondered if it would be possible to time travel, like in the old movies with a time machine? If we could go back in time and see the world, would we? If you asked me, I would say yes! But not in the way Hollywood portrays it in science fiction films. Today, we'll look at one such feature. Get ready to time travel in the data world of Snowflake.

 

Introduction to Time Travel

 

“Snowflake Time Travel enables accessing historical data (i.e., data that has been changed or deleted) at any point within a defined period.” – Snowflake

 

Time Travel is one of the cool features that Snowflake provides to its users. It allows us to recover data that has been changed or deleted at any point within a specified time frame.

We can do some amazing things with this powerful feature, such as:

  • We can recover deleted objects such as tables, schemas, and databases, so there's no need to worry about a new employee accidentally deleting data.
  • Duplicating and backing up data from key points in the past has never been simpler.
  • We can examine data usage and manipulation over specified time periods.

 

NOTE:

Snowflake's Time Travel is defined for databases, schemas, and tables. The data retention period parameter specifies how long we can view an object's historical data. In all Snowflake editions, it is set to 1 day by default for all objects.

This parameter can be extended to 90 days for Enterprise and Business-Critical editions.

The parameter “DATA RETENTION PERIOD” controls an object’s time travel capability.

Once the time travel duration is exceeded, the object enters the Fail-safe region. If you need to retrieve the object while it is in Fail-safe, you must contact Snowflake Support.
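To reason about these windows, you can compute when an object leaves Time Travel and enters Fail-safe from its retention setting. A small illustration (the fixed 7-day Fail-safe period applies to permanent tables; the drop timestamp here is made up):

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 4   # DATA_RETENTION_TIME_IN_DAYS set on the object
FAILSAFE_DAYS = 7    # fixed 7-day Fail-safe for permanent tables

dropped_at = datetime(2022, 7, 25, 12, 0, 0)  # hypothetical drop time

# UNDROP and AT|BEFORE queries work until the retention window closes;
# after that, only Snowflake Support can recover the data, and only
# until the Fail-safe window also closes.
time_travel_until = dropped_at + timedelta(days=RETENTION_DAYS)
failsafe_until = time_travel_until + timedelta(days=FAILSAFE_DAYS)

print(time_travel_until)  # 2022-07-29 12:00:00
print(failsafe_until)     # 2022-08-05 12:00:00
```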

snowflake time travel feature

The following SQL extensions have been implemented to support Time Travel:

  • The AT | BEFORE clause, which can be used in SELECT statements and CREATE… CLONE commands (immediately after the object name).

To pinpoint the exact historical data you want to access, the clause uses one of the following parameters:

  • TIMESTAMP
  • OFFSET (time difference in seconds from the present time)
  • STATEMENT (identifier for statement, e.g. query ID)

 

-- Select the data for a specified query ID executed at a specific point in time
SELECT * FROM OUR_FIRST_DB.public.test before (statement => '01a58f86-3200-7cb6-0001-25ce0002d232'); -- query ID

-- Select the data as it was a number of seconds ago using time travel
SELECT * FROM OUR_FIRST_DB.public.test before (offset => -300); -- seconds only

-- Select the data as of a specified timestamp using time travel
select * from OUR_FIRST_DB.public.test at (TIMESTAMP => '2022-12-07 00:57:35.967'::timestamp); -- timestamp

  • The UNDROP command for tables, schemas, and databases.

 

-- Restore a dropped table
UNDROP TABLE TABLENAME;

-- Restore a dropped schema
UNDROP SCHEMA SCHEMANAME;

-- Restore a dropped database
UNDROP DATABASE DATABASENAME;
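If you issue such queries from Python (for example through the Snowflake connector), a small helper can assemble the AT | BEFORE clause from whichever parameter you have. This helper is purely illustrative, not part of any library:

```python
def time_travel_clause(timestamp=None, offset=None, statement=None):
    """Build a Snowflake AT | BEFORE time-travel clause from one parameter."""
    if timestamp is not None:
        return f"AT (TIMESTAMP => '{timestamp}'::timestamp)"
    if offset is not None:
        return f"AT (OFFSET => {offset})"  # negative seconds from now
    if statement is not None:
        return f"BEFORE (STATEMENT => '{statement}')"
    raise ValueError("one of timestamp, offset, or statement is required")

sql = f"SELECT * FROM OUR_FIRST_DB.public.test {time_travel_clause(offset=-300)}"
print(sql)  # SELECT * FROM OUR_FIRST_DB.public.test AT (OFFSET => -300)
```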

 

Let us illustrate this with an example.

EXAMPLE OF TIME TRAVEL

 

  1. Create an employee table with a 4-day data retention period. Note that I am using the DEMO_DB database and PUBLIC schema.
create or replace table EMPLOYEE (empid int ,emp_name varchar(20) ) data_retention_time_in_days=4;
insert into EMPLOYEE values(1,'Shubham');
insert into EMPLOYEE values(2,'Chandan');
insert into EMPLOYEE values(3,'Simran');
insert into EMPLOYEE values(4,'Nikita');
insert into EMPLOYEE values(5,'Achal');
insert into EMPLOYEE values(6,'Aditi');
select * from EMPLOYEE;

Create Table snowflake

 

  2. After 5 minutes, I inserted another row with EMPID 7 as follows:
insert into EMPLOYEE values(7,'Shobit'); 
select * from EMPLOYEE;

Update Table snowflake

 

  3. The table now has 7 rows, but let's go back 5 minutes and see how the table looked.
select * from EMPLOYEE at(offset=>-60*5);

snowflake Time Travel Result

 

In this way, you can check the data the table holds in the past.

 

 

Final Reflections

This brings us to the conclusion of Snowflake Time Travel. This article has taught us what Time Travel is and how to use it in Snowflake. Additionally, I have demonstrated how to customize Snowflake's retention settings at the table level. I hope you gained an overview of one of Snowflake's most significant features.

Please share your thoughts and suggestions in the space below, and I’ll do my best to respond to all of them as time allows.

Refer to the official Snowflake documentation here if you want to learn more.

For more such blogs, click here.

]]>
https://blogs.perficient.com/2022/07/25/explore-time-travel-in-snowflake/feed/ 26 314421