
Databricks Integration with Snowflake

Learning from collaboration

What is Databricks?

Databricks is a unified cloud-based data platform powered by Apache Spark, specializing in collaboration and analytics for big data. Databricks is a data science workspace built around Collaborative Notebooks, the Machine Learning Runtime, and Managed MLflow.

  • Collaborative Notebooks support multiple data analytics languages, such as SQL, Scala, R, Python, and Java. Data analysts will find it much easier and faster to work with their teammates, share insights with built-in visualizations, and rely on automatic versioning
  • Machine Learning Runtime (MLR) takes the burden of managing libraries and keeping module versions up to date off data scientists; instead, they can connect to the most popular machine learning frameworks (TensorFlow, Keras, XGBoost, scikit-learn, etc.) with one click. MLR can also speed up model tuning with its built-in AutoML functionality for hyperparameter tuning and model search, using Hyperopt and MLflow; see the sketch after this list
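To give a flavor of what that tuning workflow looks like, below is a minimal Hyperopt sketch. The model, dataset, and search space are illustrative assumptions, not a prescribed setup; it assumes hyperopt and scikit-learn are available (both ship with MLR).

# A minimal sketch of hyperparameter tuning with Hyperopt: minimize the
# negative cross-validated accuracy of a logistic regression model.
from hyperopt import fmin, tpe, hp, STATUS_OK
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

def objective(params):
    model = LogisticRegression(C=params["C"], max_iter=1000)
    score = cross_val_score(model, X, y, cv=3).mean()
    return {"loss": -score, "status": STATUS_OK}  # Hyperopt minimizes the loss

best = fmin(fn=objective,
            space={"C": hp.loguniform("C", -4, 2)},  # search C on a log scale
            algo=tpe.suggest,
            max_evals=20)
print(best)  # best hyperparameters found, e.g. {'C': 1.23}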

What are the Benefits?

Databricks’ ability to process and transform massive amounts of data makes it an industry-leading solution for data scientists and analysts. Some of its key benefits include:

Getting Started: Data practitioners will find commonly used programming languages – namely Python, R, and SQL – ready to use in Databricks. This shortens the time spent getting familiar with a new language and eases the learning curve for newcomers. On launch, users see notebooks in a format similar to Jupyter notebooks, which are widely used around the world.

Collaboration: Beyond the notebooks mentioned above, Databricks encourages multiple team members to work on the same project through interactive workspaces. All members can work in the same workspace without worrying about version control.

Production: After training and testing, data engineers can quickly deploy models from Databricks. Deployment for big data tends to be messy and complex, but Databricks can give your team an edge; a minimal sketch of logging a model for deployment follows below.
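As an illustration of that last point, here is a minimal MLflow sketch of logging a trained model so it can later be registered and served from Databricks. The model and parameter names are assumptions for the example.

# A minimal sketch: track a training run and log the fitted model with MLflow.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

with mlflow.start_run():
    mlflow.log_param("max_iter", 1000)                  # record the run's settings
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")            # store the model artifact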

Why Snowflake?

Ameex is a proud Snowflake partner, and we are excited to deliver cloud data warehouses to our clients.

In this blog, we will walk through the steps to connect Databricks to Snowflake so that you can begin your data journey with best-in-class machine learning capabilities, confident that your data is securely stored in a reliable cloud data warehouse.

To begin with, connecting Databricks to Snowflake will require the following:

  • An up-to-date Databricks account (secrets are set up in the steps below)
  • A Snowflake account, with the critical information below available:
    • URL for your Snowflake account
    • Login name and password for the user who connects to the account
    • Default database and schema to use for the session after establishing the connection
    • Default virtual warehouse to use for the session after establishing the connection

The connection process can be summarized as:

  • Enable token-based authentication for Databricks workspace
  • Install Databricks CLI
  • Create Databricks Scope
  • Create Databricks Secrets within the Scope
  • Use the Secrets to connect Databricks to Snowflake

Step 1: Enable token-based authentication for your workspace

  1. Click on your User icon at the top right corner in your Databricks account and navigate to Admin Console


  2. Once in the Admin Console, select Access Control

  3. Find the Personal Access Tokens, and click Enable

  4. Confirm

After a few minutes, Personal Access Tokens will be available.


  5. Click on your User icon at the top right corner in your Databricks account and navigate to User Settings


  6. Select Access Tokens

  7. Click the Generate New Token button

  8. You can enter an optional description for the new token and specify the expiration period

  9. Click the Generate button, then copy the generated token and store it for the next step

Step 2: Install Databricks CLI

The Databricks command-line interface (CLI) provides a convenient way to interact with the platform. We will install it assuming you have one of the following:

  • Python 2.7.9 or above, or
  • Python 3.6 or above

1. Install

Run pip install databricks-cli using the appropriate version of pip for your Python installation. If you are using Python 3, run pip3 install databricks-cli.

2. Set up

Run databricks configure --token
At the prompts below, enter your host and the token from the previous step:

Databricks Host (should begin with https://):
Token:

3. Access credential

Your access credentials are stored in the file ~/.databrickscfg:

[DEFAULT]
host = https://<databricks-instance>
token = <personal-access-token>

Step 3: Create Databricks scope

  1. Create a scope

Scope names are case insensitive.

databricks secrets create-scope --scope <scope-name>

  2. Scopes are created with MANAGE permission by default. If your account is not on the Premium plan, you must override this default and grant the MANAGE permission to “users” (all users) when creating the scope:

databricks secrets create-scope --scope <scope-name> --initial-manage-principal users

  3. To double-check that the scope was created successfully:

databricks secrets list-scopes

Step 4: Create Databricks secrets within the scope

  1. Create a secret

Secret names are case insensitive.

databricks secrets put --scope <scope-name> --key <key-name>

  2. Confirm the secret was created:

databricks secrets list --scope <scope-name>
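Once stored, a secret can be read from a notebook with dbutils.secrets.get. As a quick sanity check (the scope and key names below are placeholders), note that Databricks redacts secret values printed in notebook output:

# Read a secret from a notebook; printed values are redacted by Databricks.
password = dbutils.secrets.get(scope="<scope-name>", key="<key-name>")
print(password)  # prints [REDACTED] rather than the actual value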

Step 5: Connect Databricks to Snowflake

For the last step, you can refer to the Databricks documentation for Python and Scala, or follow the sketch below.
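As a minimal Python sketch of what that connection looks like, the example below reads a Snowflake table into a Spark DataFrame from a Databricks notebook. The scope name and secret keys (snowflake, snowflake-user, snowflake-password) are assumptions for this walkthrough; substitute your own, along with your account URL, database, schema, warehouse, and table.

# Pull the Snowflake credentials from the Databricks secrets created above.
# Scope and key names are hypothetical -- use the ones you created in Step 4.
user = dbutils.secrets.get(scope="snowflake", key="snowflake-user")
password = dbutils.secrets.get(scope="snowflake", key="snowflake-password")

options = {
    "sfUrl": "<account>.snowflakecomputing.com",  # URL for your Snowflake account
    "sfUser": user,
    "sfPassword": password,
    "sfDatabase": "<database>",                   # default database for the session
    "sfSchema": "<schema>",                       # default schema for the session
    "sfWarehouse": "<warehouse>",                 # default virtual warehouse
}

# Read a table through the Snowflake connector bundled with Databricks.
df = (spark.read
      .format("snowflake")
      .options(**options)
      .option("dbtable", "<table-name>")
      .load())

df.show()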


Adarsh Srivastava

Adarsh Srivastava is a Senior Business Consultant with over 6 years of content-related experience working in the travel, healthcare, and information technology sectors. Upholding the principles of sincerity, punctuality, and compassion through his writings, he delivers conversational and fact-based content. He is also the author of two books – one of which is a romantic thriller and another an anthology. Feel free to reach out to him for any form of communication.
