Jupyter Notebook is widely used both in large organizations and in academia, including universities and research institutes. It is an open-source web application for interactive computing in Julia, Python, R, and many other programming languages. Unlike a traditional Integrated Development Environment (IDE), Jupyter Notebook has become popular among data scientists, data analysts, and other data practitioners mainly because it provides a simple, clean, interactive interface that displays results and graphs immediately. With Jupyter Notebook, data scientists can print prediction results for different parameter settings in the notebook, showing how tuning affects accuracy; data analysts can present different ways of slicing and dicing a dataset to find the most applicable and actionable insights.
A few key benefits of Jupyter Notebook:
- Easy to navigate: Jupyter Notebooks are user-friendly, easy to get started with, and easy to share. Users can also save a notebook in other formats, such as HTML or PDF, which is particularly helpful when a data analyst wants to quickly share a few preliminary findings with the group.
- Flexibility: Users can customize the interface to suit their use cases and can use Jupyter Notebook with a variety of programming languages.
- Experiment: Jupyter’s interactive interface allows data to be thoroughly explored, examined, and analyzed. Because the output of each step appears immediately, users can try different methods and approaches based on what they have just seen.
Value of Connecting Jupyter Notebook to Snowflake
Snowflake is a modern cloud data warehouse that provides large-scale compute and storage. Its multi-cluster warehouse model changes how data is stored: instead of multiple on-premises data marts owned by different business units for similar or dissimilar use cases, Snowflake provides a single place to store all data securely, with each user granted appropriate access privileges.
In this post, we will connect Jupyter Notebook to Snowflake so that we can use Python to interact with the data stored in the cloud.
Steps to Connect Jupyter Notebook to Snowflake
To connect Jupyter Notebook to Snowflake, we will need SnowCD (the Snowflake Connectivity Diagnostic Tool) and the Snowflake Connector for Python.
Verifying network connection with Snowflake using SnowCD
Step 1: Obtain Snowflake host name IP addresses and ports
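Run SELECT SYSTEM$WHITELIST() or, if your account uses private connectivity, SELECT SYSTEM$WHITELIST_PRIVATELINK() in a Snowflake worksheet. The query returns the hostnames and ports that your network must be able to reach.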
Step 2: Save the query result to a file
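If you prefer to do this step from Python rather than by copying the result out of the worksheet, the sketch below writes the JSON text returned by the query to a file that SnowCD can read. The function name save_allowlist, the file name whitelist.json, and the sample host entry are assumptions made for illustration; replace the sample text with your actual query result.

```python
import json
from pathlib import Path


def save_allowlist(json_text: str, path: str = "whitelist.json") -> int:
    """Validate the JSON returned by SYSTEM$WHITELIST() and write it to a file.

    Returns the number of host entries written.
    """
    entries = json.loads(json_text)  # raises ValueError if the text is not valid JSON
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of host entries")
    Path(path).write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return len(entries)


# Illustrative sample only; paste the real query result here.
sample_result = (
    '[{"type": "SNOWFLAKE_DEPLOYMENT", '
    '"host": "example.snowflakecomputing.com", "port": 443}]'
)
print(save_allowlist(sample_result), "entries saved")
```

Parsing the text before writing it catches a truncated copy-and-paste early, before SnowCD is asked to read the file.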
Step 3: Download and Install SnowCD
See the Snowflake documentation for more information on SnowCD.
Step 4: Run SnowCD
Following SnowCD’s instructions, run it from your local terminal. If the checks succeed, you will see output similar to the following:
Performing 27 checks for 11 hosts
All checks passed
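If you would rather launch SnowCD from a notebook cell than from a terminal, a sketch along these lines may work. It assumes the SnowCD executable is on your PATH under the name snowcd and that the file from Step 2 is called whitelist.json; both names, and the two helper functions, are assumptions to adjust for your setup.

```python
import shutil
import subprocess


def all_checks_passed(output: str) -> bool:
    """Return True if SnowCD's output contains its success message."""
    return "All checks passed" in output


def run_snowcd(allowlist_path: str = "whitelist.json", executable: str = "snowcd") -> str:
    """Run SnowCD against the saved host list and return everything it printed."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    result = subprocess.run(
        [executable, allowlist_path],
        capture_output=True,
        text=True,
        check=False,  # inspect the output ourselves instead of raising on failure
    )
    return result.stdout + result.stderr


# Example using the output shown above; call run_snowcd() to get real output.
print(all_checks_passed("Performing 27 checks for 11 hosts\nAll checks passed"))
```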
Step 5: Install the Connector
Check that your Python version is supported by the connector; the Snowflake Connector for Python documentation lists the supported versions.
pip install --upgrade snowflake-connector-python
Step 6: Verify Your Installation
If the connector installed correctly, you can verify it by checking the installed version from a notebook cell.
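One way to do that check, as a minimal sketch: the snippet below asks Python's standard importlib.metadata module for the installed version of the snowflake-connector-python package. The helper name connector_version is an assumption for this example, and this is not necessarily the validation script the Snowflake documentation uses.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def connector_version() -> Optional[str]:
    """Return the installed snowflake-connector-python version, or None if absent."""
    try:
        return version("snowflake-connector-python")
    except PackageNotFoundError:
        return None


installed = connector_version()
if installed is None:
    print("snowflake-connector-python is not installed in this environment")
else:
    print("snowflake-connector-python version:", installed)
```

If the cell reports that the package is missing, make sure pip installed it into the same Python environment that the notebook kernel is using.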
Step 7: Connect Using the Default Authenticator
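import snowflake.connector

# USER, PASSWORD, ACCOUNT, WAREHOUSE, DATABASE, and SCHEMA are placeholders;
# define them with your own account details before running this cell.
conn = snowflake.connector.connect(
    user=USER,
    password=PASSWORD,
    account=ACCOUNT,
    warehouse=WAREHOUSE,
    database=DATABASE,
    schema=SCHEMA,
)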
If all the previous steps have been completed, the script above will log you in to your Snowflake account.
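To keep credentials out of the notebook itself and to confirm that the connection works, something like the following sketch could be used. The environment variable names (SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, and so on) and both helper functions are assumptions made for this example, not a convention required by Snowflake.

```python
import os


def connection_params_from_env(prefix: str = "SNOWFLAKE_") -> dict:
    """Collect connect() keyword arguments from environment variables.

    The variable names (SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, ...) are a
    convention assumed for this example.
    """
    keys = ("user", "password", "account", "warehouse", "database", "schema")
    missing = [prefix + k.upper() for k in keys if prefix + k.upper() not in os.environ]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {k: os.environ[prefix + k.upper()] for k in keys}


def fetch_snowflake_version(conn) -> str:
    """Run SELECT CURRENT_VERSION() on an open connection and return the result."""
    cur = conn.cursor()
    try:
        cur.execute("SELECT CURRENT_VERSION()")
        return cur.fetchone()[0]
    finally:
        cur.close()  # release the cursor even if the query fails


# Usage (requires real credentials, so it is left commented out):
# import snowflake.connector
# conn = snowflake.connector.connect(**connection_params_from_env())
# print(fetch_snowflake_version(conn))
# conn.close()
```

Passing the connection into fetch_snowflake_version keeps the query logic separate from how the connection was created, so the same function works whichever authenticator you use.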
In this post, we have walked through the steps for connecting Jupyter Notebook to Snowflake using Python. From here, data practitioners can go on to create tables, load data, and run queries in Python from Jupyter Notebook, with the speed and scalability of Snowflake.