
The Star Schema is Obsolete


If you are still using a star schema and dimensional modeling to build your data warehouse / Data Lake, I implore you to reconsider. This has been a contentious subject in the past, but because it matters so much when building modern data environments, it is important to me to cover this topic in detail.


Ralph Kimball introduced the industry to the star schema and the techniques of dimensional modeling in the first edition of his book The Data Warehouse Toolkit, published in 1996. Ralph and the Kimball Group were recognized experts in data warehousing for more than 25 years. The second edition of The Data Warehouse Toolkit was published in 2002 and the third edition in 2013. However, the Kimball Group closed its doors in 2015, and with all due respect to Ralph Kimball and his colleagues, the star schema and dimensional modeling are obsolete. This is not my opinion; it is a fact, based on Kimball’s own explanation of why dimensional modeling was beneficial.

 

 

In the first chapter of The Data Warehouse Toolkit, Ralph Kimball explained that dimensional modeling and the star schema offered the following benefits:

  • Reduced cost
  • Better performance
  • Helped data users / knowledge workers better understand the data

While all three of these benefits were true in 1996, by 2015 they were starting to be questioned, and today they are not true at all. Many things have changed in data management, in data processing technologies, and in the people who use data to make business decisions.

 

Storage

In the past two decades, due to the exponential rise in data usage, data centers developed stringent requirements for greater storage capacity and faster data transmission, and the industry continues to evolve. Innovators are focused on finding ways to achieve larger capacity and faster throughput while using limited space.

 

Cloud Computing

Cloud computing is the on-demand delivery of IT resources over the Internet with pay-as-you-go pricing. Instead of buying, owning, and maintaining physical data centers and servers, you can access technology services, such as computing power, storage, and databases, on an as-needed basis from a cloud provider like Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure.


Modern Data Warehouse/Data Lake Engines

In the modern data warehouse / Data Lake architecture, the actual Data Lake is accessed through the cloud. There are several cloud-based Data Lake options, such as Redshift, Snowflake, Databricks, BigQuery, and Athena, each with a different architecture but the same goal of integrating, analyzing, and acting on data from different data sources. Data plays a pivotal role in almost everything we do these days, but it’s no longer enough to just have access to data-driven insights, particularly if they are outdated. As the amount of data generated grows and data capture increasingly moves to cloud environments, fast processing is critical to delivering timely intelligence that reflects real-time circumstances. Organizations are under growing pressure to obtain and apply insights rapidly, before situations change. This makes it imperative for business leaders across all mainstream industries to embrace active data and to deploy ways of capturing, transporting, and managing it for immediate processing.

 

The Modern Knowledge Worker

The modern data analyst/knowledge worker is not part of the information technology organization. The modern data worker is part of the business organization. They rarely know SQL well enough to join many tables together. The primary skill for many modern knowledge workers is the ability to manage data in a spreadsheet.
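
To make the point about joins concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product, dim_store, sales_flat) and the sample rows are hypothetical and chosen only for illustration. The flat table is an assumption used for contrast, not the solution model covered in the follow-up posts. The sketch asks the same question twice: once against a small star schema, which needs two joins, and once against a single spreadsheet-like table, which needs none.

```python
import sqlite3

# In-memory database holding a tiny, hypothetical star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_city   TEXT);
CREATE TABLE fact_sales  (product_key INTEGER, store_key INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_store   VALUES (10, 'Austin'), (20, 'Denver');
INSERT INTO fact_sales  VALUES (1, 10, 5.0), (1, 20, 7.5), (2, 10, 3.0);
""")

# The same data as one wide, spreadsheet-like table (illustrative assumption).
conn.executescript("""
CREATE TABLE sales_flat AS
SELECT p.product_name, s.store_city, f.amount
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_store   s ON s.store_key   = f.store_key;
""")

# Star schema: the analyst must know the keys and write two joins.
star_query = """
SELECT p.product_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_store   s ON s.store_key   = f.store_key
WHERE s.store_city = 'Austin'
GROUP BY p.product_name
ORDER BY p.product_name
"""

# Flat table: the same question is a filter and a sum, with no joins.
flat_query = """
SELECT product_name, SUM(amount)
FROM sales_flat
WHERE store_city = 'Austin'
GROUP BY product_name
ORDER BY product_name
"""

star_rows = conn.execute(star_query).fetchall()
flat_rows = conn.execute(flat_query).fetchall()

print(star_rows)
print(flat_rows)
```

Both queries return the same totals per product for Austin. The difference is in what the person writing the query has to know: the star schema version requires understanding surrogate keys and join syntax, while the flat version reads much like a spreadsheet filter.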

 

The Star Schema is Obsolete

In my next two blogs, I will explain in detail why the star schema and the dimensional model are no longer relevant, and I will help guide you to a better solution model. So keep an open mind and stay tuned; there is much more to come.

 

Perficient’s Cloud Data Expertise

Our cloud, data, and analytics team can assist with your entire data and analytics lifecycle, from data strategy to implementation. We will help you make sense of your data and show you how to use it to solve complex business problems. We’ll assess your current data and analytics issues and develop a strategy to guide you to your long-term goals.

 


Chuck Brooks

Dr. Chuck is a Senior Data Strategist / Solution Architect. He is a technology leader and visionary in big data, data lakes, analytics, and data science. Over a career that spans more than 40 years, Dr. Chuck has developed many large data repositories based on advancing data technologies. Dr. Chuck has helped many companies become data-driven and develop comprehensive data strategies. The cloud is the modern ecosystem for data and data lakes. Dr. Chuck’s expertise lies in the Google Cloud Platform, Advanced Analytics, Big Data, SQL and NoSQL Databases, Cloud Data Management Engines, and Business Management Development technologies such as SQL, Python, Data Studio, Qlik, PowerBI, Talend, R, Data Robot, and more. The following sales enablement and data strategy results from 40 years of Dr. Chuck’s career in the data space. For more information or to engage Dr. Chuck, contact him at chuck.brooks@perficient.com.
