If you are still using a star schema and dimensional modeling to build your data warehouse / Data Lake, I implore you to reconsider. This has been a contentious subject in the past, but when building modern data environments, it is important to me to cover this topic in detail.
Ralph Kimball introduced the industry to the star schema and the techniques of dimensional modeling in the first edition of his book The Data Warehouse Toolkit, published in 1996. Ralph and the Kimball Group were recognized experts in data warehousing for more than 25 years. The second edition of The Data Warehouse Toolkit was published in 2002 and the third edition in 2013. However, the Kimball Group closed its doors in 2015, and with all due respect to Ralph Kimball and his colleagues, the star schema and dimensional modeling are obsolete. This is not my opinion; it is a fact, based on Kimball’s own explanation of why dimensional modeling was beneficial.
In the first chapter of The Data Warehouse Toolkit, Ralph Kimball explained that dimensional modeling and the star schema offered the following benefits:
- Reduced cost
- Better performance
- Helped data users / knowledge workers better understand the data
While all three of these benefits were true in 1996, by 2015 they were starting to be questioned, and today they simply are not true. Many things have changed in data management, in data processing technologies, and in the people who use data to make business decisions.
In the past two decades, due to the exponential rise in data usage, data centers developed stringent requirements for greater storage capacity and faster data transmission, and the industry continues to evolve. Innovators are focused on finding ways to achieve larger capacity and faster throughput while using limited space.
Cloud computing is the on-demand delivery of IT resources over the Internet with pay-as-you-go pricing. Instead of buying, owning, and maintaining physical data centers and servers, you can access technology services, such as computing power, storage, and databases, on an as-needed basis from a cloud provider like Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure.
Modern Data Warehouse/Data Lake Engines
In the modern data warehouse / Data Lake architecture, the actual Data Lake is accessed through the cloud. There are several cloud-based Data Lake options, such as Redshift, Snowflake, Databricks, BigQuery, and Athena. Each has a different architecture, but all aim at the same benefits: integrating, analyzing, and acting on data from different data sources.
Data plays a pivotal role in almost everything we do these days, but it is no longer enough to just have access to data-driven insights, particularly if they are outdated. As the amount of data generated grows and data capture increasingly moves to cloud environments, fast processing is critical to delivering timely intelligence that reflects real-time circumstances. Organizations are under growing pressure to obtain and apply insights before situations change. This makes it imperative for business leaders across all mainstream industries to embrace active data and to deploy ways of capturing, transporting, and managing it for immediate processing.
The Modern Knowledge Worker
The modern data analyst/knowledge worker is not part of the information technology organization. The modern data worker is part of the business organization. It is rare that they know SQL and have the capacity to join many tables together. The primary skill for many modern knowledge workers is the ability to manage data in a spreadsheet.
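To make the point about joins concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and values are all hypothetical, invented for illustration only, and not taken from any real system. It answers the same question, total sales by region, once against a small star schema and once against a single flat table laid out like a spreadsheet.

```python
import sqlite3

# All tables and values below are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A tiny star schema: one fact table and two dimension tables.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales  (product_key INTEGER, store_key INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_store   VALUES (10, 'East'), (20, 'West');
    INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 25.0);

    -- The same data flattened into one wide table, spreadsheet style.
    CREATE TABLE sales_flat AS
    SELECT p.product_name, s.region, f.amount
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_store   s ON s.store_key   = f.store_key;
""")

# Star schema: the analyst has to know the keys and write the join.
star_query = """
    SELECT s.region, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY s.region
    ORDER BY s.region
"""

# Flat table: one table, no join, close to a spreadsheet pivot.
flat_query = """
    SELECT region, SUM(amount)
    FROM sales_flat
    GROUP BY region
    ORDER BY region
"""

star_result = conn.execute(star_query).fetchall()
flat_result = conn.execute(flat_query).fetchall()
print(star_result)
print(flat_result)
```

Both queries return the same totals. The difference is in what the person writing the query has to know: the star schema version requires understanding surrogate keys and join conditions, while the flat version reads like a filter and a sum over one sheet.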
The Star Schema is Obsolete
In my next two blogs, I will explain in detail why using a star schema and dimensional model is no longer relevant, and I will help guide you to a better solution model. So keep an open mind and stay tuned; there is much more to come.
Perficient’s Cloud Data Expertise
Our cloud, data, and analytics team can assist with your entire data and analytics lifecycle, from data strategy to implementation. We will help you make sense of your data and show you how to use it to solve complex business problems. We’ll assess your current data and analytics issues and develop a strategy to guide you to your long-term goals.