This is the second blog in the series. Read the first blog, here.
In my first blog, I stated that the star schema and dimensional modeling are obsolete, and I urge you to stop using them. I explained that in the first chapter of The Data Warehouse Toolkit Ralph Kimball explained that dimensional modeling and the star schema offered few benefits that are now archaic.
In 1996 when the star schema and dimensional modeling were introduced, storage costs were expensive, almost $12,000 per gigabyte. Since the star schema helped reduce data redundancy it also helped reduce storage costs. In 2022 the cost of a gigabyte of data storage in the cloud is only $.023 cents per month. Reducing data redundancy does not create significant cost savings. Additionally in the same time frame, labor costs for data workers have greatly increased. Converting a normal form model to the star schema dimensional model takes significant data engineering time every day. The star schema dimensional model also increases the time necessary to join information from multiple dimensions to obtain usable data sets. The result is that using a star schema dimensional model in the modern data processing environment significantly increases costs.
Shaping the Future of Healthcare with Google Cloud
Learn how healthcare organizations are leveraging Google Cloud Platform to help reduce operational spend while increasing revenue, improving the quality of care, and meeting industry standards.
In a study done by Fivetran, Michael Kaminsky provides definitive benchmark proof that today’s modern data management platforms like Snowflake, Redshift, and BigQuery Perform significantly faster when they use wide tables as opposed to a star schema dimensional model. This is because when we join data together into one big wide table (OBT) the modern data management platform does not have to do any joins or at least fewer joins. You see, the single most expensive performance operation in a relational platform is still the SQL JOIN. When we combine this with previous knowledge that data visualization, reporting, and many data science tools perform better using wide tables it becomes clear that OBT or wide tables are a preferable table architecture.
Help Data Users and Knowledge Workers Better Understand the Data
Perhaps in 1996, this was plausible because data workers were primarily within Information Technology (IT) organizations and because they knew SQL well. However, I am not really convinced this argument was ever accurate. In 2022, knowledge workers are most often in the business organization. They are not IT employees they are business people trying to make better business decisions based on data. For many, the only way/tool they know for working with data is a spreadsheet. This means the star schema is not understandable and perhaps is even unusable as they lack the knowledge necessary to join information from multiple tables. At the very least using a spreadsheet to join information from multiple star schema dimensional models is time and effort intensive. However, OBT wide table is familiar to these data workers, it looks and feels like a spreadsheet. The OBT/wide table is much easier for the modern data worker to understand, and I think much easier for everyone to understand.
When we closely examine the reasons that Ralph Kimball himself said a star schema dimensional model was effective it is clear, that they no longer hold true. You need to stop using star schema dimensional models. They are ludicrously expensive, decreasing performance, and inhibiting your data/knowledge workers. For an explanation of what you should be doing instead look for my next blog.
Perficient’s Cloud Data Expertise
Our cloud, data, and analytics team can assist with your entire data and analytics lifecycle, from data strategy to implementation. We will help you make sense of your data and show you how to use it to solve complex business problems. We’ll assess your current data and analytics issues and develop a strategy to guide you to your long-term goals.