I often work with organizations that need to move data out of their transactional operational systems into a repository in real time. These organizations typically want to reduce the query, reporting, and analytics load on operational systems while expanding real-time query and analytics applications to make better data-driven decisions. This operational data store (ODS) concept is not new, but given the massive amounts of data that typically need to be moved and managed, it was an architecture limited by data technology until recently. In this post, we will define what an ODS is, explain how it differs from a data lake, and cover the recent data technology advancements that have made the ODS exciting again.
What is an ODS?
An ODS provides current (real-time), clean data from multiple sources in a single place, and its benefits apply primarily to business operations.
- The ODS provides a consolidated repository that integrates data from multiple IT systems.
- Because ODS analytics and reporting draw on data from multiple integrated operational systems, they can be more sophisticated and more complete than reports from the individual underlying systems. The ODS is architected to provide a consolidated view of data integrated from multiple systems, so reports can offer a holistic view of operational processes.
- The up-to-date view of operational status also makes it easier for users to diagnose problems before digging into component systems.
- An ODS contains critical, time-sensitive business rules. These rules, in the aggregate, are a kind of process automation that greatly improves efficiency, which would be impossible without current and integrated operational data.
ODS is not a Data Lake!
ODS is designed for a different purpose than a data lake.
- An ODS may serve as a staging area for a data lake; it sits between the data sources and the data lake.
- An ODS deals exclusively with current operational data and continuously overwrites records in place. A data lake, by contrast, continually appends records and can aggregate data across historical views.
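The overwrite-versus-append distinction can be sketched in a few lines. This is a minimal, illustrative sketch only (plain dictionaries and lists standing in for real storage engines, not any specific product API):

```python
# Illustrative sketch: the same change event is applied differently by an
# ODS (current state only) and a data lake (full history).

ods = {}   # ODS stand-in: keyed by primary key, updates overwrite
lake = []  # data lake stand-in: append-only log of every record received

def apply_change(record: dict) -> None:
    """Apply one change event to both stores."""
    ods[record["id"]] = record   # upsert: only the current version survives
    lake.append(dict(record))    # append: every version is retained

apply_change({"id": 1, "status": "pending"})
apply_change({"id": 1, "status": "shipped"})

print(ods)        # one current row per key
print(len(lake))  # both versions kept for historical analysis
```

Running this leaves the ODS with a single up-to-date row for key 1, while the lake holds both the "pending" and "shipped" versions.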
There are two primary technologies that have rejuvenated the ODS architecture. The first is Change Data Capture (CDC). CDC is the process of identifying and capturing changes as they are made in a database or source application, then delivering those changes in real time to a downstream process, system, ODS, or data lake. In recent years, CDC has become both faster and more reliable. Faster means we can now move data from operational systems to an ODS in just a few seconds. More reliable means users can trust the data in the ODS with full confidence in its accuracy. I have been working with the Qlik CDC tools and have found them excellent: flexible, fast, and easy to use.
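The CDC pattern itself is straightforward: each change captured at the source arrives as an ordered event (insert, update, or delete) and is applied to the target. The sketch below is a hedged illustration of that pattern only; the event names and shapes are assumptions for this example, not the actual Qlik tool's API or format:

```python
# Hedged sketch of the CDC apply loop. The event shape ("op", "key", "row")
# is an assumption for illustration, not a real CDC product's format.

ods = {}  # target store keyed by primary key

def apply_cdc_event(event: dict) -> None:
    """Apply a single change event (insert/update/delete) to the ODS."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        ods[key] = event["row"]  # upsert keeps the ODS current
    elif op == "delete":
        ods.pop(key, None)       # deletions propagate as well

# A small stream of captured changes, applied in source order.
stream = [
    {"op": "insert", "key": 42, "row": {"order": 42, "status": "new"}},
    {"op": "update", "key": 42, "row": {"order": 42, "status": "paid"}},
    {"op": "insert", "key": 43, "row": {"order": 43, "status": "new"}},
    {"op": "delete", "key": 43},
]
for event in stream:
    apply_cdc_event(event)

print(ods)  # order 42 reflects its latest state; order 43 is gone
```

Because events are applied in the order they occurred at the source, the ODS always converges to the source system's current state.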
The second technology for implementing an ODS is Google's AlloyDB for PostgreSQL. You know I think GCP is the best data management platform on the planet, and AlloyDB supports my belief. AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible database service designed for the most demanding workloads, including hybrid transactional and analytical processing. AlloyDB for PostgreSQL is two times faster than PostgreSQL for write transactions and 200 times faster than PostgreSQL for analytics transactions.
With Qlik CDC and Google’s AlloyDB for PostgreSQL, the ODS Architecture is a viable option for real-time data movement, integration, and decision-making.
Perficient’s Cloud Data Expertise
The world’s leading brands choose to partner with us because we are large enough to scale major cloud projects, yet nimble enough to provide focused expertise in specific areas of your business. Our cloud, data, and analytics team can assist with your entire data and analytics lifecycle, from data strategy to implementation. We will help you make sense of your data and show you how to use it to solve complex business problems. We’ll assess your current data and analytics issues and develop a strategy to guide you to your long-term goals.
Download the guide, Becoming a Data-Driven Organization with Google Cloud Platform, to learn more about Dr. Chuck's GCP data strategy.