In a recent post, Perficient’s Arvind Murali explained why the rise of Big Data and data lake technology does not mean that traditional data warehousing is obsolete. There is a place for both, since they address different challenges, and there are plenty of opportunities to integrate them as well. According to Gartner, the market is shifting toward hybrid approaches to data management, as companies look to solve complex challenges with a mix of alternative and traditional deployments.
Arvind discussed some of the similarities between traditional and Big Data solutions, but what are the differences? A data lake is a collection of data assets stored within a Hadoop ecosystem with minimal change to the original format or content of the source data; structure is imposed only when the data is accessed (“schema-on-read”). In contrast, data written to an enterprise data warehouse (EDW) must conform to a predefined schema at the moment it is written (“schema-on-write”).
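To make the schema-on-write versus schema-on-read distinction concrete, here is a minimal, in-memory Python sketch. It is an illustration of the two ideas only, not how Hadoop or any EDW product implements them; the `Warehouse` and `DataLake` classes, the `SCHEMA` dictionary, and the sample patient records are all hypothetical. The warehouse validates each record against a fixed schema before accepting it, while the lake stores raw text untouched and applies a schema only when it is read.

```python
import json
from typing import Any

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA: dict[str, type] = {"patient_id": str, "visit_date": str, "cost": float}


class Warehouse:
    """Schema-on-write: each record is validated before it is stored."""

    def __init__(self, schema: dict[str, type]) -> None:
        self.schema = schema
        self.rows: list[dict[str, Any]] = []

    def write(self, record: dict[str, Any]) -> None:
        # Reject records whose fields or types do not match the schema.
        if set(record) != set(self.schema):
            raise ValueError(f"fields {sorted(record)} do not match the schema")
        for field, expected in self.schema.items():
            if not isinstance(record[field], expected):
                raise TypeError(f"{field!r} must be of type {expected.__name__}")
        self.rows.append(record)


class DataLake:
    """Schema-on-read: raw data is stored as-is; structure is applied at query time."""

    def __init__(self) -> None:
        self.raw: list[str] = []

    def write(self, raw_text: str) -> None:
        # No validation on write: keep the source data in its original form.
        self.raw.append(raw_text)

    def read(self, schema: dict[str, type]) -> list[dict[str, Any]]:
        rows = []
        for text in self.raw:
            try:
                doc = json.loads(text)
            except json.JSONDecodeError:
                continue  # e.g. a free-text note this particular schema cannot interpret
            if not isinstance(doc, dict):
                continue
            row: dict[str, Any] = {}
            for field, cast in schema.items():
                value = doc.get(field)
                try:
                    # Missing fields become None; present values are cast to the requested type.
                    row[field] = None if value is None else cast(value)
                except (TypeError, ValueError):
                    row[field] = None  # value exists but cannot be interpreted under this schema
            rows.append(row)
        return rows


lake = DataLake()
lake.write('{"patient_id": "p1", "visit_date": "2017-03-01", "cost": "120.50", "device": "watch"}')
lake.write("Patient reports mild headache.")  # unstructured note, stored anyway
print(lake.read(SCHEMA))  # [{'patient_id': 'p1', 'visit_date': '2017-03-01', 'cost': 120.5}]

warehouse = Warehouse(SCHEMA)
try:
    warehouse.write({"patient_id": "p1", "visit_date": "2017-03-01", "cost": "120.50"})
except TypeError as err:
    print("rejected at write:", err)  # cost is a string, not a float
```

The trade-off shows up in where the work happens: the warehouse pays the validation cost once, up front, so every stored row is consistent and ready for routine reporting, while the lake defers that cost to each read, which lets the same raw data (including notes and extra fields) be reinterpreted later under a different schema.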
Traditional EDWs are built to serve a large number of self-service business users who need day-to-day clinical, financial and operational reporting drawn from structured, pre-processed data. Data lakes, on the other hand, are designed as highly agile, configurable alternatives for answering complex questions using all available data sources.
Within the healthcare industry, there are two major use cases for data lakes: predicting comprehensive healthcare costs and providing evidence-based care. In both cases, data lakes improve analysis by providing access to information that was traditionally unavailable. These new sources may include personal health records, unstructured doctors’ notes, smart device data feeds, clinical trial research, genomic research, and many others. This expanded pool of information gives healthcare providers deeper insight into care decisions and costs by combining new data with the traditional source of information: electronic medical records.
If you’d like more information on these use cases for data lakes in the healthcare industry and best practices in architecture, check out our guide: