There is a lot of interest in Big Data these days. Big Data is commonly defined as data sets too large (e.g., hundreds of terabytes to petabytes) to be handled by traditional means. Leveraging Big Data requires extensive parallelism of both data and computation, typically via a Massively Parallel Processing (MPP) architecture, and may take advantage of low-cost commodity hardware and open source software to help drive down costs. Healthcare is certainly an industry that generates massive volumes of data, yet that data is very often not leveraged anywhere near its full potential.
Healthcare providers and payers often want to mine their data for insight into how to improve clinical care and operations, but may not want to face the expense and delay of building an Enterprise Data Warehouse just to investigate a hypothesis. In such a situation, a Data Scientist might benefit from having all HL7 messages available in raw form, giving access to all the data without the restructuring that usually takes place when HL7 data is integrated into downstream systems (see the sketch below). Personalized medicine will require massive amounts of patient-specific genomic, proteomic, metabolic, and other data. Evidence-based medicine may benefit from intensive text mining of unstructured data (medical literature, physicians' notes, etc.) to better align practices with established norms. And streaming HL7 messages and other real-time data can enable organizations to proactively monitor and respond to conditions such as potential illness outbreaks.
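As a concrete illustration, the short Python sketch below shows how raw HL7 v2.x messages could be queried directly, without first restructuring them into a warehouse schema. The sample message, field positions, and function names here are illustrative assumptions, not taken from any particular system.

```python
# A minimal sketch of mining raw HL7 v2.x messages directly.
# Assumes pipe-delimited messages with segments separated by carriage
# returns; the sample message and OBX field positions below follow the
# HL7 v2 conventions but are illustrative, not from a real system.

SAMPLE_MESSAGE = (
    "MSH|^~\\&|LAB|GENHOSP|EHR|GENHOSP|202401151200||ORU^R01|12345|P|2.3\r"
    "PID|1||555-44-3333||DOE^JOHN||19700101|M\r"
    "OBX|1|NM|GLU^Glucose||182|mg/dL|70-110|H\r"
)

def parse_hl7(message: str) -> list[list[str]]:
    """Split a raw HL7 v2 message into segments, each a list of fields."""
    segments = [s for s in message.strip().split("\r") if s]
    return [segment.split("|") for segment in segments]

def abnormal_results(message: str) -> list[tuple[str, str, str]]:
    """Return (test, value, units) for OBX segments flagged abnormal."""
    results = []
    for fields in parse_hl7(message):
        # OBX fields (per the HL7 v2 spec): OBX-3 = observation
        # identifier, OBX-5 = value, OBX-6 = units, OBX-8 = abnormal
        # flag. List indices line up because fields[0] is the segment
        # name itself.
        if fields[0] == "OBX" and len(fields) > 8 and fields[8] in ("H", "L"):
            results.append((fields[3], fields[5], fields[6]))
    return results

if __name__ == "__main__":
    print(abnormal_results(SAMPLE_MESSAGE))
    # -> [('GLU^Glucose', '182', 'mg/dL')]
```

The point of the sketch is the design choice: keeping messages in their raw form means nothing is lost to an upfront schema, and a hypothesis like "how common are abnormal glucose results?" can be tested by scanning the messages as they are, which is the kind of workload MPP platforms parallelize well.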
Big Data, of course, should not be thought of as a silver bullet. Issues of data governance, security, and privacy must be addressed and require careful thought, given the new paradigms that might be employed. Not every Big Data technology (e.g., NoSQL databases) will be applicable or suitable for every situation, and the tradeoffs have to be carefully analyzed. A learning curve should also be expected before Big Data capabilities can be fully exploited.