A Low Cost Big Data Integration Option?

With all of the interest in big data in healthcare, it’s easy to get drawn in by the excitement and forget that big data is not a silver bullet that will solve all of your data and infrastructure problems. Unless you can understand and integrate your data, loading it all onto a platform like Hadoop or Cassandra probably won’t provide the benefit you’re looking for. For the right kinds of use cases, a big data platform does offer real benefits, such as increased scalability, better performance, and potentially lower TCO.

There are many integration tools on the market that perform well. However, I’d like to propose Semantic Web technologies as a low-cost alternative to traditional data integration. Many of the tools are open source and are based on W3C standards such as RDF (Resource Description Framework) and OWL (Web Ontology Language).

Using Semantic Web technologies to enable integration (the Linked Open Data initiative for integrating data across the internet is one example) can, besides being less expensive, provide significant advantages for automated inferencing of new data that would previously have required specialized programming to derive. Indeed, your Semantic Web environment can serve as the knowledge base for artificial intelligence.

To enable enterprise integration, you probably won’t start by converting all of your data into RDF/OWL (in fact, I wouldn’t recommend it). Instead, you might leverage converters such as DB2RDF, which translate RDBMS data into RDF triples on the fly at query time.
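To make the idea of on-the-fly conversion concrete, here is a minimal sketch in Python using only the standard library. It is not how DB2RDF or any particular converter works; it only illustrates the general pattern of exposing relational rows as subject-predicate-object triples when they are requested, without storing them. The namespace `http://example.org/systemA/`, the `patient` table, and the function name `rows_as_triples` are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical namespace for the source system (an assumption for this sketch).
BASE = "http://example.org/systemA/"


def rows_as_triples(conn, table, key_column):
    """Lazily yield (subject, predicate, object) triples for every row.

    The table name is interpolated directly into the SQL, so it must come
    from trusted configuration, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        # One subject URI per row, built from the table name and key value.
        subject = f"{BASE}{table}/{record[key_column]}"
        for column, value in record.items():
            if value is None:
                continue  # a NULL column simply produces no triple
            yield (subject, f"{BASE}{table}#{column}", value)


# Small in-memory database standing in for an existing RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (PAT_ID INTEGER PRIMARY KEY, NAME TEXT, BIRTH_YEAR INTEGER)"
)
conn.execute("INSERT INTO patient VALUES (1, 'Ada Example', 1970)")

triples = list(rows_as_triples(conn, "patient", "PAT_ID"))
for triple in triples:
    print(triple)
```

Because the function is a generator, nothing is materialized until a consumer asks for triples, which is the property that lets the relational database remain the system of record.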

Converting information to RDF/OWL (at query time or in a triple store) can bridge the semantic divide more easily. Different systems call the same thing by different names, which makes integration confusing. For example, in System A, patients are identified using a PAT_ID; in System B it’s MPI_ID, and so on. Using an OWL equivalence construct such as owl:sameAs, this mapping can be handled in a single triple (A:PAT_ID owl:sameAs B:MPI_ID) in our mapping ontology. (Strictly speaking, owl:sameAs relates individuals; when the two identifiers are modeled as properties, owl:equivalentProperty is the matching construct.) Then, using SPARQL (the Semantic Web query language), this data can be integrated at query time. Of course, this is a greatly simplified example, but given the hundreds or thousands of systems that large healthcare organizations have, a comprehensive view of the patient can provide tremendous value.
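The mapping idea can be sketched in a few lines of plain Python. This is a toy stand-in for a triple store and a SPARQL query, not a real reasoner: triples are tuples, the mapping ontology is the single triple from the example, and the record names, diagnosis, and allergy values are invented for illustration. The function names `build_aliases` and `patient_view` are likewise hypothetical.

```python
# Toy data from two hypothetical systems, written as (subject, predicate, object).
TRIPLES = [
    ("A:rec1", "A:PAT_ID", "12345"),
    ("A:rec1", "A:diagnosis", "asthma"),
    ("B:rec9", "B:MPI_ID", "12345"),
    ("B:rec9", "B:allergy", "penicillin"),
]

# The mapping ontology: one triple stating the two identifiers are equivalent.
MAPPING = [("A:PAT_ID", "owl:sameAs", "B:MPI_ID")]


def build_aliases(mapping):
    """Map each right-hand name in an equivalence triple to its left-hand name."""
    aliases = {}
    for left, predicate, right in mapping:
        if predicate == "owl:sameAs":
            aliases[right] = left
    return aliases


def patient_view(triples, mapping, id_property, patient_id):
    """Collect facts about one patient across systems at query time."""
    aliases = build_aliases(mapping)

    def canonical(predicate):
        return aliases.get(predicate, predicate)

    # Records in any system whose (mapped) identifier matches the patient.
    records = {
        s for s, p, o in triples
        if canonical(p) == id_property and o == patient_id
    }
    # Every other fact attached to those records.
    return sorted(
        (s, p, o) for s, p, o in triples
        if s in records and canonical(p) != id_property
    )


view = patient_view(TRIPLES, MAPPING, "A:PAT_ID", "12345")
for fact in view:
    print(fact)
```

The caller asks only for `A:PAT_ID`; the record from System B is found because the mapping triple says `B:MPI_ID` means the same thing. In a real deployment a triple store with OWL reasoning and a SPARQL query would play this role.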

Data integration is a good starting point for using Semantic Web technologies: your users don’t have to know what the data is called in all the various systems. They can do analysis using terminology they’re familiar with and don’t have to adjust their vocabulary; as long as the terminology and data are mapped (relatively simply, as demonstrated above), they can find the information they need.

However, the real icing on the cake for Semantic Web technologies lies in machine reasoning: having new data inferred for you. Semantic Web technologies have been likened to “programming by example” and are very powerful for encapsulating data, metadata, business rules, and transformation logic in a common standard format.
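As a rough illustration of what “having new data inferred” means, the sketch below applies two standard RDFS-style rules by brute force: subclass relationships are transitive, and an instance of a class is also an instance of its superclasses. It is a minimal forward-chaining loop in plain Python, not a substitute for an OWL reasoner, and the `ex:` class and instance names are invented for the example.

```python
def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new facts appear."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p != "rdfs:subClassOf":
                continue
            for s2, p2, o2 in facts:
                # Rule 1: A subClassOf B and B subClassOf C => A subClassOf C
                if p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdfs:subClassOf", o2))
                # Rule 2: x type A and A subClassOf B => x type B
                if p2 == "rdf:type" and o2 == s:
                    new.add((s2, "rdf:type", o))
        if new <= facts:
            return facts  # fixed point reached: nothing new was derived
        facts |= new


# Hypothetical asserted facts: a small class hierarchy and one diagnosis record.
asserted = [
    ("ex:Type2Diabetes", "rdfs:subClassOf", "ex:Diabetes"),
    ("ex:Diabetes", "rdfs:subClassOf", "ex:ChronicCondition"),
    ("ex:dx42", "rdf:type", "ex:Type2Diabetes"),
]

inferred = infer(asserted)
for fact in sorted(inferred - set(asserted)):
    print(fact)
```

Nobody asserted that `ex:dx42` is a chronic condition, yet a query for chronic conditions would now find it. The rules live with the data rather than in specialized application code, which is the point of the “programming by example” comparison.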
