Data Virtualization gives IT and the business a way to cut the development time needed to add new data sources. The technology is offered by top software vendors such as IBM and Microsoft (see the Forrester Wave), with Cisco a new entrant after it recently bought Composite. This is not a complete list; there are other players in this market.
Many BI tools offer connectivity to different types of data sources as part of their interface (think ODBC), but that capability falls more on the ETL side of the offering. Virtualization, by contrast, hides the physical table and column names and exposes a common (canonical) model for business users to consume.
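To make the canonical-model idea concrete, here is a minimal sketch that uses an ordinary SQL view in Python's built-in sqlite3 module. The table and column names (CRM_CUST, ERP_CLIENT, customer) are hypothetical, and both "sources" live in one in-memory database for simplicity. A real virtualization tool federates queries across separate systems, but the principle of exposing business-friendly names over physical ones is the same.

```python
import sqlite3

# Two hypothetical physical sources with different naming conventions,
# simplified here into a single in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CRM_CUST (CUST_NO INTEGER, CUST_NM TEXT);
CREATE TABLE ERP_CLIENT (client_key INTEGER, client_name TEXT);
INSERT INTO CRM_CUST VALUES (1, 'Acme');
INSERT INTO ERP_CLIENT VALUES (2, 'Globex');

-- Canonical view: business users only ever see customer_id / customer_name.
CREATE VIEW customer AS
    SELECT CUST_NO AS customer_id, CUST_NM AS customer_name,
           'CRM' AS source_system
    FROM CRM_CUST
    UNION ALL
    SELECT client_key, client_name, 'ERP'
    FROM ERP_CLIENT;
""")

# The consumer queries the logical model, never the physical names.
rows = conn.execute(
    "SELECT customer_id, customer_name, source_system "
    "FROM customer ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 'Acme', 'CRM'), (2, 'Globex', 'ERP')]
```

The source_system column is an optional design choice: it lets business users trace each row back to where it came from while still working with one canonical entity.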
Use case
ETL development for adding new data sources to the Enterprise Data Warehouse (EDW) takes a long time simply because of the rigor needed to load and validate the data. Business users want these new data for analysis or even just for cross checking as soon as possible. Data Virtualization tools make it possible to add new data sources in a much shorter turnaround time: days rather than weeks or months.
Benefits of Data Virtualization:
- Buys time for IT: Provides an interim solution to the business while IT takes the time to build the data integration with proper controls.
- Assess the value of the data: Business users can validate the usability and overall quality of the data and help define the business rules for data cleansing.
- Seamless Deployment: When the data is ready for full integration, IT can switch the data sources underneath the logical layer without interrupting service.
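The seamless-deployment point can be sketched in the same spirit. In this hypothetical example (all table and view names are assumptions), a logical view first exposes a raw source table directly; once the ETL load into the EDW is ready, the view is redefined to read from the warehouse table while the consumer's query text stays exactly the same. Plain SQLite views stand in for a virtualization tool's logical layer here; this is not any vendor's actual mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical raw source, exposed directly while the ETL is being built.
CREATE TABLE src_orders (ORD_ID INTEGER, AMT REAL);
INSERT INTO src_orders VALUES (10, 99.5), (11, -1.0);

-- Hypothetical EDW table, loaded later by ETL with cleansing rules applied.
CREATE TABLE edw_fact_orders (order_id INTEGER, amount REAL);
INSERT INTO edw_fact_orders VALUES (10, 99.5);

-- Phase 1: the logical view points at the raw source.
CREATE VIEW orders AS
    SELECT ORD_ID AS order_id, AMT AS amount FROM src_orders;
""")

# The consumer's query never changes.
CONSUMER_QUERY = "SELECT order_id, amount FROM orders ORDER BY order_id"


def repoint_orders_view(conn, select_sql):
    """Redefine the logical 'orders' view over a new physical source.

    select_sql is assumed to be trusted, admin-supplied SQL.
    """
    conn.execute("DROP VIEW orders")
    conn.execute(f"CREATE VIEW orders AS {select_sql}")


before = conn.execute(CONSUMER_QUERY).fetchall()

# Phase 2: ETL is done, so point the same view at the EDW table.
repoint_orders_view(conn, "SELECT order_id, amount FROM edw_fact_orders")
after = conn.execute(CONSUMER_QUERY).fetchall()

print(before)  # [(10, 99.5), (11, -1.0)]
print(after)   # [(10, 99.5)]
```

The second result drops the negative-amount row, standing in for a record that the (hypothetical) ETL cleansing rules rejected; the consumer never had to change its query.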
IT can leverage Data Virtualization to give power users quick access to the data they need without compromising control. Once the data has proven trustworthy, a broader rollout can follow. Putting proper access processes in place and letting IT manage the metadata (logical) layer is a good way to maintain oversight of usage. These processes give IT the control it needs over the data sources and help avoid operational nightmares.
“Business users want these new data for analysis or even just for cross checking as soon as possible.”
What’s the point of gathering all this data if you can’t use it in a reasonable amount of time? Things change so rapidly that if it takes you months to verify a data set, you might be working with out-of-date data!
Absolutely correct. To clarify the statement: virtualization cuts development time and delivers the data sooner, but there is a trade-off. Data quality may or may not meet expectations. Traditional ETL, on the other hand, offers a robust solution, but development takes longer. In both approaches, once development is complete, the data is made available in line with the business’s timeline expectations. Virtualization augments ETL; it does not replace it.