As my first-grade teacher would tell me whenever I ended a sentence with this preposition: “It’s between the ‘A’ and the ‘T’.” Well, in this situation, it’s between the “cloud” and the “on premises”.
More and more companies are exploring and using Infrastructure as a Service (IaaS) as a viable option for developing and maintaining their data warehouse. Many providers on the market offer IaaS, including Amazon, AT&T, and Bluelock, to name only a few. We see this market growing rapidly because providers are offering companies environments that are safe, secure, fast, redundant, and cheap. Without a doubt, many companies are also already using Software as a Service (SaaS), where much of their data is likewise stored in the cloud (Salesforce, Workday, Facebook, Twitter, etc.).
Although much of a company’s data is being relocated to and used in the cloud, a lot of it is still on premises (On-Prem) and for all practical purposes will remain there. According to Chris Howard, managing vice president at Gartner, “Hybrid IT is the new IT and it is here to stay. While the cloud market matures, IT organizations must adopt a hybrid IT strategy that not only builds internal clouds to house critical IT services and compete with public CSPs, but also utilizes the external cloud to house noncritical IT services and data, augment internal capacity, and increase IT agility.”
The question then becomes: how do I manage a data environment that is both in the Cloud and On-Prem? And how do I keep the information in sync and current so that I can use the data where appropriate to make better business decisions?
Several software vendors have realized that this problem needs to be addressed quickly (short of manual coding), and they provide solutions in this area. Right now, Informatica is the market leader in data integration, and it also has solutions that easily manage the issues of a hybrid data environment (Cloud and On-Prem). Informatica has been recognized by ChannelWeb as the pioneer of Cloud data integration and by salesforce.com customers as the #1 integration application on AppExchange for the past 5 years.
So why is managing data in the Cloud and On-Prem that easy? From what I have seen of this product, Informatica already offers connectivity to just about everything (well, maybe everything), so it applies the same logic and thought process to extend data integration to everything in the Cloud. That includes data synchronization, data quality, Master Data Management, etc. Informatica has created connectors to many of the SaaS applications in the cloud, so a user of this solution does not need to hand-code anything to connect and start using the service. Plus, if a person already knows how to use any of Informatica’s On-Prem solutions (like PowerCenter, DQ, MDM, etc.), there is little to no learning curve in applying this knowledge to the Cloud solution.
With Informatica’s concept of VIBE (virtual data machine), a person can map once and deploy anywhere. What this means is that a developer can create data mappings in PowerCenter with the On-Prem solution and then run those mappings in the Cloud solution. Mappings can also be created directly in the Cloud product and then run On-Prem if needed.
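To make the “map once, deploy anywhere” idea a little more concrete, here is a minimal Python sketch. It is not Informatica’s API or file format; the names Mapping, Runtime, and normalize_customer are hypothetical and exist only for illustration. The point is simply that the mapping is defined once, independent of where it runs, and then handed to whichever runtime is appropriate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass(frozen=True)
class Mapping:
    """Hypothetical runtime-independent description of a transformation."""
    name: str
    transform: Callable[[Row], Row]


class Runtime:
    """Hypothetical execution engine; only its location differs per deployment."""

    def __init__(self, location: str) -> None:
        self.location = location

    def run(self, mapping: Mapping, rows: List[Row]) -> List[Row]:
        # The same mapping logic is applied regardless of where the runtime lives.
        return [mapping.transform(row) for row in rows]


def normalize_customer(row: Row) -> Row:
    # Trim whitespace and title-case the customer name.
    return {"id": row["id"].strip(), "name": row["name"].strip().title()}


# Define the mapping once...
customer_mapping = Mapping("normalize_customer", normalize_customer)

# ...and run it in either environment.
rows = [{"id": " 17 ", "name": "  acme corp "}]
on_prem_result = Runtime("on-prem").run(customer_mapping, rows)
cloud_result = Runtime("cloud").run(customer_mapping, rows)
```

Both runs produce the same output because the mapping carries all of the logic and the runtime only decides where that logic executes.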
So let’s take a look at the architecture of the Informatica Cloud solution. The main thing to understand is that the company’s data does not pass through Informatica’s environment in the cloud to reach any destination, whether that destination is in the Cloud or On-Prem. When the Informatica Cloud product is installed, a runtime agent is placed in the customer’s environment (yep, behind the firewall if needed), and this is where all the work is done. Metadata about your environments (data about the sources, targets, jobs, transformations, etc.) is stored in the Informatica Cloud, and your integration processes are managed and monitored through a web application. All the work and data movement happens in the customer’s environment. The only actual data that goes to the Cloud is data that you choose to store in the cloud (e.g. Salesforce, your data warehouse in Amazon Redshift, etc.).
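The split described above, metadata in the vendor’s cloud and data movement inside the customer’s environment, can be sketched in a few lines of Python. This is a toy model under my own assumptions, not Informatica’s implementation: CloudControlPlane, LocalAgent, and the store names are all hypothetical. Notice that the control plane only ever sees job definitions and row counts, never the rows themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Row = Dict[str, str]


@dataclass
class CloudControlPlane:
    """Hypothetical hosted service: holds metadata only, never row data."""
    jobs: Dict[str, Dict[str, str]] = field(default_factory=dict)
    run_log: List[Dict[str, object]] = field(default_factory=list)

    def register_job(self, name: str, source: str, target: str) -> None:
        self.jobs[name] = {"source": source, "target": target}

    def report(self, name: str, rows_moved: int) -> None:
        # Only run statistics come back for monitoring.
        self.run_log.append({"job": name, "rows_moved": rows_moved})


@dataclass
class LocalAgent:
    """Hypothetical agent inside the customer's network; it moves the data."""
    stores: Dict[str, List[Row]]

    def execute(self, name: str, control: CloudControlPlane) -> None:
        job = control.jobs[name]            # fetch the job definition (metadata)
        rows = list(self.stores[job["source"]])
        self.stores[job["target"]].extend(rows)  # data goes source -> target directly
        control.report(name, len(rows))     # send back only a row count


control = CloudControlPlane()
control.register_job("load_orders", source="on_prem_db", target="cloud_warehouse")

agent = LocalAgent(stores={
    "on_prem_db": [
        {"order_id": "1001", "customer": "ACME"},
        {"order_id": "1002", "customer": "Globex"},
    ],
    "cloud_warehouse": [],
})
agent.execute("load_orders", control)
```

After the run, the warehouse store holds both orders, while the control plane holds one job definition and one log entry with a row count of 2.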
The product has prebuilt connectors to many Cloud-based solutions, so it’s only a matter of selecting the application that you need to connect to in the Cloud; the Informatica Cloud solution automatically understands its structure and how to access the data stored there. I was very surprised at how quickly and easily a job could be set up to keep On-Prem and Cloud data in sync.
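For readers who have not set up a synchronization job before, here is roughly what such a job does under the hood, sketched in Python. This is a generic one-way, “newest record wins” sync and not the logic Informatica actually uses; the synchronize function, the record layout, and the “modified” field are my own assumptions. Timestamps are ISO 8601 strings in a single format, so comparing them as strings orders them correctly.

```python
from typing import Dict

Record = Dict[str, str]


def synchronize(source: Dict[str, Record], target: Dict[str, Record]) -> int:
    """Copy new or more recently modified records from source to target.

    Records are keyed by id. Returns the number of records written.
    """
    written = 0
    for key, record in source.items():
        existing = target.get(key)
        # Write when the record is missing in the target or the source is newer.
        if existing is None or record["modified"] > existing["modified"]:
            target[key] = dict(record)
            written += 1
    return written


# Illustrative data only.
on_prem = {
    "c1": {"name": "Acme", "modified": "2014-03-01T10:00:00"},
    "c2": {"name": "Globex", "modified": "2014-03-02T09:30:00"},
}
cloud = {
    "c1": {"name": "Acme Inc", "modified": "2014-02-20T08:00:00"},
}

first_run = synchronize(on_prem, cloud)   # updates c1, adds c2
second_run = synchronize(on_prem, cloud)  # nothing left to do
```

Running the job a second time writes nothing, which is the property you want from a scheduled sync: it only touches what has changed.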
Here is a diagram of the architecture that I mentioned earlier. The dotted line represents the management of the metadata in the Informatica Cloud. The company’s actual data travels only between the On-Prem location and the Cloud applications that the company subscribes to… Well, there you go; I ended my blog with a preposition. Forgive me, Mrs. Rita Hart…
Image courtesy of Informatica