Why Does Data Warehousing Take So Long? Part 2

In my last post, I wrote about BI’s reputation for being a long, drawn-out process. And of course, where the market sees a vacuum… A plethora of products and processes exist to aid with “shortcutting” BI/DW development. So this time, I want to look at some common factors at play when you use the various types of frameworks, accelerators, and off-the-shelf products. I also want to point out how you can easily miss the boat on the long-term benefits of a BI solution by depending too much on these tools.
The market contains some pretty sophisticated enterprise metadata management tools, with full-featured user interfaces that provide drag-and-drop interaction with connected data sources, auto-generate all kinds of artifacts, and so on. On that end of the spectrum, you enter the realm of data virtualization and cloud-based solutions, Big Data, and big vendors with pretty huge brand names (e.g. Teradata, Informatica, IBM Cognos). Although tools at this level do significantly more than just generically speed up the BI process, they do offer a collection of “accelerator” features. Down the cost scale, other tools in this segment are a little bit more “black box,” and will, say, create ETL from scratch based on comparing data models (like WhereScape RED), or generate dashboards by just pointing the tool at a set of tables (like Tableau or QlikView). And still others are merely frameworks, either ETL or front-end building blocks that essentially allow you to Lego yourself a BI solution (e.g. the growth of BIML for SSIS-based ETL).

But this tool class sometimes offers to speed up the process by deferring the requirements gathering piece until after the data is exposed, or even indefinitely. The users then drive requirements by identifying the data they want among what’s been exposed. The base assumption here is that the data people select the most will ultimately illuminate the “true” requirements. So you let people at the data first, and then you create standard artifacts like reports and dashboards based on what they end up using.
The underlying risk in these tools is that they can let the business off the hook for understanding its own requirements. They promote solutions that “back into” fulfilling needs, on the assumption that whatever data is needed is either already available or cheap and easy to access. In the end, this approach may speed up the development cycle, but it exposes the Achilles’ heel of this entire class of tools: they accelerate gathering and exposing data, but they don’t directly expose the business processes underlying that data. That gap is exactly where basic BI requirements analysis delivers its long-term improvement, because speaking to users about why and how they use the data not only identifies the necessary data points, but also helps spot process inefficiencies and opportunities for improvement in data delivery and consumption. A truly valuable BI/DW solution comes from rationalizing business processes and building an information architecture that eliminates unnecessary stops and channels when getting people the information they need.
Am I just being a curmudgeon, standing on my virtual porch and yelling at the virtual kids to get off my data landscape? Honestly, no. I think that leveraging these tools can be valuable and expedient in setting up a baseline system, to which tweaks are then made based on user input into requirements, of course. But I think letting users sort through the resultant towering piles of data like WALL-E is an opportunity for a project to go off the rails. And I think it transfers data professionals’ work onto users, which to me seems like a dubious compromise to “hurry up” the BI development timeline.
Ultimately, I think there’s a balance to strike between gaining foundational benefit from a tool (e.g. via auto-construction of a baseline SSIS package library, or generation of data mapping documentation) and still maintaining communication with users at the level of intent, because the “why and what for” element of the system is critical.
Next time: will Agile processes speed up my BI development timeline?

Andrew Tegethoff

Andy leads Perficient's Microsoft BI team. He has 16 years of IT and software experience with a primary focus on Enterprise Information Management solutions using the Microsoft Data Platform.
