I’m not really asking the question; I’m setting the stage for a topic that has been on my mind recently. For background, I grew up through the data architecture ranks, so the problems I see our project teams experiencing seem natural to me (if not basic). In my mind it has always been this difficult; I just got used to it, so I no longer think about it. However, that doesn’t help the teams that have never experienced the ‘data anomaly’ issue and had to spend a week chasing ghosts in the data.
For starters, let’s define two different projects:
Project 1 is a web design project that has a couple of forms that allow the user to enter data.
Project 2 takes that data and combines it with some other data and produces a couple of reports.
The testing for Project 1 is fairly straightforward. For example: Are all of the pages there? When I click Save, does the data get stored, and can I retrieve it later?
**Disclaimer: I realize web projects can be more complex. This is just an example.**
The testing for Project 2, however, is much different. For example:
– What business rules are governing the combination of the two datasets?
– What business rules were already applied to the dataset we’re combining with?
– Are the two datasets at a compatible level of granularity (detail), such that it makes logical sense to combine them?
– Finally, can the client visualize in their mind how these factors interact with each other and how this new dataset will be used?
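To make the granularity question concrete, here is a minimal Python sketch. The `orders` and `order_lines` datasets and the helper functions are hypothetical and purely illustrative. Joining an order-level dataset to a line-level one without reconciling the grain silently duplicates the order totals, which is exactly the kind of ‘data anomaly’ that can take a week to track down.

```python
from collections import defaultdict

# Hypothetical dataset A: one row per order (order-level grain).
orders = [
    {"order_id": 1, "customer": "acme", "order_total": 100.0},
    {"order_id": 2, "customer": "globex", "order_total": 50.0},
]

# Hypothetical dataset B: one row per order line (line-level grain).
order_lines = [
    {"order_id": 1, "sku": "A", "qty": 1},
    {"order_id": 1, "sku": "B", "qty": 2},
    {"order_id": 2, "sku": "C", "qty": 5},
]


def join_on_order_id(orders, lines):
    """Inner join on order_id. If `lines` has several rows per order, each
    order row is repeated once per match, so order-level measures such as
    order_total end up duplicated."""
    by_order = defaultdict(list)
    for line in lines:
        by_order[line["order_id"]].append(line)
    return [
        {**order, **line}
        for order in orders
        for line in by_order[order["order_id"]]
    ]


def revenue_by_customer(rows):
    """Sum order_total per customer over whatever rows it is given."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["order_total"]
    return dict(totals)


def aggregate_lines(lines):
    """Roll line-level data up to the order grain before combining."""
    qty = defaultdict(int)
    for line in lines:
        qty[line["order_id"]] += line["qty"]
    return [{"order_id": oid, "total_qty": q} for oid, q in qty.items()]


naive = join_on_order_id(orders, order_lines)             # grain mismatch
safe = join_on_order_id(orders, aggregate_lines(order_lines))  # grains match

print(revenue_by_customer(orders))  # {'acme': 100.0, 'globex': 50.0}
print(revenue_by_customer(naive))   # {'acme': 200.0, 'globex': 50.0}  acme double counted
print(revenue_by_customer(safe))    # {'acme': 100.0, 'globex': 50.0}
```

Rolling the line-level data up to the order grain before the join is one common fix. The point is that nothing errors out in the naive version; the report simply shows the wrong number.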
Obviously there are many other factors that can affect the outcome of Project 2, but the point here is that data analytics projects are built around ‘black box’ functionality that is difficult for people to understand. Understanding the individual steps is one thing; visualizing a working data analytics machine is something completely different.
In reality, if Project 2 were a true BI project, there would probably be a number of additional black boxes, e.g. ODS (operational data store), DM (data mart), DW (data warehouse), Cube, Semantic Layer, Universe, etc. Every one of them adds another layer of complexity to the final solution, which makes it even harder for our clients to fully grasp what is being built.
So what do we do to make this process as painless as possible? Let’s explore this in the next post…