In part 1 of this series, we discussed some of the most common assumptions associated with Big Data Proof of Concept (POC) projects. Today, we’re going to begin exploring the next stage in Big Data POC definition – “The What.”
The ‘What’ for Big Data has gotten much more complicated in recent years and now involves several key considerations:
- What business goals are involved – this is perhaps the most important part of defining any POC yet, strangely, it is often ignored in POC efforts.
- What scope is involved – for our purposes this means how much of the potential solution architecture will be evaluated. This can be highly targeted (database layer only) or can be comprehensive (an entire multi-tiered stack).
- What technology is involved – this one is tricky because people often view a POC only in the context of proving a specific technology (or technologies). However, our recommended approach involves aligning technologies and business expectations up front – thus the technology isn’t necessarily the main driver. Once the goals are better understood, selecting the right mix of technologies becomes supremely important. There are different types of Big Data databases and a growing list of BI platforms to choose from – these choices are not interchangeable, and some are much better tailored to specific tasks than others.
- What platform is needed – this is one of the first big technical decisions associated with both Big Data and Data Warehouse projects these days. While Big Data evolved sitting atop commodity hardware, now there are a huge number of device options and even Cloud platform opportunities.
- What technical goals or metrics are required – this consideration is of course what allows us to determine whether we’ve achieved success or not. Often, organizations think they’re evaluating technical goals but don’t develop sufficiently detailed metrics in advance. And of course these metrics need to be tied to specific business goals as well.
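To make that last point concrete, here is a minimal sketch of what "sufficiently detailed metrics" might look like when written down before the POC starts. Everything in it is an illustrative assumption rather than a recommendation: the `Metric` class, the `evaluate` helper, the metric names, the business goals and the target values are all hypothetical. The point is only that each metric carries a numeric target and the business goal it supports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One measurable POC success criterion, tied to a business goal."""
    name: str
    business_goal: str
    target: float
    unit: str
    higher_is_better: bool = True

    def is_met(self, measured: float) -> bool:
        # Throughput-style metrics must reach the target;
        # latency-style metrics must stay at or below it.
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


def evaluate(metrics, measurements):
    """Map each metric name to True, False, or None (not yet measured)."""
    results = {}
    for metric in metrics:
        value = measurements.get(metric.name)
        results[metric.name] = None if value is None else metric.is_met(value)
    return results


# Hypothetical criteria agreed on before the POC begins.
POC_METRICS = [
    Metric("query_latency_p95", "Analysts get interactive answers",
           target=5.0, unit="seconds", higher_is_better=False),
    Metric("ingest_rate", "Load a full day of events overnight",
           target=50000.0, unit="rows/second"),
]

# Hypothetical measurements collected during the POC so far.
measured = {"query_latency_p95": 3.2}
results = evaluate(POC_METRICS, measured)
```

A result of `None` flags a metric that was defined but never measured, which is a common way for a POC to end without a clear verdict.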
Once we get through those first five items, we’re very close to having a POC Solution Architecture. But how is this Architecture represented and maintained? Typically, for this type of Agile project, there will be three visualizations:
- A conceptual view that allows business stakeholders to understand the core business goals as well as technical choices (derived from the exploration above).
- A logical view which provides more detail on some of the data structure/design as well as specific interoperability considerations (such as login between DB and analytics platform if both are present). This could be done using UML or freeform. As most of these solutions will not include Third Normal Form (3NF) Relational approaches, the data structure will not be presented using ERD notation. We will discuss how to model Big Data in a future post.
- There is also often a need to represent the core technical architecture – server information, network information and specific interface descriptions. This isn’t quite the same as a strict data model analogy (Conceptual, Logical, Physical). Rather, this last representation is simply the final level of detail for the overall solution design (not merely the DBMS structure).
It is also not uncommon to represent one or more solution options in the conceptual or logical views – which helps stakeholders decide which approach to select. Usually, the last view or POC technical architecture is completed after the selection is made.
There is another dimension to “The What” that we need to consider as well – the project framework. This project framework will likely include the following considerations:
- Who will be involved – both from a technical and business perspective
- Access to the capability – the interface (in some cases there won’t be open access to this and then it becomes a demo and / or presentation)
- The processes involved – what this means essentially is that the POC is occurring in a larger context; one that likely mirrors existing processes that are either manual or handled in other systems
The POC project framework also includes identification of individual requirements, the overall timeline and specific milestones. In other words, the POC ought to be managed as a real project. The project framework also serves as part of the “How” of the POC, but at first it represents the overall parameters of what will occur and when.
So, let’s step back a moment and take a closer look at some of the top level questions from the beginning. For example, how do you determine a Big Data POC scope? That will be my next topic in this series.
copyright 2014, Perficient Inc.