Perficient Enterprise Information Solutions Blog

Posts Tagged ‘Big Data’

Data Quality – Don’t Fix It If It Ain’t Broke

What is broke?  If I drive a pickup truck around that has a small, unobtrusive crack in the windshield and a few dings in the paint, it will still pull a boat and haul a bunch of lumber from Home Depot. Is the pickup broke if it still meets my needs?

So, when is data broke? In our legacy data integration practices, we would profile data and identify everything that was wrong with it. Orphan keys, inappropriate values, and incomplete data (to name a few) would be identified before data was moved. In the more stringent organizations, data would need to be near perfect before it could be used in a data warehouse. This ideal world of perfect data was strived for, but rarely attained. It was too expensive, required too much business buy-in, and lengthened BI and DW projects.
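As a rough illustration of that legacy profiling step, here is a minimal sketch using pandas; the file, table, and column names are purely hypothetical, not from the post, and real profiling tools do far more than this:

```python
import pandas as pd

# Hypothetical extracts; table and column names are illustrative only.
orders = pd.read_csv("orders.csv")        # includes a customer_id foreign key
customers = pd.read_csv("customers.csv")  # master list of customer_id values

# Orphan keys: order rows whose customer_id has no matching customer record.
orphans = orders[~orders["customer_id"].isin(customers["customer_id"])]

# Inappropriate values: e.g. negative or absurdly large order amounts.
bad_amounts = orders[(orders["amount"] < 0) | (orders["amount"] > 1_000_000)]

# Incomplete data: rows missing required fields.
required = ["customer_id", "order_date", "amount"]
incomplete = orders[orders[required].isna().any(axis=1)]

print(f"orphan keys: {len(orphans)}, "
      f"bad amounts: {len(bad_amounts)}, "
      f"incomplete rows: {len(incomplete)}")
```

The question the post raises is not how to find these defects, but whether fixing all of them is worth the cost when the data already meets the business need.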

Big Data Changes Everything – Has Your Governance Changed?

A few years ago, Big Data/Hadoop systems were generally a side project, used either for storing bulk data or for analytics. But now, as companies pursue a data unification strategy and leverage the Next Generation Data Architecture, Big Data and Hadoop systems are becoming a strategic necessity in the modern enterprise.


Big Data and Hadoop are technologies with enormous promise and a broad, deep value proposition. So why are enterprises struggling to see real-world results from their Big Data investments? Simply put, it is governance.

Analytics in the Digital Transformation Era

Successful enterprises compete on many capabilities: product excellence, customer service, and marketing, to name a few. Increasingly, the back office / Information Technology (IT) is becoming a strategic player in the Digital Business Model that supports these key capabilities. In other words, back office/IT capability itself is becoming a differentiator. Key strategies like Customer Excellence, Product Excellence, and Market Segmentation all depend on a successful Digital Business Model.

Having more data, especially noisy data, is complex to deal with, and new platforms and tools are a must for handling it. Working only with internally captured enterprise data to answer strategic questions like “Should there be a pricing difference between life, annuities, and long-term care?” or to set a benchmark for “servicing cost per policy for life, annuities, and long-term care” can only go so far. Ingesting and integrating external data, including machine data, will change the way pricing and segmentation are done today.

In the technology space, a wide variety of capabilities, from tools, platforms, and architectures that offer time-to-market advantages to leading-edge predictive and prescriptive models, enable the business to operate and execute efficiently. What this all means is that the business has to embrace a Digital transformation that is happening faster than ever.

Traditional Analytics

Key strategies from IT should include two kinds of applications / platforms, one for new analytical methods and one for old. The first kind handles slow-moving, traditional enterprise data, which ends up in the warehouse and is made available for ‘what happened’ questions, traditional reporting, and business intelligence / analytics.

Fast Analytics

The second kind is the real-time analytical response to the interactive customer, keeping in constant touch through multiple channels while providing seamless interaction and user experience. The technologies, platforms, architectures, and applications are different for these two types of processing.

In the new world of information management, traditional enterprise applications and the data warehouse become just another source rather than the complete source of data. Even the absence of data is relevant information if the context is captured. Analytics is becoming more real-time, with adaptive algorithms influencing different outcomes based on contextual data. Building modern information platforms to address these two different needs of the enterprise is becoming the new standard.

Lambda Architecture for Big Data – Quick peek…

In the Big Data world, the Lambda architecture created by Nathan Marz is a standard technique applied to many predictive analytics problems. The architecture delivers both streaming data and batch data, combining past information with current changes to produce a comprehensive platform for a predictive framework.

Lambda Architecture

At a very high level, the architecture has three components:

  • Batch Layer, which holds all the processed batch data from the past.
  • Speed Layer, which provides a real-time feed of similar or the same information.
  • Serving Layer, which holds the batch views relevant to the queries needed by the predictive analytics.

Lambda architecture addresses the problem that the intended output can change because of code changes. In other words, enhancing the code for better data processing is possible because the original input data is kept intact and read only, so views can simply be recomputed. Whether Lambda architecture is an exception to the CAP theorem, as some claim, is debatable.

In reality, programming for the batch layer and the stream typically requires two different code bases. This is an issue because business logic and other enhancements have to be implemented in two different places. Creating a single API over both batch and real-time data is one way to hide the complexity from higher-level code, but the fact remains that there are two different processing branches at the lower level. A minimal sketch of that idea follows.
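This toy sketch is illustrative only (the class and view names are hypothetical, not from the post): a serving-layer object exposes one query call that transparently merges a precomputed batch view with the speed layer’s incremental view.

```python
from collections import defaultdict

class LambdaServingLayer:
    """Toy sketch: one query API over separately maintained batch and speed views."""

    def __init__(self):
        # Batch view: recomputed periodically from the immutable master dataset.
        self.batch_view = defaultdict(int)
        # Real-time view: incremented as events stream in since the last batch run.
        self.realtime_view = defaultdict(int)

    def load_batch_view(self, precomputed_counts):
        """Replace the batch view with the latest recomputation."""
        self.batch_view = defaultdict(int, precomputed_counts)
        self.realtime_view.clear()  # the new batch view absorbs older events

    def on_event(self, key):
        """Speed-layer update for a single incoming event."""
        self.realtime_view[key] += 1

    def query(self, key):
        """The single API callers use; merges both branches transparently."""
        return self.batch_view[key] + self.realtime_view[key]

# Usage sketch
serving = LambdaServingLayer()
serving.load_batch_view({"page_a": 1000})   # nightly batch job output
serving.on_event("page_a")                  # event arriving after the batch run
print(serving.query("page_a"))              # -> 1001
```

The caller sees one function, but the batch recomputation and the streaming increments are still two separate pieces of logic that must be kept consistent.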

Extended Lambda Architecture

Assuming you can live with the limitations of Lambda architecture, note that most predictive analytics also needs historical data along with the other data captured within the enterprise. Including those key data sets will enhance the overall quality and provide the most complete data available to the predictive engine.

As the industry matures, these techniques will become more robust and will provide the best available data faster than ever. Just as we now take star schemas and their variations as a given for data warehousing, Lambda architecture and its variations will be just as prevalent in the near future.

Hadoop’s Ever-Increasing Role

With the advent of Splice Machine and the release of Hive 0.14, we are seeing Hadoop’s role in the data center continue to grow. Both of these technologies support limited transactions against data stored in HDFS.

Now, I would not suggest moving your mission-critical ERP systems to Hive or Splice Machine, but the support of transactions is opening up Hadoop to more use cases, especially those traditionally served by RDBMS-based data warehouses. With transaction support there is a more elegant way to handle slowly changing dimensions of all types in Hadoop, now that records can be easily updated. Fact tables with late-arriving information can be updated in place. With transactional support, master data can be managed more efficiently. The writing is on the wall: more and more of the functionality that has historically been provided by the data warehouse is moving to the Hadoop cluster.
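As a rough sketch of what that update-in-place could look like, the snippet below assumes a Hive 0.14+ server with ACID transactions enabled, tables created as bucketed ORC with transactional=true, and the PyHive client; the host, table, and column names are illustrative only.

```python
from pyhive import hive  # assumes a Hive 0.14+ server with ACID transactions enabled

conn = hive.connect(host="hadoop-edge-node", port=10000, username="etl_user")
cursor = conn.cursor()

# Type 1 slowly changing dimension: overwrite the attribute in place.
# dim_customer must be bucketed, stored as ORC, and created with
# TBLPROPERTIES ('transactional'='true') for UPDATE to be allowed.
cursor.execute("""
    UPDATE dim_customer
    SET    address = '742 Evergreen Terrace',
           city    = 'Springfield'
    WHERE  customer_id = 10017
""")

# Late-arriving fact information, corrected in place rather than re-loaded.
cursor.execute("""
    UPDATE fact_orders
    SET    ship_date = '2014-11-20'
    WHERE  order_id = 98231 AND ship_date IS NULL
""")

cursor.close()
conn.close()
```

Before Hive supported UPDATE, the same correction typically meant rewriting whole partitions, which is exactly the awkwardness the post says transactions remove.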

To address this ever-changing environment, enterprises must have a clear strategy for evolving their Big Data capabilities within their enterprise architecture. This Thursday, I will be hosting a webinar, “Creating the Next-Generation Big Data Architecture,” where we will discuss Hadoop’s different roles within a modern enterprise’s data architecture.

What is Your Big Data Strategy…?

Big Data is a big deal. Every vendor has a strategy and a suite of products. Navigating the maze and picking the right Big Data platform and tools takes some planning and looking beyond a techie’s dream product suite. Compounding the issue is the choice between pure open source and a vendor distribution of that open source. Like every other new technology, product shakedowns will happen sooner or later, so picking a suite now is like betting on the stock market: exercising caution and staying conservative with a long-term outlook will pay off.

Organizations tend to follow the safe route of sticking with a big-vendor strategy, but the downside is securing the funding and putting up with a procurement phase that can take forever to win approval. The hard part is knowing the product landscape, assessing the strengths of each type of solution, and prioritizing the short-term and long-term strategy.

I have seen smaller companies build their entire solution on an open source stack and not pay a penny for the software. Obviously the risks and the rewards play out. Training resources, and hiring trained resources from the marketplace, is a huge factor as well. Open source still has the same issues of versions, bugs, and compatibility, so having a knowledgeable team makes a big difference in managing the environment and the overall quality of the delivery.

But despite the confusion, there is good news. If you are in the process of figuring out how you want to play the Big Data game, big and small vendors alike provide sandbox or dev environments that are nearly free or available for a limited duration. Leveraging this option as part of the Big Data strategy saves not only money but also time on the learning curve. IBM Bluemix is one example; Cloudera and DataStax offer the same, and the list is growing.

To maximize the benefit, follow a basic portfolio management strategy:

  • Take an inventory of the tools already available within the organization
  • Identify the products that will play well with the existing tools
  • Figure out the business case and the types of tools needed for a successful POC
  • Match the product selection with the team’s knowledge base
  • Get as much help as possible from external sources (a lot of it can be free, if you have the time), from training to the POC
  • Start small and use the results to get buy-in for the larger project
  • Invest in developing the strategy with a POC to uncover the benefits and build a strong business case

Combining this strategy with a little external help to narrow down the selection, and avoiding the pitfalls known from industry experience, adds tremendous value in navigating the complex selection process. Time to market can be cut drastically, especially when you make use of a DevOps platform in the cloud.

The direct benefits of leveraging the try-before-you-buy options are:

  • No hardware, wait time, or IT involvement to set up the environment
  • All the tools are available and ready to test
  • Pricing and the product stack can be validated, rather than finding out later that you need to buy one more product that is not in the budget
  • Time to market is cut drastically
  • The initial POC and business case can be built with solid proof
  • Throwaway work can be minimized

Looking at all the benefits, it is worth taking this approach, especially if you are in the initial stages and want proof in hand before asking for millions that would otherwise be hard to justify.

Defining Big Data Prototypes – part 2

In part 1 of this series, we discussed some of the most common assumptions associated with Big Data Proof of Concept (POC) projects. Today, we’re going to begin exploring the next stage in Big Data POC definition – “The What.”

The ‘What’ for Big Data has gotten much more complicated in recent years and now involves several key considerations:

  1. What business goals are involved – this is perhaps the most important part of defining any POC, yet strangely it is often ignored in POC efforts.
  2. What scope is involved – for our purposes this means how much of the potential solution architecture will be evaluated. This can be highly targeted (database layer only) or comprehensive (an entire multi-tiered stack).
  3. What technology is involved – this one is tricky because oftentimes people view a POC only in the context of proving a specific technology (or technologies). However, our recommended approach involves aligning technologies and business expectations up front, so the technology isn’t necessarily the main driver. Once the goals are better understood, selecting the right mix of technologies becomes supremely important. There are different types of Big Data databases and a growing list of BI platforms to choose from; these choices are not interchangeable, and some are much better tailored to specific tasks than others.
  4. What platform is needed – this is one of the first big technical decisions associated with both Big Data and Data Warehouse projects these days. While Big Data evolved sitting atop commodity hardware, there are now a huge number of device options and even Cloud platform opportunities.
  5. What technical goals or metrics are required – this consideration is of course what allows us to determine whether we’ve achieved success. Oftentimes, organizations think they’re evaluating technical goals but don’t develop sufficiently detailed metrics in advance. And of course, these need to be tied to specific business goals as well.

 

Big Data POC Architecture views

Once we get through those first five items, we’re very close to having a POC Solution Architecture. But how is this Architecture represented and maintained? Typically, for this type of Agile project, there will be three visualizations:

  • A conceptual view that allows business stakeholders to understand the core business goals as well as technical choices (derived from the exploration above).
  • A logical view which provides more detail on some of the data structure/design as well as specific interoperability considerations (such as the login between the DB and the analytics platform, if both are present). This could be done using UML or freeform. As most of these solutions will not include Third Normal Form (3NF) Relational approaches, the data structure will not be presented using ERD diagram notation. We will discuss how to model Big Data in a future post.
  • There is also often a need to represent the core technical architecture – server information, network information, and specific interface descriptions. This isn’t quite the same as a strict data model analogy (Conceptual, Logical, Physical). Rather, this latter representation is simply the last level of detail for the overall solution design (not merely the DBMS structure).

It is also not uncommon to represent one or more solution options in the conceptual or logical views – which helps stakeholders decide which approach to select. Usually, the last view, the POC technical architecture, is completed after the selection is made.

There is another dimension to “The What” that we need to consider as well – the project framework. This project framework will likely include the following considerations:

  • Who will be involved – both from a technical and business perspective
  • Access to the capability – the interface (in some cases there won’t be open access to this and then it becomes a demo and / or presentation)
  • The processes involved – what this means essentially is that the POC is occurring in a larger context; one that likely mirrors existing processes that are either manual or handled in other systems

The POC project framework also includes identification of individual requirements, the overall timeline, and specific milestones. In other words, the POC ought to be managed as a real project. The project framework also serves as part of the “How” of the POC, but at first it represents the overall parameters of what will occur and when.

So, let’s step back a moment and take a closer look at some of the top level questions from the beginning. For example, how do you determine a Big Data POC scope? That will be my next topic in this series.

 

copyright 2014, Perficient Inc.

Three Big Data Business Case Mistakes

Tomorrow I will be giving a webinar on creating business cases for Big Data. One of the reasons for the webinar was that there is very little information available on creating a Big Data business case. Most of what is available boils down to “trust me, Big Data will be of value.” Most information on the internet basically states:

More information, loaded into a central Hadoop repository, will enable better analytics, thus making our company more profitable.  

Although this statement seems logically true, and most analytical companies have accepted it, it illustrates the three most common mistakes we see in creating a business case for Big Data.

The first mistake is not directly linking the business case to the corporate strategy. The corporate strategy is the overall approach the company is taking to create shareholder value. By linking the business case to the objectives in the corporate strategy, one can illustrate the strategic nature of Big Data and how the initiative will support the overall company goals.

It’s all about the data, the data…

When Apple jumped into payment processing with Apple Pay, I thought this would be a great leg up for Apple. But who will be the winner and who will be the loser? Granted, the payment switches from the credit card to Apple Pay, which indirectly pays for the purchase; who cares, as long as we can charge the card we want, right? And what is Apple Pay’s market share going to be? Before we answer all those questions, let’s take a look at how we pay for services and goods today.

Cash may still be king, and it may very well be the last to die, but what everyone is after is the middle-class market, which is fast adopting credit cards and now smartphone-based services; dwindling check usage tells you so. With so many ways of shopping, credit cards, store cards, prepaid cards, PayPal, Internet options (bill pay, bitcoin?), the convenience I see is carrying fewer cards, or none at all. I seldom carry my store cards, especially when the store can look them up.

Apple Pay will be convenient, and may help get rid of cards altogether, if it is accepted by a majority of merchants. Discover had to go through hurdles before it was accepted, so I don’t see myself getting rid of my cards in the near future, although cards may disappear before cash does.

I read the news that many major merchants have signed up with Apple, and I thought: what happens to the data? Who will own the granular consumer spend information? Before I could finish this blog I heard the news that two major retailers had pulled out of Apple Pay. Ha, they realized it: the data is more valuable than the technology or the convenience to customers. Imagine the data movement and explosion even if Apple shares the detailed information with each of the parties involved.

Apple is expected to have around 34 million customers, and at an average of 200 transactions per customer the volume is going to explode. You can do the math (see the quick calculation below) if this information has to be shared with two to five parties. No wonder some retailers are wary of signing up. I won’t be surprised if each of the financial institutions / retailers comes up with its own payment app.
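Doing that math with the post’s own figures (34 million customers, 200 transactions each, shared with two to five parties) gives a rough sense of the record volumes involved:

```python
customers = 34_000_000           # expected Apple Pay customers (figure from the post)
transactions_per_customer = 200  # average transactions per customer (figure from the post)

transactions = customers * transactions_per_customer
print(f"base transactions: {transactions:,}")            # 6,800,000,000

# If every transaction's detail is shared with 2 to 5 parties,
# the number of records that move and get stored multiplies accordingly.
for parties in (2, 5):
    print(f"records with {parties} parties: {transactions * parties:,}")
# -> 13,600,000,000 and 34,000,000,000 records
```

Roughly 6.8 billion transactions becomes somewhere between 13.6 and 34 billion records once the detail is copied to every party involved.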

In the end, having the customer spend data is what is most valuable for business operations, customer excellence, and so on. Having the right Information Governance to manage this information asset is not only strategic but also a matter of survival for the enterprise.

The Chief Analytics Officer

One of the key points I make in our Executive Big Data Workshops is that effective use of Big Data analytics will require transforming both business and IT organizations. Big Data, with access to cross-functional data, will transform the strategic processes within a company that guide long-term and year-to-year investments. With the ability to apply machine learning, data mining, and advanced analytics to see how different business processes interact with each other, companies now have empirical information to use in their strategic processes.

We are now seeing evidence of this transformation in the emergence of the Chief Analytics Officer position. As detailed in the InfoWorld article “Chief analytics officer: The ultimate big data job,” it’s not about the data but what you do with the data. And it is important enough to create a new position, the CAO. I recommend reading this article.