Perficient Business Intelligence Solutions Blog

Archive for the ‘Emerging BI Trends’ Category

Thoughts on Oracle Database In-Memory Option

Last month Oracle announced the Oracle Database In-Memory option. The overall message is that once it is installed, you can turn this “option” on and Oracle becomes an in-memory database. I do not think it will be that simple. However, I believe Oracle is on the right track with this capability.

There are two main messages in Oracle In-Memory’s vision that I view as critical capabilities in a modern data architecture. First is the ability to store and process data based on the temperature of the data.
That is, hot, highly accessed data should be kept in DRAM or as close to DRAM as possible. As the temperature decreases, data can be stored on flash, and cold, rarely accessed data can be stored on disk (either in the Oracle DB or in Hadoop). Of course we can store data of different temperatures today; however, the second feature, making this storage transparent to the application, is what makes the capability so valuable. An application programmer, data scientist, or report developer should not have to know where the data is stored. It should be transparent. The ability for the Oracle DB or a DBA to optimize the storage of data based on the cost/performance of the storage tier, without having to consider compatibility with the application, the cluster (RAC), or recoverability, is quite powerful and useful. Yes, Oracle has been moving this way for years, but now it has most, if not all, of the pieces.
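
To make the “transparent to the application” point concrete, here is a minimal, hypothetical sketch of how a DBA might promote a hot table into the in-memory column store without touching the application; the connection details and table names are placeholders, and it assumes the cx_Oracle driver against an Oracle 12c database with the In-Memory option licensed and enabled.

    # Hypothetical sketch: promoting a "hot" table into Oracle's in-memory
    # column store. The application keeps issuing the same SQL; only where and
    # how the data is kept changes. Connection details and tables are placeholders.
    import cx_Oracle

    conn = cx_Oracle.connect(user="dba_user", password="secret", dsn="dbhost/orclpdb")
    cursor = conn.cursor()

    # Populate the hot SALES table into the in-memory column store at high priority.
    cursor.execute("ALTER TABLE sales INMEMORY PRIORITY HIGH")

    # Keep a cold, rarely accessed table out of memory (it stays on flash/disk).
    cursor.execute("ALTER TABLE sales_archive NO INMEMORY")

    conn.close()

Applications and reports querying SALES keep running unchanged; the idea is that the optimizer can start using the in-memory column store once it is populated.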

Even though the In-Memory option leverages a lot of existing core code, Oracle IT shops will need to remember that this is a version 1 product. Plan accordingly. Understand the costs and architectural impacts. Implement the Oracle In-Memory option on a few targeted applications and then develop standards for its use. A well-planned, standards-based approach will ensure that your company maximizes the return on its Oracle In-Memory investment.

A little stuffed animal called Hadoop

Doug Cutting – Hadoop creator – is reported to have explained how the name for his Big Data technology came about:

“The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria.”

The term, of course, evolved over time and almost took on a life of its own… this little elephant kept on growing, and growing… to the point that, nowadays, the term Hadoop is often used to refer to a whole ecosystem of projects, such as:

  1. Common – components and interfaces for distributed filesystems and general I/O
  2. Avro – serialization system for RPC and persistent data storage
  3. MapReduce – distributed data processing model and execution environment running on large clusters of commodity machines (a minimal Python sketch follows this list)
  4. HDFS – distributed filesystem running on large clusters of commodity machines
  5. Pig – data flow language / execution environment to explore huge datasets (running on HDFS and MapReduce clusters)
  6. Hive – distributed data warehouse, manages data stored in HDFS providing a query language based on SQL for querying the data
  7. HBase – distributed, column-oriented database that uses HDFS for its underlying storage, supporting both batch-style computations and random reads
  8. ZooKeeper – distributed, highly available coordination service, providing primitives to build distributed applications
  9. Sqoop – transfer bulk data between structured data stores and HDFS
  10. Oozie – service to run and schedule workflows for Hadoop jobs
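
To give a flavor of items 3 and 4 in practice, here is a minimal word-count sketch for Hadoop Streaming, which lets scripts in languages such as Python act as the mapper and reducer over files stored in HDFS; the script names and HDFS paths below are made up for illustration.

    #!/usr/bin/env python3
    # mapper.py -- minimal Hadoop Streaming mapper for a word count.
    # Reads raw text lines from stdin and emits "word<TAB>1" pairs on stdout.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

    #!/usr/bin/env python3
    # reducer.py -- sums the counts for each word.
    # Hadoop Streaming sorts mapper output, so lines with the same key arrive together.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")

A run would then look something like: hadoop jar .../hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out (the exact jar path and options depend on your distribution).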

This is a sizable portion of the Big Data ecosystem… an ecosystem that keeps on growing almost by the day. In fact, we could spend a considerable amount of time describing additional technologies out there that play an important part in the Big Data symphony – DataStax, Sqrrl, Hortonworks, Cloudera, Accumulo, Apache, Ambari, Cassandra, Chukwa, Mahout, Spark, Tez, Flume, Fuse, YARN, Whirr, Grunt, HiveQL, Nutch, Java, Ruby, Python, Perl, R, NoSQL, PigLatin, Scala, etc.

Interestingly enough, most of the aforementioned technologies are used in the realm of Data Science as well, mostly due to the fact that the main goal of Data Science is to make sense out of and generate value from all data, in all of its many forms, shapes, structures and sizes.

In my next blog post, we’ll see how Big Data and Data Science are actually two sides of the same coin, and how whoever does Big Data is actually doing Data Science as well, to some extent – wittingly or unwittingly.

Web analytics and Enterprise data…

I was looking at the market share of Google Analytics (GA), and it is definitely on the rise. So I was curious to see its capabilities and what the tool can do. It is, of course, a great campaign management tool, and it’s been a while since I worked on campaign management.

I wanted to know all the more about this tool now, so off to YouTube I went and got myself up to speed on the tool’s capabilities. Right off the bat I noticed campaign management has changed drastically compared to the days when we were sending email blasts, snail mail, junk mail, etc. I remember the days when we generated email lists, ran them through third-party campaign management tools, blasted them out to the world, and waited. Once we had enough data (mostly from when customers purchased the product) to run the results through SAS, we could see the effectiveness. It took more than a month to see any valuable insights.

Fast-forward to the social media era: GA provides instant results and intelligent click-stream data for tracking campaigns in real time. Check out the YouTube webinars to see what GA can do in 45 minutes.

On a very basic level, GA can track whether someone is a new or returning visitor, a micro conversion (downloading a newsletter or adding something to a shopping cart), or a macro conversion (buying a product). GA can also track AdWords traffic (how visitors got to the website and what triggered the visit). It also has a link-tagging feature, which is very useful for identifying the channel (email, referral website, etc.) and tying the traffic to a specific campaign based on its origination. It has many other features, along with rich reports and analytical capabilities.
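
As a small illustration of that link-tagging idea, here is a sketch of tagging a landing-page URL with the standard utm_source / utm_medium / utm_campaign parameters so GA can attribute the resulting traffic; the URL and campaign values are made up.

    # Sketch: append GA campaign (UTM) parameters to a landing-page link so
    # visits arriving through it are attributed to a specific campaign/channel.
    from urllib.parse import urlencode, urlparse, urlunparse

    def tag_link(url, source, medium, campaign):
        """Return the URL with utm_source/utm_medium/utm_campaign appended."""
        parts = urlparse(url)
        params = urlencode({
            "utm_source": source,      # where the traffic originates (e.g. newsletter)
            "utm_medium": medium,      # the channel (e.g. email, referral)
            "utm_campaign": campaign,  # the specific campaign name
        })
        query = f"{parts.query}&{params}" if parts.query else params
        return urlunparse(parts._replace(query=query))

    # Example: a link placed in an email blast for a (hypothetical) spring campaign.
    print(tag_link("https://www.example.com/spring-sale", "newsletter", "email", "spring_2014"))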

There is so much information collected whether or not the customer buys a product. How much of this web analytics data becomes part of the enterprise data? Does historical analysis include this data? Is this data used for predictive and prescriptive analytics? It is important to ask the following questions to assess what percentage of the gathered information is actually used at the enterprise level:

  • How well does the organization integrate this campaign data into its enterprise data?
  • Does it collect and manage new prospect information at the enterprise level?
  • Does the organization use this tool to enhance its master data?

This may become a Big Data question, depending on the number of campaigns and hits and the amount of micro activities the site can offer. Chances are that the data resides in a silo or at a third-party location, and the results are never stored with the enterprise data.

SAP HANA and Hadoop – complementary or competitive?

In my last blog post, we learned about SAP HANA… or, as I called it, “a database on steroids”. Here is what SAP’s former CTO and Executive Board Member, Vishal Sikka, told InformationWeek:

“Hana is a full, ACID-compliant database, and not just a cache or accelerator. All the operations happen in memory, but every transaction is committed, stored, and persisted.”

In the same InformationWeek article, you can read about how SAP has committed to becoming the #2 database vendor by 2015.

So, even though HANA is a new technology, it looks like SAP has pretty much bet its future on it. Soon, SAP customers may have SAP ERP, SAP NetWeaver BW, and their entire SAP system landscape sitting on a HANA database.

But if HANA is such a great database, you may wonder, why would SAP HANA need a partnership with Hadoop, or be integrated with Hadoop at all? Can HANA really integrate with Hadoop seamlessly? And, most importantly, are HANA and Hadoop complementary or competitive?

Well, in October 2012, SAP announced the integration of Hadoop into its data warehousing family – why?

The composite answer, in brief, is:

  1. tighter integration – SAP, Hadoop, Cloudera, Hitachi Data Systems, HP, and IBM are all brought together in order to address the ever-growing demands in the Big Data space
  2. analytics scenarios – in order to build more complex and mature analytics scenarios, HANA can be integrated with Hadoop via SAP Sybase IQ, SAP Data Services, or R queries, and include structured AND unstructured Big Data with prior integration and consolidation by Hadoop
  3. in-memory capabilities – some organizations already have existing Hadoop strategies or solutions but cannot do in-memory Big Data without HANA
  4. tailored solutions – by bringing together speed, scale and flexibility, SAP enables customers to integrate Hadoop into their existing BI and Data Warehousing environments in multiple ways, so as to tailor the integration to their very specific needs
  5. transparency for end-users – SAP BusinessObjects Data Integrator allows organizations to read data from Hadoop Distributed File Systems (HDFS) or Hive, and load the desired data very rapidly into SAP HANA or SAP Sybase IQ, helping ensure that SAP BusinessObjects BI users can continue to use their existing reporting and analytics tools
  6. query federation – customers can federate queries across an SAP Sybase IQ MPP environment using built-in functionality
  7. direct exploration – SAP BusinessObjects BI users can query Hive environments, giving business analysts the ability to explore Hadoop environments directly (see the sketch after this list)
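
To make point 7 a bit more tangible, here is a minimal, hypothetical sketch (not SAP’s integration code) of a client querying Hive directly; it assumes a HiveServer2 endpoint at a placeholder host and the PyHive client library.

    # Hypothetical sketch: exploring Hadoop data by querying Hive directly.
    # Host, port, username, and table are placeholders.
    from pyhive import hive

    conn = hive.Connection(host="hive-gateway.example.com", port=10000, username="analyst")
    cursor = conn.cursor()

    # Consolidate raw clickstream events stored in HDFS; a small result set like
    # this is the kind of thing that might then be loaded into HANA or Sybase IQ.
    cursor.execute("""
        SELECT page, COUNT(*) AS hits
        FROM clickstream_events
        GROUP BY page
        ORDER BY hits DESC
        LIMIT 20
    """)
    for page, hits in cursor.fetchall():
        print(page, hits)
    conn.close()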

In short, SAP is looking at a coexistence strategy with Hadoop… NOT a competitive one.

In the next blog post, we’ll look at Hadoop and its position in the Big Data landscape… stay tuned.

SAP HANA – A ‘Big Data’ Enabler

Some interesting facts and figures for your consideration:

  • 90% – the share of the world’s stored data that was created in the past 2 years
  • 50% – annual data growth rate
  • 34,000 – tweets sent each minute
  • 9,000,000 – daily Amazon orders
  • 7,000,000,000 – daily Google page views
  • 2.5 exabytes – amount of data created every day (an exabyte is 1,000,000,000,000,000,000 bytes = 1,000 petabytes = 1 million terabytes = 1 billion gigabytes)

Looking at these numbers it is easy to see why more and more technology vendors want to provide solutions to ‘Big Data’ problems.

In my previous blog, I mentioned how we’ll soon get to a place where it will be more expensive for a company not to store data than to store data – some pundits claim that we’ve already reached this pivotal point.

Either way, it would be greatly beneficial to become familiar with at least some of the technologies behind the substantial investments being made in the Big Data space.

One such technology is SAP HANA – a Big Data enabler. I am sure that some of you have heard this name before… but what is SAP HANA exactly?

The acronym H.AN.A. in ‘SAP HANA’ stands for High-performance ANalytical Appliance. If I went beyond the name/acronym and described SAP HANA in one sentence, I would say that SAP HANA is a database on steroids, perfectly capable of handling Big Data in-memory, and one of the few in-memory computing technologies that can be used as an enabler of Big Data solutions.

Dr. Berg and Ms. Silvia – both SAP HANA gurus – provide a comprehensive and accurate definition of SAP HANA:

“SAP HANA is a flexible, data-source-agnostic toolset (meaning it does not care where the data comes from) that allows you to hold and analyze massive volumes of data in real time, without the need to aggregate or create highly complex physical data models. The SAP HANA in-memory database solution is a combination of hardware and software that optimizes row-based, column-based, and object-based database technologies to exploit parallel processing capabilities. We want to say the key part again: SAP HANA is a database. The overall solution requires special hardware and includes software and applications – but at its heart, SAP HANA is a database”.

Or as I put it, SAP HANA is a database on steroids… but with no side-effects, of course. Most importantly though, SAP HANA is a ‘Big Data Enabler’, capable of:

  • Conducting Massively Parallel Processing (MPP), handling up to 100 TB of data in-memory
  • Providing a 360-degree view of any organization
  • Safeguarding the integrity of the data by reducing, or eliminating data migrations, transformations, and extracts across a variety of environments
  • Ensuring overall governance of key system points, measures and metrics

All with very large amounts of data, in-memory and in real time… could this be a good fit for your company? Or, if you are already using SAP HANA, I’d love to hear from you and see how you have implemented this great technology and what benefits you’ve seen working with it.
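
If you already have a HANA system to experiment with, a minimal sketch of talking to it from Python might look like the following; the host, port, credentials, and table are placeholders, and it assumes SAP’s hdbcli driver is installed.

    # Hypothetical sketch: running an ad-hoc aggregation against SAP HANA using
    # SAP's hdbcli driver. Connection details and the table name are placeholders.
    from hdbcli import dbapi

    conn = dbapi.connect(
        address="hana.example.com",  # HANA host (placeholder)
        port=30015,                  # SQL port; varies by instance number
        user="ANALYST",
        password="secret",
    )
    cursor = conn.cursor()

    # With a column-store table held in memory, aggregations like this can run
    # without pre-built aggregates or complex physical models.
    cursor.execute("SELECT region, SUM(revenue) FROM sales_events GROUP BY region")
    for region, revenue in cursor.fetchall():
        print(region, revenue)
    conn.close()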

My next blog post will focus on SAP HANA’s harmonious, or almost harmonious, co-existence with Hadoop…

Big Data: Integral Part of an Information Architecture

Forrester recently released some research titled “Reset on Big Data,” which highlighted the gap between IT’s and business leaders’ understanding of Big Data’s role within the enterprise data ecosystem. In short, business leaders were 40% less likely to cite Big Data’s role as an extension of the current analytical environment. This is not surprising and is consistent with our (Perficient’s) observations. Big Data hype, with its own buzzwords and code names like Falcon, Pig, Hive, YARN, etc., only creates the perception that Big Data is its own little world.

However, Big Data is nothing new. IT organizations have, for the last fifty-plus years, battled to cost-effectively manage ever-increasing volumes of data. Storage costs, delivery processes, data access technology, and skill sets have evolved to allow IT to provide value from ever-increasing volumes of data. In the mid-90s, with SMP hardware, software from major RDBMS vendors, and dimensional design techniques, we saw the data warehouse explosion. Now, with commodity-priced servers, extremely low cost-per-TB disk storage, and analytical skills, we are seeing the next technology explosion: Big Data. However, just as data warehousing is part of the overall technology capability set and information infrastructure, so is Big Data. It is part of the overall information architecture and must be treated as such.

Interestingly, even though we can agree this is a rather obvious statement, most organizations that we have consulted have not defined Big Data’s role in the overall enterprise data architecture. Has your organization defined the role of Big Data with respect to your enterprise’s overall data architecture? If not, why?

Yarn – The Big Data Accelerator

YARN… Yes, Hadoop may be changing everything, but when YARN was released, the change pedal was pushed aggressively to the floor. Putting the technical details aside, the bottom line is that multiple concurrent workloads can now be executed and managed on Hadoop clusters. This “pluggable” service layer separates data processing from cluster resource management. The result is that we are no longer dependent on MapReduce to access and process HDFS data.

Most companies with products accessing HDFS data are doing it without MapReduce. Oracle, SAS, IBM, and many niche providers run their own software components on the data nodes. This will change the dynamics of how we construct clusters. More memory and more CPU will be required to support these additional processing requirements. It is too early to tell whether we should beef up our nodes or add more nodes. Short of running your own POC and tests, keep an eye on the “all-in-one” appliance vendors as they bring out their new appliances this year. How they move will be a good indicator.
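
If you want a quick look at the mix of workloads YARN is juggling on a cluster, the ResourceManager exposes a REST API; a small sketch (assuming a ResourceManager at a placeholder host on the default port 8088) might look like this.

    # Sketch: list the applications currently running under YARN using the
    # ResourceManager REST API. The host name is a placeholder.
    import requests

    RM = "http://rm.example.com:8088"

    resp = requests.get(f"{RM}/ws/v1/cluster/apps", params={"states": "RUNNING"})
    apps = (resp.json().get("apps") or {}).get("app", [])
    for app in apps:
        # applicationType is what shows the post-MapReduce world: TEZ, SPARK,
        # and other engines show up here alongside MAPREDUCE.
        print(app["applicationType"], app["name"], app["state"])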

Does any vendor have a “silver bullet”? Until these solutions get into production and mature, there will be challenges. However, they will still provide exceptional value creation – even with any associated headaches. Do not shy away. Do your due diligence and choose tools that leverage your current capabilities. Big Data is here to stay, and you need to move forward or be left behind. The accelerator has been pushed. Are you stuck in neutral, or are you in the race to develop a competitive advantage from Big Data?

If you want to learn how to quickly gain value from your Big Data, contact Perficient!

Risks Associated with Niche Big Data Vendors

One of the not-so-nice parts of the conference is seeing companies whose technology has been superseded by new releases of Hadoop, primarily Hive. One company boasted that it did not have to do full scans and could return SQL queries in seconds on large datasets. The booth attendant initially looked startled when I asked how his company’s technology was different from what is now included in Hive/Stinger’s ORCFile. The bottom line is that he did not have a good answer, other than saying they are significantly faster than “legacy” versions of Hive.

The Hadoop market is in a Cambrian Explosion stage, where new vendors and solutions are coming to market at an incredible pace. However, we do know that most will either be acquired or go bankrupt within the next few years, which adds risk for companies needing to invest in these niche solutions. Understand the Apache Hadoop roadmap, understand the unique capabilities of the niche provider, and understand the risks of selecting a particular niche vendor before you buy.

Making Big Data Real

Update from the Hadoop Summit: it’s only partway through day one and there is an unmistakable theme: interactive SQL on top of Hadoop is here, and in a big way. Stinger, Impala, and a number of other niche providers are not just promising, but delivering, interactive SQL. Benchmarks, case studies of production clients, and hands-on demos where you can look under the covers are all on display.

Now back down to earth. Yes, the technology is here. This technology WILL change how we approach not only enterprise data warehousing but enterprise data architecture as a whole! However, it is quite clear that the mindshare on implementing Big Data solutions is not as prevalent as the number of solutions in the market.

It is amazing how few service providers are exhibiting here at the conference. Even the seasoned providers are still shoe-horning Hadoop solutions into a traditional model. Most presenters are the software and hardware vendors sponsoring the event, but few are really addressing how to implement Big Data. This tells us at Perficient that we are on the correct track. Technology is not the differentiator; thought leadership is. We at Perficient have developed extensive mindshare in helping organizations gain value from their Big Data investments. Yes, Big Data technology is exciting and powerful, but it’s the people and the processes that make it real.

Stages of MDM…

MDM is a popular topic, and many organizations are at different stages of the MDM journey. Many times clients (primarily IT) want to engage consultants who can recommend an MDM tool and start the implementation, bypassing the planning / pre-planning stages. Typically this leads to an MDM solution that is not thought through completely, or the organization ends up with similar master data problems even after implementing the tool.

One of my previous clients ran into the following situation after implementing MDM. The IT department had a very capable CIO and a strong technical team, and in this case IT drove the MDM implementation. The team completed the implementation successfully, but users hardly noticed the change or the improvement. The CIO recognized this right away and challenged his team to find a remedy to improve the perception. What happened?

Let us step back and look at the big picture of MDM and the various stages one has to go through for a successful MDM program. The three major stages of the MDM journey are:

  • Planning / Pre-Planning stage
  • Development / Implementation stage
  • Steady state or ongoing support stage

Understanding the details of MDM will help align people, process and technology for these stages. Taking a holistic approach and developing the overall vision involves Business and IT working together. Analyzing the situation above told us:

  • The users (Business) were not engaged in all stages, in the right roles and at the right level – applies to all stages
  • The governance organization was not deep enough – applies to all stages
  • Clear communication of the benefits, and metrics to track them, was not in place – applies to Planning and Steady State
  • The overall vision did not engage the Business deeply enough (ownership, monitoring) – applies to Planning and Steady State
  • The Steady State stage was not fully thought out (e.g., competency center) – applies to Steady State

At this point they reached outside the organization for help to improve the perception and Business participation. IT understood the underlying technical MDM issues and had even solved some of the complex data quality issues. But the issue here was that some of the approaches were fundamentally wrong. Granted, this happened several years ago, and one would imagine we would have learned from these case studies and would approach MDM differently. But to my surprise, even today we get questions like “Can you suggest an MDM tool?” asked casually, without thinking through the implications of embarking on an MDM journey.

Understanding each of the three MDM stages, engaging the business, and communicating back are critical parts of a successful MDM program. Part of the MDM challenge is to get the Business engaged in defining the policies, performance metrics, etc., beyond just implementing the MDM tool. In my experience, nimble and agile approaches are an option, but that doesn’t mean you don’t take the time to understand the magnitude of the issue and lay out a well-thought-out strategy. Finally, MDM is more than just an IT solution; though it saves a lot of headaches for IT, it is an ongoing partnership program between Business and IT.