Perficient Enterprise Information Solutions Blog


Archive for the ‘Emerging BI Trends’ Category

Qlik Makes Sense … the .Next Big Thing?

In my blog post about ‘Qlik Leadership’ – back in April – I pointed out how Qlik was going to reinvent itself and the BI market once again. A few months later Qlik Sense was released. Qlik Sense Desktop is a Windows-based desktop application, and I view it as Qlik’s first installment on the .Next wave of innovation.

“Just as Qlik disrupted the business intelligence industry to pioneer the data discovery category, the company is now helping transform the category as it matures to governed, user-driven creation” – per TDWI (see full TDWI Article).

Here are some features of Qlik Sense:

  • Drag-and-drop interface – lets users create dynamic visualizations by clicking on sets of data
  • Architecture – an alternative to OLAP, with the clear goal of not boxing the user into a predefined view of data sets
  • Data-source agnostic – works with many kinds of data and with multiple data sources
  • Interactive visualization – end users create interactive visualizations of data that can be shared with others
  • Data storytelling – interactive explanations and discussions for presentations, with the ability to break out data in detail
  • HTML5 publishing – results can be examined in a conventional Web browser
  • Free – for personal and internal business use

These are just some of the new features in Qlik Sense, pushing analytic software further toward the "consumerized" end of the spectrum.

The Chief Analytics Officer

One of the key points I make in our Executive Big Data Workshops is that effective use of Big Data analytics will require transforming both business and IT organizations. Big Data, with access to cross-functional data, will transform the strategic processes within a company that guide long-term and year-to-year investments. With the ability to apply machine learning, data mining, and advanced analytics to see how different business processes interact with each other, companies now have empirical information to feed into those strategic processes.

We are now seeing evidence of this transformation in the emergence of the Chief Analytics Officer position. As detailed in the InfoWorld article Chief analytics officer: The ultimate big data job, it is not about the data but what you do with it, and that is important enough to create a new position: the CAO. I recommend reading the article.

The Best Way to Limit the Value of Big Data

A few years back I worked with a client that was implementing cell-level security on every data structure within their data warehouse. They had nearly 1,000 tables and 200,000 columns. Yikes! Talk about administrative overhead. The logic was that data access should be granted only on a need-to-know basis: users would have to request access to specific tables and columns.

Need-to-know is a term frequently used in military and government institutions; it refers to granting cleared individuals access to sensitive information. This is a good concept, but the key is the part about granting access to SENSITIVE information: the information has to be classified first, and only then is need-to-know (for cleared individuals) applied.

Most government documents are not sensitive, which allows administrative resources to focus on the information that is classified. The system for classifying information as Top Secret, Secret, or Confidential has relatively stringent rules, and it also discourages over-classification, because once a document is classified its use becomes limited.

This same phenomenon is true in the corporate world: the more a set of data is locked down, the less it will be used. Unnecessarily limiting information workers' access to data obviously does not help the overall objectives of the organization. Big Data just magnifies this dynamic, and unnecessarily restricting access to Big Data is the best way to limit its value. Lock Big Data down unreasonably and its value will be severely limited.
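To make the idea concrete, here is a minimal sketch of classification-first access control (all table, column, and user names are hypothetical): only the columns that have actually been classified as sensitive carry a need-to-know check, so the rest of the warehouse stays open and the administrative overhead stays small.

```python
# A sketch, not a security product: classify the sensitive columns first,
# then apply need-to-know checks only against that classification.
from dataclasses import dataclass, field

# Hypothetical classified columns; everything else is treated as non-sensitive.
SENSITIVE = {"customer.ssn", "customer.dob", "payroll.salary"}

@dataclass
class User:
    name: str
    clearances: set = field(default_factory=set)  # columns this user has a need to know

def can_read(user: User, table: str, column: str) -> bool:
    """Unclassified columns are open; classified ones require an explicit grant."""
    key = f"{table}.{column}"
    if key not in SENSITIVE:
        return True  # no administrative overhead for the vast majority of columns
    return key in user.clearances

analyst = User("analyst", clearances={"payroll.salary"})
print(can_read(analyst, "customer", "zip_code"))  # True  - unclassified
print(can_read(analyst, "customer", "ssn"))       # False - sensitive, no grant
```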

One Cluster To Rule Them All!

In the Hadoop space we have a number of terms for a Hadoop file system used for data management. Data Lake is probably the most popular; I have heard it called a Data Refinery, as well as some other less mentionable names. The one that has stuck with me is the Data Reservoir, mainly because it is the most accurate water analogy for what actually happens in a Hadoop implementation used for data storage and integration.

Consider that data is first landed in the Hadoop file system. This is un-processed data, just like water running into a reservoir from different sources, and in this form it is only fit for limited use, such as analytics by trained power users. The data is then processed, just as water is processed. Process the water and you end up with water that is consumable. Go one step further and distill it, and you have water that is suitable for medical applications. Data is the same way in a Big Data environment. Process it enough and you end up with conformed dimensions and fact tables. Process it even more, and you have data that is suitable for calculating bonuses or even publishing to government regulators.
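Here is a minimal sketch of that reservoir layering, with hypothetical zone names and file paths: data lands untouched in a raw zone, and a light processing step promotes it into a refined zone fit for broader consumption. It illustrates the idea, not how any particular Hadoop distribution organizes its storage.

```python
# A sketch of lake "zones": raw (landed as-is) and refined (lightly cleaned).
from pathlib import Path
import csv
import shutil

def land(source_file: str, lake_root: str = "datalake") -> Path:
    """Copy source data untouched into the raw zone (water flowing into the reservoir)."""
    raw_dir = Path(lake_root, "raw")
    raw_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(source_file, raw_dir))

def refine(raw_path: Path, lake_root: str = "datalake") -> Path:
    """Light processing: drop malformed rows so the data becomes broadly consumable."""
    refined_dir = Path(lake_root, "refined")
    refined_dir.mkdir(parents=True, exist_ok=True)
    out_path = refined_dir / raw_path.name
    with open(raw_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader, writer = csv.reader(src), csv.writer(dst)
        header = next(reader)
        writer.writerow(header)
        for row in reader:
            if len(row) == len(header) and all(cell.strip() for cell in row):
                writer.writerow(row)
    return out_path

# Usage: refined = refine(land("sensor_readings.csv"))
# A further "distilled" zone would apply conformed dimensions before regulated reporting.
```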

The Modern Data Warehouse Will Augment Hadoop

The data warehouse has been part of the EIM vernacular for nearly 20 years. The vision of a single source of the truth and a single repository for reporting and analysis are two objectives that have turned into a never-ending journey. The data warehouse has never had enough data, and the quality required for a single version of the truth demands an investment that only rare business cases can support. Further, the role of the analytical database has generally been difficult to fulfill: ad-hoc analysis on large sets of complex data has been a significant challenge for the traditional data warehouse. Historically, companies have addressed this with appliances, analytical data marts, or a varying set of database features and compromises (think bitmapped indexing, a variety of hardware and software caching techniques, and indexed stored data, to name a few), all with significant investment and usually significant added overhead.
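As an illustration of that augmentation, here is a minimal PySpark sketch (paths and column names are hypothetical): the heavy, ad-hoc aggregation runs on the Hadoop cluster over the raw detail, and only the small, conformed summary flows back toward the warehouse layer.

```python
# A sketch of offloading ad-hoc analysis to Hadoop; all paths and columns are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adhoc-offload").getOrCreate()

# Raw, detailed events stored cheaply on HDFS
events = spark.read.parquet("hdfs:///data/raw/web_events")

# An ad-hoc slice that would strain a traditional warehouse
daily_by_channel = (
    events
    .where(F.col("event_date") >= "2014-01-01")
    .groupBy("event_date", "channel")
    .agg(F.countDistinct("visitor_id").alias("visitors"))
)

# Only the small, conformed summary is handed back to the warehouse layer
daily_by_channel.write.mode("overwrite").parquet("hdfs:///data/conformed/daily_by_channel")
```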

Internet of Things and Enterprise Data Management…


It is amazing to see the terms we come up with to explain a new technology or trend. Consulting thought leadership coins words to group a set of technologies or trends so that people have a context, but the success and adoption of the technology ultimately defines the term's reputation. For example, the data warehouse was the in-thing, only to be shunned when it did not deliver on its promises. The industry quickly realized the mistake, called it Business Intelligence, and hid the data warehouse behind BI until things settled. Now no one questions the value of a DW or EDW, or perceives it as a risky project.

Some terms are really great and are here to stay for a long time. Some wither away; some change and take on a different meaning. One term that got my attention is IoT, the Internet of Things. What is this? It sounds like "those things," but what is this trend or technology, really?

Wikipedia gives you this definition:

“The Internet of Things (IoT) is the interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure. Typically, IoT is expected to offer advanced connectivity of devices, systems, and services that goes beyond machine-to-machine communications (M2M) and covers a variety of protocols, domains, and applications.[1] The interconnection of these embedded devices (including smart objects), is expected to usher in automation in nearly all fields, while also enabling advanced applications like a Smart Grid.[2]”


That is a lot of stuff; it looks like pretty much everything we do with the Internet. I am sure this term will change and take shape. But let's look at how it relates to Enterprise Data Management. From an enterprise data perspective, consider a subset of IoT: machine-generated internet data and the consolidation of data from systems operating in the cloud. What we end up with is a whole lot of data that is new and that also sits outside the traditional enterprise data framework. The impact and exposure are real, and much of the IoT data may live outside the firewall.

In essence, Enterprise Data Management needs to deal with the added dimensions of architecture, technology, and governance for IoT. Considering IoT data as out of scope for Enterprise Data Management will create more issues than it solves, especially if you generate or depend on IoT data.
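As a small illustration, here is a sketch (all field names and values are hypothetical) of bringing machine-generated IoT payloads under the same governance umbrella as internal data: each event is wrapped in an envelope that records where it came from, who stewards it, and how it is classified before it lands.

```python
# A sketch of a governance "envelope" for machine-generated events.
import hashlib
import json
from datetime import datetime, timezone

def govern(payload: dict, device_id: str, source_system: str) -> str:
    """Attach lineage and stewardship tags before landing the raw event."""
    envelope = {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "device_id": device_id,
        "source_system": source_system,   # e.g. a cloud application outside the firewall
        "data_owner": "operations",       # assumed steward, for illustration only
        "classification": "internal",     # default until formally classified
        "payload_hash": hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()).hexdigest(),
        "payload": payload,
    }
    return json.dumps(envelope)

print(govern({"temp_c": 21.4, "status": "OK"},
             device_id="sensor-042", source_system="fleet-cloud"))
```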

Realizing Agile Data Management …

Years of work went into building the elusive single version of the truth. Despite all the attempts from IT and the business, Excel reporting and Access databases were impossible to eliminate. Excel is the number one BI tool in the industry for good reasons: accessibility, speed, and familiarity. Almost all BI tools export data to Excel for those same reasons. The business will produce the insight it needs as soon as the data is available, manually or otherwise. It is time to come to terms with the fact that change is inevitable and there is no such thing as perfect data, only data that is good enough for the business. As the saying goes:

‘Perfect is the enemy of Good!’

So waiting for all the business rules and perfect data before producing a report or analysis is too late for the business. Speed is of the essence: when the data is available, the business wants it, and stale data is as good as no data at all.


In the changing paradigm of data management, agile ideas and tools are in play. Waiting months, weeks, or even a day to analyze data from the data warehouse is a problem. Data discovery through agile BI tools that double as ETL significantly reduces the time to data availability. Data virtualization provides real-time access to data, along with its metadata, for quicker insights. In-memory data appliances produce analytics in a fraction of the time of a traditional data warehouse and BI stack.

We are moving from gourmet sit-down dining to a fast-food concept for data access and analytical insights. Both have their place, their benefits, and their shortcomings, and they complement each other in terms of use and the value they bring to the business. In the following series, let's look at this new set of tools and how they help agile data management throughout the life cycle (a minimal sketch of this kind of on-demand data blending follows the list below).

  1. Tools in play:
    1. Data Virtualization
    2. In-Memory Database (appliances)
    3. Data Life Cycle Management
    4. Data Visualization
    5. Cloud BI
    6. Big Data (Data Lake & Data Discovery)
    7. Cloud Integration (on-prem and off-prem)
    8. Information Governance (Data Quality, Metadata, Master Data)
  2. Architectural changes: traditional vs. agile
  3. Data Management Impacts
    1. Data Governance
    2. Data Security & Compliance
    3. Cloud Application Management
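As promised above, here is a minimal sketch of the "good enough, available now" style of data blending, with hypothetical table and file names: a warehouse query and a department-maintained cost file are combined at read time, instead of waiting for a formal ETL change.

```python
# A sketch of on-demand blending; the warehouse schema and CSV layout are assumed.
import csv
import sqlite3

def federated_margin(warehouse_db: str, dept_csv: str) -> dict:
    """Join warehouse revenue with a department-maintained cost file at read time."""
    # Governed side: revenue by product from the warehouse
    conn = sqlite3.connect(warehouse_db)
    revenue = dict(conn.execute(
        "SELECT product_id, SUM(revenue) FROM sales GROUP BY product_id"))
    conn.close()

    # Agile side: unit costs maintained by the department in a simple CSV
    with open(dept_csv, newline="") as f:
        costs = {row["product_id"]: float(row["unit_cost"]) for row in csv.DictReader(f)}

    # "Good enough" blended view, available the day the department produces its file
    return {pid: revenue.get(pid, 0.0) - costs.get(pid, 0.0)
            for pid in set(revenue) | set(costs)}
```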

DevOps Considerations for Big Data

Big Data is on everyone's mind these days. Creating an analytical environment involving Big Data technologies is exciting and complex: new technology, and new ways of looking at data that otherwise remained dark or unavailable. The real test of implementing a Big Data solution is making it production ready.

Once the enterprise comes to rely on the solution, dealing with typical production issues is a must. Expanding the data lake and adding multiple applications that access and change data, and that deploy new statistical learning solutions, can hit overall platform performance. In the end, user experience and trust will suffer if the environment is not managed properly. Models that used to run in minutes may turn into hours or days as data and algorithm changes are deployed. Having the right DevOps process framework is important to the success of Big Data solutions.

In many organizations the data scientist reports to the business and not to IT. Knowing both the business and the technological requirements, and setting up the DevOps process accordingly, is key to making these solutions production ready.

Key DevOps measures for a Big Data environment (a minimal instrumentation sketch follows the list):

  • Data acquisition performance (ingestion to creating a useful data set)
  • Model execution performance (Analytics creation)
  • Modeling platform / Tool performance
  • Software change impacts (upgrades and patches)
  • Development-to-production deployment performance (application changes)
  • Service SLA Performance (incidents, outages)
  • Security robustness / compliance
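Here is a minimal instrumentation sketch for a few of these measures (job names and SLA targets are hypothetical): each run is timed and compared against its SLA, so a model that used to finish in minutes does not quietly drift into hours unnoticed.

```python
# A sketch of timing data acquisition and model execution against an SLA.
import time
from contextlib import contextmanager

metrics = []  # in practice this would feed a monitoring store, not a list

@contextmanager
def measured(job: str, sla_seconds: float):
    start = time.time()
    try:
        yield
    finally:
        elapsed = time.time() - start
        metrics.append({"job": job,
                        "seconds": round(elapsed, 2),
                        "sla_breached": elapsed > sla_seconds})

with measured("ingest_clickstream", sla_seconds=3600):
    time.sleep(0.1)   # placeholder for the actual data acquisition step

with measured("churn_model_scoring", sla_seconds=900):
    time.sleep(0.1)   # placeholder for model execution

print(metrics)
```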


One of the top issues is Big Data security: how secure is the data, and who has access to and oversight of it? Putting together a governance framework to manage the data is vital for the overall health and compliance of Big Data solutions. Big Data is just gaining traction, and many of the best practices for Big Data DevOps scenarios have yet to mature.

Virtualization – THE WHY?


The speed at which we receive information from multiple devices, and the ever-changing customer interactions that open up new kinds of customer experience, create DATA! Any company that knows how to harness that data and produce actionable information is going to make a big difference to its bottom line. So why virtualization? The simple answer is business agility.

As we build the new information infrastructure and the tools for modern Enterprise Information Management, we have to adapt and change. Over the last 15 years, the enterprise data warehouse has matured to the point where proper ETL frameworks and dimensional models are well established.

With the 'Internet of Things' (IoT), a lot more data is created and consumed from external sources. Cloud applications create data that may not be readily available for analysis, and not having that data for analysis greatly limits the critical insights that can be produced.

Major Benefits of Virtualization

[Figure: major benefits of virtualization]

Additional considerations

  • Address the performance impact of virtualization on the underlying applications, and the refresh delays it introduces, appropriately
  • It is not a replacement for data integration (ETL), but it is a quicker way to get data access in a controlled fashion (see the sketch after this list)
  • It may not include all the business rules, which means data quality issues may still be an issue
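Here is a minimal sketch of the virtualization pattern itself (the sources and fetchers are stand-ins): a virtual view passes queries through to the live sources on demand and caches the results briefly, which is exactly where both the agility and the refresh-delay consideration above come from.

```python
# A sketch of a "virtual" view over live sources with a short-lived cache.
import time

class VirtualView:
    def __init__(self, fetchers, ttl_seconds=300):
        self.fetchers = fetchers   # name -> callable hitting a live source
        self.ttl = ttl_seconds     # cache window: limits load on source systems
        self._cache = {}           # name -> (timestamp, rows)

    def query(self, name):
        ts, rows = self._cache.get(name, (0, None))
        if rows is None or time.time() - ts > self.ttl:
            rows = self.fetchers[name]()          # pass-through to the source system
            self._cache[name] = (time.time(), rows)
        return rows

# Example wiring with stand-in fetchers
view = VirtualView({
    "orders": lambda: [{"order_id": 1, "amount": 120.0}],          # e.g. cloud application API
    "customers": lambda: [{"customer_id": 7, "region": "EMEA"}],   # e.g. warehouse query
})
print(view.query("orders"))
```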

In conclusion, having a virtualization tool in the Enterprise Data Management portfolio adds agility to data management. However, use virtualization appropriately, to solve the right kind of problem, and not as a replacement for traditional ETL.

Cloud BI use cases

Cloud BI comes in different forms and shapes, ranging from visualization alone to a full-blown EDW combined with visualization and predictive analytics. The truth of the matter is that every niche vendor offers some unique feature that other product suites do not, and in most cases you need more than one BI suite to meet all the needs of the enterprise.

De-centralization definitely helps the business achieve agility and respond to market challenges quickly. By the same token, it is how companies end up with silos of information across the enterprise.

Let us look at some scenarios where a cloud BI solution is very attractive for departmental use.

Time to Market

Getting the business case built and approved for a big CapEx project is a time-consuming proposition. Wait times for hardware, software, and IT involvement mean much longer delays in scheduling the project, not to mention the push-back to use the existing reports or to wait for the next release, which is allegedly around the corner forever.


Deployment Delays

Business users have an immediate need for analysis and decision-making. The typical turnaround for IT to bring in a new source of data is anywhere between 90 and 180 days, which is a killer for a business that wants the data now. Spreadsheets remain the top BI tool for just this reason. With Cloud BI (not just the tool), business users get not only the visualization and other product features but also data that is not otherwise available; customer analytics with social media analysis, for example, is available as a third-party BI solution. For this kind of value-added analytics there is a solid business reason to go with these solutions.


Tool Capabilities

Power users need ways to slice and dice the data and to integrate non-traditional sources (Excel, departmental cloud applications) into a combined analysis. Many BI tools come with lightweight integration (mostly push integration) that makes this a reality without much of an IT bottleneck.

So if we can add new capability without much delay, and within the departmental budget, where is the rub?

The issue is not looking at enterprise information in a holistic way. Though speed is critical, it is equally important to engage governance and IT to secure the information and share it appropriately, so it can be integrated into the enterprise data asset.

As we move into a future of cloud-based solutions, we will be able to solve many of these bottlenecks, but we will also have to deal with the security, compliance, and risk-mitigation implications of leaving data in the cloud. Forging a strategy that meets the enterprise's various BI demands, with proper governance, will yield the optimum mix of resources and solutions.