Perficient Enterprise Information Solutions Blog

Facebook Faces-off with YouTube

Yesterday, Facebook announced a News Feed algorithm change that will show members more videos similar to ones they “expand to full-screen, un-mute or opt to watch in HD”, even if they don’t Like, share or comment. Those same indicators will tell Facebook that a video is enjoyable so the Feed shows it to more people.

Now that is Big Data on viewership: leveraging insights from viewer behavior patterns could definitely give Facebook an edge over YouTube, the leading internet video platform.

Unlike YouTube, Facebook is a feed. Many of Facebook’s 1 billion+ users come back daily, sometimes even hourly, to view the latest from their friends. In fact, every clip you see is essentially a recommendation from someone you are connected to, and because the Feed is visited so often, Facebook knows you will see these videos at some point. All it has to do is figure out what you want to see, and that is getting a lot easier with big data-driven personalization engines.

Facebook can process highly specific data on viewer behavior to learn what each individual member wants to see. How many seconds did people watch the video? Was it visually stimulating? Multiply that by roughly 4 billion video views per day, and add that Facebook knows a video popular with some people will probably interest people similar to them, based on all the biographical and behavioral data it holds.

Suddenly, it becomes quite clear how data-driven personalization could lead to a Feed full of exactly the videos you want to see.

These and similar enhancements will be rolled out over time, and it remains to be seen whether Facebook is trying to usher in an era of diminished YouTube popularity or simply making the Feed more fun for its members.

Posted in Big Data, News

Cassandra NoSQL Data Modeling Snippet

Data modeling in Cassandra is a little tricky and requires a combination of science and art. Think of a Cassandra column family as a map of a map: an outer map keyed by a row key, and an inner map keyed by a column key. Both maps are sorted. To get the most out of Cassandra, and for long-term maintainability, it is better to analyze, know, and follow certain high-level rules while implementing it.
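
As a rough sketch of that map-of-a-map model (the table and column names below are purely illustrative), the partition key plays the role of the outer map’s key, while the clustering column plays the role of the inner map’s key and is kept sorted within each partition:

CREATE TABLE sensor_readings (
    sensor_id    TEXT,       -- outer map key: the partition (row) key
    reading_time TIMESTAMP,  -- inner map key: the clustering (column) key, kept sorted
    value        DOUBLE,
    PRIMARY KEY (sensor_id, reading_time)
);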

A few things to consider while implementing Cassandra:

  • Column-based storage
  • Clusters
  • Nodes
  • Duplicated data
  • Distributed data platform
  • Performance should scale linearly as more nodes are added to the cluster
  • Writes in Cassandra are cheaper than reads and less problematic
  • Denormalization and duplication are encouraged; Cassandra’s efficiency comes partly from data duplication
  • Forget what you know about joins in an RDBMS, because there are no joins in Cassandra

In Cassandra you have clusters and nodes, and you want to make sure that during writes, data is distributed evenly across the cluster’s nodes. Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. To increase read efficiency, make sure that data is read from as few nodes as possible.
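
As a quick sketch of that placement rule (using the users_by_email table defined later in this post), CQL’s built-in token() function exposes the hash value that determines which nodes own each row:

SELECT token(email), email FROM users_by_email;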

The Cassandra stress tests below, run with the consistency level set to ALL and then to ONE, show why it’s better to read from as few nodes as possible.

Command:

  • cassandra-stress read n=2000 cl=ALL no-warmup -rate threads=1

Result:

Stress test result (cl=ALL)

Command:

  • cassandra-stress read n=2000 cl=ONE no-warmup -rate threads=1

Result:

Stress test result (cl=ONE)

Isolate clusters by functional area and criticality. Use cases with similar criticality from the same functional area can share a cluster while residing in different keyspaces (databases). Determine your queries first and build the model around them: think through the query patterns up front and design the column families ahead of time. Another reason to follow this rule is that, unlike a relational database, Cassandra is not easy to tune or extend with new query patterns after the fact. In other words, you can’t just add complex SQL (T-SQL, PL/SQL, etc.) or secondary indexes to Cassandra because of its highly distributed nature.

At a high level, below are some of the things you need to determine about your query patterns:

  • Enforcing uniqueness in the result set
  • Filtering based on some set of conditions
  • Ordering by an attribute
  • Grouping by an attribute
  • Identify the most frequently used query pattern
  • Identify queries that are sensitive to latency

Design your queries to read from one partition. Keep in mind that your data is replicated to multiple nodes, so you can create individual queries that each read from a different partition. When a query has to read from multiple partitions, Cassandra must go to each node that owns the data, which takes time; getting the data from a single node saves that time.

An example would be the CREATE TABLE statements below:

CREATE TABLE users_by_email (
    name VARCHAR,
    dob TIMESTAMP,
    email VARCHAR,
    join_date TIMESTAMP,
    PRIMARY KEY (email)
);

CREATE TABLE users_by_join_date (
    name VARCHAR,
    dob TIMESTAMP,
    email VARCHAR,
    join_date TIMESTAMP,
    PRIMARY KEY (join_date, email)
);

The statements above create tables that let you read from one partition; in users_by_email, each user effectively gets their own partition.
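
As a sketch (the email address and date below are placeholders), each of these queries restricts on the full partition key and therefore touches a single partition:

SELECT name, join_date FROM users_by_email WHERE email = 'jdoe@example.com';
SELECT name, email FROM users_by_join_date WHERE join_date = '2015-06-01';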

If you are trying to fit a whole group into one partition, you can use a compound PRIMARY KEY, as in this example:

CREATE TABLE groups (
    groupname TEXT,
    username TEXT,
    email TEXT,
    join_date INT,
    PRIMARY KEY (groupname, username)
);
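
A sketch of how that partition would then be read (the group name is a placeholder); all members of a group come back from a single partition, sorted by username:

SELECT username, email, join_date FROM groups WHERE groupname = 'engineering';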

NoSQL NoSecurity – Security Issues with NoSQL Databases

As more companies debate adopting a Big Data solution, much of the discussion comes down to whether to use Hadoop or Spark, a NoSQL database, or to continue with their current RDBMS. The ultimate question is, “Is this technology for us?” NoSQL databases are highly scalable, provide better performance, are designed to store and process significant amounts of unstructured data at speeds up to 10 times faster than an RDBMS, and offer high availability and strong failover capabilities. So why hesitate to use a NoSQL database?

Security is a major concern for enterprise IT infrastructures. Security in NoSQL databases is very weak; authentication and encryption are almost nonexistent, or very weak when implemented. The following are security issues associated with NoSQL databases:

  • Administrative users and authentication are not enabled by default
  • Very weak password storage
  • Clients communicate with the server in plaintext (e.g., MongoDB)
  • Cannot integrate with external authentication tools like LDAP, Kerberos, etc.
  • Lack of encryption support for the data files
  • Weak authentication between the client and the servers
  • Vulnerability to injection attacks
  • Susceptibility to denial-of-service attacks
  • Data at rest is unencrypted
  • Available encryption solutions aren’t production-ready
  • Encryption isn’t available for client communication

With all of these security problems, it is best to understand that NoSQL databases are still young technologies, and more security enhancements will be added in newer versions. Enterprise Cassandra packages provided by companies like DataStax do include additional security enhancements, and hence are more secure and give companies much of the security they need.

DataStax Enterprise provides:

  • Client-to-node encryption for Cassandra, an optional, secure form of communication from the client machine to the database cluster (client-to-server SSL). This ensures that data is not compromised in flight.
  • Administrators can create, drop, and alter internal users via CQL, and those users are authenticated to the Cassandra database cluster (see the sketch below)
  • Permissions can be granted to users to perform certain tasks after their initial authentication
  • JMX authentication can be enabled, and tools such as nodetool and DataStax OpsCenter can be configured to use it
  • The ability to configure and use external security tools like Kerberos
  • Transparent data encryption (TDE) to help protect at-rest data (at-rest data is data that has been flushed from the memtable in system memory to the SSTables on disk)
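
As a minimal sketch of the internal authentication and authorization features listed above (the user name, password, and keyspace are hypothetical), user and permission management is done directly in CQL:

CREATE USER report_reader WITH PASSWORD 'ChangeMe123' NOSUPERUSER;
GRANT SELECT ON KEYSPACE sales TO report_reader;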

Hadoop, Spark, Cassandra, Oh My!

On Monday, I reviewed why Spark will not by itself replace Hadoop, but how Spark combined with other data storage and resource management technologies creates new options for managing Big Data. Today, we will investigate how an enterprise should proceed in this new, “Hadoop is not the only option” world.

Hadoop, Spark, Cassandra, oh my! Open source Hadoop and NoSQL are moving pretty fast; no wonder some companies might feel like Dorothy in the Land of Oz. To make sense of things and find the yellow brick road, we first need to understand Hadoop’s market position.
HDFS and YARN can support the storage of just about any type of data. With Microsoft’s help, it is now possible to run Hadoop on Windows and leverage .NET, though of course most deployments run on Linux and Java. HBase, Hive, and Cassandra can all run on top of Hadoop. Hadoop and Hive support is quickly becoming ubiquitous across data discovery, BI, and analytics tool sets. Hadoop is also maturing fast from the security perspective: thanks to Hortonworks, Apache Knox and Ranger have delivered enterprise security capabilities, and Cloudera and IBM both have their own enterprise security stories as well. WANdisco provides robust multi-site data replication and state-of-the-art data protection. The bottom line is that Hadoop has matured and continues to mature, and there is an extensive amount of support from most vendors and related Apache open-source projects. Read the rest of this post »

Spark Gathers More Momentum

Yesterday, IBM threw its weight behind Spark. This announcement is significant because it is a leading indicator of a transition from IT-focused Big Data efforts to business-driven analytics and Big Data investments. If you are interested in learning more about this announcement and what it means in the bigger picture, I wrote a blog entry on our IBM blog, which can be found here.

Will Spark Replace Hadoop?

I have seen a number of articles asking whether Apache Spark will replace Hadoop. This is the wrong question! It is like asking if your DVD player will replace your entire home theater system, which is pretty absurd. Just as a home theater system has many components, a TV or projector, a receiver, a DVD player, and speakers, Hadoop has different components. Specifically, Hadoop has three main components:

  1. File System – the Hadoop Distributed File System (HDFS), the scalable and reliable information storage layer
  2. Resource Management – YARN, the resource manager that manages processing across the cluster
  3. Cluster Computing Framework – originally this was MapReduce; now there are other options like Tez, Storm, and Spark

Read the rest of this post »

3 “Cool” Big Data Vendors

Gartner recently published its “Cool” Vendors in Big Data for 2015 list. To qualify, “vendors must meaningfully combine multiple types of functionality”. These companies were selected based on their ability to effectively address complicated challenges of this space, such as big data storage in the cloud or big data privacy.

Now, let’s look at the vendors that made the list:

Cool Vendor 1: Altiscale addresses the complications of moving data between the cloud and on-premises systems, something many IT organizations have been looking for. Further, Altiscale’s promise is to give data scientists and data warehousing specialists easy access while simplifying their data environments. Those seeking to create agile solutions for analytics, especially with regard to big data infrastructure, should definitely check out Altiscale.

Cool Vendor 2: Indyco offers a solution that enables highly collaborative data warehouse design. “What is cool about Indyco is the link to graphical representation of the business process model through its indyco explorer tool, and that it encapsulates the expertise of DW design best practices,” says Gartner.

Cool Vendor 3: Platfora runs natively on Hadoop and enables interactive visualizations. It is not surprising that Platfora has planned for Spark extensibility, which will “allow data science users to leverage Spark’s machine learning and advanced analytics libraries in its data processing pipelines.”

As innovation in the Big Data space continues, it will be interesting to see how these three vendors are able to change our ever-evolving Big Data landscape.

Posted in Big Data, News

IBM Vision 2015

Last week, I had the pleasure of attending IBM Vision 2015 at the Hilton Bonnet Creek in Florida, where Perficient was a sponsor of the event. This was my first time attending IBM Vision, though I have been to a number of other IBM conferences, and I came away impressed. Unlike massive events such as Insight and IOD, Vision is quite small and intimate. At Vision, it is easy to interact with attendees and presenters and to get around to see all the exhibitors. It was great to be able to check back in with people to see which sessions they had enjoyed. This year, IBM Vision had five tracks into which the material was collected:

  • Financial and operational performance management
  • Sales performance management (SPM)
  • Financial close and disclosure management
  • Governance, risk, and compliance
  • Cloud-based solutions for business insight

I chose to concentrate on presentations in the Sales Performance Management track.

Read the rest of this post »

Implementing Cognos ICM at Perficient

Image: “The Bait” by nist6dh on Flickr, Creative Commons Attribution-Share Alike 2.0 Generic License

Defining the Problem

For any growing organization with a good-sized sales team compensated through incentives for deals and revenue, calculating payments becomes a bigger and bigger challenge. Like many organizations, Perficient handled this problem with Excel spreadsheets, long hours, and Excedrin. Our sales team is close to a hundred strong and growing 10% each year. To help reward activities aligned to our business goals and spur sales that move the company in its strategic direction, the Perficient sales plans are becoming more granular and targeted. Our propensity to acquire new companies jolts the sales team’s size and introduces new plans, products, customers, and territories. With Excel, it is almost impossible, without a Herculean effort, to identify whether prior plan changes had the desired effect or what new plan changes might cost. With literally hundreds of spreadsheets being produced each month, the opportunity to introduce errors is significant. Consequently, executives, general managers, sales directors, business developers, and accountants spend hundreds if not thousands of hours each month validating, checking, and correcting problems. The risks involved in using Excel are significant, with an increased likelihood of rising costs for no benefit and limited ability to model alternative compensation scenarios.

Choosing Cognos Incentive Compensation Management (ICM)

While there are many tools on the market, the choice to use Cognos ICM was relatively simple. Once we had outlined the benefits and capabilities of the tool, our executive team was on board.

Cognos ICM is a proven tool, having been around for a number of years; it was formerly known as Varicent, before Varicent’s acquisition by IBM. The features of the tool that really make sense for Perficient are numerous. The calculation engine is fast and flexible, allowing any type of complexity and exception to be handled with ease, and letting reports and commission statements open virtually instantaneously. The data handling and integration capabilities are excellent, allowing the use of virtually any type of data from any system; in our case, we are consuming data from our ERP, CRM, and HR systems, along with many other files and spreadsheets. Cognos ICM’s hierarchy management capabilities allow us to manage sales team, reporting, and approval hierarchies with ease. User and payee management, permissions, and security come bundled with the tool, which also allows integration with external authentication tools. From a process point of view, workflow and scheduling are built in and can be leveraged to simplify the administration of the incentive compensation calculation and payment processes. Finally, the audit module tracks everything that goes on in the system, from user activity, to process and calculation timing, to errors that occur.

Perficient is one of a few elite IBM Premier Business Partners. As the Business Analytics business unit within Perficient, we have a history of implementing IBM’s Business Analytics tools not only for our clients but also for ourselves. We have implemented Cognos TM1 as a time and expense management system from which we can generate invoices, feed payroll, and pay expenses directly. We use Cognos Business Intelligence (BI) to generate utilization and bonus tracking reports for our consultants. We feel it essential that we not only implement solutions for our clients but also eat our own dog food, if you will.

Implementation and Timeline

Once we made the decision to implement and the budget had been approved, we decided on a waterfall-based lifecycle to drive the project. The reason for this selection has to do with our implementation team’s availability: as a consulting organization, the need to pull consultants into client engagements is absolute. We are also geographically dispersed, so co-location with the business users was not an option. Having discrete phases that could be handed from resource to resource was a must. As is typical with most waterfall projects, we implemented Cognos ICM in four major phases: requirements, design, development, and testing.

During the requirements phase, we broke down what we did today and layered that with what we wanted to do tomorrow. The output of the requirements phase was the Requirements Document with narrative and matrix style requirements.

Our design approach was to use the Cognos ICM Attributes Approach best practices developed by IBM. Rather than blindly following IBM’s prescribed methodology, we adopted the components that fit and discarded those that did not. The output of our design phase was a detailed design document that was ready for direct implementation in the tool.

The development phase had three distinct flavors. The first was data integration, where we sourced, prepared, and loaded the incoming data; our goal was to load as much data as possible without forcing manual intervention. The second was calculation development, where we built the calculations for hierarchies, quota, crediting, attainment, and payout; this is where the ICM logic resides and feeds into the compensation statements and reports. The last component was reporting, which included the development of the commission statements, analytical reports, and the file sent to payroll.

The testing phase had two components, one of system testing and one of user acceptance and parallel testing. Today we are in the midst of the parallel testing, ensuring that we mirror the current statements or know exactly why we have differences.

Already, we are defining enhancements and future uses of the system. We need new reports to support detailed review of compensation statements and to analyze the success of programs. We have new plans for different types of business developers and for others in the organization with incentive compensation. We have new data sources to be integrated so that prospective and booked projects can be factored into the report set.

Our goal, at the outset, was to get to parallel testing in three months, assuming our resources were available full-time. Starting at the end of January and being in parallel testing today got us close; we lost some time because client engagements pulled away two of our resources, one part-time and one full-time. Targeting 90 days for an initial implementation is quite feasible.

Team

The most important people on our team were the accountants and sales plan designers. They are the ones who know the ins and outs of the current plan and all the exceptions that apply. Going forward, they are the people who will continue to administer the plans and system. We also identified a secondary group of VIPs; business developers, managers, and executives to be involved as they are on the sharp end of the ICM system.

Our implementation team consisted of three to four resources: a solution architect who drove the design and calculation development, a developer who was responsible for data integration and report development, a business analyst for requirements gathering and system testing, and a project manager who also moonlighted as a business analyst.

Benefits

We expect to receive many benefits from implementing Cognos ICM. We expect the accuracy and consistency of our compensation statements to improve. Accenture, Deloitte, and Gartner estimate that variable compensation overpayments range from 2% to 8%; a company with $30M in incentive compensation will therefore overpay between $600,000 and $2,400,000 every year. During the development process we identified issues with the current commission statements that needed correction.

Using Cognos ICM will improve incentive compensation visibility and transparency. Our business developers can review their commission statements throughout the month to ensure they are credited for the correct transactions. They can quickly identify where they stand in terms of accounts receivable, for which they are penalized. The sales managers can see how their teams are doing and who needs assistance. Our management team can perform what-if analyses to understand plan changes.

Amongst the biggest benefits across the board will be time. Our Business Developers and General Managers can reduce their shadow accounting time. Our accounting team can reduce the amount of time they spend on data integration and cleanup, manually generating compensation statements, along with the amount of time they spend resolving errors and issues.

Challenges

Going into this, we knew one of the problems we would face was resource availability. For a consulting company like Perficient this is a great problem to have: our Cognos ICM resources are engaged on client implementation projects. It is always said that the cobbler’s children have no shoes.

The second challenge of implementing Cognos ICM is exceptions. For the most part, implementing an incentive compensation solution is simple and the project sponsors will express a desire for it to be simpler. Then all the exceptions will come to light that need to be handled. We found a number of exceptions after beginning the project, but because of the power of Cognos ICM we were able to handle them and reduce the manual changes the accounting team needed to make.

The other challenge we faced was the data. The data coming out of our systems supports its original purpose but is often lacking for other uses. We needed to integrate and cleanse the data, all processes the accounting team had done manually, in order to have it flow through the ICM system. As we used the Cloud version of Cognos ICM, we leveraged staging and intra-system imports to smooth the integration process.

Finding Out More

Perficient will have a booth at the IBM Vision 2015 conference, which will feature Cognos ICM heavily. I will be there and look forward to meeting with you if you plan on attending. If you’re at the event, stop by and chat for a while. You can also leave me a comment in the box below. I look forward to hearing from you.

Data & Security Top of Mind for CIOs

Of the top 10 concerns of CIOs and CTOs, as reported in Janco Associates’ annual review, Consolidation of Legacy Data and Big Data both show up in the top 5 and have moved up substantially from prior years’ surveys. Furthermore, in Forbes’ Top 10 Strategic CIO Issues For 2015, “Drive Customer-Centric Innovation Throughout Your Organization” comes in at #1.

This shows that CIOs and CTOs are becoming increasingly aware that they are in the hot seat for fixing their data mess. It is also a growing justification for introducing the Chief Data Officer role, most of the time reporting directly to the CEO. The steady increase in concern also points to the urgency of becoming data-driven organizations in order to effectively support business innovation and corporate objectives tied directly to the bottom line.

If you look at the recent security breaches at Sony and elsewhere, it is clear that data and security are intertwined issues and big impediments to digital business. Consequently, we also see the injection of predictive analytics into this discussion. For any real transformation to take place, especially around customer-centricity, organizations first need to become data-driven and must focus on addressing data holistically, from a process, people, and platform standpoint.

Posted in News