Perficient Enterprise Information Solutions Blog


Posts Tagged ‘bi’

DevOps Considerations for Big Data

Big Data is on everyone’s mind these days. Creating an analytical environment involving Big Data technologies is exciting and complex: new technology, and new ways of looking at data that would otherwise remain dark or unavailable. The exciting part of implementing a Big Data solution is making it production ready.

Once the enterprise comes to rely on the solution, dealing with typical production issues is a must. Expanding the data lakes, and creating multiple applications that access and change the data and deploy new statistical learning solutions, can hurt overall platform performance. In the end, user experience and trust will become an issue if the environment is not managed properly. Models which used to run in minutes may take hours or days, depending on the data changes and algorithm changes deployed. Having the right DevOps process framework is important to the success of Big Data solutions.

In many organizations the Data Scientist reports to the business and not to IT. Knowing the business and technological requirements and setting up the DevOps process is key to making the solutions production ready.

Key DevOps Measures for Big Data environment:

  • Data acquisition performance (ingestion to creating a useful data set)
  • Model execution performance (Analytics creation)
  • Modeling platform / Tool performance
  • Software change impacts (upgrades and patches)
  • Development to Production –  Deployment Performance (Application changes)
  • Service SLA Performance (incidents, outages)
  • Security robustness / compliance

 

One of the top issues is Big Data security. How secure is the data, and who has access to it and oversight of it? Putting together a governance framework to manage the data is vital for the overall health and compliance of Big Data solutions. Big Data is just gaining traction, and many of the best practices for Big Data DevOps scenarios have yet to mature.

Cloud BI use cases

Cloud BI comes in different forms and shapes, ranging from just visualization to a full-blown EDW combined with visualization and Predictive Analytics. The truth of the matter is that every niche product vendor offers some unique feature which other product suites do not offer. In most cases you need more than one BI suite to meet all the needs of the Enterprise.

De-centralization definitely helps the business achieve agility and respond to market challenges quickly. By the same token, that is how companies may end up with silos of information across the enterprise.

Let us look at some scenarios where a cloud BI solution is very attractive for departmental use.

Time to Market

Getting the business case built and approved for big CapEx projects is a time-consuming proposition. Wait times for HW/SW and IT involvement mean much longer delays in scheduling the project. Not to mention the push back to use the existing reports or wait for the next release, which is allegedly around the corner forever.

 

Deployment Delays

Business users have an immediate need for analysis and decision-making. Typical turnaround for IT to deliver new sources of data is anywhere between 90 and 180 days. This is an absolute killer for a business which wants the data now for analysis. Spreadsheets are still the top BI tool for just this reason. With Cloud BI (not just the tool), business users get not only the visualization and other product features but also data which is not otherwise available. Customer analytics with social media analysis are available as a third-party BI solution. In the case of value-added analytics, there is a business reason to go for these solutions.

 

Tool Capabilities

Power users need ways to slice and dice the data, and need integration of other non-traditional sources (Excel, departmental cloud applications) to produce a combined analysis. Many BI tools come with lightweight integration (mostly push integration) to make this a reality without too much of an IT bottleneck.

So if we can add new capability without much delay and within the departmental budget, where is the rub?

The issue is not looking at the Enterprise Information in a holistic way. Though speed is critical, it is equally important to engage Governance and IT to secure the information and share it appropriately so that it integrates into the Enterprise Data Asset.

As we move into the future of Cloud based solutions, we will be able to solve many of the bottlenecks, but we will also have to deal with the security, compliance and risk mitigation management of leaving the data in the cloud. Forging a strategy to meet the various BI demands of the enterprise with proper Governance will yield the optimum use of resources and solution mix.

Creating Transactional Searches with Splunk

Transactions refer to a “unit of work” or “grouped information” that someone is treating as a “logical” data point or singular target. Transactions are made up of multiple events or actions, and may mean something entirely different when looked at as a group than when examined one at a time.

Using either Splunk Web or its command line interface, you can search for and identify what are referred to as “related raw events” and group them into “one single event”, which you can then denote as “a single Splunk transaction”.

These events can be linked together by fields they have in common. In addition, transactions can be saved as transactional types for later reuse.

Your Splunk transactions can include:

  • Different events from the same source/same host.
  • Different events from different sources/same host.
  • Similar events from different hosts/different sources.

Some Conceptual Examples

To help understand the power of Splunk transactional searches, let’s consider a few conceptual examples for its use:

  • A certain server error triggers several events to be logged
  • All events that occur within a precise time frame
  • Events that share the same host or cookie value
  • Password change attempts that occurred close in time to unsuccessful logins
  • All of the web addresses a particular IP address viewed, over a time range

To use Splunk transactions, you can either call a transaction type (that you configured via the Splunk configuration file: transactiontypes.conf), or define transaction constraints within your search (by setting the search options of the transaction command).

Here is the transaction command syntax:

transaction [<field-list>] [name=<transaction-name>] <txn_definition-opt>* <memcontrol-opt>* <rendering-opt>*

A Splunk transaction search is built around 2 key arguments: a field name (or a comma-delimited list of field names) and your name for the transaction. The square brackets in the syntax above mark both as optional, and several other optional arguments can follow.

Field Name/List

The field list is a string value made up of 1 or more field names whose values you want Splunk to use for grouping events into transactions.

Transaction Name

This is the ID (name) by which your transaction will be referred to or the name of a transaction type from transactiontypes.conf.

Optional Arguments

If other configuration arguments (such as maxspan) are provided in your Splunk search, they override the values specified for those parameters in the transaction definition (within the transactiontypes.conf file). If a parameter is specified in neither place, Splunk will use the default value.
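The override order described above can be sketched in a few lines of Python. This is only an illustration of the precedence (search arguments first, then the transaction type definition, then defaults); the option values and the `resolve_options` helper are hypothetical and are not part of Splunk.

```python
from collections import ChainMap


def resolve_options(search_args, type_definition, defaults):
    """Return the effective options; earlier mappings win over later ones."""
    return dict(ChainMap(search_args, type_definition, defaults))


# Hypothetical values, chosen only to show which layer wins.
defaults = {"maxspan": "-1", "maxpause": "-1", "maxevents": "1000"}
type_definition = {"maxspan": "5m", "maxpause": "30s"}  # from transactiontypes.conf
search_args = {"maxspan": "90s"}                        # typed into the search

effective = resolve_options(search_args, type_definition, defaults)
```

Here `maxspan` comes from the search, `maxpause` from the transaction type, and `maxevents` falls through to the default.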

Here is an example

A simple example of a Splunk transaction might be to define a transaction that groups Cognos TM1 ERRORS that appear in a message log that have the same value for the field “date_month” (in other words errors that occur in the same month) and with a maximum span of 90 seconds into a transaction:

sourcetype=tm1* ERROR | transaction date_month maxspan=90s
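To make the grouping idea concrete, here is a rough Python sketch of what "same field value, within a maximum span" means. It is a simplified model and not how Splunk implements the transaction command; the `group_transactions` function and the sample events are made up for illustration.

```python
from datetime import datetime, timedelta


def group_transactions(events, field, maxspan):
    """Group events sharing `field`; a group never spans more than `maxspan`."""
    open_txns = {}  # field value -> transaction currently being built
    closed = []     # transactions that hit the span limit
    for event in sorted(events, key=lambda e: e["time"]):
        key = event[field]
        txn = open_txns.get(key)
        # Close the open transaction if this event would stretch it too far.
        if txn is not None and event["time"] - txn[0]["time"] > maxspan:
            closed.append(txn)
            txn = None
        if txn is None:
            txn = []
            open_txns[key] = txn
        txn.append(event)
    closed.extend(open_txns.values())
    return closed


base = datetime(2014, 1, 15, 9, 0, 0)
events = [
    {"time": base, "date_month": "january", "raw": "ERROR a"},
    {"time": base + timedelta(seconds=30), "date_month": "january", "raw": "ERROR b"},
    {"time": base + timedelta(seconds=200), "date_month": "january", "raw": "ERROR c"},
]
transactions = group_transactions(events, "date_month", timedelta(seconds=90))
```

The first two errors land in one transaction because they are 30 seconds apart; the third starts a new one because it falls outside the 90 second span.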


As always, never stop learning…

Sub-Searching – with Splunk

You’ll find that it is pretty typical to utilize the concept of sub-searching in Splunk.

A “sub search” is simply a “search within a search” or, a search that uses another search as an argument. Sub searches in Splunk must be contained in square brackets and are evaluated first by the Splunk interpreter.

Think of a sub search as being similar to a SQL subquery (a subquery is a SQL query nested inside a larger query).

Sub searches are mainly used for three purposes:

  • Parametrization (of a search, using the output of another search)
  • Appending (running a separate search, but stitching the output to the first search using the Splunk append command).
  • Conditions (To create a conditional search where you only see results of your search if it meets the criteria or perhaps threshold of the sub-search).

Normally, you’ll use a sub-search to take the results of one search and use them in another search all in a single Splunk search pipeline. Because of how this works, the second search must be able to accept arguments; such as with the append command (as I’ve already mentioned).

Parametrization

sourcetype=TM1* ERROR [search earliest=-30d | top limit=1 date_mday | fields + date_mday]

The above Splunk search utilizes a sub search as a parametrized search of all TM1 logs indexed within a Splunk instance that have “error” events. The sub search (enclosed in the square brackets) filters the search first to the past 30 days and then to the day which had the most events.
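If it helps to see the "inner search runs first" idea outside of Splunk, here is a small Python sketch. It only mimics the logic (find the busiest day, then use it as a filter) and leaves out the 30 day time window; the function names and sample events are hypothetical.

```python
from collections import Counter


def top_value(events, field):
    """Inner search: the single most common value of `field`."""
    counts = Counter(event[field] for event in events)
    return counts.most_common(1)[0][0]


def parametrized_search(events, field, keyword):
    """Outer search: keep keyword matches whose `field` equals the inner result."""
    busiest = top_value(events, field)  # evaluated first, like the bracketed part
    return [e for e in events if keyword in e["raw"] and e[field] == busiest]


events = [
    {"date_mday": 3, "raw": "ERROR x"},
    {"date_mday": 3, "raw": "INFO y"},
    {"date_mday": 3, "raw": "ERROR z"},
    {"date_mday": 7, "raw": "ERROR w"},
]
matches = parametrized_search(events, "date_mday", "ERROR")
```

Day 3 has the most events, so only the two ERROR events from day 3 come back.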

Appending

The Splunk append command can be used to append the results of a sub-search to the results of a current search:

sourcetype=TM1* ERROR | stats dc(date_year), count by sourcetype | append [search sourcetype=TM1* | top 1 sourcetype by date_year]

The above Splunk search utilizes a sub search with an append command to combine 2 TM1 server log searches; these search through all indexed TM1 sources for “Error” events. The first search yields a count of events by TM1 source by year; the second (sub) search returns the top (or most active) TM1 source by year. The results of the 2 searches are then appended.
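Appending is the simplest of the three uses to picture: two result sets are produced independently and the second is stitched onto the end of the first. A minimal Python sketch, with made-up helper names and sample data:

```python
from collections import Counter


def count_by(events, field):
    """Main search: one result row per value of `field`, with its count."""
    counts = Counter(event[field] for event in events)
    return [{field: value, "count": n} for value, n in sorted(counts.items())]


def top_one(events, field):
    """Sub search: a single row for the most common value of `field`."""
    value, n = Counter(event[field] for event in events).most_common(1)[0]
    return [{field: value, "count": n, "note": "top"}]


def append(main_results, sub_results):
    """Stitch the sub search rows onto the end of the main results."""
    return main_results + sub_results


events = [
    {"sourcetype": "tm1server"},
    {"sourcetype": "tm1server"},
    {"sourcetype": "tm1admin"},
]
combined = append(count_by(events, "sourcetype"), top_one(events, "sourcetype"))
```

The rows from the two searches keep their own columns, which is why appended results often look a little uneven in a table.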

 Conditional

sourcetype=access_* | stats dc(clientip), count by method | append [search sourcetype=access_* | where action="addtocart" | top 1 clientip by method]

The above Splunk search counts the number of different IP addresses which accessed a server and also finds the user who accessed the server the most for each type of page request (method). It is modified with a “where clause” to limit the counts to only those that are “addtocart” actions. (In other words, which user added the most to his online shopping cart, whether they actually purchased anything or not.)

Output Settings for Sub-searches

When performing Splunk sub searches you will often utilize the format command. This command takes the results of a sub-search and formats them into a single result.

Depending upon the search pipeline, the results returned may be numerous, which will impact the performance of your search. To remedy this you can change the number of results that the format command operates over in-line with your search by appending the following to the end of your sub-search:

| format maxresults=<integer>

I recommend that you take a very conservative approach and utilize the Splunk limits.conf file to enforce limits on all your sub-searches. This file exists in the $SPLUNK_HOME/etc/system/default/ folder (for global settings) or, for localized control, you may find (or create) a copy in the $SPLUNK_HOME/etc/system/local/ folder.

The file controls all Splunk searches (providing it is coded correctly, based upon your environment) but also contains a section specific to Splunk sub-searches, titled “subsearch”.

Within this section, there are 3 important settings:

  • maxout (this is the maximum number of results to return from a subsearch. The default is 100).
  • maxtime (this is the maximum number of seconds to run a subsearch before finalizing. Defaults to 60).
  • ttl (this is the time, in seconds, to cache a given subsearch’s results. The default is 300).
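To picture what these three settings look like in the file and what maxout does to a result set, here is a small Python sketch. The stanza text uses the default values quoted above; the parsing and the `cap_results` helper are my own illustration, since Splunk reads limits.conf itself and you would never parse it this way in practice.

```python
import configparser

# A stanza shaped like the "subsearch" section, using the defaults quoted above.
LIMITS_TEXT = """
[subsearch]
maxout = 100
maxtime = 60
ttl = 300
"""


def load_subsearch_limits(text):
    """Read the three subsearch settings as integers."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["subsearch"]
    return {name: section.getint(name) for name in ("maxout", "maxtime", "ttl")}


def cap_results(results, limits):
    """Model of maxout: anything past the limit is dropped."""
    return results[: limits["maxout"]]


limits = load_subsearch_limits(LIMITS_TEXT)
capped = cap_results(list(range(250)), limits)
```

A sub-search that would have produced 250 results hands only the first 100 to the outer search, which is worth remembering when the numbers in a report look low.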

Splunk On!

 

An Architectural Approach to Cognos TM1 Design

Over time, I’ve written about keeping your TM1 model design “architecturally pure”. What this means is that you should strive to keep a model’s “areas of functionality” distinct within your design.

Common Components

I believe that all TM1 applications, for example, are made of only 4 distinct “areas of functionality”. They are absorption (of key information from external data sources), configuration (of assumptions about the absorbed data), calculation (where the specific “magic” happens; i.e. business logic is applied to the source data using the set assumptions) and consumption (of the information that has been processed by the application and is ready to be reported on).
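The same separation can be shown in miniature outside of TM1. The Python sketch below is only an analogy: four small functions, one per area, with invented names, sample data and a made-up growth assumption. None of it is TM1 code.

```python
def absorb(raw_lines):
    """Absorption: load key information from an external source."""
    rows = []
    for line in raw_lines:
        account, amount = line.split(",")
        rows.append({"account": account.strip(), "amount": float(amount)})
    return rows


def configure():
    """Configuration: assumptions about the absorbed data (hypothetical value)."""
    return {"growth_rate": 0.25}


def calculate(rows, assumptions):
    """Calculation: apply business logic using the configured assumptions."""
    factor = 1 + assumptions["growth_rate"]
    return [{"account": r["account"], "forecast": r["amount"] * factor} for r in rows]


def consume(results):
    """Consumption: present processed information for reporting."""
    return [f"{r['account']}: {r['forecast']:.2f}" for r in results]


report = consume(calculate(absorb(["Sales, 200", "Travel, 40"]), configure()))
```

Because each area only talks to its neighbour through plain data, any one of them could be swapped out or rebuilt without touching the others.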

Some Advantages

Keeping functional areas distinct has many advantages:

  • Reduces complexity and increases sustainability within components
  • Reduces the possibility of one component negatively affecting another
  • Enables the probability of reuse of the particular (distinct) components
  • Promotes a technology independent design; meaning components can be built using the technology that best fits their particular objective
  • Allows components to be designed, developed and supported by independent groups
  • Diminishes duplication of code, logic, data, etc.
  • Etc.

Resist the Urge

There is always a tendency to “jump in” and “do it all” using a single tool or technology or, in the case of Cognos TM1, a few enormous cubes. Today, with every release of software, there are new “package connectors” that allow you to directly connect (even external) system components. In addition, you may “understand the mechanics” of how a certain technology works, which will allow you to “build” something. But without comprehensive knowledge of architectural concepts, you may end up with something that does not scale, has unacceptable performance or is costly to sustain.

Final Thoughts

Some final thoughts:

  • Try whiteboarding the functional areas before writing any code
  • Once you have your “like areas” defined, search for already existing components that may meet your requirements
  • If you do decide to “build new”, try to find other potential users for the new functionality. Could you partner and co-produce (and thus share the costs of) a component that you both can use?
  • Before building a new component, “try out” different technologies. Which best serves the needs of the component’s objectives? (A rule of thumb: if you can find more than 3 other technologies or tools that better fit your requirements than the technology you planned to use, you’re in trouble!)

And finally:

Always remember, just because you “can” doesn’t mean you “should”.

A Practice Vision

Vision

Most organizations today have had successes implementing technology and they are happy to tell you about it. From a tactical perspective, they understand how to install, configure and use whatever software you are interested in. They are “practitioners”. But how many can bring a “strategic vision” to a project or to your organization in general?

An “enterprise” or “strategic” vision is based upon an “evolutionary roadmap” that starts with the initial “evaluation and implementation” (of a technology or tool), continues with “building and using” and finally (hopefully) arrives at the organization, optimization and management of all of the earned knowledge (with the tool or technology). You should expect that whoever you partner with can explain what their practice vision or methodology is or, at least, talk to the “phases” of the evolution process:

Evaluation and Implementation

The discovery and evaluation that takes place with any new tool or technology is the first phase of a practice’s evolution. A practice should be able to explain how testing is accomplished and what it covers. How did they determine whether the tool or technology to be used will meet or exceed your organization’s needs? Once a decision is made, are they practiced at the installation, configuration and everything else that may be involved in deploying the new tool or technology for use?

Build, Use, Repeat

Once the tool or technology is deployed and “building and using” components with it begins, the efficiency with which these components are developed, as well as their level of quality, will depend upon the level of experience (with the technology) that a practice possesses. Typically, “building and using” is repeated with each successful “build”, so how many times has the practice successfully used this technology? By human nature, once a solution is “built” and seems correct and valuable, it will be saved and used again. Hopefully, this solution will have been shared as a “knowledge object” across the practice. Although most may actually reach this phase, it is not uncommon to find:

  • Objects with similar or duplicate functionality (they reinvented the wheel over and over).
  • Poor naming and filing of objects (no one but the creator knows it exists or perhaps what it does)
  • Objects not shared (objects visible only to specific groups or individuals, not the entire practice)
  • Objects that are obsolete or do not work properly or optimally are being used.
  • Etc.

Manage & Optimization

At some point, usually while (or after a certain number of) solutions have been developed, a practice will “mature its development or delivery process” to the point that it will begin investing time and perhaps dedicate resources to organize, manage and optimize its developed components (i.e. “organizational knowledge management”, sometimes known as IP or intellectual property).

You should expect a practice to have a recognized practice leader and a “governing committee” to help identify and manage knowledge developed by the practice and:

  • inventory and evaluate all known (and future) knowledge objects
  • establish appropriate naming standards and styles
  • establish appropriate development and delivery standards
  • create, implement and enforce a formal testing strategy
  • continually develop “the vision” for the practice (and perhaps the industry)

 

More

As I’ve mentioned, a practice needs to take a strategic or enterprise approach to how it develops and delivers and to do this it must develop its “vision”. A vision will ensure that the practice is leveraging its resources (and methodologies) to achieve the highest rate of success today and over time. This is not simply “administrating the environment” or “managing the projects” but involves structured thought, best practices and continued commitment to evolved improvement. What is your vision?

Testing and Splunk

Tests are commonly categorized by where they are added in the software development process, or by how specific the test is.

Testing levels are classified by the test’s objectives. The common levels of testing are: unit, integration, system, acceptance, performance and regression.

Unit testing

Unit (or module) testing refers to testing that verifies a specific “section of code”, usually at a very basic level.

Integration

This type of testing focuses on validating the linkage between components within a solution.

Component interface

This kind of testing emphasizes the information that is passed between components in a solution (not to be confused with integration testing, which is focused on the actual component linkage).

System

System testing (often referred to as your “end-to-end” testing) refers to a completely integrated solution test – verifying that the solution really does meet requirements.

Acceptance

Acceptance testing is (perhaps) the last step, phase or “level” in your testing effort. This is when your solution is actually “distributed” to your user community for (hopefully) acceptance.

Performance

The objective of performance testing is to determine how well a particular component or an entire solution will perform under expected workloads.
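As a generic illustration of that objective (not tied to Splunk or to any particular tool), a performance check boils down to running a component against a representative workload several times and recording how long it takes. The `measure` helper below is a hypothetical sketch using only the Python standard library.

```python
import statistics
import time


def measure(func, workload, repeats=5):
    """Run `func(workload)` several times and summarize the wall-clock timings."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(workload)
        timings.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


# Example workload: sorting 1,000 numbers that start in reverse order.
summary = measure(sorted, list(range(1000, 0, -1)), repeats=3)
```

The median is usually the figure to compare against your expected workload target, with the maximum as a rough indicator of how bad an unlucky run can get.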

Splunk and its amazing “Performance Testing Kit”

“True to form”, Splunk offers a downloadable “test kit” to help with your Splunk performance testing and tuning. “Splunkit” is an extendable app that streamlines an organization’s practice of performance testing by:

  • Automatically generating test data
  • Creating “patterned searches” that simulate a Splunk user running command line searches
  • Producing sets of benchmark measurements and informational statistics

Splunkit is configurable – you can set the speed at which data is generated or use your own custom data. You can also set the number of simulated users and their specific usage patterns. Splunkit will work without a complicated setup – even with complicated directory structures or deployment configurations.

IBM OpenPages GRC Platform – modular methodology

The OpenPages GRC platform includes 5 main “operational modules”. These modules are each designed to address specific organizational needs around Governance, Risk, and Compliance.

Operational Risk Management module “ORM”

The Operational Risk Management module is a document and process management tool which includes a monitoring and decision support system enabling an organization to analyze, manage, and mitigate risk simply and efficiently. The module automates the process of identifying, measuring, and monitoring operational risk by combining all risk data (such as risk and control self-assessments, loss events, scenario analysis, external losses, and key risk indicators (KRI)) into a single place.

Financial Controls Management module “FCM”

The Financial Controls Management module reduces time and resource costs associated with compliance for financial reporting regulations. This module combines document and process management with awesome interactive reporting capabilities in a flexible, adaptable easy-to-use environment, enabling users to easily perform all the necessary activities for complying with financial reporting regulations.

Policy and Compliance Management module “PCM”

The Policy and Compliance Management module is an enterprise-level compliance management solution that reduces the cost and complexity of compliance with multiple regulatory mandates and corporate policies. This module enables companies to manage and monitor compliance activities through a full set of integrated functionality:

  • Regulatory Libraries & Change Management
  • Risk & Control Assessments
  • Policy Management, including Policy Creation, Review & Approval and Policy Awareness
  • Control Testing & Issue Remediation
  • Regulator Interaction Management
  • Incident Tracking
  • Key Performance Indicators
  • Reporting, monitoring, and analytics

IBM OpenPages IT Governance module “ITG”

This module aligns IT services, risks, and policies with corporate business initiatives, strategies, and operational standards, allowing the management of internal IT control and risk according to the business processes they support. In addition, this module unites “silos” of IT risk and compliance, delivering visibility, better decision support, and ultimately enhanced performance.

IBM OpenPages Internal Audit Management module “IAM”

This module provides internal auditors with a view into an organization’s governance, risk, and compliance, affording the chance to supplement and coexist with broader risk and compliance management activities throughout the organization.

One Solution

The IBM OpenPages GRC Platform modules (“ORM”, “FCM”, “PCM”, “ITG” and “IAM”) work together to deliver a superior solution for Governance, Risk, and Compliance. More to come!

The Installation Process – IBM OpenPages GRC Platform

When preparing to deploy the OpenPages platform, you’ll need to follow these steps:

  1. Determine which server environment you will deploy to – Windows or AIX.
  2. Determine your topology – how many servers will you include as part of the environment? Multiple application servers? 1 or more reporting servers?
  3. Perform the installation of the OpenPages prerequisite software for the chosen environment and for each server’s designed purpose (database, application or reporting).
  4. Perform the OpenPages installation, being conscious of the software that is installed as part of that process.

Topology

Depending upon your needs, you may find that you’ll want to use separate servers for your application, database and reporting servers. In addition, you may want to add additional application or reporting servers to your topology.


After the topology is determined, you can use the following information to prepare your environment. I recommend clean installs, meaning starting with fresh or new machines; VMs are just fine (“The VMWare performance on a virtualized system is comparable to native hardware. You can use the OpenPages hardware requirements for sizing VM environments” – IBM).

(Note – this is if you’ve chosen to go Oracle rather than DB2):

MS Windows Servers

All servers that will be part of the OpenPages environment must have the following installed before proceeding:

  • Microsoft Windows Server 2008 R2 and later Service Packs (64-bit operating system)
  • Microsoft Internet Explorer 7.0 (or 8.0 in Compatibility View mode)
  • A file compression utility, such as WinZip
  • A PDF reader (such as Adobe Acrobat)

The Database Server

In addition to the above “all servers” software, your database server will require the following software:

  • Oracle 11gR2 (11.2.0.1) and any higher Patch Set – the minimum requirement is Oracle 11.2.0.1 October 2010 Critical Patch Update.

Application Server(s)

Again, in addition to the above “all servers” software, the server that hosts the OpenPages application modules should have the following software installed:

  • JDK 1.6 or greater, 64-bit Note: This is a prerequisite only if your OpenPages product does not include WebLogic Server.
  • Application Server Software (one of the following two options)

o   IBM Websphere Application Server ND 7.0.0.13 and any higher Fix Pack Note: Minimum requirement is Websphere 7.0.0.13.

o   Oracle WebLogic Server 10.3.2 and any higher Patch Set Note: Minimum requirement is Oracle WebLogic Server 10.3.2. This is a prerequisite only if your OpenPages product does not include Oracle WebLogic Server.

  • Oracle Database Client 11gR2 (11.2.0.1) and any higher Patch Set

Reporting Server(s)

The server that you intend to host the OpenPages CommandCenter must have the following software installed (in addition to the above “all servers” software):

  • Microsoft Internet Information Services (IIS) 7.0 or Apache HTTP Server 2.2.14 or greater
  • Oracle Database Client 11g R2 (11.2.0.1) and any higher Patch Set
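When you are checking servers against "minimum version and any higher" requirements like the ones listed above, remember that dotted versions have to be compared number by number, not as text (as text, "2.2.9" looks higher than "2.2.14"). Here is a small Python sketch of that check. The helper names are my own and the installed versions passed in are examples, not values read from a real server.

```python
def parse_version(text):
    """Turn a dotted version such as '11.2.0.1' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """True when `installed` is the minimum version or anything higher."""
    return parse_version(installed) >= parse_version(minimum)


# Minimum versions quoted in the prerequisite lists above.
MINIMUMS = {
    "Oracle Database": "11.2.0.1",
    "IBM WebSphere Application Server ND": "7.0.0.13",
    "Oracle WebLogic Server": "10.3.2",
    "Apache HTTP Server": "2.2.14",
}

websphere_ok = meets_minimum("7.0.0.15", MINIMUMS["IBM WebSphere Application Server ND"])
apache_ok = meets_minimum("2.2.9", MINIMUMS["Apache HTTP Server"])
```

The first check passes and the second fails, even though a plain string comparison would have let Apache 2.2.9 through.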

During the OpenPages Installation Process

As part of the OpenPages installation, the following is installed automatically:

 

For Oracle WebLogic Server & IBM WebSphere Application Server environments:

  • The OpenPages application
  • Fujitsu Interstage Business Process Manager (BPM) 10.1
  • IBM Cognos 10.2
  • OpenPages CommandCenter
  • JRE 1.6 or greater

If your OpenPages product includes the Oracle WebLogic Server:

  • Oracle WebLogic Server 10.3.2

If your OpenPages product includes the Oracle Database:

  • Oracle Database Server Oracle 11G Release 2 (11.2.0.1) Standard Edition with October 2010 CPU Patch (on a database server system)
  • Oracle Database Client 11g Release 2 (11.2.0.1) with October 2010 CPU Patch applied 64-bit (on an application server system)
  • Oracle Database Client 11g Release 2 (11.2.0.1) with October 2010 CPU Patch applied 32-bit (on a reporting server system)

 Thanks!

IBM OpenPages Start-up

In the beginning…

OpenPages was a company “born” in Massachusetts, providing Governance, Risk, and Compliance software and services to customers. Founded in 1996, OpenPages had more than 200 customers worldwide including Barclays, Duke Energy, and TIAA-CREF. On October 21, 2010, OpenPages was officially acquired by IBM:

http://www-03.ibm.com/press/us/en/pressrelease/32808.wss

What is it?

OpenPages provides a technology driven way of understanding the full scope of risk an organization faces. In most cases, there is extreme fragmentation of a company’s risk information – like data collected and maintained in numerous disparate spreadsheets – making aggregation of the risks faced by a company extremely difficult and unmanageable.

Key Features

IBM’s OpenPages GRC Platform can help by providing many capabilities to simplify and centralize compliance and risk management activities. The key features include:

  • Provides a shared content repository that can (logically) present the processes, risks and controls in many-to-many and shared relationships.
  • Supports the import of corporate data and maintains an audit trail ensuring consistent regulatory enforcement and monitoring across multiple regulations.
  • Supports dynamic decision making with its CommandCenter interface, which provides interactive, real-time executive dashboards and reports with drill-down.
  • Is simple to configure and localize with detailed user-specific tasks and actions accessible from a personal browser based home page.
  • Provides for Automation of Workflow for management assessment, process design reviews, control testing, issue remediation and sign-offs and certifications.
  • Utilizes Web Services for integration. The OpenPages OpenAccess API interoperates with leading third-party applications to enhance policies and procedures with actual business data.

Understanding the Topology

The OpenPages GRC Platform consists of the following 3 components:

  • 1 database server
  • 1 or more application servers
  • 1 or more reporting servers
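The three rules in that list are easy to turn into a quick sanity check when you are planning an environment. The Python sketch below is purely illustrative; the `Topology` class and the host names are hypothetical and have nothing to do with the OpenPages installer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topology:
    """A planned environment, described by the host names in each role."""
    database_servers: List[str] = field(default_factory=list)
    application_servers: List[str] = field(default_factory=list)
    reporting_servers: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return a list of rule violations; empty means the plan looks valid."""
        issues = []
        if len(self.database_servers) != 1:
            issues.append("exactly 1 database server is expected")
        if not self.application_servers:
            issues.append("at least 1 application server is expected")
        if not self.reporting_servers:
            issues.append("at least 1 reporting server is expected")
        return issues


small = Topology(["db01"], ["app01"], ["rpt01"])
incomplete = Topology(["db01"], ["app01", "app02"], [])
```

The `small` plan passes, while `incomplete` is flagged because it has no reporting server.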

Database Server

The database is the centralized repository for metadata, (versions of) application data, and access control. OpenPages requires a set of database users and a tablespace (referred to as the “OpenPages database schema”). These database components install automatically during the OpenPages application installation, configuring all of the required elements. You can use either Oracle or DB2 for your OpenPages GRC Platform repository.

 Application Server(s)

The application server is required to host the OpenPages applications. The application server runs the application modules, and includes the definition and administration of business metadata, UI views, user profiles, and user authorization.

 Reporting Server

The OpenPages CommandCenter is installed on the same computer as IBM Cognos BI and acts as the reporting server.

Next Steps

An excellent next step would be to visit the IBM site and review the available slides and whitepapers. After that, stay tuned to this blog!