Content Solutions Articles / Blogs / Perficient | https://blogs.perficient.com/tag/content-solutions/

IT (Operational) Vs. BT (Business Technology) Investments | https://blogs.perficient.com/2015/02/28/it-operational-vs-bt-business-technology-investments/ | Sat, 28 Feb 2015

IT spending is primarily focused on technologies that run the business: operations. With new ways of doing business, technology platforms decide the winners and losers; think of the typical brick-and-mortar retailer versus the online store. If you look at the CIO's budget, more than 70% goes to operational systems, infrastructure, and keeping-the-lights-on applications, and the rest is spent on customer-facing applications and systems.

With digital transformations happening at many enterprises, IT budgets are shifting to track the trend. Customer experience is one of the key strategies for successful companies. Armed with smartphones and tools for accessing information, customers are one step ahead of traditional organizations. Investing in new technologies like Big Data, fast analytics, and proactive customer experience strategies built on converging technologies is no longer futuristic; these capabilities have to be fully functional now.

[Image: WSJ_01]

CIOs are looking for ways to invest in new technologies that enhance customer experience and leverage data (internal and external) to deliver it accurately, not just to run operational systems. As more and more CIOs get invited to the business leadership table, business technology investment becomes a strategic asset to manage and leverage in delivering a greater customer experience (see the spending shift in "CIOs Face the Age of the Customer").


Connect with us on LinkedIn here

Three Big Data Business Case Mistakes | https://blogs.perficient.com/2014/11/04/three-big-data-business-case-mistakes/ | Tue, 04 Nov 2014

Tomorrow I will be giving a webinar on creating business cases for Big Data. One of the reasons for the webinar is that there is very little information available on creating a Big Data business case. Most of what is available boils down to "trust me, Big Data will be of value," and basically states:

More information, loaded into a central Hadoop repository, will enable better analytics, thus making our company more profitable.  

Although this statement seems logically true, and most analytical companies have accepted it, it illustrates the three most common mistakes we see in creating a business case for Big Data.

The first mistake is not directly linking the business case to the corporate strategy. The corporate strategy is the overall approach the company is taking to create shareholder value. By linking the business case to the objectives in the corporate strategy, you can illustrate the strategic nature of Big Data and show how the initiative will support overall company goals.

The second mistake is not quantifying the benefits of Big Data analytics. Yes, quantification in a business case is not an exact science, but it expresses the overall benefit of the program in the terms that matter most to executives: financial ones. Big Data projects cost millions of dollars; shouldn't we illustrate to our executives that there will be a return on this investment? Quantifying the financial benefits provides the needed support that Big Data is a wise investment.
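
As a purely illustrative sketch (the figures below are hypothetical, not drawn from any real project), even a simple net-present-value calculation in a few lines of Python can frame the benefit in the financial terms executives expect:

# Hypothetical figures for illustration only; replace with your own estimates.
initial_investment = 2_500_000      # platform, licenses, implementation
annual_benefit = 1_200_000          # e.g., churn reduction plus marketing uplift
annual_run_cost = 300_000           # support, infrastructure, staffing
discount_rate = 0.10                # corporate hurdle rate
years = 5

npv = -initial_investment
for year in range(1, years + 1):
    npv += (annual_benefit - annual_run_cost) / (1 + discount_rate) ** year

print(f"{years}-year NPV: ${npv:,.0f}")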

The third mistake is not addressing the "who," that is, who will be using the system to arrive at the benefit. In most cases, this is the data scientist. However, most organizations are facing a shortage of data scientists. Not addressing this staffing concern, of which your CIO will definitely be aware, can sink a good business case early in the approval process.

In my webinar tomorrow, Creating a Business Case for Big Data, we will show how to address these mistakes and create a rock solid business case for Big Data Analytics.   See you there!

 

The Best Way to Limit the Value of Big Data | https://blogs.perficient.com/2014/10/06/the-best-way-to-limit-the-value-of-big-data/ | Mon, 06 Oct 2014

A few years back I worked for a client that was implementing cell-level security on every data structure within their data warehouse. They had nearly 1,000 tables and 200,000 columns — yikes! Talk about administrative overhead. The logic was that data access should only be given on a need-to-know basis; users would have to request access to specific tables and columns.

Need-to-know is a term frequently used in military and government institutions that refers to granting cleared individuals access to sensitive information. This is a good concept, but the key here is the part about "SENSITIVE information": the information has to be classified first, and then need-to-know (for cleared individuals) is applied.

Most government documents are not sensitive. This allows administrative resources to focus on the sensitive, classified information. The system for classifying information as Top Secret, Secret, and Confidential has relatively stringent rules, but it also discourages the over-classification of information, because once a document is classified, its use becomes limited.

This same phenomenon is true in the corporate world. The more a set of data is locked down, the less it will be used. Unnecessarily limiting information workers' access to data obviously does not help the overall objectives of the organization. Big Data just magnifies this dynamic: unnecessarily restricting access to Big Data is the best way to limit its value. Lock Big Data down unreasonably, and its value will be severely limited.

Now, this is not to say certain data should not be restricted. Social Security Numbers (SSNs), HIPAA-governed data elements, and account numbers are a few examples. We do need solutions that restrict access to this critical information, but those systems should avoid escalating the controls to information that does not need to be as tightly controlled.

A Classify, Separate, and Secure strategy is quite effective for securing only the critical data elements. Classify information, if possible at the field (column) level, using specific, consistent guidelines that do not unnecessarily restrict information. When we load information into a data reservoir (or data lake), we Separate sensitive information from unrestricted information. This should be executed at the column level in tables; for example, if a table has a field containing SSNs, physically separate it into another table. Masking may also be appropriate, and depending on the other data elements, we may choose not to load the sensitive data columns into our cluster at all. This prevents the security escalation effect that happens when we classify an entire table as sensitive because of just one column of sensitive data. Lastly, we Secure the sensitive information, which may live in another directory or system (like Apache Accumulo). The objective is to focus our efforts on locking down the sensitive information while minimizing the administrative overhead.
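
To make the Separate step concrete, here is a minimal pandas sketch; the table and column names are hypothetical, and a real implementation would work against the reservoir's own tooling rather than in-memory data frames:

import pandas as pd

# Hypothetical source table containing one sensitive column (ssn).
df = pd.DataFrame({
    "customer_id": [1001, 1002],
    "ssn": ["123-45-6789", "987-65-4321"],
    "purchase_amt": [250.00, 99.95],
})

sensitive_cols = ["ssn"]                      # output of the Classify step

# Separate: split sensitive columns into their own table, keyed by customer_id.
secure_table = df[["customer_id"] + sensitive_cols].copy()
open_table = df.drop(columns=sensitive_cols)

# Optional masking: keep only the last four digits in the open table.
open_table["ssn_last4"] = df["ssn"].str[-4:]

print(open_table)
print(secure_table)

The secure_table would then be the only structure that needs heavyweight access controls, which is exactly the point of the strategy.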

KScope14 Session: Empower Mobile Restaurant Operations Analytics | https://blogs.perficient.com/2014/06/24/kscope14-session-empower-mobile-restaurant-operations-analytics/ | Tue, 24 Jun 2014

Perficient is exhibiting and presenting this week at KScope14 in Seattle, WA. On Monday, June 23, I presented my retail-focused solution offering, built upon the success of Perficient's Retail Pathways but using the Oracle suite of products. To fit the discussion within a one-hour window, I chose restaurant operations to represent the solution.

Here is the abstract for my presentation.

Multi-unit, multi-concept restaurant companies face challenging reporting requirements. How should they compare promotion, holiday, and labor performance data across concepts? How should they maximize fraud detection capabilities? How should they arm restaurant operators with the data they need to react to changes affecting day-to-day operations as well as over-time goals? An industry-leading data model, integrated metadata, and prebuilt reports and dashboards deliver the answers to these questions and more. Deliver relevant, actionable mobile analytics for the restaurant industry with an integrated solution of Oracle Business Intelligence and Oracle Endeca Information Discovery.

We have tentatively chosen to brand this offering as Crave – Designed by Perficient. Powered by Oracle. This way we can differentiate this new Oracle-based offering from the current Retail Pathways offering.

[Image: Crave logo]

At its core, Crave leverages the star-schema model of Retail Pathways, but includes updates specifically designed to enable native Oracle Business Intelligence features. We also offer the same set of proven core reports for both in-store and above-store analysis. These core reports, however, were designed with a "mobile first" approach for use by restaurant operators.

[Image: RTLPATH architecture diagram]

A new type of data analysis included in Crave is the Growth-Share Matrix report. Invented in 1970 by Bruce D. Henderson for Boston Consulting Group, this scatter graph ranks items on the basis of their relative market shares and growth rates. By using the contribution margin of each item along with its popularity and popularity growth we are able to identify four types of products.

[Image: Growth-Share Matrix]

  • Stars: Menu items generating strong sales, which may cost a lot to produce.
  • Question Marks: Menu items gaining popularity, which require tweaking or investment to make profitable.
  • Cash Cows: Menu items that are easy to make, low-cost, and responsible for a larger share of profits.
  • Dogs: Menu items that don't sell well, but are perhaps also low cost.

By identifying these classes of menu items we begin a process of evaluating candidates for investment or divestment.
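
As a rough illustration of the classification logic (the menu items, metrics, and median thresholds below are all made up; Crave's actual report is driven by the Retail Pathways model), the four quadrants can be assigned from popularity and popularity growth, with contribution margin carried along to inform the invest/divest call:

import pandas as pd

# Hypothetical menu-item metrics.
items = pd.DataFrame({
    "item": ["Burger", "Seasonal Salad", "Fries", "Liver Plate"],
    "popularity": [0.30, 0.08, 0.40, 0.02],          # share of items sold
    "popularity_growth": [0.12, 0.25, 0.01, -0.05],
    "contribution_margin": [0.55, 0.20, 0.70, 0.15],
})

share_cut = items["popularity"].median()
growth_cut = items["popularity_growth"].median()

def quadrant(row):
    # High share + high growth = Star; low share + high growth = Question Mark;
    # high share + low growth = Cash Cow; low share + low growth = Dog.
    high_share = row["popularity"] >= share_cut
    high_growth = row["popularity_growth"] >= growth_cut
    if high_growth:
        return "Star" if high_share else "Question Mark"
    return "Cash Cow" if high_share else "Dog"

items["class"] = items.apply(quadrant, axis=1)
print(items[["item", "class", "contribution_margin"]])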

The complete KScope14 presentation is available.

Where and How to Learn Splunk | https://blogs.perficient.com/2014/02/23/where-and-how-to-learn-splunk/ | Mon, 24 Feb 2014

“Never become so much of an expert that you stop gaining expertise.” – Denis Waitley

In all professions, and especially information technology (IT), success and marketability depend upon an individual's propensity for continued learning. With Splunk, there exist a number of options for increasing your knowledge and expertise. The following are just a few. We'll start with the obvious choices:

  • Certifications,
  • Formal training,
  • Product documentation and
  • The company’s website.

Certifications

Like most mainstream technologies, Splunk offers various certifications. As of this writing, Splunk categorizes them into the following generalized areas:

The Knowledge Manager

A Splunk Knowledge Manager creates and/or manages knowledge objects that are used in a particular Splunk project, across an organization, or within a practice. Splunk knowledge objects include saved searches, event types, transactions, tags, field extractions and transformations, lookups, workflows, commands, and views. A knowledge manager not only has a thorough understanding of Splunk, its interface, and the general use of search and pivot, but also possesses the "big picture view" required to extend the Splunk environment through the management of the Splunk knowledge object library.

The Administrator

A Splunk Administrator supports the day-to-day "care and feeding" of a Splunk installation. This requires hands-on knowledge of best practices and configuration details, as well as the ability to create and manage Splunk knowledge objects in a distributed deployment environment.

The Architect

The Splunk Architect combines knowledge management expertise, administration know-how, and the ability to design and develop Splunk Apps. Architects must also be able to focus on larger deployments, learning best practices for planning, data collection, sizing, and documentation in a distributed environment.

Supplementary Certifications

In addition to these 3 certifications, there are also a number of supplemental certifications that are only available to Splunk Partners.

Partnering

Splunk currently offers several partnership levels based upon your expertise, experience, or interests. You can become a:

  • Powered Associate Partner
  • Consulting Partner
  • Developing Partner
  • Reselling Partner
  • Service Providing Partner or a
  • Technology Partner

Splunk partners are individuals or organizations that can be an excellent source of knowledge as you progress towards mastering Splunk (eventually, you may want to consider becoming a Splunk Partner yourself). To find out more about the partnering program or to identify a partner, you can use the Splunk website; it is recommended that you establish your own splunk.com account and log in to the website as a "returning Splunker".

Formal Training

The conventional method of Splunk education, instructor-led classes, is offered both virtually and at your site. The complete Splunk curriculum is offered once a month, and all classes include plenty of hands-on exercises focused on:

  • Power users
  • App development
  • Administration
  • Architecting
  • Security

Product Documentation

Perhaps the most obvious learning option is exploring the actual product documentation. Splunk has done a fine job of providing explicit details and working examples that are valuable to users at all levels.

If you go online and visit docs.splunk.com/Documentation/Splunk (and be sure to bookmark it), you can select the Splunk version that you are interested in and examine, at your own speed and on your own schedule:

  • Release notes
  • Tutorials on searching, data models and Pivot, etc.
  • Administration manuals
  • Installation instructions
  • Alerting manuals
  • Dashboards & visualizations
  • Distributed deployments
  • Forwarding
  • Knowledge Manager
  • Module system references
  • Pivoting
  • API reference
  • Search reference
  • Troubleshooting
  • Developing Views & Apps for Splunk Web
  • Distributed searching
  • Getting data in
  • Indexing and clustering
  • Module system user manual
  • Reporting
  • Security
  • Updating your instances

www.Splunk.com

The company's website is online and available 24/7. It is a particularly well-done, searchable portal to all kinds of Splunk-related information, organized into areas such as:

  • Basic product information,
  • Solutions,
  • Industries,
  • Partners,
  • About (Splunk),
  • Support,
  • Services,
  • Resources,
  • Etc.

Also on the website you will find:

Splunk Answers

Engage with everyone within the Splunk community for fast answers to your Splunk questions.

Splunkbase

A searchable knowledge base of Splunk Apps, answers, and advice, where you can earn points and badges to increase your Splunkbase "rank", making a name for yourself on the Splunkbase leaderboard as your expertise grows (you can also access splunk.com and the Splunk documentation sites from that location).

The Support Portal

Accessible from the main Splunk website splunk.com (under the menu item “Support”), the support portal is where you can become a “Splunker” by creating a Splunk.com account to:

  • Download Splunk software and updates
  • Participate in online forums and submit support issues
  • Access and download application guides and whitepapers
  • Access future product roadmaps and add comments
  • Get technical updates, the SplunkVox newsletter and plenty of other “goodies”

The Splexicon

The word "lexicon" derives from the Greek and means "of or for words". The Splunk lexicon (or "Splexicon") defines all of the technical terms that are specific to Splunk, and the definitions include links to related information within the Splunk product documentation. The Splexicon can be used as an "intelligent interface" to the online Splunk manuals, allowing you to quickly look up Splunk terminology and then access the product documentation details pertaining to that term.

“How-to” Tutorials

As of this writing, the Splunk Education team offers roughly 9 or 10 video tutorials, with topics including:

  • Installing Splunk (on both Microsoft Windows and Linux)
  • Adding data to Splunk
  • Basic searching fundamentals
  • Use of fields
  • How to save & share your searches
  • Splunk Tags and
  • Reports and dashboards

Note: Splunk Education also offers a number of free self-paced eLearning courses that are designed to teach the end-user level features of Splunk through content, simulations, and quizzes. All you need to do is create your own online profile.
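
If you want to experiment alongside the search tutorials without opening the UI, one option is Splunk's REST search endpoint. The host, credentials, and index below are placeholders for your own (ideally non-production) instance:

import requests

# Placeholder connection details; Splunk's management port defaults to 8089.
resp = requests.post(
    "https://localhost:8089/services/search/jobs/export",
    auth=("admin", "changeme"),
    data={"search": "search index=_internal | head 5", "output_mode": "json"},
    verify=False,   # self-signed certificates are common on dev instances
)

# The export endpoint streams one JSON object per result line.
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))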

User conferences, blogs & news groups

There is an almost endless variety of conferences open to anyone who wants to increase the breadth of one's exposure to Splunk. The most popular is SplunkLive!

SplunkLive is your opportunity to learn more about the most recent announcements, ways to extend the platform, and updates on hot topics such as Splunk delivered in the cloud, Splunk for Mobile, Splunk Analytics for Hadoop, and the variety of Apps available in the Splunk App store. SplunkLive events always include general sessions, speakers, and detailed breakout sessions.

Professional Services

Finally, it’s important to point out that there is always the opportunity to take advantage of the skill and expertise offered by the Splunk professional services team who provide custom services scoped according to your specific requirements. These may include development of applications, implementation of use cases, workshops and design sessions, or other services requiring Splunk-related expertise.

Time to get Splunking!

All about CLEM | https://blogs.perficient.com/2013/12/09/all-about-clem/ | Mon, 09 Dec 2013

SPSS CLEM is the Control Language for Expression Manipulation, which is used to build expressions within SPSS Modeler streams. CLEM is used in a number of SPSS "nodes" (among them the Select and Derive nodes); you can check the product documentation for the full list.

CLEM expressions are constructed from:

  • Values,
  • Fields,
  • Operators (arithmetic, relational, logical operators) and
  • Functions

Scripting?

CLEM should not be confused with the scripting capability that both SPSS Statistics and SPSS Modeler offer (known as Syntax). Syntax scripting is more of an automation tool, while CLEM is focused on specific data manipulations. However, a subset of the CLEM language can be used when scripting in the user interface, supporting many of the same data manipulations in an automated fashion.

With CLEM expressions, you have the power to do many things (a rough illustration follows the list below), such as:

  • Compare and evaluate conditions on record fields.
  • Derive values for new fields.
  • Derive new values for existing fields.
  • Reason about the sequence of records.
  • Insert data from records into reports
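
CLEM itself has its own syntax inside Modeler, so the snippet below is only a rough pandas analogue of the first items in the list above (the field names are invented), meant to show the kind of work a Select or Derive expression does:

import pandas as pd

df = pd.DataFrame({"age": [19, 34, 52], "income": [18000, 42000, 75000]})

# Roughly what a Select node condition does: keep records matching a test.
adults = df[df["age"] >= 21]

# Roughly what a Derive node does: compute a new field from an expression.
df["income_band"] = df["income"].apply(
    lambda x: "high" if x >= 60000 else ("medium" if x >= 30000 else "low")
)

print(adults)
print(df)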

 

Using CLEM

In nodes like Select and Derive, expressions and conditions can be entered directly (by typing them in) or built with the SPSS Expression Builder, which is invoked by clicking the Launch expression builder button in the respective node's dialog.

The Expression Builder provides a powerful tool to construct expressions and in addition, the Expression Builder can check the validity of your expression in real-time.

[Screenshot: clem1 – the SPSS Expression Builder]

More on the Expression Builder

  • Across the top of the Expression Builder dialog is where you can enter your expression directly (or where the generated expression ends up).
  • "General Functions" lists all of the SPSS functions available to the Expression Builder.
  • The middle of the dialog shows the expression operators for selection.
  • Under “Fields” you can view and select a field (from your data) to be used in the expression.
  • When you click on a function, its description is displayed under the Function list.
  • You can use the “Check” button to test the syntax of your expression.
  • A value can be picked from a list of values (provided the field has been instantiated) by clicking the field's Value button or right-clicking on the field itself.

Scripting and Expressions in Cognos TM1

In Cognos TM1, the closest thing to the SPSS Expression Builder is its rules editor, which started out "pretty rudimentary" in form:

[Screenshot: clem2 – the original TM1 rules editor]

In more recent versions, TM1 introduced the “advanced rules editor” (remember, you have to set the TM1p.ini AdvancedRulesEditor = T setting to access this editor) with its specific tools to help you create, manage, and verify TM1 rules. This Rules Editor has a full set of menus for creating, editing, and managing rules. Keyboard shortcuts are provided for the more commonly used menu options.

[Screenshot: clem3 – the TM1 advanced rules editor]

The advanced version of TM1's rules editor is similar to the SPSS Expression Builder in that it allows the developer either to enter the expression (the rule) directly (type it in) or to "build it" from dialog selections such as the Insert Function button (which displays the Insert Function dialog, allowing you to select a function and even its parameters from succeeding dialogs):

[Screenshot: clem4 – the Insert Function dialog]

Conclusion

If you are a TM1 developer and have written your share of TM1 rules, you will feel comfortable with the concept of using CLEM and the Expression Builder within SPSS Modeler.

Primary Practices for Examining Data | https://blogs.perficient.com/2013/11/21/primary-practices-for-examining-data/ | Fri, 22 Nov 2013

SPSS Data Audit Node

[Screenshot: z1 – the SPSS Data Audit node]

Once data is imported into SPSS Modeler, the next step is to explore it and become "thoroughly acquainted" with its characteristics. Most (if not all) data will contain problems or errors such as missing information and/or invalid values. Before any real work can be done with this data, you must assess its quality (the higher the quality, the more accurate the predictions).

Addressing issues of data quality

Fortunately, SPSS Modeler makes it (almost too) easy! Modeler provides several nodes that can be used for our integrity investigation. Here are a couple of things even a TM1 guy can do.

Auditing the data

After importing the data, do a preview to make sure the import worked and things “look okay”.

In my previous blog I talked about a college using predictive analytics to predict which students might or might not graduate on time, based upon their involvement in athletics or other activities.

From the Variable File Source node, it was easy to have a quick look at the imported file and verify that the import worked.

[Screenshot: z2 – previewing the imported file]

Another useful option is to run a table. This will show whether field values make sense (for example, whether a field like age contains numeric values and no string values). The Table node is cool: after dropping it into my stream and connecting my source node to it, I can open it up and click Run (to see all of my data nicely fit into a "database-like" table), or I can do some filtering using the real-time Expression Builder.

[Screenshot: z3 – Table node output]

The Expression Builder lets me see all of the fields in my file, along with their level of measurement (shown as Type) and their Storage (integer, real, string). It also gives me the ability to select from SPSS predefined functions and logical operators to create a query expression to run on my data. Here I wanted to highlight all students in the file who graduated "on time":

[Screenshot: z4 – Expression Builder query]

You can see the possibilities that the Table node provides – but of course it is not practical to visually inspect thousands of records. A better alternative is the Data Audit node.

The Data Audit node is used to study the characteristics of each field. For continuous fields, minimum and maximum values are displayed. This makes it easy to detect out of range values.

Our old pal measurement level

Remember measurement level (a field's "use" or "purpose")? Well, the Data Audit node reports different statistics and graphs depending on the measurement level of the fields in your data.

For categorical fields, the data audit node reports the number of unique values (the number of categories).

For continuous fields, the minimum, maximum, mean, standard deviation (indicating the spread of the distribution), and skewness (a measure of the asymmetry of a distribution; a symmetric distribution has a skewness value of 0) are reported.

For typeless fields, no statistics are produced.

“Distribution” or “Histogram”?

The Data Audit node also produces a different graph for each field in your file, again based upon the field's level of measurement (no graphs are produced for typeless fields).

For a categorical field (like "gender") the Data Audit node will display a distribution graph, and for a continuous field (for example, "household income") it will display a histogram.
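
For a rough sense of the numbers the Data Audit node reports, here is a small pandas sketch; the fields are invented, and SPSS Modeler of course computes all of this for you:

import pandas as pd

df = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],                              # categorical
    "household_income": [42000.0, 58000.0, 31000.0, 90000.0],    # continuous
})

# Categorical field: number of unique values (categories).
print("gender categories:", df["gender"].nunique())

# Continuous field: min, max, mean, standard deviation, and skewness.
print(df["household_income"].agg(["min", "max", "mean", "std", "skew"]))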

So back to my college’s example, I added an audit node to my stream and took a look at the results.

[Screenshot: z5 – Data Audit node results]

First, I excluded the “ID” field (it is just a unique student identification number and has no real meaning for the audit node). Most of the fields in my example (gender, income category, athlete, activities and graduate on time) are qualified as “Categorical” so the audit node generated distribution graphs, but the field “household income” is a “Continuous” field, so a histogram was created for it (along with the meaningful statistics like Min, Max, Mean, etc.).

[Screenshot: z6 – Data Audit node graphs and statistics]

Another awesome feature: if you click on a generated graph, SPSS will give you a close-up of the graph along with totals, values, and labels.

Conclusion

I've talked before about the importance of understanding field measurement levels. The fact that the statistics and chart types the Data Audit node generates are derived from the measurement level is another illustration of Modeler's approach: measurement level determines the output.

Data Consumption – Cognos TM1 vs. SPSS Modeler | https://blogs.perficient.com/2013/11/20/data-consumption-cognos-tm1-vs-spss-modeler/ | Thu, 21 Nov 2013

In TM1, you may be used to "integer or string"; in SPSS Modeler, data gets much more interesting. In fact, you will need to be familiar with a concept known as "Field Measurement Level" and the practice of "Data Instantiation".

In TM1, data is transformed by aggregation, multiplication or division, concatenation or translation, and so on, all based on the "type" of the data (meaning the way it is stored). With SPSS, the storage of a field is one thing, but the use of the field (in data preparation and in modeling) is another. For example, take numeric data fields such as "age" and "zip code": I am sure you will agree that age has "meaning", so a statistic like mean age makes sense, while zip code is just a code representing a geographical area, so a mean doesn't make sense for this field.

So, considering the intended use of a field, one needs the concept of measurement level. In SPSS, the results absolutely depend on correctly setting a field’s measurement level.

Measurement Levels in Modeler

SPSS Modeler defines 5 varieties of measurement levels. They are:

  • Flag,
  • Nominal,
  • Ordinal,
  • Continuous and
  • Typeless

Flag

This would describe a field with only 2 categories – for example male/female.

Nominal

A nominal field would be a field with more than 2 categories and the categories cannot be ranked. A simple example might be “region”.

Ordinal

An Ordinal field contains more than 2 categories, but the categories represent ordered information, such as an "income category" (low, medium, or high).

Continuous

This measurement level is used to describe simple numeric values (integer or real) such as "age" or "years of employment".

Typeless

Finally, for everything else, "Typeless" is just that: for fields that do not conform to any of the other types, like a customer ID or account number.

 

Instantiation

Along with the idea of setting measurement levels for all fields in a data file, comes the notion of Instantiation.

In SPSS Modeler, the process of specifying information such as measurement level (and appropriate values) for a field is called instantiation.

[Screenshot: inst1 – field instantiation in SPSS Modeler]

SPSS Modeler qualifies every field it consumes as one of 3 kinds:

  • Un-instantiated
  • Partially Instantiated
  • Fully Instantiated

Fields with a totally unknown measurement level are considered un-instantiated. Fields are referred to as partially instantiated if there is some information about how they are stored (string or numeric, or whether they are Categorical or Continuous), but we do not have all the information. When all the details about a field are known, including the measurement level and values, it is considered fully instantiated (and Flag, Nominal, Ordinal, or Continuous is displayed with the field by SPSS).

It’s a Setup

Just as TM1's TurboIntegrator "guesses" a field's (storage) type and use ("contents" to TM1 developers) based upon a specified field's value (of course you can override these guesses), SPSS data source nodes will initially assign a measurement level to each field in the data source file for you, based upon its storage value (again, these can be overridden). Integer, real, and date fields will be assigned a measurement level of Continuous, while strings are assigned a measurement level of Categorical.
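
As a rough, purely illustrative sketch of what such "autotyping" amounts to (the thresholds here are my own, not Modeler's), you could infer a measurement level from storage and the number of distinct values like this:

import pandas as pd

def guess_measurement_level(series: pd.Series) -> str:
    # Numeric and date storage suggests a Continuous measurement level.
    if pd.api.types.is_numeric_dtype(series) or pd.api.types.is_datetime64_any_dtype(series):
        return "Continuous"
    n_unique = series.nunique(dropna=True)
    if n_unique == 2:
        return "Flag"
    if n_unique <= 20:          # arbitrary cut-off for a categorical field
        return "Nominal"
    return "Typeless"           # e.g., free-text identifiers

df = pd.DataFrame({
    "gender": ["M", "F", "F", "M"],
    "region": ["North", "South", "East", "West"],
    "age": [19, 34, 52, 40],
})
print({col: guess_measurement_level(df[col]) for col in df.columns})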

[Screenshot: inst2 – autotyped fields in a source node]

This is the easiest method for defining measurement levels: allow Modeler to "autotype" by passing data through the source node, then manually review and edit any incorrect measurement levels, resulting in a fully instantiated data file.

Importing Data into SPSS Modeler for the TM1 Developer | https://blogs.perficient.com/2013/11/19/importing-data-into-spss-modeler-for-the-tm1-developer/ | Tue, 19 Nov 2013

If you have a TM1 background, it is a quick step to using SPSS Modeler if you look for similarities in how the tools handle certain tasks, for example, importing data.

With TM1, source data is transformed and loaded into cube structures for consolidation, modeling and reporting using its ETL tool TurboIntegrator. In SPSS Modeler, source data is loaded and transformed through a “logic stream of nodes” for modeling and reporting.

Here is a closer look:

Sourcing Data

Cognos TM1 uses TurboIntegrator as its data import mechanism. TurboIntegrator (referred to as "TI") is a programming or scripting tool that allows you to automate data importation into a TM1 application. Scripts built with TurboIntegrator can be saved, edited, and, through the use of chores, set up to run at regular intervals.

Through the use of TI’s Data Source feature, you can import data or information external to TM1, such as:

  • Industry standard comma-delimited (CSV) text files including ASCII files.
  • Information stored in relational databases (accessible through an ODBC data source).
  • Other OLAP cubes, including Cognos TM1 cubes and views.
  • Microsoft® Analysis Services.
  • SAP via RFC.

Rather than using scripts, SPSS Modeler utilizes data import nodes, which are all found on the SPSS Sources palette.

SPSS Modeler source nodes allow you to read data from:

  • Industry standard comma-delimited (CSV) text files including ASCII files.
  • XML files (wish TM1 did this!)
  • Statistics Files
  • SAS
  • Excel
  • Databases (DB2™, Oracle™, SQL Server™, and a variety of other databases) supported via ODBC
  • Other OLAP cubes, including Cognos TM1 cubes and views.

Components of a Script

To create a new TI script (using TM1 Server Explorer), you just right-click on "Processes" and select "Create New Process". (TM1 then opens the "TurboIntegrator" dialog showing a new, blank process ready for you to modify.)

[Screenshot: sp1 – TurboIntegrator process dialog]

Within any TI process, there will be 5 visible “tabs”:

  • The “Data Source” tab,
  • The “Variables” tab,
  • The “Maps” tab (this tab is initially greyed out),
  • The “Advanced” tab and
  • The “Schedule” tab.

Components of a Node

In SPSS, to create a node, you simply “drag” the appropriate node from the sources palette into your current “stream”.

[Screenshot: sp2 – SPSS Modeler source node]

Data source nodes also have the concept of “tabs” – and each source node will have four:

  • The Data Tab,
  • The Filter Tab,
  • The Type Tab, and
  • The Annotations Tab.

Configuring for Import

With TM1, you use the Data Source tab to identify the source from which you want to import data to TM1. The fields and options available on the Data Source tab will then vary according to the Datasource “type” that you select.

In SPSS, you use the File tab to set the options for the data import, and the dialog will be specific to the type of data being imported.

Data Structuring

The concept of database table columns or "file fields" is handled in the TM1 TurboIntegrator "Variables" tab. If the process has a data source defined, each column in that data source becomes a variable in the Variables tab. Based upon your selected data source, TM1 attempts to set up the variables for you, assigning a name, storage type, and a "purpose" or "content" to each variable (each of which you can override).

Similarly, Modeler also requires a "rectangular" data structure: records (rows of the data table) and fields (columns of the data table), which are handled in the Data tab. Again, based upon the data source, options on the Data tab allow you to override the specified storage type for fields as they are imported (or created).
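
A rough analogue of overriding storage as data is read (outside of Modeler, and with an invented file layout) is the dtype override you would apply when importing a delimited file with pandas:

import pandas as pd
from io import StringIO

raw = StringIO("student_id,age,zip_code\n1001,19,02134\n1002,34,60601\n")

df = pd.read_csv(
    raw,
    dtype={"student_id": "string", "zip_code": "string"},  # keep codes as text
)   # age is left to be inferred as an integer

print(df.dtypes)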

Logic and Processing

To “process” imported data, TurboIntegrator processes include several “procedures tabs”:  Prolog, Metadata, Data, and Epilog (“Sub-tabs” of the “Advanced” tab). When you run a (TurboIntegrator) process, the procedures are executed in sequence (Prolog, Meta, Data and Epilog) and provide the ability to apply data transformation (or other) logic to each record of data in the source using lines of script that contain various predefined functions.

In SPSS, the Filter tab can be used to rename or exclude fields (from the source file) and, using the Type tab, you can specify field properties and logic such as:

• Usage typing

• Options for handling missing values and system nulls

• Setting the role of a field

• Specifying values for a field as well as options used to automatically read values from the dataset.

• Specify field and value labels.

Using the Data

Using the Data tab in the TI process, data is then written to cube cells (using the CellPutN function), where it will be available for further analysis.

In SPSS Modeler, the source node you used to import your data would be “connected” to 1 (or more) nodes for additional processing or analysis.

Keep in mind that in TM1, the TurboIntegrator processes you create can be saved and rerun for new data files. In SPSS Modeler, you create and save streams (that begin with your source node) that can be rerun for new data.

 

Conclusion

When learning a new tool, equate what you learn to what you already knew – there will be plenty of similarities!

 

Exposing TM1 Top | https://blogs.perficient.com/2013/11/05/exposing-tm1-top/ | Tue, 05 Nov 2013

So what is TM1 TOP?

Definition and Access

The TM1 Top utility empowers you to dynamically monitor “threads” running in an instance of a Cognos TM1 server. TM1 Threads are of three possible “types”:

  • User Threads – Name of an actual user that is logged into TM1
  • Chore Threads – A chore running on the TM1 server
  • System Threads – A TM1 system process running on the TM1 server. System threads can be:
    • Pseudo (used to clean up user-defined consolidation (UDC) objects)
    • Stats (represents the thread for the performance monitor feature that is started when a user selects the Start Performance Monitor option in TM1 Architect and Server Explorer) or
    • DynamicConf (dynamically reads and updates parameters in the TM1 server configuration file, tm1s.cfg).

TM1 Top is a stand-alone utility (similar to the UNIX “top” utility which allows dynamic monitoring of the processes running on a given system) that runs within a console (command) window on a Microsoft Windows system. It is designed to make minimal demands on the TM1 server and the supporting network and system.

With the exception of a user-initiated login process, TM1 Top does not use any cube or dimension resources in the TM1 server, and does not use or interact with the data or locks on the TM1 server. The server-side processing that supports TM1 Top runs in a separate light thread to allow TM1 Top to report server state even if the server is unresponsive to users.

Generally, TM1 Top provides real-time monitoring of your TM1 servers, similar to the GNU operating system top command.

The Formal Installation

TM1 Top is installed by default when you install TM1 Server (when you perform a custom TM1 installation with the TM1 Installation Wizard, TM1 Top is listed under Servers on the Component Selection screen).

After installation, you need to locate the following files:

  • Tm1top.exe
  • TM1top.ini

These files will be located in your TM1 "bin" folder.

[Screenshot: top1 – TM1 bin folder]

If TM1 Top is not currently installed on your system, you can run the TM1 Installation Wizard to install the utility as follows.

  1. Run the TM1 Installation Wizard.
     • If your system has a previous installation of TM1, click Next to advance to the Program Maintenance screen. On the Program Maintenance screen, select the Modify option. Click Next to advance to the Installation Options screen.
     • If your system does not have a previous installation of TM1, follow the Installation Wizard steps until the Installation Options screen opens.
  2. On the Installation Options screen, select the Custom option for the Installation Type.
  3. Click Next. The Component Selection screen opens.
  4. On the Component Selection screen, expand the Servers component category and select the TM1 Top sub-category.
  5. Select the "This feature will be installed on local hard drive" option for TM1 Top.
  6. Follow the steps in the TM1 Installation Wizard to complete the installation.

Configuring the TM1top.ini File

Before you can run TM1 Top, you need to edit its initialization file. Like TM1 Server, TM1 Top uses a simple text file to initialize at startup; this file is named Tm1top.ini and is an ASCII file that specifies environment information for the TM1 Top utility.

By default, a sample Tm1top.ini file is installed to the TM1_install_dir\bin directory. When you run TM1 Top, the Tm1top.ini file needs to be located in the same directory as the TM1 Top executable file.

A sample of a configured Tm1top.ini file is shown below.

adminhost=
servername=planning sample
logfile=c:\temp\tm1top.log
logperiod=50
logappend=T
refresh=10
adminsvrsslcertid=
adminsvrsslcertauthority=
adminsvrsslcertrevlist=
exportadminsvrsslcert=
adminsvrsslexportkeyid=
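
If you manage several monitored servers, a small sketch like the following (not an official utility; the parameter values are the same sample ones shown above) can generate such a file:

# Write a Tm1top.ini from a dict, emitting name=value lines with no spaces
# around the "=" sign, matching the sample format above.
params = {
    "adminhost": "",
    "servername": "planning sample",
    "logfile": r"c:\temp\tm1top.log",
    "logperiod": "50",
    "logappend": "T",
    "refresh": "10",
}

with open("Tm1top.ini", "w") as f:
    for name, value in params.items():
        f.write(f"{name}={value}\n")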

Remember: do not include any spaces between the parameter name and the parameter value when editing the Tm1top.ini file. The parameters in the Tm1top.ini file are described in the following table.

[Table: top2 – Tm1top.ini configuration parameters]

Running TM1 Top with Command-line Options

You can also enter the configuration parameter values at the command prompt when starting TM1 Top (which will override the values in the Tm1top.ini file).

You must use the following syntax to run TM1 Top with command-line options:

tm1top.exe -OptionName1 OptionValue1  -OptionName2 OptionValue2 ..

OptionName and OptionValue can be any of the following parameter and value combinations:

  • -adminhost admin-host-name
  • -servername server-host-name
  • -refresh refresh-period
  • -logfile file-path
  • -logperiod nnn
  • -logappend T or F

For example, to run TM1 Top with the ServerName parameter set to sdata, the refresh parameter set to 5 seconds and output sent to a logfile, enter the following:

tm1top.exe  -servername sdata  -refresh 5 –logfile c:\tm1logs\topout.txt

Note: Use quotes for parameter values that include spaces, as follows:

tm1top.exe  -servername "planning sample"

You can also see a list of available parameters from the command line by using the /? option, as follows:

tm1top.exe /?

[Screenshot: top3 – tm1top.exe /? output]

Understanding the Top Display

When TM1 Top is running, it displays a set of fields and status information in the following format:

[Screenshot: top4 – TM1 Top display]

Each row in the display represents one unique thread running in the TM1 server that you are monitoring. The title bar of the console window displays the current values for the AdminHost, ServerName, and Refresh parameters.

TM1 Top is a DOS-based executable, so to see more lines or a wider display, you need to resize the console window or use a smaller font size. If the display fills the entire height of the console window, you can use the up and down arrow keys on your keyboard to go to the next or previous page.

The following table is based upon information provided in the product documentation and describes the status fields displayed by TM1 Top.

[Table: top5 – TM1 Top status fields]

Locking

TM1 Server uses a set of three lock modes to control access to TM1 data. When a TM1 server is running, it applies these locks to individual TM1 objects, such as cubes, views, and dimensions, as these objects are accessed by TM1 threads. All locks will impact TM1 performance at some level.

The level of impact to application performance will be based upon:

  • Lock Mode applied
  • User activities
  • Model architectural design

The lock modes for TM1 objects are described in the following table.

[Table: top6 – TM1 object lock modes]

TM1 Top displays the status of locks used by all threads running in a TM1 server. Lock status is displayed by TM1 Top under the State, Obj Lock Status, and Total Lock Status fields. It is important that lock status be reviewed regularly by the application administrator to determine the application’s locking pattern(s). Based upon observed locking patterns, the application administrator may:

  • Reschedule application processes to off peak intervals
  • Cancel locked threads
  • Ask for a review of a particular area of the application architecture that appears to be a thread bottleneck

Thread Processing States

As previously mentioned, TM1 application activity is represented in “threads”. TM1 Top displays the current processing state of each thread in the State field. A TM1 thread can be in one of the following processing states.

[Table: top7 – TM1 thread processing states]

Important for the application administrator to note are:

  • Thread states Idle, Run, and Login are "normal" and mostly non-impacting to application performance.
  • Threads in a Wait state should be monitored. It is normal for all threads to experience some Wait time, but if a thread remains in this state for an extended period of time, the application administrator will need to determine the cause of the wait. Waits are caused by one of the following events:
    • The thread is waiting for R-locks to be released so it can obtain a W-lock on the object.
    • The thread is waiting for a W-lock to finish so it can get either an R-lock or an IX-lock on an object.
    • The thread is requesting an IX-lock, but is waiting for another thread with an IX-lock on the same object to finish and release the lock.
    • The thread is requesting an IX-lock for an object, but is waiting for a thread with an R-lock on the same object to release its lock.
    • The thread is waiting for another thread to complete and release its locks.
  • A thread in a Commit state will cause TM1 Server to apply object locks and has the potential to impact application performance.
  • A thread in a Rollback state literally means that TM1 Server has not allowed it to continue to completion and has restarted the thread from its most recent "checkpoint". There is no maximum number of times a thread will be restarted; a thread will be rolled back as many times as it takes to complete, or until it is canceled by the administrator. This is one of the most common causes of application performance degradation.

Basic Top Commands

It is important that the application administrator have TM1 administrator access. Using TM1 Top, an administrator can see exactly what is happening in/on a TM1 Server. In addition, particular threads can be reviewed and, if required, canceled.

Canceling a Thread

Using the TM1 Top cancel command, the application administrator can attempt to cancel and remove a thread (by the unique thread ID). The TOP cancel command inserts a “ProcessQuit” into the chosen thread. If the thread is calculating a large view or is a TurboIntegrator process “stuck in a loop”, the cancel command may not be effective. In this case, the only option may be to terminate the user’s connection using the TM1 Server manager.

The following are the most commonly used TM1 Top commands:

X -exit
W -write display to a file
H -help
V -verify/login to allow cancelling jobs
C -cancel threads, you must first login to use that command

Potential Errors and Exceptions

 

During the use of TM1 Top, the following exceptions may be encountered:

“Could not connect to server “servername” on admin host “adminname””

Press “R” to retry; or any other key to exit

 

This error occurs when TM1 Top cannot connect to the TM1 server. The most common causes are:

  • Incorrect TM1 Top configuration
  • TM1 Server not running

 

TM1Top will not run. Can’t find the tm1api.dll

This error occurs when the TM1Top executable cannot locate the file tm1api.dll. Re-installing the application may fix this problem. TM1Top needs the following files to work:

  • TM1Top.exe
  • TM1Top.ini
  • tm1api.dll
  • tm1sip.dll
  • tm1lib.dll

 TM1Top flashes and the command console does not appear

This error occurs when the path to the log file in the tm1top.ini file does not exist. Verify that the path to the log file in the tm1top.ini file exists. If not, choose a path that does exist and the window should come up as expected.

Conclusion

There are many methods for monitoring and managing TM1 applications. One of the most useful and easy to implement is TM1 Top. No application administrator should be without it!

 

 

Data Mining with IBM SPSS Modeler v15 | https://blogs.perficient.com/2013/11/01/data-mining-with-ibm-spss-modeler-v15/ | Fri, 01 Nov 2013

Having recently completed the course "IBM SPSS Modeler & Data Mining" offered by Global Knowledge, I was looking for more opportunities to do some modeling with SPSS Modeler. So, when I recently read in the news about college recruiters using predictive techniques to determine the probability of a particular recruit graduating on time, I thought it would be interesting to explore that idea.

For Example

My college wants to determine whether a recruit will graduate on time or not. The institution can draw a sample from its historic data and, using this sample, possibly predict whether a particular recruit would graduate on time. The sample below gives us an idea of such a historical dataset. Typically, the dataset will include a field that indicates the behavior of interest, here: did the student graduate on time, yes or no?

[Table: dm1 – sample historical student dataset]

An Idea – and some cross tabulation!

The college recruiters have a hunch that there is a difference between students who are athletes and students who are not, and between students who do and do not participate in collegiate activities. Based on this hunch, they investigate whether there are any differences in graduating-on-time statistics by cross tabulating on "athlete":

[Image: dm2 – cross tabulation]

In IBM SPSS Modeler, it is very simple to cross tabulate data using the Matrix node. You simply drop it into your stream, connect it to your source data, and set some parameters. For example, I set "Rows" to the field in my file "graduate on time" and "Columns" to "athlete". (I also went to the "Appearance" tab and clicked on "Counts" and "Percentage of column" for my "Cross-tabulation cell contents".)

[Screenshot: dm3 – Matrix node settings]

After clicking “Run”, the output is ready for review:

[Screenshots: dm4, dm5 – Matrix node output]

Another Analysis Tool

SPSS Modeler also provides the Distribution node, which lets you show the occurrence of symbolic (non-numeric) values, in this case "graduate on time" or "athlete", in our dataset. A typical use of the Distribution node is to show imbalances in the data (which can be rectified by using a Balance node before creating a model). What I did was use the node to plot "athlete" overlaid with "graduate on time" for an interesting perspective:

[Screenshot: dm6 – Distribution node plot]

Back to the Analysis

Looking at my cross-tabulation output, it appears that 93% of the non-athlete students did not graduate on time, while only 13% of the students who were athletes did not graduate on time. The question now is: can this difference be attributed to chance (because just a sample was drawn), or does the difference in the sample reflect a true difference in the population of all students?

The chi-square test is the statistical test used to answer this question. It gives the probability that the difference between athletes and non-athletes can be attributed to chance.
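
For anyone who wants to reproduce the test outside of Modeler, here is a short sketch using hypothetical counts that mirror the percentages discussed above (the real analysis, of course, runs on the actual student records):

from scipy.stats import chi2_contingency

# Rows: graduated on time / not on time; columns: athlete / non-athlete.
observed = [[87, 7],
            [13, 93]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.1f}, p-value = {p_value:.6f}")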

[Image: dm7 – chi-square test]

The CHAID Node

CHAID, or Chi-squared Automatic Interaction Detection, is a classification method for building decision trees that uses chi-square statistics to identify optimal splits. Again, SPSS Modeler offers a CHAID node that can be dropped into a stream and configured. In my exercise, I set my (CHAID) target to "graduate on time" and my predictors to "activities" and "athlete". My results are presented in the viewer as a "tree". The initial node shows the breakdown of graduated on time vs. not on time, and Modeler then broke out the next level into students who did not participate in activities and those who did.

[Screenshot: dm8 – CHAID tree]

The exercise found the probability (p-value) to be 0, meaning there is essentially no chance that the difference between students involved in activities and those who are not can be attributed to chance. In other words, there is a real relationship between participating in activities and graduating on time!

Looking at these results, I concluded that students who do not participate in activities during their college career have a much higher chance of NOT graduating on time (96%) than those who do participate in activities (3%).

The next step might be to zoom in on the students who do not participate in activities. Modeler broke the "tree" down into a third level:

[Screenshot: dm9 – CHAID tree, third level]

Here, Modeler tells me that the students who do not participate in activities include both athletes and non-athletes. The non-athletes who do not participate in activities have a slightly better "on time" rate than do the athletes who do not participate in activities.

Conclusion

Of course there is more to a legitimate data mining project, but it is clear that IBM SPSS is a handy tool that "fits" data scientists from novice to expert level. More exploration to come!

Three Attributes of an Agile BI System | https://blogs.perficient.com/2013/10/10/three-attributes-of-an-agile-bi-system/ | Thu, 10 Oct 2013

In an earlier blog post I wrote that Agile BI was much more than just applying agile SDLC processes to traditional BI systems.  That is, Agile BI systems need to support business agility.   To support business agility, BI systems should address three main attributes:

  1. Usable and Extensible – In a recent TDWI webinar on business enablement, Claudia Imhoff said, "Nothing is more agile than a business user creating their own report." I could not agree more with Ms. Imhoff's comments. Actually, I would go farther. Today's BI tools allow users to create and publish all types of BI content, like dashboards and scorecards. They allow power users to conduct analysis and then storyboard, annotate, and interpret the results. Agile BI systems allow power users to publish content to portals, web browsers, and mobile devices. Finally, Agile BI systems do not confine users to data published in a data warehouse, but allow users to augment IT-published data with "user" data contained in spreadsheets and text files.
  2. Easily Changeable – Agile BI systems should be easily changeable. Furthermore, they should support incremental development so that each iterative delivery cycle produces visible end-user value. This calls for architecture and tools that do not require a lot of analysis and modeling to implement. Most data discovery tools allow IT to publish data for end-user consumption without a lot of analysis or up-front design. This is not to say that data modeling and analysis are not needed in modern Agile BI systems; however, an Agile BI system should allow for small, iterative analysis and modeling efforts.
  3. Jointly Governed – Agile BI systems are jointly governed by IT and business stakeholders. Decisions about what data governance and quality activities should be applied to different data entities and attributes must be driven by the business. User classes should be established that distinguish between business content creators and BI consumers. This will enable IT to focus governance efforts on those actually developing dashboards and reports. With IT not developing as much BI content, it is freed to concentrate on driving adoption of BI tools, publishing data for business consumption, and enhancing data quality.

If you are interested in learning more about Agile BI, I will be hosting a webinar titled "Agile BI: How to Deliver More Value in Less Time," where I will cover these three aspects of Agile BI in more depth. The webinar is October 15 at 2pm Eastern Time. You can sign up for the webinar at: https://cc.readytalk.com/cc/s/registrations/new?cid=mr0vyoc61877.
