SPSS Articles / Blogs / Perficient – https://blogs.perficient.com/tag/spss/

SPSS Does Regular Expressions
https://blogs.perficient.com/2015/12/15/spss-does-regular-expressions/ – Tue, 15 Dec 2015

Well, OK, it does use Python, but you can use Python and regular expressions within IBM SPSS.  Regular expressions allow you to do complex string searches with a minimal amount of coding.  Both IBM SPSS Statistics and Modeler do a good job with string manipulation.  However, you are often faced with performing complex string searches and manipulations that can take multiple loops within Statistics or several Derive nodes within Modeler.  Often, once you finally get the desired results, your code is not scalable or able to be productionalized; it is specific only to your development data.  Splunk is an alternative, but that might be more horsepower than you need.  Regular expressions are a nice way to fill in the middle ground.  Here is a link to an example with addresses and another with zip codes.
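As a quick, hedged illustration of the idea, here is a standalone Python sketch that pulls a US ZIP code out of free-text address strings; the pattern and the sample addresses are invented, and the same logic could be wrapped in SPSS Statistics' or Modeler's Python integration.

import re

# Extract a 5-digit US ZIP (with optional +4) from free-text addresses.
# The addresses below are made up for illustration.
ZIP_RE = re.compile(r"\b(\d{5})(?:-(\d{4}))?\b")

addresses = [
    "1600 Pennsylvania Ave NW, Washington, DC 20500",
    "233 S Wacker Dr, Chicago, IL 60606-6448",
    "no zip on this one",
]

for addr in addresses:
    match = ZIP_RE.search(addr)
    zip5 = match.group(1) if match else None   # None when no ZIP is found
    print(addr, "->", zip5)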

I think regular expressions can be a huge time saver and make for better code.  They are on my radar for the coming year, to get more practice working with them.

IBM Watson Analytics: The Time is Now
https://blogs.perficient.com/2015/07/20/ibm-watson-analytics-the-time-is-now/ – Mon, 20 Jul 2015

In mid-May, at the IBM Vision Conference, IBM announced a deal for a free year of licenses of IBM Watson Analytics for existing Cognos BI and TM1 customers. If you missed that offer, IBM has recently introduced another offer for free licenses:

  • Watson Analytics Professional Edition
  • Up to 60 users
  • 6 months of use
  • Offer available through 10/31/2015

Professional:  http://www.ibm.com/web/portal/analytics/analyticszone/wagoproforfree

If you are an existing Cognos customer, now is the time to start learning Watson Analytics.

Perficient has been working with Watson Analytics since it was first released in beta form near the end of 2014.  Since our team has worked with Cognos for 15 years, and more recently SPSS and Predictive Analytics, we feel that we’re well positioned to help customers capitalize on this new technology.

Throughout the year, we have expanded our use of the Watson Analytics solution and have recently started engaging with customers to explore Watson Analytics at their organizations.  Most recently we are partnering with a Healthcare provider to create Watson Analytics dashboards to analyze Medicare claim data and enhance patient readmission models previously built on SPSS.


If you’re an existing Cognos BI or TM1 customer, the first question is: How do we engage? The answer – quickly.

It starts with a use case and data. Since Watson Analytics runs in the cloud, we can get started quickly with data.  Some questions to ask yourself:

  • What do you wish you could do with Cognos today and can’t?
  • How could you enrich existing dashboards and data sets with predictive KPIs?
  • Is there value in moving to a Cloud model for Analytics?

At this point, which is early for Watson Analytics, we are not advocating this as a wholesale replacement.  We are advocating an enhancement based on your current approach.  Getting data to Watson Analytics and exploring what it can do for you is time well spent.

As the Watson Analytics product matures over the next 9-12 months, Perficient will be on the forefront working with customers to absorb and extract value out of this technology.

If you’d like to get started and also learn what other companies are doing with the Watson brand of technologies, please reach out to us.

Great BI Experiences with IBM Cognos, SPSS and Digital Experience
https://blogs.perficient.com/2015/06/02/great-bi-experiences-with-ibm-cognos-spss-and-digital-experience/ – Tue, 02 Jun 2015

Pankaj Bose spoke at the IBM Digital Experience 2015 conference about how IBM Cognos, Digital Experience and SPSS can be integrated together for a great Business Intelligence Experience. Typical business challenges include:

  • How to build, run and manage a mobile, cloud and social solution
  • Complexity of great experiences
  • Lack of centralized storage and poor, incompatible data formats
  • Lack of forecasting impacts decision making and key investment strategies
  • Managing unstructured social content
  • Inability to provide personalized customer interactions

In the solution Pankaj described, IBM Digital Experience sits at the center of the approach, providing the experience, social, mobile and multi-lingual capabilities; IBM SPSS supplies the predictive analytics, Cognos provides the BI, IBM DataStage handles the data, and IBM Pure systems and DB2 provide the operational platform. These products can share a common security framework and common data management services.

There are four integration patterns for the BI Experience:

  • Portal Services:  Cognos and SPSS both provide portlets that can be installed on IBM Digital Experience.  These portlets provide a set of features defined by each of those products.  Cognos uses its Servlet Gateway; SPSS uses Collaboration and Deployment Services to deliver content to the portal.  This is a preferred way to implement the BI Portal because it is all out of the box.
  • Web Services:  Here you call services provided by Cognos and SPSS from custom-built portlets running on Digital Experience.  In this case, you can completely customize the solution rather than rely on the features provided by the out of the box portlets.  For example, if Cognos doesn’t deliver the exact type of chart you want to display, you can call a Cognos service to get the data you want and then create your own chart in the portlet.  This approach is trickier because you need to work with SOAP, take care of security through the service interface, etc.
  • WSRP: This is Web Services for Remote Portlets, which allows one portal to display content running on a different portal. This approach can potentially help with performance where the Cognos server runs the Cognos portlets and then your BI experience displays that content via WSRP.  There are limits with this approach, but it can be helpful in certain situations.
  • Web Application Bridge: Digital Experience includes the WAB portlet to display a website within the portal via an iFrame.  Both Cognos and SPSS have web interfaces so you can reuse this already built interface without making changes.  WAB supports reverse proxy and optimizes the rendering of the content on your BI experience.  When you have SSO implemented, this approach works really well.

All of these approaches can be combined with Responsive Web Design techniques so you can easily create a mobile BI experience right off your portal.  IBM MobileFirst can also provide native apps for this environment.

In addition to providing nice charts and graphs, this same BI experience can be used to deliver reports via Digital Experience.

Here are a few potential use cases:

  1. 360-degree view of customers – this applies to many industries – for customer service organizations.  The BI experience can provide an overview of a customer’s spend, including details.  Using predictive analytics, management can predict churn.
  2. Smarter Planet concepts to enable governments to efficiently gather information from various stakeholders and deliver concise information to constituents.  Governments can also use constituent data to predict what services citizens need.
  3. Track how business partners, dealers, etc. are doing.  Partner relationships often depend on a lot of data – opportunities, sales, discounts, fulfillment, etc.  The BI experience could provide a dashboard to help manage all this information.

Follow the link here to see how Perficient helps clients with BI, Big Data and Analytics.

 

Social Network Analysis in Action – Crime Ring Fraud Detection
https://blogs.perficient.com/2015/05/01/social-network-analysis-in-action-crime-ring-fraud-detection/ – Fri, 01 May 2015

This is a continuation of our previous discussion on getting started with Social Network Analysis (SNA). So now that we can do some SNA with NodeXL, how do we go out and catch the bad guys?  Well, remember, SNA is a network of entities.  So let’s take auto insurance fraud as an example and show why just you and NodeXL are not going to uncover the Mafia network up the street.  In a typical auto insurance claim you are going to have a claimant, or a person making the claim, and you might have another driver or passenger that could also be making a claim.  You could have a tow truck operator, body shop, healthcare provider, lawyer, claims adjuster, witnesses, etc. that could all be part of a claim.  The general auto industry rule of thumb is that around 1 in 10 claims have some sort of fraud or abuse, or, stated differently, 90% of the claims are legit while 10% are suspicious.  Now that we know SNA is a network of entities, we can network all of the parties involved in a claim.  Below is an actual example from NodeXL based on a client’s claims.  The red lines are a very prolific crime ring, detected using SPSS Modeler, that I highlighted manually within NodeXL.  The remaining networks are a combination of bad guys and good guys.

[Figure: NodeXL graph of claim entities; the red lines mark the crime ring detected with SPSS Modeler]

SNA does not know which is which, only that there is some sort of connection going on.  Only through data mining and predictive modeling can you determine which networks are just part of normal business and which are possibly fraudulent.  In another post I will go through how this model worked, but at a high level it was tasked with finding crime rings, not networks.  A ring can be thought of as an independent group of 2 or more entities, like a terror cell for example.  A network is a connection of multiple entities, possibly rings or other connected entities, usually controlled centrally; a mafia or organized crime network would be an example.  What this graph shows is why traditional SNA using just graphs was not the silver bullet that we had all hoped for, and why we still need predictive modeling.
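To make the ring-versus-network distinction concrete, here is a minimal Python sketch using the NetworkX library; it is not the SPSS Modeler model or the NodeXL workflow from this engagement, and all of the entities and edges are invented. It simply groups claim entities into connected components and flags the small ones as ring-sized candidates; scoring which of them are actually fraudulent is the predictive modeling step.

import networkx as nx

# Build a graph of claim entities and pull out connected components. Small
# components look "ring-sized"; large ones look "network-sized" (or simply
# normal business) and need predictive modeling to score.
G = nx.Graph()
G.add_edges_from([
    ("claimant_1", "body_shop_A"), ("claimant_2", "body_shop_A"),
    ("claimant_2", "lawyer_X"),    ("claimant_3", "tow_operator_B"),
    ("claimant_4", "clinic_C"),    ("claimant_5", "clinic_C"),
    ("claimant_5", "lawyer_Y"),    ("claimant_6", "clinic_C"),
])

for component in nx.connected_components(G):
    label = "ring-sized" if len(component) <= 4 else "network-sized"
    print(label, sorted(component))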

In the next post I will start getting into more detail with SNA and SPSS. If you are looking for some data to start playing with, here are a couple of sources at Stanford and Arizona State that you might find useful.

Interview: Reducing Patient Readmissions with IBM Analytics
https://blogs.perficient.com/2015/02/04/interview-reducing-patient-readmissions-with-ibm-analytics/ – Wed, 04 Feb 2015

Predictive Analytics solutions deal with uncovering insights from trends and patterns to determine the impact of operational adjustments and market forces on your organization. Statistical analysis and predictive modeling expand on the findings gained through business intelligence solutions to answer “What will happen?” given certain business situations.

Perficient’s IBM Predictive Analytics practice has seen tremendous success over the past year in delivering custom solutions across a variety of industries, including healthcare. One key healthcare solution created and implemented by the practice is the Patient Readmission Predictive Analytics model. To find out more about this healthcare solution, we interviewed subject matter expert Dale Less, a Senior Solutions Architect at Perficient.

Tell us about your experience with IBM Predictive Analytics solutions?

I have been using IBM Predictive Analytics solutions for approximately 4 years and other analytics products for over 10 years.  I have built solutions using IBM Predictive Analytics to identify different types of insurance fraud and abuse, violent crimes for law enforcement, marketing research and hospital readmission prediction.

Where has the IBM BA team recently implemented our Patient Readmission Predictive Analytics solution?

We developed a custom readmission solution for a large Healthcare System in Ohio.  This solution is unique in that rather than focusing on a single disease or condition, it predicts readmissions across all diseases and conditions.  Another unique aspect of this solution is that it focuses on the psycho-social needs of the patient and the family rather than a traditional clinical approach, which is the focus of most of the industry.

What key outcomes was this customer looking to achieve by implementing this solution?

The Healthcare Organization is attempting to reduce hospital readmissions across all of their patients.  They are taking the approach of educating the patients and their families to better manage their condition while coordinating various services within the community.  They are looking to identify patients who are in the most need of interventions, identify which interventions are most appropriate and begin the intervention process as quickly as possible; whereas traditionally, interventions occurred at the end of a patient’s stay.  Early intervention allows for more time for clinicians to provide instruction, increase patient and family member understanding and allow time for community services to be arranged so they are ready when the patient is discharged.  This solution also helped to optimize hospital resources to provide interventions to those who stood to benefit the most.

What outcomes did the client achieve?

The predictive modeling process derives a prediction of whether or not a person is at risk for being readmitted within 30 days for each day they are in the hospital.  These predictions are assigned a probability which is used to prioritize daily patient interventions.  Nursing assessment data is also segmented into different risk and intervention profiles to identify health and lifestyle behaviors that clinicians can use to develop appropriate interventions and plans of care.
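For readers who want a feel for the mechanics, here is a heavily simplified illustration in Python with scikit-learn; the actual solution described here was built in IBM SPSS Modeler, and the feature names and data below are invented. The point is the shape of the output: a readmission probability per patient per hospital day, sorted into a daily intervention worklist.

import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["prior_er_visits", "fall_indicator", "day_of_stay"]

# Toy training data (invented) with a 30-day readmission outcome.
train = pd.DataFrame({
    "prior_er_visits": [0, 4, 1, 6, 2, 0],
    "fall_indicator":  [0, 1, 0, 1, 0, 0],
    "day_of_stay":     [1, 2, 3, 1, 4, 2],
    "readmitted_30d":  [0, 1, 0, 1, 1, 0],
})
model = LogisticRegression().fit(train[FEATURES], train["readmitted_30d"])

# Score today's inpatients and sort by risk to prioritize interventions.
today = pd.DataFrame({
    "patient_id":      ["A", "B", "C"],
    "prior_er_visits": [5, 0, 2],
    "fall_indicator":  [1, 0, 0],
    "day_of_stay":     [2, 1, 3],
})
today["risk"] = model.predict_proba(today[FEATURES])[:, 1]
print(today.sort_values("risk", ascending=False))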

Did we deliver additional value to the customer they did not intend to receive?

We derived unique measures of patient frailty and how well they manage their care outside of their hospital stay.  We developed a unique method for determining if a patient has fallen and the injuries that were possibly sustained prior to being admitted.  This fall indicator, along with nutrition and certain types of medications, increased the readmission risk for certain patients by over a factor of 13, compared to similar patients.  The client was aware that falls were an issue with their elderly population, but they were not aware of how falls related to hospital readmissions.

What were the biggest surprises the customer has uncovered through these efforts?

We looked beyond just the inpatient hospital stay, to activities at the physician office level and during Emergency Room visits.  Our analysis showed that patients who were frequent users of the ER were significantly more likely to be readmitted. The client was unaware of this finding.  Many of these patients did not have a primary care physician and used the ER as their primary care physician.  As a result of this discovery, the hospital is exploring staffing the ER with social workers that can quickly intervene and decrease hospital admission.  This could lead to millions of dollars of annual savings.

What IBM software components were implemented as part of this solution?

IBM SPSS Modeler Gold and SPSS Statistics.

Outside of the Patient Readmission solution, what are some other use cases for Predictive Analytics?

Pretty much any industry that has large volumes of structured or unstructured data.  IBM SPSS connects to most databases using a standard ODBC connection.  Consistently the feedback we get from new users is that the software is very intuitive and “gets out of their way,” allowing them more time to do analytics and predictive modeling and less time writing code.

 

Using SPSS to Leverage Predictive Analytics
https://blogs.perficient.com/2014/01/07/using-spss-to-leverage-predictive-analytics/ – Tue, 07 Jan 2014

Jim Miller, Senior Solutions Architect at Perficient, recently wrote a blog post outlining an example of how to use IBM SPSS to leverage predictive analytics to improve performance in everyday business solutions.

A technology company has participated in literally thousands of projects over the years. At some point the group decided it wanted to determine what factors or characteristics may influence the (hopefully successful) completion of current and future implementation projects. Thankfully, they have maintained records on every project that they were involved in, and that data contains a number of informational fields.

To read the entire example, click here.

Which Modeling Approach Should You Use with SPSS Modeler?
https://blogs.perficient.com/2014/01/06/which-modeling-approach-should-you-use-with-spss-modeler/ – Mon, 06 Jan 2014

Jim Miller, Senior Solutions Architect at Perficient, recently wrote a blog post explaining the many options when using IBM SPSS Modeler.

Coming from a TM1 background (more business than statistics), it is easy to get started with modeling once you determine your modeling objective, and Modeler can help with that. IBM SPSS Modeler offers an intuitive interface that will appeal to a wide range of users from the non-technical business user to the statistician, data miner or data scientist.

To read Jim’s full blog post, click here.

TM1 vs. SPSS Modeler Comparison Continues – Setting to Flags
https://blogs.perficient.com/2013/12/03/tm1-vs-spss-modeler-comparison-continues-setting-to-flags/ – Wed, 04 Dec 2013

Consider the scenario where you have to convert information held in a “categorical field” into a “collection of flag fields” – found in a transactional source. For example, suppose you have a file of transactions that includes (among other fields) a customer identifier (“who”) and a product identifier (“what”).  This file of transactional data indicates purchases of various energy drinks by unique customer IDs:

[Figure: sample transaction file – a customer ID and the energy drink purchased on each record]

Aggregation and Flag Fields

What I need is a single record per unique customer showing whether or not that customer purchased each of the energy drinks (True or False, not sales amounts) during a period of time. More like:

[Figure: the desired output – one row per customer with a True/False flag for each energy drink]

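Before walking through the two tools, here is a rough pandas sketch of the same transformation, purely for illustration; the transactions are invented, and this is not what TM1 or SPSS Modeler does internally.

import pandas as pd

# Transactional input (invented): one row per purchase (customer id, product).
tx = pd.DataFrame({
    "id":      [1, 1, 2, 2, 3],
    "Product": ["RedFury", "Monsoon", "RedFury", "RockJolt", "Monsoon"],
})

# One row per customer, one True/False flag column per product.
flags = pd.crosstab(tx["id"], tx["Product"]).astype(bool)
print(flags)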
Doing it in TM1

In Cognos TM1, you’d utilize TurboIntegrator to read the file – record by record – aggregating the data (by customer ID) and updating each measure (energy drink). This is a reasonably trivial process, but it still requires lines of script to be written. Something (assuming you perhaps initialized the period being loaded to all False values) like:

[Figure: TurboIntegrator script that aggregates the transactions and sets each product flag]

MODELER’s Approach

In SPSS MODELER, the SetToFlag node (found in the Field Ops palette) enables you to create the flags you need and to aggregate the data (based on a grouping, or aggregate key fields) at the same time – without writing any script.

In order to have the SetToFlag node populated with the values of the categorical field, the field has to be instantiated so that MODELER knows the values for which to create flag fields. (In practice, this means that your data has to be read in a Type node prior to the SetToFlag node.) The procedure to create flag fields from the file is:

  1. Add a SetToFlag node to your stream.
  2. Edit the SetToFlag node to set the options for the SetToFlag operation.

[Figure: a Modeler stream with the SetToFlag node added after a Type node]

Editing the SetToFlag

Once you have added your SetToFlag node, you use the Settings tab to:

  1. Select the categorical field to be expanded into flags (I selected “Product”). The Available set value box is populated with its categories (when the data are instantiated).
  2. Optionally, add a field name extension for the new flag fields’ names, either as a suffix or prefix (I left this blank).
  3. Select the categories for which you wish to create flag fields and move them to the Create flag fields list box (I selected all of the products).
  4. The True and False values can be changed, if desired (I left the defaults unchanged).
  5. Optionally, aggregate records by checking the Aggregate keys check box and selecting the appropriate key field(s) – I selected “id”.

[Figure: SetToFlag node settings – Product expanded into flag fields, aggregated by id]

Conclusion

If you go back to the TM1 example, you now have a cube loaded that you can view by customer and period to see which products each customer purchased within the period:

[Figure: the resulting TM1 cube view – product flags by customer and period]

In SPSS Modeler, the output from the SetToFlag node is similar:

[Figure: the SetToFlag node output – one row per customer with a flag field per product]

Next up – Data Sampling Patterns!

Data Indiscretions
https://blogs.perficient.com/2013/11/25/data-indiscretions/ – Mon, 25 Nov 2013

Data loaded into a TM1 or SPSS model will, in most cases, include files consisting of thousands (or hundreds of thousands) of records. It is not reasonable, given the number of fields and records in files of this size, for you to visually inspect all fields in every record (of every file) for missing or invalid values.

Data Transformations

In TM1, data is usually loaded from an EDW or GL system, so (once the initial testing phase has been completed) the probability that incoming data contains unexpected values (should be) somewhat small. However, the data will (most likely) need to be transformed into a format that the TM1 model can use, or is more optimal for the model’s use. Additionally, if data is being manually input to the model (for example a user entering a (sales) forecast value), then inspecting the data for invalid values is required.

With SPSS Modeler, data may come from sources more “sketchy” – marketing surveys, multiple files merged into one or even manually typed-in data. Obviously, the need for auditing and transforming is critical.

Types of Data Violations

Generally, (with TM1 or SPSS) three different types of data violations may be found during the loading process:

  • Values do not comply (with a field’s defined storage type). For example, finding a string value in a numeric field.
  • Only a set of values or a range of values are allowed for a field and the incoming value does not exist within the defined set or is greater or less than the acceptable range.
  • Undefined (SPSS refers to this as $null$ and has “rules” about how it handles these values) values encountered anywhere (no data!).

Actions

When an invalid value (a value violating one of the three rules) is found, one of five possible actions can be taken (a minimal Python sketch of these checks and actions appears after the list):

  • Nullify – in SPSS, you can convert the value to undefined ($null$); again, this has a special meaning for SPSS. For TM1, this is not usually a viable option since TM1 does not work well with NULL values.
  • Coerce – you can convert the invalid value to a valid one. What is invalid and what is valid is usually determined by the measurement level of the field. In SPSS:
    • Flags; if not True or False, then it is False
    • Nominal, Ordinal; if invalid, it becomes the first member of the set’s values
    • Continuous; a value less than the lower bound -> lower bound, a value greater than the upper bound -> upper bound
    • Undefined ($null$) -> midpoint of range

In TM1, this process is more typically a “remapping” from one value to another. For example, translating a product’s code used in one system to that product’s code used in another.

  • Discard – delete the record. With SPSS, you can discard the record; in TM1, you might use the ItemReject function.
  • Warn – invalid values are reported in a message window in SPSS; minor errors may be written to the message log in TM1.
  • Abort – in SPSS, the first invalid value encountered can result in an error and the stream execution being aborted; with TM1, processes can be terminated with the ProcessError function.
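Here is the promised sketch: plain, generic Python illustrating the three violation checks and a few of the actions above. It is not SPSS Modeler's Type node or a TM1 TurboIntegrator process, and the field definition and bounds are invented.

import math

LOWER, UPPER = 0.0, 120.0   # invented allowed range for an "age"-style field

def check_value(value, action="coerce"):
    # Violations 1 and 3: wrong storage type or a missing ($null$-like) value.
    undefined = (value is None or isinstance(value, str)
                 or (isinstance(value, float) and math.isnan(value)))
    # Violation 2: outside the allowed range.
    out_of_range = not undefined and (value < LOWER or value > UPPER)
    if not (undefined or out_of_range):
        return value
    if action == "coerce":
        # Midpoint for undefined values, clamp to the nearest bound otherwise.
        return (LOWER + UPPER) / 2 if undefined else min(max(value, LOWER), UPPER)
    if action == "discard":
        return None                                # caller drops the whole record
    if action == "warn":
        print(f"warning: invalid value {value!r}")
        return value
    raise ValueError(f"invalid value {value!r}")   # "abort"

print([check_value(v) for v in [34, -5, 300, None, "n/a"]])
# -> [34, 0.0, 120.0, 60.0, 60.0]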

Handling data Violations

In SPSS Modeler, the Type node enables data checking and transforming. (Checking can also be done using the Types tab in a data source node). To check and transform, you need to specify 1) what the valid values are and 2) the action to be taken:

  1. Select the Types tab.
  2. Select the field to check and click in the corresponding cell in the Values column.
  3. Select Specify values and labels and enter the lower and upper bound.
  4. Select the action to take when an invalid value is encountered.

[Screenshot: the Type node’s Types tab with lower and upper bounds specified and a Check action selected]

In TM1, checking for and handling (transforming) of invalid values is a little more work.

Using TurboIntegrator processes is the best approach for loading and transforming data in a TM1 model and scripting logic (using predefined functions) is required to evaluate and take action on records being loaded into TM1:

  • Select the Data tab.
  • Type the script to use the appropriate TM1 functions. Some examples include:
    • Checking for invalid data types – Value_Is_String (does the cell contain a string value?)
    • Checking data against a set of values – DIMIX (does the value exist in a dimension?)
    • Checking data to ensure it is within a range – If (CapitalValue < MaxAllowed & CapitalValue > MinAllowed); (is the value within the allowed range?)
    • Checking for missing data (empty fields) – IF (value) @= ‘’ (is the value empty?)
  • Save and Run the process.

[Screenshot: a TurboIntegrator process Data tab containing the validation script]

Conclusion

In both tools you have similar objectives – load data, ensure that the data is “usable” and, if it is not, perform an appropriate action (rather than break the model!). SPSS Modeler allows you to do some checking and transforming by selecting values in dialogs, while Cognos TM1 requires you to use TI scripting to accomplish these basic operations. Both can be straightforward or complex, and both can be automated and reused on future datasets.

 

Primary Practices for Examining Data
https://blogs.perficient.com/2013/11/21/primary-practices-for-examining-data/ – Fri, 22 Nov 2013

SPSS Data Audit Node

[Screenshot: the SPSS Modeler Data Audit node]

Once data is imported into SPSS Modeler, the next step is to explore the data and become “thoroughly acquainted” with its characteristics. Most (if not all) data will contain problems or errors such as missing information and/or invalid values. Before any real work can be done using this data you must assess its quality (the higher the quality, the more accurate the predictions).

Addressing issues of data quality

Fortunately, SPSS Modeler makes it (almost too) easy! Modeler provides several nodes that can be used for our integrity investigation. Here are a couple of things even a TM1 guy can do.

Auditing the data

After importing the data, do a preview to make sure the import worked and things “look okay”.

In my previous blog I talked about a college using predictive analytics to predict which students might or might not graduate on time, based upon their involvement in athletics or other activities.

From the Variable File Source node, it was easy to have a quick look at the imported file and verify that the import worked.

[Screenshot: a preview of the imported file from the Variable File Source node]

Another useful option is to run a table. This will show whether field values make sense (for example, whether a field like age contains numeric values and no string values). The Table node is cool – after dropping it into my stream and connecting my source node to it, I can open it up and click run (to see all of my data nicely fit into a database-like table) or I can do some filtering using the real-time “expression builder”.

[Screenshot: Table node output showing the student records]

The expression builder lets me see all of the fields in my file (along with their level of measurement, shown as Type, and their Storage – integer, real, string). It also gives me the ability to select from SPSS predefined functions and logical operators to create a query expression to run on my data. Here I wanted to highlight all students in the file that graduated “on time”:

[Screenshot: the expression builder with a query selecting students who graduated on time]

You can see the possibilities that the Table node provides – but of course it is not practical to visually inspect thousands of records. A better alternative is the Data Audit node.

The Data Audit node is used to study the characteristics of each field. For continuous fields, minimum and maximum values are displayed. This makes it easy to detect out of range values.

Our old pal measurement level

Remember measurement level (a field’s “use” or “purpose”)? Well, the Data Audit node reports different statistics and graphs, depending on the measurement level of the fields in your data.

For categorical fields, the data audit node reports the number of unique values (the number of categories).

For continuous fields, the minimum, maximum, mean, standard deviation (indicating the spread in the distribution), and skewness (a measure of the asymmetry of a distribution; a symmetric distribution has a skewness value of 0) are reported.

For typeless fields, no statistics are produced.
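As a rough analogue of what the Data Audit node reports, here is a short pandas sketch; the student fields and values are invented to mirror the example above.

import pandas as pd

students = pd.DataFrame({
    "gender":           ["F", "M", "F", "M", "F", "M"],
    "household_income": [41000, 52500, 38750, 91000, 60250, 47800],
    "grad_on_time":     ["Yes", "No", "Yes", "Yes", "No", "Yes"],
})

# Categorical fields: number of unique values (categories).
print(students["gender"].nunique(), students["grad_on_time"].nunique())

# Continuous field: min, max, mean, standard deviation (spread) and skewness.
print(students["household_income"].agg(["min", "max", "mean", "std", "skew"]))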

“Distribution” or “Histogram”?

The Data Audit node also produces a graph for each field in your file (again based upon the field’s level of measurement); no graphs are produced for typeless fields.

For a categorical field (like “gender”) the Data Audit Node will display a distribution graph and for a continuous field (for example “household income”) it will display a histogram graph.

So back to my college’s example, I added an audit node to my stream and took a look at the results.

[Screenshot: Data Audit node results for the student file]

First, I excluded the “ID” field (it is just a unique student identification number and has no real meaning for the audit node). Most of the fields in my example (gender, income category, athlete, activities and graduate on time) are qualified as “Categorical” so the audit node generated distribution graphs, but the field “household income” is a “Continuous” field, so a histogram was created for it (along with the meaningful statistics like Min, Max, Mean, etc.).

[Screenshot: a close-up of a generated graph with totals, values and labels]

Another awesome feature – if you click on the generated graphs, SPSS will give you a close up of the graph along with totals, values and labels.

Conclusion

I’ve talked before about the importance of understanding field measurement levels. The fact that the Data Audit node’s statistics and chart types are derived from the measurement level is another illustration of Modeler’s approach: measurement level determines the output.

 

Data Consumption – Cognos TM1 vs. SPSS Modeler
https://blogs.perficient.com/2013/11/20/data-consumption-cognos-tm1-vs-spss-modeler/ – Thu, 21 Nov 2013

In TM1, you may be used to “integer or string”; in SPSS Modeler, data gets much more interesting. In fact, you will need to be familiar with a concept known as “Field Measurement Level” and the practice of “Data Instantiation”.

In TM1, data is transformed by aggregation, multiplication or division, concatenation or translation, and so on, all based on the “type” of the data (meaning the way it is stored). With SPSS, the storage of a field is one thing, but the use of the field (in data preparation and in modeling) is another. For example, take (numeric) data fields such as “age” and “zip code”: I am sure you will agree that age has “meaning” and a statistic like mean age makes sense, while a zip code is just a code representing a geographical area, so a mean doesn’t make sense for this field.

So, considering the intended use of a field, one needs the concept of measurement level. In SPSS, the results absolutely depend on correctly setting a field’s measurement level.

Measurement Levels in Modeler

SPSS Modeler defines 5 varieties of measurement levels. They are:

  • Flag
  • Nominal
  • Ordinal
  • Continuous
  • Typeless

Flag

This would describe a field with only 2 categories – for example male/female.

Nominal

A nominal field would be a field with more than 2 categories and the categories cannot be ranked. A simple example might be “region”.

Ordinal

An Ordinal field will contain more than 2 categories, but the categories represent ordered information – perhaps an “income category” (low, medium or high).

Continuous

This measurement level is used to describe simple numeric values (integer or real) such as “age” or “years of employment”.

Typeless

Finally, for everything else, “Typeless” is just that – for fields that do not conform to any other type, like a customer ID or account number.

 

Instantiation

Along with the idea of setting measurement levels for all fields in a data file, comes the notion of Instantiation.

In SPSS Modeler, the process of specifying information such as measurement level (and appropriate values) for a field is called instantiation.

[Screenshot: field measurement levels shown in SPSS Modeler]

SPSS Modeler qualifies every field of the data it consumes as one of 3 kinds:

  • Un-instantiated
  • Partially Instantiated
  • Fully Instantiated

Fields with totally unknown measurement level are considered un-instantiated. Fields are referred to as partially instantiated if there is some information about how fields are stored (string or numeric, or whether the fields are Categorical or Continuous), but we do not have all the information. When all the details about a field are known, including the measurement level and values, it is considered fully instantiated (and Flag, Nominal, Ordinal, or Continuous is displayed with the field by SPSS).

It’s a Setup

Just as TM1’s TurboIntegrator “guesses” a field’s (storage) type and use (“contents”, to TM1 developers) based upon a specified field’s value (of course you can override these guesses), SPSS data source nodes will initially assign a measurement level to each field in the data source file for you – based upon their storage value (again, these can be overridden). Integer, real and date fields will be assigned a measurement level of Continuous, while strings are assigned a measurement level of Categorical.

[Screenshot: a data source node’s Types tab after autotyping]

This is the easiest method for defining measurement levels – allowing Modeler to “autotype” by passing data through the source node and then manually reviewing and editing any incorrect measurement levels, resulting in a fully instantiated data file.
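To make the idea of autotyping concrete, here is a toy Python sketch that guesses a measurement level from a column's storage and values. The heuristics and thresholds are my own invention and are not Modeler's actual rules, which work from storage first and refine the level as data passes through the source or Type node.

import pandas as pd

def guess_level(series: pd.Series) -> str:
    # Invented heuristics, not Modeler's real autotyping logic.
    if pd.api.types.is_numeric_dtype(series) or pd.api.types.is_datetime64_any_dtype(series):
        return "Continuous"
    distinct = series.nunique(dropna=True)
    if distinct == 2:
        return "Flag"
    if distinct < len(series):
        return "Nominal"   # a human would promote ordered categories to Ordinal
    return "Typeless"      # every value unique, e.g. a customer ID

df = pd.DataFrame({
    "age":     [23, 45, 31, 52, 38, 29],
    "gender":  ["M", "F", "F", "M", "F", "M"],
    "region":  ["North", "South", "North", "East", "South", "North"],
    "cust_id": ["A1", "B2", "C3", "D4", "E5", "F6"],
})
print({col: guess_level(df[col]) for col in df.columns})
# -> age: Continuous, gender: Flag, region: Nominal, cust_id: Typeless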

Performance Testing TM1Web Applications with HP LoadRunner
https://blogs.perficient.com/2013/09/30/performance-testing-tm1web-applications-with-hp-loadrunner/ – Mon, 30 Sep 2013


If you’ve ever attempted to perform a performance test on a TM1 application, you know that there is not really an effective way to manually create sufficient load on a model. Getting “real users” to execute application operations – over and over again – is nearly impossible. Thankfully, an automated testing product can solve for this.

Automated testing refers to creating and running test “scripts” mechanically, concurrently, for as long as you need to.  There are many tools today that can be used for automating your performance testing. One of the best is HP LoadRunner.

What is LoadRunner?

HP LoadRunner is a performance and test automation product from Hewlett-Packard that can be used for examining system behavior and performance, while generating actual load on your TM1 application.

HP’s description of the product is:

“HP LoadRunner is the industry-standard performance testing product for predicting system behavior and performance. Using limited hardware resources, LoadRunner emulates hundreds or thousands of concurrent users to put the application through the rigors of real-life user loads.”

LoadRunner is made up of 3 main components which are:

  • Virtual User Generator or “VUGen” (where LoadRunner creates virtual users for your application using “scripts” that you create)
  • The Controller (the “user script runner”)
  • Analysis Engine (shows the detailed results)

LoadRunner Objectives

The LoadRunner website goes on to outline typical performance testing objectives as:

  • Emulate conditions of controlled load and maximum load on the application environment
  • Measure application performance under load (response time, memory, throughput, etc.)
  • Check where performance delays occur: network or client delays, CPU performance, I/O delays, database locking, or other issues at the server
  • Monitor the network and server resources under load

Plan

A clearly defined routine (a plan) will make sure that your performance test is effective. Just because you have the ability to automate activities doesn’t guarantee that those activities are meaningful, based upon your objectives. It is advisable to clearly define and document even the most obvious information. A “best practice” recommendation is to develop a “performance testing questionnaire template” that can be filled out before the testing begins. The questionnaire will ensure that all of the “ingredients” required are available at the appropriate time during the testing routine.

The following are examples of detail provided in a typical questionnaire:

Generalizations

  • Define “the team” – Identify application owners and their contact information
  • Are there any SLAs in place?
  • What are the start and (expected) end dates for the testing?
  • Name the environment to perform the test (QA? Production?) – including platform, servers, network, database, Web services, etc.
  • What about network specifics – is the application accessible to outside users via the internet? Is there a specific bandwidth allocated to the application? Dedicated or burstable? Is the application hosted internally or externally? Is any load balancing occurring?
  • Describe the details of all components of the infrastructure required to support this application (IP Address, Server Name, Software/version, OS (version/SP), CPU, Memory, Disk Sizes, etc.)
  • From a client perspective, how does a user access the application? A web browser or other? (Citrix or thick client, etc.) If a browser is used define the supported browser – JAVA, ActiveX?
  • Identify the conceptual goals of the performance tests – for example “to support expected user concurrency with acceptable response times.” And then provide quantitative goals – such as “simulate 75 concurrent users due to the nature of the business process which has the worldwide Affiliates (largest group of users) performing updates and reporting in a period of less than 24 hours”.
  • Test requirements such as response time, load, memory, CPU utilization, etc., and provide any existing performance metrics

Application, User and Use-Case Overviews

  • What is the application to be tested? What is the purpose of the application (an overview)? (It is recommended that a high-level architectural diagram be provided.)
  • Name the critical vs. non-critical transactions within the application under test
  • User behavior, user locations, and application timeframe usability regarding the application under test
  • List the business use cases that the application solves for and a brief description of each, then define the business process within your application that will be included in the scope of the performance test. For example, “Sales Planning contributors enter, review and adjust a current sales plan.”
  • Define the categories of users for your application; identify which business processes are used by these users and state the percentage of the overall system activity this category of user represents.
  • Estimate the average and peak number of simultaneous users for each business process (as seen in a normal day of business). Indicate which business processes may experience heavy throughput or are considered mission critical.
  • For each business process (selected to be included in the performance test) document the specific user steps required to complete it and determine whether each step requires input data.
  • For each business process selected to be included in the performance test, document the number of concurrent users and the percentage of total concurrent users that will be assigned to each process.
  • Finally, for each remote location to be simulated, document the number of concurrent users and the percentage of total concurrent users that will be assigned to each process for that site.

Preparation

Once the questionnaire is completed, the performance test team will use the information to create virtual user, or “Vuser”, scripts within LoadRunner. Vusers emulate human users interacting with your application. A Vuser script contains the actions that each Vuser performs during scenario execution.

Test

Once the Vuser scripts are constructed, you can emulate load by instructing multiple Vusers to perform tasks simultaneously (load level is set by increasing or decreasing the number of Vusers that perform tasks at the same time).
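LoadRunner Vuser scripts are normally written in LoadRunner's own C-based scripting language, so the following is only a plain-Python stand-in for the concept of many virtual users hitting a TM1Web URL concurrently; the URL, user counts and think times are invented placeholders, not a real test plan.

import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TM1WEB_URL = "http://tm1web.example.com/tm1web/"   # hypothetical endpoint
VUSERS, ITERATIONS = 10, 5

def vuser(vuser_id):
    timings = []
    for _ in range(ITERATIONS):
        start = time.perf_counter()
        try:
            urllib.request.urlopen(TM1WEB_URL, timeout=30).read()
        except OSError:
            pass                             # a real test would count failures
        timings.append(time.perf_counter() - start)
        time.sleep(1)                        # "think time" between actions
    return timings

with ThreadPoolExecutor(max_workers=VUSERS) as pool:
    results = list(pool.map(vuser, range(VUSERS)))

flat = [t for per_user in results for t in per_user]
print(f"requests: {len(flat)}, avg response: {sum(flat) / len(flat):.2f}s")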

Presentation

The LoadRunner analysis engine helps you “slice and dice” generated test data in many ways to determine which transactions passed or failed (your test objectives defined in the questionnaire), as well as some potential causes of failure. You can then generate custom reports to present to stakeholders.

Conclusion

Any enterprise-level application will be performance tested before production delivery and periodically during its life. Typically you’ll find performance testing is conducted prior to delivery of the application but is not executed thereafter; it is recommended that a schedule be established for re-execution of the performance tests, especially for applications that are evolving (new features being added) or experiencing a widening user base.

