
Data & Intelligence

Predictive Model Engineering

Organizations interested in using analytics to predict outcomes score their data pools by applying an appropriate predictive model. Pre-built predictive models are becoming increasingly available in the marketplace, and data scientists who are subject-matter experts in particular areas are developing models with increasingly better success rates. However, the best approach may be for an organization to develop its own models, using its own data as input to the model's design.

IBM SPSS Statistics provides procedures for building predictive models such as regression, clustering, tree, and neural network models. Given the tool's power, the process of building a model with SPSS is not at all complicated; the remaining challenges lie in perfecting the ability to:

  • select the appropriate model type
  • evaluate the performance of the constructed or generated model
  • save the model in such a way that it is easily available for reuse on future datasets, including by less experienced individuals within the organization
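Selecting the model type is the first of those challenges. As a rough illustration (this is a hand-rolled sketch, not anything SPSS does internally, and the helper function is entirely hypothetical), the choice largely follows from the shape of the target field:

```python
# Illustrative only: a rough mapping from a sample of target values to a
# predictive-model family, mirroring the decision an analyst makes before
# building anything in SPSS. The function name and thresholds are hypothetical.
def suggest_model_family(target_values):
    """Suggest a model family from a sample of target values."""
    distinct = set(target_values)
    if len(distinct) == 2:
        # binary outcome, e.g. responded yes/no in a mailing campaign
        return "classification"
    if all(isinstance(v, (int, float)) for v in distinct):
        # continuous numeric outcome, e.g. revenue per contact
        return "regression"
    # several discrete labels
    return "classification (multi-class)"
```

A binary "responded / did not respond" target, as in the mailing example below, points toward a classification-style model such as the logistic model behind "Propensity to Purchase".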

What Model Type Best Supports My Needs?

The practice of selecting an appropriate predictive model type starts with learning about and truly understanding your data (a process greatly simplified by SPSS, and also the subject of several of my earlier (and future) blog posts). The sorting, labeling, and categorizing of your data will help you establish what is known as the "outcome of interest," or the "target," of your model, which then determines which type of predictive model you should build.

SPSS builds predictive models based upon an identified and labeled “result dataset”. For example, if you want to build a model that will predict responses from a direct mailing campaign, you need to start with data that contains information on who responded and who did not respond in a previous mailing.
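In plain terms, a result dataset is just records whose predictor fields are paired with a known outcome. A tiny hypothetical example (the field names and values here are invented for illustration, not taken from any SPSS sample file):

```python
# A hypothetical "result dataset" for the direct-mailing example: each record
# carries predictor fields plus the known outcome from a previous campaign.
contacts = [
    {"age": 34, "income": 52000, "prior_purchases": 2, "responded": 1},
    {"age": 58, "income": 41000, "prior_purchases": 0, "responded": 0},
    {"age": 45, "income": 67000, "prior_purchases": 5, "responded": 1},
    {"age": 29, "income": 38000, "prior_purchases": 0, "responded": 0},
]

# The labeled outcome is what makes supervised model building possible;
# here, half the contacts in the previous mailing responded.
response_rate = sum(c["responded"] for c in contacts) / len(contacts)
```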

Building a Predictive Model

Once you have established what model type to build, you can use the appropriate SPSS procedures. For example, continuing the direct-mailing example above, in SPSS you can select "Direct Marketing" from the Viewer menu and then "Choose Technique".

The "Direct Marketing" dialog is displayed, which is separated into three sections: "Understanding My Contacts", "Improve My Marketing Campaigns" and "Score My Data".

If we select "Select Contacts Most Likely to Purchase" and then Continue, the "Propensity to Purchase" dialog is displayed. This is where we designate the key attributes required for building our model:

  • Response Field and Positive Response Value (the field that indicates who responded to the test mailing or previous campaign, and the value that is considered the positive response)
  • Predict Propensity with (the fields in your file that you want to use to predict the probability that contacts with similar characteristics will respond)
  • Output format (as XML) (the name of the XML file that the generated model will be saved as)

After you have supplied the above information, you can click on the “Settings” tab to:

  • Indicate whether you want to validate your model (you should always validate)
  • Designate what types of diagnostic output SPSS should generate for your model (in case there are errors in the generation of the model)
  • Provide a Name and Label for the Recoded Response Field (SPSS creates this new field to use for its scoring)
  • Provide a name for the file in which to save the scores generated by the model

Once you have set all of your parameters, you can click Run and SPSS will generate your predictive model.

Evaluating a Predictive Model's Performance

Of course, before you can actually use your model, you need to evaluate how well it will work. This effort can be complicated and should not be taken lightly (it could fill an entire book rather than a paragraph in a blog). The evaluation results SPSS generates depend on the technique you used to build the model. In our mailing example, the "Propensity to Purchase" feature produces an "overall model quality chart" and a "classification table"; other SPSS techniques supply other model-evaluation information.
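To make the classification table concrete, here is a minimal hand-rolled version of the idea (this is the general confusion-matrix concept, not SPSS's own output format; "accuracy" below is just one simple overall-quality number among many):

```python
# Sketch of the two evaluation ideas mentioned above: a classification table
# (confusion matrix) comparing actual vs. predicted responses, and a single
# overall-quality number (accuracy). Illustrative only, not SPSS output.
def classification_table(actual, predicted):
    table = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            table["tp"] += 1        # predicted responder, did respond
        elif not a and p:
            table["fp"] += 1        # predicted responder, did not respond
        elif a and not p:
            table["fn"] += 1        # missed a real responder
        else:
            table["tn"] += 1        # correctly predicted non-responder
    return table

def accuracy(table):
    total = sum(table.values())
    return (table["tp"] + table["tn"]) / total
```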

Saving and Deploying your Predictive Model

Once your model has been built, it can be saved in a "model file" that contains all of the information necessary to reconstruct it. The most popular file format is XML (specifically PMML, which I covered in an earlier blog and which is really the future standard for predictive model files).

(In our example we designated the output file format on the “Propensity to Purchase” dialog).

Once SPSS has successfully generated your model (and you have performed an evaluation of the model), you can then use that model file to generate predictive scores based upon other datasets:

From the SPSS Viewer, you can select "Utilities" and then "Scoring Wizard". From the Scoring Wizard's initial dialog, you can browse to and select your previously saved predictive model XML file.

The SPSS Scoring Wizard will guide you through:

  • Selecting a previously saved scoring model (this is your XML file)
  • Ensuring all model fields are matched to fields in the current data file
  • Indicating which scoring functions (available in the file) are to be used, and
  • Actually applying the model to the data (performing the scoring)
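What the wizard does conceptually can be sketched by hand: read the saved parameters back out of the model file and apply them to new records to produce propensity scores. As before, the XML layout, field names, and coefficients below are hypothetical, not real PMML or SPSS output:

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical simplified model file (same invented layout as above).
MODEL_XML = """<Model type="logisticRegression">
  <Intercept>-4.0</Intercept>
  <Coefficient field="prior_purchases">0.8</Coefficient>
  <Coefficient field="income">0.00005</Coefficient>
</Model>"""

def load_model(xml_text):
    """Recover the intercept and per-field coefficients from the model file."""
    root = ET.fromstring(xml_text)
    intercept = float(root.find("Intercept").text)
    coefs = {c.get("field"): float(c.text) for c in root.findall("Coefficient")}
    return intercept, coefs

def score(record, intercept, coefs):
    """Apply the logistic model to one record: propensity between 0 and 1."""
    z = intercept + sum(coefs[f] * record.get(f, 0) for f in coefs)
    return 1 / (1 + math.exp(-z))
```

With a file like this, anyone in the organization can score fresh contact lists without rebuilding (or even understanding) the original model, which is the whole point of saving it for reuse.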

Conclusion

Understanding your data, preparing it for input to model generation, generating a predictive model, evaluating model performance, and saving a model for reuse are all tasks that IBM SPSS fully supports, making it a premier choice for predictive analytics. It is a tool that simplifies the mechanics of predictive model engineering.

Next time, more analytics excitement!


Jim Miller

Mr. Miller is an IBM certified and accomplished Senior Project Leader and Application/System Architect-Developer with over 30 years of extensive applications and system design and development experience. His current role is National FPM Practice Leader. His experience includes BI, Web architecture & design, systems analysis, GUI design and testing, database modeling and systems analysis, design, and development of client/server, Web and mainframe applications and systems utilizing: Applix TM1 (including TM1 rules, TI, TM1Web and Planning Manager), dynaSight - ArcPlan, ASP, DHTML, XML, IIS, MS Visual Basic and VBA, Visual Studio, PERL, Websuite, MS SQL Server, ORACLE, SYBASE SQL Server, etc. His responsibilities have included all aspects of Windows and SQL solution development and design, including: analysis; GUI (and Web site) design; data modeling; table, screen/form and script development; SQL (and remote stored procedures and triggers) development and testing; test preparation and management; and training of programming staff. Other experience includes development of ETL infrastructure, such as data transfer automation between mainframe (DB2, Lawson, Great Plains, etc.) systems and client/server SQL Server and Web-based applications, and integration of enterprise applications and data sources. In addition, Mr. Miller has acted as Internet Applications Development Manager responsible for the design, development, QA and delivery of multiple Web sites, including online trading applications, warehouse process control and scheduling systems, and administrative and control applications. Mr. Miller was also responsible for the design, development and administration of a Web-based financial reporting system for a 450-million-dollar organization, reporting directly to the CFO and his executive team. Mr. Miller has also been responsible for managing and directing multiple resources in various management roles, including project and team leader, lead developer and applications development director. Specialties include: Cognos/TM1 design and development, Cognos Planning, IBM SPSS and Modeler, OLAP, Visual Basic, SQL Server, forecasting and planning; international application development, business intelligence, project development. IBM Certified Developer - Cognos TM1 (perfect score 100% on exam); IBM Certified Business Analyst - Cognos TM1.
