Organizations that want to use analytics to predict outcomes score pools of data by applying an appropriate predictive model. Pre-built predictive models are increasingly available in the marketplace, and data scientists who are experts in particular areas are developing models with increasingly better success rates. However, the best approach may be for an organization to develop its own models, using its own data as input to the model's design.
IBM SPSS Statistics provides procedures for building predictive models such as regression, clustering, tree, and neural network models. Building a model with SPSS is not complicated, given the tool's power; however, challenges remain in perfecting the ability to:
- select the appropriate model type
- evaluate the performance of the constructed or generated model
- save the model so that it is readily available for reuse on future datasets and by less experienced individuals within the organization
What Model Type Best Supports My Needs?
Selecting an appropriate predictive model type starts with learning about and truly understanding your data, a process greatly simplified by SPSS and the subject of several of my earlier (and future) blog posts. Sorting, labeling, and categorizing your data will help you establish what is known as the “outcome of interest” or the “target” of your model, which is then used to determine which type of predictive model you should build.
SPSS builds predictive models based upon an identified and labeled “result dataset”. For example, if you want to build a model that will predict responses from a direct mailing campaign, you need to start with data that contains information on who responded and who did not respond in a previous mailing.
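To make the idea of a labeled dataset concrete, here is a minimal Python sketch of what such data might look like and how a response field can be turned into a 0/1 target. The records, the field names, and the `recode_target` helper are all made up for illustration; this is not SPSS syntax or output.

```python
# Hypothetical records from a previous test mailing (all values are invented).
previous_mailing = [
    {"customer_id": 1, "age": 34, "income": 52000, "responded": "Yes"},
    {"customer_id": 2, "age": 51, "income": 87000, "responded": "No"},
    {"customer_id": 3, "age": 29, "income": 41000, "responded": "No"},
    {"customer_id": 4, "age": 45, "income": 99000, "responded": "Yes"},
]

# Assumption: a positive response is coded as the string "Yes".
POSITIVE_RESPONSE_VALUE = "Yes"


def recode_target(records, response_field, positive_value):
    """Return a 0/1 list: 1 where the response equals the positive value."""
    return [1 if row[response_field] == positive_value else 0 for row in records]


target = recode_target(previous_mailing, "responded", POSITIVE_RESPONSE_VALUE)
response_rate = sum(target) / len(target)
print(target, response_rate)
```

The point is simply that every historical contact carries a known outcome, which is what a model can learn from.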
Building a Predictive Model
Once you have established what model type to build, you can use the appropriate SPSS procedures. For example, continuing with the direct mailing example above, you can select “Direct Marketing” from the Viewer menu in SPSS and then “Choose Technique”.
The “Direct Marketing” dialog is displayed, which is separated into three sections: “Understanding My Contacts”, “Improve My Marketing Campaigns” and “Score My Data”.
Suppose we select “Select Contacts Most Likely to Purchase” and then click Continue.
The “Propensity to Purchase” dialog is then displayed; this is where we designate the key attributes required for building our model:
- Response Field and Positive Response Value (these fields indicate who responded to the test mailing or previous campaign, and what value is considered to be the positive response)
- Predict Propensity with (these are the fields in your data file that you want to use to predict the probability that contacts with similar characteristics will respond)
- Output format (as XML) (this is the name of the XML file that the generated model will be saved as)
After you have supplied the above information, you can click on the “Settings” tab to:
- Indicate if you want to validate your model (you should always validate)
- Designate what types of diagnostic output SPSS should generate for your model (in case there are errors in the generation of the model)
- Provide a Name and Label for the Recoded Response Field (SPSS creates this new field to use for its scoring)
- Provide a Name for the file in which to save the scores generated by the model
Once you have set all of your parameters, you can click Run and SPSS will generate your predictive model.
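For readers curious about what "generating a model" and "validating" can mean in practice, here is a rough Python sketch. It assumes logistic regression as the technique (a common choice for propensity scores, used here as an assumption rather than a description of SPSS internals), fits it with plain gradient descent, and holds back part of the data for validation. The data is synthetic and every name in the code is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def propensity(weights, features):
    """Predicted probability of a positive response; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], features)))


def fit_logistic(rows, targets, learning_rate=0.5, epochs=2000):
    """Fit intercept and coefficients by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, y in zip(rows, targets):
            error = propensity(weights, features) - y
            gradient[0] += error
            for j, v in enumerate(features):
                gradient[j + 1] += error * v
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


# Synthetic, made-up contacts: two predictors scaled to 0..1, and a response
# that is positive whenever the two predictors sum to more than 1.
rng = random.Random(42)
contacts = [(rng.random(), rng.random()) for _ in range(200)]
responses = [1 if a + b > 1.0 else 0 for a, b in contacts]

# Hold out 30% of the contacts to validate the model on data it has not seen.
split = int(len(contacts) * 0.7)
train_x, test_x = contacts[:split], contacts[split:]
train_y, test_y = responses[:split], responses[split:]

weights = fit_logistic(train_x, train_y)
correct = sum(
    1 for features, y in zip(test_x, test_y)
    if (propensity(weights, features) >= 0.5) == (y == 1)
)
holdout_accuracy = correct / len(test_y)
print(f"holdout accuracy: {holdout_accuracy:.2f}")
```

Checking accuracy on the held-out contacts rather than on the training contacts is the reason validation matters: it shows how the model behaves on data it was not fitted to.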
Evaluating a Predictive Model's Performance
Of course, before you can use your model, you need to evaluate how well it will actually work. This effort can be complicated and should not be taken lightly (it could be the topic of an entire book rather than a paragraph in a blog). The results SPSS generates for evaluation depend on the technique you used to generate the model. In our mailing example, we used the “Propensity to Purchase” feature, which produces an “overall model quality chart” and a “classification table”. Other SPSS techniques will supply other model evaluation information.
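A classification table is, in essence, a count of correct and incorrect predictions for each outcome. The following Python sketch shows how such a table can be tallied from actual responses and predicted scores. The 0.5 cutoff, the sample values, and the `classification_table` helper are assumptions for illustration, not the layout SPSS produces.

```python
def classification_table(actual, scores, cutoff=0.5):
    """Count true/false positives and negatives at the given score cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, score in zip(actual, scores):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["tn" if y == 0 else "fn"] += 1
    return counts


# Made-up actual responses (1 = responded) and model scores for eight contacts.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]

table = classification_table(actual, scores)
accuracy = (table["tp"] + table["tn"]) / len(actual)
print(table, accuracy)
# {'tp': 3, 'fp': 1, 'tn': 3, 'fn': 1} 0.75
```

Here six of the eight contacts are classified correctly; the two misses are one responder scored below the cutoff and one non-responder scored above it.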
Saving and Deploying Your Predictive Model
Once your model has been built, it can be saved in a “model file” that contains all of the information necessary to reconstruct the model. The most popular file format is XML (or PMML, which I covered in an earlier blog post and which is really the future standard for predictive model files).
(In our example, we designated the output file format on the “Propensity to Purchase” dialog.)
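To illustrate what "all of the information necessary to reconstruct the model" means, here is a small Python sketch that writes a model's intercept and coefficients to XML and reads them back. The element names (`SketchModel`, `Intercept`, `Predictor`) are invented for this example; this is a deliberately simplified layout, not the PMML schema and not the file SPSS writes.

```python
import xml.etree.ElementTree as ET


def model_to_xml(intercept, coefficients):
    """Serialize a simple linear/logistic model to an XML string."""
    root = ET.Element("SketchModel", {"type": "logistic"})
    ET.SubElement(root, "Intercept", {"value": repr(intercept)})
    for field, coefficient in coefficients.items():
        ET.SubElement(root, "Predictor", {"field": field, "coefficient": repr(coefficient)})
    return ET.tostring(root, encoding="unicode")


def model_from_xml(xml_text):
    """Rebuild the intercept and coefficient mapping from the XML string."""
    root = ET.fromstring(xml_text)
    intercept = float(root.find("Intercept").get("value"))
    coefficients = {
        p.get("field"): float(p.get("coefficient")) for p in root.findall("Predictor")
    }
    return intercept, coefficients


# Hypothetical fitted values, used only to show the round trip.
saved = model_to_xml(-1.5, {"age": 0.04, "income": 0.00002})
restored = model_from_xml(saved)
print(saved)
print(restored)
```

Because the file holds the full set of parameters, anyone with the file can rebuild and apply the model without access to the original training data.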
Once SPSS has successfully generated your model (and you have performed an evaluation of the model), you can then use that model file to generate predictive scores based upon other datasets:
From the SPSS Viewer, you can select “Utilities” and then “Scoring Wizard”. From the Scoring Wizard's initial dialog, you can browse to and select your previously saved predictive model XML file.
The SPSS Scoring Wizard will guide you through:
- Selecting a previously saved scoring model (this is your XML file)
- Ensuring all model field names are matched to the field names in the current data file
- Indicating which scoring functions (available in the file) are to be used
- Actually applying the model to the data (perform the scoring)
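The steps above can be sketched in a few lines of Python: take a saved model, map its field names onto the field names in a new data file, and apply a scoring function to each row. The model values, the field names, and the `score_row` helper are hypothetical, and the logistic form of the score is an assumption made for illustration.

```python
import math

# Hypothetical saved model: an intercept plus one coefficient per model field.
model = {"intercept": -4.0, "coefficients": {"age": 0.05, "income_k": 0.02}}

# Mapping from model field names to the field names used in the new data file.
field_map = {"age": "Age", "income_k": "IncomeThousands"}


def score_row(model, field_map, row):
    """Return the predicted probability of a positive response for one row."""
    missing = [f for f in model["coefficients"] if field_map.get(f) not in row]
    if missing:
        raise KeyError(f"model fields with no matching data field: {missing}")
    z = model["intercept"] + sum(
        coefficient * row[field_map[field]]
        for field, coefficient in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# Made-up rows from a new dataset that has no response information yet.
new_contacts = [
    {"Age": 40, "IncomeThousands": 100},
    {"Age": 25, "IncomeThousands": 30},
]
propensities = [score_row(model, field_map, row) for row in new_contacts]
print(propensities)
```

Checking for unmatched fields before scoring mirrors the wizard's field-matching step: a model cannot be applied if one of its predictors has no counterpart in the new data.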
Conclusion
Understanding your data, preparing it for input to model generation, generating a predictive model, evaluating model performance and saving a model for reuse are all tasks that IBM SPSS fully supports, making it a premier choice for predictive analytics. It is a tool that simplifies the mechanics of predictive model engineering.
Next time, more analytics excitement!