Automated Data Preparation (ADP) IBM SPSS Statistics Base

The seasoned data scientist knows that probably the single most important step in creating a predictive model is pinpointing the appropriate “data pond” and ensuring that it is properly “prepared”. I’ve written about the many “out of the box” tools that SPSS users can use to manage data, such as the ability to:

  • List Cases,
  • Identify and Replace Missing Values,
  • Transform and Compute new variables,
  • Recode,
  • Select Cases,
  • Sort Cases and even
  • Merge Files.

These features are accessible in SPSS Statistics Base from pull-down menus. In addition, SPSS goes one step further and offers “Automated Data Preparation” or “ADP”.

Automated Data Preparation (ADP) automatically analyzes your data and identifies fixes, screening out fields that are problematic or unlikely to be useful, deriving new attributes when appropriate, and improving performance through intelligent screening techniques. You can use ADP in “Automatic” mode (allowing it to choose and apply fixes), or in “Interactive” mode (previewing the changes before they are made and accepting or rejecting them as desired).
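To make the idea of “screening out fields” a little more concrete, here is a minimal Python sketch of that kind of check. It is not how SPSS implements ADP; the function name `screen_fields`, the sample rows, and both thresholds (more than 50% missing, more than 95% of values in a single category) are assumptions chosen purely for illustration.

```python
from collections import Counter


def screen_fields(rows, max_missing_share=0.5, max_single_category_share=0.95):
    """Return (kept, dropped); dropped maps each field name to a reason.

    Hypothetical helper: both thresholds are illustrative assumptions,
    not settings read from SPSS.
    """
    if not rows:
        return [], {}
    kept, dropped = [], {}
    n = len(rows)
    for field in rows[0]:
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        missing_share = 1 - len(present) / n
        # Rule 1: too little usable data in the field.
        if not present or missing_share > max_missing_share:
            dropped[field] = "too many missing values"
            continue
        # Rule 2: (almost) every case has the same value, so the field
        # carries little information for a model.
        top_count = Counter(present).most_common(1)[0][1]
        if top_count / len(present) > max_single_category_share:
            dropped[field] = "nearly constant"
            continue
        kept.append(field)
    return kept, dropped


# Made-up sample data for the demonstration.
rows = [
    {"age": 34, "region": "N", "fax": None, "country": "US"},
    {"age": 51, "region": "S", "fax": None, "country": "US"},
    {"age": 28, "region": "N", "fax": "555-0100", "country": "US"},
    {"age": 45, "region": "E", "fax": None, "country": "US"},
]
kept, dropped = screen_fields(rows)
print(kept)     # ['age', 'region']
print(dropped)  # {'fax': 'too many missing values', 'country': 'nearly constant'}
```

Returning the reason alongside each dropped field mirrors the spirit of Interactive mode: you can look at what would be removed, and why, before deciding whether to go along with it.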

Using ADP enables you to make your data ready for model building quickly and easily, without needing prior knowledge of the statistical concepts involved. Models will tend to build and score more quickly; in addition, using ADP improves the robustness of automated modeling processes.

To run ADP interactively, choose Transform > Prepare Data for Modeling > Interactive…

The “Interactive Data Preparation” dialog is displayed:

[Screenshot: the “Interactive Data Preparation” dialog]

The first tab asks for an objective that controls the default settings. The options are:

  • Balance speed & accuracy
  • Optimize for speed
  • Optimize for accuracy
  • Customize

Each objective will yield different results, so it is worth exploring and understanding every option before choosing the one that best suits your data. The online help tells us:

• Balance speed & accuracy creates fields usable in modeling from dates, and may transform continuous fields like reside to make them more normally distributed.

• Optimize for accuracy creates some extra fields from dates (it also checks for outliers, and if the target is continuous, may transform it to make it more normally distributed).

• Optimize for speed does not prepare dates and does not rescale continuous fields, but does merge categories of categorical predictors and bin continuous predictors when the target is categorical (and perform feature selection and construction when the target is continuous).
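Two of the operations mentioned in these descriptions, handling outliers and merging categories, can be sketched in a few lines of Python. This is only a simplified stand-in, not ADP’s actual algorithm: the function names, the 3-standard-deviation cutoff, the 10% minimum category share, and the sample values are all assumptions, and the category merge here ignores the target entirely.

```python
import statistics
from collections import Counter


def clip_outliers(values, n_sd=3.0):
    """Pull values beyond mean +/- n_sd standard deviations back to the cutoff.

    The cutoff of 3 standard deviations is an illustrative assumption.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    low, high = mean - n_sd * sd, mean + n_sd * sd
    return [min(max(v, low), high) for v in values]


def merge_sparse_categories(values, min_share=0.10, other_label="Other"):
    """Replace categories seen in fewer than min_share of cases with one label.

    A frequency-only merge; the 10% share is an illustrative assumption.
    """
    counts = Counter(values)
    n = len(values)
    return [v if counts[v] / n >= min_share else other_label for v in values]


# Made-up continuous field with one extreme value at the end.
incomes = [20, 22, 25, 27, 30, 31, 33, 35, 38, 40, 42, 45, 1000]
clipped = clip_outliers(incomes)
print(clipped[-1] < incomes[-1])  # the extreme value was pulled in

# Made-up categorical field where "E" appears in only 1 of 12 cases.
regions = ["N"] * 6 + ["S"] * 5 + ["E"]
merged = merge_sparse_categories(regions)
print(Counter(merged))
```

Running each step on its own and comparing the before and after values is a cheap way to build intuition for what the different objectives might do to your fields.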

Of course, ADP runs its analysis using its “best guess” fields and settings based upon what it “sees” in your file, but as you become more experienced, you may want to “override” the default choices and set your own.

From the “Analysis” tab, you can review both tabular and graphical output that summarizes the processing of your data and displays recommendations as to how the data may be modified or improved for scoring. You can then review and either accept or reject those recommendations.

Finally (of course), all of the results of the analysis and automated data preparation can be saved to a PMML file!

 

Butch Cassidy: [to Sundance] Boy, I got vision, and the rest of the world wears bifocals.

Jim Miller

Mr. Miller is an IBM certified and accomplished Senior Project Leader and Application/System Architect-Developer with over 30 years of extensive applications and system design and development experience. His current role is National FPM Practice Leader. His experience includes BI, Web architecture & design, systems analysis, GUI design and testing, Database modeling and systems analysis, design, and development of Client/Server, Web and Mainframe applications and systems utilizing: Applix TM1 (including TM1 rules, TI, TM1Web and Planning Manager), dynaSight - ArcPlan, ASP, DHTML, XML, IIS, MS Visual Basic and VBA, Visual Studio, PERL, Websuite, MS SQL Server, ORACLE, SYBASE SQL Server, etc. His responsibilities have included all aspects of Windows and SQL solution development and design including: analysis; GUI (and Web site) design; data modeling; table, screen/form and script development; SQL (and remote stored procedures and triggers) development and testing; test preparation and management and training of programming staff. Other experience includes development of ETL infrastructure such as data transfer automation between mainframe (DB2, Lawson, Great Plains, etc.) systems and client/server SQL Server and Web based applications and integration of enterprise applications and data sources. In addition, Mr. Miller has acted as Internet Applications Development Manager responsible for the design, development, QA and delivery of multiple Web Sites including online trading applications, warehouse process control and scheduling systems and administrative and control applications. Mr. Miller also was responsible for the design, development and administration of a Web based financial reporting system for a 450 million dollar organization, reporting directly to the CFO and his executive team. Mr. Miller has also been responsible for managing and directing multiple resources in various management roles including project and team leader, lead developer and applications development director. Specialties include: Cognos/TM1 Design and Development, Cognos Planning, IBM SPSS and Modeler, OLAP, Visual Basic, SQL Server, Forecasting and Planning; International Application Development, Business Intelligence, Project Development. IBM Certified Developer - Cognos TM1 (perfect score 100% on exam). IBM Certified Business Analyst - Cognos TM1.
