Skip to main content

Data & Intelligence

Missing Data – “Nothing from nothing” – leaves something?

In TM1, missing data usually means that there is a defect in the logic of your ETL script or you need to check your SQL. In SPSS Modeler, missing values arise for a variety of reasons and they must be considered carefully.

You might expect that missing values imply errors or should those records be thrown away, but that is not always the case. In fact, (in the SPSS world) analysts have found that missing values can be useful information.

What to do

Assuming you TM1 developers have your ETL and SQL under control, we can focus on SPSS. When your data contains missing data the first step is to assess the type and amount of missing values (for each field) in your file, the second step is to determine how to handle it.

Modeler recognizes types of “missing values”. They are:

  • System undefined values (represented as $null$).
  • Predefined (or user-defined) values (that represent missing information).

Blanks

You’ll know (as a TM1 developer) that you need to understand that there is a difference in how you handle the programming of numeric missing data as opposed to string missing data (but that’s another story for another time as this discussion is focused on SPSS MODELER).

In MODELER, missing information is “dealt with” by declaring them as “blanks”. Note: You need to declare your missing information because by default, neither of the two types of missing information (mentioned above) is defined as a blank value by MODELER. For example, if a field, sex is checked against values Male and Female, with action Abort attached to it, then stream execution is aborted if an empty space is encountered. If it is desired that stream execution is not interrupted in this situation, you need to declare these values as blank values.

Blanks for Strings

String fields with missing values require additional considerations.

  1. A string value can be empty, which means that it contains nothing (this is called an empty string).
  2. A string value can be a series of spaces (this is called white space).

Global Knowledge – Introduction to Data Mining“It is important to distinguish between the normal use of the word “blank” and Modelers use of it; to avoid misunderstanding, the use of the wording “series of spaces” instead of “series of blanks” here – blanks are predefined codes representing missing data”.

Blanks in Your Analysis

It is important to understand that defining blanks does not necessarily have a direct effect on how MODELER treats that value for a field in all nodes.

For example, if the value 99 is defined as a blank value for a field children (number of children); a histogram will include the blank value. One way to work around this would be to replace blank values with the undefined value ($null$), because that value is excluded from graphical displays.

It should be emphasized that how you handle your missing data when developing a model has some complications. If you use only valid records to develop a model to predict and outcome, then you cannot use the model successfully on new data unless all the fields in that file have non missing data.

Defining Blanks

To define blanks in SPSS Modeler, you can use the Missing column on the Type tab of a Type node. The Type node is a general approach to defining blanks and can be inserted at any point in a stream:

  1. Chick on the Types tab.
  2. Click in the column Missing for the field that needs blank definitions and select Specify… from the context menu.

bb1

 

 

 

 

 

 

 

 

 

 

 

 

 

(Once blanks have been defined for a field, an asterisk in the cell in the Missing column indicates that missing values have been defined for that field!)

 Numeric v. String

Although the dialogs are the same for numeric and string fields, it is important to note that when we talk about numeric fields we use “blanks whereas with strings, it’s “missing”.

Cheers!

 

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Jim Miller

Mr. Miller is an IBM certified and accomplished Senior Project Leader and Application/System Architect-Developer with over 30 years of extensive applications and system design and development experience. His current role is National FPM Practice Leader. His experience includes BI, Web architecture & design, systems analysis, GUI design and testing, Database modeling and systems analysis, design, and development of Client/Server, Web and Mainframe applications and systems utilizing: Applix TM1 (including TM1 rules, TI, TM1Web and Planning Manager), dynaSight - ArcPlan, ASP, DHTML, XML, IIS, MS Visual Basic and VBA, Visual Studio, PERL, Websuite, MS SQL Server, ORACLE, SYBASE SQL Server, etc. His Responsibilities have included all aspects of Windows and SQL solution development and design including: analysis; GUI (and Web site) design; data modeling; table, screen/form and script development; SQL (and remote stored procedures and triggers) development and testing; test preparation and management and training of programming staff. Other experience includes development of ETL infrastructure such as data transfer automation between mainframe (DB2, Lawson, Great Plains, etc.) systems and client/server SQL server and Web based applications and integration of enterprise applications and data sources. In addition, Mr. Miller has acted as Internet Applications Development Manager responsible for the design, development, QA and delivery of multiple Web Sites including online trading applications, warehouse process control and scheduling systems and administrative and control applications. Mr. Miller also was responsible for the design, development and administration of a Web based financial reporting system for a 450 million dollar organization, reporting directly to the CFO and his executive team. Mr. Miller has also been responsible for managing and directing multiple resources in various management roles including project and team leader, lead developer and applications development director. Specialties Include: Cognos/TM1 Design and Development, Cognos Planning, IBM SPSS and Modeler, OLAP, Visual Basic, SQL Server, Forecasting and Planning; International Application Development, Business Intelligence, Project Development. IBM Certified Developer - Cognos TM1 (perfect score 100% on exam) IBM Certified Business Analyst - Cognos TM1

More from this Author

Follow Us