IBM SPSS Statistics – Data Management Toolset (DMS)

In a recent blog post I listed some of the more helpful “data management tools” offered in IBM SPSS Statistics version 20 (Case Summaries, Replace Missing Values, Transform and Compute, Recode, Select Cases, Sort Cases and Merge Files). Today I would like to review them.

These tools can be used to support a best-practice approach to data analysis:

  1. Identification (of the data to be used for analysis)
  2. Labeling (of the data variables)
  3. Verification (of the data – based upon label variable assumptions)

Case Summaries

Even the best statisticians (today’s data scientists) will find it valuable to visually examine their data and the assumptions defined for its variables. One of the most effective ways to do this is the SPSS Case Summaries option. The Case Summaries command [from the Statistics Viewer select Analyze and then Reports and then Case Summaries] lets you list an entire data file or a subset of that file, either grouped or in the order of the original data. The “Summarize Cases” dialog lists the variables defined in the data, and you select the ones you wish to summarize. As you select the variables you can also choose the order in which they appear in the generated output. Several other options let you control both the content and the structure of the output: you can specify groupings, provide headings and captions for your summaries, and have SPSS report additional information on those case summaries, such as minimum, maximum, first, last and mean.
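To make the idea concrete, here is a rough sketch in plain Python (not SPSS syntax) of what a grouped case summary boils down to: list the chosen variables for each group and attach a few statistics. The data, the variable names and the `case_summaries` helper are all made up for illustration, and the `limit` parameter is only my assumption about how one might cap the listing.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: each dict is one case (row).
cases = [
    {"id": 1, "region": "East", "sales": 120.0},
    {"id": 2, "region": "West", "sales": 95.5},
    {"id": 3, "region": "East", "sales": 80.0},
    {"id": 4, "region": "West", "sales": 104.5},
]

def case_summaries(rows, variables, group_by, limit=100):
    """List the chosen variables per group, plus a few summary statistics."""
    groups = defaultdict(list)
    for row in rows[:limit]:  # only look at the first `limit` cases
        groups[row[group_by]].append(row)

    report = {}
    for key, members in sorted(groups.items()):
        # Variables appear in the order they were requested.
        listing = [tuple(r[v] for v in variables) for r in members]
        # Summarize the last requested variable, assumed numeric here.
        values = [r[variables[-1]] for r in members]
        report[key] = {
            "cases": listing,
            "n": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "first": values[0],
            "last": values[-1],
        }
    return report

summary = case_summaries(cases, ["id", "sales"], group_by="region")
for region, info in summary.items():
    print(region, info)
```

Scanning a listing like this is often enough to spot a mislabeled variable or an implausible value before any real analysis starts.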

Replace Missing Values

In any data there is a very good chance that you will encounter missing values. Missing values can be a pain to deal with, especially in a larger data file. Worse, they can distort the analysis of the data. To help with this issue, SPSS offers Replace Missing Values [from the Statistics Viewer select Transform and then Replace Missing Values].

Note that SPSS distinguishes between system-missing values and user-missing values. System-missing values are simply omissions in your dataset; a user-missing value is a value that the researcher has explicitly declared to be missing.

The Replace Missing Values dialog box allows you to create new variables from existing ones, replacing missing values with estimates computed with one of several methods (series mean, mean or median of nearby points, linear interpolation, or linear trend at point).
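As an illustration only, the sketch below shows in plain Python (not SPSS syntax) how two common estimation approaches, a series mean and linear interpolation, could fill gaps while leaving the original variable untouched. Using `None` for system-missing values and `-99` as a user-missing code are my assumptions for the example, as are the function names.

```python
from statistics import mean

# None stands in for a system-missing value; -99 is a hypothetical
# user-missing code that the researcher has declared.
USER_MISSING = {-99}
income = [52.0, None, 48.0, -99, 60.0]

def is_missing(value):
    return value is None or value in USER_MISSING

def replace_with_series_mean(series):
    """Return a new list with every missing value set to the mean of the valid values."""
    valid = [v for v in series if not is_missing(v)]
    fill = mean(valid)
    return [fill if is_missing(v) else v for v in series]

def replace_with_linear_interpolation(series):
    """Return a new list with interior gaps filled on a straight line
    between the nearest valid neighbors. Leading or trailing gaps stay None."""
    result = [None if is_missing(v) else v for v in series]
    valid_idx = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(valid_idx, valid_idx[1:]):
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result

income_mean = replace_with_series_mean(income)
income_lint = replace_with_linear_interpolation(income)
print(income_mean)
print(income_lint)
```

Both functions return a new list, which mirrors the idea of writing the estimates to a new variable so the original values remain available for comparison.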

Transform and Compute Variable

In most data files you will want SPSS to calculate new values, such as totals, from existing values. Compute Variable [from the Statistics Viewer select Transform and then Compute Variable] simplifies this task. With it you can:

  • Compute values for a variable based on numeric transformations of other variables.
  • Compute values for numeric or string (alphanumeric) variables.
  • Create new variables or replace the values of existing variables. For new variables, you can also specify the variable type and label.
  • Compute values selectively for subsets of data based on logical conditions.
  • Use a large variety of built-in functions, including arithmetic functions, statistical functions, distribution functions, and string functions.
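The capabilities listed above can be sketched in a few lines of plain Python (again, not SPSS syntax). The `compute` helper, the quiz variables `q1` to `q3` and the bonus rule are hypothetical; the point is simply a target variable, an expression, and an optional logical condition.

```python
# Hypothetical cases: three quiz scores per person, one of them missing.
cases = [
    {"name": "ann", "q1": 10, "q2": 15, "q3": 20},
    {"name": "bob", "q1": 5, "q2": None, "q3": 12},
    {"name": "cy", "q1": 30, "q2": 25, "q3": 40},
]

def compute(rows, target, expression, condition=None):
    """Return new rows where `target` is set to expression(row) for cases
    that satisfy `condition`; other cases get None unless the target already exists."""
    out = []
    for row in rows:
        new_row = dict(row)
        if condition is None or condition(row):
            new_row[target] = expression(row)
        else:
            new_row.setdefault(target, None)
        out.append(new_row)
    return out

def total(row):
    scores = [row["q1"], row["q2"], row["q3"]]
    # In this sketch a plain sum is missing if any input is missing.
    return None if None in scores else sum(scores)

# Numeric transformation of other variables.
cases = compute(cases, "total", total)
# Selective computation based on a logical condition.
cases = compute(
    cases,
    "bonus",
    lambda r: r["total"] / 10,
    condition=lambda r: r["total"] is not None and r["total"] > 50,
)
# String variable built with a string function.
cases = compute(cases, "initial", lambda r: r["name"][0].upper())

for row in cases:
    print(row)
```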

Recode

SPSS Recode can also generate new variables, not by calculating totals from existing values as Compute Variable does, but by dividing existing variables into new categories. Using Recode [from the Statistics Viewer select Transform and then Recode into Same or Recode into Different] you can reassign the values of existing variables or collapse ranges of existing values into new values. Recode into Same overwrites the existing variable, while Recode into Different writes the new values to a new variable and leaves the original intact. There is also an “Automatic Recode” feature that can be used to convert string and numeric values into consecutive integers, as required by some procedures.
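Here is a small sketch, in plain Python rather than SPSS syntax, of the two ideas: collapsing ranges into categories for a new variable, and mapping distinct values to consecutive integers. The age bands and function names are invented for the example, and sorting the distinct values before numbering them is my assumption about a sensible ordering.

```python
ages = [17, 25, 34, 51, 68, None]

# Hypothetical age bands: (low, high, new value), both bounds inclusive.
AGE_BANDS = [(0, 17, 1), (18, 34, 2), (35, 64, 3), (65, 120, 4)]

def recode_into_different(values, bands, else_value=None):
    """Return a new list of category codes, leaving the source list untouched."""
    recoded = []
    for v in values:
        code = else_value
        if v is not None:
            for low, high, new in bands:
                if low <= v <= high:
                    code = new
                    break
        recoded.append(code)
    return recoded

def automatic_recode(values):
    """Map each distinct value to a consecutive integer, numbered from 1 in sorted order."""
    codes = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [codes[v] for v in values], codes

age_group = recode_into_different(ages, AGE_BANDS)
city_code, city_map = automatic_recode(["Oslo", "Bergen", "Oslo", "Tromso"])
print(age_group)
print(city_code, city_map)
```

Keeping the mapping (`city_map` here) alongside the codes matters in practice, since the integers are meaningless without a record of what each one stands for.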

Select Cases

Select Cases [from the Statistics Viewer select Data and then Select Cases] lets you conduct your analysis on selected subsets of the data file. It provides several methods for selecting a subgroup of cases based on variables and expressions (you can also select a random sample of cases). The criteria used to define a subgroup can include:

  • Variable values and ranges
  • Date and time ranges
  • Case (row) numbers
  • Arithmetic expressions
  • Logical expressions
  • Functions
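The criteria above translate naturally into filter conditions. The following is a plain Python sketch (not SPSS syntax) with made-up data and a hypothetical `select_cases` helper, showing one selection per kind of criterion plus a random sample.

```python
import random
from datetime import date

# Hypothetical cases.
cases = [
    {"id": 1, "age": 34, "income": 52000, "visit": date(2012, 1, 15)},
    {"id": 2, "age": 61, "income": 75000, "visit": date(2012, 2, 20)},
    {"id": 3, "age": 45, "income": 38000, "visit": date(2012, 5, 2)},
    {"id": 4, "age": 29, "income": 61000, "visit": date(2012, 3, 30)},
]

def select_cases(rows, condition):
    """Keep only the cases for which condition(row) is true."""
    return [row for row in rows if condition(row)]

# Variable values and ranges.
by_value = select_cases(cases, lambda r: 30 <= r["age"] <= 50)

# Date range: first quarter of 2012.
by_date = select_cases(
    cases, lambda r: date(2012, 1, 1) <= r["visit"] <= date(2012, 3, 31)
)

# Case (row) numbers: the second and third cases by position.
by_row = cases[1:3]

# Arithmetic and logical expressions combined.
by_expr = select_cases(
    cases, lambda r: r["income"] / 12 > 4000 and r["age"] < 60
)

# Random sample of two cases; the seed is fixed only to make reruns repeatable.
sample = random.Random(20).sample(cases, k=2)

print([r["id"] for r in by_value])
print([r["id"] for r in by_expr])
```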

Sort Cases

A typical approach to data verification is to reorganize (or re-sort) the data. SPSS handles this for you with Sort Cases [from the Statistics Viewer select Data and then Sort Cases]. Using this feature you can sort cases (rows) of the data based on the values of one or more sorting variables you select.
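Sorting on more than one variable is easy to get subtly wrong, so here is a short plain Python sketch (not SPSS syntax) of the idea. The `sort_cases` helper and the data are hypothetical; each sorting variable carries its own ascending or descending direction.

```python
# Hypothetical cases.
cases = [
    {"id": 1, "region": "West", "sales": 95.5},
    {"id": 2, "region": "East", "sales": 120.0},
    {"id": 3, "region": "West", "sales": 104.5},
    {"id": 4, "region": "East", "sales": 80.0},
]

def sort_cases(rows, keys):
    """Sort by several variables.

    `keys` is a list of (variable, ascending) pairs, most significant first.
    The input list is not modified.
    """
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for variable, ascending in reversed(keys):
        result.sort(key=lambda r: r[variable], reverse=not ascending)
    return result

# Region A to Z, and within each region the largest sales first.
ordered = sort_cases(cases, [("region", True), ("sales", False)])
print([r["id"] for r in ordered])
```

For verification purposes, an ordering like this puts the extreme values of each group at the top, where outliers and data-entry slips are easiest to see.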

Merge Files

Sooner or later, you’ll need to combine files into one dataset for analysis. This can be a tedious chore. Fortunately, SPSS simplifies the process with the Merge Files [from the Statistics Viewer select Data and then Merge Files] feature if:

  • Your files contain the same cases, just different variables (Add Variables), or
  • Your files contain the same variables, just different cases (Add Cases).
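The two kinds of merge are quite different operations: one stacks files that share variables, the other joins files that describe the same cases. The plain Python sketch below (not SPSS syntax) shows both. The file contents, the `id` key variable and the helper names are assumptions made for the example.

```python
# Hypothetical files, each a list of cases.
january = [{"id": 1, "sales": 100}, {"id": 2, "sales": 150}]
february = [{"id": 3, "sales": 90}]
demographics = [{"id": 2, "region": "West"}, {"id": 1, "region": "East"}]

def add_cases(file_a, file_b):
    """Stack two files that share the same variables but hold different cases."""
    return [dict(r) for r in file_a] + [dict(r) for r in file_b]

def add_variables(file_a, file_b, key):
    """Join two files that hold the same cases but different variables,
    matching rows on a key variable. Cases without a match keep only
    the variables they already have."""
    lookup = {r[key]: r for r in file_b}
    merged = []
    for row in file_a:
        combined = dict(row)
        combined.update(lookup.get(row[key], {}))
        merged.append(combined)
    return merged

stacked = add_cases(january, february)
wide = add_variables(january, demographics, key="id")
print(stacked)
print(wide)
```

Matching on a key rather than on row position is a deliberate choice here: it still works when the two files are not sorted the same way.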

I Promise

Okay, next time, I promise – into the analysis!

Jim Miller

Mr. Miller is an IBM certified and accomplished Senior Project Leader and Application/System Architect-Developer with over 30 years of extensive applications and system design and development experience. His current role is National FPM Practice Leader. His experience includes BI, Web architecture & design, systems analysis, GUI design and testing, Database modeling and systems analysis, design, and development of Client/Server, Web and Mainframe applications and systems utilizing: Applix TM1 (including TM1 rules, TI, TM1Web and Planning Manager), dynaSight - ArcPlan, ASP, DHTML, XML, IIS, MS Visual Basic and VBA, Visual Studio, PERL, Websuite, MS SQL Server, ORACLE, SYBASE SQL Server, etc. His responsibilities have included all aspects of Windows and SQL solution development and design including: analysis; GUI (and Web site) design; data modeling; table, screen/form and script development; SQL (and remote stored procedures and triggers) development and testing; test preparation and management and training of programming staff. Other experience includes development of ETL infrastructure such as data transfer automation between mainframe (DB2, Lawson, Great Plains, etc.) systems and client/server SQL server and Web based applications and integration of enterprise applications and data sources. In addition, Mr. Miller has acted as Internet Applications Development Manager responsible for the design, development, QA and delivery of multiple Web Sites including online trading applications, warehouse process control and scheduling systems and administrative and control applications. Mr. Miller also was responsible for the design, development and administration of a Web based financial reporting system for a 450 million dollar organization, reporting directly to the CFO and his executive team. Mr. Miller has also been responsible for managing and directing multiple resources in various management roles including project and team leader, lead developer and applications development director. Specialties include: Cognos/TM1 Design and Development, Cognos Planning, IBM SPSS and Modeler, OLAP, Visual Basic, SQL Server, Forecasting and Planning; International Application Development, Business Intelligence, Project Development. IBM Certified Developer - Cognos TM1 (perfect score 100% on exam). IBM Certified Business Analyst - Cognos TM1.
