Skip to main content

Data & Intelligence

IBM SPSS Syntax -using Functions to perform Data Transformations

I recently finished another productive mentoring session exploring Data Transformations using SPSS Syntax. We had so much fun using some basic SPSS, I just had to share, so:

Getting back to my Predictive startup example, we now have a new version of the quarterly hours file. This one includes a few new columns (“variables” to SPSS) which are:

  • Startdate (which is the date the consultant first starting billing hours)
  • Hoursbilled2012, hoursbilled2011 and hoursbilled2010 (these will be the total hours billed for the consultant for the period for prior years)

newinputfile

MEAN and MISSING VALUES

One of the new objectives was to calculate the MEAN billable hours for each consultant for each month, based upon hours billed for the current year (hoursbilled) and the hours billed for the prior 3 years (the 3 new columns).

That was easy, since SPSS provides the statistical function MEAN, but after some analysis of our results, I noticed that some consultants did not have billable hours for each of the years and in those years, the file contained the value -999. Since this is a numeric value, it invalidates the MEAN calculation for that consultant, for that month.

To accommodate for this, I used the MISSING VALUES function (which basically tells SPSS to exclude the -999 values) and then our MEAN calculation works just fine. Cool!

STRING FUNCTIONS and Variables

There sure are plenty of SPSS string functions to choose from.

Data Intelligence - The Future of Big Data
The Future of Big Data

With some guidance, you can craft a data platform that is right for your organization’s needs and gets the most return from your data capital.

Get the Guide

I also wanted to create a corporate email address variable for each one of my consultants in the file – based upon their name and our corporate email address “@perdictive.performers.com”.

So… I utilized a few more functions and features of SPSS Syntax:

  1. I declared a new string variable named “emailaddress”:

STRING emailaddress (A55)

  1. I combined the consultants name with a static value using COMPUTE and CONCAT:

COMPUTE emailaddress =rtrim(concat(consultant, ‘@predictive.performers.com’))

DATE FUNCTIONS

The final “exercise” was to calculate each consultants “tenure” (defined as the total number of months from the consultants first billable day up to today).

For this, I created a “temporary” variable named “tday” using the SPSS function DATE.MDY (and yes, I just hardcoded it with today’s date) and then used the DATEDIFF function to do the required math.

So here is the script:

myscript

And here is the final file, including everything I expected!

outputfile

Next time I’ll share our explorations with loops and vectors.

Remember, “Time is the coin of your life. It is the only coin you have, and only you can determine how you spend it. Be careful lest you let others spend it for you.”

 

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Jim Miller

Mr. Miller is an IBM certified and accomplished Senior Project Leader and Application/System Architect-Developer with over 30 years of extensive applications and system design and development experience. His current role is National FPM Practice Leader. His experience includes BI, Web architecture & design, systems analysis, GUI design and testing, Database modeling and systems analysis, design, and development of Client/Server, Web and Mainframe applications and systems utilizing: Applix TM1 (including TM1 rules, TI, TM1Web and Planning Manager), dynaSight - ArcPlan, ASP, DHTML, XML, IIS, MS Visual Basic and VBA, Visual Studio, PERL, Websuite, MS SQL Server, ORACLE, SYBASE SQL Server, etc. His Responsibilities have included all aspects of Windows and SQL solution development and design including: analysis; GUI (and Web site) design; data modeling; table, screen/form and script development; SQL (and remote stored procedures and triggers) development and testing; test preparation and management and training of programming staff. Other experience includes development of ETL infrastructure such as data transfer automation between mainframe (DB2, Lawson, Great Plains, etc.) systems and client/server SQL server and Web based applications and integration of enterprise applications and data sources. In addition, Mr. Miller has acted as Internet Applications Development Manager responsible for the design, development, QA and delivery of multiple Web Sites including online trading applications, warehouse process control and scheduling systems and administrative and control applications. Mr. Miller also was responsible for the design, development and administration of a Web based financial reporting system for a 450 million dollar organization, reporting directly to the CFO and his executive team. Mr. Miller has also been responsible for managing and directing multiple resources in various management roles including project and team leader, lead developer and applications development director. Specialties Include: Cognos/TM1 Design and Development, Cognos Planning, IBM SPSS and Modeler, OLAP, Visual Basic, SQL Server, Forecasting and Planning; International Application Development, Business Intelligence, Project Development. IBM Certified Developer - Cognos TM1 (perfect score 100% on exam) IBM Certified Business Analyst - Cognos TM1

More from this Author

Follow Us