I recently finished another productive mentoring session exploring Data Transformations using SPSS Syntax. We had so much fun using some basic SPSS, I just had to share, so:
Getting back to my Predictive startup example, we now have a new version of the quarterly hours file. This one includes a few new columns (“variables” to SPSS) which are:
- Startdate (which is the date the consultant first started billing hours)
- Hoursbilled2012, hoursbilled2011 and hoursbilled2010 (these will be the total hours billed for the consultant for the period for prior years)
MEAN and MISSING VALUES
One of the new objectives was to calculate the MEAN billable hours for each consultant for each month, based upon hours billed for the current year (hoursbilled) and the hours billed for the prior 3 years (the 3 new columns).
That was easy, since SPSS provides the statistical function MEAN. After some analysis of our results, though, I noticed that some consultants did not have billable hours for every year, and for those years the file contained the value -999. Since this is a perfectly valid numeric value, SPSS averages it in with the real hours and skews the MEAN calculation for that consultant, for that month.
To account for this, I used the MISSING VALUES command (which basically tells SPSS to treat -999 as missing and leave it out of calculations), and then our MEAN calculation works just fine. Cool!
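For anyone following along, here is a minimal sketch of that step. The variable names come from the columns described above, "meanhours" is just a name I picked for illustration, and I am assuming -999 can turn up in any of the four hours columns:

```spss
* Sketch only, using the column names described in this post.
* Flag -999 as user-missing so statistical functions skip it.
MISSING VALUES hoursbilled hoursbilled2012 hoursbilled2011 hoursbilled2010 (-999).
* MEAN averages only the valid arguments for each row.
COMPUTE meanhours = MEAN(hoursbilled, hoursbilled2012, hoursbilled2011, hoursbilled2010).
EXECUTE.
```

If you want to insist on a minimum number of valid years, the suffix form (for example MEAN.2) returns a missing result unless at least that many arguments are valid.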
STRING FUNCTIONS and Variables
There sure are plenty of SPSS string functions to choose from.
I also wanted to create a corporate email address variable for each one of my consultants in the file – based upon their name and our corporate email domain “@predictive.performers.com”.
So… I utilized a few more functions and features of SPSS Syntax:
- I declared a new string variable named “emailaddress”:
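STRING emailaddress (A55).
- I combined the consultant's name with a static value using COMPUTE and CONCAT (trimming the trailing blanks from the name before joining, so no padding ends up in front of the “@”):
COMPUTE emailaddress = CONCAT(RTRIM(consultant), '@predictive.performers.com').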
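A slightly fuller sketch of the same idea, in case the names in your file contain spaces or mixed case. The cleanup with REPLACE and LOWER was not part of our session, and "emailclean" is a hypothetical variable name. I am also assuming "consultant" is a string variable holding the name:

```spss
* Sketch only, emailclean is a hypothetical variable name.
STRING emailclean (A55).
* Trim trailing blanks, swap inner spaces for dots, then lowercase.
COMPUTE emailclean = CONCAT(LOWER(REPLACE(RTRIM(consultant), ' ', '.')),
    '@predictive.performers.com').
EXECUTE.
```

Whether dots are the right separator depends on your own email naming convention, so treat that part as a placeholder.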
DATE FUNCTIONS
The final “exercise” was to calculate each consultant's “tenure” (defined as the total number of months from the consultant's first billable day up to today).
For this, I created a “temporary” variable named “tday” using the SPSS function DATE.MDY (and yes, I just hardcoded it with today’s date) and then used the DATEDIFF function to do the required math.
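The steps just described can be sketched roughly as follows. The date shown is a placeholder rather than the one I actually used, and "tenure" is simply the name I am assuming for the result variable:

```spss
* Sketch only, the hardcoded date below is a placeholder.
COMPUTE tday = DATE.MDY(1, 15, 2013).
FORMATS tday (ADATE10).
* DATEDIFF gives the whole number of months from startdate to tday.
COMPUTE tenure = DATEDIFF(tday, startdate, "months").
EXECUTE.
```

DATEDIFF truncates to whole units, so a consultant who is 11 months and 29 days in still shows 11. If you would rather not hardcode the date, the system variable $TIME holds the current date and time and can be used in place of DATE.MDY.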
So here is the script:
And here is the final file, including everything I expected!
Next time I’ll share our explorations with loops and vectors.
Remember, “Time is the coin of your life. It is the only coin you have, and only you can determine how you spend it. Be careful lest you let others spend it for you.”