

SPSS Virtual Files

The power of SPSS allows the data scientist or predictive modeler to consume large data volumes. This data may arrive in small, manageable subsets or in possibly huge “data ponds”. Depending upon the procedures you perform in your analysis, SPSS may reread the entire data set for each procedure. Procedures that change data require a certain amount of temporary disk space to keep track of the changes, and some actions require enough disk space for at least one entire copy of the data.

To Virtualize or Not

The “virtual active file” feature of IBM SPSS Statistics enables you to work with very large data files without requiring equally large (or larger) amounts of “working” disk space. It also lets you define a “data cache”, which in some instances can improve performance. The key is to understand how SPSS will handle the data for the procedure you wish to perform.

Actions that don’t require any temporary disk space include:

• Reading SPSS Statistics data files

• Merging two or more SPSS Statistics data files

• Reading database tables with the Database Wizard

• Merging SPSS Statistics data files with database tables

• Running procedures that read data (for example, Frequencies, Crosstabs, Explore)
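For example, reading an existing SPSS Statistics data file and running a read-only procedure simply passes over the data in place. A minimal sketch in SPSS syntax (the file path and variable names are hypothetical):

GET FILE='C:\data\customers.sav'.
* Read-only procedures such as FREQUENCIES pass over the data
* without writing anything to temporary disk space.
FREQUENCIES VARIABLES=region gender.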


Actions that create one or more columns of data in temporary disk space include:

• Computing new variables

• Recoding existing variables

• Running procedures that create or modify variables (for example, saving predicted values in Linear Regression)
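For instance, transformation commands such as COMPUTE and RECODE each add a new column of data to temporary disk space, even though the rest of the file is not copied. A minimal sketch, with hypothetical variable names:

* COMPUTE adds one new column in temporary disk space.
COMPUTE bonus = salary * 0.10.
* RECODE INTO a new variable adds another column.
RECODE age (LO THRU 39=1) (40 THRU HI=2) INTO agegroup.
EXECUTE.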


Actions that create an entire copy of the data file in temporary disk space include:

• Reading Excel files

• Running procedures that sort data (for example, Sort Cases, Split File)

• Reading data with GET TRANSLATE or DATA LIST commands

• Using the Cache Data facility or the CACHE command

• Launching other applications from SPSS Statistics that read the data file (for example, AnswerTree, DecisionTime)
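As an illustration, reading raw text data with DATA LIST and then sorting it both force a complete pass that writes the entire file to temporary disk space. A minimal sketch with a hypothetical file and variables:

* Reading raw text data creates an entire copy in temporary disk space.
DATA LIST FILE='C:\data\scores.txt' FREE /id score.
* Sorting the active file creates another full temporary copy.
SORT CASES BY id (A).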


Actions that create an entire copy of the data file by default:

• Reading databases with the Database Wizard

• Reading text files with the Text Wizard
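You can see this default if you paste the syntax the Database Wizard generates: it typically ends with a CACHE command, which you can delete if you would rather reread the source each time. A hedged sketch, with a hypothetical DSN and query:

GET DATA
  /TYPE=ODBC
  /CONNECT='DSN=SalesDB'
  /SQL='SELECT customer_id, region, sales FROM orders'.
* The pasted CACHE command is what makes the wizard copy the data by default.
CACHE.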

Creating your Cache

Although virtualizing a file can vastly reduce the amount of temporary disk space required for processing, the absence of a temporary copy of the “active” file does mean that the original data source must be reread for each procedure you perform in SPSS. For large data files read from an external source, creating a temporary copy of the data may therefore improve performance. If you have the luxury of sufficient disk space (local or remote), you can eliminate the repeated reads and improve processing time by creating a data cache of the file.

To create your cache:

From the menus choose: File > Cache Data…

The Cache Data dialog opens.

Click OK or Cache Now.

OK creates the data cache the next time the program reads the data, while Cache Now creates the data cache immediately. Cache Now is the option to use if you want to “lock” the data so that it cannot be updated or changed until your processing is complete.
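If you work in syntax rather than the menus, the CACHE command mentioned above does the same job. A minimal sketch:

* CACHE alone creates the data cache the next time the data are read
* (the syntax equivalent of clicking OK).
CACHE.
* Adding EXECUTE forces an immediate data pass, so the cache is built
* right away (the equivalent of clicking Cache Now).
EXECUTE.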

That’s all there is to it!

“From now on, we live in a world where man has walked on the moon. And it’s not a miracle, we just decided to go.” – Jim Lovell, Apollo 13

