Knowledge Bottleneck?
When building a predictive model, a larger number of examples, or “cases”, generally leads to a better model. Typically, these cases are spread across multiple data files (or data sources) that must be “stitched together”.
Accessing each data source, analyzing the cases it contains, and then formatting and moving that information into a single “knowledge base” can be a labor-intensive process. This is referred to as the “knowledge bottleneck problem”.
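To make the “stitching” step concrete, here is a minimal sketch in plain Python (not SPSS syntax) that combines cases from two sources into one record per case. The data, the `customer_id` key, and the helper names `read_cases` and `stitch` are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical data sources; in practice these would be separate files.
DEMOGRAPHICS_CSV = """customer_id,age,region
1,34,North
2,51,South
3,27,East
"""

PURCHASES_CSV = """customer_id,total_spend
1,120.50
3,89.99
4,15.00
"""


def read_cases(text, key):
    """Parse CSV text into a dict that maps each key value to its row."""
    return {row[key]: dict(row) for row in csv.DictReader(io.StringIO(text))}


def stitch(sources, key):
    """Combine rows that share a key into one record per case.

    Cases that appear in only some sources are kept (an outer join),
    so their missing variables are simply absent from the record.
    """
    knowledge_base = {}
    for source in sources:
        for case_id, row in source.items():
            knowledge_base.setdefault(case_id, {key: case_id}).update(row)
    return knowledge_base


demographics = read_cases(DEMOGRAPHICS_CSV, "customer_id")
purchases = read_cases(PURCHASES_CSV, "customer_id")
knowledge_base = stitch([demographics, purchases], "customer_id")

for case_id in sorted(knowledge_base):
    print(knowledge_base[case_id])
```

Even in this toy form, every source needs its own parsing step and an agreed-upon key, which is where much of the manual effort tends to go as the number of sources grows.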
IBM SPSS Statistics lets the data technician work with more than one data source at the same time. You simply open each data source in a new SPSS Data Editor window:
- When you first open a data source, it automatically becomes the “active dataset”.
- You can change the active dataset simply by clicking anywhere in the Data Editor window of the data source that you want to use or by selecting the Data Editor window for that data source from the Window menu.
- At least one Data Editor window must be open during a session. When you close the last open Data Editor window, SPSS Statistics automatically shuts down (prompting you to save changes first).
Accessing multiple data sources all at once allows you to:
- Switch back and forth between open data files.
- Compare the contents of different data files.
- Copy and paste data between data files.
- Create multiple subsets of cases and/or variables for analysis.
- Merge multiple data sources from various data formats (for example, spreadsheet, database, text data) without saving each data source first.
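The last two capabilities in the list above, subsetting and merging across formats, can be sketched outside SPSS as well. The following is a minimal plain-Python illustration, not SPSS syntax: it assumes a hypothetical spreadsheet export (CSV text) and a hypothetical database result (JSON text), merges them in memory without writing either to disk, and then derives a subset of cases and a subset of variables.

```python
import csv
import io
import json

# Hypothetical in-memory sources in two different formats.
SPREADSHEET_EXPORT = "id,score\n1,0.82\n2,0.47\n3,0.91\n"
DATABASE_RESULT = (
    '[{"id": 1, "segment": "A"},'
    ' {"id": 2, "segment": "B"},'
    ' {"id": 3, "segment": "A"}]'
)


def load_csv(text):
    """Read CSV text into a list of cases with typed values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(r["id"]), "score": float(r["score"])} for r in reader]


def load_json(text):
    """Read a JSON array of objects into a list of cases."""
    return json.loads(text)


def merge_on_id(left, right):
    """Add the variables from `right` to each matching case in `left`."""
    lookup = {row["id"]: row for row in right}
    return [{**row, **lookup[row["id"]]} for row in left if row["id"] in lookup]


merged = merge_on_id(load_csv(SPREADSHEET_EXPORT), load_json(DATABASE_RESULT))

# Subset of cases: only segment A.
segment_a = [case for case in merged if case["segment"] == "A"]

# Subset of variables: keep only id and score.
scores_only = [{"id": c["id"], "score": c["score"]} for c in merged]

print(segment_a)
print(scores_only)
```

Here the merge keeps only cases present in both sources (an inner join); whether unmatched cases should be kept or dropped is a design choice that depends on the analysis.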
Conclusion
Understanding your data is key to predictive modeling, and that requires rigorous data analysis. IBM SPSS Statistics is a powerful tool that supports this effort.