I have had limited experience with SPSS Data Mining for Predictive Analytics. I was very impressed with the predictive capabilities of the tool. The business cases for this and any predictive analytics tool are endless. While conducting some research about predictive analytics, I came across a Microsoft solution – Microsoft Analysis Services Data Mining; which was accompanied by a tutorial – Microsoft Analysis Services Basic Data Mining Tutorial. Rather than regurgitate the tutorial found on the website, I want to provide a consolidated list of some helpful links and terms that I discovered as I was going through the tutorial. I highly recommend going through the tutorial, step-by-step in order to gain a better understanding of the tool.
Executive summary of the tutorial…
Welcome to the Microsoft Analysis Services Basic Data Mining Tutorial. Microsoft SQL Server provides an integrated environment for creating and working with data mining models. In this tutorial, you will complete a scenario for a targeted mailing campaign in which you create models for analyzing and predicting customer purchasing behavior and for targeting potential buyers. The tutorial demonstrates how to use three of the most important data mining algorithms, how to analyze your findings using the mining model viewers, create predictions and accuracy charts, using the data mining tools that are included in Microsoft SQL Server Analysis Services. The fictitious company, Adventure Works Cycles, is used for all examples.
When you are comfortable using the data mining tools, we recommend that you also complete the Intermediate Data Mining Tutorial, which demonstrates how to use forecasting, market basket analysis, time series, association models, nested tables, and sequence clustering.
NOTE: there is a known issue in the tutorial in Lesson 6: Creating and Working with Predictions (Basic Data Mining Tutorial) – Creating Predictions (Basic Data Mining Tutorial) with the calcAge column not appearing. I have not been able to get this resolved. I am still searching for a solution – you can follow the resolution here Basic Data Mining Tutorial – Lesson 6 Problem. I will also update the blog when I get this resolved.
The Future of Big Data
With some guidance, you can craft a data platform that is right for your organization’s needs and gets the most return from your data capital.
The following diagram describes the relationships between each step in the process, and the technologies in Microsoft SQL Server that you can use to complete each step.
Prerequisites to the Tutorial
Make sure that the following are installed:
- Microsoft SQL Server 2012
- Used deploying the structure and models
- NOTE: you may need to provide access to the AdventureWorksDW2012 Database
- Microsoft SQL Server Analysis Services in multidimensional mode
- Installation of the AdventureWorksDW2012 Database: Datasource for the tutorial
Key Terms & Concepts
- Great resources for Data Mining Concepts
- Why Use Microsoft Data Mining?
- Microsoft Analysis Services Data Mining includes the following features that help you easily develop and compare multiple predictive models and then take actions on the results :
- Holdout Test Sets – When you create a mining structure, you can now divide the data in the mining structure into training and testing sets. This lets you test models on similar data sets, and compare the accuracy of related models.
- Mining model filters – You can now attach filters to a mining model, and apply the filter during both training and testing. This lets you easily build related models on different subsets of the data.
- Drillthrough to Structure Cases and Structure Columns – You can now easily move from the general patterns in the mining model to actionable detail in the data source.
- SQL Server Data Tools (SSDT): toolset that allows professional database and application developers to carry out all their database design work for SQL Server and SQL Azure within Visual Studio. SSDT provides a rich SQL Server development experience with first class language services and VS integration, as well as declarative, model-based tools that can be utilized for both online and offline development.
- SQL Server Analysis Services (SSAS): delivers online analytical processing (OLAP) and data mining functionality for business intelligence applications. Analysis Services supports OLAP by letting you design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases. For data mining applications, Analysis Services lets you design, create, and visualize data mining models that are constructed from other data sources by using a wide variety of industry-standard data mining algorithms.
- Data Source View: built on a data source and defines a subset of the data, which you can then use in your mining structures. You can also use the data source view to add columns, create calculated columns and aggregates, and add named views. By using data source views, you can select the data that relates to your project, establish relationships between tables, and modify the structure of the data, without modifying the original data source.
- Mining Model Content: Mining model content includes metadata about the model, statistics about the data, and patterns discovered by the mining algorithm. Depending on the algorithm that was used, the model content may include regression formulas, the definitions of rules and itemsets, or weights and other statistics.
- Additional details can be found here – Mining Model Content (Analysis Services – Data Mining)
- Data Mining Algorithms: is a set of heuristics and calculations that creates a data mining model from data. To create a model, the algorithm first analyzes the data you provide, looking for specific types of patterns or trends. The algorithm uses the results of this analysis to define the optimal parameters for creating the mining model. These parameters are then applied across the entire data set to extract actionable patterns and detailed statistics.
- The mining model that an algorithm creates from your data can take various forms, including:
- A set of clusters that describe how the cases in a dataset are related.
- A decision tree that predicts an outcome, and describes how different criteria affect that outcome.
- A mathematical model that forecasts sales.
- A set of rules that describe how products are grouped together in a transaction, and the probabilities that products are purchased together.
- Additional details can be found here – Data Mining Algorithms (Analysis Services – Data Mining)
- Training and Testing Data Sets: Separating data into training and testing sets is an important part of evaluating data mining models. Typically, when you separate a data set into a training set and testing set, most of the data is used for training, and a smaller portion of the data is used for testing. Analysis Services randomly samples the data to help ensure that the testing and training sets are similar. By using similar data for training and testing, you can minimize the effects of data discrepancies and better understand the characteristics of the model.
- After a model has been processed by using the training set, you test the model by making predictions against the test set. Because the data in the testing set already contains known values for the attribute that you want to predict, it is easy to determine whether the model’s guesses are correct.
- Drill-through Queries: lets you retrieve details from the underlying cases or structure data, by sending a query to the mining model. Drillthrough is useful if you want to view the cases that were used to train the model, versus the cases that are used to test the model, or if you want to see additional details from the case data.
- Analysis Services Data Mining provides two different options for drillthrough:
- Drilling through to the model cases: Drillthrough to model cases is used when you want to go from a specific pattern in the model—such as a cluster or branch of a decision tree—and view details about the individual cases.
- Drilling through to the structure cases: Drillthrough to structure cases is used when the structure contains information that might not be available in the model. For example, you would not use customer contact information in a clustering model, even if the data was included in the structure. However, after you create the model, you might want to retrieve contact information for customers who are grouped into a particular cluster.
- Additional details can be found here – Drillthrough Queries (Data Mining)
- Deploying and Processing Models: Models cannot be used until they have been deployed and processed.
- Deploying sends the project to a server and creates any objects in that project on the server.
- Processing is the step, or series of steps, that populates Analysis Services objects with data from relational data sources.
- Microsoft Generic Mining Model Content Viewer: provides detailed information about the patterns found by the mining algorithm, and also provides access to various statistics generated during the analysis process.
- The amount and type of information depends on the algorithm that was used, but can include the following categories:
- Segments of data, and their characteristics.
- Descriptive statistics about each group or about the whole set of data.
- The number of branches or child nodes in a tree.
- Calculations, such as variance and mean, for a cluster or a whole set of data.
- Viewing this information can help you better understand the results of your analysis. You can also identify ways to fine-tune, and then retrain, your model. Or, you might decide to retrain by using a different algorithm.
- Mining Model Viewer Tasks and How-tos: Data Mining Designer in SQL Server Data Tools (SSDT) contains several tools that you can use to explore mining models.
- Additional details can be found here – Mining Model Viewer Tasks and How-tos
- Tools for Charting Model Accuracy: tools to validate mining models
- The tutorial only uses Lift Charts
- Mining Model Prediction tab in the Data Mining Designer: interface helps you build queries in DMX, or the Data Mining Extensions (DMX) language
- Data Mining Query Interfaces
Next Possible Topics
- Intermediate Data Mining Tutorial
- Exploring the Data Mining Add-ins for Microsoft Office
Questions for the Readers
- Have you used Microsoft Analysis Services Data Mining? If so, what are your thoughts? How does this tool compare to other tools you have used?
- What other Predictive Analytics tools have you used?
It is really very useful for data mining services…. Thanks…