A source system that contains dynamic or customer-defined data fields is both an old problem and a sticky one in the DW/BI environment. A standard approach of flattening the dynamic data into name/value pairs is recommended for a relational-type database. But what about multidimensional modeling or OLAP storage? Name/value pairs are just the beginning. To create full-fledged OLAP cube Dimensions and Measures, additional metadata must be leveraged. In addition to name and value fields, you also need information regarding data type, data length, display rules, access rights and more, depending on the situation.
On a recent project, I worked through such a data conundrum with a client, and we ended up with an architecture for dynamically defining and generating multidimensional cubes in SQL Server Analysis Services 2012.
The organization had a portfolio of SaaS applications oriented around their vertical market. They wanted to enhance their offerings with a new product: a SaaS offering that served customers insightful analytics and ad-hoc reporting against the application data of their other licensed tools. One of the principal tools in the portfolio had a scheme for providing Custom Fields. And, of course, the ability to analyze that data was viewed as a key project requirement by my client.
The Data Warehouse architecture as it had been proposed included SQL Server 2012 databases, ETL based on SSIS, and SSAS OLAP Cubes serving data visualizations. The data architecture of the Data Warehouse was straightforward Star Schema, with modifications as demanded by source application design. The web application architecture was ASP.NET MVC with a 3rd party tool leveraged for data presentation in the application. The primary area of concern was the data architecture for handling the Custom Fields. How would we store the data pulled from the source system for usage downstream? How could the Custom Fields data be represented in the OLAP layer? After extensive discussion and analysis, we arrived at a set of options.
Our first design attempt involved numbered Custom Dimensions, which would effectively store the custom field data column-wise in tables. Despite significant effort being spent on development, this method was ultimately rejected, as the factors of table growth and the complexity of schema management required were shaping up to be an immediate maintenance nightmare.
The next logical solution that presented itself was to create dynamic Dimensions. But how? We would need to have the same type of flattened schema for storing custom field data, but we would need to decorate that data with metadata that would allow dimensions or measures to be constructed on the fly out of the Data Warehouse.
The model we ultimately established, while not pure dimensional modeling, would serve the utilitarian purposes described above at the best possible grain for our reporting needs. Here is the schema defined for the Custom Field data:
Thus, we established our foundation in the data warehouse, and a design direction for the solution. The next challenge was to dynamically create and deploy cubes…
Next time: Pt. 2 – The Cube Generator