OK, I know the title of the song is Twist and Shout. When I played bass in my high school rock band, I was always amazed at how difficult it was to play this simple, 3-chord song really well. (I think it’s the syncopation). Sometimes the seemingly simple things have some added hidden intricacies.
Here’s one of those intricacies in MDX, a question that comes up fairly frequently: How to create “OR” logical functionality. If you’re an experienced SQL coder, you’re probably thinking, “What could be simpler than ‘OR’?” Because MDX is superficially similar to SQL (both querying languages use SELECT … FROM … WHERE constructs), it can be difficult to understand the differences between how these two languages operate.
First, let’s start with an example in SQL, using SQL 2005 AdventureWorksDW that shows all customers that have
a) a Bachelors education level and income between than $10,000 and $30,000
or
b) a Bachelors education level and income between than $130,000 and $170,000:
SELECT
COUNT(*) AS OrCount
FROM
DimCustomer
WHERE
((YearlyIncome BETWEEN 10000 AND 30000)
AND
(EnglishEducation = N‘Bachelors’))
OR
((YearlyIncome BETWEEN 130000 AND 170000)
AND
(EnglishEducation = N‘Bachelors’))
This example uses the lowest and highest Income Level “buckets” present in the Adventure Works cube. We’ll use the same data in the MDX examples below.
What the SQL engine does with this query is test each row against the logical constructs in the WHERE clause. If one side of the OR evaluates to TRUE, the row is counted in the results set. To anyone used to SQL, that’s such an simple and intuitive operation, it seems almost too obvious to describe.
MDX works differently. Rather than testing each row, (there really is no such concept as a “row” in MDX; it’s all cell-sets), the entire sub-cube that is in scope is tested for intersection with the members specified in the WHERE clause. In this example MDX query,
SELECT
[Measures].[Customer Count] ON Columns
FROM
[Adventure Works]
WHERE
([Customer].[Yearly Income].[10000 – 30000], [Customer].[Education].[Bachelors])
The OLAP engine is returning the aggregate value for [Customer Count] for only those cells that
a) intersect with the [10000 – 30000] member of the [Customer].[Yearly Income] attribute
and simultaneously
b) Intersect with the [Bachelors] member of the [Customer].[Education] attribute.
The two conditions (actually, better termed as members in the MDX world) in the WHERE clause are ANDed together. In fact, it is not possible to refer to the same dimensional attribute twice in the WHERE clause. This makes sense, because it is not possible to have an income level of [10000 – 30000] at the same time as [130000 – 170000]. So how is it possible to reproduce “OR” functionality with MDX? By UNIONing two sets together. The following MDX queries work in a series of discrete steps up to the solution:
// Step 1) What are the raw demographic values from the cube?
SELECT [Measures].[Customer Count] ON 0 FROM [Adventure Works] WHERE ([Customer].[Education].&[Bachelors]) — 18,484 / 5,356
SELECT [Measures].[Customer Count] ON 0 FROM [Adventure Works] WHERE ([Customer].[Yearly Income].[10000 – 30000]) — 18,484 / 5209
SELECT [Measures].[Customer Count] ON 0 FROM [Adventure Works] WHERE ([Customer].[Yearly Income].[130000 – 170000]) — 18,484 / 821
// Step 2) What are the compound demographic values from the cube?
SELECT [Measures].[Customer Count] ON 0 FROM [Adventure Works] WHERE ([Customer].[Yearly Income].[10000 – 30000], [Customer].[Education].&[Bachelors]) — 18,484 / 814
SELECT [Measures].[Customer Count] ON 0 FROM [Adventure Works] WHERE ([Customer].[Yearly Income].[130000 – 170000], [Customer].[Education].&[Bachelors]) — 18,484 / 258
// Step 3) Create a union of the two dimensionally identical sets
WITH SET UnionedSet AS {([Customer].[Yearly Income].[10000 – 30000], [Customer].[Education].&[Bachelors])} + {([Customer].[Yearly Income].[130000 – 170000], [Customer].[Education].&[Bachelors])}
SELECT [Measures].[Customer Count] ON 0, UnionedSet on 1 FROM [Adventure Works] — 814, 258
// Step 4) Display the union as a single value using an aggregate function
WITH SET UnionedSet AS {([Customer].[Yearly Income].[10000 – 30000], [Customer].[Education].&[Bachelors])} + {([Customer].[Yearly Income].[130000 – 170000], [Customer].[Education].&[Bachelors])}
MEMBER [Measures].[OrCount] AS SUM(UnionedSet,[Measures].[Customer Count])
SELECT [Measures].[OrCount] ON 0 FROM [Adventure Works] — 1072
// Step 5) Display the union as a single value using an aggregate function in the slicer spec/where clause
WITH SET UnionedSet AS {([Customer].[Yearly Income].[10000 – 30000], [Customer].[Education].&[Bachelors])} + {([Customer].[Yearly Income].[130000 – 170000], [Customer].[Education].&[Bachelors])}
MEMBER [Measures].[OrCount] AS SUM(UnionedSet,[Measures].[Customer Count])
SELECT FROM [Adventure Works] WHERE [Measures].[OrCount] — 1072
// Step 6) Equivalent query using an MDX subquery expression
SELECT
[Measures].[Customer Count] ON 0
FROM (
SELECT
{([Customer].[Yearly Income].[10000 – 30000], [Customer].[Education].&[Bachelors])} + {([Customer].[Yearly Income].[130000 – 170000], [Customer].[Education].&[Bachelors])} ON 0
FROM [Adventure Works]
)
It’s important to remember that Analysis Services is purpose-built to return datasets. For most analytic purposes, intersection with dimensional attribute members is a both effective and (most importantly) efficient way to query data. It’s just when the conditions need to be ORed that things get tricky.
As you can see from the examples above, it’s perfectly possible to achieve OR functionality. If you’re doing custom development where you are generating the MDX yourself, then you’re home free. If you’re relying on the many front-end tools that are out there, then you should verify
a) whether this type of analysis is needed by the users – and –
b) whether your chosen front-end querying tool is smart enough to generate this type of MDX
For example, the current version of Excel appears to be capable of generating this type of MDX query and results.