When I speak to clients about In-Memory features in SQL Server, I find that Columnstore indexes just haven't gained much traction as a marquee feature. The functionality itself is quite useful in the BI/DW realm, potentially boosting query performance by 1.5 to 10 times. But I think it gets overlooked because the use case just isn't obvious from the typical description of the feature. The explanations I have read/heard/seen of Columnstore and how it works get tedious very quickly.
So I don’t want to cover details of how Columnstore works in this post. I just want to clarify when it might be useful. Then, if it sounds like it fits your situation, you can dive into some links and have all the tedium you want.
So here are, to me, the most pertinent Columnstore facts to be aware of:
- It stores data in a heavily compressed columnar format and processes it in memory — so it's FAST.
- It is very focused on large result sets that need to be aggregated or grouped. If you are doing full table scans in your queries, you might be interested.
- It works best alongside table partitioning. If you have a large Fact table that is a candidate for partitioning, this again is potentially right up your alley.
- Columnstore is not ideal for frequently updated tables. You will end up having to drop and re-create the index before/after data update operations. So a rapid incremental refresh environment is not an ideal fit. UPDATE: I am reminded by a very helpful colleague that SQL Server 2014 removes this limitation and allows table updates/deletes/etc. (Thanks Andrew!)
- Because it is an In-Memory feature, your capability and performance are dependent upon hardware and SQL Server memory configuration.
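To make the bullets above concrete, here is a minimal sketch of creating a nonclustered columnstore index on a fact table and the kind of aggregation query it accelerates. The table and column names (`dbo.FactSales`, `NCCI_FactSales`, etc.) are hypothetical, purely for illustration:

```sql
-- Hypothetical fact table; names are illustrative only.
-- Include the columns your reporting queries actually touch.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (DateKey, ProductKey, StoreKey, Quantity, SalesAmount);

-- The sweet spot: scan-heavy aggregation over a large fact table.
SELECT DateKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY DateKey;
```

Queries like the `GROUP BY` above are where the columnar storage and compression pay off; highly selective single-row lookups generally won't benefit.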
If you have large fact tables and query performance issues, and if SSAS is either not an option or itself has performance issues, Columnstore is an option to investigate. Columnstore indexes have been shown to be faster than an Analysis Services cube in some instances! From my perspective, a couple of use case scenarios immediately come to mind:
- Creation of specific fact structures for highly responsive reports/dashboards — especially in situations where Analysis Services is not an option, or is also not performing adequately
- Improving cube processing performance (although the drop/rebuild time for the columnstore index will then likely take place during ETL — so net performance gain would have to be tested)
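For the pre-2014 drop/rebuild dance mentioned above, one common pattern is to disable the columnstore index before the ETL load and rebuild it afterward. A hedged sketch, again using the hypothetical `NCCI_FactSales` index name:

```sql
-- Disable the columnstore index so the table becomes writable (SQL Server 2012 pattern).
ALTER INDEX NCCI_FactSales ON dbo.FactSales DISABLE;

-- ... run your bulk load / incremental ETL here ...

-- Rebuild the index; the rebuild cost is what lands in your ETL window.
ALTER INDEX NCCI_FactSales ON dbo.FactSales REBUILD;
```

This rebuild step is exactly the "net performance gain would have to be tested" caveat: the query-time savings have to outweigh the rebuild time added to the ETL window.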
For further info, this article seems to be the granddaddy of all columnstore articles. It contains a massive and detailed FAQ, and includes the formula for determining memory capacity. More focused instructions and examples of creating and using a columnstore can be found here, on Microsoft’s TechNet site. Cheers!