When it comes to performance problems indexing is the bread and butter of most data warehouse developers. And why not? It’s a cheap and easy way address simple performance problems. However, like anything cheap and easy it can only fix the cheap and easy problems. Indexing will not solve large underlying performance problems, but it is a good way to get the “low hanging fruit”. Below are some basic tips on effective index use.
Keep in mind, too many indexes is a bad thing.
Indexes not only take up a lot of space, indexing too much data will actually make your database slower. The general rule of, “When all data is crucially important data, then no data is crucially important.” tends to hold true here. This doesn’t mean one should shy from using indexes. But one should take care not to over index. Every index should be in there for a reason and a good one.
Explain plans are your friend!
The Future of Big Data
With some guidance, you can craft a data platform that is right for your organization’s needs and gets the most return from your data capital.
Developers often think they can solve a performance problem just by looking at a SQL statement and throwing indexes whereever they like. At the very least, an explain plan should be run because it will show where the problems are likely to lie and bring to light possibilities that a simple glance at the SQL might have missed.
Think ahead functionally in creating indexes
A performance problem might point to one particular column. Indexing the singular column and moving on would be simple. But if you know that users that query on that column tend to also filter on another column it might make sense to create a multi-column index instead of just taking care of the singular column. Again, use your best judgement to not create a index that will hurt you instead of help you. Having a functional knowledge of how users are coming at the data can help in creating a holistic indexing solution.
Recognize your weapons. Practice them well.
Some companies shoot themselves in the foot by using the wrong index type in the wrong scenarios. Others try to fix all problems with just one kind of index. Knowing which index to use is just as important as knowing when to use one. There are many different ways of building indexes so it is important to know what to use and when. Does the column you’re looking at have a high cardinality (like Customer Name, Product Name, etc)? If so, then a “normal” or btree index would be best. Do you want it clustered or not? Is the data low cardinality (such as gender, job title, end market segment)? If so, Then a bitmap indexes tend to be best.
Also keep in mind some database products allow for different kinds of indexes. Work with your DBA (remember Part 1 of this series?) to find the right index type for the performance problem.
Good tips on managing index well specifically in DW. Indexing is like two-edged sword – using it appropriately can help a lot to achieve performance goal; It also might do harm in some cases. When we are going to move large volume of data from source to data warehouse, one possible way we can follow is to disable all index then start to load data in bulk mode. That will greatly improve performance.After load is complete, it’s time to enable index on existing warehouse datasets.
Oh, certainly when moving large amounts of data dropping and recreating indexes is a must. The post was very report/query centric, but you’re absolutely right, care should be taken on the ETLs as well. Great catch!