Skip to main content

Data & Intelligence

Managing BI Performance Problems Series Part 4: Partitioning

Welcome to part 4 of managing BI performance problems. In part 1 I discussed how to properly manage and approach a performance problem. Part 2 was about applying indexes to speed up queries. Part 3 was about how if indexing didn’t offer enough performance return then creating aggregate tables could help alleviate performance problems. In this, part 4, I will be discussing table partitioning. If your table is large, and the data cannot be aggregated to an aggregate table, partioning your database tables is a good way to cut down on performance if used correctly.

What is table partitioning?

Table partitioning in essence slices up your table into smaller chunks to make it so that instead of the database having to sort through hundreds of thousands of rows it only has to go through tens of thousands.

This makes partitioning a very good solution for tables that for some reason cannot be aggregated. It should be said that aggregate tables, when possible, should always be the first course of action. Partitioning is good and can cut down query times substantially, but aggregates almost always do better.

Should I do partitioning via my meta-layer or in the RDBMS?

There are in essence two ways to accomplish partitioning: either do it in the meta layer vendor software, or do it in the RDBMS. Both sides have their pros and cons.

Meta-layer Partitioning

Meta-layer software is getting more and more advanced allowing for customers to have an ever increasing number of options. The gist of doing partitioning in the meta-layer would be that you take two physically separate tables and create them as one logical table and tell the meta-layer when to apply the correct fragmentation to know when to get what data from where.

For example: A lot of companies will split off their tables into separate physical history tables then use the meta layer to create a single large logical table and then the meta layer figures out what table to query based on the logical query.

Pros:

– A lot of companies already have “history” or “subset” tables as they were very popular in past methodologies.

Data Intelligence - The Future of Big Data
The Future of Big Data

With some guidance, you can craft a data platform that is right for your organization’s needs and gets the most return from your data capital.

Get the Guide

– History/subset tables are very easy to create and incorporate into your schema

– Getting data from a single physical table is very fast

– Partitioning is in the hands of developers that will (hopefully) know the data rather well

Cons:

– You have two choices as to expanding data: keep making your history/sub tables bigger (thereby possibly negating their value) or create more history/sub tables which means that you’ll keep making meta layer changes to include these new tables. Also if tables keep getting added to the meta layer then the fragmentation logic will keep getting more and more complex.

– When a query spans more than one table it will do a UNION. This, while faster than selecting from a large table, is typically slower than going with the RDBMS equivalent.

– Selecting everything from the table might actually be slower than if it were one whole table due to the unions.

– If users are going straight against the database they’ll need to know to bring in any subset tables manually for their queries otherwise they’ll miss out on data.

RDBMS Partitioning

Just as meta layer software is getting better so too is RDBMS software. Oracle 11g in particular gives DBAs a wonderful set of tools and utilities to handle complex partitioning. Developers and DBAs can select to partition by time, or certain values. There are a lot of different approaches and possibilities when it comes to partitioning within the RDBMS check with your vendor to see what they do and do not support.

Pros:

– Partitioning logic is coded into the RDBMS itself; no meta layer software will access the data faster.

– Easy to create additional partitions if needed without incurring a code push

– Tons of options available to create partitions

– Invisible to the user even if the user is going straight against the database.

Cons:

– Typically partitioning is a DBA function and DBAs usually don’t know the data as well as developers.

– Can make inserts a bit slower

– Because there are so many options available it is possible to really screw this up for yourself if you don’t know what you’re doing.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.