The goal of performance tuning in SQL is to minimize the execution time of the query and reduce the number of resources while processing the query. Whenever we run the query, performance depends on the amount of data and the complexity of the calculations we are working on. So, by reducing the no of calculations and data, we can improve the performance. For that, we have some best practices and major factors which we are going to discuss in detail.
Data Types – Deciding on the right data type can reduce the storage space and improve our performance. We should always choose the minimum size of the data type which will work for all the values in all columns.
- Choosing a specific data type helps to ensure that only specific values are stored in a particular column and reduce storage size.
- Sometimes we need to convert one datatype to another which increases resource utilization and thereby reduces performance. So, to avoid that while creating a table care should be taken that we use the correct datatype across the tables in our data model, by doing so we are reducing the chances of changing them in the future. In this way, we don’t need to convert data type implicitly or explicitly and our query runs faster.
- We should use a new data type instead of the deprecated data type.
- Store the date and time in a separate column. It helps to aggregate data on the date and timewise also helps when we filter the data.
- When we have a column with a fixed length, go for the fixed length data type, for example- Gender, Flag value, Country code, Mobile number, Postal code, etc.
Filtering Data- Query performance depends on how much data we are processing so it is important to take only the required data for our query. Also, at which level we are filtering the data.
Let’s see some of the scenarios –
- For example, if we want to see aggregation for the year 2022 and for the ABC department then we should always filter data before aggregation in the Where clause instead of Having.
- If we want to join two tables with specific data, then we should filter the required data before joining the table.
Joins – Join is a very common and useful concept in Databases and data warehouses. In order to improve performance choosing the appropriate join for our requirements is very important. Below are some best practices of join.
- If we want only matching records from joining tables, then we should go for inner join. If we want full data from any one table, then we should go for left or right outer join and if we want full data from both the table then we should go for full outer join. Always try to avoid Cross join
- Use ON instead of writing join condition on where clause.
- Use alias name for table and column.
- Avoid OR in the join condition.
- Always prefer to join instead of a correlated subquery. correlated subquery has poor performance as compared to the joins.
Exist vs IN –
- We should use EXIST instead of IN whenever the subquery returns a large amount of data.
- We should use IN when the subquery returns a small amount of data.
Index- If we talk about performance tuning in SQL, Index plays a very important role. We can create an index either implicitly or explicitly. We have to use the index very carefully because, on one hand, it increases performance in searching, sorting, and grouping record, and on another hand, it increases disk space and takes more time while inserting, updating, and deleting data. There are two types of indexes, Cluster and Non-Cluster indexes. We can have only one Cluster index per table and whenever we create a primary key in the table, the database creates a clustered index implicitly. We can have multiple non-cluster indexes in the table. Whenever we create Unique Key on the table, the database creates a non-cluster index.
Below are some best practices for creating indexes.
- It is always recommended that we should create a clustered index before creating a non-cluster index.
- Integer data type works faster with index as compared to string because integer has low space requirement. That is why it is recommendable to create a primary key on the Integer column.
- Indexing in OLTP database -One should avoid multiple indexes in OLTP (online transaction processing) database since there is a need to frequently insert and modify the data hence multiple indexes might have a negative impact on performance.
- Indexing in OLAP database –OLAP (online analytical processing) database is mostly used for analytical purposes hence we commonly use select statements to get the data. In this scenario, we can use more indexes on multiple columns without affecting the performance.
Union vs Union All- Union all is faster as compared to Union because union check for duplicates and returns distinct values while Union all returns all records from both the table. If you know that both tables have unique and distinct records from each other, then we can go for union all for better performance.
These are some best practices that we can follow to improve the SQL performance.