SQL performance tuning can be a challenging practice, particularly when working with large-scale data. As the minor change happens, it can cause both positive and negative impacts on performance. When dealing with a big database that includes multiple tables and complex relationships, it will likely fail to identify whether your stored procedures or queries are properly established. By applying tools, you will take a new viewpoint to identify problems, analyze statistics, and optimize your queries effectively.
My experience tuning the performance of SQL queries while developing applications inspires me to share these practical steps with you. This article will go through how to debug and tune the performance of SQL queries in MSSQL. We also cover eight helpful SQL performance tuning techniques, which help you get the most out of this versatile SQL language and create effective queries.
What is performance tuning in SQL Server?
Performance tuning is a practice of query optimization, SQL client code optimization, database index management. Database performance tuning aims to minimize the response time of your queries by making the best use of your system resources, such as reducing network traffic, disk I/O, and CPU time.
How to tune the performance of SQL queries
Use the Client Statistics feature to debug query performance
Client Statistics is a part of SQL Server Data Tools. Analyzing the Client Statistics such as application profile, network, and execution time statistics enables you to determine whether your query has improved the performance. This one can also help debug query performance issues. Now, I will demonstrate how to implement all of them by employing the Client Statistics.
First things first, we need to turn the option on by clicking (Include Client Statistics) from the toolbar (Shift + Alt + S).
The Active Client Statistics option
Step 1: Execute the query that needs to increase performance.
Step 2: Read the statistics of the “Trial 1” column at the Client Statistics tab in the following screenshot.
Client Statistics - The original query result
Each time we execute the query, a new Trial is included, which indicates several data. The Client Statistics presented with three categories:
1. Query Profile Statistics
- The number of INSERT, DELETE, and UPDATE statements: The value is 0 because we don’t have the same statement defined in the query.
- Rows affected by INSERT, DELETE, or UPDATE statements: The value is also 0 as there are no such statements.
- The number of SELECT statements: We have 2 values since one is in the main query, and another is in the function.
- Rows returned by SELECT statements: 1070 is the total rows of 2 SELECT statements that we have.
- The number of transactions: We don’t acquire any transactions in the query.
The information from this category allows us to summarize all the sub-queries that we have and how much they cost for executing. Ever since we can decide whether to improve the query or not.
2. Network Statistics
- The number of server round trips: The value is 1 as we only have one cycle (send a request to the server then receive the reply).
- TDS (tabular data stream) packets sent from clients: The value is 1 packet sent from clients.
- TDS packets received from servers: We have 4 TDS packets sent from the server to the client.
3. Time Statistics
- Client processing time: The amount of time counted in milliseconds between the first receive response packet and the last receive response packet from clients.
- Wait time on server replies: This is the amount of time in milliseconds between the last request packet left by the client and the first response packet that came back from the server to the client.
- Total execution time: This is the total time in milliseconds between Client processing time and Wait time on server replies.
Step 3: Let’s check whether your query needs improvement. Now, I’m using a function inside the SELECT statement, which is not the best practice. Therefore, the query should be refactored as follows:
Client Statistic - Refactor query
After refactoring your query, let’s execute them again.
Step 4: Look at the Client Statistics tab; we already have the “Trial 2” column. As you can see, the performance result after refactoring the SQL query has changed. The initial black arrows of “Trial 1” have switched to green arrows downwards, which means the executing time has been reduced.
Client Statistic - Performant result after refactoring SQL query
In this example, I followed the following suggestion Do not use a Scalar-Valued function inside your SELECT that does not use functions inside the SELECT statement. The result is so amazing! While the total time for the first trial is around 31 seconds, the total time for the second one is just 0.0062 seconds.
Moreover, it can be seen from the screenshot, the “Bytes sent from the client” in “Trial 2” are bigger than the ones in “Trial 1”, meaning “Trial 2” executes higher performance than “Trial 1”.
Let's have a look at the “Client processing time” and “Wait time on server replies” values in the Time Statistics category. The total execution time is inconsiderable when we refactor the query by replacing the function inside the SELECT statement with a JOINable query statement.
When we achieve a good result from the Client Statistics, our query has been tuned perfectly.
Select necessary fields instead of using SELECT *
Applying the SELECT statement helps to return needed data that meets the business requirements. Below is an example where the business requirements request some fields of the product table.
Select specific fields
Execute two queries above; we will have the following analysis data:
The above screenshot also shows that the “Bytes received from server” in the “Trial 2” column are less than the ones of “Trial 1” about 1.5 times. Hence, we can conclude that “Trial 2” should be more efficient than “Trial 1”. In addition, it should be noted that the query of “Trial 2” is much clearer and only collects the required fields.
Use condition along with ON statement when JOINing, don’t put all conditions to WHERE
Many developers use “Primary Key” (PK) and “Foreign Key” (FK) to declare conditions when JOINing tables in a query. After that, they put other conditions of the JOINing table into the WHERE condition. In fact, the database has to collect all data in JOINing tables before filtering conditions with the WHERE clause. Obviously, query performance has decreased significantly.
Inefficient - Using join condition inside where clause
As the above example, all data of the ProductSubcategory table is joined with the Product table based on the ProductSubcategoryID. Another condition is also added to the WHERE clause instead of declaring it in the same line as the JOIN statement.
Efficient - Using JOIN condition inside the ON statement
Data of the ProductSubcategory table is filtered before joining with the Product table.
When we put the condition along with the ON statement, the JOIN query only needs to join filtered data before triggering the WHERE clause.
Do not use a Scalar-Valued function inside your SELECT
The complicated stored procedures often require using functions. This will make the queries clearer and easily follow the SOLID principles like the Single responsibility principle. Unfortunately, carrying out functions inside the Select statement can be an inefficient practice.
Imagine that we have a query that needs to select around 100,000 records in sequence, and we have a function query inside the SELECT. In this case, the function has to execute with equivalent times. Therefore, the query time increases considerably.
The function [dbo].[ufnGetProductListPrice](I.ProductID, @OrderDate) is executed multiple times depending on the data row of table Production.ProductInventory.
Inefficient - Using functions along with SELECT statements
Convert the function to a view, and then we can join with the view that all data already executes before SELECT.
Efficient - Using a view to Join instead
Optimizing: We can convert the view above into a Table-Valued function.
Sometimes, the view cannot meet with parameters and multiple statements. Therefore, the inline Table-Valued function is the best practice.
Table valued function - GetListPriceByOrderDate
Efficient - Optimized - Using a table-valued function to JOIN
Shouldn’t use OFFSET for pagination
Almost all pagination have both @Skip and @Take parameters; the OFFSET statement skips @Skip rows before implementing a FETCH NEXT @Take rows (@Take = 20, for example). It means if we have 100,000 records and @Skip value is 50,000, it would need to fetch 50,020 records first. After that, it skips 50,000 records and only returns the last 20 records.
Let’s see the example below:
It selects to the max 5020 records but skips 5000 records, then only returns 20 records. There has been no impact on 5000 selected records.
Inefficient - Paging with OFFSET.png
When paging, we need an index or ID parameter, which becomes cursor pagination.
Let’s see the example below:
The query selects only 20 needed rows. The performance is optimized to the maximum.
Efficient - Paging with Index Cursor.png
Keep the keyword of SET NOCOUNT as ON at the beginning of a Stored Procedure
Whenever you execute a query or retrieve information, rows affected by that T-SQL statement will return a short message to the client. Surprisingly, when you use SET NOCOUNT ON, this message is not sent. This allows you to improve the system's performance by reducing network traffic slightly. If you aren't required to use SET NOCOUNT OFF, you should apply SET NOCOUNT ON in queries or stored procedures. Using SET NOCOUNT On has no impact on the result that is returned.
A long time ago, I excluded this statement because I had no idea about it and neglected query performance. However, I realize sending the count and executing the query has no value to the final results.
A simple example below will help us compare performance between SET ON or OFF of the NOCOUNT statement.
Set NOCOUNT ON or OFF - Demo query
Check the performance result of 3 trials:
Use SET NOCOUNT OFF or not define SET NOCOUNT.
As we can see, for the “Trial 1” and “Trial 2” which have not defined SET NOCOUNT as well as set the NOCOUNT is OFF.
Rows affected by INSERT, DELETE, and UPDATE statements value is 9505, meaning the system's sending the count by 9505 times.
The total execution time is about 585 milliseconds.
Using SET NOCOUNT ON
According to “Trial 3”, almost all statistics data has been optimized. The query does not send the count.
The execution time of “Trial 3” was reduced to two-third compared to “Trial 1” and “Trial 2”.
Use EXISTS to check if a row existed instead of SELECT COUNT()
Some developers often select data to count and then check for row existence by comparing the SELECT result with a number. Nonetheless, this is not encouraged; we should use the EXISTS keyword instead.
Inefficient - Select-Count to check if arrow exists
The query SELECT with COUNT scans the entry table, counting up all entries matching the condition. As a result, performance is considerably decreased.
Efficient - Using EXISTS to check if a row exists by condition
For the EXISTS keyword, the result is returned as soon as the condition matches the first row. To be more clear, we can analyze more with the Client Statistics, “Trial 2” takes less execution time than “Trial 1”. This statement will be more efficient if we are in a complex query with big data. Hence, it’s the best practice to increase performance.
Indexes are critical parts that we need to accomplish when building Tables. More than 50 percent of the performance of a query depends on it.
To accomplish this SQL feature, please follow our previous article about How to use SQL Server Indexes for performance tuning.
Query performance is a critical task that we should not neglect in any application or website. It helps to fine-tune the overall performance of the database systems, make your software run smoothly, and work at high performance.
Although there are multiple ways to tune the performance of SQL queries, what I described above are some tips based on my working experience and reading SQL books. I believe these tips will allow you to craft better queries, have a well-structured database, avoid draining the production database resources and maintain more stability to your software applications.
Happy coding and have a nice day!