Friday, 2 May 2014

MS SQL Server Tuning and Tips

Server Configuration:

The performance gains described below were seen after applying these techniques and configuration parameters on a database server with the following configuration:
Windows Server 2008 64 bit
32 GB of RAM
8 Processors
Microsoft SQL Server 2008 R2

Configuration Parameters

MAXDOP – Otherwise known as Maximum Degree of Parallelism, this is what I consider the needle in the haystack. This parameter designates how many processors can be used to execute a single SQL statement. By changing this one parameter we saw significant performance increases.  It can be either an option used on an individual query or a global server setting.  Not having access to the product SQL statements, I have always set the MAXDOP setting at the server level to 1. The default for MAXDOP is 0, which lets SQL Server use all available processors.
To verify or change your MAXDOP setting in SQL Server Management Studio (SSMS), right click on the server and select Properties.  This will bring up a popup window; select the Advanced page to see something similar to this:

After changing this parameter, the change takes effect immediately for new queries; no server restart is required.
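If you prefer T-SQL to the SSMS dialog, the same server-level setting can be changed with sp_configure. A minimal sketch, setting the value to 1 as described above:

-- 'max degree of parallelism' is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Limit each SQL statement to a single processor
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE;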
If you are not able to change this server-level parameter, it can be specified on an individual SQL statement, as seen below.
Select *
From Lombardi.Product
Order by productId
OPTION (MAXDOP 1)

I do not suggest adding hints to embedded SQL statements: even though a hint could be helping now, it may come back to hinder you later on. In an environment where removing a hint from a SQL statement could involve a snapshot deployment and migration, SQL statements should be written in a way that avoids having to supply hints to the compiler.

Another option that I like to adjust is the parameterization option.  Different people have different opinions on this one, but I like to force parameterization.  This causes the engine to parse each statement and convert all of the literal values into parameters. The engine then builds the execution plan based on the parameterized query. For most workloads, the time spent parsing and parameterizing statements is regained through the reuse of execution plans; in other cases it becomes costly and your performance will degrade a little.
To turn this on, right click on the DB in SSMS and select Properties. Select the Options page to see a screen looking like this:


If it is not already set to Forced, change the parameterization parameter from Simple to Forced.  The change takes effect immediately; no restart is required.
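The same change can be made in T-SQL. A minimal sketch, assuming a database named MyAppDB (substitute your own database name):

-- Switch the database from simple to forced parameterization
ALTER DATABASE MyAppDB SET PARAMETERIZATION FORCED;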

            Indexing

Contrary to what many believe, indexes are not the answer to every performance problem.  In reality an index is just a table with a key and a pointer to the record in the table that the index is applied to.  Indexes are primarily used for reading and finding data, but they actually slow down writes.  Selecting rows from a table will use the “best” available index if one exists; otherwise it will do a table scan, which is very costly.  I cannot emphasize enough that indexes are good, but too much of a good thing can be bad.

At a high level, the best place to look for what should be indexed is how the columns in the table are used and how the data will be referenced.  Frequently, some of the best columns to index are foreign key columns and columns that are used in the where clause to select the data.
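As a sketch, the Lombardi.PImage table that is joined to Lombardi.Product later in this post could get an index on its foreign key column (the table usage here is assumed for illustration):

-- Speeds up joins and lookups on the foreign key column
CREATE NONCLUSTERED INDEX IX_PImage_productId
ON Lombardi.PImage (productId);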

Many people love to run a missing index report after a performance test and then blindly apply each and every index that has been suggested. Frequently the SQL can instead be modified to use the proper joins and take advantage of the indexes that already exist.

        Covering Indexes

You can also use the SQL Server concept of ‘covering indexes’ to reduce the cost and increase the speed of SQL statements.  In a simple index, the rows in the index are ordered by the values in the indexed column, and each row has a pointer to the row in the actual table.  In a covering index, the index definition uses the keyword ‘include’, which stores additional fields in the index without indexing them.  This way, instead of finding the row in the index and then following the index pointer to the row in the actual table, the system can find the row it is looking for in the index, read the additional column(s) the query returns directly from the index, and never have to go to the table.
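A hypothetical sketch of a covering index on the Lombardi.Product table (the price and description columns are assumed, not taken from a real schema):

-- A query such as
--   Select name, price, description From Lombardi.Product Where name = 'Widget'
-- can be answered entirely from this index, never touching the table
CREATE NONCLUSTERED INDEX IX_product_name_covering
ON Lombardi.Product (name)
INCLUDE (price, description);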

       Filtered Indexes

Another index type that can be used in special situations is the ‘filtered index’. These indexes are a combination of an index and a where clause.  The filter keeps the index smaller and faster to use.

The syntax used to create a filtered index is:
CREATE NONCLUSTERED INDEX IX_product_name
ON Lombardi.product (name)
WHERE sales_region IS NULL

       Index Column Order

The column order in the index is an important part of index creation.  The most selective column should be placed first in the index, followed by the second most selective, then the third, etc.  Index on as few columns as needed, to make it easier for the SQL engine to use the index.
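For example, if a hypothetical region column has many distinct values and a status column has only a handful, the more selective region column should lead the composite index. A sketch with assumed columns:

-- region narrows the result the most, so it goes first
CREATE NONCLUSTERED INDEX IX_product_region_status
ON Lombardi.Product (region, status);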

       Looking for Issues

SQL Server Management Studio (SSMS) can help you identify potential issues after you have narrowed the suspect query down to a single SQL statement.  Before you start evaluating it, you need to turn on some features in SSMS.  To do this, bring up the query window that you will be working with.  These options are window specific, so make sure you are setting them on the window you will be using.

To set these options, select Query and then Query Options… to open the popup window.  After selecting the Advanced page you should see a window like this:



Check the ‘Set Showplan_Text’, ‘Set Statistics Time’, and ‘Set Statistics IO’ options and then select OK.  These options do the following:
Set Showplan Text – Turns on the display of the execution plan that the optimizer selected for the query.  The plan shows you how the optimizer has decided to execute the query.
Set Statistics Time – Displays the time each step takes from the CPU perspective.

Set Statistics IO – Shows you the quantity of reads and writes for each step of the execution. Reducing the number of reads and writes will free up disk access and lower the execution costs.
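The same options can also be turned on with SET statements in the query window. A minimal sketch using the product query from earlier; note that SET SHOWPLAN_TEXT ON must be issued in its own batch and suppresses actual execution, so enable it separately when you only want the plan:

-- Collect timing and IO statistics while the query actually runs
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
GO

Select *
From Lombardi.Product
Order by productId
GO

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO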


Once you have identified a slow query, you need to figure out why it is slow.  The three most common items to look at are:
Execution plans
I/O Statistics
Execution time

            Execution Plans

SSMS is able to show you the actual execution plan that will be used when the query is run on the server.  In the graphical representation, the following query

Select name,
financialSystemId,
clientConfirmationDate
from gdm_request
join gdm_client on gdm_request.clientId = gdm_client.id


will present a plan that looks like this:


This is the plan that will be used when the SQL is executed.  A quick look can tell you which part of the query is taking the longest. A rule of thumb is that seek activities are better and faster than scan activities.  Clicking on any of the activities will give you information on that activity.  A few things to look for are the actual number of rows vs. the estimated number of rows, scans vs. seeks, and any special functions that the optimizer has to use to join properly (data conversions).  There is a lot of information on the Web on how to evaluate a plan, so I will not cover it here.


Many people assume that this is the best plan, but that is not always the case.  SQL Server tries to balance the time it takes to produce the best plan against the actual time it takes to execute the SQL.  If it can look at all of the available options before time runs out, it will pick the best plan and use it.  However, if the query has many ways it can be executed (i.e. lots of joins or sub-queries), time may run out before it finds the best plan and it will just pick one and go.  To check for this, right click on the first box (Select in this case) and select Properties.  To the right of the window the following will appear:



Looking at the QueryHash and QueryPlanHash properties you can see the plan it selected.  Looking at the ‘Reason for early termination’ property you can see that it found a plan that was good enough, so it stopped looking and executed that plan.

Sometimes, with complex queries involving multiple tables, embedded sub-queries, and where clauses, time will run out on finding a plan and the engine will use the best plan found so far.  The way to spot this is that the reason for early termination will be ‘Timed out’ and the plan hash field will be empty.

        I/O Statistics

On the Client Statistics tab you will see results that look like this:



This shows the last 10 runs, so you can see any progress made while you are tuning the query.

       Execution Time

On the Messages tab, you will see the amount of IO and the timing of each step.  It will look like this:





Keep an eye out for large numbers of scans, physical reads, and read-ahead reads.  Always try to limit the amount of data you are dealing with as early as possible, to avoid merging data that is not going to be used.

Tuning a query is a complex topic that will not be fully addressed here, but the fix can be as simple as re-arranging the joins or adding or changing an index.

      Slowest Query Report

After a performance test run, and before the servers are reset, you can run the following queries.
To get the results by the top IO users:
SELECT TOP 20
   qs.execution_count,
   (qs.total_physical_reads + qs.total_logical_reads + qs.total_logical_writes) AS [Total IO],
   (qs.total_physical_reads + qs.total_logical_reads + qs.total_logical_writes) /qs.execution_count AS [Avg IO],
   SUBSTRING(qt.[text], qs.statement_start_offset/2 + 1, (
       CASE
           WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX), qt.[text])) * 2
           ELSE qs.statement_end_offset
       END - qs.statement_start_offset)/2 + 1
   ) AS query_text,
   qt.[dbid],
   qt.objectid,
   tp.query_plan
FROM
   sys.dm_exec_query_stats qs
   CROSS APPLY sys.dm_exec_sql_text (qs.[sql_handle]) AS qt
   OUTER APPLY sys.dm_exec_query_plan(qs.plan_handle) tp

ORDER BY [Total IO] DESC 
To get the results by the top average execution time in ms:
SELECT
(total_elapsed_time/execution_count)/1000 AS [Avg Exec Time in ms]
, max_elapsed_time/1000 AS [MaxExecTime in ms]
, min_elapsed_time/1000 AS [MinExecTime in ms]
, (total_worker_time/execution_count)/1000 AS [Avg CPU Time in ms]
, qs.execution_count AS NumberOfExecs
, (total_logical_writes+total_logical_Reads)/execution_count AS [Avg Logical IOs]
, max_logical_reads AS MaxLogicalReads
, min_logical_reads AS MinLogicalReads
, max_logical_writes AS MaxLogicalWrites
, min_logical_writes AS MinLogicalWrites
, (SELECT SUBSTRING([text], statement_start_offset/2 + 1, (
       CASE
           WHEN statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX), [text])) * 2
           ELSE statement_end_offset
       END - statement_start_offset)/2 + 1)
   FROM sys.dm_exec_sql_text(sql_handle)
  ) AS query_text
FROM sys.dm_exec_query_stats qs
ORDER BY [Avg Exec Time in ms] DESC

These two queries will show you which SQL statements are impacting your DB performance the most.

       Missing Index Report

You can also run a query after a performance run to see a list of suggested indexes, based on the queries that were run.  DO NOT BLINDLY ADD THEM ALL.  You will need to evaluate each suggested index and tie it to the query that would use it.  Then you will need to evaluate whether tuning the query will eliminate the need for the suggested index.  Remember, an index is just another table.  If a table has 8 indexes on it and you insert a row into the table, you are actually updating 9 tables.
The query to get the list of suggested indexes is:
SELECT CAST(SERVERPROPERTY('ServerName') AS [nvarchar](256)) AS [SQLServer],
    db.[database_id] AS [DatabaseID],
    db.[name] AS [DatabaseName],
    id.[object_id] AS [ObjectID],
    id.[statement] AS [FullyQualifiedObjectName],
    id.[equality_columns] AS [EqualityColumns],
    id.[inequality_columns] AS [InEqualityColumns],
    id.[included_columns] AS [IncludedColumns],
    gs.[unique_compiles] AS [UniqueCompiles],
    gs.[user_seeks] AS [UserSeeks],
    gs.[user_scans] AS [UserScans],
    gs.[last_user_seek] AS [LastUserSeekTime],
    gs.[last_user_scan] AS [LastUserScanTime],
    gs.[avg_total_user_cost] AS [AvgTotalUserCost],
    gs.[avg_user_impact] AS [AvgUserImpact],
    gs.[system_seeks] AS [SystemSeeks],
    gs.[system_scans] AS [SystemScans],
    gs.[last_system_seek] AS [LastSystemSeekTime],
    gs.[last_system_scan] AS [LastSystemScanTime],
    gs.[avg_total_system_cost] AS [AvgTotalSystemCost],
    gs.[avg_system_impact] AS [AvgSystemImpact],
    gs.[user_seeks] * gs.[avg_total_user_cost] * (gs.[avg_user_impact] * 0.01) AS [IndexAdvantage],
    'CREATE INDEX [Missing_IXNC_' + OBJECT_NAME(id.[object_id], db.[database_id]) + '_'
    + REPLACE(REPLACE(REPLACE(ISNULL(id.[equality_columns], ''), ', ', '_'), '[', ''), ']', '')
    + CASE
        WHEN id.[equality_columns] IS NOT NULL
            AND id.[inequality_columns] IS NOT NULL
            THEN '_'
        ELSE ''
        END + REPLACE(REPLACE(REPLACE(ISNULL(id.[inequality_columns], ''), ', ', '_'), '[', ''), ']', '') + '_'
        + LEFT(CAST(NEWID() AS [nvarchar](64)), 5) + ']' + ' ON ' + id.[statement] + ' (' + ISNULL(id.[equality_columns], '')
        + CASE
        WHEN id.[equality_columns] IS NOT NULL
            AND id.[inequality_columns] IS NOT NULL
            THEN ','
        ELSE ''
        END + ISNULL(id.[inequality_columns], '') + ')' + ISNULL(' INCLUDE (' + id.[included_columns] + ')', '') AS [ProposedIndex],
    CAST(CURRENT_TIMESTAMP AS [smalldatetime]) AS [CollectionDate]
FROM [sys].[dm_db_missing_index_group_stats] gs WITH (NOLOCK)
INNER JOIN [sys].[dm_db_missing_index_groups] ig WITH (NOLOCK)
    ON gs.[group_handle] = ig.[index_group_handle]
INNER JOIN [sys].[dm_db_missing_index_details] id WITH (NOLOCK)
    ON ig.[index_handle] = id.[index_handle]
INNER JOIN [sys].[databases] db WITH (NOLOCK)
    ON db.[database_id] = id.[database_id]
WHERE id.[database_id] > 4 -- Excludes system databases; remove to see the entire instance
ORDER BY [IndexAdvantage] DESC
OPTION (RECOMPILE);

      Other Tricks


Data Volume

One of the things to remember is that the smaller the amount of data you are touching, the faster the response will be and the less stress is put on the optimizer.  A smaller quantity of data takes less memory, less disk space, less time, and less CPU to process.  So always restrict your data as much as possible, as early as you can.
One way of doing this is to create a temporary table to hold the records that narrow your data the most, then join your main table to this temporary table to eliminate the rows that cannot be joined.
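A sketch of the pattern, reusing the Lombardi tables from the earlier examples (the status and imageId columns are assumed for illustration):

-- Narrow to the small set of rows of interest first
CREATE TABLE #activeProducts (productId INT PRIMARY KEY);

INSERT INTO #activeProducts (productId)
SELECT productId
FROM Lombardi.Product
WHERE status = 'ACTIVE';   -- hypothetical filter

-- The joins below now touch far fewer rows
SELECT p.name, i.imageId
FROM #activeProducts a
JOIN Lombardi.Product p ON p.productId = a.productId
LEFT JOIN Lombardi.PImage i ON i.productId = a.productId;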

            Indexed Views

When a set of tables is joined together on a regular basis, it can be easier to create a view of the joined tables and have your queries reference the simple view instead of all of the joined tables.  A normal view will use the indexes associated with each of the individual tables as it sees fit when the view is queried.  In the case of a normal view, the processing will be very close to the same as referencing all of the tables individually, as a view is a virtual representation that does not exist on disk until it is needed.

In some cases the creation of an indexed view can provide a performance gain.  An indexed view is a view that is actually materialized on disk when its index is created.  The SQL Server engine keeps the view up to date in real time, so you do not need to worry about that aspect.  The benefits of an indexed view are that you can create indexes on the view, and the view definition can exclude rows that you know will not be referenced (i.e. historical data).

An example of this is when you have long running processes with many tasks, but a custom dashboard that only lists tasks in progress or waiting to be selected.  In this case you could create an indexed view looking only at tasks for your application in those two statuses.  This creates a smaller and faster table for your use and can eliminate the overhead of joins, as the view definition already includes the join.

One of the drawbacks of this method is that the view has to be schema bound to the tables it references.  Because of this, I always suggest that you drop any indexed views that reference system tables before upgrading: tables that are schema bound to a view cannot be altered, and without dropping the views your upgrade might fail.
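A sketch of the dashboard example, with a hypothetical Lombardi.Task table and column names. The view must be created with SCHEMABINDING, and the first index on it must be a unique clustered index:

-- The view definition filters to just the two active statuses
CREATE VIEW Lombardi.vw_OpenTasks
WITH SCHEMABINDING
AS
SELECT taskId, taskName, status
FROM Lombardi.Task
WHERE status = 'InProgress' OR status = 'Waiting';
GO

-- Materializes the view on disk; SQL Server keeps it current from here on
CREATE UNIQUE CLUSTERED INDEX IX_vw_OpenTasks
ON Lombardi.vw_OpenTasks (taskId);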

 

            Parameterized Queries

When a SQL statement is sent to the engine, one of the first things it does is look to see if it has already calculated an execution plan for this query, in the hope that it does not have to take the time to do it again.  With the goal of having a plan already cached in memory, it is much better to send the engine the statement
Select name from product where product_Id = ?
and send in a parameter of ‘12345’, rather than sending in the statement
Select name from product where product_Id = 12345
Now when you want the name for product 67890, the SQL engine will already have the best plan in memory and will not have to regenerate one if you use parameterized queries.  If you had not used parameterized queries, the engine would have to look at all of the possible plans again, only to come up with the same plan, wasting the resources and time of finding it.  You can make this the standard DB setting (see the parameterization setting in the configuration section above).
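From T-SQL, sp_executesql is the standard way to submit a parameterized statement. A minimal sketch using the product example above:

-- First call compiles and caches the plan
EXEC sp_executesql
    N'Select name from product where product_Id = @id',
    N'@id INT',
    @id = 12345;

-- Second call reuses the cached plan with a different value
EXEC sp_executesql
    N'Select name from product where product_Id = @id',
    N'@id INT',
    @id = 67890;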

            Stored Procedures

Large, complex queries, or those that do processing on multiple individual rows, can be moved to stored procedures to increase performance.  A few of the advantages of using a stored procedure: you have separated the SQL logic from the application, allowing you to make changes without updating the application; the procedure’s execution plan is compiled once and cached, so repeated calls will often run faster than the same SQL issued straight to the DB; and a stored procedure opens up the possibilities of what you can do in a single call to the database, including recursive processing and any data manipulation.
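A minimal sketch of moving the earlier product lookup into a stored procedure (the procedure name is hypothetical):

CREATE PROCEDURE Lombardi.GetProductName
    @productId INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT name
    FROM Lombardi.Product
    WHERE productId = @productId;
END
GO

-- A single round trip to the database
EXEC Lombardi.GetProductName @productId = 12345;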

            Common Table Expressions

Often used as part of a stored procedure (but they can be used in submitted queries as well) is the Common Table Expression (CTE).  A CTE is a temporary named result set that can be referenced by the next SQL statement in the same batch.  This allows for easier reading of complex queries, two-pass processing, and not having to repeat multiple joins in different parts of UNIONed SQL statements.  While the use of a CTE has little effect on performance, it can make very complex SQL statements more readable and performance issues easier to find.
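A sketch of a CTE defined once and then joined, with a hypothetical Lombardi.ProductOrder table:

WITH orderCounts AS (
    -- First pass: aggregate orders per product
    SELECT productId, COUNT(*) AS orderCount
    FROM Lombardi.ProductOrder
    GROUP BY productId
)
-- Second pass: join the aggregate back to the product table
SELECT p.name, c.orderCount
FROM Lombardi.Product p
JOIN orderCounts c ON c.productId = p.productId
ORDER BY c.orderCount DESC;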

            No Lock

Another trick that I use on a regular basis is the nolock hint.  This hint tells SQL Server to read the data for read-only purposes and not to apply a lock on the rows being read.  Of course, in SQL Server row locks can escalate into page or table locks, and session blocking can occur.  The drawback here is that you can get what are considered ‘dirty’ reads: rows written to disk that have not been committed yet and could be rolled back.  Most systems are OK with dirty reads, but you will want to examine your system requirements to be sure.  If system-wide dirty reads are acceptable, this option can also be set at the connection level by using the following statement:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

To use the nolock hint on a statement-by-statement basis, use the ‘with (Nolock)’ syntax in the SQL statement. Examples of single-table and multiple-table select statements with the nolock hint in place:

Select *
From Lombardi.Product with (Nolock)
Order by productId


Select *
From Lombardi.Product P with (Nolock)
Left Outer Join Lombardi.PImage PI with (Nolock) on p.productId = PI.productId
Order by productId

