Thursday, February 8, 2007

Data Type Performance Tuning Tips for Microsoft SQL Server

Always specify the narrowest columns you can. The narrower the column, the less data SQL Server has to store, and the faster SQL Server can read and write it. In addition, if any sorts need to be performed on the column, the narrower the column, the faster the sort will be. [6.5, 7.0, 2000, 2005] Updated 8-21-2005
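
For instance, a narrow design might look like this sketch (the table and column names here are hypothetical):

CREATE TABLE CustomerNarrow
(
CustomerID INT NOT NULL, -- 4 bytes instead of BIGINT's 8
StateCode CHAR(2) NOT NULL, -- 2 bytes instead of an oversized VARCHAR(50)
SignupDate SMALLDATETIME NOT NULL -- 4 bytes instead of DATETIME's 8
)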

*****

If you need to store large strings of data, and they are less than 8,000 characters, use a VARCHAR data type instead of a TEXT data type. TEXT data types have extra overhead that drags down performance. [7.0, 2000, 2005] Updated 8-21-2005
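
For example (hypothetical table), a notes column that never exceeds 8,000 characters can be declared as an ordinary in-row column:

CREATE TABLE ProductNotes
(
ProductID INT NOT NULL,
Notes VARCHAR(8000) NULL -- stored in-row; TEXT adds off-row pointer overhead
)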

*****

If you have a database running on SQL Server 7.0 or 2000 that used to run under version 6.5, and because of the limited row size you had to split a column into two or more columns because the column width exceeded what SQL Server 6.5 was able to support, consider altering the table so that the multiple columns fit back into one column again. This of course assumes that your column width is 8,000 characters or less for ASCII data. This will reduce server overhead and boost performance. [6.5, 7.0, 2000] Updated 8-21-2005
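
Recombining might look like this sketch, assuming the split halves are hypothetical columns named Comment1 and Comment2:

ALTER TABLE Customers ADD FullComment VARCHAR(8000) NULL
GO
UPDATE Customers SET FullComment = ISNULL(Comment1, '') + ISNULL(Comment2, '')
GO
ALTER TABLE Customers DROP COLUMN Comment1, Comment2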

*****

Don't use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O and wasting space in your buffer cache. [7.0, 2000] Updated 8-21-2005
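
The doubled storage is easy to see with DATALENGTH:

SELECT DATALENGTH(CAST('hello' AS VARCHAR(10))) -- 5 bytes
SELECT DATALENGTH(CAST('hello' AS NVARCHAR(10))) -- 10 bytes, two per character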

*****

If the text data in a column varies greatly in length, use a VARCHAR data type instead of a CHAR data type. The space saved by using VARCHAR over CHAR for variable-length data can greatly reduce I/O reads and the cache memory used to hold data, improving overall SQL Server performance.

Another advantage of using VARCHAR over CHAR columns is that sorts performed on VARCHAR columns are generally faster than on CHAR columns. This is because the entire width of a CHAR column, trailing padding included, has to be sorted. [6.5, 7.0, 2000] Updated 8-21-2005
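
A quick illustration of the padding:

SELECT DATALENGTH(CAST('Bob' AS CHAR(50))) -- 50 bytes, padded with trailing spaces
SELECT DATALENGTH(CAST('Bob' AS VARCHAR(50))) -- 3 bytes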

*****

If a column's data does not vary widely in length, consider using a fixed-length CHAR field instead of a VARCHAR. While it may take up a little more space to store the data, processing fixed-length columns is faster in SQL Server than processing variable-length columns. [6.5, 7.0, 2000] Updated 8-21-2005

*****

Always choose the smallest data type that will hold the data you need to store in a column. For example, if all you are going to store in a column are the numbers 1 through 10, then the TINYINT data type is more appropriate than the INT data type. The same goes for CHAR and VARCHAR data types: don't specify more characters in character columns than you need. This allows you to store more rows in your data and index pages, reducing the amount of I/O needed to read them. It also reduces the amount of data moved from the server to the client, reducing network traffic and latency. And last of all, it reduces the amount of wasted space in your buffer cache. [6.5, 7.0, 2000] Updated 8-21-2005
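
For instance (hypothetical table), a 1-through-10 rating fits comfortably in a TINYINT:

CREATE TABLE SurveyAnswers
(
AnswerID INT NOT NULL,
Rating TINYINT NOT NULL -- 1 byte, holds 0 through 255; INT would spend 4 bytes
)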

*****

If you have a column that is designed to hold only numbers, use a numeric data type, such as INTEGER, instead of a VARCHAR or CHAR data type. Numeric data types generally require less space than a character data type to hold the same numeric value. This helps to reduce the size of the columns, and can boost performance when the column is searched (WHERE clause), joined to another column, or sorted. [6.5, 7.0, 2000] Updated 10-16-2005
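
A quick way to see the difference for the same value:

SELECT DATALENGTH(CAST(1234567890 AS INT)) -- 4 bytes
SELECT DATALENGTH(CAST('1234567890' AS CHAR(10))) -- 10 bytes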

*****

Don't use FLOAT or REAL data types for primary keys, as they add unnecessary overhead that hurts performance. Use one of the integer data types instead. [6.5, 7.0, 2000] Updated 5-15-2006
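
A common alternative is an integer surrogate key, sketched here with a hypothetical Orders table:

CREATE TABLE Orders
(
OrderID INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- exact, compact, and fast to compare
OrderDate DATETIME NOT NULL
)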

*****

When specifying data types during table creation, always specify NULL or NOT NULL for each column. If you don't, the column will default to NOT NULL when the ANSI NULL DEFAULT database option is turned off (the default), and to NULL when it is turned on.

For best performance, and to reduce bugs, columns should ideally be set to NOT NULL. For example, use of the IS NULL keywords in the WHERE clause makes that portion of the query non-sargable, which means that portion of the query cannot use an index. [6.5, 7.0, 2000] Updated 5-15-2006
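
Being explicit might look like this (hypothetical table):

CREATE TABLE Employees
(
EmployeeID INT NOT NULL,
LastName VARCHAR(30) NOT NULL,
MiddleName VARCHAR(30) NULL -- allow NULL only where it is genuinely needed
)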

*****

If you are using fixed length columns (CHAR, NCHAR) in your table, do your best to avoid storing NULLs in them. If you do, the entire amount of space dedicated to the column is still used up. For example, if you have a fixed length column of 255 characters and you place a NULL in it, 255 characters of storage are consumed anyway. This is a large waste of space that forces SQL Server to perform extra disk I/O to read data pages, and it also wastes space in the data cache buffer. Both of these reduce SQL Server performance.

Instead of using NULLs, use a coding scheme similar to this in your databases:

NA: Not applicable
NYN: Not yet known
TUN: Truly unknown
Such a scheme provides the benefits of using NULLs, but without the drawbacks.

If you really must use NULLs, use a variable length column instead of a fixed length column. (I doubt your reason for using NULLs is very good.) Variable length columns use only a very small amount of space to store a NULL. [7.0, 2000] Updated 5-15-2006
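
A minimal sketch of the coded approach, using a hypothetical Contacts table:

CREATE TABLE Contacts
(
ContactID INT NOT NULL,
RegionCode CHAR(3) NOT NULL DEFAULT 'NA' -- 'NA' = not applicable, instead of a padded NULL
)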

*****

If you use the CONVERT function to convert a value to a variable length datatype, such as VARCHAR, always specify the length of the target datatype. If you do not, SQL Server assumes a default length of 30. Ideally, you should specify the shortest length that accomplishes the required task. This helps to reduce memory use and conserve SQL Server resources. [6.5, 7.0, 2000] Updated 5-15-2006
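
For example:

SELECT CONVERT(VARCHAR(10), GETDATE(), 101) -- explicit length: exactly the 10 characters of mm/dd/yyyy
-- CONVERT(VARCHAR, GETDATE(), 101) would silently assume VARCHAR(30)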

*****

Generally, using computed columns in a table is not recommended because it does not follow the standard rules of normalization. But it is sometimes more efficient overall to use a computed column than to re-compute the same data repeatedly in queries. This is especially true if you run the same query over and over against your data, performing the same calculations each time. By performing the calculations in the table, you reduce the amount of work each query must do. You have to determine for yourself where the performance bottleneck is, and act accordingly: if the bottleneck is in INSERTs and UPDATEs, computed columns may not be a good idea, but if your SELECT statements are the bottleneck, computed columns may pay off. [6.5, 7.0, 2000] Updated 5-15-2006
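
Here is a sketch with a hypothetical order-detail table (computed columns require SQL Server 7.0 or later):

CREATE TABLE OrderDetails
(
OrderID INT NOT NULL,
Quantity INT NOT NULL,
UnitPrice MONEY NOT NULL,
LineTotal AS (Quantity * UnitPrice) -- defined once here, not repeated in every SELECT
)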

*****

Avoid using the new bigint data type unless you really need its additional storage capacity. The bigint data type uses 8 bytes of storage versus 4 bytes for the int data type. [2000] Updated 12-6-2005

*****

Avoid using the SQL Server 2000 sql_variant datatype. Besides being a performance hog, it significantly limits what you can do with data stored as a sql_variant. For example, sql_variant columns:

cannot be part of primary or foreign keys
can be used in indexes and unique keys only if they are shorter than 900 bytes
cannot have an identity property
cannot be part of a computed column
must have their data converted to another datatype when moved to objects with other datatypes
are automatically converted to nvarchar(4000) when accessed by client applications using the SQL Server 7.0 OLE DB or ODBC providers
are not supported by the LIKE predicate in the WHERE clause
cannot be concatenated
and don't work with some functions. [2000] Updated 12-6-2005

*****

Don't use the DATETIME data type as a primary key. From a performance perspective, it is more efficient to use a data type that uses less space. For example, the DATETIME datatype uses 8 bytes of space, while the INT datatype only takes up 4 bytes. The less space used, the smaller the table and index, and the less I/O overhead that is required to access the primary key. Updated 12-6-2005

*****

If you are creating a column that you know will be subject to many sorts, consider making the column integer-based and not character-based. This is because SQL Server can sort integer data much faster than character data. [6.5, 7.0, 2000] Updated 10-16-2005

*****

Take care when using Unicode data in your queries, as it can affect query performance. A classic problem arises when an application passes in Unicode literals while the column searched in the database table is non-Unicode. Of course, the mismatch may run the other way depending on your scenario.

Here is an example. The database column "orgname_name" has been indexed, and is of type varchar. The code below appears to perform OK (so we think), but because the Unicode variable forces an implicit conversion of the column, the best the optimizer can do is an index scan:

declare @myvar nvarchar(200)
set @myvar = N'Central West College of TAFE'
select * from Organisation_Name where orgname_name = @myvar

|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([corpsys].[dbo].[Organisation_Name]))
|--Index Scan(OBJECT:([corpsys].[dbo].[Organisation_Name].[Organisation_Name]),
WHERE:(Convert([Organisation_Name].[orgname_name])=[@myvar]))

Table 'Organisation_Name'.
Scan count 1,
logical reads 1145,
physical reads 0,
read-ahead reads 0.

If we change this around slightly, using a varchar variable instead (so no Unicode conversion is needed), we see this:

declare @myvar varchar(200)
set @myvar = 'Central West College of TAFE'
select * from Organisation_Name where orgname_name = @myvar

|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([corpsys].[dbo].[Organisation_Name]))
|--Index Seek(OBJECT:([corpsys].[dbo].[Organisation_Name].[Organisation_Name_nameix]),
SEEK:([Organisation_Name].[orgname_name]=[@myvar]) ORDERED FORWARD)

Here we see an index seek instead, with a massive performance improvement (9 logical reads versus 1,145):

Table 'Organisation_Name'.
Scan count 1,
logical reads 9,
physical reads 0,
read-ahead reads 0.
