I/O Requests Taking Longer Than 15 Seconds To Complete
This article describes the warning "SQL Server has encountered n occurrence(s) of I/O requests taking longer than 15 seconds to complete on file <filename> in database <dbname>".
In the majority of cases this warning indicates that an I/O operation has taken too long. This article looks at the possible reasons for this and describes what can be done to reduce the likelihood of it recurring.
There is another possible reason that is not I/O related and this is briefly referred to at the end of the article.
Let's start by examining the reason behind it. I/O requests from SQL Server are handled asynchronously by the operating system. This means that when a read or write request is made, the thread making it does not hold the CPU while it waits for the I/O to complete.
This frees the CPU for use by another thread. Ideally an I/O should take a few milliseconds, but it can take considerably longer. The warning is reported when the time between requesting an I/O and its completion is greater than 15 seconds.
The possible reasons for this are: I/O subsystem problems or misconfiguration, excessive I/O being requested by SQL Server, data files not optimally placed on the disk, and fragmentation.
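To judge how far a file is from the "few milliseconds" ideal, average latency per I/O can be derived from cumulative counters sampled at two points in time. The following is a minimal sketch, assuming the counters come from something like the sys.dm_io_virtual_file_stats view (num_of_reads, io_stall_read_ms, num_of_writes, io_stall_write_ms). The class name, function name and sample figures are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileIoSnapshot:
    """Cumulative I/O counters for one database file at one point in time."""
    num_of_reads: int
    io_stall_read_ms: int
    num_of_writes: int
    io_stall_write_ms: int


def average_latency_ms(earlier: FileIoSnapshot, later: FileIoSnapshot) -> dict:
    """Return average read and write latency (ms per I/O) between two snapshots."""

    def per_io(stall_delta: int, count_delta: int) -> float:
        # No I/O in the interval means there is no latency to report.
        return stall_delta / count_delta if count_delta > 0 else 0.0

    return {
        "read_ms": per_io(later.io_stall_read_ms - earlier.io_stall_read_ms,
                          later.num_of_reads - earlier.num_of_reads),
        "write_ms": per_io(later.io_stall_write_ms - earlier.io_stall_write_ms,
                           later.num_of_writes - earlier.num_of_writes),
    }


# Hypothetical sample values, not real measurements.
before = FileIoSnapshot(1_000, 5_000, 400, 2_000)
after = FileIoSnapshot(1_500, 30_000, 600, 3_000)
latency = average_latency_ms(before, after)
print(latency)
```

Working from the difference between two snapshots, rather than the raw totals, keeps a long quiet period since startup from hiding a recent slowdown.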
I/O Subsystem
The first thing to do is to examine the I/O subsystem. Hardware errors are a common cause and you should run diagnostics if you are at all suspicious that this might be the case.
Two common reasons for poor I/O throughput on a SAN are out-of-date firmware and insufficient queue depth on the HBA. Be aware that if you upgrade SAN firmware you often need to upgrade the HBA drivers at the same time, or the server may fail to access the SAN altogether. Contact the SAN vendor for confirmation and details.
Another common cause of poor I/O performance is an installed file system filter driver. A filter driver intercepts requests before they reach the file system and performs additional processing such as anti-virus checking or encryption. It goes without saying that this can only have a negative impact on I/O performance.
If you have to have anti-virus software installed on your SQL Server, ensure that mdf, ndf and ldf files are added to the exclusion list. Even better is for real-time virus checking to be disabled completely; schedule a regular scan during quiet times instead.
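As a quick sanity check on the exclusion list, the database file paths can be compared against the extensions the anti-virus product has been told to skip. This is a minimal sketch. It assumes you have already gathered both lists by some other means, and the function name and sample paths are hypothetical.

```python
from pathlib import PureWindowsPath


def unprotected_files(database_files, excluded_extensions):
    """Return the database files whose extension is not on the exclusion list."""
    # Compare case-insensitively, since Windows file names are not case-sensitive.
    excluded = {ext.lower() for ext in excluded_extensions}
    return [
        path for path in database_files
        if PureWindowsPath(path).suffix.lower() not in excluded
    ]


# Hypothetical example: the ndf extension was forgotten.
files = [r"D:\Data\sales.mdf", r"D:\Data\sales_2.ndf", r"E:\Logs\sales.LDF"]
still_scanned = unprotected_files(files, {".mdf", ".ldf"})
print(still_scanned)
```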
I/O Load
It is often the case that there are I/O issues at night, when batch jobs are running, but the server performs well during the day with no warnings in the logs.
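One way to confirm a pattern like this is to total the warnings by hour of day. The sketch below assumes each error log line starts with a timestamp and a session identifier, and that the file and database names appear in square brackets. The exact layout varies between SQL Server versions, so treat the regular expression and the sample lines as illustrative and adjust them to match your own log.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed log layout: "<date> <time>.<fraction> <spid> <message>".
WARNING_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\S*\s+\S+\s+"
    r"SQL Server has encountered (?P<count>\d+) occurrence\(s\) of I/O requests "
    r"taking longer than 15 seconds to complete on file \[(?P<file>[^\]]+)\] "
    r"in database \[(?P<db>[^\]]+)\]"
)


def occurrences_by_hour(lines):
    """Sum the reported occurrence counts for each hour of the day (0-23)."""
    per_hour = Counter()
    for line in lines:
        match = WARNING_RE.match(line)
        if match is None:
            continue  # Not a long-I/O warning; ignore it.
        hour = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S").hour
        per_hour[hour] += int(match.group("count"))
    return per_hour


# Hypothetical log lines for illustration only.
sample_log = [
    r"2024-03-01 02:14:07.45 spid5s      SQL Server has encountered 3 occurrence(s) "
    r"of I/O requests taking longer than 15 seconds to complete on file "
    r"[D:\Data\sales.mdf] in database [Sales] (7).",
    r"2024-03-01 02:51:30.10 spid5s      SQL Server has encountered 1 occurrence(s) "
    r"of I/O requests taking longer than 15 seconds to complete on file "
    r"[D:\Data\sales.mdf] in database [Sales] (7).",
    r"2024-03-01 14:05:00.00 spid52      Some unrelated message.",
    r"2024-03-01 23:40:12.93 spid5s      SQL Server has encountered 2 occurrence(s) "
    r"of I/O requests taking longer than 15 seconds to complete on file "
    r"[E:\Logs\sales.ldf] in database [Sales] (7).",
]
hourly = occurrences_by_hour(sample_log)
print(sorted(hourly.items()))
```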
You may find that there are several jobs running at the same time, either through poor scheduling or because one or more jobs have overrun. Provided there is sufficient free time you could reschedule one or more jobs and monitor for a couple of days to see if it has made a difference.
This is more complicated in environments where a SAN is used by several servers, as it may be jobs running on non-SQL servers overloading the SAN and causing your problems. In this case you need to discuss the problem with the system administrators responsible for these servers and agree a schedule that is mutually acceptable.
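Before rescheduling anything, it helps to see which jobs actually run at the same time. The sketch below assumes you have each job's name, start time and end time, for example from job history or from the other teams' schedules. The function name and the sample jobs are hypothetical.

```python
from datetime import datetime
from itertools import combinations


def overlapping_jobs(runs):
    """Return pairs of job names whose (start, end) windows overlap.

    runs is a list of (name, start, end) tuples.
    """
    overlaps = []
    for (name_a, start_a, end_a), (name_b, start_b, end_b) in combinations(runs, 2):
        # Two windows overlap when each starts before the other ends.
        if start_a < end_b and start_b < end_a:
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical nightly schedule.
night = [
    ("full_backup", datetime(2024, 3, 1, 1, 0), datetime(2024, 3, 1, 2, 30)),
    ("index_rebuild", datetime(2024, 3, 1, 2, 0), datetime(2024, 3, 1, 3, 0)),
    ("etl_load", datetime(2024, 3, 1, 3, 0), datetime(2024, 3, 1, 4, 0)),
]
clashes = overlapping_jobs(night)
print(clashes)
```

Jobs that merely touch, with one ending exactly when the next begins, are not reported as overlapping.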
Fragmentation
Fragmentation may be internal (within tables/indexes) or external (file fragmentation on the disk).
Internal fragmentation is only a problem where in-order scans are being performed, and only on large tables. The warning message that is the subject of this article is unlikely to occur in this situation. Best practice is to monitor logical fragmentation and regularly rebuild or reorganize affected indexes.
Disk fragmentation can also be reduced through best practice, such as presizing data files (capacity planning) and avoiding both autoshrink and manual shrinking of database files.
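The choice between rebuilding and reorganizing is usually driven by thresholds on the measured fragmentation. The sketch below uses commonly quoted rules of thumb: ignore indexes under about 1,000 pages, reorganize between 5% and 30%, and rebuild above 30%. These figures are assumptions, not values taken from this article, and should be tuned for your own workload. The function name is hypothetical.

```python
def maintenance_action(fragmentation_pct, page_count,
                       min_pages=1000, reorganize_at=5.0, rebuild_at=30.0):
    """Suggest 'none', 'reorganize' or 'rebuild' for a single index."""
    # Small or lightly fragmented indexes are not worth the maintenance I/O.
    if page_count < min_pages or fragmentation_pct < reorganize_at:
        return "none"
    if fragmentation_pct < rebuild_at:
        return "reorganize"
    return "rebuild"


print(maintenance_action(12.0, 50_000))
print(maintenance_action(45.0, 50_000))
print(maintenance_action(80.0, 200))
```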
If you suspect disk fragmentation may be causing performance problems, I can recommend Diskeeper. This runs continuously in the background using idle CPU time only, and is able to defragment SQL Server files while they are in use. Your mileage may vary, but I have seen staggering improvements from running this tool.
I/O Configuration
Other best practice includes splitting tempdb into several files, putting data and log files on separate disks, placing clustered and non-clustered indexes in separate filegroups (on different disks), keeping statistics up-to-date, etc. All these will improve performance and some are described in SQL Server Configuration.
One Last Possibility
In a minority of cases there may not be an issue with I/O at all, but a problem caused by CPU drift. This is described in this article.
I stress that this is rare, and you should always examine the possible causes of slow I/O described above.
Summary
This article has described some of the most common reasons for "SQL Server has encountered n occurrence(s) of I/O requests taking longer than 15 seconds to complete on file <filename> in database <dbname>".
In most cases this message is an accurate description of the problem, and reasons for slow I/O should be investigated.