Rebuild Very Large Primary Key Index

Question

I have a SQL database that is hosted on Azure. The problem is that the size is getting out of control, I can see up to 99% fragmentation in the Primary Key clustered indexes.

I’m able to rebuild all other indexes with online=on option and it won’t affect performance. The size of one of the PK Clustered indexes is greater than 200GB, and for this one a REBUILD...WITH (ONLINE=ON) causes locking.

We do have users from all timezones accessing the site so really, I’m unable to find a time where I can rebuild the index offline.

What is the best strategy to rebuild large indexes without having a downtime in the site?

I believe reorganize won’t help since fragmentation is 99%. The problem is that the table gets locked even with online. The main problem is that the index is greater than 200GB. The primary key is an integer.

asked 2017-09-02 by Techy

Answer

First, it’s important to consider whether fragmentation matters.

If your query is only doing single-row seeks, you might not notice fragmentation at all. On modern SANs, the SAN-level caching may make phyiscal IOs fast enough that fragmentation doesn’t matter. On SSD, the random IO pattern caused by scanned a fragmented index may actually result in better performance than non-fragmented data.

Often times, people notice that rebuiliding an index fixed a performance problem. Rebuilding an index also builds fresh statistics. It may be the case that the real fix is fresh statistics, not rebuilding the index. UPDATE STATISTICS...WITH FULLSCAN may be a cheaper, faster, less intrusive way to solve the same performance problem.

If you are not having problems caused by fragmentation, then you could be spending significant time & effort for no actual gain.

Second, there are two kinds of fragmentation:

Physical fragmentation. This is what most people think of when they think of fragmentation. Pages are out of order, and need to be re-ordered. When scanning an index this type of fragmentation can sometimes be a problem. I’ve generally noticed this has the largest impact on performance with physical reads. If you are looking at the results from sys.dm_db_index_physical_stats, this number is the avg_fragmentation_in_percent column.
Low-density fragmentation. This fragmentation is caused by pages that are only partially filled with data. You have low density of data because your data is spread across more pages than necessary. As a result, reading the data requires more IOs because the data is spread across more pages than necessary. This can affect both logical and physical reads. If you are looking at the results from sys.dm_db_index_physical_stats, this number is the avg_page_space_used_in_percent column. This column is only populated when using SAMPLED or DETAILED mode.

So what do you do about it:

Physical Fragmentation: If you are simply chasing high numbers for avg_fragmentation_in_percent, really consider if you’re wasting your time. Make sure that you have an actual query that is performing poorly, and use a test environment to confirm that you’re fixing a problem by eliminating fragmentation.

You can address physical fragmentation by doing ALTER INDEX...REORGANIZE. The REORGANIZE operation is online, moving pages one at a time to reorganize them back into physical order. If you kill a REORGANIZE statement part way through, any work that was already performed is maintained–only the one page currently being moved will be rolled back. Doing a REORGANIZE on a large table that is highly fragmented can require more total transaction log space, and in full recovery mode may generate a significant amount of transaction log backups. It may also take longer to REORGANIZE a highly fragmented index than to REBUILD it.

You will often see advice to perform a REBUILD on highly-fragmented indexes, rather than a REORGANIZE — This is because rebuilding from scratch can be more efficient. However, reorganizing can be a "more online" operation and is sometimes preferred, even for highly fragmented indexes.

Low-density fragmentation cannot be fixed by REORGANIZE. It can only be fixed by doing an ALTER INDEX...REBUILD. By doing the index with ONLINE=ON, you should be able to minimize blocking. However, the REBUILD still needs to take a lock for a moment to swap the old index for the new index. On a very busy system, attaining this exclusive lock can sometimes be a problem. You should be able to confirm if you are having this issue by using something like sp_whoisactive to examine blocking during your rebuild, and looking at the details of the locks & waits. Using the WAIT_AT_LOW_PRIORITY option may be useful if you know that there is an upcoming period of low utilization, and your rebuild can "sneak in" for this swap when activity drops low enough to attain that lock. Note that a long-running REBUILD operation is also going to be a long-running open transaction. Long-running open transactions can have their own problems, related to transaction log use/reuse. If you are using mirroring or Availability Groups, there are also considerations for transaction log redo on the secondary replica.

answered 2017-09-05 by Andy Mallon