SQL Server supports two kinds of data compression on rowstore data: row compression and page compression. Columnstore compression is for a different day. You can compress heaps, clustered indexes (on both tables and indexed views), and nonclustered indexes (on both tables and indexed views).
Data compression is available in SQL Server 2008 and newer. It is an Enterprise Edition feature–if you're using Standard Edition or Express, you're out of luck. (Starting with SQL Server 2016 SP1, data compression is available in every edition.) There are benefits and costs to using data compression–as with anything, make sure you understand it before you implement it.
How row compression saves space
You can think of row compression as working by treating certain fixed-length data types as variable-length data types. By removing certain metadata, NULL and 0 values, and the padding of fixed-length values, SQL Server can reduce the total size of a row.
The easiest way to think of it is that char(n) no longer takes n bytes for every row, but instead gets treated more like varchar(n) where the storage used varies for each value. The behavior for each data type varies, with some data types getting more or less (or no) savings compared to others.
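You can see that char-to-varchar-like behavior in miniature. This is a hedged sketch using a throwaway table of my own (dbo.CharDemo is not part of any demo database): load a char(100) column with short values, check the average record size, then rebuild with row compression and check again–the fixed 100 bytes should shrink to roughly the length of the actual value.

```sql
-- Hypothetical demo table; the names here are mine, for illustration only.
CREATE TABLE dbo.CharDemo (
    ID INT IDENTITY(1,1) NOT NULL,
    FixedWidth CHAR(100) NULL, -- always stores 100 bytes uncompressed
    CONSTRAINT PK_CharDemo PRIMARY KEY CLUSTERED (ID)
);

INSERT dbo.CharDemo (FixedWidth)
SELECT TOP (10000) 'Ohio' -- 4 characters, padded out to 100
FROM sys.all_columns;

-- Note avg_record_size_in_bytes, rebuild with row compression, then run
-- this query again and compare.
SELECT avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(db_id(), object_id('dbo.CharDemo'),
        NULL, NULL, 'Detailed')
WHERE index_level = 0;

ALTER TABLE dbo.CharDemo REBUILD WITH (DATA_COMPRESSION = ROW);
```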
As with most things, Books Online has some good information. Check out the How Row Compression Affects Storage section of the Row Compression Implementation article, which has a blow-by-blow of how each data type is affected by row compression.
If you’re creating a new index, or rebuilding an existing one, you just tack on WITH (DATA_COMPRESSION=ROW) to your statement. If you’re creating a new table and declaring your Primary Key inline, you use the same WITH statement on that line of your CREATE TABLE. If you’re using 2014 and creating indexes inline with your CREATE TABLE (I love this syntax!), it’s still the same WITH statement.
In general, any time you are building (or rebuilding) a B-tree or heap, you can tack on this syntax.
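To make that concrete, here are the three common shapes that WITH clause takes (the object and column names below are placeholders, not from a real database):

```sql
-- On a brand-new nonclustered index:
CREATE NONCLUSTERED INDEX IX_MyTable_MyColumn
    ON dbo.MyTable (MyColumn)
    WITH (DATA_COMPRESSION = ROW);

-- Rebuilding an existing index:
ALTER INDEX IX_MyTable_MyColumn ON dbo.MyTable
    REBUILD WITH (DATA_COMPRESSION = ROW);

-- Rebuilding a heap, or a table's clustered index:
ALTER TABLE dbo.MyTable REBUILD WITH (DATA_COMPRESSION = ROW);
```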
Here’s an example CREATE TABLE that uses row compression for the clustered index (also the PK), as well as two nonclustered indexes:
CREATE TABLE dbo.TestROWCompression (
    ID BIGINT IDENTITY(1,1) NOT NULL,
    StateName CHAR(100),
    MedianIncome MONEY,
    IncomeYear SMALLINT,
    CapitalCity CHAR(100),
    LargestCity CHAR(100), --NULL if same as Capital
    StatehoodDate DATETIME2(0),
    CONSTRAINT PK_TestROWCompression PRIMARY KEY CLUSTERED (ID)
        WITH (DATA_COMPRESSION=ROW),
    INDEX IX_TestROWCompression_StateName NONCLUSTERED (StateName)
        WITH (DATA_COMPRESSION=ROW),
    INDEX IX_TestROWCompression_Statehood NONCLUSTERED (StatehoodDate)
        WITH (DATA_COMPRESSION=ROW)
);
I can read Books Online all day, but I only ever REALLY learn something if I see it in action. So let's look at data compression in action.
I’m going to use AdventureWorks2014 for this, but you could do the same on any demo database. I know it’s a crummy database, but it’s readily available.
For the sake of demonstration, let’s apply row compression to EVERY heap and index in AdventureWorks. I’ve seen this as a common technique in the real world, so let’s see how it works. This SQL will generate all the ALTER…REBUILD statements for every table or indexed view in the database. Don’t run the whole thing all at once though. Keep reading.
SELECT 'ALTER TABLE [' + s.name + '].[' + t.name
        + '] REBUILD WITH (DATA_COMPRESSION=ROW);'
FROM sys.objects t
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.indexes i ON i.object_id = t.object_id
WHERE i.type = 0 --Just heaps
AND t.type IN ('U','V')
ORDER BY s.name, t.name;
--
SELECT 'ALTER INDEX [' + i.name + '] ON [' + s.name + '].[' + t.name
        + '] REBUILD WITH (DATA_COMPRESSION=ROW);'
FROM sys.objects t
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.indexes i ON i.object_id = t.object_id
WHERE i.type NOT IN (0,3) --Exclude heaps & XML indexes
AND t.type IN ('U','V')
ORDER BY s.name, t.name;
Now that you have a script that will compress everything, step through it one table at a time. Look at the size of the table before and after you compress it. Take note that some indexes compress better than others. If you want an easy way to look at index size, you can do something like this:
USE AdventureWorks2014;
--HumanResources.JobCandidate
SELECT DbName             = db_name(database_id)
      ,TableName          = object_name(object_id, database_id)
      ,IndexID            = index_id
      ,IndexType          = index_type_desc
      ,AllocUnitType      = alloc_unit_type_desc
      ,IndexSizeKB        = page_count * 8
      ,AvgRecordSizeBytes = avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(db_id('AdventureWorks2014'),
        object_id('HumanResources.JobCandidate'), NULL, NULL, 'Detailed')
        -- Have to use Detailed mode to get compression info
WHERE index_level = 0 --Only the leaf level of the index
ORDER BY compressed_page_count DESC, page_count DESC;
Chances are, you’ve noticed that pretty much every index gets smaller–sometimes significantly smaller. Row compression works! But take a look at this index:
ALTER INDEX [AK_SalesOrderDetail_rowguid] ON [Sales].[SalesOrderDetail] REBUILD WITH (DATA_COMPRESSION=ROW);
Check the size before & after you compress this index, and you’ll see that the index gets BIGGER. Uncompressed, the index is 3608KB, and after compression it’s 3936KB. What’s up with that?
Well, the index is on a uniqueidentifier column–it’s a fixed width column that cannot be compressed. The clustering keys (which are always included in non-clustered indexes) are integers, so some of those values are compressible, but we don’t save a lot of space there. The overhead associated with row compression adds a few bytes of space, but compression doesn’t save enough for there to be net savings.
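If you want to double-check that the rebuilt index really is row-compressed (and that the growth isn't because the setting silently didn't take), sys.partitions exposes the compression state of every partition:

```sql
-- Confirm the compression setting actually applied to the index
SELECT i.name, p.partition_number, p.data_compression_desc
FROM sys.partitions p
JOIN sys.indexes i
    ON i.object_id = p.object_id
    AND i.index_id = p.index_id
WHERE p.object_id = object_id('Sales.SalesOrderDetail')
  AND i.name = 'AK_SalesOrderDetail_rowguid';
```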
In most cases, you’ll find that row compression is able to save some space–but not always. In some cases, applying row compression will make your index bigger. You need to know your data, know your schema, and do testing. Watch for those scenarios where row compression is more costly than no compression.
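One way to do that testing without rebuilding anything: SQL Server ships sp_estimate_data_compression_savings, which samples the object into tempdb and estimates the compressed size. Run against Sales.SalesOrderDetail, it should flag the GUID index before you pay for a rebuild:

```sql
-- Estimate row-compression savings without actually rebuilding.
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = 'Sales',
    @object_name      = 'SalesOrderDetail',
    @index_id         = NULL,  -- NULL = estimate every index on the table
    @partition_number = NULL,
    @data_compression = 'ROW';
```

Compare the current and estimated size columns per index; any index where the estimate comes back larger is one to leave uncompressed.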