Nonclustered index storage on clustered columnstore

clip art of 
 a double-quote character

Question

In SQL Server, a non-unique nonclustered index on a rowstore table incorporates the base object’s bookmark (RID or clustering key) at all levels of the nonclustered index structure. The bookmark is stored as part of the nonclustered index key at all index levels.

On the other hand, if the nonclustered index is unique, the bookmark is present only at the leaf level of the index – not as part of the key (the bookmark is present as one or more included columns, in effect).

In SQL Server 2016, it is possible to build a nonclustered b-tree index on a column-oriented table (one that has a clustered columnstore index).

  1. What is the ‘bookmark’ used for a nonclustered b-tree index on a clustered columnstore table?
  2. Do the differences between unique and non-unique nonclustered indexes described above still apply?

asked 2017-01-06 by Paul White


Answer

  1. The “bookmark” is the columnstore index original locator (per “Pro SQL Server Internals” by Dmitri Korotkevitch). This is an 8-byte value, with the columnstore index’s row_group_id in the first 4-bytes and an offset in the second 4-bytes.
  2. If you use DBCC PAGE to look at the non-clustered index, the 8-byte columnstore index original locator appears in the “uniquifier” column of the DBCC PAGE output. This shows that a unique non-clustered index does not need to include the columnstore row locator, whereas a non-unique non-clustered index does.

The following code creates a columnstore-organized table with a unique and non-unique b-tree nonclustered index on the same column:

CREATE TABLE dbo.Heapish
(
    c1 bigint NOT NULL,
    c2 bigint NOT NULL,
    INDEX CCI_dbo_Heapish CLUSTERED COLUMNSTORE
);
GO
INSERT dbo.Heapish WITH (TABLOCKX)
    (c1, c2)
SELECT TOP (1024 * 1024 * 8)
    c1 = ROW_NUMBER() OVER
        (ORDER BY C1.[object_id], C1.column_id),
    c2 = ROW_NUMBER() OVER
        (ORDER BY C1.[object_id], C1.column_id)
FROM master.sys.columns AS C1
CROSS JOIN master.sys.columns AS C2
ORDER BY
    c1
OPTION (MAXDOP 1);
GO
CREATE UNIQUE NONCLUSTERED INDEX UNIQUE_c2 ON dbo.Heapish (c2) WITH (MAXDOP = 1);
CREATE NONCLUSTERED INDEX NONUNIQUE_c2 ON dbo.Heapish (c2) WITH (MAXDOP = 1);

We can see the size of the index row at different levels of the b-tree using sys.dm_db_index_physical_stats:

SELECT
    DDIPS.index_level,
    DDIPS.page_count,
    DDIPS.record_count,
    DDIPS.min_record_size_in_bytes,
    DDIPS.max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats
(
    DB_ID(),
    OBJECT_ID(N'dbo.Heapish', N'U'),
    INDEXPROPERTY(OBJECT_ID(N'dbo.Heapish', N'U'), N'UNIQUE_c2', 'IndexID'),
    NULL, 'DETAILED'
) AS DDIPS;

SELECT
    DDIPS.index_level,
    DDIPS.page_count,
    DDIPS.record_count,
    DDIPS.min_record_size_in_bytes,
    DDIPS.max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats
(
    DB_ID(),
    OBJECT_ID(N'dbo.Heapish', N'U'),
    INDEXPROPERTY(OBJECT_ID(N'dbo.Heapish', N'U'), N'NONUNIQUE_c2', 'IndexID'),
    NULL, 'DETAILED'
) AS DDIPS;

The output is:

Both structures have the same row size at the leaf level, but the nonunique nonclustered index is 12 bytes larger than the unique nonclustered index at the non-leaf levels due to the 8-byte columnstore locator, plus 4 bytes of overhead for the first variable-length column in a row (uniquifier is variable length).

answered 2017-01-06 by Andy Mallon