When to partition table in SQL Server based on the amount of memory on server

clip art of 
 a double-quote character

Question

I understand that table partitioning is mostly done for data management. I understand that table maintenance becomes more difficult for large tables, since e.g. an index rebuild cannot fit in memory or because e.g. sorts scale with O(n*log(n)) and this causes extra problems with large tables.

Can someone give some examples how large a table should be, relative to the amount of memory on the server, to be a definite candidate for partitioning? Say the table is larger than the amount of RAM. Will it be bad practice not to partition the table?

I am looking for some principles I can base my analysis upon.

In the article linked to in the answer below it also mentions:

  • Long-running index maintenance jobs (or an inability to run them at all because they would take so long) reference.

So I guess my question can be formulated as: When will SQL Server start having problems with rowstore index rebuilds when they are not partitioned, given the hardware specifications?

I am not convinced that "it is not about the size of the table". When an index cannot be rebuilt then I would say partitioning is very much related to the size of the table.

asked 2021-06-25 by xhr489


Answer

There’s not really a size-based formula to start partitioning a table. Unfortunately, it’s a lot more complicated than that.

A tangent on performance vs data management

I would word it even more strongly than saying partitioning is "mostly done for data management." I would say that partitioning is a data management feature–full stop. Some folks will see performance benefits, but that’s really a side benefit in certain scenarios related to coding & index patterns that are forced to accomplish partition alignment. Those coding & indexing patterns on their own often bring similar benefit. But that’s a bit of a ranty tangent from your actual question.

The important thing is that partitioning is a data management feature, not a performance feature.

When to use partitioning?

Kendra Little wrote a great post titled How To Decide if You Should Use Partitioning and I suggest reading that. It’s a few years old, but still fully accurate and relevant.

The summary is that there are certain "stories" around data management (not size) that are good drivers for adopting partitioning. Further, any size considerations will usually be related to the size of a partition rather than the size of the overall data.

A few of the "textbook" reasons are:

  • There is a specific key (such as date windows) that are used to identify different segments (partitions) of your data.
  • When doing ETL/ELT, using a separate/partition as a "landing" spot, then SWITCH the partition into the main table.
  • When purging old data, partitioning by date so that old data can be purged by truncating/dropping old partitions, rather than using DELETEs.
  • Using different compression levels on "active" data compared to archive data.
  • Query patterns vary between different segments of data (such as active vs archive, or other partitioning key).

That’s just a quick list off the top of my head, but needless to say it’s more about how you manage it, rather than specifically about size. If you partition by an arbitrary key that isn’t related to your data management needs, you’re likely to gain a bunch of complexity without realizing any benefits.

answered 2021-06-25 by Andy Mallon