Always On Synchronous Commit Mode With Large Delays

clip art of 
 a double-quote character

Question

For SQL Server Enterprise’s Always On availability groups in Synchronous Commit mode, is there some amount of latency that can’t be exceeded? For example, can it work between very distant data centers, such as between the US and Europe?

asked 2020-10-18 by uncaged


Answer

The answer you’re looking for is in the docs

⚠ Note

If primary’s session-timeout period is exceeded by a secondary replica, the primary replica temporarily shifts into asynchronous-commit mode for that secondary replica. When the secondary replica reconnects with the primary replica, they resume synchronous-commit mode.

When this happens, a synchronous secondary will fall back to be asynchronous (what some folks call "pseudo synchronous" because you set it to synchronous, but it’s actually async). This is not specific to network latency, or physical distance, but can happen under a number of scenarios. As an example, if you restart a synchronous replica and that server is offline, your AG will fall back to asynch and continue to process transactions.

Synchronous AGs only pay a performance penalty on writing to the transaction log (which needs to write to the secondary replica). Write activity is often a minority of the database activity, with the majority of traffic being read operations, which are not affected by synch/async.

Running an AG in sync commit mode between US & EU is totally possible. Write transactions will be slower, as they need to write to the log on the other side of the ocean. Generally, a better design is to keep synch replicas closer to avoid the network latency. However, depending on the IO patterns of your database, your business needs, and your network backbone between geographies, it’s possible. On the systems I support, we can tolerate synchronous commit latency across hundreds of miles–but when the distance gets longer than that the performance impact becomes too great. It’s not that SQL Server can’t support it–it’s that the users will not tolerate that slowness.

Generally, I find most folks end up running synchronous commit within a data center/geography for high availability, and asynchronous commit between geographies/data centers for disaster recovery. For a planned failover across geographies, you can temporarily switch the remote replica to synchronous mode to ensure no data loss during the failover.

The easiest way to find out the actual impact on your application is to test it.

answered 2020-10-18 by Andy Mallon