log_reuse_wait_desc = AVAILABILITY_REPLICA, log is 100% full but AG is all fine
Question
That’s a weird scenario. The transaction log is full. It waits on AG. AG is all good. No latency, last commit/hardened time is almost real-time on all the nodes. The Estimated Recovery Time is zero. I checked this from the primary node and secondaries as well (in case if Dashboard is not updating).
This page lists only two reasons for this error, a delivery latency and redo latency. Nothing applies to my system. How to troubleshoot this further?
Edit 1. The only change we have done to the environment, we added two SQL Server 2019 nodes in preparation for migration. It shouldn’t follow to such problem though.
Edit 2. The only workaround I found is to re-add the DB to the AG.
asked 2021-11-08 by Alsin
Answer
You should check the DMVs to confirm that the AG is healthy with send & redo queues. The dashboard in SSMS does not do a great job at representing the true send & redo health.
As part of my open-source DBA database I have a Check_AGHealth
stored procedure that you can use to give more information.
Without using the full stored procedures, a couple of the important statements to run on primary would be these:
Check dm_hadr_database_replica_states
This query returns a bunch of data, but the interesting things you’ll want to look for are:
- Anything reporting that it is suspended
- Anything reporting it is unhealthy
- Large log send queue
- Large redo queue
The ORDER BY
clause on this should bubble those things up to the top of the result set, but any of those things would present a problem to clearing the log and would show a log reuse wait of AVAILABILITY_REPLICA
SELECT ServerName = ar.replica_server_name, AgName = ag.Name, DbName = db_name(ds.database_id), AgRole = rs.role_desc, rs.role , UnsentLogMB = ds.log_send_queue_size/1024.0, RedoQueueSizeMB = ds.redo_queue_size/1024.0, RedoEstCompletion = CASE WHEN redo_rate = 0 THEN 0 ELSE ds.redo_queue_size/ds.redo_rate END, SynchState = ds.synchronization_state_desc, AgHealth = ds.synchronization_health_desc, SuspendReason = ds.suspend_reason_desc, SynchHardenedLSN = ds.last_hardened_lsn, LastHardenedTime = ds.last_hardened_time, LastRedoneTime = ds.last_redone_time, LastCommitTime = ds.last_commit_time FROM sys.availability_groups AS ag JOIN sys.availability_replicas AS ar ON ar.group_id = ag.group_id JOIN sys.dm_hadr_database_replica_states AS ds ON ds.group_id = ar.group_id AND ds.replica_id = ar.replica_id JOIN sys.dm_hadr_availability_replica_states AS rs ON rs.replica_id = ds.replica_id ORDER BY CASE WHEN ds.suspend_reason_desc IS NOT NULL THEN 1 WHEN ds.synchronization_state_desc <> 'HEALTHY' THEN 1 ELSE 9 END, rs.role DESC, ds.log_send_queue_size + ds.redo_queue_size DESC ;
answered 2021-11-08 by Andy Mallon