Using SQL Agent MSX for Availability Groups


Availability Groups (AGs) are all the rage these days. Every SQL Saturday, every local Users Group, bloggers and DBAs everywhere are all talking about how cool they are. There’s a lot of talk about how AGs provide a compelling alternative to Failover Cluster Instances (FCIs). The cost (nothing is free, of course) is that with AGs, you have non-contained objects that you need to manually keep in sync between nodes.

A lot has been said and blogged about syncing non-contained objects. Here. Here. Here. Here. And plenty of other places.

Recently, I was talking to Mike Hillwig (blog|twitter) about SQL Server Agent Multi-Server Administration (known as MSX). I realized that there hasn’t been much written about leveraging the (built-in, free) MSX feature to sync your jobs for you. It’s what I implemented at my day job, and it’s worked flawlessly.

Mike has a great blog post about MSX that I recommend checking for an overview of the feature.

The basic approach

We want to deploy the job to EVERY server participating in the Availability Group, but only have it run on the Primary replica. (Or maybe you have a job that you only want to run on secondary?) We want to update the job in one place, and have it magically go everywhere it needs to go.

MSX can handle keeping the jobs in sync, but we need to use a little special sauce to have it only run on Primary. I wrote a scalar function (dbo.AGDbRole_Get) that returns either PRIMARY or SECONDARY, depending on the AG role.

Let’s do it

My function isn’t anything terribly fancy–it just looks at the DMVs and figures out if the DB is the primary replica or not. I create it in my DBA database, which is my superhero utility belt that lives on every server and has all my code in it.

Now, you create your job on the MSX and use the function as your special sauce. Wrap EVERY JOB STEP in an IF statement:

IF DBA.dbo.AGDbRole_Get('AM2') IN ('PRIMARY','ONLINE')
BEGIN
 PRINT 'Do Stuff';
END;

 
NOTE: If the DB is removed from the AG (say, for some sort of maintenance or in an emergency), you probably still want the job to run. That’s why I built my function to return the database state_desc for databases not in an AG, and then check that the DB is either PRIMARY (if it’s in an AG) or ONLINE (if it’s not in an AG).

Simply select every Availability Group replica as a target for the Agent job, and you’re in business. You have a job that stays in sync and only runs when the database is the primary replica (or if it is removed from the AG).

Your jobs will run at the scheduled time on every server, but the dbo.AGDbRole_Get function will ensure that it only does “stuff” on the primary replica. On secondary replicas, the job will run successfully, but complete in about a second.