Alerting on SQL Server Blocking

Andy

8 years ago

When blocking becomes excessive, it can cause problems. Are your blocks excessive?

Identifying blocking is one of the most basic things that a DBA should be monitoring for. I previously wrote about how to identify blocking in SQL Server and find the leading blocker. Recently, my friend Jim Donahoe (blog | twitter) asked me about setting up email alerts related to blocking, which prompted me to clean up some additional code I have that does just this. If you haven’t already read my previous post, you should probably do that now.

Blocking is OK. Excessive blocking is bad.

Think of blocking within the database like traffic lights or a stop sign controlling the flow of traffic on the roads. It’s normal to have to stop at an intersection, let some other folks go, then it’s your turn, and you’re off on your way. However, when the roads are busy, you might have to sit through several light cycles before you get through the intersection and are moving again. Or what happens when a bus doesn’t make it all the way through the intersection and blocks the cross traffic. Well, it’s that excessive blocking that DBAs need to identify and fix in the databases under our watch.

If you have monitoring software, that software probably has monitoring & alerting built in for blocking–but not everyone has every server monitored. For one, monitoring software isn’t free–and for lower-priority servers it might not be in the budget. As part of my home-grown DBA database, I have some stored procedures that alert so that I can handle alerting on the basics, even when full-blown monitoring isn’t in the cards.

What do you need in an alert?

I have a few basic requirements when I’m making an alert. They seem pretty straightforward, but I’m constantly bumping into alerts that don’t meet these requirements. In this context, an alert equals an email.

Only alert if there’s actually a problem that is actionable. Please don’t throw email alerts at me if there’s nothing for me to do. Don’t be the boy who cried wolf. You probably already have an email rule that ignores alerts you don’t care about. Don’t make more spam.
Only alert if the problem is critical. In addition to my #1 requirement that it be actionable, I should have to act now. Email alerts are a cry for help, and aren’t appropriate for every problem. For lower-priority problems, I prefer digest reports, where I can set aside time in my day to work on many lower-priority issues and take care of them all at once.
Give me all the info I need, but just the info I need. It’s 9 o’clock on a Saturday, and your database alerts start to roll in while you’re making love to your tonic and gin. It’s imperative that you’re able to triage problems directly from the alert, and ideally take action without having to do a bunch of investigation to figure out what to do.
Alert promptly. We’ve already established that we’re dealing with critical problems–if it’s critical, then I need to know now. In particular, I need to know before I find out from end users that they have a problem. Ideally, I can fix it before anyone else notices.
Don’t over-alert. How many email alerts do you need on the same problem? You probably don’t need to send an email every 30 seconds–but it can also be helpful to alert periodically to let you know a problem is ongoing.

It’s not a lot to ask for, but it does require we do a little bit of extra work.

Let’s build an alert

Identify actionable problems

We’ve already done the heavy lifting in dbo.Check_Blocking to identify excessive blocking. When we have excessive blocking, the action is probably to kill a session, or stop an application process, or tell a coworker to cease & desist.

Identify critical problems

You’ll still need to determine what the right thresholds are for the @BlockingDurationThreshold and @BlockedSessionThreshold–every environment and application might have different thresholds. How much blocking is normal in your environment? How much blocking is acceptable? Do nights & weekends have different critical thresholds than the business day? Are there maintenance windows during which you ignore problems?

I can’t answer any of those questions for you. You’ve got the two parameter knobs to turn, and we’re going to schedule a job to run the procedure, so you’ll need to determine your own job schedule. I frequently use two different schedules — one that runs during business hours, then another with more lax thresholds that runs nights & weekends.

Getting the right data points

We’ve already done the work with the dbo.Check_Blocking procedure to get the right data points, so we’re done, right? Not so fast. We’ve got more work to do.

A couple years ago, I wrote about the importance of sending pretty emails. We’re going to use my dbo.EmailCss() function from that post to help format the email. The CSS generated by that function will define styles for fonts & formatting for HTML tables. This helps make sure that we’re not only presenting the right info, but we’re also doing it in a way that makes it easy to digest. Every email alert that I send from SQL Server starts with this CSS before I even build the HTML email body.

In my dbo.Check_Blocking procedure, I used an XML field to SessionInfo for all of the sessions involved in the blocking chain. I’m going to make some modifications to that code, and this time, I’m going to generate HTML instead, including some formatting to make sure it’s all pretty:

There’s one more critical piece of info that we need to add to our alerts. What SQL Instance is sending these alerts? If it’s a Failover Cluster Instance, which node is the instance running on? Has it failed over or restarted recently? What’s the instance start time? Similar to the way I use a function to generate standard CSS, I use a function to generate a standard footer, dbo.EmailServerInfo_Get().

Notice that for this FCI, the Server & Instance are different?

Alert promptly

This is pretty easy. We’ll just schedule a job to run frequently. Depending on what you determined was your critical threshold, you might set this alert to run as frequently as every minute.

For most people, “scheduling a job” means setting up a SQL Agent job to run the procedure. If you’re using Azure SQL DB (which has neither DB Mail nor SQL Agent), or if you just love PowerShell, you might want to use PowerShell to centrally run this alert and send emails. (This actually isn’t a bad idea. Depending on the nature of your problem, DB Mail might not be able to send an email.) In this case, simply run the procedure with @Debug=2. With that parameter value, the procedure just returns a single-row, single-column result set that is an HTML blob. Just use that HTML blob as the body of your email and send it along from PowerShell.

Don’t over-alert

I just told you to schedule the job to run every minute. That means when there is blocking, you’re going to get an email every minute, right? WRONG. Each time dbo.Alert_Blocking sends an email, it logs that to a table(dbo.Monitor_Blocking). Each time it identifies blocking, it checks that table to see if it recently sent an email alert. If it recently sent an email, then it snoozes itself & won’t re-alert. The @EmailThreshold parameter controls how long to snooze alerts. Simply pass in how many minutes to snooze between emails. As a bonus, this table can be used as a time machine to look back at blocking problems over time.

Gah, you just threw a lot at me, Andy

OK, OK, OK… I threw a LOT of info at you in a relatively short post (plus those two other posts I referenced). So what do you actually need to do to set up this alert?

If you install my full DBA database, then you have everything you need. Instructions to install the whole thing are in the project’s README. If you want to install just the bits for this alert, there are a few objects you’ll need out of that project:

dbo.EmailCSS_Get() – This scalar function provides the CSS to ensure the email alerts are well-formatted and easy to read.
dbo.EmailServerInfo_Get() – This scalar function provides the HTML for the footer of the email with standard server info.
dbo.Monitor_Blocking – This is the table used to throttle emails & for historical purposes.
dbo.Alert_Blocking – This is the procedure that actually does the work, and the procedure you’ll schedule to send your alerts.

Now that it’s installed, let’s see the alert

Here’s a sample email, shown in the Outlook 2016 Desktop app, and Outlook for Android:

If you try using this let me know how you like it. If you find problems, let me know in the comments below, or by logging an issue on GitHub.

And remember: If a developer is causing blocking, kill the session, don’t kill the developer.