How to dig deeper into ASYNC_NETWORK_IO waits

clip art of 
 a double-quote character

Question

I’ve been handed a graph that comes from SolarWinds showing top wait events broken down by day. The last 2 days there’s a part of the daily color coded bar representing a total of 3 hours for ASYNC_NETWORK_IO. Still trying to see if I can get access to Solar Winds panels to try and see if it offers a drill down into the graph or not.

I’ve been searching through google, websites and documentation for the last 2 hours and finding little of consequence to tell me how to drill down into this 3 hours.

Is it one large block, is it a few minutes here and there through out the day, does it match any particular window of high I/O activity (data warehouse refresh or something).

All I have is "here, there’s a total of 3 hours of this wait yesterday and the day before, now tell me why" – and I must admit I really don’t have any idea on how to drill down into this any further.

I’ve read all sorts of articles about "Well, it’s usually a poorly designed application" or something along those lines.

Funny this is of the last 7 days, the first 5 are all clear. No signs of this wait. Suddenly there’s a huge chunk of it. I need to know how I can dig more.

As far as I’m aware there are no user complaints about system performance.

Is there a DMV or something to help out here ?

Anyone able to give me some pointers?

Thanks

asked 2020-08-15 by SJWales


Answer

ASYNC_NETWORK_IO is never a server side problem.

This wait indicates SQL Server has results to send to the client, but is waiting for the client to consume the results so it can finish sending.

It’s possible (though unlikely) that a slow network is the bottleneck. This almost never is the reason, unless someone is using dial up quality network connections. The fix here is a faster network connection. (But that’s not your problem.)

It’s also possible that the application is explicitly stopping it’s consumption of results due to bad design. This will happen if the application accepts the first 25 rows to display, then stops accepting results forcing the server to wait…. And only taking the next 25 rows when the user hits the "next page" button. In this case, you’d need to change the application to either consume the whole result set at once, then handle pagination just in the display, or to change the app to simply ask for one page of results and do a round trip for now 6 every time the user hits "next page".

Lastly, it could be a client machine that is underpowered, or a poorly performing application. SSMS is one of those poorly performing applications. If you do a SELECT * FROM dbo.HugeTrillionRowTable you’ll see ASYNC_NETWORK_IO waits pile up for your session. The right fix here depends on the details. You might need to add more memory to the client machine, or scale out to more client containers, or have a code change in the app (that’s the fix for SSMS, but it’s not a trivial change).

In your case, if you can get read only access to SolarWinds DPA, you will be able to drill into that wait to detrimine the query, client, etc to get into those details. If you can’t access DPA, you can monitor sys.dm_exec_session_wait_stats, though this will only show you info for currently running queries, so you’ll need to catch the offending code in the act. There isn’t a "time machine" DMV to go back after the fact.

answered 2020-08-15 by Andy Mallon