If you’re familiar with supporting the database behind Microsoft Dynamics CRM, you probably know that it’s not the fastest-performing database. Honestly, that shouldn’t be a surprise–it’s not designed to be a screaming-fast database. It’s designed to be a flexible database. Most Customer Relationship Management (CRM) systems are designed to be flexible so that they can meet the needs of many businesses in many industries with vastly different business requirements. They put those requirements ahead of database performance. That’s probably smart business, but I’m not a business person–I’m a database person. My experience with Dynamics CRM is when people come to me and say
Andy, the database is slow
One recent occurrence was with a report failing due to a 5-minute query timeout. With the proper indexes, we should be able to get a few hundred rows really fast. I got my hands on the query and some example parameters, dropped it into Plan Explorer, and ran it a few times in our Test environment (I’m doing all this in Test–that’s going to be important later). I wanted to make sure I was running it with a warm cache, so that I could use “the best of the worst” for my benchmark. The query was a big nasty SELECT
with a CTE, and a bunch of joins. Unfortunately, I can’t provide the exact query, since it had some customer-specific business logic (Sorry!).
Right off the bat, there’s a lot of bad going on here. 1.5 million reads is a heck of a lot of I/O. 457 seconds to return 200 rows is slow. The Cardinality Estimator expected 2 rows, instead of 200. And there were a lot of writes–since this query is only a SELECT
statement, this means we must be spilling to TempDb. Maybe I’ll get lucky, and be able to create an index to eliminate a table scan and speed this thing up. What’s the plan look like?
There will be no quick hits
Let me pause for a moment to explain something about Dynamics CRM. It uses views. It uses nested views. It uses nested views to enforce row-level security. In Dynamics parlance, these row-level-security-enforcing nested views are called “filtered views.” Every query from the application goes through these filtered views. The only “supported” way to perform data access is to use these filtered views.
Recall I said this query was referencing a bunch of tables? Well, it’s referencing a bunch of filtered views. So the complicated query I was handed is actually several layers more complicated. At this point, I got a fresh cup of coffee, and switched to a bigger monitor.
A great way to solve problems is to start at the start. I zoomed in on the SELECT operator, and followed the arrows to see what was going on:
Look at all those table-valued function calls! Followed immediately by a really expensive hash match. My Spidey Sense started to tingle. What is fn_GetMaxPrivilegeDepthMask
, and why is it being called 30 times? I bet this is a problem. When you see “Table-valued function” as an operator in a plan, that actually means it’s a multi-statement table-valued function. If it were an inline table-valued function, it would get incorporated into the larger plan, and not be a black box. Multi-statement table-valued functions are evil. Don’t use them. The Cardinality Estimator isn’t able to make accurate estimates. The Query Optimizer isn’t able to optimize them in the context of the larger query. From a performance perspective, they don’t scale.
Even though this TVF is an out-of-the-box piece of code from Dynamics CRM, my Spidey Sense tells me that it’s the problem. Forget this big nasty query with a big scary plan. Lets step into that function and see what’s going on:
create function [dbo].[fn_GetMaxPrivilegeDepthMask](@ObjectTypeCode int) returns @d table(PrivilegeDepthMask int) -- It is by design that we return a table with only one row and column as begin declare @UserId uniqueidentifier select @UserId = dbo.fn_FindUserGuid() declare @t table(depth int) -- from user roles insert into @t(depth) select --privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global) -- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global) -- do an AND with 0x0F ( =15) to get basic/local/deep/global max(rp.PrivilegeDepthMask % 0x0F) as PrivilegeDepthMask from PrivilegeBase priv join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId) join Role r on (rp.RoleId = r.ParentRootRoleId) join SystemUserRoles ur on (r.RoleId = ur.RoleId and ur.SystemUserId = @UserId) join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId) where potc.ObjectTypeCode = @ObjectTypeCode and priv.AccessRight & 0x01 = 1 -- from user's teams roles insert into @t(depth) select --privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global) -- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global) -- do an AND with 0x0F ( =15) to get basic/local/deep/global max(rp.PrivilegeDepthMask % 0x0F) as PrivilegeDepthMask from PrivilegeBase priv join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId) join Role r on (rp.RoleId = r.ParentRootRoleId) join TeamRoles tr on (r.RoleId = tr.RoleId) join SystemUserPrincipals sup on (sup.PrincipalId = tr.TeamId and sup.SystemUserId = @UserId) join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId) where potc.ObjectTypeCode = @ObjectTypeCode and priv.AccessRight & 0x01 = 1 insert into @d select max(depth) from @t return end GO
This function follows a classic pattern in multi-statement TVFs:
- Declare a variable that is used as a constant
- Insert into a table variable
- Return that table variable
There’s nothing fancy going on here. We could re-write these multiple statements as a single SELECT
statement. If we can write it as a single SELECT
statement, we can re-write this as an inline TVF.
Let’s do it
If it isn’t obvious, I’m about to re-write code provided by a software vendor. I’ve never met a software vendor that considers this to be “supported” behavior. If you change the out-of-the-box application code, you are on your own. Microsoft certainly considers this “unsupported” behavior for Dynamics. I’m going to do it anyway, since I’m using the test environment and I’m not playing around in production. Re-writing this function took just a couple minutes–so why not give it a try and see what happens? Here’s what my version of the function looks like:
create function [dbo].[fn_GetMaxPrivilegeDepthMask](@ObjectTypeCode int) returns table -- It is by design that we return a table with only one row and column as RETURN -- from user roles select PrivilegeDepthMask = max(PrivilegeDepthMask) from ( select --privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global) -- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global) -- do an AND with 0x0F ( =15) to get basic/local/deep/global max(rp.PrivilegeDepthMask % 0x0F) as PrivilegeDepthMask from PrivilegeBase priv join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId) join Role r on (rp.RoleId = r.ParentRootRoleId) join SystemUserRoles ur on (r.RoleId = ur.RoleId and ur.SystemUserId = dbo.fn_FindUserGuid()) join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId) where potc.ObjectTypeCode = @ObjectTypeCode and priv.AccessRight & 0x01 = 1 UNION ALL -- from user's teams roles select --privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global) -- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global) -- do an AND with 0x0F ( =15) to get basic/local/deep/global max(rp.PrivilegeDepthMask % 0x0F) as PrivilegeDepthMask from PrivilegeBase priv join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId) join Role r on (rp.RoleId = r.ParentRootRoleId) join TeamRoles tr on (r.RoleId = tr.RoleId) join SystemUserPrincipals sup on (sup.PrincipalId = tr.TeamId and sup.SystemUserId = dbo.fn_FindUserGuid()) join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId) where potc.ObjectTypeCode = @ObjectTypeCode and priv.AccessRight & 0x01 = 1 )x GO
I went back to my original test query, dumped the cache, and re-ran it a few times. Here’s the slowest run time, when using my version of the TVF:
It’s still not the most efficient query in the world, but it’s fast enough–I don’t need to make it any faster. Except… I had to modify Microsoft’s code to make it happen. That’s not ideal. Let’s take a look at the full plan with the new TVF:
It’s still a really gnarly plan, but if you look at the start, all those black box TVF calls are gone. The super-expensive hash match is gone. SQL Server gets right down to work without that big bottleneck of TVF calls (the work behind the TVF is now inline with the rest of the SELECT
):
Big picture impact
Where is this TVF actually used? Nearly every single filtered view in Dynamics CRM uses this function call. There are 246 filtered views and 206 of them reference this function. It is a critical function as part of the Dynamics row-level security implementation. Virtually every single query from the application to the databases calls this function at least once–usually a few times. This is a two-sided coin: on one hand, fixing this function will likely act as a turbo boost for the entire application; on the other hand, there’s no way for me to do regression tests for everything that touches this function.
Wait a second–if this function call is so core to our performance, and so core to Dynamics CRM, then it follows that everyone who uses Dynamics is hitting this performance bottleneck. We opened a case with Microsoft, and I called a few folks to get the ticket bumped along to the engineering team responsible for this code. With a little luck, this updated version of the function will make it into the box (and the cloud) in a future release of Dynamics CRM.
This isn’t the only multi-statement TVF in Dynamics CRM–I made the same type of change to fn_UserSharedAttributesAccess
for another performance issue. And there are more TVFs that I haven’t touched because they haven’t caused problems.
A lesson to everyone, even if you’re not using Dynamics
Repeat after me: MULTI-STATEMENT TABLE VALUED FUNCTIONS ARE EVIL!
Re-factor your code to avoid using multi-statement TVFs. If you are trying to tune code, and you see a multi-statement TVF, look at it critically. You can’t always change the code (or it may be a violation of your support contract if you do), but if you can change the code, do it. Tell your software vendor to stop using multi-statement TVFs. Make the world a better place by eliminating some of these nasty functions from your database.