Multi-Statement TVFs in Dynamics CRM

If you’re familiar with supporting the database behind Microsoft Dynamics CRM, you probably know that it’s not the fastest-performing database. Honestly, that shouldn’t be a surprise–it’s not designed to be a screaming-fast database. It’s designed to be a flexible database. Most Customer Relationship Management (CRM) systems are designed to be flexible so that they can meet the needs of many businesses in many industries with vastly different business requirements. They put those requirements ahead of database performance. That’s probably smart business, but I’m not a business person–I’m a database person. My experience with Dynamics CRM is when people come to me and say

Andy, the database is slow

One recent occurrence was with a report failing due to a 5-minute query timeout. With the proper indexes, we should be able to get a few hundred rows really fast. I got my hands on the query and some example parameters, dropped it into Plan Explorer, and ran it a few times in our Test environment (I’m doing all this in Test–that’s going to be important later). I wanted to make sure I was running it with a warm cache, so that I could use “the best of the worst” for my benchmark. The query was a big nasty SELECT with a CTE, and a bunch of joins. Unfortunately, I can’t provide the exact query, since it had some customer-specific business logic (Sorry!).

7 minutes, 37 seconds is as good as it gets.
7 minutes, 37 seconds is as good as it gets.

Right off the bat, there’s a lot of bad going on here. 1.5 million reads is a heck of a lot of I/O. 457 seconds to return 200 rows is slow. The Cardinality Estimator expected 2 rows, instead of 200. And there were a lot of writes–since this query is only a SELECT statement, this means we must be spilling to TempDb. Maybe I’ll get lucky, and be able to create an index to eliminate a table scan and speed this thing up. What’s the plan look like?

Looks like an apatosaurus, or maybe a giraffe.
Looks like an apatosaurus, or maybe a giraffe.

There will be no quick hits

Let me pause for a moment to explain something about Dynamics CRM. It uses views. It uses nested views. It uses nested views to enforce row-level security. In Dynamics parlance, these row-level-security-enforcing nested views are called “filtered views.” Every query from the application goes through these filtered views. The only “supported” way to perform data access is to use these filtered views.

Recall I said this query was referencing a bunch of tables? Well, it’s referencing a bunch of filtered views. So the complicated query I was handed is actually several layers more complicated. At this point, I got a fresh cup of coffee, and switched to a bigger monitor.

A great way to solve problems is to start at the start. I zoomed in on the SELECT operator, and followed the arrows to see what was going on:

Zooming In
Even on my 34″ ultra-wide monitor, I had to fiddle with the display settings for the plan to see this much. Plan Explorer can rotate plans 90 degrees to make “tall” plans fit on a wide monitor.

Look at all those table-valued function calls! Followed immediately by a really expensive hash match. My Spidey Sense started to tingle. What is fn_GetMaxPrivilegeDepthMask, and why is it being called 30 times? I bet this is a problem. When you see “Table-valued function” as an operator in a plan, that actually means it’s a multi-statement table-valued function. If it were an inline table-valued function, it would get incorporated into the larger plan, and not be a black box. Multi-statement table-valued functions are evil. Don’t use them. The Cardinality Estimator isn’t able to make accurate estimates. The Query Optimizer isn’t able to optimize them in the context of the larger query. From a performance perspective, they don’t scale.

Even though this TVF is an out-of-the-box piece of code from Dynamics CRM, my Spidey Sense tells me that it’s the problem. Forget this big nasty query with a big scary plan. Lets step into that function and see what’s going on:

create function [dbo].[fn_GetMaxPrivilegeDepthMask](@ObjectTypeCode int) 
returns @d table(PrivilegeDepthMask int)
-- It is by design that we return a table with only one row and column
as
begin
	declare @UserId uniqueidentifier
	select @UserId = dbo.fn_FindUserGuid()

	declare @t table(depth int)

	-- from user roles
	insert into @t(depth)	
	select
	--privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global) 
	-- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global)
	-- do an AND with 0x0F ( =15) to get basic/local/deep/global
		max(rp.PrivilegeDepthMask % 0x0F)
	   as PrivilegeDepthMask
	from 
		PrivilegeBase priv
		join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId)
		join Role r on (rp.RoleId = r.ParentRootRoleId)
		join SystemUserRoles ur on (r.RoleId = ur.RoleId and ur.SystemUserId = @UserId)
		join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId)
	where 
		potc.ObjectTypeCode = @ObjectTypeCode and 
		priv.AccessRight & 0x01 = 1

	-- from user's teams roles
	insert into @t(depth)	
	select
	--privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global) 
	-- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global)
	-- do an AND with 0x0F ( =15) to get basic/local/deep/global
		max(rp.PrivilegeDepthMask % 0x0F)
	   as PrivilegeDepthMask
	from 
		PrivilegeBase priv
        join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId)
        join Role r on (rp.RoleId = r.ParentRootRoleId)
        join TeamRoles tr on (r.RoleId = tr.RoleId)
        join SystemUserPrincipals sup on (sup.PrincipalId = tr.TeamId and sup.SystemUserId = @UserId)
        join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId)
	where 
		potc.ObjectTypeCode = @ObjectTypeCode and 
		priv.AccessRight & 0x01 = 1
	
	insert into @d select max(depth) from @t
	return	
end		
GO

This function follows a classic pattern in multi-statement TVFs:

  • Declare a variable that is used as a constant
  • Insert into a table variable
  • Return that table variable

There’s nothing fancy going on here. We could re-write these multiple statements as a single SELECT statement. If we can write it as a single SELECT statement, we can re-write this as an inline TVF.

Let’s do it

If it isn’t obvious, I’m about to re-write code provided by a software vendor. I’ve never met a software vendor that considers this to be “supported” behavior. If you change the out-of-the-box application code, you are on your own. Microsoft certainly considers this “unsupported” behavior for Dynamics. I’m going to do it anyway, since I’m using the test environment and I’m not playing around in production. Re-writing this function took just a couple minutes–so why not give it a try and see what happens? Here’s what my version of the function looks like:

create function [dbo].[fn_GetMaxPrivilegeDepthMask](@ObjectTypeCode int) 
returns table
-- It is by design that we return a table with only one row and column
as
RETURN
	-- from user roles
	select PrivilegeDepthMask = max(PrivilegeDepthMask) 
	    from	(
	    select
            --privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global) 
	    -- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global)
	    -- do an AND with 0x0F ( =15) to get basic/local/deep/global
		    max(rp.PrivilegeDepthMask % 0x0F)
	       as PrivilegeDepthMask
	    from 
		    PrivilegeBase priv
		    join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId)
		    join Role r on (rp.RoleId = r.ParentRootRoleId)
		    join SystemUserRoles ur on (r.RoleId = ur.RoleId and ur.SystemUserId = dbo.fn_FindUserGuid())
		    join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId)
	    where 
		    potc.ObjectTypeCode = @ObjectTypeCode and 
		    priv.AccessRight & 0x01 = 1
        UNION ALL	
	    -- from user's teams roles
	    select
            --privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global) 
	    -- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global)
	    -- do an AND with 0x0F ( =15) to get basic/local/deep/global
		    max(rp.PrivilegeDepthMask % 0x0F)
	       as PrivilegeDepthMask
	    from 
		    PrivilegeBase priv
            join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId)
            join Role r on (rp.RoleId = r.ParentRootRoleId)
            join TeamRoles tr on (r.RoleId = tr.RoleId)
            join SystemUserPrincipals sup on (sup.PrincipalId = tr.TeamId and sup.SystemUserId = dbo.fn_FindUserGuid())
            join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId)
	    where 
		    potc.ObjectTypeCode = @ObjectTypeCode and 
		    priv.AccessRight & 0x01 = 1
        )x
GO

I went back to my original test query, dumped the cache, and re-ran it a few times. Here’s the slowest run time, when using my version of the TVF:

That looks much better!

It’s still not the most efficient query in the world, but it’s fast enough–I don’t need to make it any faster. Except… I had to modify Microsoft’s code to make it happen. That’s not ideal. Let’s take a look at the full plan with the new TVF:

Goodbye apatosaurus, hello PEZ dispenser!

It’s still a really gnarly plan, but if you look at the start, all those black box TVF calls are gone. The super-expensive hash match is gone. SQL Server gets right down to work without that big bottleneck of TVF calls (the work behind the TVF is now inline with the rest of the SELECT):

Big picture impact

Where is this TVF actually used? Nearly every single filtered view in Dynamics CRM uses this function call. There are 246 filtered views and 206 of them reference this function. It is a critical function as part of the Dynamics row-level security implementation. Virtually every single query from the application to the databases calls this function at least once–usually a few times. This is a two-sided coin: on one hand, fixing this function will likely act as a turbo boost for the entire application; on the other hand, there’s no way for me to do regression tests for everything that touches this function.

Wait a second–if this function call is so core to our performance, and so core to Dynamics CRM, then it follows that everyone who uses Dynamics is hitting this performance bottleneck. We opened a case with Microsoft, and I called a few folks to get the ticket bumped along to the engineering team responsible for this code. With a little luck, this updated version of the function will make it into the box (and the cloud) in a future release of Dynamics CRM.

This isn’t the only multi-statement TVF in Dynamics CRM–I made the same type of change to fn_UserSharedAttributesAccess for another performance issue. And there are more TVFs that I haven’t touched because they haven’t caused problems.

A lesson to everyone, even if you’re not using Dynamics

Repeat after me: MULTI-STATEMENT TABLE VALUED FUNCTIONS ARE EVIL!

Re-factor your code to avoid using multi-statement TVFs. If you are trying to tune code, and you see a multi-statement TVF, look at it critically. You can’t always change the code (or it may be a violation of your support contract if you do), but if you can change the code, do it. Tell your software vendor to stop using multi-statement TVFs. Make the world a better place by eliminating some of these nasty functions from your database.