This morning, when I logged into Twitter, the tweet at the top of my timeline was from Shane O’Neill (blog|twitter). It was to his blog post, Shane, What’s wrong with DELETE EXISTS?. It’s not that unusual that Shane would talk to himself in the third person, but I was intrigued by the title.
Fittingly, his post was about an IM that greeted him in the morning, and he discusses how he struggles for the best way to explain the behavior of this code. In this code, @t1
ends up with zero rows–but Shane needs to explain why the values 4 & 5 don’t remain:
DECLARE @t1 table (id1 int); DECLARE @t2 table (id2 int); INSERT INTO @t1 VALUES (1), (2), (3), (4), (5); INSERT INTO @t2 VALUES (1), (2), (3); DELETE FROM @t1 WHERE EXISTS ( SELECT * FROM @t1 AS d1 JOIN @t2 AS d2 ON d1.id1 = d2.id2 ); SELECT * FROM @t1;
Shane knows damn well how it will behave, but how do you explain it to someone who doesn’t?
Why’s it do that? Ask my Dear Aunt Sally.
My apologies if what I’m about to say causes you sleeplessness tonight, but…Think back to Algebra class. Do you remember, “Please Excuse My Dear Aunt Sally” (PEMDAS)? No, not the beer your teacher was sneaking between classes. It’s the mnemonic to help you remember the order of operations in Algebra–Parentheses, Exponents, Division & Multiplication, Addition & Subtraction. If you told Mr Fantini, your Algebra teacher, “I’ll never use this in the real world,” then it’s time for you to eat some crow. Databases are just set-based algebra. I dropped out of school before doing too much with set theory, but here’s the important parts of the Set Theory To Algebra Dictionary: Parentheses are…well…parentheses. Joins are your basic Multiplication. Union operations are like Addition.
Now that we’ve got that down, lets ask Aunt Sally how to proceed.
Parentheses:
We need to start at the inner most parentheses, and evaluate as much as we can. Everything that can be evaluated must be evaluated:
( SELECT * FROM @t1 AS d1 JOIN @t2 AS d2 ON d1.id1 = d2.id2 );
This statement runs entirely on it’s own, and returns a set of values:
So now let’s back up, and simplify the original statement by with the now-fully-evaluated parentheses:
SELECT * FROM @t1 WHERE EXISTS ( SELECT * FROM (VALUES (1,1),(2,2),(3,3)) AS x(id1,id2) );
WHERE is the problem?
Still don’t see it? The problem is in the WHERE
clause. “Where exists <some values>” is true, but there’s no correlation to the outer query. This is painfully obvious to see when we alias every table:
DELETE d0 FROM @t1 AS d0 WHERE EXISTS ( SELECT * FROM @t1 AS d1 JOIN @t2 AS d2 ON d1.id1 = d2.id2 );
You’re deleting from d0, but there are no references to d0 in the WHERE
clause! That’s not what you wanted to do at all! Your deletion criteria is based on deleting some rows, and not others. You’re going to have to reference d0 in the WHERE
clause.
Check out that plan
We scan the two tables, join them together, spool those results (they’re just a bunch of values now!), use nested loops to join @t1
to the spool, create another spool, then do the delete. The trick here, is that you can see that there are too many @t1
references.
We can fix it! Correlate the subquery
We just have to delete out all references to d1
, and hook up the references for d0
directly to d2
:
DELETE d0 FROM @t1 AS d0 WHERE EXISTS ( SELECT * FROM @t2 AS d2 WHERE d0.id1 = d2.id2 );
That’s it! Using a correlated subquery allows SQL Server to connect the bits of the inner to the outer, and will delete just the three rows you expected.
Now look at the plan
Well that’s a lot easier to read. Here, you can see how the two tables are joined directly, and that join result is used to perform the DELETE
.
TL;DR
Me? Verbose? Shocker!
The easiest way to see the problem is to simply alias your tables. If I’d been in Shane’s shoes, I’d have probably replied directly to the developer by adding the d0
alias, and showing how the table being deleted from isn’t actually referenced in the WHERE
clause.
DELETE d0 FROM @t1 AS d0 WHERE EXISTS ( SELECT * FROM @t1 AS d1 JOIN @t2 AS d2 ON d1.id1 = d2.id2 );
This is a very good example of how people don’t understand the scoping rules in SQL. It’s almost like, but not quite the same as block structured languages. Nice job.