Exclude All Matching Rows
Question
I have a table that has a unique ID, name, email, and some other fields. I want to match this table against another table on email, but sometimes emails are shared. The unique ID is not shared among tables, but I will input the ID into the other table when I have identified them. Name is not unique.
I’d like to build two rounds of joins. The first round I would like to match on email and append the ID, and the second round would match on ID and update fields as necessary.
Is there a way to query into the table and grab only those that have unique emails, while ignoring those that share emails? I could then append the ID and re-query those that don’t have IDs and join on the name. Would that even work?
Would this be a good use of a partition? I don’t understand those well.
asked 2016-04-08 by Chance
Answer
Is there a way to query into the table and grab only those that have
unique emails, while ignoring those that share emails?
I can think of a couple ways to do this:
1) If I restate this, I might say “Get all of the email addresses that have only 1 entry in the table, and exclude email addresses having more than one entry in the table.
Use GROUP BY...HAVING
to identify the non-duplicated email addresses, grouping by the address, and literally looking for a COUNT
of 1 (or > 1):
--UNIQUE EMAILS SELECT EmailAddress FROM TableName GROUP BY EmailAddress HAVING COUNT(*) = 1; ---NON-UNIQUE EMAILS SELECT EmailAddress FROM TableName GROUP BY EmailAddress HAVING COUNT(*) > 1;
This would literally give you the list of email addresses, which you would then need to do stuff with.
2) If I restate your question another way, “Get all of the emails where there doesn’t exist another row with the same email address. Use ‘EXISTS’ and ‘NOT EXISTS’ and a correlated subquery to identify rows with or without duplicates.
--UNIQUE EMAILS SELECT t1.EmailId, t1.EmailAddress FROM TableName t1 WHERE NOT EXISTS (SELECT 1 FROM TableName t2 WHERE t1.EMailId <> t2.EmailId AND t1.EmailAddress = t2.EmailAddress); --NON-UNIQUE EMAILS SELECT t1.EmailId, t1.EmailAddress FROM TableName t1 WHERE EXISTS (SELECT 1 FROM TableName t2 WHERE t1.EMailId <> t2.EmailId AND t1.EmailAddress = t2.EmailAddress);
Personally, I’d go with the second approach, as I think the logic is easier to follow, and I suspect it would perform better (though, that’s just a guess).
answered 2016-04-08 by Andy Mallon