How to implement a unique key for a long VARCHAR column in MySQL
Question
I need MySQL to enforce uniqueness on a VARCHAR column that has a maximum length of 5000 and is not the primary key. I can put an index of length 500 on the column to speed up queries, but how do I create a unique key that works?
A unique key of length 500 rejects an insert when two records have the same first 500 characters in that column and differ only somewhere after that.
asked 2019-10-27 by jw_
Answer
Checking directly for uniqueness of a 5000-character string is going to be pretty expensive, no matter how you implement it.
I’d suggest computing a hash, storing that hash, and computing uniqueness on that hash. This is almost the same thing, but does introduce the opportunity for a hash collision, where two different values result in the same hash.
The likelihood of a collision varies with the hashing algorithm: with MD5 (128 bits), any two different values have about a 1 in 2^128 chance of colliding. SHA1 (160 bits) and SHA256 (256 bits) each provide better protection against collisions, but are slower/more expensive.
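The per-pair figure can be extended to a whole table with the usual birthday approximation. This is a rough sketch that assumes the hash behaves like a uniformly random b-bit value and that the inputs are not chosen adversarially; the row count n of one billion is only an illustrative assumption.

```latex
P(\text{collision}) \approx \frac{n(n-1)}{2 \cdot 2^{b}} \approx \frac{n^{2}}{2^{b+1}},
\qquad
n = 10^{9},\ b = 128:\quad
\frac{10^{18}}{2^{129}} \approx 1.5 \times 10^{-21}
```

Under those assumptions, an accidental collision somewhere in a billion-row table is still vanishingly unlikely.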
For example: You could create a computed (generated) column with the MD5() function, then create a unique index on that column. This won’t guarantee perfect uniqueness, but is likely unique enough for most purposes where you have a 5k chunk of text that you want to keep unique.
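A minimal sketch of that approach, assuming MySQL 5.7 or later with InnoDB. The table name documents, the column names body and body_md5, and the index name are hypothetical; the column is declared STORED here, which is one possible choice rather than a requirement.

```sql
-- Hypothetical table and column names; assumes MySQL 5.7+ with InnoDB.
CREATE TABLE documents (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    body     VARCHAR(5000) NOT NULL,
    -- MD5() returns a 32-character hex string; UNHEX() packs it into 16 bytes.
    body_md5 BINARY(16) AS (UNHEX(MD5(body))) STORED,
    UNIQUE KEY uq_documents_body_md5 (body_md5)
) ENGINE=InnoDB;

-- Two values that share their first 500 characters but differ afterwards.
INSERT INTO documents (body) VALUES (CONCAT(REPEAT('a', 500), 'x'));
INSERT INTO documents (body) VALUES (CONCAT(REPEAT('a', 500), 'y'));  -- accepted

-- An exact duplicate of the first row hits the unique index and is rejected.
INSERT INTO documents (body) VALUES (CONCAT(REPEAT('a', 500), 'x'));
```

Note that the hash is computed over the bytes of the string, so values that differ only in letter case get different hashes even if the column's collation would compare them as equal.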
The failure mode with hashing is a false positive. Identical strings always produce the same hash, so real duplicates are always caught. But if two different strings happen to have a hash collision, the unique index would reject the second one even though it is not a duplicate. So the problem with a weak hashing algorithm is that it would be too aggressive: it blocks values it shouldn’t, rather than letting dupes through.
answered 2019-10-27 by Andy Mallon