This is done best in two stages.
Stage 1: Merge Update on condition
SO Answer from before (Thanks to @Laurence!)
Stage 2: hash key condition to compare
Limits: max 4000 characters, including column separator characters
A rather simple way to compare multiple columns in one condition is the use of a computed column on both sides that HASHBYTES( , <column(s)> ) generates.
This moves writing lots of code from the merge statement to the table generation.
Quick example:
CREATE TABLE dbo.Test
(
id_column int NOT NULL,
dsc_name1 varchar(100),
dsc_name2 varchar(100),
num_age tinyint,
flg_hash AS HashBytes( 'SHA1',
Cast( dsc_name1 AS nvarchar(4000) )
+ N'?' + dsc_name2 + N'?' + Cast( num_age AS nvarchar(3) )
) PERSISTED
)
;
Comparing columns flg_hash between source and destination will make comparison quick as it is just a comparison between two 20 bit varbinary columns.
Couple of Caveat Emptor for working with HashBytes:
- Function only works for a total of 4000 nvarchar characters
- Trade off for short comparison code lies in generation of correct order in views and tables
- There is a duplicate collision chance of around an 2^50+ for SHA1 - as security mechanism this is now considered insecure and a few years ago MS tried to drop SHA1 as algorithm
- Added columns to tables and views can be overlooked from comparison if hash bytes code is outside of consideration for amendments
- Overall I found that when comparing multiple columns this can overload my server engines but never had an issue with hash key comparisons
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…