For a fast, high-level comparison between an Oracle and a SQL Server table you can use aggregations of the functions STANDARD_HASH
and HASH_BYTES
.
Oracle Code
--Create a simple table.
create table table1
(
id number,
contractdatetime date,
Attr3 varchar2(100),
Attr4 varchar2(100),
Attr5 varchar2(100)
);
--Insert 4 rows, the first three will be identical between databases,
--the last row will be different.
insert into table1 values (1, date '2000-01-01', 'a', 'a', 'a');
insert into table1 values (2, date '2000-01-01', 'b', 'b', 'b');
insert into table1 values (2, date '2000-01-02', null, null, null);
insert into table1 values (3, date '2000-01-02', 'Oracle', 'Oracle', 'Oracle');
commit;
select
id,
--Format the number
trim(to_number(
--Sum per group.
sum(
--Convert to a number.
to_number(
--Get the first 14 bytes. This seems to be the maximum that SQL Server can handle
--before it runs into math errors.
substr(
--Hash the value.
standard_hash(
--Concatenate the values using (hopefully) unique strings to separate the
--columns and represent NULLs (because the hashing functions treat nulls differently.)
nvl(to_char(contractdatetime, 'YYYY-MM-DD HH24:MI:SS'), 'null') ||
'-1-' || nvl(attr3, 'null') || '-2-' || nvl(attr3, 'null') || '-3-' || nvl(attr3, 'null')
, 'MD5')
, 1, 14)
, 'xxxxxxxxxxxxxxxxxxxx'))
, '99999999999999999999')) hash
from table1
group by id
order by 1;
SQL Server Code
create table table1
(
id numeric,
contractdatetime datetime,
Attr3 varchar(100),
Attr4 varchar(100),
Attr5 varchar(100)
);
insert into table1 values (1, cast('2000-01-01 00:00:00.000' as datetime), 'a', 'a', 'a');
insert into table1 values (2, cast('2000-01-01 00:00:00.000' as datetime), 'b', 'b', 'b');
insert into table1 values (2, cast('2000-01-02 00:00:00.000' as datetime), null, null, null);
insert into table1 values (3, cast('2000-01-02 00:00:00.000' as datetime), 'SQL Server', 'SQL Server', 'SQL Server');
commit;
select
id,
sum(
convert(bigint, convert(varbinary,
substring(
hashbytes('MD5',
isnull(convert(varchar(19), contractdatetime, 20), 'null') +
'-1-' + isnull(attr3, 'null') + '-2-' + isnull(attr3, 'null') + '-3-' + isnull(attr3, 'null'))
, 1, 7)
, 1))) hash
from table1
group by id
order by 1;
Results
As expected, the hashes for the first two groups are identical, and the hash for the third group is different.
Oracle:
ID HASH
1 50696302970576522
2 69171702324546493
3 50787287321473273
SQL Server
ID HASH
1 50696302970576522
2 69171702324546493
3 7440319042693061
Here is an Oracle fiddle and a SQL Server fiddle.
Problems
- I assume this solution will only work if the databases use similar characters sets, or perhaps only use the first 127 ASCII characters that are often encoded the same in different character sets.
- There is a (perhaps unreasonably) high chance of hash collisions. MD5 hashes aren't good enough for preventing cryptographic attacks, but they're good enough for comparing sets of data. The problem is that I had to use substrings to get the math to work for SQL Server. This is probably my fault for not understanding SQL Server well enough - BIGINTS should support roughly 19 digits of precision but my math only worked up to 14 digits. I probably have a conversion bug somewhere. You may need to play with the "14" and "7" numbers if you get too many collisions or overflow problems. (14 for Oracle, which counts based on the displayed hex characters. 7 for SQL Server, which counts based on the number of characters that can be represented by each hex characters, which is 0.5.)