I've been struggling to get deduplication to work in SolrCloud (version 8.6). My solrconfig.xml contains:
<updateRequestProcessorChain name="dedupeOn">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">dedupeId</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">journal_doi,internal_pmid</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
and
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">dedupeOn</str>
  </lst>
</requestHandler>
My managed-schema contains:
<field name="dedupeId" type="string" indexed="true" stored="true" multiValued="false" />
In my test, I add 1000 documents and commit manually. I can see that the "dedupeId" field is populated with the hash.
I then add 10 more documents that I know are duplicates, and again commit manually. These 10 documents are added alongside the originals; the original document with the matching dedupeId is not overwritten. For example:
"response":{"numFound":2,"start":0,"maxScore":2.1554677,"numFoundExact":true,"docs":[
  {
    "internal_pmid":"13367837",
    "dedupeId":"7f0306ecd909a68e",
    "journal_doi":"10.1097/00005053-195603000-00006"},
  {
    "internal_pmid":"13367837",
    "dedupeId":"7f0306ecd909a68e",
    "journal_doi":"10.1097/00005053-195603000-00006"}]
}}
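To check for this across a whole result set rather than by eye, the returned documents can be grouped by signature. The following is only a minimal sketch in Python, assuming the `docs` array of a Solr JSON response has already been loaded into a list of dicts. The function name `find_duplicate_signatures` is hypothetical; only the field name `dedupeId` comes from the config above.

```python
from collections import defaultdict


def find_duplicate_signatures(docs, signature_field="dedupeId"):
    """Return {signature: [docs]} for signatures shared by more than one doc."""
    groups = defaultdict(list)
    for doc in docs:
        sig = doc.get(signature_field)
        if sig is not None:  # skip docs that never received a signature
            groups[sig].append(doc)
    return {sig: members for sig, members in groups.items() if len(members) > 1}


# The two documents from the response shown above.
docs = [
    {"internal_pmid": "13367837", "dedupeId": "7f0306ecd909a68e",
     "journal_doi": "10.1097/00005053-195603000-00006"},
    {"internal_pmid": "13367837", "dedupeId": "7f0306ecd909a68e",
     "journal_doi": "10.1097/00005053-195603000-00006"},
]

duplicates = find_duplicate_signatures(docs)
for sig, members in duplicates.items():
    print(sig, len(members))
```

If overwriting worked as intended, this mapping would be empty after the second commit.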
I'm not sure if it's significant, but in the Solr logs, I see some "add" entries that contain, in part:
webapp=/solr path=/update params={update.distrib=TOLEADER&update.chain=dedupeOn&distrib.from=*(shard path)*/&wt=javabin&version=2}{add=[00001hLxMb (1690871781072568320)]} 0 2
but other "add" entries do not contain the update.chain parameter, e.g.
webapp=/solr path=/update params={wt=javabin&version=2}{add=[00000sta0n (1690871780667817984)]} 0 2
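To count how many add requests actually carried the chain, the log lines can be split on whether `update.chain` appears in their `params={...}` section. This is a rough sketch in Python, assuming each log entry sits on a single line in the format shown above. The helpers `request_params` and `has_update_chain` are hypothetical names and are not part of Solr.

```python
import re
from urllib.parse import parse_qs

# Matches the first "params={...}" group in a Solr request log line.
PARAMS_RE = re.compile(r"params=\{(.*?)\}")


def request_params(log_line):
    """Parse the params={...} section of a log line into a dict of lists."""
    match = PARAMS_RE.search(log_line)
    if match is None:
        return {}
    return parse_qs(match.group(1), keep_blank_values=True)


def has_update_chain(log_line):
    """True if the request logged on this line carried an update.chain parameter."""
    return "update.chain" in request_params(log_line)


# The two log excerpts quoted above, with the elided shard path left as-is.
with_chain = (
    "webapp=/solr path=/update params={update.distrib=TOLEADER"
    "&update.chain=dedupeOn&distrib.from=*(shard path)*/&wt=javabin&version=2}"
    "{add=[00001hLxMb (1690871781072568320)]} 0 2"
)
without_chain = (
    "webapp=/solr path=/update params={wt=javabin&version=2}"
    "{add=[00000sta0n (1690871780667817984)]} 0 2"
)

print(has_update_chain(with_chain))
print(has_update_chain(without_chain))
```

Running the helper over a full log file would show whether the entries without `update.chain` line up with the documents that ended up duplicated.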
Any help would be greatly appreciated.
question from:
https://stackoverflow.com/questions/66067082/solrcloud-deduplication-overwrite-isnt-working