Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

Elasticsearch can't search the word before dot(.) with whitespace analyzer

I have the following index settings:

{
"analysis": {
    "filter": {
        "dutch_stop": {
            "type": "stop",
            "stopwords": "_dutch_"
        },
        "my_word_delimiter": {
            "type": "word_delimiter",
            "preserve_original": "true"
        }
    },
    "analyzer": {
        "dutch_search": {
            "filter": [
                "lowercase",
                "dutch_stop"
            ],
            "char_filter": [
                "special_char_filter"
            ],
            "tokenizer": "whitespace"
        },
        "dutch_index": {
            "filter": [
                "lowercase",
                "dutch_stop"
            ],
            "char_filter": [
                "special_char_filter"
            ],
            "tokenizer": "whitespace"
        }
    },
    "char_filter": {
        "special_char_filter": {
            "pattern": "/",
            "type": "pattern_replace",
            "replacement": " "
        }
    }
}}

Mapping

{
"properties": {
    "title": {
        "type": "text",
        "fields": {
            "keyword": {
                "type": "keyword",
                "ignore_above": 256
            }
        },
        "analyzer": "dutch_search",
        "search_analyzer": "dutch_search"
    }
}}

Here is one document that I have indexed:

{
   "title": "This is test data."
}

Now I'm searching for the word "data", and my query for this is:

{
"query": {
    "multi_match": {
        "query": "data",
        "fields": [
            "title"
        ]
    }
}}

However, it returns zero records. I know this is because of the whitespace tokenizer, but I need to keep it. Can anyone suggest a solution? How can I keep the whitespace tokenizer and still match a word that comes directly before a dot (.)?

question from:https://stackoverflow.com/questions/66060978/elasticsearch-cant-search-the-word-before-dot-with-whitespace-analyzer

1 Answer

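The whitespace tokenizer splits only on whitespace, so "data." is indexed as a single token that still includes the dot, and a search for "data" does not match it. You can use a mapping char filter to remove the . character before tokenization, which should solve your issue.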
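
To check what the whitespace tokenizer makes of the title, one option is to run the sample text through the _analyze API with only that tokenizer and no char filter. The body below is a minimal sketch, sent to the same http://localhost:9200/_analyze endpoint used in the example that follows (the host and port assume a local node):

```json
{
  "tokenizer": "whitespace",
  "text": "This is test data."
}
```

The last token should come back as data. with the dot still attached, which is why a query for data finds nothing.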

Below is a working example:

GET http://localhost:9200/_analyze

{
  "tokenizer": "whitespace",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        ".=>"
      ]
    }
  ],
  "text": "This is test data."
}

This returns the tokens below:

{
    "tokens": [
        {
            "token": "This",
            "start_offset": 0,
            "end_offset": 4,
            "type": "word",
            "position": 0
        },
        {
            "token": "is",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 1
        },
        {
            "token": "test",
            "start_offset": 8,
            "end_offset": 12,
            "type": "word",
            "position": 2
        },
        {
            "token": "data",
            "start_offset": 13,
            "end_offset": 18,
            "type": "word",
            "position": 3
        }
    ]
}

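Alternatively, you can modify your current pattern_replace character filter as follows. Note the pattern: the dot must be escaped because it is a regex metacharacter, and the backslash has to be doubled inside a JSON string.

"char_filter": {
    "my_char_filter": {
        "type": "pattern_replace",
        "pattern": "\\.",
        "replacement": ""
    }
}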
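
For reference, here is a sketch of how the mapping char filter might be wired into the analyzers from the question. It is meant as the body of an index-creation request (for example a PUT to a new index; any index name is up to you). The filter name dot_char_filter is made up for illustration. The existing special_char_filter is kept so that "/" is still replaced by a space.

```json
{
  "settings": {
    "analysis": {
      "char_filter": {
        "special_char_filter": {
          "type": "pattern_replace",
          "pattern": "/",
          "replacement": " "
        },
        "dot_char_filter": {
          "type": "mapping",
          "mappings": [".=>"]
        }
      },
      "filter": {
        "dutch_stop": {
          "type": "stop",
          "stopwords": "_dutch_"
        }
      },
      "analyzer": {
        "dutch_index": {
          "tokenizer": "whitespace",
          "char_filter": ["special_char_filter", "dot_char_filter"],
          "filter": ["lowercase", "dutch_stop"]
        },
        "dutch_search": {
          "tokenizer": "whitespace",
          "char_filter": ["special_char_filter", "dot_char_filter"],
          "filter": ["lowercase", "dutch_stop"]
        }
      }
    }
  }
}
```

Two caveats apply. Documents indexed before the change keep their old tokens, so they would need to be reindexed. And removing every dot also joins values such as 3.14 into 314; if that matters, a pattern_replace filter restricted to dots at the end of a word may be a better fit.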
