You can use the mapping char filter to strip the `.` character, which should solve your issue.
Below is a working example:
GET http://localhost:9200/_analyze
{
    "tokenizer": "whitespace",
    "char_filter": [
        {
            "type": "mapping",
            "mappings": [
                ".=>"
            ]
        }
    ],
    "text": "This is test data."
}
which returns the following tokens (note that data no longer contains the trailing dot):
{
    "tokens": [
        {
            "token": "This",
            "start_offset": 0,
            "end_offset": 4,
            "type": "word",
            "position": 0
        },
        {
            "token": "is",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 1
        },
        {
            "token": "test",
            "start_offset": 8,
            "end_offset": 12,
            "type": "word",
            "position": 2
        },
        {
            "token": "data",
            "start_offset": 13,
            "end_offset": 18,
            "type": "word",
            "position": 3
        }
    ]
}
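To use this outside the _analyze API, the char filter has to be wired into a custom analyzer in the index settings. A minimal sketch (the names my_index, dot_strip, and my_analyzer are placeholders, not from your mapping):

PUT http://localhost:9200/my_index
{
    "settings": {
        "analysis": {
            "char_filter": {
                "dot_strip": {
                    "type": "mapping",
                    "mappings": [
                        ".=>"
                    ]
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "char_filter": [
                        "dot_strip"
                    ]
                }
            }
        }
    }
}

You can then reference my_analyzer in a field mapping or test it with GET my_index/_analyze.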
Alternatively, you can fix your current pattern_replace character filter. In a regex an unescaped . matches any character, so the dot must be escaped; and because the pattern lives inside a JSON string, the backslash itself must be escaped as well:
"char_filter": {
    "my_char_filter": {
        "type": "pattern_replace",
        "pattern": "\\.",
        "replacement": ""
    }
}
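You can verify the corrected pattern the same way, by passing the char filter inline to the _analyze API:

GET http://localhost:9200/_analyze
{
    "tokenizer": "whitespace",
    "char_filter": [
        {
            "type": "pattern_replace",
            "pattern": "\\.",
            "replacement": ""
        }
    ],
    "text": "This is test data."
}

This should produce the same four tokens as the mapping char filter above.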