How to highlight ngram tokens in a word using Elasticsearch

I would like to highlight just the ngrams which match, not the whole word.
Example:
term: "Wo"
highlight should be: "<em>Wo</em>nderfull world!"
currently it is: "<em>Wonderfull</em> world!"
Mapping is:
{
  "global_search_1495732922733" : {
    "mappings" : {
      "meeting" : {
        "properties" : {
          ...
          "name" : {
            "type" : "text",
            "analyzer" : "meeteor_index_analyzer",
            "search_analyzer" : "meeteor_search_term_analyzer"
          },
          ...
        }
      }
    }
  }
}
Analyzers are:
"analysis" : {
"filter" : {
"meeteor_stemmer" : {
"name" : "english",
"type" : "stemmer"
},
"meeteor_ngram" : {
"type" : "nGram",
"min_gram" : "2",
"max_gram" : "15"
}
},
"analyzer" : {
"meeteor_search_term_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"tokenizer" : "standard"
},
"meeteor_index_analyzer" : {
"filter" : [
"lowercase",
"asciifolding",
"meeteor_ngram"
],
"tokenizer" : "standard"
},
"meeteor_project_id_analyzer" : {
"tokenizer" : "standard"
}
}
},
Concrete example:
curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "name": "Me"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
'
The result is:
"...highlight" : {
"name" : [
"Sad <em>Meeting</em>"
]
}

The correct way to achieve what you want is to use ngram as a tokenizer, not as a token filter. You can do something like this:
"analysis" : {
"filter" : {
"meeteor_stemmer" : {
"name" : "english",
"type" : "stemmer"
}
},
"tokenizer" : {
"meeteor_ngram_tokenizer" : {
"type" : "nGram",
"min_gram" : "2",
"max_gram" : "15"
}
},
"analyzer" : {
"meeteor_search_term_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"tokenizer" : "standard"
},
"meeteor_index_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"tokenizer" : "meeteor_ngram_tokenizer"
},
"meeteor_project_id_analyzer" : {
"tokenizer" : "standard"
}
}
},
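To see why the tokenizer approach fixes the highlighting, you can inspect the token offsets with the _analyze API (a quick check; request shape shown for Elasticsearch 5.x and later):

GET global_search/_analyze
{
  "analyzer": "meeteor_index_analyzer",
  "text": "Meeting"
}

With the ngram tokenizer, each gram carries its own start_offset/end_offset ("me" at 0-2, "mee" at 0-3, "ee" at 1-3, and so on), and those offsets are what the highlighter wraps. The ngram token filter, by contrast, emits every gram with the offsets of the whole original token, which is why the entire word was being highlighted.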
It will generate the highlighting per matching ngram, like this:
"...highlight" : {
"name" : [
"Sad <em>Me</em>eting"
]
}
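One caveat, assuming a more recent cluster: since Elasticsearch 7.x, the gap between min_gram and max_gram is capped by the index setting index.max_ngram_diff (default 1), so a 2-15 ngram tokenizer requires raising it at index creation, roughly like this:

PUT global_search
{
  "settings": {
    "index": {
      "max_ngram_diff": 13
    },
    "analysis": {
      ...
    }
  }
}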

Related

ELK bool query with match and prefix

I'm new to ELK. I have a problem with the following search query:
curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/commsrch/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "should" : [
        {"match" : {"cn" : "franc"}},
        {"prefix" : {"srt" : "99889300200"}}
      ]
    }
  }
}
'
I need to find all documents that satisfy the condition: field "cn" contains "franc" OR field "srt" starts with "99889300200".
Index mapping:
{
  "commsrch" : {
    "mappings" : {
      "properties" : {
        "addr" : {
          "type" : "text",
          "index" : false
        },
        "cn" : {
          "type" : "text",
          "analyzer" : "compname"
        },
        "srn" : {
          "type" : "text",
          "analyzer" : "srnsrt"
        },
        "srt" : {
          "type" : "text",
          "analyzer" : "srnsrt"
        }
      }
    }
  }
}
Index settings:
{
  "commsrch" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "commsrch",
        "creation_date" : "1675079141160",
        "analysis" : {
          "filter" : {
            "ngram_filter" : {
              "type" : "ngram",
              "min_gram" : "3",
              "max_gram" : "4"
            }
          },
          "analyzer" : {
            "compname" : {
              "filter" : [
                "lowercase",
                "stop",
                "ngram_filter"
              ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "srnsrt" : {
              "type" : "custom",
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "C15EXHnaTIq88JSYNt7GvA",
        "version" : {
          "created" : "8060099"
        }
      }
    }
  }
}
The query works properly with only one condition: if it has only the "match" condition, the result has the proper document count, and likewise if it has only the "prefix" condition.
With both the "match" and "prefix" conditions, I see in the result only documents that correspond to the "prefix" condition.
I can't find any limitation in the Elasticsearch docs about mixing "prefix" and "match", but as I see it, some problem exists. Please help me find where the problem is.
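One diagnostic step (a sketch reusing the query above, not part of the original post) is to set the top-level explain flag, so each hit carries an _explanation tree showing which should clause matched and what it contributed to the score:

curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/commsrch/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "explain": true,
  "query": {
    "bool": {
      "should" : [
        {"match" : {"cn" : "franc"}},
        {"prefix" : {"srt" : "99889300200"}}
      ]
    }
  }
}
'

A bool query with only should clauses returns documents matching either clause, so if the match-only documents are absent from the response, check whether they are simply ranked below the default page size of 10 hits.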
Continuing the experiment, I have one more problem.
Example:
Source data:
1st document cn field: "put stone is done"
2nd document cn field: "job one or two"
Mapping and index settings are the same as described in my first post.
Request:
{
  "query": {
    "bool": {
      "should" : [
        {"match" : {"cn" : "one"}},
        {"prefix" : {"cn" : "one"}}
      ]
    }
  }
}
As I understand it, the first document got the higher score because it has more repeats of "one". But I need high scores for documents that have at least one word in field "cn" starting with the string "one". I have experimented with this query:
{
  "query": {
    "bool": {
      "should": [
        {"match": {"cn": "one"}},
        {
          "constant_score": {
            "filter": {
              "prefix": {
                "cn": "one"
              }
            },
            "boost": 100
          }
        }
      ]
    }
  }
}
But it doesn't work properly. What's wrong with my query?
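One thing worth checking, based purely on the settings above: the compname analyzer applies a 3-4 character ngram filter, so the token "stone" produces the gram "one", and so does "done". The prefix clause therefore matches the first document too, its constant_score boost applies to both documents, and the match clause then ranks the first document higher. An _analyze call makes this visible:

curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/commsrch/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "compname",
  "text": "put stone is done"
}
'

If "one" shows up among the grams, the boost cannot distinguish the two documents, and a prefix query against these ngram terms does not express "a word starts with one" in the first place.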

Autocompletion with whitespace tokenizer in elasticsearch. Tokenize whitespaces correctly

I have an Elastic index I want to use for autocompletion.
Therefore I have a suggestField of type completion where I put text that should be autocompleted.
"suggestField" : {
"type" : "completion",
"analyzer" : "IndexAnalyzer",
"search_analyzer" : "SearchAnalyzer",
"preserve_separators" : true,
"preserve_position_increments" : true,
"max_input_length" : 50
},
With Analyzers:
"IndexAnalyzer" : {
"filter" : [
"lowercase",
"stop",
"stopGerman",
"EdgeNGramFilter"
],
"type" : "custom",
"tokenizer" : "MyTokenizer"
},
"SearchAnalyzer" : {
"filter" : [
"lowercase"
],
"type" : "custom",
"tokenizer" : "MyTokenizer"
},
Filters and Tokenizer:
"filter" : {
"EdgeNGramFilter" : {
"type" : "edge_ngram",
"min_gram" : "1",
"max_gram" : "50"
},
"stopGerman" : {
"type" : "stop",
"stopwords" : "_german_"
}
},
"tokenizer" : {
"MyTokenizer" : {
"type" : "whitespace"
}
}
My problem is that if I query that field, the autocompletion only works from the beginning of the text, not for every word.
E.g. I have one value in my suggest field that looks like: "123-456-789 thisisatest"
If I search my suggest field for 123- I get that value as a result.
But if I search for thisis I do not get a result.
This is my query:
POST myindex/_search?typed_keys=true
{
  "suggest": {
    "completion-term": {
      "completion" : {
        "field" : "suggestField"
      },
      "prefix" : "thisis"
    }
  }
}
The question: how do I have to change the above setup to get the given value as a result if I search for thisis?
FYI: If I use the IndexAnalyzer in Kibana with an _analyze query for 123-456-789 thisisatest I get the (from my point of view correct) tokens:
1
12
123
123-
123-4
123-45
123-456
123-456-7
123-456-78
123-456-789
t
th
thi
this
thisi
thisis
thisisa
thisisat
thisisate
thisisates
thisisatest
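The completion suggester matches a prefix of each indexed input as a whole, so edge-ngramming the tokens does not make it match in the middle of the text. One common workaround (a sketch, with myindex as a stand-in for the real index) is to index one input per word, since a completion field accepts multiple inputs:

PUT myindex/_doc/1
{
  "suggestField": {
    "input": [
      "123-456-789 thisisatest",
      "thisisatest"
    ]
  }
}

With an input per word, the prefix thisis matches the second input. The EdgeNGramFilter then becomes unnecessary, because the completion suggester already does prefix matching on its inputs; alternatively, a search_as_you_type field queried with a regular match query avoids the restriction entirely.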

Elasticsearch multi_match + nested search

I am trying to execute a multi_match + nested search in ElasticSearch 6.4. I have the following mappings:
"name" : {
"type" : "text"
},
"status" : {
"type" : "short"
},
"user" : {
"type" : "nested",
"properties" : {
"first_name" : {
"type" : "text"
},
"last_name" : {
"type" : "text"
},
"pk" : {
"type" : "integer"
},
"profile" : {
"type" : "nested",
"properties" : {
"location" : {
"type" : "nested",
"properties" : {
"name" : {
"type" : "text",
"analyzer" : "html_strip"
}
}
}
}
}
}
},
And this is the html_strip analyzer:
"html_strip" : {
"filter" : [
"lowercase",
"stop",
"snowball"
],
"char_filter" : [
"html_strip"
],
"type" : "custom",
"tokenizer" : "standard"
}
And my current query is this one:
"query": {
"bool": {
"must": {
"multi_match": {
"query": 'Paris',
"fields": ['name', 'user.profile.location.name']
},
},
"filter": {
"term": {
"status": 1
}
}
}
}
Obviously, searching for "Paris" in user.profile.location.name doesn't work. I was trying to adapt my code following this answer https://stackoverflow.com/a/48836012/12007123, but without any success.
What I am basically trying to achieve is to be able to search for a value in multiple fields, which may or may not be nested.
I was also checking this discussion https://discuss.elastic.co/t/multi-match-query-string-with-nested-and-non-nested-fields/118652/5, but everything I tried was unsuccessful.
If I just search for name, the search works fine.
Any tips on how I can achieve this the right way would be much appreciated.
EDIT:
While I didn't get an answer to my initial question, I followed Nikolay's (#nikolay-vasiliev) comment and changed the mappings to object instead of nested.
At least now I am able to search in user.profile.location.name. This is what the new mapping for user looks like:
"user" : {
"properties" : {
"first_name" : {
"type" : "text"
},
"last_name" : {
"type" : "text"
},
"pk" : {
"type" : "integer"
},
"profile" : {
"properties" : {
"location" : {
"properties" : {
"name" : {
"type" : "text",
"analyzer" : "html_strip"
}
}
}
}
}
}
},
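For reference, with the original nested mapping the query could have been written by chaining nested queries level by level inside a should clause (a sketch, untested against the exact setup above), since multi_match by itself cannot reach into nested objects:

{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "should": [
            { "match": { "name": "Paris" } },
            {
              "nested": {
                "path": "user",
                "query": {
                  "nested": {
                    "path": "user.profile",
                    "query": {
                      "nested": {
                        "path": "user.profile.location",
                        "query": {
                          "match": { "user.profile.location.name": "Paris" }
                        }
                      }
                    }
                  }
                }
              }
            }
          ]
        }
      },
      "filter": {
        "term": { "status": 1 }
      }
    }
  }
}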

Mapping definition for [albumdetailid] has unsupported parameters: [dynamic : true]

I have been trying to make one of the indexed fields dynamic. I also changed elasticsearch.yml for this by adding
index.mapper.dynamic: false
at the end, and restarted Elasticsearch and Kibana Sense. I also tried different fields and index names, but I am still getting the same error:
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Mapping definition for [albumdetailid] has unsupported parameters: [dynamic : true]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [package]: Mapping definition for [albumdetailid] has unsupported parameters: [dynamic : true]",
    "caused_by": {
      "type": "mapper_parsing_exception",
      "reason": "Mapping definition for [albumdetailid] has unsupported parameters: [dynamic : true]"
    }
  },
  "status": 400
}
The code for adding the index is below:
PUT /worldtest221
{
  "settings" : {
    "index" : {
      "creation_date" : "1474989052008",
      "uuid" : "Ae-7aFrLT466ZJL4U9QGLQ",
      "number_of_replicas" : "0",
      "analysis" : {
        "char_filter" : {
          "quotes" : {
            "type" : "mapping",
            "mappings" : [ "=>'", "=>'", "‘=>'", "’=>'", "‛=>'" ]
          }
        },
        "filter" : {
          "nGram_filter" : {
            "max_gram" : "400",
            "type" : "nGram",
            "min_gram" : "1",
            "token_chars" : [ "letter", "digit", "punctuation", "symbol" ]
          }
        },
        "analyzer" : {
          "quotes_analyzer" : {
            "char_filter" : [ "quotes" ],
            "tokenizer" : "standard"
          },
          "nGram_analyzer" : {
            "type" : "custom",
            "filter" : [ "lowercase", "asciifolding", "nGram_filter" ],
            "tokenizer" : "whitespace"
          },
          "whitespace_analyzer" : {
            "type" : "custom",
            "filter" : [ "lowercase", "asciifolding" ],
            "tokenizer" : "whitespace"
          }
        }
      },
      "cache" : {
        "query_result" : {
          "enable" : "true"
        }
      },
      "number_of_shards" : "1",
      "version" : {
        "created" : "2030099"
      }
    }
  },
  "mappings" : {
    "package" : {
      "properties" : {
        "autosuggestionpackagedetail" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "availability" : {
          "type" : "nested",
          "include_in_all" : false,
          "properties" : {
            "displaysequence" : {
              "type" : "long",
              "include_in_all" : false
            },
            "isquantityavailable" : {
              "type" : "boolean"
            },
            .....
            "metatags" : {
              "type" : "string",
              "include_in_all" : false
            },
            "minadultage" : {
              "type" : "long",
              "include_in_all" : false
            },
            "newmemberrewardpoints" : {
              "type" : "long",
              "include_in_all" : false
            },
            "packagealbum" : {
              "include_in_all" : false,
              "properties" : {
                "albumdetailid" : {
                  "type" : "string",
                  "include_in_all" : false,
                  "dynamic": true
                },
                ....
Look at the second-to-last line, where I set "dynamic" : true.
This happens because you are trying to set "dynamic": true on a field ("albumdetailid") of type string. This makes no sense: no new fields can be created under a "string" field. The "dynamic" parameter should be set on an "object" field, so either define "albumdetailid" as "object" or put "dynamic": true one level higher, under "packagealbum", like this:
"packagealbum" : {
"include_in_all" : false,
"dynamic": true,
"properties" : {
"albumdetailid" : {
"type" : "string",
"include_in_all" : false
},
....
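The other option, defining "albumdetailid" itself as an object, would look roughly like this (only appropriate if albumdetailid actually holds sub-fields rather than a plain string value):

"packagealbum" : {
  "include_in_all" : false,
  "properties" : {
    "albumdetailid" : {
      "type" : "object",
      "dynamic" : true,
      "include_in_all" : false
    },
    ....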

Why ElasticSearch is not finding my term

I just installed Elasticsearch and am testing it. It looks great, and I need to know something. I have a configuration file elasticsearch.json in the config directory:
{
  "network" : {
    "host" : "127.0.0.1"
  },
  "index" : {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval" : "2s",
    "analysis" : {
      "analyzer" : {
        "index_analyzer" : {
          "tokenizer" : "nGram",
          "filter" : ["lowercase"]
        },
        "search_analyzer" : {
          "tokenizer" : "nGram",
          "filter" : ["lowercase"]
        }
      },
      "// you'll need lucene dep for this: filter" : {
        "snowball": {
          "type" : "snowball",
          "language" : "English"
        }
      }
    }
  }
}
I have inserted a doc that contains the word "searching". If I search for the keyword "search", it says nothing was found...
Won't it stem before indexing, or did I miss something in the config?
What does your query look like?
Your config does not look good. Try:
...
  "index_analyzer" : {
    "tokenizer" : "nGram",
    "filter" : ["lowercase", "snowball"]
  },
  "search_analyzer" : {
    "tokenizer" : "nGram",
    "filter" : ["lowercase", "snowball"]
  }
},
"filter" : {
  "snowball": {
    "type" : "snowball",
    "language" : "English"
  }
}
I've had trouble overriding the "default_search" and "default_index" analyzers as well. This works, though.
You can add "index_analyzer" to apply a default to all string fields with unspecified analyzers within a type, if need be.
curl -XDELETE localhost:9200/twitter
curl -XPOST localhost:9200/twitter -d '
{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "snowball": {
          "type" : "snowball",
          "language" : "English"
        }
      },
      "analyzer": {
        "a2" : {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "snowball"]
        }
      }
    }
  }
}'
curl -XPUT localhost:9200/twitter/tweet/_mapping -d '{
  "tweet" : {
    "date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
    "properties" : {
      "user": {"type":"string"},
      "message" : {"type" : "string", "analyzer":"a2"}
    }
  }
}'
curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{ "user": "kimchy", "post_date": "2009-11-15T13:12:00", "message": "Trying out searching teaching, so far so good?" }'
curl -XGET localhost:9200/twitter/tweet/_search?q=message:search
curl -XGET localhost:9200/twitter/tweet/_search?q=message:try
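To verify that stemming is applied at index time, you can run the analyzer through the _analyze API (the call below uses the old request style to match the setup above; newer versions take a JSON body instead):

curl -XGET 'localhost:9200/twitter/_analyze?analyzer=a2' -d 'Trying out searching'

The snowball filter should reduce "searching" to "search", so the q=message:search query above returns the tweet; q=message:try works as well, because the query string is analyzed with the same analyzer and "try" and "trying" stem to the same term.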
