No results for basic query in Elasticsearch - elasticsearch

I'm trying to set up a basic Elasticsearch index locally. Using Kibana, I can get all results with a match_all search, but none of the many variations of a simple match query that I've tried return anything.
My mapping:
{
"recipes-v1": {
"mappings": {
"dynamic": "false",
"properties": {
"description": {
"type": "keyword"
},
"ingredients": {
"type": "text"
},
"instructions": {
"type": "keyword"
},
"title": {
"type": "keyword"
}
}
}
}
}
Results from a match_all query:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "recipes-v1",
"_id": "0",
"_score": 1,
"_source": {
"Name": "Alfredo Sauce",
"Description": "Cheesy alfredo sauce that is delicious. Definitely not vegan",
"Ingredients": [
"1/2 cup butter",
"3 cloves garlic"
],
"Instructions": [
"Melt butter in a saucepan then add heavy cream and combine on medium low heat",
"Let the mixture simmer for 2 minutes then add garlic, salt, pepper, and italian seasoning to taste. Let simmer until fragrent (about 1 minute)"
]
}
},
{
"_index": "recipes-v1",
"_id": "1",
"_score": 1,
"_source": {
"Name": "Shrimp Scampi",
"Description": "Definitely not just Gordon Ramsay's shrimp scampi minus capers",
"Ingredients": [
"1 lb shrimp",
"2 lemons"
],
"Instructions": [
"Do things",
"Do more things"
]
}
}
]
}
}
I've tried deleting and recreating the index, and searching for every variation of Alfredo, alfredo, alfredo sauce, AlfredoSauce, etc., but none have worked. Please help.
All variations of these queries return no hits:
POST recipes-v1/_search
{
"query": {
"match": {
"title": {
"query": "alfredo"
}
}
}
}
POST recipes-v1/_search
{
"query": {
"bool": {
"should": {
"match": {
"name": "alfredo"
}
}
}
}
}
EDIT/UPDATE:
I changed the document field names to all lowercase and the problem persists. However, if I create a new index with dynamic mapping set to true, everything works. This is the mapping now, and it works, but I would still like to know why my static mapping did not work, as eventually I'd want to make it static.
{
"recipes-v1": {
"mappings": {
"properties": {
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"ingredients": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"instructions": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}

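Your documents contain capitalized field names, i.e. Description, Ingredients, etc.
Your mapping contains the same field names in lowercase, i.e. description, ingredients, etc., and has dynamic mapping disabled ("dynamic": "false"), so new fields are not created and indexed dynamically. Those fields are still stored in _source (which is why match_all shows them), but they cannot be searched.
You need to change either your mapping or your documents so that both use exactly the same field names. Also note that the documents use Name while the mapping defines title, so lowercasing the documents alone still leaves name unmapped. Finally, description and instructions are mapped as keyword, which only matches the exact whole value, so a match query for alfredo would not hit "Cheesy alfredo sauce..." in description; map the fields you want to full-text search as text.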
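A quick way to spot this kind of mismatch is to compare the top-level _source keys against the mapping's properties. The Python sketch below does that with plain dicts copied from the question (no Elasticsearch client involved; the helper name unmapped_fields is hypothetical). It only checks top-level names, case-sensitively.

```python
# Field names from the static mapping in the question ("dynamic": "false").
MAPPED_PROPERTIES = {
    "description": "keyword",
    "ingredients": "text",
    "instructions": "keyword",
    "title": "keyword",
}

# First document from the match_all response, as originally indexed.
ORIGINAL_DOC = {
    "Name": "Alfredo Sauce",
    "Description": "Cheesy alfredo sauce that is delicious. Definitely not vegan",
    "Ingredients": ["1/2 cup butter", "3 cloves garlic"],
    "Instructions": ["Melt butter in a saucepan then add heavy cream and combine on medium low heat"],
}


def unmapped_fields(source: dict, properties: dict) -> list:
    """Return top-level _source keys with no exact, case-sensitive entry in
    the mapping. With dynamic mapping disabled, such keys are kept in
    _source but not indexed, so no query can match on them."""
    return sorted(key for key in source if key not in properties)


if __name__ == "__main__":
    print(unmapped_fields(ORIGINAL_DOC, MAPPED_PROPERTIES))
    # ['Description', 'Ingredients', 'Instructions', 'Name']

    # Lowercasing the keys (as in the EDIT) still leaves "name" unmapped,
    # because the mapping defines "title" instead.
    lowercased = {key.lower(): value for key, value in ORIGINAL_DOC.items()}
    print(unmapped_fields(lowercased, MAPPED_PROPERTIES))
    # ['name']
```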

Related

Extract keywords (multi word) from text using elastic search and return offset of the searched words

I have a lot of keywords that I want to extract from a query text, along with the position (offset) where each keyword occurs in that text.
Here is my progress so far. I created two custom analyzers, keyword and shingle:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"asciifolding",
"lowercase"
]
},
"my_analyzer_shingle": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"asciifolding",
"lowercase",
"shingle"
]
}
}
}
},
"mappings": {
"your_type": {
"properties": {
"keyword": {
"type": "string",
"index_analyzer": "my_analyzer_keyword",
"search_analyzer": "my_analyzer_shingle"
}
}
}
}
And here are the keywords that I have:
{
"hits": {
"total": 2000,
"hits": [
{
"id": 1,
"keyword": "python programming"
},
{
"id": 2,
"keyword": "facebook"
},
{
"id": 3,
"keyword": "Microsoft"
},
{
"id": 4,
"keyword": "NLTK"
},
{
"id": 5,
"keyword": "Natural language processing"
}
]
}
}
And I make a query something like this:
{
"query": {
"match": {
"keyword": "I post a lot of things on Facebook and quora"
}
}
}
With the code above, I get:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0.009332742,
"hits": [
{
"_index": "test",
"_type": "your_type",
"_id": "2",
"_score": 0.009332742,
"_source": {
"id": 2,
"keyword": "facebook"
}
},
{
"_index": "test",
"_type": "your_type",
"_id": "4",
"_score": 0.009207102,
"_source": {
"id": 4,
"keyword": "quora"
}
}
]
}
}
But I don't know where in the text those words are, i.e. their offsets.
For example, I want to know that quora starts at index 40, but without highlighting the words between tags or anything like that.
Note that my post is based on this one:
Extract keywords (multi word) from text using elastic search
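As far as I know, the search response does not report where in the query text each hit matched, so one option is to post-process on the client. (Another is the _analyze API, whose token output includes start_offset and end_offset for the analyzed text.) The Python sketch below takes the keywords returned as hits and locates them in the original text. The function name keyword_offsets is hypothetical, matching is case-insensitive and whole-word, and offsets are 0-based, so quora is reported at 39 (the question's 40 is its 1-based position).

```python
import re


def keyword_offsets(text: str, keywords: list) -> dict:
    """Map each keyword to the 0-based start offsets of its case-insensitive,
    whole-word occurrences in text. Multi-word keywords must be separated by
    the same whitespace as in text."""
    offsets = {}
    for keyword in keywords:
        # (?<!\w) and (?!\w) stop "book" from matching inside "Facebook".
        pattern = re.compile(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", re.IGNORECASE)
        offsets[keyword] = [match.start() for match in pattern.finditer(text)]
    return offsets


if __name__ == "__main__":
    query_text = "I post a lot of things on Facebook and quora"
    # Keyword values taken from the hits' _source.
    hits = ["facebook", "quora"]
    print(keyword_offsets(query_text, hits))  # {'facebook': [26], 'quora': [39]}
```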

How do I get Elasticsearch to always return the smallest field value ("url_length")?

How do I always return the documents with the lowest value in the "url_length" field, regardless of the "from" value I send with the search?
In the query below, I request results that contain the word netflix, where the field pgrk is between 9 and 10 and the field url_length is less than 4. When I set "from": 1, "size": 1, it does not return the doc with _id 15 (contains netflix, pgrk = 10, url_length = 2). Instead it returns the doc with _id 14 (contains netflix, pgrk = 10, url_length = 3).
The doc with _id 15 (url_length = 2) is only returned if I set "from" to zero ("from": 0, "size": 1).
{
"from": 1,
"size": 1,
"sort": [
{
"pgrk": {
"order": "desc"
}
},
{
"url_length": {
"order": "asc"
}
}
],
"query": {
"bool": {
"must": {
"multi_match": {
"query": "netflix",
"type": "cross_fields",
"fields": [
"tittle",
"description",
"url"
],
"operator": "and"
}
},
"filter": [
{
"range": {
"pgrk": {
"gte": 9,
"lte" : 10
}
}
},
{
"range": {
"url_length": {
"lt" : 4
}
}
}
]
}
}
}
With "from": 1, "size": 1, it does not return the doc with _id 15 (url_length = 2); it returns the doc with _id 14 (url_length = 3), as shown below:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "teste",
"_type": "_doc",
"_id": "14",
"_score": null,
"_source": {
"url": "www.333.com",
"title": "netflix netflix netflix netflix netflix netflix netflix netflix netflix netflix",
"description": "tudo sobre netflix netflix netflix netflix netflix netflix",
"pgrk": "10",
"url_length": "3"
},
"sort": [
10,
3
]
}
]
}
}
With "from": 0, "size": 1, it returns the doc with _id 15 (url_length = 2):
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "teste",
"_type": "_doc",
"_id": "15",
"_score": null,
"_source": {
"url": "www.netflix.yahoo.com",
"title": "melhor filme",
"description": "tudo sobre series",
"pgrk": "10",
"url_length": "2"
},
"sort": [
10,
2
]
}
]
}
}
How do I always return the documents with the lowest value in the "url_length" field, regardless of the "from" value I send with the search?
Here is my mapping:
{
"settings": {
"index": {
"number_of_shards": "5",
"number_of_replicas": "0",
"analysis": {
"filter": {
"stemmer_plural_portugues": {
"name": "minimal_portuguese",
"stopwords" : ["http", "https", "ftp", "www"],
"type": "stemmer"
}
},
"analyzer": {
"analyzer_customizado": {
"filter": [
"lowercase",
"stemmer_plural_portugues",
"asciifolding"
],
"tokenizer": "lowercase"
}
}
}
}
},
"mappings": {
"properties": {
"q": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"id": {
"type": "long"
},
"#timestamp": {
"type": "date"
},
"data": {
"type": "date"
},
"#version": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"quebrado": {
"type": "byte"
},
"pgrk": {
"type": "integer"
},
"url_length": {
"type": "integer"
},
"term": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"titulo": {
"analyzer": "analyzer_customizado",
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"descricao": {
"analyzer": "analyzer_customizado",
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"url": {
"analyzer": "analyzer_customizado",
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
}
}
}
}
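The behavior above is consistent with from being an offset into the sorted hit list rather than a filter: with pgrk descending and then url_length ascending, doc 15 sorts first, so "from": 1 skips it. The Python sketch below mimics the filter, sort and from/size slicing on the two documents from the question (plain dicts, no Elasticsearch client; search_page is a hypothetical helper; pgrk and url_length are treated as integers, as the mapping and the sort values in the responses indicate, and the multi_match clause is assumed to match both docs). To always get the lowest url_length first, keep "from": 0 and page forward from there.

```python
DOCS = [
    {"_id": "14", "pgrk": 10, "url_length": 3},
    {"_id": "15", "pgrk": 10, "url_length": 2},
]


def search_page(docs: list, from_: int, size: int) -> list:
    """Return the _ids of one page of hits, mimicking the question's query."""
    # Filter clauses: 9 <= pgrk <= 10 and url_length < 4.
    hits = [d for d in docs if 9 <= d["pgrk"] <= 10 and d["url_length"] < 4]
    # Sort: pgrk descending, then url_length ascending.
    hits.sort(key=lambda d: (-d["pgrk"], d["url_length"]))
    # from/size is a plain slice over the sorted hits.
    return [d["_id"] for d in hits[from_:from_ + size]]


if __name__ == "__main__":
    print(search_page(DOCS, from_=0, size=1))  # ['15']
    print(search_page(DOCS, from_=1, size=1))  # ['14']
```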

Elasticsearch Completion Suggester doesn't return documents on searches that match input

I have a weird problem with Elasticsearch 6.0.
I have an index with the following mapping:
{
"cities": {
"mappings": {
"cities": {
"properties": {
"city": {
"properties": {
"id": {
"type": "long"
},
"name": {
"properties": {
"en": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"it": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"slug": {
"properties": {
"en": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"it": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
},
"doctype": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"suggest": {
"type": "completion",
"analyzer": "accents",
"search_analyzer": "simple",
"preserve_separators": true,
"preserve_position_increments": false,
"max_input_length": 50
},
"weight": {
"type": "long"
}
}
}
}
}
}
I have these documents in my index:
{
"_index": "cities",
"_type": "cities",
"_id": "991-city",
"_version": 128,
"found": true,
"_source": {
"doctype": "city",
"suggest": {
"input": [
"nazaré",
"nazare",
"나자레",
"najare",
"najale",
"ナザレ",
"Ναζαρέ"
],
"weight": 1807
},
"weight": 3012,
"city": {
"id": 991,
"name": {
"en": "Nazaré",
"it": "Nazaré"
},
"slug": {
"en": "nazare",
"it": "nazare"
}
}
}
}
{
"_index": "cities",
"_type": "cities",
"_id": "1085-city",
"_version": 128,
"found": true,
"_source": {
"doctype": "city",
"suggest": {
"input": [
"nazareth",
"nazaret",
"拿撒勒",
"na sa le",
"sa le",
"le",
"na-sa-lei",
"나사렛",
"nasares",
"nasales",
"ナザレス",
"nazaresu",
"नज़ारेथ",
"nj'aareth",
"aareth",
"najaratha",
"Назарет",
"Ναζαρέτ",
"názáret",
"nazaretas"
],
"weight": 1809
},
"weight": 3015,
"city": {
"id": 1085,
"name": {
"en": "Nazareth",
"it": "Nazareth"
},
"slug": {
"en": "nazareth",
"it": "nazareth"
}
}
}
}
Now, when I search using the suggester, with the following query:
POST /cities/_search
{
"suggest":{
"suggest":{
"prefix":"nazare",
"completion":{
"field":"suggest"
}
}
}
}
I expect to have both documents in my results, but I only get the second one (nazareth) back:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": 0.0,
"hits": []
},
"suggest": {
"suggest": [
{
"text": "nazare",
"offset": 0,
"length": 6,
"options": [
{
"text": "nazaresu",
"_index": "cities",
"_type": "cities",
"_id": "1085-city",
"_score": 1809.0,
"_source": {
"doctype": "city",
"suggest": {
"input": [
"nazareth",
"nazaret",
"拿撒勒",
"na sa le",
"sa le",
"le",
"na-sa-lei",
"나사렛",
"nasares",
"nasales",
"ナザレス",
"nazaresu",
"नज़ारेथ",
"nj'aareth",
"aareth",
"najaratha",
"Назарет",
"Ναζαρέτ",
"názáret",
"nazaretas"
],
"weight": 1809
},
"weight": 3015,
"city": {
"id": 1085,
"name": {
"en": "Nazareth",
"it": "Nazareth"
},
"slug": {
"en": "nazareth",
"it": "nazareth"
}
}
}
}
]
}
]
}
}
This is unexpected, because in the suggester input for the first document, the term that I searched "nazare" appears exactly as I input it.
Another fun fact is that if I search for "najare" instead of "nazare" I get the correct results.
Any hint will be really appreciated!
For a quick solution, use the size parameter in the completion object of your query.
GET /cities/_search
{
"suggest":{
"suggest":{
"prefix":"nazare",
"completion":{
"field":"suggest",
"size": 100
}
}
}
}
The size parameter defaults to 5, so once Elasticsearch has found 5 terms (not documents) with the correct prefix, it stops looking for more terms (and consequently more documents).
This limit counts terms, not documents. So if one document contains 5 terms with the correct prefix and you use the default value of 5, the other documents may not be returned.
I strongly believe that this is what's happening in your case. The returned document has at least 5 suggest inputs with the prefix nazare (nazareth, nazaret, nazaresu and nazaretas, plus názáret if your accents analyzer strips accents), so only that document is returned.
As for your fun fact: when you search for najare, only one term has the correct prefix, so you get the correct result.
The tricky part is that the results depend on the order in which Elasticsearch retrieves the documents. If the first document had been retrieved first, it would not have reached the size threshold (only 2 or 3 prefix occurrences), so the next document would also have been retrieved and you would have gotten the correct result.
Also, unless necessary, avoid using a very high value (e.g. > 1000) for the size parameter. It might impact performance, particularly for short or common prefixes.
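The mechanism this answer describes can be illustrated with a toy model in Python. This is not Elasticsearch's actual implementation: it assumes matching inputs are ranked by suggest.weight, that at most size inputs are collected, and that the undisclosed "accents" analyzer lowercases and strips accents (fold() is a stand-in for it). With those assumptions it reproduces all three observations from the question.

```python
import unicodedata

# suggest.weight and suggest.input for the two documents in the question.
DOCS = {
    "991-city": (1807, ["nazaré", "nazare", "나자레", "najare", "najale", "ナザレ", "Ναζαρέ"]),
    "1085-city": (1809, [
        "nazareth", "nazaret", "拿撒勒", "na sa le", "sa le", "le", "na-sa-lei",
        "나사렛", "nasares", "nasales", "ナザレス", "nazaresu", "नज़ारेथ",
        "nj'aareth", "aareth", "najaratha", "Назарет", "Ναζαρέτ", "názáret", "nazaretas",
    ]),
}


def fold(text: str) -> str:
    """Lowercase and drop combining accents (assumed stand-in for "accents")."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_completion(prefix: str, size: int = 5) -> list:
    """Collect at most `size` matching inputs (highest weight first) and
    return the distinct documents they belong to."""
    wanted = fold(prefix)
    matches = [
        (weight, doc_id)
        for doc_id, (weight, inputs) in DOCS.items()
        for text in inputs
        if fold(text).startswith(wanted)
    ]
    matches.sort(key=lambda m: -m[0])  # stable: ties keep input order
    doc_ids = []
    for _, doc_id in matches[:size]:
        if doc_id not in doc_ids:
            doc_ids.append(doc_id)
    return doc_ids


if __name__ == "__main__":
    print(toy_completion("nazare"))            # ['1085-city']
    print(toy_completion("nazare", size=100))  # ['1085-city', '991-city']
    print(toy_completion("najare"))            # ['991-city']
```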

Boost score based on integer value - Elasticsearch

I'm not very experienced with Elasticsearch and would like to know how to boost a search based on a certain integer value.
This is an example of a document:
{
"_index": "links",
"_type": "db1",
"_id": "mV32vWcBZsblNn1WqTcN",
"_score": 8.115617,
"_source": {
"url": "example.com",
"title": "Example website",
"description": "This is an example website, used for various of examples around the world",
"likes": 9,
"popularity": 543,
"tags": [
{
"name": "example",
"votes": 5
},
{
"name": "test",
"votes": 2
},
{
"name": "testing",
"votes": 1
}
]
}
}
In this particular search, the focus is on the tags, and I would like to know how to boost the _score by multiplying it by the integer in votes under tags.
If this is not possible (or very hard to achieve), I would simply like to know how to boost the _score by the votes (not under tags)
For example, add 0.1 to the _score for each vote.
This is the current search query I'm using (for searching tags only):
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool":{
"should":{
"match":{
"tags.name":"example,testing,something else"
}
}
}
}
}
}
}
I couldn't find much online, and hope someone can help me out.
How do I boost the _score with an integer value?
Update
For more info, here is the mapping:
{
"links": {
"mappings": {
"db1": {
"properties": {
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"likes": {
"type": "long"
},
"popularity": {
"type": "long"
},
"tags": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"votes": {
"type": "long"
}
}
}
}
}
}
}
}
Update 2
Changed tags.likes/tags.dislikes to tags.votes, and made tags a nested field.
This took a long time to figure out. I have learnt so much on my way there.
Here is the final result:
{
"query": {
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"tags.name": "example"
}
},
{
"match": {
"tags.name": "testing"
}
},
{
"match": {
"tags.name": "test"
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "tags.votes"
}
}
],
"boost_mode": "multiply"
}
}
}
}
}
Using an array in should helped a lot, and I was glad I could combine it with function_score.
You are looking for the function score query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
and, more specifically, the field value factor function: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor.
Snippet adapted from the documentation:
GET /_search
{
"query": {
"function_score": {
"field_value_factor": {
"field": "tags.dislikes",
"factor": 1.2,
"modifier": "sqrt",
"missing": 1
}
}
}
}
Alternatively, use script_score, since tags is a nested field (I'm not sure whether field_value_factor works well with a nested structure).
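To see what these options do to the score, here is the arithmetic of field_value_factor as described in the function score documentation linked above: the function value is modifier(factor * field_value), and boost_mode decides how it is combined with the query score. The Python helper below is hypothetical and covers only a few modifiers and boost modes; it ignores max_boost and how the nested query combines the scores of several matching tags (its score_mode, avg by default).

```python
import math

MODIFIERS = {
    "none": lambda x: x,
    "log1p": lambda x: math.log10(x + 1),  # common (base-10) log of x + 1
    "ln1p": lambda x: math.log(x + 1),     # natural log of x + 1
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
}


def field_value_factor_score(query_score, field_value, factor=1.0,
                             modifier="none", boost_mode="multiply", missing=None):
    """Combine a query score with a field_value_factor function value."""
    if field_value is None:
        if missing is None:
            raise ValueError("field is missing and no 'missing' default was given")
        field_value = missing
    function_value = MODIFIERS[modifier](factor * field_value)
    if boost_mode == "multiply":  # the default boost_mode
        return query_score * function_value
    if boost_mode == "sum":
        return query_score + function_value
    if boost_mode == "replace":
        return function_value
    raise ValueError(f"unsupported boost_mode: {boost_mode}")


if __name__ == "__main__":
    # Final query from Update 2: multiply the match score by tags.votes.
    print(field_value_factor_score(2.0, 5))  # 10.0
    # "Add 0.1 for each vote" from the question: factor 0.1 with boost_mode sum.
    print(field_value_factor_score(2.0, 5, factor=0.1, boost_mode="sum"))  # 2.5
```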

Smartcase searches/highlights with ElasticSearch

Context
I am trying to support smart-case search within our application, which uses Elasticsearch. The use case I want to support is partially matching any blob of text using smart-case semantics. I managed to configure my index so that I can simulate smart-case search. It uses ngrams with a maximum length of 8 to keep storage requirements under control.
The way it works is that each document has both a generated case-sensitive and a case-insensitive field, created using copy_to, each with its own indexing strategy. When searching on a given input, I split the input into parts. This depends on the ngram length, whitespace and double-quote escaping. Each part is checked for capital letters. When a capital letter is found, it generates a match filter for that specific part using the case-sensitive field; otherwise it uses the case-insensitive field.
This has proven to work very nicely, however I am having difficulties with getting highlighting to work the way I would like. To better explain the issue, I added an overview of my test setup below.
Settings
curl -X DELETE localhost:9200/custom
curl -X PUT localhost:9200/custom -d '
{
"settings": {
"analysis": {
"filter": {
"default_min_length": {
"type": "length",
"min": 1
},
"squash_spaces": {
"type": "pattern_replace",
"pattern": "\\s{2,}",
"replacement": " "
}
},
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "2",
"max_gram": "8"
}
},
"analyzer": {
"index_raw": {
"type": "custom",
"filter": ["lowercase","squash_spaces","trim","default_min_length"],
"tokenizer": "keyword"
},
"index_case_insensitive": {
"type": "custom",
"filter": ["lowercase","squash_spaces","trim","default_min_length"],
"tokenizer": "ngram_tokenizer"
},
"search_case_insensitive": {
"type": "custom",
"filter": ["lowercase","squash_spaces","trim"],
"tokenizer": "keyword"
},
"index_case_sensitive": {
"type": "custom",
"filter": ["squash_spaces","trim","default_min_length"],
"tokenizer": "ngram_tokenizer"
},
"search_case_sensitive": {
"type": "custom",
"filter": ["squash_spaces","trim"],
"tokenizer": "keyword"
}
}
}
},
"mappings": {
"_default_": {
"_all": { "enabled": false },
"date_detection": false,
"dynamic_templates": [
{
"case_insensitive": {
"match_mapping_type": "string",
"match": "case_insensitive",
"mapping": {
"type": "string",
"analyzer": "index_case_insensitive",
"search_analyzer": "search_case_insensitive"
}
}
},
{
"case_sensitive": {
"match_mapping_type": "string",
"match": "case_sensitive",
"mapping": {
"type": "string",
"analyzer": "index_case_sensitive",
"search_analyzer": "search_case_sensitive"
}
}
},
{
"text": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"analyzer": "index_raw",
"copy_to": ["case_insensitive","case_sensitive"],
"fields": {
"case_insensitive": {
"type": "string",
"analyzer": "index_case_insensitive",
"search_analyzer": "search_case_insensitive",
"term_vector": "with_positions_offsets"
},
"case_sensitive": {
"type": "string",
"analyzer": "index_case_sensitive",
"search_analyzer": "search_case_sensitive",
"term_vector": "with_positions_offsets"
}
}
}
}
}
]
}
}
}
'
Data
curl -X POST "http://localhost:9200/custom/test" -d '{ "text" : "tHis .is a! Test" }'
Query
The user searches for tHis test, which gets split into two parts because ngrams have a maximum length of 8: (1) tHis and (2) test. For (1) the case-sensitive field is used, and (2) uses the case-insensitive field.
curl -X POST "http://localhost:9200/_search" -d '
{
"size": 1,
"query": {
"bool": {
"must": [
{
"match": {
"case_sensitive": {
"query": "tHis",
"type": "boolean"
}
}
},
{
"match": {
"case_insensitive": {
"query": "test",
"type": "boolean"
}
}
}
]
}
},
"highlight": {
"pre_tags": [
"<em>"
],
"post_tags": [
"</em>"
],
"number_of_fragments": 0,
"require_field_match": false,
"fields": {
"*": {}
}
}
}
'
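The splitting step described in the Context section can be sketched roughly as follows. This is a hypothetical Python reconstruction, not the author's code: the question does not say exactly how parts longer than the maximum ngram length are cut, so here they are assumed to be covered by 8-character windows, with the last window aligned to the end of the part. Quoted phrases are kept as one part, each part goes to case_sensitive if it contains an uppercase letter and to case_insensitive otherwise, and the generated clauses omit the "type": "boolean" option used above.

```python
import re

MAX_GRAM = 8  # max_gram of ngram_tokenizer in the index settings
PART_RE = re.compile(r'"([^"]+)"|(\S+)')  # a quoted phrase or a whitespace-free word


def split_parts(user_input: str, max_gram: int = MAX_GRAM) -> list:
    """Split the search input into parts no longer than max_gram."""
    parts = []
    for match in PART_RE.finditer(user_input):
        part = match.group(1) if match.group(1) is not None else match.group(2)
        if len(part) <= max_gram:
            parts.append(part)
            continue
        # Assumption: cover longer parts with max_gram-sized windows; the last
        # window is aligned to the end so no character is left uncovered.
        starts = list(range(0, len(part) - max_gram + 1, max_gram))
        if starts[-1] != len(part) - max_gram:
            starts.append(len(part) - max_gram)
        parts.extend(part[start:start + max_gram] for start in starts)
    return parts


def smartcase_query(user_input: str) -> dict:
    """Build the bool/must query, choosing a field per part."""
    must = []
    for part in split_parts(user_input):
        field = "case_sensitive" if any(ch.isupper() for ch in part) else "case_insensitive"
        must.append({"match": {field: {"query": part}}})
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    print(smartcase_query("tHis test"))
```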
Response
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.057534896,
"hits": [
{
"_index": "custom",
"_type": "test",
"_id": "1",
"_score": 0.057534896,
"_source": {
"text": "tHis .is a! Test"
},
"highlight": {
"text.case_sensitive": [
"<em>tHis</em> .is a! Test"
],
"text.case_insensitive": [
"tHis .is a!<em> Test</em>"
]
}
}
]
}
}
Problem: highlighting
As you can see, the response shows that the smart-case search works very well. However, I also want to give feedback to the user using highlighting. My current setup uses "term_vector": "with_positions_offsets" to generate highlights. This does return correct highlights, but the case-sensitive and case-insensitive highlights are returned independently:
"highlight": {
"text.case_sensitive": [
"<em>tHis</em> .is a! Test"
],
"text.case_insensitive": [
"tHis .is a!<em> Test</em>"
]
}
This requires me to manually zip multiple highlights on the same field into one combined highlight before returning it to the user. This becomes very painful when highlights become more complicated and can overlap.
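For reference, the manual zipping can be done by recovering the highlighted character ranges from each fragment, merging overlapping ranges, and re-tagging the original text once. The Python sketch below assumes number_of_fragments is 0 (as in the query above), so every fragment is the full field value, and that the <em>/</em> tags from the query are used; merge_highlights is a hypothetical helper. Touching ranges are merged too, so <em>tHis</em><em> this</em> would become <em>tHis this</em>.

```python
import re

EM_RE = re.compile(r"<em>(.*?)</em>", re.DOTALL)


def strip_tags(fragment: str):
    """Return (plain_text, spans): spans are (start, end) offsets of the
    <em>-wrapped text inside plain_text."""
    plain, spans, last, length = [], [], 0, 0
    for match in EM_RE.finditer(fragment):
        before, inner = fragment[last:match.start()], match.group(1)
        plain.append(before)
        length += len(before)
        spans.append((length, length + len(inner)))
        plain.append(inner)
        length += len(inner)
        last = match.end()
    plain.append(fragment[last:])
    return "".join(plain), spans


def merge_highlights(fragments: list, pre: str = "<em>", post: str = "</em>") -> str:
    """Merge highlights of the same full text into one highlighted string."""
    texts, spans = set(), []
    for fragment in fragments:
        text, fragment_spans = strip_tags(fragment)
        texts.add(text)
        spans.extend(fragment_spans)
    if len(texts) != 1:
        raise ValueError("fragments are not highlights of the same full text")
    text = texts.pop()
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:  # overlapping or touching
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    out, last = [], 0
    for start, end in merged:
        out.append(text[last:start] + pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)


if __name__ == "__main__":
    print(merge_highlights(["<em>tHis</em> .is a! Test", "tHis .is a!<em> Test</em>"]))
    # <em>tHis</em> .is a!<em> Test</em>
```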
Question
Is there an alternative setup that returns the combined highlight? I.e., I would like to have this as part of my response:
"highlight": {
"text": [
"<em>tHis</em> .is a!<em> Test</em>"
]
}
Attempt
Use a highlight_query to get a merged result:
curl -XPOST 'http://localhost:9200/_search' -d '
{
"size": 1,
"query": {
"bool": {
"must": [
{
"match": {
"case_sensitive": {
"query": "tHis",
"type": "boolean"
}
}
},
{
"match": {
"case_insensitive": {
"query": "test",
"type": "boolean"
}
}
}
]
}
},
"highlight": {
"pre_tags": [
"<em>"
],
"post_tags": [
"</em>"
],
"number_of_fragments": 0,
"require_field_match": false,
"fields": {
"*.case_insensitive": {
"highlight_query": {
"bool": {
"must": [
{
"match": {
"*.case_insensitive": {
"query": "tHis",
"type": "boolean"
}
}
},
{
"match": {
"*.case_insensitive": {
"query": "test",
"type": "boolean"
}
}
}
]
}
}
}
}
}
}
'
Response
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.9364339,
"hits": [
{
"_index": "custom",
"_type": "test",
"_id": "1",
"_score": 0.9364339,
"_source": {
"text": "tHis .is a! Test"
},
"highlight": {
"text.case_insensitive": [
"<em>tHis</em> .is a!<em> Test</em>"
]
}
}
]
}
}
Warning
When ingesting the following, note the additional lower-case this:
curl -X POST "http://localhost:9200/custom/test" -d '{ "text" : "tHis this .is a! Test" }'
The response to the same query becomes:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.9364339,
"hits": [
{
"_index": "custom",
"_type": "test",
"_id": "1",
"_score": 0.9364339,
"_source": {
"text": "tHis this .is a! Test"
},
"highlight": {
"text.case_insensitive": [
"<em>tHis</em><em> this</em> .is a!<em> Test</em>"
]
}
}
]
}
}
As you can see, the highlight now also includes the lower-case this. For a simple test example like this one, we do not mind. However, for complicated queries, users might (and probably will) get confused about when and how smart-case has any effect, especially when the lower-case match would include a field that only matches in lower case.
Conclusion
This solution will give you all highlights merged as one, but might include unwanted results.

Resources