Optimize Elasticsearch query using highlighting

I would like to know why processing time increases when highlighting is used. Is there a way to optimize it?
An example query is shown below:
{
  "from": 30,
  "size": 60,
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "shall have the right",
          "fields": ["subType", "title", "type", "content"],
          "fuzziness": 1
        }
      }
    }
  },
  "highlight": {
    "type": "unified",
    "fields": {
      "*": {}
    }
  }
}

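Because highlighting happens in the fetch phase. After the query phase has selected the top hits, ES makes a second round of calls to the shards that hold them, and each shard has to load the documents and build the highlighted snippets for the requested fields.
You can read more about the fetch phase at https://www.elastic.co/guide/en/elasticsearch/guide/current/_fetch_phase.html. The same page also mentions highlighting: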
The shard loads the document bodies—the _source field—and, if
requested, enriches the results with metadata and search snippet
highlighting. Once the coordinating node receives all results, it
assembles them into a single response that it returns to the client.
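
One thing that might reduce the fetch-phase work is to list the highlighted fields explicitly instead of using the "*" wildcard, so snippets are only attempted for the fields the query actually searches. Below is a minimal sketch in Python that only builds the request body. build_search_body is a hypothetical helper, not part of any client library, and how much this helps depends on how many other fields the documents contain.

```python
import json

# Fields taken from the multi_match clause in the question.
SEARCHED_FIELDS = ["subType", "title", "type", "content"]


def build_search_body(text, fields, start=30, size=60):
    """Build a search body that highlights only the fields being searched."""
    return {
        "from": start,
        "size": size,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": text,
                        "fields": fields,
                        "fuzziness": 1,
                    }
                }
            }
        },
        "highlight": {
            "type": "unified",
            # One explicit entry per searched field instead of "*".
            "fields": {name: {} for name in fields},
        },
    }


body = build_search_body("shall have the right", SEARCHED_FIELDS)
print(json.dumps(body, indent=2))
```

Requesting fewer hits per page (a smaller "size") should also shrink the number of documents that have to be highlighted.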

Related

Why did hit count increase after adding ElasticSearch mapping?

I added a mapping to ElasticSearch. A while after adding the mapping, I noticed the hit count when querying an unrelated property was much higher than it should be.
Is the change in hit count a result of the operations I performed, or is it more likely that something else is going on? Did these operations introduce duplicate documents?
UPDATE: Duplicates were introduced to ElasticSearch by another process.
(1) Added a new mapping to support numeric searching, like this:
PUT http://myserver:9200/foo/bar/_mapping
{
  "properties": {
    "price": {
      "type": "text",
      "fields": {
        "numeric": {
          "type": "double"
        }
      }
    }
  }
}
(2) Then I sent _update_by_query with no post body:
POST http://myserver:9200/foo/bar/_update_by_query
(3) Added a numeric mapping to another property the same way as the last two steps.
After this operation, the hit count for the following query increased by approx 4x. There are (should be) only approx 50,000 documents where document_status = active, but the query now returns a hit count of around 200,000. The property I'm querying on was NOT modified in the previous steps.
POST http://myserver:9200/foo/bar/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        },
        {
          "match_phrase": {
            "document_status": {
              "query": "active"
            }
          }
        }
      ],
      "filter": [],
      "should": [],
      "must_not": []
    }
  },
  "from": 0,
  "size": 0
}
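
Since the update above traces the extra hits to duplicate documents, one way to confirm that kind of problem is a terms aggregation with min_doc_count set to 2, which only returns values that occur in more than one document. The sketch below builds such a request body in Python. The field name source_id is purely hypothetical (the question does not say which property identifies a document uniquely), it would need to be a keyword or numeric field, and build_duplicate_check is an illustrative helper.

```python
import json


def build_duplicate_check(id_field, max_buckets=100):
    """Build a body that lists id values shared by two or more active documents."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            "match_phrase": {"document_status": {"query": "active"}}
        },
        "aggs": {
            "duplicate_ids": {
                "terms": {
                    "field": id_field,
                    "min_doc_count": 2,  # skip values that occur only once
                    "size": max_buckets,
                }
            }
        },
    }


# "source_id" is a placeholder for whatever property should be unique.
body = build_duplicate_check("source_id")
print(json.dumps(body, indent=2))
```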

Elasticsearch template in Logstash doesn't apply the mapping and fields can't be sorted

I want to sort data via the Elasticsearch REST client. Below is my template in Logstash:
{
  "index_patterns": ["index_name"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "mappings": {
      "_doc": {
        "properties": {
          "int_var": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
When I run the query below:
{
  "size": 100,
  "query": {
    "bool": {
      "must": {
        "match": {
          "match_field": user_request
        }
      }
    }
  },
  "sort": [
    {"int_var": {"order": "asc"}}
  ]
}
I get this error:
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true
How can I solve this? Thanks for answering.
Here's the documentation on fielddata and how to enable it. Only do so if you are aware of the performance impact.
When ingested into Elasticsearch, field values are tokenized based on their data type.
Text fields are analyzed into separate tokens; with the default analyzer a value is split into words, e.g. "quick brown fox" creates three tokens: 'quick', 'brown', and 'fox'. A search for any of these three words will produce matches.
Keyword fields, on the other hand, create a single token from the entire value, e.g. "quick brown fox" is the single token 'quick brown fox'. Searching for anything that is not exactly 'quick brown fox' will produce no matches.
Unless you scrubbed your query before posting it here, you need to change the field name under match to the actual field name, as below.
{
  "size": 100,
  "query": {
    "bool": {
      "must": {
        "match": {
          "int_var": "whatever value you are searching for"
        }
      }
    }
  },
  "sort": [
    {"int_var": {"order": "asc"}}
  ]
}
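
The error message implies that the index currently maps int_var as text, which suggests the template was not applied when the index was created. If the field was instead mapped dynamically with Elasticsearch's defaults, it would normally also have a keyword sub-field, and sorting on that sub-field avoids fielddata. Here is a sketch in Python that builds such a request body, assuming the sub-field int_var.keyword exists (check the index mapping first). build_sorted_search is a hypothetical helper and "42" is a made-up search value.

```python
import json


def build_sorted_search(match_field, value, sort_field, size=100):
    """Build a match query sorted ascending on the given field."""
    return {
        "size": size,
        "query": {"bool": {"must": {"match": {match_field: value}}}},
        "sort": [{sort_field: {"order": "asc"}}],
    }


# Assumption: dynamic mapping created an "int_var.keyword" sub-field.
body = build_sorted_search("int_var", "42", "int_var.keyword")
print(json.dumps(body, indent=2))
```

Note that a keyword field sorts as strings, so "10" sorts before "9". If numeric order is needed, the field would have to be mapped with a numeric type.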

ElasticSearch must-terms does not return data

My Elasticsearch must/terms query does not work: the data has the clientId value "08d71bc7-c4ab-6e1d-f858-cf3448242e8b", but the result is empty. I am using elasticsearch:6.7.1. Do you know what the problem is?
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "must": [
        { "terms": { "clientId": ["08d71bc7-c4ab-6e1d-f858-cf3448242e8b", "08d71bc7-c4ab-6e1d-f858-cf3448242e8c"] } },
        {
          "query_string": {
            "query": "*d*",
            "fields": ["name", "description", "title"]
          }
        },
        { "query_string": { "query": "1", "fields": ["type"] } }
      ]
    }
  }
}
I share sample data
I haven't worked much with "query_string", but if you remove both of those clauses and run your query, it should at least give you some results. If so, the "query_string" clauses are what is causing the problem.
I'd first recommend using "filter" instead of "must".
Consider using a Regexp query in place of your first "query_string". I found here how to query multiple fields with Regexp.
For the second, it would be enough to use "term" instead of "query_string".
Hope this is helpful! :D
The search results depend on how clientId is analyzed. If clientId is a 'keyword' field, your query should work as expected, but if it is a 'text' field, the value may get tokenized into smaller parts (broken at the dashes).
You can check the clientId field's type in the index mappings, and also run the analyze API to check the tokenization: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
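
To make the tokenization point concrete, the sketch below shows a request body for the analyze API and a rough local approximation of what a text analyzer might do to the id. The approximation (lowercase, split on anything that is not a letter or digit) is an assumption for illustration only and is not Elasticsearch's actual analyzer; the real tokens should be taken from the analyze API response.

```python
import json
import re

CLIENT_ID = "08d71bc7-c4ab-6e1d-f858-cf3448242e8b"

# Body for GET <index>/_analyze: asks Elasticsearch to analyze the text
# the same way the clientId field is analyzed in that index.
analyze_body = {"field": "clientId", "text": CLIENT_ID}


def approximate_text_tokens(value):
    """Rough stand-in for a text analyzer, for illustration only.

    Lowercases the value and splits it on non-alphanumeric characters.
    """
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


tokens = approximate_text_tokens(CLIENT_ID)
print(json.dumps(analyze_body))
print(tokens)
# A terms query looks for the exact, unanalyzed value among the indexed tokens.
print(CLIENT_ID in tokens)
```

If the analyze API returns several tokens like this, the full id is not stored as a single term. In that case the terms clause would need to target a keyword field (or a keyword sub-field, if the mapping has one).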

Elasticsearch - Include fields in highlight excluded in _source

I know objects marked as excluded in the _source mapping can be included in the search query. But I have a requirement to include matching terms in the highlight section of the response.
e.g.
I have a mapping like:
{
  "mappings": {
    "doc": {
      "_source": {
        "excludes": ["some_nested_object.complex_tags_object"]
      },
      "properties": {
        "some_nested_object": {
          "type": "nested"
        }
      }
    }
  }
}
Search Query:
GET my_index/_search
{
  "size": 500,
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "query": {
              "bool": {
                "must": [
                  {
                    "match_phrase_prefix": {
                      "some_nested_object.complex_tags_object.name": {
                        "query": "account"
                      }
                    }
                  }
                ]
              }
            },
            "path": "some_nested_object"
          }
        }
      ]
    }
  },
  "highlight": {
    "pre_tags": [
      ""
    ],
    "post_tags": [
      ""
    ],
    "fields": {
      "some_nested_object.complex_tags_object.name": {}
    }
  }
}
If I don't exclude the field in the mapping but exclude it in the search query at runtime instead, I am able to get the matching terms in the highlight section, but the response is very slow because of the large size of the object.
So is it possible to include fields marked as exclude in the mapping/doc/_source as part of highlight?
The short answer to your question, unfortunately, is no. From the Elasticsearch highlighting documentation:
Highlighting requires the actual content of a field. If the field is not stored (the mapping does not set store to true), the actual _source is loaded and the relevant field is extracted from _source.
You have a few options, each of which involves a compromise:
(1) Include your field back into the source if you absolutely need to support highlighting over it (I appreciate this will conflict with the reasons for excluding it from the source in the first place).
(2) Relax the requirement to support highlighting over this field (compromise on features).
(3) Implement a highlighting feature for this field outside Elasticsearch (this will probably compromise on the quality of your solution and perhaps on cost).

Elasticsearch query on parent child using facet count for matching results from both parent and child

The idea is to run a query that returns everything matching a basic query statement, together with facet counts:
the number of matched children of type page, and the number of their parents of type book. The use case is to show X books across Y pages. These would then have separate links, with additional queries etc.
I'm new to Elasticsearch. What I've seen so far is very cool, but I've hit a brick wall with this, so any help would be really useful.
Thank you for your time :)
{
  "query": {
    "has_child": {
      "type": "page",
      "query": {
        "filtered": {
          "query": {
            "query_string": {
              "default_field": "text",
              "query": "some example search query"
            }
          }
        }
      }
    }
  },
  "facets": {},
  "sort": [
    "_score"
  ],
  "from": 0,
  "size": 10
}
Yes, I've read the documentation on facets
