Elasticsearch: first query is slow, the rest are fast

I'm using this kind of mapping (a shortened version, to keep the question simple) for a parent-child relationship where items is the parent and user_items is the child.
curl -XPUT 'localhost:9200/myindex?pretty=true' -d '{
"mappings": {
"items": {
"dynamic": "strict",
"properties" : {
"title" : { "type": "string" },
"body" : { "type": "string" }
}},
"user_items": {
"dynamic": "strict",
"_parent": {"type": "items" },
"properties" : {
"user_id" : { "type": "integer" },
"source_id" : { "type": "integer" }
}}}}'
And the type of query I usually make:
curl -XGET 'localhost:9200/myindex/items/_search?pretty=true' -d '{
"query": {
"bool": {
"must": [
{
"query_string": {
"fields": ["title", "body"],
"query": "mercado"
}
},
{
"has_child": {
"type": "user_items",
"query": {
"term": {
"user_id": 655
}}}}]}}}'
This query searches the title and body fields for the string mercado, restricted to items that have a child with a given user_id, in this case 655.
I read that the first query is slow because its data has to be loaded into the cache, and that the following queries are fast because they work with the cached content.
I also read that I can make the first query faster by eagerly preloading my data (using "loading" : "eager", right?), but I don't know what I have to preload. Do I have to use eager loading on title and body, as follows?
{
"mappings": {
"items": {
"dynamic": "strict",
"properties" : {
"title" : { "type": "string" ,
"fielddata": {
"loading" : "eager"}},
"body" : { "type": "string",
"fielddata": {
"loading" : "eager"}}
}},
"user_items": {
"dynamic": "strict",
"_parent": {"type": "items" },
"properties" : {
"user_id" : { "type": "integer" },
"source_id" : { "type": "integer" }
}}}}
Any other recommendation for speeding up or caching the first query is welcome. Thanks in advance.
PS: I'm using ES 2.3.2 on a Linux machine and I have a total of 25.396.369 documents.

I just fixed the same issue, and fielddata preloading was the key. Index warming is deprecated and doc_values is on by default. My application searches a couple of fields on a large index (100 GB+) and the first search was slow. I had to rebuild the index with loading=eager for all of the fields that I search on. This preloads the fielddata and causes a fairly long startup, but the initial search went from about 10 s to under 900 ms, with later searches staying under 400 ms in both cases. Create the mapping and reimport the data:
PUT localhost:9200/newindex/
{
"mappings": {
"items": {
"properties": {
"title": {
"type": "string",
"fielddata": {
"loading" : "eager"
}
},
"body": {
"type": "string",
"fielddata": {
"loading" : "eager"
}
}
}
}
}
}
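If many fields need the same treatment, the mapping body above can be generated rather than typed by hand. The following is only a minimal sketch in Python: the helper name eager_fielddata_mapping is made up for illustration, and it assumes ES 2.x "string" fields as in the mapping above. It only builds the JSON body; sending it to the cluster is left out.

```python
import json


def eager_fielddata_mapping(doc_type, fields):
    """Build a mapping body that sets fielddata loading to eager for each field.

    Hypothetical helper: assumes ES 2.x-style "string" fields.
    """
    properties = {
        name: {"type": "string", "fielddata": {"loading": "eager"}}
        for name in fields
    }
    return {"mappings": {doc_type: {"properties": properties}}}


# Same type and fields as in the mapping above.
body = eager_fielddata_mapping("items", ["title", "body"])
print(json.dumps(body, indent=2))
```

The printed JSON can then be used as the body of the PUT request shown above.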

There are three things you can do.
1 - Use field data preloading
To preload field data, use the following snippet in the mapping:
"fielddata": {
"loading" : "eager"
}
More details are in the Elasticsearch fielddata documentation.
2 - Use an index warmer
Index warmers are queries that you configure to run automatically whenever an index is refreshed.
The Elasticsearch documentation on warmers describes how to set one up.
3 - Use doc_values
Doc values are an on-disk data structure, built at document index time, that makes the data access pattern needed for aggregations and sorting possible.
More details are in the Elasticsearch doc_values documentation.

Related

Elasticsearch - Using copy_to on the fields of a nested type

Elastic version 7.17
Below I've pasted a simplified version of my mappings, which represent a nested object structure. One top-level-object will have one or more second-level-objects. A second-level-object will have one or more third-level-objects. Fields field_a, field_b, and field_c on third-level-object are all related to each other, so I'd like to copy them into a single field that can be partially matched against. I've done this on a lot of attributes at the top-level-object level, so I know it works.
{
"mappings": {
"_doc": { //one top level object
"dynamic": "false",
"properties": {
"second-level-objects": { //one or more second level objects
"type": "nested",
"dynamic": "false",
"properties": {
"third-level-objects": { //one or more third level objects
"type": "nested",
"dynamic": "false",
"properties": {
"my_copy_to_field": { //should have the values from field_a, field_b, and field_c
"type": "text",
"index": true
},
"field_a": {
"type": "keyword",
"index": false,
"copy_to": "my_copy_to_field"
},
"field_b": {
"type": "long",
"index": false,
"copy_to": "my_copy_to_field"
},
"field_c": {
"type": "keyword",
"index": false,
"copy_to": "my_copy_to_field"
},
"field_d": {
"type": "keyword",
"index": true
}
}
}
}
}
}
}
}
}
However, when I run a nested query against that my_copy_to_field I get no results because the field is never populated, even though I know my documents have data in the 3 fields with copy_to. If I perform a nested query against field_d which is not part of the copied info I get the expected results, so it seems like there's something going on with nested (or double-nested in my case) usage of copy_to that I'm overlooking. Here is my query which returns nothing:
GET /my_index/_search
{
"query": {
"nested": {
"inner_hits": {},
"path": "second-level-objects",
"query": {
"nested": {
"inner_hits": {},
"path": "second-level-objects.third-level-objects",
"query": {
"bool": {
"should": [
{"match": {"second-level-objects.third-level-objects.my_copy_to_field": "my search value"}}
]
}
}
}
}
}
}
}
I've tried adding include_in_root:true to the third-level-objects, but that didn't make any difference. If I could just get the field to populate with the copied data then I'm sure I can work through getting the query working. Is there something I'm missing about using copy_to with nested fields?
Additionally, when I view my data in Kibana -> Discover, I see second-level-objects as an available "Nested" field, but I don't see anything for third-level-objects, even though KQL recognizes it as a field. Is that symptomatic of an issue?
You must specify the complete nested path in copy_to, like this:
"field_a": {
"type": "keyword",
"copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
},
"field_b": {
"type": "long",
"copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
},
"field_c": {
"type": "keyword",
"copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
}
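To avoid mistyping the long dotted path, it can be assembled from the nesting levels. This is just a sketch in Python with made-up helper names (full_copy_to_path, with_copy_to); the field names and types are the ones from the question.

```python
def full_copy_to_path(nested_path, target):
    """Join the nesting levels and the target field into one dotted path."""
    return ".".join(list(nested_path) + [target])


def with_copy_to(field_types, nested_path, target):
    """Return field definitions whose copy_to uses the full nested path."""
    full_path = full_copy_to_path(nested_path, target)
    return {
        name: {"type": field_type, "copy_to": full_path}
        for name, field_type in field_types.items()
    }


props = with_copy_to(
    {"field_a": "keyword", "field_b": "long", "field_c": "keyword"},
    ["second-level-objects", "third-level-objects"],
    "my_copy_to_field",
)
print(props["field_a"]["copy_to"])
```

The resulting dictionary corresponds to the three field definitions shown above.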

Elasticsearch Field Preference for result sequence

I have created the index in elasticsearch with the following mapping:
{
"test": {
"mappings": {
"documents": {
"properties": {
"fields": {
"type": "nested",
"properties": {
"uid": {
"type": "keyword"
},
"value": {
"type": "text",
"copy_to": [
"fulltext"
]
}
}
},
"fulltext": {
"type": "text"
},
"tags": {
"type": "text"
},
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
While searching, I want to set a preference order for fields. For example, if the search text is found in title or url, that document should come before the other documents.
Can we set a field preference for the order of search results (in my case the preference is title, url, tags, fields)?
Could you please help me with this?
This is called "boosting". Prior to Elasticsearch 5.0.0, boosting could be applied either at index time (as part of the field mapping) or at query time. Index-time boosting is now deprecated, and from 5.0 on a boost set in the mapping is applied at query time.
The current recommendation is to use query-time boosting.
Please read this document for details on how to use boosting:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html
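As a rough illustration of query-time boosting for the mapping in the question, here is a sketch in Python that builds a multi_match body using the field^boost syntax. The helper name boosted_multi_match and the boost values 4, 3, 2, 1 are arbitrary assumptions, and it searches fulltext (the copy_to target) in place of the nested fields.value, which would otherwise need a nested query. Boosts influence the relevance score, so they favour title and url matches rather than guaranteeing a strict order.

```python
import json


def boosted_multi_match(text, boosts):
    """Build a multi_match query body with per-field query-time boosts."""
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


# Boost values are illustrative assumptions, highest preference first.
query = boosted_multi_match(
    "search text",
    {"title": 4, "url": 3, "tags": 2, "fulltext": 1},
)
print(json.dumps(query, indent=2))
```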

Elasticsearch: lowercase search doesn't work

I am trying to search against content using a prefix query, and if I search for diode I get results that differ from those for Diode. How do I get ES to return the same results for both diode and Diode? These are the mappings and settings I am using in ES.
"settings":{
"analysis": {
"analyzer": {
"lowercasespaceanalyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"articles": {
"properties": {
"title": {
"type": "text"
},
"url": {
"type": "keyword",
"index": "true"
},
"imageurl": {
"type": "keyword",
"index": "true"
},
"content": {
"type": "text",
"analyzer" : "lowercasespaceanalyzer",
"search_analyzer":"whitespace"
},
"description": {
"type": "text"
},
"relatedcontentwords": {
"type": "text"
},
"cmskeywords": {
"type": "text"
},
"partnumbers": {
"type": "keyword",
"index": "true"
},
"pubdate": {
"type": "date"
}
}
}
}
Here is an example of the query I use:
POST _search
{
"query": {
"bool" : {
"must" : {
"prefix" : { "content" : "capacitance" }
}
}
}
}
This happens because you use two different analyzers at search time and at index time.
When you enter the query "Diod" at search time, the "whitespace" analyzer leaves it unchanged, so the query is interpreted as "Diod".
However, because "lowercasespaceanalyzer" is used at index time, "Diod" is indexed as "diod". Either use the same analyzer at both search and index time, or use a search analyzer that lowercases your strings, because the default "whitespace" analyzer doesn't: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-whitespace-analyzer.html
There will be no term "Diode" in your index. So if you want to get the same results, you should have your query text analyzed by the same analyzer.
You can use a query_string query such as:
"query_string" : {
"default_field" : "content",
"query" : "Diode",
"analyzer" : "lowercasespaceanalyzer"
}
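To see why the analyzer mismatch described above gives different results, here is a small simulation in plain Python. It is only a simplified model, not Elasticsearch itself: the two functions stand in for the whitespace tokenizer with and without the lowercase filter, and prefix_matches stands in for a prefix lookup against the indexed terms. The sample sentence is made up.

```python
def lowercase_space_analyze(text):
    """Stand-in for the custom analyzer: whitespace tokenizer + lowercase filter."""
    return [token.lower() for token in text.split()]


def whitespace_analyze(text):
    """Stand-in for the built-in whitespace analyzer: split only, keep case."""
    return text.split()


def prefix_matches(indexed_terms, prefix):
    """Return True if any indexed term starts with the given prefix."""
    return any(term.startswith(prefix) for term in indexed_terms)


# Index time: the content is lowercased before it is stored.
indexed = lowercase_space_analyze("Diode and Capacitance basics")

# Search time with the case left as typed: no indexed term starts with "Diod".
print(prefix_matches(indexed, whitespace_analyze("Diod")[0]))  # False

# Search time with the same lowercasing applied: "diod" matches "diode".
print(prefix_matches(indexed, lowercase_space_analyze("Diod")[0]))  # True
```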
UPDATE
You can also analyze your query text before running the query.
AnalyzeResponse resp = client.admin().indices()
.prepareAnalyze(index, text)
.setAnalyzer("lowercasespaceanalyzer")
.get();
String analyzedContext = resp.getTokens().get(0).getTerm();
...
Then use analyzedContext as the new query text.

How to fetch values stored in custom mapping fields in elasticsearch index

The Elasticsearch (1.7) index I am dealing with has a property "title" with several custom sub-field mappings, each with its own analyser. How can I fetch the data stored in these individual fields?
title.std
title.stp
Here's the mappings data.
"mappings": {
"myindex": {
"_all": {
"enabled": false
},
"properties": {
"title": {
"type": "string",
"analyzer": "standard",
"fields": {
"std": {
"type": "string",
"analyzer": "standard"
},
"stp": {
"type": "string",
"analyzer": "stop"
}
}
}
}
}
}
You can use the term vectors API in order to return information and statistics on terms in the fields of a particular document.
You'd invoke the endpoint like this:
curl -XGET 'http://localhost:9200/myindex/mytype/1/_termvector?pretty=true' -d '{
"fields" : ["title.std", "title.stp"],
"offsets" : true,
"payloads" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}'
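Once the response comes back, the terms stored for each sub-field can be pulled out of it. The sketch below is in Python and assumes the response has the usual shape of a term_vectors object keyed by field name, each with a terms object keyed by term. The helper name terms_by_field is made up, and the sample response is hand-written for illustration, not real output.

```python
def terms_by_field(termvector_response):
    """Map each field in a term vectors response to its sorted list of terms."""
    vectors = termvector_response.get("term_vectors", {})
    return {
        field: sorted(data.get("terms", {}))
        for field, data in vectors.items()
    }


# Hand-written sample: the stop analyzer drops "the", the standard one keeps it.
sample_response = {
    "term_vectors": {
        "title.std": {"terms": {"the": {"term_freq": 1},
                                "quick": {"term_freq": 1},
                                "fox": {"term_freq": 1}}},
        "title.stp": {"terms": {"quick": {"term_freq": 1},
                                "fox": {"term_freq": 1}}},
    }
}

stored_terms = terms_by_field(sample_response)
print(stored_terms)
```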

Elasticsearch: multiple languages in two fields when the query's language is unknown or mixed

I am new to Elasticsearch, and I am not sure how to proceed in my situation.
I have the following mapping:
{
"mappings": {
"book": {
"properties": {
"title": {
"properties": {
"en": {
"type": "string",
"analyzer": "english"
},
"ar": {
"type": "string",
"analyzer": "arabic"
}
}
},
"keyword": {
"properties": {
"en": {
"type": "string",
"analyzer": "english"
},
"ar": {
"type": "string",
"analyzer": "arabic"
}
}
}
}
}
}
}
A sample document may have two languages for the same field of the same book. Here are two example documents:
{
"title" : {
"en": "hello",
"ar": "مرحبا"
},
"keyword" : {
"en": "world",
"ar": "عالم"
}
}
{
"title" : {
"en": "Elasticsearch"
},
"keyword" : {
"en": "full-text index"
}
}
When I know which language the query uses, I am able to build the query as follows (here, English):
"query": {
"multi_match" : {
"query" : "keywords",
"fields" : [ "title.en", "keyword.en" ]
}
}
Based on my current document mapping, how can I build a query if
the query language is unknown or
is mixed with English and Arabic?
Thanks for any input!
Regards.
p.s. I am also open to any improvement to the above mapping.
If the query language is unknown:
You can use the same multi_match query, but across all of the language fields. For example,
assuming you are using the keyword analyzer:
"query": {
"multi_match" : {
"query" : "keywords",
"fields" : [ "title.en", "keyword.en", "title.ar", "keyword.ar" ]
}
}
If the query mixes English and Arabic:
You need to change the analyzer to standard, and then you can run the same query.
Thanks
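The field list for such a query can be generated from the base fields and language codes instead of being written out by hand. This is only a sketch in Python: the helper name all_language_query is made up, and it assumes every base field has one sub-field per language code, as in the mapping from the question.

```python
import json


def all_language_query(text, base_fields, languages):
    """Build a multi_match body covering every base field in every language."""
    fields = [f"{base}.{lang}" for lang in languages for base in base_fields]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


# Same fields as in the query above; the query text is a made-up mixed example.
query = all_language_query("hello عالم", ["title", "keyword"], ["en", "ar"])
print(json.dumps(query, indent=2, ensure_ascii=False))
```

Adding another language then only means adding its code to the list, provided the mapping has the matching sub-fields.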
