Unable to retrieve nested object within Elasticsearch

An ELK newcomer here; this ELK task was dropped on me at the last minute.
We are adding extra data named prospects to the vehicles index so that users can search on it. I'm able to add the prospects to the index, but I'm unable to retrieve the nested prospects objects within the vehicles index. I'm using Elasticsearch and Kibana v6.8.11 with the elasticsearch-rails gem, and I have checked the docs on nested objects. My search method looks correct according to the docs. I would like an expert to point out what went wrong here; please let me know if you need more info.
Here is what the indexed document is supposed to look like -
{
"_index" : "vehicles",
"_type" : "_doc",
"_id" : "3MZBxxxxxxx",
"_score" : 0.0,
"_source" : {
"vin" : "3MZBxxxxxxx",
"make" : "mazda",
"model" : "mazda3",
"color" : "unknown",
"year" : 2018,
"vehicle" : "2018 mazda mazda3",
"trim" : "grand touring",
"estimated_mileage" : null,
"dealership" : [
209
],
"current_owner_group_id" : null,
"current_owner_customer_id" : null,
"last_service_date" : null,
"last_service_revenue" : null,
"purchase_type" : [ ],
"in_service_date" : null,
"deal_headers" : [ ],
"services" : [ ],
"customers" : [ ],
"salesmen" : null,
"service_appointments" : [ ],
"prospects" : [
{
"first_name" : "Kammy",
"last_name" : "Maytag",
"name" : "Kammy Maytag",
"company_name" : null,
"emails" : [ ],
"phone_numbers" : [ ],
"address" : "31119 field",
"city" : "helen",
"state" : "keller",
"zip" : "81411",
"within_dealership_aoi_region" : true,
"dealership_ids" : [
209
],
"dealership_dppa_protected_ids" : [
209
],
"registration_id" : 12344,
"id" : 1054,
"prospect_source_id" : "12344",
"type" : "Prospect"
}
]
}
}
Here is how I'm trying to get it -
GET /vehicles/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": [
{ "term": { "dealership": "209" } },
{
"nested": {
"path": "prospects",
"query": {
"bool": {
"must": [
{ "term": { "prospects.first_name": "Kammy" } },
{ "term": { "prospects.dealership": "209" } },
{ "term": { "prospects.type": "Prospect" } }
]
}
}
}
},
{ "bool": { "must_not": { "term": { "purchase_type": "Wholesale" } } } }
]
}
},
"sort": [{ "_doc": { "order": "asc" } }]
}

I see two issues with the nested query:
You're querying prospects.dealership, but the example doc only shows prospects.dealership_ids. Change the query to target prospects.dealership_ids.
More importantly, you're using a term query on prospects.first_name and prospects.type. I'm assuming your index mapping doesn't define those as keyword fields, which means their values were most likely analyzed and lowercased at index time (for reasons explained here), whereas term looks for exact, unanalyzed matches.
Option 1: Use match instead of term.
Option 2: Change prospects.first_name → prospects.first_name.keyword and do the same for .type (this assumes the default dynamic mapping created a .keyword sub-field).
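To illustrate why the term query misses, here is a minimal Python sketch that roughly imitates what happens to a text field at index time. The functions analyze, term_query and match_query are hypothetical stand-ins written for this illustration, not Elasticsearch APIs, and the real standard analyzer does more than lowercase and split on whitespace.

```python
# Simplified model of a text field; NOT the real Elasticsearch analyzer.
def analyze(text):
    """Lowercase and split on whitespace, roughly like a default text analyzer."""
    return text.lower().split()


def term_query(indexed_tokens, value):
    """A term query compares the search value as-is against the indexed tokens."""
    return value in indexed_tokens


def match_query(indexed_tokens, value):
    """A match query analyzes the search value first, then compares tokens."""
    return any(token in indexed_tokens for token in analyze(value))


indexed = analyze("Kammy")  # tokens that end up in the index
print(indexed)
print("term 'Kammy':", term_query(indexed, "Kammy"))
print("term 'kammy':", term_query(indexed, "kammy"))
print("match 'Kammy':", match_query(indexed, "Kammy"))
```

Under this simplified model, the capitalized search value only matches once it goes through the same analysis as the indexed text, which is what Option 1 relies on.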

Related

elasticsearch filter nested object

I have an index with a nested object containing two attributes namely scopeId and categoryName. Following is the mappings part of the index
"mappedCategories" : {
"type" : "nested",
"properties": {
"scopeId": {"type":"long"},
"categoryName": {"type":"text",
"analyzer" : "productSearchAnalyzer",
"search_analyzer" : "productSearchQueryAnalyzer"}
}
}
A sample document containing the nested mappedCategories object is as follows:
POST productsearchna_2/_doc/1
{
"categoryName" : "Operating Systems",
"contexts" : [
0
],
"countryCode" : "US",
"id" : "10076327-1",
"languageCode" : "EN",
"localeId" : 1,
"mfgpartno" : "test123",
"manufacturerName" : "Hewlett Packard Enterprise",
"productDescription" : "HPE Microsoft Windows 2000 Datacenter Server - Complete Product - Complete Product - 1 Server - Standard",
"productId" : 10076327,
"skus" : [
{"sku": "43233004",
"skuName": "UNSPSC"},
{"sku": "43233049",
"skuName": "SP Richards"},
{"sku": "43234949",
"skuName": "Ingram Micro"}
],
"mappedCategories" : [
{"scopeId": 3228552,
"categoryName": "Laminate Bookcases"},
{"scopeId": 3228553,
"categoryName": "Bookcases"},
{"scopeId": 3228554,
"categoryName": "Laptop"}
]
}
I want to filter categoryName "lap" on scopeId: 3228553 i.e. my query should return 0 hits since Laptop is mapped to scopeId 3228554. But my following query is returning 1 hit with scopeId : 3228554
POST productsearchna_2/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "mappedCategories",
"query": {
"term": {
"mappedCategories.categoryName": "lap"
}
},
"inner_hits": {}
}
}
],
"filter": [
{
"nested": {
"path": "mappedCategories",
"query": {
"term": {
"mappedCategories.scopeId": {
"value": 3228552
}
}
}
}
}
]
}
},
"_source": ["mappedCategories.categoryName", "productId"]
}
Following is part of the result of the query:
"inner_hits" : {
"mappedCategories" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.5586993,
"hits" : [
{
"_index" : "productsearchna_2",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "mappedCategories",
"offset" : 2
},
"_score" : 1.5586993,
"_source" : {
"scopeId" : 3228554,
"categoryName" : "Laptop"
}
}
]
}
}
I want my query to return zero hits, and in case I search for "book" with scopeId: 3228552, I want my query to return 2 hits, 1 for Bookcases and another for Laminate Bookcases categoryNames. Please help.
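This query solves part of the problem: both conditions sit inside a single nested query, so they have to match on the same nested object. However, when searching for "book" with scopeId: 3228552 it will only return 1 result, because in the sample document only "Laminate Bookcases" has that scopeId.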
GET idx_test/_search?filter_path=hits.hits.inner_hits
{
"query": {
"nested": {
"path": "mappedCategories",
"query": {
"bool": {
"filter": [
{
"term": {
"mappedCategories.scopeId": {
"value": 3228553
}
}
}
],
"must": [
{
"match": {
"mappedCategories.categoryName": "laptop"
}
}
]
}
},
"inner_hits": {}
}
}
}
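The difference between two separate nested clauses and one combined nested clause can be sketched with a toy model in Python. This is not Elasticsearch code: name_matches is a crude whole-word stand-in for the full-text match (the real productSearchAnalyzer is unknown here), and both helper functions are hypothetical names used only for illustration.

```python
# Toy model of nested-query semantics; not Elasticsearch code.
categories = [
    {"scopeId": 3228552, "categoryName": "Laminate Bookcases"},
    {"scopeId": 3228553, "categoryName": "Bookcases"},
    {"scopeId": 3228554, "categoryName": "Laptop"},
]


def name_matches(obj, word):
    # Crude stand-in for a full-text match on categoryName (whole words only).
    return word.lower() in obj["categoryName"].lower().split()


def two_separate_nested_clauses(objs, word, scope_id):
    """Each nested clause is satisfied if ANY nested object matches it."""
    return (any(name_matches(o, word) for o in objs)
            and any(o["scopeId"] == scope_id for o in objs))


def one_combined_nested_clause(objs, word, scope_id):
    """Both conditions must hold on the SAME nested object."""
    return [o for o in objs
            if name_matches(o, word) and o["scopeId"] == scope_id]


# "Laptop" satisfies the name clause and "Bookcases" satisfies the scope clause,
# so the document matches even though no single object satisfies both.
print(two_separate_nested_clauses(categories, "laptop", 3228553))
print(one_combined_nested_clause(categories, "laptop", 3228553))
print(one_combined_nested_clause(categories, "bookcases", 3228552))
```

In this model the separate clauses accept the document, while the combined clause finds no object for "laptop" under scopeId 3228553 and exactly one object for "bookcases" under scopeId 3228552.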

Elastic search dynamic field mapping with range query on price field

I have two fields in my Elasticsearch index: lowest_local_price and lowest_global_price.
I want to map a dynamic value to a third field, price, at run time, based on whether the local or the global country matches.
If the local country matches, I want to map the lowest_local_price value to the price field.
If the global country matches, I want to map the lowest_global_price value to the price field.
If the local or global country matches, I want to apply a range query on the price field and boost that doc by 2.0.
Note: This is not a compulsory filter or query; if it matches, I just want to boost the doc.
I have tried the solutions below, but they do not work for me.
Query 1:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price]]
],
"boost" => 2.0
]
]
];
Query 2:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price, "boost" => 2.0]]
],
]
]
];
Neither of them works for me, because neither boosts the doc. I know filter does not work with boost, so what is the solution for dynamic field mapping with a range query and boost?
Please help me to solve this query.
Thank you in advance!
You can (most likely) achieve what you want without runtime_mappings by using a combination of bool queries. Here's how.
Let's define a test mapping
We need to clarify what mapping we are working with, because different field types require different query types.
Let's assume that your mapping looks like this:
PUT my-index-000001
{
"mappings": {
"dynamic": "runtime",
"properties": {
"country_en_name": {
"type": "text"
},
"lowest_local_price": {
"type": "float"
},
"global_rates": {
"properties": {
"UK": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"FR": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"US": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
}
}
}
}
}
}
Note that country_en_name is of type text. In general such fields should be indexed as keyword, but to demonstrate the use of runtime_mappings I kept it as text and will show later how to overcome this limitation.
bool is Elasticsearch's equivalent of if
The query without runtime mappings might look like this:
POST my-index-000001/_search
{
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"country_en_name": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
This can be interpreted as follows:
Any document
OR (
(document with country_en_name=UK AND lowest_local_price >= X)
OR
(document with global_rates.UK.lowest_global_price >= X)
)[boost this part of OR]
The match_all is needed so that documents that do not match the other queries are also returned.
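Since the country and the minimum price vary per request, the same query body can be generated programmatically. The sketch below is one way to do that in Python; build_boosted_price_query is a hypothetical helper name, and it assumes the test mapping above, where global prices live under global_rates.<country>.lowest_global_price.

```python
import json


def build_boosted_price_query(country, min_price, boost=2.0):
    """Build the bool/should body: return every doc, boost those whose price qualifies."""
    # Local branch: the doc's own country matches AND its local price is high enough.
    local_price = {
        "bool": {
            "must": [
                {"match": {"country_en_name": country}},
                {"range": {"lowest_local_price": {"gte": min_price}}},
            ]
        }
    }
    # Global branch: the doc has a global rate for this country that is high enough.
    global_price = {
        "range": {
            f"global_rates.{country}.lowest_global_price": {"gte": min_price}
        }
    }
    return {
        "query": {
            "bool": {
                "should": [
                    {"match_all": {}},
                    {"bool": {"should": [local_price, global_price], "boost": boost}},
                ]
            }
        }
    }


print(json.dumps(build_boosted_price_query("UK", 1000), indent=2))
```

The printed body can be sent as the search request; only the country code and the threshold change between calls.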
What will the response of the query look like?
Let's put some documents into ES:
POST my-index-000001/_doc/1
{
"country_en_name": "UK",
"lowest_local_price": 1500,
"global_rates": {
"FR": {
"lowest_global_price": 1000
},
"US": {
"lowest_global_price": 1200
}
}
}
POST my-index-000001/_doc/2
{
"country_en_name": "FR",
"lowest_local_price": 900,
"global_rates": {
"UK": {
"lowest_global_price": 950
},
"US": {
"lowest_global_price": 1500
}
}
}
POST my-index-000001/_doc/3
{
"country_en_name": "US",
"lowest_local_price": 950,
"global_rates": {
"UK": {
"lowest_global_price": 1100
},
"FR": {
"lowest_global_price": 1000
}
}
}
Now the result of the search query above will be something like:
{
...
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 4.9616585,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 4.9616585,
"_source" : {
"country_en_name" : "UK",
"lowest_local_price" : 1500,
...
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "3",
"_score" : 3.0,
"_source" : {
"country_en_name" : "US",
"lowest_local_price" : 950,
"global_rates" : {
"UK" : {
"lowest_global_price" : 1100
},
...
}
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"country_en_name" : "FR",
"lowest_local_price" : 900,
"global_rates" : {
"UK" : {
"lowest_global_price" : 950
},
...
}
}
}
]
}
}
Note that the document with _id: 2 is at the bottom because it didn't match any of the boosted queries.
Will runtime_mappings be of any use?
Runtime mappings are useful when an existing mapping uses data types that do not allow a certain type of query. In previous versions (before 7.11) one would have to reindex in such cases, but now it is possible to use runtime mappings instead (at the cost of a more expensive query).
In our case, country_en_name is indexed as text, which is suited for full-text search and not for exact lookups. We should rather use keyword instead. This is what the query may look like with the help of runtime_mappings:
POST my-index-000001/_search
{
"runtime_mappings": {
"country_en_name_keyword": {
"type": "keyword",
"script": {
"source": "emit(params['_source']['country_en_name'])"
}
}
},
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"country_en_name_keyword": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
Notice how we created a new runtime field country_en_name_keyword with type keyword and used a term lookup instead of match query.

Elasticsearch Nested 2 Step Sorting

Given the following data with nested objects (members within teams), I need to do a 2 step sort:
Return the youngest member of each team.
Sort the teams by the name of that youngest member.
I have a query below that is close: it does get the youngest member of each team, but then it sorts the teams using the names of all the members, not just the one selected per team.
What would the query be to do this?
And would such a query be performant assuming there was a lot of data? (Probably a few million objects each having 1-3 nested objects.)
Note: Although it's not clear in this simple example, I cannot simply store the youngest member, since in my real world case, the sorting of the nested objects is determined by a formula that includes an external parameter. This is just a very simplified example of the many sorts like this I would have to do on a larger data set, where I need to get the single best matching nested document for each outer document sorted in one way, but then sort the outer objects based on some other property of that selected nested object.
Data
PUT nested_test
{
"mappings": {
"dynamic": "strict",
"properties": {
"team": { "type": "keyword", "index": true, "doc_values": true },
"members": {
"type": "nested",
"properties": {
"name": { "type": "keyword", "index": true, "doc_values": true },
"age": { "type": "integer", "index": true, "doc_values": true}
}
}
}
}
}
PUT nested_test/_doc/1
{
"team" : "A" ,
"members" :
[
{ "name" : "Curt" , "age" : "34" } ,
{ "name" : "Dave" , "age" : "33" }
]
}
PUT nested_test/_doc/2
{
"team" : "B" ,
"members" :
[
{ "name" : "Alex" , "age" : "36" } ,
{ "name" : "Earl" , "age" : "32" }
]
}
PUT nested_test/_doc/3
{
"team" : "C" ,
"members" :
[
{ "name" : "Brad" , "age" : "35" } ,
{ "name" : "Gary" , "age" : "31" }
]
}
Attempted Query
GET nested_test/_search?filter_path=hits.hits._source.team,hits.hits.sort.*,hits.hits.inner_hits.members.hits.hits._source.*,hits.hits.inner_hits.members.hits.hits.sort.*
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "members",
"query": {
"match_all" : { }
} ,
"inner_hits": {
"size": 1,
"sort": {
"members.age": { "order": "asc" }
}
}
}
}
]
}
}
,
"sort": [
{ "members.name": {
"order": "asc" ,
"nested": {
"path": "members",
"filter": { "match_all" : { } }
}
} }
]
}
Results (If the query was correct, the teams would be in A, B, C order, but they are B, C, A)
{
"hits" : {
"hits" : [
{
"_source" : {
"team" : "B"
},
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"name" : "Earl",
"age" : "32"
}
}
]
}
}
}
},
{
"_source" : {
"team" : "C"
},
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"name" : "Gary",
"age" : "31"
}
}
]
}
}
}
},
{
"_source" : {
"team" : "A"
},
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"name" : "Dave",
"age" : "33"
}
}
]
}
}
}
}
]
}
}
It is not feasible with nested sort, and you can't use the result of the inner_hits to sort your documents.
You could maybe use a runtime field with a complex script to extract the name of the youngest member at search time, but it will certainly be ugly, and the query will perform poorly at scale.
Since you use a nested model, you have all the data needed at indexing time to store the youngest member's name in a dedicated field at the root of the document.
Then you will be able to use a standard sort for this use case.
It's the right way to do it in Elasticsearch if you want to keep the performance.
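The denormalization suggested above can be sketched in Python as a step that runs before each document is indexed. The function add_youngest_member_name and the field youngest_member_name are hypothetical names chosen for this example; with the "dynamic": "strict" mapping from the question, such a field would also have to be added to the mapping (for instance as keyword) before indexing.

```python
def add_youngest_member_name(doc):
    """Return a copy of doc with a root-level field holding the youngest member's name."""
    # Ages are strings in the sample data, so convert before comparing.
    youngest = min(doc["members"], key=lambda m: int(m["age"]))
    return {**doc, "youngest_member_name": youngest["name"]}


teams = [
    {"team": "A", "members": [{"name": "Curt", "age": "34"},
                              {"name": "Dave", "age": "33"}]},
    {"team": "B", "members": [{"name": "Alex", "age": "36"},
                              {"name": "Earl", "age": "32"}]},
    {"team": "C", "members": [{"name": "Brad", "age": "35"},
                              {"name": "Gary", "age": "31"}]},
]

# Enrich every document before indexing it.
enriched = [add_youngest_member_name(t) for t in teams]

# Local stand-in for a plain sort on the new root-level field.
ordered = sorted(enriched, key=lambda d: d["youngest_member_name"])
print([d["team"] for d in ordered])
```

Sorting on the precomputed field gives the A, B, C order the question asks for (Dave, Earl, Gary) without any nested sort at query time.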

Combining nested query get illegal_state_exception failed to find nested object under path

I'm creating an Elasticsearch query to find documents across all indices.
I need to combine should, must and nested queries. I get the right result, but I also get an error inside the response.
This is the query I'm using:
GET _all/_search
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{ "term": { "trimmed_final_url": "https://www.repubblica.it/t.../" } }
],
"must": [
{
"nested": {
"path": "entities",
"query": {
"bool": {
"must": [
{ "term": { "entities.id": "138511" } }
]
}
}
}
},
{
"term": {
"language": { "value": "it" }
}
}
]
}
}
}
And this is the result
{
"_shards" : {
"total" : 38,
"successful" : 14,
"skipped" : 0,
"failed" : 24,
"failures" : [
{
"shard" : 0,
"index" : ".kibana_1",
"node" : "7twsq85TSK60LkY0UiuWzA",
"reason" : {
"type" : "query_shard_exception",
"reason" : """
failed to create query: {
...
"index_uuid" : "HoHi97QFSaSCp09iSKY1DQ",
"index" : ".reporting-2019.06.02",
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "[nested] failed to find nested object under path [entities]"
}
}
},
...
"hits" : {
"total" : {
"value" : 50,
"relation" : "eq"
},
"max_score" : 16.90015,
"hits" : [
{
"_index" : "i_201906_v1",
"_type" : "_doc",
"_id" : "MugcbmsBAzi8a0oJt96Q",
"_score" : 16.90015,
"_source" : {
"language" : "it",
"entities" : [
{
"id" : 101580,
},
{
"id" : 156822,
},
...
I omitted some fields because the output is too long.
I am new to Stack Overflow (made this account to answer this question :D), so if this answer is out of line, bear with me. I have been dabbling in nested fields in Elasticsearch recently, so I have some ideas as to why this error could be appearing.
Have you defined a mapping for your document type? I don't believe Elasticsearch will recognize the field as nested unless you tell it to do so in the mapping:
PUT INDEX_NAME
{
"mappings": {
"DOC_TYPE": {
"properties": {
"entities": {"type": "nested"}
}
}
}
}
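You may have to specify this mapping for each index and document type; I'm not sure whether there is a way to do that for all of them with one request. Note that the failed shards in your output belong to indices such as .kibana_1 and .reporting-2019.06.02, which have no entities field, so searching only your own indices instead of _all (or setting "ignore_unmapped": true on the nested query) should avoid those failures.
I also noticed you have a "should" clause with minimum_should_match set to 1. With a single should clause, I believe this behaves the same as a "must" clause, so I am not sure what purpose it serves (correct me if I'm wrong). If your mapping is specified, the query should look something like this: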
GET /_all/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "entities",
"query": {
"term": {
"entities.id": {
"value": "138511"
}
}
}
}
},
{
"term": {
"language": {
"value": "it"
}
}
},
{
"term": {
"trimmed_final_url": {
"value": "https://www.repubblica.it/t.../"
}
}
}
]
}
}
}
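As a hedged sketch of an alternative to restricting the index list, the request body can be built with the nested query's ignore_unmapped option, which makes indices that lack the nested path return no hits for that clause instead of failing the shard. The helper nested_term_query below is a hypothetical name for this illustration; the field names and values are taken from the question.

```python
import json


def nested_term_query(path, field, value, ignore_unmapped=True):
    """Build a nested term query that tolerates indices without the nested path."""
    return {
        "nested": {
            "path": path,
            # Indices where `path` is not mapped match nothing instead of erroring.
            "ignore_unmapped": ignore_unmapped,
            "query": {"term": {f"{path}.{field}": {"value": value}}},
        }
    }


body = {
    "query": {
        "bool": {
            "must": [
                nested_term_query("entities", "id", "138511"),
                {"term": {"language": {"value": "it"}}},
                {"term": {"trimmed_final_url": {"value": "https://www.repubblica.it/t.../"}}},
            ]
        }
    }
}

print(json.dumps(body, indent=2))
```

Whether this is preferable to searching a narrower index pattern depends on the setup; it only hides the shard failures for indices that were never meant to match.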

Aggregate nested objects in ElasticSearch

Let's say we have this document:
{
"Article" : [
{
"id" : 12,
"title" : "An article title",
"categories" : [1,3,5,7],
"tag" : ["elasticsearch", "symfony", "Obtao"],
"author" : [
{
"firstname" : "Francois",
"surname": "francoisg",
"id" : 18
},
{
"firstname" : "Gregory",
"surname" : "gregquat",
"id" : "2"
}
]
},
{
"id" : 13,
"title" : "A second article title",
"categories" : [1,7],
"tag" : ["elasticsearch", "symfony", "Obtao"],
"author" : [
{
"firstname" : "Gregory",
"surname" : "gregquat",
"id" : "2"
}
]
}
]
}
How can I find all unique authors by id? What is the proper query? I need to return all unique authors ("author.id").
Thanks for your help.
First, you should set your mapping with the nested type for the author field.
Second, as @Taras_Kohut mentioned, after you have re-indexed the entire data set, you can do:
{
"size": 0,
"aggregations": {
"records": {
"nested": {
"path": "author"
},
"aggregations": {
"ids": {
"terms": {
"field": "author.id"
}
}
}
}
}
}
See Nested Aggregation
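To make it concrete what the nested terms aggregation computes, here is a small Python model over the two sample articles. It is only a sketch: author_id_buckets is a hypothetical function, and converting every id to an integer is an assumption made here because the sample data mixes 18 and "2".

```python
from collections import Counter

# The two sample articles, reduced to the fields the aggregation looks at.
articles = [
    {"id": 12, "author": [{"firstname": "Francois", "id": 18},
                          {"firstname": "Gregory", "id": "2"}]},
    {"id": 13, "author": [{"firstname": "Gregory", "id": "2"}]},
]


def author_id_buckets(docs):
    """Count nested author objects per id, like terms buckets inside a nested aggregation."""
    # Assumption: ids are normalized to int, since the sample mixes numbers and strings.
    counts = Counter(int(a["id"]) for d in docs for a in d["author"])
    # Most frequent ids first, mirroring the default bucket order by document count.
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


buckets = author_id_buckets(articles)
print(buckets)
print(sorted(b["key"] for b in buckets))  # the unique author ids
```

Each bucket key is one unique author id, and doc_count is the number of nested author objects carrying that id.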
