Cannot sort search results within nested objects in Elasticsearch 7

I want to sort objects in ascending order, but the sort doesn't work.
Here is the sort query:
"sort":[
{
"category.position": {
"order":"asc",
"mode":"min",
"nested": {
"path": "category",
"filter": {
"term": {"category_category_id":42} }
}
}
}]
And here are the objects:
"name": "Yeti",
"category": [
{
"category_id": 42,
"name": "Raamiga",
"position": 3
},
],
"name": "Venus",
"category": [
{
"category_id": 42,
"name": "Raamiga",
"position": 4
}
],
Please help! Many thanks in advance!

Solved. There was a typo: it must be "category.category_id" instead of "category_category_id".
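The fix works because fields referenced inside a nested sort filter are addressed by their full path, prefixed with the nested `path` (here `category.`). The Python sketch below builds the sort clause from the question as a plain dict and adds a small check for that prefix. Both `nested_sort` and `check_nested_paths` are hypothetical helpers written for illustration; they are not part of any Elasticsearch client, and the check only looks at `term` filters.

```python
import json

def nested_sort(field, path, filter_field, value, order="asc", mode="min"):
    """Build a nested sort clause shaped like the one in the question."""
    return {
        field: {
            "order": order,
            "mode": mode,
            "nested": {
                "path": path,
                "filter": {"term": {filter_field: value}},
            },
        }
    }

def check_nested_paths(sort_clause):
    """Return term-filter fields that are not prefixed with their nested path."""
    problems = []
    for options in sort_clause.values():
        nested = options.get("nested", {})
        prefix = nested.get("path", "") + "."
        term = nested.get("filter", {}).get("term", {})
        problems.extend(f for f in term if not f.startswith(prefix))
    return problems

broken = nested_sort("category.position", "category", "category_category_id", 42)
fixed = nested_sort("category.position", "category", "category.category_id", 42)

print(check_nested_paths(broken))  # ['category_category_id']
print(check_nested_paths(fixed))   # []
print(json.dumps({"sort": [fixed]}, indent=2))
```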

Related

Filter documents out of the facet count in enterprise search

We use enterprise search indexes to store items that can be tagged by multiple tenants.
For example:
[
{
"id": 1,
"name": "document 1",
"tags": [
{ "company_id": 1, "tag_id": 1, "tag_name": "bla" },
{ "company_id": 2, "tag_id": 1, "tag_name": "bla" }
]
}
]
I'm looking for a way to retrieve all documents with only the tags of company 1.
This request:
{
"query": "",
"facets": {
"tags": {
"type": "value"
}
},
"sort": {
"created": "desc"
},
"page": {
"size": 20,
"current": 1
}
}
It comes back with:
...
"facets": {
"tags": [
{
"type": "value",
"data": [
{
"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
},
{
"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
}
]
}
],
}
...
Can I modify the request so that I get no tags with "company_id" = 2?
I have a solution that involves modifying the results to strip the extra data after they are retrieved but I'm looking for a better solution.
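For reference, the post-retrieval workaround mentioned above can be sketched in Python. It assumes, as in the response shown, that each facet `value` is a JSON-encoded tag object; `keep_company_tags` is a hypothetical helper name. This only strips unwanted facet entries on the client and does not change what Enterprise Search computes.

```python
import json

def keep_company_tags(facets, company_id):
    """Return a copy of the facets where the 'tags' data keeps only one company's tags."""
    kept = []
    for facet in facets.get("tags", []):
        data = [
            entry for entry in facet["data"]
            # Each facet value is a JSON string such as '{"company_id":1,...}'.
            if json.loads(entry["value"]).get("company_id") == company_id
        ]
        kept.append({**facet, "data": data})
    return {**facets, "tags": kept}

facets = {
    "tags": [
        {
            "type": "value",
            "data": [
                {"value": '{"company_id":1,"tag_id":1,"tag_name":"bla"}', "count": 1},
                {"value": '{"company_id":2,"tag_id":1,"tag_name":"bla"}', "count": 1},
            ],
        }
    ]
}

filtered = keep_company_tags(facets, 1)
print(filtered["tags"][0]["data"])
```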

Extract record from multiple arrays based on a filter

I have documents in Elasticsearch with the following structure:
"_source": {
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": [
"€ 139",
"€ 125",
"€ 120",
"€ 108"
],
"max_occupancy": [
2,
2,
1,
1
],
"type": [
"Type 1",
"Type 1 - (Tag)",
"Type 2",
"Type 2 (Tag)"
],
"availability": [
10,
10,
10,
10
],
"size": [
"26 m²",
"35 m²",
"47 m²",
"31 m²"
]
}
}
Basically, the detail records are split across 5 arrays (price, max_occupancy, type, availability, size), and the fields of the same record share the same index position in all 5 arrays. I want to extract the record whose max_occupancy is greater than or equal to 2 (if there is no record with 2, take a 3; if there is no 3, take a 4, and so on) and that has the lowest price, and place the result into a new JSON object like the following:
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": "€ 125",
"max_occupancy": "2",
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
}
Basically, the result should contain the extracted record (in this case, the element at the second index of every array) plus the general information (the fields "last_updated" and "country").
Is it possible to extract such a result from Elasticsearch? What kind of query do I need to perform?
Could someone suggest the best approach?
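To make the selection rule concrete, here it is as a client-side Python sketch (not an Elasticsearch query): zip the parallel arrays into records, keep those with max_occupancy of at least 2, take the smallest such occupancy, then the lowest price. It assumes all arrays have the same length and that prices look like "€ 125" in whole euros; `pick_record` and `parse_price` are hypothetical names.

```python
def parse_price(text):
    """Turn a price string such as '€ 125' into an integer (assumes whole euros)."""
    return int(text.replace("€", "").strip())

def pick_record(source, min_occupancy=2):
    """Zip the parallel arrays into records and apply the selection rule."""
    fields = ["price", "max_occupancy", "type", "availability", "size"]
    records = [dict(zip(fields, values)) for values in zip(*(source[f] for f in fields))]
    candidates = [r for r in records if r["max_occupancy"] >= min_occupancy]
    if not candidates:
        return None
    # Smallest occupancy that is still >= min_occupancy, then the lowest price.
    best_occupancy = min(r["max_occupancy"] for r in candidates)
    best = min(
        (r for r in candidates if r["max_occupancy"] == best_occupancy),
        key=lambda r: parse_price(r["price"]),
    )
    return {"last_updated": source["last_updated"], "country": source["country"], **best}

source = {
    "last_updated": "2017-10-25T18:33:51.434706",
    "country": "Italia",
    "price": ["€ 139", "€ 125", "€ 120", "€ 108"],
    "max_occupancy": [2, 2, 1, 1],
    "type": ["Type 1", "Type 1 - (Tag)", "Type 2", "Type 2 (Tag)"],
    "availability": [10, 10, 10, 10],
    "size": ["26 m²", "35 m²", "47 m²", "31 m²"],
}

print(pick_record(source))
```

For the sample document this selects the second element ("€ 125", "Type 1 - (Tag)", "35 m²"), matching the desired output above.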
My suggested approach: switch to the nested datatype.
Besides making querying easier, it makes the connections between those objects, which are currently scattered across different arrays, easier to read and understand.
Note that if you decide on this approach, you will have to change your mapping and reindex all of your data.
The mapping would look something like this:
{
"mappings": {
"properties": {
"last_updated": {
"type": "date"
},
"country": {
"type": "keyword"
},
"records": {
"type": "nested",
"properties": {
"price": {
"type": "keyword"
},
"max_occupancy": {
"type": "long"
},
"type": {
"type": "keyword"
},
"availability": {
"type": "long"
},
"size": {
"type": "keyword"
}
}
}
}
}
}
EDIT: the new document structure (containing nested documents):
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"records": [
{
"price": "€ 139",
"max_occupancy": 2,
"type": "Type 1",
"availability": 10,
"size": "26 m²"
},
{
"price": "€ 125",
"max_occupancy": 2,
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
},
{
"price": "€ 120",
"max_occupancy": 1,
"type": "Type 2",
"availability": 10,
"size": "47 m²"
},
{
"price": "€ 108",
"max_occupancy": 1,
"type": "Type 2 (Tag)",
"availability": 10,
"size": "31 m²"
}
]
}
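Converting the existing documents into this shape is a mechanical step that could run on the client while reading the old documents and re-indexing them. The Python sketch below shows only that transformation; it makes no Elasticsearch calls, and `to_nested` is a hypothetical helper that assumes the five arrays are aligned by index.

```python
FIELDS = ["price", "max_occupancy", "type", "availability", "size"]

def to_nested(source):
    """Turn the five parallel arrays into one list of record objects."""
    if len({len(source[f]) for f in FIELDS}) != 1:
        raise ValueError("the parallel arrays have different lengths")
    records = [dict(zip(FIELDS, values)) for values in zip(*(source[f] for f in FIELDS))]
    return {
        "last_updated": source["last_updated"],
        "country": source["country"],
        "records": records,
    }

old_doc = {
    "last_updated": "2017-10-25T18:33:51.434706",
    "country": "Italia",
    "price": ["€ 139", "€ 125", "€ 120", "€ 108"],
    "max_occupancy": [2, 2, 1, 1],
    "type": ["Type 1", "Type 1 - (Tag)", "Type 2", "Type 2 (Tag)"],
    "availability": [10, 10, 10, 10],
    "size": ["26 m²", "35 m²", "47 m²", "31 m²"],
}

new_doc = to_nested(old_doc)
print(new_doc["records"][1])
# {'price': '€ 125', 'max_occupancy': 2, 'type': 'Type 1 - (Tag)', 'availability': 10, 'size': '35 m²'}
```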
Now it's easier to query for any specific condition with a nested query and inner hits. For example:
{
"_source": [
"last_updated",
"country"
],
"query": {
"bool": {
"must": [
{
"term": {
"country": "Italia"
}
},
{
"nested": {
"path": "records",
"query": {
"bool": {
"must": [
{
"range": {
"records.max_occupancy": {
"gte": 2
}
}
}
]
}
},
"inner_hits": {
"sort": {
"records.price": "asc"
},
"size": 1
}
}
}
]
}
}
}
Conditions: country is Italia AND max_occupancy >= 2.
Inner hits: sort by price in ascending order and return only the first result.
Hope you find it useful.
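One caveat with storing price as a string: sorting it ascending is lexicographic, not numeric. That gives the right answer for the sample data, where every price has three digits, but not for a hypothetical price such as "€ 95". The Python sketch below mimics the inner-hits step (filter on max_occupancy, sort by price, keep one) with both orderings; `cheapest` is a hypothetical helper, and the "€ 95" record is invented for illustration. Keeping a numeric price field next to the display string would be one way around this (an assumption, not part of the answer above).

```python
def cheapest(records, min_occupancy, key):
    """Mimic the inner_hits step: filter, sort ascending by `key`, keep the first hit."""
    matching = [r for r in records if r["max_occupancy"] >= min_occupancy]
    return sorted(matching, key=key)[0] if matching else None

records = [
    {"price": "€ 139", "max_occupancy": 2},
    {"price": "€ 125", "max_occupancy": 2},
    {"price": "€ 95", "max_occupancy": 2},   # hypothetical extra record
    {"price": "€ 108", "max_occupancy": 1},
]

as_string = cheapest(records, 2, key=lambda r: r["price"])
as_number = cheapest(records, 2, key=lambda r: int(r["price"].replace("€", "").strip()))
print(as_string["price"])  # € 125 (lexicographic: "1" sorts before "9")
print(as_number["price"])  # € 95
```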

Elasticsearch function_score decay not working, always returns 1

I've been trying to fix this for hours, but nothing seems to change the return value of the function_score decay function. It's simply 1 at all times. It looks like it can't read the integer value of the field I'm specifying.
The data model looks like this (obviously fake):
{
"basics": {
"name": "Mr Augustus Flybynight (Jim)",
"name_pref": "Jim",
"location": {
"city": "Melbourne",
"postalCode": "3040",
"meta": {
"country": "Australia"
},
"region": "VIC",
"address": "iytiytiyt, tyiuyti"
},
"email": "augustus.flybynight2#gmail.com",
"applicantNumber": "11882",
"name_first": "Augustus",
"meta": {
"alternateContact": "",
"lastModified": 1473353751,
"alternateName": "",
"notificationType": "-1",
"alternatePhones": [
],
"gender": "M"
},
"name_last": "Flybynight",
"phone": "44556677"
}
}
I have 3 duplicates of this entity that differ only in their timestamp (basics.meta.lastModified). I'm trying to create a 'closer is better' function score so that the latest one comes to the top. We haven't mapped the timestamp as a date yet, but it is mapped as an integer.
When I query with the following:
{
"query": {
"function_score": {
"functions": [
{
"gauss": {
"basics.meta.lastModified": {
"origin": 1474868635, // now
"offset": 86400, // one day
"scale": 604800, // seven days
"decay": 0.5
}
},
"weight": 2
}
],
"query": {
"bool": {
"should": [
{
"match": {
"_all": "augustus flybynight"
}
},
{
"match": {
"basics.all_names.all_names_identifier_whitespace": {
"query": "augustus flybynight",
"boost": 2
}
}
},
{
"match": {
"basics.email.email_identifier_keyword": {
"query": "augustus flybynight",
"boost": 3
}
}
},
{
"match": {
"basics.applicantNumber.applicantNumber_identifier_keyword": {
"query": "augustus flybynight",
"boost": 3
}
}
},
{
"wildcard": {
"basics.email.email_identifier_keyword": {
"wildcard": "augustus flybynight*",
"boost": 2
}
}
},
{
"wildcard": {
"basics.all_names.all_names_identifier_whitespace": {
"wildcard": "augustus flybynight*"
}
}
}
],
"must": []
}
}
}
},
"size": 25,
"from": 0,
"min_score": 0.2
}
But this always returns 1 for the function score, which is then multiplied with the query score and so has no effect. It's the weirdest thing.
When looking at the explanation, this is what's returned:
{
"value": 1,
"description": "min of:",
"details": [
{
"value": 1,
"description": "product of:",
"details": [
{
"value": 1,
"description": "Function for field basics.meta.lastModified:",
"details": [
{
"value": 1,
"description": "max(0.0, ((2.0 - MIN[0.0])/2.0)",
"details": [
]
}
]
},
{
"value": 1,
"description": "weight",
"details": [
]
}
]
},
{
"value": 3.4028235e+38,
"description": "maxBoost",
"details": [
]
}
]
}
It seems like 'MIN[0.0]' is the part that should be returning the timestamp, but instead it returns 0, making the decay function always 1. If I make the decay parameters stricter, like origin: 0, offset: 0, scale: 1 and decay: 0.5, I'd expect the function score to be close to 0, but it's still 1.
Please help. I've been trying everything and there doesn't seem to be a lot of examples online. Any suggestions would be welcomed.
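For comparison, the gauss decay documented for function_score is exp(-max(0, |value - origin| - offset)^2 / (2 * sigma^2)), with sigma^2 = -scale^2 / (2 * ln(decay)). The Python sketch below evaluates it for the sample document's lastModified and the parameters in the query, which gives a value far below 1. A result of exactly 1 is, as far as the documentation describes, what a decay function yields when the numeric field is missing from the document. `gauss_decay` is a hypothetical name for this re-implementation of the formula, not an Elasticsearch API.

```python
import math

def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gauss decay: exp(-max(0, |value - origin| - offset)^2 / (2 * sigma^2)),
    where sigma^2 = -scale^2 / (2 * ln(decay))."""
    distance = max(0.0, abs(value - origin) - offset)
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    return math.exp(-distance ** 2 / (2 * sigma_sq))

origin, offset, scale = 1474868635, 86400, 604800   # values from the query
last_modified = 1473353751                          # basics.meta.lastModified

score = gauss_decay(last_modified, origin, scale, offset)
print(score)       # roughly 0.02
print(2 * score)   # roughly 0.04 after "weight": 2
```

By construction, a document exactly offset + scale away from the origin scores the decay value (0.5), and anything within offset of the origin scores 1.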
For those hitting the same issue, I've finally found the culprit.
It seems that someone didn't set up the mappings properly: the basics.meta property was mapped as a nested type, but the data wasn't populated as such (you'd think that would have caused an issue when indexing data?). So when trying to access the data within it, it always returned MIN[0.0], because it simply couldn't find the value of the property.
So if you ever hit this issue, look through your mappings thoroughly instead of wasting a whole day like I did :|
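One quick way to spot such a mismatch is to list every path mapped as nested. The Python sketch below walks a `properties` tree such as the one returned by `GET /<index>/_mapping` (you would pass in the `properties` object from that response). The mapping fragment and the `nested_paths` helper are hypothetical, written to mirror the situation described above.

```python
def nested_paths(properties, prefix=""):
    """Yield the full path of every field mapped with "type": "nested"."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            yield path
        # Object and nested fields both keep their sub-fields under "properties".
        if "properties" in spec:
            yield from nested_paths(spec["properties"], prefix=path + ".")

# Hypothetical mapping fragment with basics.meta accidentally mapped as nested.
mapping = {
    "properties": {
        "basics": {
            "properties": {
                "name": {"type": "text"},
                "meta": {
                    "type": "nested",
                    "properties": {"lastModified": {"type": "integer"}},
                },
            }
        }
    }
}

print(list(nested_paths(mapping["properties"])))  # ['basics.meta']
```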

How to get parent objects and nested children objects by parent and children ids

I have a parent object (blogpost) and nested items (comments).
The parent and nested objects are both keyed by an id.
So if I fetch the parent object, I get all the children too:
GET /my_index/blogpost/1
{
"id": 1,
"title": "Nest eggs",
"body": "Making your money work...",
"tags": [ "cash", "shares" ],
"comments": [
{
"id": 2,
"comment": "Great article",
"age": 28,
"stars": 4,
"date": "2014-09-01"
},
{
"id": 4,
"comment": "More like this please",
"age": 31,
"stars": 5,
"date": "2014-10-22"
}
]
}
Question
However, I only want to fetch the parent and a subset of the children, based on the children's ids.
For example, my desired behaviour is this:
GET /my_index/blogpost/1?onlyGetCommentIds=4
{
"id": 1,
"title": "Nest eggs",
"body": "Making your money work...",
"tags": [ "cash", "shares" ],
"comments": [
{
"id": 4,
"comment": "More like this please",
"age": 31,
"stars": 5,
"date": "2014-10-22"
}
]
}
See in the above example that only comment id == 4 is returned along with the parent object.
How do I construct this query?
You need to use nested inner_hits to achieve this:
POST /my_index/blogpost/_search
{
"_source": {
"excludes": "comments"
},
"query": {
"bool": {
"filter": [
{
"term": {
"id": 1
}
},
{
"nested": {
"path": "comments",
"query": {
"term": {
"comments.id": 4
}
},
"inner_hits": {}
}
}
]
}
}
}
The response will include the parent object (without the comments nested objects) and another inner_hits section containing only the desired nested comment with id = 4.
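If you need the exact shape from the question (the parent plus only the matching comments), the inner hits can be merged back into the parent `_source` on the client. For a nested query, inner hits are keyed by the nested path by default, so they appear under `inner_hits.comments`. The Python sketch below uses an abbreviated, hypothetical hit; `merge_inner_hits` is a hypothetical helper.

```python
def merge_inner_hits(hit, path):
    """Put the matching nested objects from inner_hits back into the parent _source."""
    source = dict(hit["_source"])  # shallow copy so the response is not modified
    inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
    source[path] = [h["_source"] for h in inner]
    return source

# Abbreviated, hypothetical hit shaped like a search response for the query above.
hit = {
    "_id": "1",
    "_source": {"id": 1, "title": "Nest eggs", "body": "Making your money work...", "tags": ["cash", "shares"]},
    "inner_hits": {
        "comments": {
            "hits": {
                "hits": [
                    {"_source": {"id": 4, "comment": "More like this please", "age": 31, "stars": 5, "date": "2014-10-22"}}
                ]
            }
        }
    },
}

print(merge_inner_hits(hit, "comments"))
```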

Elasticsearch query on inner list and get only matching objects from list instead of entire list in result document

In the following Elasticsearch document, I need to find the comments from a specific name, e.g. "Mary Brown". Basically, I want to query on the inner list and get only the matching objects from the list instead of the entire list in the result document. Is this possible? I have mapped 'comments' as nested.
{
"title": "Investment secrets",
"body": "What they don't tell you ...",
"tags": [ "shares", "equities" ],
"comments": [
{
"name": "Mary Brown",
"comment": "Lies, lies, lies",
"age": 42,
"stars": 1,
"date": "2014-10-18"
},
{
"name": "John Smith",
"comment": "You're making it up!",
"age": 28,
"stars": 2,
"date": "2014-10-16"
},
{
"name": "Mary Brown",
"comment": "making it!!!",
"age": 42,
"stars": 3,
"date": "2014-10-20"
}
]
}
Since you have mapped your comments field as nested, yes, this is possible by adding inner_hits to the nested query, like this:
{
"_source": false,
"query": {
"nested": {
"path": "comments",
"inner_hits": {
"_source": [
"comment", "date"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"comments.name": "Mary Brown"
}
}
]
}
}
}
}
}
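With "_source": false at the top level, the matching comments are only available inside each hit's inner_hits. The Python sketch below collects them across all hits. The response is abbreviated and hypothetical, and it assumes each inner hit's `_source` carries the filtered comment fields as shown; `matching_nested` is a hypothetical helper.

```python
def matching_nested(response, path):
    """Collect the _source of every inner hit under `path`, across all top-level hits."""
    matches = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        matches.extend(inner_hit["_source"] for inner_hit in inner)
    return matches

# Abbreviated, hypothetical response; the top-level hit has no _source.
response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "comments": {
                        "hits": {
                            "hits": [
                                {"_source": {"comment": "Lies, lies, lies", "date": "2014-10-18"}},
                                {"_source": {"comment": "making it!!!", "date": "2014-10-20"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

for c in matching_nested(response, "comments"):
    print(c["date"], c["comment"])
```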
