elasticsearch sort nested_filter not matching nested_path - sorting

I have an index with items of this form
{
"_index": "identity-index",
"_source": {
"names": [
"test"
],
"private": {
"lists": [
{
"listId": "56b8a0197f3c56654f8751b5",
"ratings": [
{
"rating": 4,
"authorId": "56499b7a97e3aa857cdc4f1d"
},
{
"rating": 4,
"authorId": "56b36646a24d50866de77928"
},
{
"rating": 4,
"authorId": "56cb16005082871b33ab1a60"
},
{
"rating": 4,
"authorId": "56b216a4c28edca956fe96d4"
},
{
"rating": 4,
"authorId": "56b34e8d8e324180259252f7"
}
]
},
{
"listId": "56c1c508da49cdd9662b102c"
}
]
}
},
"sort": [
"-Infinity"
]
}
I want to sort them by average rating given a listId:
I've tried a lot of ways and the closest I got was with this:
"sort": {
"private.lists.ratings.rating": {
"missing": "_last",
"order": "desc",
"mode": "avg",
"nested_path": "private.lists.ratings",
"nested_filter": {
"term": {
"private.lists.listId": "56c1c508da49cdd9662b102c"
}
}
}
},
The problem is that this scores everything as -Inf. I can't find any way to sort the nested elements in private.lists.ratings but taking into account the filter by private.lists.listId. The nested_path and nested_filter fields are different and I don't think they are supposed to be.

If the ratings field is mapped with type nested, you can get what you want by copying the listId into each of the nested objects.
Unfortunately, nested objects are not part of the main document, and nested_filter (and nested_sort) can only disambiguate based on properties contained in each subdocument.
One solution could be to flatten your structure to a simple list of objects looking like the following:
{
"listId": "56b8a0197f3c56654f8751b5",
"rating": 4,
"authorId": "56499b7a97e3aa857cdc4f1d"
}
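As a rough sketch of that flattening (not a tested reindexing job), the Python below copies each listId onto its ratings and builds a sort clause in the same nested_path / nested_filter style as the question. The field name private.ratings and both function names are assumptions made up for illustration; use whatever your mapping actually calls the flattened nested array.

```python
def flatten_ratings(source):
    """Copy each list's listId onto its ratings and return one flat list."""
    flat = []
    for lst in source.get("private", {}).get("lists", []):
        # Lists without a "ratings" key simply contribute nothing.
        for rating in lst.get("ratings", []):
            flat.append({"listId": lst["listId"], **rating})
    return flat


def sort_by_list_average(list_id, path="private.ratings"):
    """Sort clause that averages only the ratings belonging to list_id.

    `path` is a hypothetical name for the flattened nested field.
    """
    return {
        f"{path}.rating": {
            "missing": "_last",
            "order": "desc",
            "mode": "avg",
            "nested_path": path,
            # The filter now targets a property inside the same nested object.
            "nested_filter": {"term": {f"{path}.listId": list_id}},
        }
    }


doc = {
    "private": {
        "lists": [
            {
                "listId": "56b8a0197f3c56654f8751b5",
                "ratings": [
                    {"rating": 4, "authorId": "56499b7a97e3aa857cdc4f1d"},
                    {"rating": 4, "authorId": "56b36646a24d50866de77928"},
                ],
            },
            {"listId": "56c1c508da49cdd9662b102c"},
        ]
    }
}

flat = flatten_ratings(doc)
sort_clause = sort_by_list_average("56b8a0197f3c56654f8751b5")
```

With this shape the filter and the sorted value live in the same subdocument, which is the condition the answer describes.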

Related

Filter documents out of the facet count in enterprise search

We use enterprise search indexes to store items that can be tagged by multiple tenants.
e.g
[
{
"id": 1,
"name": "document 1",
"tags": [
{ "company_id": 1, "tag_id": 1, "tag_name": "bla" },
{ "company_id": 2, "tag_id": 1, "tag_name": "bla" }
]
}
]
I'm looking for a way to retrieve all documents with only the tags of company 1.
This request:
{
"query": "",
"facets": {
"tags": {
"type": "value"
}
},
"sort": {
"created": "desc"
},
"page": {
"size": 20,
"current": 1
}
}
Is coming back with
...
"facets": {
"tags": [
{
"type": "value",
"data": [
{
"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
},
{
"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
}
]
}
],
}
...
Can I modify the request in a way such that I get no tags by "company_id" = 2 ?
I have a solution that involves modifying the results to strip the extra data after they are retrieved but I'm looking for a better solution.
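For reference, the post-processing workaround mentioned above could look roughly like this in Python. It is only a client-side sketch, not a server-side answer: it assumes the facet values are the JSON-encoded strings shown in the response, and the function name tags_for_company is made up for illustration.

```python
import json


def tags_for_company(facet_data, company_id):
    """Keep only the facet entries whose decoded tag belongs to company_id."""
    kept = []
    for entry in facet_data:
        tag = json.loads(entry["value"])  # each facet value is a JSON string
        if tag.get("company_id") == company_id:
            kept.append({"tag": tag, "count": entry["count"]})
    return kept


# The "data" array from the facet response shown above.
facet_data = [
    {"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}", "count": 1},
    {"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}", "count": 1},
]

company_one = tags_for_company(facet_data, 1)
```

Note that the counts are passed through unchanged, so they still reflect every document that matched the request.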

Sorting by a nested field in elasticsearch

If I had a data structure that looked like this:
[{"_id": 1,
"scores": [{"student_id": 1, "score": 100}, {"student_id": 2, "score": 80}
]},
{"_id": 2,
"scores": [{"student_id": 1, "score": 20}, {"student_id": 2, "score": 90}
]}]
Would it be possible to sort this dataset by student_1's score or by student_2's score?
For example if I sorted descending by student 1's score, I would get document 1,2, but if I sorted descending by student 2's score, I would get 2,1.
I could re-arrange the data, but I don't want to use another index because there's a bunch of metadata not included above for brevity. Thanks!
Yes, it is possible. You must use the "nested" field type for your scores; that way you keep the relation between each student_id and its score.
You can read an article I wrote about that subject:
https://opster.com/guides/elasticsearch/data-architecture/elasticsearch-nested-field-object-field/
Now the example:
Mappings
PUT test_students
{
"mappings": {
"properties": {
"scores": {
"type": "nested",
"properties": {
"student_id": {
"type": "keyword"
},
"score": {
"type": "long"
}
}
}
}
}
}
Documents
PUT test_students/_doc/1
{
"scores": [{"student_id": 1, "score": 100}, {"student_id": 2, "score": 80}]
}
PUT test_students/_doc/2
{
"scores": [{"student_id": 1, "score": 20}, {"student_id": 2, "score": 90}]
}
Query
POST test_students/_search
{
"sort" : [
{
"scores.score" : {
"mode" : "max",
"order" : "desc",
"nested": {
"path": "scores",
"filter": {
"term" : { "scores.student_id" : "2" }
}
}
}
}
]
}
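To see what that sort does without a cluster, here is a small Python sketch. nested_sort builds the same sort clause for any student, and emulate_sort is a purely local stand-in (an assumption about the behaviour, not Elasticsearch code) that takes the max matching score per document and puts documents with no match last.

```python
def nested_sort(student_id, order="desc"):
    """Build the sort clause from the query above for a given student."""
    return [
        {
            "scores.score": {
                "mode": "max",
                "order": order,
                "nested": {
                    "path": "scores",
                    "filter": {"term": {"scores.student_id": str(student_id)}},
                },
            }
        }
    ]


def emulate_sort(docs, student_id, order="desc"):
    """Local emulation: rank docs by the max score of the matching student."""

    def best(doc):
        scores = [
            s["score"]
            for s in doc["scores"]
            if str(s["student_id"]) == str(student_id)
        ]
        return max(scores) if scores else None

    keyed = [(best(doc), doc["_id"]) for doc in docs]
    present = sorted(
        (k for k in keyed if k[0] is not None),
        key=lambda k: k[0],
        reverse=(order == "desc"),
    )
    missing = [k for k in keyed if k[0] is None]  # no matching student: goes last
    return [doc_id for _, doc_id in present + missing]


docs = [
    {"_id": 1, "scores": [{"student_id": 1, "score": 100}, {"student_id": 2, "score": 80}]},
    {"_id": 2, "scores": [{"student_id": 1, "score": 20}, {"student_id": 2, "score": 90}]},
]

by_student_1 = emulate_sort(docs, 1)  # [1, 2]
by_student_2 = emulate_sort(docs, 2)  # [2, 1]
```

That matches the ordering asked for in the question: 1,2 for student 1 and 2,1 for student 2.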

can I filter an array in elastic?

I had to insert a huge amount of data into elastic and I have done it in the following manner.
I need to query this object but I am unable to filter the "logData" array. Can someone help me out here? Is it even possible to filter an array in elastic?
"_source":{
"FileName": "fileName.log",
"logData": [
{
"LineNumber": 1,
"Data": "data1"
},
{
"LineNumber": 2,
"Data": "Data2"
},
{
"LineNumber": 3,
"Data": "Data3"
},
{
"LineNumber": 4,
"Data": "Data4"
},
{
"LineNumber": 5,
"Data": "Data5"
},
{
"LineNumber": 6,
"Data": "Data6"
}
]}
Is there a way to query such that I get only a few items from this array?
Like:
"_source":{
"FileName": "fileName.log",
"logData": [
{
"LineNumber": 1,
"Data": "data1"
},
{
"LineNumber": 2,
"Data": "Data2"
},
{
"LineNumber": 3,
"Data": "Data3"
}
]
}
There's no dedicated array mapping type in ES.
With that being said, when you have an array of objects with shared keys, it's recommended that you use the nested field type to preserve the connections of the individual sub-objects' attributes. If you don't use nested, the objects will be flattened which may lead to seemingly wrong query results.
As to the actual query -- assuming your mapping looks something like this:
PUT logs_index
{
"mappings": {
"properties": {
"logData": {
"type": "nested"
}
}
}
}
you'll need to filter those logData sub-documents of interest, perhaps with a terms query. Then and only then can you extract only those array objects that matched this query (LineNumber: 1 or 2 or 3).
The technique for that is called inner_hits:
POST logs_index/_search
{
"_source": ["FileName", "inner_hits.logData"],
"query": {
"nested": {
"path": "logData",
"query": {
"terms": {
"logData.LineNumber": [
1,
2,
3
]
}
},
"inner_hits": {}
}
}
}
Check this thread for more info.
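The matched sub-documents come back under each hit's inner_hits section rather than in _source, so the client has to pull them out. A minimal Python sketch follows; the sample response is hand-written to mirror the usual response shape and is not real output, and matched_log_lines is a made-up helper name.

```python
def matched_log_lines(response, path="logData"):
    """Collect, per top-level hit, the nested objects that matched the query."""
    results = []
    for hit in response["hits"]["hits"]:
        inner = (
            hit.get("inner_hits", {})
            .get(path, {})
            .get("hits", {})
            .get("hits", [])
        )
        results.append(
            {
                "FileName": hit["_source"].get("FileName"),
                path: [h["_source"] for h in inner],
            }
        )
    return results


# Illustrative response only (assumed shape, trimmed to the relevant keys).
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {"FileName": "fileName.log"},
                "inner_hits": {
                    "logData": {
                        "hits": {
                            "hits": [
                                {"_source": {"LineNumber": 1, "Data": "data1"}},
                                {"_source": {"LineNumber": 2, "Data": "Data2"}},
                                {"_source": {"LineNumber": 3, "Data": "Data3"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

filtered = matched_log_lines(sample_response)
```

As far as I know inner_hits only returns the top three matches by default, so if you expect more matching lines you would set "size" inside the inner_hits object.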

Elasticsearch - How does one combine term suggestions from multiple fields?

The term suggester documentation lays out the basics of term suggester, but it leaves me wondering how I can find suggestions from multiple fields and combine them. I can probably come up with some implementation after-the-fact, but I'm wondering if there are some settings I'm missing.
For example, let's say I want to get suggestions from three different fields
GET product-search-product/_search
{
"suggest": {
"text": "som typu here",
"my-suggest-1": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_one"
}
},
"my-suggest-2": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_two"
}
},
"my-suggest-3": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_three"
}
}
}
}
This returns results I can use, but I have to figure out which field had the "best" suggestion.
"suggest": {
"my-suggest-1": [
{
"text": "som",
...
"options": [
{
"text": "somi"
...
}
]
},
{
"text": "typu",
...
"options": [
{
"text": "typo"
...
}
]
},
{
"text": "here",
...
"options": []
}
],
"my-suggest-2": [
{
"text": "som",
...
"options": [
{
"text": "some"
...
}
]
},
{
"text": "typu",
...
"options": []
},
{
"text": "here",
...
"options": []
}
],
"my-suggest-3": [
{
"text": "som",
...
"options": []
},
{
"text": "typu",
...
"options": [
{
"text": "typa"
...
}
]
},
{
"text": "here",
...
"options": []
}
]
}
It looks to me as if I have to implement something to determine which field came up with the best suggestions. Is there no way to combine these in the suggester so it can do that for me?
Phrase suggester was appropriate for my case and with the phrase suggester there exist candidate generators which appear to solve my problem.
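A hedged sketch of what such a request body could look like, built in Python: one phrase suggester with one direct_generator per field. The suggester name combined-suggest, the helper name, and the choice of the first field as the phrase suggester's own "field" are assumptions for illustration; in practice that field usually needs a suitable analyzer (for example shingles), which is not shown here.

```python
def phrase_suggest_body(text, fields, size=1, prefix_length=3, max_edits=1):
    """Build a phrase-suggest request with one candidate generator per field."""
    return {
        "suggest": {
            "text": text,
            "combined-suggest": {  # hypothetical suggester name
                "phrase": {
                    # Assumption: the first field backs the language model.
                    "field": fields[0],
                    "size": size,
                    "direct_generator": [
                        {
                            "field": field,
                            "suggest_mode": "always",
                            "prefix_length": prefix_length,
                            "max_edits": max_edits,
                        }
                        for field in fields
                    ],
                }
            },
        }
    }


body = phrase_suggest_body(
    "som typu here", ["field_one", "field_two", "field_three"]
)
```

The body would be sent to the same _search endpoint as the term suggester request; the candidates from all three generators are then scored together instead of being returned as three separate lists.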

Extract record from multiple arrays based on a filter

I have documents in ElasticSearch with the following structure :
"_source": {
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": [
"€ 139",
"€ 125",
"€ 120",
"€ 108"
],
"max_occupancy": [
2,
2,
1,
1
],
"type": [
"Type 1",
"Type 1 - (Tag)",
"Type 2",
"Type 2 (Tag)"
],
"availability": [
10,
10,
10,
10
],
"size": [
"26 m²",
"35 m²",
"47 m²",
"31 m²"
]
}
}
Basically, each detail record is split across 5 arrays (price, max_occupancy, type, availability, size), and the fields of one record share the same index position in all 5 arrays. I want to extract the record whose max_occupancy is greater than or equal to 2 (if there is no record with 2, take a 3; if there is no 3, take a 4; and so on) and that has the lowest price, then place the result into a new JSON object like the following:
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": "€ 125",
"max_occupancy": "2",
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
}
Basically, the result should show the extracted record (in this case the second element of every array) together with the general information (the fields "last_updated" and "country").
Is it possible to extract such a result from elastic search? What kind of query do I need to perform?
Could someone suggest the best approach?
My best approach: go nested with the Nested datatype.
Besides easier querying, it is easier to read and understand the connections between those objects, which are currently scattered across different arrays.
Yes, if you choose this approach you will have to edit your mapping and re-index all your data.
What would the mapping look like? Something like this:
{
"mappings": {
"properties": {
"last_updated": {
"type": "date"
},
"country": {
"type": "keyword"
},
"records": {
"type": "nested",
"properties": {
"price": {
"type": "keyword"
},
"max_occupancy": {
"type": "long"
},
"type": {
"type": "keyword"
},
"availability": {
"type": "long"
},
"size": {
"type": "keyword"
}
}
}
}
}
}
EDIT: New document structure (containing nested documents) -
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"records": [
{
"price": "€ 139",
"max_occupancy": 2,
"type": "Type 1",
"availability": 10,
"size": "26 m²"
},
{
"price": "€ 125",
"max_occupancy": 2,
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
},
{
"price": "€ 120",
"max_occupancy": 1,
"type": "Type 2",
"availability": 10,
"size": "47 m²"
},
{
"price": "€ 108",
"max_occupancy": 1,
"type": "Type 2 (Tag)",
"availability": 10,
"size": "31 m²"
}
]
}
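Since re-indexing means converting every existing document, here is a minimal Python sketch of that conversion from the five parallel arrays to the records list. The helper name to_nested_document is made up, and the sketch assumes all five arrays have the same length (zip would silently drop trailing items otherwise).

```python
ARRAY_FIELDS = ("price", "max_occupancy", "type", "availability", "size")


def to_nested_document(source):
    """Turn the parallel arrays into a list of per-record objects."""
    columns = [source[name] for name in ARRAY_FIELDS]
    # zip(*columns) walks the arrays in lockstep: one tuple per record.
    records = [dict(zip(ARRAY_FIELDS, row)) for row in zip(*columns)]
    doc = {k: v for k, v in source.items() if k not in ARRAY_FIELDS}
    doc["records"] = records
    return doc


old_source = {
    "last_updated": "2017-10-25T18:33:51.434706",
    "country": "Italia",
    "price": ["€ 139", "€ 125", "€ 120", "€ 108"],
    "max_occupancy": [2, 2, 1, 1],
    "type": ["Type 1", "Type 1 - (Tag)", "Type 2", "Type 2 (Tag)"],
    "availability": [10, 10, 10, 10],
    "size": ["26 m²", "35 m²", "47 m²", "31 m²"],
}

new_doc = to_nested_document(old_source)
```

The resulting new_doc has the same shape as the document above and can be indexed against the nested mapping.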
Now it's easier to query for any specific condition with a Nested Query and Inner Hits. For example:
{
"_source": [
"last_updated",
"country"
],
"query": {
"bool": {
"must": [
{
"term": {
"country": "Italia"
}
},
{
"nested": {
"path": "records",
"query": {
"bool": {
"must": [
{
"range": {
"records.max_occupancy": {
"gte": 2
}
}
}
]
}
},
"inner_hits": {
"sort": {
"records.price": "asc"
},
"size": 1
}
}
}
]
}
}
}
Conditions are: country is Italia AND max_occupancy >= 2.
Inner hits: sort by price ascending order and get the first result.
Hope you'll find it useful
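One caveat worth checking: price is stored as text such as "€ 125", so sorting on it compares strings character by character, and a value like "€ 99" would sort after "€ 125". A rough local Python sketch of the intended selection, comparing prices numerically and merging the winner with the general fields, is below. The helper names are made up, and this only emulates the query's intent; it is not how Elasticsearch evaluates it.

```python
import re


def price_value(price):
    """Pull the number out of a string such as '€ 125'."""
    match = re.search(r"\d+(?:[.,]\d+)?", price)
    return float(match.group().replace(",", ".")) if match else float("inf")


def cheapest_record(doc, min_occupancy=2):
    """Pick the cheapest record with enough occupancy and add the general fields."""
    candidates = [r for r in doc["records"] if r["max_occupancy"] >= min_occupancy]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: price_value(r["price"]))
    general = {k: v for k, v in doc.items() if k != "records"}
    return {**general, **best}


doc = {
    "last_updated": "2017-10-25T18:33:51.434706",
    "country": "Italia",
    "records": [
        {"price": "€ 139", "max_occupancy": 2, "type": "Type 1", "availability": 10, "size": "26 m²"},
        {"price": "€ 125", "max_occupancy": 2, "type": "Type 1 - (Tag)", "availability": 10, "size": "35 m²"},
        {"price": "€ 120", "max_occupancy": 1, "type": "Type 2", "availability": 10, "size": "47 m²"},
        {"price": "€ 108", "max_occupancy": 1, "type": "Type 2 (Tag)", "availability": 10, "size": "31 m²"},
    ],
}

result = cheapest_record(doc)
```

If you want the sort done server-side and numerically, one option is to store an extra numeric price field at index time and sort the inner hits on that instead.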
