Elasticsearch sorting by array of objects - elasticsearch

I have a column engagement like this along with other columns
record 1
"date":"2017-11-23T06:46:04.358Z",
"remarks": "test1",
"engagement": [
{
"name": "comment_count",
"value": 6
},
{
"name": "like_count",
"value": 2
}
],
....
....
record 2
"date":"2017-11-23T07:16:14.358Z",
"remarks": "test2",
"engagement": [
{
"name": "comment_count",
"value": 3
},
{
"name": "like_count",
"value": 9
}
],
....
....
I am storing objects in an array format, Now I want to sort the data by desc order of any given object name, e.g. value of like_count or value of share_count.
So if I sort by like_count then 2nd record should come before the 1st record as the value of like_count of the 2nd record is 9 compared to the value of like_count of the first record which is 2.
How to do this in elasticsearch?

You should have something like the following:
{
"query": {
"nested": {
"path": "engagement",
"filter": {
...somefilter...
}
}
},
"sort": {
"engagement.name": {
"order": "desc",
"mode": "min",
"nested_filter": {
...same.filter.as.before
}
}
}
}
Source: Elastic Docs

Related

Find all documnents with max value in field

I am new in elasticsearch, it is my first NoSql DB and I have some problem.
I have something like this:
index_logs: [
{
"entity_id": id_1,
"field_name": "name",
"old_value": null
"new_value": "some_name"
"number": 1
},
{
"entity_id": id_1,
"field_name": "description",
"old_value": null
"new_value": "some_descr"
"number": 1
},
{
"entity_id": id_1,
"field_name": "description",
"old_value": "some_descr"
"new_value": null
"number": 2
},
{
"entity_id": id_2,
"field_name": "enabled",
"old_value": true
"new_value":false
"number": 25
},
]
And I need to find all documents with specified entity_id and max_number which I do not know.
In postgres it will be as following code:
SELECT *
FROM logs AS x
WHERE x.entity_id = <some_entity_id>
AND x.number = (SELECT MAX(y.number) FROM logs AS y WHERE y.entity_id = x.entity_id)
How can I do it in elasticsearch?
Use the following query:
{
"query": {
"term": {
"entity_id": {
"value": "1"
}
}
},
"aggs": {
"max_number": {
"terms": {
"field": "number",
"size": 1,
"order": {
"_key": "desc"
}
},
"aggs": {
"top_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
The aggregation will group by the "number", sort by descending order of number, then give you the top 10 results with that number, inside the 'top_hits' subaggregation.
Another way to get "All" the documents is to simply use the same query, withouty any aggregations, and sort descending on the "number" field. On the client side you use pagination with "search_after", and paginate all the results, till the "number" field changes. The first time the number changes, you exit your pagination loop and you have all the records with the given entity id and the max number.

Group by terms and get count of nested array property?

I would like to get the count from a document series where an array item matches some value.
I have documents like these:
{
"Name": "jason",
"Todos": [{
"State": "COMPLETED"
"Timer": 10
},{
"State": "PENDING"
"Timer": 5
}]
}
{
"Name": "jason",
"Todos": [{
"State": "COMPLETED"
"Timer": 5
},{
"State": "PENDING"
"Timer": 2
}]
}
{
"Name": "martin",
"Todos": [{
"State": "COMPLETED"
"Timer": 15
},{
"State": "PENDING"
"Timer": 10
}]
}
I would like to count how many documents I have where they have any Todos with COMPLETED State. And group by Name.
So from the above I would need to get:
jason: 2
martin: 1
Usually I do this with a term aggregation for the Name, and an other sub aggregation for other items:
"aggs": {
"statistics": {
"terms": {
"field": "Name"
},
"aggs": {
"test": {
"filter": {
"bool": {
"must": [{
"match_phrase": {
"SomeProperty.keyword": {
"query": "THEVALUE"
}
}
}
]
}
},
But not sure how to do this here as I have items in an array.
Elasticsearch has no problem with arrays because in fact it flattens them by default:
Arrays of inner object fields do not work the way you may expect. Lucene has no concept of inner objects, so Elasticsearch flattens object hierarchies into a simple list of field names and values.
So a query like the one you posted will do. I would use term query for keyword datatype, though:
POST mytodos/_search
{
"size": 0,
"aggs": {
"by name": {
"terms": {
"field": "Name"
},
"aggs": {
"how many completed": {
"filter": {
"term": {
"Todos.State": "COMPLETED"
}
}
}
}
}
}
}
I am assuming your mapping looks something like this:
PUT mytodos/_mappings
{
"properties": {
"Name": {
"type": "keyword"
},
"Todos": {
"properties": {
"State": {
"type": "keyword"
},
"Timer": {
"type": "integer"
}
}
}
}
}
The example documents that you posted will be transformed internally into something like this:
{
"Name": "jason",
"Todos.State": ["COMPLETED", "PENDING"],
"Todos.Timer": [10, 5]
}
However, if you need to query for Todos.State and Todos.Timer, for example, filter for those "COMPLETED" but only with Timer > 10, it will not be possible with such mapping because Elasticsearch forgets the link between fields of object array items.
In this case you would need to use something like nested datatype for such arrays, and query them with special nested query.
Hope that helps!

Elasticsearch query fails to return results when querying a nested object

I have an object which looks something like this:
{
"id": 123,
"language_id": 1,
"label": "Pablo de la Pena",
"office": {
"count": 2,
"data": [
{
"id": 1234,
"is_office_lead": false,
"office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
},
{
"id": 5678,
"is_office_lead": false,
"office": {
"id": 2,
"address_line_1": "77 High Road",
"address_line_2": "Edinburgh",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "EH1 2DE",
"city_id": 2
}
}
]
},
"primary_office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
}
My Elasticsearch mapping looks like this:
"mappings": {
"item": {
"properties": {
"office": {
"properties": {
"data": {
"type": "nested",
}
}
}
}
}
}
My Elasticsearch query looks something like this:
GET consultant/item/_search
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"term": {
"language_id": 1
}
},
{
"term": {
"office.data.office.city_id": 1
}
}
]
}
}
}
This returns zero results, however, if I remove the second term and leave it only with the language_id clause, then it works as expected.
I'm sure this is down to a misunderstading on my part of how the nested object is flattened, but I'm out of ideas - I've tried all kinds of permutations of the query and mappings.
Any guidance hugely appreciated. I am using Elasticsearch 6.1.1.
I'm not sure if you need the entire record or not, this solution gives every record that has language_id: 1 and has an office.data.office.id: 1 value.
GET consultant/item/_search
{
"from": 0,
"size": 100,
"query": {
"bool":{
"must": [
{
"term": {
"language_id": {
"value": 1
}
}
},
{
"nested": {
"path": "office.data",
"query": {
"match": {
"office.data.office.city_id": 1
}
}
}
}
]
}
}
}
I put 3 different records in my test index for proofing against false hits, one with different language_id and one with different office ids and only the matching one returned.
If you only need the office data, then that's a bit different but still solvable.

Extract record from multiple arrays based on a filter

I have documents in ElasticSearch with the following structure :
"_source": {
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": [
"€ 139",
"€ 125",
"€ 120",
"€ 108"
],
"max_occupancy": [
2,
2,
1,
1
],
"type": [
"Type 1",
"Type 1 - (Tag)",
"Type 2",
"Type 2 (Tag)",
],
"availability": [
10,
10,
10,
10
],
"size": [
"26 m²",
"35 m²",
"47 m²",
"31 m²"
]
}
}
Basically, the details records are split in 5 arrays, and fields of the same record have the same index position in the 5 arrays. As can be seen in the example data there are 5 array(price, max_occupancy, type, availability, size) that are containing values related to the same element. I want to extract the element that has max_occupancy field greater or equal than 2 (if there is no record with 2 grab a 3 if there is no 3 grab a four, ...), with the lower price, in this case the record and place the result into a new JSON object like the following :
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price: ": "€ 125",
"max_occupancy": "2",
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
}
Basically the result structure should show the extracted record(that in this case is the second index of all array), and add the general information to it(fields : "last_updated", "country").
Is it possible to extract such a result from elastic search? What kind of query do I need to perform?
Could someone suggest the best approach?
My best approach: go nested with Nested Datatype
Except for easier querying, it easier to read and understand the connections between those objects that are, currently, scattered in different arrays.
Yes, if you'll decide this approach you will have to edit your mapping and re-index your entire data.
How would the mapping is going to look like? something like this:
{
"mappings": {
"properties": {
"last_updated": {
"type": "date"
},
"country": {
"type": "string"
},
"records": {
"type": "nested",
"properties": {
"price": {
"type": "string"
},
"max_occupancy": {
"type": "long"
},
"type": {
"type": "string"
},
"availability": {
"type": "long"
},
"size": {
"type": "string"
}
}
}
}
}
}
EDIT: New document structure (containing nested documents) -
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"records": [
{
"price": "€ 139",
"max_occupancy": 2,
"type": "Type 1",
"availability": 10,
"size": "26 m²"
},
{
"price": "€ 125",
"max_occupancy": 2,
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
},
{
"price": "€ 120",
"max_occupancy": 1,
"type": "Type 2",
"availability": 10,
"size": "47 m²"
},
{
"price": "€ 108",
"max_occupancy": 1,
"type": "Type 2 (Tag)",
"availability": 10,
"size": "31 m²"
}
]
}
Now, its more easy to query for any specific condition with Nested Query and Inner Hits. for example:
{
"_source": [
"last_updated",
"country"
],
"query": {
"bool": {
"must": [
{
"term": {
"country": "Italia"
}
},
{
"nested": {
"path": "records",
"query": {
"bool": {
"must": [
{
"range": {
"records.max_occupancy": {
"gte": 2
}
}
}
]
}
},
"inner_hits": {
"sort": {
"records.price": "asc"
},
"size": 1
}
}
}
]
}
}
}
Conditions are: Italia AND max_occupancy > 2.
Inner hits: sort by price ascending order and get the first result.
Hope you'll find it useful

Elastic Search Grouped Queries

I'm indexing an array of key value pairs. The key is always a UUID and the value is a user entered value. I've been crawling through the documentation but I can't figure out exactly how to query in this scenarioExample schema:
{
"id": 1,
"owner_id": 1,
"values": [
{ "key": "k3kfa23rewf", "value": "the red card" },
{ "key": "23a2dd23108", "value": "purple balloons" },
]
},
{
"id": 2,
"owner_id": 1,
"values": [
{ "key": "k3kfa23rewf", "value": "the blue card" },
{ "key": "23a2dd23108", "value": "purple balloons" },
]
}
I would like to query:
{ "term": { "owner_id": 1 },
{ "term": { "values.key": "23a2dd23108" }, "match": { "values.value": "purple" } },
{ "term": { "values.key": "k3kfa23rewf" }, "match": { "values.value": "blue" } }
So that the record with ID 2 is returned. Any suggestions?
I think that you need here to use nested documents.
That way, you will be able to create BoolQueries, with a Must clause with a TermQuery on owner_id and two must clauses with nested queries with Term and Match queries on values.key and values.value.
Does it help?

Resources