Find all documents with max value in a field - elasticsearch

I am new to Elasticsearch; it is my first NoSQL database and I have a problem.
I have something like this:
index_logs: [
{
"entity_id": "id_1",
"field_name": "name",
"old_value": null,
"new_value": "some_name",
"number": 1
},
{
"entity_id": "id_1",
"field_name": "description",
"old_value": null,
"new_value": "some_descr",
"number": 1
},
{
"entity_id": "id_1",
"field_name": "description",
"old_value": "some_descr",
"new_value": null,
"number": 2
},
{
"entity_id": "id_2",
"field_name": "enabled",
"old_value": true,
"new_value": false,
"number": 25
}
]
I need to find all documents with a given entity_id and the maximum number, which I do not know in advance.
In Postgres it would be the following query:
SELECT *
FROM logs AS x
WHERE x.entity_id = <some_entity_id>
AND x.number = (SELECT MAX(y.number) FROM logs AS y WHERE y.entity_id = x.entity_id)
How can I do this in Elasticsearch?

Use the following query:
{
"query": {
"term": {
"entity_id": {
"value": "1"
}
}
},
"aggs": {
"max_number": {
"terms": {
"field": "number",
"size": 1,
"order": {
"_key": "desc"
}
},
"aggs": {
"top_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
The aggregation groups by "number", sorts the buckets in descending order of the number, and then gives you up to 10 documents with that number inside the 'top_hits' sub-aggregation.
Another way to get all the documents is to use the same query without any aggregations and sort descending on the "number" field. On the client side you paginate with "search_after" until the "number" field changes. The first time the number changes, you exit your pagination loop, and you then have all the records with the given entity_id and the max number.
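The search_after loop can be sketched in Python. This is a minimal sketch, not a tested client integration: `search_page` stands in for a real Elasticsearch client call (its name and signature are assumptions) that returns hits for the given entity_id, sorted descending by "number".

```python
# Sketch of the "paginate until the number changes" approach.
# `search_page` is a hypothetical callable wrapping the actual search request;
# it must return hits sorted descending by "number", each hit carrying the
# "sort" values that search_after needs for the next page.

def collect_max_number_docs(search_page, entity_id, page_size=100):
    """Collect every document of entity_id that carries the maximum "number"."""
    results = []
    max_number = None
    search_after = None
    while True:
        hits = search_page(entity_id, search_after, page_size)
        if not hits:
            break
        for hit in hits:
            number = hit["_source"]["number"]
            if max_number is None:
                max_number = number      # first hit of the first page holds the max
            if number != max_number:
                return results           # "number" changed: stop paginating
            results.append(hit)
        search_after = hits[-1]["sort"]  # resume after the last hit of this page
    return results
```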


Bucket sort in composite aggregation?

How can I do a bucket sort in a composite aggregation?
I need to do a composite aggregation with a bucket sort.
I have tried sorting with aggregations, and I have tried the composite aggregation.
I think this question is a continuation of your previous question, so I considered the same use case.
You need to use the bucket sort aggregation, a parent pipeline aggregation
which sorts the buckets of its parent multi-bucket aggregation. Please also
refer to the documentation on composite aggregation to learn more about it.
Below is a working example with index data, mapping, search query, and search result.
Index Mapping:
{
"mappings":{
"properties":{
"user":{
"type":"keyword"
},
"date":{
"type":"date"
}
}
}
}
Index Data:
{
"date": "2015-01-01",
"user": "user1"
}
{
"date": "2014-01-01",
"user": "user2"
}
{
"date": "2015-01-11",
"user": "user3"
}
Search Query:
The size parameter can be set to define how many composite buckets
should be returned. Each composite bucket is considered as a single
bucket, so setting a size of 10 will return the first 10 composite
buckets created from the values source. The response contains the
values for each composite bucket in an array containing the values
extracted from each value source. Defaults to 10.
{
"size": 0,
"aggs": {
"my_buckets": {
"composite": {
"size": 3, <-- note this
"sources": [
{
"product": {
"terms": {
"field": "user"
}
}
}
]
},
"aggs": {
"mySort": {
"bucket_sort": {
"sort": [
{
"sort_user": {
"order": "desc"
}
}
]
}
},
"sort_user": {
"min": {
"field": "date"
}
}
}
}
}
}
Search Result:
"aggregations": {
"my_buckets": {
"after_key": {
"product": "user3"
},
"buckets": [
{
"key": {
"product": "user3"
},
"doc_count": 1,
"sort_user": {
"value": 1.4209344E12,
"value_as_string": "2015-01-11T00:00:00.000Z"
}
},
{
"key": {
"product": "user1"
},
"doc_count": 1,
"sort_user": {
"value": 1.4200704E12,
"value_as_string": "2015-01-01T00:00:00.000Z"
}
},
{
"key": {
"product": "user2"
},
"doc_count": 1,
"sort_user": {
"value": 1.3885344E12,
"value_as_string": "2014-01-01T00:00:00.000Z"
}
}
]
}
}
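The request above can also be built as a plain Python dict. This is a minimal sketch: the aggregation names ("my_buckets", "mySort", "sort_user", "product") follow the example, and the `field`/`date_field` parameters are just placeholders for your own fields.

```python
# Build the composite + bucket_sort request body from the example above:
# composite buckets over a keyword field, each bucket carrying a min-date
# metric, sorted descending by that metric via a bucket_sort pipeline agg.

def composite_sorted_by_min_date(field, date_field, size=3):
    return {
        "size": 0,
        "aggs": {
            "my_buckets": {
                "composite": {
                    "size": size,
                    "sources": [{"product": {"terms": {"field": field}}}],
                },
                "aggs": {
                    # min date per composite bucket, used as the sort key
                    "sort_user": {"min": {"field": date_field}},
                    # parent pipeline aggregation sorting the buckets
                    "mySort": {
                        "bucket_sort": {
                            "sort": [{"sort_user": {"order": "desc"}}]
                        }
                    },
                },
            }
        },
    }
```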

Group by terms and get count of nested array property?

I would like to get the count from a document series where an array item matches some value.
I have documents like these:
{
"Name": "jason",
"Todos": [{
"State": "COMPLETED",
"Timer": 10
},{
"State": "PENDING",
"Timer": 5
}]
}
{
"Name": "jason",
"Todos": [{
"State": "COMPLETED",
"Timer": 5
},{
"State": "PENDING",
"Timer": 2
}]
}
{
"Name": "martin",
"Todos": [{
"State": "COMPLETED",
"Timer": 15
},{
"State": "PENDING",
"Timer": 10
}]
}
I would like to count how many documents have any Todos item with a COMPLETED State, grouped by Name.
So from the above I would need to get:
jason: 2
martin: 1
Usually I do this with a terms aggregation on Name and another sub-aggregation for the other items:
"aggs": {
"statistics": {
"terms": {
"field": "Name"
},
"aggs": {
"test": {
"filter": {
"bool": {
"must": [{
"match_phrase": {
"SomeProperty.keyword": {
"query": "THEVALUE"
}
}
}]
}
}
}
}
}
}
But I am not sure how to do this here, as the items are in an array.
Elasticsearch has no problem with arrays because in fact it flattens them by default:
Arrays of inner object fields do not work the way you may expect. Lucene has no concept of inner objects, so Elasticsearch flattens object hierarchies into a simple list of field names and values.
So a query like the one you posted will do the job. I would use a term query for the keyword datatype, though:
POST mytodos/_search
{
"size": 0,
"aggs": {
"by name": {
"terms": {
"field": "Name"
},
"aggs": {
"how many completed": {
"filter": {
"term": {
"Todos.State": "COMPLETED"
}
}
}
}
}
}
}
I am assuming your mapping looks something like this:
PUT mytodos/_mappings
{
"properties": {
"Name": {
"type": "keyword"
},
"Todos": {
"properties": {
"State": {
"type": "keyword"
},
"Timer": {
"type": "integer"
}
}
}
}
}
The example documents that you posted will be transformed internally into something like this:
{
"Name": "jason",
"Todos.State": ["COMPLETED", "PENDING"],
"Todos.Timer": [10, 5]
}
However, if you need to query Todos.State and Todos.Timer together, for example to filter for todos that are "COMPLETED" but only with Timer > 10, it will not be possible with such a mapping, because Elasticsearch loses the link between the fields of object array items.
In that case you would need the nested datatype for such arrays, and you would query them with the special nested query.
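For illustration, a hedged sketch of that nested alternative as a plain dict: it assumes "Todos" is re-mapped with `"type": "nested"` (which the mapping above does not do), so that State and Timer are matched within the same array item.

```python
# Sketch of a nested query matching COMPLETED todos with Timer > 10
# inside the SAME array element. Only valid if "Todos" is a nested field.

nested_completed_over_10 = {
    "query": {
        "nested": {
            "path": "Todos",
            "query": {
                "bool": {
                    "must": [
                        # both conditions apply to one nested Todos item
                        {"term": {"Todos.State": "COMPLETED"}},
                        {"range": {"Todos.Timer": {"gt": 10}}},
                    ]
                }
            },
        }
    }
}
```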
Hope that helps!

Elasticsearch sorting by array of objects

I have a column engagement like this, along with other columns:
record 1
"date":"2017-11-23T06:46:04.358Z",
"remarks": "test1",
"engagement": [
{
"name": "comment_count",
"value": 6
},
{
"name": "like_count",
"value": 2
}
],
....
....
record 2
"date":"2017-11-23T07:16:14.358Z",
"remarks": "test2",
"engagement": [
{
"name": "comment_count",
"value": 3
},
{
"name": "like_count",
"value": 9
}
],
....
....
I am storing the objects in array format. Now I want to sort the data in descending order of any given object name, e.g. the value of like_count or the value of share_count.
So if I sort by like_count, the 2nd record should come before the 1st record, as the like_count value of the 2nd record is 9 while that of the first record is 2.
How can I do this in Elasticsearch?
You should have something like the following. Note that the sort has to target "engagement.value", while the "nested_filter" restricts the sort to the "like_count" items:
{
"query": {
"nested": {
"path": "engagement",
"filter": {
...somefilter...
}
}
},
"sort": {
"engagement.value": {
"order": "desc",
"mode": "max",
"nested_path": "engagement",
"nested_filter": {
"term": { "engagement.name": "like_count" }
}
}
}
}
Source: Elastic Docs
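On Elasticsearch 6.1 and later, the nested_path/nested_filter sort parameters were replaced by a single "nested" object. A sketch of sorting by the like_count value with the newer syntax, assuming "engagement" is mapped as a nested field:

```python
# Newer (6.1+) nested sort syntax: the filter picks the like_count item
# inside each document's engagement array, and its value drives the sort.

sort_by_like_count = {
    "sort": [
        {
            "engagement.value": {
                "order": "desc",
                "mode": "max",
                "nested": {
                    "path": "engagement",
                    "filter": {"term": {"engagement.name": "like_count"}},
                },
            }
        }
    ]
}
```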

How to get the average count of missing field per document with Elasticsearch?

Shortly: with Elasticsearch, given a list of fields, how can I get the average number of missing fields per document as an aggregation?
Details
With the missing aggregation type I can get the total number of documents where a given field is missing. So with the following data:
"hits": [{
"name": "A name",
"nickname": "A nickname",
"bestfriend": "A friend",
"hobby": "An hobby"
},{
"name": "A name",
"hobby": "An hobby"
},{
"name": "A name",
"nickname": "A nickname",
"hobby": "An hobby"
},{
"name": "A name",
"bestfriend": "A friend"
}]
I can run the following query:
{
"aggs": {
"name_missing": {
"missing": {"field": "name"}
},
"nickname_missing": {
"missing": {"field": "nickname"}
},
"hobby_missing": {
"missing": {"field": "hobby"}
},
"bestfriend_missing": {
"missing": {"field": "bestfriend"}
}
}
}
And I get the following aggregations:
...
"aggregations": {
"name_missing": {
"doc_count": 0
},
"nickname_missing": {
"doc_count": 2
},
"hobby_missing": {
"doc_count": 1
},
"bestfriend_missing": {
"doc_count": 1
}
}
...
What I need now is the average number of missing fields per document. I can just do the math in code on the results:
sum the doc_count values of all the missing aggregations
divide by the total number of hits
But how would you get the same result as an aggregation from Elasticsearch?
Thank you for any reply / suggestion.
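The client-side math described in the question can be sketched in a few lines of Python; the response shape below assumes the 7.x format where `hits.total` is an object (on older versions it is a plain number).

```python
# Sum the doc_count of each "*_missing" aggregation and divide by the
# total number of hits to get the average missing fields per document.

def avg_missing_fields(response):
    aggs = response["aggregations"]
    total_missing = sum(
        bucket["doc_count"]
        for name, bucket in aggs.items()
        if name.endswith("_missing")
    )
    total_hits = response["hits"]["total"]["value"]  # 7.x response shape
    return total_missing / total_hits
```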
This is an ugly solution but it does the trick.
GET missing/missing/_search
{
"size": 0,
"aggs": {
"result": {
"terms": {
"script": "'aaa'"
},
"aggs": {
"name_missing": {
"missing": {
"field": "name"
}
},
"nickname_missing": {
"missing": {
"field": "nickname"
}
},
"hobby_missing": {
"missing": {
"field": "hobby"
}
},
"bestfriend_missing": {
"missing": {
"field": "bestfriend"
}
},
"avg_missing": {
"bucket_script": {
"buckets_path": { // buckets_path defines the variables used in the script: name_missing._count takes the doc_count of the name_missing aggregation (likewise for nickname_missing, hobby_missing, and bestfriend_missing), while "count": "_count" takes the doc_count of the bucket the aggregation runs on (the total number of hits).
"name_missing": "name_missing._count",
"nickname_missing": "nickname_missing._count",
"hobby_missing": "hobby_missing._count",
"bestfriend_missing": "bestfriend_missing._count",
"count":"_count"
},
"script": "(name_missing+nickname_missing+hobby_missing+bestfriend_missing)/count" // add up all the missing counts and divide by the total number of hits
}
}
}
}
}
}
I've shown you how to do it; now it's up to you to massage the parameters and extract what you need.

Elasticsearch group and order by nested field's min value

I've got a structure of products which are available in different stores with different prices:
[{
"name": "SomeProduct",
"store_prices": [
{
"store": "FooStore1",
"price": 123.45
},
{
"store": "FooStore2",
"price": 345.67
}
]
},{
"name": "OtherProduct",
"store_prices": [
{
"store": "FooStore1",
"price": 456.78
},
{
"store": "FooStore2",
"price": 234.56
}
]
}]
I want to show a list of products, ordered by the lowest price ascending, limited to 10 results, in this way:
SomeProduct: 123.45 USD
OtherProduct: 234.56 USD
How can I do this? I've tried the nested aggregation approach described in https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html but it only returns the min price across all products, not the respective min price for each product:
{
"_source": [
"name",
"store_prices.price"
],
"query": {
"match_all": {}
},
"sort": {
"store_prices.price": "asc"
},
"aggs": {
"stores": {
"nested": {
"path": "store_prices"
},
"aggs": {
"min_price": {"min": {"field": "store_prices.price"}}
}
}
},
"from": 0,
"size": 10
}
In SQL, what I want to do could be described by the following query. I'm afraid I'm thinking too much "in SQL":
SELECT
p.name,
MIN(s.price) AS price
FROM
products p
INNER JOIN
store_prices s ON s.product_id = p.id
GROUP BY
p.id
ORDER BY
price ASC
LIMIT 10
You need a nested sorting:
{
"query": // HERE YOUR QUERY,
"sort": {
"store_prices.price": {
"order" : "asc",
"nested_path" : "store_prices",
"nested_filter": {
// ANY FILTERS THAT MAY
// FILTER OUT SOME OF YOUR STORES
}
}
}
}
Note that you have to repeat any nested queries inside the nested_filter field. You will then find the price among the sort values of each hit in the response.
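For reference, the full request body described above can be sketched as a plain dict; the query and field names follow the question, and nested_path assumes "store_prices" is mapped as a nested field:

```python
# Sort products ascending by their own minimum nested price, return 10.
# The match_all query is a placeholder for your real query.

sort_by_min_price = {
    "_source": ["name", "store_prices.price"],
    "query": {"match_all": {}},
    "sort": [
        {
            "store_prices.price": {
                "order": "asc",
                "mode": "min",                 # per-document minimum price
                "nested_path": "store_prices", # requires a nested mapping
            }
        }
    ],
    "from": 0,
    "size": 10,
}
```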
