Elastic search aggregation sum - elasticsearch

I'm using Elasticsearch 1.0.2 and I want to run a search with an aggregation function such as sum().
Suppose a single record looks like this:
{
"_index": "outboxpro",
"_type": "message",
"_id": "PAyEom_mRgytIxRUCdN0-w",
"_score": 4.5409594,
"_source": {
"team_id": "1bf5f3f968e36336c9164290171211f3",
"created_user": "1a9d05586a8dc3f29b4c8147997391f9",
"created_ip": "192.168.2.245",
"folder": 1,
"report": [
{
"networks": "ec466c09fd62993ade48c6c4bb8d2da7facebook",
"status": 2,
"info": "OK"
},
{
"networks": "bdc33d8ca941b8f00c2a4e046ba44761twitter",
"status": 2,
"info": "OK"
},
{
"networks": "ad2672a2361d10eacf8a05bd1b10d4d8linkedin",
"status": 5,
"info": "[unauthorized] Invalid or expired token."
}
]
}
}
Let's say I need to fetch the count of all successful messages, i.e. entries in the report field with status = 2. There will be many records in the collection, and I want a report of all successfully posted messages.
I have tried the following query:
{
"size": 2000,
"query": {
"filtered": {
"query": {
"match": {
"team_id": {
"query": "1bf5f3f968e36336c9164290171211f3"
}
}
}
}
},
"aggs": {
"genders": {
"terms": {
"field": "report.status"
}
}
}
}
Please help me find a solution. I'm a newbie with Elasticsearch. Is there another aggregation method that would do this? Your help is much appreciated.

Your script filter is slow on big data and doesn't take advantage of indexing. Have you considered parent/child instead of nested? With parent/child you could use aggregations natively to calculate the sum.

You will have to use nested mappings here. Have a look at https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-mapping.html.
Then you will have to aggregate on the nested fields, as described in https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html.
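As a rough illustration of the nested approach, the sketch below builds a request body that counts report entries per status. It assumes that report is mapped as a nested type (the question does not show the mapping), and the function names are made up for this example. The second function is only a local reference computation over sample data, not an Elasticsearch call.

```python
import json


def build_status_count_query(team_id):
    """Build a search body that buckets nested report entries by status.

    Assumption: the `report` field is mapped with type "nested".
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"match": {"team_id": team_id}},
        "aggs": {
            "reports": {
                "nested": {"path": "report"},
                "aggs": {
                    "by_status": {"terms": {"field": "report.status"}}
                },
            }
        },
    }


def count_status_locally(docs, status):
    """Reference computation: count report entries with the given status."""
    return sum(
        1
        for doc in docs
        for entry in doc["_source"]["report"]
        if entry["status"] == status
    )


# Sample shaped like the record in the question (only the fields used here).
sample_docs = [
    {"_source": {"report": [
        {"status": 2, "info": "OK"},
        {"status": 2, "info": "OK"},
        {"status": 5, "info": "[unauthorized] Invalid or expired token."},
    ]}}
]

body = build_status_count_query("1bf5f3f968e36336c9164290171211f3")
print(json.dumps(body, indent=2))
print(count_status_locally(sample_docs, 2))
```

In the response, the bucket with key 2 under reports.by_status would carry the number of successful report entries, which is what the local function computes for the sample.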

Related

Speed up Elasticsearch more like this query

What's wrong with this more like this query? It was written from scratch. It returns relevant results, but it is too slow (this example took 187.9 ms).
{
"query": {
"bool": {
"must": [{
"more_like_this": {
"fields": ["similarity.analyzed"],
"like": [{
"_id": 4
}, {
"_id": 550
}, {
"_id": 757
}],
"min_term_freq": 1,
"min_doc_freq": 1,
"analyzer": "searchkick_search2",
"minimum_should_match": "10%"
}
}, {
"range": {
"count_posts": {
"gt": 0
}
}
}],
"must_not": [{
"terms": {
"_id": [4, 550, 757]
}
}]
}
},
"size": 10
}
This query finds tags similar to a given set of tags.
similarity - text field containing all post titles, joined with spaces.
count_posts - numeric field containing the number of posts for each tag.
Running Elasticsearch 7.8.0 on Ubuntu 18.04 as a single node. Rails 5 app with the Searchkick gem.
What's wrong with this more like this query?
"like": [{
"_id": 4
}, {
"_id": 550
}, {
"_id": 757
}]
It acts like the multi get API. It does the following:
1. Gets all the documents listed by _id in like.
2. Analyzes the field using the analyzer option provided.
3. Analyzes the same fields from the matching docs of step 1. The list of tokenizers and filters also adds some milliseconds.
4. Calculates document and term frequencies along with the minimum match.
And you have two more conditions. The documentation says:
A more complicated use case consists of mixing texts with documents already existing in the index.
Unfortunately, I don't think this can be optimised further. But you can pass text instead of an id in like to make it much faster. Hopefully the query does not always take more than 100 ms, thanks to caching.
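A minimal sketch of the "text instead of id" variant follows. It assumes the application already has the text of the source tags at hand (for example, the joined post titles); the function name and the sample strings are hypothetical. The rest of the query mirrors the one in the question.

```python
def build_mlt_text_query(texts, exclude_ids, size=10):
    """Build a more_like_this query that passes free text in `like`.

    Assumption: `texts` holds the already-known content of the source tags,
    so Elasticsearch does not have to fetch those documents by _id first.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "more_like_this": {
                            "fields": ["similarity.analyzed"],
                            "like": list(texts),  # plain strings, not {"_id": ...}
                            "min_term_freq": 1,
                            "min_doc_freq": 1,
                            "analyzer": "searchkick_search2",
                            "minimum_should_match": "10%",
                        }
                    },
                    {"range": {"count_posts": {"gt": 0}}},
                ],
                # Still exclude the source tags themselves from the results.
                "must_not": [{"terms": {"_id": list(exclude_ids)}}],
            }
        },
        "size": size,
    }


# Hypothetical sample text standing in for the joined post titles.
query = build_mlt_text_query(
    ["first post title second post title"], [4, 550, 757]
)
print(query["query"]["bool"]["must"][0]["more_like_this"]["like"])
```

Whether this is actually faster for a given index would need to be measured; the sketch only shows the shape of the request.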

Pagination with specific search type on ElasticSearch

We are currently using Elasticsearch 6.7 and have a huge amount of data, which makes some requests take too long.
To avoid this problem, we want to add pagination to our Elasticsearch searches. The problem is that I can't apply any of the pagination methods offered by ES to the requests that already exist.
For example, this request contains several aggregations and a query:
https://github.com/trackit/trackit/blob/master/usageReports/lambda/es_request_constructor.go#L61-L75
In addition, the results are sorted after the information is collected.
I tried to set up the Search After method as well as a form of pagination using from & size.
Scroll doesn't work with aggregations, and composite aggregation doesn't accept a query.
So, is there a good way to do pagination in Elasticsearch combined with other request types, and how would I do it with the example above?
composite aggregation doesn't accept query
It does accept a query. In the example below, the results are filtered by play_name. The aggregation is applied only to the results of the query, and it can be paginated using the after option.
{
"query": {
"term": {
"play_name": "A Winters Tale"
}
},
"size": 0,
"aggs": {
"speaker": {
"composite": {
"after": {
"product": "FLORIZEL"
},
"sources": [
{
"product": {
"terms": {
"field": "speaker"
}
}
}
]
},
"aggs": {
"speech_number": {
"terms": {
"field": "speech_number"
},
"aggs": {
"line_id": {
"terms": {
"field": "line_id"
}
}
}
}
}
}
}
}
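The paging loop around the after option could look roughly like the sketch below. It relies on the after_key that composite aggregation responses include for the last returned bucket. The search argument is a hypothetical stand-in for whatever client call sends the body and returns the parsed response; fake_search and its speaker names and counts are invented so the example runs without a cluster.

```python
import copy


def iterate_composite_buckets(search, body, agg_name):
    """Yield every bucket of a composite aggregation, one page at a time.

    `search` is any callable taking a request body (dict) and returning the
    parsed JSON response (dict). It is a placeholder, not a real client API.
    """
    body = copy.deepcopy(body)  # do not mutate the caller's request
    while True:
        agg = search(body)["aggregations"][agg_name]
        buckets = agg["buckets"]
        if not buckets:
            return
        for bucket in buckets:
            yield bucket
        after_key = agg.get("after_key")
        if after_key is None:
            return
        # Ask for the page that starts right after the last bucket seen.
        body["aggs"][agg_name]["composite"]["after"] = after_key


def fake_search(body):
    """In-memory stand-in for Elasticsearch with made-up data."""
    speakers = ["AUTOLYCUS", "FLORIZEL", "PERDITA"]
    composite = body["aggs"]["speaker"]["composite"]
    after = composite.get("after", {}).get("product")
    remaining = [s for s in speakers if after is None or s > after]
    page = remaining[: composite.get("size", 10)]
    agg = {"buckets": [{"key": {"product": s}, "doc_count": 1} for s in page]}
    if page:
        agg["after_key"] = {"product": page[-1]}
    return {"aggregations": {"speaker": agg}}


request = {
    "query": {"term": {"play_name": "A Winters Tale"}},
    "size": 0,
    "aggs": {
        "speaker": {
            "composite": {
                "size": 2,
                "sources": [{"product": {"terms": {"field": "speaker"}}}],
            }
        }
    },
}

keys = [b["key"]["product"] for b in iterate_composite_buckets(fake_search, request, "speaker")]
print(keys)
```

The query part of the request stays the same on every page; only the after value changes between calls.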

Elasticsearch ordering by field value which is not in the filter

Can somebody please help me write a query that orders the results by the value of a field that is not part of the query in the request? I have this query:
{
"_source": [
"ico",
"name",
"city",
"status"
],
"sort": {
"_score": "desc",
"status": "asc"
},
"size": 20,
"query": {
"bool": {
"should": [
{
"match": {
"normalized": {
"query": "idona",
"analyzer": "standard",
"boost": 3
}
}
},
{
"term": {
"normalized2": {
"value": "idona",
"boost": 2
}
}
},
{
"match": {
"normalized": "idona"
}
}
]
}
}
}
The result is sorted by the status field in ascending alphabetical order. Status contains a few values such as [active, canceled, old....], and I need something like a boost for each possible value in the query, e.g. active boost 5, canceled boost 4, old boost 3, and so on. Is it possible to do this? Thanks.
You would need a custom sort using a script to achieve what you want.
I've used a generic match_all query here; you can replace it with your own query logic. The solution you are looking for is in the sort section of the query below.
Make sure that status is a keyword type.
Custom Sorting Based on Values
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":[
{ "_score": "desc" },
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"if(params.scores.containsKey(doc['status'].value)) { return params.scores[doc['status'].value];} return 100000;",
"params":{
"scores":{
"active":5,
"old":4,
"cancelled":3
}
}
},
"order":"desc"
}
}
]
}
To handle more values, add them to the scores section of the query. For example, if you also have the status new and want to give it its own weight, your scores would look like this:
{
"scores":{
"active":5,
"old":4,
"cancelled":3,
"new":6
}
}
So the documents are first sorted by _score, and the script sort is then applied among documents that have the same _score.
Note that the script sort uses desc order, as I understand that you want active documents at the top, followed by the other values. Feel free to play around with it.
Hope this helps!
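To make the ordering easier to reason about, here is a small local sketch of the same two-level sort: _score descending first, then the per-status weight descending. It is not an Elasticsearch call; the hits and their names are invented, and the weights and the 100000 fallback are copied from the script above.

```python
SCORES = {"active": 5, "old": 4, "cancelled": 3}
DEFAULT_WEIGHT = 100000  # same fallback the script returns for unknown statuses


def sort_hits(hits, scores=SCORES):
    """Order hits by _score desc, then by status weight desc."""
    def sort_key(hit):
        weight = scores.get(hit["status"], DEFAULT_WEIGHT)
        # Negate both values so that an ascending sort yields desc order.
        return (-hit["_score"], -weight)

    return sorted(hits, key=sort_key)


# Invented sample hits for illustration only.
hits = [
    {"name": "a", "_score": 1.0, "status": "cancelled"},
    {"name": "b", "_score": 1.0, "status": "active"},
    {"name": "c", "_score": 2.0, "status": "old"},
    {"name": "d", "_score": 1.0, "status": "pending"},
]

ordered = [hit["name"] for hit in sort_hits(hits)]
print(ordered)
```

Two things show up in the sketch: the status weight only matters between hits with an equal _score, and with desc order the 100000 fallback places unlisted statuses ahead of the listed ones. If unlisted statuses should come last, a low fallback value would be the obvious change.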

How are the documents ordered in Elasticsearch if the sort value for two documents is same?

I was working with products data, here: link
The search query, which sorts by the keyword field tags using max mode, is as follows.
GET product/_doc/_search
{
"size":100,"from":20,"_source":["tags", "name"],
"query": {
"match_all": {}
},
"sort": [
{"tags":{
"order":"desc",
"mode":"max"
}}
]
}
Some documents have the same sort value. I had read somewhere that if the sort values are equal, documents are ordered by internal doc id (_id). However, that does not seem to be the case. See screenshot below:
First came _id: 961, followed by _id: 972 (fine). However, then came _id: 114. I don't understand why the order looks random.
Any help will be appreciated.
As you have already seen, the order is random. To overcome this, you can add a second field to sort on when the value of the first sort field is the same. Since you want to use _id, the query becomes:
{
"size": 100,
"from": 20,
"_source": [
"tags",
"name"
],
"query": {
"match_all": {}
},
"sort": [
{
"tags": {
"order": "desc",
"mode": "max"
}
},
{
"_id": "asc"
}
]
}
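The effect of the extra sort key can be sketched locally as below. This is not an Elasticsearch call; the ids come from the question, while the tag values are invented. One assumption worth checking against your own data: _id is compared here as a string, which is how I would expect it to sort, so "114" comes before "961" but "5" would come after "114".

```python
def sort_with_tiebreaker(docs):
    """Sort by the maximum tag (desc), breaking ties by _id (asc).

    Python's sort is stable, so sorting by the secondary key first and by
    the primary key second gives the combined ordering.
    """
    by_id = sorted(docs, key=lambda d: d["_id"])  # _id compared as a string
    return sorted(by_id, key=lambda d: max(d["tags"]), reverse=True)


# Ids from the question; tag values are made up for illustration.
docs = [
    {"_id": "961", "tags": ["wine"]},
    {"_id": "972", "tags": ["wine"]},
    {"_id": "114", "tags": ["wine"]},
    {"_id": "5", "tags": ["alcohol", "beer"]},
]

ordered_ids = [d["_id"] for d in sort_with_tiebreaker(docs)]
print(ordered_ids)
```

With the tiebreaker in place, documents that share the same maximum tag always come back in the same relative order, which also keeps from/size paging consistent between requests.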

Issue in Elastic Search Sum Aggregation

I am trying the example from the Elasticsearch site with my own parameters, but it is not working.
Query:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"range": {
"activity_date": {
"from": "2013-11-01",
"to": "2014-11-01"
}
}
}
}
},
"aggs": {
"net_ordered_units": {
"sum": {
"field": "net_ordered_units"
}
}
}
}
Error I get:
{
"error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[YoGKlejVTC6jhg_OgPWXyTg][test][0]: SearchParseException[[test][0]: query[ConstantScore(cache(activity_date:[1383264000000 TO 1414886399999]))],from[-1],size[-1]: Parse Failure [Failed to parse source [{\"query\": {\"filtered\":{\"query\":{\"match_all\":{}},\"filter\":{\"range\":{\"activity_date\":{\"from\":\"2013-11-01\",\"to\":\"2014-11-01\"}}}}},\"aggs\":{\"net_ordered_units\":{\"sum\": {\"field\":\"net_ordered_units\"}}}}]]]; nested: SearchParseException[[test][0]: query[ConstantScore(cache(activity_date:[1383264000000 TO 1414886399999]))],from[-1],size[-1]: Parse Failure [No parser for element [aggs]]]; }]",
"status": 400
}
What is the shard failure here? It also says there is no parser for aggs; what should I do about that?
Basically, I need to perform operations like sum and then find the max of the results.
How should I modify the above query to get that?
I think your plugin (the one you use to run the curl-based Elasticsearch queries) is not able to parse the "aggs" element. I use the Marvel Sense plugin (http://www.elasticsearch.org/guide/en/marvel/current/) specifically for ES queries, and your query works fine there. I also tested it in Postman (a RESTful Chrome plugin) and found nothing wrong with your query. So try switching your plugin and see if that helps.
Updated:
To answer the second part of your question,
curl -s -XPOST your_ES_server/ES_index/url_to_query -d '{
  "query": {
    "bool": {
      "must": [{
        "wildcard": { "item_id": "*" }
      }]
    }
  },
  "facets": {
    "facet_result": {
      "terms": {
        "fields": ["item_count"]
      }
    }
  }
}'
Actually, the above query doesn't fetch the maximum count of a specific field key; it lists all the field keys sorted by their count in descending order (the default). So the topmost term should be what you are looking for. The response to the above query looks as follows:
"facets": {
"facet_result": {
"_type": "terms",
"missing": 0,
"total": 35,
"other": 0,
"terms": [
{
"term": 0,
"count": 34
},
{
"term": 2,
"count": 1
}
]
}
}
This might not be a clean solution, but it can help you retrieve the max(sum) of a key. For more info on ordering, refer to http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/search-facets-terms-facet.html#_ordering
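On the client side, picking the top entry out of that facet response could be sketched as follows. The function name is made up for this example, and the response dict is the one shown above. Taking the maximum explicitly, rather than the first element, avoids depending on the ordering the facet was requested with.

```python
def top_facet_term(response, facet_name):
    """Return the facet entry with the highest count, or None if empty."""
    terms = response["facets"][facet_name]["terms"]
    if not terms:
        return None
    return max(terms, key=lambda entry: entry["count"])


# The facet response from the answer above.
facet_response = {
    "facets": {
        "facet_result": {
            "_type": "terms",
            "missing": 0,
            "total": 35,
            "other": 0,
            "terms": [
                {"term": 0, "count": 34},
                {"term": 2, "count": 1},
            ],
        }
    }
}

top = top_facet_term(facet_response, "facet_result")
print(top)
```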
