Issue in Elastic Search Sum Aggregation - elasticsearch

I am trying to run the example from the Elasticsearch site with my own parameters, but it is not working.
Query:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"range": {
"activity_date": {
"from": "2013-11-01",
"to": "2014-11-01"
}
}
}
}
},
"aggs": {
"net_ordered_units": {
"sum": {
"field": "net_ordered_units"
}
}
}
}
Error I get:
{
"error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[YoGKlejVTC6jhg_OgPWXyTg][test][0]: SearchParseException[[test][0]: query[ConstantScore(cache(activity_date:[1383264000000 TO 1414886399999]))],from[-1],size[-1]: Parse Failure [Failed to parse source [{\"query\": {\"filtered\":{\"query\":{\"match_all\":{}},\"filter\":{\"range\":{\"activity_date\":{\"from\":\"2013-11-01\",\"to\":\"2014-11-01\"}}}}},\"aggs\":{\"net_ordered_units\":{\"sum\": {\"field\":\"net_ordered_units\"}}}}]]]; nested: SearchParseException[[test][0]: query[ConstantScore(cache(activity_date:[1383264000000 TO 1414886399999]))],from[-1],size[-1]: Parse Failure [No parser for element [aggs]]]; }]",
"status": 400
}
What does the shard failure mean here? It also says there is no parser for aggs; what should I do about that?
Basically, I need to compute a sum and then find the maximum of the results.
How should I modify the above query to get that?

I think the plugin you use to send the cURL-based Elasticsearch queries is not able to parse the "aggs" element. I use the Marvel Sense plugin (http://www.elasticsearch.org/guide/en/marvel/current/) specifically for ES queries, and your query works fine there. I also tested it with Postman (a RESTful Chrome plugin) and found nothing wrong with your query. So try switching your plugin and see if that helps.
Updated:
To answer the second part of your question,
curl -s -XPOST your_ES_server/ES_index/url_to_query -d '{
"query": {
"bool": {
"must": [{
"wildcard": { "item_id": "*" }
}]
}
},
"facets": {
"facet_result": {
"terms": {
"fields": ["item_count"]
}
}
}
}'
Note that the above query doesn't return the maximum count for a specific field key; it lists all the field keys sorted by their count in descending order (the default). So the topmost term should be what you are looking for. The response to the above query looks as follows:
"facets": {
"facet_result": {
"_type": "terms",
"missing": 0,
"total": 35,
"other": 0,
"terms": [
{
"term": 0,
"count": 34
},
{
"term": 2,
"count": 1
}
]
}
}
This might not be a clean solution, but it can help you retrieve the max(sum) of a key. For more info on ordering, refer to http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/search-facets-terms-facet.html#_ordering
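A minimal sketch of reading the top term out of a response shaped like the one above. The helper name top_facet_term is hypothetical, and the sketch assumes the response has already been parsed from JSON into a Python dict. It does not rely on the server-side ordering and simply takes the entry with the highest count.

```python
def top_facet_term(response, facet_name="facet_result"):
    """Return (term, count) for the most frequent term of a terms facet.

    `response` is assumed to be the parsed JSON body of the search response.
    Returns None when the facet has no terms.
    """
    terms = response["facets"][facet_name]["terms"]
    if not terms:
        return None
    best = max(terms, key=lambda entry: entry["count"])
    return best["term"], best["count"]


# Sample data copied from the response shown above.
sample_response = {
    "facets": {
        "facet_result": {
            "_type": "terms",
            "missing": 0,
            "total": 35,
            "other": 0,
            "terms": [
                {"term": 0, "count": 34},
                {"term": 2, "count": 1},
            ],
        }
    }
}

print(top_facet_term(sample_response))  # (0, 34)
```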

Related

Elasticsearch ordering by field value which is not in the filter

Can somebody please help me write a query that orders the result items by the value of a field, even when that field is not part of the query in the request? I have this query:
{
"_source": [
"ico",
"name",
"city",
"status"
],
"sort": {
"_score": "desc",
"status": "asc"
},
"size": 20,
"query": {
"bool": {
"should": [
{
"match": {
"normalized": {
"query": "idona",
"analyzer": "standard",
"boost": 3
}
}
},
{
"term": {
"normalized2": {
"value": "idona",
"boost": 2
}
}
},
{
"match": {
"normalized": "idona"
}
}
]
}
}
}
The result is sorted alphabetically ascending by the status field. Status contains a few values like [active, canceled, old....], and I need something like a boost for each possible value in the query, e.g. active boost 5, canceled boost 4, old boost 3, and so on. Is it possible to do this? Thanks.
You would need a custom sort using a script to achieve what you want.
I've used a generic match_all query here; you can add your own query logic there. The part you are looking for is the sort section of the query below.
Make sure that status is a keyword type.
Custom Sorting Based on Values
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":[
{ "_score": "desc" },
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"if(params.scores.containsKey(doc['status'].value)) { return params.scores[doc['status'].value];} return 100000;",
"params":{
"scores":{
"active":5,
"old":4,
"cancelled":3
}
}
},
"order":"desc"
}
}
]
}
In the above query, add your values to the scores section. For example, if you have a status new and want to give it the value 6, your scores would look like this:
{
"scores":{
"active":5,
"old":4,
"cancelled":3,
"new":6
}
}
So basically the documents are first sorted by _score, and documents with equal _score are then ordered by the script sort.
Note that the script sort uses desc order, since I understand that you would want to show active documents at the top, followed by the other values. Feel free to play around with it.
Hope this helps!
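If the scores map changes often, the sort clause can be generated instead of edited by hand. The following is a minimal sketch in Python, assuming the same script and parameter names as the query above; the helper name build_status_sort and its arguments are hypothetical and not part of any Elasticsearch client.

```python
import json


def build_status_sort(scores, default_score=100000, field="status"):
    """Build a sort clause: _score first, then a script sort on `field`.

    `scores` maps a status value to its rank; statuses missing from the
    map get `default_score`, as in the script above.
    """
    script_source = (
        "if(params.scores.containsKey(doc['%s'].value)) "
        "{ return params.scores[doc['%s'].value];} return %d;"
        % (field, field, default_score)
    )
    return [
        {"_score": "desc"},
        {
            "_script": {
                "type": "number",
                "script": {
                    "lang": "painless",
                    "inline": script_source,
                    "params": {"scores": dict(scores)},
                },
                "order": "desc",
            }
        },
    ]


body = {
    "query": {"match_all": {}},
    "sort": build_status_sort({"active": 5, "old": 4, "cancelled": 3}),
}
print(json.dumps(body, indent=2))
```

Keep in mind that with "order": "desc" the default of 100000 ranks statuses missing from the map above all listed ones; pass a smaller default_score if they should come last.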

How to check field data is numeric when using inline Script in ElasticSearch

Per our requirement, we need to find the max ID among the documents before adding a new document. The problem is that a document's key may also contain string data, so I had to use an inline script in the Elasticsearch query to find the max ID only for documents that have integer data, returning 0 otherwise. I am using the following inline script query to find the max key, but it is not working. Can you help me with this?
{
"size":0,
"query":
{"bool":
{"filter":[
{"term":
{"Name":
{
"value":"Test2"
}
}}
]
}},
"aggs":{
"MaxId":{
"max":{
"field":"Key","script":{
"inline":"((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"}}
}
}
}
The error occurs because the max aggregation only supports numeric fields, i.e. you cannot specify a string field (here, Key) in a max aggregation.
Simply remove the "field":"Key" part and keep only the script part:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"Name": "Test2"
}
}
]
}
},
"aggs": {
"MaxId": {
"max": {
"script": {
"source": "((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"
}
}
}
}
}
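To make the intent of the script concrete, here is a minimal client-side sketch in Python of the same rule: numeric keys are parsed, anything else counts as 0, and the maximum is taken. The function name and the sample keys are hypothetical, and this only mirrors the logic; it is not how Elasticsearch evaluates the script.

```python
import re


def max_numeric_key(keys):
    """Return the largest integer among `keys`, counting non-numeric keys as 0."""
    values = [int(k) if re.fullmatch(r"[0-9]+", k) else 0 for k in keys]
    return max(values, default=0)


print(max_numeric_key(["12", "abc", "7"]))  # 12
```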

Find the first doc for each property value

I'm trying to get the first document that has a specific property.
For example, I have 50 docs with property "a":"1", with different dates,
and also 100 docs with "a":"2".
Is there a way to query the first doc of each "a" value by date?
Not exactly what you wanted, but you could run the following, which returns the documents that match a:1 or a:2 and orders the results as you wanted.
{
"sort": {
"your_timestamp_field": {
"order": "desc"
}
},
"query": {
"filtered": {
"filter": {
"or": [
{
"term": {
"a": 1
}
},
{
"term": {
"a": 2
}
}
]
}
}
}
}
You could also run multiple queries using msearch. For example
Place the below in a file named requests
{"index": "your-index"}
{"size":1,"sort":{"@timestamp":{"order":"desc"}},"query":{"filtered":{"filter":{"term":{"a":"1"}}}}}
{"index": "your-index"}
{"size":1,"sort":{"@timestamp":{"order":"desc"}},"query":{"filtered":{"filter":{"term":{"a":"2"}}}}}
Then run
curl -XGET http://localhost:9200/your-index/_msearch --data-binary @requests; echo
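The requests file can also be generated rather than written by hand. Below is a minimal sketch in Python that builds the newline-delimited body for one search per value of "a". The helper name build_msearch_body is hypothetical, and the timestamp field name is an assumption (defaulting to @timestamp); replace it with your own date field.

```python
import json


def build_msearch_body(index, field, values, timestamp_field="@timestamp"):
    """Return the newline-delimited body for an _msearch request.

    One header line and one search line are emitted per value; each search
    asks for a single hit, newest first, using the filtered/term query
    from the example above.
    """
    lines = []
    for value in values:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps({
            "size": 1,
            "sort": {timestamp_field: {"order": "desc"}},
            "query": {"filtered": {"filter": {"term": {field: value}}}},
        }))
    # The multi search format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


print(build_msearch_body("your-index", "a", ["1", "2"]), end="")
```

Use "asc" instead of "desc" in the sort if "first" means the oldest document for each value.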

elasticsearch-py driver is not filtering data properly when aggregating

I ran into a weird issue with the Elasticsearch Python drivers and would like someone to explain it to me. The query below works directly from cURL but doesn't work with python-requests or elasticsearch-py; strangely, it works when I switch to the pyelasticsearch library. The details are:
I have a type called MY_TYPE that has a nested object MY_NESTED_FIELD and a child document MY_CHILD_TYPE. I'm trying to do term facet aggregation on the nested attributes based on filters applied to the MY_TYPE and MY_CHILD_TYPE types. The query looks like
{
"query": {
"filtered": {
"filter": {
"has_child": {
"query": {
"range": {
"CHILD_FIELD": {
"gte": 0.5
}
}
},
"type": "MY_CHILD_TYPE"
}
}
}
},
"aggs": {
"aggregation_results": {
"aggs": {
"boards": {
"terms": {
"field": "MY_NESTED_FIELD.KEY",
"size": 100
},
"aggs": {
"MY_RANGES": {
"range": {
"ranges": [
{
"to": 0.5,
"from": 0
},
{
"to": 0.8,
"from": 0.5
}
],
"field": "MY_NESTED_FIELD_PATH.VALUE"
}
}
}
}
},
"nested": {
"path": "MY_NESTED_FIELD_PATH"
}
}
}
}
When I run this query against Elasticsearch directly (using cURL or the head plugin), it filters the parent and returns aggregations based on the correct results. However, when I try it from the Python script, it runs successfully but returns wrong data (it returns facets from all the documents without applying the filter).
I have tried:
cURL: Works!
ElasticSearch's HEAD plugin: Works!
python-requests version 2.8.1: Did not work!
elasticsearch-py api versions 1.4.0 and 2.1.0: Did not work!
pyelasticsearch version 1.4: Works!
The code snippet for elasticsearch-py is:
from elasticsearch import Elasticsearch
es = Elasticsearch('HOST:PORT')
data = es.search(index='INDEX_NAME', doc_type='MY_TYPE', body=payload, q='*:*', size=0)
When using python-requests, the code was:
import json
import requests
url = 'http://ES_HOST:ES_PORT/ES_INDEX/ES_TYPE/_search'
params = {'size':0, 'q':'*:*'}
data = requests.post(url, params=params, data=json.dumps(payload)).json()
My elastic search version is:
{
"version": {
"number": "1.4.4",
"build_hash": "c88f77ffc81301dfa9dfd81ca2232f09588bd512",
"build_timestamp": "2015-02-19T13:05:36Z",
"build_snapshot": false,
"lucene_version": "4.10.3"
}
}
So my questions are:
Is this the best way to write this query?
Is there an explanation for why elasticsearch-py is acting strangely?
Is there a fix for this on elasticsearch-py?
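One visible difference between the Python calls above and a plain cURL request is the extra q='*:*' URL parameter. The sketch below is a guess at a workaround, assuming (this is not confirmed anywhere in the post) that a URI query passed via q takes precedence over the query in the request body on this Elasticsearch version. The helper name build_search_request is hypothetical; it only prepares the request pieces so that the filter travels solely in the body, with size kept as a URL parameter.

```python
import json


def build_search_request(host, index, doc_type, payload, size=0):
    """Return (url, params, data) for a body-only search request.

    No `q` parameter is included, so the only query sent is the one
    inside `payload`.
    """
    url = "http://%s/%s/%s/_search" % (host, index, doc_type)
    params = {"size": size}
    data = json.dumps(payload)
    return url, params, data


payload = {"query": {"match_all": {}}}  # placeholder for the real query
url, params, data = build_search_request(
    "ES_HOST:ES_PORT", "ES_INDEX", "ES_TYPE", payload
)
print(url, params)
# With python-requests this could then be sent as:
#   requests.post(url, params=params, data=data).json()
```

If the results then match the cURL output, the q parameter was the likely culprit; if not, the assumption above does not hold for this setup.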

Elastic search aggregation sum

I'm using Elasticsearch 1.0.2 and I want to run a search using a query with aggregation functions like sum().
Suppose a single record looks like this:
{
"_index": "outboxpro",
"_type": "message",
"_id": "PAyEom_mRgytIxRUCdN0-w",
"_score": 4.5409594,
"_source": {
"team_id": "1bf5f3f968e36336c9164290171211f3",
"created_user": "1a9d05586a8dc3f29b4c8147997391f9",
"created_ip": "192.168.2.245",
"folder": 1,
"report": [
{
"networks": "ec466c09fd62993ade48c6c4bb8d2da7facebook",
"status": 2,
"info": "OK"
},
{
"networks": "bdc33d8ca941b8f00c2a4e046ba44761twitter",
"status": 2,
"info": "OK"
},
{
"networks": "ad2672a2361d10eacf8a05bd1b10d4d8linkedin",
"status": 5,
"info": "[unauthorized] Invalid or expired token."
}
]
}
}
Let's say I need to fetch the count of all successful messages, i.e. those posted with status = 2 in the report field. There will be many records in the collection, and I want a report covering all successfully posted messages.
I have tried the following code:
////////////// Edit
{
"size": 2000,
"query": {
"filtered": {
"query": {
"match": {
"team_id": {
"query": "1bf5f3f968e36336c9164290171211f3"
}
}
}
}
},
"aggs": {
"genders": {
"terms": {
"field": "report.status"
}
}
}
}
Please help me find a solution. I am a newbie in Elasticsearch. Is there any other aggregation method to do this? Your help is much appreciated.
Your script filter is slow on big data and doesn't use the benefits of indexing. Did you think about parent/child instead of nested? If you use parent/child, you could use aggregations natively to calculate the sum.
You will have to make use of nested mappings here. Have a look at https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-mapping.html.
Then you will have to aggregate on the nested fields as in https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html.
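A minimal sketch of what such a request body might look like, assuming the report field has been mapped as nested (the mapping is not shown in the question). The helper name and the aggregation names reports and successful are hypothetical. It wraps a filter aggregation on report.status inside a nested aggregation, so the doc_count of the inner bucket would be the number of report entries with the given status.

```python
import json


def build_status_count_body(team_id, status=2):
    """Build a search body counting nested report entries with `status`."""
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {"match": {"team_id": {"query": team_id}}},
        "aggs": {
            "reports": {
                "nested": {"path": "report"},
                "aggs": {
                    "successful": {
                        "filter": {"term": {"report.status": status}}
                    }
                },
            }
        },
    }


body = build_status_count_body("1bf5f3f968e36336c9164290171211f3")
print(json.dumps(body, indent=2))
```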

Resources