How to aggregate a different field by a date field in Elasticsearch

REST API call:
GET test10/LREmail10/_search/
{
"size": 10,
"query": {
"range": {
"ALARM DATE": {
"gte": "now-15d/d",
"lt": "now/d"
}
}
},
"fields": [
"ALARM DATE",
"CLASSIFICATION"
]
}
Part of the output is:
"took": 25,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 490,
"max_score": 1,
"hits": [
{
"_index": "test10",
"_type": "LREmail10",
"_id": "AVM5g6XaShke4hy5dziK",
"_score": 1,
"fields": {
"CLASSIFICATION": [
"Attack"
],
"ALARM DATE": [
"25/02/2016 8:35:22 AM(UTC-08:00)"
]
}
},
{
"_index": "test10",
"_type": "LREmail10",
"_id": "AVM5g6e_Shke4hy5dziL",
"_score": 1,
"fields": {
"CLASSIFICATION": [
"Compromise"
],
"ALARM DATE": [
"25/02/2016 8:36:16 AM(UTC-08:00)"
]
}
},
What I really want to do here is aggregate CLASSIFICATION by ALARM DATE. The default date format includes minutes, seconds and the time zone too, but I want to aggregate all the classifications for each date. So "25/02/2016 8:36:16 AM(UTC-08:00)" and "25/02/2016 8:35:22 AM(UTC-08:00)" should both be treated as the date "25/02/2016", and I should get all the classifications that belong to that single date.
I hope I have explained the question properly. If you need any more details, let me know.
A hint about which area of Elasticsearch to look at would also be very helpful.

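Use a date_histogram aggregation on ALARM DATE with a terms sub-aggregation on CLASSIFICATION, like below. This requires ALARM DATE to be mapped as a date field.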
{
  "size": 0,
  "aggs": {
    "classification of day": {
      "date_histogram": {
        "field": "ALARM DATE",
        "interval": "day"
      },
      "aggs": {
        "classification": {
          "terms": {
            "field": "CLASSIFICATION"
          }
        }
      }
    }
  }
}
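To make clear what the nested aggregation returns, here is a minimal Python sketch that computes the same kind of result client-side from the hits shown in the question. It is only an illustration of the bucketing idea, not how Elasticsearch implements it. The function name `classifications_per_day` is hypothetical, and the sketch assumes each date string starts with a `dd/MM/yyyy` part followed by a space.

```python
from collections import Counter, defaultdict
from datetime import datetime


def classifications_per_day(hits):
    """Count CLASSIFICATION values per calendar day of ALARM DATE."""
    per_day = defaultdict(Counter)
    for hit in hits:
        fields = hit["fields"]
        # Keep only the leading "dd/MM/yyyy" part. The time of day and the
        # "(UTC-08:00)" suffix are ignored here, whereas date_histogram
        # buckets on the indexed instant (in UTC unless a time_zone is set).
        day_text = fields["ALARM DATE"][0].split(" ", 1)[0]
        day = datetime.strptime(day_text, "%d/%m/%Y").date()
        per_day[day].update(fields["CLASSIFICATION"])
    # One entry per day, ordered by date, keyed by ISO date string.
    return {day.isoformat(): dict(counts) for day, counts in sorted(per_day.items())}


sample_hits = [
    {"fields": {"CLASSIFICATION": ["Attack"],
                "ALARM DATE": ["25/02/2016 8:35:22 AM(UTC-08:00)"]}},
    {"fields": {"CLASSIFICATION": ["Compromise"],
                "ALARM DATE": ["25/02/2016 8:36:16 AM(UTC-08:00)"]}},
]
result = classifications_per_day(sample_hits)
print(result)
```

Each outer key plays the role of one date_histogram bucket, and the inner dictionary plays the role of the terms buckets inside it.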

Related

Is it possible to use a query result into another query in ElasticSearch?

I have two queries that I want to combine. The first one returns a document with some fields.
Now I want to use one of these fields in the second query without running two separate queries.
Is there a way to combine them to accomplish this?
This is the first query:
{
"_source": {
"includes": [
"data.session"
]
},
"query": {
"bool": {
"must": [
{
"match": {
"field1": "9419"
}
},
{
"match": {
"field2": "5387"
}
}
],
"filter": [
{
"range": {
"timestamp": {
"time_zone": "+00:00",
"gte": "2020-10-24 10:16",
"lte": "2020-10-24 11:16"
}
}
}
]
}
},
"size" : 1
}
And this is the response returned:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 109,
"relation": "eq"
},
"max_score": 3.4183793,
"hits": [
{
"_index": "file",
"_type": "_doc",
"_id": "UBYCkgsEzLKoXh",
"_score": 3.4183793,
"_source": {
"data": {
"session": "123456789"
}
}
}
]
}
}
I want to pass the "data.session" value returned by the first query into another query, instead of copying the field value into it by hand:
{
"_source": {
"includes": [
"data.session"
]
},
"query": {
"bool": {
"must": [
{
"match": {
"data.session": "123456789"
}
}
]
}
},
"sort": [
{
"timestamp": {
"order": "asc"
}
}
]
}
If you mean using the result of the first query as an input to the second query, that is not possible in a single Elasticsearch request. But if you share your query and use case, we might be able to suggest a better way.
Elasticsearch does not support subqueries or inner queries.
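Since the two steps cannot be merged into one request, the usual workaround is to chain them in application code. The sketch below shows only that chaining logic in Python, using the response and query shapes from the question. The helper names `extract_session` and `build_session_query` are hypothetical, and actually sending the requests (with whatever client you use) is deliberately left out.

```python
def extract_session(first_response):
    """Return data.session from the first hit, or None if nothing matched."""
    hits = first_response["hits"]["hits"]
    if not hits:
        return None
    return hits[0]["_source"]["data"]["session"]


def build_session_query(session):
    """Build the second request body from the value found by the first query."""
    return {
        "_source": {"includes": ["data.session"]},
        "query": {"bool": {"must": [{"match": {"data.session": session}}]}},
        "sort": [{"timestamp": {"order": "asc"}}],
    }


# Trimmed-down copy of the response shown in the question.
first_response = {
    "hits": {
        "hits": [
            {"_id": "UBYCkgsEzLKoXh",
             "_source": {"data": {"session": "123456789"}}}
        ]
    }
}

session = extract_session(first_response)
second_query = build_session_query(session) if session is not None else None
print(second_query)
```

The application still makes two round trips, but the session value no longer has to be typed into the second query.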

Elasticsearch order by _score or max_score from SearchResponse Java API

I have an index which contains documents with the same employee name and email address but different other information, such as meetings attended and amount spent.
{
"emp_name" : "Raju",
"emp_email" : "raju@abc.com",
"meeting" : "World cup 2019",
"cost" : "2000"
}
{
"emp_name" : "Sanju",
"emp_email" : "sanju@abc.com",
"meeting" : "International Academy",
"cost" : "3000"
}
{
"emp_name" : "Sanju",
"emp_email" : "sanju@abc.com",
"meeting" : "School of Education",
"cost" : "4000"
}
{
"emp_name" : "Sanju",
"emp_email" : "sanju@abc.com",
"meeting" : "Water world",
"cost" : "1200"
}
{
"emp_name" : "Sanju",
"emp_email" : "sanju@abc.com",
"meeting" : "Event of Tech",
"cost" : "5200"
}
{
"emp_name" : "Bajaj",
"emp_email" : "bajaju@abc.com",
"meeting" : "Event of Tech",
"cost" : "4500"
}
Now, when I search on the emp_name field with a term like "raj", I should get one document each for Raju, Sanju and Bajaj, since I am using fuzzy search (fuzziness(auto)).
I am implementing Elasticsearch using the Java High Level REST Client 6.8 API.
TermsAggregationBuilder termAggregation = AggregationBuilders.terms("employees")
.field("emp_email.keyword")
.size(2000);
TopHitsAggregationBuilder termAggregation1 = AggregationBuilders.topHits("distinct")
.sort(new ScoreSortBuilder().order(SortOrder.DESC))
.size(1)
.fetchSource(includeFields, excludeFields);
With the above code I get distinct documents, but Raju's record is not at the top of the response. Instead, the Sanju document comes first because its bucket has the highest document count.
Below is the JSON generated from the SearchRequest.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "raj",
"fields": [
"emp_name^1.0",
"emp_email^1.0"
],
"boost": 1.0
}
}
],
"filter": [
{
"range": {
"meeting_date": {
"from": "2019-12-01",
"to": null,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"employees": {
"terms": {
"field": "emp_email.keyword",
"size": 2000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"distinct": {
"top_hits": {
"from": 0,
"size": 1,
"version": false,
"explain": false,
"_source": {
"includes": [
"all_uid",
"emp_name",
"emp_email",
"meeting",
"country",
"cost"
],
"excludes": [
]
},
"sort": [
{
"_score": {
"order": "desc"
}
}
]
}
}
}
}
}
}
I think that if the buckets were ordered by max_score or _score, Raju's record would be at the top of the response.
Could you please let me know how to order the documents returned in the response by _score or max_score?
A sample response is:
{
"took": 264,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 232,
"max_score": 0.0,
"hits": [
]
},
"aggregations": {
"sterms#employees": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Sanju",
"doc_count": 4,
"top_hits#distinct": {
"hits": {
"total": 4,
"max_score": 35.71312,
"hits": [
{
"_index": "indexone",
"_type": "employeedocs",
"_id": "1920424",
"_score": 35.71312,
"_source": {
"emp_name": "Sanju",
...
}
}
]
}
}
},
{
"key": "Raju",
"doc_count": 1,
"top_hits#distinct": {
"hits": {
"total": 1,
"max_score": 89.12312,
"hits": [
{
"_index": "indexone",
"_type": "employeedocs",
"_id": "1920424",
"_score": 89.12312,
"_source": {
"emp_name": "Raju",
...
}
}
]
}
}
}
Let me know if you have any questions.
Note: I have seen many similar questions, but none of them helped me. Please advise.
Thanks,
Chetan
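One possible direction, sketched here in Python for brevity rather than with the Java client: a terms aggregation can be ordered by a single-value metric sub-aggregation, so a `max` aggregation over the `_score` script can drive the bucket order instead of `_count`. The aggregation name `best_score` and both function names are hypothetical, and the request body has not been run against a 6.8 cluster. The second helper is a client-side fallback that reorders buckets already returned, using the `top_hits#distinct` key from the sample response.

```python
def employees_by_best_score(size=2000):
    """Aggregation body: one bucket per email, ordered by best hit score."""
    return {
        "employees": {
            "terms": {
                "field": "emp_email.keyword",
                "size": size,
                # Order buckets by the metric defined below, not by _count.
                "order": {"best_score": "desc"},
            },
            "aggs": {
                "best_score": {"max": {"script": {"source": "_score"}}},
                "distinct": {
                    "top_hits": {"size": 1, "sort": [{"_score": {"order": "desc"}}]}
                },
            },
        }
    }


def sort_buckets_by_score(buckets, top_hits_key="top_hits#distinct"):
    """Client-side alternative: reorder returned buckets by their max_score."""
    return sorted(
        buckets,
        key=lambda bucket: bucket[top_hits_key]["hits"]["max_score"],
        reverse=True,
    )


# Buckets reduced to the values shown in the sample response.
sample_buckets = [
    {"key": "Sanju", "doc_count": 4,
     "top_hits#distinct": {"hits": {"max_score": 35.71312}}},
    {"key": "Raju", "doc_count": 1,
     "top_hits#distinct": {"hits": {"max_score": 89.12312}}},
]
aggs = employees_by_best_score()
ordered = sort_buckets_by_score(sample_buckets)
print([bucket["key"] for bucket in ordered])
```

Ordering inside the aggregation is preferable when the bucket list may be truncated by `size`, because the client-side sort can only reorder buckets that were actually returned.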

Elasticsearch OR query with nested objects returns inner_hits not matching the criteria

I'm getting weird results when querying nested objects. Imagine the following structure:
{ owner.name = "fred",
...,
pets [
{ name = "daisy", ... },
{ name = "flopsy", ... }
]
}
If I only have the document shown above, and I search for pets matching these criteria:
pets.name = "daisy" OR
(owner.name = "julie" and pets.name = "flopsy")
I would expect to only get one result ("daisy"), but I'm getting both pet names.
This is one way to reproduce this:
# Create nested mapping
PUT pet-owners
{
"mappings": {
"animals": {
"properties": {
"owner": {"type": "text"},
"pets": {
"type": "nested",
"properties": {
"name": {"type": "text", "fielddata": true}
}
}
}
}
}
}
# Insert nested object
PUT pet-owners/animals/1?op_type=create
{
"owner" : "fred",
"pets" : [
{ "name" : "daisy"},
{ "name" : "flopsy"}
]
}
# Query
GET pet-owners/_search
{ "from": 0, "size": 50,
"query": {
"constant_score": {
"filter": { "bool": {"must": [
{"bool": {"should": [
{"nested": {"query":
{"term": {"pets.name": "daisy"}},
"path":"pets",
"inner_hits": {
"name": "pets_hits_1",
"size": 99,
"_source": false,
"docvalue_fields": ["pets.name"]
}
}},
{"bool": {"must": [
{"term": {"owner": "julie"}},
{"nested": {"query":
{"term": {"pets.name": "flopsy"}},
"path":"pets",
"inner_hits": {
"name": "pets_hits_2",
"size": 99,
"_source": false,
"docvalue_fields": ["pets.name"]
}
}}
]}}
]}}
]}}}},
"_source": false
}
The query returns both pets names (as opposed to the expected one).
Is this behavior normal? Am I doing something wrong, or is my reasoning about the nested structure or the query behavior flawed?
Any help or guidance will be much appreciated.
I'm running this query under ElasticSearch 6.3.x
EDIT: I'm adding the response received, to better illustrate the case
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_score": 1,
"inner_hits": {
"pets_hits_1": {
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_nested": {
"field": "pets",
"offset": 0
},
"_score": 0.6931472,
"fields": {
"pets.name": [
"daisy"
]
}
}
]
}
},
"pets_hits_2": {
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_nested": {
"field": "pets",
"offset": 1
},
"_score": 0.6931472,
"fields": {
"pets.name": [
"flopsy"
]
}
}
]
}
}
}
}
]
}
}
So we can see that it's not that the query matches and returns the whole existing document, but that it returns each of the pets independently, one inside each of the inner_hits. It's this result that's surprising to me.
(edited) In summary, this issue is about the context in which inner_hits are evaluated:
It looks like the inner_hits block 'pets_hits_2' returns a match because it belongs to the nested query that simply searches the pets field for 'flopsy'.
As an independent query on our single document, that is a valid hit.
However, because that nested query sits inside a bool/must list together with another clause (owner = "julie") that does not match our document, you might well expect inner_hits to take this into account and return no hit.
I haven't been able to find any docs that clarify whether this is intentional behaviour or not. It might be worth raising with Elastic.
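The behaviour described above can be mimicked with a small Python toy model. It only reproduces what the response in the question shows (each nested clause reports its own inner hits once the parent document matches overall); it is not Elasticsearch's actual implementation. The function names, including the client-side `prune` workaround, are hypothetical.

```python
doc = {"owner": "fred", "pets": [{"name": "daisy"}, {"name": "flopsy"}]}


def nested_matches(document, pet_name):
    """Offsets of the nested pets with this name (what inner_hits lists)."""
    return [i for i, pet in enumerate(document["pets"]) if pet["name"] == pet_name]


def evaluate(document):
    """Model of: daisy OR (owner == julie AND flopsy), with per-clause inner hits."""
    hits_1 = nested_matches(document, "daisy")
    hits_2 = nested_matches(document, "flopsy")
    branch_1 = bool(hits_1)
    branch_2 = document["owner"] == "julie" and bool(hits_2)
    matched = branch_1 or branch_2
    # Each nested clause reports its own matches, regardless of whether the
    # bool branch it sits in was the one that made the document match.
    inner_hits = {"pets_hits_1": hits_1, "pets_hits_2": hits_2} if matched else {}
    return matched, inner_hits


def prune(document, inner_hits):
    """Client-side workaround: drop inner hits whose enclosing branch failed."""
    pruned = dict(inner_hits)
    if document["owner"] != "julie":
        pruned["pets_hits_2"] = []
    return pruned


matched, inner_hits = evaluate(doc)
print(matched, inner_hits)
print(prune(doc, inner_hits))
```

The offsets 0 and 1 correspond to the `_nested.offset` values in the response, and `prune` shows the kind of post-filtering an application would have to do to get only "daisy".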

No query registered for [...]

POST: {{HOST}}/_search
Elasticsearch returns a "No query registered for [parental_rating]" error when I try to look up entries using the following query.
I am using Elasticsearch version 5.5.1.
{
"query": {
"bool": {
"filter" : {
"parental_rating" : {
"lte": 6
}
}
}
}
}
Here is an example of what the data looks like:
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 128,
"successful": 128,
"failed": 0
},
"hits": {
"total": 182310,
"max_score": 1,
"hits": [
{
"_index": "gracenote_1528767603035",
"_type": "program",
"_id": "3a0cf999-bfd0-4195-a8e4-ccaaf1b5d4d3",
"_score": 1,
"_source": {
"imported_at": "Tue Jun 12 2018 00:21:09 GMT+0000 (UTC)",
"type": "program",
"title": "Rusty rockt!; Die Luftrettung",
"description": "Ruby nimmt an einemsen Rustyommen.",
"summary": "Ruby nimmt an einem TalRusty und den Bits.",
"short_summary": null,
"season_no": 1,
"episode_no": 8,
"season_id": "13005345",
"id": "91984",
"series_id": "13000950",
"people": [
{
"name": "Kyle Breitkopf",
"character": "Rusty",
"role": "Voice",
"id": "682794"
}
],
"genres": [
"78",
"11",
"2",
"99",
"44"
],
"parental_rating": 12,
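You're probably missing a range query here: parental_rating is a field name, not a query type, so the lte condition has to be wrapped in a range query. The correct query should look like this: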
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "parental_rating": {
            "lte": 6
          }
        }
      }
    }
  }
}
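If such filters are built in application code, a small helper can make it harder to forget the range wrapper. This is a minimal Python sketch. The function name `range_filter_query` is hypothetical, and it only builds the request body shown above without sending it.

```python
def range_filter_query(field, **bounds):
    """Build a bool/filter request body with a range clause on one field."""
    allowed = {"gt", "gte", "lt", "lte"}
    unknown = set(bounds) - allowed
    if unknown:
        raise ValueError("unsupported range operators: %s" % sorted(unknown))
    if not bounds:
        raise ValueError("at least one of gt, gte, lt, lte is required")
    # The field name goes inside "range"; placing it directly under "filter"
    # is what triggers "No query registered for [parental_rating]".
    return {"query": {"bool": {"filter": {"range": {field: dict(bounds)}}}}}


query = range_filter_query("parental_rating", lte=6)
print(query)
```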

Why does an Elasticsearch filter not give any results when the Kibana dashboard does?

I am querying Elasticsearch using Sense. When I use a range filter on a field, I get empty hits, but I am able to get results using the Kibana dashboard. Why is the filter not working? My query:
GET _search
{
"query": {
"bool": {
"must": [
{"match": {"field_name1": "value1"}},
{"match": {"file_name2": "value2"}}
]
}
},
"filter": { <- not working (no data, but gets data from kibana)
"range": {
"@timestamp": {
"gte": "2017-02-18"
}
}
},
"sort": [
{
"@timestamp": {
"order": "desc",
"ignore_unmapped" : true
}
}
]
}
When I set the time range in the Kibana dashboard, it adds time:(from:'2017-02-18T10:19:08.680Z',mode:absolute,to:'2017-02-19T10:19:08.680Z') to the URL, and I am able to see results. The dashboard also adds some other things, like metadata and a filter with negate, but I think they do the same thing as my query. Only the time part seems to be different. So why the difference, and is my query correct? The sample URL:
https://elasticsearch/app/kibana#/discover?
_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:'2017-02-18T09:23:41.044Z',mode:absolute,to:'2017-02-19T09:23:41.044Z'))
&_a=(columns:!(description,id),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:index-value,key:field_name1,negate:!f,value:value1),query:(match:(field_name2:(query:value2,type:phrase))))),index:index-value,interval:auto,query:(query_string:(analyze_wildcard:!t,query:'*')),sort:!('@timestamp',desc),uiState:(),vis:(aggs:!((params:(field:field_name2,orderBy:'2',size:20),schema:segment,type:terms),(id:'2',schema:metric,type:count)),type:histogram))
&indexPattern=index-value&type=histogram
Thanks.
Sample json response:
{
"took": some_number,
"timed_out": false,
"_shards": {
"total": some_number,
"successful": some_number,
"failed": 0
},
"hits": {
"total": some_number,
"max_score": null,
"hits": [
{
"_index": "index-name",
"_type": "log-1",
"_id": "alphanum",
"_score": null,
"_source": {
"headers": "header-string",
"query_string": "query-string",
"server_variables": "server-variables",
"cookies": "cookies",
"extra_data": "some extra stuff",
"exception_data_obj": {
"stack_trace": "",
"source": "",
"message": "success",
"additional_data": ""
},
"some_id": "211FA1F1-F312-1234-B539-F7AAE23EAA2F",
"level": "Warn",
"description": "Success",
"@timestamp": "2017-01-20T01:33:27.303Z",
"field1": "value1",
"field2": "value2",
"key": {
"key.field1": "key.value1",
"key.field2": "key.value2"
},
"#by": "app-name",
"environment": "env-name"
},
"sort": [
1484876007303
]
},
{}
]
}
}
It's not the same query: in the Sense query you require matches (must) on both field1 and field2, but in Kibana you didn't.
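To compare the two like for like, it can help to build one request body in which the match clauses and the time window are both explicit. The Python sketch below does that with the range placed inside the bool query's filter clause and both bounds taken from the Kibana time picker. The function name `build_log_query` is hypothetical, and the time field is assumed to be named `@timestamp`. Dropping entries from `matches` shows which clause removes the hits.

```python
def build_log_query(matches, time_field, start, end):
    """Bool query: one match clause per field, plus a bounded time filter."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: value}} for field, value in matches.items()],
                "filter": [{"range": {time_field: {"gte": start, "lte": end}}}],
            }
        },
        "sort": [{time_field: {"order": "desc"}}],
    }


body = build_log_query(
    {"field_name1": "value1", "field_name2": "value2"},
    "@timestamp",  # assumed field name
    "2017-02-18T10:19:08.680Z",
    "2017-02-19T10:19:08.680Z",
)
print(body)
```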
