How to get different aggregations for different params in bool in Elasticsearch

Document structure:
{
"_index" : "admin",
"_type" : "_doc",
"_id" : "9rEy94EB7k-V-3UYmchn",
"_source" : {
"entity_title" : "Title CPP",
"entity_type" : "type1",
"entity_score" : 185346,
"entity" : {
"customer_id" : "cid1",
"customer_name" : "cname1",
}
}
}
{
"_index" : "admin",
"_type" : "_doc",
"_id" : "9rEy94EB7k-V-3UYmchn",
"_source" : {
"entity_title" : "Title APP",
"entity_type" : "type1",
"entity_score" : 12,
"entity" : {
"customer_id" : "cid2",
"customer_name" : "cname2",
}
}
}
My query
GET /admin/_search
{
"size": 0,
"query" : {
"bool" : {
"should" : [
{
"query_string" : {"default_field" : "entity_title", "query" : "app*"}
},
{
"fuzzy": {"entity_title": {"value": "app"}}
}
]
}
},
"aggs": {
"by_entity_type": {
"terms":{
"field":"entity_type",
"size": 4 <total number of entity types>
},
"aggs": {
"by_top_score":{"top_hits":{"size":10, "sort": {"entity_score": {"order" : "desc", "mode" : "avg"}}}}
}
}
}
}
I need to:
1. Aggregate all search results by entity_type.
2. Sort the results of the matched query (query_string) by _score.
3. Sort the results of the fuzzy search by entity_score.
Kindly help me fetch this either as separate aggregations or in the same aggregation.
Thanks.
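One possible approach (a sketch only, not a verified answer from this thread): keep the bool query for matching, and add one filter aggregation per clause so that each branch can carry its own terms/top_hits sub-aggregation with a different sort. The aggregation names below are made up, and entity_type is assumed to be a keyword field:
GET /admin/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        { "query_string": { "default_field": "entity_title", "query": "app*" } },
        { "fuzzy": { "entity_title": { "value": "app" } } }
      ]
    }
  },
  "aggs": {
    "query_string_matches": {
      "filter": { "query_string": { "default_field": "entity_title", "query": "app*" } },
      "aggs": {
        "by_entity_type": {
          "terms": { "field": "entity_type", "size": 4 },
          "aggs": {
            "top_by_relevance": { "top_hits": { "size": 10, "sort": [ { "_score": { "order": "desc" } } ] } }
          }
        }
      }
    },
    "fuzzy_matches": {
      "filter": { "fuzzy": { "entity_title": { "value": "app" } } },
      "aggs": {
        "by_entity_type": {
          "terms": { "field": "entity_type", "size": 4 },
          "aggs": {
            "top_by_entity_score": { "top_hits": { "size": 10, "sort": [ { "entity_score": { "order": "desc" } } ] } }
          }
        }
      }
    }
  }
}
Note that _score inside the first bucket still comes from the whole top-level bool query rather than from the query_string clause alone, so treat this as a starting point rather than a definitive solution.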

Related

Bool Filter not showing just the filtered data in Elastic Search

I have an index "tag_nested" which has data of the following type:
{
"jobid": 1,
"table_name": "table_A",
"Tags": [
{
"TagType": "WorkType",
"Tag": "ETL"
},
{
"TagType": "Subject Area",
"Tag": "Telecom"
}
]
}
When I fire the following query to filter data on "Tag" and "TagType":
POST /tag_nested/_search
{
"query": {
"bool": {
"must": {"match_all": {}},
"filter": [
{"term": {
"Tags.Tag.keyword": "ETL"
}},
{"term": {
"Tags.TagType.keyword": "WorkType"
}}
]
}
}
}
It gives me the following output. The problem I am facing is that while the above query correctly filters out documents which don't match, it shows all the "Tags" of a matching document instead of just the filtered one.
{
"_index" : "tag_nested",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"jobid" : 1,
"table_name" : "table_A",
"Tags" : [
{
"TagType" : "WorkType",
"Tag" : "ETL"
},
{
"TagType" : "Subject Area",
"Tag" : "Telecom"
}
]
}
}
Instead of above result I want my output to be like :
{
"_index" : "tag_nested",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"jobid" : 1,
"table_name" : "table_A",
"Tags" : [
{
"TagType" : "WorkType",
"Tag" : "ETL"
}
]
}
}
Already answered here, here and here.
TL;DR you'll need to make your Tags field of type nested, resync your index & use inner_hits to only fetch the applicable tag group.
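A minimal sketch of that approach, assuming Elasticsearch 7.x and that the index can be recreated and the data reindexed (the request bodies below are illustrative, not taken from the original answer):
PUT /tag_nested
{
  "mappings": {
    "properties": {
      "Tags": {
        "type": "nested",
        "properties": {
          "TagType": { "type": "text", "fields": { "keyword": { "type": "keyword" } } },
          "Tag": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }
        }
      }
    }
  }
}
POST /tag_nested/_search
{
  "_source": ["jobid", "table_name"],
  "query": {
    "nested": {
      "path": "Tags",
      "query": {
        "bool": {
          "filter": [
            { "term": { "Tags.Tag.keyword": "ETL" } },
            { "term": { "Tags.TagType.keyword": "WorkType" } }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}
With the nested mapping, both term filters must match on the same tag object, and inner_hits returns only that matching tag while _source is limited to the top-level fields.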

Elasticsearch returns 0.0 for metrics sum aggregation

Elasticsearch returns 0.0 for the metrics sum aggregation. The expected output would be the sum of the metric probe_http_duration_seconds.
Elasticsearch version: 7.1.1
Query used for aggregation:
GET some_metric/_search
{
"query": {
"bool": {
"must": [
{
"range": { "time": { "gte" : "now-1m", "lt": "now" } }
},
{
"match": {"name": "probe_http_duration_seconds"}
},
{
"match": {"labels.instance": "some-instance"}
}
]
}
},
"aggs" : {
"sum_is" : { "sum": { "field" : "value" } }
}
}
The above query returns the matching documents, followed by:
"aggregations" : {
"sum_is" : {
"value" : 0.0
}
}
Each document in the index looks like:
{
"_index" : "some_metric-2019.12.03-000004",
"_type" : "_doc",
"_id" : "_wCjz24Bk6FPpmW1lC31",
"_score" : 5.3475914,
"_source" : {
"name" : "probe_http_duration_seconds",
"time" : 1575441630181,
"value" : 0,
"labels" : {
"__name__" : "probe_http_duration_seconds",
"app" : "some-events",
"i" : "some_metric",
"instance" : "some-instance",
"job" : "someproject-k8s-service",
"kubernetes_name" : "some-events",
"kubernetes_namespace" : "deploytest",
"phase" : "connect",
"t" : "type",
"v" : "1"
}
}
}
On changing must to should in the query, I get:
"aggregations" : {
"sum_is" : {
"value" : 1.5389155527088604E16
}
}
The index dynamic mapping looks something like this:
"mappings" : {
"dynamic_templates" : [
{
"strings" : {
"unmatch" : "*seconds*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "keyword"
}
}
},
{
"to_float" : {
"match" : "*seconds*",
"mapping" : {
"type" : "float"
}
}
}
],
However, our requirement is that results match all of the clauses in the query.
For metrics aggregations Elasticsearch converts everything to double, but that still doesn't explain a result of zero.
Any pointers will be helpful. Thanks for your attention.
NOTE: I see that in the example document the value field is zero. Maybe I made a mistake while drafting/editing.
Below is the result for the past 2 minutes. It shows that the value field is actually a float.
Query:
GET some_metric/_search?size=3
{
"_source": ["value"],
"query": {
"bool": {
"must": [
{
"range": { "time": { "gte" : "now-2m", "lt": "now" } }
},
{
"match": {"name": "probe_http_duration_seconds"}
},
{
"match": {"labels.instance": "some-instance"}
}
]
}
}
}
Result:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 14.551308,
"hits" : [
{
"_index" : "some_metric-2019.12.04-000005",
"_type" : "_doc",
"_id" : "7oog0G4Bk6EPplW1ibD1",
"_score" : 14.551308,
"_source" : {
"value" : 0.040022423
}
},
{
"_index" : "some_metric-2019.12.04-000005",
"_type" : "_doc",
"_id" : "74og0G4Bk6EPplW1ibD1",
"_score" : 14.551308,
"_source" : {
"value" : 3.734E-5
}
},
{
"_index" : "some_metric-2019.12.04-000005",
"_type" : "_doc",
"_id" : "A4og0G4Bk6EPplW1ibH1",
"_score" : 14.551308,
"_source" : {
"value" : 0.015694122
}
}
]
}
}
What you see is just what you indexed in the source document. ES will never modify your source document. However, since the field type is long, as I suspected, it will index that float value as a long and not as a float.
This usually happens when the very first document to be indexed has an integer value, such as 0.
You can either reindex your data with the proper mapping... or, since you have time-based indices, just modify the dynamic template and tomorrow's index will be created correctly.
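A sketch of what such a template change could look like (the template name and index pattern below are assumptions; the point is to map value explicitly as a float so that the first indexed document no longer decides the type):
PUT _template/some_metric_values
{
  "index_patterns": ["some_metric-*"],
  "mappings": {
    "properties": {
      "value": { "type": "float" }
    }
  }
}
Indices created after this change would index value as a float; indices that already exist would still need a reindex for their previously indexed values.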

Search in nested object

I'm having trouble making a query on Elasticsearch 7.3.
I create an index like this:
PUT myindex
{
"mappings": {
"properties": {
"files": {
"type": "nested"
}
}
}
}
Then I create three documents:
PUT myindex/_doc/1
{
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2",
"files" : [
{
"filename" : "firstfilename.exe",
"datetime" : 111111111
},
{
"filename" : "secondfilename.exe",
"datetime" : 111111144
}
]
}
PUT myindex/_doc/2
{
"SHA256" : "87ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
"files" : [
{
"filename" : "thirdfilename.exe",
"datetime" : 111111133
},
{
"filename" : "fourthfilename.exe",
"datetime" : 111111122
}
]
}
PUT myindex/_doc/3
{
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
"files" : [
{
"filename" : "fifthfilename.exe",
"datetime" : 111111155
}
]
}
How can I get the last two files based on the datetime (ids: 1 and 3)?
I would like the SHA256 of the documents with the two most recent datetime values, ordered descending.
I did dozens of tests but none went well...
I'm not posting the code I tried because I'm really at sea here...
I would like a result like this or similar:
{
"SHA256": [
"94ee05933....a2af6a22cc2",
"565e04933....a2af6a22c5a"
]
}
Query:
GET myindex/_search
{
"_source":"SHA256",
"sort": [
{
"files.datetime": {
"mode":"max",
"order": "desc",
"nested_path": "files"
}
}
],
"size": 2
}
Result:
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a"
},
"sort" : [
111111155
]
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2"
},
"sort" : [
111111144
]
}
]
In sort you will get the max datetime value. So if you need the file names too, you can add them to _source and use the sort value to pick the appropriate file name.
A bit more complicated, the following query will give you exactly the two values:
GET myindex/_search
{
"_source": "SHA256",
"query": {
"bool": {
"must": [
{
"nested": {
"path": "files",
"query": {
"match_all": {}
},
"inner_hits": {
"size":1,
"sort": [
{
"files.datetime": "desc"
}
]
}
}
}
]
}
},
"sort": [
{
"files.datetime": {
"mode": "max",
"order": "desc",
"nested_path": "files"
}
}
],
"size": 2
}
Result:
[
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a"
},
"sort" : [
111111155
],
"inner_hits" : {
"files" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_nested" : {
"field" : "files",
"offset" : 0
},
"_score" : null,
"_source" : {
"filename" : "fifthfilename.exe",
"datetime" : 111111155
},
"sort" : [
111111155
]
}
]
}
}
}
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2"
},
"sort" : [
111111144
],
"inner_hits" : {
"files" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "files",
"offset" : 1
},
"_score" : null,
"_source" : {
"filename" : "secondfilename.exe",
"datetime" : 111111144
},
"sort" : [
111111144
]
}
]
}
}
}
}
]

Find list of Distinct string Values stored in a field in ElasticSearch

I have stored my data in Elasticsearch as given below. My aggregation returns only distinct words from the given field and not the entire distinct phrase.
{
"_index" : "test01",
"_type" : "whatever01",
"_id" : "1234",
"_score" : 1.0,
"_source" : {
"company_name" : "State Bank of India",
"user" : ""
}
},
{
"_index" : "test01",
"_type" : "whatever01",
"_id" : "5678",
"_score" : 1.0,
"_source" : {
"company_name" : "State Bank of India",
"user" : ""
}
},
{
"_index" : "test01",
"_type" : "whatever01",
"_id" : "8901",
"_score" : 1.0,
"_source" : {
"company_name" : "Kotak Mahindra Bank",
"user" : ""
}
}
I tried using the terms aggregation:
GET /test01/_search/
{
"aggs" : {
"genres":
{
"terms" :
{ "field": "company_name"}
}
}
}
I get the following output
"aggregations" : {
"genres" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 10531,
"buckets" : [
{
"key" : "bank",
"doc_count" : 2818
},
{
"key" : "mahindra",
"doc_count" : 1641
},
{
"key" : "state",
"doc_count" : 1504
}]
}}
How do I get the entire string in the field "company_name" with only distinct values, as given below?
"aggregations" : {
"genres" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 10531,
"buckets" : [
{
"key" : "Kotak Mahindra Bank",
"doc_count" : 2818
},
{
"key" : "State Bank of India",
"doc_count" : 1641
}
]
}}
It appears that you've set "fielddata": true for your field company_name, which is of type text. This is not good as it can end up consuming a lot of heap space, as mentioned in this link.
Furthermore, the values of a field of type text are broken down into tokens and saved in the inverted index by a process called analysis. Setting fielddata on fields of type text causes the aggregation to behave as you described in your question.
What you'd need to do is create a sibling sub-field of type keyword, as mentioned in this link, and perform the aggregation on that field.
Basically modify your mapping for company_name as below:
Mapping:
PUT <your_index_name>
{
"mappings": {
"mydocs": {
"properties": {
"company_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Run the below aggregation query on this company_name.keyword field and you'd get what you are looking for.
Query:
POST <your_index_name>/_search
{
"aggs": {
"unique_names": {
"terms": {
"field": "company_name.keyword", <----- Run on this field
"size": 10
}
}
}
}
Hope this helps!
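If you are adding the keyword sub-field to an existing index rather than creating a new one, a rough sketch (assuming Elasticsearch 7.x; on older versions the mapping type goes in the path) would be to add the multi-field via the mapping API and then update the existing documents in place so the sub-field gets populated:
PUT test01/_mapping
{
  "properties": {
    "company_name": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword" }
      }
    }
  }
}
POST test01/_update_by_query?conflicts=proceed
Adding a new multi-field to an existing text field is allowed without recreating the index, but documents indexed before the change only get .keyword values once they are reindexed or updated in place.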

Fetching unique data in Elasticsearch

I have the following data:
ID: 1, fldname: pawan
ID: 1, fldname: pawan1
ID: 1, fldname: pawan2
ID: 2, fldname: pawan3
ID: 3, fldname: pawan4
ID: 4, fldname: pawan5
I am trying to get unique data based on the ID field, similar to what we get in MySQL when firing group-by queries like:
select * from table_name where fldname like 'pawan%' group by ID
This will return unique values. The same works in Sphinx search when we use the group-by function.
Is there any way to get unique values in Elasticsearch?
Below is my sample mapping:
"mappings": {
"my_type": {
"properties": {
"docid": {
"type": "keyword"
},
"flgname": {
"type": "text"
}
}
}
}
I suggest that you slightly modify your mapping:
{
"record" : {
"dynamic" : "false",
"_all" : {
"enabled" : false
},
"properties" : {
"docid" : {
"type" : "long"
},
"flgname" : {
"type" : "text"
}
}
}
}
so that docid is a long.
Then you could try fuzzy queries for filtering, together with aggregations, like this one, which retrieves the minimum, maximum, average and distinct count of docid:
{
"from" : 0,
"size" : 10,
"_source" : true,
"query" : {
"bool" : {
"must" : [ {
"match" : {
"flgname" : {
"query" : "pawan",
"operator" : "OR",
"fuzziness" : "1",
"prefix_length" : 1,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"boost" : 1.0
}
}
} ]
}
},
"aggs" : {
"my_cardinality" : {
"cardinality" : {
"field" : "docid"
}
},
"my_avg" : {
"avg" : {
"field" : "docid"
}
},
"my_min" : {
"min" : {
"field" : "docid"
}
},
"my_max" : {
"max" : {
"field" : "docid"
}
}
}
}
By the way, this is the result of the above query on the data you proposed:
{
"took" : 47,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.9808292,
"hits" : [ {
"_index" : "stack_overflow1",
"_type" : "record",
"_id" : "40b5eac0-743b-4a6a-a06d-3ae4d56f4aca",
"_score" : 0.9808292,
"_source" : {
"docid" : "1",
"flgname" : "pawan"
}
}, {
"_index" : "stack_overflow1",
"_type" : "record",
"_id" : "27821c39-e722-4361-bc07-0dcd5181a1ad",
"_score" : 0.7846634,
"_source" : {
"docid" : "2",
"flgname" : "pawan3"
}
}, {
"_index" : "stack_overflow1",
"_type" : "record",
"_id" : "86fcd9c1-a688-4a6a-9c45-e91791a8b902",
"_score" : 0.7846634,
"_source" : {
"docid" : "4",
"flgname" : "pawan5"
}
}, {
"_index" : "stack_overflow1",
"_type" : "record",
"_id" : "fb00a3cc-f1b8-4073-8808-f2ddbc4979e2",
"_score" : 0.55451775,
"_source" : {
"docid" : "1",
"flgname" : "pawan1"
}
}, {
"_index" : "stack_overflow1",
"_type" : "record",
"_id" : "18e5e20d-17a7-4d59-b2f1-7bf325a4c4df",
"_score" : 0.55451775,
"_source" : {
"docid" : "3",
"flgname" : "pawan4"
}
}, {
"_index" : "stack_overflow1",
"_type" : "record",
"_id" : "fbf49af6-f574-4ad2-8686-cbbedc5e70c4",
"_score" : 0.23014566,
"_source" : {
"docid" : "1",
"flgname" : "pawan2"
}
} ]
},
"aggregations" : {
"my_cardinality" : {
"value" : 4
},
"my_max" : {
"value" : 4.0
},
"my_avg" : {
"value" : 2.0
},
"my_min" : {
"value" : 1.0
}
}
}
If you also make flgname a keyword, then you can aggregate over docid and sub-aggregate over flgname. The result will be similar to the SQL query you mentioned.
The query would look like:
{ "size": 0,
"query": {
"regexp":{
"flgname": "pawa.*"
}
},
"aggs" : {
"docids": {
"terms": {"field": "docid"},
"aggs": { "flgnam": { "terms": {"field": "flgname"}}}}
}
}
