Aggregate filtered result using Elastic Search API - elasticsearch

I would like to aggregate and count the number of docs that match my filtering rules.
I looked at the API documentation on their website: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html
and came up with this:
{ "size": 0,
"aggregations": {
"messages": {
"filters":{
"filters": {
"knowledge service": { "match": {"syslog_msg": "my-domain.com"}}
}
}
}
}
}
"syslog_msg" can contain values such as "my-domain.com some other value".
The response I got:
{
"_scroll_id" : "some scroll id",
"took" : 89,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1000000,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"messages" : {
"buckets" : {
"knowledge service" : {"doc_count" : 12000}
}
}
}
}
It seems to work fine, but when I ran a query to look at the 12000 records, some of them do not contain an exact match for the string I searched for (in this case my-domain.com).
For example, some docs have only the string "my" in syslog_msg instead of "my-domain.com".
How do I change the query so that it filters on an exact match for the string I am looking for?
The solution is to replace match with match_phrase. A match query analyzes the search string into separate terms and, by default, matches documents that contain any one of them, which is why documents containing only "my" were counted. match_phrase requires all of the terms to appear together, in the same order.
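As a minimal sketch of that change, the request body from the question can be built in Python with match swapped for match_phrase. The helper name build_phrase_count_query is hypothetical, and the body still has to be sent with whatever Elasticsearch or HTTP client you already use.

```python
import json


def build_phrase_count_query(field, phrase, bucket_name):
    """Build a filters aggregation that counts docs containing the phrase.

    Hypothetical helper: it only assembles the request body, it does not
    talk to Elasticsearch.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggregations": {
            "messages": {
                "filters": {
                    "filters": {
                        # match_phrase instead of match: all terms, in order
                        bucket_name: {"match_phrase": {field: phrase}}
                    }
                }
            }
        },
    }


body = build_phrase_count_query("syslog_msg", "my-domain.com", "knowledge service")
print(json.dumps(body, indent=2))
```

The only difference from the original request is the query type inside the named filter; the aggregation structure is unchanged.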

You should add sub-aggregations to your filter aggregation.
As the Elasticsearch documentation shows (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html):
{
"aggs" : {
"red_products" : {
"filter" : { "term": { "color": "red" } },
"aggs" : {
"avg_price" : { "avg" : { "field" : "price" } }
}
}
}
}

Related

How to perform nested queries on Elasticsearch?

I am trying to perform a nested query on Elasticsearch. That is, I have two queries, and the output of the first query must be used as input to the second. I went through the Elasticsearch documentation but couldn't find a way to do this.
The first query is:
GET index1/_search
{
"query": {
"query_string": {
"query": "(imageName: xyz.jpg)"
}
}
}
The output of this query is JSON, for example:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.2682955,
"hits" : [
{
"_index" : "index1",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.2682955,
"_source" : {
"assetId" : "0",
"descriptor" : "randomString",
"bucketId" : [randomArray],
"imageName" : "xyz.jpg"
}
}
]
}
}
The second query is:
GET index2/_search
{
"query": {
"function_score": {
"boost_mode": "replace",
"query": {
"constant_score": {
"filter": {
"terms": {
"bucketId": [randomArray that came as an output of the first query]
}
}
}
},
"pqcode_score": {
"descriptors": [
{
"descriptor": "randomString that came as an output of the first query"
}
]
}
}
}
}
How can we use the output of the first query inside the second query?
Can anyone help me with this?
This is not possible within Elasticsearch itself. You need to implement it on the application side.
Run the first query, read the values you need from its result, and then run the second query with those values passed in. That is the only option.
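The application-side chaining could look roughly like the following Python sketch. It assumes the first response has the shape shown in the question and that only the first hit is used; build_second_query and the sample bucketId values are hypothetical, and sending the requests is left to whatever client the application already uses.

```python
def build_second_query(first_response):
    """Build the index2 query body from the first query's response.

    Hypothetical helper: reads bucketId and descriptor from the first hit
    and returns None when the first query matched nothing.
    """
    hits = first_response["hits"]["hits"]
    if not hits:
        return None
    source = hits[0]["_source"]
    return {
        "query": {
            "function_score": {
                "boost_mode": "replace",
                "query": {
                    "constant_score": {
                        "filter": {"terms": {"bucketId": source["bucketId"]}}
                    }
                },
                # pqcode_score is copied from the question as-is
                "pqcode_score": {
                    "descriptors": [{"descriptor": source["descriptor"]}]
                },
            }
        }
    }


# Trimmed stand-in for the first response; the bucketId values are made up.
first_response = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "assetId": "0",
                    "descriptor": "randomString",
                    "bucketId": [1, 2, 3],
                    "imageName": "xyz.jpg",
                }
            }
        ]
    }
}

second_body = build_second_query(first_response)
```

The application would then send second_body to GET index2/_search.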

Can Elastic Search do aggregations for within a document?

I have a mapping like this:
"mappings": {
"seller": {
"properties": {
"overallRating": {"type": "byte"},
"items": {
"properties": {
"itemName": {"type": "string"},
"itemRating": {"type": "byte"}
}
}
}
}
}
Each item will only have one itemRating. Each seller will only have one overall rating. There can be many items, and at most I'm expecting maybe 50 items with itemRatings. Not all items have to have an itemRating.
I'm trying to get an average rating for each seller that combines all itemRatings and the overallRating. I have looked into aggregations but all I have seen are aggregations for across all documents. The aggregation I'm looking to do is within the document itself, and I am not sure if that is possible. Any tips would be appreciated.
Yes, this is very much possible with Elasticsearch. To produce a combined rating, you simply need to sub-aggregate by the document id. Each bucket then contains only a single document, which is what you want.
Here is an example:
Create the index:
PUT /ratings
{
"mappings": {
"properties": {
"overallRating": {"type" : "float"},
"items": {
"type" : "nested",
"properties": {
"itemName" : {"type" : "keyword"},
"itemRating" : {"type" : "float"},
"overallRating": {"type" : "float"}
}
}
}
}
}
Add some data:
POST ratings/_doc/
{
"overallRating" : 1,
"items" : [
{
"itemName" : "labrador",
"itemRating" : 10,
"overallRating" : 1
},
{
"itemName" : "saint bernard",
"itemRating" : 20,
"overallRating" : 1
}
]
}
POST ratings/_doc/
{
"overallRating" : 1,
"items" : [
{
"itemName" : "cat",
"itemRating" : 5,
"overallRating" : 1
},
{
"itemName" : "rat",
"itemRating" : 10,
"overallRating" : 1
}
]
}
Query the index for a combined rating per document:
GET ratings/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"average_rating": {
"composite": {
"sources": [
{
"ids": {
"terms": {
"field": "_id"
}
}
}
]
},
"aggs": {
"average_rating": {
"nested": {
"path": "items"
},
"aggs": {
"avg": {
"avg": {
"field": "items.compound"
}
}
}
}
}
}
},
"runtime_mappings": {
"items.compound": {
"type": "double",
"script": {
"source": "emit(doc['items.overallRating'].value + doc['items.itemRating'].value)"
}
}
}
}
The result (please note that I changed the exact rating values between writing the answer and running it in the console, so the averages are a bit different):
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"average_rating" : {
"after_key" : {
"ids" : "3vUp44EBbR3hrRYkA8pj"
},
"buckets" : [
{
"key" : {
"ids" : "3_Up44EBbR3hrRYkLsrC"
},
"doc_count" : 1,
"average_rating" : {
"doc_count" : 2,
"avg" : {
"value" : 151.0
}
}
},
{
"key" : {
"ids" : "3vUp44EBbR3hrRYkA8pj"
},
"doc_count" : 1,
"average_rating" : {
"doc_count" : 2,
"avg" : {
"value" : 8.5
}
}
}
]
}
}
}
One change for convenience:
I edited your mappings to add overallRating to each items entry. This simplifies the subsequent calculations, because you only look inside the nested scope and never have to step out of it.
I also used a runtime mapping (items.compound) to combine overallRating and itemRating for each entry. It sums each itemRating with the overallRating, and the nested avg aggregation then averages those sums across the entries of a document.
I used a top-level composite aggregation on _id so that results are returned per document (which is what you want).
There is some fairly heavy lifting happening here, but it works and is easy to adapt as you require.
HTH.
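To make the arithmetic concrete, here is a plain-Python sketch of what the runtime field and the nested avg compute for the two sample documents indexed above. It only mirrors the calculation and does not call Elasticsearch; combined_average is a hypothetical name.

```python
# The two sample documents from the "Add some data" step.
docs = [
    {
        "overallRating": 1,
        "items": [
            {"itemName": "labrador", "itemRating": 10, "overallRating": 1},
            {"itemName": "saint bernard", "itemRating": 20, "overallRating": 1},
        ],
    },
    {
        "overallRating": 1,
        "items": [
            {"itemName": "cat", "itemRating": 5, "overallRating": 1},
            {"itemName": "rat", "itemRating": 10, "overallRating": 1},
        ],
    },
]


def combined_average(doc):
    """Average of (overallRating + itemRating) over one document's items."""
    # Same sum as the items.compound runtime field, one value per item.
    compound = [item["overallRating"] + item["itemRating"] for item in doc["items"]]
    return sum(compound) / len(compound) if compound else None


averages = [combined_average(doc) for doc in docs]
print(averages)  # [16.0, 8.5]
```

For the first document the per-item sums are 11 and 21, giving 16.0; for the second they are 6 and 11, giving 8.5, which matches the second bucket in the response above.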

Bucket Script Aggregation - Elastic Search

I'm trying to build a query in Elasticsearch to get the difference between two values.
Here's the code I'm using:
GET /monitora/_search
{
"size":0,
"aggs": {
"CALC_DIFF": {
"filters": {
"filters": {
"FTS_callback": {"term":{ "msgType": "panorama_fts"}},
"FTS_position": {"term":{ "msgType": "panorama_position"}}
}
},
"aggs": {
"subtract": {
"bucket_script": {
"buckets_path": {
"PCountCall": "_count",
"PcountPos":"_count"
},
"script": "params.PCountCall - params.PcountPos"
}
}
}
}
}
}
And this is what I get back when I run it:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"CALC_DIFF" : {
"buckets" : {
"FTS_callback" : {
"doc_count" : 73530,
"subtract" : {
"value" : 0.0
}
},
"FTS_position" : {
"doc_count" : 156418,
"subtract" : {
"value" : 0.0
}
}
}
}
}
}
However, the subtraction is applied inside each bucket (where it will always be zero, because both paths point at that bucket's own _count). What I am looking for is the difference between the counts of the two buckets, which in this example would be (73530 - 156418).
After that, I would like to display the result as a "metric" visualization element in Kibana. Is it possible?
Could anyone give me a hand to get it right?
Thanks in advance!
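One way to get the cross-bucket difference, sketched here in Python under the assumption that the response has the shape shown above, is to subtract the two doc_count values on the client. This does not cover the Kibana visualization part of the question, and count_difference is a hypothetical helper name.

```python
def count_difference(response, agg_name, minuend, subtrahend):
    """Subtract one named filters bucket's doc_count from another's."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return buckets[minuend]["doc_count"] - buckets[subtrahend]["doc_count"]


# Trimmed copy of the response from the question.
response = {
    "aggregations": {
        "CALC_DIFF": {
            "buckets": {
                "FTS_callback": {"doc_count": 73530},
                "FTS_position": {"doc_count": 156418},
            }
        }
    }
}

diff = count_difference(response, "CALC_DIFF", "FTS_callback", "FTS_position")
print(diff)  # -82888
```

With this approach the bucket_script sub-aggregation can be dropped from the request, since only the two doc_count values are needed.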

filtering on 2 values of same field

I have a status field, which can have one of the following values,
I can filter for data that has status completed. I can also see data that has status ongoing.
But I want to display the data that has status completed or ongoing at the same time.
I don't know how to add a filter for 2 values on a single field.
How can I achieve what I want?
EDIT - Thanks for the answers, but that is not what I wanted.
Like here, where I have filtered for status:completed, I want to filter for 2 values in this exact way.
I know I can edit this filter and use your queries, but I need a simple way to do this (the query approach is complex), because I have to show it to my marketing team, who have no experience with queries. I need to convince them.
If I understand your question correctly, you want to perform an aggregation on 2 values of a field.
This should be possible with a query similar to this one with a terms query:
{
"size" : 0,
"query" : {
"bool" : {
"must" : [ {
"terms" : {
"status" : [ "completed", "unpaid" ]
}
} ]
}
},
"aggs" : {
"freqs" : {
"terms" : {
"field" : "status"
}
}
}
}
This will give a result like this one:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"freqs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "unpaid",
"doc_count" : 4
}, {
"key" : "completed",
"doc_count" : 1
} ]
}
}
}
Here is my toy mapping definition:
{
"bookings" : {
"properties" : {
"status" : {
"type" : "keyword"
}
}
}
}
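The same request can be assembled for any list of status values. The Python sketch below assumes the status field is mapped as keyword, as in the toy mapping; build_status_query and counts_by_status are hypothetical helper names, and the sample response is a trimmed copy of the one shown above.

```python
def build_status_query(statuses):
    """Build a terms query plus a terms aggregation on the status field."""
    return {
        "size": 0,
        "query": {"bool": {"must": [{"terms": {"status": list(statuses)}}]}},
        "aggs": {"freqs": {"terms": {"field": "status"}}},
    }


def counts_by_status(response):
    """Turn the freqs buckets into a {status: doc_count} dict."""
    buckets = response["aggregations"]["freqs"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


body = build_status_query(["completed", "ongoing"])

sample_response = {
    "aggregations": {
        "freqs": {
            "buckets": [
                {"key": "unpaid", "doc_count": 4},
                {"key": "completed", "doc_count": 1},
            ]
        }
    }
}
print(counts_by_status(sample_response))
```

A terms query matches a document when its field holds any one of the listed values, so it acts as an OR across the statuses.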
You need a filter aggregation.
{
"size": 0,
"aggs": {
"agg_name": {
"filter": {
"bool": {
"should": [
{
"terms": {
"status": [
"completed",
"ongoing"
]
}
}
]
}
}
}
}
}
Use the above query to get results like this:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 0,
"hits": []
},
"aggregations": {
"agg_name": {
"doc_count": 6
}
}
}
The result you want is the doc_count.
For reference, should in an Elasticsearch bool query behaves like an OR condition:
{
"query":{
"bool":{
"should":[
{"term":{"status":"completed"}},
{"term":{"status":"ongoing"}}
]
}
},
"aggs" : {
"booking_status" : {
"terms" : {
"field" : "status"
}
}
}
}

Make Elasticsearch return the number of all documents on query

When I run a query, Elasticsearch returns how many hits I get. Can I also get it to report how many documents it has in total?
Here I've added the imaginary field sum_documents to the result. Does such a thing exist, or do I have to make an extra query to fetch the sum?
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"sum_documents": 500,
"max_score" : null,
"hits" : [ ]
}
}
You can add a global aggregation to your query. It ignores the query itself and returns the total document count in your search context (index/alias + type(s)):
{
"query": {
"query_string": {
"query": "viking",
"default_operator": "AND"
}
},
"aggs": {
"harvester-test": {
"global": {}
}
}
}
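Reading both numbers out of the response could look like the Python sketch below. The sample response is hypothetical: it reuses the 0 hits and 500 documents from the question, and hits_and_total is an assumed helper name.

```python
def hits_and_total(response, global_agg_name):
    """Return (matching hits, all documents in the search context)."""
    total_hits = response["hits"]["total"]
    # Newer versions report the total as {"value": ..., "relation": ...}.
    if isinstance(total_hits, dict):
        total_hits = total_hits["value"]
    # A global aggregation reports its document count as doc_count.
    all_docs = response["aggregations"][global_agg_name]["doc_count"]
    return total_hits, all_docs


# Hypothetical response, using the numbers from the question.
sample_response = {
    "hits": {"total": 0, "max_score": None, "hits": []},
    "aggregations": {"harvester-test": {"doc_count": 500}},
}

matched, total = hits_and_total(sample_response, "harvester-test")
print(matched, total)  # 0 500
```

This way a single request returns both the number of hits and the overall document count, so no extra query is needed.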
