Elasticsearch: count events for multiple tags in one request

I have two profiles, "A" and "B", and both have events in Elasticsearch.
This is the data, for example:
{"hits": [
{"tag": "A"},
{"tag": "B"},
{"tag": "B"}
]}
I want to count how many events tag "A" has and how many tag "B" has, in one request.
I've tried this, but it counts them as 3 in total, while I want A: 1 and B: 2:
GET forensics/_count
{
"query": {
"terms": {
"waas_tag": ["A","B"]
}
}
}

You can use the term vectors API to get information about the terms of a particular field.
Here is a working example with index data and the response.
Index Data
{
"waas_tag": [
{
"tag": "A"
},
{
"tag": "B"
},
{
"tag": "B"
}
]
}
Term Vector API:
GET forensics/_termvectors/1?fields=waas_tag.tag
Response:
"term_vectors": {
"waas_tag.tag": {
"field_statistics": {
"sum_doc_freq": 2,
"doc_count": 1,
"sum_ttf": 3
},
"terms": {
"a": {
"term_freq": 1, // note this
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 1
}
]
},
"b": {
"term_freq": 2, // note this
"tokens": [
{
"position": 101,
"start_offset": 2,
"end_offset": 3
},
{
"position": 202,
"start_offset": 4,
"end_offset": 5
}
]
}
}
}
}

In the end I found a solution using _msearch instead of _count:
GET forensics/_msearch
{} // this means {"index": "forensics"}
{"query":{"term":{"waas_tag":"A"}}}
{} // this means {"index": "forensics"}
{"query":{"bool":{"must":[{"term":{"waas_tag":"B"}},{"range":{"#timestamp":{"gte":"now-20d","lt":"now/s"}}}]}}}
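Note that _msearch expects newline-delimited JSON, so each header and each search body must stay on its own single line, as above. The two counts then come back in one response, one entry per search body. A rough sketch of its shape (on Elasticsearch 7+ hits.total is an object, on earlier versions a plain number; the values shown are only illustrative):
{
"responses": [
{"hits": {"total": {"value": 1, "relation": "eq"}}}, // count for tag "A"
{"hits": {"total": {"value": 2, "relation": "eq"}}} // count for tag "B"
]
}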

You can use a filters aggregation to get the count for each tag in a single query, without using the _msearch endpoint. This query should work:
{
"size": 0,
"aggs": {
"counts": {
"filters": {
"filters": {
"CountA": {
"term": {
"waas_tag": "A"
}
},
"CountB": {
"term": {
"waas_tag": "B"
}
}
}
}
}
}
}
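For the three example documents, the relevant part of the response should look roughly like this (each named filter becomes a keyed bucket with its own doc_count):
"aggregations": {
  "counts": {
    "buckets": {
      "CountA": {
        "doc_count": 1
      },
      "CountB": {
        "doc_count": 2
      }
    }
  }
}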

Related

Interval search for messages in Elasticsearch

I need to split the found messages into intervals. Can this be done with Elasticsearch?
For example, there are 10 messages and they need to be divided into 3 intervals, like this:
[0,1,2,3,4,5,6,7,8,9] => {[0,1,2], [3,4,5,6], [7,8,9]}
I'm only interested in the beginning of each interval, for example: {[count: 3, min: 0], [count: 4, min: 3], [count: 3, min: 7]}
Example.
PUT /test_index
{
"mappings": {
"properties": {
"id": {
"type": "long"
}
}
}
}
POST /test_index/_doc/0
{
"id": 0
}
POST /test_index/_doc/1
{
"id": 1
}
POST /test_index/_doc/2
{
"id": 2
}
POST /test_index/_doc/3
{
"id": 3
}
POST /test_index/_doc/4
{
"id": 4
}
POST /test_index/_doc/5
{
"id": 5
}
POST /test_index/_doc/6
{
"id": 6
}
POST /test_index/_doc/7
{
"id": 7
}
POST /test_index/_doc/8
{
"id": 8
}
POST /test_index/_doc/9
{
"id": 9
}
The values need to be divided into 3 intervals with the same number of elements in each interval:
{
...
"aggregations": {
"result": {
"buckets": [
{
"min": 0.0,
"doc_count": 3
},
{
"min": 3.0,
"doc_count": 4
},
{
"min": 7.0,
"doc_count": 3
}
]
}
}
}
There is a similar function: "variable width histogram":
GET /test_index/_search?size=0
{
"aggs": {
"result": {
"variable_width_histogram": {
"field": "id",
"buckets": 3
}
}
},
"query": {
"match_all": {}
}
}
But "variable width histogram" separates documents by id value, not by the number of elements in the bucket
Assuming your mapping is like:
{
"some_numeric_field" : {"type" : "integer"}
}
Then you can build histograms out of it with fixed interval sizes:
POST /my_index/_search?size=0
{
"aggs": {
"some_numeric_field": {
"histogram": {
"field": "some_numeric_field",
"interval": 7
}
}
}
}
Results:
{
...
"aggregations": {
"prices": {
"buckets": [
{
"key": 0.0,
"doc_count": 7
},
{
"key": 7.0,
"doc_count": 7
},
{
"key": 14.0,
"doc_count": 7
}
]
}
}
}
To get the individual values inside each bucket, just add a sub-aggregation, such as top_hits, or anything else like a terms aggregation.
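For example, a top_hits sub-aggregation could be added like this (a sketch on the same hypothetical index; "bucket_docs" is an arbitrary sub-aggregation name):
POST /my_index/_search?size=0
{
  "aggs": {
    "some_numeric_field": {
      "histogram": {
        "field": "some_numeric_field",
        "interval": 7
      },
      "aggs": {
        "bucket_docs": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}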
Without knowing more about your data, I really cannot help further.

Nested array of objects aggregation in Elasticsearch

Documents in Elasticsearch are indexed as follows:
Document 1
{
"task_completed": 10
"tagged_object": [
{
"category": "cat",
"count": 10
},
{
"category": "cars",
"count": 20
}
]
}
Document 2
{
"task_completed": 50
"tagged_object": [
{
"category": "cars",
"count": 100
},
{
"category": "dog",
"count": 5
}
]
}
As you can see, the value of the category key is dynamic. I want to perform an aggregation similar to SQL's GROUP BY on category and return the sum of the count for each category.
In the above example, the aggregation should return
cat: 10,
cars: 120 and
dog: 5
I would like to know how to write this aggregation query in Elasticsearch, if it is possible. Thanks in advance.
You can achieve the required result using nested, terms, and sum aggregations.
Here is a working example with index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"tagged_object": {
"type": "nested"
}
}
}
}
Search Query:
{
"size": 0,
"aggs": {
"resellers": {
"nested": {
"path": "tagged_object"
},
"aggs": {
"books": {
"terms": {
"field": "tagged_object.category.keyword"
},
"aggs":{
"sum_of_count":{
"sum":{
"field":"tagged_object.count"
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"resellers": {
"doc_count": 4,
"books": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cars",
"doc_count": 2,
"sum_of_count": {
"value": 120.0
}
},
{
"key": "cat",
"doc_count": 1,
"sum_of_count": {
"value": 10.0
}
},
{
"key": "dog",
"doc_count": 1,
"sum_of_count": {
"value": 5.0
}
}
]
}
}
}

Stats Aggregation with Min Mode in ElasticSearch

I have the below mapping in ElasticSearch
{
"properties":{
"Costs":{
"type":"nested",
"properties":{
"price":{
"type":"integer"
}
}
}
}
}
So every document has an array field Costs, which contains many elements, and each element has a price. I want to find the min and max price with the condition that, from each array, only the element with the minimum price is considered. So it is basically the min/max among the minimum values of each array.
Let's say I have 2 documents with the Costs field as:
Costs: [
{
"price": 100,
},
{
"price": 200,
}
]
and
Costs: [
{
"price": 300,
},
{
"price": 400,
}
]
So I need to find the stats
This is the query I am currently using
{
"costs_stats":{
"nested":{
"path":"Costs"
},
"aggs":{
"price_stats_new":{
"stats":{
"field":"Costs.price"
}
}
}
}
}
And it gives me this:
"min" : 100,
"max" : 400
But I need to find the stats after taking only the minimum element of each array into consideration.
So this is what I need:
"min" : 100,
"max" : 300
Just as there is a "mode" option in sort, is there something similar in the stats aggregation, or any other way of achieving this, maybe using a script? Please suggest, I am really stuck here.
Let me know if anything else is required.
Update 1:
Query for finding min/max among minimums
{
"_source":false,
"timeout":"5s",
"from":0,
"size":0,
"aggs":{
"price_1":{
"terms":{
"field":"id"
},
"aggs":{
"price_2":{
"nested":{
"path":"Costs"
},
"aggs":{
"filtered":{
"aggs":{
"price_3":{
"min":{
"field":"Costs.price"
}
}
},
"filter":{
"bool":{
"filter":{
"range":{
"Costs.price":{
"gte":100
}
}
}
}
}
}
}
}
}
},
"minValue":{
"min_bucket":{
"buckets_path":"price_1>price_2>filtered>price_3"
}
}
}
}
Only a few buckets are returned, and hence the min/max is computed among those, which is not correct. Is there a size limit?
One way to achieve your use case is to add one more field, id, to each document. A terms aggregation can then be performed on the id field, so buckets are built dynamically, one per unique value.
Then we can apply a min aggregation, which returns the minimum among the numeric values extracted from the aggregated documents.
Here is a working example with index data, mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"Costs": {
"type": "nested"
}
}
}
}
Index Data:
{
"id":1,
"Costs": [
{
"price": 100
},
{
"price": 200
}
]
}
{
"id":2,
"Costs": [
{
"price": 300
},
{
"price": 400
}
]
}
Search Query:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15 <-- note this
},
"aggs": {
"nested_entries": {
"nested": {
"path": "Costs"
},
"aggs": {
"min_position": {
"min": {
"field": "Costs.price"
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
}
}
This can also be achieved using the stats aggregation (if you add one more field, id, that uniquely identifies each document):
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15 <-- note this
},
"aggs": {
"costs_stats": {
"nested": {
"path": "Costs"
},
"aggs": {
"price_stats_new": {
"stats": {
"field": "Costs.price"
}
}
}
}
}
}
}
}
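For the two example documents, the relevant part of the response should look roughly like this (a sketch; the values follow from the sample prices):
"aggregations": {
  "id_terms": {
    "buckets": [
      {
        "key": 1,
        "doc_count": 1,
        "costs_stats": {
          "doc_count": 2,
          "price_stats_new": {
            "count": 2,
            "min": 100.0,
            "max": 200.0,
            "avg": 150.0,
            "sum": 300.0
          }
        }
      },
      {
        "key": 2,
        "doc_count": 1,
        "costs_stats": {
          "doc_count": 2,
          "price_stats_new": {
            "count": 2,
            "min": 300.0,
            "max": 400.0,
            "avg": 350.0,
            "sum": 700.0
          }
        }
      }
    ]
  }
}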
Update 1:
To find the maximum value among those minimums (as computed in the query above), you can use a max_bucket pipeline aggregation:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15 <-- note this
},
"aggs": {
"nested_entries": {
"nested": {
"path": "Costs"
},
"aggs": {
"min_position": {
"min": {
"field": "Costs.price"
}
}
}
}
}
},
"maxValue": {
"max_bucket": {
"buckets_path": "id_terms>nested_entries>min_position"
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
},
"maxValue": {
"value": 300.0,
"keys": [
"2"
]
}
}
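Similarly, the minimum among those per-document minimums can be obtained with a min_bucket pipeline aggregation on the same buckets path, added as a sibling of maxValue (a sketch):
"minValue": {
  "min_bucket": {
    "buckets_path": "id_terms>nested_entries>min_position"
  }
}
For the example data this would return 100.0, with key "1".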

Elasticsearch: total term frequency and doc count from given set of documents

I am trying to get the total term frequency and document count from a given set of documents, but _termvectors in Elasticsearch returns ttf and doc_count computed over all documents in the index. Is there any way to specify a list of documents (document ids) so that the result is based on those documents only?
Below are the document details and the query to get the total term frequency:
Index details:
PUT /twitter
{ "mappings": {
"tweets": {
"properties": {
"name": {
"type": "text",
"analyzer":"english"
}
}
}
},
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
}
}
}
Document Details:
PUT /twitter/tweets/1
{
"name":"Hello bar"
}
PUT /twitter/tweets/2
{
"name":"Hello foo"
}
PUT /twitter/tweets/3
{
"name":"Hello foo bar"
}
This will create three documents with ids 1, 2, and 3. Now suppose the tweets with ids 1 and 2 belong to user1 and tweet 3 belongs to another user, and I want to get the term vectors for user1.
Query to get this result:
GET /twitter/tweets/_mtermvectors
{
"ids" : ["1", "2"],
"parameters": {
"fields": ["name"],
"term_statistics": true,
"offsets":false,
"payloads":false,
"positions":false
}
}
Response:
{
"docs": [
{
"_index": "twitter",
"_type": "tweets",
"_id": "1",
"_version": 1,
"found": true,
"took": 1,
"term_vectors": {
"name": {
"field_statistics": {
"sum_doc_freq": 7,
"doc_count": 3,
"sum_ttf": 7
},
"terms": {
"bar": {
"doc_freq": 2,
"ttf": 2,
"term_freq": 1
},
"hello": {
"doc_freq": 3,
"ttf": 3,
"term_freq": 1
}
}
}
}
},
{
"_index": "twitter",
"_type": "tweets",
"_id": "2",
"_version": 1,
"found": true,
"took": 1,
"term_vectors": {
"name": {
"field_statistics": {
"sum_doc_freq": 7,
"doc_count": 3,
"sum_ttf": 7
},
"terms": {
"foo": {
"doc_freq": 2,
"ttf": 2,
"term_freq": 1
},
"hello": {
"doc_freq": 3,
"ttf": 3,
"term_freq": 1
}
}
}
}
}
]
}
Here we can see hello has doc_count 3 and ttf 3. How can I make it consider only the documents with the given ids?
One approach I am considering is to create a different index for each user, but I am not sure this approach is correct; the number of indices will grow with the number of users. Is there another solution?
To obtain the term doc count on a subset of documents you can try simple aggregations.
You will have to enable fielddata in the mapping of the field (though this can be heavy on memory; check the documentation page about fielddata for more details):
PUT /twitter
{
"mappings": {
"tweets": {
"properties": {
"name": {
"type": "text",
"analyzer":"english",
"fielddata": true,
"term_vector": "yes"
}
}
}
}
}
Then use terms aggregation:
POST /twitter/tweets/_search
{
"size": 0,
"query": {
"terms": {
"_id": [
"1",
"2"
]
}
},
"aggs": {
"my_term_doc_count": {
"terms": {
"field": "name"
}
}
}
}
The response will be:
{
"hits": ...,
"aggregations": {
"my_term_doc_count": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "hello",
"doc_count": 2
},
{
"key": "bar",
"doc_count": 1
},
{
"key": "foo",
"doc_count": 1
}
]
}
}
}
I couldn't find a way to calculate the total term frequency on a subset of documents though; I'm afraid it can't be done directly.
I would suggest computing term vectors offline with the _analyze API and storing them explicitly in a separate index. That way you will be able to use simple aggregations to compute the total term frequency as well. Here is an example usage of the _analyze API:
POST twitter/_analyze
{
"text": "Hello foo bar"
}
Response:
{
"tokens": [
{
"token": "hello",
"start_offset": 0,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "foo",
"start_offset": 6,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "bar",
"start_offset": 10,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 2
}
]
}
Hope that helps!
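A sketch of that approach (index, type, and field names here are purely illustrative, and the default .keyword sub-fields are assumed for exact matching): index one document per token occurrence returned by _analyze, tagged with the owning user and the tweet id. A terms aggregation on the token then gives the total term frequency for that user's subset (each bucket's doc_count is the number of occurrences), and a cardinality sub-aggregation on the tweet id gives the document frequency.
PUT /tweet_tokens/tokens/1
{
  "user": "user1",
  "tweet_id": 1,
  "token": "hello"
}

POST /tweet_tokens/tokens/_search
{
  "size": 0,
  "query": {
    "term": { "user.keyword": "user1" }
  },
  "aggs": {
    "term_stats": {
      "terms": { "field": "token.keyword" },
      "aggs": {
        "doc_freq": {
          "cardinality": { "field": "tweet_id" }
        }
      }
    }
  }
}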

Elasticsearch - mutiplication of 2 fields and then sum aggregation

I'm trying to make a simple query in Elasticsearch but I can't figure out how to do it; I searched all over the internet and found no discussion of this situation.
Let's say I have items like these:
{
"item_id": 1,
"item_price": 100,
"item_quantity": 2
},
{
"item_id": 2,
"item_price": 200,
"item_quantity": 3
},
{
"item_id": 3,
"item_price": 150,
"item_quantity": 1
},
{
"item_id": 4,
"item_price": 250,
"item_quantity": 5
}
I want to make a query that will give me the total price of the stock,
for example: 100*2 + 200*3 + 150*1 + 250*5.
The result of this query is supposed to be 2,200.
The query in the answer below works for the flat data above, but what about this more complex situation:
POST tests/test2/
{
"item_category": "aaa",
"items":
[
{
"item_id": 1,
"item_price": 100,
"item_quantity": 2
},
{
"item_id": 2,
"item_price": 150,
"item_quantity": 4
}
]
}
POST tests/test2/
{
"item_category": "bbb",
"items":
[
{
"item_id": 3,
"item_price": 200,
"item_quantity": 3
},
{
"item_id": 4,
"item_price": 200,
"item_quantity": 5
}
]
}
POST tests/test2/
{
"item_category": "ccc",
"items":
[
{
"item_id": 5,
"item_price": 300,
"item_quantity": 2
},
{
"item_id": 6,
"item_price": 150,
"item_quantity": 8
}
]
}
POST tests/test2/
{
"item_category": "ddd",
"items":
[
{
"item_id": 7,
"item_price": 80,
"item_quantity": 10
},
{
"item_id": 8,
"item_price": 250,
"item_quantity": 4
}
]
}
In this case the following query does not work and gives me a wrong answer (1,420 instead of 6,000):
GET tests/test2/_search
{
"query": {
"match_all": { }
},
"aggs": {
"total_price": {
"sum": {
"script": {
"lang": "painless",
"inline": "doc['items.item_price'].value * doc['items.item_quantity'].value"
}
}
}
}
}
You can use a sum aggregation over values calculated with a script:
{
"aggs": {
"total_price": {
"sum": {
"script": {
"lang": "painless",
"inline": "doc['item_price'].value * doc['item_quantity'].value"
}
}
}
}
}
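For the four flat item documents in the question, the result of this aggregation should look roughly like this (a sketch; the value follows from the example data):
"aggregations": {
  "total_price": {
    "value": 2200.0
  }
}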
Take a look here https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html#_script_9 for more details
Update
As for your advanced case, it would be better to map your items field as the nested type; after that you can use this aggregation:
{
"aggs": {
"nested": {
"nested": {
"path": "items"
},
"aggs": {
"total_price": {
"sum": {
"script": {
"inline": "doc['items.item_price'].value * doc['items.item_quantity'].value"
}
}
}
}
}
}
}
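Assuming the items field is mapped as nested (see the mapping below), the result over the four example category documents should be roughly (a sketch):
"aggregations": {
  "nested": {
    "doc_count": 8,
    "total_price": {
      "value": 6000.0
    }
  }
}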
This is the mapping query for the example data in the question:
PUT tests
{
"mappings": {
"test2": {
"properties": {
"items": {
"type": "nested"
}
}
}
}
}
Just to clarify, you must create the index with this mapping before indexing any documents (changing the mapping of an existing field is not allowed).
