Cannot aggregate in Elasticsearch

I have a service whose logs are stored in Elasticsearch, and I want to list the users who have used it.
My request returns the detailed log lines, but I only want the unique values of "kubernetes.pod_name":
{
"size": 10000,
"_source": ["kubernetes.pod_name"],
"query": {"bool": {"filter": [
{"match": {"kubernetes.labels.app" : "jupyterhub"}},
{"match_phrase": {"log": "200 GET"}}
]}},
"aggs": {"pods": {"terms": {"field": "kubernetes.pod_name"}}}
}
Why aren't the log lines grouped in the "aggs" section? What should I do to get the unique users?
Update:
My query returns:
{'took': 614,
'timed_out': False,
'_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
'hits': {'total': 17703,
'max_score': 0.0,
'hits': [{'_index': 'dwh-dev-2020-10-14',
'_type': 'container_log',
'_id': 'vQ6vJHUBU_u817onY-cZ',
'_score': 0.0,
'_source': {'kubernetes': {'pod_name': 'jupyter-lyisova-2evg'}}},
{'_index': 'dwh-dev-2020-10-14',
'_type': 'container_log',
'_id': 'xA6vJHUBU_u817onY-cZ',
'_score': 0.0,
'_source': {'kubernetes': {'pod_name': 'jupyter-lyisova-2evg'}}},
{'_index': 'dwh-dev-2020-10-14',
'_type': 'container_log',
'_id': '6g6vJHUBU_u817onY-cZ',
'_score': 0.0,
'_source': {'kubernetes': {'pod_name': 'jupyter-bogdanov'}}},
...
I want to get 20 lines instead of 17703, where each line corresponds to a unique "kubernetes.pod_name".

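You can combine a filter aggregation with a terms aggregation. A filter aggregation takes a single query, so wrap the two conditions in a bool query, and set "size": 0 so that only the aggregation buckets are returned instead of the log lines:
{
"size": 0,
"aggs": {
"labels_filter": {
"filter": {
"bool": {
"filter": [
{
"match": {
"kubernetes.labels.app": "jupyterhub"
}
},
{
"match_phrase": {
"log": "200 GET"
}
}
]
}
},
"aggs": {
"pods": {
"terms": {
"field": "kubernetes.pod_name"
}
}
}
}
}
}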
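Once the response contains an "aggregations" section, the unique pod names are simply the bucket keys. The following is a minimal Python sketch of reading them, assuming the aggregation names labels_filter and pods from the query above. The helper name unique_pod_names and the sample response are hypothetical; the pod names come from the question, and the doc counts are invented for illustration.

```python
def unique_pod_names(response):
    """Return the bucket keys of the 'pods' terms aggregation nested in 'labels_filter'."""
    buckets = response["aggregations"]["labels_filter"]["pods"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Hypothetical response fragment: pod names from the question, counts invented.
sample_response = {
    "aggregations": {
        "labels_filter": {
            "doc_count": 3,
            "pods": {
                "buckets": [
                    {"key": "jupyter-lyisova-2evg", "doc_count": 2},
                    {"key": "jupyter-bogdanov", "doc_count": 1},
                ]
            },
        }
    }
}

print(unique_pod_names(sample_response))
```

Each bucket's doc_count is the number of matching log lines for that pod, so the same loop can also report how active each user was.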

Related

Elasticsearch Aggregation of large list

I'm trying to count how many times ingredients show up in different documents. My index body is similar to this
index_body = {
"settings":{
"index":{
"number_of_replicas":0,
"number_of_shards":4,
"refresh_interval":"-1",
"knn":"true"
}
},
"mappings":{
"properties":{
"recipe_id":{
"type":"keyword"
},
"recipe_title":{
"type":"text",
"analyzer":"standard",
"similarity":"BM25"
},
"description":{
"type":"text",
"analyzer":"standard",
"similarity":"BM25"
},
"ingredient":{
"type":"keyword"
},
"image":{
"type":"keyword"
},
....
}
}
In the ingredient field, I've stored an array of strings of each ingredient [ingredient1,ingredient2,....]
I have around 900 documents. Each with their own ingredients list.
I've tried using Elasticsearch's aggregations but it seems to not return what I expected.
Here is the query I've been using:
{
"size":0,
"aggs":{
"ingredients":{
"terms": {"field":"ingredient"}
}
}
}
But it returns this:
{'took': 4, 'timed_out': False, '_shards': {'total': 4, 'successful': 4, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 994, 'relation': 'eq'}, 'max_score': None, 'hits': []}, 'aggregations': {'ingredients': {'doc_count_error_upper_bound': 56, 'sum_other_doc_count': 4709, 'buckets': [{'key': 'salt', 'doc_count': 631}, {'key': 'oil', 'doc_count': 320}, {'key': 'sugar', 'doc_count': 314}, {'key': 'egg', 'doc_count': 302}, {'key': 'butter', 'doc_count': 291}, {'key': 'flour', 'doc_count': 264}, {'key': 'garlic', 'doc_count': 220}, {'key': 'ground pepper', 'doc_count': 185}, {'key': 'vanilla extract', 'doc_count': 146}, {'key': 'lemon', 'doc_count': 131}]}}}
This is clearly wrong, as I have many ingredients. What am I doing wrong? Why is it returning only these ones? Is there a way to force Elasticsearch to return all counts?

By default a terms aggregation returns only the top 10 buckets by doc_count. To get more, specify size inside the aggregation:
{
"size":0,
"aggs":{
"ingredients":{
"terms": {"field":"ingredient", "size": 10000}
}
}
}
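One way to tell whether a terms result was cut off is to look at sum_other_doc_count, which is the total doc count of the buckets that were not returned. Here is a small Python sketch of that check, using the figures from the response in the question (the bucket list is shortened). The helper name terms_is_truncated is hypothetical.

```python
def terms_is_truncated(terms_result):
    """True when some buckets were left out of the terms aggregation result."""
    return terms_result["sum_other_doc_count"] > 0


# Values copied from the response in the question; bucket list shortened.
ingredients = {
    "doc_count_error_upper_bound": 56,
    "sum_other_doc_count": 4709,
    "buckets": [
        {"key": "salt", "doc_count": 631},
        {"key": "oil", "doc_count": 320},
    ],
}

if terms_is_truncated(ingredients):
    print("more ingredients exist than were returned; raise the aggregation size")
```

With a large enough size, sum_other_doc_count should drop to 0, which is a quick signal that every ingredient is present in the buckets.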

Generating data tables in elastic search

I'm trying to make a data table which consists of some calculations
******************************************************
** Bidder * Request * CPM * Revenue * Response Time **
******************************************************
I've created an index which holds all the data, so my data is stored in following format:
{
"data": {
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 78,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "nits_media_bid_won",
"_type": "nits_media_data_collection",
"_id": "MIyt6m8BWa2IbVphmPUh",
"_score": 1,
"_source": {
"bidderCode": "appnexus",
"width": 300,
"height": 600,
"statusMessage": "Bid available",
"adId": "43d59b34fd61b5",
"requestId": "2c6d19dcc536c3",
"mediaType": "banner",
"source": "client",
"cpm": 0.5,
"creativeId": 98493581,
"currency": "USD",
"netRevenue": true,
"ttl": 300,
"adUnitCode": "/19968336/header-bid-tag-0",
"appnexus": {
"buyerMemberId": 9325
},
"meta": {
"advertiserId": 2529885
},
"originalCpm": 0.5,
"originalCurrency": "USD",
"auctionId": "a628c0c0-bd4d-4f2a-9011-82fab780910e",
"responseTimestamp": 1580190231422,
"requestTimestamp": 1580190231022,
"bidder": "appnexus",
"timeToRespond": 400,
"pbLg": "0.50",
"pbMg": "0.50",
"pbHg": "0.50",
"pbAg": "0.50",
"pbDg": "0.50",
"pbCg": null,
"size": "300x600",
"adserverTargeting": {
"hb_bidder": "appnexus",
"hb_adid": "43d59b34fd61b5",
"hb_pb": "0.50",
"hb_size": "300x600",
"hb_source": "client",
"hb_format": "banner"
},
"status": "rendered",
"params": [
{
"placementId": 13144370
}
],
"nits_account": "asjdfagsd2384vasgd19",
"nits_url": "http://nitsmedia.local/run-ad",
"session_id": "YTGpETKSk2nHwLRB6GbP",
"timestamp": "2020-01-28T05:43:51.702Z",
"geo_data": {
"continent": "North America",
"address_format": "{{recipient}}\n{{street}}\n{{city}} {{region_short}} {{postalcode}}\n{{country}}",
"alpha2": "US",
"alpha3": "USA",
"country_code": "1",
"international_prefix": "011",
"ioc": "USA",
"gec": "US",
"name": "United States of America",
"national_destination_code_lengths": [
3
],
"national_number_lengths": [
10
],
"national_prefix": "1",
"number": "840",
"region": "Americas",
"subregion": "Northern America",
"world_region": "AMER",
"un_locode": "US",
"nationality": "American",
"postal_code": true,
"unofficial_names": [
"United States",
"Vereinigte Staaten von Amerika",
"États-Unis",
"Estados Unidos",
"アメリカ合衆国",
"Verenigde Staten"
],
"languages_official": [
"en"
],
"languages_spoken": [
"en"
],
"geo": {
"latitude": 37.09024000000000143018041853792965412139892578125,
"latitude_dec": "39.44325637817383",
"longitude": -95.7128909999999990532160154543817043304443359375,
"longitude_dec": "-98.95733642578125",
"max_latitude": 71.5388001000000031126546673476696014404296875,
"max_longitude": -66.8854170000000038953658076934516429901123046875,
"min_latitude": 18.77629999999999910187398199923336505889892578125,
"min_longitude": 170.595699999999993679011822678148746490478515625,
"bounds": {
"northeast": {
"lat": 71.5388001000000031126546673476696014404296875,
"lng": -66.8854170000000038953658076934516429901123046875
},
"southwest": {
"lat": 18.77629999999999910187398199923336505889892578125,
"lng": 170.595699999999993679011822678148746490478515625
}
}
},
"currency_code": "USD",
"start_of_week": "sunday"
}
}
},
//Remaining data set....
]
},
}
}
Based on this data set, I want to fetch every unique bidderCode (shown as Bidder in the table) and compute the following values for each one:
Request - the total number of documents in the bucket
CPM - the sum of all cpm values divided by 1000
Revenue - the total CPM multiplied by 1000
Response time - the average of (responseTimestamp - requestTimestamp)
How can I achieve this? I'm a bit confused. I tried building the blocks like this:
return $this->elasticsearch->search([
'index' => 'nits_media_bid_won',
'body' => [
'query' => $query,
'aggs' => [
'unique_bidders' => [
'terms' => ['field' => 'bidderCode.keyword']
],
'aggs' => [
'sum' => [
'cpm' => [
'field' => 'cpm',
'script' => '_value / 1000'
]
]
],
]
]
]);
But it is showing me error:
{
"error":{
"root_cause":[
{
"type":"x_content_parse_exception",
"reason":"[1:112] [sum] unknown field [cpm], parser not found"
}
],
"type":"x_content_parse_exception",
"reason":"[1:112] [sum] unknown field [cpm], parser not found"
},
"status":400
}
I'm new to this, so any help would be appreciated. Thanks.

Elasticsearch isn't wrong: you've swapped the aggregation name with its type, so it finds cpm where it expects the parameters of a sum aggregation and cannot parse it. The name (cpm) must come first and the type (sum) inside it. In the corrected query the sum is also nested inside unique_bidders, so it is computed per bidder:
GET nits_media_bid_won/_search
{
"size": 0,
"aggs": {
"unique_bidders": {
"terms": {
"field": "bidderCode.keyword",
"size": 10
},
"aggs": {
"cpm": {
"sum": {
"field": "cpm",
"script": "_value / 1000"
}
}
}
}
}
}
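To turn the buckets into the table from the question, each bucket can be mapped to one row. The sketch below, in Python, is one possible way to do it under some assumptions: it adds a second sub-aggregation named response_time (a hypothetical name) that averages the timeToRespond field, which in the sample document equals responseTimestamp - requestTimestamp. The Revenue column is left out, and the sample bucket values are invented for illustration.

```python
# Request body: same structure as the corrected query, plus an assumed
# "response_time" sub-aggregation averaging the timeToRespond field.
query_body = {
    "size": 0,
    "aggs": {
        "unique_bidders": {
            "terms": {"field": "bidderCode.keyword", "size": 10},
            "aggs": {
                "cpm": {"sum": {"field": "cpm", "script": "_value / 1000"}},
                "response_time": {"avg": {"field": "timeToRespond"}},
            },
        }
    },
}


def to_rows(response):
    """Map each bidder bucket to one table row."""
    rows = []
    for bucket in response["aggregations"]["unique_bidders"]["buckets"]:
        rows.append({
            "Bidder": bucket["key"],
            "Request": bucket["doc_count"],
            "CPM": bucket["cpm"]["value"],
            "Response Time": bucket["response_time"]["value"],
        })
    return rows


# Hypothetical response fragment; the numbers are invented.
sample_response = {
    "aggregations": {
        "unique_bidders": {
            "buckets": [
                {
                    "key": "appnexus",
                    "doc_count": 78,
                    "cpm": {"value": 0.039},
                    "response_time": {"value": 400.0},
                }
            ]
        }
    }
}

for row in to_rows(sample_response):
    print(row)
```

Keeping the arithmetic for derived columns such as Revenue in the application code, rather than in the query, may be easier to adjust once the exact formula is settled.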

_update_by_query fails to update all documents in ElasticSearch

I have over 30 million documents in Elasticsearch (version 6.3.3), and I am trying to add a new field to all existing documents with its value set to 0.
For example: I want to add a start field, which did not previously exist in the Twitter document, and set its initial value to 0 in all 30 million documents.
In my case only about 4 million documents were updated. If I check the submitted task with the Task API at http://localhost:9200/_tasks/{taskId}, the result looks something like this:
{
"completed": false,
"task": {
"node": "Jsecb8kBSdKLC47Q28O6Pg",
"id": 5968304,
"type": "transport",
"action": "indices:data/write/update/byquery",
"status": {
"total": 34002005,
"updated": 3618000,
"created": 0,
"deleted": 0,
"batches": 3619,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1.0,
"throttled_until_millis": 0
},
"description": "update-by-query [Twitter][tweet] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.Twitter.start = 0;', options={}, params={}}",
"start_time_in_millis": 1574677050104,
"running_time_in_nanos": 466805438290,
"cancellable": true,
"headers": {}
}
}
The query I am executing against ES is something like:
curl -XPOST "http://localhost:9200/_update_by_query?wait_for_completion=false&conflicts=proceed" -H 'Content-Type: application/json' -d'
{
"script": {
"source": "ctx._source.Twitter.start = 0;"
},
"query": {
"exists": {
"field": "Twitter"
}
}
}'
Any suggestions would be great, thanks
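The task status shown in the question reports "completed": false, so its counters describe a task that was still running when it was polled. As a rough Python sketch of reading that status, the fraction updated so far can be computed from the updated and total counters. The helper name updated_fraction is hypothetical, and the dictionary holds only the relevant values copied from the question.

```python
def updated_fraction(task_response):
    """Fraction of matching documents the task has updated so far."""
    status = task_response["task"]["status"]
    return status["updated"] / status["total"]


# Relevant values copied from the task status in the question.
task_response = {
    "completed": False,
    "task": {
        "status": {"total": 34002005, "updated": 3618000},
        "running_time_in_nanos": 466805438290,
    },
}

fraction = updated_fraction(task_response)
seconds = task_response["task"]["running_time_in_nanos"] / 1e9
print(f"completed={task_response['completed']}, "
      f"updated {fraction:.1%} after {seconds:.0f} s")
```

Polling the same endpoint again later and comparing the two readings shows whether the task is still making progress.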

elasticsearch response hits is not showing up

I am utilizing elasticsearch and after running a search, this is the response I get
{'took': 7, 'timed_out': False, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}, 'hits': {'total': 1, 'max_score': 0.2876821, 'hits': []}}
My question is why is hits.total = 1 but hits.hits is empty?
Here is the query I used:
"query": {
"bool": {
"must": [
{"match_phrase": { "theName": "bill" } }
]
}
}
I know data exists in my node because when I did the search below (with the same url + index + type in the post request), I got hits.hits to be filled with the result.
"query" : {
"match_all" : {}
}

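The request had from = 40, and that was causing the issue: with only one matching document, an offset of 40 skips past it, so hits.hits is empty even though hits.total is 1.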
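The effect can be illustrated without a cluster. The following Python sketch only mimics how from and size select a window over the matching documents; it is not the Elasticsearch implementation, and the function name page and the sample document are hypothetical.

```python
def page(matching_docs, from_=0, size=10):
    """Mimic from/size: report the full match count, return only the requested window."""
    return {
        "total": len(matching_docs),
        "hits": matching_docs[from_:from_ + size],
    }


# One matching document, as in the question (content invented).
matches = [{"theName": "bill"}]

print(page(matches, from_=40))  # total is 1, but the window starts past the only hit
print(page(matches))            # default offset 0 returns the document
```

The total always counts every match, while hits holds only the requested page, which is why the two can disagree.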

How to get latest values for each group with an Elasticsearch query?

I have some documents indexed on Elasticsearch, looking like these samples:
{'country': 'France', 'collected': '2015-03-12', 'value': 20}
{'country': 'Canada', 'collected': '2015-03-12', 'value': 21}
{'country': 'Brazil', 'collected': '2015-03-12', 'value': 33}
{'country': 'France', 'collected': '2015-02-01', 'value': 10}
{'country': 'Canada', 'collected': '2015-02-01', 'value': 11}
{'country': 'Mexico', 'collected': '2015-02-01', 'value': 9}
...
I want to build a query that gets one result per country, getting only the ones with max(collected).
So, for the examples shown above, the results would be something like:
{'country': 'France', 'collected': '2015-03-12', 'value': 20}
{'country': 'Canada', 'collected': '2015-03-12', 'value': 21}
{'country': 'Brazil', 'collected': '2015-03-12', 'value': 33}
{'country': 'Mexico', 'collected': '2015-02-01', 'value': 9}
I realized I need to do aggregation on country, but I'm failing to understand how to limit the results on max(collected).
Any ideas?

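You can use a terms aggregation on the country field with a top_hits sub-aggregation that returns 1 doc per group, ordered by the collected date descending. (search_type=count applies to older versions; it was removed in Elasticsearch 5.0, where "size": 0 in the request body has the same effect.)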
POST /test/_search?search_type=count
{
"aggs": {
"group": {
"terms": {
"field": "country"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{
"collected": {
"order": "desc"
}
}
]
}
}
}
}
}
}
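As a plain-Python reference for what this aggregation computes, the sketch below groups the sample documents from the question by country and keeps the one with the latest collected date. It is only an illustration of the logic, not how Elasticsearch executes the query, and the function name latest_per_country is hypothetical.

```python
docs = [
    {"country": "France", "collected": "2015-03-12", "value": 20},
    {"country": "Canada", "collected": "2015-03-12", "value": 21},
    {"country": "Brazil", "collected": "2015-03-12", "value": 33},
    {"country": "France", "collected": "2015-02-01", "value": 10},
    {"country": "Canada", "collected": "2015-02-01", "value": 11},
    {"country": "Mexico", "collected": "2015-02-01", "value": 9},
]


def latest_per_country(docs):
    """Keep, for each country, the document with the greatest collected date."""
    latest = {}
    for doc in docs:
        current = latest.get(doc["country"])
        # ISO dates (YYYY-MM-DD) sort correctly when compared as strings.
        if current is None or doc["collected"] > current["collected"]:
            latest[doc["country"]] = doc
    return list(latest.values())


for doc in latest_per_country(docs):
    print(doc)
```

Note that the terms aggregation returns only the top 10 groups unless a larger size is set, so an index with more countries needs "size" on the terms block.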

For those like user1892775 who run into "Fielddata is disabled on text fields by default...", you can create a multi-field (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) with a keyword sub-field. So you might have a mapping like:
"mappings": {
"properties": {
"country": {"type": "text", "fields": {"raw": {"type": "keyword"}}}
}
}
Then your query would look like this (with "size": 0 in place of search_type=count, which was removed in Elasticsearch 5.0):
POST /test/_search
{
"size": 0,
"aggs": {
"group": {
"terms": {
"field": "country.raw"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{
"collected": {
"order": "desc"
}
}
]
}
}
}
}
}
}
(Note the use of country.raw)

The answer marked correct worked great for me. Here is how I added some extra filters. This is version 7.4 on AWS.
The field I'm grouping by is a keyword field named tags.
For each group (tag), get top 3 documents sorted by date_uploaded descending.
Also show the total amount of documents within each group (tag).
Only consider non-deleted documents belonging to user 22.
Only return 10 groups (tags), sorted alphabetically.
For each document, return its ID (book_id) and date_uploaded. (Default is that all info is returned.)
Size:0 keeps the query from returning lots of info about all the documents.
{'query': {'bool': {'filter': [{'terms': {'user_id': [22]}}, {'terms': {'deleted': ['false']}}]}},
'size': 0,
"aggs": {
"group": {
"terms": {
"field": "tags.keyword",
"size":10,
"order":{ "_key": "asc" }
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 3,
"_source":["book_id","date_uploaded"],
"sort": [ {"date_uploaded": { "order": "desc" }}]
}
}
}
}
}
}
Here is how to get each group (tag in my case) and the matching documents for each group.
query_results = ...  # result of the query
buckets = query_results["aggregations"]["group"]["buckets"]
for bucket in buckets:
    tag = bucket["key"]
    tag_doc_count = bucket["doc_count"]
    print(tag, tag_doc_count)
    tag_hits = bucket["group_docs"]["hits"]["hits"]
    for hit in tag_hits:
        source = hit["_source"]
        print(source["book_id"], source["date_uploaded"])
FYI, the "group" term can be named anything. Just make sure to use the same name when getting buckets from your query results.
