Two-level nested aggregation in Elasticsearch based on a condition over the first-level aggregation

My ES document structure is like this:
{
"_index": "my_index",
"_type": "_doc",
"_id": "1296",
"_version": 1,
"_seq_no": 431,
"_primary_term": 1,
"_routing": "1296",
"found": true,
"_source": {
"id": 1296,
"test_name": "abc"
"test_id": 513
"inventory_arr"[
{
"city": "bangalore",
"after_tat": 168,
"before_tat": 54,
"popularity_score": 15,
"rank": 0,
"discounted_price": 710,
"labs": [
{
"lab_id": 395,
"lab_name": "Prednalytics Laboratory",
"lab_rating": 34,
},
{
"lab_id": 363,
"lab_name": "Neuberg Diagnostics",
"lab_rating": 408,
}
]
},
{
"city": "mumbai",
"after_tat": 168,
"before_tat": 54,
"popularity_score": 15,
"rank": 0,
"discounted_price": 710,
"labs": [
{
"lab_id": 395,
"lab_name": "Prednalytics Laboratory",
"lab_rating": 34,
},
{
"lab_id": 380,
"lab_name": "Neuberg Diagnostics",
"lab_rating": 408,
}
]
}
]
}
}
I want to know how many tests are performed in each lab that is in Bangalore.
The problem I'm facing is that if I group by lab_id using a nested aggregation, it groups by every lab regardless of which city it is in.
Suppose there is only one record in my index; then for the city Bangalore I'm expecting an answer like this:
[
{ "key": 395, "doc_count": 1 },
{ "key": 363, "doc_count": 1 }
]
Note: the same lab_id can appear in more than one city.

This problem can be solved using a filter aggregation.
When you use a nested aggregation, you iterate over the nested documents. The filter aggregation filters out the nested documents that don't match the query you provide inside it. In your case, you want to filter out the nested documents whose city is not Bangalore. Once those nested documents are removed, you can run a terms bucket aggregation on lab_id.
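As a sketch (assuming the index name my_index from the question, that inventory_arr and inventory_arr.labs are mapped as nested, and that city is indexed as a keyword), the request could look like this:

```json
POST my_index/_search
{
  "size": 0,
  "aggs": {
    "inventories": {
      "nested": { "path": "inventory_arr" },
      "aggs": {
        "bangalore_only": {
          "filter": { "term": { "inventory_arr.city": "bangalore" } },
          "aggs": {
            "labs": {
              "nested": { "path": "inventory_arr.labs" },
              "aggs": {
                "by_lab": {
                  "terms": { "field": "inventory_arr.labs.lab_id" }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

Each bucket under by_lab then carries a lab_id as its key and a doc_count, matching the expected output shape.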
Good luck!

Related

How to use JSONata to change JSON response

I am working with an API that returns the following JSON:
{
"rows": [
{
"keys": [
"search term 1",
"https://example.com/article-about-keyword-1/"
],
"clicks": 24,
"impressions": 54,
"ctr": 0.4444444444444444,
"position": 2.037037037037037
},
{
"keys": [
"search term 2",
"https://example.com/article-about-keyword-2/"
],
"clicks": 17,
"impressions": 107,
"ctr": 0.1588785046728972,
"position": 2.663551401869159
}
],
"responseAggregationType": "byPage"
}
And I'm trying to use JSONata to change it to something more like this:
{
"rows": [
{
"keyword": search term 1,
"URL": https://example.com/article-about-keyword-1/,
"clicks": 24,
"impressions": 54,
"ctr": 0.4444444444444444,
"position": 2.037037037037037
},
{
"keyword": search term 2,
"URL": https://example.com/article-about-keyword-2/,
"clicks": 17,
"impressions": 107,
"ctr": 0.1588785046728972,
"position": 2.663551401869159
}
],
"responseAggregationType": "byPage"
}
Basically, I'm trying to break the 'keys' part out into 'keyword' and 'URL'.
Have been playing around for a while in https://try.jsonata.org/ but I'm not getting very far. Any help appreciated.
Splitting the keys array should be achievable by accessing each element by its index (given that the keyword and the URL are guaranteed to appear at the same positions).
Here’s the full JSONata expression to translate from your source file to the desired target shape:
{
"rows": rows.{
"keyword": keys[0],
"URL": keys[1],
"clicks": clicks,
"impressions": impressions,
"ctr": ctr,
"position": position
}[],
"responseAggregationType": responseAggregationType
}
By the way, I built this solution in two minutes using the Mappings tool that my team is building at Stedi.

Want to get distinct records in hits section from elasticsearch

I want to get all the distinct records per "departmentNo".
Please check the index data below (it is dummy data):
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 1, "employeeName": "vijay", ...}
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 2, "employeeName": "rathod", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 3, "employeeName": "ajay", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 4, "employeeName": "kamal", ...}
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 5, "employeeName": "rahul", ...}
I want the below output.
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 1, "employeeName": "vijay", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 3, "employeeName": "ajay", ...}
I was trying to get the data in the hits section, but couldn't find a way.
So I tried an aggregation, using the query below:
{
"size": 0,
"aggs": {
"Group_By_Dept": {
"terms": {
"field": "departmentNo"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1
}
}
}
}
}
}
The above query gives me the data, but I want all the distinct records, with support for pagination and sorting.
In Elasticsearch 6.0 I could use bucket_sort, but I am on 5.6.7, so that's not an option.
Can I do it any other way?
Ideally I would get the data in the hits section.
(I don't want to change my index mapping. The mapping here is dummy, but the use case is the same.)
You can do that by using field collapsing:
{
"query": { ... },
"from": 153,
"size": 27,
"collapse": {
"field": "departmentNo"
}
}
This will leave only one document for each distinct value of that field. You can control which document is kept using a standard sort (i.e. the document with the highest sort value among the collapsed ones is returned).
Please note that there is additional functionality called inner hits, which you may want to use in the future; be aware that it multiplies document fetches and can negatively affect performance.
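Since collapsed results arrive in the ordinary hits section, pagination and sorting work as usual. A sketch, using the field names from the question (your_index is a placeholder; the sort field choice is an assumption):

```json
GET your_index/_search
{
  "query": { "match_all": {} },
  "collapse": { "field": "departmentNo" },
  "sort": [ { "departmentNo": "asc" } ],
  "from": 0,
  "size": 10
}
```

The sort both orders the page of results and decides which document represents each collapsed group.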

Elasticsearch record upsert with a complex _id field

I have to bulk-upsert records into an Elasticsearch index, with _id being a combination of more than one field from the message. Can I do so? If so, please give me a sample JSON.
A sample _id field I am looking for would be something like below:
{
"_index": "kpi_aggr",
"_type": "KPIBackChannel",
"_id": "<<<combination of name , period_type>>>",
"_score": 1,
"_source": {
"name": "kpi-v1",
"period_type": "w",
"country": "AL",
"pg_name": "DENTAL CARE",
"panel_type": "retail",
"number_of_records_with_proposal": 10000,
"number_of_proposals": 80000,
"overall_number_of_records": 2000,
"#timestamp": 1442162810
}
}
Naturally, you can specify your own Elasticsearch document ids during a call to the Index API:
PUT kpi_aggr/KPIBackChannel/kpi-v1,w
{
"name": "kpi-v1",
"period_type": "w",
"country": "AL",
"pg_name": "DENTAL CARE",
"panel_type": "retail",
"number_of_records_with_proposal": 10000,
"number_of_proposals": 80000,
"overall_number_of_records": 2000,
"#timestamp": 1442162810
}
You can also do so during a _bulk API call:
POST _bulk
{ "index" : { "_index" : "kpi_aggr", "_type" : "KPIBackChannel", "_id" : "kpi-v1,w" } }
{"name":"kpi-v1","period_type":"w","country":"AL","pg_name":"DENTAL CARE","panel_type":"retail","number_of_records_with_proposal":10000,"number_of_proposals":80000,"overall_number_of_records":2000,"#timestamp":1442162810}
Notice that Elasticsearch will replace the document with the new version.
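If you need true upsert semantics (merging fields into an existing document rather than replacing it wholesale), the bulk API also accepts an update action with doc_as_upsert. A sketch, reusing the ids above (the partial doc shown is illustrative):

```json
POST _bulk
{ "update": { "_index": "kpi_aggr", "_type": "KPIBackChannel", "_id": "kpi-v1,w" } }
{ "doc": { "number_of_proposals": 90000 }, "doc_as_upsert": true }
```

With "doc_as_upsert": true, the partial doc is indexed as a new document when the id does not exist yet, and merged into the existing document otherwise.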
If you execute these two queries on an empty index, then querying by document id:
GET kpi_aggr/KPIBackChannel/kpi-v1,w
will give you the following:
{
"_index": "kpi_aggr",
"_type": "KPIBackChannel",
"_id": "kpi-v1,w",
"_version": 2,
"found": true,
"_source": {
"name": "kpi-v1",
"period_type": "w",
"country": "AL",
"pg_name": "DENTAL CARE",
"panel_type": "retail",
"number_of_records_with_proposal": 10000,
"number_of_proposals": 80000,
"overall_number_of_records": 2000,
"#timestamp": 1442162810
}
}
Notice "_version": 2, which in our case indicates that a document has been indexed twice, hence performed an "upsert" (but in general is meant to be used for Optimistic Concurrency Control).
Hope that helps!

Elasticsearch Nest Getting and updating a single Document

I would like to be able to select a single document here is a sample of how a document looks
{
"_index": "myindex_products",
"_type": "product",
"_id": "8Wct9mEBlkDZwzEMRfbG",
"_version": 1,
"_score": 1,
"_source": {
"productId": 5749,
"name": "Product Name Here",
"productCode": "PRODCODE",
"productCategoryId": 73,
"length": 6,
"height": 0,
"productTypeId": 1,
"url": "product-name-here",
"productBrandId": 7,
"width": 0,
"dispatchTimeInDays": 10,
"leadTimeInDays": 6,
"stockAvailable": 0,
"weightKg": 0.001,
"reviewRating": 5,
"reviewRatingCount": 17,
"limitedStock": false,
"price": 16.3,
"productImage": "28796-14654.jpg",
"productCategory": {
"productCategoryId": 73,
"name": "Accessories - New",
"fullPath": "Accessories - New",
"code": "00057"
},
"productSpecification": [
{
"productSpecificationId": 127151,
"productId": 5749,
"specificationId": 232,
"name": "Brand",
"value": "Brand1"
}
,
{
"productSpecificationId": 127175,
"productId": 5749,
"specificationId": 10,
"name": "Guarantee",
"value": "10 years"
}
]
}
}
_id is generated when I index, so I don't know it at the point where I want to update. I have the productId value and would like to use it to select a document to then update or delete. Is there a way to return a single document if you know an exact value of one of its fields?
Thanks
While indexing, you can use something like PUT myindex_products/product/5749 (5749 being your product id) and ES will use that value for the _id field instead of auto-generating one.
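If you prefer to keep the auto-generated ids, you can instead look the document up by its productId with a term query (a sketch, using the index and type names from the question and assuming productId is mapped as a numeric field):

```json
GET myindex_products/product/_search
{
  "size": 1,
  "query": {
    "term": { "productId": 5749 }
  }
}
```

The _id of the returned hit can then be fed into an update or delete call; alternatively, the Update By Query and Delete By Query APIs can act on the same term query directly.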

Aggregations on PyElasticSearch (pyes)

I wish to calculate value_count aggregations on some indexed product data, but I seem to be getting some parameters of the ValueCountAgg constructor wrong.
An example of such indexed data follows:
{
"_index": "test-index",
"_type": "product_product",
"_id": "1",
"_score": 1,
"_source": {
"code": "SomeProductCode1",
"list_price": 10,
"description": null,
"displayed_on_eshop": "true",
"active": "true",
"tree_nodes": [],
"id": 1,
"category": {},
"name": "This is Product",
"price_lists": [
{
"price": 10,
"id": 1
},
{
"price": 10,
"id": 2
}
],
"attributes": {
"color": "blue",
"attrib": "something",
"size": "L"
},
"type": "goods"
}
}
I'm calculating aggregations as follows:
for attribute in filterable_attributes:
    count = ValueCountAgg(
        name='count_' + attribute, field='attributes.' + attribute
    )
    query.agg.add(count)
where query is a pyes.query.Query object wrapped inside a pyes.query.Search object, and filterable_attributes is a list of attribute names, such as color and size.
I have tried setting field=attribute as well, but it seems to make no difference. The result set I obtain from the search has the following as its aggs attribute:
{'count_size': {'value': 0}, 'count_color': {'value': 0}}
where size and color are indexed inside the attributes dictionary as shown above. These results are evidently wrong, and I think it is because I am not setting field properly.
Where am I going wrong?
I've found where I was going wrong.
According to Scoping Aggregations, the scope of an aggregation is by default associated with its query. My query was returning zero results, so I had to modify the search phrase.
After that, the aggregations came out right:
{'count_size': {'value': 3}, 'count_color': {'value': 3}}
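For reference, the equivalent raw request (independent of pyes) is a value_count aggregation scoped to a query that actually matches documents. A sketch with match_all, using the index name from the question:

```json
GET test-index/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "count_color": { "value_count": { "field": "attributes.color" } },
    "count_size": { "value_count": { "field": "attributes.size" } }
  }
}
```

Because the aggregation runs over the documents the query matches, a query with zero hits always yields counts of 0, which is exactly the symptom described above.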