Get bucket key within scripted_metric - elasticsearch

Is there any way I can grab a bucket's key from within a scripted_metric?
I have an issue where I need to grab some specific data from within a document that is being aggregated.
For example, this is an example of the document I am working on:
{
"attr1": "thing",
"groups": [
{
"id": 1,
"name": "foo"
},
{
"id": 2,
"name": "bar"
},
{
"id": 3,
"name": "baz"
}
],
"otherAttrs": true
}
Figure 1 (Document structure)
I am doing a terms aggregation on the distinct group IDs, but within each bucket, I'd like to put the name of the group that is represented by the bucket_key (which would be the id).
This is an example of the terms aggregation I am using:
{
"terms": {
"execution_hint": "global_ordinals_hash",
"field": "actors.groups.id",
"min_doc_count": 1
}
}
Figure 2 (Terms Aggregation to create buckets where I am trying to set name as a field)
So ideally my response would look something like this:
{
"...": "...",
"buckets" : [
{
"key" : 1,
"group_name": "foo",
"doc_count" : 42684,
"measure 0" : {
"value" : 37180
},
"measure 3" : {
"doc_count" : 37180,
"measure 3" : { "value" : 68 }
},
"measure 4" : {
"doc_count" : 3008,
"measure 4" : {
"value" : 3008
}
}
}
]
}
Figure 3 (Ideal Response format)
Notice how the key corresponds to the id of the group whose name is found in Figure 1.
So I am currently receiving a response similar to Figure 3 (without group_name) and I cannot for the life of me figure out how to extract the name field because it's within a document being aggregated.
Due to the nature of the documents I'm working with, this has to happen within a bucket aggregation but this one attribute is not an aggregation, it's just a single metric that I need to pluck off of one document.
So my attempt to solve this issue was to use a scripted_metric:
{
"...":"...",
"group_name": {
"scripted_metric": {
"map_script": {
"lang": "painless",
"source": """
for (HashMap group : params._source.actor.groups) {
String groupId = < bucket_key_here >;
if (groupId != null && !groupId.isEmpty()) {
params._aggs.name = params._source.actor.groups[groupId].name;
}
}
"""
},
"reduce_script": {
"lang": "painless",
"source": "return params._aggs.length > 0 ? params._aggs[0].name : null;"
}
}
},
"...":"..."
}
Figure 4 (Current attempt to use a scripted_metric to tease out the group name)
I cannot figure out how to access the bucket's key value which means even if I use _source to access the JSON structure of the document being aggregated, I cannot see the bucket in order to determine which group is the correct name.
Notice in Figure 1 that it's possible for one document to contain multiple groups. So I need to be able to reference the key in order to match the name from the corresponding id.
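To pin down the intended logic, the lookup can be sketched outside Elasticsearch in plain Python. This is only an illustration: the helper name `group_name_for_key` is hypothetical, and it assumes the bucket key is already known, which is exactly the piece that is missing inside the scripted_metric.

```python
# Document from Figure 1, as a plain Python dict.
doc = {
    "attr1": "thing",
    "groups": [
        {"id": 1, "name": "foo"},
        {"id": 2, "name": "bar"},
        {"id": 3, "name": "baz"},
    ],
    "otherAttrs": True,
}


def group_name_for_key(document, bucket_key):
    """Return the name of the group whose id equals bucket_key, or None."""
    for group in document.get("groups", []):
        if group["id"] == bucket_key:
            return group["name"]
    return None


print(group_name_for_key(doc, 1))  # name of the group with id 1
print(group_name_for_key(doc, 4))  # no such group in this document
```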
Please let me know if I can clarify or expound on anything to make this issue more clear.
Regards

Related

Custom ordering on elastic search

I'm executing a simple query which returns items matched by companyId.
In addition to only showing clients matching a specific company, I also want records matching a certain location to appear at the top. So if I somehow pass through a pseudo sort such as "location=Johannesburg", it would return the data below, with items that match the specific location on top, followed by items with other locations.
Data:
{
"clientId" : 1,
"clientName" : "Name1",
"companyId" : 8,
"location" : "Cape Town"
},
{
"clientId" : 2,
"clientName" : "Name2",
"companyId" : 8,
"location" : "Johannesburg"
}
Query:
{
"query": {
"match": {
"companyId": "8"
}
},
"size": 10,
"_source": {
"includes": [
"firstName",
"companyId",
"location"
]
}
}
Is something like this possible in Elasticsearch, and if so, what is the name of this concept? (I'm not sure what to even Google for to solve this problem.)
It can be done in different ways.
The simplest (if you only need text matching) is to use a bool query with a should clause.
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document (see the bool query documentation).
Example:
{
"query": {
"bool": {
"must": [
{
"match": {
"companyId": "8"
}
}
],
"should": [
{
"match": {
"location": "Johannesburg"
}
}
]
}
}
}
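As a rough sketch of why this pushes the preferred location to the top, the same request body can be built in Python and the additive scoring imitated on the two sample records. The function names are hypothetical, and `toy_score` simply awards one point per matching clause; it is not how Elasticsearch actually computes _score.

```python
import json


def company_query_with_location_boost(company_id, preferred_location):
    """Build a bool query: companyId is required, location only adds score."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"companyId": company_id}}],
                "should": [{"match": {"location": preferred_location}}],
            }
        }
    }


def toy_score(client, company_id, preferred_location):
    """Toy stand-in for _score: one point per matching clause (not real scoring)."""
    if str(client["companyId"]) != str(company_id):
        return None  # the must clause failed, so the document is excluded
    score = 1.0
    if client["location"] == preferred_location:
        score += 1.0  # the should clause matched and adds to the score
    return score


clients = [
    {"clientId": 1, "clientName": "Name1", "companyId": 8, "location": "Cape Town"},
    {"clientId": 2, "clientName": "Name2", "companyId": 8, "location": "Johannesburg"},
]

body = company_query_with_location_boost("8", "Johannesburg")
print(json.dumps(body, indent=2))

# Keep only documents that satisfy the must clause, then sort by score.
matching = [c for c in clients if toy_score(c, "8", "Johannesburg") is not None]
ranked = sorted(matching, key=lambda c: toy_score(c, "8", "Johannesburg"), reverse=True)
print([c["clientId"] for c in ranked])
```

Both clients satisfy the must clause, but only the Johannesburg record also matches the should clause, so it ends up first.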
A more complex solution is to store geo points in location and use, for example, a distance feature query.

Elastic Search Query for Multi-valued Data

ES Data is indexed like this :
{
"addresses" : [
{
"id" : 69,
"location": "New Delhi"
},
{
"id" : 69,
"location": "Mumbai"
}
],
"goods" : [
{
"id" : 396,
"name" : "abc",
"price" : 12500
},
{
"id" : 167,
"name" : "XYz",
"price" : 12000
},
{
"id" : 168,
"name" : "XYz1",
"price" : 11000
},
{
"id" : 169,
"name" : "XYz2",
"price" : 13000
}
]
}
In my query I want to fetch records that have at least one matching address and a good whose price is between 11000 and 13000 and whose name is xyz.
When your data contains arrays of complex objects like a list of addresses or a list of goods, you probably want to have a look at elasticsearch's nested objects to avoid running into problems when your queries result in more items than you would expect.
The issue here is the way Elasticsearch (and in effect Lucene) stores the data. As there is no concept of lists of nested objects directly, the data is flattened and the connection between e.g. XYz and 12000 is lost. So you would also get this document as a result when you query for XYz and 12500, as the price of 12500 is also there in the list of values for goods.price. To avoid this, you can use the nested objects feature of Elasticsearch, which basically extracts all inner objects into a hidden index and allows querying for several fields that occur in one specific object instead of "in any of the objects". For more details, have a look at the docs on nested objects, which also explain this pretty well.
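The loss of association can be illustrated with a small Python sketch over the goods list from the sample document. It only mimics the flattening idea on plain dicts; the helper names are hypothetical and nothing here talks to Elasticsearch.

```python
goods = [
    {"id": 396, "name": "abc", "price": 12500},
    {"id": 167, "name": "XYz", "price": 12000},
    {"id": 168, "name": "XYz1", "price": 11000},
    {"id": 169, "name": "XYz2", "price": 13000},
]


def flatten(objects):
    """Collapse a list of objects into one multi-valued field per key."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault("goods." + key, []).append(value)
    return flat


def matches_flattened(flat, name, price):
    """Each condition is checked against the whole value list independently."""
    return name in flat["goods.name"] and price in flat["goods.price"]


def matches_nested(objects, name, price):
    """Both conditions must hold inside the same inner object."""
    return any(o["name"] == name and o["price"] == price for o in objects)


flat = flatten(goods)
print(matches_flattened(flat, "XYz", 12500))  # True: name and price come from different goods
print(matches_nested(goods, "XYz", 12500))    # False: no single good has both values
```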
In your case a mapping could look like the following. I assume you only want to query for the addresses.location text without providing the id, so that this list can remain the simple object type instead of also being a nested type. Also, I assume you query for exact matches. If this is not the case, you need to switch from keyword to text and change the term query to a match query.
PUT nesting-sample
{
"mappings": {
"item": {
"properties": {
"addresses": {
"properties": {
"id": {"type": "integer"},
"location": {"type": "keyword"}
}
},
"goods": {
"type": "nested",
"properties": {
"id": {"type": "integer"},
"name": {"type": "keyword"},
"price": {"type": "integer"}
}
}
}
}
}
}
You can then use a bool query on the location and a nested query to match the inner documents of your goods list.
GET nesting-sample/item/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"addresses.location": "New Delhi"
}
},
{
"nested": {
"path": "goods",
"query": {
"bool": {
"must": [
{
"range": {
"goods.price": {
"gte": 12200,
"lt": 12999
}
}
},
{
"term": {
"goods.name": {
"value": "XYz"
}
}
}
]
}
}
}
}
]
}
}
}
This query will not match the document because the price range is not in the same nested object as the exact name of the good. If you change the lower bound to 12000 it will match.
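The claim about the lower bound can be checked by hand with a short Python sketch that evaluates the same conditions against the sample document. The function `query_matches` is a hypothetical stand-in for the bool + nested query, assuming exact (keyword-style) comparisons.

```python
doc = {
    "addresses": [
        {"id": 69, "location": "New Delhi"},
        {"id": 69, "location": "Mumbai"},
    ],
    "goods": [
        {"id": 396, "name": "abc", "price": 12500},
        {"id": 167, "name": "XYz", "price": 12000},
        {"id": 168, "name": "XYz1", "price": 11000},
        {"id": 169, "name": "XYz2", "price": 13000},
    ],
}


def query_matches(document, location, name, gte, lt):
    """Term on addresses.location AND a nested match on one goods object."""
    location_ok = any(a["location"] == location for a in document["addresses"])
    # Nested semantics: name and price range must hold for the same good.
    goods_ok = any(
        g["name"] == name and gte <= g["price"] < lt for g in document["goods"]
    )
    return location_ok and goods_ok


print(query_matches(doc, "New Delhi", "XYz", 12200, 12999))  # False
print(query_matches(doc, "New Delhi", "XYz", 12000, 12999))  # True
```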
Please check your use case and be aware of the warning at the bottom of the docs regarding mapping explosion when using nested fields.

ElasticSearch return non analyzed version of analyzed aggregate

I am having a problem implementing an autocomplete feature using the data in Elasticsearch. My documents currently have this kind of structure:
PUT mainindex/books/1
{
"title": "The unread book",
"author": "Mario smith",
"tags": [ "Comedy", "Romantic" , "Romantic Comedy","México"]
}
All the fields are indexed, and the mapping for the tags uses a lowercase,asciifolding filter.
Now the required functionality is that if the user types mario smith rom..., I need to suggest tags starting with rom, but only for books by mario smith. This required breaking the text into components, and I already have that part. The current query is something like this:
{
"query": {
"query_string": {
"query": "mario smith",
"default_operator": "AND"
}
},
"size": 0,
"aggs": {
"autocomplete": {
"terms": {
"field": "suggest",
"order": {
"_term": "asc"
},
"include": {
"pattern": "rom.*"
}
}
}
}
}
And this returns the expected result: a list of words that the user could type next, based on the query and the prefix of the word they are starting to type.
{
"aggregations" : {
"autocomplete" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "romantic comedy",
"doc_count" : 4
},
{
"key" : "romantic",
"doc_count" : 2
}
]
}
}
}
Now the problem is that I can't present these words to the user, because they are lowercase and without accents: words like México got indexed as mexico, which in my language makes some words look weird. If I remove the filters from the tags field, the values are saved correctly into the index, but the pattern rom.* will no longer match, because the user may type in a different case and may not use the correct accents.
In general terms, what is needed is to take a filtered set of documents, aggregate their tags, and return them in their natural format, but filter out the ones that don't share the typed prefix, comparing in a case- and accent-insensitive way.
PS: I saw some suggestions about having 2 versions of the field, one analyzed and one raw, but I can't seem to filter by one and return the other.
Does anyone have an idea how to perform this query or implement this functionality?
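To make the desired behavior concrete, here is a minimal Python sketch of "filter on the folded form, return the original form". It is not an Elasticsearch solution; `fold` only approximates what a lowercase plus asciifolding filter does, and both function names are hypothetical.

```python
import unicodedata


def fold(text):
    """Lowercase and strip accents, roughly like lowercase + asciifolding."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.lower()


def suggest_tags(tags, typed_prefix):
    """Compare on the folded form, but return tags in their original form."""
    folded_prefix = fold(typed_prefix)
    return sorted({t for t in tags if fold(t).startswith(folded_prefix)})


tags = ["Comedy", "Romantic", "Romantic Comedy", "México"]
print(suggest_tags(tags, "rom"))
print(suggest_tags(tags, "MEX"))
```

The comparison happens on the folded value while the returned value keeps its case and accents, which is the same split as filtering on an analyzed field and returning a raw one.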

Return distinct values in Elasticsearch

I am trying to solve an issue where I have to get distinct result in the search.
{
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "GEORGE",
"favorite_cars" : [ "honda","Hyundae" ]
}
When I perform a term query on favorite_cars for "ferrari", I get two results whose name is ABC. I want only one result returned in this case, so my requirement is to apply a distinct on the name field and receive 1 result.
Thanks
One way to achieve what you want is to use a terms aggregation on the name field and then a top_hits sub-aggregation with size 1, like this:
{
"size": 0,
"query": {
"term": {
"favorite_cars": "ferrari"
}
},
"aggs": {
"names": {
"terms": {
"field": "name"
},
"aggs": {
"single_result": {
"top_hits": {
"size": 1
}
}
}
}
}
}
That way, you'll get a single term ABC and, nested inside it, a single matching document.
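What the terms plus top_hits combination produces can be imitated in plain Python on the three sample documents. This is only a sketch of the grouping idea with a hypothetical helper name; the real response has a different, more detailed shape.

```python
people = [
    {"name": "ABC", "favorite_cars": ["ferrari", "toyota"]},
    {"name": "ABC", "favorite_cars": ["ferrari", "toyota"]},
    {"name": "GEORGE", "favorite_cars": ["honda", "Hyundae"]},
]


def distinct_by_name(docs, car, hits_per_bucket=1):
    """Group matching docs by name and keep at most hits_per_bucket per group."""
    buckets = {}
    for doc in docs:
        if car not in doc["favorite_cars"]:
            continue  # filtered out by the term query
        bucket = buckets.setdefault(doc["name"], {"doc_count": 0, "hits": []})
        bucket["doc_count"] += 1
        if len(bucket["hits"]) < hits_per_bucket:
            bucket["hits"].append(doc)
    return buckets


result = distinct_by_name(people, "ferrari")
for name, bucket in result.items():
    print(name, bucket["doc_count"], bucket["hits"])
```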

How to get distinct results in ElasticSearch if a field is the same

I have an ElasticSearch service, version 1.4, with an index of 40M records.
I have data that shares the same parent field. I would like to extract only 1 unique result per parent.
Ex:
{
"id": "7835",
"isbn": "3985",
"parent_id": "7819",
},
{
"id": "1835",
"isbn": "4935",
"parent_id": "7719",
},
{
"id": "2835",
"isbn": "9985",
"parent_id": "7819",
}
The expected result that I would like to have is:
{
"id": "7835",
"isbn": "3985",
"parent_id": "7819",
},
{
"id": "1835",
"isbn": "4935",
"parent_id": "7719",
},
I have checked out aggregations:
ElasticSearch - Return Unique Values
{
"aggs" : {
"parentId" : {
"terms" : { "field" : "parent_id" }
}
}
}
However, the response I get shows all 3 items (so the last one doesn't get ignored), and the aggregations part of the response contains term buckets with the key, which is not useful to me: it only seems to tell me how many occurrences there are per key, which is not the desired output.
To keep the original documents out of the response, add "size": 0 above the aggregation.
The buckets field of the response will then show only the number of documents for each parent_id.
{
"size" : 0,
"aggs" : {
"parentId" : {
"terms" : { "field" : "parent_id" }
}
}
}
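A small Python sketch shows what such buckets contain for the three sample records: one entry per distinct parent_id with a count, and no document bodies. The helper `terms_buckets` is hypothetical and only mirrors the general key/doc_count shape of a terms aggregation.

```python
from collections import Counter

records = [
    {"id": "7835", "isbn": "3985", "parent_id": "7819"},
    {"id": "1835", "isbn": "4935", "parent_id": "7719"},
    {"id": "2835", "isbn": "9985", "parent_id": "7819"},
]


def terms_buckets(docs, field):
    """Count documents per distinct value of field, most frequent first."""
    counts = Counter(doc[field] for doc in docs)
    return [
        {"key": key, "doc_count": count} for key, count in counts.most_common()
    ]


buckets = terms_buckets(records, "parent_id")
print(buckets)
```

So the buckets give the distinct parent_id values and their counts; getting one representative document per parent would need something extra on top of them.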
