Raw nested aggregation - elasticsearch

I would like to create a raw nested aggregation in Elasticsearch, but I'm unable to get it working.
My documents look like this:
{
"_index": "items",
"_type": "frame_spec",
"_id": "19770602001",
"_score": 1,
"_source": {
"item_type_name": "frame_spec",
"status": "published",
"creation_date": "2016-02-18T11:19:15Z",
"last_change_date": "2016-02-18T11:19:15Z",
"publishing_date": "2016-02-18T11:19:15Z",
"attributes": [
{
"brand": "Sun"
},
{
"model": "Sunglasses1"
},
{
"eyesize": "56"
},
{
"opc": "19770602001"
},
{
"madein": "UNITED KINGDOM"
}
]
}
}
What I want to do is aggregate on one of the attributes. I can't do a normal aggregation on "attributes.model" (for example) because some of the values contain spaces, and the analyzed field splits them into separate terms. So I've tried using the "raw" property, but it appears that ES treats it as a normal property and returns no results.
This is what I've tried:
{
"size": 0,
"aggs": {
"brand": {
"terms": {
"field": "attributes.brand.raw"
}
}
}
}
But I get no results.
Is there a solution I could use for this problem?

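You should use a dynamic_template in your mapping that catches all attributes.* string fields and creates a raw sub-field for each of them. For types other than string, you don't really need raw fields. Your current query returns nothing because attributes.brand.raw does not exist in your mapping at all. You need to delete your current index and then recreate it with this (note that "string" with "index": "not_analyzed" is the pre-5.0 syntax; from 5.0 on, the raw sub-field would be "type": "keyword"):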
DELETE items
PUT items
{
"mappings": {
"frame_spec": {
"dynamic_templates": [
{
"strings": {
"match_mapping_type": "string",
"path_match": "attributes.*",
"mapping": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
]
}
}
}
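For illustration only, the Python sketch below approximates what this template does when a document is indexed: it flattens the source into dotted paths, applies the path_match pattern with fnmatch (an assumed stand-in for Elasticsearch's own wildcard matching), and lists the exact values each raw sub-field would hold. Values longer than ignore_above are left out, not truncated. The helper names flatten and raw_subfields are hypothetical.

```python
from fnmatch import fnmatchcase

PATH_MATCH = "attributes.*"  # same pattern as the dynamic template above
IGNORE_ABOVE = 256           # same limit as the raw sub-field above


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs; elements of an array share the array's path."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for item in obj:
            yield from flatten(item, prefix)
    else:
        yield prefix[:-1], obj


def raw_subfields(source):
    """Map 'path.raw' -> exact string values that would be indexed (approximation)."""
    raw = {}
    for path, value in flatten(source):
        if isinstance(value, str) and fnmatchcase(path, PATH_MATCH):
            kept = raw.setdefault(path + ".raw", [])
            # the sub-field exists, but over-long values are skipped entirely
            if len(value) <= IGNORE_ABOVE:
                kept.append(value)
    return raw


source = {
    "status": "published",
    "attributes": [{"brand": "Sun"}, {"madein": "UNITED KINGDOM"}],
}
print(raw_subfields(source))
# {'attributes.brand.raw': ['Sun'], 'attributes.madein.raw': ['UNITED KINGDOM']}
```

Only fields under attributes get a raw sub-field; "status" is left alone because it does not match the pattern.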
After that, you need to re-populate your index and then you'll be able to run this:
POST /items/_search
{
"size": 0,
"aggs": {
"brand": {
"terms": {
"field": "attributes.brand.raw"
}
}
}
}
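To see why the plain field gives odd buckets while the raw one does not, here is a minimal Python sketch of a terms aggregation over both. The analyzer is an assumed simplification (lowercase, split on anything that is not a letter or digit), not the real standard analyzer, and the values other than "UNITED KINGDOM" are made up.

```python
import re
from collections import Counter


def simple_analyze(value):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def terms_buckets(values_per_doc, analyzed):
    """Count documents per term, like a terms aggregation (one count per doc per term)."""
    counts = Counter()
    for value in values_per_doc:
        terms = set(simple_analyze(value)) if analyzed else {value}
        counts.update(terms)
    return dict(sorted(counts.items()))


madein = ["UNITED KINGDOM", "UNITED STATES", "ITALY"]  # one value per document
print(terms_buckets(madein, analyzed=True))
# {'italy': 1, 'kingdom': 1, 'states': 1, 'united': 2}
print(terms_buckets(madein, analyzed=False))
# {'ITALY': 1, 'UNITED KINGDOM': 1, 'UNITED STATES': 1}
```

The analyzed version spreads "UNITED KINGDOM" over two buckets and merges "united" across countries, which is what aggregating on the .raw sub-field avoids.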

Related

Search array of strings by partial match in Elasticsearch

I have a field like this:
names: ["Red:123", "Blue:45", "Green:56"]
Its mapping is:
"names": {
"type": "keyword"
},
How could I search like this:
{
"query": {
"match": {
"names": "red"
}
}
}
to get all the documents where "red" is part of an element of the names array?
Currently it only works with:
{
"query": {
"match": {
"names": "red:123"
}
}
}
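You can either add a multi-field or simply change the type to text to get the result you want. A keyword field indexes each array element as one exact term, so only the full value matches; a text field is analyzed, and the standard analyzer lowercases "Red:123" and splits it at the colon, so "red" becomes a searchable token.
Index mapping using multi-fields: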
{
"mappings": {
"properties": {
"names": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
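A minimal Python sketch of why the two sub-fields behave differently: the keyword sub-field (names.raw here) keeps each array element as one exact term, while the text field is split into lowercase tokens. The tokenizer below is an assumed simplification of the standard analyzer, and matches_keyword / matches_text are hypothetical helpers.

```python
import re


def tokens(value):
    """Simplified analyzer: lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def matches_keyword(field_values, query):
    # keyword: the whole query must equal one stored element exactly
    return query in field_values


def matches_text(field_values, query):
    # text + match query (default "or"): any query token found among the indexed tokens
    indexed = {t for v in field_values for t in tokens(v)}
    return any(t in indexed for t in tokens(query))


names = ["Red:123", "Blue:45", "Green:56"]
print(matches_keyword(names, "red"))      # False
print(matches_keyword(names, "Red:123"))  # True
print(matches_text(names, "red"))         # True
```

With the multi-field mapping you get both behaviors: query names for partial, token-level matches and names.raw for exact values or aggregations.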
Here is a working example with index data, mapping, search query, and search result:
Index Mapping:
{
"mappings":{
"properties":{
"names":{
"type":"text"
}
}
}
}
Index Data:
{
"names": [
"Red:123",
"Blue:45",
"Green:56"
]
}
Search Query:
{
"query": {
"match": {
"names": "red"
}
}
}
Search Result:
"hits": [
{
"_index": "64665127",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"names": [
"Red:123",
"Blue:45",
"Green:56"
]
}
}
]

How can I get distinct results?

I am querying Elasticsearch (version 6.4) and want to get unique search results (the eids). My query is below. What I'd like to do is first run a text search over the two fields eLabel and pLabel, and then get the distinct eid values. But the results are not aggregated: the hits show the same ids redundantly, from 0 to over 20. How can I adjust the query?
{
"query": {
"multi_match": {
"query": "Brazil Capital",
"fields": [
"eLabel",
"pLabel"
]
}
},
"size": 200,
"_source": [
"eid",
"eLabel"
],
"aggs": {
"eids": {
"terms": {
"field": "eid"
}
}
}
}
My current mappings are as follows:
eid: id of the entity
eLabel: entity label (e.g. Brazil)
prop_id: property id of the entity (eid)
pLabel: the label of the property (e.g. is the capital of, is located at, ...)
"mappings": {
"entity": {
"properties": {
"eLabel": {
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"eid": {
"type": "keyword"
} ,
"subclass": {
"type": "boolean"
} ,
"pLabel": {
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"prop_id": {
"type": "keyword"
} ,
"pType": {
"type": "keyword"
} ,
"way": {
"type": "keyword"
} ,
"chain": {
"type": "integer"
} ,
"siteKey": {
"type": "keyword"
},
"version": {
"type": "integer"
},
"docId": {
"type": "integer"
}
}
}
}
Based on your comment, you can use the bool query below. I don't think anything is wrong with the aggregation part; just replace your multi_match query with this bool query and it should suffice.
A multi_match query (with the default best_fields type and "or" operator) matches a document if any of the query terms appears in any of the listed fields, so it would also return a document with eLabel = "Rio is capital of brazil" and pLabel = "something else entirely here". With bool/must, each match clause has to be satisfied by its own field.
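A rough Python model of the two matching rules, assuming eLabel holds the entity label and pLabel the property label as described in the question. The tokenizer is a simplification, and scoring is ignored; only whether a document matches is modeled. The helper names are hypothetical.

```python
import re


def tokens(text):
    """Simplified analyzer: lowercase alphanumeric runs."""
    return set(re.findall(r"[0-9a-z]+", text.lower()))


def multi_match_any(doc, query, fields):
    """Default multi_match (best_fields, operator or): any query term in any field."""
    q = tokens(query)
    return any(q & tokens(doc[f]) for f in fields)


def bool_must(doc, clauses):
    """bool/must of match queries: every (field, text) clause must match its own field."""
    return all(tokens(text) & tokens(doc[field]) for field, text in clauses)


doc = {"eLabel": "Rio is capital of brazil", "pLabel": "something else entirely here"}
print(multi_match_any(doc, "Brazil Capital", ["eLabel", "pLabel"]))  # True
print(bool_must(doc, [("eLabel", "brazil"), ("pLabel", "capital")]))  # False
```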
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"eLabel": "brazil"
}
},
{
"match": {
"pLabel": "capital"
}
}
]
}
},
"size": 200,
"_source": [
"eid",
"eLabel"
],
"aggs": {
"eids": {
"terms": {
"field": "eid"
}
}
}
}
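Note that if you only want the eid values and not the documents themselves, you can set "size": 0 in the query above, so that only the aggregation results are returned. Also keep in mind that the terms aggregation returns only the top 10 buckets by default; set "size" inside the terms aggregation if you expect more distinct eids.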
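Since each document is one entity/property pair, the same eid naturally shows up in several hits; the distinct values only appear in the aggregations part of the response, and the aggregation covers every matching document, not just the returned hits. A small Python sketch of what the terms aggregation computes, with made-up eids; ordering by count descending, then by key, is an assumption about tie-breaking.

```python
from collections import Counter


def terms_agg(hits, field, size=10):
    """Mimic a terms aggregation: one bucket per distinct value, most frequent first."""
    counts = Counter(hit[field] for hit in hits)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]


# Hypothetical matching documents: several property rows per entity.
hits = [{"eid": e} for e in ["Q155", "Q155", "Q42", "Q155", "Q42", "Q7"]]
print(len(hits))                                   # 6
print([b["key"] for b in terms_agg(hits, "eid")])  # ['Q155', 'Q42', 'Q7']
```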
Let me know if this helps!!

Terms aggregation with nested wildcard path

Given the following nested object of nested objects:
{
[...]
"nested_parent":{
"nested_child_1":{
"classifier":"one"
},
"nested_child_2":{
"classifier":"two"
},
"nested_child_3":{
"classifier":"two"
},
"nested_child_4":{
"classifier":"five"
},
"nested_child_5":{
"classifier":"six"
}
[...]
}
I want to aggregate on the wildcard-ish field nested_parent.*.classifier, along the lines of:
{
"size": 0,
"aggs": {
"termsAgg": {
"nested": {
"path": "nested_parent.*"
},
"aggs": {
"termsAgg": {
"terms": {
"size": 1000,
"field": "nested_parent.*.classifier"
}
}
}
}
}
}
This does not seem to work, possibly because the path and field are not defined clearly enough.
How can I aggregate on nested objects with dynamically created nested mappings which share most of their properties, including the classifier on which I intend to terms-aggregate?
TL;DR
A bit late to the party.
I would suggest a different approach, as I don't see a possible solution using wildcards.
My solution uses copy_to to create a field that you can aggregate on.
Solution
The idea is to create a single field that stores the values of all your classifiers, which you can then aggregate on.
PUT /54198251/
{
"mappings": {
"properties": {
"classifiers": {
"type": "keyword"
},
"parent": {
"type": "nested",
"properties": {
"child": {
"type": "nested",
"properties": {
"classifier": {
"type": "keyword",
"copy_to": "classifiers"
}
}
},
"child2": {
"type": "nested",
"properties": {
"classifier": {
"type": "keyword",
"copy_to": "classifiers"
}
}
}
}
}
}
}
}
POST /54198251/_doc
{
"parent": {
"child": {
"classifier": "c1"
},
"child2": {
"classifier": "c2"
}
}
}
GET /54198251/_search
{
"aggs": {
"classifiers": {
"terms": {
"field": "classifiers",
"size": 10
}
}
}
}
Will give you:
"aggregations": {
"classifiers": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "c1",
"doc_count": 1
},
{
"key": "c2",
"doc_count": 1
}
]
}
}
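A small Python sketch of the idea, using the structure from the question: copy_to effectively gathers every child's classifier into one list on the document, and a terms aggregation then counts documents per value. So "two" appearing in two children of the same document still gives a doc_count of 1. The helpers copy_classifiers and classifier_buckets are hypothetical.

```python
from collections import Counter


def copy_classifiers(source):
    """Collect every nested_parent.<child>.classifier value, as copy_to would."""
    parent = source.get("nested_parent", {})
    return [child["classifier"] for child in parent.values() if "classifier" in child]


def classifier_buckets(docs):
    """terms agg on the copied field: doc_count counts documents, not occurrences."""
    counts = Counter()
    for doc in docs:
        counts.update(set(copy_classifiers(doc)))
    return dict(sorted(counts.items()))


doc = {"nested_parent": {
    "nested_child_1": {"classifier": "one"},
    "nested_child_2": {"classifier": "two"},
    "nested_child_3": {"classifier": "two"},
    "nested_child_4": {"classifier": "five"},
    "nested_child_5": {"classifier": "six"},
}}
print(copy_classifiers(doc))      # ['one', 'two', 'two', 'five', 'six']
print(classifier_buckets([doc]))  # {'five': 1, 'one': 1, 'six': 1, 'two': 1}
```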

Elasticsearch aggregation by field name

Imagine two documents:
[
{
"_id": "abc",
"categories": {
"category-id-1": 1,
"category-id-2": 50
}
},
{
"_id": "def",
"categories": {
"category-id-1": 2
}
}
]
As you can see, each document can be associated with any number of categories by adding a sub-field to the categories object.
With this mapping, I can request the documents of a given category and sort them by that category's value.
My problem is that I now want an aggregation that counts the number of documents in each category. For the dataset above, that would give the following result:
{
"aggregations": {
"categories" : {
"buckets": [
{
"key": "category-id-1",
"doc_count": 2
},
{
"key": "category-id-2",
"doc_count": 1
}
]
}
}
}
I can't find anything in the documentation that solves this problem. I'm completely new to Elasticsearch, so I may be getting something wrong, either in my documentation research or in my choice of mapping.
Is it possible to make this kind of aggregation with my mapping? I'm using ES 6.x.
EDIT: Here is the mapping for the index:
{
"test1234": {
"mappings": {
"_doc": {
"properties": {
"categories": {
"properties": {
"category-id-1": {
"type": "long"
},
"category-id-2": {
"type": "long"
}
}
}
}
}
}
}
}
The most straightforward solution is to use a new field that contains all the distinct categories of a document.
If we call this field categories_list, here is one possible solution.
Change the mapping to:
{
"test1234": {
"mappings": {
"_doc": {
"properties": {
"categories": {
"properties": {
"category-id-1": {
"type": "long"
},
"category-id-2": {
"type": "long"
}
}
},
"categories_list": {
"type": "keyword"
}
}
}
}
}
}
Then modify your documents like this:
[
{
"_id": "abc",
"categories": {
"category-id-1": 1,
"category-id-2": 50
},
"categories_list": ["category-id-1", "category-id-2"]
},
{
"_id": "def",
"categories": {
"category-id-1": 2
},
"categories_list": ["category-id-1"]
}
]
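The categories_list values have to be kept in sync with the categories object at index time. Assuming you build documents in Python before sending them, a minimal sketch (the helper with_categories_list is hypothetical) could derive the list from the object's keys and, as a sanity check, count documents per category the way a terms aggregation on categories_list would:

```python
from collections import Counter


def with_categories_list(doc):
    """Return a copy of doc with categories_list derived from the keys of categories."""
    out = dict(doc)
    out["categories_list"] = sorted(doc.get("categories", {}))
    return out


docs = [
    {"_id": "abc", "categories": {"category-id-1": 1, "category-id-2": 50}},
    {"_id": "def", "categories": {"category-id-1": 2}},
]
prepared = [with_categories_list(d) for d in docs]
print([d["categories_list"] for d in prepared])
# [['category-id-1', 'category-id-2'], ['category-id-1']]

# Documents per category, i.e. what the terms aggregation should report.
counts = Counter(c for d in prepared for c in set(d["categories_list"]))
print(sorted(counts.items()))
# [('category-id-1', 2), ('category-id-2', 1)]
```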
Your aggregation request then becomes:
{
"aggs": {
"categories": {
"terms": {
"field": "categories_list",
"size": 10
}
}
}
}
which will return:
"aggregations": {
"categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category-id-1",
"doc_count": 2
},
{
"key": "category-id-2",
"doc_count": 1
}
]
}
}

Elasticsearch range not working for double indexed as long

I have mapped a field as long, but the input data is decimal (100.123).
I've tried several range searches and none of them work. I've verified that the data is in the proper index, and I can find the documents if I search with missing/exists.
Range query:
"range": {
"nr_val": {
"from": 123,
"to": 1234
}
}
Is Elasticsearch just ignoring the values, treating them as strings in a range search?
So in my situation, what can I do to make a range search from: 100, to: 200 match 100.123, other than a full dump and re-import? Are there any conversion options available?
Update with detailed specs
{
"state": "open",
"settings": {
"index": {
"creation_date": "1447858537098",
"number_of_shards": "5",
"uuid": "iiPzQXasQadvnDF1da8oMw",
"version": {
"created": "1070299"
},
"number_of_replicas": "1"
}
},
"mappings": {
"mongo_doc": {
"properties": {
"parent": {
"type": "string"
},
"data.current.specs.nr._nrm_val": {
"type": "double"
},
"data.current.specs.nr_b._nrm_val": {
"type": "double"
},
"data": {
"properties": {
"current": {
"properties": {
"specs": {
"properties": {
"nr": {
"properties": {
"_nrm_val": {
"type": "double"
}
}
},
"nr_b": {
"properties": {
"_nrm_val": {
"type": "long"
}
}
}
}
}
}
}
}
}
}
}
},
"aliases": []
}
It seems the mapping is not quite right... I switched to the ['data']['properties']['current']['properties'](...) notation.
In your case that field should have been mapped as double, not long. The value 100.123 is indexed as 100, so you lose the decimals.
At this point, apart from re-indexing (which is the ideal fix), a script filter is probably the only thing that will do it:
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "_source['nr_val'] >= param1 && _source['nr_val'] <= param2",
"params": {
"param1": 100,
"param2": 200
}
}
}
}
}
}
but it will be expensive, because the _source of every document has to be loaded and parsed.
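To make the difference concrete, here is a Python sketch that assumes the long field truncates the fraction (as described above) and compares a range check on the indexed values with a check on the original _source values. The stored values are hypothetical, and from/to are treated as inclusive, which is the default.

```python
def index_as_long(value):
    """Assumed coercion for a long field: the fraction is truncated toward zero."""
    return int(value)


def range_on_index(values, gte, lte):
    # a range query only sees the indexed (truncated) longs
    return [v for v in values if gte <= index_as_long(v) <= lte]


def script_on_source(values, gte, lte):
    # a script filter reads the original decimal from _source
    return [v for v in values if gte <= v <= lte]


source_values = [99.9, 100.123, 200.5]  # hypothetical stored values
print(range_on_index(source_values, 100, 200))    # [100.123, 200.5]
print(script_on_source(source_values, 100, 200))  # [100.123]
```

200.5 is a false positive for the range query because it was indexed as 200; reading _source avoids that, at the cost of loading every document's source.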
