ElasticSearch multifield not working - elasticsearch

In my documents I have a field collaboration on which I would like to run aggregation queries. However, I also want it to be full-text searchable, so I figured I should make it a multi-field. The field may look something like this:
...
"collaboration" : "CMS"
or
"collaboration" : ["ATLAS", "CMS"]
or
"collaboration" : "LHCb"
...
Following the advice in "ElasticSearch term aggregation", I changed the mapping to:
"collaboration": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
And I run a query:
POST /my_index/_search
{
"aggs": {
"collaboration": {
"terms": {
"field": "collaboration.raw"
}
}
}
}
And get nothing:
"hits": {
"total": 5,
"max_score": 1,
"hits": [...]
},
"aggregations": {
"collaboration": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
Even when I try to use this field for searching it doesn't work:
POST /my_index/_search
{
"query": {
"query_string": {
"query": "CMS",
"fields": ["collaboration.raw"]
}
}
}
Should I change the mapping somehow because of the fact that the field is sometimes a list and sometimes a string? My research found that arrays are supposed to be supported out of the box. Any suggestions what might be wrong here?

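Solved it thanks to Andrei Stefan!
All the documents needed to be reindexed after changing the mapping, a small yet painful detail. Documents indexed before the change have no collaboration.raw values, so the aggregation has nothing to count until they are indexed again.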
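To make the reindexing point concrete, here is a toy model in plain Python. It is not Elasticsearch code: ToyIndex and its methods are made up for illustration, and the only behaviour it assumes is that a sub-field is populated when a document is indexed, not when the mapping changes.

```python
from collections import Counter, defaultdict


class ToyIndex:
    """Hypothetical stand-in for an index: sub-fields are filled only at index time."""

    def __init__(self):
        self.sub_fields = set()         # e.g. {"collaboration.raw"}
        self.docs = {}                  # doc id -> original document
        self.terms = defaultdict(dict)  # field path -> {doc id: [values]}

    def put_mapping(self, sub_field):
        # Changing the mapping does not touch documents that are already indexed.
        self.sub_fields.add(sub_field)

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc
        for path in self.sub_fields:
            parent = path.split(".")[0]
            if parent in doc:
                value = doc[parent]
                # A single string and a list of strings are handled the same way.
                self.terms[path][doc_id] = value if isinstance(value, list) else [value]

    def reindex(self):
        # Index every stored document again so new sub-fields get populated.
        for doc_id, doc in list(self.docs.items()):
            self.index(doc_id, doc)

    def terms_agg(self, path):
        # Count the number of documents per distinct value of the field.
        counts = Counter()
        for values in self.terms[path].values():
            counts.update(set(values))
        return dict(counts)


index = ToyIndex()
index.index(1, {"collaboration": "CMS"})
index.index(2, {"collaboration": ["ATLAS", "CMS"]})
index.index(3, {"collaboration": "LHCb"})

index.put_mapping("collaboration.raw")
before = index.terms_agg("collaboration.raw")  # no buckets yet

index.reindex()
after = index.terms_agg("collaboration.raw")   # one bucket per value
print(before, after)
```

In this model the array-valued document needs no special mapping; each element simply counts towards its own bucket once the documents are indexed again.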

Related

elasticsearch aggregation fields with text type mapping

I am trying to aggregate on a field that has type text.
Mapping setting:
"Group":{"type":"text"}
And query:
{
"query": {
"term": {
"request_id": 22
}
},
"size": 0,
"aggs": {
"sets": {
"terms": {"field": "Group.keyword"}
}
}
}
This gives empty results:
"hits": {
"total": 7463,
"max_score": 0,
"hits": []
},
"aggregations": {
"sets": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
Without .keyword, the query fails with illegal_argument_exception.. reason: ... alternatively use a keyword field instead..
The only values in the Group field are Grp1 and Grp2.
How can I aggregate sets based on these two values?
Update mapping to:
"Group": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
After making the above change to the mapping, re-index the documents; then you can aggregate on Group.keyword.
If you never need full-text search on the values of the Group field, map it as keyword instead:
"Group":{"type":"keyword"}
In this case you can aggregate on the Group field itself.
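The rule in this answer can be written down as a small Python check. This is only a sketch: aggregation_field is a hypothetical helper, not part of any Elasticsearch client, and it inspects a mapping's properties given as a plain dict.

```python
def aggregation_field(properties, field):
    """Return the field path a terms aggregation can use for `field`."""
    spec = properties[field]
    if spec["type"] == "keyword":
        # A keyword field can be aggregated on directly.
        return field
    for name, sub in spec.get("fields", {}).items():
        if sub.get("type") == "keyword":
            # A text field with a keyword sub-field: aggregate on the sub-field.
            return f"{field}.{name}"
    raise ValueError(
        f"{field} has no keyword field; update the mapping and re-index first"
    )


text_only = {"Group": {"type": "text"}}
multi_field = {"Group": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}}
keyword_only = {"Group": {"type": "keyword"}}

print(aggregation_field(multi_field, "Group"))
print(aggregation_field(keyword_only, "Group"))
```

With the text-only mapping from the question the helper raises, which mirrors the situation where Group.keyword does not exist in the mapping.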

Elastic Search Unique Field Values

I am trying to get groups of only unique values in Elastic Search for the searches. I can't figure out why this doesn't behave.
I have gone through many StackOverflow questions and read the documentation for most of the day. Nothing seems to work for me; below is what I tried last.
Is there any reason someone would want to have the same results repeatedly returned? Maybe for differing versions of a Document?
In this example I would like a listing of all mfr_id's, and their mfr_desc as well. I am running this over a type to search document field values only. It seems that Agg Terms is the way to accomplish this, does anyone see anything I am doing wrong?
1: API Call
GET /inventory/item/_search
{
"size": 0,
"_source": ["mfr_id", "mfr_desc"],
"aggs": {
"unique_vals": {
"terms": {
"field": "mfr_id.keyword"
/** I have to use .keyword, seems like my mappings isn't working */
}
}
}
}
2: Mapping File
The Mapping I run after doing a Bulk import is quite simple. I read to not analyze the keys if you want a unique query:
{
"index": "inventory",
"body": {
"settings": {
"number_of_shards": 1
},
"mappings": {
"_default_": {
"properties": {
"mfr_id": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
3: My Results
The aggregation returns ~10 buckets when there are about 100 distinct values. I would really like to be able to get the _source fields of more than just a key if this is possible.
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 49341,
"max_score": 0,
"hits": []
},
"aggregations": {
"unique_vals": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 6815,
"buckets": [
{
"key": "14",
"doc_count": 24292
},
{
"key": "33",
"doc_count": 5508
},
...
I would really like to be able to get the _source fields of more than
just a key if this is possible.
I think you have only one option; I have faced the same problem. Try this:
{
"aggregations": {
"byId": {
"terms": {
"field": "mfr_id"
},
"aggs": {
"byDesc": {
"terms": {
"field": "mfr_desc"
}
}
}
}
}
}
Now you will get both id and desc while iterating through the buckets with the Elasticsearch Java API:
Terms aTerms = aAggregations.get("byId");
aTerms.getBuckets().stream().forEach(aBucketById-> {
Terms aTermsDesc = aBucketById.getAggregations().get("byDesc");
aTermsDesc.getBuckets().stream().forEach(aBucketByDesc -> {
//store id and desc
});
});
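For anyone not on the Java API, the same iteration can be sketched in Python over the raw JSON response. The helper names are made up, and the sample response is hand-written: the doc counts for ids 14 and 33 come from the question, while the descriptions are invented placeholders. The size value is also an assumption; as far as I know a terms aggregation returns only the top 10 buckets unless size is set, which would explain seeing about 10 results.

```python
def nested_terms_request(outer_field, inner_field, size=100):
    """Build the nested terms aggregation body; size=100 is an assumed value."""
    return {
        "size": 0,
        "aggs": {
            "byId": {
                "terms": {"field": outer_field, "size": size},
                "aggs": {"byDesc": {"terms": {"field": inner_field}}},
            }
        },
    }


def id_desc_pairs(response):
    """Flatten the byId -> byDesc buckets into (id, desc) tuples."""
    pairs = []
    for id_bucket in response["aggregations"]["byId"]["buckets"]:
        # Each outer bucket carries its sub-aggregation under the sub-aggregation name.
        for desc_bucket in id_bucket["byDesc"]["buckets"]:
            pairs.append((id_bucket["key"], desc_bucket["key"]))
    return pairs


request_body = nested_terms_request("mfr_id", "mfr_desc")

# Hand-written sample response; the descriptions are invented for illustration.
sample_response = {
    "aggregations": {
        "byId": {
            "buckets": [
                {"key": "14", "doc_count": 24292,
                 "byDesc": {"buckets": [{"key": "Acme Corp", "doc_count": 24292}]}},
                {"key": "33", "doc_count": 5508,
                 "byDesc": {"buckets": [{"key": "Globex", "doc_count": 5508}]}},
            ]
        }
    }
}

pairs = id_desc_pairs(sample_response)
print(pairs)
```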
I would use a filter; it has better performance than an aggregation.
With an aggregation you get all of the documents and only then apply the aggregation. If you use a filter, you get only the documents which match the filter, and filters can also be cached.
{
"query": {
"constant_score": {
"filter": {
"exists": {
"field": "mfr_id"
}
}
}
}
}

Can I get a field if I disabled the _source and _all in Elasticsearch

Elasticsearch suggested disabling the _source and _all fields in my case. This is my mapping:
{
"template": "mq-body-*",
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0,
"max_result_window": 100,
"codec": "best_compression"
},
"mappings": {
"_default_": {
"_source": {
"enabled": false
},
"_all": {
"enabled": false
}
},
"body": {
"properties": {
"body": {
"type": "string",
"doc_values": true,
"index": "not_analyzed"
}
}
}
}
}
The body.body field is very large (20k-300k). We don't have to index it and rarely get it, and losing it is acceptable. But after
PUT /mq-body-local/body/1
{"body":"My body"}
I can't find the body with GET /mq-body-local/body/1?fields=body or POST /mq-body-local/body/_search -d'{"fields":["body"]}'. The result says one document was found, but no document is returned. I know that without _source I cannot do a get or search, but how can I retrieve my document?
From Elasticsearch's website:
The _source field contains the original JSON document body that was
passed at index time. The _source field itself is not indexed (and
thus is not searchable), but it is stored so that it can be returned
when executing fetch requests, like get or search
Disabling the source will prevent Elasticsearch from displaying it in the resultset. However, filtering, querying and aggregations will not be affected.
So these two queries will not generate any results in terms of the actual body:
GET mq-body-local/body/_search
GET mq-body-local/body/1
However, you could run this aggregation that will include some of the source, for example:
POST mq-body-local/body/_search
{
"aggs": {
"test": {
"terms": {
"field": "body"
}
}
}
}
Will produce this result set (I've created some test records):
"aggregations": {
"test": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "my body",
"doc_count": 1
},
{
"key": "my body2",
"doc_count": 1
}
]
}
}

Broken aggregation in elasticsearch

I'm getting erroneous results when performing a terms aggregation on the names field in the index.
The following is the mapping I have used for the names field:
{
"dbnames": {
"properties": {
"names": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
Here are the results I'm getting for a simple terms aggregation on the field:
"aggregations": {
"names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "John Martin",
"doc_count": 1
},
{
"key": "John martin",
"doc_count": 1
},
{
"key": " Victor Moses",
"doc_count": 1
}
]
}
}
As you can see, I have the same names with different casings being shown as different buckets in the aggregation. What I want here is irrespective of the case, the names should be clubbed together.
The easiest way would be to make sure you properly case the value of your names field at indexing time.
If that is not an option, the other way to go about it is to define an analyzer that will do it for you and set that analyzer as index_analyzer for the names field. Such a custom analyzer would need to use the keyword tokenizer (i.e. take the whole value of the field as a single token) and the lowercase token filter (i.e. lowercase the value)
curl -XPUT localhost:9200/your_index -d '{
"settings": {
"index": {
"analysis": {
"analyzer": {
"casing": { <--- custom casing analyzer
"filter": [
"lowercase"
],
"tokenizer": "keyword"
}
}
}
}
},
"mappings": {
"your_type": {
"properties": {
"names": {
"type": "string",
"index_analyzer": "casing" <--- use your custom analyzer
}
}
}
}
}'
Then we can index some data:
curl -XPOST localhost:9200/your_index/your_type/_bulk -d '
{"index":{}}
{"names": "John Martin"}
{"index":{}}
{"names": "John martin"}
{"index":{}}
{"names": "Victor Moses"}
'
And finally the terms aggregation on the names field would return the expected results:
curl -XPOST localhost:9200/your_index/your_type/_search -d '{
"size": 0,
"aggs": {
"dbnames": {
"terms": {
"field": "names"
}
}
}
}'
Results:
{
"dbnames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "john martin",
"doc_count": 2
},
{
"key": "victor moses",
"doc_count": 1
}
]
}
}
There are 2 options here:
Use the not_analyzed option - This one has the disadvantage that the same string with different casing won't be seen as one.
Use the keyword tokenizer + lowercase filter - This one does not have the above issue.
I have neatly outlined these two approaches and how to use them here - https://qbox.io/blog/elasticsearch-aggregation-custom-analyzer
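The difference between the two options can be seen in a few lines of plain Python. This is a toy model rather than Elasticsearch code: the function names are made up, and the "analyzers" only mimic what not_analyzed and a keyword tokenizer with a lowercase filter do to a value before the terms aggregation counts it.

```python
from collections import Counter


def not_analyzed(value):
    # The value is kept as a single token, exactly as it was indexed.
    return [value]


def casing_analyzer(value):
    # keyword tokenizer: the whole value is one token; lowercase filter: lowercase it.
    return [value.lower()]


def terms_aggregation(values, analyzer):
    """Count documents per token, most frequent bucket first."""
    counts = Counter()
    for value in values:
        counts.update(set(analyzer(value)))
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


names = ["John Martin", "John martin", "Victor Moses"]

exact_buckets = terms_aggregation(names, not_analyzed)
folded_buckets = terms_aggregation(names, casing_analyzer)
print(exact_buckets)
print(folded_buckets)
```

Lowercasing alone would not merge a value with stray whitespace, such as the " Victor Moses" key with a leading space in the question's output; that would still need cleaning up before or during indexing.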

Elasticsearch: how to scope aggregations to your query and filter?

I have been playing around with elasticsearch query and filter for some time now but never worked with aggregations before. The idea that we can scope the aggregations with our query seems quite amazing to me but I want to understand how to do it properly so that I do not make any mistakes. Currently all my search queries are designed this way:
{
"query": {
},
"filter": {
},
"from": 0,
"size": 60
}
Now, when I added some aggregation buckets, the structure became this:
{
"aggs": {
"all_colors": {
"terms": {
"field": "color.name"
}
},
"all_brands": {
"terms": {
"field": "brand_slug"
}
},
"all_sizes": {
"terms": {
"field": "sizes"
}
}
},
"query": {
},
"filter": {
},
"from": 0,
"size": 60
}
However, the results of the aggregation are always the same irrespective of what info I provide in filter.
Now, when I changed the query structure to something like this, it started showing different results:
{
"aggs": {
"all_colors": {
"terms": {
"field": "color.name"
}
},
"all_brands": {
"terms": {
"field": "brand_slug"
}
},
"all_sizes": {
"terms": {
"field": "sizes"
}
}
},
"query": {
"filtered": {
"query": {
},
"filter": {
}
}
},
"from": 0,
"size": 60
}
Does this mean I will have to change the structure of my search queries everywhere to this new filtered structure? Is there any other workaround which allows me to achieve the desired results without having to change that much code?
Also, another thing I observed is that if my brand_slug field contains multiple keywords like "peter england", then both of these are returned in separate buckets like this:
{
"buckets": [
{
"key": "england",
"doc_count": 368
},
{
"key": "peter",
"doc_count": 368
}
]
}
How can I ensure that both of these end up in the same bucket, like this:
{
"buckets": [
{
"key": "peter england",
"doc_count": 368
}
]
}
UPDATE: This second part I have been able to accomplish by indexing brand, color and sizes differently like this:
"sizes": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
What you've noticed is by design. Have a look at my answer to a similar question on SO. Basically, the input to both the aggregation and filter sections is the output of the query section, so a top-level filter does not affect the aggregations. A filtered query, as you've suggested, would be the best way to achieve the results you desire. There is another way too: you can use a filter aggregation. Then you would not need to change your query and filter sections, only copy the filter section inside the aggregation sections, but in my opinion that would be overkill and a violation of the DRY principle in general.
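If the concern is having to touch every search body by hand, the restructuring could be done in one place. A minimal Python sketch, assuming the request bodies are built as dicts before being sent: scope_filter_into_query is a hypothetical helper, the match and term clauses in the example are invented, and the filtered query is the older structure used in the question.

```python
import copy


def scope_filter_into_query(search_body):
    """Move a top-level filter into a filtered query so aggregations see it."""
    body = copy.deepcopy(search_body)  # leave the caller's dict untouched
    top_filter = body.pop("filter", None)
    if not top_filter:
        # Nothing to move; the aggregations are already scoped by the query alone.
        return body
    # An empty or missing query is treated as match_all.
    query = body.get("query") or {"match_all": {}}
    body["query"] = {"filtered": {"query": query, "filter": top_filter}}
    return body


legacy_body = {
    "aggs": {"all_brands": {"terms": {"field": "brand_slug"}}},
    "query": {"match": {"title": "shirt"}},           # invented example query
    "filter": {"term": {"color.name": "blue"}},       # invented example filter
    "from": 0,
    "size": 60,
}

scoped_body = scope_filter_into_query(legacy_body)
print(scoped_body["query"])
```

The aggs, from and size keys pass through unchanged, so existing code that builds the old structure only needs to call the helper before sending the request.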
