ElasticSearch terms aggregation by entire field - ruby

How can I write an ElasticSearch term aggregation query that takes into account the entire field value, rather than individual tokens? For example, I would like to aggregate by city name, but the following returns new, york, san and francisco as individual buckets, not new york and san francisco as the buckets as expected.
curl -XPOST "http://localhost:9200/cities/_search" -d'
{
"size": 0,
"aggs" : {
"cities" : {
"terms" : {
"field" : "city",
"min_doc_count": 10
}
}
}
}'

You should fix this in your mapping. Add a not_analyzed field. You can create a multi-field if you also need the analyzed version.
"city": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
Now run your aggregation on city.raw.

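Update (2018-02-11)
In Elasticsearch 5.x and later, string fields created by the default dynamic mapping get a .keyword sub-field, so you can aggregate on the whole value by appending .keyword to the field name: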
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}

The Elastic documentation suggests fixing this in the mapping (as in the accepted answer): either make the field not_analyzed, or add a raw sub-field that is not_analyzed and use it in aggregations.
There is no other way. Aggregations operate on the inverted index, and if the field is analyzed, the inverted index holds only the tokens, not the original values of the field.
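To make the inverted-index point concrete, here is a small Python simulation. It does not talk to Elasticsearch; analyze is a deliberately rough stand-in for the standard analyzer (lowercase, split on whitespace), and both function names are hypothetical.

```python
from collections import Counter

def analyze(value):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return value.lower().split()

def terms_buckets(values, analyzed):
    """Count documents per indexed term, as a terms aggregation would see them."""
    counts = Counter()
    for value in values:
        # An analyzed field indexes each token; a not_analyzed field indexes the whole value.
        terms = set(analyze(value)) if analyzed else {value}
        counts.update(terms)
    return dict(counts)

cities = ["New York", "New York", "San Francisco"]
print(terms_buckets(cities, analyzed=True))   # buckets per token
print(terms_buckets(cities, analyzed=False))  # buckets per whole value
```

With analyzed=True the buckets are new, york, san and francisco; with analyzed=False they are New York and San Francisco, which is what aggregating on the raw sub-field gives you.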

Related

Can we query on Field if its mapping is not defined in ES?

Is it possible to query on a field that is not mapped?
Using Elasticsearch 7.4.
I've created an index with only one mapped field:
Index name - test_date_mapping_with_null
Dynamic mapping - false
Properties - city -> text
{
"settings" : {
"number_of_shards" : 2,
"number_of_replicas" : 1
},
"mappings" : {
"dynamic":false,
"properties" : {
"city" : { "type" : "text" }
}
}
}
Inserting documents with published_at field
POST test_date_mapping_with_null/_doc/1
{
"city": "NY",
"published_at": "2022-01-01T06:58:27.000Z"
}
POST test_date_mapping_with_null/_doc/2
{
"city": "Paris",
"published_at": "2022-01-02T06:58:27.000Z"
}
POST test_date_mapping_with_null/_doc/3
{
"city": "Mumbai",
"published_at": "2022-01-03T06:58:27.000Z"
}
POST test_date_mapping_with_null/_doc/4
{
"city": "Tokyo",
"published_at": "2022-01-04T06:58:27.000Z"
}
Mapping looks like this
"mappings": {
"_doc": {
"dynamic": "false",
"properties": {
"city": {
"type": "text"
}
}
}
}
Now, when I run this search query:
GET test_date_mapping_with_null/_search
{
"query": {
"range": {
"published_at": {
"gte": "2022-01-01T00:58:27.000Z",
"lte": "2022-01-03T23:58:27.000Z",
"boost": 2.0
}
}
}
}
Actual - ES returns all the docs.
Expected - ES should return only Doc 1, 2 and 3 (i.e City -> NY, Paris and Mumbai Doc)
Your index mapping currently includes only the city field. It has no mapping for published_at, because you have set "dynamic": false in your index mapping.
This means that published_at is kept in _source, but the field is not indexed. In simple terms, you cannot perform any search on the published_at field.
No, you can't query a field that is not indexed in Elasticsearch (since you set dynamic: false, it won't be indexed). You can, however, still see it as part of _source when you fetch a document using _search or by document ID.
If you want to query the field, either change the mapping from dynamic: false to dynamic: true, or add the field explicitly to the mapping (if you want to keep dynamic: false).
You can't query fields that are not specified in the mapping when dynamic is set to false. Those fields are only kept in _source.
https://www.elastic.co/guide/en/elasticsearch/reference/7.4/dynamic.html
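A minimal sketch of the second option (keeping dynamic: false but mapping the field explicitly), written in Python only to build the create-index body. The helper name index_body is hypothetical, and the date type for published_at is an assumption based on the sample values in the question.

```python
import json

def index_body(extra_fields):
    """Build a create-index body that keeps dynamic mapping off
    but maps the given extra fields explicitly."""
    properties = {"city": {"type": "text"}}
    properties.update(extra_fields)
    return {
        "settings": {"number_of_shards": 2, "number_of_replicas": 1},
        "mappings": {"dynamic": False, "properties": properties},
    }

# Assumption: published_at holds ISO 8601 timestamps, so "date" is a suitable type.
body = index_body({"published_at": {"type": "date"}})
print(json.dumps(body, indent=2))
```

Documents that were indexed before the field was mapped would need to be reindexed before the range query can match them.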

Merging fields in Elastic Search

I am pretty new to Elasticsearch. I have a dataset with multiple fields such as name, product_info, description, etc. When searching for a document, the search term can come from any of these fields (let us call them the "search core fields").
If I start storing the data in Elasticsearch, should I derive a field that concatenates all the "search core fields" and then index that field alone?
I came across the _all mapping concept and am a little confused. Does it do the same thing?
No, you don't need to create any new field with concatenated terms.
You can just use _all with a match query to search for text in any field.
About _all: yes, it searches the text from any field.
The _all field was deprecated in ES 6.0 (it can no longer be enabled on indices created in 6.0 or later) and removed in ES 7, so it only works on older indices. The main reason for this is that it used too much storage space.
However, you can define your own all field using the copy_to feature. You basically specify in your mapping which fields should be copied to your custom all field and then you can search on that field.
You can define your mapping like this:
PUT my-index
{
"mappings": {
"properties": {
"name": {
"type": "text",
"copy_to": "custom_all"
},
"product_info": {
"type": "text",
"copy_to": "custom_all"
},
"description": {
"type": "text",
"copy_to": "custom_all"
},
"custom_all": {
"type": "text"
}
}
}
}
PUT my-index/_doc/1
{
"name": "XYZ",
"product_info": "ABC product",
"description": "this product does blablabla"
}
And then you can search on your "all" field like this:
POST my-index/_search
{
"query": {
"match": {
"custom_all": {
"query": "ABC",
"operator": "and"
}
}
}
}
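To illustrate what the custom "all" field buys you, here is a rough Python simulation of the idea. It is not Elasticsearch code: analyze is a simplified stand-in for the analyzer, and build_custom_all and match_and are hypothetical helper names.

```python
def analyze(text):
    """Simplified analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def build_custom_all(doc, copy_fields):
    """Collect the tokens of every field that is copied into custom_all."""
    tokens = []
    for field in copy_fields:
        tokens.extend(analyze(doc.get(field, "")))
    return tokens

def match_and(tokens, query):
    """Mimic a match query with operator "and": every query term must be present."""
    return all(term in tokens for term in analyze(query))

doc = {
    "name": "XYZ",
    "product_info": "ABC product",
    "description": "this product does blablabla",
}
custom_all = build_custom_all(doc, ["name", "product_info", "description"])

print(match_and(custom_all, "ABC"))      # term from product_info
print(match_and(custom_all, "ABC XYZ"))  # terms from two different source fields
print(match_and(custom_all, "ABC foo"))  # "foo" appears in no field
```

The second call is the interesting one: because all tokens land in one field, an "and" query can be satisfied by terms that come from different source fields.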

ElasticSearch : Aggregations on one field not working

I have a few documents in one index in Elasticsearch. When I aggregate by one of its fields, I do not get any results. The field's mapping is
{
"type": "string",
"index": "not_analyzed"
}
I have another field that is indexed in the same manner but I am able to do aggregations on that. What possible causes can be there for this? How do I narrow down the issue?
Edit : The Elastic Search version is 1.6.0 and I am running the following query for aggregation:
{
"aggregations": {
"aggr_name": {
"terms": {
"field": "storeId",
"size": 100
}
}
}
}
where "storeId" is the field I am aggregating on. The same aggregation works on another field with the same mapping.

Grouping data in elasticsearch taking whitespaces into account

I'm trying to execute aggregations the same way they're executed here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/_executing_aggregations.html
The problem I'm facing at the moment is that some values in the fields have whitespaces. Imagine that a possible value is "El Paso". When I execute the following, I get buckets for "El" and for "Paso", but I don't get a bucket for "El Paso".
curl -XPOST 'localhost:9200/myIndex/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_city": {
"terms": {
"field": "city"
}
}
}
}'
My desired result is that each field is treated as an indivisible unit. How do I do this?
EDIT 1: Creating the index and importing the data again would take enormous amounts of time, since that index has millions of documents, so I would like a solution that doesn't involve doing all the work again.
You have to make the index fields not_analyzed, as mentioned in this page, to achieve this.
Example:
"title": {
"type": "string",
"fields": {
"raw": { "type": "string", "index": "not_analyzed" }
}
}

elasticsearch aggregations separated words

I simply ran an aggregation in a browser plugin (Marvel). As you can see in the picture below, only one doc matches the query, but the aggregation is split on spaces. That doesn't make sense: I want to aggregate per document. In this scenario there should be only one group, with count 1 and key "Drow Ranger".
What is the right way to do this in Elasticsearch?
It's probably because your heroname field is analyzed and thus "Drow Ranger" gets tokenized and indexed as "drow" and "ranger".
One way to get around this is to transform your heroname field to a multi-field with an analyzed part (the one you search on with the wildcard query) and another not_analyzed part (the one you can aggregate on).
You should create your index like this and specify the proper mapping for your heroname field
curl -XPUT localhost:9200/dota2 -d '{
"mappings": {
"agust": {
"properties": {
"heroname": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
... your other fields go here
}
}
}
}'
Then you can run your aggregation on the heroname.raw field instead of the heroname field.
UPDATE
If you just want to try this on the heroname field, you can modify that field without recreating the whole index. The following command simply adds the new heroname.raw sub-field to your existing heroname field. Note that you still have to reindex your data, though, because existing documents are not indexed into the new sub-field automatically.
curl -XPUT localhost:9200/dota2/_mapping/agust -d '{
"properties": {
"heroname": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}'
Then you can keep using heroname in your wildcard query, but your aggregation will look like this:
{
"aggs": {
"asd": {
"terms": {
"field": "heroname.raw", <--- use the raw field here
"size": 0
}
}
}
}
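Putting the two halves together, a request body might be assembled as in the Python sketch below. The helper name search_body and the wildcard pattern "dro*" are hypothetical (the original query is not shown), and "size": 0 inside the terms aggregation is only accepted by pre-5.0 versions of Elasticsearch.

```python
import json

def search_body(pattern, agg_field="heroname.raw"):
    """Search on the analyzed field, aggregate on the not_analyzed sub-field."""
    return {
        # The wildcard query runs against the analyzed heroname tokens.
        "query": {"wildcard": {"heroname": pattern}},
        # The terms aggregation buckets on the untouched original value.
        "aggs": {"asd": {"terms": {"field": agg_field, "size": 0}}},
    }

body = search_body("dro*")  # hypothetical pattern
print(json.dumps(body, indent=2))
```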
