ElasticSearch terms aggregation on whole field

This is the mapping for my field:
{
"product" : {
"mappings" : {
"product" : {
"filters.brand" : {
"full_name" : "filters.brand",
"mapping" : {
"brand" : {
"type" : "text",
"fielddata" : true
}
}
}
}
}
}
}
I'm trying to get the unique brands with their doc counts using the following curl command:
curl -XGET 'http://localhost:9200/product/_search?pretty' -H 'Content-Type: application/json' -d'
{
"aggs": {
"domains": {
"terms": {
"field": "filters.brand",
"missing": "N/A",
"size": 10,
"order": {
"_count": "desc"
}
}
}
}
}'
It works, except that it returns counts per token of the field rather than per whole field value.
For example, I have the brand "Absolut Joy", and its words are returned as separate tokens.
How can I aggregate on the whole field value?
ElasticSearch version: 5.3.1
Thank you

You can update the mapping of filters.brand as follows:
{
"mapping": {
"brand": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
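Then update the aggregation query to use "field": "filters.brand.keyword". Documents indexed before the mapping change need to be reindexed for the keyword sub-field to be populated.
Using fielddata: true on a text field is not advised.
Refer to: Before-enabling-field-data
For indexing the same field in different ways for different purposes, refer to: use-multi-fields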
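
To illustrate why the keyword sub-field changes the buckets, here is a small Python sketch that imitates both behaviours without contacting Elasticsearch. The sample brand values and the `analyze_text` function are assumptions made for the example; `analyze_text` is only a rough stand-in for the default text analysis, not the real analyzer.

```python
import re
from collections import Counter

# Hypothetical sample data: one brand value per document.
brands = ["Absolut Joy", "Absolut Joy", "Absolut", "Joy Division"]

def analyze_text(value):
    """Rough stand-in for text analysis: lowercase and split on non-word characters."""
    return [token for token in re.split(r"\W+", value.lower()) if token]

def terms_on_text(values):
    """Count documents per token, as a terms aggregation on an analyzed text field would."""
    counts = Counter()
    for value in values:
        # A document is counted once for each distinct token it contains.
        counts.update(set(analyze_text(value)))
    return counts

def terms_on_keyword(values):
    """Count documents per whole value, as a terms aggregation on a keyword field would."""
    return Counter(values)

text_counts = terms_on_text(brands)
keyword_counts = terms_on_keyword(brands)
print(dict(text_counts))
print(dict(keyword_counts))
```

In the text variant "Absolut Joy" never appears as a key, only its tokens do; in the keyword variant the whole value is a single bucket.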

The solution is to set both analyzer and search_analyzer to keyword:
"analyzer": "keyword",
"search_analyzer": "keyword"

Related

Why is Elasticsearch keyword search not working?

I use NLog to write log messages to Elasticsearch. The index mapping is:
"mappings": {
"logevent": {
"properties": {
"#timestamp": {
"type": "date"
},
"MachineName": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"level": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"message": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
}
}
}
}
I was able to get results using a text search:
GET /webapi-2022.07.28/_search
{
"query": {
"match": {
"message": "ERROR"
}
}
}
Result:
"hits" : [
{
"_index" : "webapi-2022.07.28",
"_type" : "logevent",
"_id" : "IFhYQoIBRhF4cR9wr-ja",
"_score" : 4.931916,
"_source" : {
"#timestamp" : "2022-07-28T01:07:58.8822339Z",
"level" : "Error",
"message" : """2022-07-28 09:07:58.8822|ERROR|AppSrv.Filter.AccountAuthorizeAttribute|[KO17111808]-[172.10.2.200]-[ERROR]-"message"""",
"MachineName" : "WIN-EPISTFOBD41"
}
}
//.....
]
But when I query the keyword sub-field, I get nothing:
GET /webapi-2022.07.28/_search
{
"query": {
"term": {
"message.keyword": "ERROR"
}
}
}
I tried both term and match; the result is the same.

This happens because the message field contains more than just ERROR: the .keyword sub-field stores the whole string, including all the other text. In your case you need to use the text search; the .keyword field is only suitable for exact matches on the entire value.
If your message field contained only the string ERROR, then searching on .keyword would return a result. You can test this yourself by indexing a sample document.
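
The difference can be sketched in Python without a cluster. The `analyze` function below is a simplified assumption standing in for the real analyzer, and both query helpers are illustrative approximations rather than Elasticsearch code; the message value is the one from the search hit in the question.

```python
import re

# Message value taken from the search hit shown in the question.
stored_message = (
    '2022-07-28 09:07:58.8822|ERROR|AppSrv.Filter.AccountAuthorizeAttribute|'
    '[KO17111808]-[172.10.2.200]-[ERROR]-"message"'
)

def analyze(text):
    """Simplified analyzer: lowercase, then split on anything that is not a letter or digit."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

def match_on_text(field_value, query):
    """Approximates a match query: true if any analyzed query token is among the field's tokens."""
    field_tokens = set(analyze(field_value))
    return any(token in field_tokens for token in analyze(query))

def term_on_keyword(field_value, query):
    """Approximates a term query on a keyword sub-field: the whole value must be equal."""
    return field_value == query

print(match_on_text(stored_message, "ERROR"))    # True
print(term_on_keyword(stored_message, "ERROR"))  # False
print(term_on_keyword("ERROR", "ERROR"))         # True
```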

Elasticsearch - Wrong field type

I'm running on ElasticSearch 6.8.
I tried to add a keyword type field to my index mapping.
What I want is a mapping where my_field looks like this:
"my_field": {
"type": "keyword"
}
So in order to do that, I added a field to my mapping:
"properties": {
...
"my_field": {
"type": "keyword",
"norms": false
},
...
}
But currently, it gives me something like:
"my_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
I need this keyword type because I need to aggregate on it, and with a text type, it gave me:
Fielddata is disabled on text fields by default. Set fielddata=true on [my_field] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.
But I'm not able to set fielddata to true.
I tried many things, such as creating a new index instead of updating the existing one, but none of them worked.
Does anyone know how to get the correct field type (the solution I would prefer)?
Or how to set fielddata to true in the mapping?
Best regards,
Jules

I set fielddata to true on a text field using the curl command below on Elasticsearch 6.x:
curl -X POST "localhost:9200/my_index/type?pretty" -H 'Content-Type: application/json' -d'
{
"mappings" :{
"properties": {
"my_field": {
"type": "text",
"fielddata": true
}
}
}
}'
It created the index and returned the following response:
{
"_index" : "my_index",
"_type" : "type",
"_id" : "3Jl0F3EBg44VI1hJVGnz",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
The mapping API returns the following JSON response:
{
"my_index": {
"mappings": {
"type": {
"properties": {
"mappings": {
"properties": {
"properties": {
"properties": {
"my_field": {
"properties": {
"fielddata": {
"type": "boolean"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
}
}
}
}
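
The mapping response above lists my_field under an object property called mappings, with fielddata mapped as a boolean. That suggests the request body was indexed as a document rather than applied as a mapping. A minimal Python sketch of an index-creation body for Elasticsearch 6.x follows, assuming the index is created fresh with PUT /my_index and that the mapping type is named _doc. The type name and both helper functions are illustrative and do not come from the original posts. The second helper reads a field's type back out of a mapping API response.

```python
import json

def keyword_mapping_body(doc_type, field_name):
    """Body for creating an index on 6.x, where mappings are nested under a type name."""
    return {"mappings": {doc_type: {"properties": {field_name: {"type": "keyword"}}}}}

def mapped_type(mapping_response, index, doc_type, field_name):
    """Return the type of a top-level field from a mapping API response, or None."""
    properties = mapping_response[index]["mappings"][doc_type]["properties"]
    return properties.get(field_name, {}).get("type")

# Body to send as: PUT /my_index  (the type name "_doc" is an assumption)
body = keyword_mapping_body("_doc", "my_field")
print(json.dumps(body, indent=2))

# Hypothetical response shape after the index is created with the body above.
expected_response = {"my_index": body}
print(mapped_type(expected_response, "my_index", "_doc", "my_field"))  # keyword

# Abbreviated shape of the response shown in the answer: my_field is not a top-level field.
answer_response = {
    "my_index": {"mappings": {"type": {"properties": {"mappings": {"properties": {}}}}}}
}
print(mapped_type(answer_response, "my_index", "type", "my_field"))  # None
```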

How to get nested aggregation buckets using the Elasticsearch Java High Level REST Client

I have some nested fields, of which I want to calculate all distinct values, for example:
"author":{
"type":"nested",
"properties":{
"first_name":{
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last_name":{
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
Suppose I need all unique first names, so I add an aggregation like this:
GET /statementmetadataindex/data/_search?size=0
{
"aggs": {
"distinct_authors": {
"nested": {
"path": "authors"
},
"aggs": {
"distinct_first_names": {
"terms": {
"field": "authors.first_name.keyword"
}
}
}
}
}
}
which returns an aggregation like this:
"aggregations" : {
"distinct_authors" : {
"doc_count" : 20292,
"distinct_first_names" : {
"doc_count_error_upper_bound" : 4761,
"sum_other_doc_count" : 124467,
"buckets" : [
{
"key" : "Charles",
"doc_count" : 48411
},
{
"key" : "Rudyard",
"doc_count" : 30954
}
]
}
}
}
Now, I am using the nested aggregation builder in the Java code like this:
NestedAggregationBuilder uniqueAuthors=AggregationBuilders.nested("distinct_authors", "authors");
TermsAggregationBuilder distinct_first_name= AggregationBuilders.terms("distinct_first_names")
.field("authors.first_name.keyword").size(size);
uniqueAuthors.subAggregation(distinct_first_name);
and I usually get the aggregation from the response like this:
Terms distinct_authornames=aggregations.get("distinct_authors");
But the buckets I need are in the sub-aggregation "distinct_first_names" inside "distinct_authors". How do I parse the aggregation result to get the buckets with the unique first names?

Try this (not tested):
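import org.elasticsearch.search.aggregations.bucket.nested.Nested;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

// The outer aggregation is a nested (single-bucket) aggregation, not a Terms aggregation.
Nested distinct_authornames = aggregations.get("distinct_authors");
// The terms sub-aggregation inside it holds the buckets.
Terms distinct_first_names = distinct_authornames.getAggregations().get("distinct_first_names");
for (Terms.Bucket bucket : distinct_first_names.getBuckets())
{
    System.out.println(bucket.getDocCount());
    System.out.println(bucket.getKeyAsString());
}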
Hope this helps

I figured out the solution quite a long time ago, but didn't realise it was working because I kept getting an exception for an unrelated reason. The following works well:
Nested distinct_authorsOuter=aggregations.get("distinct_authors");
Aggregations distinct_authors_aggs=distinct_authorsOuter.getAggregations();
Terms distinct_firstNames= distinct_authors_aggs.get("distinct_first_names");
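
The same traversal can be illustrated on the raw JSON response from the question. This is a sketch in Python rather than the Java client, and the helper name `nested_terms_buckets` is made up for the example. It shows that the buckets sit one level below the nested aggregation, under the name of the terms sub-aggregation.

```python
# Aggregation part of the response, copied from the question.
response = {
    "aggregations": {
        "distinct_authors": {
            "doc_count": 20292,
            "distinct_first_names": {
                "doc_count_error_upper_bound": 4761,
                "sum_other_doc_count": 124467,
                "buckets": [
                    {"key": "Charles", "doc_count": 48411},
                    {"key": "Rudyard", "doc_count": 30954},
                ],
            },
        }
    }
}

def nested_terms_buckets(search_response, nested_name, terms_name):
    """Return (key, doc_count) pairs from a terms aggregation inside a nested aggregation."""
    nested = search_response["aggregations"][nested_name]
    return [(bucket["key"], bucket["doc_count"]) for bucket in nested[terms_name]["buckets"]]

first_names = nested_terms_buckets(response, "distinct_authors", "distinct_first_names")
for key, doc_count in first_names:
    print(key, doc_count)
```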

How to search an Elasticsearch index by partial text of a field in the indexed document?

I have an Elasticsearch index where I keep certain data. Each document in the index has a field named file_name in a nested document. So a doc looks like
{
...
"file_data":{
"file_name": "sample_filename_acp24_20180223_1222.json"
}
...
}
I want my search to return the above document if I search for sample, filename, acp24, 20180223, and so on.
So far I have tried the following analyzer and full-text search query, but the document is still not returned when I search for acp24 or 20180223.
Index Mapping
{
"index_name": {
"mappings": {
"type": {
"properties": {
"file_data": {
"type": "nested",
"properties": {
"file_name": {
"type": "text",
"analyzer": "keyword_analyzer",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
Analyzer
{
"analysis": {
"analyzer": {
"keyword_analyzer":{
"type": "pattern",
"pattern":"\\W|_",
"lowercase": true
}
}
}
}
Search Query
{
"query": {
"match_phrase_prefix": {
"_all": {
"query": "20180223",
"analyzer": "keyword_analyzer"
}
}
}
}
Any help on how to achieve this is very much appreciated. I have spent many hours on this and still couldn't find a solution.

If I understand correctly, you could use the wildcard query:
POST /my_index/_search
{
"query" : {
"wildcard" : {
"file_data.file_name" : {
"wildcard" : "sample_*filename_acp24*", "boost" : 2.0
}
}
}
}
(tested with elasticsearch 6.1, might need to change the syntax for other versions)
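
To see which terms each approach would be matched against, the following Python sketch imitates the pattern analyzer from the question and applies the wildcard pattern from this answer with `fnmatch`. It is an approximation that runs locally, under the assumption that the pattern analyzer splits on the regex, drops empty tokens and lowercases. The function name `pattern_analyze` is invented for the example.

```python
import re
from fnmatch import fnmatchcase

file_name = "sample_filename_acp24_20180223_1222.json"

def pattern_analyze(text, pattern=r"\W|_", lowercase=True):
    """Split text on the analyzer's pattern, dropping empty tokens."""
    if lowercase:
        text = text.lower()
    return [token for token in re.split(pattern, text) if token]

tokens = pattern_analyze(file_name)
print(tokens)

# A wildcard query is compared with whole indexed terms, not with the original string.
wildcard = "sample_*filename_acp24*"
matches_whole_value = fnmatchcase(file_name, wildcard)
matching_tokens = [token for token in tokens if fnmatchcase(token, wildcard)]
print(matches_whole_value, matching_tokens)
```

Under these assumptions the individual tokens include acp24 and 20180223, while the wildcard pattern with underscores only matches the un-tokenized value, such as the one kept in a keyword sub-field.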

Aggregating over _field_names in elasticsearch 5

I'm trying to aggregate over field names in ES 5 as described in "Elasticsearch aggregation on distinct keys", but the solution described there no longer works.
My goal is to get the keys across all the documents. Mapping is the default one.
Data:
PUT products/product/1
{
"param": {
"field1": "data",
"field2": "data2"
}
}
Query:
GET _search
{
"aggs": {
"params": {
"terms": {
"field": "_field_names",
"include" : "param.*",
"size": 0
}
}
}
}
I get the following error: Fielddata is not supported on field [_field_names] of type [_field_names]

After looking around, it seems the only way to get the unique field names in ES 5.x and later is through the mapping endpoint. Since you cannot aggregate on _field_names, you may need to change your data format slightly, because the mapping endpoint returns every field regardless of nesting.
My own problem was getting the unique keys for various child/parent documents.
I found that if you prefix your field names in the format prefix.field, the mapping endpoint automatically nests the information for you.
PUT products/product/1
{
"param.field1": "data",
"param.field2": "data2",
"other.field3": "data3"
}
GET products/product/_mapping
{
"products": {
"mappings": {
"product": {
"properties": {
"other": {
"properties": {
"field3": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"param": {
"properties": {
"field1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"field2": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
Then you can grab the unique fields based on the prefix.
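
As a sketch of that last step, the Python below walks a mapping response and collects the full field paths under a prefix. The mapping is abbreviated from the response above (the keyword sub-fields are omitted), and the helper name `field_paths` is hypothetical.

```python
def field_paths(properties, prefix=""):
    """Recursively collect dotted paths of all leaf fields in a mapping's properties."""
    paths = []
    for name, definition in properties.items():
        path = f"{prefix}{name}"
        if "properties" in definition:
            # Object field: descend into its own properties.
            paths.extend(field_paths(definition["properties"], prefix=f"{path}."))
        else:
            paths.append(path)
    return paths

# Abbreviated form of the mapping response shown above.
mapping = {
    "products": {
        "mappings": {
            "product": {
                "properties": {
                    "other": {"properties": {"field3": {"type": "text"}}},
                    "param": {
                        "properties": {
                            "field1": {"type": "text"},
                            "field2": {"type": "text"},
                        }
                    },
                }
            }
        }
    }
}

properties = mapping["products"]["mappings"]["product"]["properties"]
param_fields = sorted(p for p in field_paths(properties) if p.startswith("param."))
print(param_fields)  # ['param.field1', 'param.field2']
```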

This is probably because setting size: 0 is no longer allowed in ES 5. You now have to set a specific size.
POST _search
{
"aggs": {
"params": {
"terms": {
"field": "_field_names",
"include" : "param.*",
"size": 100 <--- change this
}
}
}
}

Resources