ElasticSearch REST API Aggregating Text Field

So I'm brand new to ElasticSearch/Kibana, trying to create a simple Curl command to hit Elastic's REST API and return the number of logs that contain a given string of text. But I'm getting the following error:
"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [timestamp] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
My code is as follows:
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gte": "2021-06-15",
        "lte": "2021-06-23"
      }
    }
  },
  "aggs": {
    "hit_count_per_day": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      }
    }
  }
}
Where should I be adding this "fielddata=true" value mentioned in the error? Can anyone point me towards a reference doc for ElasticSearch API syntax?

Based on the error you are getting, it seems that the timestamp field is mapped as text. You cannot perform aggregations on text fields.
Since you are using a date_histogram aggregation, the timestamp field should be of type date.
Modify your index mapping as shown below:
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date"
      }
    }
  }
}
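Note that the type of an existing field cannot be changed in place, so fixing this usually means creating a new index with the corrected mapping and copying the data across with the _reindex API. A minimal sketch, assuming the current index is called logs (both index names here are placeholders):

PUT logs-new
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date"
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "logs" },
  "dest": { "index": "logs-new" }
}

After reindexing, the date_histogram request above can be sent unchanged, e.g. curl -X GET "localhost:9200/logs-new/_search" -H 'Content-Type: application/json' -d '<request body>'. The search and aggregation syntax is documented in the official Elasticsearch reference at https://www.elastic.co/guide/.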

Related

Partial search on date fields in elasticsearch

I'm trying to implement partial search on a date field in Elasticsearch. For example, if startDate is stored as "2019-08-28", I should be able to retrieve the same document while querying just "2019", "2019-08", or "2019-0".
For other fields I'm doing this:
{
  "simple_query_string": {
    "fields": [
      "customer"
    ],
    "query": "* Andrew *",
    "analyze_wildcard": "true",
    "default_operator": "AND"
  }
}
which works perfectly on text fields, but the same doesn't work on date fields.
This is the mapping:
{"mappings":{"properties":{"startDate":{"type":"date"}}}}
Is there any way this can be achieved, be it a change in mapping or another query method? Also, I found this discussion related to partial dates in Elasticsearch; I'm not sure how relevant it is, but here it is:
https://github.com/elastic/elasticsearch/issues/45284
Excerpt from ES-Docs
Internally, dates are converted to UTC (if the time-zone is specified)
and stored as a long number representing milliseconds-since-the-epoch.
It is not possible to search a date field the way we can search a text field. However, we can tell ES to index the date field as both date and text, e.g.
Index the date field as a multi-field:
PUT dates
{
  "mappings": {
    "properties": {
      "my_date": {
        "type": "date",
        "format": "year_month_day",   // <======= yyyy-MM-dd
        "fields": {
          "formatted": {
            "type": "text",           // <======= another representation of type text, accessible as my_date.formatted
            "analyzer": "whitespace"  // <======= whitespace analyzer (the standard analyzer would tokenize 2020-01-01 into 2020, 01 and 01)
          }
        }
      }
    }
  }
}
POST dates/_doc
{
  "my_date": "2020-01-01"
}

POST dates/_doc
{
  "my_date": "2019-01-01"
}
Use a wildcard query to search. You can even use n-grams at indexing time for faster searches if required (see the sketch after the query below):
GET dates/_search
{
  "query": {
    "wildcard": {
      "my_date.formatted": {
        "value": "2020-0*"
      }
    }
  }
}
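As a sketch of the n-gram option, the formatted sub-field can be analyzed with an edge_ngram token filter so that every prefix of the date string is indexed, and a plain match query then replaces the wildcard. The index name dates_ngram and the analyzer/filter names (date_prefix, date_edge_ngram) are only placeholders:

PUT dates_ngram
{
  "settings": {
    "analysis": {
      "filter": {
        "date_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "date_prefix": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["date_edge_ngram"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_date": {
        "type": "date",
        "format": "year_month_day",
        "fields": {
          "formatted": {
            "type": "text",
            "analyzer": "date_prefix",
            "search_analyzer": "keyword"
          }
        }
      }
    }
  }
}

GET dates_ngram/_search
{
  "query": {
    "match": {
      "my_date.formatted": "2020-0"
    }
  }
}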

Elasticsearch Terms aggregation with unknown datatype

I'm indexing data of unknown schema in Elasticsearch using dynamic mapping, i.e. we don't know the shape, datatypes, etc. of much of the data ahead of time. In queries, I want to be able to aggregate on any field. Strings are (by default) mapped as both text and keyword types, and only the latter can be aggregated on. So for strings my terms aggregations must look like this:
"aggs": {
"something": {
"terms": {
"field": "something.keyword"
}
}
}
But other types like numbers and bools do not have this .keyword sub-field, so aggregations for those must look like this (which would fail for text fields):
"aggs": {
"something": {
"terms": {
"field": "something"
}
}
}
Is there any way to specify a terms aggregation that basically says "if something.keyword exists, use that, otherwise just use something", and without taking a significant performance hit?
Requiring datatype information to be provided at query time might be an option for me, but ideally I want to avoid it if possible.
If the primary use case is aggregations, it may be worth changing the dynamic mapping for string properties so they are indexed as the keyword datatype, with a multi-field sub-field indexed as text, i.e. in dynamic_templates:
{
  "strings": {
    "match_mapping_type": "string",
    "mapping": {
      "type": "keyword",
      "ignore_above": 256,
      "fields": {
        "text": {
          "type": "text"
        }
      }
    }
  }
}
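For reference, a minimal sketch of how that template could be wired in when creating an index (the index name my-index is a placeholder):

PUT my-index
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword",
            "ignore_above": 256,
            "fields": {
              "text": {
                "type": "text"
              }
            }
          }
        }
      }
    ]
  }
}

With this template in place, dynamically mapped string fields can be aggregated on directly by field name, just like numbers and booleans, while full-text search remains available on the .text sub-field.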

ElasticSearch: Aggregations on one field not working

I have a few documents in one index in Elasticsearch. When I aggregate on one of its fields, I do not get any results. The field's mapping is:
{
  "type": "string",
  "index": "not_analyzed"
}
I have another field that is indexed in the same manner, and I am able to do aggregations on that one. What are the possible causes for this? How do I narrow down the issue?
Edit: The Elasticsearch version is 1.6.0 and I am running the following query for the aggregation:
{
  "aggregations": {
    "aggr_name": {
      "terms": {
        "field": "storeId",
        "size": 100
      }
    }
  }
}
where "storeId" is the field I am aggregating on. The same aggregation works on another field with the same mapping.

Elasticsearch autocomplete integer field

I am trying to implement an autocomplete feature on a numeric field (its actual type in ES is long).
I am using a jQuery UI Autocomplete widget on the client side, having its source function send a query to Elasticsearch with the prefix term to get a number (say, 5) of autocomplete options.
The query I am using is something like the following:
{
  "size": 0,
  "query": {
    "prefix": {
      "myField": "<term>"
    }
  },
  "aggs": {
    "myAggregation": {
      "terms": {
        "field": "myField",
        "size": 5
      }
    }
  }
}
Such that if myField has the distinct values: [1, 15, 151, 21, 22], and term is 1, then I'd expect to get from ES the buckets with keys [1, 15, 151].
The problem is this does not seem to work with numeric fields. For the above example, I am getting a single bucket with the key 1, and if term is 15 I am getting a single bucket with key 15, i.e. it only returns exact matches. In contrast, it works perfectly for string typed fields.
I am guessing I need some special mapping for myField, but I'd prefer to have the mapping as general as possible, while having the autocomplete working with minimal changes to the mapping (just to note - the index I am querying might be a general one, external to my application, so I will be able to change the type/field mappings in it only if the new mapping is something general and standard).
What are my options here?
What I would do is create a string sub-field on your integer field, like this:
{
  "myField": {
    "type": "integer",
    "fields": {
      "to_string": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}
Then your query would need to be changed to the one below, i.e. run the prefix query on the string sub-field, but compute the terms aggregation on the integer field:
{
  "size": 0,
  "query": {
    "prefix": {
      "myField.to_string": "1"
    }
  },
  "aggs": {
    "myAggregation": {
      "terms": {
        "field": "myField",
        "size": 5
      }
    }
  }
}
Note that you can also create a completely independent field, not necessarily a sub-field; the key point is that one field needs the integer value to run the terms aggregation on, and the other needs the string value to run the prefix query on.
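In Elasticsearch 5.x and later, the string type with "index": "not_analyzed" was replaced by keyword, so on newer versions the equivalent sub-field would look like this sketch (same field names as above):

{
  "myField": {
    "type": "integer",
    "fields": {
      "to_string": {
        "type": "keyword"
      }
    }
  }
}

The prefix query works on keyword fields in the same way, so the query shown above stays unchanged.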

Is there any way to use indices which contain no query results as a statistical comparison background in Elasticsearch?

I found the criteria that Elasticsearch's significant_terms aggregation uses: the background set used for statistical comparisons is the index or indices from which the results were gathered.
However, I want to use the daily-created Logstash indices for a significant_terms aggregation.
Here is the problem I've faced.
If I use a filtered query containing a filter like the one below, it ignores all Logstash indices except for "logstash-2014.12.10".
{
  "range": {
    "@timestamp": {
      "from": "2014-12-10T15:00:00.000+00:00",
      "to": "2014-12-10T18:00:00.000+00:00"
    }
  }
}
Is there any way to use all of the daily-created Logstash indices as the background documents of a significant_terms aggregation like the one below?
"aggregations": {
"agg_by_remote_ip": {
"significant_terms": {
"field": "remote_ip"
}
}
}
Thanks in advance.
Try running the query over multiple indices in the first place (note the logstash-* part of the GET request):
GET /logstash-*/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "from": "2014-12-10T15:00:00.000+00:00",
        "to": "2014-12-10T18:00:00.000+00:00"
      }
    }
  },
  "aggregations": {
    "agg_by_remote_ip": {
      "significant_terms": {
        "field": "remote_ip"
      }
    }
  }
}
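This works because the background set of significant_terms defaults to all documents in the indices being searched, while the foreground is whatever the query matches, so querying logstash-* with the three-hour range gives exactly the "all daily indices" background asked about. If a background other than "everything in the searched indices" is ever needed, the aggregation also accepts a background_filter; a sketch, where the wider date range is purely illustrative:

GET /logstash-*/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "from": "2014-12-10T15:00:00.000+00:00",
        "to": "2014-12-10T18:00:00.000+00:00"
      }
    }
  },
  "aggregations": {
    "agg_by_remote_ip": {
      "significant_terms": {
        "field": "remote_ip",
        "background_filter": {
          "range": {
            "@timestamp": {
              "from": "2014-12-01T00:00:00.000+00:00",
              "to": "2014-12-11T00:00:00.000+00:00"
            }
          }
        }
      }
    }
  }
}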
