Elasticsearch: float field becomes integer after aggregations

I have a field that looks like "usage": 66.667. I tried to get the sum of this field:
"aggs": {
"sum_usage": {
"sum": {
"field": "usage"
}
}
}
But after this aggregation I have
"aggregations" : {
"sum_usage" : {
"value" : 66.0
}
}
Could you please tell me why this happens? Why does a float field become an integer?

The reason is that in your index mapping the field is mapped as an integer. You can see this by running the following command:
GET your-index/_mapping/field/usage
This happens when you don't create your mapping explicitly and let ES generate the mapping dynamically, which occurs when you index your first document. In your case, the very first document must have had an integer value for the usage field (e.g. 1, 0, etc.), and hence the mapping was created with integer instead of float.
You need to explicitly create the mapping of your index with the proper types for all your fields. Then reindex your data and your query will work as you expect.
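For example, a minimal sketch of that approach (the new index name and the ES 7+ typeless mapping syntax are assumptions; older versions need a type name in the mapping):
PUT your-index-v2
{
  "mappings": {
    "properties": {
      "usage": { "type": "float" }
    }
  }
}

POST _reindex
{
  "source": { "index": "your-index" },
  "dest": { "index": "your-index-v2" }
}
After the reindex, the sum aggregation returns the real float sum instead of the truncated integer values.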

Related

Elasticsearch 7 number_format_exception for input value as a String

I have a field in my index with the following mapping:
"sequence_number" : {
"type" : "long",
"copy_to" : [
"_custom_all"
]
}
and I am using this search query:
POST /my_index/_search
{
  "query": {
    "term": {
      "sequence_number": {
        "value": "we"
      }
    }
  }
}
I am getting this error message:
,"index_uuid":"FTAW8qoYTPeTj-cbC5iTRw","index":"my_index","caused_by":{"type":"number_format_exception","reason":"For input string: \"we\""}}}]},"status":400}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:260) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:238) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1433) ~[elasticsearch-rest-high-level-client-7.1.1.jar:7.1.1]
at
How can I ignore number_format_exception errors, so that the query either returns nothing or ignores this particular filter? Either is acceptable.
Thanks in advance.
What you are looking for is not possible. Ideally, you should have coerce enabled on your numeric fields so that your index doesn't contain dirty data.
The best solution is to handle this in the application that generates the Elasticsearch query: if you are searching numeric fields, check for a NumberFormatException there and reject the query before it ever reaches Elasticsearch, since your index doesn't contain the dirty data in the first place.
Edit: Another interesting approach is to validate the query before sending it to ES, using the Validate API as suggested by @prakash. The only downside is that it adds another network call, but if your application is not latency-sensitive, it can be used as a workaround.
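For reference, a sketch of what that validation could look like with the query from the question (the explain flag is optional but surfaces the underlying number_format_exception):
GET /my_index/_validate/query?explain=true
{
  "query": {
    "term": {
      "sequence_number": {
        "value": "we"
      }
    }
  }
}
This returns "valid": false in a normal 200 response instead of a 400 error, so the application can simply skip the search.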

How can I get options for filtering by a field directly from elasticsearch?

I want to populate a filtering field based on the data I have indexed inside Elasticsearch. How can I retrieve this data? For example, my documents inside index "test" and type "doc" could be
{"id":1, "tag":"foo", "name":"foothing"}
{"id":2, "tag":"bar", "name":"barthing"}
{"id":3, "tag":"foo", "name":"something"}
{"id":4, "tag":"quux", "name":"quuxthing"}
I'm looking for something like GET /test/doc/_magic?q=tag that would return [foo,bar,quux] from my data. I don't know what this is called or whether it is even possible. I don't want to load all index entries into memory and do this programmatically; I have millions of documents in the index with around a hundred distinct tags.
Is this possible with ES?
Yes, that's possible; it is called a terms aggregation.
You can do it like this:
GET /test/doc/_search
{
  "size": 0,
  "aggs": {
    "tags": {
      "terms": {
        "field": "tag.keyword",
        "size": 100
      }
    }
  }
}
Note that depending on the cardinality of your tag field, you can increase/decrease the size setting (10 by default).
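With the four sample documents above, the relevant part of the response would look something like this (doc counts follow the sample data):
{
  ...
  "aggregations": {
    "tags": {
      "buckets": [
        { "key": "foo", "doc_count": 2 },
        { "key": "bar", "doc_count": 1 },
        { "key": "quux", "doc_count": 1 }
      ]
    }
  }
}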

Elasticsearch 6.2: terms query requires lowercase input when searching on keyword

I've created an example index, with the following mapping:
{
  "_doc": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "status": { "type": "keyword" }
    }
  }
}
And indexed a document:
{"status": "CMP"}
When searching the documents with this status with a terms query, I find no results:
{
  "query": {
    "terms": { "status": ["CMP"] }
  }
}
However, if I make the same query by putting the input in lowercase, I will find my document:
{
  "query": {
    "terms": { "status": ["cmp"] }
  }
}
Why is that? Since I'm searching on a keyword field, the indexed content should not be analyzed and should match the uppercase value...
@Oliver Charlesworth Now, in Elastic 6.x, you can keep using a keyword datatype and lowercase your text with a normalizer (see the docs). In either case you would have to change your index mapping and reindex your docs.
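A minimal sketch of such a normalizer (the index and normalizer names are made up; the lowercase filter is the relevant part):
PUT example-index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "status": {
          "type": "keyword",
          "normalizer": "lowercase_normalizer"
        }
      }
    }
  }
}
Since the normalizer is applied both at index time and to the input of term-level queries, both "CMP" and "cmp" would then match.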
The index and mapping creation and the search were part of a test suite. It seems that the setup part of the test suite was not executed, and the mapping was not applied to the index.
The index was then using the dynamically generated default mapping instead of the intended one, resulting in the use of text fields instead of keywords.
After fixing the setup method of the automated tests, the mappings are correctly applied to the index, and the uppercase value "CMP" now matches documents.
The symptoms you're seeing shouldn't occur unless something else is wrong.
A keyword field is not analyzed, so your index should contain only CMP. A terms query is not analyzed either, so you are searching only for CMP. Hence there should be a match.

How to make a field in Kibana numeric (from String)

I've inherited an ELK stack for logs, and I'm still learning the ropes. I've been tasked with making two fields numeric on a certain type in our logstash indexes, and I can't seem to figure out how to do this. Things I tried:
In the Kibana settings page, went to my logstash index and found the field. Went to edit on the controls tab, saw type listed as String (and it was immutable). Dropdown for format shows URL and String.
Went to one of my Elasticsearch hosts and found the grok rule for the document type, and found that they were indeed written to parse the field as a number. Example: %{NUMBER:response_code}
Ran out of ideas, since I don't know my way around the ELK stack.
Any help greatly appreciated, especially links to relevant documentation so I can understand what's going on. I'd be googling harder if I knew what to google.
Also note that %{NUMBER:response_code} doesn't make a number out of a string; it simply recognizes and parses a number present in a string, but the resulting response_code field is still a string, which you need to convert to a number using a mutate/convert filter. grok will always parse a string into other smaller strings, and it is your job to convert the resulting fields into the types you expect.
So you need to add this after your grok filter:
mutate {
  convert => { "response_code" => "integer" }
}
From then on, the response_code field in your events will be an integer, and the logstash template used to create your daily logstash indices contains a specific dynamic template for integer fields. Note that response_code will be mapped as an integer only once the next daily logstash index is created; the existing indices will not change.
You will need to reindex your data. Because the Elasticsearch mapping (i.e. the schema) is already set to string for this field, you will not be able to index data as an integer within the same index.
A typical ELK setup will create rolling indices (per day or month), so it's possible to switch from string to integer between indices, but this is not recommended as it will interfere with long-term aggregations and searches.
As you found out, changing the Grok rule will help with future data. Now, you need to pass all your existing data through Logstash again to apply the new rules.
To do this, you can either run the log files through Logstash again, or have Logstash read the existing documents back from Elasticsearch using an elasticsearch input:
input {
  elasticsearch {
    # pulls the indexed documents back out so they pass through the new filter rules
    hosts => "localhost"
  }
}
Newer versions of Elasticsearch (2.3+) make this easier by providing a native Reindex API.
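For example, a minimal reindex sketch into a new index whose template already maps response_code as an integer (the daily index names here are placeholders):
POST _reindex
{
  "source": { "index": "logstash-2018.01.01" },
  "dest": { "index": "logstash-2018.01.01-fixed" }
}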
Try viewing a sample of documents:
curl -XGET 'localhost:9200/_search?q=opcode:userLessonComplexityPoll&pretty'
Let's say you see these docs:
{
  "_index" : "myindex",
  "_type" : "logs",
  "_id" : "AWNoYI8pGmxxeL6jupEZ",
  "_score" : 1.0,
  "_source" : {
    "production" : "0",
    "lessonId" : "2144",
    "opcode" : "userLessonComplexityPoll",
    "courseId" : "45",
    "lessonType" : "minitest",
    ...
So, try the conversion on a single document first:
curl -XPOST 'localhost:9200/educa_stats-*/_update_by_query?pretty' -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.lessonId instanceof String) { ctx._source.lessonId = Integer.parseInt(ctx._source.lessonId); }"
  },
  "query": {
    "terms": {
      "_id": ["AWNoYI8pGmxxeL6jupEZ", "AWMcRJYFGmxxeL6jucIZ"]
    }
  }
}'
Success? Then convert all matching documents by query:
curl -XPOST 'localhost:9200/educa_stats-*/_update_by_query?pretty' -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.lessonId instanceof String) { ctx._source.lessonId = Integer.parseInt(ctx._source.lessonId); }"
  },
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "lessonId"
          }
        }
      ]
    }
  }
}'
All lessonId fields will be converted from String to int (range -2^31 to 2^31-1). That's all.

Elasticsearch: filter value like "123-325-23243" during aggregation

In an Elasticsearch query, when I try to aggregate on a field with values like 1234-3245-34234-2342, it just returns the key 1234.
Is there any way to specify the property type or a regular expression for it?
Some more explanation:
"aggregations": {
  "myagg": {
    "terms": {
      "field": "did",
      "size": 50
    }
  }
}
When I run it on data where the values look like ABC-CDEF-DEFG, it is not able to aggregate them properly. It shows the key only as ABC:
"key" : "ABC", "doc_count" : 24069
It can't take the entire key like ABC-DEF-GHI-fhho.
Check your mapping; I expect you did not define one. In that case you get the standard analyzer for strings, which breaks up the value at the "-", and that is why you get the term you mentioned. Make the field not_analyzed and you should get better results.
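A sketch of such a mapping in pre-5.x syntax (the index and type names are assumptions; on 5.x and later you would use the keyword type instead):
PUT my_index
{
  "mappings": {
    "logs": {
      "properties": {
        "did": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
After reindexing, the terms aggregation on did returns full values such as ABC-DEF-GHI-fhho as single keys.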
When I use field.raw, that fixes the issue: https://github.com/elasticsearch/kibana/issues/364
