How to make a field in Kibana numeric (from String) - elasticsearch

I've inherited an ELK stack for logs and I'm still learning the ropes - I've been tasked with making two fields numeric on a certain type in our logstash indices. Can't seem to figure out how to do this. Things I tried:
In the Kibana settings page, went to my logstash index and found the field. Went to edit on the controls tab, saw type listed as String (and it was immutable). Dropdown for format shows URL and String.
Went to one of my Elasticsearch hosts and found the grok rule for the document type, and found that they were indeed written to parse the field as a number. Example: %{NUMBER:response_code}
Ran out of ideas, since I don't know my way around the ELK stack.
Any help greatly appreciated, especially links to relevant documentation so I can understand what's going on. I'd be googling harder if I knew what to google.

Also note that %{NUMBER:response_code} doesn't make a number out of a string; it simply recognizes and parses a number present in a string, but the resulting response_code field is still a string, which you need to convert to a number using a mutate/convert filter. grok always parses a string into smaller strings, and it is your job to convert the resulting fields into the types you expect.
So you need to add this after your grok filter:
mutate {
  convert => { "response_code" => "integer" }
}
From then on, the response_code field in your event will be an integer, and the Logstash template used to create your daily logstash indices contains a specific dynamic template for integer fields. Note that response_code will be mapped as an integer only once the next logstash index is created; the existing indices will not change.
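For context, here is a minimal sketch of how the two filters fit together. The log format and grok pattern are purely illustrative; only the mutate/convert part is what this answer prescribes:

filter {
  # Hypothetical log line: "<client ip> <verb> <path> <status code>"
  grok {
    match => { "message" => "%{IPORHOST:clientip} %{WORD:verb} %{URIPATHPARAM:request} %{NUMBER:response_code}" }
  }
  # grok captures are always strings; convert the field explicitly
  mutate {
    convert => { "response_code" => "integer" }
  }
}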

You will need to reindex your data. Because the Elasticsearch mapping (i.e. schema) is already set to string for this field, you will not be able to index data as an integer within the same index.
A typical ELK setup creates rolling indices (per day or month), so it's possible to switch from string to integer between indices, but this is not recommended as it will interfere with long-term aggregations and searches.
As you found out, changing the grok rule will help with future data. Now, you need to pass all your existing data through Logstash again to apply the new rules.
To do this, you can either pass the log files again, or have Logstash read from Elasticsearch using an elasticsearch input:
input {
  elasticsearch {
    hosts => "localhost"
  }
}
Newer versions of Elasticsearch improve on this by providing a native reindex API.
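For reference, on Elasticsearch 2.3 and later that reindex API looks roughly like this (index names are placeholders; create the destination index with the corrected mapping first, otherwise it will be auto-created with dynamic mappings):

POST _reindex
{
  "source": { "index": "logstash-2016.01.01" },
  "dest": { "index": "logstash-2016.01.01-fixed" }
}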

First, view a sample of the documents:
curl -XGET 'localhost:9200/_search?q=opcode:userLessonComplexityPoll&pretty'
Let's say you see these docs:
{
  "_index" : "myindex",
  "_type" : "logs",
  "_id" : "AWNoYI8pGmxxeL6jupEZ",
  "_score" : 1.0,
  "_source" : {
    "production" : "0",
    "lessonId" : "2144",
    "opcode" : "userLessonComplexityPoll",
    "courseId" : "45",
    "lessonType" : "minitest",
    ...
So, try the conversion on a single document first:
curl -XPOST 'localhost:9200/educa_stats-*/_update_by_query?pretty' -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "painless",
    "source": "if(ctx._source.lessonId instanceof String) { int lessonId = Integer.parseInt(ctx._source.lessonId); ctx._source.lessonId = (int)lessonId; }"
  },
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "_id": ["AWNoYI8pGmxxeL6jupEZ", "AWMcRJYFGmxxeL6jucIZ"]
          }
        }
      ]
    }
  }
}'
Success? Then convert all documents by query:
curl -XPOST 'localhost:9200/educa_stats-*/_update_by_query?pretty' -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "painless",
    "source": "if(ctx._source.lessonId instanceof String) { int lessonId = Integer.parseInt(ctx._source.lessonId); ctx._source.lessonId = (int)lessonId; }"
  },
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "lessonId"
          }
        }
      ]
    }
  }
}'
All lessonId fields will be converted from String to int (range -2^31 to 2^31-1). That's all.
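To double-check, re-running the sample search from the start of this answer should now show lessonId in _source as a bare number (2144) rather than a quoted string ("2144"):

curl -XGET 'localhost:9200/_search?q=opcode:userLessonComplexityPoll&pretty'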

Related

Elasticsearch: The float field becomes integer after aggregations

I have a field that looks like "usage": 66.667. I tried to get the sum of this field:
"aggs": {
"sum_usage": {
"sum": {
"field": "usage"
}
}
}
But after this aggregation I have
"aggregations" : {
"sum_usage" : {
"value" : 66.0
}
}
Could you please tell me how this happens? Why does the float field become an integer?
The reason is that in your index mapping the field is mapped as an integer. You can see this by running the following command:
GET your-index/_mapping/field/usage
The reason is that you didn't create your mapping explicitly and you let ES dynamically generate the mapping, which happens when you index your first document. When you did, the very first document must have had an integer value for the usage field (e.g. 1, 0, etc.), and hence the mapping was created with integer instead of float.
You need to explicitly create the mapping of your index with the proper types for all your fields. Then reindex your data and your query will work as you expect.
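A minimal sketch of that fix, assuming Elasticsearch 7.x and placeholder index names (on 6.x and earlier the properties block has to be nested under the document type):

PUT your-index-v2
{
  "mappings": {
    "properties": {
      "usage": { "type": "float" }
    }
  }
}

POST _reindex
{
  "source": { "index": "your-index" },
  "dest": { "index": "your-index-v2" }
}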

Elasticsearch 6.2: terms query require lowercase input when searching on keyword

I've created an example index, with the following mapping:
{
  "_doc": {
    "_source": {
      "enabled": False
    },
    "properties": {
      "status": { "type": "keyword" }
    }
  }
}
And indexed a document:
{"status": "CMP"}
When searching for documents with this status using a terms query, I find no results:
{
  "query": {
    "terms": { "status": ["CMP"] }
  }
}
However, if I make the same query by putting the input in lowercase, I will find my document:
{
  "query": {
    "terms": { "status": ["cmp"] }
  }
}
Why is that? Since I'm searching on a keyword field, the indexed content should not be analyzed and should match an uppercase value...
@Oliver Charlesworth Not any more. In Elastic 6.x you can keep the keyword datatype and lowercase the text with a normalizer (see the normalizer docs). In any case you will have to change your index mapping and reindex your docs.
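A rough sketch of such a mapping for 6.x (index and normalizer names are illustrative); with this in place the status value is lowercased both at index time and in term-level queries, so "CMP" and "cmp" match the same documents:

PUT example-index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "status": { "type": "keyword", "normalizer": "lowercase_normalizer" }
      }
    }
  }
}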
The index and mapping creation and the search were part of a test suite. It seems that the setup part of the test suite was not executed, and the mapping was not applied to the index.
The index was then using the default (dynamically generated) types instead of the mapping types, resulting in string fields being used instead of keywords.
After changing the setup method of the automated tests, the mappings are correctly applied to the index, and the uppercase value "CMP" for the status field now matches documents.
The symptoms you're seeing shouldn't occur, unless something else is wrong.
A keyword field is not analysed, so your index should contain only CMP. A terms query is not analysed either, so the index is searched only for CMP. Hence there should be a match.

Find documents in Elasticsearch where `ignore_malformed` was triggered

Elasticsearch by default throws an exception if inserting data to a field which does not fit the existing type. For example, if a field has been created as number type, inserting a document with a string value for that field causes an error.
This behavior can be changed by enabling the ignore_malformed setting, which means such fields are silently ignored for indexing purposes, but retained in the _source document - meaning that the invalid values cannot be searched or aggregated, but are still included in the returned document.
This is the preferred behavior in our use case, but we would like to be able to locate such documents somehow so we can fix them in the future.
Is there any way to somehow flag documents for which some malformed fields were ignored? We control the document insertion process fully, so we can modify all insertion flags, or do a trial insert, or anything, to reach our goal.
You can use the exists query to find documents where this field does not exist; see this example:
PUT foo
{
  "mappings": {
    "bar": {
      "properties": {
        "baz": {
          "type": "integer",
          "ignore_malformed": true
        }
      }
    }
  }
}

PUT foo/bar/1
{
  "baz": "field"
}

GET foo/bar/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must_not": [
            {
              "exists": {
                "field": "baz"
              }
            }
          ]
        }
      }
    }
  }
}
There is no dedicated mechanism though, so this search also finds documents where the field was intentionally left unset.
You cannot. When you search in Elasticsearch, you don't search the document source but the inverted index, which contains the analyzed data.
The ignore_malformed flag means "always store the document, and analyze it if possible".
You can try it: create a malformed document and use the _termvectors API to see how the document was analyzed and stored in the inverted index. In the case of a string field, you can see that an array is stored as an empty string, etc., but the field will exist.
So forget the inverted index, let's use the source!
Scroll through all your data until you find the anomaly. I use a small Python script that does a search scroll, deserializes each document and checks the field type (very slow), but it gives me a list of the offending document IDs.
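Not the author's script, but a rough sketch of that scroll approach in Python (assumes the requests library, Elasticsearch 5.x or later, a local cluster, and placeholder index/field names):

import requests

ES = "http://localhost:9200"
INDEX = "my_index"
FIELD = "country_name"

bad_ids = []
# Open a scroll over all documents, fetching only the field of interest
resp = requests.post(
    ES + "/" + INDEX + "/_search?scroll=2m",
    json={"size": 1000, "_source": [FIELD], "query": {"match_all": {}}},
).json()

while resp["hits"]["hits"]:
    for hit in resp["hits"]["hits"]:
        value = hit["_source"].get(FIELD)
        # The mapping expects a string, so flag any other type found in _source
        if value is not None and not isinstance(value, str):
            bad_ids.append(hit["_id"])
    # Fetch the next batch
    resp = requests.post(
        ES + "/_search/scroll",
        json={"scroll": "2m", "scroll_id": resp["_scroll_id"]},
    ).json()

print(bad_ids)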
Using a script query can take very long and can crash your cluster, so use it with caution, maybe as a post_filter.
Here I want to retrieve the documents where country_name is not a string:
{
  "_source": false,
  "timeout" : "30s",
  "query" : {
    "query_string" : {
      "query" : "locale:de_ch"
    }
  },
  "post_filter": {
    "script": {
      "script": "!(_source.country_name instanceof String)"
    }
  }
}
"_source:false" => I want only document ID
"timeout" => prevent crash
As you can see, this is a missing feature. I know Logstash will tag documents that fail, so Elasticsearch could implement the same thing.

elasticsearch: define field's order in returned doc

I'm sending queries to Elasticsearch and it responds with the fields in an arbitrary order inside its documents.
How can I fix the order in which Elasticsearch returns fields inside documents?
I mean, I'm sending this query:
{
  "index": "my_index",
  "_source": {
    "includes" : ["field1","field2","field3","field14"]
  },
  "size": X,
  "body": {
    "query": {
      // stuff
    }
  }
}
and when it responds, it gives me the fields in the wrong order.
I ultimately want to convert this to CSV, and want fixed CSV headers.
Is there something I can do so I get something like
doc1 :{"field1","field2","field3","field14"}
doc2 :{"field1","field2","field3","field14"}
...
in the same order as my "_source"?
Thanks for your help.
A document in Elasticsearch is a JSON hash/map and by definition maps are unordered.
One solution around this would be to use Logstash in order to extract docs from ES using an elasticsearch input and output them in CSV using a csv output. That way you can guarantee that the fields in the CSV file will have the exact same order as specified. Another benefit is that you don't have to write your own boilerplate code to extract from ES and sink to CSV, Logstash does it all for you for free.
The Logstash configuration would look something like this:
input {
  elasticsearch {
    hosts => "localhost"
    query => '{ "query": { "match_all": {} } }'
    size => 100
    index => "my_index"
  }
}
filter {}
output {
  csv {
    fields => ["field1","field2","field3","field14"]
    path => "/path/to/file.csv"
  }
}
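Assuming the config is saved as es-to-csv.conf (the file name is arbitrary), you would run it with:

bin/logstash -f es-to-csv.conf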

Elasticsearch doesn't return results

I am facing a strange issue with an Elasticsearch query. I don't know much about Elasticsearch. My query is:
{
  "query": {
    "bool": {
      "must": [
        {
          "text": {
            "countryCode2": "DE"
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 1,
  "sort": [],
  "facets": {}
}
The issue: for "DE" it gives me results, but for "BE" or "IN" it returns an empty result.
You are indexing using the default mapping, which by default removes English stopwords. The country codes "IN", "BE", and many more are stopwords which don't even get indexed, so it's not possible to have matching documents, nor to get back those country codes when faceting on that field.
The solution is to reindex after having submitted your own mapping for the country code field:
{
  "your_type_name" : {
    "properties" : {
      "country" : {
        "type" : "string", "index" : "not_analyzed"
      }
    }
  }
}
If you already tried to do this but nothing changed, the mapping didn't get submitted properly. I would suggest double-checking that its JSON structure is correct and that you can actually get it back using the get mapping API.
As this is a common problem the defaults are probably going to change in the future to be less intrusive and avoid applying any language dependent text analysis.
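For reference, the current mapping can be retrieved with (index name is a placeholder):

curl -XGET 'localhost:9200/your_index/_mapping?pretty'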
