Where do .raw fields come from when using Logstash with Elasticsearch output? - elasticsearch

When using Logstash and Elasticsearch together, analyzed fields get a .raw counterpart, so that when querying Elasticsearch with tools like Kibana, it's possible to use the field's value as-is, without the per-word tokenization applied by the analyzer.
I built a new installation of the ELK stack with the latest and greatest versions of everything, and noticed my .raw fields are no longer being created as they were on older versions of the stack. A lot of folks are posting solutions that involve creating templates on Elasticsearch, but I haven't been able to find much information as to why that fixes things. In an effort to better understand the broader problem, I ask this specific question:
Where do the .raw fields come from?
I had assumed that Logstash populated Elasticsearch with both the analyzed string and the raw string when it inserted documents, but given that the fix lies in Elasticsearch templates, I question whether my assumption is correct.

You're correct in your assumption that the .raw fields are the result of a dynamic template for string fields contained in the default index template that Logstash creates, IF manage_template is true (which it is by default).
The default template that Logstash creates (as of 2.1) can be seen here. As you can see on line 26, all string fields (except the message one) have a not_analyzed .raw sub-field created.
However, the template hasn't changed in the latest Logstash versions as can be seen in the template.json change history, so either something else must be wrong with your install or you've changed your Logstash config to use your own index template (without .raw fields) instead.
If you run curl -XGET 'localhost:9200/_template/logstash*' (the wildcard needs quoting so the shell doesn't expand it) you should see the template that Logstash has created.
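For reference, the relevant part of Logstash's default template looks roughly like this (an abbreviated sketch; the exact options and line numbers vary by Logstash version):

```json
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}
```

It is this dynamic template, not the Logstash output plugin itself, that makes Elasticsearch create the not_analyzed .raw sub-field for every new string field it encounters at index time.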

Why does Elasticsearch ignore_malformed add malformed value to index?

I am using Serilog in C# to create a log file, which is ingested by Filebeat and sent via Logstash to Elasticsearch. The Elasticsearch indexes conform to ECS 1.5.
The log file sometimes contains erroneous values for the field "host.ip"; it can contain values like "localhost:5000". This led to rejected log posts, since a string like that cannot be converted into an IP address. This is all expected, and the issue of correcting the log file is not in the scope of this question.
I decided to add the "ignore_malformed: true" setting, on the index level. After that, the log posts are no longer rejected - I can find them in Elasticsearch. So, the setting is proven to have had effect. BUT the field "host.ip" now actually contains the malformed value "localhost:5000". I can't see how that is even possible, it is not what I expected or wanted.
From the documentation of "ignore_malformed", it would appear as if values that do not match the field type are supposed to be discarded, not written into the field. I also find no added "_ignored" field.
It's as if setting ignore_malformed to true actually allows the malformed data into the index, instead of dropping it. I'm expecting/wanting the field to be empty, if the value is malformed. Is this a bug, or am I missing something?
Whatever you send in the source document will always be there; ES will never modify it. However, the fact that you're now specifying ignore_malformed means that ES will not try to index the malformed value, but it will still be visible in your source document.
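To see the difference, compare what is indexed with what is stored. A minimal sketch (the index name is illustrative):

```json
PUT my-logs
{
  "mappings": {
    "properties": {
      "host": {
        "properties": {
          "ip": { "type": "ip", "ignore_malformed": true }
        }
      }
    }
  }
}
```

If you now index a document with "host.ip": "localhost:5000", the _source will still show that string, but an exists query on host.ip will not match the document, because nothing was indexed for that field. In Elasticsearch 6.4 and later, such a document should also carry the _ignored metadata field listing host.ip.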

How do I exclude/predefine fields for Index Patterns in Kibana?

I am using ELK to monitor REST API servers. Logstash decomposes the URL into a JSON object with fields for query parameters, header params, request duration, and headers.
TLDR: I want all these fields retained so when I look at a specific message, I can see all the details. But only need a few of them to query and generate reports/visualizations in Kibana.
I've been testing for a few weeks and adding some new fields on the server side. Whenever I do, I need to rescan the index. However, the auto-detection now finds 300+ fields, and I'm guessing it indexes all of them.
I would like to control it to index just a set of fields, as I think the more fields it detects, the larger the index file gets?
It was about 300 MB/day for a week (100-200 fields); then, when I added a new field and needed to refresh, it went to 350 fields and 1 GB/day. After I accidentally deleted the ELK instance yesterday, I redid everything, and now the indexes are around 100 MB/day so far, which is why I got curious.
I found these docs, but I'm not sure which ones are relevant or how they relate/need to be put together.
Mapping, index patterns, indices, templates/filebeats/rollup policy
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
https://discuss.elastic.co/t/index-lifecycle-management-for-existing-indices/181749/3
https://www.elastic.co/guide/en/elasticsearch/reference/7.3/indices-templates.html
(One of them has a PUT call that sends a huge JSON body, but I'm not sure how you would enter something like that in PuTTY. Postman/JMeter maybe, but these need to be executed on the server itself, which is just an SSH session, no GUI/text window.)
To remove fields from your log events (since you are using Logstash), you can use the remove_field option of the Logstash mutate filter.
Ref: Mutate filter plugin
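A minimal sketch of such a filter (the field names here are hypothetical; list whichever of your 300+ fields you don't need):

```
filter {
  mutate {
    # drop fields you never query or visualize in Kibana
    remove_field => [ "headers", "query_params_debug", "request_raw_url" ]
  }
}
```

Fields removed this way never reach Elasticsearch, so they are neither indexed nor stored; keep in mind this also means they won't be visible when you inspect an individual message in Kibana.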

elasticsearch / kibana, search for documents where message contains '=' char

I have an issue which I suspect is quite basic, but I have been stuck on this for too long and I fear I am missing something so basic that I can't see it by now.
We are using the ELK stack today for log analysis of our application logs.
Logs are created by the Java application in JSON format, shipped using Filebeat into Logstash, which in turn processes the input and queues it into ES.
Some of the messages contain unstructured data in the message field which I currently cannot parse into separate fields, so I need to catch them in the message field. The problem is this:
the string I need to catch is "57=1". This is an indication of something which I need to filter documents upon. I need to get documents which contain this exact string.
No matter what I try, I can't get Kibana to match this. It seems to always ignore the equals char and match either 57 or 1.
Please advise.
Thanks
You may check the Elasticsearch mapping for the type of the field in question. If it is analyzed, the '=' was likely never indexed, because the default analyzer splits on it. (source 1, source 2)
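Concretely: the standard analyzer tokenizes "57=1" into the two terms "57" and "1", so there is no single indexed term containing the '=' to match against. If the mapping also created a not_analyzed (or, in newer versions, keyword) sub-field, you can query that instead. A sketch, assuming a default message.keyword sub-field exists:

```json
GET my-logs/_search
{
  "query": {
    "wildcard": {
      "message.keyword": { "value": "*57=1*" }
    }
  }
}
```

Note that a leading-wildcard query can be slow on large indexes; if this filter is common, it's usually better to parse the 57=1 flag into its own field at ingest time.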

Can we migrate non stored Index data in SOLR to Elastic search?

We are currently using Solr for full-text search. Now we are planning to move from Solr to Elasticsearch. While we were in this process, I read somewhere that there are some plugins available which will migrate data from Solr to Elasticsearch, but they won't be able to migrate records which are not stored in Solr. So, is there a plugin available which will migrate non-stored index data from Solr to Elasticsearch? If so, please let me know.
Currently I am using the SOLR-to-ES plugin, but it won't migrate the non-stored index data.
Thanks
If the field is not stored, then you don't have the original value. If you have it indexed, what is in there is the value after it has gone through the analysis chain, and so it is probably different from the original one (it has no stopwords, is probably lowercased, maybe stemmed... stuff like that).
There are a couple of possibilities that might allow you to have the original content when not stored:
indexed field: if it has been analyzed with just the keyword tokenizer, then the indexed value is the original value.
field has docValues=true: then the original value is also stored. This feature was introduced later, so your index might not be using it.
The issue is, the common plugins might not take advantage of those cases where stored=true is not strictly necessary. You need to check them.
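You can check which of your fields fall into those cases by looking at the Solr schema. A sketch (field names and types are illustrative):

```xml
<!-- schema.xml: not stored, but the original value survives via docValues -->
<field name="category" type="string" stored="false" docValues="true"/>

<!-- analyzed with a non-trivial chain: only post-analysis terms remain -->
<field name="body" type="text_general" stored="false" indexed="true"/>
```

For a field like body above, the original text is simply gone from the index, and no migration tool can recover it; the only real option is to re-index from the original data source.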

Getting elasticsearch to utilize Bro timestamps through Logstash

I'm having some issues getting Elasticsearch to interpret an epoch-millis timestamp field. I have some old Bro logs I want to ingest and have them be in the proper order and spacing. Thanks to Logstash filter to convert "$epoch.$microsec" to "$epoch_millis"
I've been able to convert the field holding the Bro timestamp to the proper number of digits. I've also inserted a mapping into Elasticsearch for that field, and it says that the type is "date" with the default format. However, when I go and look at the entries, it still has a little "t" next to it instead of a little clock, and hence I can't use it as my filter view reference in Kibana.
Anyone have any thoughts or have dealt with this before? Unfortunately it's a stand alone system so I would have to manually enter any of the configs I'm using.
I did try to convert my field "ts" back to an integer after using the method described in the link above, so it should be a Logstash integer before hitting the Elasticsearch mapping.
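For reference, the usual way to get an epoch-millis field treated as a date is to let Logstash parse it into @timestamp (or a target of your choice), rather than relying on the index mapping alone. A sketch, assuming the field is named "ts" and holds epoch milliseconds:

```
filter {
  date {
    match  => [ "ts", "UNIX_MS" ]   # parse epoch milliseconds
    target => "@timestamp"          # or keep a separate date field
  }
}
```

Alternatively, mapping the field as { "type": "date", "format": "epoch_millis" } on the Elasticsearch side should work too, but Kibana only picks up mapping changes after the index pattern is refreshed or recreated.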
So I ended up just deleting all my mappings in Kibana and Elasticsearch. I then resubmitted, and this time it worked. Must have been some old junk in there that was messing me up. But now it's working great!