I want to use the logstash ruby plugin to rename a dynamic field name.
Specifically, I want to strip out the dots so I can feed it to Elasticsearch, and remove some extra static text.
A field name like this
foo.bar.Host11.x.y.uptime => 37
would become
host11_uptime => 37
or, even better, split into separate fields
host => 11
uptime => 37
Here's some general code to loop across fields in Ruby. You could then split the field name to create the one (or more) that you want.
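Something along these lines should work as a starting point (untested; it assumes the newer event.get / event.set API and the exact foo.bar.Host11.x.y.uptime layout from your single example, so adjust the split logic to your real field names):
filter {
  ruby {
    code => '
      # Work on a copy of the field names so we can modify the event safely.
      event.to_hash.keys.each do |name|
        next unless name.include?(".")
        value = event.get(name)
        parts = name.split(".")
        # e.g. "foo.bar.Host11.x.y.uptime" => host => "11", uptime => 37
        if parts.last == "uptime" && parts[2] =~ /^Host(\d+)$/i
          event.set("host", $1)
          event.set("uptime", value)
        else
          # Fallback: just replace the dots so Elasticsearch accepts the name.
          event.set(name.gsub(".", "_"), value)
        end
        event.remove(name)
      end
    '
  }
}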
Related
I'm indexing logs in Elasticsearch through Logstash; the logs contain a field with an array of codes, for example:
indicator.codes : [ "3", "120", "148" ]
Is there some way in Logstash to look up these codes in a CSV and save the categories and descriptions in 2 new fields, such as indicator.categories and indicator.descriptions?
A subset of the CSV with 3 columns:
Column 1 => indicator.code
Column 2 => indicator.category
Column 3 => indicator.description
3;Hiding;There are signs in the header
4;Hiding;This binary might try to schedule a task
34;General;This is a 7zip selfextracting file
120;General;This is a selfextracting RAR file
121;General;This binary tries to run as a service
148;Stealthiness;This binary uses tunnel traffic
I've been looking at the csv filter and the translate filter, but they do not seem to be able to look up multiple keys.
The translate filter seems to work with only 2 columns, and the csv filter seems unable to loop through the indicator.codes array.
I would suggest using a Ruby filter to loop over the indicator.codes and compare them against the data you load from the CSV.
https://www.elastic.co/guide/en/logstash/8.1/plugins-filters-ruby.html
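I can't test this against your data, but a minimal sketch of that idea could look like the following. It assumes a hypothetical lookup file at /etc/logstash/indicator_codes.csv with the three ;-separated columns you showed, and that the codes live in a nested [indicator][codes] field; adjust the path and field references to your setup.
filter {
  ruby {
    # Load the lookup CSV once at startup into an in-memory table.
    init => '
      require "csv"
      @lookup = {}
      # Hypothetical path - point this at wherever your lookup CSV lives.
      CSV.foreach("/etc/logstash/indicator_codes.csv", col_sep: ";") do |code, category, description|
        @lookup[code] = { "category" => category, "description" => description }
      end
    '
    # For every event, collect the category and description of each known code.
    code => '
      codes = event.get("[indicator][codes]") || []
      categories = []
      descriptions = []
      codes.each do |c|
        entry = @lookup[c.to_s]
        next unless entry
        categories << entry["category"]
        descriptions << entry["description"]
      end
      event.set("[indicator][categories]", categories)
      event.set("[indicator][descriptions]", descriptions)
    '
  }
}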
sample message - 111,222,333,444,555,val1in6th,val2in6th,777
The sixth column contains a value that itself contains a comma (val1in6th,val2in6th is a sample value of the 6th column).
When I use a simple csv filter, this message gets converted to 8 fields. I want to be able to tell the filter that val1in6th,val2in6th should be treated as a single value and placed as the value of the 6th column (it's okay if the comma between val1in6th and val2in6th is dropped when it is output as the 6th column).
Change your plugin: instead of the csv filter, use the grok filter - doc here.
Then use a debugger to build a pattern for your lines - like this one: https://grokdebug.herokuapp.com/
For your lines you could use this grok expression:
%{WORD:FIELD1},%{WORD:FIELD2},%{WORD:FIELD3},%{WORD:FIELD4},%{WORD:FIELD5},%{GREEDYDATA:FIELD6}
or :
%{INT:FIELD1},%{INT:FIELD2},%{INT:FIELD3},%{INT:FIELD4},%{INT:FIELD5},%{GREEDYDATA:FIELD6}
The second version changes the data types of the first 5 fields in Elasticsearch.
To learn more about parsing CSV with the grok filter in Elastic, you can use this official Elastic blog guide; it explains how to use grok with an ingest pipeline, but the same applies to Logstash.
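Wired into a Logstash config, the second pattern would look roughly like the sketch below (assuming the raw line arrives in the default message field):
filter {
  grok {
    # Capture the first five comma-separated integers, then keep the rest
    # of the line (including its internal commas) as FIELD6.
    match => {
      "message" => "%{INT:FIELD1},%{INT:FIELD2},%{INT:FIELD3},%{INT:FIELD4},%{INT:FIELD5},%{GREEDYDATA:FIELD6}"
    }
  }
}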
Small part of my CSV log:
TAGS
contentms:Drupal;contentms.ver:7.1.8;vuln:rce;cve:CVE-2018-0111;
cve:CVE-2014-0160;vuln:Heartbleed;
contentms.ver:4.1.6;contentms:WordPress;tag:backdoor
tag:energia;
The idea is that I know nothing about the keys and values other than the format:
key:value;key:value;key:value;key:value; etc
I just create a pattern with the Logstash "kv" plugin:
kv {
  source => "TAGS"
  field_split => ";"
  value_split => ":"
  target => "TAGS"
}
I've been trying to get my data into Elasticsearch for Kibana, and some of it goes through. But, for example, the keys contentms: and contentms.ver: don't get read. Also, for keys that do, only one value is searchable in Kibana. For example, the key cve: appears on multiple lines multiple times in my log with different values, but only the value cve:CVE-2014-0160 is indexed; same problem for the tag: and vuln: keys.
I've seen some similar problems and solutions with Ruby, but is there any solution with just kv, or by changing my log format around a bit?
I can't test it right now, but notice that you have both "contentms" (a string) and "contentms.ver", which probably looks to Elasticsearch like a nested field ([contentms][ver]). Since "contentms" was already defined as a string, you can't nest beneath it.
After the csv filter, try renaming "contentms" to "[contentms][name]", which would then be a peer to "[contentms][ver]".
You'd need to start with a new index to create this new mapping.
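A sketch of that rename with a mutate filter (if your kv target nests the fields under [TAGS], use [TAGS][contentms] and [TAGS][contentms][name] instead):
filter {
  mutate {
    # Move the plain string out of the way so [contentms][ver] can be an object field.
    rename => { "contentms" => "[contentms][name]" }
  }
}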
I intend to have an ELK stack setup where daily JSON inputs get stored in log files, one created for each date. My Logstash shall listen to the input via these logs and store it in Elasticsearch at an index corresponding to the date of the log file entry.
My logstash-output.conf goes something like:
output {
  elasticsearch {
    host => localhost
    cluster => "elasticsearch_prod"
    index => "test"
  }
}
Thus, as of now, all the inputs to Logstash get stored at the index test in Elasticsearch. What I want is that an entry to Logstash occurring on, say, 2015.11.19, which gets stored in the log file named logstash-2015.11.19.log, must correspondingly be stored at an index test-2015.11.19.
How should I edit my Logstash configuration file to enable this?
Posting this as an answer because a comment can't be formatted and it would look awful.
Your filename (I assume you use a file input) is stored in your path field, as such:
file {
  path => "/logs/**/*my_log_file*.log"
  type => "myType"
}
This variable is accessible throughout your whole configuration, so what you can do is use a pattern-matching filter to parse your date out of the path. For example, using grok, you could do something like this (note: pseudocode):
if [type] == "myType" {
  grok {
    match => {
      "path" => "%{MY_DATE_PATTERN:myTimeStampVar}"
    }
  }
}
With this you now have your variable in "myTimeStampVar" and you can use it in your output:
elasticsearch {
  host => "127.0.0.1"
  cluster => "logstash"
  index => "events-%{myTimeStampVar}"
}
Having said all this, I am not quite sure why you need this. I think it is better to have ES do the job for you: it will know the timestamp of your log and index it accordingly, so you have easy access to it. However, the setup above should work for you; I used a very similar approach to parse out a client name and create sub-indexes on a per-client basis, for example: myIndex-%{client}-%{+YYYY.MM.dd}
Hope this helps,
Artur
Edit: I did some digging because I suspect that you are worried your logs get put in the wrong index because they are parsed at the wrong time? If this is correct, the solution is not to parse the index out of the log file name, but to parse the timestamp out of each log line.
I assume each log line for you has a timestamp. Logstash will create an @timestamp field which is the current date, so this would not be equal to the index. However, the correct way to solve this is to overwrite the @timestamp field with the timestamp from your log line (the parsed one). That way Logstash will have the correct index and put the event there.
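A minimal sketch of that approach: it assumes a hypothetical log_time field has already been grokked out of the line in a format like 2015-11-19 14:05:31; the date filter then overwrites @timestamp, and the output builds the daily index name from it.
filter {
  date {
    # "log_time" is a placeholder for whatever field holds the parsed timestamp;
    # on a successful match the date filter sets @timestamp to that value.
    match => ["log_time", "yyyy-MM-dd HH:mm:ss"]
  }
}
output {
  elasticsearch {
    host => localhost
    cluster => "elasticsearch_prod"
    # The daily suffix is rendered from @timestamp, i.e. the log line's own time.
    index => "test-%{+YYYY.MM.dd}"
  }
}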
I am using Mpdreamz/NEST for integrating Elasticsearch in C#. Is there any way to limit the number of words in a result string of a query?
For example, I have a field named 'Content' in ES and I need to display 30 words of 'Content' matching 'sensex' from my index.
Thanks in advance for any help
You can't do this easily, even within Elasticsearch itself.
You have three options:
Force excerpts by using highlighting
Try to use script_fields to return the first 30 words
At index time add another field that has just the first 30 words
Even though the first two are possible to do with NEST, I would go for the third option since it won't incur a performance penalty at query time.