logstash tab separator not escaping - elasticsearch

I have tab separated data which I want to input into logstash. Here is my configuration file:
input {
  file {
    path => "/*.csv"
    type => "testSet"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => "\t"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
It simply looks for all .csv files and separates them using tabs. For an input like this:
col1 col2
data1 data2
The Logstash output is (for the two rows):
column1 => "col1\tcol2"
column1 => "data1\tdata2"
Obviously it is not correctly parsing it. I saw that this issue was brought up a while ago here but there was no solution. Does anyone know if this problem has been resolved or maybe there's another way to do it? Thanks!

Instead of using "\t" as the separator, type an actual tab character between the quotes, like this:
filter {
  csv {
    # the character between the quotes must be a real tab, not spaces
    separator => "	"
  }
}

https://www.elastic.co/guide/en/logstash/6.2/plugins-filters-csv.html#plugins-filters-csv-separator
Define the column separator value. If this is not specified, the default is a comma ,. If you want to define a tabulation as a separator, you need to set the value to the actual tab character and not \t.
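A small Ruby analogy of why "\t" fails (Logstash itself runs on JRuby): by default the config string is not escape-processed, so the separator reaches the filter as the two characters backslash and t, which never occur in the data. The snippet only contrasts an escape sequence with a literal backslash; it does not call Logstash, and split_columns is a hypothetical helper written for this illustration.

```ruby
# Hypothetical helper: split one row on the given separator string.
def split_columns(line, separator)
  line.split(separator)
end

tab = "\t"          # double quotes: escape interpreted, one tab character
backslash_t = '\t'  # single quotes: no escape processing, "\" followed by "t"

row = "data1\tdata2" # the columns really are separated by a tab

p tab.length                      # => 1
p backslash_t.length              # => 2
p split_columns(row, backslash_t) # => ["data1\tdata2"] (nothing to split on)
p split_columns(row, tab)         # => ["data1", "data2"]
```

The second case is what the csv filter sees when it receives a real tab. Newer Logstash releases also have a config.support_escapes setting in logstash.yml (off by default) that makes "\t" in config strings mean a tab.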

Related

Some KV filter values have a custom date format and are identified as strings in Kibana

I'm using the kv filter in Logstash to process a config file in the following format:
key1=val1
key2=val2
key3=2020-12-22-2150
with the following lines in my Logstash configuration:
kv {
  field_split => "\r\n"
  value_split => "="
  source => "message"
}
Some of the fields in the conf file have the date format YYYY-MM-DD-HHMMSS. When Logstash sends the fields to ES, Kibana displays them as strings. How can I let Logstash know that those fields are dates, so that they are indexed in ES as dates rather than strings?
I don't want to edit the mapping of the index because that would require reindexing. My final goal with those fields is to calculate the difference between them (in seconds, minutes, hours, ...) and display it in Kibana.
My idea:
Iterate over the kv filter results and check whether each value has the format YYYY-MM-DD-HHMMSS (using a regex).
If it does, change the value of the field to milliseconds since the epoch.
I decided to use the kv filter plus Ruby code as a solution, but I'm facing an issue.
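It could be done more easily outside of Logstash by adding a dynamic template to your index and letting Elasticsearch manage the field types.
You can use the field name as a detector if it is clear enough (e.g. *_date), or define a regex:
"match_pattern": "regex",
"match": "^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\\d\\d$"
The code above has not been tested. Note that in a dynamic template, match is tested against the field name, not the field value, and this example regex matches MM-DD-YYYY-style strings, so it would need adapting. Backslashes inside a JSON string must be escaped (\\d rather than \d). To detect dates by their value instead, the relevant mapping setting is dynamic_date_formats.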
You can find the official documentation here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html
My solution:
I used the kv filter to convert each line into a key-value pair.
I saved the kv filter result into a dedicated field.
On this dedicated field, I run a Ruby script that changes all dates in the custom format to milliseconds since the epoch.
Code:
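filter {
  if "kv_file" in [tags] {
    # Parse the "key=value" lines into a hash stored under [config_file]
    kv {
      field_split => "\r\n"
      value_split => "="
      source => "message"
      target => "config_file"
    }
    ruby {
      id => "kv_ruby"
      code => "
        require 'date'
        # Whole value must be YYYY-MM-DD-HHMMSS; hours limited to 00-23
        re = /\A[12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-([01]\d|2[0-3])[0-5]\d[0-5]\d\z/
        hash = event.get('config_file').to_hash
        hash.each { |key, value|
          # kv can produce arrays for repeated keys, so only test strings
          if value.is_a?(String) && value =~ re
            # %Q = milliseconds since the Unix epoch (value parsed as UTC)
            date_epochs_milliseconds = DateTime.strptime(value, '%F-%H%M%S').strftime('%Q')
            event.set(key, date_epochs_milliseconds.to_i)
          end
        }
      "
    }
  }
}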
By the way, if you get the error "(ruby filter code):6: syntax error, unexpected null hash" when the Ruby code is compiled, it doesn't actually mean that you got a null value; it seems to be related to escaping double quotes inside the code string. Just try replacing double quotes with single quotes.
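The conversion step can be checked outside Logstash with plain Ruby before putting it in the filter. This is a sketch, not the author's filter: to_epoch_ms and convert_dates are hypothetical helper names, values are treated as UTC, and the sample hash is made up. It also computes the difference between two converted fields, which is the end goal described in the question.

```ruby
require 'date'

# Same shape as in the answer: YYYY-MM-DD-HHMMSS, hours limited to 00-23.
DATE_RE = /\A[12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-([01]\d|2[0-3])[0-5]\d[0-5]\d\z/

# Returns milliseconds since the Unix epoch (treating the value as UTC),
# or nil if the value does not look like a date in that format.
def to_epoch_ms(value)
  return nil unless value.is_a?(String) && value.match?(DATE_RE)
  DateTime.strptime(value, '%F-%H%M%S').strftime('%Q').to_i
end

# Converts every date-like value in a kv-style hash; other values are kept.
def convert_dates(hash)
  hash.transform_values { |v| to_epoch_ms(v) || v }
end

# Hypothetical sample data
config = { 'key1' => 'val1', 'start' => '2020-12-22-215000', 'end' => '2020-12-22-220130' }
converted = convert_dates(config)
p converted

diff_seconds = (converted['end'] - converted['start']) / 1000
p diff_seconds # => 690 (11 minutes 30 seconds)
```

Note that a value such as 2020-12-22-2150 (hours and minutes only, as in the sample config) does not match this pattern and is left untouched.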

Logstash doesn't import last line in txt file

I am trying to load a txt file into Elasticsearch with Logstash. The txt file is plain text consisting of 3 lines, and it looks like this:
[Screenshot: text file]
My conf file looks like this:
input {
  file {
    path => "C:/Users/dinar/Desktop/myfolder/mytest.txt"
    start_position => "beginning"
    # On Windows the null device is "NUL"; "NULL" creates a regular file named NULL
    sincedb_path => "NUL"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "mydemo"
    document_type => "intro"
  }
  stdout {}
}
After I run this and I go to Kibana, I can see the index being created. However, the only messages I see are the first two lines, and the last line is not displayed. This is what I see:
[Screenshot: Kibana page]
Does anybody know why the last line is not being imported and how I can fix this?
Thank you all for your help.
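Based on the screenshot, I assume you are editing the file manually. In that case, please verify that there is a newline after the last log entry: just press Enter and save. The file input only emits a line once it has read the newline that ends it, so a final line without one is never sent.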
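A simplified Ruby model of why the last line goes missing: the file input splits what it reads on a delimiter (a newline by default) and holds back any trailing text that has no delimiter yet, expecting the line to be completed later. complete_lines is a hypothetical helper for illustration, not the plugin's code or API.

```ruby
# Simplified model of delimiter-based reading: only text followed by the
# delimiter is emitted; the remainder waits in a buffer for more data.
def complete_lines(data, delimiter = "\n")
  parts = data.split(delimiter, -1) # -1 keeps a trailing empty string
  remainder = parts.pop             # text after the last delimiter (may be "")
  [parts, remainder]
end

# No newline after the last line: "line three" stays buffered.
lines, pending = complete_lines("line one\nline two\nline three")
p lines   # => ["line one", "line two"]
p pending # => "line three"

# With a trailing newline, all three lines are emitted.
lines, pending = complete_lines("line one\nline two\nline three\n")
p lines   # => ["line one", "line two", "line three"]
p pending # => ""
```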

Unique count, array to string

Here is my input:
{"Names":"Name1, Name2","Country":"TheCountry"}
What I have been trying to do is count how many times a certain name appears, not only in one input but across all previous events as well. For that I have looked into the metrics filter, but I cannot figure out how to do it. The first problem I ran into is that Names is a string and not an array.
I do not see how I might convert Names into an array and pass it to metrics. Is there any other solution?
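First of all, check your Logstash pipeline configuration (the .conf file, not logstash.yml) and add the following split filter. Your comma-separated names will be split while the data is ingested: the split filter emits one copy of the event per name, with that name stored in NamesArray.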
filter {
  split {
    field => "Names"
    terminator => ","
    target => "NamesArray"
  }
}
Then change your mapping, adding the new field to your type mapping like below:
{
  "properties": {
    ...
    "NamesArray": {
      "type": "keyword"
    }
    ...
  }
}
You should use the keyword type for NamesArray so that each name is indexed as a single term (a text field would be broken into words on whitespace); this gives correct counts for names that contain a blank.
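Splitting "Name1, Name2" on "," alone yields " Name2" with a leading space, which a keyword field would store as a different term. The plain-Ruby sketch below shows the counting itself: it strips whitespace after splitting and keeps a running total across all events. names_from and count_names are hypothetical names, and events are plain hashes rather than Logstash events. In Elasticsearch the same per-name count would normally come from a terms aggregation on the keyword field.

```ruby
# Hypothetical helper: turns "Name1, Name2" into ["Name1", "Name2"],
# splitting on commas and dropping surrounding whitespace and empty entries.
def names_from(event)
  event.fetch('Names', '').split(',').map(&:strip).reject(&:empty?)
end

# Running count of each name across all events seen so far.
def count_names(events)
  counts = Hash.new(0)
  events.each do |event|
    names_from(event).each { |name| counts[name] += 1 }
  end
  counts
end

events = [
  { 'Names' => 'Name1, Name2', 'Country' => 'TheCountry' },
  { 'Names' => 'Name2', 'Country' => 'TheCountry' }
]
p count_names(events) # Name1 counted once, Name2 twice
```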

Elasticsearch is slow to update mapping

Elasticsearch has become extremely slow when receiving input from Logstash, particularly when using file input. I have had to wait for 10+ minutes to get results. Oddly enough, standard input is mapped very quickly, which leads me to believe that my config file may be too complex, but it really isn't.
My config file uses file input, my filter is grok with 4 fields, and my output is to Elasticsearch. The file I am inputting is a .txt with 5 lines in it.
Any suggestions? Elasticsearch and Logstash newbie here.
input {
  file {
    type => "type"
    path => "insert_path"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:primary} %{WORD:secondary} %{NUMBER:speed}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
Thought: Is it a sincedb_path issue? I often try to reparse files without changing the filename.
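If the problem really is reparsing the same file, this is the behavior to expect: the file input records how far it has already read each file in the sincedb (keyed by file identity, not by name alone), so a rerun emits nothing new, and start_position => "beginning" only applies to files it has never seen. A common workaround while testing is sincedb_path => "/dev/null" (or "NUL" on Windows). Below is a toy Ruby model of that bookkeeping; SinceDb and read_new are hypothetical names, not the plugin's implementation.

```ruby
# Toy model of the sincedb: remembers, per file, how many bytes were read.
class SinceDb
  def initialize
    @offsets = {} # file identity => byte offset already processed
  end

  # Returns only the content past the stored offset, then records the new offset.
  def read_new(file_id, content)
    start = @offsets.fetch(file_id, 0)
    start = 0 if start > content.bytesize # file shrank: treat it as new
    @offsets[file_id] = content.bytesize
    content.byteslice(start, content.bytesize - start)
  end
end

db = SinceDb.new
data = "line1\nline2\n"
p db.read_new('mylog.txt', data) # first run: the whole file
p db.read_new('mylog.txt', data) # rerun of the unchanged file: "" (nothing new)
```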

Is it possible to join 2 lines of logs from 2 separate log files using Logstash

I have 2 separate IIS log files (advanced and simple).
They need to be joined, because the advanced log is missing information that the simple one has written, but they have the same timestamp.
input {
  file {
    path => "C:/Logs/*.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    break_on_match => true
    # Try each pattern in turn and stop at the first one that matches
    match => { "message" => [
      '%{TIMESTAMP_ISO8601:log_timestamp} \"%{DATA:s_computername}\"',
      "%{TIMESTAMP_ISO8601:log_timestamp} %{NOTSPACE:uriQuery} "
    ] }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
  }
}
This is a simplified version. I want Elasticsearch to have an entry containing log_timestamp, s_computername, and uriQuery.
With the multiline codec it would be possible to merge them if the lines were next to each other, but since they are in separate files I have not yet found a way to do this. Is it possible to merge the two using the same timestamp as a unique identifier?
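The join itself is a merge keyed on the shared timestamp. One option inside Logstash is the aggregate filter with task_id => "%{log_timestamp}"; another is to let Elasticsearch do it by using the timestamp as the document_id with an update output and doc_as_upsert, so both lines land in the same document. Both assume the timestamp is unique per request. The plain-Ruby sketch below only shows the merge logic; join_by_timestamp and the sample records are hypothetical.

```ruby
# Hypothetical sketch: merge records from several sources that share a key.
def join_by_timestamp(*sources, key: 'log_timestamp')
  merged = Hash.new { |h, k| h[k] = {} } # timestamp => combined fields
  sources.each do |records|
    records.each { |rec| merged[rec[key]].merge!(rec) }
  end
  merged.values
end

# Made-up sample records, one from each log file
simple   = [{ 'log_timestamp' => '2018-01-01 10:00:00', 's_computername' => 'WEB01' }]
advanced = [{ 'log_timestamp' => '2018-01-01 10:00:00', 'uriQuery' => 'id=42' }]
p join_by_timestamp(simple, advanced) # one record with all three fields
```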
