How to force encoding for Logstash filters? (umlauts from message not recognized)

I am trying to import historical log data into ElasticSearch (Version 5.2.2) using Logstash (Version 5.2.1) - all running under Windows 10.
Sample log file
The sample log file I am importing looks like this:
07.02.2017 14:16:42 - Critical - General - Ähnlicher Fehler mit übermäßger Ödnis
08.02.2017 14:13:52 - Critical - General - ästhetisch überfällige Fleißarbeit
Working configuration
For starters I tried the following simple Logstash configuration (it's running on Windows so don't get confused by the mixed slashes ;)):
input {
  file {
    path => "D:/logstash/bin/*.log"
    sincedb_path => "C:\logstash\bin\file_clientlogs_lastrun"
    ignore_older => 999999999999
    start_position => "beginning"
    stat_interval => 60
    type => "clientlogs"
  }
}
output {
  if [type] == "clientlogs" {
    elasticsearch {
      index => "logstash-clientlogs"
    }
  }
}
And this works fine - I see that the input gets read line by line into the index I specified, and when I check with Kibana those two lines show up as expected, umlauts intact (screenshot with host-name omitted).
More complex (not working) configuration
But of course this is still pretty flat data and I really want to extract the proper timestamps from my lines as well as the other fields, and replace @timestamp and message with them; so I inserted some filter logic involving the grok, mutate and date filters between input and output, so the resulting configuration looks like this:
input {
  file {
    path => "D:/logs/*.log"
    sincedb_path => "C:\logstash\bin\file_clientlogs_lastrun"
    ignore_older => 999999999999
    start_position => "beginning"
    stat_interval => 60
    type => "clientlogs"
  }
}
filter {
  if [type] == "clientlogs" {
    grok {
      match => [ "message", "%{MONTHDAY:monthday}.%{MONTHNUM2:monthnum}.%{YEAR:year} %{TIME:time} - %{WORD:severity} - %{WORD:granularity} - %{GREEDYDATA:logmessage}" ]
    }
    mutate {
      add_field => {
        "timestamp" => "%{year}-%{monthnum}-%{monthday} %{time}"
      }
      replace => [ "message", "%{logmessage}" ]
      remove_field => ["year", "monthnum", "monthday", "time", "logmessage"]
    }
    date {
      locale => "en"
      match => ["timestamp", "YYYY-MM-dd HH:mm:ss"]
      timezone => "Europe/Vienna"
      target => "@timestamp"
      add_field => { "debug" => "timestampMatched"}
    }
  }
}
output {
  if [type] == "clientlogs" {
    elasticsearch {
      index => "logstash-clientlogs"
    }
  }
}
Now, when I look at those logs for example with Kibana, I see that the fields I wanted to add do appear and the timestamp and message are replaced correctly, but my umlauts are all gone (screenshot omitted).
Forcing charset in input and output
I also tried setting
codec => plain {
  charset => "UTF-8"
}
for input and output, but that also did not change anything for the better.
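For reference, this is a minimal sketch of how that codec setting sits inside the file input from above; the charset value is the part to experiment with, and which value is actually right for your file is an assumption on my part, not something the question settles:
input {
  file {
    path => "D:/logs/*.log"
    sincedb_path => "C:\logstash\bin\file_clientlogs_lastrun"
    ignore_older => 999999999999
    start_position => "beginning"
    stat_interval => 60
    type => "clientlogs"
    codec => plain {
      charset => "UTF-8"   # assumption: try e.g. "Windows-1252" here if the file turns out not to be UTF-8
    }
  }
}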
Different output-type
When I change the output to stdout { }, the output seems okay:
2017-02-07T13:16:42.000Z MYPC Ähnlicher Fehler mit übermäßger Ödnis
2017-02-08T13:13:52.000Z MYPC ästhetisch überfällige Fleißarbeit
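For a closer look at the individual fields, the stdout output can also be switched to the rubydebug codec; this is only a minimal sketch, but it prints every field of each event, which makes it easy to see whether the umlauts are still intact after the filters:
output {
  stdout { codec => rubydebug }   # dumps every field of each event for inspection
}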
Querying without Kibana
I also queried against the index using this PowerShell command:
Invoke-WebRequest -Method POST -Uri 'http://localhost:9200/logstash-clientlogs/_search' -Body '
{
  "query":
  {
    "regexp": {
      "message" : ".*"
    }
  }
}
' | select -ExpandProperty Content
But it also returns the same messed up contents Kibana reveals:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"logstash-clientlogs","_type":"clientlogs","_id":"AVskdTS8URoncbfBgFwC","_score":1.0,"_source":{"severity":"Critical","debug":"timestampMatched","message":"�hnlicher Fehler mit �berm��ger �dnis\r","type":"clientlogs","path":"D:/logs/Client.log","@timestamp":"2017-02-07T13:16:42.000Z","granularity":"General","@version":"1","host":"MYPC","timestamp":"2017-02-07 14:16:42"}},{"_index":"logstash-clientlogs","_type":"clientlogs","_id":"AVskdTS8URoncbfBgFwD","_score":1.0,"_source":{"severity":"Critical","debug":"timestampMatched","message":"�sthetisch �berf�llige Flei�arbeit\r","type":"clientlogs","path":"D:/logs/Client.log","@timestamp":"2017-02-08T13:13:52.000Z","granularity":"General","@version":"1","host":"MYPC","timestamp":"2017-02-08 14:13:52"}}]}}
Has anyone else experienced this and found a solution for this use-case? I don't see any setting for grok to specify an encoding (the file I am passing is UTF-8 with BOM), and setting the encoding on the input itself does not seem necessary, because it gives me the correct message when I leave out the filter.

Related

Logstash not importing data

I am working on an ELK stack setup. I want to import data from a CSV file on my PC into Elasticsearch via Logstash. Elasticsearch and Kibana are working properly.
Here is my logstash.conf file:
input {
  file {
    path => "C:/Users/aron/Desktop/es/archive/weapons.csv"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  csv {
    separator => ","
    columns => ["name", "type", "country"]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200/"]
    index => "weapons"
    document_type => "ww2_weapon"
  }
  stdout {}
}
And a sample row from my .csv file looks like this:
Name              | Type      | Country
10.5 cm Kanone 17 | Field Gun | Germany
The German characters all show up correctly in the file.
I am running logstash via: logstash.bat -f path/to/logstash.conf
It starts working, but it freezes and becomes unresponsive along the way (screenshot of stdout omitted).
In Kibana, it created the index and imported 2 documents, but the data is all messed up. What am I doing wrong?
If your task is only to import that CSV, you are better off using the file upload in Kibana.
It should be available under the following link (for Kibana > v8):
<your Kibana domain>/app/home#/tutorial_directory/fileDataViz
Logstash is used if you want to do this job on a regular basis with new files coming in over time.
You can try the configuration below. It runs perfectly on my machine.
input {
  file {
    path => "path/filename.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}
filter {
  csv {
    separator => ","
    columns => ["field1","field2",...]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => "https://localhost:9200"
    user => "username"         # if any
    password => "password"     # if any
    index => "indexname"
    document_type => "doc_type"
  }
}

how filter {"foo":"bar", "bar": "foo"} with grok to get only foo field?

I copied
{"name":"myapp","hostname":"banana.local","pid":40161,"level":30,"msg":"hi","time":"2013-01-04T18:46:23.851Z","v":0}
from https://github.com/trentm/node-bunyan and saved it as my logs.json. I am trying to import only two fields (name and msg) into ElasticSearch via Logstash. The problem is that I depend on a kind of filter that I have not been able to get working. I have successfully imported such a line as a single message, but that is certainly not enough in my real case.
That said, how can I import only name and msg into ElasticSearch? I tested several alternatives using http://grokdebug.herokuapp.com/ to reach a useful filter, with no success at all.
For instance, %{GREEDYDATA:message} will bring in the entire line as a single message, but how do I split it and ignore all fields other than name and msg?
In the end, I am planning to use this:
input {
  file {
    type => "my_type"
    path => [ "/home/logs/logs.log" ]
    codec => "json"
  }
}
filter {
  grok {
    match => { "message" => "data=%{GREEDYDATA:request}"}
  }
  #### some extra lines here probably
}
output {
  elasticsearch {
    codec => json
    hosts => "http://127.0.0.1:9200"
    index => "indextest"
  }
  stdout { codec => rubydebug }
}
I have just gone through the list of available Logstash filters. The prune filter should match your need.
Assuming you have installed the prune filter, your config file should look like this:
input {
  file {
    type => "my_type"
    path => [ "/home/logs/logs.log" ]
    codec => "json"
  }
}
filter {
  prune {
    whitelist_names => [
      "@timestamp",
      "type",
      "name",
      "msg"
    ]
  }
}
output {
  elasticsearch {
    codec => json
    hosts => "http://127.0.0.1:9200"
    index => "indextest"
  }
  stdout { codec => rubydebug }
}
Please note that you will want to keep type so that Elasticsearch indexes the document into the correct type. @timestamp is required if you want to view the data in Kibana.
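If prune is not bundled with your Logstash distribution, it can usually be added with the plugin tool; the line below assumes Logstash 5.x or newer, while older releases used bin/plugin install instead:
bin/logstash-plugin install logstash-filter-prune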

Data type conversion using logstash grok

Basic is a float field. The mentioned index is not present in Elasticsearch. When running the config file with logstash -f, I am getting no exception. Yet the data that ends up in Elasticsearch shows the mapping of Basic as string. How do I rectify this? And how do I do this for multiple fields?
input {
  file {
    path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
    type => "promosms_dec15"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => [
      "Basic", " %{NUMBER:Basic:float}"
    ]
  }
  csv {
    columns => ["Generation_Date","Basic"]
    separator => ","
  }
  ruby {
    code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
  }
}
output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "promosms-%{+dd.MM.YYYY}"
    workers => 1
  }
}
You have two problems. First, your grok filter is listed prior to the csv filter, and because filters are applied in order there won't be a "Basic" field to convert when the grok filter is applied.
Secondly, unless you explicitly allow it, grok won't overwrite existing fields. In other words,
grok {
  match => [
    "Basic", " %{NUMBER:Basic:float}"
  ]
}
will always be a no-op. Either specify overwrite => ["Basic"] or, preferably, use mutate's type conversion feature:
mutate {
  convert => ["Basic", "float"]
}
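Putting both points together, a sketch of the reordered filter section could look like the following; the hash form of convert shown here is the one documented for current Logstash releases (the array form above works as well), and listing more "field" => "type" pairs is how several fields can be converted at once:
filter {
  csv {
    separator => ","
    columns => ["Generation_Date", "Basic"]   # csv runs first, so Basic exists afterwards
  }
  mutate {
    convert => { "Basic" => "float" }          # add further "field" => "type" pairs here as needed
  }
}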

Logstash not importing files due to missing index error

I am having a difficult time trying to get the combination of Logstash, Elasticsearch & Kibana working in my Windows 7 environment.
I have set all 3 up and they all seem to be running fine; Logstash and Elasticsearch are running as Windows services, and Kibana as a website in IIS.
Logstash is running from http://localhost:9200
I have a web application creating log files in .txt with the format:
Datetime=[DateTime], Value=[xxx]
The log files get created in this directory:
D:\wwwroot\Logs\Errors\
My logstash.conf file looks like this:
input {
  file {
    format => ["plain"]
    path => ["D:\wwwroot\Logs\Errors\*.txt"]
    type => "testlog"
  }
}
output {
  elasticsearch {
    embedded => true
  }
}
My Kibana config.js file looks like this:
define(['settings'],
function (Settings) {
  return new Settings({
    elasticsearch: "http://localhost:9200",
    kibana_index: "kibana-int",
    panel_names: [
      'histogram',
      'map',
      'pie',
      'table',
      'filtering',
      'timepicker',
      'text',
      'fields',
      'hits',
      'dashcontrol',
      'column',
      'derivequeries',
      'trends',
      'bettermap',
      'query',
      'terms'
    ]
  });
});
When I view Kibana I see the error:
No index found at http://localhost:9200/_all/_mapping. Please create at least one index. If you're using a proxy ensure it is configured correctly.
I have no idea how to create the index, so if anyone can shed some light on what I am doing wrong, that would be great.
It seems like nothing is making it to Elasticsearch currently.
For the current version of ES (0.90.5), I had to use the elasticsearch_http output. The elasticsearch output seemed to be too closely tied to 0.90.3.
For example, here is how my config looks for getting log4j-format logs into Elasticsearch:
input {
  file {
    path => "/srv/wso2/wso2am-1.4.0/repository/logs/wso2carbon.log"
    path => "/srv/wso2/wso2as-5.1.0/repository/logs/wso2carbon.log"
    path => "/srv/wso2/wso2is-4.1.0/repository/logs/wso2carbon.log"
    type => "log4j"
  }
}
output {
  stdout { debug => true debug_format => "ruby"}
  elasticsearch_http {
    host => "localhost"
    port => 9200
  }
}
For my file format, I have a grok filter as well - to parse it properly.
filter {
  if [message] !~ "^[ \t\n]+$" {
    # if the line is a log4j type
    if [type] == "log4j" {
      # parse out fields from log4j line
      grok {
        match => [ "message", "TID:%{SPACE}\[%{BASE10NUM:thread_name}\]%{SPACE}\[%{WORD:component}\]%{SPACE}\[%{TIMESTAMP_ISO8601:timestamp}\]%{SPACE}%{LOGLEVEL:level}%{SPACE}{%{JAVACLASS:java_file}}%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}" ]
        add_tag => ["test"]
      }
      if "_grokparsefailure" not in [tags] {
        mutate {
          replace => ["message", " "]
        }
      }
      multiline {
        pattern => "^TID|^ $"
        negate => true
        what => "previous"
        add_field => {"additional_log" => "%{message}"}
        remove_field => ["message"]
        remove_tag => ["_grokparsefailure"]
      }
      mutate {
        strip => ["additional_log"]
        remove_tag => ["test"]
        remove_field => ["message"]
      }
    }
  } else {
    drop {}
  }
}
Also, I would get the elasticsearch-head plugin to monitor your content in Elasticsearch, to easily verify the data and the state it is in.
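For reference, on an Elasticsearch of that era (0.90.x) head was installed as a site plugin, roughly like the line below (check the plugin command against your ES version's docs), and was then reachable at http://localhost:9200/_plugin/head/:
bin/plugin -install mobz/elasticsearch-head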

logstash issue with json input file

I have the following JSON in a file:
{
  "foo":"bar",
  "spam" : "eggs"
},
{
  "css":"ddq",
  "eeqw": "fewq"
}
and the following conf file:
input {
  file {
    path => "/opt/logstash-1.4.2/bin/sam.json"
    type => "json"
    codec => json_lines
    start_position => "beginning"
  }
}
output { stdout { codec => json } }
but when I run
./logstash -f sample.conf
I don't get any output in stdout.
But when I don't give json as the codec and instead give type => "core2", it seems to work.
Does anyone know how I can fix it to work for the json type?
The other issue is that it gives me the following output when it does write to stdout:
{"message":"{","@version":"1","@timestamp":"2015-07-15T02:02:02.653Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"foo\":\"bar\", ","@version":"1","@timestamp":"2015-07-15T02:02:02.654Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"spam\" : \"eggs\" ","@version":"1","@timestamp":"2015-07-15T02:02:02.655Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"},","@version":"1","@timestamp":"2015-07-15T02:02:02.655Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"{ ","@version":"1","@timestamp":"2015-07-15T02:02:02.655Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"css\":\"ddq\", ","@version":"1","@timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"eeqw\": \"fewq\"","@version":"1","@timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"}","@version":"1","@timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"","@version":"1","@timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}
I want to know how it can be parsed the right way, with the key-value pairs from my input file.
I found this and edited it to suit your purpose. The following config should do exactly what you want:
input {
  file {
    codec => multiline {
      pattern => "^\}"
      negate => true
      what => previous
    }
    path => ["/absolute_path/json.json"]
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  mutate {
    replace => [ "message", "%{message}}" ]
    gsub => [ "message","\n",""]
    gsub => [ "message","},",""]
  }
  if [message] =~ /^{.*}$/ {
    json { source => message }
  }
}
I tried your given json and it results in two events. First with foo = bar and spam = eggs. Second with css = ddq and eeqw = fewq.
As I understand it, you want to put your complete JSON document on one line if you want to use the json_lines codec:
{"foo":"bar","spam" : "eggs"}
{"css":"ddq","eeqw": "fewq"}
In your case you have a problem with the structure, since you also have a ',' between the JSON objects. That is not the easiest thing to handle. So if possible, change the source to match my example. If that is not possible, the multiline approach might help you. Check this for reference:
input json to logstash - config issues?
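If you can change the source file to the one-object-per-line layout above, a minimal sketch of a matching config could look like this; it is only a sketch under that assumption, using the plain json codec, which decodes each line the file input emits as its own event (path taken from the question):
input {
  file {
    path => "/opt/logstash-1.4.2/bin/sam.json"
    codec => "json"              # each line is parsed as one JSON event
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
output {
  stdout { codec => rubydebug }
}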
