How to parse an XML file with Logstash filters - elasticsearch

I'm trying to index some simple XML files with Elasticsearch and Logstash. So far I have the ELK stack and logstash-forwarder set up. I am trying to use the documentation to set up an xml filter, but I just can't seem to get it right.
My XML format is pretty straightforward:
<Recording>
<DataFile description="desc" fileName="test.wav" Source="mic" startTime="2014-12-12_121212" stopTime="2014-12-12_131313"/>
</Recording>
I just want each file to be an entry in Elasticsearch, and every attribute of the DataFile tag to be a key-value pair that I can search. Since the documentation is getting me nowhere, what would such a filter look like? I have also tried the answers to two related questions, without any luck.

Add the following to your logstash-forwarder configuration, changing the Logstash server IP, certificate path, and log path accordingly.
{
"network": {
"servers": [ "x.x.x.x:5043" ],
"ssl ca": " / cert/server.crt",
"timeout": 15
},
"files": [
{
"paths": [
"D:/ELK/*.log"
],
"fields": { "type": "log" }
}
]
}
Add the following input plugin to your Logstash server configuration. Change the certificate and key paths and names accordingly.
lumberjack {
port => 5043
type => "lumberjack"
ssl_certificate => " /cert/server.crt"
ssl_key => "D:/ELK/logstash/cert/server.key"
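codec => multiline {
# Lines that do not contain the closing tag are joined to the following line,
# so each event ends at </Recording>.
pattern => "(\/Recording>)"
what => "next"
negate => true
}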
}
Now add the following grok filter to the filter section of your Logstash configuration:
grok {
# (?m) lets "." match newlines, since the joined event spans several lines.
match => ["message", "(?m)(?<content><Recording.*?</Recording>)"]
tag_on_failure => [ ]
}
Finally, add the following to the output section of your Logstash configuration:
elasticsearch {
host => "127.0.0.1"
port => "9200"
protocol => "http"
# Elasticsearch index names must be lowercase.
index => "recording-%{+YYYY.MM.dd}"
index_type => "log"
}
Now, when you add XML messages to your log file, each entry will be processed and stored in your Elasticsearch server.
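Note that the grok filter above only captures the whole Recording element as a single string; it does not split the DataFile attributes into separate searchable fields, which is what the question asks for. A minimal sketch of how Logstash's xml filter might be added for that, assuming the complete XML snippet is in the "message" field (use "content" instead if you rely on the grok capture above). The target name "doc" and the destination field names are chosen for illustration only, and the exact nesting under "doc" is best checked with a rubydebug stdout output:

```
filter {
  xml {
    # Field holding the raw XML text (assumption: the joined multiline event).
    source => "message"
    # Store the parsed structure under "doc" (illustrative name).
    target => "doc"
    # Copy individual attributes to top-level fields via XPath.
    # Results are typically stored as arrays, even for a single match.
    xpath => {
      "/Recording/DataFile/@description" => "description"
      "/Recording/DataFile/@fileName"    => "fileName"
      "/Recording/DataFile/@Source"      => "source"
      "/Recording/DataFile/@startTime"   => "startTime"
      "/Recording/DataFile/@stopTime"    => "stopTime"
    }
  }
}
```

If only the top-level fields are needed, the parsed structure under "doc" can probably be dropped afterwards; that is a design choice rather than a requirement.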

Related

csv file input processing using logstash stops working after 2/3 days

I am using logstash-1.5.1 to process a CSV file and upload it into elasticsearch-1.5.1. This should happen every day, so I started Logstash and Elasticsearch once and left them running, expecting each day's CSV file to be processed and uploaded. Every day a new CSV file is downloaded from the internet and stored in a local folder that Logstash reads from. Surprisingly, Logstash stops processing the CSV files after 2-3 days. I don't know the reason; please help. The Logstash input file configuration is as follows.
input {
file {
type => "csv"
path => "D:/Tools/logstash-1.5.1/data/**/*"
start_position => beginning
sincedb_path => "D:/Tools/logstash-1.5.1/sincedb/.sincedb"
}
}
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
}
output {
elasticsearch {
host => "localhost"
cluster => "Test"
node_name => "data"
index => "client"
template => "D:/Tools/logstash-1.5.1/lib/elasticsearch-template.json"
template_overwrite => true
}
}
Try using logstash-forwarder instead, and please post the result; I am really interested in it.
You can install logstash-forwarder with, for example, this configuration:
{
"network": {
"servers": [ "$YOUR_SERVER:$PORT" ],
"timeout": 20,
"ssl ca": "/path/to/logstash/*.crt_file"
},
"files": [
{
"paths": ["D:/Tools/logstash-1.5.1/data/**/*"],
"fields": { "type": "csv" },
"dead time" : "5m"
}
]
}
And in your logstash server you can use this input:
input {
lumberjack {
port => "$PORT"
ssl_key => "/path/to/your/*.key_file"
ssl_certificate => "/path/to/your/*.crt_file"
}
}

logstash, syslog and grok

I am working on an ELK-stack configuration. logstash-forwarder is used as a log shipper, each type of log is tagged with a type-tag:
{
"network": {
"servers": [ "___:___" ],
"ssl ca": "___",
"timeout": 15
},
"files": [
{
"paths": [
"/var/log/secure"
],
"fields": {
"type": "syslog"
}
}
]
}
That part works fine. Now I want Logstash to split the message string into its parts; luckily, that is already implemented in the default grok patterns, so the logstash.conf remains simple so far:
input {
lumberjack {
port => 6782
ssl_certificate => "___"
ssl_key => "___"
}
}
filter {
if [type] == "syslog" {
grok {
match => [ "message", "%{SYSLOGLINE}" ]
}
}
}
output {
elasticsearch {
cluster => "___"
template => "___"
template_overwrite => true
node_name => "logstash-___"
bind_host => "___"
}
}
The issue I have here is that the document received by Elasticsearch still holds the whole line (including the timestamp etc.) in the message field. Also, @timestamp still shows the date when Logstash received the message, which makes it hard to search, since Kibana queries @timestamp in order to filter by date. Any idea what I'm doing wrong?
Thanks, Daniel
The reason your "message" field contains the original log line (including timestamps etc) is that the grok filter by default won't allow existing fields to be overwritten. In other words, even though the SYSLOGLINE pattern,
SYSLOGLINE %{SYSLOGBASE2} %{GREEDYDATA:message}
captures the message into a "message" field, it won't overwrite the current field value. The solution is to set the grok filter's "overwrite" parameter.
grok {
match => [ "message", "%{SYSLOGLINE}" ]
overwrite => [ "message" ]
}
To populate the "@timestamp" field, use the date filter. This will probably work for you (the second pattern has two spaces before "d" because syslog pads single-digit days with a space):
date {
match => [ "timestamp", "MMM dd HH:mm:ss", "MMM  d HH:mm:ss" ]
}
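Putting the two fixes together, the filter section from the question might end up looking like the sketch below. It assumes the events still carry the "syslog" type set by logstash-forwarder and that SYSLOGLINE stores the log time in a field named "timestamp"; verify both against a sample event before relying on it.

```
filter {
  if [type] == "syslog" {
    grok {
      match => [ "message", "%{SYSLOGLINE}" ]
      # Let the captured text replace the original line in "message".
      overwrite => [ "message" ]
    }
    date {
      # Parse the syslog time into @timestamp.
      # The second pattern uses two spaces to cover space-padded single-digit days.
      match => [ "timestamp", "MMM dd HH:mm:ss", "MMM  d HH:mm:ss" ]
    }
  }
}
```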
It is hard to know where the problem is without seeing an example event that causes it. I suggest trying the grok debugger to verify that the pattern is correct and to adjust it to your needs once you see the problem.

Changing the elasticsearch host in logstash 1.3.3 web interface

I followed the steps in this document and was able to get some reports on the Shakespeare data.
I want to do the same thing with Elasticsearch installed remotely. I tried configuring the "host" in the config file, but the queries still run against the local host rather than the remote one. This is my config file:
input {
stdin{
type => "stdin-type" }
file {
type => "accessLog"
path => [ "/Users/akushe/Downloads/requests.log" ]
}
}
filter {
grok {
match => ["message","%{COMMONAPACHELOG} (?:%{INT:responseTime}|-)"]
}
kv {
source => "request"
field_split => "&?"
}
if [lng] {
kv {
add_field => [ "location" , ["%{lng}","%{lat}"]]
}
}else if [lon] {
kv {
add_field => [ "location" , ["%{lon}","%{lat}"]]
}
}
}
output {
elasticsearch {
host => "slc-places-qa-es3001.slc.where.com"
port => 9200
}
}
You need to add protocol => http to the elasticsearch output to make it use HTTP transport rather than joining the cluster using multicast.
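Applied to the output section from the question, that suggestion would look roughly like this. It is a sketch only: whether the "protocol" setting is available depends on the Logstash version in use, and the separate elasticsearch_http output mentioned elsewhere on this page is the alternative if it is not.

```
output {
  elasticsearch {
    host => "slc-places-qa-es3001.slc.where.com"
    port => 9200
    # Talk to the remote node over HTTP instead of joining the cluster as a node.
    protocol => "http"
  }
}
```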

Issue using grok filter with logstash and a windows file

I am attempting to filter a sql server error log using Logstash and grok. Logstash 1.3.3 is running as a windows service using NSSM and JRE6. My config file is below
input {
file {
path => "c:\program files\microsoft sql server\mssql10_50.mssqlserver\mssql\log\errorlog"
type => SQLServerLog
start_position => "beginning"
codec => plain {
charset => "UTF-8"
}
}
}
filter {
grok {
type => "SQLServerLog"
match => [ "message", "%{DATESTAMP:DateStamp} %{WORD:Process} %{GREEDYDATA:Message}" ]
named_captures_only => true
singles => true
remove_tag => [ "_grokparsefailure" ]
add_tag => [ "GrokFilterWorked" ]
}
}
output {
stdout {
codec => rubydebug
}
elasticsearch {
embedded => true
}
}
A sample of the log file content is below.
2014-01-31 00:00:38.73 spid21s This instance of SQL Server has been using a process ID of 14632 since 28/01/2014 13:09:24 (local) 28/01/2014 13:09:24 (UTC). This is an informational message only; no user action is required.
Events are visible in Kibana but when collapsed the message is displayed like {"message":"\u00002\u00000\u00001\u00004...
When expanded the table view shows the event message as text instead. The raw data for the event when viewed is as below.
{
"_index": "logstash-2014.01.31",
"_type": "SQLServerLog",
"_id": "NpvKSf4eTFSHkBdoG3zw6g",
"_score": null,
"_source": {
"message": "\u00002\u00000\u00001\u00004\u0000-\u00000\u00001\u0000-\u00003\u00000\u0000 \u00000\u00000\u0000:\u00000\u00000\u0000:\u00002\u00001\u0000.\u00006\u00004\u0000 \u0000s\u0000p\u0000i\u0000d\u00002\u00004\u0000s\u0000 \u0000 \u0000 \u0000 \u0000 \u0000T\u0000h\u0000i\u0000s\u0000 \u0000i\u0000n\u0000s\u0000t\u0000a\u0000n\u0000c\u0000e\u0000 \u0000o\u0000f\u0000 \u0000S\u0000Q\u0000L\u0000 \u0000S\u0000e\u0000r\u0000v\u0000e\u0000r\u0000 \u0000h\u0000a\u0000s\u0000 \u0000b\u0000e\u0000e\u0000n\u0000 \u0000u\u0000s\u0000i\u0000n\u0000g\u0000 \u0000a\u0000 \u0000p\u0000r\u0000o\u0000c\u0000e\u0000s\u0000s\u0000 \u0000I\u0000D\u0000 \u0000o\u0000f\u0000 \u00001\u00004\u00006\u00003\u00002\u0000 \u0000s\u0000i\u0000n\u0000c\u0000e\u0000 \u00002\u00008\u0000/\u00000\u00001\u0000/\u00002\u00000\u00001\u00004\u0000 \u00001\u00003\u0000:\u00000\u00009\u0000:\u00002\u00004\u0000 \u0000(\u0000l\u0000o\u0000c\u0000a\u0000l\u0000)\u0000 \u00002\u00008\u0000/\u00000\u00001\u0000/\u00002\u00000\u00001\u00004\u0000 \u00001\u00003\u0000:\u00000\u00009\u0000:\u00002\u00004\u0000 \u0000(\u0000U\u0000T\u0000C\u0000)\u0000.\u0000 \u0000T\u0000h\u0000i\u0000s\u0000 \u0000i\u0000s\u0000 \u0000a\u0000n\u0000 \u0000i\u0000n\u0000f\u0000o\u0000r\u0000m\u0000a\u0000t\u0000i\u0000o\u0000n\u0000a\u0000l\u0000 \u0000m\u0000e\u0000s\u0000s\u0000a\u0000g\u0000e\u0000 \u0000o\u0000n\u0000l\u0000y\u0000;\u0000 \u0000n\u0000o\u0000 \u0000u\u0000s\u0000e\u0000r\u0000 \u0000a\u0000c\u0000t\u0000i\u0000o\u0000n\u0000 \u0000i\u0000s\u0000 \u0000r\u0000e\u0000q\u0000u\u0000i\u0000r\u0000e\u0000d\u0000.\u0000\r\u0000",
"@version": "1",
"@timestamp": "2014-01-31T08:55:03.373Z",
"type": "SQLServerLog",
"host": "MyMachineName",
"path": "C:\\Program Files\\Microsoft SQL Server\\MSSQL10_50.MSSQLSERVER\\MSSQL\\Log\\ERRORLOG"
},
"sort": [
1391158503373,
1391158503373
]
}
I am unsure whether the encoding of the message is preventing Grok from filtering it properly.
I would like to be able to filter these events using Grok and am unsure how to proceed.
Further info:
I created a copy of the log file as UTF-8 and the filter worked fine. So it's definitely a charset issue. I guess I need to determine what the correct charset for the log file is and it should work.
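The \u0000 before every character in the raw event suggests a two-byte encoding such as UTF-16 being read as UTF-8. A hedged sketch of declaring that encoding on the input, keeping the rest of the question's file input unchanged; "UTF-16LE" is an assumption about this particular file, not something the question confirms:

```
input {
  file {
    path => "c:\program files\microsoft sql server\mssql10_50.mssqlserver\mssql\log\errorlog"
    type => "SQLServerLog"
    start_position => "beginning"
    codec => plain {
      # Assumption: the error log is UTF-16 little-endian; adjust if it is not.
      charset => "UTF-16LE"
    }
  }
}
```

Because lines are split on newline bytes before the text is decoded, this may still leave stray bytes at line boundaries. Converting the file to UTF-8 first, as described in the note above, is the approach the question already confirmed to work.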
So I had the same issue with reading SQL Server log file.
Then I realised that SQL Server will log the same entries to the Windows Event Log, which logstash supports as an input.
SQL Server logs entries with 'MSSQLSERVER' source on my systems. You will need the logstash-contrib package, simply extract the contents over base logstash files on your Windows box (wherever you run logstash to collect data).
I have my logstash agent configured to simply ship the entries to another logstash instance on a linux box that does some other stuff not relevant to this question ;)
Example logstash.conf:
input {
eventlog {
type => "Win32-EventLog"
logfile => ["Application", "Security", "System"]
}
}
filter {
if "MSSQLSERVER" in [SourceName] {
# Track logon failures
grok {
match => ["Message", "Login failed for user '%{DATA:username}'\..+CLIENT: %{IP:client_ip}"]
}
dns {
action => "append"
resolve => "client_ip"
}
}
}
output {
stdout { codec => rubydebug }
tcp {
host => "another-logstash-instance.local"
port => "5115"
codec => "json_lines"
}
}
Hope this helps.

Logstash not importing files due to missing index error

I am having a difficult time trying to get the combination of the Logstash, Elasticsearch & Kibana working in my Windows 7 environment.
I have set all 3 up and they all seem to be running fine, Logstash and Elasticsearch are running as Windows services and Kibana as a website in IIS.
Logstash is running from http://localhost:9200
I have a web application creating log files in .txt with the format:
Datetime=[DateTime], Value=[xxx]
The log files get created in this directory:
D:\wwwroot\Logs\Errors\
My logstash.conf file looks like this:
input {
file {
format => ["plain"]
path => ["D:\wwwroot\Logs\Errors\*.txt"]
type => "testlog"
}
}
output {
elasticsearch {
embedded => true
}
}
My Kibana config.js file looks like this:
define(['settings'],
function (Settings) {
return new Settings({
elasticsearch: "http://localhost:9200",
kibana_index: "kibana-int",
panel_names: [
'histogram',
'map',
'pie',
'table',
'filtering',
'timepicker',
'text',
'fields',
'hits',
'dashcontrol',
'column',
'derivequeries',
'trends',
'bettermap',
'query',
'terms'
]
});
});
When I view Kibana I see the error:
No index found at http://localhost:9200/_all/_mapping. Please create at least one index.If you're using a proxy ensure it is configured correctly.
I have no idea on how to create the index, so if anyone can shed some light on what I am doing wrong that would be great.
It seems like nothing is making it to elasticsearch currently.
For the current version of Elasticsearch (0.90.5), I had to use the elasticsearch_http output. The elasticsearch output seemed to be too closely tied to 0.90.3.
For example, here is my config for shipping log4j-format logs to Elasticsearch:
input {
file {
path => "/srv/wso2/wso2am-1.4.0/repository/logs/wso2carbon.log"
path => "/srv/wso2/wso2as-5.1.0/repository/logs/wso2carbon.log"
path => "/srv/wso2/wso2is-4.1.0/repository/logs/wso2carbon.log"
type => "log4j"
}
}
output {
stdout { debug => true debug_format => "ruby"}
elasticsearch_http {
host => "localhost"
port => 9200
}
}
For my file format, I have a grok filter as well - to parse it properly.
filter {
if [message] !~ "^[ \t\n]+$" {
# if the line is a log4j type
if [type] == "log4j" {
# parse out fields from log4j line
grok {
match => [ "message", "TID:%{SPACE}\[%{BASE10NUM:thread_name}\]%{SPACE}\[%{WORD:component}\]%{SPACE}\[%{TIMESTAMP_ISO8601:timestamp}\]%{SPACE}%{LOGLEVEL:level}%{SPACE}{%{JAVACLASS:java_file}}%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}" ]
add_tag => ["test"]
}
if "_grokparsefailure" not in [tags] {
mutate {
replace => ["message", " "]
}
}
multiline {
pattern => "^TID|^ $"
negate => true
what => "previous"
add_field => {"additional_log" => "%{message}"}
remove_field => ["message"]
remove_tag => ["_grokparsefailure"]
}
mutate {
strip => ["additional_log"]
remove_tag => ["test"]
remove_field => ["message"]
}
}
} else {
drop {}
}
}
Also, I would get the elasticsearch-head plugin to monitor your content in Elasticsearch, so you can easily verify the data and the state it is in.

Resources