Parsing XML file using Logstash - elasticsearch

I am trying to parse an XML file in Logstash, using XPath to extract the fields I need. When I run my config file the data loads into Elasticsearch, but not in the way I want: each line of the XML document ends up as a separate event.
Structure of my XML file
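Roughly the following, assuming a <stations> root with one <station> element per record, which is what the XPath expressions in the config below expect:
<stations>
  <station>
    <id>1</id>
    <name>Finch</name>
  </station>
</stations>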
What I want to achieve:
Create fields in Elasticsearch that store the following:
ID = 1
Name = "Finch"
My Config file:
input {
  file {
    path => "C:\Users\186181152\Downloads\stations.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    exclude => "*.gz"
    type => "xml"
  }
}
filter {
  xml {
    source => "message"
    store_xml => false
    target => "stations"
    xpath => [
      "/stations/station/id/text()", "station_id",
      "/stations/station/name/text()", "station_name"
    ]
  }
}
output {
  elasticsearch {
    codec => json
    hosts => "localhost"
    index => "xmlns"
  }
  stdout {
    codec => rubydebug
  }
}
Output in Logstash:
{
  "station_name" => "%{station_name}",
  "path" => "C:\Users\186181152\Downloads\stations.xml",
  "@timestamp" => 2018-02-09T04:03:12.908Z,
  "station_id" => "%{station_id}",
  "@version" => "1",
  "host" => "BW",
  "message" => "\t\r",
  "type" => "xml"
}

The multiline codec lets Logstash read the whole XML file as a single event, and then the xml filter (with XPath) can parse it and ingest the data into Elasticsearch.
In the multiline codec we specify a pattern (<stations> in the example below) that Logstash uses while scanning the XML file; every line that does not match the pattern is appended to the previous event, so the whole document ends up as one event.
The following is an example of a working config file for my data:
input {
  file {
    path => "C:\Users\186181152\Downloads\stations3.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    exclude => "*.gz"
    type => "xml"
    codec => multiline {
      pattern => "<stations>"
      negate => "true"
      what => "previous"
    }
  }
}
filter {
  xml {
    source => "message"
    store_xml => false
    target => "stations"
    xpath => [
      "/stations/station/id/text()", "station_id",
      "/stations/station/name/text()", "station_name"
    ]
  }
}
output {
  elasticsearch {
    codec => json
    hosts => "localhost"
    index => "xmlns24"
  }
  stdout {
    codec => rubydebug
  }
}
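With an input file shaped like the sample at the top of the question, the xpath option stores each match as an array of strings, so the event produced by this config should look roughly like the following (a sketch, not captured output):
{
  "station_id" => ["1"],
  "station_name" => ["Finch"],
  "type" => "xml",
  ...
}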

Related

JSON parse error, original data now in message field {:message=>"incompatible json object type=java

My Logstash filter picks up incorrect data; can anyone help me with the correct syntax? Logstash points to an error in the pipeline. There are multiple pipelines I'm working on, but I'm also in doubt about the syntax.
input {
  file {
    path => "/var/tmp/wd/accounts/*.json"
    codec => "json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    template => "/etc/logstash/templates/accounts-template.json"
    template_name => ["accounts-template.json"]
    template_overwrite => true
    index => "accounts-%{+yyyy.MM.dd}"
    user => "user"
    password => "password"
  }
  stdout { codec => rubydebug }
}
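One possible cause, hedged since the full error above is truncated: the file input already uses codec => "json", so each line is decoded into fields as it is read, and the json filter then has no raw JSON string left in "message" to parse. Using only one of the two is usually enough; a minimal sketch relying on the codec alone (and assuming each file holds one JSON object per line) would be:
input {
  file {
    path => "/var/tmp/wd/accounts/*.json"
    codec => "json"              # decodes each line into fields at input time
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
output {
  stdout { codec => rubydebug }  # the json filter block is no longer needed
}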

Loading a number of xml file into logstash

I want to load a number of XML files into Logstash at the same time; what should I add to my config file?
Thanks for your support :)
This is my config file:
input {
  file {
    path => "D:/test*.xml"
    start_position => "beginning"
    sincedb_path => "NUL"
    codec => multiline {
      pattern => "<invoicing>|</invoicing>"
      negate => "true"
      what => "previous"
      auto_flush_interval => 1
      max_lines => 3000
    }
  }
}
filter {
  xml {
    source => "message"
    target => "message.parsed"
    store_xml => false
    force_array => false
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    index => "tizer005"
    hosts => ["localhost:9200"]
    document_type => "ChannelFiles"
  }
}
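For what it's worth, the file input's path option accepts glob patterns and arrays, so a pattern like D:/test*.xml already picks up every matching file; a sketch covering several locations (the second pattern is only a placeholder) would be:
input {
  file {
    path => ["D:/test*.xml", "D:/more/*.xml"]   # second entry is a placeholder example
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}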

LOGSTASH config for xml

I'm new to Elasticsearch and Logstash, and I would like to create a Logstash config file that loads the XML file data so I can search it in Elasticsearch using Kibana. How do I create this config?
The XML file structure is:
<?xml version="1.0" encoding="ISO-8859-15"?>
<ORDERS>
<ORDER>
<COMPANY_CODE>CHU</COMPANY_CODE>
<ETABLISSEMENET_CODE>CHU</ETABLISSEMENET_CODE>
<FOURNISSEUR>BI</FOURNISSEUR>
<DESTINATAIRE>CHUSUDRUN2</DESTINATAIRE>
<NUM_COMMANDE_MYTOWER>342</NUM_COMMANDE_MYTOWER>
<NUM_COMMANDE_CHU>CMD12345</NUM_COMMANDE_CHU>
<INSTRUCTIONS>COLIS</INSTRUCTIONS>
<ETAT>4</ETAT>
<DATE_DE_COMMANDE>01-01-2018</DATE_DE_COMMANDE>
<DATE_DE_DISPONIBILITE>01-01-2018</DATE_DE_DISPONIBILITE>
<MONTANT_HT>3695.0</MONTANT_HT>
<DATE_DE_CREATION></DATE_DE_CREATION>
<POIDS_BRUT>20.0</POIDS_BRUT>
<NOMBRE_COLIS>3</NOMBRE_COLIS>
Below is an example of an XML config in Logstash:
input {
  file {
    path => "/home/Test_xml.xml"
    start_position => "beginning"
    codec => multiline {
      pattern => "^<\?book .*\>"
      negate => true
      what => "previous"
    }
    sincedb_path => "/dev/null"
  }
}
filter {
  xml {
    source => "message"
    target => "parsed"
  }
  split {
    field => "[parsed][book]"
    add_field => {
      bookAuthor => "%{[parsed][book][author]}"
      title => "%{[parsed][book][title]}"
      genre => "%{[parsed][book][genre]}"
      price => "%{[parsed][book][price]}"
      publish_date => "%{[parsed][book][publish_date]}"
      description => "%{[parsed][book][description]}"
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "xml_test"
  }
  stdout {
    codec => rubydebug
  }
}
Link for the XML file.
I tried inserting data into Elasticsearch with Logstash a long time ago; hope this will work.
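A sketch of how the same approach might be adapted to the ORDERS structure shown in the question, assuming one event per <ORDER> element (the path and index name are placeholders):
input {
  file {
    path => "/home/orders.xml"          # placeholder path
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "<ORDERS>"
      negate => true
      what => "previous"
      auto_flush_interval => 1          # flush the last event even with no new match
    }
  }
}
filter {
  xml {
    source => "message"
    target => "parsed"
  }
  split {
    # one event per ORDER element; fields end up under [parsed][ORDER][FIELD_NAME]
    field => "[parsed][ORDER]"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "orders_test"              # placeholder index name
  }
  stdout { codec => rubydebug }
}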

Logstash code to upload CSV file content into Elasticsearch

I have a CSV file and I want to import it into Elasticsearch 5.0.0 using Logstash.
These are the first 2 lines of the CSV file:
Id,CameraId,ZoneId,Latitude,Longitude,Number,OriginalNumber,Country,OriginalCountry,CapturedOn,Status,Direction,Speed,Confidence,AvgDigitsHeight,MultiplateRate,ProcessingTimeOCR,Signaled,OcrImageId,EnvImageIds,CapturerId,CapturerType,IsAlarm,AlarmListIds,ReplicationId,ImagesUploaded
111,24,1,42.8,3.5,XXDFR,XXDFR,DE,DE,2017-03-04 12:06:20.0,0,1,0,99,21.0,3,16.0193003809306,0,0,[],null,null,0,[],0,0
I run this Logstash script:
input {
  file {
    path => ["/usr/develop/test.csv"]
    type => "core2"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => [
      "Id","CameraId","ZoneId","Latitude","Longitude,"Number","OriginalNumber","Country","OriginalCountry","CapturedOn","Status","Direction","Speed","Confidence","AvgDigitsHeight","MultiplateRate","ProcessingTimeOCR","Signaled","OcrImageId","EnvImageIds","CapturerId","CapturerType","IsAlarm","AlarmListIds","ReplicationId","ImagesUploaded"
    ]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    action => "index"
    hosts => ["127.0.0.1:9200"]
    index => "mytest"
    document_type => "doc"
    document_id => "%{Id}"
    workers => 1
  }
}
I get this error:
logstash.agent ] fetched an invalid config {:config=>"input
{\nfile {\npath => [\"/usr/develop/test.csv\"]\ntype =>
\"core2\"\nstart_position => \"beginning\" \n}\n}\nfilter {\ncsv
{\nseparator => \",\"\ncolumns =>
[\"Id\",\"CameraId\",\"ZoneId\",\"Latitude\",\"Longitude,\"Number\",\"OriginalNumber\",\"Country\",\"OriginalCountry\",\"CapturedOn\"]\n}\n}\noutput
{\nelasticsearch {\naction => \"index\"\nhosts =>
[\"localhost:9200\"]\nindex => \"test\"\ndocument_type =>
\"doc\"\ndocument_id => \"%{Id}\"\nworkers => 1\n}\nstdout { codec =>
rubydebug}\n}\n\n", :reason=>"Expected one of #, {, ,, ] at line 11,
column 61 (byte 225) after filter {\ncsv {\nseparator =>
\",\"\ncolumns =>
[\"Id\",\"CameraId\",\"ZoneId\",\"Latitude\",\"Longitude,\""}
Not sure if you caught this yet, but it's because you are missing a closing " for the column name "Longitude".
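For reference, once the missing quote is added the columns setting reads:
csv {
  columns => [
    "Id","CameraId","ZoneId","Latitude","Longitude","Number","OriginalNumber","Country","OriginalCountry","CapturedOn","Status","Direction","Speed","Confidence","AvgDigitsHeight","MultiplateRate","ProcessingTimeOCR","Signaled","OcrImageId","EnvImageIds","CapturerId","CapturerType","IsAlarm","AlarmListIds","ReplicationId","ImagesUploaded"
  ]
}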

elasticsearch - import csv using logstash date is not parsed as of datetime type

I am trying to import a CSV into Elasticsearch using Logstash.
I have tried two ways:
Using the csv filter
Using the grok filter
1) For csv, below is my Logstash file:
input {
  file {
    path => "path_to_my_csv.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["col1","col2_datetime"]
  }
  mutate { convert => [ "col1", "float" ] }
  date {
    locale => "en"
    match => ["col2_datetime", "ISO8601"]  # also tried: match => ["col2_datetime", "yyyy-MM-dd HH:mm:ss"]
    timezone => "Asia/Kolkata"
    target => "@timestamp"                 # also tried: target => "col2_datetime"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "my_collection"
  }
  stdout {}
}
2) Using the grok filter:
For grok, below is my Logstash file:
input {
  file {
    path => "path_to_my_csv.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => { "message" => "(?<col1>(?:%{BASE10NUM})),(%{TIMESTAMP_ISO8601:col2_datetime})" }
    remove_field => [ "message" ]
  }
  date {
    match => ["col2_datetime", "yyyy-MM-dd HH:mm:ss"]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "my_collection_grok"
  }
  stdout {}
}
PROBLEM:
When I run each file individually, I am able to import the data into Elasticsearch. But my date field is not parsed as a datetime type; it is saved as a string, and because of that I am not able to run date filters on it.
Can someone help me figure out why this is happening?
My Elasticsearch version is 5.4.1.
Thanks in advance.
There are 2 changes I made to your config file:
1) renamed the column col2_datetime to col2
2) added a target to the date filter
Here is how my config file looks:
vi logstash.conf
input {
  file {
    path => "/config-dir/path_to_my_csv.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["col1","col2"]
  }
  mutate { convert => [ "col1", "float" ] }
  date {
    locale => "en"
    match => ["col2", "yyyy-MM-dd HH:mm:ss"]
    target => "col2"
  }
}
output {
  elasticsearch {
    hosts => "http://172.17.0.1:9200"
    index => "my_collection"
  }
  stdout {}
}
Here is the data file:
vi path_to_my_csv.csv
1234365,2016-12-02 19:00:52
1234368,2016-12-02 15:02:02
1234369,2016-12-02 15:02:07
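The target matters because, by default, the date filter writes the parsed value into @timestamp and leaves the source field untouched, so a date block without a target (as in the grok version above) keeps the column as a plain string:
date {
  match => ["col2_datetime", "yyyy-MM-dd HH:mm:ss"]
  # no target: the parsed value goes to @timestamp,
  # and col2_datetime itself stays a string in the document
}
One extra caveat, as a general Elasticsearch point rather than part of the answer above: if the index already exists with the column mapped as text, the mapping will not change on its own, so reindexing into a fresh index may be needed before the field shows up as a date.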
