I am trying to generate various types in the same index, one per CSV file. As I don't know how many CSV files there will be, making an input for each one would not be viable.
So does anyone know how to generate types named after the files and load each CSV into its corresponding type?
input {
file {
path => "/home/user/Documents/data/*.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
separator => ","
skip_header => "true"
autodetect_column_names => true
autogenerate_column_names => true
}
}
output {
elasticsearch {
hosts => "http://localhost:9200"
index => "final_index"
}
stdout {}
}
Thank you so much
Having multiple document structures in the same index has been removed from Elasticsearch since version 6. If a document does not look the way the index is templated, the data cannot be sent to it. What you can do is make sure that all fields are known and have one general template containing all possible fields.
Is there a reason why you want all of it in one index?
If it is for querying purposes or Kibana, do know that you can use wildcards when searching and define index patterns for Kibana.
Update after your comment:
Use a filter to extract the filename using grok
filter {
grok {
match => ["path","%{GREEDYDATA}/%{GREEDYDATA:filename}\.csv"]
}
}
And use the filename in your output like this:
elasticsearch {
hosts => "http://localhost:9200"
index => "final_index-%{[filename]}"
}
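One caveat worth adding: Elasticsearch index names cannot contain uppercase characters, so if the CSV filenames are mixed-case it is safer to lowercase the extracted field before it reaches the output. A minimal sketch of the extra filter step, reusing the filename field from the grok above:
filter {
  mutate {
    # index names must be lowercase, so normalise the extracted filename
    lowercase => ["filename"]
  }
}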
I started with ELK a week back to use it for storing multiple CSVs and getting them into Kibana for ease of analysing them. One case will involve multiple machines, and one machine will generate many CSVs. These CSVs have a particular naming pattern. I am taking one particular file (BrowsingHistoryView_DMZ-machine1.csv) for reference and setting up the case as the index. To define an index I've chosen to rename the files to have a prefix of __case_number__, so the file name will be __1__BrowsingHistoryView_DMZ-machine1.csv
Now I want to derive three things out of it:
1. Get the case number __1__ and use 1 as the index. 1, 2, 3 etc. will be used as case numbers.
2. Get the file type (BrowsingHistoryView for example) and add a tag name to the uploaded file.
3. Get the machine name DMZ-machine1 (I don't know yet where I'll use it).
I created a config file for it, which is shown below:
input {
file {
path => "/home/kriss/Documents/*.csv" # get the files from Documents
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
separator => ","
}
if [path] =~ "BrowsingHistory" { mutate { add_tag => ["Browsinghistory"] } # define a new tag for browser history, this worked
grok { match => ["path", "__(?<case>[0-9]+)__(?<category>\w+_)(?<machine>(.+)).csv"] # This regex pattern is to get category(browsingHistory), MachineName
}
}
if [path] =~ "Inbound_RDP_Events" { mutate { add_tag => {"Artifact" => "RDP" } } }
} # This tagging worked
output {
elasticsearch {
hosts => "localhost"
index => "%{category}" # This referencing the category variable didn't work
}
stdout {}
}
When I run this config in Logstash, the index generated is literally %{category}. I need it to capture browser_history as the index for that file. I would also like to convert the category to lowercase, since uppercase letters are not allowed in index names. I tried to follow the official documentation but didn't get the complete information I need.
There's a grok debugger in Dev Tools in Kibana you can use to work on these kinds of problems, or an online one at https://grokdebug.herokuapp.com/ - it's great.
Below is a slightly modified version of your config. I've removed your comments and inserted my own.
The changes are:
The path regex in your config doesn't match the example filename you gave. You might want to change it back, depending on how accurate your example was.
The grok pattern has been tweaked
Changed your Artifact tag to a field, because it looks like you're trying to create a field
I tried to stick to your spacing convention :)
input {
file {
path => "/home/kriss/Documents/*.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
separator => ","
}
# I replaced your regex with something that matches your example
# filename, but given that you said you already had this
# working, you might want to change it back.
if [path] =~ "browser_history" {
mutate { add_tag => ["Browsinghistory"] }
grok {
# I replaced custom captures with a more grokish style, and
# use GREEDYDATA to capture everything up to the last '_'
match => [ "path", "_+%{NUMBER:case}_+%{GREEDYDATA:category}_+%{DATA:machine}\.csv$" ]
}
}
# Replaced `add_tag` with `add_field` so that the syntax makes sense
if [path] =~ "Inbound_RDP_Events" { mutate { add_field => {"Artifact" => "RDP" } } }
# Added the `mutate` filter's `lowercase` function for "category"
mutate {
lowercase => ["category"]
}
}
output {
elasticsearch {
hosts => "localhost"
index => "%{category}"
}
stdout {}
}
Not tested, but I hope it gives you enough clues.
So, for reference, for anyone who is trying to use custom variables in a Logstash config file, below is the working config:
input {
file {
path => "/home/user1/Documents/__1__BrowsingHistoryView_DMZ-machine1.csv" # Getting the absolte path (necessary)
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
separator => ","
}
if [path] =~ "BrowsingHistory" { mutate { add_field => {"artifact" => "Browsinghistory"} } # if BrowsingHistory is found in path, add a tag called Browsinghistory
grok { match => ["path", "__(?<case>[0-9]+)__(?<category>\w+_)(?<machine>(.+)).csv"] # get the caseNumber, logCategory, machineName into variables
}
}
if [path] =~ "Inbound_RDP_Events" { mutate { add_field => {"artifact" => "RDP"} } } # another tag if RDP event file found in path
}
output {
elasticsearch {
hosts => "localhost"
index => "%{case}" # passing the variable value derived from regex
# index => "%{category}" # another regex variable
# index => "%{machine}" # another regex variable
}
stdout {}
}
I wasn't very sure whether to add a new tag or a new field (add_field => {"artifact" => "Browsinghistory"}) for easy identification of a file in Kibana. If someone could provide some info on how to choose between them, that would help.
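As a general note, a tag and a field behave differently in Logstash: add_tag appends a value to the event's built-in tags array, which works well as a simple marker you can filter on, while add_field creates a separate named key/value pair that is usually easier to filter and aggregate on in Kibana. A minimal sketch showing the two side by side (the names used here are just illustrative):
filter {
  # simple marker: appends "Browsinghistory" to the tags array
  mutate { add_tag => ["Browsinghistory"] }
  # named field: easier to build Kibana filters and aggregations on
  mutate { add_field => { "artifact" => "Browsinghistory" } }
}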
How can I load binary files (pdf, xls, other...) with Logstash without changing their content?
Currently I try to load them with:
input {
file {
path => "C:/path/files/*"
type => "gesamt"
start_position => "beginning"
sincedb_path => "NUL"
}
}
filter {
multiline {
pattern => "/.*./gesamt"
negate => true
what => "previous"
}
base64 {
field => "blob"
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "index-name"
pipeline=>"test-pipeline"
}
}
It seems that the multiline filter damages the binary content.
You cannot just dump binary files into Elasticsearch; this will not make them searchable, and a filesystem might be better suited to hold them.
If you want to make them searchable, you might want to take a look at the ingest attachment processor.
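For what it's worth, the Logstash side of that setup stays thin: the elasticsearch output just points at an ingest pipeline (much like the pipeline => "test-pipeline" setting in the config above), and that pipeline has to be created in Elasticsearch beforehand with the attachment processor, which reads base64-encoded data from a field of the document. A rough sketch of the output side, with a hypothetical pipeline name:
output {
  elasticsearch {
    hosts    => ["http://localhost:9200"]
    index    => "index-name"
    # hypothetical pipeline name; the pipeline must exist in Elasticsearch
    # and contain the attachment processor, which expects base64-encoded data
    pipeline => "attachment-pipeline"
  }
}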
I am trying to update a specific field in Elasticsearch through Logstash. Is it possible to update only a set of fields through Logstash?
Please find the code below:
input {
file {
path => "/**/**/logstash/bin/*.log"
start_position => "beginning"
sincedb_path => "/dev/null"
type => "multi"
}
}
filter {
csv {
separator => "|"
columns => ["GEOREFID","COUNTRYNAME", "G_COUNTRY", "G_UPDATE", "G_DELETE", "D_COUNTRY", "D_UPDATE", "D_DELETE"]
}
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
query => "GEOREFID:%{GEOREFID}"
fields => [["JSON_COUNTRY","G_COUNTRY"],
["XML_COUNTRY","D_COUNTRY"]]
}
if [G_COUNTRY] {
mutate {
update => { "D_COUNTRY" => "%{D_COUNTRY}"
}
}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
document_id => "%{GEOREFID}"
}
}
We are using the above configuration, but when we use it the null-value field gets removed instead of the null-value update being skipped.
Data comes from 2 different sources. One is from an XML file and the other is from a JSON file.
XML log format : GEO-1|CD|23|John|892|Canada|31-01-2017|QC|-|-|-|-|-
JSON log format : GEO-1|AS|33|-|-|-|-|-|Mike|123|US|31-01-2017|QC
When adding one log, a new document gets created in the index. When reading the second log file, the existing document should be updated. The update should happen only in the first 5 fields if the log file is XML and the last 5 fields if the log file is JSON. Please suggest how to do this in Logstash.
I tried with the above code. Please check, and can anyone help with how to fix this?
For the Elasticsearch output to perform any action other than index, you need to tell it to do something else.
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
action => "update"
document_id => "%{GEOREFID}"
}
This should probably be wrapped in a conditional to ensure you're only updating records that need updating. There is another option, though: doc_as_upsert.
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
action => "update"
doc_as_upsert => true
document_id => "%{GEOREFID}"
}
This tells the plugin to insert if it is new, and update if it is not.
However, you're attempting to use two inputs to define a document. This makes things complicated. Also, you're not providing both inputs, so I'll improvise. To provide different output behavior, you will need to define two outputs.
input {
file {
path => "/var/log/xmlhome.log"
[other details]
}
file {
path => "/var/log/jsonhome.log"
[other details]
}
}
filter { [some stuff ] }
output {
if [path] == '/var/log/xmlhome.log' {
elasticsearch {
[XML file case]
}
} else if [path] == '/var/log/jsonhome.log' {
elasticsearch {
[JSON file case]
action => "update"
}
}
}
Setting it up like this will allow you to change the Elasticsearch behavior based on where the event originated.
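As a rough illustration of how those placeholders might be filled in (reusing the hosts, index, and GEOREFID document id from the question's own config), the XML branch can index normally while the JSON branch issues an update:
output {
  if [path] == '/var/log/xmlhome.log' {
    elasticsearch {
      hosts       => ["localhost:9200"]
      index       => "logstash-data-monitor"
      document_id => "%{GEOREFID}"
    }
  } else if [path] == '/var/log/jsonhome.log' {
    elasticsearch {
      hosts         => ["localhost:9200"]
      index         => "logstash-data-monitor"
      action        => "update"
      doc_as_upsert => true   # also inserts if the XML side hasn't created the document yet
      document_id   => "%{GEOREFID}"
    }
  }
}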
I am indexing a large corpus of information and I have a string key that I know is unique. I would like to avoid using search and instead access documents by this artificial identifier.
Since the Path directive is discontinued in ES 1.5, does anyone know a workaround for this problem?
My data look like:
{unique-string},val1, val2, val3...
{unique-string2},val4, val5, val6...
I am using logstash to index the files and would prefer to fetch the documents through a direct get, rather than through an exact-match.
In your elasticsearch output plugin, just specify the document_id setting with a reference to the field you want to use as id, i.e. the one named 1 in your csv filter.
input {
file {...}
}
filter {
csv{
columns=>["1","2","3"]
separator => ","
}
}
output {
elasticsearch {
action => "index"
host => "localhost"
port => "9200"
index => "index-name"
document_id => "%{1}" # <--- add this line
workers => 2
cluster => "elasticsearch-cluster"
protocol => "http"
}
}
Basic is a float field. The mentioned index is not present in Elasticsearch. When running the config file with logstash -f, I get no exception. Yet the data entered into Elasticsearch shows the mapping of Basic as string. How do I rectify this? And how do I do this for multiple fields?
input {
file {
path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
type => "promosms_dec15"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
grok{
match => [
"Basic", " %{NUMBER:Basic:float}"
]
}
csv {
columns => ["Generation_Date","Basic"]
separator => ","
}
ruby {
code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
}
}
output {
elasticsearch {
action => "index"
host => "localhost"
index => "promosms-%{+dd.MM.YYYY}"
workers => 1
}
}
You have two problems. First, your grok filter is listed prior to the csv filter and because filters are applied in order there won't be a "Basic" field to convert when the grok filter is applied.
Secondly, unless you explicitly allow it, grok won't overwrite existing fields. In other words,
grok{
match => [
"Basic", " %{NUMBER:Basic:float}"
]
}
will always be a no-op. Either specify overwrite => ["Basic"] or, preferably, use mutate's type conversion feature:
mutate {
convert => ["Basic", "float"]
}
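Putting both fixes together, the filter section would look roughly like this (a sketch, keeping the original ruby date handling):
filter {
  csv {
    columns => ["Generation_Date","Basic"]
    separator => ","
  }
  # convert only after the csv filter has created the Basic field
  mutate {
    convert => ["Basic", "float"]
  }
  ruby {
    code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
  }
}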