I am using a folder as input:
input {
  file {
    path => "C:/ProjectName/Uploads/*"
    start_position => "beginning"
    sincedb_path => "NUL"   # Windows equivalent of /dev/null
  }
}
and as output:
output {
  elasticsearch {
    hosts => "localhost"
    index => "manual_index_name" # want filename here
    document_type => "_doc"
  }
}
I want the index in elasticsearch to be the name of the file being indexed.
I've tried variations of this answer without success, as I'm not clear on what it is doing: https://stackoverflow.com/a/40156466/6483906
You'll need to use a grok filter to find the last portion of the filename:
filter {
  grok {
    match => ["path", "Uploads/%{GREEDYDATA:index_name}"]
  }
}
and then reference that field in your index setting: index => "%{index_name}" (note that Elasticsearch index names must be lowercase).
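Putting it together, a minimal sketch of the full pipeline might look like the following; the sincedb_path of NUL (the Windows equivalent of /dev/null) and the mutate/lowercase step are assumptions on my part, the latter added to satisfy the lowercase requirement:
input {
  file {
    path => "C:/ProjectName/Uploads/*"
    start_position => "beginning"
    sincedb_path => "NUL"   # assumption: Windows equivalent of /dev/null
  }
}
filter {
  grok {
    match => ["path", "Uploads/%{GREEDYDATA:index_name}"]
  }
  # assumption: lowercase the extracted name, since index names must be lowercase
  mutate {
    lowercase => ["index_name"]
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "%{index_name}"
    document_type => "_doc"
  }
}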
Related
Elasticsearch and Kibana are both running, but when I use the following command to ingest a CSV file into Elasticsearch, it stops automatically and takes a while to respond:
bin\logstash -f logstash.config
Here is my logstash.config:
input {
  file {
    path => "C:\Users\Sireesha Chapa\Desktop\logstashData.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["id","group","sex","disease","age"]
  }
  mutate { convert => ["id","integer"] }
  mutate { convert => ["age","integer"] }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "health"
    document_type => "patient_record"
  }
  stdout {}
}
Change the name of your logstash config to logstash.conf.
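For example, from the Logstash installation directory (assuming the config file sits there):
ren logstash.config logstash.conf
bin\logstash -f logstash.conf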
I am trying to use a file as an input to Logstash. Here is my logstash.conf:
input {
  file {
    path => "/home/dxp/elb.log"
    type => "elb"
    start_position => "beginning"
    sincedb_path => "/home/dxp/log.db"
  }
}
filter {
  if [type] == "elb" {
    grok {
      match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:loadbalancer} %{IP:client_ip}:%{NUMBER:client_port:int} %{IP:backend_ip}:%{NUMBER:backend_port:int} %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} %{NUMBER:elb_status_code:int} %{NUMBER:backend_status_code:int} %{NUMBER:received_bytes:int} %{NUMBER:sent_bytes:int} %{QS:request}" ]
    }
  }
}
output {
  elasticsearch {
    hosts => "10.99.0.180:9200"
    manage_template => false
    index => "elblog-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
My logs show this:
[2017-10-27T13:11:31,164][DEBUG][logstash.inputs.file] _globbed_files: /home/dxp/elb.log: glob is []
I guess my file has not been read by Logstash, so a new index is not formed in Elasticsearch. Please help me with what I am missing here.
I am trying to import a CSV into Elasticsearch using Logstash.
I have tried two approaches:
Using the csv filter
Using a grok filter
1) For the csv filter, below is my Logstash config:
input {
  file {
    path => "path_to_my_csv.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["col1","col2_datetime"]
  }
  mutate { convert => [ "col1", "float" ] }
  date {
    locale => "en"
    match => ["col2_datetime", "ISO8601"]   # also tried: match => ["col2_datetime", "yyyy-MM-dd HH:mm:ss"]
    timezone => "Asia/Kolkata"
    target => "@timestamp"                  # also tried: target => "col2_datetime"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "my_collection"
  }
  stdout {}
}
2) For the grok filter, below is my Logstash config:
input {
  file {
    path => "path_to_my_csv.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => { "message" => "(?<col1>(?:%{BASE10NUM})),(%{TIMESTAMP_ISO8601:col2_datetime})" }
    remove_field => [ "message" ]
  }
  date {
    match => ["col2_datetime", "yyyy-MM-dd HH:mm:ss"]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "my_collection_grok"
  }
  stdout {}
}
PROBLEM:
When I run both files individually, I am able to import the data into Elasticsearch, but my date field is not parsed as a datetime type; it is saved as a string, and because of that I am not able to use date filters on it.
Can someone help me figure out why this is happening?
My Elasticsearch version is 5.4.1.
Thanks in advance
There are two changes I made to your config file:
1) Rename the column col2_datetime to col2 (dropping the underscored suffix).
2) Add a target to the date filter.
Here is what my config file looks like:
vi logstash.conf
input {
  file {
    path => "/config-dir/path_to_my_csv.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["col1","col2"]
  }
  mutate { convert => [ "col1", "float" ] }
  date {
    locale => "en"
    match => ["col2", "yyyy-MM-dd HH:mm:ss"]
    target => "col2"
  }
}
output {
  elasticsearch {
    hosts => "http://172.17.0.1:9200"
    index => "my_collection"
  }
  stdout {}
}
Here is the data file:
vi path_to_my_csv.csv
1234365,2016-12-02 19:00:52
1234368,2016-12-02 15:02:02
1234369,2016-12-02 15:02:07
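If the index already existed with col2 mapped as a string, you may need to delete and re-create it (or reindex) before the new mapping takes effect. You can then check the mapping to confirm that col2 is now a date, adjusting the host to wherever your Elasticsearch is running:
curl -XGET "http://172.17.0.1:9200/my_collection/_mapping?pretty"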
After importing my Elasticsearch documents from a CSV file using Logstash, the documents have their ID values set to long alphanumeric strings. How can I have each document ID set to a numeric value instead?
Here is basically what my logstash config looks like:
input {
  file {
    path => "/path/to/movies.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["title","director","year","country"]
    separator => ","
  }
  mutate {
    convert => {
      "year" => "integer"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "movie"
    document_type => "movie"
  }
  stdout {}
}
The first and easiest option is to add a new ID column to your CSV and use that field as the document id, as sketched below.
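A rough sketch of that first option, assuming a hypothetical id column added as the first field of each CSV row:
filter {
  csv {
    # "id" is the hypothetical new column; the rest come from your config
    columns => ["id","title","director","year","country"]
    separator => ","
  }
  mutate {
    convert => { "year" => "integer" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "movie"
    document_type => "movie"
    document_id => "%{id}"   # use the CSV-provided id as the document id
  }
}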
Another option is to use a ruby filter that adds a dynamic ID to your events. The downside of this solution is that if your CSV changes and you re-run your pipeline, each document might not get the same ID. Another downside is that you need to run your pipeline with only one worker (i.e. with -w 1), because the ID counter is not safely shared between pipeline workers.
filter {
  csv {
    columns => ["title","director","year","country"]
    separator => ","
  }
  mutate {
    convert => {
      "year" => "integer"
    }
  }
  # create an incrementing ID (instance variable so the counter persists across events)
  ruby {
    init => "@id_seq = 0"
    code => "
      event.set('id', @id_seq)
      @id_seq += 1
    "
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "movie"
    document_type => "movie"
    document_id => "%{id}"
  }
  stdout {}
}
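Remember to start the pipeline with a single worker so the counter increments safely, for example:
bin/logstash -f logstash.conf -w 1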
I'm trying to load a CSV file into Elasticsearch through Logstash.
This is my configuration file:
input {
  file {
    codec => plain {
      charset => "ISO-8859-1"
    }
    path => ["PATH/*.csv"]
    sincedb_path => "PATH/.sincedb_path"
    start_position => "beginning"
  }
}
filter {
  if [message] =~ /^"ID","DATE"/ {
    drop { }
  }
  date {
    match => [ "DATE", "yyyy-MM-dd HH:mm:ss" ]
    target => "DATE"
  }
  csv {
    columns => ["ID","DATE",...]
    separator => ","
    source => "message"
    remove_field => ["message","host","path","@version","@timestamp"]
  }
}
output {
  elasticsearch {
    embedded => false
    host => "localhost"
    cluster => "elasticsearch"
    node_name => "localhost"
    index => "index"
    index_type => "type"
  }
}
Now, the mapping produced in Elasticsearch types the DATE field as a string. I would like it to be typed as a date field.
In the filter section, I tried to convert the field to a date, but it doesn't work.
How can I fix that?
Regards,
Alexandre
You have your filter chain set up in the wrong order: the date {} block needs to come after the csv {} block, otherwise the DATE field does not exist yet when the date filter runs.
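A minimal sketch of the reordered filter section, keeping your field names (the remaining columns are omitted here, as in your question):
filter {
  if [message] =~ /^"ID","DATE"/ {
    drop { }
  }
  csv {
    columns => ["ID","DATE"]   # plus the rest of your columns
    separator => ","
    source => "message"
  }
  date {
    match => [ "DATE", "yyyy-MM-dd HH:mm:ss" ]
    target => "DATE"
  }
}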