Logstash does not insert into elasticsearch (from file piped into stdin) - elasticsearch

I am trying to insert the NYC taxi data into Elasticsearch via Logstash, using the command bin/logstash -f myconf.conf < /trip_data1.csv. But all I get is...
Logstash startup completed
Logstash shutdown completed
Apparently nothing happened, and no index has been created or modified in Elasticsearch either.
What am I doing wrong?
Here is my conf file:
input {
  stdin {}
}
filter {
  csv {
    columns => ["medallion","hack_license","vendor_id","rate_code","store_and_fwd_flag","pickup_datetime","dropoff_datetime","passenger_count","trip_time_in_secs","trip_distance","pickup_longitude","pickup_latitude","dropoff_longitude","dropoff_latitude"]
    separator => ","
  }
}
output {
  elasticsearch {
    index => "samples1"
    document_type => "sample"
  }
  stdout { codec => rubydebug }
}
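For reference, whether the index was actually created can be checked directly against Elasticsearch with the cat indices API (a minimal check, assuming the default localhost:9200):
curl 'localhost:9200/_cat/indices?v'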

Related

Logstash not importing data

I am working on an ELK stack setup and I want to import data from a CSV file on my PC into Elasticsearch via Logstash. Elasticsearch and Kibana are working properly.
Here is my logstash.conf file:
input {
  file {
    path => "C:/Users/aron/Desktop/es/archive/weapons.csv"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  csv {
    separator => ","
    columns => ["name", "type", "country"]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200/"]
    index => "weapons"
    document_type => "ww2_weapon"
  }
  stdout {}
}
And a sample row from my .csv file looks like this:
Name,Type,Country
10.5 cm Kanone 17,Field Gun,Germany
German characters are all showing up.
I am running logstash via: logstash.bat -f path/to/logstash.conf
It starts working, but it freezes and becomes unresponsive along the way; here is a screenshot of stdout.
In kibana, it created the index and imported 2 documents but the data is all messed up. What am I doing wrong?
If your task is only to import that CSV, you'd be better off using the file upload in Kibana.
Should be available under the following link (for Kibana > v8):
<your Kibana domain>/app/home#/tutorial_directory/fileDataViz
Logstash is used if you want to do this job on a regular basis with new files coming in over time.
You can try the one below. It is running perfectly on my machine.
input {
  file {
    path => "path/filename.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}
filter {
  csv {
    separator => ","
    columns => ["field1","field2",...]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => "https://localhost:9200"
    user => "username"        # if any
    password => "password"    # if any
    index => "indexname"
    document_type => "doc_type"
  }
}
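One thing to double-check in the snippet above: sincedb_path is just a file path, so "NULL" creates a tracking file literally named NULL. To disable position tracking and have the file re-read from the beginning on every run, the usual values (as used elsewhere on this page) are:
sincedb_path => "NUL"        # Windows
sincedb_path => "/dev/null"  # Linux/macOS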

Elasticsearch not receiving input from logstash

I'm running logstash where the output is set to elasticsearch on my localhost. However, when I open up elasticsearch, it appears that it did not receive any data from logstash. Logstash parses the csv file correctly, as I can see by the output in the terminal.
I've tried modifying the conf file, but the problem remains. The conf file is below:
input {
  file {
    path => "/Users/kevinliu/Desktop/logstash_tutorial/gg.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["name","price","unit","url"]
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "gg-prices"
  }
  stdout {}
}
When I access localhost:9200/ I just see the default "You Know, for Search" message from Elasticsearch.
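The root URL only returns that banner regardless of what has been indexed; a more direct check is to query the target index itself (a minimal check, assuming the default localhost:9200):
curl 'localhost:9200/gg-prices/_count?pretty'
curl 'localhost:9200/gg-prices/_search?pretty'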

How to avoid elasticsearch duplicate documents

How do I avoid elasticsearch duplicate documents?
The Elasticsearch index docs count (20,010,253) doesn't match the log line count (13,411,790).
documentation:
File input plugin.
File rotation is detected and handled by this input,
regardless of whether the file is rotated via a rename or a copy operation.
nifi:
real time nifi pipeline copies logs from nifi server to elk server.
nifi has rolling log files.
logs line count on elk server:
wc -l /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
13,411,790 total
elasticsearch index docs count:
curl -XGET 'ip:9200/_cat/indices?v&pretty'
docs.count = 20,010,253
logstash input conf file:
cat /mnt/elk/logstash/input_conf_files/test_4.conf
input {
  file {
    path => "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log"
    type => "test_4"
    sincedb_path => "/mnt/elk/logstash/scripts/sincedb/test_4"
  }
}
filter {
  if [type] == "test_4" {
    grok {
      match => {
        "message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
      }
    }
  }
}
output {
  if [type] == "test_4" {
    elasticsearch {
      hosts => "ip:9200"
      index => "test_4"
    }
  }
  else {
    stdout {
      codec => rubydebug
    }
  }
}
You can use the fingerprint filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html
This can e.g. be used to create consistent document ids when inserting
events into Elasticsearch, allowing events in Logstash to cause
existing documents to be updated rather than new documents to be
created.
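For example, a minimal sketch of that approach (the MURMUR3 method is just one choice) hashes each log line into [@metadata][fingerprint] and uses it as the document id, so a line that is read twice overwrites the same document instead of creating a duplicate:
filter {
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "MURMUR3"
  }
}
output {
  elasticsearch {
    hosts => "ip:9200"
    index => "test_4"
    document_id => "%{[@metadata][fingerprint]}"
  }
}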

using logstash to get data out of elasticsearch to a csv file

I am new to Logstash and am struggling to use it to get data out of Elasticsearch as a CSV.
To create some sample data, we can first add a basic csv into elasticsearch... the head of the sample csv can be seen below
$ head uu.csv
"hh","hh1","hh3","id"
-0.979646332669359,1.65186132910743,"L",1
-0.283939374784435,-0.44785377794233,"X",2
0.922659898930901,-1.11689020559612,"F",3
0.348918777124474,1.95766948269957,"U",4
0.52667811182958,0.0168862169880919,"Y",5
-0.804765331279075,-0.186456470768865,"I",6
0.11411203100637,-0.149340801708981,"Q",7
-0.952836952412902,-1.68807271639322,"Q",8
-0.373528919496876,0.750994450392907,"F",9
I then put that into logstash using the following...
$ cat uu.conf
input {
  stdin {}
}
filter {
  csv {
    columns => [
      "hh","hh1","hh3","id"
    ]
  }
  if [hh1] == "hh1" {
    drop { }
  } else {
    mutate {
      remove_field => [ "message", "host", "@timestamp", "@version" ]
    }
    mutate {
      convert => { "hh" => "float" }
      convert => { "hh1" => "float" }
      convert => { "hh3" => "string" }
      convert => { "id" => "integer" }
    }
  }
}
output {
  stdout { codec => dots }
  elasticsearch {
    index => "temp_index"
    document_type => "temp_doc"
    document_id => "%{id}"
  }
}
This is put into logstash with the following command....
$ cat uu.csv | logstash-2.1.3/bin/logstash -f uu.conf
Settings: Default filter workers: 16
Logstash startup completed
....................................................................................................Logstash shutdown completed
So far so good, but I would like to get some of the data out, in particular the hh and hh3 fields in temp_index.
I wrote the following to extract the data out of elasticsearch into a csv.
$ cat yy.conf
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "temp_index"
    query => "*"
  }
}
filter {
  elasticsearch {
    add_field => {"hh" => "%{hh}"}
    add_field => {"hh3" => "%{hh3}"}
  }
}
output {
  stdout { codec => dots }
  csv {
    fields => ['hh','hh3']
    path => '/home/username/yy.csv'
  }
}
But I get the following error when trying to run logstash...
$ logstash-2.1.3/bin/logstash -f yy.conf
The error reported is:
Couldn't find any filter plugin named 'elasticsearch'. Are you sure this is correct? Trying to load the elasticsearch filter plugin resulted in this error: no such file to load -- logstash/filters/elasticsearch
What do I need to change in yy.conf so that a logstash command will extract the data out of elasticsearch and write it into a new csv called yy.csv?
UPDATE
changing yy.conf to be the following...
$ cat yy.conf
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "temp_index"
    query => "*"
  }
}
filter {}
output {
  stdout { codec => dots }
  csv {
    fields => ['hh','hh3']
    path => '/home/username/yy.csv'
  }
}
I got the following error...
$ logstash-2.1.3/bin/logstash -f yy.conf
Settings: Default filter workers: 16
Logstash startup completed
A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::Elasticsearch hosts=>["localhost:9200"], index=>"temp_index", query=>"*", codec=><LogStash::Codecs::JSON charset=>"UTF-8">, scan=>true, size=>1000, scroll=>"1m", docinfo=>false, docinfo_target=>"@metadata", docinfo_fields=>["_index", "_type", "_id"], ssl=>false>
Error: [400] {"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"init_scan","grouped":true,"failed_shards":[{"shard":0,"index":"temp_index","node":"zu3E6F7kQRWnDPY5L9zF-w","reason":{"type":"parse_exception","reason":"Failed to derive xcontent"}}]},"status":400} {:level=>:error}
A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::Elasticsearch hosts=>["localhost:9200"], index=>"temp_index", query=>"*", codec=><LogStash::Codecs::JSON charset=>"UTF-8">, scan=>true, size=>1000, scroll=>"1m", docinfo=>false, docinfo_target=>"@metadata", docinfo_fields=>["_index", "_type", "_id"], ssl=>false>
Error: [400] {"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"init_scan","grouped":true,"failed_shards":[{"shard":0,"index":"temp_index","node":"zu3E6F7kQRWnDPY5L9zF-w","reason":{"type":"parse_exception","reason":"Failed to derive xcontent"}}]},"status":400} {:level=>:error}
A plugin had an unrecoverable error. Will restart this plugin.
Interestingly... if I change yy.conf to remove elasticsearch {}, so that it looks like...
$ cat yy.conf
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "temp_index"
    query => "*"
  }
}
filter {
  add_field => {"hh" => "%{hh}"}
  add_field => {"hh3" => "%{hh3}"}
}
output {
  stdout { codec => dots }
  csv {
    fields => ['hh','hh3']
    path => '/home/username/yy.csv'
  }
}
I get the following error...
$ logstash-2.1.3/bin/logstash -f yy.conf
Error: Expected one of #, { at line 10, column 19 (byte 134) after filter {
add_field
You may be interested in the '--configtest' flag which you can
use to validate logstash's configuration before you choose
to restart a running system.
Also, when changing yy.conf to something similar, to take the error message into account...
$ cat yy.conf
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "temp_index"
    query => "*"
  }
}
filter {
  add_field {"hh" => "%{hh}"}
  add_field {"hh3" => "%{hh3}"}
}
output {
  stdout { codec => dots }
  csv {
    fields => ['hh','hh3']
    path => '/home/username/yy.csv'
  }
}
I get the following error...
$ logstash-2.1.3/bin/logstash -f yy.conf
The error reported is:
Couldn't find any filter plugin named 'add_field'. Are you sure this is correct? Trying to load the add_field filter plugin resulted in this error: no such file to load -- logstash/filters/add_field
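For what it's worth, add_field is not a standalone filter but an option of filters such as mutate, so the syntax the parser expects would look like the sketch below (the field name is purely illustrative). As the answer explains, though, no filter is needed here at all.
filter {
  mutate {
    # "hh_copy" is just an illustrative field name
    add_field => { "hh_copy" => "%{hh}" }
  }
}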
UPDATE 2
Thanks to Val I have made some progress and started to get output, but it doesn't seem correct. I get the following output when running the following commands...
$ cat uu.csv | logstash-2.1.3/bin/logstash -f uu.conf
Settings: Default filter workers: 16
Logstash startup completed
....................................................................................................Logstash shutdown completed
$ logstash-2.1.3/bin/logstash -f yy.conf
Settings: Default filter workers: 16
Logstash startup completed
....................................................................................................Logstash shutdown completed
$ head uu.csv
"hh","hh1","hh3","id"
-0.979646332669359,1.65186132910743,"L",1
-0.283939374784435,-0.44785377794233,"X",2
0.922659898930901,-1.11689020559612,"F",3
0.348918777124474,1.95766948269957,"U",4
0.52667811182958,0.0168862169880919,"Y",5
-0.804765331279075,-0.186456470768865,"I",6
0.11411203100637,-0.149340801708981,"Q",7
-0.952836952412902,-1.68807271639322,"Q",8
-0.373528919496876,0.750994450392907,"F",9
$ head yy.csv
-0.106007607975644E1,F
0.385395589205671E0,S
0.722392598488791E-1,Q
0.119773830827963E1,Q
-0.151090510772458E1,W
-0.74978830916084E0,G
-0.98888121700762E-1,M
0.965827615823707E0,S
-0.165311094671424E1,F
0.523818819076447E0,R
Any help would be much appreciated...
You don't need that elasticsearch filter; just specify the fields you want in your CSV in the csv output like you did and you should be fine. The fields you need are already contained in the event; you simply need to list them in the fields list of the csv output, as simple as that.
Concretely, your config file should look like this:
$ cat yy.conf
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "temp_index"
  }
}
filter {
}
output {
  stdout { codec => dots }
  csv {
    fields => ['hh','hh3']
    path => '/home/username/yy.csv'
  }
}
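As a side note on the original error: the query option of the elasticsearch input expects a JSON query body, not a bare string like "*", which is why Elasticsearch responded with "Failed to derive xcontent". If you do want to keep an explicit query, a minimal sketch would be:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "temp_index"
    query => '{ "query": { "match_all": {} } }'
  }
}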

Logstash configuration issue

I am getting started with Logstash and Elasticsearch, and I would like to index .pdf or .doc files in Elasticsearch via Logstash.
I configured Logstash with the multiline codec to get each file into a single message in Elasticsearch. Below is my configuration file:
input {
  file {
    path => "D:/BaseCV/*"
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => ""
      what => "previous"
    }
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
  elasticsearch {
    hosts => "localhost"
    index => "cvindex"
    document_type => "file"
  }
}
When Logstash starts, the first file I add is recovered in Elasticsearch as a single message, but the following files are spread over several messages. I would like a one-to-one correspondence: 1 file = 1 message.
Is this possible? What should I change in my setup to solve the problem?
Thank you for your feedback.
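One thing worth checking (an assumption, not a confirmed diagnosis): the multiline codec limits how much it will merge into a single event through its max_lines and max_bytes options, so large files can be split across several messages even when the pattern joins every line. A sketch that raises those limits:
codec => multiline {
  pattern => ""
  what => "previous"
  max_lines => 10000      # default is 500 lines
  max_bytes => "50 MiB"   # default is 10 MiB
}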
