Yesterday I configured a Logstash file to send data to Elasticsearch.
Today I'm trying to do the same (configure another file), but it doesn't work!
Why? What should I do?
The terminal just shows me that the pipeline started and the pipelines are running, that's all.
This is the configuration:
input {
  file {
    path => "C:\Users\GeeksData\Desktop\ElasticSerach\GENERIC_FUFR0004_20171017_173013379.SyntaxicError.txt"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "helloworld3"
    document_type => "helloworld3"
  }
  stdout {}
}
I've added this line to the input plugin:
sincedb_path => "NUL"
Now it works.
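Putting it together, the input block with the Windows fix would look like this ("NUL" is the Windows equivalent of /dev/null, so Logstash never remembers how far it has read):

input {
  file {
    path => "C:\Users\GeeksData\Desktop\ElasticSerach\GENERIC_FUFR0004_20171017_173013379.SyntaxicError.txt"
    start_position => "beginning"
    sincedb_path => "NUL"    # Windows null device; no read position is persisted, so the file is re-read every run
  }
}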
It's not really an answer, but I no longer have problems ingesting data with Logstash.
Some important information you need to know if you have problems ingesting data with Logstash:
1- The name of the index, hosts and document_type need to be lowercase (see the sketch after this list).
2- Logstash doesn't re-ingest data that has already been ingested, unless you have changed something (like the name of the index) in the configuration file.
3- You need to create an index pattern in Kibana and link it to the index created by Elasticsearch to be able to visualize that index's data in Kibana.
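For instance, a minimal output block that respects rule 1 (the host and names here are assumptions, not taken from the question):

output {
  elasticsearch {
    hosts => "localhost:9200"        # assumed local node
    index => "helloworld3"           # lowercase only
    document_type => "helloworld3"   # lowercase only
  }
}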
I need to configure an ELK pipeline such that I can give the input node a directory and each new file written to the directory will be parsed.
I currently use the Logstash file input, but if the same file is written twice, Logstash ignores it.
What I've tried:
Setting sincedb_path => "/dev/null"
Setting sincedb_clean_after => 0
Setting file_completed_action => "delete"
These work only if I run Logstash as root (docker-compose -> user: root).
Is there any other way to accomplish this without running Logstash as root?
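For reference, a rough sketch of the input block these settings describe (the path is a placeholder; note that file_completed_action requires mode => "read"):

input {
  file {
    path => "/data/incoming/*.log"       # hypothetical watched directory
    mode => "read"                       # read whole files instead of tailing; required for file_completed_action
    sincedb_path => "/dev/null"          # don't persist read positions
    sincedb_clean_after => 0             # expire sincedb entries immediately
    file_completed_action => "delete"    # remove each file once fully read
  }
}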
So I have a fairly modest Logstash setup for Apache logs that I am using on RedHat 7 (production) as well as macOS High Sierra (10.13.6) for development, and something odd has happened since upgrading from Logstash version 6.3.2 to 6.4.1. I am using Homebrew on macOS to install and update Logstash, and these issues persist even if I “nuke” my installed Homebrew items and reinstall.
Straight to the point.
Simply put, static data input files are not being read and ingested on startup in 6.4.1 as they once were in 6.3.2 and earlier. In 6.4.1 I need to manually cat log lines to the target path for Logstash to “wake up” and pick up these new lines, even if I designate the new read mode.
At the end of the day, this setup doesn’t need a sincedb and can be restarted to read each file from head to end, and we are all happy… At least until Logstash 6.4.1… Now nobody is happy. What can be done to force Logstash to always read data from the beginning of files no matter what?
Details and discovery.
The Logstash setup I am using just does some filtering of Apache logs for input. The input config I am using reads as follows; note that the file path is slightly tweaked for privacy but is effectively exactly what I am using right now and have been using for the past year or so without issue:
input {
  file {
    path => "/opt/logstash/coolapp/access_log*"
    exclude => "*.gz"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
    close_older => 3600
    stat_interval => 1
    discover_interval => 15
  }
}
The way I am using this for local development is simply getting a copy of the remote Apache server logs and placing them in that /opt/logstash/coolapp/ directory.
Then I start up Logstash via the command line with the -f option set so my coolapp-apache.conf is read:
logstash -f coolapp-apache.conf
Logstash starts up locally and emits its pile of startup status messages until this final message:
[2018-09-24T12:40:09,458][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
Which to me indicates it’s fully up and running. Checking my data collection output then shows (if it is working) a flow of data pouring in… But when using Logstash 6.4.1 I see no data flowing in.
File input plugin works with tail mode.
Checking the newly updated documentation for the file input plugin (v4.1.5) shows there is a new mode option that has a read mode and a tail mode. Knowing that the default mode is tail I tested the setup by doing the following after starting up my local Logstash debugging setup. First I copied the access_log as follows:
cp /opt/logstash/coolapp/access_log /opt/logstash/coolapp/access_log_BAK
Then I zeroed out the main access_log file using :> like this:
:> /opt/logstash/coolapp/access_log
And finally I used cat to write the copied file’s data back into the original (now empty) file like this:
cat /opt/logstash/coolapp/access_log_BAK > /opt/logstash/coolapp/access_log
The second I did that, lo and behold, the data started to flow as expected! I guess the new file input plugin is focused on tailing a file more than reading it? Anyway, that works, but it is clearly annoying. I don’t develop like this. I need Logstash to simply read the files and parse them.
File input plugin not working with read mode.
So I tried using the following setup to just read the files based on what I saw in the official Logstash file input mode documentation:
input {
  file {
    path => "/opt/logstash/coolapp/access_log"
    mode => "read"
    file_completed_action => "log"
    file_completed_log_path => "/Users/Giacomo1968/Desktop/access_log_foo"
  }
}
Of course, a name like access_log_foo is just a proof-of-concept file name for testing, but when all is said and done this read mode utterly does not work on macOS. I have even tried changing the path to something like my desktop and it doesn’t work. And the whole “zero out and then append a file” trick I used in the “tail mode” explanation above doesn’t cut it here, since the file is not being tailed, I guess?
So knowing all of that:
What can be done to force Logstash 6.4.1 to always read data from the beginning of files no matter what, as it once did effortlessly in Logstash version 6.3.2 and earlier?
Okay, I figured this out. I am now on Logstash 6.5 and my original config was as follows:
input {
  file {
    path => "/opt/logstash/coolapp/access_log*"
    exclude => "*.gz"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
    close_older => 3600
    stat_interval => 1
    discover_interval => 15
  }
}
When I redid it, getting rid of ignore_older and adjusting close_older and stat_interval to use string durations, things started working again as expected.
input {
  file {
    path => "/opt/logstash/coolapp/access_log*"
    exclude => "*.gz"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    close_older => "1 hour"
    stat_interval => "1 second"
    discover_interval => 15
  }
}
My assumption is that Logstash 6.3.2 interpreted ignore_older being set to 0 as false, thus disabling ignore_older, but in version 6.4 and higher that value is now interpreted as an actual time value in seconds, meaning every file older than zero seconds (which is to say, all of them) gets ignored. I haven’t dug deeply into the source code, but everything I have experienced points to that being the issue.
Regardless, this config now works and I am running Logstash 6.5 on macOS Mojave (10.14.1) without any issues.
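If you still want ignore_older under 6.4 and higher, the safe pattern seems to be an explicit string duration rather than 0 (the cutoff below is an assumption, not taken from the original config):

input {
  file {
    path => "/opt/logstash/coolapp/access_log*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => "4 weeks"    # assumed cutoff: skip files untouched for 4+ weeks, instead of 0, which now means "ignore everything"
  }
}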
I am trying to send logs from a logs.csv file to Elasticsearch using Logstash. In Elasticsearch I have an index logs with type log. At the moment my logstash.conf looks like this:
input {
  file {
    path => "/run/shm/elastic/logstash/logs.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["logs"]
  }
}
output {
  elasticsearch {
    hosts => "hostaddress:9200"
    index => "logs"
    document_type => "log"
    user => "elastic"
    password => "elastic"
  }
  stdout {}
}
Logstash seems to be configured correctly, because for instance sudo ./logstash -e 'input { stdin { } } output { stdout {} }' works properly.
However, I get the error shown below. Any ideas?
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2018-07-11 10:48:27.473 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[FATAL] 2018-07-11 10:48:27.510 [LogStash::Runner] runner - Logstash could not be started because there is already another instance using the configured data directory. If you wish to run multiple instances, you must change the "path.data" setting.
[ERROR] 2018-07-11 10:48:27.522 [LogStash::Runner] Logstash - java.lang.IllegalStateException: Logstash stopped processing because of an error: (SystemExit) exit
This error happens because another instance of Logstash is still running. You should run Logstash as a service on Linux instead of starting it directly; for example, on RHEL you start it with:
service logstash start
and stop it with:
service logstash stop
You can find the commands for other systems under this link.
But sometimes Logstash gets stalled and you have to kill it manually:
ps aux | grep logstash
Find Logstash's PID and kill it:
kill -9 LOGSTASH_PID
Most of the time Logstash can't be stopped the standard way because it's still processing data, but you can force Logstash to stop by adding --pipeline.unsafe_shutdown to the service startup file; you can read more about this here.
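You can also try the flag directly on the command line before touching the service files (the config file name is a placeholder):

bin/logstash -f <config_file.conf> --pipeline.unsafe_shutdown

Be aware that an unsafe shutdown can drop in-flight events, which is why it is not the default.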
If you want to run multiple Logstash instances, you need to define path.data, either on the command line:
bin/logstash -f <config_file.conf> --path.data PATH
(make sure the directory is writable)
or in the logstash.yml file under /etc/logstash/ for each instance.
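For example, a per-instance entry might look like this (the directory name is an assumption; it just has to be writable and unique per instance):

# /etc/logstash/logstash.yml for the second instance
path.data: /var/lib/logstash-instance2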
Please read: logstash could not be started when running multiple instances - path.data setting.
To read all columns from the CSV file, you need to provide the name of each column, like this:
columns => ["Date","column2","column3"]
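In context, that option goes inside the csv filter block (the column names here are hypothetical):

filter {
  csv {
    separator => ","                          # assumed comma-delimited input
    columns => ["Date","column2","column3"]   # one name per column, in file order
  }
}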
I am using:
- elasticsearch-2.1.1
- kibana-4.3.1-linux-x64
- logstash-2.1.1
I followed this tutorial:
https://www.elastic.co/guide/en/logstash/current/advanced-pipeline.html
Logstash was then able to create the index in Elasticsearch.
Afterwards, I deleted the index in Elasticsearch with:
curl -XDELETE http://localhost:9200/logstash-2015.12.30/
Then I tried to create a new index with a new config file, but Logstash did not send the new index to Elasticsearch.
What is wrong?
Why is Logstash not sending the new index to Elasticsearch?
Is this some kind of bug?
I hope someone can help me.
Regards
This is because Logstash has already read and processed your input file. Logstash uses a sincedb file to keep track of the position up to which it has already read. To make Logstash read and process your input every time you run it, set the sincedb_path option to /dev/null in your input plugin, as shown below.
input {
  file {
    path => "/path/to/logstash-tutorial.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
See this link (how to use sincedb in logstash?) for more information.
Logstash uses the sincedb file to store its position in processing a file. In the event of Logstash shutting down before processing is completed, it can use the sincedb to continue from where it left off.
Running on Windows, the behaviour we observe is that the sincedb file is only written when Logstash closes. This means that if the machine Logstash is running on is terminated and Logstash's own shutdown routines are not called, no sincedb file will be written.
Setting sincedb_write_interval to different values does not appear to make any difference. Even with it set, the sincedb is only written when Logstash terminates or is shut down.
Below is the basic structure of our logstash configuration.
Are we using sincedb_write_interval in the wrong way?
Thanks
input {
  file {
    path => "..."
    sincedb_write_interval => 10
  }
}
output {
  elasticsearch {
    host => "..."
    index => "..."
    protocol => "http"
    cluster => "..."
  }
}
You are using it correctly.
However, the default is 15 seconds, so you should not be having this issue. Could you run with some test input, then wait a minute and post your sincedb?
sincedb_write_interval matters when Logstash decides whether to pick up from where the last read left off.
If you set sincedb_write_interval => NULL, Logstash will re-parse the whole file even though it has parsed it before.
I am using a very old Logstash, 1.4.2, but I'm having the same issue. The only value that works is "1"; the default of 15 doesn't work, and no value other than "1" works either.
sincedb_write_interval => 1
Setting it to "1" updates the sincedb immediately.
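So for that old 1.4.x setup, the workaround input block looks roughly like this (the path is a placeholder):

input {
  file {
    path => "/var/log/app.log"        # hypothetical path
    sincedb_write_interval => 1       # flush the sincedb every second so positions survive a hard kill
  }
}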