Why does a reinstalled Filebeat know previous logs were already loaded to Elasticsearch

I previously used Filebeat to load log data into Elasticsearch through Logstash, and I would like to try it again. So I reinstalled Filebeat and emptied the Elasticsearch data, then tried to reload the log data into Elasticsearch with Filebeat. But Filebeat already knows that the data has been loaded once, even though the Elasticsearch data storage was emptied. How does Filebeat know that the log data was previously loaded? And if I would like to load all the log data again, what should I do?

You need to clear the registry file so that the "history" of read files is cleared as well.
To change the default location of the registry file, specify the full path under the registry_file setting in the config file (filebeat.yml): https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html#_registry_file
For example:
filebeat:
  registry_file: /var/lib/filebeat/registry
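
If you want Filebeat to re-read everything from scratch, a minimal sketch (assuming the registry path above and a systemd-managed service) is to stop Filebeat, delete the registry, and start it again:

sudo systemctl stop filebeat
# Remove the read-state "history"; Filebeat recreates it on start.
# The registry is a file in older versions and a directory in newer ones.
sudo rm -rf /var/lib/filebeat/registry
sudo systemctl start filebeat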

Related

Filebeat sends duplicated logs to Elasticsearch

The problem is that Filebeat is sending duplicated logs to Elasticsearch: whenever I restart Filebeat, it sends the whole log again.
I have mounted /var/share/filebeat/data into the container where I am running Filebeat. I also changed the permissions of the shared directory so that it is owned by the filebeat user.
I am using Elasticsearch 8.1.2.
The most probable reason for this is the location of the persistent volume for the Filebeat registry. Essentially, Filebeat creates a registry to keep track of all the log files it has processed and the offset up to which each has been read. If this registry is not stored in a persistent location (for instance, it is stored under /tmp) and Filebeat is restarted, the registry file is lost and a new one is created. This tells Filebeat to tail all the log files present at the specified paths from the beginning, hence the duplicate logs.
To resolve this, mount a persistent volume into the Filebeat container (a hostPath volume, for example) and configure Filebeat to store its registry there.
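With plain Docker, a sketch might look like this (the host directory and log path are hypothetical; /usr/share/filebeat/data is the default data path in the official Filebeat image):

# /srv/filebeat-data on the host keeps the registry across restarts
docker run -d --name filebeat \
  -v /srv/filebeat-data:/usr/share/filebeat/data \
  -v /var/log/myapp:/var/log/myapp:ro \
  docker.elastic.co/beats/filebeat:8.1.2

As long as /srv/filebeat-data survives container restarts, the registry survives with it and Filebeat resumes from the last recorded offsets instead of re-reading the files.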
Thanks for the answers, but the issue was that in the initial setup we didn't define an id for the filestream input type. As simple as that.
https://www.elastic.co/guide/en/beats/filebeat/current/_step_1_set_an_identifier_for_each_filestream_input.html
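For reference, a minimal filestream input with an identifier might look like this (the id and path are placeholders):

filebeat.inputs:
- type: filestream
  # each filestream input needs a unique id so its state
  # can be persisted across restarts
  id: my-app-logs
  paths:
    - /var/log/myapp/*.log

Without a unique id, Filebeat cannot reliably track the input's read state across restarts, which is exactly what causes the duplicates.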

Kibana doesn't update properly once Filebeat is stopped for some time

In my ELK setup, when Filebeat is stopped for some time, Kibana only starts updating again from the timestamp at which Filebeat was restarted. No data is available (under the Discover tab) for the timeframe during which Filebeat was not running. Once Filebeat is started again, there are initial spikes in the Discover tab, which means the backlog is indexed under the wrong timestamp.
How can I resolve this?
This is because some component (probably Logstash) is setting @timestamp at the moment the data is sent to Elasticsearch. The solution: set @timestamp at the source, and don't overwrite it downstream.
You can add a @timestamp value with a Filebeat processor, or in Logstash, or by adding a valid @timestamp to your logs at the source and logging directly in JSON, foregoing any regex/grok or processors.
Whichever way you choose, go through the whole pipeline to make sure nothing tampers with @timestamp.
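For example, if each log line carries its own time field (here hypothetically called log_time), a Logstash date filter can parse it into @timestamp instead of letting the ingest time be used:

filter {
  date {
    # "log_time" is a placeholder for whatever field your logs use
    match => ["log_time", "ISO8601"]
    target => "@timestamp"
  }
}

With this in place, events indexed after a Filebeat outage land at the time they were originally logged, not the time they were shipped.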

filebeat to logstash or elasticsearch

I'm trying to visualize logs from my app. My logs are formatted as JSON and stored in a file, and I have Filebeat installed, which uses that file as input. Filebeat can send the logs either to Logstash or directly to Elasticsearch. Logstash could process the logs, do something with them, parse them...
But my logs are already JSON formatted.
Elasticsearch is going to be installed on another server, on the other side of the planet...
So, my question is: is there any good reason to use Logstash in such a scenario (no processing needed), or is it OK to send the logs to the Elasticsearch server directly?
I'm guessing Logstash could do some buffering, but I want to keep my app's server light and don't want to install anything on top of it.
Thanks.
This may help you: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html
You can post the JSON into Elasticsearch with Filebeat alone, without Logstash; Logstash is sometimes too heavy.
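A minimal filebeat.yml for that scenario might look like this (the log path and host are placeholders; json.keys_under_root tells Filebeat to decode each line as JSON and lift the keys to the top level of the event):

filebeat.inputs:
- type: log
  paths:
    - /var/log/myapp/app.json
  # decode each line as JSON instead of shipping it as a raw message
  json.keys_under_root: true
  json.add_error_key: true

output.elasticsearch:
  hosts: ["es.example.com:9200"]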

Is it possible to send data to Elasticsearch by means of Filebeat, without Logstash?

I am an ELK newbie. I installed Elasticsearch and Filebeat first, without Logstash, and I would like to send data from Filebeat to Elasticsearch. After I installed Filebeat and configured the log files and the Elasticsearch host, I started Filebeat, but nothing happened, even though there are lots of rows in the log files that Filebeat prospects.
So is it possible to forward log data directly to the Elasticsearch host without Logstash at all?
It looks like your ES 2.3.1 is only configured to be reachable from localhost (the default since ES 2.0).
You need to modify your elasticsearch.yml file with this setting and restart ES:
network.host: 168.17.0.100
Then your filebeat output configuration needs to look like this:
output:
  elasticsearch:
    hosts: ["168.17.0.100:9200"]
Then you can check in your ES filebeat-* indices that you're getting the new log data (i.e. the hits.total count should increase over time):
curl -XGET '168.17.0.100:9200/filebeat-*/_search'
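
If you just want the document count, the _count API returns it without the full search response:

curl -XGET '168.17.0.100:9200/filebeat-*/_count'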

Log storage location ELK stack

I am doing centralized logging using Logstash. I am using logstash-forwarder on the shipper node and the ELK stack on the collector node. I wanted to know where the logs are stored in Elasticsearch; I didn't see any data files created where the logs are stored. Does anyone have an idea about this?
Log in to the server that runs Elasticsearch.
If it's an Ubuntu box, open /etc/elasticsearch/elasticsearch.yml.
Check the path.data configuration.
The files are stored in that location.
Good luck.
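
For example, on a package-based install you can check what is configured like this (the defaults noted in the comments are assumptions based on the Debian/Ubuntu packages):

grep -E '^[[:space:]]*path\.' /etc/elasticsearch/elasticsearch.yml
# If nothing is set there, the deb package defaults are typically:
#   path.data: /var/lib/elasticsearch
#   path.logs: /var/log/elasticsearch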
I agree with @Tomer, but on Ubuntu the default path to the logs is under /var/log/elasticsearch/:
/var/log/elasticsearch/elasticsearch.log
/var/log/elasticsearch/elasticsearch-access.log
/var/log/elasticsearch/elasticsearch_deprecation.log
In /etc/elasticsearch/elasticsearch.yml the path.data setting is commented out by default.
So the default path to the logs is /var/log/elasticsearch/elasticsearch.log.
As others have pointed out, path.data will be where Elasticsearch stores its data (in your case indexed logs) and path.logs is where Elasticsearch stores its own logs.
If you can't find elasticsearch.yml, you can have a look at the command line, where you'll find something like -Des.path.conf=/opt/elasticsearch/config
If path.data/path.logs aren't set, they should be under a data/logs directory under path.home. In my case, the command line shows -Des.path.home=/opt/elasticsearch
