Filebeat sends duplicate logs to Elasticsearch

The problem is that Filebeat is sending duplicate logs to Elasticsearch: when I restart Filebeat, it sends the whole log again.
I have mounted /var/share/filebeat/data into the container where I am running Filebeat. I also changed the permissions of the shared directory so that it is owned by the filebeat user.
I am using Elasticsearch 8.1.2

The most probable reason for this is the location of the persistent volume for the Filebeat registry. Essentially, Filebeat creates a registry to keep track of all log files processed and the offset it has reached in each one. If this registry is not stored in a persistent location (for instance, stored under /tmp) and Filebeat is restarted, the registry file is lost and a new one is created. This tells Filebeat to tail all the log files present at the specified paths from the beginning, hence the duplicate logs.
To resolve this, mount a persistent volume into the Filebeat container (a hostPath works) and configure Filebeat to use it for storing the registry.
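A minimal sketch of the relevant part of a Filebeat DaemonSet spec in Kubernetes, following the pattern used in Elastic's reference manifest (the hostPath location is an assumption; adjust to your environment):

```yaml
# Excerpt from a Filebeat DaemonSet pod spec (sketch, not a full manifest)
    volumeMounts:
    - name: data
      # Filebeat keeps its registry under its data path inside the container
      mountPath: /usr/share/filebeat/data
  volumes:
  - name: data
    hostPath:
      # Hypothetical node-local path; survives container restarts
      path: /var/lib/filebeat-data
      type: DirectoryOrCreate
```

With the registry on a hostPath, restarting the pod no longer resets the read offsets, so previously harvested files are not re-sent.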

Thanks for the answers, but the issue was that in the initial setup we didn't define an ID tag for the filestream input type. As simple as that.
https://www.elastic.co/guide/en/beats/filebeat/current/_step_1_set_an_identifier_for_each_filestream_input.html
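For reference, a minimal filebeat.yml sketch of a filestream input with the identifier set (the id and path below are hypothetical):

```yaml
filebeat.inputs:
- type: filestream
  # Each filestream input needs a unique, stable id; without it, Filebeat
  # cannot reliably track file state across restarts and may re-ingest logs.
  id: my-app-logs
  paths:
    - /var/log/my-app/*.log
```

The id must stay the same across restarts and must be unique among all filestream inputs.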

Re-Bootstrap Elastic Cluster

I need guidance to reinstate my Elastic cluster.
I had bootstrapped the Elastic cluster and created one superuser and two other system users.
Ingest, data, and gateway nodes had also joined the cluster.
Later I wanted to rename the data nodes, but Google Cloud does not allow renaming, so I created new data nodes with proper names and then deleted the old data nodes.
I had not ingested any data so far; no index had been created.
Now, when I try to see any of the cluster details (say, license information), it does not authenticate any system user.
I tried re-creating the bootstrap password and setting it again, but that did not work either.
I'm seeing the exception below in the Elasticsearch logs.
failed to retrieve password hash for reserved user [username]
org.elasticsearch.action.UnavailableShardsException: at least one primary shard for the index [.security-5] is unavailable
Please suggest whether there is a way to reinstate the existing configuration, or how I can bootstrap the cluster again.
I had not ingested any data so far
If you haven't added any actual data yet, the simplest approach is probably to delete all the current data directories and start the cluster from scratch again.
Also, is this still Elasticsearch 5 (judging by .security-5)? That's a very old version, and a proper reset works somewhat differently there than in current versions.
I had sudo access, so I created a system user using file-based auth,
then re-created the other system users with the same password,
and then reverted the access type to normal login.
That worked for me.

How are log files collected from different microservices in Logstash or Fluentd?

I have 10 microservices running, and each microservice stores a separate log file. But when I configure the Logstash or Fluentd config file, I have to give one specific log file name, while I want Logstash or Fluentd to pick up all the log files from all of my microservices.
How is that possible?
Can I give the path of my log folder, where all my logs are stored?
Will it consider all the log files, or do I have to give only one log file path?
When I checked some docs, they use a single log file and give the entire path to where that log file is present. But I want it to take all the log files present in the logs folder.
Also, please guide me on how I can achieve this in Kubernetes.
One node contains 10 pods, and each pod runs one microservice.
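Both Logstash's file input and Fluentd's tail input accept glob patterns rather than a single file name. A minimal Logstash sketch (the folder path is an assumption; point it at your own logs directory):

```
input {
  file {
    # The glob matches every .log file in the folder; Logstash tracks
    # each matched file separately via its sincedb.
    path => "/var/log/microservices/*.log"
    start_position => "beginning"
  }
}
```

In Kubernetes, the more common pattern is to run the collector (Filebeat or Fluentd) as a DaemonSet on each node and glob over the container log directory (e.g. /var/log/containers/*.log), rather than configuring one input per microservice.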

ELK: Logstash can't create index in ES

After following this tutorial (https://www.bmc.com/blogs/elasticsearch-logs-beats-logstash/) in order to use Logstash to analyze some log files, my index was created fine the first time. Then I wanted to re-index new files with new filters and new repositories, so I deleted the index via curl -XDELETE, and now when I restart Logstash and Filebeat the index is not created anymore. I don't see any errors while launching the components.
Do I need to delete something else in order to re-create my index?
Ok since my guess (see comments) was correct, here's the explanation:
To avoid that filebeat reads and publishes lines of a file over and over again, it uses a registry to store the current state of the harvester:
The registry file stores the state and location information that Filebeat uses to track where it was last reading.
As you stated, filebeat successfully harvested the files, sent the lines to logstash and logstash published the events to elasticsearch which created the desired index. Since filebeat updated its registry, no more lines had to be harvested and thus no events were published to logstash again, even when you deleted the index. When you inserted some new lines, filebeat reopened the harvester and published only the new lines (which came after the "registry checkpoint") to logstash.
The default location of the registry file is ${path.data}/registry (see Filebeat's Directory Layout Overview).
... maybe the curl api call is not the best solution to restart the index
This has nothing to do with deleting the index. Deleting the index happens inside elasticsearch. Filebeat has no clue about your actions in elasticsearch.
Q: Is there a way to re-create an index based on old logs?
Yes, there are some ways you should take into consideration:
You can use the reindex API which copies documents from one index to another. You can update the documents while reindexing them into the new index.
In contrast to the reindex you can use the update by query API to update documents that will remain in the original index.
Lastly you could of course delete the registry file. However this could cause data loss. But for development purposes I guess that's fine.
Hope I could help you.
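For the reindex route, the request is a single API call (index names below are placeholders):

```
POST _reindex
{
  "source": { "index": "old-logs" },
  "dest":   { "index": "new-logs" }
}
```

This copies the documents server-side; the destination index must not be the same as the source, and any mapping changes should be applied to the destination index before reindexing.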

Why does reinstalled Filebeat know previous logs were already loaded to Elasticsearch?

I previously used Filebeat to load log data into Elasticsearch through Logstash, and then I wanted to try it again. So I reinstalled Filebeat and emptied the Elasticsearch data, and then tried to reload the log data with Filebeat into Elasticsearch. But Filebeat already knows that the data has been loaded once, even though the Elasticsearch data storage is emptied. How does Filebeat know that the log data was previously loaded? And if I want to load all the log data again, what should I do?
You need to clear the registry_file so that the "history" of read files is cleared as well.
To change the default location of the registry_file, specify the full path in the config file (filebeat.yml): https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html#_registry_file
For example:
filebeat:
  registry_file: /var/lib/filebeat/registry

Log storage location in the ELK stack

I am doing centralized logging using Logstash. I am using logstash-forwarder on the shipper node and the ELK stack on the collector node. I wanted to know the location where the logs are stored in Elasticsearch; I didn't see any data files created where the logs are stored. Does anyone have an idea about this?
Log in to the server that runs Elasticsearch
If it's an ubuntu box, open the /etc/elasticsearch/elasticsearch.yml
Check out the path.data configuration
The files are stored on that location
Good luck.
I agree with @Tomer, but the default paths to the logs in the case of Ubuntu are
/var/log/elasticsearch.log
/var/log/elasticsearch-access.log
/var/log/elasticsearch_deprecation.log
In /etc/elasticsearch/elasticsearch.yml, path.data is commented out by default.
So the default path to logs is /var/log/elasticsearch/elasticsearch.log
As others have pointed out, path.data will be where Elasticsearch stores its data (in your case indexed logs) and path.logs is where Elasticsearch stores its own logs.
If you can't find elasticsearch.yml, you can have a look at the command line, where you'll find something like -Des.path.conf=/opt/elasticsearch/config
If path.data/path.logs aren't set, they should be under a data/logs directory under path.home. In my case, the command line shows -Des.path.home=/opt/elasticsearch
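On a typical Ubuntu package install, the relevant elasticsearch.yml settings look like this (the paths shown are the Debian/Ubuntu package defaults; verify against your own install):

```yaml
# elasticsearch.yml
path.data: /var/lib/elasticsearch   # where indices (your ingested logs) live
path.logs: /var/log/elasticsearch   # Elasticsearch's own log files
```

Note the distinction: your shipped logs end up as documents inside the index files under path.data, not as readable log files on disk.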
