Re-queue a Logstash event - elasticsearch

Is it possible to re-queue a Logstash event to be processed again in a bit of time?
In order to illustrate what I mean, I will explain my use case: I have a custom Logstash filter that extracts the application version from logs at the start of an application, and then appends the correct version to every log event. However, at the very beginning a race condition can occur where the application version has not yet been written to a file, yet the Logstash filter tries to read the data anyway (since it is processing log lines concurrently). This results in an application version that is null. In case it matters, Logstash gets its input from Filebeat.
I would like to re-queue these events to be re-processed entirely a couple of seconds (or milliseconds) from now, once the application version has been saved to disk.
This leads me to a broader question: can you tell a Logstash event to be re-queued, or is there an alternative solution to this scenario?
Thanks for the help!

Process the data and append it to a new file, then use that file to process the data further:
Logstash processor 1: get the data, process it, and append it to a file.
Logstash processor 2: get the data from that second file and do whatever you want with it (see the sketch below).
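A minimal sketch of that two-stage setup, assuming hypothetical paths, ports, and hosts (the intermediate file location, the Beats port, and the Elasticsearch address are placeholders, and the custom version filter is only indicated by a comment):

# stage1.conf -- receive from Filebeat, run the custom version filter,
# and append the enriched events to an intermediate file
input {
  beats { port => 5044 }
}
filter {
  # custom application-version filter goes here
}
output {
  file {
    path  => "/var/spool/logstash/stage1.jsonl"
    codec => json_lines
  }
}

# stage2.conf -- re-read the intermediate file (by now the version data
# should be on disk) and ship the events to Elasticsearch
input {
  file {
    path  => "/var/spool/logstash/stage1.jsonl"
    codec => json
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}

The two configurations would run as separate Logstash instances (or separate pipelines on newer versions), so an event only reaches the second stage after it has been flushed to the intermediate file.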

Related

How can I get a live view of syslog-ng logs in a web frontend?

I currently have syslog-ng set up to aggregate my logs. I want to show these logs in real time to my web frontend users, but I have no clue how to do this. Is it possible to connect directly to syslog-ng using WebSockets? Or do I need to first pass the logs on to something like Elasticsearch, and if so, how do I get my data live from Elasticsearch?
I found this table in the syslog-ng documentation, but I could not find any output destination that would solve my problem.
Unfortunately, there is currently no generic mechanism to export real-time log traffic. You could, however, write your configuration in a way that places log information somewhere a frontend can read it.
For instance, if you have a log statement delivering messages to elastic:
log {
  source(s_network);
  destination(d_elastic);
};
you could add an alternative destination to the same log statement, which would only serve as a buffer for exporting real-time log data. For instance:
log {
  source(s_network);
  destination(d_elastic);
  destination { file("/var/log/buffers/elastic_snapshot.$SEC" overwrite-if-older(59)); };
};
Notice the second destination in the log statement above: with the curly braces you tell syslog-ng to use an in-line destination instead of a predefined one (or you could use a full-blown destination declaration, but I omitted that for brevity).
This new file destination writes all messages that elastic receives to a file. The file name contains the time-based macro $SEC, meaning that you'd get a series of files: one for each second of the minute.
Your frontend could just try to find the file with the latest timestamp and present that as the real-time traffic (from the last second).
The overwrite-if-older() option tells syslog-ng that if the file is older than 59 seconds, then it should overwrite it instead of appending to it.
This is a bit hacky, and I even intend to implement something like what you've asked for in a generic way, but it's doable today, as long as the syslog-ng configuration is under your control.

Apache NiFi instance hangs on the "Computing FlowFile lineage..." window

My Apache NiFi instance just hangs on the "Computing FlowFile lineage..." window for a specific flow. Others work, but it won't show the lineage for this specific flow for any data files. The only error message in the log relates to an error in one of the processors, but I can't see how that would affect the lineage or stop the page from loading.
This was related to two things...
1) I was using the older (but default) provenance repository, which didn't perform well, resulting in the lag in the UI. So I needed to change it...
#nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
2) Fixing #1 exposed the second issue, which was that the EnforceOrder processor was generating hundreds of provenance events per file, because I was ordering on a timestamp, which had large gaps between the values. This is apparently not a proper use case for the EnforceOrder processor. So I'll have to remove it and find another way to do the ordering.

How to conditionally process FlowFiles based on a MongoDB query result?

I need to process a list of files based on the result of a MongoDB query, but I can't find any processor that would let me do that. I basically have to take each file and either process it or discard it entirely, based on the result of a query that involves that file's attributes.
The only MongoDB-related processor I can see in NiFi 1.5.0 is GetMongo, which apparently can't receive incoming connections and only emits FlowFiles based on its configured parameters.
Am I looking in the wrong place?
NIFI-4827 is an improvement Jira that aims to allow GetMongo to accept incoming flow files; the flow file content would contain the query, and the processor's properties would accept Expression Language. The code is still under review, but the intent is to make it available in the upcoming NiFi 1.6.0 release.
As a possible workaround in the meantime, if there is a REST API you could use InvokeHttp to make the call(s) manually and parse the result(s). Also if you have a JDBC driver for MongoDB (such as Unity), you might be able to use ExecuteSQL.

Retain tag/field across events in logstash 1.5

I'm using logstash 1.5 to analyze logs.
I want to track two events which occur one after the other.
So I would like to set a flag/field/tag when the first event occurs and retain that value across events.
I looked at this link, but it looks like grep and drop are not supported in Logstash 1.5.
Is there a way of achieving this?
The closest you can get with Logstash is the elapsed{} filter. You could use that code as a basis for your own filter if it doesn't meet your needs. I also run some external (Python) post-processing to do more than elapsed{} can (or should) do.
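For reference, a minimal sketch of an elapsed{} filter configuration, assuming hypothetical tag and field names (you would have to apply the start/end tags yourself, e.g. with grok or mutate, before this filter runs):

filter {
  elapsed {
    start_tag       => "first_event"     # tag that marks the first event (assumed name)
    end_tag         => "second_event"    # tag that marks the follow-up event (assumed name)
    unique_id_field => "transaction_id"  # field that ties the two events together (assumed name)
    timeout         => 60                # seconds to wait for the matching second event
  }
}

When the matching second event arrives, the filter records the elapsed time on it, which is the closest built-in way to carry state from one event to the next.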

Logstash availability when elasticsearch dies/can't write to disk

I have rsyslog forwarding logs to Logstash via TCP. If Logstash is not available, rsyslog will build up queues.
But what about the case where Logstash is available, yet Elasticsearch is dead or for some reason cannot write to the file system?
Is there a way for Logstash to reject further TCP messages?
Thanks
According to the "life of an event" description:
An output can fail or have problems because of some downstream cause, such as full disk, permissions problems, temporary network failures, or service outages. Most outputs should keep retrying to ship any events that were involved in the failure.
If an output is failing, the output thread will wait until this output is healthy again and able to successfully send the message. Therefore, the output queue will stop being read from by this output and will eventually fill up with events and block new events from being written to this queue.
A full output queue means filters will block trying to write to the output queue. Because filters will be stuck, blocked writing to the output queue, they will stop reading from the filter queue which will eventually cause the filter queue (input -> filter) to fill up.
A full filter queue will cause inputs to block when writing to the filters. This will cause each input to block, causing each input to stop processing new data from wherever that input is getting new events.
This means that if the elasticsearch output starts to fail then the entire pipeline will be blocked which is what you want in your case. Are you seeing something different?
