Tagging when a message is uploaded in Logstash - elasticsearch

I have a large ingestion pipeline, and sometimes it takes awhile for things to progress from source to the Elasticsearch index. Currently, when we parse our messages with Logstash, we parse the #timestamp field based on when the message was written by the source. However, due to large volumes of messages, it takes a currently unknown and possibly very inconsistent length of time to travel from the source producer before it's ingested by Logstash and sent to the Elasticsearch index.
Is there a way to add a field to the Elasticsearch output plugin for Logstash that will mark when a message is sent to Elasticsearch?

You can try to add a ruby filter as your last filter to create a field with the current time.
ruby {
code => "event.set('fieldName', Time.now())"
}

You can do it in an ingest pipeline. That means the script is executed in elasticsearch, so it has the advantage of including any delays caused by back-pressure from the output.

Related

How to check if a field value exists before inserting in elasticsearch?

I have a working ELK with input coming from filebeat prospecting several log files and sending them to logstash. Logstash retrieves the stream, filters that in order to match lines with some fields and then sends them into elasticsearch.
Now I would like to check before output to elasticsearch, if for a coming entry into elasticsearch, there is already an existing document. If yes, I want to apply another output plugin instead of elasticsearch.

Ways to only process new(index after last run) data in Elasticsearch?

Is there a way to get the date and time that an elastic search document was written?
I am running es queries via spark and would prefer NOT to look through all documents that I have already processed. Instead I would like read the only documents that were ingested between the last time the program ran and now.
What is the best most efficient way to do this?
I have looked at;
updating to add a field with an array with booleans for if its been looked at by which analytic. The negative is waiting for the update to occur.
index per time frame method, which would be to break down the current indexes into smaller ones so by hour.The negative I see is the number of open file descriptors.
??
Elasticsearch version 5.6
I posted the question on the elasticsearch discussion board and it appears using the ingest pipeline is the best option.
I am running es queries via spark and would prefer NOT to look through
all documents that I have already processed. Instead I would like read
the only documents that were ingested between the last time the
program ran and now.
A workaround could be :
While inserting data using Logstash to Elasticsearch, Logstash appends a #timestamp key to the document which represents the time (in UTC) at which the document is created or we can use an ingest pipline
After that we can query based on the timestamp.
For more on this please have a look at :
Mapping changes
There is no way to ask ES to insert a timestamp at index time
Elasticsearch doesn't have such functionality.
You need manually save with each document date. In this case you will be able to search by date range.

Filebeat: Outputting to different outputs depending on the document type

So I'm reading in several different file types using Filebeat. I set the document_type for each kind of file I am harvesting. My problem is that I want to send most of these file types to Logstash, but there are certain types I wish to send directly to Elasticsearch.
Is it possible to select the output depending on file type? I know multiple outputs are allowed, but this sends all the data into both elasticsearch and logstash so it's going to get inserted into Elasticsearch twice, which will take up too much space. Thanks!
Filebeat ships events to one endpoint, all routing should be done in Logstash.

Can I use Kibana to parse the message field

We are using ELK and shoving all syslogs into Elasticsearch.
I have a log type like whose message field looks like:
"message":"11/04/2016 12:04:09 PM|There are now 8 active connections#015"
I would like to use Kibana to parse the message to get the number of active connections over time and then graph that in Kibana.
Am I thinking of how to do this correctly?
The reading I've done seems to be telling me to set up a filter in Logstash...but that seems like the wrong place to parse the message field for this single log line type, given the amount of messages/logs and message/log types getting sent through Logstash.
Is there a way to parse the message field for this number and then graph that count over time in Kibana?
Kibana is not meant to do this kind of parsing. There are a few options you can use:
You could write an analyser that analyses this string. It can be
done, but I would not do it like this.
Use logstash, but you already suggested that yourself. If you feel
log stash is to heavy and you have a choice for the version to use,
go for option three.
Use ingest, this is a new feature of elasticsearch. This is kind of
a lightweight logstash that comes pre-packaged with elastic, it
support patterns with grok that can do this.

Elasticsearch not immediately available for search through Logstash

I want to send queries to Elasticsearch through the Elasticsearch plugin within Logstash for every event in process. However, Logstash sends requests to Elasticsearch in bulk and indexed events are not immediately made available for search in Elasticsearch. It seems to me that there will be a lag (up to in process a second or more) between an index passing through Logstash and it being searchable. I don't know how to solve this.
Do you have any idea ?
Thank you for your time.
Joe

Resources