Task:
Pull data from SQL Server and push the records to Elasticsearch.
I am achieving this by triggering the Logstash command after certain upstream conditional triggers have finished.
I plan to do this via cmd.exe from a C#/.NET process. Is there a better way to achieve this?
Scenarios to handle:
I need to send an email if the data transfer completed successfully.
I need to send an email if it was unsuccessful, and also perform some event.
An unsuccessful condition could be anything, such as the server not being available or the disk being full.
Also, in case of failure, can we capture the last record transferred to Elasticsearch in that same request and trigger some event?
The first two are very important.
I am also facing an issue when Elasticsearch is stopped: Logstash reports a "dead ES instance" in the logs/command window output, but Logstash doesn't stop, it just keeps waiting for ES. How can I get Logstash to terminate when there is no response after, say, 5 attempts to reach ES?
Logstash isn't usually run as a batch command to be triggered on demand.
If your data is in SQL Server, try connecting with the JDBC input.
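For example, a minimal pipeline sketch (the connection string, credentials, query, tracking column, and index name below are placeholders, not values from your setup) might look like:

input {
  jdbc {
    # placeholder SQL Server connection details
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433;databaseName=mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_library => "/path/to/mssql-jdbc.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    # incremental query; :sql_last_value remembers where the previous run stopped
    statement => "SELECT * FROM my_table WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
    # poll on a schedule instead of being triggered from cmd.exe
    schedule => "* * * * *"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "my_index"
    document_id => "%{id}"
  }
}

With a schedule and a tracking column, the pipeline keeps itself up to date, so there is no need to shell out to cmd.exe from C# at all.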
I was wondering if it is possible to configure Elastic Heartbeat to only send data when there is a real status change on events.
If a host is always pingable, I would like to avoid filling up my buffer queue with useless data; instead, whenever it becomes unreachable, I'd love for it to send one message with the new status.
Beats collect time series data. This event data is stored in Elasticsearch indices along with the event's timestamp and other metadata.
So if you changed the behaviour in the way you ask, there would be no event data for some timespans.
Now imagine a dashboard or query where you want to look at a specific timeframe and no data is present at all.
There is a second reason as well. The availability status is not the only information being collected. Take the response times, status codes and other metadata into account: even if there is no change in the availability of the monitored service, there can be valuable changes in the metadata.
That is why all events are stored, and the answer to your question is no, it's not possible.
Is it possible to re-queue a Logstash event to be processed again in a bit of time?
In order to illustrate what I mean, I will explain my use case: I have a custom Logstash filter that extracts the application version from logs at the start of an application, and then appends the correct version to every log event. However, at the very beginning, race conditions can occur where the application version has not yet been written to a file, yet the Logstash filter tries to read the data anyway (since it is processing log lines concurrently). This results in an application version that is null. In case it matters, Logstash gets its input from Filebeat.
I would like to re-queue these events to be re-processed entirely a couple of seconds (or milliseconds) from now, once the application version has been saved to disk.
However this leads me to a broader question, which is, can you tell a Logstash event to be re-queued, or is there an alternative solution to this scenario?
Thanks for the help!
Process the data and append it to a new file, then use that file to process the data further:
Logstash pipeline 1: get the data, process it, and append it to a file.
Logstash pipeline 2: read the data from that file and do whatever you want with it.
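A minimal sketch of that two-pass setup, assuming Filebeat ships to the first pipeline; the port, file paths, and hosts are placeholders, and the filter block stands in for your own version-extraction logic:

# pipeline-1.conf (first pass): receive from Filebeat, enrich, stage to a file
input { beats { port => 5044 } }
filter {
  # your existing version-extraction logic goes here
}
output {
  file {
    path => "/var/log/staging/enriched.jsonl"
    codec => json_lines
  }
}

# pipeline-2.conf (second pass): re-read the staged events once the version is available
input {
  file {
    path => "/var/log/staging/enriched.jsonl"
    codec => "json"
    start_position => "beginning"
  }
}
output {
  elasticsearch { hosts => ["http://localhost:9200"] }
}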
I use the REST API in my program. I made a process group to convert a MongoDB collection to a JSON file.
I want the schedule to run only once, so I set the "Run schedule" to 10000 sec and plan to stop the group after the data flow has run once. I also added a Notify processor with a DistributedMapCacheService, but the DistributedMapCacheClientService of the Notify processor only communicates with the DistributedMapCacheService inside NiFi itself; it never notifies my program.
I tried using my own socket server, but I only get the message "nifi" and nothing more.
My question is: if I only want the schedule to run once and then stop, how do I know when to stop it? Or is there some other way to achieve my purpose, such as detecting whether the JSON file exists, or using incremental data (if the schedule runs twice, the data will be duplicated)?
As @daggett said, you can do it in a synchronous way: use HandleHttpRequest as the trigger and HandleHttpResponse to manage the response.
For an asynchronous way you have several options for the notification, such as PutTCP, PostHTTP, GetHTTP, or using FTP, the file system, XMPP, or whatever suits you.
Whether a second scheduled run produces duplicated elements depends on the processors you use; some of them keep state and others don't. If you are facing problems with repeated elements, you can use the DetectDuplicate processor.
I am using the Elasticsearch bulk API to send a lot of documents to index and delete at once. If there is an error for one document, the other documents are still indexed or deleted successfully, and this leads to an inconsistent state of the data in Elasticsearch, because in my case the documents are related to each other: if one document's field has some value, then there are other documents which should have the same value for that field. I am not sure how I can handle such errors from bulk requests. Is it possible to roll back a request in any way? I have read similar questions but could not find a solution for handling such cases. Alternatively, instead of a rollback, is there any way to send the data only if there is no error, or something like a dry run of the request?
I'm late to the question but will answer for whoever runs across a similar scenario in the future.
After executing the Elasticsearch (ES) bulk API aka BulkRequest, you get a BulkResponse in return which consists of one or more BulkItemResponse. BulkItemResponse has a method isFailed() which will tell you if that action failed or not. In your case, you can traverse all the items in the response if there are failures and handle failed responses as per your requirement.
The code will look something like this for Synchronous execution:
import scala.collection.JavaConverters._  // needed for .asScala on the Java iterator

// Execute the bulk request synchronously
val bulkResponse: BulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT)
bulkResponse.iterator.asScala
  .filter(_.isFailed)                     // keep only the actions that failed
  .foreach(item => { /* your logic to handle failures */ })
For asynchronous execution, you can provide a listener which will be called after the execution is completed. You have to override onResponse() and onFailure() in this case. You can read more about it at https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-document-bulk.html
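A minimal sketch of the asynchronous variant, reusing the same client and request as above; the handling inside the callbacks is only a placeholder:

import org.elasticsearch.action.ActionListener
import org.elasticsearch.action.bulk.BulkResponse
import org.elasticsearch.client.RequestOptions

restHighLevelClient.bulkAsync(bulkRequest, RequestOptions.DEFAULT, new ActionListener[BulkResponse] {
  // called when the request completed; individual items may still have failed
  override def onResponse(response: BulkResponse): Unit =
    if (response.hasFailures) {
      // same per-item handling as in the synchronous example
    }

  // called when the whole request failed, e.g. the cluster was unreachable
  override def onFailure(e: Exception): Unit = {
    // log or retry as needed
  }
})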
HTH.
The solution shared above using the BulkResponse output basically helps with handling the next batch of requests. What if I want to stop the batch processing at the position where a request in the batch failed? We are sending bulk events which are related to each other. An example of my issue: for a batch (E1-E10), if the batch fails at E5, I don't want E6-E10 to be processed, because they are related, and I want an immediate response in that case.
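The bulk API executes every action it receives, so it cannot stop mid-batch for you. One workaround, sketched below on the assumption that the related events can be split into an ordered sequence of smaller bulk requests, is to send them one request at a time and stop issuing further requests at the first response that reports failures. buildOrderedBulkRequests and handleFailure are hypothetical names for your own code; the items in each response preserve request order, so indexWhere gives the exact break position.

// hypothetical helper that splits the related events into ordered, smaller BulkRequests
val orderedRequests: Seq[BulkRequest] = buildOrderedBulkRequests(events)

orderedRequests.iterator                                             // lazy: later requests are
  .map(req => restHighLevelClient.bulk(req, RequestOptions.DEFAULT)) // only sent if still needed
  .find(_.hasFailures)                                               // stop at the first failing response
  .foreach { resp =>
    val failedAt = resp.getItems.indexWhere(_.isFailed)              // exact position of the failure
    handleFailure(resp, failedAt)                                    // hypothetical failure handler
  }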
I have rsyslog forwarding logs to Logstash via TCP. If Logstash is not available, rsyslog will build up queues.
In the event that Logstash is available, but Elasticsearch is dead or for some reason cannot write to the file system, is there a way for Logstash to reject further TCP messages?
Thanks
According to the life of an event description:
An output can fail or have problems because of some downstream cause, such as full disk, permissions problems, temporary network failures, or service outages. Most outputs should keep retrying to ship any events that were involved in the failure.
If an output is failing, the output thread will wait until this output is healthy again and able to successfully send the message. Therefore, the output queue will stop being read from by this output and will eventually fill up with events and block new events from being written to this queue.
A full output queue means filters will block trying to write to the output queue. Because filters will be stuck, blocked writing to the output queue, they will stop reading from the filter queue which will eventually cause the filter queue (input -> filter) to fill up.
A full filter queue will cause inputs to block when writing to the filters. This will cause each input to block, causing each input to stop processing new data from wherever that input is getting new events.
This means that if the elasticsearch output starts to fail, the entire pipeline will be blocked, which is what you want in your case. Are you seeing something different?