Logstash with XML file - elasticsearch

We are using logstash to read an xml file. This xml file is generated when a Jenkins pipeline build commences and is written to with build data during the pipeline execution. We use file input mode 'read'.
CURRENT BEHAVIOR:
The xml file is created when the Jenkins pipeline starts. Logstash discovers this xml file, reads it, logs it, and does not return to the xml file again.
PROBLEM:
Logstash has read the xml file prematurely and misses all the subsequent data that is written to it.
DESIRED BEHAVIOR:
Logstash allows us to apply some condition to tell it when to read the xml file. Ideally a trigger would tell logstash the xml file is completed and ready to be read and logged.
We want this to work with file input mode 'read'. The xml file is written to for around 1.5 hours.
Is there a filter, plugin or some other functionality that will allow logstash to return to the xml file when it is modified?
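
For reference, a minimal sketch of the kind of read-mode file input described above (the path and completion-log location are hypothetical):

    input {
      file {
        path => "/var/jenkins/builds/*.xml"    # hypothetical location of the build xml
        mode => "read"                         # read each discovered file once, in full
        file_completed_action => "log"         # keep the file on disk after reading
        file_completed_log_path => "/var/log/logstash/read_files.log"
      }
    }

In read mode the file input treats each discovered file as if it already contains its complete content, which is why the file is picked up as soon as it appears rather than when the build finishes writing it.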

Related

Apache NiFi - How to pull all files thru GetSFTP processor only if a particular text file is available else ignore the files

We will be receiving many files daily, but we need to pull them only if a particular text file is present in the listing (which indicates all files are ready to pull), through the GetSFTP processor.
The process involves pulling files from SFTP and copying them to aws-s3.
I know I could write a script and pull them through the script as an alternative, but I am looking to achieve the same with processors, without a script.

Can I delete a file in NiFi after sending messages to Kafka?

Hi, I'm using NiFi as an ETL tool.
[image: screenshot of the current NiFi process]
This is my current process: I use TailFile to detect the CSV file and then send messages to Kafka.
It works fine so far, but I want to delete the CSV file after I send its contents to Kafka.
Is there any way?
Thanks
This depends on why you are using TailFile. From the docs,
"Tails" a file, or a list of files, ingesting data from the file as it is written to the file
TailFile is used to get new lines that are added to the same file, as they are written. If you need to tail a file that is being written to, what condition determines that it is no longer being written to?
However, if you are just consuming complete files from the local file system, then you could use GetFile which gives the option to delete the file after it is consumed.
From a remote file system, you could use ListSFTP and FetchSFTP; FetchSFTP has a Completion Strategy to move or delete the file.

filebeat modify data enriching json from other sources

The log format consists of JSON encoded line by line. Each line is:
{data,payload:/local/path/to/file}
{data,payload:/another/file}
{data,payload:/a/different/file}
The initial idea is to configure logstash to use the http input, and to write a Java (or anything) daemon that gets the file, parses it line by line, replaces the payload path with the content of the file, and sends the data to logstash.
I can't modify how the server works, so the log format can't be changed.
The logstash machine is a different host, so there is no direct access to the files.
Logstash can't mount a shared folder from the server host.
I can't open any port apart from a single port for logstash, due to compliance rules for the solution that aren't under my control.
Now, to save some time and have something more reliable than a custom-made solution: is it possible to configure filebeat to process every line of JSON before sending it to logstash, turning each line into
{data,payload:content_of_the_file}
Filebeat won't be able to do advanced transformations of this kind, as it is only meant to forward logs; it can't even do the basic string processing that logstash does. I suggest you write a custom script that does this transformation and writes the output to a different file.
You can use filebeat to send the contents of this new file to logstash.
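A minimal sketch of such a script, here in Java with Jackson, assuming the line format shown above (the class name, the "payload" field handling, and the file arguments are hypothetical):

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.node.ObjectNode;
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    // Hypothetical pre-processor: reads the original line-by-line JSON log,
    // replaces the "payload" file path with that file's contents, and appends
    // the result to a second file that filebeat then ships to logstash.
    public class PayloadInliner {
        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            Path source = Paths.get(args[0]);  // original log file
            Path target = Paths.get(args[1]);  // file watched by filebeat
            try (BufferedReader in = Files.newBufferedReader(source);
                 BufferedWriter out = Files.newBufferedWriter(target,
                         StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                String line;
                while ((line = in.readLine()) != null) {
                    ObjectNode event = (ObjectNode) mapper.readTree(line);
                    Path payload = Paths.get(event.get("payload").asText());
                    // Inline the referenced file's contents in place of its path.
                    event.put("payload", Files.readString(payload));
                    out.write(mapper.writeValueAsString(event));
                    out.newLine();
                }
            }
        }
    }

Running something like this on a schedule keeps the transformed file up to date, and filebeat tails that file instead of the original.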

Spring batch unit testing job with external input and db output

If I understand it correctly, the normal way of Spring Batch testing is to basically run my application and let JobLauncherTestUtils run my normal jobs. However, my application reads input from an external service and writes it to my database. I don't want my tests to write to my production database, and I'd like the test input to be read from files I provide rather than from the external service.
Can anyone direct me to an example of how I could do this? I'd like to feed a job with a file and then, when the job has finished, check in the database that what I expect is there. I guess I could specify an H2 database in application-test.properties, but I have no clue about the input.
The docs at https://docs.spring.io/spring-batch/4.1.x/reference/html/testing.html#testing don't really cover it for me.
Are you reading input files from disk? If so, you can point the input file source directory at your test resources for tests only, e.g. src/test/resources/input_dir/your_test_file.xml.
If the input file location is configured with properties, you could create a properties file used only by tests, with something like classpath:input_dir/your_test_file.xml (which would live in your project as src/test/resources/input_dir/your_test_file.xml).
If the input file location is configured within the execution context, you can provide it in the jobExecutionContext parameter of JobLauncherTestUtils.launchStep.
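As a rough illustration, a test along these lines might look as follows (the job, the "person" table, and the test profile are hypothetical; application-test.properties would point the reader at classpath:input_dir/your_test_file.xml and at an H2 datasource):

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.batch.core.BatchStatus;
    import org.springframework.batch.core.JobExecution;
    import org.springframework.batch.test.JobLauncherTestUtils;
    import org.springframework.batch.test.context.SpringBatchTest;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.test.context.SpringBootTest;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.test.context.ActiveProfiles;
    import org.springframework.test.context.junit4.SpringRunner;

    @RunWith(SpringRunner.class)
    @SpringBootTest
    @SpringBatchTest            // registers JobLauncherTestUtils (Spring Batch 4.1+)
    @ActiveProfiles("test")     // picks up application-test.properties (H2 + test input file)
    public class ImportJobTest {

        @Autowired
        private JobLauncherTestUtils jobLauncherTestUtils;

        @Autowired
        private JdbcTemplate jdbcTemplate;

        @Test
        public void jobReadsFileAndWritesRows() throws Exception {
            JobExecution execution = jobLauncherTestUtils.launchJob();

            assertEquals(BatchStatus.COMPLETED, execution.getStatus());
            // "person" is a hypothetical target table; assert whatever your job writes.
            Integer rows = jdbcTemplate.queryForObject(
                    "select count(*) from person", Integer.class);
            assertEquals(Integer.valueOf(3), rows);
        }
    }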

How can we generate flow files using Java code in Apache NiFi

Is there a way to generate flow files in apache-nifi using Java code which I will invoke using ExecuteStreamCommand?
ExecuteStreamCommand starts a system command and passes the flow file to the STDIN of this command, then takes the STDOUT of the command and stores it as the content of the flow file.
So, in Java you have to write code that reads data from STDIN (System.in) and writes the processed data to STDOUT (System.out).
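For example, a minimal sketch of such a command (the class name and the upper-casing transformation are placeholders):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;
    import java.nio.charset.StandardCharsets;

    // Sketch of a command for ExecuteStreamCommand: the incoming flow file
    // content arrives on STDIN, and whatever is written to STDOUT becomes
    // the content of the outgoing flow file.
    public class FlowFileFilter {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(System.in, StandardCharsets.UTF_8));
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true);
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line.toUpperCase());  // placeholder transformation
            }
            out.flush();
        }
    }

You would package this as a jar and point ExecuteStreamCommand at it, e.g. running java -jar yourtool.jar.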
I advise you to check the ExecuteScript Groovy examples, because Groovy is a Java-based scripting language.
