Filebeat: modify data by enriching JSON from other sources - Elasticsearch

The log format consists of JSON encoded line by line.
Each line looks like:
{data,payload:/local/path/to/file}
{data,payload:/another/file}
{data,payload:/a/different/file}
The initial idea is to configure Logstash with an HTTP input and write a Java (or any other) daemon that gets the file, parses it line by line, replaces the payload with the content of the file, and sends the data to Logstash.
I can't modify how the server works, so the log format can't be changed.
The Logstash machine is a different host, so there is no direct access to the files.
Logstash can't mount a shared folder from the server_host.
I can't open any port apart from a single port for Logstash, because the solution has to comply with some silly rules that aren't under my control.
Now, to save some time and have something more reliable than a custom-made solution, I'd like to know whether it's possible to configure Filebeat to process every line of JSON before sending it to Logstash, turning each line into
{data,payload:content_of_the_file}

Filebeat won't be able to do advanced transformations of this kind: it is only meant to forward logs, and it can't even do the basic string processing that Logstash does. I suggest you write a custom script that does this transformation and writes the output to a different file.
You can then use Filebeat to send the contents of this new file to Logstash.
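A minimal sketch of such a pre-processing script, in Python (the question allows any language; the input/output paths and the "payload" field name are assumptions, and a production version would need to follow the log as it grows rather than reading it once):

#!/usr/bin/env python3
# Hypothetical enrichment step: read the original JSON-lines log, replace the
# "payload" path in each event with the content of the referenced file, and
# write the result to a new file that Filebeat ships to Logstash.
import json

INPUT_LOG = "/var/log/app/events.log"            # assumed: original log written by the server
OUTPUT_LOG = "/var/log/app/events-enriched.log"  # assumed: file Filebeat is configured to read

with open(INPUT_LOG, "r", encoding="utf-8") as src, \
     open(OUTPUT_LOG, "w", encoding="utf-8") as dst:
    for line in src:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        payload_path = event.get("payload")
        try:
            # Replace the local path with the content of the referenced file.
            with open(payload_path, "r", encoding="utf-8") as payload_file:
                event["payload"] = payload_file.read()
        except (OSError, TypeError):
            # Keep the original path if the referenced file can't be read.
            pass
        dst.write(json.dumps(event) + "\n")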

Related

send payara logs to graylog via syslog and set correct source

I have a graylog instance that's running a UDP-Syslog-Input on Port 1514.
It's working wonderfully well for all the system logs of the linux servers.
When I try to ingest payara logs though [1], the "source" of the message is set to "localhost" in graylog, while it's normally the hostname of the sending server.
This is suboptimal, because ideally I want the application logs in Graylog with the correct source as well.
I googled around and found:
https://github.com/payara/Payara/blob/payara-server-5.2021.5/nucleus/core/logging/src/main/java/com/sun/enterprise/server/logging/SyslogHandler.java#L122
It seems like the syslog "source" is hard-coded into payara (localhost).
Is there a way to accomplish sending payara-logs with the correct "source" set?
I have nothing to do with the application server itself, I just want to receive the logs with the correct source (the hostname of the sending server).
example log entry in /var/log/syslog for payara
Mar 10 10:00:20 localhost [ INFO glassfish ] Bootstrapping Monitoring Console Runtime
I suspect I want the "localhost" in the above example set to the FQDN of the host.
Any ideas?
Best regards
[1]
logging.properties:com.sun.enterprise.server.logging.SyslogHandler.useSystemLogging=true
Try enabling "store full message" in the syslog input settings.
That will add the full_message field to your log messages and will contain the header, in addition to what you see in the message field. Then you can see if the source IP is in the UDP packet. If so, collect those messages via a raw/plaintext UDP input and the source should show correctly.
You may have to parse the rest of the message via an extractor or pipeline rule, but at least you'll have the source....
Well,
this might not exactly be a good solution but I tweaked the rsyslog template for graylog.
I deploy the rsyslog-config via Puppet, so I can generate "$YOURHOSTNAME-PAYARA" dynamically using the facts.
This way, I at least have the correct source set.
$template GRAYLOGRFC5424,"<%PRI%>%PROTOCOL-VERSION% %TIMESTAMP:::date-rfc3339% YOURHOSTNAME-PAYARA %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% %msg%\n"
if $msg contains 'glassfish' then {
    *.* @loghost.domain:1514;GRAYLOGRFC5424
    & ~
} else {
    *.* @loghost.domain:1514;RSYSLOG_SyslogProtocol23Format
}
The other thing we did is actually activating application logging through log4j and its syslog appender:
<Syslog name="syslog_app" appName="DEMO" host="loghost" port="1514" protocol="UDP"
        format="RFC5424" facility="LOCAL0" enterpriseId="">
    <LoggerFields>
        <KeyValuePair key="thread" value="%t"/>
        <KeyValuePair key="priority" value="%p"/>
        <KeyValuePair key="category" value="%c"/>
        <KeyValuePair key="exception" value="%ex"/>
    </LoggerFields>
</Syslog>
This way, we can ingest the glassfish server logs and the independent application logs into graylog.
The "LoggerFields" in log4j.xml appear to be key-value pairs for the "StructuredDataElements" according to RFC5424.
https://logging.apache.org/log4j/2.x/manual/appenders.html
https://datatracker.ietf.org/doc/html/rfc5424
That's the problem with UDP Syslog. The sender gets to set the source in the header. There is no "best answer" to this question. When the information isn't present, it's hard for Graylog to pass it along.
It sounds like you may have found an answer that works for you. Go with it. Using log4j solves two problems and lets you define the source yourself.
For those who face a similar issue, a simpler way to solve the source problem might be to use a static field. If you send the payara syslog messages to their own input, you can create a static field that could substitute for the source to identify traffic from that source. Call it "app_name" or "app_source" or something and use that field for whatever sorting you need to do.
Alternatively, if you have just one source for application messages, you could use a pipeline to set the value of the source field to the IP or FQDN of the payara server. Then it displays like all the rest.
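For example, a pipeline rule along these lines could rewrite the source on the stream fed by the payara input (the hostname is a placeholder, and the rule still has to be connected to that stream in a pipeline stage):

rule "payara: rewrite source"
when
    to_string($message.source) == "localhost"
then
    set_field("source", "payara-host.example.com");
end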

Can I delete a file in NiFi after sending messages to Kafka?

Hi I'm using nifi as an ETL tool.
[screenshot of the current NiFi flow]
This is my current process. I use TailFile to detect CSV file and then send messages to Kafka.
It works fine so far, but I want to delete the CSV file after I send its contents to Kafka.
Is there any way?
Thanks
This depends on why you are using TailFile. From the docs,
"Tails" a file, or a list of files, ingesting data from the file as it is written to the file
TailFile is used to get new lines that are added to the same file, as they are written. If you need to tail a file that is being written to, what condition determines that it is no longer being written to?
However, if you are just consuming complete files from the local file system, then you could use GetFile which gives the option to delete the file after it is consumed.
From a remote file system, you could use ListSFTP and FetchSFTP; FetchSFTP has a Completion Strategy to move or delete the file once it has been fetched.

Logstash with XML file

We are using logstash to read an xml file. This xml file is generated when a Jenkins pipeline build commences and is written to with build data during the pipeline execution. We use file input mode 'read'.
CURRENT BEHAVIOR:
The xml file is created when the Jenkins pipeline starts. Logstash discovers this xml file, reads it, logs it, and does not return to the xml file again.
PROBLEM:
Logstash has read the xml file prematurely and misses all the subsequent data that is written to it.
DESIRED BEHAVIOR:
Logstash allows us to apply some condition to tell it when to read the xml file. Ideally a trigger would tell logstash the xml file is completed and ready to be read and logged.
We want this to work with file input mode 'read'. The xml file is written to for around 1.5 hours.
Is there a filter, plugin or some other functionality that will allow logstash to return to the xml file when it is modified?
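For reference, the kind of read-mode file input described above looks roughly like this (the path and completed-log location are placeholders, not taken from the question):

input {
  file {
    path => "/var/lib/jenkins/builds/*/build-report.xml"   # assumed location of the generated XML
    mode => "read"
    # in read mode the default completed action is "delete"; log it instead
    file_completed_action => "log"
    file_completed_log_path => "/var/log/logstash/completed-xml-files.log"
  }
}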

Not able to dump JSON response to an FTP server in a CSV file using StreamSets

I created a pipeline with HTTP Client > Field Pivoter > Field Flattener > SFTP/FTP/FTPS Client.
I am simply trying to fetch data from an HTTP API which returns JSON and dump its response to an FTP server as a CSV file.
When I try to preview it with "Write to Destination and Executors" enabled, I get this error:
com.streamsets.pipeline.lib.generator.DataGeneratorException: WHOLE_FILE_GENERATOR_ERROR_0 - Whole File Format Error. Reason: java.lang.IllegalArgumentException: Record does not contain the mandatory fields /fileRef,/fileInfo,/fileInfo/size for Whole File Format.
I checked the docs; there isn't much on what it means or how to resolve it.
In the FTP client destination, which is the final block, I have specified the File Name Expressions as ${record:value('/fileInfo/filename')}.csv
If I don't check the "Write to destination and executor" checkbox while previewing, I can see all the data and its transformations. But when I try to write to the destination, it shows that error.
How can I resolve this?
Unfortunately, it is not possible to write CSV data directly from the FTP client destination. One way to do this would be to write data to local disk in one pipeline, and then move the files to the FTP server in a second.
Notes:
Your first pipeline would use Local FS destination with 'Delimited' data format.
Your second pipeline would use Directory origin and SFTP/FTP/FTPS Client destination, both set to 'Whole File' data format.

Collect log files from FTP into Logstash/Elasticsearch

I am investigating the Elastic stack for collecting log files. As I understand it, Elasticsearch is used for storage and indexing, and Logstash for parsing. There is also Filebeat, which can send the files to the Logstash server.
But it seems like this entire stack assumes that you have root access to the server that is producing the logs. In my case, I don't have root access, but I have FTP access to the files. I looked at various input plugins for Logstash, but couldn't find something suitable.
Is there a component of the Elastic system that can help with this setup, without requiring me to write (error-prone) custom code?
Maybe you can use the exec input plugin with curl. Something like:
exec {
  codec => plain { }
  command => "curl ftp://server/logs.log"
  interval => 3000
}
