Include fluentd time into json post data - hadoop

td-agent.conf:
<match test>
type webhdfs
host localhost
port 50070
path /test/%Y%m%d_%H
username hdfs
output_include_tag false
remove_prefix test
time_format %Y-%m-%d %H:%M:%S
output_include_time true
format json
localtime
buffer_type file
buffer_path /test/test
buffer_chunk_limit 4m
buffer_queue_limit 50
flush_interval 3s
</match>
In the HDFS log file it shows up as below:
2016-02-22 16:04:15 {"login_id":123,"email":"abcd@gmail.com"}
Is there any way to embed the fluentd event time, rather than the client time, into the JSON data before it is stored in the file, like this:
{"time_key":"2016-02-22 16:04:15","login_id":123,"email":"abcd@gmail.com"}

I found a solution: use the plugin https://github.com/repeatedly/fluent-plugin-record-modifier to add the time field to each record, then push to HDFS. :)
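For illustration, here is a minimal sketch of that idea. The answer uses fluent-plugin-record-modifier; the sketch below uses fluentd's built-in record_transformer filter instead, whose enable_ruby placeholders (including the event time as time) are documented. The time_key field name is taken from the question, and the filter must sit before the webhdfs <match test> block:

  <filter test>
    @type record_transformer
    enable_ruby
    <record>
      # copy the fluentd event time into the record so webhdfs writes it as part of the JSON
      time_key ${Time.at(time).strftime('%Y-%m-%d %H:%M:%S')}
    </record>
  </filter>

With the record-modifier plugin from the answer, the equivalent <record> section should look much the same; check its README for the exact placeholder support.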

Related

time_format configuration in fluent.conf

What is the correct time_format for the following date and time format in the fluent.conf configuration?
12/Apr/2021:12:17:03.747 +0530
I tried the format below, but I'm not sure whether it is correct, because the access logs are not showing up in Kibana.
time_format %d/%b/%Y:%H:%M:%3N %z
Thanks.
I was able to make it work with the following format:
time_format %d/%b/%Y:%H:%M:%S.%N %z
Thanks.
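For context, a sketch of where that time_format sits in a v1-style tail source; the path and regexp pattern here are hypothetical placeholders, not from the question:

  <source>
    @type tail
    path /var/log/app/access.log          # hypothetical path
    pos_file /var/log/td-agent/access.pos
    tag access
    <parse>
      @type regexp
      # hypothetical pattern: the leading timestamp feeds the named "time" capture
      expression /^(?<time>\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2}\.\d{3} \+\d{4}) (?<message>.*)$/
      time_format %d/%b/%Y:%H:%M:%S.%N %z
    </parse>
  </source>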

How can I use environment variables in fluentd config?

I have a problem using environment variables in the td-agent config. I tried:
<source>
@type tail
path /home/td-agent/test.txt
tag "#{ENV['WEBTEST']}"
pos_file /var/log/td-agent/td-agent-test.pos
@include /etc/td-agent/web_parse_regex.conf
</source>
/etc/sysconfig/td-agent:
export WEBTEST="webtest"
When I start td-agent and check td-agent.log, the tag is empty:
2020-06-09 15:40:20 +0900 [info]: using configuration file: <ROOT>
<source>
@type tail
path "/home/td-agent/test.txt"
tag ""
pos_file "/var/log/td-agent/td-agent-test.pos"
.....
I'm using CentOS.
You need to make sure that /etc/sysconfig/td-agent has execute rights:
chmod a+x /etc/sysconfig/td-agent
To make sure that the init script actually sources this file, the lines below need to be present in /etc/init.d/td-agent:
TD_AGENT_DEFAULT=/etc/sysconfig/td-agent
# Read configuration variable file if it is present
if [ -f "${TD_AGENT_DEFAULT}" ]; then
. "${TD_AGENT_DEFAULT}"
fi
I could not find a way to set environment variables from inside the conf file, but you can set variable values in Ruby in the <system> block and reuse them elsewhere in the conf file:
<system>
"#{MONGO_CONNECTION_STRING='mongodb://localhost:27017/test'}"
</system>
<match **>
@type mongo
connection_string "#{MONGO_CONNECTION_STRING}"
# database test
collection fluentd
</match>

date is not appending to elasticsearch index name while using td-agent

I need to store logs in an ES index. When I use Logstash, the date gets appended to the index name, as in logstash.2018-08-06, but when I give a custom name as in the following conf, the date is not getting added:
<store>
@type elasticsearch
host X.X.X.X
port 9200
logstash_format false
index_name updatetest.%Y%m%d --> in the index name, the date is not replaced
</store>
Here is the index name created by the above conf: updatetest.%Y%m%d. It should be like updatetest.20180806.
Thanks for the help in advance.
If you don't want to use the logstash format, this also works:
<store>
@type elasticsearch
host x.x.x.x
index_name test.%Y%m
<buffer tag, time>
timekey 1h
</buffer>
flush_interval 5s
</store>
Now %Y and %m get replaced. Defining a <buffer tag, time> section with a timekey makes the strftime placeholders available: events are grouped into time-keyed chunks, and the placeholders in index_name are resolved from each chunk's timekey when the chunk is flushed.
Hi, I solved the above issue:
<store>
@type elasticsearch
host X.X.X.X
port 9200
logstash_format true
logstash_prefix babuji
</store>

How to forward a JSON file with FluentD to Graylog2 with a valid time format

I am working on logging with FluentD and Graylog GELF, with limited success. I want to forward a JSON file:
<source>
@type tail
path /var/log/suricata/eve.json
pos_file /var/log/td-agent/suri_eve.pos # pos record
tag ids
format json
# JSON time stamp: 2016-02-01T11:52:49.157072+0000
# this timestamp is ruby's t.strftime("%Y-%m-%dT%H:%M:%S.%6N%z")
time_format %Y-%m-%dT%H:%M:%S.%6N%z
time_key timestamp # I show a JSON message below
</source>
<match **>
@type graylog
host 1.2.3.4 #(optional; default="localhost")
port 12201 #(optional; default=9200)
flush_interval 30
num_threads 2
</match>
This kicks in, but produces error messages:
2016-02-01 15:30:11 +0000 [warn]: plugin/in_tail.rb:263:rescue in
convert_line_to_event:
"{\"timestamp\":\"2016-02-01T15:27:09.000087+0000\",\"flow_id\":51921072,\"event_type\":\"flow\",\"src_ip\":\"10.1.1.85\",\"src_port\":59820,\"dest_ip\":\"224.0.0.252\",\"dest_port\":5355,\"proto\":\"UDP\",\"flow\":{\"pkts_toserver\":4,\"pkts_toclient\":0,\"bytes_toserver\":294,\"bytes_toclient\":0,\"start\":\"2016-02-01T15:26:30.393371+0000\",\"end\":\"2016-02-01T15:26:37.670904+0000\",\"age\":7,\"state\":\"new\",\"reason\":\"timeout\"}}" error="invalid time format: value = 2016-02-01T15:27:09.000087+0000,
error_class = ArgumentError, error = invalid strptime format -
`%Y-%m-%dT%H:%M:%S.%6N%z'"
An original message looks like this:
{"timestamp":"2016-02-01T15:31:02.000699+0000","flow_id":52015920,"event_type":"flow","src_ip":"10.1.1.44","src_port":49313,"dest_ip":"224.0.0.252","dest_port":5355,"proto":"UDP","flow":{"pkts_toserver":2,"pkts_toclient":0,"bytes_toserver":128,"bytes_toclient":0,"start":"2016-02-01T15:30:31.348568+0000","end":"2016-02-01T15:30:31.759024+0000","age":0,"state":"new","reason":"timeout"}}
So I checked the Ruby docs. I am not too familiar with FluentD, but from what I know the time format expression should fit? I also tried format none, but that doesn't work either.
https://github.com/Graylog2/graylog2-server/issues/1761
This is a bug/problem with reserved (undocumented) fields in Graylog2. If you hit a similar issue with timestamps, check the linked issue and the dev response.

Using id_key with fluentd/elasticsearch

I recently started attempting to use the fluentd + elasticsearch + kibana setup.
I'm currently feeding information through fluentd by having it read a log file I'm spitting out with python code.
The log consists of JSON records, one per line, like so:
{"id": "1","date": "2014-02-01T09:09:59.000+09:00","protocol": "tcp","source ip": "xxxx.xxxx.xxxx.xxxx","source port": "37605","country": "CN","organization": "China Telecom jiangsu","dest ip": "xxxx.xxxx.xxxx.xxxx","dest port": "23"}
I have fluentd set up to read my field "id" and fill out "_id", as per the instructions here:
<source>
type tail
path /home/(usr)/bin1/fluentd.log
tag es
format json
keys id, date, prot, srcip, srcport, country, org, dstip, dstport
id_key id
time_key date
time_format %Y-%m-%dT%H:%M:%S.%L%:z
</source>
<match es.**>
type elasticsearch
logstash_format true
flush_interval 10s # for testing
</match>
However, after inserting the above, the "_id" still comes out as the randomly generated _id.
If anyone could point out to me what I'm doing wrong, I would much appreciate it.
id_key id should be inside <match es.**>, not <source>.
<source> is for an input plugin, tail in this case.
<match> is for an output plugin, elasticsearch in this case.
So the elasticsearch configuration should be set in <match>:
http://docs.fluentd.org/articles/config-file
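For illustration, a sketch of the corrected layout with id_key moved into the match block; the values are carried over from the question, with the tail source trimmed to the parts relevant here:

  <source>
    type tail
    path /home/(usr)/bin1/fluentd.log
    tag es
    format json
    time_key date
    time_format %Y-%m-%dT%H:%M:%S.%L%:z
  </source>

  <match es.**>
    type elasticsearch
    logstash_format true
    id_key id
    flush_interval 10s # for testing
  </match>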
