Using id_key with fluentd/elasticsearch - elasticsearch

I recently started attempting to use the fluentd + elasticsearch + kibana setup.
I'm currently feeding information through fluentd by having it read a log file I'm writing out with Python code.
The log consists of JSON records, one per line, like so:
{"id": "1","date": "2014-02-01T09:09:59.000+09:00","protocol": "tcp","source ip": "xxxx.xxxx.xxxx.xxxx","source port": "37605","country": "CN","organization": "China Telecom jiangsu","dest ip": "xxxx.xxxx.xxxx.xxxx","dest port": "23"}
I have fluentd set up to read my "id" field and fill out "_id", as per the instructions here:
<source>
  type tail
  path /home/(usr)/bin1/fluentd.log
  tag es
  format json
  keys id, date, prot, srcip, srcport, country, org, dstip, dstport
  id_key id
  time_key date
  time_format %Y-%m-%dT%H:%M:%S.%L%:z
</source>
<match es.**>
  type elasticsearch
  logstash_format true
  flush_interval 10s # for testing
</match>
However, even after inserting the above, "_id" still comes out as a randomly generated value.
If anyone could point out to me what I'm doing wrong, I would much appreciate it.

id_key id should be inside <match es.**>, not <source>.
<source> is for the input plugin (tail in this case).
<match> is for the output plugin (elasticsearch in this case).
So the elasticsearch configuration should be set in <match>.
http://docs.fluentd.org/articles/config-file
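For example, a corrected configuration might look like this (a sketch based on the question's config; id_key is an option of the fluent-plugin-elasticsearch output, so it moves into <match>):
<source>
  type tail
  path /home/(usr)/bin1/fluentd.log
  tag es
  format json
  time_key date
  time_format %Y-%m-%dT%H:%M:%S.%L%:z
</source>
<match es.**>
  type elasticsearch
  # use the record's "id" field as the Elasticsearch document _id
  id_key id
  logstash_format true
  flush_interval 10s
</match>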

Related

Fluentbit Kubernetes - How to extract fields from existing logs

I have configured the EFK stack with Fluent Bit on my Kubernetes cluster, and I can see the logs in Kibana.
I have also deployed an nginx pod, and I can see its logs in Kibana too. But all of the log data is sent to a single field, "log".
How can I extract each field into a separate field? There is already a solution for fluentd in this question: Kibana - How to extract fields from existing Kubernetes logs.
But how can I achieve the same with Fluent Bit?
I have tried the below by adding one more FILTER section under the default Kubernetes FILTER section, but it didn't work:
[FILTER]
    Name      parser
    Match     kube.*
    Key_Name  log
    Parser    nginx
From this (https://github.com/fluent/fluent-bit/issues/723), I can see there is no grok support for fluent-bit.
In our official documentation for the Kubernetes filter we have an example of how to make your Pod suggest a parser for your data based on an annotation:
https://docs.fluentbit.io/manual/filter/kubernetes
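For instance, the Pod metadata annotation looks roughly like this (a minimal sketch; the parser name must match one defined in your parsers file):
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    fluentbit.io/parser: nginx
spec:
  containers:
  - name: nginx
    image: nginx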
Look at this configmap:
https://github.com/fluent/fluent-bit-kubernetes-logging/blob/master/output/elasticsearch/fluent-bit-configmap.yaml
The nginx parser should be there:
[PARSER]
Name nginx
Format regex
Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
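Note that the Kubernetes filter only honors that annotation when parser suggestion is switched on, roughly like this (a sketch; K8S-Logging.Parser is the relevant option):
[FILTER]
    Name                kubernetes
    Match               kube.*
    K8S-Logging.Parser  On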

Date is not appended to the elasticsearch index name while using td-agent

I need to store the logs in an ES index. When I use logstash_format, the date gets appended to the index name, e.g. logstash-2018.08.06, but when I try to give a custom name, as in the following conf, it's not getting added:
<store>
  @type elasticsearch
  host X.X.X.X
  port 9200
  logstash_format false
  index_name updatetest.%Y%m%d  # the date is not substituted into the index name
</store>
Here is the index name created by the above conf: updatetest.%Y%m%d. It should be like updatetest.20180806.
Thanks for the help in advance.
If you don't want to use the logstash format, this also works:
<store>
  @type elasticsearch
  host x.x.x.x
  index_name test.%Y%m
  <buffer tag, time>
    timekey 1h
  </buffer>
  flush_interval 5s
</store>
Now %Y and %m get replaced. Defining a buffer that includes time as a chunk key is what makes the datetime format codes available.
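With the question's example date, that resolves to an index named test.201808; the hourly timekey only controls how often chunks are flushed, not the index granularity.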
Hi, I solved the above issue:
<store>
  @type elasticsearch
  host X.X.X.X
  port 9200
  logstash_format true
  logstash_prefix babuji
</store>
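(With logstash_format true, logstash_prefix replaces the default logstash prefix, so the indices come out as dated names like babuji-2018.08.06.)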

Logstash doesn't send data

Yesterday I configured a logstash file to send data to elasticsearch.
Today I'm trying to do the same (configure another file), but it doesn't work!
Why? What should I do?
The terminal just shows me that the pipeline started and the pipelines are running, that's all.
This is the configuration:
input {
  file {
    path => "C:\Users\GeeksData\Desktop\ElasticSerach\GENERIC_FUFR0004_20171017_173013379.SyntaxicError.txt"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "helloworld3"
    document_type => "helloworld3"
  }
  stdout {}
}
I've added this line to the input plugin:
sincedb_path => "NUL"
Now it works.
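For reference, the full input section with that change (a sketch; "NUL" is the Windows null device, so logstash never persists its sincedb and re-reads the file from the beginning on every run — on Linux you would use "/dev/null" instead):
input {
  file {
    path => "C:\Users\GeeksData\Desktop\ElasticSerach\GENERIC_FUFR0004_20171017_173013379.SyntaxicError.txt"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}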
It's not really an answer, but I no longer have problems ingesting data with logstash.
Some important things you need to know if you have problems ingesting data with logstash:
1- The names of the index, hosts and document_type need to be lowercase.
2- Logstash doesn't re-ingest data that has already been ingested, unless you have changed something (like the name of the index) in the configuration file.
3- You need to create an index pattern in Kibana and link it to the index created by elasticsearch to be able to visualize that index's data in Kibana.

Include fluentd time into json post data

td-agent.conf:
<match test>
  type webhdfs
  host localhost
  port 50070
  path /test/%Y%m%d_%H
  username hdfs
  output_include_tag false
  remove_prefix test
  time_format %Y-%m-%d %H:%M:%S
  output_include_time true
  format json
  localtime
  buffer_type file
  buffer_path /test/test
  buffer_chunk_limit 4m
  buffer_queue_limit 50
  flush_interval 3s
</match>
In the HDFS log file it shows up as below:
2016-02-22 16:04:15 {"login_id":123,"email":"abcd@gmail.com"}
Is there any way to embed the fluentd time field (not the client time) into the JSON data before it is stored in the file, like so:
{"time_key":"2016-02-22 16:04:15","login_id":123,"email":"abcd@gmail.com"}
I have the solution:
Use the plugin https://github.com/repeatedly/fluent-plugin-record-modifier
Add the time field and then push to HDFS.
:)
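As a sketch of that approach, fluentd's built-in record_transformer filter works the same way as record-modifier here (time_key and the test tag are taken from the question; the strftime format matches the match section's time_format):
<filter test>
  @type record_transformer
  enable_ruby
  <record>
    # inject the fluentd event time into the record before the webhdfs output writes it
    time_key ${Time.at(time).strftime('%Y-%m-%d %H:%M:%S')}
  </record>
</filter>
With the time inside the JSON itself, output_include_time can then be set to false to avoid writing the timestamp twice.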

How to forward a JSON file with FluentD to Graylog2 with a valid time format

I am working on logging with FluentD and Graylog GELF with limited success. I want to forward a JSON file:
<source>
  @type tail
  path /var/log/suricata/eve.json
  pos_file /var/log/td-agent/suri_eve.pos # pos record
  tag ids
  format json
  # JSON time stamp: 2016-02-01T11:52:49.157072+0000
  # this timestamp is ruby's t.strftime("%Y-%m-%dT%H:%M:%S.%6N%z")
  time_format %Y-%m-%dT%H:%M:%S.%6N%z
  time_key timestamp # I show a JSON message below
</source>
<match **>
  @type graylog
  host 1.2.3.4 # (optional; default="localhost")
  port 12201 # (optional; default=9200)
  flush_interval 30
  num_threads 2
</match>
This kicks in, but produces error messages:
2016-02-01 15:30:11 +0000 [warn]: plugin/in_tail.rb:263:rescue in convert_line_to_event: "{\"timestamp\":\"2016-02-01T15:27:09.000087+0000\",\"flow_id\":51921072,\"event_type\":\"flow\",\"src_ip\":\"10.1.1.85\",\"src_port\":59820,\"dest_ip\":\"224.0.0.252\",\"dest_port\":5355,\"proto\":\"UDP\",\"flow\":{\"pkts_toserver\":4,\"pkts_toclient\":0,\"bytes_toserver\":294,\"bytes_toclient\":0,\"start\":\"2016-02-01T15:26:30.393371+0000\",\"end\":\"2016-02-01T15:26:37.670904+0000\",\"age\":7,\"state\":\"new\",\"reason\":\"timeout\"}}" error="invalid time format: value = 2016-02-01T15:27:09.000087+0000, error_class = ArgumentError, error = invalid strptime format - `%Y-%m-%dT%H:%M:%S.%6N%z'"
An original message looks like this:
{"timestamp":"2016-02-01T15:31:02.000699+0000","flow_id":52015920,"event_type":"flow","src_ip":"10.1.1.44","src_port":49313,"dest_ip":"224.0.0.252","dest_port":5355,"proto":"UDP","flow":{"pkts_toserver":2,"pkts_toclient":0,"bytes_toserver":128,"bytes_toclient":0,"start":"2016-02-01T15:30:31.348568+0000","end":"2016-02-01T15:30:31.759024+0000","age":0,"state":"new","reason":"timeout"}}
So I checked the Ruby docs. I am not too familiar with FluentD, but from what I know the time format expression should fit? I tried format=none, but that also doesn't work.
https://github.com/Graylog2/graylog2-server/issues/1761
This is a bug/problem with reserved (undocumented) fields in Graylog2.
If you hit a similar bug with timestamps, check the linked issue and the dev response.
