I recently started trying out the fluentd + elasticsearch + kibana setup.
I'm currently feeding information into fluentd by having it tail a log file that I write out with Python code.
The log consists of JSON records, one per line, like so:
{"id": "1","date": "2014-02-01T09:09:59.000+09:00","protocol": "tcp","source ip": "xxxx.xxxx.xxxx.xxxx","source port": "37605","country": "CN","organization": "China Telecom jiangsu","dest ip": "xxxx.xxxx.xxxx.xxxx","dest port": "23"}
I have fluentd set up to read my field "id" and fill out "_id", as per the instructions here:
<source>
  type tail
  path /home/(usr)/bin1/fluentd.log
  tag es
  format json
  keys id, date, prot, srcip, srcport, country, org, dstip, dstport
  id_key id
  time_key date
  time_format %Y-%m-%dT%H:%M:%S.%L%:z
</source>
<match es.**>
  type elasticsearch
  logstash_format true
  flush_interval 10s # for testing
</match>
However, after inserting documents with the above config, "_id" still comes out as a randomly generated value.
If anyone could point out what I'm doing wrong, I would much appreciate it.
id_key id should be inside <match es.**>, not <source>.
<source> configures an input plugin, tail in this case.
<match> configures an output plugin, elasticsearch in this case.
So elasticsearch settings such as id_key should go in <match>.
http://docs.fluentd.org/articles/config-file
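Applied to the configuration above, the corrected version might look like this (a sketch keeping the original paths and options; only id_key has moved into the output section):

<source>
  type tail
  path /home/(usr)/bin1/fluentd.log
  tag es
  format json
  keys id, date, prot, srcip, srcport, country, org, dstip, dstport
  time_key date
  time_format %Y-%m-%dT%H:%M:%S.%L%:z
</source>

<match es.**>
  type elasticsearch
  logstash_format true
  id_key id # tells the elasticsearch plugin to use the record's "id" field as _id
  flush_interval 10s # for testing
</match>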
I have configured an EFK stack with Fluent Bit on my Kubernetes cluster. I can see the logs in Kibana.
I have also deployed an nginx pod, and I can see this nginx pod's logs in Kibana too. But all of the log data is sent to a single field "log", as shown below.
How can I extract each value into a separate field? There is already a solution for fluentd in this question: Kibana - How to extract fields from existing Kubernetes logs
But how can I achieve the same with Fluent Bit?
I have tried the below, adding one more FILTER section under the default Kubernetes FILTER section, but it didn't work.
[FILTER]
    Name     parser
    Match    kube.*
    Key_Name log
    Parser   nginx
From this issue (https://github.com/fluent/fluent-bit/issues/723), I can see that there is no grok support in Fluent Bit.
In our official documentation for the Kubernetes filter we have an example of how to make your Pod suggest a parser for your data based on an annotation:
https://docs.fluentbit.io/manual/filter/kubernetes
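For example, annotating the nginx Pod suggests a parser for its logs (a minimal sketch; fluentbit.io/parser is the annotation described in the docs above):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    fluentbit.io/parser: nginx # ask the Kubernetes filter to apply the nginx parser
spec:
  containers:
    - name: nginx
      image: nginx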
Look at this configmap:
https://github.com/fluent/fluent-bit-kubernetes-logging/blob/master/output/elasticsearch/fluent-bit-configmap.yaml
The nginx parser should be there:
[PARSER]
    Name        nginx
    Format      regex
    Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z
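Note that the Kubernetes filter only honors the annotation when parser suggestions are enabled; a sketch of the relevant option (K8S-Logging.Parser, per the Fluent Bit docs):

[FILTER]
    Name               kubernetes
    Match              kube.*
    K8S-Logging.Parser On # allow Pods to suggest a parser via the fluentbit.io/parser annotation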
I need to store the logs in an ES index. When I use logstash_format, the date gets appended to the index name, as in logstash.2018-08-06, but when I try to give a custom name, as in the following conf, the date is not added:
<store>
  @type elasticsearch
  host X.X.X.X
  port 9200
  logstash_format false
  index_name updatetest.%Y%m%d # the date placeholders are not replaced in the index name
</store>
Here is the index name created by the above conf: updatetest.%Y%m%d. It should be updatetest.20180806.
Thanks in advance for any help.
If you don't want to use the logstash format, this also works:

<store>
  @type elasticsearch
  host x.x.x.x
  index_name test.%Y%m
  <buffer tag, time>
    timekey 1h
  </buffer>
  flush_interval 5s
</store>
Now %Y and %m get replaced. Defining a <buffer> section that includes time as a chunk key is what makes the strftime placeholders in index_name available.
Hi, I solved the above issue with the following:

<store>
  @type elasticsearch
  host X.X.X.X
  port 9200
  logstash_format true
  logstash_prefix babuji
</store>
Yesterday I configured a logstash file to send data to elasticsearch.
Today I'm trying to do the same (configure another file), but it doesn't work!
Why? What should I do?
The terminal just shows me that the pipeline started and the pipelines are running, that's all.
This is the configuration:
input {
  file {
    path => "C:\Users\GeeksData\Desktop\ElasticSerach\GENERIC_FUFR0004_20171017_173013379.SyntaxicError.txt"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "helloworld3"
    document_type => "helloworld3"
  }
  stdout {}
}
I've added this line to the file input plugin:
sincedb_path => "NUL"
Now it works.
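For reference, the resulting input section looks like this (same path as above; on Windows, "NUL" is the null device, so no sincedb is persisted and the file is re-read on every run):

input {
  file {
    path => "C:\Users\GeeksData\Desktop\ElasticSerach\GENERIC_FUFR0004_20171017_173013379.SyntaxicError.txt"
    start_position => "beginning"
    sincedb_path => "NUL" # don't persist read positions between runs
  }
}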
It's not really an answer, but I no longer have problems ingesting data with logstash.
Some important things to know if you have problems ingesting data with logstash:
1. The index name, hosts, and document_type need to be lowercase.
2. Logstash doesn't re-ingest data that has already been ingested unless you change something in the configuration file (such as the name of the index).
3. You need to create an index pattern in Kibana and link it to the index created by elasticsearch to be able to visualize that index's data in Kibana.
td-agent.config
<match test>
  type webhdfs
  host localhost
  port 50070
  path /test/%Y%m%d_%H
  username hdfs
  output_include_tag false
  remove_prefix test
  time_format %Y-%m-%d %H:%M:%S
  output_include_time true
  format json
  localtime
  buffer_type file
  buffer_path /test/test
  buffer_chunk_limit 4m
  buffer_queue_limit 50
  flush_interval 3s
</match>
In the HDFS log file it shows up as below:
2016-02-22 16:04:15 {"login_id":123,"email":"abcd@gmail.com"}
Is there any way to embed the fluentd time field (rather than the client time) into the JSON data before it is stored in the file, like so:
{"time_key":"2016-02-22 16:04:15","login_id":123,"email":"abcd@gmail.com"}
I found the solution:
Use the plugin https://github.com/repeatedly/fluent-plugin-record-modifier
Add the time field, then push to HDFS. :)
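A minimal sketch of that approach, assuming the record_modifier filter's ${} placeholders (which evaluate Ruby with the event time available as time, per the plugin's README):

<filter test>
  @type record_modifier
  <record>
    # time here is the fluentd event time, not the client time
    time_key ${Time.at(time).strftime("%Y-%m-%d %H:%M:%S")}
  </record>
</filter>

Placing this filter before the webhdfs <match test> block adds the time_key field to each record before it is written to HDFS.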
I am working on logging with FluentD and Graylog GELF with limited success. I want to forward a JSON file:
<source>
  @type tail
  path /var/log/suricata/eve.json
  pos_file /var/log/td-agent/suri_eve.pos # pos record
  tag ids
  format json
  # JSON time stamp: 2016-02-01T11:52:49.157072+0000
  # this timestamp is ruby's t.strftime("%Y-%m-%dT%H:%M:%S.%6N%z")
  time_format %Y-%m-%dT%H:%M:%S.%6N%z
  time_key timestamp # I show a JSON message below
</source>
<match **>
  @type graylog
  host 1.2.3.4 # (optional; default="localhost")
  port 12201 # (optional; default=9200)
  flush_interval 30
  num_threads 2
</match>
This kicks in, but produces error messages:
2016-02-01 15:30:11 +0000 [warn]: plugin/in_tail.rb:263:rescue in
convert_line_to_event:
"{\"timestamp\":\"2016-02-01T15:27:09.000087+0000\",\"flow_id\":51921072,\"event_type\":\"flow\",\"src_ip\":\"10.1.1.85\",\"src_port\":59820,\"dest_ip\":\"224.0.0.252\",\"dest_port\":5355,\"proto\":\"UDP\",\"flow\":{\"pkts_toserver\":4,\"pkts_toclient\":0,\"bytes_toserver\":294,\"bytes_toclient\":0,\"start\":\"2016-02-01T15:26:30.393371+0000\",\"end\":\"2016-02-01T15:26:37.670904+0000\",\"age\":7,\"state\":\"new\",\"reason\":\"timeout\"}}" error="invalid time format: value = 2016-02-01T15:27:09.000087+0000,
error_class = ArgumentError, error = invalid strptime format -
`%Y-%m-%dT%H:%M:%S.%6N%z'"
An original message looks like this:
{"timestamp":"2016-02-01T15:31:02.000699+0000","flow_id":52015920,"event_type":"flow","src_ip":"10.1.1.44","src_port":49313,"dest_ip":"224.0.0.252","dest_port":5355,"proto":"UDP","flow":{"pkts_toserver":2,"pkts_toclient":0,"bytes_toserver":128,"bytes_toclient":0,"start":"2016-02-01T15:30:31.348568+0000","end":"2016-02-01T15:30:31.759024+0000","age":0,"state":"new","reason":"timeout"}}
So I checked the Ruby docs. I am not too familiar with FluentD, but from what I know the time format expression should fit? I also tried format none, but that doesn't work either.
https://github.com/Graylog2/graylog2-server/issues/1761
This is a bug/problem with reserved (undocumented) fields in Graylog2.
If you run into a similar bug with timestamps, check the linked issue and the dev response.
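If the clash is with a reserved field name, one possible workaround is to move the offending field out of the way before the Graylog output, e.g. with fluentd's built-in record_transformer filter (a sketch; the event_timestamp name is illustrative, and whether this is appropriate depends on the dev response in the linked issue):

<filter ids>
  @type record_transformer
  enable_ruby true
  <record>
    # copy the original value under a non-reserved name
    event_timestamp ${record["timestamp"]}
  </record>
  remove_keys timestamp # then drop the reserved field
</filter>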