We’re using FluentD to send data over to our ELK stack. Heroku sends over logs in a BULK format which includes multiple log entries, separated by a new line.
I was wondering if anyone had any experience with splitting incoming http requests in FluentD by newline? I saw examples of this in past versions < 1.0. There are also two Heroku+FluentD plugins, both of which no longer seem to work and are not maintained.
Can I use a parser to split the incoming message into multiple messages and emit each to FluentD, if so, how?
If not, is there a simpler way to get these bulk messages sent from Heroku into FluentD, split by new line?
The bulk log messages Heroku posts look something like this:
83 <40>1 2012-11-30T06:45:29+00:00 host app web.3 - State changed from starting to up
119 <40>1 2012-11-30T06:45:26+00:00 host app web.3 - Starting process with command bundle exec rackup config.ru -p 24405
So in our logging solution, we’re getting multiple rows per entry. We’ve tried multi line parsing, but that doesn’t seem to do the trick.
You can achieve your goal by using the following gem
Install the gem and use one of the following configs
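The gem isn't named here (it was presumably a link in the original answer), but judging from the record_splitter type and its parameters in the configs below it is most likely fluent-plugin-record-splitter; that name is an assumption, so verify it against the gem the answer links to. If so, installing it would look like:

# assumption: the plugin is fluent-plugin-record-splitter
fluent-gem install fluent-plugin-record-splitter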
Use this one if your events are separated by a newline:
<match *.*>
  @type record_splitter
  tag splitted.log
  input_key message
  split_strategy lines
  append_new_line true
  remove_new_line true
</match>
or use this one to split the lines with a regex:
<match *.*>
  @type record_splitter
  tag splitted.log
  input_key message
  split_strategy regex
  split_regex /\d+\s<\d+>.+/
</match>
To process the log lines further, you can add filter and match sections for the tag used (example below to send logs to ELK):
<filter splitted.log>
  ...
</filter>
<match splitted.log>
  @type rewrite_tag_filter
</match>
<match **.**>
  @type elasticsearch
  ...
</match>
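For reference, a minimal sketch of what that final elasticsearch section could contain, assuming the fluent-plugin-elasticsearch output plugin; the host and port are placeholders:

<match **.**>
  @type elasticsearch
  # host/port are placeholders for your Elasticsearch endpoint
  host elasticsearch.example.com
  port 9200
  # write logstash-style, date-suffixed indices
  logstash_format true
</match>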
I'm using the ELK stack version 5.5 on Ubuntu 16.0.
My logs are getting broken and are not being written properly into Elasticsearch, which results in json.errors like the one below:
Error decoding JSON: invalid character 'e' in literal null (expecting 'u')"
I'm getting json.errors very frequently, and those logs are not being read or written into Elasticsearch properly. This happens every 5 to 10 minutes; please help me solve it.
[Screenshot of broken logs in Kibana]
My sample log is:
{"log":"2019-10-01 07:18:26:854*[DEBUG]*cluster2-nio-worker-0*Connection*userEventTriggered*Connection[cassandraclient/10.3.254.137:9042-1, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat\n","stream":"stdout","time":"2019-10-01T07:18:26.85462769Z"}
Since you have stated that the JSON logs are not pretty-printed, I assume that the multiline settings of your input configuration are causing the problems.
In my opinion you don't need any multiline settings when your logs are in JSON format and are not pretty-printed, meaning the whole JSON object (= log event) is written on one line.
You have already specified
json.message_key: log
This alone should get the job done.
So to sum it up:
Remove the multiline settings and try again. Your configuration should look like this:
filebeat.inputs:
- type: log
  paths:
    - "/var/log/containers/*.log"
  tags: ["kube-logs"]
  symlinks: true
  json.message_key: log
  json.keys_under_root: true
  json.add_error_key: true
I want one Filebeat instance to be able to send data to different Logstash pipelines.
Is this possible?
I have configured one Logstash service with two pipelines, and each pipeline listens on its own port.
Let's say Pipeline1 (port 5044) and Pipeline2 (port 5045).
Now I want to send data to Logstash using Filebeat, and I have two types of log files, let's say log1 and log2.
I want to send log1 to Pipeline1 and log2 to Pipeline2.
I am running only one instance of Filebeat; how can I do this?
Filebeat can have only one output. You will need to either run another Filebeat instance or change your Logstash pipelines to listen on only one port and then filter the data based on tags; it is easier to filter in Logstash than to run two instances.
In Filebeat you can specify a tag for each input that you have, and then use those tags in your Logstash configuration to send the logs to the desired pipeline.
For example, events with the tag log1 will be sent to pipeline1 and events with the tag log2 will be sent to pipeline2.
Your Filebeat configuration needs to look something like this:
- type: log
  enabled: true
  paths:
    - "/path/to/your/logs/*.json"
  tags: ["logN"]
And then you will need a conditional in your Logstash filters and outputs for each tag you want:
filter {
  if "logN" in [tags] {
    # filters for this log type go here
  }
}
output {
  if "logN" in [tags] {
    # the output for this log type goes here
  }
}
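For the concrete log1/log2 case in the question, the whole thing could look roughly like this (the paths, Beats port, and index names are placeholders):

Filebeat:

filebeat.inputs:
- type: log
  paths:
    - "/path/to/log1/*.log"
  tags: ["log1"]
- type: log
  paths:
    - "/path/to/log2/*.log"
  tags: ["log2"]

output.logstash:
  hosts: ["localhost:5044"]

Logstash:

input {
  beats {
    port => 5044
  }
}
output {
  if "log1" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "log1-%{+YYYY.MM.dd}"
    }
  } else if "log2" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "log2-%{+YYYY.MM.dd}"
    }
  }
}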
Filebeat can have only one output, but this can be achieved by using a messaging medium between Filebeat and Logstash. In my case I am using Kafka between Filebeat and Logstash to achieve the above.
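A rough sketch of that setup, assuming a local Kafka broker, one topic per log type, and a custom log_type field set on each Filebeat input (broker address, field name, and topic names are all placeholders):

Filebeat:

output.kafka:
  hosts: ["localhost:9092"]
  # route each input to its own topic based on the custom field
  topic: '%{[fields.log_type]}'

Logstash (one pipeline per topic):

input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["log1"]
  }
}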
Currently, at the end of my Jenkins build, I grab the console log and add it to a JSON blob along with the build details, and I send that to Logstash via curl:
def payload = JsonOutput.toJson([
    CONSOLE: getConsoleText(),
    BUILD_RESULT: currentBuild.result,
] << manager.getEnvVars()
)
sh "curl -i -X PUT -H \'content-type: application/json\' --insecure -d @data.json http://mylogstash/jenkins"
Logstash then puts this straight into Elasticsearch against a Jenkins index for the day. This works great and the whole log gets stored in Elasticsearch, but it doesn't make it very searchable.
What I would like to do is send the log to Logstash as a whole (as it is quite large), and have Logstash parse it line by line and apply filters. Any lines I don't filter out would then be posted to ES as documents in their own right.
Is this possible, or would I have to send it line by line from Jenkins? As the log files are thousands of lines long, that would result in loads of requests to Logstash.
If you have the flexibility to do it, I would suggest writing the console logs to a log file. That way you can use Filebeat to read the log line by line automatically and send it over to Logstash. By using Filebeat, you get the advantage of guaranteed (at-least-once) delivery of the data and automatic retries if and when Logstash goes down.
Once the data reaches Logstash, you can use the pipeline to parse and filter the data as per your requirement. The Grok Debugger available at this link is handy --> http://grokdebug.herokuapp.com/.
After transforming the data, the documents can be sent to ES for persistence.
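A rough sketch of such a pipeline, assuming Filebeat ships the console log over the Beats protocol; the port, grok pattern, and index name are placeholders, and the pattern in particular depends on what your build actually prints:

input {
  beats {
    port => 5044
  }
}
filter {
  # try to pull a log level out of each console line
  grok {
    match => { "message" => "%{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  # drop lines that don't match the pattern instead of indexing them
  if "_grokparsefailure" in [tags] {
    drop { }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "jenkins-%{+YYYY.MM.dd}"
  }
}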
Apparently logstash OnDemand account does not work when I wanted to post an issue.
Anyways, I have a Logstash setup with Redis, Elasticsearch, and Kibana. My Logstash shippers are collecting logs from several files and putting them into Redis just fine.
Logstash version 1.3.3
Elasticsearch version 1.0.1
The only thing I have set in elasticsearch_http for Logstash is the host name. This whole setup seems to glue together just fine.
The problem is that elasticsearch_http is not consuming the Redis entries as they come. What I have seen by running it in debug mode is that it flushes about 100 entries every minute (flush_size and idle_flush_time's default values). The documentation, however, states (as I understand it) that it will force a flush in case the flush_size of 100 is not reached (for example, we had only 10 messages in the last minute). But it seems to work the other way around: it is flushing only about 100 messages every minute. I changed the size to 2000 and it flushes about 2000 every minute or so.
Here is my logstash-indexer.conf
input {
  redis {
    host => "1xx.xxx.xxx.93"
    data_type => "list"
    key => "testlogs"
    codec => json
  }
}
output {
  elasticsearch_http {
    host => "1xx.xxx.xxx.93"
  }
}
Here is my elasticsearch.yml
cluster.name: logger
node.name: "logstash"
transport.tcp.port: 9300
http.port: 9200
discovery.zen.ping.unicast.hosts: ["1xx.xxx.xxx.93:9300"]
discovery.zen.ping.multicast.enabled: false
#discovery.zen.ping.unicast.enabled: true
network.bind_host: 1xx.xxx.xxx.93
network.publish_host: 1xx.xxx.xxx.93
The indexer, elasticsearch, redis, and kibana are on same server. The log collection from file is done on another server.
So I'm going to suggest a couple of different approaches to solve your problem. Logstash, as you are discovering, can be a bit quirky, so I've found these approaches useful in dealing with unexpected behavior from it.
1. Use the elasticsearch output instead of elasticsearch_http. You can get the same functionality by using the elasticsearch output with protocol set to http. The elasticsearch output is more mature (milestone 2 vs milestone 3) and I've seen this change make a difference before (see the sketch after this list).
2. Set idle_flush_time and flush_size explicitly rather than relying on the defaults. There have been issues with Logstash defaults previously; I've found it a lot safer to set them explicitly. idle_flush_time is in seconds, flush_size is the number of records to flush.
3. Upgrade to a more recent version of Logstash. There is enough of a change in how Logstash is deployed with version 1.4.X (http://logstash.net/docs/1.4.1/release-notes) that I'd bite the bullet and upgrade. It's also significantly easier to get attention if you still have a problem with the most recent stable major release.
4. Make certain your Redis version matches those supported by your Logstash version.
5. Experiment with setting the batch, batch_events, and batch_timeout values for the Redis output. You are using the list data_type; list supports various batch options, and as with some other parameters it's best not to assume the defaults are always being set correctly.
6. Do all of the above. In addition to trying the first set of suggestions, I'd try them all together in various combinations. It seems obvious, but between all the variations above it's easy to lose track - keep careful records of each test run and try to change only one variation at a time.
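For points 1 and 2, a minimal sketch of the indexer output (this assumes the Logstash 1.4.x elasticsearch output from point 3; the flush values here are just the defaults made explicit so you can experiment with them):

output {
  elasticsearch {
    host => "1xx.xxx.xxx.93"
    protocol => "http"
    flush_size => 100
    idle_flush_time => 1
  }
}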
Stackers
I have a lot of messages in a RabbitMQ queue (running on localhost in my dev environment). The payload of the messages is a JSON string that I want to load directly into Elasticsearch (also running on localhost for now). I wrote a quick Ruby script to pull the messages from the queue and load them into ES, which is as follows:
#! /usr/bin/ruby
require 'bunny'
require 'json'
require 'elasticsearch'

# Connect to RabbitMQ to collect data
mq_conn = Bunny.new
mq_conn.start
mq_ch = mq_conn.create_channel
mq_q = mq_ch.queue("test.data")

# Connect to ElasticSearch to post the data
es = Elasticsearch::Client.new log: true

# Main loop - collect the message and stuff it into the db.
mq_q.subscribe do |delivery_info, metadata, payload|
  begin
    es.index index: "indexname",
             type: "relationship",
             body: payload
  rescue
    puts "Received #{payload} - #{delivery_info} - #{metadata}"
    puts "Exception raised"
    exit
  end
end
mq_conn.close
There are around 4,000,000 messages in the queue.
When I run the script, I see a bunch of messages, say 30, being loaded into Elastic Search just fine. However, I see around 500 messages leaving the queue.
root@beep:~# rabbitmqctl list_queues
Listing queues ...
test.data 4333080
...done.
root@beep:~# rabbitmqctl list_queues
Listing queues ...
test.data 4332580
...done.
The script then silently exits without reporting an exception. The begin/rescue block never triggers, so I don't know why the script is finishing early or losing so many messages. Any clues on how I should debug this next?
I've added a simple, working example here:
https://github.com/elasticsearch/elasticsearch-ruby/blob/master/examples/rabbitmq/consumer-publisher.rb
It's hard to debug your example when you don't provide examples of the test data.
The Elasticsearch "river" feature is deprecated, and will be removed, eventually. You should definitely invest time into writing your own custom feeder, if RabbitMQ and Elasticsearch are a central part of your infrastructure.
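If you do stick with a custom feeder, note one likely reason the script above exits early: Bunny's subscribe is non-blocking by default, so the script falls through to mq_conn.close while deliveries are still in flight, and with automatic acknowledgements those deliveries have already left the queue. A minimal sketch of a blocking consumer with manual acks (the queue and index names are just the ones from the question):

#! /usr/bin/ruby
require 'bunny'
require 'elasticsearch'

mq_conn = Bunny.new
mq_conn.start
mq_ch = mq_conn.create_channel
# limit how many unacknowledged messages RabbitMQ pushes at once
mq_ch.prefetch(100)
mq_q = mq_ch.queue("test.data")

es = Elasticsearch::Client.new

# block: true keeps the script running; manual_ack: true means a message only
# leaves the queue once Elasticsearch has accepted it.
mq_q.subscribe(block: true, manual_ack: true) do |delivery_info, _metadata, payload|
  begin
    es.index index: "indexname", type: "relationship", body: payload
    mq_ch.ack(delivery_info.delivery_tag)
  rescue => e
    puts "Failed to index #{payload}: #{e.message}"
    # requeue the message and keep consuming
    mq_ch.nack(delivery_info.delivery_tag, false, true)
  end
end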
Answering my own question: I have since learned that this is a crazy and stupid way to load a message queue of index instructions into Elasticsearch. I created a river and can drain instructions much faster than I could with a ropey script. ;-)