Logstash agent not indexing anymore - elasticsearch

I have a Logstash instance running as a service that reads from Redis and outputs to Elasticsearch. I just noticed there was nothing new in Elasticsearch for the last few days, but the Redis lists were increasing.
The Logstash log was filled with two errors, repeated over thousands of lines:
:message=>"Got error to send bulk of actions"
:message=>"Failed to flush outgoing items"
The reason being:
{"error":"IllegalArgumentException[Malformed action/metadata line [107], expected a simple value for field [_type] but found [START_ARRAY]]","status":500},
Additionally, trying to stop the service failed repeatedly and I had to kill it. Restarting it drained the Redis lists and imported everything into Elasticsearch. It seems to work fine now.
But I have no idea how to prevent this from happening again. The type field mentioned in the error is set to a string in each input block, so I don't understand how it could have become an array.
What am I missing?
I'm using Elasticsearch 1.7.1 and Logstash 1.5.3. The logstash.conf file looks like this:
input {
  redis {
    host => "127.0.0.1"
    port => 6381
    data_type => "list"
    key => "b2c-web"
    type => "b2c-web"
    codec => "json"
  }
  redis {
    host => "127.0.0.1"
    port => 6381
    data_type => "list"
    key => "b2c-web-staging"
    type => "b2c-web-staging"
    codec => "json"
  }
  # other redis inputs, only key/type variations
}
filter {
  grok {
    match => ["msg", "Cache hit %{WORD:query} in %{NUMBER:hit_total:int}ms. Network: %{NUMBER:hit_network:int} ms. Deserialization %{NUMBER:hit_deserial:int}"]
    add_tag => ["cache_hit"]
    tag_on_failure => []
  }
  # other groks, not related to the type field
}
output {
  elasticsearch {
    host => "[IP]"
    port => "9200"
    protocol => "http"
    cluster => "logstash-prod-2"
  }
}

According to your log message:
{"error":"IllegalArgumentException[Malformed action/metadata line [107], expected a simple value for field [_type] but found [START_ARRAY]]","status":500},
It seems you're trying to index a document with a type field that's an array instead of a string.
It's hard to say much more without seeing the parts of the logstash.conf you elided, but check the following to be sure:
If you use add_field anywhere to change type, you actually turn type into an array with multiple values, which is what Elasticsearch is complaining about.
You can use the mutate filter's join option to convert an array back into a single string:
filter {
  mutate {
    join => { "fieldname" => "," }
  }
}
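For illustration, here is a minimal sketch of how that happens (the mutate filter and the "staging" value are hypothetical, not taken from your config): add_field on a field that an input has already set appends to it instead of overwriting it, so the field becomes an array:
filter {
  mutate {
    # type was already set to "b2c-web" by the redis input;
    # add_field appends, so the event ends up with
    # type => ["b2c-web", "staging"], and the bulk request's
    # _type is no longer a simple value.
    add_field => { "type" => "staging" }
  }
}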

Related

Using Redis key as Elasticsearch index name

I am attempting to use a logstash indexer to move data from redis to elasticsearch.
On the end that writes to Redis, I give a 'key' to one set of logs in the Logstash output:
redis {
  host => "server"
  port => "7379"
  data_type => "list"
  key => "aruba"
}
On the indexer's input end, I read each key:
input {
  redis {
    host => "localhost"
    port => "6379"
    data_type => "list"
    type => "redis-input"
    key => "logstash"
    codec => "json"
    threads => 32
    batch_count => 1000
    #timeout => 10
  }
  redis {
    host => "localhost"
    port => "6379"
    data_type => "list"
    type => "redis-input"
    key => "aruba"
    codec => "json"
    threads => 32
    batch_count => 1000
    #timeout => 10
  }
}
I am attempting to use the key in the Logstash output to build the index name, i.e. something like aruba-2017.24.10, but the output always goes to the default logstash index. I tried
if [redis.key] == "xyz" {
  elasticsearch { index => "xyz-%{time}" }
}
or if [key] == "xyz", and so on. I also tried
elasticsearch {
  index => "%{key}-%{time}"
}
and elasticsearch { index => "%{redis.key}-%{time}" }, etc. None of it seems to work.
While #sysadmin1138 is right that accessing nested fields is done via [field][subfield] rather than [field.subfield], your problem is that you are trying to access data that is not in your log event.
While in Redis, your log events have a key associated with them, but the key is not part of the event itself; it is merely used to access the event in Redis. When Logstash fetches the event, it uses that "key" to specify which list it wants, but the key never makes it to Elasticsearch.
To see this for yourself, try running Logstash with stdout { codec => "rubydebug" } as an output plugin; it will pretty-print your whole log event, letting you see exactly what data is included.
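For example, a throwaway output section along these lines (just a sketch; remove it once you have seen the event):
output {
  stdout { codec => "rubydebug" }
}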
To your rescue comes the add_field option, which exists for every Logstash input plugin. You can add it to your input:
redis {
  host => "localhost"
  port => "6379"
  data_type => "list"
  type => "redis-input"
  key => "aruba"
  codec => "json"
  threads => 32
  batch_count => 1000
  add_field => {
    "[redis][key]" => "aruba"
  }
}
Then changing your conditional to use [redis][key] will make your code work.
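With that field in place, the output conditional might look like this (the %{+YYYY.MM.dd} date pattern is an assumption standing in for your %{time} field):
output {
  if [redis][key] == "aruba" {
    elasticsearch { index => "aruba-%{+YYYY.MM.dd}" }
  } else {
    elasticsearch { index => "logstash-%{+YYYY.MM.dd}" }
  }
}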
(Cheers to RELK stacks)
This is likely due to an incorrect field reference in your conditional.
if [redis.key] == "xyz" {
  elasticsearch { index => "xyz-%{time}" }
}
Should be:
if [redis][key] == "xyz" {
  elasticsearch { index => "xyz-%{time}" }
}

I want to delete documents via Logstash, but it throws an exception

I have run into a problem. My Logstash configuration file is as follows:
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    db => 10
    data_type => "list"
    key => "local_tag_del"
  }
}
filter {
}
output {
  elasticsearch {
    action => "delete"
    hosts => ["127.0.0.1:9200"]
    codec => "json"
    index => "mbd-data"
    document_type => "localtag"
    document_id => "%{album_id}"
  }
  file {
    path => "/data/elasticsearch/result.json"
  }
  stdout {}
}
I want Logstash to read IDs from Redis and tell Elasticsearch to delete the corresponding documents.
Excuse me, my English is poor; I hope someone can help me. Thanks.
I can't help you much beyond this, because the problem is spelled out in your error message: Logstash couldn't connect to your Elasticsearch instance.
That usually means one of:
elasticsearch isn't running
elasticsearch isn't bound to localhost
That has nothing to do with your Logstash config. Using Logstash to delete documents is a bit unusual though, so I'm not entirely sure this isn't an XY problem.

ElasticSearch assign own IDs while indexing with LogStash

I am indexing a large corpus of information and I have a string key that I know is unique. I would like to avoid using search and instead access documents by this artificial identifier.
Since the path directive is discontinued in ES 1.5, does anyone know a workaround for this problem?
My data look like:
{unique-string},val1, val2, val3...
{unique-string2},val4, val5, val6...
I am using logstash to index the files and would prefer to fetch the documents through a direct get, rather than through an exact-match.
In your elasticsearch output plugin, just specify the document_id setting with a reference to the field you want to use as the id, i.e. the column named "1" in your csv filter.
input {
  file { ... }
}
filter {
  csv {
    columns => ["1","2","3"]
    separator => ","
  }
}
output {
  elasticsearch {
    action => "index"
    host => "localhost"
    port => "9200"
    index => "index-name"
    document_id => "%{1}"    # <-- add this line
    workers => 2
    cluster => "elasticsearch-cluster"
    protocol => "http"
  }
}

Copy ElasticSearch-Index with Logstash

I have a ready-built Apache index on one machine that I would like to clone to another machine using Logstash. Fairly easy, I thought:
input {
  elasticsearch {
    host => "xxx.xxx.xxx.xxx"
    index => "logs"
  }
}
filter {
}
output {
  elasticsearch {
    cluster => "Loa"
    host => "127.0.0.1"
    protocol => http
    index => "logs"
    index_type => "apache_access"
  }
}
That pulls over the docs, but it doesn't stop, as it uses the default query "*" (the original index has ~50,000 docs, and I killed the former run when the new index was over 600,000 docs and rising).
Next I tried to make sure the docs would get updated instead of duplicated, but this commit hasn't made it into a release yet, so I don't have a primary key to rely on.
Then I remembered sincedb, but I don't seem to be able to use that in the query (or is that possible?).
Any advice? Maybe a complete different approach? Thanks a lot!
Assuming that the elasticsearch input creates a Logstash event containing the document id (I assume it will be _id or something similar), try setting up the elasticsearch output the following way:
output {
  elasticsearch {
    cluster => "Loa"
    host => "127.0.0.1"
    protocol => http
    index => "logs"
    index_type => "apache_access"
    document_id => "%{_id}"
  }
}
That way, even if the elasticsearch input, for whatever reason, continues to push the same documents indefinitely, Elasticsearch will merely update the existing documents instead of creating new documents with new ids.
Once you reach 50,000, you can stop.
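If your version of the elasticsearch input supports the docinfo option (an assumption about your plugin version), you can have it copy the original _id into @metadata and reference it explicitly, roughly like this:
input {
  elasticsearch {
    host => "xxx.xxx.xxx.xxx"
    index => "logs"
    docinfo => true    # exposes _index, _type and _id under [@metadata]
  }
}
output {
  elasticsearch {
    cluster => "Loa"
    host => "127.0.0.1"
    protocol => "http"
    index => "logs"
    index_type => "apache_access"
    document_id => "%{[@metadata][_id]}"
  }
}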

Way to populate Logstash output variable without getting it from an Input?

Is there another way to tell Logstash to supply a value to an output variable without pulling it from a Logstash input? For example, in my case I'd like to create an Elasticsearch index based on a performance run ID (which I'd do from an external script) and then have Logstash send to that. For now I was thinking of creating a tcp input just for receiving perf run info and then have a filter to match on the run id. Seems like a convoluted way to do this though. For example:
input {
  tcp {
    type => "perfinfo"
    port => 8888
  }
}
filter {
  if [type] == "perfinfo" {
    # do some matching to extract the id
  }
}
output {
  elasticsearch {
    cluster => "mycluster"
    manage_template => false
    index => "%{id}-perftest"
  }
}
I'm not sure if setting manage_template to false would actually be necessary. I've read that it is.
Update
Thanks Nirdesh for that. Using Ruby might be very handy.
While I was waiting I tried using a grok filter like so:
grok {
  match => { "message" => "%{WORD:perftype}-%{POSINT:perfid}" }
}
Which produced this stdout during debugging:
{
       "message" => "awperf-14",
      "@version" => "1",
    "@timestamp" => "2014-10-17T20:01:19.758Z",
          "host" => "0:0:0:0:0:0:0:1:33361",
          "type" => "perfinfo",
      "perftype" => "awperf",
        "perfid" => "14"
}
I then tried creating an index based on this, like so:
index => "%{perftype}-%{perfid}"
So when I passed 'awperf-14' to the input, I ended up creating these indexes
%{perftype}-%{perfid}
awperf-14
Which is not what I was expecting. Also, it's the %{perftype}-%{perfid} index that starts to be populated, not awperf-14, the one I actually wanted.
Yes.
You can add any number of your own fields, either for intermediate results or permanently, using an option called add_field. Almost all filters in Logstash support this option.
So, for your solution, you can use a ruby filter to work out the id dynamically and store it in a new field called id, which you can then use in the output.
For example:
input {
  tcp {
    type => "perfinfo"
    port => 8888
  }
}
filter {
  if [type] == "perfinfo" {
    ruby {
      # do some processing (the ruby filter's code option goes here)
      add_field => { "id" => "Some value" }
    }
  }
}
output {
  elasticsearch {
    cluster => "mycluster"
    manage_template => false
    index => "%{id}-perftest"
  }
}
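As a concrete sketch of that idea (the "run-42" value and the old event['field'] Ruby API of Logstash 1.x/2.x are assumptions, and the actual lookup is left to you), with a guard so events that never get an id don't create a literal %{id}-perftest index (unresolved %{...} references are kept verbatim in index names):
filter {
  if [type] == "perfinfo" {
    ruby {
      # hypothetical: replace with your real lookup logic
      code => "event['id'] = 'run-42'"
    }
  }
}
output {
  if [id] {
    elasticsearch {
      cluster => "mycluster"
      manage_template => false
      index => "%{id}-perftest"
    }
  }
}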
I'm not sure I can do what I was trying to do via Logstash. To be clearer, I simply wanted to change the index based on the performance run ID I'm executing. There's nothing in the data that would carry this information (I have to pull it from a DB). So instead of trying to have Logstash listen for a performance run ID, I scripted this externally. The script uses the Elasticsearch API to create a new index and then does a string replace on the index name in the Logstash config file. It then restarts Logstash, which normally happens between performance runs anyway. This approach was much easier to do, and seems cleaner.
