Way to populate Logstash output variable without getting it from an Input? - elasticsearch

Is there another way to tell Logstash to supply a value to an output variable without pulling it from a Logstash input? For example, in my case I'd like to create an Elasticsearch index based on a performance run ID (which I'd do from an external script) and then have Logstash send to that index. For now I was thinking of creating a tcp input just for receiving perf run info and then having a filter match on the run ID. That seems like a convoluted way to do it, though. For example:
input {
  tcp {
    type => "perfinfo"
    port => 8888
  }
}
filter {
  if [type] == "perfinfo" {
    # do some matching to extract the id
  }
}
output {
  elasticsearch {
    cluster => "mycluster"
    manage_template => false
    index => "%{id}-perftest"
  }
}
I'm not sure if setting manage_template to false is actually necessary; I've read that it is.
Update
Thanks Nirdesh for that. Using Ruby might be very handy.
While I was waiting I tried using a grok filter like so:
grok {
  match => { "message" => "%{WORD:perftype}-%{POSINT:perfid}" }
}
Which produced this stdout during debugging:
{
       "message" => "awperf-14",
      "@version" => "1",
    "@timestamp" => "2014-10-17T20:01:19.758Z",
          "host" => "0:0:0:0:0:0:0:1:33361",
          "type" => "perfinfo",
      "perftype" => "awperf",
        "perfid" => "14"
}
I then tried creating an index based on these fields like so:
index => "%{perftype}-%{perfid}"
So when I passed 'awperf-14' to the input, I ended up creating these indexes:
%{perftype}-%{perfid}
awperf-14
Which is not what I was expecting. Also, it's the %{perftype}-%{perfid} index that starts to be populated, not awperf-14, the one I actually wanted.
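Presumably the literal %{perftype}-%{perfid} index is created by events that never matched the grok pattern, so the field references are left unresolved. I imagine I could wrap the grok in a filter block and route only events that actually carry perfid to the run-specific index, something like this sketch (field names taken from the output above):
filter {
  if [type] == "perfinfo" {
    grok {
      match => { "message" => "%{WORD:perftype}-%{POSINT:perfid}" }
    }
  }
}
output {
  if [perfid] {
    elasticsearch {
      cluster => "mycluster"
      manage_template => false
      index => "%{perftype}-%{perfid}"
    }
  }
}
But even then, only the perfinfo events themselves would carry the run ID, not the rest of my data.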

Yes.
You can add any number of your own variables, either for intermediate results or for permanent use, with a property called add_field. Almost all filters in Logstash support this property.
So, for your solution, you can use a ruby script to find out the id dynamically and store it in a new field called id, which you can then use in the output.
For example:
input {
  tcp {
    type => "perfinfo"
    port => 8888
  }
}
filter {
  if [type] == "perfinfo" {
    ruby {
      # do some processing
      add_field => { "id" => "Some value" }
    }
  }
}
output {
  elasticsearch {
    cluster => "mycluster"
    manage_template => false
    index => "%{id}-perftest"
  }
}
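For instance, the actual processing would go in the ruby filter's code option. A hypothetical sketch, assuming the run id is the numeric part of the message, and using the event['...'] API of the Logstash 1.x era (newer versions use event.get/event.set instead):
filter {
  if [type] == "perfinfo" {
    ruby {
      # hypothetical: take the digits from the message as the run id
      code => "m = event['message']; event['id'] = m[/[0-9]+/] if m"
    }
  }
}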

I'm not sure I can do what I was trying to do via Logstash. To be clearer, I simply wanted to change the index based on the performance run ID I'm executing. There's nothing in the data that would have this information (I have to pull it from a DB). So instead of trying to have Logstash listen for a performance run ID, I scripted this externally. The script uses the Elasticsearch API to create a new index, and then does a string replace for the index in the Logstash config file. It then restarts Logstash, which normally happens between performance runs anyway. This approach was much easier to do, and seems cleaner.

Related

Logstash If configuration dropping events

I am using the ELK stack to save custom events. The events I am pushing may or may not contain a field called feed.name.
I am using this field to dynamically set the index, so if it doesn't exist I want to set it to unknown before sending it to Elastic.
Here is the full config I have:
input {
  http {
    host => "XXXXXXXXX"
    port => xxxx
    codec => "json"
  }
}
filter {
  if ![feed.name] {
    mutate { add_field => { "feed.name" => "unknown" } }
  }
  if [source.asn] {
    mutate { convert => { "source.asn" => "string" } }
  }
  if [destination.asn] {
    mutate { convert => { "destination.asn" => "string" } }
  }
}
output {
  elasticsearch {
    hosts => ["xxxxxxxxx:XXXX"]
    index => "l-%{feed.name}-%{+YYYY.MM.dd}"
  }
}
Here is my problem: When there is no feed.name set, Logstash sets it correctly and everything is fine. However, if the field exists, the event seems to be dropped.
So two questions arise here: how is this behaviour explained? And how can I make it work (or is there a workaround)?

Can't access Elasticsearch index name metadata in Logstash filter

I want to add the Elasticsearch index name as a field in the event when processing in Logstash. This is supposed to be pretty straightforward, but the index name does not get printed out. Here is the complete Logstash config.
input {
  elasticsearch {
    hosts => "elasticsearch.example.com"
    index => "*-logs"
  }
}
filter {
  mutate {
    add_field => {
      "log_source" => "%{[@metadata][_index]}"
    }
  }
}
output {
  elasticsearch {
    index => "logstash-%{+YYYY.MM}"
  }
}
This will result in log_source being set to %{[@metadata][_index]} and not the actual name of the index. I have tried this with _id and without the underscores, but it will always just output the reference and not the value.
Doing just %{[@metadata]} crashes Logstash with an error that it's trying to access the list incorrectly, so [@metadata] is being set, but it seems like index or any other values are missing.
Does anyone have a another way of assigning the index name to the event?
I am using 5.0.1 of both Logstash and Elasticsearch.
You're almost there, you're simply missing the docinfo setting, which is false by default:
input {
  elasticsearch {
    hosts => "elasticsearch.example.com"
    index => "*-logs"
    docinfo => true
  }
}
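With docinfo enabled, the source document's index, type and id are stored under [@metadata], so the %{[@metadata][_index]} reference in your mutate filter should then resolve to the real index name.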

ElasticSearch assign own IDs while indexing with LogStash

I am indexing a large corpus of information and I have a string key that I know is unique. I would like to avoid using search and instead access documents by this artificial identifier.
Since the Path directive is discontinued in ES 1.5, does anyone know a workaround for this problem?
My data look like:
{unique-string},val1, val2, val3...
{unique-string2},val4, val5, val6...
I am using logstash to index the files and would prefer to fetch the documents through a direct get, rather than through an exact-match.
In your elasticsearch output plugin, just specify the document_id setting with a reference to the field you want to use as id, i.e. the one named 1 in your csv filter.
input {
  file {...}
}
filter {
  csv {
    columns => ["1","2","3"]
    separator => ","
  }
}
output {
  elasticsearch {
    action => "index"
    host => "localhost"
    port => "9200"
    index => "index-name"
    document_id => "%{1}"        # <--- add this line
    workers => 2
    cluster => "elasticsearch-cluster"
    protocol => "http"
  }
}
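With that in place, each document's _id in Elasticsearch is the unique string from your first column, so you can fetch it directly with a GET by id rather than an exact-match search.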

Logstash agent not indexing anymore

I have a Logstash instance running as a service that reads from Redis and outputs to Elasticsearch. I just noticed there was nothing new in Elasticsearch for the last few days, but the Redis lists were increasing.
The Logstash log was filled with 2 errors repeated over thousands of lines:
:message=>"Got error to send bulk of actions"
:message=>"Failed to flush outgoing items"
The reason being:
{"error":"IllegalArgumentException[Malformed action/metadata line [107], expected a simple value for field [_type] but found [START_ARRAY]]","status":500},
Additionally, trying to stop the service failed repeatedly, so I had to kill it. Restarting it emptied the Redis lists and imported everything into Elasticsearch. It seems to work OK now.
But I have no idea how to prevent that from happening again. The mentioned type field is set as a string for each input directive, so I don't understand how it could have become an array.
What am I missing?
I'm using Elasticsearch 1.7.1 and Logstash 1.5.3. The logstash.conf file looks like this:
input {
  redis {
    host => "127.0.0.1"
    port => 6381
    data_type => "list"
    key => "b2c-web"
    type => "b2c-web"
    codec => "json"
  }
  redis {
    host => "127.0.0.1"
    port => 6381
    data_type => "list"
    key => "b2c-web-staging"
    type => "b2c-web-staging"
    codec => "json"
  }
  # other redis inputs, only key/type variations
}
filter {
  grok {
    match => ["msg", "Cache hit %{WORD:query} in %{NUMBER:hit_total:int}ms. Network: %{NUMBER:hit_network:int} ms. Deserialization %{NUMBER:hit_deserial:int}"]
    add_tag => ["cache_hit"]
    tag_on_failure => []
  }
  # other groks, not related to the type field
}
output {
  elasticsearch {
    host => "[IP]"
    port => "9200"
    protocol => "http"
    cluster => "logstash-prod-2"
  }
}
According to your log message:
{"error":"IllegalArgumentException[Malformed action/metadata line [107], expected a simple value for field [_type] but found [START_ARRAY]]","status":500},
It seems you're trying to index a document with a type field that's an array instead of a string.
I can't help you without more of the logstash.conf file.
But check the following to make sure:
When you use add_field to change the type, you actually turn type into an array with multiple values, which is what Elasticsearch is complaining about.
You can use the mutate filter's join option to convert arrays to strings:
filter {
  mutate {
    join => { "fieldname" => "," }
  }
}
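If it really is type that ends up as an array, the same idea applied directly to that field would be (a sketch, assuming the array came from an add_field on the existing type):
filter {
  mutate {
    join => { "type" => "," }
  }
}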

Copy ElasticSearch-Index with Logstash

I have a ready-built Apache index on one machine that I would like to clone to another machine using Logstash. Fairly easy, I thought:
input {
  elasticsearch {
    host => "xxx.xxx.xxx.xxx"
    index => "logs"
  }
}
filter {
}
output {
  elasticsearch {
    cluster => "Loa"
    host => "127.0.0.1"
    protocol => http
    index => "logs"
    index_type => "apache_access"
  }
}
That pulls over the docs, but it doesn't stop, as it uses the default query "*" (the original index has ~50,000 docs and I killed the former script when the new index was over 600,000 docs and rising).
Next I tried to make sure the docs would get updated instead of duplicated, but this commit hasn't made it yet, so I don't have a primary..
Then I remembered sincedb, but I don't seem to be able to use that in the query (or is that possible?).
Any advice? Maybe a completely different approach? Thanks a lot!
Assuming that the elasticsearch input creates a Logstash event with the document id (I assume it will be _id or something similar), try setting the elasticsearch output the following way:
output {
  elasticsearch {
    cluster => "Loa"
    host => "127.0.0.1"
    protocol => http
    index => "logs"
    index_type => "apache_access"
    document_id => "%{_id}"
  }
}
That way, even if the elasticsearch input, for whatever reason, continues to push the same documents indefinitely, Elasticsearch will merely update the existing documents instead of creating new documents with new ids.
Once you reach 50,000, you can stop.
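Note that, depending on your Logstash version, the source _id may only be exposed when the elasticsearch input has docinfo enabled, in which case it lives under [@metadata] rather than as a plain field. A sketch of that variant, keeping the rest of the settings from above:
input {
  elasticsearch {
    host => "xxx.xxx.xxx.xxx"
    index => "logs"
    docinfo => true
  }
}
output {
  elasticsearch {
    cluster => "Loa"
    host => "127.0.0.1"
    protocol => http
    index => "logs"
    index_type => "apache_access"
    document_id => "%{[@metadata][_id]}"
  }
}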