How to use Logstash filter to convert into nested object for elasticsearch output? - elasticsearch

I have the following event/row coming from a JDBC input:
{"academic_session_id" : "as=1|dur=2015-16,as=2|dur=2016-17",
 "branch_id" : 1}
I want to convert/format it into the following using Logstash filters:
{"branch_id": 1, "sessions": [{"as":"1","dur":"2015-16"},{"as":"2","dur":"2016-17"}]}
Suggestions for an alternative to Logstash are also welcome.
Note: I am using Elasticsearch 5.x.

Since this is a pretty customized manipulation of the data, I would use the ruby filter, and just write a script using the code setting to parse the data. Something like this would work:
filter {
  ruby {
    code => "
      academic_session = event.get('academic_session_id').split(',').map { |data| data.split('|') }
      sessions = academic_session.map do |arr|
        temp_hash = {}
        arr.each do |kv|
          k, v = kv.split('=')
          temp_hash[k] = v
        end
        temp_hash
      end
      event.set('sessions', sessions)
    "
    remove_field => ['academic_session_id']
  }
}
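If you want to sanity-check the parsing logic outside Logstash first, the same transformation can be run as plain Ruby. A minimal sketch using the sample value from the question (to_h on an array of pairs needs Ruby 2.1+):
# Standalone check of the parsing logic, outside Logstash (illustrative only)
academic_session_id = "as=1|dur=2015-16,as=2|dur=2016-17"
sessions = academic_session_id.split(',').map { |data|
  data.split('|').map { |kv| kv.split('=') }.to_h
}
p sessions
# => [{"as"=>"1", "dur"=>"2015-16"}, {"as"=>"2", "dur"=>"2016-17"}]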

Related

Converting epoch time to date in logstash using ruby filter

I have a field named "timestamp" in my configuration. It holds an array of epoch times (in milliseconds). I want to use a Ruby filter to convert each epoch time in the array into a date format consumable by Kibana, storing the converted values in a new field as an array. I am getting syntax errors. Can anyone help me out? I am new to Ruby.
ruby {
code => {'
event.get("timestamp").each do |x| {
event["timestamp1"] = Time.at(x)
}
'}
}
I don't know much about Logstash, but the Ruby code you include within the quotes is invalid. Try this:
ruby {
  code => '
    event.get("timestamp").each { |x| event.set("timestamp1", Time.at(x)) }
  '
}
If you intend your timestamp key to increment, then you need to include an index:
ruby {
  code => '
    event.get("timestamp").each_with_index { |x, i| event.set("timestamp#{i}", Time.at(x)) }
  '
}
This will take a timestamp array with values in milliseconds since the epoch and create a new field with the parsed times. This code is part of a ruby filter. Note: this does not convert into a Date field format.
code => '
  timestamps = Array.new
  event.get("timestamp").each_with_index { |x, i|
    timestamps.push(Time.at(x.to_i / 1000))
  }
  event.set("timestamp1", timestamps)
'
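If an actual date-formatted value is needed (e.g. ISO 8601 strings that Elasticsearch can map as a date), one option is to format each converted time before storing it. A minimal sketch of that variant, assuming the same millisecond-epoch input field (untested):
code => '
  require "time"   # for Time#iso8601
  timestamps = event.get("timestamp").map { |x|
    Time.at(x.to_i / 1000.0).utc.iso8601(3)   # e.g. "2017-01-05T12:34:56.789Z"
  }
  event.set("timestamp1", timestamps)
'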

Aggregation in Logstash-ElasticSearch

I am using Logstash with an elasticsearch input and an elasticsearch output. The two Elasticsearch instances are different.
Before the data reaches the output block, I want to aggregate some documents, build a hash for the new document, and insert the nested document into Elasticsearch.
So basically I want to do some processing before the nested document is inserted into Elasticsearch. Is this possible?
input {
  # something here to get a value of a variable stored in a different file
  elasticsearch {
    hosts => "abc.de.fg.hi:jklm"
    query => '{--some query---}'
  }
}
output {
  elasticsearch {
    hosts => "xxx.xx.xx.xx:yyyy"
  }
}
I'm using the "aggregate" plug in.
In my case the input is From UDP and i filter it with "grok" but i believe you can achieve what you want to do by tweaking the code a bit.
Without a sample of you are trying to achieve exactly, the best this i can do is show you a sample of my code:
aggregate {
  task_id => "%{action}_%{progress}"
  code => "
    map['avg'] ||= 0;
    map['avg'] += event.get('elapsed');
    map['my_count'] ||= 0;
    map['my_count'] += 1;
    if (map['my_count'] == ${LogstashAggregationCount}) # environment variable substituted by Logstash
      event.set('elapsedAvg', (map['avg'] / map['my_count']))
      event.set('Aggregetion', true)
      map['avg'] = 0
      map['my_count'] = 0
    end
  "
}
if (![Aggregetion]) {
  drop {}
}
Of course you need to adapt it to your specific case. For a more in-depth explanation of my code, read here: How to Use Logstash Aggregations
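One practical caveat when adopting the aggregate filter: it keeps its maps in memory per worker thread, so its documentation calls for running the pipeline with a single worker so that all related events pass through the same aggregation map, e.g.:
# logstash.yml (or pass -w 1 on the command line)
pipeline.workers: 1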

Logstash to elasticsearch. Keys with dots

I'm facing a problem with my Logstash configuration (below).
A Ruby filter removes every dot "." from my field names. Everything seems to work fine - the result of the data filtration is correct - but Elasticsearch responds with: "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"Field name [/ConsumerAdminWebService/getConsumerTransactions.call] cannot contain '.'"} where getConsumerTransactions.call is one of my field keys.
input {
  http_poller {
    urls => {
      uatBackend1 => {
        method => get
        url => "http://some-url/"
        headers => {
          Accept => "application/json"
        }
      }
    }
    request_timeout => 60
    # Run every 30 seconds
    schedule => { cron => "* * * * * UTC" }
    codec => "json"
    metadata_target => "http_poller_metadata"
  }
}
filter {
  ruby {
    init => "
      def remove_dots hash
        new = Hash.new
        hash.each { |k, v|
          if v.is_a? Hash
            v = remove_dots(v)
          end
          new[ k.gsub('.', '_') ] = v
          if v.is_a? Array
            v.each { |elem|
              if elem.is_a? Hash
                elem = remove_dots(elem)
              end
              new[ k.gsub('.', '_') ] = elem
            } unless v.nil?
          end
        } unless hash.nil?
        return new
      end
    "
    code => "
      event.instance_variable_set(:@data, remove_dots(event.to_hash))
    "
  }
}
output {
  elasticsearch {
    hosts => "localhost"
  }
}
I'm afraid this line of code is not correct: event.instance_variable_set(:@data, remove_dots(event.to_hash)) - the result data is somehow pinned to the event, but the original data persists unchanged and is delivered to the Elasticsearch API.
Some clarifications are probably needed here:
I use an ES version > 2.0, so dots are not allowed.
The Ruby filter should replace dots with "_", and it works great - the resulting data is fully correct - yet ES replies with the mentioned error. I suspect the filter does not replace the event data but simply adds a new field to the Event object; ES then still reads the original data, not the updated one.
To be honest, Ruby is magic to me :)
If you're using ES version 2.0, it could be a version issue where ES doesn't accept fields that contain dots.
According to this response in this thread:
Field names cannot contain the . character in Elasticsearch 2.0.
As a workaround you might have to mutate (rename) your field names to use _ or - instead of the . dot. This ticket pretty much explains the issue; dots can be used again in ES versions after the 2.x line (e.g. 5.x). Hope it helps!
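As a side note on the Ruby filter itself: instance_variable_set does not replace the event's data in recent Logstash versions, which is why the renamed fields never reach Elasticsearch. Writing the cleaned keys back through the public event API avoids that. A minimal sketch (untested, handling top-level fields only - nested keys would still need the recursive approach from the question); there is also a dedicated de_dot filter plugin that performs this kind of renaming:
filter {
  ruby {
    code => "
      # copy each dotted top-level field to an underscored name, then drop the original
      event.to_hash.each do |k, v|
        next unless k.include?('.')
        event.set(k.gsub('.', '_'), v)
        event.remove(k)
      end
    "
  }
}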

How to build this MongoDB query (MongoMapper)

I have documents with a name (String) and a version (Integer).
What I need is a list of documents containing the highest version per name.
So I think I need the equivalent of a GROUP BY in SQL, and then a HAVING for the max version per name.
I have no idea where to start to do this with MongoDB. If anyone could write this query for the mongo shell that would be a great start, but an added bonus would be the syntax for MongoMapper specifically.
If you are on MongoDB version 2.2+, you can do your query using the aggregation framework with the $group pipeline operator.
The documentation is here: http://docs.mongodb.org/manual/reference/aggregation/
MongoMapper doesn't have a helper for the aggregation framework but you can use the Ruby driver directly (driver version 1.7.0+ has an aggregate helper method). You would have to get an instance of Mongo::Collection and call the aggregate method on it. For example:
Model.collection.aggregate([
  {"$group" => {
    "_id" => "$name",
    "max_version" => {"$max" => "$version"}
  }}
])
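With documents like {name: "a", version: 2}, each result from this pipeline has the shape {"_id" => "a", "max_version" => 2}, one entry per distinct name.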
I hope that helps!
If you want to do a GROUP BY with MongoDB, check the Aggregation Framework - it is the exact tool for the job!
Here you'll find the equivalents in the Aggregation Framework for GROUP BY, HAVING, and more.
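For instance, the HAVING part (only keeping names whose maximum version exceeds some threshold) maps to a $match stage placed after the $group stage. A hedged sketch with the Ruby driver, reusing the field names from the question (the threshold of 1 is purely illustrative):
Model.collection.aggregate([
  {"$group" => {"_id" => "$name", "max_version" => {"$max" => "$version"}}},
  {"$match" => {"max_version" => {"$gt" => 1}}}  # HAVING max(version) > 1
])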
Thanks to @Simon I had a look at Map Reduce with MongoMapper. My take on it is probably not perfect, but it does what I want. Here's the implementation:
class ChildTemplate
  ...
  key :name, String
  key :version, Integer, :default => 1
  ...

  private

  def self.map
    <<-JS
      function() {
        emit(this.name, this);
      }
    JS
  end

  private

  def self.reduce
    <<-JS
      function(key, values) {
        var res = values[0];
        for (var i = 1; i < values.length; i++) {
          if (values[i].version > res.version) {
            res = values[i];
          }
        }
        return res;
      }
    JS
  end

  def self.latest_versions(opts = {})
    results = []
    opts[:out] = "ct_latest_versions"
    ChildTemplate.collection.map_reduce(map, reduce, opts).find().each do |map_hash|
      results << map_hash["value"]
    end
    return results
  end
end
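Usage then looks roughly like this (hypothetical call, assuming documents with name and version fields as above):
latest = ChildTemplate.latest_versions
latest.each { |doc| puts "#{doc['name']} -> v#{doc['version']}" }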

MongoDB and MongoRuby: Sorting on mapreduce

I am currently trying to do a simple mapreduce over some documents stored in MongoDB. I use
map = BSON::Code.new "function() { emit(this.userid, 1); }"
for the mapping and
reduce = BSON::Code.new "function(key, values) {
var sum = 0;
values.forEach(function(value) {
sum += value;
});
return sum;
}"
for the reduction. This works fine when I call map_reduce the following way:
output = col.map_reduce(map, reduce, # col is the collection in mongodb, e.g. db.users
{
:out => {:inline => true},
:raw => true
}
)
Now to the real question: how can I use the above call to map_reduce to enable sorting? The manual says that I must use sort with an array of [key, direction] pairs. I guessed the following should work, but it doesn't:
output = col.map_reduce(map, reduce,
{
:sort => [["value", Mongo::ASCENDING]],
:out => {:inline => true},
:raw => true
}
)
Do I have to choose another datatype? The option also doesn't work (same error) when using an empty [], although the manual says that is the default for the option. Unfortunately the error message from MongoDB doesn't help much:
/usr/lib/ruby/gems/1.9.1/gems/mongo-1.3.1/lib/mongo/db.rb:506:in `command': Database command 'mapreduce' failed: {"assertion"=>"sort has to be blank or an Object", "assertionCode"=>13609, "errmsg"=>"db assertion failure", "ok"=>0.0} (Mongo::OperationFailure)
from /usr/lib/ruby/gems/1.9.1/gems/mongo-1.3.1/lib/mongo/collection.rb:576:in `map_reduce'
from ./mapreduce.rb:26:in `<main>'
If you need the full runnable code, please say so in the comments. I exclude it for now as it only contains the initialization of a connection to mongodb and initialization of the collection col by querying a database.
Use a BSON::OrderedHash for the :sort option and it will work:
sort_spec = BSON::OrderedHash.new
sort_spec["value"] = Mongo::ASCENDING

output = col.map_reduce(map, reduce,
  {
    :sort => sort_spec,
    :out => {:inline => true},
    :raw => true
  }
)
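(The mapreduce command's assertion says the sort has to be blank or an Object, i.e. a BSON document; an OrderedHash serializes as such while preserving key order, whereas the [key, direction] array form apparently isn't converted for map_reduce by this driver version.)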
