I am using Logstash with an elasticsearch input and an elasticsearch output, each pointing at a different Elasticsearch instance.
Before the data reaches the output block, I want to aggregate some documents, create a hash of the new document, and insert the resulting nested document into Elasticsearch.
So basically I want to do some processing before the nested document is inserted into Elasticsearch. Is this possible?
input {
  # something here to get a value of a variable stored in a different file
  elasticsearch {
    hosts => "abc.de.fg.hi:jklm"
    query => '{--some query---}'
  }
}
output {
  elasticsearch {
    hosts => "xxx.xx.xx.xx:yyyy"
  }
}
I'm using the "aggregate" plugin.
In my case the input is from UDP and I filter it with "grok", but I believe you can achieve what you want by tweaking the code a bit.
Without a sample of what you are trying to achieve exactly, the best I can do is show you a sample of my code:
aggregate {
  task_id => "%{action}_%{progress}"
  code => "
    map['avg'] ||= 0;
    map['avg'] += event.get('elapsed');
    map['my_count'] ||= 0;
    map['my_count'] += 1;
    if map['my_count'] == ${LogstashAggregationCount} # environment variable, substituted by Logstash at startup
      event.set('elapsedAvg', (map['avg'] / map['my_count']))
      event.set('Aggregation', true)
      map['avg'] = 0
      map['my_count'] = 0
    end
  "
}
if ![Aggregation] {
  drop {}
}
Of course you need to adapt it to your specific case. For a more in-depth explanation of my code, read here: How to Use Logstash Aggregations
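On the hashing part of the original question: inside an aggregate (or plain ruby filter) code block you can compute a digest with Ruby's standard library. A minimal sketch, where doc_hash is a made-up target field and the source fields are just the ones from the example above:
require 'digest'
# Hypothetical: derive a stable hash of the aggregated values.
payload = "#{event.get('action')}|#{event.get('elapsedAvg')}"
event.set('doc_hash', Digest::SHA256.hexdigest(payload))
Logstash also ships a fingerprint filter that does this kind of hashing declaratively, if you prefer configuration over Ruby code.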
I have the following event or row from a JDBC input:
{"academic_session_id" : "as=1|dur=2015-16,as=2|dur=2016-17",
"branch_id" : 1}
I want to convert or format it into the following using Logstash filters:
{"branch_id": 1,"sessions":[{"as":"1","dur":"2015-16"},{"as":"2","dur":"2016-17"}]}
If you can suggest any alternative to Logstash, that would also help.
Note: I am using Elasticsearch 5.x.
Since this is a pretty customized manipulation of the data, I would use the ruby filter, and just write a script using the code setting to parse the data. Something like this would work:
filter {
  ruby {
    code => "
      # split 'as=1|dur=2015-16,as=2|dur=2016-17' into [['as=1', 'dur=2015-16'], ['as=2', 'dur=2016-17']]
      academic_session = event.get('academic_session_id').split(',').map{ |data| data.split('|') }
      # turn each list of 'key=value' strings into a hash
      sessions = academic_session.map do |arr|
        temp_hash = {}
        arr.each do |kv|
          k, v = kv.split('=')
          temp_hash[k] = v
        end
        temp_hash
      end
      event.set('sessions', sessions)
    "
    remove_field => ['academic_session_id']
  }
}
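To see the transformation in isolation, here is the same parsing logic as a standalone Ruby script, using the sample value from the question:
academic_session_id = 'as=1|dur=2015-16,as=2|dur=2016-17'

sessions = academic_session_id.split(',').map do |session|
  session.split('|').each_with_object({}) do |kv, hash|
    k, v = kv.split('=')
    hash[k] = v
  end
end

p sessions
# => [{"as"=>"1", "dur"=>"2015-16"}, {"as"=>"2", "dur"=>"2016-17"}]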
I'm facing a problem with my Logstash configuration, which you can find below.
A ruby filter removes every dot "." from my field names. Everything seems to work fine and the filtered data is correct, but Elasticsearch still responds with: "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"Field name [/ConsumerAdminWebService/getConsumerTransactions.call] cannot contain '.'"}, where getConsumerTransactions.call is one of my field keys.
input {
  http_poller {
    urls => {
      uatBackend1 => {
        method => get
        url => "http://some-url/"
        headers => {
          Accept => "application/json"
        }
      }
    }
    request_timeout => 60
    # Run once a minute
    schedule => { cron => "* * * * * UTC" }
    codec => "json"
    metadata_target => "http_poller_metadata"
  }
}
filter {
  ruby {
    init => "
      def remove_dots hash
        new = Hash.new
        hash.each { |k, v|
          if v.is_a? Hash
            v = remove_dots(v)
          end
          new[ k.gsub('.', '_') ] = v
          if v.is_a? Array
            v.each { |elem|
              if elem.is_a? Hash
                elem = remove_dots(elem)
              end
              new[ k.gsub('.', '_') ] = elem
            } unless v.nil?
          end
        } unless hash.nil?
        return new
      end
    "
    code => "
      event.instance_variable_set(:@data, remove_dots(event.to_hash))
    "
  }
}
output {
  elasticsearch {
    hosts => "localhost"
  }
}
I'm afraid this line of code is not correct: event.instance_variable_set(:@data, remove_dots(event.to_hash)) - the cleaned data is somehow pinned to the event, but the original data persists unchanged and is delivered to the Elasticsearch API.
I suppose some clarifications are required here:
I use an ES version > 2.0, so dots are not allowed.
The ruby filter should replace dots with "_", and it works great - the resulting data is fully correct. However, ES replies with the mentioned error. I suspect that the filter does not replace the event data but simply adds a new field to the Event object; ES then still reads the original data, not the updated one.
To be honest, Ruby is magic to me :)
If you're using ES version 2.x, it could be a version issue, since ES 2.x doesn't accept field names which contain dots.
According to this response in this thread:
Field names cannot contain the . character in Elasticsearch 2.0.
As a workaround you might have to mutate (rename) your field names to use _ or - instead of the . dot. This ticket pretty much explains the issue; dots are allowed again in ES versions after the 2.x line. Hope it helps!
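As for why the ruby filter in the question has no visible effect: writing to the event's :@data instance variable does not update a Logstash 5.x event; changes have to go through event.set / event.remove. A minimal sketch of that approach, reusing the remove_dots helper from the question's init block:
code => "
  # event.to_hash returns a copy, so the event can be mutated while iterating
  cleaned = remove_dots(event.to_hash)
  event.to_hash.each_key { |k| event.remove(k) unless k.start_with?('@') }
  cleaned.each { |k, v| event.set(k, v) }
"
There is also a dedicated de_dot filter plugin that handles exactly this renaming, at the cost of installing an extra plugin.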
How do I create a bucket aggregation based on a field value, and then run a query that returns hits for each bucket (and not for each document), with filters?
I'm new to Elasticsearch and would appreciate any help!
I'm working my way through this currently. So far I've figured out how to aggregate on a field, but I haven't been able to apply filters yet. Please let me know if you've made progress, as I see this was posted a while ago...
# connect to elastic search
elastic::connect('connection_string',
es_port = 9200)
# define aggregation
aggs <- list(
aggs = list(
field_name = list(
terms = list(
field = "field_name"
)
)
)
)
# search
Search(index = 'index_name',
body = aggs,
asdf = T)
I managed to apply a filter for dates as follows:
last_week <- query('{
"range" : {
"your_date_field" : {
"gte" : "mondays-date",
"lt" : "sundays-date"
}
}
}')
I used this as the main query. I think you can apply the aggregation to it with the %search% (last_week + agg) notation, and it should work.
I am using MVC3 and have built a search facility in my controller, using the model-first approach. I want to allow the user to search for results that contain the given keyword(s) in the data.
If there are no matches to the search term, display an appropriate message.
If there are matching stories:
Display a message like "7 items match your search criteria: 'XXXXX'"
Any help would be much appreciated, thanks.
Would it be something like this, but with use of the ViewBag to display a message?
if (!String.IsNullOrEmpty(SearchString))
{
    News = News.Where(s => s.Headline.Count(SearchString));
}
You need to use string.Contains for partial string matching:
var matchingResults = News.Where(s => s.Headline.Contains(searchString));
int count = matchingResults.Count();
if (count == 0)
{
    // no matches: show an appropriate message
    ViewBag.Message = "No items match your search criteria: '" + searchString + "'";
}
else
{
    // e.g. "7 items match your search criteria: 'XXXXX'"
    ViewBag.Message = count + " items match your search criteria: '" + searchString + "'";
}
I have a bunch of posts which have category tags in them.
I am trying to find out how many times each category has been used.
I'm using rails with mongodb, BUT I don't think I need to be getting the occurrence of categories from the db, so the mongo part shouldn't matter.
This is what I have so far:
@recent_posts = current_user.recent_posts # returns the 10 most recent posts
@categories_hash = {'tech' => 0, 'world' => 0, 'entertainment' => 0, 'sports' => 0}
@recent_posts.each do |cat|
  cat.categories.each do |addCat|
    @categories_hash.increment(addCat) # obviously this is where I'm having problems
  end
end
The structure of a post is:
{"_id" : ObjectId("idnumber"), "created_at" : "Tue Aug 03...", "categories" :["world", "sports"], "message" : "the text of the post", "poster_id" : ObjectId("idOfUserPoster"), "voters" : []}
I'm open to suggestions on how else to get the count of categories, but I will want to get the count of voters eventually too, so it seems to me the best way is to increment categories_hash and then add voters.length. One thing at a time, though; I'm just trying to figure out how to increment values in the hash.
If you aren't familiar with map/reduce and you don't care about scaling up, this is not as elegant as map/reduce, but it should be sufficient for small sites:
# Hash.new(0) makes 0 the default value, so += 1 works even for unseen keys
@categories_hash = Hash.new(0)
current_user.recent_posts.each do |post|
  post.categories.each do |category|
    @categories_hash[category] += 1
  end
end
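Since you mention wanting the voters count eventually, the same loop can carry it along; a sketch, assuming each post exposes the voters array from your sample document:
@categories_hash = Hash.new(0)
@voter_count = 0
current_user.recent_posts.each do |post|
  post.categories.each { |category| @categories_hash[category] += 1 }
  @voter_count += post.voters.length # voters is an array on each post
end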
If you're using MongoDB, an elegant way to aggregate tag usage is a map/reduce operation. MongoDB supports map/reduce operations written in JavaScript. Map/reduce runs on the db server(s), i.e. your application does not have to retrieve and analyze every document (which wouldn't scale well for large collections).
As an example, here are the map and reduce functions I use in my blog on the articles collection to aggregate the usage of tags (which is used to build the tag cloud in the sidebar). Documents in the articles collection have a key named 'tags' which holds an array of strings (the tags).
The map function simply emits 1 on every used tag to count it:
function () {
if (this.tags) {
this.tags.forEach(function (tag) {
emit(tag, 1);
});
}
}
The reduce function sums up the counts:
function (key, values) {
var total = 0;
values.forEach(function (v) {
total += v;
});
return total;
}
As a result, the database returns a hash that has a key for every tag and its usage count as a value. E.g.:
{ 'rails' => 5, 'ruby' => 12, 'linux' => 3 }
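For completeness, here is roughly how those two functions could be invoked from Ruby; a sketch, assuming the mongo 2.x driver (the connection details and the articles collection name are placeholders):
require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'blog')

map_js = "function () {
  if (this.tags) {
    this.tags.forEach(function (tag) { emit(tag, 1); });
  }
}"

reduce_js = "function (key, values) {
  var total = 0;
  values.forEach(function (v) { total += v; });
  return total;
}"

# map_reduce runs on the server and yields { '_id' => tag, 'value' => count } docs
tag_counts = {}
client[:articles].find.map_reduce(map_js, reduce_js).each do |doc|
  tag_counts[doc['_id']] = doc['value'].to_i
end
# tag_counts => e.g. { 'rails' => 5, 'ruby' => 12, 'linux' => 3 }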