I have some documents with a name (String) and a version (Integer).
What I need is a list of the documents with the highest version per name.
So I think I need the equivalent of GROUP BY in SQL, followed by a HAVING that keeps the max version per name.
I have no idea where to start with this in MongoDB. A query for the mongo shell would be a great start, and the syntax for MongoMapper specifically would be an added bonus.
If you are on MongoDB 2.2+, you can do this query with the aggregation framework, using the $group pipeline operator.
The documentation is here: http://docs.mongodb.org/manual/reference/aggregation/
MongoMapper doesn't have a helper for the aggregation framework but you can use the Ruby driver directly (driver version 1.7.0+ has an aggregate helper method). You would have to get an instance of Mongo::Collection and call the aggregate method on it. For example:
# Group by name and keep the highest version seen for each name
Model.collection.aggregate([
  { "$group" => {
      "_id" => "$name",
      "max_version" => { "$max" => "$version" }
  } }
])
I hope that helps!
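The $group above returns only the highest version number per name, not the documents themselves. A minimal sketch, assuming whole documents are wanted: sort by version descending, then take `$first` per name (`$$ROOT` assumes MongoDB 2.6 or later). Since the pipeline needs a server, the same logic is also shown in plain Ruby on hashes so the expected result can be checked; `LATEST_VERSION_PIPELINE` and `latest_versions` are hypothetical names.

```ruby
# Hypothetical pipeline, meant to be passed to Model.collection.aggregate(...).
# Sorting first makes $first pick the highest version within each name.
# "$$ROOT" (the whole document) assumes MongoDB 2.6 or later.
LATEST_VERSION_PIPELINE = [
  { "$sort" => { "name" => 1, "version" => -1 } },
  { "$group" => { "_id" => "$name", "doc" => { "$first" => "$$ROOT" } } }
].freeze

# In-memory equivalent on plain hashes: group by name, keep the max version.
def latest_versions(docs)
  docs.group_by { |doc| doc["name"] }
      .map { |_name, versions| versions.max_by { |doc| doc["version"] } }
end

docs = [
  { "name" => "a", "version" => 1 },
  { "name" => "a", "version" => 3 },
  { "name" => "b", "version" => 2 }
]
p latest_versions(docs)
```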
If you want to do a GROUP BY with MongoDB, check out the Aggregation Framework; it is exactly the tool for the job!
Here you'll find the equivalent in Aggregation Framework for GROUP BY, HAVING, and more.
Thanks to @Simon, I had a look at Map Reduce with MongoMapper. My take on it is probably not perfect, but it does what I want. Here's the implementation:
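class ChildTemplate
  ...
  key :name, String
  key :version, Integer, :default => 1
  ...

  # Map: emit each document keyed by its name
  def self.map
    <<-JS
      function() {
        emit(this.name, this);
      }
    JS
  end

  # Reduce: keep the document with the highest version for each name
  def self.reduce
    <<-JS
      function(key, values) {
        var res = values[0];
        for (var i = 1; i < values.length; i++) {
          if (values[i].version > res.version) {
            res = values[i];
          }
        }
        return res;
      }
    JS
  end

  # `private` has no effect on methods defined with `def self.`, so mark them explicitly
  private_class_method :map, :reduce

  def self.latest_versions(opts = {})
    results = []
    opts = opts.merge(:out => "ct_latest_versions")
    # Results are written to the :out collection as { _id: name, value: document }
    ChildTemplate.collection.map_reduce(map, reduce, opts).find().each do |map_hash|
      results << map_hash["value"]
    end
    results
  end
end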
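MongoDB may call the reduce function more than once for the same key, feeding earlier reduce results back in, so reduce must return something shaped like the emitted values and give the same answer however the values are batched. A rough Ruby port of the JavaScript reduce above (the name `reduce_latest` is hypothetical) makes it easy to check that property on sample data:

```ruby
# Ruby port of the JavaScript reduce above: keep the value with the highest version.
# Like the JS loop, a tie keeps the earlier value (strict > comparison).
def reduce_latest(values)
  values.reduce { |best, v| v["version"] > best["version"] ? v : best }
end

values = [
  { "name" => "t", "version" => 2 },
  { "name" => "t", "version" => 5 },
  { "name" => "t", "version" => 3 }
]

# Reducing all values at once...
all_at_once = reduce_latest(values)
# ...should match reducing two batches and then re-reducing the partial results.
in_batches = reduce_latest([reduce_latest(values[0, 2]), reduce_latest(values[2, 1])])
puts all_at_once == in_batches
```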
I have the following event (row) from the JDBC input:
{"academic_session_id" : "as=1|dur=2015-16,as=2|dur=2016-17",
"branch_id" : 1}
I want to convert it into the following using Logstash filters:
{"branch_id": 1,"sessions":[{"as":"1","dur":"2015-16"},{"as":"2","dur":"2016-17"}]}
Suggestions for an alternative to Logstash are also welcome.
Note: I am using Elasticsearch 5.x.
Since this is a pretty customized manipulation of the data, I would use the ruby filter and write a script in its code setting to parse the data. Something like this should work:
filter {
  ruby {
    code => "
      # 'as=1|dur=2015-16,as=2|dur=2016-17' -> [['as=1', 'dur=2015-16'], ['as=2', 'dur=2016-17']]
      academic_session = event.get('academic_session_id').split(',').map{|data| data.split('|')}
      # Turn each ['key=value', ...] list into a hash
      sessions = academic_session.map do |arr|
        temp_hash = {}
        arr.each do |kv|
          k,v = kv.split('=')
          temp_hash[k] = v
        end
        temp_hash
      end
      event.set('sessions', sessions)
    "
    remove_field => ['academic_session_id']
  }
}
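The string parsing inside the code setting can be tried outside Logstash as plain Ruby. A minimal sketch with a hypothetical `parse_sessions` helper; `split("=", 2)` is a small deviation from the filter so that a value which itself contains "=" stays intact:

```ruby
# Turn "as=1|dur=2015-16,as=2|dur=2016-17" into an array of hashes.
def parse_sessions(raw)
  raw.split(",").map do |session|
    # Each session is a list of "key=value" pairs separated by "|".
    session.split("|").map { |kv| kv.split("=", 2) }.to_h
  end
end

p parse_sessions("as=1|dur=2015-16,as=2|dur=2016-17")
```

Inside Logstash, one option would be to define this method in the ruby filter's init setting and call event.set('sessions', parse_sessions(event.get('academic_session_id'))) from code.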
I'm facing a problem with my Logstash configuration, which you can find below.
My Ruby filter replaces every dot (".") in my field names. Everything seems to work fine, and the filtered data is correct, but Elasticsearch still responds with: "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"Field name [/ConsumerAdminWebService/getConsumerTransactions.call] cannot contain '.'"} where getConsumerTransactions.call is one of my field keys.
input {
  http_poller {
    urls => {
      uatBackend1 => {
        method => get
        url => "http://some-url/"
        headers => {
          Accept => "application/json"
        }
      }
    }
    request_timeout => 60
    # Runs every minute (a five-field cron expression has no seconds field)
    schedule => { cron => "* * * * * UTC"}
    codec => "json"
    metadata_target => "http_poller_metadata"
  }
}
filter {
  ruby {
    init => "
      def remove_dots hash
        new = Hash.new
        hash.each { |k,v|
          if v.is_a? Hash
            v = remove_dots(v)
          end
          new[ k.gsub('.','_') ] = v
          if v.is_a? Array
            v.each { |elem|
              if elem.is_a? Hash
                elem = remove_dots(elem)
              end
              new[ k.gsub('.','_') ] = elem
            } unless v.nil?
          end
        } unless hash.nil?
        return new
      end
    "
    code => "
      event.instance_variable_set(:@data,remove_dots(event.to_hash))
    "
  }
}
output {
  elasticsearch {
    hosts => localhost
  }
}
I'm afraid that this line of code is not correct: event.instance_variable_set(:@data,remove_dots(event.to_hash)). The resulting data is somehow attached to the event, but the original data persists unchanged and is what gets delivered to the Elasticsearch API.
I suppose some clarifications are required here:
I use ES 2.0 or later, so dots are not allowed in field names.
The Ruby filter should replace dots with "_", and that part works: the resulting data is fully correct, yet ES replies with the error above. I suspect the filter does not replace the event data but simply adds a new field to the Event object, and ES still reads the original data rather than the updated one.
To be honest, Ruby is magic to me :)
If you're using ES 2.0, this could be a version issue: ES 2.0 does not accept field names that contain dots.
According to this response in this thread:
Field names cannot contain the . character in Elasticsearch 2.0.
As a workaround, you might have to mutate (rename) your field names to use something like _ or - instead of the . dot. This ticket pretty much explains the issue; dots can be used again in ES versions later than 2.0. Hope it helps!
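The renaming can also be done recursively in Ruby. Below is a sketch of a dot-stripping helper that follows the question's remove_dots, with one change: array elements are converted and kept as an array, rather than the key being overwritten with the last element. It only transforms a hash; how the result is written back to the event depends on the Logstash version (for example event.set on the 5.x Event API), which is an assumption not covered by this thread.

```ruby
# Recursively replace "." with "_" in hash keys, including hashes nested in arrays.
def remove_dots(value)
  case value
  when Hash
    value.each_with_object({}) do |(k, v), out|
      out[k.to_s.gsub(".", "_")] = remove_dots(v)
    end
  when Array
    value.map { |elem| remove_dots(elem) }
  else
    value
  end
end

doc = {
  "/ConsumerAdminWebService/getConsumerTransactions.call" => { "count.total" => 3 },
  "items" => [{ "a.b" => 1 }, 2]
}
p remove_dots(doc)
```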
So I've got my code trying to select an object from an array of objects, and if the object isn't found, I want to create my defaults.
lead_time = lead_times.select{|d| LeadTimeProfile.new unless d.day_of_week == day }
However, from what I can tell, this does not return the default LeadTimeProfile.
Is there a way of doing this, or have I got it right?
Take a look at Enumerable#find
lead_time = lead_times.find{ |d| d.day_of_week == day } || LeadTimeProfile.new
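Enumerable#find returns the first matching element or nil, so `|| LeadTimeProfile.new` supplies the default only when nothing matches; find also accepts a callable "ifnone" argument for the same purpose. A self-contained sketch, using a hypothetical Struct in place of the real LeadTimeProfile model:

```ruby
# Hypothetical stand-in for the real model, with only a day_of_week attribute.
LeadTimeProfile = Struct.new(:day_of_week)

# Returns the first profile for `day`, or a new default profile if none matches.
def lead_time_for(lead_times, day)
  lead_times.find { |d| d.day_of_week == day } || LeadTimeProfile.new
end

# Same idea using find's ifnone argument, which is only called when nothing matches.
def lead_time_with_ifnone(lead_times, day)
  lead_times.find(-> { LeadTimeProfile.new }) { |d| d.day_of_week == day }
end

lead_times = [LeadTimeProfile.new(1), LeadTimeProfile.new(3)]
p lead_time_for(lead_times, 3)         # the existing profile from the array
p lead_time_for(lead_times, 5)         # a new profile with day_of_week nil
p lead_time_with_ifnone(lead_times, 5) # the same default, via ifnone
```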
Filter your array first, and then do the construction:
lead_time = lead_times.select{|d| d.day_of_week == day}.map {|d| LeadTimeProfile.new(d)}
Passing a lambda as a parameter also works.
lead_time = lead_times.find(lambda { LeadTimeProfile.new } ){ |d| d.day_of_week == day }
Here is another way to get the same result as what Kyle posted. There is no difference between this and using the || operator, other than maybe making chained method calls a bit cleaner.
day = 2
lead_times.find(-> { LeadTimeProfile.new }) { |p|
  p.day_of_week == day
}.day_of_week
I have a bunch of posts which have category tags in them.
I am trying to find out how many times each category has been used.
I'm using rails with mongodb, BUT I don't think I need to be getting the occurrence of categories from the db, so the mongo part shouldn't matter.
This is what I have so far
@recent_posts = current_user.recent_posts # returns the 10 most recent posts
@categories_hash = {'tech' => 0, 'world' => 0, 'entertainment' => 0, 'sports' => 0}
@recent_posts.each do |cat|
  cat.categories.each do |addCat|
    @categories_hash.increment(addCat) # obviously this is where I'm having problems
  end
end
The structure of a post is:
{"_id" : ObjectId("idnumber"), "created_at" : "Tue Aug 03...", "categories" :["world", "sports"], "message" : "the text of the post", "poster_id" : ObjectId("idOfUserPoster"), "voters" : []}
I'm open to suggestions on other ways to get the count of categories. I will eventually want the count of voters too, so it seems to me the best way is to increment categories_hash and then add voters.length, but one thing at a time: I'm just trying to figure out how to increment values in the hash.
If you aren't familiar with map/reduce and you don't care about scaling up, this is not as elegant as map/reduce, but should be sufficient for small sites:
@categories_hash = Hash.new(0)
current_user.recent_posts.each do |post|
  post.categories.each do |category|
    @categories_hash[category] += 1
  end
end
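Hash.new(0) makes a missing key read as 0, so `+= 1` works even for a category that has not been seen yet. A self-contained sketch using plain hashes in place of the post documents (the sample posts and the count_categories name are made up for illustration):

```ruby
# Count how often each category appears; Hash.new(0) makes unseen keys read as 0.
def count_categories(posts)
  counts = Hash.new(0)
  posts.each do |post|
    post["categories"].each { |category| counts[category] += 1 }
  end
  counts
end

# Made-up posts shaped like the documents in the question.
posts = [
  { "categories" => ["world", "sports"] },
  { "categories" => ["tech", "world"] },
  { "categories" => [] }
]
p count_categories(posts)
# On Ruby 2.7+, posts.flat_map { |post| post["categories"] }.tally gives the same counts.
```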
If you're using MongoDB, an elegant way to aggregate tag usage is a map/reduce operation. MongoDB supports map/reduce operations written in JavaScript. Map/reduce runs on the database server(s), so your application does not have to retrieve and analyze every document (which wouldn't scale well for large collections).
As an example, here are the map and reduce functions I use in my blog on the articles collection to aggregate the usage of tags (which is used to build the tag cloud in the sidebar). Documents in the articles collection have a key named 'tags' which holds an array of strings (the tags).
The map function simply emits 1 on every used tag to count it:
function () {
  if (this.tags) {
    this.tags.forEach(function (tag) {
      emit(tag, 1);
    });
  }
}
The reduce function sums up the counts:
function (key, values) {
  var total = 0;
  values.forEach(function (v) {
    total += v;
  });
  return total;
}
As a result, the database returns a hash that has a key for every tag and its usage count as a value. E.g.:
{ 'rails' => 5, 'ruby' => 12, 'linux' => 3 }
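To see what the two functions compute, the map/emit/reduce flow can be simulated in memory. This is a rough Ruby sketch, not MongoDB's implementation (map_reduce_tags and the sample articles are hypothetical): map emits [tag, 1] pairs, the pairs are grouped by key, and reduce sums each group. Because MongoDB may re-reduce partial results, reduce must give the same total whether it sees [1, 1, 1] or [2, 1], which summing does.

```ruby
# Map step: emit [tag, 1] for every tag on an article (articles without tags emit nothing).
def map_tags(article)
  (article["tags"] || []).map { |tag| [tag, 1] }
end

# Reduce step: sum the counts emitted for one key.
def reduce_counts(_key, values)
  values.sum
end

# Group emitted pairs by key, then reduce each group to a single count.
def map_reduce_tags(articles)
  emitted = articles.flat_map { |article| map_tags(article) }
  emitted.group_by(&:first).map { |key, pairs| [key, reduce_counts(key, pairs.map(&:last))] }.to_h
end

p map_reduce_tags([{ "tags" => ["rails", "ruby"] }, { "tags" => ["ruby", "linux"] }, {}])
```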
Maybe a simple question: I'm trying to get results from a table where the Name column contains all of an array of search terms. I'm creating a query and looping through my search strings, each time assigning query = query.Where(...);. It appears that only the last term is being used, I suppose because I am restricting the same field each time. If I call .ToArray().AsQueryable() on each iteration I get the cumulative restriction behavior I'm looking for, but is there an easy way to do this using deferred operators only?
Thanks!
If you're doing something like:
foreach (int foo in myFooArray)
{
    query = query.Where(x => x.foo == foo);
}
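...then it will only use the last one, since each Where criterion captures a reference to the 'foo' loop variable rather than its value at that iteration. (This applies to C# 4 and earlier; since C# 5, each foreach iteration gets its own copy of the variable, although for loops still share one.)
If this is what you're doing, change it to: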
foreach (int foo in myFooArray)
{
    int localFoo = foo; // copy the loop variable so each lambda captures its own value
    query = query.Where(x => x.foo == localFoo);
}
...and everything should be fine again.
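The same capture pitfall can be reproduced in Ruby, where a for loop (unlike a block) reuses one variable for every iteration. This is an analogy for the C# behaviour described above, not C# code; the names and search terms are made up. Each filter is a lambda, and applying them in sequence mimics chaining Where calls:

```ruby
# `for` does not create a new scope, so every lambda captures the same `term`
# variable, which ends up holding the last term.
def filters_with_for(terms)
  filters = []
  for term in terms
    filters << ->(name) { name.include?(term) }
  end
  filters
end

# Block parameters are fresh on every call, so each lambda keeps its own term,
# much like copying the loop variable into a local in the C# fix.
def filters_with_block(terms)
  terms.map { |t| ->(name) { name.include?(t) } }
end

# Apply every filter in turn, like chaining query = query.Where(...).
def apply_all(names, filters)
  filters.reduce(names) { |acc, filter| acc.select(&filter) }
end

names = ["ab", "b", "a"]
p apply_all(names, filters_with_for(["a", "b"]))   # effectively filters on "b" only
p apply_all(names, filters_with_block(["a", "b"])) # requires both "a" and "b"
```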
If this is not what is happening, please provide a code sample of what you're doing...