MongoDB aggregate data to generate 'latest activity' - ruby

I have a mongodb collection that has documents like the ones below:
[
{
:event => {:type => 'comment_created'},
:item => {:id => 10},
:created_at => {:t => '11:19:03 +0100 2010', :d=> 'Fri, 19 Nov 2010'}
}
,
{
:event => {:type => 'vote_created'},
:item => {:id => 10},
:created_at => {:t => '11:19:03 +0100 2010', :d => 'Fri, 19 Nov 2010'}
}
]
What I need is to build a 'dashboard' aggregating latest activity (on current day) for each item. The result should be something like:
{
:item_id => 10,
:events => {
:vote_created => [.. ordered list with latest 3 vote_created events/documents],
:comment_created => [.. ordered list with latest 3 comment_created events/documents ],
}
}
The result would be used to construct a 'Facebook-style' syntax like: 'Mike, John and 3 others added comments on your item today.'
How can I aggregate this data using a group or a map-reduce function?

OK, there are two ways to do this:
Method #1: Map-Reduce
So first, you'll want to run a map-reduce, not a group.
Use Map-Reduce with the "out" option, which writes the results to a new collection. You'll then be able to run the summary queries against that new collection.
The reason you'll do this is that you're asking for an expensive query, so it's much more reasonable to access it in "not-quite" real-time.
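To make the shape of the computation concrete, here is a minimal in-memory Ruby sketch of what the map and reduce steps would have to produce. It is not MongoDB map-reduce itself: `latest_activity` is a hypothetical helper, and it assumes `:created_at` holds a sortable `Time` rather than the `{:t, :d}` hash shown in the question.

```ruby
# In-memory sketch only: mirrors what a map-reduce job would compute.
# Assumes each event has :item => {:id}, :event => {:type} and a
# sortable :created_at (a Time here, unlike the question's {:t, :d} hash).
def latest_activity(events, limit = 3)
  # "Map" step: key every event by its item id.
  events.group_by { |e| e[:item][:id] }.map do |item_id, item_events|
    # "Reduce" step: bucket by event type, newest first, keep `limit` per type.
    by_type = item_events.group_by { |e| e[:event][:type].to_sym }
    trimmed = by_type.transform_values do |list|
      list.sort_by { |e| e[:created_at] }.reverse.first(limit)
    end
    { :item_id => item_id, :events => trimmed }
  end
end
```

Each element of the returned array has the `:item_id` / `:events` layout asked for in the question, so it is the kind of document the "out" collection would hold.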
Method #2: Double-writes
You can basically maintain two collections "details" (top one) and "summary" (bottom one). Whenever you do a write to the details, also perform an update to the summary.
MongoDB has several array methods ($push, $pull, $slice), that should make it possible to keep the "vote_created" array up-to-date.
Preferences
The method you select depends entirely on your architecture and the user experience you want. Personally, I would just use Method #2 and keep appending to the "vote_created" array. I would put the 'Mike, John and 3 others...' syntax somewhere in the view, because it's really view logic, not DB logic.
Yes, Method #2 takes more space, but it also gives you quick answers to the questions you ask a lot. So you're going to have to sacrifice space to get that speed.
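For the double-write in Method #2, the update sent to the summary collection could look roughly like the sketch below. `summary_update` is a hypothetical helper that only builds the update document. It relies on `$push` with the `$each` and `$slice` modifiers, which needs a MongoDB version that supports them, and the `events.<type>` field layout is an assumption based on the result format in the question.

```ruby
# Builds the update document for the summary collection (sketch only).
# event_type is e.g. 'vote_created'; event_doc is the detail document.
def summary_update(event_type, event_doc, keep = 3)
  {
    '$push' => {
      "events.#{event_type}" => {
        '$each'  => [event_doc], # $slice is a modifier and must accompany $each
        '$slice' => -keep        # a negative value keeps the last `keep` elements
      }
    }
  }
end
```

You would pass this document to an upsert on the summary collection, selecting by item id, right after writing the detail document. The exact driver call depends on which Ruby driver version you use, so it is left out here.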

http://rickosborne.org/download/SQL-to-MongoDB.pdf


getstream.io update activity does not move it up the feed

Assuming I have a music app (like the example) using GetStream.io feeds, where I have user feeds and band feeds (user being able to follow other users, or bands).
I am using the stream-ruby gem.
Now, say I am posting as a user to a band feed, doing something like:
user_feed = @client.feed('user', user_id)
activity_data = {
:actor => "User:#{user_id}",
:verb => 'post',
:object => "Post:#{post.id}",
:target => "Band:#{band_id}",
:foreign_id => "Post:#{post.id}",
:time => post.created_at.as_json,
:comment => 'comment 1',
:to => ["band:#{band_id}"]
}
user_feed.add_activity(activity_data)
This works fine, and retrieving the band feed, I can see that post:
@client.feed('band', band_id).get()
Now, I am attempting to update this entry (using the comment field just to see something changes):
activity_data = {
:actor => "User:#{user_id}",
:verb => 'post',
:object => "Post:#{post.id}",
:target => "Band:#{band_id}",
:foreign_id => "Post:#{post.id}",
:time => post.created_at.as_json,
:comment => 'comment 2',
:to => ["band:#{band_id}"]
}
@client.update_activity(activity_data)
Getting the band feed correctly shows the new comment ('comment 2'), but my problem is this:
I am using the created_at field of my post (to ensure my ability to update it as GetStream docs say this is part of the unique key)
I use the same created_at time on the update
When fetching the band feed, I would expect this activity to be the top one - as the feed should be sorted chronologically. But it remains where it was before based on the created_at time.
What to do?
I can try using the updated_at field of the post, but then, if for any reason the post in my DB changes without me updating the GetStream feed, I will no longer be able to update it in GetStream.
Am I missing anything?
There are 2 ways in which you can achieve this.
1.) You could use aggregated feeds. They are sorted based on the last update to the aggregated item. (assuming this fits your use case of course)
2.) You could send a field to Stream called updated_at. Next you enable ranked feeds and simply tell Stream to sort by updated_at instead of the regular chronological sort.
In general, creating a ranking method gives you full control over the ranking of your feed. (paid plans only though)
https://getstream.io/docs/#custom_ranking
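For option 2, the activity payload could carry the extra field while `time` stays pinned to `created_at`, so later updates still find the same activity. This is only a sketch: `activity_for` is a hypothetical helper, `updated_at` is a custom field name, and the value format the ranking configuration expects should be checked against the Stream docs.

```ruby
require 'time' # for Time#iso8601

# Builds the activity hash; `post` is any object responding to
# id, created_at and updated_at (e.g. an ActiveRecord model).
def activity_for(post, user_id, band_id)
  {
    :actor      => "User:#{user_id}",
    :verb       => 'post',
    :object     => "Post:#{post.id}",
    :target     => "Band:#{band_id}",
    :foreign_id => "Post:#{post.id}",
    :time       => post.created_at.utc.iso8601, # stays fixed, so updates keep matching
    :updated_at => post.updated_at.utc.iso8601, # custom field, meant for the ranking method
    :to         => ["band:#{band_id}"]
  }
end
```

The same hash can then be passed to `add_activity` on creation and to `update_activity` on every later change, since only `:updated_at` differs between calls.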

Problems querying Factual API data using Ruby wrapper

Trying to implement Factual API with provided Ruby wrapper. Looking to return all Bars within certain number of meters of a geo point. My query looks like this:
factual.table("places").search("category_id"=>"312").geo("$circle" => {"$center" => [40.7811, -73.98], "$meters" => 10000}).rows
This returns a 200:OK response, but 0 records (even though that location is in Manhattan). I'm pretty sure there's a problem with the way I'm passing in the category info.
Per the API documentation, I've also tried passing the category data like this, which returns a syntax error:
factual.table("places").filters("category_ids" => {"category_ids":["6"]}).geo("$circle" => {"$center" => [40.7811, -73.98], "$meters" => 10000}).rows
and this, which returns the Factual error *references unknown field category_ids*:
factual.table("places").filters("category_ids" => {"$in" => ["312", "338"]}).geo("$circle" => {"$center" => [40.7811, -73.98], "$meters" => 10000}).rows
I'm following the documentation samples here and using v3: https://github.com/Factual/factual-ruby-driver/wiki/Read-API and here: http://developer.factual.com/display/docs/Places+API+-+Global+Place+Attributes
EDIT:
I've also tried changing the filters method to search like this:
factual.table("places").search("category_ids" => {"$in" => ["312", "338"]}).geo("$circle" => {"$center" => [40.7811, -73.98], "$meters" => 10000}).rows
This returns records with 338 in the address, irrespective of category. Very strange. I've been trying different things for hours. I'm pretty sure it's an issue with how I'm passing in category information. I'm following the docs as closely as I can, but I can't get it to work.
Try this
factual.table("places").filters("category" => "Bars")
.geo("$circle" => {"$center" =>[40.7811, -73.98] , "$meters" => 1000}).rows

Find documents including element in Array field with mongomapper?

I am new to mongodb/mongomapper and can't find an answer to this.
I have a mongomapper class with the following fields
key :author_id, Integer
key :partecipant_ids, Array
Let's say I have a "record" with the following attributes:
{ :author_id => 10, :partecipant_ids => [10,15,201] }
I want to retrieve all the objects where the participant with id 15 is involved.
I did not find any mention in the documentation.
The strange thing is that previously I was doing this query
MessageThread.where :partecipant_ids => [15]
which worked, but after (maybe) some change in the gem/mongodb version it stopped working.
Unfortunately I don't know which version of mongodb and mongomapper I was using before.
In the current versions of MongoMapper, this will work:
MessageThread.where(:partecipant_ids => 15)
And this should work as well...
MessageThread.where(:partecipant_ids => [15])
...because plucky autoexpands that to:
MessageThread.where(:partecipant_ids => { :$in => [15] })
(see https://github.com/jnunemaker/plucky/blob/master/lib/plucky/criteria_hash.rb#L121)
I'd say take a look at your data and try out queries in the Mongo console to make sure you have a working query. MongoDB queries translate directly to MM queries except for the above (and a few other minor) caveats. See http://www.mongodb.org/display/DOCS/Querying
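To illustrate the expansion described above, here is a simplified pure-Ruby imitation. It is not plucky's actual code: `expand_criteria` and `matches?` are hypothetical helpers, and `matches?` only mimics in memory how a scalar or `$in` condition matches a field that holds an array.

```ruby
# Simplified imitation of the array-to-$in expansion (not plucky's real code).
def expand_criteria(criteria)
  criteria.each_with_object({}) do |(field, value), out|
    out[field] = value.is_a?(Array) ? { :$in => value } : value
  end
end

# In-memory check: a condition matches when the stored value (or any
# element of a stored array) equals one of the wanted values.
def matches?(doc, criteria)
  expand_criteria(criteria).all? do |field, cond|
    stored = Array(doc[field])
    wanted = cond.is_a?(Hash) ? cond[:$in] : [cond]
    (stored & wanted).any?
  end
end
```

With the record from the question, both `matches?(record, { :partecipant_ids => 15 })` and `matches?(record, { :partecipant_ids => [15] })` find it, which is the behaviour the two `where` calls above rely on.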

Should I convert my timestamp in my controller or view (CakePHP)?

I'm working at understanding best practices for MVC (using CakePHP) and am trying to understand if a certain task should be happening in the controller or the view.
Here's the scenario:
I have a table of users. Each user has many events associated with them.
In my controller I'm loading content about a user into an array like so:
$this->set('user', $this->User->read());
That results in a user array I can loop through on my view page:
Array
(
[User] => Array
(
[id] => 1
[name] => Jane Doe
[created] => 2011-03-29 15:50:25
[modified] => 1301428225
)
[Event] => Array
(
[0] => Array
(
[id] => 4
[user_id] => 1
[title] => Birthday
[created] => 2011-04-07 17:28:53
[modified] => 2011-04-07 17:28:53
[occured] => 1301889600
)
[1] => Array
(
[id] => 3
[user_id] => 1
[title] => Anniversary
[created] => 2011-04-07 17:21:27
[modified] => 2011-04-07 17:21:27
[occured] => 1301976000
)
[2] => Array
(
[id] => 2
[user_id] => 1
[title] => Graduation
[created] => 2011-04-07 17:20:41
[modified] => 2011-04-07 17:20:41
[occured] => 1301889600
)
)
Now, occured is a timestamp which I need to convert to a friendly date.
Should I:
A) Do this in the controller? If so, what's the best syntax to dig into the array and do that?
B) Do it in the view when it's called?
<?=date("d/m/Y", $thisEvent['occured']);?>
The latter seems cleaner with less code, but I don't know if it's logic I can / should be applying from the controller.
If you are going to loop through an array in the view anyway in order to display the events, I would just put the conversion in the view to save having to loop in the controller as well.
To me, the date formatting is a presentation thing - not a business logic thing.
Amy
The rule of thumb is this: Fat Model to Skinny View. The View should not handle logic. Period. The view should not handle alteration of data, it should only handle displaying it. The Model is where data manipulation should take place. But sometimes as coders we want to take the easiest way of doing things. Which isn't necessarily wrong, but isn't always the best way of doing things.
My advice is one of the following (order of preference):
1- Alter the date format on the database so it comes out in the format you are looking for. Then there is no extra logic required for the formatting of the date.
2- Write the query as a join. (Which is what you should be doing in this case since recursive is on). Write the join so that the data comes back formatted as you expect. The problem is that you are taking a shortcut by using $this->User->read(), which forces you to write additional logic outside of the model to handle the formatting of the dates. Model->read() has its uses; this is not one of them.
3- If you are bent on using the Model->read() shortcut, you can build a date helper that you can reference from the view for any given date. For example, when you display the field in the view, you would call:
<?php echo $this->Date->format($thisEvent['occured']); ?>
Then the helper is where you would contain the code as follows:
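function format($date) {
// 'occured' holds a Unix timestamp; only call strtotime() when given a date string
$timestamp = is_numeric($date) ? (int) $date : strtotime($date);
return date('d/m/Y', $timestamp);
}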
This way, if you ever decide you want to change the look of the dates, you only have to change the helper (1 location) instead of all of the views (multiple locations).
I would suggest that:
the Models return DateTime objects or similar. This way, it can be easy in the Controllers or the Views to make calculations/formatting on dates when necessary (good maintainability).
the Views, as said in another answer, should be kept as simple as possible (readability). With Smarty + working with specific DateTime objects, I can simply write {$someItem.someDate} in Views and the dates are formatted automatically (according to current user's language), by using a __toString() method of my specific DateTime class.

Retrieving array of ids in Mongoid

How do you retrieve an array of IDs in Mongoid?
arr=["id1","id2"]
User.where(:id=>arr)
You can do this easily if you are retrieving another attribute
User.where(:nickname.in=>["kk","ll"])
But I am wondering how to do this for IDs in Mongoid. This should be a very simple and common operation.
Remember that the ID is stored as :_id and not :id . There is an id helper method, but when you do queries, you should use :_id:
User.where(:_id.in => arr)
Often I find it useful to get a list of ids to do complex queries, so I do something like:
user_ids = User.only(:_id).where(:foo => :bar).distinct(:_id)
Post.where(:user_id.in => user_ids)
Or simply:
arr = ['id1', 'id2', 'id3']
User.find(arr)
The above method suggested by browsersenior doesn't seem to work anymore, at least for me. What I do is:
User.criteria.id(arr)
user_ids = User.only(:_id).where(:foo => :bar).map(&:_id)
Post.where(:user_id.in => user_ids)
The solution above works fine when the number of users is small, but it requires a lot of memory when there are thousands of users.
User.only(:_id).where(:foo => :bar).map(&:_id)
will create a list of User objects with nil in each field except id.
The solution (for mongoid 2.5):
User.collection.master.where(:foo => :bar).to_a.map {|o| o['_id']}
