MongoDB + Ruby: updating records in an iteration

Using MongoDB and the Ruby driver, I'm trying to calculate the rankings for players in my app, so I'm sorting by (in this case) pushups, and then adding a rank field and value per object.
pushups = coll.find.sort([["pushups", -1]])
pushups.each_with_index do |r, idx|
  r["pushups_rank"] = idx + 1
  # save upserts keyed on the document's _id, so the separate update call
  # (which was matching on a nonexistent :id key) is unnecessary
  coll.save(r)
end
This approach does work, but is this the best way to iterate over objects and update each one? Is there a better way to calculate a player's rank?

Another approach would be to do the entire update on the server by executing a javascript function:
update_rank = "function(){
var rank=0;
db.players.find().sort({pushups:-1}).forEach(function(p){
rank +=1;
p.rank = rank;
db.players.save(p);
});
}"
cn.eval( update_rank )
(Code assumes you have a "players" collection in mongo, and a ruby variable cn that holds a connection to your database.)

Related

How to get rows as Arrays (not Hashes) in Sequel ORM?

In the Sequel ORM for Ruby, the Dataset class has an all method which produces an Array of row hashes: each row is a Hash with column names as keys.
For example, given a table T:
a  b   c
--------------
0  22  "Abe"
1  35  "Betty"
2  58  "Chris"
then:
ds = DB['select a, b, c from T']
ah = ds.all # Array of row Hashes
should produce:
[{"a":0,"b":22,"c":"Abe"},{"a":1,"b":35,"c":"Betty"},{"a":2,"b":58,"c":"Chris"}]
Is there a way built in to Sequel to instead produce an Array of row Arrays, where each row is an array of only the values in each row in the order specified in the query? Sort of how select_rows works in ActiveRecord? Something like this:
aa = ds.rows # Array of row Arrays
which would produce:
[[0,22,"Abe"],[1,35,"Betty"],[2,58,"Chris"]]
Note: the expression:
aa = ds.map { |h| h.values }
produces an array of arrays, but the order of values in the rows is NOT guaranteed to match the order requested in the original query. In this example, aa might look like:
[["Abe",0,22],["Betty",1,35],["Chris",2,58]]
Old versions of Sequel (pre 2.0) had the ability in some adapters to return arrays instead of hashes. But it caused numerous issues, nobody used it, and I didn't want to maintain it, so it was removed. If you really want arrays, you need to drop down to the connection level and use a connection specific method:
DB.synchronize do |conn|
rows = conn.exec('SQL Here') # Hypothetical example code
end
The actual code you need will depend on the adapter you are using.
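For example, with the postgres adapter backed by the pg gem, something like this minimal sketch should work (note that PG returns values as strings unless a type map is configured):
DB.synchronize do |conn|
  # PG::Result#values returns the rows as an Array of Arrays, in query order
  rows = conn.exec('SELECT a, b, c FROM T').values
  # => [["0", "22", "Abe"], ["1", "35", "Betty"], ["2", "58", "Chris"]]
end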
If you only need specific columns, select_map is convenient: with a single symbol it returns a plain array of values, and with an array of symbols it returns an array of arrays in the given order:
DB[:table].select_map(:id)
DB[:T].select_map([:a, :b, :c])
If you want just an array of array of values...
DB['select * from T'].map { |h| h.values }
seems to work
UPDATE given the updated requirement of the column order matching the query order...
cols = [:a, :c, :b]
DB[:T].select(*cols).map { |h| cols.map { |c| h[c] } }
Not very pretty, but the order is guaranteed to match the select order.
There does not appear to be a builtin to do this.
You could make a request for the feature.
I haven't yet found a built-in method to return an array of row arrays where the values in each row array are ordered by the column order in the original query. The following function does*, although I suspect an internal method could be more efficient:
def rows( ds )
  ret = []
  column_keys = ds.columns # guaranteed to match query order?
  ds.all do |row_hash|
    ret << column_keys.map { |column_key| row_hash[column_key] }
  end
  ret
end
*This function depends on the order of the array returned by Dataset.columns. If this order is undefined, then this rows function isn't very useful.
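One quick way to probe that open question on your own adapter (a hedged aside; the output shown is what one would expect, not a guarantee):
ds = DB['select a, b, c from T']
ds.columns # expect [:a, :b, :c] when the query lists columns explicitly - verify per adapter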
have you tried this?
ds = DB['select a, b, c from T'].to_a
not sure if it works, but give it a shot.

increment value in a hash

I have a bunch of posts which have category tags in them.
I am trying to find out how many times each category has been used.
I'm using rails with mongodb, BUT I don't think I need to be getting the occurrence of categories from the db, so the mongo part shouldn't matter.
This is what I have so far
@recent_posts = current_user.recent_posts # returns the 10 most recent posts
@categories_hash = {'tech' => 0, 'world' => 0, 'entertainment' => 0, 'sports' => 0}
@recent_posts.each do |cat|
  cat.categories.each do |addCat|
    @categories_hash.increment(addCat) # obviously this is where I'm having problems
  end
end
the structure of the post is
{"_id" : ObjectId("idnumber"), "created_at" : "Tue Aug 03...", "categories" :["world", "sports"], "message" : "the text of the post", "poster_id" : ObjectId("idOfUserPoster"), "voters" : []}
I'm open to suggestions on how else to get the count of categories, but I will eventually want the count of voters as well, so it seems to me the best way is to increment categories_hash and then add voters.length later. One thing at a time, though: I'm just trying to figure out how to increment values in the hash.
If you aren't familiar with map/reduce, or don't care about scaling up, this is not as elegant, but should be sufficient for small sites:
@categories_hash = Hash.new(0)
current_user.recent_posts.each do |post|
  post.categories.each do |category|
    @categories_hash[category] += 1
  end
end
If you're using mongodb, an elegant way to aggregate tag usage is a map/reduce operation. Mongodb supports map/reduce using JavaScript code, and the operation runs on the db server(s), i.e. your application does not have to retrieve and analyze every document (which wouldn't scale well for large collections).
As an example, here are the map and reduce functions I use in my blog on the articles collection to aggregate the usage of tags (used to build the tag cloud in the sidebar). Documents in the articles collection have a key named 'tags' which holds an array of strings (the tags).
The map function simply emits 1 on every used tag to count it:
function () {
  if (this.tags) {
    this.tags.forEach(function (tag) {
      emit(tag, 1);
    });
  }
}
The reduce function sums up the counts:
function (key, values) {
  var total = 0;
  values.forEach(function (v) {
    total += v;
  });
  return total;
}
As a result, the database returns a hash that has a key for every tag and its usage count as a value. E.g.:
{ 'rails' => 5, 'ruby' => 12, 'linux' => 3 }
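For reference, a minimal sketch of invoking this from Ruby (assuming the legacy mongo driver's Collection#map_reduce, and that map_js and reduce_js hold the two functions above as strings):
raw = articles.map_reduce(map_js, reduce_js, :out => { :inline => 1 }, :raw => true)
tag_counts = {}
raw['results'].each { |doc| tag_counts[doc['_id']] = doc['value'] }
tag_counts # => e.g. { 'rails' => 5, 'ruby' => 12, 'linux' => 3 }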

Why is this LINQ so slow?

Can anyone please explain why the third query below is orders of magnitude slower than the others when it oughtn't to take any longer than doing the first two in sequence?
var data = Enumerable.Range(0, 10000).Select(x => new { Index = x, Value = x + " is the magic number"}).ToList();
var test1 = data.Select(x => new { Original = x, Match = data.Single(y => y.Value == x.Value) }).Take(1).Dump();
var test2 = data.Select(x => new { Original = x, Match = data.Single(z => z.Index == x.Index) }).Take(1).Dump();
var test3 = data.Select(x => new { Original = x, Match = data.Single(z => z.Index == data.Single(y => y.Value == x.Value).Index) }).Take(1).Dump();
EDIT: I've added a .ToList() to the original data generation because I don't want any repeated generation of the data clouding the issue.
I'm just trying to understand why this code is so slow, by the way, not looking for a faster alternative, unless it sheds some light on the matter. I would have thought that if Linq is lazily evaluated and I'm only looking for the first item (Take(1)), then test3's:
data.Select(x => new { Original = x, Match = data.Single(z => z.Index == data.Single(y => y.Value == x.Value).Index) }).Take(1);
could reduce to:
data.Select(x => new { Original = x, Match = data.Single(z => z.Index == 1) }).Take(1)
in O(N) as the first item in data is successfully matched after one full scan of the data by the inner Single(), leaving one more sweep of the data by the remaining Single(). So still all O(N).
It's evidently being processed in a more long winded way but I don't really understand how or why.
Test3 takes a couple of seconds to run by the way, so I think we can safely assume that if your answer features the number 10^16 you've made a mistake somewhere along the line.
The first two "tests" are identical, and both slow. The third adds another entire level of slowness.
The first two LINQ statements here are quadratic in nature. Since your "Match" element potentially requires iterating through the entire "data" sequence in order to find the match, as you progress through the range, the length of time for that element will get progressively longer. The 10000th element, for example, will force the engine to iterate through all 10000 elements of the original sequence to find the match, making this an O(N^2) operation.
The "test3" operation takes this to an entirely new level of pain, since it's "squaring" the O(N^2) operation in the second single - forcing it to do another quadratic operation on top of the first one - which is going to be a huge number of operations.
Each time you do data.Single(...) with the match, you're doing an O(N^2) operation - the third test basically becomes O(N^4), which will be orders of magnitude slower.
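One way to observe the re-evaluation directly (a hedged sketch, not from the thread, reusing the data list from the question): count the inner comparisons with a side-effecting predicate.
int innerScans = 0;
var match = data.Single(z => z.Index == data.Single(y =>
{
    innerScans++; // counts every comparison made by the inner Single
    return y.Value == data[0].Value;
}).Index);
// innerScans ends up around data.Count * data.Count (10^8), because the
// inner Single's full scan is repeated for every z the outer Single tests.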
Fixed.
var data = Enumerable.Range(0, 10000)
.Select(x => new { Index = x, Value = x + " is the magic number"})
.ToList();
var forward = data.ToLookup(x => x.Index);
var backward = data.ToLookup(x => x.Value);
var test1 = data.Select(x => new { Original = x,
Match = backward[x.Value].Single()
} ).Take(1).Dump();
var test2 = data.Select(x => new { Original = x,
Match = forward[x.Index].Single()
} ).Take(1).Dump();
var test3 = data.Select(x => new { Original = x,
Match = forward[backward[x.Value].Single().Index].Single()
} ).Take(1).Dump();
In the original code,
data.ToList() touches 10,000 elements (10^4).
data.Select( data.Single() ).ToList() performs about 100,000,000 comparisons (10^8): each of the 10^4 outer elements triggers a full 10^4-element scan.
data.Select( data.Single( data.Single() ) ).ToList() performs about 10^12 comparisons: the inner Single's full scan repeats for every candidate the outer Single examines, for every outer element (10^4 x 10^4 x 10^4).
Single and First are different. Single throws if multiple instances are encountered. Single must fully enumerate its source to check for multiple instances.
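A minimal sketch of the difference (illustrative, not from the thread):
var items = Enumerable.Range(0, 10000).ToList();
// First returns as soon as a match is found: best case O(1).
var first = items.First(x => x == 0);
// Single keeps scanning to the end to ensure the match is unique:
// always O(N), and it throws if a second match turns up.
var single = items.Single(x => x == 0);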

How to prevent double round trip with Linq and ToArray() Method

I am trying to use an Array instead of a list in my query. But I must get the count first before I can iterate through the objects returned from the database. Here is my code:
var FavArray = favorites.OrderByDescending(y => y.post_date).Skip((page - 1) * config.MaxRowsPerPage).Take(config.MaxRowsPerPage).ToArray();
int FavArrayCount = FavArray.Count(); //Is this a round trip to the database?
for (int y = 0; y < FavArrayCount; y++)
{
    q = new PostType();
    q.Title = FavArray[y].post_title;
    q.Date = FavArray[y].post_date;
    q.PostID = FavArray[y].post_id;
    q.Username = FavArray[y].user_username;
    q.UsernameLowered = FavArray[y].user_username.ToLower();
    q.CategoryID = FavArray[y].catid;
    q.CategoryName = FavArray[y].name;
    q.TitleSlug = FavArray[y].post_titleslug;
}
As you can see, I need the count before I start iterating, and I am worried that getting the count may make a trip to the database. Is this true?
FavArray.Count() will not round trip, because you have already converted it to an array, which is no longer "LINQ-ified".
Once you call ToArray, any operations on the array that it returns will not go back to the server (unless you traverse a lazily loaded navigation property, such as a foreign key relationship, on one of the returned objects).
LINQ methods such as Count() that you call on the array will use regular LINQ to Objects and will be completely unaware of SQL Server.
In addition to other comments (it definitely won't round trip; it's just an array), you can just use favArray.Length.
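To make that concrete, a small sketch (PostType and the property names are taken from the question; the object initializer is illustrative):
// ToArray() executes the query exactly once; everything below is in-memory.
var favArray = favorites
    .OrderByDescending(y => y.post_date)
    .Skip((page - 1) * config.MaxRowsPerPage)
    .Take(config.MaxRowsPerPage)
    .ToArray();
int count = favArray.Length;    // plain array property, no query
foreach (var fav in favArray)   // iterates the in-memory array, no query
{
    var q = new PostType { Title = fav.post_title, Date = fav.post_date };
}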

Lua - Sorting a table alphabetically

I have a table that is filled with random content that a user enters. I want my users to be able to rapidly search through this table, and one way of facilitating their search is by sorting the table alphabetically. Originally, the table looked something like this:
myTable = {
  Zebra = "black and white",
  Apple = "I love them!",
  Coin = "25cents"
}
I was able to implement a pairsByKeys() function which allowed me to output the table's contents in alphabetical order, but not to store them that way. Because of the way the searching is set up, the table itself needs to be in alphabetical order.
function pairsByKeys (t, f)
  local a = {}
  for n in pairs(t) do
    table.insert(a, n)
  end
  table.sort(a, f)
  local i = 0 -- iterator variable
  local iter = function () -- iterator function
    i = i + 1
    if a[i] == nil then
      return nil
    else
      return a[i], t[a[i]]
    end
  end
  return iter
end
After a time I came to understand (perhaps incorrectly - you tell me) that non-numerically indexed tables cannot be sorted alphabetically. So then I started thinking of ways around that - one way I thought of is sorting the table and then putting each value into a numerically indexed array, something like below:
myTable = {
  [1] = { Apple = "I love them!" },
  [2] = { Coin = "25cents" },
  [3] = { Zebra = "black and white" },
}
In principle, I feel this should work, but for some reason I am having difficulty with it. My table does not appear to be sorting. Here is the function I use, with the above function, to sort the table:
SortFunc = function ()
  local newtbl = {}
  local t = {}
  for title, value in pairsByKeys(myTable) do
    newtbl[title] = value
    tinsert(t, newtbl[title])
  end
  myTable = t
end
myTable still does not end up being sorted. Why?
Lua tables can be hybrid. For consecutive numerical keys starting at 1 they use a vector (array part); for all other keys they use a hash.
For example, in {[1]="foo", [2]="bar", [4]="hey", my="name"}
the keys 1 and 2 are placed in the vector part, while 4 and "my" go into the hash part: 4 breaks the sequence of consecutive integer keys, which is why it lands in the hashtable.
For information on how to sort Lua's table take a look here: 19.3 - Sort
Your new table needs consecutive integer keys and needs the values themselves to be tables. So you want something along these lines:
SortFunc = function (myTable)
  local t = {}
  for title, value in pairsByKeys(myTable) do
    table.insert(t, { title = title, value = value })
  end
  myTable = t
  return myTable
end
This assumes that pairsByKeys does what I think it does...
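A small usage sketch (hypothetical data, assuming the SortFunc and pairsByKeys definitions above):
local sorted = SortFunc({
  Zebra = "black and white",
  Apple = "I love them!",
  Coin = "25cents",
})
for i, entry in ipairs(sorted) do
  print(i, entry.title, entry.value) -- 1 Apple I love them!, 2 Coin 25cents, ...
end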
