increment value in a hash - ruby

I have a bunch of posts which have category tags in them.
I am trying to find out how many times each category has been used.
I'm using rails with mongodb, BUT I don't think I need to be getting the occurrence of categories from the db, so the mongo part shouldn't matter.
This is what I have so far
@recent_posts = current_user.recent_posts # returns the 10 most recent posts
@categories_hash = {'tech' => 0, 'world' => 0, 'entertainment' => 0, 'sports' => 0}
@recent_posts.each do |cat|
  cat.categories.each do |addCat|
    @categories_hash.increment(addCat) # obviously this is where I'm having problems
  end
end
the structure of the post is
{"_id" : ObjectId("idnumber"), "created_at" : "Tue Aug 03...", "categories" :["world", "sports"], "message" : "the text of the post", "poster_id" : ObjectId("idOfUserPoster"), "voters" : []}
I'm open to suggestions on how else to get the count of categories, but I will want to get the count of voters eventually, so it seems to me the best way is to increment the categories_hash, and then add the voters.length, but one thing at a time, i'm just trying to figure out how to increment values in the hash.

If you aren't familiar with map/reduce, or you don't care about scaling up, the following is not as elegant as map/reduce but should be sufficient for small sites:
@categories_hash = Hash.new(0)
current_user.recent_posts.each do |post|
  post.categories.each do |category|
    @categories_hash[category] += 1
  end
end
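On Ruby 2.7 or newer, the same counting can be done with Enumerable#tally. A minimal sketch, using plain hashes as stand-ins for the post objects:

```ruby
# Stand-in for current_user.recent_posts: hashes with a categories array
posts = [
  { categories: %w[world sports] },
  { categories: %w[tech sports] },
]

categories_hash = posts.flat_map { |p| p[:categories] }.tally
# => {"world"=>1, "sports"=>2, "tech"=>1}
```

This also avoids pre-seeding the hash with every known category.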

If you're using MongoDB, an elegant way to aggregate tag usage is a map/reduce operation. MongoDB supports map/reduce operations written in JavaScript. Map/reduce runs on the db server(s), i.e. your application does not have to retrieve and analyze every document (which wouldn't scale well for large collections).
As an example, here are the map and reduce functions I use in my blog on the articles collection to aggregate the usage of tags (which is used to build the tag cloud in the sidebar). Documents in the articles collection have a key named 'tags' which holds an array of strings (the tags).
The map function simply emits 1 on every used tag to count it:
function () {
  if (this.tags) {
    this.tags.forEach(function (tag) {
      emit(tag, 1);
    });
  }
}
The reduce function sums up the counts:
function (key, values) {
  var total = 0;
  values.forEach(function (v) {
    total += v;
  });
  return total;
}
As a result, the database returns a hash that has a key for every tag and its usage count as a value. E.g.:
{ 'rails' => 5, 'ruby' => 12, 'linux' => 3 }
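To make the data flow concrete, here is the same map/reduce pair simulated in plain Ruby on made-up documents (the tag names and counts are hypothetical):

```ruby
docs = [
  { "tags" => %w[ruby rails] },
  { "tags" => %w[ruby linux] },
  { "tags" => %w[ruby] },
]

# map phase: emit(tag, 1) for every tag in every document
emitted = Hash.new { |h, k| h[k] = [] }
docs.each do |doc|
  (doc["tags"] || []).each { |tag| emitted[tag] << 1 }
end

# reduce phase: sum the emitted values per key
counts = emitted.transform_values(&:sum)
# => {"ruby"=>3, "rails"=>1, "linux"=>1}
```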

Related

Evaluating Ruby and Mongo performance: 1.6M small records in file, takes 15 min to write 800K records to Mongo? More efficient way?

We have 1.6M records in a flat file. Each record contains three or four short strings of fewer than 100 characters.
We only need 800K of these records. We write these records to a Mongo collection. The other 800K are ignored.
It takes about 15 min to process the file, meaning we process about 1.67K records/second. Is this expected performance, or should the process be much faster (e.g., 5K records/second, 10K records/second)?
Code below (@skip is a hash of about 800K app IDs).
def updateApplicationDeviceTypes(dir, limit)
  puts "Updating Application Data (Pass 3 - Device Types)..."
  file = File.join(dir, '/application_device_type')
  cols = getColumns(file)
  device_type_id_col = cols[:device_type_id]
  update = Proc.new do |id, group|
    @applications_coll.update(
      { "itunes_id" => id },
      { :$set => { "devices" => group } }
      # If all records for one id aren't adjacent, you'll need this instead
      # { :$addToSet => { "devices" => { :$each => group } } }
    ) unless !id or @skip[id.intern]
  end
  getValue = Proc.new { |r| r[device_type_id_col] }
  batchRecords(file, cols[:application_id], update, getValue, limit)
end
# Batches the values for each id into an array, before calling "update"
# on the array/id
def batchRecords(filename, idCol, update, getValue, limit=nil)
  current_id = nil
  current_group = []
  eachRecord(filename, limit) do |r|
    id = r[idCol]
    value = getValue.call(r)
    if id == current_id and !value.nil?
      current_group << value
    else
      update.call(current_id, current_group) unless current_id.nil?
      current_id = id
      current_group = value.nil? ? [] : [value]
    end
  end
  # "update" is only called above when the id changes, so we still
  # have one last group to update.
  update.call(current_id, current_group)
end
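The grouping that batchRecords performs can be checked in isolation. A minimal sketch on toy rows, using Enumerable#slice_when (Ruby 2.3+) instead of the manual state tracking:

```ruby
# Toy rows: [id, value] pairs, with records for the same id adjacent
rows = [[1, "a"], [1, "b"], [2, "c"], [3, "d"], [3, "e"]]

groups = rows
  .slice_when { |(prev_id, _), (id, _)| prev_id != id }
  .map { |chunk| [chunk.first.first, chunk.map(&:last)] }
# => [[1, ["a", "b"]], [2, ["c"]], [3, ["d", "e"]]]
```

Each group would then be handed to the update proc as (id, values).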
Schema design and the read and write patterns of your application play an extremely large role in your application's performance.
I'd recommend enabling the profiler before looking at machine and IO level performance:
http://docs.mongodb.org/manual/tutorial/manage-the-database-profiler/
You may also find these talks from last year's MongoSV useful:
http://www.10gen.com/presentations/mongosv-2012/lessons-field-performance-operations
http://www.10gen.com/presentations/mongosv-2012/mongodb-performance-tuning

Keep id order as in query

I'm using elasticsearch to get a mapping of ids to some values, but it is crucial that the results come back in the same order as the ids.
Example:
def term_mapping(ids)
  ids = ids.split(',')
  self.search do |s|
    s.filter :terms, id: ids
  end
end
res = term_mapping("4,2,3,1")
The result collection should contain the objects with the ids in order 4,2,3,1...
Do you have any idea how I can achieve this?
If you need to use search, you can sort the ids before sending them to elasticsearch and retrieve the results sorted by id, or you can create a custom sort script that returns the position of the current document in the array of ids. However, a simpler and faster solution is to use Multi-Get instead of search.
One option is to use the Multi GET API. If this doesn't work for you, another solution is to sort the results after you retrieve them from ES. In Python, this can be done like so:
doc_ids = ["123", "333", "456"] # We want to keep this order
order = {v: i for i, v in enumerate(doc_ids)}
es_results = [{"_id": "333"}, {"_id": "456"}, {"_id": "123"}]
results = sorted(es_results, key=lambda x: order[x['_id']])
# Results:
# [{'_id': '123'}, {'_id': '333'}, {'_id': '456'}]
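The same reordering trick in Ruby (the question's language), with hypothetical result hashes standing in for the ES response:

```ruby
doc_ids = %w[4 2 3 1] # the order we want to keep
order = doc_ids.each_with_index.to_h

es_results = [{ "_id" => "2" }, { "_id" => "1" }, { "_id" => "4" }, { "_id" => "3" }]
results = es_results.sort_by { |doc| order[doc["_id"]] }
# ids are back in the order 4, 2, 3, 1
```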
This problem may already be resolved, but this answer might still help someone:
you can use a pinned query in ES, so there is no need to loop over the results to sort them.
qs = {
  "size" => drug_ids.count,
  "query" => {
    "pinned" => {
      "ids" => drug_ids,
      "organic" => {
        "terms" => {
          "id" => drug_ids
        }
      }
    }
  }
}
It will keep the sequence of the input as-is.

How to increase the speed of this MongoDB query?

MongoDB 2.0.7 & PHP 5
I'm trying to count the length of each array. Every document has one array. I want to get the number of elements in each array and the ID of the document. There are no indexes except on _id.
Here's my code:
$map = new MongoCode("function() {
    emit(this._id, {
        '_id': this._id, 'cd': this.cd, 'msgCount': this.cs[0].msgs.length
    });
}");
$reduce = new MongoCode("function(k, vals) {
    return vals[0];
}");
$cmmd = smongo::$db->command(array(
    "mapreduce" => "sessions",
    "map" => $map,
    "reduce" => $reduce,
    "out" => "result"));
These are the timings. As you can see, the query is very slow
Array
(
    [result] => result
    [timeMillis] => 29452
    [counts] => Array
        (
            [input] => 106026
            [emit] => 106026
            [reduce] => 0
            [output] => 106026
        )
    [ok] => 1
)
How can I reduce the timings?
If you are going to frequently need the counts for your arrays, a better approach would be to include a count field in your actual documents. Otherwise you are going to be scanning all documents to do the count (as per your Map/Reduce example).
You can use an Atomic Operation such as $inc to increment/decrement this count at the same time as you are updating the arrays.
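The idea behind maintaining the count with $inc can be sketched with a plain Ruby hash standing in for the MongoDB document (field names hypothetical):

```ruby
# Stand-in for one session document
doc = { "_id" => 1, "msgs" => [], "msgCount" => 0 }

# Mirrors { "$push" => { "msgs" => msg }, "$inc" => { "msgCount" => 1 } }:
# the count is updated in the same operation that grows the array
def push_message(doc, msg)
  doc["msgs"] << msg
  doc["msgCount"] += 1
  doc
end

push_message(doc, "hello")
push_message(doc, "world")
# doc["msgCount"] is now 2, with no need to scan the array
```

Reading the count then becomes a simple indexed field lookup instead of a full-collection map/reduce.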

MongoDB + Ruby: updating records in an iteration

Using MongoDB and the Ruby driver, I'm trying to calculate the rankings for players in my app, so I'm sorting by (in this case) pushups, and then adding a rank field and value per object.
pushups = coll.find.sort(["pushups", -1])
pushups.each_with_index do |r, idx|
  r[:pushups_rank] = idx + 1
  coll.update({:id => r}, r, :upsert => true)
  coll.save(r)
end
This approach does work, but is this the best way to iterate over objects and update each one? Is there a better way to calculate a player's rank?
Another approach would be to do the entire update on the server by executing a javascript function:
update_rank = "function(){
  var rank = 0;
  db.players.find().sort({pushups: -1}).forEach(function(p){
    rank += 1;
    p.rank = rank;
    db.players.save(p);
  });
}"
cn.eval( update_rank )
(The code assumes you have a "players" collection in Mongo, and a Ruby variable cn that holds a connection to your database.)
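For comparison, the client-side rank assignment itself is just an index walk over the sorted results. A minimal sketch with made-up player hashes in place of the driver's documents:

```ruby
players = [
  { "name" => "alice", "pushups" => 50 },
  { "name" => "bob",   "pushups" => 70 },
  { "name" => "carol", "pushups" => 60 },
]

ranked = players
  .sort_by { |p| -p["pushups"] }
  .each_with_index
  .map { |p, idx| p.merge("pushups_rank" => idx + 1) }
# bob => rank 1, carol => rank 2, alice => rank 3
```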

Lua - Sorting a table alphabetically

I have a table that is filled with random content that a user enters. I want my users to be able to rapidly search through this table, and one way of facilitating their search is by sorting the table alphabetically. Originally, the table looked something like this:
myTable = {
  Zebra = "black and white",
  Apple = "I love them!",
  Coin = "25cents"
}
I was able to implement a pairsByKeys() function which allowed me to output the table's contents in alphabetical order, but not to store them that way. Because of the way the searching is set up, the table itself needs to be in alphabetical order.
function pairsByKeys (t, f)
  local a = {}
  for n in pairs(t) do
    table.insert(a, n)
  end
  table.sort(a, f)
  local i = 0                -- iterator variable
  local iter = function ()   -- iterator function
    i = i + 1
    if a[i] == nil then
      return nil
    else
      return a[i], t[a[i]]
    end
  end
  return iter
end
After a time I came to understand (perhaps incorrectly - you tell me) that non-numerically indexed tables cannot be sorted alphabetically. So then I started thinking of ways around that - one way I thought of is sorting the table and then putting each value into a numerically indexed array, something like below:
myTable = {
  [1] = { Apple = "I love them!" },
  [2] = { Coin = "25cents" },
  [3] = { Zebra = "black and white" },
}
In principle, I feel this should work, but for some reason I am having difficulty with it. My table does not appear to be sorting. Here is the function I use, with the above function, to sort the table:
SortFunc = function ()
  local newtbl = {}
  local t = {}
  for title, value in pairsByKeys(myTable) do
    newtbl[title] = value
    tinsert(t, newtbl[title])
  end
  myTable = t
end
myTable still does not end up being sorted. Why?
Lua's tables are hybrid. For numerical keys forming a sequence starting at 1, a table uses a vector part; for all other keys it uses a hash part.
For example, in {[1]="foo", [2]="bar", [4]="hey", my="name"}, keys 1 and 2 are placed in the vector, while 4 and my are placed in the hashtable: 4 broke the sequence, which is why it ends up in the hash part.
For information on how to sort Lua's table take a look here: 19.3 - Sort
Your new table needs consecutive integer keys, and the values themselves need to be tables. So you want something along these lines:
SortFunc = function (myTable)
  local t = {}
  for title, value in pairsByKeys(myTable) do
    table.insert(t, { title = title, value = value })
  end
  myTable = t
  return myTable
end
This assumes that pairsByKeys does what I think it does...
