How to sort by two values in MongoDB-Ruby? - ruby

So I have a Ruby Script where I find all the documents with the type "homework" in the "grades" collection a "students" DB (MongoDB) The thing is, following these instructions:
http://api.mongodb.org/ruby/current/file.TUTORIAL.html
I try to sort by score and then by student id (or viceversa) with:
homeworks.sort(:score, 1).sort(:student_id, 1).to_a
And running the file ("mongo.rb") I get an output of homeworks sorted by score (ascending) but not by student ID... (They're scrambled) If I try to switch values, I get the array ordered by student_id (ascending) but not by score... (In that case, score values are scrambled)
How can I Sort ascending by two arguments in mongo using ruby??

Per documentation, try
homeworks.sort([[:score, 1], [:student_id, 1]]).to_a

How about this:
c = db['grades']
x = c.find({}, {:sort=>[[:student_id, 1], [:score, 1]]}).to_a
This works for me in irb.

Related

Simpler alternative to simultaneously Sort and Filter by column in Google Spreadsheets

I have a spreadsheet (here's a copy) with the following (headered) columns:
A: Indices for a list of groceries;
B: Names for the groceries to be indexed by column A;
C: Check column with "x" for inactive items in column B, empty otherwise;
D: Sorting indices that I want to apply to column B;
Currently, I am getting the sorted AND filtered result with this formula:
=SORT(FILTER(B2:B; C2:C = ""); FILTER(D2:D; C2:C = ""); TRUE)
The problem is that I need to apply the filter two times: one for the items and one for the indices, otherwise I get a mismatch between elements for the Sort function.
I feel that this doesn't scale well since it creates duplication.
Is there a way to get the same results with a simpler formula or another arrangement of columns?
=SORT(FILTER({Itens!B2:B\Itens!G2:G}; Itens!D2:D=""))
=SORT(FILTER({Itens!B2:B\Itens!G2:G}; Itens!D2:D="");2;1)
or maybe: =SORT(FILTER(Itens!B2:B; Itens!D2:D="");2;1)

How to retrieve the last 100 documents with a MongoDB/Moped query?

I am using the Ruby Mongoid gem and trying to create a query to retrieve the last 100 documents from a collection. Rather than using Mongoid, I would like to create the query using the underlying driver (Moped). The Moped documentation only mentions how to retrieve the first 100 records:
session[:my_collection].find.limit(100)
How can I retrieve the last 100?
I have found a solution, but you will need to sort collection in descending order. If you have a field id or date you would do:
Method .sort({fieldName: 1 or -1})
The 1 will sort ascending (oldest to newest), -1 will sort descending (newest to oldest). This will reverse entries of your collection.
session[:my_collection].find().sort({id:-1}) or
session[:my_collection].find().sort({date:-1})
If your collection contain field id (_id) that identifier have a date embedded, so you can use
session[:my_collection].find().sort({_id:-1})
In accordance with your example using .limit() the complete query will be:
session[:my_collection].find().sort({id:-1}).limit(100);
Technically that query isn't finding the first 100, that's essentially finding 100 random documents because you haven't specified an order. If you want the first then you'd have to say explicitly sort them:
session[:my_collection].find.sort(:some_field => 1).limit(100)
and to reverse the order to find the last 100 with respect to :some_field:
session[:my_collection].find.sort(:some_field => -1).limit(100)
# -----------------------------------------------^^
Of course you have decide what :some_field is going to be so the "first" and "last" make sense for you.
If you want them sorted by :some_field but want to peel off the last 100 then you could reverse them in Ruby:
session[:my_collection].find
.sort(:some_field => -1)
.limit(100)
.reverse
or you could use use count to find out how many there are then skip to offset into the results:
total = session[:my_collection].find.count
session[:my_collection].find
.sort(:some_field => 1)
.skip(total - 100)
You'd have to check that total >= 100 and adjust the skip argument if it wasn't of course. I suspect that the first solution would be faster but you should benchmark it with your data to see what reality says.

how to create set of values, after group function in Pig (Hadoop)

Lets say I have set of values in file.txt
a,b,c
a,b,d
k,l,m
k,l,n
k,l,o
And my code is:
file = LOAD 'file.txt' using PigStorage(',');
events = foreach file generate session_id, user_id, code, type;
gr = group events by (session_id, user_id);
and I have set of value:
((a,b),{(a,b,c),(a,b,d)})
((k,l),{(k,l,m),(k,l,n),(k,l,o)})
And I'd like to have:
(a,b,(c,d))
(k,l,(m,n,o))
Have you got any idea how to do it?
Regards
Pawel
Note: you are inconsistent in your question. You say session_id, user_id, code, type in the FOREACH line, but your have a PigStorage not providing values. Also, that FOREACH has 4 values, while your sample data only has 3. I'll assume that type doesn't exist in order to answer your question.
After your gr relation, you are left with the group by key (in this case (session_id, user_id)) in a automatically generated tuple called group.
So, first step: gr2 = FOREACH gr GENERATE FLATTEN(group);
This will give you the tuples (a,b) and (k,l). You need to use FLATTEN because group is a tuple and you are asking for session_id and user_id to be individual columns. FLATTEN does that for you.
Ok, so now modify the gr2 line to also use a projection to tease out the third value:
gr2 = FOREACH gr GENERATE FLATTEN(group), events.code;
events.code creates a bag out of all the code values. events is the name of the bag of grouped tuples (it's named after the original relation).
This should give you:
(a, b, {c, d})
(k, l, {m, n, o})
It's very important to note that the values in the list are in a bag not a tuple, like you asked for. Keeping it in a bag is the right idea because the bag is a variable list, while a tuple is not.
Additional advice: Understanding how GROUP BY outputs data is something I see a lot of people struggle with when first using Pig. If you think my answer doesn't make much sense, I'd recommend spending some time to really get to understand GROUP BY. Understanding versus thinking it is magic will pay off in the long run.

mongoDB geoNear command with count

I am using the geoNear commang with mongoid in order to retrive a document collection ordered by distance. I need the distance for each document in the collection which is why I am having to resort to the geoNear command.
Given the following command:
category_ids = ["list", "of", "ids"]
cmd = Hash.new
cmd[:geoNear] = :poi
cmd[:near] = [params[:location][:x], params[:location][:y]]
cmd[:query] = {
"$or" => [
{primary_category_id: {"$in" => category_ids}},
{category_ids: {"$in" => category_ids}}
]
}
cmd[:spherical] = true
cmd[:num] = num
res = Poi.collection.database.command cmd
My problem is that I require the total number of results in the collection. Sure I could just run another query that just counts the number of items that satisfy the query part of the command, however that would be pretty inefficient and also not very extendible as every change I make in the command would have to be reflected in the count query. Just adding a maxDistance would land me in a whole heap of trouble.
Another option would be to go with find and calculate the distance manually but again I would like to avoid that.
So my question is there a clever way of getting the number of documents returned by the command (minus the num) without having to run a separate query or having to calculate the distance manually and go with find.
You can use facet for the same after geoNear use facet one will project the documents and in other you can use group by _id null and use the count in group to count the total number of documents.

Queries on ActiveRecord Association collection object

I have a set of rows which I've fetched from a table. Let's say the object Rating. After fetching this object, I have say 100 entries from the database.
The Rating object might look like this:
table_ratings
t.integer :num
So what I now want to do is perform some calculations on these 100 rows without performing any other queries. I can do this, running an additional query:
r = Rating.all
good = r.where('num = 2') # this runs an additional query
"#{good}/#{r.length}"
This is a very rough idea of what I want to do, and I have other more complex output to build. Let's imagine I have over 10 different calculations I need to perform on these rows, and so aliasing these in a sql query might not be the best idea.
What would be an efficient way to replace the r.where query above? Is there a Ruby method or a gem which would give me a similar query api into ActiveRecord Association collection objects?
Rating.all returns an array of all Rating objects. From there, shift your focus to selecting and mapping from the array. eg.:
#perfect_ratings = r.select{|x| x.a == 100}
See:
http://www.ruby-doc.org/core-1.9.3/Array.html
ADDITIONAL COMMENTS:
Going over the list of methods available for array, I find myself using the following frequently:
To check a variable against multiple values:
%w[dog cat llama].include? #pet_type # returns true if #pet_type == 'cat'
To create another array(map and collect are aliases):
%w[dog cat llama].map(|pet| pet.capitalize) # ["Dog", "Cat", "Llama"]
To sort and drop duplicates:
%w[dog cat llama dog].sort.uniq # ["cat", "dog", "llama"]
<< to add an element, + to add arrays, flatten to flatten embedded arrays into a single level array, count or length or size for number of elements, and join are the others I tend to use a lot.
Finally, here is an example of join:
%w[dog cat llama].join(' or ') # "dog or cat or llama"

Resources