Getting the highest value of a column in MongoDB - ruby

I've been looking for some help on getting the highest value of a column for a Mongo document. I can sort it and take the top/bottom, but I'm pretty sure there is a better way to do it.
I tried the following (and different combinations):
transactions.find("id" => x).max({"sellprice" => 0})
But it keeps throwing errors. What's a good way to do it besides sorting and getting the top/bottom?
Thank you!

max() in MongoDB does not work the way you would expect it to in SQL. This may change in future versions, but as of now, max and min are meant to be used with indexed keys, primarily internally for sharding.
See http://www.mongodb.org/display/DOCS/min+and+max+Query+Specifiers
Unfortunately, for now the only way to get the max value is to sort the collection descending on that value and take the first document.
transactions.find("id" => x).sort({"sellprice" => -1}).limit(1).first()

Sorting might be overkill. You can just do a group:
db.messages.group({
  key: { created_at: true },
  cond: { active: 1 },
  reduce: function(obj, prev) {
    if (prev.cmax < obj.created_at) prev.cmax = obj.created_at;
  },
  initial: { cmax: 0 }  // seed cmax with any one value no greater than the possible maximum
});

db.collectionName.aggregate([
  {
    $group: {
      _id: "",
      last: { $max: "$sellprice" }
    }
  }
])
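Since the question is tagged Ruby, here is a rough driver equivalent of that pipeline; the client setup and database name are assumptions for illustration:
require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'mydb')
transactions = client[:transactions]

# Group every document into a single bucket and take the maximum sellprice.
result = transactions.aggregate([
  { '$group' => { '_id' => '', 'last' => { '$max' => '$sellprice' } } }
]).first

max_sellprice = result && result['last']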

Example MongoDB shell code for computing aggregates.
See the MongoDB manual entry for $group (it has many applications): http://docs.mongodb.org/manual/reference/aggregation/group/#stage._S_group
In the example below, replace the $-prefixed placeholders with your grouping key and target field.
db.activity.aggregate([
  {
    $group: {
      _id: "$your_collection_key",
      min: { $min: "$your_target_variable" },
      max: { $max: "$your_target_variable" }
    }
  }
])

Use aggregate():
db.transactions.aggregate([
  { $match: { id: x } },
  { $sort: { sellprice: -1 } },
  { $limit: 1 },
  { $project: { sellprice: 1 } }
]);
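In Ruby, the same pipeline might look roughly like this, assuming transactions is a Mongo::Collection as in the earlier answers:
# Match, sort descending, keep one document, and project only sellprice.
top = transactions.aggregate([
  { '$match'   => { 'id' => x } },
  { '$sort'    => { 'sellprice' => -1 } },
  { '$limit'   => 1 },
  { '$project' => { 'sellprice' => 1 } }
]).first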

The following will work as per your requirement:
transactions.find("id" => x).sort({"sellprice" => -1}).limit(1).first()

If the column is indexed then a sort should be OK, assuming Mongo just uses the index to get an ordered collection. Otherwise it's more efficient to iterate over the collection, keeping note of the largest value seen. For example:
max = nil
coll.find("id" => x).each do |doc|
  # Keep the largest sellprice seen so far.
  max = doc['sellprice'] if max.nil? || doc['sellprice'] > max
end
(Apologies if my Ruby's a bit ropey, I haven't used it for a long time - but the general approach should be clear from the code.)

Assuming I was using the Ruby driver (I saw a mongodb-ruby tag at the bottom), I'd do something like the following if I wanted to get the maximum _id (assuming my _id is sortable). In my implementation, my _id was an integer.
result = my_collection.find({}, :sort => ['_id', :desc]).limit(1)
To get the minimum _id in the collection, just change :desc to :asc

The following query does the same thing:
db.student.find({}, {'_id':1}).sort({_id:-1}).limit(1)
For me, this produced following result:
{ "_id" : NumberLong(10934) }

Related

MongoDB compound indexes vs single field indexes in terms of space consumption

According to this post, compound indexes are bigger in terms of size (I could not find much info in the docs, so if you could point me there I would be grateful).
Suppose I have to search for the whole address (we can assume I will always have all the fields available, both in the collection and in the query) through a collection of addresses like:
{
  name: String,
  street: String,
  postcode: String,
  city: String,
  country: String
}
My question is: how much bigger would a compound index be?
If a compound index is bigger than a single-field one, wouldn't it be better to add a hash of the concatenation of all the values to each object, add a single index on the hash field, and search by that (although it does not sound like good practice)?
These accomplish different things. A compound index has an order, and that order has an effect. For instance, the index { 'country' : 1, 'city' : 1, 'postcode' : 1 } would allow you to search for all addresses in a specific city of a specific country. A hash can't do that; hashes only support exact matches.
I don't see how this is bad practice at all, it's just a very narrow use case. Remember that every slight difference in spelling, additional whitespace, etc. will result in a different hash value, and that you can't even answer simple questions like "how many addresses in country X do we store?". But if you don't need that, why not?
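To illustrate the prefix property with the Ruby driver (the client setup and collection name here are assumptions, not from the question):
require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'test')
addresses = client[:addresses]

# One compound index covers equality queries on any prefix of its keys.
addresses.indexes.create_one(country: 1, city: 1, postcode: 1)

# Uses the index with the full key...
addresses.find(country: 'US', city: 'Chicago', postcode: '60601').to_a
# ...and also with a prefix only, which a hashed index cannot do.
addresses.find(country: 'US', city: 'Chicago').to_a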
By the way, MongoDB has built-in support for this. If the address is embedded, using a hashed index on the entire subdocument will accomplish what you need:
MongoDB supports hashed indexes of any single field. The hashing function collapses embedded documents and computes the hash for the entire value,
e.g.:
> db.hash.insert({ "name": "john", "address": { "city": "Chicago", "state": "IL", "country": "US" } })
WriteResult({ "nInserted" : 1 })
> db.hash.createIndex({ "address": "hashed" })
...

// This query uses the index and finds the document:
> db.hash.find({ "address": { "city": "Chicago", "state": "IL", "country": "US" } })

// This query won't find the document because of the missing "state" field, but it is still fast (IXSCAN):
> db.hash.find({ "address": { "city": "Chicago", "country": "US" } })

Restrict a find query MongoDB with ruby driver

I have a Mongo query that looks like this:
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1})
My problem is that my collection contains a huge number of documents and the find is really slow.
So I was wondering if I could restrict the query to the last x inserted documents?
Something like this :
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1, max_doc: 10_000})
Thanks, and sorry for my English.
You can't do that restriction at query time, but you could have a separate capped collection for this.
You insert into both and run this query on the capped collection, which will only retain the last N documents.
This will only work, though, if you don't need to update or remove documents from that collection.
http://docs.mongodb.org/manual/core/capped-collections
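A minimal sketch of that idea with the Ruby driver; the collection names and size limits are assumptions for illustration:
require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'mydb')

# A capped collection that retains roughly the last 10,000 documents.
recent = client[:recent_docs, capped: true, size: 10 * 1024 * 1024, max: 10_000]
recent.create

# Insert into both; query only the small capped collection.
doc = { foo: 'bar' }
client[:docs].insert_one(doc)
recent.insert_one(doc)

result = recent.find(foo: 'bar').sort('$natural' => -1).limit(1).first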

Sorting values in couchdb

I'm trying to pull scores back from this view
"scores": {
"map": "function(doc) { emit(doc.appName, {_id:doc.username, username:doc.username, credits:doc.credits, avatar:doc.avatar}) }"
},
I pass in the appName and it returns the scores. The issue is the scores won't sort correctly; when I try to sort them, they only sort by the first digit, so I get something like:
1500
50
7
900
As you can see, the first digits are sorted ascending but the numbers as a whole aren't. Is it possible to have CouchDB sort the scores if the appName is the key?
Is doc.appName a string? Turn it into a number:
function(doc) {
emit(parseInt(doc.appName), {_id:doc.username, username:doc.username, credits:doc.credits, avatar:doc.avatar});
// ^^^^^^^^
}
Use a complex key:
emit([doc.appName, doc.score], null)
Then query using a range:
startkey=["app1", 0]&endkey=["app1", {}]

How do you calculate the average of all entries in a field from a specific collection in Mongo using Ruby

Given the following data:
{
  _id: ObjectId("51659dc99d62eedc1a000001"),
  type: "image_search",
  branch: "qa_media_discovery_feelobot",
  time_elapsed: 19000,
  test: "1365613930 All Media",
  search_term: null,
  env: "delta",
  date: ISODate("2013-04-10T17:13:45.751Z")
}
I would like to run a command like:
avg_image_search_time = @coll.find("type" => "image_search").avg(:time_elapsed)
How would I accomplish this?
I understand the documentation on this is kind of difficult to follow.
avg_image_search_time = @coll.aggregate([
  { "$group" => { "_id" => "$type", "avg" => { "$avg" => "$time_elapsed" } } },
  { "$match" => { "_id" => "image_search" } }
]).first['avg']
To break this down:
We group the documents by the type field and return the $avg of time_elapsed for each type, naming the resulting average avg. Then, of those groups, we keep only the ones where the group _id matches image_search. Finally, since aggregate always returns an array, we take the first result (there should only be one) and grab the avg field that we named.
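As a side note, filtering before grouping usually lets the server do less work; a sketch of the same query with $match first, assuming the same @coll handle:
avg_image_search_time = @coll.aggregate([
  { '$match' => { 'type' => 'image_search' } },
  { '$group' => { '_id' => '$type', 'avg' => { '$avg' => '$time_elapsed' } } }
]).first['avg']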
Use the MongoDB aggregation framework: http://docs.mongodb.org/manual/core/aggregation/

Retrieving a Subset of Fields from MongoDB in Ruby

I'm trying to get a subset of fields from MongoDB with a query made in Ruby, but it doesn't seem to work: it doesn't return any results.
This is the ruby code:
coll.find("title" => 'Halo', :fields => ["title", "isrc"]) #this doesn't work
If I remove the fields hash, it works, returning the results with all the fields:
coll.find("title" => 'Halo') #this works
Looking at the MongoDB console, the first query ends up on the server like this:
{ title: "Halo", fields: [ "title", "isrc" ] }
If I make the query from the mongo client console, it works: I get the results with just the subset of fields. I make the query like this:
db.tracks.find({title: 'Halo'}, {title:1,isrc:1})
What could be the problem? I've been looking for a solution for this for a couple of hours now.
As of September 2015, the other answers are outdated; you need to use the #projection(hash) method:
coll.find({"title" => 'Halo'}).projection({title: 1, isrc: 1})
The query should look like:
collection.find(selector = {}, opts = {})
In your case that is:
coll.find({"title" => 'Halo'}, {:fields => ["title", "isrc"]})
But a problem still remains: the Ruby driver ignores the "fields" option and returns all the fields! :\
This query will return only the title and isrc for a doc that has the title "Halo":
coll.find({"title" => 'Halo'},{:fields => {"_id" => 0, "title" => 1, "isrc" => 1}})
Note the use of a Hash for the fields where the keys are the field names and the values are either 1 or 0, depending on whether you want to include or exclude the given field.
You can use the query below:
coll.find({"title" => 'Halo'}).projection({title: 1, isrc: 1, _id: 0})
in case you don't want the _id to be retrieved.
