Restrict a MongoDB find query with the Ruby driver - ruby

I have a Mongo query that looks like this:
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1})
My problem is that my collection contains a huge number of documents, and the find is really slow.
So I was wondering if I could restrict the query to the last x inserted documents?
Something like this:
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1, max_doc: 10_000})
Thanks, and sorry for my English.

You can't apply that restriction at query time. But you could use a separate capped collection for this purpose.
You insert into both collections, then run this query against the capped collection, which will only retain the last N documents.
This only fits, though, if you don't need to update or remove documents from that collection.
http://docs.mongodb.org/manual/core/capped-collections
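As a rough illustration of the retention behavior (not the driver API), here is a pure-Ruby stand-in for a capped collection that keeps only the last N inserted documents; the class name and sample data are hypothetical:

```ruby
# In-memory sketch of capped-collection retention: inserting beyond
# the maximum document count silently evicts the oldest entries.
class CappedBuffer
  def initialize(max_docs)
    @max_docs = max_docs
    @docs = []
  end

  def insert(doc)
    @docs << doc
    # Drop the oldest documents, like a capped collection does.
    @docs.shift while @docs.size > @max_docs
  end

  # Newest-first scan, mirroring sort('$natural' => -1) with limit 1.
  def find_latest(&predicate)
    @docs.reverse.find(&predicate)
  end
end

buf = CappedBuffer.new(3)
5.times { |i| buf.insert({ 'foo' => (i.even? ? 'bar' : 'baz'), 'n' => i }) }
# Only the last 3 inserts (n = 2, 3, 4) are retained.
latest_bar = buf.find_latest { |d| d['foo'] == 'bar' }
```

With the legacy mongo gem, the real collection can be created with something like `db.create_collection('recent', :capped => true, :max => 10_000)` (check your driver version's API for the exact options).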

Related

FaunaDB search document and get its ranking based on a score

I have the following Collection of documents with structure:
type Streak struct {
    UserID    string    `fauna:"user_id"`
    Username  string    `fauna:"username"`
    Count     int       `fauna:"count"`
    UpdatedAt time.Time `fauna:"updated_at"`
    CreatedAt time.Time `fauna:"created_at"`
}
This looks like the following in FaunaDB Collections:
{
  "ref": Ref(Collection("streaks"), "288597420809388544"),
  "ts": 1611486798180000,
  "data": {
    "count": 1,
    "updated_at": Time("2021-01-24T11:13:17.859483176Z"),
    "user_id": "276989300",
    "username": "yodanparry"
  }
}
Basically I need a lambda or a function that takes in a user_id and spits out its rank within the collection. The rank is simply the document's position when sorted by the count field. For example, let's say I have the following documents (other fields omitted for simplicity):
user_id | count
------- | -----
abc     | 12
xyz     | 10
fgh     | 999
If I throw in fgh as an input for this lambda function, I want it to spit out 1 (or 0 if you start counting from 0).
I already have an index for user_id, so I can query and match a document reference from that index. I also have an index sorted_count that sorts documents by the count field in ascending order.
My current solution is to query all documents via the sorted_count index, then get the rank by iterating through the array. I think there should be a better solution for this; I'm just not seeing it.
Please help. Thank you!
Counting things in Fauna isn't as easy as one might expect. But you might still be able to do something more efficient than you describe.
Assuming you have:
CreateIndex({
  name: "sorted_count",
  source: Collection("streaks"),
  values: [
    { field: ["data", "count"] }
  ]
})
Then you can query this index like so:
Count(
  Paginate(
    Match(Index("sorted_count")),
    { after: 10, size: 100000 }
  )
)
Which will return an object like this one:
{
  before: [10],
  data: [123]
}
Which tells you that there are 123 documents with count >= 10, which I think is what you want.
This means that, in order to get a user's rank based on their user_id, you'll need to implement this two-step process:
Determine the count of the user in question using your index on user_id.
Query sorted_count using the user's count as described above.
Note that, in case your collection has more than 100,000 documents, you'll need your Go code to iterate through all the pages based on the returned object's after field. 100,000 is Fauna's maximum allowed page size. See the Fauna docs on pagination for details.
Also note that this might not reflect whatever your desired logic is for resolving ties.
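The two-step process above can be sketched in plain Ruby over the sample data from the question; the array below stands in for the two index queries, and rank_of is a hypothetical helper:

```ruby
# Sample documents from the question (other fields omitted).
STREAKS = [
  { 'user_id' => 'abc', 'count' => 12 },
  { 'user_id' => 'xyz', 'count' => 10 },
  { 'user_id' => 'fgh', 'count' => 999 }
].freeze

def rank_of(user_id)
  # Step 1: find the user's count (in Fauna, via the user_id index).
  user_count = STREAKS.find { |s| s['user_id'] == user_id }['count']
  # Step 2: count documents whose count is >= the user's count
  # (in Fauna, Count(Paginate(Match(Index("sorted_count")), after: user_count))).
  STREAKS.count { |s| s['count'] >= user_count }
end
```

With this logic, rank_of('fgh') is 1 and rank_of('xyz') is 3; as noted above, how ties are resolved is up to you.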

How do I model my document in MongoDB to make it paginable for nested attributes?

I'm trying to cache my tweets and show them based on the keyword they were saved under. However, as the tweets grow over time, I need to paginate them.
I'm using Ruby and Mongoid, and this is what I have come up with so far.
class SavedTweet
  include Mongoid::Document
  field :saved_id, :type => String
  field :slug, :type => String
  field :tweets, :type => Array
end
And the tweets array would be like this
{id: "id", text: "text", created_at: "created_at"}
So it's like a bucket for each keyword that you can save. My first problem is that MongoDB cannot sort the second level of a document, which in this case is tweets, and that makes pagination much harder because I cannot use skip and limit. I would have to load all the tweets, put them in the cache, and paginate from there.
The question is: how should I model my data to make it paginable in MongoDB rather than in memory? I'm assuming that doing it in MongoDB would be faster. Right now I'm in the early stage of my application, so it's easier to change the model now than later. If you have any suggestions or opinions, I'd really appreciate them.
One option is to save tweets in a separate collection and link them to your SavedTweet class. It would be easy to query, and you could use skip and limit without problems.
{id: "id", text: "text", created_at: "created_at", saved_tweet:"_id"}
EDIT: a better explanation, with two additional options
As far as I can see, you have three options, if I understand your requirements correctly:
Use the same schema you already have. You would have two problems: you cannot use skip and limit with a usual query, and you have a 16 MB per-document limit. The first could be solved with an Aggregation Framework query ($unwind, $skip and $limit could be helpful). The second could be a problem if you have a lot of tweet documents in the array, because one document cannot exceed 16 MB.
Use two collections to store your tweets. One collection would have the same structure you already have. For example:
{
  save_id: "23232",
  slug: "adkj"
}
And the other collection would have one document per tweet.
{
  id: "id",
  text: "text",
  created_at: "created_at",
  saved_tweet: "_id"
}
With the saved_tweet field you link saved_tweets to tweets in a 1-to-N relation. This way you can run queries over the tweet collection and still use the limit and skip operators.
Save all the info in the same document. If your saved_tweet collection only has those fields, you can put everything in a single document (one document per tweet). Something like this:
{
  save_id: "23232",
  slug: "adkj",
  tweet: {
    tweet_id: "id",
    text: "text",
    created_at: "created_at"
  }
}
With this solution you are duplicating fields, because save_id and slug would be the same across documents of the same saved_tweet, but it could be an option if you have a small number of fields and those fields are not subdocuments or arrays.
I hope it is clear now.
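To see why the second option makes pagination straightforward, here is a pure-Ruby sketch of the skip/limit access pattern over an in-memory stand-in for the tweet collection (the data and helper below are hypothetical; with the real driver the equivalent would be a find with sort, skip and limit):

```ruby
# Hypothetical tweets all linked to one saved_tweet keyword bucket.
TWEETS = (1..10).map do |i|
  { 'id' => i, 'saved_tweet' => 'kw1', 'created_at' => i }
end

def page_of_tweets(saved_tweet_id, page, per_page)
  TWEETS
    .select { |t| t['saved_tweet'] == saved_tweet_id }  # find(saved_tweet: id)
    .sort_by { |t| -t['created_at'] }                   # sort(created_at: -1)
    .drop(page * per_page)                              # skip(page * per_page)
    .first(per_page)                                    # limit(per_page)
end
```

Each page is a bounded query against the tweet collection, so nothing has to be loaded into memory and paginated by hand.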

How do you calculate the average of all entries in a field from a specific collection in Mongo using Ruby

Given the following data:
{
  _id: ObjectId("51659dc99d62eedc1a000001"),
  type: "image_search",
  branch: "qa_media_discovery_feelobot",
  time_elapsed: 19000,
  test: "1365613930 All Media",
  search_term: null,
  env: "delta",
  date: ISODate("2013-04-10T17:13:45.751Z")
}
I would like to run a command like:
avg_image_search_time = @coll.find("type" => "image_search").avg(:time_elapsed)
How would I accomplish this?
I understand the documentation on this is kind of difficult to follow.
avg_image_search_time = @coll.aggregate([
  { "$group" => { "_id" => "$type", "avg" => { "$avg" => "$time_elapsed" } } },
  { "$match" => { "_id" => "image_search" } }
]).first['avg']
To break this down:
We group the documents by the type field and return the $avg of time_elapsed for each type, naming the resulting average avg. Then, of those groups, we keep only the one whose _id matches image_search. Finally, since aggregate always returns an array, we take the first result (there should be only one) and grab the avg field we named.
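The same group-then-average logic, sketched in plain Ruby over hypothetical in-memory documents (mirroring $group by type, $avg over time_elapsed, then picking the matching group):

```ruby
# Hypothetical sample documents with the fields used by the pipeline.
DOCS = [
  { 'type' => 'image_search', 'time_elapsed' => 19_000 },
  { 'type' => 'image_search', 'time_elapsed' => 21_000 },
  { 'type' => 'text_search',  'time_elapsed' => 5_000 }
].freeze

def avg_time_for(type)
  groups = DOCS.group_by { |d| d['type'] }              # $group _id: "$type"
  groups.transform_values! do |docs|
    docs.sum { |d| d['time_elapsed'] }.fdiv(docs.size)  # $avg: "$time_elapsed"
  end
  groups[type]                                          # $match _id: type
end
```

For the sample data, avg_time_for('image_search') averages 19000 and 21000 to 20000.0.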
Use the mongodb aggregation framework http://docs.mongodb.org/manual/core/aggregation/

Retrieving a Subset of Fields from MongoDB in Ruby

I'm trying to get a subset of fields from MongoDB with a query made in Ruby, but it doesn't seem to work: it doesn't return any results.
This is the ruby code:
coll.find("title" => 'Halo', :fields => ["title", "isrc"]) #this doesn't work
If I remove the fields hash, it works, returning the results with all the fields
coll.find("title" => 'Halo') #this works
Looking at the MongoDB console, the first query ends up on the server like this:
{ title: "Halo", fields: [ "title", "isrc" ] }
If I try to make the query from the mongo client console, it works, I get the results and the subset. I make the query like this:
db.tracks.find({title: 'Halo'}, {title:1,isrc:1})
What could be the problem? I've been looking for a solution for this for a couple of hours now.
As of September 2015, the other answers here are outdated. You need to use the projection method: #projection(hash)
coll.find({"title" => 'Halo'}).projection({title: 1, isrc: 1})
The query should look like:
collection.find(selector = {}, opts = {})
where the first hash selects the documents and the second holds the query options.
In your case it is:
coll.find({"title" => 'Halo'}, {:fields => ["title", "isrc"]})
But a problem still remains: the ruby-driver ignores the "fields" condition and returns all the fields! :\
This query will return only the title and isrc for a doc that has the title "Halo":
coll.find({"title" => 'Halo'},{:fields => {"_id" => 0, "title" => 1, "isrc" => 1}})
Note the use of a Hash for the fields where the keys are the field names and the values are either 1 or 0, depending on whether you want to include or exclude the given field.
You can use the query below:
coll.find({"title" => 'Halo'}).projection({title: 1, isrc: 1, _id: 0})
if you don't want _id to be retrieved.
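In plain Ruby terms, that projection behaves like selecting the matching hashes and keeping only the named keys; the sample documents below are hypothetical:

```ruby
# Hypothetical documents standing in for the tracks collection.
DOCS = [
  { '_id' => 1, 'title' => 'Halo', 'isrc' => 'US-XYZ', 'album' => 'A' },
  { '_id' => 2, 'title' => 'Halo', 'isrc' => 'US-ABC', 'album' => 'B' }
].freeze

# Sketch of coll.find(title: 'Halo').projection(title: 1, isrc: 1, _id: 0)
def find_projected(title)
  DOCS.select { |d| d['title'] == title }
      .map { |d| d.slice('title', 'isrc') }  # keep only the projected fields
end
```

Every returned document contains only title and isrc, which is what the projection on the real query produces.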

Getting the highest value of a column in MongoDB

I've been looking for some help on getting the highest value in a column of a Mongo document collection. I can sort it and take the top/bottom, but I'm pretty sure there is a better way to do it.
I tried the following (and different combinations):
transactions.find("id" => x).max({"sellprice" => 0})
But it keeps throwing errors. What's a good way to do it besides sorting and getting the top/bottom?
Thank you!
max() does not work in Mongo the way you would expect it to from SQL. This may change in future versions, but as of now, max and min are to be used with indexed keys, primarily internally for sharding.
see http://www.mongodb.org/display/DOCS/min+and+max+Query+Specifiers
Unfortunately for now the only way to get the max value is to sort the collection desc on that value and take the first.
transactions.find("id" => x).sort({"sellprice" => -1}).limit(1).first()
Sorting might be overkill. You can just do a group by:
db.messages.group({
  key: { created_at: true },
  cond: { active: 1 },
  reduce: function(obj, prev) { if (prev.cmax < obj.created_at) prev.cmax = obj.created_at; },
  initial: { cmax: /* any one value */ }
});
db.collectionName.aggregate({
  $group: {
    _id: "",
    last: { $max: "$sellprice" }
  }
})
Example MongoDB shell code for computing aggregates.
See the MongoDB manual entry for $group (many applications): http://docs.mongodb.org/manual/reference/aggregation/group/#stage._S_group
In the snippet below, replace the $-prefixed placeholders with your collection key and target variable.
db.activity.aggregate({
  $group: {
    _id: "$your_collection_key",
    min: { $min: "$your_target_variable" },
    max: { $max: "$your_target_variable" }
  }
})
Use aggregate():
db.transactions.aggregate([
  { $match: { id: x } },
  { $sort: { sellprice: -1 } },
  { $limit: 1 },
  { $project: { sellprice: 1 } }
]);
It will work as per your requirement.
transactions.find("id" => x).sort({"sellprice" => -1}).limit(1).first()
If the column is indexed, then a sort should be OK, assuming Mongo just uses the index to produce an ordered collection. Otherwise it's more efficient to iterate over the collection, keeping note of the largest value seen. e.g.
max = nil
coll.find("id" => x).each do |doc|
  max = doc['sellprice'] if max.nil? || doc['sellprice'] > max
end
(Apologies if my Ruby's a bit ropey, I haven't used it for a long time - but the general approach should be clear from the code.)
Assuming you're using the Ruby driver (I saw a mongodb-ruby tag at the bottom), I'd do something like the following to get the maximum _id (assuming _id is sortable). In my implementation, _id was an integer.
result = my_collection.find({}, :sort => ['_id', :desc]).limit(1)
To get the minimum _id in the collection, just change :desc to :asc
The following query does the same thing:
db.student.find({}, {'_id':1}).sort({_id:-1}).limit(1)
For me, this produced following result:
{ "_id" : NumberLong(10934) }
