How to retrieve the last 100 documents with a MongoDB/Moped query? - ruby

I am using the Ruby Mongoid gem and trying to create a query to retrieve the last 100 documents from a collection. Rather than using Mongoid, I would like to create the query using the underlying driver (Moped). The Moped documentation only mentions how to retrieve the first 100 records:
session[:my_collection].find.limit(100)
How can I retrieve the last 100?

I have found a solution, but you will need to sort the collection in descending order. If you have an id or date field, you can use the .sort(field_name: 1 or -1) method: 1 sorts ascending (oldest to newest) and -1 sorts descending (newest to oldest), which reverses the order of your collection.
session[:my_collection].find.sort(id: -1) or
session[:my_collection].find.sort(date: -1)
If your collection contains the default _id field, that identifier has a timestamp embedded in it, so you can sort on it directly:
session[:my_collection].find.sort(_id: -1)
Combined with the .limit() from your example, the complete query is:
session[:my_collection].find.sort(_id: -1).limit(100)

Technically, that query isn't finding the first 100; it's essentially finding 100 random documents, because you haven't specified an order. If you want the first 100, you have to sort explicitly:
session[:my_collection].find.sort(:some_field => 1).limit(100)
and to reverse the order to find the last 100 with respect to :some_field:
session[:my_collection].find.sort(:some_field => -1).limit(100)
# -----------------------------------------------^^
Of course you have to decide what :some_field is going to be so that "first" and "last" make sense for you.
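For instance, if your documents carry a created_at timestamp (an assumed field name; yours may differ), the last 100 inserted would be:
session[:my_collection].find.sort(:created_at => -1).limit(100)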
If you want them sorted by :some_field but want to peel off the last 100 then you could reverse them in Ruby:
session[:my_collection].find
.sort(:some_field => -1)
.limit(100)
.reverse
or you could use count to find out how many there are and then skip to offset into the results:
total = session[:my_collection].find.count
session[:my_collection].find
.sort(:some_field => 1)
.skip(total - 100)
You'd have to check that total >= 100 and adjust the skip argument if it wasn't, of course. I suspect the first solution would be faster, but you should benchmark it with your data to see what reality says.
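For example, a minimal guard against collections holding fewer than 100 documents might look like this (same collection and field names as above):
total = session[:my_collection].find.count
session[:my_collection].find
       .sort(:some_field => 1)
       .skip([total - 100, 0].max) # clamp so skip never goes negative
       .limit(100)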

Related

Get n-th result returned from MATCH query in Neo4J database

I've set up a library database where users borrow books. Using a MATCH command, I can return the book titles and the number of their lendings in descending order.
My Cypher for returning the list of books and number of lendings is:
MATCH (user)-[:LENDING]->(b:Book)
RETURN b.title, COUNT(b.title) as numberOfRents
ORDER BY numberOfRents DESC
This is working properly. However, I need to get only the n-th book (by lendings) returned (let's say the third, for example), which is something I have failed to do until now.
Sounds like you need SKIP and LIMIT
MATCH (user)-[:LENDING]->(b:Book)
RETURN b.title, COUNT(b.title) as numberOfRents
ORDER BY numberOfRents DESC
SKIP 2 LIMIT 1
// skips the first 2, so you only get the 3rd

Cloudant couchdb query custom sort

I want to sort the results of a CouchDB query (a.k.a. a Mango query) with a custom sort. I need a custom sort because Status can be one of the following:
Active = 1
Sold = 2
Contingent = 4
Pending = 3
I want to sort the results on Status, but not in alphabetical order; rather, by my own weighting assigned to each value, shown in the list above. Here's the selector for the Status query I'm using:
{type:"Property", Status:{"$or":[{$eq: "Pending"}, {$eq:"Active"}, {$eq: "Sold"}]}}
If I use the sort array in my json with Status I think it'll sort alphabetically which I don't want.
You are actually looking for results based on "Status". You can create a view similar to this:
function(doc) { if (doc.type == "Property") { emit(doc.Status, doc);}}
When you use it, query it four times, once per status value in the order you need, and you'll get the results in that order. This eliminates the need to sort.
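To illustrate, a rough Ruby sketch of that approach could look like the following; the database URL, design document, and view names are assumptions, not part of the original answer:
require "net/http"
require "json"
require "uri"

# Assumed view location -- substitute your own database and design doc.
VIEW = "https://example.cloudant.com/mydb/_design/props/_view/by_status"

# Query once per status, in your custom weight order (1, 2, 3, 4).
rows = ["Active", "Sold", "Pending", "Contingent"].flat_map do |status|
  uri = URI("#{VIEW}?key=#{URI.encode_www_form_component(status.to_json)}")
  JSON.parse(Net::HTTP.get(uri))["rows"]
end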

Slicing neo4j Cypher results in chunks

I want to slice Cypher results in chunks of 100 rows, and be able to retrieve a specific chunk.
At the moment, the only way to ensure that rows are not mixed up is to use ORDER BY, which makes the query very inefficient (3 sec. is too much for me):
MATCH (p:Person) RETURN p.id ORDER BY p.id SKIP {chunk}*100 LIMIT 100
where {chunk} is an external parameter to identify a specific chunk.
Any suggestions?
PS: the property p.id is indexed.
You may try something like adding a label to Person before extracting chunks and then using a query like:
MATCH (p:Chunk:Person) WITH p LIMIT 100
MATCH (p) REMOVE p:Chunk
RETURN *
If the p.id values are unique and dense (say, the value starts at 1 and increments, without any gaps), then this query will take advantage of the index on :Person(id) to efficiently get each hundred-Person chunk:
WITH (({chunk} - 1) * 100 + 1) AS startId
MATCH (p:Person)
WHERE p.id IN RANGE(startId, startId + 99)
RETURN p.id
ORDER BY p.id
Now, practically speaking, your id space will probably not remain dense, even if it started out that way. Person nodes will be deleted over time. In that case, the above query can return fewer than 100 rows. So, you can make your chunk size bigger than 100 and do some post-processing to get the 100 you need. In the worst case, you may need to make multiple requests to get the 100 you need, but each request will be fast. (Ideally, you would want to assign no-longer-unused id values to new Person nodes, to fill up gaps in the id space -- but this would require you to scan for the gaps.)
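As a sketch of that multi-request fallback in Ruby (run_cypher is a hypothetical stand-in for whatever Neo4j client you use, and MAX_ID for the largest id in the graph; neither is part of the original answer):
# Request successive id windows until enough rows accumulate.
def fetch_chunk(start_id, want = 100)
  rows = []
  while rows.size < want && start_id <= MAX_ID
    stop_id = start_id + want - 1
    rows += run_cypher(
      "MATCH (p:Person) WHERE p.id IN range($start, $stop) " \
      "RETURN p.id ORDER BY p.id",
      start: start_id, stop: stop_id
    )
    start_id = stop_id + 1
  end
  rows.first(want) # resume the next chunk after the last id returned here
end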

OFFSET/LIMIT only count DISTINCT values in Activerecord query

I am running this query
Playlistship.order("created_at desc").select("distinct playlist_id").limit(12).offset(2)
This query does not necessarily return 12 records. It returns the number of distinct records in the set of 12 defined by the LIMIT, OFFSET and ORDER parameters.
For example, if the Playlistships between id=13 and id=24 had playlist_ids of [2,3,3,5,6,3,5,6,8,11,12,12], then this query will only return 7 records, corresponding to the first occurrences of the playlist_ids [2,3,5,6,8,11,12].
What I would like to find is a query that yields 12 records with distinct playlist_ids, with the correct offset so that running this query again with an OFFSET of 3 would yield the next 12 records with distinct playlist_ids.
Hopefully I didn't "over explain" this one, as I think it's a relatively straightforward question. Please ask for more details if you need them.
Thanks!
Have you tried with subqueries? Give this a try:
Playlistship.select("distinct playlist_id")
            .limit(12)
            .where(playlist_id: Playlistship.order("created_at desc")
                                            .select('playlist_id')
                                            .offset(2))

mongoDB geoNear command with count

I am using the geoNear command with Mongoid in order to retrieve a document collection ordered by distance. I need the distance for each document in the collection, which is why I am having to resort to the geoNear command.
Given the following command:
category_ids = ["list", "of", "ids"]
cmd = Hash.new
cmd[:geoNear] = :poi
cmd[:near] = [params[:location][:x], params[:location][:y]]
cmd[:query] = {
  "$or" => [
    {primary_category_id: {"$in" => category_ids}},
    {category_ids: {"$in" => category_ids}}
  ]
}
cmd[:spherical] = true
cmd[:num] = num
res = Poi.collection.database.command cmd
My problem is that I require the total number of results in the collection. Sure, I could run another query that just counts the number of items satisfying the query part of the command, but that would be pretty inefficient and not very extensible, as every change I make to the command would have to be reflected in the count query. Just adding a maxDistance would land me in a whole heap of trouble.
Another option would be to go with find and calculate the distance manually but again I would like to avoid that.
So my question is: is there a clever way of getting the number of documents matched by the command (ignoring num) without running a separate query or calculating the distance manually and going with find?
You can use $facet for this: run $geoNear as the first stage of an aggregation pipeline, then add a $facet stage with two sub-pipelines, one that projects the documents and another that groups on _id: null and counts, giving you the total number of matching documents in the same query.
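As a rough sketch of that pipeline with the modern mongo gem (the standalone geoNear command was removed in MongoDB 4.2, and $facet needs MongoDB 3.4+; field names are taken from the question, but the driver call is an assumption if you are still on Moped):
pipeline = [
  { "$geoNear" => {
      "near"          => [params[:location][:x], params[:location][:y]],
      "distanceField" => "distance", # each document gets its distance here
      "spherical"     => true,
      "query"         => { "$or" => [
        { "primary_category_id" => { "$in" => category_ids } },
        { "category_ids"        => { "$in" => category_ids } }
      ] }
  } },
  { "$facet" => {
      "results" => [{ "$limit" => num }], # the page num would have given you
      "total"   => [{ "$group" => { "_id" => nil, "count" => { "$sum" => 1 } } }]
  } }
]
res   = Poi.collection.aggregate(pipeline).first
docs  = res["results"]
total = res.dig("total", 0, "count") || 0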
