pymongo sort() limit() different? - sorting

1. db.bios.find().sort( { name: 1 } ).limit( 5 )
2. db.bios.find().limit( 5 ).sort( { name: 1 } )
What is the different with them? They are equal?
If first one doing: find all documents? it is bad.
If db.bios.find().count() is very big(1000000), which process fast?
what is find() default sequence? the insert sequence?
Thanks.

1.These two are equal, sorting will be done first, and then the result will be limited.
2.In order to optimize this, consider having index on name, if this is going to be a frequent query.
3. The natural order for find() is, in general, insertion order, though it is not guaranteed if the documents were updated after creation.

Related

Cloudant couchdb query custom sort

I want to sort the results of couchdb query a.k.a mango queries based on custom sort. I need custom sort because if Status can be one of the following:
Active = 1
Sold = 2
Contingent = 4
Pending = 3
I want to sort the results on Status but not in an alphabetical order, rather my own weightage I assign to each value which can be seen in the above list. Here's the selector for Status query I'm using:
{type:"Property", Status:{"$or":[{$eq: "Pending"}, {$eq:"Active"}, {$eq: "Sold"}]}}
If I use the sort array in my json with Status I think it'll sort alphabetically which I don't want.
You are actually looking for results based on "Status". You can create a view similar to this:
function(doc) { if (doc.type == "Property") { emit(doc.Status, doc);}}
When you use it, invoke it 4 times in the order you need and you'll get the result you need. This would eliminate the need to sort.

How can I descending sort a grouping based on the count of the reduction array in rethinkdb

Importing this dataset as a table:
https://data.cityofnewyork.us/Housing-Development/Registration-Contacts/feu5-w2e2#revert
I use the following query to perform an aggregation and then attempt to sort in descending order based on the reduction field. My intention is sort based on the count of that field or to have the aggregation create a second field called count and sort the grouping results in descending order of the reduction array count or length. How can this be done in rethinkdb?
query:
r.table("contacts").filter({"Type": "Agent","ContactDescription" : "CONDO"}).hasFields("CorporationName").group("CorporationName").ungroup().orderBy(r.desc('reduction'))
I don't quite understand what you're going for, but does this do what you want? If not, what do you want to be different in the output?
r.table("contacts")
.filter({"Type": "Agent","ContactDescription" : "CONDO"})
.hasFields("CorporationName")
.group("CorporationName")
.ungroup()
.merge(function(row){ return {count: row('reduction').count()}; })
.orderBy(r.desc('count'))
You are almost there:
r.table("contacts").filter({"Type": "Agent","ContactDescription" : "CONDO"}).hasFields("CorporationName").group("CorporationName").count().ungroup().orderBy(r.desc('reduction'))
See that .count()? That is a map-reduce operation to get the count of each group.
I haven't tested the query on your dataset. Please comment in case you had problems with it.
EDIT:
If you want to add a count field and preserve the original document, you need to use map and reduce. In your case, it should be something like:
r.table("contacts").filter({"Type": "Agent","ContactDescription" : "CONDO"})
.hasFields("CorporationName")
.group("CorporationName")
.map(r.row.merge({count:1}))
.reduce(function(left, right){
return {
count: left('count').add(right('count')),
<YOUR_OTHER_FIELDS>: left('<YOUR_OTHER_FIELDS>'),
...
};
})
.ungroup().orderBy(r.desc(r.row('reduction')('count')))
EDIT:
I am not sure if this can do the trick, but it is worth a try:
.reduce(function(left, right){
return left.merge({count: left('count').add(right('count'))})
})

Order by random in RethinkDB

I want to order documents randomly in RethinkDB. The reason for this is that I return n groups of documents and each group must appear in order in the results (so all documents belonging to a group should be placed together); and I need to randomly pick a document, belonging to the first group in the results (you don't know which is the first group in the results - the first ones could be empty, so no documents are retrieved for them).
The solution I found to this is to randomly order each of the groups before concat-ing to the result, then always pick the first document from the results (as it will be random). But I'm having a hard time ordering these groups randomly. Would appreciate any hint or even a better solution if there is one.
If you want to order a selection of documents randomly you can just use .orderBy and return a random number using r.random.
r.db('test').table('table')
.orderBy(function (row) { return r.random(); })
If these document are in a group and you want to randomize them inside the group, you can just call orderBy after the group statement.
r.db('test').table('table')
.groupBy('property')
.orderBy(function (row) { return r.random(); })
If you want to randomize the order of the groups, you can just call orderBy after calling .ungroup
r.db('test').table('table')
.groupBy('property')
.ungroup()
.orderBy(function (row) { return r.random(); })
The accepted answer here should not be possible, as John mentioned the sorting function must be deterministic, which r.random() is not.
The r.sample() function could be used to return a random order of the elements:
If the sequence has less than the requested number of elements (i.e., calling sample(10) on a sequence with only five elements), sample will return the entire sequence in a random order.
So, count the number of elements you have, and set that number as the sample number, and you'll get a randomized response.
Example:
var res = r.db("population").table("europeans")
.filter(function(row) {
return row('age').gt(18)
});
var num = res.count();
res.sample(num)
I'm not getting this to work. I tried to sort an table randomly and I'm getting the following error:
e: Sorting by a non-deterministic function is not supported in:
r.db("db").table("table").orderBy(function(var_33) { return r.random(); })
Also I have read in the rethink documentation that this is not supported. This is from the rethinkdb orderBy documentation:
Sorting functions passed to orderBy must be deterministic. You cannot, for instance, order rows using the random command. Using a non-deterministic function with orderBy will raise a ReqlQueryLogicError.
Any suggestions on how to get this to work?
One simple solution would be to give each document a random number:
r.db('db').table('table')
.merge(doc => ({
random: r.random(1, 10)
})
.orderBy('random')

RethinkDB: Can I group by fields between dates efficiently?

I'd like to group by multiple fields, between two timestamps.
I tried something like:
r.table('my_table').between(r.time(2015, 1, 1, 'Z'), r.now(), {index: "timestamp"}).group("field_a", "field_b").count()
Which takes a lot of time since my table is pretty big. I started thinking about using index in the 'group' part of the query, then I remembered it's impossible to use more than one index in the same rql.
Can I achieve what I need efficiently?
You could create a compound index, and then efficiently compute the count for any of the groups without computing all of them:
r.table('my_table').indexCreate('compound', function(row) {
return [row('field_a'), row('field_b'), row('timestamp')];
})
r.table('my_table').between(
[field_a_val, field_b_val, r.time(2015, 1, 1, 'Z')],
[field_a_val, field_b_val, r.now]
).count()

How to retrieve the last 100 documents with a MongoDB/Moped query?

I am using the Ruby Mongoid gem and trying to create a query to retrieve the last 100 documents from a collection. Rather than using Mongoid, I would like to create the query using the underlying driver (Moped). The Moped documentation only mentions how to retrieve the first 100 records:
session[:my_collection].find.limit(100)
How can I retrieve the last 100?
I have found a solution, but you will need to sort collection in descending order. If you have a field id or date you would do:
Method .sort({fieldName: 1 or -1})
The 1 will sort ascending (oldest to newest), -1 will sort descending (newest to oldest). This will reverse entries of your collection.
session[:my_collection].find().sort({id:-1}) or
session[:my_collection].find().sort({date:-1})
If your collection contain field id (_id) that identifier have a date embedded, so you can use
session[:my_collection].find().sort({_id:-1})
In accordance with your example using .limit() the complete query will be:
session[:my_collection].find().sort({id:-1}).limit(100);
Technically that query isn't finding the first 100, that's essentially finding 100 random documents because you haven't specified an order. If you want the first then you'd have to say explicitly sort them:
session[:my_collection].find.sort(:some_field => 1).limit(100)
and to reverse the order to find the last 100 with respect to :some_field:
session[:my_collection].find.sort(:some_field => -1).limit(100)
# -----------------------------------------------^^
Of course you have decide what :some_field is going to be so the "first" and "last" make sense for you.
If you want them sorted by :some_field but want to peel off the last 100 then you could reverse them in Ruby:
session[:my_collection].find
.sort(:some_field => -1)
.limit(100)
.reverse
or you could use use count to find out how many there are then skip to offset into the results:
total = session[:my_collection].find.count
session[:my_collection].find
.sort(:some_field => 1)
.skip(total - 100)
You'd have to check that total >= 100 and adjust the skip argument if it wasn't of course. I suspect that the first solution would be faster but you should benchmark it with your data to see what reality says.

Resources