MongoDB Index and Natural Sort Optimization

What's the fastest way to sort by reverse insertion order on a capped collection? ('rf' has a sparse index.)
db.log.find({ rf : 'o-5556457634'}).sort({ '$natural' : -1 }).explain();
{
"cursor" : "ReverseCappedCursor",
"nscanned" : 1654468,
"nscannedObjects" : 1654468,
"n" : 4,
"millis" : 2932,
"nYields" : 5,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
It seems like '$natural' bypasses the index on 'rf', which slows the query down significantly. Is this expected behaviour? Shouldn't the '$natural' sort be applied after the find has used the index?
Without the '$natural' sort:
db.log.find({ rf : 'o-5556457634'}).explain();
{
"cursor" : "BtreeCursor rf_1",
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 4,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"rf" : [
[
"o-5556457634",
"o-5556457634"
]
]
}
}
A hint does force the engine to use the 'rf' index, but the result ignores the (reverse) '$natural' sort:
db.log.find({ rf : 'o-5556457634'}).sort({ '$natural' : -1 }).hint({rf :1}).explain();
{
"cursor" : "BtreeCursor rf_1",
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 4,
"scanAndOrder" : true,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"rf" : [
[
"o-5556457634",
"o-5556457634"
]
]
}
}

I faced the same problem and found a solution.
Create an index on the fields used in your find filter, with "_id": -1 added as the last field, and then use sort({"_id": -1}).
That helped me.
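A minimal sketch of why this works, assuming the documents use default ObjectId values for _id: the first 4 bytes of an ObjectId hold its creation time in seconds, so sorting on _id descending approximates reverse insertion order (only approximately, since ids created within the same second by different clients are not guaranteed to follow insertion order). In the shell this amounts to db.log.ensureIndex({ rf: 1, _id: -1 }) followed by db.log.find({ rf: 'o-5556457634' }).sort({ _id: -1 }). The JavaScript below only simulates that ordering on plain objects; the sample ids and helper names are hypothetical.

```javascript
// Hypothetical documents standing in for the capped collection.
// _id values are 24-char lowercase hex strings standing in for ObjectIds;
// the first 8 hex chars encode the creation time in seconds.
const log = [
  { _id: "5f0000010000000000000001", rf: "o-5556457634", msg: "first" },
  { _id: "5f0000020000000000000002", rf: "o-1111111111", msg: "other" },
  { _id: "5f0000030000000000000003", rf: "o-5556457634", msg: "second" },
  { _id: "5f0000040000000000000004", rf: "o-5556457634", msg: "third" },
];

// Extract the timestamp embedded in an ObjectId (seconds since the Unix epoch).
function objectIdSeconds(hexId) {
  return parseInt(hexId.slice(0, 8), 16);
}

// Simulates find({ rf }).sort({ _id: -1 }): filter on rf, then order by _id descending.
// Fixed-length lowercase hex strings compare in the same order as the underlying bytes.
// With an { rf: 1, _id: -1 } index the server can read entries already in this order.
function findByRfNewestFirst(docs, rf) {
  return docs
    .filter((d) => d.rf === rf)
    .sort((a, b) => (a._id < b._id ? 1 : a._id > b._id ? -1 : 0));
}

const newest = findByRfNewestFirst(log, "o-5556457634");
console.log(newest.map((d) => d.msg)); // [ 'third', 'second', 'first' ]
console.log(objectIdSeconds(newest[0]._id)); // 1593835524
```

The point of putting _id last in the compound index is that, for a single rf value, the index entries are already ordered by _id, so no in-memory sort (scanAndOrder) is needed.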

It looks like the query optimizer is doing the wrong thing when you add the sort.
Can you try adding .hint({rf :1}) to the query to see what happens?

Related

Firebase sorting not working

Am I missing something, or is Firebase just not sorting at all?
https://dinosaur-facts.firebaseio.com/dinosaurs.json?orderBy=%22height%22&print=pretty
{
"bruhathkayosaurus" : {
"appeared" : -70000000,
"height" : 25,
"length" : 44,
"order" : "saurischia",
"vanished" : -70000000,
"weight" : 135000
},
"lambeosaurus" : {
"appeared" : -76000000,
"height" : 2.1,
"length" : 12.5,
"order" : "ornithischia",
"vanished" : -75000000,
"weight" : 5000
},
"linhenykus" : {
"appeared" : -85000000,
"height" : 0.6,
"length" : 1,
"order" : "theropoda",
"vanished" : -75000000,
"weight" : 3
},
"pterodactyl" : {
"appeared" : -150000000,
"height" : 0.6,
"length" : 0.8,
"order" : "pterosauria",
"vanished" : -148500000,
"weight" : 2
},
"stegosaurus" : {
"appeared" : -155000000,
"height" : 4,
"length" : 9,
"order" : "ornithischia",
"vanished" : -150000000,
"weight" : 2500
},
"triceratops" : {
"appeared" : -68000000,
"height" : 3,
"length" : 8,
"order" : "ornithischia",
"vanished" : -66000000,
"weight" : 11000
}
}
The heights are not sorted; they come back in the order 25, 2.1, 0.6, 0.6, 4, 3.
The request returns the result as a JSON object, and a JSON object has no way to express ordering, so the orderBy parameter on its own cannot be used to order the results.
The orderBy parameter is meant to be combined with filtering parameters such as equalTo (or startAt, endAt, limitToFirst and limitToLast).
See also this answer.
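Since the ordering is lost once the response becomes a JSON object, the client has to sort it again after parsing. A minimal sketch, assuming the response body has already been parsed into an object (the sample above, trimmed to the height field); sortByChild is a hypothetical helper, and breaking ties by key is a choice made here for a stable result.

```javascript
// Parsed response from .../dinosaurs.json?orderBy="height", trimmed to the sort field.
const dinosaurs = {
  bruhathkayosaurus: { height: 25 },
  lambeosaurus: { height: 2.1 },
  linhenykus: { height: 0.6 },
  pterodactyl: { height: 0.6 },
  stegosaurus: { height: 4 },
  triceratops: { height: 3 },
};

// Turn the object into an array (which can carry order) and sort by the child value,
// breaking ties by key.
function sortByChild(obj, child) {
  return Object.entries(obj)
    .map(([key, value]) => ({ key, ...value }))
    .sort((a, b) => a[child] - b[child] || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

for (const d of sortByChild(dinosaurs, "height")) {
  console.log(`${d.key}: ${d.height}`);
}
```

The result is an array because, unlike an object, an array preserves the order you give it.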

Elastic Search Index Status

I am trying to set up a scripted reindex operation as suggested in: http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/
To follow the suggestion of creating a new index, aliasing it, and then deleting the old index, I need a way to tell when indexing into the new index is complete, ideally via the REST interface.
It has 80 million rows to index, which can take a few hours.
I can't find anything helpful in the docs.
You can try the _stats API: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-stats.html
For example (output truncated):
{
"_shards" : {
"total" : 10,
"successful" : 5,
"failed" : 0
},
"_all" : {
"primaries" : {
"docs" : {
"count" : 0,
"deleted" : 0
},
"store" : {
"size_in_bytes" : 575,
"throttle_time_in_millis" : 0
},
"indexing" : {
"index_total" : 0,
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time_in_millis" : 0,
"delete_current" : 0,
"noop_update_total" : 0,
"is_throttled" : false,
"throttle_time_in_millis" : 0
},
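I think you can check _all.total.docs.count (how many documents have been indexed) together with _all.total.indexing.index_current (how many indexing operations are still in progress).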
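A minimal sketch of that check, assuming the _stats response has already been fetched and parsed, and that you know how many documents the reindex should produce (80 million in the question). isReindexDone is a hypothetical helper. It reads the document count from primaries rather than total, since total also counts replica copies, and reads in-flight operations from total so replicas that are still indexing are included. Treat it as a heuristic, not a guarantee.

```javascript
// Rough "is the new index fully populated?" check against a parsed
// indices stats response (GET /<index>/_stats).
function isReindexDone(stats, expectedDocs) {
  const indexedDocs = stats._all.primaries.docs.count; // primaries only: no replica double-counting
  const inFlight = stats._all.total.indexing.index_current; // ops running on primaries and replicas
  return inFlight === 0 && indexedDocs >= expectedDocs;
}

// Hypothetical responses, trimmed to the fields used above.
const running = {
  _all: {
    primaries: { docs: { count: 41000000 } },
    total: { indexing: { index_current: 12 } },
  },
};
const finished = {
  _all: {
    primaries: { docs: { count: 80000000 } },
    total: { indexing: { index_current: 0 } },
  },
};

console.log(isReindexDone(running, 80000000)); // false
console.log(isReindexDone(finished, 80000000)); // true
```

A script could poll this every few minutes and switch the alias once it returns true.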

Mongodb query using $and operator does full scan while same query without doesn't

I'm relatively new to Mongodb and I'm having trouble understanding why a query with the $and operator seems to do a full scan, while the same query without it doesn't.
My documents look something like this:
{_id:"123456", labels : [{label:"beef", lang:"en"}, {...}],...}
I have a compound index on labels.label and labels.lang; however, explain shows very different results depending on whether I run
db.mycollection.find({ "labels" : { "$elemMatch" : { "label" : "beef" , "$and" : [ { "lang" : "en"}]}}}).explain()
{
"cursor" : "Complex Plan",
"n" : 4,
"nscannedObjects" : 0,
"nscanned" : 16701573,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 16701573,
"nYields" : 130540,
"nChunkSkips" : 0,
"millis" : 16283,
"filterSet" : false
}
or
db.mycollection.find({ "labels" : {$elemMatch: { label:"beef" ,"lang" : "en" }}}).explain()
{
"cursor" : "BtreeCursor labels.label_1_labels.lang_1",
"isMultiKey" : true,
"n" : 4,
"nscannedObjects" : 4,
"nscanned" : 4,
"nscannedObjectsAllPlans" : 4,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"labels.label" : [
[
"beef",
"beef"
]
],
"labels.lang" : [
[
"en",
"en"
]
]
},
"filterSet" : false
}
Can anybody help me understand why?
Thanks in advance!
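The second explain shows that the planner (in the version used here) can use the compound index when both conditions sit directly inside $elemMatch, while the $and form ends in a "Complex Plan" that scans every index entry. Until the cause is clear, one workaround is to normalize the query on the client before sending it. A minimal sketch, assuming the $and clauses are plain field conditions whose keys don't collide with each other or with sibling keys; flattenAnd and normalizeElemMatch are hypothetical helpers, not part of any driver.

```javascript
// Merge the clauses of an $and into their parent object, so that
// { label: "beef", $and: [{ lang: "en" }] } becomes { label: "beef", lang: "en" }.
// If any key would be duplicated, return the input unchanged, because merging
// would silently drop one of the conditions.
function flattenAnd(cond) {
  if (!Array.isArray(cond.$and)) return cond;
  const { $and, ...rest } = cond;
  const merged = { ...rest };
  for (const clause of $and) {
    for (const [key, value] of Object.entries(clause)) {
      if (Object.prototype.hasOwnProperty.call(merged, key)) return cond;
      merged[key] = value;
    }
  }
  return merged;
}

// Apply flattenAnd inside every top-level $elemMatch of a query document.
function normalizeElemMatch(query) {
  const out = {};
  for (const [field, cond] of Object.entries(query)) {
    out[field] =
      cond && typeof cond === "object" && cond.$elemMatch
        ? { ...cond, $elemMatch: flattenAnd(cond.$elemMatch) }
        : cond;
  }
  return out;
}

const slow = { labels: { $elemMatch: { label: "beef", $and: [{ lang: "en" }] } } };
console.log(JSON.stringify(normalizeElemMatch(slow)));
// {"labels":{"$elemMatch":{"label":"beef","lang":"en"}}}
```

The normalized query is the same shape as the second query above, the one that used the labels.label_1_labels.lang_1 index.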

How to read Verbose Output from MongoDB-explain(1)

I have the following verbose explain(1) output for a query, and my question is how to read it. In what order are the operations executed? Does it start with GEO_NEAR_2DSPHERE or with LIMIT? What does the field advanced mean?
Most importantly, where is this documented? I couldn't find it in the MongoDB manual. :(
Query:
db.nodesWays.find(
{
geo:{
$nearSphere:{
$geometry:{
type: "Point",
coordinates: [lon, lat]
}
}
},
"amenity":"restaurant"
},
{name:1}
).limit(10).explain(1)
The output:
{
"cursor" : "S2NearCursor",
"isMultiKey" : false,
"n" : 10,
"nscannedObjects" : 69582,
"nscanned" : 69582,
"nscannedObjectsAllPlans" : 69582,
"nscannedAllPlans" : 69582,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 543,
"nChunkSkips" : 0,
"millis" : 606,
"indexBounds" : {
},
"allPlans" : [
{
"cursor" : "S2NearCursor",
"isMultiKey" : false,
"n" : 10,
"nscannedObjects" : 69582,
"nscanned" : 69582,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
}
}
],
"server" : "DBTest:27017",
"filterSet" : false,
"stats" : {
"type" : "LIMIT",
"works" : 69582,
"yields" : 543,
"unyields" : 543,
"invalidates" : 0,
"advanced" : 10,
"needTime" : 69572,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "PROJECTION",
"works" : 69582,
"yields" : 543,
"unyields" : 543,
"invalidates" : 0,
"advanced" : 10,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 0,
"children" : [
{
"type" : "FETCH",
"works" : 69582,
"yields" : 543,
"unyields" : 543,
"invalidates" : 0,
"advanced" : 10,
"needTime" : 69572,
"needFetch" : 0,
"isEOF" : 0,
"alreadyHasObj" : 4028,
"forcedFetches" : 0,
"matchTested" : 10,
"children" : [
{
"type" : "GEO_NEAR_2DSPHERE",
"works" : 69582,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 4028,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 0,
"children" : [ ]
}
]
}
]
}
]
}
}
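Reading the stats tree from the innermost child outward, the sequence is:
GEO_NEAR_2DSPHERE -> scans the 2dsphere index (69582 entries) and passes 4028 candidates up to its parent.
FETCH -> retrieves the full documents, checks the remaining condition on amenity, and passes 10 matches up.
PROJECTION -> projects the documents so that only the required fields are returned.
LIMIT -> stops after 10 documents.
LIMIT wraps all the other stages because it is the last one applied: each stage pulls results from its child, and the root stage is what the query returns. The advanced field counts how many results a stage has passed up to its parent, and works counts how many times the stage was asked to do a unit of work.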
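To make the reading order concrete, here is a minimal sketch that walks the stats tree depth-first and lists the children before their parent, which is the order data flows through the stages. stagesInDataFlowOrder is a hypothetical helper, and the stats object is the one from the output above trimmed to type, works and advanced.

```javascript
// Walk an explain "stats" tree and list stages in the order data flows:
// leaves first (where index keys/documents originate), root last.
function stagesInDataFlowOrder(stage, out = []) {
  for (const child of stage.children || []) {
    stagesInDataFlowOrder(child, out);
  }
  out.push({ type: stage.type, works: stage.works, advanced: stage.advanced });
  return out;
}

// The "stats" section from the output above, trimmed to the fields used here.
const stats = {
  type: "LIMIT", works: 69582, advanced: 10,
  children: [{
    type: "PROJECTION", works: 69582, advanced: 10,
    children: [{
      type: "FETCH", works: 69582, advanced: 10,
      children: [{ type: "GEO_NEAR_2DSPHERE", works: 69582, advanced: 4028, children: [] }],
    }],
  }],
};

for (const s of stagesInDataFlowOrder(stats)) {
  console.log(`${s.type}: works=${s.works}, advanced=${s.advanced}`);
}
// GEO_NEAR_2DSPHERE: works=69582, advanced=4028
// FETCH: works=69582, advanced=10
// PROJECTION: works=69582, advanced=10
// LIMIT: works=69582, advanced=10
```

The drop in advanced from 4028 to 10 at the FETCH stage is where most of the work is thrown away, which is why indexing amenity as well may help.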
The query uses a 2dsphere index, reported only as S2NearCursor without its name. Besides the index scan, it also fetches whole documents to filter on amenity, so you may want to include amenity in the index as well.
BTW, it is a known bug in MongoDB that the index name is omitted when the S2NearCursor is used.
As for detailed documentation, I haven't found much either, but here are a few blog posts you can browse:
explain.explain() – Understanding Mongo Query Behavior
Speeding Up Queries: Understanding Query Plans
I especially recommend paying attention to the last paragraph of both blog posts: tune the query, generate the query plan, and try to explain the plan yourself. After a few rounds of this, you'll get a sense of how it works.
Happy explaining. : )

Why mongodb query, 2 conditions slower than 1 condition?

I have 2 queries:
// query 1
{
"site.$id": ObjectId("52d617b5d8c472274f00004f")
}
// query 2
{
"site.$id": ObjectId("52d617b5d8c472274f00004f"),
"category.$id": ObjectId("52d617c0d8c472274f000076")
}
Can someone please explain why the second query took 1s while the first is very fast?
This collection has about 300,000 entries.
Here is the explain output for query 1:
db.Product.find({ "site.$id": ObjectId("52d617b5d8c472274f00004f") }).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 21,
"nscannedObjects" : 279001,
"nscanned" : 279001,
"nscannedObjectsAllPlans" : 279001,
"nscannedAllPlans" : 279001,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 3,
"nChunkSkips" : 0,
"millis" : 545,
"indexBounds" : {
},
"server" : "ip-172-31-9-78:27017"
}
Here is the explain output for query 2:
db.Product.find({ "site.$id": ObjectId("52d617b5d8c472274f00004f"), "category.$id": ObjectId("52d617c0d8c472274f000076") }).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 279002,
"nscanned" : 279002,
"nscannedObjectsAllPlans" : 279002,
"nscannedAllPlans" : 279002,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 107,
"nChunkSkips" : 0,
"millis" : 1852,
"indexBounds" : {
},
"server" : "ip-172-31-9-78:27017"
}
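Both explains show a BasicCursor, i.e. a full collection scan of about 279,000 documents, so neither query uses an index. Try creating an index on those fields; it should be much faster: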
db.Product.ensureIndex( { "site.$id": 1, "category.$id": 1 } )
Also consider the order of the fields in the index relative to your queries. This page should help: http://docs.mongodb.org/manual/tutorial/create-queries-that-ensure-selectivity/
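A quick way to see that both queries are unindexed is to compare a few fields of the 2.x-style explain output shown above. A minimal sketch; scanReport is a hypothetical helper, and the field values are copied from the two explain results.

```javascript
// Summarize a 2.x-style explain() result: did it use an index, and how many
// documents were scanned for each document returned?
function scanReport(explain) {
  return {
    fullScan: explain.cursor === "BasicCursor", // BasicCursor means no index was used
    scannedPerResult: explain.n === 0 ? Infinity : explain.nscanned / explain.n,
  };
}

// Fields copied from the two explain outputs above.
const query1 = { cursor: "BasicCursor", n: 21, nscanned: 279001, millis: 545 };
const query2 = { cursor: "BasicCursor", n: 1, nscanned: 279002, millis: 1852 };

for (const [name, e] of [["query 1", query1], ["query 2", query2]]) {
  const r = scanReport(e);
  console.log(`${name}: fullScan=${r.fullScan}, scanned/result=${Math.round(r.scannedPerResult)}`);
}
// query 1: fullScan=true, scanned/result=13286
// query 2: fullScan=true, scanned/result=279002
```

After creating the compound index, you would expect the cursor to become a BtreeCursor and nscanned to drop close to n.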

Resources