Below is the code I am using:
"_source" : {
"name" : "hn name",
"user_id" : 553,
"email_id" : "ns#gmail.com",
"lres_id" : "",
"hres_id" : "hn image",
"followers" : 0,
"following" : 1,
"mentors" : 2,
"mentees" : 2,
"basic_info" : "hn developer",
"birth_date" : 1448451985397,
"charge_price" : 3000,
"org" : "mnc pvt ltd",
"located_in" : "Noidasec51 ",
"position" : "jjunior ava developer",
"requests" : 0,
"exp" : 5,
"video_bio_lres" : "hn test lres url",
"video_bio_hres" : "hn hres url",
"ratings" : [ {
"rating" : 1,
"ratedByUserId" : 777
}, {
"rating" : 1,
"ratedByUserId" : 555
} ],
"avg_rating" : 0.0,
"status" : 0,
"expertises" : [ 3345, 1234, 2345 ],
"blocked_users" : [ ]
}
In the code above, I want to delete only the rating with ratedByUserId 555, but somehow I am unable to do so.
How can I do it?
This works for me:
curl -XPOST 'localhost:9200/mentorz/users/555/_update' -d '{
  "script": "ctx._source.ratings.remove(ratings)",
  "params": {
    "ratings": {
      "rating": 1,
      "ratedByUserId": 555
    }
  }
}'
I have a collection of user documents, where each user can have an arbitrary set of properties. Each user is associated with an app document. Here is an example user:
{
"appId": "XXXXXXX",
"properties": [
{ "name": "age", "value": 30 },
{ "name": "gender", "value": "female" },
{ "name": "alive", "value": true }
]
}
I would like to be able to find/count users based on the values of their properties. For example, find me all users for app X that have property Y > 10 and Z equals true.
I have a compound, multikey index on this collection: db.users.ensureIndex({ "appId": 1, "properties.name": 1, "properties.value": 1 }). This index works well for single-condition queries, e.g.:
db.users.find({
appId: 'XXXXXX',
properties: {
$elemMatch: {
name: 'age',
value: {
$gt: 10
}
}
}
})
The above query completes in < 300 ms on a collection of 1M users. However, when I try to add a second condition, the performance degrades considerably (7-8 s), and the explain() output indicates that the whole index is being scanned to fulfill the query ("nscanned" : 2752228).
Query
db.users.find({
appId: 'XXXXXX',
properties: {
$all: [
{
$elemMatch: {
name: 'age',
value: {
$gt: 10
}
}
},
{
$elemMatch: {
name: 'alive',
value: true
}
}
]
}
})
Explain
{
"cursor" : "BtreeCursor appId_1_properties.name_1_properties.value_1",
"isMultiKey" : true,
"n" : 256,
"nscannedObjects" : 1000000,
"nscanned" : 2752228,
"nscannedObjectsAllPlans" : 1018802,
"nscannedAllPlans" : 2771030,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 21648,
"nChunkSkips" : 0,
"millis" : 7425,
"indexBounds" : {
"appId" : [
[
"XXXXX",
"XXXXX"
]
],
"properties.name" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"properties.value" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"filterSet" : false
}
I assume this is because Mongo is unable to create suitable bounds since I am looking for both boolean and integer values.
My question is this: Is there a better way to structure my data, or modify my query to improve performance and take better advantage of my index? Is it possible to instruct mongo to treat each condition separately, generate appropriate bounds, and then perform the intersection of the results, instead of scanning all documents? Or is mongo just not suited for this type of use case?
I know this is an old question, but I think it would be much better to structure your data without the "name" and "value" tags:
{
"appId": "XXXXXXX",
"properties": [
{ "age": 30 },
{ "gender: "female" },
{ "alive": true }
]
}
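With this shape each property name becomes a real key, so you can index the frequently queried properties directly and MongoDB can compute tight bounds for each condition. Below is a minimal sketch of what the indexes and the two-condition query from the question might look like against this structure; the collection and property names are just taken from the examples above, not a tested solution:

// hypothetical per-property indexes
db.users.ensureIndex({ "appId": 1, "properties.age": 1 })
db.users.ensureIndex({ "appId": 1, "properties.alive": 1 })

// the "age > 10 and alive = true" query, rewritten for this schema
db.users.find({
    appId: "XXXXXX",
    "properties.age": { $gt: 10 },
    "properties.alive": true
})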
Please observe:
MongoDB shell version: 2.4.1
connecting to: test
> use dummy
switched to db dummy
> db.invoices.find({'items.nameTags': /^z/}, {_id: 1}).explain()
{
"cursor" : "BtreeCursor items.nameTags_1_created_1_special_1__id_1_items.qty_1_items.total_1 multi",
"isMultiKey" : true,
"n" : 55849,
"nscannedObjects" : 223568,
"nscanned" : 223568,
"nscannedObjectsAllPlans" : 223568,
"nscannedAllPlans" : 223568,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 86,
"nChunkSkips" : 0,
"millis" : 88864,
"indexBounds" : {
"items.nameTags" : [
[
"z",
"{"
],
[
/^z/,
/^z/
]
],
"created" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"special" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"items.qty" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"items.total" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "IL-Mark-LT:27017"
}
>
Here is the definition of the index:
> db.system.indexes.find({name : 'items.nameTags_1_created_1_special_1__id_1_items.qty_1_items.total_1'}).pretty()
{
"v" : 1,
"key" : {
"items.nameTags" : 1,
"created" : 1,
"special" : 1,
"_id" : 1,
"items.qty" : 1,
"items.total" : 1
},
"ns" : "dummy.invoices",
"name" : "items.nameTags_1_created_1_special_1__id_1_items.qty_1_items.total_1"
}
>
Finally, here is an example invoice document (with just 2 items):
> db.invoices.findOne({itemCount: 2})
{
"_id" : "85923",
"customer" : "Wgtd Fm 91",
"businessNo" : "314227928",
"billTo_name" : "Wgtd Fm 91",
"billTo_addressLine1" : "3839 Ross Street",
"billTo_addressLine2" : "Kingston, ON",
"billTo_postalCode" : "K7L 4V4",
"purchaseOrderNo" : "boi",
"terms" : "COD",
"shipDate" : "2013-07-10",
"shipVia" : "Moses Transportation Inc.",
"rep" : "Snowhite",
"items" : [
{
"qty" : 4,
"name" : "CA 7789",
"desc" : "3 pc. Coffee Table set (Silver)",
"price" : 222.3,
"total" : 889.2,
"nameTags" : [
"ca 7789",
"a 7789",
" 7789",
"7789",
"789",
"89",
"9"
],
"descTags" : [
"3",
"pc",
"c",
"coffee",
"offee",
"ffee",
"fee",
"ee",
"e",
"table",
"able",
"ble",
"le",
"e",
"set",
"et",
"t",
"silver",
"ilver",
"lver",
"ver",
"er",
"r"
]
},
{
"qty" : 4,
"name" : "QP 8681",
"desc" : "Ottoman Bed",
"price" : 1179.1,
"total" : 4716.4,
"nameTags" : [
"qp 8681",
"p 8681",
" 8681",
"8681",
"681",
"81",
"1"
],
"descTags" : [
"ottoman",
"ttoman",
"toman",
"oman",
"man",
"an",
"n",
"bed",
"ed",
"d"
]
}
],
"itemCount" : 2,
"discount" : "10%",
"delivery" : 250,
"hstPercents" : 13,
"subTotal" : 5605.6,
"totalBeforeHST" : 5295.04,
"total" : 5983.4,
"totalDiscount" : 560.56,
"hst" : 688.36,
"modified" : "2012-10-08",
"created" : "2014-06-25",
"version" : 0
}
>
My problem is that MongoDB does not perform an index-only (covered) query, according to the explain() output above. Why? After all, I only request the _id field, which is part of the index.
In general, I feel that I am doing something very wrong. My invoices collection has 65,000 invoices with a total of 3,291,092 items, and it took almost 89 seconds to explain() the query.
What am I doing wrong?
You are using arrays and subdocuments. Covered indexes don't work with either of these.
From the mongo docs:
An index cannot cover a query if:
- any of the indexed fields in any of the documents in the collection includes an array. If an indexed field is an array, the index becomes a multi-key index and cannot support a covered query.
- any of the indexed fields are fields in subdocuments. To index fields in subdocuments, use dot notation.
http://docs.mongodb.org/manual/tutorial/create-indexes-to-support-queries/
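For contrast, here is a minimal sketch of a query that can be covered, assuming the indexed fields are plain top-level values (no arrays, no subdocuments); the collection and field names are made up for illustration:

db.orders.ensureIndex({ status: 1, total: 1 })

// Project only indexed fields and explicitly exclude _id, so the query
// can be answered from the index alone; explain() should then report
// indexOnly: true and nscannedObjects: 0.
db.orders.find({ status: "open" }, { _id: 0, total: 1 }).explain()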
I have a test collection with 5 million records:
for(var i = 0; i < 5000000; i++) { db.testcol.insert({field1: i}) }
I added an index:
db.testcol.ensureIndex({field1:1})
Now the funny thing:
mongos> db.testcol2.find({field1: {$gte: 0}},{field1:1,_id:0}).explain();
{
"cursor" : "BtreeCursor field1_1",
"isMultiKey" : false,
"n" : 5000000,
"nscannedObjects" : 0,
"nscanned" : 5000000,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 5000000,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 4675,
"indexBounds" : {
"field1" : [
[
0,
1.7976931348623157e+308
]
]
},
"server" : "jvangaalen-PC:27020",
"millis" : 4675
}
indexOnly is true and nscannedObjects is 0 (meaning it never had to look at the actual documents to examine them).
Now the same query, but with _id:1 (_id is not in the index so it has to look at the document too):
> db.testcol2.find({field1: {$gte: 0}},{field1:1,_id:1}).explain();
{
"cursor" : "BtreeCursor field1_1",
"isMultiKey" : false,
"n" : 5000000,
"nscannedObjects" : 5000000,
"nscanned" : 5000000,
"nscannedObjectsAllPlans" : 5000000,
"nscannedAllPlans" : 5000000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 5,
"nChunkSkips" : 0,
"millis" : 3742,
"indexBounds" : {
"field1" : [
[
0,
1.7976931348623157e+308
]
]
},
"server" : "jvangaalen-PC:27020",
"millis" : 3742
}
Response times dropped from 4.7 seconds to 3.7 seconds. indexOnly is false and nscannedObjects is 5000000 (all of them). This seems funny, because it apparently has to do more work for the same result set, yet it is still significantly faster.
How is this possible? The reason I am trying different things is that I cannot get nscannedObjects down to 0 when running an indexOnly: true query after sharding.
I am relatively new to MongoDB.
I set up a sharded MongoDB cluster with 2 replica sets, each set in a shard -> 4 mongod daemons.
The daemons are distributed across 2 Windows servers with 8 GB RAM each.
I have a test collection with 10 million documents (~600 bytes/doc) and use the C# driver to connect to the mongos (primaryPreferred).
Now if I run some thousands of single read queries on the shard key, I can see that mongo eats up more and more memory and stalls at around 7.2 GB. Almost no more page faults, and the queries are extremely fast. Good!
The same goes for more complex queries on different document properties
(a combined index for those queries exists).
BUT
If I execute just a couple of update queries, I get a huge drop in memory usage: mongo frees up about 3 GB of RAM in no time, and the previously fast read queries become very slow.
It gets worse if I launch about 500k upserts (Save) in a row.
A complex query that used to take about 2 seconds to run now takes 22 minutes.
I get the same behavior for count queries with the same query parameters.
Is that normal MongoDB behaviour, or is there something I missed in my setup?
--- UPDATE #hwatkins
MongoDB version: 2.2.2
1 document scanned for a single upsert
I have quite high disk activity during the bulk upsert.
explain() for a complex count query before the upsert:
Count Explain: { "clusteredType" : "ParallelSort", "shards" : { "set1/xxxx:1234,yyyy:1234" : [{ "cursor" : "BtreeCursor AC", "isMultiKey" : false, "n" : 20799, "nscannedObjects" : 292741, "nscanned" : 292741, "nscannedObjectsAllPlans" : 294290, "nscannedAllPlans" : 294290, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 2, "nChunkSkips" : 0, "millis" : 2382, "indexBounds" : { "f.14.b" : [["A", "A"]], "f.500.b" : [[10, 50]] }, "allPlans" : [{ "cursor" : "BtreeCursor AC", "n" : 20795, "nscannedObjects" : 292741, "nscanned" : 292741, "indexBounds" : { "f.14.b" : [["A", "A"]], "f.500.b" : [[10, 50]] } }, { "cursor" : "BasicCursor", "n" : 4, "nscannedObjects" : 1549, "nscanned" : 1549, "indexBounds" : { } }], "oldPlan" : { "cursor" : "BtreeCursor AC", "indexBounds" : { "f.14.b" : [["A", "A"]], "f.500.b" : [[10, 50]] } }, "server" : "xxxx:1234" }], "set2/xxxx:56789,yyyy:56789" : [{ "cursor" : "BtreeCursor AC", "isMultiKey" : false, "n" : 7000, "nscannedObjects" : 97692, "nscanned" : 97692, "nscannedObjectsAllPlans" : 98941, "nscannedAllPlans" : 98941, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 729, "indexBounds" : { "f.14.b" : [["A", "A"]], "f.500.b" : [[10, 50]] }, "allPlans" : [{ "cursor" : "BtreeCursor AC", "n" : 6996, "nscannedObjects" : 97692, "nscanned" : 97692, "indexBounds" : { "f.14.b" : [["A", "A"]], "f.500.b" : [[10, 50]] } }, { "cursor" : "BasicCursor", "n" : 4, "nscannedObjects" : 1249, "nscanned" : 1249, "indexBounds" : { } }], "oldPlan" : { "cursor" : "BtreeCursor AC", "indexBounds" : { "f.14.b" : [["A", "A"]], "f.500.b" : [[10, 50]] } }, "server" : "yyyy:56789" }] }, "cursor" : "BtreeCursor AC", "n" : 27799, "nChunkSkips" : 0, "nYields" : 2, "nscanned" : 390433, "nscannedAllPlans" : 393231, "nscannedObjects" : 390433, "nscannedObjectsAllPlans" : 393231, "millisShardTotal" : 3111, "millisShardAvg" : 1555, "numQueries" : 2, "numShards" : 2, "millis" : 2384 }
explain() after the upsert for the same query:
{ "clusteredType" : "ParallelSort", "shards" : { "set1/xxxx:1234,yyyy:1234" : [{ "cursor" : "BtreeCursor AC", "isMultiKey" : false, "n" : 20799, "nscannedObjects" : 292741, "nscanned" : 292741, "nscannedObjectsAllPlans" : 294290, "nscannedAllPlans" : 294290, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 379, "nChunkSkips" : 0, "millis" : 391470, "indexBounds" : { "f.14.b" : [["A", "A"]], "f.500.b" : [[10, 50]] }, "allPlans" : [{ "cursor" : "BtreeCursor AC", "n" : 20795, "nscannedObjects" : 292741, "nscanned" : 292741, "indexBounds" : { "f.14.b" : [["A", "A"]], "f.500.b" : [[10, 50]] } }, { "cursor" : "BasicCursor", "n" : 4, "nscannedObjects" : 1549, "nscanned" : 1549, "indexBounds" : { } }], "server" : "xxxx:1234" }], "set2/xxxx:56789,yyyy:56789" : [{ "cursor" : "BtreeCursor AC", "isMultiKey" : false, "n" : 7000, "nscannedObjects" : 97692, "nscanned" : 97692, "nscannedObjectsAllPlans" : 98941, "nscannedAllPlans" : 98941, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 910, "indexBounds" : { "f.14.b" : [["A", "A"]], "f.500.b" : [[10, 50]] }, "allPlans" : [{ "cursor" : "BtreeCursor AC", "n" : 6996, "nscannedObjects" : 97692, "nscanned" : 97692, "indexBounds" : { "f.14.b" : [["A", "A"]], "f.500.b" : [[10, 50]] } }, { "cursor" : "BasicCursor", "n" : 4, "nscannedObjects" : 1249, "nscanned" : 1249, "indexBounds" : { } }], "oldPlan" : { "cursor" : "BtreeCursor AC", "indexBounds" : { "f.14.b" : [["A", "A"]], "f.500.b" : [[10, 50]] } }, "server" : "yyyy:56789" }] }, "cursor" : "BtreeCursor AC", "n" : 27799, "nChunkSkips" : 0, "nYields" : 379, "nscanned" : 390433, "nscannedAllPlans" : 393231, "nscannedObjects" : 390433, "nscannedObjectsAllPlans" : 393231, "millisShardTotal" : 392380, "millisShardAvg" : 196190, "numQueries" : 2, "numShards" : 2, "millis" : 391486 }
Btw:
One single upsert (one affected doc) makes the memory usage drop by around 600 MB --> ~4.5 GB memory usage is reached only after some queries.
If I take the query from above and use the mongo cursor to loop over the result set, it just takes ages... (the query is still running as I type) :(
UPDATE II #Daniel
Here is a sample doc stored in the MongoDB cluster.
The shard key is the b property of my doc (it corresponds to a telephone number).
Upsert:
I look up existing docs by the shard key and update some properties of the f array. Then I call Save on the MongoDB driver for all those docs, one by one (around 500k times).
There is an index: { "f.14.b" : 1, "f.500.b" : 1 }
This index is used for the complex queries. As described above, those queries are fast before the bulk update and extremely slow after it.
{
"_id" : ObjectId("51248d6xxxxxxxxxxxxx"),
"b" : "33600000000",
"f" : {
"500" : {
"a" : ISODate("2013-02-20T08:45:38.075Z"),
"b" : 91
},
"14" : {
"a" : ISODate("2013-02-20T08:45:38.075Z"),
"b" : "A"
},
"1501" : {
"a" : ISODate("2013-02-20T08:45:38.141Z"),
"b" : ["X", "Y", "Z"]
},
"2000" : {
"a" : ISODate("2013-02-20T08:45:38.141Z"),
"b" : false
}
}
}
Thanks a lot,
Blume
This is interesting. It looks like, first of all, your data is not very evenly distributed. Your explain shows nscanned: 292741 on the first set and nscanned: 97692 on the second set. Pretty big difference. It also shows nYields: 379 on the first set and nYields: 0 on the second set. This implies that not only are you reading unevenly from your sets, you are probably writing unevenly to them as well. You will get more out of your cluster if you choose a shard key that has a more even distribution.
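If you want to check that, the mongo shell has helpers for looking at how data and chunks are spread over the shards; the collection name below is just a placeholder for yours:

// rough per-shard document and size distribution for one collection
db.yourCollection.getShardDistribution()

// chunk layout of the whole cluster
sh.status()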
As to why this is happening specifically with your upserts: are you adding more data to your existing documents? If so, you are probably a victim of document movement. In your MongoDB logs, do you see any queries with moved: 1? That means the slow query in the log also involved a document move on disk, which causes lots of havoc with indexes into arrays/subdocuments. I believe MongoDB essentially has to rebuild the index entries for the entire document if it moves, and has to do some heavy updating of all indexes into subdocuments/arrays.
The workaround for document movement is to preallocate extra data on the document at creation time and then immediately remove it. Mongo allocates every document a fixed amount of space plus a padding factor on disk. If documents outgrow their space, they must be moved on disk to a larger area. If you create your documents with extra data and then remove it, you give yourself a lot of extra padding on disk to accommodate document growth. This can certainly be a waste of space, but it will be a big performance saver.
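A minimal sketch of that preallocation trick in the shell; the filler field name and its size are arbitrary and would have to be tuned to how much your documents actually grow:

// insert the document with a throw-away filler so extra space is reserved on disk
db.yourCollection.insert({
    b: "33600000000",
    f: {},
    _padding: new Array(2048).join("x")
})

// immediately remove the filler; the record keeps its larger on-disk allocation
db.yourCollection.update({ b: "33600000000" }, { $unset: { _padding: 1 } })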
What version of MongoDB are you on?
When you do the upsert, can you do an .explain() on it to see how many documents it's scanning?
What does the disk IO look like during the upserts?