Having trouble with a slow MongoDB aggregation query - performance

I have the following aggregation query in MongoDB:
return mongoose.model('Submission')
.aggregate([
{ $match: { client: { $in: clientIds }, admin: this._admin._id } },
{ $sort: { client: 1, submitted: -1 } },
{ $group: {
_id: '$client',
lastSubmitted: { $first: '$submitted' },
timezone: { $first: '$timezone' },
} },
])
.exec();
which is performing really badly on a collection with about 2000 documents. It usually takes around 5 seconds to complete, and I've seen it take as long as 15 seconds. I have the following index on the submissions collection:
{
client : 1,
admin : 1,
assessment : 1,
submitted : -1,
}
I'm stuck as to why it's taking so long. Any suggestions?
EDIT
I've run the query
db.submissions.aggregate([
{$match: {
client: {$in: ['54a4cdfdd0666c243035dc98','55cc985291a0ffab6849de34']},
admin: '542b4af8880fc300007eb411'
}},
{$sort: {client:1, submitted: -1}},
{$group: {
_id: '$client',
lastSubmitted: {$first: '$submitted'},
timezone: {$first: '$timezone'}
}}
], {explain: true})
in the shell with explain and got
{
"stages" : [
{
"$cursor" : {
"query" : {
"client" : {
"$in" : [
"54a4cdfdd0666c243035dc98",
"55cc985291a0ffab6849de34"
]
},
"admin" : "542b4af8880fc300007eb411"
},
"fields" : {
"client" : 1,
"submitted" : 1,
"timezone" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "webdemo.submissions",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"admin" : {
"$eq" : "542b4af8880fc300007eb411"
}
},
{
"client" : {
"$in" : [
"54a4cdfdd0666c243035dc98",
"55cc985291a0ffab6849de34"
]
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"client" : 1,
"admin" : 1,
"assessment" : 1,
"submitted" : -1
},
"indexName" : "client_1_admin_1_assessment_1_submitted_-1",
"isMultiKey" : false,
"direction" : "forward",
"indexBounds" : {
"client" : [
"[\"54a4cdfdd0666c243035dc98\", \"54a4cdfdd0666c243035dc98\"]",
"[\"55cc985291a0ffab6849de34\", \"55cc985291a0ffab6849de34\"]"
],
"admin" : [
"[\"542b4af8880fc300007eb411\", \"542b4af8880fc300007eb411\"]"
],
"assessment" : [
"[MinKey, MaxKey]"
],
"submitted" : [
"[MaxKey, MinKey]"
]
}
}
},
"rejectedPlans" : [ ]
}
}
},
{
"$sort" : {
"sortKey" : {
"client" : 1,
"submitted" : -1
}
}
},
{
"$group" : {
"_id" : "$client",
"lastSubmitted" : {
"$first" : "$submitted"
},
"timezone" : {
"$first" : "$timezone"
}
}
}
],
"ok" : 1
}
EDIT 2
The output I get from db.submissions.getIndices() is
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "webdemo.submissions"
},
{
"v" : 1,
"key" : {
"client" : 1,
"admin" : 1,
"assessment" : 1,
"submitted" : -1
},
"name" : "client_1_admin_1_assessment_1_submitted_-1",
"ns" : "webdemo.submissions",
"background" : true
}
]
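One thing worth noting in the explain output above: the $sort shows up as its own pipeline stage rather than being handled by the index scan, and the index bounds leave assessment wide open ([MinKey, MaxKey]) between the equality fields and submitted. If the sort really is happening in memory, an index whose key order matches this query shape might help. A minimal sketch for the shell, assuming nothing else depends on the existing key order:

// Hedged sketch: equality fields first (admin, client), then the sort key,
// so the { client: 1, submitted: -1 } sort can come straight from the index.
db.submissions.createIndex({ admin: 1, client: 1, submitted: -1 })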

Related

Get document by min size of array in Mongodb

I have a mongo collection:
{
"_id" : 123,
"index" : "111",
"students" : [
{
"firstname" : "Mark",
"lastname" : "Smith"),
}
],
}
{
"_id" : 456,
"index" : "222",
"students" : [
{
"firstname" : "Mark",
"lastname" : "Smith"),
}
],
}
{
"_id" : 789,
"index" : "333",
"students" : [
{
"firstname" : "Neil",
"lastname" : "Smith"),
},
{
"firstname" : "Sofia",
"lastname" : "Smith"),
}
],
}
I want to get the document whose index is in a given set, for example givenSet = ["111","333"], and which has the smallest students array.
The result should be the first document, with _id: 123, because its index is in the givenSet and its students array has length 1, which is smaller than that of the third document.
I need to write a custom JSON @Query for a Spring Mongo repository. I am new to Mongo and a bit stuck on this problem.
I wrote something like this:
@Query("{'index':{$in : ?0}, length:{$size:$students}, $sort:{length:-1}, $limit:1}")
Department getByMinStudentsSize(Set<String> indexes);
And got the error message '$size needs a number'.
Should I just use .count() or something like that?
You should use the aggregation framework for this type of query:
1. Filter the results based on your condition.
2. Add a new field and assign the array size to it.
3. Sort on the new field.
4. Limit the result.
The solution should look something like this:
db.collection.aggregate([
{
"$match": {
index: {
"$in": [
"111",
"333"
]
}
}
},
{
"$addFields": {
"students_size": {
"$size": "$students"
}
}
},
{
"$sort": {
students_size: 1
}
},
{
"$limit": 1
}
])
working example: https://mongoplayground.net/p/ih4KqGg25i6
You are getting the issue because the second parameter should be enclosed in curly braces, and the second parameter is the projection:
@Query("{{'index':{$in : ?0}}, {length:{$size:'$students'}}, $sort:{length:1}, $limit:1}")
Department getByMinStudentsSize(Set<String> indexes);
Below is the MongoDB query:
db.collection.aggregate(
[
{
"$match" : {
"index" : {
"$in" : [
"111",
"333"
]
}
}
},
{
"$project" : {
"studentsSize" : {
"$size" : "$students"
},
"students" : 1.0
}
},
{
"$sort" : {
"studentsSize" : 1.0
}
},
{
"$limit" : 1.0
}
],
{
"allowDiskUse" : false
}
);

How to project a DBRef in a Spring MongoDB aggregation?

I have the following aggregation done in a MongoDB shell to get the number of alerts of each type for each user:
db.getCollection('alerts').aggregate(
{
$unwind:"$son"
},
{
$group:
{
_id:{
son: "$son",
level: "$level"
},
count: { $sum: 1 }
}
},
{
$group:
{
_id:{
son: "$_id.son"
},
alerts: { $addToSet: {
level: "$_id.level",
count: "$count"
}}
}
}
)
I have translated it to Spring Data MongoDB as follows:
TypedAggregation<AlertEntity> alertsAggregation =
Aggregation.newAggregation(AlertEntity.class,
unwind("$son"),
Aggregation.group("$son", "$level").count().as("count"),
Aggregation.group("$_id.son")
.addToSet(new BasicDBObject("level", "$_id.level").append("count", "$count")).as("alerts"));
// Aggregation.match(Criteria.where("_id").in(sonIds)
AggregationResults<AlertsBySonDTO> results = mongoTemplate.
aggregate(alertsAggregation, AlertsBySonDTO.class);
List<AlertsBySonDTO> alertsBySonResultsList = results.getMappedResults();
return alertsBySonResultsList;
What I am not clear about, and cannot get to work, is how to project the identifier and, if possible, the name of the user (the son field).
The resulting DTO is as follows:
public final class AlertsBySonDTO implements Serializable {
private static final long serialVersionUID = 1L;
@JsonProperty("identity")
private String id;
@JsonProperty("alerts")
private ArrayList<Map<String, String>> alerts;
}
but in the id property I get the entire embedded child entity.
This is the structure of the alerts collection, in JSON format:
{
"_id" : ObjectId("59e6ff3d9ef9d46a91112890"),
"_class" : "es.bisite.usal.bulltect.persistence.entity.AlertEntity",
"level" : "INFO",
"title" : "Alerta de Prueba",
"payload" : "Alerta de Prueba",
"create_at" : ISODate("2017-10-18T07:13:45.091Z"),
"delivery_mode" : "PUSH_NOTIFICATION",
"delivered" : false,
"parent" : {
"$ref" : "parents",
"$id" : ObjectId("59e6ff369ef9d46a91112878")
},
"son" : {
"$ref" : "children",
"$id" : ObjectId("59e6ff389ef9d46a9111287b")
}
}
/* 2 */
{
"_id" : ObjectId("59e6ff6d9ef9d46a91112892"),
"_class" : "es.bisite.usal.bulltect.persistence.entity.AlertEntity",
"level" : "WARNING",
"title" : "Token de acceso inv�lido.",
"payload" : "El token de acceso YOUTUBE no es v�lido",
"create_at" : ISODate("2017-10-18T07:14:53.449Z"),
"delivery_mode" : "PUSH_NOTIFICATION",
"delivered" : false,
"parent" : {
"$ref" : "parents",
"$id" : ObjectId("59e6ff369ef9d46a91112878")
},
"son" : {
"$ref" : "children",
"$id" : ObjectId("59e6ff389ef9d46a9111287b")
}
}
/* 3 */
{
"_id" : ObjectId("59e6ff6d9ef9d46a91112893"),
"_class" : "es.bisite.usal.bulltect.persistence.entity.AlertEntity",
"level" : "WARNING",
"title" : "Token de acceso inv�lido.",
"payload" : "El token de acceso INSTAGRAM no es v�lido",
"create_at" : ISODate("2017-10-18T07:14:53.468Z"),
"delivery_mode" : "PUSH_NOTIFICATION",
"delivered" : false,
"parent" : {
"$ref" : "parents",
"$id" : ObjectId("59e6ff369ef9d46a91112878")
},
"son" : {
"$ref" : "children",
"$id" : ObjectId("59e6ff389ef9d46a9111287c")
}
}
Does anyone know how I can approach this? Thanks in advance.
1. With MongoDB version 3.4
These are the collections I created to reproduce your use case:
Alerts Collection
{
"_id" : ObjectId("59e6ff3d9ef9d46a91112890"),
"_class" : "es.bisite.usal.bulltect.persistence.entity.AlertEntity",
"level" : "INFO",
"title" : "Alerta de Prueba",
"payload" : "Alerta de Prueba",
"create_at" : ISODate("2017-10-18T07:13:45.091+0000"),
"delivery_mode" : "PUSH_NOTIFICATION",
"delivered" : false,
"parent" : DBRef("parents", ObjectId("59e6ff369ef9d46a91112878")),
"son" : DBRef("children", ObjectId("59e72ff0572ae72d8c063666"))
}
{
"_id" : ObjectId("59e6ff6d9ef9d46a91112892"),
"_class" : "es.bisite.usal.bulltect.persistence.entity.AlertEntity",
"level" : "WARNING",
"title" : "Token de acceso inv�lido.",
"payload" : "El token de acceso YOUTUBE no es valido",
"create_at" : ISODate("2017-10-18T07:14:53.449+0000"),
"delivery_mode" : "PUSH_NOTIFICATION",
"delivered" : false,
"parent" : DBRef("parents", ObjectId("59e6ff369ef9d46a91112878")),
"son" : DBRef("children", ObjectId("59e72ff0572ae72d8c063666"))
}
{
"_id" : ObjectId("59e6ff6d9ef9d46a91112893"),
"_class" : "es.bisite.usal.bulltect.persistence.entity.AlertEntity",
"level" : "WARNING",
"title" : "Token de acceso inv�lido.",
"payload" : "El token de acceso INSTAGRAM no es v�lido",
"create_at" : ISODate("2017-10-18T07:14:53.468+0000"),
"delivery_mode" : "PUSH_NOTIFICATION",
"delivered" : false,
"parent" : DBRef("parents", ObjectId("59e6ff369ef9d46a91112878")),
"son" : DBRef("children", ObjectId("59e72ffb572ae72d8c063669"))
}
Notice I changed the ObjectIds of the son references to match the children collection I created.
Children collection
{
"_id" : ObjectId("59e72ff0572ae72d8c063666"),
"name" : "Bob"
}
{
"_id" : ObjectId("59e72ffb572ae72d8c063669"),
"name" : "Tim"
}
Since you are using a reference, you can't just access a field from the other collection, so I think you are missing some aggregation steps.
I did the following:
db.getCollection('alerts').aggregate(
{
$unwind:"$son"
},
{
$group:
{
_id:{
son: "$son",
level: "$level"
},
count: { $sum: 1 }
}
},
{
$group:
{
_id:{
son: "$_id.son"
},
alerts: { $addToSet: {
level: "$_id.level",
count: "$count"
}}
}
},
{ $addFields: { sonsArray: { $objectToArray: "$_id.son" } } },
{ $match: { "sonsArray.k": "$id"} },
{ $lookup: { from: "children", localField: "sonsArray.v", foreignField: "_id", as: "name" } }
)
And got the following results as json:
{
"_id" : {
"son" : DBRef("children", ObjectId("59e72ffb572ae72d8c063669"))
},
"alerts" : [
{
"level" : "WARNING",
"count" : NumberInt(1)
}
],
"sonsArray" : [
{
"k" : "$ref",
"v" : "children"
},
{
"k" : "$id",
"v" : ObjectId("59e72ffb572ae72d8c063669")
}
],
"name" : [
{
"_id" : ObjectId("59e72ffb572ae72d8c063669"),
"name" : "Tim"
}
]
}
{
"_id" : {
"son" : DBRef("children", ObjectId("59e72ff0572ae72d8c063666"))
},
"alerts" : [
{
"level" : "INFO",
"count" : NumberInt(1)
},
{
"level" : "WARNING",
"count" : NumberInt(1)
}
],
"sonsArray" : [
{
"k" : "$ref",
"v" : "children"
},
{
"k" : "$id",
"v" : ObjectId("59e72ff0572ae72d8c063666")
}
],
"name" : [
{
"_id" : ObjectId("59e72ff0572ae72d8c063666"),
"name" : "Bob"
}
]
}
If you want to get rid of the additionally created fields such as sonsArray, you can add a $project stage to clean up your result.
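A minimal sketch of such a final stage, assuming the field names from the output above (sonsArray disappears simply by not being listed, and the child's name is pulled out of the $lookup result array):

// Hedged sketch: keep only the grouped alerts and the looked-up child name.
{
  $project: {
    alerts: 1,
    name: { $arrayElemAt: [ "$name.name", 0 ] }
  }
}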
2. With an older version of MongoDB, if you can change your data structure
If instead of using a reference like this:
"son" : DBRef("children", ObjectId("59e72ffb572ae72d8c063669"))
you add the ObjectId(s) of the son(s) as an array like this:
"sonId" : [
ObjectId("59e72ff0572ae72d8c063666")
]
then you can do your aggregation as follows:
db.getCollection('alerts').aggregate(
{
$unwind:"$sonId"
},
{
$group:
{
_id:{
sonId: "$sonId",
level: "$level"
},
count: { $sum: 1 }
}
},
{
$group:
{
_id:{
sonId: "$_id.sonId"
},
alerts: { $addToSet: {
level: "$_id.level",
count: "$count"
}}
}
},
{ $lookup: { from: "children", localField: "_id.sonId", foreignField: "_id", as: "son" } }
)
Is that something you are looking for?

Count Documents Matching Multiple Array Criteria

Schema is:
{
"_id" : ObjectId("594b7e86f59ccd05bb8a90b5"),
"_class" : "com.notification.model.entity.Notification",
"notificationReferenceId" : "7917a5365ba246d1bb3664092c59032a",
"notificationReceivedAt" : ISODate("2017-06-22T08:23:34.382+0000"),
"sendTo" : [
{
"userReferenceId" : "check",
"mediumAndDestination" : [
{
"medium" : "API",
"status" : "UNREAD"
}
]
}
]
}
{
"_id" : ObjectId("594b8045f59ccd076dd86063"),
"_class" : "com.notification.model.entity.Notification",
"notificationReferenceId" : "6990329330294cbc950ef2b38f6d1a4f",
"notificationReceivedAt" : ISODate("2017-06-22T08:31:01.299+0000"),
"sendTo" : [
{
"userReferenceId" : "check",
"mediumAndDestination" : [
{
"medium" : "API",
"status" : "UNREAD"
}
]
}
]
}
{
"_id" : ObjectId("594b813ef59ccd076dd86064"),
"_class" : "com.notification.model.entity.Notification",
"notificationReferenceId" : "3c910cf5fcec42d6bfb78a9baa393efa",
"notificationReceivedAt" : ISODate("2017-06-22T08:35:10.474+0000"),
"sendTo" : [
{
"userReferenceId" : "check",
"mediumAndDestination" : [
{
"medium" : "API",
"status" : "UNREAD"
}
]
},
{
"userReferenceId" : "hello",
"mediumAndDestination" : [
{
"medium" : "API",
"status" : "READ"
}
]
}
]
}
I want to count a user's notifications based on statusList, which is a List. I used mongoOperations to make a query:
Query query = new Query();
query.addCriteria(Criteria.where("sendTo.userReferenceId").is(userReferenceId)
.andOperator(Criteria.where("sendTo.mediumAndDestination.status").in(statusList)));
long count = mongoOperations.count(query, Notification.class);
I realise I'm doing it wrong, because I get a count of 1 when I query for the user with reference ID hello and a statusList whose single element is UNREAD.
How do I perform an aggregated query on the array elements?
The query needs $elemMatch in order to actually match "within" the array element that matches both criteria:
Query query = new Query(Criteria.where("sendTo")
.elemMatch(
Criteria.where("userReferenceId").is("hello")
.and("mediumAndDestination.status").is("UNREAD")
));
Which essentially serializes to:
{
"sendTo": {
"$elemMatch": {
"userReferenceId": "hello",
"mediumAndDestination.status": "UNREAD"
}
}
}
Note that in your question there is no such document; the only array entry with "hello" actually has a "status" of "READ". If I supply those criteria instead:
{
"sendTo": {
"$elemMatch": {
"userReferenceId": "hello",
"mediumAndDestination.status": "READ"
}
}
}
Then I get the last document:
{
"_id" : ObjectId("594b813ef59ccd076dd86064"),
"_class" : "com.notification.model.entity.Notification",
"notificationReferenceId" : "3c910cf5fcec42d6bfb78a9baa393efa",
"notificationReceivedAt" : ISODate("2017-06-22T08:35:10.474Z"),
"sendTo" : [
{
"userReferenceId" : "check",
"mediumAndDestination" : [
{
"medium" : "API",
"status" : "UNREAD"
}
]
},
{
"userReferenceId" : "hello",
"mediumAndDestination" : [
{
"medium" : "API",
"status" : "READ"
}
]
}
]
}
But with "UNREAD" the count is actually 0 for this sample.

Datediff in Criteria operator in spring-data-mongodb not working

Can I require the difference between two dates to be greater than 0 in a Criteria operator in spring-data-mongodb? I wrote the query below:
Criteria c = Criteria.where("myDate").gte(startDate)
    .andOperator(Criteria.where("myDate").lte(endDate)
        .andOperator(Criteria.where("studentId").is(studentId)
            .andOperator(Criteria.where("currDate - myDate").gt(0))));
This query is not working. If possible, please help me get this query working with spring-data-mongodb.
Edit:
The MongoDB pipeline query is as follows:
{ "aggregate" : "__collection__" , "pipeline" : [ { "$match" : { "myDate" : { "$gte" : { "$date" : "2000-01-01T07:57:33.231Z"}} , "$and" : [ { "myDate" : { "$lte" : { "$date" : "2015-11-05T07:57:33.231Z"}} , "$and" : [ { "studentId" : "100" , "$and" : [ { "currDate - myDate" : { "$gt" : 0}}]}]}]}} , { "$project" : { "status" : 1}} , { "$group" : { "_id" : { "status" : "$status"} , "activeCount" : { "$sum" : 1}}}]}
Regards
Kris
For it to work, you'd essentially want to convert the current aggregation pipeline to this:
var pipeline = [
{
"$project" : {
"status" : 1,
"studentId" : 1,
"myDate" : 1,
"dateDifference": { "$subtract": [ new Date(), "$myDate" ] }
}
},
{
"$match" : {
"studentId": "100" ,
"myDate": {
"$gte": ISODate("2000-01-01T07:57:33.231Z"),
"$lte": ISODate("2015-11-05T07:57:33.231Z")
},
"dateDifference": { "$gt" : 0 }
}
},
{
"$group": {
"_id": "$status",
"activeCount": { "$sum" : 1 }
}
}
];
db.collection.aggregate(pipeline);
The Spring Data MongoDB equivalent follows:
Criteria dateCriteria = new Criteria().andOperator(Criteria.where("myDate").gte(startDate).lte(endDate),
Criteria.where("dateDifference").gt(0));
Aggregation agg = Aggregation.newAggregation(
project("id", "status", "studentId", "myDate")
.andExpression("currDate - myDate").as("dateDifference"),
//.and(currDate).minus("myDate").as("dateDifference"), <-- or use expressions
match(Criteria.where("studentId").is("100").andOperator(dateCriteria)),
group("status"),
.count().as("activeCount")
);

MongoDB FindAndModify Sorting

I am using FindAndModify in MongoDB from several concurrent processes. The collection has about 3 million entries, and everything works like a blast as long as I don't pass a sort option (on an indexed field). Once I try to do so, the following warning appears in the logs:
warning: ClientCursor::yield can't unlock b/c of recursive lock ns: test_db.wengine_queue top:
{
opid: 424210,
active: true,
lockType: "write",
waitingForLock: false,
secs_running: 0,
op: "query",
ns: "test_db",
query: {
findAndModify: "wengine_queue",
query: {
locked: { $ne: 1 },
rule_completed: { $in: [ "", "0", null ] },
execute_at: { $lt: 1324381363 },
company_id: 23,
debug: 0,
system_id: "AK/AK1201"
},
update: {
$set: { locked: 1 }
},
sort: {
execute_at: -1
}
},
client: "127.0.0.1:60873",
desc: "conn",
threadId: "0x1541bb000",
connectionId: 1147,
numYields: 0
}
I do have all the keys from the query indexed; here they are:
PRIMARY> db.wengine_queue.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test_db.wengine_queue",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"system_id" : 1,
"company_id" : 1,
"locked" : 1,
"rule_completed" : 1,
"execute_at" : -1,
"debug" : 1
},
"ns" : "test_db.wengine_queue",
"name" : "system_id_1_company_id_1_locked_1_rule_completed_1_execute_at_-1_debug_1"
},
{
"v" : 1,
"key" : {
"debug" : 1
},
"ns" : "test_db.wengine_queue",
"name" : "debug_1"
},
{
"v" : 1,
"key" : {
"system_id" : 1
},
"ns" : "test_db.wengine_queue",
"name" : "system_id_1"
},
{
"v" : 1,
"key" : {
"company_id" : 1
},
"ns" : "test_db.wengine_queue",
"name" : "company_id_1"
},
{
"v" : 1,
"key" : {
"locked" : 1
},
"ns" : "test_db.wengine_queue",
"name" : "locked_1"
},
{
"v" : 1,
"key" : {
"rule_completed" : 1
},
"ns" : "test_db.wengine_queue",
"name" : "rule_completed_1"
},
{
"v" : 1,
"key" : {
"execute_at" : -1
},
"ns" : "test_db.wengine_queue",
"name" : "execute_at_-1"
},
{
"v" : 1,
"key" : {
"thread_id" : 1
},
"ns" : "test_db.wengine_queue",
"name" : "thread_id_1"
},
{
"v" : 1,
"key" : {
"rule_id" : 1
},
"ns" : "test_db.wengine_queue",
"name" : "rule_id_1"
}
]
Is there any way around this?
For those interested: I had to create a separate index ending with the key that the result set is to be sorted by.
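A minimal sketch of what such an index could look like for the query above (the exact field order is an assumption; the point is that execute_at, the sort key, comes last in the compound key):

// Hedged sketch: the filter fields first, the sort key at the end.
db.wengine_queue.ensureIndex({
  system_id: 1,
  company_id: 1,
  locked: 1,
  rule_completed: 1,
  debug: 1,
  execute_at: -1
})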
That warning is thrown when an operation that wants to yield (such as long updates, removes, etc.) cannot do so because it cannot release the lock it's holding for whatever reason.
Do you have the field you're sorting on indexed? If not, adding an index for it will probably remove the warnings.
