MongoDB: Get count of inner array object with nested array element match - spring

I have a Mongo collection with survey answers submitted by each user. I would like to get the count of users who selected a given option. In the document below only one user has selected option O12, so the output should be 1.
{
"_id" : ObjectId("5ea179eb39ff117948f19266"),
"_class" : "model.survey.Answer",
"survey_id" : "5ea178c239ff117948f19265",
"survey_user" : [
{
"user_id" : 1072,
"user_option" : [
{
"question_id" : "Q1",
"option_id" : "O11"
},
{
"question_id" : "Q2",
"option_id" : "O21"
},
{
"question_id" : "Q3",
"option_id" : "O31"
},
{
"question_id" : "Q4",
"option_id" : "O41"
}
]
},
{
"user_id" : 1073,
"user_option" : [
{
"question_id" : "Q1",
"option_id" : "O12"
},
{
"question_id" : "Q2",
"option_id" : "O21"
},
{
"question_id" : "Q3",
"option_id" : "O31"
},
{
"question_id" : "Q4",
"option_id" : "O41"
}
]
}
]
}

You can do that with MongoDB's aggregation pipeline.
There are different ways to do it. One way is to use $unwind:
Type 1 - Query 1 :
db.collection.aggregate([
/** Optional, but useful on huge collections to reduce the data passed to later stages */
{
$match: { "survey_user.user_option.option_id": "O12" }
},
{
$unwind: "$survey_user"
},
/** After the `$unwind` stage, each element of the `survey_user` array gets its own document */
{
$match: { "survey_user.user_option.option_id": "O12" }
},
/** After match you'll only have objects which met the criteria in `survey_user` array */
/** group on `_id` & push entire original doc to data field */
{
$group: { _id: "$_id", survey_user: { $push: "$survey_user" }, data: { $first: "$$ROOT" } }
},
/** Add `survey_user` array to `data.survey_user` & its size to `data.optedCount` field */
{
$addFields: { "data.survey_user": "$survey_user", "data.optedCount": { $size: "$survey_user" } }
},
/** Make `data` as new root to doc */
{
$replaceRoot: { newRoot: "$data" }
}
])
Test : mongoplayground
If you only need the count and not the entire document, a minor change to the query above is enough:
Type 1 - Query 2 :
db.collection.aggregate([
{
$match: { "survey_user.user_option.option_id": "O12" }
},
{
$unwind: "$survey_user"
},
{
$match: { "survey_user.user_option.option_id": "O12" }
},
/** Just group on `_id` & count no.of docs, maintain `survey_id` */
{
$group: { _id: "$_id", optedCount: { $sum: 1 }, survey_id: { $first: "$survey_id" } }
}
])
Test : mongoplayground
Alternatively, use the array iterator $reduce, which may help if your collection is very large, since $unwind multiplies the number of documents flowing through the pipeline.
Type 2 - Query :
db.collection.aggregate([
{
$match: {
"survey_user.user_option.option_id": "O12",
},
},
/** Instead of `$addFields`, you can use `$project` to return only the fields you need, which can improve query performance */
{
$addFields: {
optedCount: {
$reduce: {
input: "$survey_user",
initialValue: 0,
in: {
$cond: [
{ $in: ["O12", "$$this.user_option.option_id"] },
{ $add: ["$$value", 1] },
"$$value",
]
}
}
}
}
}
]);
Test : mongoplayground
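To make the $reduce logic easier to follow, here is a minimal plain-Java sketch of the same counting step. It is not MongoDB or Spring code: the class name `OptedCountSketch`, the method names, and the trimmed sample data are assumptions made for illustration. The loop variable plays the role of `$$this` and the counter the role of `$$value`.

```java
import java.util.List;
import java.util.Map;

class OptedCountSketch {

    // Counts the survey_user entries whose user_option list contains the given option_id.
    @SuppressWarnings("unchecked")
    static int countUsersWithOption(List<Map<String, Object>> surveyUsers, String optionId) {
        int count = 0; // corresponds to initialValue / $$value
        for (Map<String, Object> user : surveyUsers) { // each user corresponds to $$this
            List<Map<String, Object>> options =
                    (List<Map<String, Object>>) user.get("user_option");
            boolean selected = options.stream()
                    .anyMatch(o -> optionId.equals(o.get("option_id")));
            if (selected) {
                count++; // same as { $add: ["$$value", 1] }
            }
        }
        return count;
    }

    // Trimmed version of the sample document's survey_user array (Q1 and Q2 only).
    static List<Map<String, Object>> sample() {
        return List.of(
                Map.of("user_id", 1072, "user_option", List.of(
                        Map.of("question_id", "Q1", "option_id", "O11"),
                        Map.of("question_id", "Q2", "option_id", "O21"))),
                Map.of("user_id", 1073, "user_option", List.of(
                        Map.of("question_id", "Q1", "option_id", "O12"),
                        Map.of("question_id", "Q2", "option_id", "O21"))));
    }

    public static void main(String[] args) {
        System.out.println(countUsersWithOption(sample(), "O12")); // prints 1
    }
}
```

Each user is counted at most once, even if the option appears in several answers, which matches the `$in` check in the pipeline.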

Related

Search in MongoDB with the condition to get only one result per attribute with the highest version

Excuse my newbie question but I can't figure it out.
This is my collection:
[
{
_id: "A",
uuid: "12345",
version: 1,
test: "data1"
},
{
_id: "B",
uuid: "56566",
version: 1,
test: "data2"
},
{
_id: "C",
uuid: "12345",
version: 2,
test: "data3"
}
]
I'm looking for a query with a UuidContains condition and with an exact condition.
findByUuidContains(5)
-> Result: [B,C] as Object Array
findByUuidContains(12345)
-> Result: [C] as Object Array
findByUuidContains(66)
-> Result: [B]
Is this kind of query possible?
In words:
Select all objects whose uuid contains ${value} and, from that result set, select only one per uuid: the one with the highest version.
EDIT1:
I changed the group projection from answer:
db.collection.aggregate([
{
"$redact": {
"$cond": [
{
"$gt": [
{
"$indexOfCP": [
{
"$toLower": "$uuid"
},
"5"
]
},
-1
]
},
"$$KEEP",
"$$PRUNE"
]
}
},
{
$sort: {
version: -1
}
},
{
$group: {
_id: {
uuid: "$uuid"
},
version: {
$first: "$version"
},
id: {
$first: "$_id"
},
test: {
$first: "$test"
}
}
},
{
"$project": {
num: "1",
id: 1,
_id: 0,
version: 1,
test: 1
}
},
{
"$group": {
"_id": "$num",
"result": {
"$addToSet": {
id: "$id",
version: "$version",
test: "$test"
}
}
}
},
{
"$project": {
_id: 0,
result: 1
}
}
])
and I added some test data attributes to my documents. Now I have to 'translate' it into the spring boot 'language'
EDIT2:
I'm currently trying to translate the second answer, but I can't figure out how the GroupOperation in Spring works. Is somebody familiar with it? The first and second operations work like the mongo query operations, but it fails at the group operation.
String uuidRegexExp = String.format(".*%s.*", uuidSegment);
Pattern uuidPattern = Pattern.compile(uuidRegexExp);
MatchOperation match = new MatchOperation(Criteria.where("uuid").regex(uuidPattern));
SortOperation sort = Aggregation.sort(Sort.Direction.DESC,"version");
GroupOperation grup = Aggregation.group("version").first("version").as("version");
Aggregation aggregate = Aggregation.newAggregation(
match, sort, grup
);
AggregationResults<Example> aggregate1 = mongoTemplate.aggregate(aggregate, Example.COLLECTION_NAME, Example.class);
aggregate1.getMappedResults().forEach(er -> log.info(er.toString()));
This is the example class:
@Data
@Document(Example.COLLECTION_NAME)
public class Example {
public static final String COLLECTION_NAME = "Example";
public static final String FIELD_UUID_NAME = "uuid";
public static final String FIELD_HOST_NAME = "host";
public static final String FIELD_URL_NAME = "url";
public static final String FIELD_VERSION_NAME = "version";
public static final String FIELD_ID_NAME = "_id";
@Field(FIELD_ID_NAME)
private ObjectId _id;
@Field(FIELD_UUID_NAME)
private String uuid;
@Field(FIELD_HOST_NAME)
private String host;
@Field(FIELD_URL_NAME)
private String url;
@Field(FIELD_VERSION_NAME)
private Long version;
}
EDIT3:
I think I have done it. Here is the Code in a not pretty version:
String uuidRegexExp = String.format(".*%s.*", uuidSegment);
Pattern uuidPattern = Pattern.compile(uuidRegexExp);
MatchOperation match = new MatchOperation(Criteria.where("uuid").regex(uuidPattern));
SortOperation sort = Aggregation.sort(Sort.Direction.DESC,"version");
GroupOperation grup = Aggregation.group("uuid").first("version").as("version").first("_id").as("id");
ProjectionOperation project = Aggregation.project().and("_id").as("uuid").and("version").as("version").and("id").as("_id");
Aggregation aggregate = Aggregation.newAggregation(
match, sort, grup,project
);
AggregationResults<Example> aggregate1 = mongoTemplate.aggregate(aggregate, SingleRawArticle.COLLECTION_NAME, Example.class);
Is this something you are looking for? I have created a Mongo playground for it. You can check the query by passing different parameters. I used 5 in the example, as shown below, but I also tried 12345 and 66 and the results look fine to me.
{
"$indexOfCP": [
{
"$toLower": "$uuid"
},
"5"
]
},
Mongo Playground
Here is the query :
db.collection.aggregate([
{
"$redact": {
"$cond": [
{
"$gt": [
{
"$indexOfCP": [
{
"$toLower": "$uuid"
},
"5"
]
},
-1
]
},
"$$KEEP",
"$$PRUNE"
]
}
},
{
$sort: {
version: -1
}
},
{
$group: {
_id: {
uuid: "$uuid"
},
version: {
$first: "$version"
},
id: {
$first: "$_id"
}
}
},
{
"$project": {
num: "1",
id: 1,
_id: 0
}
},
{
"$group": {
"_id": "$num",
"result": {
"$addToSet": "$id"
}
}
},
{
"$project": {
_id: 0,
result: 1
}
}
])
Check the query below to get the documents matching the given string. I have used a regex to match the input string.
db.collection.aggregate(
[
{
"$match" : {
"uuid" : {
"$regex" : ".*5.*"
}
}
},
{
"$sort" : {
"version" : -1.0
}
},
{
"$group" : {
"_id" : {
"uuid" : "$uuid"
},
"uuid" : {
"$first" : "$uuid"
},
"id" : {
"$first" : "$_id"
},
"version" : {
"$first" : "$version"
}
}
},
{
"$project" : {
"_id" : "$id",
"uuid" : 1.0,
"version" : 1.0
}
}
],
{
"allowDiskUse" : false
}
);
Output:
{
"uuid" : "12345",
"version" : 2.0,
"_id" : "C"
}
{
"uuid" : "56566",
"version" : 1.0,
"_id" : "B"
}
Java code equivalent to the query. I modified your edit according to the latest changes and changed the variable names to be more specific.
String uuidRegexExp = String.format(".*%s.*", uuidSegment);
MatchOperation match = new MatchOperation(Criteria.where("uuid").regex(Pattern.compile(uuidRegexExp)));
SortOperation sort = Aggregation.sort(Sort.Direction.DESC,"version");
GroupOperation group = Aggregation.group("uuid").first("version").as("version").first("_id").as("id").first("uuid").as("uuid");
ProjectionOperation project = Aggregation.project().and("uuid").as("uuid").and("version").as("version").and("id").as("_id");
Aggregation aggregation = Aggregation.newAggregation(
match, sort, group,project
);
AggregationResults<Example> aggregate = mongoTemplate.aggregate(aggregation, SingleRawArticle.COLLECTION_NAME, Example.class);
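For readers who want to check what the match, sort and group stages are expected to return, here is a small plain-Java sketch of the same selection done in memory. It does not use Spring or MongoDB. The `Doc` class, the method name `findByUuidContains`, and the sorted list of ids as the return value are assumptions for illustration only. Note that it matches the segment literally, whereas the regex version would interpret special characters in the segment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LatestPerUuidSketch {

    // Hypothetical stand-in for the stored document; only the fields used here.
    static final class Doc {
        final String id;
        final String uuid;
        final long version;

        Doc(String id, String uuid, long version) {
            this.id = id;
            this.uuid = uuid;
            this.version = version;
        }
    }

    // Keeps, per uuid containing the segment, the document with the highest version.
    static List<String> findByUuidContains(List<Doc> docs, String segment) {
        Map<String, Doc> latestByUuid = new HashMap<>();
        for (Doc d : docs) {
            if (!d.uuid.contains(segment)) {
                continue; // plays the role of the $match stage
            }
            // plays the role of $sort by version desc + $group with $first
            latestByUuid.merge(d.uuid, d, (a, b) -> a.version >= b.version ? a : b);
        }
        List<String> ids = new ArrayList<>();
        for (Doc d : latestByUuid.values()) {
            ids.add(d.id);
        }
        Collections.sort(ids); // deterministic order for printing
        return ids;
    }

    // The three documents from the question.
    static List<Doc> sample() {
        return List.of(
                new Doc("A", "12345", 1),
                new Doc("B", "56566", 1),
                new Doc("C", "12345", 2));
    }

    public static void main(String[] args) {
        System.out.println(findByUuidContains(sample(), "5"));     // prints [B, C]
        System.out.println(findByUuidContains(sample(), "12345")); // prints [C]
        System.out.println(findByUuidContains(sample(), "66"));    // prints [B]
    }
}
```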

Spring Data MongoDB building dynamic query

I need help building a dynamic MongoDB query.
Everything inside the "$or" array is dynamic.
db.group.find({
"version" : NumberLong(0),
"$or" : [{
"$and" : [
{
"object_type" : "D"
},
{
"type" : "R"
},
{
"name" : "1"
}
]
},{
"$and" : [
{
"object_type" : "D"
},
{
"type" : "E"
},
{
"name" : "2"
}
]
}
]
});
I wrote the Spring Data query below, but it doesn't work:
Criteria criteria = Criteria.where("version").is("123");
List<Criteria> docCriterias = new ArrayList<Criteria>();
groups.stream().forEach(grp -> {
docCriterias.add(Criteria.where("type").is(grp.get("type").toString())
.andOperator(Criteria.where("object_type").is(grp.get("objectType").toString()))
.andOperator(Criteria.where("name").is(grp.get("name").toString())));
});
criteria.orOperator((Criteria[]) docCriterias.toArray());
Query q = new Query(criteria);
Thanks for the help
You should pay attention to how you combine the operators.
The following code should work for you (note that this is Groovy; remember to change the closure into a Java lambda expression):
List<Criteria> docCriterias = new ArrayList<Criteria>();
List groups = [
[
type: "type1",
object_type: "object_type1",
name: "name1"
],
[
type: "type2",
object_type: "object_type2",
name: "name2"
],
[
type: "type3",
object_type: "object_type3",
name: "name3"
],
]
groups.stream().each {grp ->
docCriterias.add(new Criteria().andOperator(
Criteria.where("type").is(grp.get("type")),
Criteria.where("object_type").is(grp.get("object_type")),
Criteria.where("name").is(grp.get("name"))
))
};
Criteria criteria = new Criteria().andOperator(
Criteria.where("version").is("123"),
new Criteria().orOperator(docCriterias.toArray(new Criteria[docCriterias.size()]))
);
Query q = new Query(criteria);
Which will give you this query:
{
"$and":[
{
"version":"123"
},
{
"$or":[
{
"$and":[
{
"type":"type1"
},
{
"object_type":"object_type1"
},
{
"name":"name1"
}
]
},
{
"$and":[
{
"type":"type2"
},
{
"object_type":"object_type2"
},
{
"name":"name2"
}
]
},
{
"$and":[
{
"type":"type3"
},
{
"object_type":"object_type3"
},
{
"name":"name3"
}
]
}
]
}
]
},
Fields:{
},
Sort:{
}
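The nesting of the generated query (one $and wrapping the version check and a single $or, with one $and per group inside it) can be sketched in plain Java with nested maps. This is only an illustration of the shape, not Spring's Criteria API. The class name `DynamicOrQuerySketch` and the method `buildFilter` are made up for this example, and the empty-list guard reflects my understanding that MongoDB rejects an empty $or array.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DynamicOrQuerySketch {

    // Builds { $and: [ { version: v }, { $or: [ { $and: [ ... ] }, ... ] } ] } as nested maps.
    static Map<String, Object> buildFilter(Object version, List<Map<String, String>> groups) {
        if (groups.isEmpty()) {
            // Assumption: an empty $or array is not a valid filter, so fail early.
            throw new IllegalArgumentException("at least one group is required");
        }
        List<Map<String, Object>> orBranches = new ArrayList<>();
        for (Map<String, String> grp : groups) {
            // All three conditions of a group go into ONE $and, not chained andOperator calls.
            List<Map<String, Object>> andParts = new ArrayList<>();
            andParts.add(Map.of("type", grp.get("type")));
            andParts.add(Map.of("object_type", grp.get("object_type")));
            andParts.add(Map.of("name", grp.get("name")));
            orBranches.add(Map.of("$and", andParts));
        }
        Map<String, Object> versionPart = Map.of("version", version);
        Map<String, Object> orPart = Map.of("$or", orBranches);
        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("$and", List.of(versionPart, orPart));
        return filter;
    }

    public static void main(String[] args) {
        List<Map<String, String>> groups = List.of(
                Map.of("type", "R", "object_type", "D", "name", "1"),
                Map.of("type", "E", "object_type", "D", "name", "2"));
        System.out.println(buildFilter(0L, groups));
    }
}
```

The point of the sketch is the grouping: each dynamic group contributes exactly one branch to the $or list, which is what the single `andOperator(...)` call per group achieves in the Criteria version.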
You could achieve this by writing the MongoDB aggregation pipeline as JSON, using Apache Velocity to customize the query, and then executing it through db.runCommand via Spring's MongoTemplate.
Example:
monodb_client_dynamic_query.vm
{
"aggregate": "client",
"pipeline": [
{
"$match" : {
"$and" : [
{
"is_removed" : {
"$ne" : [
true
]
}
},
{
"errors" : {
"$size" : 0.0
}
},
{
"client_id": "$velocityMap.client_id"
}
]
}
},
{
"$project" : {
"_id" : -1.0,
"account" : "$_id.account",
"person_id" : "$_id.person_id",
"begin_date": { $dateToString: { format: "%Y-%m-%d", date: "$value.begin_date" } },
"end_date": { $dateToString: { format: "%Y-%m-%d", date: "$value.end_date" } }
}
}
]
}
Then execute using MongoTemplate:
String script = ...load from file the script monodb_client_dynamic_query.vm
Map parameters = ... put all variables to replace in the mongodb script
String scriptNoSql = VelocityUtil.loadTemplateVM(script, parameters);
DBObject dbObject = (BasicDBObject) JSON.parse(scriptNoSql);
if (null == dbObject) {
return;
}
DB db = mongoTemplate.getDb();
CommandResult result = db.command(dbObject);
if(!result.ok()) {
throw result.getException();
}

Spring Data REST: slow page summary

I have a finder defined as a Spring Data repository derived from MongoRepository which searches for 3 different attributes in MongoDB. All three have a single index.
public Page<Content> findByIdInOrAuthorUserNameInOrTagsIdIn(
@Param("ids") Collection ids,
@Param("userNames") Collection userName,
@Param("tagIds") Collection tagIds,
@Param("pageable") Pageable pageable);
The problem is that one attribute has a result set of about 2.5 million entries:
"page": {
"size": 20,
"totalElements": 2531397,
"totalPages": 126570,
"number": 5
}
So the query for a page is quite fast (13ms) as seen in the mongo log file:
2017-04-10T12:50:27.562+0200 I COMMAND [conn68] command content.content command: find { find: "content", filter: { $or: [ { $or: [ { _id: { $in: [ "..." ] } }, { author.userName: { $in: [ "...", "..." ] } } ] }, { tags._id: { $in: [ "..." ] } } ] }, skip: 100, limit: 20 } planSummary: IXSCAN { _id: 1 }, IXSCAN { tags._id: 1 }, IXSCAN { author.userName: 1 } keysExamined:120 docsExamined:120 cursorExhausted:1 numYields:0 nreturned:20 reslen:21185 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 13ms
But it seems that the page summary which counts the result takes ~117s:
2017-04-10T12:52:24.172+0200 I COMMAND [conn68] command content.content command: count { count: "content", query: { $or: [ { $or: [ { _id: { $in: [ "..." ] } }, { author.userName: { $in: [ "...", "..." ] } } ] }, { tags._id: { $in: [ "..." ] } } ] } } planSummary: IXSCAN { _id: 1 }, IXSCAN { tags._id: 1 }, IXSCAN { author.userName: 1 } keysExamined:2531466 docsExamined:2531397 numYields:21190 reslen:44 locks:{ Global: { acquireCount: { r: 42382 } }, Database: { acquireCount: { r: 21191 } }, Collection: { acquireCount: { r: 21191 } } } protocol:op_query 116592ms
Is there a way to switch off the page summary or speed up the counting somehow?
Use Slice instead of Page. It is very similar to Page, but doesn't need the total count of elements.
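In practice this would mean changing the finder's return type from Page<Content> to Slice<Content>. The reason a slice can skip the count is that it only needs to know whether a next chunk exists, which can be found by fetching one element more than the requested size. The plain-Java sketch below illustrates that idea on an in-memory list. `SliceSketch` and `SimpleSlice` are hypothetical names and this is not Spring's implementation.

```java
import java.util.ArrayList;
import java.util.List;

class SliceSketch {

    // Minimal stand-in for a slice: the content plus a "has next" flag, no total count.
    static final class SimpleSlice<T> {
        final List<T> content;
        final boolean hasNext;

        SimpleSlice(List<T> content, boolean hasNext) {
            this.content = content;
            this.hasNext = hasNext;
        }
    }

    // Simulates skip = pageNumber * pageSize and limit = pageSize + 1 on an in-memory list.
    static <T> SimpleSlice<T> slice(List<T> source, int pageNumber, int pageSize) {
        int from = Math.min(pageNumber * pageSize, source.size());
        int to = Math.min(from + pageSize + 1, source.size());
        List<T> fetched = source.subList(from, to);
        // The extra element is only used to detect a following chunk, then dropped.
        boolean hasNext = fetched.size() > pageSize;
        List<T> content = hasNext ? fetched.subList(0, pageSize) : fetched;
        return new SimpleSlice<>(new ArrayList<>(content), hasNext);
    }

    // Helper producing the integers 0..n-1 as test data.
    static List<Integer> range(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        SimpleSlice<Integer> s = slice(range(45), 2, 20);
        System.out.println(s.content.size() + " " + s.hasNext); // prints 5 false
    }
}
```

The trade-off is that a client can no longer show the total number of elements or pages, only "next" and "previous".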

Create Spring Data Aggregation Query with Projection of Nested Array

Here is what my document looks like:
{
"_id" : ObjectId("583cb6bcce047d1e68339b64"),
"variantDetails" : [
{
"variants" : {
"_" : "_"
},
"sku" : "069563-59690"
},
{
"variants" : {
"size" : "35"
},
"sku" : "069563-59690-35",
"barcode" : "809702246941"
},
{
"variants" : {
"size" : "36"
},
"sku" : "069563-59690-36",
"barcode" : "809702246958"
}
......
] }
And I would like to use a complex aggregation query like this:
db.getCollection('product').aggregate([
{ '$match': { 'variantDetails.sku': { '$in': ['069563-59690', '069563-59690-36', '069563-59690-37', '511534-01001'] } } },
{ '$project': {'_id': 1, 'variantDetails': 1, 'variantLength': { '$size': '$variantDetails' } } },
{ '$unwind': '$variantDetails' },
{ '$match': { 'variantDetails.sku': { '$in': ['069563-59690', '069563-59690-36', '069563-59690-37', '511534-01001'] } } },
{ '$match': { '$or': [
{'variantLength': { '$ne': 1 }, 'variantDetails.variants._': { '$ne': '_' } },
{'variantLength': 1 }
] } },
{ '$group': { '_id': '$_id', 'variantDetails': { '$push': '$variantDetails' } } },
{ '$project': {'_id': 1, 'variantDetails.sku': 1, 'variantDetails.barcode': 1} }
])
And here is my java code:
final Aggregation agg = Aggregation.newAggregation(
Aggregation.match(Criteria.where("variantDetails.sku").in(skus)),
Aggregation.project("_id", "variantDetails").and("variantDetails").project("size").as("variantLength"),
Aggregation.unwind("variantDetails"),
Aggregation.match(Criteria.where("variantDetails.sku").in(skus)),
Aggregation.match(new Criteria().orOperator(Criteria.where("variantLength").is(1), Criteria.where("variantLength").ne(1).and("variantDetails.variants._").is("_"))),
Aggregation.group("_id").push("variantDetails").as("variantDetails"),
Aggregation.project("_id", "variantDetails.sku", "variantDetails.barcode")
);
final AggregationResults<Product> result = this.mongo.aggregate(agg, this.mongo.getCollectionName(Product.class), Product.class);
return result.getMappedResults();
The problem is that Spring translates
Aggregation.project("_id", "variantDetails.sku", "variantDetails.barcode")
To
{ "$project" : { "_id" : 1 , "sku" : "$variantDetails.sku" , "barcode" : "$variantDetails.barcode"} }
But I'm expecting
{ '$project': {'_id': 1, 'variantDetails.sku': 1, 'variantDetails.barcode': 1} }
Could someone let me know how to make it right?
I had the same issue and this way works:
Aggregation.project("_id")
.andExpression("variantDetails.sku").as("variantDetails.sku")
.andExpression("variantDetails.barcode").as("variantDetails.barcode"));
The projection will be:
{'$project': {'_id': 1, 'variantDetails.sku': '$variantDetails.sku',
'variantDetails.barcode': '$variantDetails.barcode'} }
You just need to specify the label as an alias in the projection operation, because the default that Spring provides doesn't match. Use Spring version 1.8.5.
Aggregation.project("_id")
.and(context -> new BasicDBObject("$arrayElemAt", Arrays.asList("variantDetails.sku", 0))).as("variantDetails.sku")
.and(context -> new BasicDBObject("$arrayElemAt", Arrays.asList("variantDetails.barcode", 0))).as("variantDetails.barcode"));
This may be an old question, but I faced the same issue pointed out by Sean.
I found that if you want the expected result
{ '$project': {'_id': 1, 'variantDetails.sku': 1, 'variantDetails.barcode': 1} }
a solution can be:
Aggregation.project("_id")
.andExpression("1").as("variantDetails.sku")
.andExpression("1").as("variantDetails.barcode")
Virginia León's answer was the starting point for finding this solution

MongoDB scans entire index when using $all and $elemMatch

I have a collection of user documents, where each user can have an arbitrary set of properties. Each user is associated to an app document. Here is an example user:
{
"appId": "XXXXXXX",
"properties": [
{ "name": "age", "value": 30 },
{ "name": "gender", "value": "female" },
{ "name": "alive", "value": true }
]
}
I would like to be able to find/count users based on the values of their properties. For example, find me all users for app X that have property Y > 10 and Z equals true.
I have a compound, multikey index on this collection db.users.ensureIndex({ "appId": 1, "properties.name": 1, "properties.value": 1}). This index is working well for single condition queries, ex:
db.users.find({
appId: 'XXXXXX',
properties: {
$elemMatch: {
name: 'age',
value: {
$gt: 10
}
}
}
})
The above query completes in < 300ms with a collection of 1M users. However, when I try and add a second condition, the performance degrades considerably (7-8s), and the explain() output indicates that the whole index is being scanned to fulfill the query ("nscanned" : 2752228).
Query
db.users.find({
appId: 'XXXXXX',
properties: {
$all: [
{
$elemMatch: {
name: 'age',
value: {
$gt: 10
}
}
},
{
$elemMatch: {
name: 'alive',
value: true
}
}
]
}
})
Explain
{
"cursor" : "BtreeCursor appId_1_properties.name_1_properties.value_1",
"isMultiKey" : true,
"n" : 256,
"nscannedObjects" : 1000000,
"nscanned" : 2752228,
"nscannedObjectsAllPlans" : 1018802,
"nscannedAllPlans" : 2771030,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 21648,
"nChunkSkips" : 0,
"millis" : 7425,
"indexBounds" : {
"appId" : [
[
"XXXXX",
"XXXXX"
]
],
"properties.name" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"properties.value" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"filterSet" : false
}
I assume this is because Mongo is unable to create suitable bounds since I am looking for both boolean and integer values.
My question is this: Is there a better way to structure my data, or modify my query to improve performance and take better advantage of my index? Is it possible to instruct mongo to treat each condition separately, generate appropriate bounds, and then perform the intersection of the results, instead of scanning all documents? Or is mongo just not suited for this type of use case?
I know this is an old question, but I think it would be much better to structure your data without the "name" and "value" tags:
{
"appId": "XXXXXXX",
"properties": [
{ "age": 30 },
{ "gender": "female" },
{ "alive": true }
]
}
