Springdata mongodb aggregation match - spring

After asking question to understand a bit more of the aggregation framework in MongoDB I finally found the way to do aggregation for my need (thanks to a StackExchange user)
So basically here is a document from my collection:
{
"_id" : ObjectId("s4dcsd5s4d6c54s6d"),
"items" : [
{
type : "TYPE_1",
text : "blablabla"
},
{
type : "TYPE_2",
text : "blablabla"
},
{
type : "TYPE_3",
text : "blablabla"
},
{
type : "TYPE_1",
text : "blablabla"
},
{
type : "TYPE_2",
text : "blablabla"
},
{
type : "TYPE_1",
text : "blablabla"
}
]
}
The idea was to be able to filter only some elements of my collections (avoiding Type 2 and 3). In fact I have more than 30 types and 6 are not allowed but for simplicity I made this example.
So the aggregation command in command line is this one:
db.history.aggregate([{
$match: {
_id: ObjectId("s4dcsd5s4d6c54s6d")
}
}, {
$unwind: '$items'
}, {
$match: {
'items.type': { '$nin': [ "TYPE_2" , "TYPE_3"] }
}
},
{ $limit: 10 }
]);
With this I am able to retrieve the 10 elements items of this document which do not match TYPE_2 and TYPE_3
However when I am using spring data there is no output. I looked a bit at the example to build mine but its still not working.
So I did:
Aggregation aggregation = newAggregation(
match(Criteria.where("id").is(myID)),
unwind("items"),
match(Criteria.where("items.type").nin(ignoreditemstype)),
limit(3),
skip(offsetLong)
);
AggregationResults<PersonnalHistory> results = mongAccess.getOperation().aggregate(query,
"items", PersonnalHistory.class);
PersonnalHistory is marked with annotation #Document(collection = "history") and id with the #id annotation
ignoreditemstype is a list containing TYPE_2 and TYPE_3
Here is what I have in the toString method of aggregation:
{
"aggregate" : "__collection__" ,
"pipeline" : [
{ "$match": { "id" : "s4dcsd5s4d6c54s6d"} },
{ "$unwind": "$items"},
{ "$match": { "items.type": { "$nin" : [ "TYPE_2" , "TYPE_3" ] } } },
{ "$limit" : 3},
{ "$skip" : 0 }
]
}
I tried a lot of stuff (to have at least an answer :) ) like removing id or the nin:
aggregation = newAggregation(
unwind("items"),
match(Criteria.where("items.type").nin(ignoreditemstype)),
limit(3),
skip(offsetLong)
);
aggregation = newAggregation(
match(Criteria.where("id").is(myid)),
unwind("items")
);
For information when I do a simple query like:
query.addCriteria(Criteria.where("id").is(myID));
My document is returned. However I have thousands of items. So I just want to have the 15 first (in fact the 15 first are the 15 last added)
Do you maybe see what I am doing wrong?

Yeah looks like you are passing simple String while it is expecting ObjectId
Aggregation aggregation = newAggregation(
match(Criteria.where("_id").is(new ObjectId(myID))),
unwind("items"),
match(Criteria.where("items.type").nin(ignoreditemstype)),
limit(3),
skip(offsetLong)
);
Now the question is why it works with simple query, my answer would be because spring-data driver is not that mature at least not with aggregation pipeline.

Related

Run a subquery for each of the filtered elasticsearch documents

I have an index named employees with the following structure:
{
id: integer,
name: text,
age: integer,
cityId: integer,
resumeText: text <--------- parsed resume text
}
I want to search employees with certain criteria e.g having age > 40, resumeText contains a specific skill or employee belongs to a certain city etc, and have the following query for so far requirement:
{
query:{
bool:{
should:[
{
term:{
cityId:2990
},
{
match:{
resumeText:"marketing"
},
{
match:{
resumeText:"critical thinking"
}}}
],
filter:{
range:{
age:{
gte:40
}}}}}
}
This gives me expected results but i want to know also among the returned documents/employees which are the ones whose resumeText contains the mentioned skills. e.g in the response, I want to get documents having mentioned that this document had matched "critical thinking" , this employee had matched both the skills and this employee didn't match any skills (as it was returned based on other filters)
What changes do i need to do to get the desired results:
can aggregation help?
can we rum a script for EACH filtered document to compute desired result (sub query for each document)?
any other approach?
Yes, You can use aggregation.
Refer this
You can bucket like how many resumes are matching each skill you are looking for.
GET employees/_search
{
"size": 0,
"aggs" : {
"messages" : {
"filters" : {
"filters" : {
"marketing_resume_count" : { "match" : { "resumeText" : "marketing" }},
"thinking_resume_count" : { "match" : { "resumeText" : "thinking" }}
}
}
}
}
}
To extend to your use case:
You can add query section to the query as below
GET employees/_search
{
"size": 0,
"query":{
"match":{
"region":"AM"
}
},
"aggs" : {
"messages" : {
"filters" : {
"filters" : {
"marketing_resume_count" : { "match" : { "resumeText" : "marketing" }},
"thinking_resume_count" : { "match" : { "resumeText" : "thinking" }}
}
}
}
}
}
You can use range query to handle gte and let conditions. You can refer this for range query example. This can be used in place of query section.

How to search through nested array and retreive only matched elements with mongo and springdata [duplicate]

This question already has answers here:
Find in Double Nested Array MongoDB
(2 answers)
Spring data Match and Filter Nested Array
(1 answer)
Closed 3 years ago.
I'm looking to search into my collection and retreive only element who matched Criteria.
Here is my collection :
{
"_id" : "id",
"name" : "test",
"groupUsers" : [
{
"name" : "blabla",
"toys" : [
{
"createdAt" : ISODate("2019-10-30T12:59:41.409Z"),
},
{
"createdAt" : ISODate("2019-11-30T12:59:10.409Z"),
},
{
"createdAt" : ISODate("2019-12-30T12:59:12.409Z"),
}
],
"createdAt" : ISODate("2019-10-30T12:33:39.036Z")
},
{
"name" : "blabla2",
"toys" : [
{
"createdAt" : ISODate("2019-10-32T12:59:41.409Z"),
},
{
"createdAt" : ISODate("2019-11-30T12:59:56.409Z"),
},
{
"createdAt" : ISODate("2019-12-30T12:59:15.409Z"),
}
],
"createdAt" : ISODate("2019-10-32T12:33:39.036Z")
}
],
}
I want to retreive the whole collection but it depends when the user was added to the group for example, user blabla2 (in the example above) will only get the whole group but with only the two last toys of the first user in the response.
Anyway, I guess it's something really basic but I don't know why I can't figure it out.
What I'm Doing
I'm doing a first query to get the current user and get when he was added in the group (notice that the date gets converted into java Date Util here).
Aggregation groupAgg = newAggregation(match(Criteria.where("_id").is(groupId).and("groupUsers.userId").is(userId)));
GroupUser groupUser = mongoTemplate.aggregate(groupAgg, Group.class, GroupUser.class).getUniqueMappedResult();
In a second query, I want to get the whole document but only with the Criteria that I define before.
MatchOperation matchedGroup = match(new Criteria("_id").is(groupId));
MatchOperation matchedToys = match(
new Criteria("groupUsers.toys.createdAt").gte(groupUser.getCreatedAt()));
Aggregation aggregation = newAggregation(matchedGroup, matchedToys);
AggregationResults<Group> result = mongoTemplate.aggregate(aggregation, Group.class, Group.class);
Group group = result.getUniqueMappedResult();
This query doesn't work, and I'm looking to something like even if there is no match (for example, none toys has been created yet), it still return the group basic response and not null.
Maybe I need to unwind the nested array ?
Any help is appreciate. I'm using spring data.
Try this query
db.testers.aggregate([
{
$addFields:{
"groupUsers":{
$map:{
"input":"$groupUsers",
"as":"doc",
"in":{
$mergeObjects:[
"$$doc",
{
"toys":{
$filter:{
"input":"$$doc. toys",
"as":"sn",
"cond": {
"$and": [
{ "$gte": [ "$$sn.createdAt", ISODate('2015-06-17T10:03:46.000Z') ] },
]
}
}
}
}
]
}
}
}
}
}
]).pretty()

elasticsearch bulk delete by custom field values

I'm building app with elasticsearch (5.4) and everything was going well until I try to delete several documents by field values. My x-ndjson looks like this:
{ "delete" : {} }
{ "id" : "109991" }
{ "delete" : {} }
{ "id" : "109992" }
{ "delete" : {} }
{ "id" : "109993" }
<- empty line
and i am POSTing it on http://localhost:9200/someindex/sometype/_bulk, but it responds with "Malformed action/metadata line [2], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]".
Note that my "id" is my custom field, not the _id.
Is something missing in my request?
Thank you
I guess you need to use Delete By Query for this.
POST index/_delete_by_query
{
"query": {
"terms": {
"id": [
109991,
109992
]
}
}
}

Elasticsearch : adding temporary field to result with Groovy

I've a large configuration with queries and filters, which works fine.
Now, I'm adding a new script filter with Groovy, which works fine too:
doc['age'].value >= 18;
But I'm wondering how to do the following with Groovy:
Add a temporary boolean field to current document. See example below.
Example document in my result:
{
"name": "foo",
"age": 20
}
But I want to add the result of the script filter in my result, like so:
{
"name": "foo",
"age": 20,
"age_ok": true
}
age_ok is not indexed, but set by the Groovy filter.
AFAIK, You can't inject a script filter into the search result, but you can use scriptable field in order to inject scriptable data. You'll have to duplicate some part of the script.
From the documentation :
{
"query" : {
...
},
"script_fields" : {
"test1" : {
"script" : "doc['my_field_name'].value * 2"
},
"test2" : {
"script" : {
"inline": "doc['my_field_name'].value * factor",
"params" : {
"factor" : 2.0
}
}
}
}
}
See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html

Join/Merge Elasticsearch results

We have a documents with a (simplified) structure as shown here in Elasticsearch:
{ _id: ..., patientId: 4711, text: "blue" }
{ _id: ..., patientId: 4711, text: "red" }
{ _id: ..., patientId: 4712, text: "blue" }
{ _id: ..., patientId: 4712, text: "green" }
{ ... }
How can I create a query to find all documents containing the text
blue and red within the SAME patient.
In the above example I would expect a result set of two documents with patientId 4711 (contains blue and red).
Potential solution strategies might be :
Run two queries and "join" results afterward by application logic.
Run separate queries based on prior list of patients. Only feasible if number of potential patients are small.
Are there better ways (ideal one query) to handle this use case?
How about changing the way you store data into elastisearch.
Just store one document for one patient id, and keep text as array of all distinct colors assigned to that patient.
You can simply use bool query or bool filter
Example using bool filter
{
"filtered" : {
"query" : {
"match_all" : { }
},
"filter" : {
"bool" : {
"Must" : [
{
"term" : { "text" : "blue" }
},
{
"term" : { "text" : "red" }
}
]
}
}
}
}
Edit: misread the requirement:
You should be using field collapsing

Resources