Using Apache NiFi, I am trying to figure out how to find records that have a string in an array starting with a given value.
Given the array below, I would like only records that have a tag starting with '/test2/'.
[
{
"name" : "bob",
"tags" : [ "/test1/foo", "/alpha" ]
},
{
"name" : "bill",
"tags" : [ "/test2/blah", "/beta" ]
}
]
I tried:
SELECT * FROM FLOWFILE WHERE RPATH_STRING(tags, '/') LIKE '/test2/%'
but it fails with: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.nifi.serialization.record.Record
I've tried a few other permutations, but no luck.
Possible solution with 2 processors (ScriptedTransformRecord -> QueryRecord):
ScriptedTransformRecord (adds a new field tags_str by concatenating all of the elements in tags with the delimiter |)
Script Language: Groovy
Script Body:
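// join the tags array into one searchable string, e.g. "/test1/foo|/alpha";
// the trailing 'record' expression is the record this script emits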
record.setValue('tags_str', record.getValue('tags').join("|"))
record
Output (JSON):
[ {
"name" : "bob",
"tags" : [ "/test1/foo", "/alpha" ],
"tags_str" : "/test1/foo|/alpha"
}, {
"name" : "bill",
"tags" : [ "/test2/blah", "/beta" ],
"tags_str" : "/test2/blah|/beta"
} ]
QueryRecord (filter)
filter (dynamic property):
SELECT name, tags
FROM FLOWFILE
WHERE tags_str LIKE '%/test2/%'
output (JSON):
[ {
"name" : "bill",
"tags" : [ "/test2/blah", "/beta" ]
} ]
Note that LIKE '%/test2/%' also matches records where /test2/ appears mid-tag; for a strict starts-with match, anchor the pattern to the start of tags_str or to the | delimiter, e.g. WHERE tags_str LIKE '/test2/%' OR tags_str LIKE '%|/test2/%'.
I'm looking to search my collection and retrieve only the elements that match my Criteria.
Here is my collection:
{
"_id" : "id",
"name" : "test",
"groupUsers" : [
{
"name" : "blabla",
"toys" : [
{
"createdAt" : ISODate("2019-10-30T12:59:41.409Z"),
},
{
"createdAt" : ISODate("2019-11-30T12:59:10.409Z"),
},
{
"createdAt" : ISODate("2019-12-30T12:59:12.409Z"),
}
],
"createdAt" : ISODate("2019-10-30T12:33:39.036Z")
},
{
"name" : "blabla2",
"toys" : [
{
"createdAt" : ISODate("2019-10-32T12:59:41.409Z"),
},
{
"createdAt" : ISODate("2019-11-30T12:59:56.409Z"),
},
{
"createdAt" : ISODate("2019-12-30T12:59:15.409Z"),
}
],
"createdAt" : ISODate("2019-10-32T12:33:39.036Z")
}
],
}
I want to retrieve the whole document, but filtered by when the user was added to the group. For example, user blabla2 (in the example above) should get the whole group back, but with only the last two toys of the first user in the response.
Anyway, I guess it's something really basic, but I don't know why I can't figure it out.
What I'm Doing
I'm doing a first query to get the current user and find out when he was added to the group (notice that the date gets converted into a java.util.Date here).
Aggregation groupAgg = newAggregation(match(Criteria.where("_id").is(groupId).and("groupUsers.userId").is(userId)));
GroupUser groupUser = mongoTemplate.aggregate(groupAgg, Group.class, GroupUser.class).getUniqueMappedResult();
In a second query, I want to get the whole document, but restricted by the Criteria that I defined before.
MatchOperation matchedGroup = match(new Criteria("_id").is(groupId));
MatchOperation matchedToys = match(
new Criteria("groupUsers.toys.createdAt").gte(groupUser.getCreatedAt()));
Aggregation aggregation = newAggregation(matchedGroup, matchedToys);
AggregationResults<Group> result = mongoTemplate.aggregate(aggregation, Group.class, Group.class);
Group group = result.getUniqueMappedResult();
This query doesn't work. Also, even if there is no match (for example, no toys have been created yet), I would like it to still return the basic group response rather than null.
Maybe I need to unwind the nested array?
Any help is appreciated. I'm using Spring Data.
Try this query:
db.testers.aggregate([
{
$addFields:{
"groupUsers":{
$map:{
"input":"$groupUsers",
"as":"doc",
"in":{
$mergeObjects:[
"$$doc",
{
"toys":{
$filter:{
"input":"$$doc. toys",
"as":"sn",
"cond": {
"$and": [
{ "$gte": [ "$$sn.createdAt", ISODate('2015-06-17T10:03:46.000Z') ] },
]
}
}
}
}
]
}
}
}
}
}
]).pretty()
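On the Spring Data side, $map/$filter expressions like this aren't covered well by the typed builders, but you can drop down to a raw stage with a custom AggregationOperation. A sketch, assuming Spring Data MongoDB 2.x (where AggregationOperation can be written as a lambda returning an org.bson.Document) and reusing the groupUser fetched in your first query:
import static org.springframework.data.mongodb.core.aggregation.Aggregation.match;
import static org.springframework.data.mongodb.core.aggregation.Aggregation.newAggregation;

import java.util.Arrays;

import org.bson.Document;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationOperation;
import org.springframework.data.mongodb.core.query.Criteria;

// $filter: keep only toys created at/after the requesting user's join date
Document toysFilter = new Document("$filter", new Document()
        .append("input", "$$doc.toys")
        .append("as", "toy")
        .append("cond", new Document("$gte",
                Arrays.asList("$$toy.createdAt", groupUser.getCreatedAt()))));

// $map over groupUsers, merging each entry with its filtered toys array
Document mapUsers = new Document("$map", new Document()
        .append("input", "$groupUsers")
        .append("as", "doc")
        .append("in", new Document("$mergeObjects",
                Arrays.asList("$$doc", new Document("toys", toysFilter)))));

// raw $addFields stage wrapping the expression above
AggregationOperation keepRecentToys =
        context -> new Document("$addFields", new Document("groupUsers", mapUsers));

Aggregation aggregation = newAggregation(
        match(Criteria.where("_id").is(groupId)),
        keepRecentToys);

Group group = mongoTemplate.aggregate(aggregation, Group.class, Group.class)
        .getUniqueMappedResult();
Since the filtering happens inside $addFields rather than a $match, the group document comes back even when nothing passes the filter; the toys arrays are simply empty, which matches the fallback behaviour you were after.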
Context:
1) We are building a CDC pipeline (using Kafka and the Connect framework)
2) We are using Debezium for capturing MySQL transaction logs
3) We are using the Elasticsearch connector to add documents to an ES index
Sample change event generated by Debezium:
{
"source" : {
"before" : {
"Id" : 97,
"name" : "Northland",
"code" : "NTL",
"country_id" : 6,
"is_business_mapped" : 0
},
"after" : {
"Id" : 97,
"name" : "Northland",
"code" : "NTL",
"country_id" : 6,
"is_business_mapped" : 1
},
"source" : {
"version" : "0.7.5",
"name" : "__",
"server_id" : 252639387,
"ts_sec" : 1547805940,
"gtid" : null,
"file" : "mysql-bin-changelog.000570",
"pos" : 236,
"row" : 0,
"snapshot" : false,
"thread" : 614,
"db" : "bazaarify",
"table" : "state"
},
"op" : "u",
"ts_ms" : 1547805939683
}
}
What we want:
We want to visualize only 3 columns in Kibana:
1) before - containing the nested JSON as a string
2) after - containing the nested JSON as a string
3) source - containing the nested JSON as a string
I can think of the following possibilities here:
a) converting the nested JSON to a string
b) combining column data in Elasticsearch
I am a newbie to Elasticsearch. Can someone please guide me on how to do that?
I tried defining a custom mapping as well, but it gives me an exception.
You can always view your document as raw JSON in Kibana.
You don't need to manipulate it before indexing into Elasticsearch.
As this is related to visualization, handle it in Kibana only.
Check this link for a screenshot.
Refer to this to add the columns you want to see in the results.
I don't fully understand your use case, but if you would like to turn some JSONs into their string representations, you can use Logstash for that, or even Elasticsearch's ingest capabilities to convert an object (JSON) to a string.
From the link above, an example:
PUT _ingest/pipeline/my-pipeline-id
{
  "description": "converts the content of the source field to a string",
  "processors": [
    {
      "convert": {
        "field": "source",
        "type": "string"
      }
    }
  ]
}
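If you'd rather register that pipeline from Java than via the REST API directly, something along these lines should work; a sketch, assuming the Elasticsearch high-level REST client (6.5+/7.x) and an already initialized RestHighLevelClient named client:
import org.elasticsearch.action.ingest.PutPipelineRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.bytes.BytesArray;
import org.elasticsearch.common.xcontent.XContentType;

// same pipeline definition as above, registered through the Java client
String pipeline = "{"
        + "\"description\": \"converts the content of the source field to a string\","
        + "\"processors\": [ { \"convert\": { \"field\": \"source\", \"type\": \"string\" } } ]"
        + "}";

PutPipelineRequest request = new PutPipelineRequest(
        "my-pipeline-id", new BytesArray(pipeline), XContentType.JSON);
client.ingest().putPipeline(request, RequestOptions.DEFAULT);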
After asking a question to understand a bit more of the aggregation framework in MongoDB, I finally found the way to do the aggregation I needed (thanks to a Stack Exchange user).
So basically here is a document from my collection:
{
"_id" : ObjectId("s4dcsd5s4d6c54s6d"),
"items" : [
{
type : "TYPE_1",
text : "blablabla"
},
{
type : "TYPE_2",
text : "blablabla"
},
{
type : "TYPE_3",
text : "blablabla"
},
{
type : "TYPE_1",
text : "blablabla"
},
{
type : "TYPE_2",
text : "blablabla"
},
{
type : "TYPE_1",
text : "blablabla"
}
]
}
The idea is to be able to filter only some elements of my collection (avoiding TYPE_2 and TYPE_3). In fact I have more than 30 types and 6 are not allowed, but for simplicity I made this example.
So the aggregation command on the command line is this one:
db.history.aggregate([{
$match: {
_id: ObjectId("s4dcsd5s4d6c54s6d")
}
}, {
$unwind: '$items'
}, {
$match: {
'items.type': { '$nin': [ "TYPE_2" , "TYPE_3"] }
}
},
{ $limit: 10 }
]);
With this I am able to retrieve the 10 items of this document which do not match TYPE_2 or TYPE_3.
However, when I use Spring Data there is no output. I looked a bit at the examples to build mine, but it's still not working.
So I did:
Aggregation aggregation = newAggregation(
match(Criteria.where("id").is(myID)),
unwind("items"),
match(Criteria.where("items.type").nin(ignoreditemstype)),
limit(3),
skip(offsetLong)
);
AggregationResults<PersonnalHistory> results = mongAccess.getOperation().aggregate(
aggregation, "items", PersonnalHistory.class);
PersonnalHistory is marked with the annotation @Document(collection = "history") and the id with the @Id annotation.
ignoreditemstype is a list containing TYPE_2 and TYPE_3.
Here is what the toString method of the aggregation gives:
{
"aggregate" : "__collection__" ,
"pipeline" : [
{ "$match": { "id" : "s4dcsd5s4d6c54s6d"} },
{ "$unwind": "$items"},
{ "$match": { "items.type": { "$nin" : [ "TYPE_2" , "TYPE_3" ] } } },
{ "$limit" : 3},
{ "$skip" : 0 }
]
}
I tried a lot of things (to have at least an answer :) ), like removing the id or the nin:
aggregation = newAggregation(
unwind("items"),
match(Criteria.where("items.type").nin(ignoreditemstype)),
limit(3),
skip(offsetLong)
);
aggregation = newAggregation(
match(Criteria.where("id").is(myid)),
unwind("items")
);
For information, when I do a simple query like:
query.addCriteria(Criteria.where("id").is(myID));
my document is returned. However, I have thousands of items, so I just want the first 15 (in fact, the first 15 are the 15 last added).
Do you maybe see what I am doing wrong?
Yeah, it looks like you are passing a simple String while it is expecting an ObjectId:
Aggregation aggregation = newAggregation(
match(Criteria.where("_id").is(new ObjectId(myID))),
unwind("items"),
match(Criteria.where("items.type").nin(ignoreditemstype)),
limit(3),
skip(offsetLong)
);
Now the question is why it works with the simple query; my answer would be that the spring-data driver is not that mature, at least not with the aggregation pipeline.
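As a side note, new ObjectId(myID) throws an IllegalArgumentException if the string is not a valid 24-character hex id, so if myID comes from user input it is worth guarding the conversion first; a small sketch:
import org.bson.types.ObjectId;

// reject malformed ids before building the pipeline
if (!ObjectId.isValid(myID)) {
    throw new IllegalArgumentException("Not a valid ObjectId: " + myID);
}
ObjectId oid = new ObjectId(myID); // safe to pass to Criteria.where("_id").is(oid)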
I have an Elasticsearch use case where I need to update a doc.
My doc is something like this:
{
"first_name" : "firstName",
"last_name" : "lastName",
"version" : 1234,
"user_roles" : {
"version" : 12345,
"id" : 1234,
"name" : "role1"
},
"groups" : {
"version" : 123,
"list" : [
{ "id" : 123, "name" : "ashd" },
{ "id" : 1234, "name" : "awshd" }
]
}
}
Now, depending on some feed, I will be updating either the parent doc or the nested doc.
I am able to find out how to update basic attributes like first_name and last_name, but I can't work out how to update complex/nested ones.
I did something like this from the REST client:
"script": {
"inline": "ctx._source.user_roles = { "id" : 5678, "name" :"hcsdl"}
}
but it's giving me an exception.
Actual use case:
I will actually be getting a Map in Java.
The key can be a simple key like "first_name", or a complex key like "user_roles" or "groups".
I want to update the document using update-by-query on version.
The code I wrote is something like this:
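// Note: writeValueAsString emits JSON (e.g. {"id":5678,"name":"role1"}), which
// splices fine for plain strings and numbers, but JSON-style {...} object
// literals are not valid Painless/Groovy syntax, hence nested values fail.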
for (String key : document.keySet()) {
String value = defaultObjectMapper.writeValueAsString(document.get(key));
scriptBuilder.append("ctx._source.");
scriptBuilder.append(key);
scriptBuilder.append('=');
scriptBuilder.append(value);
scriptBuilder.append(";");
}
where document is the Map.
Now I might get simple fields to update, or a complex object.
I tried giving keys like user_roles.id and user_roles.name, and also tried giving the complete user_roles, but nothing is working.
Can someone help out?
Try this, with Groovy maps instead of verbatim JSON inside your script:
"script": {
"inline": "ctx._source.user_roles = [ 'id' : 5678, 'name' : 'hcsdl' ]"
}
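More generally, the quoting problem in the script-building loop disappears if you pass the value as a script parameter instead of splicing serialized JSON into the script source, since params handle nested maps natively. A sketch of that approach, assuming the Java high-level REST client (7.x) and Painless; the index name my-index, the version value, and the initialized client are assumptions:
import java.util.HashMap;
import java.util.Map;

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.UpdateByQueryRequest;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;

// the new nested value, passed as a parameter rather than inlined JSON
Map<String, Object> newRole = new HashMap<>();
newRole.put("id", 5678);
newRole.put("name", "hcsdl");

Map<String, Object> params = new HashMap<>();
params.put("val", newRole);

// assign the whole nested object in one statement
Script script = new Script(ScriptType.INLINE, "painless",
        "ctx._source.user_roles = params.val", params);

UpdateByQueryRequest request = new UpdateByQueryRequest("my-index");
request.setQuery(QueryBuilders.termQuery("version", 1234)); // select docs by version
request.setScript(script);
client.updateByQuery(request, RequestOptions.DEFAULT);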
Let me give an example here:
Two entries in the collection Author:
{
"name" : "Joe",
"Book" : "A"
},
{
"name" : "Joe",
"Book" : "B"
}
Now, if I use the aggregation function in Mongo via Spring Mongo, basically just to grab the books with the name Joe, it could be coded like this:
Aggregation agg = newAggregation(Map.class, group("name", "Book"));
AggregationResults<Map> results = mongoTemplate.aggregate(agg, "Author",
Map.class);
Obviously I get two Maps this way: one has the entry {"name" : "Joe", "Book" : "A"}, the other {"name" : "Joe", "Book" : "B"}.
But what if I want to get ONLY one result back, with one entry:
{"name" : "Joe", "Books" : ["A", "B"]}?
I'm not sure if that is reachable using just one query. It could certainly be achieved in multiple steps, which I'd hate to do.
You need to use the $addToSet operator in your $group pipeline stage. This returns an array of all the unique values ["A", "B"] that result from applying the $group expression to each document in a group of documents sharing the same group-by key "name". So in the mongo shell you have:
db.author.aggregate([
{ $group: {
_id: '$name',
Books: { $addToSet: '$Book' }
} }
]);
which brings back the desired result
{
"result" : [
{
"_id" : "Joe",
"Books" : [ "B", "A" ]
}
],
"ok" : 1
}
The equivalent Spring aggregation:
Aggregation agg = newAggregation(Map.class, group("name").addToSet("Book").as("Books"));
AggregationResults<Map> results = mongoTemplate.aggregate(agg, "Author", Map.class);
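Reading the grouped output back is then just a matter of iterating the mapped results, for instance:
// "_id" carries the group key (the author name); "Books" the collected titles
for (Map result : results.getMappedResults()) {
    System.out.println(result.get("_id") + " -> " + result.get("Books"));
}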