Monitor backpressure count and size in custom processor - apache-nifi

I have a custom processor (NiFi 1.8.0) that already modifies incoming flow files as needed. However, before transferring a file to the outgoing relationship, I would like to check whether that relationship's backpressure is close to exceeding its threshold. If it is, I plan to send the flow file to another relationship that connects to a PutFile processor, where it will be written to disk.
I know I can get the incoming queue count and size, but I can't figure out how to get the count and size from the outgoing relationship's connection.

There is a reporting task available called SiteToSiteStatusReportingTask, which essentially sends the status of every component in NiFi.
If you look at the data structure it returns, you can see it has a few very helpful attributes for detecting backpressure:
// fields for connections
{ "name" : "sourceId", "type" : ["string", "null"]},
{ "name" : "sourceName", "type" : ["string", "null"]},
{ "name" : "destinationId", "type" : ["string", "null"]},
{ "name" : "destinationName", "type" : ["string", "null"]},
{ "name" : "maxQueuedBytes", "type" : ["long", "null"]},
{ "name" : "maxQueuedCount", "type" : ["long", "null"]},
{ "name" : "queuedBytes", "type" : ["long", "null"]},
{ "name" : "backPressureBytesThreshold", "type" : ["long", "null"]},
{ "name" : "backPressureObjectThreshold", "type" : ["long", "null"]},
{ "name" : "backPressureDataSizeThreshold", "type" : ["string", "null"]},
{ "name" : "isBackPressureEnabled", "type" : ["string", "null"]},
You can use this information to derive what you need; see the SiteToSiteStatusReportingTask documentation for implementation details.

I ended up finding the connections from the ProcessGroupStatus object:
String myProcessorId = this.getIdentifier();
int queuedCount = 0;
long queuedBytes = 0;
ProcessGroupStatus processGroupStatus =
        ((EventAccess) getControllerServiceLookup()).getControllerStatus();
if (processGroupStatus.getConnectionStatus() != null) {
    Collection<ConnectionStatus> groupConnections = processGroupStatus.getConnectionStatus();
    // Iterate through groupConnections to find the one whose source ID matches
    // myProcessorId and whose name is 'normal output' (the name of a relationship I added)
    for (ConnectionStatus connection : groupConnections) {
        if (connection.getName().equals("normal output")
                && connection.getSourceId().equals(myProcessorId)) {
            // Now I can grab the current count and size of the 'normal output' relationship.
            // The backpressure threshold values can be grabbed from the connection as well.
            queuedCount = connection.getQueuedCount();
            queuedBytes = connection.getQueuedBytes();
            break;
        }
    }
}
The above only retrieves connections from the parent group. If the connection you're looking for is contained in a child group, you will need to iterate through the child groups:
ProcessGroupStatus processGroupStatus =
        ((EventAccess) getControllerServiceLookup()).getControllerStatus();
Collection<ProcessGroupStatus> childGroups = processGroupStatus.getProcessGroupStatus();
for (ProcessGroupStatus childProcessGroupStatus : childGroups) {
    Collection<ConnectionStatus> groupConnections = childProcessGroupStatus.getConnectionStatus();
    // Then iterate through groupConnections as above
}
The NiFi getControllerServiceLookup() does show an 'allConnections' variable, which contains all connections across all processors in all groups, but there doesn't appear to be a getter for it. If there were, you wouldn't have to worry about which group to search; you could simply iterate through 'allConnections' looking for the connection matching your processor ID and relationship name.
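Once you have the connection's queued count/size, deciding when to divert is simple arithmetic. A minimal sketch (the class name, method name, and the 0.9 ratio are my own choices, not NiFi API):

```java
public class BackPressureCheck {

    // Returns true when a queue is within 'ratio' of either backpressure
    // threshold (object count or data size); e.g. ratio = 0.9 means 90% full.
    static boolean nearBackPressure(long queuedCount, long queuedBytes,
                                    long objectThreshold, long bytesThreshold,
                                    double ratio) {
        return queuedCount >= objectThreshold * ratio
                || queuedBytes >= bytesThreshold * ratio;
    }
}
```

In onTrigger you would feed this connection.getQueuedCount(), connection.getQueuedBytes(), and the connection's backpressure thresholds (ConnectionStatus exposes the same fields shown in the schema above), then route to the spill relationship when it returns true.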

Related

How to create an EntityModel for type with nested resources using Spring HATEOAS?

I have an address entity model, EntityModel<Address>:
{
"id" : 10662,
"streetNumber" : 4823,
"steetName" : "Bakersfield",
"city" : "anytown",
"state" : "anystate",
"zip" : "12345",
"_links" : {
"self" : {
"href" : "http://dev:9001/api/addresses/10662"
},
"address" : {
"href" : "http://dev:9001/api/addresses/10662"
}
}
}
I have a Java client that gets this address and assigns it to a new student:
EntityModel<Address> address = client.getAddress(10662);
Student student = new Student();
student.setAddress(address.getContent());
//set other student props
I would like the client to then send a RESTful POST request to the server to save the student. But how can I create an EntityModel<Student> with a link to the address?
I know how to set links at the Student level:
EntityModel<Student> studentEntity = EntityModel.of(
student,
WebMvcLinkBuilder.linkTo(student.getId()).withSelfRel()
);
but I am not sure how to set a link to the student's address.

How to store nested document as String in elastic search

Context:
1) We are building a CDC pipeline (using Kafka and the Connect framework)
2) We are using Debezium for capturing MySQL transaction logs
3) We are using the Elasticsearch connector to add documents to an ES index
Sample change event generated by Debezium:
{
"source" : {
"before" : {
"Id" : 97,
"name" : "Northland",
"code" : "NTL",
"country_id" : 6,
"is_business_mapped" : 0
},
"after" : {
"Id" : 97,
"name" : "Northland",
"code" : "NTL",
"country_id" : 6,
"is_business_mapped" : 1
},
"source" : {
"version" : "0.7.5",
"name" : "__",
"server_id" : 252639387,
"ts_sec" : 1547805940,
"gtid" : null,
"file" : "mysql-bin-changelog.000570",
"pos" : 236,
"row" : 0,
"snapshot" : false,
"thread" : 614,
"db" : "bazaarify",
"table" : "state"
},
"op" : "u",
"ts_ms" : 1547805939683
}
}
What we want :
We want to visualize only 3 columns in kibana :
1) before - containing the nested JSON as string
2) after - containing the nested JSON as string
3) source - containing the nested JSON as string
I can think of the following possibilities here:
a) Converting the nested JSON to a string before indexing
b) Combining column data in Elasticsearch
I am a newbie to Elasticsearch. Can someone please guide me on how to do this?
I tried defining a custom mapping as well, but it gives me an exception.
You can always view your document as raw JSON in Kibana.
You don't need to manipulate it before indexing into Elasticsearch.
As this is related to visualization, handle it in Kibana only: in the Discover view you can add the columns you want to see to the results.
I don't fully understand your use case, but if you would like to turn some JSON objects into their string representations, you can use Logstash for that, or even Elasticsearch's ingest capabilities to convert an object (JSON) to a string.
For example:
PUT _ingest/pipeline/my-pipeline-id
{
  "description" : "converts the content of the source field to a string",
  "processors" : [
    {
      "convert" : {
        "field" : "source",
        "type" : "string"
      }
    }
  ]
}
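With the pipeline in place, you index documents through it and the source object arrives as a string (a sketch; the index name and document are illustrative):

```
PUT my_index/_doc/1?pipeline=my-pipeline-id
{
  "before" : { "Id" : 97, "is_business_mapped" : 0 },
  "after" : { "Id" : 97, "is_business_mapped" : 1 },
  "source" : { "db" : "bazaarify", "table" : "state" }
}
```

Since you want all three columns as strings, you would add two more convert processors for the before and after fields.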

Performance with nested data in a script field

I am wondering if there is a more performant way of performing a calculation on nested data in a script field, or of organizing the data. In the code below, the data will contain values for 50 states and/or other regions. Each user is tied to an area, so the script below will check that the average_value in their area is above a certain threshold and return a true/false value for each matching document.
Mapping
{
"mydata" : {
"properties" : {
...some fields,
"related" : {
"type" : "nested",
"properties" : {
"average_value" : {
"type" : "integer"
},
"state" : {
"type" : "string"
}
}
}
}
}
}
Script
"script_fields" : {
"inBudget" : {
"script" : {
"inline" : "_source.related.find { it.state == default_area && it.average_value >= min_amount } != null",
"params" : {
"min_amount" : 100,
"default_area" : "CA"
}
}
}
}
I have a working solution using the above method, but it slows my query down, and I am curious if there is a better solution. I have been toying with the idea of using an inner object with a key, like related_CA, and having each state's data in a separate object; however, for flexibility I would rather not have to pre-define each region in the mapping (as I may not have them all ahead of time). I feel like I might be missing a simpler/better way, and I am open to either reorganizing the data/mapping and/or changing the script.
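For reference, the related_CA idea mentioned above could be sketched like this (field names illustrative); each region becomes a plain integer field:

```
"mydata" : {
  "properties" : {
    "related_CA" : { "type" : "integer" },
    "related_NY" : { "type" : "integer" }
  }
}
```

The check then turns into an ordinary range query such as `{ "range" : { "related_CA" : { "gte" : 100 } } }` instead of a script, at the cost of the mapping flexibility concern noted above.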

Spring mongodb Find document if a single field matches in a list within a document

I have some data stored as
{
"_id" : ObjectId("abc"),
"_class" : "com.xxx.Team",
"name" : "Team 1",
"members" : [
{"userId" : 1, "email" : "a#x.com" },
{"userId" : 2, "email" : "b#x.com" },
]
}
{
"_id" : ObjectId("xyz"),
"_class" : "com.xxx.Team",
"name" : "Team 2",
"members" : [
{"userId" : 2, "email" : "b#x.com" },
{"userId" : 3, "email" : "c#x.com" }
]
}
I have 2 POJO classes Team (mapped to entire document),TeamMember (mapped to members inside a document).
Now I want to find which team a specific user belongs to. For example, if I search for a#x.com it should return the document for Team 1. Similarly, searching for b#x.com should return both documents, as it is in both.
As I am very new to Spring, I am not able to work out how to solve this.
Note: I am using MongoTemplate.
Something like this will do:
final QueryBuilder queryBuilder = QueryBuilder.start();
// queryBuilder.and("members.email").is("a#x.com") will work as well; try it out.
queryBuilder.and("members.email").in(Arrays.asList("a#x.com"));
final BasicDBObject projection = new BasicDBObject();
projection.put("fieldRequired", 1);
try (final DBCursor cursor = mongoTemplate.getCollection(collectionName)
        .find(queryBuilder.get(), projection)
        .batchSize(this.readBatchSize)) {
    while (cursor.hasNext()) {
        DBObject next = cursor.next();
        // read the fields using next.get("field")
    }
}
batchSize and the projection are not mandatory. Use a projection if you don't want to fetch the whole document; it lets you specify which fields of the document you want in the result.
You can use the code below with MongoTemplate:
Query findQuery = new Query();
Criteria findCriteria = Criteria.where("members.email").is("b#x.com");
findQuery.addCriteria(findCriteria);
List<Team> teams = mongoTemplate.find(findQuery, Team.class);

Mongo DB MapReduce: Emit key from array based on condition

I am new to MongoDB, so excuse me if this is rather trivial. I would really appreciate the help.
The idea is to generate a histogram over some specific values, in this case the MIME types of some files. For that I am using a map-reduce job.
I have a collection with documents in the following form:
{
"_id" : ObjectId("4fc5ed3e67960de6794dd21c"),
"name" : "some name",
"uid" : "some app specific uid",
"collection" : "some name",
"metadata" : [
{
"key" : "key1",
"value" : "Plain text",
"status" : "SINGLE_RESULT",
},
{
"key" : "key2",
"value" : "text/plain",
"status" : "SINGLE_RESULT",
},
{
"key" : "key3",
"value" : 3469,
"status" : "OK",
}
]
}
Please note that almost every document has more metadata key-value entries than this.
Map Reduce job
I tried doing the following:
function map() {
var mime = "";
this.metadata.forEach(function (m) {
if (m.key === "key2") {
mime = m.value;}
});
emit(mime, {count:1});
}
function reduce(key, values) {
    var res = {count: 0};
    values.forEach(function (v) { res.count += v.count; });
    return res;
}
db.collection.mapReduce(map, reduce, {out: { inline : 1}})
This seems to work for a small number of documents (~15K), but the problem is that iterating through all metadata key-value pairs takes a lot of time during the map phase. When running this on more documents (~1 million) the operation takes forever.
So my question is:
Is there some way I can emit the MIME type (the value) directly, instead of iterating through all keys and selecting it? Or is there a better way to write the map-reduce functions?
Something like emit (this.metadata.value {$where this.metadata.key:"key2"}) or similar...
Thanks for your help!
Two thoughts ...
First thought: How attached are you to this document schema? Could you instead have the metadata field be an embedded document rather than an embedded array, like so:
{
"_id" : ObjectId("4fc5ed3e67960de6794dd21c"),
"name" : "some name",
"uid" : "some app specific uid",
"collection" : "some name",
"metadata" : {
"key1" : {
"value" : "Plain text",
"status" : "SINGLE_RESULT"
},
"key2": {
"value" : "text/plain",
"status" : "SINGLE_RESULT"
},
"key3" : {
"value" : 3469,
"status" : "OK"
}
}
}
Then your map step does away with the loop entirely:
function map() {
emit( this.metadata["key2"].value, { count : 1 } );
}
At that point, you might even be able to cast this as a "group" command rather than a "mapReduce".
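With that embedded-document shape, the histogram can also be written in the mongo shell as an aggregation (a sketch; substitute your collection name), which avoids JavaScript execution entirely:

```
db.collection.aggregate([
    { "$group" : { "_id" : "$metadata.key2.value", "count" : { "$sum" : 1 } } }
])
```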
Second thought: Absent a schema change like that, particularly if "key2" appears early in the metadata array, you could at least exit the loop early once the key is found to save yourself some iterations. Note that you cannot break out of forEach, so use a plain for loop:
function map() {
    var mime = "";
    for (var i = 0; i < this.metadata.length; i++) {
        if (this.metadata[i].key === "key2") {
            mime = this.metadata[i].value;
            break;
        }
    }
    emit(mime, {count: 1});
}
Not sure if either path is the key to victory, but hopefully helpful thoughts. Best of luck!
