Issue with NiFi Rest API in fetching Remote Process Group details - spring

Using the NiFi REST API endpoint and code snippet below, I am fetching a list of Remote Process Groups (RPGs), iterating over them, and fetching each RPG's details. The problem is that I am getting inaccurate RPG data. If I hit the endpoint https://nifihost:8080/nifi-api/remote-process-groups/{id} directly, I receive accurate details. Please clarify:
Why is there a discrepancy between the results of these two endpoints?
https://nifihost:8080/nifi-api/process-groups/{id}/remote-process-groups
vs.
https://nifihost:8080/nifi-api/remote-process-groups/{id}
My requirement is to iterate over each Process Group, get the list of Remote Process Groups (RPGs) within it, and fetch each RPG's details. What is the right way to achieve this?
Endpoint:
https://nifihost:8080/nifi-api/process-groups/{id}/remote-process-groups
Source Code
ArrayList<NifiRemoteProcessGroup> remoteProcessGroupArrayList = new ArrayList<>();
String returnedJSON = "";
String remoteProcessGroupURL = getNifiURL() + "/nifi-api/process-groups/" + processGroup + "/remote-process-groups";
HttpEntity httpEntity = RestCall.oAuthHeaders(token);
RestTemplate restTemplate = new RestTemplate();
try {
    ResponseEntity<String> response = restTemplate.exchange(remoteProcessGroupURL, HttpMethod.GET, httpEntity, String.class);
    returnedJSON = response.getBody();
}
catch (Exception e) {
    logger.error("There was an error retrieving the remote-process-groups : " + e.getMessage());
}
try {
    ObjectMapper objectMapper = new ObjectMapper();
    JsonNode rootNode = objectMapper.readTree(returnedJSON);
    JsonNode processorNode = rootNode.path("remoteProcessGroups");
    Iterator<JsonNode> elements = processorNode.elements();
    while (elements.hasNext()) {
        JsonNode remoteProcessGroup = elements.next();
        JsonNode statusElement = remoteProcessGroup.path("status");
        JsonNode bulletinElement = remoteProcessGroup.path("bulletins");
        JsonNode componentElement = remoteProcessGroup.path("component");
        JsonNode aggregateSnapshot = statusElement.path("aggregateSnapshot");
        NifiRemoteProcessGroup remoteProcessGroupInstance = new NifiRemoteProcessGroup();
        remoteProcessGroupInstance.setRemoteProcessGroupId(checkExists(statusElement, "id"));
        remoteProcessGroupInstance.setRemoteProcessGroupName(checkExists(componentElement, "name"));
        remoteProcessGroupInstance.setRemoteProcessGroupGroupId(checkExists(statusElement, "groupId"));
        remoteProcessGroupInstance.setRemoteProcessGroupTargetURL(checkExists(componentElement, "targetUri"));
        remoteProcessGroupInstance.setRemoteProcessGroupBulletins(bulletinElement.asText());
        remoteProcessGroupInstance.setRemoteProcessGroupTransmitting(Boolean.valueOf(checkExists(componentElement, "transmitting")));
        remoteProcessGroupInstance.setRemoteProcessGroupTransmissionStatus(checkExists(statusElement, "transmissionStatus"));
        remoteProcessGroupInstance.setRemoteProcessGroupActiveThreadCount(Double.valueOf(checkExists(aggregateSnapshot, "activeThreadCount")));
        remoteProcessGroupInstance.setRemoteProcessGroupFlowFilesReceived(Double.valueOf(checkExists(aggregateSnapshot, "flowFilesReceived")));
        remoteProcessGroupInstance.setRemoteProcessGroupBytesReceived(Double.valueOf(checkExists(aggregateSnapshot, "bytesReceived")));
        remoteProcessGroupInstance.setRemoteProcessGroupReceived(checkExists(aggregateSnapshot, "received"));
        remoteProcessGroupArrayList.add(remoteProcessGroupInstance);
    }
}
catch (Exception e) {
    logger.info("There was an error creating the list of remote process groups: " + e.getMessage());
}

'process-groups/{id}/remote-process-groups' is part of the ProcessGroups API and returns a RemoteProcessGroupsEntity, which contains a listing of the Remote Process Groups bound to the Process Group whose ID you submit.
'remote-process-groups/{id}' is part of the RemoteProcessGroups API and fetches the exact RemoteProcessGroupEntity (note the lack of plural) requested.
I maintain the Python client for NiFi (nipyapi); given the outcome you are seeking, I suggest you try:
import nipyapi
nipyapi.utils.set_endpoint('http://localhost:8080/nifi')
rpg_info = [nipyapi.canvas.get_remote_process_group(rpg.id) for rpg in nipyapi.canvas.list_all_remote_process_groups('root', True)]
The RPG info returned will give you the parent Process Group ID under .component.parent_group_id, allowing you to reconstruct the tree, and you should find it much more performant than fetching each one individually.
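If you would rather stay with the Java/RestTemplate approach from the question, a minimal sketch of the same two-step pattern (list the RPG IDs under a process group, then fetch each full entity individually) could look like the following. It reuses the question's getNifiURL(), token, processGroup and RestCall.oAuthHeaders() helpers and assumes Jackson is on the classpath; the second call returns the same payload as hitting /remote-process-groups/{id} directly.
HttpEntity httpEntity = RestCall.oAuthHeaders(token);
RestTemplate restTemplate = new RestTemplate();
ObjectMapper objectMapper = new ObjectMapper();
List<String> rpgDetails = new ArrayList<>();
try {
    // Step 1: list the remote process groups under this process group (summary listing)
    String listUrl = getNifiURL() + "/nifi-api/process-groups/" + processGroup + "/remote-process-groups";
    ResponseEntity<String> listResponse = restTemplate.exchange(listUrl, HttpMethod.GET, httpEntity, String.class);
    JsonNode rpgList = objectMapper.readTree(listResponse.getBody()).path("remoteProcessGroups");
    for (JsonNode rpg : rpgList) {
        String rpgId = rpg.path("id").asText();
        // Step 2: fetch the full RemoteProcessGroupEntity for each id
        String detailUrl = getNifiURL() + "/nifi-api/remote-process-groups/" + rpgId;
        ResponseEntity<String> detailResponse = restTemplate.exchange(detailUrl, HttpMethod.GET, httpEntity, String.class);
        rpgDetails.add(detailResponse.getBody());
    }
} catch (Exception e) {
    logger.error("Error fetching remote process group details: " + e.getMessage());
}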

Related

How to update tag value for exported metric in micrometer?

I'm using Micrometer to export a summary of third-party API consumption.
Now I want to precisely count failed requests and export each failed request's ID.
I invoke the method below for each restTemplate exchange call.
private DistributionSummary incFailedCounter(String requestId) {
    this.registry = beanProvider.getRegistry();
    DistributionSummary summary = summarys.get(myCounter);
    if (summary == null) {
        Builder tags = DistributionSummary.builder("failed.test").tags("req_id", requestId, "count", "1");
        summary = tags.register(registry);
        summarys.put(myCounter, summary);
    } else {
        String tag = summary.getId().getTag("req_id");
        String[] split = tag.split(",");
        summary.close();
        summarys.put(myCounter,
                DistributionSummary.builder("failed.test")
                        .tags("req_id", tag + ", " + requestId, "count", String.valueOf(split.length + 1))
                        .register(registry));
    }
    return summary;
}
This code inserts a new line into the metric for each request.
failed_test_count{count="1",instance="localhost:8080",job="monitor-app",req_id="1157408321"}
failed_test_count{count="2",instance="localhost:8080",job="monitor-app",req_id="1157408321, 1157408321"}
failed_test_count{count="3",instance="localhost:8080",job="monitor-app",req_id="1157408321, 1157408321, 1157408321"}
The problem is that the metric grows with every request.
Is there a way to remove or replace the same tag and export only one metric with updated req_ids?
You cannot remove or update tags, because they are immutable. One way is to unregister the current meter. I used the method below to remove the registered meter and then registered a new one.
registry.remove(summary.getId());
This produces one line metric.
failed_test_count{count="4",instance="localhost:8080",job="monitor-app",req_id="1157408321, 58500184, 58500184, 58500184"}

How to read and write files in a reactive way using InputStream and OutputStream

I am trying to read an Excel file, manipulate it or add new data to it, and write it back out. I am also trying to make this a completely reactive process using Flux and Mono. The idea is to return the resulting file or byte array via a web service.
My question is: how do I get an InputStream and an OutputStream in a non-blocking way?
I am using the Apache Poi library to read and generate the Excel File.
I currently have a solution based on a mix of Mono.fromCallable() and blocking code to get the InputStream.
For example the webservice part is as follows.
@GetMapping(value = API_BASE_PATH + "/download", produces = "application/vnd.ms-excel")
public Mono<ByteArrayResource> download() {
    Flux<TimeKeepingEntry> createExcel = excelExport.createDocument(false);
    return createExcel.then(Mono.fromCallable(() -> {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        excelExport.getWb().write(outputStream);
        return new ByteArrayResource(outputStream.toByteArray());
    }).subscribeOn(Schedulers.elastic()));
}
And the Processing of the file:
public Flux<TimeKeepingEntry> createDocument(boolean all) {
    Flux<TimeKeepingEntry> entries = null;
    try {
        InputStream inputStream = new ClassPathResource("Timesheet Template.xlsx").getInputStream();
        wb = WorkbookFactory.create(inputStream);
        Sheet sheet = wb.getSheetAt(0);
        log.info("Created document");
        if (all) {
            //all entries
        } else {
            entries = service.findByMonth(currentMonthName)
                    .log("Excel Export - retrievedMonths")
                    .sort(Comparator.comparing(TimeKeepingEntry::getDateOfMonth))
                    .doOnNext(timeKeepingEntry -> this.populateEntry(sheet, timeKeepingEntry));
        }
    } catch (IOException e) {
        log.error("Error Importing File", e);
    }
    return entries;
}
This works well enough, but it is not very in line with Flux and Mono. Some guidance here would be good. I would prefer to have the whole sequence non-blocking.
Unfortunately the WorkbookFactory.create() operation is blocking, so you have to perform that operation using imperative code. However, fetching each timeKeepingEntry can be done reactively. Your code would look something like this:
public Flux<TimeKeepingEntry> createDocument() {
    return Flux.generate(
            this::getWorkbookSheet,
            (sheet, sink) -> {
                sink.next(getNextTimeKeepingEntryFrom(sheet));
                return sheet; // Flux.generate's generator must return the (possibly updated) state
            },
            this::closeWorkbook);
}
This will keep the workbook in memory, but will fetch each entry on demand when the elements of the Flux are requested.
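The helper names in the skeleton above (getWorkbookSheet, getNextTimeKeepingEntryFrom, closeWorkbook) are placeholders. A more self-contained sketch of the same Flux.generate pattern, assuming a hypothetical mapRow() that converts a POI Row into a TimeKeepingEntry, could look like this:
import java.io.IOException;
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.Map;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.springframework.core.io.ClassPathResource;
import reactor.core.publisher.Flux;

public Flux<TimeKeepingEntry> createDocument() {
    return Flux.<TimeKeepingEntry, Map.Entry<Workbook, Iterator<Row>>>generate(
            // State supplier: open the workbook once on subscription (blocking, but executed a single time)
            () -> {
                Workbook workbook = WorkbookFactory.create(
                        new ClassPathResource("Timesheet Template.xlsx").getInputStream());
                return new AbstractMap.SimpleEntry<>(workbook, workbook.getSheetAt(0).rowIterator());
            },
            // Generator: emit one entry per downstream request, complete when the rows run out
            (state, sink) -> {
                Iterator<Row> rows = state.getValue();
                if (rows.hasNext()) {
                    sink.next(mapRow(rows.next())); // mapRow() is a hypothetical Row -> TimeKeepingEntry mapper
                } else {
                    sink.complete();
                }
                return state;
            },
            // Cleanup: close the workbook when the Flux terminates or is cancelled
            state -> {
                try {
                    state.getKey().close();
                } catch (IOException ignored) {
                }
            });
}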

How to extract and manipulate data within a NiFi processor

I'm trying to write a custom NiFi processor which will take in the contents of the incoming flow file, perform some math operations on it, then write the results into an outgoing flow file. Is there a way to dump the contents of the incoming flow file into a string or something? I've been searching for a while now and it doesn't seem that simple. If anyone could point me toward a good tutorial that deals with doing something like that, it would be greatly appreciated.
The Apache NiFi Developer Guide documents the process of creating a custom processor very well. In your specific case, I would start with the Component Lifecycle section and the Enrich/Modify Content pattern. Any other processor which does similar work (like ReplaceText or Base64EncodeContent) would be good examples to learn from; all of the source code is available on GitHub.
Essentially you need to implement the #onTrigger() method in your processor class, read the flowfile content and parse it into your expected format, perform your operations, and then re-populate the resulting flowfile content. Your source code will look something like this:
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    final ComponentLog logger = getLogger();
    AtomicBoolean error = new AtomicBoolean();
    AtomicReference<String> result = new AtomicReference<>(null);
    // This uses a lambda function in place of a callback for InputStreamCallback#process()
    session.read(flowFile, in -> {
        long start = System.nanoTime();
        // Read the flowfile content into a String
        // TODO: May need to buffer this if the content is large
        try {
            final String contents = IOUtils.toString(in, StandardCharsets.UTF_8);
            result.set(new MyMathOperationService().performSomeOperation(contents));
            long stop = System.nanoTime();
            if (getLogger().isDebugEnabled()) {
                final long durationNanos = stop - start;
                DecimalFormat df = new DecimalFormat("#.###");
                getLogger().debug("Performed operation in " + durationNanos + " nanoseconds (" + df.format(durationNanos / 1_000_000_000.0) + " seconds).");
            }
        } catch (Exception e) {
            error.set(true);
            getLogger().error(e.getMessage() + " Routing to failure.", e);
        }
    });
    if (error.get()) {
        session.transfer(flowFile, REL_FAILURE);
    } else {
        // Again, a lambda takes the place of the StreamCallback#process()
        FlowFile updatedFlowFile = session.write(flowFile, (in, out) -> {
            final String resultString = result.get();
            final byte[] resultBytes = resultString.getBytes(StandardCharsets.UTF_8);
            // TODO: This can use a while loop for performance
            out.write(resultBytes, 0, resultBytes.length);
            out.flush();
        });
        session.transfer(updatedFlowFile, REL_SUCCESS);
    }
}
Daggett is right that the ExecuteScript processor is a good place to start because it will shorten the development lifecycle (no building NARs, deploying, and restarting NiFi to use it) and when you have the correct behavior, you can easily copy/paste into the generated skeleton and deploy it once.

NiFi SpringContextProcessor interrupts the flow

I added a SpringContextProcessor to my NiFi flow; it executes as expected and updates the FlowFile content and attributes. But in the data provenance section of NiFi, instead of seeing SEND/RECEIVE, I am seeing:
03/27/2017 11:47:57.164 MDT RECEIVE 42fa1c3f-edde-4cb7-8e73-ce752f7e3d66
03/27/2017 11:47:57.163 MDT DROP 667094a7-8eef-4657-981a-dc9fdc6c4056
03/27/2017 11:47:57.163 MDT SEND 667094a7-8eef-4657-981a-dc9fdc6c4056
Looks like the original message is being dropped and replaced by a new message. I haven't seen this behavior in other components, i.e. they all seem to preserve the original Flow File UUID. Simplified version of the Spring processor code:
@ServiceActivator(inputChannel = "fromNiFi", outputChannel = "toNiFi")
public Message<byte[]> process1(Message<byte[]> inMessage) {
    String inMessagePayload = new String(inMessage.getPayload());
    String userId = getUserIdFromDb(inMessagePayload);
    String outMessagePayload = inMessagePayload + userId;
    return MessageBuilder.withPayload(outMessagePayload.getBytes())
            .copyHeaders(inMessage.getHeaders())
            .setHeader("userId", userId)
            .build();
}
Is there a way to preserve the original Flow File UUID in the outgoing message?
This is probably an oversight on our end, so yes please do raise a JIRA.
However as a workaround you can try to extract FlowFile attributes from the incoming Message headers and then propagate them back to the outgoing message.
public Message<byte[]> process1(Message<byte[]> inMessage) {
    // Message has no getHeader(String); read individual headers via getHeaders()
    String myHeader = (String) inMessage.getHeaders().get("someHeader");
    . . .
    return MessageBuilder.withPayload(outMessagePayload.getBytes())
            .copyHeaders(inMessage.getHeaders())
            .setHeader("userId", userId)
            .setHeader("someHeader", myHeader)
            .build();
}

Multiple connections on the controller service (Spring)

I have written a controller which takes a domain name as input, crawls the whole site, and gives back the result in JSON format:
http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.google.com
This gives the data for Google.
http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.yahoo.com
This gives the data for Yahoo.
If I try to run these two URLs simultaneously, I see that I am getting mixed data, and the results of one affect the other, even though I hit them from different machines.
Here is my controller
@RequestMapping("/getUrlCrawlData/{domain:.+}")
@ResponseBody
public String registerContact(@PathVariable("domain") String domain) throws HttpStatusException, SQLException, IOException {
    List<URLdata> urldata = null;
    Gson gson = new Gson();
    String json;
    urldata = crawlService.crawlURL("http://" + domain);
    json = gson.toJson(urldata);
    return json;
}
What do I need to modify to allow multiple independent connections?
Update
Following is my crawl Service
public List<URLdata> crawlURL(String domain) throws HttpStatusException, SQLException, IOException {
    testDomain = domain;
    urlList.clear();
    urlMap.clear();
    urldata.clear();
    urlList.add(testDomain);
    processPage(testDomain);
    //Get all pages
    for (int i = 1; i < urlList.size(); i++) {
        if (urlList.size() >= 500) {
            break;
        }
        processPage(urlList.get(i));
        //System.out.println(urlList.get(i));
    }
    //Calculate Time
    for (int i = 0; i < urlList.size(); i++) {
        getTitleAndMeta(urlList.get(i));
    }
    return urldata;
}

public static void processPage(String URL) throws SQLException, IOException, HttpStatusException {
    //get useful information
    try {
        Connection.Response response = Jsoup.connect(URL)
                .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                .timeout(10000)
                .execute();
        Document doc = response.parse();
        //get all links and recursively call the processPage method
        Elements questions = doc.select("a[href]");
        for (Element link : questions) {
            String linkName = link.attr("abs:href");
            if (linkName.contains(testDomain.replaceAll("http://www.", ""))) {
                if (linkName.contains("#")) {
                    linkName = linkName.substring(0, linkName.indexOf("#"));
                }
                if (linkName.contains("?")) {
                    linkName = linkName.substring(0, linkName.indexOf("?"));
                }
                if (!urlList.contains(linkName) && urlList.size() <= 500) {
                    urlList.add(linkName);
                }
            }
        }
    }
    catch (HttpStatusException e) {
        System.out.println(e);
    }
    catch (SocketTimeoutException e) {
        System.out.println(e);
    }
    catch (UnsupportedMimeTypeException e) {
        System.out.println(e);
    }
    catch (UnknownHostException e) {
        System.out.println(e);
    }
    catch (MalformedURLException e) {
        System.out.println(e);
    }
}
Each of your requests (http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.google.com and http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.yahoo.com) is processed in a separate thread. You have two instances of the crawlURL() method working simultaneously, but both methods use the same variables (testDomain, urlList, urlMap and urldata). So they mess up each other's data in these variables.
One way to fix the problem is to declare these variables locally (inside the method). This way, new instances of these variables will be created for each invocation of crawlURL(). Alternatively, you can create a new instance of your CrawlService class for each invocation of the crawlURL() method.
Synchronizing threads would be a bad idea here because one request will wait for another to complete before it can be processed by crawlURL().
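For illustration, a minimal sketch of the first suggestion (keeping the state in local variables); it assumes processPage() and getTitleAndMeta() are reworked to take the collections and domain as parameters instead of reading shared fields:
// Sketch only: state lives in local variables, so concurrent requests no longer share it.
// processPage() and getTitleAndMeta() are assumed to accept these collections as parameters
// rather than using static/instance fields.
public List<URLdata> crawlURL(String domain) throws HttpStatusException, SQLException, IOException {
    List<String> urlList = new ArrayList<>();
    List<URLdata> urldata = new ArrayList<>();

    urlList.add(domain);
    processPage(domain, domain, urlList);
    for (int i = 1; i < urlList.size() && urlList.size() < 500; i++) {
        processPage(urlList.get(i), domain, urlList);
    }
    for (String url : urlList) {
        getTitleAndMeta(url, urldata);
    }
    return urldata;
}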
As far as Spring MVC is concerned, every request runs in a separate thread. So I think the problem is in crawlService which, I suppose, is not stateless (it is singleton-like). Try to create a new crawl service for every request and check whether your data is still mixed. If creating the crawl service is an expensive operation, you should rewrite it to work in a stateless way.
@RequestMapping("/getUrlCrawlData/{domain:.+}")
@ResponseBody
public String registerContact(@PathVariable("domain") String domain) throws HttpStatusException, SQLException, IOException {
    Gson gson = new Gson();
    List<URLdata> urldata = new CrawlService().crawlURL("http://" + domain);
    return gson.toJson(urldata);
}
I think this call to the crawl service is the one affected by multiple requests coming in simultaneously:
urldata = crawlService.crawlURL("http://"+domain);
Check whether crawlService is safe for multithreading, i.e. check whether the crawlURL() method is synchronized; if not, make it synchronized, or else synchronize the block that calls crawlService inside the controller.
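For reference, a minimal sketch of that last suggestion (synchronizing the call inside the controller); note the earlier answer's caveat that this serializes the crawls, so concurrent requests will queue behind each other:
@RequestMapping("/getUrlCrawlData/{domain:.+}")
@ResponseBody
public String registerContact(@PathVariable("domain") String domain) throws HttpStatusException, SQLException, IOException {
    List<URLdata> urldata;
    // Only one request at a time may run the shared, stateful crawlService
    synchronized (crawlService) {
        urldata = crawlService.crawlURL("http://" + domain);
    }
    return new Gson().toJson(urldata);
}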
