NiFi SpringContextProcessor interrupts the flow

I added a SpringContextProcessor to my NiFi flow; it executes as expected and updates the FlowFile content and attributes. But in the Data Provenance section of NiFi, instead of seeing SEND/RECEIVE, I am seeing:
03/27/2017 11:47:57.164 MDT RECEIVE 42fa1c3f-edde-4cb7-8e73-ce752f7e3d66
03/27/2017 11:47:57.163 MDT DROP 667094a7-8eef-4657-981a-dc9fdc6c4056
03/27/2017 11:47:57.163 MDT SEND 667094a7-8eef-4657-981a-dc9fdc6c4056
It looks like the original message is being dropped and replaced by a new one. I haven't seen this behavior in other components; they all seem to preserve the original FlowFile UUID. Here is a simplified version of the Spring processor code:
@ServiceActivator(inputChannel = "fromNiFi", outputChannel = "toNiFi")
public Message<byte[]> process1(Message<byte[]> inMessage) {
    String inMessagePayload = new String(inMessage.getPayload());
    String userId = getUserIdFromDb(inMessagePayload);
    String outMessagePayload = inMessagePayload + userId;
    return MessageBuilder.withPayload(outMessagePayload.getBytes())
            .copyHeaders(inMessage.getHeaders())
            .setHeader("userId", userId)
            .build();
}
Is there a way to preserve the original Flow File UUID in the outgoing message?

This is probably an oversight on our end, so yes, please do raise a JIRA.
However, as a workaround, you can extract the FlowFile attributes from the incoming Message headers and then propagate them back onto the outgoing message.
public Message<byte[]> process1(Message<byte[]> inMessage) {
    // MessageHeaders has no getHeader(); read the value from getHeaders() instead
    String myHeader = (String) inMessage.getHeaders().get("someHeader");
    . . .
    return MessageBuilder.withPayload(outMessagePayload.getBytes())
            .copyHeaders(inMessage.getHeaders())
            .setHeader("userId", userId)
            .setHeader("someHeader", myHeader)
            .build();
}

Related

Issue with NiFi Rest API in fetching Remote Process Group details

Using the NiFi REST API endpoint and code snippet below, I am fetching a list of Remote Process Groups (RPGs), then iterating over it and fetching each RPG's details. The problem is that I am getting inaccurate RPG data. If I hit the endpoint https://nifihost:8080/nifi-api/remote-process-groups/{id} directly, I receive accurate details. Please clarify:
Why is there a discrepancy between the results of these two endpoints?
(https://nifihost:8080/nifi-api/process-groups/{id}/remote-process-groups
vs.
https://nifihost:8080/nifi-api/remote-process-groups/{id})
My requirement is to iterate over each Process Group, get the list of Remote Process Groups (RPGs) within it, and fetch each RPG's details. What is the right way to achieve this?
Endpoint:
https://nifihost:8080/nifi-api/process-groups/{id}/remote-process-groups
Source Code
ArrayList<NifiRemoteProcessGroup> remoteProcessGroupArrayList = new ArrayList<>();
String returnedJSON = "";
String remoteProcessGroupURL = getNifiURL() + "/nifi-api/process-groups/" + processGroup + "/remote-process-groups";
HttpEntity httpEntity = RestCall.oAuthHeaders(token);
RestTemplate restTemplate = new RestTemplate();
try {
    ResponseEntity<String> response = restTemplate.exchange(remoteProcessGroupURL, HttpMethod.GET, httpEntity, String.class);
    returnedJSON = response.getBody();
} catch (Exception e) {
    logger.error("There was an error retrieving the remote-process-groups : " + e.getMessage());
}
try {
    ObjectMapper objectMapper = new ObjectMapper();
    JsonNode rootNode = objectMapper.readTree(returnedJSON);
    JsonNode processorNode = rootNode.path("remoteProcessGroups");
    Iterator<JsonNode> elements = processorNode.elements();
    while (elements.hasNext()) {
        JsonNode remoteProcessGroup = elements.next();
        JsonNode statusElement = remoteProcessGroup.path("status");
        JsonNode bulletinElement = remoteProcessGroup.path("bulletins");
        JsonNode componentElement = remoteProcessGroup.path("component");
        JsonNode aggregateSnapshot = statusElement.path("aggregateSnapshot");
        NifiRemoteProcessGroup remoteProcessGroupInstance = new NifiRemoteProcessGroup();
        remoteProcessGroupInstance.setRemoteProcessGroupId(checkExists(statusElement, "id"));
        remoteProcessGroupInstance.setRemoteProcessGroupName(checkExists(componentElement, "name"));
        remoteProcessGroupInstance.setRemoteProcessGroupGroupId(checkExists(statusElement, "groupId"));
        remoteProcessGroupInstance.setRemoteProcessGroupTargetURL(checkExists(componentElement, "targetUri"));
        remoteProcessGroupInstance.setRemoteProcessGroupBulletins(bulletinElement.asText());
        remoteProcessGroupInstance.setRemoteProcessGroupTransmitting(Boolean.valueOf(checkExists(componentElement, "transmitting")));
        remoteProcessGroupInstance.setRemoteProcessGroupTransmissionStatus(checkExists(statusElement, "transmissionStatus"));
        remoteProcessGroupInstance.setRemoteProcessGroupActiveThreadCount(Double.valueOf(checkExists(aggregateSnapshot, "activeThreadCount")));
        remoteProcessGroupInstance.setRemoteProcessGroupFlowFilesReceived(Double.valueOf(checkExists(aggregateSnapshot, "flowFilesReceived")));
        remoteProcessGroupInstance.setRemoteProcessGroupBytesReceived(Double.valueOf(checkExists(aggregateSnapshot, "bytesReceived")));
        remoteProcessGroupInstance.setRemoteProcessGroupReceived(checkExists(aggregateSnapshot, "received"));
        remoteProcessGroupArrayList.add(remoteProcessGroupInstance);
    }
} catch (Exception e) {
    logger.info("There was an error creating the list of remote process groups: " + e.getMessage());
}
'process-groups/{id}/remote-process-groups' is part of the ProcessGroups API and returns a RemoteProcessGroupsEntity, which contains a listing of the Remote Process Groups bound to the Process Group whose ID you submit.
'remote-process-groups/{id}' is part of the RemoteProcessGroups API and fetches the exact RemoteProcessGroupEntity (note the lack of plural) requested.
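If you want to stay with your Java snippet, one way to reconcile the two endpoints is to use the listing call only to collect IDs, then fetch each RemoteProcessGroupEntity individually for accurate detail. A minimal sketch reusing the restTemplate, httpEntity, getNifiURL() and processGroup names from your code, with error handling kept in your style:
// Use the listing endpoint only to harvest ids, then fetch each RPG's full entity.
try {
    String listURL = getNifiURL() + "/nifi-api/process-groups/" + processGroup + "/remote-process-groups";
    String listJSON = restTemplate.exchange(listURL, HttpMethod.GET, httpEntity, String.class).getBody();
    ObjectMapper objectMapper = new ObjectMapper();
    for (JsonNode rpg : objectMapper.readTree(listJSON).path("remoteProcessGroups")) {
        String rpgId = rpg.path("id").asText();
        String detailURL = getNifiURL() + "/nifi-api/remote-process-groups/" + rpgId;
        String detailJSON = restTemplate.exchange(detailURL, HttpMethod.GET, httpEntity, String.class).getBody();
        JsonNode entity = objectMapper.readTree(detailJSON);
        // Parse status/component/aggregateSnapshot out of 'entity' exactly as you do today.
    }
} catch (Exception e) {
    logger.error("There was an error fetching remote process group details: " + e.getMessage());
}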
I maintain NiPyAPI, the nominal Python client for NiFi. Given the outcome you're seeking, you could try:
import nipyapi
nipyapi.utils.set_endpoint('http://localhost:8080/nifi')
rpg_info = [nipyapi.canvas.get_remote_process_group(rpg.id) for rpg in nipyapi.canvas.list_all_remote_process_groups('root', True)]
The RPG info returned will give you the parent Process Group ID under .component.parent_group_id, allowing you to reconstruct the tree, and you should find it much more performant than fetching each one by hand.

Get message content from mime message?

I have a Java Spring Integration project that is receiving emails through the code below:
ClassPathXmlApplicationContext ac =
        new ClassPathXmlApplicationContext(
                "/integration/gmail-imap-idle-config.xml");
DirectChannel inputChannel = ac.getBean("receiveChannel", DirectChannel.class);
inputChannel.subscribe(message -> {
    org.springframework.messaging.Message<MimeMailMessage> received =
            (org.springframework.messaging.Message<MimeMailMessage>) message;
    log.info("content" + message);
    List<String> sentences = null;
    try {
        // this is where I want to extract the message body
    } catch (Exception e) {
    }
});
I get the email, and I can get the subject, but I can never actually extract the message body. How do I do this?
Thank you!
You have to use this option on the channel adapter:
simple-content="true"
See its description:
When 'true', messages produced by the source will be rendered by 'MimeMessage.getContent()'
which is usually just the body for a simple text email. When false (default) the content
is rendered by the 'getContent()' method on the actual message returned by the underlying
javamail implementation.
For example, an IMAP message is rendered with some message headers.
This attribute is provided so that users can enable the previous behavior, which just
rendered the body.
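For reference, the attribute goes on the mail channel adapter in the XML config; a sketch of what that might look like in your gmail-imap-idle-config.xml (the store-uri value and bean names here are placeholders):
<int-mail:imap-idle-channel-adapter id="imapAdapter"
        store-uri="imaps://[username]:[password]@imap.gmail.com/INBOX"
        channel="receiveChannel"
        simple-content="true"
        java-mail-properties="javaMailProperties"/>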
Still, that may not be enough: in my experience a Gmail message is never simple. The content is a MimeMultipart, and we need to read its parts to get access to the real body.
So, this is how you should change your code as well:
log.info("content" + ((MimeMultipart) ((MimeMessage) message.getPayload()).getContent()).getBodyPart(0).getContent());

Kafka Processor does not keep the state of attributes of flowfile

I update a few attributes of a flow file and publish it to Kafka, but when I consume the same message with the ConsumeKafka_2_0 processor, those attributes are lost.
Is this not supported? Do I need to customise this processor?
Looking at the processor source code below, I can see that it already reads the attributes from the record and writes them to the flow file, so why are they not available in the flow file?
private void writeData(final ProcessSession session, ConsumerRecord<byte[], byte[]> record, final TopicPartition topicPartition) {
    FlowFile flowFile = session.create();
    final BundleTracker tracker = new BundleTracker(record, topicPartition, keyEncoding);
    tracker.incrementRecordCount(1);
    final byte[] value = record.value();
    if (value != null) {
        flowFile = session.write(flowFile, out -> {
            out.write(value);
        });
    }
    flowFile = session.putAllAttributes(flowFile, getAttributes(record));
    tracker.updateFlowFile(flowFile);
    populateAttributes(tracker);
    session.transfer(tracker.flowFile, REL_SUCCESS);
}
In order to pass attributes you must make use of Kafka headers; otherwise there is no way to carry them across, since attributes are not part of the flow file content, and the content is what becomes the body of the message in Kafka.
On the publish side, PublishKafka_2_0 has the following property to specify which attributes to send as headers:
static final PropertyDescriptor ATTRIBUTE_NAME_REGEX = new PropertyDescriptor.Builder()
        .name("attribute-name-regex")
        .displayName("Attributes to Send as Headers (Regex)")
        .description("A Regular Expression that is matched against all FlowFile attribute names. "
                + "Any attribute whose name matches the regex will be added to the Kafka messages as a Header. "
                + "If not specified, no FlowFile attributes will be added as headers.")
        .addValidator(StandardValidators.REGULAR_EXPRESSION_VALIDATOR)
        .expressionLanguageSupported(ExpressionLanguageScope.NONE)
        .required(false)
        .build();
On the consume side, ConsumeKafka_2_0 has the following property to specify which header fields to add as attributes:
static final PropertyDescriptor HEADER_NAME_REGEX = new PropertyDescriptor.Builder()
        .name("header-name-regex")
        .displayName("Headers to Add as Attributes (Regex)")
        .description("A Regular Expression that is matched against all message headers. "
                + "Any message header whose name matches the regex will be added to the FlowFile as an Attribute. "
                + "If not specified, no Header values will be added as FlowFile attributes. If two messages have a different value for the same header and that header is selected by "
                + "the provided regex, then those two messages must be added to different FlowFiles. As a result, users should be cautious about using a regex like "
                + "\".*\" if messages are expected to have header values that are unique per message, such as an identifier or timestamp, because it will prevent NiFi from bundling "
                + "the messages together efficiently.")
        .addValidator(StandardValidators.REGULAR_EXPRESSION_VALIDATOR)
        .expressionLanguageSupported(ExpressionLanguageScope.NONE)
        .required(false)
        .build();
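For a concrete round trip, point both regexes at the same attribute names. For example, if the attributes you set were named myapp.userId and myapp.source (hypothetical names), you could configure:
PublishKafka_2_0:   Attributes to Send as Headers (Regex) = myapp\..*
ConsumeKafka_2_0:   Headers to Add as Attributes (Regex)  = myapp\..*
The matching attributes then travel as Kafka headers and reappear as attributes on the consumed flow file under the same names.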

How to extract and manipulate data within a NiFi processor

I'm trying to write a custom NiFi processor which will take in the contents of the incoming flow file, perform some math operations on it, then write the results into an outgoing flow file. Is there a way to dump the contents of the incoming flow file into a string or something? I've been searching for a while now and it doesn't seem that simple. If anyone could point me toward a good tutorial that deals with doing something like that, it would be greatly appreciated.
The Apache NiFi Developer Guide documents the process of creating a custom processor very well. In your specific case, I would start with the Component Lifecycle section and the Enrich/Modify Content pattern. Any other processor which does similar work (like ReplaceText or Base64EncodeContent) would be good examples to learn from; all of the source code is available on GitHub.
Essentially you need to implement the #onTrigger() method in your processor class, read the flowfile content and parse it into your expected format, perform your operations, and then re-populate the resulting flowfile content. Your source code will look something like this:
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    final ComponentLog logger = getLogger();
    AtomicBoolean error = new AtomicBoolean();
    AtomicReference<String> result = new AtomicReference<>(null);
    // This uses a lambda function in place of a callback for InputStreamCallback#process()
    session.read(flowFile, in -> {
        long start = System.nanoTime();
        // Read the flowfile content into a String
        // TODO: May need to buffer this if the content is large
        try {
            final String contents = IOUtils.toString(in, StandardCharsets.UTF_8);
            result.set(new MyMathOperationService().performSomeOperation(contents));
            long stop = System.nanoTime();
            if (logger.isDebugEnabled()) {
                final long durationNanos = stop - start;
                DecimalFormat df = new DecimalFormat("#.###");
                logger.debug("Performed operation in " + durationNanos + " nanoseconds (" + df.format(durationNanos / 1_000_000_000.0) + " seconds).");
            }
        } catch (Exception e) {
            error.set(true);
            logger.error(e.getMessage() + " Routing to failure.", e);
        }
    });
    if (error.get()) {
        session.transfer(flowFile, REL_FAILURE);
    } else {
        // Again, a lambda takes the place of OutputStreamCallback#process()
        FlowFile updatedFlowFile = session.write(flowFile, out -> {
            final String resultString = result.get();
            final byte[] resultBytes = resultString.getBytes(StandardCharsets.UTF_8);
            // TODO: This can use a while loop for performance
            out.write(resultBytes, 0, resultBytes.length);
            out.flush();
        });
        session.transfer(updatedFlowFile, REL_SUCCESS);
    }
}
Daggett is right that the ExecuteScript processor is a good place to start because it will shorten the development lifecycle (no building NARs, deploying, and restarting NiFi to use it) and when you have the correct behavior, you can easily copy/paste into the generated skeleton and deploy it once.

Send Status code and message in SpringMVC

I have the following code in my web application:
@ExceptionHandler(InstanceNotFoundException.class)
@ResponseStatus(HttpStatus.NO_CONTENT)
public ModelAndView instanceNotFoundException(InstanceNotFoundException e) {
    return returnErrorPage(message, e);
}
Is it possible to also append a status message to the response? I need to add some additional semantics to my errors; in the case of the snippet I posted, for example, I would like to append which class the instance that was not found belonged to.
Is this even possible?
EDIT: I tried this:
@ResponseStatus(value=HttpStatus.NO_CONTENT, reason="My message")
But then when I try to get this message in the client, it's not set.
URL u = new URL(url);
HttpURLConnection huc = (HttpURLConnection) u.openConnection();
huc.setRequestMethod("GET");
HttpURLConnection.setFollowRedirects(true);
huc.connect();
final int code = huc.getResponseCode();
String message = huc.getResponseMessage();
Turns out I needed to activate custom messages on Tomcat using this parameter:
-Dorg.apache.coyote.USE_CUSTOM_STATUS_MSG_IN_HEADER=true
The message can go in the body rather than in a header. As with a successful response, set the response body (text, JSON, XML...) to be returned, but set the HTTP status to an error value. I have found that more useful than a custom message in the header. The following example shows a response with a custom header and a message in the body. A ModelAndView that takes you to another page would be conceptually similar.
@ExceptionHandler(InstanceNotFoundException.class)
public ResponseEntity<String> handle() {
    HttpHeaders responseHeaders = new HttpHeaders();
    responseHeaders.set("ACustomHttpHeader", "The custom value");
    return new ResponseEntity<String>("the error message", responseHeaders, HttpStatus.INTERNAL_SERVER_ERROR);
}
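One caveat on the client side: with this approach the message travels in the response body, which HttpURLConnection exposes via getErrorStream() rather than getResponseMessage(). A minimal sketch adapted from the client code in the question:
HttpURLConnection huc = (HttpURLConnection) new URL(url).openConnection();
huc.setRequestMethod("GET");
huc.connect();
final int code = huc.getResponseCode();
// 4xx/5xx responses expose their body on the error stream, not the input stream
InputStream bodyStream = (code >= 400) ? huc.getErrorStream() : huc.getInputStream();
String errorBody = new BufferedReader(new InputStreamReader(bodyStream, StandardCharsets.UTF_8))
        .lines().collect(Collectors.joining("\n"));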
