Apache NIFI timeout while waiting for OnScheduled - apache-nifi

Is nifi.processor.scheduling.timeout really infinite by default, as described in the admin guide? When I looked at the code, it appears to time out after 60 seconds. We have a processor that takes a while to start up (it loads resources), and we are encountering the "Timed out while waiting for OnScheduled" error. I am just trying to figure out why it sometimes fails on startup and then continues to fail with the same error.
Really strange: turning off all of the processors, bouncing the instance, and starting the processor individually seems to eliminate the issue. However, if they are all on when the instance is restarted, we encounter the error.
It could easily be something else, but that startup sequence seems to work.
NIFI Admin
NIFI Processor Code
Code snippet from the NiFi GitHub source where I found the timeout error:
String timeoutString = NiFiProperties.getInstance().getProperty(NiFiProperties.PROCESSOR_SCHEDULING_TIMEOUT);
// falls back to 60,000 ms (60 seconds) when the property is not set
long onScheduleTimeout = timeoutString == null ? 60000
        : FormatUtils.getTimeDuration(timeoutString.trim(), TimeUnit.MILLISECONDS);
Future<?> taskFuture = callback.invokeMonitoringTask(task);
try {
    taskFuture.get(onScheduleTimeout, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
    LOG.warn("Thread was interrupted while waiting for processor '" + this.processor.getClass().getSimpleName()
            + "' lifecycle OnScheduled operation to finish.");
    Thread.currentThread().interrupt();
    throw new RuntimeException("Interrupted while executing one of processor's OnScheduled tasks.", e);
} catch (TimeoutException e) {
    taskFuture.cancel(true);
    LOG.warn("Timed out while waiting for OnScheduled of '"
            + this.processor.getClass().getSimpleName()
            + "' processor to finish. An attempt is made to cancel the task via Thread.interrupt(). However it does not "
            + "guarantee that the task will be canceled since the code inside current OnScheduled operation may "
            + "have been written to ignore interrupts which may result in runaway thread which could lead to more issues "
            + "eventually requiring NiFi to be restarted. This is usually a bug in the target Processor '"
            + this.processor + "' that needs to be documented, reported and eventually fixed.");
    throw new RuntimeException("Timed out while executing one of processor's OnScheduled task.", e);
} catch (ExecutionException e) {
    throw new RuntimeException("Failed while executing one of processor's OnScheduled task.", e);
} finally {
    callback.postMonitor();
}
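So the admin guide's "infinite" default looks out of date: when the property is unset, the code above waits 60 seconds. If your processor legitimately needs longer in its @OnScheduled method, you can raise the limit in nifi.properties; a minimal sketch, assuming the duration syntax that FormatUtils.getTimeDuration() parses (the 5-minute value is only an example):

# nifi.properties -- pick a duration longer than your slowest @OnScheduled startup
nifi.processor.scheduling.timeout=5 mins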

Related

Exception class required for the event ibm messaging queue timeout

I am new to messaging queue implementations. I have implemented an IBM messaging queue (MQ) in my application.
Problem statement:
When this MQ cannot handle a certain number of messages, it throws a timeout exception.
Due to a technical limitation of my system I am unable to catch the exact exception class, meaning I simply declare catch (Exception e). But I would like to know exactly which exception class should be used to handle the timeout error.
I think you need some MQ training or you need to do a lot of reading on MQ.
There is no such thing as a timeout on an MQPUT. I would say you have some poorly written code and you are confusing MQ with your poorly written code. Are you logging ALL interactions?
If your code is Java/JMS then you should catch the following exception:
catch (JMSException e)
{
    System.err.println(e.getLocalizedMessage());
    if (e.getLinkedException() != null)
        System.err.println("getLinkedException()=" + e.getLinkedException());
}
If your code is plain Java then you should catch the following exception:
catch (MQException e)
{
    System.err.println(e.getLocalizedMessage());
    System.err.println("CC = " + e.completionCode + " : RC = " + e.reasonCode
            + " [" + MQConstants.lookup(e.reasonCode, "MQRC_.*") + "]");
}
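If the timeout you are seeing is actually an MQGET with a wait interval expiring, the reason code to test for is MQRC_NO_MSG_AVAILABLE (2033); a minimal sketch building on the plain-Java catch above (the handling branch is only an illustration):

catch (MQException e)
{
    // 2033 = MQRC_NO_MSG_AVAILABLE: a get-with-wait expired without receiving a message
    if (e.reasonCode == MQConstants.MQRC_NO_MSG_AVAILABLE) {
        System.err.println("No message arrived within the wait interval");
    } else {
        throw e; // some other MQ failure
    }
}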

NPE when removing a Dynamic Spring Integration Flow

Hi, in our application we create certain dynamic integration flows that we remove and re-create on the fly. Mostly things have worked great, but we have sometimes observed the error below, especially when the integration flow tries to remove dependent beans. Can someone comment on whether this is a bug or whether we are missing anything? Error trace below:
java.lang.NullPointerException: null
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.resetBeanDefinition(DefaultListableBeanFactory.java:912)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.removeBeanDefinition(DefaultListableBeanFactory.java:891)
    at org.springframework.integration.dsl.context.IntegrationFlowContext.removeDependantBeans(IntegrationFlowContext.java:203)
    at org.springframework.integration.dsl.context.IntegrationFlowContext.remove(IntegrationFlowContext.java:189)
Code that removes the integration flow:
flowContext.remove("flowId");
Update: the invoking code
if (discoveryService.isFLowPresent(flowId))
{
    LOG.debug("Removing and creating flow [{}]", flowId);
    discoveryService.removeIntegrationFlow(fc.getFeedId());
    LOG.debug("Removing Old Job and create fresh one with new params [{}]", flowId);
    try
    {
        discoveryService.createFlow(fc.getFeedId());
    }
    catch (ExecutionException e)
    {
        // plain string concatenation here, so no {} placeholder
        throw new IllegalStateException("Error while starting flow for Integration adapter [" + fc.getFeedId() + "]", e);
    }
}
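For reference, a minimal sketch of how a dynamic flow is typically registered and removed through IntegrationFlowContext; the getRegistrationById() guard is an assumption to avoid removing an already-removed flow, not a confirmed fix for the NPE above (if your DSL version lacks that method, treat the guard as pseudocode):

IntegrationFlowRegistration registration =
        flowContext.registration(flow)
                   .id(flowId)
                   .register();
// ...later, remove only if the flow is still registered (hypothetical guard)
if (flowContext.getRegistrationById(flowId) != null) {
    flowContext.remove(flowId);
}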

How to know which queue is allocated to which consumer - RocketMQ?

Consumer queues are allocated on the client side; the broker knows nothing about this.
So how can we monitor which queue is allocated to which consumer client?
Though there is no existing command for this, you can find out the client for each message queue in a consumer group using the provided admin infrastructure. Here is a snippet achieving this:
private Map<MessageQueue, String> getClientConnection(DefaultMQAdminExt defaultMQAdminExt, String groupName) {
    Map<MessageQueue, String> results = new HashMap<MessageQueue, String>();
    try {
        ConsumerConnection consumerConnection = defaultMQAdminExt.examineConsumerConnectionInfo(groupName);
        for (Connection connection : consumerConnection.getConnectionSet()) {
            String clientId = connection.getClientId();
            ConsumerRunningInfo consumerRunningInfo = defaultMQAdminExt.getConsumerRunningInfo(groupName, clientId, false);
            // map every queue this client currently holds to the client id and address
            for (MessageQueue messageQueue : consumerRunningInfo.getMqTable().keySet()) {
                results.put(messageQueue, clientId + " " + connection.getClientAddr());
            }
        }
    } catch (Exception e) {
        // swallowing the exception returns a partial (possibly empty) result; log it in real code
    }
    return results;
}
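A quick usage sketch (the name-server address and group name are placeholders; the DefaultMQAdminExt must be started before use and shut down afterwards):

DefaultMQAdminExt admin = new DefaultMQAdminExt();
admin.setNamesrvAddr("localhost:9876"); // placeholder name server
admin.start();
Map<MessageQueue, String> allocation = getClientConnection(admin, "my_consumer_group");
for (Map.Entry<MessageQueue, String> entry : allocation.entrySet()) {
    System.out.println(entry.getKey() + " -> " + entry.getValue());
}
admin.shutdown();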
In case you have not used the RocketMQ-Console project, please try running it: https://github.com/rocketmq/rocketmq-console-ng
In the Consumer tab, click the "consumer detail" button and you will see the message queue allocation visually, as below:
(screenshot: message queue allocation result)

jt400 write record throws "CPF5035 Data mapping error"

I have a table with many, many fields. When trying to insert data with jt400 (flei00.write(newrec);), I get the error CPF5035 "Data mapping error on member FLEI00". Even when I try to insert an empty or nearly empty record, the error message is the same. Is there a way to find out which field is causing the problem? I have been fighting with it for a whole day and have run out of ideas on what to check :-(. Any help (e.g. where to look for more info) will be appreciated.
On IBM i, the job log is THE place to find details about errors occurring in a given job. In the case of JT400 jobs, the JT400 app connects via sockets to a server job. Typically, there are a bunch of these jobs 'prestarted', waiting for a connection. This can be difficult to navigate if you're not accustomed to the 5250 interface.
Here's a JT400 program that retrieves the job log messages for you. If you run it in the same session in which you are getting the error, you should see the details about which field is causing the issue.
import java.util.*;
import com.ibm.as400.access.*;

public class TestJobLog {
    public static void main(String[] args) {
        int i = 0;
        try {
            AS400 system = new AS400();
            JobLog jobLog = new JobLog(system);
            // which attributes to retrieve?
            jobLog.clearAttributesToRetrieve();
            jobLog.addAttributeToRetrieve(JobLog.MESSAGE_WITH_REPLACEMENT_DATA);
            jobLog.addAttributeToRetrieve(JobLog.MESSAGE_HELP_WITH_REPLACEMENT_DATA);
            // load the messages
            jobLog.load();
            // walk the message list
            Enumeration list = jobLog.getMessages();
            System.out.println("There are " + Integer.toString(jobLog.getLength()) + " messages.");
            while (list.hasMoreElements()) {
                i++;
                QueuedMessage message = (QueuedMessage) list.nextElement();
                String text = message.getID() +
                        " " + message.getType() +
                        " " + message.getText() + "\n" +
                        " " + message.getMessageHelpReplacement() + "\n";
                System.out.println(Integer.toString(i) + " " + text);
            }
            jobLog.close();
            System.exit(0);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
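One assumption worth stating: the job log that matters is the one for the server job that performed the failing write, so if your application already holds an AS400 connection, reuse that object when constructing the JobLog rather than creating a new connection (hypothetical variable name below):

// reuse the AS400 object whose write threw CPF5035, so the JobLog
// reads the same server job that produced the error
JobLog jobLog = new JobLog(system);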

Hive JDBC getConnection does not return

I'm following the Hive JDBC tutorial. I could not get it working: when it tries to get the connection, it just hangs, and it does not report any error either. I'm sure the Hive server is running. Any help?
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class HiveJdbcClient {
    private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

    public static void main(String[] args) {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }
        try {
            Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
            System.out.println("got the connection");
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
Output of netstat:
$ sudo netstat -anlp | grep 10000
Password:
tcp        0      0 0.0.0.0:10000      0.0.0.0:*          LISTEN      27738/java
tcp      107      0 127.0.0.1:10000    127.0.0.1:45910    ESTABLISHED 27738/java
tcp        0      0 127.0.0.1:33665    127.0.0.1:10000    ESTABLISHED 24475/java
tcp        0      0 127.0.0.1:45910    127.0.0.1:10000    ESTABLISHED 7445/java
tcp      107      0 127.0.0.1:10000    127.0.0.1:33665    ESTABLISHED 27738/java
Naresh: try stopping the thrift server, then move to the HIVE_HOME/bin directory in your terminal and start the Hive thrift server with the ./hive --service hiveserver 10000 & command. Then try running the program. Do a create table as per the Hive client wiki example, then do a show tables query in the next step. Let us know the result once these steps are followed; we can have a discussion after that.
You can do the following to pinpoint where the hang is happening. Here is an example of how I traced it in my broken Hive JDBC connection. Note that this is not a concrete solution to any generic Hive connection hang.
This is an answer to the question: "How can I find out where my JDBC Hive connection is hanging?"
What makes this hard to trace is the JDBC dynamic invocation. Instead, you can manually create a HiveConnection instance yourself. That allows you to add some tracing into the code directly, to see where the hang is happening.
I traced this by doing the following.
* USING LOG4J *
The Thrift and other JDBC Hive classes use log4j when connecting; if you turn DEBUG logging on, you can see fine-grained errors. You can do this easily by adding
BasicConfigurator.configure()
to your client app. In any case, doing this led me to find that the connection seems to stall in the SASL transport layer. I guess it could be security related, but I would assume that a security error would STILL return, and not hang... so I think this may be worthy of a JIRA. I've posted a follow-up question:
How can I trace the failure of TSaslTransport (hive related)
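For completeness, a minimal sketch of turning that DEBUG logging on in a client (log4j 1.x API, which BasicConfigurator belongs to; the class name is just an example):

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class DebugHiveClient {
    public static void main(String[] args) {
        // route log4j output to the console and lower the root level to DEBUG
        BasicConfigurator.configure();
        Logger.getRootLogger().setLevel(Level.DEBUG);
        // ...open the Hive JDBC connection here and watch the Thrift/SASL chatter
    }
}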
* ANOTHER METHOD *
1) You can grab a copy of the HiveConnection class from GitHub, or wherever, and instantiate a new one:
String url = String.format("jdbc:hive2://%s:%s/default", server, port);
Properties p = new Properties();
p.setProperty("host", server);
Connection jdbc = new HiveConnection(url, p);
Then you can add your debugger hooks or log statements to the HiveConnection class.
Ultimately, when I had this error, I traced it to openTransport, which in turn creates an org.apache.thrift.transport.TSaslClientTransport instance.
And the hang happens in this code block:
try {
    System.out.println(".....1 carrying on... attempting to open. " + transport.getClass().getName());
    transport.open();
    System.out.println("done open.");
} catch (TTransportException e) {
    System.out.println("2 fail .");
    e.printStackTrace();
    throw new SQLException("Could not establish connection to "
            + uri + ": " + e.getMessage(), " 08S01");
}
FYI, I've posted a follow-up regarding why MY connection failed; it might be related to yours as well: How can I trace the failure of TSaslTransport (hive related)
I had the same problem. Check these params:
driverName = "org.apache.hive.jdbc.HiveDriver"
con = DriverManager.getConnection("jdbc:hive2://192.168.1.93:10000/default", "", "");
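Putting those two changes together (the HiveServer2 driver class plus a jdbc:hive2:// URL), here is a minimal sketch of the corrected client; host, port, and database are placeholders for your own setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class Hive2JdbcClient {
    // HiveServer2 driver, not the old org.apache.hadoop.hive.jdbc.HiveDriver
    private static final String DRIVER_NAME = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws ClassNotFoundException {
        Class.forName(DRIVER_NAME);
        // note the hive2:// scheme: HiveServer2 speaks a different protocol,
        // and a mismatched driver/URL pair can hang instead of failing fast
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "")) {
            System.out.println("got the connection");
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}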
