Set heartbeat via command line in hadoop - hadoop

Is it possible to set the NodeManager heartbeat parameter via the command line in Hadoop?
If so, how?
Alternatively, is it possible to modify this parameter without restarting the cluster?
The parameter I am interested in is yarn.resourcemanager.nodemanagers.heartbeat-interval-ms, documented in yarn-default.xml.

You cannot set yarn.resourcemanager.nodemanagers.heartbeat-interval-ms (the heartbeat interval, in milliseconds, for every NodeManager in the cluster) from the command line.
You can change this parameter in yarn-site.xml, but then you need to restart the services.
The reason is that this parameter is read only once, when the ResourceTrackerService is started inside the ResourceManager. The heartbeat interval is returned to the NodeManager as part of the NodeHeartbeatResponse:
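For illustration, the yarn-site.xml entry would look something like the following (the 2000 ms value is just an example, not a recommendation):

```xml
<property>
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>2000</value>
</property>
```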
// Heartbeat response
NodeHeartbeatResponse nodeHeartBeatResponse = YarnServerBuilderUtils
    .newNodeHeartbeatResponse(lastNodeHeartbeatResponse.getResponseId() + 1,
        NodeAction.NORMAL, null, null, null, null, nextHeartBeatInterval);
The nextHeartBeatInterval value in the call above is read in the serviceInit() method of the ResourceTrackerService:
nextHeartBeatInterval =
    conf.getLong(YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS,
        YarnConfiguration.DEFAULT_RM_NM_HEARTBEAT_INTERVAL_MS);
if (nextHeartBeatInterval <= 0) {
    throw new YarnRuntimeException("Invalid Configuration. "
        + YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS
        + " should be larger than 0.");
}
Also, the value of yarn.resourcemanager.nodemanagers.heartbeat-interval-ms (default 1000) should be less than the value of yarn.nm.liveness-monitor.expiry-interval-ms (default 600000), which controls how long to wait until a NodeManager is considered dead.
The check for this is in the validateConfigs() method of the ResourceManager:
// validate expireIntvl >= heartbeatIntvl
long expireIntvl = conf.getLong(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
    YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS);
long heartbeatIntvl =
    conf.getLong(YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS,
        YarnConfiguration.DEFAULT_RM_NM_HEARTBEAT_INTERVAL_MS);
if (expireIntvl < heartbeatIntvl) {
    throw new YarnRuntimeException("Nodemanager expiry interval should be no"
        + " less than heartbeat interval, "
        + YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS + "=" + expireIntvl
        + ", " + YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS + "="
        + heartbeatIntvl);
}

Related

How do I get the list of connections from processContext in NiFi 1.11.4

Our production instance of NiFi is version 1.8.0. We have a custom processor that continually looks at its downstream connections in order to route flow files based on each connection's queue size.
Here is the salient snippet of how we do this:
String processorId = this.getIdentifier();
ProcessGroupStatus processGroupStatus = ((EventAccess) getControllerService()).getControllerStatus();
Collection<ConnectionStatus> groupConnections = processGroupStatus.getConnectionStatus();
ArrayList<ConnectionStatus> connections = new ArrayList<>(groupConnections);
for (ConnectionStatus connection : connections) {
    if (connection.getSourceId().equals(processorId)) {
        // do stuff with connection.getQueuedCount() & connection.getQueuedBytes()
        break;
    }
}
Everything has been working as expected for the last couple of years. However, upgrading our NiFi instance to version 1.11.4 has broken this approach. The exception thrown is:
class org.apache.nifi.controller.service.StandardControllerServiceProvider cannot be cast to class org.apache.nifi.reporting.EventAccess
Is there another way to retrieve connections from processContext?
One approach that may be more upwardly compatible (and easier to maintain) than a custom Java processor would be to use the ExecuteGroovyScript processor.
The Groovy script in this case would look something like:
ff = session.get()
if (ff) {
    me = context.procNode
    processorId = me.identifier
    connections = me.processGroup.connections
    connections.each { connection ->
        if (connection.source.identifier.equals(processorId)) {
            ff[connection.identifier] = "I am the source " +
                "[" + connection.flowFileQueue.size().objectCount + "]" +
                "[" + connection.flowFileQueue.size().byteCount + "]"
        } else {
            ff[connection.identifier] = "I am NOT the source; my name is [" + connection.name + "]"
        }
    }
    REL_SUCCESS << ff
}
To find out what is available to the Groovy script, I use a combination of the NiFi JavaDocs (https://javadoc.io/static/org.apache.nifi/nifi-api/1.12.0/index.html) and the NiFi source on GitHub (https://github.com/apache/nifi/tree/c396927299586b896df4ebc745793b4c451f3898/nifi-api/src/main/java/org/apache/nifi).
As a side note, we converted our custom Java processors to Groovy script, because of an upgrade incompatibility when going to (ironically) 1.8.0. We have not had an issue with NiFi upgrades since then, and are currently running v 1.11.4.

How to know if preemption is happening on Yarn fair share scheduler?

Is there any way to know for sure if the preemption mechanism has been triggered on YARN?
In the YARN Resource Manager or the logs maybe?
If your log level is set to INFO, you should see this in the YARN ResourceManager logs:
// Warn application about containers to be killed
for (RMContainer container : containers) {
    FSAppAttempt app = scheduler.getSchedulerApp(
        container.getApplicationAttemptId());
    LOG.info("Preempting container " + container +
        " from queue " + app.getQueueName());
    app.trackContainerForPreemption(container);
}
at https://github.com/apache/hadoop/.../scheduler/fair/FSPreemptionThread.java#L143.
And if you have the log level set to DEBUG, you will also see this:
if (LOG.isDebugEnabled()) {
    LOG.debug("allocate: post-update"
        + " applicationAttemptId=" + appAttemptId
        + " #ask=" + ask.size()
        + " reservation= " + application.getCurrentReservation());
    LOG.debug("Preempting " + preemptionContainerIds.size()
        + " container(s)");
}
at https://github.com/apache/hadoop/.../scheduler/fair/FairScheduler.java#937
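If you need to enable that DEBUG output, one way (assuming a standard log4j setup; the exact file location varies by distribution) is to raise the level for the fair scheduler classes in the ResourceManager's log4j.properties:

```
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair=DEBUG
```

This keeps the rest of the ResourceManager at its normal log level while making the preemption messages visible.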

MQCMD_INQUIRE_CLUSTER_Q_MGR pcf request not returning cluster information

Isn't MQCMD_INQUIRE_CLUSTER_Q_MGR equivalent to the runmqsc DISPLAY CLUSQMGR(*) command? The following is the output from that command:
display clusqmgr(*)
4 : display clusqmgr(*)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM_FR1) CHANNEL(TO.QM_FR1)
CLUSTER(CLUSTER1)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM_FR2) CHANNEL(TO.QM_FR2)
CLUSTER(CLUSTER1)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM_PR1) CHANNEL(TO.QM_PR1)
CLUSTER(CLUSTER1)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM_PR2) CHANNEL(TO.QM_PR2)
CLUSTER(CLUSTER1)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM_PR3) CHANNEL(TO.QM_PR3)
CLUSTER(CLUSTER1)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM_PR3) CHANNEL(TO.QM_PR3)
CLUSTER(CLUSTER1)
I was expecting a similar response with PCF from the code I have supplied, but I don't get this information.
I have tried the following code, but it does not return the cluster information:
PCFMessageAgent agent = new PCFMessageAgent(queueManager);
agent.setCheckResponses(false);
PCFMessage[] responses;
PCFMessage request = new PCFMessage(MQConstants.MQCMD_INQUIRE_CLUSTER_Q_MGR);
request.addParameter(MQConstants.MQCA_CLUSTER_Q_MGR_NAME, queueManager);
responses = agent.send(request);
String clusterName = (String)responses[0].getParameterValue(MQConstants.MQCA_CLUSTER_NAME);
String clusterInfo = (String)responses[0].getParameterValue(MQConstants.MQIACF_CLUSTER_INFO);
logger.info("Cluster Name [" + clusterName + "]");
logger.info("Cluster Information [" + clusterInfo + "]");
The last line prints out a null.
So the question is: how do I get this information using PCF? The above output is from a full-repository queue manager.
The following code displays the required information:
responses = agent.send(request);
for (int i = 0; i < responses.length; i++) {
    System.out.println("Cluster Queue manager [" + (String) responses[i].getParameterValue(MQConstants.MQCA_CLUSTER_Q_MGR_NAME) + "]");
    System.out.println("Cluster Name [" + (String) responses[i].getParameterValue(MQConstants.MQCA_CLUSTER_NAME) + "]");
    System.out.println("Cluster Channel [" + (String) responses[i].getParameterValue(MQConstants.MQCACH_CHANNEL_NAME) + "]");
}
The output looks like this:
Cluster Queue manager [QM1 ]
Cluster Name [CLUS1 ]
Cluster Channel [TO.QM1 ]

ntp sync date/time failure causes Zk cleanup procedure not to remove client sessions

If, for any reason, NTP synchronization loses the correct time and the cluster jumps to a future date/time, then client sessions created in ZooKeeper during that window are not removed after 10 minutes, even if the cluster's date/time is later synced back to the correct value.
My assumption is that the ZooKeeper cleanup procedure, which runs every 10 minutes from crontab, only cleans sessions whose timestamps lie in the past:
0-59/10 * * * * /opt/dve/bin/zookeeper_cleanup.sh
So there is no way to clean those sessions created "in the future" other than moving the clock forward to that same date again and waiting until zookeeper_cleanup.sh cleans them up after 10 minutes.
I tried removing the log files, but that may not be the right way and might cause other problems.
What is the better thing to do?
In the end, we removed the stale sessions via the ZooKeeper API, as follows:
// Sessions created after this timestamp are considered "in the future"
long currentTime = System.currentTimeMillis();
Stat stat = null;
List<String> children = this.zooKeeper.getChildren(this.tempSessionPath, false, stat);
System.out.println(children.size() + " sessions found.");
int count = 0;
for (String child : children) {
    String path = this.tempSessionPath + "/" + child;
    stat = this.zooKeeper.exists(path, false);
    if (null != stat && stat.getCtime() > currentTime) {
        count++;
        System.out.println("Session created in future time " + stat.getCtime() + " to be deleted!");
        safeDeleteZKNode(path, stat);
    }
}
System.out.println(count + " future sessions cleaned!");

Tracking Hadoop job status via web interface? (Exposing Hadoop to internal clients in the company)

I want to develop a website that will allow analysts within the company to run Hadoop jobs (chosen from a set of predefined jobs) and see their job's status/progress.
Is there an easy way to do this (get running job statuses, etc.) via Ruby/Python?
How do you expose your Hadoop cluster to internal clients in your company?
I have found one way to get information about jobs from the JobTracker. Here is the code:
Configuration conf = new Configuration();
conf.set("mapred.job.tracker", "URL");
JobClient client = new JobClient(new JobConf(conf));
JobStatus[] jobStatuses = client.getAllJobs();
for (JobStatus jobStatus : jobStatuses) {
    long lastTaskEndTime = 0L;
    TaskReport[] mapReports = client.getMapTaskReports(jobStatus.getJobID());
    for (TaskReport r : mapReports) {
        if (lastTaskEndTime < r.getFinishTime()) {
            lastTaskEndTime = r.getFinishTime();
        }
    }
    TaskReport[] reduceReports = client.getReduceTaskReports(jobStatus.getJobID());
    for (TaskReport r : reduceReports) {
        if (lastTaskEndTime < r.getFinishTime()) {
            lastTaskEndTime = r.getFinishTime();
        }
    }
    client.getSetupTaskReports(jobStatus.getJobID());
    client.getCleanupTaskReports(jobStatus.getJobID());
    System.out.println("JobID: " + jobStatus.getJobID().toString() +
        ", username: " + jobStatus.getUsername() +
        ", startTime: " + jobStatus.getStartTime() +
        ", endTime: " + lastTaskEndTime +
        ", duration: " + (lastTaskEndTime - jobStatus.getStartTime()));
}
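On newer, YARN-based clusters there is a lighter-weight alternative to the Java client: the ResourceManager exposes a REST API under /ws/v1/cluster/apps that can be polled from Python, which fits the Ruby/Python ask above. Here is a minimal sketch, assuming the ResourceManager web UI is reachable at resourcemanager:8088 (hostname and port are placeholders for your cluster):

```python
import json
from urllib.request import urlopen

# Placeholder; point this at your ResourceManager's web address.
RM_URL = "http://resourcemanager:8088"

def summarize_apps(payload):
    """Reduce the ResourceManager's JSON payload to (id, user, state, progress) tuples."""
    # The RM returns {"apps": null} when no applications exist.
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a["id"], a["user"], a["state"], a["progress"]) for a in apps]

def fetch_apps():
    """Fetch and summarize all applications known to the ResourceManager."""
    with urlopen(RM_URL + "/ws/v1/cluster/apps") as resp:
        return summarize_apps(json.load(resp))
```

A website backend could call fetch_apps() periodically and render the resulting tuples, avoiding any dependency on the Hadoop Java libraries.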
Since the 'beta 2' version of Cloudera's Hadoop Distribution, you can use Hue (Hadoop User Experience), formerly called Cloudera Desktop, almost with no effort.
Since that version it has grown enormously: it comes with a job designer, a Hive interface, and much more. You should definitely check it out before deciding to build your own application.
Maybe a good place to start would be to take a look at Cloudera Desktop. It provides a web interface for cluster administration and job development tasks. It's free to download.
There is nothing like this that ships with Hadoop, but it should be trivial to build this functionality: some of it is available via the JobTracker's web page, and some you will have to build yourself.
