Spark Kafka receiver is not picking up data from all partitions - spark-streaming

I have created a Kafka topic with 5 partitions, and I am using the createStream receiver API as follows. But somehow only one receiver is getting the input data; the rest of the receivers are not processing anything. Can you please help?
JavaPairDStream<String, String> messages = null;
if (sparkStreamCount > 0) {
    // We create an input DStream for each partition of the topic, unify those streams, and then repartition the unified stream.
    List<JavaPairDStream<String, String>> kafkaStreams = new ArrayList<JavaPairDStream<String, String>>(sparkStreamCount);
    for (int i = 0; i < sparkStreamCount; i++) {
        kafkaStreams.add(KafkaUtils.createStream(jssc, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap));
    }
    messages = jssc.union(kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size()));
} else {
    messages = KafkaUtils.createStream(jssc, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap);
}
After adding these changes I am getting the following exception:
INFO : org.apache.spark.streaming.kafka.KafkaReceiver - Connected to localhost:2181
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Stopping receiver with message: Error starting receiver 0: java.lang.AssertionError: assertion failed
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Called receiver onStop
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Deregistering receiver 0
ERROR: org.apache.spark.streaming.scheduler.ReceiverTracker - Deregistered receiver for stream 0: Error starting receiver 0 - java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at kafka.consumer.TopicCount$$anonfun$makeConsumerThreadIdsPerTopic$2.apply(TopicCount.scala:36)
at kafka.consumer.TopicCount$$anonfun$makeConsumerThreadIdsPerTopic$2.apply(TopicCount.scala:34)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.consumer.TopicCount$class.makeConsumerThreadIdsPerTopic(TopicCount.scala:34)
at kafka.consumer.StaticTopicCount.makeConsumerThreadIdsPerTopic(TopicCount.scala:100)
at kafka.consumer.StaticTopicCount.getConsumerThreadIdsPerTopic(TopicCount.scala:104)
at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:198)
at kafka.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:138)
at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:111)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1986)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1986)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Stopped receiver 0
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Stopping BlockGenerator
INFO : org.apache.spark.streaming.util.RecurringTimer - Stopped timer for BlockGenerator after time 1473964037200
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Waiting for block pushing thread to terminate
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Pushing out the last 0 blocks
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Stopped block pushing thread
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Stopped BlockGenerator
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Waiting for receiver to be stopped
ERROR: org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Stopped receiver with error: java.lang.AssertionError: assertion failed
ERROR: org.apache.spark.executor.Executor - Exception in task 0.0 in stage 29.0

There is one issue with the above code: the kafkaTopicMap parameter of the KafkaUtils.createStream method specifies a map of (topic_name -> numPartitions) to consume, and each partition is consumed in its own thread. The assertion in the stack trace fails when the map does not give the topic at least one consumer thread.
Try the code below:
JavaPairDStream<String, String> messages = null;
int sparkStreamCount = 5;
Map<String, Integer> kafkaTopicMap = new HashMap<String, Integer>();

if (sparkStreamCount > 0) {
    // One consumer thread per stream; the five streams together cover the five partitions.
    kafkaTopicMap.put(topic, 1);
    List<JavaPairDStream<String, String>> kafkaStreams = new ArrayList<JavaPairDStream<String, String>>(sparkStreamCount);
    for (int i = 0; i < sparkStreamCount; i++) {
        kafkaStreams.add(KafkaUtils.createStream(streamingContext, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap));
    }
    messages = streamingContext.union(kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size()));
} else {
    kafkaTopicMap.put(topic, 1);
    messages = KafkaUtils.createStream(streamingContext, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap);
}
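For comparison, the same parameter can also be used to fan consumer threads out inside a single receiver instead of creating several streams. A minimal sketch reusing streamingContext and contextVal from above (the topic name and thread count are illustrative):
// Hypothetical sketch: one receiver whose Kafka consumer uses five threads,
// i.e. one thread per partition of a five-partition topic.
Map<String, Integer> singleReceiverTopicMap = new HashMap<String, Integer>();
singleReceiverTopicMap.put("my-topic", 5); // "my-topic" is a placeholder topic name
JavaPairDStream<String, String> singleStream = KafkaUtils.createStream(
        streamingContext,
        contextVal.getString(KAFKA_ZOOKEEPER),
        contextVal.getString(KAFKA_GROUP_ID),
        singleReceiverTopicMap);
Note that all five threads still run inside one receiver on a single executor; the multi-stream version above is what spreads the receiving load across executors.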

Related

Debezium - Oracle Connector - Service Not Starting

DebeziumEngine is looking for a Kafka topic even though I have not specified KafkaOffsetBackingStore for offset.storage
Reference: DebeziumEngine Config
Config:
Configuration config = Configuration.create()
        .with("name", "oracle_debezium_connector")
        .with("connector.class", "io.debezium.connector.oracle.OracleConnector")
        .with("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore")
        .with("offset.storage.file.filename", "/Users/dk/Documents/work/ACET/offset.dat")
        .with("offset.flush.interval.ms", 2000)
        .with("database.hostname", "localhost")
        .with("database.port", "1521")
        .with("database.user", "pravin")
        .with("database.password", "*****")
        .with("database.sid", "ORCLCDB")
        .with("database.server.name", "mServer")
        .with("database.out.server.name", "dbzxout")
        .with("database.history", "io.debezium.relational.history.FileDatabaseHistory")
        .with("database.history.file.filename", "/Users/dk/Documents/work/ACET/dbhistory.dat")
        .with("topic.prefix", "cycowner")
        .with("database.dbname", "ORCLCDB")
        .build();
DebeziumEngine
DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
        .using(config.asProperties())
        .using(connectorCallback)
        .using(completionCallback)
        .notifying(record -> {
            System.out.println(record);
        })
        .build();
Error :
2022-10-29T16:06:16,457 ERROR [pool-2-thread-1] i.d.c.Configuration: The 'schema.history.internal.kafka.topic' value is invalid: A value is required
2022-10-29T16:06:16,457 ERROR [pool-2-thread-1] i.d.c.Configuration: The 'schema.history.internal.kafka.bootstrap.servers' value is invalid: A value is required
2022-10-29T16:06:16,458 INFO [pool-2-thread-1] i.d.c.c.BaseSourceTask: Stopping down connector
2022-10-29T16:06:16,463 INFO [pool-3-thread-1] i.d.j.JdbcConnection: Connection gracefully closed
2022-10-29T16:06:16,465 INFO [pool-2-thread-1] o.a.k.c.s.FileOffsetBackingStore: Stopped FileOffsetBackingStore
connector stopped successfully
---------------------------------------------------
success status: false, message : Unable to initialize and start connector's task class 'io.debezium.connector.oracle.OracleConnectorTask' with config: {connector.class=io.debezium.connector.oracle.OracleConnector, database.history.file.filename=/Users/dkuma416/Documents/work/ACET/dbhistory.dat, database.user=pravin, database.dbname=ORCLCDB, offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore, database.server.name=mServer, offset.flush.timeout.ms=5000, errors.retry.delay.max.ms=10000, database.port=1521, database.sid=ORCLCDB, offset.flush.interval.ms=2000, topic.prefix=cycowner, offset.storage.file.filename=/Users/dkuma416/Documents/work/ACET/offset.dat, errors.max.retries=-1, database.hostname=localhost, database.password=********, name=oracle_debezium_connector, database.out.server.name=dbzxout, errors.retry.delay.initial.ms=300, value.converter=org.apache.kafka.connect.json.JsonConverter, key.converter=org.apache.kafka.connect.json.JsonConverter, database.history=io.debezium.relational.history.MemoryDatabaseHistory}, Error: Error configuring an instance of KafkaSchemaHistory; check the logs for details
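The two "value is invalid" lines suggest that this Debezium version expects the newer schema.history.internal.* properties (the successors of database.history.*) and, since none are set, falls back to the Kafka-based schema history, which then demands a topic and bootstrap servers. A minimal sketch of the file-based variant, assuming the debezium-storage-file module is on the classpath (class name and file path are illustrative, not taken from the original post):
// Hypothetical sketch: use a file-based schema history instead of the default
// Kafka-based one, so no schema.history.internal.kafka.* values are required.
Configuration config = Configuration.create()
        // ... connector, offset.storage and database.* settings as above ...
        .with("schema.history.internal", "io.debezium.storage.file.history.FileSchemaHistory")
        .with("schema.history.internal.file.filename", "/Users/dk/Documents/work/ACET/schemahistory.dat")
        .build();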

RabbitMQ channel.addConfirmListener(): some ackCallback callbacks seem to be missing?

This is my code. With channel.addConfirmListener(), some ackCallback invocations appear to be lost, even though the messages are definitely delivered to the RabbitMQ server and can be consumed normally. However, if I sleep for 2 ms after sending each message, all ack callbacks are received.
I don't know whether this is an error in my code or a RabbitMQ bug.
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import lombok.extern.log4j.Log4j2;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeoutException;

@Log4j2
public class 异步确认发布 { // class name means "asynchronous publisher confirms"
    public static void main(String[] args) throws IOException, TimeoutException, InterruptedException {
        ConnectionFactory connectionFactory = new ConnectionFactory();
        connectionFactory.setHost("");
        connectionFactory.setPort(7005);
        connectionFactory.setUsername("");
        connectionFactory.setPassword("");
        Connection connection = connectionFactory.newConnection();
        Channel channel = connection.createChannel();
        // enable publisher confirms
        AMQP.Confirm.SelectOk selectOk = channel.confirmSelect();
        channel.queueDeclare("hello", true, false, false, null);
        // callbacks for asynchronous publisher confirms
        channel.addConfirmListener(
                (deliveryTag, multiple) -> {
                    log.info("message deliveryTag=>{}, send successful", deliveryTag);
                },
                (deliveryTag, multiple) -> {
                    log.info("message deliveryTag=>{}, fail in send", deliveryTag);
                }
        );
        for (int i = 0; i < 5; i++) {
            String message = "Hello World!!! " + i;
            channel.basicPublish("", "hello", null, message.getBytes(StandardCharsets.UTF_8));
        }
    }
}
The console shows that some callbacks are missing:
17:04:29.607 [AMQP Connection 27.11.210.232:7005] INFO me.demo.me.rabbitmq.consumer.发布确认.异步确认发布 - ackCallback, deliveryTag=>4, send successful
17:04:29.615 [AMQP Connection 27.11.210.232:7005] INFO me.demo.me.rabbitmq.consumer.发布确认.异步确认发布 - ackCallback, deliveryTag=>5, send successful
But if I sleep for 2 ms after sending each message, all callbacks are received.
Example code:
for (int i = 0; i < 5; i++) {
    String message = "Hello World!!! " + i;
    channel.basicPublish("", "hello", null, message.getBytes(StandardCharsets.UTF_8));
    Thread.sleep(2); // sleeping 2 ms after each publish lets every ack callback arrive
}
console log
17:05:18.037 [AMQP Connection 27.11.210.232:7005] INFO me.demo.me.rabbitmq.consumer.发布确认.异步确认发布 - ackCallback, deliveryTag=>1, send successful
17:05:18.043 [AMQP Connection 27.11.210.232:7005] INFO me.demo.me.rabbitmq.consumer.发布确认.异步确认发布 - ackCallback, deliveryTag=>2, send successful
17:05:18.043 [AMQP Connection 27.11.210.232:7005] INFO me.demo.me.rabbitmq.consumer.发布确认.异步确认发布 - ackCallback, deliveryTag=>3, send successful
17:05:18.043 [AMQP Connection 27.11.210.232:7005] INFO me.demo.me.rabbitmq.consumer.发布确认.异步确认发布 - ackCallback, deliveryTag=>4, send successful
17:05:18.043 [AMQP Connection 27.11.210.232:7005] INFO me.demo.me.rabbitmq.consumer.发布确认.异步确认发布 - ackCallback, deliveryTag=>5, send successful
My RabbitMQ server version is 3.9.14 (no configuration has been modified; the defaults are used), with Erlang 24.3.2.
The Maven project dependency is:
<dependency>
    <groupId>org.springframework.amqp</groupId>
    <artifactId>spring-rabbit</artifactId>
    <version>2.2.18.RELEASE</version>
</dependency>
I tried preventing the main thread from exiting, but that does not seem to be the cause, because the main thread will not shut down on its own once the connection has been created.
I am not sure why you tagged this with spring-rabbit because you are not using the spring-rabbit APIs at all; you are using the amqp-client directly.
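If the plain Java client is all you need, a dependency along these lines would be enough instead of spring-rabbit (the version shown is illustrative, not taken from your build):
<dependency>
    <groupId>com.rabbitmq</groupId>
    <artifactId>amqp-client</artifactId>
    <version>5.14.2</version> <!-- illustrative version -->
</dependency>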
This is working as designed: for performance reasons the confirm callback has an additional multiple argument; when it is true, all tags up to and including the given one are confirmed by a single confirmation.
https://www.rabbitmq.com/tutorials/tutorial-seven-java.html
multiple: this is a boolean value. If false, only one message is confirmed/nack-ed, if true, all messages with a lower or equal sequence number are confirmed/nack-ed.
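If you want per-message bookkeeping, the pattern shown in that tutorial is to record outstanding sequence numbers before publishing and to clear everything up to the confirmed tag when multiple is true. A minimal sketch that reuses the channel, queue and logger from the code above (ConcurrentSkipListMap comes from java.util.concurrent):
// Hypothetical sketch: map each publish sequence number to its message body so a
// single confirm with multiple=true can acknowledge a whole batch at once.
ConcurrentNavigableMap<Long, String> outstanding = new ConcurrentSkipListMap<Long, String>();

channel.addConfirmListener(
        (deliveryTag, multiple) -> {
            if (multiple) {
                // every tag <= deliveryTag is confirmed by this one callback
                outstanding.headMap(deliveryTag, true).clear();
            } else {
                outstanding.remove(deliveryTag);
            }
        },
        (deliveryTag, multiple) -> log.info("nack for deliveryTag=>{}, multiple={}", deliveryTag, multiple)
);

for (int i = 0; i < 5; i++) {
    String message = "Hello World!!! " + i;
    // record the sequence number before publishing so the confirm can be matched later
    outstanding.put(channel.getNextPublishSeqNo(), message);
    channel.basicPublish("", "hello", null, message.getBytes(StandardCharsets.UTF_8));
}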

Saving RDD using a Proprietary OutputFormatter

I am using a proprietary database which provides its own OutputFormatter. Using this OutputFormatter I can write a MapReduce job and save the data from MR into this database.
However, I am now trying to use the OutputFormatter inside Spark to save an RDD to the database.
The code I have written is:
object VerticaSpark extends App {
  val scConf = new SparkConf
  val sc = new SparkContext(scConf)
  val conf = new Configuration()
  val job = new Job(conf)
  job.setInputFormatClass(classOf[VerticaInputFormat])
  job.setOutputKeyClass(classOf[Text])
  job.setOutputValueClass(classOf[VerticaRecord])
  job.setOutputFormatClass(classOf[VerticaOutputFormat])
  VerticaInputFormat.setInput(job, "select * from Foo where key = ?", "1", "2", "3", "4")
  VerticaOutputFormat.setOutput(job, "Bar", true, "name varchar", "total int")
  val rddVR: RDD[VerticaRecord] =
    sc.newAPIHadoopRDD(job.getConfiguration, classOf[VerticaInputFormat], classOf[LongWritable], classOf[VerticaRecord]).map(_._2)
  val rddTup = rddVR.map(x => (x.get(1).toString(), x.get(2).toString().toInt))
  val rddGroup = rddTup.reduceByKey(_ + _)
  val rddVROutput = rddGroup.map({
    case (x, y) => (new Text("Bar"), getVerticaRecord(x, y, job.getConfiguration))
  })
  //rddVROutput.saveAsNewAPIHadoopFile("Bar", classOf[Text], classOf[VerticaRecord], classOf[VerticaOutputFormat], job.getConfiguration)
  rddVROutput.saveAsNewAPIHadoopDataset(job.getConfiguration)

  def getVerticaRecord(name: String, value: Int, conf: Configuration): VerticaRecord = {
    val retVal = new VerticaRecord(conf)
    //println(s"going to build Vertica Record with ${name} and ${value}")
    retVal.set(0, new Text(name))
    retVal.set(1, new IntWritable(value))
    retVal
  }
}
The entire solution can be downloaded from here:
https://github.com/abhitechdojo/VerticaSpark.git
My code works perfectly until the saveAsNewAPIHadoopFile call is reached. At that line it throws a NullPointerException.
The same logic and the same input and output formatters work perfectly in a MapReduce program, and I can write to the DB successfully using the MR program:
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/HadoopIntegrationGuide/HadoopConnector/ExampleHadoopConnectorApplication.htm%3FTocPath%3DIntegrating%2520with%2520Hadoop%7CUsing%2520the%2520%2520MapReduce%2520Connector%7C_____7
The stack trace of the error is
16/01/15 16:42:53 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 5, machine): java.lang.NullPointerException
at com.abhi.VerticaSpark$$anonfun$4.apply(VerticaSpark.scala:39)
at com.abhi.VerticaSpark$$anonfun$4.apply(VerticaSpark.scala:38)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:999)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 12, machine): java.lang.NullPointerException
at com.abhi.VerticaSpark$$anonfun$4.apply(VerticaSpark.scala:39)
at com.abhi.VerticaSpark$$anonfun$4.apply(VerticaSpark.scala:38)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:999)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
16/01/15 16:42:54 INFO TaskSetManager: Lost task 3.1 in stage 1.0 (TID 11) on executor machine: java.lang.NullPointerException (null) [duplicate 7]

Keeping the connection to a WebSocket alive when using ServerWebSocketContainer

I am trying to create a WebSocket-based application where the server needs to keep the connections to the clients alive using heartbeats.
I looked at the ServerWebSocketContainer.SockJsServiceOptions class for this, but could not figure out how to use it. I am using the code from the spring-integration sample:
@Bean
ServerWebSocketContainer serverWebSocketContainer() {
    return new ServerWebSocketContainer("/messages").withSockJs();
}

@Bean
MessageHandler webSocketOutboundAdapter() {
    return new WebSocketOutboundMessageHandler(serverWebSocketContainer());
}

@Bean(name = "webSocketFlow.input")
MessageChannel requestChannel() {
    return new DirectChannel();
}

@Bean
IntegrationFlow webSocketFlow() {
    return f -> {
        Function<Message, Object> splitter = m -> serverWebSocketContainer()
                .getSessions()
                .keySet()
                .stream()
                .map(s -> MessageBuilder.fromMessage(m)
                        .setHeader(SimpMessageHeaderAccessor.SESSION_ID_HEADER, s)
                        .build())
                .collect(Collectors.toList());
        f.split(Message.class, splitter)
                .channel(c -> c.executor(Executors.newCachedThreadPool()))
                .handle(webSocketOutboundAdapter());
    };
}

@RequestMapping("/hi/{name}")
public void send(@PathVariable String name) {
    requestChannel().send(MessageBuilder.withPayload(name).build());
}
Please let me know how I can set the heartbeat options to ensure the connection is kept alive unless the client de-registers itself.
Thanks.
Actually you got it right, but missed a bit of convenience :-).
You can configure it like this:
@Bean
ServerWebSocketContainer serverWebSocketContainer() {
    return new ServerWebSocketContainer("/messages")
            .withSockJs(new ServerWebSocketContainer.SockJsServiceOptions()
                    .setHeartbeatTime(60_000));
}
Although it isn't clear to me why you need to configure it at all, given this:
/**
 * The amount of time in milliseconds when the server has not sent any
 * messages and after which the server should send a heartbeat frame to the
 * client in order to keep the connection from breaking.
 * <p>The default value is 25,000 (25 seconds).
 */
public SockJsServiceRegistration setHeartbeatTime(long heartbeatTime) {
    this.heartbeatTime = heartbeatTime;
    return this;
}
UPDATE
In the Spring Integration Samples we have the stomp-chat application.
I have added something like this to the stomp-server.xml there:
<int-websocket:server-container id="serverWebSocketContainer" path="/chat">
    <int-websocket:sockjs heartbeat-time="10000"/>
</int-websocket:server-container>
Added this to the application.properties:
logging.level.org.springframework.web.socket.sockjs.transport.session=trace
And this to the index.html:
sock.onheartbeat = function() {
    console.log('heartbeat');
};
After connecting the client I see this in the server log:
2015-10-13 19:03:06.574 TRACE 7960 --- [ SockJS-3] s.w.s.s.t.s.WebSocketServerSockJsSession : Writing SockJsFrame content='h'
2015-10-13 19:03:06.574 TRACE 7960 --- [ SockJS-3] s.w.s.s.t.s.WebSocketServerSockJsSession : Cancelling heartbeat in session sogfe2dn
2015-10-13 19:03:06.574 TRACE 7960 --- [ SockJS-3] s.w.s.s.t.s.WebSocketServerSockJsSession : Scheduled heartbeat in session sogfe2dn
2015-10-13 19:03:16.576 TRACE 7960 --- [ SockJS-8] s.w.s.s.t.s.WebSocketServerSockJsSession : Preparing to write SockJsFrame content='h'
2015-10-13 19:03:16.576 TRACE 7960 --- [ SockJS-8] s.w.s.s.t.s.WebSocketServerSockJsSession : Writing SockJsFrame content='h'
2015-10-13 19:03:16.576 TRACE 7960 --- [ SockJS-8] s.w.s.s.t.s.WebSocketServerSockJsSession : Cancelling heartbeat in session sogfe2dn
2015-10-13 19:03:16.576 TRACE 7960 --- [ SockJS-8] s.w.s.s.t.s.WebSocketServerSockJsSession : Scheduled heartbeat in session sogfe2dn
In the browser's console I then see the corresponding 'heartbeat' log entries.
So, it looks like the heartbeat feature works well...

Unable to run distributed shell on YARN

I am trying to run the distributed shell example on a YARN cluster:
@Test
public void realClusterTest() throws Exception {
    System.setProperty("HADOOP_USER_NAME", "hdfs");
    String[] args = {
            "--jar",
            APPMASTER_JAR,
            "--num_containers",
            "1",
            "--shell_command",
            "ls",
            "--master_memory",
            "512",
            "--container_memory",
            "128"
    };
    LOG.info("Initializing DS Client");
    Client client = new Client(new Configuration());
    boolean initSuccess = client.init(args);
    Assert.assertTrue(initSuccess);
    LOG.info("Running DS Client");
    boolean result = client.run();
    LOG.info("Client run completed. Result=" + result);
    Assert.assertTrue(result);
}
But it fails with:
2013-09-17 11:45:28,338 INFO [main] distributedshell.Client (Client.java:monitorApplication(600)) - Got application report from ASM for, appId=11, clientToAMToken=null, appDiagnostics=Application application_1379338026167_0011 failed 2 times due to AM Container for appattempt_1379338026167_0011_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
at org.apache.hadoop.util.Shell.run(Shell.java:373)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
................
.Failing this attempt.. Failing the application., appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1379407525237, yarnAppState=FAILED, distributedFinalState=FAILED, appTrackingUrl=ip-10-232-149-222.us-west-2.compute.internal:8088/proxy/application_1379338026167_0011/, appUser=hdfs
Here is what I see in server logs:
2013-09-17 08:45:26,870 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(213)) - Exception from container-launch with container ID: container_1379338026167_0011_02_000001 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
at org.apache.hadoop.util.Shell.run(Shell.java:373)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:258)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:74)
The question is: how can I get more details to identify what is going wrong?
PS: we are using HDP 2.0.5
