Spring Batch - ItemStreamException: Output file was not created

I have the following FlatFileItemWriter defined in a multi-threaded step.
public FlatFileItemWriter<School> writer() throws Exception {
    FlatFileItemWriter<School> flatFileWriter = new FlatFileItemWriter<School>();
    flatFileWriter.setResource(new FileSystemResource("C:\\u01\\SchoolDetails.txt"));
    flatFileWriter.setName("School-File-Writer");
    flatFileWriter.setAppendAllowed(true);
    flatFileWriter.setLineSeparator("\n");
    flatFileWriter.setHeaderCallback(writer -> writer.write(columnHeaders()));
    flatFileWriter.setLineAggregator(new DelimitedLineAggregator<School>() {
        {
            setDelimiter("^");
            setFieldExtractor((FieldExtractor<School>) schoolFieldExtractor());
        }
    });
    return flatFileWriter;
}
private BeanWrapperFieldExtractor<School> schoolFieldExtractor() {
    return new BeanWrapperFieldExtractor<School>() {
        {
            String[] columnValuesMapper = new String[] { "schoolName", "schoolAddress" };
            setNames(columnValuesMapper);
        }
    };
}
The ItemWriter generates the files on most days, but once in a while it throws the following error:
2022-02-14 22:07:46.652 [SimpleAsyncTaskExecutor-25] INFO SpringBatchConfiguration:703 - Item Reader
2022-02-14 22:07:46.653 [SimpleAsyncTaskExecutor-25] INFO PagingItemReader:80 - reading records 1 to 10
2022-02-14 22:07:46.657 [SimpleAsyncTaskExecutor-28] INFO PagingItemReader:80 - reading records 11 to 20
2022-02-14 22:07:46.661 [SimpleAsyncTaskExecutor-27] INFO PagingItemReader:80 - reading records 21 to 30
2022-02-14 22:07:46.665 [SimpleAsyncTaskExecutor-26] INFO PagingItemReader:80 - reading records 31 to 40
2022-02-14 22:07:46.998 [SimpleAsyncTaskExecutor-25] INFO o.s.batch.core.step.AbstractStep:272 - Step: [childStep:partition1] executed in 350ms
2022-02-14 22:07:47.005 [SimpleAsyncTaskExecutor-28] INFO o.s.batch.core.step.AbstractStep:272 - Step: [childStep:partition3] executed in 357ms
2022-02-14 22:07:47.033 [SimpleAsyncTaskExecutor-27] ERROR o.s.batch.core.step.AbstractStep:237 - Encountered an error executing step childStep in School-Job-Process
org.springframework.batch.item.ItemStreamException: Output file was not created: [/u01/TotalRecordsFound-20220214.txt]
at org.springframework.batch.item.util.FileUtils.setUpOutputFile(FileUtils.java:76)
at org.springframework.batch.item.support.AbstractFileItemWriter$OutputState.initializeBufferedWriter(AbstractFileItemWriter.java:553)
at org.springframework.batch.item.support.AbstractFileItemWriter$OutputState.access$000(AbstractFileItemWriter.java:385)
at org.springframework.batch.item.support.AbstractFileItemWriter.doOpen(AbstractFileItemWriter.java:319)
at org.springframework.batch.item.support.AbstractFileItemWriter.open(AbstractFileItemWriter.java:309)
at org.springframework.batch.item.support.AbstractFileItemWriter$$FastClassBySpringCGLIB$$f2d35c3.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:771)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:136)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:124)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:691)
at org.springframework.batch.item.file.FlatFileItemWriter$$EnhancerBySpringCGLIB$$294bdfee.open(<generated>)
at org.springframework.batch.item.support.CompositeItemStream.open(CompositeItemStream.java:103)
at org.springframework.batch.core.step.tasklet.TaskletStep.open(TaskletStep.java:311)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:205)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:138)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:135)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:834)
The error occurs intermittently, when two or more threads collide while creating and writing data to the file. I can avoid it by delegating my FlatFileItemWriter to a SynchronizedItemStreamWriter (see the sketch below). But the Spring docs suggest otherwise: they say that using a FlatFileItemWriter in a multi-threaded step does NOT require synchronizing writes.
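For reference, here is roughly how I delegate to the SynchronizedItemStreamWriter when I apply the workaround (a minimal sketch; the bean name and wiring are simplified):
public SynchronizedItemStreamWriter<School> synchronizedWriter() throws Exception {
    // Wraps the FlatFileItemWriter above so that concurrent write calls
    // from the partition threads are serialized on a single lock.
    SynchronizedItemStreamWriter<School> synchronizedWriter = new SynchronizedItemStreamWriter<>();
    synchronizedWriter.setDelegate(writer());
    return synchronizedWriter;
}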
So I am not sure how I can avoid these errors. Also, according to the logs, the first two partitions completed successfully, which means the file was created and data was written to it (if any existed). So how can the third partition report that the file was not created when it was already created by the first two partitions?
Any help would be appreciated. Thanks in advance.

Related

java.io.IOException: invalid stored block lengths

I have one zip that contains two JSON files. When I process those files, I get the following error: the first file is processed properly, but the second file throws a ZipException.
How can I fix this?
My code:
private Path archivePath = Paths.get("src/test/resources/data/files.zip");

public void uploadData() throws IOException {
    FileSystem fileFS = FileSystems.newFileSystem(archivePath, (ClassLoader) null);
    String[] pathNames = { "file_1.json", "file_2.json" };
    for (String pathName : pathNames) {
        // unzip files
        Path path = fileFS.getPath(pathName);
        Request request = new Request("POST", "/_bulk");
        RequestOptions.Builder options = RequestOptions.DEFAULT.toBuilder();
        options.addHeader("Content-Type", "application/x-ndjson");
        request.setOptions(options);
        request.setEntity(new InputStreamEntity(Files.newInputStream(path)));
        Response response = client.performRequest(request);
        assertResponseSuccessful(response, false);
    }
}
Error:
invalid stored block lengths
java.io.IOException: invalid stored block lengths
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:828)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:134)
at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:597)
at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:173)
at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:816)
at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:146)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at org.testng.TestRunner.privateRun(TestRunner.java:766)
at org.testng.TestRunner.run(TestRunner.java:587)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
at org.testng.SuiteRunner.access$000(SuiteRunner.java:28)
at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:425)
at org.testng.internal.thread.ThreadUtil.lambda$execute$0(ThreadUtil.java:68)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.zip.ZipException: invalid stored block lengths
at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:165)
at java.base/java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:388)
at org.apache.http.nio.entity.EntityAsyncContentProducer.produceContent(EntityAsyncContentProducer.java:67)
at org.apache.http.nio.protocol.BasicAsyncRequestProducer.produceContent(BasicAsyncRequestProducer.java:125)
at org.apache.http.impl.nio.client.MainClientExec.produceContent(MainClientExec.java:262)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.produceContent(DefaultClientExchangeHandlerImpl.java:140)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.outputReady(HttpAsyncRequestExecutor.java:249)
at org.apache.http.impl.nio.client.InternalRequestExecutor.outputReady(InternalRequestExecutor.java:96)
at org.apache.http.impl.nio.DefaultNHttpClientConnection.produceOutput(DefaultNHttpClientConnection.java:290)
at org.apache.http.impl.nio.client.InternalIODispatch.onOutputReady(InternalIODispatch.java:86)
at org.apache.http.impl.nio.client.InternalIODispatch.onOutputReady(InternalIODispatch.java:39)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.outputReady(AbstractIODispatch.java:145)
at org.apache.http.impl.nio.reactor.BaseIOReactor.writable(BaseIOReactor.java:187)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:341)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
... 1 more
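In case it helps frame an answer: one variation I am considering (a minimal sketch, assuming the entries fit in memory) is to buffer each entry before attaching it to the request, since InputStreamEntity is not repeatable and the entry would then be fully inflated up front rather than streamed by the async HTTP client:
// Hypothetical change inside the loop above:
byte[] payload = Files.readAllBytes(path); // inflates the whole entry here, so any zip error surfaces immediately
request.setEntity(new ByteArrayEntity(payload, ContentType.create("application/x-ndjson")));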
Any inputs here are really appreciated.


OKHttpClient - Socket Read timed out issue

I am getting a lot of socket read timeouts when I try to read the response from my POST request. I am using OkHttpClient version 3.11.0 with the following configuration:
@Bean
OkHttpClient okHttpClient() {
    def loggingInterceptor = new HttpLoggingInterceptor()
    loggingInterceptor.setLevel(HttpLoggingInterceptor.Level.BODY)
    Dispatcher dispatcher = new Dispatcher()
    dispatcher.setMaxRequests(200)
    dispatcher.setMaxRequestsPerHost(200)
    return new OkHttpClient()
            .newBuilder()
            .eventListenerFactory(PrintingEventListener.FACTORY)
            .retryOnConnectionFailure(true)
            .connectTimeout(30000, TimeUnit.MILLISECONDS)
            .readTimeout(30000, TimeUnit.MILLISECONDS)
            .dispatcher(dispatcher)
            .connectionPool(new ConnectionPool(200, 30, TimeUnit.SECONDS))
            .addNetworkInterceptor(loggingInterceptor)
            .addInterceptor(loggingInterceptor)
            .build()
}
The code to process the request and response is given below:
Response r = client.newCall(request).execute()
r.withCloseable { response ->
    def returnable
    if (response.header('Content-Type')?.contains('application/json')) {
        def body = response.body().string()
        if (body.trim().isEmpty()) {
            returnable = [:]
        } else {
            try {
                def result = new JsonSlurper().parseText(body)
                return result
            } catch (any) {
                log.error('Failed to parse json response: ' + body, any.message)
                throw any
            }
        }
    } else if (response.header('Content-Type')?.contains('image/jpeg')) {
        returnable = response.body().bytes()
    } else {
        returnable = response.body().string()
    }
    return returnable
}
With an event listener added, I can see that the client fires the responseBodyStart event and then hangs there; when the timeout is reached, the call fails and throws a socket read timeout exception. Is there anything missing in my configuration?
The event listener shows responseHeadersStart and responseBodyStart, followed by connectionReleased once the specified timeout (30s) is reached.
Please find the event trace and exception trace below:
INFO 11097 : 2.1701E-5 -- callStart
INFO 11097 : 1.44572E-4 -- connectionAcquired
INFO 11097 : 0.001036047 -- requestHeadersStart
INFO 11097 : 0.001064492 -- requestHeadersEnd
INFO 11097 : 0.001084433 -- requestBodyStart
INFO 11097 : 0.001103787 -- requestBodyEnd
INFO 11097 : 0.001279736 -- responseHeadersStart
INFO 11097 : 1.007175496 -- responseHeadersEnd
INFO 11097 : 1.007247928 -- responseBodyStart
INFO 11097 : 31.082725087 -- connectionReleased
INFO 11097 : 31.083717147 -- callFailed
INFO 11097 : 31.092341876 --
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at okio.Okio$2.read(Okio.java:140)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
at okio.RealBufferedSource.request(RealBufferedSource.java:68)
at okio.RealBufferedSource.require(RealBufferedSource.java:61)
at okio.RealBufferedSource.readHexadecimalUnsignedLong(RealBufferedSource.java:304)
at okhttp3.internal.http1.Http1Codec$ChunkedSource.readChunkSize(Http1Codec.java:469)
at okhttp3.internal.http1.Http1Codec$ChunkedSource.read(Http1Codec.java:449)
at okio.RealBufferedSource.request(RealBufferedSource.java:68)
at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:241)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:213)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200)
at okhttp3.RealCall.execute(RealCall.java:77)
at okhttp3.Call$execute.call(Unknown Source)
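For comparison, this is the stripped-down client I intend to test against, written out in Java (a hypothetical sketch: same timeouts and pool, but without the BODY-level logging interceptor, since the stack trace shows the read happening inside HttpLoggingInterceptor):
// Hypothetical simplified client: no logging interceptors, no event listener.
OkHttpClient client = new OkHttpClient.Builder()
        .retryOnConnectionFailure(true)
        .connectTimeout(30, TimeUnit.SECONDS)
        .readTimeout(30, TimeUnit.SECONDS)
        .connectionPool(new ConnectionPool(200, 30, TimeUnit.SECONDS))
        .build();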

Spark Kafka Receiver is not picking data from all partitions

I have created a Kafka topic with 5 partitions, and I am using the createStream receiver API as follows. But somehow only one receiver is getting the input data; the rest of the receivers are not processing anything. Can you please help?
JavaPairDStream<String, String> messages = null;
if (sparkStreamCount > 0) {
    // We create an input DStream for each partition of the topic, unify those streams, and then repartition the unified stream.
    List<JavaPairDStream<String, String>> kafkaStreams = new ArrayList<JavaPairDStream<String, String>>(sparkStreamCount);
    for (int i = 0; i < sparkStreamCount; i++) {
        kafkaStreams.add(KafkaUtils.createStream(jssc, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap));
    }
    messages = jssc.union(kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size()));
} else {
    messages = KafkaUtils.createStream(jssc, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap);
}
After adding these changes, I am getting the following exceptions:
INFO : org.apache.spark.streaming.kafka.KafkaReceiver - Connected to localhost:2181
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Stopping receiver with message: Error starting receiver 0: java.lang.AssertionError: assertion failed
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Called receiver onStop
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Deregistering receiver 0
ERROR: org.apache.spark.streaming.scheduler.ReceiverTracker - Deregistered receiver for stream 0: Error starting receiver 0 - java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at kafka.consumer.TopicCount$$anonfun$makeConsumerThreadIdsPerTopic$2.apply(TopicCount.scala:36)
at kafka.consumer.TopicCount$$anonfun$makeConsumerThreadIdsPerTopic$2.apply(TopicCount.scala:34)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.consumer.TopicCount$class.makeConsumerThreadIdsPerTopic(TopicCount.scala:34)
at kafka.consumer.StaticTopicCount.makeConsumerThreadIdsPerTopic(TopicCount.scala:100)
at kafka.consumer.StaticTopicCount.getConsumerThreadIdsPerTopic(TopicCount.scala:104)
at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:198)
at kafka.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:138)
at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:111)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1986)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1986)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Stopped receiver 0
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Stopping BlockGenerator
INFO : org.apache.spark.streaming.util.RecurringTimer - Stopped timer for BlockGenerator after time 1473964037200
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Waiting for block pushing thread to terminate
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Pushing out the last 0 blocks
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Stopped block pushing thread
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Stopped BlockGenerator
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Waiting for receiver to be stopped
ERROR: org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Stopped receiver with error: java.lang.AssertionError: assertion failed
ERROR: org.apache.spark.executor.Executor - Exception in task 0.0 in stage 29.0
There is one issue with the above code. The kafkaTopicMap parameter of the KafkaUtils.createStream method specifies a map of (topic_name -> numPartitions) to consume, and each partition is consumed in its own thread.
Try the code below:
JavaPairDStream<String, String> messages = null;
int sparkStreamCount = 5;
Map<String, Integer> kafkaTopicMap = new HashMap<String, Integer>();
if (sparkStreamCount > 0) {
    List<JavaPairDStream<String, String>> kafkaStreams = new ArrayList<JavaPairDStream<String, String>>(sparkStreamCount);
    for (int i = 0; i < sparkStreamCount; i++) {
        kafkaTopicMap.put(topic, i + 1);
        kafkaStreams.add(KafkaUtils.createStream(streamingContext, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap));
    }
    messages = streamingContext.union(kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size()));
} else {
    messages = KafkaUtils.createStream(streamingContext, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap);
}

Unable to run distributed shell on YARN

I am trying to run the distributed shell example on a YARN cluster.
@Test
public void realClusterTest() throws Exception {
    System.setProperty("HADOOP_USER_NAME", "hdfs");
    String[] args = {
        "--jar", APPMASTER_JAR,
        "--num_containers", "1",
        "--shell_command", "ls",
        "--master_memory", "512",
        "--container_memory", "128"
    };
    LOG.info("Initializing DS Client");
    Client client = new Client(new Configuration());
    boolean initSuccess = client.init(args);
    Assert.assertTrue(initSuccess);
    LOG.info("Running DS Client");
    boolean result = client.run();
    LOG.info("Client run completed. Result=" + result);
    Assert.assertTrue(result);
}
But it fails with:
2013-09-17 11:45:28,338 INFO [main] distributedshell.Client (Client.java:monitorApplication(600)) - Got application report from ASM for, appId=11, clientToAMToken=null, appDiagnostics=Application application_1379338026167_0011 failed 2 times due to AM Container for appattempt_1379338026167_0011_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
at org.apache.hadoop.util.Shell.run(Shell.java:373)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
................
.Failing this attempt.. Failing the application., appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1379407525237, yarnAppState=FAILED, distributedFinalState=FAILED, appTrackingUrl=ip-10-232-149-222.us-west-2.compute.internal:8088/proxy/application_1379338026167_0011/, appUser=hdfs
Here is what I see in server logs:
2013-09-17 08:45:26,870 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(213)) - Exception from container-launch with container ID: container_1379338026167_0011_02_000001 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
at org.apache.hadoop.util.Shell.run(Shell.java:373)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:258)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:74)
The question is: how can I get more details to identify what is going wrong?
PS: we are using HDP 2.0.5
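For reference, I assume the container stdout/stderr for the failed attempt can be pulled with the standard YARN CLI (assuming log aggregation is enabled on the cluster), using the appId from the report above:
yarn logs -applicationId application_1379338026167_0011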
