Issue with FlatFileItemWriter in multithreaded step [duplicate]

I have the following FlatFileItemWriter defined in a multi-threaded step.
public FlatFileItemWriter<School> writer() throws Exception {
    FlatFileItemWriter<School> flatFileWriter = new FlatFileItemWriter<School>();
    flatFileWriter.setResource(new FileSystemResource("C:\\u01\\SchoolDetails.txt"));
    flatFileWriter.setName("School-File-Writer");
    flatFileWriter.setAppendAllowed(true);
    flatFileWriter.setLineSeparator("\n");
    flatFileWriter.setHeaderCallback(writer -> writer.write(columnHeaders()));
    flatFileWriter.setLineAggregator(new DelimitedLineAggregator<School>() {
        {
            setDelimiter("^");
            setFieldExtractor((FieldExtractor<School>) schoolFieldExtractor());
        }
    });
    return flatFileWriter;
}
private BeanWrapperFieldExtractor<School> schoolFieldExtractor() {
    return new BeanWrapperFieldExtractor<School>() {
        {
            String[] columnValuesMapper = new String[] {
                "schoolName", "schoolAddress"
            };
            setNames(columnValuesMapper);
        }
    };
}
The ItemWriter generates the files on most days, but once in a while it throws the following error:
2022-02-14 22:07:46.652 [SimpleAsyncTaskExecutor-25] INFO SpringBatchConfiguration:703 - Item Reader
2022-02-14 22:07:46.653 [SimpleAsyncTaskExecutor-25] INFO PagingItemReader:80 - reading records 1 to 10
2022-02-14 22:07:46.657 [SimpleAsyncTaskExecutor-28] INFO PagingItemReader:80 - reading records 11 to 20
2022-02-14 22:07:46.661 [SimpleAsyncTaskExecutor-27] INFO PagingItemReader:80 - reading records 21 to 30
2022-02-14 22:07:46.665 [SimpleAsyncTaskExecutor-26] INFO PagingItemReader:80 - reading records 31 to 40
2022-02-14 22:07:46.998 [SimpleAsyncTaskExecutor-25] INFO o.s.batch.core.step.AbstractStep:272 - Step: [childStep:partition1] executed in 350ms
2022-02-14 22:07:47.005 [SimpleAsyncTaskExecutor-28] INFO o.s.batch.core.step.AbstractStep:272 - Step: [childStep:partition3] executed in 357ms
2022-02-14 22:07:47.033 [SimpleAsyncTaskExecutor-27] ERROR o.s.batch.core.step.AbstractStep:237 - Encountered an error executing step childStep in School-Job-Process
org.springframework.batch.item.ItemStreamException: Output file was not created: [/u01/TotalRecordsFound-20220214.txt]
at org.springframework.batch.item.util.FileUtils.setUpOutputFile(FileUtils.java:76)
at org.springframework.batch.item.support.AbstractFileItemWriter$OutputState.initializeBufferedWriter(AbstractFileItemWriter.java:553)
at org.springframework.batch.item.support.AbstractFileItemWriter$OutputState.access$000(AbstractFileItemWriter.java:385)
at org.springframework.batch.item.support.AbstractFileItemWriter.doOpen(AbstractFileItemWriter.java:319)
at org.springframework.batch.item.support.AbstractFileItemWriter.open(AbstractFileItemWriter.java:309)
at org.springframework.batch.item.support.AbstractFileItemWriter$$FastClassBySpringCGLIB$$f2d35c3.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:771)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:136)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:124)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:691)
at org.springframework.batch.item.file.FlatFileItemWriter$$EnhancerBySpringCGLIB$$294bdfee.open(<generated>)
at org.springframework.batch.item.support.CompositeItemStream.open(CompositeItemStream.java:103)
at org.springframework.batch.core.step.tasklet.TaskletStep.open(TaskletStep.java:311)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:205)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:138)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:135)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:834)
The error occurs intermittently, when two or more threads collide while creating and writing data to the file. I can avoid it by delegating my FlatFileItemWriter to a SynchronizedItemStreamWriter, but the Spring docs suggest otherwise: they say that using a FlatFileItemWriter in a multi-threaded step does NOT require synchronizing writes.
So I am not sure how I can avoid these errors. Also, according to the logs, the first two partitions completed successfully, which means the file was created and data was written to it (if any exists). So how can the third partition report that the file was not created when it was already created by the first two partitions?
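For reference, this is roughly how I would wire that workaround (a minimal sketch only; it assumes the writer() bean defined above is reused as the delegate):
public SynchronizedItemStreamWriter<School> synchronizedWriter() throws Exception {
    // Wrap the FlatFileItemWriter so that open()/update()/write() calls coming
    // from concurrent partitions are serialized on a single lock.
    SynchronizedItemStreamWriter<School> synchronizedWriter = new SynchronizedItemStreamWriter<>();
    synchronizedWriter.setDelegate(writer());
    return synchronizedWriter;
}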
Any help would be appreciated. Thanks in advance.

Related

invalid stored block lengths java.io.IOException: invalid stored block lengths

I have one zip that contains 2 JSON files. When I process those files I get the following error: the first file is processed properly, but the second file throws a ZipException.
How can I fix this?
My code:
private Path archivePath = Paths.get("src/test/resources/data/files.zip");

public void uploadData() throws IOException {
    FileSystem fileFS = FileSystems.newFileSystem(archivePath, (ClassLoader) null);
    String[] pathNames = {"file_1.json", "file_2.json"};
    for (String pathName : pathNames) {
        // unzip files
        Path path = fileFS.getPath(pathName);
        Request request = new Request("POST", "/_bulk");
        RequestOptions.Builder options = RequestOptions.DEFAULT.toBuilder();
        options.addHeader("Content-Type", "application/x-ndjson");
        request.setOptions(options);
        request.setEntity(new InputStreamEntity(Files.newInputStream(path)));
        Response response = client.performRequest(request);
        assertResponseSuccessful(response, false);
    }
}
Error:
invalid stored block lengths
java.io.IOException: invalid stored block lengths
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:828)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:134)
at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:597)
at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:173)
at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:816)
at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:146)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at org.testng.TestRunner.privateRun(TestRunner.java:766)
at org.testng.TestRunner.run(TestRunner.java:587)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
at org.testng.SuiteRunner.access$000(SuiteRunner.java:28)
at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:425)
at org.testng.internal.thread.ThreadUtil.lambda$execute$0(ThreadUtil.java:68)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.zip.ZipException: invalid stored block lengths
at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:165)
at java.base/java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:388)
at org.apache.http.nio.entity.EntityAsyncContentProducer.produceContent(EntityAsyncContentProducer.java:67)
at org.apache.http.nio.protocol.BasicAsyncRequestProducer.produceContent(BasicAsyncRequestProducer.java:125)
at org.apache.http.impl.nio.client.MainClientExec.produceContent(MainClientExec.java:262)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.produceContent(DefaultClientExchangeHandlerImpl.java:140)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.outputReady(HttpAsyncRequestExecutor.java:249)
at org.apache.http.impl.nio.client.InternalRequestExecutor.outputReady(InternalRequestExecutor.java:96)
at org.apache.http.impl.nio.DefaultNHttpClientConnection.produceOutput(DefaultNHttpClientConnection.java:290)
at org.apache.http.impl.nio.client.InternalIODispatch.onOutputReady(InternalIODispatch.java:86)
at org.apache.http.impl.nio.client.InternalIODispatch.onOutputReady(InternalIODispatch.java:39)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.outputReady(AbstractIODispatch.java:145)
at org.apache.http.impl.nio.reactor.BaseIOReactor.writable(BaseIOReactor.java:187)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:341)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
... 1 more
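One variant I could try (just a sketch, on the assumption that streaming the entry lazily through InputStreamEntity while the async client produces the request body is what breaks the zip stream) is to buffer each entry into memory before building the request:
// Hypothetical alternative for the body of the for loop above: read the whole
// zip entry up front instead of streaming it during request production.
byte[] entryBytes = Files.readAllBytes(path);
request.setEntity(new ByteArrayEntity(entryBytes, ContentType.create("application/x-ndjson")));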
Any input here would be really appreciated.

Spring Batch - ItemStreamException: Output file was not created


OKHttpClient - Socket Read timed out issue

I am getting a lot of socket read timeouts when I try to read the response from my POST request. I am using OkHttpClient version 3.11.0 with the following configuration:
@Bean
OkHttpClient okHttpClient() {
    def loggingInterceptor = new HttpLoggingInterceptor()
    loggingInterceptor.setLevel(HttpLoggingInterceptor.Level.BODY)
    Dispatcher dispatcher = new Dispatcher()
    dispatcher.setMaxRequests(200)
    dispatcher.setMaxRequestsPerHost(200)
    return new OkHttpClient()
            .newBuilder()
            .eventListenerFactory(PrintingEventListener.FACTORY)
            .retryOnConnectionFailure(true)
            .connectTimeout(30000, TimeUnit.MILLISECONDS)
            .readTimeout(30000, TimeUnit.MILLISECONDS)
            .dispatcher(dispatcher)
            .connectionPool(new ConnectionPool(200, 30, TimeUnit.SECONDS))
            .addNetworkInterceptor(loggingInterceptor)
            .addInterceptor(loggingInterceptor)
            .build()
}
The code to process the request and response is given below:
Response r = client.newCall(request).execute()
r.withCloseable { response ->
    def returnable
    if (response.header('Content-Type')?.contains('application/json')) {
        def body = response.body().string()
        if (body.trim().isEmpty()) {
            returnable = [:]
        } else {
            try {
                def result = new JsonSlurper().parseText(body)
                return result
            } catch (any) {
                log.error('Failed to parse json response: ' + body, any.message)
                throw any
            }
        }
    } else if (response.header('Content-Type')?.contains('image/jpeg')) {
        returnable = response.body().bytes()
    } else {
        returnable = response.body().string()
    }
    return returnable
}
With an event listener added, I can see that the HTTP client fires the responseBodyStart event and then hangs there; once the timeout is reached, the call fails and throws a socket read timeout exception. Is there anything missing in my configuration?
The event listener shows responseHeadersStart and responseBodyStart, followed by connectionReleased once the specified timeout (30s) is reached.
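For completeness, one thing I considered (a sketch only, assuming the server genuinely needs longer than 30s to stream this particular body) is overriding the read timeout for this single call with a derived client:
// Hypothetical per-call override: newBuilder() keeps the shared connection pool
// and dispatcher but lets this call wait longer for the response body.
OkHttpClient longReadClient = client.newBuilder()
        .readTimeout(120, TimeUnit.SECONDS)
        .build();
Response r = longReadClient.newCall(request).execute();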
Please find the event trace and exception trace below:
INFO 11097 : 2.1701E-5 -- callStart
INFO 11097 : 1.44572E-4 -- connectionAcquired
INFO 11097 : 0.001036047 -- requestHeadersStart
INFO 11097 : 0.001064492 -- requestHeadersEnd
INFO 11097 : 0.001084433 -- requestBodyStart
INFO 11097 : 0.001103787 -- requestBodyEnd
INFO 11097 : 0.001279736 -- responseHeadersStart
INFO 11097 : 1.007175496 -- responseHeadersEnd
INFO 11097 : 1.007247928 -- responseBodyStart
INFO 11097 : 31.082725087 -- connectionReleased
INFO 11097 : 31.083717147 -- callFailed
INFO 11097 : 31.092341876 --
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at okio.Okio$2.read(Okio.java:140)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
at okio.RealBufferedSource.request(RealBufferedSource.java:68)
at okio.RealBufferedSource.require(RealBufferedSource.java:61)
at okio.RealBufferedSource.readHexadecimalUnsignedLong(RealBufferedSource.java:304)
at okhttp3.internal.http1.Http1Codec$ChunkedSource.readChunkSize(Http1Codec.java:469)
at okhttp3.internal.http1.Http1Codec$ChunkedSource.read(Http1Codec.java:449)
at okio.RealBufferedSource.request(RealBufferedSource.java:68)
at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:241)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:213)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200)
at okhttp3.RealCall.execute(RealCall.java:77)
at okhttp3.Call$execute.call(Unknown Source)

Saving RDD using a Proprietary OutputFormatter

I am using a proprietary database which provides its own OutputFormatter. Using this OutputFormatter I can write a MapReduce job and save data from MR into this database.
However, I am now trying to use the OutputFormatter inside Spark to save an RDD to the database.
The code I have written is:
object VerticaSpark extends App {
  val scConf = new SparkConf
  val sc = new SparkContext(scConf)
  val conf = new Configuration()
  val job = new Job(conf)
  job.setInputFormatClass(classOf[VerticaInputFormat])
  job.setOutputKeyClass(classOf[Text])
  job.setOutputValueClass(classOf[VerticaRecord])
  job.setOutputFormatClass(classOf[VerticaOutputFormat])
  VerticaInputFormat.setInput(job, "select * from Foo where key = ?", "1", "2", "3", "4")
  VerticaOutputFormat.setOutput(job, "Bar", true, "name varchar", "total int")
  val rddVR : RDD[VerticaRecord] = sc.newAPIHadoopRDD(job.getConfiguration, classOf[VerticaInputFormat], classOf[LongWritable], classOf[VerticaRecord]).map(_._2)
  val rddTup = rddVR.map(x => (x.get(1).toString(), x.get(2).toString().toInt))
  val rddGroup = rddTup.reduceByKey(_ + _)
  val rddVROutput = rddGroup.map({
    case (x, y) => (new Text("Bar"), getVerticaRecord(x, y, job.getConfiguration))
  })
  //rddVROutput.saveAsNewAPIHadoopFile("Bar", classOf[Text], classOf[VerticaRecord], classOf[VerticaOutputFormat], job.getConfiguration)
  rddVROutput.saveAsNewAPIHadoopDataset(job.getConfiguration)

  def getVerticaRecord(name : String, value : Int, conf: Configuration) : VerticaRecord = {
    var retVal = new VerticaRecord(conf)
    //println(s"going to build Vertica Record with ${name} and ${value}")
    retVal.set(0, new Text(name))
    retVal.set(1, new IntWritable(value))
    retVal
  }
}
The entire solution can be downloaded from here:
https://github.com/abhitechdojo/VerticaSpark.git
My code works perfectly until the saveAsNewAPIHadoopFile function is reached. At this line it throws a NullPointerException.
The same logic, with the same input and output formatters, works perfectly in a MapReduce program, and I can successfully write to the database from the MR program:
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/HadoopIntegrationGuide/HadoopConnector/ExampleHadoopConnectorApplication.htm%3FTocPath%3DIntegrating%2520with%2520Hadoop%7CUsing%2520the%2520%2520MapReduce%2520Connector%7C_____7
The stack trace of the error is
16/01/15 16:42:53 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 5, machine): java.lang.NullPointerException
at com.abhi.VerticaSpark$$anonfun$4.apply(VerticaSpark.scala:39)
at com.abhi.VerticaSpark$$anonfun$4.apply(VerticaSpark.scala:38)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:999)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 12, machine): java.lang.NullPointerException
at com.abhi.VerticaSpark$$anonfun$4.apply(VerticaSpark.scala:39)
at com.abhi.VerticaSpark$$anonfun$4.apply(VerticaSpark.scala:38)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:999)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
16/01/15 16:42:54 INFO TaskSetManager: Lost task 3.1 in stage 1.0 (TID 11) on executor machine: java.lang.NullPointerException (null) [duplicate 7]

Unable to run distributed shell on YARN

I am trying to run the distributed shell example on a YARN cluster.
@Test
public void realClusterTest() throws Exception {
    System.setProperty("HADOOP_USER_NAME", "hdfs");
    String[] args = {
        "--jar",
        APPMASTER_JAR,
        "--num_containers",
        "1",
        "--shell_command",
        "ls",
        "--master_memory",
        "512",
        "--container_memory",
        "128"
    };
    LOG.info("Initializing DS Client");
    Client client = new Client(new Configuration());
    boolean initSuccess = client.init(args);
    Assert.assertTrue(initSuccess);
    LOG.info("Running DS Client");
    boolean result = client.run();
    LOG.info("Client run completed. Result=" + result);
    Assert.assertTrue(result);
}
But it fails with:
2013-09-17 11:45:28,338 INFO [main] distributedshell.Client (Client.java:monitorApplication(600)) - Got application report from ASM for, appId=11, clientToAMToken=null, appDiagnostics=Application application_1379338026167_0011 failed 2 times due to AM Container for appattempt_1379338026167_0011_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
at org.apache.hadoop.util.Shell.run(Shell.java:373)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
................
.Failing this attempt.. Failing the application., appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1379407525237, yarnAppState=FAILED, distributedFinalState=FAILED, appTrackingUrl=ip-10-232-149-222.us-west-2.compute.internal:8088/proxy/application_1379338026167_0011/, appUser=hdfs
Here is what I see in server logs:
2013-09-17 08:45:26,870 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(213)) - Exception from container-launch with container ID: container_1379338026167_0011_02_000001 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
at org.apache.hadoop.util.Shell.run(Shell.java:373)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:258)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:74)
The question is: how can I get more details to identify what is going wrong?
PS: we are using HDP 2.0.5
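For what it's worth, the only way I know of to dig deeper (assuming log aggregation is enabled on the cluster) is to pull the aggregated container logs for the failed attempt:
yarn logs -applicationId application_1379338026167_0011
Otherwise the stdout/stderr of the failed container should sit under the NodeManager's local log directory (yarn.nodemanager.log-dirs) on the node that ran it. Is there a better way?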
