Hadoop on Windows: FileNotFoundException

I'm using Hadoop on Windows and I've configured everything correctly (Cygwin installed, passwordless SSH, etc.).
I've compiled the WordCount program into WC.jar and tried to run it. It runs perfectly in standalone mode, but in fully distributed mode it throws a FileNotFoundException.
Please look at the logs and tell me what is wrong.
I've started DFS and MapReduce on MACH1 (that's my master).
$ bin/hadoop jar WC.jar WordCount words result
10/07/24 16:57:38 INFO input.FileInputFormat: Total input paths to process : 2
10/07/24 16:57:39 INFO mapred.JobClient: Running job: job_201007241657_0001
10/07/24 16:57:40 INFO mapred.JobClient: map 0% reduce 0%
10/07/24 16:57:50 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000003_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000003_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:57:55 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_r_000002_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_r_000002_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:07 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000003_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000003_1/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:14 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000003_2, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000003_2/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:26 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000002_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000002_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:34 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_r_000001_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_r_000001_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:41 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000002_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000002_1/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:47 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000002_2, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000002_2/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:53 INFO mapred.JobClient: Job complete: job_201007241657_0001
10/07/24 16:58:53 INFO mapred.JobClient: Counters: 0
328510#01HW179531 /usr/local/hadoop-0.20.2
$
Thanks.

I think I might have seen this exception before but I don't have access to my old logs to confirm it. I solved my FileNotFoundException by reformatting the namenode. You might want to check the namenode logs for "inconsistent state" to confirm the cause before reformatting.
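For reference, a minimal sketch of that recovery on a 0.20.x install like the one in the logs above (these are the standard bin/ scripts shipped with Hadoop 0.20; note that formatting the namenode wipes all HDFS data and metadata, so copy out anything you still need first, and the local input directory name "words" is assumed to match the HDFS input used in the question):

bin/stop-all.sh
bin/hadoop namenode -format      # destroys HDFS metadata; answer Y when prompted
bin/start-all.sh
bin/hadoop fs -put words words   # re-upload the input after the format
bin/hadoop jar WC.jar WordCount words result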

Related

Tez - DAGAppMaster - java.lang.IllegalArgumentException: Invalid ContainerId

I try to launch a MapReduce job, but I get an error while executing the jobs in the shell or in Hive:
hive> select count(*) from employee ;
Query ID = mapr_20171107135114_a574713d-7d69-45e1-aa73-d4de07a3059b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1510052734193_0005, Tracking URL = http://hdpsrvpre2.intranet.darty.fr:8088/proxy/application_1510052734193_0005/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1510052734193_0005
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2017-11-07 13:51:25,951 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1510052734193_0005 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
In the ResourceManager logs, this is what I find:
2017-11-07 13:51:25,269 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1510052734193_0005_000002 State change from LAUNCHED to FINAL_SAVING
2017-11-07 13:51:25,269 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1510052734193_0005_000002 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1510052734193_0005/appattempt_1510052734193_0005_000002
2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1510052734193_0005_000002
2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1510052734193_0005_000002
2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1510052734193_0005_000002 State change from FINAL_SAVING to FAILED
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 2. The max attempts is 2
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1510052734193_0005 with final state: FAILED
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1510052734193_0005 State change from ACCEPTED to FINAL_SAVING
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1510052734193_0005
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1510052734193_0005_000002 is done. finalState=FAILED
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for app: application_1510052734193_0005 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1510052734193_0005/application_1510052734193_0005
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1510052734193_0005 requests cleared
2017-11-07 13:51:25,296 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1510052734193_0005 failed 2 times due to AM Container for appattempt_1510052734193_0005_000002 exited with exitCode: 1
For more detailed output, check application tracking page: http://hdpsrvpre2.intranet.darty.fr:8088/cluster/app/application_1510052734193_0005 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e10_1510052734193_0005_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
    at java.lang.Thread.run(Thread.java:748)
Shell output: main : command provided 1
main : user is mapr
main : requested yarn user is mapr

Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
Also, in the syslog of the jobs I find:
2017-11-07 12:09:46,419 FATAL [main] app.DAGAppMaster: Error starting DAGAppMaster
java.lang.IllegalArgumentException: Invalid ContainerId: container_e10_1510052734193_0001_01_000001
    at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
    at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:1794)
Caused by: java.lang.NumberFormatException: For input string: "e10"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
    at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
    ... 1 more
It seems that Tez is what causes the issue. Is there any solution for this?
Thank you!
I think the execution environment has jar files from different Hadoop versions mixed together.
Verify the environment, make sure only the required version's jars are used, and remove references to other versions from your environment variables.
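A rough way to spot a mixed classpath (the exact paths are illustrative; adjust them to your MapR layout):

hadoop version
hadoop classpath | tr ':' '\n' | grep -i hadoop | sort -u
echo $HADOOP_HOME $HADOOP_CLASSPATH
# anything not coming from the hadoop-2.7.0 install should be removed from the classpath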

java IOException: Write end dead during a hadoop job

I have a map-only Hadoop job that throws several IO exceptions during its run:
1) java.io.IOException: Write end dead
2) java.io.IOException: Pipe closed
It manages to finish its work, but these exceptions make me worry. Is there anything I'm doing wrong?
Practically the same job runs daily on another dataset that is 20 times smaller, and no exceptions are thrown. The jobs are run on Google Dataproc.
The launch script I'm using:
#!/bin/bash
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
-D mapreduce.output.fileoutputformat.compress=true \
-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
-D mapreduce.job.reduces=0 \
-D mapreduce.input.fileinputformat.split.maxsize=1500000000 \
-D mapreduce.map.failures.maxpercent=1 \
-D mapreduce.fileoutputcommitter.algorithm.version=2 \
-D mapreduce.task.timeout=900000 \
-D mapreduce.map.memory.mb=2048 \
-file mymapper.py \
-input gs://input_folder/* \
-output gs://output_folder/$1 \
-mapper mymapper.py \
-reducer org.apache.hadoop.mapred.lib.IdentityReducer \
-inputformat org.apache.hadoop.mapred.lib.CombineTextInputFormat
Here is an error log:
17/03/15 09:53:30 INFO mapreduce.Job: Running job: job_1489571529338_0001
17/03/15 09:53:37 INFO mapreduce.Job: Job job_1489571529338_0001 running in uber mode : false
17/03/15 09:53:37 INFO mapreduce.Job: map 0% reduce 0%
17/03/15 09:56:58 INFO mapreduce.Job: map 1% reduce 0%
17/03/15 10:00:16 INFO mapreduce.Job: Task Id : attempt_1489571529338_0001_m_000744_0, Status : FAILED
Error: java.io.IOException: java.io.IOException: Write end dead
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.write(AbstractGoogleAsyncWriteChannel.java:256)
at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.write(CacheSupplementedGoogleCloudStorage.java:58)
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
at java.nio.channels.Channels.writeFully(Channels.java:101)
at java.nio.channels.Channels.access$000(Channels.java:61)
at java.nio.channels.Channels$1.write(Channels.java:174)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:126)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:109)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:844)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Suppressed: java.io.IOException: java.io.IOException: Write end dead
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287)
at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.close(CacheSupplementedGoogleCloudStorage.java:68)
at java.nio.channels.Channels$1.close(Channels.java:178)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
... 14 more
Caused by: java.io.IOException: Write end dead
at java.io.PipedInputStream.read(PipedInputStream.java:310)
at java.io.PipedInputStream.read(PipedInputStream.java:377)
at com.google.api.client.util.ByteStreams.read(ByteStreams.java:181)
at com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629)
at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:358)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[CIRCULAR REFERENCE:java.io.IOException: Write end dead]
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
17/03/15 10:01:06 INFO mapreduce.Job: map 2% reduce 0%
17/03/15 10:02:46 INFO mapreduce.Job: Task Id : attempt_1489571529338_0001_m_001089_0, Status : FAILED
Error: java.io.IOException: Pipe closed
at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260)
at java.io.PipedInputStream.receive(PipedInputStream.java:226)
at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
at java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:458)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.write(AbstractGoogleAsyncWriteChannel.java:259)
at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.write(CacheSupplementedGoogleCloudStorage.java:58)
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
at java.nio.channels.Channels.writeFully(Channels.java:101)
at java.nio.channels.Channels.access$000(Channels.java:61)
at java.nio.channels.Channels$1.write(Channels.java:174)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:126)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:109)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:844)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Suppressed: java.io.IOException: java.io.IOException: Write end dead
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287)
at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.close(CacheSupplementedGoogleCloudStorage.java:68)
at java.nio.channels.Channels$1.close(Channels.java:178)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
... 14 more
Caused by: java.io.IOException: Write end dead
at java.io.PipedInputStream.read(PipedInputStream.java:310)
at java.io.PipedInputStream.read(PipedInputStream.java:377)
at com.google.api.client.util.ByteStreams.read(ByteStreams.java:181)
at com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629)
at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:358)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
17/03/15 10:03:35 INFO mapreduce.Job: Task Id : attempt_1489571529338_0001_m_001217_0, Status : FAILED
Error: java.io.IOException: Pipe closed
at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260)
at java.io.PipedInputStream.receive(PipedInputStream.java:226)
at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
at java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:458)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.write(AbstractGoogleAsyncWriteChannel.java:259)
at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.write(CacheSupplementedGoogleCloudStorage.java:58)
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
at java.nio.channels.Channels.writeFully(Channels.java:101)
at java.nio.channels.Channels.access$000(Channels.java:61)
at java.nio.channels.Channels$1.write(Channels.java:174)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:126)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:109)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:844)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Suppressed: java.io.IOException: java.io.IOException: Write end dead
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287)
at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.close(CacheSupplementedGoogleCloudStorage.java:68)
at java.nio.channels.Channels$1.close(Channels.java:178)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
... 14 more
Caused by: java.io.IOException: Write end dead
at java.io.PipedInputStream.read(PipedInputStream.java:310)
at java.io.PipedInputStream.read(PipedInputStream.java:377)
at com.google.api.client.util.ByteStreams.read(ByteStreams.java:181)
at com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629)
at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:358)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
17/03/15 10:04:51 INFO mapreduce.Job: map 3% reduce 0%
17/03/15 10:08:34 INFO mapreduce.Job: map 4% reduce 0%
17/03/15 10:12:12 INFO mapreduce.Job: map 5% reduce 0%
UPDATE:
Now it also fails with a Backend Error:
Error: java.io.IOException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone
{
"code" : 500,
"errors" : [ {
"domain" : "global",
"message" : "Backend Error",
"reason" : "backendError"
} ],
"message" : "Backend Error"
}
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287)
at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.close(CacheSupplementedGoogleCloudStorage.java:68)
at java.nio.channels.Channels$1.close(Channels.java:178)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:126)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:109)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:844)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone
{
"code" : 500,
"errors" : [ {
"domain" : "global",
"message" : "Backend Error",
"reason" : "backendError"
} ],
"message" : "Backend Error"
}
Usually "Write end dead" means a writer thread failed to close() the output stream before exiting. But if it happens inside the underlying framework rather than in a manually created write channel, it is most likely the result of some transient failure that caused a single task to fail for other reasons, and the "Write end dead" message is just another symptom of that failure.
In your case, the 410 Gone error is a known transient failure mode of GCS which is not recoverable in the same stream (recoverable errors are automatically retried silently under the hood). But that's just a single failed task, and Hadoop ensures that failed tasks will get retried end-to-end for the job, and only if the same task fails too many times will the overall job fail.
So in general, it means as long as your overall job completes successfully, then all your data was processed correctly; single-task failures can just be treated as warnings.
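If you want more headroom before a repeatedly failing task fails the whole job, the standard knob in Hadoop 2.x is mapreduce.map.maxattempts (default 4). A sketch, added alongside the other -D options in the script above (the value 8 is only illustrative):

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -D mapreduce.map.maxattempts=8 \
  # ...remaining -D, -file, -input, -output, -mapper options unchanged from the script above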

error while executing pig script?

p.pig contains the following code:
salaries= load 'salaries' using PigStorage(',') As (gender, age,salary,zip);
salaries= load 'salaries' using PigStorage(',') As (gender:chararray,age:int,salary:double,zip:long);
salaries=load 'salaries' using PigStorage(',') as (gender:chararray,details:bag{b(age:int,salary:double,zip:long)});
highsal= filter salaries by salary > 75000;
dump highsal
salbyage= group salaries by age;
describe salbyage;
salbyage= group salaries All;
salgrp= group salaries by $3;
A= foreach salaries generate age,salary;
describe A;
salaries= load 'salaries.txt' using PigStorage(',') as (gender:chararray,age:int,salary:double,zip:int);
vivek@ubuntu:~/Applications/Hadoop_program/pip$ pig -x mapreduce p.pig
15/09/24 03:16:32 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
15/09/24 03:16:32 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
15/09/24 03:16:32 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2015-09-24 03:16:32,990 [main] INFO org.apache.pig.Main - Apache Pig version 0.14.0 (r1640057) compiled Nov 16 2014, 18:02:05
2015-09-24 03:16:32,991 [main] INFO org.apache.pig.Main - Logging error messages to: /home/vivek/Applications/Hadoop_program/pip/pig_1443089792987.log
2015-09-24 03:16:38,966 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-09-24 03:16:41,232 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/vivek/.pigbootup not found
2015-09-24 03:16:42,869 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-09-24 03:16:42,870 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-09-24 03:16:42,870 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2015-09-24 03:16:45,436 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <PATH> "salaries=load "" at line 7, column 1.
Was expecting one of:
<EOF>
"cat" ...
"clear" ...
"fs" ...
"sh" ...
"cd" ...
"cp" ...
"copyFromLocal" ...
"copyToLocal" ...
"dump" ...
"\\d" ...
"describe" ...
"\\de" ...
"aliases" ...
"explain" ...
"\\e" ...
"help" ...
"history" ...
"kill" ...
"ls" ...
"mv" ...
"mkdir" ...
"pwd" ...
"quit" ...
"\\q" ...
"register" ...
"rm" ...
"rmf" ...
"set" ...
"illustrate" ...
"\\i" ...
"run" ...
"exec" ...
"scriptDone" ...
"" ...
"" ...
<EOL> ...
";" ...
Details at logfile: /home/vivek/Applications/Hadoop_program/pip/pig_1443089792987.log
2015-09-24 03:16:45,554 [main] INFO org.apache.pig.Main - Pig script completed in 13 seconds and 48 milliseconds (13048 ms)
vivek@ubuntu:~/Applications/Hadoop_program/pip$
The file p.pig contains the code given above.
I started Pig in MapReduce mode.
While executing the code it encounters the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "salaries=load "" at line 7, column 1.
Please help me resolve the error.
You have not provided spaces between the alias name and the command.
Pig expects at least one space before or after the '=' operator.
Change this line :
salaries=load 'salaries' using PigStorage(',') as (gender:chararray,details:bag{b(age:int,salary:double,zip:long)});
TO
salaries = load 'salaries' using PigStorage(',') as (gender:chararray,details:bag{b(age:int,salary:double,zip:long)});

sqoop export failing due to timeout

I am not able to export data with Sqoop to an AS/400 server.
I am able to import the data successfully.
I am using the following command:
sqoop export --driver com.ibm.as400.access.AS400JDBCDriver --connect jdbc:as400://178.xxx.3.21:23/MELLET1/TEXT4 --username xxxxxx --password xxxxx007 --table TEXT3 --export-dir /as400/1GBTBL5/part-m-00000 -m 1
I am getting a timeout issue.
15/05/10 17:42:06 INFO input.FileInputFormat: Total input paths to process : 1
15/05/10 17:42:06 INFO input.FileInputFormat: Total input paths to process : 1
15/05/10 17:42:06 INFO mapreduce.JobSubmitter: number of splits:1
15/05/10 17:42:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1431267418859_0014
15/05/10 17:42:07 INFO impl.YarnClientImpl: Submitted application application_1431267418859_0014
15/05/10 17:42:07 INFO mapreduce.Job: Running job: job_1431267418859_0014
15/05/10 17:42:18 INFO mapreduce.Job: Job job_1431267418859_0014 running in uber mode : false
15/05/10 17:42:18 INFO mapreduce.Job: map 0% reduce 0%
15/05/10 17:42:37 INFO mapreduce.Job: map 100% reduce 0%
15/05/10 17:47:47 INFO mapreduce.Job: Task Id : attempt_1431267418859_0014_m_000000_0, Status : FAILED
AttemptID:attempt_1431267418859_0014_m_000000_0 Timed out after 300 secs
15/05/10 17:47:48 INFO mapreduce.Job: map 0% reduce 0%
15/05/10 17:48:07 INFO mapreduce.Job: map 100% reduce 0%
15/05/10 17:53:16 INFO mapreduce.Job: Task Id : attempt_1431267418859_0014_m_000000_1, Status : FAILED
AttemptID:attempt_1431267418859_0014_m_000000_1 Timed out after 300 secs
15/05/10 17:53:17 INFO mapreduce.Job: map 0% reduce 0%
15/05/10 17:53:40 INFO mapreduce.Job: map 100% reduce 0%
15/05/10 17:58:46 INFO mapreduce.Job: Task Id : attempt_1431267418859_0014_m_000000_2, Status : FAILED
AttemptID:attempt_1431267418859_0014_m_000000_2 Timed out after 300 secs
Please follow the command below, which is for MySQL; you can frame it similarly for your database:
$ sqoop export --connect jdbc:mysql://db.example.com/foo --table bar --export-dir /results/bar_data
Have you tried batch mode?
Set -Dsqoop.export.records.per.statement and add --batch.
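A sketch of the same export in batch mode (connection details copied from the question; the records-per-statement value is only illustrative and generic -D options must come before the tool-specific arguments):

sqoop export \
  -Dsqoop.export.records.per.statement=100 \
  --driver com.ibm.as400.access.AS400JDBCDriver \
  --connect jdbc:as400://178.xxx.3.21:23/MELLET1/TEXT4 \
  --username xxxxxx --password xxxxx007 \
  --table TEXT3 \
  --export-dir /as400/1GBTBL5/part-m-00000 \
  --batch -m 1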

Sqoop error while loading data from Hive to MySQL

I am getting a Sqoop error while loading data from Hive to MySQL.
The error message is:
java.lang.NumberFormatException: For input string
==
hive > CREATE EXTERNAL TABLE IF NOT EXISTS test (
id int,
name string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION
'/user/cloudera/test';
==
vi test:
1 a
2 b
==
hadoop fs -put test /user/cloudera
==
mysql> CREATE TABLE `foo` (`id` int(11) , `name` varchar(30) )
==
sqoop export --connect jdbc:mysql://localhost/test --table foo -m 1 --export-dir /user/cloudera/test
==
log:
14/05/13 07:18:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/05/13 07:18:52 INFO tool.CodeGenTool: Beginning code generation
14/05/13 07:18:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `foo` AS t LIMIT 1
14/05/13 07:18:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `foo` AS t LIMIT 1
14/05/13 07:18:53 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
14/05/13 07:18:53 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20-mapreduce/hadoop-core.jar
Note: /tmp/sqoop-cloudera/compile/e6582e332bf9e0eedfb641f14d866599/foo.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/05/13 07:18:56 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/e6582e332bf9e0eedfb641f14d866599/foo.jar
14/05/13 07:18:56 INFO mapreduce.ExportJobBase: Beginning export of foo
14/05/13 07:18:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/05/13 07:19:00 INFO input.FileInputFormat: Total input paths to process : 1
14/05/13 07:19:00 INFO input.FileInputFormat: Total input paths to process : 1
14/05/13 07:19:00 INFO mapred.JobClient: Running job: job_201405081447_0046
14/05/13 07:19:01 INFO mapred.JobClient: map 0% reduce 0%
14/05/13 07:19:14 INFO mapred.JobClient: Task Id : attempt_201405081447_0046_m_000000_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NumberFormatException: For input string: "1 a"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java
14/05/13 07:19:20 INFO mapred.JobClient: Task Id : attempt_201405081447_0046_m_000000_1, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NumberFormatException: For input string: "1 a"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java
14/05/13 07:19:28 INFO mapred.JobClient: Task Id : attempt_201405081447_0046_m_000000_2, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NumberFormatException: For input string: "1 a"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java
==
Any help?
Thank you!
The location into which you placed the file does not appear to be correct. For a table "test" you should put a file underneath a directory test. But your command
hadoop fs -put test /user/cloudera
creates a file called test.
You would likely find more success as follows:
hadoop fs -mkdir /user/cloudera/test
hadoop dfs -put test /user/cloudera/test
