Fail to load data to Hortonworks Sandbox in Pig - hadoop

Hi, I am very new to Hadoop. When I first ran the following command, this error popped up:
LOAD 'Pig/iris.csv' using PigStorage (',');
2014-09-05 06:04:04,853 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.1.2.1.1.0-385 (rexported) compiled Apr 16 2014, 15:59:00
2014-09-05 06:04:04,885 [main] INFO org.apache.pig.Main - Logging error messages to: /dev/null
2014-09-05 06:04:07,077 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /usr/lib/hue/.pigbootup not found
2014-09-05 06:04:14,699 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-05 06:04:14,699 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-05 06:04:14,699 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020
2014-09-05 06:05:11,826 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> LOAD 'Pig/iris.csv' using PigStorage (',');
2014-09-05 06:05:13,203 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <IDENTIFIER> "LOAD "" at line 1, column 1.
Was expecting one of:
<EOF>
"cat" ...
"clear" ...
"fs" ...
"sh" ...
"cd" ...
"cp" ...
"copyFromLocal" ...
"copyToLocal" ...
"dump" ...
"\\d" ...
"describe" ...
"\\de" ...
"aliases" ...
"explain" ...
"\\e" ...
"help" ...
"history" ...
"kill" ...
"ls" ...
"mv" ...
"mkdir" ...
"pwd" ...
"quit" ...
"\\q" ...
"register" ...
"rm" ...
"rmf" ...
"set" ...
"illustrate" ...
"\\i" ...
"run" ...
"exec" ...
"scriptDone" ...
"" ...
"" ...
<EOL> ...
";" ...
Details at logfile: /dev/null
Does anyone know how to solve the problem?

LOAD creates a relation. You need to assign it to an alias so that you can do something with it later:
L = LOAD 'Pig/iris.csv' using PigStorage (',');
DUMP L;
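If you want named fields, here is a minimal sketch with an explicit schema (the column names and types are assumptions about iris.csv, so adjust them to match the actual file):
-- hypothetical column names; adjust to your data
L = LOAD 'Pig/iris.csv' USING PigStorage(',')
    AS (sepal_length:double, sepal_width:double,
        petal_length:double, petal_width:double, species:chararray);
DESCRIBE L;  -- prints the schema
DUMP L;      -- prints the tuples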

Related

Extract logs between a time frame with non-specific timestamps using bash

I want to fetch the log between two timestamps, but I do not have the exact timestamps with me. When a specific timestamp is present in the log, I can fetch with sed using the following command:
sed -rne '/$StartTime/,/$EndTime/'p <filename>
My problem is that the StartTime and EndTime I fetch from my DB might not be present verbatim in the log file, so I need to fetch the log between the times nearest to the StartTime and EndTime I provide, using >= and <= comparisons. I tried the following command but it does not work:
awk '$0>=st && $0<=et' st=$StartTime et=$EndTime <filename>
Sample input and output
Input
Time retrieved from DB
StartTime - 2017-11-02 10:20:00
EndTime - 2017-11-02 11:20:00
The time present in log
T1 - 2017-11-02 10:17:44
T2 - 2017-11-02 11:19:32
Output: Entire Log text between T1 & T2
Sample Log
2017-03-03 10:43:18,736 [main] WARN - ORACLE_HOSTNAME=xxxxxxxxxx[OVERRIDES:
xxxxxxxxxxxxxxxx]
2017-03-03 10:43:18,736 [main] WARN - NLS_DATE_FORMAT=DD-MON-YYYY
HH24:MI:SS [OVERRIDES: DD-MON-YYYY HH24:MI:SS]
2017-03-03 10:43:18,736 [main] WARN - xxxxUsername=MDMPIUSER [OVERRIDES: MDMPIUSER]
2017-03-03 10:43:18,736 [main] WARN - BUNDLE_GEMFILE=uri:classloader://installer/Gemfile [OVERRIDES: uri:classloader://installer/Gemfile]
2017-03-03 10:43:18,736 [main] WARN - TIMEOUT=900 [OVERRIDES: 900]
2017-03-03 10:43:18,736 [main] WARN - SHLVL=4 [OVERRIDES: 4]
2017-03-03 10:43:18,736 [main] WARN - HISTSIZE=1000 [OVERRIDES: 1000]
2017-03-03 10:43:18,736 [main] WARN - JAVA_HOME=/usr/java/jdk1.8.0_60/jre [OVERRIDES: /usr/java/jdk1.8.0_60/jre]
2017-03-03 10:43:20,156 [main] WARN - APP_PROPS=/home/xxx/conf/appProperties [OVERRIDES: /home/xxx/conf/appProperties]
You can try
awk -v start="$StartTime" -v end="$EndTime" '
    function fonct(date)
    {
        gsub(/-|,| |:/, "", date)
        return date
    }
    BEGIN {
        start = fonct(start)
        end = fonct(end)
    }
    {
        a = fonct($1 $2)
        if (a >= start && a <= end) print $0
    }' infile
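For example, a quick sketch of one way to invoke it (app.log and the literal timestamps are placeholders; in practice the two variables would come from your DB query):
StartTime="2017-11-02 10:20:00"   # hypothetical value read from the DB
EndTime="2017-11-02 11:20:00"
awk -v start="$StartTime" -v end="$EndTime" '
  function fonct(d) { gsub(/-|,| |:/, "", d); return d }
  BEGIN { start = fonct(start); end = fonct(end) }
  { a = fonct($1 $2); if (a >= start && a <= end) print }
' app.log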

Error while executing Pig script?

p.pig contains the following code:
salaries= load 'salaries' using PigStorage(',') As (gender, age,salary,zip);
salaries= load 'salaries' using PigStorage(',') As (gender:chararray,age:int,salary:double,zip:long);
salaries=load 'salaries' using PigStorage(',') as (gender:chararray,details:bag{b(age:int,salary:double,zip:long)});
highsal= filter salaries by salary > 75000;
dump highsal
salbyage= group salaries by age;
describe salbyage;
salbyage= group salaries All;
salgrp= group salaries by $3;
A= foreach salaries generate age,salary;
describe A;
salaries= load 'salaries.txt' using PigStorage(',') as (gender:chararray,age:int,salary:double,zip:int);
vivek#ubuntu:~/Applications/Hadoop_program/pip$ pig -x mapreduce p.pig
15/09/24 03:16:32 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
15/09/24 03:16:32 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
15/09/24 03:16:32 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2015-09-24 03:16:32,990 [main] INFO org.apache.pig.Main - Apache Pig version 0.14.0 (r1640057) compiled Nov 16 2014, 18:02:05
2015-09-24 03:16:32,991 [main] INFO org.apache.pig.Main - Logging error messages to: /home/vivek/Applications/Hadoop_program/pip/pig_1443089792987.log
2015-09-24 03:16:38,966 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-09-24 03:16:41,232 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/vivek/.pigbootup not found
2015-09-24 03:16:42,869 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-09-24 03:16:42,870 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-09-24 03:16:42,870 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2015-09-24 03:16:45,436 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <PATH> "salaries=load "" at line 7, column 1.
Was expecting one of:
<EOF>
"cat" ...
"clear" ...
"fs" ...
"sh" ...
"cd" ...
"cp" ...
"copyFromLocal" ...
"copyToLocal" ...
"dump" ...
"\\d" ...
"describe" ...
"\\de" ...
"aliases" ...
"explain" ...
"\\e" ...
"help" ...
"history" ...
"kill" ...
"ls" ...
"mv" ...
"mkdir" ...
"pwd" ...
"quit" ...
"\\q" ...
"register" ...
"rm" ...
"rmf" ...
"set" ...
"illustrate" ...
"\\i" ...
"run" ...
"exec" ...
"scriptDone" ...
"" ...
"" ...
<EOL> ...
";" ...
Details at logfile: /home/vivek/Applications/Hadoop_program/pip/pig_1443089792987.log
2015-09-24 03:16:45,554 [main] INFO org.apache.pig.Main - Pig script completed in 13 seconds and 48 milliseconds (13048 ms)
vivek#ubuntu:~/Applications/Hadoop_program/pip$
As shown at the start, p.pig consists of the code given above.
I started Pig in mapreduce mode.
While executing the above code it encounters the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "salaries=load "" at line 7, column 1.
Please help me resolve this error.
You have not provided spaces between the alias name and the command.
Pig expects at least one space before or after the '=' operator.
Change this line:
salaries=load 'salaries' using PigStorage(',') as (gender:chararray,details:bag{b(age:int,salary:double,zip:long)});
TO
salaries = load 'salaries' using PigStorage(',') as (gender:chararray,details:bag{b(age:int,salary:double,zip:long)});
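The same spacing fix applies to the other assignments in the script, and the dump highsal statement also appears to be missing its trailing semicolon. As a minimal sketch, one consistent version of the relevant lines (keeping only a single load of salaries) would look like:
salaries = load 'salaries' using PigStorage(',') as (gender:chararray, age:int, salary:double, zip:long);
highsal = filter salaries by salary > 75000;
dump highsal;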

Hadoop Cluster Deployment Using Pivotal

I am trying to deploy a Hadoop cluster using the Pivotal distribution.
For this, I am following the link below:
http://pivotalhd.docs.pivotal.io/doc/2100/webhelp/topics/ManuallyInstallingandUsingPivotalHD21Stack.html
Deployment Configuration:
1) phd1.xyz.com - NameNode, ResourceManager
2) phd2.xyz.com - DataNode, NodeManager
The services mentioned above are up and running, and I am able to access the HDFS file system, but I am not able to execute jobs on the cluster.
The link above doesn't mention whether the job has to be executed as the root user or the hdfs user, so I tried both ways.
Error when the job is executed as the root user:
hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0-gphd-3.1.0.0.jar pi 2 200
The following error occurs:
> Number of Maps = 2
> Samples per Map = 200
> org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE,
> inode="/user":hdfs:supergroup:drwxr-xr-x
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:214)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:158)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5389)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5371)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5345)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3583)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3553)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3525)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:745)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:63031)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at
Error when the job is executed as the hdfs user:
sudo -u hdfs hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0-gphd-3.1.0.0.jar pi 2 200
The following error occurs:
> Number of Maps = 2
> Samples per Map = 200
> Wrote input for Map #0
> Wrote input for Map #1
> Starting Job
> 15/01/01 20:48:20 INFO client.RMProxy: Connecting to ResourceManager at phd1.xyz.com/10.44.189.6:8050
> 15/01/01 20:48:21 INFO input.FileInputFormat: Total input paths to process : 2
> 15/01/01 20:48:21 INFO mapreduce.JobSubmitter: number of splits:2
> 15/01/01 20:48:21 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use
> mapreduce.map.speculative
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use
> mapreduce.job.output.value.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
> mapreduce.reduce.speculative
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use
> mapreduce.job.map.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use
> mapreduce.job.reduce.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use
> mapreduce.job.inputformat.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use
> mapreduce.output.fileoutputformat.outputdir
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use
> mapreduce.job.outputformat.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use
> mapreduce.job.output.key.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use
> mapreduce.job.working.dir
> 15/01/01 20:48:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1420122968684_0002
> 15/01/01 20:48:22 INFO impl.YarnClientImpl: Submitted application application_1420122968684_0002 to ResourceManager at
> phd1.xyz.com/10.44.189.6:8050
> 15/01/01 20:48:22 INFO mapreduce.Job: The url to track the job: http://phd1.persistent.co.in:8088/proxy/application_1420122968684_0002/
> 15/01/01 20:48:22 INFO mapreduce.Job: Running job: job_1420122968684_0002
> 15/01/01 20:48:26 INFO mapreduce.Job: Job job_1420122968684_0002 running in uber mode : false
> 15/01/01 20:48:26 INFO mapreduce.Job: map 0% reduce 0%
> 15/01/01 20:48:26 INFO mapreduce.Job: Job job_1420122968684_0002 failed with state FAILED due to: Application
> application_1420122968684_0002 failed 2 times due to AM Container for
> appattempt_1420122968684_0002_000002 exited with exitCode: 1 due to:
> Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
> .Failing this attempt.. Failing the application.
> 15/01/01 20:48:26 INFO mapreduce.Job: Counters: 0
> Job Finished in 5.973 seconds
> java.io.FileNotFoundException: File does not exist: hdfs://phd1.xyz.com:8020/user/hdfs/QuasiMonteCarlo_1420125497811_11863122/out/reduce-out
> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1112)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1112)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
> at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
> at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Please let me know how I can resolve this error.
Thanks

Sqoop error while loading data from Hive to MySQL

I am getting a Sqoop error while loading data from Hive to MySQL.
Error message is:
java.lang.NumberFormatException: For input string
==
hive > CREATE EXTERNAL TABLE IF NOT EXISTS test (
id int,
name string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION
'/user/cloudera/test';
==
vi test:
1 a
2 b
==
hadoop fs -put test /user/cloudera
==
mysql> CREATE TABLE `foo` (`id` int(11) , `name` varchar(30) )
==
sqoop export --connect jdbc:mysql://localhost/test --table foo -m 1 --export-dir /user/cloudera/test
==
log:
14/05/13 07:18:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/05/13 07:18:52 INFO tool.CodeGenTool: Beginning code generation
14/05/13 07:18:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `foo` AS t LIMIT 1
14/05/13 07:18:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `foo` AS t LIMIT 1
14/05/13 07:18:53 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
14/05/13 07:18:53 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20-mapreduce/hadoop-core.jar
Note: /tmp/sqoop-cloudera/compile/e6582e332bf9e0eedfb641f14d866599/foo.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/05/13 07:18:56 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/e6582e332bf9e0eedfb641f14d866599/foo.jar
14/05/13 07:18:56 INFO mapreduce.ExportJobBase: Beginning export of foo
14/05/13 07:18:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/05/13 07:19:00 INFO input.FileInputFormat: Total input paths to process : 1
14/05/13 07:19:00 INFO input.FileInputFormat: Total input paths to process : 1
14/05/13 07:19:00 INFO mapred.JobClient: Running job: job_201405081447_0046
14/05/13 07:19:01 INFO mapred.JobClient: map 0% reduce 0%
14/05/13 07:19:14 INFO mapred.JobClient: Task Id : attempt_201405081447_0046_m_000000_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NumberFormatException: For input string: "1 a"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java
14/05/13 07:19:20 INFO mapred.JobClient: Task Id : attempt_201405081447_0046_m_000000_1, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NumberFormatException: For input string: "1 a"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java
14/05/13 07:19:28 INFO mapred.JobClient: Task Id : attempt_201405081447_0046_m_000000_2, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NumberFormatException: For input string: "1 a"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java
==
Any help?
Thank you!
The location into which you placed the file does not appear to be correct. For a table "test" you should put the file underneath a directory named test. But your command
hadoop fs -put test /user/cloudera
creates a file called test.
You would likely find more success as follows:
hadoop fs -mkdir /user/cloudera/test
hadoop fs -put test /user/cloudera/test
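Separately, the NumberFormatException on "1 a" suggests Sqoop read the whole tab-separated line as a single field. If the error persists after fixing the directory layout, here is a sketch of the same export with the input field delimiter spelled out (assuming the data really is tab-delimited, as in the Hive table definition):
sqoop export --connect jdbc:mysql://localhost/test --table foo -m 1 \
  --export-dir /user/cloudera/test \
  --input-fields-terminated-by '\t'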

Hadoop in Windows: file not found exception

I'm using Hadoop on Windows and I've configured everything correctly (Cygwin installed, passwordless SSH, etc.).
I've compiled the wordcount program into WC.jar and tried to run it. It runs perfectly in standalone mode, but in fully distributed mode it gives a FileNotFoundException.
Please look into the logs and tell me what is wrong.
I've started DFS and MapReduce on MACH1 (that's my master).
$ bin/hadoop jar WC.jar WordCount words result
10/07/24 16:57:38 INFO input.FileInputFormat: Total input paths to process : 2
10/07/24 16:57:39 INFO mapred.JobClient: Running job: job_201007241657_0001
10/07/24 16:57:40 INFO mapred.JobClient: map 0% reduce 0%
10/07/24 16:57:50 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_0
00003_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracke
r/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000003_0/work/tmp d
oes not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSys
tem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.
java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:57:55 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_r_0
00002_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracke
r/jobcache/job_201007241657_0001/attempt_201007241657_0001_r_000002_0/work/tmp d
oes not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSys
tem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.
java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:07 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_0
00003_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracke
r/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000003_1/work/tmp d
oes not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSys
tem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.
java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:14 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_0
00003_2, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracke
r/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000003_2/work/tmp d
oes not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSys
tem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.
java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:26 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_0
00002_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracke
r/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000002_0/work/tmp d
oes not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSys
tem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.
java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:34 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_r_0
00001_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracke
r/jobcache/job_201007241657_0001/attempt_201007241657_0001_r_000001_0/work/tmp d
oes not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSys
tem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.
java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:41 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_0
00002_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracke
r/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000002_1/work/tmp d
oes not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSys
tem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.
java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:47 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_0
00002_2, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracke
r/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000002_2/work/tmp d
oes not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSys
tem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.
java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:53 INFO mapred.JobClient: Job complete: job_201007241657_0001
10/07/24 16:58:53 INFO mapred.JobClient: Counters: 0
328510#01HW179531 /usr/local/hadoop-0.20.2
$
Thanks.
I think I might have seen this exception before but I don't have access to my old logs to confirm it. I solved my FileNotFoundException by reformatting the namenode. You might want to check the namenode logs for "inconsistent state" to confirm the cause before reformatting.
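For reference, a minimal sketch of that reformat on a 0.20-era cluster (this erases the HDFS metadata, so only run it on a cluster whose data you can afford to lose; the paths assume the usual bin/ layout shown in the question):
bin/stop-all.sh
bin/hadoop namenode -format
bin/start-all.sh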

Resources