Error when trying to execute Pig statement - Hadoop

I am trying to execute a Pig statement that shows me the data in a text file. I am running in mapreduce mode, but I am getting an error. Can somebody please help me resolve this?
[root@master ~]# pig -x mapreduce
17/04/19 17:42:34 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/04/19 17:42:34 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/04/19 17:42:34 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-04-19 17:42:34,853 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0 (r1746530) compiled Jun 01 2016, 23:10:49
2017-04-19 17:42:34,853 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1492603954851.log
2017-04-19 17:42:34,907 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2017-04-19 17:42:36,060 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost
2017-04-19 17:42:37,130 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-f60d05c3-9fee-4624-9aa8-07f1584e6165
2017-04-19 17:42:37,130 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
grunt> dump b;
2017-04-19 17:42:41,135 [main] ERROR org.apache.pig.tools.grunt.Grunt - You don't have permission to perform the operation. Error from the server: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/tmp/temp1549818457":dead:supergroup:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1692)
at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3894)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:983)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
2017-04-19 17:42:41,136 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias b
Details at logfile: /root/pig_1492603954851.log

You can try this:
pig -Dpig.temp.dir=<temp_location_hdfs> -x mapreduce
<temp_location_hdfs> should have either 775 or 777 permissions.
If that does not work, you can try:
hadoop fs -chmod -R 777 /tmp/*

It seems you do not have the proper permissions on the directory that pig.temp.dir points to, hence this issue. By default, Pig writes its intermediate results to /tmp on HDFS. Override the location with -Dpig.temp.dir.
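A minimal sketch of that override, assuming a hypothetical HDFS directory /user/root/pig_tmp that the current user is allowed to create and write to (on many clusters an HDFS superuser has to create it first). The hadoop calls are skipped when no client is on the PATH, and the last line only prints the Pig command so it can be inspected before running it.

```shell
# Hypothetical location for Pig's intermediate files; replace with a directory you own.
PIG_TMP=/user/root/pig_tmp

# Create the directory and open it up to the group, only if a Hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p "$PIG_TMP"
  hadoop fs -chmod 775 "$PIG_TMP"
fi

# Java-style -D properties are passed before Pig's own options such as -x.
PIG_CMD="pig -Dpig.temp.dir=$PIG_TMP -x mapreduce"
echo "$PIG_CMD"
```

Keeping the temp directory under a path owned by the user who runs Pig avoids colliding with temp folders that another account created under /tmp.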

Related

Pig script does not exist error, even though I can see it in HDFS

I am trying to run a Pig script using the -useHCatalog and -f options, but it fails.
It says the script does not exist, although I can see that the file is present in the HDFS file system. See below.
[hdfs@ip-xx-xx-xx-x-xx ec2-user]$ pig -useHCatalog -f /user/admin/pig/scripts/hcat1.pig
WARNING: Use "yarn jar" to launch YARN applications.
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2016-04-01 13:44:13,645 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0.2.3.4.0-3485 (rexported) compiled Dec 16 2015, 04:30:33
2016-04-01 13:44:13,645 [main] INFO org.apache.pig.Main - Logging error messages to: /tmp/hsperfdata_hdfs/pig_1459532653643.log
2016-04-01 13:44:14,184 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. File /user/admin/pig/scripts/hcat1.pig does not exist
Details at logfile: /tmp/hsperfdata_hdfs/pig_1459532653643.log
2016-04-01 13:44:14,203 [main] INFO org.apache.pig.Main - Pig script completed in 753 milliseconds (753 ms)
[hdfs@ip-xxx-xx-xx-xx ec2-user]$ hadoop fs -cat /user/admin/pig/scripts/hcat1.pig
a = load 'trucks' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by truckid == 'A1';
store b INTO '/user/admin/pig/scritps/outputb1';
You need to specify the complete HDFS URI to run scripts that are stored in HDFS.
Here is what you need:
$ pig -useHCatalog hdfs://namenode_hostname:port/user/admin/pig/scripts/hcat1.pig
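One way to build that URI without hard-coding the NameNode address is to read fs.defaultFS from the client configuration. This is only a sketch: it assumes the hdfs client is on the PATH and falls back to the placeholder from the answer otherwise.

```shell
SCRIPT_PATH=/user/admin/pig/scripts/hcat1.pig   # script path from the question

# Ask the client configuration for the default file system, e.g. hdfs://host:port.
if command -v hdfs >/dev/null 2>&1; then
  DEFAULT_FS=$(hdfs getconf -confKey fs.defaultFS)
else
  DEFAULT_FS=hdfs://namenode_hostname:port   # placeholder, not a real address
fi

# Strip a trailing slash from the file system URI before appending the absolute path.
SCRIPT_URI="${DEFAULT_FS%/}${SCRIPT_PATH}"
echo "pig -useHCatalog $SCRIPT_URI"
```

The command is printed rather than executed so the resulting URI can be checked first.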

How to resolve the following Apache Pig error?

I am executing the following commands:
A = load 'user/cloudera' using PigStorage(':');
B = foreach A generate $0,$4,$5;
dump B;
On executing the last command I get the following error, which I am unable to resolve. Being new to big data and the Apache Hadoop stack, I am unable to understand this error. Please help. Searching Stack Overflow for similar errors did not help either:
2015-11-13 06:36:46,170 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-11-13 06:36:46,208 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}
2015-11-13 06:36:46,212 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-11-13 06:36:46,225 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-11-13 06:36:46,225 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-11-13 06:36:46,404 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-11-13 06:36:46,415 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2015-11-13 06:36:46,445 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-11-13 06:36:49,232 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job306801006066349255.jar
2015-11-13 06:37:04,185 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job306801006066349255.jar created
2015-11-13 06:37:04,223 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-11-13 06:37:04,238 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-11-13 06:37:04,238 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2015-11-13 06:37:04,238 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-11-13 06:37:04,274 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-11-13 06:37:04,274 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-11-13 06:37:04,283 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-11-13 06:37:04,363 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-11-13 06:37:05,416 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /tmp/hadoop-yarn/staging/cloudera/.staging/job_1447417089361_0004
2015-11-13 06:37:05,420 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
2015-11-13 06:37:05,420 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
... 18 more
2015-11-13 06:37:05,423 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1447417089361_0004
2015-11-13 06:37:05,423 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B
2015-11-13 06:37:05,423 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,3],B[4,3] C: R:
2015-11-13 06:37:05,423 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1447417089361_0004
2015-11-13 06:37:05,440 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-11-13 06:37:10,463 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-11-13 06:37:10,463 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1447417089361_0004 has failed! Stop running all dependent jobs
2015-11-13 06:37:10,463 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-11-13 06:37:10,620 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Could not get Job info from RM for job job_1447417089361_0004. Redirecting to job history server.
2015-11-13 06:37:10,844 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Could not get Job info from RM for job job_1447417089361_0004. Redirecting to job history server.
2015-11-13 06:37:10,849 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-11-13 06:37:10,850 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0-cdh5.4.2 0.12.0-cdh5.4.2 cloudera 2015-11-13 06:36:46 2015-11-13 06:37:10 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1447417089361_0004 A,B MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
... 18 more
hdfs://quickstart.cloudera:8020/tmp/temp-193566860/tmp-1023933528,
Input(s):
Failed to read data from "hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera"
Output(s):
Failed to produce result in "hdfs://quickstart.cloudera:8020/tmp/temp-193566860/tmp-1023933528"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1447417089361_0004
2015-11-13 06:37:10,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-11-13 06:37:10,853 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B
Details at logfile: /home/cloudera/pig_1447424730804.log
Solution:
Change line 1 of your code to:
A= load '/user/cloudera' using PigStorage(':');
Having said that, are you sure your input data sits directly in your home directory? That seems unlikely. It is more likely to be in a folder within your home directory, e.g. /user/cloudera/input-data.
Before running your job, do:
hdfs dfs -ls /user/cloudera
to confirm that the input data is actually in that folder. If it is not, work out where it actually is, and make sure it is on HDFS and not on the local file system.
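The check above can be scripted roughly as follows. This is a sketch that assumes the hdfs client is installed; the STATUS variable is a name introduced here only for illustration.

```shell
INPUT=/user/cloudera   # path from the corrected load statement

if ! command -v hdfs >/dev/null 2>&1; then
  STATUS="no-client"          # hdfs command not installed on this machine
elif hdfs dfs -test -e "$INPUT"; then
  STATUS="exists"
  hdfs dfs -ls "$INPUT"       # show what is actually inside the folder
else
  STATUS="missing"
fi
echo "$INPUT: $STATUS"
```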
Explanation:
The relevant part of the logs is
ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
This suggests it's related to the input path. The part of your code that deals with the input path is
A= load 'user/cloudera' using PigStorage(':');
Without a leading forward slash before user, the path is treated as relative to your HDFS home directory. For example, writing load 'input' would lead the Pig job to read hdfs://quickstart.cloudera:8020/user/cloudera/input. In your case, the missing slash means that 'user/cloudera' is appended to your home directory, which gives /user/cloudera/user/cloudera.
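The resolution rule can be illustrated with a small shell function. resolve_hdfs_path is a hypothetical helper written for this explanation, not part of Pig or Hadoop, and it assumes the usual home directory layout /user/<username>.

```shell
# Hypothetical helper: mimic how a load path is resolved for a given user.
# $1 = path as written in the load statement, $2 = user name
resolve_hdfs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;                 # absolute path: used as-is
    *)  printf '/user/%s/%s\n' "$2" "$1" ;;   # relative path: appended to the home directory
  esac
}

resolve_hdfs_path 'user/cloudera' cloudera    # prints /user/cloudera/user/cloudera
resolve_hdfs_path '/user/cloudera' cloudera   # prints /user/cloudera
```

The first call reproduces the doubled path from the error message; the second shows why adding the leading slash fixes it.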
Please check whether you are running the Pig statements in local mode or in mapreduce mode. If you are running in local mode, Pig reads from the local file system and you cannot read HDFS files this way.
Run this command: $ pig -x mapreduce
After this:
Grunt> A = load '/;
Grunt>Dump/store /
Input(s):
Failed to read data from
"hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera"
Output(s):
Failed to produce result in "hdfs://quickstart.cloudera:8020/tmp/temp-193566860/tmp-1023933528"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
If you look at the error output above (copied from the question), Pig takes the input file path to be
'hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera'
So when running Pig in mapreduce mode on the QuickStart VM, you can put
'hdfs://quickstart.cloudera:8020' before your input file path so that it is fully qualified.
Is the input a folder? If so, the Pig script will read all the files in the folder.
Example:
Say your input file in HDFS is at '/user/cloudera/inputdata.txt'; then your query has to be
A = load 'hdfs://quickstart.cloudera:8020/user/cloudera/inputdata.txt' using PigStorage(':');
B = foreach A generate $0,$4,$5;
dump B;
The only thing you have to check is the file path.
hdfs dfs -ls /
Check how the path is structured,
like ..../input/data/file,
for the file you are going to use.
A = load '/input/data/file';
dump A;
The only thing that matters is the proper path of the file you want to load.
I had a similar issue and was able to fix it by deleting the passwd file from the target folder and then copying it again using the correct path:
A= load '/user/cloudera' using PigStorage(':');
The instruction in the video is missing the '/' before user.
Step 1:
hdfs dfs -rm /user/cloudera/passwd
Step 2:
A= load '/user/cloudera' using PigStorage(':');
Then followed the instructions from the Coursera video.

Oozie launching MR jobs as YARN user instead of given user name

I have designed an Oozie workflow to run a Sqoop script.
I'm submitting the workflow using the user name given by the Hadoop admin team.
The script is failing because Oozie is launching the MR jobs as the yarn user, which is not able to access my user directory in HDFS, i.e. /user/cv1100.
I checked the MR log, and there I can see the property "user.name=yarn".
How can I change this? I have set "user.name" in the job.properties file of Oozie.
Below is the error I'm getting in the logs, in case it helps:
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
error: error reading /usr/lib/hadoop/lib/smore.jar; /usr/lib/hadoop/lib/smore.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/adfs.jar; /usr/lib/hadoop/lib/adfs.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/aftp.jar; /usr/lib/hadoop/lib/aftp.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/janusclient.jar; /usr/lib/hadoop/lib/janusclient.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/adfs-api-loader.jar; /usr/lib/hadoop/lib/adfs-api-loader.jar (Permission denied)
error: error reading /usr/lib/hadoop/lib/aster-networking.jar; /usr/lib/hadoop/lib/aster-networking.jar (Permission denied)
Note: /tmp/sqoop-yarn/compile/a9988b6ea5448f4cc962b625361feb1a/test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
31075 [Thread-35] INFO org.apache.sqoop.hive.HiveImport - Loading data to table ecsdashboard.test_oozie
31175 [Thread-35] INFO org.apache.sqoop.hive.HiveImport - Failed with exception Unable to move sourcehdfs:///user/qjdht93/test/_SUCCESS to destination hdfs://apps/hive/warehouse/ecsdashboard.db/test_oozie/_SUCCESS
31176 [Thread-35] INFO org.apache.sqoop.hive.HiveImport - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
31728 [main] ERROR org.apache.sqoop.tool.ImportTool - Encountered IOException running import job: java.io.IOException: Hive exited with status 1
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:385)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:335)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:425)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:506)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:206)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:174)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:225)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://Had1:8020/user/qjdht93/oozie-oozi/0000054-150404155202480-oozie-oozi-W/sqoop2hive--sqoop/action-data.seq
Oozie Launcher ends
Log Type: syslog
Log Length: 2914
2015-04-14 02:49:11,912 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-04-14 02:49:11,943 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink ganglia started
2015-04-14 02:49:12,012 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-04-14 02:49:12,012 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2015-04-14 02:49:12,024 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-04-14 02:49:12,024 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1428177121154_1938, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@45d1f40c)
2015-04-14 02:49:12,048 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 39.7.48.3:8050, Ident: (owner=qjdht93, renewer=oozie mr token, realUser=oozie, issueDate=1428994143490, maxDate=1429598943490, sequenceNumber=253, masterKeyId=11)
2015-04-14 02:49:12,100 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-04-14 02:49:12,383 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data2/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data3/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data4/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data5/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data6/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data7/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data8/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data9/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data10/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data11/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938,/data12/hadoop/yarn/local/usercache/qjdht93/appcache/application_1428177121154_1938
2015-04-14 02:49:12,821 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-04-14 02:49:13,273 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2015-04-14 02:49:13,596 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://Had1:8020/user/qjdht93/oozie-oozi/0000054-150404155202480-oozie-oozi-W/sqoop2hive--sqoop/input/dummy.txt:0+5
2015-04-14 02:49:13,622 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2015-04-14 02:49:13,659 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id

Pig setup throwing error

I was trying to install Pig v0.13.0 on my Fedora 20 system. After extracting the tar.gz contents, I set up JAVA_HOME and added PIG/bin to the PATH. Then I typed the command pig in the console and this is what I got. I am unable to understand what went wrong:
[root@localhost /]# pig
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-12-21 00:05:16,082 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-12-21 00:05:16,083 [main] INFO org.apache.pig.Main - Logging error messages to: //pig_1419100516081.log
2014-12-21 00:05:16,130 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-12-21 00:05:16,765 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:16,771 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-12-21 00:05:16,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2014-12-21 00:05:16,780 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-12-21 00:05:19,130 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:19,130 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2014-12-21 00:05:19,136 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> ls
2014-12-21 00:05:33,697 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Call From localhost.localdomain/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Details at logfile: //pig_1419100516081.log
Please let me know why the ls command in the Grunt shell threw this error.
Please guide.
When you type pig in the console, it starts in MAPREDUCE mode by default, and for that you need access to a Hadoop cluster and an HDFS installation. Mapreduce mode is the default mode in Pig.
It looks like your Hadoop cluster is not configured properly, which is why you are getting the connection refused error. Please follow this link to solve the connection-refused problem: http://wiki.apache.org/hadoop/ConnectionRefused
As a workaround, use LOCAL mode, which doesn't need a Hadoop installation.
In the console, type pig -x local; this brings up the Grunt shell, where you can type the ls command.
Local mode
$ pig -x local
Mapreduce mode
$ pig
(or) //try to connect HDFS
$ pig -x mapreduce
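The choice between the two modes can be sketched as a small check: use mapreduce mode only when HDFS actually answers, otherwise fall back to local mode. The function name choose_pig_mode is hypothetical, and listing the HDFS root is just one assumed way to probe the cluster; with a misconfigured NameNode the probe may take a while before it gives up.

```shell
# Hypothetical helper: print "mapreduce" if HDFS is reachable, "local" otherwise.
choose_pig_mode() {
  if command -v hdfs >/dev/null 2>&1 && hdfs dfs -ls / >/dev/null 2>&1; then
    echo mapreduce
  else
    echo local
  fi
}

MODE=$(choose_pig_mode)
# Print the command instead of running it, so the chosen mode can be reviewed first.
echo "pig -x $MODE"
```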
OK, I got this one working. If I connect in Pig's mapreduce mode, the ls command has to change to ls hdfs:/. Hence changing the above command from ls to ls hdfs:/ resolves my problem. If I connect in local mode, the plain ls command works fine.

How to import/load a .csv file in Pig?

Let's suppose there is a tab-delimited text file (datetemp.txt). I want to load this text file in Pig for processing, but when I type the lines below I get this error:
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage() As (EventID: chararray,eventdate: chararray,count:int);
grunt> dump inputfile;
2014-09-06 08:41:23,527 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-06 08:41:23,544 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-06 08:41:23,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2739171785773930333.jar
2014-09-06 08:42:39,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2739171785773930333.jar created
2014-09-06 08:42:39,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-09-06 08:42:39,619 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-09-06 08:42:39,630 [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-09-06 08:42:39,891 [Thread-12] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://192.168.195.130:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/training/.staging/job_201408292336_0009
2014-09-06 08:42:39,891 [Thread-12] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
2014-09-06 08:42:40,119 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
... 15 more
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.1.1 0.10.0-cdh4.1.1 training 2014-09-06 08:41:23 2014-09-06 08:42:40 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
N/A inputfile MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
... 15 more
hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785,
Input(s):
Failed to read data from "/training/pig/datetemp.txt"
Output(s):
Failed to produce result in "hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
null
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-09-06 08:42:40,142 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias inputfile
Details at logfile: /home/training/pig_1410006833865.log
Please help me here..!!
PigStorage is case sensitive. Use PigStorage and not pigstorage.
Your question title says you are trying to load a CSV file. For that, I've had good luck using org.apache.pig.piggybank.storage.CSVExcelStorage() in my LOAD statements, as demonstrated at https://martin.atlassian.net/wiki/x/WYBmAQ.
Why don't you write PigStorage('\t') instead of PigStorage(), since you already mentioned that the file is tab-delimited?
The code you mentioned:
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage()
As (EventID: chararray,eventdate: chararray,count:int);
Maybe this will solve your problem.
Let me know if it is something else.
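As a rough illustration of why the delimiter has to match the file, here is a small Python sketch. It is not Pig code and does not call any Pig API; the split_record helper and the sample row are made up for this example. It only mimics how a delimiter-based loader maps one line onto the three fields named in the LOAD statement.

```python
# Illustrative model only: this is not Pig code and calls no Pig API.
# It mimics how a delimiter-based loader splits one line into schema fields.

SCHEMA = ["EventID", "eventdate", "count"]  # field names from the LOAD statement


def split_record(line, delimiter="\t"):
    """Split one line on the delimiter and pair the pieces with SCHEMA.

    Fields missing from the line are filled with None, similar in spirit
    to the nulls a loader yields for fields it cannot find.
    """
    parts = line.rstrip("\n").split(delimiter)
    parts += [None] * (len(SCHEMA) - len(parts))  # pad short rows
    return dict(zip(SCHEMA, parts))


sample = "E100\t2014-09-06\t7\n"  # hypothetical tab-delimited row

print(split_record(sample, "\t"))  # three separate fields
print(split_record(sample, ","))   # whole line lands in EventID
```

With the matching delimiter the row splits into three fields; with the wrong one the whole line ends up in the first field and the rest are empty. Note that a delimiter mismatch like this would show up as wrong or empty columns, not as the "Input path does not exist" error in the log above.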
The file hdfs://192.168.195.130:8020/training/pig/datetemp.txt was not found in your HDFS. Make sure the input file is placed at that location.
Have you checked whether the input path exists?
Try this in the Grunt shell:
fs -ls /training/pig/
If datetemp.txt appears in the listing, the load will work; otherwise, give the correct input path.
The log states the error clearly:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
Can you check whether the file exists in HDFS?
Also check whether Pig is running in mapreduce mode or local mode.
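The mode matters because it decides which filesystem a path without a scheme refers to. The Python sketch below is only a simplified model of that behaviour, not Pig or Hadoop code; resolve_input is a hypothetical helper, and the default_fs value is taken from the hdfs:// prefix in the error message, assuming that is the cluster's default filesystem.

```python
# Illustrative model only: not Pig or Hadoop code.
# default_fs stands in for the cluster's configured default filesystem
# (assumed here from the hdfs:// prefix shown in the error message).
from urllib.parse import urlparse


def resolve_input(path, mode, default_fs="hdfs://192.168.195.130:8020"):
    """Return the location a scheme-less absolute path would refer to."""
    if urlparse(path).scheme:
        return path  # already qualified (e.g. hdfs:/...), used as given
    if mode == "local":
        return "file://" + path  # local filesystem of the machine running Pig
    if mode == "mapreduce":
        return default_fs.rstrip("/") + path  # path inside HDFS
    raise ValueError("unknown mode: " + mode)


print(resolve_input("/training/pig/datetemp.txt", "mapreduce"))
print(resolve_input("/training/pig/datetemp.txt", "local"))
```

In this model the mapreduce case produces the same hdfs:// location that appears in the error, which suggests the session was in mapreduce mode and the file exists only on the local disk, not in HDFS.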
You can pass ',' to PigStorage to read a CSV file.
The query looks like this:
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage(',') As (EventID: chararray,eventdate: chararray,count:int);
grunt> dump inputfile;
Also make sure that the file '/training/pig/datetemp.txt' exists on HDFS.
To test, run: hadoop fs -ls /training/pig/datetemp.txt

Resources