I'm super new to Apache Pig and was trying to load a multilevel JSON file into Pig and save it as CSV.
JSON file I have - /home/vikaspattathe/dataset/sample.json
{"_id":{"$oid":"5a1321d5741a2384e802c552"},"reviewerID":"A3HVRXV0LVJN7","asin":"0110400550","reviewerName":"BiancaNicole","helpful":[4,4],"reviewText":"Best phone case ever . Everywhere I go I get a ton of compliments on it. It was in perfect condition as well.","overall":5.0,"summary":"A++++","unixReviewTime":1358035200,"reviewTime":"01 13, 2013","category":"Cell_Phones_and_Accessories","class":1.0}
{"_id":{"$oid":"5a1321d5741a2384e802c557"},"reviewerID":"A1BJGDS0L1IO6I","asin":"0110400550","reviewerName":"cf \"t\"","helpful":[0,3],"reviewText":"ITEM NOT SENT from Blue Top Company in Hong Kong and it's been over two months! I will report this. DO NOT use this company. Not happy at all!","overall":1.0,"summary":"ITEM NOT SENT!!","unixReviewTime":1359504000,"reviewTime":"01 30, 2013","category":"Cell_Phones_and_Accessories","class":0.0}
Opened Pig from the directory (in case that is of any concern) - /home/vikaspattathe/dataset/
I tried the commands below to load the data and got the following errors.
grunt> sample_table = LOAD '/home/vikaspattathe/dataset/sample.json' USING JsonLoader('id:chararray, reviewerId:chararray, asin:chararray, reviewerName:chararray, reviewText:chararray, overall:int, summary:chararray, unixReviewTime:chararray, reviewTime:chararray, category:chararray, class:int');
2022-10-29 11:35:55,333 [main] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2022-10-29 11:35:55,556 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2022-10-29 11:35:55,572 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2022-10-29 11:35:55,632 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
Tried dumping sample_table as I believe the load wasn't successful.
grunt> dump sample_table;
2022-10-29 11:42:24,535 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2022-10-29 11:42:24,546 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2022-10-29 11:42:24,558 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2022-10-29 11:42:24,558 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2022-10-29 11:42:24,558 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, FilterConstantCalculator, ForEachConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitConstantCalculator, SplitFilter, StreamTypeCastInserter]}
2022-10-29 11:42:24,559 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2022-10-29 11:42:24,560 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2022-10-29 11:42:24,561 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2022-10-29 11:42:24,569 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2022-10-29 11:42:24,571 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at spam-ham-m/10.154.0.5:8032
2022-10-29 11:42:24,572 [main] INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at spam-ham-m/10.154.0.5:10200
2022-10-29 11:42:24,575 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2022-10-29 11:42:24,575 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2022-10-29 11:42:24,576 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2022-10-29 11:42:24,621 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/pig-0.18.0-SNAPSHOT-core-h3.jar to DistributedCache through /tmp/temp500810153/tmp-403584597/pig-0.18.0-SNAPSHOT-core-h3.jar
2022-10-29 11:42:24,642 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/hadoop/lib/jackson-core-asl-1.9.13.jar to DistributedCache through /tmp/temp500810153/tmp-332868819/jackson-core-asl-1.9.13.jar
2022-10-29 11:42:24,663 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp500810153/tmp759836349/automaton-1.11-8.jar
2022-10-29 11:42:24,683 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp500810153/tmp-305484567/antlr-runtime-3.4.jar
2022-10-29 11:42:24,815 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/hive/lib/hive-exec-3.1.2.jar to DistributedCache through /tmp/temp500810153/tmp-1240067490/hive-exec-3.1.2.jar
2022-10-29 11:42:24,837 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/lib/RoaringBitmap-shaded-0.7.45.jar to DistributedCache through /tmp/temp500810153/tmp-1911465994/RoaringBitmap-shaded-0.7.45.jar
2022-10-29 11:42:24,855 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2022-10-29 11:42:24,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2022-10-29 11:42:24,885 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at spam-ham-m/10.154.0.5:8032
2022-10-29 11:42:24,886 [JobControl] INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at spam-ham-m/10.154.0.5:10200
2022-10-29 11:42:24,906 [JobControl] INFO org.apache.hadoop.mapreduce.JobResourceUploader - Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/vikaspattathe/.staging/job_1667038087507_0005
2022-10-29 11:42:24,908 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2022-10-29 11:42:24,960 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /tmp/hadoop-yarn/staging/vikaspattathe/.staging/job_1667038087507_0005
2022-10-29 11:42:24,963 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://spam-ham-m/home/vikaspattathe/dataset/sample.json
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:298)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1565)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1562)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1562)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205)
at java.lang.Thread.run(Thread.java:750)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:298)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://spam-ham-m/home/vikaspattathe/dataset/sample.json
at org.apache.hadoop.mapred.LocatedFileStatusFetcher.getFileStatuses(LocatedFileStatusFetcher.java:153)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:280)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:396)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:283)
... 18 more
2022-10-29 11:42:25,375 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1667038087507_0005
2022-10-29 11:42:25,375 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases sample_table
2022-10-29 11:42:25,375 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: sample_table[1,15] C: R:
2022-10-29 11:42:25,384 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2022-10-29 11:42:30,393 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2022-10-29 11:42:30,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1667038087507_0005 has failed! Stop running all dependent jobs
2022-10-29 11:42:30,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2022-10-29 11:42:30,395 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at spam-ham-m/10.154.0.5:8032
2022-10-29 11:42:30,396 [main] INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at spam-ham-m/10.154.0.5:10200
2022-10-29 11:42:30,404 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Could not get Job info from RM for job job_1667038087507_0005. Redirecting to job history server.
2022-10-29 11:42:30,420 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
2022-10-29 11:42:30,421 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2022-10-29 11:42:30,421 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.3 0.18.0-SNAPSHOT vikaspattathe 2022-10-29 11:42:24 2022-10-29 11:42:30 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1667038087507_0005 sample_table MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://spam-ham-m/home/vikaspattathe/dataset/sample.json
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:298)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1565)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1562)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1562)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205)
at java.lang.Thread.run(Thread.java:750)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:298)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://spam-ham-m/home/vikaspattathe/dataset/sample.json
at org.apache.hadoop.mapred.LocatedFileStatusFetcher.getFileStatuses(LocatedFileStatusFetcher.java:153)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:280)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:396)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:283)
... 18 more
hdfs://spam-ham-m/tmp/temp500810153/tmp929245825,
Input(s):
Failed to read data from "/home/vikaspattathe/dataset/sample.json"
Output(s):
Failed to produce result in "hdfs://spam-ham-m/tmp/temp500810153/tmp929245825"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1667038087507_0005
2022-10-29 11:42:30,421 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2022-10-29 11:42:30,427 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sample_table. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
Details at logfile: /home/vikaspattathe/dataset/pig_1667043341193.log
The load wasn't successful.
Also, is the command below correct for storing the file as samplecsv.csv, once it's successfully loaded?
grunt> STORE sample_table INTO '/home/vikaspattathe/dataset/samplecsv' USING PigStorage(',');
HDFS doesn't have /home paths; user data is stored under /user. As the error says, the input path you gave doesn't exist in HDFS.
If you want to load from the local filesystem, prefix the path with file://
You also don't need HDFS to use Pig, if you start it with pig -x local
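A minimal sketch of those options, assuming the file layout from the question:

# Option 1: copy the file into HDFS under your /user directory, then load it from there
hdfs dfs -mkdir -p /user/vikaspattathe/dataset
hdfs dfs -put /home/vikaspattathe/dataset/sample.json /user/vikaspattathe/dataset/
# grunt> sample_table = LOAD '/user/vikaspattathe/dataset/sample.json' USING JsonLoader(...);

# Option 2: read the local file directly by prefixing the path with file://
# grunt> sample_table = LOAD 'file:///home/vikaspattathe/dataset/sample.json' USING JsonLoader(...);

# Option 3: start Pig in local mode, where plain local paths resolve against the local filesystem
pig -x local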
As for the output, start with
X = LOAD '/tmp/sample.json' USING JsonLoader('id:(oid:chararray), reviewerID:chararray, asin:chararray, reviewerName:chararray, helpful:tuple(int), reviewText:chararray, overall:int, summary:chararray, unixReviewTime:long, reviewTime:chararray, category:chararray, class:int');
But I don't know how the helpful field should be declared for arrays... I tried helpful:[int], and that produces the same result.
DUMP X;
((5a1321d5741a2384e802c552),A3HVRXV0LVJN7,0110400550,BiancaNicole,,4,,,,,,)
((5a1321d5741a2384e802c557),A1BJGDS0L1IO6I,0110400550,cf "t",,3,,,,,,)
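As for writing the CSV: the STORE syntax in the question is valid, but PigStorage writes a directory of part files rather than a single samplecsv.csv, and the nested _id tuple should be flattened first so every output column is a scalar. A sketch, assuming X loaded as above (helpful is left out since its schema is still unresolved):

X_flat = FOREACH X GENERATE FLATTEN(id) AS oid, reviewerID, asin, reviewerName, reviewText, overall, summary, unixReviewTime, reviewTime, category, class;
STORE X_flat INTO '/user/vikaspattathe/dataset/samplecsv' USING PigStorage(',');

Two caveats: PigStorage does not quote fields, so commas inside free text like reviewText will produce extra columns; and if you want one literal file you can merge the parts afterwards with hdfs dfs -getmerge /user/vikaspattathe/dataset/samplecsv samplecsv.csv.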
test = LOAD 'hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576' USING TextLoader AS (line:chararray);
log = FOREACH test GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)"')) AS (address_ip: chararray, logname: chararray, user: chararray, timestamp: chararray, req_line: chararray, status: int, bytes: int, referer: chararray, userAgent: chararray);
STORE log INTO 'hbase://Access_Logs' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:address_ip, cf:logname, cf:user, cf:timestamp, cf:req_line, cf:status, cf:bytes, cf:referer, cf:userAgent');
2018-03-10 10:52:03,636 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,840 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,840 [main] INFO org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for Access_Logs
2018-03-10 10:52:03,843 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-03-10 10:52:03,859 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,860 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2018-03-10 10:52:03,860 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-03-10 10:52:03,890 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2018-03-10 10:52:03,890 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2018-03-10 10:52:03,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2018-03-10 10:52:03,897 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,898 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:52:03,899 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2018-03-10 10:52:03,899 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2018-03-10 10:52:03,900 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2018-03-10 10:52:03,981 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.15.0-core-h2.jar to DistributedCache through /tmp/temp1710369540/tmp1282565307/pig-0.15.0-core-h2.jar
2018-03-10 10:52:04,013 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/htrace-core-2.04.jar to DistributedCache through /tmp/temp1710369540/tmp520067094/htrace-core-2.04.jar
2018-03-10 10:52:04,067 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp1710369540/tmp946538428/guava-11.0.2.jar
2018-03-10 10:52:04,123 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-common-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp468949353/hbase-common-0.98.8-hadoop2.jar
2018-03-10 10:52:04,144 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-hadoop-compat-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp113887319/hbase-hadoop-compat-0.98.8-hadoop2.jar
2018-03-10 10:52:04,200 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-server-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp682998180/hbase-server-0.98.8-hadoop2.jar
2018-03-10 10:52:04,256 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-client-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp-1958170360/hbase-client-0.98.8-hadoop2.jar
2018-03-10 10:52:04,317 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-protocol-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp-892814021/hbase-protocol-0.98.8-hadoop2.jar
2018-03-10 10:52:04,363 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.4.5.jar to DistributedCache through /tmp/temp1710369540/tmp-830858682/zookeeper-3.4.5.jar
2018-03-10 10:52:04,396 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar to DistributedCache through /tmp/temp1710369540/tmp-420530468/protobuf-java-2.5.0.jar
2018-03-10 10:52:04,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/high-scale-lib-1.1.1.jar to DistributedCache through /tmp/temp1710369540/tmp-1046507224/high-scale-lib-1.1.1.jar
2018-03-10 10:52:04,474 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar to DistributedCache through /tmp/temp1710369540/tmp309001480/netty-3.6.2.Final.jar
2018-03-10 10:52:04,489 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1710369540/tmp964502237/automaton-1.11-8.jar
2018-03-10 10:52:04,507 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1710369540/tmp-1680308848/antlr-runtime-3.4.jar
2018-03-10 10:52:04,528 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.5.jar to DistributedCache through /tmp/temp1710369540/tmp813805284/joda-time-2.5.jar
2018-03-10 10:52:04,539 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2018-03-10 10:52:04,544 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2018-03-10 10:52:04,544 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2018-03-10 10:52:04,544 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2018-03-10 10:52:04,555 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2018-03-10 10:52:04,557 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:52:04,596 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:04,597 [JobControl] INFO org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for Access_Logs
2018-03-10 10:52:04,625 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2018-03-10 10:52:04,742 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2018-03-10 10:52:04,742 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2018-03-10 10:52:04,748 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-03-10 10:52:04,786 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2018-03-10 10:52:04,828 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1519576410629_0017
2018-03-10 10:52:04,832 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2018-03-10 10:52:04,908 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1519576410629_0017
2018-03-10 10:52:04,910 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://server2.linux.com:8088/proxy/application_1519576410629_0017/
2018-03-10 10:52:05,056 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1519576410629_0017
2018-03-10 10:52:05,056 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases log,test
2018-03-10 10:52:05,056 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: test[16,7],log[-1,-1] C: R:
2018-03-10 10:52:05,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2018-03-10 10:52:05,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1519576410629_0017]
2018-03-10 10:53:02,248 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2018-03-10 10:53:02,249 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1519576410629_0017]
2018-03-10 10:53:05,259 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2018-03-10 10:53:05,259 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1519576410629_0017 has failed! Stop running all dependent jobs
2018-03-10 10:53:05,259 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2018-03-10 10:53:05,260 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:53:05,274 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2018-03-10 10:53:05,789 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
2018-03-10 10:53:05,789 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2018-03-10 10:53:05,789 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.4.1 0.15.0 hadoop 2018-03-10 10:52:03 2018-03-10 10:53:05 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1519576410629_0017 log,test MAP_ONLY Message: Job failed! hbase://Access_Logs,
Input(s):
Failed to read data from "hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576"
Output(s):
Failed to produce result in "hbase://Access_Logs"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1519576410629_0017
2018-03-10 10:53:05,789 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
But it runs fine when doing
dump log
The Pig script is not able to load the data, most likely because of the columns status: int and bytes: int.
The error says:
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
It means the regex parser is producing String data where Pig expects an Integer.
To debug, try changing the datatypes in the Pig script and just print the output. Once that all works, you can try saving into HBase.
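A sketch of that debugging approach, keeping the regex and aliases from the question: REGEX_EXTRACT_ALL always returns chararrays, so declare every field as chararray first, inspect the output, then cast the numeric columns explicitly.

log_raw = FOREACH test GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)"')) AS (address_ip: chararray, logname: chararray, user: chararray, timestamp: chararray, req_line: chararray, status: chararray, bytes: chararray, referer: chararray, userAgent: chararray);
DUMP log_raw;
-- once the captured values look right, cast the numeric columns; note that in access logs
-- the bytes field can be "-", which casts to null rather than a number
log = FOREACH log_raw GENERATE address_ip, logname, user, timestamp, req_line, (int)status AS status, (int)bytes AS bytes, referer, userAgent;

If the casts come out as expected in a DUMP, the same log relation can then go into the HBaseStorage STORE from the question.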
So I was trying to import a table from Oracle to Hive, using Sqoop. Here is my command:
sqoop-import --hive-import --connect jdbc:oracle:thin:@10.35.10.180:1521:dms
--table DEFECT
--hive-database inspex
--username INSPEX
--password inspex
YARN seems to split the job into 4 parts.
17/02/17 15:15:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.9.1
17/02/17 15:15:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/02/17 15:15:17 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/02/17 15:15:17 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/02/17 15:15:18 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
17/02/17 15:15:18 INFO manager.SqlManager: Using default fetchSize of 1000
17/02/17 15:15:18 INFO tool.CodeGenTool: Beginning code generation
17/02/17 15:15:18 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "DEFECT" t WHERE 1=0
17/02/17 15:15:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/72422bf2a67c745893ae440ad77e3049/DEFECT.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/02/17 15:15:20 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/72422bf2a67c745893ae440ad77e3049/DEFECT.jar
17/02/17 15:15:20 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:20 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:20 INFO mapreduce.ImportJobBase: Beginning import of DEFECT
17/02/17 15:15:20 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/02/17 15:15:20 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/02/17 15:15:21 INFO client.RMProxy: Connecting to ResourceManager at vn1.localdomain/10.35.10.17:8032
17/02/17 15:15:23 INFO db.DBInputFormat: Using read commited transaction isolation
17/02/17 15:15:23 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN("DEFECT_INDEX"), MAX("DEFECT_INDEX") FROM "DEFECT"
17/02/17 15:15:45 INFO mapreduce.JobSubmitter: number of splits:4
17/02/17 15:15:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1487360998088_0003
17/02/17 15:15:45 INFO impl.YarnClientImpl: Submitted application application_1487360998088_0003
17/02/17 15:15:45 INFO mapreduce.Job: The url to track the job: http://vn1.localdomain:8088/proxy/application_1487360998088_0003/
17/02/17 15:15:45 INFO mapreduce.Job: Running job: job_1487360998088_0003
17/02/17 15:15:51 INFO mapreduce.Job: Job job_1487360998088_0003 running in uber mode : false
17/02/17 15:15:51 INFO mapreduce.Job: map 0% reduce 0%
17/02/17 15:15:57 INFO mapreduce.Job: map 50% reduce 0%
17/02/17 15:16:35 INFO mapreduce.Job: map 75% reduce 0%
Then it got stuck at 75% and ran forever. I noticed that 3 out of 4 map tasks finished pretty quickly, except for one:
It seems this task isn't making any progress and just stays at 0%. I checked out the syslog:
2017-02-17 15:15:53,795 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-02-17 15:15:53,851 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-02-17 15:15:53,851 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-02-17 15:15:53,859 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-02-17 15:15:53,859 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1487360998088_0003, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#41480ec1)
2017-02-17 15:15:53,948 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-02-17 15:15:54,426 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /yarn/nm/usercache/root/appcache/application_1487360998088_0003
2017-02-17 15:15:55,409 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2017-02-17 15:15:55,813 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-02-17 15:15:55,822 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2017-02-17 15:15:56,278 INFO [main] org.apache.sqoop.mapreduce.db.DBInputFormat: Using read commited transaction isolation
2017-02-17 15:15:56,411 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: "DEFECT_INDEX" >= 1 AND "DEFECT_INDEX" < 545225318
2017-02-17 15:15:56,491 INFO [main] org.apache.sqoop.mapreduce.db.OracleDBRecordReader: Time zone has been set to GMT
2017-02-17 15:15:56,564 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: "DEFECT_INDEX" >= 1 AND "DEFECT_INDEX" < 545225318
2017-02-17 15:15:56,610 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: SELECT "DEFECT_INDEX", "LAYER_SCAN_INDEX", "DEFECT_SITE_ID", "CLUSTER_ID", "CARRY_OVER", "VERIFIED_ADDER", "REPEATING_DEFECT", "TEST_NUMBER", "X_DIE_COORDINATE", "Y_DIE_COORDINATE", "X_COORDINATE", "Y_COORDINATE", "X_DEFECT_SIZE", "Y_DEFECT_SIZE", "D_SIZE", "INSPECT_INTENSITY", "PATTERN_ID", "SURFACE", "ANGLE", "HOT_SPOT", "ASPECT_RATIO", "GRAY_SCALE", "MACROSIGID", "REGIONID", "SEMREVSAMPLE", "POLARITY", "DBGROUP", "CAREAREAGROUPCODE", "VENNID", "SEGMENTID", "MDAT_OFFSET", "DESIGNFILEFLOORPLANID", "DBSCRITICALITYINDEX", "CELLSIZE", "PCI", "LINECOMPLEXITY", "DCIRANGE", "GDSX", "GDSY" FROM "DEFECT" WHERE ( "DEFECT_INDEX" >= 1 ) AND ( "DEFECT_INDEX" < 545225318 )
The log ends right after starting to execute the query. I waited for about 10 hours and there was still no update, which shouldn't be right because the table isn't really that big.
I didn't find any error in the log. Before this I had successfully imported several tables from Oracle to Hive, so I think my configuration is fine.
I have tried setting the number of mappers from 1 to 100 and it still didn't work. I also notice that the task whose PK range starts at 1 always shows 0% progress while the others work just fine.
I'm looking for any suggestions or help. Thanks.
By default, Sqoop splits your job into 4 mappers based on the PK of your table; however, depending on the distribution of your data, this can be very inefficient. You didn't talk about the size of your cluster or your data, but I would suggest looking at how your data is distributed, setting a greater number of maps (-m option), and using a different column to split the load (--split-by option). Your job is currently using the DEFECT_INDEX column to split the work.
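A sketch of that suggestion, reusing the command from the question; the mapper count and the split column LAYER_SCAN_INDEX are illustrative choices (pick a numeric column whose values are spread evenly across rows), and -P replaces the plain-text password, as the Sqoop warning in the log suggests:

sqoop-import --hive-import \
    --connect jdbc:oracle:thin:@10.35.10.180:1521:dms \
    --table DEFECT \
    --hive-database inspex \
    --username INSPEX -P \
    -m 16 \
    --split-by LAYER_SCAN_INDEX

To check a candidate column first, run something like SELECT MIN(LAYER_SCAN_INDEX), MAX(LAYER_SCAN_INDEX), COUNT(*) FROM DEFECT in Oracle: Sqoop divides the [MIN, MAX] range evenly among the mappers, so a heavily skewed column reproduces exactly the stuck-task behaviour you're seeing, with one mapper doing nearly all the work.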
When I execute the dump command below, it stops at 0% complete and does not continue to the next step. How can I track down the issue?
grunt> a = load '/user/hduser1/file1' using PigStorage(',') as (usernames:chararray, password:chararray, price:int);
2015-12-25 10:34:26,471 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> dump a;
2015-12-25 10:34:33,862 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-12-25 10:34:33,904 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:33,910 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-12-25 10:34:33,951 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2015-12-25 10:34:34,069 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-12-25 10:34:34,088 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-12-25 10:34:34,088 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-12-25 10:34:34,158 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:34,197 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-12-25 10:34:34,425 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2015-12-25 10:34:34,432 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-12-25 10:34:34,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-12-25 10:34:34,432 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-12-25 10:34:34,435 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2015-12-25 10:34:35,464 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp-1328911758/tmp304593952/pig-0.14.0-core-h2.jar
2015-12-25 10:34:35,519 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1328911758/tmp1133010181/automaton-1.11-8.jar
2015-12-25 10:34:35,575 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1328911758/tmp-1601875197/antlr-runtime-3.4.jar
2015-12-25 10:34:35,641 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp-1328911758/tmp1137978977/guava-11.0.2.jar
2015-12-25 10:34:36,218 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp-1328911758/tmp325144073/joda-time-2.1.jar
2015-12-25 10:34:36,259 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-12-25 10:34:36,304 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-12-25 10:34:36,305 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-12-25 10:34:36,311 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-12-25 10:34:36,344 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:36,706 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2015-12-25 10:34:36,782 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-12-25 10:34:36,782 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-12-25 10:34:36,820 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2015-12-25 10:34:36,972 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2015-12-25 10:34:37,483 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1451017152672_0001
2015-12-25 10:34:37,664 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2015-12-25 10:34:38,078 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1451017152672_0001
2015-12-25 10:34:38,221 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://vijee-Lenovo-IdeaPad-S510p:8088/proxy/application_1451017152672_0001/
2015-12-25 10:34:38,221 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1451017152672_0001
2015-12-25 10:34:38,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a
2015-12-25 10:34:38,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[1,4],a[-1,-1] C: R:
2015-12-25 10:34:38,243 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-12-25 10:34:38,243 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1451017152672_0001]
(Screenshot of the running application and its logs omitted.)
I'm trying to do a bulk load in HBase, but the exception below comes up while loading the data...
Application application_1439213972129_0080 initialization failed (exitCode=255) with output: Requested user root is not whitelisted and has id 0,which is below the minimum allowed 500
Failing this attempt. Failing the application.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,personal:Name,Profession:Position_Title,Profession:Department,personal:Employee_Annual_Salary -Dimporttsv.separator=',' /tables/emp_salary_new1 /mapr/MapRDev/apps/Datasets/Employee_Details.csv
2015-08-13 18:24:33,076 INFO [main] mapreduce.TableMapReduceUtil: Setting speculative execution off for bulkload operation
2015-08-13 18:24:33,123 INFO [main] mapreduce.TableMapReduceUtil: Configured 'hbase.mapreduce.mapr.tablepath' to /tables/emp_salary_new1
2015-08-13 18:24:33,220 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-08-13 18:24:33,372 INFO [main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2015-08-13 18:24:33,735 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-08-13 18:24:33,770 INFO [main] mapreduce.TableOutputFormat: Created table instance for /tables/emp_salary_new1
2015-08-13 18:24:34,252 INFO [main] input.FileInputFormat: Total input paths to process : 1
2015-08-13 18:24:34,294 INFO [main] mapreduce.JobSubmitter: number of splits:1
2015-08-13 18:24:34,535 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1439213972129_0055
2015-08-13 18:24:34,792 INFO [main] security.ExternalTokenManagerFactory: Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
2015-08-13 18:24:35,031 INFO [main] impl.YarnClientImpl: Submitted application application_1439213972129_0055
2015-08-13 18:24:35,114 INFO [main] mapreduce.Job: The url to track the job: http://hadoop-c02n02.ss.sw.ericsson.se:8088/proxy/application_1439213972129_0055/
2015-08-13 18:24:35,115 INFO [main] mapreduce.Job: Running job: job_1439213972129_0055
2015-08-13 18:24:53,253 INFO [main] mapreduce.Job: Job job_1439213972129_0055 running in uber mode : false
2015-08-13 18:24:53,256 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-08-13 18:24:53,281 INFO [main] mapreduce.Job: Job job_1439213972129_0055 failed with state FAILED due to: Application application_1439213972129_0055 failed 2 times due to AM Container for appattempt_1439213972129_0055_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://hadoop-c02n02.ss.sw.ericsson.se:8088/cluster/app/application_1439213972129_0055Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e02_1439213972129_0055_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : user is mapradm
main : requested yarn user is mapradm
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
2015-08-13 18:24:53,320 INFO [main] mapreduce.Job: Counters: 0
Looks like you are loading data into MapR DB, not into HBase. But that's fine, HBase commands are compatible with MapR DB. I just made a small change to the quoting of the separator argument in your command; see if that works for you.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,personal:Name,Profession:Position_Title,Profession:Department,personal:Employee_Annual_Salary '-Dimporttsv.separator=,' /tables/emp_salary_new1 /mapr/MapRDev/apps/Datasets/Employee_Details.csv