Trying to run statements on Pig, getting error - hadoop

When I start to read a file on HDFS using Pig in mapreduce mode and run dump b, it starts the MapReduce job, but after the job completes it just keeps repeating the connection attempts shown below. Please tell me what the problem is. (I have set the file permissions to 777 and the /tmp permissions in HDFS to 777.)
[root@master conf]# pig -x mapreduce
17/04/19 23:05:59 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/04/19 23:05:59 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/04/19 23:05:59 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-04-19 23:05:59,615 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0 (r1746530) compiled Jun 01 2016, 23:10:49
2017-04-19 23:05:59,615 [main] INFO org.apache.pig.Main - Logging error messages to: /opt/hadoop/pig/conf/pig_1492623359614.log
2017-04-19 23:05:59,652 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2017-04-19 23:06:01,031 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost/
2017-04-19 23:06:02,136 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2017-04-19 23:06:02,205 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-3df7c96f-9eac-4874-aab9-9ca7726fe860
2017-04-19 23:06:02,205 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
grunt> a= load '/temp' AS (name:chararray, age:int, salary:int);
grunt> b= foreach a generate (name, salary);
grunt> dump b;
2017-04-19 23:06:22,093 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2017-04-19 23:06:22,190 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-04-19 23:06:22,267 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-04-19 23:06:22,309 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for a: $1
2017-04-19 23:06:22,456 [main] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (Tenured Gen) of size 699072512 to monitor. collectionUsageThreshold = 489350752, usageThreshold = 489350752
2017-04-19 23:06:22,564 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-04-19 23:06:22,589 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-04-19 23:06:22,589 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-04-19 23:06:22,724 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2017-04-19 23:06:23,128 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-04-19 23:06:23,152 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-04-19 23:06:23,154 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2017-04-19 23:06:23,820 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/opt/hadoop/pig/pig-0.16.0-core-h2.jar to DistributedCache through /tmp/temp2091099620/tmp-1166978625/pig-0.16.0-core-h2.jar
2017-04-19 23:06:23,951 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/opt/hadoop/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp2091099620/tmp-1829507825/automaton-1.11-8.jar
2017-04-19 23:06:24,026 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/opt/hadoop/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp2091099620/tmp-1436552250/antlr-runtime-3.4.jar
2017-04-19 23:06:24,119 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/opt/hadoop/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp2091099620/tmp-1393102603/joda-time-2.9.3.jar
2017-04-19 23:06:24,132 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-04-19 23:06:24,148 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2017-04-19 23:06:24,148 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2017-04-19 23:06:24,148 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-04-19 23:06:24,279 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-04-19 23:06:24,302 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2017-04-19 23:06:24,920 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-04-19 23:06:24,952 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2017-04-19 23:06:24,995 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-04-19 23:06:24,995 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-04-19 23:06:25,056 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2017-04-19 23:06:25,375 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2017-04-19 23:06:25,889 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1492621692528_0002
2017-04-19 23:06:26,195 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2017-04-19 23:06:26,411 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1492621692528_0002
2017-04-19 23:06:26,537 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master:8088/proxy/application_1492621692528_0002/
2017-04-19 23:06:26,537 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1492621692528_0002
2017-04-19 23:06:26,537 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a,b
2017-04-19 23:06:26,537 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[1,3],b[-1,-1] C: R:
2017-04-19 23:06:26,595 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-04-19 23:06:26,595 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1492621692528_0002]
2017-04-19 23:06:48,598 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2017-04-19 23:06:48,598 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1492621692528_0002]
2017-04-19 23:06:51,639 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2017-04-19 23:06:51,705 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-04-19 23:06:52,983 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-04-19 23:06:53,985 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-04-19 23:06:54,989 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-04-19 23:06:55,993 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-04-19 23:06:56,994 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-04-19 23:06:57,995 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-04-19 23:06:58,999 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-04-19 23:07:00,001 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-04-19 23:07:01,005 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
[2]+ Stopped pig -x mapreduce

Start the JobHistoryServer
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
Pig, when run in mapreduce mode, expects the JobHistoryServer to be available.
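If the retries to 0.0.0.0:10020 keep appearing after that, check that the daemon is really running and that mapreduce.jobhistory.address in mapred-site.xml points clients at the host it runs on (10020 is that property's default port). A quick sketch of the check, assuming the single-node setup from the logs where the host is master:
$ jps | grep JobHistoryServer
$ curl -s http://master:19888/ > /dev/null && echo "history server web UI is reachable"
(19888 is the default JobHistoryServer web UI port.)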

Related

Having trouble Loading a multilevel JSON file into Apache Pig and saving it as csv

I'm super new to Apache Pig and was trying to load a multilevel JSON file into Pig and save it as CSV.
The JSON file I have is /home/vikaspattathe/dataset/sample.json:
{"_id":{"$oid":"5a1321d5741a2384e802c552"},"reviewerID":"A3HVRXV0LVJN7","asin":"0110400550","reviewerName":"BiancaNicole","helpful":[4,4],"reviewText":"Best phone case ever . Everywhere I go I get a ton of compliments on it. It was in perfect condition as well.","overall":5.0,"summary":"A++++","unixReviewTime":1358035200,"reviewTime":"01 13, 2013","category":"Cell_Phones_and_Accessories","class":1.0}
{"_id":{"$oid":"5a1321d5741a2384e802c557"},"reviewerID":"A1BJGDS0L1IO6I","asin":"0110400550","reviewerName":"cf \"t\"","helpful":[0,3],"reviewText":"ITEM NOT SENT from Blue Top Company in Hong Kong and it's been over two months! I will report this. DO NOT use this company. Not happy at all!","overall":1.0,"summary":"ITEM NOT SENT!!","unixReviewTime":1359504000,"reviewTime":"01 30, 2013","category":"Cell_Phones_and_Accessories","class":0.0}
I opened Pig from the directory /home/vikaspattathe/dataset/ (in case that is of any concern).
I tried the below commands to load the data and am getting the following errors.
grunt> sample_table = LOAD '/home/vikaspattathe/dataset/sample.json' USING JsonLoader('id:chararray, reviewerId:chararray, asin:chararray, reviewerName:chararray, reviewText:chararray, overall:int, summary:chararray, unixReviewTime:chararray, reviewTime:chararray, category:chararray, class:int');
2022-10-29 11:35:55,333 [main] INFO org.apache.pig.impl.util.SpillableMemoryMan ager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2022-10-29 11:35:55,556 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2022-10-29 11:35:55,572 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2022-10-29 11:35:55,632 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
I tried dumping sample_table, as I believe the load wasn't successful.
grunt> dump sample_table;
2022-10-29 11:42:24,535 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2022-10-29 11:42:24,546 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2022-10-29 11:42:24,558 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2022-10-29 11:42:24,558 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2022-10-29 11:42:24,558 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, FilterConstantCalculator, ForEachConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitConstantCalculator, SplitFilter, StreamTypeCastInserter]}
2022-10-29 11:42:24,559 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2022-10-29 11:42:24,560 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2022-10-29 11:42:24,561 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2022-10-29 11:42:24,569 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2022-10-29 11:42:24,571 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at spam-ham-m/10.154.0.5:8032
2022-10-29 11:42:24,572 [main] INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at spam-ham-m/10.154.0.5:10200
2022-10-29 11:42:24,575 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2022-10-29 11:42:24,575 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2022-10-29 11:42:24,576 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2022-10-29 11:42:24,621 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/pig-0.18.0-SNAPSHOT-core-h3.jar to DistributedCache through /tmp/temp500810153/tmp-403584597/pig-0.18.0-SNAPSHOT-core-h3.jar
2022-10-29 11:42:24,642 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/hadoop/lib/jackson-core-asl-1.9.13.jar to DistributedCache through /tmp/temp500810153/tmp-332868819/jackson-core-asl-1.9.13.jar
2022-10-29 11:42:24,663 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp500810153/tmp759836349/automaton-1.11-8.jar
2022-10-29 11:42:24,683 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp500810153/tmp-305484567/antlr-runtime-3.4.jar
2022-10-29 11:42:24,815 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/hive/lib/hive-exec-3.1.2.jar to DistributedCache through /tmp/temp500810153/tmp-1240067490/hive-exec-3.1.2.jar
2022-10-29 11:42:24,837 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/lib/RoaringBitmap-shaded-0.7.45.jar to DistributedCache through /tmp/temp500810153/tmp-1911465994/RoaringBitmap-shaded-0.7.45.jar
2022-10-29 11:42:24,855 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2022-10-29 11:42:24,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2022-10-29 11:42:24,885 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at spam-ham-m/10.154.0.5:8032
2022-10-29 11:42:24,886 [JobControl] INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at spam-ham-m/10.154.0.5:10200
2022-10-29 11:42:24,906 [JobControl] INFO org.apache.hadoop.mapreduce.JobResourceUploader - Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/vikaspattathe/.staging/job_1667038087507_0005
2022-10-29 11:42:24,908 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2022-10-29 11:42:24,960 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /tmp/hadoop-yarn/staging/vikaspattathe/.staging/job_1667038087507_0005
2022-10-29 11:42:24,963 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://spam-ham-m/home/vikaspattathe/dataset/sample.json
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:298)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1565)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1562)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1562)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205)
at java.lang.Thread.run(Thread.java:750)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:298)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://spam-ham-m/home/vikaspattathe/dataset/sample.json
at org.apache.hadoop.mapred.LocatedFileStatusFetcher.getFileStatuses(LocatedFileStatusFetcher.java:153)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:280)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:396)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:283)
... 18 more
2022-10-29 11:42:25,375 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1667038087507_0005
2022-10-29 11:42:25,375 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases sample_table
2022-10-29 11:42:25,375 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: sample_table[1,15] C: R:
2022-10-29 11:42:25,384 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2022-10-29 11:42:30,393 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2022-10-29 11:42:30,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1667038087507_0005 has failed! Stop running all dependent jobs
2022-10-29 11:42:30,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2022-10-29 11:42:30,395 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at spam-ham-m/10.154.0.5:8032
2022-10-29 11:42:30,396 [main] INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at spam-ham-m/10.154.0.5:10200
2022-10-29 11:42:30,404 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Could not get Job info from RM for job job_1667038087507_0005. Redirecting to job history server.
2022-10-29 11:42:30,420 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
2022-10-29 11:42:30,421 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2022-10-29 11:42:30,421 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.3 0.18.0-SNAPSHOT vikaspattathe 2022-10-29 11:42:24 2022-10-29 11:42:30 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1667038087507_0005 sample_table MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://spam-ham-m/home/vikaspattathe/dataset/sample.json
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:298)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1565)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1562)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1562)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205)
at java.lang.Thread.run(Thread.java:750)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:298)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://spam-ham-m/home/vikaspattathe/dataset/sample.json
at org.apache.hadoop.mapred.LocatedFileStatusFetcher.getFileStatuses(LocatedFileStatusFetcher.java:153)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:280)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:396)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:283)
... 18 more
hdfs://spam-ham-m/tmp/temp500810153/tmp929245825,
Input(s):
Failed to read data from "/home/vikaspattathe/dataset/sample.json"
Output(s):
Failed to produce result in "hdfs://spam-ham-m/tmp/temp500810153/tmp929245825"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1667038087507_0005
2022-10-29 11:42:30,421 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2022-10-29 11:42:30,427 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sample_table. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
Details at logfile: /home/vikaspattathe/dataset/pig_1667043341193.log
The load wasn't successful.
Also, is the below command correct for storing the file as samplecsv.csv once it's successfully loaded?
grunt> STORE sample_table INTO '/home/vikaspattathe/dataset/samplecsv' USING PigStorage(',');
HDFS doesn't have /home file paths; user data is stored under /user paths. As the error says, the input path you gave doesn't exist in HDFS.
If you want to load from the local filesystem, prefix the path with file://
You also don't need HDFS to use Pig if you start it with pig -x local
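For example, a minimal sketch of copying the file into HDFS first (assuming the HDFS user is vikaspattathe, so the HDFS home directory is /user/vikaspattathe):
$ hdfs dfs -mkdir -p /user/vikaspattathe/dataset
$ hdfs dfs -put /home/vikaspattathe/dataset/sample.json /user/vikaspattathe/dataset/
After that, LOAD '/user/vikaspattathe/dataset/sample.json' (or just 'dataset/sample.json', since relative paths resolve against the HDFS home directory) with the same JsonLoader schema.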
As for the output, start with
X = LOAD '/tmp/sample.json' USING JsonLoader('id:(oid:chararray), reviewerID:chararray, asin:chararray, reviewerName:chararray, helpful:tuple(int), reviewText:chararray, overall:int, summary:chararray, unixReviewTime:long, reviewTime:chararray, category:chararray, class:int');
But I don't know how the helpful field should be declared for arrays... I tried helpful:[int], and that produces the same result.
DUMP X;
((5a1321d5741a2384e802c552),A3HVRXV0LVJN7,0110400550,BiancaNicole,,4,,,,,,)
((5a1321d5741a2384e802c557),A1BJGDS0L1IO6I,0110400550,cf "t",,3,,,,,,)
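As for storing to CSV: STORE ... USING PigStorage(',') does write comma-separated output, but it produces part-* files under the output directory rather than a single samplecsv.csv, and the path should again be an HDFS /user path. If you need one local file, you could merge the parts afterwards, e.g. (assuming the /user/vikaspattathe output directory):
$ hdfs dfs -getmerge /user/vikaspattathe/dataset/samplecsv samplecsv.csv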

New to Pig : Error 1066, Pig version 0.17.0 Hadoop Version 3.1.0

I have been trying to execute the following command in grunt
summerolympic= load '/olympics/input/summer.csv' using PigStorage('\t');
grunt> dump summerolympic;
which resulted in Error 1066.
2018-05-19 00:01:34,684 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-05-19 00:01:34,702 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-05-19 00:01:34,703 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2018-05-19 00:01:34,703 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-05-19 00:01:34,704 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2018-05-19 00:01:34,705 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2018-05-19 00:01:34,705 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2018-05-19 00:01:34,715 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-05-19 00:01:34,717 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-19 00:01:34,722 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2018-05-19 00:01:34,722 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2018-05-19 00:01:34,726 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2018-05-19 00:01:35,221 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/Users/vaisakh/pig-0.17.0/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp604495205/tmp-331461663/pig-0.17.0-core-h2.jar
2018-05-19 00:01:35,248 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/Users/vaisakh/pig-0.17.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp604495205/tmp-567932277/automaton-1.11-8.jar
2018-05-19 00:01:35,273 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/Users/vaisakh/pig-0.17.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp604495205/tmp1847314396/antlr-runtime-3.4.jar
2018-05-19 00:01:35,297 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/Users/vaisakh/pig-0.17.0/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp604495205/tmp-1150759063/joda-time-2.9.3.jar
2018-05-19 00:01:35,301 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2018-05-19 00:01:35,314 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2018-05-19 00:01:35,321 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-19 00:01:35,524 [JobControl] INFO org.apache.hadoop.mapreduce.JobResourceUploader - Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/vaisakh/.staging/job_1526443313382_0005
2018-05-19 00:01:35,527 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2018-05-19 00:01:35,540 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2018-05-19 00:01:35,542 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
2018-05-19 00:01:35,542 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2018-05-19 00:01:35,545 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-05-19 00:01:35,994 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2018-05-19 00:01:36,025 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1526443313382_0005
2018-05-19 00:01:36,025 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Executing with tokens: []
2018-05-19 00:01:36,029 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2018-05-19 00:01:36,270 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1526443313382_0005
2018-05-19 00:01:36,278 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://SSs-MacBook-Air.local:8088/proxy/application_1526443313382_0005/
2018-05-19 00:01:36,278 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1526443313382_0005
2018-05-19 00:01:36,278 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases summerolympic
2018-05-19 00:01:36,278 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: summerolympic[5,15] C: R:
2018-05-19 00:01:36,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2018-05-19 00:01:36,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1526443313382_0005]
2018-05-19 00:01:46,346 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2018-05-19 00:01:46,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1526443313382_0005 has failed! Stop running all dependent jobs
2018-05-19 00:01:46,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2018-05-19 00:01:46,348 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-19 00:01:46,367 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-19 00:01:46,384 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2018-05-19 00:01:46,385 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.1.0 0.17.0 vaisakh 2018-05-19 00:01:34 2018-05-19 00:01:46 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1526443313382_0005 summerolympic MAP_ONLY Message: Job failed! hdfs://localhost:9000/tmp/temp604495205/tmp-558801235,
Input(s):
Failed to read data from "/olympics/input/summer.csv"
Output(s):
Failed to produce result in "hdfs://localhost:9000/tmp/temp604495205/tmp-558801235"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1526443313382_0005
2018-05-19 00:01:46,385 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2018-05-19 00:01:46,389 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias summerolympic
Any insights would be very helpful.
Solved it by running Pig in local mode:
$ pig -x local
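If you do want to stay in mapreduce mode, a first debugging step (just a sketch) is to confirm the input really exists in HDFS and then pull the container logs for the failed application, using the application id from the output above:
$ hdfs dfs -ls /olympics/input/summer.csv
$ yarn logs -applicationId application_1526443313382_0005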

Failed to read data from "hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576"

test = LOAD 'hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576' USING TextLoader AS (line:chararray);
log = FOREACH test GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)"')) AS (address_ip: chararray, logname: chararray, user: chararray, timestamp: chararray, req_line: chararray, status: int, bytes: int, referer: chararray, userAgent: chararray);
STORE log INTO 'hbase://Access_Logs' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:address_ip, cf:logname, cf:user, cf:timestamp, cf:req_line, cf:status, cf:bytes, cf:referer, cf:userAgent');
2018-03-10 10:52:03,636 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,840 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,840 [main] INFO org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for Access_Logs
2018-03-10 10:52:03,843 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-03-10 10:52:03,859 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,860 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2018-03-10 10:52:03,860 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-03-10 10:52:03,890 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2018-03-10 10:52:03,890 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2018-03-10 10:52:03,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2018-03-10 10:52:03,897 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,898 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:52:03,899 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2018-03-10 10:52:03,899 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2018-03-10 10:52:03,900 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2018-03-10 10:52:03,981 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.15.0-core-h2.jar to DistributedCache through /tmp/temp1710369540/tmp1282565307/pig-0.15.0-core-h2.jar
2018-03-10 10:52:04,013 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/htrace-core-2.04.jar to DistributedCache through /tmp/temp1710369540/tmp520067094/htrace-core-2.04.jar
2018-03-10 10:52:04,067 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp1710369540/tmp946538428/guava-11.0.2.jar
2018-03-10 10:52:04,123 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-common-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp468949353/hbase-common-0.98.8-hadoop2.jar
2018-03-10 10:52:04,144 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-hadoop-compat-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp113887319/hbase-hadoop-compat-0.98.8-hadoop2.jar
2018-03-10 10:52:04,200 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-server-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp682998180/hbase-server-0.98.8-hadoop2.jar
2018-03-10 10:52:04,256 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-client-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp-1958170360/hbase-client-0.98.8-hadoop2.jar
2018-03-10 10:52:04,317 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-protocol-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp-892814021/hbase-protocol-0.98.8-hadoop2.jar
2018-03-10 10:52:04,363 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.4.5.jar to DistributedCache through /tmp/temp1710369540/tmp-830858682/zookeeper-3.4.5.jar
2018-03-10 10:52:04,396 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar to DistributedCache through /tmp/temp1710369540/tmp-420530468/protobuf-java-2.5.0.jar
2018-03-10 10:52:04,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/high-scale-lib-1.1.1.jar to DistributedCache through /tmp/temp1710369540/tmp-1046507224/high-scale-lib-1.1.1.jar
2018-03-10 10:52:04,474 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar to DistributedCache through /tmp/temp1710369540/tmp309001480/netty-3.6.2.Final.jar
2018-03-10 10:52:04,489 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1710369540/tmp964502237/automaton-1.11-8.jar
2018-03-10 10:52:04,507 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1710369540/tmp-1680308848/antlr-runtime-3.4.jar
2018-03-10 10:52:04,528 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.5.jar to DistributedCache through /tmp/temp1710369540/tmp813805284/joda-time-2.5.jar
2018-03-10 10:52:04,539 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2018-03-10 10:52:04,544 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2018-03-10 10:52:04,544 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2018-03-10 10:52:04,544 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2018-03-10 10:52:04,555 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2018-03-10 10:52:04,557 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:52:04,596 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:04,597 [JobControl] INFO org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for Access_Logs
2018-03-10 10:52:04,625 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2018-03-10 10:52:04,742 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2018-03-10 10:52:04,742 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2018-03-10 10:52:04,748 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-03-10 10:52:04,786 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2018-03-10 10:52:04,828 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1519576410629_0017
2018-03-10 10:52:04,832 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2018-03-10 10:52:04,908 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1519576410629_0017
2018-03-10 10:52:04,910 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://server2.linux.com:8088/proxy/application_1519576410629_0017/
2018-03-10 10:52:05,056 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1519576410629_0017
2018-03-10 10:52:05,056 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases log,test
2018-03-10 10:52:05,056 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: test[16,7],log[-1,-1] C: R:
2018-03-10 10:52:05,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2018-03-10 10:52:05,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1519576410629_0017]
2018-03-10 10:53:02,248 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2018-03-10 10:53:02,249 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1519576410629_0017]
2018-03-10 10:53:05,259 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2018-03-10 10:53:05,259 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1519576410629_0017 has failed! Stop running all dependent jobs
2018-03-10 10:53:05,259 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2018-03-10 10:53:05,260 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:53:05,274 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2018-03-10 10:53:05,789 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
2018-03-10 10:53:05,789 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2018-03-10 10:53:05,789 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.4.1 0.15.0 hadoop 2018-03-10 10:52:03 2018-03-10 10:53:05 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1519576410629_0017 log,test MAP_ONLY Message: Job failed! hbase://Access_Logs,
Input(s):
Failed to read data from "hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576"
Output(s):
Failed to produce result in "hbase://Access_Logs"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1519576410629_0017
2018-03-10 10:53:05,789 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
But it runs fine when I just do
dump log
The Pig script is probably unable to load the data because of the status: int and bytes: int columns.
The error says:
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
It means the regex extraction is producing String data where Pig expects an Integer.
To debug, try changing the datatypes in the Pig script and just printing the output. Once that works, you can try saving into HBase.
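A minimal sketch of that debugging step, assuming the same regex: REGEX_EXTRACT_ALL returns chararrays, so declare everything as chararray first and cast explicitly afterwards.
test = LOAD 'hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576' USING TextLoader AS (line:chararray);
-- extract every field as chararray, matching what REGEX_EXTRACT_ALL actually produces
raw_log = FOREACH test GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)"')) AS (address_ip:chararray, logname:chararray, user:chararray, timestamp:chararray, req_line:chararray, status:chararray, bytes:chararray, referer:chararray, userAgent:chararray);
-- cast only the numeric columns, then inspect the result before storing to HBase
log = FOREACH raw_log GENERATE address_ip, logname, user, timestamp, req_line, (int)status, (int)bytes, referer, userAgent;
DUMP log;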

org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader does not load a file in combined file format

I have an Apache combined log file that is stored in HDFS. Here is an example of the first few lines:
123.125.67.216 - - [02/Jan/2012:00:48:27 -0800] "GET /wiki/Dekart HTTP/1.1" 200 4512 "-" "Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"
209.85.238.130 - - [02/Jan/2012:00:48:27 -0800] "GET /w/index.php?title=Special:RecentChanges&feed=atom HTTP/1.1" 304 260 "-" "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 4 subscribers; feed-id=11568779694056348047)"
123.125.67.213 - - [02/Jan/2012:00:48:33 -0800] "GET / HTTP/1.1" 301 433 "-" "Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"
123.125.67.214 - - [02/Jan/2012:00:48:33 -0800] "GET /wiki/Main_Page HTTP/1.1" 200 8647 "-" "Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"
I am trying to load this file with Apache Pig using the CombinedLogLoader in the piggybank. This should work. Here is my example code:
grunt> raw = LOAD 'log' USING org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader AS (remoteAddr, remoteLogname, user, time, method, uri, proto, status, bytes, referer, userAgent);
16/02/15 21:39:38 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> dump raw;
I get 0 records, even though the file has thousands.
Below is my full output. What am I doing wrong?
162493 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
16/02/15 21:39:40 INFO pigstats.ScriptState: Pig features used in the script: UNKNOWN
16/02/15 21:39:40 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
162551 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
16/02/15 21:39:40 WARN data.SchemaTupleBackend: SchemaTupleBackend has already been initialized
162551 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
16/02/15 21:39:40 INFO optimizer.LogicalPlanOptimizer: {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
162559 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
16/02/15 21:39:40 INFO mapReduceLayer.MRCompiler: File concatenation threshold: 100 optimistic? false
162562 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
16/02/15 21:39:40 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
162562 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
16/02/15 21:39:40 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
16/02/15 21:39:40 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
16/02/15 21:39:40 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-42-90.ec2.internal/172.31.42.90:8032
162586 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
16/02/15 21:39:40 INFO mapreduce.MRScriptState: Pig script settings are added to the job
162586 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
16/02/15 21:39:40 INFO mapReduceLayer.JobControlCompiler: mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
162587 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
16/02/15 21:39:40 INFO mapReduceLayer.JobControlCompiler: This job cannot be converted run in-process
162611 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/lib/piggybank.jar to DistributedCache through /tmp/temp2003065886/tmp2039083441/piggybank.jar
16/02/15 21:39:40 INFO mapReduceLayer.JobControlCompiler: Added jar file:/usr/lib/pig/lib/piggybank.jar to DistributedCache through /tmp/temp2003065886/tmp2039083441/piggybank.jar
162651 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/pig-0.14.0-amzn-0-core-h2.jar to DistributedCache through /tmp/temp2003065886/tmp551968774/pig-0.14.0-amzn-0-core-h2.jar
16/02/15 21:39:40 INFO mapReduceLayer.JobControlCompiler: Added jar file:/usr/lib/pig/pig-0.14.0-amzn-0-core-h2.jar to DistributedCache through /tmp/temp2003065886/tmp551968774/pig-0.14.0-amzn-0-core-h2.jar
162670 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp2003065886/tmp710362688/automaton-1.11-8.jar
16/02/15 21:39:40 INFO mapReduceLayer.JobControlCompiler: Added jar file:/usr/lib/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp2003065886/tmp710362688/automaton-1.11-8.jar
162689 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp2003065886/tmp-1076004022/antlr-runtime-3.4.jar
16/02/15 21:39:40 INFO mapReduceLayer.JobControlCompiler: Added jar file:/usr/lib/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp2003065886/tmp-1076004022/antlr-runtime-3.4.jar
162714 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/hadoop/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp2003065886/tmp1810740836/guava-11.0.2.jar
16/02/15 21:39:40 INFO mapReduceLayer.JobControlCompiler: Added jar file:/usr/lib/hadoop/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp2003065886/tmp1810740836/guava-11.0.2.jar
162737 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/lib/hadoop-mapreduce/joda-time-2.8.1.jar to DistributedCache through /tmp/temp2003065886/tmp-1238145114/joda-time-2.8.1.jar
16/02/15 21:39:40 INFO mapReduceLayer.JobControlCompiler: Added jar file:/usr/lib/hadoop-mapreduce/joda-time-2.8.1.jar to DistributedCache through /tmp/temp2003065886/tmp-1238145114/joda-time-2.8.1.jar
162752 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
16/02/15 21:39:40 INFO mapReduceLayer.JobControlCompiler: Setting up single store job
162753 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
16/02/15 21:39:40 INFO data.SchemaTupleFrontend: Key [pig.schematuple] is false, will not generate code.
162753 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
16/02/15 21:39:40 INFO data.SchemaTupleFrontend: Starting process to move generated code to distributed cacche
162753 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
16/02/15 21:39:40 INFO data.SchemaTupleFrontend: Setting key [pig.schematuple.classes] with classes to deserialize []
162776 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
16/02/15 21:39:40 INFO mapReduceLayer.MapReduceLauncher: 1 map-reduce job(s) waiting for submission.
16/02/15 21:39:40 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-42-90.ec2.internal/172.31.42.90:8032
16/02/15 21:39:40 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
16/02/15 21:39:40 INFO input.FileInputFormat: Total input paths to process : 1
162866 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
16/02/15 21:39:40 INFO util.MapRedUtil: Total input paths (combined) to process : 1
16/02/15 21:39:40 INFO mapreduce.JobSubmitter: number of splits:1
16/02/15 21:39:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1455560055771_0007
16/02/15 21:39:40 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16/02/15 21:39:40 INFO impl.YarnClientImpl: Submitted application application_1455560055771_0007
16/02/15 21:39:40 INFO mapreduce.Job: The url to track the job: http://ip-172-31-42-90.ec2.internal:20888/proxy/application_1455560055771_0007/
163278 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1455560055771_0007
16/02/15 21:39:40 INFO mapReduceLayer.MapReduceLauncher: HadoopJobId: job_1455560055771_0007
163278 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases raw
16/02/15 21:39:40 INFO mapReduceLayer.MapReduceLauncher: Processing aliases raw
163278 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: raw[2,6],null[-1,-1] C: R:
16/02/15 21:39:40 INFO mapReduceLayer.MapReduceLauncher: detailed locations: M: raw[2,6],null[-1,-1] C: R:
163283 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
16/02/15 21:39:40 INFO mapReduceLayer.MapReduceLauncher: 0% complete
163283 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1455560055771_0007]
16/02/15 21:39:40 INFO mapReduceLayer.MapReduceLauncher: Running jobs are [job_1455560055771_0007]
177841 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
16/02/15 21:39:55 INFO mapReduceLayer.MapReduceLauncher: 50% complete
177841 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1455560055771_0007]
16/02/15 21:39:55 INFO mapReduceLayer.MapReduceLauncher: Running jobs are [job_1455560055771_0007]
16/02/15 21:39:56 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-42-90.ec2.internal/172.31.42.90:8032
16/02/15 21:39:56 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
16/02/15 21:39:56 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-42-90.ec2.internal/172.31.42.90:8032
16/02/15 21:39:56 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
16/02/15 21:39:56 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-42-90.ec2.internal/172.31.42.90:8032
16/02/15 21:39:56 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
178506 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
16/02/15 21:39:56 INFO mapReduceLayer.MapReduceLauncher: 100% complete
178506 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.7.1-amzn-0 0.14.0-amzn-0 hadoop 2016-02-15 21:39:40 2016-02-15 21:39:56 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1455560055771_0007 1 0 5 5 5 5 0 0 0 0 raw MAP_ONLY hdfs://ip-172-31-42-90.ec2.internal:8020/tmp/temp2003065886/tmp1853785276,
Input(s):
Successfully read 0 records (10040153 bytes) from: "hdfs://ip-172-31-42-90.ec2.internal:8020/user/hadoop/log"
Output(s):
Successfully stored 0 records in: "hdfs://ip-172-31-42-90.ec2.internal:8020/tmp/temp2003065886/tmp1853785276"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1455560055771_0007
16/02/15 21:39:56 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-42-90.ec2.internal/172.31.42.90:8032
16/02/15 21:39:56 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
16/02/15 21:39:56 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-42-90.ec2.internal/172.31.42.90:8032
16/02/15 21:39:56 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
16/02/15 21:39:56 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-42-90.ec2.internal/172.31.42.90:8032
16/02/15 21:39:56 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
178606 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
16/02/15 21:39:56 INFO mapReduceLayer.MapReduceLauncher: Success!
16/02/15 21:39:56 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
178607 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
16/02/15 21:39:56 INFO data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
16/02/15 21:39:56 INFO input.FileInputFormat: Total input paths to process : 1
178616 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
16/02/15 21:39:56 INFO util.MapRedUtil: Total input paths to process : 1
grunt>
Make sure your log file is well-formed. I noticed that your log file contains whitespace at the end of the lines; remove that whitespace and run the same script again.
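If trailing whitespace does turn out to be the problem, one simple way to clean the file without leaving Pig is to re-store a trimmed copy using the built-in TextLoader and TRIM (just a sketch; the alias names and the '~/temp1_clean' output path are illustrative, not from the original post):
-- read each raw line as a single chararray
lines = LOAD '~/temp1.log' USING TextLoader() AS (line:chararray);
-- strip leading and trailing whitespace from every line
trimmed = FOREACH lines GENERATE TRIM(line) AS line;
-- write the cleaned copy; point CombinedLogLoader at this path afterwards
STORE trimmed INTO '~/temp1_clean' USING PigStorage();
After that, run the same CombinedLogLoader script against '~/temp1_clean' instead of the original file.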
I have executed your script and the result is shown below.
Script:
grunt> raw = LOAD '~/temp1.log' USING org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader AS (remoteAddr, remoteLogname, user, time, method, uri, proto, status, bytes, referer, userAgent);
grunt> dump raw;
Output:
Input(s):
Successfully read 4 records (1090 bytes) from: "~/temp1.log"
Output(s):
Successfully stored 4 records (753 bytes) in: "hdfs://sandbox.hortonworks.com:8020/tmp/temp-53432852/tmp1982058168"
Counters:
Total records written : 4
Total bytes written : 753
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1522732012453_0054
2018-04-03 10:44:33,436 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-04-03 10:44:33,436 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8050
2018-04-03 10:44:33,437 [main] INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
2018-04-03 10:44:33,467 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2018-04-03 10:44:33,620 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-04-03 10:44:33,620 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8050
2018-04-03 10:44:33,620 [main] INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
2018-04-03 10:44:33,635 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2018-04-03 10:44:33,797 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-04-03 10:44:33,797 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8050
2018-04-03 10:44:33,799 [main] INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
2018-04-03 10:44:33,813 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2018-04-03 10:44:33,929 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2018-04-03 10:44:33,939 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2018-04-03 10:44:33,955 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2018-04-03 10:44:33,955 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(123.125.67.216,-,-,02/Jan/2012:00:48:27 -0800,GET,/wiki/Dekart,HTTP/1.1,200,4512,-,Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2)
(209.85.238.130,-,-,02/Jan/2012:00:48:27 -0800,GET,/w/index.php?title=Special:RecentChanges&feed=atom,HTTP/1.1,304,260,-,Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 4 subscribers; feed-id=11568779694056348047))
(123.125.67.213,-,-,02/Jan/2012:00:48:33 -0800,GET,/,HTTP/1.1,301,433,-,Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2)
(123.125.67.214,-,-,02/Jan/2012:00:48:33 -0800,GET,/wiki/Main_Page,HTTP/1.1,200,8647,-,Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2)

Pig jobs not running on hadoop 2.6

When I execute the dump command below, it stops at 0% complete and does not continue to the next step. How can I track down the issue?
grunt> a = load '/user/hduser1/file1' using PigStorage(',') as (usernames:chararray, password:chararray, price:int);
2015-12-25 10:34:26,471 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> dump a;
2015-12-25 10:34:33,862 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-12-25 10:34:33,904 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:33,910 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-12-25 10:34:33,951 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2015-12-25 10:34:34,069 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-12-25 10:34:34,088 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-12-25 10:34:34,088 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-12-25 10:34:34,158 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:34,197 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-12-25 10:34:34,425 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2015-12-25 10:34:34,432 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-12-25 10:34:34,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-12-25 10:34:34,432 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-12-25 10:34:34,435 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2015-12-25 10:34:35,464 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp-1328911758/tmp304593952/pig-0.14.0-core-h2.jar
2015-12-25 10:34:35,519 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1328911758/tmp1133010181/automaton-1.11-8.jar
2015-12-25 10:34:35,575 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1328911758/tmp-1601875197/antlr-runtime-3.4.jar
2015-12-25 10:34:35,641 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp-1328911758/tmp1137978977/guava-11.0.2.jar
2015-12-25 10:34:36,218 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp-1328911758/tmp325144073/joda-time-2.1.jar
2015-12-25 10:34:36,259 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-12-25 10:34:36,304 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-12-25 10:34:36,305 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-12-25 10:34:36,311 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-12-25 10:34:36,344 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:36,706 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2015-12-25 10:34:36,782 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-12-25 10:34:36,782 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-12-25 10:34:36,820 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2015-12-25 10:34:36,972 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2015-12-25 10:34:37,483 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1451017152672_0001
2015-12-25 10:34:37,664 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2015-12-25 10:34:38,078 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1451017152672_0001
2015-12-25 10:34:38,221 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://vijee-Lenovo-IdeaPad-S510p:8088/proxy/application_1451017152672_0001/
2015-12-25 10:34:38,221 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1451017152672_0001
2015-12-25 10:34:38,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a
2015-12-25 10:34:38,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[1,4],a[-1,-1] C: R:
2015-12-25 10:34:38,243 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-12-25 10:34:38,243 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1451017152672_0001]
See the attached screenshot of the application logs.
