I have a running Hadoop (2.6.0) cluster with 6 nodes (master node included) and want to run a Pig (0.14.0) script in MapReduce mode. The script runs without errors, but unfortunately it seems to run only on the master node. During my research I tried some changes to the Hadoop config files, without success.
Can you help me figure out how to get pig working on the whole cluster?
Here is some information:
Configuration on every node:
General:
/etc/hosts
127.0.0.1 localhost
192.168.101.3 master
192.168.101.4 node1
192.168.101.5 node2
192.168.101.6 node3
192.168.101.7 node4
192.168.101.8 node5
Hadoop:
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
<description>...</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
<description>...</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
<description>...</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8050</value>
<description>...</description>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8041</value>
<description>...</description>
</property>
<property>
<name>yarn.nodemanager.aux_services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux_services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>master:19888/jobhistory/logs/</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000/</value>
<description>...</description>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>master:54311</value>
<description>...</description>
</property>
<property>
<name>mapred.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
<description>...</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
<description>...</description>
</property>
</configuration>
Pig output:
15/01/09 13:12:54 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
15/01/09 13:12:54 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
15/01/09 13:12:54 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2015-01-09 13:12:54,845 [main] INFO org.apache.pig.Main - Apache Pig version 0.14.0 (r1640057) compiled Nov 16 2014, 18:02:05
2015-01-09 13:12:54,845 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hduser/pig_1420805574843.log
2015-01-09 13:12:56,450 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hduser/.pigbootup not found
2015-01-09 13:12:56,876 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-01-09 13:12:56,886 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:12:56,886 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000/
2015-01-09 13:12:58,146 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master:54311
2015-01-09 13:12:59,195 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:12:59,418 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:12:59,598 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:13:00,496 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: FILTER,UNION
2015-01-09 13:13:00,618 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:13:00,634 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-01-09 13:13:00,713 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2015-01-09 13:13:00,987 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-01-09 13:13:01,037 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-01-09 13:13:01,038 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-01-09 13:13:01,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:13:01,103 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
2015-01-09 13:13:01,105 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2015-01-09 13:13:01,149 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2015-01-09 13:13:01,161 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-01-09 13:13:01,161 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-01-09 13:13:01,161 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-01-09 13:13:01,167 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2015-01-09 13:13:19,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/hduser/pig-0.14.0/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp-1277984423/tmp-918732110/pig-0.14.0-core-h2.jar
2015-01-09 13:13:20,063 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/hduser/pig-0.14.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1277984423/tmp883771618/automaton-1.11-8.jar
2015-01-09 13:13:20,621 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/hduser/pig-0.14.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1277984423/tmp-1372558595/antlr-runtime-3.4.jar
2015-01-09 13:13:26,600 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp-1277984423/tmp-1556176302/guava-11.0.2.jar
2015-01-09 13:13:29,300 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/hduser/pig-0.14.0/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp-1277984423/tmp145012374/joda-time-2.1.jar
2015-01-09 13:13:29,718 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-01-09 13:13:29,736 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-01-09 13:13:29,736 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2015-01-09 13:13:29,736 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-01-09 13:13:29,840 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-01-09 13:13:29,841 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-01-09 13:13:30,191 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2015-01-09 13:13:30,384 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:13:30,785 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2015-01-09 13:13:30,949 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:13:30,949 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:13:31,250 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 52
2015-01-09 13:13:31,309 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:13:31,309 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:13:31,355 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 24
2015-01-09 13:13:31,378 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:13:31,379 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:13:31,394 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 6
2015-01-09 13:13:31,587 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:82
2015-01-09 13:13:31,706 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:13:32,475 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local647507189_0001
2015-01-09 13:13:33,628 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805612754/pig-0.14.0-core-h2.jar <- /home/hduser/pig-0.14.0-core-h2.jar
2015-01-09 13:13:33,758 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp-1277984423/tmp-918732110/pig-0.14.0-core-h2.jar as file:/app/hadoop/tmp/mapred/local/1420805612754/pig-0.14.0-core-h2.jar
2015-01-09 13:13:33,759 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805612755/automaton-1.11-8.jar <- /home/hduser/automaton-1.11-8.jar
2015-01-09 13:13:33,770 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp-1277984423/tmp883771618/automaton-1.11-8.jar as file:/app/hadoop/tmp/mapred/local/1420805612755/automaton-1.11-8.jar
2015-01-09 13:13:33,772 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805612756/antlr-runtime-3.4.jar <- /home/hduser/antlr-runtime-3.4.jar
2015-01-09 13:13:33,781 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp-1277984423/tmp-1372558595/antlr-runtime-3.4.jar as file:/app/hadoop/tmp/mapred/local/1420805612756/antlr-runtime-3.4.jar
2015-01-09 13:15:54,534 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp206201348/tmp-1481268210/guava-11.0.2.jar
2015-01-09 13:15:56,233 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/hduser/pig-0.14.0/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp206201348/tmp-1921418840/joda-time-2.1.jar
2015-01-09 13:15:56,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-01-09 13:15:56,366 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-01-09 13:15:56,367 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2015-01-09 13:15:56,368 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-01-09 13:15:56,483 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-01-09 13:15:56,486 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-01-09 13:15:56,505 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2015-01-09 13:15:56,582 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:15:56,695 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2015-01-09 13:15:57,070 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:15:57,070 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:15:57,197 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 52
2015-01-09 13:15:57,227 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:15:57,228 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:15:57,263 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 24
2015-01-09 13:15:57,289 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:15:57,289 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:15:57,306 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 6
2015-01-09 13:15:57,393 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:82
2015-01-09 13:15:57,416 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:15:57,791 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local561414911_0001
2015-01-09 13:15:58,741 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805758017/pig-0.14.0-core-h2.jar <- /home/hduser/pig-0.14.0-core-h2.jar
2015-01-09 13:15:58,755 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp206201348/tmp1912320441/pig-0.14.0-core-h2.jar as file:/app/hadoop/tmp/mapred/local/1420805758017/pig-0.14.0-core-h2.jar
2015-01-09 13:15:58,757 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805758018/automaton-1.11-8.jar <- /home/hduser/automaton-1.11-8.jar
2015-01-09 13:15:58,766 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp206201348/tmp-886499198/automaton-1.11-8.jar as file:/app/hadoop/tmp/mapred/local/1420805758018/automaton-1.11-8.jar
2015-01-09 13:15:58,768 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805758019/antlr-runtime-3.4.jar <- /home/hduser/antlr-runtime-3.4.jar
2015-01-09 13:15:58,778 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp206201348/tmp1437387446/antlr-runtime-3.4.jar as file:/app/hadoop/tmp/mapred/local/1420805758019/antlr-runtime-3.4.jar
2015-01-09 13:15:58,779 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805758020/guava-11.0.2.jar <- /home/hduser/guava-11.0.2.jar
2015-01-09 13:15:58,786 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp206201348/tmp-1481268210/guava-11.0.2.jar as file:/app/hadoop/tmp/mapred/local/1420805758020/guava-11.0.2.jar
2015-01-09 13:15:58,787 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805758021/joda-time-2.1.jar <- /home/hduser/joda-time-2.1.jar
2015-01-09 13:15:58,795 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp206201348/tmp-1921418840/joda-time-2.1.jar as file:/app/hadoop/tmp/mapred/local/1420805758021/joda-time-2.1.jar
2015-01-09 13:15:58,953 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1420805758017/pig-0.14.0-core-h2.jar
2015-01-09 13:15:58,954 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1420805758018/automaton-1.11-8.jar
2015-01-09 13:15:58,955 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1420805758019/antlr-runtime-3.4.jar
2015-01-09 13:15:58,955 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1420805758020/guava-11.0.2.jar
2015-01-09 13:15:58,955 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1420805758021/joda-time-2.1.jar
2015-01-09 13:15:58,970 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
2015-01-09 13:15:58,973 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local561414911_0001
2015-01-09 13:15:58,973 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases records_infobox,records_mappingbased,records_person,records_union,result_filter
2015-01-09 13:15:58,973 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: records_person[10,17],records_person[-1,-1],null[-1,-1],records_union[13,16],records_infobox[6,18],records_infobox[-1,-1],result_filter[16,16],records_mappingbased[8,23],records_mappingbased[-1,-1],null[-1,-1] C: R:
2015-01-09 13:15:58,990 [Thread-19] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2015-01-09 13:15:58,991 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-01-09 13:15:58,994 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local561414911_0001]
2015-01-09 13:15:59,067 [Thread-19] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-01-09 13:15:59,069 [Thread-19] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:15:59,069 [Thread-19] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-01-09 13:15:59,094 [Thread-19] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2015-01-09 13:15:59,257 [Thread-19] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2015-01-09 13:15:59,258 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local561414911_0001_m_000000_0
2015-01-09 13:15:59,459 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2015-01-09 13:15:59,470 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 134217728
Input split[0]:
Length = 134217728
ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
Locations:
-----------------------
2015-01-09 13:15:59,522 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed hdfs://master:9000/wiki/infobox_properties_en.nt:0+134217728
2015-01-09 13:15:59,662 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-01-09 13:15:59,743 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records_person[10,17],records_person[-1,-1],null[-1,-1],records_union[13,16],records_infobox[6,18],records_infobox[-1,-1],result_filter[16,16],records_mappingbased[8,23],records_mappingbased[-1,-1],null[-1,-1] C: R:
2015-01-09 13:15:59,798 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject(ACCESSING_NON_EXISTENT_FIELD): Attempt to access field which was not found in the input
2015-01-09 13:15:59,815 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject(ACCESSING_NON_EXISTENT_FIELD): Attempt to access field which was not found in the input
2015-01-09 13:16:05,578 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
2015-01-09 13:16:08,582 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
2015-01-09 13:16:10,209 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
2015-01-09 13:16:10,699 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task:attempt_local561414911_0001_m_000000_0 is done. And is in the process of committing
2015-01-09 13:16:10,714 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
2015-01-09 13:16:10,714 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task attempt_local561414911_0001_m_000000_0 is allowed to commit now
2015-01-09 13:16:10,849 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local561414911_0001_m_000000_0' to hdfs://master:9000/tmp/temp206201348/tmp-1297558267/_temporary/0/task_local561414911_0001_m_000000
2015-01-09 13:16:10,854 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map
2015-01-09 13:16:10,854 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local561414911_0001_m_000000_0' done.
2015-01-09 13:16:10,855 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local561414911_0001_m_000000_0
2015-01-09 13:16:10,855 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local561414911_0001_m_000001_0
2015-01-09 13:16:10,877 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2015-01-09 13:16:10,883 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
....
I had a similar problem, though with a different mapred-site.xml; nonetheless, I think the problem lies there.
YARN is the successor to classic MapReduce, which is why the following section is needed to make sure older MapReduce programs run on it:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
Note that the correct property key in Hadoop 2 is mapreduce.framework.name; the key mapred.framework.name used in your file is not recognized and is silently ignored, leaving the framework at its default of "local". That is why your jobs run through the LocalJobRunner (note the job_local... IDs in your output).
However, assuming you use YARN, you don't have a JobTracker: it was replaced, in a sense, by the ResourceManager (actually, it was a complete redesign; you can read about it at http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/).
So, you need to remove the following lines:
<property>
<name>mapreduce.jobtracker.address</name>
<value>master:54311</value>
<description>...</description>
</property>
from the file, and Pig should be good to go.
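For reference, a minimal mapred-site.xml for a YARN cluster might look like the sketch below. Note that Hadoop 2 expects the key mapreduce.framework.name (a mapred.framework.name entry is silently ignored); the job-history addresses are copied from the file in the question.

```xml
<configuration>
  <!-- Submit MapReduce jobs to YARN instead of the LocalJobRunner.
       The key must be mapreduce.framework.name, not mapred.framework.name. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <final>true</final>
  </property>
  <!-- Job history server, as in the questioner's configuration. -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
```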
(There is a relevant answer discussing this change in "Why there is a mapreduce.jobtracker.address configuration on YARN?".)
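As a quick check that jobs actually reach the cluster, the standard Hadoop CLI can list YARN applications; once Pig submits through YARN, the HadoopJobId reported by MapReduceLauncher changes from a job_local... ID to a cluster ID of the form job_<timestamp>_<sequence>:

```shell
# List applications known to the ResourceManager; a cluster-submitted
# Pig job appears here while it runs.
yarn application -list

# The ResourceManager web UI on the master shows the same information:
# http://master:8088/
```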
According to the logs you posted, your job is running locally (via the LocalJobRunner).
There is a property in Pig called pig.auto.local.enabled, which by default comes as true for performance reasons. It means that if your input data is smaller than the size set by the property pig.auto.local.input.maxbytes (1 GB by default), Pig will not execute the job on the cluster (the YARN UI will also not show an application for the job); instead, it is executed on the node where it was launched. You can set both properties in the pig.properties file.
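For example, to force even small jobs onto the cluster, both keys can be overridden in conf/pig.properties (or passed with -D on the pig command line); a minimal sketch:

```properties
# conf/pig.properties: disable auto-local mode so jobs always go to the cluster
pig.auto.local.enabled=false
# size threshold (in bytes) below which auto-local mode would otherwise apply
pig.auto.local.input.maxbytes=1073741824
```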
Related
I have been trying to execute the following commands in the Grunt shell:
summerolympic= load '/olympics/input/summer.csv' using PigStorage('\t');
grunt> dump summerolympic
which resulted in Error 1066.
2018-05-19 00:01:34,684 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-05-19 00:01:34,702 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-05-19 00:01:34,703 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2018-05-19 00:01:34,703 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-05-19 00:01:34,704 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2018-05-19 00:01:34,705 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2018-05-19 00:01:34,705 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2018-05-19 00:01:34,715 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-05-19 00:01:34,717 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-19 00:01:34,722 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2018-05-19 00:01:34,722 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2018-05-19 00:01:34,726 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2018-05-19 00:01:35,221 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/Users/vaisakh/pig-0.17.0/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp604495205/tmp-331461663/pig-0.17.0-core-h2.jar
2018-05-19 00:01:35,248 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/Users/vaisakh/pig-0.17.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp604495205/tmp-567932277/automaton-1.11-8.jar
2018-05-19 00:01:35,273 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/Users/vaisakh/pig-0.17.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp604495205/tmp1847314396/antlr-runtime-3.4.jar
2018-05-19 00:01:35,297 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/Users/vaisakh/pig-0.17.0/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp604495205/tmp-1150759063/joda-time-2.9.3.jar
2018-05-19 00:01:35,301 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2018-05-19 00:01:35,314 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2018-05-19 00:01:35,321 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-19 00:01:35,524 [JobControl] INFO org.apache.hadoop.mapreduce.JobResourceUploader - Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/vaisakh/.staging/job_1526443313382_0005
2018-05-19 00:01:35,527 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2018-05-19 00:01:35,540 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2018-05-19 00:01:35,542 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
2018-05-19 00:01:35,542 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2018-05-19 00:01:35,545 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-05-19 00:01:35,994 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2018-05-19 00:01:36,025 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1526443313382_0005
2018-05-19 00:01:36,025 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Executing with tokens: []
2018-05-19 00:01:36,029 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2018-05-19 00:01:36,270 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1526443313382_0005
2018-05-19 00:01:36,278 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://SSs-MacBook-Air.local:8088/proxy/application_1526443313382_0005/
2018-05-19 00:01:36,278 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1526443313382_0005
2018-05-19 00:01:36,278 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases summerolympic
2018-05-19 00:01:36,278 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: summerolympic[5,15] C: R:
2018-05-19 00:01:36,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2018-05-19 00:01:36,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1526443313382_0005]
2018-05-19 00:01:46,346 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2018-05-19 00:01:46,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1526443313382_0005 has failed! Stop running all dependent jobs
2018-05-19 00:01:46,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2018-05-19 00:01:46,348 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-19 00:01:46,367 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-19 00:01:46,384 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2018-05-19 00:01:46,385 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.1.0 0.17.0 vaisakh 2018-05-19 00:01:34 2018-05-19 00:01:46 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1526443313382_0005 summerolympic MAP_ONLY Message: Job failed! hdfs://localhost:9000/tmp/temp604495205/tmp-558801235,
Input(s):
Failed to read data from "/olympics/input/summer.csv"
Output(s):
Failed to produce result in "hdfs://localhost:9000/tmp/temp604495205/tmp-558801235"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1526443313382_0005
2018-05-19 00:01:46,385 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2018-05-19 00:01:46,389 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias summerolympic
Any insights would be very helpful.
Solved it by running Pig in local mode:
$ pig -x local
test = LOAD 'hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576' USING TextLoader AS (line:chararray);
log = FOREACH test GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)"')) AS (address_ip: chararray, logname: chararray, user: chararray, timestamp: chararray, req_line: chararray, status: int, bytes: int, referer: chararray, userAgent: chararray);
STORE log INTO 'hbase://Access_Logs' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:address_ip, cf:logname, cf:user, cf:timestamp, cf:req_line, cf:status, cf:bytes, cf:referer, cf:userAgent');
2018-03-10 10:52:03,636 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,840 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,840 [main] INFO org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for Access_Logs
2018-03-10 10:52:03,843 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-03-10 10:52:03,859 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,860 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2018-03-10 10:52:03,860 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-03-10 10:52:03,890 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2018-03-10 10:52:03,890 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2018-03-10 10:52:03,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2018-03-10 10:52:03,897 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,898 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:52:03,899 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2018-03-10 10:52:03,899 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2018-03-10 10:52:03,900 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2018-03-10 10:52:03,981 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.15.0-core-h2.jar to DistributedCache through /tmp/temp1710369540/tmp1282565307/pig-0.15.0-core-h2.jar
2018-03-10 10:52:04,013 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/htrace-core-2.04.jar to DistributedCache through /tmp/temp1710369540/tmp520067094/htrace-core-2.04.jar
2018-03-10 10:52:04,067 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp1710369540/tmp946538428/guava-11.0.2.jar
2018-03-10 10:52:04,123 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-common-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp468949353/hbase-common-0.98.8-hadoop2.jar
2018-03-10 10:52:04,144 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-hadoop-compat-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp113887319/hbase-hadoop-compat-0.98.8-hadoop2.jar
2018-03-10 10:52:04,200 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-server-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp682998180/hbase-server-0.98.8-hadoop2.jar
2018-03-10 10:52:04,256 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-client-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp-1958170360/hbase-client-0.98.8-hadoop2.jar
2018-03-10 10:52:04,317 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-protocol-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp-892814021/hbase-protocol-0.98.8-hadoop2.jar
2018-03-10 10:52:04,363 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.4.5.jar to DistributedCache through /tmp/temp1710369540/tmp-830858682/zookeeper-3.4.5.jar
2018-03-10 10:52:04,396 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar to DistributedCache through /tmp/temp1710369540/tmp-420530468/protobuf-java-2.5.0.jar
2018-03-10 10:52:04,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/high-scale-lib-1.1.1.jar to DistributedCache through /tmp/temp1710369540/tmp-1046507224/high-scale-lib-1.1.1.jar
2018-03-10 10:52:04,474 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar to DistributedCache through /tmp/temp1710369540/tmp309001480/netty-3.6.2.Final.jar
2018-03-10 10:52:04,489 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1710369540/tmp964502237/automaton-1.11-8.jar
2018-03-10 10:52:04,507 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1710369540/tmp-1680308848/antlr-runtime-3.4.jar
2018-03-10 10:52:04,528 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.5.jar to DistributedCache through /tmp/temp1710369540/tmp813805284/joda-time-2.5.jar
2018-03-10 10:52:04,539 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2018-03-10 10:52:04,544 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2018-03-10 10:52:04,544 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2018-03-10 10:52:04,544 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2018-03-10 10:52:04,555 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2018-03-10 10:52:04,557 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:52:04,596 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:04,597 [JobControl] INFO org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for Access_Logs
2018-03-10 10:52:04,625 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2018-03-10 10:52:04,742 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2018-03-10 10:52:04,742 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2018-03-10 10:52:04,748 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-03-10 10:52:04,786 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2018-03-10 10:52:04,828 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1519576410629_0017
2018-03-10 10:52:04,832 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2018-03-10 10:52:04,908 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1519576410629_0017
2018-03-10 10:52:04,910 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://server2.linux.com:8088/proxy/application_1519576410629_0017/
2018-03-10 10:52:05,056 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1519576410629_0017
2018-03-10 10:52:05,056 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases log,test
2018-03-10 10:52:05,056 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: test[16,7],log[-1,-1] C: R:
2018-03-10 10:52:05,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2018-03-10 10:52:05,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1519576410629_0017]
2018-03-10 10:53:02,248 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2018-03-10 10:53:02,249 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1519576410629_0017]
2018-03-10 10:53:05,259 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2018-03-10 10:53:05,259 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1519576410629_0017 has failed! Stop running all dependent jobs
2018-03-10 10:53:05,259 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2018-03-10 10:53:05,260 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:53:05,274 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2018-03-10 10:53:05,789 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
2018-03-10 10:53:05,789 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2018-03-10 10:53:05,789 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.4.1 0.15.0 hadoop 2018-03-10 10:52:03 2018-03-10 10:53:05 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1519576410629_0017 log,test MAP_ONLY Message: Job failed! hbase://Access_Logs,
Input(s):
Failed to read data from "hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576"
Output(s):
Failed to produce result in "hbase://Access_Logs"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1519576410629_0017
2018-03-10 10:53:05,789 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
But it runs fine when doing:
dump log
The Pig script is most likely failing to load the columns declared as status: int and bytes: int.
The error says:
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
It means the REGEX parser is producing String data where Pig expects an Integer.
To debug, try changing those datatypes in the Pig script and just dump the output. Once everything looks right, try storing into HBase again.
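The mismatch is easy to reproduce outside Pig. In this Python sketch (the sample log line is invented), the same regex pattern as in the script above captures every group as a string, including status and bytes, so a numeric field needs an explicit conversion — the conversion the int schema asks Pig to do implicitly:

```python
import re

# Hypothetical Apache access-log line matching the pattern in the Pig script.
line = ('10.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.08"')

pattern = (r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] '
           r'"(.+?)" (\S+) (\S+) "([^"]*)" "([^"]*)"')

groups = re.match(pattern, line).groups()

# Every captured group comes back as a string -- including status and bytes.
status, nbytes = groups[5], groups[6]
print(type(status).__name__)      # str, not int
print(int(status), int(nbytes))   # explicit conversion is required
```

In Pig terms, the equivalent workaround is to declare those fields as chararray in the AS clause and cast them to int in a later FOREACH.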
I'm trying to use Hadoop and Apache Pig. I have a .txt file with some data and a .pig file with my script:
student = LOAD '/home/srv-hadoop/data.txt' USING PigStorage(',')
as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
student_order = ORDER student BY firstname ASC;
Dump student_order;
And this is my .txt file:
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
But when I execute: pig -x mapreduce data.pig
17/07/25 17:04:59 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/07/25 17:04:59 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/07/25 17:04:59 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-07-25 17:04:59,399 [main] INFO org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2017-07-25 17:04:59,399 [main] INFO org.apache.pig.Main - Logging error messages to: /home/srv-hadoop/pig_1500995099397.log
2017-07-25 17:04:59,749 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-07-25 17:04:59,930 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/srv-hadoop/.pigbootup not found
2017-07-25 17:05:00,062 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-07-25 17:05:00,066 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:54310
2017-07-25 17:05:00,470 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:54311
2017-07-25 17:05:00,489 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-data.pig-2bb2e75c-41a7-42bf-926f-05354b881211
2017-07-25 17:05:00,489 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
2017-07-25 17:05:01,230 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: ORDER_BY
2017-07-25 17:05:01,279 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-07-25 17:05:01,308 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-07-25 17:05:01,362 [main] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2017-07-25 17:05:01,411 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-07-25 17:05:01,452 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SecondaryKeyOptimizerMR - Using Secondary Key Optimization for MapReduce node scope-23
2017-07-25 17:05:01,462 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2017-07-25 17:05:01,462 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2017-07-25 17:05:01,515 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-07-25 17:05:01,516 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-07-25 17:05:01,548 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-07-25 17:05:01,552 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2017-07-25 17:05:01,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-07-25 17:05:01,555 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2017-07-25 17:05:01,558 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2017-07-25 17:05:01,570 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
2017-07-25 17:05:01,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp1676993497/tmp-1698368733/pig-0.17.0-core-h2.jar
2017-07-25 17:05:01,932 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1676993497/tmp885160047/automaton-1.11-8.jar
2017-07-25 17:05:01,975 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1676993497/tmp-1346471388/antlr-runtime-3.4.jar
2017-07-25 17:05:02,012 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp1676993497/tmp32088650/joda-time-2.9.3.jar
2017-07-25 17:05:02,023 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-07-25 17:05:02,031 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2017-07-25 17:05:02,031 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2017-07-25 17:05:02,031 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-07-25 17:05:02,093 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-07-25 17:05:02,095 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2017-07-25 17:05:02,095 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-07-25 17:05:02,104 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-07-25 17:05:02,113 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2017-07-25 17:05:02,178 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-07-25 17:05:02,207 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2017-07-25 17:05:02,213 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area file:/home/srv-hadoop/hadoop-2.6.2/tmp/mapred/staging/srv-hadoop1897657638/.staging/job_local1897657638_0001
2017-07-25 17:05:02,214 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:data.pig got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:294)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:302)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:319)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205)
at java.lang.Thread.run(Thread.java:748)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:301)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280)
... 18 more
2017-07-25 17:05:02,597 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1897657638_0001
2017-07-25 17:05:02,597 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases student
2017-07-25 17:05:02,597 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: student[1,10],student[-1,-1] C: R:
2017-07-25 17:05:02,600 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-07-25 17:05:07,608 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2017-07-25 17:05:07,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1897657638_0001 has failed! Stop running all dependent jobs
2017-07-25 17:05:07,609 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-07-25 17:05:07,619 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-07-25 17:05:07,620 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
2017-07-25 17:05:07,620 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2017-07-25 17:05:07,622 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.2 0.17.0 srv-hadoop 2017-07-25 17:05:01 2017-07-25 17:05:07 ORDER_BY
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local1897657638_0001 student MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:294)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:302)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:319)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205)
at java.lang.Thread.run(Thread.java:748)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:301)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280)
... 18 more
Input(s):
Failed to read data from "/home/srv-hadoop/data.txt"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local1897657638_0001 -> null,
null -> null,
null
2017-07-25 17:05:07,622 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-07-25 17:05:07,624 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias student_order
Details at logfile: /home/srv-hadoop/pig_1500995099397.log
2017-07-25 17:05:07,648 [main] INFO org.apache.pig.Main - Pig script completed in 8 seconds and 442 milliseconds (8442 ms)
In short, I get:
Failed to read data from "/home/srv-hadoop/data.txt"
But if I execute `pig -x local data.pig`, it works fine.
Am I missing something?
Hey, it seems that your 'data.txt' is on your local file system. When you run 'pig -x mapreduce', Pig expects the input to be in HDFS.
Since '/home/srv-hadoop/data.txt' is on the local file system, 'pig -x local' works.
Make the directory on the Hadoop file system:
hadoop fs -mkdir -p /home/srv-hadoop/
Copy your data.txt file from the local file system to HDFS:
hadoop fs -put /home/srv-hadoop/data.txt /home/srv-hadoop/
Now run your Pig script in mapreduce mode. It should work fine.
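One more hedged observation: the exception line in the stack trace above names the filesystem URI Pig actually resolved the input against (hdfs://localhost:54310, which is worth comparing with the fs.defaultFS in core-site.xml). A quick way to read the authority out of such a URI (the sed pattern is only illustrative):

```shell
# URI string copied from the error line above; the sed pattern
# extracts the scheme authority (host:port of the NameNode used).
uri="hdfs://localhost:54310/home/srv-hadoop/data.txt"
authority=$(printf '%s\n' "$uri" | sed -E 's#^[a-z]+://([^/]+)/.*#\1#')
printf '%s\n' "$authority"   # localhost:54310 -- compare with fs.defaultFS
```

If the authority printed here differs from the NameNode configured in core-site.xml, the client is not picking up the cluster configuration you expect.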
Your file has 6 fields, but the schema you specified has only 5. Perhaps there is another field after lastname and age. Modify your load statement as below.
student = LOAD '/home/srv-hadoop/data.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray,age:int, phone:chararray, city:chararray);
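To double-check the field count before fixing the schema, you can count the comma-separated fields in a sample row (the row below is a made-up example in the format the question describes, not the actual data):

```shell
# Hypothetical sample row: id,firstname,lastname,age,phone,city
line='1,John,Doe,25,555-0100,Berlin'
printf '%s\n' "$line" | awk -F',' '{print NF}'   # prints 6 -> LOAD needs six typed fields
```

Running this over `head -1 data.txt` on the real file tells you exactly how many fields the `as (...)` clause must declare.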
You can try this: pig -x mapreduce file://data.pig
#oozie #emailAction #hadoop
I'm running a Hadoop Pig job using an Oozie workflow. How can I access the entire log for the Hadoop job in the workflow XML, so that I can use it in the success/failure email action?
Thanks
Sample Log I need in email:
2016-10-26 13:58:30,385 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2016-10-26 13:58:30,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-10-26 13:58:30,522 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-10-26 13:58:30,522 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-10-26 13:58:30,608 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2016-10-26 13:58:30,639 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-10-26 13:58:30,640 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2016-10-26 13:58:30,647 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=2369469310
2016-10-26 13:58:30,648 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 3
2016-10-26 13:58:30,876 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5719456061273645490.jar
2016-10-26 13:58:33,816 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5719456061273645490.jar created
2016-10-26 13:58:33,834 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-10-26 13:58:33,865 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-10-26 13:58:33,896 [JobControl] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2016-10-26 13:58:34,053 [JobControl] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-10-26 13:58:34,053 [JobControl] WARN org.apache.hadoop.conf.Configuration - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2016-10-26 13:58:34,115 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-10-26 13:58:34,166 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 18
2016-10-26 13:58:34,367 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-10-26 13:58:35,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201610241241_0117
2016-10-26 13:58:35,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A
2016-10-26 13:58:35,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4] C: R:
2016-10-26 13:58:35,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: XXX/jobdetails.jsp?jobid=job_201610241241_0117
2016-10-26 13:58:45,851 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 6% complete
2016-10-26 13:58:46,865 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 8% complete
2016-10-26 13:58:48,907 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 12% complete
2016-10-26 13:58:51,982 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 17% complete
2016-10-26 13:58:55,059 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 21% complete
2016-10-26 13:58:58,098 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 25% complete
2016-10-26 13:59:01,120 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 26% complete
2016-10-26 13:59:42,816 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 32% complete
2016-10-26 13:59:44,324 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2016-10-26 13:59:45,832 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 35% complete
2016-10-26 13:59:49,351 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 39% complete
2016-10-26 13:59:53,374 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 42% complete
2016-10-26 14:01:04,726 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-10-26 14:01:04,728 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.7.1 0.11.0-cdh4.7.1 hadoop 2016-10-26 13:58:30 2016-10-26 14:01:04 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_201610241241_0117 18 0 138 24 76 79 0 0 0 0 A MAP_ONLY /home/hadoop/xx/xx/xx/20161015/00,
Input(s):
Successfully read 116235853 records (2369955422 bytes) from: "/home/hadoop/xx/data/xx/20161015/00/part*"
Output(s):
Successfully stored 116235853 records (5855768014 bytes) in: "/home/hadoop/xx/xx/xx/20161015/00"
Counters:
Total records written : 116235853
Total bytes written : 5855768014
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201610241241_0117
2016-10-26 14:01:04,747 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
Based on the question and comments, I would recommend the following:
Once a job fails, don't transition it directly to the OK node. Instead, route it either to a fail node (if you just want to see failures when looking at the cluster) or route it first to a mail node, and then to either the OK or fail node, depending on your preference.
In the mail sent by the mail node, you could add the job id. Then people know that they need to look at this job on the server because something failed in it.
Optionally, you could always send a mail; in that case use a setup that transitions to either a mailOK or a mailFail node, so people know whether the process ran at all and whether they need to look at a failure.
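As a sketch of the failure routing described above (element names follow the Oozie email-action schema; the action names, recipient address, and transitions are placeholders, and the SMTP server must be configured in oozie-site.xml — note the email action can carry ids and error messages, not the full Hadoop job log, so recipients look the rest up by job id):

```xml
<!-- Hypothetical fragment of workflow.xml: on pig-node failure,
     send a mail first, then kill the workflow. -->
<action name="pig-node">
    <pig><!-- ... pig action configuration ... --></pig>
    <ok to="end"/>
    <error to="mail-fail"/>
</action>

<action name="mail-fail">
    <email xmlns="uri:oozie:email-action:0.1">
        <to>team@example.com</to>
        <subject>Pig job failed: ${wf:id()}</subject>
        <body>Failed node: ${wf:lastErrorNode()}
Error message: ${wf:errorMessage(wf:lastErrorNode())}
Look up the full Hadoop job log on the cluster using the workflow id above.</body>
    </email>
    <ok to="fail"/>
    <error to="fail"/>
</action>

<kill name="fail">
    <message>Pig action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
</kill>
```

The `wf:id()`, `wf:lastErrorNode()`, and `wf:errorMessage()` EL functions are standard Oozie workflow functions; they give the recipient enough context to find the failure without embedding the log itself.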
When I execute the dump command below, it stops at 0% complete and does not continue to the next step. How can I track down the issue?
grunt> a = load '/user/hduser1/file1' using PigStorage(',') as (usernames:chararray, password:chararray, price:int);
2015-12-25 10:34:26,471 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> dump a;
2015-12-25 10:34:33,862 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-12-25 10:34:33,904 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:33,910 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-12-25 10:34:33,951 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2015-12-25 10:34:34,069 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-12-25 10:34:34,088 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-12-25 10:34:34,088 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-12-25 10:34:34,158 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:34,197 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-12-25 10:34:34,425 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2015-12-25 10:34:34,432 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-12-25 10:34:34,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-12-25 10:34:34,432 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-12-25 10:34:34,435 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2015-12-25 10:34:35,464 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp-1328911758/tmp304593952/pig-0.14.0-core-h2.jar
2015-12-25 10:34:35,519 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1328911758/tmp1133010181/automaton-1.11-8.jar
2015-12-25 10:34:35,575 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1328911758/tmp-1601875197/antlr-runtime-3.4.jar
2015-12-25 10:34:35,641 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp-1328911758/tmp1137978977/guava-11.0.2.jar
2015-12-25 10:34:36,218 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp-1328911758/tmp325144073/joda-time-2.1.jar
2015-12-25 10:34:36,259 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-12-25 10:34:36,304 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-12-25 10:34:36,305 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-12-25 10:34:36,311 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-12-25 10:34:36,344 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:36,706 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2015-12-25 10:34:36,782 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-12-25 10:34:36,782 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-12-25 10:34:36,820 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2015-12-25 10:34:36,972 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2015-12-25 10:34:37,483 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1451017152672_0001
2015-12-25 10:34:37,664 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2015-12-25 10:34:38,078 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1451017152672_0001
2015-12-25 10:34:38,221 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://vijee-Lenovo-IdeaPad-S510p:8088/proxy/application_1451017152672_0001/
2015-12-25 10:34:38,221 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1451017152672_0001
2015-12-25 10:34:38,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a
2015-12-25 10:34:38,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[1,4],a[-1,-1] C: R:
2015-12-25 10:34:38,243 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-12-25 10:34:38,243 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1451017152672_0001]
[Screenshot of the application logs was attached here.]
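One hedged pointer for tracking this down: the session log above shows "Connecting to ResourceManager at /0.0.0.0:8032", which is the hard-coded default address. That often means yarn.resourcemanager.hostname (or yarn.resourcemanager.address) was not picked up from yarn-site.xml, so the application is submitted but no NodeManager ever picks it up, leaving it at 0%. You can grep a captured grunt session for that line:

```shell
# Log line copied from the question; extract the RM address the client used.
log='2015-12-25 10:34:34,197 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032'
printf '%s\n' "$log" | grep -oE '/[0-9.]+:8032'   # prints /0.0.0.0:8032
```

If the address printed is 0.0.0.0 rather than your ResourceManager host, check which configuration directory the client is actually reading (HADOOP_CONF_DIR) before digging further.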