GPU resource for hadoop 3.0 / yarn


I am trying to use the Hadoop 3.0 GA release with GPUs, but when I run the shell command below, the application logs warnings and does not use the GPU. Please look at the log below and let me know the correct shell command. I suspect that I have misconfigured something.
2018-01-09 15:04:49,256 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:main(355)) - Initializing ApplicationMaster
2018-01-09 15:04:49,391 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:init(514)) - Application master for app, appId=1, clustertimestamp=1515477741976, attemptId=1
2018-01-09 15:04:49,418 WARN [main] distributedshell.ApplicationMaster (ApplicationMaster.java:init(626)) - Timeline service is not enabled
2018-01-09 15:04:49,418 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(649)) - Starting ApplicationMaster
2018-01-09 15:04:49,542 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(60)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-01-09 15:04:49,623 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(659)) - Executing with tokens:
2018-01-09 15:04:49,744 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(662)) - Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 1 cluster_timestamp: 1515477741976 } attemptId: 1 } keyId: 1619387150)
2018-01-09 15:04:49,801 INFO [main] client.RMProxy (RMProxy.java:newProxyInstance(133)) - Connecting to ResourceManager at /0.0.0.0:8030
2018-01-09 15:04:49,886 INFO [main] impl.NMClientAsyncImpl (NMClientAsyncImpl.java:serviceInit(138)) - Upper bound of the thread pool size is 500
2018-01-09 15:04:49,889 WARN [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(786)) - Timeline service is not enabled
2018-01-09 15:04:50,170 INFO [main] conf.Configuration (Configuration.java:getConfResourceAsInputStream(2656)) - resource-types.xml not found
2018-01-09 15:04:50,170 INFO [main] resource.ResourceUtils (ResourceUtils.java:addResourcesFileToConf(395)) - Unable to find 'resource-types.xml'.
2018-01-09 15:04:50,183 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,185 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,185 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,185 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,187 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,187 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,188 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,188 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,188 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,188 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(717)) - Max mem capability of resources in this cluster 8192
2018-01-09 15:04:50,188 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(720)) - Max vcores capability of resources in this cluster 4
2018-01-09 15:04:50,189 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(739)) - appattempt_1515477741976_0001_000001 received 0 previous attempts' running containers on AM registration.
2018-01-09 15:04:50,202 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:setupContainerAskForRM(1311)) - Requested container ask: Capability[<memory:-1, vCores:-1>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[gpu-1]
2018-01-09 15:04:50,246 WARN [AMRM Heartbeater thread] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:51,255 WARN [AMRM Heartbeater thread] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:52,273 WARN [AMRM Heartbeater thread] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:52,278 INFO [AMRM Callback Handler Thread] distributedshell.ApplicationMaster (ApplicationMaster.java:onContainersAllocated(957)) - Got response from RM for container ask, allocatedCnt=1
2018-01-09 15:04:52,278 WARN [AMRM Callback Handler Thread] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
The shell command I ran, following the YARN-7223 ticket, is:
yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -shell_command /usr/local/nvidia/bin/nvidia-smi \
  -container_resource_profile gpu-1
Thanks in advance.

Related

The sqoop import action job in oozie is still running

The Sqoop import action job in Oozie is still running.
What should I check?
Oozie version: 5.2.1
Hadoop version: 3.3.4
Sqoop version: 1.4.7
>>> Invoking Sqoop command line now >>>
2022-10-26 08:26:56,510 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2022-10-26 08:26:56,543 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.7
2022-10-26 08:26:56,556 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
2022-10-26 08:26:56,566 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2022-10-26 08:26:56,607 [main] INFO org.apache.sqoop.manager.MySQLManager - Preparing to use a MySQL streaming resultset.
2022-10-26 08:26:56,607 [main] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
2022-10-26 08:26:56,890 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `calendar` AS t LIMIT 1
2022-10-26 08:26:56,912 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `calendar` AS t LIMIT 1
2022-10-26 08:26:56,923 [main] INFO org.apache.sqoop.orm.CompilationManager - $HADOOP_MAPRED_HOME is not set
log4j: Finalizing appender named [EventCounter].
2022-10-26 08:26:58,284 [main] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-hadoop/compile/92647049c21a99ec3fe668f737f0bf1a/calendar.jar
2022-10-26 08:26:58,296 [main] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
2022-10-26 08:26:58,296 [main] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
2022-10-26 08:26:58,296 [main] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
2022-10-26 08:26:58,296 [main] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
2022-10-26 08:26:58,305 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of calendar
2022-10-26 08:26:58,306 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2022-10-26 08:26:58,311 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2022-10-26 08:26:58,329 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2022-10-26 08:26:58,331 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
2022-10-26 08:26:58,393 [main] INFO org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider - Connecting to ResourceManager at bigdata/172.3.031.123:8032
2022-10-26 08:26:58,496 [main] INFO org.apache.hadoop.mapreduce.JobResourceUploader - Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/oozie/.staging/job_1666660764861_0196
2022-10-26 08:26:58,659 [main] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
2022-10-26 08:26:58,694 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2022-10-26 08:26:58,800 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1666660764861_0196
2022-10-26 08:26:58,801 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Executing with tokens: [Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 195 cluster_timestamp: 1666660764861 } attemptId: 1 } keyId: 1397232171)]
2022-10-26 08:26:58,980 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1666660764861_0196
2022-10-26 08:26:59,016 [main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://bigdata:8088/proxy/application_1666660764861_0196/
2022-10-26 08:26:59,016 [main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://bigdata:8088/proxy/application_1666660764861_0196/
2022-10-26 08:26:59,017 [main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_1666660764861_0196
2022-10-26 08:26:59,017 [main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_1666660764861_0196

Error while doing bulkload in HBase

I'm trying to do a bulk load into HBase, but the exception below appears while loading the data:
Application application_1439213972129_0080 initialization failed (exitCode=255) with output: Requested user root is not whitelisted and has id 0,which is below the minimum allowed 500
Failing this attempt. Failing the application.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,personal:Name,Profession:Position_Title,Profession:Department,personal:Employee_Annual_Salary -Dimporttsv.separator=',' /tables/emp_salary_new1 /mapr/MapRDev/apps/Datasets/Employee_Details.csv
2015-08-13 18:24:33,076 INFO [main] mapreduce.TableMapReduceUtil: Setting speculative execution off for bulkload operation
2015-08-13 18:24:33,123 INFO [main] mapreduce.TableMapReduceUtil: Configured 'hbase.mapreduce.mapr.tablepath' to /tables/emp_salary_new1
2015-08-13 18:24:33,220 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-08-13 18:24:33,372 INFO [main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2015-08-13 18:24:33,735 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-08-13 18:24:33,770 INFO [main] mapreduce.TableOutputFormat: Created table instance for /tables/emp_salary_new1
2015-08-13 18:24:34,252 INFO [main] input.FileInputFormat: Total input paths to process : 1
2015-08-13 18:24:34,294 INFO [main] mapreduce.JobSubmitter: number of splits:1
2015-08-13 18:24:34,535 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1439213972129_0055
2015-08-13 18:24:34,792 INFO [main] security.ExternalTokenManagerFactory: Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
2015-08-13 18:24:35,031 INFO [main] impl.YarnClientImpl: Submitted application application_1439213972129_0055
2015-08-13 18:24:35,114 INFO [main] mapreduce.Job: The url to track the job: http://hadoop-c02n02.ss.sw.ericsson.se:8088/proxy/application_1439213972129_0055/
2015-08-13 18:24:35,115 INFO [main] mapreduce.Job: Running job: job_1439213972129_0055
2015-08-13 18:24:53,253 INFO [main] mapreduce.Job: Job job_1439213972129_0055 running in uber mode : false
2015-08-13 18:24:53,256 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-08-13 18:24:53,281 INFO [main] mapreduce.Job: Job job_1439213972129_0055 failed with state FAILED due to: Application application_1439213972129_0055 failed 2 times due to AM Container for appattempt_1439213972129_0055_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://hadoop-c02n02.ss.sw.ericsson.se:8088/cluster/app/application_1439213972129_0055Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e02_1439213972129_0055_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : user is mapradm
main : requested yarn user is mapradm
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
2015-08-13 18:24:53,320 INFO [main] mapreduce.Job: Counters: 0

It looks like you are loading data into MapR DB, not HBase. That is fine, because HBase commands are compatible with MapR DB. I made a small change to your command; see if it works for you:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,personal:Name,Profession:Position_Title,Profession:Department,personal:Employee_Annual_Salary '-Dimporttsv.separator=,' /tables/emp_salary_new1 /mapr/MapRDev/apps/Datasets/Employee_Details.csv

Running Spark on the slave node (YARN) doesn't work

I can run the SparkPi example on the master node, but when I try the same command
spark-submit --class SparkPi --master yarn-client sparkpi.jar 10
on the slave node, I get an error:
2015-05-19 14:05:44,881 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing view acls to: maintainer
2015-05-19 14:05:44,886 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing modify acls to: maintainer
2015-05-19 14:05:44,887 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(maintainer); users with modify permissions: Set(maintainer)
2015-05-19 14:05:45,389 INFO [sparkDriver-akka.actor.default-dispatcher-4] slf4j.Slf4jLogger (Slf4jLogger.scala:applyOrElse(80)) - Slf4jLogger started
2015-05-19 14:05:45,443 INFO [sparkDriver-akka.actor.default-dispatcher-4] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Starting remoting
2015-05-19 14:05:45,641 INFO [sparkDriver-akka.actor.default-dispatcher-3] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting started; listening on addresses :[akka.tcp://sparkDriver#slave2.com:33055]
2015-05-19 14:05:45,644 INFO [sparkDriver-akka.actor.default-dispatcher-3] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting now listens on addresses: [akka.tcp://sparkDriver#slave2.com:33055]
2015-05-19 14:05:45,653 INFO [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'sparkDriver' on port 33055.
2015-05-19 14:05:45,674 INFO [main] spark.SparkEnv (Logging.scala:logInfo(59)) - Registering MapOutputTracker
2015-05-19 14:05:45,688 INFO [main] spark.SparkEnv (Logging.scala:logInfo(59)) - Registering BlockManagerMaster
2015-05-19 14:05:45,707 INFO [main] storage.DiskBlockManager (Logging.scala:logInfo(59)) - Created local directory at /tmp/spark-local-20150519140545-c81b
2015-05-19 14:05:45,712 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - MemoryStore started with capacity 265.4 MB
2015-05-19 14:05:46,205 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-05-19 14:05:46,408 INFO [main] spark.HttpFileServer (Logging.scala:logInfo(59)) - HTTP File server directory is /tmp/spark-e95a2b5b-efea-41eb-93b9-0a9f7d6f6701
2015-05-19 14:05:46,413 INFO [main] spark.HttpServer (Logging.scala:logInfo(59)) - Starting HTTP Server
2015-05-19 14:05:46,477 INFO [main] server.Server (Server.java:doStart(272)) - jetty-8.y.z-SNAPSHOT
2015-05-19 14:05:46,499 INFO [main] server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SocketConnector#0.0.0.0:52737
2015-05-19 14:05:46,500 INFO [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'HTTP file server' on port 52737.
2015-05-19 14:05:46,790 INFO [main] server.Server (Server.java:doStart(272)) - jetty-8.y.z-SNAPSHOT
2015-05-19 14:05:46,805 INFO [main] server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SelectChannelConnector#0.0.0.0:4040
2015-05-19 14:05:46,805 INFO [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'SparkUI' on port 4040.
2015-05-19 14:05:46,808 INFO [main] ui.SparkUI (Logging.scala:logInfo(59)) - Started SparkUI at http://slave2.com:4040
2015-05-19 14:05:47,058 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Added JAR file:/home/maintainer/myjars/sparkpi.jar at http://[ip]:52737/jars/sparkpi.jar with timestamp 1432033547057
2015-05-19 14:05:47,190 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /0.0.0.0:8032
2015-05-19 14:09:45,861 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /0.0.0.0:8032
2015-05-19 14:09:47,067 INFO [main] ipc.Client (Client.java:handleConnectionFailure(842)) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-05-19 14:09:48,068 INFO [main] ipc.Client (Client.java:handleConnectionFailure(842)) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...

Aside from specifying yarn.resourcemanager.hostname property in yarn-site.xml, it's also necessary to propagate configuration files to workers.
This can be done with the following line (run it before spark-submit):
export SPARK_YARN_DIST_FILES=$(ls $HADOOP_CONF_DIR* | sed 's#^#file://#g' | tr '\n' ',' | sed 's/,$//')
If everything is configured correctly, you'll see the RM hostname instead of 0.0.0.0 in this line:
2015-05-19 14:05:47,190 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /0.0.0.0:8032
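One way to check what the client will pick up is to read the property straight out of yarn-site.xml. The sketch below is only an illustration: `rm_hostname` is a hypothetical helper (not part of Hadoop or Spark), and it assumes the property is written as a plain `<name>`/`<value>` pair. When the property is absent, YARN falls back to its default of 0.0.0.0, which matches the log line above.

```shell
# Hypothetical helper (not a Hadoop or Spark tool): print the value of
# yarn.resourcemanager.hostname from the given yarn-site.xml.
# Prints nothing if the property is absent; returns 1 if the file is missing.
rm_hostname() {
  conf_file="$1"
  [ -f "$conf_file" ] || return 1
  # Drop newlines first so <name> and <value> may sit on separate lines.
  tr -d '\n' < "$conf_file" |
    sed -n 's#.*<name>yarn\.resourcemanager\.hostname</name>[[:space:]]*<value>\([^<]*\)</value>.*#\1#p'
}

# Usage: report what the client-side configuration says.
# Falls back to the current directory if HADOOP_CONF_DIR is unset (assumption).
conf="${HADOOP_CONF_DIR:-.}/yarn-site.xml"
host="$(rm_hostname "$conf" || true)"
if [ -n "$host" ]; then
  echo "ResourceManager hostname in $conf: $host"
else
  echo "No yarn.resourcemanager.hostname in $conf; the client will use 0.0.0.0"
fi
```

The helper uses plain text matching rather than a real XML parser, so it is meant for a quick look at a hand-written config file, not for general XML.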

Exporting the correct value for HADOOP_CONF_DIR fixed the issue:
export HADOOP_CONF_DIR=/your-path/hadoop/conf
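Before calling spark-submit, it can help to confirm that the directory really contains the client-side configuration. This is a minimal sketch, assuming the usual core-site.xml and yarn-site.xml file names; `check_hadoop_conf_dir` is a hypothetical helper, and /your-path/hadoop/conf is the same placeholder path as in the export line.

```shell
# Hypothetical guard: succeed only if the directory holds the client-side
# Hadoop configuration files that a YARN client is expected to read.
check_hadoop_conf_dir() {
  dir="$1"
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  return 0
}

# Usage (placeholder path; replace it with your own):
# export HADOOP_CONF_DIR=/your-path/hadoop/conf
# check_hadoop_conf_dir "$HADOOP_CONF_DIR" &&
#   spark-submit --class SparkPi --master yarn-client sparkpi.jar 10
```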

Loading data from HDFS does not work with Elephantbird

I am trying to process data with Elephant Bird in Pig, but I cannot load the data. Here is my Pig script:
register 'lib/elephant-bird-core-3.0.9.jar';
register 'lib/elephant-bird-pig-3.0.9.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
twitter = LOAD 'statuses.log.2013-04-01-00'
USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
DUMP twitter;
The output I get is
[main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:48:21
[main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop1/twitter_test/pig_1374834826168.log
[main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop1/.pigbootup not found
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.hadoop:8021
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
[main] WARN org.apache.pig.backend.hadoop23.PigJobControl - falling back to default JobControl (not using hadoop 0.23 ?)
java.lang.NoSuchFieldException: jobsInProgress
at java.lang.Class.getDeclaredField(Class.java:1938)
at org.apache.pig.backend.hadoop23.PigJobControl.<clinit>(PigJobControl.java:58)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:102)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
at org.apache.pig.PigServer.storeEx(PigServer.java:933)
at org.apache.pig.PigServer.store(PigServer.java:900)
at org.apache.pig.PigServer.openIterator(PigServer.java:813)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=656085089
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job6015425922938886053.jar
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job6015425922938886053.jar created
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
[JobControl] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases twitter
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[10,10] C: R:
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://master.hadoop:50030/jobdetails.jsp?jobid=job_201307261031_0050
[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307261031_0050 has failed! Stop running all dependent jobs
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
[main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.3.0 0.11.0-cdh4.3.0 hadoop1 2013-07-26 12:33:48 2013-07-26 12:34:23 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201307261031_0050 twitter MAP_ONLY Message: Job failed! hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504,
Input(s):
Failed to read data from "hdfs://master.hadoop:8020/user/hadoop1/statuses.log.2013-04-01-00"
Output(s):
Failed to produce result in "hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
Details at logfile: /home/hadoop1/twitter_test/pig_1374834826168.log
The file exists and is accessible:
$ hdfs dfs -ls /user/hadoop1/statuses.log.2013-04-01-00
Found 1 items
-rw-r--r-- 3 hadoop1 supergroup 656085089 2013-07-26 11:53 /user/hadoop1/statuses.log.2013-04-01-00
This seems to be a general problem with the Pig version shipped with Cloudera 4.6.0. The relevant line appears to be this one:
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
I got a similar error when running another user defined function for loading data:
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
When I force Pig into local mode (-x local), I get a more obvious error:
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
So I guess the Hadoop version that Pig uses is incompatible with the one shipped with Cloudera.

This is indeed a versioning problem: some libraries are not yet compatible with the new MapReduce API, see for example the issues #56, #247 and #308.
For ElephantBird the issue is solved in a recent version. Using ElephantBird 4.1 in the above code and adding the Hadoop compatibility module
register 'lib/elephant-bird-core-4.1.jar';
register 'lib/elephant-bird-pig-4.1.jar';
register 'lib/elephant-bird-hadoop-compat-4.1.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
solved the problem! :-)

Hadoop MapReduce - Pig/Cassandra - Unable to create input splits

I'm trying to run a MapReduce Job with Pig and Cassandra and I always get the error:
ERROR 2118: Unable to create input splits for: cassandra://constellation/logs
[SOLVED]
I had forgotten to set some environment variables:
PIG_RPC_PORT, PIG_INITIAL_ADDRESS, PIG_PARTITIONER
/opt/cassandra-0.7.0-beta3/contrib/pig$ bin/pig_cassandra example-script.pig
10/11/15 17:38:26 INFO pig.Main: Logging error messages to: /opt/cassandra-0.7.0-beta3/contrib/pig/pig_1289839106859.log
2010-11-15 17:38:27,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop-master-1.dkd.lan:8020
2010-11-15 17:38:29,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop-master-1.dkd.lan:8021
2010-11-15 17:38:32,753 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://hadoop-master-1.dkd.lan/tmp/temp657556636/tmp-375431593:org.apache.pig.builtin.BinStorage) - 1-82 Operator Key: 1-82)
2010-11-15 17:38:32,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2010-11-15 17:38:33,100 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2010-11-15 17:38:33,100 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2010-11-15 17:38:33,364 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-11-15 17:38:38,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-11-15 17:38:38,999 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-11-15 17:38:39,055 [Thread-4] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-11-15 17:38:39,500 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-11-15 17:38:40,340 [Thread-4] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://hadoop-master-1.dkd.lan/var/lib/hadoop-0.20/cache/mapred/mapred/staging/dkd-sprenger/.staging/job_201011101636_0011
2010-11-15 17:38:40,356 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-11-15 17:38:40,357 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2010-11-15 17:38:40,402 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2010-11-15 17:38:40,517 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: cassandra://constellation/logs
Details at logfile: /opt/cassandra-0.7.0-beta3/contrib/pig/pig_1289839106859.log
Does anyone have an idea? -> SOLVED
I had forgotten to set some environment variables.
Environment:
Ubuntu Server 10.4
Versions:
hadoop: 0.20
pig: 0.7
cassandra: 0.7.0 beta3

The asker already updated the question to include the answer:
[SOLVED] There were some environment variables I missed to set:
PIG_RPC_PORT, PIG_INITIAL_ADDRESS, PIG_PARTITIONER
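As a sketch of what setting these variables might look like before running bin/pig_cassandra: the values below are placeholders and assumptions, not taken from the question (9160 is Cassandra's default Thrift RPC port, and RandomPartitioner is only one possible partitioner). `missing_pig_vars` is a hypothetical helper that lists any of the three variables that are still empty.

```shell
# Placeholder values (assumptions): adjust them to your own cluster.
export PIG_RPC_PORT=9160                  # Thrift RPC port of the Cassandra node
export PIG_INITIAL_ADDRESS=localhost      # address of a node to contact first
export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner

# Hypothetical helper: print the names of the variables that are still empty.
missing_pig_vars() {
  missing=""
  for name in PIG_RPC_PORT PIG_INITIAL_ADDRESS PIG_PARTITIONER; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  echo "$missing"
}

if [ -z "$(missing_pig_vars)" ]; then
  echo "all Pig/Cassandra variables are set"
  # bin/pig_cassandra example-script.pig
else
  echo "not set:$(missing_pig_vars)" >&2
fi
```

The partitioner has to match the one the cluster itself is configured with, so check the cluster configuration rather than copying the value above.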
