GPU resource for hadoop 3.0 / yarn - hadoop

I try to use Hadoop 3.0 GA release with gpu, but when I executed the below shell command, there is an error and not working with gpu. please check the below and just let you know the shell command. I guess that there are misconfigurations from me.
2018-01-09 15:04:49,256 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:main(355)) - Initializing ApplicationMaster
2018-01-09 15:04:49,391 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:init(514)) - Application master for app, appId=1, clustertimestamp=1515477741976, attemptId=1
2018-01-09 15:04:49,418 WARN [main] distributedshell.ApplicationMaster (ApplicationMaster.java:init(626)) - Timeline service is not enabled
2018-01-09 15:04:49,418 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(649)) - Starting ApplicationMaster
2018-01-09 15:04:49,542 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(60)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-01-09 15:04:49,623 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(659)) - Executing with tokens:
2018-01-09 15:04:49,744 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(662)) - Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 1 cluster_timestamp: 1515477741976 } attemptId: 1 } keyId: 1619387150)
2018-01-09 15:04:49,801 INFO [main] client.RMProxy (RMProxy.java:newProxyInstance(133)) - Connecting to ResourceManager at /0.0.0.0:8030
2018-01-09 15:04:49,886 INFO [main] impl.NMClientAsyncImpl (NMClientAsyncImpl.java:serviceInit(138)) - Upper bound of the thread pool size is 500
2018-01-09 15:04:49,889 WARN [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(786)) - Timeline service is not enabled
2018-01-09 15:04:50,170 INFO [main] conf.Configuration (Configuration.java:getConfResourceAsInputStream(2656)) - resource-types.xml not found
2018-01-09 15:04:50,170 INFO [main] resource.ResourceUtils (ResourceUtils.java:addResourcesFileToConf(395)) - Unable to find 'resource-types.xml'.
2018-01-09 15:04:50,183 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,185 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,185 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,185 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,187 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,187 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,188 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,188 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,188 WARN [main] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:50,188 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(717)) - Max mem capability of resources in this cluster 8192
2018-01-09 15:04:50,188 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(720)) - Max vcores capability of resources in this cluster 4
2018-01-09 15:04:50,189 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:run(739)) - appattempt_1515477741976_0001_000001 received 0 previous attempts' running containers on AM registration.
2018-01-09 15:04:50,202 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:setupContainerAskForRM(1311)) - Requested container ask: Capability[<memory:-1, vCores:-1>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[gpu-1]
2018-01-09 15:04:50,246 WARN [AMRM Heartbeater thread] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:51,255 WARN [AMRM Heartbeater thread] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:52,273 WARN [AMRM Heartbeater thread] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
2018-01-09 15:04:52,278 INFO [AMRM Callback Handler Thread] distributedshell.ApplicationMaster (ApplicationMaster.java:onContainersAllocated(957)) - Got response from RM for container ask, allocatedCnt=1
2018-01-09 15:04:52,278 WARN [AMRM Callback Handler Thread] pb.ResourcePBImpl (ResourcePBImpl.java:initResources(142)) - Got unknown resource type: yarn.io/gpu; skipping
And the shell command that I executed with respect to YARN-7223 ticket is followed by,
yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \ -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \ -shell_command /usr/local/nvidia/bin/nvidia-smi -container_resource_profile gpu-1
Thanks in advance.

Related

The sqoop import action job in oozie is still running

The sqoop import action job in oozie is still running.
What should I check?
oozie version:5.2.1
hadoop version:3.3.4
sqoop version:1.4.7
>>> Invoking Sqoop command line now >>>
2022-10-26 08:26:56,510 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2022-10-26 08:26:56,543 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.7
2022-10-26 08:26:56,556 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
2022-10-26 08:26:56,566 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2022-10-26 08:26:56,607 [main] INFO org.apache.sqoop.manager.MySQLManager - Preparing to use a MySQL streaming resultset.
2022-10-26 08:26:56,607 [main] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
2022-10-26 08:26:56,890 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `calendar` AS t LIMIT 1
2022-10-26 08:26:56,912 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `calendar` AS t LIMIT 1
2022-10-26 08:26:56,923 [main] INFO org.apache.sqoop.orm.CompilationManager - $HADOOP_MAPRED_HOME is not set
log4j: Finalizing appender named [EventCounter].
2022-10-26 08:26:58,284 [main] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-hadoop/compile/92647049c21a99ec3fe668f737f0bf1a/calendar.jar
2022-10-26 08:26:58,296 [main] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
2022-10-26 08:26:58,296 [main] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
2022-10-26 08:26:58,296 [main] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
2022-10-26 08:26:58,296 [main] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
2022-10-26 08:26:58,305 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of calendar
2022-10-26 08:26:58,306 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2022-10-26 08:26:58,311 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2022-10-26 08:26:58,329 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2022-10-26 08:26:58,331 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
2022-10-26 08:26:58,393 [main] INFO org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider - Connecting to ResourceManager at bigdata/172.3.031.123:8032
2022-10-26 08:26:58,496 [main] INFO org.apache.hadoop.mapreduce.JobResourceUploader - Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/oozie/.staging/job_1666660764861_0196
2022-10-26 08:26:58,659 [main] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
2022-10-26 08:26:58,694 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2022-10-26 08:26:58,800 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1666660764861_0196
2022-10-26 08:26:58,801 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Executing with tokens: [Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 195 cluster_timestamp: 1666660764861 } attemptId: 1 } keyId: 1397232171)]
2022-10-26 08:26:58,980 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1666660764861_0196
2022-10-26 08:26:59,016 [main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://bigdata:8088/proxy/application_1666660764861_0196/
2022-10-26 08:26:59,016 [main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://bigdata:8088/proxy/application_1666660764861_0196/
2022-10-26 08:26:59,017 [main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_1666660764861_0196
2022-10-26 08:26:59,017 [main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_1666660764861_0196

Error while doing bulkload in HBase

Im trying to do bulkload in HBase but below exception is coming while loading the data...
Application application_1439213972129_0080 initialization failed (exitCode=255) with output: Requested user root is not whitelisted and has id 0,which is below the minimum allowed 500
Failing this attempt. Failing the application.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,personal:Name,Profession:Position_Title,Profession:Department,personal:Employee_Annual_Salary -Dimporttsv.separator=',' /tables/emp_salary_new1 /mapr/MapRDev/apps/Datasets/Employee_Details.csv
2015-08-13 18:24:33,076 INFO [main] mapreduce.TableMapReduceUtil: Setting speculative execution off for bulkload operation
2015-08-13 18:24:33,123 INFO [main] mapreduce.TableMapReduceUtil: Configured 'hbase.mapreduce.mapr.tablepath' to /tables/emp_salary_new1
2015-08-13 18:24:33,220 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-08-13 18:24:33,372 INFO [main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2015-08-13 18:24:33,735 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-08-13 18:24:33,770 INFO [main] mapreduce.TableOutputFormat: Created table instance for /tables/emp_salary_new1
2015-08-13 18:24:34,252 INFO [main] input.FileInputFormat: Total input paths to process : 1
2015-08-13 18:24:34,294 INFO [main] mapreduce.JobSubmitter: number of splits:1
2015-08-13 18:24:34,535 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1439213972129_0055
2015-08-13 18:24:34,792 INFO [main] security.ExternalTokenManagerFactory: Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
2015-08-13 18:24:35,031 INFO [main] impl.YarnClientImpl: Submitted application application_1439213972129_0055
2015-08-13 18:24:35,114 INFO [main] mapreduce.Job: The url to track the job: http://hadoop-c02n02.ss.sw.ericsson.se:8088/proxy/application_1439213972129_0055/
2015-08-13 18:24:35,115 INFO [main] mapreduce.Job: Running job: job_1439213972129_0055
2015-08-13 18:24:53,253 INFO [main] mapreduce.Job: Job job_1439213972129_0055 running in uber mode : false
2015-08-13 18:24:53,256 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-08-13 18:24:53,281 INFO [main] mapreduce.Job: Job job_1439213972129_0055 failed with state FAILED due to: Application application_1439213972129_0055 failed 2 times due to AM Container for appattempt_1439213972129_0055_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://hadoop-c02n02.ss.sw.ericsson.se:8088/cluster/app/application_1439213972129_0055Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e02_1439213972129_0055_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : user is mapradm
main : requested yarn user is mapradm
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
2015-08-13 18:24:53,320 INFO [main] mapreduce.Job: Counters: 0
Looks like you are loading data in MapR DB not in hbase.But its fine hbase commands are compatible with MarDB. I just a small change in your command and see if that works for you.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,personal:Name,Profession:Position_Title,Profession:Department,personal:Employee_Annual_Salary '-Dimporttsv.separator=,' /tables/emp_salary_new1 /mapr/MapRDev/apps/Datasets/Employee_Details.csv

Running Spark on the slave node (YARN) doesn't work

I can run SparkPi example on the master node, but when I try the same command
"spark-submit --class SparkPi --master yarn-client sparkpi.jar 10"
on the slave node, I got an error:
2015-05-19 14:05:44,881 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing view acls to: maintainer
2015-05-19 14:05:44,886 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing modify acls to: maintainer
2015-05-19 14:05:44,887 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(maintainer); users with modify permissions: Set(maintainer)
2015-05-19 14:05:45,389 INFO [sparkDriver-akka.actor.default-dispatcher-4] slf4j.Slf4jLogger (Slf4jLogger.scala:applyOrElse(80)) - Slf4jLogger started
2015-05-19 14:05:45,443 INFO [sparkDriver-akka.actor.default-dispatcher-4] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Starting remoting
2015-05-19 14:05:45,641 INFO [sparkDriver-akka.actor.default-dispatcher-3] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting started; listening on addresses :[akka.tcp://sparkDriver#slave2.com:33055]
2015-05-19 14:05:45,644 INFO [sparkDriver-akka.actor.default-dispatcher-3] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting now listens on addresses: [akka.tcp://sparkDriver#slave2.com:33055]
2015-05-19 14:05:45,653 INFO [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'sparkDriver' on port 33055.
2015-05-19 14:05:45,674 INFO [main] spark.SparkEnv (Logging.scala:logInfo(59)) - Registering MapOutputTracker
2015-05-19 14:05:45,688 INFO [main] spark.SparkEnv (Logging.scala:logInfo(59)) - Registering BlockManagerMaster
2015-05-19 14:05:45,707 INFO [main] storage.DiskBlockManager (Logging.scala:logInfo(59)) - Created local directory at /tmp/spark-local-20150519140545-c81b
2015-05-19 14:05:45,712 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - MemoryStore started with capacity 265.4 MB
2015-05-19 14:05:46,205 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-05-19 14:05:46,408 INFO [main] spark.HttpFileServer (Logging.scala:logInfo(59)) - HTTP File server directory is /tmp/spark-e95a2b5b-efea-41eb-93b9-0a9f7d6f6701
2015-05-19 14:05:46,413 INFO [main] spark.HttpServer (Logging.scala:logInfo(59)) - Starting HTTP Server
2015-05-19 14:05:46,477 INFO [main] server.Server (Server.java:doStart(272)) - jetty-8.y.z-SNAPSHOT
2015-05-19 14:05:46,499 INFO [main] server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SocketConnector#0.0.0.0:52737
2015-05-19 14:05:46,500 INFO [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'HTTP file server' on port 52737.
2015-05-19 14:05:46,790 INFO [main] server.Server (Server.java:doStart(272)) - jetty-8.y.z-SNAPSHOT
2015-05-19 14:05:46,805 INFO [main] server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SelectChannelConnector#0.0.0.0:4040
2015-05-19 14:05:46,805 INFO [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'SparkUI' on port 4040.
2015-05-19 14:05:46,808 INFO [main] ui.SparkUI (Logging.scala:logInfo(59)) - Started SparkUI at http://slave2.com:4040
2015-05-19 14:05:47,058 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Added JAR file:/home/maintainer/myjars/sparkpi.jar at http://[ip]:52737/jars/sparkpi.jar with timestamp 1432033547057
2015-05-19 14:05:47,190 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /0.0.0.0:8032
2015-05-19 14:09:45,861 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /0.0.0.0:8032
**2015-05-19 14:09:47,067 INFO [main] ipc.Client (Client.java:handleConnectionFailure(842)) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-05-19 14:09:48,068 INFO [main] ipc.Client (Client.java:handleConnectionFailure(842)) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...**
Aside from specifying yarn.resourcemanager.hostname property in yarn-site.xml, it's also necessary to propagate configuration files to workers.
It might be done with this line (before running spark-submit):
export SPARK_YARN_DIST_FILES=$(ls $HADOOP_CONF_DIR* | sed 's#^#file://#g' | tr '\n' ',' | sed 's/,$//')
If everything's configured correctly, you'll see RM hostname instead of 0.0.0.0 in this line:
2015-05-19 14:05:47,190 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /0.0.0.0:8032
Exporting correct values for HADOOP_CONF_DIR fixed the issue.
export HADOOP_CONF_DIR=/your-path/hadoop/conf

Loading data from HDFS does not work with Elephantbird

I am trying to process data with elephantbird in pig but I don't succeed in loading the data. Here is my pig script:
register 'lib/elephant-bird-core-3.0.9.jar';
register 'lib/elephant-bird-pig-3.0.9.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
twitter = LOAD 'statuses.log.2013-04-01-00'
USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
DUMP twitter;
The output I get is
[main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:48:21
[main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop1/twitter_test/pig_1374834826168.log
[main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop1/.pigbootup not found
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.hadoop:8021
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
[main] WARN org.apache.pig.backend.hadoop23.PigJobControl - falling back to default JobControl (not using hadoop 0.23 ?)
java.lang.NoSuchFieldException: jobsInProgress
at java.lang.Class.getDeclaredField(Class.java:1938)
at org.apache.pig.backend.hadoop23.PigJobControl.<clinit>(PigJobControl.java:58)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:102)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
at org.apache.pig.PigServer.storeEx(PigServer.java:933)
at org.apache.pig.PigServer.store(PigServer.java:900)
at org.apache.pig.PigServer.openIterator(PigServer.java:813)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=656085089
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job6015425922938886053.jar
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job6015425922938886053.jar created
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
[JobControl] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases twitter
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[10,10] C: R:
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://master.hadoop:50030/jobdetails.jsp?jobid=job_201307261031_0050
[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307261031_0050 has failed! Stop running all dependent jobs
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
[main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.3.0 0.11.0-cdh4.3.0 hadoop1 2013-07-26 12:33:48 2013-07-26 12:34:23 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201307261031_0050 twitter MAP_ONLY Message: Job failed! hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504,
Input(s):
Failed to read data from "hdfs://master.hadoop:8020/user/hadoop1/statuses.log.2013-04-01-00"
Output(s):
Failed to produce result in "hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201307261031_0050
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
Details at logfile: /home/hadoop1/twitter_test/pig_1374834826168.log
The file exists and is accessible:
$ hdfs dfs -ls /user/hadoop1/statuses.log.2013-04-01-00
Found 1 items
-rw-r--r-- 3 hadoop1 supergroup 656085089 2013-07-26 11:53 /user/hadoop1/statuses.log.2013-04-01-00
This seems to be a general problem with the pig version shipped with Cloudera 4.6.0: the problem seems to be the line that says
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
I got a similar error when running another user defined function for loading data:
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
When I force pig to local mode (''-x local'') I get the more obvious error
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
So the version of Hadoop pig uses seems to be incompatible with the one shipped with Cloudera, I guess.
This is indeed a versioning problem: some libraries are not yet compatible with the new MapReduce API, see for example the issues #56, #247 and #308.
For ElephantBird the issue is solved in a recent version. Using ElephantBird 4.1 in the above code and adding the Hadoop compatibility module
register 'lib/elephant-bird-core-4.1.jar';
register 'lib/elephant-bird-pig-4.1.jar';
register 'lib/elephant-bird-hadoop-compat-4.1.jar';
register 'lib/google-collections-1.0.jar';
register 'lib/json-simple-1.1.jar';
solved the problem! :-)

Hadoop MapReduce - Pig/Cassandra - Unable to create input splits

I'm trying to run a MapReduce Job with Pig and Cassandra and I always get the error:
ERROR 2118: Unable to create input splits for: cassandra://constellation/logs
[SOLVED]
There were some environment variables I missed to set:
PIG_RPC_PORT, PIG_INITIAL_ADDRESS,
PIG_PARTITIONER
/opt/cassandra-0.7.0-beta3/contrib/pig$ bin/pig_cassandra example-script.pig
10/11/15 17:38:26 INFO pig.Main: Logging error messages to: /opt/cassandra-0.7.0-beta3/contrib/pig/pig_1289839106859.log
2010-11-15 17:38:27,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop-master-1.dkd.lan:8020
2010-11-15 17:38:29,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop-master-1.dkd.lan:8021
2010-11-15 17:38:32,753 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://hadoop-master-1.dkd.lan/tmp/temp657556636/tmp-375431593:org.apache.pig.builtin.BinStorage) - 1-82 Operator Key: 1-82)
2010-11-15 17:38:32,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2010-11-15 17:38:33,100 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2010-11-15 17:38:33,100 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2010-11-15 17:38:33,364 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-11-15 17:38:38,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-11-15 17:38:38,999 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-11-15 17:38:39,055 [Thread-4] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-11-15 17:38:39,500 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-11-15 17:38:40,340 [Thread-4] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://hadoop-master-1.dkd.lan/var/lib/hadoop-0.20/cache/mapred/mapred/staging/dkd-sprenger/.staging/job_201011101636_0011
2010-11-15 17:38:40,356 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-11-15 17:38:40,357 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2010-11-15 17:38:40,402 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2010-11-15 17:38:40,517 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: cassandra://constellation/logs
Details at logfile: /opt/cassandra-0.7.0-beta3/contrib/pig/pig_1289839106859.log
Anyone who has an idea -> SOLVED
There were some environment variables I missed to set them.
enviroment:
Ubuntu Server 10.4
Versions:
hadoop: 0.20
pig: 0.7
cassandra: 0.7.0 beta3
The asker already updated the question to include the answer:
[SOLVED] There were some environment variables I missed to set:
PIG_RPC_PORT, PIG_INITIAL_ADDRESS, PIG_PARTITIONER

Resources