/var/mapr/cluster/yarn/rm/staging/username/.staging/job_id_1234_4321/libjars/myudfs.jar (Invalid argument) - hadoop

We are using Mapr with PAM Authentication to execute select query on hive. Select query uses a myudfs.jar where we have defined our UDFs.
I have tried many links but couldn't figure out why this is happening. From the stacktrace it seems like Hadoop is not able to copy the jar into libjars directory inside /.staging. But UDF is there in the classpath.
Any help would be really appreciated.
java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. 2063.142.526190 /var/mapr/cluster/yarn/rm/staging/username/.staging/job_112233333_0002/libjars/myudfs.jar (Invalid argument)
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: 2063.142.526190 /var/mapr/cluster/yarn/rm/staging/username/.staging/job_112233333_0002/libjars/myudfs.jar (Invalid argument)
at com.mapr.fs.Inode.throwIfFailed(Inode.java:390)
at com.mapr.fs.Inode.flushPages(Inode.java:505)
at com.mapr.fs.Inode.releaseDirty(Inode.java:583)
at com.mapr.fs.MapRFsOutStream.dropCurrentPage(MapRFsOutStream.java:73)
at com.mapr.fs.MapRFsOutStream.write(MapRFsOutStream.java:85)
at com.mapr.fs.MapRFsDataOutputStream.write(MapRFsDataOutputStream.java:39)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:376)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:346)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:297)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:203)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:128)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:414)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:201)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:283)
Following is the process how we execute the query and what we do before executing it.
First we create a function as given below
CREATE FUNCTION func_name AS 'com.test.ClassContainingMapper'
This ClassContainingMapper class is packed in a udf jar named myudf.jar. We have this jar added in the classpath by using property hive.aux.jars.path.
Now, our query is as follows:
Select func_name(col1, col2) from dbname.tablename;
When this query is executed Hadoop tries to upload the jars mentioned in the classpath to it's libjars folder under staging directory. That's when it fails.
Interesting thing is that on one similar cluster this query executes successfully but on other cluster it is being failed with the exception mentioned in the stacktrace.
UPDATE:
Actually there is another query executed before the Select query. It's
ADD JAR '/path/to/the/jar/file/myudf.jar';
This makes more sense that when this query is executed then the jar is uploaded to the cluster. And during that action of uploading query fails.

Related

EMR - Sqoop import using hcatalog failing on EMR

I am using EMR - 6.4.0 (sqoop version 1.4.7 )
and using oozie workflow to import data from postgres into hive partitions using hcatalog,data is getting loaded into the table and partitions are getting created but job fails with the following error
Job commit failed: java.lang.UnsupportedOperationException: getTokenStrForm is not supported
at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.getTokenStrForm(GlueMetastoreClientDelegate.java:1630)
at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.getTokenStrForm(AWSCatalogMetastoreClient.java:611)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.invoke(HiveClientCache.java:590)
at com.sun.proxy.$Proxy109.getTokenStrForm(Unknown Source)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.cancelDelegationTokens(FileOutputCommitterContainer.java:1012)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:273)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I dont see any mention about this error or limitation in EMR docs or in web.
Importing directly to table directory is working but wanted to know why hcatalog option couldn't be used

My Inputformat on hive cannot find when calling MapReduce

I'm using CDH cluster.
After writing an Inputformat, I copied it to all HiveServer2 libs and re-start HiveServer2.
When I create a table using my inputformat and write a common query like 'select *', it can run and everything is right!
BUT!
When I write a query which should using MapReduce like 'where' or 'join' or 'count', it will throw a Class Not Found error:
Error: java.io.IOException: cannot find class
com.my.parquet.MyParquetInputFormat at
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:714)
at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:169)
at
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:415) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Wish you helps!

Create Hive External Table on S3 throws "org.apache.hadoop.fs.s3a.S3AFileSystem not found" Exception

I'm using beeline on my local machine to run below DDL, and throws the exception.
the DDL is
CREATE TABLE `report_landing_pages`(
`google_account_id` string COMMENT 'from deserializer',
`ga_view_id` string COMMENT 'from deserializer',
`path` string COMMENT 'from deserializer',
`users` string COMMENT 'from deserializer',
`page_views` string COMMENT 'from deserializer',
`event_value` string COMMENT 'from deserializer',
`report_date` string COMMENT 'from deserializer')
PARTITIONED BY (`dt` date)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 's3a://bucket_name/table'
the exception is
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found) at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257) at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:862) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:867) at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255) ... 11 more Caused by: MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:42070) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:42038) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:41964) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1199) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1185) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2399) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:93) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:752) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:740) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173) at com.sun.proxy.$Proxy34.createTable(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2330) at com.sun.proxy.$Proxy34.createTable(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:852) ... 22 more
And my local HDFS works fine with "hdfs dfs -mkdir s3a://bucket/table"
And the wierd thing is, if I created the table not on S3 first, and then manually update the location of the table to s3 in metastore later, the select statement like
select COUNT(*) from report_landing_pages group by google_account_id
works fine.
How to fix exception in DDL?
BTW, I'm running with Hive 2.3.2, Hadoop 2.7.5 under MacOS X EI Caption.
Problem solved.
After place the S3 jars, the metastore service should also be restarted, besides hiveserver2.
Did you place all the required jars s3 etc to the classpath?
(org.apache.hadoop.fs.s3a.S3AFileSystem)- this is a Hadoop class, found in the Hadoop-aws jar. An exception reporting one of these classes is missing means that this jar is not in the class path.
I tried creating the same table in my environment and it worked.
Check your : fs.s3a.access.key
"fs.s3a.secret.key","fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem"
properties in the configuration file.
Yes, that's correct. After placing the jar, the hive metastore service needs to be restarted
To restart the hive metastore, below steps needs to be followed:
1. ps -ef | grep 'hive'
Use above command to identify the Process ID (PID) that hive is using. By default, the hive used 9083 port number, so for a running hive metastore service, you can also check the PID using lsof -i:9083 command.
2. kill <process number>
This kills the existing hive metastore service.
3. hive --service metastore
This command starts the metastore service again.
OR
If you are using hive-server 2, use the following command:
$ sudo /etc/init.d/hive-metastore start
$ sudo /etc/init.d/hive-metastore stop

Unable to load Hive-JDBC driver when accessed through MapReduce program on Amazon's Elastic MapReduce

I have written a MapReduce program in which I am storing some part of output data into Hive table.
I have used Hive-JDBC driver to access Hive table via MapReduce code.
This program has compiled successfully on local machine.
After this, I created a JAR file and uploaded it on S3. Then I created an elasticmapreduce cluster and started it.
However, it is resulting into below mentioned errors:
java.lang.Throwable: Child Error at
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused
by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
attempt_201407161054_0001_m_000001_0: java.lang.ClassNotFoundException:
org.apache.hadoop.hive.jdbc.HiveDriver
attempt_201407161054_0001_m_000001_0: at
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
attempt_201407161054_0001_m_000001_0: at
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
attempt_201407161054_0001_m_000001_0: at
java.security.AccessController.doPrivileged(Native Method)
attempt_201407161054_0001_m_000001_0: at
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
attempt_201407161054_0001_m_000001_0: at
java.lang.ClassLoader.loadClass(ClassLoader.java:424)
attempt_201407161054_0001_m_000001_0: at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
attempt_201407161054_0001_m_000001_0: at
java.lang.ClassLoader.loadClass(ClassLoader.java:357)
attempt_201407161054_0001_m_000001_0: at
java.lang.Class.forName0(Native Method)
attempt_201407161054_0001_m_000001_0: at
java.lang.Class.forName(Class.java:190)
attempt_201407161054_0001_m_000001_0: at
HubAndAuthority.InputHubMapper.configure(InputHubMapper.java:38)
attempt_201407161054_0001_m_000001_0: at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
attempt_201407161054_0001_m_000001_0: at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
attempt_201407161054_0001_m_000001_0: at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
attempt_201407161054_0001_m_000001_0: at
java.lang.reflect.Method.invoke(Method.java:606)
It appears to be an issue of missing Hive-JDBC driver and it should get resolved by adding Hive-JDBC driver in classpath. However, I am not aware of the exact step to do this on Amazon's EMR.
Could you please let me know what is missing from my end and how to resolve it?
Thanks and Regards,
Prafulla
I'm not sure enough, but you should try this:
"Note
If you want your custom classpath to override the original class path, you should set the environment variable, HADOOP_USER_CLASSPATH_FIRST to true so that the HADOOP_CLASSPATH value specified in hadoop-user-env.sh is first."
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config.html
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html
Regards,
revet

ERROR in ImportTsv-HBASE BULK LOAD

I have hbase 0.94.0. I tried doing bulk import using the importtsv tool.
Here is the command i gave
./hadoop jar /home/ericsson/Desktop/ProjectFiles/hbase-0.94.0/hbase-0.94.0.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,a,b,c,d,e,f,g '-Dimporttsv.separator=,' Test1 /home/ericsson/Desktop/ProjectFiles/inputFiles1/CharginUsage-m-00000
Test1-My table that already exists in Hbase.
/home/ericsson/Desktop/ProjectFiles/inputFiles1/CharginUsage-m-00000- My directory where i have the CSV file.
I got the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/Multimap
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Multimap
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 6 more
importtsv task needs Google's Guava library in order run. This library is present under $HBASE_HOME/lib/guava-.jar
It is matter of telling hadoop to fetch this guava jar during execution. Simply you could copy the jar from hbase lib to hadoop lib. A more decent solution is to add this jar path to hadoop classpath or execute the hadoop task with the below command.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/guava-<version>.jar
OR
export HADOOP_CLASSPATH=`hbase classpath ` /hadoop jar /home/ericsson/Desktop/ProjectFiles/hbase-0.94.0/hbase-0.94.0.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,a,b,c,d,e,f,g '-Dimporttsv.separator=,' Test1 /home/ericsson/Desktop/ProjectFiles/inputFiles1/CharginUsage-m-00000*

Resources