Getting permission denied error when executing Hive query - hadoop

I'm getting the following error when executing a select count(*) from tablename query when connected to beeline.
ERROR : Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:201)
I can execute show tables; successfully but get this error any time I run a query. I am logged in as the hadoop user, which has access to both Hadoop and Hive.
I've granted full permissions on the folder where the tables reside:
drwxr-xr-x - hadoop supergroup 0 2015-06-03 15:44 /data1
drwxrwxrwx - hadoop hadoop 0 2015-06-05 15:23 /tmp
drwxrwxrwx - hadoop supergroup 0 2015-06-05 15:24 /user
The table is in the /user directory.
Environment details:
OS: CentOS
Hadoop: HW 2.6.0
Hive: 1.2
Any help would be greatly appreciated.

Is this a Hive managed table? In that case, could you post what you get when you run:
hadoop fs -ls /user
hadoop fs -ls /user/hive
hadoop fs -ls /user/hive/warehouse
The error suggests that you are accessing the table as a user who is not the owner, and it seems that user does not have read and execute access.
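Since the stack trace points at FSPermissionChecker.checkOwner, the failing operation needs directory ownership, not just rwx bits. As a rough sketch (the warehouse path and table name below are assumptions, so substitute the table's real location, and run the chown as the HDFS superuser, i.e. the account the NameNode runs as):
# see who owns the table's directory
hadoop fs -ls /user/hive/warehouse
# if it belongs to someone else, hand it over to the hadoop user
hadoop fs -chown -R hadoop:supergroup /user/hive/warehouse/tablename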

Related

FAILED: HiveAuthzPluginException Error getting permissions for hdfs

I am trying to insert data into a Hive table from a file in an HDFS directory using the query:
$ jdbc:hive2://localhost:10000> LOAD DATA INPATH '/user/xyz/stdfiles/testtbl.txt' OVERWRITE INTO TABLE testdb.testtbl;
But the query failed with:
Error: Error while compiling statement: FAILED:
HiveAuthzPluginException Error getting permissions for
hdfs://localhost:9000/user/xyz/stdfiles/testtbl.txt: null
(state=42000,code=40000)
I have tried granting permissions with the following commands, which complete without error:
$ hdfs dfs -chown -R stdfiles /user/xyz/stdfiles
$ hdfs dfs -chmod -R 777 /user/xyz/stdfiles/testtbl.txt
Checked:
$ hdfs dfs -ls /user/xyz/stdfiles
19/05/22 09:15:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rwxrwxrwx 1 stdfiles supergroup 6 2019-05-22 08:45 /user/xyz/stdfiles/testtbl.txt
Successfully inserting the data is the desired output.
Adding the following properties to the Hadoop configuration file core-site.xml worked for me :)
<property>
  <name>hadoop.proxyuser.niazullah.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.niazullah.groups</name>
  <value>*</value>
</property>
Also check the user's access in HDFS:
$ hdfs dfs -ls /user
Output:
drwxr-xr-x - main supergroup 0 2019-05-22 13:22 /user/test
Here "main" is the user; change it to the Hive user.
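Note that niazullah in the property names is presumably the account the impersonating service (HiveServer2) runs as; substitute your own user. After editing core-site.xml, the new proxyuser settings have to reach the running daemons, either by restarting them or, as a sketch assuming you have admin rights on the cluster, by refreshing them:
# push the new proxyuser settings to the NameNode and ResourceManager without a full restart
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration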

Cannot access Hive internal tables-AccessControlException

My user ID and my team cannot access any of the internal tables in the Hive DB. When we fire up queries in HUE as well as the CLI, we get an
'AccessControlException'; please find the log below.
INFO : set mapreduce.job.reduces=<number>
INFO : Cleaning up the staging area maprfs:/var/mapr/cluster/yarn/rm/staging/keswara/.staging/job_1494760161412_0139
ERROR : Job Submission failed with exception org.apache.hadoop.security.AccessControlException
(User keswara(user id 1802830393) does not have access to
maprfs:///user/hive/warehouse/bistore_sit.db/wt_consumer/d_partition_number=0/000114_0)'
org.apache.hadoop.security.AccessControlException: User keswara(user id 1802830393) does not have access to maprfs:///user/hive/warehouse/bistore_sit.db/wt_consumer/d_partition_number=0/000114_0
at com.mapr.fs.MapRFileSystem.getMapRFileStatus(MapRFileSystem.java:1320)
at com.mapr.fs.MapRFileSystem.getFileStatus(MapRFileSystem.java:942)
at org.apache.hadoop.fs.FileSystem.getFileBlockLocations(FileSystem.java:741)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1762)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1747)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:307)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.hadoop.hive.shims.Hadoop23Shims$1.listStatus(Hadoop23Shims.java:148)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:218)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:75)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:310)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:472)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:573)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:331)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:323)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:199)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:421)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:421)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
None of the users can access the internal tables right now. I am part of the mapr group and a sudo user as well!
The table and partition ownership belongs to the mapr group, and the permissions look good!
[mapr@SAN2LPMR03 mapr]$ hadoop fs -ls /user/hive/warehouse/bistore.db/wt_consumer
Found 1 items
drwxrwxrwt - mapr mapr 1 2017-03-24 11:51 /user/hive/warehouse/bistore.db/wt_consumer/d_partition_number=__HIVE_DEFAULT_PARTITION__
Please help me to sort this out! Really appreciate your help!
If the tables are in Parquet format, then the files for those tables have write access only for the user who created the table.
To fix this, you can change the permissions on those files using a command like the one below:
hdfs dfs -chmod 777 /user/hive/warehouse/bistore_sit.db/wt_consumer/d_partition_number=0/000114_0/*
This command grants all users full permissions on those particular files.
I have noticed the following while testing some tables in both CSV and Parquet formats.
When you create a Hive table in CSV format, the table gets 777 permissions, so every user in the group you are part of has access.
But when the Hive table is created in Parquet format, only the user who created the table has write access. I think it has something to do with the Parquet format.
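A quick way to reproduce this observation is to create one small table per format and compare what the warehouse directory looks like afterwards; a sketch with hypothetical table names (t_csv, t_parquet) and the default warehouse path:
# create one throwaway table per storage format (names are made up for the test)
hive -e "create table t_csv (a int) row format delimited fields terminated by ',' stored as textfile;
create table t_parquet (a int) stored as parquet;"
# compare the permissions Hive assigned to each table directory
hadoop fs -ls /user/hive/warehouse | grep -E 't_csv|t_parquet'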
[root@psnode44 hive-2.1]# hadoop fs -ls /user/hive/warehouse/
Found 1 items
drwxrw-rw- - mapr mapr 2 2017-06-28 12:49 /user/hive/warehouse/test
0: jdbc:hive2://10.20.30.44:10000/> select *from test;
Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: User basa(user id 5005) does not have access to maprfs:/user/hive/warehouse/test (state=,code=0)
[root@psnode44 hive-2.1]# hadoop fs -ls /user/hive/warehouse/
Found 1 items
drwxrwxrwx - mapr mapr 2 2017-06-28 12:49 /user/hive/warehouse/test
Even though I changed the permissions on the warehouse directory, I am still getting the same error.
[root@psnode44 hive-2.1]# hadoop fs -chmod -R 777 /user/hive/warehouse/
[root@psnode44 hive-2.1]# hadoop fs -ls /user/hive/warehouse/
Found 1 items
drwxrwxrwx - mapr mapr 2 2017-06-28 12:49 /user/hive/warehouse/test
0: jdbc:hive2://10.20.30.44:10000/> select *from test;
Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: User basa(user id 5005) does not have access to maprfs:/user/hive/warehouse/test (state=,code=0)

How to use hadoop from spark thrift server?

Please consider the following setup.
hadoop version 2.6.4
spark version 2.1.0
OS CentOS Linux release 7.2.1511 (Core)
All software is installed on a single machine as a single node cluster, spark is installed in standalone mode.
I am trying to use Spark Thrift Server.
To start the spark thrift server I run the shell script
start-thriftserver.sh
After running the thrift server, I can run beeline command line tool and issue the following commands:
The commands run successfully:
!connect jdbc:hive2://localhost:10000 user_name '' org.apache.hive.jdbc.HiveDriver
create database testdb;
use testdb;
create table names_tab(a int, name string) row format delimited fields terminated by ' ';
My first question is: where on Hadoop is the underlying file/folder for this table/database created?
The problem is that even if Hadoop is stopped using stop-all.sh, the create table/database command still succeeds,
which makes me think that the table is not created on Hadoop at all.
My second question is: how do I tell Spark where Hadoop is installed,
and how do I ask Spark to use Hadoop as the underlying data store for all queries run from beeline?
Am I supposed to install spark in some other mode?
Thanks in advance.
My objective was to get the beeline command line utility to work through the Spark Thrift Server using Hadoop as the underlying data store, and I got it to work. My setup was like this:
Hadoop <--> Spark <--> SparkThriftServer <--> beeline
I wanted to configure Spark in such a manner that it uses Hadoop for all queries run from the beeline command line utility.
The trick was to specify the following property in spark-defaults.conf:
spark.sql.warehouse.dir hdfs://localhost:9000/user/hive/warehouse
By default, Spark uses an embedded Derby database for the metadata and a local directory for the data itself (called the warehouse in Spark).
In order to have Spark use HDFS as the warehouse, I had to add this property.
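For completeness, this is roughly what the relevant configuration amounts to. The Hadoop path below is only an example; HADOOP_CONF_DIR (set in conf/spark-env.sh or the environment) is the standard way to point Spark at an existing Hadoop installation, which also answers the second question above:
# conf/spark-env.sh -- tell Spark where the Hadoop client configuration lives (example path)
export HADOOP_CONF_DIR=/opt/hadoop-2.6.4/etc/hadoop
# conf/spark-defaults.conf -- keep table data on HDFS instead of the local spark-warehouse directory
spark.sql.warehouse.dir hdfs://localhost:9000/user/hive/warehouse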
Here is a sample output
./beeline
Beeline version 1.0.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000 abbasbutt '' org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://localhost:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/abbasbutt/Projects/hadoop_fdw/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/abbasbutt/Projects/hadoop_fdw/apache-hive-1.0.1-bin/lib/hive-jdbc-1.0.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Spark SQL (version 2.1.0)
Driver: Hive JDBC (version 1.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000> create database my_test_db;
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.379 seconds)
0: jdbc:hive2://localhost:10000> use my_test_db;
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.03 seconds)
0: jdbc:hive2://localhost:10000> create table my_names_tab(a int, b string) row format delimited fields terminated by ' ';
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.11 seconds)
0: jdbc:hive2://localhost:10000>
Here are the corresponding files in hadoop
[abbasbutt@localhost test]$ hadoop fs -ls /user/hive/warehouse/
17/01/19 10:48:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:45 /user/hive/warehouse/fdw_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:23 /user/hive/warehouse/my_spark_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-19 10:47 /user/hive/warehouse/my_test_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:45 /user/hive/warehouse/testdb.db
[abbasbutt@localhost test]$ hadoop fs -ls /user/hive/warehouse/my_test_db.db/
17/01/19 10:50:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxrwxr-x - abbasbutt supergroup 0 2017-01-19 10:50 /user/hive/warehouse/my_test_db.db/my_names_tab
[abbasbutt@localhost test]$
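To double-check from beeline itself that the table is really backed by HDFS, describe formatted (or describe extended) prints the table's storage location; for the table above it should report a Location under hdfs://localhost:9000/user/hive/warehouse/my_test_db.db/:
0: jdbc:hive2://localhost:10000> describe formatted my_names_tab;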

Hive 0.14.0 not starting

I have Hadoop 1.2.1 and I have installed Hive 0.14.0 on a single node.
$ hive
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-0.14.0.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxrwxr-x
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:444)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxrwxr-x
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:529)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:478)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:430)
... 7 more
The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxrwxr-x.
I ran hadoop fs -chmod g+w /tmp/hive but it is not working.
Update the permissions on your /tmp/hive HDFS directory using the following command:
hadoop fs -chmod 777 /tmp/hive
If that does not help, remove /tmp/hive on both the local filesystem and HDFS:
hadoop fs -rm -r /tmp/hive;
rm -rf /tmp/hive
Only temporary files are kept in this location, so there is no problem deleting it; it will be recreated with the proper permissions when required.
I did a little bit of experimentation with this and thought it might be useful to someone.
When hive 0.14.0 is started without first creating /tmp/hive in HDFS, that directory is created with mode 711.
drwx--x--x - hadoop supergroup 0 2014-12-08 18:47 /tmp/hive
If instead one creates the directory via hadoop dfs -mkdir /tmp/hive it defaults to mode 755.
drwxr-xr-x - hadoop supergroup 0 2014-12-09 11:13 /tmp/hive
The minimum permissions that allow Hive to start without errors are 733.
hadoop dfs -chmod 733 /tmp/hive
This results in the following, and Hive starts successfully.
drwx-wx-wx - hadoop supergroup 0 2014-12-09 11:13 /tmp/hive
This leads me to believe that hive 0.14.0 is doing the wrong thing when it creates that directory.
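Putting those findings together, a minimal workaround is to create the scratch dir yourself before starting Hive, with at least the 733 bits found above (777 also works if you are not concerned about other users poking around the scratch space):
hadoop fs -mkdir /tmp/hive
hadoop fs -chmod 733 /tmp/hive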
Check the value of the property below in hive-site.xml, then change the permissions on the folder it points to:
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/mydir</value>
  <description>Local scratch space for Hive jobs</description>
</property>
hadoop fs -rmr /tmp/mydir;
hadoop fs -mkdir /tmp/mydir;
hadoop fs -chmod 777 /tmp/mydir;
hadoop fs -chmod -R 777 /tmp/mydir;

Running Apache Pig tutorial problems

I am having some difficulties running the "standard" Pig tutorial script, script1-hadoop.pig.
However, because of the cluster setup (users), I had to modify the example a bit. The standard tutorial expects all files in / on HDFS, which I cannot use in my case, so I created a /pig dir for that purpose:
drwxrwxrwx - hdfs hdfs 0 2014-03-31 11:15 /pig
with the uploaded content
-rw-r--r-- 3 jakub hdfs 10408717 2014-03-31 10:41 /pig/excite.log.bz2
I also modified the Pig script script1-hadoop.pig to reflect those changes, as follows (mainly just the load and store commands):
raw = LOAD '/pig/excite.log.bz2' USING PigStorage('\t') AS (user, time, query);
...
STORE ordered_uniq_frequency INTO '/pig/script1-hadoop-results' USING PigStorage();
I run the pig script:
[jakub@hadooptools pigtmp]$ pig script1-hadoop.pig
but with no luck and getting error:
2014-03-31 10:15:11,896 [main] ERROR org.apache.pig.tools.grunt.Grunt - You don't have permission to perform the operation. Error from the server: Permission denied: user=jakub, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:214)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:158)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5202)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5184)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5158)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3405)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3375)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3349)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:724)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:502)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59598)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
I am not quite sure why the Pig script is trying to write into / on HDFS. I know that Pig can store some intermediate results on HDFS, so I modified the pig.temp.dir property (/etc/pig/conf/pig.properties) and created the location /pig/tmp on HDFS:
drwxrwxrwx - jakub hdfs 0 2014-03-31 11:15 /pig/tmp
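For reference, the pig.temp.dir change in /etc/pig/conf/pig.properties is a single line along these lines:
# send Pig's temporary/intermediate data to the writable /pig/tmp directory
pig.temp.dir=/pig/tmp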
Any idea what might be wrong? Pig in local mode is ok.
Sorted.
The user running the Pig script has to have permission to write to the temp directory that was created, and /user/pig_user_running has to be present on the cluster as well, with permissions that allow that user to write there.
The super-user on HDFS is the user under which the NameNode process runs, which is typically hdfs.
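In concrete terms, creating the missing home directory would look something like this, run as the HDFS superuser (the user and group are taken from the listings above; adjust to the account that actually runs the Pig jobs):
# create the home directory for the user running Pig and hand it over to them
sudo -u hdfs hadoop fs -mkdir /user/jakub
sudo -u hdfs hadoop fs -chown jakub:hdfs /user/jakub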
