Apache Sentry: SemanticException No valid privileges Required privileges for this query - hadoop

I have unsecured cluster (CDH 5.4) and as I want to provide an access to data to more users, I would like to turn on the Sentry, so far without Kerberos (which comes after sucessful launch of Sentry).
As some other people might need Impala at the moment, I decided to set it up in Hive in first stage.
Steps I have taken:
1) I have set up 2 users: hive and tuser
tuser - group test
hive - group hive, zookeeper
group test
indexer.access, about.access, beeswax.access, filebrowser.access, hbase.write, hbase.access, help.access, impala.access, jobbrowser.access,
jobsub.access, metastore.write, metastore.access, oozie.dashboard_jobs_access, oozie.access, pig.access, proxy.access, rdbms.access,
search.access, security.impersonate, security.access, spark.access, sqoop.access, useradmin.access_view:useradmin:edit_user, useradmin.access, zookeeper.access
group hive
beeswax.access
group hive has role admin (the first one with an unlocked lock):
SERVER
server=server1 action=ALL
SERVER
server=server1 action=ALL
group test has role neco
SERVER
server=server1 action=ALL
URI
server=server1 hdfs://...:8020/user/hive/warehouse action=ALL
DATABASE
server=server1 db=default action=ALL
Moreover, the user hive is in both sets sentry.service.admin.group and sentry.service.allow.connect.
2) I have turned on the sentry
- in Hive checked the Sentry Service from "none" to "Sentry"
- in Hive Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml inserted <property> <name>sentry.hive.testing.mode</name><value>true</value></property>
+ restarted Sentry
Result:
User hive can access anything in Hive. That's what I was expecting.
User tuser can't access anything in Hive: Error while compiling statement: FAILED: SemanticException No valid privileges Required privileges for this query: Server=server1->Db=*->Table=+->action=insert;Server=server1->Db=*->Table=+->action=select;
What am I missing?

Finally I was adviced what was wrong: The Hue groups must be the same as the groups on the Namenode's linux (as the HDFS org.apache.hadoop.security.ShellBasedUnixGroupsMapping is checked). In the case of Impala, all of nodes with Impala Daemons have to have same groups.
However, I am going to overtake the groups from LDAP (option org.apache.hadoop.security.LdapGroupsMapping).

Related

Hive permission denied for user anonymous using beeline shell

I created a 3 node Hadoop cluster with 1 namenode and 2 datanode.
I can perform a read/write query from Hive shell, but not beeline.
I found many suggestions and answers related to this issue.
In every suggestion it was mentioned to give the permission for the userX for each individual table.
But I don't know how to set the permission for an anonymous user once and for all.
Why I am getting the user anonymous while accessing the data from beeline or from a Java program?
I am able to read the data from the both beeline shell and using Java JDBC connection.
But I can't insert the data in the table.
This is my jdbc connection : jdbc:hive2://hadoop01:10000.
Below is the error i am getting while on insert request:
Permission denied: user=anonymous, access=WRITE, inode="/user/hive/warehouse/test_log/.hive-staging_hive_2017-10-07_06-54-36_347_6034469031019245441-1":hadoop:supergroup:drwxr-xr-x
Beeline syntax is
beeline -n username -u "url"
I assume you are missing the username. Also, no one but the hadoop user has WRITE access to that table anyway
If you don't have full control over the table permissions, you can try relocating the staging directory with the setting hive.exec.stagingdir
If no database is specified in the connection URL to connect, like
jdbc:hive2://hadoop01:10000/default
then beeline connects to the database DEFAULT , and while inserting the data into the table - first the data is loaded to a temporary table in default database and then loaded to the actual table.
So, you need to give the user access to the DEFAULT database also, or you can connect to the databases where you have access to.
jdbc:hive2://hadoop01:10000/your_db

Hive Transactions + Remote Metastore Error

I'm running Hive 2.1.1 on EMR 5.5.0 with a remote mysql metastore DB. I need to enable transactions on hive, but when I follow the configuration here and run any query, I get the following error
FAILED: Error in acquiring locks: Error communicating with the metastore
Settings on the metastore:
hive.compactor.worker.threads = 0
hive.compactor.initiator.on = true
Settings in the hive client:
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
This only happens when I set hive.txn.manager, so my hive metastore is definitely online.
I've tried some of the old suggestions of turning hive test features on which didn't work, but I don't think this is a test feature anymore. I can't turn off concurrency as a similar post in SO suggests because I need concurrency. It seems like the problem is that either DbTxnManager isn't getting the remote metastore connection info properly from hive config or the mysqldb is missing some tables required by DbTxnManager. I have datanucleus.autoCreateTables=true.
It looks like hive wasn't properly creating the tables needed for the transaction manager. I'm not sure where it was getting its schema, but it was definitely wrong.
So we just ran the hive-txn-schema query to setup the schema manually. We'll do this at the start of any of our clusters from now on.
https://github.com/apache/hive/blob/master/metastore/scripts/upgrade/mysql/hive-txn-schema-2.1.0.mysql.sql
The error from
FAILED: Error in acquiring locks: Error communicating with the metastore
sometimes because of it without any data, you need to initialization some data in your tables. for example below.
create table t1(id int, name string)
clustered by (id) into 8 buckets
stored as orc TBLPROPERTIES ('transactional'='true');

Configure Sentry to show/hide different databases for different users

I have a cluster running with cdh-5.7.0 and configured the following setup
hadoop with kerberos
hive with LDAP authentication
hive with sentry authorization (rules stored in JDBC derby)
My goal is to restrict users to see which databases exist in my system.
E.g.:
User-A should only see database DB-A when execute show databases
User-B should only see database DB-B when execute show databases
I followed the article https://blog.cloudera.com/blog/2013/12/how-to-get-started-with-sentry-in-hive/ to make that happen. But without success.
What I achieved was that
User-A can only select tables from DB-A and not from DB-B.
User-B can only select tables from DB-B and not from DB-A.
But both can still see DB-A and DB-B when executing show databases. But i want to avoid this.
Any hints from you how the rules or the setup could looks like to get that running?
Thanks
Marko
According your description and from what I've learned from existing setups, in case of Sentry v1.6+ you need to add the following property to your hive-site.xml:
<property>
<name>hive.metastore.filter.hook</name>
<value>org.apache.sentry.binding.metastore.SentryMetaStoreFilterHook</value>
</property>
Even if you are on CDH 5.7, the MapR 5 documentation is providing some context. As well Sentry Service Interactions.
After re-starting the Hive service you should be able to see the result which you are expecting.

Hive User Impersonation

I need some information on Hive user impersonation. I did some research on it and found that by default HiveServer2 performs a query processing as the user who submitted query but if hive.server2.enable.doAs set it to false then query will be run as user who started hiveserver2 process.
I need to a create jdbc/thirft connection with hiveserver2 with service account (let’s say with user ‘ABC’ is logged in) but would like to run my hive statement with user that I pass , for example with user ‘XYZ’.
Let me know if anyone has done this before.
Is it possible to do this for Hive ?
With Hive impersonation enabled , you can run your queries that you will pass along with connection string .
For Example
jdbc:hive2://localhost:10000/default,username,password
In this case , your job will run with username that you are passing instead of hive user.
hope this helps.

Oozie cannot access metastore database in HUE

I'm on CDH4, in HUE, I have a database in Metastore Manager named db1. I can run Hive queries that create objects in db1 with no problem. I put those same queries in scripts and run them through Oozie and they fail with this message:
FAILED: SemanticException 0:0 Error creating temporary folder on: hdfs://lad1dithd1002.thehartford.com:8020/appl/hive/warehouse/db1.db. Error encountered near token 'TOK_TMP_FILE'
I created db1 in the Metastore Manager as HUE user db1, and as HUE user admin, and as HUE user db1, and nothing works. The db1 user also has a db1 ID on the underlying Linux cluster, if that helps.
I have chmod'd the /appl/hive/warehouse/db1.db to read, write, execute to owner, group, other, and none of that makes a difference.
I'm almost certain it's a rights issue, but what? Oddly, I have this working under another ID where I had hacked some combination of things that seemed to have worked, but I'm not sure how. It was all in HUE, so if possible, I'd like a solution doable in HUE so I can easily hand it off to folks who prefer to work at the GUI level.
Thanks!
Did you also add hive-site.xml into your Files and Job XML fields? Hue has great tutorial about how to run Hive job. Watch it here. Adding of hive-site.xml is described around 4:20.
Exact same error on Hadoop MapR.
Root cause : Main database and temporary(scrat) database were created by different users.
Resolution : Creating both folders with same ID might help with this.

Resources