Configure Sentry to show/hide different databases for different users - hadoop

I have a cluster running CDH 5.7.0 with the following setup:
Hadoop with Kerberos
Hive with LDAP authentication
Hive with Sentry authorization (rules stored in Derby via JDBC)
My goal is to restrict which databases each user can see.
E.g.:
User-A should only see database DB-A when executing show databases
User-B should only see database DB-B when executing show databases
I followed the article https://blog.cloudera.com/blog/2013/12/how-to-get-started-with-sentry-in-hive/ to make that happen, but without success.
What I achieved was that
User-A can only select tables from DB-A and not from DB-B.
User-B can only select tables from DB-B and not from DB-A.
But both can still see DB-A and DB-B when executing show databases, and I want to avoid this.
Any hints on what the rules or the setup should look like to get this working?
Thanks
Marko

According to your description and from what I've learned from existing setups, with Sentry v1.6+ you need to add the following property to your hive-site.xml:
<property>
<name>hive.metastore.filter.hook</name>
<value>org.apache.sentry.binding.metastore.SentryMetaStoreFilterHook</value>
</property>
Although you are on CDH 5.7, the MapR 5 documentation provides some context, as does Sentry Service Interactions.
After restarting the Hive service, you should see the result you are expecting.
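If you maintain hive-site.xml by hand, the change above can be scripted. The following is only a minimal sketch in Python using the standard library; the helper names set_property and get_property are made up for illustration and are not part of Hive or Sentry. It adds the property if it is missing and overwrites the value if it is already there.

```python
import xml.etree.ElementTree as ET

# Property name and value taken from the answer above.
FILTER_HOOK_NAME = "hive.metastore.filter.hook"
FILTER_HOOK_VALUE = "org.apache.sentry.binding.metastore.SentryMetaStoreFilterHook"


def set_property(hive_site_xml, name, value):
    """Return hive-site.xml text with the property `name` set to `value`.

    Hypothetical helper: works on the XML text only, it does not touch
    any file and does not restart any service.
    """
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already present: overwrite its value.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not found: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def get_property(hive_site_xml, name):
    """Return the value of property `name`, or None if it is not set."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None
```

Note that ElementTree drops XML comments and the XML declaration when it re-serializes, so keep a backup of the original file. On a managed cluster the property is normally set through the management tool rather than by editing the file directly.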

Related

Ranger is showing Hadoop-ACL used to grant access instead of Ranger-ACL

I have two hiveserver2 instances running. One uses Binary Transport (for HUE), the other uses HTTP transport (for ODBC connections).
I am trying to grant access for one user (ra01 in the screenshot) to only a specific table in Hive.
The user account is intended to be used for ODBC connection from PowerBI.
I set the policy as seen in the screenshot. The policy seems to work when I try it in HUE. But when I use the same user via ODBC, it seems to grant all permissions, and it uses "Hadoop-ACL" instead of "Ranger-ACL", as seen in the attached screenshot.
What am I missing?
It appears the issue is with the folder permissions in HDFS.
I followed the instructions in the link below; it fixed the issue for me:
Best practices in HDFS authorization with Apache Ranger

Apache Drill - Not listing tables in Hive DB

I have created the necessary storage plugins, and the relevant Hive databases show up when I issue the show databases command.
However, after switching to one of the Hive databases with the use command, I cannot select from any of the tables in that database. Looking further, the show tables command lists no tables for that database in Apache Drill, whereas they appear fine in Hive.
Am I missing anything in terms of granting permissions via Hive to a user? How exactly does Apache Drill connect to Hive to run the relevant jobs?
Appreciate your responses.
show tables; does not list Hive tables as of now. It is better to create views on top of the Hive tables. These views will show up in the output of show tables;.
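As a rough illustration of the view workaround, here is a small Python sketch that builds the CREATE VIEW statement for one Hive table. The storage plugin name hive, the writable workspace dfs.tmp, the _v suffix and the function name drill_view_statement are all assumptions for the example; adjust them to your own Drill configuration.

```python
import re

# Plain identifiers only, so nothing unexpected ends up in the statement.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def drill_view_statement(hive_db, hive_table, workspace="dfs.tmp", plugin="hive"):
    """Build a Drill CREATE VIEW statement on top of one Hive table.

    Hypothetical helper: `plugin` is the assumed name of the Hive storage
    plugin and `workspace` an assumed writable Drill workspace.
    """
    for part in (hive_db, hive_table):
        if not _IDENT.match(part):
            raise ValueError("not a plain identifier: %r" % (part,))
    view_name = "%s_%s_v" % (hive_db, hive_table)
    return (
        "CREATE OR REPLACE VIEW %s.`%s` AS "
        "SELECT * FROM %s.`%s`.`%s`"
        % (workspace, view_name, plugin, hive_db, hive_table)
    )
```

For example, drill_view_statement("sales", "orders") produces a statement that creates the view sales_orders_v in dfs.tmp; run it in Drill and the view should then appear when you list the tables of that workspace.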

How to check who created database in impala

I have one Hadoop cluster (Cloudera distribution) that multiple users have access to. Different users are creating databases on it. How do I verify which user created which database? Can anyone suggest an approach?
Use the query below:
Describe formatted databaseName.tableName;
It shows the table's owner and other details such as table type, size, etc.
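If you want to pull the owner out of that output in a script, something like the sketch below may help. It assumes the result rows come back as (name, value, comment) tuples with a row whose first cell is "Owner:", which is how the formatted output usually looks; check this against your own output. The function name find_owner is made up, and the statement reports the owner of the table, not of the database itself.

```python
def find_owner(rows):
    """Return the owner from DESCRIBE FORMATTED result rows, or None.

    Assumed row layout: (name, value, comment), padded with spaces,
    with cells possibly being None.
    """
    for row in rows:
        cells = [(cell or "").strip() for cell in row]
        if cells and cells[0].rstrip(":").lower() == "owner":
            if len(cells) > 1 and cells[1]:
                return cells[1]
            return None
    return None
```

The rows can come from any client that returns the query result as a list of tuples; the parsing itself does not depend on a particular driver.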

Authentication and Security in Hadoop

We are building a system which queries Hive tables. Our service layer will construct the Hive query based on the user's selection in the UI. We have some security-related questions:
• Is it OK to pass a dynamic Hive query constructed at the service layer to a UDF/HQL in Hive?
• Can SQL-injection-style attacks occur in Hive? We are on Hive 0.14, which supports delete and update statements.
• How can we manage role authorization so that a role can only read a table, not write to or delete from it? Is there a way to manage permissions for Hive tables, or is that managed by HCatalog?
Yes, you can pass a dynamic query to Hive using PowerShell APIs. Check out https://hadoopsdk.codeplex.com/
Hive 0.14 does support Insert, Update and Delete. Check out https://issues.apache.org/jira/browse/HIVE-5317
Role authorization is currently not supported by HDInsight (as of 6/2015) but is something we are actively investigating and hope to bring to market soon.
Role-based authorization and auditing with record-, field- and cell-level control and dynamic masking is available on Hadoop with the BlueTalon policy engine.
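On the injection question: since the query text is assembled in the service layer from UI input, one common precaution is to accept only table and column names from a fixed allowlist and never paste raw user text into the statement. The sketch below shows the idea in Python; the table sales, its columns and the function name build_select are invented for the example and are not from the original post.

```python
# Hypothetical allowlist: table name -> columns the UI may select.
ALLOWED = {
    "sales": {"region", "amount", "order_date"},
}


def build_select(table, columns, limit=100):
    """Build a SELECT statement from allowlisted identifiers only."""
    if table not in ALLOWED:
        raise ValueError("table not allowed: %r" % (table,))
    if not columns:
        raise ValueError("at least one column is required")
    unknown = [c for c in columns if c not in ALLOWED[table]]
    if unknown:
        raise ValueError("columns not allowed: %r" % (unknown,))
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 10000:
        raise ValueError("limit must be an integer between 1 and 10000")
    column_list = ", ".join("`%s`" % c for c in columns)
    return "SELECT %s FROM `%s` LIMIT %d" % (column_list, table, limit)
```

Anything that is not in the allowlist is rejected before a statement is built, so input such as "sales; DROP TABLE sales" never reaches Hive. Filter values entered by the user should go through the parameter binding of whatever driver you use rather than through string concatenation.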

How to use hive with multiple users

I have several users sharing the same Hive installation.
Now I want each user to have private metadata in Hive.
Example:
user a calls show tables: a1, a2, a3 ...
user b calls show tables: b1, b2, b3 ...
Of course, when a user runs a query, they must not be able to access the tables of other users.
Thanks.
In order to make setup easy for new users, Hive's Metastore is
configured to store metadata locally in an embedded Apache Derby
database. Unfortunately, this configuration only allows a single user
to access the Metastore at a time. Cloudera strongly encourages users
to use a MySQL database instead. This section describes how to
configure Hive to use a remote MySQL database, which allows Hive to
support multiple users. See the Hive Metastore documentation for
additional information.
For more details see the part with heading 'Configuring the Hive Metastore' here.
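To give an idea of what the remote MySQL configuration boils down to, here is a small Python sketch that renders the usual javax.jdo.option.* connection properties as hive-site.xml entries. The host, database name, user and password are placeholders you have to supply yourself, and the function names are made up for the example; treat it as a starting point and follow the linked documentation for the full procedure.

```python
from xml.sax.saxutils import escape


def metastore_properties(host, database, user, password, port=3306):
    """Return the JDBC connection properties for a MySQL-backed metastore.

    All argument values are placeholders chosen by the caller.
    """
    return {
        "javax.jdo.option.ConnectionURL": "jdbc:mysql://%s:%d/%s" % (host, port, database),
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }


def render_properties(props):
    """Render a dict of properties as <property> blocks for hive-site.xml."""
    lines = []
    for name, value in props.items():
        lines.append("<property>")
        lines.append("<name>%s</name>" % escape(name))
        lines.append("<value>%s</value>" % escape(value))
        lines.append("</property>")
    return "\n".join(lines)
```

The rendered blocks go inside the configuration element of hive-site.xml, and the MySQL JDBC driver has to be available to Hive for the connection to work.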
Once the external metastore has been created, Hive authorization can be used to grant or restrict privileges.
This is the disclaimer from Hive:
Hive authorization is not completely secure. In its current form, the authorization scheme is intended primarily to prevent good users from accidentally doing bad things, but makes no promises about preventing malicious users from doing malicious things.

Resources