Authentication and Security in Hadoop - hadoop

We are building system which queries Hive table. Our Service Layer will construct Hive Query based on User Selection on UI, We have some security related questions over here
• Is it Ok to pass Hive Dynamic Query constructed at service layer to a UDF/HQL in Hive ?
• Are there any SQL Injection kind of Scenarios occurs in Hive, We are Hive 0.14, it contains delete and update statements.
• How can we manage Role Authorization to access table only like perform Read instead of Write and Delete. Is there way to manage permission for Hive table. Or will it be managed by HCatalog?

Yes, you can pass a dynamic query to Hive using PowerShell APIs. Check out https://hadoopsdk.codeplex.com/
Hive 14 does support Insert, Update and Delete. Check out https://issues.apache.org/jira/browse/HIVE-5317
Role authorization is currently not supported by HDInsight (as of 6/2015) but is something we are actively investigating and hope to bring to market soon.

Role based authorization and auditing with record, fields and cell level control and dynamic masking is available on hadoop with bluetalon policy engine.

Related

Controlling/filtering user's Hive queries using Hue platform

Is there a way to somehow filter the user's Hive queries using the Hue platform, by configuring settings or something in order to allow a user to only execute read-only queries (like SELECT query)?
The most used way is to rely on Sentry to set the correct SELECT/UPDATE/INSERT permissions and those will automatically apply in Hue as Hue relies on HiveServer2.

Configure Sentry to show/hide different databases for different users

I have a cluster running with cdh-5.7.0 and configured the following setup
hadoop with kerberos
hive with LDAP authentication
hive with sentry authorization (rules stored in JDBC derby)
My goal is to restrict users to see which databases exist in my system.
E.g.:
User-A should only see database DB-A when execute show databases
User-B should only see database DB-B when execute show databases
I followed the article https://blog.cloudera.com/blog/2013/12/how-to-get-started-with-sentry-in-hive/ to make that happen. But without success.
What I achieved was that
User-A can only select tables from DB-A and not from DB-B.
User-B can only select tables from DB-B and not from DB-A.
But both can still see DB-A and DB-B when executing show databases. But i want to avoid this.
Any hints from you how the rules or the setup could looks like to get that running?
Thanks
Marko
According your description and from what I've learned from existing setups, in case of Sentry v1.6+ you need to add the following property to your hive-site.xml:
<property>
<name>hive.metastore.filter.hook</name>
<value>org.apache.sentry.binding.metastore.SentryMetaStoreFilterHook</value>
</property>
Even if you are on CDH 5.7, the MapR 5 documentation is providing some context. As well Sentry Service Interactions.
After re-starting the Hive service you should be able to see the result which you are expecting.

How to check who created database in impala

I have one Hadoop cluster (Cloudera distribution) given access to multiple user. Now from different users we are creating databases. How do i verify which user is creating which database.? Can anyone suggest me.?
Use below query:
Describe formatted databaseName.tableName;
Will show the owner and other details like table type,size etc.

Can a user without admin rights manage object access privileges in hive?

I am working with hive 0.14 mainly using beeline.
I am not an admin but I am looking to create a couple of views that the team can use.
We've got a common hive database where everyone has read+write. If I am creating certain tables/views that I don't want other people to be able to drop or modify, is it possible for me to revoke drop/write access for others?
The access to hive tables depends on HDFS access rights.
Whenever you create a new table tbl in database located in db, a new directory db/tbl will be created.
If you want to restrict write group access to that directory use hadoop fs -chmod, for example:
hadoop fs -chmod 750 db/tbl
If you want to find out where tables are located in a database, you can create a table without specifying a location, and run describe formated tbl.
You can always check what are the access rights of the tables by running hadoop fs -ls db
Regarding views:
Although Storage Based Authorization can provide access control at the level of Databases, Tables and Partitions, it can not control authorization at finer levels such as columns and views because the access control provided by the file system is at the level of directory and files. A prerequisite for fine grained access control is a data server that is able to provide just the columns and rows that a user needs (or has) access to. In the case of file system access, the whole file is served to the user. HiveServer2 satisfies this condition, as it has an API that understands rows and columns (through the use of SQL), and is able to serve just the columns and rows that your SQL query asked for.
SQL Standards Based Authorization (introduced in Hive 0.13.0, HIVE-5837) can be used to enable fine grained access control. It is based on the SQL standard for authorization, and uses the familiar grant/revoke statements to control access. It needs to be enabled through HiveServer2 configuration.
Note that for Hive command line, SQL Standards Based Authorization is disabled. This is because secure access control is not possible for the Hive command line using an access control policy in Hive, because users have direct access to HDFS and so they can easily bypass the SQL standards based authorization checks or even disable it altogether. Disabling this avoids giving a false sense of security to users.
So, in short, SQL Standards Based Authorization needs to be enabled in the config.
Then you'll be able to use: REVOKE on views.

How to use hive with multiple users

I have several users use the same hive.
Now i want each user to have a private metadata in hive.
example:
user a call show table : a1 , a2, a3 ...
user b call show table : b1 , b2 ,b3 ...
Of course when user run query they can not access table of other user.
thanks.
In order to make setup easy for new users, Hive's Metastore is
configured to store metadata locally in an embedded Apache Derby
database. Unfortunately, this configuration only allows a single user
to access the Metastore at a time. Cloudera strongly encourages users
to use a MySQL database instead. This section describes how to
configure Hive to use a remote MySQL database, which allows Hive to
support multiple users. See the Hive Metastore documentation for
additional information.
For more details see the part with heading 'Configuring the Hive Metastore' here.
Once the external meta store has been created then Hive authorization can be used to grant/restrict privileges.
This is the disclaimer from Hive
Hive authorization is not completely secure. In its current form, the authorization scheme is intended primarily to prevent good users from accidentally doing bad things, but makes no promises about preventing malicious users from doing malicious things.

Resources