Can a user without admin rights manage object access privileges in hive? - hadoop

I am working with hive 0.14 mainly using beeline.
I am not an admin but I am looking to create a couple of views that the team can use.
We've got a common hive database where everyone has read+write. If I am creating certain tables/views that I don't want other people to be able to drop or modify, is it possible for me to revoke drop/write access for others?

Access to Hive tables depends on HDFS access rights.
Whenever you create a new table tbl in a database located at db, a new directory db/tbl will be created.
If you want to restrict group write access to that directory, use hadoop fs -chmod, for example:
hadoop fs -chmod 750 db/tbl
If you want to find out where a table is located, run describe formatted tbl and look at the Location field (tables created without an explicit LOCATION end up under the database's warehouse directory).
You can always check the access rights of the tables by running hadoop fs -ls db
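Putting those steps together, a minimal sketch (the database, table, and warehouse paths below are hypothetical; these commands need a running Hive/HDFS cluster):

```shell
# Find where the table's data lives: look at the "Location:" field in the output
beeline -u jdbc:hive2://localhost:10000 -e "DESCRIBE FORMATTED mydb.mytbl;"

# Inspect current permissions on the database directory
hadoop fs -ls /user/hive/warehouse/mydb.db

# Owner keeps full access, group members get read+execute, others get nothing,
# so nobody else can drop or overwrite the table's files
hadoop fs -chmod 750 /user/hive/warehouse/mydb.db/mytbl
```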
Regarding views:
Although Storage Based Authorization can provide access control at the level of Databases, Tables and Partitions, it can not control authorization at finer levels such as columns and views because the access control provided by the file system is at the level of directory and files. A prerequisite for fine grained access control is a data server that is able to provide just the columns and rows that a user needs (or has) access to. In the case of file system access, the whole file is served to the user. HiveServer2 satisfies this condition, as it has an API that understands rows and columns (through the use of SQL), and is able to serve just the columns and rows that your SQL query asked for.
SQL Standards Based Authorization (introduced in Hive 0.13.0, HIVE-5837) can be used to enable fine grained access control. It is based on the SQL standard for authorization, and uses the familiar grant/revoke statements to control access. It needs to be enabled through HiveServer2 configuration.
Note that for Hive command line, SQL Standards Based Authorization is disabled. This is because secure access control is not possible for the Hive command line using an access control policy in Hive, because users have direct access to HDFS and so they can easily bypass the SQL standards based authorization checks or even disable it altogether. Disabling this avoids giving a false sense of security to users.
So, in short, SQL Standards Based Authorization needs to be enabled in the config.
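For reference, the relevant HiveServer2 settings live in hive-site.xml and look roughly like this (check the Hive wiki for your exact version before copying; disabling impersonation via hive.server2.enable.doAs is also commonly recommended alongside these):

```xml
<!-- hive-site.xml (HiveServer2): enable SQL Standards Based Authorization -->
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>
```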
Then you'll be able to use GRANT and REVOKE on views.
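Once it is enabled and you reconnect through beeline, the standard statements apply to views as well as tables. A sketch, with made-up view, table, and user names:

```sql
-- Create a view the team can query
CREATE VIEW team_view AS SELECT id, name FROM common_db.some_table;

-- Grant read access to everyone...
GRANT SELECT ON TABLE team_view TO ROLE public;

-- ...and take a write privilege away from a specific user
REVOKE INSERT ON TABLE common_db.some_table FROM USER bob;
```

As the creator you own the view, so only you (and admins) can drop or alter it.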

Related

How to create linux/unix group based permission in hive?

We have many databases with tables in Hive, and we want to grant access to users according to their Linux/Unix group. Currently we are using Hive role- and user-level permissions, but we want to switch to Linux/Unix groups for table- and database-level permissions. How can we do it?
This isn't controlled at the Hive level. You'd just hadoop fs -chmod the directories where the data is.
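A sketch of group-based permissions at the HDFS level (the group name and warehouse path are hypothetical; requires a running cluster and an existing Unix/LDAP group):

```shell
# Make the 'analysts' group the group-owner of the database directory
hadoop fs -chown -R :analysts /user/hive/warehouse/mydb.db

# rwx for the owner and the group, nothing for anyone else
hadoop fs -chmod -R 770 /user/hive/warehouse/mydb.db
```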
By default, Hadoop access is neither authenticated nor secured; username checks are based purely on strings, not real Unix group memberships. In other words, clients can easily bypass your checks.
In order to do this properly, you need LDAP and Kerberos servers for authorization and authentication. With an LDAP server, you can use sssd to create consistent local users and groups across your cluster, but to properly authenticate them, you should issue keytabs rather than rely on Unix permission sets.
To further secure the system, Apache Ranger can be used to restrict, obfuscate, and audit data access.

How can a user with no Sentry permissions on a table execute queries if he has access to the table files in HDFS?

As per https://www.cloudera.com/documentation/enterprise/latest/topics/sg_sentry_service_config.html,
HiveServer2 impersonation lets users execute queries and access HDFS files as the connected user rather than as the super user. Access policies are applied at the file level using the HDFS permissions specified in ACLs (access control lists). Enabling HiveServer2 impersonation bypasses Sentry from the end-to-end authorization process. Specifically, although Sentry enforces access control policies on tables and views within the Hive warehouse, it does not control access to the HDFS files that underlie the tables. This means that users without Sentry permissions to tables in the warehouse may nonetheless be able to bypass Sentry authorization checks and execute jobs and queries against tables in the warehouse as long as they have permissions on the HDFS files supporting the table.
Access policies are applied at the file level using the HDFS permissions specified in ACLs (access control lists) -> I didn't understand this.
My understanding is that whenever a user runs a query, authorization is done by the Sentry plugin (binding) in the data engine, with the help of the Sentry server, to validate whether the user has access (SELECT, INSERT) to the resources (database, table) he is querying. If the user doesn't have access to the resource, the query should fail at that point. How can the query succeed when the user has access to the files backing the table in HDFS but no Sentry permissions on the table? What am I missing here?
I feel like you didn't see "users without Sentry permissions to tables in the warehouse" part.
Sure, Sentry is used, but not all users are automatically given permissions, so access falls back to the ACLs applied at the HDFS level via chown/chmod/setfacl. You need to explicitly add a "deny all" rule to say no one can access the Hive databases unless otherwise set.
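A rough sketch of that "deny all by default, re-open selectively" pattern using HDFS permissions and ACLs (paths, users, and groups are hypothetical; the setfacl step requires dfs.namenode.acls.enabled=true on the NameNode):

```shell
# Deny-all baseline: only the hive service user owns the warehouse
hadoop fs -chown -R hive:hive /user/hive/warehouse
hadoop fs -chmod -R 770 /user/hive/warehouse

# Selectively re-open one table's files to a group via an HDFS ACL
hadoop fs -setfacl -R -m group:analysts:r-x /user/hive/warehouse/sales.db/orders

# Verify the effective ACL entries
hadoop fs -getfacl /user/hive/warehouse/sales.db/orders
```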
This can also be bypassed simply by reading the raw HDFS location of the tables using Spark or Pig rather than Hive. That's what the documentation is really saying.
Also, not all Hadoop clusters use Sentry for authorization.

Restrict User Access Rights In ClickHouse

I have created multiple databases in ClickHouse and a new user. Now, how can I restrict that newly created user to a particular database?
In users.xml, inside the 'user' section (near profile, quota, ...), you can specify an optional section:
<allow_databases>
<database>default</database>
<database>test</database>
</allow_databases>
If there is no 'allow_databases' section, access to all databases is allowed.
Access to the 'system' database is always allowed (because the system database is used to process queries).
A user can still list all databases and tables (using SHOW queries or system tables), even without access to them.
Database access limits are completely unrelated to the 'readonly' setting. There is no way to grant full access to one database and read-only access to another.
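For context, here is roughly where the section sits inside a full user entry in users.xml (the user name 'analyst' and the other values are placeholders; adjust to your own setup):

```xml
<users>
    <analyst>
        <password></password>
        <networks>
            <ip>::/0</ip>
        </networks>
        <profile>default</profile>
        <quota>default</quota>
        <allow_databases>
            <database>default</database>
            <database>test</database>
        </allow_databases>
    </analyst>
</users>
```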

Authentication and Security in Hadoop

We are building a system that queries Hive tables. Our service layer will construct the Hive query based on user selections in the UI. We have some security-related questions:
• Is it OK to pass a Hive dynamic query constructed at the service layer to a UDF/HQL in Hive?
• Are there SQL-injection-like scenarios in Hive? We are on Hive 0.14, which includes DELETE and UPDATE statements.
• How can we manage role authorization so that a table can only be read, not written to or deleted from? Is there a way to manage permissions for a Hive table, or is that managed by HCatalog?
Yes, you can pass a dynamic query to Hive using the PowerShell APIs. Check out https://hadoopsdk.codeplex.com/
Hive 0.14 does support INSERT, UPDATE and DELETE. Check out https://issues.apache.org/jira/browse/HIVE-5317
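Note that in Hive 0.14, UPDATE and DELETE only work on ACID tables, which must be stored as ORC, bucketed, and marked transactional. A sketch with a made-up table (also assumes the transaction manager is enabled in hive-site.xml):

```sql
-- UPDATE/DELETE require a bucketed, transactional ORC table
CREATE TABLE events (id INT, status STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

UPDATE events SET status = 'closed' WHERE id = 42;
DELETE FROM events WHERE status = 'stale';
```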
Role authorization is currently not supported by HDInsight (as of 6/2015) but is something we are actively investigating and hope to bring to market soon.
Role-based authorization and auditing with record-, field- and cell-level control and dynamic masking is available on Hadoop with the BlueTalon policy engine.

How to use hive with multiple users

I have several users who share the same Hive instance.
Now I want each user to have private metadata in Hive.
For example:
user a calls SHOW TABLES: a1, a2, a3 ...
user b calls SHOW TABLES: b1, b2, b3 ...
Of course, when users run queries they should not be able to access other users' tables.
Thanks.
In order to make setup easy for new users, Hive's Metastore is configured to store metadata locally in an embedded Apache Derby database. Unfortunately, this configuration only allows a single user to access the Metastore at a time. Cloudera strongly encourages users to use a MySQL database instead. This section describes how to configure Hive to use a remote MySQL database, which allows Hive to support multiple users. See the Hive Metastore documentation for additional information.
For more details, see the section headed 'Configuring the Hive Metastore' in the Cloudera documentation.
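The relevant hive-site.xml fragment looks roughly like this (host name, database name, and credentials below are placeholders; the MySQL schema must be created and the JDBC driver placed on Hive's classpath first):

```xml
<!-- hive-site.xml: point the Metastore at a shared MySQL database -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastorehost:3306/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>changeme</value>
</property>
```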
Once the external metastore has been created, Hive authorization can be used to grant/restrict privileges.
This is the disclaimer from Hive:
Hive authorization is not completely secure. In its current form, the authorization scheme is intended primarily to prevent good users from accidentally doing bad things, but makes no promises about preventing malicious users from doing malicious things.
