We have many databases with tables in Hive, and we want to grant access to users according to their Linux/Unix groups. Currently we are using Hive role- and user-level permissions, but we want to switch to Linux/Unix groups for table- and database-level permissions. How can we do it?
This isn't controlled at the Hive level. You'd just hadoop fs -chmod the directories where the data is stored.
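For example, a minimal sketch of group-based permissions on a database directory (the warehouse path and the group name "analysts" are assumptions):

# Assign the database directory to the Unix group "analysts"
hadoop fs -chgrp -R analysts /user/hive/warehouse/sales.db
# Owner gets full access, the group gets read/execute, everyone else gets nothing
hadoop fs -chmod -R 750 /user/hive/warehouse/sales.db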
By default, Hadoop access is neither authenticated nor secured; user and group checks are based purely on strings the client reports, not on verified Unix accounts or group memberships. In other words, clients can easily bypass your checks.
In order to do this properly, you need an LDAP and Kerberos server for authorization and authentication. With an LDAP server, you can use sssd to create local users and groups across your cluster, but to properly authorize them, you should issue keytabs rather than rely on Unix permission sets.
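For example, once keytabs are issued, each user or service authenticates before touching HDFS (the principal name and keytab path below are hypothetical):

# Obtain a Kerberos ticket using a keytab
kinit -kt /etc/security/keytabs/alice.keytab alice@EXAMPLE.COM
# Confirm the ticket was granted
klist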
To further secure the system, Apache Ranger can be used to restrict, obfuscate, and audit data access
I have two HiveServer2 instances running. One uses binary transport (for Hue), the other uses HTTP transport (for ODBC connections).
I am trying to grant one user (ra01 in the screenshot) access to only a specific table in Hive.
The user account is intended to be used for an ODBC connection from Power BI.
I set the policy as seen in the screenshot. The policy seems to work if I try it in Hue, but if I use the same user via ODBC, it seems to grant all permissions, and the access is attributed to "Hadoop-ACL" instead of "Ranger-ACL", as seen in the attached screenshot.
What am I missing?
It appears the issue is with the folder permissions in HDFS.
I followed the instructions in the link below; it fixed the issue for me:
Best practices in HDFS authorization with Apache Ranger
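The gist of that guide, as a hedged sketch, is to lock the Hive warehouse down at the HDFS level so that access decisions are made by Ranger policies rather than falling through to Hadoop ACLs (the warehouse path below is the HDP default and may differ on your cluster):

# Make the warehouse accessible only to the hive service user,
# so end-user access must be granted through Ranger policies
hadoop fs -chown -R hive:hadoop /apps/hive/warehouse
hadoop fs -chmod -R 700 /apps/hive/warehouse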
As per https://www.cloudera.com/documentation/enterprise/latest/topics/sg_sentry_service_config.html,
HiveServer2 impersonation lets users execute queries and access HDFS files as the connected user rather than as the super user. Access policies are applied at the file level using the HDFS permissions specified in ACLs (access control lists). Enabling HiveServer2 impersonation bypasses Sentry from the end-to-end authorization process. Specifically, although Sentry enforces access control policies on tables and views within the Hive warehouse, it does not control access to the HDFS files that underlie the tables. This means that users without Sentry permissions to tables in the warehouse may nonetheless be able to bypass Sentry authorization checks and execute jobs and queries against tables in the warehouse as long as they have permissions on the HDFS files supporting the table.
"Access policies are applied at the file level using the HDFS permissions specified in ACLs (access control lists)" -> this is the part I didn't understand.
My understanding is that whenever a user runs a query, authorization is done by the Sentry plugin (binding) in the data engine, with the help of the Sentry server, to validate whether the user has access (select, insert) to the resources (db, table) he is trying to query. In that case, if the user doesn't have access to the resource, the query should fail there. How can the query succeed when he has access to the files backing a table in HDFS but doesn't have Sentry permissions on the table? What am I missing here?
I feel like you missed the "users without Sentry permissions to tables in the warehouse" part.
Sure, Sentry is used, but not all users are automatically given permissions, so access falls back to the ACLs applied at the HDFS level via chown/chmod/setfacl. You need to explicitly add a "deny all" rule saying that no one can access the Hive databases unless otherwise granted.
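A hedged sketch of such a lockdown (the warehouse path and the group name "etl" are assumptions):

# Strip all POSIX permissions from the warehouse so nothing falls through
hadoop fs -chmod -R 000 /user/hive/warehouse
# Grant back only what a specific group needs: traversal on the warehouse,
# then read/execute on the one database it should see
hadoop fs -setfacl -m group:etl:--x /user/hive/warehouse
hadoop fs -setfacl -R -m group:etl:r-x /user/hive/warehouse/sales.db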
This can simply be bypassed as well by reading the raw HDFS location of the tables using Spark or Pig, and not using Hive. That's what it's really saying.
Also, not all Hadoop clusters use Sentry for authorized access
I have one Hadoop cluster (Cloudera distribution) with access given to multiple users. Different users are creating databases. How do I verify which user created which database? Can anyone suggest a way?
Use the query below:
DESCRIBE FORMATTED databaseName.tableName;
It will show the owner and other details, such as the table type, size, etc.
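Since the question is about databases specifically, a similar statement reports the database owner as well (dbName is a placeholder):

-- Shows the database comment, location, owner name, and owner type
DESCRIBE DATABASE EXTENDED dbName;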
I am working with Hive 0.14, mainly using Beeline.
I am not an admin, but I am looking to create a couple of views that the team can use.
We've got a common Hive database where everyone has read+write access. If I am creating certain tables/views that I don't want other people to be able to drop or modify, is it possible for me to revoke drop/write access for others?
Access to Hive tables depends on HDFS access rights.
Whenever you create a new table tbl in a database located at db, a new directory db/tbl is created.
If you want to restrict group write access to that directory, use hadoop fs -chmod, for example:
hadoop fs -chmod 750 db/tbl
If you want to find out where the tables of a database are located, you can create a table without specifying a location and run describe formatted tbl.
You can always check the access rights of the tables by running hadoop fs -ls db
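For example, assuming the default warehouse layout (the database name is made up):

# List the permissions, owners, and groups of all tables in mydb
hadoop fs -ls /user/hive/warehouse/mydb.db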
Regarding views:
Although Storage Based Authorization can provide access control at the level of Databases, Tables and Partitions, it can not control authorization at finer levels such as columns and views because the access control provided by the file system is at the level of directory and files. A prerequisite for fine grained access control is a data server that is able to provide just the columns and rows that a user needs (or has) access to. In the case of file system access, the whole file is served to the user. HiveServer2 satisfies this condition, as it has an API that understands rows and columns (through the use of SQL), and is able to serve just the columns and rows that your SQL query asked for.
SQL Standards Based Authorization (introduced in Hive 0.13.0, HIVE-5837) can be used to enable fine grained access control. It is based on the SQL standard for authorization, and uses the familiar grant/revoke statements to control access. It needs to be enabled through HiveServer2 configuration.
Note that for Hive command line, SQL Standards Based Authorization is disabled. This is because secure access control is not possible for the Hive command line using an access control policy in Hive, because users have direct access to HDFS and so they can easily bypass the SQL standards based authorization checks or even disable it altogether. Disabling this avoids giving a false sense of security to users.
So, in short, SQL Standards Based Authorization needs to be enabled in the config.
Then you'll be able to use REVOKE on views.
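A hedged sketch of what that could look like once SQL Standards Based Authorization is enabled in the HiveServer2 configuration (typically hive.security.authorization.enabled=true with the SQLStdHiveAuthorizerFactory authorization manager; the view and user names below are made up, and the statement is run in Beeline as the view's owner):

-- Take away a specific user's ability to read the view
REVOKE SELECT ON TABLE team_view FROM USER some_user;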
I have several users who use the same Hive.
Now I want each user to have private metadata in Hive.
Example:
user a calls show tables: a1, a2, a3 ...
user b calls show tables: b1, b2, b3 ...
Of course, when users run queries, they cannot access the tables of other users.
Thanks.
In order to make setup easy for new users, Hive's Metastore is configured to store metadata locally in an embedded Apache Derby database. Unfortunately, this configuration only allows a single user to access the Metastore at a time. Cloudera strongly encourages users to use a MySQL database instead. This section describes how to configure Hive to use a remote MySQL database, which allows Hive to support multiple users. See the Hive Metastore documentation for additional information.
For more details see the part with heading 'Configuring the Hive Metastore' here.
Once the external metastore has been created, Hive authorization can be used to grant/restrict privileges.
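For instance, a hedged sketch of per-user grants (the user and table names are made up, and this assumes Hive authorization has been enabled):

-- Let user "a" read and write only her own table
GRANT SELECT, INSERT ON TABLE a1 TO USER a;

Note that such grants control query access; they do not by themselves filter what SHOW TABLES displays.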
This is the disclaimer from Hive
Hive authorization is not completely secure. In its current form, the authorization scheme is intended primarily to prevent good users from accidentally doing bad things, but makes no promises about preventing malicious users from doing malicious things.