Grant create external table in Sentry - hadoop

I have a 4-node Cloudera cluster with Kerberos enabled and Sentry securing the Hive service.
When I create a table as the hive user I am able to do so, since that user has all privileges on the default database.
0: jdbc:hive2://clnode4:10000/default> create table t123 (a int);
No rows affected (0.204 seconds)
0: jdbc:hive2://clnode4:10000/default> show tables from default;
+--------------+--+
| tab_name |
+--------------+--+
| t1 |
| t12 |
| t123 |
+--------------+--+
3 rows selected (0.392 seconds)
But when I try to create an external table in the same environment with the same hive user, I get the error below:
0: jdbc:hive2://clnode4:10000/default> create external table t1_ex (a string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 'hdfs:///user/olap/KyvosDemo/Distance.csv';
Error: Error while compiling statement: FAILED: SemanticException No valid privileges
User hive does not have privileges for CREATETABLE (state=42000,code=40000)
I have also granted access on the URI from which I am reading the data for the external table.
Is there any way in Sentry to let a user create external tables? Any help would be great.

I was able to solve the problem by granting all privileges on the server to the hive user's role, as below:
grant all on server server1 to role hive;
The role hive is assigned to the hive user.
Edit
One more hint: the server name can be found in the Hive configuration under the property "hive.sentry.server".
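For reference, a minimal sketch of a full Sentry grant sequence run from beeline as an admin. The role name etl_role is made up for this example; the URI grant is the narrower alternative, although in my case only the server-level grant worked:
-- Role and group names here (etl_role, hive) are examples, not taken from the question above.
CREATE ROLE etl_role;
-- Broadest option (what solved it for me): full privileges on the whole server.
GRANT ALL ON SERVER server1 TO ROLE etl_role;
-- Narrower alternative: privileges on the database plus the URI being read,
-- which is usually what CREATE EXTERNAL TABLE ... LOCATION needs.
GRANT ALL ON DATABASE default TO ROLE etl_role;
GRANT ALL ON URI 'hdfs:///user/olap/KyvosDemo' TO ROLE etl_role;
-- Sentry assigns roles to groups, not directly to users.
GRANT ROLE etl_role TO GROUP hive;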

Related

Confusion with the external tables in hive

I have created a Hive external table using the command below:
use hive2;
create external table depTable (depId int comment 'This is the unique id for each dep', depName string,location string) comment 'department table' row format delimited fields terminated by ","
stored as textfile location '/dataDir/';
Now, when I look at HDFS I can see the database, but there is no depTable directory inside the warehouse.
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
[cloudera@quickstart ~]$
As you can see above, no table directory was created in this DB. As far as I know, external tables are not stored in the Hive warehouse. Am I correct? If so, where is it stored?
But if I create the external table first, without a LOCATION, and then load the data, I can see the file inside hive2.db.
hive> create external table depTable (depId int comment 'This is the unique id for each dep', depName string,location string) comment 'department table' row format delimited fields terminated by "," stored as textfile;
OK
Time taken: 0.056 seconds
hive> load data inpath '/dataDir/department_data.txt' into table depTable;
Loading data to table default.deptable
Table default.deptable stats: [numFiles=1, totalSize=90]
OK
Time taken: 0.28 seconds
hive> select * from deptable;
OK
1001 FINANCE SYDNEY
2001 AUDIT MELBOURNE
3001 MARKETING PERTH
4001 PRODUCTION BRISBANE
Now, if I run the hadoop fs command, I can see this table under the database directory, as below:
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
Found 1 items
drwxrwxrwx - cloudera supergroup 0 2019-01-17 09:07 /user/hive/warehouse/hive2.db/deptable
If I drop the table, I can still see the table directory in HDFS, as below:
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
Found 1 items
drwxrwxrwx - cloudera supergroup 0 2019-01-17 09:11 /user/hive/warehouse/hive2.db/deptable
So, what is the exact behavior of external tables? When I create one using the LOCATION keyword, where does the data get stored? And when I create one without a location and load data, why does it end up in the warehouse in HDFS, and why is it not deleted after I drop the table?
The main difference between EXTERNAL and MANAGED tables is in Drop table/partition behavior.
When you drop a MANAGED table/partition, the location with the data files is also removed.
When you drop an EXTERNAL table, the location with the data files remains as is.
UPDATE: TBLPROPERTIES ("external.table.purge"="true") in release 4.0.0+ (HIVE-19981), when set on an external table, causes the data to be deleted on drop as well.
Both EXTERNAL and MANAGED tables are stored in the location specified in the DDL. You can create a table on top of an existing location that already contains data files, and it will work for either EXTERNAL or MANAGED; it does not matter.
You can even create both an EXTERNAL and a MANAGED table on top of the same location; see this answer with more details and tests: https://stackoverflow.com/a/54038932/2700344
If you specify a location, the data is stored in that location for both types of tables. If you do not specify a location, the data goes to the default location /user/hive/warehouse/database_name.db/table_name for both managed and external tables.
Update: there can also be restrictions on location depending on the platform/vendor (see https://stackoverflow.com/a/67073849/2700344); you may not be allowed to create managed/external tables outside their default allowed root location.
See also official Hive docs on Managed vs External Tables
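As a quick illustration, here is a minimal sketch (the scratch location /tmp/dep_demo is made up); DESCRIBE FORMATTED shows both the table type and the resolved location:
-- Hypothetical scratch location, not taken from the question.
CREATE EXTERNAL TABLE dep_demo (depId INT, depName STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/tmp/dep_demo';
-- Shows Table Type (EXTERNAL_TABLE vs MANAGED_TABLE) and the resolved Location.
DESCRIBE FORMATTED dep_demo;
-- On Hive 4.0.0+ (HIVE-19981) this would make DROP remove the data files too;
-- without it, dropping the external table leaves /tmp/dep_demo untouched.
-- ALTER TABLE dep_demo SET TBLPROPERTIES ('external.table.purge'='true');
DROP TABLE dep_demo;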

Creating HIVE External Table from read-only folder

I have access to a Hadoop cluster where I have read-only access to the HDFS folder containing the data (in this case /data/table1/data_PART0xxx). I would like to build a Hive EXTERNAL table that would give me an easier way to query the data.
So, I created a table as follows:
CREATE EXTERNAL TABLE myDB.Table1 (column1 STRING, column2 STRING, column3 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(.{10})(.{16})(.{10})"
)
LOCATION '/data/table1';
However, it gives me error:
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [my_user] does not have [ALL] privilege on [hdfs://hadoopcluster/data/table1] (state=42000,code=40000)
which I understand: I don't have the right to write anything. But how can I do this so that the table is explicitly defined as read-only?
Edit: I know I can use CREATE TEMPORARY EXTERNAL TABLE ... but I would like a more permanent solution. Furthermore, I have no privileges to set the folder /data/table1 to mode 777. Isn't there a way to tell Hive that this table is only meant to be queried, and that no further data will be added (at least not through Hive)?
Edit: There has also been a JIRA ticket for this since 2009, marked as important, but it is still not resolved.

alter table/add columns in non native table in hive

I created a Hive table with a storage handler, and now I want to add a column to that table, but it gives me the error below:
[Code: 10134, SQL State: 42000] Error while compiling statement: FAILED:
SemanticException [Error 10134]: ALTER TABLE can only be used for [ADDPROPS,
DROPPROPS] to a non-native table
As per the Hive documentation, any Hive table you create with a storage handler is a non-native table.
Here's a link https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
There is an open JIRA enhancement request with Apache for the same:
https://issues.apache.org/jira/browse/HIVE-1240
For example, in my case I am using the Druid storage handler.
I created a hive table using:
CREATE TABLE druid_table_1
(`__time` TIMESTAMP, `dimension1` STRING, `metric1` int)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler';
and then I am trying to add a column:
ALTER TABLE druid_table_1 ADD COLUMNS (`dimension2` STRING);
With the above approach I get the error shown.
Is there any other way to add a column to a non-native table in Hive without recreating it?
A patch is available in HDP 2.5+ from Hortonworks; support for ADD COLUMNS has been added to the ALTER statement.
A column can be added to a Druid-backed table using the ALTER TABLE DDL in Hive:
ALTER TABLE table_name ADD COLUMNS (col_name data_type);
There is no need to specify a partition spec, since these are Druid-backed Hive tables and partitioning/storage is maintained by Druid.
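Applied to the table from the question, a minimal sketch (assuming a Hive build/distribution that includes the patch) would be:
-- Assumes a Hive version that includes ADD COLUMNS support for
-- storage-handler tables (e.g. HDP 2.5+ per the answer above).
ALTER TABLE druid_table_1 ADD COLUMNS (`dimension2` STRING);
-- Verify that the new column shows up in the table definition.
DESCRIBE druid_table_1;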

user xxxx not authorized to view the data (state=,code=0) in spark-sql & hive

I am able to create a table in spark-sql using beeline, but when I try to run a query on the created table I get the error
"user not authorized to view the data".
Below are the steps I have performed:
$SPARK_HOME/bin/beeline
!connect jdbc:hive2://server:10000 username password
CREATE EXTERNAL TABLE IF NOT EXISTS tablename(Name STRING,count INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE location '/user/xxxx/'
When I do SHOW TABLES I can see the table has been created, but when I query the table I get the error.
Please help.

How to provide Vertica user with read-only access to certain specified system tables?

We're looking to set up a user in our Vertica database that can see certain system tables (projections, projection_storage and views), but we don't want this user to be a dbadmin, because we don't want them to have write privileges on these tables. I've tried using GRANT statements to give a regular user access to these tables, but that doesn't seem to work: each user can only see their own records in those tables. Is there a way to set up a user as I describe, or do we need to have this user be a dbadmin?
Our use case is that we need a user that can get a list of the schemas that exist in our database and iterate through each schema, gathering information to store in one central location. If our user is granted usage on the individual schemas, then they can get a list of those schemas, but they aren't able to access the necessary records in the projection_storage and views tables.
Thank you!
Granting USAGE on the schema to the user or role is not enough for users to see its projections in the projection_storage table. If the user or the role has SELECT access on a table, then projections for that table can be viewed in projection_storage. I am on Vertica 7.1, and I was able to view projection records by granting SELECT permission just to the role instead of granting it to an individual user ID.
If the user does not need to access tables but needs to list out tables in the schema for some reporting purpose, one option would be to periodically dump the content of projection_storage to a different table and grant proper privileges on this table to the user.
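A minimal sketch of that workaround, with made-up names (report_schema, report_user), which could be refreshed periodically from a script:
-- Hypothetical names: report_schema, report_user. Run as dbadmin.
-- One-time setup:
CREATE SCHEMA report_schema;
GRANT USAGE ON SCHEMA report_schema TO report_user;
-- Periodic refresh of the snapshot (e.g. from a cron job):
DROP TABLE IF EXISTS report_schema.projection_storage_snapshot;
CREATE TABLE report_schema.projection_storage_snapshot AS
SELECT * FROM v_monitor.projection_storage;
GRANT SELECT ON report_schema.projection_storage_snapshot TO report_user;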
Just for the sake of maintenance you should create database roles, and then give access to those roles to your users. Otherwise maintenance will be hell for you!
Normally, I just give a user USAGE on a schema, and then "GRANT SELECT ON <table> TO <user>;".
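A hedged sketch of that role-based approach, reusing the schema and user from the answer below (the role name report_ro is made up); note the caveat further down about roles and projection_storage:
-- Hypothetical role name: report_ro. Run as dbadmin.
CREATE ROLE report_ro;
GRANT USAGE ON SCHEMA sid TO report_ro;
GRANT SELECT ON sid.student_table TO report_ro;
-- Attach the role to the user and make it active by default.
GRANT report_ro TO test;
ALTER USER test DEFAULT ROLE report_ro;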
Do they have INSERT permissions on those tables?
Granting SELECT access on the table to the role does not grant complete access to metadata tables like projection_storage. This seems to be a bug. In order to get complete access, SELECT needs to be granted to the individual user ID.
You can follow the steps below to create a user with SELECT privileges on a schema.
I will follow this with an example: in my test database I have a schema 'sid' with a table 'student_table'.
1) Log in as an admin on your database:
[dbadmin@localhost bin]$ vsql -u
User name: dbadmin
Password:
2) Create the user with a password:
dbadmin=> create user test identified by 'R';
CREATE USER
3) Grant the newly created user privileges on the database:
dbadmin=> Grant ALL on database vertica to test;
GRANT PRIVILEGE
4) Grant the user USAGE on the schema:
dbadmin=> Grant Usage on Schema sid to test;
GRANT PRIVILEGE
5) Finally, grant the user SELECT on the table:
dbadmin=> Grant select on sid.student_table to test ;
GRANT PRIVILEGE
dbadmin=> \q
6) Log in with the new user 'test'. You will be able to access both projection_storage and your table sid.student_table:
[dbadmin@localhost bin]$ vsql -u
vsql: Warning: The -u option is deprecated. Use -U.
User name: test
Password:
Welcome to vsql, the Vertica Analytic Database interactive terminal.
test=> select * From sid.student_table;
Student_ID | Last_name | First_Name | Class_Code | Grade_pt
------------+-----------+------------+------------+--------------------
9999 | T_ | S% | PG | 98.700000000000000
(1 row)
test=> select * From projection_storage;
-[ RECORD 1 ]-----------+-----------------------------------------
node_name | v_vertica_node0001
projection_id | 45035996273836526
projection_name | Student_Table_DBD_1_rep_tet1_v1_node0001
projection_schema | sid
projection_column_count | 6
row_count | 9
used_bytes | 375
wos_row_count | 0
wos_used_bytes | 0
ros_row_count | 9
ros_used_bytes | 375
ros_count | 1
anchor_table_name | Student_Table
anchor_table_schema | sid
anchor_table_id | 45035996273756612
