Hadoop Hive: How to allow a regular user to continuously write data and create tables in the warehouse directory?

I am running Hadoop 2.2.0.2.0.6.0-101 on a single node.
I am trying to run a Java MRD program that writes data to an existing Hive table from Eclipse under a regular user. I get an exception:
org.apache.hadoop.security.AccessControlException: Permission denied: user=dev, access=WRITE, inode="/apps/hive/warehouse/testids":hdfs:hdfs:drwxr-xr-x
This happens because the regular user has no write permission to the warehouse directory; only the hdfs user does:
drwxr-xr-x - hdfs hdfs 0 2014-03-06 16:08 /apps/hive/warehouse/testids
drwxr-xr-x - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
To circumvent this I changed the permissions on the warehouse directory, so that everybody now has write permission:
[hdfs@localhost wks]$ hadoop fs -chmod -R a+w /apps/hive/warehouse
[hdfs@localhost wks]$ hadoop fs -ls /apps/hive/warehouse
drwxrwxrwx - hdfs hdfs 0 2014-03-06 16:08 /apps/hive/warehouse/testids
drwxrwxrwx - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
This helps to some extent, and the MRD program can now write to the warehouse directory as a regular user, but only once. When trying to write data into the same table a second time, I get:
ERROR security.UserGroupInformation: PriviledgedActionException as:dev (auth:SIMPLE) cause:org.apache.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.testids
Now, if I delete the output table and create it anew in the hive shell, I again get the default permissions that do not allow a regular user to write data into this table:
[hdfs@localhost wks]$ hadoop fs -ls /apps/hive/warehouse
drwxr-xr-x - hdfs hdfs 0 2014-03-11 12:19 /apps/hive/warehouse/testids
drwxrwxrwx - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
Please advise on the correct Hive configuration steps that will allow a program running as a regular user to do the following operations in the Hive warehouse:
Programmatically create / delete / rename Hive tables?
Programmatically read / write data from Hive tables?
Many thanks!

If you maintain the table from outside Hive, then declare the table as external:
An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.
A Hive administrator can create the table and point it at an HDFS location owned by your own user, and you grant Hive permission to read from there.
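A rough sketch of that setup, assuming a dev-owned staging directory and a one-column schema (both are assumptions, not taken from the question):
sudo -u hdfs hadoop fs -mkdir -p /user/dev/testids
sudo -u hdfs hadoop fs -chown dev:dev /user/dev/testids
hive -e "CREATE EXTERNAL TABLE testids (id INT) LOCATION '/user/dev/testids';"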
As a general comment, there is no way for an unprivileged user to perform an unauthorized privileged action. Any such way is technically an exploit, and you should never rely on it: even if it is possible today, it will likely be closed soon. Hive authorization (and HCatalog authorization) is orthogonal to HDFS authorization.
Your application is also incorrect, independent of the authorization issues. You are trying to write 'twice' into the same table, which means your application does not handle partitions correctly. Start from An Introduction to Hive's Partitioning; a sketch follows.
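As a minimal sketch (the table name, schema, partition column, and input path below are assumptions, not from the question), each run can target a fresh partition instead of rewriting a non-partitioned table:
hive -e "CREATE TABLE testids_part (id INT) PARTITIONED BY (run_date STRING);"
hive -e "LOAD DATA INPATH '/user/dev/ids.csv' INTO TABLE testids_part PARTITION (run_date='2014-03-11');"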

You can configure this in hdfs-site.xml, for example:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
This disables permission checking on HDFS entirely, so a regular user can perform any operation there. Bear in mind that this removes all HDFS access control, so it is only suitable for single-user development setups. (In recent Hadoop 2.x releases the property is named dfs.permissions.enabled; the old name is kept as a deprecated alias.)
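The setting only takes effect after a NameNode restart; with the stock Hadoop scripts that would look roughly like this (paths are assumptions, and service management differs per distribution):
$HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode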
I hope this helps.

Related

Issue loading data in Hive

I have a sample_data file (I created the schema earlier in the Hive table people).
Upon running the following command to load data into the table people:
LOAD DATA LOCAL INPATH 'sample_data.csv' OVERWRITE INTO TABLE people;
I get the following trace:
Loading data to table default.people Failed with exception Unable to
move source file:/home/hduser1/sample_data.csv to destination
hdfs://hive-master:54310/user/hive/warehouse/people/sample_data.csv
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask
I tried the following, but in vain:
hadoop fs -chmod g+w /user/hive/warehouse
sudo chmod -R 777 /home/hduser1/sample_data.csv
Further analysis shows something interesting:
-rwxrwxrwx 1 hduser1 hadoop_group 2874 Feb 21 09:50 sample_data.csv
Note: the file sample_data.csv is owned by hduser1 in hadoop_group, whereas the following line shows that /user/hive/warehouse/people is owned by hduser1 in supergroup.
drwxrwxrwx - hduser1 supergroup 0 2018-02-21 10:35 /user/hive/warehouse/people
How can I overcome this issue? Am I missing any sort of configuration?
When you use the LOCAL option with LOAD DATA INPATH..., the file is expected to be on the server running Hive. If you don't have access to that server, the best way is to move the data to HDFS manually and use LOAD DATA INPATH... without LOCAL.
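A minimal sketch, assuming you can stage the file under your own HDFS home directory (the /user/hduser1 path is an assumption):
hadoop fs -put /home/hduser1/sample_data.csv /user/hduser1/sample_data.csv
hive -e "LOAD DATA INPATH '/user/hduser1/sample_data.csv' OVERWRITE INTO TABLE people;"
Note that the non-LOCAL form moves the file into the table's directory, so it will disappear from /user/hduser1 afterwards.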

Not able to create new table in hive from Spark-shell

I am using a single-node setup on Red Hat and have installed Hadoop, Hive, Pig, and Spark. I configured the Hive metastore in Derby. I created a new folder for Hive tables and gave it full privileges (chmod 777). Then I created one table from the Hive CLI, and I am able to select that data in spark-shell and print the values to the console. But from spark-shell / Spark SQL I am not able to create new tables. It throws this error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/2016/hive/test2 is not a directory or unable to create one)
I checked the permissions and the user (the same user is used for the installation of Hive, Hadoop, Spark, etc.).
Is there anything that needs to be done to get full integration of Spark and Hive?
Thanks
Check that the permissions in HDFS are correct (not just on the local filesystem):
hadoop fs -chmod -R 755 /user
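To confirm the fix, you can test a write as the same user that runs spark-shell (the warehouse path below is an assumption; yours may differ):
hadoop fs -touchz /user/hive/warehouse/_perm_test && hadoop fs -rm /user/hive/warehouse/_perm_test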
If the error message persists afterwards please update the question.

Permission denied issue in mapreduce?

I have tried the command below:
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount /home/cloudera/Desktop/words/output
The map reduce job starts, and after that it shows the error below. Can anyone please help with this issue?
15/11/04 10:33:57 INFO mapred.JobClient: Task Id : attempt_201511040935_0008_m_000002_0, Status : FAILED
org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Do I need to change anything in a config file or in Cloudera Manager?
The exception suggests that you are trying to write to the HDFS root directory "/", which you (user cloudera) do not have permission to do.
Without knowing what your specific jar does:
I guess that the last argument ("/home/cloudera/Desktop/words/output") is where you wish to place the output.
I guess this is supposed to be within HDFS, where /home does not exist.
Try changing it to somewhere you can write, possibly "/user/cloudera/words/output".
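For example, reusing the command from the question with only the output path changed:
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount /user/cloudera/words/output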
There is a set of default directories to be created before you start using the Hadoop cluster.
Run the following; it should show you the directories:
$ hadoop fs -ls /
For example, if you want to run as the cloudera user, you need the following on HDFS (a sketch for creating them follows this list):
/user/cloudera -- the user running the program
/user/hadoop -- your hadoop file system user
/user/mapred -- your mapred user
/tmp -- temporary directory, which needs to have permissions 1777 (hdfs chmod 1777)
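A hedged sketch for creating them, assuming you can run commands as the hdfs superuser (adjust account names to your cluster):
sudo -u hdfs hadoop fs -mkdir -p /user/cloudera /user/hadoop /user/mapred /tmp
sudo -u hdfs hadoop fs -chown cloudera:cloudera /user/cloudera
sudo -u hdfs hadoop fs -chmod 1777 /tmp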
HTH.
The last argument that you are passing should be an output path in HDFS, not on the local file system.
As you are running as the cloudera user, you can point it to /user/cloudera/words/output. But first you need to check whether /user/cloudera exists in HDFS and whether you have write permission to it, by issuing the following:
hadoop fs -ls /user/
Once you have it, change your command to the following:
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount <path_where_you_have_write_permission_in_HDFS>

Cannot create directory on HDFS in Hadoop

I cannot operate on my HDFS because of permissions. I executed hadoop fs -ls / and it returned:
drwx------ - ubuntu supergroup 0 2015-09-02 09:58 /tmp
The OS user is user1.
How do I change the HDFS user to user1? I set dfs.permissions.enabled to false in hdfs-site.xml and formatted HDFS again, but the problem still exists.
Could anybody help me?
This is not a matter of the OS user (user1).
You communicate with the Hadoop server via a Hadoop client,
so you should check your Hadoop client's configuration; on my machine it is hadoop-site.xml.
In hadoop-site.xml, set the username and password corresponding to your HDFS:
<property>
<name>hadoop.job.ugi</name>
<value>yourusername,yourpassword</value>
<description>username, password used by client</description>
</property>
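Note that hadoop.job.ugi is a legacy property from very old Hadoop releases. With SIMPLE authentication, a common alternative is to act as the HDFS superuser through an environment variable; this sketch assumes ubuntu is the user that started the NameNode and is therefore the superuser:
export HADOOP_USER_NAME=ubuntu
hadoop fs -chown -R user1 /tmp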

security.UserGroupInformation: PriviledgedActionException error for MR

Whenever I try to execute a map reduce job that writes to an HBase table, I get the following error in the console. I am running the MR job from the user account.
ERROR security.UserGroupInformation: PriviledgedActionException as:user cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/data1/input/Filename.csv
I did a hadoop ls; user is the owner of the file.
-rw-r--r-- 1 user supergroup 7998682 2014-04-17 18:49 /data1/input/Filename.csv
All my daemons are running perfectly, and if I use the HBase client API I am able to insert.
Please help; thanks in advance.
Thanks,
KG
If you look at the following path
Input path does not exist: file:/data1/input/Filename.csv
you can see that it is pointing to the local filesystem, not to HDFS. Try prefixing the path with the hdfs filesystem scheme, as follows:
hdfs://<NAMENODE-HOST>:<IPC-PORT>/data1/input/Filename.csv
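For instance (the jar and class names here are hypothetical stand-ins; fill the <NAMENODE-HOST>:<IPC-PORT> placeholders from fs.defaultFS in your core-site.xml):
hadoop jar myjob.jar MyDriver hdfs://<NAMENODE-HOST>:<IPC-PORT>/data1/input/Filename.csv /data1/output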
