Difference between Superuser and supergroup in Hadoop

What is supergroup and superuser in Hadoop/HDFS?

Superuser
Based on the Hadoop official documentation:
The super-user is the user with the same identity as the name node process itself. Loosely, if you started the name node, then you are the super-user. The super-user can do anything in that permissions checks never fail for the super-user.
Supergroup
The supergroup is the group of superusers. Membership in this group gives a Hadoop client superuser access. The group name can be configured via the dfs.permissions.superusergroup property in hdfs-site.xml (the default is supergroup).
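For example, a minimal hdfs-site.xml snippet; the group name hdfsadmins is only an illustrative placeholder (the default value is supergroup):
<property>
<name>dfs.permissions.superusergroup</name>
<value>hdfsadmins</value>
<description>The name of the group of super-users.</description>
</property>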
References
Hadoop superuser and supergroup

Related

Hadoop Nodemanager failing with error Can't get group information

I have a Kerberos-enabled Apache Hadoop (2.8.5) installation. The NameNode, DataNode, and ResourceManager are running fine, but the NodeManager fails to start with the error:
Can't get group information for hadoop#configured value of yarn.nodemanager.linux-container-executor.group - Success.
file permissions:
container-executor.cfg: -rw------- 1 root hadoop
container-executor: ---Sr-s--- 1 root hadoop
container-executor.cfg
yarn.nodemanager.local-dirs=/hadoop/data/yarn/local
yarn.nodemanager.linux-container-executor.group=hadoop#configured value
of yarn.nodemanager.linux-container-executor.group
banned.users=hdfs,yarn,mapred,bin,root#comma separated list of users who can not run applications
min.user.id=1000#Prevent other super-users
Simply remove the comment
#configured value of yarn.nodemanager.linux-container-executor.group
from the yarn.nodemanager.linux-container-executor.group line in the container-executor.cfg file.
It should look like this:
yarn.nodemanager.local-dirs=/hadoop/data/yarn/local
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin,root
min.user.id=1000
This configuration file has had historical problems with spaces, comments, and similar formatting issues.
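If the error persists, it is worth confirming that the group named in the file actually resolves on the NodeManager host and that the ownership and setuid bits on container-executor are intact. A quick check, assuming a standard layout under $HADOOP_HOME:
# the 'hadoop' group must be resolvable (locally or via SSSD/LDAP)
getent group hadoop
# container-executor should be root:hadoop with the setuid/setgid bits set
ls -l $HADOOP_HOME/bin/container-executor
# container-executor.cfg should be owned by root and not be writable by others
ls -l $HADOOP_HOME/etc/hadoop/container-executor.cfg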

Cannot create a directory on HDFS in Hadoop

I cannot operate on my HDFS because of a permission problem. I executed hadoop fs -ls / and it returned:
drwx------ - ubuntu supergroup 0 2015-09-02 09:58 /tmp
The OS user is user1.
How can I change the HDFS owner to user1? I set dfs.permissions.enabled to false in hdfs-site.xml and formatted HDFS again, but the problem still exists.
Could anybody help me?
This has nothing to do with the OS user (user1).
You communicate with the Hadoop server via a Hadoop client, so you should check your Hadoop client's configuration; on my machine that is hadoop-site.xml.
In hadoop-site.xml, set the username and password your client uses for HDFS:
<property>
<name>hadoop.job.ugi</name>
<value>yourusername,yourpassword</value>
<description>username, password used by client</description>
</property>
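Note that hadoop.job.ugi is only honored by very old Hadoop releases and may be ignored on newer, security-enabled versions. An alternative sketch, assuming you can run commands as the user that started the NameNode (ubuntu in the listing above, i.e. the HDFS superuser), is to change the directory's owner directly:
# run as the HDFS superuser (the user that started the NameNode)
sudo -u ubuntu hadoop fs -chown -R user1 /tmp
# verify the new owner
hadoop fs -ls /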

Hiveserver2: Failed to create/change scratchdir permissions to 777: Could not create FileClient

I'm running a MapR Community Edition Hadoop cluster (M3).
Unfortunately, the HiveServer2 service crashes and, according to the log file in /opt/mapr/hive/hive-0.13/logs/mapr/hive.log, there's a problem with permissions on the scratch directory:
2015-02-24 21:21:08,187 WARN [main]: server.HiveServer2 (HiveServer2.java:init(74)) - Failed to create/change scratchdir permissions to 777: Could not create FileClient java.io.IOException: Could not create FileClient
I checked the settings for the scratch directory using hive -e 'set;' | grep scratch:
hive.exec.scratchdir=/user/mapr/tmp/hive/
hive.scratch.dir.permission=700
I notice that hive.scratch.dir.permission is set to 700 and the error message suggests that it wants to change this to 777. However, according to the filesystem, /mapr/my.cluster.com/user/mapr/tmp has 777 permissions and belongs to the mapr user.
mapr#hadoop01:/mapr/my.cluster.com/user/mapr/tmp$ ls -al
total 2
drwxr-xr-x 3 mapr mapr 1 Feb 22 10:39 .
drwxr-xr-x 5 mapr mapr 3 Feb 24 08:40 ..
drwxrwxrwx 56 mapr mapr 54 Feb 23 10:20 hive
Judging by the filesystem permissions, I would expect the mapr user to be able to do whatever it wants with this folder, so I don't understand the error message.
I'm curious to know if anyone's seen this before and, if so, how did you fix it?
Update:
I had a look at the source code and noticed some relevant comments just before the warning:
// When impersonation is enabled, we need to have "777" permission on root scratchdir, because
// query specific scratch directories under root scratchdir are created by impersonated user and
// if permissions are not "777" the query fails with permission denied error.
I set the following properties in hive-site.xml:
<property>
<name>hive.scratch.dir.permission</name>
<value>777</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive/</value>
</property>
... and created the /tmp/hive/ folder in HDFS with 777 permissions:
mapr#hadoop01:~$ hadoop fs -ls -d /tmp/hive
drwxrwxrwx - mapr mapr 0 2015-02-27 08:38 /tmp/hive
Although this looked promising, I still got the same warning in hive.log.
Update the permissions of your /tmp/hive HDFS directory to 777:
hadoop fs -chmod 777 /tmp/hive
Or remove /tmp/hive entirely; the temporary files will be recreated as needed even after you delete them:
hadoop fs -rm -r /tmp/hive;
rm -rf /tmp/hive
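If you do remove it, you can also recreate the scratch directory yourself with an open mode before restarting HiveServer2; a minimal sketch following the paths in the question (mapr being the user that runs HiveServer2 here):
# recreate the HDFS scratch directory with open permissions
hadoop fs -mkdir /tmp/hive
hadoop fs -chmod 777 /tmp/hive
# owned by the user that runs HiveServer2 (mapr on this cluster)
hadoop fs -chown mapr /tmp/hive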

I executed a Hadoop MapReduce program successfully. Can someone tell me how to see the output through a browser, like <http://localhost:port/hdfsLocation/>

I executed a Hadoop MapReduce program successfully in CDH4, but where can I see my output? Can someone tell me how to see the output through a browser? It would be helpful to me.
On the terminal, run:
hadoop dfs -ls /inputfile
It will give a result like:
Found 2 items
-rw-r--r-- 3 user17 supergroup 0 2014-11-27 16:47 /inputfile/_SUCCESS
-rw-r--r-- 3 user17 supergroup 24441 2014-11-27 16:47 /inputfile/part-00000
hadoop dfs -cat /inputfile/part-00000
NameNode and DataNode each run an internal web server in order to display basic information about the current status of the cluster. With the default configuration, the NameNode front page is at http://namenode-name:50070/. It lists the DataNodes in the cluster and basic statistics of the cluster. The web interface can also be used to browse the file system (using "Browse the file system" link on the NameNode front page).
If you want to see the output on the web, take a look at Hue: http://gethue.com/#
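If WebHDFS is enabled on the cluster (dfs.webhdfs.enabled=true), you can also fetch the result files over HTTP from a browser or with curl; a sketch for the listing above, assuming the default NameNode HTTP port 50070:
# list the output directory
curl -i "http://namenode-name:50070/webhdfs/v1/inputfile?op=LISTSTATUS"
# read an output file (the NameNode redirects the request to a DataNode)
curl -i -L "http://namenode-name:50070/webhdfs/v1/inputfile/part-00000?op=OPEN"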

Hadoop Hive: How to allow a regular user to continuously write data and create tables in the warehouse directory?

I am running Hadoop 2.2.0.2.0.6.0-101 on a single node.
I am trying to run a Java MRD program that writes data to an existing Hive table from Eclipse as a regular user. I get the exception:
org.apache.hadoop.security.AccessControlException: Permission denied: user=dev, access=WRITE, inode="/apps/hive/warehouse/testids":hdfs:hdfs:drwxr-xr-x
This happens because the regular user has no write permission on the warehouse directory; only the hdfs user does:
drwxr-xr-x - hdfs hdfs 0 2014-03-06 16:08 /apps/hive/warehouse/testids
drwxr-xr-x - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
To circumvent this, I changed the permissions on the warehouse directory so that everybody now has write permission:
[hdfs#localhost wks]$ hadoop fs -chmod -R a+w /apps/hive/warehouse
[hdfs#localhost wks]$ hadoop fs -ls /apps/hive/warehouse
drwxrwxrwx - hdfs hdfs 0 2014-03-06 16:08 /apps/hive/warehouse/testids
drwxrwxrwx - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
This helps to some extent, and the MRD program can now write to the warehouse directory as a regular user, but only once. When trying to write data into the same table a second time, I get:
ERROR security.UserGroupInformation: PriviledgedActionException as:dev (auth:SIMPLE) cause:org.apache.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.testids
Now, if I delete the output table and create it anew in the Hive shell, I again get the default permissions, which do not allow a regular user to write data into this table:
[hdfs#localhost wks]$ hadoop fs -ls /apps/hive/warehouse
drwxr-xr-x - hdfs hdfs 0 2014-03-11 12:19 /apps/hive/warehouse/testids
drwxrwxrwx - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
Please advise on the correct Hive configuration steps that will allow a program running as a regular user to do the following in the Hive warehouse:
Programmatically create / delete / rename Hive tables?
Programmatically read / write data from Hive tables?
Many thanks!
If you maintain the table from outside Hive, then declare the table as external:
An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.
A Hive administrator can create the table and point it at an HDFS storage location owned by your own user, and you then grant Hive permission to read from there.
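A minimal sketch of that approach, assuming the dev user owns /user/dev/testids in HDFS (the table layout and delimiter are illustrative):
-- data lives in a directory the dev user owns, outside the Hive warehouse
CREATE EXTERNAL TABLE testids (id BIGINT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/dev/testids';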
As a general comment, there is no way for an unprivileged user to perform an unauthorized privileged action. Any such way is technically an exploit, and you should never rely on it: even if it is possible today, it will likely be closed soon. Hive Authorization (and HCatalog authorization) is orthogonal to HDFS authorization.
Your application is also incorrect, irrespective of the authorization issues. You are trying to write twice into the same table, which means your application does not handle partitions correctly. Start with An Introduction to Hive's Partitioning.
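As a rough illustration of the partitioning idea (the table and column names are placeholders), each run then writes to its own partition instead of colliding with data already present in a non-partitioned table:
CREATE TABLE testids_part (id BIGINT, name STRING)
PARTITIONED BY (load_date STRING);
-- every load targets its own partition, so repeated writes no longer fail
INSERT OVERWRITE TABLE testids_part PARTITION (load_date = '2014-03-11')
SELECT id, name FROM testids_staging;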
Alternatively, you can configure hdfs-site.xml as follows:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
This disables permission checking on HDFS (note that on Hadoop 2.x the property is named dfs.permissions.enabled), so a regular user can perform operations on HDFS.
I hope this helps.
