What is the plus (+) in HDFS after the list of permissions on a file or folder?

When I execute
hdfs dfs -ls /projects
I get for example a list of folders like this:
drwxrwxr-x+ - user1 group1 0 2020-05-06 09:54 /projects/project1
drwxrwxr-x - user2 group2 0 2020-05-28 09:49 /projects/project2
drwxrwxr-x+ - user3 group3 0 2020-03-12 14:20 /projects/project3
What does the plus in drwxrwxr-x+ mean? The folder /projects/project2 doesn't have ACLs.
Does the plus mean that a folder/file has ACLs defined on it?

You guessed right. Any file or directory that has an ACL applied to it will have a + character appended to its permission string.
Ref: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#ACLs_Shell_Commands
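If you want to see which ACL entries are behind the +, hdfs dfs -getfacl will list them (using the paths from your listing):
# show the ACL entries that cause the "+" flag
hdfs dfs -getfacl /projects/project1
# a directory without the "+" only shows the plain owner/group/other entries
hdfs dfs -getfacl /projects/project2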

Related

HDFS NFS locations using weird numerical username values for directory permissions

I'm seeing nonsense values for user names in folder permissions for NFS-mounted HDFS locations, while the HDFS locations themselves (using Hortonworks HDP 3.1) appear fine. E.g.
➜ ~ ls -lh /nfs_mount_root/user
total 6.5K
drwx------. 3 accumulo hdfs 96 Jul 19 13:53 accumulo
drwxr-xr-x. 3 92668751 hadoop 96 Jul 25 15:17 admin
drwxrwx---. 3 ambari-qa hdfs 96 Jul 19 13:54 ambari-qa
drwxr-xr-x. 3 druid hadoop 96 Jul 19 13:53 druid
drwxr-xr-x. 2 hbase hdfs 64 Jul 19 13:50 hbase
drwx------. 5 hdfs hdfs 160 Aug 26 10:41 hdfs
drwxr-xr-x. 4 hive hdfs 128 Aug 26 10:24 hive
drwxr-xr-x. 5 h_etl hdfs 160 Aug 9 14:54 h_etl
drwxr-xr-x. 3 108146 hdfs 96 Aug 1 15:43 ml1
drwxrwxr-x. 3 oozie hdfs 96 Jul 19 13:56 oozie
drwxr-xr-x. 3 882121447 hdfs 96 Aug 5 10:56 q_etl
drwxrwxr-x. 2 spark hdfs 64 Jul 19 13:57 spark
drwxr-xr-x. 6 zeppelin hdfs 192 Aug 23 15:45 zeppelin
➜ ~ hadoop fs -ls /user
Found 13 items
drwx------ - accumulo hdfs 0 2019-07-19 13:53 /user/accumulo
drwxr-xr-x - admin hadoop 0 2019-07-25 15:17 /user/admin
drwxrwx--- - ambari-qa hdfs 0 2019-07-19 13:54 /user/ambari-qa
drwxr-xr-x - druid hadoop 0 2019-07-19 13:53 /user/druid
drwxr-xr-x - hbase hdfs 0 2019-07-19 13:50 /user/hbase
drwx------ - hdfs hdfs 0 2019-08-26 10:41 /user/hdfs
drwxr-xr-x - hive hdfs 0 2019-08-26 10:24 /user/hive
drwxr-xr-x - h_etl hdfs 0 2019-08-09 14:54 /user/h_etl
drwxr-xr-x - ml1 hdfs 0 2019-08-01 15:43 /user/ml1
drwxrwxr-x - oozie hdfs 0 2019-07-19 13:56 /user/oozie
drwxr-xr-x - q_etl hdfs 0 2019-08-05 10:56 /user/q_etl
drwxrwxr-x - spark hdfs 0 2019-07-19 13:57 /user/spark
drwxr-xr-x - zeppelin hdfs 0 2019-08-23 15:45 /user/zeppelin
Notice the difference for users ml1 and q_etl: they show numerical user values when running ls on the NFS locations, rather than their user names.
Even doing something like...
[hdfs@HW04 ml1]$ hadoop fs -chown ml1 /user/ml1
does not change the NFS permissions. Even more annoying, when trying to change the NFS mount permissions as root, we see
[root@HW04 ml1]# chown ml1 /nfs_mount_root/user/ml1
chown: changing ownership of ‘/nfs_mount_root/user/ml1’: Permission denied
This causes real problems, since the differing uid means that I can't access these dirs even as the "correct" user to write to them. Not sure what to make of this. Anyone with more Hadoop experience have any debugging suggestions or fixes?
UPDATE:
Doing a bit more testing/debugging, I found that the rules appear to be...
If the NFS server node has no uid (or gid?) that matches the uid of the user on the node accessing the NFS mount, we get the weird uid values as seen here.
If there is a uid associated with the username of the user on the requesting node, then that is the uid we see assigned to the location when accessing via NFS (even if that uid on the NFS server node does not actually belong to the requesting user), e.g.
[root@HW01 ~]# clush -ab id ml1
---------------
HW[01,04] (2)
---------------
uid=1025(ml1) gid=1025(ml1) groups=1025(ml1)
---------------
HW[02-03] (2)
---------------
uid=1027(ml1) gid=1027(ml1) groups=1027(ml1)
---------------
HW05
---------------
uid=1026(ml1) gid=1026(ml1) groups=1026(ml1)
[root@HW01 ~]# exit
logout
Connection to hw01 closed.
➜ ~ ls -lh /hdpnfs/user
total 6.5K
...
drwxr-xr-x. 6 atlas hdfs 192 Aug 27 12:04 ml1
...
➜ ~ hadoop fs -ls /user
Found 13 items
...
drwxr-xr-x - ml1 hdfs 0 2019-08-27 12:04 /user/ml1
...
[root@HW01 ~]# clush -ab id atlas
---------------
HW[01,04] (2)
---------------
uid=1027(atlas) gid=1005(hadoop) groups=1005(hadoop)
---------------
HW[02-03] (2)
---------------
uid=1024(atlas) gid=1005(hadoop) groups=1005(hadoop)
---------------
HW05
---------------
uid=1005(atlas) gid=1006(hadoop) groups=1006(hadoop)
If you're wondering why I have users on the cluster with varying uids across the cluster nodes, see the problem posted here: How to properly change uid for HDP / ambari-created user? (Note that these odd uid settings for hadoop service users were set up by Ambari by default.)
After talking with someone more knowledgeable in HDP hadoop, I found that the problem is that when Ambari was set up and run to initially install the hadoop cluster, there may have been other preexisting users on the designated cluster nodes.
Ambari creates its various service users by giving each one the next available UID from a node's block of user UIDs. However, prior to installing Ambari and HDP on the nodes, I had created some users on the to-be namenode (and others) in order to do some initial maintenance checks and tests. I should have just done this as root. Adding these extra users offset the UID counter on those nodes, so as Ambari created users on the nodes and incremented the UIDs, it was starting from different counter values on different nodes. Thus the UIDs did not sync and caused problems with HDFS NFS.
To fix this, I...
Used Ambari to stop all running HDP services
Went to Service Accounts in Ambari and copied all of the expected service user name strings
For each user, ran something like id <service username> to get the group(s) for that user. For service groups (which may have multiple members), you can do something like grep 'group-name-here' /etc/group. I recommend doing it this way, as the Ambari docs of default users and groups do not have some of the info that you can get here.
Used userdel and groupdel to remove all the Ambari service users and groups
Then recreated all the groups across the cluster
Then recreated all the users across the cluster (you may need to specify the UID if some nodes have extra users that others do not)
Restarted the HDP services (hopefully everything still runs as if nothing happened, since HDP should be looking up the literal name strings, not the UIDs)
For the last few steps, you can use something like clustershell, e.g.
# remove user
$ clush -ab userdel <service username>
# check that the UID you want to use is actually available on all nodes
$ clush -ab id <some specific UID you want to use>
# assign that UID to a new service user
$ clush -ab useradd --uid <the specific UID> --gid <groupname> <service username>
To get the lowest available UID on each node (so I could pick one that is free on all of them), I used...
# for UID
getent passwd | awk -F: '($3>1000) && ($3<10000) && ($3>maxuid) { maxuid=$3; } END { print maxuid+1; }'
# for GID
getent group | awk -F: '($3>1000) && ($3<10000) && ($3>maxgid) { maxgid=$3; } END { print maxgid+1; }'
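If you want to check this across the whole cluster in one shot, the same one-liner can be pushed out with clush (escaping the $ so the local shell does not expand the awk fields), for example:
# run the UID check on every node and compare the gathered results
clush -ab "getent passwd | awk -F: '(\$3>1000) && (\$3<10000) && (\$3>maxuid) { maxuid=\$3; } END { print maxuid+1; }'"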
Ambari also creates /home dirs for some of these users. Once you are done recreating the users, you will need to fix the ownership of those dirs (you can use something like clush there as well).
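For example, following the same clush pattern as above (the placeholders and path are illustrative, not specific to this cluster):
# re-own each recreated service user's home dir across all nodes
clush -ab chown -R <service username>:<service group> /home/<service username>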
* Note that this was a huge pain and you would need to manually correct the UIDs of users whenever you added another cluster node. I did this for a test cluster, but for production (or even a larger test) you should just use Kerberos or SSSD + Active Directory.

AvroStorage - output file name definition

I use AvroStorage to store the result set from Pig. Is there a way to store the data into one specific Avro file, e.g. OutputFileGen1? Pig is storing the data into a directory named OutputFileGen1 with the structure listed below:
ls -al OutputFileGen1/
total 20
drwxr-xr-x 2 root root 4096 2016-01-18 14:35 .
drwxr-xr-x 6 root root 4096 2016-01-19 10:27 ..
-rw-r--r-- 1 root root 4083 2016-01-18 14:35 part-m-00000.avro
-rw-r--r-- 1 root root 40 2016-01-18 14:35 .part-m-00000.avro.crc
-rw-r--r-- 1 root root 0 2016-01-18 14:35 _SUCCESS
-rw-r--r-- 1 root root 8 2016-01-18 14:35 ._SUCCESS.crc
Thank you
The number of part files in the Pig output directory depends on how many parallel tasks your job runs. Here you only have one file: part-m-00000.avro.
http://pig.apache.org/docs/r0.8.1/cookbook.html#Use+the+Parallel+Features
But maybe you want a single file on purpose; in that case I suggest using the hadoop fs -getmerge <src dir> <local dst file> command to pull the output down to the local file system as one file, so you can use the data it contains.
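For example, with the output directory from the question (the local file name is just an example):
# merge the part file(s) from the output directory into one local file;
# with a single part-m-00000.avro this is effectively just a copy
hadoop fs -getmerge OutputFileGen1 ./OutputFileGen1.avro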

Use of Group and roles in HDFS

I am new to HDFS. When I run the hadoop fs -ls /tmp/data command, I get the output below:
-rw-r--r-- 2 root root 52784 2014-08-01 09:52 /tmp/data/sample1.pdf
-rw-r--r-- 2 root root 52784 2014-08-01 09:52 /tmp/data/Sample2.pdf
From this output I want to know: which is the group, and what is the use of the group? And which is the user?
The 1st root is the user, the 2nd root is the group.
A group collects users together so that access to HDFS directories can be restricted based on group membership.
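As a sketch of how that restriction works (the group name analysts is just an example, not something from your cluster):
# give the directory to a group and drop access for everyone else,
# so only the owner and members of that group can list/read it
hdfs dfs -chgrp -R analysts /tmp/data
hdfs dfs -chmod -R 750 /tmp/data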

How to put a file to hdfs with secondary group?

I have a local file
-rw-r--r-- 1 me developers 102445154 Oct 22 10:02 file1.csv
which I'm attempting to put to hdfs:
/usr/bin/hdfs dfs -put ./file1.csv hdfs://000.00.00.00/user/me/
which works fine, but the group is wrong
-rw-r--r-- 3 me me 102445154 2013-10-22 10:23 hdfs://000.00.00.00/user/file1.csv
How do I get the group developers to come with?
Use hdfs dfs -chgrp on the file after the put. (HDFS gives a new file the group of its parent directory rather than the group of the local file, so you have to set it afterwards.)
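For example, using the destination path from the put command in the question:
# set the group on the uploaded file (add -R to change a whole directory)
hdfs dfs -chgrp developers hdfs://000.00.00.00/user/me/file1.csv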

Samba Share Permissions Issue - Public share with file-system permissions only

I'm trying to create a Samba share on a Linux (SLES10) system, but I'm having trouble with the Samba permissions. I want to create this as a public share, with file permissions controlled at the file-system level (so all users can map the drive, but they can only browse further if they have the necessary file-system permissions).
I've been able to create the share and map to it with any user. The problem is that I only seem able to obtain sufficient permissions if I log in as "root". If I log in with another user, even one with permissions to read and write to the underlying folder, I cannot browse any folders at all.
Here is the share information from /etc/samba/smb.conf:
[sambatest]
comment = Samba Test
public = yes
path = /var/opt/folder
read only = No
writeable = Yes
write list = user1 user2 user3
browseable = Yes
Here is an example of the directory permissions in the shared folder:
drwxrwxr-x 5 user1 group1 40 Nov 4 17:02 .
drwxr-xr-x 11 user1 group1 4096 Oct 20 09:20 ..
drwxrwx--- 4 user1 group1 41 Nov 4 17:02 BASE
drwxrwx--- 6 user1 group1 78 Oct 28 10:11 Files
drwxrwx--- 2 user1 group1 22 Nov 4 17:02 test
After mapping the drive with the credentials of "user1", I try to browse "test" from Windows XP, but get the message "Z:\test is not accessible: Access is denied".
If I map the same shared folder using the "root" credentials, it works.
Little help? I'm sure I've come across this before, but can't figure out how to fix it...
I think you need to chmod that dir to 755 and try with this config:
[sambatest]
comment = Samba Test
public = yes
path = /var/opt/folder
read only = yes
writeable = yes
write list = user1 user2 user3
browseable = yes
create mask = 0775
More info on:
http://oreilly.com/openbook/samba/book/ch06_02.html
http://www.cyberciti.biz/tips/how-do-i-set-permissions-to-samba-shares.html
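A sketch of the chmod suggestion above (note it opens the subfolders to read/execute for everyone, trading away the group-only restriction; the share root is already world-executable):
# open the subfolders that currently block non-group users
chmod 755 /var/opt/folder/BASE /var/opt/folder/Files /var/opt/folder/test
# check the smb.conf changes and reload Samba (SLES10 init script)
testparm -s
rcsmb restart    # or: /etc/init.d/smb restart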
