SafeModeException at cosmos.lab.fi-ware.org - hadoop

According to the wiki
http://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/BigData_Analysis_-_Quick_Start_for_Programmers#Step_1._Create_a_Cosmos_account
and after connecting via
ssh my_user@cosmos.lab.fi-ware.org
1) I realize that there is no folder '/user/my_user', only '/home/my_user'. Why? May I suppose that my user was not properly created?
2) I am trying to create a folder, but I get the SafeModeException:
hadoop fs -mkdir /home/my_user/test
mkdir: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /home/my_user/test. Name node is in safe mode.
I have tried:
hadoop dfsadmin -safemode leave
with this result:
safemode: org.apache.hadoop.security.AccessControlException: Access denied for user my_user. Superuser privilege is required
Thanks!

Pablo, yesterday the NameNode of the Cosmos instance entered safe mode because the HDD was running out of space. It should be fixed now, but while safe mode was enabled nothing could be done with HDFS, including your user account creation.
I have completed the registration process manually, try it and let me know if something is still wrong (I also answered you by private email, with all the details regarding your user).
Regarding the Hadoop commands you tried (leaving safe mode and creating a folder under /user), these are privileged operations :)
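For reference, once the account exists, creating folders inside your own HDFS home needs no special privileges, while checking or leaving safe mode stays with the HDFS superuser. A rough sketch, assuming a standard Hadoop CLI and a superuser account named hdfs (which on Cosmos only the administrators have):
hadoop fs -mkdir /user/my_user/test
hadoop dfsadmin -safemode get
sudo -u hdfs hadoop dfsadmin -safemode leave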

Related

Why does h2o require write access on hdfs root directory?

Seeing error message
Job setup failed : org.apache.hadoop.security.AccessControlException: Permission denied: user=airflow, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399) at ...
when trying to start the h2o cluster (h2o-3.28.0.1-hdp3.1). I.e., it appears that it does not like that the root HDFS dir hdfs:/// does not have write permissions for my user (and giving write access to my user via Ranger does appear to fix the problem), but this seems wrong.
From past experience, I've seen this for the case where the launching user does not have write permissions to their own hdfs:///user/<username> folder, but it seems odd to me that h2o wants the user to have write access over the entire top-level HDFS dir. Is this normal? Can I change this?
Possibly related: after starting the cluster, I find that I can't kill it manually in the YARN ResourceManager UI or by killing the PID; instead I need to go to the h2o cluster URL and use the admin tab to shut down the cluster. Any ideas why this would happen?
Found the problem. I can't find the docs / other post detailing this right now, but basically, when running the hadoop jar h2odriver.jar ... command, there is an optional param called -output where you would normally put an HDFS location that h2o will write stuff to (from what I can recall, this is some legacy directory that is not super important).
I had forgotten that this is an HDFS location and put a local temp folder's absolute path there. The error arose because h2o was trying to create that folder by creating the entire path in HDFS leading to it, thus requiring write access starting from the HDFS root dir. The correct value would be something like /user/<username>.
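For anyone hitting the same thing, here is a hedged sketch of a corrected launch; the node count, mapper memory and directory name are only illustrative, the important part is that -output points at an HDFS path under your own home directory:
hadoop jar h2odriver.jar -nodes 3 -mapperXmx 6g -output /user/<username>/h2o_driver_out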

Spark/Hadoop can't read root files

I'm trying to read, through Spark, a file inside a folder that only I (and root) can read/write. First I start the shell with:
spark-shell --master yarn-client
then I:
val base = sc.textFile("file:///mount/bases/FOLDER_LOCKED/folder/folder/file.txt")
base.take(1)
And got the following error:
2018-02-19 13:40:20,835 WARN scheduler.TaskSetManager:
Lost task 0.0 in stage 0.0 (TID 0, mydomain, executor 1):
java.io.FileNotFoundException: File file:/mount/bases/FOLDER_LOCKED/folder/folder/file.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
...
I suspect that, since yarn/hadoop was launched by the user hadoop, it can't go further into this folder to get the file. How could I solve this?
Note: this folder can't be opened to other users because it contains private data.
EDIT1: /mount/bases is network storage, mounted via a CIFS connection.
EDIT2: HDFS and YARN were launched by the user hadoop.
Since hadoop is the user that launched HDFS and YARN, it is the user that will try to open the file during a job, so it must be authorized to access this folder. Fortunately, Hadoop checks which user is executing the job before allowing access to a folder/file, so you are not taking any risk here.
Well, if it had been an access-related issue with the file, you would have got 'access denied' as the error. In this particular scenario, I think the file that you are trying to read is not present at all, or has some other name (typo). Just check the file name.
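A quick way to tell the two explanations apart is to stat the path on the worker node both as your own account and as the daemon user from the question (hadoop); a sketch:
ls -l /mount/bases/FOLDER_LOCKED/folder/folder/file.txt
sudo -u hadoop ls -l /mount/bases/FOLDER_LOCKED/folder/folder/file.txt
If only the second command fails with 'Permission denied', the mount's permissions are blocking the hadoop user; if both report 'No such file or directory', the path or file name is wrong.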

Hortonworks Practice Exam - Copy File from local machine to hdfs ERROR

I am currently working on the Hortonworks practice exam and I am getting errors I have not been able to troubleshoot.
During the first step, the prompt asks me to put the three files from the /home/horton/datasets/flightdelays directory on the local machine into the /user/horton/flightdelays directory in HDFS, and I am getting a permission denied error. When on the node that HDFS is installed on (root@namenode), I run the simple command:
hadoop fs -copyFromLocal /home/horton/datasets/flightdelays/flight_delays1.csv /user/horton/flightdelays
This returns the error /home/horton/datasets/flightdelays/flight_delays1.csv no such file or directory
When I run the same exact command from the command line on the local machine, instead of running it after being ssh'd onto the namenode (horton@some-ip), I get a permission denied error:
Permission denied: user=horton, access=WRITE, inode="/user/horton/flightdelays":hdfs:hdfs:drwxr-xr-x
If anyone has done this practice exam before or knows what this error is and could lend any assistance, it would be greatly appreciated. When researching online, a lot of people run into the same permission denied issue, but I'm going to assume that on a practice exam they set up, you shouldn't need to use sudo for every command you run.
Again any help would be fantastic thanks!!
Try this on the CLI:
sudo -u hdfs hdfs dfs -copyFromLocal /input/file/path /hdfs/path/
Try this in your command line:
hadoop fs -put /localfile.txt /hdfs/path
The issue is that the folder you're trying to write to has ownership and permissions of hdfs:hdfs:drwxr-xr-x, meaning it is owned by the 'hdfs' user and group. Only the hdfs user has write permission on that folder; everyone else has read and execute permissions only. Thus writing to that folder as the 'horton' user will not work.
You need to run the command as hdfs like so:
sudo -u hdfs hadoop fs -copyFromLocal /home/horton/datasets/flightdelays/flight_delays1.csv /user/horton/flightdelays
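Alternatively, if the environment allows it, create the target directory once as hdfs and hand it over to horton, so that the remaining commands can run without sudo. A sketch, assuming the usual hdfs superuser account:
sudo -u hdfs hadoop fs -mkdir -p /user/horton/flightdelays
sudo -u hdfs hadoop fs -chown -R horton:horton /user/horton
hadoop fs -copyFromLocal /home/horton/datasets/flightdelays/flight_delays1.csv /user/horton/flightdelays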

"Permission denied" for almost everything after a successful ssh into gcloud instance that was created using bdutil

I just created an instance and deployed a cluster using bdutil. SSH works fine, as I can ssh into the instance using ./bdutil shell.
When I try to access directories such as hadoop, hdfs, etc., it throws an error:
Permission Denied
The terminal prompt looks like this: username@hadoop-m $. I know hadoop-m is the name of the instance. What is the username? It shows my name, but I don't know where it got this from or what the password is.
I am using Ubuntu to ssh into the instance.
I'm not a Hadoop expert, so I can only answer a bit generally. On GCE, when you ssh in, gcloud creates a username from your Google account name. Hadoop directories such as hadoop or hdfs are probably owned by a different user. Please try using sudo chmod to give your username permission to read/write the directories you need.
To elaborate on Jeff's answer, bdutil-deployed clusters set up the user hadoop as the Hadoop admin (this 'admin' user may differ on other Hadoop systems, where Hadoop admin accounts may be split into separate users hdfs, yarn, mapred, etc.). Note that bdutil clusters should work without needing to deal with Hadoop admin stuff for normal jobs, but if you need to access those Hadoop directories, you can either do:
sudo su hadoop
or
sudo su
to open a shell as hadoop or root, respectively. Or as Jeff mentions, you can sudo chmod to grant broader access to your own username.
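A minimal sketch of both options; the directory in the last command is only an illustration, substitute whichever one you were denied access to:
sudo su hadoop
hadoop fs -ls /
exit
sudo chmod -R o+rx /home/hadoop
The first three lines run an HDFS command as the hadoop admin user and then drop back to your own shell; the last line instead opens up read/execute access for other accounts, which includes your gcloud-created username.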

Hadoop: Pseudo Distributed mode for multiple users

I appreciate your help in advance.
I have set up Hadoop in pseudo-distributed mode using the root user's credentials. I want to give multiple users (let us say hadoop1, hadoop2, etc.) access so that they can submit and run MapReduce jobs on this cluster. How do we get this done?
What I have done so far?
> - Set up Hadoop to run in pseudo-distributed mode
> - Used "root" user credentials to set this up.
> - Added users hadoop1 and hadoop2 to a group called "hadoop".
> - Added root also to be part of the group "hadoop".
> - Created a folder called hdfstmp and set this as the path for hadoop.tmp.dir.
> - Started the cluster using bin/start-all.sh
> - Ran MapReduce jobs using hadoop1 and hadoop2 users.
I got the error below:
Exception in thread "main" java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1006)
at java.io.File.createTempFile(File.java:1989)
at org.apache.hadoop.util.RunJar.main(RunJar.java:119)
To overcome this error, I gave the group "hadoop" rwx permissions on the hdfstmp folder. The permissions on this folder now look like drwxrwxr-x.
I then submitted MapReduce jobs using the hadoop1 and hadoop2 user logins. The jobs ran fine without any errors.
However, if I do a stop-all.sh and then a start-all.sh, the DataNode (and occasionally even the NameNode) does not start up. When I check the logs, I see an error like the one below:
2013-09-21 16:43:54,518 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /data/hdfstmp/dfs/data, expected: rwxr-xr-x, while actual: rwxrwxr-x
Now, without the change to the group permissions on the hdfstmp directory, my MR jobs submitted by different users do not run. But with that change, when the NameNode gets restarted, I get the issue above.
How do I overcome this? What is the best practice for this?
Also, is there a way to monitor the jobs that are being submitted by the different users? I am assuming the Web UI should allow me to do this. Please confirm.
I appreciate any assistance you can provide me on this issue. Thanks.
Regards
Adding a dedicated Hadoop system user
We will use a dedicated Hadoop user account for running Hadoop. While that’s not required it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc).
#addgroup hadoop
#adduser --ingroup hadoop hadoop1
#adduser --ingroup hadoop hadoop2
This will add the users hadoop1 and hadoop2 and the group hadoop to your local machine.
Change the ownership of your Hadoop installation directory (replace hduser below with the account that owns your Hadoop installation):
chown -R hduser:hadoop hadoop
And lastly, change the permissions of the Hadoop temporary directory.
If your temp directory is /app/hadoop/tmp:
#mkdir -p /app/hadoop/tmp
#chown hduser:hadoop /app/hadoop/tmp
and if you want to tighten up security, chmod from 755 to 750...
#chmod 750 /app/hadoop/tmp
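Tying this back to the errors above, here is a hedged sketch of the usual multi-user setup on a pseudo-distributed cluster; the HDFS commands must be run by the account that started the NameNode (root in the question), and the paths follow the question and its log message:
hadoop fs -mkdir /user/hadoop1
hadoop fs -mkdir /user/hadoop2
hadoop fs -chown hadoop1:hadoop /user/hadoop1
hadoop fs -chown hadoop2:hadoop /user/hadoop2
chmod 755 /data/hdfstmp/dfs/data
The last line addresses the DataNode warning quoted in the question: the DataNode insists on rwxr-xr-x for its own data directory, so keep the group write bit on hdfstmp itself but drop it on dfs/data. As for monitoring, on a default Hadoop 1.x install the JobTracker web UI (port 50030) lists submitted jobs along with the user that submitted each one.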
