Hadoop permission issue

I've installed Hadoop with Homebrew, but now I'm having permission problems when running
hadoop namenode -format and ./start-all.sh.
I think it's because of the settings I put in "core-site.xml": I pointed "hadoop.tmp.dir" at a path under "/tmp/${name}".
Now namenode -format fails with an error: can't create folder, permission denied.
Even when I sudo that command, start-all.sh still reports a lot of permission-denied errors. I also tried sudo start-all.sh with my Mac admin password, but that was denied as well.
I think these are permission issues. Is there any way I can fix them?
Thanks!

It looks like the hduser user has not been created on your local system.
A typical setup is to create a hadoop group and an hduser user that belongs to that group.
You can do that from the root/super user account with the following command:
$ sudo adduser --ingroup hadoop hduser
This assumes the hadoop group is already set up. If it is not, you can create it with:
$ sudo addgroup hadoop
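If you want to double-check the result (a quick sanity check, nothing Hadoop-specific), inspect the new account:
$ id hduser   # should list hadoop among the user's groups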

When you run Hadoop, it stores things in the data, name, and tmp dirs that you configure in the hdfs-site.xml file. If you don't set these properties, they default to paths under ${hadoop.tmp.dir} (e.g. ${hadoop.tmp.dir}/dfs/data), which in your case is the /tmp dir. That is not where you want your data stored. So you will first need to add these to your HDFS config file, among other settings.
On master :
<property>
<name>dfs.data.dir</name>
<value>/app/hadoop/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/app/hadoop/name</value>
</property>
On slaves :
<property>
<name>dfs.data.dir</name>
<value>/app/hadoop/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>master:/app/hadoop/name</value>
</property>
Once this is done, you must actually create those directories. On the master, create the following dirs:
/app/hadoop/name, /app/hadoop/data, and /app/hadoop/tmp.
Create the same on the slaves, except for the name dir.
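For example (a sketch using the paths above):
sudo mkdir -p /app/hadoop/name /app/hadoop/data /app/hadoop/tmp   # on the slaves, omit /app/hadoop/name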
Now set the permissions so the directories can be used by Hadoop (the second command is just to be safe):
sudo chown <hadoop user>:<hadoop user> /app/hadoop/name /app/hadoop/data /app/hadoop/tmp
sudo chmod 0777 /app/hadoop/name /app/hadoop/data /app/hadoop/tmp
Try that and see if it works. I can answer follow-up questions if this isn't the whole answer.

Related

Hadoop hdfs: input/output error when creating user folder

I've followed the instructions in Hadoop: The Definitive Guide, 4th edition, Appendix A, to configure Hadoop in pseudo-distributed mode. Everything is working well, except when I try to make a directory:
hadoop fs -mkdir -p /user/$USER
The command returns the following message: mkdir: `/user/my_user_name': Input/output error.
However, when I first log into my root account with sudo -s and then run hadoop fs -mkdir -p /user/$USER, the directory /user/root is created (along with all directories in the path).
I think I'm having Hadoop permission issues.
Any help would be really appreciated.
Thanks.
It means you have a mistake in the core-site.xml file. For instance, I had an error in a property name: I wrote 'fa.defaultFS' instead of 'fs.defaultFS'.
After fixing that, you have to run the 'stop-all.sh' script to stop Hadoop. You will probably also have to reformat the namenode with the commands 'rm -Rf /app/tmp/your-username/*' and 'hdfs namenode -format'. Next, start Hadoop again with the 'start-all.sh' script.
You may also have to reboot the system after running the stop script.
After these steps, I could run that command again.
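Put together, the recovery sequence looks roughly like this (a sketch; /app/tmp/your-username is just whatever you configured as hadoop.tmp.dir, not a fixed location):
stop-all.sh
rm -Rf /app/tmp/your-username/*   # clear the old namenode data under hadoop.tmp.dir
hdfs namenode -format
start-all.sh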
I corrected the core-site.xml file based on the standard settings and it works fine now:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/your_user_name/hadooptmpdata</value>
<description>Where Hadoop will place all of its working files</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
<description>Where HDFS NameNode can be found on the network</description>
</property>

Namenode not starting -su: /home/hduser/../libexec/hadoop-config.sh: No such file or directory

I installed Hadoop 2.7.1 on Ubuntu 15.10.
Everything is working fine, except that when I run jps I can see all the daemons running except the namenode.
At start it shows: -su: /home/hduser/../libexec/hadoop-config.sh: No such file or directory
When I googled it, I read that I can ignore this, since my
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
are set properly and hduser (the user that runs Hadoop) has the permissions on these folders.
Any clue?
After spending some time on this, the following simple change worked for me.
Run ifconfig and copy your IP address.
Open the hosts file: sudo gedit /etc/hosts
Comment out this line:
#127.0.0.1 localhost
Add the following line (10.0.2.15 is my IP address; use your own):
10.0.2.15 Hadoop-NameNode
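To confirm the new mapping is picked up (an optional check; Hadoop-NameNode is the hostname used above), you can resolve it locally:
getent hosts Hadoop-NameNode   # should print your IP address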
This might be a problem caused by formatting the namenode repeatedly. Please check the namenode logs.
Probable solution:
Check your hadoop.tmp.dir in core-site.xml.
Under that location, make sure the namenode and datanode have the same clusterID (otherwise make them the same).
You can see the clusterID inside the VERSION file in dfs/name/current and dfs/data/current.
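For example, assuming hadoop.tmp.dir is /app/hadoop/tmp (substitute whatever your core-site.xml actually sets), you can compare the two IDs with:
grep clusterID /app/hadoop/tmp/dfs/name/current/VERSION   # namenode
grep clusterID /app/hadoop/tmp/dfs/data/current/VERSION   # datanode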

Hadoop 2.x -- how to configure secondary namenode?

I have an old Hadoop install that I'm looking to update to Hadoop 2. In the
old setup, I have a $HADOOP_HOME/conf/masters file that specifies the
secondary namenode.
Looking through the Hadoop 2 documentation I can't find any mention of a
"masters" file, or how to setup a secondary namenode.
Any help in the right direction would be appreciated.
The slaves and masters files in the conf folder are only used by some scripts in the bin folder, such as start-mapred.sh, start-dfs.sh, and start-all.sh.
These scripts are a mere convenience so that you can run them from a single node to ssh into each master/slave node and start the desired Hadoop service daemons.
You only need these files on the namenode machine if you intend to launch your cluster from that single node (using password-less ssh).
Alternatively, you can start a Hadoop daemon manually on a machine via
bin/hadoop-daemon.sh start [namenode | secondarynamenode | datanode | jobtracker | tasktracker]
To run the secondary namenode, use the above script on the designated machine, passing it the 'secondarynamenode' value.
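For example, on the machine that should host the secondary namenode:
bin/hadoop-daemon.sh start secondarynamenode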
See #pwnz0r's 2nd comment on the answer to "How separate hadoop secondary namenode from primary namenode?"
To reiterate here:
In hdfs-site.xml:
<property>
<name>dfs.secondary.http.address</name>
<value>$secondarynamenode.full.hostname:50090</value>
<description>SecondaryNameNodeHostname</description>
</property>
I am using Hadoop 2.6 and had to use
<property>
<name>dfs.secondary.http.address</name>
<value>secondarynamenode.hostname:50090</value>
</property>
For further details, refer to https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Update the hdfs-site.xml file by adding the following property:
cd $HADOOP_HOME/etc/hadoop
sudo vi hdfs-site.xml
Then paste these lines inside the configuration tag:
<property>
<name>dfs.secondary.http.address</name>
<value>hostname:50090</value>
</property>

Getting error when trying to run Hadoop 2.4.0 (-bash: bin/start-all.sh: No such file or directory)

I am doing the following to install and run Hadoop on my Mac:
First I install Homebrew as the package manager:
ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
Then I install Hadoop using the Brew command:
brew install hadoop
Then the following:
cd /usr/local/Cellar/hadoop/1.1.2/libexec
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Then I configure Hadoop by adding the following to the proper .xml files:
core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
I then enable SSH to localhost:
System Preferences > Sharing > “Remote Login” is checked.
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
I then format the Hadoop filesystem:
bin/hadoop namenode -format
And then I start Hadoop (or at least try to; this is where I get the error):
bin/start-all.sh
I get the error -bash: bin/start-all.sh: No such file or directory.
The one "odd" thing I did during setup was, since there is no longer a mapred-site.xml file in 2.4.0, I simply copied the mapred-site.xml.template file to my desktop, renamed it to mapred-site.xml, and put that new copy in the folder. I also tried running without any mapred-site.xml configuration but I still get this error.
AFAIK, brew installs hadoop-2.4.0 by default. See here: https://github.com/Homebrew/homebrew/blob/master/Library/Formula/hadoop.rb
And in Hadoop 2.x there is no start-all.sh file in the bin folder; it has moved to sbin. You also need some more configuration. These links may be useful:
http://codesfusion.blogspot.in/2013/10/setup-hadoop-2x-220-on-ubuntu.html
https://hadoop.apache.org/docs/r2.2.0/
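As a rough sketch (assuming the Homebrew layout from the question, with the Cellar version directory matching the installed release), the start scripts are run from sbin rather than bin:
cd /usr/local/Cellar/hadoop/2.4.0/libexec   # adjust to your actual Cellar version
sbin/start-dfs.sh
sbin/start-yarn.sh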

Hadoop keeps on writing mapred intermediate output in /tmp directory

I have limited capacity in /tmp, so I want to move all the intermediate output of mapred to a bigger partition, say /home/hdfs/tmp_data.
If I understand correctly, I just need to set
<property>
<name>mapred.child.tmp</name>
<value>/home/hdfs/tmp_data</value>
</property>
in mapred-site.xml
I restarted the cluster through Ambari and checked that everything is written in the conf file;
however, when I run a Pig script, it keeps writing to:
/tmp/hadoop-hdfs/mapred/local/taskTracker/hdfs/jobcache/job_localXXX/attempt_YY/output
I have also modified hadoop.tmp.dir in core-site.xml to be /home/hdfs/tmp_data, but nothing changes.
Is there any parameter that overrides my settings?
Try overriding the following property in the tasktracker nodes' mapred-site.xml file and restart them.
<property>
<name>mapred.local.dir</name>
<value>/home/hdfs/tmp_data</value>
</property>
