What's the standard way to create files in your hdfs filesystem? - hadoop

I learned that I have to configure the NameNode and DataNode dir in hdfs-site.xml. So that's my hdfs-site.xml configuration on the NameNode:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file://usr/local/hadoop-2.6.0/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
</configuration>
I did almost the same on my DataNode and changed dfs.namenode to dfs.datanode.
Then I formatted the filesystem via
hadoop namenode -format
Everything seems to be finished without an error.
Then I wanted to create a directory in my HDFS filesystem by using:
hdfs dfs -mkdir test
And I got an error:
mkdir: `test': No such file or directory
What did I miss or what's the common process from formatting to creating files/directories with HDFS?

Well, it's so easy.
hdfs dfs -mkdir /test
was created successfully.
hdfs dfs -put myFile /test/myFile
works as well.

Create a directory:
hdfs dfs -mkdir directoryName
Create a new file in directory
hdfs dfs -touchz directoryName/Newfilename
Write into newly created file in HDFS
nano filename
Save it Cntr+X Y
Read the newly created file from HDFS
nano fileName
Or
hdfs dfs -cat directoryName/fileName

HDFS is a non POSIX compliant file systems so you can't edit files directly inside of HDFS, however you can Copy a file from your local system to HDFS using following command:
hdfs dfs -put /path/in/source/system/filename /path/in/HDFS/system/destination

If you want to create multiple sub-directories then you should also use -p flag:
hdfs dfs -mkdir -p /test/another_test/one_more_test

Related

access file in datanode from namenode hadoop

I have configured the namenode in hadoopmaster and data node in hadoopslave.When start-dfs.sh in hadoopmaster the namenode in hadoopmaster and datanode in hadoopslave getting started.when i try to execute the command hdfs dfs -ls / in hadoopmaster i can't view the files that i had put in hadoopslave.
Note:I put a file in hadoopslave using hdfs dfs -put /sample.txt /
Correct me if I'am wrong!

Cluster configuration and hdfs

Im trying to configure my cluster by following this tutorial -
https://developer.yahoo.com/hadoop/tutorial/module2.html
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.71.128:9000</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop-user/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop-user/hdfs/name</value>
</property>
</configuration>
I have also copied a local file to /user/prema/ using the below commands
hadoop-user#hadoop-desk:~/hadoop$ bin/hadoop dfs -put /home/hadoop-user/googlebooks-eng-all-1gram-20120701-0 /user/prema
hadoop-user#hadoop-desk:~/hadoop$ bin/hadoop dfs -ls /user/prema
Found 1 items
-rw-r--r-- 1 hadoop-user supergroup 192403080 2014-11-19 02:43 /user/prema
Now, I'm confused. I have datafiles here- /user/prema but the data node in the cluster config points to this - /home/hadoop-user/hdfs/data..How does it get related?
The /user/prema is a folder within HDFS. The folder /home/hadoop-user/hdfs/data is a folder within the regular filesystem.
The regular filesystem folder is the place where HDFS stores its data. So when you read data from HDFS, it actually goes to the physical regular filesystem folder to read the data. You should never need to touch this data as its format is not very user-friendly - the HDFS takes care of data manipulation for you.

hadoop file system list my own root directory

I met a very wired situation when I try to install single node hadoop yarn 2.2.0 on my mac. I follow the tutorial on this link: http://raseshmori.wordpress.com/2012/09/23/install-hadoop-2-0-1-yarn-nextgen/.
When I start the hadoop, and jps to check the status, it shows: (which means normal, I think)
5552 Jps
7162 ResourceManager
7512 Jps
7243 NodeManager
6962 DataNode
7060 SecondaryNameNode
6881 NameNode
However, after enter
hadoop fs -ls /
The files lists are the files in my own root but not the hadoop file system root. There must be some error when I set the hadoop that mix my own fs with the hdfs. Could any one give me a hint about it?
Use the following command for accessing HDFS
hadoop fs -ls hdfs://localhost:9000/
Or
Populate ${HADOOP_CONF_DIR}/core-site.xml as follows. If your doing so even without specifying hdfs:// URI you will be able to access HDFS.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Add the following line at the starting of the file $HOME/yarn/hadoop-2.0.1-alpha/libexec/hadoop-config.sh
export HADOOP_CONF_DIR=$HOME/yarn/hadoop-2.0.1-alpha/etc/hadoop

Namenode not getting started

I was using Hadoop in a pseudo-distributed mode and everything was working fine. But then I had to restart my computer because of some reason. And now when I am trying to start Namenode and Datanode I can find only Datanode running. Could anyone tell me the possible reason of this problem? Or am I doing something wrong?
I tried both bin/start-all.sh and bin/start-dfs.sh.
I was facing the issue of namenode not starting. I found a solution using following:
first delete all contents from temporary folder: rm -Rf <tmp dir> (my was /usr/local/hadoop/tmp)
format the namenode: bin/hadoop namenode -format
start all processes again:bin/start-all.sh
You may consider rolling back as well using checkpoint (if you had it enabled).
hadoop.tmp.dir in the core-site.xml is defaulted to /tmp/hadoop-${user.name} which is cleaned after every reboot. Change this to some other directory which doesn't get cleaned on reboot.
Following STEPS worked for me with hadoop 2.2.0,
STEP 1 stop hadoop
hduser#prayagupd$ /usr/local/hadoop-2.2.0/sbin/stop-dfs.sh
STEP 2 remove tmp folder
hduser#prayagupd$ sudo rm -rf /app/hadoop/tmp/
STEP 3 create /app/hadoop/tmp/
hduser#prayagupd$ sudo mkdir -p /app/hadoop/tmp
hduser#prayagupd$ sudo chown hduser:hadoop /app/hadoop/tmp
hduser#prayagupd$ sudo chmod 750 /app/hadoop/tmp
STEP 4 format namenode
hduser#prayagupd$ hdfs namenode -format
STEP 5 start dfs
hduser#prayagupd$ /usr/local/hadoop-2.2.0/sbin/start-dfs.sh
STEP 6 check jps
hduser#prayagupd$ $ jps
11342 Jps
10804 DataNode
11110 SecondaryNameNode
10558 NameNode
In conf/hdfs-site.xml, you should have a property like
<property>
<name>dfs.name.dir</name>
<value>/home/user/hadoop/name/data</value>
</property>
The property "dfs.name.dir" allows you to control where Hadoop writes NameNode metadata.
And giving it another dir rather than /tmp makes sure the NameNode data isn't being deleted when you reboot.
Open a new terminal and start the namenode using path-to-your-hadoop-install/bin/hadoop namenode
The check using jps and namenode should be running
Why do most answers here assume that all data needs to be deleted, reformatted, and then restart Hadoop?
How do we know namenode is not progressing, but taking lots of time.
It will do this when there is a large amount of data in HDFS.
Check progress in logs before assuming anything is hung or stuck.
$ [kadmin#hadoop-node-0 logs]$ tail hadoop-kadmin-namenode-hadoop-node-0.log
...
016-05-13 18:16:44,405 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 117/141 transactions completed. (83%)
2016-05-13 18:16:56,968 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 121/141 transactions completed. (86%)
2016-05-13 18:17:06,122 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 122/141 transactions completed. (87%)
2016-05-13 18:17:38,321 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 123/141 transactions completed. (87%)
2016-05-13 18:17:56,562 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 124/141 transactions completed. (88%)
2016-05-13 18:17:57,690 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 127/141 transactions completed. (90%)
This was after nearly an hour of waiting on a particular system.
It is still progressing each time I look at it.
Have patience with Hadoop when bringing up the system and check logs before assuming something is hung or not progressing.
In core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/yourusername/hadoop/tmp/hadoop-${user.name}
</value>
</property>
</configuration>
and format of namenode with :
hdfs namenode -format
worked for hadoop 2.8.1
If anyone using hadoop1.2.1 version and not able to run namenode, go to core-site.xml, and change dfs.default.name to fs.default.name.
And then format the namenode using $hadoop namenode -format.
Finally run the hdfs using start-dfs.sh and check for service using jps..
Did you change conf/hdfs-site.xml dfs.name.dir?
Format namenode after you change it.
$ bin/hadoop namenode -format
$ bin/hadoop start-all.sh
If you facing this issue after rebooting the system, Then below steps will work fine
For workaround.
1) format the namenode: bin/hadoop namenode -format
2) start all processes again:bin/start-all.sh
For Perm fix: -
1) go to /conf/core-site.xml change fs.default.name to your custom one.
2) format the namenode: bin/hadoop namenode -format
3) start all processes again:bin/start-all.sh
Faced the same problem.
(1) Always check for the typing mistakes in the configuring the .xml files, especially the xml tags.
(2) go to bin dir. and type ./start-all.sh
(3) then type jps , to check if processes are working
Add hadoop.tmp.dir property in core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/yourname/hadoop/tmp/hadoop-${user.name}</value>
</property>
</configuration>
and format hdfs (hadoop 2.7.1):
$ hdfs namenode -format
The default value in core-default.xml is /tmp/hadoop-${user.name}, which will be deleted after reboot.
Try this,
1) Stop all hadoop processes : stop-all.sh
2) Remove the tmp folder manually
3) Format namenode : hadoop namenode -format
4) Start all processes : start-all.sh
If you kept default configurations when running hadoop the port for the namenode would be 50070. You will need to find any processes running on this port and kill them first.
Stop all running hadoop with : bin/stop-all.sh
check all processes running in port 50070
sudo netstat -tulpn | grep :50070 #check any processes running in
port 50070, if there are any the / will
appear at the RHS of the output.
sudo kill -9 <process_id> #kill_the_process.
sudo rm -r /app/hadoop/tmp #delete the temp folder
sudo mkdir /app/hadoop/tmp #recreate it
sudo chmod 777 –R /app/hadoop/tmp (777 is given for this example purpose only)
bin/hadoop namenode –format #format hadoop namenode
bin/start-all.sh #start-all hadoop services
Refer this blog
For me the following worked after I changed the directory of the namenode
and datanode in hdfs-site.xml
-- before executing the following steps stop all services with stop-all.sh or in my case I used the stop-dfs.sh to stop the dfs
On the new configured directory, for every node (namenode and datanode), delete every folder/files inside it (in my case a 'current' directory).
delete the Hadoop temporary directory: $rm -rf /tmp/haddop-$USER
format the Namenode: hadoop/bin/hdfs namenode -format
start-dfs.sh
After I followed those steps my namenode and datanodes were alive using the new configured directory.
I ran $hadoop namenode to start namenode manually at foreground.
From the logs I figured out that 50070 is ocuupied, which was defaultly used by dfs.namenode.http-address. After configuring dfs.namenode.http-address in hdfs-site.xml, everything went well.
I got the solution just share with you that will work who got the errors:
1. First check the /home/hadoop/etc/hadoop path, hdfs-site.xml and
check the path of namenode and datanode
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
2.Check the permission,group and user of namenode and datanode of the particular path(/home/hadoop/hadoopdata/hdfs/datanode), and check if there are any problems in all of them and if there are any mismatch then correct it. ex .chown -R hadoop:hadoop in_use.lock, change user and group
chmod -R 755 <file_name> for change the permission
After deleting a resource managers' data folder, the problem is gone.
Even if you have formatting cannot solve this problem.
If your namenode is stuck in safemode you can ssh to namenode, su hdfs user and run the following command to turn off safemode:
hdfs dfsadmin -fs hdfs://server.com:8020 -safemode leave
Instead of formatting namenode, may be you can use the below command to restart the namenode. It worked for me:
sudo service hadoop-master restart
hadoop dfsadmin -safemode leave
I was facing the same issue of namenode not starting with Hadoop-3.2.1**** version. I did the steps to resolve the issue:
Delete the contents from temporary folder from the name node directory. In my case the "current" directory made by root user: rm -rf (dir name)
Format the namenode: hdfs namenode -format
start the processes again:start-dfs.sh
Point #1 has change in the hdfs-site.xml file.
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/hadoop/node-data/hdfs/namenode</value>
</property>
I ran into the same thing after a restart.
for hadoop-2.7.3 all I had to do was format the namenode:
<HadoopRootDir>/bin/hdfs namenode -format
Then a jps command shows
6097 DataNode
755 RemoteMavenServer
5925 NameNode
6293 SecondaryNameNode
6361 Jps

Having issue setting up Hadoop

The issue I'm having is that when I run bin/hadoop fs -ls it prints out all the files of the local directory that I'm in and not the files in hdfs (which currently should be none). Here's how I set everything up:
I've downloaded and unzipped all the 0.20.2 files into /home/micah/hadoop-install/. I've edited my conf/hdfs-site.xml with the following settings and created the appropriate directories:
<configuration>
<property>
<name>fs.default.name</name>
<value>localhost:9000</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/micah/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/micah/hdfs/name</value>
</property>
</configuration>
I then ran bin/hadoop namenode -format followed by bin/start-dfs.sh.
Try this:
#http://www.mail-archive.com/common-user#hadoop.apache.org/msg00407.html
rm -r /tmp/hadoop****
build $HADOOP_HOME
echo export JAVA_HOME=$JAVA_HOME >> $HADOOP_HOME/conf/hadoop-env.sh
echoThenRun "$HADOOP_HOME/bin/stop-all.sh"
echoThenRun "$HADOOP_HOME/bin/hadoop namenode -format"
I had a similar issue and found that my HDFS data directory permissions were wrong.
Removing group write privileges with chmod -R g-w from the data directory fixed the problem.
Thanks , Doing the following resolved my issue
rm -r /tmp/hadoop****
build $HADOOP_HOME
echo export JAVA_HOME=$JAVA_HOME >> $HADOOP_HOME/conf/hadoop-env.sh
echoThenRun "$HADOOP_HOME/bin/stop-all.sh"
echoThenRun "$HADOOP_HOME/bin/hadoop namenode -format"

Resources