How do I check the Hadoop server name? - hadoop

When I try to import data into Excel from HDFS, it asks me to enter the name of the Hadoop server. I am not sure what to type there; kindly help me with this.

Enter the name of your NameNode and try again.

You should enter the name of your NameNode here.
For example, in hdfs-site.xml there is a property called dfs.namenode.http-address; check its value. You need to use the server name mentioned in this property.
For me, it is set to:
<property>
<name>dfs.namenode.http-address</name>
<value>mballur.myorg.com:50070</value>
<description>The address and the base port where the dfs namenode
web ui will listen on. If the port is 0 then the server will
start on a free port.</description>
<final>true</final>
</property>
So, if you take my example, you need to set this to mballur.myorg.com.
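If you would rather query the configuration than open hdfs-site.xml, here is a quick sketch (assuming a Hadoop 2.x installation where the hdfs command is on your PATH):
hdfs getconf -confKey dfs.namenode.http-address
# in my setup this prints mballur.myorg.com:50070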
You can also get the name of the NameNode by running the hadoop fsck command.
For example, when I run the following command:
CMD PROMPT>hadoop fsck /tmp/
I get the following output:
Connecting to namenode via http://mballur.myorg.com:50070/fsck?ugi=mballur&path=%2Ftmp
FSCK started by mballur (auth:SIMPLE) from /192.168.56.1 for path /tmp at Wed Jan 06 18:29:57 IST 2016
In the first line, the host name portion (mballur.myorg.com) is the name of the NameNode:
Connecting to namenode via http://mballur.myorg.com:50070/fsck?ugi=mballur&path=%2Ftmp
Also, check this tutorial on YouTube: https://www.youtube.com/watch?v=_eyE7Qcj0_A

Related

Meaning of fs.defaultFS property in core-site.xml in hadoop

I am trying to set up Hadoop in fully distributed mode, and to some extent I have been successful in doing this.
However, I have a doubt about one of the parameter settings in core-site.xml --> fs.defaultFS
In my set up, I have three nodes as described below:
Node1 -- 192.168.1.2 --> Configured to be Master (Running ResourceManager and NameNode daemons)
Node2 -- 192.168.1.3 --> Configured to be Slave (Running NodeManager and Datanode daemons)
Node3 -- 192.168.1.4 --> Configured to be Slave (Running NodeManager and Datanode daemons)
Now, what does the property fs.defaultFS mean? For example, if I set it like this:
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.2:9000/</value>
</property>
I am not able to understand the meaning of hdfs://192.168.1.2:9000. I can figure out that hdfs means we are using the HDFS file system, but what do the other parts mean?
Does this mean that the host with IP address 192.168.1.2 is running the NameNode at port 9000?
Can anyone help me understand this?
In this code:
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.2:9000/</value>
</property>
Including fs.defaultFS (or the deprecated fs.default.name) in core-site.xml lets you run dfs commands without providing the full URI in the command, e.g. hdfs dfs -ls / instead of hdfs dfs -ls hdfs://hdfs/
This property specifies the default file system. It defaults to your local file system, which is why it needs to be set to an HDFS address. This is important for client configuration as well, so your local configuration file should also include this element.
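As an illustration, a minimal core-site.xml entry using the current property name might look like this (reusing the example address from the question; substitute your own NameNode host and port):
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.1.2:9000</value>
</property>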
As @Shashank explained very appropriately above:
<name>fs.default.name</name>
Here fs denotes the file system and default.name denotes the NameNode.
<value>hdfs://192.168.1.2:9000/</value>
Here 9000 is the port on which the DataNodes send heartbeats to the NameNode, and the rest of the address is the machine's IP, which is resolved to a hostname.
Something important to note about the port: you can use any port greater than 1024, since ports below that require root privileges.
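As a rough sanity check that the NameNode is actually listening on the configured port, something like the following can be run on the NameNode host (assuming netstat is available; 9000 is the port from the example above):
netstat -tln | grep 9000
# a LISTEN line here means the NameNode RPC port is up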

DataNode is Not Starting in singlenode hadoop 2.6.0

I installed Hadoop 2.6.0 on my laptop running Ubuntu 14.04 LTS. I successfully started the Hadoop daemons by running start-all.sh, and I ran a WordCount example successfully. Then I tried to run another jar example that did not work, so I decided to format with hadoop namenode -format and start all over again. But when I start all the daemons using start-dfs.sh && start-yarn.sh and then run jps, every daemon is running except the DataNode, as shown below:
hdferas#feras-Latitude-E4310:/usr/local/hadoop$ jps
12628 NodeManager
12110 NameNode
12533 ResourceManager
13335 Jps
12376 SecondaryNameNode
How to solve that?
I have faced this issue and it is very easy to solve. Your DataNode is not starting because you formatted the NameNode again after the NameNode and DataNode had already been running. That means you have cleared the metadata from the NameNode. The files you stored for running the WordCount are still on the DataNode, but the DataNode has no idea where to send its block reports since you formatted the NameNode, so it will not start.
Here are the things you need to do to fix it.
Stop all the Hadoop services (stop-all.sh) and close any active ssh connections.
cat /usr/local/hadoop/etc/hadoop/hdfs-site.xml
This step is important: see where the DataNode's data is getting stored. It is the value of the dfs.datanode.data.dir property. For me it is /usr/local/hadoop/hadoop_data/hdfs/datanode. Open your terminal, navigate to that directory, and delete the directory named current inside it. Make sure you are only deleting the "current" directory.
sudo rm -r /usr/local/hadoop/hadoop_data/hdfs/datanode/current
Now format the namenode and check whether everything is fine.
hadoop namenode -format
say yes if it asks you for anything.
jps
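Putting the steps above together, here is a rough sketch assuming the same dfs.datanode.data.dir as in my setup (adjust the path to whatever your hdfs-site.xml says):
stop-all.sh
sudo rm -r /usr/local/hadoop/hadoop_data/hdfs/datanode/current
hadoop namenode -format
start-dfs.sh && start-yarn.sh
jps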
Hope my answer solves the issue. If it doesn't let me know.
A little advice: don't format your NameNode casually. Without the NameNode metadata there is no way to reconstruct the data. If your WordCount job is not running, that is some other problem.
I had this issue when formatting the namenode too. What I did to solve the issue was:
Find your dfs.name.dir location. Consider for example, your dfs.name.dir is /home/hadoop/hdfs.
(a) Now go to, /home/hadoop/hdfs/current.
(b) Search for the file VERSION. Open it using a text editor.
(c) There will be a line namespaceID=122684525 (122684525 is my ID, yours will be different). Note the ID down.
Now find your hadoop.tmp.dir location. Mine is /home/hadoop/temp.
(a) Go to /home/hadoop/temp/dfs/data/current.
(b) Search for the file VERSION and open it using a text editor.
(c) There will be a line namespaceID=. The namespaceID in this file and the previous one must be the same.
(d) This mismatch is the main reason my DataNode was not starting. I made them both the same and now the DataNode starts fine.
Note: copy the namespaceID from /home/hadoop/hdfs/current/VERSION to
/home/hadoop/temp/dfs/data/current/VERSION. Don't do it in reverse.
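A quick way to compare the two IDs, using the example paths above (substitute your own dfs.name.dir and hadoop.tmp.dir locations):
grep namespaceID /home/hadoop/hdfs/current/VERSION /home/hadoop/temp/dfs/data/current/VERSION
# both lines printed should show the same namespaceID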
Now do start-dfs.sh && start-yarn.sh. Datanode will be started.
You just need to remove all the contents of the DataNode folder and then reformat, using the following command:
hadoop namenode -format
I had the same issue; I checked the log and found the error below.
Exception - Datanode log
FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.IOException: All directories in dfs.datanode.data.dir are invalid: "/usr/local/hadoop_store/hdfs/datanode/
I ran the command below to resolve the issue:
sudo chown -R hduser:hadoop /usr/local/hadoop_store
Note: I have created the namenode and datanode directories under the path /usr/local/hadoop_store.
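To double-check that the ownership change took effect before restarting the daemons (hduser:hadoop are simply the user and group from my setup):
ls -ld /usr/local/hadoop_store/hdfs/datanode
# owner and group should now read hduser hadoop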
The above problem occurs when you format the NameNode (hadoop namenode -format) without stopping the DFS and YARN daemons. While formatting the NameNode, the question below appears and you press the Y key:
Re-format filesystem in Storage Directory /tmp/hadoop-root/dfs/name ? (Y or N)
Solution:
You need to delete the files within the current directory under dfs.name.dir, which you specify in hdfs-site.xml. On my system dfs.name.dir points to /tmp/hadoop-root/dfs/name, so that directory is /tmp/hadoop-root/dfs/name/current.
rm -r /tmp/hadoop-root/dfs/name/current
Using the command above, I removed the files inside the current directory. Make sure you are only deleting the "current" directory. Then format the NameNode again after stopping the DFS and YARN daemons (stop-dfs.sh & stop-yarn.sh). Now the DataNode will start normally!
In core-site.xml, check the absolute path of the temp directory (hadoop.tmp.dir). If it does not point to a valid location, or the directory was never created (mkdir), the DataNode cannot start; see the sketch below.
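For example, a minimal core-site.xml entry of this kind might look like the following; the /usr/local/hadoop/tmp path is purely illustrative, so use whatever directory you actually created with mkdir:
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>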
Add the properties below in yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Not the right way to do it, but it surely works: remove the files from your datanode, namenode, and tmp folders. Any files/folders created inside these are owned by Hadoop and may hold references to the last DataNode run, which may have failed or left locks behind, which is why the DataNode does not start on the next attempt.
I got the same issue (DataNode & TaskTracker would not come up).
RESOLUTION:
Delete every "current" sub-directory under data, name, and namesecondary to resolve the DataNode/TaskTracker not showing up when you run start-all.sh and then jps.
(My dfs.name.dir locations are: /home/training/hadoop-temp/dfs/data/current; /home/training/hadoop-temp/dfs/name/current; /home/training/hadoop-temp/dfs/namesecondary/current)
Make sure you stop services: stop-all.sh
1. Go to each "current" sub-directory under data, name, namesecondary and remove/delete (example: rm -r name/current)
2. Then format: hadoop namenode -format
3. mkdir the current directory again under /home/training/hadoop-temp/dfs/data
4. Take the directory and contents from /home/training/hadoop-temp/dfs/name/current and copy them into the /data/current directory
EXAMPLE: files under:
/home/training/hadoop-temp/dfs/name/current
[training@CentOS current]$ ls -l
-rw-rw-r--. 1 training training 9901 Sep 25 01:50 edits
-rw-rw-r--. 1 training training 582 Sep 25 01:50 fsimage
-rw-rw-r--. 1 training training 8 Sep 25 01:50 fstime
-rw-rw-r--. 1 training training 101 Sep 25 01:50 VERSION
5. Change the storageType=NAME_NODE in VERSION to storageType=DATA_NODE in the data/current/VERSION that you just copied over.
BEFORE:
[training@CentOS dfs]$ cat data/current/VERSION
namespaceID=1018374124
cTime=0
storageType=NAME_NODE
layoutVersion=-32
AFTER:
[training@CentOS dfs]$ cat data/current/VERSION
namespaceID=1018374124
cTime=0
storageType=DATA_NODE
layoutVersion=-32
6. Make sure each of the subdirectories below (data, name, namesecondary) has the same files that name/current has.
[training@CentOS dfs]$ pwd
/home/training/hadoop-temp/dfs/
[training@CentOS dfs]$ ls -l
total 12
drwxr-xr-x. 5 training training 4096 Sep 25 01:29 data
drwxrwxr-x. 5 training training 4096 Sep 25 01:19 name
drwxrwxr-x. 5 training training 4096 Sep 25 01:29 namesecondary
7. Now start the services: start-all.sh
You should see all 5 services when you type: jps
I am using hadoop-2.6.0. I resolved it as follows:
1. Delete all files within /usr/local/hadoop_store/hdfs
command: sudo rm -r /usr/local/hadoop_store/hdfs/*
2. Format the hadoop namenode
command: hadoop namenode -format
3. Go to the sbin directory (cd /usr/local/hadoop/sbin) and run:
start-all.sh
Then use the command: hduser@abc-3551:/$ jps
The following services should now be started:
19088 Jps
18707 ResourceManager
19043 NodeManager
18535 SecondaryNameNode
18329 DataNode
18159 NameNode
When I had this same issue, the 'current' folder wasn't even being created in my hadoop/data/datanode folder. If this is the case for you too:
~ Copy the contents of 'current' from the namenode folder and paste it into the datanode folder.
~ Then open the datanode's VERSION file and change storageType=NAME_NODE to storageType=DATA_NODE.
~ Run jps to see that the DataNode keeps running.
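A rough shell sketch of those steps; the hadoop/data/datanode path comes from this answer, while hadoop/data/namenode is only an assumed location for the NameNode directory, so adjust both to your own dfs.namenode.name.dir and dfs.datanode.data.dir values:
cp -r hadoop/data/namenode/current hadoop/data/datanode/
sed -i 's/storageType=NAME_NODE/storageType=DATA_NODE/' hadoop/data/datanode/current/VERSION
jps    # after restarting the daemons, DataNode should now show up and keep running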

Hadoop - namenode is not starting up

I am trying to run Hadoop as the root user. I executed the NameNode format command, hadoop namenode -format, while the Hadoop file system was running.
After this, when I try to start the NameNode server, it shows an error like the one below:
13/05/23 04:11:37 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:330)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:411)
I tried to search for a solution but could not find a clear one.
Can anyone suggest something?
Thanks.
DFS needs to be formatted. Issue the following command after stopping everything, and then restart.
hadoop namenode -format
Cool, I have found the solution.
Stop all running servers:
1) stop-all.sh
Edit the file /usr/local/hadoop/conf/hdfs-site.xml and add the configuration below if it is missing:
<property>
<name>dfs.data.dir</name>
<value>/app/hadoop/tmp/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/app/hadoop/tmp/dfs/name</value>
<final>true</final>
</property>
Start both HDFS and MapReduce Daemons
2) start-dfs.sh
3) start-mapred.sh
Then run the rest of the steps to run the MapReduce sample given in this link.
Note: you should run the command bin/start-all.sh if the direct command does not work.
Format HDFS while the NameNode is stopped (just like the top answer).
Let me add some more details.
The format command checks for or creates path/dfs/name and initializes or reinitializes it.
Running start-dfs.sh then starts the namenode, datanode, and secondarynamenode.
When the NameNode finds that path/dfs/name does not exist or is not initialized, it raises a fatal error and exits. That is why the NameNode does not start up.
For more details, check HADOOP_COMMON/logs/XXX.namenode.log
Make sure the directory you've specified for your namenode is completely empty. Something like a "lost+found" folder in said directory will trigger this error.
Your value in hdfs-site.xml is wrong. You entered the wrong folder, which is why the NameNode is not starting.
First mkdir the folder, then set it in hdfs-site.xml, then format; a sketch is shown below.
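A minimal sketch of that sequence, using /home/hadoop/hdfs/name purely as an example location for dfs.name.dir:
mkdir -p /home/hadoop/hdfs/name
# point dfs.name.dir in hdfs-site.xml at /home/hadoop/hdfs/name, then:
hadoop namenode -format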
Make sure that the name (dfs.name.dir) and data (dfs.data.dir) directories are correctly listed in hdfs-site.xml.
Formatting the NameNode worked for me:
bin/hadoop namenode -format

hadoop hdfs points to file:/// not hdfs://

So I installed Hadoop via Cloudera Manager cdh3u5 on CentOS 5. When I run the command
hadoop fs -ls /
I expected to see the contents of hdfs://localhost.localdomain:8020/
However, it returned the contents of file:///
Now, this goes without saying that I can access my hdfs:// through
hadoop fs -ls hdfs://localhost.localdomain:8020/
But when it came to installing other applications such as Accumulo, Accumulo would automatically detect the Hadoop filesystem as file:///
The question is: has anyone run into this issue, and how did you resolve it?
I had a look at "HDFS thrift server returns content of local FS, not HDFS", which was a similar issue, but it did not solve this one.
Also, I do not get this issue with Cloudera Manager cdh4.
By default, Hadoop is going to use local mode. You probably need to set fs.default.name to hdfs://localhost.localdomain:8020/ in $HADOOP_HOME/conf/core-site.xml.
To do this, you add this to core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost.localdomain:8020/</value>
</property>
The reason Accumulo is confused is that it uses the same default configuration to figure out where HDFS is... and it's defaulting to file://
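After adding that property and restarting the daemons, a quick check that the change took effect (no extra tooling assumed):
hadoop fs -ls /
# should now list the contents of hdfs://localhost.localdomain:8020/ rather than the local filesystem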
We should specify the DataNode data directory and the NameNode metadata directory:
dfs.name.dir / dfs.namenode.name.dir,
dfs.data.dir / dfs.datanode.data.dir
in hdfs-site.xml, plus fs.default.name in core-site.xml, and then format the NameNode.
To format HDFS Name Node:
hadoop namenode -format
Enter 'Yes' to confirm formatting the NameNode. Restart the HDFS service and deploy the client configuration to access HDFS.
If you have already done the above steps, ensure the client configuration is deployed correctly and that it points to the actual cluster endpoints.

hadoop conf "fs.default.name" can't be setted ip:port format directly?

Hi all,
I have set up a Hadoop cluster in fully distributed mode. First, I set core-site.xml "fs.default.name" and mapred-site.xml "mapred.job.tracker" in hostname:port format and changed /etc/hosts correspondingly, and the cluster works successfully.
Then I tried another way: I set core-site.xml "fs.default.name" and mapred-site.xml "mapred.job.tracker" in ip:port format. It doesn't work.
I find
ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name. Using 'localhost'...
in the namenode log file, and
ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name. Using 'localhost'...
java.net.UnknownHostException: slave01: slave01: Name or service not known
in the datanode log file.
In my opinion, an IP and a hostname are equivalent. Is there something wrong in my Hadoop conf?
Maybe there is a wrongly configured hostname in /etc.
You should check the hostname in /etc/hosts, /etc/HOSTNAME (RHEL/Debian), or rc.conf (Arch Linux), etc.
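For instance, a hypothetical /etc/hosts entry for the slave named in the error above might look like this (the IP is made up; use the node's real address):
192.168.0.101   slave01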
I get your point. This is probably because in mapred-site.xml you wrote hdfs://ip:port (it starts with hdfs://, which is wrong there), whereas when you wrote hostname:port you presumably did not put hdfs:// at the beginning of that value, which is the correct way. Therefore, the first one did not work but the second one did; a concrete sketch follows below.
Fatih haltas
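To make that distinction concrete, here is a sketch of the two values described above; the hostname master and the ports are illustrative only:
<!-- core-site.xml: fs.default.name does take the hdfs:// scheme -->
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<!-- mapred-site.xml: mapred.job.tracker is plain host:port, no scheme -->
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>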
I found the answer here.
It seems that HDFS uses the host name for all its communication and display purposes, so we can NOT use an IP directly in core-site.xml and mapred-site.xml.
