I executed a Hadoop MapReduce program successfully. Can someone tell me how to see the output through a browser, like <http://localhost:port/hdfsLocation/>?

I executed a Hadoop MapReduce program successfully in CDH4, but where can I see my output? Can someone tell me how to see it through a browser, like the URL above? It would be helpful to me.

On the terminal, list the job's output directory:
hadoop dfs -ls /inputfile
It will give a result like:
Found 2 items
-rw-r--r-- 3 user17 supergroup 0 2014-11-27 16:47 /inputfile/_SUCCESS
-rw-r--r-- 3 user17 supergroup 24441 2014-11-27 16:47 /inputfile/part-00000
Then print the output file:
hadoop dfs -cat /inputfile/part-00000
The NameNode and DataNodes each run an internal web server that displays basic information about the current status of the cluster. With the default configuration, the NameNode front page is at http://namenode-name:50070/ (port 9870 in Hadoop 3.x). It lists the DataNodes in the cluster and basic statistics of the cluster. The web interface can also be used to browse the file system (using the "Browse the file system" link on the NameNode front page).
If you want to see the output on the web, also have a look at Hue: http://gethue.com/#
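For scripting, the same files can also be read over HTTP through the WebHDFS REST API (/webhdfs/v1/<path>?op=...). A minimal Python sketch, assuming WebHDFS is enabled (dfs.webhdfs.enabled), the NameNode HTTP port is the Hadoop 2.x default 50070, and simple (non-Kerberos) authentication; "namenode-name" and /inputfile are just the placeholders used above:

```python
import json
import urllib.parse
import urllib.request


def webhdfs_url(host, path, op, port=50070, user=None):
    """Build a WebHDFS REST URL such as .../webhdfs/v1/inputfile?op=LISTSTATUS."""
    query = {"op": op}
    if user is not None:
        # With simple (non-Kerberos) auth the caller is named via user.name.
        query["user.name"] = user
    return (f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(path)}"
            f"?{urllib.parse.urlencode(query)}")


def part_files(liststatus_body):
    """Pick the reducer output files (part-*) out of a LISTSTATUS JSON response."""
    statuses = json.loads(liststatus_body)["FileStatuses"]["FileStatus"]
    return sorted(s["pathSuffix"] for s in statuses
                  if s["type"] == "FILE" and s["pathSuffix"].startswith("part-"))


def read_output(host, out_dir, port=50070, user=None):
    """Concatenate every part-* file in out_dir, like hadoop dfs -cat on each."""
    with urllib.request.urlopen(webhdfs_url(host, out_dir, "LISTSTATUS", port, user)) as resp:
        names = part_files(resp.read().decode("utf-8"))
    chunks = []
    for name in names:
        url = webhdfs_url(host, f"{out_dir.rstrip('/')}/{name}", "OPEN", port, user)
        # OPEN answers with a redirect to a DataNode; urlopen follows it.
        with urllib.request.urlopen(url) as resp:
            chunks.append(resp.read().decode("utf-8"))
    return "".join(chunks)
```

For example, read_output("namenode-name", "/inputfile") lists /inputfile, skips _SUCCESS, and concatenates the part-* files, which should match what hadoop dfs -cat prints.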

Related

Secondary NameNode is not displayed when I run the jps command

I have Hadoop 3.1.3 in pseudo-distributed mode. I can upload a file to HDFS and display its contents,
but when I run the jps command I get the following output:
10912 DataNode
13072 ResourceManager
4480 NodeManager
6584 Jps
664 Namenode
I can't find the SecondaryNameNode in the list. Is there a problem with my configuration or with my Hadoop installation?
Why do you assume that the Secondary NameNode is started in pseudo-distributed mode?
If the basic commands work, then it's fine.
Look at the log files to find out whether something is broken before asking elsewhere.
In general, I always suggest using Apache Ambari to provision a Hadoop cluster.
You can start the Secondary NameNode manually and watch the startup logs to see if anything is wrong:
hdfs secondarynamenode
If there's no error, run jps again and hopefully you see SecondaryNameNode listed.
I'd suggest running hdfs --help and looking through all of the options; there's a lot of useful stuff there.
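A small Python helper to compare jps output with the daemons you expect, so a missing SecondaryNameNode (or DataNode) stands out. The expected set below is an assumption for a pseudo-distributed HDFS + YARN node; adjust it to your setup:

```python
# Assumed daemon set for a pseudo-distributed HDFS + YARN node.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode",
                    "ResourceManager", "NodeManager"}


def running_daemons(jps_output):
    """Turn lines like '10912 DataNode' into a set of Java main-class names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Expected daemons that do not appear in the jps output, sorted by name."""
    return sorted(expected - running_daemons(jps_output))
```

On the node itself, pass it subprocess.run(["jps"], capture_output=True, text=True).stdout.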

Hadoop NodeManager failing with error "Can't get group information"

I have Apache Hadoop 2.8.5 installed with Kerberos configured. The NameNode, DataNode and ResourceManager are running fine, but the NodeManager fails to start with this error:
Can't get group information for hadoop#configured value of yarn.nodemanager.linux-container-executor.group - Success.
File permissions:
container-executor.cfg: -rw------- 1 root hadoop
container-executor: ---Sr-s--- 1 root hadoop
container-executor.cfg contents:
yarn.nodemanager.local-dirs=/hadoop/data/yarn/local
yarn.nodemanager.linux-container-executor.group=hadoop#configured value of yarn.nodemanager.linux-container-executor.group
banned.users=hdfs,yarn,mapred,bin,root#comma separated list of users who can not run applications
min.user.id=1000#Prevent other super-users
Simply remove the inline comments, for example "#configured value of yarn.nodemanager.linux-container-executor.group" on the yarn.nodemanager.linux-container-executor.group line, from the container-executor.cfg file.
It should look like this:
yarn.nodemanager.local-dirs=/hadoop/data/yarn/local
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin,root
min.user.id=1000
This configuration file has historically had problems with spaces, comments, etc.
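Because a "#" after a value is not treated as a comment here (the error message above shows the group read as "hadoop#configured value of ..."), a quick lint can save a restart cycle. A minimal Python sketch, assuming the simple key=value layout shown in the question; it is not the container-executor's own parser, it only flags suspicious lines and produces a cleaned copy:

```python
def lint_container_executor_cfg(cfg_text):
    """Flag lines likely to be misread: inline '#' in a value, spaces around
    key or value, or stray text without '=' (e.g. a wrapped comment)."""
    problems = []
    for lineno, line in enumerate(cfg_text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and full-line comments are fine
        if "=" not in line:
            problems.append((lineno, "no '=' (stray text?)", line))
            continue
        key, value = line.split("=", 1)
        if "#" in value:
            problems.append((lineno, "inline comment in value", line))
        elif key != key.strip() or value != value.strip():
            problems.append((lineno, "whitespace around key or value", line))
    return problems


def clean_container_executor_cfg(cfg_text):
    """Copy with inline comments, stray lines and surrounding spaces removed."""
    cleaned = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            cleaned.append(stripped)
        elif "=" in stripped:
            key, value = stripped.split("=", 1)
            cleaned.append(f"{key.strip()}={value.split('#', 1)[0].strip()}")
        # lines without '=' are dropped
    return "\n".join(cleaned) + "\n"
```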

Unable to load large file to HDFS on Spark cluster master node

I have started a Spark cluster on Amazon EC2 with 1 master node and 2 worker nodes, each with 2.7 GB of memory.
However, when I tried to put a 3 GB file onto HDFS with the command below
/root/ephemeral-hdfs/bin/hadoop fs -put /root/spark/2GB.bin 2GB.bin
it returned the error "/user/root/2GB.bin could only be replicated to 0 nodes, instead of 1". FYI, I can upload smaller files, but not files above a certain size (about 2.2 GB).
If the file exceeds the memory size of a node, wouldn't Hadoop split it across the other node?
Edit: Summary of my understanding of the issue you are facing:
1) Total HDFS free size is 5.32 GB
2) HDFS free size on each node is 2.6GB
Note: You have bad blocks (4 Blocks with corrupt replicas)
The following Q&A mentions similar issues:
Hadoop put command throws - could only be replicated to 0 nodes, instead of 1
In that case, running jps showed that the DataNodes were down.
These Q&As suggest ways to restart the DataNode:
What is best way to start and stop hadoop ecosystem, with command line?
Hadoop - Restart datanode and tasktracker
Please try restarting your DataNode, and let us know whether it solves the problem.
When using HDFS, you have one shared file system, i.e. all nodes share the same file system.
From your description, the current free space on HDFS is about 2.2 GB, while you are trying to put 3 GB there.
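On the splitting question: HDFS stores a file as fixed-size blocks and places each block's replicas on DataNodes independently, so one file can span several nodes, but every replica still needs free disk space somewhere. A rough Python sketch of that arithmetic; the 128 MB block size (the Hadoop 2.x default for dfs.blocksize) and the replication factors are assumptions, and it ignores block granularity and reserved space:

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def block_count(file_bytes, block_bytes=128 * MB):
    """Blocks HDFS needs for a file; the last block may be only partly full."""
    return math.ceil(file_bytes / block_bytes)


def fits_with_replication(file_bytes, free_per_node, replication):
    """Rough capacity check for a file written with a given replication factor.

    A DataNode never holds two replicas of the same block, so each node can
    absorb at most one full copy of the file; the useful free space is
    therefore the sum of min(free, file_bytes) over the nodes."""
    if replication > len(free_per_node):
        return False
    usable = sum(min(free, file_bytes) for free in free_per_node)
    return usable >= file_bytes * replication
```

Plugging in the 2.6 GB per node from the summary above, fits_with_replication(3 * GB, [int(2.6 * GB)] * 2, r) is True for r = 1 and False for r = 2, so the answer depends on the replication factor your cluster uses.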
Run the following commands to get the free HDFS space:
hdfs dfs -df -h
hdfs dfsadmin -report
or (for older versions of HDFS)
hadoop fs -df -h
hadoop dfsadmin -report
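To turn the df output into a yes/no answer, a small Python sketch. It assumes the plain (byte-valued, no -h) output of hdfs dfs -df: a header row followed by a row of Filesystem, Size, Used, Available and Use% columns, so check that layout against your version:

```python
def parse_df(df_output):
    """Parse the first data row of `hdfs dfs -df` (byte values) into a dict."""
    for line in df_output.splitlines():
        fields = line.split()
        # Skip the header; data rows have a numeric Size column.
        if len(fields) >= 4 and fields[1].isdigit():
            return {"filesystem": fields[0], "size": int(fields[1]),
                    "used": int(fields[2]), "available": int(fields[3])}
    raise ValueError("no data row found in df output")


def can_put(file_bytes, df_output, replication=1):
    """True if HDFS reports enough free space for every replica of the file."""
    return parse_df(df_output)["available"] >= file_bytes * replication
```

Feed it subprocess.run(["hdfs", "dfs", "-df"], capture_output=True, text=True).stdout, or the hadoop fs -df output on older releases.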

DataNode is not starting in single-node Hadoop 2.6.0

I installed Hadoop 2.6.0 on my laptop running Ubuntu 14.04 LTS. I successfully started the Hadoop daemons by running start-all.sh and ran the WordCount example successfully. Then I tried to run a jar example that didn't work for me, so I decided to format with hadoop namenode -format and start all over again. But when I start all daemons using start-dfs.sh && start-yarn.sh and then run jps, all daemons run except the DataNode, as shown below:
hdferas@feras-Latitude-E4310:/usr/local/hadoop$ jps
12628 NodeManager
12110 NameNode
12533 ResourceManager
13335 Jps
12376 SecondaryNameNode
How can I solve this?
I have faced this issue and it is very easy to solve. Your DataNode is not starting because you formatted the NameNode again after the NameNode and DataNode had started running. That cleared the metadata from the NameNode. The blocks you stored for running the word count are still on the DataNode, but since you formatted the NameNode, the DataNode has no idea where to send its block reports, so it will not start.
Here are the things you need to do to fix it.
Stop all the Hadoop services (stop-all.sh) and close any active ssh connections.
cat /usr/local/hadoop/etc/hadoop/hdfs-site.xml
This step is important: check where the DataNode's data is stored. It is the value of dfs.datanode.data.dir. For me it is /usr/local/hadoop/hadoop_data/hdfs/datanode. Open your terminal, navigate to that directory and delete the directory named current inside it. Make sure you delete only the "current" directory.
sudo rm -r /usr/local/hadoop/hadoop_data/hdfs/datanode/current
Now format the namenode and check whether everything is fine.
hadoop namenode -format
Answer yes if it asks you anything.
Then start the daemons again (start-dfs.sh && start-yarn.sh) and check with:
jps
I hope my answer solves the issue. If it doesn't, let me know.
A little advice: don't format your NameNode. Without the NameNode metadata there is no way to reconstruct the data. If your WordCount job is not running, that is some other problem.
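To find that directory without reading the XML by eye, a Python sketch that pulls dfs.datanode.data.dir out of hdfs-site.xml and lists the current subdirectories the steps above talk about. It assumes the Hadoop 2.x property name (Hadoop 1.x used dfs.data.dir), does not expand ${...} variables, and deliberately does not delete anything:

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse


def hadoop_conf_value(xml_text, name):
    """Return the <value> of property `name` in a Hadoop *-site.xml, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def datanode_current_dirs(hdfs_site_xml):
    """The 'current' subdirectory of every dfs.datanode.data.dir entry.

    The value may be a comma-separated list and may use file: URIs."""
    value = hadoop_conf_value(hdfs_site_xml, "dfs.datanode.data.dir")
    if not value:
        return []
    # urlparse("file:/a/b").path == "/a/b"; plain paths pass through unchanged.
    return [Path(urlparse(entry.strip()).path) / "current"
            for entry in value.split(",") if entry.strip()]
```

For example: print(*datanode_current_dirs(Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml").read_text()), sep="\n").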
I had this issue after formatting the NameNode too. Here is what I did to solve it:
Find your dfs.name.dir location. For example, suppose your dfs.name.dir is /home/hadoop/hdfs.
(a) Now go to /home/hadoop/hdfs/current.
(b) Search for the file VERSION. Open it using a text editor.
(c) There will be a line namespaceID=122684525 (122684525 is my ID, yours will be different). Note the ID down.
Now find your hadoop.tmp.dir location. Mine is /home/hadoop/temp.
(a) Go to /home/hadoop/temp/dfs/data/current.
(b) Search for the file VERSION and open it using a text editor.
(c) There will be a line namespaceID=. The namespaceID in this file and in the previous one must be the same.
(d) A mismatch here was the main reason my DataNode did not start. I made them both the same, and now the DataNode starts fine.
Note: copy the namespaceID from /home/hadoop/hdfs/current/VERSION to /home/hadoop/temp/dfs/data/current/VERSION. Don't do it in reverse.
Now run start-dfs.sh && start-yarn.sh. The DataNode will start.
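A Python sketch that compares the IDs in the NameNode's and the DataNode's VERSION files. It assumes the Java-properties layout shown in this thread (an optional timestamp comment line, then key=value lines). On Hadoop 2.x the DataNode's top-level VERSION usually carries clusterID rather than namespaceID, and the DataNode log then typically reports incompatible clusterIDs, so the sketch compares whichever of the two keys both files contain:

```python
def read_version(text):
    """Parse a Hadoop storage VERSION file (key=value lines) into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # the first line is usually a timestamp comment
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def id_mismatches(namenode_version, datanode_version,
                  keys=("namespaceID", "clusterID")):
    """{key: (namenode_value, datanode_value)} for IDs present in both that differ."""
    nn, dn = read_version(namenode_version), read_version(datanode_version)
    return {k: (nn[k], dn[k]) for k in keys
            if k in nn and k in dn and nn[k] != dn[k]}
```

An empty result means the IDs agree; otherwise copy the NameNode's value into the DataNode's file, as the note above says, not the other way round.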
You just need to remove all the contents of the DataNode folder and format the NameNode using the following command:
hadoop namenode -format
I had the same issue too, checked the log and found the error below.
Exception in the DataNode log:
FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.IOException: All directories in dfs.datanode.data.dir are invalid: "/usr/local/hadoop_store/hdfs/datanode/
I ran the command below to resolve the issue:
sudo chown -R hduser:hadoop /usr/local/hadoop_store
Note: I created the NameNode and DataNode directories under /usr/local/hadoop_store.
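A quick way to check for the ownership problem behind "All directories in dfs.datanode.data.dir are invalid" before starting the daemon: a Python sketch (POSIX only, it uses the pwd module) that checks a data directory exists, is owned by the user that runs the DataNode, and is accessible. Run it as that user (hduser here); the ownership rule is an assumption drawn from the chown fix above, not the DataNode's full validation logic:

```python
import os
import pwd  # POSIX only


def owner_name(uid):
    """User name for a uid, falling back to the number if it has no passwd entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def data_dir_problems(path, expected_user):
    """Reasons why `path` looks unusable as a DataNode data directory (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    owner = owner_name(os.stat(path).st_uid)
    if owner != expected_user:
        problems.append(f"{path} is owned by {owner}, not {expected_user}")
    # os.access checks the user running this script, so run it as expected_user.
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"{path} is not readable, writable and searchable by the current user")
    return problems
```

For example, data_dir_problems("/usr/local/hadoop_store/hdfs/datanode", "hduser"), run as hduser, should come back empty after the chown above.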
The problem above occurs when you format the NameNode (hadoop namenode -format) without stopping the DFS and YARN daemons. While formatting the NameNode, the question below appears and you press Y:
Re-format filesystem in Storage Directory /tmp/hadoop-root/dfs/name ? (Y or N)
Solution:
You need to delete the files in the current directory of the dfs.name.dir you set in hdfs-site.xml. On my system, that is /tmp/hadoop-root/dfs/name/current.
rm -r /tmp/hadoop-root/dfs/name/current
With the command above I removed the current directory and its files. Make sure you delete only the "current" directory. Then format the NameNode again after stopping the DFS and YARN daemons (stop-dfs.sh and stop-yarn.sh). Now the DataNode will start normally.
In core-site.xml, check the absolute path of the temp directory (hadoop.tmp.dir). If it does not point to the right place, or the directory has not been created (mkdir), the DataNode can't start.
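A Python sketch of that check: read hadoop.tmp.dir from core-site.xml and confirm it is an absolute path that exists. It assumes a plain path value; values using ${...} variables (the stock default is /tmp/hadoop-${user.name}) are only reported, not expanded:

```python
import os
import xml.etree.ElementTree as ET


def tmp_dir_problems(core_site_xml):
    """Check that hadoop.tmp.dir in core-site.xml is an existing absolute path."""
    root = ET.fromstring(core_site_xml)
    value = None
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "hadoop.tmp.dir":
            value = prop.findtext("value", default="").strip()
    if value is None:
        return ["hadoop.tmp.dir is not set (the default is /tmp/hadoop-${user.name})"]
    if "${" in value:
        return [f"{value} uses ${{...}} variables; expand them before checking"]
    if not os.path.isabs(value):
        return [f"{value} is not an absolute path"]
    if not os.path.isdir(value):
        return [f"{value} does not exist; create it (mkdir -p) and chown it to the Hadoop user"]
    return []
```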
Add the properties below to yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
This is not the right way to do it, but it surely works:
Remove the files from your datanode, namenode and tmp folders. Any files/folders created inside these are owned by Hadoop and may reference details of the last DataNode run, which may have failed or been locked, so the DataNode does not start on the next attempt.
I got the same issue (DataNode & TaskTracker would not come up).
RESOLUTION:
Delete every "current" subdirectory under data, name and namesecondary to fix the DataNode/TaskTracker not showing up when you run start-all.sh and then jps.
(My directories are: /home/training/hadoop-temp/dfs/data/current; /home/training/hadoop-temp/dfs/name/current; /home/training/hadoop-temp/dfs/namesecondary/current)
Make sure you stop the services first: stop-all.sh
1. Go to each "current" subdirectory under data, name and namesecondary and remove it (example: rm -r name/current)
2. Then format: hadoop namenode -format
3. Recreate the data directory: mkdir /home/training/hadoop-temp/dfs/data/current
4. Copy the contents of /home/training/hadoop-temp/dfs/name/current into the data/current directory
EXAMPLE: files under:
/home/training/hadoop-temp/dfs/name/current
[training@CentOS current]$ ls -l
-rw-rw-r--. 1 training training 9901 Sep 25 01:50 edits
-rw-rw-r--. 1 training training 582 Sep 25 01:50 fsimage
-rw-rw-r--. 1 training training 8 Sep 25 01:50 fstime
-rw-rw-r--. 1 training training 101 Sep 25 01:50 VERSION
5. In the data/current/VERSION file you just copied over, change storageType=NAME_NODE to storageType=DATA_NODE.
BEFORE:
[training@CentOS dfs]$ cat data/current/VERSION
namespaceID=1018374124
cTime=0
storageType=NAME_NODE
layoutVersion=-32
AFTER:
[training@CentOS dfs]$ cat data/current/VERSION
namespaceID=1018374124
cTime=0
storageType=DATA_NODE
layoutVersion=-32
6. Make sure that the current subdirectory of each of data, name and namesecondary has the same files as name/current:
[training@CentOS dfs]$ pwd
/home/training/hadoop-temp/dfs/
[training@CentOS dfs]$ ls -l
total 12
drwxr-xr-x. 5 training training 4096 Sep 25 01:29 data
drwxrwxr-x. 5 training training 4096 Sep 25 01:19 name
drwxrwxr-x. 5 training training 4096 Sep 25 01:29 namesecondary
7. Now start the services: start-all.sh
You should see all 5 services when you type: jps
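Step 5 can also be scripted. A Python sketch, with the caveat that hand-editing VERSION files is a workaround, so back up the directory first; it rewrites only the storageType line and leaves namespaceID, cTime and layoutVersion untouched:

```python
from pathlib import Path


def to_datanode_version(version_text):
    """Return VERSION text with storageType switched to DATA_NODE; other lines unchanged."""
    lines = []
    for line in version_text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key.strip() == "storageType":
            lines.append("storageType=DATA_NODE")
        else:
            lines.append(line)
    return "\n".join(lines) + "\n"


def fix_copied_version(path):
    """Rewrite a copied VERSION file in place (back it up before calling this)."""
    p = Path(path)
    p.write_text(to_datanode_version(p.read_text()))
```

For the layout above that would be fix_copied_version("/home/training/hadoop-temp/dfs/data/current/VERSION").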
I am using Hadoop 2.6.0. I resolved it this way:
1. Delete all files within /usr/local/hadoop_store/hdfs
command: sudo rm -r /usr/local/hadoop_store/hdfs/*
2. Format the Hadoop NameNode
command: hadoop namenode -format
3. Go to the ..../sbin directory (cd /usr/local/hadoop/sbin) and run
start-all.sh
4. Check with jps:
hduser@abc-3551:/$ jps
The following services should now be running:
19088 Jps
18707 ResourceManager
19043 NodeManager
18535 SecondaryNameNode
18329 DataNode
18159 NameNode
When I had this same issue, the 'current' folder wasn't even being created in my hadoop/data/datanode folder. If this is the case for you too:
~Copy the contents of 'current' from the namenode folder and paste them into the datanode folder.
~Then open VERSION in the datanode folder and change storageType=NAME_NODE to storageType=DATA_NODE.
~Run jps to check that the DataNode keeps running.

Hadoop 2.2: Add a new DataNode to an existing Hadoop installation

I first installed Hadoop 2.2 on my machine (called Abhishek-PC) and everything worked fine. I am able to run the entire system successfully (both NameNode and DataNode).
Now I have created a VM, hdclient1, and I want to add this VM as a DataNode.
Here are the steps I have followed:
I set up SSH successfully: I can ssh into hdclient1 without a password, and I can log in from hdclient1 to my main machine without a password.
I set up Hadoop 2.2 on this VM and modified the configuration files following various tutorials on the web. Here are my configuration files:
Name Node configuration
https://drive.google.com/file/d/0B0dV2NMSGYPXdEM1WmRqVG5uYlU/edit?usp=sharing
Data Node configuration
https://drive.google.com/file/d/0B0dV2NMSGYPXRnh3YUo1X2Frams/edit?usp=sharing
Now when I run start-dfs.sh on my first machine, I can see that the DataNode starts successfully on hdclient1. Here is a screenshot from my Hadoop console:
https://drive.google.com/file/d/0B0dV2NMSGYPXOEJ3UV9SV1d5bjQ/edit?usp=sharing
As you can see, both machines appear in my cluster (main machine and DataNode),
although both are called "localhost" for some strange reason.
I can see that logs are being created on hdclient1, and there are no exceptions in those logs.
Here are the logs from the NameNode:
https://drive.google.com/file/d/0B0dV2NMSGYPXM0dZTWVRUWlGaDg/edit?usp=sharing
Here are the logs from the DataNode:
https://drive.google.com/file/d/0B0dV2NMSGYPXNV9wVmZEcUtKVXc/edit?usp=sharing
I can log in to the NameNode UI successfully at http://Abhishek-PC:50070,
but under live nodes the UI says there is only 1 live node, and there is no mention of hdclient1.
https://drive.google.com/file/d/0B0dV2NMSGYPXZmMwM09YQlI4RzQ/edit?usp=sharing
I can create a directory in HDFS successfully: hadoop fs -mkdir /small
From the DataNode I can see that this directory has been created, using the command hadoop fs -ls /
Now when I try to add a file to HDFS with
hadoop fs -copyFromLocal ~/Downloads/book/war_and_peace.txt /small
I get an error message:
abhishek@Abhishek-PC:~$ hadoop fs -copyFromLocal ~/Downloads/book/war_and_peace.txt /small
14/01/04 20:07:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/04 20:07:41 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /small/war_and_peace.txt.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
So my question is: what am I doing wrong here? Why do I get this exception when I try to copy the file into HDFS?
We have a 3-node cluster (all physical boxes) that's been working great for a couple of months. This article helped me the most with the setup.
