How to check the port number of the hadoop services - hadoop

How to check the port numbers of the Hadoop services, e.g. the port numbers for Hive, Oozie, Sqoop, Pig, etc.? I heard each Hadoop service has a port number.

Normally the ports are configured in the configuration files themselves, which live under either "/etc/hadoop/conf/" or "/usr/local/hadoop/conf/", alongside the respective component configuration directories such as "pig/hive/sqoop" etc.
The configuration files are named "hdfs-site.xml/core-site.xml/hive-site.xml/mapred-site.xml" etc.
Some of the default ports used by Hadoop and its ecosystem are:
Daemon                   Default Port   Configuration Parameter
NameNode                 50070          dfs.http.address
DataNodes                50075          dfs.datanode.http.address
SecondaryNameNode        50090          dfs.secondary.http.address
Backup/Checkpoint node   50105          dfs.backup.http.address
JobTracker               50030          mapred.job.tracker.http.address
TaskTrackers             50060          mapred.task.tracker.http.address
Also check this reference: MORE DETAIL PORTS
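For example, the NameNode web UI parameter from the table above would appear in hdfs-site.xml roughly like this (a sketch, shown with its default value):
<property>
  <!-- NameNode web UI address; default port 50070 as in the table above -->
  <name>dfs.http.address</name>
  <value>0.0.0.0:50070</value>
</property>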

You can also take advantage of the Cloudera Distribution of Hadoop documentation, which lists the port numbers for each component: Ports Used by Components of CDH 5
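If you just want to see which ports a running daemon is actually listening on, a quick sketch using standard Linux tools (replace <PID> with the process id printed by jps):
jps                               # list the running Hadoop daemons and their PIDs
sudo netstat -tlnp | grep <PID>   # show the TCP ports on which that daemon is listening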

Related

Meaning of fs.defaultFS property in core-site.xml in hadoop

I am trying to set up Hadoop in fully distributed mode, and to some extent I have been successful in doing this.
However, I have a doubt about one of the parameter settings in core-site.xml --> fs.defaultFS
In my set up, I have three nodes as described below:
Node1 -- 192.168.1.2 --> Configured to be Master (Running ResourceManager and NameNode daemons)
Node2 -- 192.168.1.3 --> Configured to be Slave (Running NodeManager and Datanode daemons)
Node3 -- 192.168.1.4 --> Configured to be Slave (Running NodeManager and Datanode daemons)
Now what does property fs.defaultFS mean? For example, if I set it like this:
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.2:9000/</value>
</property>
I am not able to understand the meaning of hdfs://192.168.1.2:9000. I can figure out that hdfs means we are using the HDFS file system, but what do the other parts mean?
Does this mean that the host with IP address 192.168.1.2 is running the Namenode at port 9000?
Can anyone help me understand this?
In this code:
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.2:9000/</value>
</property>
Including fs.defaultFS (or the older fs.default.name) in core-site.xml lets you run dfs commands without providing the full file system URI in the command, e.g. running hdfs dfs -ls / instead of hdfs dfs -ls hdfs://hdfs/
This property specifies the default file system and defaults to your local file system, which is why it needs to be set to an HDFS address. This matters for client configuration as well, so your local configuration file should include this element.
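As a minimal sketch, assuming fs.defaultFS is set to hdfs://192.168.1.2:9000/ as in the question, the two commands below are equivalent:
hdfs dfs -ls /                          # path is resolved against fs.defaultFS
hdfs dfs -ls hdfs://192.168.1.2:9000/   # same listing, with the fully qualified URI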
As @Shashank explained above, very appropriately:
<value>hdfs://192.168.1.2:9000/</value>
Here 9000 denotes the port on which the datanodes send heartbeats to the namenode, and the rest of the address is the machine's IP, which is resolved to its hostname.
<name>fs.default.name</name>
Here fs denotes the file system and default.name denotes the default namenode URI.
Something important to note about the port: you can use any port greater than 1024, since ports below that require root privileges.
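For reference, a minimal core-site.xml sketch using the current property name fs.defaultFS (fs.default.name is its deprecated alias), with the namenode address taken from the question:
<property>
  <!-- default file system URI; clients resolve relative paths against it -->
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.1.2:9000/</value>
</property>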

Is ZooKeeper part of Hadoop or a separate configuration?

As I read in various tutorials, ZooKeeper helps to coordinate and synchronize various Hadoop clusters.
Currently I installed hadoop 2.5.0. When I do jps it displays
4494 SecondaryNameNode
8683 Jps
4679 ResourceManager
3921 NameNode
4174 DataNode
4943 NodeManager
There is no process for ZooKeeper.
I am in doubt whether ZooKeeper is part of HDFS or whether we need to install it manually.
If you use Hadoop only, ZooKeeper is not required. Other tools in the Hadoop ecosystem, e.g. HBase, do depend on ZooKeeper, but you don't need to install it separately: HBase includes it, and when you start up HBase, the bundled ZooKeeper starts up at the same time.
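If you want to verify whether a ZooKeeper process is running, a quick check with jps is a reasonable sketch; an HBase-managed ZooKeeper typically shows up as HQuorumPeer, while a standalone installation typically shows up as QuorumPeerMain:
jps | grep -iE 'HQuorumPeer|QuorumPeerMain'   # prints a line only if a ZooKeeper JVM is running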

Hadoop: pseudo cluster, adding datanode

I am trying to install multiple pseudo-distributed nodes for an experimental cluster. The reason is simple: I have only one machine in my office.
Therefore, I followed this guide, and especially the answer of Matt:
http://search-hadoop.com/m/sApJY1zWgQV/
I created an additional folder conf2
1.1. In hadoop-env.sh, I edited HADOOP_IDENT_STRING to ${USER}_02
1.2. I changed the data.dir in hdfs-site.xml
1.3. In hdfs-site.xml I changed the ports of the following (see the sketch after this list):
dfs.datanode.address (default 0.0.0.0:50010)
dfs.datanode.ipc.address (default 0.0.0.0:50020)
dfs.datanode.http.address (default 0.0.0.0:50075)
dfs.datanode.https.address (default 0.0.0.0:50475)
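As an illustrative sketch only (the alternate port numbers below are hypothetical; the defaults are the ones listed in parentheses above), the overrides in conf2/hdfs-site.xml might look like:
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:50011</value>   <!-- hypothetical alternate port, default 50010 -->
</property>
<property>
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:50021</value>   <!-- hypothetical alternate port, default 50020 -->
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:50076</value>   <!-- hypothetical alternate port, default 50075 -->
</property>
<property>
  <name>dfs.datanode.https.address</name>
  <value>0.0.0.0:50476</value>   <!-- hypothetical alternate port, default 50475 -->
</property>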
I tried the command "./hadoop-daemons.sh --config ../conf2 start datanode" on my current single-node Hadoop system.
The error is still: "localhost: datanode running as process 42855. Stop it first."
The jps command says:
:~/hadoop/bin$ jps
2255 Jps
43412 SecondaryNameNode
43853 TaskTracker
42855 DataNode
43544 JobTracker
42537 NameNode
Does anyone have an idea how I could trick my Hadoop system into accepting the additional datanode?
Thanks a lot.

CDH 4.3: exception in the logs after ./start-dfs.sh, datanode and namenode fail to start

Here is the log from hadoop-datanode-...log:
FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1421227885-192.168.2.14-1371135284949 (storage id DS-30209445-192.168.2.41-50010-1371109358645) service to /192.168.2.8:8020
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-30209445-192.168.2.41-50010-1371109358645, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-f16e4a3e-4776-4893-9f43-b04d8dc651c9;nsid=1710848135;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:648)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3498)
My issue: the namenode can start, but the datanode can't start.
I saw this once too. The namenode server needs to do a reverse lookup request, so an nslookup 192.168.2.41 should return a name; it doesn't, so 0.0.0.0 gets recorded instead.
You don't need to hardcode the address into /etc/hosts if you have DNS working correctly (i.e. the in-addr.arpa zone matches the entries in the domain zone file). But if you don't have DNS, then you need to help Hadoop out.
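For example, a minimal /etc/hosts sketch on the namenode host (the hostname below is hypothetical; substitute the datanode's real name):
192.168.2.41   datanode41.example.com   datanode41   # hypothetical hostname for the datanode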
There seems to be a Name Resolution issue.
Datanode denied communication with namenode:
DatanodeRegistration(0.0.0.0,
storageID=DS-30209445-192.168.2.41-50010-1371109358645,
infoPort=50075, ipcPort=50020,
Here DataNode is identifying itself as 0.0.0.0.
Looks like dfs.hosts enforcement. Can you recheck on your NameNode's hdfs-site.xml configs that you are surely not using a dfs.hosts file?
This error may arise if the datanode that is trying to connect to the namenode is either listed in the file defined by dfs.hosts.exclude or that dfs.hosts is used and that datanode is not listed within that file. Make sure the datanode is not listed in excludes, and if you are using dfs.hosts, add it to the includes. Restart hadoop after that and run hadoop dfsadmin -refreshNodes.
HTH
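For reference, a rough sketch of what such host-list settings look like in the NameNode's hdfs-site.xml (the file paths below are hypothetical):
<property>
  <name>dfs.hosts</name>
  <!-- hypothetical path: file listing datanodes allowed to register -->
  <value>/etc/hadoop/conf/dfs.include</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <!-- hypothetical path: file listing datanodes to exclude -->
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
After editing those files, hadoop dfsadmin -refreshNodes (as mentioned above) makes the namenode re-read them.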
Reverse DNS lookup is required when a datanode tries to register with a namenode. I got the same exceptions with Hadoop 2.6.0 because my DNS does not allow reverse lookup.
But you can disable Hadoop's reverse lookup by setting this configuration "dfs.namenode.datanode.registration.ip-hostname-check" to false in hdfs-site.xml
I got this solution from here and it solved my problem.
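A minimal sketch of that hdfs-site.xml entry, based on the property name given above:
<property>
  <!-- disables the reverse-DNS check when datanodes register with the namenode -->
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>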

Running multiple hadoop instances on same machine

I wish to run a second instance of Hadoop on a machine which already has an instance of Hadoop running. After untarring the Hadoop distribution, some config files in the hadoop-version/conf directory need to be changed. The Linux user will be the same for both instances. I have identified the following attributes, but I am not sure if this is good enough.
hdfs-site.xml : dfs.data.dir and dfs.name.dir
core-site.xml : fs.default.name and hadoop.tmp.dir
mapred-site.xml : mapred.job.tracker
I couldn't find the attribute names for the port numbers of the JobTracker / TaskTracker / DFS web interfaces. Their default values are 50030, 50060 and 50070 respectively.
Are there any more attributes that need to be changed to ensure that the new hadoop instance is running in its own environment?
Look for ".address" in src/hdfs/hdfs-default.xml and src/mapred/mapred-default.xml, and you'll find plenty of attributes defined there.
BTW, I had a box with the firewall enabled, and I observed that the effective ports in the default configuration are 50010, 50020, 50030, 50060, 50070, 50075 and 50090.
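As a sketch of the kind of web-interface overrides the second instance would need (the alternate ports below are hypothetical, chosen only to avoid the defaults listed earlier on this page):
In the second instance's hdfs-site.xml:
<property>
  <name>dfs.http.address</name>
  <value>0.0.0.0:50071</value>   <!-- hypothetical; default 50070 -->
</property>
In the second instance's mapred-site.xml:
<property>
  <name>mapred.job.tracker.http.address</name>
  <value>0.0.0.0:50031</value>   <!-- hypothetical; default 50030 -->
</property>
<property>
  <name>mapred.task.tracker.http.address</name>
  <value>0.0.0.0:50061</value>   <!-- hypothetical; default 50060 -->
</property>
The datanode ports (dfs.datanode.address and friends) would need the same treatment, as discussed in the earlier question on this page.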
