I have my YARN resource manager on a different node than my namenode, and I can see that something is running, which I take to be the resource manager. Ports 8031 and 8030 are bound, but not port 8032, to which my client tries to connect.
I am on CDH 5.3.1, and the following is part of the output of lsof -i
java 12478 yarn 230u IPv4 61325 0t0 TCP hadoop2.adastragrp.com:48797->hadoop2.adastragrp.com:8031 (ESTABLISHED)
java 13753 yarn 159u IPv4 61302 0t0 TCP hadoop2.adastragrp.com:8031 (LISTEN)
java 13753 yarn 170u IPv4 61308 0t0 TCP hadoop2.adastragrp.com:8030 (LISTEN)
java 13753 yarn 191u IPv4 61326 0t0 TCP hadoop2.adastragrp.com:8031->hadoop2.adastragrp.com:48797 (ESTABLISHED)
How do I diagnose what's wrong here? I suspect that the resource manager is running but can't bind to port 8032, and I have no idea why that could be.
In the cloudera manager, the ResourceManager is shown as having good health, but at the same time I get this report:
ResourceManager summary: hadoop2.adastragrp.com (Availability:
Unknown, Health: Good). This health test is bad because the Service
Monitor did not find an active ResourceManager.
I can execute yarn application -list locally on the resource manager node, but when I do the same on a different node, it tries to connect to the resource manager correctly, but fails to do so. Both nodes are connected, can ping each other, and so on. I disabled the iptables service on the VM.
nmap output:
8032/tcp filtered unknown host-prohibited

Is the port occupied by another process? For example, if you stopped your Hadoop cluster abnormally, some processes may still be running. If so, find them with ps -e | grep java and kill them.

Gotcha, on CentOS 6 stopping the iptables service didn't really disable the firewall. I had to disable it with system-config-firewall.
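For anyone hitting the same thing, here is a rough Python sketch of how a client can tell "nothing is listening" apart from "a firewall is in the way". The function name probe and the 2-second timeout are my own choices, and the mapping of a host-prohibited rejection to EHOSTUNREACH is what I would expect on Linux, so treat it as an assumption rather than a guarantee.

```python
import errno
import socket


def probe(host, port, timeout=2.0):
    """Classify a TCP port as 'open', 'refused' or 'filtered' (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing listens on the port.
        return "refused"
    except socket.timeout:
        # No answer at all: packets are probably dropped by a firewall.
        return "filtered"
    except OSError as exc:
        # Assumption: an ICMP host-prohibited reject shows up as "no route to host".
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "filtered"
        raise
```

Calling probe("hadoop2.adastragrp.com", 8032) from the client node would then separate a missing listener ("refused") from a firewall problem ("filtered"), which is the distinction the nmap output above is hinting at.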


This CDH cluster has been installed for months and is used to back up logs.
Today I tried to run Flink on YARN and wanted to open the YARN web UI to check the state of the Flink TaskManagers, but I found that connections to port 8088 are refused.
This site can’t be reached
47.74.***.*** refused to connect.
The YARN port and address configuration is as follows:
yarn.resourcemanager.address 8032
yarn.resourcemanager.scheduler.address 8030
yarn.resourcemanager.resource-tracker.address 8031
yarn.resourcemanager.admin.address 8033
yarn.resourcemanager.webapp.address 8088
yarn.resourcemanager.webapp.https.address 8090
Even curl 'http://ip:8088' on the resource manager host gets "connection refused".
[root@bigdata-cdh02 ~]# netstat -tunlp|grep 8088
tcp 0 0* LISTEN 20606/java
BTW, I checked the YARN logs, and it seems that YARN has successfully allocated resources for Flink.

I'd really appreciate some help to get cloudera manager running on AWS EC2.
It's my first install, and I'm aiming to use the AWS Free Tier to spin up a few nodes and do some training on a Hadoop cluster and the Cloudera distribution. I'm using the RedHat RHEL 7.2 image on AWS EC2.
I am following the instructions here... Cloudera Manager installation
I have installed Cloudera Manager OK, and got to the screen where it invites you to use a browser to log in to the Cloudera Manager server. But that's where the problem starts. It seems the app is not listening on port 7180, so there's no hope of connecting from another machine across the network. I can't even connect locally, on the server, yet the service appears to be running OK. But it's not listening on port 7180.
Q1 - How can I confirm the config is set to use port 7180?
Q2 - Are there obvious steps that I'm missing here?
Thanks in advance,
I'm beginning to wonder if the free EC2 host is running short on memory to run Cloudera Manager. I saw one comment that implied that (AWS Forum post). But the process doesn't crash or report any problems in its logfile. So it must be OK, right?
[Edit.... with more diagnostic info....]
Here's a list of the diagnostics I've checked:-
SELinux is not running [for install and testing purposes],
WAN firewalls,
EC2 firewall/Security group,
Local firewall on server,
Cloudera manager log,
Is the service up and running?
Can you connect locally?
Security group on the EC2 instance, it contains:-
SSH and Port 7180,
Firewall/iptables/firewalld on the RedHat instance, tried:-
adding ports to iptables, then
disabling iptables, then
adding ports to firewalld, then
disabling the firewalld service,
$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT tcp -- anywhere anywhere tcp dpt:ssh
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:7180
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:7182
But I'm getting the feeling that the installation of cloudera manager is not happy, or not running correctly.
I've checked the cloudera manager log, and it ends with the following.
$ tail /var/log/cloudera-scm-server/cloudera-scm-server.log
2016-02-25 11:02:23,581 INFO main:com.cloudera.cmon.components.MetricSchemaUpdate: persisting 19264 new metrics
2016-02-25 11:02:28,920 INFO main:com.cloudera.cmon.components.MetricSchemaUpdate: persisting 0 updated metrics
2016-02-25 11:02:28,924 INFO main:com.cloudera.cmon.components.MetricSchemaManager: Cross entity aggregates processed.
And when I use tail -f, and restart the cloudera-scm-server service, the log scrolls a lot, and comes back the same state. If I search for ERROR, there are no lines with "ERR".
$ sudo service cloudera-scm-server start
Starting cloudera-scm-server (via systemctl): [ OK ]
$ sudo systemctl status cloudera-scm-server
● cloudera-scm-server.service - LSB: Cloudera SCM Server
Loaded: loaded (/etc/rc.d/init.d/cloudera-scm-server)
Active: active (exited) since Thu 2016-02-25 12:23:03 EST; 44s ago
Docs: man:systemd-sysv-generator(8)
Process: 747 ExecStart=/etc/rc.d/init.d/cloudera-scm-server start (code=exited, status=0/SUCCESS)
So, if I try to test the service by connecting from the local machine, I get the sort of behaviour that makes me think it's just not listening, and maybe not started correctly.
Try poking it with curl from the same shell the cloudera-scm-server service was started from:
$ curl localhost:7180
curl: (7) Failed connect to localhost:7180; Connection refused
$ wget localhost:7180
--2016-02-25 08:00:16-- http://localhost:7180/
Resolving localhost (localhost)... ::1,
Connecting to localhost (localhost)|::1|:7180... failed: Connection refused.
Connecting to localhost (localhost)||:7180... failed: Connection refused.
Try checking which ports are listening on that machine: no 7180. What's up with that?
$ netstat -nltp
(No info could be read for "-p": geteuid()=1000 but you should be root.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0* LISTEN -
tcp 0 0* LISTEN -
tcp 0 0* LISTEN -
tcp6 0 0 :::7432 :::* LISTEN -
tcp6 0 0 :::22 :::* LISTEN -
tcp6 0 0 ::1:25 :::* LISTEN -
Here's what to look for, and a possible solution - give it more memory...
Check the status of the cloudera-scm-server service using [depending on your flavour of linux]
$ sudo service cloudera-scm-server status
$ sudo systemctl status cloudera-scm-server
Look for the status - Active: active (running)
But if you find - Active: active (exited)
you may have a problem during the startup of the cloudera-scm-server.
In which case, look at the log files for cloudera-scm-server
$ sudo ls -l /var/log/cloudera-scm-server
$ sudo cat /var/log/cloudera-scm-server/cloudera-scm-server.out
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x000000078dc58000, 265809920, 0) failed; error='Cannot allocate memory' (errno=12)
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 265809920 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid831.log
[ec2-user@ip-172-31-31-166 ~]$ sudo tail -100 /var/log/cloudera-scm-server/cloudera-scm-server.out
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x000000078dc58000, 265809920, 0) failed; error='Cannot allocate memory' (errno=12)
Use the top command to see how much memory is available on your system.
Possible solution - have a look at this discussion at Cloudera forum
In this case the java heap size was too small.
As we see that heap was exhausted, assuming this is not a memory leak
or something of the sort, Cloudera Manager may need more heap to
operate. This can be configured in:
/etc/default/cloudera-scm-server
You could, for instance, change "-Xmx2G" to "-Xmx3G" or "-Xmx4G". If the problem still
happens, perhaps the heap dumps will yield some clues.
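The two checks in this answer (the Active: line of the status output and the contents of the .out file) can be sketched in Python roughly as follows. The function name diagnose and the returned strings are made up for illustration; the only things taken from the thread are the "Active: active (exited)" status and the "Cannot allocate memory" message.

```python
import re


def diagnose(status_text, out_log_text):
    """Summarise systemctl status text and the .out log (hypothetical helper)."""
    match = re.search(r"Active:\s*(\w+) \((\w+)\)", status_text)
    state = f"{match.group(1)} ({match.group(2)})" if match else "unknown"
    if state == "active (running)":
        return "running"
    # The init script can report success even though the JVM died right after.
    if "Cannot allocate memory" in out_log_text:
        return "exited: JVM could not allocate memory"
    return f"not running: {state}"
```

You would feed it the output of sudo systemctl status cloudera-scm-server and the text of /var/log/cloudera-scm-server/cloudera-scm-server.out.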
I'd suggest you tail the logs. If you are using the free tier, cloudera manager will take a while to come up... possibly up to 5 minutes or more after you start the cloudera-scm-server.
The logs should show if there are any errors, possibly issues with memory allocation since the free tier servers have limited memory available. The little snippet of log entries looks fine and typical - it will go through a long list of processes before the UI comes up on 7180.
Also while that is going on, run top or even free -g to see how much resources are being used - particularly memory.
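If you would rather script the memory check than eyeball top, something like the following Python sketch may help. It assumes a kernel that exposes MemAvailable in /proc/meminfo, and the names mem_available_mib and heap_fits are mine. Note that the errno=12 message in the .out file means the OS refused to hand memory to the JVM, so the useful comparison is between the -Xmx value and what the host actually has free.

```python
def mem_available_mib(meminfo_text):
    """Return MemAvailable from /proc/meminfo-style text, in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024  # the value is reported in kB
    raise ValueError("MemAvailable not found")


def heap_fits(meminfo_text, xmx_mib):
    """True if the requested heap (in MiB) fits in the available memory."""
    return mem_available_mib(meminfo_text) >= xmx_mib
```

For example, heap_fits(open("/proc/meminfo").read(), 2048) checks whether a -Xmx2G heap could be satisfied right now.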
I was having the exact same issue, cannot hit the CM login using public DNS or IP on port 7180.
The following checks should help:
Stop iptables (service iptables stop).
Disable SELinux (edit /etc/selinux/config and set it to disabled).
Check that curl/wget localhost:7180 works.
ufw allow 7180
service httpd status should report running.
Check the log under /var/log/cloudera-scm-server; if you find any errors, troubleshoot them.
service cloudera-scm-server status should report running.
netstat -nap | grep 7180 should show the listener (if another service holds the port, kill it).
telnet localhost 7180 should connect.
1] Check the status:
sudo service cloudera-scm-server status
● cloudera-scm-server.service - LSB: Cloudera SCM Server
Loaded: loaded (/etc/rc.d/init.d/cloudera-scm-server; bad; vendor preset: disabled)
Active: active (exited) since UTC; 47min ago
Docs: man:systemd-sysv-generator(8)
rm /var/run/cloudera-scm-server.pid
NOTE : The Cloudera Manager service will not be running as it exited abnormally.
Running service cloudera-scm-server status will print following message "cloudera-scm-server dead but pid file exists".
Reason: Out of memory.
Solution : Examine the heap dump that the Cloudera Manager Server creates when it runs out of memory. The heap dump file is created in the /tmp directory, has file extension .hprof and file permission of 600. Its owner and group will be the owner and group of the Cloudera Manager server process, normally cloudera-scm:cloudera-scm.
Link : http://www.cloudera.com/documentation/manager/5-0-x/Cloudera-Manager-Diagnostics-Guide/cm5dg_troubleshooting_cluster_config.html
Check the status of `cloudera-scm-server` and follow the instructions ahead:
[root@quickstart ~]# `service cloudera-scm-server status`
By default, Cloudera's QuickStart VM manages CDH using Linux's configuration
and service management. To use Cloudera Manager instead, you must shut down
and disable the existing CDH services and then start Cloudera Manager. You can
do this by running the following command:
`sudo /home/cloudera/cloudera-manager`
[root@quickstart ~]# `sudo /home/cloudera/cloudera-manager`
`[QuickStart] Shutting down CDH services via init scripts...
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
[QuickStart] Disabling CDH services on boot...
[QuickStart] Starting Cloudera Manager services...
[QuickStart] Deploying client configuration...
[QuickStart] Starting CM Management services...
[QuickStart] Enabling CM services on boot...
[QuickStart] Starting CDH services...`
Success! You can now log into Cloudera Manager from the QuickStart VM's browser:
Username: cloudera
Password: cloudera

I am writing a Ruby script that deploys a server on port 8000 in the background, and then in the foreground I issue queries to the server. After I've issued my queries I kill the server; however, when I do, it seems to be switching ports.
I am doing it the following way in the ruby script:
To see PID that is running on port 8000:
lsof -i:8000 -t
java 26364 user1 84u IPv6 199069 0t0 TCP *:8000 (LISTEN)
To kill the server I issue the command:
kill 26364
I then see if anything is running on port 8000:
# check if killed
lsof -i:8000 -t
ruby 25560 user1 58u IPv4 199123 0t0 TCP localhost:45789->localhost:8000 (ESTABLISHED)
java 26364 user1 84u IPv6 199069 0t0 TCP *:8000 (LISTEN)
java 26364 user1 85u IPv6 199124 0t0 TCP localhost:8000->localhost:45789 (ESTABLISHED)
I only want to kill the process that is listening on port 8000,
and keep my ruby script running.
Can someone please tell me what is going on? Why is it switching ports? How can I only kill my server port?
It doesn't look to me like it's switching ports; it's still listening on port 8000. It looks to me like two things are happening:
The java process (PID 26364) is catching or ignoring the kill signal (SIGTERM), and continuing to listen on port 8000.
A ruby process (PID 25560) is making a connection to localhost:8000 (from port 45789, which was probably dynamically allocated). That is, ruby is making a normal connection to the server on port 8000.
Note that the java process owns the port 8000 end of the localhost:8000<->localhost:45789 TCP session, and the ruby process owns the port 45789 end.
Whether the ruby process's connection is somehow a result of the kill signal, or just something it happened to do at about the same time, I couldn't tell you.
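To kill only the listener and leave the client alone, the lsof lines can be filtered on the (LISTEN) state. Below is a small Python sketch of that filter; listener_pids is a hypothetical helper and it assumes the ten-column lsof layout shown in the question. lsof can also do this filtering itself with its -sTCP:LISTEN option.

```python
def listener_pids(lsof_output, port):
    """Return the PIDs that are in LISTEN state on the given port."""
    pids = set()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Assumed layout: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (STATE)
        if len(fields) < 10 or fields[-1] != "(LISTEN)":
            continue  # skips the header and ESTABLISHED connections such as the ruby client
        if fields[-2].endswith(f":{port}"):
            pids.add(int(fields[1]))
    return pids
```

With the output from the question, only the java PID comes back, so the ruby script's own PID would never be passed to kill.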

When I set up the Hadoop cluster, I read that the namenode runs on 50070, so I set it up accordingly and it's running fine.
But in some books I have come across name node address :
What exactly is the proper number to set the port of namenode?
The default Hadoop HTTP ports (the ones that serve a web UI) are as follows:
Daemon                   Default Port  Configuration Parameter
-----------------------  ------------  ----------------------------------
Namenode                 50070         dfs.http.address
Datanodes                50075         dfs.datanode.http.address
Secondarynamenode        50090         dfs.secondary.http.address
Backup/Checkpoint node?  50105         dfs.backup.http.address
Jobtracker               50030         mapred.job.tracker.http.address
Tasktrackers             50060         mapred.task.tracker.http.address
Internally, Hadoop mostly uses Hadoop IPC, which stands for Inter-Process Communication, to communicate amongst servers. The following table presents the ports and protocols that Hadoop uses. This table does not include the HTTP ports mentioned above.
Daemon                   Default Port  Configuration Parameter
-----------------------  ------------  ----------------------------------
Namenode                 8020          fs.default.name
Datanode                 50010         dfs.datanode.address
Datanode                 50020         dfs.datanode.ipc.address
Backupnode               50100         dfs.backup.address
The default address of namenode web UI is http://localhost:50070/. You can open this address in your browser and check the namenode information.
The default address of the namenode server is hdfs://localhost:8020/. You can connect to it to access HDFS through the HDFS API. This is the real service address.
The default port for the NameNode web UI is 9870 on Hadoop 3.x. Please refer to https://hadoop.apache.org/docs/r3.0.0/ for details.
9000 is the default HDFS service port. This does not have a web UI. 50070 is the default NameNode web UI port (although from Hadoop 3.0 onwards 50070 is changed to 9870).
That is because the default differs between Hadoop configurations and distributions.
We can always configure the port by changing the fs.default.name or fs.defaultFS property in core-site.xml, as below.
For Hadoop 1.0.4, if I don't mention a port number, like below,
then the default port taken is 8020. But for some versions, like 0.20, I read it is 9000. So it depends on the version of Hadoop you are using.
But all the configurations and distributions (prior to Hadoop 3.x) use 50070 as the standard port number for the HDFS UI.
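To make the "port omitted means default" behaviour concrete, here is a small Python sketch that pulls the port out of an fs.defaultFS style URI. The fallback of 8020 is simply the default quoted in this answer for Hadoop 1.0.4, not something the code looks up, and namenode_rpc_port is a made-up name.

```python
from urllib.parse import urlparse

# Assumption: 8020 is the default quoted in this thread; other versions may differ.
DEFAULT_NAMENODE_RPC_PORT = 8020


def namenode_rpc_port(default_fs):
    """Return the port in an hdfs:// URI, or the assumed default if none is given."""
    port = urlparse(default_fs).port
    return port if port is not None else DEFAULT_NAMENODE_RPC_PORT
```

So namenode_rpc_port("hdfs://localhost:9000") gives the explicit port, while namenode_rpc_port("hdfs://localhost") falls back to the assumed default.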
To access the Hadoop web UI, you need to open http://localhost:50075/
even though your core-site.xml has http://localhost:9000, because 9000 is for HDFS requests and 50075 is the default port for the DataNode web UI (the NameNode web UI is on 50070).
There are other HTTP ports that would run in server for monitoring. Example: 50070, 8088, 9870, 9864, 9868, 16010, 16030
Hadoop IPC (Inter-Process Communication) ports (e.g. 9000) cannot be accessed through your web browser.
You can find the ports that can be accessed in browser by the following command:
lsof -i -P -n | grep LISTEN
For example, the ports in my server were:
Hadoop Cluster - http://server-name:8088/cluster
Hadoop NameNode/DFS Health - http://server-name:9870/dfshealth.html#tab-overview
Hadoop DataNode - http://server-name:9864/datanode.html
Hadoop Status - http://server-name:9868/status.html
HBase Master Status - http://server-name:16010/master-status
HBase Region server - http://server-name:16030/rs-status
50070 is the default UI port for the NameNode, while 8020/9000 is the Inter-Process Communication (IPC) port for the NameNode.
Reference to IPC port : https://en.wikipedia.org/wiki/Inter-process_communication
50070 is the default UI port of the NameNode for HTTP. For HTTPS it's 50470.
9000 is the IPC (Inter-Process Communication) port, required for file system metadata operations. If you open localhost:50070, you can see the NameNode configuration with an overview showing 9000 (active), and on localhost:9000 you will get the message:
"It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon."
You can check what ports each daemon is listening on if you’re having trouble finding the web interface. For example, to check ports the NameNode is listening on:
lsof -Pan -iTCP -sTCP:LISTEN -p `jps | grep "\sNameNode" | cut -d " " -f1`
This will give you output similar to
java 4053 pi 275u IPv6 45474 0t0 TCP *:9870 (LISTEN)
java 4053 pi 288u IPv6 43476 0t0 TCP (LISTEN)
As you can see *:9870 is listed, which is the web interface.
lsof -Pan -iTCP -sTCP:LISTEN -p <pid> lists all network files with TCP state LISTEN. -p filters the list by process id. So by plugging in a process id after this command, you can see all the ports a process is listening on.
jps | grep "\sNameNode" | cut -d " " -f1 gets the process id of the NameNode.

Error connecting rabbitmq cluster on Amazon EC2

I am experiencing some difficulties connecting two RabbitMQ nodes on amazon EC2.
The two nodes are controlled using puppet, here is my rabbit.config file:
[
{mnesia, [{dump_log_write_threshold, 1000}]},
{rabbit, [
{tcp_listeners, [5672]},
{kernel, [{inet_dist_listen_min, 55700},{inet_dist_listen_max, 55800}]} ,
{cluster_nodes, ['rabbit@server1', 'rabbit@server2']}
]}
].
I believe the right ports for the cluster to connect are open. I am able to telnet from server2 to server1 on both 5672 and 4369.
I have the same /var/lib/rabbitmq/.erlang.cookie on both servers.
And from the Erlang command line, when I net_adm:ping the other node I get pang back.
However, when I run cluster_status on any node, they do not look like they are aware of each other. Doing stop_app, reset, rabbitmqctl cluster rabbit@server1, I always get the following error:
Error: {no_running_cluster_nodes...
Has anybody solved a similar problem, or know how to solve it?
Have you opened the ports between 55700 and 55800?
Try checking this to understand what other ports RabbitMQ listens on:
netstat -plten | grep beam
And I'd double-check the cookie...
Like Ivan suggests, you can check which ports the servers are listening on first and then add those TCP rules to Security Groups for servers. That's a good first step.
netstat -plten | grep beam
Returns the following (if server still running and not stop_app)
tcp 0 0* LISTEN 498 118739 15519/beam
tcp 0 0* LISTEN 498 119032 15519/beam
tcp 0 0* LISTEN 498 119029 15519/beam
tcp 0 0 :::5672 :::* LISTEN 498 119018 15519/beam
Notice the common ports 5672 15672 55672 for amqp and web server and the other port is the port the cluster is listening on. Check your other instances and make sure your range includes both of them, then retry and it will work.
Security Group > Inbound > TCP Rule:
30000-65535 and the Security Group allowed sg-XXXXXX and repeat for reciprocating security groups and don't forget to "Apply Rules".
Next make sure you share the /var/lib/rabbitmq/.erlang.cookie (just copy from one server to all others and restart instances)
Then on your command line:
[root@ip-172-31-27-150 ~]# rabbitmqctl stop_app
Stopping node 'rabbit@ip-172-31-27-150' ...
[root@ip-172-31-27-150 ~]# rabbitmqctl reset
Resetting node 'rabbit@ip-172-31-27-150' ...
[root@ip-172-31-27-150 ~]# rabbitmqctl join_cluster rabbit@ip-172-31-28-79
Clustering node 'rabbit@ip-172-31-27-150' with 'rabbit@ip-172-31-28-79' ...
Lastly, don't forget to start the app again with rabbitmqctl start_app
This worked for me on 5 EC2 instances.
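One way to sanity-check the Security Group rules before retrying is to list the ports the cluster needs and see which ones no rule covers. The Python sketch below does that with the numbers from this thread (4369 and 5672 from the telnet test, 55700-55800 from rabbit.config, 30000-65535 from the rule above); uncovered_ports is a hypothetical helper, not part of any AWS or RabbitMQ tooling.

```python
def uncovered_ports(required, allowed_ranges):
    """Return the required ports that no (low, high) range allows."""
    return [
        port for port in required
        if not any(low <= port <= high for low, high in allowed_ranges)
    ]


# Ports taken from this thread: 4369, 5672, and the distribution range in rabbit.config.
required = [4369, 5672] + list(range(55700, 55801))
print(uncovered_ports(required, [(30000, 65535)]))  # prints [4369, 5672]
```

In other words, a 30000-65535 rule on its own covers the distribution range but still needs separate rules for 4369 and 5672.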
thanks for your answer, what I did is to remove the content of this directory except .erlang.cookie ( rm -R /var/lib/rabbitmq/ ). And the cluster connected successfully.
