I have two running containers, one for Flume and one for Hadoop; call them hadoop2 and flume2. I created these two containers from two images named hadoop_alone and flume_alone.
docker run -d -p 10.236.173.XX:8020:8020 -p 10.236.173.XX:50030:50030 -p 10.236.173.XX:50060:50060 -p 10.236.173.XX:50070:50070 -p 10.236.173.XX:50075:50075 -p 10.236.173.XX:50090:50090 -p 10.236.173.XX:50105:50105 --name hadoopservices hadoop_alone
I got into the Hadoop container and checked the exposed ports. All the ports are exposed properly.
docker run -d --name flumeservices -p 0.0.0.0:5140:5140 -p 0.0.0.0:44444:44444 --link hadoopservices:hadoopservices flume_alone
I got into the Flume container and checked the env and /etc/hosts entries. There is an entry for hadoopservices, and the environment variables are created automatically.
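For reference, these are the checks I ran inside the flume container (the values shown here are only illustrative of what Docker's --link creates):
cat /etc/hosts | grep hadoopservices
# e.g. 172.17.1.XX   hadoopservices
env | grep HADOOPSERVICES_PORT_8020
# e.g. HADOOPSERVICES_PORT_8020_TCP=tcp://172.17.1.XX:8020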
My core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://0.0.0.0:8020</value>
</property>
I modified it so it will accept connections on port 8020 from all containers.
My source and sink in flume.conf
a2.sources.r1.type = netcat
a2.sources.r1.bind = localhost
a2.sources.r1.port = 5140
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.fileType = DataStream
a2.sinks.k1.hdfs.writeFormat = Text
a2.sinks.k1.hdfs.path = hdfs://hadoopservices:8020/user/root/syslog/%y-%m-%d/%H%M/%S
a2.sinks.k1.hdfs.filePrefix = events
a2.sinks.k1.hdfs.roundUnit = minute
a2.sinks.k1.hdfs.useLocalTimeStamp = true
I restarted the Hadoop namenode after changing core-site.xml.
I try to write into HDFS from Flume using
/usr/bin/flume-ng agent --conf-file /etc/flume-ng/conf/flume.conf --name a2 -Dflume.root.logger=INFO,console
It says
INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
So I figured there is a problem with the connection established between these two containers. I got into the Hadoop container and checked the port connections:
netstat -tna
tcp 0 0 127.0.0.1:52521 127.0.0.1:8020 TIME_WAIT
tcp 0 0 127.0.0.1:8020 127.0.0.1:52516 ESTABLISHED
tcp 0 0 127.0.0.1:52516 127.0.0.1:8020 ESTABLISHED
But I expect it to be
tcp 0 0 172.17.1.XX:54342 172.17.1.XX:8020 TIME_WAIT
tcp 0 0 172.17.1.XX:54332 172.17.1.XX:8020 ESTABLISHED
tcp 0 0 172.17.1.XX:8020 172.17.1.XX:54332 ESTABLISHED
Where 172.17.1.XX is the IP of my Hadoop container.
I think I have found the cause. Is this the reason?
Which configuration should be modified? Or should my run statement change? What should be changed to establish a connection between these two Docker containers so that I can write into HDFS from Flume?
If you need more info, I'll edit the question further.
Please give me some ideas.
If anybody faces the same problem, please follow these steps.
1) Check whether 0.0.0.0:8020 is updated in core-site.xml
2) If you update it inside a running container, **I suggest you restart ALL the services, NOT ONLY the namenode** (better to do this as part of the Dockerfile).
3) Check the `env` output and `/etc/hosts` contents in the Flume container.
4) The hostname in `/etc/hosts` must match the host in the `hdfs.path` parameter in flume.conf (see the sketch after this list).
5) Get into the Hadoop container and run `netstat -tna`; you must see connections established to <hadoop_container_ip>:8020, not to localhost (127.0.0.1).
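To make step 4 concrete, here is a minimal sketch of how the pieces fit together in my setup (names exactly as used above):
core-site.xml on the hadoop container:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://0.0.0.0:8020</value>
</property>
flume.conf on the flume container (hadoopservices resolves through the /etc/hosts entry that --link creates):
a2.sinks.k1.hdfs.path = hdfs://hadoopservices:8020/user/root/syslog/%y-%m-%d/%H%M/%S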
I hope this will be helpful to people who are trying to link containers and map ports.
Related
I'm starting an hdfs server with:
docker run -d sequenceiq/hadoop-docker:2.6.0
I'm observing the running docker processes with
docker ps
which gives the following result:
6bfa4f2fd3b5 sequenceiq/hadoop-docker:2.6.0 "/etc/bootstrap.sh -d"
31 minutes ago Up 31 minutes 22/tcp, 8030-8033/tcp, 8040/tcp,
8042/tcp, 8088/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp,
50090/tcp kind_hawking
I'm trying to connect to HDFS in my Docker container with:
sudo docker run -ti davvdg/hdfs-client hadoop fs -fs hdfs://localhost:50075 -ls /
This gives the following result:
ls: Call From a48f81b8e1bb/172.17.0.3 to localhost:50075 failed on
connection exception: java.net.ConnectException: Connection refused; For
more details see: http://wiki.apache.org/hadoop/ConnectionRefused
My question is: how do I get my HDFS Docker client running?
Edit:
Thanks to some helpful feedback from @shizhz, I'm updating the question.
Here is my Dockerfile
FROM sequenceiq/hadoop-docker:2.6.0
CMD ["/etc/bootstrap.sh", "-d"]
# Hdfs ports
EXPOSE 50010 50020 50070 50075 50090 8020 9000
# Mapred ports
EXPOSE 10020 19888
#Yarn ports
EXPOSE 8030 8031 8032 8033 8040 8042 8088
#Other ports
EXPOSE 49707 2122
EXPOSE 9000
EXPOSE 2022
Here is how I'm building the image:
sudo docker build -t my-hdfs .
Here is how I'm running the image:
sudo docker run -d -P my-hdfs
Here is how I'm checking the processes:
sudo docker ps
with a result like:
d9c9855cfaf0 my-hdfs "/etc/bootstrap.sh -d" 2 minutes ago
Up 2 minutes 0.0.0.0:32801->22/tcp, 0.0.0.0:32800->2022/tcp,
0.0.0.0:32799->2122/tcp, 0.0.0.0:32798->8020/tcp, 0.0.0.0:32797->8030/tcp,
0.0.0.0:32796->8031/tcp, 0.0.0.0:32795->8032/tcp, 0.0.0.0:32794->8033/tcp,
0.0.0.0:32793->8040/tcp, 0.0.0.0:32792->8042/tcp, 0.0.0.0:32791->8088/tcp,
0.0.0.0:32790->9000/tcp, 0.0.0.0:32789->10020/tcp, 0.0.0.0:32788->19888/tcp,
0.0.0.0:32787->49707/tcp, 0.0.0.0:32786->50010/tcp, 0.0.0.0:32785->50020/tcp,
0.0.0.0:32784->50070/tcp, 0.0.0.0:32783->50075/tcp, 0.0.0.0:32782->50090/tcp
agitated_curran
Here is how I'm getting the IP address:
docker inspect --format '{{ .NetworkSettings.IPAddress }}' d9c9855cfaf0
with a result like:
172.17.0.3
Here is how I'm running the test:
sudo docker run --rm sequenceiq/hadoop-docker:2.6.0 /usr/local/hadoop-2.6.0/bin/hadoop fs -fs hdfs://192.168.0.3:9000 -ls /
With a result like:
17/04/08 19:51:54 INFO ipc.Client: Retrying connect to server:
192.168.0.3/192.168.0.3:9000. Already tried 0 time(s); maxRetries=45
ls: Call From fafcd377f4a0/172.17.0.2 to 192.168.0.3:9000 failed on connection
exception: java.net.ConnectException: Connection refused; For more details
see: http://wiki.apache.org/hadoop/ConnectionRefused
My question remains: how do I get my HDFS Docker client running?
By default, each container uses the bridge network driver and has its own isolated network environment. It's not exactly the same thing, but you can simply think of them as different servers, each with its own private IP. So when you start a client container and try to connect to the address hdfs://localhost:50075, it will actually try to connect to port 50075 of itself rather than of the Hadoop server container, and it will obviously be refused. Please refer to the official Docker networking docs for more info.
Containers on the same host can communicate with each other by their private IPs, so to connect to your Hadoop server container, you can first find out its private IP with:
$> docker inspect --format '{{ .NetworkSettings.IPAddress }}' 378
192.168.0.2
And then I can use the client (and I think the port should be 9000):
$> docker run --rm sequenceiq/hadoop-docker:2.6.0 /usr/local/hadoop-2.6.0/bin/hadoop fs -fs hdfs://192.168.0.2:9000 -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2015-01-15 04:04 /user
If you want to run the client container on another host, then you need to look into multi-host networking.
For the client, you could use a thin Docker image that contains just the HDFS client.
One that I've used before and found pretty good: https://hub.docker.com/r/ryneyang/hadoop-utils/
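Not part of the original answer, but on a single host a user-defined bridge network is another way to avoid hard-coding container IPs, because Docker's embedded DNS resolves container names on such networks. A rough sketch, reusing the my-hdfs image name from the question:
docker network create hadoop-net
docker run -d --net hadoop-net --name hdfs-server my-hdfs
# any container on the same user-defined network can reach the server by name
docker run --rm --net hadoop-net sequenceiq/hadoop-docker:2.6.0 \
  /usr/local/hadoop-2.6.0/bin/hadoop fs -fs hdfs://hdfs-server:9000 -ls /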
I installed Hadoop (HDP 2.5.3) on 4 VMs with Ambari (1 Ambari Server and 3 Ambari Clients; with the DNS entries server, node0, node1, node2) with HDFS, YARN, MapReduce and Zookeeper.
However, YARN doesn't want to start. When starting the Resource Manager on node1 I get the following error:
resource_management.core.exceptions.ExecutionFailed: Execution of 'curl -sS -L -w '%{http_code}' -X GET 'http://node0:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs' 1>/tmp/tmpgsiRLj 2>/tmp/tmpMENUFa' returned 7. curl: (7) Failed to connect to node0 port 50070: connection refused 000
App Timeline Server and History Server on node1 don't want to start either. ZooKeeper, NameNode, DataNode and NodeManager on node0 are up. The nodes can reach each other (tried with ping), so that shouldn't be the problem.
Hopefully someone can help me. I'm really new to this topic and not really familiar with the system.
You should check the hosts file (/etc/hosts): verify the hostnames and FQDNs, and check whether there are any duplicate names or IP addresses.
Could you also confirm the firewall status with:
sudo ufw status
And also check the ports in iptables (or allow the required udp and tcp ports through the firewall).
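A rough sketch of those checks (ufw applies to Ubuntu-style hosts, iptables elsewhere; node0 and port 50070 are taken from the error above):
# from node1: is the NameNode web port on node0 reachable at all?
curl -sS 'http://node0:50070/webhdfs/v1/?op=GETFILESTATUS&user.name=hdfs'
# on node0: is anything actually listening on 50070?
sudo netstat -tlpn | grep 50070
# firewall status
sudo ufw status
sudo iptables -L -n | grep -E '50070|REJECT|DROP'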
I have a container running Hadoop. I have another Dockerfile which contains Map-Reduce job commands like creating an input directory, processing a default example, and displaying the output. The base image for the second file is hadoop_image, created from the first Dockerfile.
EDIT
Dockerfile - for hadoop
#base image is ubuntu:precise
#cdh installation
#hadoop-0.20-conf-pseudo installation
#CMD to start-all.sh
start-all.sh
#start all the services under /etc/init.d/hadoop-*
The hadoop base image is created from this.
Dockerfile2
#base image is hadoop
#flume-ng and flume-ng agent installation
#conf change
#flume-start.sh
flume-start.sh
#start flume services
I am running both containers separately and it works fine. But if I run
docker run -it flume_service
it starts Flume and shows me a bash prompt [/bin/bash is the last line of flume-start.sh]. Then I execute
hadoop fs -ls /
in the second running container, and I get the following error:
ls: Call From 514fa776649a/172.17.5.188 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I understand I am getting this error because the Hadoop services are not started yet. But my doubt is: my first container is running, and I am using it as the base image for the second container. So why am I getting this error? Do I need to change anything in the hdfs-site.xml file on the Flume container?
This is a pseudo-distributed mode installation.
Any suggestions?
Or do I need to expose any ports or the like? If so, please provide me an example.
EDIT 2
I ran the following and see:
sudo iptables -t nat -L -n
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DOCKER all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE tcp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-6
MASQUERADE udp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-6
MASQUERADE all -- 192.168.122.0/24 !192.168.122.0/24
MASQUERADE all -- 172.17.0.0/16 0.0.0.0/0
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
DOCKER all -- 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-
Chain DOCKER (2 references)
target prot opt source destination
This is on the Docker host (docker@domain), not inside a container.
EDIT
See the last comment under surazj's answer.
Have you tried linking the containers?
For example, your container named hadoop is running in pseudo-distributed mode. You want to bring up another container that contains Flume. You could link the containers like
docker run -it --link hadoop:hadoop --name flume ubuntu:14.04 bash
When you get inside the flume container, type the env command to see the IP and ports exposed by the hadoop container.
From the flume container, you should be able to do something like the following (the ports on the hadoop container should be exposed):
$ hadoop fs -ls hdfs://<hadoop containers IP>:8020/
The error you are getting might be related to some Hadoop services not running on the flume container. Run jps to check which services are running. But I think if you have the Hadoop classpath set up correctly on the flume container, you can run the above HDFS command (-ls hdfs://<hadoop container IP>:8020/) without starting anything. But if you want
hadoop fs -ls /
to work on the flume container, then you need to start the Hadoop services on the flume container as well.
In your core-site.xml, add dfs.namenode.rpc-address like this so the namenode listens for connections from all IPs:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address</name>
<value>0.0.0.0:8020</value>
</property>
Make sure to restart the namenode and datanode
sudo /etc/init.d/hadoop-hdfs-namenode restart && sudo /etc/init.d/hadoop-hdfs-datanode restart
Then you should be able to do this from your hadoop container without connection errors, e.g.
hadoop fs -ls hdfs://localhost:8020/
hadoop fs -ls hdfs://172.17.0.11:8020/
On the linked container, type env to see the ports exposed by your hadoop container:
env
You should see something like
HADOOP_PORT_8020_TCP=tcp://172.17.0.11:8020
Then you can verify the connection from your linked container.
telnet 172.17.0.11 8020
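An alternative that is not part of the original answer: if you point fs.defaultFS in the flume container's own core-site.xml at the linked hadoop container, bare paths resolve against the remote NameNode, so a plain hadoop fs -ls / works without spelling out the full hdfs:// URI. A sketch, assuming the link alias hadoop from the docker run command above:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop:8020</value>
</property>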
I think I hit the same problem. I also couldn't start the Hadoop namenode and datanode with the Hadoop command "start-all.sh" in docker1.
That is because it launches the namenode and datanode through "hadoop-daemons.sh", and that fails. The real problem is that "ssh" does not work inside Docker.
So, you can do either of the following:
(solution 1):
Replace all occurrences of "daemons.sh" with "daemon.sh" in start-dfs.sh (see the sed one-liner after these steps),
then run start-dfs.sh
(solution 2): run
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
You can check that the datanode and namenode are working fine with the command "jps".
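For solution 1, the replacement can be done with a one-liner like this (a sketch; it assumes start-dfs.sh lives under $HADOOP_PREFIX/sbin, as in solution 2):
# rewrite every reference to hadoop-daemons.sh so it calls hadoop-daemon.sh instead
sed -i 's/daemons\.sh/daemon.sh/g' $HADOOP_PREFIX/sbin/start-dfs.sh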
Regards.
I am trying to access log files in HDFS using Flume. I am connected to port 9099, but I don't know why Flume is trying to connect to 8020. I am getting the following error:
java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
The NameNode is listening on port 9099, according to netstat -tlpn | grep :9099.
I think one way to fix this is to format the namenode and set the port to 8020, but I don't want to do that as it will format everything.
Please help
8020 is the default port for the NameNode.
You can change this in core-site.xml via the property fs.default.name. As you mentioned, it is running on port 9099, so check whether that is what is specified there.
Also check the Flume configuration file, which specifies the namenode details. You can just stop the cluster, change the port number back to the default, and restart it. There is no need to format the namenode for this; I tested the same before answering your question.
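For example, if you go the change-it-back-to-the-default route, the relevant property in core-site.xml would look roughly like this (the hostname here is assumed to be localhost, as in your error message):
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>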
Hope it helps!
8020 is a default port; to override it, you can use flume-conf.properties.
Update your config with:
kafkaTier1.sinks.KafkaHadoopSink.hdfs.path = hdfs://NAME_NODE_HOST:PORT/flume/kafkaEvents/%y-%m-%d/%H%M/%S
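With your NameNode listening on 9099 on localhost (as your error message suggests), that would be, for instance:
kafkaTier1.sinks.KafkaHadoopSink.hdfs.path = hdfs://localhost:9099/flume/kafkaEvents/%y-%m-%d/%H%M/%S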
I am experiencing some difficulties connecting two RabbitMQ nodes on Amazon EC2.
The two nodes are managed with Puppet; here is my rabbit.config file:
[
  {mnesia, [{dump_log_write_threshold, 1000}]},
  {rabbit, [
    {tcp_listeners, [5672]},
    {kernel, [{inet_dist_listen_min, 55700}, {inet_dist_listen_max, 55800}]},
    {cluster_nodes, ['rabbit@server1', 'rabbit@server2']}
  ]}
].
I believe the right ports for the cluster to connect are open. I am able to telnet from server2 to server1 on both 5672 and 4369.
I have the same /var/lib/rabbitmq/.erlang.cookie on both servers.
And from the Erlang command line, when I net_adm:ping the other node I get pang back (i.e., the remote node is unreachable).
However, when I run cluster_status on either node, they do not appear to be aware of each other. After stop_app, reset, and rabbitmqctl cluster rabbit@server1, I always get the following error:
Error: {no_running_cluster_nodes...
Has anybody solved a similar problem, or does anybody know how to solve it?
Have you opened the ports between 55700 and 55800?
Try checking this to understand which other ports RabbitMQ listens on:
netstat -plten | grep beam
And I'd double-check the cookie...
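A quick way to double-check it (a sketch; this assumes the default cookie location for the rabbitmq system user):
# run on every node and compare the hashes; the files must be byte-identical
sudo md5sum /var/lib/rabbitmq/.erlang.cookie
# permissions should be 400 and the file owned by the rabbitmq user
sudo ls -l /var/lib/rabbitmq/.erlang.cookie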
As Ivan suggests, you can check which ports the servers are listening on first and then add those TCP rules to the Security Groups for the servers. That's a good first step.
netstat -plten | grep beam
This returns the following (if the server is still running and you haven't run stop_app):
tcp 0 0 0.0.0.0:37419 0.0.0.0:* LISTEN 498 118739 15519/beam
tcp 0 0 0.0.0.0:15672 0.0.0.0:* LISTEN 498 119032 15519/beam
tcp 0 0 0.0.0.0:55672 0.0.0.0:* LISTEN 498 119029 15519/beam
tcp 0 0 :::5672 :::* LISTEN 498 119018 15519/beam
Notice the common ports 5672, 15672 and 55672 for AMQP and the web server; the other port is the one the cluster is listening on. Check your other instances and make sure your rule's port range includes both of them, then retry and it will work.
Security Group > Inbound > TCP Rule:
30000-65535, with the source set to the allowed security group sg-XXXXXX; repeat for the reciprocating security groups and don't forget to "Apply Rules".
Next, make sure you share /var/lib/rabbitmq/.erlang.cookie (just copy it from one server to all the others and restart the instances).
Then on your command line:
[root@ip-172-31-27-150 ~]# rabbitmqctl stop_app
Stopping node 'rabbit@ip-172-31-27-150' ...
...done.
[root@ip-172-31-27-150 ~]# rabbitmqctl reset
Resetting node 'rabbit@ip-172-31-27-150' ...
...done.
[root@ip-172-31-27-150 ~]# rabbitmqctl join_cluster rabbit@ip-172-31-28-79
Clustering node 'rabbit@ip-172-31-27-150' with 'rabbit@ip-172-31-28-79' ...
...done.
Lastly, don't forget to restart your instance with rabbitmqctl start_app.
This worked for me on 5 EC2 instances.
Thanks for your answer. What I did was remove the contents of this directory except .erlang.cookie (rm -R /var/lib/rabbitmq/), and the cluster connected successfully.
Cheers!