Cannot connect to WebHDFS on port 14000 in Cloudera Manager - hadoop

I have a Cloudera cluster (CDH 6.2.0) and every component (HDFS, Hive, etc.) works well. However, when I recently tried to connect to WebHDFS, I found that nothing was listening on port 14000 at all, by executing netstat -antpl | grep 14000 on the NameNode.
I have confirmed that WebHDFS is enabled in Cloudera Manager and that it uses port 14000 by default.
I also tried port 50070; it wasn't listening either. Then I tried curl:
curl -i "http://localhost:14000/webhdfs/v1/user/user.name=cloudera&op=GETFILESTATUS"
curl: (7) Failed to connect to localhost port 14000: Connection refused
I'd appreciate any help. Thanks.

I solved it by using port 9870 instead.
I found that my Hadoop version is 3.0, which listens on 9870 rather than on 50070 for dfs.namenode.http-address.
As for 14000, it may be the port used by the HttpFS REST service.
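For example, the same WebHDFS call from the question can be retried against 9870 (a sketch only; the path and user name are the ones from the question):
curl -i "http://localhost:9870/webhdfs/v1/user?user.name=cloudera&op=GETFILESTATUS"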
Reference:
https://community.cloudera.com/t5/Support-Questions/Cannot-connect-to-webhdfs/td-p/34830

Related

hadoop fs -ls : Call From server/127.0.1.1 to localhost failed

I have hadoop installed in pseudo-distributed mode.
When running the command
hadoop fs -ls
I am getting the following error:
ls: Call From kali/127.0.1.1 to localhost:9000 failed on connection exception:
java.net.ConnectException: Connection refused;
For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Any suggestions?
Reading the link in the error, I see two immediate points that need to be addressed.
If the error message says the remote service is on "127.0.0.1" or "localhost" that means the configuration file is telling the client that the service is on the local server. If your client is trying to talk to a remote system, then your configuration is broken.
You should treat pseudodistributed mode as a remote system, even if it is only running locally.
For HDFS, you can resolve that by putting your computer's hostname (preferably the full FQDN for your domain) as the HDFS address in core-site.xml. For your case, hdfs://kali:9000 should be enough.
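For reference, a minimal core-site.xml entry along those lines might look like this (the hostname kali is taken from the question; fs.defaultFS is the current name of the older fs.default.name property):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://kali:9000</value>
</property>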
Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
I'm not completely sure why it needs to be removed, but the general answer I can think of is that Hadoop is a distributed system and, as I mentioned, pseudo-distributed mode should be treated as if it were a remote HDFS server. Therefore, no loopback address should map to your computer's hostname.
For example, remove the second line of this
127.0.0.1 localhost
127.0.1.1 kali
Or remove the hostname from this
127.0.0.1 localhost kali
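After editing, you can verify that the hostname no longer resolves to a loopback address (kali is the hostname from the example above; getent is assumed to be available):
getent hosts kali
hostname -f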
Most importantly (emphasis added)
None of these are Hadoop problems, they are hadoop, host, network and firewall configuration issues

ResourceManager does not start

I installed Hadoop (HDP 2.5.3) on 4 VMs with Ambari (1 Ambari Server and 3 Ambari Clients; with the DNS entries server, node0, node1, node2) with HDFS, YARN, MapReduce and Zookeeper.
However, YARN doesn't want to start. When starting the ResourceManager on node1, I get the following error:
resource_management.core.exceptions.ExecutionFailed: Execution of 'curl -sS -L -w '%{http_code}' -X GET 'http://node0:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs' 1>/tmp/tmpgsiRLj 2>/tmp/tmpMENUFa' returned 7. curl: (7) Failed to connect to node0 port 50070: connection refused 000
App Timeline Server and History Server on node1 don't want to start either. ZooKeeper, NameNode, DataNode and NodeManager on node0 are up. The nodes can reach each other (tested with ping), so that shouldn't be the problem.
Hopefully one can help me. I'm really new to this topic and not really familiar with the system.
You should check the hosts file (/etc/hosts): verify the hostname and FQDN, and check whether there are any duplicate names or IP addresses.
Also confirm the firewall status:
sudo ufw status
Then check the port in iptables (or allow the port, both TCP and UDP, through the firewall).
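For example, assuming the NameNode HTTP port from the error above (50070) and that ufw/iptables are in use on node0, the checks might look like this:
netstat -tlnp | grep 50070        # is anything listening on the port?
sudo ufw allow 50070/tcp          # open the port if ufw is blocking it
sudo iptables -L -n | grep 50070  # or inspect the iptables rules directly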

Port is in use 50070

I am using a VM with Ambari 2.2 and HDP 2.3 and installing services through the Ambari user interface. The issue is that the NameNode does not start, and the log shows an error saying port 50070 is in use. I tried netstat and other tools to find out if anything is running on port 50070; nothing is. I also tried changing 50070 to 50071, but the error remains the same except it now says port 50071 is in use. Below is the error I get in the Ambari error file:
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'' returned 1. starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-hdp-m.out
2016-02-07 11:52:47,058 ERROR namenode.NameNode (NameNode.java:main(1712)) - Failed to start namenode.
java.net.BindException: Port in use: hdp-m.samitsolutions.com:50070
When using Ambari, I came across the "port is in use 50070" problem. I found it was actually caused by a mismatch of the NameNode's host, not the port. Sometimes Ambari will start the NameNode on HostB and HostC, while your configuration specifies HostA and HostC.
Such a situation can be caused by updating the wrong NameNode configuration when moving the NameNode.
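A quick way to see which host the HTTP address is actually bound to versus where something is listening (the config path assumes an HDP-style layout under /etc/hadoop/conf):
grep -A1 'dfs.namenode.http-address' /etc/hadoop/conf/hdfs-site.xml
netstat -tlnp | grep 50070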

flume cannot connect to HDFS port 9099

I am trying to access the log files on HDFS using Flume. I am connected on port 9099, but I don't know why Flume is trying to connect to 8020. I am getting the following errors:
java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
The NameNode is listening on port 9099, confirmed with netstat -tlpn | grep :9099.
I think the way to fix this is to format the NameNode and set the port to 8020, but I don't want to do that as it will format everything.
Please help
8020 is the default port for running the NameNode.
You can change this in core-site.xml via the property fs.default.name. As you mentioned, the NameNode is running on port 9099, so check whether that port is what is specified there.
Also check the Flume configuration file that specifies the NameNode details. You can simply stop the cluster, change the port number to the default, and restart it. There is no need to format the NameNode for this; I tested this before answering your question.
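For example, a quick way to confirm what the client-side configuration points at (the /etc/hadoop/conf path is an assumption for a typical install; adjust it for yours):
grep -A1 'fs.default' /etc/hadoop/conf/core-site.xml   # shows fs.default.name / fs.defaultFS
netstat -tlpn | grep -E ':(8020|9099)'                 # which NameNode port is actually listening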
Hope it helps!
8020 is the default port; to override it, you can use flume-conf.properties.
Update your config with:
kafkaTier1.sinks.KafkaHadoopSink.hdfs.path = hdfs://NAME_NODE_HOST:PORT/flume/kafkaEvents/%y-%m-%d/%H%M/%S

CDH WebHDFS request redirects to local address on EC2

I am trying to set up an environment where I run some of my backend locally and send requests to an EC2 instance from my local computer. I have CDH 4.5 set up, and it works OK. When I run the following request:
curl --negotiate -i -L -u:hdfs http://ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com:50070/webhdfs/v1/tmp/test.txt?op=OPEN
This works from any EC2 instance in that region but does not work outside it. If I try it locally, it returns the following error:
curl: (6) Could not resolve host: ip-xx-xx-xx-xx.eu-west-1.compute.internal
Where can I configure this so the call is not redirected this way?
Many thanks
The easiest and fastest way to solve this problem is to configure your client hosts file to map the internal address to the external address.
WebHDFS uses the hostname configured in hdfs-site.xml, which is set automatically by the Cloudera agent on that DataNode. I don't know of a way to override the configured hostname for each DataNode in CDH.
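For example, on the local client the mapping might look like this (the addresses are placeholders matching the question; substitute the instance's actual public IP):
# /etc/hosts on the local machine
xx.xx.xx.xx   ip-xx-xx-xx-xx.eu-west-1.compute.internal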
