I have a standalone secured NiFi 1.12.1 instance in Docker, running fine. I am successfully using Site-To-Site remote processors, Site-To-Site forwarding of NiFi bulletins, calling the NiFi API for self-monitoring, and similar things. I log in with a certificate. So far, all fine.
The problem crops up when I try to use NiFi Registry. I have access to two instances: one secure and one insecure.
No matter what exact format I specify (FQDN, just a name, with /nifi-registry or without), when I try to access either NiFi Registry from NiFi (e.g. by importing a process group), it fails with o.a.n.w.a.config.NiFiCoreExceptionMapper org.apache.nifi.web.NiFiCoreException: Unable to obtain listing of buckets: java.net.ConnectException: Connection refused (Connection refused). Returning Conflict response. The logs contain just this message with an enormous stack trace and nothing more.
I checked all the certificates and they seem OK (the certification path is valid, and the certificate is issued for clientAuth as well as serverAuth). I even use them to log into NiFi myself...
What surprises me the most is that it works for things like Site-To-Site protocols and API calls, but not for NiFi Registry.
Do you know what might be the problem? Or any ideas what to check?
TL;DR:
Use IP addresses or edit /etc/hosts. The problem is in the translation of the hostname to an IP address.
When I attempted to access the NiFi Registry API directly from NiFi through InvokeHTTP, I noticed an important thing: nothing running in a different container responded to me (the connection to the target failed):
# Secure NiFi - the one I am troubleshooting
https://<my FQDN>:8443/nifi-api/flow/registries
# Secure NiFi Registry (another container) - the one I am trying to connect to
https://<my FQDN>:18443/nifi-registry-api/buckets
# Insecure NiFi (another container) - just for testing
http://<my FQDN>:28080/nifi-api/flow/registries
# Insecure NiFi Registry (another container) - just for testing
http://<my FQDN>:38080/nifi-registry-api/buckets
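The same check can be reproduced without building a flow at all. A minimal sketch, assuming curl is available inside the secure NiFi container and using a made-up container name and certificate paths:
# Hypothetical sketch: run the reachability test from inside the secure NiFi container.
# "nifi-secure" and the certificate paths are placeholders - substitute your own.
docker exec -it nifi-secure \
  curl -vk --cert /opt/certs/client.pem --key /opt/certs/client.key \
  https://<my FQDN>:18443/nifi-registry-api/buckets
# "Connection refused" here, but success when the FQDN is replaced by the Docker host's IP,
# points to the hostname resolving to the wrong address inside the container.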
Then it dawned on me: to solve a problem with Site-To-Site connections (a discrepancy between the container name and the HTTPS certificate issued for the hosting machine), I had given the container the same name as the hosting Docker machine. To verify, I used IP addresses instead of FQDNs and it worked. Checking /etc/hosts confirmed this - the FQDN pointed to the IP address of the container instead of the Docker host.
Thus, the same FQDN resolved to localhost inside the container and to the Docker host everywhere else. And since nothing was listening on the NiFi Registry port(s) on localhost ...
So, as a solution, either edit /etc/hosts to remove the offending line, or use IP addresses to force the traffic to go through the Docker host.
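For illustration, a hedged sketch of what that looks like in practice (all IP addresses and hostnames below are made up):
# Inside the container, the shared FQDN resolves to the container itself:
cat /etc/hosts
#   172.18.0.2    nifi.example.com    <- the offending line; remove it or point it at the Docker host (e.g. 192.168.1.10)
# Alternatively, reference the registry by IP so the hosts file never gets consulted:
https://192.168.1.10:18443/nifi-registry-api/buckets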
Related
Despite ElasticSearch and Kibana both running on my production server, I'm unable to visit the GUI over the public IP: http://52.4.153.19:5601/
Localhost curls return 200 but console errors on the browser report timeouts after a few images are retrieved.
I've successfully installed, run, and accessed Kibana on my local (Windows 10) and on my staging AWS EC2 Ubuntu 14.04 environment. I'm able to access both over port 5601 on localhost and the staging environment is accessible over the public IP address and all domains addressed accordingly. The reverse proxy also works and all status indicators are green on the dashboard.
I'm running Kibana 4.5, ElasticSearch 2.3.1, Apache 2.4.12
I've used the same exact volume from the working environment to attach to the production instance, so everything is identical on the two volumes, except that the staging environment's Apache vhost uses a subdomain while the production environment's servername is the base domain. Both are configured for SSL wildcards. Both are in separate availability zones at Amazon. I've tried altering the server block to use a subdomain on the production server, just to see if the domain mattered, but the error remains.
I also tried running one instance individually, in case EC2 had some kind of networking error with 0.0.0.0, but I'm unable to come to a resolution. All logs and configurations are identical between the two servers for ElasticSearch and Kibana.
I've tried deleting and re-creating the kibana index, and tried alternate settings including the host, the elasticsearch URL, extended ping and timeout limits, max retries, raised Apache limits, and http.cors to allow different origins. I've tried other ports, but both servers indicate that 5601 is listening in the same way.
I also had the same problem on a completely different volume that was previously attached to this instance.
The only difference I can see is that the working version pings fine while the non-working version has 100% packet loss when pinging the IP, although I can't imagine why that would be, as I'm able to reach the website on port 80 just fine. I can also access various other tools running on other ports. I assume there might be some kind of networking conflict. Any ideas?
Maybe port 5601 is blocked by a firewall.
Allow incoming connections to port 5601 with:
sudo iptables -I INPUT -p tcp --dport 5601 -j ACCEPT
For security:
Modify the above command to accept connections only from a specific address (see man iptables and the sketch below),
or use the Shield plugin for Elasticsearch.
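For example, a minimal sketch that only lets a single (made-up) client address through and drops everyone else:
# 203.0.113.25 is a placeholder for the client that should be allowed in.
sudo iptables -I INPUT -p tcp -s 203.0.113.25 --dport 5601 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 5601 -j DROP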
Sorry, I forgot to update this question. The answer turned out to be that I simply needed to deploy a new instance. By creating a clone of the instance, I was able to resolve the issue. I've had networking problems at AWS before, with their internal DNS/IP conflicts, and have had to do this in the past; it turned out to be the quickest and cleanest solution, albeit without providing any definitive insight into the cause.
I am using Google Compute Engine to run MapReduce jobs on Hadoop (pretty much all default configs). While running a job I get a tracking URL of the form http://PROJECT_NAME:8088/proxy/application_X_Y/, but it fails to open. Did I forget to configure something?
To elaborate on the option Amal mentioned in the other answer (using the "external IP address" of your Google Compute Engine VM): you can obtain the external IP address by running gcloud compute instances describe --zone <your zone> <your master hostname> and looking for natIP.
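For example (the zone, hostname, and resulting address below are placeholders):
# Hypothetical zone and master name - substitute your own.
gcloud compute instances describe --zone us-central1-a hadoop-m | grep natIP
#     natIP: 203.0.113.42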
To open port 8088, you'll have to set up a firewall rule opening that port, likely on your default Google Compute Engine network. You'll want to specify a your.ip.address.here/32 address in the --source-ranges to restrict incoming traffic to just your local machine dialing into your VM; otherwise anyone in the IP source-ranges would be able to access your Hadoop pages.
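A hedged sketch of such a rule (the rule name, network, and source address are placeholders):
# Open 8088 on the default network, but only for a single client address.
gcloud compute firewall-rules create allow-yarn-rm \
    --network default \
    --allow tcp:8088 \
    --source-ranges your.ip.address.here/32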
If you used bdutil to turn up your cluster, there's an alternative way which is much easier and more secure; simply run
bdutil <your flags used in deployment, like -e hadoop2, --prefix, etc.> socksproxy
to open SSH with dynamic port forwarding to use as a SOCKS5 proxy that your browser can point to. If you're running on Linux or Mac and have Chrome or Firefox installed, bdutil should also print out a copy/paste command for starting a fresh isolated browser pre-configured to use the socks proxy so that you can click through all the useful links.
If bdutil didn't print out a browser command or you didn't use bdutil, you can also run and configure your SSH socks proxy using these instructions. An SSH-based socks proxy is more secure than opening up firewall ports, and also allows the Hadoop page links to work (otherwise you have to keep manually replacing the hostnames with the external IP addresses).
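If you do set the proxy up by hand, it boils down to something like the following (the username and address are placeholders):
# Dynamic port forwarding: local port 1080 becomes a SOCKS5 proxy through the master node.
ssh -D 1080 -N -n username@<master external IP>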
One correction: you are using YARN, so there is no JobTracker. The JobTracker exists in Hadoop 1.x. In YARN, the processing layer became a generic framework, and the JobTracker was replaced by the ResourceManager and the ApplicationMaster. The UI that you mentioned in the question is the ResourceManager UI.
For your problem, try the following tips:
Use the public IP address of the ResourceManager instance instead of PROJECT_NAME.
Check whether port 8088 is open for access from outside (a quick check is sketched below).
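A quick way to run that check from your local machine, assuming netcat is installed and using a made-up address:
# Reports success if something answers on 8088, times out or is refused otherwise.
nc -zv 203.0.113.42 8088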
Another (more secure) way to do this is to use gcloud compute to make an SSH tunnel to your deployment, and then launch Chrome through it.
$ gcloud compute ssh clustername --zone=us-central1-a --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n"
You will need to replace clustername with the name of your deployment, and change the --zone if necessary.
From there, you can launch Chrome through it and then reach the Hadoop job tracking URL.
$ chrome --proxy-server="socks5://localhost:1080" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir=/tmp/clustername
I have installed a single-node Hadoop cluster using Hortonworks/Ambari on an Amazon EC2 host.
Since I don't want this cluster running 24/7, I stop the instance when done. When I reboot the instance later, I get a new IP address, and then Ambari is no longer able to start the Hadoop-related services.
Is there a way other than completely redeploying to reconfigure the cluster so the services will start?
It looks like the IP address lives in various XML files under /etc, in the Postgres database table ambari, and possibly other places I haven't found yet.
I tried updating the XML files and the Postgres database with the new IP address and the internal and external DNS names wherever I could find them, but to no avail. I have not been able to restart the services.
The reason I am doing this is to save the deployment time, the data on HDFS, and other project-specific setup each time I restart the host.
Any suggestions?
Thanks!
An Elastic IP can be used. Also, since you mentioned it is a single-node cluster, you can use localhost or the private IP.
If you use an Elastic IP, your UIs will always be on the same public IP. However, if you use the private IP or localhost and do not associate your instance with an Elastic IP, you will have to look up the public IP every time you start the instance and then connect to the web UI using that IP.
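If you go the Elastic IP route, a rough sketch with the AWS CLI (the instance and allocation IDs are placeholders):
# Allocate an Elastic IP and attach it to the (hypothetical) instance.
aws ec2 allocate-address --domain vpc
aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-0123456789abcdef0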
Thanks for the help; both Harman and TJ are correct. I haven't used an Elastic IP because I might have more than one of these running at a time, and for now at least, I don't mind looking up the public IP address.
Harman's suggestion of using "localhost" as the FQDN when setting up Ambari in the first place is a really good idea in retrospect. Unless I go through the whole setup again, that's water under the bridge for me, but I recommend it to others who might read this post.
In my case, I figured this out on my own before coming back to the page. The specific step I took was insanely simple after all, thanks to Occam's Razor.
I added the following line in /etc/hosts:
<new internal IP> <old internal dns name>
and then ran
ambari-server restart
from the command line. After logging into Ambari, I was then able to restart all the services.
This is probably incredibly simple and I'm just missing one step. The problem I was (originally) trying to solve was how to get a statically allocated hostname, one that would not change with each restart. I've done the following steps:
I have a domain registered on GoDaddy, and it points to my EIP. I use it to connect over SSH (putty) to my EC2 instance, so I know that part is working. I've opened ports 9080, 9060, 9043, and 9443 as well as SSH and FTP ports. And I've installed and started the software that uses those ports, and that stuff normally just works on a local RHEL install, so I think what's different here is the custom domain name.
I've added my EIP and fully qualified host name to my /etc/hosts file.
I've added my fully qualified host name to my /etc/hostname file and modified the /etc/rc.local script to set the hostname properly on a restart, and that works. If I execute the command hostname, it returns my fully qualified hostname, so that looks ok.
I cannot ping my server, but I think that's OK, because Amazon probably blocks pings. So I don't think that's a symptom of anything.
I cannot open a connection to http://myserver.mydomain:9080/, which normally just works. Here it just times out.
If I do a wget http://myserver.mydomain:9080 from inside the EC2 instance, it returns failed: No Route To Host
But if I do a wget against localhost instead of the fully qualified name I get what I expect as a response.
So.... routing tables? Do those need to change? And if so how?
You probably don't want to do what you did. Everything in EC2 is NAT'd, meaning that the IP assigned to your instance is a private/internal IP, and the public IP is mapped to it by the routing system.
So internally, you want everything to resolve to the private IP, or you will get charged for traffic as it has to get routed to the edge and then back in. Using the public DNS name will resolve correctly from the default DNS servers.
If you are using RHEL, you will need to make sure both the security group and the internal firewall (iptables) have the ports opened. You could just disable the internal firewall, since it's a bit redundant with the security groups. On the other hand, it can provide some options security groups do not, if you need them.
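For example, on a RHEL 6-style setup, opening the application port in iptables and persisting it looks roughly like this (port 9080 is taken from the question; adjust as needed):
# Allow the application port through the instance's own firewall and save the rule.
sudo iptables -I INPUT -p tcp --dport 9080 -j ACCEPT
sudo service iptables save
# Or, if the security group already does the filtering, disable the internal firewall entirely:
sudo service iptables stop && sudo chkconfig iptables off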
I'm at a loss to explain this behavior with web servers on Windows. It's in a domain environment with Windows Firewall settings applied by domain policy.
Local web servers - both as localhost:port and FQDN:port:
Tomcat - OK
IIS - OK
WEBrick - OK
Jenkins server - OK
Remote access - using FQDN:port:
Tomcat - no connection
IIS - no connection
WEBrick - OK
Jenkins server - OK
What I don't understand is what WEBrick and the server Jenkins uses do differently to accept remote connections.
Are there other diagnostics I should look into?
Is it possible to configure Tomcat to use a similar approach?
I can't tell much about WEBrick or Jenkins, but for Tomcat - if you look at the Tomcat 7 source (StandardServer.java), you'll see:
// Set up a server socket to wait on
try {
    awaitSocket = new ServerSocket(port, 1,
            InetAddress.getByName(address));
} catch (IOException e) { ... }
This means that whatever you specify in address (in your server.xml) goes through this.
The contract of InetAddress.getByName says:
The host name can either be a machine name, such as "java.sun.com", or
a textual representation of its IP address. If a literal IP address is
supplied, only the validity of the address format is checked.
If I were you, I'd try setting just the IP address first and see if there are any problems.
The second step is to check whether your local name resolution (the hosts file) is incorrect. I've been in situations where the local hosts file was wrong or contained non-resolvable entries, causing all sorts of weird issues like the one you're having.
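Two quick checks on the Windows host, assuming Tomcat's HTTP connector is on port 8080 (the port and FQDN are placeholders; adjust to your setup):
REM Which address is Tomcat bound to? 0.0.0.0 means all interfaces, 127.0.0.1 means local only.
netstat -ano | findstr :8080
REM Does the FQDN resolve to the address you expect, or to a stale hosts-file entry?
nslookup <your FQDN>
type C:\Windows\System32\drivers\etc\hosts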
It sounds like your remote requests are never reaching the services that don't reply, which implies it's a firewall or NAT issue. I don't think it's a configuration issue, since you said that from the local machine both localhost:port and FQDN:port work.
To diagnose, a good first step is to see if there is any communication remotely with telnet.
telnet hostname port
If you don't see a Connected to <FQDN> response, then a firewall (hardware, or the local software firewall) blocked the connection. You will need to make sure the firewalls in the way have all the proper ports open, forwarding rules in place, etc.
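If the Windows firewall turns out to be what's blocking Tomcat and IIS, a hedged example of opening a port for the domain profile (the rule name and port are placeholders):
REM Allow inbound TCP 8080 through Windows Firewall for the domain profile.
netsh advfirewall firewall add rule name="Tomcat HTTP" dir=in action=allow protocol=TCP localport=8080 profile=domain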