Could not establish site-to-site communication for apache nifi - amazon-ec2

I am working with two instances of NiFi.
Instance-1: a secure single-node NiFi.
Instance-2: a secure 3-node NiFi cluster on AWS.
My site-to-site settings have the configurations below:
Instance-1:
nifi.remote.input.host=<hostname running locally>
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
Instance-2:
nifi.remote.input.host=<ec2 public fqdn>.compute.amazonaws.com
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
My remote process group is on the locally running NiFi, and I am trying to push a flowfile from the local instance to the AWS cluster. I am getting the error below:
Error while trying to connect RPG
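A common first check for this error is whether the remote socket port is actually reachable from the local instance: the EC2 security group must allow inbound traffic on 10443, and the hostname in nifi.remote.input.host must resolve externally and present a certificate the client trusts. As a rough sketch (the FQDN and socket port are the ones from the configuration above; the HTTPS API port is a placeholder), you can test reachability from the local machine:

openssl s_client -connect <ec2 public fqdn>.compute.amazonaws.com:10443 -servername <ec2 public fqdn>.compute.amazonaws.com
curl -vk https://<ec2 public fqdn>.compute.amazonaws.com:<nifi https port>/nifi-api/site-to-site

If the TLS handshake fails or the site-to-site API endpoint is unreachable, the remote process group cannot connect regardless of the rest of the NiFi configuration.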

Related

Use Nifi to copy/move logs from different Nifi servers into AWS S3

We have a NiFi cluster of 4 servers, and we want to ingest the logs of all the servers into S3. Is there a way in NiFi to ingest the logs of each NiFi server to S3? The logs on each node are stored on its local disk (a separate disk mounted for NiFi logs: /data/logs/nifi).

How to setup a nifi Registry on a Server

I have been able to set up a NiFi Registry locally and connect it to my local NiFi cluster.
But my organization has a NiFi cluster (which is on a different port), and I want to set up a NiFi Registry for it. I have therefore set up the NiFi Registry on a server.
Can anyone help me with the procedure for doing this?

NiFi - connect to another instance (S2S)

I'm trying to use the SiteToSiteProvenance Reporting Task.
The objective is to send provenance data between two dockerized instances of NiFi, one at port 8080 and another at port 9090.
I've created an input port creatively called "IN" on the destination NiFi, and the service configuration on the source NiFi is:
However, I'm getting the following error:
Unable to refresh Remote Group's peers due to Unable to communicate with remote NiFi cluster in order to determine which nodes exist in the remote cluster
I've also exposed port 10000 on the destination Docker container.
As mentioned in the comments, it appears there was a networking issue between the containers.
It was finally resolved when the asker stopped using containers.
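For anyone keeping the containers, the usual pattern is to put both instances on the same Docker network, publish the remote input socket port, and advertise a hostname the source container can resolve. A minimal sketch, assuming the apache/nifi image and the NIFI_* environment overrides it ships with (the port numbers mirror the ones mentioned above):

docker network create nifi-s2s
docker run -d --name nifi-destination --network nifi-s2s \
  -p 9090:8080 -p 10000:10000 \
  -e NIFI_WEB_HTTP_PORT=8080 \
  -e NIFI_REMOTE_INPUT_HOST=nifi-destination \
  -e NIFI_REMOTE_INPUT_SOCKET_PORT=10000 \
  apache/nifi
docker run -d --name nifi-source --network nifi-s2s \
  -p 8080:8080 -e NIFI_WEB_HTTP_PORT=8080 \
  apache/nifi

With this, the reporting task on the source can point at http://nifi-destination:8080/nifi and reach the peers the destination announces, instead of an address that only resolves on the host.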

Access WebHDFS on Hortonworks Hadoop (AWS EC2)

I'm facing an issue with WebHDFS access on my Amazon EC2 machine. I have installed Hortonworks HDP 2.3, by the way.
I can retrieve the file status from my local machine in the browser (Chrome) with the following HTTP request:
http://<serverip>:50070/webhdfs/v1/user/admin/file.csv?op=GETFILESTATUS
This works fine, but if I try to open the file with ?op=OPEN, it redirects me to the private DNS of the machine, which I cannot access:
http://<privatedns>:50075/webhdfs/v1/user/admin/file.csv?op=OPEN&namenoderpcaddress=<privatedns>:8020&offset=0
I also tried to access WebHDFS from the AWS machine itself with this command:
[ec2-user@<ip> conf]$ curl -i http://localhost:50070/webhdfs/v1/user/admin/file.csv?op=GETFILESTATUS
curl: (7) couldn't connect to host
Does anyone know why I cannot connect to localhost, or why the OPEN from my local machine does not work?
Unfortunately, I couldn't find any tutorial on configuring WebHDFS for an Amazon machine.
Thanks in advance
What happens is that the namenode redirects you to the datanode. It seems you installed a single-node cluster, but conceptually the namenode and datanode(s) are distinct, and in your configuration the datanode(s) live/listen on the private side of your EC2 VPC.
You could reconfigure your cluster to host the datanodes on a public IP/DNS (see HDFS Support for Multihomed Networks), but I would not go that way. I think the proper solution is to add a Knox gateway, a specialized component for accessing a private cluster through a public API. Specifically, you will have to configure the datanode URLs; see Chapter 5. Mapping the Internal Nodes to External URLs. The example there seems spot on for your case:
For example, when uploading a file with the WebHDFS service:
1. The external client sends a request to the gateway WebHDFS service.
2. The gateway proxies the request to WebHDFS using the service URL.
3. WebHDFS determines which DataNodes to create the file on and returns the path for the upload as a Location header in an HTTP redirect, which contains the datanode host information.
4. The gateway augments the routing policy based on the datanode hostname in the redirect by mapping it to the externally resolvable hostname.
5. The external client continues to upload the file through the gateway.
6. The gateway proxies the request to the datanode by using the augmented routing policy.
7. The datanode returns the status of the upload, and the gateway again translates the information without exposing any internal cluster details.
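To make that concrete, a client request then stays on the gateway host for both the metadata call and the redirect; a hedged example (the gateway host, the 8443 port, the "default" topology name, and the demo credentials are assumptions, not taken from the question):

curl -ik -u admin:admin-password "https://<knox gateway host>:8443/gateway/default/webhdfs/v1/user/admin/file.csv?op=OPEN"

The datanode hostname in the Location header is rewritten by the gateway, so the client never needs to resolve the private DNS name.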

Hadoop Dedoop Application unable to contact Hadoop Namenode : Getting "Unable to contact Namenode" error

I'm trying to use the Dedoop application that runs using Hadoop and HDFS on Amazon EC2. The Hadoop cluster is set up, and the Namenode, JobTracker, and all other daemons are running without error.
But the Dedoop.war application is not able to connect to the Hadoop Namenode after deploying it on Tomcat.
I have also checked that the ports are open in EC2.
Any help is appreciated.
If you're using Amazon AWS, I highly recommend using Amazon Elastic MapReduce. Amazon takes care of setting up and provisioning the Hadoop cluster for you, including things like setting up IP addresses, the NameNode, etc.
If you're setting up your own cluster on EC2, you have to be careful with public/private IP addresses. Most likely, you are pointing to the external IP addresses - can you replace them with the internal IP addresses and see if that works?
Can you post some lines of the stack trace from Tomcat's log files?
Dedoop must establish a SOCKS proxy server (similar to ssh -D port username@host) to pass connections to the Hadoop nodes on EC2. This is mainly because Hadoop resolves public IPs to EC2-internal IPs, which breaks MR job submission and HDFS management.
To this end, Tomcat must be configured to establish SSH connections. The setup procedure is described here.
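For illustration, such a tunnel and the matching client-side settings could look like the sketch below; the host, user, and local port are placeholders, and the two properties are the standard Hadoop SOCKS socket-factory settings rather than anything Dedoop-specific:

# open a SOCKS proxy on local port 6666 through the EC2 master node
ssh -N -D 6666 ec2-user@<ec2 master public dns>

# then point the Hadoop client at the proxy in core-site.xml:
#   hadoop.rpc.socket.factory.class.default = org.apache.hadoop.net.SocksSocketFactory
#   hadoop.socks.server = localhost:6666

This way, RPC connections are opened from inside the VPC, so the EC2-internal addresses that the namenode hands out remain reachable.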
