InvokeHTTP POST to public API fails - apache-nifi

I am trying to ingest data from a public API using the InvokeHTTP processor on a secured NiFi cluster. I have imported the API's certificate into the NiFi server's cacerts, but I am getting the error:
Routing to failure due to exception: Failed to connect to sandbox-dxl.auth.eu-west1.amazon.com/52.51.xxx.xxx:443.
I am able to ingest data from the same API when I run the pipeline from a local instance of NiFi on my laptop.
When I ping the API URL from the NiFi server, it fails with:
Name or service not found
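Since ping also fails with "Name or service not found", this looks like a DNS resolution problem on the cluster nodes rather than a certificate problem. A minimal sketch, assuming only that the hostname from the error message above is the one being called, to check what the JVM on the NiFi node can actually resolve:

import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsCheck {
    public static void main(String[] args) {
        // Hostname taken from the InvokeHTTP error message above
        String host = "sandbox-dxl.auth.eu-west1.amazon.com";
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                System.out.println(host + " resolves to " + addr.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Matches the "Name or service not found" symptom: the node cannot
            // resolve the API hostname (check DNS/resolv.conf or proxy settings)
            System.out.println("Cannot resolve " + host + ": " + e.getMessage());
        }
    }
}

If this fails on the cluster nodes but works on the laptop, the fix is on the network/DNS side, not in NiFi.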

Related

Not able to access Rest Endpoint by machine name over vpn

I am facing an issue with a REST endpoint. When I try to access the URL with the machine name instead of localhost, I get an Access Denied error. This error only occurs over a VPN connection; without the VPN it works fine.
I do not have the same issue in the browser; the browser can resolve the URL with the machine name. The issue only occurs when I try to consume an endpoint running in a different microservice on the same machine through Java code or Postman.
For example, consuming an endpoint in Java:
restTemplate.getForEntity("http://localhost:8761/actuator/beans", Object.class).getBody();
// Working fine
restTemplate.getForEntity("http://my_machine_name:8761/actuator/beans", Object.class).getBody();
// Access denied
or through Postman
http://my_machine_name:8761/actuator/beans
Error: connect EACCES 192.xxx.x.x:8761
Mainly I am using the Discovery Client to identify the machine name and port so that I do not need to hard-code localhost in the URL. I am using FeignClient for load balancing, but it looks like RestTemplate gives the same error.
I have fixed the above error. When you connect your machine to a VPN, it changes your network, so you need to find which IP address your machine is actually using. Try ipconfig in the command prompt to find the IP address on Windows.
If you use your machine name instead of that IP address, the machine will not be found, because the machine name is not resolvable on your network (due to the VPN connection).
machine.ip.address=XX.66.223.XXX
eureka.client.service-url.default-zone=http://${machine.ip.address}:8761/eureka
eureka.instance.hostname=${machine.ip.address}
Provide your network IP address in the URL instead of the machine name to make it work.
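If you do not want to read the address off ipconfig by hand, a rough sketch along these lines (purely illustrative, not part of the original answer) lists the IPv4 addresses of all active interfaces so you can spot the one the VPN adapter is using:

import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;

public class ListLocalAddresses {
    public static void main(String[] args) throws Exception {
        // Print the IPv4 addresses of every interface that is up; on a VPN
        // connection the tunnel adapter's address is usually the reachable one.
        for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            if (!nif.isUp() || nif.isLoopback()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                if (addr.getAddress().length == 4) { // IPv4 only
                    System.out.println(nif.getDisplayName() + " -> " + addr.getHostAddress());
                }
            }
        }
    }
}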

AWS API Gateway fails while invoking backend API using private ip address

I have created a public API using AWS API Gateway with a cars resource and a GET method. I also have a backend API, /api/routing, hosted on an EC2 Windows instance. The backend API only accepts POST requests and is used for routing requests based on some header values.
In the integration request I also have a mapping template set up so it can POST data to /api/routing.
So the integration request for the cars public API looks like below.
The inbound rules for the EC2 instance:
Issue
The endpoint URL uses the private IP of the EC2 instance. When I test the cars API I get the error:
Execution failed due to configuration error: Invalid endpoint address
If I change the endpoint URL to use the public IP address, it works as expected.
Eventually, I would like to access the backend API using the private IP. The EC2 instance is a free-tier instance that AWS created.
I understand that if I have a VPC then in API Gateway I need to set up a VPC Link, but I have not created any VPC (unless AWS creates one by default).
Found it. After creating the VPC Link I was still selecting the Integration Type as HTTP; it should be VPC Link.
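For reference, a hedged sketch of the same fix made through the AWS SDK for Java v2 rather than the console: the console's "VPC Link" integration type corresponds to an HTTP integration whose connection type is VPC_LINK, pointing at the load balancer behind the VPC Link. All IDs and the URI below are placeholders, not values from the question:

import software.amazon.awssdk.services.apigateway.ApiGatewayClient;
import software.amazon.awssdk.services.apigateway.model.ConnectionType;
import software.amazon.awssdk.services.apigateway.model.IntegrationType;
import software.amazon.awssdk.services.apigateway.model.PutIntegrationRequest;

public class VpcLinkIntegration {
    public static void main(String[] args) {
        try (ApiGatewayClient apiGateway = ApiGatewayClient.create()) {
            apiGateway.putIntegration(PutIntegrationRequest.builder()
                    .restApiId("abc123")                      // placeholder REST API id
                    .resourceId("res456")                     // placeholder id of the /cars resource
                    .httpMethod("GET")                        // method exposed by API Gateway
                    .integrationHttpMethod("POST")            // backend /api/routing only accepts POST
                    .type(IntegrationType.HTTP)               // HTTP integration...
                    .connectionType(ConnectionType.VPC_LINK)  // ...routed through the VPC Link
                    .connectionId("vpclink-id")               // placeholder VPC Link id
                    .uri("http://internal-nlb.example.com/api/routing") // placeholder NLB DNS name
                    .build());
        }
    }
}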

Logstash: HTTPS Connection to WebHDFS

I am facing issues with WebHDFS.
My organization uses WebHDFS on port 50470, which is both "kerberized" and requires HTTPS.
After following the thread in https://github.com/elastic/logstash/issues/8791 and overcoming the Kerberos issue, I am still facing issues using Kerberos authentication over HTTPS for WebHDFS.
I am getting the logs below:
[2018-12-10T23:08:27,237][ERROR][logstash.outputs.webhdfs ] Webhdfs check request failed. (namenode: :50470, Exception: Failed to connect to host :50470, wrong status line: "\x15\x03\x03\x00\x02\x02")
Googling "\x15\x03\x03\x00\x02\x02", it appears that Logstash is trying to communicate via HTTP instead of HTTPS. However, I do not see any settings that allow for communication over HTTPS (I am not talking about use_ssl_authentication, as I do not need to authenticate my client).
I know that WebHDFS is working fine as curl works (after doing kinit):
curl --negotiate -u : -s -k "https://[hostname]:50470/webhdfs/v1/?op=LISTSTATUS"
May I know if there is a way to communicate via HTTPS for WebHDFS?
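No Logstash-side answer is shown here, but the diagnosis above can at least be confirmed: a TLS endpoint that receives plain-text HTTP typically replies with a TLS alert record, and 0x15 0x03 0x03 are exactly the first bytes in the logged "wrong status line". A small sketch, with the namenode host as a placeholder:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class PlainHttpProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder namenode host; 50470 is the HTTPS WebHDFS port from the question
        try (Socket socket = new Socket("namenode.example.com", 50470)) {
            OutputStream out = socket.getOutputStream();
            out.write("GET / HTTP/1.1\r\nHost: namenode\r\n\r\n".getBytes("US-ASCII"));
            out.flush();
            InputStream in = socket.getInputStream();
            // 0x15 is the TLS "alert" record type: the server expected a TLS
            // handshake, not plain HTTP -- the same bytes Logstash logged.
            System.out.printf("First byte of reply: 0x%02x%n", in.read());
        }
    }
}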

Could not establish site-to-site communication for apache nifi

I am working with two instances of NiFi.
Instance-1: a secure single-node NiFi instance.
Instance-2: a secure 3-node NiFi cluster on AWS.
My site-to-site settings have the configurations below:
Instance-1:
nifi.remote.input.host=<hostname running locally>
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
Instance-2:
nifi.remote.input.host=<ec2 public fqdn>.compute.amazonaws.com
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
My remote process group is in the locally running NiFi, and I am trying to push a flowfile from local to the AWS cluster. I am facing the error below:
Error while trying to connect to the RPG
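No answer is included above, but before touching the site-to-site properties it is worth confirming that the local node can even reach port 10443 on the cluster and complete a TLS handshake, since an AWS security group that does not allow 10443 is a common cause of this kind of RPG connection error. A rough sketch (the hostname is the placeholder from the configuration above, and the JVM would need to be started with the NiFi truststore via -Djavax.net.ssl.trustStore):

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class SiteToSitePortCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder: the value of nifi.remote.input.host on the cluster nodes
        String host = "<ec2 public fqdn>.compute.amazonaws.com";
        int port = 10443; // nifi.remote.input.socket.port
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            // Times out or throws if the security group/firewall blocks the port;
            // fails during the handshake if the certificates are not trusted.
            socket.startHandshake();
            System.out.println("TLS handshake with " + host + ":" + port + " succeeded");
        }
    }
}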

Access WebHDFS on Hortonworks Hadoop (AWS EC2)

I'm facing an issue with WebHDFS access on my Amazon EC2 machine. I have installed Hortonworks HDP 2.3, by the way.
I can retrieve the file status from my local machine in the browser (Chrome) with the following HTTP request:
http://<serverip>:50070/webhdfs/v1/user/admin/file.csv?op=GETFILESTATUS
This works fine but if I try to open the file with ?op=OPEN, then it redirects me to the private DNS of the machine, which I cannot access:
http://<privatedns>:50075/webhdfs/v1/user/admin/file.csv?op=OPEN&namenoderpcaddress=<privatedns>:8020&offset=0
I also tried to access WebHDFS from the AWS machine itself with this command:
[ec2-user@<ip> conf]$ curl -i http://localhost:50070/webhdfs/v1/user/admin/file.csv?op=GETFILESTATUS
curl: (7) couldn't connect to host
Does anyone know why I cannot connect to localhost or why the OPEN on my local machine does not work?
Unfortunately I couldn't find any tutorial on configuring WebHDFS for an Amazon machine.
Thanks in advance.
What happens is that the namenode redirects you to the datanode. It seems you installed a single-node cluster, but conceptually the namenode and datanode(s) are distinct, and in your configuration the datanode(s) live/listen on the private side of your EC2 VPC.
You could reconfigure your cluster to host the datanodes on the public IP/DNS (see HDFS Support for Multihomed Networks), but I would not go that way. I think the proper solution is to add a Knox gateway, which is a specialized component for accessing a private cluster through a public API. Specifically, you will have to configure the datanode URLs; see Chapter 5, Mapping the Internal Nodes to External URLs. The example there seems spot on for your case:
For example, when uploading a file with the WebHDFS service:
1. The external client sends a request to the gateway WebHDFS service.
2. The gateway proxies the request to WebHDFS using the service URL.
3. WebHDFS determines which DataNodes to create the file on and returns the path for the upload as a Location header in an HTTP redirect, which contains the datanode host information.
4. The gateway augments the routing policy based on the datanode hostname in the redirect by mapping it to the externally resolvable hostname.
5. The external client continues to upload the file through the gateway.
6. The gateway proxies the request to the datanode by using the augmented routing policy.
7. The datanode returns the status of the upload and the gateway again translates the information without exposing any internal cluster details.
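To see the redirect behaviour described above, here is a minimal sketch (host and path are the placeholders from the question) that issues the OPEN request without following redirects and prints the Location header; on this setup it will contain the datanode's private DNS name, which is exactly the hostname a Knox gateway would rewrite:

import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsRedirectCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder namenode address and file path from the question
        URL url = new URL("http://<serverip>:50070/webhdfs/v1/user/admin/file.csv?op=OPEN");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setInstanceFollowRedirects(false); // stop at the namenode's redirect
        // The namenode answers with a 307 redirect whose Location header points
        // at a datanode on port 50075 -- here the private DNS name, which an
        // external client cannot resolve.
        System.out.println("Status:   " + conn.getResponseCode());
        System.out.println("Location: " + conn.getHeaderField("Location"));
        conn.disconnect();
    }
}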
