Nifi 1.5.0 Cluster configuration - apache-nifi

Does anyone know how to cluster NiFi 1.5.0? I want to use dataflow.mydomain.com, but when I try to hit the load balancer I get this error:
"The request contained an invalid host header [dataflow.mydomain.com] in the request [/nifi/]. Check for request manipulation or third-party intercept."
According to one post that I read, the problem was that the value of nifi.web.http.host had to match the value of the url.
If that's true, I don't understand how a cluster would be possible.
Thanks!
(I'm using a 3-host setup in AWS. Each host responds individually if I set nifi.web.http.host to its private IP and access it at http://[ip]/nifi/, but not if I put a load balancer in front of the cluster.)

It is not really an issue of clustering NiFi; it is an issue of accessing it through a load balancer. A cluster does not imply a load balancer.
In the next version of NiFi there will be a new property (nifi.web.proxy.host) where you could put dataflow.mydomain.com and it would let it through.
For now I think you'd have to strip off the host header of each request at your load balancer so that it doesn't get passed on to the NiFi nodes, since that is what is triggering the rejection. NiFi inspects the headers of the incoming request and sees that the host header has a value that is not the host of NiFi.

Related

Elasticsearch Python - No viable nodes were discovered on the initial sniff attempt

I have a cluster of Elasticsearch nodes running on different AWS EC2 instances. They internally connect via a network within AWS, so their network and discovery addresses are set up within this internal network. I want to use the python elasticsearch library to connect to these nodes from the outside. The EC2 instances have static public IP addresses attached, and the elastic instances allow https connections from anywhere. The connection works fine, i.e. I can connect to the instances via browser and via the Python elasticsearch library. However, I now want to set up sniffing, so I set up my Python code as follows:
self.es = Elasticsearch(
    [f'https://{elastic_host}:{elastic_port}' for elastic_host in elastic_hosts],
    sniff_on_start=True, sniff_on_connection_fail=True, sniffer_timeout=60,
    sniff_timeout=10, ca_certs=ca_location, verify_certs=True,
    http_auth=(elastic_user, elastic_password))
If I remove the sniffing parameters, I can connect to the instances just fine. However, with sniffing, I immediately get elastic_transport.SniffingError: No viable nodes were discovered on the initial sniff attempt upon startup.
http.publish_host in the elasticsearch.yml configuration is set to the public IP address of my EC2 machines, and the /_nodes/_all/http endpoint returns the public IPs as the publish_address (i.e. x.x.x.x:9200).
After testing with our other microservices, we localized this problem to the elasticsearch-py library rather than our Elasticsearch configuration: our other microservice, which is Go-based, could perform sniffing with no problem.
After further investigation we linked the problem to this open issue on the elasticsearch-py library: https://github.com/elastic/elasticsearch-py/issues/2005.
The problem is that the authorization headers are not properly passed to the request made to Elasticsearch to discover the nodes. To my knowledge, there is currently no fix that does not involve altering the library itself. However, the error message is clearly misleading.
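As a rough illustration of what sniffing needs to do, the sketch below queries the /_nodes/_all/http endpoint mentioned above with an explicit Basic Authorization header and turns each publish_address into a node URL. It uses only the Python standard library, not elasticsearch-py. The function names and the seed_url, user, password and ca_file parameters are hypothetical, and the handling of a "hostname/ip:port" style publish_address is an assumption about how some Elasticsearch versions report the address.

```python
import base64
import json
import ssl
import urllib.request


def parse_publish_addresses(nodes_response, scheme="https"):
    """Extract node URLs from a parsed /_nodes/_all/http response body."""
    urls = []
    for node in nodes_response.get("nodes", {}).values():
        address = node.get("http", {}).get("publish_address")
        if not address:
            continue  # node without an HTTP publish address
        if "/" in address:
            # Assumed "hostname/ip:port" form: keep the hostname and the port
            hostname, ip_and_port = address.split("/", 1)
            port = ip_and_port.rsplit(":", 1)[1]
            address = f"{hostname}:{port}"
        urls.append(f"{scheme}://{address}")
    return urls


def discover_nodes(seed_url, user, password, ca_file=None, timeout=10):
    """Call the nodes endpoint on one seed node, sending credentials explicitly."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        f"{seed_url}/_nodes/_all/http",
        headers={"Authorization": f"Basic {token}"},
    )
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return parse_publish_addresses(json.load(response))
```

The list returned by discover_nodes could then be passed as the host list when constructing the client without the sniffing options. This is a manual substitute for sniffing under the stated assumptions, not a fix for the library issue itself.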

Can one NiFi node have multiple host names?

Problem:
Not able to allow multiple host names for one single NiFi node.
Description:
I have an internal NiFi server with internal computer name 'nifi-1'. nifi.properties has the following:
nifi.web.https.host=0.0.0.0
nifi.web.https.port=9443
This works fine when I hit "https://nifi-1:9443/nifi/" internally.
I have another dns name - "nifi-1.company.com" (both names must be supported) that is routed to the same nifi node. The nifi node rejects with the following error messages when I hit "https://nifi-1.company.com:9443/nifi/":
System Error
The request contained an invalid host header [nifi-1.company.com:9443] in the request [/nifi]. Check for request manipulation or third-party intercept.
Valid host headers are [empty] or:
127.0.0.1
127.0.0.1:9443
localhost
localhost:9443
[::1]
[::1]:9443
nifi-1
nifi-1:9443
10.0.1.82
10.0.1.82:9443
0.0.0.0
0.0.0.0:9443
Question:
How to resolve this problem? Any solutions? (Thanks!)
Another way to phrase the question is how I may add more host names into the list of "valid host headers" as the above.
This issue was raised against NiFi 1.5 in NIFI-4761. To resolve it, whitelist the hostname used to access NiFi with the following property in the nifi.properties configuration file:
nifi.web.proxy.host = host:port
It's a comma-separated list of allowed HTTP Host header values to consider when NiFi is running securely and will be receiving requests to a different host[:port], for example when running in a Docker container or behind a proxy (e.g. localhost:18443, proxyhost:443). By default, this value is blank, meaning NiFi should allow only requests sent to the host[:port] that NiFi is bound to.
original answer source: how to use nifi.web.proxy.host and nifi.web.proxy.context.path?
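To make the effect of the whitelist concrete, here is a minimal sketch of a Host header check of the kind described above. It is an illustration only and not NiFi's actual implementation: the function names are hypothetical, and the list of bound hosts is copied from the "valid host headers" in the error message.

```python
def valid_host_headers(bound_hosts, port, proxy_hosts=""):
    """Build the set of accepted Host header values (illustrative only)."""
    allowed = set()
    for host in bound_hosts:
        # Each bound host is accepted with and without the port
        allowed.add(host.lower())
        allowed.add(f"{host.lower()}:{port}")
    # Entries in the style of nifi.web.proxy.host: comma-separated host or host:port
    for entry in proxy_hosts.split(","):
        entry = entry.strip().lower()
        if entry:
            allowed.add(entry)
    return allowed


def is_host_allowed(host_header, allowed):
    # An empty Host header is accepted, matching "[empty]" in the error text
    return host_header == "" or host_header.strip().lower() in allowed


bound = ["127.0.0.1", "localhost", "nifi-1", "10.0.1.82"]
before = valid_host_headers(bound, 9443)
after = valid_host_headers(bound, 9443, "nifi-1.company.com:9443")

print(is_host_allowed("nifi-1.company.com:9443", before))  # False
print(is_host_allowed("nifi-1.company.com:9443", after))   # True
```

In this sketch the second DNS name is rejected until it appears in the proxy host list, which mirrors the behavior reported in the question.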

Can I access the NiFi REST API using localhost instead of the actual node IP address in a NiFi cluster?

For example, I have 3 NiFi nodes in a NiFi cluster. Example addresses of these nodes:
192.168.12.50:8080(primary)
192.168.54.60:8080
192.168.95.70:8080
I know that I can access the NiFi REST API from all NiFi nodes. I have a GetHTTP processor that gets the cluster summary from the REST API, and this processor runs only on the primary node. I set the "URL" property of this processor to 192.168.12.50:8080/nifi-api/controller/cluster.
But if the primary node goes down, a new primary node will be elected. The new primary node will then be unable to reach 192.168.12.50:8080, because that node is down, so I will not be able to get the cluster summary from the REST API.
In this case, can I use "localhost:8080/nifi-api/controller/cluster" instead of "192.168.12.50:8080/nifi-api/controller/cluster" for each node in the NiFi cluster?
It depends on a few things. If you are running securely, then you have certificates generated for each node that are specific to its hostname, and the host in the web requests needs to match the host in the certificates, so you can't use localhost in that case.
It also depends on how NiFi's web server is configured. If nifi.web.http.host or nifi.web.https.host has a specific hostname specified, then the web server is only bound to that hostname and may not accept connections with a different hostname. In a default unsecured setup, if you leave nifi.web.http.host blank then it binds to all interfaces.
You may be able to use the expression language function to obtain the hostname of the current node. So you could make the url something like "http://${hostname()}/nifi-api/controller/cluster".
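The same idea can be sketched outside NiFi: resolve the local hostname at runtime instead of hard-coding one node's address. Note that the example URL above carries no port, so it would target port 80; the nodes in the question listen on 8080, so the port probably needs to be appended. The function name below is hypothetical, and socket.gethostname() only stands in for the expression language function.

```python
import socket


def local_cluster_summary_url(port=8080, scheme="http"):
    """Build the cluster summary URL for the node this code runs on."""
    # socket.gethostname() plays the role of ${hostname()} in this sketch
    return f"{scheme}://{socket.gethostname()}:{port}/nifi-api/controller/cluster"


print(local_cluster_summary_url())
```

Because the hostname is looked up on whichever node runs the code, the URL follows the primary node when a new one is elected.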

Nifi - Remote Process Group - PeerSelector

I have built a simple Process Group. It generates a FlowFile with some random content and sends it to a NiFi Remote Process Group.
This Remote Process Group is configured to send the FlowFile to localhost, or in this case to my own hostname (I have tried localhost as well).
After this, the FlowFile should appear at the "From MiNiFi" input port and be sent to LogAttribute. Nothing special.
I configured it to use RAW, but it doesn't work with HTTP either.
I am using the apache/nifi Docker image and didn't change anything in nifi.properties or authorizers.xml, but of course I provide both:
nifi.properties
authorizers.xml
The error that occurs is this:
WARNING org.apache.nifi.remote.client.PeerSelector#40081613 Unable to refresh Remote Group´s peers due to Unable to communicate with remote Nifi cluster in order to determine which nodes exist in the remote cluster
I hope you can help me. I have wasted too much time on this problem XD
In nifi.properties you have nifi.web.http.host=f4f40c87b65f, which means NiFi is listening for requests on the hostname f4f40c87b65f, so the URL of your RPG must be http://f4f40c87b65f:8080/nifi
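The derivation above can be sketched as a small helper that reads the relevant properties and assembles the RPG URL. The function name is hypothetical, and the fallbacks to localhost and 8080 when a property is blank are assumptions made for this sketch.

```python
def rpg_url_from_properties(properties_text):
    """Derive the Remote Process Group URL from nifi.properties content."""
    props = {}
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    # Assumed fallbacks for blank values; adjust to the actual setup
    host = props.get("nifi.web.http.host") or "localhost"
    port = props.get("nifi.web.http.port") or "8080"
    return f"http://{host}:{port}/nifi"


sample = """
# web properties
nifi.web.http.host=f4f40c87b65f
nifi.web.http.port=8080
"""
print(rpg_url_from_properties(sample))  # http://f4f40c87b65f:8080/nifi
```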

How to connect to Elasticsearch server remotely using load balancer

There might already be a post covering what I am looking for. I have very limited time and got this requirement at the last moment. I need to push the code to QA and set up Elasticsearch with the admin team. Please respond as soon as possible or share a link to a similar post!
I have a scenario in which I will have multiple Elasticsearch servers: one hosted in the USA, another in the UK, and one more in India, all within the same (company) network and sharing the same cluster name. I can set multicast to false and use unicast to provide host and IP address information to form a topology.
Now in my application I know that I have to use the Transport Client as follows:
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("host1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("host2", 9300));
Following are my concerns:
1) As per the above information, the admin team will provide just a single IP address, that of the load balancer, and the load balancer will manage the request and response handling. I mean the load balancer is responsible for redirecting to the respective Elasticsearch server. My question is: is it okay to use the Transport Client to connect to that host and port as follows?
new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("loadbalancer-ip-address", loadbalancerPortNumber));
If the load balancer redirects the request to an Elasticsearch server, what configuration does the load balancer need? Do we need to provide all the Elasticsearch host or IP address details to it, so that if the master Elasticsearch server fails at any point it will pick another master?
2) What is the best configuration (shards, replicas, etc.) for 4 nodes or Elasticsearch servers?
Would each node have one primary shard and 1 replica, configured in elasticsearch.yml?
Please reply as soon as possible.
Thanks in advance.
