I'm installing a three-node non-secure NiFi cluster (1.5) with embedded ZooKeeper.
The installation has completed, and cluster startup and election have finished without any obvious issues.
However, when I hit the GUI I see the following:
javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)
In nifi-app.log there is:
2018-03-07 14:41:37,857 WARN [Replicate Request Thread-7] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET /nifi-api/flow/current-user to test-nifi01:8080 due to javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)
2018-03-07 14:41:37,858 WARN [Replicate Request Thread-7] o.a.n.c.c.h.r.ThreadPoolRequestReplicator
javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)
at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:284)
at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:278)
at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$0(JerseyInvocation.java:753)
....
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
....
In nifi-user.log:
2018-03-07 15:18:49,475 INFO [NiFi Web Server-21] org.apache.nifi.web.filter.RequestLogger Attempting request for (<no user found>) POST http://test-nifi01:8080/nifi-api/access/kerberos (source ip: 172.100.1.11)
2018-03-07 15:18:49,476 INFO [NiFi Web Server-21] o.a.n.w.a.c.IllegalStateExceptionMapper java.lang.IllegalStateException: Access tokens are only issued over HTTPS.. Returning Conflict response.
2018-03-07 15:18:49,572 INFO [NiFi Web Server-17] org.apache.nifi.web.filter.RequestLogger Attempting request for (<no user found>) POST http://test-nifi01:8080/nifi-api/access/oidc/exchange (source ip: 172.100.1.11)
2018-03-07 15:18:49,573 INFO [NiFi Web Server-17] o.a.n.w.a.c.IllegalStateExceptionMapper java.lang.IllegalStateException: User authentication/authorization is only supported when running over HTTPS.. Returning Conflict response.
2018-03-07 15:18:49,672 INFO [NiFi Web Server-23] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://test-nifi01:8080/nifi-api/flow/current-user (source ip: 172.100.1.11)
The results are the same regardless of which host I connect to.
I'm not sure what I need to be checking here.
EDIT: Updated with more detail.
Node 1:
https://pastebin.com/5nMzWfXw
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=test-nifi01
nifi.web.http.port=8080
nifi.web.http.network.interface.default=eth0
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=test-nifi01
nifi.cluster.node.protocol.port=11003
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
Node 2: https://pastebin.com/n5GMy1nA
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=test-nifi02
nifi.web.http.port=8080
nifi.web.http.network.interface.default=eth0
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=test-nifi02
nifi.cluster.node.protocol.port=11003
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
Node 3: https://pastebin.com/EqztLfnD
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=test-nifi03
nifi.web.http.port=8080
nifi.web.http.network.interface.default=eth0
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=test-nifi03
nifi.cluster.node.protocol.port=11003
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
Anyone have any thoughts on what I should be looking at?
EDIT 2:
It looks like each node is refusing to connect to itself. E.g. on node 2 we have:
2018-03-07 20:31:08,099 WARN [Replicate Request Thread-4] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET /nifi-api/flow/current-user to test-nifi02:8080 due to javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)
and a curl also fails:
[taas@test-nifi02 logs]$ curl http://test-nifi02:8080
curl: (7) Failed connect to test-nifi02:8080; Connection refused
Curls to the other nodes are fine. The same behavior occurs on each node, e.g. node1 can't curl to node1 and node3 can't curl to node3.
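A useful check on the failing node is to compare the address Jetty actually bound to against what the node's own hostname resolves to locally; with nifi.web.http.host and nifi.web.http.network.interface.default both set, a mismatch between local name resolution and the eth0 address could produce exactly this one-node-refuses-itself pattern. A sketch (assuming ss, getent and ip are available):
# which address is actually listening on port 8080?
ss -tlnp | grep 8080
# what does this node's own hostname resolve to locally?
getent hosts test-nifi02
# and which address does eth0 carry?
ip addr show eth0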
Make sure you have DNS enabled for your network/subnet. If you don't have the luxury of DNS, try updating the /etc/hosts files with the exact IP addresses and hostnames, e.g.:
192.168.0.1 test-nifi01
192.168.0.2 test-nifi02
on all of the NiFi hosts, then try connecting again. Also make sure that you can connect from one node to another over SSH.
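For example, a quick loop to run from each node to confirm name resolution and SSH reachability (hostnames taken from the question):
for h in test-nifi01 test-nifi02 test-nifi03; do
  getent hosts $h    # does the name resolve, and to which address?
  ssh $h hostname    # can we reach the node over SSH?
done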
In a NiFi cluster, a request comes in to one node, and then that node replicates the request to all other nodes.
The error shown in nifi-app.log indicates that the connection was refused when the current node tried to reach test-nifi01:8080.
I assume you have hosts like test-nifi01, test-nifi02, and test-nifi03, so each of these nodes would need to be able to resolve the other hostnames. You can start by issuing a ping from one node to the other two, and do that on each node.
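To exercise the replication path directly, the endpoint from the log can be curled from every node to every node, including the node itself (a sketch):
for h in test-nifi01 test-nifi02 test-nifi03; do
  curl -s -o /dev/null -w "%{http_code}  $h\n" http://$h:8080/nifi-api/flow/current-user
done
A host that refuses its own hostname here points at a local binding or /etc/hosts problem rather than the network between nodes.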
Related
We have an issue with the ambari-metrics-collector service (HDP cluster version 2.6.4 with 8 nodes).
The Ambari Metrics Collector service can't start, or starts for a few seconds and then fails.
The details about the Metrics Collector version:
rpm -qa | grep metrics
ambari-metrics-grafana-2.6.1.0-143.x86_64
ambari-metrics-monitor-2.6.1.0-143.x86_64
ambari-metrics-collector-2.5.0.3-7.x86_64
ambari-metrics-hadoop-sink-2.6.1.0-143.x86_64
All machines are RHEL 7.2.
We performed the following steps to try to resolve the problem:
1. Restart the metrics-collector service:
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ stop'
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ start'
or
ambari-metrics-collector stop
ambari-metrics-collector start
2. Restart ambari-metrics-monitor on all nodes:
ambari-metrics-monitor stop
ambari-metrics-monitor start
3. Clean the folder /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/:
mv /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0 /tmp/bck/zookeeper/
Then restart the metrics-collector service.
4. Tune the metrics-collector parameters according to https://docs.cloudera.com/HDPDocuments/Ambari-2.2.1.0/bk_ambari_reference_guide/content/_ams_general_guidelines.html
We updated the following parameters in Ambari:
metrics_collector_heap_size=1024
hbase_regionserver_heapsize=1024
hbase_master_heapsize=512
hbase_master_xmn_size=128
Status for now: steps 1-4 didn't help.
From the logs we can see the following:
log file - ambari-metrics-collector.log
2020-06-25 09:06:14,474 WARN org.apache.zookeeper.ClientCnxn: Session 0x172eab71f310002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2020-06-25 09:06:14,575 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=master02.sys671.com:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/meta-region-server
log file - hbase-ams-master-master02.sys671.com.log
2020-06-25 09:38:18,799 WARN [RS:0;master02:51842-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Session 0x172ead5d73a0004 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2020-06-25 09:38:20,437 INFO [main-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Opening socket connection to server master02.sys671.com/23.2.35.171:61181. Will not attempt to authenticate using SASL (unknown error)
2020-06-25 09:38:20,438 WARN [main-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Session 0x172ead5d73a0002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
We also see that the port (timeline.metrics.service.webapp.address) is not listening:
netstat -tulpn | grep 6188
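The embedded ZooKeeper port that appears in the logs (61181 on master02.sys671.com) can be checked the same way (a sketch):
netstat -tulpn | grep 61181                 # embedded ZooKeeper used by AMS HBase
echo ruok | nc master02.sys671.com 61181    # a healthy ZooKeeper should answer "imok"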
Any advice on how to continue from this point?
We'd appreciate any help with this problem.
I'm trying to kerberise my HBase cluster and I'm running into problems with ZooKeeper. When I start HBase I get this error in the Master log:
ERROR [main-SendThread(X.X.X.X:2181)] client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
ERROR [main-SendThread(X.X.X.X:2181)] zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
DEBUG [main-EventThread] zookeeper.ZKWatcher: master:16000-0x16c236187be0000, quorum=Y.Y.Y.Y:2181,X.X.X.X:2181, baseZNode=/hbase Received ZooKeeper Event, type=None, state=AuthFailed, path=null
DEBUG [main] zookeeper.ZooKeeper: Close called on already closed client
On the Zookeeper log, I get :
WARN [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181] quorum.Learner: Unexpected exception, tries=0, connecting to /X.X.X.X:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
I verified my firewall; the ports are open.
For the configuration, I followed the HBase Reference Guide:
http://hbase.apache.org/book.html#zk.sasl.auth
At first I thought it was a problem with my keytab but Hadoop is working fine with it.
I run HBase 2.0.5, Hadoop 3.1.2 and the Zookeeper is the one provided by HBase.
Following @SamsonScharfrichter's comment, I've tried a few things:
I've created and specified in /etc/hosts the FQDNs of my servers and modified my configurations to reflect this change.
Changed the hostnames of my servers to the FQDNs.
Tried to nslookup my hostnames, which didn't work since they are only defined in /etc/hosts.
It didn't change anything; I'm still getting the error. My guess is that Kerberos tries to query DNS on my public NIC and not my private one. I don't know why it struggles so hard to find my servers, since Hadoop has absolutely no problem with it.
EDIT - I set up a private DNS on my network. DNS is working great, but I'm still getting the error. I'm about to give up.
EDIT 2 - I installed tshark on the node with the error. Apparently I get a frame with the message :
Error: KRB5KDC_ERR_C_PRINCIPAL_UNKNOWN
which is weird; I verified my keytab and the principals listed in kadmin. Maybe there are default principals that I don't use?
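One thing worth checking from the shell: the ZooKeeper client builds the server principal as zookeeper/<hostname> from the name it resolved for the quorum member, so if that exact principal is missing from the KDC you get exactly this kind of lookup error. A hedged sketch (the keytab path and EXAMPLE.COM realm are placeholders for your setup):
klist -kt /etc/hbase/conf/hbase.keytab    # which principals does the keytab really contain?
kinit -kt /etc/hbase/conf/hbase.keytab hbase/$(hostname -f)@EXAMPLE.COM    # can the client principal authenticate?
kvno zookeeper/$(hostname -f)@EXAMPLE.COM    # does the KDC know the ZooKeeper service principal?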
I have a Storm cluster with 1 Nimbus, 4 Supervisors and 2 Zookeeper nodes. My Storm.yaml is as following:
storm.zookeeper.servers:
- "storage14"
- "storage15"
nimbus.seeds: ["storage01"]
#storm.local.hostname: "storage05"
supervisor.supervisors:
- "storage02"
- "storage03"
- "storage04"
- "storage05"
storm.local.dir: "/tmp/storm"
worker.childopts: "-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump"
This storm.yaml file is used by both Nimbus and Supervisors. When Nimbus is started I have the storm.local.hostname commented out as is shown above.
However, when starting Supervisors on respective nodes, I uncomment the storm.local.hostname and set it to the hostname of the node on which the supervisor is being launched. For instance if I was launching the supervisor on storage05, the storm.yaml file would have the following additional config param:
storm.local.hostname: "storage05"
The problem is that even though Nimbus launches successfully and I can see it on the Storm UI, some supervisors do not seem to be able to connect to Nimbus. For instance, of the 4 nodes I start supervisors on, the Storm UI often shows only 2 of them connected. However, if I ssh into these nodes and run jps, I can see that the supervisor process is running on ALL of them.
The supervisors that do end up connecting are not always the same ones, so it is definitely not a problem with those specific nodes.
Another thing to note: if I try to execute a topology on whichever nodes did connect, it does not get registered by the cluster and I cannot see the topology on the UI either.
What do you think might be causing this erratic behavior?
UPDATE:
The tail end of nimbus.log has the following lines:
2017-01-25 00:04:25.216 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage14/192.168.140.194:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
Your UPDATE (nimbus log) indicates that your Nimbus cannot connect to the ZooKeeper cluster. Please check that the ZooKeeper cluster (storage14/storage15) is accessible from storage01: not only that the nodes are reachable, but also that you can telnet to the ZooKeeper port ("telnet storage14 2181" and/or "telnet storage15 2181").
Once the ZK connectivity issue is gone, please try starting the supervisors again.
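For example, with nc instead of telnet (a healthy ZooKeeper answers the four-letter command ruok with imok):
for zk in storage14 storage15; do
  echo "== $zk =="
  echo ruok | nc $zk 2181    # "Connection refused" means ZooKeeper isn't listening there
done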
I have 2 nodes in a Cassandra cluster with IP:port aa.aaa.a.aaa:9043 and xx.xxx.x.xxx:9043. When I try to connect using the following config:
PoolingOptions poolingOptions = new PoolingOptions();
poolingOptions.setCoreConnectionsPerHost(HostDistance.LOCAL, 2)
.setMaxConnectionsPerHost(HostDistance.LOCAL, 4)
.setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
.setMaxConnectionsPerHost(HostDistance.REMOTE, 4)
.setMaxRequestsPerConnection(HostDistance.LOCAL, 200)
.setMaxRequestsPerConnection(HostDistance.REMOTE, 200);
cluster = Cluster.builder()
.addContactPointsWithPorts(socketAddressList)
.withPoolingOptions(poolingOptions)
.withRetryPolicy(DefaultRetryPolicy.INSTANCE)
.withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy())).build();
Session session = cluster.connect(cassandraDB);
I am getting the following exception:
16/01/14 09:52:45 INFO core.NettyUtil: Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
16/01/14 09:52:46 WARN core.Cluster: You listed /xx.xxx.x.xxx:9043 in your contact points, but it could not be reached at startup
16/01/14 09:52:47 INFO policies.DCAwareRoundRobinPolicy: Using data-center name 'name' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
com.google.common.util.concurrent.Futures$CombinedFuture setExceptionAndMaybeLog
SEVERE: input future failed.
com.datastax.driver.core.TransportException: [/xx.xxx.x.xxx:9042] Cannot connect
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:156)
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:139)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /xx.xxx.x.xxx:9042
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
... 6 more
16/01/14 09:52:47 ERROR core.Session: Error creating pool to /xx.xxx.x.xxx:9042
com.datastax.driver.core.TransportException: [/xx.xxx.x.xxx:9042] Cannot connect
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:156)
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:139)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /xx.xxx.x.xxx:9042
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
My questions are:
Why is it trying to connect on port 9042 when I am not using that port anywhere in the code or in the config file?
Cassandra version: Cassandra 2.2.1
Why is it trying to connect on port 9042 when I am not using that port anywhere in the code or in the config file?
9042 is the default port for the CQL binary protocol.
Can you show us the content of the variable socketAddressList that you passed to the cluster builder?
Is there any reason you're using port 9043 instead of the default 9042?
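One way to narrow this down from the shell is to check which native transport port each node actually serves CQL on (a sketch; the cassandra.yaml path varies by install):
grep native_transport_port /etc/cassandra/cassandra.yaml    # configured CQL port on the node
nc -vz xx.xxx.x.xxx 9042    # is anything listening on the default port?
nc -vz xx.xxx.x.xxx 9043    # ...or on the port given in the contact points?
Also note that, as far as I know, the Java driver discovers the remaining nodes via system.peers and pools to them on the cluster-level port (default 9042, settable with Cluster.builder().withPort(...)), not on the per-contact-point ports, which would explain the pool errors against 9042.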
I am getting the following errors in the logs of the slave RegionServer.
The problem seems to be at
regionserver.HRegionServer: reportForDuty to
master=localhost,60000,1397430611631 with port=60020
The master is set to localhost but should actually point to the master node.
I am unable to figure out how the slave determines who the master is, even after going through the docs.
The complete log is:
2014-04-14 04:49:35,939 INFO [regionserver60020] regionserver.HRegionServer: CompactionChecker runs every 10sec
2014-04-14 04:49:35,950 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=localhost,60000,1397430611631 with port=60020, startcode=1397431174733
2014-04-14 04:49:36,083 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up
com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1671)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:5402)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2013)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:846)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:578)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:866)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1536)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1435)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
... 5 more
Check your /etc/hosts file. If there is something like:
127.0.0.1 localhost yourhost
change it to
127.0.0.1 localhost
192.168.1.1 yourhost
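For background, the RegionServer does not get the master address from its own config: the active master registers its ServerName in ZooKeeper and the RegionServers read it from there, so a master whose hostname resolves to 127.0.0.1 registers itself as localhost. Two checks (the znode path assumes the default zookeeper.znode.parent):
getent hosts $(hostname -f)      # run on the master: should not come back as 127.0.0.1
hbase zkcli get /hbase/master    # shows the ServerName the master actually registered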