Hadoop timing out trying to write to Cassandra in AWS multi-region configuration - hadoop

I am running a multi-DC Cassandra (open-source, not DSE) cluster in AWS, where one DC (us-west-2) is set up for analytics and the other (us-east) is the transactional store. I'm using NetworkTopologyStrategy with the EC2 snitch, and a consistency level of LOCAL_ONE in my Hadoop config. Hadoop can read from Cassandra without issue, but attempting to write produces a timeout exception.
Running nodetool status shows the DCs are properly configured:
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN x.x.x.x 1.01 GB 9.9% 9e7f4393-7ac9-4559-b3ff-de48be50016f -9127921345534057723 2a
UN x.x.x.x 1001.16 MB 11.4% d0760383-c3dd-474c-9261-239b71dba3f1 -9221279003374097975 2b
UN x.x.x.x 1.05 GB 11.7% 3f09fbf5-0d85-4283-9009-0ec0e29223c0 -9140104347498952504 2c
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN x.x.x.x 1.1 GB 11.3% 5bbd2de4-e1d2-4a17-9f40-034f60b35954 -9061054426204373981 1b
UN x.x.x.x 1.15 GB 11.5% e34c590e-6176-45b2-a8f9-18b4a9a80032 -9216519687724118609 1c
UN x.x.x.x 1.18 GB 10.9% fa0b0a1a-f156-40fc-a267-970d1eb9cddb -9207673937991303291 1a
UN x.x.x.x 1.46 GB 10.7% b18ae406-c9ec-42b7-a365-b0c6e2fe582f -9206671929961171506 1a
UN x.x.x.x 1.13 GB 11.4% 1ac9c1c5-55ad-4048-b1ba-3b9768933ecc -9146100851344467112 1c
UN x.x.x.x 1.53 GB 11.2% dad665bb-68d9-4811-b421-f33333261867 -9178920986366339267 1b
Stack trace using ColumnFamilyOutputFormat:
java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter$RangeClient.run(ColumnFamilyRecordWriter.java:224)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:123)
at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter$RangeClient.run(ColumnFamilyRecordWriter.java:215)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 4 more
... and using CqlOutputFormat:
java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:271)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:123)
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:262)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 4 more
Both traces ultimately point to AbstractColumnFamilyOutputFormat.createAuthenticatedClient(host, port, conf).
I then opened that source and added some detail to the exception so it would output the host name it's connecting to, which resulted in this trace:
java.io.IOException: java.lang.Exception: Unable to connect to host [hostname]
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:271)
Caused by: java.lang.Exception: Unable to connect to host [hostname]
at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:139)
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:262)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:124)
... 1 more
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 4 more
The problem is [hostname] is a machine that's not in the analytics cluster (it's in us-east). Why doesn't it know this automagically, especially when reads work properly? It seems like it's trying all the nodes in the ring regardless of DC.
For the record, writes fail using CqlOutputFormat, ColumnFamilyOutputFormat, and through Pig using CqlStorage and CassandraStorage.

I'd say, try to set the write_request_timeout_in_ms in cassandra.yaml to some very high number and see if that helps. There can be an issue with the node itself, when it is not responding while still appearing as being up. If it still times out, restart service on that node that you suspect is causing the issue.

This issue came down to two things:
For multi-region EC2 setups, Cassandra requires setting broadcast_address to the public IP and the listen_address to the internal IP. In most cases you'll want rpc_address to be the internal IP, but this potentially breaks Cassandra's Hadoop client, which is determining endpoints to talk to based on broadcast_address.
Cassandra's Hadoop client (RingCache specifically) doesn't respect data center on node discovery, and tries to discover all nodes in the ring--including non-local ones. It respects the consistency level on the actual write, but in our case it never got there due to #1.
I filed a ticket and submitted a patch to address these issues:
https://issues.apache.org/jira/browse/CASSANDRA-7252

Related

Cannot connect to remote Neo4j server using neo4j-shell

I have Neo4j 2.3.2 installed on a server whose firewall has ports 1337, 7474, 7473 open (verified using ncat). I can access the web-based console remotely and can access the shell locally, but I cannot access the shell remotely. Namely, when running path/to/neo4j-shell -v -host destination.example.org I get
ERROR (-v for expanded information):
Connection refused
java.rmi.ConnectException: Connection refused to host: <server ip addres>; nested exception is:
java.net.ConnectException: Verbinding is geweigerd
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:129)
at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:227)
at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:179)
at com.sun.proxy.$Proxy1.welcome(Unknown Source)
at org.neo4j.shell.impl.AbstractClient.sayHi(AbstractClient.java:257)
at org.neo4j.shell.impl.RemoteClient.findRemoteServer(RemoteClient.java:70)
at org.neo4j.shell.impl.RemoteClient.<init>(RemoteClient.java:62)
at org.neo4j.shell.impl.RemoteClient.<init>(RemoteClient.java:45)
at org.neo4j.shell.ShellLobby.newClient(ShellLobby.java:204)
at org.neo4j.shell.StartClient.startRemote(StartClient.java:355)
at org.neo4j.shell.StartClient.start(StartClient.java:226)
at org.neo4j.shell.StartClient.main(StartClient.java:145)
Caused by: java.net.ConnectException: Verbinding is geweigerd
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:208)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:147)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
... 14 more
In /var/lib/neo4j/conf/neo4j.properties, I have
# Enable shell server so that remote clients can connect via Neo4j shell.
remote_shell_enabled=true
# The network interface IP the shell will listen on (use 0.0.0.0 for all interfaces).
remote_shell_host=0.0.0.0
# The port the shell will listen on, default is 1337.
remote_shell_port=1337
Do I need to open other ports apart from the three mentioned?
Is there any further configuration related to the shell I have missed?
neo4j-shell is based on Java RMI. There are couple of resources out there describing how to do firewalling for RMI. In fact it's pretty complex since RMI is doing dynamic port allocation.
Typically I run neo4j-shell locally over an ssh connection, this also gives you authentication and encryption - and you just need to open SSH port.

com.datastax.driver.core.TransportException: [/xx.xxx.x.xxx:9042] Cannot connect

I have 2 node in Cassandra cluster with IP:Port aa.aaa.a.aaa:9043(node) and xx.xxx.x.xxx:9043. When i trying to connect using following config **PoolingOptions poolingOptions = new PoolingOptions();
poolingOptions.setCoreConnectionsPerHost(HostDistance.LOCAL, 2)
.setMaxConnectionsPerHost(HostDistance.LOCAL, 4)
.setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
.setMaxConnectionsPerHost(HostDistance.REMOTE, 4)
.setMaxRequestsPerConnection(HostDistance.LOCAL, 200)
.setMaxRequestsPerConnection(HostDistance.REMOTE, 200);
cluster = Cluster.builder()
.addContactPointsWithPorts(socketAddressList)
.withPoolingOptions(poolingOptions)
.withRetryPolicy(DefaultRetryPolicy.INSTANCE)
.withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy())).build();
Session session = cluster.connect(cassandraDB);**
I am getting following exception 16/01/14 09:52:45 INFO core.NettyUtil: Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
16/01/14 09:52:46 WARN core.Cluster: ***You listed /xx.xxx.x.xxx:9043 in your contact points, but it could not be reached at startup*
16/01/14 09:52:47 INFO policies.DCAwareRoundRobinPolicy: Using data-center name 'name' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)ent.Futures$CombinedFuture setExceptionAndMaybeLog
SEVERE: input future failed.
com.datastax.driver.core.TransportException: [/xx.xxx.x.xxx:9042] Cannot connect
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:156)
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:139)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /xx.xxx.x.xxx:9042
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
... 6 more
16/01/14 09:52:47 ERROR core.Session: Error creating pool to /xx.xxx.x.xxx:9042
com.datastax.driver.core.TransportException: [/xx.xxx.x.xxx:9042] Cannot connect
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:156)
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:139)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /xx.xxx.x.xxx:9042
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)**
My Question are:
why it is trying to connect on port 9042, while no where i am using this port in code as well as in config file ?
cassandra version: Cassandra 2.2.1
why it is trying to connect on port 9042, while no where i am using this port in code as well as in config file ?
9042 is the default port for the CQL binary protocol.
Can you show us the content of the variable socketAddressList that you passed to the cluster builder ?
Is there any reason you're using port 9043 instead of the default 9042 port ?

HDP (2.3): Squirrel (3.7): Phoenix (4.4.0.2.3.0.0-2557): Hbase connection time out

We have HDP installed on AWS EC2 Cluster (2 Name Nodes, 3 Data Nodes, and 1 Management Server). In order to use phoenix (4.4.0.2.3.0.0-2557) with Hbase, we have followed basic steps mentioned in Hortonworks documentation.
(Hortonworks-Phoenix Installation Guide)
In order to connect with server, We have used phoenix-client jar with same version as server and written a basic Java driver for connection.
Class.forName("org.apache.phoenix.jdbc.PhoenixDriver").newInstance();
Connection conn = DriverManager.getConnection("jdbc:phoenix:zookeeper_quorom_server_ip","","");
Getting below error after running this program.
org.apache.phoenix.exception.PhoenixIOException: callTimeout=1200000, callDuration=1237464:
at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:881)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1215)
at org.apache.phoenix.query.DelegateConnectionQueryServices.createTable(DelegateConnectionQueryServices.java:112)
at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:1902)
at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:744)
at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:186)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:303)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:295)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:293)
at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1236)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1893)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1862)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1862)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:180)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:132)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:151)
at java.sql.DriverManager.getConnection(Unknown Source)
at java.sql.DriverManager.getConnection(Unknown Source)
at HadoopConnector.main(HadoopConnector.java:17)
Caused by: java.net.SocketTimeoutException: callTimeout=1200000, callDuration=1237464:
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3917)
at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:441)
at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:463)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:815)
... 20 more
Caused by: org.apache.hadoop.hbase.MasterNotRunningException: org.apache.hadoop.hbase.MasterNotRunningException: Can't get connection to ZooKeeper: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1533)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1553)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1704)
at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
... 24 more
Caused by: org.apache.hadoop.hbase.MasterNotRunningException: Can't get connection to ZooKeeper: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.checkIfBaseNodeAvailable(ConnectionManager.java:906)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.access$400(ConnectionManager.java:545)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1483)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1524)
... 28 more
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:222)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:541)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.checkIfBaseNodeAvailable(ConnectionManager.java:895)
... 31 more
When I try to configure squirrel ( hbase-phoniex-SQL-client) then I am getting similar timeout exception while connecting.
Also tried pinging my zookeeper quoroum server IP from command prompt then it gives a request timeout error. Not sure what I am doing wrong here.

Hadoop Balancer fails with - IOException: Couldn't set up IO streams (LeaseRenewer Warning)

I am stumbling across this error while running the Hadoop Balancer via Namenode. Anytips on cracking this. The process is also blocking the current user and giving an Out of Memory error on issuing any other command.
14/05/09 11:30:05 WARN hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-77290934_1] for 936 seconds. Will retry shortly ...
java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "hadoop01.xx.xx.xx.xx.com/30.0.1.176"; destination host is: "hadoop01.xx.xx.xx.xx.com":8022;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:763)
at org.apache.hadoop.ipc.Client.call(Client.java:1242)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy10.renewLease(Unknown Source)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:458)
at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:649)
at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Couldn't set up IO streams
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:671)
at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:252)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1291)
at org.apache.hadoop.ipc.Client.call(Client.java:1209)
... 15 more
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:664)
... 18 more
Once the number of threads created by Hadoop RPC reaches the (ulimit -u) on the node's number of processes, java will report it as an out-of-memory error.
Try increasing the maximum number of processes allowed, i.e. your ulimit -u value.

JMeter: Distributed (Remote) Testing: Unable to run tests remotely

I setup a distributed load testing environment using JMeter. I am running a Linux Virtual Machine (CentOS) on my Windows Vista (Host). The Linux VM is the JMeter Master (client). I have a server (Linux CentOS) that is my JMeter Slave (server).
I did the following:
1) Added the following to client (master) jmeter.properties:
remote_hosts=172.22.222.22:55501 #IP address of the JMeter Slave
client.rmi.localport=55512
mode=Batch
num_sample_threshold=250
2) Added the following to server (slave) jmeter.properties:
server_port=55501
server.rmi.localhostname=172.22.222.22
server.rmi.localport=55511
3) Added the following to server (slave) jmeter-server:
RMI_HOST_DEF=-Djava.rmi.server.hostname=172.22.222.22
4) Then from my Master, I did:
ssh -R 55512:localhost:55512 172.22.222.22
5) Then I started the jmeter server:
sudo ./jmeter-server
I got:
Using local port: 55511
Created remote object: UnicastServerRef [liveRef: [endpoint:[172.22.222.22:55511](local),objID:[637a4bg5:14185b4361e:-7fff, 894250217845851586]]]
6) Then from my Master, I launched the JMeter GUI, and did
Run --> Remote Start --> 172.22.222.22
I got the following error:
2013/10/04 16:03:06 ERROR - jmeter.gui.action.RemoteStart: Failed to initialise remote engine java.rmi.ConnectException: Connection refused to host: 172.22.222.22; nested exception is:
java.net.ConnectException: Connection refused
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:340)
at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
at java.rmi.Naming.lookup(Naming.java:101)
at org.apache.jmeter.engine.ClientJMeterEngine.getEngine(ClientJMeterEngine.java:54)
at org.apache.jmeter.engine.ClientJMeterEngine.<init>(ClientJMeterEngine.java:67)
at org.apache.jmeter.gui.action.RemoteStart.doRemoteInit(RemoteStart.java:176)
at org.apache.jmeter.gui.action.RemoteStart.doAction(RemoteStart.java:79)
at org.apache.jmeter.gui.action.ActionRouter.performAction(ActionRouter.java:81)
at org.apache.jmeter.gui.action.ActionRouter.access$000(ActionRouter.java:40)
at org.apache.jmeter.gui.action.ActionRouter$1.run(ActionRouter.java:63)
at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:251)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:727)
at java.awt.EventQueue.access$200(EventQueue.java:103)
at java.awt.EventQueue$3.run(EventQueue.java:688)
at java.awt.EventQueue$3.run(EventQueue.java:686)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:697)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:138)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:91)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:208)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:146)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
... 26 more
Can anyone please help me figure out what I did wrong, and how can I resolve this issue?
I tried turning off iptables on both client and server, but I get the same thing:
sudo service iptables stop
sudo chkconfig iptables off
I have seen this issue. You need to setup reverse SSH tunnels from master to client for results transfer. Check this: http://rolfje.wordpress.com/2012/02/16/distributed-jmeter-through-vpn-and-ssl/
See my answer here
I think the only way to work around this is to setup a full featured VPN.

Resources