JMeter: Distributed (Remote) Testing: Unable to run tests remotely - jmeter

I setup a distributed load testing environment using JMeter. I am running a Linux Virtual Machine (CentOS) on my Windows Vista (Host). The Linux VM is the JMeter Master (client). I have a server (Linux CentOS) that is my JMeter Slave (server).
I did the following:
1) Added the following to client (master) jmeter.properties:
remote_hosts=172.22.222.22:55501 #IP address of the JMeter Slave
client.rmi.localport=55512
mode=Batch
num_sample_threshold=250
2) Added the following to server (slave) jmeter.properties:
server_port=55501
server.rmi.localhostname=172.22.222.22
server.rmi.localport=55511
3) Added the following to server (slave) jmeter-server:
RMI_HOST_DEF=-Djava.rmi.server.hostname=172.22.222.22
4) Then from my Master, I did:
ssh -R 55512:localhost:55512 172.22.222.22
5) Then I started the jmeter server:
sudo ./jmeter-server
I got:
Using local port: 55511
Created remote object: UnicastServerRef [liveRef: [endpoint:[172.22.222.22:55511](local),objID:[637a4bg5:14185b4361e:-7fff, 894250217845851586]]]
6) Then from my Master, I launched the JMeter GUI, and did
Run --> Remote Start --> 172.22.222.22
I got the following error:
2013/10/04 16:03:06 ERROR - jmeter.gui.action.RemoteStart: Failed to initialise remote engine java.rmi.ConnectException: Connection refused to host: 172.22.222.22; nested exception is:
java.net.ConnectException: Connection refused
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:340)
at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
at java.rmi.Naming.lookup(Naming.java:101)
at org.apache.jmeter.engine.ClientJMeterEngine.getEngine(ClientJMeterEngine.java:54)
at org.apache.jmeter.engine.ClientJMeterEngine.<init>(ClientJMeterEngine.java:67)
at org.apache.jmeter.gui.action.RemoteStart.doRemoteInit(RemoteStart.java:176)
at org.apache.jmeter.gui.action.RemoteStart.doAction(RemoteStart.java:79)
at org.apache.jmeter.gui.action.ActionRouter.performAction(ActionRouter.java:81)
at org.apache.jmeter.gui.action.ActionRouter.access$000(ActionRouter.java:40)
at org.apache.jmeter.gui.action.ActionRouter$1.run(ActionRouter.java:63)
at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:251)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:727)
at java.awt.EventQueue.access$200(EventQueue.java:103)
at java.awt.EventQueue$3.run(EventQueue.java:688)
at java.awt.EventQueue$3.run(EventQueue.java:686)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:697)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:138)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:91)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:208)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:146)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
... 26 more
Can anyone please help me figure out what I did wrong, and how can I resolve this issue?
I tried turning off iptables on both client and server, but I get the same thing:
sudo service iptables stop
sudo chkconfig iptables off

I have seen this issue. You need to setup reverse SSH tunnels from master to client for results transfer. Check this: http://rolfje.wordpress.com/2012/02/16/distributed-jmeter-through-vpn-and-ssl/

See my answer here
I think the only way to work around this is to setup a full featured VPN.

Related

Connection Timeout Error in JMeter Distributed Enviroment

Command use to run in Master machine
mvn clean verify -DjMeterTestFile=Script_Name.jmx -Dremote_hosts='slave machine IP"
In Master machine test is getting executed and no errors getting logged
Slave Machine Command
sh /'jmeter file path'/jmeter-server -Djava.rmi.server.hostname='Slave machine IP'
I'm Getting the below error in Jmeter Slave log once after the Test started
2019-02-14 06:42:45,001 ERROR o.a.j.s.RemoteListenerWrapper: testStarted(host) on 'Slave Machine IP'
java.rmi.ConnectException: Connection refused to host: 'Server machine IP'; nested exception is:
java.net.ConnectException: Connection timed out (Connection timed out)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619) ~[?:1.8.0_191]
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216) ~[?:1.8.0_191]
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202) ~[?:1.8.0_191]
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:129) ~[?:1.8.0_191]
at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:227) ~[?:1.8.0_191]
at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:179) ~[?:1.8.0_191]
at com.sun.proxy.$Proxy21.testStarted(Unknown Source) ~[?:?]
at org.apache.jmeter.samplers.RemoteListenerWrapper.testStarted(RemoteListenerWrapper.java:79) [ApacheJMeter_core-5.0.jar:5.0 r1840935]
at org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfStart(StandardJMeterEngine.java:217) [ApacheJMeter_core-5.0.jar:5.0 r1840935]
at org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:384) [ApacheJMeter_core-5.0.jar:5.0 r1840935]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Configurations setup in Master Machine and Slave Agent
Master Machine Configurations
client.rmi.localport : 5060
https.socket.protocols: TLS1.2v
java.rmi.server.hostname : slave machine IP
server.rmi.ssl.disable : true
Plugins and Versions used
com.lazerycode.jmeter version 2.8.0
kg.apc:jmeter-plugins-extras:1.4.0
kg.apc:jmeter-plugins-standard:1.4.0
kg.apc:jmeter-plugins-autostop:0.1
kg.apc:jmeter-plugins-casutg:2.6
kg.apc:jmeter-plugins-csvars:0.1
kg.apc:jmeter-plugins-functions:2.0
kg.apc:jmeter-plugins-manager:0.19
kg.apc:jmeter-plugins-perfmon:2.1
kg.apc:jmeter-plugins-prmctl:0.3
kg.apc:jmeter-plugins-tst:2.1
kg.apc:jmeter-plugins-webdriver:2.3
com.blazemeter:jmeter-parallel:0.7
com.blazemeter:jmeter-plugins-wsc:0.7
Slave Machine Configurations
java.rmi.server.hostname='Slave Machine IP'
server.rmi.localport=42840
https.socket.protocols=TLSv1.2
server.rmi.ssl.disable=true
For the same configurations I'm not getting errors in JMeter 4.0
It was observed that results were sent through port 5061 and 5062. Therefore Had to open ports 5061 and 5062 in Controller machine .
Ex : sudo iptables -I INPUT -p tcp -s 10.0.0.0/0 --dport 5061 -j ACCEPT
I faced a similar issue with Jmeter on a windows client and flushing the DNS cache seems to fix my environment.
open Powershell from the windows start menu
type and run "Clear-DnsClientCache"
try again. Restarting Jmeter doesn't seem necessary.

Cannot connect to remote Neo4j server using neo4j-shell

I have Neo4j 2.3.2 installed on a server whose firewall has ports 1337, 7474, 7473 open (verified using ncat). I can access the web-based console remotely and can access the shell locally, but I cannot access the shell remotely. Namely, when running path/to/neo4j-shell -v -host destination.example.org I get
ERROR (-v for expanded information):
Connection refused
java.rmi.ConnectException: Connection refused to host: <server ip addres>; nested exception is:
java.net.ConnectException: Verbinding is geweigerd
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:129)
at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:227)
at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:179)
at com.sun.proxy.$Proxy1.welcome(Unknown Source)
at org.neo4j.shell.impl.AbstractClient.sayHi(AbstractClient.java:257)
at org.neo4j.shell.impl.RemoteClient.findRemoteServer(RemoteClient.java:70)
at org.neo4j.shell.impl.RemoteClient.<init>(RemoteClient.java:62)
at org.neo4j.shell.impl.RemoteClient.<init>(RemoteClient.java:45)
at org.neo4j.shell.ShellLobby.newClient(ShellLobby.java:204)
at org.neo4j.shell.StartClient.startRemote(StartClient.java:355)
at org.neo4j.shell.StartClient.start(StartClient.java:226)
at org.neo4j.shell.StartClient.main(StartClient.java:145)
Caused by: java.net.ConnectException: Verbinding is geweigerd
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:208)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:147)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
... 14 more
In /var/lib/neo4j/conf/neo4j.properties, I have
# Enable shell server so that remote clients can connect via Neo4j shell.
remote_shell_enabled=true
# The network interface IP the shell will listen on (use 0.0.0.0 for all interfaces).
remote_shell_host=0.0.0.0
# The port the shell will listen on, default is 1337.
remote_shell_port=1337
Do I need to open other ports apart from the three mentioned?
Is there any further configuration related to the shell I have missed?
neo4j-shell is based on Java RMI. There are couple of resources out there describing how to do firewalling for RMI. In fact it's pretty complex since RMI is doing dynamic port allocation.
Typically I run neo4j-shell locally over an ssh connection, this also gives you authentication and encryption - and you just need to open SSH port.

Mapreduce returns error when accessing to datanode on master machine

I set up a Hadoop 2.4.0 cluster with three machines. One master machine is deployed with namenode, resource manager, datanode and node manager. The other two worker machines are deployed with datanode and node manager. When I run Hive query, the work fails and the error is
2014-06-11 13:40:13,364 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From master/127.0.0.1 to
master:43607 failed on connection exception: java.net.ConnectException: Connection >refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:5>7)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImp>l.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1414)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at com.sun.proxy.$Proxy9.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
... 4 more
if I disable the datanode on master machine, everything works well. I'm wondering if it's allowed to deployed datanode on the master machine. Thank you for your kindly help in advance.
BTW, my /etc/hosts on the three machines are the same:
127.0.0.1 localhost
10.1.154.231 master
10.1.153.220 slave1
10.1.153.133 slave2
Please set up passwordless ssh on your master to itself.
You can achieve this by
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys2
Make sure the permissions are correct
chmod 0600 ~/.ssh/authorized_keys2
In this case you may check if the namenode is started correctly on the master by checking logs at your yourhadoopfolder/logs/hadoop-[hadoop-user]-namenode-master.log
It is often caused by the hdfs is not formatted before. Run
hadoop namenode -format
Of course you will need to put your data to the cluster again.

Hadoop timing out trying to write to Cassandra in AWS multi-region configuration

I am running a multi-DC Cassandra (open-source, not DSE) cluster in AWS, where one DC (us-west-2) is set up for analytics and the other (us-east) is the transactional store. I'm using NetworkTopologyStrategy with the EC2 snitch, and a consistency level of LOCAL_ONE in my Hadoop config. Hadoop can read from Cassandra without issue, but attempting to write produces a timeout exception.
Running nodetool status shows the DCs are properly configured:
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN x.x.x.x 1.01 GB 9.9% 9e7f4393-7ac9-4559-b3ff-de48be50016f -9127921345534057723 2a
UN x.x.x.x 1001.16 MB 11.4% d0760383-c3dd-474c-9261-239b71dba3f1 -9221279003374097975 2b
UN x.x.x.x 1.05 GB 11.7% 3f09fbf5-0d85-4283-9009-0ec0e29223c0 -9140104347498952504 2c
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN x.x.x.x 1.1 GB 11.3% 5bbd2de4-e1d2-4a17-9f40-034f60b35954 -9061054426204373981 1b
UN x.x.x.x 1.15 GB 11.5% e34c590e-6176-45b2-a8f9-18b4a9a80032 -9216519687724118609 1c
UN x.x.x.x 1.18 GB 10.9% fa0b0a1a-f156-40fc-a267-970d1eb9cddb -9207673937991303291 1a
UN x.x.x.x 1.46 GB 10.7% b18ae406-c9ec-42b7-a365-b0c6e2fe582f -9206671929961171506 1a
UN x.x.x.x 1.13 GB 11.4% 1ac9c1c5-55ad-4048-b1ba-3b9768933ecc -9146100851344467112 1c
UN x.x.x.x 1.53 GB 11.2% dad665bb-68d9-4811-b421-f33333261867 -9178920986366339267 1b
Stack trace using ColumnFamilyOutputFormat:
java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter$RangeClient.run(ColumnFamilyRecordWriter.java:224)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:123)
at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter$RangeClient.run(ColumnFamilyRecordWriter.java:215)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 4 more
... and using CqlOutputFormat:
java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:271)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:123)
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:262)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 4 more
Both traces ultimately point to AbstractColumnFamilyOutputFormat.createAuthenticatedClient(host, port, conf).
I then opened that source and added some detail to the exception so it would output the host name it's connecting to, which resulted in this trace:
java.io.IOException: java.lang.Exception: Unable to connect to host [hostname]
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:271)
Caused by: java.lang.Exception: Unable to connect to host [hostname]
at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:139)
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:262)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:124)
... 1 more
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 4 more
The problem is [hostname] is a machine that's not in the analytics cluster (it's in us-east). Why doesn't it know this automagically, especially when reads work properly? It seems like it's trying all the nodes in the ring regardless of DC.
For the record, writes fail using CqlOutputFormat, ColumnFamilyOutputFormat, and through Pig using CqlStorage and CassandraStorage.
I'd say, try to set the write_request_timeout_in_ms in cassandra.yaml to some very high number and see if that helps. There can be an issue with the node itself, when it is not responding while still appearing as being up. If it still times out, restart service on that node that you suspect is causing the issue.
This issue came down to two things:
For multi-region EC2 setups, Cassandra requires setting broadcast_address to the public IP and the listen_address to the internal IP. In most cases you'll want rpc_address to be the internal IP, but this potentially breaks Cassandra's Hadoop client, which is determining endpoints to talk to based on broadcast_address.
Cassandra's Hadoop client (RingCache specifically) doesn't respect data center on node discovery, and tries to discover all nodes in the ring--including non-local ones. It respects the consistency level on the actual write, but in our case it never got there due to #1.
I filed a ticket and submitted a patch to address these issues:
https://issues.apache.org/jira/browse/CASSANDRA-7252

Giraph ZooKeeper port problems

I am trying to run the SimpleShortestPathsVertex (aka SimpleShortestPathComputation) example described in the Giraph Quick Start. I am running this on a Hortonworks Sandbox instance (HDP 2.1) using VirtualBox, and I packaged giraph.jar using profile hadoop_2.0.0.
When I try to run the example using
hadoop jar giraph.jar org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsVertex -vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip
/user/hue/tinygraph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op /user/hue/output/shortestpaths -w 1
I get the following exception
2014-04-30 07:22:15,390 INFO [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to sandbox.hortonworks.com:22181 with poll msecs = 3000
2014-04-30 07:22:15,396 WARN [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
I have found a work around - it seems that Giraph expects ZooKeeper to be running on port 22181, while it is actually running on 2181. I have simply used the Ambari interface to set ZooKeeper to run on 22181 (go to http://127.0.0.1:8080/, login admin/admin, Services tab, ZooKeeper and change the port to 22181, save and Service Actions -> Restart All.
Does anyone have a better solution for this problem? Is there a config via which the port should be specified, or is this port in the Giraph source code a typo?
Yes, you can specify each time you run a Giraph job via using option -Dgiraph.zkList=localhost:2181
Also you can set it up in Hadoop configs and then you don't have to pass on this option each time you submit a Giraph job. For that add the following line in conf/core-site.xml file :
<property><name>giraph.zkList</name><value>localhost:2181</value></property>
[Please check the syntax, I don't recall it on top my head and currently I don't have access to a cluster to check it]

Resources