I set up a 3-node ZooKeeper (CDH4) ensemble on RHEL 5.5 machines. I started the service by running zkServer.sh on each node. A ZooKeeper instance is running on all the nodes, but how do I know whether they are part of an ensemble or running as individual services?
I tried to start the service and check the ensemble as described here on Cloudera's site, but it throws a ClassNotFoundException.
You can use the stat four-letter word:
echo stat | nc 127.0.0.1 <zkport>
which gives you output like:
Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Clients:
/127.0.0.1:55829[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 3
Sent: 2
Connections: 1
Outstanding: 0
Zxid: 0x100000000
Mode: leader
Node count: 4
The Mode: line tells you what mode the server is running in: leader, follower, or standalone if the node is not part of an ensemble.
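If you run this against each server in the ensemble, you should see one leader and the rest followers; a server reporting standalone is not participating in the ensemble at all. A quick loop to check all three at once (a sketch; the hostnames and port are placeholders for your own):
for host in zk1 zk2 zk3; do
  # print each server's reported mode: leader, follower or standalone
  printf '%s: ' "$host"
  echo stat | nc "$host" 2181 | grep Mode
done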
To remove a node from a 2-node cluster in AWS I ran:
nodetool removenode <Host ID>
After this I was supposed to get my cluster back, provided I set up cassandra.yaml and cassandra-rackdc.properties correctly.
I did that, but I am still not able to get my cluster back.
nodetool status displays only one node.
The significant part of the Cassandra system.log is:
INFO [main] 2017-08-14 13:03:46,409 StorageService.java:553 - Cassandra version: 3.9
INFO [main] 2017-08-14 13:03:46,409 StorageService.java:554 - Thrift API version: 20.1.0
INFO [main] 2017-08-14 13:03:46,409 StorageService.java:555 - CQL supported versions: 3.4.2 (default: 3.4.2)
INFO [main] 2017-08-14 13:03:46,445 IndexSummaryManager.java:85 - Initializing index summary manager with a memory pool size of 198 MB and a resize interval of 60 minutes
INFO [main] 2017-08-14 13:03:46,459 MessagingService.java:570 - Starting Messaging Service on /172.15.81.249:7000 (eth0)
INFO [ScheduledTasks:1] 2017-08-14 13:03:48,424 TokenMetadata.java:448 - Updating topology for all endpoints that have changed
WARN [main] 2017-08-14 13:04:17,497 Gossiper.java:1388 - Unable to gossip with any seeds but continuing since node is in its own seed list
INFO [main] 2017-08-14 13:04:17,499 StorageService.java:687 - Loading persisted ring state
INFO [main] 2017-08-14 13:04:17,500 StorageService.java:796 - Starting up server gossip
Content of files:
cassandra.yaml : https://pastebin.com/A3BVUUUr
cassandra-rackdc.properties: https://pastebin.com/xmmvwksZ
system.log : https://pastebin.com/2KA60Sve
netstat -atun: https://pastebin.com/Dsd17i0G
Both nodes have the same error log.
All required ports are open.
Any suggestions?
It's usually best practice to have one seed node per DC if you have just two nodes available in your datacenter. You shouldn't make every node a seed node in this case.
I noticed that in your configuration node1 has - seeds: "node1,node2" and node2 has - seeds: "node2,node1". By default, a node will start without contacting any other seeds if it finds its own IP address as the first element of the - seeds: ... section in the cassandra.yaml configuration file. That's also what you can see in your logs:
... Unable to gossip with any seeds but continuing since node is in its own seed list ...
I suspect that in your case node1 and node2 are starting without contacting each other, since they each identify themselves as seed nodes.
Try using just node1 as the seed node in both instances' configurations and restart your cluster.
If node1 is down while node2 is up, change the - seeds: ... section in node1's configuration to point only to node2's IP address and boot just node1.
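For illustration, the relevant cassandra.yaml fragment would then look roughly like this on both nodes (a sketch; the address is a placeholder taken from the log above, use node1's real IP):
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # node1's address only, identical on both nodes
          - seeds: "172.15.81.249"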
If your nodes can't find each other because of a firewall misconfiguration, it's usually a good approach to verify whether a specific port is accessible from another location. E.g. you can use nc to check whether a certain port is open:
nc -vz node1 7000
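You can also loop over the ports Cassandra commonly uses (see the DataStax link below for the full list); this is just a convenience sketch around the same nc check:
# 7000/7001 inter-node, 7199 JMX, 9042 CQL native, 9160 Thrift
for port in 7000 7001 7199 9042 9160; do
  nc -vz node1 "$port"
done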
References and Links
See the list of ports Cassandra uses at the following link:
http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/secureFireWall.html
See also the detailed documentation on running multiple nodes, with plenty of sample commands:
http://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
This is for future reference. My problem was solved just by opening port 7000 for the same security group in AWS. Although the port was open, it was open for a different security group.
When I ran:
[ec2-user@ip-Node1 ~]$ telnet Node2 7000
Trying Node2...
telnet: connect to address Node2: Connection timed out
That is how I came to know the problem could be with the security group.
And that is how it was solved.
As for the seeds, I am using the IPs of both nodes, like this:
- seeds: "node1,node2"
It is the same on both nodes.
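For completeness, the missing rule boiled down to allowing TCP 7000 from the nodes' own security group. With the AWS CLI that would look roughly like this (the group ID below is a placeholder):
# allow Cassandra inter-node traffic (TCP 7000) between instances in the same security group
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 7000 \
    --source-group sg-0123456789abcdef0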
I was checking out a machine where I have Graylog and Elasticsearch installed (for Graylog).
There's something I can't really understand: it seems that Elasticsearch is running with two nodes on the same machine, which I would like to avoid.
Here's the output:
me#server ~ # curl 'localhost:9200/_cat/nodes'
127.0.0.1 127.0.0.1 1 71 0.54 d * Candra
127.0.0.1 127.0.0.1 32 71 0.54 c - graylog-7d4bdfb9-23ac-45e9-a957-1f72b8848e2b
Is this normal? How can I set it up to use just one node?
The second node is a lightweight client node (see the c in the 6th column) that Graylog creates in order to connect to the cluster. It's perfectly normal, as you can see in their official documentation:
Graylog hosts an embedded Elasticsearch node which is joining the Elasticsearch cluster as a client node.
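If you want to double-check which node is which, the verbose flag on the same endpoint adds a header row (column names vary a bit between Elasticsearch versions, so treat this as a quick check rather than a reference):
curl 'localhost:9200/_cat/nodes?v'
The Graylog node should show up with a client role and without the master marker, just like in the output above.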
I have a 6-machine Mesos cluster (3 masters and 3 slaves). I can access the Mesos user interface at 172.16.8.211:5050 and it works correctly, redirecting to the leader if this node is not it. If I access the Marathon user interface at 172.16.8.211:8080, it also works correctly. In short, before configuring and running the Consul cluster, Marathon works well.
My problem starts when I configure and run a Consul cluster with 3 servers (the Mesos masters) and 3 clients (the Mesos slaves). If I execute consul members everything is fine: all the members are alive and working together.
But now if I try to access the Marathon user interface I can't, and when I access the Mesos user interface and go to 'Frameworks', the Marathon framework does not appear.
ikerlan#client3:~$ consul members
Node Address Status Type Build Protocol DC
client3 172.16.8.216:8301 alive client 0.5.2 2 nyc2
client2 172.16.8.215:8301 alive client 0.5.2 2 nyc2
server2 172.16.8.212:8301 alive server 0.5.2 2 nyc2
server3 172.16.8.213:8301 alive server 0.5.2 2 nyc2
client1 172.16.8.214:8301 alive client 0.5.2 2 nyc2
server1 172.16.8.211:8301 alive server 0.5.2 2 nyc2
In the Slaves tab of Mesos I can see the following:
- Mesos version: 0.27.0
- Marathon version: 0.15.1
In which of the log files would something related to this issue appear?
What could be the problem?
Solution:
I saw in the Marathon logs ('/var/log/syslog') that the problem was a DNS problem. So I added the IPs of the other cluster hosts to /etc/hosts, and that resolved the problem; now it works perfectly.
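For reference, a minimal /etc/hosts sketch built from the consul members output above (adjust the hostnames to whatever names Mesos and Marathon actually try to resolve):
# cluster hosts, added on every node
172.16.8.211  server1
172.16.8.212  server2
172.16.8.213  server3
172.16.8.214  client1
172.16.8.215  client2
172.16.8.216  client3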
You can add all the cluster hosts to the ZooKeeper config file; that would work as well.
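If that refers to the ZooKeeper connection strings Mesos and Marathon are started with, listing every ZooKeeper host there avoids depending on a single name resolving. A sketch with the default ports and paths (adjust to your deployment):
# mesos-master: point at every ZooKeeper host, not just one
--zk=zk://server1:2181,server2:2181,server3:2181/mesos
# marathon: same idea for master discovery and for its own state
--master zk://server1:2181,server2:2181,server3:2181/mesos
--zk zk://server1:2181,server2:2181,server3:2181/marathon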
I set up Apache Storm 0.9.3 in fully distributed mode (3 nodes), leveraging a fully distributed Apache ZooKeeper cluster (3.4.6) consisting of 3 nodes. I did the following testing and found that Storm Nimbus fails to start if the first ZooKeeper server listed in storm.yaml is down or temporarily unreachable.
Test #1:
bring up all three zookeeper nodes
start nimbus, supervisor, ui on the storm master node, and launch supervisor on other two nodes
in this case, everything goes well
Test #2:
shut down one of the three zookeeper nodes (zookeeper is still functional)
start nimbus, supervisor, ui on the storm master node, and launch supervisor on other two nodes
in this case, if the failed ZooKeeper node happens to be the first one listed in storm.zookeeper.servers, neither Nimbus nor the Supervisor starts on the master node.
I am wondering whether any of you have encountered this problem. Is this unexpected behavior, is something wrong with my configuration, or is it something else?
My configuration is listed below:
storm.zookeeper.servers :
- "zookeeper1.hostname.local"
- "zookeeper2.hostname.local"
- "zookeeper3.hostname.local"
nimbus.host : storm-master.hostname.local
nimbus.thrift.port : 6627
storm.zookeeper.port : 2181
supervisor.slots.ports :
- 6700
- 6701
- 6702
- 6703
ui.port : 8744
storm.local.dir : /opt/apache-storm-0.9.3/storm-local
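One quick sanity check before starting Nimbus is to confirm that every ZooKeeper server in that list actually answers; a healthy server replies imok to the ruok four-letter word (a sketch; hostnames and port taken from the configuration above):
for zk in zookeeper1.hostname.local zookeeper2.hostname.local zookeeper3.hostname.local; do
  printf '%s: ' "$zk"
  echo ruok | nc "$zk" 2181
  echo
done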
I want to set up 3 nodes on a Windows machine for testing purposes. I already have the Community edition installed. I followed some YouTube tutorials, as well as the docs, to set up 3 nodes on 1 machine. All 3 nodes are up, but they are not connected; I can only see 1 node serving 100% load in "nodetool status".
Here is what I wanted: 3 instances connected as below
127.0.0.1 (seed)
127.0.0.2
127.0.0.3
Here is what I did:
Installed Datastax community edition 2.0.11
Copied apache-cassandra/conf -> conf2 & conf3
Modified cassandra.yaml for:
cluster_name
seed_address (127.0.0.1)
listen_address (seed ip)
rpc_address 0.0.0.0
endpoint_snitch: SimpleSnitch
The above is documented, but I had to change the ports below since it is a single machine:
rpc_port: [if default is 9160 then node1 will be 9161]
native_transport_port:
storage_port:
Changed "JMX_PORT" in cassandra.bat file (created 2 copies of main file)
started all
I tried ccm, but it does not pick up the already installed Cassandra; it tries to build from source and fails.
Am I missing something? It's been 2 days (4-5 hours) of trying to set this up.
Thanks,
Ninad
From my own tests on Windows 7, 127.0.0.1 and 127.0.0.2 point to the same interface, so you can't bind two nodes to the same port. Yet even when using different ports for each node, I had the same issue as you (nodes not communicating with each other). In the end I would recommend using Linux for this kind of test, even in a simple virtual machine, because on Linux 127.0.0.1 and 127.0.0.2 are not the same.
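If you do switch to Linux (or a VM), ccm can build exactly this kind of local test cluster in a couple of commands. A sketch, assuming ccm is installed and can fetch the 2.0.11 binaries:
# create a 3-node local cluster named "test" on Cassandra 2.0.11 and start it
ccm create test -v 2.0.11 -n 3 -s
# confirm all three nodes are up
ccm status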