I'm new to Cassandra, and I'm trying to set up a simple 2-node cluster on two test EC2 Ubuntu instances, but replication is not working: nodetool ring doesn't show both instances. What could I be doing wrong?
I'm using Cassandra version 2.0.11.
Here's what my config looks like on both machines:
listen_address: <private_ip>
rpc_address: <private_ip>
broadcast_address: <public_ip>
seeds: <private_ip_of_other_machine>
endpoint_snitch: Ec2Snitch
I have configured the EC2 security group to allow all traffic on all ports between these instances. What am I doing wrong here? I can provide the Cassandra logs if required.
Thank you.
EDIT: the error I'm currently getting is this:
ERROR 15:08:03 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1340) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:766) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:693) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:585) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) [apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625) [apache-cassandra-2.2.5.jar:2.2.5]
WARN 15:08:03 No local state or state is in silent shutdown, not announcing shutdown
The first thing I see is that your seeds: list is wrong. Both nodes should have the same seeds: list. For a simple 2-node test setup, you only need one seed (pick either node). If the nodes are in the same AZ, you can use the private IP.
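As a minimal sketch, assuming node 1's private IP is 10.0.0.1 (a placeholder), the seed entry in cassandra.yaml would be identical on both machines:

# cassandra.yaml, identical on BOTH nodes (10.0.0.1 stands in for node 1's private IP)
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1"

Each node keeps its own listen_address, rpc_address and broadcast_address; only the seeds entry needs to match.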
I am seeing some errors in my NiFi cluster. I have a 3-node secured NiFi cluster, and I am seeing the errors below on 2 of the nodes:
ERROR [main] org.apache.nifi.web.server.JettyServer Unable to load flow due to:
java.io.IOException: org.apache.nifi.cluster.ConnectionException:
Failed to connect node to cluster due to: java.io.IOException:
Could not begin listening for incoming connections in order to load balance data across the cluster.
Please verify the values of the 'nifi.cluster.load.balance.port' and 'nifi.cluster.load.balance.host'
properties as well as the 'nifi.security.*' properties
See the clustering configuration guide for the list of clustering options you have to configure. For load balancing, you'll need to specify ports that are open in your firewall so that the nodes can communicate. You'll also need to make sure that each host has its node hostname property set, its host ports set, and that there are no firewall restrictions between the nodes and your Apache ZooKeeper cluster (see the sketch below).
If you want to simplify the setup to play around, you can use the information in the clustering configuration section of the admin guide to set up an embedded ZooKeeper node within each NiFi instance. However, I would recommend setting up an external ZooKeeper cluster; it's a little more work, but ultimately worth it.
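As a rough sketch of the properties involved (the hostnames and ports here are placeholders, and the exact set of properties depends on your NiFi version), each node's nifi.properties would contain something like:

# nifi.properties on node 1 (nifi-node1.example.com and the ZooKeeper hosts are placeholders)
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1.example.com
nifi.cluster.node.protocol.port=11443
nifi.cluster.load.balance.host=nifi-node1.example.com
nifi.cluster.load.balance.port=6342
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181

Whatever values you choose for the protocol and load-balance ports, they must be open between all three nodes.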
I'm trying to run an Elasticsearch cluster with each es-node running in its own container. These containers are deployed using ECS across several machines that may be running other, unrelated containers. To avoid port conflicts, each port a container exposes is assigned a random value. These random ports are consistent across all running containers of the same type; in other words, all running es-node containers map port 9300 to the same random number.
Here's the config I'm using:
network:
  host: 0.0.0.0
plugin:
  mandatory: cloud-aws
cluster:
  name: ${ES_CLUSTER_NAME}
discovery:
  type: ec2
  ec2:
    groups: ${ES_SECURITY_GROUP}
    any_group: false
  zen.ping.multicast.enabled: false
transport:
  tcp.port: 9300
  publish_port: ${_INSTANCE_PORT_TRANSPORT}
cloud.aws:
  access_key: ${AWS_ACCESS_KEY}
  secret_key: ${AWS_SECRET_KEY}
  region: ${AWS_REGION}
In this case _INSTANCE_PORT_TRANSPORT is the port that 9300 is bound to on the host machine. I've confirmed that all the environment variables used above are set correctly. I'm also setting network.publish_host to the host machine's local IP via a command line arg.
When I forced _INSTANCE_PORT_TRANSPORT (and in turn transport.publish_port) to be 9300, everything worked great, but as soon as it's given a random value, nodes can no longer connect to each other. I see errors like this using logger.discovery=TRACE:
ConnectTransportException[[][10.0.xxx.xxx:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /10.0.xxx.xxx:9300];
at org.elasticsearch.transport.netty.NettyTransport.connectToChannelsLight(NettyTransport.java:952)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:916)
at org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:888)
at org.elasticsearch.transport.TransportService.connectToNodeLight(TransportService.java:267)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:395)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
It seems like the port a node binds to is the same as the port it pings while trying to connect to other nodes. Is there any way to make them different? If not, what's the point of transport.publish_port?
The way the discovery-ec2 plugin works is that it collects a list of IP addresses using the AWS EC2 API and uses this list as the unicast list of nodes.
But it does not collect any information from the running cluster; obviously, the node is not yet connected!
So it does not know anything about the publish_port of the other nodes.
It just adds an IP address, and that's all. Elasticsearch then uses the default port, which is 9300.
So there is nothing you can do, IMO, to fix that in the short term.
But we can imagine adding a new feature close to what has been implemented for Google Compute Engine, where we use a specific metadata entry to fetch this port from the GCE API.
We could do the same for Azure and EC2. Do you want to open an issue so we can track the effort?
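If you just need something working in the meantime, one possible workaround (a sketch, not something I have verified for your setup) is to bypass EC2 discovery entirely and render the host list with the published transport ports into the config when the container starts, e.g. from an entrypoint script:

# elasticsearch.yml (sketch; the IPs and the randomly mapped ports 32768/32771 are placeholders)
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.1.10:32768", "10.0.1.11:32771"]

Unlike the list built by the plugin, a unicast entry written as host:port does carry the port, so the pings would go to the right place.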
We have 2 locations connected by VPN.
Currently we have 2 independent Graylog servers.
We want to create some kind of cluster, so we can reach the logs on both sides even if the VPN is down.
It is something like this:
We already tried to create an Elasticsearch cluster, but this is not the way:
if the VPN is down, the whole cluster is down and logs are not working on either side.
I found this article: https://www.elastic.co/blog/scaling_elasticsearch_across_data_centers_with_kafka
with this topology:
but I have no idea how to configure Apache Kafka so that it will be a broker for Graylog and an input for the syslog server.
Any help, another idea, or a link will be much appreciated.
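I haven't run this exact stack, so treat this as a sketch: the pattern in that article is to put a Kafka cluster in each location, have syslog write into the local Kafka, and have Graylog consume from it. On the syslog side, rsyslog can produce to Kafka via its omkafka output module (the broker address and topic name below are placeholders):

# /etc/rsyslog.d/50-kafka.conf (requires rsyslog built with the omkafka module)
module(load="omkafka")
action(type="omkafka"
       broker=["kafka1.local:9092"]
       topic="syslog"
       template="RSYSLOG_FileFormat")

On the Graylog side, add a Raw/Plaintext Kafka input on each server and point it at the local Kafka cluster with a topic filter matching "syslog", so each site keeps ingesting its own logs even when the VPN is down.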
I want to replace the current 3 ZooKeeper servers with 3 new ZooKeeper servers. I have:
added the new ZooKeeper servers to Ambari,
added the new servers to the following variables:
hbase.zookeeper.quorum
ha.zookeeper.quorum
zookeeper.connect
hadoop.registry.zk.quorum
yarn.resourcemanager.zk-address
I restarted the services and the ResourceManager, and I still can't connect to any new ZooKeeper server when I turn off all the old ZooKeeper servers.
zookeeper-client -server zoo-new1
I get the following error:
"Unable to read additional data from server sessionid 0x0, likely server has closed socket"
And on new Zoo server in logs (zookeeper.out):
"Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running"
When I run one of the old ZooKeepers, then everything is working, and I can connect also to the new ZooKeeper servers.
My best guess is that this has to do with one of the most important properties of ZooKeeper, namely leader election. If you start with a ZooKeeper quorum of 3 servers and add 3 more servers to it, you will need at least 4 servers running for the quorum to be reachable. When a ZooKeeper node is unable to elect a leader, it will look as if it's down.
This is also the reason why your setup works when you start one of the old ZooKeepers: there are then 4 servers alive out of a possible 6. If you want the new setup to work, you need to remove the old servers from the config so that the quorum only knows about the three new ones. Simply shutting a ZooKeeper server down does not remove it from the quorum.
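Concretely, a sketch of the end state (hostnames are placeholders and the ports are the usual defaults): zoo.cfg on each of the three new servers should list only the new ensemble, and the same three hosts should be what Ambari pushes into the *.quorum / zookeeper.connect properties you listed:

# zoo.cfg on each new server
server.1=zoo-new1:2888:3888
server.2=zoo-new2:2888:3888
server.3=zoo-new3:2888:3888

With only 3 of 6 configured servers up there is no majority, which matches the "ZooKeeperServer not running" error you saw; once the config knows about only the three new servers, 2 of 3 is enough to elect a leader.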
I've been trying to use the lovely ansible-elasticsearch project to set up a nine-node Elasticsearch cluster.
Each node is up and running... but they are not communicating with each other. The master nodes think there are zero data nodes. The data nodes are not connecting to the master nodes.
They all have the same cluster.name. I have tried with multicast enabled (discovery.zen.ping.multicast.enabled: true) and disabled (the previous setting to false, plus discovery.zen.ping.unicast.hosts: ["host1","host2",..."host9"]), but in either case the nodes are not communicating.
They have network connectivity to one another - verified via telnet over port 9300.
Sample output:
$ curl host1:9200/_cluster/health
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":"waited for [30s]"}],"type":"master_not_discovered_exception","reason":"waited for [30s]"},"status":503}
I cannot think of any more reasons why they wouldn't connect - looking for any more ideas of what to try.
Edit: I finally resolved this issue. The settings that worked were publish_host set to "_non_loopback:ipv4_" and unicast discovery, with discovery.zen.ping.unicast.hosts set to ["host1:9300","host2:9300","host3:9300"], listing only the dedicated master nodes. I have a minimum master node count of 2.
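Expressed as an elasticsearch.yml snippet (host1-host3 are my dedicated master nodes; this is the shape of what worked rather than a copy of my exact file):

network.publish_host: _non_loopback:ipv4_
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1:9300", "host2:9300", "host3:9300"]
discovery.zen.minimum_master_nodes: 2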
The only reasons I can think of that could cause that behavior are:
Connectivity issues - ping is not a good tool to check whether nodes can connect to each other. Use telnet and try connecting from host1 to host2 on port 9300.
Your elasticsearch.yml is set to bind 127.0.0.1 or the wrong host. If you're not sure, bind 0.0.0.0 to see whether that solves your connectivity issues; then it's important to change it to bind only internal hosts, to avoid exposing Elasticsearch directly to the internet.
Your publish_host is incorrect - this usually happens when you run ES inside a Docker container, for example; you need to make sure that publish_host is set to an address that other hosts can reach.
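As a sketch of the bind/publish settings from the last two points (the addresses are placeholders; adjust to your network):

# elasticsearch.yml
network.bind_host: 0.0.0.0        # temporarily bind everywhere to rule out binding problems
network.publish_host: 10.0.0.5    # an address the other nodes can actually reach

Once things connect, narrow bind_host back down to an internal interface.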