Error while starting a Cassandra Cluster on Amazon EC2 - amazon-ec2

I am trying to set up a 3-node Cassandra cluster on Amazon EC2 instances, yet I am having an issue while trying to start up the cluster.
Here are my configuration options:
Node-1
private-ip a.a.a.a
public-ip b.b.b.b
Node-2:
private-ip c.c.c.c
public-ip d.d.d.d
Node-3:
private-ip e.e.e.e
public-ip f.f.f.f
For each node I have chosen both Node-1 and Node-2 to be seeds. Therefore, in all the cassandra.yaml files I have added those nodes' public IPs.
Moreover, for each instance I have set the following properties:
listen_address private-ip
broadcast_address public-ip
rpc_address 0.0.0.0
broadcast_rpc_address public-ip
endpoint_snitch Ec2Snitch
auto_bootstrap false
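To illustrate, here is roughly how these settings would appear in Node-1's cassandra.yaml, using the placeholder IPs above (Node-1 and Node-2 public IPs as seeds):
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "b.b.b.b,d.d.d.d"
listen_address: a.a.a.a
broadcast_address: b.b.b.b
rpc_address: 0.0.0.0
broadcast_rpc_address: b.b.b.b
endpoint_snitch: Ec2Snitch
auto_bootstrap: false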
Yet while trying to initialize the first node, the following exception happens:
ERROR [main] 2016-12-26 17:08:55,336 CassandraDaemon.java:654 - Exception encountered during startup
java.lang.NullPointerException: null
at org.apache.cassandra.service.StorageService.maybeAddOrUpdateKeyspace(StorageService.java:1025) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:903) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:647) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:518) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:310) [apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:532) [apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:641) [apache-cassandra-2.2.8.jar:2.2.8]
Any idea on what I am doing wrong?

Can you try with rpc_address and listen_address set to the eth0 interface?
We have built a Cassandra cluster on EC2 nodes with Ec2Snitch and eth0, and it works perfectly.
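If that means binding to the eth0 interface rather than a literal address, the rough cassandra.yaml equivalent would be the interface-based options (use these instead of, not together with, listen_address/rpc_address):
listen_interface: eth0
rpc_interface: eth0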

Related

How do I curl my elasticsearch on AWS EC2

I installed Elasticsearch 8.2 (Docker) on AWS EC2 (Ubuntu 20.04).
Everything is working. My only problem is that I can't reach (curl) it from other instances or from my backend server (which is in the same VPC).
I added my node to its discovery nodes and also set network.host: 0.0.0.0,
but I still can't reach it
(I tried with both the private and the public IP).
Is it necessary to install SSL/TLS on it with Elastic 8?
Does anyone have any suggestions on how to access it?
It looks like you forgot to bind the Docker container port to a host port. You need to add the config below to your Elasticsearch container's Docker YAML:
ports:
- "9202:9200" (this maps port 9202 on the host to port 9200 in the container; 9200 is the default Elasticsearch port)
After that you should be able to do the curl from other instances in the VPC.

ElasticSearch Connection Timed Out in EC2 Instance

I am setting up an ELK Stack (which consists of ElasticSearch, LogStash and Kibana) in a single AWS EC2 instance. I am following the documentation from the elastic.co site.
TL;DR; I cannot access my ElasticSearch interface hosted in an EC2 from the Web URL. How to fix that?
Type : m4.large
vCPU : 2
Memory : 8 GB
Storage: 25 GB (EBS)
Note : I have provisioned the EC2 instance inside a VPC and with an Elastic IP.
I have installed all 3 components. ElasticSearch and LogStash are running as services while Kibana is running via the command ./bin/kibana inside kibana-7.10.1-linux-x86_64/ directory.
When I curl the ElasticSearch endpoint using
curl http://localhost:9200
I get this JSON output (which means the service is running and is accessible via port 9200).
However, when I try to access the same URL via my browser, I get an error saying
Connection Timed Out
Isn't this supposed to return the same JSON output as the one I've mentioned above?
I have attached the elasticsearch.yml file here (Hosted in gofile.io).
Here are the Inbound Rules for the EC2 instance.
EDIT: I tried changing network.host: 'localhost' to network.host: 0.0.0.0 and restarted the service, but this time I got an error while starting the service. I have attached a screenshot of that.
EDIT 2: I have uploaded the updated elasticsearch.yml to Gofile.
The problem is the following line in your elasticsearch.yml configuration file:
node.name: node-1
network.host: 'localhost'
With that configuration, your ES cluster is only accessible from the same host and not from the outside. According to the official documentation, you need to either specify 0.0.0.0 or a specific publicly accessible IP address, otherwise that won't work.
Note that you also need to configure the following two lines in order for the cluster to properly form:
discovery.seed_hosts: ["node-1-ip-address"]
# Bootstrap the cluster using an initial set of master-eligible nodes:
cluster.initial_master_nodes: ["node-1"]
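Put together, a minimal elasticsearch.yml sketch for this single-node setup might look like this (node-1-ip-address is a placeholder for the instance's address):
node.name: node-1
network.host: 0.0.0.0
discovery.seed_hosts: ["node-1-ip-address"]
cluster.initial_master_nodes: ["node-1"]
You will also need an inbound rule on the instance's security group allowing TCP 9200 from wherever the browser or curl runs.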

Unable to gossip with any seeds but continuing since node is in its own seed list

To remove a node from a 2-node cluster in AWS I ran
nodetool removenode <Host ID>
After this, I was supposed to get my cluster back once I had set up cassandra.yaml and cassandra-rackdc.properties correctly.
I did that, but I am still not able to get my cluster back.
nodetool status is displaying only one node.
The significant part of the Cassandra system.log is:
INFO [main] 2017-08-14 13:03:46,409 StorageService.java:553 - Cassandra version: 3.9
INFO [main] 2017-08-14 13:03:46,409 StorageService.java:554 - Thrift API version: 20.1.0
INFO [main] 2017-08-14 13:03:46,409 StorageService.java:555 - CQL supported versions: 3.4.2 (default: 3.4.2)
INFO [main] 2017-08-14 13:03:46,445 IndexSummaryManager.java:85 - Initializing index summary manager with a memory pool size of 198 MB and a resize interval of 60 minutes
INFO [main] 2017-08-14 13:03:46,459 MessagingService.java:570 - Starting Messaging Service on /172.15.81.249:7000 (eth0)
INFO [ScheduledTasks:1] 2017-08-14 13:03:48,424 TokenMetadata.java:448 - Updating topology for all endpoints that have changed
WARN [main] 2017-08-14 13:04:17,497 Gossiper.java:1388 - Unable to gossip with any seeds but continuing since node is in its own seed list
INFO [main] 2017-08-14 13:04:17,499 StorageService.java:687 - Loading persisted ring state
INFO [main] 2017-08-14 13:04:17,500 StorageService.java:796 - Starting up server gossip
Content of files:
cassandra.yaml : https://pastebin.com/A3BVUUUr
cassandra-rackdc.properties: https://pastebin.com/xmmvwksZ
system.log : https://pastebin.com/2KA60Sve
netstat -atun https://pastebin.com/Dsd17i0G
Both nodes have the same error log.
All required ports are open.
Any suggestions?
It's usually a best practice to have one seed node per DC if you have just two nodes available in your datacenter. You shouldn't make every node a seed node in this case.
I noticed that node1 has - seeds: "node1,node2" and node2 has - seeds: "node2,node1" in your configuration. By default, a node will start without contacting any other seeds if it can find its own IP address as the first element in the - seeds: ... section of the cassandra.yaml configuration file. That is also what you can find in your logs:
... Unable to gossip with any seeds but continuing since node is in its own seed list ...
I suspect, that in your case node1 and node2 are starting without contacting each other, since they identify themselves as seed nodes.
Try using just node1 as the seed node in both instances' configurations and reboot your cluster.
In case node1 is down and node2 is up, you have to change the - seeds: ... section in node1's configuration to point just to node2's IP address and boot only node1.
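For the normal case (both nodes up), the seed_provider section of cassandra.yaml on both nodes would then look roughly like this (node1-ip is a placeholder for node1's address):
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "node1-ip"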
If your nodes can't find each other because of firewall misconfiguration, it's usually a good approach to verify if a specific port is accessible from another location. E.g. you can use nc for checking if a certain port is open:
nc -vz node1 7000
References and Links
See the list of ports Cassandra is using under the following link
http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/secureFireWall.html
See also a detailed documentation on running multiple nodes with plenty of sample commands:
http://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
This is for future reference. My problem was solved just by opening port 7000 for the same security group in AWS. The port was open, but for a different security group.
When I ran:
[ec2-user@ip-Node1 ~]$ telnet Node2 7000
Trying Node2...
telnet: connect to address Node2: Connection timed out
I came to know that the problem could be with the security group, and that is how it was solved.
For the seeds I am using the IPs of both nodes, like this:
- seeds: "node1,node2"
It is the same on both nodes.

ElasticSearch VM clone - master_not_found_exception, found existing node with the same id but is a different node instance

Here is my setup:
Two instances of Ubuntu 16.04; the second one is a clone made from the first. ElasticSearch is installed only on the guest (Ubuntu) OSes. The configuration was adjusted after cloning the VM.
I am running with a bridged network in VirtualBox - each instance got its IP from the router. The Windows (host) firewall is configured appropriately. All machines can ping each other. Ping, netstat and nmap testing shows that ports 9200 and 9300 are OPEN (tested the "remote" hosts also).
ElasticSearch service is running appropriately. I can "curl -XGET" both locally and remotely and get the correct results.
The problem is that the ES from the second machine is not joining the cluster.
Here are the configuration files:
First one:
cluster.name: p4g4n_cluster
node.name: master
node.master: true
network.host: 192.168.0.12
discovery.zen.ping.unicast.hosts: ["192.168.0.12", "192.168.0.17"]
Second one:
cluster.name: p4g4n_cluster
node.name: node1
node.master: false
network.host: 192.168.0.17
discovery.zen.ping.unicast.hosts: ["192.168.0.12", "192.168.0.17"]
If I try curl -XGET 192.168.0.17:9200/_cluster/health I get master_not_discovered_exception. And if I try a basic GET request, I can see that node1 has _na_ for the cluster_uuid property, while on the first machine (master) the cluster_uuid is present.
Version of ElasticSearch running is: 5.4.0 and
Version of Lucene is: 6.5.0
Can anyone help me with what needs to happen in order for node1 to see and join the cluster?
I was able to solve this issue.
Digging through the logs showed that this was not a network configuration issue.
Since I first configured the entire ELK stack on one machine and then cloned it, ES and Logstash were already running and pumping syslog logs into Elasticsearch.
Because of this, the cloned machine had the same data folder as the existing one. As it turned out, the node UUID is embedded in the data folder and the solution was to delete the data folder on the cloned VM.
The error that I found in logs was: found existing node {xxx} with the same id but is a different node instance ... So there was an obvious conflict.
I found this github ES issue and this SO answer that dealt with the same issue.
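As a rough sketch of that fix on the cloned VM, assuming a package install with the default data path /var/lib/elasticsearch (adjust if path.data points elsewhere):
sudo systemctl stop elasticsearch
sudo rm -rf /var/lib/elasticsearch/nodes    # wipes the cloned node's identity and data
sudo systemctl start elasticsearch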
You can try adding network.bind_host: 0.0.0.0 on both servers.

Multicast Enable for Logstash - Elasticsearch

I'm trying to configure Logstash to join my Elasticsearch cluster as a node, using multicast to avoid configuring a specific host in the Logstash configuration.
The configuration I have on elasticsearch is basically:
transport.tcp.port: 9300
http.port: 9200
cluster.name: myclustername
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 30s
discovery.zen.ping.multicast.enabled: true
discovery.zen.ping.multicast.group: 239.193.200.01
discovery.zen.ping.multicast.port: 54328
On logstash side, I have this configuration:
output {
  elasticsearch {
    host => "239.193.200.01"
    cluster => "myclustername"
    protocol => "node"
  }
}
My elasticsearch cluster is being discovered successfully using multicast meaning the multicast IP is working as expected, but from that configuration I get the following log output:
log4j, [2014-06-05T05:51:44.001] WARN: org.elasticsearch.transport.netty: [logstash-aruba-30825-2014] exception caught on transport layer [[id: 0xe33ea7dd]], closing connection
java.net.SocketException: Network is unreachable
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:465)
at sun.nio.ch.Net.connect(Net.java:457)
If I remove the host key from the configuration I receive this output log:
log4j, [2014-06-05T06:07:45.500] WARN: org.elasticsearch.discovery: [logstash-aruba-31431-2014] waited for 30s and no initial state was set by the discovery
imeout(org/elasticsearch/action/support/master/TransportMasterNodeOperationAction.java:180)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(org/elasticsearch/cluster/service/InternalClusterService.java:492)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java/util/concurrent/ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java/util/concurrent/ThreadPoolExecutor.java:615)
What am I doing wrong here? I suppose my Logstash configuration is wrong, but I'm not sure what.
As per the Logstash 1.4.1 documentation (http://logstash.net/docs/1.4.1/outputs/elasticsearch), you could create an elasticsearch.yml file in the $PWD of the Logstash process to ensure it is configured with the same multicast details.
I assume the Elasticsearch cluster nodes can see each other successfully using multicast and there isn't some network issue preventing that. Check http://your-es-host:9200/_cluster/health?pretty=true and make sure the number of nodes is what you expect.
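A minimal sketch of such an elasticsearch.yml, simply mirroring the multicast settings from the question, would be:
cluster.name: myclustername
discovery.zen.ping.multicast.enabled: true
discovery.zen.ping.multicast.group: 239.193.200.01
discovery.zen.ping.multicast.port: 54328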
Setting the Elasticsearch properties on the JVM via $JAVA_OPTS is another possibility:
export JAVA_OPTS="-Des.discovery.zen.ping.multicast.group=224.2.2.4 \
-Des.discovery.zen.ping.multicast.port=54328 \
-Des.discovery.zen.ping.multicast.enabled=true"
Another option is to use the elasticsearch_http output; I had the same problem, and it is now working well.
Resource here
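For reference, a rough sketch of an elasticsearch_http output (your-es-host is a placeholder; this talks to Elasticsearch over HTTP on port 9200, so no multicast or node-level discovery is needed):
output {
  elasticsearch_http {
    host => "your-es-host"
    port => 9200
  }
}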
