Issue while querying network topology keyspace in multi-region cluster - amazon-ec2

I have set up a Cassandra cluster with 3 nodes on 3 different EC2 instances. Each instance is in a different availability zone, though the datacenter is the same.
I am using EC2MultiRegionSnitch; below are my cassandra.yaml configuration details:
listen_address: private IP
broadcast_address: public IP
seeds: public IP of 1 node
While querying the network topology keyspace I am getting the error "not enough replicas available for query at consistency ONE". The RF for this keyspace is 3.
Queries on the SimpleStrategy keyspace are working perfectly fine.
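For reference, a minimal sketch of the keyspace definition this setup implies. With EC2MultiRegionSnitch the datacenter name is derived from the EC2 region (e.g. us-east for availability zones us-east-1a/b/c), so the name used in the replication map must match whatever nodetool status reports; us-east below is a placeholder:

CREATE KEYSPACE "networkTopologyKeyspace"
  WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3};

If the replication map names a datacenter the snitch does not actually report, NetworkTopologyStrategy places zero replicas there, which produces exactly this error even at consistency ONE.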

Related

Curl call for PUT and GET query on multi-node ES cluster

I would like to know, on a multi-node Elasticsearch cluster (3 nodes), which node we can send curl calls to in order to fetch results (by running a query).
If we can use any node's IP, what would be the best practice? For example, if I am using node 1's URL out of "node 1, node 2, and node 3" and node 1 goes down, I have to manually update the query URL to node 2 or node 3. Is there a way to have one centralized URL that handles this by itself?
Do I have to do it manually using Nginx or a load balancer, or is there something in Elasticsearch itself?
In ES, if you send a request to any node that is part of a valid ES cluster, it will route the request internally and return the result.
But you shouldn't use a node's IP directly to communicate with Elasticsearch, for obvious reasons, one of which you already mentioned. You can use a load balancer, nginx, or DNS in front of your Elasticsearch cluster.
If you are accessing it programmatically you don't even need that: when creating the Elasticsearch client you can specify all the nodes' IPs, so that even when some nodes are down your requests will not fail.
// RestClient/RestClientBuilder are org.elasticsearch.client; HttpHost is org.apache.http.HttpHost
RestClientBuilder restClientBuilder = RestClient.builder(
        new HttpHost(esConfig.getHost(), esConfig.getPort()),
        new HttpHost(esConfig.getHost2(), esConfig.getPort2()));
As you can see, I created my Elasticsearch client (this works with Elasticsearch 8.5) with two Elasticsearch hosts.
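If it helps, a minimal sketch of turning that builder into a working 8.x client, assuming the co.elastic.clients Java API client that pairs with Elasticsearch 8.5:

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;

// Build the low-level client from the two-host builder above,
// then wrap it in the 8.x Java API client.
RestClient restClient = restClientBuilder.build();
ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
ElasticsearchClient esClient = new ElasticsearchClient(transport);

With both hosts registered, the low-level client round-robins between them and retries on the other host if one stops responding.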

How to configure 3 new instances as dedicated master nodes in a running cluster with all its master and data nodes (Elasticsearch)?

Context:
We have an Elasticsearch cluster with 10 nodes that are all configured as master: true and data: true.
Due to the characteristics of our infrastructure, all the nodes of a cluster (a cluster of virtual machines in this case) take their configuration from a GitHub repository. In other words, each and every node has the same configuration.
Of these 10 nodes (all master: true, data: true), 3 are configured as master-eligible.
Steps we performed:
We turned off the 10 nodes that were being used by the cluster (all master and data true).
We changed the configuration of the old nodes (let's call that group of virtual machines elastic-data) to data: true only, and the new nodes (let's call that new group elastic-master) to master: true only; see the sketch after these steps.
We set the new masters as master-eligible in both configurations (elastic-master and elastic-data nodes).
We restarted the app.
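For illustration, a minimal sketch of what the intended per-group elasticsearch.yml settings would look like, using the legacy node.master/node.data flags the question refers to (the group names are the ones defined above):

# elastic-master group: dedicated, master-eligible only
node.master: true
node.data: false

# elastic-data group: holds data, never master
node.master: false
node.data: true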
Problem we found:
The cluster started normally. Queries for cluster administration tasks (listing nodes, looking up configurations, etc.) were very fast; with the previous configuration they often did not respond at all. But when we tried to run a query against the data we got: cannot allocate because all found copies of the shard are either stale or corrupt.
After hours of trying to recover from that state, we decided to roll back the configuration; as a result, the cluster remains unstable.
A step-by-step guide on how to do this without leaving the cluster in the described state would be highly appreciated.

Which Elasticsearch node should I query from my application

If I were to set up my Elasticsearch cluster with 3 master nodes and 5 to 10 data nodes, which node IP addresses should I actually use in my application to query Elasticsearch? I am following the hot-warm architecture, but from what I have understood the master node should always be responsible for handling an incoming request, coordinating that request to further nodes in the cluster, and operating on the final response.
So should I only use master node IP addresses in my application to talk to the cluster?
First of all, you shouldn't use an individual IP to connect to a cluster, as that node can become a single point of failure if it goes down. You should have a load-balancing URL that connects to data nodes or coordinating-only nodes to serve your searches.
Also, it looks like you have dedicated master nodes. For larger clusters it is typically not recommended to use the masters as search coordinators; they should ideally be master-eligible only, to ensure cluster stability. So you are left with the option of using either data nodes or coordinating-only nodes (sketched below) to accept your search requests.
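For reference, a minimal sketch of a coordinating-only node in the legacy flag style of that era: with every role disabled, the node only routes requests, coordinates searches, and aggregates results.

node.master: false
node.data: false
node.ingest: false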
If you are using clients like Jest, NEST, etc., rather than hitting the _search HTTP endpoint directly, then you also have the option to provide a list of IPs/hostnames to form a connection pool.
Like @askids mentioned, always connect to Elasticsearch using the standard clients; Elastic itself provides them.
https://www.elastic.co/guide/en/elasticsearch/client/index.html
You have not mentioned which client you are going to use. If your client is Java-based, use Elasticsearch's Low-Level or High-Level REST Client. These clients are wrappers around the Apache HTTP client and provide all the boilerplate logic for handling connections and other features.
You can also add Sniffer support to it; a sketch follows the links below.
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-low.html
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/sniffer.html
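For illustration, a minimal sketch of the Low-Level REST Client with a Sniffer attached (the hostnames and ports are placeholders):

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.sniff.Sniffer;

// Seed the client with a couple of known data/coordinating nodes.
RestClient restClient = RestClient.builder(
        new HttpHost("data-node-1", 9200, "http"),
        new HttpHost("data-node-2", 9200, "http")).build();
// The sniffer periodically rediscovers the cluster's live nodes,
// so requests keep flowing even if a seed host goes away.
Sniffer sniffer = Sniffer.builder(restClient).build();

Remember to close both on shutdown (the sniffer first, then the client).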

How to assign IP ranges to Google Dataflow instances?

I need to move data from Google BigQuery to Elasticsearch instances. For that I have created a Python Dataflow job that copies a BigQuery table to Elasticsearch. The problem is that an IP-based restriction was recently added on the Elasticsearch instances, so they only accept traffic from specific IP ranges.
So how can I identify or assign the IP ranges of my Dataflow workers when I am using the "DataflowRunner" option?
In the pipeline options you can set the network and the subnetwork you want to use. Each VPC network contains subnets, each with a defined IP range. By pointing the subnetwork option at a subnet with the IP range you need, you assign that IP range to your workers.
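For illustration, a sketch of how those pipeline options might be passed on the command line; the script name, project, region, network, and subnetwork are all placeholders:

python bigquery_to_es.py \
  --runner DataflowRunner \
  --project my-project \
  --region us-central1 \
  --network my-vpc \
  --subnetwork regions/us-central1/subnetworks/my-subnet

Workers then draw their addresses from that subnet's range, which is what you would allowlist on the Elasticsearch side.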

Elasticsearch: Starting Multiple Clusters

I started two Elasticsearch clusters with different names, but the second one does not show up, either in Marvel or when querying for health manually.
curl 'http://127.0.0.1:9200/_cat/health?v'
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1501062768 15:22:48 Cove_dev_cluster yellow 1 1 8 8 0 0 8 0 - 50.0%
But it's running on my screen.
I am assuming you are running both clusters (single nodes, I believe, in this case) on the same machine. In that case the nodes have a default port range setting of 9200-9300 and are configured to bind to the first available port in the specified range. More details are available in the Network Settings documentation.
So in your case the other cluster is most likely running on port 9201. If you check Marvel or query the health manually on port 9201 (see below), you should find the other cluster.
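For example, the same health check as above, pointed at the assumed second port:

curl 'http://127.0.0.1:9201/_cat/health?v'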
However, if you want to have two nodes participating in the same cluster, make sure that the cluster name matches in the configuration of both Elasticsearch instances you have running.
Hope this helps.
