Migrate indexes from old version of Elasticsearch to Elasticsearch 7.9

We want to upgrade Elasticsearch from version 5.6 to 7.9 in our project.
I have to migrate our indexes and documents to the new version, but I can't use reindex, so I use the REST high level client to connect to Elasticsearch 7 and plain HTTP requests for Elasticsearch 5.
For the migration I fetch batches of documents with a match_all query and the scroll API from the old version and index them into the new Elasticsearch with bulk requests.
Our old Elasticsearch cluster has 3 nodes. My question is: do I have to send requests to each node separately and process the documents, or, if I send the match_all search to one node, will it be handled by Elasticsearch (I read something about the coordinating node that handles requests, and that every node is implicitly a coordinating node)? Or do I have to send the request to a data node?

Adding more details to #saeednasehi's answer: it looks like you are confused about how Elasticsearch and its queries work internally; please refer to my answer on how search queries work in Elasticsearch.
Apart from this, while it's true that you can get the data by connecting to any node, in your ES client (Java High Level REST Client or plain HTTP) you should list the IPs of all the nodes, so that your request (i.e. coordinating) load is distributed among all the data nodes. If you give just one node IP, that node always acts as the coordinating node in the absence of a dedicated coordinating node (the default).
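For illustration, a minimal sketch of what that client setup could look like; the host names (es7-node-1 etc.) are hypothetical placeholders for your three nodes:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

// Hypothetical node addresses; listing every node lets the client round-robin
// requests so no single node is forced to do all the coordinating work.
RestHighLevelClient newClusterClient = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("es7-node-1", 9200, "http"),
                new HttpHost("es7-node-2", 9200, "http"),
                new HttpHost("es7-node-3", 9200, "http")));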

When you start an Elasticsearch cluster, you can treat the whole cluster as a single database: you can fetch from and insert into the cluster by sending your requests to any one of its nodes. You just need to send your request to a node and fetch your data.
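As a rough illustration of the approach described in the question (scroll over the old 5.6 cluster with match_all, then bulk-index each page into the new 7.9 cluster), here is a hedged sketch. The index names old-index and new-index are hypothetical, the old cluster is reached through the low-level client and the new one through the high level client, and error handling and clearing the scroll are omitted:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class IndexMigrator {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // oldCluster: low-level client pointing at the 5.6 nodes
    // newCluster: high level client pointing at the 7.9 nodes
    public static void migrate(RestClient oldCluster, RestHighLevelClient newCluster) throws Exception {
        // Open a scroll on the old cluster with a match_all query
        Request search = new Request("POST", "/old-index/_search?scroll=2m");
        search.setJsonEntity("{\"size\": 1000, \"query\": {\"match_all\": {}}}");
        JsonNode page = MAPPER.readTree(
                EntityUtils.toString(oldCluster.performRequest(search).getEntity()));

        while (page.path("hits").path("hits").size() > 0) {
            // Re-index the current page into the new cluster with one bulk request
            BulkRequest bulk = new BulkRequest();
            for (JsonNode hit : page.path("hits").path("hits")) {
                bulk.add(new IndexRequest("new-index")
                        .id(hit.path("_id").asText())
                        .source(hit.path("_source").toString(), XContentType.JSON));
            }
            newCluster.bulk(bulk, RequestOptions.DEFAULT);

            // Fetch the next page of the scroll from the old cluster
            Request next = new Request("POST", "/_search/scroll");
            next.setJsonEntity("{\"scroll\": \"2m\", \"scroll_id\": \""
                    + page.path("_scroll_id").asText() + "\"}");
            page = MAPPER.readTree(
                    EntityUtils.toString(oldCluster.performRequest(next).getEntity()));
        }
    }
}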

Related

Curl call for PUT and GET query on multi-node es cluster

I would like to know, on a multi-node Elasticsearch cluster (3 nodes), which node we should send the curl call to in order to fetch results (by running a query)?
If we can use any node's IP, what would be the best practice? For example, if I am using node 1's URL out of node 1, node 2 and node 3, and node 1 goes down, I have to manually update the query URL to node 2 or node 3. Is there a way to have one centralized URL which handles this itself?
Do I have to do it manually using Nginx or a load balancer, or is there something in Elasticsearch itself?
In ES, if you send the request to any node that is part of a valid ES cluster, it will route the request internally and return the result.
But you shouldn't communicate with Elasticsearch through a single node IP directly, for obvious reasons, one of which you already mentioned. You can put a load balancer, Nginx or DNS in front of your Elasticsearch cluster.
If you are accessing it programmatically you don't even need that: while creating the Elasticsearch client, you can specify all the nodes' IPs, so that even when some nodes are down your requests will not fail.
// Build the low-level client with two hosts so a single node outage doesn't break the client
RestClientBuilder restClientBuilder = RestClient.builder(
        new HttpHost(esConfig.getHost(), esConfig.getPort()),
        new HttpHost(esConfig.getHost2(), esConfig.getPort2()));
As you can see, I created my Elasticsearch client (this works with Elasticsearch 8.5) with two Elasticsearch hosts.

How to keep track of elasticsearch requests

In my Elasticsearch cluster I have 2 indices. I need to keep track of the requests that come to these indices. For example, I have customer and product indices. When a new customer document is added to the customer index, I need to get the id of the added document and its body.
Another example: when a product document is updated, I also need the id of that product and its body, or what changed in that document.
My Elasticsearch version is 7.17.
(I am writing in Node.js; if you have any code examples or a solution I would appreciate it.)
You can do this via the Elasticsearch slow logs, where you reduce the thresholds to 0 so every request is logged, or via some other proxy that intercepts the requests. Unfortunately, Elasticsearch doesn't do this out of the box.
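For example, a hedged sketch of the index settings that would make the slow logs capture everything on the customer index from the question (thresholds of 0ms log every search and indexing request; setting index.indexing.slowlog.source to true logs the full document body, which can get very verbose on a busy cluster):

PUT /customer/_settings
{
  "index.search.slowlog.threshold.query.trace": "0ms",
  "index.indexing.slowlog.threshold.index.trace": "0ms",
  "index.indexing.slowlog.source": true
}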

Filter the Elasticsearch cat API

I am using the _cat API of Elasticsearch to get various details of my Elasticsearch cluster.
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html
What I want is the ability to filter the response, which I can't see in the documentation. For example, the output of _cat/nodes?v gives node.role, which tells whether a node is a data, master or ingest node, and I want a way to return only the master and data nodes in the response.
You can use GET /_cat/master instead of _cat/nodes?v to get the master node. Otherwise, you can use /_nodes/data:true to get only the data nodes:
GET /_nodes/data:true
GET /_nodes/ingest:true
GET /_nodes/master:true
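Additionally, if the goal is just to trim the response to the columns you care about (this does not filter rows server-side), the cat APIs accept an h parameter to pick the columns, for example:

GET /_cat/nodes?v&h=name,node.role,master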

Which Elasticsearch node should I query from my application

If I were to set up my Elastic cluster with 3 master nodes and 5 to 10 data nodes, which node IP addresses should I actually use in my application to query Elasticsearch? I am following the hot-warm architecture for Elastic, but what I have understood is that the master node should always be responsible for handling an incoming request, coordinating that request to further nodes in the cluster, and assembling the final response.
So should I only use the master node IP addresses in my application to talk to the cluster?
First of all, you shouldn't be using an individual IP to connect to the cluster, as that node can become a single point of failure if it goes down. You should have a load-balancing URL that connects to data nodes or coordinating-only nodes to serve your searches.
Also, it looks like you have dedicated master nodes. For larger clusters it is typically not recommended to use the masters as search coordinators; they should ideally be master-eligible only, to ensure cluster stability. So you are left with the option of using either data nodes or coordinating-only nodes to accept your search requests.
If you are using clients like JEST, NEST etc. and not directly hitting the _search HTTP endpoint, you also have the option to provide a list of IPs/hostnames to form a connection pool.
Like #askids mentioned, always connect to Elasticsearch using the standard clients. Elastic itself provides clients.
https://www.elastic.co/guide/en/elasticsearch/client/index.html
You have not mentioned which client you are going to use. If your client is Java-based, use Elasticsearch's Low-Level or High Level REST Client. These clients are wrappers over the Apache HTTP client and provide all the boilerplate logic of handling connections and other features.
You can also add Sniffer support to it.
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-low.html
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/sniffer.html
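As an illustration, a minimal sketch of the low-level client with the sniffer attached; the host names are hypothetical seed nodes:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.sniff.Sniffer;

// Hypothetical seed hosts; the sniffer discovers the rest of the cluster at runtime
RestClient restClient = RestClient.builder(
        new HttpHost("es-data-1", 9200, "http"),
        new HttpHost("es-data-2", 9200, "http")).build();

// Periodically refreshes the list of nodes the client sends requests to
Sniffer sniffer = Sniffer.builder(restClient)
        .setSniffIntervalMillis(60000) // re-discover nodes every 60 seconds
        .build();

// Later, on application shutdown, close the sniffer before the client
sniffer.close();
restClient.close();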

Multinode couchbase cluster performance issue

We have a Couchbase cluster consisting of 3 nodes.
Two nodes have the data, index, query and search services enabled;
the third node runs the data service only.
When a "larger" dataset of ~400 entries is created, it takes up to 15 minutes until the documents can be fully queried.
The cluster is accessed through Spring-Data repositories and the Couchbase Java client shipped with Spring-Data-Couchbase only (see versions below).
Performing the same request in our staging environment, with a single-node cluster and the same GSI index, the data is available almost instantaneously compared to production. So my conclusion would be that there is an issue with the node sync or the caching in Spring-Data-Couchbase.
Is there a configuration I am missing that would speed up the node sync, or is anyone else facing the same problem?
Versions:
Couchbase Server 6.0.0 Community
Spring-Boot 2.2.4
Spring-Data-Couchbase 3.2.4
Couchbase Java Client 2.7.11
I suggest dedicating one node to the data service only and increasing the data service memory quota as much as you can:
node 1: data service only
node 2: data and index services
node 3: search and query services
If you are not using the search service, set up the nodes like this instead:
node 1: data service only
node 2: data service only
node 3: index and query services
