How to figure out the shard Id of an account or receipt using the RPC node API? - nearprotocol

I've looked through all the documentation but I cannot find a way to query in which shard a specific receipt is being executed. Is there no easy way of doing it?
There is one ugly way of doing it if the block of the receipt is known, but it requires iterating through every receipt in every chunk of that block until the receipt is found, which is not very efficient and requires at least 4 RPC calls (for now, with 4 shards).

The RPC indeed does not support that type of query. (It's an open-source project, you could open an issue for it and/or implement it: github.com/near/nearcore/blob/master/chain/jsonrpc.)
You might also find it useful to know the mapping from accounts to shards. For the simple nightshade layout, the split is defined in the source code as follows, using lexicographic sort order:
shard 0: account_id < "aurora"
shard 1: account_id >= "aurora" && account_id < "aurora-0"
shard 2: account_id >= "aurora-0" && account_id < "kkuuue2akv_1630967379.near"
shard 3: account_id >= "kkuuue2akv_1630967379.near"
As more phases of sharding are implemented, this can change. I think adding an RPC endpoint could be very valuable.
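Given those boundaries, here is a minimal sketch (my own illustration, not nearcore source) of how you could compute the shard id locally, using plain lexicographic string comparison:
'''
import java.util.List;

public class ShardLookup {
    // Boundary accounts from the simple nightshade layout listed above;
    // shard i holds accounts below boundaries[i] and at or above boundaries[i-1].
    private static final List<String> BOUNDARIES =
            List.of("aurora", "aurora-0", "kkuuue2akv_1630967379.near");

    static int shardOf(String accountId) {
        int shard = 0;
        for (String boundary : BOUNDARIES) {
            if (accountId.compareTo(boundary) < 0) break; // lexicographic order
            shard++;
        }
        return shard;
    }

    public static void main(String[] args) {
        System.out.println(shardOf("alice.near")); // 0
        System.out.println(shardOf("aurora"));     // 1 (>= "aurora")
        System.out.println(shardOf("zzz.near"));   // 3
    }
}
'''
Note this hardcodes the current boundaries, so it breaks as soon as the layout changes, which is exactly why an RPC endpoint would be valuable.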

Related

Why does elasticsearch return docs in a different order for the same query?

In Elasticsearch 7.9, I have an index with 1 shard and 1 replica. I use a simple datetime filter to get docs between a start time and an end time, but I often get the same result set in a different order. I do not want to use a Sort clause and compute scores; I just want to get results in the same order.
So is there any way to do this without using Sort?
It may be happening because you have 1 replica for your index, which might have some differences (or different values) for your timestamp field. You can use the preference param to make sure your search results are always returned from the same shard.
Refer to the "bouncing results" issue in the Elasticsearch blog for more info.
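A minimal sketch with the high-level REST client (index and field names are placeholders from the question's setup, not real names):
'''
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Pins the search to the same shard copies on every call, so repeated
// runs of the same filter come back in a stable order.
static SearchRequest stableSearch() {
    SearchRequest request = new SearchRequest("my-index"); // hypothetical index name
    request.source(new SearchSourceBuilder()
            .query(QueryBuilders.rangeQuery("timestamp")   // hypothetical field name
                    .gte("2021-09-01T00:00:00")
                    .lt("2021-09-02T00:00:00")));
    // Any stable custom string works, e.g. a session or user id:
    request.preference("session-42");
    return request;
}
'''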

elastic search preference setting custom value (Java API)

I really, really need some help with Elasticsearch usage through the Java API...
Let's assume I am using the Java API for ES.
So far, I understand that Elasticsearch can give inconsistent results due to inconsistency between primary and replica shards (deleting a doc marks it as deleted instead of removing it, which changes the overall stats).
So what I tried is
searchRequest.preference("_primary_first")
This gave me consistent results (since it only uses primary shards!).
Now what I want to try in my toy example is,
1) using preference=Custom (string) value
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-preference
2) if I have 5 nodes, I want to designate which node we want to use based on the queryText.
For instance,
'''
if (queryRequest.text().equals("red")) {
    // use 1st node
    searchRequest.preference("??????");
} else if (queryRequest.text().equals("blue")) {
    // use 2nd node
    searchRequest.preference("??????");
} else {
    // use either 3rd~5th node <- but this is not necessary if it is really hard..
    searchRequest.preference("???????");
}
'''
Q1)
I guess I need to use the custom preference setting "WISELY" to denote which node to use...
Can someone give me a simple Java API example?
Q2)
This is another one, but is there any way to load the status of each node from the searchResponse? (again, Java-API friendly)
Q3)
Is there any clever way to specify using the 1st node (or a certain node id??) for a given query text? (instead of using hashmap things...)
For instance,
Let's say I don't know which query texts I will receive, but I want to distribute them evenly across the nodes (among 5!),
while sticking with the first choice:
if the very first query text == "red" gets assigned to Node 1, then later queries with text == "red" should also use Node 1. Does someone have an idea?
Thank you guys!
Disclaimer:
I am a non-CS guy and an independent learner who tries new things to break out of my comfort zone! :) Please excuse this silly question!
Actually it's not a silly question and the answer has two parts.
You mention nodes and you want to control which node gets what queries based on an attribute.
Some context:
An elasticsearch cluster has elasticsearch nodes
Your documents will be "saved" in an elasticsearch index and the queries you perform will be against that index
An elasticsearch index is but an abstraction, a layer that hides the complexity of shards (basically lucene indices).
Now when you save a document, that document will eventually be stored in a shard (there are segments etc, but no reason to go any further). Now you can have primary shards and replica shards. When you save something, that will go to a primary shard and will be replicated by elasticsearch to the replica shards (if any). Your searches can and will be served both by primary and replica shards.
Now, you want to control which node gets what. Most of the time you won't need this. What you can control is which shard gets what, via routing on save and via routing on search. To then control which node gets what, you'll need to control which node gets which shard, which can be accomplished via shard allocation awareness.
Both of these topics are advanced features and you'll need to make sure to know what you are doing when trying to use them or you'll get very unexpected results.
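That said, for the "stick with the first choice" part of Q3, a custom preference value derived from the query text gets you determinism at the shard-copy level without any hashmap bookkeeping. A minimal sketch (my own, not the answerer's code; index name is hypothetical):
'''
import org.elasticsearch.action.search.SearchRequest;

// Any custom preference string (one not starting with "_") is hashed by
// Elasticsearch to pick shard copies deterministically, so the same query
// text always hits the same copies.
static SearchRequest routedByText(String queryText) {
    SearchRequest searchRequest = new SearchRequest("my-index"); // hypothetical index
    searchRequest.preference(queryText); // e.g. "red" -> same copies every time
    return searchRequest;
}
'''
This controls shard copies, not nodes; controlling nodes themselves would still require the shard allocation awareness mentioned above.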

Multiple instances of the same microservice should not fetch and update the same MongoDB documents

Problem Statement:
There is one scheduler in our microservice which will fetch 10 request documents (having status "A"), start processing them, and afterwards update each document's status.
We are planning to deploy three instances of the same microservice. Once all three instances start executing the scheduler, each instance will fetch 10 request documents and start processing them.
The same request documents should not be fetched and processed by multiple instances. How can I make sure that each instance of the microservice picks 10 different documents?
Each of your workers should choose a different set of documents.
For example, if you have an autoincrement numeric id for your documents:
Worker 1 can get all documents where ID % 3 = 0
Worker 2 can get all documents where ID % 3 = 1
Worker 3 can get all documents where ID % 3 = 2
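A minimal sketch of the modulo idea above with the MongoDB Java driver (database, collection, field, and status values are hypothetical stand-ins for the question's setup):
'''
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

// workerIndex is 0, 1, or 2 -- one per instance, e.g. from an env var.
static void processBatch(int workerIndex) {
    MongoCollection<Document> requests = MongoClients.create("mongodb://localhost")
            .getDatabase("app")          // hypothetical database name
            .getCollection("requests");  // hypothetical collection name

    requests.find(Filters.and(
                    Filters.eq("status", "A"),
                    Filters.mod("requestId", 3L, workerIndex))) // ID % 3 == workerIndex
            .limit(10)
            .forEach(doc -> {
                // ... process the request, then update its status so it is
                // not picked up by the next scheduler run
                requests.updateOne(Filters.eq("_id", doc.get("_id")),
                        new Document("$set", new Document("status", "D")));
            });
}
'''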
If an autoincrement ID is not available, you can take a look at consistent hashing, which is very nice to learn.
Hashing is simple:
Compute the hash of the document ID
Use this hash to get which worker should process this document
The problem with this approach is that, say, the range of your document ID hashes is between 1 and 1000 and you use the same modulo approach as above: you might end up with Worker 1 getting much more work than Worker 2.
With consistent hashing, you expand the range to get a more balanced distribution.
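For the curious, a minimal consistent-hashing sketch (my own illustration, not production code):
'''
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Workers are placed on a hash ring at several virtual points; a document
// id is hashed onto the ring and assigned to the next worker clockwise,
// which keeps the load roughly even across workers.
class ConsistentRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addWorker(String worker, int virtualNodes) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(worker + "#" + i), worker);
        }
    }

    String workerFor(String documentId) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(documentId));
        return e != null ? e.getValue() : ring.firstEntry().getValue(); // wrap around
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
'''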

How does elastic search bring back a node which is down

I was going through Elasticsearch and wanted to get consistent responses from an ES cluster.
I read Elasticsearch read and write consistency
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-index_.html
and some other posts, and I can conclude that ES returns success for a write operation after completing writes to all shards (primary + replicas), irrespective of the consistency param.
Let me know if my understanding is wrong.
I am wondering if anyone knows how Elasticsearch adds a node/shard back into a cluster after it was transiently down. Will it start serving read requests immediately after it is available, or does it ensure it has up-to-date data before serving read requests?
I looked for the answer to the above question, but could not find any.
Thanks
Gopal
If a node is removed from the cluster and joins again, Elasticsearch checks whether its data is up to date. If it is not, the shard will not be made available for search until it is brought up to date again (which could mean the whole shard gets copied again).
The consistency parameter is just an additional pre-index check that the expected number of shard copies is available in the cluster (if the index is configured to have 4 replicas, then the primary shard plus two replicas need to be available when set to quorum). However, this parameter never changes the behaviour that a write needs to be written to all available shards before returning to the client.
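To make the quorum example concrete, a small sketch of the arithmetic (my own illustration, not Elasticsearch source):
'''
// With 1 primary and N replicas there are 1 + N shard copies, and
// quorum = (1 + N) / 2 + 1 (integer division).
static int quorum(int numberOfReplicas) {
    return (1 + numberOfReplicas) / 2 + 1;
}
// quorum(4) == 3, i.e. the primary plus two replicas must be available
'''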

SolrCloud: workaround for classic pagination with "start,rows" parameters

I have SolrCloud with 3 shards.
My purpose: select and process all products from category.
Current implementation: Portion selection in cycle.
1st iteration: q=cat:1&start=0&rows=100
2nd iteration: q=cat:1&start=100&rows=100
3rd iteration: q=cat:1&start=200&rows=100
...
But as "start" grows, performance goes down. The explanation is here: https://wiki.apache.org/solr/DistributedSearch
Makes it more inefficient to use a high "start" parameter. For example, if you request start=500000&rows=25 on an index with 500,000+ docs per shard, this will currently result in 500,000 records getting sent over the network from the shard to the coordinating Solr instance. If you had a single-shard index, in contrast, only 25 records would ever get sent over the network. (Granted, setting start this high is not something many people need to do.)
Any ideas for how I can walk through all records in a category?
There is another way to do more efficient pagination in Solr - cursors - which uses the current place in the sort order instead. This is particularly useful for deep pagination.
See the section about cursors on the Pagination of Results wiki page. This should speed up delivery, as the server should be able to sort its local documents, decide where it is in that sequence, and return the 25 documents after that point.
UPDATE: also a useful link: coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets
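A minimal SolrJ sketch of cursor paging; the core URL and the "id" uniqueKey field are assumptions based on the question's setup:
'''
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

static void walkCategory() throws Exception {
    HttpSolrClient solr = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/products").build(); // hypothetical core
    SolrQuery q = new SolrQuery("cat:1");
    q.setRows(100);
    q.setSort("id", SolrQuery.ORDER.asc); // cursors require a sort on the uniqueKey

    String cursor = CursorMarkParams.CURSOR_MARK_START; // "*"
    while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = solr.query(q);
        rsp.getResults().forEach(doc -> { /* process each product */ });
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) break; // cursor stopped advancing: done
        cursor = next;
    }
}
'''
Unlike a growing "start", each request here only fetches the next 100 documents, so cost stays flat no matter how deep you page.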
I think the short answer is "no" - it's a limitation of how Solr does sharding. Instead, can you amass a list of document unique keys outside of Solr - presumably from a backing database - and then retrieve from the index using sets of those keys instead?
e.g. ID:(1 OR 2 OR 3 OR ...very long list...)
Or, if the unique keys are numeric you could use a moving range instead:
ID:[1 TO 1000] then ID:[1001 TO 2000] and so forth.
In both options above you'd also restrict by category. Both should avoid the slowdown associated with windowing, however.
