Does the Kademlia iterativeFindNode operation store found contacts in the k-buckets?

Following the Kademlia specification found at XLattice, I was wondering about the exact workings of the iterativeFindNode operation and how it is useful for bootstrapping and refreshing buckets. The document says:
At the end of this process, the node will have accumulated a set of k active contacts or (if the RPC was FIND_VALUE) may have found a data value. Either a set of triples or the value is returned to the caller. (§4.5, Node Lookup)
The found nodes will be returned to the caller, but the specification doesn't say what to do with these values once they are returned, especially in the context of refresh and bootstrap:
If no node lookups have been performed in any given bucket's range for tRefresh (an hour in basic Kademlia), the node selects a random number in that range and does a refresh, an iterativeFindNode using that number as key. (§4.6, Refresh)
A node joins the network as follows: [...] it does an iterativeFindNode for n [the node id] (§4.7, Join)
Is running the iterativeFindNode operation by itself enough to refresh the k-buckets of contacts, or does the specification omit that the results should be inserted into the contact buckets?
Note: the iterativeFindNode operation uses the underlying RPCs, and through them it can update the k-buckets as specified:
Whenever a node receives a communication from another, it updates the corresponding bucket. (§3.4.4, Updates)
However, only the recipient of the FIND_NODE RPC will be inserted into the k-buckets, and the response from that node (containing a list of k contacts) will be ignored.

However, only the recipient of the FIND_NODE RPC will be inserted into the k-buckets, and the response from that node (containing a list of k contacts) will be ignored.
I can't speak for XLattice, but having worked on a BitTorrent Kademlia implementation, this strikes me as strange.
Senders of incoming requests are not verified to be reachable nodes (NAT and firewall issues), while responses to outgoing RPC calls are a good indicator that a node is indeed reachable.
So incoming requests should only be added as tentative contacts that still have to be verified, while incoming responses are immediately useful for routing table maintenance.
But it's important to distinguish between the triples contained in a response and the response itself. The triples are unverified; the response itself, on the other hand, is good verification of that node's liveness.
Summary:
- Incoming requests: semi-useful for the routing table; reachability needs to be tested first.
- Incoming responses: immediately useful for the routing table.
- Triples inside responses: not useful by themselves, but you may end up visiting them as part of the lookup process, at which point they become responses.
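A minimal sketch of this policy, assuming nothing about XLattice's actual API (Contact, RoutingTable, and the method names are all illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical names, not the XLattice API.
class Contact {
    final byte[] nodeId;
    final String host;
    final int port;
    Contact(byte[] nodeId, String host, int port) {
        this.nodeId = nodeId;
        this.host = host;
        this.port = port;
    }
}

class RoutingTable {
    // Verified contacts, one deque per k-bucket, least-recently seen first.
    private final List<Deque<Contact>> buckets = new ArrayList<>();
    // Senders of incoming requests: reachability unknown (NAT/firewall),
    // so they are parked here until a PING confirms them.
    private final Deque<Contact> tentative = new ArrayDeque<>();

    // We received a response to an RPC we sent: the sender is provably
    // reachable, so update its k-bucket immediately (per §3.4.4).
    void onResponse(Contact sender) {
        updateBucket(sender);
    }

    // We received a request: only record the sender as tentative.
    void onRequest(Contact sender) {
        tentative.add(sender);
    }

    // Triples carried *inside* a response are left alone: they are
    // unverified. If the lookup later contacts one of them and it answers,
    // onResponse() fires for it and only then does it enter a bucket.

    private void updateBucket(Contact c) {
        // k-bucket update logic (move-to-tail, eviction checks) omitted.
    }
}
```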

Related

Question about implementing Raft's Client interaction

I'm currently learning MIT 6.824 (https://www.youtube.com/channel/UC_7WrbZTCODu1o_kfUMq88g) and trying to implement its labs. There is a paragraph in the Raft paper describing client semantics:
Our goal for Raft is to implement linearizable semantics (each operation appears to execute instantaneously, exactly once, at some point between its invocation and its response). However, as described so far Raft can execute a command multiple times: for example, if the leader crashes after committing the log entry but before responding to the client, the client will retry the command with a new leader, causing it to be executed a second time. The solution is for clients to assign unique serial numbers to every command. Then, the state machine tracks the latest serial number processed for each client, along with the associated response. If it receives a command whose serial number has already been executed, it responds immediately without re-executing the request.
Now I have passed MIT lab 3A, but I have responses map[string]string in my kvserver, which is a map from a client's request id to its response. The problem is that this map will keep growing if clients keep sending requests, which is problematic in a real project. How does Raft handle this in a real project? Also, MIT lab 3 says one client will execute one command at a time, so I could probably optimize by deleting the response to the client's previous request. But how does Raft handle this in a real project where the client's behavior is less constrained?
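For illustration, here is a minimal sketch of the paper's per-client table (in Java, although the lab itself is in Go; all names are illustrative). Because each client has at most one outstanding command, keeping only the latest serial number and response per client bounds the table at one entry per client:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Per-client dedup table as described in the Raft paper: keep only the
// latest serial number and its response for each client. A client retries
// the same command until it succeeds and only then moves on, so an older
// entry for that client can never be asked for again.
class DedupTable {
    private static final class Entry {
        final long serial;
        final String response;
        Entry(long serial, String response) {
            this.serial = serial;
            this.response = response;
        }
    }

    private final Map<String, Entry> latest = new ConcurrentHashMap<>();

    // Returns the cached response if this exact command was already
    // executed; null means the command should be applied to the state machine.
    String lookup(String clientId, long serial) {
        Entry e = latest.get(clientId);
        return (e != null && e.serial == serial) ? e.response : null;
    }

    // Record the outcome, overwriting the client's previous entry, so the
    // table stays O(number of clients) rather than O(number of requests).
    void record(String clientId, long serial, String response) {
        latest.put(clientId, new Entry(serial, response));
    }
}
```

Memory then grows with the number of distinct clients rather than the number of requests; production systems typically pair this with some form of client session or lease, so entries for idle clients can also be pruned.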

Corda State Events: Do events have an order?

A network consists of 3 nodes, where 1 node is read-only and participates in every transaction. A request can start from either of the other nodes, which creates a request state. It is received and processed by the other node, which creates a new response state. Both nodes only issue new states and do not consume states. Both of these state events are received by the read-only node. Would the state events received by the read-only Corda node arrive in a fixed order, or could they be processed in any order?
For example, can we say that the request originator's state event will be received/processed first and then the other node's? Or is it possible under high load that the other node's event is received/processed by the read-only node first and the originator's event arrives after it?
My experience with Corda is very minimal and I need to understand how events are received by the parties when one party acts as read-only and all remaining parties only issue new states.
In general, the order in which messages are received is not guaranteed. A node will process messages in the order it receives them, but it is not guaranteed that the messages arrive in the order they were sent.
If Node A receives messages from Node B and Node C, and Node B produces its message before Node C does, there is no guarantee that the message from Node B is processed first. Whichever message reaches Node A first gets processed first; the delay could have multiple causes, such as network latency.
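If the read-only node does need a deterministic order, one common mitigation is to carry the ordering information inside the states themselves and reorder on receipt. A sketch of that idea follows (plain Java; none of these names are Corda API, and the sequence field is an assumption about your state schema):

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative: each event carries the id of the request it belongs to
// plus a sequence number (request = 1, response = 2), so the observer can
// restore the logical order regardless of arrival order.
record Event(String requestId, int sequence, String payload) {}

class Reorderer {
    // Next sequence number expected per request.
    private final Map<String, Integer> expected = new HashMap<>();
    // Out-of-order events parked until their predecessor arrives.
    private final Map<String, PriorityQueue<Event>> parked = new HashMap<>();

    void onReceive(Event e) {
        int want = expected.getOrDefault(e.requestId(), 1);
        if (e.sequence() == want) {
            process(e);
            expected.put(e.requestId(), want + 1);
            drain(e.requestId());
        } else {
            parked.computeIfAbsent(e.requestId(),
                    k -> new PriorityQueue<>(Comparator.comparingInt(Event::sequence)))
                  .add(e);
        }
    }

    // Release any parked events that are now in order.
    private void drain(String requestId) {
        PriorityQueue<Event> q = parked.get(requestId);
        while (q != null && !q.isEmpty()
                && q.peek().sequence() == expected.get(requestId)) {
            Event e = q.poll();
            process(e);
            expected.put(requestId, e.sequence() + 1);
        }
    }

    private void process(Event e) { System.out.println("processing " + e); }
}
```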

Hazelcast Cache Key-Values availability on all nodes/members

I am using Hazelcast to cache data fetched from an API.
The structure of the API is something like this:
Controller -> Service -> DAOLayer -> DB
I am keeping @Cacheable at the service layer, where the getData(int elementID) method is present.
In my architecture there are two PaaS nodes (np3a, np4a). The API will be deployed on both of them and users will access it via a load balancer IP, which redirects them to either of the nodes.
So it might be possible that one hit from user X goes to np3a and another hit from the same user goes to np4a.
I want the response cached on np3a during the very first hit to also be available for the next hit served by np4a.
I have read about:
- ReplicatedMap: memory inefficient
- NearCache: for read-heavy workloads (read >> write)
I am not sure which approach to take, or whether you would suggest something entirely different.
If you have 2 nodes, Hazelcast will partition data so that half of it will be on node 1, and the other half on node 2. It means there's a 50% chance a user will ask the node containing the data.
If you want to avoid, in all cases, an additional network request to fetch data that is not present on a node, the only way is to copy the data to every node each time. That's the goal of ReplicatedMap, and that's the trade-off: performance vs. memory consumption.
NearCache adds an additional cache on the "client-side", if you're using client-server architecture (as opposed to embedded).
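For illustration, a minimal embedded-member sketch of the ReplicatedMap option (Hazelcast 4.x/5.x import paths; the map name, key type, and fetchFromDb are illustrative):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.replicatedmap.ReplicatedMap;

// Run this on both np3a and np4a. Members with the same cluster config
// discover each other, and a ReplicatedMap keeps a full copy on every
// member, so whichever node the load balancer picks can serve the cached
// response without a network hop.
public class ApiCache {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        ReplicatedMap<Integer, String> cache = hz.getReplicatedMap("api-cache");

        int elementId = 42;
        String response = cache.get(elementId);
        if (response == null) {
            response = fetchFromDb(elementId);   // the DAO-layer call
            cache.put(elementId, response);      // replicated to all members
        }
        System.out.println(response);
    }

    // Stand-in for the real DAO-layer lookup.
    private static String fetchFromDb(int elementId) {
        return "data-" + elementId;
    }
}
```

With read-heavy traffic and a bounded data set, the memory overhead of full replication is often acceptable; otherwise a NearCache on top of a partitioned IMap is the usual alternative.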

Rendezvous hashing algorithm: new node causing issues

I have three cache servers and use HRW (highest random weight) hashing.
When a client sends a request:
A server is chosen based on the highest weight (a hash of the request and the server).
If the result is not found on that server, the request is forwarded to the back-end server; the result is fetched, stored in that cache, and forwarded to the client. (A similar request in the future will be served from the cache.)
The issue: suppose a result for request R1 is stored on Server 2. Now say 2 new servers come up. If we send R1 again, the weights are computed just as before, and if the weight of one of the new servers comes out higher than the previous values, the lookup goes to a server that doesn't have the result.
How should I respond to this issue?
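To make the failure mode concrete, here is a minimal HRW sketch (the hash choice and server names are illustrative): each server's weight is a hash of (request, server), and adding servers can produce a new maximum for an existing key, which is exactly what happens to R1 above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Highest-random-weight (rendezvous) selection: hash (requestKey, server)
// for every server and pick the maximum.
public class Hrw {
    static long weight(String requestKey, String server) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] h = md.digest((requestKey + "|" + server)
                    .getBytes(StandardCharsets.UTF_8));
            long w = 0;
            for (int i = 0; i < 8; i++) {
                w = (w << 8) | (h[i] & 0xffL);  // first 8 digest bytes as a long
            }
            return w;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String pick(String requestKey, List<String> servers) {
        String best = null;
        long bestWeight = Long.MIN_VALUE;
        for (String s : servers) {
            long w = weight(requestKey, s);
            if (w > bestWeight) {
                bestWeight = w;
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> three = List.of("s1", "s2", "s3");
        List<String> five  = List.of("s1", "s2", "s3", "s4", "s5");
        // R1 may map to a different server once s4/s5 join.
        System.out.println(pick("R1", three) + " -> " + pick("R1", five));
    }
}
```

Note that only the keys whose maximum now lands on a new server move (in expectation 2 out of every 5 keys when going from 3 to 5 servers). A common way to handle such a move is to treat it as an ordinary cache miss: refetch from the back-end into the new winner and let the stale copy on the old server expire.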

Handling processing overhead due to request timeout

Consider a service running on a server for a customer c1. Customer c1 times out after 'S' seconds, for whatever reason, and fires the same request again, so the server runs a duplicate query and gets overloaded. How can this glitch be resolved?
I assume you are on the server side and hence cannot control multiple requests coming in from the same client.
Every client should have an IP address associated with them. In your load balancer (if you have one) or in your server you need to keep an in-memory cache which keeps track of all requests: their IP addresses, the timestamp when the request originated, and the timestamp when request processing finished. Next you define an appropriate time measure, which should be around the 70-80th percentile of processing time for all your requests. Let's say X seconds.
Now, before you accept any request at your load balancer/server, check this in-memory cache to see whether the same IP has sent the same request and whether the time elapsed since the last request is less than X. If so, do not accept this request; instead send a custom error stating something like "previous request still under processing. Please try again after some time".
In case an IP address is not enough to identify a client (the same client may be sending requests to different endpoints on your server for different services), you need to store another identifier, such as a token/session identifier - c1 or the customer id in your case. Ideally, a customer can send only 1 request from 1 IP address to an endpoint at any one point in time. If you also have mobile and web interfaces, you can add the channel type (web/mobile/tablet) to the list of identifying parameters.
So now a combination of customer id (c1), IP address, request URL, request time, and channel type will always be unique for an incoming request. Using a key of all these parameters in your cache to look up the state of a request, and validating whether to start processing it or to send the custom error message instead, prevents overloading the server with re-requests and should solve the problem defined above.
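A minimal in-memory sketch of that gate (Java; the class name is illustrative, and a single-node map stands in for whatever shared cache the load balancer would actually use):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Admits a request only if no identical request (same customer id, IP,
// URL, and channel) is already in flight within the last X seconds.
class DuplicateRequestGate {
    private final long windowMillis;                       // "X seconds"
    private final Map<String, Long> inFlight = new ConcurrentHashMap<>();

    DuplicateRequestGate(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    private static String key(String customerId, String ip, String url, String channel) {
        return customerId + "|" + ip + "|" + url + "|" + channel;
    }

    // Returns true if the request may proceed; false means "previous
    // request still under processing", so reply with the custom error.
    // (A sketch: a production version would make the check-then-put atomic.)
    boolean tryAccept(String customerId, String ip, String url, String channel) {
        String k = key(customerId, ip, url, channel);
        long now = System.currentTimeMillis();
        Long previous = inFlight.get(k);
        if (previous != null && now - previous < windowMillis) {
            return false;
        }
        inFlight.put(k, now);
        return true;
    }

    // Call when processing finishes so a genuine follow-up is allowed through.
    void markDone(String customerId, String ip, String url, String channel) {
        inFlight.remove(key(customerId, ip, url, channel));
    }
}
```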
Note: 'S' seconds, i.e. the client-side timeout, is not in our control, so it should not concern the server side and has no bearing on the design detailed above.
