How does Consul KV consistency work with regard to updating the same key?

If given a consul KV key a/key, where there are multiple agent server instances running, what happens if:
Two requests A (set value to val-a) and B (set value to val-b) are made to the create key endpoint without making use of the parameters cas or acquire in order to update the same key a/key:
If A and B are made in parallel can the key's value become corrupted?
Or if A comes slightly before B can the final value still become val-a?

The data will not be corrupted if Consul receives two write requests at the same time. The write requests would be processed serially by the leader, so the value of a/key would become either val-a or val-b, whichever is processed last.
You can find details on how Consul writes data in Consul's Consensus Protocol documentation.
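As a rough illustration of why the value cannot be corrupted, the toy model below mimics a leader that applies writes one at a time. It is not Consul code: ToyLeader, put, get, and the index counter are names invented for this sketch. The optional cas argument only imitates the idea behind the cas parameter mentioned in the question, where a write is rejected if the key has changed since the given index.

```python
import threading


class ToyLeader:
    """Toy stand-in for a leader that applies KV writes one at a time."""

    def __init__(self):
        self._lock = threading.Lock()   # serializes writes, like a single log
        self._index = 0                 # grows by one per applied write
        self._data = {}                 # key -> (value, modify_index)

    def put(self, key, value, cas=None):
        """Apply a write; with cas set, succeed only if the key is unchanged."""
        with self._lock:
            _, current_index = self._data.get(key, (None, 0))
            if cas is not None and cas != current_index:
                return False            # someone else wrote first
            self._index += 1
            self._data[key] = (value, self._index)
            return True

    def get(self, key):
        """Return (value, modify_index), or (None, 0) if the key is absent."""
        with self._lock:
            return self._data.get(key, (None, 0))


leader = ToyLeader()
threads = [
    threading.Thread(target=leader.put, args=("a/key", v))
    for v in ("val-a", "val-b")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The stored value is one of the two inputs in full, never a mixture.
value, index = leader.get("a/key")
```

Without cas, whichever write the lock admits last wins, which matches the answer above. With cas, a writer holding a stale index gets False back and can decide whether to re-read and retry.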


Is this Redis Race Condition Scenario Possible?

I'm debugging an issue in an application and I'm running into a scenario where I'm out of ideas, but I suspect a race condition might be in play.
Essentially, I have two API routes - let's call them A and B. Route A generates some data and Route B is used to poll for that data.
Route A first creates an entry in the redis cache under a given key, then starts a background process to generate some data. The route immediately returns a polling ID to the caller, while the background data thread continues to run. When the background data is fully generated, we write it to the cache using the same cache key. Essentially, an overwrite.
Route B is a polling route. We simply query the cache using that same cache key - we expect one of 3 scenarios in this case:
The object is in the cache but contains no data - this indicates that the data is still being generated by the background thread and isn't ready yet.
The object is in the cache and contains data - this means that the process has finished and we can return the result.
The object is not in the cache - we assume that this means you are trying to poll for an ID that never existed in the first place.
For the most part, this works as intended. However, every now and then we see scenario 3 being hit, where an error is being thrown because the object wasn't in the cache. Because we add the placeholder object to the cache before the creation route ever returns, we should be able to safely assume this scenario is impossible. But that's clearly not the case.
Is it possible that there is some delay between when a Redis write operation returns and when the data is actually available for querying? That is, is it possible that the call to add the cache entry has completed but the data would briefly not be returned by queries? It seems to be the only thing that can explain the behavior we are seeing.
If that is a possibility, how can I avoid this scenario? Is there some way to force Redis to wait until the data is available for query before returning?
Is it possible that there is some delay between when a Redis write operation returns and when the data is actually available for querying?
Yes, and it may depend on your Redis topology and on your network configuration. Only standalone Redis servers provide strong consistency, albeit with some considerations - see below.
Redis replication
While using replication in Redis, the writes which happen in a master need some time to propagate to its replica(s) and the whole process is asynchronous. Your client may happen to issue read-only commands to replicas, a common approach used to distribute the load among the available nodes of your topology. If that is the case, you may want to lower the chance of an inconsistent read by:
directing your read queries to the master node; and/or,
issuing a WAIT command right after the write operation to ensure the replicas have acknowledged it. This makes replication effectively synchronous from the client's standpoint, but it should be used only if absolutely needed because of its performance cost.
There would still be the (tiny) possibility of an inconsistent read if, during a failover, the replication process promotes a replica which did not receive the write operation.
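A minimal sketch of the write-then-WAIT approach follows. It assumes a client object exposing set() and wait() in the style of redis-py's Redis class; set_and_wait and FakeClient are hypothetical names, and FakeClient exists only so the sketch runs without a server.

```python
def set_and_wait(client, key, value, replicas=1, timeout_ms=100):
    """Write a key, then block until `replicas` replicas acknowledge it.

    `client` is assumed to expose set() and wait() like redis-py's Redis class.
    Returns True if enough replicas acknowledged within the timeout.
    """
    client.set(key, value)
    acked = client.wait(replicas, timeout_ms)   # Redis: WAIT numreplicas timeout
    return acked >= replicas


class FakeClient:
    """Hypothetical in-memory stand-in; `acks` is what wait() will report."""

    def __init__(self, acks):
        self.store = {}
        self.acks = acks

    def set(self, key, value):
        self.store[key] = value
        return True

    def wait(self, num_replicas, timeout):
        return self.acks


ok = set_and_wait(FakeClient(acks=1), "job:42", "pending")
timed_out = set_and_wait(FakeClient(acks=0), "job:42", "pending")
```

When the function returns False the write has still been applied on the master; the caller only learns that not enough replicas confirmed it in time, so reads routed to a replica may still miss the key.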
Standalone Redis server
With a standalone Redis server, there is no need to synchronize data with replicas and, on top of that, your read-only commands would always be handled by the same server which processed the write commands. This is the only strongly consistent option, provided you are also persisting your data accordingly: otherwise, a server restart between your write and read operations could lose the write.
Persistence
Redis supports several different persistence options; in your scenario, you may want to configure your server so that it
logs every write operation to disk (AOF, appendonly yes) and
fsyncs on every write (appendfsync always).
Of course, every configuration setting is a trade off between performance and durability.

Role of off-chain workers

I'm trying to build a mental model of the role of off-chain workers in substrate. The bigger picture seems to be that they move logic inside the substrate node, that was otherwise done by oracles, triggering on predefined transactions. There are two use cases I was thinking of specifically:
1: Validating file formats: an incoming transaction proposes a file accessible via URL or IPFS hash, and its format needs to be validated. An off-chain worker fetches the file, asserts the format (size, encoding, content, whatever) and, if correct, submits another transaction saying it's valid.
2: Key generation: let's assume there is a separate service distributed with the substrate node, which manages keys for each instance. Node A runs a key sharing algorithm (like Shamir's secret sharing) via this external service between participants A, B and C, then makes a transaction creating a group (A,B,C) on-chain. This transaction triggers all nodes that are in this group to run off-chain workers, call into their local key store verifying having the key. They can all mark it on-chain afterwards.
As far as I understand it, off-chain workers are triggered in every node after block execution. In the former use case, this would result in lots of transactions validating just one file, and nothing guarantees the correctness of these. What is a good way of reaching consensus on the validity of the file? Is it also possible without economic incentives like staking? It would be problematic with tokens having no value in the network, e.g. in enterprise settings. Is this even the right use case for off-chain workers? The second example should not suffer from such an issue; we just need all parties to verify having the key.
Where does the thought process above go wrong, and why?
As far as I understand it correctly, off-chain workers are triggered in every node after block execution.
Yes and no. There is a CLI flag for it. And at the time of this writing it says:
--offchain-worker <ENABLED>
Should execute offchain workers on every block.
By default it's only enabled for nodes that are authoring new blocks. [default: WhenValidating] [possible
values: Always, Never, WhenValidating]
In the former use case, this would result in lots of transactions validating just one file, and nothing guarantees the correctness of these.
I think it is the responsibility of the receiving function (aka. Call) to handle and incentivise this. For example, there could be a reward opportunity to validate an address. But, if it has already been submitted by another transaction, you will get slashed (or even if not, you do pay some transaction fee, for nothing). In such cases, you can assume that not all participants will submit a transaction. They will only do it when there is a chance of improvement, which should be depicted by your potential reward/slash scheme.
Is this even the right use case for off-chain workers?
I am no expert here, but I think at least the validation example is a good example. It is just a matter of finding a good incentive + anti-spam slashing.
I am less familiar with the second example, so no comments on that.

Detecting and recovering failed H2 cluster nodes

After going through the H2 developer guide I still don't understand how I can find out which cluster node(s) failed and which database needs to be recovered in the event of a temporary network failure.
Let's consider the following scenario:
H2 cluster started with N active nodes (is it actually true that H2 can support N>2, i.e. more than 2 cluster nodes?)
(lots of DB updates, reads...)
Network connection with one (or several) cluster nodes goes down and the node becomes invisible to the rest of the cluster
(lots of DB updates, reads...)
Network link with previously disconnected node(s) restored
It is discovered that cluster node was probably missing (as far as I can see SELECT VALUE FROM INFORMATION_SCHEMA.SETTINGS WHERE NAME='CLUSTER' starts responding with empty string if one node in cluster fails)
After this point it is unclear how to find out what nodes were failing?
Obviously, I can do some basic check like comparing DB size, but it is unreliable.
What is the recommended procedure to find out what node was missing in the cluster, esp. if query above responds with empty string?
Another question: why doesn't urlTarget support multiple parameters?
How am I supposed to use the CreateCluster tool if multiple nodes in the cluster failed and I want to recover more than one?
Also I don't understand how CreateCluster works if I had to stop the cluster and I don't want to actually recover any nodes? What's not clear to me is what I need to pass to CreateCluster tool if I don't actually need to copy database.
That is partially right: SELECT VALUE FROM INFORMATION_SCHEMA.SETTINGS WHERE NAME='CLUSTER' will return an empty string when queried in standard mode.
However, you can get the list of servers by using Connection.getClientInfo() as well, but it is a two-step process. Paraphrased from h2database.com:
The list of properties returned by getClientInfo() includes a numServers property that returns the number of servers that are in the connection list. getClientInfo() also has properties server0..serverN, where N is the number of servers - 1. So to get the 2nd server from the list you use getClientInfo('server1').
Note: The serverX property only returns IP addresses and ports, not hostnames.
And before you say simple replication, yes that is default operation, but you can do more advanced things that are outside the scope of your question in clustered H2.
Here's the quote for what you're talking about:
Clustering can only be used in the server mode (the embedded mode does not support clustering). The cluster can be re-created using the CreateCluster tool without stopping the remaining server. Applications that are still connected are automatically disconnected, however when appending ;AUTO_RECONNECT=TRUE, they will recover from that.
So yes, if the cluster stops, auto_reconnect is not enabled, and you stick with the basic query, you are stuck and it is difficult to find information. While most people will tell you to look through the API and/or manual, they haven't had to look through this one, so my sympathies.
I find it way more useful to track through the error codes, because you get a real good idea of what you can do when you see how the failure is planned for ... here you go.

Queueing mechanism and Elasticsearch 1.4.0

I have a RabbitMQ broker, on which I post different messages that will end up as documents in Elasticsearch. There are multiple consumers from the broker, which are actually different threads in a task executor assigned to an amqp inbound gateway (using spring integration and spring amqp here).
Consider the following scenario: I have created a doc in ES with the structure
{
"field1" : "value1",
"field2" : "value2"
}
Afterwards I send two update requests, both updating the same field, let's say field1. If I send these messages one right after another (a common use case in production), my consumer threads will fetch the messages in the right order (AMQP allows this), but the processing could happen in the wrong order and the later updated value could be overwritten by the first one. I will end up having wrong data.
How can I make sure my data won't get corrupted? Having a single consumer thread is not enough, because if I want to scale out by adding more machines with my consuming app, I will still end up having multiple consumers. I might need ordering of messages, but with multiple machines I will probably need to create some sort of cluster-aware component. I am using SI, so this seems really hard to do in my opinion.
In pre-1.2 versions of ES, we used an external version, like a timestamp, and ES would have thrown VersionConflictException in my scenario: the first update would have had version 10000, let's say, and the second 10001. If the second had been processed first, ES would reject the request with version 10000 as it's lower than the existing one. But in the latest versions, the ES developers have removed this functionality for update operations.
One solution might be to use multiple queues and have a single consumer on each queue; use a hash function to always route updates to the same document to the same queue. See the RabbitMQ Tutorials for the various options.
You can scale out by adding more queues (and changing your hash function).
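The routing idea can be sketched as below. The function name queue_for, the "es.updates" prefix, and the choice of SHA-256 are assumptions made for illustration; any hash that is stable across processes and machines would serve. Python's built-in hash() is avoided because it is randomized per process for strings.

```python
import hashlib


def queue_for(doc_id, queue_count, prefix="es.updates"):
    """Map a document ID to one of `queue_count` queue names, deterministically."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % queue_count
    return f"{prefix}.{bucket}"


# Every update for the same document lands on the same queue, so the single
# consumer on that queue sees them in publish order.
first = queue_for("doc-123", 4)
second = queue_for("doc-123", 4)
```

Changing queue_count remaps most document IDs to different queues, which is why scaling out while messages are still in flight needs care.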
For resiliency, consider running your consumers in Spring XD. You can have a single instance of each rabbit source (for each queue) and XD will take care of failing it over to another container node if it goes down.
Otherwise you could roll your own by having a warm standby - inbound adapters configured with auto-startup="false" and have something monitor and use a <control-bus/> to start a new instance if the active one goes down.
EDIT:
In response to the fourth comment below.
As I said above, to scale out, you would have to change the hash function. So adding consumers automatically while running would be tricky.
You don't have to hard-code the queue names in the jar, you can use a property placeholder and fill it from properties, system properties, or an environment variable.
This solution is the simplest but does have these limitations.
You could, however, build a management app that could scale it out - stop the producer, wait for all queues to quiesce, reconfigure the consumers and restart the producer - Spring Integration provides a <control-bus/> to start/stop adapters; you can also do it via JMX.
Alternative solutions are possible but will generally require maintaining some shared state across a cluster (perhaps using zookeeper etc), so are much more complex; and you still have to deal with race conditions (where the second update might arrive at some consumer before the first).
You can use the default mechanism for consistency checks. Basically you want to verify that you have the latest version of whatever you are updating.
So for that you need to fetch the _version with the object. In queries you can do this by setting version=true on the toplevel. That will cause the _version to be returned along with your query results. Then when doing an update, you simply set the version parameter in the url to the value you have and it will generate a version conflict if it doesn't match.
Nicer is to handle updates using closures. Basically this works as follows: have an update method that fetches the object by id, applies a closure (parameter to the update function) that encapsulate the modifications you want to make, and then stores modified object. If you trap the still possible version conflict, you can simply get the object again and re-apply the closure to the object. We do this and added a random sleep before the retry as well, this vastly reduces the chance of multiple updates failing and is a nice design pattern. Keeping the read and write together minimizes the chance of a conflict and then retrying with a sleep before that minimizes it further. You could add multiple retries to further reduce the risk.
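The closure-based update with retry might look like the sketch below. ToyStore is a hypothetical in-memory store standing in for Elasticsearch, and VersionConflict, update, and the retry parameters are illustrative names, not part of any client library.

```python
import random
import time


class VersionConflict(Exception):
    """Raised when the stored version differs from the one supplied."""


class ToyStore:
    """Hypothetical in-memory document store with optimistic versioning."""

    def __init__(self):
        self._docs = {}   # doc_id -> (version, document)

    def get(self, doc_id):
        version, doc = self._docs[doc_id]
        return version, dict(doc)

    def put(self, doc_id, doc, expected_version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current:
            raise VersionConflict(doc_id)
        self._docs[doc_id] = (current + 1, dict(doc))


def update(store, doc_id, modify, retries=3, max_sleep=0.05):
    """Fetch, apply `modify`, and store; on conflict, sleep briefly and retry."""
    for attempt in range(retries + 1):
        version, doc = store.get(doc_id)
        modify(doc)                       # the closure carrying the change
        try:
            store.put(doc_id, doc, expected_version=version)
            return doc
        except VersionConflict:
            if attempt == retries:
                raise                     # give up after the last retry
            time.sleep(random.uniform(0, max_sleep))


store = ToyStore()
store.put("1", {"field1": "value1", "field2": "value2"})
result = update(store, "1", lambda d: d.update(field1="new"))
```

Because the closure is re-applied to a freshly fetched document on each retry, it should describe the change to make rather than capture a stale copy of the document.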

Redis namespacing basics

I am really new to Redis and have been using it along with my Ruby on Rails (Rails 2.3 and Ruby 1.8.7) application using the redis gem for simple tagging functionality as a key value store. I recently realized that I could use it to maintain a user activity feed as well.
The thing is, I need the tagging data (stored as key => Sets) in memory, and it's extremely important for determining results of tagging-related operations, whereas for the activity feed the data could be deleted on a first-in, first-out basis, assuming I store X activities for every user.
Is it possible to namespace the Redis data sets so that one remains permanently in memory and the other stays in memory only temporarily? What is the general approach when one uses unrelated data sets that need to have different durations of survival in memory?
Would really appreciate any help on this.
You do not need to define a specific namespace for this. With Redis, you can use the EXPIRE command to set a timeout on a key by key basis.
The general policy regarding key expiration is defined in the configuration file:
# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached? You can select among five behavior:
#
# volatile-lru -> remove the key with an expire set using an LRU algorithm
# allkeys-lru -> remove any key accordingly to the LRU algorithm
# volatile-random -> remove a random key with an expire set
# allkeys-random -> remove a random key, any key
# volatile-ttl -> remove the key with the nearest expire time (minor TTL)
# noeviction -> don't expire at all, just return an error on write operations
#
For your purpose, the volatile-lru policy should be set.
You just have to call EXPIRE on the keys you want to be volatile, and let Redis evict them. However please note it is difficult to guarantee that the oldest keys will be evicted first once the timeout has been triggered. More explanations here.
For your specific use case however, I would not use key expiration but rather try to simulate capped collections. If the activity feed for a given user is represented as a list of objects, it is easy to LPUSH the activity objects, and use LTRIM to limit the size of the list. You get FIFO behavior and keep memory consumption under control for free.
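A small sketch of the capped-list pattern follows, written in Python rather than Ruby so that it can run on its own. It assumes a client exposing lpush() and ltrim() as redis-py does; push_activity, the "feed:" key prefix, and FakeRedis are hypothetical, and FakeRedis is only there so the sketch runs without a server.

```python
def push_activity(client, user_id, activity, max_items=100):
    """Prepend an activity to the user's feed and keep only the newest items."""
    key = f"feed:{user_id}"
    client.lpush(key, activity)            # newest entry goes to the head
    client.ltrim(key, 0, max_items - 1)    # drop everything past the cap
    return key


class FakeRedis:
    """Hypothetical in-memory stand-in covering only the two calls used here."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, *values):
        items = self.lists.setdefault(key, [])
        for value in values:
            items.insert(0, value)
        return len(items)

    def ltrim(self, key, start, end):
        # Only non-negative indexes are handled, which is all the sketch needs.
        self.lists[key] = self.lists.get(key, [])[start:end + 1]
        return True


fake = FakeRedis()
for n in range(5):
    push_activity(fake, 7, f"event-{n}", max_items=3)
recent = fake.lists["feed:7"]
```

The oldest entries fall off the tail as new ones arrive, so no expiration settings are involved and the tagging keys are left alone.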
UPDATE:
Now, if you really need to isolate data, you have two main possibilities with Redis:
using two distinct databases. Redis databases are identified by an integer, and you can have several of them per instance. Use the SELECT command to switch between databases. Databases can be used to isolate data, but not to assign them different properties (like an expiration policy for instance).
using two distinct instances. An empty Redis instance is a very light process. So several of them can be started without any problem. It is actually the best and the more scalable way to isolate data with Redis. Each instance can have its own policies (including eviction policy). The clients should open as many connections as instances.
But again, you do not need to isolate data to implement your eviction policy requirements.
