Maintain a growing list in offchain storage or use StorageValueRef::mutate in extrinsics - substrate

I was trying to have an offchain storage that stores a collection of data (likely a Vec),
and I was planning to keep this vector growing.
One smooth-seeming approach was to use the StorageValueRef::mutate() function, but only later did I find out that we can't use it in an extrinsic (or maybe we can and I'm just not aware of it).
Another simple approach is to use the BlockNumber to create a storage key, and use the BlockNumber from the offchain worker to reference that value.
But in my case there will be multiple pieces of data coming in within a single block, so being restricted to storing only one value per block doesn't fit the requirements either.

You could create a map like this:
#[pallet::storage]
pub type MyData<T: Config> =
    StorageMap<_, Twox64Concat, T::BlockNumber, Vec<MyDataValue>>;
Then you can do MyData::<T>::append(block_number, data) in your pallet as often as you want.
But I would propose that you introduce some "pruning" window, let's say 10, and only keep the data of the latest 10 blocks in state. For that you can just call MyData::<T>::remove(block_number - 10) in your on_initialize.
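The pruning idea can be sketched in plain Python, with a dict standing in for the pallet's StorageMap (the window size and names are illustrative, not part of the original answer):

```python
PRUNE_WINDOW = 10

storage = {}  # block_number -> list of values, like StorageMap<BlockNumber, Vec<_>>

def append(block_number, value):
    # equivalent of MyData::<T>::append(block_number, data)
    storage.setdefault(block_number, []).append(value)

def on_initialize(block_number):
    # drop the entry that just fell out of the window
    storage.pop(block_number - PRUNE_WINDOW, None)

# simulate 24 blocks, each storing one value
for n in range(1, 25):
    on_initialize(n)
    append(n, f"data-{n}")
```

After the loop only the latest 10 blocks (15 through 24) remain in `storage`, which is exactly the bounded-state behavior the pruning window buys you.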
But if it is really just about data that you want to set from the runtime for the offchain worker, you could use sp_io::offchain_index::set("key", "data");. This is a more low-level interface. Here you could also prefix the key with the block number to make it unique per block, but you will need to come up with your own custom way of storing multiple values per block.
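One hypothetical "custom way" of storing multiple values per block, sketched in Python: combine the block number with a per-block counter when building keys, and store the counter itself under a well-known key so the reader can enumerate the values. The key layout here is an assumption, not anything Substrate prescribes.

```python
store = {}     # stands in for the offchain index key/value store
counters = {}  # per-block value counter

def offchain_index_set(block_number, data):
    # write the value under a key unique to (block, index)
    i = counters.get(block_number, 0)
    store[f"my-pallet/{block_number}/{i}".encode()] = data
    counters[block_number] = i + 1
    # record how many values this block has, so the reader can enumerate them
    store[f"my-pallet/{block_number}/len".encode()] = counters[block_number]

def offchain_read_all(block_number):
    n = store.get(f"my-pallet/{block_number}/len".encode(), 0)
    return [store[f"my-pallet/{block_number}/{i}".encode()] for i in range(n)]

offchain_index_set(7, b"a")
offchain_index_set(7, b"b")
```

The same scheme translates directly to sp_io::offchain_index::set with SCALE-encoded keys.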

Related

What is Redis ValueOperations?

What are Redis ValueOperations in Spring Boot?
Is it that we can directly store a key-value pair in the Redis database, without creating an entity and so on, just by using RedisTemplate<String, Object>?
Also, if we use ValueOperations, how will it impact performance?
When using Redis, you should think about what data format/datatype suits your needs best, just as you would when coding in any general-purpose programming language. All those operations (ValueOperations, ListOperations, SetOperations, HashOperations, StreamOperations) are the support provided for interacting with the corresponding data types, and they are provided by the RedisTemplate.
When you are using ValueOperations, you are more or less treating your whole Redis instance as a giant hash map. For example, you can store entries in Redis like current_user = "John Doe". However, you can also do something silly such as keeping a string representation of a huge hash map against a key: top_users = <huge_string_representing_a_hash_map>. Thinking about that second case, what if you want to get the value for one key inside that hash map? The task becomes more or less impossible without transferring the whole hash map into RAM. Yet if you had used Redis Hashes and HashOperations, that would have been a trivial task.
Going back to your question: storing a simple object using ValueOperations won't degrade performance. In contrast, if you are moving huge maps around, you'll use a lot of your network bandwidth and RAM capacity.
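The contrast can be sketched with an in-memory stand-in for Redis (plain Python, no real Redis calls; the key names are illustrative):

```python
import json

kv = {}      # plain string values, like ValueOperations / SET-GET
hashes = {}  # hashes, like HashOperations / HSET-HGET

top_users = {f"user{i}": i for i in range(1000)}

# Layout 1: the whole map serialized under one key. Reading a single field
# forces deserializing (and, over a network, transferring) the entire blob.
kv["top_users"] = json.dumps(top_users)
one_score = json.loads(kv["top_users"])["user42"]

# Layout 2: a Redis hash. A single field can be fetched on its own (HGET),
# without touching the other 999 entries.
hashes["top_users"] = dict(top_users)
same_score = hashes["top_users"]["user42"]
```

Both layouts return the same answer; the difference is how much data has to move to produce it.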
In summary, choose your Redis data types carefully to suit your needs.
https://redis.io/topics/data-types

Storage access at previous blocks, i.e. storage::get(&key, &block)

How can my pallet access a substrate chain's storage at a previous block?
For example: storage_name::get(&key, &block_number);
If it's possible, is there documentation?
If it's not possible, can we request this feature?
It is not possible to query the storage of older blocks from within the runtime, nor would it be a feature that really makes sense to include as you describe it.
Each block should only rely on the data available in that block, else you start to make larger assumptions about the clients you are working with and what data is actually available to them.
The solution here is simple: just store any data you need in your own storage item that persists from block to block. We do this for a number of storage items where we need information from previous blocks, like the validator and nominator information in the staking pallet.
When you don't need that data anymore, you can clean it up.
Here is an example: https://github.com/paritytech/substrate/blob/master/frame/staking/src/lib.rs#L969

How do I store module-level write-once state?

I have some module-level state I want to write once and then never modify.
Specifically, I have a set of strings I want to use for lookups later. What is an efficient and ordinary way of doing this?
I could make a function that always returns the same set:
my_set() -> sets:from_list(["a", "b", "c"]).
Would the VM optimize this, or would the code for constructing the set be re-run every time? I suspect the set would just get GCd.
In that case, should I cache the set in the process dictionary, keyed on something unique like the module md5?
Key = proplists:get_value(md5, module_info()), put(Key, my_set())
Another solution would be to make the caller call an init function to get back an opaque chunk of state, then pass that state into each function in the module.
A compile-time constant, like your example list ["a", "b", "c"], will be stored in a constant pool on the side when the module is loaded, and not rebuilt each time you run the expression. (In the old days, the list would have been reconstructed from its elements for each new call.) This goes for all constants no matter how complicated (like lists of lists of tuples). But when you call out to a function like sets:from_list/1, the compiler cannot assume anything about the representation used by the sets module, and the set will be constructed dynamically from that constant list.
While an ETS table would work, it is less efficient for larger constants (like, say, a set or map containing many entries), because an ETS table has the same memory model as a process - data is written and read by copying, as if by sending messages. If the constants are small, the difference between copying them and recreating them locally would be negligible, and if the constants are large, you waste time copying them.
What you want instead is a fairly new feature called Persistent Term Storage: https://erlang.org/doc/man/persistent_term.html (since Erlang/OTP 21). It is similar to the way compile time constants are handled, so there is no copying when looking up a value. (The key could be the name of your module.) Persistent Term is intended as pretty much a write-once-read-many storage - you can update the stored entry, but that's a more expensive operation which may trigger a global GC.

Several questions about Multi-Paxos

I have several questions about Multi-Paxos.
Does each instance have its own proposal number, accepted ballot, and accepted value? Or do all the instances share the same proposal number, with one starting after another is finished?
If all the instances share the same proposal number, consider the following situation: server A sends a proposal, and the acceptor returns an accepted instanceId which might be greater or less than the proposal's instanceId. What should the proposer do then? Use that instanceId and its value for the accept phase, then increase its own instanceId, wait for the next round, and re-propose with its own value? If so, when is the previously accepted value removed? If it's never removed, the acceptor will return that instanceId and value again, and it looks like a loop.
Multi-Paxos has a vague description, so two people may build two different systems based on it; in the context of one system the answer is "no," and in the context of another it's "yes."
Approach #1 - "No"
Mindset: Paxos is a two-phase protocol for building write-once registers, and Multi-Paxos is a technique for creating a log on top of them.
One of the possible ways to build a log is
Create an array of completely independent write-once registers and initialize the first one with an initial value.
On new record we should:
A) Guess an index (X) of a vacant register and try to write a dummy record here (if it's already used then pick a register with a higher index and retry).
B) Start writing dummy records to every register with smaller than X index until we find a register filled with a non-dummy record.
C) Calculate a new record based on it (e.g., a record may have an ordinal, and we can use it to calculate an ordinal of the new record; since some registers are filled with dummy records the ordinals aren't equal to index) and write it to the X+1 register. In case of a conflict, we should restart the procedure from step A).
To read the log we should start writing dummy values from the first record, and on each conflict, we should increment index and retry until the write is succeeded which would indicate that the log's end is reached.
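Here is a toy, single-process Python model of the steps above. The register semantics (first write wins; every write returns the value the register ended up holding) stand in for a full Paxos instance per register, so all the distributed machinery is elided and the names are illustrative.

```python
import itertools

DUMMY = ("dummy",)

class WriteOnceRegister:
    """First write wins; every write returns the winning value."""
    def __init__(self, value=None, is_set=False):
        self.value, self.is_set = value, is_set
    def write(self, value):
        if not self.is_set:
            self.value, self.is_set = value, True
        return self.value

# register 0 is pre-initialized, per the construction above
registers = {0: WriteOnceRegister((0, "genesis"), True)}

def reg(i):
    return registers.setdefault(i, WriteOnceRegister())

def append(payload):
    while True:
        # A) guess a vacant register X and claim it with a dummy record
        x = next(i for i in itertools.count(1) if not reg(i).is_set)
        if reg(x).write(DUMMY) is not DUMMY:
            continue  # someone beat us to X; retry with a higher index
        # B) write dummies downward until we hit a real (non-dummy) record
        i = x - 1
        while reg(i).write(DUMMY) is DUMMY:
            i -= 1
        ordinal, _ = reg(i).value  # ordinal of the latest real record
        # C) derive the new record and try to write it to register X+1;
        #    a conflict there means another writer won, so restart from A)
        record = (ordinal + 1, payload)
        if reg(x + 1).write(record) == record:
            return record

def read_log():
    # dummies are skipped, which is why ordinals differ from indices
    return [r.value for _, r in sorted(registers.items())
            if r.is_set and r.value is not DUMMY]

first = append("a")
second = append("b")
```

In a single process no write ever loses, so the retry branches never fire here; they are the parts that Paxos makes safe under concurrency.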
Of course, there is a lot of overhead in this approach, so please treat it just as a top-level overview of what Multi-Paxos is.
The log is a powerful concept, and we can use it as a recipe for building distributed state machines: just think of each record as an update command. Unfortunately, in some cases there is also a lot of overhead. For example, if you want to build a key/value store and you only care about the current value, then you don't need history, and you probably need to implement garbage collection to remove past versions from the log to optimize storage costs.
Approach #2 - "Yes"
Mindset: a rewritable register as a heavily optimized version of Multi-Paxos.
If you start with the approach described above, apply it to building a key/value store, and then iterate in order to get rid of the overhead, e.g. by doing garbage collection on the fly, then eventually you may come up with an idea of how to turn the write-once register into a rewritable one.
In that case, each instance uses the same ballot numbers, simply because all the instances are collapsed into one rewritable instance.
I described this approach in the "How Paxos Works" post and implemented it in the Gryadka project in about 500 lines of JavaScript. The idea behind it was also independently checked with TLA+ by Greg Rogers and Tobias Schottdorf.

Choosing Redis datatypes for advanced data manipulation in a simple torrent tracker service

I need your advice on Redis data types for my project. The project is a torrent tracker (Ruby, simple Sinatra-based) with a pure in-memory data store for current information about peers. I feel like this is what Redis is made for, but I'm stuck choosing the proper data types. For now I tend toward the following setup:
Use a list for seeders. What I actually need is more like a ring buffer, so I can get a sequential range of seeders (with a given size and start position) and save the new start position for the next time.
Use a sorted set for leechers. The score for each leecher is downloaded/(downloaded+left), so I can also extract a range for any specific case.
All values in the set and list are string (bencoded) representations of peer data.
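The leecher scoring can be sketched in plain Python (with real Redis the scores would be written with ZADD and ranges read back with ZRANGEBYSCORE; the peer names here are made up):

```python
def progress(downloaded, left):
    # completion ratio in [0.0, 1.0]; guard against a peer reporting 0/0
    total = downloaded + left
    return downloaded / total if total else 0.0

leechers = {
    "peer-a": progress(50, 950),   # 5% done
    "peer-b": progress(900, 100),  # 90% done
    "peer-c": progress(500, 500),  # 50% done
}

# e.g. leechers that are at least half done, most complete first
nearly_done = sorted((p for p, s in leechers.items() if s >= 0.5),
                     key=leechers.get, reverse=True)
```

Because the score is a single float, the sorted set gives you range queries like "everyone above 90%" for free.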
What the setup above lacks:
The need to store an offset for seeders, which means data access needs synchronization.
A way of finding a specific seeder in the list. A set would help here, but then I wouldn't be able to extract a range of items at once.
(A general problem) A TTL for set/list members (in case a client shuts down without sending any data first). One option is to make each peer an ordinary key/value entry (string or hash), give it a TTL, subscribe to its destruction, and delete it from the corresponding list or set.
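That last option can be sketched in Python, with a manual sweep standing in for per-key TTLs plus keyspace expiry notifications (all names and the TTL value are illustrative):

```python
peers = {}  # peer_id -> expiry timestamp (stand-in for per-peer keys with TTL)
seeders, leechers = [], set()

TTL = 1800  # seconds without an announce before a peer is considered gone

def announce(peer_id, seeding, now):
    # refresh the peer's TTL and register it in the right collection
    peers[peer_id] = now + TTL
    if seeding:
        if peer_id not in seeders:
            seeders.append(peer_id)
    else:
        leechers.add(peer_id)

def sweep(now):
    # the "on expiry, delete from the corresponding list or set" step
    for peer_id, deadline in list(peers.items()):
        if deadline <= now:
            del peers[peer_id]
            if peer_id in seeders:
                seeders.remove(peer_id)
            leechers.discard(peer_id)

announce("s1", True, now=0)
announce("l1", False, now=0)
announce("s2", True, now=1000)
sweep(now=TTL)  # s1 and l1 expire; s2 announced later, so it survives
```

With real Redis the sweep disappears: each peer key expires on its own, and an expiry notification triggers the LREM/SREM cleanup.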
What could you suggest? Any practical advice?
