Key size change for leveldb

We are using leveldb to index data blocks on disks, with one leveldb instance per disk.
The keys for the index are composed of an inode, a page_offset, and a fingerprint suffix.
The fingerprint was included in the key for some historical reason (not known to me).
We are planning to get rid of this fingerprint suffix from the key, as we concluded that we can maintain uniqueness of the key with just the inode and page_offset.
The issue is the upgrade from the older version to the newer version, where we would need to maintain two indexes for a brief time until the first index becomes free.
The question is: is there a way to keep using the same old index, write new keys with the smaller size, and ignore the suffix part of old keys during lookups?
Please let me know if my question is not very clear.

You can do some work on leveldb::Options::comparator, which by default is leveldb's built-in BytewiseComparatorImpl (returned by leveldb::BytewiseComparator()).
For example, you can define a class named IgnoreSuffixComparatorImpl:
#include "leveldb/comparator.h"
class IgnoreSuffixComparatorImpl : public Comparator {
...
virtual int Compare(const Slice& a, const Slice& b) const {
return removeSuffix(a).compare(removeSuffix(b));
}
...
}
Then, when you initialize the DB, you can use the new comparator:
options.comparator = new IgnoreSuffixComparatorImpl();
leveldb::DB* db = nullptr;
leveldb::Status s = leveldb::DB::Open(options, db_path, &db);
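For completeness, below is a minimal sketch of what the full comparator could look like, assuming the new key layout is a fixed-size inode + page_offset prefix and the old layout is that same prefix followed by a fixed-size fingerprint suffix; PREFIX_LEN and removeSuffix are illustrative names, not part of the leveldb API. One caveat: leveldb records the comparator's Name() in the database and refuses a later Open() with a differently named comparator, so switching an existing index over to a custom comparator needs care (and the comparator must order existing keys the same way they were originally written).

#include <cstddef>
#include <cstdint>
#include <string>

#include "leveldb/comparator.h"
#include "leveldb/slice.h"

// Assumed layout: new keys are inode + page_offset only; old keys have a
// fingerprint appended after that prefix. PREFIX_LEN is an illustrative value.
static const size_t PREFIX_LEN = 2 * sizeof(uint64_t);

class IgnoreSuffixComparatorImpl : public leveldb::Comparator {
 public:
  // Drop the fingerprint suffix from old-style keys; new-style keys pass through.
  static leveldb::Slice removeSuffix(const leveldb::Slice& key) {
    if (key.size() > PREFIX_LEN)
      return leveldb::Slice(key.data(), PREFIX_LEN);
    return key;
  }

  int Compare(const leveldb::Slice& a, const leveldb::Slice& b) const override {
    return removeSuffix(a).compare(removeSuffix(b));
  }

  // leveldb persists this name and verifies it on every Open().
  const char* Name() const override { return "IgnoreSuffixComparator"; }

  // Doing nothing here is always correct; it only costs a little index space.
  void FindShortestSeparator(std::string* /*start*/, const leveldb::Slice& /*limit*/) const override {}
  void FindShortSuccessor(std::string* /*key*/) const override {}
};

With such a comparator, a new-style lookup key (inode + page_offset only) compares equal to an old suffixed key, which is what would let old entries be found without knowing their fingerprint.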

Related

How can I store multiple elements in a Rust HashMap for the same key?

I have a HashMap<u32, Sender>. Sender is an open connection object and the key is a user id. Each user can connect from multiple devices. I need to store all possible open connections for the same user id. After this I can iterate and send messages to all open connections for the same user.
The above HashMap only stores each user id and connection once. I need to get one key with multiple values. How can I make the value into a list or an array, so I can see which connections exist and send to them all?
I am not talking about different value types, like enums. I am talking about values of the same type, just more than one of them. Maybe HashMap is not designed for this?
Alternative ideas are also welcomed.
To do this with a HashMap you should use a Vec as the values, so that each key can point to multiple Senders. The type then would be HashMap<u32, Vec<Sender>>.
With this structure, plain insert() can get clunky when you need to mutate the values like this, but you can use the Entry API to retrieve and update a record in one go. For example:
let mut hash_map: HashMap<u32, Vec<Sender>> = HashMap::new();
hash_map.entry(3)
    // If there's no entry for key 3, create a new Vec and return a mutable ref to it
    .or_default()
    // and insert the item onto the Vec
    .push(sender);
You could also use the multimap crate, which does something similar under the hood, but adds a layer of abstraction. You might find it easier to work with:
let mut multi_map = MultiMap::new();
multi_map.insert(3, sender_1);
multi_map.insert(3, sender_2);
The method multi_map.get(key) will return the first value with that key, while multi_map.get_vec(key) will retrieve all of them.

What is the purpose of RocksDBStore with Serdes.Bytes() and Serdes.ByteArray()?

RocksDBStore<K,V> stores keys and values as byte[] on disk. It converts to/from K and V typed objects using the Serdes provided when constructing the RocksDBStore<K,V> object.
Given this, please help me understand the purpose of the following code in RocksDbKeyValueBytesStoreSupplier:
return new RocksDBStore<>(name,
Serdes.Bytes(),
Serdes.ByteArray());
Providing Serdes.Bytes() and Serdes.ByteArray() looks redundant.
RocksDbKeyValueBytesStoreSupplier was introduced in KAFKA-5650 (Kafka Streams 1.0.0) as part of KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines.
In KIP-182, there is the following sentence:
The new Interface BytesStoreSupplier supersedes the existing StateStoreSupplier (which will remain untouched). This so we can provide a convenient way for users creating custom state stores to wrap them with caching/logging etc if they chose. In order to do this we need to force the inner most store, i.e, the custom store, to be a store of type <Bytes, byte[]>.
Please help me understand why we need to force custom stores to be of type <Bytes, byte[]>?
Another place (KAFKA-5749) where I found a similar sentence:
In order to support bytes store we need to create a MeteredSessionStore and ChangeloggingSessionStore. We then need to refactor the current SessionStore implementations to use this. All inner stores should by of type < Bytes, byte[] >
Why?
Your observation is correct -- the PR implementing KIP-182 missed removing the Serdes from RocksDBStore that are no longer required. This was already fixed in the 1.1 release.

Multiple parallel Increments on Parse.Object

Is it acceptable to perform multiple increment operations on different fields of the same object on Parse Server?
e.g., in Cloud Code :
node.increment('totalExpense', cost);
node.increment('totalLabourCost', cost);
node.increment('totalHours', hours);
return node.save(null,{useMasterKey: true});
It seems like MongoDB supports it, based on this answer, but does Parse?
Yes. One thing you can't do is both add and remove something from the same array within the same save. You can only do one of those operations. But, incrementing separate keys shouldn't be a problem. Incrementing a single key multiple times might do something weird but I haven't tried it.
FYI you can also use the .increment method on a key for a shell object. I.e., this works:
var node = new Parse.Object("Node");
node.id = request.params.nodeId;
node.increment("myKey", value);
return node.save(null, {useMasterKey:true});
Even though we didn't fetch the object, we don't need to know the previous value in order to increment it on the database. Note that since you don't have the data, you can't access any of the object's other fields here.

g-wan kv store KV_INCR_KEY

How to use the KV_INCR_KEY?
I found a useful feature in the G-WAN API, but without any sample.
I want to add items to the KV store with this as the primary key.
Also, how do I get the value of this key?
The KV_INCR_KEY value is a flag intended to be passed to kv_add().
You get the newly inserted key's value by checking the return value of kv_add(). The documentation states:
kv_add(): add/update a value associated to a key
return: 0:out of memory, else:pointer on existing/inserted kv_item struct
This was derived from an idea discussed on the G-WAN forum. And, like some other flags (timestamp or persistence, for example), it has not been implemented yet (KV_NO_UPDATE is functional).
Since what follows the next version (focused on new scripted languages) is a kind of zero-configuration mapReduce, the KV store will get more attention soon.

Memcached dependent items

I'm using memcached (specifically the Enyim memcached client) and I would like to be able to make keys in the cache dependent on other keys, i.e. if Key A is dependent on Key B, then whenever Key B is deleted or changed, Key A is also invalidated.
If possible, I would also like to make sure that data integrity is maintained if a node in the cluster fails, i.e. if Key B is at some point unavailable, Key A should still be treated as invalid whenever Key B would have become invalid.
Based on this post I believe that this is possible, but I'm struggling to understand the algorithm enough to convince myself how / why this works.
Can anyone help me out?
I've been using memcached quite a bit lately and I'm sure what you're trying to do with dependencies isn't possible with memcached "as is"; it would need to be handled client side. Data replication, on the other hand, should happen server side and not from the client; these are two different domains. (With memcached at least, given its lack of data-storage logic. The point of memcached, though, is just that: extreme minimalism for better performance.)
For data replication (protection against a failing physical cluster node) you should check out Membase http://www.couchbase.org/get/couchbase/current instead.
For the deps algorithm, I could see something like this in a client: for any given key there is an additional key holding the list/array of dependent keys.
# - delete a key, recursively:
function deleteKey( keyname ):
    deps = client.getDeps( keyname )
    foreach ( deps as dep ):
        deleteKey( dep )
    endeach
    memcached.delete( keyname + "_deps" )
    memcached.delete( keyname )
endfunction

# return the list of dependent keynames, or an empty list if the key doesn't exist
function client.getDeps( keyname ):
    return memcached.get( keyname + "_deps" ) or array()
endfunction

# Key "demokey1" and its counterpart "demokey1_deps". The list of keys stored in
# "demokey1_deps" contains "demokey2" and "demokey3".
deleteKey( "demokey1" );
# This first performs a memcached get on "demokey1_deps", then with the value
# returned as a list of keys ("demokey2" and "demokey3") runs deleteKey() on each
# of them, before finally deleting "demokey1" itself.
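If it helps to see it compile, here is the same recursive-delete idea as a self-contained C++ sketch, with plain std::unordered_maps standing in for the memcached client (none of these names are Enyim API; a real client would keep the dependency list under a "<key>_deps" cache key instead of a second map):

#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-ins for the cache: values under the key itself, dependency lists in a
// second map (a real client would keep them under "<key>_deps" keys).
std::unordered_map<std::string, std::string> values;
std::unordered_map<std::string, std::vector<std::string>> deps;

// Return the dependent keys for `key`, or an empty list if there are none.
std::vector<std::string> getDeps(const std::string& key) {
    auto it = deps.find(key);
    return it == deps.end() ? std::vector<std::string>{} : it->second;
}

// Delete a key and, recursively, every key that depends on it.
void deleteKey(const std::string& key) {
    for (const auto& dep : getDeps(key))
        deleteKey(dep);
    deps.erase(key);     // drop the dependency list
    values.erase(key);   // drop the cached value
}

int main() {
    values["demokey1"] = "v1";
    values["demokey2"] = "v2";
    values["demokey3"] = "v3";
    deps["demokey1"] = {"demokey2", "demokey3"};

    deleteKey("demokey1");                // removes all three entries
    std::cout << values.size() << "\n";   // prints 0
}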
Cheers
I don't think it's a direct solution, but try creating a system of namespaces in your memcache keys, e.g. http://www.cakemail.com/namespacing-in-memcached/. In short, the keys are generated so that they contain the current values of other memcached keys. In the namespacing problem the idea is to invalidate a whole range of keys that fall within a certain namespace. This is achieved by something like incrementing the value of the namespace key, so that any keys referencing the previous namespace value will no longer match when the key is regenerated.
Your problem looks a little different, but I think that by setting up Key A to live in the Key B "namespace", then if node B were unavailable, computing Key A's full namespaced key, e.g.
"Key A|Key B:<whatever Key B value is>"
would come back with nothing, thus allowing you to determine that B is unavailable and to invalidate the cache lookup for Key A.
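As a rough illustration of how that composed key behaves (again with a hypothetical in-memory map standing in for the memcached client, so none of this is Enyim API), Key A's full name embeds the current value of Key B; if Key B is bumped or unreachable, the composed name changes or cannot be built at all, and the old Key A entry is effectively invalidated:

#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a memcached client: string keys -> string values.
std::unordered_map<std::string, std::string> cache;

std::optional<std::string> cacheGet(const std::string& key) {
    auto it = cache.find(key);
    if (it == cache.end()) return std::nullopt;   // miss (or node unavailable)
    return it->second;
}

// Compose Key A's full name from the *current* value of Key B (its "namespace").
std::optional<std::string> namespacedKey(const std::string& a, const std::string& b) {
    auto ns = cacheGet(b);
    if (!ns) return std::nullopt;                 // Key B unavailable -> treat A as invalid
    return a + "|" + b + ":" + *ns;
}

int main() {
    cache["KeyB"] = "1";                                       // namespace version 1
    cache[*namespacedKey("KeyA", "KeyB")] = "cached value";

    std::cout << cacheGet(*namespacedKey("KeyA", "KeyB")).value_or("MISS") << "\n";  // hit

    cache["KeyB"] = "2";                                       // bump Key B
    auto k = namespacedKey("KeyA", "KeyB");
    std::cout << (k ? cacheGet(*k).value_or("MISS") : std::string("MISS")) << "\n";  // miss
}

If the node holding Key B fails, the get on Key B simply misses, so Key A is treated as invalid, which is the behaviour asked about in the question.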
