Keep keys with empty data structures in Redis

Redis can be used either directly as a key-value store, where the value is a plain string, or in a more sophisticated way, where the value is a data structure such as a hash or a list. Assume we have the second case and that under the key "H" there is a hash. Items can be added to the hash and removed from it; eventually the hash can become empty, and later it can be re-populated.
I have found that if we remove the last item from the data structure (our hash "H"), Redis removes the key itself from the keyspace, for some reason.
Example:
HSET "H" "key1" "value1"
HSET "H" "key2" "value2"
HDEL "H" "key1"
<-- Here, "H" is in the list of current keys, whereby HLEN returns 1
HDEL "H" "key2"
<-- Here, for some reason, "H" is not shown among existing keys,
not even as an empty hash (HLEN 0)
HSET "H" "key3" "value3"
<-- Hash is back in the list of keys
My question is: is it possible to configure Redis so that it keeps showing the given key ("H" in our example) with its value as an empty, non-trivial data structure (an empty hash, in our example)?

Short answer: No
Redis 'creates the hash' when the first item is inserted and 'removes the hash' when the last item is removed. I'm using Redis 2.8, and there is no option to 'let an empty hash be'.
Addendum: the same is true for Redis 6 as well.

Manu is right. You have no way of doing that.
But if you explain why you want to do it, we might be able to help you better. As you know, in Redis you can set a field on a hash even if the hash doesn't exist yet, so you don't need to first create the hash and then set the fields. With that in mind, there is no need to keep an empty hash around that would just waste memory.
What is your use case?
update: after reading your use case, I am improving the answer.
For your problem of "volatile" hashes, you can do something easy. After you run your KEYS (or SCAN) command, you can create a SET containing all the names of the hashes that existed in this iteration. You can call this something like "last_seen_keys". What you want to do now is, after you call KEYS, you create a set that you call "current_keys". Now you just run a diff between the two sets, so you can see which keys were present in the last pass and not in this one. You can set your values in statsd to zero for those keys. After that, you delete the "last_seen_keys" SET and you rename the "current_keys" SET to "last_seen_keys". That should do the trick

Related

Redis: Pros and cons of the following two approaches

I have a lot of JavaScript objects like:
var obj1 = {"key1" : value1, "key2" : value2, ...}
var obj2 = {"key3" : value3, "key4" : value4, ...}
and so on...
Following are the two approaches:
1) Store each object as a Redis hash, i.e. a one-to-one mapping.
2) Have one Redis hash (bucketing can be done for better performance) and store each object as a stringified value under one key of that hash, i.e. one key-value pair in the hash per object. Parse the string whenever the object is needed.
1) -> Takes more space than 2) but has better performance than 2)
2) -> Takes less space than 1) but has worse performance than 1)
Is there a way to determine which approach would be better in the long run?
Update: This data is used on the client side (AngularJS), so all parsing of stringified JSON is done in the frontend.
This is probably best decided by working out which method minimises the number of steps required to extract the data you need from Redis.
Case 1: Lots of nested objects
If your objects have a lot of nesting, i.e. objects within objects, like this:
obj = {key1: {key2: value1, key3: {key4: value2}}}
you should probably stringify and store them, because Redis does not allow nesting of data structures: you can't store a hash within another hash. Storing the name of hash2 as a value inside hash1 and querying hash2 after getting hash1, and so on, is unnecessarily complex and costs a lot of queries. In this case, all you have to do is get the entire string from Redis and JSON.parse it, and you can pull whatever data you want out of the resulting object.
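For instance, a minimal sketch of the stringified approach using the Ruby redis-rb client (the key name "obj1" is just a placeholder):

require "json"
require "redis"

redis  = Redis.new
nested = { "key1" => { "key2" => "value1", "key3" => { "key4" => "value2" } } }

redis.set("obj1", JSON.generate(nested))   # one write stores the whole object
obj = JSON.parse(redis.get("obj1"))        # one read, then parse
obj["key1"]["key3"]["key4"]                # => "value2"

One GET plus one parse, no matter how deep the nesting goes.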
Case 2: No nested objects.
On the other hand, if there is no nesting and you store the object as a string, you have to JSON.parse() every time you read it from Redis, and parsing JSON is blocking and CPU-intensive. See Node.js: does JSON.parse block the event loop?
The Redis documentation also says that small hashes are encoded in a very space-efficient way, so you should try representing your data using hashes whenever possible: http://redis.io/topics/memory-optimization
So in this case you could probably go ahead and store each object as its own hash, since querying a particular value will be a lot easier.
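As a rough redis-rb sketch of the hash-per-object approach (key and field names are again placeholders):

require "redis"

redis = Redis.new
redis.hmset("obj1", "key1", "value1", "key2", "value2")  # one hash per object

redis.hget("obj1", "key2")   # => "value2", a single field, no parsing needed
redis.hgetall("obj1")        # => {"key1"=>"value1", "key2"=>"value2"}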
---------Update---------
Even if the JSON parsing is done on the client, try not to do extra computation needlessly :)
Nested objects, however, are easier to store and query as a string; otherwise you would have to query more than one hash. In that case, storing the stringified object might just be better for performance.
Redis stores small hashes very efficiently, so much so that storing multiple small hashes is more memory-efficient than one big hash.
The thresholds that decide which encoding Redis uses can be found in redis.conf:
hash-max-zipmap-entries 512
hash-max-zipmap-value 64
A hash keeps the compact encoding only while its number of fields stays below the first limit and the length of each value stays below the second.
So you can now decide on the basis of the nesting of your objects, the number of hash fields below which Redis stays memory-efficient, and the size of the values assigned to your keys.
Do go through http://redis.io/topics/memory-optimization
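To make the bucketing idea from the question concrete, here is a rough redis-rb sketch along the lines of that guide; the "objects:" prefix, the bucket size, and the integer ids are all assumptions for illustration:

require "json"
require "redis"

BUCKET_SIZE = 512  # stay under hash-max-zipmap-entries

# spread records across many small hashes so each bucket
# keeps the compact encoding
def store(redis, id, obj)
  redis.hset("objects:#{id / BUCKET_SIZE}", id.to_s, JSON.generate(obj))
end

def fetch(redis, id)
  raw = redis.hget("objects:#{id / BUCKET_SIZE}", id.to_s)
  raw && JSON.parse(raw)
end

redis = Redis.new
store(redis, 1234, { "key1" => "value1" })
fetch(redis, 1234)  # => {"key1"=>"value1"}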

Redis pipeline as atomic

Is there a way to do the following with a single request and response?
Given a Redis set whose members are key names, and each of those keys holds a Redis hash, how can I fetch the hashes of all those keys in one go?
Something like this is the best I can do (not sure this will work, but you get the idea):
hashes = []
keys = redis.smembers("myset")
redis.multi do
  keys.each do |k|
    hashes << redis.hgetall(k)  # queues a Redis::Future inside MULTI
  end
end
hashes = hashes.map(&:value)    # resolve the futures after EXEC
but this takes at least two requests (which isn't ideal, but OK), and I'm not sure how Redis::Future resolves to a value (whether it sends another request or not).
No, Redis doesn't have a command that returns the hashes of multiple keys* at once. You're stuck with n + 1 commands: one to get the keys, and one for each key.
You could try writing a Lua script that does all of that inside Redis and returns the combined result; see the documentation of the EVAL command for an introduction to scripting in Redis. Note that if you do this, you cannot use a cluster, as keys must be explicitly passed to the script for it to be safe in a cluster. A sketch of such a script follows below.
*MGET does return values for multiple keys, but only if they're strings.
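A sketch of what such a script could look like, called via redis-rb (illustrative only; as noted above, the hash names are not declared in KEYS, so this is not cluster-safe):

require "redis"

redis = Redis.new

script = <<~LUA
  local out = {}
  local names = redis.call('SMEMBERS', KEYS[1])
  for i, name in ipairs(names) do
    -- each HGETALL comes back as a flat field/value array
    out[i] = redis.call('HGETALL', name)
  end
  return out
LUA

hashes = redis.eval(script, keys: ["myset"])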

How do I find and remove duplicate mongo documents with ruby

I have a collection in Mongo with duplicates on a specific key, and I need to remove all but one document from each set of duplicates. The map-reduce solutions I've seen don't make it clear how to remove all but one of the duplicates. I am using Ruby; how can I do this in a somewhat efficient way? My current solution is unbelievably slow!
I currently just iterate over an array of the duplicate keys and delete the first document that is returned for each, but this only works if there is at most one duplicate document per key, and it is really slow:
dupes.each do |key|
  $mongodb.collection("some_collection").remove(
    $mongodb.collection("some_collection").find({ key: key }).first
  )
end
I think you should use MongoDB's ensureIndex() to remove the duplicates. For instance, in your case, to drop the duplicate documents given the key duplicate_key, you can do
db.duplicate_collection.ensureIndex({'duplicate_key' : 1},{unique: true, dropDups: true})
where duplicate_collection is the collection containing your duplicate documents. This operation will keep only a single document for each value of the given key. (Note that the dropDups option was removed in MongoDB 3.0, so this only works on older servers.)
Afterwards, if you want to remove the index, just run the dropIndex operation; for details, see the MongoDB documentation.
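For example, in the mongo shell that would look like:
db.duplicate_collection.dropIndex({'duplicate_key' : 1})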
A lot of solutions suggest map-reduce (which is fast and fine), but I implemented a solution in Ruby that seems pretty fast as well and makes it easy to keep one document from each duplicate set.
Basically, you find all your duplicate keys by adding them to a hash, and any time you find a key that is already in the hash, you add that document's id to an array, which you use for a bulk removal at the end.
all_keys = {}
dupes = []
dupe_key = "some_key"

$mongodb.collection("some_collection").find.each do |doc|
  if all_keys[doc[dupe_key]]
    dupes << doc["_id"]           # key already seen: mark this doc for removal
  else
    all_keys[doc[dupe_key]] = 1   # first occurrence: keep it
  end
end

$mongodb.collection("some_collection").remove({ _id: { "$in" => dupes } })
The only issue with this method is that it potentially won't work if the total list of keys/dupe ids can't be stored in memory. The map reduce solution would probably be best at that point.

thrust::sort_by_key: How to store result in separate array?

I am currently sorting values by key in the following way:
thrust::sort_by_key(thrust::device_ptr<int>(keys),
                    thrust::device_ptr<int>(keys + numKeys),
                    thrust::device_ptr<int>(values));
which sorts the "values" array according to "keys".
Is there a way to leave the "values" array untouched and instead store the result of sorting "values" in a separate array?
Thanks in advance.
There isn't a direct way to do what you are asking. You have two options to functionally achieve the same thing.
The first is to make a copy of the values array before the call, leaving you with both a sorted and an unsorted version of the original data. So your example becomes:
thrust::device_vector<int> values_sorted(thrust::device_ptr<int>(values),
                                         thrust::device_ptr<int>(values + numKeys));
thrust::sort_by_key(thrust::device_ptr<int>(keys),
                    thrust::device_ptr<int>(keys + numKeys),
                    values_sorted.begin());
The second alternative is not to pass the values array to the sort at all. Thrust has a very useful permutation iterator, which allows seamless permuted access to an array without modifying the order in which that array is stored (an iterator-based gather operation, if you will). To do this, create an index vector and sort that by key instead, then instantiate a permutation iterator with the sorted index, something like:
typedef thrust::device_vector<int>::iterator iit;

// fill index with 0..numKeys-1, then sort the indices by the keys
thrust::device_vector<int> index(thrust::make_counting_iterator(int(0)),
                                 thrust::make_counting_iterator(int(numKeys)));
thrust::sort_by_key(thrust::device_ptr<int>(keys),
                    thrust::device_ptr<int>(keys + numKeys),
                    index.begin());

// view values through the sorted index without reordering them
thrust::permutation_iterator<thrust::device_ptr<int>, iit>
    perm(thrust::device_ptr<int>(values), index.begin());
Now perm will return the values in key-sorted order (as held by index) without ever changing the order of the original data.
[standard disclaimer: all code written in browser, never compiled or tested. Use at own risk]

CouchDB - Querying array key value for first key element only

I have a CouchDB view set up using an array key, in the format:
[articleId, -timestamp]
I want to query for all entries with the same article id. All timestamps are acceptable.
Right now I am using a query like this:
?startkey=["A697CA3027682D5JSSC",-9999999999999]&endkey=["A697CA3027682D5JSSC",0]
but I would like something a bit simpler.
Is there an easy way to completely wildcard the second key element? What would be the simplest syntax for this?
First, as a comment pointed out, there is indeed a special value {} that is ordered after any value, so your query becomes:
startkey=["target ID"]&endkey=["target ID",{}]
This is equivalent to a wildcard match on the second key element.
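For illustration, a small Ruby sketch of issuing that query over HTTP; the host, database name, and design/view names are made-up placeholders:

require "cgi"
require "json"
require "net/http"

base  = "http://localhost:5984/articles/_design/app/_view/by_article"
query = {
  "startkey" => JSON.generate(["A697CA3027682D5JSSC"]),
  "endkey"   => JSON.generate(["A697CA3027682D5JSSC", {}]),
}.map { |k, v| "#{k}=#{CGI.escape(v)}" }.join("&")

rows = JSON.parse(Net::HTTP.get(URI("#{base}?#{query}")))["rows"]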
As a side note, there is no need to reverse the ordering in the map function by emitting a negative timestamp; you can reverse the order with an option on the view invocation (your start and end keys will be swapped):
startkey=["target ID",{}]&endkey=["target ID"]&descending=true
For future reference, in CouchDB 3 you can use "\ufff0" instead of {}, which would be ordered after a string or number, but before an object.
From the CouchDB 3 docs:
Beware that {} is no longer a suitable “high” key sentinel value. Use a string like "\ufff0" instead.
The query startkey=["foo"]&endkey=["foo",{}] will match most array keys with “foo” in the first element, such as ["foo","bar"] and ["foo",["bar","baz"]]. However it will not match ["foo",{"an":"object"}]
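So under CouchDB 3, the wildcard query from above would be written as:
startkey=["A697CA3027682D5JSSC"]&endkey=["A697CA3027682D5JSSC","\ufff0"]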
