Is there a way to do the following with a single request and a single response?
Given a Redis set whose members are key names, where each of those keys holds a Redis hash with fields, how can I fetch the hashes for all of those keys in one go?
Something like this is the best I can do (not sure it will work, but you get the idea):
hashes = []
keys = redis.smembers("myset")
redis.multi do |tx|
  keys.each do |k|
    hashes << tx.hgetall(k)    # queued inside MULTI, returns a Redis::Future
  end
end
hashes = hashes.map(&:value)   # resolve the futures after EXEC
This takes at least two requests (which isn't the best, but OK), and I'm not sure how Redis::Future resolves to a value (whether it sends another request or not).
No, Redis doesn't have a command that returns the hashes of multiple keys* at once. You're stuck with n + 1 requests: one to get the keys, and one for each key.
You could write a Lua script that does all of that inside Redis and returns the combined results. See the documentation of the EVAL command for an introduction to scripting in Redis. Note that if you do this, you cannot use a cluster, because keys must be explicitly provided to the script for it to be safe in a cluster.
*MGET does return values for multiple keys, but only if they're strings.
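For illustration, here is a rough sketch of the Lua approach with redis-rb; the set name "myset" comes from the question, and reshaping HGETALL's flat field/value reply into a Ruby Hash is an assumption about how you want the results:

script = <<~LUA
  local keys = redis.call('SMEMBERS', KEYS[1])
  local result = {}
  for i, key in ipairs(keys) do
    result[i] = redis.call('HGETALL', key)
  end
  return result
LUA

raw = redis.eval(script, keys: ["myset"])    # one round trip
hashes = raw.map { |fields| Hash[*fields] }  # HGETALL comes back as a flat field/value array

Since the script reads keys that are not declared in KEYS, it is exactly the pattern that is unsafe on a cluster, as noted above.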
Related
I ran into an odd issue when trying to modify a Chef recipe. I have an attribute that contains a large hash of hashes. For each of those sub-hashes, I wanted to add a new key/value to a 'tags' hash within. In my recipe, I create a 'tags' local variable for each of those large hashes and assign the tags hash to that local variable.
I wanted to add a modification to the tags hash, but the modification had to be done at compile time, since the value depended on a value stored in an input JSON. My first attempt was this:
tags = node['attribute']['tags']
tags['new_key'] = json_value
However, this resulted in a spec error that indicated I should use node.default, or the equivalent attribute assignment function. So I tried that:
tags = node['attribute']['tags']
node.normal['attribute']['tags']['new_key'] = json_value
While I did not have a spec error, the new key/value was not sticking.
At this point I reached my "throw stuff at a wall" phase and used the hash.merge function, which I used to think was functionally identical to hash['new_key'] for a single key/value pair addition:
tags = node['attribute']['tags']
tags.merge({ 'new_key' => 'json_value' })
This ultimately worked, but I do not understand why. What functional difference is there between the two methods that causes one to be seen as a modification of the original chef attribute, but not the other?
The issue is that you can't use node['foo'] like that. It accesses the merged view of all attribute levels, so if you then try to set things through it, Chef doesn't know which level to write them to. You need to lead off by telling it where to put the data:
tags = node.normal['attribute']['tags']
tags['new_key'] = json_value
Or just:
node.normal['attribute']['tags']['new_key'] = json_value
Beware of setting things at the normal level, though: it is not reset at the start of each run. That is probably what you want here, but it means that even if you remove the recipe code doing the set, the value will still be in place on any node that has already run it. If you want to actually remove things, you have to do it explicitly.
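For example, one way to remove the value explicitly later, as a sketch assuming Chef 12 or newer (where node.rm_normal is available) and using the attribute path from the question:

node.rm_normal('attribute', 'tags', 'new_key')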
I have a lot of javascript objects like:
var obj1 = {"key1" : value1, "key2" : value2, ...}
var obj2 = {"key3" : value3, "key4" : value4, ...}
and so on...
The following are the two approaches:
Store each object as its own Redis hash, i.e. a one-to-one mapping.
Have one Redis hash (bucketing can be done for better performance) and store each object as a stringified value under one field of that hash, i.e. one field/value pair per object. Parse the string whenever we need to use the object.
1) -> Takes more space than 2) but has better performance than 2)
2) -> Takes less space than 1) but has worse performance than 1)
Is there a way to determine which approach would be better in the long run?
Update: This data is used on the client side (AngularJS), so all parsing of stringified JSON is done in the frontend.
This is probably best decided by working out which method minimises the number of steps required to extract the data you need from Redis.
Case 1: Lots of nested objects
If your objects have a lot of nesting, i.e. objects within objects, like this,
obj = {key1: {key2: value1, key3: {key4: value2}}}
you should probably stringify them and store them as strings, because Redis does not allow nesting of data structures: you can't store a hash within another hash.
Storing the name of hash2 as a field inside hash1, then querying hash2 after getting hash1, and so on, is unnecessarily complex and costs a lot of queries. If you store a string instead, all you have to do is get the entire string from Redis and JSON.parse it, and you can pull whatever data you want out of the object.
Case 2: No nested objects.
On the other hand, if there is no nesting and you store the object as a string, you have to JSON.parse() it every time you get the data from Redis, and parsing JSON is blocking and CPU intensive (see "Node.js: does JSON.parse block the event loop?").
The Redis documentation also says that hashes are encoded in a very small space, so you should try to represent your data using hashes whenever possible: http://redis.io/topics/memory-optimization
So, in this case, you could probably go ahead and store them all as individual hashes as querying a particular value will be a lot easier.
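As a rough sketch of the two layouts using the redis-rb client (the object and key names here are purely illustrative):

require "redis"
require "json"

redis = Redis.new
obj = { "key1" => "value1", "key2" => "value2" }

# Approach 1: one Redis hash per object - each field stays individually addressable
redis.mapped_hmset("obj:1", obj)
redis.hget("obj:1", "key1")                  # read a single field, no parsing needed

# Approach 2: one shared hash of stringified objects - fewer top-level keys, but the
# whole object has to be fetched and parsed to read any field
redis.hset("objects", "obj:1", obj.to_json)
JSON.parse(redis.hget("objects", "obj:1"))["key1"]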
---------Update---------
Even if the JSON parsing is done on the client, try not to do extra computation needlessly :)
Nested objects, though, are easier to store and query as a string; otherwise you would have to query more than one hash. In that case, storing a stringified object might just be better for performance.
Redis stores small hashes very efficiently, so much so that storing multiple small hashmaps is more memory efficient than one big hashmap.
The number of entries below which this compact encoding is used is configured in redis.conf:
hash-max-zipmap-entries 512
and each value must also stay below:
hash-max-zipmap-value 64
So you can decide based on how nested your objects are, the number of hash fields below which Redis stays memory efficient, and the size of the values you assign to your keys.
Do go through http://redis.io/topics/memory-optimization
Redis can be used either directly as a key-value store, where the value is a string, or in a more sophisticated way, where the value is a data structure such as a hash or a list. Assume we have the second case and that under the key "H" there is a hash. Items can be added to the hash and removed; eventually the hash can become empty, and it can be re-populated again.
I have found that if we remove the last item from the data structure (our hash "H"), Redis also removes the key itself from the current keys, for some reason.
Example:
HSET "H" "key1" "value1"
HSET "H" "key2" "value2"
HDEL "H" "key1"
<-- Here, "H" is in the list of current keys, whereby HLEN returns 1
HDEL "H" "key2"
<-- Here, for some reason, "H" is not shown among existing keys,
not even as an empty hash (HLEN 0)
HSET "H" "key3" "value3"
<-- Hash is back in the list of keys
My question is: is it possible to configure Redis so that it keeps showing the given key ("H", in our example) with an empty data structure as its value (an empty hash, in our example)?
Short answer: No
Redis 'creates the hash' when the first item is inserted and 'removes the hash' when the last item is removed. I'm using Redis 2.8 and there is no option to 'let an empty hash be'.
Addendum: Same is true for Redis 6 as well.
Manu is right. You have no way of doing that.
But if you explain why you want to do it, we might be able to help you better. As you know, in Redis you can set a field on a hash even if the hash doesn't exist yet, so you don't need to first create the hash and then set the fields. With that in mind, there is no need to keep an empty hash around that would just waste memory.
What is your use case?
Update: after reading your use case, I am improving the answer.
For your problem of "volatile" hashes there is an easy workaround. After each KEYS (or SCAN) pass, create a SET containing the names of all the hashes that existed in that pass and call it something like "current_keys"; keep the set built in the previous pass under a name like "last_seen_keys". Now run a diff between the two sets to find the keys that were present in the last pass but not in this one, and set their values in statsd to zero. Finally, delete the "last_seen_keys" SET and rename "current_keys" to "last_seen_keys". That should do the trick.
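A minimal sketch of that rotation with redis-rb (the "H:*" match pattern and the statsd client are placeholders for whatever your setup actually uses):

current = redis.scan_each(match: "H:*").to_a           # hashes seen in this pass
redis.del("current_keys")
redis.sadd("current_keys", current) unless current.empty?

gone = redis.sdiff("last_seen_keys", "current_keys")   # seen last pass, missing now
gone.each { |key| statsd.gauge(key, 0) }                # zero them out in statsd

redis.del("last_seen_keys")
redis.rename("current_keys", "last_seen_keys") unless current.empty?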
I have a collection in Mongo with duplicates on a specific key that I need to remove, keeping only one document per key. The map-reduce solutions don't make it clear how to remove all but one of the duplicates. I am using Ruby; how can I do this in a somewhat efficient way? My current solution is unbelievably slow!
I currently iterate over an array of the duplicate keys and delete the first document that is returned, but this only works if there is at most one duplicate document per key, and it is really slow.
dupes.each do |key|
  $mongodb.collection("some_collection").remove($mongodb.collection("some_collection").find({key: key}).first)
end
I think you should use MongoDB's ensureIndex() to remove the duplicates. For instance, in your case, to drop the duplicate documents for the key duplicate_key, you can do
db.duplicate_collection.ensureIndex({'duplicate_key' : 1},{unique: true, dropDups: true})
where duplicate_collection is the collection containing your duplicate documents. This operation will preserve only a single document when there are duplicates for a particular key.
After the operation, if you want to remove the index, just run the dropIndex operation. For details, see the MongoDB documentation.
A lot of solutions suggest Map Reduce (which is fast and fine) but I implemented a solution in Ruby that seems pretty fast as well and makes it easy to leave the one document from each duplicate set.
Basically, you find your duplicates by adding each key to a hash as you iterate the collection; whenever you hit a key that is already in the hash, you add that document's id to an array, which you then use for a bulk removal at the end.
all_keys = {}
dupes = []
dupe_key = "some_key"

$mongodb.collection("some_collection").find.each do |doc|
  if all_keys.key?(doc[dupe_key])
    dupes << doc["_id"]           # key already seen: mark this document for removal
  else
    all_keys[doc[dupe_key]] = 1   # first document with this key: keep it
  end
end

$mongodb.collection("some_collection").remove({_id: {"$in" => dupes}})
The only issue with this method is that it potentially won't work if the total list of keys/dupe ids can't be stored in memory. The map reduce solution would probably be best at that point.
I installed the memcache-client Ruby gem from http://seattlerb.rubyforge.org/memcache-client/
It's easy to get a single value:
cache.get('foo', 'bar')
How can I get all values whose keys start with 'foo', for example foo_1, foo_2, foo_3, foo_*?
Something like "SELECT * FROM foo", but for Memcached.
There will be about 10 000 "foo_n" entries.
Not a perfect solution, but look at the get_multi function:
keys = (1..10_000).map{ |n| "foo_#{n}" }
data = cache.get_multi(*keys)
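If one multi-get for all 10,000 keys turns out to be too large for a single request, a hedged variant is to fetch in slices (the batch size of 500 is arbitrary):

data = {}
(1..10_000).map { |n| "foo_#{n}" }.each_slice(500) do |batch|
  data.merge!(cache.get_multi(*batch))
end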
Unfortunately memcached doesn't support regex key lookups, or even let you get a list of all the keys so you can process them yourself. One alternative would be to use Redis, which can list keys matching a glob-style pattern.
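For example, with the redis-rb gem (a sketch; SCAN with a MATCH pattern is used instead of KEYS so the server isn't blocked while iterating):

require "redis"

redis = Redis.new
foo_keys = redis.scan_each(match: "foo_*").to_a
values = redis.mget(*foo_keys) unless foo_keys.empty?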
You might want to look at Redis as an alternative to memcached. It supports lists, sets, sorted sets and hashes. http://code.google.com/p/redis/