I have a lot of JavaScript objects like:
var obj1 = {"key1" : value1, "key2" : value2, ...}
var obj2 = {"key3" : value3, "key4" : value4, ...}
and so on...
Following are the two approaches:
1. Store each object as a Redis hash, i.e. a one-to-one mapping between objects and hashes.
2. Have one Redis hash (bucketing can be done for better performance) and store each object as a stringified value under one key of that hash, i.e. one key-value pair per object. Parse the string whenever the object is needed.
Approach 1 takes more space than approach 2, but has better performance; approach 2 takes less space, but has worse performance.
Is there a way to determine which approach would be better in the long run?
Update: This data is used on the client side (AngularJS), so all parsing of stringified JSON is done in the frontend.
This would probably be solved by deciding which method minimises the number of steps required to extract the required data from Redis.
Case 1: Lots of nested objects
If your objects have a lot of nesting, i.e. objects within objects, like this:
obj = { key1: { key2: value1, key3: { key4: value2 } } }
You should probably stringify and store them, because Redis does not allow nesting of data structures: you can't store a hash within another hash.
Storing the name of hash2 as a key within hash1, then querying hash2 after getting hash1, and so on, is unnecessarily complex and costs a lot of queries. In this case all you have to do is get the entire string from Redis and JSON.parse it, and you can read whatever data you want from the object.
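For example, a minimal sketch in Node.js (assuming an ioredis-style client; the key name obj:1 is made up):

const Redis = require("ioredis"); // assumed client library

async function main() {
  const redis = new Redis();
  const obj = { key1: { key2: "value1", key3: { key4: "value2" } } };

  // Store the whole nested object as a single string.
  await redis.set("obj:1", JSON.stringify(obj));

  // One round trip gets everything back; parse on the consumer side.
  const parsed = JSON.parse(await redis.get("obj:1"));
  console.log(parsed.key1.key3.key4); // "value2"
}

main();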
Case 2: No nested objects.
But on the other hand, if there is no nesting of objects and you store them as strings, you have to JSON.parse() every time you get the data from Redis, and parsing JSON is blocking and CPU intensive (see Node.js: does JSON.parse block the event loop?).
Redis documentation also says that hashes are encoded in a very small space, so you should try representing your data using hashes every time it is possible. http://redis.io/topics/memory-optimization
So, in this case, you could probably go ahead and store them all as individual hashes as querying a particular value will be a lot easier.
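A matching sketch for the flat case (same assumed ioredis-style client; the key name obj:2 is made up):

async function demo(redis) {
  const obj = { name: "mary", city: "london" }; // flat, no nesting

  // One hash per object; each property becomes a field.
  await redis.hset("obj:2", "name", obj.name, "city", obj.city);

  // Read a single value without fetching or parsing the whole object.
  const name = await redis.hget("obj:2", "name"); // "mary"

  // Or fetch all fields back as a plain object; no JSON.parse needed.
  const all = await redis.hgetall("obj:2"); // { name: "mary", city: "london" }
}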
---------Update---------
Even if the JSON parsing is done on the client, try not to do extra computation needlessly :)
But nested objects are easier to store and query as a string; otherwise you'll have to query more than one hash. In that case, storing the stringified object might just be better for performance.
Redis stores small hashes very efficiently, so much so that storing multiple small hashmaps is more memory efficient than one big hashmap.
The number of entries below which a hash keeps this compact encoding can be found in redis.conf:
hash-max-zipmap-entries 512
and each value must also be smaller than:
hash-max-zipmap-value 64
So you can now decide on the basis of the nesting of your objects, the number of hash entries below which Redis is more memory efficient, and the size of the values assigned to your keys.
Do go through http://redis.io/topics/memory-optimization
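You can verify which encoding a given hash actually got with OBJECT ENCODING. A sketch (assuming an ioredis-style client where commands are exposed as methods; the key name user:1 is made up):

async function checkEncoding(redis) {
  await redis.hset("user:1", "name", "mary", "email", "mary@example.com");
  // "ziplist" (or "listpack" on newer Redis) is the compact encoding;
  // "hashtable" means one of the limits above was exceeded.
  console.log(await redis.object("ENCODING", "user:1"));
}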
Related
Say I want to store user preferences...something simple like this:
{
  "favoriteColor": "green",
  "bestFriends": [
    "Tom",
    "Jenny",
    "Horton"
  ]
}
What's the best, most performant way to store this in redis cache (optimized for reads)?
Imagine UserId = 123
NOTE: Below I'm using the Redis documentation's way of representing the various structures. See here.
Simple, flat, key/value pairs right in the root?
user-123-favoriteColor = green (this is a STRING type)
user-123-bestFriends = (SET type)
1) "Tom"
2) "Jenny"
3) "Horton"
Hierarchical structure (hash of values)
user-123 =
1) "favoriteColor" (STRING type)
2) "green"
3) "bestFriends" (SET TYPE)
4) "Tom"
5) "Jenny"
6) "Horton"
And a related question...is there any reason not to store user preferences in Redis vs. the domain SQL database?
And one more related question...is it a bad idea to store all users under one root key called "users"?
Hierarchical structure should be preferred.
This answer gives a lot of explanation and helped me.
From the horse's mouth:
Use hashes when possible
Small hashes are encoded in a very small space, so you should try representing your data using hashes every time it is possible. For instance if you have objects representing users in a web application, instead of using different keys for name, surname, email, password, use a single hash with all the required fields.
If you want to know more about this, read the next section.
So the answer is yes, use hashes where possible.
The answer here is a good way to do it.
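For instance, a minimal sketch of that layout in Node.js (assuming an ioredis-style client; key names are made up). Since a Redis hash can't nest a set, the friends set gets its own key next to the preferences hash:

async function savePrefs(redis) {
  // Scalar preferences live in one hash per user.
  await redis.hset("user:123", "favoriteColor", "green");
  // The set lives under its own key alongside the hash.
  await redis.sadd("user:123:bestFriends", "Tom", "Jenny", "Horton");
}

async function readPrefs(redis) {
  const favoriteColor = await redis.hget("user:123", "favoriteColor"); // "green"
  const bestFriends = await redis.smembers("user:123:bestFriends"); // ["Tom", "Jenny", "Horton"] (order not guaranteed)
  return { favoriteColor, bestFriends };
}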
I'm a bit new to Redis, so please forgive if this is basic.
I'm working on an app that sends automatic replies to users for certain events. I would like to use Redis to store who has received what event.
Essentially, in Ruby, the data structure could look like this, where you have a map of users to events and the dates each event was sent:
{
  "mary@example.com" => {
    "sent_comment_reply" => ["12/12/2014", "3/6/2015"],
    "added_post_reply" => ["1/4/2006", "7/1/2016"]
  }
}
What is the best way to represent this in a Redis data structure so you can ask: did Mary get a sent_comment_reply? And if so, when was the latest?
In short, the question is: how (if possible) can you have a hash structure that holds an array in Redis?
The rationale, as opposed to using a set or list with a compound key, is that hashes have O(1) lookup time, whereas lookups on lists (LRANGE) are O(s+n) and on sets (SMEMBERS) are O(n).
One way of structuring it in Redis, assuming that you know the user's events and you want the latest one to be fresh in memory:
A sorted set per user. Its members are event codes (sent_comment_reply, added_post_reply), with the score of the latest occurrence as the highest. You can use ZRANK to answer the question:
Did Mary get a sent_comment_reply?
A hash, also per user; this time the field is the event (sent_comment_reply) and the value is its content, updated with the latest occurrence (body, date, etc.). This answers the question:
and if so, when was the latest?
Note: sorted sets are really fast, and in this example we are relying on the events as the data.
With sorted sets you can add, remove, or update elements in a very fast way (in a time proportional to the logarithm of the number of elements). Since elements are taken in order and not ordered afterwards, you can also get ranges by score or by rank (position) in a very fast way. Accessing the middle of a sorted set is also very fast, so you can use Sorted Sets as a smart list of non repeating elements where you can quickly access everything you need: elements in order, fast existence test, fast access to elements in the middle!
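A sketch of the two structures together (assuming an ioredis-style client; the key names and the use of millisecond timestamps as scores are assumptions):

async function recordEvent(redis, user, event, payload) {
  const now = Date.now();
  // Sorted set: one member per event type, scored by its latest occurrence.
  await redis.zadd(`events:${user}`, now, event);
  // Hash: the latest content (body, date, etc.) per event type.
  await redis.hset(`latest:${user}`, event, JSON.stringify({ ...payload, at: now }));
}

async function latestEvent(redis, user, event) {
  // Did Mary get a sent_comment_reply? (a null score means "never")
  const score = await redis.zscore(`events:${user}`, event);
  if (score === null) return null;
  // And if so, when was the latest? The score is the latest timestamp,
  // and the hash holds the latest payload.
  return JSON.parse(await redis.hget(`latest:${user}`, event));
}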
A possible approach to mapping an array onto a hash is as follows (sketched in Node.js, assuming an ioredis-style client):
async function addElement(redis, key, value) {
  const len = await redis.hlen(key); // current length = next array index
  await redis.hset(key, len, value); // note: HLEN + HSET is not atomic
}
This maps array element i to field i of the hash key.
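Hypothetical usage, inside an async function (key name made up):

await addElement(redis, "myarray", "A"); // stored under field "0"
await addElement(redis, "myarray", "B"); // stored under field "1"
const first = await redis.hget("myarray", "0"); // "A"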
This will work for some cases, but I would probably go with the answer suggested in https://stackoverflow.com/a/34886801/2868839
For example, key 1 will have values "A","B","C" but key 2 will have value "D". If I use
Map<String, List<String>>
I need to populate the List<String> even when I have only a single String value.
What data structure should be used in this case?
Map<String,List<String>> would be the standard way to do it (using a size-1 list when there is only a single item).
You could also have something like Map<String, Object> (which should work in Java or, presumably, C#, to name two), where the value is either a List<String> or a String. But this would be fairly bad practice: there are readability issues (you can't tell what Object represents right off the bat from the type), and casting happens at runtime, which isn't ideal, among other things.
It does, however, depend on what type of queries you plan to run. Map<String, Set<String>> might be a good idea if you plan on doing existence checks in the list and it can be large. Set<StringPair> (where StringPair is a class with two String members) is another consideration if there are plenty of keys with only one mapped value. There are plenty of solutions that would be more appropriate under various circumstances; it basically comes down to looking at the type of queries you want to perform and picking an appropriate structure accordingly. An example of the standard approach is sketched below.
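For illustration, the same pattern sketched in JavaScript (in Java the idiomatic form is map.computeIfAbsent(key, k -> new ArrayList<>()).add(value)):

const multimap = new Map();

function put(map, key, value) {
  // Create the list lazily on the first insert for a key, so
  // single-value keys need no special-casing.
  if (!map.has(key)) map.set(key, []);
  map.get(key).push(value);
}

put(multimap, "key1", "A");
put(multimap, "key1", "B");
put(multimap, "key1", "C");
put(multimap, "key2", "D");
console.log(multimap.get("key2")); // ["D"]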
I have a list of objects that are returned as a sequence, and I would like to retrieve the keys of each object so as to be able to display it correctly. At the moment I try data?first?keys, which seems to return something like the queries that produced the objects (not sure how to explain that last sentence either, but the image below shows what I'm trying to explain).
The number of objects returned is correct (7), but displaying the keys of each object is my aim. The macro that attempts this is here (from the Apache OFBiz development book, chapter 8).
It seems my sequence is a list of hashes and, as explained by Daniel Dekany in this post:
The original problem is that someHash[key] expects a string as key, because the hash type of FTL, by definition, maps string keys to arbitrary values. It's not the same as Java's Map. (Note that to further complicate matters, in FTL someSequenceOrString[index] expects an integer index. So the [] thing is used for that too.) Now someBeanWrappedMap(key) has technically nothing to do with all the []-s; it's just a method call, so it accepts all kinds of keys. If you have a Map with non-string keys, you must use that.
Thanks, D. Dekany, if you're on Stack; this ended my half-day frustration with the FTL template.
I am reading the source code of MapReduce to gain a better understanding of its internal mechanism, and I have a problem trying to understand how the data produced in the map phase are merged and sent to the reduce function for further processing. The source code looks too complicated, and I just want to understand its concepts.
What I want to know is how the values (the Iterator parameter) are sorted before being passed to the reduce() function. Within ReduceTask.runOldReducer() it creates a ReduceValuesIterator by passing a RawKeyValueIterator, where Merger.merge() gets called and lots of actions are performed (e.g. collecting segments). After reading the code, it seems to me it only tries to sort by key, and the values accompanying that key are aggregated/collected without being removed. For instance, map() may produce
Key Value
http://www.abcfood.com/aLink object A
http://www.abcfood.com/bLink object B
http://www.abcfood.com/cLink object C
Then in reduce(),
Key will be http://www.abcfood.com/ and Values will contain object A, object B, and object C.
So it is sorted by the key http://www.abcfood.com/? Is this correct? Or what is sorted and then passed to the reduce function?
Many thanks.
Assuming this is your input:
Key Value
http://www.example.com/asd object A
http://www.abcfood.com/aLink object A
http://www.abcfood.com/bLink object B
http://www.abcfood.com/cLink object C
http://www.example.com/t1 object X
the reducer will get this (there is no guarantee on the order of the values):
Key Values
http://www.abcfood.com/ [ "object A", "object C", "object B" ]
http://www.example.com/ [ "object X", "object A" ]
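Conceptually (this is not Hadoop code), the shuffle behaves something like this toy JavaScript sketch; grouping by the URL prefix up to the last slash is an assumption made to match the example above:

function shuffle(pairs) {
  const groups = new Map();
  for (const [url, value] of pairs) {
    // Grouping key: everything up to and including the last "/".
    const key = url.slice(0, url.lastIndexOf("/") + 1);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  // Keys reach the reducer in sorted order; value order is not guaranteed.
  return [...groups.entries()].sort(([a], [b]) => a.localeCompare(b));
}

shuffle([
  ["http://www.example.com/asd", "object A"],
  ["http://www.abcfood.com/aLink", "object A"],
  ["http://www.abcfood.com/bLink", "object B"],
  ["http://www.abcfood.com/cLink", "object C"],
  ["http://www.example.com/t1", "object X"],
]);
// [["http://www.abcfood.com/", ["object A", "object B", "object C"]],
//  ["http://www.example.com/", ["object A", "object X"]]]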
So is there any possibility of getting ordered values in the reducer?
I need to work with sorted values (to calculate the difference between values passed with a key). I've met this problem :)
http://cornercases.wordpress.com/2011/08/18/hadoop-object-reuse-pitfall-all-my-reducer-values-are-the-same/
I understand that it's bad to COPY the values in the reducer and then sort them; I could get a memory overflow. It would be better to sort the values in some way BEFORE passing the key + Iterable to the reducer.