Sorting a Redis Set whose members are Hash keys

We have created three Hashes in Redis using the redis-cli REPL in this way:
hmset redishop:items:Articulo1 artist "Martin Wessely" price 12.99 name "Handcrafted Trees Mug"
hmset redishop:items:Articulo2 artist "Martin Wessely" price 13.99 name "Handcrafted Trees Mug"
hmset redishop:items:Articulo3 artist "Martin Wessely" price 14.99 name "Handcrafted Trees Mug"
I checked that the structures were created correctly in Redis, and they are there:
hgetall redishop:items:Articulo3
Now we add the Hashes to a Set in this way:
sadd redishop:list-all redishop:items:Articulo3
sadd redishop:list-all redishop:items:Articulo2
sadd redishop:list-all redishop:items:Articulo1
Now we are experimenting with the SORT command:
SORT redishop:list-all BY redishop:items:*->price
SORT redishop:list-all BY redishop:items:*->price GET redishop:items:*->price
SORT redishop:list-all BY redishop:items:*->price GET # GET redishop:items:*->price
We never get results; the Hashes referenced by the Set come back as null and I don't understand why.
On the other hand, if we create the Hashes and the Set this other way:
multi
hmset redishop:items:Articulo1 artist "Martin Wessely" price 12.99 name "Handcrafted Trees Mug"
sadd redishop:list-all Articulo1
hmset redishop:items:Articulo2 artist "Martin Wessely" price 13.99 name "Handcrafted Trees Mug"
sadd redishop:list-all Articulo2
hmset redishop:items:Articulo3 artist "Martin Wessely" price 14.99 name "Handcrafted Trees Mug"
sadd redishop:list-all Articulo3
exec
This way, the SORT command works perfectly and the Hashes are inserted in the Set. But I don't understand why, given the Redis documentation:
The MULTI command only marks the start of a transaction block; subsequent commands are queued for atomic execution using EXEC.
When I create a Hash with a key like key:key:key, it makes no difference whether I use :, , or -. And most importantly, according to the documentation, Redis is not creating a tree of structures:
https://redis.io/topics/data-types-intro
The docs say it is better, or good practice, to include : or dots in key names, but they don't say that this creates a tree of structures. So I don't understand why adding Articulo1 to the Set is OK while adding redishop:items:Articulo1 is wrong. In fact, hgetall Articulo1 returns null, while hgetall redishop:items:Articulo1 returns all the fields and values. It is very strange.
EXEC only executes all the queued commands, so the result should be the same with or without MULTI.
Any help or explanation on the subject would be greatly appreciated.
Thanks in advance.

Now we are experimenting with the SORT command
Beware of SORT's time complexity and memory requirements; I usually recommend against using it.
We never get results; the Hashes referenced by the Set come back as null and I don't understand why.
The problem lies in how you call SORT and specify the GET and BY clauses. SORT substitutes each Set member for the * in the pattern, and since your Set's members are already the complete (Hash) key names, the pattern must be just *->price: your original call was looking up keys like redishop:items:redishop:items:Articulo1, which don't exist. Here's how to do it with your example data:
127.0.0.1:6379> SORT redishop:list-all BY *->price
1) "redishop:items:Articulo1"
2) "redishop:items:Articulo2"
3) "redishop:items:Articulo3"
127.0.0.1:6379> SORT redishop:list-all BY *->price GET *->price
1) "12.99"
2) "13.99"
3) "14.99"
This way, the SORT command works perfectly
In this case you're populating the Set with only the "id" part of the key names, so the GET and BY clauses map to actual data. To clarify, this has nothing to do with the use (or lack) of MULTI blocks.
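
For completeness, a minimal sketch of the fix using the redis-py client (assuming a local Redis on the default port; the client library is my choice, not the question's):

import redis

r = redis.Redis(decode_responses=True)

# Recreate the first scenario: full key names as Set members.
for i, price in [(1, 12.99), (2, 13.99), (3, 14.99)]:
    key = f"redishop:items:Articulo{i}"
    r.hset(key, mapping={"artist": "Martin Wessely", "price": price,
                         "name": "Handcrafted Trees Mug"})
    r.sadd("redishop:list-all", key)

# SORT replaces * with the whole member, so the pattern is just *->price.
print(r.sort("redishop:list-all", by="*->price", get="*->price"))
# -> ['12.99', '13.99', '14.99']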

Related

Is getting data back from Redis SETS faster or more performant than HSETS?

I currently have a scenario where we are using Redis to store string field-value pairs within a hashed set (HSET).
The original reasoning for using hashed sets instead of plain sets was the ease of retrieving records with HSCAN in a GUI search bar, as opposed to plain SCAN, because it's easier to get the length of a hash to use as the COUNT argument.
I read in the Redis documentation that both GET and HGET execute with O(1) time complexity, but a member of my team thinks that if I store all the values under a single key, HGET basically returns the entire hash instead of the single field-value pair that I need.
So for a made up but similar example:
I have a Redis instance with a single Hashed Set called users.
The hashed set has 150,000 field:value pairs of username:email
If I execute hget users coolguy, is the entire hash returned, or just the email for user coolguy?
First of all, HSET does not create a "hashed set"; it creates a hash table. The mechanism behind the hash table and the set (which is indeed a hash set) in Redis is the same; the main difference is that the hash table also stores values.
To answer your question:
If I execute hget users coolguy, is the entire hash returned, or just the email for user coolguy?
Just the email for that user. You can also use HMGET to get the emails of multiple users at once. It's O(1) for each user you fetch, or O(n) for n users.
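
To illustrate, a hedged redis-py sketch (the example users are made up):

import redis

r = redis.Redis(decode_responses=True)
r.hset("users", mapping={"coolguy": "coolguy@example.com",
                         "otherguy": "otherguy@example.com"})

print(r.hget("users", "coolguy"))               # -> only coolguy's email
print(r.hmget("users", "coolguy", "otherguy"))  # -> both emails, one round trip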

Sort redis cache

How can I sort my Redis cache?
The data:
SADD key '{"id":250,"store_id":3,"url_path":"\/blog\/testblog123123",
"status":"Published","title":"TestBlog123123",
"description":"","image":null,"description_2":"",
"date":"2017-04-17","blogcategory":"Category 3"}'
Next I need to sort my KEY by id.
This works:
SORT key BY *->id DESC
... but only when:
id > 10
because Redis sorts only on the first number it parses.
Maybe I should use a different command to add the data, but I need to keep the JSON format.
You could use a sorted set from the start:
ZADD key 250 '{"id":250,"store_id":3,"url_path":"\/blog\/testblog123123",
"status":"Published","title":"TestBlog123123",
"description":"","image":null,"description_2":"",
"date":"2017-04-17","blogcategory":"Category 3"}'
I'm also not sure why a Set is used here at all, because the uniqueness of a Set element is only guaranteed for the whole JSON string. If your JSON serializer changes the order of two fields in the JSON dict, it produces another string that is unique again, and you'll end up with a dangling old string. The same applies if you add more fields to the string.
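
A short redis-py sketch of the sorted-set suggestion (field list trimmed for brevity): score each JSON blob by its numeric id so range queries come back already ordered, with no SORT call at all:

import json
import redis

r = redis.Redis(decode_responses=True)

post = {"id": 250, "store_id": 3, "url_path": "/blog/testblog123123",
        "status": "Published", "title": "TestBlog123123",
        "date": "2017-04-17", "blogcategory": "Category 3"}
r.zadd("key", {json.dumps(post): post["id"]})

# Highest ids first, matching the intent of SORT key BY *->id DESC:
print(r.zrange("key", 0, -1, desc=True))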

Using SORT command to get HASH fields from a sorted set in Redis

For example, in redis-cli I've tried to create a sorted set like this:
zadd sortedset 1 1 2 2 3 3
And I've created a hash like this:
hset data 1 hello
hset data 2 goodbye
hset data 3 sir
My goal is to store identifiers in sorted sets and to get the strings stored in the data Hash, ordered by the sorted set's ordering.
This is what I've tried so far:
sort sortedset by nosort get data->*
...which outputs:
1) (nil)
2) (nil)
3) (nil)
I was expecting the * wildcard to stand in for each identifier stored in the sorted set, but it seems the substitution to each concrete identifier is not performed.
Am I approaching the issue the right way, or is there another way to solve this?
Essentially, you are right, but the current implementation of the SORT command only accepts wildcards on the left side of the hash dereference (see lookupKeyByPattern in sort.c). That being the way it is at the moment, use a Lua script instead of SORT. For example, here's a quick and dirty one:
$ redis-cli eval "return redis.call('HMGET', KEYS[2], unpack(redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', '+inf')))" 2 sortedset data
1) "hello"
2) "goodbye"
3) "sir"
I've found that this is a use case that isn't actually covered by Redis for now.
Anyway, there's an alternative approach: a combination of sorted sets and HMGET.
If I store identifiers in a sorted set and fetch them by rank ranges with ZRANGE, it's easy to get paged results from a Hash by passing multiple fields to HMGET.
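
A sketch of that combination in redis-py, over the question's example data (the page size of 10 is arbitrary):

import redis

r = redis.Redis(decode_responses=True)

r.zadd("sortedset", {"1": 1, "2": 2, "3": 3})
r.hset("data", mapping={"1": "hello", "2": "goodbye", "3": "sir"})

page = r.zrange("sortedset", 0, 9)   # first page of up to 10 ids, by rank
print(r.hmget("data", page))         # -> ['hello', 'goodbye', 'sir']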

LevelDB: Iterate keys by insertion order

What's a good strategy for generating auto-incrementing keys in LevelDB? My goal is to be able to iterate over the keys in the order that they were inserted.
Two methods:
use the default comparator, but run the key through a function that converts the index key '1' to something like '000000001' and '20' to '000000020', so LevelDB will order them numerically;
define a custom comparator that converts the key from string to integer and then compares the integers.
With either of the two methods, you need to store one extra key-value pair in LevelDB, current_id ----> integer, or you can keep the current id in a separate file using mmap.
Then, in your own Add() function, you read the current id from the current_id key, insert the new key-value pair id ----> value, and increment current_id by one.
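
A hedged Python sketch of the first method (function names are illustrative): fixed-width keys make LevelDB's default bytewise comparator order numeric indexes correctly:

import struct

def pad_key(i: int, width: int = 9) -> bytes:
    # 9 -> b'000000009', 20 -> b'000000020': bytewise order now matches numeric order.
    return str(i).zfill(width).encode()

assert pad_key(9) < pad_key(20)                       # without padding, b'20' < b'9'
assert struct.pack(">Q", 9) < struct.pack(">Q", 20)   # fixed-width big-endian works too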
Since a LevelDB instance can only be accessed from one application at a time, you might as well use a 64-bit long and increment it in the application. When opening the DB (and before allowing any writes), to find the last inserted key you can use the SeekToLast() method of the Iterator.
As I just pointed out in a question on integer keys, if you want to use binary integers you need to create a custom Comparator for the database; otherwise you don't get them in ascending binary order. It's not hard, but you may have overlooked the need.
I'm not quite sure what you're asking. If the only data you are adding is keys that are supposed to record entries as a log, then yes, just use an integer key.
However, if you are inserting keys you are going to search for some other reason PLUS you want to later iterate them in insertion order, it gets a bit more complex.
Basically you want to insert two keys for each key value, using a prefix to determine whether a key is a "value key" or an "ordering key". E.g., say you have Frank, John, Sally and Amy as keys, and use prefix ~N for Name keys and ~I for Iterator keys.
The database then looks like the following; note that the "ordering keys" don't have a value associated with them, since we can get the names out of the key itself. I've shown it as if you used a two-digit string for the number, rather than using an integer value and needing a special Comparator. A small simulation follows the listing.
~I00Frank
~I01John
~I02Sally
~I03Amy
~NAmy => Amy's details
~NFrank => Frank's details
~NJohn => John's details
~NSally => Sally's details
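
A small Python simulation of this two-keys-per-record layout (a plain dict plus sorted() stands in for LevelDB's byte-ordered key space; the data is the example above):

records = {"Frank": "Frank's details", "John": "John's details",
           "Sally": "Sally's details", "Amy": "Amy's details"}

db = {}
for seq, name in enumerate(["Frank", "John", "Sally", "Amy"]):  # insertion order
    db["~I%02d%s" % (seq, name)] = None       # ordering key, value unused
    db["~N" + name] = records[name]           # value key, holds the details

# Iterate in insertion order: scan the ~I prefix and strip the two sequence digits.
for key in sorted(db):
    if key.startswith("~I"):
        print(key[4:])   # -> Frank, John, Sally, Amy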

Parallelized record combining - matching on multiple keys

I have been looking at using MapReduce to build a parallelized record combining system. The language doesn't matter, I can use a pre-existing library such as Hadoop or build my own if necessary, I'm not worried about that.
The problem that I keep running into, however, is that I need the records to be matched on multiple criteria. For example, I may need to match records on a person's name or on their phone number, but not necessarily on both.
For instance, given the following keys for each record:
'John Smith' and '555-555-5555'
'Jane Smith' and '555-555-5555'
'John Smith' and '555-555-1111'
I want the system to take all three records, figure out that they match on one of the keys, and combine them into a single combined record that has both names ('John Smith' and 'Jane Smith') as well as both phone numbers ('555-555-5555' and '555-555-1111').
Is this something that I can accomplish using MapReduce? If so, how would I go about matching the keys produced by the Map function so that all of the matched records can be passed into the Reduce function?* Alternatively, is there a different/better way I could be doing this? My only real requirement is that it be parallelized.
[*] Please note: I am assuming that the Reduce function could be used in such a way that each call to the Reduce function produces a single combined record, rather than the Reduce function producing a single result for the entire job.
You can definitely do this in the map/reduce paradigm.
Let's say that you're matching on anything containing "smith" or phone numbers starting with "555". You would canonicalize your search string into "smith|^555", for example. In the Map phase, you would do:
John Smith / 555-555-5555 → K: smith|^555, V = (John Smith,555-555-5555)
Jane Smith / 555-555-5555 → K: smith|^555, V = (Jane Smith,555-555-5555)
John Smith / 555-555-1111 → K: smith|^555, V = (John Smith,555-555-1111)
Since you've given them all the same key ("smith|^555") they will all be handed off to the same reducer instance, which would now get, as input:
K: smith|^555, V: [(John Smith,555-555-5555),(Jane Smith,555-555-5555),(John Smith,555-555-1111)]
Now, in your reducer step, you can instantiate a hash set for names and another one for numbers; when you're done processing the array of values, output all the keys from the names set and all the keys from the numbers set.
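
A self-contained Python sketch of this flow (the canonicalized key and the in-memory grouping are stand-ins for what a framework like Hadoop does between Map and Reduce):

from collections import defaultdict

records = [("John Smith", "555-555-5555"),
           ("Jane Smith", "555-555-5555"),
           ("John Smith", "555-555-1111")]

# Map: every matching record is emitted under the same canonicalized key.
SEARCH_KEY = "smith|^555"
mapped = [(SEARCH_KEY, rec) for rec in records]

# Shuffle: group values by key (the framework normally does this).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: one hash set for names, one for numbers, then output both.
for key, values in groups.items():
    names = {name for name, _ in values}
    numbers = {number for _, number in values}
    print(key, sorted(names), sorted(numbers))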
I don't think Map is useful here, because you can't really create a meaningful key for each record that will help identify the groupings of records.
It is not possible to implement this using Reduce either. Consider the example you gave: if you query for 'Jane Smith', you cannot tell at the time that the first record is related to the query, and so you will ignore it. In fact, you could end up chaining names and numbers together until you've got every record in the file. The only way to pick up all the matches is to iteratively scan over the list until you stop finding new links.
This is very easy to parallelize, though: you can share the records out among some number of threads, and each can search its own records for new links. I'd suggest treating these sets as rings of data, so that you can record the point you were searching with the most up-to-date information, and you know you're finished once all threads have done a complete loop.
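
For what it's worth, the transitive groupings described here can also be computed with a union-find (disjoint-set) structure, a different technique from the iterative scanning above; a minimal sketch over the question's data:

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving keeps lookups cheap
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

records = [("John Smith", "555-555-5555"),
           ("Jane Smith", "555-555-5555"),
           ("John Smith", "555-555-1111")]

# Link each record's name and number; records sharing either key merge.
for name, number in records:
    union(("name", name), ("number", number))

groups = {}
for name, number in records:
    groups.setdefault(find(("name", name)), []).append((name, number))
print(list(groups.values()))   # -> one group containing all three records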
