Which should I choose: Redis TimeSeries or sorted sets? - stackexchange.redis

I need to insert and retrieve the key-value pairs below in Redis. As I am new to Redis, which should I choose for this data: Redis TimeSeries or sorted sets?
For BoxNo 1, the key-value pairs are:
key1 - value1
key2 - value2
key3 - value3
. . . . 200 key-value pairs
For BoxNo 2, the key-value pairs are:
key4 - value4
key5 - value5
. . . . 150 key-value pairs
The data above needs to be inserted into Redis as key-value pairs, and I want to retrieve the key-value pairs from Redis every 15 minutes.
As per my understanding of Redis TimeSeries, the structure for my data would be something like:
Label: boxno = 1
key: key1, value: value1
However, with TimeSeries I would need to create a separate time series for every key.
Is my understanding correct?
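For reference, this is roughly what I have in mind, sketched in Go (the go-redis client is used purely for illustration, and the values would have to be numeric, since TimeSeries samples can only hold numbers):

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// One series per key, labelled with its box number: TS.CREATE is needed once per key.
	rdb.Do(ctx, "TS.CREATE", "key1", "LABELS", "boxno", "1")
	// TimeSeries samples are numeric, so value1 has to be a number (42 here, just as a stand-in).
	rdb.Do(ctx, "TS.ADD", "key1", "*", 42)

	// Every 15 minutes: fetch the samples of all series labelled with boxno=1.
	res, err := rdb.Do(ctx, "TS.MRANGE", "-", "+", "FILTER", "boxno=1").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println(res)
}

So every key (key1, key2, ...) would become its own time series, and the box number would only survive as a label.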

Related

Redis: Sort hash by fields and get values

Let's assume I have a hash where the fields are UUIDs and the values are some strings:
HSET myhash d1663228-3ce9-4651-8931-9f929df1778b test_data
HSET myhash 05c16eb1-fb7b-4732-a94c-ed9a2745101e test_other_data
and in a separate sorted set I have the same UUIDs stored with scores:
ZADD myzset 1 05c16eb1-fb7b-4732-a94c-ed9a2745101e
ZADD myzset 2 d1663228-3ce9-4651-8931-9f929df1778b
What I want is to get all values from myhash sorted by their fields' scores in myzset.
The expected output is:
1) test_other_data
2) test_data
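The straightforward client-side way I can think of is to read the fields from myzset in score order and then fetch them from myhash in that same order; a rough sketch with the go-redis client (the client choice is just for illustration):

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Fields of myhash, ordered by their scores in myzset (ascending).
	fields, err := rdb.ZRange(ctx, "myzset", 0, -1).Result()
	if err != nil {
		panic(err)
	}

	// Fetch the hash values in that same order.
	values, err := rdb.HMGet(ctx, "myhash", fields...).Result()
	if err != nil {
		panic(err)
	}
	fmt.Println(values) // [test_other_data test_data]
}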

Query a table with primary key and two conditions on sort key

I'm trying to query a DynamoDB table using the partition key and a sort key. The sort key is a Unix date, so I want to query a given partition key for items whose sort key falls between two dates. I am currently able to achieve this with a table scan, but I have to move this to a query for the speed benefit. I am unable to find any decent examples online of people using a partition key and sort key to query their table.
I have carefully read through https://docs.aws.amazon.com/sdk-for-go/api/service/dynamodb/#DynamoDB.Query and understand that my parameters must go within the KeyConditionExpression.
I have also read through https://github.com/aws/aws-sdk-go/blob/master/service/dynamodb/expression/examples_test.go and understand it on the whole, but I just can't find the right syntax for the KeyConditionExpression.
I'd have thought it was something like this:
keyCond := expression.Key("accountId").
Equal(expression.Value(accountId)).
And(expression.Key("sortKey").
Between(expression.Value(fromDateDec), expression.Value(toDateDec)))
But this throws:
ValidationException: Invalid KeyConditionExpression: Incorrect operand type for operator or function; operator or function: BETWEEN, operand type: NULL
First, you need KeyAnd to combine the hash key condition and the sort key condition.
// keyCondition represents the key condition where the partition key
// "TeamName" is equal to value "Wildcats" and sort key "Number" is equal
// to value 1
keyCondition := expression.KeyAnd(
	expression.Key("TeamName").Equal(expression.Value("Wildcats")),
	expression.Key("Number").Equal(expression.Value(1)),
)
Now, instead of the equal condition on the sort key, you can use your between condition as follows:
// keyCondition represents the boolean key condition of whether the value
// of the key "foo" is between values 5 and 10
keyCondition := expression.KeyBetween(expression.Key("foo"), expression.Value(5), expression.Value(10))
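Putting the two together for your keys, it would look roughly like this (accountId, sortKey, fromDateDec and toDateDec are the names from your question, the dates are assumed to be numeric, and the table name is a placeholder):

package query

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/dynamodb"
	"github.com/aws/aws-sdk-go/service/dynamodb/expression"
)

// buildQueryInput builds the Query input for one accountId whose sortKey falls
// between fromDateDec and toDateDec.
func buildQueryInput(accountId string, fromDateDec, toDateDec int64) (*dynamodb.QueryInput, error) {
	keyCond := expression.KeyAnd(
		expression.Key("accountId").Equal(expression.Value(accountId)),
		expression.Key("sortKey").Between(expression.Value(fromDateDec), expression.Value(toDateDec)),
	)

	expr, err := expression.NewBuilder().WithKeyCondition(keyCond).Build()
	if err != nil {
		return nil, err
	}

	return &dynamodb.QueryInput{
		TableName:                 aws.String("myTable"), // placeholder table name
		KeyConditionExpression:    expr.KeyCondition(),
		ExpressionAttributeNames:  expr.Names(),
		ExpressionAttributeValues: expr.Values(),
	}, nil
}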

Composite key / secondary indexing strategy in Redis

Say I have some data of the following sort that I want to store in Redis:
* UUID
* State (e.g. PROCESSED, WAITING_FOR_RESPONSE)
* […] other values
The UUID and the State are the only two fields I will ever need to query on.
What data structure in Redis is most suited for this?
How would I go about structuring the keys?
Okay, I'm not sure I understand completely, but I'll try to go with it.
Assuming you need to look up all entities with state PROCESSED, you can use sets for this:
SADD PROCESSED 123-abcd-4567-0000
Then you can easily find all entities in the PROCESSED state. Do the same for each state you need:
SMEMBERS PROCESSED
You'll also want a hash for each of your entities holding its values:
HSET 123-abcd-4567-0000 state PROCESSED
HSET 123-abcd-4567-0000 otherproperty valuedata
This sets the "state" field in the hash for that UUID to PROCESSED. You'll need to keep the set and the hash in sync yourself, either with scripts or in your application code; one way is sketched below.
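For example, one way to keep them in sync is to update the state set and the entity hash together in a single MULTI/EXEC transaction; a rough sketch with the go-redis client (purely illustrative, using the UUID and the states from your question):

package main

import (
	"context"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Move the entity into the PROCESSED state: update the state sets and the
	// entity hash together in one MULTI/EXEC transaction so they stay in sync.
	_, err := rdb.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
		pipe.SRem(ctx, "WAITING_FOR_RESPONSE", "123-abcd-4567-0000") // drop it from the old state set
		pipe.SAdd(ctx, "PROCESSED", "123-abcd-4567-0000")
		pipe.HSet(ctx, "123-abcd-4567-0000", "state", "PROCESSED")
		return nil
	})
	if err != nil {
		panic(err)
	}
}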
So, in summary, you have two major structures:
Sets to store the state-to-UUID information: one set per state.
Hashes to store the UUID-to-properties information: one hash per entity.
Example Hash
123-abcd-4567-0000 => { state: PROCESSED, active: true }
987-zxy-1234-0000 => { state: PROCESSED, active: false }
But please clarify more if this doesn't seem to fit.
If you want to reduce your key space, since one hash per entity can be a lot of keys, you can create a hash per attribute instead:
HSET states 123-abcd-4567-0000 PROCESSED
Thus you have one hash per attribute; within it, the field is the entity's UUID and the value is that entity's value for the attribute.
Example Hash
state => { 123-abcd-4567-0000: PROCESSED, 987-zxy-1234-0000: PROCESSED }
active => { 123-abcd-4567-0000: true, 987-zxy-1234-0000: false }
RediSearch (a Redis module) supports adding a secondary index to existing data in Redis, such as hashes.
After defining a schema for the fields you would like to index, you can easily search based on those fields' values.
e.g.
127.0.0.1:6379> FT.CREATE myIdx ON HASH PREFIX 1 doc: SCHEMA title TEXT
127.0.0.1:6379> hset doc:1 title "mytitle" body "lorem ipsum" url "http://redis.io"
(integer) 3
127.0.0.1:6379> FT.SEARCH myIdx "@title:mytitle" LIMIT 0 10
1) (integer) 1
2) "doc:1"
3) 1) "title"
2) "mytitle"
3) "body"
4) "lorem ipsum"
5) "url"
6) "http://redis.io"

Map IDs to matrix rows in Hadoop/MapReduce

I have data about users buying products. I want to create a binary matrix of size |users| x |products| such that the element (i,j) in the matrix is 1 iff user_i has bought product_j, else the value is 0.
Now, my data looks something like
userA, productX
userB, productY
userA, productZ
...
UserIds and productIds are all strings. My problem is how to map these IDs to row indices (for users) and column indices (for products) in the matrix.
There are over a million unique userIds and roughly 3 million productIds.
To make the problem well-defined: given input like the userA, productX pairs above, how do I convert it to something like
1,1
2,2
1,3
where userA is mapped to row 1 of the matrix, userB is mapped to row 2, productX is mapped to column 1, and so on.
Given the size of the data, I have to use Hadoop MapReduce, but I can't think of a foolproof way of doing this efficiently.
This can be solved if we can do the following:
Dump unique userIds.
Dump unique productIds.
Map each unique userId in (1) to a row index.
Map each unique productId in (2) to a column index.
I can do (1) and (2) easily, but I am having trouble coming up with an efficient approach for (3) ((4) is solved the same way once (3) is solved).
I have a couple of solutions but they are not foolproof.
Solution 1 (naive) for step 3 above
Map all userIds and emit the same key (say "1") for all map tasks.
Have a long counter initialized to 0 in setup() of the reducer.
In the reduce(), emit the counter value along with the input userId and increment the counter by 1.
This would be very inefficient, since all of the unique userIds would be processed by a single reducer.
Solution 2 for step 3 above
While mapping userIds, emit each userId against a key that is an integer uniformly sampled from 1, 2, 3, ..., N (where N is configurable; N = 100, for example). In a way, we are partitioning the input set.
Within the mapper, use Hadoop counters to count the number of userIds assigned to each of those random partitions.
In the reducer's setup(), first read the counters from the map stage to determine how many IDs were assigned to each partition, and use them to compute the start and end row indices for that partition.
In reduce(), iterate over each userId while counting, and generate the matrix rowId as start_of_partition + counter.
context.write(userId, matrixRowId)
This method should work, but I am not sure how to handle cases where reducer tasks fail or are killed.
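To make the numbering scheme in Solution 2 concrete, here is a small sketch of just the arithmetic in Go (the per-partition counts would really come from the Hadoop counters and the loop would really run inside the reducer; the counts and userIds here are made up):

package main

import "fmt"

func main() {
	// Number of userIds assigned to each of the N partitions,
	// as reported by the mapper-side counters.
	partitionCounts := []int64{3, 2, 4}

	// start[p] is the first row index owned by partition p:
	// a running sum of the counts of all earlier partitions.
	start := make([]int64, len(partitionCounts))
	var offset int64
	for p, c := range partitionCounts {
		start[p] = offset
		offset += c
	}

	// Inside the reducer for partition p, each userId gets
	// rowId = start[p] + localCounter, then localCounter is incremented.
	p := 1
	userIds := []string{"userC", "userD"} // the userIds routed to partition p (illustrative)
	var localCounter int64
	for _, id := range userIds {
		rowId := start[p] + localCounter
		localCounter++
		fmt.Printf("%s -> row %d\n", id, rowId)
	}
}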
I believe there should be ways of doing this which I am not aware of. Can we use hashing/modulo to achieve this? How would we handle collisions at scale?

Sorted results from hbase scanner

How can I retrieve the values of an HBase column family in sorted order of those values?
For example:
column      value
-----------------
column:1    1
column:3    2
column:4    3
column:2    4
HBase itself won't do that. Instead, you could retrieve the array of KeyValues using the Result.raw() method [1], put them in a List, and sort it by passing your own Comparator to Collections.sort() [2].
[1] http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#raw()
[2] http://download.oracle.com/javase/6/docs/api/java/util/Collections.html#sort(java.util.List, java.util.Comparator)
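The sort itself is just a client-side sort with a custom comparator. Purely as an illustration of that step (this is not the HBase API; the column/value pairs are made up to match the example above, and Go is used only for the sketch):

package main

import (
	"fmt"
	"sort"
)

// cell stands in for an HBase KeyValue: a column qualifier and its value.
type cell struct {
	column string
	value  int
}

func main() {
	// Cells as they might come back from a scan, in column order.
	cells := []cell{
		{"column:1", 1},
		{"column:2", 4},
		{"column:3", 2},
		{"column:4", 3},
	}

	// Sort by value instead of by column, i.e. the "own comparator" step.
	sort.Slice(cells, func(i, j int) bool { return cells[i].value < cells[j].value })

	for _, c := range cells {
		fmt.Printf("%s\t%d\n", c.column, c.value)
	}
}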
