Pop multiple values from Redis data structure atomically? - data-structures

Is there a Redis data structure that supports atomically popping (getting and removing) multiple elements at once?
There are the well-known SPOP and RPOP, but they always return a single value. When I need the first N values from a set/list, I therefore have to call the command N times, which is expensive. Let's say the set/list contains millions of items. Is there anything like SPOPM "setName" 1000, which would return and remove 1000 random items from a set, or RPOPM "listName" 1000, which would return the 1000 right-most items from a list?
I know there are commands like SRANDMEMBER and LRANGE, but they do not remove the items from the data structure; the items would have to be deleted separately. However, if several clients read from the same data structure, some items could be read more than once and others deleted without ever being read! Hence atomicity is what my question is about.
I am also fine with such an operation having higher time complexity; I doubt it would be more expensive than issuing N (say 1000, as in the example above) separate requests to the Redis server.
I also know about transaction support. However, this sentence from the Redis docs discourages me from using transactions when parallel processes modify the set (destructively reading from it):
When using WATCH, EXEC will execute commands only if the watched keys were not modified, allowing for a check-and-set mechanism.

Use LRANGE with LTRIM inside a MULTI/EXEC transaction (many clients, e.g. redis-py, wrap pipelines in MULTI/EXEC by default), so the two commands run as one atomic unit. Your worry above about WATCH/EXEC does not apply here, because you run LRANGE and LTRIM as a single transaction with no possibility of commands from other clients coming between them. Try it out.
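As a sketch of that approach with redis-py (hypothetical helper name `atomic_pop_n`; redis-py's `pipeline()` wraps the queued commands in MULTI/EXEC by default, which is what makes the pair atomic):

```python
def atomic_pop_n(client, key, n):
    """Atomically read and remove the first n items of a Redis list.

    LRANGE and LTRIM are queued into one MULTI/EXEC transaction, so no
    other client can observe or modify the list between the two commands.
    """
    pipe = client.pipeline(transaction=True)  # MULTI ... EXEC
    pipe.lrange(key, 0, n - 1)   # read up to n items from the head
    pipe.ltrim(key, n, -1)       # drop those items from the list
    items, _ = pipe.execute()
    return items
```

Because nothing can run between the LRANGE and the LTRIM, each item is delivered to exactly one consumer even with many clients popping concurrently.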

To expand on Eli's response with a complete example for list collections, using the lrange and ltrim built-ins instead of Lua:
127.0.0.1:6379> lpush a 0 1 2 3 4 5 6 7 8 9
(integer) 10
127.0.0.1:6379> lrange a 0 3 # read 4 items off the top of the stack
1) "9"
2) "8"
3) "7"
4) "6"
127.0.0.1:6379> ltrim a 4 -1 # remove those 4 items
OK
127.0.0.1:6379> lrange a 0 999 # remaining items
1) "5"
2) "4"
3) "3"
4) "2"
5) "1"
6) "0"
If you wanted to make the operation atomic, you would wrap the lrange and ltrim in multi and exec commands.
Also, as noted elsewhere, you should probably ltrim by the number of returned items, not the number of items you asked for. E.g. if you did lrange a 0 99 but got 50 items, you would ltrim a 50 -1, not ltrim a 100 -1.
To get queue semantics instead of a stack, replace lpush with rpush.
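The trim-by-returned-count adjustment can be sketched like this (hypothetical function name; `r` stands for any redis-py-style client; note the two calls here are not wrapped in MULTI/EXEC, which is only safe with a single consumer):

```python
def pop_stack_batch(r, key, want):
    """Pop up to `want` items from the top of the stack (the left end)."""
    items = r.lrange(key, 0, want - 1)
    # Trim only as many items as we actually received, in case the
    # list held fewer than `want` elements.
    r.ltrim(key, len(items), -1)
    return items
```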

Starting from Redis 3.2, the SPOP command has an optional [count] argument to retrieve and remove multiple random elements from a set.
See http://redis.io/commands/spop#count-argument-extension
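For example, destructively draining a whole set in batches could look like this with a redis-py-style client (hypothetical function name `drain_set`; assumes Redis >= 3.2 for the count argument):

```python
def drain_set(client, key, batch=1000):
    """Destructively read an entire set in atomic batches via SPOP <count>."""
    while True:
        members = client.spop(key, batch)  # atomic multi-pop on the server
        if not members:
            break  # the set is empty
        yield from members
```

Because each SPOP is atomic on the server, several workers can drain the same set concurrently without ever receiving the same member twice.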

Here is a Python snippet that achieves this using redis-py and a pipeline:
from redis import StrictRedis

client = StrictRedis()

def get_messages(q_name, prefetch_count=100):
    pipe = client.pipeline()  # transactional (MULTI/EXEC) by default
    pipe.lrange(q_name, 0, prefetch_count - 1)  # get msgs (w/o pop)
    pipe.ltrim(q_name, prefetch_count, -1)      # trim (pop) list to new value
    messages, trim_success = pipe.execute()
    return messages
I was thinking that I could just do a for loop of pops, but that would not be efficient, even with a pipeline, especially if the list queue is smaller than prefetch_count. I have a full RedisQueue class implemented here if you want to look. Hope it helps!

If you want a Lua script, this should be fast and easy:
local result = redis.call('lrange',KEYS[1],0,ARGV[1]-1)
redis.call('ltrim',KEYS[1],ARGV[1],-1)
return result
then you don't have to loop.
update:
I tried to do this with srandmember (in 2.6) with the following script:
local members = redis.call('srandmember', KEYS[1], ARGV[1])
redis.call('srem', KEYS[1], unpack(members))
return members
but I get an error:
error: -ERR Error running script (call to f_6188a714abd44c1c65513b9f7531e5312b72ec9b):
Write commands not allowed after non deterministic commands
I don't know whether future versions will allow this, but I assume not; I think it would be a problem for replication. (Update: since Redis 3.2, a script can call redis.replicate_commands() to switch to effects replication, which lifts this restriction.)

Starting from Redis 6.2 you can pass a count argument to specify how many elements you want popped from the list. count is available for both LPOP and RPOP. This is the pull request that implements the count feature.
redis> rpush foo a b c d e f g
(integer) 7
redis> lrange foo 0 -1
1) "a"
2) "b"
3) "c"
4) "d"
5) "e"
6) "f"
7) "g"
redis> lpop foo
"a"
redis> lrange foo 0 -1
1) "b"
2) "c"
3) "d"
4) "e"
5) "f"
6) "g"
redis> lpop foo 3
1) "b"
2) "c"
3) "d"
redis> lrange foo 0 -1
1) "e"
2) "f"
3) "g"
redis> rpop foo 2
1) "g"
2) "f"
redis>

Redis 4.0+ now supports modules which add all kinds of new functionality and data types with much faster and safer processing than Lua scripts or multi/exec pipelines.
Redis Labs, the current sponsor behind Redis, has a useful set of extension modules called redex here: https://github.com/RedisLabsModules/redex
The rxlists module adds several list operations, including LMPOP and RMPOP, so you can atomically pop multiple values from a Redis list. The logic is still O(N) (essentially a single pop in a loop), but all you have to do is install the module once and send that custom command. I use it on lists with millions of items, with thousands popped at once, generating 500 MB+ of network traffic without issue.

I think you should look at Lua support in Redis. If you write a Lua script and execute it on Redis, it is guaranteed to run atomically (because Redis is single-threaded). No other queries will be served before your Lua script finishes (which also means you shouldn't implement a big task in Lua, or Redis will get slow).
So in this script you add your SPOP and RPOP calls, append the result of each Redis command to a Lua array, and then return the array to your Redis client.
What the documentation says about MULTI is that it implements optimistic locking: with WATCH, it retries the transaction until the watched value is no longer modified in between. If there are many writes to the watched value, this will be slower than the 'pessimistic' locking of many SQL databases (PostgreSQL, MySQL, ...), which in some manner 'stops the world' so that the query executes first. Pessimistic locking is not implemented in Redis; you could implement it yourself, but it is complex, and you probably don't need it (with few writes to this value, optimistic locking should be quite enough).
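For contrast, the optimistic-locking pattern described above can be sketched in Python (a hedged sketch: `pipe_factory` stands in for redis-py's `client.pipeline()`, and the `WatchError` class below is a stand-in for `redis.exceptions.WatchError`):

```python
class WatchError(Exception):
    """Stand-in for redis.exceptions.WatchError, raised when EXEC aborts."""

def pop_n_optimistic(pipe_factory, key, n):
    """Optimistically pop n items, retrying whenever another client intervenes."""
    while True:
        with pipe_factory() as pipe:
            try:
                pipe.watch(key)                     # WATCH: track the key
                items = pipe.lrange(key, 0, n - 1)  # immediate-mode read
                pipe.multi()                        # start queuing (MULTI)
                pipe.ltrim(key, len(items), -1)
                pipe.execute()                      # EXEC fails if key changed
                return items
            except WatchError:
                continue  # another client touched the key; retry
```

Under heavy write contention this loop may retry many times, which is exactly the cost a single Lua script or MULTI/EXEC block avoids.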

You can probably try a Lua script (script.lua) like this:
local result = {}
for i = 1, tonumber(ARGV[1]) do  -- pop exactly ARGV[1] items (a 0..N loop would pop N+1)
    local val = redis.call('RPOP', KEYS[1])
    if not val then
        break  -- the list ran out of items
    end
    table.insert(result, val)
end
return result
You can call it this way:
redis-cli eval "$(cat script.lua)" 1 "listName" 1000

Related

Resource management in Bash for running processes in parallel

We have a shared server with multiple GPU nodes and no resource manager. We make agreements such as: "this week you can use nodes ID1, ID2 and ID5". I have a program that takes this ID as a parameter.
When I need to run my program ten times with ten different sets of parameters $ARGS1, $ARGS2, ..., $ARGS10, I first run three commands:
programOnGPU $ARGS1 -p ID1 &
programOnGPU $ARGS2 -p ID2 &
programOnGPU $ARGS3 -p ID5 &
Then I must wait for any of those to finish, and if e.g. ID2 finishes first, I then run:
programOnGPU $ARGS4 -p ID2 &
As this is not very convenient when you have a lot of processes, I would like to automate this. I cannot use parallel, as I need to reuse the IDs.
The first use case is a script that needs to execute 10 commands, known a priori, of the type
programOnGPU $PARAMS -p IDX
and, when any of them finishes, assign its ID to the next one in the queue. Is this possible using bash without the overhead of something like SLURM? I don't need to check the state of the physical resource.
A general solution would be a queue in bash, or a simple command-line utility, to which I submit commands of the type
programABC $PARAMS
and it adds the GPU ID parameter and manages the queue, preconfigured to use only the given IDs, one ID at a time. Again, I don't want this layer to touch physical GPUs, only to ensure that it executes consistently over the allowed IDs.
This is very simple with Redis. It is a very small, very fast, networked, in-memory data-structure server. It can store sets, queues, hashes, strings, lists, atomic integers and so on.
You can access it across a network in a lab, or across the world. There are clients for bash, C/C++, Ruby, PHP, Python and so on.
So, if you are allocated nodes 1, 2 and 5 for the week, you can store those in a Redis list with LPUSH, using the Redis command-line interface from bash:
redis-cli lpush VojtaKsNodes 1 2 5
If you are not on the Redis host, add its hostname/IP-address into the command like this:
redis-cli -h 192.168.0.4 lpush VojtaKsNodes 1 2 5
Now, when you want to run a job, get a node with BRPOP. I specify an infinite timeout with the zero at the end, but you could wait a different amount of time:
# Get a node, blocking with an infinite timeout.
# brpop prints two lines (the key name, then the value), so keep only the value:
node=$(redis-cli brpop VojtaKsNodes 0 | tail -1)
run job on "$node"
# Give the node back
redis-cli lpush VojtaKsNodes "$node"
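The same check-out/check-in pattern, sketched in Python for illustration (hypothetical `run_jobs` helper; `client` stands for any redis-py-style client whose brpop returns a (key, value) pair):

```python
import threading

def run_jobs(client, jobs, pool_key="VojtaKsNodes"):
    """Run every job, each on whichever node is checked out of the pool first."""
    def worker(job):
        _, node = client.brpop(pool_key, 0)  # block until some node is free
        try:
            job(node)                        # e.g. invoke programOnGPU ... -p node
        finally:
            client.lpush(pool_key, node)     # check the node back in
    threads = [threading.Thread(target=worker, args=(j,)) for j in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Each worker blocks in BRPOP until a node is free, so at most as many jobs run concurrently as there are IDs in the pool.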
I would:
- keep the list of IDs: IDS=(ID1 ID2 ID5);
- make 3 files, one containing each ID;
- run <arguments xargs -P3 programOnGPUFromLockedFile, so that 3 processes run, one per argument;
- have each process nonblockingly try to flock the 3 files in a loop, endlessly (i.e. you can run more than 3 processes if you want to);
- when a process succeeds in flocking a file, it:
  - reads the ID from the file,
  - runs the action on that ID;
- when it terminates, it releases the flock, so the next process may flock the file and use the ID.
I.e. it's a very, very basic mutex lock. There are also other ways you can do it, for example with an atomic FIFO:
- Create a FIFO.
- Spawn one process for each argument you want to run; each process will:
  - read one line from the FIFO; that line is the ID to run on,
  - do the job on that ID,
  - write one line with the ID back to the FIFO.
- Then write one ID per line to the FIFO (in 3 separate writes, so that each write is hopefully atomic), so 3 processes may start.
- Wait until all except 3 child processes exit.
- Read 3 lines from the FIFO.
- Wait until all child processes exit.

Redis pipeline, dealing with cache misses

I'm trying to figure out the best way to implement Redis pipelining. We use redis as a cache on top of MySQL to store user data, product listings, etc.
I'm using this as a starting point: https://joshtronic.com/2014/06/08/how-to-pipeline-with-phpredis/
My question is: assuming you have an array of ids, properly sorted, you loop through the redis pipeline like this:
$redis = new Redis();
// Opens up the pipeline
$pipe = $redis->multi(Redis::PIPELINE);
// Loops through the data and performs actions
foreach ($users as $user_id => $username)
{
// Increment the number of times the user record has been accessed
$pipe->incr('accessed:' . $user_id);
// Pulls the user record
$pipe->get('user:' . $user_id);
}
// Executes all of the commands in one shot
$users = $pipe->exec();
What happens when $pipe->get('user:' . $user_id); is not available, because it hasn't been requested before or has been evicted by Redis, etc? Assuming it's result # 13 from 50, how do we a) find out that we weren't able to retrieve that object and b) keep the array of users properly sorted?
Thank you
I will answer the question with reference to the Redis protocol; how it works in a particular language is more or less the same.
First of all, let's check how Redis pipeline works:
It is just a way to send multiple commands to the server, execute them, and get the replies back. There is nothing special about it: you simply get an array with a reply for each command in the pipeline.
Pipelines are much faster because the round-trip time per command is saved: for 100 commands there is only one round trip instead of 100. In addition, Redis executes every command synchronously; 100 separate commands may have to compete 100 times with commands from other clients, whereas a pipeline is treated as one long command and therefore waits its turn only once.
You can read more about pipelining here: https://redis.io/topics/pipelining. One more note: because each pipelined batch runs uninterruptedly (in terms of Redis), it makes sense to send these commands in manageable chunks, i.e. don't send 100k commands in a single pipeline, which might block Redis for a long time; split them into chunks of 1k or 10k commands.
In your case you run in the loop the following fragment:
// Increment the number of times the user record has been accessed
$pipe->incr('accessed:' . $user_id);
// Pulls the user record
$pipe->get('user:' . $user_id);
The question is what gets put into the pipeline. Say you update data for the user ids u1, u2, u3, u4. The pipeline of Redis commands will then look like:
INCR accessed:u1
GET user:u1
INCR accessed:u2
GET user:u2
INCR accessed:u3
GET user:u3
INCR accessed:u4
GET user:u4
Let's say:
u1 was accessed 100 times before,
u2 was accessed 5 times before,
u3 was not accessed before and
u4 and its accompanying data do not exist.
The result will be in that case an array of Redis replies having:
101
u1 string data stored at user:u1
6
u2 string data stored at user:u2
1
u3 string data stored at user:u3
1
NIL
As you can see, Redis treats the value behind a missing INCR key as 0 and increments it. Finally, nothing is sorted by Redis: the results come back in the order requested.
The language binding, i.e. the Redis driver, just parses the protocol for you and gives you a view of the parsed data. Without keeping the order of commands it would be impossible for the Redis driver to work correctly, or for you as a programmer to deduce anything. Just keep in mind that the request is not duplicated in the reply, i.e. you will not receive the key for u1 or u2 when doing a GET, just the data for that key. Thus your implementation must remember that at position 1 (zero-based index) comes the result of the GET for u1.
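Putting that bookkeeping into code, a hedged Python sketch with a redis-py-style client (`fetch_users` is a hypothetical helper; a missed GET comes back as None, and each GET result sits right after its INCR in the reply array):

```python
def fetch_users(client, user_ids):
    """Pipeline INCR+GET per user; return (found, missing) preserving order."""
    pipe = client.pipeline()
    for uid in user_ids:
        pipe.incr('accessed:%s' % uid)
        pipe.get('user:%s' % uid)
    replies = pipe.execute()           # [count1, data1, count2, data2, ...]
    found, missing = {}, []
    for i, uid in enumerate(user_ids):
        data = replies[2 * i + 1]      # the GET result follows its INCR
        if data is None:               # cache miss: fall back to MySQL for this id
            missing.append(uid)
        else:
            found[uid] = data
    return found, missing
```

The same arithmetic applies to the PHP snippet above: element 2*i+1 of $pipe->exec() belongs to the i-th user id.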

Dealing with Memcached Race Conditions

I have two different sources of data which I need to marry together. Data set A will have a foo_key attribute which can map to Data set B's bar_key attribute with a one to many relationship.
Data set A:
[{ foo_key: 12345, other: 'blahblah' }, ...]
Data set B:
[{ bar_key: 12345, other: '' }, { bar_key: 12345, other: '' }, { bar_key: 12345, other: '' }, ...]
Data set A is coming from a SQS queue and any relationships with data set B will be available as I poll A.
Data set B is coming from a separate SQS queue that I am trying to dump into a memcached cache to do quick look ups on when an object drops into data set A.
Originally I was planning to use bar_key from the objects in data set B as the memcached key, but then I realized that the value could be overwritten, since many objects can share the same bar_key value. Then I thought I could make the key bar_key and the value an array of the SQS messages. But since I have multiple hosts polling the SQS queue, it might be possible that while one host checks whether the key is in memcached, reads it, appends the new message, and sets it back, another host performs the same operation, and the first host's append simply gets overwritten.
I've looked around at memcached key locking, but I'm not sure I understand it entirely. Would the solution be: when I get the key/value pair from memcached, I create a temporary dummy lock on a new key called bar_key_dummy that expires in x seconds, and if I try to fetch a key that has an active bar_key_dummy lock, I just send the SQS message back to the queue without deleting it, to try again in x seconds?
Here's some pseudocode for what I have going on in my head. Does this make any sense?
store = MemCache.new(host)

sqs_messages.poll do |message|
  dummy_key = "#{message.bar_key}_dummy"
  sqs.dont_delete_message && next unless store.get(dummy_key).nil?
  # set dummy_key in memcache with a value of 1 for 3 seconds
  store.set(dummy_key, 1, 3)
  temp_data = store.get(message.bar_key) || []
  temp_data << message
  store.set(message.bar_key, temp_data, 300)
  # delete dummy key when done in case shorter than x seconds
  store.delete(dummy_key)
end
Thanks for any help!
Memcached has a special operation for exactly this: cas (Compare-And-Swap).
The gets command returns an item along with a unique CAS token.
You can then modify the data and issue the update with the cas command, passing the original token.
If another client changed the value between the two commands, the update fails with an EXISTS error, and you can re-read and retry.
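A hedged sketch of that retry loop in Python, assuming a memcached client exposing gets/add/cas methods in the style of python-memcached (`append_with_cas` is a hypothetical helper):

```python
def append_with_cas(store, key, message, max_retries=10):
    """Append `message` to the list at `key` without losing concurrent writes."""
    for _ in range(max_retries):
        current = store.gets(key)          # read value + remember its CAS token
        if current is None:
            if store.add(key, [message]):  # add succeeds only if key is absent
                return True
            continue                       # another host created it first; retry
        if store.cas(key, current + [message]):
            return True                    # token still valid: write applied
        # cas failed (EXISTS): another host wrote in between; re-read and retry
    return False
```

Unlike the dummy-key lock, a failed cas never blocks other hosts; the loser simply re-reads and retries.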

Move a range of jobs to another queue with qmove

I have several (idle) jobs scheduled on a cluster that I want to move to another queue.
I can move a single job like this (where 1234 is the job id):
qmove newQueue 1234
But now I have hundreds of jobs that I want to move to newQueue. Is it possible to move them all? Using * as a wildcard operator does not work.
If the job ids are in sequential order, you could use Bash's brace expansion. For example:
$ echo {0..9}
0 1 2 3 4 5 6 7 8 9
Applied to moving all jobs with ids from 1000 to 2000, the qmove command would be:
qmove newQueue {1000..2000}
This might even work if there are job ids that you are not allowed to move (from other users, or jobs in the running state); they should simply be ignored (not tested).

Limitation in retrieving rows from a mongodb from ruby code

I have a code which gets all the records from a collection of a mongodb and then it performs some computations.
My program takes too much time, as coll_id.find().each do |eachitem| ... returns only 300 records at a time.
If I place a counter inside the loop, it prints 300 records and then sleeps for around 3 to 4 seconds before printing the counter values for the next set of 300 records:
coll_id.find().each do |eachcollectionitem|
  puts "counter value for record " + counter.to_s
  counter = counter + 1
  # ---- my computations here ----
end
Is this a limitation of the ruby-mongodb API, or does some configuration need to be done so that the code can access all the records at once?
How large are your documents? It's possible that the deserialization is taking a long time. Are you using the C extensions (bson_ext)?
You might want to try passing a logger when you connect; that could help sort out what's going on. Alternatively, can you paste in the MongoDB log? What's happening there during the pause?
