Combining multiprocessing with gevent

If I use a multiprocessing pipe to pass data to a second process, and that second process uses a gevent pool to perform network operations, could I safely allow any of those green threads to read from the pipe as long as I use a gevent semaphore to control access?
I'd use gipc, but I prefer to keep the shared-data facilities of multiprocessing.

Short answer: yes, that should work!
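A minimal sketch of that arrangement, assuming the work items are URLs and that fetch() stands in for the real network operation (all names here are illustrative, not from the question). Each reader greenlet takes a gevent BoundedSemaphore before touching the pipe; since Connection.recv() is not gevent-aware, the sketch polls with conn.poll(0) and yields to the hub instead of blocking it:
import multiprocessing
import gevent
from gevent.pool import Pool
from gevent.lock import BoundedSemaphore

NUM_READERS = 4

def fetch(url):
    gevent.sleep(0.1)          # placeholder for the real network operation
    return url

def worker(conn):
    pool = Pool(20)                     # gevent pool for network operations
    pipe_lock = BoundedSemaphore(1)     # serializes access to the pipe

    def consume():
        while True:
            got = False
            item = None
            with pipe_lock:             # only one greenlet reads the pipe at a time
                if conn.poll(0):        # don't block the hub waiting on the pipe
                    item = conn.recv()
                    got = True
            if not got:
                gevent.sleep(0.05)      # nothing to read yet, yield to other greenlets
                continue
            if item is None:            # sentinel: no more work
                return
            pool.spawn(fetch, item)

    readers = [gevent.spawn(consume) for _ in range(NUM_READERS)]
    gevent.joinall(readers)
    pool.join()

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    proc = multiprocessing.Process(target=worker, args=(child_conn,))
    proc.start()
    for url in ["http://example.com/%d" % i for i in range(10)]:
        parent_conn.send(url)
    for _ in range(NUM_READERS):        # one sentinel per reader greenlet
        parent_conn.send(None)
    proc.join()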

Related

Limitations on file append when using in multi-processed environment

My process creates a log file and appends a new line at the end of the file by opening it in append mode, e.g.:
fopen("log.txt", "a");
The order of the writes is not critical, but I need to ensure that fopen always succeeds. My question is: can the call above be executed from multiple processes at the same time on Windows, Linux, and macOS without any race condition?
If not, what is the most common and easy way to ensure I can write to the log file? There is file locking, but a separate lock file (e.g. log.txt.lock) would also be possible. Could anyone share some insights or resources that go into more detail?
If you do not use any synchronization between the processes, you will very likely hit moments when several processes try to write to the file at once, and the best you can hope for is an interleaved mess of output.
To synchronize work across several processes (with the multiprocessing module), use a Lock. It prevents several processes from doing the protected work at the same time.
It will look something like this:
import multiprocessing

# create the lock in the main process and "send" it to the child processes
lock = multiprocessing.Lock()

# ...

# in a child process
with lock:
    do_some_work()
If you need a more detailed example, feel free to ask.
You can also check the example in the official docs.
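For example, here is a slightly more complete sketch along the same lines (the file name and the number of child processes are only illustrative): the Lock is created in the parent and passed to each child process, which holds it around every append to the shared log file.
import multiprocessing

def log_line(lock, text):
    with lock:                            # only one process writes at a time
        with open("log.txt", "a") as f:
            f.write(text + "\n")

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    procs = [
        multiprocessing.Process(target=log_line, args=(lock, "line from process %d" % i))
        for i in range(4)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()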

monetdbe: multiple connections reading vs writing

I am finding that with monetdbe (embedded, Python), I can import data to two tables simultaneously from two processes, but I can't do two SELECT queries.
For example, if I run this in Bash:
(python stdinquery.py < sql_examples/wind.sql &); (python stdinquery.py < sql_examples/first_event_by_day.sql &)
then I get this error from one process, while the other finishes its query fine:
monetdbe.exceptions.OperationalError: Failed to open database: MALException:monetdbe.monetdbe_startup:GDKinit() failed (code -2)
I'm a little surprised that it can write two tables at once but not read two tables at once. Am I overlooking something?
My stdinquery.py is just:
import sys
import monetdbe
monet_conn = monetdbe.connect("dw.db")
cursor = monet_conn.cursor()
query = sys.stdin.read()
cursor.executescript(query)
print(cursor.fetchdf())
You are starting multiple concurrent Python processes. Each of those tries to create or open a database on disk at the dw.db location. That won't work, because the embedded database processes are not aware of each other.
With the core C library of monetdbe, it is possible to write multi-threaded applications where each connecting application thread uses its own connection object; see the example written in C. Again, this only works for concurrent threads within a single monetdbe process, not for multiple concurrent monetdbe processes claiming the same database location.
Unfortunately, it is not currently possible with the Python monetdbe module to set up something analogous to the C example above. But in the next release it will probably
become possible to use e.g. concurrent.futures to write something similar in Python.
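Purely as a speculative sketch of what that concurrent.futures variant might look like once it is supported (it does not work with current releases, and the queries below are placeholders): one Python process, one connection per worker thread, all pointing at the same dw.db location.
from concurrent.futures import ThreadPoolExecutor
import monetdbe

def run_query(sql):
    conn = monetdbe.connect("dw.db")   # one connection per worker thread
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchdf()

queries = [
    "SELECT count(*) FROM wind",              # placeholder query
    "SELECT count(*) FROM first_event_by_day" # placeholder query
]

with ThreadPoolExecutor(max_workers=2) as pool:
    for df in pool.map(run_query, queries):
        print(df)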

redis: EVAL and the TIME

I like the Lua scripting in Redis, but I have a big problem with TIME.
I store events in a sorted set.
The score is the time, so that in my application I can view all events in a given time window.
redis.call('zadd', myEventsSet, TIME, EventID);
OK, but this is not working: I cannot access TIME (the server time).
Is there any way to get the time from the server without passing it as an argument to my Lua script? Or is passing the time as an argument the best way to do it?
This is explicitly forbidden (as far as I remember). The reasoning behind this is that your Lua functions must be deterministic and depend only on their arguments. What if this Lua call gets replicated to a slave with a different system time?
Edit (by Linus G Thiel): This is correct. From the redis EVAL docs:
Scripts as pure functions
A very important part of scripting is writing scripts that are pure functions. Scripts executed in a Redis instance are replicated on slaves by sending the script -- not the resulting commands.
[...]
In order to enforce this behavior in scripts Redis does the following:
Lua does not export commands to access the system time or other external state.
Redis will block the script with an error if a script calls a Redis command able to alter the data set after a Redis random command like RANDOMKEY, SRANDMEMBER, TIME. This means that if a script is read-only and does not modify the data set it is free to call those commands. Note that a random command does not necessarily mean a command that uses random numbers: any non-deterministic command is considered a random command (the best example in this regard is the TIME command).
There is a wealth of information on why this is, how to deal with this in different scenarios, and what Lua libraries are available to scripts. I recommend you read the whole documentation!
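Passing the time in as an argument, as the question suggests, keeps the script deterministic. Here is a hedged sketch of that approach using redis-py on the client side (the key and event names are made up); if you need the server's clock rather than the client's, the client can call the TIME command first (r.time() in redis-py) and pass the result in the same way.
import time
import redis

r = redis.Redis()

ZADD_EVENT = """
local set_key  = KEYS[1]
local now      = ARGV[1]
local event_id = ARGV[2]
return redis.call('zadd', set_key, now, event_id)
"""

def add_event(event_id):
    # the timestamp is computed outside the script, so the script stays
    # a pure function of its arguments and replicates safely
    r.eval(ZADD_EVENT, 1, "myEventsSet", time.time(), event_id)

add_event("event:42")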

Blocking read and write on anonymous pipe

I have created an anonymous pipe (using the pipe() system call on Linux and _pipe() on Windows). I wanted to know:
1. Are the read and write on this pipe blocking calls (i.e. if the pipe is full, will the write be blocked)?
2. Is there any chance of data being overwritten in an anonymous pipe? If yes, what would be a better alternative?
Thanks,
Manoj
Yes, the write blocks when the pipe is full, although that rarely happens on modern systems with lots of memory.
And no, data in an anonymous pipe is not overwritten; if that ever happened, it would be a serious bug.
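A small, Unix-only sketch of the first point (buffer sizes and sleep times are arbitrary): the writer below stalls in os.write once the pipe buffer fills and resumes only as the slow reader drains it; nothing gets overwritten.
import os
import time

read_fd, write_fd = os.pipe()
pid = os.fork()

if pid == 0:                              # child: fast writer
    os.close(read_fd)
    chunk = b"x" * 65536
    for i in range(4):                    # blocks once the pipe buffer is full
        os.write(write_fd, chunk)
        print("writer: wrote chunk", i, flush=True)
    os.close(write_fd)
    os._exit(0)
else:                                     # parent: slow reader
    os.close(write_fd)
    while True:
        data = os.read(read_fd, 65536)
        if not data:                      # writer closed its end
            break
        print("reader: got %d bytes" % len(data), flush=True)
        time.sleep(1)
    os.close(read_fd)
    os.waitpid(pid, 0)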

Scaling a ruby script by launching multiple processes instead of using threads

I want to increase the throughput of a script which does network I/O (a scraper). Instead of making it multithreaded in Ruby (I use the default 1.9.1 interpreter), I want to launch multiple processes. So, is there a system for doing this where I can track when one process finishes and re-launch it, so that I have X of them running at any time? Also, some will run with different command-line arguments. I was thinking of writing a bash script, but that sounds like a potentially bad idea if there already exists a method for doing something like this on Linux.
I would recommend not forking but instead using EventMachine (and the excellent em-http-request if you're doing HTTP). Managing multiple processes can be a bit of a handful, even more so than handling multiple threads, but going down the evented path is, in comparison, much simpler. Since you want to do mostly network I/O, which consists mostly of waiting, I think an evented approach would scale as well as, or better than, forking or threading. And most importantly: it will require much less code, and it will be more readable.
Even if you decide on running separate processes for each task, EventMachine can help you write the code that manages the subprocesses using, for example, EventMachine.popen.
And finally, if you want to do it without EventMachine, read the docs for IO.popen, Open3.popen and Open4.popen. All do more or less the same thing but give you access to the stdin, stdout, stderr (Open3, Open4), and pid (Open4) of the subprocess.
You can try fork: http://ruby-doc.org/core/classes/Process.html#M003148
It returns the child's PID, so you can check whether that process is still running and relaunch it when it finishes.
If you want to manage I/O concurrency, I suggest you use EventMachine.
You can either
implement a ThreadPool (a ProcessPool, in your case) or find an equivalent gem, or
prepare an array of all, let's say, 1000 tasks to be processed, split it into, say, 10 chunks of 100 tasks (10 being the number of parallel processes you want to launch), and launch 10 processes, each of which immediately receives 100 tasks to process (see the sketch below). That way you don't need to launch 1000 processes and then police that no more than 10 of them run at the same time.
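The second option is essentially language-agnostic. A rough sketch of the pattern, shown in Python purely to illustrate the shape of it (the URLs and the scrape body are placeholders; the same structure translates to Ruby with Process.fork):
import multiprocessing

def scrape(url):
    print("scraping", url)                # placeholder for the real network I/O

def work_on_chunk(chunk):
    for url in chunk:
        scrape(url)

if __name__ == "__main__":
    tasks = ["http://example.com/page/%d" % i for i in range(1000)]
    chunks = [tasks[i:i + 100] for i in range(0, len(tasks), 100)]   # 10 chunks of 100

    procs = [multiprocessing.Process(target=work_on_chunk, args=(c,)) for c in chunks]
    for p in procs:
        p.start()
    for p in procs:
        p.join()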
