I am currently doing the Ruby on the Web project for The Odin Project. The goal is to implement a very basic webserver that parses and responds to GET or POST requests.
My solution uses IO#gets and IO#read(maxlen) together with the Content-Length Header attribute to do the parsing.
Other solution use IO#read_nonblock. I googled for it, but was quite confused with the documentation for it. It's often mentioned together with Kernel#select, which didn't really help either.
Can someone explain to me what the nonblock calls do differently than the normal ones, how they avoid blocking the thread of execution, and how they play together with the Kernel#select method?
explain to me what the nonblock calls do differently than the normal ones
The crucial difference in behavior is when there is no data available to read at call time, but not at EOF:
read_nonblock() raises an exception kind of IO::WaitReadable
normal read(length) waits until length bytes are read (or EOF)
how they avoid blocking the thread of execution
According to the documentation, #read_nonblock is using the read(2) system call after O_NONBLOCK is set for the underlying file descriptor.
how they play together with the Kernel#select method?
There's also IO.select. We can use it in this case to wait for availability of input data, so that a subsequent read_nonblock() won't cause an error. This is especially useful if there are multiple input streams, where it is not known from which stream data will arrive next and for which read() would have to be called.
In a blocking write you wait until bytes got written to a file, on the other hand a nonblocking write exits immediately. It means, that you can continue to execute your program, while operating system asynchronously writes data to a file. Then, when you want to write again, you use select to see whether the file is ready to accept next write.
Related
Doing some tests in scm (a scheme interpreter), I've intentionally closed the current-input-port (equivalent to the standard input file descriptor). Once the program work in REPL, the things got crazy, printing systematically a error message. My question is: how could I recover the control of process, that means, how could I reestablish the input file descriptor of such process?
Search for "changing file descriptor of a running process" or something similar, I couldn't find a helpful article.
Thanks in advance
System information: Debian 10.
You almost certainly can't, although this does slightly depend on how the language-level ports are mapped to the underlying OS-level I/O system.
If what you do is close the OS-level standard input then all is lost:
the REPL tries to read from standard input, gets an error as it's closed;
it tries to raise some error which will involve prompting the user for input ...
... from standard input, which is closed, so it gets error;
game over.
The only way to survive this is for one of two things to be true:
either you've wrapped an error handler around the code which is already prepared to deal with this;
or the implementation is smart enough to recognise that it's getting closed-port errors in its closed-port error handler and gives up in some smart way.
Basically once the OS level standard input is gone anything that needs to get input from it is doomed: you can't put it back without OS-level surgery on the process.
However it's possible that the implementation maps a single OS-level I/O stream to multiple language-level streams, and closing only one of these streams would leave the system with some other stream-of-last-resort to which it can still talk, and which still refers to the OS-level standard input. Common Lisp is an example of a system which can (depending on configuration) do this. It has, for instance, *standard-input* *error-output*, *query-io*, *terminal-io* and other streams, and it's very possible to be in a situation where, for instance, *standard-input* has been closed causing read errors, but *query-io* still points somewhere with a human on the end of it.
I don't know if scm does that.
Specifically, if the following events take place in the given order:
Process 1 opens a file in append mode.
Process 2 opens the same file in append mode.
Process 2 gets an exclusive lock using flock(2) on the file descriptor.
Process 1 attempts to write to the file.
What happens?
Will the write return immediately with a code indicating failure? Will it hang until the lock is released, then write and return success? Does the behavior vary by kernel? It seems odd that the documentation doesn't cover this case.
(I could write a couple processes to test it on my system, but I don't know whether my test would be representative of the general case, and if anyone does know, I can anticipate this answer saving a lot of other people a lot of time.)
The write proceeds as normal. flock provides advisory locking. Locking a file exclusively only prevents others from getting a shared or exclusive lock on the same file. Calls other than flock are not affected.
I am performing very rapid file access in ruby (2.0.0 p39474), and keep getting the exception Too many open files
Having looked at this thread, here, and various other sources, I'm well aware of the OS limits (set to 1024 on my system).
The part of my code that performs this file access is mutexed, and takes the form:
File.open( filename, 'w'){|f| Marshal.dump(value, f) }
where filename is subject to rapid change, depending on the thread calling the section. It's my understanding that this form relinquishes its file handle after the block.
I can verify the number of File objects that are open using ObjectSpace.each_object(File). This reports that there are up to 100 resident in memory, but only one is ever open, as expected.
Further, the exception itself is thrown at a time when there are only 10-40 File objects reported by ObjectSpace. Further, manually garbage collecting fails to improve any of these counts, as does slowing down my script by inserting sleep calls.
My question is, therefore:
Am I fundamentally misunderstanding the nature of the OS limit---does it cover the whole lifetime of a process?
If so, how do web servers avoid crashing out after accessing over ulimit -n files?
Is ruby retaining its file handles outside of its object system, or is the kernel simply very slow at counting 'concurrent' access?
Edit 20130417:
strace indicates that ruby doesn't write all of its data to the file, returning and releasing the mutex before doing so. As such, the file handles stack up until the OS limit.
In an attempt to fix this, I have used syswrite/sysread, synchronous mode, and called flush before close. None of these methods worked.
My question is thus revised to:
Why is ruby failing to close its file handles, and how can I force it to do so?
Use dtrace or strace or whatever equivalent is on your system, and find out exactly what files are being opened.
Note that these could be sockets.
I agree that the code you have pasted does not seem to be capable of causing this problem, at least, not without a rather strange concurrency bug as well.
I am writing a win32 app which is using the namedpipe for inter-process communication. When one process is trying to writeFile, it will write the structure (tell other process how many bytes and other info), then it will write the actual data by calling WriteFile again.
The other process, when it is reading, it read the first msg, and then read the second msg based on the information got from the first msg.
My questions are:
If the server process is writing the data, but the client process hasn't read it yet, is it possible to lost the first msg when the client is reading? Example, when the server is calling WriteFile at the second time to write actual data, will the previous msg was overwritten?
Is there any best solution to use waitforsingleobject to sync?
Thanks
A pipe is a little like a real pipe -- when you write more to the pipe, it doesn't overwrite what was already in the pipe. It just adds more data to the pipe that will be delivered after the data that you previously wrote to the pipe.
I rarely find WaitForSingleObject useful for a pipe. If you want to block the current thread until it receives data from the pipe, you can just do a synchronous read, and it'll block until there's data. If you want to block until there's input from any of a number of sources, you usually want WaitForMultipleObjects or MsgWaitForMultipleObjects, so your thread will run when any of the sources has input to process.
The only times I can recall using WaitForSingleObject on a pipe were with a zero timeout, so the receiver would continue other processing if there was no pipe input, and every once in a while check if the pipe has some data to process. While it initially seems like PeekNamedPipe would work for this, it's really most useful for other purposes -- though it might work for you, to read the header data and figure out what other code to invoke to read and process the entire message.
Having said all that, I feel obliged to point out that I haven't written any new code using named pipes in quite a while. I can think of very few situations in which I'd even consider them today -- I'd almost always use sockets instead.
The systemu page says:
systemu can be used on any platform to return status, stdout, and stderr of any command. unlike other methods like open3/popen4 there is zero danger of full pipes or threading issues hanging your process or subprocess.
(https://github.com/ahoward/systemu)
Could anyone explain this a little bit?
Methods like popen and its various spinoffs are convenient and are part of the expected API for a full I/O library.
However, they must be used either casually or carefully because they are prone to deadlock. By casually, I mean, if you both write and read from the command, it's still OK as long as you either don't write a lot or don't read a lot. By carefully, I mean, you can move large amounts of data, but only if you keep the inner details of the operation in mind and deliberately engineer against deadlock.
Imagine writing lots of stuff to your popened command and then reading a result. If you write more than a pipe will buffer, then your process will sleep. That's OK in practice, most of the time, but what if the command has to write a lot of stuff? Now it may sleep and not finish reading input that you are sending. You won't finish sending input so you will never wake up and read results.
Deadlock!