poll() in Ruby?

I am currently porting a self-written network application from C++ to Ruby. This network application often needs to manage around 10,000 sockets at the same time, which means it needs quick access to any sockets that have readable data available, incoming connections, and so on.
I already learned while writing the C++ version that select() does not work for this case, because internally it uses 32 DWORDs (128 bytes) of bitmasks to manage at most 1024 sockets. Since I sometimes have to work with more than 10,000 sockets, this function did not suffice. I therefore had to switch to poll(), which also made the code more elegant because I did not have to re-add and remove all the file descriptors every time.
As far as I can see from the Ruby documentation, Ruby offers IO.select(), which is basically a wrapper for the C API (as far as I know). Unfortunately there seems to be no IO.poll(), which I would need for this particular application.
Does IO.select() have the same limitations as select() on WinSock and Berkeley sockets? If so, is there a way to work around them?

select cannot safely be used in programs that have more than 1024 file descriptors on a Linux system. This is because the underlying fd_set that the select system call uses is a fixed-size buffer, i.e. its size is set at compile time, not at run time.
From man 2 select:
int select(int nfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, struct timeval *timeout);
An fd_set is a fixed size buffer. Executing FD_CLR() or
FD_SET() with a value of fd that is negative or is equal to or
larger than FD_SETSIZE will result in undefined behavior.
Moreover, POSIX requires fd to be a valid file descriptor.
This means that if you have more than 1024 file descriptors in your program, and you use the select system call, you will end up with memory corruption.
If you want to use more than 1024 file descriptors in your program, you must use poll or epoll, and ensure that you never use select, or you will get random memory corruption. Changing the size of the file descriptor table through ulimit is very dangerous if you are using select. Don't do it.
Ruby's select does indeed appear to be implemented with the select system call, so while increasing ulimit may look like it works, under the hood memory corruption is happening:
https://github.com/ruby/ruby/blob/trunk/thread.c
Furthermore, some unrelated APIs in Ruby seem to use select (see thread_pthread.c), so it is probably also unsafe to use those, or any code that uses those APIs, within a Ruby program running with a file descriptor table larger than 1024.
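Ruby does not expose poll directly, but if you only need to wait on one descriptor at a time, the io/wait helpers avoid building an fd_set at all. A minimal sketch follows; the host, port, and timeout are placeholders, and the claim that recent CRuby implements these single-descriptor waits with poll/ppoll on Linux is my understanding, so verify it against your Ruby version before relying on it with descriptors above 1023.

require 'socket'
require 'io/wait'   # provides IO#wait_readable / IO#wait_writable

sock = TCPSocket.new('example.com', 80)   # placeholder host and port
sock.write("HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")

# Block until the socket is readable or 5 seconds pass.
# Returns the IO (truthy) on readiness, nil on timeout.
if sock.wait_readable(5)
  puts sock.readpartial(4096)
else
  warn "timed out waiting for data"
end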

The limitations on IO.select(), and in fact the number of open connections you can have per process, appear to be determined primarily by the underlying operating system. There is definitely no fixed 1024-socket limit.
For example, under Windows XP, I max out at 69 open sockets (even before I get to select). I'm sure that is probably tunable; I just don't know how.
Under Linux, the limitation is the number of open files allowed. By default, the limit is usually 1024 (run ulimit -a to check).
However, you can easily change this, e.g. with ulimit -n 10000.
I just ran a test and happily went well over 1024 active sockets created with TCPSocket.new, and using IO.select to test for ready data.
NB: there is a good example of IO.select usage in this GServer article.
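For a quick illustration here as well (this is not taken from the linked article), a minimal IO.select accept/echo loop might look like the sketch below; the port number and buffer size are arbitrary:

require 'socket'

server  = TCPServer.new(12345)   # arbitrary port
clients = []

loop do
  readable, = IO.select([server] + clients, nil, nil, 1)
  next unless readable

  readable.each do |io|
    if io == server
      clients << server.accept              # new incoming connection
    elsif io.eof?
      io.close                              # peer closed the connection
      clients.delete(io)
    else
      io.write(io.readpartial(4096))        # echo whatever arrived
    end
  end
end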

IO::Reactor may do what you need. It has a poll method that looks similar to what you're describing.

How to access FD_SETSIZE in Ruby?

While reading up on IO in Ruby, and refreshing my own admittedly limited knowledge of I/O in generally POSIX-friendly libc environments, I found a question here at Stack Overflow, poll() in Ruby?, which raises the issue I have been researching.
The responses mention that Ruby provides a select method. However, they also raise a concern about using select under certain conditions on some operating systems, including Linux, e.g. when 1024 or more file descriptors are open in the Ruby process.
Some of the responses to poll() in Ruby? suggest that calling select in such an environment can result in memory corruption in the application. Other documentation may not present the concern as being quite that severe, and, as later reading indicates, there may be a way to portably avoid calling select in those circumstances, but the question remains how to address this portably for Ruby's select.
Reading further, I noticed that the BUGS section of the Linux select(2) manual page discusses the issue at some length. The text mentions a constant, FD_SETSIZE, which apparently represents the exclusive upper bound on the file descriptor values that can be placed in an fd_set if select is to behave normally (roughly paraphrased).
Quoting the select(2) manual page:
POSIX allows an implementation to define an upper limit,
advertised via the constant FD_SETSIZE, on the range of file
descriptors that can be specified in a file descriptor set. The
Linux kernel imposes no fixed limit, but the glibc implementation
makes fd_set a fixed-size type, with FD_SETSIZE defined as 1024,
and the FD_*() macros operating according to that limit. To
monitor file descriptors greater than 1023, use poll(2) or
epoll(7) instead.
The implementation of the fd_set arguments as value-result
arguments is a design error that is avoided in poll(2) and
epoll(7).
According to POSIX, select() should check all specified file
descriptors in the three file descriptor sets, up to the limit
nfds-1. However, the current implementation ignores any file
descriptor in these sets that is greater than the maximum file
descriptor number that the process currently has open. According
to POSIX, any such file descriptor that is specified in one of
the sets should result in the error EBADF.
Towards making use of this in Ruby, albeit in what may be a guess of an approach: what would be the best way to determine FD_SETSIZE for the Ruby environment?
If it were available as a constant, its value could be used in a conditional test before calling select on any open file descriptor. The Ruby program could then raise an exception internally rather than call select on any file descriptor equal to or greater than FD_SETSIZE, at least on generally POSIX-friendly operating systems?
If there is no cleaner way to approach this, maybe it could be worked into a project's distribution tooling, determining that constant's value for the target architecture and storing it along with any other application constants? I am guessing a C preprocessor could be used for this, whether from GCC, LLVM, or any other toolchain, perhaps in some ways similar to sb-grovel.
Maybe there's some other way to approach this, and portably so? Perhaps there's already a constant for it, somewhere in Ruby?
Maybe there's already some checking about it, in the Ruby source code? I suppose it's time to look for that GitHub repository now....
Ruby does not export FD_SETSIZE in any way, so the only way to check the size is to use a compiler.
Instead of building your own extension, the approach with the least hassle may be to use RubyInline, which makes the check very simple:
gem install RubyInline
Then:
require 'inline'

class FdTest
  inline do |builder|
    builder.c "
      long fd_setsize(void) {
        return FD_SETSIZE;
      }"
  end
end

puts FdTest.new.fd_setsize
# => 1024
This is even semi-portable to Windows, provided you are running under WSL, Cygwin, MinGW, or something similar. It might even work under Visual Studio, provided it is installed with C support.
Building it as an extension might be another solution to ensure better compatibility, which you can then ship with precompiled binaries for your required platforms.
It all depends on how much trouble you are willing to go through in order to extract this information on all possible platforms, since there really is no fully platform-independent solution to something like this.
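Once you have that number, one possible use of it is a guard that refuses to hand IO.select any descriptor the fd_set cannot represent. This is only a sketch; safe_select and FD_SETSIZE_VALUE are names made up here, building on the FdTest class from the RubyInline snippet above.

FD_SETSIZE_VALUE = FdTest.new.fd_setsize   # value obtained via RubyInline

def safe_select(readers, writers = nil, errors = nil, timeout = nil)
  [readers, writers, errors].compact.flatten.each do |io|
    if io.fileno >= FD_SETSIZE_VALUE
      raise IOError,
            "fd #{io.fileno} is >= FD_SETSIZE (#{FD_SETSIZE_VALUE}); use a poll/epoll-based API instead"
    end
  end
  IO.select(readers, writers, errors, timeout)
end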

How to control RAM usage for a specific user application with cgroups and systemd?

I am kind of fed up with browsers' memory use. I would like to limit the total memory used by Chrome, Opera, Firefox, etc. to, for example, 800 MB.
It looks like a job for cgroups.
I've read about cgexec and it would do what I want...
However, I would also like to "prepare" a group called "internet", using a method similar to the one described here:
https://wiki.archlinux.org/index.php/cgroups#Persistent_group_configuration
And since it mentions:
Note: when using Systemd >= 205 to manage cgroups, you can ignore this file entirely.
I'm a bit scared. (And Google finds results relevant to the pre-systemd situation, but the current situation is a blur.)
Since systemd looks like it is becoming the new standard, how do I do this in a way that has long-term support?
(...And am I missing or messing something up here? Because it is quite unclear to me, to be honest.)
I generally think it is a bad idea, since Chrome will probably crash when it is not able to allocate any more memory.
Alternatively, it will swap its data to the disk which is even worse.
Chrome's high memory consumption is what makes it fast.
If you insist on creating a cgroup for your browsers, I suggest creating a script that first creates the cgroup if it does not exist and then runs the application given in the script's arguments.
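As a rough illustration only, such a wrapper could look like the Ruby script below. It assumes a cgroup v1 memory controller mounted at /sys/fs/cgroup/memory and enough privileges to create the group, and it sidesteps systemd entirely; on systemd >= 205 a slice or scope unit (e.g. via systemd-run with a memory limit property) would be the supported route.

#!/usr/bin/env ruby
# Usage (hypothetical): sudo ./limit-internet.rb firefox
require 'fileutils'

CGROUP = '/sys/fs/cgroup/memory/internet'      # assumes cgroup v1 layout
LIMIT  = 800 * 1024 * 1024                     # 800 MB in bytes

FileUtils.mkdir_p(CGROUP)                      # create the group if missing
File.write(File.join(CGROUP, 'memory.limit_in_bytes'), LIMIT.to_s)

# Move this process into the group, then replace it with the target program,
# which inherits the cgroup membership.
File.write(File.join(CGROUP, 'cgroup.procs'), Process.pid.to_s)
exec(*ARGV)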

What memory allocation functions can be called from the interrupt environment in AIX?

When I write an AIX kernel extension, xmalloc can be used only in the process environment.
What memory allocation functions can be called from the interrupt environment in AIX?
Thanks.
The network memory allocation routines. Look in /usr/include/net/net_malloc.h. The lowest level is net_malloc and net_free.
I don't see much documentation in IBM's pubs or on the internet. There are a few examples in various header files.
There is no public prototype that I can find for these.
If you look in net_malloc.h, you will see MALLOC and NET_MALLOC macros defined that call it. Then if you grep all the files under /usr/include, you will see uses of these macros. From those uses, you can deduce the arguments to the macros and thus the arguments to net_malloc itself. I would write one routine that is a pass-through to net_malloc, so that you control the interface to it.
On your target system, run "netstat -m". The last bucket size you see is the largest size you can request from net_malloc with the M_NOWAIT flag. M_WAIT can be used only at process time and waits for netm to allocate more memory if necessary. M_NOWAIT returns 0 if there is not enough pinned memory available. At interrupt time, you must use M_NOWAIT.
There is no real checking for the "type" but it is good to pick an appropriate type for debugging purposes later on. The netm output from kdb shows the type.
In a similar fashion, you can figure out how to call net_free.
It's sad that IBM has chosen not to document this. An alternative way to get this information officially is to pay for an "ISV" question. If you are doing serious AIX development, you want to become an ISV / Partner; it will save you a lot of heartache. I don't know the cost, but it is within reach of small companies and even individuals.
This book is nice to have too.

Can a read() by one process see a partial write() by another?

If one process does a write() of size (and alignment) S (e.g. 8KB), then is it possible for another process to do a read (also of size and alignment S and the same file) that sees a mix of old and new data?
The writing process adds a checksum to each data block, and I'd like to know whether I can use a reading process to verify the checksums in the background. If the reader can see a partial write, then it will falsely indicate corruption.
What standards or documents apply here? Is there a portable way to avoid problems here, preferably without introducing lots of locking?
When a function is guaranteed to complete without any chance of another process, thread, or anything else seeing things in a half-finished state, it is said to be atomic. It either has or has not happened; there is no partway state. While I can't speak to Windows, there are very few file operations in POSIX (which is what Linux/BSD/etc. attempt to stick to) that are guaranteed to be atomic. Reads and writes are not guaranteed to be atomic.
While it would be pretty unlikely for you to write 2 bytes to a file and have another process see only one of those bytes, if by dumb luck your write straddled two different pages in memory and the VM system had to do something to prepare the second page, it is possible you would see one byte without the other in a second process. Usually, if things are page-aligned in your file, they will also be page-aligned in memory, but again you can't rely on that.
Here's a list someone made of what is atomic in POSIX, which is pretty short, and I can't vouch for its authenticity. (I can't think of why unlink isn't listed, for example.)
I'd also caution you against testing what appears to work and running with it: the moment you start accessing files over a network file system (NFS on Unix, or SMB mounts on Windows), a lot of things that seemed atomic before no longer are.
If you want to have a second process calculating checksums while a first process is writing the file, you may want to open a pipe between the two and have the first process write a copy of everything down the pipe to the checksumming process. That may be faster than dealing with locking.
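A minimal Ruby sketch of that pipe arrangement, just to make the idea concrete; the 8 KB block size and the trailing 32-byte SHA-256 layout are assumptions for the example, not something from the question.

require 'digest'

BLOCK = 8192                      # assumed block size: payload + 32-byte SHA-256

reader, writer = IO.pipe

checker = fork do                 # checksumming process
  writer.close
  while (block = reader.read(BLOCK))
    payload, stored = block[0...-32], block[-32, 32]
    warn 'checksum mismatch' unless Digest::SHA256.digest(payload) == stored
  end
  reader.close
end

reader.close
# The writing process sends a copy of every block down the pipe as it writes
# the real file; a single writer and a single reader keep the byte stream intact.
payload = 'x' * (BLOCK - 32)
writer.write(payload + Digest::SHA256.digest(payload))
writer.close
Process.wait(checker)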

Creating a bomb-proof worker process (on Windows)

I am writing a PDF viewer that uses various libraries written in C. This C code is potentially easy to exploit, and there are just too many lines to check. I will have to assume that this code may contain exploitable bugs.
The thing is, the C code is quite straightforward. A stream of bytes goes in at one end, and a bitmap (also a stream of bytes) comes out at the other.
Inspired by Google Chrome, I am thinking of creating a separate process that does the decoding and page rendering. Ideally, this should be executed in a process that has absolutely no rights to do anything except read the one input stream it has and output a stream of bytes (some uncompressed bitmap) at the other end.
What I think the process should not be able to do:
any disk access
opening sockets
using more than a limited amount of memory
accessing shared memory of other processes
loading other DLLs
... anything else?
Is that possible? Is this described somewhere?
If you have the source code, you can check that it doesn't do the things described.
Limiting available memory is a bit more difficult, though. You may, however, use SetProcessWorkingSetSize.
Also, after you've built the executable, you may check its DLL import table (with Dependency Walker) to ensure it doesn't call any file or socket functions.
This isn't really possible. Ultimately any potential exploit code will be running with whatever privileges this process runs with. If you run it as a standard user then you will limit the damage that could be done, but your best bet is to just fix the code as much as possible.
