Buffered I/O in Chicken Scheme? - scheme

Racket has the nice read-bytes-async! function, which I believe exists in every other programming language in the world. It reads what it can from an input stream, without blocking, into a buffer, returning the number of bytes written.
Said function seems like an absolutely essential function for efficiently implementing, say, the Unix cat tool, yet Chicken Scheme seems to lack any such function. Of course, I can use (read-byte) and (write-byte), but that is slow and eats up all my CPU.
Even (copy-port) seems to not have any such implementation. Instead, before the stream is closed, the data is copied buffer-by-buffer only when the buffers fill. This means that (copy-port (current-input-port) (current-output-port)) does not behave like cat at all.
Am I just suffering from a terrible blind spot in reading the documentation, or does Chicken shockingly actually lack such a function? So cat can't even be written efficiently in Chicken?

I fixed my problem. The posix library has the file-read function that does what I want, albeit on a file descriptor. Fortunately, ports in Chicken are just thin wrappers around file descriptors; there is a port to file descriptor converter in the posix library as well.
Interestingly, these functions work on Windows as well. posix seems to not be limited to POSIX systems.
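For comparison, here is a minimal Python sketch of the same descriptor-level idea. It is not Chicken code, and copy_fd is a hypothetical name. It relies on os.read returning as soon as some data is available (up to the buffer size) instead of waiting for the buffer to fill, which is the behaviour a cat-like tool needs.

```python
import os

def copy_fd(src_fd, dst_fd, bufsize=65536):
    """Copy src_fd to dst_fd, forwarding each chunk as soon as it arrives."""
    total = 0
    while True:
        # Blocks only until *some* data is available, then returns up to bufsize bytes.
        chunk = os.read(src_fd, bufsize)
        if not chunk:  # an empty result means end of stream
            return total
        view = memoryview(chunk)
        while view:  # os.write may accept only part of the chunk
            written = os.write(dst_fd, view)
            view = view[written:]
        total += len(chunk)
```

Calling copy_fd(0, 1) would copy standard input to standard output, one available chunk at a time.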

As you said, the posix unit is the key, but for your question set-buffering-mode! seems more relevant.
It applies to any port.

Related

How to access FD_SETSIZE in Ruby? spawn cpp (??)

While reading up on I/O in Ruby, and refreshing my own limited knowledge of I/O in generally POSIX-friendly libc environments, I found a question here on Stack Overflow, poll() in Ruby?, which raised the question I have been researching.
The responses mentioned that Ruby provides a select method. However, they also raised a concern about using select under certain conditions on some operating systems, including Linux, e.g. when 1024 or more file descriptors may be open in the Ruby process.
Some of the responses to poll() in Ruby? suggested that calling select in such an environment could result in memory corruption in the application. Other documentation may not describe the concern as being that severe, and later reading has indicated that there may be a way to portably avoid calling select in such circumstances, but the question remains of how to address this portably for Ruby's select.
Reading more about it, I noticed that the "BUGS" section of the select(2) manual page on Linux discusses the issue at some length. The text mentions a constant, FD_SETSIZE, which apparently represents the exclusive upper limit on the file descriptor numbers that select can be expected to handle normally (roughly paraphrased).
Quoting the select(2) manual page:
POSIX allows an implementation to define an upper limit,
advertised via the constant FD_SETSIZE, on the range of file
descriptors that can be specified in a file descriptor set. The
Linux kernel imposes no fixed limit, but the glibc implementation
makes fd_set a fixed-size type, with FD_SETSIZE defined as 1024,
and the FD_*() macros operating according to that limit. To
monitor file descriptors greater than 1023, use poll(2) or
epoll(7) instead.
The implementation of the fd_set arguments as value-result
arguments is a design error that is avoided in poll(2) and
epoll(7).
According to POSIX, select() should check all specified file
descriptors in the three file descriptor sets, up to the limit
nfds-1. However, the current implementation ignores any file
descriptor in these sets that is greater than the maximum file
descriptor number that the process currently has open. According
to POSIX, any such file descriptor that is specified in one of
the sets should result in the error EBADF.
Towards making use of this in Ruby, albeit as a guess at an approach: what might be the best way to determine FD_SETSIZE for the Ruby environment?
If it were available as a constant, its value could be used in a conditional test before calling select on any open file descriptor. The Ruby program could then raise an exception internally instead of calling select on any file descriptor equal to or greater than FD_SETSIZE, at least on generally POSIX-friendly operating systems.
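The guard described here might look roughly like the following, sketched in Python and not in Ruby. ASSUMED_FD_SETSIZE is an assumption taken from the glibc value in the manual page quoted above; it is not read from the running system. wait_readable is a hypothetical helper name.

```python
import select

# Assumption: the glibc value quoted from select(2); not queried from the system.
ASSUMED_FD_SETSIZE = 1024

def wait_readable(fds, timeout=None, fd_setsize=ASSUMED_FD_SETSIZE):
    """Return the readable subset of fds, refusing descriptors select() cannot hold."""
    too_big = [fd for fd in fds if fd >= fd_setsize]
    if too_big:
        # Fail before select() is ever called with an out-of-range descriptor.
        raise ValueError(f"descriptors {too_big} are >= FD_SETSIZE ({fd_setsize})")
    readable, _, _ = select.select(fds, [], [], timeout)
    return readable
```

The check compares descriptor numbers, not the count of open descriptors, which matches the wording of the manual page.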
If there's no cleaner way to approach this, maybe it could be worked into the distribution tooling for a project, such as to determine that constant's value for the target architecture then to store it along with any other application constants? I'm guessing a cpp could be used for this - whether from GCC, LLVM, or any other toolchain - perhaps in some ways similar to sb-grovel.
Maybe there's some other way to approach this, and portably so? Perhaps there's already a constant for it, somewhere in Ruby?
Maybe there's already some checking about it, in the Ruby source code? I suppose it's time to look for that GitHub repository now....
Ruby does not export FD_SETSIZE in any way, so the only way to check the size is to use a compiler.
Instead of building your own extension, the most hassle-free way may be to use RubyInline, which makes the check very simple:
gem install RubyInline
Then:
require 'inline'
class FdTest
  inline do |builder|
    builder.c "
      long fd_setsize(void) {
        return FD_SETSIZE;
      }"
  end
end
puts FdTest.new.fd_setsize
=> 1024
This is even semi-portable to Windows, provided you are running under WSL, Cygwin, MinGW, or something similar. It might even work under Visual Studio, provided it is installed with C support.
Building it as an extension might be another solution to ensure better compatibility, which you can then ship with precompiled binaries for your required platforms.
It all depends on how much trouble you are willing to go through in order to extract this information on all possible platforms, since there really does not exist a fully platform independent solution to something like this.
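The preprocessor route raised in the question can also be sketched without compiling anything. The Python sketch below assumes a GCC- or Clang-compatible driver is available as cc and that sys/select.h defines FD_SETSIZE. probe_fd_setsize and PROBE_MARKER are hypothetical names. It returns None when no usable value can be extracted.

```python
import shutil
import subprocess

def probe_fd_setsize(compiler="cc"):
    """Ask the C preprocessor to expand FD_SETSIZE; return an int, or None on failure."""
    if shutil.which(compiler) is None:
        return None
    # The marker makes the expanded line easy to find in the preprocessor output.
    source = "#include <sys/select.h>\nPROBE_MARKER FD_SETSIZE\n"
    try:
        result = subprocess.run(
            [compiler, "-E", "-P", "-x", "c", "-"],  # preprocess stdin as C, no line markers
            input=source, capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    for line in result.stdout.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "PROBE_MARKER":
            try:
                return int(parts[1], 0)
            except ValueError:
                return None  # expanded to something other than a plain integer
    return None
```

A build script could run this once per target platform and store the result alongside the project's other constants.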

Windows: redirect ReadFile to run process and pipe its stdout

I was wondering how hard it would be to create a set-up under Windows where a regular ReadFile on certain files is redirected by the file system to actually run those files (e.g. via ShellExecute), with the new process's stdout then used as the file content streamed back to the caller of ReadFile.
What I envision the set-up to look like, is that you can configure it to denote a certain folder as 'special', and that this extra functionality is then only available on that folder's content (so it doesn't need to be disk-wide). It might be accessible under a new drive letter, or a path parallel to the source folder; the location it is hooked up to is irrelevant to me.
To those of you that wonder if this is a classic xy problem: it might very well be ;) It's just that this idea has intrigued me, and I want to know what possibilities there are. In my particular case I want to employ it to #include content in my C++ code base, where the actual content included is being made up on the spot, different on each compile round. I could of course also create a script to create such content to include, call it as a pre-build step and leave it at that, but why choose the easy route.
Maybe there are already ready-made solutions for this? I did an extensive Google search for it, but came out empty handed. But then I'm not sure I already know all the keywords involved to do a good search...
When coding up something myself, I think a minifilter driver might be needed intercepting ReadFile calls, but then it must at that spot run usermode apps from kernel space - not a happy marriage I assume. Or use an existing file system driver framework that allows for usermode parts, but I found the price of existing solutions to be too steep for my taste (several thousand dollars).
And I also assume that a standard file system (minifilter) driver might be required to return a consistent file size for such files, although the actual data size returned through ReadFile would of course differ on each call. Not to mention negating any buffering that takes place.
All in all I think that a create-it-yourself solution will take quite some effort, especially when you have never done Windows driver development in your life :) Although I see myself quite capable of learning up on it, the time invested will be prohibitive I think.
Another approach might be to hook ReadFile calls from the process doing the ReadFile, via IAT hooking or via code injection. But I want this solution to work more 'out of the box', i.e. all ReadFile requests for these special files trigger the correct behavior, regardless of origin. In my case I'd need to intercept the behavior of my C++ compiler (G++), but that one is called on the fly by the IDE, so I see no easy way to detect its startup and hook it up quickly before it does its ReadFile calls. And besides, I only want certain files to be special in this regard; intercepting all ReadFile calls for a certain process is overkill.
You want something like FUSE (which I have used to good effect many times), but for Windows. Apparently there's Dokan; I've never used it, but it seems to be well known enough (and, at the very least, it can serve as inspiration to see "how it's done").

Can a read() by one process see a partial write() by another?

If one process does a write() of size (and alignment) S (e.g. 8KB), then is it possible for another process to do a read (also of size and alignment S and the same file) that sees a mix of old and new data?
The writing process adds a checksum to each data block, and I'd like to know whether I can use a reading process to verify the checksums in the background. If the reader can see a partial write, then it will falsely indicate corruption.
What standards or documents apply here? Is there a portable way to avoid problems here, preferably without introducing lots of locking?
When a function is guaranteed to complete without there being any chance of any other process/thread/anything seeing things in a half finished state, it's said to be atomic. It either has or hasn't happened, there is no part way. While I can't speak to Windows, there are very few file operations in POSIX (which is what Linux/BSD/etc attempt to stick to) that are guaranteed to be atomic. Reading and writing are not guaranteed to be atomic.
While it would be pretty unlikely for you to write 2 bytes to a file and another process only see one of those bytes written, if by dumb luck your write straddled two different pages in memory and the VM system had to do something to prepare the second page, it's possible you'd see one byte without the other in a second process. Usually if things are page aligned in your file, they will be in memory, but again you can't rely on that.
Here's a list someone made of what is atomic in POSIX, which is pretty short, and I can't vouch for its accuracy. (I can't think of why unlink isn't listed, for example.)
I'd also caution you against testing what appears to work and running with it. The moment you start accessing files over a network file system (NFS on Unix, or SMB mounts on Windows), a lot of things that seemed to be atomic before no longer are.
If you want to have a second process calculating checksums while a first process is writing the file, you may want to open a pipe between the two and have the first process write a copy of everything down the pipe to the checksumming process. That may be faster than dealing with locking.
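A minimal Python sketch of the pipe idea follows. For brevity a thread stands in for the second process, and CRC32 stands in for whatever checksum the writer actually uses; both are assumptions, as are the function names. The checksummer only ever sees complete data handed to it through the pipe, so it never reads a half-written block from the file.

```python
import os
import threading
import zlib

def checksum_from_pipe(read_fd, result):
    """Read everything from the pipe and store its running CRC32 in result[0]."""
    crc = 0
    with os.fdopen(read_fd, "rb") as pipe:
        while True:
            chunk = pipe.read(8192)
            if not chunk:  # writer closed its end of the pipe
                break
            crc = zlib.crc32(chunk, crc)
    result[0] = crc

def write_with_checksum(path, blocks):
    """Write blocks to path and stream a copy of each to the checksumming thread."""
    read_fd, write_fd = os.pipe()
    result = [None]
    worker = threading.Thread(target=checksum_from_pipe, args=(read_fd, result))
    worker.start()
    with open(path, "wb") as out, os.fdopen(write_fd, "wb") as pipe:
        for block in blocks:
            out.write(block)
            pipe.write(block)  # the checksummer gets its own copy of the data
    worker.join()  # leaving the with-block closed the pipe, so the reader sees EOF
    return result[0]
```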

How do I send Hex data?

I am trying to communicate with a Modbus slave via either Modbus TCP or Modbus serial. The manufacturer (Partlow) has an ASCII communications manual (http://www.partlow.com/uploadedFiles/Downloads/1160%20ASCII%20Comms%20Manual.pdf) which looks like it differs from the standard communication methods (http://en.wikipedia.org/wiki/Modbus). A lot of existing code out there is set up to work with normal Modbus addressing of coils and such, whereas it seems (at least to me) to be different with these devices.
So, via Ruby or Perl, how can I send hex data? I may be doing everything fine, but if I write "0DFA" to a serial port, is that OK? Or do I need to convert it into a lower-level form first, or denote it somehow?
I've been working on this a lot and may have gotten myself mixed up (making things out to be more complicated than they are), but I am trying to establish communication with this meter, and I can see the TX activity light blink but no RX, which suggests my data format is wrong.
Been working off this mostly (and a few perl snippets here and there, trying to find something that works):
http://www.messen-und-deuten.de/modbus.html
I am communicating through a terminal server, which accepts Modbus TCP (which this script uses), but I'm having trouble applying what's in the communications manual to the code above to get the packet formatted correctly.
Are you talking about raw data? There are several ways, including
print HANDLE "\x{0D}\x{FA}";
printf HANDLE "%c%c", 0x0D, 0xFA;
print HANDLE "\015\372"; # octal notation
print HANDLE pack("C*", 0x0D, 0xFA);
syswrite HANDLE, "\x{0D}\x{FA}", 2;
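The difference between writing the text "0DFA" and writing the two raw bytes can be seen in a short Python sketch. Which form the meter expects depends on its protocol manual, so this only illustrates the distinction and is not a statement about what this device needs.

```python
raw = bytes.fromhex("0DFA")     # two bytes: 0x0D, 0xFA
text = "0DFA".encode("ascii")   # four bytes: the characters '0', 'D', 'F', 'A'

print(len(raw), raw)    # the raw form, as produced by the Perl lines above
print(len(text), text)  # the textual form, as produced by writing the string "0DFA"
```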
I would recommend you look at the RModBus library to help handle some of the intricacies of packet formation over TCP/IP from inside the Ruby language.
It is always possible that the device you are communicating with requires, or conversely avoids, the Modicon notation. That was a bit of a hiccup when I first tried reading registers from a PLC. The other "gotcha" that I've found with Modbus is that some of the addressing systems are offset by one due to quirks in their implementation.
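As an illustration of the off-by-one issue, the sketch below converts a traditional 5-digit Modicon reference into a table name and the zero-based address that goes on the wire. It assumes the conventional prefix scheme (0 for coils, 1 for discrete inputs, 3 for input registers, 4 for holding registers); a given device may number things differently. modicon_to_protocol is a hypothetical helper.

```python
def modicon_to_protocol(reference):
    """Split a 5-digit Modicon reference into (table name, zero-based address)."""
    tables = {
        0: "coil",
        1: "discrete input",
        3: "input register",
        4: "holding register",
    }
    prefix, number = divmod(reference, 10000)
    if prefix not in tables or number == 0:
        raise ValueError(f"not a valid 5-digit Modicon reference: {reference}")
    # Modicon numbering starts at 1, protocol addresses start at 0.
    return tables[prefix], number - 1
```

For example, reference 40001 maps to holding register address 0, and 40100 maps to address 99.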

Snoop interprocess communications

Has anyone tried to create a log file of interprocess communications? Could someone give me a little advice on the best way to achieve this?
The question is not quite clear, and comments make it less clear, but anyway...
The two things to try first are ipcs and strace -e trace=ipc.
If you want to log all IPC (which seems very intensive), you should consider instrumentation.
There are a lot of good tools for this; check out PIN in particular. From this section of the manual:
In this example, we show how to do
more selective instrumentation by
examining the instructions. This tool
generates a trace of all memory
addresses referenced by a program.
This is also useful for debugging and
for simulating a data cache in a
processor.
If you're doing some heavyweight tuning and analysis, check out TAU (Tuning and Analysis Utilities).
Communication to a kernel driver can take many forms. There is usually a special device file for communication, or there can be a special socket type, like NETLINK. If you are lucky, there's a character device to which read() and write() are the sole means of interaction - if that's the case then those calls are easy to intercept with a variety of methods. If you are unlucky, many things are done with ioctls or something even more difficult.
However, running 'strace' on the program using the kernel driver to communicate can reveal just about all it does - though 'ltrace' might be more readable if there happens to be libraries the program uses for communication. By tuning the arguments to 'strace', you can probably get a dump which contains just the information you need:
First, just eyeball the calls and try to figure out the means of kernel communication
Then, add filters to strace call to log only the kernel communication calls
Finally, make sure strace logs the full strings of all calls, so you don't have to deal with truncated data
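The last two steps can be sketched as a small Python helper that builds the strace command line. The syscall list, the log file name, and the program path are placeholders to adjust after eyeballing the unfiltered trace; strace_command is a hypothetical name.

```python
def strace_command(program_argv, syscalls=("openat", "read", "write", "ioctl"),
                   log_path="trace.log", max_string=4096):
    """Build an strace command line that logs only the chosen syscalls, untruncated."""
    return [
        "strace",
        "-f",                                 # follow child processes as well
        "-e", "trace=" + ",".join(syscalls),  # log only the calls of interest
        "-s", str(max_string),                # raise the limit on printed string length
        "-o", log_path,                       # write the trace to a file
        *program_argv,
    ]
```

The returned list can be passed to subprocess.run, e.g. subprocess.run(strace_command(["./myprog"])), where ./myprog is a placeholder for the program that talks to the driver.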
The answers which point to IPC debugging are probably not relevant, as communicating with the kernel almost never has anything to do with IPC (at least not the different UNIX IPC facilities).

Resources