Can I read from a file descriptor on one thread while writing from another? - thread-safety

As the title says, can I do this with a POSIX file descriptor? In my case, it's a serial device where I have one thread reading and another one writing.

Yes, you can do that with a serial port, no problem.
You could do it with a regular file too, though it would probably be confusing since you'd have to carefully manage the contents of the file so that the reader and writer aren't stepping on each other and especially carefully manage the seek pointer (use pread() and pwrite() which don't depend on the seek pointer). Obviously, with a serial port which has separate in & out directions and no concept of a seek pointer, it's more straightforward.

Related

Can I guarantee an atomic append in Ruby?

Based on Is file append atomic in UNIX? and other sources, it looks like on modern Linux, I can open a file in append mode and write small chunks (< PIPE_BUF) to it from multiple processes without worrying about tearing.
Are those limits extended to Ruby with syswrite? Specifically for this code:
f = File.new('...', 'a')
f.syswrite("short string\n")
Can I expect the write to not interleave with other process writing the same way? Or is there some buffering / potential splitting I'm not aware of yet?
Assuming ruby >= 2.3
I recently researched this very topic in order to implement the File appender in Rackstash.
You can find tests for this in the specs which I originally adapted from a blog post on this topic, whose code unfortunately is non-conclusive unfortunately since the author does not write to the file directly but through a pipe. Please read the comments there.
With (1) modern operating systems and (2) their usual local filesystems, the OS guarantees that concurrent appends from multiple processes do write interleave data.
The main points here are:
You need a reasonably modern operating system. Very old (or exotic) systems had lesser guarantees. Here, you might have to lock the file explicitly with e.g. File#flock.
You need to use a compatible filesystem. Most local filesystems like HFS, APFS, NTFS, and the usual Linux filesystems like ext3, ext4, btrfs, ... should be safe. SMB is one of the few network filesystems that also guarantees this. NFS and most FUSE filesystems are not safe in this regard.
Note that this mechanism does not guarantee that concurrent readers always read full writes. While the writes themselves are never interleaved, readers might read the partial results of in-progress writes.
As far as my understanding (and my tests) goes, the atomically writable size isn't even restricted to PIPE_SIZE. This limit only applies when writing to a pipe like a socket or e.g. STDOUT instead of a real file.
Unfortunately, authoritative information on this topic is rather sparse. Most articles (and SO answers) on this topic conflate strict appending with random writes. When not strictly appending (by opening the file in append-only mode), the guarantees are null and void.
Thus, to answer your specific question: Yes, your code from your question should be safe when writing to a local filesystem on modern operating systems. I think syswrite bypasses the file buffer already. To be sure, you should also set f.sync = true before writing to it to completely disable any buffering.
Note that you should still use a Mutex (or similar) if you plan to write to the single opened file from multiple threads in your process (since the append-guarantees of the OS are only valid for concurrent writes to different file-descriptors; it can not discern overlapping writes by the same process to the same file descriptor).
I would not assume that. syswrite calls the write POSIX function, which make no claims about atomicity when dealing with files.
See: Are POSIX' read() and write() system calls atomic?
And Understanding concurrent file writes from multiple processes
Tl;dr- you should implement some concurrency control in your app to synchronize this access.

Golang simultaneous read/write to the file without explicit file lock

I have a situation where I need to concurrently read/write from/to the file, but the scope of operations is limited:
append only, no random offset writes
read from random position, where I know for sure the content has been written before(via append, internal access serialization via golang channel to ensure random read happens only after content's been appended)
there is only one process running
This is a high loaded application and I would like to avoid locking file for each read/write I do
I was going to open 2 files - one for read, another for append only
would doing so create some potential issues/bugs?
what is the recommended practice if I would like to avoid file locking for each read/write I do?
p.s. golang, linux, ext4
I'll assume by "random read" you actually mean "arbitrary read".
If I understand your use case correctly, you don't need to seek or lock or do anything manual. UNIX has this covered via O_APPEND. Here is what you can do:
Open the file with os.O_APPEND. This way every write, regardless of any preceding operations, will go to the end of the file
When reading use File.ReadAt. This lets you specify arbitrary offsets for your reads
Using this scheme you can avoid any sort of locking: the OS will do it for you. Because of the buffer cache this scheme is not even inefficient: appends and reads are pretty much independent.

Why isn't copy operation implemented in kernel?

It's my understanding that most file IO operations are implemented in the kernel, such as CRUD, move or remove. However file copy is not implemented as a kernel level API.
In order to detect a file copy in the kernel one will need to use heuristics approach (discussion on this approach), e.g. as detect file reads, file creates and file writes from the same user with the same file name, but different paths.
Why copy is a user land operation?
First, because caring about whether or not two different files have the same content, where one file's content is copied directly from the other, is a user-space concern that has no logical reason to exist inside a kernel.
At best.
Bytes are bytes.
Second, how would the kernel distinguish copying a file between what are just two different file descriptors? See the man page for sendfile(). Why should the kernel track if the calling user called sendfile() to send the contents of a file to a TCP socket to who-knows-where or to another file?
Third, even if the kernel tracked copying a file, what on God's good Earth would it do with such data?
If you care about such file copy events, set up auditing.

how to use get_user to copy data from user space to kernel space

I want to copy an integer variable from user space to kernel space.
Can anyone give me a simple example how to do this?
I came to know that we can use get_user but i am unable to know how..
Check man pages of copy_to_user and copy_from_user.
Write a simple kernel module, with read/write operations, and register and char device for them, something like /dev/sample.
Do an application write/read, on fd opened by this application.
Now you need to implement the mechanism for transferring this data to kernel space and read back whatever returned.
- In write you do a copy_from_user, before this check passed buffer is valid or not.
- In read you do a copy_to_user.
Make sure error conditions are taken care of, and open call implementation should keep track of how many opens are there, if you want to implement multiple open, and this count should be decremented, when application calls a close on opened FD.
Do you follow ?

Appending data to a file from the Linux Kernel

I'm trying gather measurements of cycle counts for a particular sys call (sys_clone) in the linux kernel. That said, my process won't be the only one calling it and I can't know my pid ahead of time; so I'll have to record every invocation of it for every pid.
The problem that I've got is that the only ways I can figure out how to output this data (debugfs, sysfs, procfs) involve statically sized buffers, which will be quickly overwritten with irrelevant data from other processes calling sys_clone.
So, does anyone know how to append an arbitrary number of lines to a user space accessible file in linux?
You can take the printk()/klogd approach, and use a circular buffer that is exported via /proc. A user-space process blocks on reading your /proc file, and once it reads something that is removed from the buffer. In fact, you could take a look whether klogd/syslogd can be modified to also read your /proc file, thus you wouldn't need to implement the userspace part.
If you are good with something simpler, just printk() your information in a normalized form with some prefix, and then just filter it out from your syslog using this prefix.
There are a few more possibilities (e.g. using netlink to send messages to userspace), but writing to a file from the kernel is not something I'd recommend.
You could stash the counts in the right task_struct, and make it visible through a per-process file in /proc/<pid>/.

Resources