I have a C program that has a scanf call followed by a read call. I want to feed both inputs using printf.
printf 10 | program_name doesn't work for some reason; scanf correctly picks up 10, but the read call defaults to " " and doesn't even ask for input.
I want to use printf twice, once to pass input to scanf and the second time to pass input to read. How can I do this?
As a terrible hack, you need to ensure that scanf's buffer is full. Something like:
{ printf 10; dd if=/dev/zero bs=4094 count=1;
echo This text will go to the read if bufsize is 4096; } | program_name
The technique relies on scanf's stdio layer reading the first 4096 bytes to fill its buffer on its first read, leaving the rest of the data in the pipe for the raw read to pick up. The main problem is that it is extremely fragile and requires intimate knowledge of the buffering used. Overall, this is a terrible idea, but not too much worse than calling read after calling scanf on the same file descriptor.
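To see why the hack is needed at all, here is a minimal C sketch of the kind of program the question describes (the variable names and the final message are assumptions), showing why the raw read comes up empty when input arrives through a pipe:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int n = 0;
    char buf[64];
    ssize_t len;

    /* stdio typically fills a ~4096-byte buffer from fd 0 in a single
       underlying read, consuming everything the pipe currently holds */
    if (scanf("%d", &n) != 1)
        return 1;

    /* this raw read bypasses stdio's buffer, so whatever stdio already
       slurped is invisible to it; on a short piped input it returns 0
       (end of file) immediately instead of waiting for more input */
    len = read(STDIN_FILENO, buf, sizeof buf);

    printf("scanf got %d, raw read returned %zd bytes\n", n, len);
    return 0;
}
The dd padding in the hack exists precisely so that stdio's first fill stops at the 4096-byte boundary, leaving the echoed text behind in the pipe for the raw read.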
My program normally uses the controlling terminal to read input from the user.
// GetCtty gets the file descriptor of the controlling terminal.
func GetCtty() (*os.File, error) {
    return os.OpenFile("/dev/tty", os.O_RDONLY, 0)
}
I am currently constructing s := bufio.NewScanner(GetCtty()) several times during the program and reading the input with s.Scan() and s.Text(), which works nicely.
However, for testing I am simulating the following input on stdin to my CLI Go-Program
echo -e "yes\nno\nyes\n" | app
This does not work correctly, because the first s constructed and its s.Scan() will already have buffered the other test inputs, which are then not available to a new bufio.NewScanner and its subsequent scans.
I am wondering how I can make sure that only one line is read from the stdin stream by s *bufio.Scanner or how I can mock my input to the controlling terminal.
I had several guesses but I am not sure if they work:
using only one bufio.Scanner in the whole program is a solution but I did not want to go this way...
writing the buffered data back to GetCtty() with s.WriteTo(GetCtty()) (?) won't work, as the data would get appended rather than prepended to stdin?
somehow reading only a single line and not consuming more bytes; does that ultimately mean reading not in chunks but byte by byte (?)...
Wrap the reader in iotest.OneByteReader so the scanner reads one byte at a time and never consumes input beyond the line it returns:
s := bufio.NewScanner(iotest.OneByteReader(GetCtty()))
Short version:
how to read from STDIN (or a file) char by char while maintaining high performance using Ruby? (though the problem is probably not Ruby specific)
Long version:
While learning Ruby, I'm designing a little utility that has to read piped text data, find and collect the numbers in it, and do some processing.
cat huge_text_file.txt | program.rb
input > 123123sdas234sdsd5a ...
output > 123123, 234, 5, ...
The text input might be huge (gigabytes) and it might not contain newlines or whitespace (any non-digit char is a separator) so I did a char by char reading (though I had my concerns about the performance) and it turns out doing it this way is incredibly slow.
Simply reading char by char with no processing on a 900 KB input file takes around 7 seconds!
while c = STDIN.read(1)
end
If I input data with newlines and read line by line, the same file is read 100x faster.
while s = STDIN.gets
end
It seems like reading from a pipe with STDIN.read(1) doesn't involve any buffering, and every time a read happens, the hard drive is hit - but shouldn't it be cached by the OS?
Doesn't STDIN.gets read char by char internally until it encounters '\n'?
Using C, I would probably read the data in chunks, though then I would have to deal with numbers being split across buffer boundaries, and that doesn't look like an elegant solution for Ruby. So what is the proper way of doing this?
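For reference, here is a rough C sketch of what I mean by the chunked approach; because the partially assembled number survives across read calls, a number split by a buffer boundary is handled without extra logic (the buffer size is arbitrary and overflow of very long digit runs is ignored):
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[65536];
    ssize_t n;
    unsigned long long value = 0;   /* number currently being assembled */
    int in_number = 0;              /* inside a run of digits? */

    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] >= '0' && buf[i] <= '9') {
                value = value * 10 + (unsigned)(buf[i] - '0');
                in_number = 1;
            } else if (in_number) {
                printf("%llu\n", value);   /* a digit run just ended */
                value = 0;
                in_number = 0;
            }
        }
    }
    if (in_number)                          /* number ending at EOF */
        printf("%llu\n", value);
    return 0;
}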
P.S. Timing reading the same file in Python:
f = open('huge_text_file.txt')   # the file being timed
for line in f:
    line
f.close()
Running time is 0.01 sec.
f = open('huge_text_file.txt')
c = f.read(1)
while c:
    c = f.read(1)
f.close()
Running time is 0.17 sec.
Thanks!
This script reads the IO object word by word and executes the block every time 1000 words have been found or the end of the file has been reached.
No more than 1000 words will be kept in memory at the same time. Note that using " " as the separator means that "words" might contain newlines.
This script uses IO#each with a separator (a space in this case, to get an Enumerator of words), lazy to avoid doing any operation on the whole file content at once, and each_slice to get arrays of batch_size words.
batch_size = 1000
STDIN.each(" ").lazy.each_slice(batch_size) do |batch|
  # batch is an Array of batch_size words
end
Instead of using cat and |, you could also read the file directly:
batch_size = 1000
File.open('huge_text_file.txt').each(" ").lazy.each_slice(batch_size) do |batch|
  # batch is an Array of batch_size words
end
With this code, no number will be split, no logic is needed, it should be much faster than reading the file char by char and it will use much less memory than reading the whole file into a String.
I have a requirement where many threads will call the same shell script to perform some work and then write their output (data as a single text line) to a common text file.
Since many threads will try to write data to the same file, my question is whether Unix provides a default locking mechanism so that they cannot all write at the same time.
Performing a short single write to a file opened for append is mostly atomic; you can get away with it most of the time (depending on your filesystem). But if you want to be guaranteed that your writes won't interrupt each other, to write arbitrarily long strings, to perform multiple writes, or to perform a block of writes and be assured that their contents will be next to each other in the resulting file, then you'll want to lock.
While not part of POSIX (unlike the C library call for which it's named), the flock tool provides the ability to perform advisory locking ("advisory" -- as opposed to "mandatory" -- meaning that other potential writers need to voluntarily participate):
(
  flock -x 99 || exit # lock the file descriptor
  echo "content" >&99 # write content to that locked FD
) 99>>/path/to/shared-file
The use of file descriptor #99 is completely arbitrary -- any unused FD number can be chosen. Similarly, one can safely put the lock on a different file than the one to which content is written while the lock is held.
The advantage of this approach over several conventional mechanisms (such as using exclusive creation of a file or directory) is automatic unlock: If the subshell holding the file descriptor on which the lock is held exits for any reason, including a power failure or unexpected reboot, the lock will be automatically released.
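The same locking pattern is available from C through the flock(2) call the tool is named after. A minimal sketch (the function name, flags, and the missing error handling are illustrative assumptions):
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

int append_line(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* block until we hold an exclusive advisory lock on the file */
    if (flock(fd, LOCK_EX) == 0)
        write(fd, line, strlen(line));

    /* closing the descriptor releases the lock, just as leaving the
       subshell does in the flock(1) example above */
    close(fd);
    return 0;
}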
my question is whether Unix provides a default locking mechanism so
that they cannot all write at the same time.
In general, no. At least not something that's guaranteed to work. But there are other ways to solve your problem, such as lockfile, if you have it available:
Examples
Suppose you want to make sure that access to the file "important" is
serialised, i.e., no more than one program or shell script should be
allowed to access it. For simplicity's sake, let's suppose that it is
a shell script. In this case you could solve it like this:
...
lockfile important.lock
...
access_"important"_to_your_hearts_content
...
rm -f important.lock
...
Now if all the scripts that access "important" follow this guideline,
you will be assured that at most one script will be executing between
the 'lockfile' and the 'rm' commands.
But there's actually a better way, if you can use C or C++: use the low-level open call to open the file in append mode, and call write() to write your data, with no locking necessary. Per the write() man page:
If the O_APPEND flag of the file status flags is set, the file offset
shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
Like this:
#include <fcntl.h>     // open, O_WRONLY, O_APPEND
#include <string.h>    // strlen
#include <unistd.h>    // write

// process-wide global file descriptor
int outputFD = open( fileName, O_WRONLY | O_APPEND, 0600 );
.
.
.
// write a string to the file
ssize_t writeToFile( const char *data )
{
    return( write( outputFD, data, strlen( data ) ) );
}
In practice, you can write anything to the file - it doesn't have to be a NUL-terminated character string.
That's supposed to be atomic on writes up to PIPE_BUF bytes, which is usually something like 512, 4096, or 5120. Some Linux filesystems apparently don't implement that properly, so you may in practice be limited to about 1K on those file systems.
We've been using Cygwin (/usr/bin/x86_64-w64-mingw32-gcc) to generate Windows 64-bit executable files, and it had been working fine through yesterday. Today it stopped working in a bizarre way: it "caches" standard output until the program ends. I wrote a six-line example that does the same thing. Since we use the code in batch, I wouldn't worry, except that when I run a test case on the now-strangely-caching executable, it opens the output files, ends early, and does not fill them with data. (The same code on Linux works fine, but these guys are using Windows.) I know it's not gorgeous code, but it demonstrates my problem, printing the numbers "1 2 3 4 5 6 7 8 9 10" only after I press Enter.
#include <stdio.h>

main ()
{
    char q[256];
    int i;

    for (i = 1; i <= 10; i++)
        printf ("%d ", i);
    gets (q);
    printf ("\n");
}
Does anybody know enough CygWin to help me out here? What do I try? (I don't know how to get version numbers--I did try to get them.) I found a 64-bit cygwin1.dll in /cygdrive/c/cygwin64/bin and that didn't help a bit. The 32-bit gcc compilation works fine, but I need 64-bit to work. Any suggestions will be appreciated.
Edit: we found and corrected an unexpected error in the original code that caused the program not to populate the output files. At this point, the remaining problem is that cygwin won't show the output of the program.
For months, the 64-bit executable has properly generated the expected output, just as the 32-bit version did. Just today, it started exhibiting the "caching" behavior described above. The program sends many hundreds of lines, with many newline characters, through stdout. Now, when the 64-bit executable is created as above, none of these lines are shown until the program completes and the entire output is printed at once. Can anybody provide insight into this problem?
This is quite normal. printf outputs to stdout which is a FILE* and is normally line buffered when connected to a terminal. This means you will not see any output until you write a newline, or the internal buffer of the stdout FILE* is full (A common buffer size is 4096 bytes).
If you write to a file or pipe, output might be fully buffered, in which case output is flushed when the internal buffer is full and not when you write a newline.
In all cases, the buffers of a FILE* are flushed when you call fflush(..), when you call fclose(..), or when the program ends normally.
Your program will behave the same on Windows/Cygwin as on Linux.
You can add a call to fflush(stdout) to see the output immediately.
for (i = 1; i <= 10; i++) {
    printf ("%d ", i);
    fflush(stdout);
}
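Alternatively, if you would rather not sprinkle fflush calls through the code, you can change stdout's buffering mode once at startup with setvbuf. A sketch, assuming unbuffered output is acceptable for the whole run:
#include <stdio.h>

int main(void)
{
    /* _IONBF turns buffering off entirely; _IOLBF would give the usual
       terminal-style line buffering even when stdout is a pipe or file */
    setvbuf(stdout, NULL, _IONBF, 0);

    for (int i = 1; i <= 10; i++)
        printf("%d ", i);   /* now appears immediately */

    printf("\n");
    return 0;
}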
Also, do not use the gets() function.
If your real program "ends early" and does not write the data to the text files it's supposed to, it may be crashing due to a bug of yours before it finishes, in which case the buffered output will not be flushed out. Or, less likely, you are calling the _exit() function, which terminates the program without flushing the FILE* buffers (in contrast to the exit() function).
I have a script that looks something like:
while true; do
    read -t10 -d$'\n' input_from_serial_device </dev/ttyS0
    # do some costly processing on the string
done
The problem is that it will miss the next input from the serial device because it is burning CPU cycles doing the costly string processing.
I thought I could fix this by using a pipe, on the principle that bash will buffer the input between the two processes:
( while true; do
    read -d$'\n' input_from_serial_device </dev/ttyS0
    echo $input_from_serial_device
done ) | ( while true; do
    read -t10 input_from_first_process
    # costly string processing
done )
First, I want to check that I've understood pipes correctly and that this will indeed buffer the input between the two processes as I intended. Is this idea correct?
Secondly, if I get the input I'm looking for in the second process, is there a way to immediately kill both processes, rather than exiting from the second and waiting for the next input before exiting the first?
Finally, I realise bash isn't the best way to do this and I'm currently working on a C program, but I'd quite like to get this working as an intermediate solution.
Thank you!
The problem isn't the pipe. It's the serial device.
When you write
while true; do
    read -t10 -d$'\n' input_from_serial_device </dev/ttyS0
    # use a lot of time
done
the consequence is that the serial device is opened, a line is read from it, and it is then closed. It is not opened again until the # use a lot of time step is done. While the serial device is not open, incoming serial input is thrown away.
If the input is truly coming in faster than it can be processed, then buffering isn't enough. You'll have to throw input away. If, on the other hand, it's dribbling in at an average speed which allows for processing, then you should be able to achieve what you want by keeping the serial device open:
while true; do
    read -t10 -r input_from_serial_device
    # process input_from_serial_device
done < /dev/ttyS0
Note: I added -r -- almost certainly necessary -- to your read call, and removed -d$'\n', because that is the default.
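Since the question mentions that a C program is in the works anyway: the same keep-the-device-open idea looks roughly like this in C (a sketch that assumes the port is already configured, e.g. with stty; real code would also set termios options and handle timeouts):
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[1024];

    /* open the device once and keep it open; input arriving while we
       are busy processing stays queued in the tty's input buffer */
    FILE *tty = fopen("/dev/ttyS0", "r");
    if (tty == NULL) {
        perror("fopen /dev/ttyS0");
        return 1;
    }

    while (fgets(line, sizeof line, tty) != NULL) {
        line[strcspn(line, "\n")] = '\0';  /* strip the trailing newline */
        /* costly string processing goes here */
    }

    fclose(tty);
    return 0;
}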