I am running a program (pianobar) with its output piped to a text file; it produces new output every second. The resulting file ("pianobarout.txt") needs to be cleared regularly, or it grows to massive proportions.
I have tried running > pianobarout.txt as well as echo "" > pianobarout.txt, but both cause the system's resources to spike heavily for almost 30 seconds, causing the audio from pianobar to skip. I tried removing the file, but it appears that the file is not recreated after being deleted, and I just lose the pipe.
I'm working from Python, so if any library there can help, it's available to me.
Any ideas?
If you are currently redirecting with truncation, like yourprogram > file.txt, try redirecting with appending: yourprogram >> file.txt.
There is a big difference between the two when the output file is truncated.
With appending redirection, data is written to the current end of the file. If you truncate it to 0 bytes, the next write will happen at position 0.
With truncating redirection, data is written at whatever offset the last write left off in the file. If you truncate it to 0 bytes, writes will continue at that old offset (byte 1073741824, for example, if the file had already grown to 1 GB).
This results in a sparse file if the filesystem supports it (ext2-4 and most Unix fs do), or a long wait while the file is written out if it doesn't (like fat32). A long wait could also be caused by anything following the file, such as tail -f, which has to potentially catch up by reading a GB of zeroes.
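Putting that together, a minimal sketch of the suggested setup, using the pianobar/pianobarout.txt names from the question (truncate is the GNU coreutils utility; : > file is an equivalent shell-builtin way to empty the file):

# Log with appending redirection so writes always land at the current end of file
pianobar >> pianobarout.txt

# Whenever the log gets too big, empty it without disturbing the writer
truncate -s 0 pianobarout.txt    # or:  : > pianobarout.txt

From Python, os.truncate("pianobarout.txt", 0) (or opening the file in "w" mode and closing it) should accomplish the same truncation.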
Alternatives include yourprogram | split -b 1G - output-, which will write 1 GB each to output-aa, output-ab, etc., letting you delete old files at your leisure.
I'm trying to set up a script that will create empty .txt files with the size of 24MB in the /tmp/ directory. The idea behind this script is that Zabbix, a monitoring service, will notice that the directory is full and wipe it completely with the usage of a recovery expression.
However, I'm new to Linux and seem to be stuck on the script that generates the files. This is what I've currently written out.
today="$( date +¨%Y%m%d" )"
number=0
while test -e ¨$today$suffix.txt¨; do
(( ++number ))
suffix=¨$( printf -- %02d ¨$number¨ )
done
fname=¨$today$suffix.txt¨
printf ´Will use ¨%s¨ as filename\n´ ¨$fname¨
printf -c 24m /tmp/testf > ¨$fname¨
I'm thinking what I'm doing wrong has to do with the printf command. But some input, advice and/or directions to a guide to scripting are very welcome.
Many thanks,
Melanchole
I guess that it doesn't matter what bytes are actually in that file, as long as it fills up the temp dir. For that reason, the right tool to create the file is dd, which is available in every Linux distribution, often installed by default.
Check the manpage for different options, but the most important ones are
if: the input file; /dev/zero is a good choice, since it is just an endless stream of zero-valued bytes
of: the output file; you can keep the code you have to generate its name
count: the number of blocks to copy; just use 24 here
bs: the size of each block; use 1M (i.e. 1 MiB) for that
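Putting those options together with the filename your loop already computes, a minimal sketch might be (bs=1M means 1 MiB blocks, so 24 of them gives the roughly 24 MB file you want):

# Write 24 blocks of 1 MiB of zeroes into the generated filename
dd if=/dev/zero of="$fname" bs=1M count=24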
Is there an efficient way to search a list of strings from another text file or from a piped output?
I have tried the following methods:
FINDSTR /G:patternlist.txt <filetocheck>
or
Some program whose output is piped to FINDSTR
SOMEPROGRAM | FINDSTR /G:patternlist.txt
Similarly, I tried GREP from MSYS, UnixUtils, the GNU package, etc.:
GREP -w -F -f patternlist.txt <filetocheck>
or
Some program whose output is piped to GREP
SOMEPROGRAM | GREP -w -F -f patternlist.txt
The pattern list file is a text file which contains one literal string per line. For example:
Patternlist.txt
65sM547P
Bu83842T
t400N897
s20I9U10
i1786H6S
Qj04404e
tTV89965
etc.,
And the file to be checked contains similar text, but in some cases there may be multiple words on a single line.
For example
Filetocheck.txt
3Lo76SZ4 CjBS8WeS
iI9NvIDC
TIS0jFUI w6SbUuJq joN2TOVZ
Ee3z83an rpAb8rWp
Rmd6vBcg
O2UYJOos
hKjL91CB
Dq0tpL5R R04hKmeI W9Gs34AU
etc.,
They work as expected when the number of pattern literals is below 50,000, and they run very slowly with up to 100,000 patterns.
Also, filetocheck.txt will contain up to 250,000 lines and grows to up to 30 MB in size.
The problem comes when the pattern file becomes larger than that. I have an instance of the pattern file which is around 20 MB and contains 600,000 string literals.
Matching this against a list or output of 250,000 to 300,000 lines of text practically stalls the processor.
I tried SIFT, and multiple other text search tools, but they just kill the system with the memory requirements and processor usage and make the system unresponsive.
I require a commandline based solution or utility which could help in achieving this task because this is a part of another big script.
I have tried multiple programs and methods to speed this up, such as indexing the pattern file and sorting it alphabetically, but all in vain.
Since the input will be from a program, there is no option to split the input file as well. It is all in one big piped command.
Example:
PASSWORDGEN | <COMMAND_TO_FILTER_KNOWN_PASSWORDS> >> FILTERED_OUTPUT
The problem is in the part where the system hangs or takes a very long time to filter the stdout stream (or a saved results file).
System configuration details, in case they help:
I am running this on a modest machine: 8 GB RAM, SATA HDD, Core i7, Windows 7 64-bit, and I do not currently have any better configuration available.
Any help in this issue is much appreciated.
I am also trying to find an existing solution, or failing that, to write specific code to achieve this (help is appreciated in that sense as well).
I have a basic script.sh that runs some commands inside. The script looks like:
(script.sh)
......
gcc -o program program.c
if [ $? -eq 0 ]; then
echo "Compiled successfully....\n" >> out.txt
#set a timeout for ./program execution and append results to file
(gtimeout 10s ./program) 2> out.txt # <-- NOT WORKING
......
I run this script through the terminal like:
# Go to this directory, pass all folders to compile & execute the program.c file
czar#MBP~$ for D in /Users/czar/Desktop/1/*; do sh script.sh $D; done
EDIT: The output I get in the terminal (not so important, though):
# program.c from 1st folder inside the above path
Cycle l=1: 46 46
Cycle l=1: 48 48
Cycle l=2: 250 274 250
Cycle l=1: 896 896
.........
# program.c from 2nd folder inside the above path
Cycle l=1: 46 46
Cycle l=1: 48 48
Cycle l=2: 250 274 250
Cycle l=1: 896 896
.........
The GOAL is to have that output end up in out.txt.
The output I get is almost what I want: it executes whatever is possible in those 10 seconds, but it doesn't redirect the result to out.txt; it just prints to the terminal.
I have tried every suggestion proposed here but no luck.
Any other ideas appreciated.
EDIT 2: SOLUTION given in the comments.
The basic approach is much simpler than the command you copied from the answer to a completely different question. What you need to do is simply redirect standard output to your file:
# Use gtimeout on systems which rename standard Gnu utilities
timeout 10s ./program >> out.txt
However, that will probably not produce all of the output generated by the program if the program is killed by gtimeout, because the output is still sitting in a buffer inside the standard library. (There is nothing special about this buffer; it's just a block of memory malloc'd by the library functions the first time data is written to the stream.) When the program is terminated, its memory is returned to the operating system; nothing will even try to ensure that standard library buffers are flushed to their respective streams.
There are three buffering modes:
Block buffered: no output is produced until the stream's buffer is full. (Usually, the stream's buffer will be around 8 kB, but it varies from system to system.)
Line buffered: output is produced when a newline character is sent to the stream. It's also produced if the buffer fills up, but it's rare for a single line to be long enough to fill a buffer.
Unbuffered: No buffering is performed at all. Every character is immediately sent to the output.
Normally, standard output is block buffered unless it is directed to a terminal, in which case it will be line buffered. (That's not guaranteed; the various standards allow quite a lot of latitude.) Line buffering is probably what you want, unless you're in the habit of writing programs which write partial lines. (The oddly-common idiom of putting a newline at the beginning of each output line rather than at the end is a really bad idea, precisely because it defeats line-buffering.) Unbuffered output is another possibility, but it's really slow if the program produces a substantial amount of output.
You can change the buffering mode before you write any data to the stream by calling setvbuf:
/* Line buffer stdout */
setvbuf(stdout, NULL, _IOLBF, 0);
(See man setvbuf for more options.)
You can also tell the library to immediately send any buffered data by calling fflush:
fflush(stdout);
That's an effective technique if you don't want the (slight) overhead of line buffering, but you know when it is important to send data (typically, because the program is about to do some very long computation, or wait for some external event).
If you can't modify the source code, you can use the Gnu utility stdbuf to change the buffering mode before starting the program. stdbuf will not work with all programs -- for example, it won't have any effect if the program itself calls setvbuf -- but it is usually effective. For example, to line buffer stdout, you could do this:
timeout 10s stdbuf -oL ./program >> out.txt
# Or: gtimeout 10s gstdbuf -oL ./program >> out.txt
See man stdbuf for more information.
I have 305 files. Each is ~10M lines. I only need to alter the first 20 lines of each file.
Specifically, I need to add # as the first character of the first 18 lines, delete the 19th line (or, more safely, delete all lines that are completely blank), and replace > with # on the 20th line.
The remaining 9.9999999M lines don't need to change at all.
If the files were not gzipped, I could do something like:
while read F; do
for i in $(seq 1 100); do
awk '{gsub(/#/,"##"); print $0}' $F
awk more commands
awk more commands
done
done < "$FNAMES"
but what is really throwing a wrench into this is the fact that the files are all gzipped. Is there any way to efficiently alter these 20 lines without unzipping and/or rewriting the whole file?
No, it is not possible. Adaptive compression schemes (such as the Lempel-Ziv scheme gzip uses) adjust their encoding based on what they have seen so far as they move through the file. This means that the way the end of the file gets compressed (and hence decompressed) depends on the beginning of the file. If you change just the beginning of the (compressed) file, you'll change how the end gets decompressed, essentially corrupting the file.
So decompressing, modifying, and recompressing is the only way to do it.
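If it helps, here is a hedged sketch of that decompress-modify-recompress pass, reusing the $F/$FNAMES loop from the question; the NR arithmetic is an assumption, since "the 20th line" could be counted before or after the blank line is dropped:

while read -r F; do
    zcat "$F" \
      | awk 'NR <= 18                     { print "#" $0; next }   # comment out the first 18 lines
             NR <= 20 && /^[[:space:]]*$/ { next }                  # drop blank lines (e.g. the 19th)
             NR == 20                     { sub(/>/, "#") }         # replace > with # on line 20
                                          { print }' \
      | gzip > "$F.tmp" && mv "$F.tmp" "$F"
done < "$FNAMES"

Each file still has to be fully decompressed and recompressed, but it is handled as one stream, so nothing is ever stored uncompressed on disk.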
I have a situation with a failing LaCie 500GB hard drive. It stays on for only about 10 minutes, then becomes unusable. For those 10 minutes or so I do have complete control.
I can't get my main mov file (160 GB) transferred off that quickly, so I was thinking that if I split it into small chunks, I could move them all off. I tried splitting the movie file using the split command, but it of course took longer than 10 minutes. I ended up with about 14 GB of files, 2 GB each, before it failed.
Is there a way I can use a split command that skips any existing file chunks, so that as it splits this file it will see xaa, xab, and xac, start after that point, and continue splitting the file beginning with xad?
Or is there a better option that can split a file in multiple stages? I looked at csplit as well, but that didn't seem like an option either.
Thanks!
-------- UPDATE ------------
Now with the help of bcat and Mark I was able to do this using the following
dd if=/Volumes/badharddrive/file.mov of=/Volumes/mainharddrive/movieparts/moviepart1 bs=1g count=4
dd if=/Volumes/badharddrive/file.mov of=/Volumes/mainharddrive/movieparts/moviepart2 bs=1g count=4 skip=4
dd if=/Volumes/badharddrive/file.mov of=/Volumes/mainharddrive/movieparts/moviepart3 bs=1g count=4 skip=8
etc
cat /Volumes/mainharddrive/movieparts/moviepart[1-3] > newmovie.mov
You can always use the dd command to copy chunks of the old file into a new location. This has the added benefit of not doing unnecessary writes to the failing drive. Using dd like this could get tedious with such a large mov file, but you should be able to write a simple shell script to automate part of the process.
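For what it's worth, a hedged sketch of that automation, using the paths from the question's update; the 4 GiB chunk size and the roughly 40 pieces needed for a 160 GB file are assumptions:

# Copy the movie in 4 GiB pieces, skipping pieces already rescued on an earlier attempt.
# (Delete any short piece left by a failed run before re-running, since it would be skipped.)
src=/Volumes/badharddrive/file.mov
dst=/Volumes/mainharddrive/movieparts
i=1
while [ "$i" -le 40 ]; do
    part="$dst/moviepart$i"
    if [ ! -s "$part" ]; then
        dd if="$src" of="$part" bs=1g count=4 skip=$(( (i - 1) * 4 )) || break
    fi
    i=$(( i + 1 ))
done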
Duh! bcat's answer is way better than mine, but since I wrote some code I figured I'd go ahead and post it.
input  = ARGV[0]        # path of the file to slice
length = ARGV[1].to_i   # number of bytes to copy
offset = ARGV[2].to_i   # byte offset at which to start reading

# Note: the chunk is written next to the input file, as input-offset-length
File.open("#{input}-#{offset}-#{length}", 'w') do |file|
  file.write(File.read(input, length, offset))
end
Use it like this:
$ ruby test.rb input_file length offset
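For example, to pull the first couple of 1 GiB pieces of the file from the question (1073741824 bytes is 1 GiB, an arbitrary chunk size; note that the script as written creates each chunk next to the input file, so you may want to adjust the output path when the input lives on the failing drive):

$ ruby test.rb /Volumes/badharddrive/file.mov 1073741824 0
$ ruby test.rb /Volumes/badharddrive/file.mov 1073741824 1073741824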