I'm trying to set up a script that will create empty .txt files with the size of 24MB in the /tmp/ directory. The idea behind this script is that Zabbix, a monitoring service, will notice that the directory is full and wipe it completely with the usage of a recovery expression.
However, I'm new to Linux and seem to be stuck on the script that generates the files. This is what I've currently written out.
today="$( date +"%Y%m%d" )"
number=0
while test -e "$today$suffix.txt"; do
    (( ++number ))
    suffix="$( printf -- %02d "$number" )"
done
fname="$today$suffix.txt"
printf 'Will use "%s" as filename\n' "$fname"
printf -c 24m /tmp/testf > "$fname"
I'm thinking what I'm doing wrong has to do with the printf command, but any input, advice, and/or directions to a guide on scripting are very welcome.
Many thanks,
Melanchole
I guess that it doesn't matter what bytes are actually in that file, as long as it fills up the temp dir. For that reason, the right tool to create the file is dd, which ships with every Linux distribution as part of the base system.
Check the manpage for different options, but the most important ones are
if: the input file; /dev/zero is the obvious choice, as it is just an endless stream of zero-valued bytes
of: the output file, you can keep the code you have to generate it
count: number of blocks to copy, just use 24 here
bs: the size of each block; use 1M for that (in GNU dd, 1M means 1 MiB = 1048576 bytes, whereas 1MB would mean 1000000 bytes)
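Putting those options together, a minimal sketch (reusing the $fname your loop builds, and assuming the file should land in /tmp as you described):
dd if=/dev/zero of="/tmp/$fname" bs=1M count=24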
I have 25 files in a directory, all named xmolout1, xmolout2, ... , xmolout25.
These are all .txt files and I need to copy the last 80 lines from these files to new .txt files.
Preferably, these would automatically generate the correct number (taken from the original file, e.g. xmolout10 would generate final10 etc.).
The original files can be deleted afterwards.
I am a newbie in bash scripting, I know I can copy the last 80 lines using tail -80 filename.txt > newfilename.txt, but I don't know how to implement the loop.
Thanks in advance
If you know the number of files to be processed, you could use a counter variable in a loop:
for ((i=1; i<=25; i++))
do
    tail -80 "xmolout$i" >> "final$i"
done
Alternatively, you can use bash's brace expansion for a more compact loop (note that both forms are bash-specific rather than plain POSIX sh):
for i in {1..25}
do
    tail -80 "xmolout$i" >> "final$i"
done
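If you would rather take the number directly from each original filename (as you mentioned), you don't need to hard-code the count at all. A glob-based sketch, assuming the files really are named xmolout1 through xmolout25:
for f in xmolout*; do
    n=${f#xmolout}              # strip the "xmolout" prefix, leaving the number
    tail -80 "$f" > "final$n"
done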
This is my very first post on Stack Overflow, and I should probably point out that I am EXTREMELY new to a lot of programming. I'm currently a postgraduate student doing projects involving a lot of coding in various programs, everything from LaTeX to bash, MATLAB, etc.
If you could explicitly explain your answers, that would be much appreciated, as I'm trying to learn as I go. I apologise if there is an answer elsewhere that does what I'm trying to do, but I have spent a couple of days looking now.
So to the problem I'm trying to solve: I'm currently using a selection of bioinformatics tools to analyse a range of genomes, and I'm trying to somewhat automate the process.
I have a few sequences with names that look like this for instance (all contained in folders of their own currently as paired files):
SOL2511_S5_L001_R1_001.fastq
SOL2511_S5_L001_R2_001.fastq
SOL2510_S4_L001_R1_001.fastq
SOL2510_S4_L001_R2_001.fastq
...and so on...
I basically wish to automate the process by turning these into variables and passing those variables to each of the programs I use in turn. So, for example, my idea thus far was to assign them as wildcards, using the R1 and R2 (which appear in all the file names, as they represent each strand of DNA) as follows:
#!/bin/bash
seq1=*R1_001*
seq2=*R2_001*
On a rudimentary level this works, as it returns the correct files, so now I pass these variables to my first function which trims the DNA sequences down by a specified amount, like so:
# seqtk is the program suite, trimfq is a function within it,
# and the options -b -e specify how many bases to trim from the beginning and end of
# the DNA sequence respectively.
seqtk trimfq -b 10 -e 20 $seq1 >
seqtk trimfq -b 10 -e 20 $seq2 >
So now my problem is I wish to be able to append something like "_trim" to the output file which appears after the >, but I can't find anything that seems like it will work online.
Alternatively, I've been hunting for a script that will take the name of the folder that the files are in, and create a variable for the folder name which I can then give to the functions in question so that all the output files are named correctly for use later on.
Many thanks in advance for any help, and I apologise that this isn't really much of a minimum working example to go on, as I'm only just getting going on all this stuff!
Joe
EDIT
So I modified @ghoti's for loop (it does the job wonderfully, I might add; rep for you :D) and now add trim_, as the loop as it was before ended up giving me a .fastq.trim extension, which will cause errors later.
Is there any way I can append _trim to the end of the filename, but before the extension?
Explicit is usually better than implied when matching filenames. Your wildcards may match more than you expect, especially if you have versions of the files with "_trim" appended to the end!
I would be more precise with the wildcards, and use for loops to process the files instead of relying on seqtk to handle multiple files. That way, you can do your own processing on the filenames.
Here's an example:
#!/bin/bash
# Define an array of sequences
sequences=(R1_001 R2_001)
# Step through the array...
for seq in "${sequences[@]}"; do
    # Step through the files in this sequence...
    for file in SOL*_${seq}.fastq; do
        seqtk trimfq -b 10 -e 20 "$file" > "${file}.trim"
    done
done
I don't know how your folders are set up, so I haven't addressed that in this script. But the basic idea is that if you want the script to be able to manipulate individual filenames, you need something like a for loop to handle that manipulation on a per-filename basis.
Does this help?
UPDATE:
To put _trim before the extension, replace the seqtk line with the following:
seqtk trimfq -b 10 -e 20 "$file" > "${file%.fastq}_trim.fastq"
This uses something documented in the Bash man page under Parameter Expansion if you want to read up on it. Basically, the ${file%.fastq} takes the $file variable and strips off a suffix. Then we add your extra text, along with the suffix.
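For example, with one of your filenames (just an illustration of the expansion, run in a shell):
file=SOL2511_S5_L001_R1_001.fastq
echo "${file%.fastq}_trim.fastq"    # prints SOL2511_S5_L001_R1_001_trim.fastq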
You could also strip an extension using basename(1), but there's no need to call something external when you can use something built in to the shell.
Instead of setting variables with the filenames, you could pipe the output of ls to the command you want to run with these filenames, like this:
ls *R{1,2}_001* | xargs -I@ sh -c 'seqtk trimfq -b 10 -e 20 "$1" > "${1}_trim"' -- @
xargs -I@ grabs each line of output from the previous command and substitutes it for @, so each filename in turn is passed to seqtk
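If you'd rather not parse the output of ls (which can break on unusual filenames), a find-based variant of the same idea is possible; a sketch, with the -name pattern adapted from your example names:
find . -maxdepth 1 -name '*_R[12]_001.fastq' -print0 |
    xargs -0 -I@ sh -c 'seqtk trimfq -b 10 -e 20 "$1" > "${1}_trim"' -- @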
I am running a program (pianobar) piped to a text file, that outputs every second. The resulting file ("pianobarout.txt") needs to be cleared regularly, or it grows to massive proportions. However, I do not want to stop pianobar to clear the file.
I have tried running > pianobarout.txt as well as echo "" > pianobarout.txt, but both cause the system's resources to spike heavily for almost 30 seconds, causing the audio from pianobar to skip. I tried removing the file, but it appears that the file is not recreated after being deleted, and I just lose the pipe.
I'm working from python, so if any library there can help, those are available to me.
Any ideas?
If you are currently redirecting with truncation, like yourprogram > file.txt, try redirecting with appending: yourprogram >> file.txt.
There is a big difference between the two when the output file is truncated.
With appending redirection, data is written to the current end of the file. If you truncate it to 0 bytes, the next write will happen at position 0.
With truncating redirection, data is written at whatever offset in the file the last write left off. If you truncate the file to 0 bytes, writes will continue at that old offset, e.g. at byte 1073741824 if the program had already written 1 GiB.
This results in a sparse file if the filesystem supports it (ext2-4 and most Unix fs do), or a long wait while the file is written out if it doesn't (like fat32). A long wait could also be caused by anything following the file, such as tail -f, which has to potentially catch up by reading a GB of zeroes.
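In your case that means starting the writer with appending redirection, after which truncating is cheap and safe at any time. A minimal sketch, assuming you launch pianobar from a shell:
pianobar >> pianobarout.txt &
# later, from cron or another terminal, truncate in place:
: > pianobarout.txt        # or: truncate -s 0 pianobarout.txt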
Alternatives include yourprogram | split -b 1G - output-, which will write 1GB each to output-aa, output-ab, etc., letting you delete old files at your leisure.
I have a folder of files which need to be renamed.
Instead of a simple incremental numeric rename function, I need to first provide a naming convention which will then increment in order to ensure file name integrity within the folder.
Say I have files:
wei12346.txt
wifr5678.txt
dkgj5678.txt
which need to be renamed to:
Eac-345-018.txt
Eac-345-019.txt
Eac-345-020.txt
Each time I run the script the naming could be different, and the numeric increment to go along with it may also be different:
Ebc-345-010.pdf
Ebc-345-011.pdf
Ebc-345-012.pdf
So I need to ask the user for a parameter; I was thinking it might be useful for this to be the previous file name in the list of files to be indexed, e.g. Eac-345-017.txt.
The other thing I am unsure about is how the script would deal with incrementing 099 to 100 or 999 to 1000, as I am not aware of how this process is carried out.
I have been told that this is an easy script in perl however I am running cygwin on a windows machine in work and have access to only bash and windows shells in order to execute the script.
Any pointers to get me going would be greatly appreciated; I have some experience programming, but scripting is almost entirely new to me.
Thanks,
Craig
(I understand there are a lot of posts on this type of thing already, but none seem to offer a concise answer, hence my question.)
#!/bin/bash
prefix="$1"
shift
base_n="$1"
shift
step="$1"
shift
n=$base_n
for file in "$@"; do
    formatted_n=$(printf "%03d" "$n")
    # Re-use the original file extension while we're at it.
    mv "$file" "${prefix}-${formatted_n}.${file##*.}"
    n=$((n + step))
done
Save the file, invoke it like this:
bash fancy_rename.sh Ebc-345 10 1 /path/to/files/*
Note: In your example you "renamed" a .txt to a .pdf, but above I presumed the extension would stay the same. If you really wanted to just change the extension then it would be a trivial change. If you wanted to actually convert the file format then it would be a little more complex.
Note also that I have formatted the incrementing number with %03d. This means that your number sequence will be e.g.
010
011
012
...
099
100
101
...
999
1000
Meaning that it will be zero padded to three places but will automatically overflow if the number is larger. If you prefer consistency (always 4 digits) you should change the padding to %04d.
OK, you can do the following: first ask the user for the prefix, then for the starting sequence number. Then use bash's built-in printf to format the numbers correctly; you may want to give the field enough width to hold the whole sequence, since that results in more homogeneous names. You can use read to read the user input:
echo -n "Insert the prefix: "
read prefix
echo -n "Insert the sequence number: "
read sn
for i in * ; do
fp=`printf %04d $sn`
mv "$i" "$prefix-$fp.txt"
sn=`expr $sn + 1`
done
Note: you could extract the extension as well; that wouldn't be a problem. Also, here I chose four digits for the sequence number, formatted into the variable $fp.
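For instance, the mv line could keep each file's original extension with one more parameter expansion (a sketch; it assumes every file name contains a dot):
mv "$i" "$prefix-$fp.${i##*.}"    # ${i##*.} expands to everything after the last dot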
I have recently come up against a situation where I need to trim some rather large log files once they grow beyond a certain size. Everything but the last 1000 lines in each file is disposed of, and the job is run every half hour by cron. My solution was to simply run through the list of files, check the size, and trim if necessary.
for $file (@fileList) {
    if ( ((-s $file) / (1024 * 1024)) > $CSize ) {
        open FH, "$file" or die "Cannot open ${file}: $!\n";
        $lineNo = 0;
        my @tLines;
        while (<FH>) {
            push @tLines, $_;
            shift @tLines if ++$lineNo > $CLLimit;
        }
        close FH;
        open FH, ">$file" or die "Cannot write to ${file}: $!\n";
        print FH @tLines;
        close FH;
    }
}
This works in the current form but there is a lot of overhead for large log files (especially the ones with 100_000+ lines) because of the need to read in each line and shift if necessary.
Is there any way I could read in just a portion of the file, e.g. in this instance I want to be able to access only the last "CLLimit" lines. Since the script is being deployed on a system that has seen better days (think Celeron 700MHz with 64MB RAM) I am looking for a quicker alternative using Perl.
I realize you're wanting to use Perl, but if this is a UNIX system, why not use the "tail" utility to do the trimming? You could do this in BASH with a very simple script:
# Note: stat -f "%z" is the BSD syntax; on GNU/Linux use stat -c "%s" instead.
if [ "$(stat -f "%z" "$file")" -gt "$MAX_FILE_SIZE" ]; then
    tail -n 1000 "$file" > "$file.tmp"
    # Copy and then rm (rather than mv) so the original inode is kept
    # and any process holding the file open keeps writing to the same file.
    cp "$file.tmp" "$file"
    rm "$file.tmp"
fi
That being said, you would probably find this post very helpful if you're set on using Perl for this.
Estimate the average length of a line in the log - call it N bytes.
Seek backwards from the end of the file by 1000 * 1.10 * N bytes (the factor 1.10 gives a 10% margin for error in the estimate; e.g. with N = 80 you would seek back 88,000 bytes). Read forward from there, keeping just the most recent 1000 lines.
As to the question of which function or module to use: the built-in function seek looks to me like the tool for the job.
Consider simply using the logrotate utility; it is included in most modern Linux distributions. A related tool for BSD systems is called newsyslog. These tools are designed more-or-less for your intended purpose: the rotator atomically moves a log file out of the way, creates a new file (with the same name as before) to hold new log entries, instructs the program generating messages to use the new file, and then (optionally) compresses the old file. You can configure how many rotated logs to keep. Here's a potential tutorial:
http://www.debian-administration.org/articles/117
It is not precisely the interface you desire (keeping a certain number of lines), but the program will likely be more robust than what you would cook up on your own; for example, the answers here do not deal with atomically moving the file and notifying the log program to use a new file, so there is a risk that some log messages are lost.
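For reference, a minimal logrotate configuration along these lines might look like the following; the path, size, and rotation count are placeholders, not taken from your setup:
/var/log/myapp/*.log {
    size 1M           # rotate once the file grows beyond 1 MB
    rotate 4          # keep four rotated copies
    compress          # gzip the rotated copies
    copytruncate      # truncate in place if the program keeps the file open
}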