bash script to loop through different files and perform command

I have 25 files in a directory, all named xmolout1, xmolout2, ... , xmolout25.
These are all .txt files and I need to copy the last 80 lines from these files to new .txt files.
Preferably, these would automatically generate the correct number (taken from the original file, e.g. xmolout10 would generate final10 etc.).
The original files can be deleted afterwards.
I am a newbie in bash scripting, I know I can copy the last 80 lines using tail -80 filename.txt > newfilename.txt, but I don't know how to implement the loop.
Thanks in advance

If you know the number of files to be processed, you could use a counter variable in a loop:
for ((i=1; i<=25; i++))
do
    tail -n 80 "xmolout$i" > "final$i"
done
The C-style for ((...)) loop is a bash feature (ksh93 and zsh accept it too). Brace expansion is an alternative, though it is not POSIX either, so it will not work in a plain sh such as dash:
for i in {1..25}
do
    tail -n 80 "xmolout$i" > "final$i"
done
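If the number of files is not known in advance, the number can be taken from each file name instead. The following is only a sketch under the question's naming scheme (inputs called xmolout<N>); the function name copy_tails and its directory argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: write the last 80 lines of every xmolout<N> file to final<N>.
# copy_tails and its optional directory argument are hypothetical names.
copy_tails() {
    local dir=${1:-.} f n
    for f in "$dir"/xmolout*; do
        [ -f "$f" ] || continue      # the glob matched nothing
        n=${f##*/xmolout}            # strip the path and the "xmolout" prefix
        tail -n 80 "$f" > "$dir/final$n"
    done
}
```

Calling copy_tails with no argument works on the current directory. Once the final files look right, the originals can be removed separately.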

Related

For loop within a for loop for iterating files of different extensions

Say I have 20 different files. The first 10 files end with .counts.tsv and the rest end with .libsize.tsv. For each .counts.tsv file there is a matching .libsize.tsv file. I would like to use a for loop to select each matching pair and run an R script on those two files.
Here is what I tried,
#!/bin/bash
arti='/home/path/tofiles'
for counts in ${arti}/*__counts.tsv ; do
for libsize in "$arti"/*__libsize.tsv ; do
Rscript score.R ${counts} ${libsize}
done;
done;
The above shell script iterates over the files more than 200 times whereas I have only 20 files. I need the Rscript to be executed 10 times for both files. Any suggestions would be appreciated.

I started typing up an answer before seeing your comment that you're only interested in a bash solution, posting anyway in case someone finds this question in the future and is open to an R based solution.
If I were approaching this from scratch, I'd probably just use an R function defined in the file that takes the two file names instead of messing around with the system() calls, but this would provide the behavior you desire.
## Get a vector of files matching each extension (pattern is a regular expression, not a glob)
counts_names <- list.files(path = ".", pattern = "\\.counts\\.tsv$")
libsize_names <- list.files(path = ".", pattern = "\\.libsize\\.tsv$")
## Get the root names of the files before the extensions
counts_roots <- sub("\\.counts\\.tsv$", "", counts_names)
libsize_roots <- sub("\\.libsize\\.tsv$", "", libsize_names)
## Get only root names that have both file types
shared_roots <- intersect(libsize_roots,counts_roots)
## Loop through the shared root names and execute an Rscript call based on the two files
for(i in seq_along(shared_roots)){
counts_filename <- paste0(shared_roots[[i]],".counts.tsv")
libsize_filename <- paste0(shared_roots[[i]],".libsize.tsv")
Command <- paste("Rscript score.R",counts_filename,libsize_filename)
system(Command)
}

Construct the second filename with ${counts%counts.tsv} (remove last part).
#!/bin/bash
arti='/home/path/tofiles'
for counts in "${arti}"/*__counts.tsv ; do
libsize="${counts%counts.tsv}libsize.tsv"
Rscript score.R "${counts}" "${libsize}"
done
EDIT:
Less safe is trying to make it a one-liner. When the filenames contain no spaces or newlines, and each counts file sorts directly before its libsize file, you can risk an accident with
printf '%s\n' ${arti}/*__counts.tsv ${arti}/*__libsize.tsv | sort | xargs -n2 Rscript score.R
and when you feel really lucky (with no other files than those tsv files in $arti) make a bungee jump with
echo ${arti}/* | xargs -n2 Rscript score.R
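A more defensive variant of the loop checks that the partner file actually exists before running anything. This is a sketch, assuming the `__counts.tsv` / `__libsize.tsv` suffixes from the question's script; run_pairs is a hypothetical helper that takes the directory followed by the command to run.

```shell
#!/usr/bin/env bash
# Sketch: run a command once per counts/libsize pair, skipping incomplete pairs.
# run_pairs is a hypothetical name; usage: run_pairs DIR COMMAND [ARGS...]
run_pairs() {
    local dir=$1 counts libsize
    shift                                   # the remaining arguments are the command
    for counts in "$dir"/*__counts.tsv; do
        [ -e "$counts" ] || continue        # the glob matched nothing
        libsize=${counts%__counts.tsv}__libsize.tsv
        if [ -e "$libsize" ]; then
            "$@" "$counts" "$libsize"
        else
            printf 'no libsize file for %s\n' "$counts" >&2
        fi
    done
}
```

For the question's case the call would look like `run_pairs /home/path/tofiles Rscript score.R`. Passing `echo` as the command first gives a dry run.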

Have you tried list.files in base R? It lets you iterate over all files in the folder.
arti <- '/home/path/tofiles'
for (i in list.files(arti)) {
  # script
}

See whether the below helps.
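my_list <- list.files("./Data")
counts <- grep("counts\\.tsv$", my_list, value = TRUE)
libsize <- grep("libsize\\.tsv$", my_list, value = TRUE)
## Relies on both vectors being in the same order, with one libsize file per counts file
for (i in seq_along(counts)) {
  system(paste("Rscript score.R", counts[i], libsize[i]))
}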

Finally,
I tried the following and it helped me,
for sam in "$arti"/*__counts.tsv ; do
filebase=$(basename $sam)
samples=$(ls -1 ${filebase}|awk -F'[-1]' '{print $1}')
Rscript score.R ${samples}__counts.tsv ${samples}__libsize.tsv
done;
For someone looking for something similar :)

Running a process on every combination between files in two folders

I have two folders where the 1st has 19 .fa files and the 2nd has 37096 .fa files
Files in the 1st folder are named BF_genomea[a-s].fa, and files in the 2nd are named [1-37096]ZF_genome.fa
I have to run this process where lastz filein1stfolder filein2ndfolder [arguments] > outputfile.axt, so that I run every file in the 1st folder against every file in the 2nd folder.
Any naming scheme for the output files would do, as long as it identifies which combination of parent files each one came from and the files have the extension .axt.
This is what I have done so far
for file in /tibet/madzays/finch_data/BF_genome_split/*.fa; do for otherfile in /tibet/madzays/finch_data/ZF_genome_split/*.fa; name="${file##*/}"; othername="${otherfile##*/}"; lastz $file $otherfile --step=19 --hspthresh=2200 --gappedthresh=10000 --ydrop=3400 --inner=2000 --seed=12of19 --format=axt --scores=/tibet/madzays/finch_data/BFvsZFLASTZ/HoxD55.q > /home/madzays/qsub/test/"$name""$othername".axt; done; done

As I said in a comment, the inner loop is missing a do keyword (for otherfile in pattern; do <-- right there). Is this in the form of a script file? If so, you should add a shebang as the first line to tell the OS how to run the script. And break it into multiple lines and indent the contents of the loops, to make it easier to read (and easier to spot problems like the missing do).
Off the top of my head, I see one other thing I'd change: the output filenames are going to be pretty ugly, just the two input files mashed together with a ".axt" on the end (along the lines of "BF_genomeac.fa14ZF_genome.fa.axt"). I'd parse the IDs out of the input filenames and then use them to build a more reasonable output filename convention. Something like this:
#!/bin/bash
for file in /tibet/madzays/finch_data/BF_genome_split/*.fa; do
for otherfile in /tibet/madzays/finch_data/ZF_genome_split/*.fa; do
name="${file##*/}"
tmp="${name#BF_genomea}" # remove filename prefix
id="${tmp%.*}" # remove extension to get the ID
othername="${otherfile##*/}"
otherid="${othername%ZF_genome.fa}" # just have to remove a suffix here
lastz "$file" "$otherfile" --step=19 --hspthresh=2200 --gappedthresh=10000 --ydrop=3400 --inner=2000 --seed=12of19 --format=axt --scores=/tibet/madzays/finch_data/BFvsZFLASTZ/HoxD55.q > "/home/madzays/qsub/test/BF${id}_${otherid}ZF.axt"
done
done
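With 19 x 37096 combinations it may be worth checking the naming logic on its own before starting any lastz runs. The sketch below isolates the parameter expansions used above in a hypothetical helper, axt_name, which only prints the output name for a given pair of input paths (file names as described in the question).

```shell
#!/usr/bin/env bash
# Sketch: derive the output file name from the two input paths.
# axt_name is a hypothetical helper; it runs no alignment.
axt_name() {
    local name=${1##*/} othername=${2##*/}     # drop the directory parts
    local id=${name#BF_genomea}                # remove the filename prefix
    id=${id%.fa}                               # remove the extension to get the ID
    local otherid=${othername%ZF_genome.fa}    # remove the suffix to get the ID
    printf 'BF%s_%sZF.axt\n' "$id" "$otherid"
}

axt_name /data/BF_genomeac.fa /data/14ZF_genome.fa    # prints BFc_14ZF.axt
```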

The code can be translated almost directly from your requirements:
base=/tibet/madzays/finch_data
for b in {a..s}
do
    for z in {1..37096}
    do
        lastz "$base/BF_genome_split/BF_genomea${b}.fa" "$base/ZF_genome_split/${z}ZF_genome.fa" --step=19 --hspthresh=2200 --gappedthresh=10000 --ydrop=3400 --inner=2000 --seed=12of19 --format=axt --scores="$base/BFvsZFLASTZ/HoxD55.q" > "/home/madzays/qsub/test/${b}-${z}.axt"
    done
done
Note that one-liners easily lead to errors, such as a missing do, which are then hard to find from the error message (error in line 1).

Bash script that creates files of a set size

I'm trying to set up a script that will create empty .txt files with the size of 24MB in the /tmp/ directory. The idea behind this script is that Zabbix, a monitoring service, will notice that the directory is full and wipe it completely with the usage of a recovery expression.
However, I'm new to Linux and seem to be stuck on the script that generates the files. This is what I've currently written out.
today="$( date +"%Y%m%d" )"
number=0
while test -e "$today$suffix.txt"; do
(( ++number ))
suffix="$( printf -- %02d "$number" )"
done
fname="$today$suffix.txt"
printf 'Will use "%s" as filename\n' "$fname"
printf -c 24m /tmp/testf > "$fname"
I'm thinking what I'm doing wrong has to do with the printf command. But some input, advice and/or directions to a guide to scripting are very welcome.
Many thanks,
Melanchole

I guess that it doesn't matter what bytes are actually in that file, as long as it fills up the temp dir. For that reason, the right tool to create the file is dd, which is available in every Linux distribution, often installed by default.
Check the manpage for different options, but the most important ones are
if: the input file, /dev/zero probably which is just an endless stream of bytes with value zero
of: the output file, you can keep the code you have to generate it
count: number of blocks to copy, just use 24 here
bs: size of each block, use 1M for that (with GNU dd, bs=1M means 1 MiB, while bs=1MB means 1,000,000 bytes)
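Put together, the options above could be used roughly as follows. This is a minimal sketch: make_filler is a hypothetical wrapper, and the block size is spelled out in bytes so that it does not depend on which size suffixes a particular dd accepts.

```shell
#!/usr/bin/env bash
# Sketch: create a file of COUNT MiB filled with zero bytes.
# make_filler is a hypothetical name; usage: make_filler PATH [COUNT]
make_filler() {
    local path=$1 count=${2:-24}
    # 1048576 bytes = 1 MiB per block; dd's transfer statistics are silenced
    dd if=/dev/zero of="$path" bs=1048576 count="$count" 2>/dev/null
}
```

With the filename logic from the question, the last line of the script would then become something like `make_filler "/tmp/$fname"`.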

renaming images into an ordered sequence using shell

I have a bunch of folders containing images that are in order but are not sequential like this:
/root
/f1
img21.jpg
img24.jpg
img26.jpg
img27.jpg
/f2
img06.jpg
img14.jpg
img36.jpg
img57.jpg
and I want to get them looking like this, having the folder title as well as having all the images in sequential order:
/root
/f1
f1_01.jpg
f1_02.jpg
f1_03.jpg
f1_04.jpg
/f2
f2_01.jpg
f2_02.jpg
f2_03.jpg
f2_04.jpg
I'm not sure how to do this using shell script.
Thanks in advance!

Use a for loop to iterate over the directories and another for loop to iterate over the files. Maintain a counter that you increment by 1 for each file.
There's no direct convenient way of padding numbers with leading zeroes. You can call printf, but that's a little slow. A useful, fast trick is to start counting at 101 (if you want two-digit numbers, 1001 if you want three-digit numbers, and so on) and strip the leading 1.
cd /root
for d in */; do
    i=101
    for f in "$d"*; do
        mv -- "$f" "$d${d%/}_${i#1}.${f##*.}"
        i=$((i+1))
    done
done
${d%/} strips the / at the end of $d, ${i#1} strips the 1 at the start of $i, and ${f##*.} strips everything from $f except what follows the last dot. These constructs are documented in the section on parameter expansion in your shell's manual.
Note that this script assumes that the target file names will not clash with the names of existing files. If you have a directory called img, some files will be overwritten. If this may be a problem, the simplest method is to first move all the files to a different directory, then move them back to the original directory as you rename them.
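The move-out-and-back approach could be sketched like this. The function name renumber_dir and the hidden staging directory are made up for illustration, printf is used for the padding here for readability, and the sketch assumes the directory contains only the image files.

```shell
#!/usr/bin/env bash
# Sketch: rename every file in a directory to <dirname>_NN.<ext>, going
# through a temporary staging directory so no target can clobber a source.
# renumber_dir is a hypothetical name; usage: renumber_dir DIR
renumber_dir() {
    local d=${1%/} stage f i=1
    set -- "$d"/*
    [ -e "$1" ] || return 0                    # nothing to rename
    stage=$(mktemp -d "$d/.stage.XXXXXX") || return 1
    mv -- "$@" "$stage"/                       # first pass: move everything aside
    for f in "$stage"/*; do                    # second pass: move back, renamed
        mv -- "$f" "$d/${d##*/}_$(printf '%02d' "$i").${f##*.}"
        i=$((i + 1))
    done
    rmdir -- "$stage"
}
```

A call such as `for d in /root/*/; do renumber_dir "$d"; done` would then process every folder.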

Within a directory, ls will give you files in lexical order, which gets you the correct sort. So you can do something like this:
let i=0
ls *.jpg | while read -r file; do
    mv -- "$file" "prefix_$(printf "%02d" "$i").jpg"
    let i++
done
This will take all the *.jpg files and rename them starting with prefix_00.jpg, prefix_01.jpg and so forth.
This obviously only works for a single directory, but hopefully with a little work you can use this to build something that will do what you want.

How to redirect output to different directory in BASH?

I have 3 directories named: /home/user/control4 , /home/user/control8 ,/home/user/control16
I have written a script file which has two loops , the first one is running a simulation, producing 3 files named cc1.txt cc2.txt, and cc3.txt and a second loop is for the names of directory
I like to direct cc1.txt, cc2.txt, cc3.txt to /home/user/control4,/home/user/control8 ,/home/user/control16, respectively. What is the exact syntax for doing this in BASH?
Thanks for your help.
My script file look likes this
#!/user/bin/bash
for j in $(4 8 16 ) # loop for directories
do
for i in $(seq 1 3) # loop for simulations
do
.... produce cc1.txt cc2.txt cc3.txt
done
How to output the three files to the respective directories? something like /home/user/control$j/cc$i.txt?
done

for j in /home/user/control{4,8,16}
do
for i in cc{1,2,3}.txt
do
produce "$j/$i"
done
done
That will produce nine files. If you want one file per directory for a total of three files, some math might do the trick.
for i in {1..3}
do
produce "/home/user/control$((2*2**i))/cc$i.txt"
done

cat cc$i.txt > /home/user/control$j/cc$i.txt

If I'm reading your script right, you'll end up with 9 files: cc1 through 3 each in directories control4, 8, and 16. However, your text description implies that you want only 3 files total: control4/cc1.txt, control8/cc2.txt, and control16/cc3.txt.
If you literally have a command named produce that takes output files as parameters, and you only want 3 files total, I'd just specify the three output directories & files manually, and not worry about looping.
If you literally have a command named produce that takes output files as parameters, and you want 9 files total, then yes, control$j/cc$i.txt will do that.
If you don't have a command named produce that takes output files as parameters (i.e., that line in your code sample was pseudo-code), and the command that produces data just dumps those files out without you specifying anything, then I think you'll have to mv cc1.txt control4 ; mv cc2.txt control8 ; mv cc3.txt control16 after it is done.
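For the three-file case, the mv commands can also be written as a short loop that steps through the directory numbers and the file index together. This is a sketch only: distribute is a hypothetical helper, and it assumes cc1.txt to cc3.txt already exist in the current directory and that the control directories exist under the given base.

```shell
#!/usr/bin/env bash
# Sketch: move cc1.txt, cc2.txt, cc3.txt into control4, control8, control16.
# distribute is a hypothetical name; usage: distribute BASE_DIR
distribute() {
    local base=$1 i=1 n
    for n in 4 8 16; do
        mv -- "cc$i.txt" "$base/control$n/"
        i=$((i + 1))                 # next file goes with the next directory
    done
}
```

In the question's layout the call would be `distribute /home/user`, run after the simulation loop has finished.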
