executing a command on multiple paired files - bash

Say I have a command, command.py, and it pairs together files, File_01_R1.fastq to File_01_R2.fastq. The command executed on a single pair looks like this:
command.py -f File_01_R1.fastq -r File_01_R2.fastq
I have many files however, each with a R1 and R2 version. How can I tell this command to go through every file I have, so it also executes
command.py -f File_02_R1.fastq -r File_02_R2.fastq
command.py -f File_03_R1.fastq -r File_03_R2.fastq
and so on.

You may use a simple parameter expansion:
for f in *_R1.fastq; do
echo command.py -f "$f" -r "${f%_R1.fastq}_R2.fastq"
done
This will just print out what's to be executed. Remove the echo if you're happy with the result.
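Before removing the echo, it can be worth checking that every R1 file really has an R2 mate. The following is a minimal sketch, assuming the same naming scheme as in the question; the scratch directory and its file names are made up for the demonstration, and command.py is only echoed, never run.

```shell
#!/bin/bash
# Scratch directory with made-up files; File_02 deliberately lacks its R2 mate.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch File_01_R1.fastq File_01_R2.fastq File_02_R1.fastq

paired=0
missing=0
for f in *_R1.fastq; do
    [ -e "$f" ] || continue                  # the glob matched nothing
    mate="${f%_R1.fastq}_R2.fastq"           # strip the R1 suffix, append the R2 suffix
    if [ -f "$mate" ]; then
        echo command.py -f "$f" -r "$mate"   # dry run: prints the command only
        paired=$((paired + 1))
    else
        echo "skipping $f: $mate not found" >&2
        missing=$((missing + 1))
    fi
done
echo "pairs: $paired, unpaired: $missing"
```

Unpaired files are reported on stderr, so the list of commands on stdout stays clean.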

# Loop over all R1.fastq files
for f in File_*_R1.fastq; do
# Replace R1 with R2 in the filename and run the command on both files.
command.py -f "$f" -r "${f/_R1./_R2.}"
done; unset -v f
As @gniourf_gniourf indicates in his comment, my answer is slightly less safe than his in that it may match at an incorrect location in the filename (whereas his is anchored at the end).
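To make the difference concrete, here is a small sketch with a made-up file name in which "_R1." occurs twice. It also shows the ${var/%pattern/replacement} form, which bash anchors at the end of the value.

```shell
#!/bin/bash
# Made-up file name in which "_R1." occurs twice.
f="run_R1.v2_R1.fastq"

unanchored="${f/_R1./_R2.}"                 # replaces the first match, wherever it is
suffix_strip="${f%_R1.fastq}_R2.fastq"      # only ever touches the end of the name
anchored_subst="${f/%_R1.fastq/_R2.fastq}"  # /% makes the pattern match at the end only

printf '%s\n' "$unanchored" "$suffix_strip" "$anchored_subst"
# prints:
# run_R2.v2_R1.fastq
# run_R1.v2_R2.fastq
# run_R1.v2_R2.fastq
```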

Related

How to write a Bash script to edit many text files using the same commands? [duplicate]

I'm very new to bash. I have ten text files that I want to edit with the same line of code.
#!/bin/bash
sed -i -e 's/.\{6\}/&\n/g' -e 's/edit/edit2/g' | tr -d "\n" | sed 's/edit2/edit/g'| grep -o "here.*there" | sed -r '/^.{,100}$/d'
< files 1-10
I know I could use sed -f sed.sh <file1 >file1 but that only works with sed commands and it only works one file at a time?
Do I have to run a loop?
There are some great existing answers on the Unix Stack Exchange that deal with this problem. Specifically, this post uses a loop to go through all the files under a directory recursively, as follows:
( shopt -s globstar dotglob;
for file in **; do
if [[ -f $file ]] && [[ -w $file ]]; then
sed -i -- 's/foo/bar/g' "$file"
fi
done
)
Note the line shopt -s globstar dotglob, which makes ** match files in subdirectories recursively (globstar) and include hidden files (dotglob). The code is enclosed in parentheses so that it runs in a subshell, which keeps these shopt settings from leaking into the rest of your shell session.
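Both effects can be checked in a scratch directory. This is a minimal sketch, assuming bash 4 or later (globstar does not exist in older versions); the directory layout and file names are made up.

```shell
#!/bin/bash
# Scratch tree with made-up names: three regular files, one hidden file, two directories.
workdir=$(mktemp -d)
listfile=$(mktemp)
mkdir -p "$workdir/sub/deeper"
touch "$workdir/a.txt" "$workdir/sub/b.txt" "$workdir/sub/.hidden" "$workdir/sub/deeper/c.txt"
cd "$workdir" || exit 1

(
    shopt -s globstar dotglob      # only in effect inside these parentheses
    for file in **; do
        if [[ -f $file ]]; then    # skip the directories that ** also matches
            printf '%s\n' "$file"
        fi
    done
) > "$listfile"

count=$(( $(wc -l < "$listfile") ))   # regular files found at any depth
if shopt -q globstar; then leaked=yes; else leaked=no; fi
echo "files found: $count, globstar still set outside: $leaked"
```

The hidden file is counted because of dotglob, and the final check shows that the option did not survive outside the subshell.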
If you would like to apply this example to your file, you can just place your files in the current directory, and the code would probably look something like this:
( shopt -s globstar dotglob;
for file in **; do
if [[ -f $file ]] && [[ -w $file ]]; then
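sed -e 's/.\{6\}/&\n/g' -e 's/edit/edit2/g' "$file" | tr -d "\n" | sed 's/edit2/edit/g' | grep -o "here.*there" | sed -r '/^.{,100}$/d' > "$file.tmp" && mv -- "$file.tmp" "$file"
fi
done
)
Note that "$file" is passed only to the first sed; every later command reads the output of the one before it through the pipe. Because sed -i cannot be used in the middle of a pipeline, the result is written to a temporary file that then replaces the original.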
The linked post gives another example that lets you pick which files to run on, rather than all the files in a directory, which you can also re-purpose for your code:
( shopt -s globstar dotglob
sed -i -- 's/foo/bar/g' **baz*
sed -i -- 's/foo/bar/g' **.baz
)
To answer your question about looping over each line: you would put a second loop inside the for loop that reads the file line by line, like so:
while IFS= read -r line; do
printf '%s\n' "$line" | sed -e 's/.\{6\}/&\n/g' -e 's/edit/edit2/g' | tr -d "\n" | sed 's/edit2/edit/g' | grep -o "here.*there" | sed -r '/^.{,100}$/d'
done < "$file"
Although the for loop can be useful for dealing with files in recursive directories, I would recommend against also using another loop to grab lines, since it muddies your code, and it’s possible there is a better way to do it without parsing line by line.
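If you do need to go line by line, the usual idiom is while IFS= read -r, with the file redirected into the loop. A minimal sketch follows; the sample file and the per-line action (measuring the line length) are placeholders and not part of the original pipeline.

```shell
#!/bin/bash
# Sample input with made-up content, including leading spaces and a backslash.
sample=$(mktemp)
printf '%s\n' 'first line' '  indented line' 'back\slash' > "$sample"

lines=0
longest=0
while IFS= read -r line; do        # IFS= keeps leading spaces, -r keeps backslashes
    lines=$((lines + 1))
    if [ "${#line}" -gt "$longest" ]; then
        longest=${#line}
    fi
done < "$sample"                   # the file feeds the loop; no cat or pipe needed

echo "read $lines lines, longest has $longest characters"
```

Redirecting the file into the loop, rather than piping into it, keeps the loop in the current shell, so the counters are still set after done.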
The linked question is a fairly complete guide to many of the cases you may come across, and is also worth a read if you want to learn more.
Hope that helps!
You could use a for loop.
You could use the tool parallel.
Example
Create a set of test files using a for-loop
mkdir -p /tmp/so58333536
cd /tmp/so58333536
for i in 1.txt 2.txt 3.txt 4.txt 5.txt; do echo "The answer is 41" > "$i"; done
cat /tmp/so58333536/*
Now correct your mistake using parallel [1].
mkdir /tmp/so58333536.new
ls /tmp/so58333536/* |parallel "sed 's/41/42/' {} > /tmp/so58333536.new/{/}"
cat /tmp/so58333536.new/*
{} refers to the current file
{/} refers to the name of the current file (path removed)
This reads: list all files in so58333536, apply the following sed command to each one, and write the output to so58333536.new.
[1] Another option is to use sed -i for in-place editing.
Be very careful with this! Mistakes can cause serious damage!
# !! Do not use -i option regularly !!
ls /tmp/so58333536/* |parallel "sed -i 's/41/42/'"
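The first suggestion, a plain for loop, could look roughly like this. It is a sketch that uses two scratch directories in place of /tmp/so58333536 and /tmp/so58333536.new; the expansion ${f##*/} plays the role that {/} plays in parallel.

```shell
#!/bin/bash
# Scratch directories stand in for /tmp/so58333536 and /tmp/so58333536.new.
src=$(mktemp -d)
dst=$(mktemp -d)
for i in 1 2 3; do
    echo "The answer is 41" > "$src/$i.txt"
done

for f in "$src"/*; do
    [ -f "$f" ] || continue
    # ${f##*/} drops everything up to the last slash, like {/} in parallel.
    sed 's/41/42/' "$f" > "$dst/${f##*/}"
done

cat "$dst"/*
```

The originals in the source directory are left untouched, which makes it easy to compare before switching to in-place editing.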

Associative array, file names referring to the path, for dmenu

I started playing with dmenu, and it seems it can automate almost everything. Unfortunately I'm not familiar with bash, though learning it should be on my list.
I have a folder for my markdowns with subfolders containing my files. I'm trying to have a script to show them in dmenu while using an alias.
If the path to a file is
/home/user/docs/markdown/practice01/rmd/network.rmd
I would like to have
network
as an option in my dmenu. So when I choose
network -----> /home/user/docs/markdown/practice01/rmd/network.rmd
Here is my broken script. There are a few things I'm missing.
This way I get the full path in my dmenu, which I don't need. I tried to read about associative arrays but I can't figure them out in bash.
This script works, but if I press ESC to exit, it still opens an empty vim in my directory. So I should learn if statements, huh!
#!/bin/bash
DMenu=("dmenu -l 10 -i -nb "#eaeaea" -sb "#E53935" -nf "#474747"")
cd ~/docs/markdown/
target=$(find -type f -name '*.rmd' | $DMenu)
st vim "$target"
I made a little example. But the problem is that adding each file is manual work, which we definitely don't want to do, right?
#!/bin/bash
declare -A dotfiles
dotfiles[i3]="/home/user/dotfiles/i3/.config/i3/config"
dotfiles[vimrc]="/home/user/dotfiles/vim/.vimrc"
list=("i3\nvimrc")
target=$(echo -e $list | dmenu -i -nb "#eaeaea" -sb "#E53935" -nf "#474747")
st vim "${dotfiles["$target"]}"
Thank you
Associative arrays can be weird... but returning output to a variable makes it easier to manipulate as any other string in bash, as shown in the example below:
prefix="$HOME/git/notes"
suffix=".md"
shopt -s nullglob globstar
item=( "$prefix"/**/*"$suffix" ) # Search *.md in all dirs/subdirs
item=( "${item[@]#"$prefix"/}" ) # Remove the leading "$prefix/" from each item
item=( "${item[@]%"$suffix"}" ) # Remove the '.md' suffix from each item
result=$(printf '%s\n' "${item[@]}" | dmenu)
[[ -n $result ]] || exit # exit if nothing was selected
gedit "${prefix}/${result}${suffix}" # Open the file, adding '.md' back
When the percent sign (%) is used in the form ${variable%pattern}, the expansion returns the content of the variable with the shortest match of pattern removed from the end.
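The related operators are easiest to compare side by side. A short sketch, using the path from the question plus a made-up archive name to show the difference between the shortest and the longest match:

```shell
#!/bin/bash
path=/home/user/docs/markdown/practice01/rmd/network.rmd

file=${path##*/}       # longest match of */ removed from the front -> network.rmd
name=${file%.rmd}      # shortest match of .rmd removed from the end -> network
dir=${path%/*}         # shortest match of /* removed from the end   -> the directory

archive=backup.tar.gz  # made-up name with two dots
short=${archive%.*}    # %  removes the shortest match -> backup.tar
long=${archive%%.*}    # %% removes the longest match  -> backup

printf '%s\n' "$file" "$name" "$dir" "$short" "$long"
```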
Listed below for reference are 2 examples I wrote, one in Bash and the other in Python, for managing pass and markdown notes with dmenu:
dmenu-pass.sh
dmenu-launch.py
Also, listed below are a couple nice articles that might help you out:
The weird, wondrous world of Bash arrays
Advanced Bash-Scripting Guide: Manipulating Strings
Instead of putting some code in an array, use a function!
my_dmenu() {
dmenu -l 10 -i -nb "#eaeaea" -sb "#e53935" -nf "#474747"
}
If your markdown files are all in the same folder (and not in subfolders), you certainly don't need find: use a glob instead! And if your files are in subfolders, you can still use a glob (with the globstar shell option).
All in all:
#!/bin/bash
my_dmenu() {
dmenu -l 10 -i -nb "#eaeaea" -sb "#e53935" -nf "#474747"
}
base_dir=~/docs/markdown
# Also, check the return code of cd!
cd "$base_dir" || { echo >&2 "Can't cd to $base_dir. Exiting"; exit 1; }
# Using a glob: use the shell option nullglob
shopt -s nullglob
files=( *.rmd )
# Check that there are some files found:
if (( ${#files[@]} == 0 )); then
echo "No files found. Exiting."
exit 1
fi
# Now we're ready to send the files to dmenu:
chosen_file=$(printf '%s\n' "${files[@]}" | my_dmenu)
# If dmenu returns nothing: don't launch vim!
if [[ ! $chosen_file ]]; then
echo "No files selected. Exiting."
exit 1
fi
# Now you can launch vim!
st vim "$chosen_file"
If you also want to find the *.rmd files in subfolders: use instead:
shopt -s nullglob globstar
files=( **/*.rmd )
Edit to address the requirement in your comment (and the edit of your question):
If you want to strip the .rmd suffix to show in dmenu, use:
chosen_file=$(printf '%s\n' "${files[@]%.rmd}" | my_dmenu)
# ...
st vim "$chosen_file.rmd"
The expansion ${files[@]%.rmd} will strip the suffix .rmd from each field of the array files. Don't forget to add this suffix back when you edit the file (as shown in the last line).
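If you want the menu to show only the bare name (network) while the files live in different subfolders, an associative array can map each name back to its full path. This is a sketch under a few assumptions: bash 4 or later, a scratch directory with made-up files in place of ~/docs/markdown, and a hard-coded choice where the real script would call dmenu. The array name path_of is hypothetical.

```shell
#!/bin/bash
# Scratch directory with made-up files stands in for ~/docs/markdown.
base_dir=$(mktemp -d)
mkdir -p "$base_dir/practice01/rmd"
touch "$base_dir/practice01/rmd/network.rmd" "$base_dir/intro.rmd"

shopt -s nullglob globstar
declare -A path_of                 # short name -> full path
for path in "$base_dir"/**/*.rmd; do
    name=${path##*/}               # drop the directories
    name=${name%.rmd}              # drop the suffix
    path_of["$name"]=$path
done

# In the real script this would be:
#   chosen=$(printf '%s\n' "${!path_of[@]}" | my_dmenu)
chosen=network
if [[ -n $chosen && -n ${path_of[$chosen]} ]]; then
    echo "would open: ${path_of[$chosen]}"
fi
```

Two files with the same name in different subfolders would overwrite each other in the array, so this only suits collections where the short names are unique.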
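# Keep the options in an array so each quoted colour stays a single, quote-free argument.
dmenuoptions=(-l 10 -i -nb '#eaeaea' -sb '#E53935' -nf '#474747')
target=$(find ~/docs/markdown -type f -name '*.rmd' | dmenu "${dmenuoptions[@]}")
[[ -n $target ]] && st -e vim "$target"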
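Options kept in a single string do not survive an unquoted expansion intact: the inner single quotes are passed along as literal characters. A small demonstration, using printf in place of dmenu so that it runs anywhere; the variable names are made up.

```shell
#!/bin/bash
opts_string="-nb '#eaeaea' -sb '#E53935'"
opts_array=(-nb '#eaeaea' -sb '#E53935')

# printf repeats its format once per argument, which makes each argument visible.
from_string=$(printf '<%s>' $opts_string)          # unquoted on purpose
from_array=$(printf '<%s>' "${opts_array[@]}")

echo "string: $from_string"
echo "array:  $from_array"
# prints:
# string: <-nb><'#eaeaea'><-sb><'#E53935'>
# array:  <-nb><#eaeaea><-sb><#E53935>
```

With the string form, a program would receive '#eaeaea' including the quote characters, which is not a valid colour value.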

Running a shell script on several files as inputs

I have a shell command with the following format:
my_cmd -I file1.inp -O file1.out
Where some processing is done on file1.inp and the results are stored in file1.out
In my main directory, I have many files with the format *.inp, and I would like to run this command for all of them and store the results in *.out. Can I achieve this using only a shell script?
You can use a simple loop:
for file in *.inp ; do
my_cmd -I "${file}" -O "${file%%.inp}.out"
done
${file%%.inp} is a so-called parameter expansion. It removes the extension .inp from the input filename.
One thing (thanks Jean-François Fabre). If the folder does not contain any .inp files the above loop would run once with $file having the literal value *.inp. To avoid that you need to set the nullglob option:
shopt -s nullglob # set the nullglob option
for file in *.inp ; do
my_cmd -I "${file}" -O "${file%%.inp}.out"
done
shopt -u nullglob # unset the nullglob option
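The effect of nullglob is easy to see in an empty directory. A minimal sketch, assuming bash and a scratch directory; the array names are made up.

```shell
#!/bin/bash
# An empty scratch directory, so *.inp matches nothing.
cd "$(mktemp -d)" || exit 1

without=()
for file in *.inp; do without+=("$file"); done   # default: the pattern stays literal

shopt -s nullglob
with=()
for file in *.inp; do with+=("$file"); done      # nullglob: the loop body never runs
shopt -u nullglob

echo "without nullglob: ${#without[@]} iteration(s), file='${without[0]}'"
echo "with nullglob:    ${#with[@]} iteration(s)"
```

In a script that must also run under plain sh, a test such as [ -e "$file" ] || continue at the top of the loop body gives a similar safeguard.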
Using GNU parallel
parallel my_cmd -I {} -O {.}.out ::: *.inp
By default, this will run jobs in parallel, one job per core. {} is an unchanged argument, {.} is the same argument minus its extension. The arguments are taken from the words that follow :::.
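# xargs alternative: strip .inp first so the output is named file1.out, not file1.inp.out
printf '%s\n' *.inp | sed 's/\.inp$//' | xargs -I % my_cmd -I %.inp -O %.out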

Run Command on Multiple of Files or Single File

I needed to convert several pnm image files to jpeg with pnmtojpeg. So I used this script, which I named 'pnm2jpg':
for f in *.pnm;
do pnmtojpeg -quality=85 "$f" > "${f%.pnm}.jpg";
done
This works very nicely. However, I would like to adapt it further so that it can be used for a single file as well.
In other words, if no files are specified in the command line, then process all the files.
$ pnm2jpg thisfile.pnm # Process only this file.
$ pnm2jpg # Process all pnm files in the current directory.
Your insight is greatly appreciated- Thank you.
Something like:
#!/bin/bash
if [[ -z "$1" ]]; then
for f in *.pnm; do
pnmtojpeg -quality=85 "$f" > "${f%.pnm}.jpg"
done
else
pnmtojpeg -quality=85 "$1" > "${1%.pnm}.jpg"
fi
If you execute pnm2jpg without an argument, the then branch runs and every .pnm file is processed.
If you execute pnm2jpg thisfile.pnm, the else branch runs and only that file is processed.
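The same idea extends to any number of file arguments by looping over "$@" and falling back to *.pnm when none are given. This is a sketch rather than a drop-in script: convert_one is a hypothetical stand-in for pnmtojpeg -quality=85 so the example runs without Netpbm, and the files are made up.

```shell
#!/bin/bash
# Stand-in for `pnmtojpeg -quality=85 "$1"`, so the sketch runs without Netpbm.
convert_one() { cat "$1"; }

pnm2jpg() {
    # No arguments: fall back to every .pnm file in the current directory.
    if [ "$#" -eq 0 ]; then
        set -- *.pnm
    fi
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "skipping $f: not a file" >&2
            continue
        fi
        convert_one "$f" > "${f%.pnm}.jpg"
    done
}

# Demonstration in a scratch directory with made-up file names.
cd "$(mktemp -d)" || exit 1
echo one > a.pnm; echo two > b.pnm; echo three > "c d.pnm"
pnm2jpg a.pnm "c d.pnm"     # converts only the named files
ls
```

Calling pnm2jpg afterwards with no arguments would convert b.pnm as well. The file test also covers the case where no .pnm files exist and the pattern stays literal.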

grep spacing error

Hi guys, I have a problem with grep. I don't know if there is another search command in shell script.
I'm trying to back up a folder, AhmetsFiles, which is stored on my flash disk, but at the same time I have to group the files by their extensions and save them into an [extensionName] folder.
AhmetsFiles
An example: /media/FlashDisk/AhmetsFiles/lecture.pdf must be stored in /home/$(whoami)/Desktop/backups/pdf
The problem is that I can't copy a file whose name contains spaces (lecture 2.pptx).
After this introduction, here is my code.
filename="/media/FlashDisk/extensions"
count=0
exec 3<&0
exec 0< $filename
mkdir "/home/$(whoami)/Desktop/backups"
while read extension
do
cd "/home/$(whoami)/Desktop/backups"
rm -rf "$extension"
mkdir "$extension"
cd "/media/FlashDisk/AhmetsFiles"
files=( `ls | grep -i "$extension"` )
fCount=( `ls | grep -c -i "$extension"` )
for (( i=0 ; $i<$fCount ; i++ ))
do
cp -f "/media/FlashDisk/AhmetsFiles/${files[$i]}" "/home/$(whoami)/Desktop/backups/$extension"
done
let count++
done
exec 0<&3
exit 0
Your looping is way more complicated than it needs to be, no need for either ls or grep or the files and fCount variables:
for file in *."$extension"
do
cp -f "/media/FlashDisk/AhmetsFiles/$file" "$HOME/Desktop/backups/$extension"
done
This works correctly with spaces.
I'm assuming that you actually wanted to interpret $extension as a file extension, not some random string in the middle of the filename like your original code does.
Why don't you
ls | grep -i "$extension" | while IFS= read -r x ; do
cp ..
done
instead?
Also, I believe you may prefer something like grep -i "\.$extension\$" instead (anchoring it to the end of the line and matching the dot literally).
On the other hand, the simplest way is probably
cp -f /media/FlashDisk/AhmetsFiles/*."$extension" "$HOME/Desktop/backups/$extension/"
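Putting the pieces together, the whole backup can be written with globs only, which handles file names containing spaces. A minimal sketch, assuming scratch directories and made-up files in place of the flash disk, the desktop folder, and the extensions list:

```shell
#!/bin/bash
# Scratch directories and made-up files stand in for the flash disk and the desktop.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/lecture.pdf" "$src/lecture 2.pptx" "$src/notes.txt"
extlist=$(mktemp)
printf '%s\n' pdf pptx > "$extlist"        # one extension per line

while IFS= read -r extension; do
    mkdir -p "$dest/$extension"
    for file in "$src"/*."$extension"; do
        [ -e "$file" ] || continue         # no file has this extension
        cp -f -- "$file" "$dest/$extension/"
    done
done < "$extlist"

ls -R "$dest"
```

Unlike grep -i, the glob is case-sensitive, so LECTURE.PDF would not be picked up by the pdf entry.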

Resources