Running a shell script on several files as inputs - bash

I have a shell command with the following format:
my_cmd -I file1.inp -O file1.out
Where some processing is done on file1.inp and the results are stored in file1.out
In my main directory, I have many files with the format *.inp, and I would like to run this command for all of them, storing the results in the corresponding *.out files. Can I achieve this using only a shell script?

You can use a simple loop:
for file in *.inp ; do
    my_cmd -I "${file}" -O "${file%%.inp}.out"
done
${file%%.inp} is a so-called parameter expansion. It effectively removes the extension .inp from the input filename.
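As a quick illustration of the difference between % (shortest match) and %% (longest match) when the pattern contains a wildcard:

f=archive.tar.gz
echo "${f%.gz}"     # archive.tar (shortest match removed from the end)
echo "${f%%.*}"     # archive (longest match removed from the end)

With a literal pattern such as .inp the two forms behave identically.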
One caveat (thanks Jean-François Fabre): if the folder does not contain any .inp files, the above loop would run once with $file having the literal value *.inp. To avoid that, you need to set the nullglob option:
shopt -s nullglob # set the nullglob option
for file in *.inp ; do
    my_cmd -I "${file}" -O "${file%%.inp}.out"
done
shopt -u nullglob # unset the nullglob option
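A quick way to see what nullglob changes:

shopt -s nullglob
echo *.doesnotexist    # expands to nothing; echo prints an empty line
shopt -u nullglob
echo *.doesnotexist    # prints the literal pattern: *.doesnotexist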

Using GNU parallel
parallel my_cmd -I {} -O {.}.out ::: *.inp
By default, this runs the jobs in parallel, one job per core. {} is the input argument unchanged; {.} is the same argument with its extension removed. The arguments are taken from the words that follow :::.
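You can preview the commands that would be generated, without running anything, using parallel's --dry-run option; for example, with two hypothetical input files:

parallel --dry-run my_cmd -I {} -O {.}.out ::: file1.inp file2.inp
# my_cmd -I file1.inp -O file1.out
# my_cmd -I file2.inp -O file2.out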

ls *.inp | sed 's/\.inp$//' | xargs -I % my_cmd -I %.inp -O %.out
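A more robust sketch of the same idea, assuming GNU find, avoids parsing ls output and survives whitespace in filenames by using NUL-delimited output:

find . -maxdepth 1 -name '*.inp' -print0 |
    while IFS= read -r -d '' f; do
        my_cmd -I "$f" -O "${f%.inp}.out"
    done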

Related

How to write a Bash script to edit many text files using the same commands? [duplicate]

I'm very new to bash. I have ten text files that I want to edit with the same line of code.
#!/bin/bash
sed -i -e 's/.\{6\}/&\n/g' -e 's/edit/edit2/g' | tr -d "\n" | sed 's/edit2/edit/g' | grep -o "here.*there" | sed -r '/^.{,100}$/d'
< files 1-10
I know I could use sed -f sed.sh <file1 >file1 but that only works with sed commands and it only works one file at a time?
Do I have to run a loop?
There are some great existing answers on the Unix Stack Exchange that help deal with your problem. Specifically, from this post, they use a loop to recursively go through all the files in a particular directory, as follows:
( shopt -s globstar dotglob;
  for file in **; do
    if [[ -f $file ]] && [[ -w $file ]]; then
      sed -i -- 's/foo/bar/g' "$file"
    fi
  done
)
Note the line shopt -s globstar dotglob;, which allows us to use globbing patterns like ** in the for loop. We also enclose the code in parentheses, which runs it in a subshell and prevents the shell options from leaking into the rest of the session.
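You can verify that the option stays local to the subshell with shopt -q, which queries an option's state through its exit status:

shopt -q globstar && echo on || echo off               # off (assuming it starts unset)
( shopt -s globstar; shopt -q globstar && echo on )    # on, inside the subshell only
shopt -q globstar && echo on || echo off               # still off afterwards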
If you would like to apply this example to your file, you can just place your files in the current directory, and the code would probably look something like this:
( shopt -s globstar dotglob;
  for file in **; do
    if [[ -f $file ]] && [[ -w $file ]]; then
      sed -e 's/.\{6\}/&\n/g' -e 's/edit/edit2/g' "$file" \
        | tr -d "\n" | sed 's/edit2/edit/g' \
        | grep -o "here.*there" | sed -r '/^.{,100}$/d' > "$file.tmp" \
      && mv "$file.tmp" "$file"
    fi
  done
)
Note that "$file" is given to the first sed as its input, and the result of the whole pipeline is written back to the file via a temporary file; the -i flag had to be dropped, because in-place editing cannot be combined with a pipeline that needs to pass the text from one command to the next.
There is another example given in the linked post that allows you to pick which files to run on, rather than all the files in a directory, which you can also repurpose for your code, as given here:
( shopt -s globstar dotglob
  sed -i -- 's/foo/bar/g' **baz*
  sed -i -- 's/foo/bar/g' **.baz
)
To answer your question about processing line by line, you would need to put a while read loop inside the for loop, like so:
while read -r line; do
    printf '%s\n' "$line" | sed -e 's/.\{6\}/&\n/g' -e 's/edit/edit2/g' | tr -d "\n" | sed 's/edit2/edit/g' | grep -o "here.*there" | sed -r '/^.{,100}$/d'
done < "$file"
Although the for loop can be useful for dealing with files in recursive directories, I would recommend against also using another loop to grab lines, since it muddies your code, and it’s possible there is a better way to do it without parsing line by line.
The linked question is a fairly complete guide to many of the cases you may come across, and is also worth a read if you want to learn more.
Hope that helps!
You could use a for loop.
You could use the tool parallel.
Example
Create a set of test files using a for-loop
mkdir -p /tmp/so58333536
cd /tmp/so58333536
for i in 1.txt 2.txt 3.txt 4.txt 5.txt; do echo "The answer is 41" > "$i"; done
cat /tmp/so58333536/*
Now correct the mistake in all the files using parallel [1].
mkdir /tmp/so58333536.new
ls /tmp/so58333536/* | parallel "sed 's/41/42/' {} > /tmp/so58333536.new/{/}"
cat /tmp/so58333536.new/*
{} refers to the current file
{/} refers to the basename of the current file (the path is removed)
Read it as: list all files in so58333536, apply the sed command to each file, and write the output to so58333536.new.
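Again, parallel's --dry-run option shows how {} and {/} are substituted before anything is executed:

ls /tmp/so58333536/* | parallel --dry-run "sed 's/41/42/' {} > /tmp/so58333536.new/{/}"
# sed 's/41/42/' /tmp/so58333536/1.txt > /tmp/so58333536.new/1.txt
# ... and so on for each file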
[1] Another option is to use sed -i for in-place editing.
Be very careful with this!! Mistakes can cause serious damage!
# !! Do not use -i option regularly !!
ls /tmp/so58333536/* |parallel "sed -i 's/41/42/'"
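A safer middle ground is sed's backup suffix, which performs the in-place edit but keeps a copy of each original:

ls /tmp/so58333536/* | parallel "sed -i.bak 's/41/42/'"
# every edited file keeps its original next to it as *.bak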

Associative array, file names referring to the path, for dmenu

I started playing with dmenu, and it seems like it can automate almost everything. Unfortunately I'm not familiar with bash; learning it is on my list.
I have a folder for my markdowns, with subfolders containing my files. I'm trying to write a script, called through an alias, that shows them in dmenu.
If the path to a file is
/home/user/docs/markdown/practice01/rmd/network.rmd
I would like to have
network
as an option in my dmenu. So when I choose
network -----> /home/user/docs/markdown/practice01/rmd/network.rmd
Here is my broken script. There are a few things I'm missing.
This way I get the full path in my dmenu, which I don't need. I tried to read about associative arrays, but I can't figure them out in bash.
The script works, but if I decide to press ESC and exit, it still opens an empty vim in my directory. Hence, I should learn if statements, huh!
#!/bin/bash
DMenu=("dmenu -l 10 -i -nb "#eaeaea" -sb "#E53935" -nf "#474747"")
cd ~/docs/markdown/
target=$(find -type f -name '*.rmd' | $DMenu)
st vim "$target"
I made a little example. But the problem is that adding each file is manual work, which we definitely don't want to do, right?
#!/bin/bash
declare -A dotfiles
dotfiles[i3]="/home/user/dotfiles/i3/.config/i3/config"
dotfiles[vimrc]="/home/user/dotfiles/vim/.vimrc"
list=("i3\nvimrc")
target=$(echo -e $list | dmenu -i -nb "#eaeaea" -sb "#E53935" -nf "#474747")
st vim "${dotfiles["$target"]}"
Thank you
Associative arrays can be weird... but capturing output in a variable makes it easy to manipulate like any other string in bash, as shown in the example below:
prefix="$HOME/git/notes"
suffix=".md"
shopt -s nullglob globstar
item=( "$prefix"/**/*${suffix}) # Search *.md in all dirs/subdirs
item=( "${item[#]#"$prefix"/}" )
item=( "${item[#]%${suffix}}" ) # Removes '.md' string from item name
result=$(printf '%s\n' "${item[#]}" | dmenu)
[[ -n $result ]] || exit # exit if nothing is found
gedit "${prefix}/${result}.md" # Open file by adding again '.md'
When the percent sign (%) is used in the pattern ${variable%substring}, it will return the content of the variable with the shortest occurrence of substring deleted from the back of the variable.
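The # and ## forms used in the script above are the mirror image, deleting from the front of the variable instead:

path=/home/user/git/notes/todo.md
echo "${path#*/}"     # home/user/git/notes/todo.md (shortest prefix match deleted)
echo "${path##*/}"    # todo.md (longest prefix match deleted)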
Listed below for reference are 2 examples I wrote, one in Bash and the other in Python, for managing pass and markdown notes with dmenu:
dmenu-pass.sh
dmenu-launch.py
Also, listed below are a couple nice articles that might help you out:
The weird, wondrous world of Bash arrays
Advanced Bash-Scripting Guide: Manipulating Strings
Instead of putting some code in an array, use a function!
my_dmenu() {
    dmenu -l 10 -i -nb "#eaeaea" -sb "#e53935" -nf "#474747"
}
If your markdown files are all in the same folder (and not in subfolders), you certainly don't need find: use a glob instead! And if your files are in subfolders, also use a glob, together with the globstar shell option.
All in all:
#!/bin/bash

my_dmenu() {
    dmenu -l 10 -i -nb "#eaeaea" -sb "#e53935" -nf "#474747"
}

base_dir=~/docs/markdown

# Also, check the return code of cd!
cd "$base_dir" || { echo >&2 "Can't cd to $base_dir. Exiting"; exit 1; }

# Using a glob: use the shell option nullglob
shopt -s nullglob
files=( *.rmd )

# Check that there are some files found:
if (( ${#files[@]} == 0 )); then
    echo "No files found. Exiting."
    exit 1
fi

# Now we're ready to send the files to dmenu:
chosen_file=$(printf '%s\n' "${files[@]}" | my_dmenu)

# If dmenu returns nothing: don't launch vim!
if [[ ! $chosen_file ]]; then
    echo "No files selected. Exiting."
    exit 1
fi

# Now you can launch vim!
st vim "$chosen_file"
If you also want to find the *.rmd files in subfolders, use instead:
shopt -s nullglob globstar
files=( **/*.rmd )
Edit to address the requirement in your comment (and the edit of your question):
If you want to strip the .rmd suffix to show in dmenu, use:
chosen_file=$(printf '%s\n' "${files[@]%.rmd}" | my_dmenu)
# ...
st vim "$chosen_file.rmd"
The expansion ${files[@]%.rmd} will strip the suffix .rmd from each field of the array files. Don't forget to add this suffix back when you edit the file (as shown in the last line).
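As a tiny demonstration of the per-element stripping:

files=( a.rmd b.rmd notes/c.rmd )
printf '%s\n' "${files[@]%.rmd}"    # prints a, b and notes/c, one per line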
dmenuoptions=(-l 10 -i -nb "#eaeaea" -sb "#E53935" -nf "#474747")
st -e vim "$(find ~/docs/markdown -type f -name '*.rmd' | dmenu "${dmenuoptions[@]}")"

Modify a path stored in a bash script variable

I have a variable f in a bash script
f=/path/to/a/file.jpg
I'm using the variable as an input argument to a program that requires an input and an output path.
For example the program's usage would look like this
./myprogram -i inputFilePath -o outputFilePath
Using my variable, I'm trying to maintain the same basename, change the extension, and put the output file into a subdirectory. For example
./myprogram -i /path/to/a/file.jpg -o /path/to/a/new/file.tiff
I'm trying to do that by doing this
./myprogram -i "$f" -o "${f%.jpg}.tiff"
Of course this keeps the basename and changes the extension, but doesn't put the file into the new subdirectory.
How can I modify f to change /path/to/a/file.jpg into /path/to/a/new/file.tiff?
Actually you can do this in several ways:
Using sed, as pointed out by @anubhava
Using dirname and basename:
./myprogram -i "$f" -o "$(dirname -- "$f")/new/$(basename -- "$f" .jpg).tiff"
Using only Bash:
./myprogram -i "$f" -o "${f%/*}/new/$(b=${f##*/}; echo -n ${b%.jpg}.tiff)"
Note that unlike the second solution (using dirname/basename), which is more robust, the third solution (in pure Bash) won't work if "$f" does not contain any slash:
$ dirname "file.jpg"
.
$ f="file.jpg"; echo "${f%/*}"
file.jpg
You may use this sed:
s='/path/to/a/file.jpg'
sed -E 's~(.*/)([^.]+)\.jpg$~\1new/\2.tiff~' <<< "$s"
/path/to/a/new/file.tiff
If you're on a system that supports the basename and dirname commands, you could use a simple wrapper function, e.g.:
$ type newSubDir
newSubDir is a function
newSubDir ()
{
    oldPath=$(dirname "${1}");
    fileName=$(basename "${1}");
    newPath="${oldPath}/${2}/${fileName}";
    echo "${newPath}"
}
$ newSubDir /path/to/a/file.jpg new
/path/to/a/new/file.jpg
If your system doesn't have those, you can accomplish the same thing using string manipulation:
$ file="/path/to/a/file.jpg"
$ echo "${file%/*}"
/path/to/a
$ echo "${file##*/}"
file.jpg
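Putting the two expansions together gives a pure-Bash equivalent of the wrapper function above:

$ echo "${file%/*}/new/${file##*/}"
/path/to/a/new/file.jpg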

Adding extra argument to xargs

I'm trying to kick off multiple processes to work through some test suites. In my bash script I have the following
printf "%s\0" "${SUITE_ARRAY[#]}" | xargs -P 2 -0 bash -c 'run_test_suite "$#" ${EXTRA_ARG}'
Below is the defined script, cut down to its basics.
SUITE_ARRAY will be a list of suites that may have 1 or more, {Suite 1, Suite 2, ..., Suite n}
EXTRA_ARG will be like a specific name to store values in another script
#!/bin/bash
run_test_suite(){
suite=$1
someArg=$2
someSaveDir=someArg"/"suite
# some preprocess work happens here, but isn't relevant to running
runSomeScript.sh suite someSaveDir
}
export -f run_test_suite
SUITES=$1
EXTRA_ARG=$2
IFS=','
SUITECOUNT=0
for csuite in ${SUITES}; do
    SUITE_ARRAY[$SUITECOUNT]=$csuite
    SUITECOUNT=$(($SUITECOUNT+1))
done
unset IFS
printf "%s\0" "${SUITE_ARRAY[#]}" | xargs -P 2 -0 bash -c 'run_test_suite "$#" ${EXTRA_ARG}'
The issue I'm having is how to get the ${EXTRA_ARG} passed into xargs. From how I've come to understand it, xargs will take whatever is piped into it, so the way I have it doesn't seem correct.
Any suggestions on how to correctly pass the values? Thanks in advance
If you want EXTRA_ARG to be available to the subshell, you need to export it. You can do that either explicitly, with the export keyword, or by putting the var=value assignment in the same simple command as xargs itself:
#!/bin/bash
run_test_suite(){
    suite=$1
    someArg=$2
    someSaveDir="$someArg/$suite"
    # some preprocess work happens here, but isn't relevant to running
    runSomeScript.sh "$suite" "$someSaveDir"
}
export -f run_test_suite
# assuming that the "array" in $1 is comma-separated:
IFS=, read -r -a suite_array <<<"$1"
# see the EXTRA_ARG="$2" just before xargs on the same line; this exports the variable
printf "%s\0" "${suite_array[#]}" | \
EXTRA_ARG="$2" xargs -P 2 -0 bash -c 'run_test_suite "$#" "${EXTRA_ARG}"' _
The _ prevents the first argument passed from xargs to bash from becoming $0, where it would not be included in "$@".
Note also that I changed "${suite_array[@]}" to be assigned by splitting $1 on commas. This or something like it (you could use IFS=$'\n' to split on newlines instead, for example) is necessary, as $1 cannot contain a literal array; every shell command-line argument is only a single string.
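To see the role of the _ placeholder, compare what bash -c receives with and without it:

bash -c 'printf "%s\n" "$@"' _ one two    # prints one and two; _ became $0
bash -c 'printf "%s\n" "$@"' one two      # prints only two; one was consumed as $0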
This is something of a guess:
#!/bin/bash
run_test_suite(){
    suite="$1"
    someArg="$2"
    someSaveDir="${someArg}/${suite}"
    # some preprocess work happens here, but isn't relevant to running
    runSomeScript.sh "${suite}" "${someSaveDir}"
}
export -f run_test_suite
SUITE_ARRAY="$1"
EXTRA_ARG="$2"
printf "%s\0" "${SUITE_ARRAY[#]}" |
xargs -n 1 -I '{}' -P 2 -0 bash -c 'run_test_suite {} '"${EXTRA_ARG}"
Using GNU Parallel it looks like this:
#!/bin/bash
run_test_suite(){
    suite="$1"
    someArg="$2"
    someSaveDir="$someArg"/"$suite"
    # some preprocess work happens here, but isn't relevant to running
    echo runSomeScript.sh "$suite" "$someSaveDir"
}
export -f run_test_suite
EXTRA_ARG="$2"
parallel -d, -q run_test_suite {} "$EXTRA_ARG" ::: "$1"
Called as:
mytester 'suite 1,suite 2,suite "three"' 'extra "quoted" args here'
If you have the suites in an array:
parallel -q run_test_suite {} "$EXTRA_ARG" ::: "${SUITE_ARRAY[@]}"
Added bonus: Any output from the jobs will not be mixed, so you will not have to deal with http://mywiki.wooledge.org/BashPitfalls#Using_output_from_xargs_-P

Executing a command on multiple paired files

Say I have a command, command.py, and it pairs together files, File_01_R1.fastq to File_01_R2.fastq. The command executed on a single pair looks like this:
command.py -f File_01_R1.fastq -r File_01_R2.fastq
I have many files however, each with an R1 and an R2 version. How can I tell this command to go through every file I have, so that it also executes
command.py -f File_02_R1.fastq -r File_02_R2.fastq
command.py -f File_03_R1.fastq -r File_03_R2.fastq
and so on.
You may use a simple parameter expansion:
for f in *_R1.fastq; do
    echo command.py -f "$f" -r "${f%_R1.fastq}_R2.fastq"
done
This will just print out what's to be executed. Remove the echo if you're happy with the result.
# Loop over all R1.fastq files
for f in File_*_R1.fastq; do
    # Replace R1 with R2 in the filename and run the command on both files.
    command.py -f "$f" -r "${f/_R1./_R2.}"
done; unset -v f
As @gniourf_gniourf indicates in his comment, my answer is slightly less safe than his, in that the substitution may match at an incorrect location in the filename (whereas his pattern is anchored at the end).
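To make the difference concrete, consider a contrived, hypothetical filename where _R1. also appears earlier in the name:

f=File_R1.v2_R1.fastq
echo "${f/_R1./_R2.}"              # File_R2.v2_R1.fastq -- replaced the wrong occurrence
echo "${f%_R1.fastq}_R2.fastq"     # File_R1.v2_R2.fastq -- anchored at the end, correct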
