bash for loop: failed to open file [duplicate]

This question already has answers here:
How can I use a file in a command and redirect output to the same file without truncating it?
(14 answers)
Closed 7 years ago.
I have many directories with multiple files, each following a strict naming convention. I tried this loop:
for s in /home/admin1/phd/results_Sample_*_hg38_hg19/Sample_*_bwa_LongSeed_sorted_hg38.bam; do
bedtools genomecov -ibam $s > $s
done
but for every file I get:
Failed to open file
I just can't see the mistake.

Note that if you redirect output to the input file, bash truncates the file before the process even starts. For example:
cat file > file
will result in an empty file, meaning you'll lose its contents.
This is because bash opens every file that is the target of an I/O redirection before it starts the command. With > it additionally truncates the file to zero length. I don't know bedtools, but I assume it rejects empty input files, hence the message.
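You can watch the truncation happen even when the command itself never runs (a small demo; the file name is arbitrary):
$ echo hello > demo.txt
$ nosuchcommand < demo.txt > demo.txt
bash: nosuchcommand: command not found
$ wc -c demo.txt
0 demo.txt
The redirections are processed, and the file truncated, before bash even looks up the command.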
About the loop: the glob itself is fine, but the unquoted $s in the loop body undergoes word splitting, so file names containing whitespace (or other characters in bash's internal field separator, IFS) would lead to inconsistent results.
I would suggest using find instead:
find /home/admin1/phd/results_Sample_*_hg38_hg19 \
  -maxdepth 1 \
  -name 'Sample_*_bwa_LongSeed_sorted_hg38.bam' \
  -exec bash -c 'bedtools genomecov -ibam "$1" > "$1.copy"' _ {} \;
Passing the file name to bash -c as a positional parameter (the trailing _ {}) instead of splicing {} into the command string keeps file names containing quotes or $ from being interpreted as shell code. Note that the output goes to a separate .copy file; redirecting back to the .bam itself would hit exactly the truncation problem described above.

Related

Use a command over a set of files and save results [duplicate]

This question already has answers here:
Appending a string before file extension in Bash script
(3 answers)
Execute command on all files in a directory
(10 answers)
Closed 3 years ago.
I have a set of files in a directory
test1.in
test2.in
test3.in
test4.in
...
I have written a conversion tool to extract some data from those files and save it into another file.
I want to run that conversion tool for each file and save the results into different output files
./tool -i test1.in -o test1.out
./tool -i test2.in -o test2.out
I have seen a lot of answers about iterating over files and running the desired command on each one, but I am not sure how to specify the output file.
for file in dir/*.in; do
./tool -i file -o ???
done
How can I create a name for the output file?
for file in dir/*.in; do
./tool -i "$file" -o "${file%.*}".out
done
"${file%.*}" represents the value of $file before the last . (See https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html#Shell-Parameter-Expansion, look for ${parameter%word}).

BASH: How to find the number of files in a directory and assign that number as a variable? [duplicate]

This question already has answers here:
Is there a bash command which counts files?
(16 answers)
Closed 3 years ago.
I am using BASH to attempt to automate a very tedious manual scanning process.
I am very new to scripting and am learning as I go, but I've been googling and reading manuals for a day and a half and still haven't found anything to solve the following problem:
After SSHing into a remote server and then SCPing (sometimes thousands of) files from a second remote server...
1.) I need to count the number of files (inclusive of dotfiles (.pl, .xml, .sh, etc.)) in a directory and assign that numerical value to a variable (e.g. $filecount)
2.) Then I need to use either an if or case statement to check whether $filecount is greater than, equal to, or less than 1 (if > 1 a previous SCP was successful, if = 1 the SCP failed, and if < 1 something went really wrong)
Any guidance is greatly appreciated
To count the number of files, I would use find and print a dot for each file, then count the dots with wc:
find "${dir}" -maxdepth 1 -type f -printf "." | wc -c
If I printed the real file names and then counted the number of lines, the solution would be confused by file names with newlines in them, which are allowed on UNIX. That's why I print just a dot instead.
To store the results in a variable use command substitution:
nfiles=$(find "${dir}" -maxdepth 1 -type f -printf "." | wc -c)
I leave checking the value of $nfiles in an if statement as an exercise for you.
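For reference, a minimal sketch of what such a check could look like, using the thresholds from the question:
if (( nfiles > 1 )); then
    echo "previous scp was successful"
elif (( nfiles == 1 )); then
    echo "scp failed"
else
    echo "something went really wrong"
fi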
Btw, normally you would check whether the scp command succeeded by checking its exit status:
if ! scp ... ; then
echo "scp command failed"
fi

Looping through certain files in a folder and running a command on them while also changing the file name [duplicate]

This question already has answers here:
Extending a script to loop over multiple files and generate output names
(1 answer)
File name without extension in bash for loop
(3 answers)
Closed 5 years ago.
Beginner here. I want to write a simple script for use in the bash terminal, but I don't know how to go about it. The gist is that I have a folder filled with different files, some .foo, some .bar, etc. I want to create a script that takes all the .foo files and performs a command on them, while at the same time renaming them so that the output file is named file.baz.
For example:
command -i file.foo -o file.baz for all .foo files in a directory.
Renaming
You can use the rename command:
$ rename .foo .baz *.foo
Note that on some systems rename points to prename which uses a different syntax:
$ prename 's/\.foo$/.baz/' *.foo
Use man rename to find out which one you have.
Looping over files and running a command on each of them
You can provide the file list directly on the command line, using a globbing pattern:
$ your_script *.foo
Your script can then iterate over the list like this (using your command's usage):
for file in "$#"; do
your_command -i "$file" -o "${file%.*}.baz"
done
${file%.*} resolves to the name of the file without its extension (file.foo -> file). More information on string manipulation is available in the Shell Parameter Expansion section of the bash manual.

Iterate through several files in bash [duplicate]

This question already has answers here:
How to zero pad a sequence of integers in bash so that all have the same width?
(15 answers)
Closed 6 years ago.
I have a folder with several files that are named like this:
file.001.txt.gz, file.002.txt.gz, ... , file.150.txt.gz
What I want to do is use a loop to run a program on each file. I was thinking of something like this (just a sketch):
for i in {1:150}
gunzip file.$i.txt.gz
./my_program file.$i.txt output.$1.txt
gzip file.$1.txt
First of all, I don't know if something like this is going to work, and second, I can't figure out how to keep the three-digit numbering the files have ('001' instead of just '1').
Thanks a lot
The syntax for ranges in bash is
{1..150}
not {1:150}.
Moreover, if your bash is recent enough, you can add the leading zeroes:
{001..150}
The correct syntax of the for loop needs do and done.
for i in {001..150} ; do
# ...
done
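For example, in bash 4 or later the zero-padded range expands like this:
$ echo {001..005}
001 002 003 004 005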
It's unclear what $1 contains in your script.
To iterate over the files, I believe the simpler way is:
(assuming there are no files named 'file.*.txt' already in the directory and that your output file can have a different name)
for i in file.*.txt.gz; do
    gunzip "$i"                                    # leaves ${i%.gz} behind
    ./my_program "${i%.gz}" "${i%.gz}"-output.txt
    gzip "${i%.gz}"                                # recompress just this file
done
Using the find command:
# Path to the source directory
dir="./"
while read -r file
do
    output="$(basename "$file")"
    output="$(dirname "$file")/${output/#file/output}"
    echo "$file ==> $output"
done < <(find "$dir" \
    -regextype 'posix-egrep' \
    -regex '.*file\.[0-9]{3}\.txt\.gz$')
The same via a pipe:
find "$dir" \
    -regextype 'posix-egrep' \
    -regex '.*file\.[0-9]{3}\.txt\.gz$' | \
while read -r file
do
    output="$(basename "$file")"
    output="$(dirname "$file")/${output/#file/output}"
    echo "$file ==> $output"
done
Sample output
/home/ruslan/tmp/file.001.txt.gz ==> /home/ruslan/tmp/output.001.txt.gz
/home/ruslan/tmp/file.002.txt.gz ==> /home/ruslan/tmp/output.002.txt.gz
(for $dir=/home/ruslan/tmp/).
Description
The scripts iterate over the files in the $dir directory. On each iteration the $file variable receives the next line read from the find command.
The find command returns the list of paths matching the regular expression '.*file\.[0-9]{3}\.txt\.gz$'.
The $output variable is built from two parts: the basename (the file name without directories) and the dirname (the path to the file's directory).
The ${output/#file/output} expression replaces file with output at the beginning of the $output variable (see Manipulating Strings).
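For example:
$ output=file.001.txt.gz
$ echo "${output/#file/output}"
output.001.txt.gz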
Try:
for i in $(seq -w 1 150)   # -w adds the leading zeroes
do
    gunzip file."$i".txt.gz
    ./my_program file."$i".txt output."$i".txt
    gzip file."$i".txt
done
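Note that -w pads every number to the width of the largest, so seq -w 1 150 yields 001 through 150. A shorter range for illustration:
$ seq -w 98 101
098
099
100
101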
The syntax for ranges is as choroba said, but when iterating over files you usually want to use a glob. If you know all the files have three digits in their names you can match on digits:
shopt -s nullglob
for i in file.0[0-9][0-9].txt.gz file.1[0-4][0-9].txt.gz file.15[0].txt.gz; do
    n=${i#file.}       # strip the leading "file."
    n=${n%.txt.gz}     # strip the trailing ".txt.gz", leaving the number
    gunzip "$i"
    ./my_program file."$n".txt output."$n".txt
    gzip file."$n".txt
done
This will only iterate through files that exist. If you use the range expression, you have to take extra care not to try to operate on files that don't exist.
for i in file.{000..150}.txt.gz; do
[[ -e "$i" ]] || continue
...otherstuff
done

Input and output redirection to the same file [duplicate]

This question already has answers here:
How can I use a file in a command and redirect output to the same file without truncating it?
(14 answers)
Closed 1 year ago.
How can I redirect input and output to the same file, in general? I know there is -o for the sort command, and there may be similar options for other commands, but how can I generally redirect input and output to the same file without clobbering it?
For example, sort a.txt > a.txt destroys the contents of a.txt, but I want to store the result in the same file. I know I can use mv and rm with a temporary file, but is it possible to do it directly?
As mentioned on BashPitfalls entry #13, you can use sponge from moreutils to "soak up" the data before opening the file to write to it.
Example Usage:
sort a.txt | sponge a.txt
While the BashPitfalls page mentions that there could be data loss, the man page for sponge says
It also creates the output file atomically by renaming a temp file into place [...]
This would make it no more dangerous than writing to a temp file and doing a mv.
Credit to Charles Duffy for pointing out the BashPitfalls entry in the comments.
If you're familiar with the POSIX APIs, you'll recognize that a file can be opened in several modes, the most common being read, write, and append, and that opening a file for writing truncates it immediately.
The redirects are directly analogous to those common modes.
> x # open x for writing
< x # open x for reading
>> x # open x for appending
Strictly speaking, shell redirects do have a read-write mode, <> (O_RDWR), but it doesn't help here: the command would still overwrite the same bytes it is reading, in place, rather than replacing the file's contents.
You can guard against accidentally truncating an existing file with the noclobber option, but for a filter like sort there is no redirect that does what you want. You must use a temporary file.
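For completeness, a minimal illustration of <> (note that writes land at the current file offset, overwriting bytes in place, which is rarely what a filter needs):
$ echo abcdef > f
$ exec 3<>f          # open f read-write on fd 3, without truncating
$ printf 'XYZ' >&3   # overwrites the first three bytes
$ exec 3>&-
$ cat f
XYZdef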
Not unless the command itself supports writing to a temporary file and moving it into place once it is finished.
The shell truncates the output file before it even runs the command you told it to run. (Try it with a command that doesn't exist and you'll see it still gets truncated.)
This is why some commands have options to do this for you (to save you from having to use command input > output && mv output input or similar yourself).
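For commands without such an option, the usual pattern looks like this (a sketch; mktemp just provides a safe temporary file name):
$ tmp=$(mktemp) && sort a.txt > "$tmp" && mv "$tmp" a.txt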
I realize this is from a billion years ago, but I came here looking for the answer before I remembered tee. Maybe I'll forget and stumble upon this in another 4 years. (Fair warning: unlike sponge, tee truncates its output file as soon as it starts, so this is racy and can lose data if grep hasn't already read the file.)
$ for task in $(shuf -n 5 RPDBFFLDQFZ.tasks); do
    echo "$task";
    grep -v "$task" RPDBFFLDQFZ.tasks | tee RPDBFFLDQFZ.tasks > /dev/null
done
6551a1fac870
26ab104327af
d6a90cf1720f
9eaa4faea92f
45ebf210a1b6
