I have a script that extracts filenames from an input file; it is supposed to read each line (a filename) and unzip the specified file, saving the unzipped content as an individual file. However, I can't get my counter to work, and all the unzipped content ends up in one large file.
The input file contains a list:
ens/484/59/traj.pdb 0.001353
ens/263/39/traj.pdb 0.004178
ens/400/35/traj.pdb 0.004191
I'm using the regex /.*?/.*?/ to extract the files I'd like to unzip, and to name each output file output{1..40}.pdb. Instead I get one output file, output1.pdb, which contains the contents of all 40 unzipped files.
My question is: how do I correct my counter in order to achieve the desired naming scheme?
#!/bin/bash
file="/home/input.txt"
grep -Po '/.*?/.*?/' $file > filenames.txt
i=$((i+1))
structures='filenames.txt'
while IFS= read line
do
gunzip -c 'ens'$line'traj.pdb.gz' >> 'output'$i'.pdb'
done <"$structures"
rm "$structures"
file="/home/input.txt"
grep -Po '/.*?/.*?/' "$file" > filenames.txt
structures='filenames.txt'
i=1
while IFS= read -r line
do
    gunzip -c "ens${line}traj.pdb.gz" >> "output$i.pdb"
    i=$(expr $i + 1)
done < "$structures"
rm "$structures"
There were a couple of logical mistakes: the counter has to be defined once, outside the while loop, and the counter+1 step has to go inside the loop. Here the increment is done with expr, and the counter starts from 1, so the first entry gets that value (bash's built-in arithmetic, i=$((i + 1)), works just as well without spawning a process). As for the parameters of the while loop's read: IFS= read line reads each raw line into line; adding -r also stops backslashes from being interpreted as escapes.
Related
I have a text file that contains a list of file paths. I want to loop through each line (file path), manipulate the file and store the manipulated file under a new name.
To do this I would like to dynamically name the new file file_$index, so that each new file gets saved rather than overwritten. Any ideas how to do this? My current code is:
for j in $(cat access.txt); do bcftools view $j -include 'maf[0]>0.0 & maf[0]<0.005' -output [FILE_NAME] ; done
I do not know how to dynamically change the file name, i.e. to file_$index. This would be the equivalent of enumerate on a for loop in Python. Note I cannot use the existing file path, as that would overwrite the existing file, which I do not want.
In an ideal world I would manipulate the file path ($j) to extract part of the path as a new name; however, I am not sure that is possible, so file_$index also works.
Don't read lines with for
#!/usr/bin/env bash
index=1
while IFS= read -ru3 j; do
    bcftools view "$j" -include 'maf[0]>0.0 & maf[0]<0.005' -output "${j}_$index"
    ((index++))
done 3< access.txt
In an ideal world i would manipulate the file path ($j) to extract part of the path as a new name.
As for "$j", Parameter Expansion can be used to manipulate, extract, or replace whatever you wish within that variable.
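For example, a minimal sketch of pulling pieces out of a path with parameter expansion alone (the path here is made up purely for illustration):

```shell
# Hypothetical path, just for illustration
j='data/sample42/variants.vcf.gz'

dir=${j%/*}       # drop the shortest /suffix   -> data/sample42
base=${j##*/}     # drop the longest  */ prefix -> variants.vcf.gz
stem=${base%%.*}  # drop from the first dot     -> variants
echo "$dir / $base / $stem"
```

Something like -output "${stem}_$index" would then give a name derived from the path rather than a bare counter, with no external commands involved.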
-ru3 is shorthand for -r -u 3. It reads from FD 3, just in case bcftools is eating stdin; see help read.
((index++)) is an Arithmetic expression in bash.
You can increment your own index with something like this:
i=$(( ${i:-0} + 1 ))
then your output string can be
"${j}_$i"
(note the braces: "$j_$i" would be parsed as a variable named j_).
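Put together, a minimal sketch (the file names are invented for the example):

```shell
# i is deliberately left unset; ${i:-0} supplies the 0 on the first pass
for j in alpha.vcf beta.vcf gamma.vcf; do
    i=$(( ${i:-0} + 1 ))
    out="${j}_$i"   # braces matter: "$j_$i" would look up a variable named j_
done
```

After the loop, i is 3 and out is gamma.vcf_3.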
A simple solution using while read
while read -r idx fpath; do
    echo "$(basename "$fpath")_$idx.txt"
done < <(nl tmp.txt)
Result
a_1.txt
b_2.txt
c_3.txt
For OP's code
bcftools view "$fpath" -include 'maf[0]>0.0 & maf[0]<0.005' -output "${fpath}_${idx}"
I'm attempting to write a program that moves zipped files arriving in a directory, unzips them, and then outputs their contents.
#!/bin/bash
shopt -s extglob
echo "Press [CTRL+C] to stop.."
# begin loop
while :
do
    # search folder test_source for files and append names to array
    queue+=($(ls /home/ec2-user/glutton/test_source | egrep 'test[0-9]{1}.gz'))
    for i in $queue; do
        # move file in test_source to temp folder
        mv /home/ec2-user/glutton/test_source/${queue[i]} /home/ec2-user/glutton/temp
        # unzip file
        gunzip /home/ec2-user/glutton/temp/${queue[i]}
        # add new file name to variable unzipped
        unzipped=($(ls /home/ec2-user/glutton/temp | egrep 'test[0-9]{1}'))
        cat temp/$unzipped
        # Test for successful run
        exit_status=$?
        if [ $exit_status -eq 1 ]; then
            # If fail move file to bank and output
            mv /home/ec2-user/glutton/temp/$unzipped /home/ec2-user/glutton/logs/bank
            echo "Error! $unzipped moved to /home/ec2-user/glutton/logs/bank"
            # write to error.log
            echo "moved ${queue[i]} to bank due to error" >> /home/ec2-user/glutton/logs/error.log
        else
            # If success delete file
            rm /home/ec2-user/glutton/temp/$unzipped
        fi
        # wipe variable
        unset unzipped
        i=$i+1
    done
    # wipe array
    unset queue
    i=0
    # go to top of loop
done
This has worked pretty well up until I added the unzipping feature and now my program outputs this error when attempting to move the .gz file:
./glutton.sh: line 11: test0.gz: syntax error: invalid arithmetic operator (error token is ".gz")
When I run the first part of my script on the command line it seems to work perfectly, but it doesn't when run on its own; I'm pretty confused.
Your main issue is that when you iterate an array the way you are doing, you get the first item of the array, not an index. So in your case $i is not a number, it is the filename (i.e. test1.gz), and the loop only ever sees the first file. The cleanest way to iterate the items in an array is for i in "${arrayName[@]}".
Also, using {1} in your regex is redundant; the character class already matches exactly one character when no quantifier is given.
It may not matter depending on the contents of your temp folder, but I would be more specific with your egreps too: if you add -x, the pattern has to match the whole line. As it stands, a file called not_a_test1 would also match.
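A corrected skeleton of that loop, iterating over values rather than mixing values and indices (the directory plumbing is omitted and the file names are stand-ins):

```shell
queue=(test0.gz test1.gz test2.gz)   # stand-in for the ls/egrep result
seen=()
for f in "${queue[@]}"; do           # $f is each item, not an index
    seen+=("$f")                     # mv/gunzip/cat would go here, using "$f"
done
```

With this form no counter variable is needed at all, and every element of the array is visited exactly once.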
I have a number (say, 100) of CSV files, out of which some (say, 20) are empty (i.e., 0 bytes file). I would like to concatenate the files into one single CSV file (say, assorted.csv), with the following requirement met:
For each empty file, there must be a blank line in assorted.csv.
It appears that simply doing cat *.csv >> assorted.csv skips the empty files completely in the sense that they do not have any lines and hence there is nothing to concatenate.
Though I can solve this problem using any high-level programming language, I would like to know if and how to make it possible using Bash.
Just write a loop and test whether each file is non-empty. If it's empty, echo the file name plus a comma into the output: that produces a near-blank line. Otherwise, prefix each of its lines with the file name plus a comma.
#!/bin/bash
out=assorted.csv
# delete the file prior to doing the concatenation,
# or if run twice it would be counted among the input files!
rm -f "$out"
for f in *.csv
do
    if [ -s "$f" ] ; then
        #cat "$f" | sed "s/^/$f,/"   # cat+sed is more than needed here
        sed "s/^/$f,/" "$f"
    else
        echo "$f,"
    fi
done > "$out"
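If the requirement is literally one blank line per empty file, with contents otherwise unchanged (no filename prefix), an even smaller sketch does it. The sample files here are created on the spot just to make the example self-contained:

```shell
tmp=$(mktemp -d) && cd "$tmp"  # scratch directory for the example
printf 'a,b\n' > one.csv       # non-empty sample
: > two.csv                    # empty sample (0 bytes)

for f in one.csv two.csv; do   # 'for f in *.csv' in real use
    if [ -s "$f" ]; then
        cat "$f"               # non-empty: copy through
    else
        echo                   # empty: emit one blank line
    fi
done > assorted.csv
```

With a real *.csv glob, write the output somewhere outside the current directory, or assorted.csv itself gets picked up as input on a rerun.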
I have a folder with, say, ten data files I01.txt, ..., I10.txt. Each file, when run through ./a.out, gives me five output files, namely f1.txt, f2.txt, ..., f5.txt.
I have written a simple bash program to execute all the files and save the output printed on the screen to a variable file using the command
./cosima_peaks_444_temp_muuttuva -args > $counter-res.txt
Using this, I am able to save the on-screen output to a file. But the five files f1 to f5 are overwritten on each run, so they only hold the results of the last file processed (in this case I10), and the results of the first nine files are lost.
So I want to save the output (f1 ... f5) of each I*.txt file to a different set of files, such that when the program executes I01.txt with ./a.out it stores the output as
f1>var1-f1.txt , f2>var1-f2.txt... f5 > var1-f5.txt
and then repeats the same for I02 (f1>var2-f1.txt ...).
#!/bin/bash
# echo "for looping over all the .txt files"
echo -e "Enter the name of the file or q to quit "
read dir
if [[ $dir = q ]]
then
    exit
fi
filename="$dir*.txt"
counter=0
if [[ $dir == I ]]
then
    for f in $filename ; do
        echo "output of $filename"
        ((counter++))
        ./cosima_peaks_444_temp_muuttuva $f -m202.75 -c1 -ng0.5 -a0.0 -b1.0 -e1.0 -lg > $counter-res.txt
        echo "counter $counter"
    done
fi
If I understand correctly, you want to pass files l01.txt, l02.txt, ... to a.out and save the output of each execution to a separate file like f01.txt, f02.txt, .... You could use a short script that reads each file named l*.txt in the directory and passes the name to a.out, redirecting the output to fN.txt (where N is the same number as in the lN.txt filename). This presumes you are passing each filename to a.out and that a.out is not reading the entire directory automatically.
for i in l*.txt; do
    num=$(sed 's/^l\(.*\)[.]txt/\1/' <<<"$i")
    ./a.out "$i" > "f${num}.txt"
done
(note: that is a lowercase L in s/^l..., and \1 is the captured number)
note: if you do not want the same N from the filename (with its leading '0'), then you can trim the leading '0' from the N value for the output filename.
(you can use a counter as you have shown in your edited post, but you have no guarantee of the sort order of the filenames used by the loop unless you explicitly sort them)
note: this presumes no spaces, embedded newlines, or other odd characters in the filenames. If your lN.txt names can contain odd characters or spaces, feeding a while loop with find avoids those issues.
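That find-fed loop could be sketched like this, NUL-delimited so spaces and newlines in names are harmless. Sample files are created here so the example stands alone, and the real ./a.out call is left as a comment:

```shell
tmp=$(mktemp -d) && cd "$tmp"
: > l01.txt
: > 'l 02.txt'                       # a name with a space, to prove the point
count=0
while IFS= read -r -d '' f; do
    name=${f##*/}                    # strip the leading ./ that find adds
    num=${name#l}; num=${num%.txt}   # N from lN.txt, via parameter expansion
    count=$((count + 1))
    # ./a.out "$f" > "f${num}.txt"   # the real work would go here
done < <(find . -maxdepth 1 -name 'l*.txt' -print0)
```

Because the loop is fed by process substitution rather than a pipe, it runs in the current shell, so variables set inside it (like count) survive the loop.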
With f1 - f5 Created Each Run
You know the format for the output file name, so you can test for the existence of an existing file name and set a prefix or suffix to provide unique names. For example, if your first pass creates filenames 'pass1-f01.txt', 'pass1-f02.txt', then you can check for that pattern (in several ways) and increment your 'passN' prefix as required:
for f in $filename; do   # unquoted on purpose: "$filename" would suppress the glob
    num=$(sed 's/l*\(.*\)[.]txt/\1/' <<<"$f")
    count=$(sed 's/^0*//' <<<"$num")
    while [ -f "pass${count}-f${num}.txt" ]; do
        ((count++))
    done
    ./a.out "$f" > "pass${count}-f${num}.txt"
done
Give that a try and let me know if that isn't closer to what you need.
(note: the herestring (<<<) is bash-only; if you need a portable solution, pipe the output of echo "$var" to sed, e.g. count=$(echo "$num" | sed 's/^0*//'))
I replaced your cosima_peaks_444_temp_muuttuva with a function myprog.
The OP asked for more explanation, so I put in a lot of comments:
# This function makes 5 output files for testing the construction
function myprog {
    # Fill the test output file f1.txt with the input filename and a datestamp
    echo "Output run $1 on $(date)" > f1.txt
    # The original prog makes 5 output files, so I copy the new testfile 4 times
    cp f1.txt f2.txt
    cp f1.txt f3.txt
    cp f1.txt f4.txt
    cp f1.txt f5.txt
}

# Use the number in the inputfile for making a unique filename and move the output
function move_output {
    # The parameter ${1} is filled with something like I03.txt
    # You can get the number with a sed action, but it is more efficient to use
    # bash parameter expansion, even in 2 steps.
    # First step: cut off from the end as much as possible (%%) starting with a dot.
    Inumber=${1%%.*}
    # Step 2: remove the I from the Inumber (which holds something like "I03").
    number=${Inumber#I}
    # Move all output files from the last run
    for outputfile in f*txt; do
        # Put the number in front of the original name
        mv "${outputfile}" "${number}_${outputfile}"
    done
}

# Start the main processing. You will perform the same logic for all input files,
# so make a loop over all of them. I guess all input files start with an "I",
# followed by 2 characters (a number), and .txt. No need to use ls for listing those.
for input in I??.txt; do
    # Call the dummy prog above with the name of the input file as a parameter
    myprog "${input}"
    # Now finally the show starts.
    # Call the function for moving the 5 output files to other names.
    move_output "${input}"
done
I guess you have the source code of this a.out binary. If so, I would modify it so that it writes to several file descriptors instead of several fixed files. Then you can solve this very cleanly using redirects:
./a.out 3> fileX.1 4> fileX.2 5> fileX.3
and so on for every file you want to write. Writing to a file or to a (redirected) fd is equivalent in most programs (a notable exception is memory-mapped I/O, but that is not commonly used in such scripts; look for mmap calls).
Note that this is not esoteric at all: it is the same well-known technique that is routinely used to separate normal output (stdout, fd 1) from errors (stderr, fd 2).
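As a toy stand-in (a shell script playing the role of the modified a.out, purely to show the calling side), the idea looks like this:

```shell
tmp=$(mktemp -d)
# writer.sh plays the role of the modified a.out: it writes to fds 3, 4, 5
cat > "$tmp/writer.sh" <<'EOF'
#!/usr/bin/env bash
echo "first"  >&3
echo "second" >&4
echo "third"  >&5
EOF
chmod +x "$tmp/writer.sh"

# the caller decides where each stream lands, exactly like 1/2 for stdout/stderr
"$tmp/writer.sh" 3> "$tmp/fileX.1" 4> "$tmp/fileX.2" 5> "$tmp/fileX.3"
```

Each invocation can redirect the three descriptors to a different set of files, so no per-run renaming is needed.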
Using a Bash script, I'd like to move a list of files with a for loop rather than a while loop (for testing purposes). Can anyone explain to me why mv always acts as a file rename rather than a file move under this for loop? How can I fix it to move the list of files?
The following works:
for file in "/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
do
    mv "$file" "/Volumes/HDD2/"
done
UPDATE#1:
However, suppose that I have a sample_pathname.txt
cat sample_pathname.txt
"/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
Why the following for-loop will not work then?
array=$(cat sample_path2.txt)
for file in "${array[@]}"
do
    mv "$file" "/Volumes/HDD2/"
done
Thanks.
System: OS X
Bash version: 3.2.53(1)
cat sample_pathname.txt
"/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
The quotation marks here are the problem. Unless you need to cope with file names with newlines in them, the simple and standard way to do this is to list one file name per line, with no quotes or other metainformation.
vbvntv$ cat sample_pathname_fixed.txt
/Volumes/HDD1/001.jpg
/Volumes/HDD1/002.jpg
vbvntv$ while read -r file; do
> mv "$file" "/Volumes/HDD2/"
> done <sample_pathname_fixed.txt
In fact, you could even
xargs mv -t /Volumes/HDD2 <sample_pathname_fixed.txt
(somewhat depending on how braindead your xargs is).
The syntax used in your example will not create an array; it just stores the file contents in a single string variable named array.
IFS=$'\n' array=($(cat sample_path2.txt))
If you have a text file containing filenames (each on a separate line is simplest), you can load it into an array and iterate over it as follows. Note the use of $(< file ) as a better alternative to cat, and the parentheses that initialize the contents into an array. Each line of the file corresponds to an index.
array=($(< file_list.txt ))
for file in "${array[@]}"; do
    mv "$file" "/absolute/path"
done
Update: Your IFS was probably not set correctly if the command at the top of the post didn't work. I updated it to reflect that. Also, there are a couple of other reliable ways to initialize an array from a file. But like you mentioned, if you are just piping the file directly into a while loop, you may not need it.
readarray is a shell builtin in Bash 4+ and a synonym of mapfile. This works great if it is available.
readarray -t array < file
The 'read' command can also initialize an array for you:
IFS=$'\n' read -d '' -r -a array < file
Use this:
for file in "/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
do
    f=$(basename "$file")
    mv "$file" "/Volumes/HDD2/$f"
done