bash: sed processing of files with leading zeros in their names

I am using a bash script to loop over files that belong to several groups, listed in an array, within one directory, and to edit each file if it exists. For example, the first group has 100 files numbered 0001 to 0100, the second group has 50 files numbered 0001 to 0050, and so on.
#an array for the groups
systems=(one two three four)
# loop over the groups
for file in "${systems[@]}"; do
i="1"
# introduce K var because the files are numbered as 0001 ... 0100
k=$(printf '%03d' $i)
while [ $i -le 100 ]; do
if [ ! -f "${output}/${file}_${k}.pdb" ]; then
echo "File ${output}/${file}_${k}.pdb does not exist!"
break
else
## edit file via SED
# insert "MODEL <i>" before the first line of the file and replace TER (on the last line) with ENDMDL
sed -i -e '1 i\MODEL '$i'' -e 's/TER/ENDMDL/g' ${output}/${file}_${k}.pdb
((i++))
fi
done
done
This script fails at the sed editing stage. If I drop the leading zeros from the file names and use the plain index i in the script, everything works fine:
# loop over the groups
for file in "${systems[@]}"; do
i="1"
# k is commented out since the files are numbered from 1 to 100 without leading zeros
#k=$(printf '%03d' $i)
while [ $i -le 100 ]; do
# the files are numbered from 1 to 100
if [ ! -f "${output}/${file}_${i}.pdb" ]; then
echo "File ${output}/${file}_${i}.pdb does not exist!"
break
else
## edit file via SED
# insert "MODEL <i>" before the first line of the file
sed -i -e '1 i\MODEL '$i'' -e 's/TER/ENDMDL/g' ${output}/${file}_${i}.pdb
((i++))
fi
done
done

k is assigned only once, before the while loop that increments i, so it never changes:
i="1"
# introduce K var because the files are numbered as 0001 ... 0100
k=$(printf '%03d' $i)
while [ $i -le 100 ]; do
...
((i++))
...
done
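Move the assignment to k inside the loop, so that it is recomputed each time i changes. Note also that '%03d' yields 001, not 0001; for files numbered 0001 ... 0100 the format should be '%04d'.
Alternative:
for ((i=1;i<=100;i++)); do
k=$(printf '%04d' "${i}")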
...
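The fix can be sketched as follows. The helper name padded_range is hypothetical, and '%04d' assumes the four-digit numbering (0001 ... 0100) described in the question. printf -v stores the result directly in k, so the value is rebuilt on every pass through the loop:

```bash
#!/usr/bin/env bash
# Print the zero-padded suffixes for the indices first..last on one line.
padded_range() {
    local first=$1 last=$2 i k
    local -a out=()
    for ((i = first; i <= last; i++)); do
        printf -v k '%04d' "$i"   # recomputed on every iteration
        out+=("$k")
    done
    echo "${out[*]}"
}

padded_range 8 11   # prints: 0008 0009 0010 0011
```

Inside the real loop, "${output}/${file}_${k}.pdb" would then name a different file on each pass, while $i stays available for the MODEL line.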

NOTE: have made several edits.
No answers of my own here, just compiling into a single block of code the answers of jas (at his request) and Walter A, who likely hit the real problem:
for file in "${systems[@]}"
do  for ((i=1;i<=100;i++))
    do  printf -v enumerated "${output}/${file}_%04d.pdb" "$i"
        if [[ -f "$enumerated" ]]
        then sed -i -e "1 i\\MODEL $i" -e 's/TER/ENDMDL/g' "$enumerated"
        else echo "file not found: '$enumerated'"
        fi
    done
done
Depending on what else is in your directory structure, you might also try this:
for stub in "${systems[@]}"
do for file in "$output/${stub}"_[0-9][0-9][0-9][0-9].pdb
do sed -i -e "1 i\\MODEL ${file//[^0-9]/}" -e 's/TER/ENDMDL/g' "$file"
done
done
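One caveat with ${file//[^0-9]/}: it keeps every digit found anywhere in the path, so a digit in a directory or group name would end up in the MODEL number, and the leading zeros are kept as well. A minimal sketch of a narrower extraction, assuming the names always end in _NNNN.pdb (the function name model_number and the sample path are made up for illustration):

```bash
#!/usr/bin/env bash
# Print the numeric suffix of a name such as dir/one_0042.pdb as a plain integer.
model_number() {
    local name=${1##*/}        # strip the directory part
    name=${name%.pdb}          # strip the extension
    local digits=${name##*_}   # keep what follows the last underscore
    echo "$((10#$digits))"     # 10# forces base 10, so 0042 is not read as octal
}

model_number "/data/run2/one_0042.pdb"   # prints: 42
```

Whether MODEL should carry the zero-padded or the plain number depends on whatever reads the PDB files; the original loop wrote the plain index $i.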

Related

How to split large *.csv files with headers in Bash?

I need to split a big *.csv file into several smaller ones. It currently has 661497 rows, and each output file should have at most 40000. I tried a solution that I found on GitHub, but with no success:
FILENAME=/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files/my_file.csv
HDR=$(head -1 ${FILENAME})
split -l 40000 ${FILENAME} xyz
n=1
for f in xyz*
do
if [[ ${n} -ne 1 ]]; then
echo ${HDR} > part-${n}-${FILENAME}.csv
fi
cat ${f} >> part-${n}-${FILENAME}.csv
rm ${f}
((n++))
done
The error I get:
/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/download.sh: line 23: part-1-/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files/my_file.csv.csv: No such file or directory
thanks for help!
Keep in mind that FILENAME contains both a directory and a file name, so when the script later builds the new file name you get something like:
part-1-/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files/my_file.csv.csv
One quick-and-easy fix would be to split the directory and file name into two separate variables, e.g.:
srcdir='/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files'
filename='my_file.csv'
hdr=$(head -n 1 "${srcdir}/${filename}")
split -l 40000 "${srcdir}/${filename}" xyz
n=1
for f in xyz*
do
    if [[ ${n} -ne 1 ]]; then
        # only the first chunk already starts with the header
        echo "${hdr}" > "${srcdir}/part-${n}-${filename}"
    fi
    cat "${f}" >> "${srcdir}/part-${n}-${filename}"
    rm "${f}"
    ((n++))
done
NOTES:
consider using lowercase variables (using uppercase variables raises the possibility of problems if there's an OS variable of the same name)
wrap variable references in double quotes in case string contains spaces
don't need to add a .csv extension on the new filename since it's already part of $filename
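A variation on the same idea, shown only as a sketch: strip the header first with tail, split the remaining rows, and then write the header at the top of every part, so the first chunk needs no special case. The function name split_csv, its three parameters and the chunk. prefix are assumptions made for this example.

```bash
#!/usr/bin/env bash
# split_csv SRC ROWS OUTDIR
# Write OUTDIR/part-N-<name>.csv files holding at most ROWS data rows each,
# every one of them starting with the header line of SRC.
split_csv() {
    local src=$1 rows=$2 outdir=$3
    local base hdr chunk n=1
    base=$(basename "$src" .csv)
    hdr=$(head -n 1 "$src")
    # Split only the data rows, so no chunk contains the header yet.
    tail -n +2 "$src" | split -l "$rows" - "$outdir/chunk."
    for chunk in "$outdir"/chunk.*; do
        [ -e "$chunk" ] || continue          # no data rows at all
        { printf '%s\n' "$hdr"; cat "$chunk"; } > "$outdir/part-$n-$base.csv"
        rm -- "$chunk"
        n=$((n + 1))
    done
}
```

With the variables from the answer it could be called as split_csv "$srcdir/$filename" 40000 "$srcdir".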

how to continue with the loop even though we use exit for a condition in shell script

I have the following list.txt file:
cat list.txt
one
two
zero
three
four
I have a shell script (check.sh) like below,
for i in $(cat list.txt)
do
if [ $i != zero ]; then
echo "the number is $i"
else
exit 1
fi
done
it gives output like below,
./check.sh
the number is one
the number is two
I want the script to continue with the rest of the items in list.txt: it should skip zero and process everything else, e.g.:
the number is one
the number is two
the number is three
the number is four
I tried using "return", but it did not work and gave an error:
./check.sh: line 6: return: can only `return' from a function or sourced script
About exit (and return)
The exit command terminates the running script; there is no way to continue after it.
Likewise, the return command leaves the current function (or sourced script); there is no way to continue after it either.
About reading the input file
To process a line-based input file, it is better to use while read than for i in $(cat ...):
Simply try:
while read -r i;do
if [ "$i" != "zero" ] ;then
echo number $i
fi
done <list.txt
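The difference between the two loop styles shows up as soon as a line contains a space. A small sketch (the function names and the sample file are invented for the demonstration) counts the items each style sees:

```bash
#!/usr/bin/env bash
# Count loop iterations when the file is expanded with $(cat ...).
count_with_for() {
    local n=0 item
    for item in $(cat "$1"); do   # unquoted on purpose: splits on every blank
        n=$((n + 1))
    done
    echo "$n"
}

# Count loop iterations when the file is read line by line.
count_with_read() {
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

sample=$(mktemp)
printf '%s\n' 'one' 'two words' 'three' > "$sample"
count_with_for "$sample"    # prints 4: "two words" was split in two
count_with_read "$sample"   # prints 3: one iteration per line
rm -f "$sample"
```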
Alternatively, you could drop unwanted entries before the loop:
while read -r i;do
echo number $i
done < <( grep -v ^zero$ <list.txt)
Note: In this specific case, ^zero$ doesn't need to be quoted. Consider quoting if your pattern contains special characters or spaces.
If you have more than one entry to drop, you could use
while read -r i;do echo number $i ;done < <(grep -v '^\(zero\|null\)$' <list.txt)
Alternatively, once the input file is filtered, use xargs:
If your processing is a single command, you can avoid the bash loop by using xargs:
xargs -n 1 echo number < <(grep -v '^\(zero\|null\)$' <list.txt)
How to use continue in bash script
Maybe you are thinking about something like:
while read -r i;do
if [ "$i" = "zero" ] ;then
continue
fi
echo number $i
done <list.txt
The argument of continue is a number telling how many enclosing loop levels it applies to: continue 2 resumes with the next iteration of the second enclosing loop.
Try this:
for i in {1..5};do
for l in {a..d};do
if [ "$i" -eq 3 ] && [ "$l" = "b" ] ;then
continue 2
fi
echo $i.$l
done
done
(This prints 3.a and then abandons the 3 series at 3.b: continue 2 resumes the outer loop with i=4.)
Then compare with
for i in {1..5};do
for l in {a..d};do
if [ "$i" -eq 3 ] && [ "$l" = "b" ] ;then
continue 1
fi
echo $i.$l
done
done
(This prints 3.a, 3.c and 3.d. Only 3.b is skipped, because continue 1 resumes the inner loop.)

Process files in pairs

I have a list of files:
file_name_FOO31101.txt
file_name_FOO31102.txt
file_name_FOO31103.txt
file_name_FOO31104.txt
And I want to use pairs of files for input into a downstream program such as:
program_call file_name_01.txt file_name_02.txt
program_call file_name_03.txt file_name_04.txt
...
I do not want:
program_call file_name_02.txt file_name_03.txt
I need to do this in a loop as follows:
#!/bin/bash
FILES=path/to/files
for file in $FILES/*.txt;
do
stem=$( basename "${file}" ) # stem : file_name_FOO31104_info.txt
output_base=$( echo $stem | cut -d'_' -f 1,2,3 ) # output_base : FOO31104_info.txt
id=$( echo $stem | cut -d'_' -f 3 ) # get the first field : FOO31104
number=$( echo -n $id | tail -c 2 ) # get the last two digits : 04
echo $id $((id+1))
done
But this does not produce what I want.
In each loop I want to call a program once, with two files as input (last 2 digits of first file always odd 01, last 2 digits of second file always even 02)
I actually wouldn't use a for loop at all. A while loop that shifts files off is a perfectly reasonable way to do this.
# here, we're overriding the argument list with the list of files
# ...you can do this in a function if you want to keep the global argument list intact
set -- "$FILES"/*.txt ## without these quotes paths with spaces break
# handle the case where no files were found matching our glob
[[ -e $1 || -L $1 ]] || { echo "No .txt found in $FILES" >&2; exit 1; }
# here, we're doing our own loop over those arguments
while (( "$#" > 1 )); do ## continue in the loop only w/ 2-or-more remaining
echo "Processing files $1 and $2" ## ...substitute your own logic here...
shift 2 || break ## break even if test doesn't handle this case
done
# ...and add your own handling for the case where there's an odd number of files.
(( "$#" )) && echo "Left over file $1 still exists"
Note that the $#s are quoted inside (( )) here for StackOverflow's syntax highlighting, not because they otherwise need to be. :)
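As the comments in the code above suggest, the same loop can live in a function so that the script's own arguments are left alone. A minimal sketch, where process_pairs and the sample names are placeholders and the echo stands in for the real program call:

```bash
#!/usr/bin/env bash
# Consume the arguments two at a time; report a trailing unpaired one on stderr.
process_pairs() {
    while (( $# > 1 )); do
        echo "pair: $1 $2"    # stand-in for: program_call "$1" "$2"
        shift 2
    done
    if (( $# )); then
        echo "left over: $1" >&2
    fi
}

process_pairs f01.txt f02.txt f03.txt f04.txt f05.txt
```

Called as process_pairs "$FILES"/*.txt, it relies on the glob expanding in sorted order, so that 01 is paired with 02, 03 with 04, and so on.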
By the way -- consider using bash's native string manipulation.
stem=${file##*/}
IFS=_ read -r p1 p2 id p_rest <<<"$stem"
number=${id:$(( ${#id} - 2 ))}
output_base="${p1}_${p2}_${id}"
echo "$id $((10#$number + 1))" # 10# ensures interpretation as decimal, not octal
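The 10# prefix matters for suffixes such as 08 and 09, which bash would otherwise reject as invalid octal numbers. A short sketch, assuming an id that ends in two digits (next_suffix is a made-up helper name):

```bash
#!/usr/bin/env bash
# Print the two-digit suffix that follows the one at the end of the given id.
next_suffix() {
    local id=$1
    local number=${id: -2}                   # last two characters of the id
    printf '%02d\n' "$((10#$number + 1))"    # force base 10, then pad again
}

next_suffix FOO31108   # prints: 09
```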

issue with if statement in bash

I have an issue with an if statement. WEDI_RC holds the path to a log file in the following format:
name_of_file date number_of_starts
I want to compare the first argument $1 with the first column and, if they match, increment the number of starts. My script works, but only with one file, e.g.:
file1.c 11:23:07 1
file1.c 11:23:14 2
file1.c 11:23:17 3
file1.c 11:23:22 4
file2.c 11:23:28 1
file2.c 11:23:35 2
file2.c 11:24:10 3
file2.c 11:24:40 4
file2.c 11:24:53 5
file1.c 11:25:13 1
file1.c 11:25:49 2
file2.c 11:26:01 1
file2.c 11:28:12 2
Every time I switch to a different file, the count starts again from 1. I need it to continue from where it left off.
Hope you understand me.
while read -r line
do
echo "line:"
echo $line
if [ "$1"="$($line | grep ^$1)" ]; then
number=$(echo $line | grep $1 | awk -F'[ ]' '{print $3}')
else
echo "error"
fi
done < $WEDI_RC
echo "file"
((number++))
echo $1 `date +"%T"` $number >> $WEDI_RC
There are at least two ways to resolve the problem. The most succinct is probably:
echo "$1 $(date +"%T") $(($(grep -c "^$1 " "$WEDI_RC") + 1))" >> "$WEDI_RC"
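One thing to watch in the grep version is that $1 is used as a regular expression, so the dot in file1.c matches any character. If that matters, an exact comparison on the first column is possible with awk. This is a different technique from the one-liner above, sketched with a hypothetical helper next_start that takes the name and the log path:

```bash
#!/usr/bin/env bash
# Print the next start number for a name: exact matches in column 1, plus one.
next_start() {
    local name=$1 log=$2
    awk -v f="$name" '$1 == f { n++ } END { print n + 1 }' "$log"
}

log=$(mktemp)
printf '%s\n' 'file1.c 11:23:07 1' 'file1xc 11:23:09 1' 'file1.c 11:23:14 2' > "$log"
next_start file1.c "$log"   # prints 3: the file1xc line is not counted
rm -f "$log"
```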
However, if you want to have counts for each file separately, you can do that using an associative array, assuming you have Bash version 4.x (not 3.x as is provided on Mac OS X, for example). This code assumes the file is correctly formatted (so that the counts do not reset to 1 each time the file name changes).
declare -A files # Associative array
while read -r file time count # Split line into three variables
do
echo "line: $file $time $count" # One echo - not two
files[$file]="$count" # Record the current maximum for file
done < "$WEDI_RC"
echo "$1 $(date +"%T") $(( ${files[$1]} + 1 ))" >> "$WEDI_RC"
The code uses read to split the line into three separate variables. It echoes what it read and records the current count. When the loop's done, it echoes the data to append to the file. If the file name is new (not yet mentioned in the log), then you will get a 1 added.
If you need to deal with the broken file as input, then you can amend the code to count the number of entries for a file, instead of trusting the count value. The bare-array reference notation used in the (( … )) operation is necessary when incrementing the variable; you can't use ${array[sub]}++ with the increment (or decrement) operator because that evaluates to the value of the array element, not its name!
declare -A files # Associative array
while read -r file time count # Split line into three variables
do
echo "line: $file $time $count" # One echo - not two
((files[$file]++)) # Count the occurrences of file
done < "$WEDI_RC"
echo "$1 $(date +"%T") $(( ${files[$1]} + 1 ))" >> "$WEDI_RC"
You can even detect whether the format is in the broken or fixed style:
declare -A files # Associative array
while read -r file time count # Split line into three variables
do
echo "line: $file $time $count" # One echo - not two
if [ "$((++files[$file]))" != "$count" ] # pre-increment, so the first entry compares as 1
then echo "$0: warning - count out of sync: ${files[$file]} vs $count" >&2
fi
done < "$WEDI_RC"
echo "$1 $(date +"%T") $(( ${files[$1]} + 1 ))" >> "$WEDI_RC"
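I don't understand exactly what you want to achieve with your test [ "$1"="$($line | grep ^$1)" ], but it seems you are checking that the line starts with the first argument. Note that, as written, $line is executed as a command, and "$1"="..." without spaces around = is a single non-empty word, so the test is always true.
If so, I think you can either:
provide the -o option to grep so that it prints just the matched text (that is, $1), feeding it the line with echo "$line" | grep -o "^$1"
use [[ "$line" =~ ^"$1" ]] as the test.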
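A third possibility, not mentioned in the answers, is a glob-style prefix match inside [[ ]]. Quoting "$name " makes it literal, and the trailing space keeps file1.c from matching file1.cpp. A sketch with an illustrative function name:

```bash
#!/usr/bin/env bash
# Succeed when the first space-separated field of a log line equals the name.
line_is_for() {
    local line=$1 name=$2
    [[ $line == "$name "* ]]
}

if line_is_for 'file1.c 11:23:07 1' file1.c; then echo 'match'; fi
if ! line_is_for 'file1.cpp 11:23:07 1' file1.c; then echo 'no match'; fi
```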

bash script using multiple while loops and read line

I am trying to write a bash script to create some playlists of music. The part that has me stuck is the while loop for read line. I figure I am over thinking this so I turned to stackoverflow for assistance.
# The first while loop is how many playlists I want to create
i=1
while [ $i -le $plist ]
do
echo -e "iteration $i"
i=$[$i + 1]
z=0
# This while loop is for the length of time I want the playlist to be
while [ $z -le $TOTAL ]
do
echo -e "Count $z"
z=$[$z + xxx]
# This while loop is for reading the track list previously generated.
# It would read the line, calculate the track length,
# add to $z, cp the track to a folder
while read line
do
secs=$(metaflac --show-total-samples --show-sample-rate "$line" | tr '\n' ' ' |
awk '{print $1/$2}' -)
z=$[$z + $secs]
cp $line to destination folder
done
done
done
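No answer is recorded for this question, so the following is only a sketch of one way to structure it, under loud assumptions: the track list is read once in a single loop instead of inside the nested loops, and track lengths come from a pre-computed "seconds<TAB>path" file in whole seconds rather than from metaflac, so that the example runs without audio files. The function name assign_playlists and that input format are invented for illustration.

```bash
#!/usr/bin/env bash
# assign_playlists LIMIT LIST
# LIST holds "seconds<TAB>path" lines (assumed format, whole seconds only).
# Print "playlist<TAB>path" for each track, starting a new playlist whenever
# adding the track would push the running total past LIMIT.
assign_playlists() {
    local limit=$1 list=$2
    local playlist=1 total=0 secs path
    while IFS=$'\t' read -r secs path; do
        if (( total > 0 && total + secs > limit )); then
            playlist=$((playlist + 1))
            total=0
        fi
        total=$((total + secs))
        printf '%d\t%s\n' "$playlist" "$path"
    done < "$list"
}

tracks=$(mktemp)
printf '%s\t%s\n' 100 a.flac 100 b.flac 150 c.flac > "$tracks"
assign_playlists 200 "$tracks"
rm -f "$tracks"
```

In a real script the printf would be replaced by the cp into the playlist folder, and secs would come from the metaflac pipeline shown in the question, rounded to an integer because bash arithmetic does not handle fractions.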
