Process files in pairs - bash

I have a list of files:
file_name_FOO31101.txt
file_name_FOO31102.txt
file_name_FOO31103.txt
file_name_FOO31104.txt
And I want to use pairs of files for input into a downstream program such as:
program_call file_name_01.txt file_name_02.txt
program_call file_name_03.txt file_name_04.txt
...
I do not want:
program_call file_name_02.txt file_name_03.txt
I need to do this in a loop as follows:
#!/bin/bash
FILES=path/to/files
for file in $FILES/*.txt;
do
stem=$( basename "${file}" ) # stem : file_name_FOO31104_info.txt
output_base=$( echo $stem | cut -d'_' -f 1,2,3 ) # output_base : FOO31104_info.txt
id=$( echo $stem | cut -d'_' -f 3 ) # get the first field : FOO31104
number=$( echo -n $id | tail -c 2 ) # get the last two digits : 04
echo $id $((id+1))
done
But this does not produce what I want.
In each loop I want to call a program once, with two files as input (last 2 digits of first file always odd 01, last 2 digits of second file always even 02)

I actually wouldn't use a for loop at all. A while loop that shifts files off is a perfectly reasonable way to do this.
# here, we're overriding the argument list with the list of files
# ...you can do this in a function if you want to keep the global argument list intact
set -- "$FILES"/*.txt ## without these quotes paths with spaces break
# handle the case where no files were found matching our glob
[[ -e $1 || -L $1 ]] || { echo "No .txt found in $FILES" >&2; exit 1; }
# here, we're doing our own loop over those arguments
while (( "$#" > 1 )); do ## continue in the loop only w/ 2-or-more remaining
echo "Processing files $1 and $2" ## ...substitute your own logic here...
shift 2 || break ## break even if test doesn't handle this case
done
# ...and add your own handling for the case where there's an odd number of files.
(( "$#" )) && echo "Left over file $1 still exists"
Note that the $#s are quoted inside (( )) here for StackOverflow's syntax highlighting, not because they otherwise need to be. :)
By the way -- consider using bash's native string manipulation.
stem=${file##*/}
IFS=_ read -r p1 p2 id p_rest <<<"$stem"
number=${id:$(( ${#id} - 2 ))}
output_base="${p1}_${p2}_${id}"
echo "$id $((10#$number + 1))" # 10# ensures interpretation as decimal, not octal
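As the comment in the answer suggests, the shift-based loop can live in a function so that the script's own argument list stays intact. The following is a minimal sketch under that assumption; process_pairs is a hypothetical name, and it prints each pair instead of calling a real program.

```bash
#!/usr/bin/env bash
# process_pairs is a hypothetical helper; it prints each pair instead of
# calling a real program.
process_pairs() {
    # Inside a function, $1, $2, ... and $# refer to the function's own
    # arguments, so shifting here leaves the script's arguments untouched.
    while (( $# > 1 )); do
        printf 'pair: %s %s\n' "$1" "$2"
        shift 2
    done
    # An odd number of files leaves exactly one argument behind.
    if (( $# )); then
        printf 'unpaired: %s\n' "$1"
    fi
}

process_pairs file_01.txt file_02.txt file_03.txt file_04.txt file_05.txt
```

This prints two "pair:" lines followed by "unpaired: file_05.txt". For real files, the call would be something like process_pairs "$FILES"/*.txt, relying on the glob's sorted order to keep odd and even files together.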

Related

Creating files in succession

How would one go about creating a script for creating 25 empty files in succession? (I.e 1-25, 26-51, 52-77)
I can create files 1-25 but I’m having trouble figuring out how to create a script that continues that process from where it left off, every time I run the script.
#!/bin/bash
higher=$( find files -type f -exec basename {} \; | sort -n | tail -1 )
if [[ "$higher" == "" ]]
then
start=1
end=25
else
(( start = higher + 1 ))
(( end = start + 25 ))
fi
echo "$start --> $end"
for i in $(seq $start 1 $end)
do
touch files/"$i"
done
I put my files in a directory called "files", hence the find on the directory "files".
For each file found, I run basename on it. That returns only integer values, since the files all have numeric names.
sort -n puts them in numeric order.
tail -1 extracts the highest number.
If there are no files, higher is empty, so the indexes are 1 and 25.
Otherwise, they are higher + 1 and higher + 26. (That matches the ranges in the question, such as 26-51, which hold 26 files; use start + 24 if every run should create exactly 25.)
I used seq for the for loop to avoid problems with variables inside a range definition (you did {1..25}).
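The steps above can be sketched as a reusable function. This is only an illustration: create_batch is a hypothetical name, it assumes every file in the directory has a purely numeric name, and unlike the script above it ends each batch at start + count - 1 so that every run creates exactly count files.

```bash
#!/usr/bin/env bash
# create_batch is a hypothetical name. It assumes every file in the directory
# has a purely numeric name, as in the answer above.
create_batch() {
    local dir=$1 count=${2:-25} highest start i
    mkdir -p "$dir"
    # Highest existing number; empty when the directory has no files yet.
    highest=$(find "$dir" -type f -exec basename {} \; | sort -n | tail -1)
    start=$(( ${highest:-0} + 1 ))
    for (( i = start; i < start + count; i++ )); do
        touch "$dir/$i"
    done
    echo "$start --> $(( start + count - 1 ))"
}

demo_dir=$(mktemp -d)
create_batch "$demo_dir" 25   # first run
create_batch "$demo_dir" 25   # second run continues where the first stopped
rm -rf "$demo_dir"
```

The two calls report "1 --> 25" and then "26 --> 50".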
#! /usr/bin/env bash
declare -r base="${1:-base-%d.txt}"
declare -r lot="${2:-25}"
declare -i idx=1
declare -i n=0
printf -v filename "${base}" ${idx}
while [[ -e "${filename}" ]]; do
idx+=1
printf -v filename "${base}" "${idx}"
done
while [[ $n -lt $lot ]]; do
printf -v filename "${base}" ${idx}
if [[ ! -e "${filename}" ]]; then
> "$filename"
n+=1
fi
idx+=1
done
This script accepts two optional parameters:
The first is the base name of the files to create, with a %d token that is replaced by the file number. The default is base-%d.txt.
The second is the number of files to create. The default is 25.
How the script works:
Variable declarations:
base: file name template (constant)
lot: number of files to create (constant)
idx: search index
n: counter for new files
Search for files already created, starting from 1.
This loop stops at the first gap in the numbering.
Loop to create empty files.
The condition in the loop lets it fill any later gaps in the numbering.
> "$filename" creates an empty file.
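The search step, which walks the numbering until it reaches the first gap, can be tried in isolation. This is a small sketch, assuming a template with a single %d; first_free_index is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# first_free_index is a hypothetical helper; the template must contain one %d.
first_free_index() {
    local template=$1 name
    local -i idx=1
    printf -v name "$template" "$idx"
    while [[ -e $name ]]; do
        idx+=1                      # the integer attribute makes += arithmetic
        printf -v name "$template" "$idx"
    done
    echo "$idx"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/base-1.txt" "$demo_dir/base-2.txt" "$demo_dir/base-4.txt"
first_free_index "$demo_dir/base-%d.txt"   # files 1, 2 and 4 exist, so the first gap is 3
rm -rf "$demo_dir"
```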

How can I increment a number at the end of a string in bash?

Basically, I need to create a function that takes an argument and increments the number at its end. For example, the argument would be
version_2 and the function would change it to version_3.
It just increments by one.
In Java I would create a new string, grab the last character, increment it, and append it, but I'm not sure how to do this in bash.
updateVersion() {
version=$1
}
The prefix can be anything, for example dog12 or dog_12, and the string always has exactly one number to update.
After the update it would be dog13 or dog_13 respectively.
updateVersion()
{
[[ $1 =~ ([^0-9]*)([0-9]+) ]] || { echo 'invalid input' >&2; return 1; } # return, not exit, so the calling shell survives
echo "${BASH_REMATCH[1]}$(( ${BASH_REMATCH[2]} + 1 ))"
}
# Usage
updateVersion version_11 # output: version_12
updateVersion version11 # output: version12
updateVersion something_else123 # output: something_else124
updateVersion "with spaces 99" # output: with spaces 100
# Putting it in a variable
v2="$(updateVersion version2)"
echo "$v2" # output: version3
Use parameter expansion:
#! /bin/bash
shopt -s extglob
for version in version_1 version_19 version_34.14 ; do
echo $version
v=${version##*[^0-9]}
((++v))
echo ${version%%+([0-9])}$v
done
extglob is needed for the +([0-9]) construct which means "one or more digits".
incrementTrailingNumber() {
local prefix number
if [[ $1 =~ ^(.*[^[:digit:]])([[:digit:]]+)$ ]]; then
prefix=${BASH_REMATCH[1]}
number=${BASH_REMATCH[2]}
printf '%s%s\n' "$prefix" "$(( number + 1 ))"
else
printf '%s\n' "$1"
fi
}
Usage as:
$ incrementTrailingNumber version_30
version_31
$ incrementTrailingNumber foo-2.15
foo-2.16
$ incrementTrailingNumber noNumberHereAtAll # demonstrate noop case
noNumberHereAtAll
Late to the party here, but there is an issue with the accepted answer. It works for the OP's case where there are no numbers before the end, but I had an example like this:
1.0.143
For that, the regexp needs to be a bit looser. Here's how I did it, preserving leading zeroes:
#!/usr/bin/env bash
updateVersion()
{
[[ ${1} =~ ^(.*[^0-9])?([0-9]+)$ ]] && \
[[ ${#BASH_REMATCH[1]} -gt 0 ]] && \
printf "%s%0${#BASH_REMATCH[2]}d" "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]} + 1 ))" || \
printf "%0${#BASH_REMATCH[2]}d" "$((10#${BASH_REMATCH[2]} + 1))" || \
printf "${1}"
}
# Usage
updateVersion 09 # output 10
updateVersion 1.0.450 # output 1.0.451
updateVersion version_01 # output version_02
updateVersion version12 # output version13
updateVersion version19 # output version20
Notes:
You only need to double-quote the first argument to printf.
Replace ${1} with content in "" if you want to use it on a command line,
instead of in a function.
You can switch the last printf to a basic echo if you prefer. If you are just printing to stdout or stderr, consider adding a newline (\n) at the end of each printf.
You can combine the function content into a single line, but it's harder to read. It's better to break it into lines with \ at every if (&&) and else (||), as above.
What the function does - line by line:
Test the passed value ends with a number of one or more digits, optionally prefixed by at least one non-number. Split into two groups accordingly (indexing is 1-based).
When ending in a number, test there is a non-numeric prefix (i.e. length of group 1 > 0).
When there are non-numerics, print group 1 (a string) followed by group 2 (an integer padded with zeroes to match the original string size). Group 2 is base-10 converted and incremented by 1. The conversion is important - leading zeroes are interpreted as octal by default.
When there are only numbers, increment as above but just print group 2.
If the input is anything else, return the supplied string.
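The same line-by-line logic can also be written with an explicit if/else instead of the && / || chain. This is a sketch rather than a drop-in replacement: increment_padded is a hypothetical name, it uses printf's * width specifier to keep the zero padding, and it always ends its output with a newline.

```bash
#!/usr/bin/env bash
# increment_padded is a hypothetical name for an if/else variant of the
# function described above.
increment_padded() {
    local prefix digits
    if [[ $1 =~ ^(.*[^0-9])?([0-9]+)$ ]]; then
        prefix=${BASH_REMATCH[1]}       # may be empty when the input is all digits
        digits=${BASH_REMATCH[2]}
        # %0*d takes the field width from the next argument, so the result is
        # zero-padded to the original number of digits; 10# forces base 10.
        printf '%s%0*d\n' "$prefix" "${#digits}" "$(( 10#$digits + 1 ))"
    else
        printf '%s\n' "$1"              # no trailing number: return input unchanged
    fi
}

increment_padded 09           # 10
increment_padded 1.0.450      # 1.0.451
increment_padded version_01   # version_02
```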

Why doesn't counting files with "for file in $0/*; let i=$i+1; done" work?

I'm new to shell scripting and have the following script, which I created based on a simpler one. I want to pass it an argument with the path in which to count files. I cannot find the logical mistake that keeps it from working; the output is always "1".
#!/bin/bash
i=0
for file in $0/*
do
let i=$i+1
done
echo $i
To execute the code I use:
sh scriptname.sh /path/to/folder/to/count/files
$0 is the name with which your script was invoked (roughly, subject to several exceptions that aren't pertinent here). The first argument is $1, and so it's $1 that you want to use in your glob expression.
#!/bin/bash
i=0
for file in "$1"/*; do
i=$(( i + 1 )) ## $(( )) is POSIX-compliant arithmetic syntax; let is deprecated.
done
echo "$i"
That said, you can get this number more directly:
#!/bin/bash
shopt -s nullglob # allow globs to expand to an empty list
files=( "$1"/* ) # put list of files into an array
echo "${#files[@]}" # count the number of items in the array
...or even:
#!/bin/sh
set -- "$1"/* # override $@ with the list of files matching the glob
if [ -e "$1" ] || [ -L "$1" ]; then # if $1 exists, then it had matches
echo "$#" # ...so emit their number.
else
echo 0 # otherwise, our result is 0.
fi
If you want to count the number of files in a directory, you can run something like this:
ls /path/to/folder/to/count/files | wc -l
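One caveat with counting lines of ls output is that a file name containing a newline is counted more than once. A glob-based count does not have that problem. The sketch below wraps the array approach in a function; count_entries is a hypothetical name, and like the glob versions above it ignores hidden files.

```bash
#!/usr/bin/env bash
# count_entries is a hypothetical name. The function body is a subshell
# ( ... ), so the nullglob setting does not leak into the caller.
count_entries() (
    shopt -s nullglob            # an unmatched glob expands to nothing
    entries=( "$1"/* )
    echo "${#entries[@]}"
)

demo_dir=$(mktemp -d)
touch "$demo_dir/plain" "$demo_dir/with"$'\n'"newline"
count_entries "$demo_dir"        # two files, one of them with a newline in its name
rm -rf "$demo_dir"
```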

issue with if statement in bash

I have an issue with an if statement. The file named by WEDI_RC stores a log in the following format:
name_of_file date number_of_starts
I want to compare the first argument $1 with the first column and, if they match, increment the number of starts. When I run my script it works, but only while I keep using one file, e.g.:
file1.c 11:23:07 1
file1.c 11:23:14 2
file1.c 11:23:17 3
file1.c 11:23:22 4
file2.c 11:23:28 1
file2.c 11:23:35 2
file2.c 11:24:10 3
file2.c 11:24:40 4
file2.c 11:24:53 5
file1.c 11:25:13 1
file1.c 11:25:49 2
file2.c 11:26:01 1
file2.c 11:28:12 2
Every time I switch to another file, the count starts again from 1. I need it to continue from where it left off for that file.
I hope that makes sense.
while read -r line
do
echo "line:"
echo $line
if [ "$1"="$($line | grep ^$1)" ]; then
number=$(echo $line | grep $1 | awk -F'[ ]' '{print $3}')
else
echo "error"
fi
done < $WEDI_RC
echo "file"
((number++))
echo $1 `date +"%T"` $number >> $WEDI_RC
There are at least two ways to resolve the problem. The most succinct is probably:
echo "$1 $(date +"%T") $(($(grep -c "^$1 " "$WEDI_RC") + 1))" >> "$WEDI_RC"
However, if you want to have counts for each file separately, you can do that using an associative array, assuming you have Bash version 4.x (not 3.x as is provided on Mac OS X, for example). This code assumes the file is correctly formatted (so that the counts do not reset to 1 each time the file name changes).
declare -A files # Associative array
while read -r file time count # Split line into three variables
do
echo "line: $file $time $count" # One echo - not two
files[$file]="$count" # Record the current maximum for file
done < "$WEDI_RC"
echo "$1 $(date +"%T") $(( ${files[$1]} + 1 ))" >> "$WEDI_RC"
The code uses read to split the line into three separate variables. It echoes what it read and records the current count. When the loop's done, it echoes the data to append to the file. If the file is new (not mentioned in the file yet), then you will get a 1 added.
If you need to deal with the broken file as input, then you can amend the code to count the number of entries for a file, instead of trusting the count value. The bare-array reference notation used in the (( … )) operation is necessary when incrementing the variable; you can't use ${array[sub]}++ with the increment (or decrement) operator because that evaluates to the value of the array element, not its name!
declare -A files # Associative array
while read -r file time count # Split line into three variables
do
echo "line: $file $time $count" # One echo - not two
((files[$file]++)) # Count the occurrences of file
done < "$WEDI_RC"
echo "$1 $(date +"%T") $(( ${files[$1]} + 1 ))" >> "$WEDI_RC"
You can even detect whether the format is in the broken or fixed style:
declare -A files # Associative array
while read -r file time count # Split line into three variables
do
echo "line: $file $time $count" # One echo - not two
if [ $((++files[$file])) != "$count" ] # pre-increment, so the first occurrence compares 1 with the logged 1
then echo "$0: warning - count out of sync: ${files[$file]} vs $count" >&2
fi
done < "$WEDI_RC"
echo "$1 $(date +"%T") $(( ${files[$1]} + 1 ))" >> "$WEDI_RC"
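To see the per-file counting on its own, here is a small self-contained sketch. It assumes bash 4 or later and the "name time count" log format from the question; count_starts is a hypothetical name, and it uses an explicit default of 0 rather than the ++ operator.

```bash
#!/usr/bin/env bash
# Requires bash 4 or later (associative arrays). count_starts is a
# hypothetical name; the log format is "name time count" as in the question.
count_starts() {
    local log=$1 file time count
    local -A starts
    while read -r file time count; do
        [[ -n $file ]] || continue       # skip blank lines
        # ${starts[$file]:-0} treats a name seen for the first time as 0.
        starts[$file]=$(( ${starts[$file]:-0} + 1 ))
    done < "$log"
    for file in "${!starts[@]}"; do
        printf '%s %d\n' "$file" "${starts[$file]}"
    done | sort
}

demo_log=$(mktemp)
printf '%s\n' 'file1.c 11:23:07 1' 'file2.c 11:23:28 1' 'file1.c 11:25:13 1' > "$demo_log"
count_starts "$demo_log"
rm -f "$demo_log"
```

For the three sample lines this reports "file1.c 2" and "file2.c 1", regardless of what the third column says.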
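I don't understand exactly what you want to achieve with your test [ "$1"="$($line | grep ^$1)" ], but it seems you are checking that the line starts with the first argument. As written, the test is always true: without spaces around =, [ sees a single non-empty string, and $($line | grep ^$1) tries to run the contents of $line as a command rather than feeding it to grep.
If so, I think you can either:
provide the -o option to grep so that it prints just the matched output (so $1)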
use [[ "$line" =~ ^"$1" ]] as test.
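A short sketch of the regex test follows. line_is_for is a hypothetical helper name, and the trailing [[:space:]] is an addition not present in the suggestion above; it is there so that file1.c does not also match a line for file1.cpp.

```bash
#!/usr/bin/env bash
# line_is_for is a hypothetical helper name.
line_is_for() {
    local line=$1 name=$2
    # The quoted "$name" is matched literally, so the dot in file1.c is not a
    # wildcard; the trailing [[:space:]] requires the name to be a whole field.
    [[ $line =~ ^"$name"[[:space:]] ]]
}

line_is_for 'file1.c 11:23:07 1' file1.c && echo 'match'
line_is_for 'file1xc 11:23:07 1' file1.c || echo 'no match'
```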

How to know if file in a loop is the last one?

Example
for FILE in $DIR/*
do
if(<is last File>)
doSomethingSpecial($FILE)
else
doSomethingRegular($FILE)
fi
done
What should I use for <is last file> to check whether the current file is the last one in the array?
Is there an easy built-in check that doesn't require tracking the array's length myself?
What should I use for <is last file> to check whether the current file is the last one in the array?
For a start, you are not using an array. If you were then it would be easy:
declare -a files
files=( "$DIR"/* )
pos=$(( ${#files[@]} - 1 ))
last=${files[$pos]}
for FILE in "${files[@]}"
do
if [[ $FILE == "$last" ]]
then
echo "$FILE is the last"
break
else
echo "$FILE"
fi
done
I know of no way to tell that you are processing the last element of a list in a for loop. However you could use an array, iterate over all but the last element, and then process the last element outside the loop:
files=( "$DIR"/* )
for file in "${files[@]::${#files[@]}-1}" ; do
doSomethingRegular "$file"
done
doSomethingSpecial "${files[@]: -1:1}"
The expansion ${files[@]:offset:length} evaluates to all the elements starting at offset (or the beginning if empty) for length elements. ${#files[@]}-1 is the number of elements in the array minus 1.
${files[@]: -1:1} evaluates to the last element: offset -1 counts from the end, with length 1. The space is necessary because :- is treated differently from : -.
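The two slice expansions can be checked without touching the file system. The following sketch applies them to plain strings; split_last is a hypothetical name, and the early return for an empty list is an added assumption about how that case should be handled.

```bash
#!/usr/bin/env bash
# split_last is a hypothetical name; it labels every argument except the
# last as "regular" and the last one as "special".
split_last() {
    local items=( "$@" )
    (( ${#items[@]} )) || return 0      # nothing to do for an empty list
    local regular
    # Offset left empty (start at 0), length = element count minus 1.
    for regular in "${items[@]::${#items[@]}-1}"; do
        echo "regular: $regular"
    done
    # Offset -1 counts from the end; the space before -1 is required.
    echo "special: ${items[@]: -1:1}"
}

split_last a b c
```

With a single argument, the first slice has length 0, so the loop body never runs and only the "special:" line is printed.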
Try this
LAST_FILE=""
for f in *
do
if [ -n "$LAST_FILE" ]
then
echo "Process file normally $LAST_FILE"
fi
LAST_FILE=$f
done
if [ -n "$LAST_FILE" ]
then
echo "Process file as last file $LAST_FILE"
fi
Produces
bash[1051]: ls
1 2 3 4
bash[1052]: sh ../last_file.sh
Process file normally 1
Process file normally 2
Process file normally 3
Process file as last file 4
You can use find to find the total number of files.
Then when you are in the loop count to the total number and carry out your task when the total equals the count i.e, the last file.
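f=0
# Count the same set the loop iterates over: non-hidden *.txt files directly
# in $DIR (-maxdepth is supported by GNU and BSD find).
tot_files=$(find "$DIR" -maxdepth 1 -name '*.txt' ! -name '.*' | wc -l)
for FILE in "$DIR"/*.txt
do
f=$((f+1))
if [[ $f -eq $tot_files ]]; then
: # carry out your task for the last file here
fi
done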
Building on the current highest-voted answer from @cdarke (https://stackoverflow.com/a/12298757/415523), if looking at a general array of values (rather than specifically files on disk), the loop code would be as follows:
declare -a array
declare -i length current
array=( a b c d e c )
length=${#array[@]}
current=0
for VALUE in "${array[@]}"; do
current=$((current + 1))
if [[ "$current" -eq "$length" ]]; then
echo "$VALUE is the last"
else
echo "$VALUE"
fi
done
This yields the output:
a
b
c
d
e
c is the last
This ensures that only the last item in the array triggers the alternative action and that, if any other item in the array duplicates the last value, the alternative action is not called for the earlier duplicates.
In the case of an array of paths to files in a specific directory, e.g.
array=( $DIR/* )
...it is probably less of a concern, since individual filenames within the same directory are almost-certainly unique (unless you have a really odd filesystem!)
You can abuse the positional parameters, since they act similarly to an array,
but are a little easier to manipulate. You should either save the old positional
parameters, or execute in a subshell.
# Method 1: use a subshell. Slightly cleaner, but you can't always
# do this (for example, you may need to affect variables in the current
# shell).
files=( "$DIR"/* )
(
set -- "${files[@]}"
until (( $# == 1 )); do
doSomethingRegular "$1"
shift
done
doSomethingSpecial "$1"
)
# Method 2: save the positional parameters. A bit uglier, but
# executes everything in the same shell.
files=( "$DIR"/* )
oldPP=( "$@" )
set -- "${files[@]}"
until (( $# == 1 )); do
doSomethingRegular "$1"
shift
done
doSomethingSpecial "$1"
set -- "${oldPP[@]}"
What makes a file the last one? Is there something special about it? Is it the file with the greatest name when sorted by name?
Maybe you can process the file names in reverse order. Then it's the first file that needs special treatment, not the last, and identifying the first is much easier than identifying the last:
processedLast=
for file in $(ls -r1 "$dir") # note: parsing ls breaks on names containing whitespace
do
if [ -z "$processedLast" ]
then
doSomethingSpecial "$file"
processedLast=1
else
doSomethingRegular "$file"
fi
done
No arrays needed. Actually, I like chepner's answer about using positional parameters.
It's an old question, but building on the answer from @GregReynolds, you can use this one-liner if the commands differ only by their parameters on the last pass. Ugly, ugly code for one-liner lovers:
( ff="" ; for f in * "" ; do [ -n "$ff" ] && echo $(${f:+false} && echo $ff alternate params here || echo normal params $ff ) ; ff=$f ; done )
normal params 1
normal params 2
normal params 3
4 alternate params here

Resources