Iterating through pairs of files with glob - bash

I'm having a difficult time trying to iterate through a long set of files that I need to pair up to run through some process. I'd like to generate a bit of a batch file, pairing each set of matching files one per line. I've done this kind of thing before when it's a simple replacement (e.g. file1 = something.txt, file2 = something.csv). But in this case, the end of the file string is a random UUID, and I can't figure out how to get bash to properly expand the glob the second file.
Given a directory of files like this:
banana_pre-proc_b101a65a-31c7-5e4f-b433-bac4fb1efc1f.txt
banana_proc_a75b3a3e-7140-1cb6-2ad1-c10f7db6743f.txt
cherry_pre-proc_f5d0716f-c205-b0b4-5c63-d33755767de4.txt
cherry_proc_025ff6d5-534d-0020-5446-5da3ed04adc6.txt
kiwi_pre-proc_26075f3b-e3a2-fc1a-a741-615cacfc1a7e.txt
kiwi_proc_be1760f6-413d-edc0-1efc-a134b1b6bfbb.txt
peach_pre-proc_ecafbb30-3df0-6014-61ee-11d1d5745b53.txt
peach_proc_bb3ea3fc-671e-e024-6e61-06a2bc147363.txt
pear_pre-proc_c2db376f-f351-7141-114e-a2ebc3cfc410.txt
pear_proc_ccb2f16a-27cd-c70d-7aac-ce72c3af6575.txt
How can I get a file that looks like:
banana_pre-proc_b101a65a-31c7-5e4f-b433-bac4fb1efc1f.txt banana_proc_a75b3a3e-7140-1cb6-2ad1-c10f7db6743f.txt
cherry_pre-proc_f5d0716f-c205-b0b4-5c63-d33755767de4.txt cherry_proc_025ff6d5-534d-0020-5446-5da3ed04adc6.txt
kiwi_pre-proc_26075f3b-e3a2-fc1a-a741-615cacfc1a7e.txt kiwi_proc_be1760f6-413d-edc0-1efc-a134b1b6bfbb.txt
peach_pre-proc_ecafbb30-3df0-6014-61ee-11d1d5745b53.txt peach_proc_bb3ea3fc-671e-e024-6e61-06a2bc147363.txt
pear_pre-proc_c2db376f-f351-7141-114e-a2ebc3cfc410.txt pear_proc_ccb2f16a-27cd-c70d-7aac-ce72c3af6575.txt
I thought I could do something like
for f in *pre-proc_*txt; do echo "$f" "${f/-pre-proc_/-proc_}"; done
But that doesn't deal with the UUID at the end of the file. I've tried a few other iterations of this strategy too, but none get any closer. What is the trick to doing this? Obviously for a few files like this, I can just manually do it. But, the actual set of files I need to process is quite long and apart from just pulling them all into a text doc and then using some Vim macro or something, I'm a bit baffled as to how to get Bash to expand the glob like I'm intending.

This seems to work:
for preproc in *_pre-proc*; do
base=${preproc%_pre-proc*}
proc=${base}_proc*
echo $preproc $proc
done
We get a base name by stripping of the _pre_proc<uuid> part, and
then use the base name to find the matching _proc file.

This I think should be sufficient:
printf "%s %s\n" *[-_]proc_*.txt
Glob expansions are sorted and the pairs of files share the same prefix.

Related

BASH Shell Find Multiple Files with Wildcard and Perform Loop with Action

I have a script that I call with an application, I can't run it from command line. I derive the directory where the script is called and in the next variable go up 1 level where my files are stored. From there I have 3 variables with the full path and file names (with wildcard), which I will refer to as "masks".
I need to find and "do something with" (copy/write their names to a new file, whatever else) to each of these masks. The do something part isn't my obstacle as I've done this fine when I'm working with a single mask, but I would like to do it cleanly in a single loop instead of duplicating loop and just referencing each mask separately if possible.
Assume in my $FILESFOLDER directory below that I have 2 existing files, aaa0.csv & bbb0.csv, but no file matching the ccc*.csv mask.
#!/bin/bash
SCRIPTFOLDER=${0%/*}
FILESFOLDER="$(dirname "$SCRIPTFOLDER")"
ARCHIVEFOLDER="$FILESFOLDER"/archive
LOGFILE="$SCRIPTFOLDER"/log.txt
FILES1="$FILESFOLDER"/"aaa*.csv"
FILES2="$FILESFOLDER"/"bbb*.csv"
FILES3="$FILESFOLDER"/"ccc*.csv"
ALLFILES="$FILES1
$FILES2
$FILES3"
#here as an example I would like to do a loop through $ALLFILES and copy anything that matches to $ARCHIVEFOLDER.
for f in $ALLFILES; do
cp -v "$f" "$ARCHIVEFOLDER" > "$LOGFILE"
done
echo "$ALLFILES" >> "$LOGFILE"
The thing that really spins my head is when I run something like this (I haven't done it with the copy command in place) that log file at the end shows:
filesfolder/aaa0.csv filesfolder/bbb0.csv filesfolder/ccc*.csv
Where I would expect echoing $ALLFILES just to show me the masks
filesfolder/aaa*.csv filesfolder/bbb*.csv filesfolder/ccc*.csv
In my "do something" area, I need to be able to use whatever method to find the files by their full path/name with the wildcard if at all possible. Sometimes my network is down for maintenance and I don't want to risk failing a change directory. I rarely work in linux (primarily SQL background) so feel free to poke holes in everything I've done wrong. Thanks in advance!
Here's a light refactoring with significantly fewer distracting variables.
#!/bin/bash
script=${0%/*}
folder="$(dirname "$script")"
archive="$folder"/archive
log="$folder"/log.txt # you would certainly want this in the folder, not $script/log.txt
shopt -s nullglob
all=()
for prefix in aaa bbb ccc; do
cp -v "$folder/$prefix"*.csv "$archive" >>"$log" # append, don't overwrite
all+=("$folder/$prefix"*.csv)
done
echo "${all[#]}" >> "$log"
The change in the loop to append the output or cp -v instead of overwrite is a bug fix; otherwise the log would only contain the output from the last loop iteration.
I would probably prefer to have the files echoed from inside the loop as well, one per line, instead of collect them all on one humongous line. Then you can remove the array all and instead simply
printf '%s\n' "$folder/$prefix"*.csv >>"$log"
shopt -s nullglob is a Bash extension (so won't work with sh) which says to discard any wildcard which doesn't match any files (the default behavior is to leave globs unexpanded if they don't match anything). If you want a different solution, perhaps see Test whether a glob has any matches in Bash
You should use lower case for your private variables so I changed that, too. Notice also how the script variable doesn't actually contain a folder name (or "directory" as we adults prefer to call it); fixing that uncovered a bug in your attempt.
If your wildcards are more complex, you might want to create an array for each pattern.
tmpspaces=(/tmp/*\ *)
homequest=($HOME/*\?*)
for file in "${tmpspaces[#]}" "${homequest[#]}"; do
: stuff with "$file", with proper quoting
done
The only robust way to handle file names which could contain shell metacharacters is to use an array variable; using string variables for file names is notoriously brittle.
Perhaps see also https://mywiki.wooledge.org/BashFAQ/020

Bash scripting print list of files

Its my first time to use BASH scripting and been looking to some tutorials but cant figure out some codes. I just want to list all the files in a folder, but i cant do it.
Heres my code so far.
#!/bin/bash
# My first script
echo "Printing files..."
FILES="/Bash/sample/*"
for f in $FILES
do
echo "this is $f"
done
and here is my output..
Printing files...
this is /Bash/sample/*
What is wrong with my code?
You misunderstood what bash means by the word "in". The statement for f in $FILES simply iterates over (space-delimited) words in the string $FILES, whose value is "/Bash/sample" (one word). You seemingly want the files that are "in" the named directory, a spatial metaphor that bash's syntax doesn't assume, so you would have to explicitly tell it to list the files.
for f in `ls $FILES` # illustrates the problem - but don't actually do this (see below)
...
might do it. This converts the output of the ls command into a string, "in" which there will be one word per file.
NB: this example is to help understand what "in" means but is not a good general solution. It will run into trouble as soon as one of the files has a space in its nameā€”such files will contribute two or more words to the list, each of which taken alone may not be a valid filename. This highlights (a) that you should always take extra steps to program around the whitespace problem in bash and similar shells, and (b) that you should avoid spaces in your own file and directory names, because you'll come across plenty of otherwise useful third-party scripts and utilities that have not made the effort to comply with (a). Unfortunately, proper compliance can often lead to quite obfuscated syntax in bash.
I think problem in path "/Bash/sample/*".
U need change this location to absolute, for example:
/home/username/Bash/sample/*
Or use relative path, for example:
~/Bash/sample/*
On most systems this is fully equivalent for:
/home/username/Bash/sample/*
Where username is your current username, use whoami to see your current username.
Best place for learning Bash: http://www.tldp.org/LDP/abs/html/index.html
This should work:
echo "Printing files..."
FILES=(/Bash/sample/*) # create an array.
# Works with filenames containing spaces.
# String variable does not work for that case.
for f in "${FILES[#]}" # iterate over the array.
do
echo "this is $f"
done
& you should not parse ls output.
Take a list of your files)
If you want to take list of your files and see them:
ls ###Takes list###
ls -sh ###Takes list + File size###
...
If you want to send list of files to a file to read and check them later:
ls > FileName.Format ###Takes list and sends them to a file###
ls > FileName.Format ###Takes list with file size and sends them to a file###

replace $1 variable in file with 1-10000

I want to create 1000s of this one file.
All I need to replace in the file is one var
kitename = $1
But i want to do that 1000s of times to create 1000s of diff files.
I'm sure it involves a loop.
people answering people is more effective than google search!
thx
I'm not really sure what you are asking here, but the following will create 1000 files named filename.n containing 1 line each which is "kite name = n" for n = 1 to n = 1000
for i in {1..1000}
do
echo "kitename = $i" > filename.$i
done
If you have mysql installed, it comes with a lovely command line util called "replace" which replaces files in place across any number of files. Too few people know about this, given it exists on most linux boxen everywhere. Syntax is easy:
replace SEARCH_STRING REPLACEMENT -- targetfiles*
If you MUST use sed for this... that's okay too :) The syntax is similar:
sed -i.bak s/SEARCH_STRING/REPLACEMENT/g targetfile.txt
So if you're just using numbers, you'd use something like:
for a in {1..1000}
do
cp inputFile.html outputFile-$a.html
replace kitename $a -- outputFile-$a.html
done
This will produce a bunch of files "outputFile-1.html" through "outputFile-1000.html", with the word "kitename" replaced by the relevant number, inside the file.
But, if you want to read your lines from a file rather than generate them by magic, you might want something more like this (we're not using "for a in cat file" since that splits on words, and I'm assuming here you'd have maybe multi-word replacement strings that you'd want to put in:
cat kitenames.txt | while read -r a
do
cp inputFile.html "outputFile-$a.html"
replace kitename "$a" -- kitename-$a
done
This will produce a bunch of files like "outputFile-red kite.html" and "outputFile-kite with no string.html", which have the word "kitename" replaced by the relevant name, inside the file.

Create a new sequence of files from an existing sequence, along with numbering

I know this question has been asked, but I can't find more than one solution, and it does not work for me. Essentially, I'm looking for a bash script that will take a file list that looks like this:
image1.jpg
image2.jpg
image3.jpg
And then make a copy of each one, but number it sequentially backwards. So, the sequence would have three new files created, being:
image4.jpg
image5.jpg
image6.jpg
And yet, image4.jpg would have been an untouched copy of image3.jpg, and image5.jpg an untouched copy of image2.jpg, and so on. I have already tried the solution outlined in this stackoverflow question with no luck. I am admittedly not very far down the bash scripting path, and if I take the chunk of code in the first listed answer and make a script, I always get "2: Syntax error: "(" unexpected" over and over. I've tried changing the syntax with the ( around a bit, but no success ever. So, either I am doing something wrong or there's a better script around.
Sorry for not posting this earlier, but the code I'm using is:
image=( image*.jpg )
MAX=${#image[*]}
for i in ${image[*]}
do
num=${i:5:3} # grab the digits
compliment=$(printf '%03d' $(echo $MAX-$num | bc))
ln $i copy_of_image$compliment.jpg
done
And I'm taking this code and pasting it into a file with nano, and adding !#/bin/bash as the first line, then chmod +x script and executing in bash via sh script. Of course, in my test runs, I'm using files appropriately titled image1.jpg - but I was also wondering about a way to apply this script to a directory of jpegs, not necessarily titled image(integer).jpg - in my file keeping structure, most of these are a single word, followed by a number, then .jpg, and it would be nice to not have to rewrite the script for each use.
Perhaps something like this. It will work well for something like script image*.jpg where the wildcard matches a set of files which match a regular pattern with monotonously increasing numbers of the same length, and less ideally with a less regular subset of the files in the current directory. It simply assumes that the last file's digit index plus one through the total number of file names is the range of digits to loop over.
#!/bin/sh
# Extract number from final file name
eval lastidx=\$$#
tmp=${lastidx#*[!0-9][0-9]}
lastidx=${lastidx#${lastidx%[0-9]$tmp}}
tmp=${lastidx%[0-9][!0-9]*}
lastidx=${lastidx%${lastidx#$tmp[0-9]}}
num=$(expr $lastidx + $#)
width=${#lastidx}
for f; do
pref=${f%%[0-9]*}
suff=${f##*[0-9]}
# Maybe show a warning if pref, suff, or width changed since the previous file
printf "cp '$f' '$pref%0${width}i$suff'\\n" $num
num=$(expr $num - 1)
done |
sh
This is sh-compatible; the expr stuff and the substring extraction up front is ugly but Bourne-compatible. If you are fine with the built-in arithmetic and string manipulation constructs of Bash, converting to that form should be trivial.
(To be explicit, ${var%foo} returns the value of $var with foo trimmed off the end, and ${var#foo} does similar trimming from the beginning of the value. Regular shell wildcard matching operators are available in the expression for what to trim. ${#var} returns the length of the value of $var.)
Maybe your real test data runs from 001 to 300, but here you have image1 2 3, and therefore you extract one, not three digits from the filename. num=${i:5:1}
Integer arithmetic can be done in the bash without calling bc
${#image[#]} is more robust than ${#image[*]}, but shouldn't be a difference here.
I didn't consult a dictionary, but isn't compliment something for your girl friend? The opposite is complement, isn't it? :)
the other command made links - to make copies, call cp.
Code:
#!/bin/bash
image=( image*.jpg )
MAX=${#image[#]}
for i in ${image[#]}
do
num=${i:5:1}
complement=$((2*$MAX-$num+1))
cp $i image$complement.jpg
done
Most important: If it is bash, call it with bash. Best: do a shebang (as you did), make it executable and call it by ./name . Calling it with sh name will force the wrong interpreter. If you don't make it executable, call it bash name.

How to rename files keeping a variable part of the original file name

I'm trying to make a script that will go into a directory and run my own application with each file matching a regular expression, specifically Test[0-9]*.txt.
My input filenames look like this TestXX.txt. Now, I could just use cut and chop off the Test and .txt, but how would I do this if XX wasn't predefined to be two digits? What would I do if I had Test1.txt, ..., Test10.txt? In other words, How would I get the [0-9]* part?
Just so you know, I want to be able to make a OutputXX.txt :)
EDIT:
I have files with filename Test[0-9]*.txt and I want to manipulate the string into Output[0-9]*.txt
Would something like this help?
#!/bin/bash
for f in Test*.txt ;
do
process < $f > ${f/Test/Output}
done
Bash Shell Parameter Expansion
A good tutorial on regexes in bash is here. Summarizing, you need something like:
if [[$filenamein =~ "^Test([0-9]*).txt$"]]; then
filenameout = "Output${BASH_REMATCH[1]}.txt"
and so on. The key is that, when you perform the =~" regex-match, the "sub-matches" to parentheses-enclosed groups in the RE are set in the entries of arrayBASH_REMATCH(the[0]entry is the whole match,1` the first parentheses-enclosed group, etc).
You need to use rounded brackets around the part you want to keep.
i.e. "Test([0-9]*).txt"
The syntax for replacing these bracketed groups varies between programs, but you'll probably find you can use \1 , something like this:
s/Test(0-9*).txt/Output\1.txt/
If you're using a unix shell, then 'sed' might be your best bet for performing the transformation.
http://www.grymoire.com/Unix/Sed.html#uh-4
Hope that helps
for file in Test[0-9]*.txt;
do
num=${file//[^0-9]/}
process $file > "Output${num}.txt"
done

Resources