BASH Expression to replace beginning and ending of a string in one operation? - bash

Here's a simple problem that's been bugging me for some time. I often find I have a number of input files in some directory, and I want to construct output file names by replacing beginning and ending portions. For example, given this:
source/foo.c
source/bar.c
source/foo_bar.c
I often end up writing BASH expressions like:
for f in source/*.c; do
    a="obj/${f##*/}"
    b="${a%.*}.obj"
    process "$f" "$b"
done
to generate the commands
process "source/foo.c" "obj/foo.obj"
process "source/bar.c "obj/bar.obj"
process "source/foo_bar.c "obj/foo_bar.obj"
The above works, but it's a lot wordier than I like, and I would prefer to avoid the temporary variables. Ideally there would be some command that could replace the beginning and end of a string in one shot, so that I could just write something like:
for f in source/*.c; do process "$f" "obj/${f##*/%.*}.obj"; done
Of course, the above doesn't work. Does anyone know something that will? I'm just trying to save myself some typing here.

Not the prettiest thing in the world, but you can use a regular expression to group the content you want to pick out, and then refer to the BASH_REMATCH array:
if [[ $f =~ ^source/(.*)\.c$ ]]; then f="obj/${BASH_REMATCH[1]}.obj"; fi
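Dropped into the original loop, that looks like this (a sketch, still using the question's hypothetical process command):
for f in source/*.c; do
    if [[ $f =~ ^source/(.*)\.c$ ]]; then
        process "$f" "obj/${BASH_REMATCH[1]}.obj"
    fi
done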

You shouldn't have to worry about your code being "wordier" or not. In fact, being a bit verbose does no harm; consider how much it improves your (or someone else's) understanding of the script. Besides, for performance, using bash's internal string manipulation is much faster than calling external commands. Lastly, you are not going to retype these commands every time you use them, right? So why worry that they're "wordier" when they are already in your script?

Not directly in bash. You can use sed, of course:
b="$(sed 's|^source/(.*).c$|obj/$1.obj|' <<< "$f")"

Why not simply use cd to remove the "source/" part? Since the cd happens inside a command substitution, it runs in a subshell and doesn't affect the rest of the script.
This way we can avoid the temporary variables a and b:
for f in $(cd source; printf "%s\n" *.c); do
    echo process "source/${f}" "obj/${f%.*}.obj"
done
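Another way to avoid the temporaries is to hide the two expansions behind a small helper function (a sketch; the obj_name function is mine, not anything built in):
obj_name() {
    local base=${1##*/}               # strip the leading directory
    printf 'obj/%s.obj' "${base%.*}"  # strip the extension, append .obj
}

for f in source/*.c; do
    process "$f" "$(obj_name "$f")"
done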

Related

BASH Shell Find Multiple Files with Wildcard and Perform Loop with Action

I have a script that I call with an application, I can't run it from command line. I derive the directory where the script is called and in the next variable go up 1 level where my files are stored. From there I have 3 variables with the full path and file names (with wildcard), which I will refer to as "masks".
I need to find and "do something with" (copy/write their names to a new file, whatever else) to each of these masks. The do something part isn't my obstacle as I've done this fine when I'm working with a single mask, but I would like to do it cleanly in a single loop instead of duplicating loop and just referencing each mask separately if possible.
Assume in my $FILESFOLDER directory below that I have 2 existing files, aaa0.csv & bbb0.csv, but no file matching the ccc*.csv mask.
#!/bin/bash
SCRIPTFOLDER=${0%/*}
FILESFOLDER="$(dirname "$SCRIPTFOLDER")"
ARCHIVEFOLDER="$FILESFOLDER"/archive
LOGFILE="$SCRIPTFOLDER"/log.txt
FILES1="$FILESFOLDER"/"aaa*.csv"
FILES2="$FILESFOLDER"/"bbb*.csv"
FILES3="$FILESFOLDER"/"ccc*.csv"
ALLFILES="$FILES1
$FILES2
$FILES3"
#here as an example I would like to do a loop through $ALLFILES and copy anything that matches to $ARCHIVEFOLDER.
for f in $ALLFILES; do
cp -v "$f" "$ARCHIVEFOLDER" > "$LOGFILE"
done
echo "$ALLFILES" >> "$LOGFILE"
The thing that really spins my head is when I run something like this (I haven't done it with the copy command in place) that log file at the end shows:
filesfolder/aaa0.csv filesfolder/bbb0.csv filesfolder/ccc*.csv
Where I would expect echoing $ALLFILES just to show me the masks
filesfolder/aaa*.csv filesfolder/bbb*.csv filesfolder/ccc*.csv
In my "do something" area, I need to be able to use whatever method to find the files by their full path/name with the wildcard if at all possible. Sometimes my network is down for maintenance and I don't want to risk failing a change directory. I rarely work in linux (primarily SQL background) so feel free to poke holes in everything I've done wrong. Thanks in advance!
Here's a light refactoring with significantly fewer distracting variables.
#!/bin/bash
script=$0
folder="$(dirname "$script")"
archive="$folder"/archive
log="$folder"/log.txt # you would certainly want this in the folder, not $script/log.txt
shopt -s nullglob
all=()
for prefix in aaa bbb ccc; do
    cp -v "$folder/$prefix"*.csv "$archive" >>"$log" # append, don't overwrite
    all+=("$folder/$prefix"*.csv)
done
echo "${all[@]}" >> "$log"
The change in the loop to append the output of cp -v instead of overwriting is a bug fix; otherwise the log would only contain the output from the last loop iteration.
I would probably prefer to have the files echoed from inside the loop as well, one per line, instead of collecting them all on one humongous line. Then you can remove the array all and instead simply
printf '%s\n' "$folder/$prefix"*.csv >>"$log"
shopt -s nullglob is a Bash extension (so won't work with sh) which says to discard any wildcard which doesn't match any files (the default behavior is to leave globs unexpanded if they don't match anything). If you want a different solution, perhaps see Test whether a glob has any matches in Bash
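A quick way to see the difference (the pattern here is deliberately one that matches nothing; with nullglob unset the unmatched glob is passed through literally, with it set the glob vanishes and echo prints an empty line):
$ shopt -u nullglob
$ echo /tmp/no-such-prefix-*.csv
/tmp/no-such-prefix-*.csv
$ shopt -s nullglob
$ echo /tmp/no-such-prefix-*.csv

$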
You should use lower case for your private variables so I changed that, too. Notice also how the script variable doesn't actually contain a folder name (or "directory" as we adults prefer to call it); fixing that uncovered a bug in your attempt.
If your wildcards are more complex, you might want to create an array for each pattern.
tmpspaces=(/tmp/*\ *)
homequest=($HOME/*\?*)
for file in "${tmpspaces[#]}" "${homequest[#]}"; do
: stuff with "$file", with proper quoting
done
The only robust way to handle file names which could contain shell metacharacters is to use an array variable; using string variables for file names is notoriously brittle.
Perhaps see also https://mywiki.wooledge.org/BashFAQ/020

(bash) What is the least redundant way to systematically apply changes to an array of variables?

My goal is to check a list of file paths if they end in "/" and remove it if that is the case.
Ideally I would like to change the original FILEPATH variables to reflect this change, and I'd like this to work for a long list without unnecessary redundancy. I tried doing it as a loop, but the changes didn't alter the original variables, it just changed the iterating "EACH_PATH" variable. Can anyone think of a better way to do this?
Here is my code:
FILEPATH1="filepath1/file1"
FILEPATH2="filepath2/file2/"
PATH_ARRAY=(${FILEPATH1} ${FILEPATH2})
echo ${PATH_ARRAY[@]}
for EACH_PATH in ${PATH_ARRAY[@]}
do
if [ "${EACH_PATH:$((${#EACH_PATH}-1)):${#EACH_PATH}}"=="/" ]
then EACH_PATH=${EACH_PATH:0:$((${#EACH_PATH}-1))}
fi
done
edit: I'm happy to do this in a totally different way and scrap the code above, I just want to know the most elegant way to do this.
I'm not entirely clear on the actual goal here, but depending on the situation I can see several possible solutions. The best (if it'll work in the situation) is to dispense with the individual variables, and just use array entries. For example, you could use:
declare -a filepath
filepath[1]="filepath1/file1"
filepath[2]="filepath2/file2/"
for index in "${!filepath[#]}"; do
if [[ "${filepath[index]}" = *?/ ]]; then
filepath[index]="${filepath[index]%/}"
fi
done
...and then use "${filepath[x]}" instead of "$FILEPATHx" throughout. Some notes:
I've used lowercase names. It's generally best to avoid all-caps names, since there are a lot of them with special functions, and accidentally using one of those names can cause trouble.
"${!filepath[#]}" gets a list of the indexes of the array (in this case, "1" "2") rather than their values; this is necessary so we can set the entries rather than just look at them.
I changed the logic of the slash-trimming test -- it uses [[ = ]] to do pattern matching, to see if the entry ends with "/" and has at least one character before that (i.e. it isn't just "/", 'cause you don't want to trim that). Then it uses in the expansion %/ to just trim "/" from the end of the value.
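A small demo of that test-and-trim (values made up for illustration):
p="filepath2/file2/"
[[ $p = *?/ ]] && echo "${p%/}"    # prints filepath2/file2

p="/"
[[ $p = *?/ ]] || echo "bare / left alone"   # *?/ requires a character before the slash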
If a numerically-indexed array won't work (and you have at least bash version 4), how about a string-indexed ("associative") array? It's very similar, but use declare -A and use $ on variables in the index (and generally quote them). Something like this:
declare -A filepath
filepath["foo"]="filepath1/file1"
filepath["bar"]="filepath2/file2/"
for index in "${!filepath[#]}"; do
if [[ "${filepath["$index"]}" = *?/ ]]; then
filepath["$index"]="${filepath["$index"]%/}"
fi
done
If you really need separate variables instead of array entries, you might be able to use an array of variable names and indirect variable references. How this works varies quite a bit between different shells, and it can easily be unsafe depending on what's in your data (in this case, specifically what's in path_array). Here's a way to do it in bash:
filepath1="filepath1/file1"
filepath2="filepath2/file2/"
path_array=(filepath1 filepath2)
for varname in "${path_array[#]}"; do
if [[ "${!varname}" = *?/ ]]; then
declare "$varname=${!varname%/}"
fi
done
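After that loop runs, the original variables themselves hold the trimmed values, e.g.:
echo "$filepath1"   # filepath1/file1 (was already slash-free)
echo "$filepath2"   # filepath2/file2 (trailing slash removed)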
Using sed
PATH_ARRAY=($(echo ${PATH_ARRAY[@]} | sed 's#/ # #g;s#/$##g'))
Demo:
$FILEPATH1="filepath1/file1"
$FILEPATH2="filepath2/file2/"
$PATH_ARRAY=(${FILEPATH1} ${FILEPATH2})
$echo ${PATH_ARRAY[#]}
filepath1/file1 filepath2/file2/
$PATH_ARRAY=($(echo ${PATH_ARRAY[#]} | sed 's#\/ ##g;s#/$##g'))
$echo ${PATH_ARRAY[#]}
filepath1/file1 filepath2/file2
$

Simple map for pipeline in shell script

I'm dealing with a pipeline of predominantly shell and Perl files, all of which pass parameters (paths) to the next. I decided it would be better to use a single file to store all the paths and just call that for every file. The issue is I am using awk to grab the files at the beginning of each file, and it's turning out to be a lot of repetition.
My question is: I do not know if there is a way to store key-value pairs in a file so shell can natively do something with the key and return the value? It needs to access an external file, because the pipeline uses many scripts and a map in a specific file would result in parameters being passed everywhere. Is there some little quirk I do not know of that performs a map function on an external file?
You can make a file of env var assignments and source that file as needed, i.e.
$ cat myEnvFile
path1=/x/y/z
path2=/w/xy
path3=/r/s/t
otherOpt1="-x"
Inside your script you can source it with either . myEnvFile or the more verbose version of the same feature, source myEnvFile (assuming bash shell), i.e.
$ cat myScript
#!/bin/bash
. /path/to/myEnvFile
# main logic below
....
# references to defined var
if [[ -d $path2 ]] ; then
    cd "$path2"
else
    echo "no path2=$path2 found, can't continue" 1>&2
    exit 1
fi
Based on how you've described your problem this should work well, and provide a one-stop shop for all of your variable settings.
IHTH
In bash, there's mapfile, but that reads the lines of a file into a numerically-indexed array. To read a whitespace-separated file into an associative array, I would
declare -A map
while read -r key value; do
    map[$key]=$value
done < filename
However this sounds like an XY problem. Can you give us an example (in code) of what you're actually doing? When I see long pipelines of grep|awk|sed, there's usually a way to simplify. For example, is passing data by parameters better than passing via stdout|stdin?
In other words, I'm questioning your statement "I decided it would be better..."

For loop in shell script - colons and hash marks?

I am trying to make heads or tails of a shell script. Could someone please explain this line?
$FILEDIR is a directory containing files. F is a marker in an array of files that is returned from this command:
files=$( find $FILEDIR -type f | grep -v .rpmsave\$ | grep -v .swp\$ )
The confusing line is within a for loop.
for f in $files; do
    target=${f:${#FILEDIR}}
    <<do some more stuff>>
done
I've never seen the colon and the hash before in a shell script for loop. I haven't been able to find any documentation on them... could someone try and enlighten me? I'd appreciate it.
There are no arrays involved here. POSIX sh doesn't have arrays (assuming you're not using another shell based upon the tags).
The colon indicates a Bash/Ksh substring expansion. These are also not POSIX. The # prefix expands to the number of characters in the parameter. I imagine they intended to chop off the directory part and assign it to target.
To explain the rest of that: first find is run and hilariously piped into two greps which do what could have been done with find alone (except breaking on possible filenames containing newlines), and the output saved into files. This is also something that can't really be done correctly if restricted only to POSIX tools, but there are better ways.
Next, files is expanded unquoted and mutalated by the shell in more ridiculous ways for the for loop to iterate over the meaningless results. If the rest of the script is this bad, probably throw it out and start over. There's no way that will do what's expected.
The colon can be as a substring. So:
A=abcdefg
echo ${A:4}
will print the output:
efg
I'm not sure why they would use a directory name's length as the offset parameter though...
If you are having problems understanding the for loop section, try http://www.dreamsyssoft.com/unix-shell-scripting/loop-tutorial.php

How to rename files keeping a variable part of the original file name

I'm trying to make a script that will go into a directory and run my own application with each file matching a regular expression, specifically Test[0-9]*.txt.
My input filenames look like this: TestXX.txt. Now, I could just use cut and chop off the Test and .txt, but how would I do this if XX wasn't predefined to be two digits? What would I do if I had Test1.txt, ..., Test10.txt? In other words, how would I get the [0-9]* part?
Just so you know, I want to be able to make a OutputXX.txt :)
EDIT:
I have files with filename Test[0-9]*.txt and I want to manipulate the string into Output[0-9]*.txt
Would something like this help?
#!/bin/bash
for f in Test*.txt; do
    process < "$f" > "${f/Test/Output}"
done
Bash Shell Parameter Expansion
A good tutorial on regexes in bash is here. Summarizing, you need something like:
if [[ $filenamein =~ ^Test([0-9]*)\.txt$ ]]; then
    filenameout="Output${BASH_REMATCH[1]}.txt"
and so on. The key is that, when you perform the =~ regex-match, the "sub-matches" to parentheses-enclosed groups in the RE are set in the entries of the array BASH_REMATCH (the [0] entry is the whole match, [1] the first parentheses-enclosed group, etc).
You need to use rounded brackets around the part you want to keep.
i.e. "Test([0-9]*).txt"
The syntax for replacing these bracketed groups varies between programs, but you'll probably find you can use \1, something like this (in sed's basic-regex syntax, where the brackets themselves are backslashed):
s/Test\([0-9]*\)\.txt/Output\1.txt/
If you're using a unix shell, then 'sed' might be your best bet for performing the transformation.
http://www.grymoire.com/Unix/Sed.html#uh-4
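A sketch of a loop built on that, reusing the hypothetical process command from the answer above:
for f in Test[0-9]*.txt; do
    out=$(printf '%s\n' "$f" | sed 's/^Test\([0-9]*\)\.txt$/Output\1.txt/')
    process "$f" > "$out"
done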
Hope that helps
for file in Test[0-9]*.txt; do
    num=${file//[^0-9]/}    # delete every non-digit character, leaving just the number
    process "$file" > "Output${num}.txt"
done
