Simple map for pipeline in shell script - shell

I'm dealing with a pipeline of predominantly shell and Perl files, all of which pass parameters (paths) to the next. I decided it would be better to use a single file to store all the paths and just call that for every file. The issue is I am using awk to grab the files at the beginning of each file, and it's turning out to be a lot of repetition.
My question is: I do not know if there is a way to store key-value pairs in a file so shell can natively do something with the key and return the value? It needs to access an external file, because the pipeline uses many scripts and a map in a specific file would result in parameters being passed everywhere. Is there some little quirk I do not know of that performs a map function on an external file?

You can make a file of env var assignments and source that file as needed, i.e.
$ cat myEnvFile
path1=/x/y/z
path2=/w/xy
path3=/r/s/t
otherOpt1="-x"
Inside your script you can source it with either . myEnvFile or the more verbose version of the same feature, source myEnvFile (assuming bash shell), i.e.
$ cat myScript
#!/bin/bash
. /path/to/myEnvFile
# main logic below
....
# references to defined var
if [[ -d $path2 ]] ; then
cd "$path2"
else
echo "no path2=$path2 found, can't continue" 1>&2
exit 1
fi
Based on how you've described your problem, this should work well and provide a one-stop shop for all of your variable settings.
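Because the pipeline also includes Perl scripts, it may matter that variables set by sourcing a file are plain shell variables and are not passed to child processes unless they are exported. A minimal sketch, assuming a throwaway env file created with mktemp (its contents reuse the path1/path2 names from above); set -a marks every variable assigned while it is active for export:

```bash
#!/bin/bash
# Hypothetical env file, created here only so the sketch is self-contained.
env_file=$(mktemp)
printf '%s\n' 'path1=/x/y/z' 'path2=/w/xy' > "$env_file"

set -a            # export every variable assigned from here on
. "$env_file"
set +a            # stop auto-exporting

# A child process now sees the value (a Perl script could read it from %ENV).
child_view=$(sh -c 'printf "%s" "$path1"')
echo "child sees path1=$child_view"

rm -f "$env_file"
```

If every script in the pipeline sources the file itself, the set -a / set +a pair is not needed.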
IHTH

In bash, there's mapfile, but that reads the lines of a file into a numerically-indexed array. To read a whitespace-separated file into an associative array, I would
declare -A map
while read -r key value; do
map[$key]=$value
done < filename
However, this sounds like an XY problem. Can you give us an example (in code) of what you're actually doing? When I see long pipelines of grep|awk|sed, there's usually a way to simplify. For example, is passing data by parameters better than passing via stdout|stdin?
In other words, I'm questioning your statement "I decided it would be better..."
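For reference, the associative-array loop above can be tried end to end as follows. This is only a sketch: the map file and its input/output keys are made up for illustration, and declare -A needs bash 4 or later.

```bash
#!/usr/bin/env bash
# Hypothetical map file: one "key value" pair per line.
map_file=$(mktemp)
printf '%s\n' 'input /data/in' 'output /data/out' > "$map_file"

declare -A map
while read -r key value; do
  [[ -n $key ]] || continue      # skip blank lines
  map[$key]=$value
done < "$map_file"

echo "output path: ${map[output]}"
rm -f "$map_file"
```

Each script in the pipeline would still have to run this loop itself, since arrays cannot be exported to child processes.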

Related

BASH Shell Find Multiple Files with Wildcard and Perform Loop with Action

I have a script that I call with an application, I can't run it from command line. I derive the directory where the script is called and in the next variable go up 1 level where my files are stored. From there I have 3 variables with the full path and file names (with wildcard), which I will refer to as "masks".
I need to find and "do something with" (copy/write their names to a new file, whatever else) to each of these masks. The do something part isn't my obstacle as I've done this fine when I'm working with a single mask, but I would like to do it cleanly in a single loop instead of duplicating loop and just referencing each mask separately if possible.
Assume in my $FILESFOLDER directory below that I have 2 existing files, aaa0.csv & bbb0.csv, but no file matching the ccc*.csv mask.
#!/bin/bash
SCRIPTFOLDER=${0%/*}
FILESFOLDER="$(dirname "$SCRIPTFOLDER")"
ARCHIVEFOLDER="$FILESFOLDER"/archive
LOGFILE="$SCRIPTFOLDER"/log.txt
FILES1="$FILESFOLDER"/"aaa*.csv"
FILES2="$FILESFOLDER"/"bbb*.csv"
FILES3="$FILESFOLDER"/"ccc*.csv"
ALLFILES="$FILES1
$FILES2
$FILES3"
#here as an example I would like to do a loop through $ALLFILES and copy anything that matches to $ARCHIVEFOLDER.
for f in $ALLFILES; do
cp -v "$f" "$ARCHIVEFOLDER" > "$LOGFILE"
done
echo "$ALLFILES" >> "$LOGFILE"
The thing that really spins my head is when I run something like this (I haven't done it with the copy command in place) that log file at the end shows:
filesfolder/aaa0.csv filesfolder/bbb0.csv filesfolder/ccc*.csv
Where I would expect echoing $ALLFILES just to show me the masks
filesfolder/aaa*.csv filesfolder/bbb*.csv filesfolder/ccc*.csv
In my "do something" area, I need to be able to use whatever method to find the files by their full path/name with the wildcard if at all possible. Sometimes my network is down for maintenance and I don't want to risk failing a change directory. I rarely work in linux (primarily SQL background) so feel free to poke holes in everything I've done wrong. Thanks in advance!
Here's a light refactoring with significantly fewer distracting variables.
#!/bin/bash
script=${0%/*}
folder="$(dirname "$script")"
archive="$folder"/archive
log="$folder"/log.txt # you would certainly want this in the folder, not $script/log.txt
shopt -s nullglob
all=()
for prefix in aaa bbb ccc; do
cp -v "$folder/$prefix"*.csv "$archive" >>"$log" # append, don't overwrite
all+=("$folder/$prefix"*.csv)
done
echo "${all[@]}" >> "$log"
The change in the loop to append the output of cp -v instead of overwriting it is a bug fix; otherwise the log would only contain the output from the last loop iteration.
I would probably prefer to have the files echoed from inside the loop as well, one per line, instead of collecting them all on one humongous line. Then you can remove the array all and instead simply
printf '%s\n' "$folder/$prefix"*.csv >>"$log"
shopt -s nullglob is a Bash extension (so won't work with sh) which says to discard any wildcard which doesn't match any files (the default behavior is to leave globs unexpanded if they don't match anything). If you want a different solution, perhaps see Test whether a glob has any matches in Bash
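The difference nullglob makes can be seen in a small experiment. This is a sketch using a temporary directory (an assumption, so nothing real is touched) that contains an aaa file but no ccc file:

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
touch "$demo_dir/aaa0.csv"

shopt -u nullglob
without=("$demo_dir"/ccc*.csv)   # no match: the pattern itself stays as one element
shopt -s nullglob
with=("$demo_dir"/ccc*.csv)      # no match: expands to nothing
matched=("$demo_dir"/aaa*.csv)   # a match expands as usual

echo "without nullglob: ${#without[@]} element(s)"
echo "with nullglob: ${#with[@]} element(s)"
echo "matching pattern: ${#matched[@]} element(s)"
rm -rf "$demo_dir"
```

One consequence worth keeping in mind: with nullglob set, an unmatched pattern leaves a command such as cp with only its destination argument, so checking the array length before calling it may be worthwhile.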
You should use lower case for your private variables so I changed that, too. Notice also how the script variable doesn't actually contain a folder name (or "directory" as we adults prefer to call it); fixing that uncovered a bug in your attempt.
If your wildcards are more complex, you might want to create an array for each pattern.
tmpspaces=(/tmp/*\ *)
homequest=("$HOME"/*\?*)
for file in "${tmpspaces[@]}" "${homequest[@]}"; do
: stuff with "$file", with proper quoting
done
The only robust way to handle file names which could contain shell metacharacters is to use an array variable; using string variables for file names is notoriously brittle.
Perhaps see also https://mywiki.wooledge.org/BashFAQ/020
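To illustrate why string variables are brittle here, the following sketch compares looping over an array with looping over the same names flattened into a string. The temporary directory and file names are assumptions made up for the demonstration.

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
touch "$demo_dir/with space.csv" "$demo_dir/plain.csv"

files=("$demo_dir"/*.csv)        # one array element per file
as_string="${files[*]}"          # the same names flattened into one string

count_array=0
for f in "${files[@]}"; do count_array=$((count_array + 1)); done

count_string=0
for f in $as_string; do count_string=$((count_string + 1)); done   # unquoted: splits on whitespace

echo "array: $count_array items, string: $count_string items"
rm -rf "$demo_dir"
```

The array loop sees each file exactly once, while the string loop breaks the name containing a space into separate words.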

Arithmetic in shell script (arithmetic in string)

I'm trying to write a simple script that creates five text files enumerated by a variable in a loop. Can anybody tell me how to make the arithmetic expression be evaluated? This doesn't seem to work:
touch ~/test$(($i+1)).txt
(I am aware that I could evaluate the expression in a separate statement or change of the loop...)
Thanks in advance!
The correct answer would depend on the shell you're using. It looks a little like bash, but I don't want to make too many assumptions.
The command you list, touch ~/test$(($i+1)).txt, will correctly touch the file named with whatever $i+1 evaluates to; what it does not do is change the value of $i.
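A minimal sketch of that behavior, assuming bash and a temporary directory in place of the home directory: the arithmetic expansion is evaluated inside the file name on each iteration, and i only changes because the loop assigns it.

```bash
#!/bin/bash
# Hypothetical target directory so the sketch does not touch the home directory.
out_dir=$(mktemp -d)

for i in 0 1 2 3 4; do
  touch "$out_dir/test$((i + 1)).txt"   # the expansion computes i+1; i itself is unchanged
done

created=("$out_dir"/test*.txt)
echo "created ${#created[@]} files; i is still $i"
rm -rf "$out_dir"
```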
What it seems to me like you want to do is:
Find the largest value of n amongst the files named testn.txt where n is a number larger than 0
Increment the number as m.
touch (or otherwise output) to a new file named testm.txt where m is the incremented number.
Using techniques listed here you could strip the parts of the filename to build the value you wanted.
Assume the following was in a file named "touchup.sh":
#!/bin/bash
# first param is the basename of the file (e.g. "~/test")
# second param is the extension of the file (e.g. ".txt")
# assume the files are named so that we can locate via $1*$2 (test*.txt)
largest=0
for candidate in "$1"*"$2"; do
[[ -e $candidate ]] || continue # skip the literal pattern when nothing matches
intermed=${candidate#"$1"}
final=${intermed%"$2"}
# don't want to assume that the files are in any specific order by ls
if [[ $final -gt $largest ]]; then
largest=$final
fi
done
# Now, increment and output.
largest=$((largest + 1))
touch "$1$largest$2"

Using a variable for associative array key in Bash

I'm trying to create associative arrays based on variables. So below is a super simplified version of what I'm trying to do (the ls command is not really what I want, just used here for illustrative purposes)...
I have a statically defined array (text-a,text-b). I then want to iterate through that array, and create associative arrays with those names and _AA appended to them (so associative arrays called text-a_AA and text-b_AA).
I don't really need the _AA appended, but was thinking it might be
necessary to avoid duplicate names since $NAME is already being used
in the loop.
I will need those defined and will be referencing them in later parts of the script, and not just within the for loop seen below where I'm trying to define them... I want to later, for example, be able to reference text-a_AA[NUM] (again, using variables for the text-a_AA part). Clearly what I have below doesn't work... and from what I can tell, I need to be using namerefs? I've tried to get the syntax right, and just can't seem to figure it out... any help would be greatly appreciated!
#!/usr/bin/env bash
NAMES=('text-a' 'text-b')
for NAME in "${NAMES[@]}"
do
NAME_AA="${NAME}_AA"
$NAME_AA[NUM]=$(cat $NAME | wc -l)
done
for NAME in "${NAMES[@]}"
do
echo "max: ${$NAME_AA[NUM]}"
done
You may want to use "NUM" as the name of a single associative array (declared once with declare -A NUM before the loop) and the file name as the key. Then you can rewrite your code as:
NUM[${NAME}_AA]=$(wc -l < "$NAME")
Then rephrase your loop as:
for NAME in "${NAMES[@]}"
do
echo "max: ${NUM[${NAME}_AA]}"
done
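Put together, the suggestion might look like the sketch below. The work_dir directory and the file contents are assumptions added so the example runs on its own; the declare -A line matters, because without it bash treats NUM as an indexed array and evaluates the key arithmetically.

```bash
#!/usr/bin/env bash
# Hypothetical input files, created so the sketch is self-contained.
work_dir=$(mktemp -d)
printf 'one\ntwo\n' > "$work_dir/text-a"
printf 'one\ntwo\nthree\n' > "$work_dir/text-b"

NAMES=('text-a' 'text-b')
declare -A NUM                      # one associative array keyed by name
for NAME in "${NAMES[@]}"; do
  NUM[${NAME}_AA]=$(wc -l < "$work_dir/$NAME")
done

for NAME in "${NAMES[@]}"; do
  echo "max: ${NUM[${NAME}_AA]}"
done
rm -rf "$work_dir"
```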
Check your script at shellcheck.net
As an aside: all uppercase is not a good practice for naming normal shell variables. You may want to take a look at:
Correct Bash and shell script variable capitalization

Bash script execute shell command with Bash variable as argument

I have one loop that creates a group of variables like DISK1, DISK2... where the number at the end of the variable name gets created by the loop and then loaded with a path to a device name. Now I want to use those variables in another loop to execute a shell command, but the variable doesn't give its contents to the shell command.
for (( counter=1 ; counter<=devcount ; counter++))
do
TEMP="\$DISK$counter"
# $TEMP should hold the variable name of the disk, which holds the device name
# TEMP was only for testing, but still has same problem as $DISK$counter
eval echo $TEMP #This echos correctly
STATD$counter=$(eval "smartctl -H -l error \$DISK$counter" | grep -v "5.41" | grep -v "Joe")
eval echo \$STATD$counter
done
Don't use eval ever, except maybe if there is no other way AND you really know what you are doing.
The STATD$counter=$(...) line should give an error. That's not a valid assignment, because bash only recognizes an assignment when the text before the = is a literal, valid variable name, and "STATD$counter" is not. What will happen instead (using a concrete example: counter happens to be 3 and your pipeline in the $( ) prints "output") is that bash expands the line to "STATD3=output" and then tries to find a command named "STATD3=output" and run it. Odds are this is not what you intended.
It sounds like everything you want to do can be accomplished with arrays instead. If you are not familiar with bash arrays take a look at Greg's Wiki, in particular this page or the bash man page to find out how to use them.
For example, in the loop you didn't post in your question: make disk (not DISK: don't use all upper case variable names) an array like so
disk+=( "new value" )
or even
disk[counter]="new value"
Then in the loop in your question, you can make statd an array as well and assign it with values from disk by
statd[counter]="... ${disk[counter]} ..."
It's worth saying again: avoid using eval.
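A minimal sketch of the array approach, under stated assumptions: check_disk is a hypothetical stand-in for the smartctl pipeline from the question, and the device names are invented so the example runs anywhere.

```bash
#!/bin/bash
# check_disk is a hypothetical stand-in for the real smartctl pipeline.
check_disk() { printf 'status of %s' "$1"; }

disk=()
disk+=( "/dev/sda" )   # example device names, assumptions for illustration
disk+=( "/dev/sdb" )

statd=()
for (( counter = 0; counter < ${#disk[@]}; counter++ )); do
  statd[counter]=$(check_disk "${disk[counter]}")
  echo "${statd[counter]}"
done
```

Note that arrays filled with += start at index 0, so the loop runs from 0 rather than from 1 as in the question.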

BASH Expression to replace beginning and ending of a string in one operation?

Here's a simple problem that's been bugging me for some time. I often find I have a number of input files in some directory, and I want to construct output file names by replacing beginning and ending portions. For example, given this:
source/foo.c
source/bar.c
source/foo_bar.c
I often end up writing BASH expressions like:
for f in source/*.c; do
a="obj/${f##*/}"
b="${a%.*}.obj"
process "$f" "$b"
done
to generate the commands
process "source/foo.c" "obj/foo.obj"
process "source/bar.c" "obj/bar.obj"
process "source/foo_bar.c" "obj/foo_bar.obj"
The above works, but it's a lot wordier than I like, and I would prefer to avoid the temporary variables. Ideally there would be some command that could replace the beginning and end of a string in one shot, so that I could just write something like:
for f in source/*.c; do process "$f" "obj/${f##*/%.*}.obj"; done
Of course, the above doesn't work. Does anyone know something that will? I'm just trying to save myself some typing here.
Not the prettiest thing in the world, but you can use a regular expression to group the content you want to pick out, and then refer to the BASH_REMATCH array:
if [[ $f =~ ^source/(.*)\.c$ ]] ; then f="obj/${BASH_REMATCH[1]}.obj"; fi
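As a sketch of how this could sit inside the loop, the pattern can be kept in a variable (here called re, a name chosen for illustration), which avoids quoting surprises inside [[ ]]. The file names are plain strings, so no files need to exist, and echo stands in for the real process command.

```bash
#!/bin/bash
re='^source/(.*)\.c$'            # pattern kept in a variable, dot escaped
for f in source/foo.c source/foo_bar.c; do
  if [[ $f =~ $re ]]; then
    b="obj/${BASH_REMATCH[1]}.obj"   # first capture group is the base name
    echo "process $f $b"
  fi
done
```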
You shouldn't have to worry about whether your code is "wordier" or not. In fact, being a bit verbose does no harm; consider how much it will improve your (or someone else's) understanding of the script. Besides, for performance, using bash's internal string manipulation is much faster than calling external commands. Lastly, you are not going to retype your commands every time you use them, right? So why worry that it's "wordier" when these commands are already in your script?
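If the repetition is the real annoyance, one option is to wrap the two built-in expansions in a small function. This is a sketch only; obj_name is a hypothetical helper name, and echo stands in for the real process command.

```bash
#!/bin/bash
# obj_name is a hypothetical helper: source/foo.c -> obj/foo.obj
obj_name() {
  local base=${1##*/}                  # drop the directory part
  printf 'obj/%s.obj\n' "${base%.*}"   # drop the extension, add the new one
}

for f in source/foo.c source/bar.c; do
  echo process "$f" "$(obj_name "$f")"
done
```

The command substitution does start a subshell per call, so this trades a little speed for brevity at the call site.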
Not directly in bash. You can use sed, of course:
b="$(sed 's|^source/\(.*\)\.c$|obj/\1.obj|' <<< "$f")"
Why not simply use cd to remove the "source/" part?
This way we can avoid the temporary variables a and b:
for f in $(cd source; printf "%s\n" *.c); do
echo process "source/${f}" "obj/${f%.*}.obj"
done
