awk, if else conditional when record contains a value - macos

I'm having trouble getting an awk if/else conditional to trigger when the record contains a value. I'm running this in zsh on macOS Catalina.
This script (issue is on second to last line)...
echo "abcdefgh" > ./temp
echo "abc\"\(\"h" >> ./temp
echo "abcdefgh" >> ./temp
echo "abcde\(h" >> ./temp
val='"\("'
key="NEW_NEW"
file="./temp"
echo $val
echo $key
echo $file
echo ""
echo "###############"
echo ""
awk '
BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]=""; len=length(old) }
($0 ~ /old/){ s=index($0,old); print substr($0,1,s-1) new substr($0,s+len) }{ print $0 }
' $val $key $file
outputs:
"\("
NEW_NEW
./temp
###############
abcdefgh
abc"\("h
abcdefgh
abcde\(h
I want to fix the script so that it changes the "\(" to NEW_NEW but skips the parenthesis without the quotes...
"\("
NEW_NEW
./temp
###############
abcdefgh
abcNEW_NEWh
abcdefgh
abcde\(h
EDIT
This is an abbreviated version of the real script that I'm working on. The answer will need to include the variable expansions that the sample above has, in order for me to use the command in the larger script. The ARGV format in use is preserving special characters, so the main question I have is why the conditional isn’t triggered as expected.

($0 ~ /old/) means "do a regexp comparison between the current record ($0) and the literal regexp old", so it matches only when $0 contains the three characters o, l, d in that order. You were probably trying to do a regexp comparison against the contents of the variable named old, which would be $0 ~ old (see How do I use shell variables in an awk script?). But you don't actually want that either: you want a string comparison, which is index($0,old), as shown in the answer to your previous question (https://stackoverflow.com/a/62096075/1745001). For some reason you have now moved that test out of the condition part of the condition { action } awk statement and made it the first part of the action instead. So don't do that.
The other major problem with your script is that you removed the quotes from around your shell variables, so the shell interprets them and applies word splitting, globbing (file name expansion), etc. before awk even gets to see them (see https://mywiki.wooledge.org/Quotes). So don't do that either.
Fixing just the parts I mentioned:
$ cat tst.sh
echo "abcdefgh" > ./temp
echo "abc\"\(\"h" >> ./temp
echo "abcdefgh" >> ./temp
echo "abcde\(h" >> ./temp
val='"\("'
key="NEW_NEW"
file="./temp"
echo "$val"
echo "$key"
echo "$file"
echo ""
echo "###############"
echo ""
awk '
BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]=""; len=length(old) }
s=index($0,old) { $0 = substr($0,1,s-1) new substr($0,s+len) }
{ print }
' "$val" "$key" "$file"
$ ./tst.sh
"\("
NEW_NEW
./temp
###############
abcdefgh
abcNEW_NEWh
abcdefgh
abcde\(h
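To make the difference between the three kinds of test concrete, here is a minimal sketch. It reuses the ARGV hand-off from the script above; the sample record and the names r1, r2 and r3 are only for illustration.

```shell
val='"\("'            # the search string: quote, backslash, paren, quote
line='abc"\("h'       # one sample record that contains that string

# 1. /old/ is the literal regexp "old"; the awk variable old is never consulted.
r1=$(printf '%s\n' "$line" | awk '
  BEGIN { old = ARGV[1]; ARGV[1] = "" }
  { print (($0 ~ /old/) ? "match" : "no match") }' "$val")

# 2. $0 ~ old uses the contents of old as a regexp, where \( means a literal "(",
#    so it looks for quote-paren-quote and misses the backslash in the record.
r2=$(printf '%s\n' "$line" | awk '
  BEGIN { old = ARGV[1]; ARGV[1] = "" }
  { print (($0 ~ old) ? "match" : "no match") }' "$val")

# 3. index($0, old) is a plain substring search with no regexp rules.
r3=$(printf '%s\n' "$line" | awk '
  BEGIN { old = ARGV[1]; ARGV[1] = "" }
  { print (index($0, old) ? "match" : "no match") }' "$val")

printf '%s\n' "literal regexp: $r1" "dynamic regexp: $r2" "string search:  $r3"
```

Only the third form reports a match for this record, which is why the fixed script uses index() as the condition.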

This code uses the variables val, key and file, but assumes you can alter the content of val to compensate for the backslash processing that happens when the value is passed to awk with -v and then used as a regexp by gsub:
$ file="./temp"; key="NEW_NEW"; val='"\\\\\\("'; \
awk --posix -v val="$val" -v key="$key" '{gsub(val, key)}1' "$file"
abcdefgh
abcNEW_NEWh
abcdefgh
abcde\(h
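The extra backslashes are needed because awk processes escape sequences in -v assignments before the regexp engine interprets the result again. A small sketch of the first step, comparing -v with ENVIRON (which hands the value over untouched); the variable name s and its value are hypothetical.

```shell
s='a\\b'    # four characters: a, backslash, backslash, b

# -v processes escape sequences, so the two backslashes collapse into one.
via_v=$(awk -v s="$s" 'BEGIN { print s }')

# ENVIRON gives awk the string exactly as the shell holds it.
via_env=$(s="$s" awk 'BEGIN { print ENVIRON["s"] }')

printf '%s\n' "-v:      $via_v" "ENVIRON: $via_env"
```

If altering val is not an option, passing it through ENVIRON or ARGV avoids this first layer of backslash processing.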

Related

Bash loop on files in folder without specific pattern

I have to loop over the files in a folder, but I don't want to include files whose names contain a specific pattern ("Reverse"). Here is the code:
DIRECTORY=/Users/Qi.Wang/projects/CH12F3/data/CH12F3.LAM-HTGTS_mMYC.220512
outputDir=/Users/Qi.Wang/projects/CH12F3/
pw=$(pwd)
cat blank.txt > ./config.txt
c="$pw/config.txt"
o=0
for i in $DIRECTORY/*.fna; do
((o=o+1))
s=${i##*/}
b=${s%.fna}
b="${o}_$(echo $b | awk '{ gsub(/_PairEnd+/, " " ); print $1 }')"
outputDirs="$outputDir$b"
printf "%s\t" $b >> ./config.txt
printf "%s\t" $s >> ./config.txt
cat end.txt >> ./config.txt
printf "perl /Users/andy/projects/HTGTS/pipeline/align_tools/TLPpipeline.pl %s %s which=%s assembly=mm9 blatopt=mask=lower outdir=/%s -skipred -skipredadd -skipblu -skipbluadd \n" $c $DIRECTORY $o $outputDirs >> ./command.sh
done
I also have another minor problem. When I printf outdir=%s, the variable that is printed is $outputDir, which starts with a "/", but after printf prints it the / seems to be gone.
Your awk command puts spaces in $b, so $outputDirs will contain spaces. Therefore, you need to quote it to make it a single argument to printf. You should also quote all the other variable arguments.
Also, since you're creating a perl command line, you'll want outdir=%s to be a single argument, so you should put single quotes around that as well.
printf "perl /Users/andy/projects/HTGTS/pipeline/align_tools/TLPpipeline.pl '%s' '%s' 'which=%s' assembly=mm9 blatopt=mask=lower 'outdir=/%s' -skipred -skipredadd -skipblu -skipbluadd \n" "$c" "$DIRECTORY" "$o" "$outputDirs" >> ./command.sh
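A minimal sketch of why the quoting matters: printf reuses its format string for every extra argument, so an unquoted variable that splits into several words produces several substitutions. The value of b here is hypothetical.

```shell
b="1_sample extra"                 # hypothetical value containing a space

unquoted=$(printf '[%s]' $b)       # split into two words; the format is applied to each
quoted=$(printf '[%s]' "$b")       # one argument, one substitution

printf '%s\n' "$unquoted" "$quoted"
```

The first line of output shows two bracketed words, the second shows one bracketed value with the space intact.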
To skip files with Reverse in the name, enable extended globbing and use a non-matching pattern.
shopt -s extglob
for i in "$DIRECTORY"/!(*Reverse*).fna; do
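If extended globbing is not available, a case statement inside the loop is a portable alternative. This is a swapped-in technique, not the extglob pattern above; the scratch directory and file names are hypothetical stand-ins for $DIRECTORY.

```shell
# Hypothetical scratch directory standing in for $DIRECTORY.
dir=$(mktemp -d)
touch "$dir/a_Forward.fna" "$dir/a_Reverse.fna" "$dir/b.fna"

kept=""
for i in "$dir"/*.fna; do
  case ${i##*/} in
    *Reverse*) continue ;;      # skip any file whose name contains "Reverse"
  esac
  kept="$kept${i##*/} "         # collect the base names that were processed
done

echo "$kept"
rm -rf "$dir"
```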

How can i add quotes around each words stored in a variable in shell script

I have a variable foo.
echo "print foo" "$foo" ---> abc,bc,cde
I want to put quotes around each value.
Expected result: 'abc','bc','cde'.
I have tried it this way, but it's not working:
join_lines() {
local IFS=${1:-,}
set --
while IFS= read -r line; do set -- "$@" "$'line'"; done
echo "$*"
}
Try the following, written and tested with the shown samples in GNU awk.
Without a loop:
var="abc,bc,cde"
echo "$var" | awk -v s1="'" 'BEGIN{FS=",";OFS="\047,\047"} {$1=$1;$0=s1 $0 s1} 1'
With a loop, the usual way to go through all the comma-separated fields:
var="abc,bc,cde"
echo "$var" | awk -v s1="'" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){$i=s1 $i s1}} 1'
Output will be 'abc','bc','cde'.
As an alternative, using sed: replace every , with ',' and add a ' at the beginning and end of the line to wrap the first and last tokens.
sed -e "s/^/'/" -e "s/$/'/" -e "s/,/','/g"
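The sed command above reads standard input, so a minimal usage sketch might look like this; the variable names foo and quoted are only for illustration.

```shell
foo="abc,bc,cde"

# Wrap the line in quotes, then turn every comma into quote-comma-quote.
quoted=$(printf '%s\n' "$foo" | sed -e "s/^/'/" -e "s/$/'/" -e "s/,/','/g")

printf '%s\n' "$quoted"
```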
On the surface, the question is how to convert a comma-separated list of values (stored in a shell variable) into a comma-separated list of quoted tokens. Extending the logic provided by the OP, but using shell arrays:
foo="abc,bc,cde"
IFS=, read -a items <<< "$foo"
result=
for r in "${items[@]}" ; do
[ "$result" ] && result+=","
result+="'$r'"
done
echo "RESULT=$result"
If needed, logic can be placed into a function/filter
function join_lines {
local -a items
local input result
while IFS=, read -a items ; do
result=
for r in "${items[@]}" ; do
[ "$result" ] && result+=","
result+="'$r'"
done
echo "$result"
done
}

Take multiple (any number of input) input strings and concatenate in shell

I want to input multiple strings.
For example:
abc
xyz
pqr
and I want output like this (including quotes) in a file:
"abc","xyz","pqr"
I tried the following code, but it doesn't give the expected output.
NextEmail=","
until [ "a$NextEmail" = "a" ];do
echo "Enter next E-mail: "
read NextEmail
Emails="\"$Emails\",\"$NextEmail\""
done
echo -e $Emails
This seems to work:
#!/bin/bash
# via https://stackoverflow.com/questions/1527049/join-elements-of-an-array
function join_by { local IFS="$1"; shift; echo "$*"; }
emails=()
while read line
do
if [[ -z $line ]]; then break; fi
emails+=("$line")
done
join_by ',' "${emails[@]}"
$ bash vvuv.sh
my-email
another-email
third-email
my-email,another-email,third-email
$
With sed and paste:
sed 's/.*/"&"/' infile | paste -sd,
The sed command puts "" around each line; paste does serial pasting (-s) and uses , as the delimiter (-d,).
If input is from standard input (and not a file), you can just remove the input filename (infile) from the command; to store in a file, add a redirection at the end (> outfile).
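A minimal sketch of the standard-input variant. It passes an explicit - operand to paste, on the assumption that some implementations (BSD paste, for example) require a file operand to read standard input; the variable name joined is hypothetical.

```shell
# Quote each input line, then join all lines with commas.
joined=$(printf '%s\n' abc xyz pqr | sed 's/.*/"&"/' | paste -sd, -)

printf '%s\n' "$joined"
```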
If you can withstand a trailing comma, then printf can convert an array, with no loop required...
$ readarray -t a < <(printf 'abc\nxyx\npqr\n' )
$ declare -p a
declare -a a=([0]="abc" [1]="xyx" [2]="pqr")
$ printf '"%s",' "${a[@]}"; echo
"abc","xyx","pqr",
(To be fair, there's a loop running inside bash, to step through the array, but it's written in C, not bash. :) )
If you wanted, you could replace the final line with:
$ printf -v s '"%s",' "${a[@]}"
$ s="${s%,}"
$ echo "$s"
"abc","xyx","pqr"
This uses printf -v to store the imploded text in a variable, $s, from which you can then strip the trailing comma using parameter expansion.

how to prevent for loop from using space as deliminator, bash script

I am trying to write a bash script to do multiple checks and searches for a CMS my company uses. I am trying to implement a function that lets a user search for a certain macro call and returns all the files that contain the call, the line the macro is called on, and the actual code in the macro call. What I have seems to be getting screwed up by the fact that I am using a for loop to format the output. Here's the snippet of the script I am working on:
elif [ "$choice" = "2" ]
then
echo -e "\n What macro call are we looking for $name?"
read macrocall
for i in $(grep -inR "$macrocall" $sitepath/templates/macros/); do
file=$(echo $i | cut -d\: -f1 | awk -F\/ '{ print $NF }')
line=$(echo $i | cut -d\: -f2)
calltext=$(echo $i | cut -d\: -f3-)
echo -e "\nFile: $file"
echo -e "\nLine: $line"
echo -e "\nMacro Call from file: $calltext"
done
fi
The current script handles the first few fields until it hits a space, and then everything goes wrong. Does anybody have any idea how I can make the for loop's delimiter be each result of the grep? Any suggestions would be helpful. Let me know if you need more info. Thanks!
The right way to do this would be more like:
printf "\n What macro call are we looking for %s?" "$name"
read macrocall
# ensure globbing is off and set IFS to a newline after saving original values
oSET="$-"; set -f; oIFS="$IFS"; IFS=$'\n'
awk -v macrocall="$macrocall" '
BEGIN { lc_macrocall = "\\<" tolower(macrocall) "\\>" }
tolower($0) ~ lc_macrocall {
file=FILENAME
sub(/.*\//,"",file)
printf "\n%s\n", file
printf "\n%d\n", FNR
printf "\nMacro Call from file: %s\n", $0
}
' $(find "$sitepath/templates/macros" -type f -print)
# restore original IFS and globbing values
IFS="$oIFS"; set +f -"$oSET"
This solves the problem of having spaces in your file names as originally requested, but also handles globbing characters in your file names, and the various typical echo issues.
You can set the internal field separator $IFS (which is normally set to space, tab and newline) to just newline to get around this problem:
IFS=$'\n'
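Another way to avoid word splitting altogether is to read the results line by line with while read. This is a swapped-in technique, not the method of either answer above; the sample paths, the results variable and the other names are hypothetical.

```shell
# Hypothetical grep -n style output: path:line:text
results='templates/macros/a.tpl:12:call foo with spaces
templates/macros/b.tpl:7:{{ foo: bar }}'

count=0
# IFS=: applies only to read; the last variable receives the rest of the line,
# including any further colons and spaces.
while IFS=: read -r path line calltext; do
  count=$((count + 1))
  last="${path##*/}|$line|$calltext"
  printf 'File: %s\nLine: %s\nMacro Call from file: %s\n' "${path##*/}" "$line" "$calltext"
done <<EOF
$results
EOF
```

Because the loop is fed from a here-document rather than a pipe, it runs in the current shell and count keeps its value afterwards.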

Printf example in bash does not create a newline

When I use printf in a bash script, a "\n" with nothing after it does not create a newline, whereas a "\n" followed by a space does, e.g.:
No space after "\n"
NewLine=`printf "\n"`
echo -e "Firstline${NewLine}Lastline"
Result:
FirstlineLastline
Space after "\n "
NewLine=`printf "\n "`
echo -e "Firstline${NewLine}Lastline"
Result:
Firstline
Lastline
Question: Why doesn't case 1 produce the following result:
Firstline
Lastline
I know that this specific issue could be worked around using other techniques, but I want to focus on why case 1 does not work.
Edited:
When using echo instead of printf, I get the expected result, but why does printf work differently?
NewLine=`echo "\n"`
echo -e "Firstline${NewLine}Lastline"
Result:
Firstline
Lastline
The backtick operator removes trailing new lines. See 3.4.5. Command substitution at http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_04.html
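A minimal sketch of the stripping, together with a common workaround that is not part of this answer: appending a sentinel character so the output no longer ends in a newline, then removing the sentinel. The variable names are hypothetical.

```shell
stripped=$(printf 'a\n\n\n')      # trailing newlines are removed by the substitution
kept=$(printf 'a\n\n\nx')         # the sentinel x protects the newlines
kept=${kept%x}                    # drop the sentinel again

printf 'stripped length: %d\nkept length: %d\n' "${#stripped}" "${#kept}"
```

The first variable holds only the letter a, while the second holds the letter plus all three newlines.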
Note on edited question
Compare:
[alvaro@localhost ~]$ printf "\n"

[alvaro@localhost ~]$ echo "\n"
\n
[alvaro@localhost ~]$ echo -e "\n"


[alvaro@localhost ~]$
The echo command doesn't treat \n as a newline unless you tell it to do so:
NAME
echo - display a line of text
[...]
-e enable interpretation of backslash escapes
POSIX 7 specifies this behaviour here:
[...] with the standard output of the command, removing sequences of one or more <newline> characters at the end of the substitution
Maybe people will come here with the same problem I had:
echoing \n inside code wrapped in backticks. A little tip:
printf "astring\n"
# and
printf "%s\n" "astring"
# both have the same effect.
# So... I prefer the less typing one
The short answer is:
# Escape \n correctly !
# Using just: printf "$myvar\n" causes this effect inside the backticks:
printf "banana
"
# So... you must try \\n that will give you the desired
printf "banana\n"
# Or even \\\\n if this string is being send to another place
# before echoing,
buffer="${buffer}\\\\n printf \"$othervar\\\\n\""
One common problem is that if you do inside the code:
echo 'Tomato is nice'
when surrounded with backticks, it will produce the error
command Tomato not found.
The workaround is to add another echo -e or printf
printed=0
function mecho(){
#First time you need an "echo" in order bash relaxes.
if [[ $printed == 0 ]]; then
printf "echo -e $1\\\\n"
printed=1
else
echo -e "\r\n\r$1\\\\n"
fi
}
Now you can debug your code doing in prompt just:
(prompt)$ `mySuperFunction "arg1" "etc"`
The output will be nicely
mydebug: a value
otherdebug: whathever appended using myecho
a third string
and you can debug internally with
mecho "a string to be hacktyped"
$ printf -v NewLine "\n"
$ echo -e "Firstline${NewLine}Lastline"
Firstline
Lastline
$ echo "Firstline${NewLine}Lastline"
Firstline
Lastline
It looks like BASH is removing trailing newlines.
e.g.
NewLine=`printf " \n\n\n"`
echo -e "Firstline${NewLine}Lastline"
Firstline Lastline
NewLine=`printf " \n\n\n "`
echo -e "Firstline${NewLine}Lastline"
Firstline
Lastline
Your edited echo version is putting a literal backslash-n into the variable $NewLine which then gets interpreted by your echo -e. If you did this instead:
NewLine=$(echo -e "\n")
echo -e "Firstline${NewLine}Lastline"
your result would be the same as in case #1. To make that one work that way, you'd have to escape the backslash and put the whole thing in single quotes:
NewLine=$(printf '\\n')
echo -e "Firstline${NewLine}Lastline"
or double escape it:
NewLine=$(printf "\\\n")
Of course, you could just use printf directly or you can set your NewLine value like this:
printf "Firstline\nLastline\n"
or
NewLine=$'\n'
echo "Firstline${NewLine}Lastline" # no need for -e
For people coming here wondering how to use newlines in arguments to printf, use %b instead of %s:
$> printf "a%sa" "\n"
a\na
$> printf "a%ba" "\n"
a
a
From the manual:
%b expand backslash escape sequences in the corresponding argument
We do not need "echo" or "printf" for creating the NewLine variable:
NewLine="
"
printf "%q\n" "${NewLine}"
echo "Firstline${NewLine}Lastline"
Bash deletes all trailing newlines in command substitutions.
To preserve trailing newlines, assign the printf output to a variable with printf -v VAR.
Instead of
NewLine=`printf "\n"`
echo -e "Firstline${NewLine}Lastline"
#FirstlineLastline
use
printf -v NewLine '\n'
echo -e "Firstline${NewLine}Lastline"
#Firstline
#Lastline
Explanation
According to the Bash manual:
3.5.4 Command Substitution
$(command)
or
`command`
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted. Embedded newlines are not deleted, but they may be removed during word splitting.
So even if the command outputs trailing newlines, bash deletes them.
var=$(printf '%s\n%s\n\n\n' 'foo' 'bar')
echo "$var"
output:
foo
bar
According to help printf
printf [-v var] format [arguments]
If the -v option is supplied, the output is placed into the value of the shell variable VAR rather than being sent to the standard output.
In this case, for safe copying of formatted text to the variable, use the [-v var] option:
printf -v var '%s\n%s\n\n\n' 'foo' 'bar'
echo "$var"
output:
foo
bar
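It works if you add "\r" after the newline, because the output then no longer ends in a newline (the variable also contains the carriage return):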
$ nl=`printf "\n\r"` && echo "1${nl}2"
1
2
