awk: assigning a shell variable in awk script - shell

I have a situation in awk where I need to convert an input format into another format and later use the number of records processed separately. Is there any way I can use a shell variable to get the value of NR in the END section? Something like:
cat file1 | awk 'some processing END{SHELL_VARIABLE=NR}' > file2
Then later use SHELL_VARIABLE outside awk.
I do not want to process the file and then do a wc -l separately as the files are huge.

One way: redirect the processed output to the file from inside awk, and print the record count in the END block. Then use command substitution to capture that count in a shell variable:
my_var=$(awk '{ some processing; print "your output" >> "file2" } END { print NR }' file1)

No subprocess can affect the parent's environment variables. What you can do is have awk write output to the file directly, then have it print the value you want to stdout and capture it. Or if you prefer, you could reverse that and have awk just print it to a file and read it back afterwards.
Incidentally, you have a UUOC (useless use of cat): awk can read file1 directly.
rows=$(awk '{ ...; print > "file2"} END {print NR}' file1)
Or
awk '... END{print NR > "rows"}' file1 >file2
rows=$(<rows)
rm rows
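As a minimal sketch of the first variant, assuming a small made-up input (the temporary directory, the sample records and the field swap standing in for "some processing" are all hypothetical):

```shell
# Hypothetical demo input; in practice file1 already exists.
workdir=$(mktemp -d)
printf 'a 1\nb 2\nc 3\n' > "$workdir/file1"

# awk writes the converted records to file2 itself, so the only thing
# left on stdout is the record count printed by the END block.
rows=$(awk -v out="$workdir/file2" '
    { print $2, $1 > out }   # placeholder for the real conversion
    END { print NR }
' "$workdir/file1")

echo "records processed: $rows"
```

The file is read only once, and rows stays available to the rest of the shell script.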

Related

Using a value stored in a different file in awk

I have a value stored in a file named cutoff1
If I cat cutoff1 it will look like
0.34722
I want to use the value stored in cutoff1 inside an awk script. Something like following
awk '{ if ($1 >= 'cat cutoff1' print $1 }' hist1.dat >hist_oc1.dat
I think I am making a mistake somewhere. Done manually, it would look like
awk '{ if ($1 >= 0.34722) print $1 }' hist1.dat >hist_oc1.dat
How can I use the value stored in cutoff1 file inside the above mentioned awk script?
The easiest ways to achieve this are
awk -v cutoff="$(cat cutoff1)" '($1 >= cutoff){print $1}' hist.dat
awk -v cutoff="$(< cutoff1)" '($1 >= cutoff){print $1}' hist.dat
or
awk '(NR==FNR){cutoff=$1;next}($1 >= cutoff){print $1}' cutoff1 hist.dat
or
awk '($1 >= cutoff){print $1}' cutoff="$(cat cutoff1)" hist.dat
awk '($1 >= cutoff){print $1}' cutoff="$(< cutoff1)" hist.dat
Note: thanks to Glenn Jackman for pointing to this passage in man bash, under Command Substitution:
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted. Embedded newlines are not deleted, but they may be removed during word splitting. The command substitution $(cat file) can be replaced by the equivalent but faster $(< file).
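A small sketch of the -v approach, assuming made-up sample data in a temporary directory (the file contents are hypothetical; the file names follow the question):

```shell
demo=$(mktemp -d)
printf '0.34722\n' > "$demo/cutoff1"
printf '0.1\n0.5\n0.34722\n0.9\n' > "$demo/hist1.dat"

# The shell reads cutoff1 and hands the value to awk as the variable cutoff.
# $(cat file) is used instead of $(< file) so the line also works in plain sh.
awk -v cutoff="$(cat "$demo/cutoff1")" '$1 >= cutoff { print $1 }' \
    "$demo/hist1.dat" > "$demo/hist_oc1.dat"

cat "$demo/hist_oc1.dat"
```

Only the values at or above the cutoff end up in hist_oc1.dat.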
Since awk can read multiple files, just add the cutoff file's name before your data file and treat its first line specially. No external variable declaration is needed.
awk 'NR==1{cutoff=$1; next} $1>=cutoff{print $1}' cutoff data
PS: Just noticed that this is similar to @kvantour's second answer, but I'm keeping it here as a different flavor.
You could use getline to read a value from another file at your convenience. First the main file to process:
$ cat > file
wait
wait
did you see that
nothing more to see here
And cutoff:
$ cat cutoff
0.34722
An awk script that reads a line from cutoff when it meets the string see in a record:
$ awk '/see/{if((getline val < "cutoff") > 0) print val}1' file
wait
wait
0.34722
did you see that
nothing more to see here
Explained:
$ awk '
/see/ {                                # when string see is in the line
    if ((getline val < "cutoff") > 0)  # read a value from cutoff if there are any available
        print val                      # and output the value from cutoff
}1' file                               # output records from file
As there was only one value in cutoff, it was printed only once even though see was matched twice.
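For the cutoff question itself, the same getline idea could be applied once in a BEGIN block. This is only a sketch under assumptions: the variable name cutfile and the sample data are made up, and the value is forced to a number with + 0 so that the comparison is numeric.

```shell
demo2=$(mktemp -d)
printf '0.34722\n' > "$demo2/cutoff1"
printf '0.2\n0.4\n1.5\n' > "$demo2/hist1.dat"

filtered=$(awk -v cutfile="$demo2/cutoff1" '
    BEGIN {
        # Read the first line of the cutoff file; stop if it cannot be read.
        if ((getline cutoff < cutfile) <= 0) {
            print "cannot read " cutfile > "/dev/stderr"
            exit 1
        }
        close(cutfile)
        cutoff = cutoff + 0      # make sure it is treated as a number
    }
    $1 >= cutoff { print $1 }
' "$demo2/hist1.dat")

echo "$filtered"
```

Checking the return value of getline matters here: without the check, a missing cutoff file would silently leave cutoff empty and every record would pass the test.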

How to write a bash script that dumps itself out to stdout (for use as a help file)?

Sometimes I want a bash script that's mostly a help file. There are probably better ways to do things, but sometimes I want to just have a file called "awk_help" that I run, and it dumps my awk notes to the terminal.
How can I do this easily?
Another idea, use #!/bin/cat -- this will literally answer the title of your question since the shebang line will be displayed as well.
Turns out it can be done as pretty much a one-liner, thanks to @CharlesDuffy for the suggestions!
Just put the following at the top of the file, and you're done
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER
So for my awk_help example, it'd be:
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER
# Basic form of all awk commands
awk search pattern { program actions }
# advanced awk
awk 'BEGIN {init} search1 {actions} search2 {actions} END { final actions }' file
# awk boolean example for matching "(me OR you) OR (john AND ! doe)"
awk '( /me|you/ ) || (/john/ && ! /doe/ )' /path/to/file
# awk - print # of lines in file
awk 'END {print NR,"coins"}' coins.txt
# Sum up gold ounces in column 2, and find out value at $425/ounce
awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' coins.txt
# Print the last column of each line in a file, using a comma (instead of space) as a field separator:
awk -F ',' '{print $NF}' filename
# Sum the values in the first column and pretty-print the values and then the total:
awk '{s+=$1; print $1} END {print "--------"; print s}' filename
# functions available
length($0) > 72, toupper,tolower
# count the # of times the word PASSED shows up in the file /tmp/out
cat /tmp/out | awk 'BEGIN {X=0} /PASSED/{X+=1; print $1 X}'
# awk regex operators
https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operators.html
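One caveat with this layout: after the first line has run, bash carries on and tries to execute the notes as commands. A possible variation, sketched here with a made-up two-line help file, adds an exit to the header line; the marker in the trailing comment keeps that line out of the output. The temporary path and the notes are hypothetical, and the script is run through bash explicitly so that BASH_SOURCE is set.

```shell
notes_dir=$(mktemp -d)

# Write a tiny help script; the quoted 'EOF' keeps $BASH_SOURCE literal.
cat > "$notes_dir/awk_help" <<'EOF'
grep -v EZREMOVEHEADER "$BASH_SOURCE"; exit  # EZREMOVEHEADER
# Basic form of all awk commands
awk 'pattern { action }' file
EOF

# Only the notes are printed; nothing after the header line is executed.
help_text=$(bash "$notes_dir/awk_help")
echo "$help_text"
```

Because bash reads a script line by line, the exit stops it before it ever parses the notes, so they do not need to be valid shell.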
I found another solution that works on Mac/Linux and works exactly as one would hope.
Just use the following as your "shebang" line, and it'll output everything from line 2 on down:
test.sh
#!/usr/bin/tail -n+2
hi there
how are you
Running this gives you what you'd expect:
$ ./test.sh
hi there
how are you
Another possible solution: use less as the interpreter, so that the file opens in a searchable pager:
#!/usr/bin/less
This way you can also grep it for something, e.g.
$ ./test.sh | grep something

How to save the name of the file if it is being treated in the script

I have 88 folders, each of which contains the file "pair.'numbers'." (pair.3472, pair.7829 and so on). I need to treat the files with awk to extract the second column, but I need to save the numbers. If I try:
#!/bin/bash
for i in {1..88}; do
awk '{print $2}' ~/Documents/attempt.$i/pair* > ~/Results/pred.pair*
done
It doesn't save the numbers, but gives only one file: pred.pair*
Thanks for any tips.
You don't need a loop (and see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for why that's a Good Thing):
awk '
FNR==1 { close(out); out=FILENAME; sub(/\/Documents.*\//,"/Results/pred.",out) }
{ print $2 > out }
' ~/Documents/attempt.{1..88}/pair*
#!/bin/bash
for i in {1..88}; do
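awk '{fname=FILENAME;sub(".*/", "", fname);print $2 > (ENVIRON["HOME"] "/Results/pred." fname)}' ~/Documents/attempt.$i/pair*
done
Use awk's built-in variable FILENAME. We need to get the basename fname from FILENAME, then redirect the $2 value to a path built from it. Note that awk does not expand ~ inside a string (only the shell does), so the home directory is taken from ENVIRON["HOME"] here.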
There are several ways to do it: awk has a FILENAME variable and you can redirect the output from within your awk script to a manipulated string which is based on FILENAME.
Or you can do it with bash
for i in {1..88}; do
to_be_processed_fname=$(ls ~/Documents/attempt.$i/pair*)
extension="${to_be_processed_fname/*./}"
awk '{print $2}' "${to_be_processed_fname}" > "$HOME/Results/pred.${extension}"
done
Now the above of course fails if you have more than one pair* files within the same directory. But I'm leaving that to you.
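To see the FILENAME approach in action without touching real data, here is a sketch that builds a throwaway directory tree (two hypothetical attempt.N folders instead of 88, made-up file contents) and passes the output directory to awk with -v:

```shell
base=$(mktemp -d)
mkdir -p "$base/Documents/attempt.1" "$base/Documents/attempt.2" "$base/Results"
printf 'a 1\nb 2\n' > "$base/Documents/attempt.1/pair.3472"
printf 'c 3\nd 4\n' > "$base/Documents/attempt.2/pair.7829"

awk -v outdir="$base/Results" '
FNR == 1 {                       # first record of each input file
    if (out != "") close(out)    # finish the previous output file
    name = FILENAME
    sub(/.*\//, "", name)        # strip the directory part
    out = outdir "/pred." name
}
{ print $2 > out }
' "$base"/Documents/attempt.*/pair.*

ls "$base/Results"
```

A single awk process handles every input file, and each result keeps the number from its source file name (pred.pair.3472, pred.pair.7829).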

awk syntax — what is OFS and why the 1 at the end?

awk -F"\t" -v OFS="\t" '{if($18~/^ *[0-9]*(\.[0-9]+)?" *$/)sub(/"/,"",$18);else $18=" "}1' sample.txt
The code above is some awk code used in a script I'm modifying. I'm new to Unix, so I am not able to understand the syntax of the above awk.
-F is for splitting the columns with the delimiter.
What is OFS?
And what is the use of 1 at the end of the awk script?
-v OFS="\t" sets the awk variable OFS from the command line. Like the -F option (or FS), it is a field separator, but for the output: it is called the output field separator.
You can test it:
awk -v OFS=' ' '{print 1,2}' a.txt
Output separated by spaces:
1 2
1 2
awk -v OFS=';' '{print 1,2}' a.txt
Output separated by ;:
1;2
1;2
In your case it means that the output fields will be separated by tabs (like the input).
The 1 at the end of the awk script makes awk print the current record, including any changes the preceding action made to it. That's because an awk script consists of tests (regex, etc.) and actions for them. The test 1 is always true, and as it has no action of its own, the default action of awk, printing the current line, is used.
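One detail worth checking for yourself: awk only rewrites the separators of a record when a field is assigned (or when print is given comma-separated arguments). A quick sketch with made-up input:

```shell
# A field is assigned, so the record is rebuilt with OFS and the 1 prints it.
rebuilt=$(printf 'a\tb\tc\n' | awk -F'\t' -v OFS=';' '{ $2 = "X" } 1')
echo "$rebuilt"      # a;X;c

# No field is touched, so the 1 prints the record exactly as it was read.
untouched=$(printf 'a\tb\tc\n' | awk -F'\t' -v OFS=';' '1')
echo "$untouched"    # still tab-separated
```

In the script from the question, $18 is modified on every line, so each record is rebuilt with tabs.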

awk function printing..... -bash?

For some reason that i'm trying to figure out i'm getting "-bash" printed out of this script:
cat sample | awk -v al=$0 -F"|" '{n = split(al, a, "|")} {print a[1]}'
the 'sample' file contains psv "pipe separated value", like a|b|c|d|e|f|d.
My intention is to use an array.
The result of the above script is an array of length 1, and the only item it contains is "-bash", the name of the shell.
$0 by default points to the program that is currently used, but as far as I know, within an awk script, the $0 parameter 'should' point to the entire line being read.
Since I would like to understand exactly where the problem is (I'm new to bash/awk),
can you point out which of the following steps is failing?
1-"concatenate" the sample file and pass it as input for the awk script
2-define a variable named 'al' with as value each line contained in 'sample'
3-define a pipe "|" as field separator
4-define an action, split the value of 'al' into an array named 'a' using a pipe as splitter
5-define another action, which in this case is simply printing the first item in the array
Any advice? thank you!
The $0 is expanded by the shell before it runs awk, and $0 is the name of the current program, which is bash. The - at the start is there because bash was run as a login shell by login(1) (see the description of the exec builtin in man bash).
You need to quote the $0 so the shell doesn't expand it, and awk sees it:
awk -v 'al=$0' -F"|" '{n = split(al, a, "|")} {print a[1]}' sample
But variable assignments are processed before reading any data, so that sets the variable al to the string "$0" at the start of the program; it does not set al to the contents of each input record.
If you want the record, just say so instead of using a variable:
awk -F"|" '{n = split($0, a, "|")} {print a[1]}' sample
By writing -v al=$0, you are setting al to the name of the current program, which is bash. See Arguments in man bash.
Err...
awk -F'|' '{ print $1 }' sample
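The difference between the shell's $0 and awk's $0 can be seen with a two-line sample (the data is made up):

```shell
sample_dir=$(mktemp -d)
printf 'a|b|c\nd|e|f\n' > "$sample_dir/sample"

# -v runs once, before any input: al is the literal string "$0" for every record.
literal=$(awk -v 'al=$0' -F'|' '{ print al, $1 }' "$sample_dir/sample")
echo "$literal"

# Inside the program, $0 is the current record, so split() sees each line.
parts=$(awk -F'|' '{ n = split($0, a, "|"); print n, a[1] }' "$sample_dir/sample")
echo "$parts"
```

The first command prints the text $0 next to each first field, while the second reports three elements per line and the first of them.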

Resources