Calling an executable program using awk - shell

I have a program in C that I want to call by using awk in shell scripting. How can I do something like this?

From the AWK man page:
system(cmd)
executes cmd and returns its exit status
The GNU AWK manual also has a section that, in part, describes the system function and provides an example:
system("date | mail -s 'awk run done' root")

A much more robust way is to use awk's getline to read a command's output through a pipe into a variable. In the form cmd | getline result, cmd is run and its output is piped into getline; it returns 1 if it got a line of output, 0 at EOF, and -1 on failure.
First construct the command to run in a variable in the BEGIN block if the command does not depend on the contents of the file, e.g. a simple date or an ls.
A simple example of the above would be
awk 'BEGIN {
    cmd = "ls -lrth"
    while ( ( cmd | getline result ) > 0 ) {
        print result
    }
    close(cmd)
}'
When the command to run is built from the columnar content of a file, you generate the cmd string in the main {..} block, as below. E.g. consider a file whose $2 contains a file name, and you want it replaced with the md5sum hash of that file's contents. You can do
awk '{ cmd = "md5sum "$2
while ( ( cmd | getline md5result ) > 0 ) {
$2 = md5result
}
close(cmd);
}1'
Another frequent use of external commands in awk is date processing, when your awk does not support time functions such as mktime() and strftime() out of the box.
Consider a case where you have a Unix epoch timestamp stored in a column and you want to convert it to a human-readable date format. Assuming GNU date is available:
awk '{ cmd = "date -d #" $1 " +\"%d-%m-%Y %H:%M:%S\""
while ( ( cmd | getline fmtDate) > 0 ) {
$1 = fmtDate
}
close(cmd);
}1'
For an input line such as
1572608319 foo bar zoo
the above command produces the output
01-11-2019 07:38:39 foo bar zoo
The command can be tailored to convert the date fields in any of the columns of a given line. Note that -d is a GNU-specific extension; the *BSD variants read an epoch timestamp with -r and parse other formats with -f (not exactly equivalent to -d).
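If your awk is GNU awk, which does ship mktime() and strftime(), you can skip the external date call entirely; a minimal sketch of the same conversion:
gawk '{ $1 = strftime("%d-%m-%Y %H:%M:%S", $1) } 1'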
More information about getline can be found in the AllAboutGetline article on the awk.freeshell.org page.

There are several ways.
awk has a system() function that will run a shell command:
system("cmd")
You can print to a pipe:
print "blah" | "cmd"
You can have awk construct commands, and pipe all the output to the shell:
awk 'some script' | sh
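As a concrete sketch of that last form, this generates one mv command per .log file and hands them all to sh (hypothetical file names, assumed to contain no whitespace):
ls *.log | awk '{ printf "mv %s %s.bak\n", $0, $0 }' | sh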

Something as simple as this will work
awk 'BEGIN{system("echo hello")}'
and
awk 'BEGIN { system("date"); close("date")}'

I use the power of awk to delete some of my stopped docker containers. Observe carefully how I construct the cmd string first before passing it to system.
docker ps -a | awk '$3 ~ "/bin/clish" { cmd="docker rm "$1;system(cmd)}'
Here, I use the 3rd column matching the pattern "/bin/clish", then extract the container ID from the first column to construct my cmd string, and pass that to system.
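A cautious variation on the same idea: have awk only print the commands so you can inspect them first, and append | sh when they look right (same pattern and columns as above):
docker ps -a | awk '$3 ~ "/bin/clish" { print "docker rm "$1 }'        # dry run: print only
docker ps -a | awk '$3 ~ "/bin/clish" { print "docker rm "$1 }' | sh   # pipe to sh to execute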

It really depends :) One of the handy Linux utilities (from GNU findutils; see info xargs) is xargs. If you are using awk you probably have a more involved use-case in mind - your question is not very detailed.
printf "1 2\n3 4" | awk '{ print $2 }' | xargs touch
Will execute touch 2 4. Here touch could be replaced by your program. More info at info xargs and man xargs (really, read these).
I believe you would like to replace touch with your program.
Breakdown of the aforementioned script:
printf "1 2\n3 4"
# Output:
1 2
3 4
# The pipe (|) makes the output of the left command the input of
# the right command (simplified)
printf "1 2\n3 4" | awk '{ print $2 }'
# Output (of the awk command):
2
4
# xargs will execute a command with arguments. The arguments
# are made up by taking the input to xargs (in this case the output
# of the awk command, which is "2 4").
printf "1 2\n3 4" | awk '{ print $2 }' | xargs touch
# No output, but executes: `touch 2 4` which will create (or update
# timestamp if the files already exist) files with the name "2" and "4"
Update: In the original answer, I used echo instead of printf. However, printf is the better and more portable alternative, as was pointed out by a comment (where great links with discussions can be found).
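If your program needs one invocation per argument, or the argument embedded mid-command, xargs -I is the usual tool; a small sketch (the prefix_ name is made up):
printf "1 2\n3 4" | awk '{ print $2 }' | xargs -I{} touch "prefix_{}"
# executes `touch prefix_2`, then `touch prefix_4` - one invocation per input line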

#!/usr/bin/awk -f
BEGIN {
    command = "ls -lh"
    while ((command | getline line) > 0)
        print line
    close(command)
}
Runs "ls -lh" from an awk script and prints each line of its output.

You can easily call a command with parameters via the argument to system().
For example, to kill jobs matching a certain string (there are other ways, of course):
ps aux | grep my_searched_string | awk '{system("kill " $2)}'
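Note that grep can match its own process in the ps listing; the usual workaround is to wrap one character of the pattern in brackets so grep's own command line no longer matches the pattern (placeholder string):
ps aux | grep '[m]y_searched_string' | awk '{system("kill " $2)}'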

I was able to do this with the method below:
cat ../logs/em2.log.1 | grep -i 192.168.21.15 | awk '{system("date"); print $1}'
awk has a function called system that enables you to execute any shell command from within awk.

Related

Using awk to replace part of a line with the output of a program

I've got a file with several columns, like so:
13:46:48 1.2.3.4:57 user1
13:46:49 5.6.7.8:58 user2
13:48:07 9.10.11.12:59 user3
I'd like to transform one of the columns by passing it as input to a program:
echo "1.2.3.4:57" | transformExternalIp
10.0.0.4:57
I wrote a small bit of awk to do this:
awk '{ ("echo " $2 " | transformExternalIp") | getline output; $2=output; print}'
But what I got surprised me. Initially, it looked like it was working as expected, but then I started to see weird repeated values. In order to debug, I removed my fancy "transformExternalIp" program in case it was the problem and replaced it with echo and cat, which means literally nothing should change:
awk '{ ("echo " $2 " | cat") | getline output; print $2 " - " output}' connections.txt
For the first thousand lines or so, the left and right sides matched, but then after that, the right side frequently stopped changing:
1.2.3.4:57 - 1.2.3.4:57
2.2.3.4:12 - 2.2.3.4:12
3.2.3.4:24 - 3.2.3.4:24
# .... (okay for a long while)
120.120.3.4:57 - 120.120.3.4:57
121.120.3.4:25 - 120.120.3.4:57
122.120.3.4:100 - 120.120.3.4:57
123.120.3.4:76 - 120.120.3.4:57
What the heck have I done wrong? I'm guessing that I'm misunderstanding something about awk.
Close the command after each invocation to ensure a new copy of the command is run for the next set of input, e.g.:
awk '{ ("echo " $2 " | transformExternalIp") | getline output
close("echo " $2 " | transformExternalIp")
$2=output
print
}'
# or, to reduce issues from making a typo:
awk '{ cmd="echo " $2 " | transformExternalIp"
(cmd) | getline output
close(cmd)
$2=output
print
}'
For more details see this and this.
During my testing with a dummy script (echo $RANDOM; sleep .1) I could generate similar results as OP ... some good/expected lines and then a bunch of duplicates.
I noticed that as soon as the duplicates started occurring, the dummy script wasn't actually being called any more and instead awk was treating the system call as a static result (ie, kept re-using the value from the last 'good' call); it was quite noticeable because the sleep .1 was no longer being called, so the output from the awk script sped up significantly.
Can't say that I understand 100% what's happening under the covers ... perhaps an issue with how the script (my dummy script; OP's transformExternalIp) behaves with multiple lines of input when expecting one line of input ... or an issue with a limit on the number of open/active process handles ... shrug
("echo" $2" | cat") creates a fork almost every time that you use it.
Then, when the above instruction reaches some kind of fork limit, the output variable isn't updated by getline anymore; that's what's happening here.
If you're using GNU awk then you can fix the issue with a Coprocess:
awk '
BEGIN { cmd = "cat" }
{
    print $2 |& cmd
    cmd |& getline output
    print $2 " - " output
}
' connections.txt
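If the coprocess needs to see end-of-input before it can reply (sort, for instance), gawk's two-argument close() lets you shut just the write side; a sketch, assuming GNU awk:
awk '
BEGIN { cmd = "sort" }
{ print $2 |& cmd }                      # feed every value to the coprocess
END {
    close(cmd, "to")                     # send EOF on the write side only
    while ((cmd |& getline line) > 0)    # then drain the sorted output
        print line
}
' connections.txt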
Awk is a tool to manipulate text. A shell is a tool to sequence calls to other tools. Don't use a shell to call awk to sequence calls to transformExternalIp as if awk were a shell; just use a shell:
while read -r _ old_ip _; do
new_ip=$(printf '%s\n' "$old_ip" | transformExternalIp)
printf '%s - %s\n' "$old_ip" "$new_ip"
done < connections.txt
When you're using awk you're spawning a subshell for every call to transformExternalIp so it's no more efficient (probably a bit less efficient) than just staying in shell.

how to select the last line of the shell output

Hi I have a shell command like this.
s3=$(awk 'BEGIN{ print "S3 bucket path" }
/Executing command\(queryId/{ sub(/.*queryId=[^[:space:]]+: /,""); q=$0 }
/s3:\/\//{ print "," $10 }' OFS=',' hive-server2.log)
The output of the above command like this.
echo $s3
2018-02-21T17:58:22,
2018-02-21T17:58:26,
2018-02-21T18:05:33,
2018-02-21T18:05:34
I want to select only the last line, i.e. I need the output to be:
2018-02-21T18:05:34
I tried like this.
awk -v $s3 '{print $(NF)}'
It is not working. Any help will be appreciated.
In general, command | tail -n 1 prints the last line of the output from command. However, where command is of the form awk '... { ... print something }' you can refactor to awk '... { ... result = something } END { print result }' to avoid spawning a separate process just to discard the other output. (Conversely, you can replace awk '/condition/ { print something }' | head -n 1 with awk '/condition/ { print something; exit }'.)
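Applied to the Awk program from the question, that refactoring looks roughly like this (a sketch):
# instead of post-filtering:
awk '/s3:\/\//{ print $10 }' hive-server2.log | tail -n 1
# remember the last match and print it once at the end:
awk '/s3:\/\//{ last = $10 } END { if (last != "") print last }' hive-server2.log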
If you already have the result in a shell variable s3 and want to print just the last line, the parameter expansion echo "${s3##*$'\n'}" does that. The parameter expansion operator ## removes the longest matching prefix; the C-style string $'\n' to represent a newline is a Bash extension, so you should make sure the shebang line says #!/bin/bash, not #!/bin/sh.
Notice also that $s3 without quotes is an error unless you specifically require the shell to perform whitespace tokenization and wildcard expansion on the value. You should basically always use double quotes around variables except in a couple of very specific scenarios.
Your Awk command would not work for two reasons: first, as explained in the previous paragraph, the unquoted $s3 means -v receives only the first token of the variable; and second, the remaining tokens get parsed as the Awk script and file names (probably producing a syntax error). In more detail, you are basically running
awk -v s3=firstvalue secondvalue thirdvalue '{ print $(NF) }'
       ^ value       ^ script    ^ names of files ...
where you probably wanted to say
awk -v s3=$'firstvalue\nsecondvalue\nthirdvalue' '{ print $(NF) }'
But even with quoting, your script would set s3 to something but then tell Awk to (ignore the variable and) process standard input, which on the command line leaves it reading from your terminal. A fixed script might look like
awk 'END { print }' <<<"$s3"
which passes the variable as standard input to Awk, which prints the last line. The <<<value "here string" syntax is also a Bash extension, and not portable to POSIX sh.
A much simpler way is
command | grep "your filter" | tail -n 1
or directly
command | tail -n 1
You could try this:
echo -e "This is the first line \nThis is the second line" | awk 'END{print}'
Another approach is to process the file from the end and exit after the first match.
tac file | awk '/match/{print; exit}'
You can do it just by adding echo "$s3" | sed '$!d' (note the quotes, which preserve the newlines in the value):
s3=$(awk 'BEGIN{ print "S3 bucket path" }/Executing command\(queryId/{ sub(/.*queryId=[^[:space:]]+: /,""); q=$0 } /s3:\/\//{ print "," $10 }' OFS=',' hive-server2.log)
echo "$s3" | sed '$!d'
It will simply print:
2018-02-21T18:05:34
Hope this will help you.

How to write a bash script that dumps itself out to stdout (for use as a help file)?

Sometimes I want a bash script that's mostly a help file. There are probably better ways to do things, but sometimes I want to just have a file called "awk_help" that I run, and it dumps my awk notes to the terminal.
How can I do this easily?
Another idea: use #!/bin/cat. This will literally answer the title of your question, since the shebang line will be displayed as well.
Turns out it can be done as pretty much a one-liner, thanks to @CharlesDuffy for the suggestions!
Just put the following at the top of the file, and you're done
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER
So for my awk_help example, it'd be:
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER
# Basic form of all awk commands
awk search pattern { program actions }
# advanced awk
awk 'BEGIN {init} search1 {actions} search2 {actions} END { final actions }' file
# awk boolean example for matching "(me OR you) OR (john AND ! doe)"
awk '( /me|you/ ) || (/john/ && ! /doe/ )' /path/to/file
# awk - print # of lines in file
awk 'END {print NR,"coins"}' coins.txt
# Sum up gold ounces in column 2, and find out value at $425/ounce
awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' coins.txt
# Print the last column of each line in a file, using a comma (instead of space) as a field separator:
awk -F ',' '{print $NF}' filename
# Sum the values in the first column and pretty-print the values and then the total:
awk '{s+=$1; print $1} END {print "--------"; print s}' filename
# functions available
length($0) > 72, toupper,tolower
# count the # of times the word PASSED shows up in the file /tmp/out
cat /tmp/out | awk 'BEGIN {X=0} /PASSED/{X+=1; print $1 X}'
# awk regex operators
https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operators.html
I found another solution that works on Mac/Linux and works exactly as one would hope.
Just use the following as your "shebang" line, and it'll output everything from line 2 on down:
test.sh
#!/usr/bin/tail -n+2
hi there
how are you
Running this gives you what you'd expect:
$ ./test.sh
hi there
how are you
And another possible solution: just use less, and that way your file will open in a searchable pager
#!/usr/bin/less
and this way you can grep it for something too, e.g.
$ ./test.sh | grep something

How to execute awk command in shell script

I have an awk command that extracts the 16th column from the 3rd line of a csv file and prints the first 4 characters.
awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}'
This works fine.
But when I execute it from a shell script, I get an error.
#!/bin/ksh
YEAR=awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}'
Error message:
-F,: not found
Use command substitution to assign the output of a command to a variable, as shown below:
YEAR=$(awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}')
You are asking the shell to do:
VAR=value command [arguments...]
which means: launch command, but pass it the VAR=value environment first.
(Ex: LC_ALL=C grep '[0-9]*' /some/file.txt will grep for a number in file.txt, with the LC_ALL variable set to C just for the duration of the grep call.)
So here you ask the shell to launch the -F"," command (ie, -F, once the shell interprets the "," into ,) with arguments 'NR==3.......... and with the variable YEAR set to the value awk for the duration of the command invocation.
Just replace it with :
#!/bin/ksh
YEAR="$(awk -F',' 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,1,4)}')"
(I didn't try it, but I hope it works for you and your sample.csv file.)
(Note that the original used 0 as the starting position to mean character position 1, which works in many awk implementations but not all; most, but not all, assume 1 when you write 0.)
From your description, it looks like you want to extract the year from the 16th field, which might contain leading spaces. You can accomplish it by calling AWK once:
YEAR=$(awk -F, 'NR==3{sub(/^[ \t]*/, "", $16); print substr($16,1,4)}' sample.csv)
Better yet, you don't even have to use awk. Since you are already writing shell script, let's do it all in shell script:
{ read line; read line; read line; } < sample.csv  # Get the third line
IFS=,; set -- $line       # Break the line into comma-separated fields
IFS=" "; set -- ${16}     # Re-split field 16 on whitespace to remove leading spaces
YEAR=${1:0:4}             # Extract the first 4 chars of field 1 (bash/ksh93 substring expansion)
Do this:
year=$(awk -F, 'NR==3{sub(/^[ \t]+/,"",$16); print substr($16,1,4); exit }' sample.csv)

Can I chain multiple commands and make all of them take the same input from stdin?

In bash, is there a way to chain multiple commands, all taking the same input from stdin? That is, one command reads stdin, does some processing, writes the output to a file. The next command in the chain gets the same input as what the first command got. And so on.
For example, consider a large text file to be split into multiple files by filtering the content. Something like this:
cat food_expenses.txt | grep "coffee" > coffee.txt | grep "tea" > tea.txt | grep "honey cake" > cake.txt
This obviously does not work, because the second grep gets the first grep's output, not the original text file. I tried inserting tee's but that does not help. Is there some bash magic that can cause the first grep to send its input to the pipe, not the output?
And by the way, splitting a file was a simple example. Consider splitting (filtering by pattern search) a continuous live text stream coming over a network and writing the output to different named pipes or sockets. I would like to know if there is an easy way to do it using a shell script.
(This question is a cleaned up version of my earlier one , based on responses that pointed out the unclearness)
For this example, you should use awk as semiuseless suggests.
But in general, to have N arbitrary programs read a copy of a single input stream, you can use tee and bash's process substitution operator:
tee <food_expenses.txt \
>(grep "coffee" >coffee.txt) \
>(grep "tea" >tea.txt) \
>(grep "honey cake" >cake.txt)
Note that >(command) is a bash extension.
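Since the question also mentions named pipes: the same fan-out works with mkfifo when the consumers must read from a filesystem path; a sketch (the .pipe names are arbitrary):
mkfifo coffee.pipe tea.pipe
grep "coffee" < coffee.pipe > coffee.txt &
grep "tea" < tea.pipe > tea.txt &
tee coffee.pipe tea.pipe < food_expenses.txt > /dev/null
rm coffee.pipe tea.pipe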
The obvious question is: why do you want to do this within one command?
If you don't want to write a script, and you want to run stuff in parallel, bash supports the concept of subshells, and these can run in parallel. By putting your commands in parentheses and separating them with &, you can run your greps (or whatever) concurrently, e.g.
$ (grep coffee food_expenses.txt > coffee.txt) & (grep tea food_expenses.txt > tea.txt)
Note that in the above your cat may be redundant since grep takes an input file argument.
You can (instead) play around with redirecting output through different streams. You're not limited to stdout/stderr but can assign new streams as required. I can't advise more on this other than direct you to examples here
I like Stephen's idea of using awk instead of grep.
It ain't pretty, but here's a command that uses output redirection to keep all data flowing through stdout:
cat food.txt |
awk '/coffee/ {print $0 > "/dev/stderr"} {print $0}' 2> coffee.txt |
awk '/tea/ {print $0 > "/dev/stderr"} {print $0}' 2> tea.txt
As you can see, it uses awk to send all lines matching 'coffee' to stderr, and all lines regardless of content to stdout. Then stderr is fed to a file, and the process repeats with 'tea'.
If you wanted to filter out content at each step, you might use this:
cat food.txt |
awk '/coffee/ {print $0 > "/dev/stderr"} $0 !~ /coffee/ {print $0}' 2> coffee.txt |
awk '/tea/ {print $0 > "/dev/stderr"} $0 !~ /tea/ {print $0}' 2> tea.txt
You could use awk to split into up to two files:
awk '/Coffee/ { print } /Tea/ { print > "/dev/stderr" }' inputfile > coffee.file.txt 2> tea.file.txt
I am unclear why the filtering needs to be done in different steps. A single awk program can scan all the incoming lines and dispatch the appropriate lines to individual files. This is a very simple dispatch that can feed multiple secondary commands (i.e. persistent processes that monitor the output files for new input, or the files could be sockets that are set up ahead of time and written to by the awk process).
If there is a reason to have every filter see every line, then just remove the "next;" statements, and every filter will see every line.
$ cat split.awk
BEGIN {}
/^coffee/ {
    print $0 >> "/tmp/coffee.txt"
    next
}
/^tea/ {
    print $0 >> "/tmp/tea.txt"
    next
}
{ # default
    print $0 >> "/tmp/other.txt"
}
END {}
$
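You would run it with awk's -f option; a usage sketch:
$ awk -f split.awk food_expenses.txt
$ cat /tmp/coffee.txt /tmp/tea.txt /tmp/other.txt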
Here are two bash scripts without awk. The second one doesn't even use grep!
With grep:
#!/bin/bash
tail -F food_expenses.txt | \
while read line
do
for word in "coffee" "tea" "honey cake"
do
if [[ $line != ${line#*$word*} ]]
then
echo "$line"|grep "$word" >> ${word#* }.txt # use the last word in $word for the filename (i.e. cake.txt for "honey cake")
fi
done
done
Without grep:
#!/bin/bash
tail -F food_expenses.txt | \
while read line
do
for word in "coffee" "tea" "honey cake"
do
if [[ $line != ${line#*$word*} ]] # does the line contain the word?
then
echo "$line" >> ${word#* }.txt # use the last word in $word for the filename (i.e. cake.txt for "honey cake")
fi
done
done;
Edit:
Here's an AWK method:
awk 'BEGIN {
    list = "coffee tea"
    split(list, patterns)
}
{
    for (pattern in patterns) {
        if ($0 ~ patterns[pattern]) {
            print > (patterns[pattern] ".txt")
        }
    }
}' food_expenses.txt
Working with patterns which include spaces remains to be resolved; one possible approach is sketched below.
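One way around the spaces limitation is to split the pattern list on a delimiter that cannot occur in any pattern, such as |; a hedged sketch (note that "honey cake.txt" will then contain a space in its name):
awk 'BEGIN {
    n = split("coffee|tea|honey cake", patterns, "|")
}
{
    for (i = 1; i <= n; i++)
        if ($0 ~ patterns[i])
            print > (patterns[i] ".txt")
}' food_expenses.txt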
You can probably write a simple AWK script to do this in one shot. Can you describe the format of your file a little more?
Is it space/comma separated?
do you have the item descriptions on a specific 'column' where columns are defined by some separator like space, comma or something else?
If you can afford multiple grep runs this will work:
grep coffee food_expenses.txt > coffee.txt
grep tea food_expenses.txt > tea.txt
and so on.
Assuming that your input is not infinite (as in the case of a network stream that you never plan on closing) I might consider using a subshell to put the data into a temp file, and then a series of other subshells to read it. I haven't tested this, but maybe it would look something like this
( cat inputstream > tempfile )
( grep tea tempfile > tea.txt )
( grep coffee tempfile > coffee.txt )
I'm not certain of an elegant solution to the temp file getting too large if your input stream is not bounded in size, however.
