Is it possible to have an awk command within a bash script return values to a bash variable? That is, if my awk script does some arithmetic operations, can I store the answers in variables so they can be accessed in the bash script? If so, how do I distinguish between multiple return variables? Thanks.
No. You can use exit to return an error code, but in general you can't modify the shell environment from a subprocess.
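That single status code can still carry a simple yes/no result. A minimal sketch, assuming a hypothetical data.txt whose first column is numeric:
# exit 1 if the summed first column exceeds 100, else exit 0
if awk '{ tot += $1 } END { exit (tot > 100 ? 1 : 0) }' data.txt; then
    echo "total within limit"
else
    echo "total exceeded limit"
fi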
You can also, of course, print the desired content in awk and put it into variables in bash by using read:
read a b c <<< "$(echo "foo" | awk '{ print $1, $1, $1 }')"
Now $a, $b, and $c are all 'foo'. Note that awk prints the three values on one line so that read can split them, and that you have to use the <<< "$(...)" syntax to get read to work. If you use a pipeline of any sort, a subprocess is created, and the environment in which read creates the variables is lost when the pipeline is done executing.
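To see the pitfall, compare these two forms (a throwaway sketch, assuming default bash without the lastpipe option):
echo "foo" | read a   # read runs in a subshell; $a stays empty afterwards
echo "$a"             # prints nothing

read a <<< "foo"      # read runs in the current shell
echo "$a"             # prints foo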
var=$(awk '{ print $1 }' file)
This sets var to the output of awk (here reading from file; with no file argument, awk reads standard input). From there you can use string functions, or whatever else, to pick apart the value, or have awk print only the part you want.
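If awk computes several values, one approach is to have it print them on a single line and split them with read. A minimal sketch, assuming a hypothetical data.txt with a numeric first column:
# awk prints both results space-separated on one line;
# read splits them into two shell variables
read -r sum cnt <<< "$(awk '{ s += $1; n++ } END { print s, n }' data.txt)"
echo "sum=$sum count=$cnt"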
I know this question is old, but there's another way to do this that's worked really well for me: using an unused file descriptor. We all know stdin (&0), stdout (&1), and stderr (&2), but as long as you redirect it somewhere (i.e., actually use it), there's no reason you can't use fd3 (&3) as well.
The advantage to this method over other answers is that your awk script can still write to stdout like normal, but you also get the result variables in bash.
In your awk script, at the end, do something like this:
END {
    # print the state to fd3
    printf "SUM=%s;COUNT=%s\n", tot, cnt | "cat 1>&3"
}
Then, in your bash script, you can do something like this:
awk -f myscript.awk <mydata.txt 3>myresult.sh
source myresult.sh
echo "SUM=${SUM} COUNT=${COUNT}"
I'm trying to make an awk command which stores an entire config file as variables.
The config file is in the following form (keys never have spaces, but values may):
key=value
key2=value two
And my awk command is:
$(awk -F= '{printf "declare %s=\"%s\"\n", $1, $2}' $file)
Running this without the outer subshell $(...) results in the exact commands that I want being printed, so my question is less about awk, and more about how I can run the output of awk as commands.
The command evaluates to:
declare 'key="value"'
which is somewhat of a problem, since then the double quotes are stored with the value. Even worse is when a space is introduced, which results in:
declare 'key2="value' two"
Of course, I cannot simply strip the quotes, or the multi-word values cause problems. I've tried nearly every solution I could find, such as set -f, eval, and system().
You don't need Awk for this; you can do it with the shell's built-ins. Read the config file properly using input redirection:
#!/bin/bash
while IFS='=' read -r k v; do
    declare "$k"="$v"
done < config_file
and source the file as
$ source script.sh
$ echo "$key"
value
$ echo "$key2"
value two
If source is not available, the POSIX way of doing it is just
. ./script.sh
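If the config file may contain blank lines or comments, a small hedged extension of the same loop (assuming comments start with #) can skip them:
#!/bin/bash
while IFS='=' read -r k v; do
    # skip blank lines and lines whose key starts with # (assumed comments)
    case $k in ''|\#*) continue ;; esac
    declare "$k"="$v"
done < config_file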
I have a file with the commands below:
cat /some/dir/with/files/file1_name.tsv|awk -F "\\t" '{print $21$19$23$15}'
cat /some/dir/with/files/file2_name.tsv|awk -F "\\t" '{print $2$13$3$15}'
cat /some/dir/with/files/file3_name.tsv|awk -F "\\t" '{print $22$19$3$15}'
When I loop through the file to run the commands, I get the error below:
cat file | while read line; do $line; done
cat: invalid option -- 'F'
Try `cat --help' for more information.
You are not executing the commands the way you intend. Since you are reading the file line by line (for unknown reasons), you could feed each line to the interpreter directly, as below:
#!/bin/bash
# ^^^^ for running under 'bash' shell
while IFS= read -r line
do
    printf "%s" "$line" | bash
done <file
But this has the overhead of forking a new process for each line of the file. If the commands in the file are harmless and safe to run in one shot, you can simply do
bash file
and be done with it.
Also, when using awk, just pass the file name directly, as below, to avoid the useless cat:
awk -F "\\t" '{print $21$19$23$15}' file1_name.tsv
You are expecting the pipe (|) symbol to act as you are accustomed to, but it doesn't. To help you understand, try this :
A="ls / | grep e" # Loads a variable with a command with pipe
$A # Does not work
eval "$A" # Works
When a variable is expanded without eval, expansion and word splitting happen after the shell has already parsed redirections and pipes, so your pipe symbol is seen as just a literal character.
Some options you have :
A) Avoid piping, by passing the file name as an argument
awk -F "\\t" '{print $21$19$23$15}' /some/dir/with/files/file1_name.tsv
B) Use eval as shown above; I would suggest you research the potential security implications of this.
C) Put arguments in file and parse it, avoiding the use of eval, something like :
# Assumes arguments separated by spaces
IFS=" " read -r -a arguments;
awk "${arguments[#]-}"
D) Implement the parsing of your data files in Bash instead of awk, and use your configuration file to specify output without the need for expanding anything (e.g. by specifying fields to print separated by spaces).
The first three approaches involve some form of interpreting outside data as code, and that comes with risks if the file used as input cannot be guaranteed safe. Approach C might be considered a bit better in that regard, but since the command being called is awk, an actual program is passed to awk, so an attacker (or careless user) with write access to your file can make your script do anything awk can do.
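For concreteness, here is a minimal sketch of approach D, assuming a hypothetical fields.conf that lists the 1-based field numbers to print, separated by spaces (e.g. "21 19 23 15"):
#!/bin/bash
# read the list of field numbers from the config file (assumed one line)
read -r -a fields < fields.conf

# print the chosen tab-separated fields of each line, concatenated,
# without ever interpreting the config contents as code
while IFS=$'\t' read -r -a cols; do
    out=""
    for f in "${fields[@]}"; do
        out+="${cols[f-1]}"   # bash arrays are 0-based, fields 1-based
    done
    printf '%s\n' "$out"
done < /some/dir/with/files/file1_name.tsv
Caveat: because tab is IFS whitespace, runs of tabs collapse and empty fields are lost; a real TSV parser would need more care.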
I have something like this:
while read line
do command1 $line | awk -v l="$line" '
...
... awk program here doing something ...
...
'
done < inputfile.txt
Now, command1 will exit with one of three possible statuses (0, 1, or 2), and depending on which one occurs, the processing in the awk program will be different.
So the exit code serves as the driver of the logic in the awk program.
My problem is that I don't know how to pass that exit code from command1 to my awk program.
I thought maybe there was a way of passing it as a variable along with the line, something like -v l="$line" -v e="$EXITCODE", but I did not find a way of doing it.
After exiting the while ... done loop, I still need access to the ${PIPESTATUS[0]} variable (the original exit status of command1), as I do some more things afterwards.
ADDITION (LOGIC CHANGED)
while read line
do
    stdout=$(command $line)
    case $? in
        0)
            # need to call awk program
            awk -v l="$line" -v stdout="$stdout" -f horsepower
            ;;
        1)
            # no need to call awk program here..
            ;;
        2)
            # no need to call awk program here..
            ;;
        *)
            # something unimportant
            ;;
    esac
done < inputfile.txt
So as you can see, I changed the logic a bit: the exit-code logic now lives outside the awk program (which I made a separate program in its own file called horsepower), and I only need to call it in case 0, to process the stdout output generated by the previous command.
horsepower has a few lines like:
#! /bin/awk -f
/some regex/ { some action }
and, based on what it finds, it acts appropriately. But how do I now make it act on that stdout?
Don't Use PIPESTATUS Array
You can use the Bash PIPESTATUS array to capture the exit status of the last foreground pipeline. The trick is knowing which array element you need. However, you can't capture the exit status of the current pipeline this way. For example:
$ false | echo "${PIPESTATUS[0]}"
0
The first time you run this, the printed exit status will be 0. Subsequent runs will print 1, because the expansion shows the status of the previous pipeline, not the current one.
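Expanded in the command after the pipeline has completed, PIPESTATUS does report each stage correctly; a throwaway illustration:
$ false | true
$ echo "${PIPESTATUS[0]} ${PIPESTATUS[1]}"
1 0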
Use Separate Commands
You'd be much better off using $? to display the exit status of the previous command. However, that will preclude using a pipeline. You will need to split your commands, perhaps storing your standard output in a variable that you can then feed to the next command. You can also feed the exit status of the last command into awk in the same way.
Since you only posted pseudo-code, consider this example:
$ stdout=$(grep root /etc/passwd); \
awk -v status=$? \
-v stdout="$stdout" \
'BEGIN {printf "Status: %s\nVariable: %s\n", status, stdout}'
Status: 0
Variable: root:x:0:0:root,,,:/root:/bin/bash
Note the semicolon separating the commands. You wouldn't necessarily need the semicolon inside a script, but it allows you to cut and paste the example as a one-liner with separate commands.
The output of your command is stored in stdout, and the exit status is stored in $?. Both are declared as variables to awk, and available within your awk script.
Updated to Take Standard Input from Variable
The OP updated the original question, and now wants standard input to come from the data stored in the stdout variable, which holds the results of the command. This can be done with a here-string. The case statement in the updated question can be modified like so:
while read line; do
stdout=$(command $line)
case $? in
0) awk -v l="$line" -f horsepower <<< "$stdout" ;;
# other cases, as needed
esac
done
In this example, the stdout variable captures standard output from the command stored in line, and the variable is then reused as standard input to the awk script named "horsepower." There are a number of ways this sort of command evaluation can break, but it will work for the OP's use case as currently posted.
I'm using the ls command to list files to be used as input. For each file found, I need to:
1. Perform a system command (importdb) and write to a log file.
2. Write to an error log file if the first character of column 2, line 6 of the log file created in step 1 is not "0".
3. Rename the processed file so it won't get re-processed on the next run.
My script:
#!/bin/sh
ls APCVENMAST_[0-9][0-9][0-9][0-9]_[0-9][0-9] |
while read LINE
do
importdb -a test901 APCVENMAST ${LINE} > importdb${LINE}.log
awk "{if (NR==6 && substr($2,1,1) != "0")
print "ERROR processing ", ${LINE} > importdb${LINE}err.log
}" < importdb${LINE}.log
mv ${LINE} ${LINE}.PROCESSED
done
This is very preliminary code, and I'm new to this, but I can't get past parsing errors like the one below.
The error context is:
{if (NR==6 && >>> substr(, <<< awk The statement cannot be correctly parsed.
Issues:
Never double quote an awk script.
Always quote literal strings.
Pass in shell variables correctly: either with -v, if you need the value in the BEGIN block (awk -v awkvar="$shellvar" 'condition{code}' file), or as a trailing assignment (awk 'condition{code}' awkvar="$shellvar" file).
Always quote shell variables.
The conditional should be outside the block.
There is ambiguity between the precedence of redirection and concatenation, so use parentheses.
So the corrected (syntactical) script:
awk 'NR==6 && substr($2,1,1) != "0" {
    print "ERROR processing ", line > ("importdb" line "err.log")
}' line="${LINE}" "importdb${LINE}.log"
You have many more issues but as I don't know what you are trying to achieve it's difficult to suggest the correct approach...
You shouldn't parse the output of ls.
Awk reads files itself; you don't need to loop over them with shell constructs.
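Putting those points together, a hedged sketch of the whole loop might look like the one below; the importdb invocation is copied from the question, and a shell glob replaces the parsed ls output:
#!/bin/sh
for file in APCVENMAST_[0-9][0-9][0-9][0-9]_[0-9][0-9]; do
    # run the import and capture its log
    importdb -a test901 APCVENMAST "$file" > "importdb${file}.log"
    # flag an error if the first character of column 2 on line 6 is not "0"
    awk 'NR == 6 && substr($2,1,1) != "0" {
        print "ERROR processing", f > ("importdb" f "err.log")
    }' f="$file" "importdb${file}.log"
    # rename so the file is not re-processed on the next run
    mv "$file" "${file}.PROCESSED"
done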
Hi guys, sorry for the awkward title, but the problem is a bit subtle to describe. I want to keep a count (i) in a while loop which reads input from awk, and then print the value of i after the loop. However, i is back to zero after the loop. Below is a simplified version of my program; in reality I also do some string matching inside the loop, so some lines are skipped and i does not increment for them.
I have tried removing awk and using an ordinary while loop instead, and i's value is kept after the loop, so I believe it's not a syntax error.
Any idea is greatly appreciated!
#!/bin/bash
arr=();
i=0;
awk '{print $1}' SOMEFILE | while read var
do
    echo "$var";
    arr[i]=$var;
    i=$((i+1));
    echo $i;
done
echo $i;
Because the while loop is in a pipeline, it runs in a subprocess, and the value of i is local to that subprocess. There are several ways to keep the value: use a named pipe instead of running while in a pipeline, use process substitution, or use an interpolating heredoc. Here's an example of the latter:
while read var; do ... done << EOF
$( awk ... )
EOF
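For completeness, the process substitution variant mentioned above would look like this (a sketch using the awk command from the question):
#!/bin/bash
i=0
while read -r var; do
    echo "$var"
    i=$((i+1))
done < <(awk '{print $1}' SOMEFILE)
echo "$i"   # i keeps its value: the loop ran in the current shell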