Modify a shell variable inside awk block of code - bash

Is there any way to modify a shell variable inside awk block of code?
--------- [shell_awk.sh]---------------------
#!/bin/bash
shell_variable_1=<value A>
shell_variable_2=<value B>
shell_variable_3=<value C>
awk 'function A(X)
{ return X+1 }
{ a=A('$shell_variable_1')
b=A('$shell_variable_2')
c=A('$shell_variable_3')
shell_variable_1=a
shell_variable_2=b
shell_variable_3=c
}' FILE.TXT
--------- [shell_awk.sh]---------------------
This is a very simple example, the real script load a file and make some changes using functions, I need to keep each value before change into a specific variable, so then I can register into MySQL the before and after value.
The after value is received from parameters ($1, $2 and so on).
The value before I already know how to get it from the file.
All is done well, except the shell_variable been set by awk variable. Outside from awk block code is easy to set, but inside, is it possible?

No program -- in awk, shell, or any other language -- can directly modify a parent process's memory. That includes variables. However, of course, your awk can write contents to stdout, and the parent shell can read that content and modify its own variables itself.
Here's an example of awk that writes key/value pairs out to be read by bash. It's not perfect -- read the caveats below.
#!/bin/bash
shell_variable_1=10
shell_variable_2=20
shell_variable_3=30
run_awk() {
awk -v shell_variable_1="$shell_variable_1" \
-v shell_variable_2="$shell_variable_2" \
-v shell_variable_3="$shell_variable_3" '
function A(X) { return X+1 }
{ a=A(shell_variable_1)
b=A(shell_variable_2)
c=A(shell_variable_3) }
END {
print "shell_variable_1=" a
print "shell_variable_2=" b
print "shell_variable_3=" c
}' <<<""
}
while IFS="=" read -r key value; do
printf -v "$key" '%s' "$value"
done < <(run_awk)
for var in shell_variable_{1,2,3}; do
printf 'New value for %s is %s\n' "$var" "${!var}"
done
Advantages
Doesn't use eval. Content such as $(rm -rf ~) in the output from awk won't be executed by your shell.
Disadvantages
Can't handle variable contents with newlines. (You could fix this by NUL-delimiting output from your awk script, and adding -d '' to the read command).
A hostile awk script could modify PATH, LD_LIBRARY_PATH, or other security-sensitive variables. (You could fix this by reading variables into an associative array, rather than the global namespace, or by enforcing a prefix on their names).
The code above uses several ksh extensions also available in bash; however, it will not run with POSIX sh. Thus, be sure not to run this via sh scriptname (which only guarantees POSIX functionality).

Related

Bash script does nothing when I run it, seems to keep waiting

I've written my first script, one in which I want to know if 2 files have the same values in a specific column.
Both files are WEKA machine-learning prediction outputs for different algorithms, hence they have to be in the same format, but the prediction column would be different.
Here's the code I've written based on the tutorial presented in https://linuxconfig.org/bash-scripting-tutorial-for-beginners:
#!/bin/bash
lineasdel1=$(wc -l $1 | awk '{print $1}')
lineasdel2=$(wc -l $2 | awk '{print $1}')
if [ "$lineasdel1" != "$lineasdel2" ]; then
echo "Files $1 and $2 have different number of lines, unable to perform"
exit 1
fi
function quitalineasraras {
awk '$1!="==="&&NF>0'
}
function acomodo {
awk '{gsub(/^ +| +$/, ""); gsub(/ +0/, " W 0"); gsub(/ +1$/, " W 1"); gsub(/ +/, "\t") gsub(/\+\tW/, "+"); print}'
}
function procesodel1 {
quitalineasraras "$1" | acomodo
}
function procesodel2 {
quitalineasraras "$2" | acomodo
}
el1procesado=$(procesodel1)
el2procesado=$(procesodel2)
function pegar {
paste <(echo "$el1procesado") <(echo "$el2procesado")
}
function contarintersec {
awk 'BEGIN {FS="\t"} $3==$8 {n++} END {print n}'
}
unido=$(pegar)
interseccion=$(contarintersec $unido)
echo "Estos 2 archivos tienen $interseccion coincidencias."
I ran all individual codes of all functions in the terminal and verified they work successfully (I'm using Linux Mint 19.2). Script's permissions also have been changed to make it executable. Paste command also is supposed to work with that variable syntax.
But when I run it via:
./script.sh file1 file2
if both files have the same number of lines, and I press enter, no output is obtained; instead, the terminal opens an empty line with cursor waiting for something. In order to write another command, I've got to press CTRL+C.
If both files have different number of lines the error message prints successfully, so I think the problem has something to do with the functions, with the fact that awk has different syntax for some chores, or with turning the output of functions into variables.
I know that I'm missing something, but can't come up with what could be.
Any help will be appreciated.
what could be.
function quitalineasraras {
awk '$1!="==="&&NF>0'
}
function procesodel1 {
quitalineasraras "$1" | acomodo
}
el1procesado=$(procesodel1)
The positional variables $1 are set for each function separately. The "$1" inside procesodel1 expands to empty. The quitalineasraras is passed one empty argument "".
The awk inside quitalineasraras is passed only the script without the filename, so it reads the input for standard input, ie. it waits for the input on standard input.
The awk inside quitalineasraras without any file arguments makes your script seem to wait.

bash script to modify and extract information

I am creating a bash script to modify and summarize information with grep and sed. But it gets stuck.
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
#Extract lines starting with ">#HWI"
ONLY=`grep -v ^\>#HWI`
#replaces A and G with R in lines
ONLYR=`sed -e s/A/R/g -e s/G/R/g $ONLY`
grep R $ONLYR | wc -l
The correct way to write a shell script to do what you seem to be trying to do is:
awk '
!/^>#HWI/ {
gsub(/[AG]/,"R")
if (/R/) {
++cnt
}
END { print cnt+0 }
' "$#"
Just put that in the file myscript.sh and execute it as you do today.
To be clear - the bulk of the above code is an awk script, the shell script part is the first and last lines where the shell just calls awk and passes it the input file names.
If you WANT to have intermediate variables then you can create/print them with:
awk '
!/^>#HWI/ {
only = $0
onlyR = only
gsub(/[AG]/,"R",onlyR)
print "only:", only
print "onlyR:", onlyR
if (/R/) {
++cnt
}
END { print cnt+0 }
' "$#"
The above will work robustly, portably, and efficiently on all UNIX systems.
First of all, and as #fedorqui commented - you're not providing grep with a source of input, against which it will perform line matching.
Second, there are some problems in your script, which will result in unwanted behavior in the future, when you decide to manipulate some data:
Store matching lines in an array, or a file from which you'll later read values. The variable ONLY is not the right data structure for the task.
By convention, environment variables (PATH, EDITOR, SHELL, ...) and internal shell variables (BASH_VERSION, RANDOM, ...) are fully capitalized. All other variable names should be lowercase. Since
variable names are case-sensitive, this convention avoids accidentally overriding environmental and internal variables.
Here's a better version of your script, considering these points, but with an open question regarding what you were trying to do in the last line : grep R $ONLYR | wc -l :
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
input_file=$1
# Read lines not matching the provided regex, from $input_file
mapfile -t only < <(grep -v '^\>#HWI' "$input_file")
#replaces A and G with R in lines
for((i=0;i<${#only[#]};i++)); do
only[i]="${only[i]//[AG]/R}"
done
# DEBUG
printf '%s\n' "Here are the lines, after relpace:"
printf '%s\n' "${only[#]}"
# I'm not sure what you were trying to do here. Am I gueesing right that you wanted
# to count the number of R's in ALL lines ?
# grep R $ONLYR | wc -l

how to find the position of a string in a file in unix shell script

Can you please help me solve this puzzle? I am trying to print the location of a string (i.e., line #) in a file, first to the std output, and then capture that value in a variable to be used later. The string is “my string”, the file name is “myFile” which is defined as follows:
this is first line
this is second line
this is my string on the third line
this is fourth line
the end
Now, when I use this command directly at the command prompt:
% awk ‘s=index($0, “my string”) { print “line=” NR, “position= ” s}’ myFile
I get exactly the result I want:
% line= 3, position= 9
My question is: if I define a variable VAR=”my string”, why can’t I get the same result when I do this:
% awk ‘s=index($0, $VAR) { print “line=” NR, “position= ” s}’ myFile
It just won’t work!! I even tried putting the $VAR in quotation marks, to no avail? I tried using VAR (without the $ sign), no luck. I tried everything I could possibly think of ... Am I missing something?
awk variables are not the same as shell variables. You need to define them with the -v flag
For example:
$ awk -v var="..." '$0~var{print NR}' file
will print the line number(s) of pattern matches. Or for your case with the index
$ awk -v var="$Var" 'p=index($0,var){print NR,p}' file
using all uppercase may not be good convention since you may accidentally overwrite other variables.
to capture the output into a shell variable
$ info=$(awk ...)
for multi line output assignment to shell array, you can do
$ values=( $(awk ...) ); echo ${values[0]}
however, if the output contains more than one field, it will be assigned it's own array index. You can change it with setting the IFS variable, such as
$ IFS=$(echo -en "\n\b"); values=( $(awk ...) )
which will capture the complete lines as the array values.

set multiple variables from one awk command?

This is a very common script:
#!/bin/bash
teststr="col1 col2"
var1=`echo ${teststr} | awk '{print $1}'`
var2=`echo ${teststr} | awk '{print $2}'`
echo var1=${var1}
echo var2=${var2}
However I dont like this, especially when there are more fields to parse.
I guess there should be a better way like:
(var1,var2)=`echo ${teststr} | awk '{print $1 $2}'
(in my imagination)
Is that so?
Thanks for help to improve effeciency and save some CPU power.
This might work for you:
var=(col0 col1 col2)
echo "${var[1]}"
col1
Yes, you can, but the best practice is to use the awk way to pass variables to awk.
Example using shell script variables
awk -v awkVar1="$scriptVar1" -v awkVar2="$scriptVar2" '<your awk code>'
Example using environmental variables
awk -v awkVar1=ENVIRON["ENV_VAR1"] -v awkVar2=ENVIRON["ENV_VAR2"] '<your awk code>'
It's possible to use script and environmental variables at the same time
awk -v awkVar1=ENVIRON["ENV_VAR1"] -v awkVar2="$scriptVar2" '<your awk code>'
You may find bash tricks to circumvent the awk way to do it, but it's not safe.
Explanation and more examples
Awk works this way, because it's a programming language by itself and has it's own way to use variables 'inside' awk statements.
By 'inside' i mean the part between the single quotes.
Let's see an example, where we turn off DHCP in a config file, all done using variables in a shell script. I'm going to explain the last line of code.
The script isn't optimal, it's main purpose is to use script variables. Explaining how the script does its job is out of scope of this answer, the focus is on explaining the use of variables.
#!/bin/bash
# set some variables
# set path to the config file to edit
CONFIG_FILE=/etc/netplan/01-netcfg.yaml
# find the line number of the line to change using awk and assign it to a variable
DHCP_LINE=$(awk '/dhcp4: yes/{print FNR}' $CONFIG_FILE)
# get the number of spaces used for identation using awk and assign it to a variable
SPACES=$(awk -v awkDHCP_LINE="$DHCP_LINE" 'FNR==awkDHCP_LINE {print match($0,/[^ ]|$/)-1}' $CONFIG_FILE)
# find DHCP setting and turn it off if needed
awk -v awkDHCP_LINE="$DHCP_LINE" -v awkSPACES="$SPACES" 'FNR==awkDHCP_LINE {sub("dhcp4: yes", "dhcp4: no")}' $CONFIG_FILE
Let's break this last line up to pieces for explanation.
awk -v awkDHCP_LINE="$DHCP_LINE" -v awkSPACES="$SPACES"
This part above assigns the value of DHCP_LINE script variable to the awkDHCP_LINE awk variable and the the value of SPACES script variable to the awkSPACESawk variable.
Please note, that the SPACES variable is passed to awk for the sake of showing how to pass multiple variables only; the awk command doesn't process it.
'FNR==awkDHCP_LINE {sub("dhcp4: yes", "dhcp4: no")}'
This one above is the 'inside' part of awk where the variable(s) passed to awk can be used.
$CONFIG_FILE
This part is outside awk, a generic script variable is used to specify the file that should be processed.
I hope this clears things a bit :)
Note: if you have lots of variables to pass, the solution provided by #potong may prove a better approach depending on your use case.
Bash has Array Support, We just need to supply values dynamically :)
function test_set_array_from_awk(){
# Note : -a is required as declaring array
let -a myArr;
# Hard Coded Valeus
# myArr=( "Foo" "Bar" "other" );
# echo "${myArr[1]}" # Print Bar
# Dynamic values
myArr=( $(echo "" | awk '{print "Foo"; print "Bar"; print "Fooo-And-Bar"; }') );
# Value #index 0
echo "${myArr[0]}" # Print Foo
# Value #index 1
echo "${myArr[1]}" # Print Bar
# Array Length
echo ${#myArr[#]} # Print 3 as array length
# Safe Reading with Default value
echo "${myArr[10]-"Some-Default-Value"}" # Print Some-Default-Value
echo "${myArr[10]-0}" # Print 0
echo "${myArr[10]-''}" # Print ''
echo "${myArr[10]-}" # Print nothing
# With Dynamic Index
local n=2
echo "${myArr["${n}"]-}" # Print Fooo-And-Bar
}
# calling test function
test_set_array_from_awk
Bash Array Documentation : http://tldp.org/LDP/abs/html/arrays.html
You can also use shell set builtin to place whitespace seperated (or more accurately, IFS seperated) into the variables $1, $2 and so on:
#!/bin/bash
teststr="col1 col2"
set -- $teststr
echo "$1" # col1
echo "$2" # col2

Reading java .properties file from bash

I am thinking of using sed for reading .properties file, but was wondering if there is a smarter way to do that from bash script?
This would probably be the easiest way: grep + cut
# Usage: get_property FILE KEY
function get_property
{
grep "^$2=" "$1" | cut -d'=' -f2
}
The solutions mentioned above will work for the basics. I don't think they cover multi-line values though. Here is an awk program that will parse Java properties from stdin and produce shell environment variables to stdout:
BEGIN {
FS="=";
print "# BEGIN";
n="";
v="";
c=0; # Not a line continuation.
}
/^\#/ { # The line is a comment. Breaks line continuation.
c=0;
next;
}
/\\$/ && (c==0) && (NF>=2) { # Name value pair with a line continuation...
e=index($0,"=");
n=substr($0,1,e-1);
v=substr($0,e+1,length($0) - e - 1); # Trim off the backslash.
c=1; # Line continuation mode.
next;
}
/^[^\\]+\\$/ && (c==1) { # Line continuation. Accumulate the value.
v= "" v substr($0,1,length($0)-1);
next;
}
((c==1) || (NF>=2)) && !/^[^\\]+\\$/ { # End of line continuation, or a single line name/value pair
if (c==0) { # Single line name/value pair
e=index($0,"=");
n=substr($0,1,e-1);
v=substr($0,e+1,length($0) - e);
} else { # Line continuation mode - last line of the value.
c=0; # Turn off line continuation mode.
v= "" v $0;
}
# Make sure the name is a legal shell variable name
gsub(/[^A-Za-z0-9_]/,"_",n);
# Remove newlines from the value.
gsub(/[\n\r]/,"",v);
print n "=\"" v "\"";
n = "";
v = "";
}
END {
print "# END";
}
As you can see, multi-line values make things more complex. To see the values of the properties in shell, just source in the output:
cat myproperties.properties | awk -f readproperties.awk > temp.sh
source temp.sh
The variables will have '_' in the place of '.', so the property some.property will be some_property in shell.
If you have ANT properties files that have property interpolation (e.g. '${foo.bar}') then I recommend using Groovy with AntBuilder.
Here is my wiki page on this very topic.
I wrote a script to solve the problem and put it on my github.
See properties-parser
One option is to write a simple Java program to do it for you - then run the Java program in your script. That might seem silly if you're just reading properties from a single properties file. However, it becomes very useful when you're trying to get a configuration value from something like a Commons Configuration CompositeConfiguration backed by properties files. For a time, we went the route of implementing what we needed in our shell scripts to get the same behavior we were getting from CompositeConfiguration. Then we wisened up and realized we should just let CompositeConfiguration do the work for us! I don't expect this to be a popular answer, but hopefully you find it useful.
If you want to use sed to parse -any- .properties file, you may end up with a quite complex solution, since the format allows line breaks, unquoted strings, unicode, etc: http://en.wikipedia.org/wiki/.properties
One possible workaround would using java itself to preprocess the .properties file into something bash-friendly, then source it. E.g.:
.properties file:
line_a : "ABC"
line_b = Line\
With\
Breaks!
line_c = I'm unquoted :(
would be turned into:
line_a="ABC"
line_b=`echo -e "Line\nWith\nBreaks!"`
line_c="I'm unquoted :("
Of course, that would yield worse performance, but the implementation would be simpler/clearer.
In Perl:
while(<STDIN>) {
($prop,$val)=split(/[=: ]/, $_, 2);
# and do stuff for each prop/val
}
Not tested, and should be more tolerant of leading/trailing spaces, comments etc., but you get the idea. Whether you use Perl (or another language) over sed is really dependent upon what you want to do with the properties once you've parsed them out of the file.
Note that (as highlighted in the comments) Java properties files can have multiple forms of delimiters (although I've not seen anything used in practice other than colons). Hence the split uses a choice of characters to split upon.
Ultimately, you may be better off using the Config::Properties module in Perl, which is built to solve this specific problem.
I have some shell scripts that need to look up some .properties and use them as arguments to programs I didn't write. The heart of the script is a line like this:
dbUrlFile=$(grep database.url.file etc/zocalo.conf | sed -e "s/.*: //" -e "s/#.*//")
Effectively, that's grep for the key and filter out the stuff before the colon and after any hash.
if you want to use "shell", the best tool to parse files and have proper programming control is (g)awk. Use sed only simple substitution.
I have sometimes just sourced the properties file into the bash script. This will lead to environment variables being set in the script with the names and contents from the file. Maybe that is enough for you, too. If you have to do some "real" parsing, this is not the way to go, of course.
Hmm, I just run into the same problem today. This is poor man's solution, admittedly more straightforward than clever;)
decl=`ruby -ne 'puts chomp.sub(/=(.*)/,%q{="\1";}).gsub(".","_")' my.properties`
eval $decl
then, a property 'my.java.prop' can be accessed as $my_java_prop.
This can be done with sed or whatever, but I finally went with ruby for its 'irb' which was handy for experimenting.
It's quite limited (dots should be replaced only before '=',no comment handling), but could be a starting point.
#Daniel, I tried to source it, but Bash didn't like dots in variable names.
I have had some success with
PROPERTIES_FILE=project.properties
function source_property {
local name=$1
eval "$name=\"$(sed -n '/^'"$name"'=/,/^[A-Z]\+_*[A-Z]*=/p' $PROPERTIES_FILE|sed -e 's/^'"$name"'=//g' -e 's/"/\\"/g'|head -n -1)\""
}
source_property 'SOME_PROPERTY'
This is a solution that properly parses quotes and terminates at a space when not given quotes. It is safe: no eval is used.
I use this code in my .bashrc and .zshrc for importing variables from shell scripts:
# Usage: _getvar VARIABLE_NAME [sourcefile...]
# Echos the value that would be assigned to VARIABLE_NAME
_getvar() {
local VAR="$1"
shift
awk -v Q="'" -v QQ='"' -v VAR="$VAR" '
function loc(text) { return index($0, text) }
function unquote(d) { $0 = substr($0, eq+2) d; print substr($0, 1, loc(d)-1) }
{ sub(/^[ \t]+/, ""); eq = loc("=") }
substr($0, 1, eq-1) != VAR { next } # assignment is not for VAR: skip
loc("=" QQ) == eq { unquote(QQ); exit }
loc("=" Q) == eq { unquote( Q); exit }
{ print substr($1, eq + 1); exit }
' "$#"
}
This saves the desired variable name and then shifts the argument array so the rest can be passed as files to awk.
Because it's so hard to call shell variables and refer to quote characters inside awk, I'm defining them as awk variables on the command line. Q is a single quote (apostrophe) character, QQ is a double quote, and VAR is that first argument we saved earlier.
For further convenience, there are two helper functions. The first returns the location of the given text in the current line, and the second prints the content between the first two quotes in the line using quote character d (for "delimiter"). There's a stray d concatenated to the first substr as a safety against multi-line strings (see "Caveats" below).
While I wrote the code for POSIX shell syntax parsing, that appears to only differ from your format by whether there is white space around the asignment. You can add that functionality to the above code by adding sub(/[ \t]*=[ \t]*/, "="); before the sub(…) on awk's line 4 (note: line 1 is blank).
The fourth line strips off leading white space and saves the location of the first equals sign. Please verify that your awk supports \t as tab, this is not guaranteed on ancient UNIX systems.
The substr line compares the text before the equals sign to VAR. If that doesn't match, the line is assigning a different variable, so we skip it and move to the next line.
Now we know we've got the requested variable assignment, so it's just a matter of unraveling the quotes. We do this by searching for the first location of =" (line 6) or =' (line 7) or no quotes (line 8). Each of those lines prints the assigned value.
Caveats: If there is an escaped quote character, we'll return a value truncated to it. Detecting this is a bit nontrivial and I decided not to implement it. There's also a problem of multi-line quotes, which get truncated at the first line break (this is the purpose of the "stray d" mentioned above). Most solutions on this page suffer from these issues.
In order to let Java do the tricky parsing, here's a solution using jrunscript to print the keys and values in a bash read-friendy (key, tab character, value, null character) way:
#!/usr/bin/env bash
jrunscript -e '
p = new java.util.Properties();
p.load(java.lang.System.in);
p.forEach(function(k,v) { out.format("%s\t%s\000", k, v); });
' < /tmp/test.properties \
| while IFS=$'\t' read -d $'\0' -r key value; do
key=${key//./_}
printf -v "$key" %s "$value"
printf '=> %s = "%s"\n' "$key" "$value"
done
I found printf -v in this answer by #david-foerster.
To quote jrunscript: Warning: Nashorn engine is planned to be removed from a future JDK release

Resources