regular expression and special variable in Ruby - ruby

When using regular expressions in Ruby, what is the difference between
$1
and
#{$1}
?
NOTE:
markup =~ /(\d+)/
#a = $1
s = "<div> ... '#{$1}' ... </div>"
my_function(par_1,#{$1},par_3)
NOTE 2:
I try again ...
regular expression: /(\d+)/
string: 123
The value of $1 is 123, correct ?
If I want to pass the value of the $1 variable to a function, shall I write
my_function(par_1,#{$1},par_3)
or
my_function(par_1,$1,par_3)
If I want to pass the address of the $1 variable to a function, shall I write
my_function(par_1,#{$1},par_3)
or
my_function(par_1,$1,par_3)
Last question: any reference where I could learn more ?

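$1 is a global variable that holds the text matched by the first (numbered) capture group of the most recent successful match. With the regular expression /(\d+)/ and the string 123, $1 is "123".
#{$1} is not a different variable: #{...} is string interpolation, so inside a double-quoted string it inserts the value of $1 as text. Outside a string, # starts a comment, so my_function(par_1,#{$1},par_3) does not pass $1 at all; write my_function(par_1,$1,par_3). Ruby has no address-of operator, so there is no separate syntax for passing "the address" of $1.
Named capture groups are a separate feature: with /(?<named>\d+)/ you would read the match as $~[:named], or m[:named] if m holds the MatchData.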

Related

Extract value for a key in a key/pair string

I have key value pairs in a string like this:
key1 = "value1"
key2 = "value2"
key3 = "value3"
In a bash script, I need to extract the value of one of the keys like for key2, I should get value2, not in quote.
My bash script needs to work in both Redhat and Ubuntu Linux hosts.
What would be the easiest and most reliable way of doing this?
I tried something like this simplified script:
pattern='key2\s*=\s*\"(.*?)\".*$'
if [[ "$content" =~ $pattern ]]
then
key2="${BASH_REMATCH[1]}"
echo "key2: $key2"
else
echo 'not found'
fi
But it does not work consistently.
Any better/easier/more reliable way of doing this?
To separate the key and value from your $content variable, you can use:
[[ $content =~ (^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$ ]]
That will properly populate the BASH_REMATCH array with both values where your key is in BASH_REMATCH[1] and the value in BASH_REMATCH[2].
Explanation
In bash, [[ ... ]] treats what appears on the right side of =~ as a POSIX extended regular expression and matches it as described in man 3 regex. See man 1 bash under the section heading for [[ expression ]] (4th paragraph). Sub-expressions in parentheses (..) are saved in the array variable BASH_REMATCH, with BASH_REMATCH[0] containing the portion of the string (your $content) that matched the whole regex, and each remaining element containing a sub-expression enclosed in (..), in the order the parentheses appear in the regex.
The Regular Expression (^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$ is explained as:
(^[^ ]+) - '^' anchored at the beginning of the line, [^ ]+ match one or more characters that are not a space. Since this sub-expression is enclosed in (..) it will be saved as BASH_REMATCH[1], followed by;
[[:blank:]]* - zero or more blank characters (spaces or tabs), followed by;
= - an equal sign, followed by;
[[:blank:]]* - zero or more blank characters (spaces or tabs), followed by;
[[:punct:]] - a punctuation character (matching the '"', which avoids caveats associated with using quotes within the regex), followed by the sub-expression;
(.*) - zero or more characters (the rest of the characters), and since it is a sub-expression in (..) the characters will be stored in BASH_REMATCH[2], followed by;
[[:punct:]] - a punctuation character (matching the '"' ... ditto), at the;
$ - end of line anchor.
So if your input lines have the form key = "value", the regex splits the key and the value into the BASH_REMATCH array as you wanted.
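As a quick check of the regex explained above, here is a minimal sketch run against one sample line. The variable names re, key and value are only for illustration, and the sample line is made up.

```bash
# Hypothetical sample line in the key = "value" form
content='key2 = "value2"'

# Keeping the regex in a variable avoids quoting problems on the right of =~
re='(^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$'

if [[ $content =~ $re ]]; then
  key=${BASH_REMATCH[1]}     # first parenthesised sub-expression
  value=${BASH_REMATCH[2]}   # second parenthesised sub-expression
  printf 'key=%s value=%s\n' "$key" "$value"
fi
```

Note that [[:punct:]] accepts any punctuation character, not only a double quote, so this is looser than matching the quote explicitly.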
Bash's =~ operator uses POSIX extended regular expressions (ERE), not PCRE, so you cannot rely on \s or the non-greedy .*?.
As an alternative, please try:
while IFS= read -r content; do
# pattern='key2\s*=\s*\"(.*)\".*$'
pattern='key2[[:blank:]]*=[[:blank:]]*"([^"]*)"'
if [[ $content =~ $pattern ]]
then
key2="${BASH_REMATCH[1]}"
echo "key2: $key2"
(( found++ ))
fi
done < input-file.txt
if (( found == 0 )); then
echo "not found"
fi
When you start talking about key-value pairs, it is best to use an associative array:
declare -A map
Now looking at your lines, they look like key = "value" where we assume that:
value is always encapsulated by double quotes, but also could contain a quote
an unknown number of white spaces is before and/or after the equal sign.
So assuming we have a variable line which contains key = "value", the following operations will extract that value:
key="${line%%=*}"; key="${key// /}"
value="${line#*=}"; value="${value#*\"}"; value="${value%\"*}"
# Alternatively, for the key only: IFS=$' \t=' read -r key _ <<<"$line"
This allows us now to have something like:
declare -A map
while IFS= read -r line; do
key="${line%%=*}"; key="${key// /}"
value="${line#*=}"; value="${value#*\"}"; value="${value%\"*}"
map["$key"]="$value"
done <inputfile
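A self-contained version of the loop can be sketched as follows, assuming bash 4 or later (for declare -A). The here-document stands in for inputfile, the quote is written as \" inside the expansions, and the guard that skips lines without an equal sign is an addition for illustration.

```bash
# Requires bash 4+ for associative arrays
declare -A map

while IFS= read -r line; do
  [[ $line == *=* ]] || continue        # skip lines without an equal sign
  key="${line%%=*}"; key="${key// /}"   # text before the first '=', spaces removed
  value="${line#*=}"                    # text after the first '='
  value="${value#*\"}"                  # drop everything up to the opening quote
  value="${value%\"*}"                  # drop the closing quote and anything after it
  map["$key"]="$value"
done <<'EOF'
key1 = "value1"
key2 = "value2"
key3 = "value3"
EOF

echo "${map[key2]}"
```

Because the prefix removal is the shortest match and the suffix removal is also the shortest match, a quote inside the value is left untouched.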
With awk:
awk -v key="key2" '$1 == key { gsub("\"","",$3);print $3 }' <<< "$string"
This reads the contents of the variable called string and passes the required key in as an awk variable called key. If the first whitespace-delimited field equals the key, it removes the quotes from the third field with the gsub function and prints it.
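The one-liner above relies on the value being the third whitespace-separated field, so a value containing spaces would be cut short. A possible variation, assuming values never contain embedded double quotes, splits on the quote character instead. The sample data and the names k and result are made up for illustration.

```bash
# Hypothetical input whose values contain spaces
string='key1 = "value one"
key2 = "value two"'

result=$(awk -F'"' -v key="key2" '
  { k = $1; sub(/[ \t]*=[ \t]*$/, "", k) }   # strip the trailing = and blanks from the key
  k == key { print $2 }                        # field 2 is the text between the quotes
' <<< "$string")

printf '%s\n' "$result"
```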
Ok, after spending so many hours, this is how I solved the problem:
If you don't know where your script will run, or what type of file (Windows/Mac/Linux) you are reading:
Avoid non-greedy matching in Linux bash instead of tweaking different switches.
Don't trust the end-of-line anchor $ when you might get data from Windows or Mac.
This post solved my problem: Non greedy text matching and extrapolating in bash
This pattern works for me in many Linux environments and with all types of line endings:
pattern='key2\s*=\s*"([^"]*)"'
The value is in BASH_REMATCH[1]
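To illustrate the point about line endings: a carriage return left over from a Windows file sits between the closing quote and the end of the string, so a pattern anchored with $ fails while the unanchored one still matches. This sketch uses [[:blank:]] instead of \s to stay within POSIX ERE; the sample line and variable names are made up.

```bash
# A line as it would arrive from a file with Windows (CRLF) line endings
content=$'key2 = "value2"\r'

anchored='key2[[:blank:]]*=[[:blank:]]*"([^"]*)"$'
unanchored='key2[[:blank:]]*=[[:blank:]]*"([^"]*)"'

# The trailing carriage return prevents $ from matching right after the quote
if [[ $content =~ $anchored ]]; then anchored_hit=yes; else anchored_hit=no; fi

# Without the anchor, the negated class [^"]* stops at the closing quote
if [[ $content =~ $unanchored ]]; then value=${BASH_REMATCH[1]}; fi

echo "anchored: $anchored_hit, value: $value"
```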

sed to replace list of subnets at nth position in a file

Adding "ip","ip","ip" to each IP address
tf_alist=$(for i in $alist; do echo -n "\"$i\"",; done | sed 's/,$//')
echo $tf_alist
gives:
"192.168.0.216/29","192.168.92.72/30","192.168.92.70/31"
Now, I've variable.tf as shown below:
variable "allowlisted_cidrs_prod" {
type = list(string)
description = "ip ranges - allowlisted - prod instances"
default = ["192.168.241.88/32", "192.168.128.222/32", "192.168.231.150/32"]
}
variable "allowlisted_cidrs_test" {
type = list(string)
description = "ip ranges - allowlisted - test instances"
default = ["192.168.58.61/32", "192.168.3.224/32"]
}
variable "elb_cipher" {
type = string
description = "ELB cipher"
default = "ELBSecurityPolicy-TLS-1-2-2017-01"
}
At variable "allowlisted_cidrs_prod" I want to replace the below list string:
From:
default = ["192.168.241.88/32", "192.168.128.222/32", "192.168.231.150/32"]
To (as per $tf_alist):
default = ["192.168.0.216/29","192.168.92.72/30","192.168.92.70/31"]
can you suggest a way to do it using sed? TYA!
To replace the string, I'm trying to use the below expression to capture the existing IP list:
old_ip_list=$(cat variable.tf | sed -n '/variable "allowlisted_cidrs_prod"/,$ {/^[[:blank:]]*default[[:blank:]]*=[[:blank:]]*\(.*\).*/ { s//\1/p; q; }}')
gives:
["192.168.241.88/32", "192.168.128.222/32", "192.168.231.150/32"]
Ref: https://sed.js.org/?gist=bdeddb0ed01bdc8f96b3a05952909cd7
Removes "[]" from the given output to match with $tf_alist.
echo -e "\n $old_ip_list" | gsed 's/.*\[//;s/\].*//;'
The best way to edit a structured format like this is to use a tool for that particular structured format. If this is code in a programming language, probably change the code to read the values from an external resource into a variable, and don't modify the code at all if you can avoid it.
Having said that, here is a quick refactoring.
tf_alist=$(printf ',"%s"' $alist | sed 's/^,//')
awk -v new="$tf_alist" '/^variable "/ { v=$2 }
v=="\"allowlisted_cidrs_prod\"" && $1 == "default" { sub(/\[.*/, "[" new "]") }
1' variable.tf >new_variable.tf
This simply keeps track of which variable it has last seen, and only replaces the default line when it's the one we are trying to target. The final 1 is a common Awk idiom to unconditionally print all lines. And of course sub is Awk's equivalent of the s/// command in sed, and "[" new "]" is simple string concatenation; we put square brackets around the value of the variable new (which is defined in the -v option - it contains the value of the shell variable $tf_alist).
Like all ad-hoc parsers, this is fairly brittle, but it works with the example you provided (demo: https://ideone.com/7kDNCU). If this is for more than a one-off, seriously think about a different approach, rather than spending more time on making this more robust (or, heavens, reimplementing it in read-only sed).
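A self-contained way to try the awk approach on a throwaway copy is sketched below. The temporary file and the shortened sample are made up for illustration; only the default line of allowlisted_cidrs_prod should change.

```bash
alist='192.168.0.216/29 192.168.92.72/30 192.168.92.70/31'
# $alist is deliberately unquoted so that each address becomes one printf argument
tf_alist=$(printf ',"%s"' $alist | sed 's/^,//')

tmp=$(mktemp)   # scratch file standing in for variable.tf
cat > "$tmp" <<'EOF'
variable "allowlisted_cidrs_prod" {
  type = list(string)
  default = ["192.168.241.88/32", "192.168.128.222/32"]
}
variable "allowlisted_cidrs_test" {
  type = list(string)
  default = ["192.168.58.61/32", "192.168.3.224/32"]
}
EOF

result=$(awk -v new="$tf_alist" '
  /^variable "/ { v = $2 }    # remember the most recent variable name
  v == "\"allowlisted_cidrs_prod\"" && $1 == "default" { sub(/\[.*/, "[" new "]") }
  1' "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```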

return array from perl to bash

I'm trying to get back an array from perl to bash.
My perl script has an array and then I use return(@arr)
from my bash script I use
VAR = `perl....
when I echo VAR
I get the array as 1 long string with all the array vars connected with no spaces.
Thanks
In the shell (and in Perl), backticks (``) capture the output of a command. However, Perl's return is normally for returning variables from subroutines - it does not produce output, so you probably want print instead. Also, in bash, array variables are declared with parentheses. So this works for me:
$ ARRAY=(`perl -wMstrict -le 'my @array = qw/foo bar baz/; print "@array"'`); \
echo "<${ARRAY[*]}> 0=${ARRAY[0]} 1=${ARRAY[1]} 2=${ARRAY[2]}"
<foo bar baz> 0=foo 1=bar 2=baz
In Perl, interpolating an array into a string (like "@array") will join the array with the special variable $" in between elements; that variable defaults to a single space. If you simply print @array, then the array elements will be joined by the variable $,, which is undef by default, meaning no space between the elements. This probably explains the behavior you mentioned ("the array vars connected with no spaces").
Note that the above will not work the way you expect if the elements of the array contain whitespace, because bash will split them into separate array elements. If your array does contain whitespace, then please provide an MCVE with sample data so we can perhaps make an alternative suggestion of how to return that back to bash. For example:
( # subshell so IFS is only affected locally
IFS=$'\n'
ARRAY=(`perl -wMstrict -e 'my @array = ("foo","bar","quz baz"); print join "\n", @array'`)
echo "0=<${ARRAY[0]}> 1=<${ARRAY[1]}> 2=<${ARRAY[2]}>"
)
Outputs: 0=<foo> 1=<bar> 2=<quz baz>
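If bash 4 or later is available, mapfile (also spelled readarray) is another way to read one element per line without touching IFS. In this sketch the function produce is a hypothetical stand-in for the perl command, so the example runs without Perl installed.

```bash
# Hypothetical stand-in for: perl -e 'print join "\n", @array'
produce() { printf '%s\n' "foo" "bar" "quz baz"; }

# -t strips the trailing newline from each line that is read
mapfile -t ARRAY < <(produce)

echo "0=<${ARRAY[0]}> 1=<${ARRAY[1]}> 2=<${ARRAY[2]}>"
```

Elements that themselves contain newlines would still be split, so this only helps with spaces and tabs inside elements.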
Here is one way using Bash word splitting, it will split the string on white space into the new array array:
array_str=$(perl -E '@a = 1..5; say "@a"')
array=( $array_str )
for item in "${array[@]}" ; do
echo ": $item"
done
Output:
: 1
: 2
: 3
: 4
: 5

Function with awk to print single or multiple columns

I use awk a lot to select single columns and after learning what an alias was I started off with
alias a1='awk '\''{print $1}'\'
alias a2='awk '\''{print $2}'\'
...
After I learned a little more I thought those were cheesy and replaced them with
function a() {
awk "{print \$$1}"
}
so now I can do a 3 or a 11 without needing to create explicit aliases.
So that's good, but sometimes I need to select more than one column, and when I do I have to resort to typing out the actual full awk '{print ...}' command (the horror!).
So I'm trying to come up with a way to do something similar to the a function but one that will accept different numbers of arguments, so I could do a 3 or a 5 7 or a 2 4 9.
I've tried different things with $@ and $* but can't get it right, and everything I'm trying now I know is a cheesy workaround, so I'd rather just stop and ask how to do it the proper way.
Thanks all.
$ cat tst.sh
function a {
awk -v args="$*" '
BEGIN { n=split(args,f) }
{ for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS) }
'
}
echo "a b c d e f" | a 1 3 5
echo "---"
echo "a b c d e f" | a 1 3 4 6
$ ./tst.sh
a c e
---
a c d f
You could get arbitrarily complicated with this sort of thing (what if you wanted to be able to say a 2-5 7 11-, as with cut?) but here's one that will work with a list of numbers:
a() { (IFS=,; awk '{print '"${*/#/$}"'}'); }
That requires a bit of explanation.
a() { ... }
defines a shell function, which differs from an alias in various ways, one of which being that you can give it parameters.
Inside the shell function, I want to change the value of IFS; to avoid having to remember the old value and change it back, I surround the command I actually want to execute with (...), which causes it to execute in a subshell. When the subshell finishes, all environmental changes finish with it, so it effectively makes the change to IFS local.
IFS is the set of characters used for word splitting, but it also defines the character used to separate elements in the expansion of "$*" (that is, the list of function or script arguments) when it is surrounded by quotes. So setting it to , means the $* expansion will be a comma-separated list.
The awk program I want to create is actually something like {print $1,$4,$7}, so aside from putting commas between the list, I need to add a $ before each number. I do that with the bash parameter expansion substitute syntax: ${parameter/pattern/replacement}. By specifying * as the parameter, I get $* with the substitution applied to each argument. (Note that the expansion is quoted. If it weren't, it wouldn't work.)
In the substitution, the pattern is empty apart from the leading # character, which indicates that the match must be at the beginning of the string. Since the actual pattern is empty, the match will always be at the beginning of the string, and the replacement ($) will therefore be inserted at the beginning of each argument. The # is needed because // is syntactically different: it means "change all occurrences of the pattern", instead of just the first one.
Unlike many languages, in bash search-and-replace expressions are not terminated with a /, but rather with the matching }. If you type ${p/foo/bar/}, it will replace the first instance of foo with bar/.
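The expansion described above can be inspected on its own. In this sketch, show is a hypothetical helper that prints the field list instead of handing it to awk; the sample input is made up.

```bash
# show prints the awk field list that the function a would build
show() { (IFS=,; printf '%s\n' "${*/#/$}"); }
fields=$(show 1 4 7)
echo "$fields"

# The function from the answer, applied to a sample line
a() { (IFS=,; awk '{print '"${*/#/$}"'}'); }
cols=$(echo "a b c d e f" | a 1 3 5)
echo "$cols"

# A trailing slash becomes part of the replacement text
p="foo foo"
slash="${p/foo/bar/}"
echo "$slash"
```

Because IFS is changed inside ( ... ), the caller's IFS is unaffected after either function returns.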

$1 and \1 in Ruby

When using regular expressions in Ruby, what is the difference between $1 and \1?
\1 is a backreference which will only work in the same sub or gsub method call, e.g.:
"foobar".sub(/foo(.*)/, '\1\1') # => "barbar"
$1 is a global variable which can be used in later code:
if "foobar" =~ /foo(.*)/ then
puts "The matching word was #{$1}"
end
Output:
The matching word was bar
# => nil
Keep in mind there's a third option, the block form of sub. Sometimes you need it. Say you want to replace some text with the reverse of that text. You can't use $1 because Ruby evaluates the arguments before sub runs, so $1 has not yet been set by this match:
"foobar".sub(/(.*)/, $1.reverse) # WRONG: either uses a PREVIOUS value of $1,
# or gives an error if $1 is unbound
You also can't use \1, because the sub method just does a simple text-substitution of \1 with the appropriate captured text, there's no magic taking place here:
"foobar".sub(/(.*)/, '\1'.reverse) # WRONG: returns '1\'
So if you want to do anything fancy, you should use the block form of sub ($1, $2, $`, $' etc. will be available):
"foobar".sub(/.*/){|m| m.reverse} # => returns 'raboof'
"foobar".sub(/(...)(...)/){$1.reverse + $2.reverse} # => returns 'oofrab'
