Bash ranges of characters that are not digits or letters - bash

I played around with bash {..} constructs today. I knew {a..z} would generate all letters and {0..9} all digits (numbers in general, obviously), but by mistake I got {Z..a}, yielding:
Z [ ] ^ _ ` a
The characters between "Z" (90) and "a" (97) are ASCII 91-96. The astute reader will notice that one character is missing: "\" (92). I'm guessing this is because of its special nature. Is this expected behavior? Specifically, I'm guessing the \ is being used to escape the space next to it after substitution, but @John1024 notes that:
echo {Z..a}a
will complain about missing backticks, while the previous version (without the trailing a) does not. How exactly is the substitution working? Is this a bug?
Second, I guessed the range operator is cooler than I thought and can produce any range of ASCII characters I choose, but {[.._}, for example, fails. Am I missing something to make this work, or is this just a curiosity? Are there any ranges besides letters and digits that I can use? And if not, why doesn't bash simply do nothing (fail, or echo the expression as is) when 'jumping' from uppercase to lowercase?

The \ is being generated; however, it subsequently appears to be treated as escaping the following space. Compare:
$ printf '%s\n' 'Z' '[' ']' '^' '_' '`' 'a'
Z
[
]
^
_
`
a
$ printf '%s\n' {Z..a}
Z
[

]
^
_
`
a
The extra blank line following the [ is the space escaped by the backslash generated by {Z..a}.

bc's special variable obase (output base) can be used to convert decimal codes to hexadecimal for printf's \x escape, which lets you print almost any character range(s):
for n in {91..95}; do printf "\x$(echo "obase=16; $n" | bc)"; done
Result:
[\]^_
↳ https://www.gnu.org/software/bc/manual/html_mono/bc.html#TOC6
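A variant of the same idea that avoids bc is sketched here: bash's own printf can do the decimal-to-hex conversion, and the "'c" argument form gives a character's code, so the range can be written with its end characters instead of numbers. The name ascii_range is hypothetical, and the sketch assumes single-byte ASCII characters.

```shell
#!/usr/bin/env bash
# ascii_range FIRST LAST: print every ASCII character from FIRST to LAST.
# Hypothetical helper; assumes single-byte ASCII characters.
ascii_range() {
  local first last i hex
  first=$(LC_CTYPE=C printf '%d' "'$1")   # "'x" makes printf print x's code
  last=$(LC_CTYPE=C printf '%d' "'$2")
  for ((i = first; i <= last; i++)); do
    printf -v hex '%x' "$i"                 # decimal -> hex, no bc needed
    printf "\\x$hex"                        # emit the character via a \xHH escape
  done
  echo
}
```

For example, ascii_range '[' '_' should print the same [\]^_ as the bc loop.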

Related

How to convert a semantic version shell variable to a shifted integer?

Given a shell variable whose value is a semantic version, how can I create another shell variable whose value is (tuple 1 × 1000000) + (tuple 2 × 1000) + (tuple 3) ?
E.g.
$ FOO=1.2.3
$ BAR=#shell magic that, given ${FOO} returns `1002003`
# Shell-native string-manipulation? sed? ...?
I'm unclear about how POSIX compliance vs. shell-specific syntax comes into play here, but I think a solution that is not bash-specific is preferred.
Update: To clarify, this isn't as straightforward as replacing "." with zeroes, which was my initial thought.
E.g., the desired output for 1.12.30 is 1012030, not 100120030, which is what a .-replacement approach might provide.
Bonus if the answer can be a one-liner variable-assignment.
A perl one-liner:
echo $FOO | perl -pne 's/\.(\d+)/sprintf "%03d", $1/eg'
How it works:
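perl -pne wraps the supplied program in a loop that reads each input line, runs the program on it, and prints the result (-p already implies the loop of -n, so -pe alone would do)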
The program contains a replacement function s///
The search string is the regex \.(\d+), which matches a dot followed by one or more digits and captures those digits
The e modifier of the s/// function evaluates the right-hand side of the replacement as a Perl expression. Since the digits were captured, sprintf "%03d" formats them as a number zero-padded to three digits
The g modifier replaces all instances of the regex in the input string
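The same zero-padding idea can be sketched without perl, using only the shell's read and printf. The name semver_pad is hypothetical, and the sketch assumes exactly three dot-separated components without leading zeros (printf reads a leading 0 as octal).

```shell
#!/bin/sh
# semver_pad VERSION: print MAJOR followed by MINOR and PATCH padded to 3 digits.
# Hypothetical helper; like the perl one-liner, it relies on %03d zero-padding.
semver_pad() {
  printf '%s\n' "$1" | {
    IFS=. read -r major minor patch   # split on dots for this read only
    printf '%d%03d%03d\n' "$major" "$minor" "$patch"
  }
}
```

Used as a one-line assignment: BAR=$(semver_pad "$FOO").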
Split on dots, then loop and multiply/add:
version="1.12.30"
# Split on dots instead of spaces from now on (save the old IFS to restore it afterwards)
oldIFS=$IFS
IFS="."
# Loop over each number and accumulate
int=0
for n in $version
do
int=$((int*1000 + n))
done
IFS=$oldIFS
echo "$version is $int"
Be aware that this treats 1.2 and 0.1.2 the same. If you want to always treat the first number as major/million, consider padding/truncating beforehand.
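One way to do that padding, sketched here with POSIX parameter expansion only (no IFS changes, no bash arrays): append ".0.0" so missing components count as 0, then peel off the first three fields. The name semver_to_int is hypothetical, and components are assumed to have no leading zeros.

```shell
#!/bin/sh
# semver_to_int VERSION: (major * 1000000) + (minor * 1000) + patch, POSIX sh only.
# Hypothetical helper. The body runs in a subshell ( ... ) so its variables
# do not leak into the caller.
semver_to_int() (
  v="$1.0.0"                     # pad: missing minor/patch count as 0
  major=${v%%.*}; v=${v#*.}      # text before the first dot, then drop it
  minor=${v%%.*}; v=${v#*.}
  patch=${v%%.*}
  echo $(( major * 1000000 + minor * 1000 + patch ))
)
```

With this, 1.2 and 0.1.2 give different results, and BAR=$(semver_to_int "$FOO") is a one-line assignment.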
This should do it
echo "$FOO" | sed 's/\./00/g'
How about this?
$ ver=1.12.30
$ foo=$(bar=($(echo $ver|sed 's/\./ /g')); expr ${bar[0]} \* 1000000 + ${bar[1]} \* 1000 + ${bar[2]})
$ echo $foo
1012030

Extract value for a key in a key/pair string

I have key value pairs in a string like this:
key1 = "value1"
key2 = "value2"
key3 = "value3"
In a bash script, I need to extract the value of one of the keys: for key2, I should get value2, without the quotes.
My bash script needs to work on both Red Hat and Ubuntu Linux hosts.
What would be the easiest and most reliable way of doing this?
I tried something like this simplified script:
pattern='key2\s*=\s*\"(.*?)\".*$'
if [[ "$content" =~ $pattern ]]
then
key2="${BASH_REMATCH[1]}"
echo "key2: $key2"
else
echo 'not found'
fi
But it does not work consistently.
Any better/easier/more reliable way of doing this?
To separate the key and value from your $content variable, you can use:
[[ $content =~ (^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$ ]]
That will properly populate the BASH_REMATCH array with both values where your key is in BASH_REMATCH[1] and the value in BASH_REMATCH[2].
Explanation
In bash, [[...]] treats what appears on the right side of =~ as an extended regular expression, matched according to man 3 regex. See man 1 bash under the section heading for [[ expression ]] (4th paragraph). Sub-expressions in parentheses (..) are saved in the array variable BASH_REMATCH, with BASH_REMATCH[0] containing the entire portion of the string (your $content) that matched, and each remaining element containing the sub-expressions enclosed in (..), in the order the parentheses appear in the regex.
The Regular Expression (^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$ is explained as:
(^[^ ]+) - '^' anchored at the beginning of the line, [^ ]+ match one or more characters that are not a space. Since this sub-expression is enclosed in (..) it will be saved as BASH_REMATCH[1], followed by;
[[:blank:]]* - zero or more blank characters (spaces or tabs), followed by;
= - an equal sign, followed by;
[[:blank:]]* - zero or more blank characters (spaces or tabs), followed by;
[[:punct:]] - a punctuation character (matching the '"', which avoids caveats associated with using quotes within the regex), followed by the sub-expression;
(.*) - zero or more characters (the rest of the characters); since it is a sub-expression in (..), the characters will be stored in BASH_REMATCH[2], followed by;
[[:punct:]] - a punctuation character (matching the '"' ... ditto), at the;
$ - end of line anchor.
So if you match your key/value input lines (separated by an = sign) against this regex, it will split the key and value into the BASH_REMATCH array as you wanted.
Bash's =~ uses POSIX extended regular expressions (ERE), not Perl-style ones, so you cannot rely on \s (a GNU extension) or the non-greedy .*?.
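To see the two captures in action, here is a small sketch that runs the regex above over several lines. The function name split_kv is hypothetical; keeping the regex in a variable and expanding it unquoted after =~ is the usual way to avoid quoting surprises inside [[ ]].

```shell
#!/usr/bin/env bash
# split_kv: read key = "value" lines on stdin and print "key -> value" for each.
# Hypothetical helper built around the answer's regex.
split_kv() {
  local re='(^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$'
  local content
  while IFS= read -r content; do
    if [[ $content =~ $re ]]; then
      # BASH_REMATCH[1] is the key, BASH_REMATCH[2] the unquoted value
      printf '%s -> %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
  done
}
```

For example, printf '%s\n' 'key1 = "value1"' 'key2 = "value2"' | split_kv prints one "key -> value" line per input line.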
As an alternative, please try:
found=0
while IFS= read -r content; do
# pattern='key2\s*=\s*\"(.*)\".*$'
pattern='key2[[:blank:]]*=[[:blank:]]*"([^"]*)"'
if [[ $content =~ $pattern ]]
then
key2="${BASH_REMATCH[1]}"
echo "key2: $key2"
(( found++ ))
fi
done < input-file.txt
if (( found == 0 )); then
echo "not found"
fi
When you start talking about key-value pairs, it is best to use an associative array:
declare -A map
Now looking at your lines, they look like key = "value" where we assume that:
value is always enclosed in double quotes, but could itself contain a quote
an unknown number of spaces may appear before and/or after the equal sign.
So assuming we have a variable line which contains key = "value", the following operations will extract the key and the value:
key="${line%%=*}"; key="${key// /}"
value="${line#*=}"; value="${value#*\"}"; value="${value%\"*}"
IFS=$' \t=' read -r key _ <<<"$line"   # alternative way to get the key
This allows us now to have something like:
declare -A map
while read -r line; do
key="${line%%=*}"; key="${key// /}"
value="${line#*=}"; value="${value#*\"}"; value="${value%\"*}"
map["$key"]="$value"
done <inputfile
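A self-contained sketch of the associative-array approach, reading from a here-document instead of inputfile and then looking up one key. It assumes bash 4 or later (declare -A) and quote-stripping with \" in the patterns.

```shell
#!/usr/bin/env bash
# Fill an associative array from key = "value" lines, then look up one key.
# Requires bash 4+ for declare -A.
declare -A map
while IFS= read -r line; do
  key="${line%%=*}"; key="${key// /}"         # text before '=', spaces removed
  value="${line#*=}"                          # text after the first '='
  value="${value#*\"}"; value="${value%\"*}"  # drop up to the first and from the last quote
  map["$key"]="$value"
done <<'EOF'
key1 = "value1"
key2 = "value2"
key3 = "value3"
EOF
printf 'key2: %s\n' "${map[key2]}"
```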
With awk:
awk -v key="key2" '$1 == key { gsub("\"","",$3);print $3 }' <<< "$string"
This reads the contents of the variable called string, passes the required key in as an awk variable called key, and, if the first whitespace-delimited field equals the key, removes the quotes from the third field with the gsub function and prints it.
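Because $3 is whitespace-delimited, a value containing spaces would be cut short. A variant sketch that uses the double quote itself as the field separator, assuming each value is a single quoted string; get_key is a hypothetical name.

```shell
#!/usr/bin/env bash
# get_key KEY: print the quoted value for KEY from key = "value" lines on stdin.
# With -F'"', $1 is 'key = ' and $2 is the text between the quotes.
get_key() {
  awk -F'"' -v key="$1" '{ k = $1; gsub(/[ \t=]/, "", k) } k == key { print $2 }'
}

string='key1 = "value one"
key2 = "value two"
key3 = "value three"'
get_key key2 <<< "$string"
```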
OK, after spending many hours on this, here is how I solved the problem.
If you don't know where your script will run or what type of file (Windows/Mac/Linux) you are reading:
Avoid non-greedy matching in bash on Linux instead of tweaking different switches.
Don't trust the end-of-line match $ when you might get data from Windows or Mac.
This post solved my problem: Non greedy text matching and extrapolating in bash
This pattern works for me in many Linux environments and with all types of line endings:
pattern='key2\s*=\s*"([^"]*)"'
The value is in BASH_REMATCH[1]
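A sketch combining both points above: it strips a trailing carriage return from each line (so CRLF files match) and uses POSIX character classes instead of \s. The helper name get_value is hypothetical.

```shell
#!/usr/bin/env bash
# get_value KEY: read key = "value" lines on stdin and print the value for KEY.
# Hypothetical helper. Uses only POSIX character classes (no GNU \s), and strips
# a trailing carriage return so Windows (CRLF) line endings do not break matching.
get_value() {
  local wanted=$1 line
  local pattern='^[[:blank:]]*([^[:blank:]=]+)[[:blank:]]*=[[:blank:]]*"([^"]*)"'
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}   # drop a trailing CR, if any
    if [[ $line =~ $pattern && ${BASH_REMATCH[1]} == "$wanted" ]]; then
      printf '%s\n' "${BASH_REMATCH[2]}"
      return 0
    fi
  done
  return 1   # key not found
}
```

For example, get_value key2 < input-file.txt prints value2 whether the file uses LF or CRLF line endings.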

Shift between two characters

How to get a shift between two characters in bash?
For instance, in C++ we have:
'c'-'a'=2
Are there any elegant solutions?
Define ord to get the ASCII value of each character (from Unix & Linux Stack Exchange, Bash FAQ):
ord() { LC_CTYPE=C printf '%d' "'$1"; }
(note that the ' is not a typo! It is required for printf to treat a character as a number [1])
Then you can subtract one from the other:
$ echo "$(( "$(ord c)" - "$(ord a)" ))"
2
If you wanted to put this in a function, you could:
diff_ord() { echo "$(( "$(ord "$1")" - "$(ord "$2")" ))"; }
Then call it like:
$ diff_ord c a
2
[1] From the POSIX printf specification: "If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote."
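The inverse operation (like 'a' + 2 in C++) can be sketched by adding an offset to ord's result and printing the character back through an octal escape. ord is the answer's function; shift_char is a hypothetical helper, and the result is assumed to stay within ASCII (0-127).

```shell
#!/usr/bin/env bash
# ord: character -> code, as in the answer
ord() { LC_CTYPE=C printf '%d' "'$1"; }

# shift_char CHAR N: print the character N code points after CHAR (N may be negative).
shift_char() {
  local code
  code=$(( $(ord "$1") + $2 ))
  printf "\\$(printf '%03o' "$code")"   # octal escape, e.g. \143 for 'c'
}
```

For example, shift_char a 2 prints c, and shift_char c -2 prints a.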

How to loop through a range of characters in a bash script using ASCII values?

I am trying to write a bash script which will read two letters into variables (startletter/stopletter) and then print every letter from the start letter to the stop letter, with a for loop or something else. How can I do that?
I tried to do
#! /bin/bash
echo "give start letter"
read start
echo "give stop letter"
read stop
But none of the for constructs work
#for value in {a..z}
#for value in {$start..$stop}
#for (( i=$start; i<=$stop; i++)) do echo "Letter: $c" done
This question is very well explained in BashFAQ/071 How do I convert an ASCII character to its decimal (or hexadecimal) value and back?
# chr() - converts a decimal value to its ASCII character representation
# ord() - converts an ASCII character to its decimal value
chr () {
local val
[ "$1" -lt 256 ] || return 1
# printf -v (store the result in a variable) requires bash 3.1 or above, so chr is not POSIX
printf -v val %o "$1"; printf "\\$val "
}
ord() {
# POSIX
LC_CTYPE=C printf %d "'$1"
}
Re-using them for your requirement, a proper script would be written as
read -p "Input two variables: " startLetter stopLetter
[[ -z "$startLetter" || -z "$stopLetter" ]] && { printf 'one of the inputs is empty\n' >&2 ; exit 1; }
asciiStart=$(ord "$startLetter")
asciiStop=$(ord "$stopLetter")
for ((i=asciiStart; i<=asciiStop; i++)); do
chr "$i"
done
echo
This prints the letters as expected.
Adding it to community-wiki since this is also a cross-site duplicate from Unix.SE - Bash script to get ASCII values for alphabet
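As for why the question's {$start..$stop} attempt fails: bash performs brace expansion before variable expansion, so the braces still see the literal text $start and $stop. A commonly used workaround, sketched here, is to let eval re-parse the line after substitution; because eval runs arbitrary code, the input is validated as single letters first. letter_range is a hypothetical name.

```shell
#!/usr/bin/env bash
# letter_range START STOP: print the letters from START to STOP, one per line.
# Hypothetical helper; eval is only reached after both inputs are checked.
letter_range() {
  local start=$1 stop=$2 letters
  if [[ $start != [[:alpha:]] || $stop != [[:alpha:]] ]]; then
    printf 'letter_range: expected two single letters\n' >&2
    return 1
  fi
  eval "letters=( {$start..$stop} )"   # becomes e.g. letters=( {a..e} )
  printf '%s\n' "${letters[@]}"
}
```

For example, letter_range a e prints a through e, one letter per line.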
In case you feel adventurous and want to use zsh instead of bash, you can use the following:
For zsh versions below 5.0.7 you can use the BRACE_CCL option:
(snip from man zshall) If a brace expression matches none of the above forms, it is left unchanged, unless the option BRACE_CCL (an abbreviation for 'brace character class') is set. In that case, it is expanded to a list of the individual characters between the braces sorted into the order of the characters in the ASCII character set (multibyte characters are not currently handled). The syntax is similar to a [...] expression in filename generation: - is treated specially to denote a range of characters, but ^ or ! as the first character is treated normally. For example, {abcdef0-9} expands to 16 words 0 1 2 3 4 5 6 7 8 9 a b c d e f.
#!/usr/bin/env zsh
setopt brace_ccl
echo "give start letter"
read cstart
echo "give stop letter"
read cstop
for char in {${cstart}-${cstop}}; do echo $char; done
For zsh versions from 5.0.7 onwards, you can use the default brace expansion:
An expression of the form {c1..c2}, where c1 and c2 are single characters (which may be multibyte characters), is expanded to every character in the range from c1 to c2 in whatever character sequence is used internally. For characters with code points below 128 this is US ASCII (this is the only case most users will need). If any intervening character is not printable, appropriate quotation is used to render it printable. If the character sequence is reversed, the output is in reverse order, e.g. {d..a} is substituted as d c b a.
#!/usr/bin/env zsh
echo "give start letter"
read cstart
echo "give stop letter"
read cstop
for char in {${cstart}..${cstop}}; do echo $char; done
More information on zsh can be found here and the quick reference

Bash: Using '\b' makes strings not equal?

I'm a bit perplexed as to why this doesn't return true...
if [ "$(echo -e 'b\bg')" == "g" ]; then
echo "true"
else
echo "false"
fi
Even with \" instead of ', it doesn't work.
Running it in a console, however, prints g:
~$ echo -e 'b\bg'
g
So, does g not equal g in this scenario or something?
\b does not remove the preceding character from the string; when displayed, it causes the cursor to move backwards one position, resulting in the preceding character being overwritten by the following character. b\bg is still a 3-character string that will not match any one-character string, even if they look identical when displayed. (For that matter, it is not guaranteed that a terminal will treat \b as moving the cursor back; it might simply display an unprintable character glyph in its place, e.g. b?g.)
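A quick check confirms the length: s=$(echo -e 'b\bg'); echo "${#s}" prints 3. If you really want to compare what a terminal would display, one approximate sketch is to repeatedly delete "any character followed by a backspace" until nothing changes. The name apply_backspaces is hypothetical, and the sketch assumes every backspace is followed by a character that overwrites the previous one (a trailing backspace is handled differently from a real terminal).

```shell
#!/usr/bin/env bash
# apply_backspaces STRING: approximate the on-screen text by removing each
# "character + backspace" pair, repeating until no pair is left.
apply_backspaces() {
  local s=$1 prev
  while [[ $s != "$prev" ]]; do
    prev=$s
    s=${s//?$'\b'/}   # ? matches any one character, $'\b' is a literal backspace
  done
  printf '%s' "$s"
}
```

With it, [ "$(apply_backspaces "$(echo -e 'b\bg')")" == "g" ] is true.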

Resources