How to check for inequality in bash for loop? - bash

I have the following for loop in a bash script:
for (( j = i; ${1:j:3} != "   "; j=j + 1 ))
do
sleep 0.1
done
printf '%s' "${letter[${1:i:j}]}"
i=$j
When run, it leads to an infinite loop of the following errors:
/home/com/morsecoder.sh: line 188: letter: bad array subscript
/home/com/morsecoder.sh: line 184: ((: ... != : syntax error: operand expected (error token is "... != ")
The problem is on the first line; the bad array subscript error is almost certainly a byproduct of that.
I can see the error is caused by my ${1:j:3} != "   ". Basically, what I need is for the loop to run through the characters in a string until it finds three consecutive spaces. The string contains Morse code, and each letter is separated by 3 characters (because in American Morse, letters can contain 0, 1, or 2 spaces, so 3 is the minimum letter delimiter).
Afterward, I convert what I have detected to be a complete Morse letter to English and print it out, and then move on to the next Morse characters.
The printf part seems to be working fine, but the error here has me puzzled. I checked and I am using (( and )) properly as well as != to check for inequality. I also tried enclosing ${1:j:3} in quotes, but that did nothing. How can I rephrase the for loop so that I don't get an error about invalid syntax?

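The (( ... )) form of the for loop evaluates arithmetic expressions only, so a string comparison is a syntax error there. You need to use a while loop with [[ ... ]] instead, comparing the three-character slice against three spaces:
j=$i
while [[ ${1:j:3} != "   " ]]; do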
sleep 0.1
j=$((j+1))
done
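A minimal sketch of the same scan wrapped in a function, assuming the delimiter is exactly three spaces. The name letter_end and the bounds check are additions for illustration, not part of the original script; without some bounds check the loop never ends when no further delimiter exists.

```bash
#!/usr/bin/env bash
# letter_end STRING [START]
# Prints the index of the next three-space delimiter at or after START,
# or the length of STRING if there is none (hypothetical helper).
letter_end() {
    local s=$1 j=${2:-0}
    # Stop at the end of the string or at the first run of three spaces.
    while (( j < ${#s} )) && [[ ${s:j:3} != "   " ]]; do
        j=$((j + 1))
    done
    printf '%d\n' "$j"
}

letter_end '.-   -' 0
```

The string comparison lives inside [[ ... ]], while the index arithmetic stays in (( ... )), which is the split the answer above describes.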

Basically, what I need is for the loop to run through the characters
in a string until it finds three consecutive spaces. The string
contains Morse code, and each letter is separated by 3 characters
(because in American Morse, letters can contain 0, 1, or 2 spaces, so
3 is the minimum letter delimiter).
Well that seems an odd way to go about it. The shell has better mechanisms for this task than scanning the string by iterating over an index. For example,
# the value of $1, with the longest suffix matching glob pattern "   *" (three spaces, then anything) removed.
letter=${1%%   *}
The purpose of the sleep 0.1 in your example code is not clear to me, but if you're simply simulating the timing of receiving a signal via Morse code (complete with different timings for letters of different Morse length) then it can be addressed separately.
Afterward, I convert what I have detected to be a complete Morse
letter to English and print it out, and then move on to the next Morse
characters.
So I would approach it more like this:
morse=$1
while [[ -n "$morse" ]]; do
# extract the next letter
next_let=${morse%%   *}
# sleep for a duration based on the (Morse) length of the letter
sleep "0.${#next_let}"
# print the corresponding decoded Latin letter
printf '%s' "${letter[${next_let}]}"
# remove the Morse letter and its delimiter, if any
morse=${morse:$((${#next_let}+3))}
done
That covers your loop over all the Morse letters, by the way, not just one.
Iterating over a string by index is not necessarily wrong in shell code, but it has bad code smell.
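For reference, here is a self-contained sketch of that approach without the sleep. The function name decode_morse and the four-entry table are assumptions made for illustration only; a real script would carry the full code table, and the codes shown should be checked against a proper American Morse reference.

```bash
#!/usr/bin/env bash
# decode_morse STRING: decode letters separated by exactly three spaces.
decode_morse() {
    # Illustrative subset only; note that one key contains an internal space.
    local -A letter=( ['.']=E ['-']=T ['.-']=A ['. .']=O )
    local morse=$1 next_let out=

    while [[ -n $morse ]]; do
        # Everything before the first run of three spaces is one letter.
        next_let=${morse%%   *}
        # Skip empty pieces (e.g. a leading delimiter); unknown codes become "?".
        if [[ -n $next_let ]]; then
            out+=${letter[$next_let]:-?}
        fi
        # Drop the letter and its three-space delimiter, if any.
        morse=${morse:${#next_let}+3}
    done
    printf '%s\n' "$out"
}

decode_morse '-   . .   .'
```

Because the delimiter is matched as a pattern, a letter with an internal single space is kept intact.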

Related

Extract value for a key in a key/pair string

I have key value pairs in a string like this:
key1 = "value1"
key2 = "value2"
key3 = "value3"
In a bash script, I need to extract the value of one of the keys like for key2, I should get value2, not in quote.
My bash script needs to work in both Redhat and Ubuntu Linux hosts.
What would be the easiest and most reliable way of doing this?
I tried something like this simplified script:
pattern='key2\s*=\s*\"(.*?)\".*$'
if [[ "$content" =~ $pattern ]]
then
key2="${BASH_REMATCH[1]}"
echo "key2: $key2"
else
echo 'not found'
fi
But it does not work consistently.
Any better/easier/more reliable way of doing this?
To separate the key and value from your $content variable, you can use:
[[ $content =~ (^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$ ]]
That will properly populate the BASH_REMATCH array with both values where your key is in BASH_REMATCH[1] and the value in BASH_REMATCH[2].
Explanation
In bash, [[...]] treats what appears on the right side of =~ as an extended regular expression and matches it according to man 3 regex. See man 1 bash under the section heading for [[ expression ]] (4th paragraph). Sub-expressions in parentheses (..) are saved in the array variable BASH_REMATCH, with BASH_REMATCH[0] containing the entire matched portion of the string (your $content) and each remaining element containing a sub-expression enclosed in (..), in the order the parentheses appear in the regex.
The Regular Expression (^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$ is explained as:
(^[^ ]+) - '^' anchored at the beginning of the line, [^ ]+ match one or more characters that are not a space. Since this sub-expression is enclosed in (..) it will be saved as BASH_REMATCH[1], followed by;
[[:blank:]]* - zero or more whitespace characters, followed by;
= - an equal sign, followed by;
[[:blank:]]* - zero or more whitespace characters, followed by;
[[:punct:]] - a punctuation character (matching the '"', which avoids caveats associated with using quotes within the regex), followed by the sub-expression;
(.*) - zero or more characters (the rest of the characters), and since it is a sub-expression in (..) the characters will be stored in BASH_REMATCH[2], followed by;
[[:punct:]] - a punctuation character (matching the '"' ... ditto), at the;
$ - end of line anchor.
So if your input lines consist of a key and a value separated by an = sign, this will separate the key and value into the array BASH_REMATCH as you wanted.
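A small sketch of the capture-group mechanism described above. The function name parse_pair is hypothetical, and the regex is a slight variant of the one explained: it is stored in a variable to sidestep quoting issues inside [[ ]], and it matches a literal double quote instead of [[:punct:]].

```bash
#!/usr/bin/env bash
# parse_pair LINE: print "key=value" for a line shaped like: key = "value"
parse_pair() {
    # Group 1: the key (no blanks, no "="). Group 2: text between the outer quotes.
    local re='^([^[:blank:]=]+)[[:blank:]]*=[[:blank:]]*"(.*)"$'
    [[ $1 =~ $re ]] || return 1
    printf '%s=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}

parse_pair 'key2 = "value2"'
```

Keeping the pattern in a variable and leaving it unquoted on the right of =~ is the usual way to make bash treat it as a regex rather than a literal string.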
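Bash's =~ uses POSIX extended regular expressions (ERE), which have no portable \s shorthand and no non-greedy .*? quantifier, so you cannot rely on either.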
As an alternative, please try:
while IFS= read -r content; do
# pattern='key2\s*=\s*\"(.*)\".*$'
pattern='key2[[:blank:]]*=[[:blank:]]*"([^"]*)"'
if [[ $content =~ $pattern ]]
then
key2="${BASH_REMATCH[1]}"
echo "key2: $key2"
(( found++ ))
fi
done < input-file.txt
if (( found == 0 )); then
echo "not found"
fi
When you start talking about key-value pairs, it is best to use an associative array:
declare -A map
Now looking at your lines, they look like key = "value" where we assume that:
value is always encapsulated by double quotes, but also could contain a quote
an unknown number of white spaces is before and/or after the equal sign.
So assuming we have a variable line which contains key = "value", the following operations will extract that value:
key="${line%%=*}"; key="${key// /}"
value="${line#*=}"; value="${value#*\"}"; value="${value%\"*}"
# alternatively, read the key by splitting on blanks and "="
IFS=$' \t=' read -r key _ <<<"$line"
This allows us now to have something like:
declare -A map
while read -r line; do
key="${line%%=*}"; key="${key// /}"
value="${line#*=}"; value="${value#*\"}"; value="${value%\"*}"
map["$key"]="$value"
done <inputfile
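Here is a runnable sketch of the associative-array idea, assuming bash 4.2 or later. The function name load_pairs is made up for illustration, and the quote is written as \" in the expansions. The third sample line shows a value that itself contains quotes.

```bash
#!/usr/bin/env bash
# load_pairs: read lines shaped like key = "value" from stdin into the global map.
load_pairs() {
    declare -gA map
    local line key value
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line == *=* ]] || continue               # skip lines without "="
        key=${line%%=*}; key=${key//[[:blank:]]/}    # text before "=", blanks removed
        [[ -n $key ]] || continue
        value=${line#*=}                             # text after the first "="
        value=${value#*\"}                           # drop up to the opening quote
        value=${value%\"*}                           # drop from the closing quote on
        map[$key]=$value
    done
}

load_pairs <<'EOF'
key1 = "value1"
key2 = "value2"
key3 = "say "hi""
EOF
printf '%s\n' "${map[key2]}"
```

Feed the function through a redirection rather than a pipe; in a pipeline the loop would run in a subshell and the array would be lost.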
With awk:
awk -v key="key2" '$1 == key { gsub("\"","",$3);print $3 }' <<< "$string"
This reads the contents of the variable string via a here-string and passes the required key in as the awk variable key. If the first space-delimited field equals the key, gsub removes the quotes from the third field, which is then printed. This assumes the value contains no spaces.
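If values may contain spaces, one possible variant is to split on the double quote instead of on whitespace. This is only a sketch: the wrapper name get_value is hypothetical, and it assumes the value itself contains no double quotes.

```bash
#!/usr/bin/env bash
# get_value KEY: read key = "value" lines from stdin, print the value for KEY.
get_value() {
    awk -v key="$1" -F'"' '
        # With " as the separator, $1 is everything before the opening quote.
        { k = $1; gsub(/[ \t=]/, "", k) }   # strip blanks and "=" to leave the key
        k == key { print $2; exit }          # $2 is the text between the quotes
    '
}

printf '%s\n' 'key1 = "a b"' 'key2 = "value 2"' | get_value key2
```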
Ok, after spending so many hours, this is how I solved the problem:
If you don't know where your script will run and what type of file (win/mac/linux) you are reading:
Try to avoid non-greedy matching in Linux bash instead of tweaking different switches.
Don't trust the end-of-line match $ when you might get data from Windows or Mac.
This post solved my problem: Non greedy text matching and extrapolating in bash
This pattern works for me in many Linux environments and with all types of line endings:
pattern='key2\s*=\s*"([^"]*)"'
The value is in BASH_REMATCH[1]
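To illustrate the line-ending point, here is a sketch that feeds CRLF input to an unanchored pattern. The function name extract_value is an assumption, [[:blank:]] is used in place of \s for portability, and the key is assumed to contain no regex metacharacters.

```bash
#!/usr/bin/env bash
# extract_value KEY: read lines from stdin, print the quoted value for KEY.
extract_value() {
    # No "$" anchor, so a trailing carriage return does not break the match.
    local pattern="(^|[[:blank:]])${1}[[:blank:]]*=[[:blank:]]*\"([^\"]*)\""
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $pattern ]]; then
            printf '%s\n' "${BASH_REMATCH[2]}"
            return 0
        fi
    done
    return 1
}

printf 'key1 = "a"\r\nkey2 = "value2"\r\n' | extract_value key2
```

The [^"]* group stops at the first closing quote, which is what the non-greedy .*? was meant to achieve.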

How to loop through a range of characters in a bash script using ASCII values?

I am trying to write a bash script which will read two letter variables (startletter/stopletter) and after that I need to print from the start letter to the stop letter with a for or something else. How can I do that?
I tried to do
#! /bin/bash
echo "give start letter"
read start
echo "give stop letter"
read stop
But none of the for constructs work
#for value in {a..z}
#for value in {$start..$stop}
#for (( i=$start; i<=$stop; i++)) do echo "Letter: $c" done
This question is very well explained in BashFAQ/071 How do I convert an ASCII character to its decimal (or hexadecimal) value and back?
# POSIX
# chr() - converts decimal value to its ASCII character representation
# ord() - converts ASCII character to its decimal value
chr () {
local val
[ "$1" -lt 256 ] || return 1
printf -v val %o "$1"; printf "\\$val "
# That one requires bash 3.1 or above.
}
ord() {
# POSIX
LC_CTYPE=C printf %d "'$1"
}
Re-using them for your requirement, a proper script would be written as
read -p "Input two variables: " startLetter stopLetter
[[ -z "$startLetter" || -z "$stopLetter" ]] && { printf 'one of the inputs is empty\n' >&2 ; exit 1 ; }
asciiStart=$(ord "$startLetter")
asciiStop=$(ord "$stopLetter")
for ((i=asciiStart; i<=asciiStop; i++)); do
chr "$i"
done
Would print the letters as expected.
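The same idea can be folded into a single function. This is a sketch assuming single-byte ASCII letters; the name print_range is hypothetical, and unlike chr above it prints the letters without a separator.

```bash
#!/usr/bin/env bash
# print_range START STOP: print every character from START to STOP inclusive.
print_range() {
    local start stop i oct
    # A leading quote in the argument makes printf %d yield the character code.
    start=$(LC_CTYPE=C printf '%d' "'$1")
    stop=$(LC_CTYPE=C printf '%d' "'$2")
    for (( i = start; i <= stop; i++ )); do
        printf -v oct '%o' "$i"   # decimal code to octal digits
        printf "\\$oct"           # \NNN octal escape back to a character
    done
    printf '\n'
}

print_range a e
```

This works where {$start..$stop} does not because brace expansion happens before variables are expanded, whereas the arithmetic for loop sees the numeric values.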
Adding it to community-wiki since this is also a cross-site duplicate from Unix.SE - Bash script to get ASCII values for alphabet
In case you feel adventurous and want to use zsh instead of bash, you can use the following:
For zsh versions below 5.0.7 you can use the BRACE_CCL option:
(snip man zshall) If a brace expression matches none of the above forms, it is left
unchanged, unless the option BRACE_CCL (an abbreviation for 'brace character class') is set. In that case, it is expanded to a list of the individual characters between the braces sorted into the order of the characters in the ASCII character set (multibyte characters are not currently handled). The syntax is similar to a [...] expression in filename generation: - is treated specially to denote a range of characters, but ^ or ! as the first character is treated normally. For example, {abcdef0-9}
expands to 16 words 0 1 2 3 4 5 6 7 8 9 a b c d e f.
#!/usr/bin/env zsh
setopt brace_ccl
echo "give start letter"
read cstart
echo "give stop letter"
read cstop
for char in {${cstart}-${cstop}}; do echo $char; done
For zsh versions from 5.0.7 onwards you can use the default brace expansion :
An expression of the form {c1..c2}, where c1 and c2 are single characters (which may be multibyte characters), is expanded to every character in the range from c1 to c2 in whatever character sequence is used internally. For characters with code points below 128 this is US ASCII (this is the only case most users will need). If any intervening character is not printable, appropriate quotation is used to render it printable. If the character sequence is reversed, the output is in reverse order, e.g. {d..a} is substituted as d c b a.
#!/usr/bin/env zsh
echo "give start letter"
read cstart
echo "give stop letter"
read cstop
for char in {${cstart}..${cstop}}; do echo $char; done
More information on zsh can be found here and the quick reference

Bash ranges of non digit or letters

I played around with bash {..} constructs today. I knew
{a..z}
would generate all letters,
{0..9}
digits etc. (numbers in general obviously), but by mistake I got
{Z..a}
yielding:
Z [ ] ^ _ ` a
The characters in between "Z" (90) and "a" (97) are the ASCII 91-96. The astute reader will notice there is a character missing - "\", 92. I'm guessing because of its special nature. Is this expected behavior as output? Specifically, I'm guessing the \ is being used to escape the space in front of it after substitution, but @John1024 notes that:
echo {Z..a}a
will complain on missing backticks, while the previous version (no a) does not. How exactly is substitution working? Is there a bug?
Second, I guessed the range operator is cooler than I thought and can do any range of ASCII characters I choose, but {[.._} for example fails. Am I missing something to make this work or is this just a curiosity? Are there any more ranges besides letters/digits I can use? and if not, why not do nothing (fail, echo as is) for 'jumping' from caps to lower?
The \ is being generated; however, it subsequently appears to be treated as escaping the following space. Compare:
$ printf '%s\n' 'Z' '[' ']' '^' '_' '`' 'a'
Z
[
]
^
_
`
a
$ printf '%s\n' {Z..a}
Z
[

]
^
_
`
a
The extra blank line following the [ is the space escaped by the backslash generated by {Z..a}.
A special variable obase can be used with bc to print almost any character range(s):
for n in {91..95}; do printf "\x$(echo "obase=16; $n" | bc)"; done
Result:
[\]^_
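The same range can be produced without bc by letting printf build an octal escape. A minimal sketch, with the function name ascii_range chosen for illustration:

```bash
#!/usr/bin/env bash
# ascii_range FIRST LAST: print the characters with decimal codes FIRST..LAST.
ascii_range() {
    local n
    for (( n = $1; n <= $2; n++ )); do
        # The inner printf yields three octal digits, the outer one interprets \NNN.
        printf "\\$(printf '%03o' "$n")"
    done
    printf '\n'
}

ascii_range 91 95
```

Going through character codes sidesteps the brace-expansion quirk entirely, since the backslash is emitted by printf rather than generated as an unquoted word.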
↳ https://www.gnu.org/software/bc/manual/html_mono/bc.html#TOC6

Shell Bash Replace or remove part of a number or string

Good day.
Every day I receive a list of numbers like the example below:
11986542586
34988745236
2274563215
4532146587
11987455478
3652147859
As you can see, some of them have a 9 as the third digit (11 digits total) and some don't (10 digits total). That's because the ones with the extra 9 are in the new Brazilian mobile number format and the ones without it are in the old format.
The thing is that I have to use the numbers in both formats as a parameter for another script, and I usually have to do this by hand.
I am trying to create a script that checks the length of each mobile number, adds or removes the "9" depending on that length, and saves the result in a separate file.
So far I am only able to check the length; I don't know how to add or remove the "9" in the third digit.
#!/bin/bash
Numbers_file="/FILES/dir/dir2/Numbers_File.txt"
while read Numbers
do
LEN=${#Numbers}
if [ $LEN -eq "11" ]; then
echo "lenght = "$LEN
elif [ $LEN -eq "10" ];then
echo "lenght = "$LEN
else
echo "error"
fi
done < $Numbers_file
You can delete the third character of any string with sed as follows:
sed 's/.//3'
Example:
echo "11986542586" | sed 's/.//3'
1186542586
To add a 9 as the third character (that is, after the second character):
echo "2274563215" | sed 's/./&9/2'
22974563215
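Combining the length check from the question with the add/remove step, a pure-bash sketch could look like the following. The function name toggle_ninth_digit is hypothetical, and it assumes the 9 belongs in the third position, as in the sample numbers from the question.

```bash
#!/usr/bin/env bash
# toggle_ninth_digit NUMBER: print the number in the other format.
toggle_ninth_digit() {
    local n=$1
    if [[ ${#n} -eq 11 && ${n:2:1} == 9 ]]; then
        # New format: drop the third digit.
        printf '%s\n' "${n:0:2}${n:3}"
    elif [[ ${#n} -eq 10 ]]; then
        # Old format: insert a 9 after the first two digits.
        printf '%s\n' "${n:0:2}9${n:2}"
    else
        printf 'unexpected number: %s\n' "$n" >&2
        return 1
    fi
}

toggle_ninth_digit 11986542586
toggle_ninth_digit 2274563215
```

In the question's while read loop, each converted number could then be appended to a separate output file.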
If you are absolutely sure about the occurrence happening only at the third position, you can use an awk statement as below,
awk 'substr($0,3,1)=="9"{$0=substr($0,1,2)substr($0,4,length($0))}1' file
1186542586
3488745236
2274563215
4532146587
1187455478
3652147859
Using the POSIX-compliant substr() function, this processes only the lines that have a 9 at the 3rd position and rebuilds the record from the characters before and after that digit.
substr(s, m[, n ])
Return the at most n-character substring of s that begins at position m, numbering from 1. If n is omitted, or if n specifies more characters than are left in the string, the length of the substring shall be limited by the length of the string s
There are lots of text manipulation tools that will do this, but the lightest weight is probably cut because this is all it does.
cut selects characters by position, and GNU cut has an invert option, so cut -c4 would give you just the 4th character, but add in --complement and you get everything but character 4.
echo 1234567890 | cut -c4 --complement
123567890

shell script to add leading zeros in middle of file name

I have files with names like "words_transfer1_morewords.txt". I would like to ensure that the number after "transfer" is five digits, as in "words_transfer00001_morewords.txt". How would I do this with a ksh script? Thanks.
This will work in any Bourne-type/POSIX shell as long as your words and morewords don't contain numbers:
file=words_transfer1_morewords.txt
prefix=${file%%[0-9]*} # words_transfer
suffix=${file##*[0-9]} # _morewords.txt
num=${file#$prefix} # 1_morewords.txt
num=${num%$suffix} # 1
file=$(printf "%s%05d%s" "$prefix" "$num" "$suffix")
echo "$file"
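Wrapped as a function, the same expansions might look like this. It is a bash-flavoured sketch rather than strict POSIX sh: the name pad_number is hypothetical, and the 10# prefix is a bash arithmetic feature added so that a number such as 08 is not read as octal. It keeps the assumption that digits appear in only one place in the name.

```bash
#!/usr/bin/env bash
# pad_number NAME: print NAME with its embedded number zero-padded to 5 digits.
pad_number() {
    local file=$1 prefix suffix num
    prefix=${file%%[0-9]*}     # everything before the first digit
    suffix=${file##*[0-9]}     # everything after the last digit
    num=${file#"$prefix"}      # strip the prefix ...
    num=${num%"$suffix"}       # ... and the suffix, leaving the digits
    if [[ -z $num ]]; then     # no digits at all: leave the name unchanged
        printf '%s\n' "$file"
        return 0
    fi
    printf '%s%05d%s\n' "$prefix" "$((10#$num))" "$suffix"
}

pad_number words_transfer1_morewords.txt
```

A rename would then be something like mv -- "$f" "$(pad_number "$f")" inside a loop over the files.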
Use ksh's regular expression matching operation to break the filename down into separate parts, then put them back together again after formatting the number.
pre="[^[:digit:]]+" # What to match before the number
num="[[:digit:]]+" # The number to match
post=".*" # What to match after the number
[[ $file =~ ($pre)($num)($post) ]]
new_file=$(printf "%s%05d%s\n" "${.sh.match[@]:1:3}")
Following a successful match with =~, the special array parameter .sh.match contains the full match in element 0, and each capture group in order starting with element 1.
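For comparison, a rough bash equivalent uses BASH_REMATCH in place of .sh.match. This is a sketch assuming bash rather than ksh; the function name pad_with_regex is made up for illustration.

```bash
#!/usr/bin/env bash
# pad_with_regex NAME: zero-pad the first run of digits in NAME to 5 places.
pad_with_regex() {
    # Groups: text before the number, the number, everything after it.
    local re='^([^[:digit:]]+)([[:digit:]]+)(.*)$'
    if [[ ! $1 =~ $re ]]; then
        printf '%s\n' "$1"     # no match: print the name unchanged
        return 1
    fi
    # 10# forces base 10 so leading zeros are not treated as octal.
    printf '%s%05d%s\n' "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]}))" "${BASH_REMATCH[3]}"
}

pad_with_regex words_transfer1_morewords.txt
```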
