Get first character of each string with BASH_REMATCH - bash

I'm trying to get the first character of each string using regex and BASH_REMATCH in a shell script.
My input text file contains:
config_text = STACK OVER FLOW
The strings STACK OVER FLOW must be uppercase like that.
My output should be something like this :
SOF
My code for now is :
var = config_text
values=$(grep $var test_file.txt | tr -s ' ' '\n' | cut -c 1)
if [[ $values =~ [=(.*)]]; then
echo $values
fi
As you can see, I'm using tr and cut, but I'd like to replace them with BASH_REMATCH alone, because these two commands have been reported in many links as not working on macOS.
I tried something like this :
var = config_text
values=$(grep $var test_file.txt)
if [[ $values =~ [=(.*)(\b[a-zA-Z])]]; then
echo $values
fi
VALUES as I explained should be :
S O F
But it seems \b does not work in shell scripts.
Does anyone have an idea how to get my desired output with BASH_REMATCH only?
Thanks in advance for any help.

A generic BASH_REMATCH solution handling any number of words and any separator.
# "local" is only valid inside a function, so plain assignments are used here.
input="STACK OVER FLOW" pattern='([[:upper:]]+)([^[:upper:]]*)' result=""
while [[ $input =~ $pattern ]]; do
result+="${BASH_REMATCH[1]::1}${BASH_REMATCH[2]}"
input="${input:${#BASH_REMATCH[0]}}"
done
echo "$result"
# Output: "S O F"
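If the goal is the compact form SOF from the question rather than "S O F", the same loop idea can drop the separators. The following is only a sketch: the function name initials and its regex are illustrative assumptions, not part of the answer above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first character of every
# whitespace-separated word in $1, with no separators.
initials() {
    local rest=$1 out=""
    # Group 1: first character of the next word; group 2: everything after that word.
    local re='^[[:space:]]*([^[:space:]])[^[:space:]]*(.*)$'
    while [[ $rest =~ $re ]]; do
        out+=${BASH_REMATCH[1]}
        rest=${BASH_REMATCH[2]}
    done
    printf '%s\n' "$out"
}

initials "STACK OVER FLOW"   # prints: SOF
```

The loop ends when no non-whitespace character is left, so trailing spaces are harmless.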

Bash's regexes are kind of cumbersome if you don't know how many words there are in the input string. How's this instead?
config_text="STACK OVER FLOW"
sed 's/\([^[:space:]]\)[^[:space:]]*/\1/g' <<<"$config_text"

First, put a valid shebang at the top and paste your script at https://shellcheck.net for validation and recommendations.
This assumes that the line starts with config and ends with FLOW, e.g.
config_text = STACK OVER FLOW
Now the script.
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
while IFS= read -r line; do
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
done < test_file.txt
If there is only one line, or the target string/pattern is on the first line of test_file.txt, the while loop is not needed.
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
IFS= read -r line < test_file.txt
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
Make sure you are running Bash v4+, since macOS defaults to Bash v3.
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?

Another option rather than bash regex would be to utilize bash parameter expansion substring ${parameter:offset:length} to extract the desired characters:
$ read -ra arr <text.file ; printf "%s%s%s\n" "${arr[2]:0:1}" "${arr[3]:0:1}" "${arr[4]:0:1}"
SOF
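The one-liner above hard-codes three words at fixed array indexes. A sketch that handles any number of words might look like this; the function name first_letters is an assumption made for illustration, and the key is stripped with a ${line#*=} expansion.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join the first character of each word in $1.
first_letters() {
    local -a words
    local word out=""
    read -ra words <<<"$1"          # split on whitespace into an array
    for word in "${words[@]}"; do
        out+=${word:0:1}            # substring expansion: offset 0, length 1
    done
    printf '%s\n' "$out"
}

line="config_text = STACK OVER FLOW"
first_letters "${line#*=}"          # drop everything up to "="; prints: SOF
```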

Related

Asking for user input in a loop until match found in array of command line arguments

Sometimes I need to find a specific serial in a box with many items, so I wrote a simple Bash script that allows me to use a barcode scanner to scan hundreds of barcodes until a match is found, at which point the screen flashes (so I can see it from the corner of my eyes while looking at the box).
The script works great, but it only checks against one specific serial number provided by the user. Here's the code:
#!/bin/bash
INPUT=''
SCAN=''
SN=''
I='0'
clear
printf "Enter serial\n"
read INPUT
SN=`printf "${INPUT}" | tr '[:lower:]' '[:upper:]'`
# Keep comparing scans to needed serial until a match is found
while [[ "${SCAN}" != *"${SN}"* ]];
do
clear
printf "Looking for [ "${SN}" ]\n"
printf "Please scan barcode\n"
read INPUT
SCAN=`printf "${INPUT}" | tr '[:lower:]' '[:upper:]'`
done
# Flash screen when match is found
while [[ "${I}" -lt 3 ]];
do
printf '\e[?5h' && sleep 0.3
printf '\e[?5l' && sleep 0.3
I=$[${I}+1]
done
printf "FOUND\n"
Today I spent hours trying to implement a way to pass multiple possible serial numbers as command line arguments, but I can't seem to get it working. I would like to be able to pass a small, manageable number of possible serials, like this:
$ ./script.sh sn1 sn2 sn3 sn4 sn5
And for the script to continue asking for input until I come across the item I am looking for.
I've studied the handling of shell arguments, but I can't seem to "massage" the above while loop to get it to check if the scanned serial exists in the array (created from the command line arguments passed):
#!/bin/bash
snList=( "$@" )
INPUT=''
SCAN=''
SN=''
I='0'
clear
#displaying "things" so I can see what each variable contains (debugging)
printf "$#\n"
printf "$0\n"
printf "$*\n"
printf "$0\n"
printf "$1\n"
printf "$2\n"
printf "$3\n"
printf "snList: $snList\n"
printf "snList[@]: ${snList[@]}\n"
printf "snList[*]: ${snList[*]}\n"
# Keep comparing scans to needed serial until a match is found
while [[ ! " ${snList[*]} " =~ "${SCAN}" ]];
do
clear
printf "Looking for [ "$*" ]\n"
printf "Please scan barcode\n"
read INPUT
SCAN=`printf "${INPUT}" | tr '[:lower:]' '[:upper:]'`
done
I've tried using ${snList[@]} in the loop as well, with the same result: it behaves as if a match was found immediately, without even asking for a scan (indicating that the body of the while loop is never executed).
Any help will be immensely appreciated, I think I am close, but I can't figure out what I am doing wrong.
Thanks in advance!
Something like this maybe?
#!/usr/bin/env bash
to_compare_input=("$@")
exglob_pattern_input=$(IFS='|'; printf '%s' "@(${to_compare_input[*]})")
until [[ $user_input == $exglob_pattern_input ]]; do
read -r user_input
done
Run the script with the following arguments.
bash -x ./myscript foo bar baz more
Output
+ to_compare_input=("$@")
++ IFS='|'
++ printf %s '@(foo|bar|baz|more)'
+ exglob_pattern_input='@(foo|bar|baz|more)'
+ [[ '' == @(foo|bar|baz|more) ]]
+ read -r user_input
papa
+ [[ papa == @(foo|bar|baz|more) ]]
+ read -r user_input
mama
+ [[ mama == @(foo|bar|baz|more) ]]
+ read -r user_input
baz
+ [[ baz == @(foo|bar|baz|more) ]]
The first user input is empty, since the builtin read has not yet been executed to ask for the user's input, as shown in the debug message:
+ [[ '' == @(foo|bar|baz|more) ]]
The second (assuming the user has entered papa) is papa.
The third (assuming the user has entered mama) is mama.
The last is baz, which breaks out of the until loop because it matches $exglob_pattern_input, which uses the extglob feature.
A regex is also an alternative using the =~ operator.
#!/usr/bin/env bash
to_compare_input=("$@")
regex_pattern_input=$(IFS='|'; printf '%s' "^(${to_compare_input[*]})$")
until [[ $user_input =~ $regex_pattern_input ]]; do
read -r user_input
done
Run the script same as before.
Using two loops which was suggested in the comments section.
#!/usr/bin/env bash
to_compare_input=("$@")
inarray() {
local n=$1 h
shift
for h; do
[[ $n == "$h" ]] && return
done
return 1
}
until inarray "$user_input" "${to_compare_input[@]}"; do
read -r user_input
done
As for tr: if your version of bash supports the ^^ and ,, parameter expansions for uppercase and lowercase, use ${user_input^^} instead:
until [[ ${user_input^^} == $exglob_pattern_input ]]; do
until [[ ${user_input^^} =~ $regex_pattern_input ]]; do
until inarray "${user_input^^}" "${to_compare_input[@]}"; do
Assuming there are no spaces in the barcode texts, you can do something like this:
while read -r INPUT
do
#Append spaces to prevent substring matching
if [[ $(echo " $@ " | grep -i " ${INPUT} " | wc -l) -eq 1 ]]
then
break
fi
done
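The answers above share two ideas: compare the scan against each wanted serial as a whole string, and normalize case first. A minimal sketch combining both, assuming a hypothetical helper named matches_serial and using tr so it does not depend on the Bash 4 ^^ expansion:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if $1 equals one of the remaining
# arguments, ignoring case. An empty scan never counts as a match.
matches_serial() {
    local scan wanted
    scan=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    shift
    [[ -n $scan ]] || return 1
    for wanted in "$@"; do
        wanted=$(printf '%s' "$wanted" | tr '[:lower:]' '[:upper:]')
        [[ $scan == "$wanted" ]] && return 0
    done
    return 1
}

# Non-interactive demonstration with made-up serials.
matches_serial "sn3" SN1 SN2 SN3 && echo "FOUND"
```

In the scanning script, the loop could then read `until matches_serial "$SCAN" "$@"; do read -r SCAN; done`. Rejecting the empty string explicitly means the loop body runs at least once before any match can be reported.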

How to capture the longest match of a repeating pattern using BASH_REMATCH

I am trying to capture the longest match of a repeating pattern
do_run() {
local regex='.*((abc)+).*'
local str='_abcabcabc123_'
echo "regex=${regex}"$'\n'
echo "str=${str}"$'\n'
if [[ "${str}" =~ ${regex} ]]
then
for i in ${!BASH_REMATCH[@]}
do
echo "$i=${BASH_REMATCH[i]}"
done
else
echo "no match"
fi
}
I get the following output :
regex=.*((abc)+).*
str=_abcabcabc123_
0=_abcabcabc123_
1=abc
2=abc
I am trying to get something like :
regex=.*((abc)+).*
str=_abcabcabc123_
0=_abcabcabc123_
x=abcabcabc
(Update : x is just here to indicate that the index of the matching group does not matter but I need to know what number to use to retrieve the matching group ...)
Update:
After reading comment, the following regex will work : ((abc)+)
However, I also need to capture what precedes and what follows ((abc)+).
I had not mentioned it earlier because I thought the same solution would apply.
So the new code would be :
do_run() {
local regex='(.*)((abc)+)(.*)'
local str='_abcabcabc123_'
echo "regex=${regex}"$'\n'
echo "str=${str}"$'\n'
if [[ "${str}" =~ ${regex} ]]
then
for i in ${!BASH_REMATCH[@]}
do
echo "$i=${BASH_REMATCH[i]}"
done
else
echo "no match"
fi
}
I get then the following output :
regex=(.*)((abc)+)(.*)
str=_abcabcabc123_
0=_abcabcabc123_
1=_abcabc
2=abc
3=abc
4=123_
I want to be able to retrieve abcabcabc from a matching group but also what precedes it and what follows it
As a workaround you can do like this:
[STEP 101] $ cat foo.sh
v=_abcabcabc123_
if [[ $v =~ (abc)+ ]]; then
middle=${BASH_REMATCH[0]}
[[ $v =~ (.*)"$middle" ]]
before=${BASH_REMATCH[1]}
[[ $v =~ "$middle"(.*) ]]
after=${BASH_REMATCH[1]}
echo "before: $before"
echo "middle: $middle"
echo "after : $after"
fi
[STEP 102] $ bash foo.sh
before: _
middle: abcabcabc
after : 123_
[STEP 103] $
I also need to capture what precedes and what follows ((abc)+).
For that, you'd typically need a negative lookbehind with a Perl regex, something along the lines of (?<!abc)((abc)+)(.*).
I am bad at Perl regex, but with a Perl-enabled grep I was able to do this:
$ grep -oxP '(.*)(?<!abc)((abc)+)\K(.*)' <<<'_abcabcabc123_'
123_
$ grep -oP '((abc)+)' <<<'_abcabcabc123_'
abcabcabc
$ rev <<<'_abcabcabc123_' | grep -oP '(.*)(?<!cba)((cba)+)\K(.*)' | rev
_
Bash has no lookarounds and no perl regex. Consider using python or perl.
But you may use sed by splitting the part on the regex and then reading lines, which may be simpler:
$ readarray -t lines < <(<<<'_abcabcabc123_' sed -E 's/((abc)+)/\n&\n/'); declare -p lines
declare -a lines=([0]="_" [1]="abcabcabc" [2]="123_")
Another idea: you may use bash expansion to replace the abc parts by something unique, then split it on that separator:
$ IFS=' ' read -r before post < <(printf "%s\n" "${str//abc/ }") ; declare -p before post
declare -- before="_"
declare -- post="123_"
# or
$ IFS='#' read -r before post < <(<<<"${str//abc/#}" tr -s '#') ; declare -p before post
declare -- before="_"
declare -- post="123_"
For your given input this regex would work:
re='^([^a]|a[^b]*|ab[^c]*)((abc)+)(.*)'
str='_abcabcabc123_'
[[ $str =~ $re ]] && declare -p BASH_REMATCH
Output:
declare -ar BASH_REMATCH=([0]="_abcabcabc123_" [1]="_" [2]="abcabcabc" [3]="abc" [4]="123_")
So you can use:
"${BASH_REMATCH[1]}" # string before
"${BASH_REMATCH[2]}" # string containing all "abc"s
"${BASH_REMATCH[4]}" # string after
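Another way to get all three parts, sketched here under the assumption that only the first run of abc matters, is to match the run once and then cut the string around it with parameter expansion. The function name split_around_abc and the global variables it sets are illustrative, not taken from the answers above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split $1 around the first, longest run of "abc".
# Sets the global variables before, middle and after.
split_around_abc() {
    local str=$1 re='(abc)+'
    [[ $str =~ $re ]] || return 1
    middle=${BASH_REMATCH[0]}       # whole match: the full run of abc
    before=${str%%"$middle"*}       # strip from the first occurrence of the run onward
    after=${str#*"$middle"}         # strip up to and including that occurrence
}

split_around_abc '_abcabcabc123_' &&
    printf '%s|%s|%s\n' "$before" "$middle" "$after"   # prints: _|abcabcabc|123_
```

Quoting "$middle" inside the expansions makes it a literal string rather than a glob pattern.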

How do I translate a list using a dictionary in bash?

Say I have a dictionary TSV file dict.txt:
apple pomme
umbrella parapluie
glass verre
... ...
and another file list.txt containing a list of words (from the left column of dict.txt):
pie
apple
blue
...
I'd like to translate them into the corresponding words from the right column of dict.txt, i.e:
tarte
pomme
bleu
...
what is the easiest way to do so?
You can use awk:
awk 'FNR==NR{a[$1]=$2;next} a[$1]{print a[$1]}' dict.txt list.txt
EDIT: If the dictionary must support multi-word meanings (separated by spaces), use tab as the field separator:
awk -F '\t' 'FNR==NR{a[$1]=$2;next} a[$1]{print a[$1]}' dict.txt list.txt
If you don't have many words (so that everything fits in memory) you can use an associative array:
#!/bin/bash
declare -A english2french=()
# Build dictionary
linenb=0
while ((++linenb)) && IFS=$'\t' read -r en fr; do
if [[ -z $fr ]] || [[ -z $en ]]; then
echo "Error line $linenb: one of the two is empty fr=\`$fr' en=\`$en'"
continue
fi
english2french["$en"]=$fr
done < dict.txt
# Translate
linenb=0
while ((++linenb)) && read -r en; do
[[ -z $en ]] && continue
fr=${english2french["$en"]}
if [[ -n $fr ]]; then
echo "$fr"
else
echo >&2 "Error line $linenb: word \`$en' unknown"
fi
done < list.txt
It seems a bit long, but there are lots of error checks ;).
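The awk one-liners above silently skip words that are missing from the dictionary. If every input line should produce an output line, a variant along these lines could be used. This is a sketch: the function name translate and the choice to print unknown words in square brackets are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate the word list in file $2 using the
# tab-separated dictionary in file $1. Unknown words are printed in
# square brackets instead of being dropped.
translate() {
    awk -F '\t' '
        FNR == NR { dict[$1] = $2; next }                  # first file: load dictionary
        { print (($1 in dict) ? dict[$1] : "[" $1 "]") }   # second file: look up each word
    ' "$1" "$2"
}
```

Usage would be `translate dict.txt list.txt`. Note that the FNR == NR idiom assumes the dictionary file is not empty.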

How can I concatenate / join two bash variables by a newline character?

My non-working code sample (erroneous line -> empty=$empty\n$url):
empty=""
IFS=$'\n'
for line in $s; do
if [[ $line =~ $regex ]]; then
url="${BASH_REMATCH[2]}${BASH_REMATCH[1]}"
echo $url
empty=$empty\n$url
else
echo "$s does not match"
fi
done
echo $empty|sort -f -t/ -k 4
I'm trying to rebuild the modified lines that were split by the for loop.
empty="$empty"$'\n'"$url"
$'\n' is a literal newline in bash (double-quoting the variable references is not strictly necessary here, but helps readability; alternative: empty=${empty}$'\n'${url}).
Alternative solution with printf:
printf -v empty '%s\n%s' "$empty" "$url"
String literals can have embedded newlines:
entry="$entry
$url"
or
entry+="
$url"
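One detail with all of the forms above: when the accumulator starts out empty, the first append leaves a leading newline, which shows up as a blank line in the output. A sketch that avoids this with the ${var:+...} expansion follows; the URLs are made-up sample data.

```bash
#!/usr/bin/env bash
nl=$'\n'      # a single newline character
joined=""
for url in "http://b.example/2" "http://a.example/1"; do
    # Add the separator only when joined already holds something.
    joined=${joined:+$joined$nl}$url
done
printf '%s\n' "$joined" | sort -f
```

Quoting the variable when printing it matters: an unquoted echo $joined would collapse the newlines into spaces under the default IFS.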

How to printf a variable length line in fixed length chunks?

I need to analyze (with grep) and print (with some formatting) the content of an app's log.
This log contains text data in variable length lines. What I need is, after some grepping, loop each line of this output and print it with a maximum fixed length of 50 characters. If a line is longer than 50 chars, it should print a newline and then continue with the rest in the following line and so on until the line is completed.
I tried to use printf to do this, but it's not working and I don't know why. It just outputs the lines in same fashion of echo, without any consideration about printf formatting, though the \t character (tab) works.
function printContext
{
str="$1"
log="$2"
tmp="/tmp/deluge/$$"
rm -f $tmp
echo ""
echo -e "\tLog entries for $str :"
ln=$(grep -F "$str" "$log" &> "$tmp" ; cat "$tmp" | wc -l)
if [ $ln -gt 0 ];
then
while read line
do
printf "\t%50s\n" "$line"
done < $tmp
fi
}
What's wrong? I know that I can write a substring routine to accomplish this task, but printf should be handy for stuff like this.
Instead of:
printf "\t%50s\n" "$line"
use
printf "\t%.50s\n" "$line"
to truncate your line to 50 characters only.
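The difference between the two format strings is that a plain number is a minimum field width, while a number after a dot is a precision, which for %s caps how many characters are printed. A small illustration with a width of 5 instead of 50:

```bash
#!/usr/bin/env bash
padded=$(printf '[%5s]' "ab")              # minimum width 5: pads on the left
truncated=$(printf '[%.5s]' "abcdefgh")    # precision 5: cuts after 5 characters
printf '%s\n%s\n' "$padded" "$truncated"
# prints:
# [   ab]
# [abcde]
```

Neither form wraps the remainder onto a new line; the precision simply discards it.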
I'm not sure about printf, but seeing as Perl is installed almost everywhere, how about a simple one-liner?
echo "$line" | perl -ne ' while( m/.{1,50}/g ){ print "$&\n" } '
Here's a clunky bash-only way to break the string (held in y here) into 50-character chunks:
i=0
chars=50
while [[ -n "${y:$((chars*i)):$chars}" ]]; do
printf "\t%s\n" "${y:$((chars*i)):$chars}"
((i++))
done
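The same substring technique can be wrapped in a function that steps through the string by offset. This is a sketch: the name print_chunks and its optional width argument are assumptions, not part of the answer above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print $1 in chunks of at most $2 characters
# (default 50), one tab-indented chunk per line.
print_chunks() {
    local text=$1 width=${2:-50} offset
    for ((offset = 0; offset < ${#text}; offset += width)); do
        printf '\t%s\n' "${text:offset:width}"
    done
}

print_chunks "abcdefghij" 4
```

Inside the question's while read loop, the call would be `print_chunks "$line"`. Looping on the string length rather than on an empty chunk keeps the stopping condition explicit.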

Resources