Extracting a substring from a variable using bash script

I have a bash variable with value something like this:
10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
There are no spaces within the value. The value can be very long or very short. It consists of pairs such as 65:3.0. I know the first number of a pair, say 65, and I want to extract either the number 3.0 or the whole pair 65:3.0. I do not know the position (offset) of 65.
I will be grateful for a bash-script that can do such extraction. Thanks.

Probably awk is the most straightforward approach:
awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
3.0
Or to get the pair:
$ awk -F: -v RS=',' '$1==65' <<< "$var"
65:3.0
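If the key is not fixed, the same awk filter can take it as a variable. This is only a sketch: the function name lookup_pair_value and the awk variable key are illustrative names that do not appear in the answer above, and keys are assumed to be plain numbers.

```bash
#!/usr/bin/env bash
# lookup_pair_value KEY DATA: print the value paired with KEY (hypothetical helper).
lookup_pair_value() {
  local key=$1 data=$2
  # printf '%s' avoids the trailing newline that a here-string would
  # append to the last record when RS is a comma.
  printf '%s' "$data" | awk -F: -v RS=',' -v key="$key" '$1 == key { print $2 }'
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
lookup_pair_value 65 "$var"
```

Passing the key with -v keeps shell quoting out of the awk program, so the program text stays the same for any key.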

Here's a pure Bash solution:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
while read -r -d, i; do
[[ $i = 65:* ]] || continue
echo "$i"
done <<< "$var,"
You may use break after echo "$i" if there's only one 65:... in var, or if you only want the first one.
To get the value 3.0: echo "${i#*:}".
Another (pure Bash) approach, without parsing the string explicitly. I'm assuming you're only looking for the first 65 in the string, and that it is present in the string:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
value=${var#*,65:}
value=${value%%,*}
echo "$value"
This will be very slow for long strings!
Same as above, but will output all the values corresponding to 65 (or none if there are none):
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
tmpvar=,$var
while [[ $tmpvar = *,65:* ]]; do
tmpvar=${tmpvar#*,65:}
echo "${tmpvar%%,*}"
done
Same thing, this will be slow for long strings!
The fastest I can obtain in pure Bash is my original answer (and it's fine with 10000 fields):
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
IFS=, read -ra ary <<< "$var"
for i in "${ary[@]}"; do
[[ $i = 65:* ]] || continue
echo "$i"
done
In fact, no, the fastest I can obtain in pure Bash is with this regex:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
[[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"
Test of this vs awk, where the 65 pair is at the end:
printf -v var '%s:3.0,' {100..11000}
var+=65:42.0
time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
shows 0m0.020s (rough average) whereas:
time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
shows 0m0.008s (rough average too).
Where the 65 pair is not at the end:
printf -v var '%s:3.0,' {1..10000}
time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
shows 0m0.020s (rough average) and with early exit:
time awk -F: -v RS=',' '$1==65{print $2;exit}' <<< "$var"
shows 0m0.010s (rough average) whereas:
time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
shows 0m0.002s (rough average).
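The regex one-liner above can be wrapped in a small function so the key is a parameter. This is a sketch under assumptions: pair_value is a made-up name, and the key is assumed to contain no regex metacharacters (plain digits, as in the question).

```bash
#!/usr/bin/env bash
# pair_value KEY DATA: print the value for KEY using a single regex match.
# Returns non-zero when KEY is not present.
pair_value() {
  local key=$1 data=$2
  # Keeping the pattern in a variable avoids quoting problems inside [[ =~ ]].
  local re=",${key}:([^,]+),"
  [[ ",${data}," =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
pair_value 65 "$var"     # a pair in the middle
pair_value 10 "$var"     # the first pair matches too, thanks to the added commas
```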

With grep:
grep -o '\b65\b[^,]*' <<<"$var"
65:3.0
Or
grep -oP '\b65\b:\K[^,]*' <<<"$var"
3.0
\K discards everything matched before it, so only the text after 65: is printed. It requires grep's Perl-compatible mode (-P).
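Note that \b65\b also matches a 65 that appears as a value (for example in 10:65.0). One possible variant, sketched here with a made-up sample string, anchors the key to the start of a field instead of using word boundaries.

```bash
#!/usr/bin/env bash
# Sample data (hypothetical) in which 65 also occurs inside other fields.
var='10:65.0,165:2.0,65:3.0'

# (^|,) requires the key to begin a field; tr then drops the matched comma.
pair=$(grep -oE '(^|,)65:[^,]*' <<< "$var" | tr -d ',')
echo "$pair"          # the whole pair
echo "${pair#*:}"     # only the value after the colon
```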

Here is a GNU awk solution:
awk -vRS="(^|,)65:" -F, 'NR>1{print $1}' <<< "$var"
3.0

Try:
echo "$var" | tr , '\n' | awk '/65/'
where
tr , '\n' turns each comma into a newline
awk '/65/' picks every line containing 65 (this also matches lines such as 165:2.0 or 10:65.0)
or
echo "$var" | tr , '\n' | awk -F: '$1 == 65 {print $2}'
where
-F: uses : as the field separator
$1 == 65 picks the line whose first field is 65
{ print $2} prints the second field

Using sed
sed -e 's/^.*,\(65:[0-9.]*\),.*$/\1/' <<<",$var,"
output:
65:3.0
There are two different ways to protect against 65:3.0 being the first or last pair in the line. Above, commas are added around the variable so that the pair is always surrounded by commas. Below, the GNU extension \? is used to specify zero-or-one occurrence.
sed -e 's/^.*,\?\(65:[0-9.]*\),\?.*$/\1/' <<<$var
Both handle 65:3.0 regardless of where it appears in the string.

Try egrep (grep -E) like below:
echo "$myvar" | grep -Eo '\b65:[0-9]+\.[0-9]+'

Related

Getting word index by a delimiter in a variable

Given line, delimiter, and word I want to get the index place of that word in the line based on the delimiter. As simple/short as possible. So for:
line="this-is-a-line_with-some.txt"
delimiter="-"
word="some"
echo <code goes here>
# should come out as 4
Of course I can split it with an array, and print the first occurrence of the word with a for loop, as follows:
line="this-is-a-line_with-some.txt"
delimiter="-"
word="some"
index=0
IFS="$delimiter" read -ra ary <<<"$line"
for i in "${ary[@]}"; do
if [[ $i == ${word}* ]]; then echo $index ; break ; fi
index=$((index+1))
done
But I'm sure there is a simpler solution.

"simpler solution."
Replace delimiter with newline and get line numbers with grep.
<<<"$line" tr "$delimiter" '\n' | grep -n "$word" | cut -d: -f1
Minus 1:
<<<"$line" tr "$delimiter" '\n' | grep -n "$word" | cut -d: -f1 | awk '{print $1 - 1}'
# shorter
<<<"$line" tr "$delimiter" '\n' | grep -n "$word" | awk -F: '{print $1-1}'
Or really anyway just awk:
<<<"$line" awk -v RS="$delimiter" -v word="$word" '$0 ~ word{print NR-1}'

Understanding from OP's code and/or comments:
looking for the first occurrence of a ${delimiter}-delimited field that starts with ${word}
location index is 0-based
if ${word} is not found we generate no output
OP's code can be further reduced by using the array's 0-based index (ie, eliminate the need for the index variable):
IFS="$delimiter" read -ra ary <<<"$line"
for i in "${!ary[@]}"
do
[[ "${ary[i]}" == ${word}* ]] && echo "${i}" && break
done
# line="this-is-a-line_with-some.txt"
4
# line="a-some_def-xy-some.pdf"
1
NOTE: if ${word} is not found this will generate no output
A variation on this parameter substitution solution from superuser:
newline="${line%%${word}*}" # truncate string from 1st occurrence of ${word}
if [[ "${newline}" != "${line}" ]] # if strings are different then we found ${word}
then
IFS="${delimiter}" words_before=( ${newline} ) # break remaining string by "${delimiter}" and
# store in array words_before[]
echo "${#words_before[@]}" # number of array entries == index of 1st occurrence of ${word}
fi
# line="this-is-a-line_with-some.txt"
4
# line="a-some_def-xy-some.pdf"
1
NOTE: if ${word} is not found this will generate no output
One awk idea:
awk -F"${delimiter}" -v ptn="${word}" '{for (i=1;i<=NF;i++) if (index($i,ptn) == 1) {print i-1; exit}}' <<< "${line}"
# line="this-is-a-line_with-some.txt"
4
# line="a-some_def-xy-some.pdf"
1
Or using an inline replacement for ptn/${word}:
awk -F"${delimiter}" '{for (i=1;i<=NF;i++) if ($i ~ /^'"${word}"'/) {print i-1; exit}}' <<< "${line}"
# line="this-is-a-line_with-some.txt"
4
# line="a-some_def-xy-some.pdf"
1
NOTE: if ${word} is not found these awk scripts will generate no output
To get ideas for the truly shortest piece of code OP could try posting at codegolf, though the really short answers will likely require locating/installing new software (libs and/or binaries)

A solution without loop or external tool :
line="$delimiter$line"; lin2="${line%%$delimiter$word*}"
if test "$lin2" != "$line"; then
IFS="$delimiter" read -ra ary <<<"${lin2#$delimiter}"
echo ${#ary[@]}
fi
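The same idea can be kept from overwriting line or changing IFS globally by putting it in a function. This is a sketch under assumptions: word_index is a hypothetical name, and %% is used so that the first occurrence of the word is the one that counts.

```bash
#!/usr/bin/env bash
# word_index LINE DELIMITER WORD: print the 0-based index of the first
# DELIMITER-separated field that starts with WORD; return 1 if none does.
word_index() {
  local line=$1 delimiter=$2 word=$3
  local padded="$delimiter$line"
  # Remove the longest suffix starting at "<delimiter><word>", which
  # leaves only the fields before the first match.
  local head="${padded%%"$delimiter$word"*}"
  [[ $head == "$padded" ]] && return 1
  local -a parts=()
  IFS="$delimiter" read -ra parts <<< "${head#"$delimiter"}"
  echo "${#parts[@]}"
}

word_index "this-is-a-line_with-some.txt" "-" "some"
word_index "a-some_def-xy-some.pdf" "-" "some"
```

Because IFS is set only for the read command, the caller's IFS is left untouched.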

Shell awk - Print a position from variable

Here is my string that needs to be parsed.
line='aaa vvv ccc'
I need to print the values one by one.
no_of_users=$(echo $line| wc -w)
If the no_of_users is greater than 1 then I need to print the values one by one.
aaa
vvv
ccc
I used this script.
if [ $no_of_users -gt 1 ]
then
for ((n=1;n<=$no_of_users;n++))
do
-- here is my issue ##echo 'user:'$n $line|awk -F ' ' -vno="${n}" 'BEGIN { print no }'
done
fi
In the { print no } I have to print the value in that position.

You may use this awk:
awk 'NF>1 {OFS="\n"; $1=$1} 1' <<< "$line"
aaa
vvv
ccc
What it does:
NF>1: If the number of fields is greater than 1
OFS="\n": Set output field separator to \n
$1=$1: Force restructure of a record
1: Print a record

1st solution: You could do this within a single awk. Here var is an awk variable holding the value of the shell variable line.
awk -v var="$line" '
BEGIN{
num=split(var,arr," ")
if(num>1){
for(i=1;i<=num;i++){ print arr[i] }
}
}'
Explanation: Adding detailed explanation for above.
awk -v var="$line" ' ##Starting awk program and creating var variable which has line shell variable value in it.
BEGIN{ ##Starting BEGIN section of program from here.
num=split(var,arr," ") ##Splitting var into array arr here. Saving its total length into variable num to check it later.
if(num>1){ ##Checking condition if num is greater than 1 then do following.
for(i=1;i<=num;i++){ print arr[i] } ##Running for loop from i=1 to till value of num here and printing arr value with index i here.
}
}'
2nd solution: Adding one more solution tested and written in GNU awk.
echo "$line" | awk -v RS= -v OFS="\n" 'NF>1{$1=$1;print}'

Another option:
if [ $no_of_users -gt 1 ]
then
for ((n=1;n<=$no_of_users;n++))
do
echo 'user:'$n $(echo $line|awk -F ' ' -v x=$n '{printf "%s", $x }')
done
fi

You can use grep
echo $line | grep -o '[a-z][a-z]*'

Also with awk:
awk '{print $1, $2, $3}' OFS='\n' <<< "$line"
aaa
vvv
ccc
the key is setting OFS='\n'

Or a really toughie:
printf "%s\n" $line
(note: $line is unquoted)
printf will consume all words in line with word-splitting applied so each word is taken as a single input.
Example Use/Output
$ line='aaa vvv ccc'; printf "%s\n" $line
aaa
vvv
ccc
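If the position of each word matters, as in the question's loop, one option is to read the words into an array instead of relying on unquoted expansion. This is a sketch only: print_users is a made-up name, and the user:N prefix is taken from the question's attempt.

```bash
#!/usr/bin/env bash
# print_users LINE: print each word with its 1-based position,
# but only when LINE holds more than one word (hypothetical helper).
print_users() {
  local -a users
  local n
  # read -a splits on whitespace without performing glob expansion.
  read -ra users <<< "$1"
  (( ${#users[@]} > 1 )) || return 0
  for n in "${!users[@]}"; do
    printf 'user:%d %s\n' "$((n + 1))" "${users[n]}"
  done
}

print_users 'aaa vvv ccc'
```

Unlike printf "%s\n" $line, this does not expand characters such as * against the files in the current directory.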

Using bash:
$ line='aaa vvv ccc'
$ [[ $line =~ " " ]] && echo -e ${line// /\\n}
aaa
vvv
ccc
$ line=aaa
$ [[ $line =~ " " ]] && echo -e ${line// /\\n}
$
If you are on another shell:
$ line="foo bar baz" bash -c '[[ $line =~ " " ]] && echo -e ${line// /\\n}'

grep -Eq '[[:space:]]' <<< "$line" && xargs printf "%s\n" <<< $line
Do a silent grep for whitespace in the variable; if it succeeds, print the names on separate lines.

awk -v OFS='\n' 'NF>1{$1=$1; print}'
e.g.
$ line='aaa vvv ccc'
$ echo "$line" | awk -v OFS='\n' 'NF>1{$1=$1; print}'
aaa
vvv
ccc
$ line='aaa'
$ echo "$line" | awk -v OFS='\n' 'NF>1{$1=$1; print}'
$

another golfed awk variation
$ awk 'gsub(FS,RS)'
only print if there is a substitution.

Trim line to the first comma (bash)

I have a line from which I need to cut the branch name to the first comma:
commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD, origin/MGB-322, refs/pipelines/36877)
I need to cut out MGB-322.
The number of characters in a line is always different.
awk -F "origin/" '{print $2}' - this is how I cut out
MGB-322, refs/pipelines/36877)
But how to tell it to trim to the first comma?
I tried doing it via substr,
awk -F "origin/" '{print substr ($2,1, index $2 ,)}'
But it is not clear how to correctly specify the comma in index

With any awk. Use / and , as field separator:
awk '{print $3}' FS='[/,]' file
Output:
MGB-322
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

With a fix to OP's code: this assumes origin occurs only once in the line; if it occurs more than once, change $NF to $2 in the following code. Written and tested in https://ideone.com/xjv2we
awk -F"origin/" '{print $NF}' Input_file
sed could also be helpful here. This generic solution is based on the first occurrence of a comma and of / as per OP's thread title. I wrote this on mobile, so I couldn't test it yet; it should work, and I will test it later.
sed 's/[^,]*,[^/]*\/\([^,]*\).*/\1/' Input_file

"I need to cut out MGB-322."
You can use cut in two steps:
echo "${line}" | cut -d"/" -f2 | cut -d"," -f1
I would prefer one step with awk (already answered by others) or sed
echo "${line}" | sed -r 's/.*origin.(.*), refs.*/\1/'

Why spawn procs? bash's built-in parameter parsing will handle this.
If
$: line="commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD, origin/MGB-322, refs/pipelines/36877)"
then
$: [[ "$line" =~ .*origin.(.*), ]] && echo "${BASH_REMATCH[1]}"
MGB-322
or maybe
$: tmp=${line#*, origin/}; echo ${tmp%,*}
MGB-322
or even
$: IFS=",/" read _ _ x _ <<< "$line" && echo $x
MGB-322
c.f. https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
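The parameter-expansion variant can be made a little more defensive, for example when origin/... is the last entry in the parentheses and no comma follows it. This is a sketch: branch_from_log_line is a hypothetical name, and the second sample line is invented for illustration.

```bash
#!/usr/bin/env bash
# branch_from_log_line LINE: print the branch name that follows "origin/".
# Returns 1 when "origin/" does not occur in LINE.
branch_from_log_line() {
  local line=$1 tmp
  [[ $line == *origin/* ]] || return 1
  tmp=${line#*origin/}      # drop everything up to the first "origin/"
  tmp=${tmp%%[,)]*}         # cut at the first comma or closing parenthesis
  printf '%s\n' "$tmp"
}

line='commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD, origin/MGB-322, refs/pipelines/36877)'
branch_from_log_line "$line"
branch_from_log_line 'commit abc123 (HEAD, origin/main)'   # invented sample
```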

Trying to retrieve first 5 characters (only number & alphabet) from string in bash

I have a string like that
1-a-bc-dxyz
I'd want to get 1-a-bc-d ( first 5 characters, only number and alphabet)
Thanks

With gawk:
awk '{ for ( i=1;i<=length($0);i++) { if ( match(substr($0,i,1),/[[:alnum:]]/)) { cnt++;if ( cnt==5) { print substr($0,1,i) } } } }' <<< "1-a-bc-dxyz"
Read each character one by one and then if there is a pattern match for an alpha-numeric character (using the match function), increment a variable cnt. When cnt gets to 5, print the string we have seen so far (using the substr function)
Output:
1-a-bc-d

a='1-a-bc-dxyz'
count=0
for ((i=0;i<${#a};i++)); do
if [[ "${a:$i:1}" =~ [0-9a-zA-Z] ]] && [[ $((++count)) -eq 5 ]]; then
echo "${a:0:$((i+1))}"
break
fi
done
You can further shrink this to:
a='1-a-bc-dxyz'
count=0
for ((i=0;i<${#a};i++)); do [[ "${a:$i:1}" =~ [0-9a-zA-Z] ]] && [[ $((++count)) -eq 5 ]] && echo "${a:0:$((i+1))}"; done
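The character-by-character loop can also be expressed as one regex match: repeat "any non-alphanumerics, then one alphanumeric" five times from the start of the string. This is a sketch, and first_alnum_prefix and its optional count argument are made-up names.

```bash
#!/usr/bin/env bash
# first_alnum_prefix STRING [N]: print the shortest prefix of STRING that
# contains N alphanumeric characters (default 5). Returns non-zero when
# STRING has fewer than N of them.
first_alnum_prefix() {
  local s=$1 n=${2:-5}
  local re="^([^[:alnum:]]*[[:alnum:]]){$n}"
  [[ $s =~ $re ]] && printf '%s\n' "${BASH_REMATCH[0]}"
}

first_alnum_prefix '1-a-bc-dxyz'
```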

Using GNU awk:
$ echo 1-a-bc-dxyz | \
awk -F '' '{b=i="";while(gsub(/[0-9a-z]/,"&",b)<5)b=b $(++i);print b}'
1-a-bc-d
Explained:
awk -F '' '{ # separate each char to its own field
b=i="" # if you have more than one record to process
while(gsub(/[0-9a-z]/,"&",b)<5) # using gsub for counting (adjust regex if needed)
b=b $(++i) # gather buffer
print b # print buffer
}'

GNU sed supports an option to replace the k-th occurrence and all after that.
echo "1-a-bc-dxyz" | sed 's/[^a-zA-Z0-9]*[a-zA-Z0-9]//g6'

Using Combination of sed & AWK
echo 1-a-bc-dxyz | sed 's/[-*%$#@]//g' | awk -F '' '{print $1$2$3$4$5}'
Note that this strips the separators, so it prints 1abcd. You can also use a for loop to print the characters.

echo '1-a-bc-dxyz' | grep -Eo '^[[:alnum:]](-*[[:alnum:]]){4}'
That is pretty simple.
Neither sed nor awk.

Split String in Unix Shell Script

I have a String like this
//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf
and want to get last part of
00000000957481f9-08d035805a5c94bf

Let's say you have
text="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
If you know the position, i.e. in this case the 9th, you can go with
echo "$text" | cut -d'/' -f9
However, if this is dynamic and you want to split at "/", it's safer to go with:
echo "${text##*/}"
This removes everything from the beginning to the last occurrence of "/" and should be the shortest form to do it.
For more information on this see: Bash Reference manual
For more information on cut see: cut man page
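One caveat: ${text##*/} yields an empty string when the value ends in a slash. A possible way around that, sketched here with a hypothetical helper name last_component, is to strip trailing slashes first.

```bash
#!/usr/bin/env bash
# last_component PATH: print the text after the last "/", ignoring any
# trailing slashes.
last_component() {
  local path=$1
  # ${path##*[!/]} is the run of trailing slashes (empty when there is none).
  path=${path%"${path##*[!/]}"}
  printf '%s\n' "${path##*/}"
}

text="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
last_component "$text"
last_component "$text/"    # same result despite the trailing slash
```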

The tool basename does exactly that:
$ basename //ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf
00000000957481f9-08d035805a5c94bf

I would use bash string function:
$ string="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
$ echo "${string##*/}"
00000000957481f9-08d035805a5c94bf
But following are some other options:
$ awk -F'/' '$0=$NF' <<< "$string"
00000000957481f9-08d035805a5c94bf
$ sed 's#.*/##g' <<< "$string"
00000000957481f9-08d035805a5c94bf
Note: <<< is here-string notation. Here-strings do not create a subshell; however, they are NOT portable to POSIX sh (as implemented by shells such as ash or dash).

In case you want more than just the last part of the path,
you could do something like this:
echo $PWD | rev | cut -d'/' -f1-2 | rev

You can use this BASH regex:
s='//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf'
[[ "$s" =~ [^/]+$ ]] && echo "${BASH_REMATCH[0]}"
00000000957481f9-08d035805a5c94bf

This can be done easily in awk:
string="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
echo "${string}" | awk -v FS="/" '{ print $NF }'
Use "/" as field separator and print the last field.

You can try this...
echo //ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf |awk -F "/" '{print $NF}'
