How to efficiently parse out numbers from a string in Shell?

For example if I have the following string:
I ate [ 6 ] chicken wings and [ 5 ] dishes of salad today.
I want to parse out 6 and 5 from this string and store them in two variables, A and B respectively. I am thinking of using [ and ] as the delimiters, and then narrowing down by splitting on spaces. I am looking for a simpler solution to this. Thanks.

You can do this pretty easily using "sed" to replace all non-digits with spaces and let the shell use the space as the separator:
LINE="hi 1 there, 65 apples and 73 pears"
for i in $(echo $LINE | sed -e "s/[^0-9]/ /g" )
do
echo $i
done
1
65
73
Of course you can assign "i" to any variable you want as well, or you can create an array of your numbers and print them out:
LINE="hi 1 there 65 apples and 73 pears"
nums=($(echo "$LINE" | sed -e "s/[^0-9]/ /g" ))
echo "${nums[@]}"
1 65 73

grep -o with a perl regex:
line="I ate [ 6 ] chicken wings and [ 5 ] dishes of salad today."
n=( $( echo "$line" | grep -oP '(?<=\[ )\d+(?= \])' ) )
a=${n[0]} b=${n[1]}
You can work with the n array directly too:
for num in "${n[@]}"; do echo "$num"; done
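If the bracketed values are the only digits in the string, a shorter variant of the "turn non-digits into spaces" idea can assign both variables in one step. This is a minimal sketch assuming bash (for the here-strings); any other digits in the line would be picked up as well, and with more than two numbers B would receive everything after the first.

```shell
line="I ate [ 6 ] chicken wings and [ 5 ] dishes of salad today."

# tr -c selects everything that is NOT a digit and replaces it with a space;
# -s squeezes each run of those replacement spaces into a single space.
# read then splits the result on whitespace into A and B.
read -r A B <<< "$(tr -cs '0-9' ' ' <<< "$line")"

echo "A=$A B=$B"
```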

Related

Get the most frequently occurring number in a bash array

I needed to find the most frequent number in an array. I did it this way:
# our array, the most frequent value is 55
declare -a array=(44 55 55 55 66 66)
# counting unique strings with uniq and then sorting as numbers
array=($(printf "%s\n" "${array[@]}"| uniq -c | sort -n -r))
# printing 2nd element of array, as the first one is the number of occurrences
printf ${array[1]}
Is there a better/more beautiful way to do it, instead of building a weird array (2nd step) that mixes counts and numbers together?
And am I sorting correctly? (uniq returns values in 2 columns, so I'm not sure how sort chooses the column.)
If I had to do this in bash, I would use awk to skip sorting anything and just count the elements:
printf '%s\n' "${array[@]}" | awk '{
if (++arr[$0] > max) {
max=arr[$0];
ans=$0
}
}
END {print ans}'
You can also implement the same algorithm in bash 4 or later using an associative array:
# These don't strictly need to be initialized, but it's safer
# to ensure they don't already have values.
declare -A counts=()
max=0
ans=
for i in "${array[@]}"; do
if ((++counts[$i] > max)); then
max=${counts[$i]}
ans=$i
fi
done
printf '%s\n' "$ans"
If you don't want to use awk to do this, you can still do it with sort and uniq, but be careful: the input must ALREADY be sorted before counting, otherwise it will not work. For instance:
declare -a array=(34 3 45 45 66 55 44 55 55 55 66 45 45 8 6 45 45 66 32 9 18)
printf "%s\n" "${array[@]}" | sort -n -r | uniq -c | sort -n -r | head -1 | awk '{print $2}'
For this input the pipeline correctly extracts the most repeated number, 45. The code in the question, which skips the initial sort, would instead report 55: uniq only counts adjacent repeats, so the run of three 55s wins even though 45 occurs six times in total.
Regards!
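The effect of the missing sort is easy to check by running both pipelines on that array. A small sketch; the variable names unsorted and presorted are only for illustration:

```shell
array=(34 3 45 45 66 55 44 55 55 55 66 45 45 8 6 45 45 66 32 9 18)

# No sort before uniq: only adjacent repeats are counted, so the longest run wins.
unsorted=$(printf '%s\n' "${array[@]}" | uniq -c | sort -n -r | head -n 1 | awk '{print $2}')

# Sort first: equal values become adjacent, so uniq -c yields true totals.
presorted=$(printf '%s\n' "${array[@]}" | sort -n | uniq -c | sort -n -r | head -n 1 | awk '{print $2}')

echo "without pre-sort: $unsorted"
echo "with pre-sort:    $presorted"
```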
A slightly more verbose version of chepner's logic, using associative arrays in bash v4 onward. We build the associative array hashMap with each array element as the key and its number of occurrences as the value. Once the map is built, we loop over it to find the key with the highest count and print that key.
#!/usr/bin/env bash
declare -a array=(44 55 55 55 66 66)
declare -A hashMap
declare -i max=0
for element in "${array[@]}"; do
((hashMap["$element"]++))
done
for key in "${!hashMap[@]}"; do
(( "${hashMap[$key]}" > max )) && { max="${hashMap[$key]}"; element="$key" ; }
done
printf '%d\n' "$element"
Another minimalist awk:
$ awk '{for(mi=i=1;i<=NF;i++) if(a[$mi]<++a[$i]) mi=i; print $mi}' <<< "${array[@]}"

Shell Script: Arithmetic operation in array

I'm doing this for fun, as part of learning shell scripting.
Let's say I have the initial input A B C.
What I'm trying to do is split the string and convert each character to its decimal value:
A B C = 65 66 67
Then I'll add some number to each decimal value, let's say 1.
Now the decimal values become 66 67 68.
Finally, I'll convert the decimal values back to characters, which gives B C D.
ubuntu@Ubuntu:~$ cat testscript.sh -n
#!/bin/bash
1 string="ABC"
2
3 echo -e "\nSTRING = $string"
4 echo LENGTH = ${#string}
5
6 # TUKAR STRING KE ARRAY ... word[x]
7 for i in $(seq 0 ${#string})
8 do word[$i]=${string:$i:1}
9 done
10
11 echo -e "\nZero element of array is [ ${word[0]} ]"
12 echo -e "Entire array is [ ${word[@]}] \n"
13
14 # CHAR to DECIMAL
15 for i in $(seq 0 ${#string})
16 do
17 echo -n ${word[$i]}
18 echo -n ${word[$i]} | od -An -tuC
19 chardec[$i]=$(echo -n ${word[$i]} | od -An -tuC)
20 done
21
22 echo -e "\nNEXT, DECIMAL VALUE PLUS ONE"
23 for i in $(seq 0 ${#string})
24 do
25 echo `expr ${chardec[$i]} + 1`
26 done
27
28 echo
This is the output
ubuntu@Ubuntu:~$ ./testscript.sh
STRING = ABC
LENGTH = 3
Zero element of array is [ A ]
Entire array is [ A B C ]
A 65
B 66
C 67
NEXT, DECIMAL VALUE PLUS ONE
66
67
68
1
As you can see in the output, there are two problems (or maybe more):
The last for loop processes an additional number. Any idea how to fix this?
NEXT, DECIMAL VALUE PLUS ONE
66
67
68
1
This is the formula to convert a decimal value to a character. I'm trying to put the incremented values into another array and then loop over it for this purpose. However, I still have no idea how to do this in a loop based on the previous data.
ubuntu@Ubuntu:~$ printf "\x$(printf %x 65)\n"
A
Please advise
Using bash you can replace all of your code with this code:
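for i; do
# take the decimal code of the character, add 1 in decimal,
# then convert to hex for the \x escape (adding 1 to the hex digits breaks on values like 49 or 4a)
printf "\x$(printf '%x' "$(( $(printf '%d' "'$i") + 1 ))") "
done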
echo
When you run it as:
./testscript.sh P Q R S
It will print:
Q R S T
awk to the rescue!
It is simpler to do the same thing in awk:
$ echo "A B C" |
awk 'BEGIN{for(i=33;i<127;i++) o[sprintf("%c",i)]=i}
{for(i=1;i<=NF;i++) printf "%c%s", o[$i]+1, ((i==NF)?ORS:OFS)}'
B C D
seq counts from FIRST to LAST inclusive, so if your string length is 3, seq 0 3 gives you 0, 1, 2, 3. Your second-to-last loop (lines 16-20) is actually running four iterations, but the last iteration prints nothing. Use seq 0 $((${#string} - 1)) to stop at the last character.
To printf the ascii code, insert it inline, like
printf "\x$(printf %x `expr ${chardec[$i]} + 1`) "
or more readably:
dec=`expr ${chardec[$i]} + 1`
printf "\x$(printf %x $dec)\n"
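Putting the pieces together, the whole round trip (character to decimal, add an offset, back to character) can be done without seq, od, or expr. This is only a sketch assuming bash and single-byte ASCII input; the function name shift_chars and its optional offset argument are made up for illustration.

```shell
# shift_chars STRING [OFFSET]: hypothetical helper, not from the thread.
shift_chars() {
  local string=$1 offset=${2:-1}
  local out='' i code hex ch
  # Index from 0 to length-1, avoiding the extra iteration of "seq 0 LENGTH".
  for ((i = 0; i < ${#string}; i++)); do
    printf -v code '%d' "'${string:i:1}"     # character -> decimal code
    printf -v hex '%x' "$((code + offset))"  # add in decimal, then convert to hex
    printf -v ch "\\x$hex"                   # hex code -> character
    out+=$ch
  done
  printf '%s\n' "$out"
}

shift_chars "ABC"
```

Doing the addition on the decimal value before converting to hex matters: adding 1 directly to hex digits gives wrong results once a digit rolls over past 9.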

Bash - finding substrings in string

I am new to bash. I have experience in java and python but no experience in bash so I'm struggling with the simplest of tasks.
What I want to achieve is to look through the string and find certain substrings, numbers to be exact. But not all numbers, only numbers that are followed by " xyz". For example:
string="Blah blah boom boom 14 xyz foo bar 12 foo boom 55 XyZ hue hue 15 xyzlkj 45hh."
And I want to find numbers:
14 55 and 15
How would I go about that?
You can use grep with lookahead
echo "$string" | grep -i -P -o '[0-9]+(?= xyz)'
Explanation:
-i – ignore case
-P – interpret pattern as a Perl regular expression
-o – print only matching
[0-9]+(?= xyz) – match one or more digits that are followed by " xyz" (the lookahead itself is not part of the match)
For more information see:
https://linux.die.net/man/1/grep
http://www.regular-expressions.info/lookaround.html
https://github.com/tldr-pages/tldr/blob/master/pages/common/grep.md
grep + cut approach (without PCRE):
echo "$string" | grep -io '[0-9]\+ xyz' | cut -d ' ' -f1
The output:
14
55
15
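The same search can also be done without grep, using bash's own regex operator. This is a sketch rather than a recommendation: it assumes bash 3.2 or later, uses nocasematch for the case-insensitive part, and the names re, rest and found are arbitrary.

```shell
string="Blah blah boom boom 14 xyz foo bar 12 foo boom 55 XyZ hue hue 15 xyzlkj 45hh."
re='([0-9]+) xyz'   # capture group 1 holds the digits
found=()
rest=$string

shopt -s nocasematch            # make =~ ignore case, like grep -i
while [[ $rest =~ $re ]]; do
  found+=("${BASH_REMATCH[1]}")
  # Drop everything up to and including this match, then search the remainder.
  rest=${rest#*"${BASH_REMATCH[0]}"}
done
shopt -u nocasematch

echo "${found[@]}"
```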

gnu sed - delete lines between first X and last Y lines

The goal is to shorten a large text:
delete everything between the first X lines and the last Y lines,
and maybe insert a line like "file truncated to XY lines..." in the middle.
I played around and achieved this with weird redirections (piping output to two different commands), subshells,
tee and multiple sed invocations, and I wonder if
sed -e '10q'
and
sed -e :a -e '$q;N;11,$D;ba'
can be simplified by merging both into a single sed call.
thanks in advance
Use head and tail:
(head -n "$x" infile; echo Truncated; tail -n "$y" infile) > outfile
Or awk (the second loop starts at i-y+1 so that exactly y trailing lines are printed):
awk -v x="$x" -v y="$y" '{a[++i]=$0} END{for(j=1;j<=x;j++)print a[j]; print "Truncated"; for(j=i-y+1;j<=i;j++)print a[j]}' yourfile
Or you can use tee like this with process substitution if, as you say, input is coming from a pipe:
yourcommand | tee >(head -n "$x" > p1) | tail -n "$y" > p2 ; cat p[12]
You can do it through a magical incantation of tee, process substitutions, and stdio redirections:
x=5 y=8
seq 20 | {
tee >(tail -n $y >&2) \
>({ head -n $x; echo "..."; } >&2) >/dev/null
} 2>&1
1
2
3
4
5
...
13
14
15
16
17
18
19
20
This version is more sequential and the output should be consistent:
x=5 y=8
seq 20 | {
{
# read and print the first X lines to stderr
while ((x-- > 0)); do
IFS= read -r line
echo "$line"
done >&2
echo "..." >&2
# send the rest of the stream on stdout
cat -
} |
# print the last Y lines to stderr, other lines will be discarded
tail -n $y >&2
} 2>&1
You can also use sed -u 5q (with GNU sed) as an unbuffered alternative to head -n5:
$ seq 99|(sed -u 5q;echo ...;tail -n5)
1
2
3
4
5
...
95
96
97
98
99
If you know the length of the file:
EndStart=$(( ${FileLen} - ${Y} + 1))
sed -n "1,${X} p
${X} a\\
--- Truncated part ---
${EndStart},$ p" YourFile
This might work for you (GNU sed):
sed '1,5b;:a;N;s/\n/&/8;Ta;$!D;s/[^\n]*\n//;i\*** truncated file ***' file
Here X=5 and Y=8.
N.B. This leaves short files unadulterated.
Here is a sed alternative that does not require knowledge of file length.
You can insert a modified "head" expression into the sliding loop of your "tail" expression. E.g.:
sed ':a; 10s/$/\n...File truncated.../p; $q; N; 11,$D; ba'
Note that if the ranges overlap there will be duplicate lines in the output.
Example:
seq 30 | sed ':a; 10s/$/\n...File truncated.../p; $q; N; 11,$D; ba'
Output:
1
2
3
4
5
6
7
8
9
10
...File truncated...
20
21
22
23
24
25
26
27
28
29
30
Here is a commented multi-line version to explain what is going on:
:a # loop label
10s/$/\n...File truncated.../p # on line 10, append the truncation notice to the pattern space and print it
$q # quit here when on the last line
N # read next line into pattern space
11,$D # from line 11 to end, delete the first line of pattern space
ba # goto :a
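For comparison, here is a single-pass awk version that also needs no knowledge of the file length and does not duplicate lines when the head and tail ranges overlap. It is a sketch under a few assumptions: the function name truncate_middle is invented, y must be at least 1, and the marker text is copied from the sed example above.

```shell
# truncate_middle X Y: keep the first X and last Y lines of stdin (hypothetical helper).
truncate_middle() {
  awk -v x="$1" -v y="$2" '
    NR <= x { print; next }          # head part: print immediately
    { buf[NR % y] = $0 }             # ring buffer holding the last y lines seen
    END {
      if (NR > x + y) print "...File truncated..."
      start = NR - y + 1             # first line of the tail part
      if (start <= x) start = x + 1  # never repeat lines already printed
      for (i = start; i <= NR; i++) print buf[i % y]
    }'
}

seq 30 | truncate_middle 10 11
```

Short inputs pass through unchanged, and the marker is printed only when lines were actually dropped.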

Sed - convert negative to positive numbers

I am trying to convert all negative numbers to positive numbers and have so far come up with this
echo "-32 45 -45 -72" | sed -re 's/\-([0-9])([0-9])\ /\1\2/p'
but it is not working as it outputs:
3245 -45 -72
I thought by using \1\2 I would have got the positive number back ?
Where am I going wrong ?
Why not just remove the -'s?
[root@vm ~]# echo "-32 45 -45 -72" | sed 's/-//g'
32 45 45 72
My first thought is not using sed, if you don't have to. awk can understand that they're numbers and convert them thusly:
echo "-32 45 -45 -72" | awk -vRS=" " -vORS=" " '{ print ($1 < 0) ? ($1 * -1) : $1 }'
-vRS sets the "record separator" to a space, and -vORS sets the "output record separator" to a space. Then it simply checks each value, sees if it's less than 0, and multiplies it by -1 if it is, and if it's not, just prints the number.
In my opinion, if you don't have to use sed, this is more "correct," since it treats numbers like numbers.
This might work for you:
echo "-32 45 -45 -72" | sed 's/-\([0-9]\+\)/\1/g'
The reasons your regex is failing are:
You're only doing a single substitution (no g).
Your replacement has no space at the end.
The last number has no space following it, so it will never match.
This would work too but less elegantly (and only for 2 digit numbers):
echo "-32 45 -45 -72" | sed -rn 's/-([0-9])([0-9])(\s?)/\1\2\3/gp'
Of course for this example only:
echo "-32 45 -45 -72" | tr -d '-'
You are treating numbers as strings of characters. It would be more appropriate to store the numbers in an array and use the shell's built-in parameter expansion to remove the minus sign:
[~] $ # Creating an array with an arbitrary name:
[~] $ array17=(-32 45 -45 -72)
[~] $ # Calling all elements of the array and removing the first minus sign:
[~] $ echo ${array17[*]/-}
32 45 45 72
[~] $
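If you would rather treat the values as numbers without leaving the shell, bash arithmetic can compute the absolute value directly. A minimal sketch assuming bash and integer input; the array names nums and abs are arbitrary:

```shell
nums=(-32 45 -45 -72)
abs=()

for n in "${nums[@]}"; do
  # Ternary inside arithmetic expansion: negate only when the value is negative.
  abs+=("$(( n < 0 ? -n : n ))")
done

echo "${abs[@]}"
```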
