shell split a string using variable delimiters - shell

My problem is quite simple, but I can't manage to solve it:
I have a string that looks like this:
-3445.51692 -7177.16664 -9945.11057
The tricky part is that there can be zero or more whitespace characters between the numbers, and each number can be either negative or positive, so the string could also look like:
-3445.51692-7177.16664 -9945.11057
or
-3445.51692 7177.16664-9945.11057
(in case of a positive value there is at least one white space that precedes)
and I would like to split this string into three variables that each contain one number, e.g.:
a=-3445.51692
b=-7177.16664
c=-9945.11057
Thus, I wanted to use something like
IFS=' -' read -a array <<< "$string"
but I don't know how to specify "zero or more white space". And using "-" as a delimiter removes it from the final result, while I want to keep the sign.
Any ideas?

Canonicalize the input before you do the IFS splitting, i.e. any minus gets a space prepended:
canonicalized_string=$(echo "$string" | sed 's/-/ -/g')
set -- $canonicalized_string # No need to mess with IFS.
a=$1
b=$2
c=$3
This assumes exactly 3 numbers. In super-compact form:
set -- $(echo "$string" | sed 's/-/ -/g')
a=$1 b=$2 c=$3
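A minimal sketch of the same canonicalize-then-split idea wrapped in a function, assuming bash. The function name `split_three` is hypothetical. It turns globbing off around the unquoted expansion (so a stray `*` in the input cannot expand to file names) and checks that exactly three numbers came out instead of silently assuming it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a string of signed numbers into globals a, b, c.
# Assumes bash and an input that holds exactly three numbers.
split_three() {
    local canonical
    # Prepend a space to every minus sign so each number is space-separated.
    canonical=$(printf '%s\n' "$1" | sed 's/-/ -/g')
    # Word-split into this function's positional parameters, with globbing off.
    set -f
    set -- $canonical
    set +f
    if [ "$#" -ne 3 ]; then
        echo "expected 3 numbers, got $#" >&2
        return 1
    fi
    a=$1 b=$2 c=$3
}

string='-3445.51692 7177.16664-9945.11057'
split_three "$string"
echo "a=$a b=$b c=$c"
```

Note that `set +f` unconditionally re-enables globbing, so this sketch assumes globbing was on to begin with.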

Simply use sed to add a space in front of every -, something like:
echo "$string" | sed 's/-/ -/g'

You can use read -a by injecting a space first:
s='-3445.51692-7177.16664 -9945.11057'
IFS=' ' read -ra arr <<< "${s//-/ -}"
printf "[%s]\n" "${arr[@]}"
[-3445.51692]
[-7177.16664]
[-9945.11057]
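Since the question asks for three separate variables, the same space-injection trick can feed `read` with three variable names instead of an array. This is a small sketch assuming bash (the `${s//-/ -}` pattern substitution is a bash feature):

```shell
#!/usr/bin/env bash
s='-3445.51692-7177.16664 -9945.11057'
# ${s//-/ -} inserts a space before every minus sign; read then splits on
# runs of spaces (leading spaces are ignored) straight into a, b and c.
IFS=' ' read -r a b c <<< "${s//-/ -}"
printf 'a=%s\nb=%s\nc=%s\n' "$a" "$b" "$c"
```

If the string ever holds more than three numbers, the last variable (`c`) receives all of the remaining text.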

Related

In bash how can I get the last part of a string after the last hyphen

I have this variable:
A="Some variable has value abc.123"
I need to extract this value, i.e. abc.123. Is this possible in bash?
Simplest is
echo "$A" | awk '{print $NF}'
Edit: explanation of how this works...
awk breaks the input into different fields, using whitespace as the separator by default. Hardcoding 5 in place of NF prints out the 5th field in the input:
echo "$A" | awk '{print $5}'
NF is a built-in awk variable that gives the total number of fields in the current record. The following returns the number 5 because there are 5 fields in the string "Some variable has value abc.123":
echo "$A" | awk '{print NF}'
Combining $ with NF outputs the last field in the string, no matter how many fields your string contains.
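Because NF is just a number, you can also do arithmetic on it inside `$(...)`. A small sketch showing the last field next to the second-to-last one:

```shell
#!/usr/bin/env bash
A="Some variable has value abc.123"
# NF is the field count, so $NF is the last field and $(NF-1) the one before it.
last=$(echo "$A" | awk '{print $NF}')
second_last=$(echo "$A" | awk '{print $(NF-1)}')
echo "$second_last $last"
```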
Yes; this:
A="Some variable has value abc.123"
echo "${A##* }"
will print this:
abc.123
(The ${parameter##word} notation is explained in §3.5.3 "Shell Parameter Expansion" of the Bash Reference Manual.)
Some examples using parameter expansion:
A="Some variable has value abc.123"
Longest match of "* " removed from the front:
echo "${A##* }"
abc.123
Shortest match of " *" removed from the end:
echo "${A% *}"
Some variable has value
Shortest match of ".*" removed from the end:
echo "${A%.*}"
Some variable has value abc
Longest match of " *" removed from the end:
echo "${A%% *}"
Some
Read more in the "Shell Parameter Expansion" section of the Bash Reference Manual.
The documentation is a bit painful to read, so I've summarised it in a simpler way.
Note that the '*' needs to swap places with the ' ' depending on whether you use # or %. (The * is just a wildcard, so you may need to take off your "regex hat" while reading.)
${A% *} - remove shortest trailing * (strip the last word)
${A%% *} - remove longest trailing * (strip the last words)
${A#* } - remove shortest leading * (strip the first word)
${A##* } - remove longest leading * (strip the first words)
Of course a "word" here may contain any character that isn't a literal space.
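The four forms above, applied side by side to the same string (a small sketch assuming bash or any POSIX shell, since these four expansions are all POSIX):

```shell
#!/usr/bin/env bash
A="Some variable has value abc.123"
drop_last_word=${A% *}     # shortest trailing " *" removed
first_word=${A%% *}        # longest trailing " *" removed
drop_first_word=${A#* }    # shortest leading "* " removed
last_word=${A##* }         # longest leading "* " removed
printf '%s\n' "$drop_last_word" "$first_word" "$drop_first_word" "$last_word"
```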
You might commonly use this syntax to trim filenames:
${A##*/} removes all containing folders, if any, from the start of the path, e.g.
/usr/bin/git -> git
/usr/bin/ -> (empty string)
${A%/*} removes the last file/folder/trailing slash, if any, from the end:
/usr/bin/git -> /usr/bin
/usr/bin/ -> /usr/bin
${A%.*} removes the last extension, if any (just be wary of things like my.path/noext):
archive.tar.gz -> archive.tar
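A short sketch of these path-trimming patterns, including `${file##*.}` for the extension itself. The variable names are illustrative only. Note one difference from the `dirname` command: for a name with no slash, `${bare%/*}` leaves it unchanged, whereas `dirname` would print `.`.

```shell
#!/usr/bin/env bash
path=/usr/bin/git
file=archive.tar.gz
bare=git
name=${path##*/}      # everything after the last slash (like basename)
dir=${path%/*}        # everything before the last slash
stem=${file%.*}       # last extension removed
ext=${file##*.}       # only the last extension
bare_dir=${bare%/*}   # no slash, so nothing is removed
printf '%s\n' "$name" "$dir" "$stem" "$ext" "$bare_dir"
```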
How do you know where the value begins? If it always starts at the 5th word, you could use e.g.:
B=$(echo "$A" | cut -d ' ' -f 5-)
This uses the cut command to slice out part of the line, using a simple space as the word delimiter.
As pointed out by Zedfoxus here, this is a very clean method that works on most Unix-like systems (it needs rev, which is not part of POSIX but is widely available). Besides, you don't need to know the exact position of the substring.
A="Some variable has value abc.123"
echo "$A" | rev | cut -d ' ' -f 1 | rev
# abc.123
More ways to do this:
(Run each of these commands in your terminal to test this live.)
For all answers below, start by typing this in your terminal:
A="Some variable has value abc.123"
The array example (#3 below) is a really useful pattern, and depending on what you are trying to do, sometimes the best.
1. with awk, as the main answer shows
echo "$A" | awk '{print $NF}'
2. with grep:
echo "$A" | grep -o '[^ ]*$'
the -o says to only retain the matching portion of the string
the [^ ] part says "don't match spaces"; ie: "not the space char"
the * means "match 0 or more instances of the preceding match pattern" (which is [^ ]), and the $ means "match the end of the line." So this matches the last word after the last space through to the end of the line, i.e. abc.123 in this case.
3. via regular bash "indexed" arrays and array indexing
Convert A to an array, with elements separated by the characters in IFS (the Internal Field Separator), which by default are space, tab and newline:
Option 1 (will "break in mysterious ways", as @tripleee put it in a comment here, if the string stored in the A variable contains certain special shell characters, so Option 2 below is recommended instead!):
# Capture space-separated words as separate elements in array A_array
A_array=($A)
Option 2 [RECOMMENDED!]. Use the read command, as I explain in my answer here, and as is recommended by the bash shellcheck static code analyzer tool for shell scripts, in ShellCheck rule SC2206, here.
# Capture space-separated words as separate elements in array A_array, using
# a "herestring".
# See my answer here: https://stackoverflow.com/a/71575442/4561887
IFS=" " read -r -a A_array <<< "$A"
Then, print only the last element in the array:
# Print only the last element via bash array right-hand-side indexing syntax
echo "${A_array[-1]}" # last element only
Output:
abc.123
Going further:
What also makes this pattern so useful is that it lets you easily do the opposite: obtain all words except the last one, like this:
array_len="${#A_array[@]}"
array_len_minus_one=$((array_len - 1))
echo "${A_array[@]:0:$array_len_minus_one}"
Output:
Some variable has value
For more on the ${array[@]:start:length} array slicing syntax above, see my answer here: Unix & Linux: Bash: slice of positional parameters, and for more information on the bash "Arithmetic Expansion" syntax, see here:
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Arithmetic-Expansion
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Shell-Arithmetic
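Since the offset and length in ${array[@]:offset:length} are themselves arithmetic expressions, the helper variables above can be folded into a single expansion. A sketch assuming a modern bash (the negative subscript does not work in bash 3.2, the version shipped with macOS):

```shell
#!/usr/bin/env bash
A="Some variable has value abc.123"
IFS=" " read -r -a A_array <<< "$A"
# Length is evaluated arithmetically, so ${#A_array[@]}-1 works inline.
# [*] joins the slice with the first character of IFS (a space by default).
all_but_last="${A_array[*]:0:${#A_array[@]}-1}"
last="${A_array[-1]}"
echo "$all_but_last"
echo "$last"
```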
You can use a Bash regex:
A="Some variable has value abc.123"
[[ $A =~ [[:blank:]]([^[:blank:]]+)$ ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Prints:
abc.123
That works with any [:blank:] delimiter in the current locale (usually space and tab). If you want to be more specific:
A="Some variable has value abc.123"
pat='[ ]([^ ]+)$'
[[ $A =~ $pat ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Or with Perl:
echo "Some variable has value abc.123" | perl -nE'say $1 if /(\S+)$/'

Is there a way to format the width of a substring within a string in a bash/sh script?

I have to format the width of a substring within a string using a bash script, but without using tokens or loops. A single character between two colons should be prepended by a 0 in order to match the standard width of 2 for each field.
For example,
from:
6:0:36:35:30:30:72:6c:73:0:c:52:4c:30:31:30:31:30:30:30:31:36:39:0:1:3
to
06:00:36:35:30:30:72:6c:73:00:0c:52:4c:30:31:30:31:30:30:30:31:36:39:00:01:03
How can I do this?
sed -r 's/\<([0-9a-f])\>/0\1/g'
Search and replace with a regex. \< and \> (a GNU sed extension) match word boundaries, so [0-9a-f] only matches a hex digit that stands alone between colons.
$ sed -r 's/\<([0-9a-f])\>/0\1/g' <<< "6:0:36:35:30:30:72:6c:73:0:c:52:4c:30:31:30:31:30:30:30:31:36:39:0:1:3"
06:00:36:35:30:30:72:6c:73:00:0c:52:4c:30:31:30:31:30:30:30:31:36:39:00:01:03
awk -F: -v OFS=: '{for(i=1;i<=NF;i++) if(length($i)==1)gsub($i,"0&",$i)}1' file
Output:
06:00:36:35:30:30:72:6c:73:00:0c:52:4c:30:31:30:31:30:30:30:31:36:39:00:01:03
This splits the line into fields separated by :, and if a field is exactly one character long, it replaces that field with 0 followed by the field.
Bash solution (note that it uses a loop, leaves a trailing colon and changes IFS for the rest of the shell):
IFS=:; for i in $string; do echo -n 0$i: | tail -c 3; done
With
str="6:0:36:35:30:30:72:6c:73:0:c:52:4c:30:31:30:31:30:30:30:31:36:39:0:1:3"
you can add a '0' to all tokens and remove those that are unwanted:
sed -r 's/0([0-9a-f]{2})/\1/g' <<< "0${str//:/:0}"
That doesn't feel right, making errors and repairing them.
A better alternative is
echo $(IFS=:; printf "%2s:" ${str} | tr " " "0")
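The printf idea can also be done entirely in bash, without tr and without the trailing colon, and still without an explicit loop (printf reuses its format for every argument). A sketch assuming bash, since `printf -v` and `read -a` are bash-specific:

```shell
#!/usr/bin/env bash
str="6:0:36:35:30:30:72:6c:73:0:c:52:4c:30:31:30:31:30:30:30:31:36:39:0:1:3"
# Split on colons into an array.
IFS=: read -r -a parts <<< "$str"
# Right-align each field in a width of 2 (space-padded), each followed by ":".
printf -v padded '%2s:' "${parts[@]}"
padded=${padded// /0}   # turn the padding spaces into zeros
padded=${padded%:}      # drop the trailing colon
echo "$padded"
```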

I want to extract the strings from file name

one_two_three_four_five.rtf
I need five in A variable
I need four in B variable
And remaining in C variable
It should be parsed from the end.
Everything before the second-to-last underscore goes into C, however many underscores it contains.
Is it possible?
For example, using parameter expansion:
#!/bin/ksh
string="one_two_three_four_five.rtf"
base=${string%.rtf}
a=${base##*_}; base=${base%_$a}
b=${base##*_}; base=${base%_$b}
c=$base
echo "$a - $b - $c"
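The same approach also works in bash, and it can be made independent of the extension by stripping `.*` instead of the literal `.rtf`. A sketch with a made-up file name that has more underscores; quoting `"$a"` inside the pattern makes it match literally even if the name contains glob characters:

```shell
#!/usr/bin/env bash
string="a_b_c_d_e_f.txt"
base=${string%.*}                    # strip whatever extension is present
a=${base##*_}; base=${base%_"$a"}    # last part
b=${base##*_}; base=${base%_"$b"}    # second-to-last part
c=$base                              # everything that is left
echo "$a - $b - $c"
```

With fewer than two underscores in the name, `b` and `c` will not be meaningful, so validate the input first if that can happen.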
s="one_two_three_four_five.rtf"
source <(sed -r 's/(.*)_([^_]*)_([^_]*)[.].*/C="\1"; B="\2";A="\3"/' <<< "${s}")
# Result:
echo "A=$A, B=$B, C=$C"
A=five, B=four, C=one_two_three
Explanation:
sed -r Use extended regular expressions, so the parentheses need no escaping backslashes
(.*)_ Matches largest string until underscore with the condition that there are underscores left for matching the remaining string
([^_]*) String without underscore
[.] A dot without special meaning
"\1" First captured group (likewise \2 and \3)
<<< "${s}" Input for sed is like echo "${s}" | sed ...
<(..) Process substitution makes sed's output look like a file, so source executes the generated assignments.

Print up to the second-to-last character in Unix

If the length of a string is 5, how can I print up to the 4th character of the string using shell scripting? I have stored the string in one variable and its length in another, but how can I print up to length - 1?
If you are using Bash, then it is fairly straightforward to remove the last character:
s="string1,string2,"
echo "${s%?}"
? matches any single character, so %? removes exactly one character from the right-hand side.
That will output:
string1,string2
Otherwise you can use this sed to remove last character:
echo "$s" | sed 's/.$//'
string1,string2
You can do it with bash "parameter substitution":
string=12345
new=${string:0:$((${#string}-1))}
echo $new
1234
where I am saying:
new=${string:a:b}
where:
a=0 (meaning starting from the first character)
and:
b=${#string}-1, i.e. the length of the string minus 1, evaluated in an arithmetic context inside `$((...))`
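The length field of ${string:offset:length} is already an arithmetic context, so the inner `$((...))` is optional. Bash 4.2 and later also accept a negative length, which counts back from the end. A sketch, where `n` is an illustrative variable holding how many characters to drop:

```shell
#!/usr/bin/env bash
string=12345
n=2
trim_one=${string:0:-1}           # negative length: drop the last character
trim_n=${string:0:${#string}-n}   # length evaluated arithmetically: drop n characters
trim_glob=${string%?}             # portable: remove one trailing character
printf '%s\n' "$trim_one" "$trim_n" "$trim_glob"
```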
str="something"
echo $str | cut -c1-$((${#str}-1))
will give result as
somethin
If you have two different variables, then you can try this also.
str="something"
strlen=9
echo $str | cut -c1-$((strlen-1))
cut -c1-8 prints from the first character to the eighth.
Just for fun:
When you have the string and length in vars already,
s="example"
slen=${#s}
you can use
printf "%.$((slen-1))s\n" "$s"
As @anubhava showed, you can also have a clean solution.
So do not try
rev <<< "${s}" | cut -c2- | rev

unix shell replace string twice (in one line)

I run a script with the parameter -A AA/BB. To get an array containing AA and BB, I can do this:
INPUT_PARAM=(${AIRLINE_OPTION//-A / }) #get rid of the '-A ' in the begining
LIST=(${AIRLINES_PARAM//\// }) # split by '/'
Can we achieve this in a single line?
Thanks in advance.
One way
IFS=/ read -r -a LIST <<< "${AIRLINE_OPTION//-A /}"
This places the output from the parameter substitution ${AIRLINE_OPTION//-A /} into a "here-string" and uses the bash read built-in to parse this into an array. Splitting by / is achieved by setting the value of IFS to / for the read command.
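A quick check of this approach with three values. `${AIRLINE_OPTION#-A }` is used here instead of `//`, so only a leading "-A " is removed and an "-A " later in the value would be left alone (a sketch assuming bash):

```shell
#!/usr/bin/env bash
AIRLINE_OPTION='-A AA/BB/CC'
# Strip the "-A " prefix, then split the rest on "/" into the LIST array.
IFS=/ read -r -a LIST <<< "${AIRLINE_OPTION#-A }"
printf '%s\n' "${LIST[@]}"
```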
LIST=( $(IFS=/; for x in ${AIRLINE_OPTION#-A }; do printf '%s ' "$x"; done) )
This avoids read -a, but if your read supports -a and you don't mind portability, then you should go for @1_CR's solution.
With awk, for example, you can create an array and store it in LIST variable:
$ LIST=($(awk -F"[\/ ]" '{print $2,$3}' <<< "-A AA/BB"))
Result:
$ echo ${LIST[0]}
AA
$ echo ${LIST[1]}
BB
Explanation
-F"[\/ ]" defines two possible field separators: a space or a slash /.
'{print $2,$3}' prints the 2nd and 3rd fields based on those separators, joined by a space.
