String separation in unix shell - bash

STRINGS=str1,str2,str3
I want each string separated by a space, as shown below:
STRING1=`echo $STRINGS | cut -d',' -f1` ==> gives "str1"
REMAINING_STRING=`echo $STRINGS | cut -d',' -f2- |
sed -e 's/,/ /g'` ==> gives "str2 str3"
But when the string contains only one entry, e.g. STRINGS=str1, REMAINING_STRING is populated with the same value as STRING1. I want REMAINING_STRING to be empty when STRINGS contains only one entry.
STRINGS=str1
STRING1=`echo $STRINGS| cut -d',' -f1` ==> gives "str1"
REMAINING_STRING=`echo $STRINGS | cut -d',' -f2- | sed -e 's/,/ /g'`
==> gives "str1", but this should be empty.
How can I do this in a unix shell?

$ STRINGS=str1
$ echo $STRINGS | cut -d',' -f2- | sed -e 's/,/ /g'
str1
$ echo $STRINGS | cut -s -d',' -f2- | sed -e 's/,/ /g'
$
Explanation of -s from the man page.
-s, --only-delimited
do not print lines not containing delimiters
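Putting the two cases side by side, here is a minimal sketch. The function name rest_of is made up for illustration, and tr stands in for the sed substitution used above:

```shell
#!/bin/bash
# rest_of is a hypothetical helper name, not part of the original answer.
# It prints every field after the first comma, separated by spaces.
# Because of cut -s, input without a comma produces no output at all.
rest_of() {
    printf '%s\n' "$1" | cut -s -d',' -f2- | tr ',' ' '
}

many=$(rest_of "str1,str2,str3")
one=$(rest_of "str1")
echo "many=[$many] one=[$one]"
```

With a single entry, the variable one ends up empty, which is what the question asks for.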

Use the -a flag of the read command, to split the string into an array. Example:
$ cat script.sh
#!/bin/bash
strings=str1,str2,str3
IFS=, read -ra arr <<< "$strings"
echo "First element: ${arr[0]}"
echo "Second element: ${arr[1]}"
strings=str1
IFS=, read -ra arr <<< "$strings"
echo "First element: ${arr[0]}"
echo "Second element: ${arr[1]}"
$ ./script.sh
First element: str1
Second element: str2
First element: str1
Second element:
The alternative method of splitting the string with
IFS=, arr=($strings)
will also work for most strings, but will fail if the string triggers pathname expansion. E.g. arr=(*) would match all files in the current directory (as konsolebox noted).
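To get the "first" and "rest" values the question asks for, the read -ra approach could be wrapped roughly like this. It is a sketch only: the helper name split_first_rest and the variables first and rest are assumptions, and IFS is assumed to have its default value when the function is called:

```shell
#!/bin/bash
# split_first_rest is a hypothetical helper; it sets the globals first and rest.
split_first_rest() {
    local arr
    IFS=, read -ra arr <<< "$1"   # IFS is changed only for this read
    first=${arr[0]}
    rest="${arr[*]:1}"            # remaining elements, joined by a space
}

split_first_rest 'str1,str2,str3'; echo "first=[$first] rest=[$rest]"
split_first_rest 'str1';           echo "first=[$first] rest=[$rest]"
```

Because read does the splitting, a field such as * is kept literally and is not expanded to file names.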

Extension of user000001's answer:
$ cat strings.sh
#!/bin/bash
function splitstr {
local a
set -f
IFS=, a=($1); str_first=$a; unset a[0]; str_rest="${a[@]}"
set +f
}
splitstr 'one,*,three,four'; echo -e "<$str_first>\t<$str_rest>"
splitstr 'one'; echo -e "<$str_first>\t<$str_rest>"
splitstr ''; echo -e "<$str_first>\t<$str_rest>"
$ ./strings.sh
<one> <* three four>
<one> <>
<> <>

Related

How to get version number from string in bash

I have a variable having following format
bundle="chn-pro-X.Y-Z.el8.x86_64"
X,Y,Z are numbers having any number of digits
Ex:
1.0-2 # X=1 Y=0 Z=2
12.45-9874 # X=12 Y=45 Z=9874
How can I grab X.Y and store it in another variable?
EDIT:
I wasn't right with my wording, but
I want to store X.Y into new variable not individual X & Y's
I'm looking to finally have a variable version which has X.Y grabbed from bundle:
version="X.Y"
I would use awk:
bundle="chn-pro-12.45-9874.el8.x86_64"
echo "$bundle" | awk -F "[.-]" '{print $3,$4,$5}'
12 45 9874
Now if you want to assign to x, y, z use read and process substitution:
read -r x y z < <(echo "$bundle" | awk -F "[.-]" '{print $3,$4,$5}')
echo "x=$x, y=$y, z=$z"
x=12, y=45, z=9874
If you just want X.Y as a single value, this is still a great use for awk:
bundle="chn-pro-12.45-9874.el8.x86_64"
echo "$bundle" | awk -F "[-]" '{print $3}'
12.45
And if you then want to put that into a variable:
x_y=$(echo "$bundle" | awk -F "[-]" '{print $3}')
echo "x_y=$x_y"
x_y=12.45
Or you can use cut in this case to get the third field:
echo "$bundle" | cut -d- -f3
12.45
Like that:
$ bundle="chn-pro-1.0-2.el8.x86_64"
$ X="$(echo "$bundle" | cut -d . -f1 | cut -d- -f3)"
$ Y="$(echo "$bundle" | cut -d . -f2 | cut -d- -f1)"
$ Z="$(echo "$bundle" | cut -d . -f2 | cut -d- -f2)"
$ echo "$X"
1
$ echo "$Y"
0
$ echo "$Z"
2
You can merge X and Y into a single variable:
$ XY="$X.$Y"
$ echo $XY
1.0
Use regex to separate numbers:
numbers=$(echo $bundle | grep -Eo '([0-9]+\.[0-9]+\-[0-9]+)' | sed 's/\./\t/g;s/\-/\t/g')
Then assign them to variables using awk, tr, or cut, whichever you prefer:
X=$(echo $numbers| awk '{print $1}')
Y=$(echo $numbers| awk '{print $2}')
Z=$(echo $numbers| awk '{print $3}')
EDIT
To store X.Y in a single version variable, you can skip the previous commands:
version=$(echo $bundle | grep -Eo '([0-9]+\.[0-9]+\-[0-9]+)' | grep -Eo '([0-9]+\.[0-9]+)')
Given this input:
$ bundle="chn-pro-12.45-9874.el8.x86_64"
using GNU or BSD sed for -E:
$ foo=$(echo "$bundle" | sed -E 's/.*-([0-9]+\.[0-9]+)-[0-9].*/\1/')
$ echo "$foo"
12.45
or with any sed:
$ foo=$(echo "$bundle" | sed 's/.*-\([0-9][0-9]*\.[0-9][0-9]*\)-[0-9].*/\1/')
$ echo "$foo"
12.45
Assumptions:
the input string will always contain (at least) 3 hyphens
the desired version string will always reside between the 2nd and 3rd hyphens of the input string
we need to maintain the input string (ie, don't clobber/overwrite the variable containing the input string)
We can eliminate the subprocess calls (necessary for echo/sed/grep/awk) by using some parameter expansions:
$ bundle="chn-pro-X.Y-Z.el8.x86_64"
$ temp="${bundle#*-}" # strip off 1st hyphen delimited string
$ echo "${temp}"
pro-X.Y-Z.el8.x86_64
$ temp="${temp#*-}" # strip off 2nd hyphen delimited string
$ echo "${temp}"
X.Y-Z.el8.x86_64
$ version="${temp%%-*}" # save 3rd hyphen delimited string (aka our version)
$ echo "${version}"
X.Y
NOTE: We can eliminate the temp variable by replacing all occurrences of temp with version with the understanding version does not contain what we want until after the 3rd parameter expansion has occurred, eg:
$ bundle="chn-pro-X.Y-Z.el8.x86_64"
$ version="${bundle#*-}"
$ version="${version#*-}"
$ version="${version%%-*}"
$ echo "${version}"
X.Y
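Another subprocess-free option is bash's =~ operator. The sketch below assumes the bundle always contains -X.Y-Z. with X, Y and Z made of digits only; the variable names re, x, y and z are chosen here for illustration:

```shell
#!/bin/bash
bundle="chn-pro-12.45-9874.el8.x86_64"
# Assumed layout: -X.Y-Z. where X, Y and Z consist of digits only.
re='-([0-9]+)\.([0-9]+)-([0-9]+)\.'
if [[ $bundle =~ $re ]]; then
    x=${BASH_REMATCH[1]}
    y=${BASH_REMATCH[2]}
    z=${BASH_REMATCH[3]}
    version="$x.$y"
fi
echo "version=$version x=$x y=$y z=$z"
```

Keeping the pattern in a variable avoids quoting surprises inside [[ ]].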

Extract data between delimiters from a Shell Script variable

I have a shell script variable, var. It holds 3 entries separated by newlines. From this variable I want to extract 2 and 0.078688, just these two numbers.
var="USER_ID=2
# 0.078688
Suhas"
This is the code I tried:
echo "$var" | grep -o -P '(?<=\=).*(?=\n)' # For extracting 2
echo "$var" | awk -v FS="(# |\n)" '{print $2}' # For extracting 0.078688
Neither of the above works. What is the problem here? How do I fix this?
Just use tr alone to retain the digits, the dot (.) and the space, and remove everything else.
tr -cd '0-9. ' <<<"$var"
2 0.078688
From the tr man page, on the -c and -d flags:
tr [OPTION]... SET1 [SET2]
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
To store it in variables,
IFS=' ' read -r var1 var2 < <(tr -cd '0-9. ' <<<"$var")
printf "%s\n" "$var1"
2
printf "%s\n" "$var2"
0.078688
Or in an array as
IFS=' ' read -ra numArray < <(tr -cd '0-9. ' <<<"$var")
printf "%s\n" "${numArray[@]}"
2
0.078688
Note: The -c and -d flags of tr are specified by POSIX and will work on any system that has tr installed.
echo "$var" |grep -oP 'USER_ID=\K.*'
2
echo "$var" |grep -oP '# \K.*'
0.078688
Your solution is nearly perfect; you need to change \n to $, which represents the end of a line.
echo "$var" |awk -F'# ' '/#/{print $2}'
0.078688
echo "$var" |awk -F'=' '/USER_ID/{print $2}'
2
You can do it with pure bash using a regex:
#!/bin/bash
var="USER_ID=2
# 0.078688
Suhas"
[[ ${var} =~ =([0-9]+).*#[[:space:]]([0-9\.]+) ]] && result1="${BASH_REMATCH[1]}" && result2="${BASH_REMATCH[2]}"
echo "${result1}"
echo "${result2}"
With awk:
First value:
echo "$var" | grep 'USER_ID' | awk -F "=" '{print $2}'
Second value:
echo "$var" | grep '#' | awk '{print $2}'
Assuming the data is formatted as in your sample:
# For extracting 2
echo "$var" | sed -e '/.*=/!d' -e 's///'
echo "$var" | awk -F '=' 'NR==1{ print $2}'
# For extracting 0.078688
echo "$var" | sed -e '/.*#[[:blank:]]*/!d' -e 's///'
echo "$var" | awk -F '#' 'NR==2{ print $2}'
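The same extraction can also be done without external tools by reading the variable line by line. This is a sketch that assumes the exact layout of the sample (a USER_ID= line and a line starting with "# "); the variable names user_id and value are made up here:

```shell
#!/bin/bash
var="USER_ID=2
# 0.078688
Suhas"

# Walk over the lines and strip the known prefix from the ones we care about.
while IFS= read -r line; do
    case $line in
        USER_ID=*) user_id=${line#USER_ID=} ;;
        '# '*)     value=${line#'# '} ;;
    esac
done <<< "$var"

echo "user_id=$user_id value=$value"
```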

how do I split a string on the nth delimiter?

For every line in my file, I want to print everything on that line before the 4th dash.
Input:
TCGA-HC-8216-10A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01
and I want to split each line on the fourth dash "-"
Output:
TCGA-HC-8216-10A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A
I know I can split on every dash like this:
#!/usr/bin/env bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
arr=$(echo $IN | tr "-" "\n")
for x in $arr
do
echo "> [$x]"
done
but this splits and prints each part of the string between every dash.
Use cut
cut -d- -f1-4 <<'EOF'
TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01
EOF
You are cutting your input on -d (delimiter) of - and returning -f (fields) 1-4, one through four.
#!/bin/bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
arr=$(echo "$IN" | cut -d '-' -f1-4)
echo "$arr"
Prints:
TCGA-HC-8216-01A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A
Using pure bash and pattern matching:
#!/bin/bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
re='([^-]+-){3}[^-]+'
for line in $IN
do
if [[ $line =~ $re ]]; then
trunc=${BASH_REMATCH[0]}
fi
echo "$trunc"
done
Output:
TCGA-HC-8216-01A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A
Using grep with ERE:
arr=$(echo "$IN" | grep -oE "^([^-]*-){3}[^-]*")
With BRE:
arr=$(echo "$IN" | grep -o "^\([^-]*-\)\{3\}[^-]*")
Example:
#!/bin/bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
arr=$(echo "$IN" | grep -oE "^([^-]*-){3}[^-]*")
for x in $arr
do
echo "> [$x]"
done
Output:
> [TCGA-HC-8216-01A]
> [TCGA-J4-8200-10A]
> [TCGA-EJ-A65E-10A]
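If the number of fields needs to be a parameter, a bash-only variant could look like the sketch below. The helper name first_fields is hypothetical; it splits on dashes with read -ra and joins the first N parts back together:

```shell
#!/bin/bash
# first_fields is a hypothetical helper: first_fields N LINE prints the
# first N dash-separated fields of LINE, joined by dashes again.
first_fields() {
    local n=$1 parts
    local IFS=-                      # used for both splitting and joining
    read -ra parts <<< "$2"
    printf '%s\n' "${parts[*]:0:n}"
}

while IFS= read -r line; do
    first_fields 4 "$line"
done <<'EOF'
TCGA-HC-8216-10A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
EOF
```

For whole files, cut -d- -f1-4 remains the simpler choice; the function is mainly useful when the value is already in a shell variable.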

replacement for cut --output-delimiter

I created a script that was using
cut -d',' -f- --output-delimiter=$'\n'
to put each comma-separated value on its own line in RHEL 5, e.g.
[root]# var="hi,hello how,are you,doing"
[root]# echo $var
hi,hello how,are you,doing
[root]# echo $var|cut -d',' -f- --output-delimiter=$'\n'
hi
hello how
are you
doing
But unfortunately when I run the same command in Solaris 10, it doesn't work at all :( !
bash-3.00# var="hi,hello how,are you,doing"
bash-3.00# echo $var
hi,hello how,are you,doing
bash-3.00# echo $var|cut -d',' -f- --output-delimiter=$'\n'
cut: illegal option -- output-delimiter=
usage: cut -b list [-n] [filename ...]
cut -c list [filename ...]
cut -f list [-d delim] [-s] [filename]
I checked the man page for 'cut' and alas there is no ' --output-delimiter ' in there !
So how do I achieve this in Solaris 10 (bash)? I guess awk would be a solution, but I'm unable to frame up the options properly.
Note: The comma separated variables might have " " space in them.
What about using tr for this?
$ tr ',' '\n' <<< "$var"
hi
hello how
are you
doing
or
$ echo $var | tr ',' '\n'
hi
hello how
are you
doing
With sed:
$ sed 's/,/\n/g' <<< "$var"
hi
hello how
are you
doing
Or with awk:
$ awk '1' RS=, <<< "$var"
hi
hello how
are you
doing
Perhaps do it in bash itself?
var="hi,hello how,are you,doing"
printf '%s\n' "$var" | (IFS=, read -r -a arr; printf "%s\n" "${arr[@]}")
hi
hello how
are you
doing
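A parameter expansion can do the replacement without any pipe or subshell. This is a sketch; the variable names nl and lines are made up here, and the newline is kept in its own variable on the assumption that this is the safest form for an old bash such as the 3.00 shown in the question:

```shell
#!/bin/bash
var="hi,hello how,are you,doing"
nl=$'\n'                 # a single newline character
lines=${var//,/$nl}      # replace every comma with a newline
printf '%s\n' "$lines"
```

Spaces inside the values are left untouched, since only commas are replaced.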

Redirect output to a bash array

I have a file containing the string
ipAddress=10.78.90.137;10.78.90.149
I'd like to place these two IP addresses in a bash array. To achieve that I tried the following:
n=$(grep -i ipaddress /opt/ipfile | cut -d'=' -f2 | tr ';' ' ')
This results in extracting the values alright but for some reason the size of the array is returned as 1 and I notice that both the values are identified as the first element in the array. That is
echo ${n[0]}
returns
10.78.90.137 10.78.90.149
How do I fix this?
Thanks for the help!
Do you really need an array?
bash
$ ipAddress="10.78.90.137;10.78.90.149"
$ IFS=";"
$ set -- $ipAddress
$ echo $1
10.78.90.137
$ echo $2
10.78.90.149
$ unset IFS
$ echo $@ #this is "array"
if you want to put into array
$ a=( $@ )
$ echo ${a[0]}
10.78.90.137
$ echo ${a[1]}
10.78.90.149
@OP, regarding your method: set your IFS to a space
$ IFS=" "
$ n=( $(grep -i ipaddress file | cut -d'=' -f2 | tr ';' ' ' | sed 's/"//g' ) )
$ echo ${n[1]}
10.78.90.149
$ echo ${n[0]}
10.78.90.137
$ unset IFS
Also, there is no need to use so many tools. You can just use awk, or simply the bash shell:
#!/bin/bash
declare -a arr
while IFS="=" read -r caption addresses
do
case "$caption" in
ipAddress*)
addresses=${addresses//[\"]/}
arr=( ${arr[@]} ${addresses//;/ } )
esac
done < "file"
echo ${arr[@]}
output
$ more file
foo
bar
ipAddress="10.78.91.138;10.78.90.150;10.77.1.101"
foo1
ipAddress="10.78.90.137;10.78.90.149"
bar1
$./shell.sh
10.78.91.138 10.78.90.150 10.77.1.101 10.78.90.137 10.78.90.149
gawk
$ n=( $(gawk -F"=" '/ipAddress/{gsub(/\"/,"",$2);gsub(/;/," ",$2) ;printf $2" "}' file) )
$ echo ${n[@]}
10.78.91.138 10.78.90.150 10.77.1.101 10.78.90.137 10.78.90.149
This one works:
n=(`grep -i ipaddress filename | cut -d"=" -f2 | tr ';' ' '`)
EDIT: (improved, nestable version as per Dennis)
n=($(grep -i ipaddress filename | cut -d"=" -f2 | tr ';' ' '))
A variation on a theme:
$ line=$(grep -i ipaddress /opt/ipfile)
$ saveIFS="$IFS" # always save it and put it back to be safe
$ IFS="=;"
$ n=($line)
$ IFS="$saveIFS"
$ echo ${n[0]}
ipAddress
$ echo ${n[1]}
10.78.90.137
$ echo ${n[2]}
10.78.90.149
If the file has no other contents, you may not need the grep and you could read in the whole file.
$ saveIFS="$IFS"
$ IFS="=;"
$ n=($(</opt/ipfile))
$ IFS="$saveIFS"
A Perl solution:
n=($(perl -ne 's/ipAddress=(.*);/$1 / && print' filename))
which tests for and removes the unwanted characters in one operation.
You can do this by using IFS in bash.
First, read the first line from the file.
Second, convert that to an array with = as the delimiter.
Third, convert the value to an array with ; as the delimiter.
That's it!
#!/bin/bash
IFS= read -r lstr < "a.txt"
IFS='=' read -r -a lstr_arr <<< "$lstr"
IFS=';' read -r -a ip_arr <<< "${lstr_arr[1]}"
echo "${ip_arr[0]}"
echo "${ip_arr[1]}"
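To try this end to end without touching /opt/ipfile, the sketch below writes the sample line to a temporary file first. The temporary file and the variable names tmpfile, line and ips are assumptions made for the demo:

```shell
#!/bin/bash
# Demo input: a temporary file standing in for the real ipfile.
tmpfile=$(mktemp)
printf 'ipAddress=10.78.90.137;10.78.90.149\n' > "$tmpfile"

# Pick the matching line, drop everything up to the first '=',
# then split the remainder on ';' into a real array.
line=$(grep -i '^ipaddress=' "$tmpfile")
IFS=';' read -ra ips <<< "${line#*=}"
rm -f "$tmpfile"

echo "count=${#ips[@]}"
printf '%s\n' "${ips[@]}"
```

Here ${#ips[@]} is 2, which confirms that the addresses landed in separate elements rather than in one string.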

Resources