Splitting string into array - bash

I want to split a string and build an array from it. I tried the code below:
myString="first column:second column:third column"
set -A myArray `echo $myString | awk 'BEGIN{FS=":"}{for (i=1; i<=NF; i++) print $i}'`
# Following is just to make sure that array is constructed properly
i=0
while [ $i -lt ${#myArray[@]} ]
do
echo "Element $i:${myArray[$i]}"
(( i=i+1 ))
done
exit 0
It produces the following result:
Element 0:first
Element 1:column
Element 2:second
Element 3:column
Element 4:third
Element 5:column
This is not what I want it to be. When I construct the array, I want that array to contain only three elements.
Element 0:first column
Element 1:second column
Element 2:third column
Can you please advise?

Here is how I would approach this problem: use the IFS variable to tell the shell (bash) that you want to split the string into colon-separated tokens.
$ cat split.sh
#!/bin/bash
# Script to split fields into tokens
# Here is the string where tokens are separated by colons
s="first column:second column:third column"
IFS=":" # Set the field separator
set -- $s # Breaks the string into $1, $2, ...
i=0
for item # A for loop by default loops through $1, $2, ...
do
echo "Element $i: $item"
((i++)) # (( )) is a bash feature, hence #!/bin/bash rather than #!/bin/sh
done
Run it:
$ ./split.sh
Element 0: first column
Element 1: second column
Element 2: third column
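The script above changes IFS for the rest of the script and leaves globbing on. The variant below assumes bash. It keeps the IFS change local to a function, so the caller's word splitting is unaffected. It also uses set -f so that a field such as * is not expanded to file names. split_colons and fields are illustrative names, not part of the original answer.

```shell
#!/bin/bash
# Hypothetical helper: split $1 on colons into the global array "fields"
# without changing the caller's IFS.
split_colons() {
    local IFS=':'   # this IFS is discarded when the function returns
    set -f          # turn off globbing so a field such as "*" stays literal
    fields=( $1 )   # unquoted on purpose: word splitting uses IFS=':'
    set +f          # note: re-enables globbing unconditionally
}

split_colons "first column:second column:third column"
for i in "${!fields[@]}"; do
    echo "Element $i: ${fields[i]}"
done
```

If the surrounding script already runs with globbing disabled, drop the set +f line or save and restore the option instead.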

If you definitely want to use arrays in Bash, you can try this way:
$ myString="first column:second column:third column"
$ myString="${myString//:/ }" # replace all the colons with spaces
$ echo "${myString}"
first column second column third column
$ read -a myArr <<<$myString
$ echo ${myArr[@]}
first column second column third column
$ echo ${myArr[1]}
column
$ echo ${myArr[2]}
second
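Note that once the colons are replaced with spaces, read also splits on the spaces inside each field. As a result, myArr has six elements rather than three. Otherwise, the "better" method is to use awk entirely.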
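For comparison, here is a sketch that gets the three elements the question asks for. It lets read split on the colon instead of on spaces. The IFS=':' prefix applies only to that one read command.

```shell
#!/bin/bash
myString="first column:second column:third column"
# IFS=':' is a prefix assignment: it is in effect only for this read
IFS=':' read -r -a myArr <<< "$myString"
echo "${#myArr[@]}"   # → 3
echo "${myArr[1]}"    # → second column
```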

Note that saving and restoring IFS, as I have often seen in these solutions, has a side effect. If IFS wasn't set, it ends up set to an empty string, which causes weird problems with subsequent splitting.
Here's the solution I came up with. It is based on Anton Olsen's and extended to handle more than two values separated by colons. It handles values in the list that contain spaces correctly, without splitting on the space.
colon_list=${1} # colon-separated list to split
i=0
while true ; do
part=${colon_list%%:*} # Delete longest substring match from back
parts[i++]=$part
# We are done when there is no more colon (checked before stripping,
# so a list like "a:a" still yields two parts)
if [[ $colon_list != *:* ]] ; then
break
fi
colon_list=${colon_list#*:} # Delete shortest substring match from front
done
# Show we've split the list
for part in "${parts[@]}"; do
echo "$part"
done
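The note at the start of this answer can be handled by recording whether IFS was set at all before changing it. Afterwards, restore the old value, or unset IFS again if it was never set. A minimal sketch; ifs_was_set and saved_ifs are illustrative names.

```shell
#!/bin/bash
colon_list="first column:second column:third column"

# Remember whether IFS was set at all, not just its value.
if [ -n "${IFS+set}" ]; then
    ifs_was_set=1
    saved_ifs=$IFS
else
    ifs_was_set=0
fi

IFS=':'
set -f                  # no globbing while splitting
parts=( $colon_list )   # unquoted on purpose: split on IFS
set +f

# Put IFS back exactly as it was: restore the value, or unset it again.
if [ "$ifs_was_set" = 1 ]; then
    IFS=$saved_ifs
else
    unset IFS
fi
```

An unset IFS behaves like the default space/tab/newline, whereas an empty IFS disables splitting altogether. That difference is why the unset case has to be handled separately.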

Ksh or Bash
#!/bin/bash
myString="first column:second column:third column"
IFS=: A=( $myString ) # note: IFS stays set to ":" for the rest of the script
echo ${A[0]}
echo ${A[1]}

Looks like you've already found the solution, but note that you can do away with awk entirely:
myString="first column:second column:third column"
OIFS="$IFS"
IFS=':'
myArray=($myString)
IFS=$OIFS
i=0
while [ $i -lt ${#myArray[@]} ]
do
echo "Element $i:${myArray[$i]}"
(( i=i+1 ))
done

Related

Setting changing env variable from associative array [duplicate]

I'm attempting to read an input file line by line which contains fields delimited by periods.
I want to put them into an array of arrays so I can loop through them later on. The input appears to be ok, but 'pushing' that onto the array (inData) doesn't appear to be working.
Input file:
GSDB.GOSALESDW_DIST_INVENTORY_FACT.MONTH_KEY
GSDB.GOSALESDW_DIST_INVENTORY_FACT.ORGANIZATION_KEY
The code:
infile=${1}
OIFS=$IFS
IFS=":"
cat ${infile} | while read line
do
line=${line//\./:}
inarray=(${line})
# echo ${inarray[@]}
# echo ${#inarray[@]}
# echo ${inarray[0]}
# echo ${inarray[1]}
# echo ${inarray[2]}
inData=("${inData[@]}" "${inarray[@]}")
done
IFS=$OIFS
echo ${#inData[@]}
for ((i = 0; i < ${#inData[@]}; i++))
do
echo $i
for ((j = 0; j < ${#inData[$i][@]}; j++))
do
echo ${inData[$i][$j]}
done
done
Bash doesn't support nested arrays, but you can work around that; see the example.
#!/bin/bash
# requires bash 4 or later; on macOS, /bin/bash is version 3.x,
# so need to install bash 4 or 5 using e.g. https://brew.sh
declare -a pages
pages[0]='domain.de;de;https'
pages[1]='domain.fr;fr;http'
for page in "${pages[@]}"
do
# turn e.g. 'domain.de;de;https' into
# array ['domain.de', 'de', 'https']
IFS=";" read -r -a arr <<< "${page}"
site="${arr[0]}"
lang="${arr[1]}"
prot="${arr[2]}"
echo "site : ${site}"
echo "lang : ${lang}"
echo "prot : ${prot}"
echo
done
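Applied to the input file from the question, one sketch is to store each line whole as one element of a plain indexed array and split it on "." only when the fields are needed. It reads with a redirection instead of cat ... | while. In bash, each part of a pipeline runs in a subshell, so an array filled inside cat ... | while is gone once the loop ends. load_rows and rows are illustrative names; the here-document stands in for the real file.

```shell
#!/bin/bash
# Hypothetical helper: read stdin into the global array "rows",
# one whole line per element.
load_rows() {
    rows=()
    local line
    # "|| [ -n "$line" ]" keeps a last line that has no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        rows+=( "$line" )
    done
}

# With a real file this would be: load_rows < "$infile"
load_rows <<'EOF'
GSDB.GOSALESDW_DIST_INVENTORY_FACT.MONTH_KEY
GSDB.GOSALESDW_DIST_INVENTORY_FACT.ORGANIZATION_KEY
EOF

# Split a row on "." only when its fields are needed
for i in "${!rows[@]}"; do
    IFS='.' read -r -a fields <<< "${rows[i]}"
    for j in "${!fields[@]}"; do
        echo "[$i][$j] = ${fields[j]}"
    done
done
```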
Bash has no support for multidimensional arrays. Try
array=(a b c d)
echo ${array[1]}
echo ${array[1][3]}
echo ${array[1]exit}
For tricks on how to simulate them, see the Advanced Bash-Scripting Guide.
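One common way to simulate a 2-D array, assuming bash 4+ for associative arrays, is to encode both indices in a single key such as "row,col". The names grid, nrows and ncols below are illustrative.

```shell
#!/bin/bash
declare -A grid=()        # associative array: keys are "row,col" strings
nrows=2 ncols=3
for ((r = 0; r < nrows; r++)); do
    for ((c = 0; c < ncols; c++)); do
        grid[$r,$c]="cell $r.$c"
    done
done
echo "${grid[1,2]}"       # → cell 1.2
```

Because the key is just a string, nothing stops you from using a different separator. However, the key does not know the dimensions, so you have to track nrows and ncols yourself.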
Since you can split a string into an "array", you can create a list of lists, for example a list of databases on DB servers.
dbServersList=('db001:app001,app002,app003' 'db002:app004,app005' 'dbcentral:central')
# Loop over DB servers
for someDbServer in "${dbServersList[@]}"
do
# delete previous array/list (this is crucial!)
unset dbNamesList
# split sub-list if available
if [[ $someDbServer == *":"* ]]
then
# split server name from sub-list
tmpServerArray=(${someDbServer//:/ })
someDbServer=${tmpServerArray[0]}
dbNamesList=${tmpServerArray[1]}
# make array from simple string
dbNamesList=(${dbNamesList//,/ })
fi
# Info
echo -e "\n----\n$someDbServer\n--"
# Loop over databases
for someDB in "${dbNamesList[@]}"
do
echo $someDB
done
done
Output of above would be:
----
db001
--
app001
app002
app003
----
db002
--
app004
app005
----
dbcentral
--
central
I struggled with this but found an uncomfortable compromise. In general, when faced with a problem whose solution involves using data structures in Bash, you should switch to another language like Python. Ignoring that advice and moving right along:
My use cases usually involve lists of lists (or arrays of arrays) and looping over them. You usually don't want to nest much deeper than that. Also, most of the arrays are strings that may or may not contain spaces, but usually don't contain special characters. This allows me to use not-too-confusing syntax to express the outer array and then use normal bash processing on the strings to get a second list or array. You will need to pay attention to your IFS delimiter, obviously.
Thus, associative arrays can give me a way to create a list of lists like:
declare -A JOB_LIST=(
[job1]="a set of arguments"
[job2]="another different list"
# ...
)
This allows you to iterate over both arrays like:
for job in "${!JOB_LIST[@]}"; do
/bin/jobrun ${JOB_LIST[$job]} # unquoted on purpose: split the string into separate arguments
done
Ah, except that the keys list (using the magical ${!...}) comes back in no particular order, so you will not traverse your list in the order you wrote it. Therefore, one more necessary hack is to sort the keys, if the order is important to you. The sort order is up to you. I find alphanumerical sorting convenient, and resorting to key names like aajob1 bbjob3 ccjob6 is perfectly acceptable.
Therefore
declare -A JOB_LIST=(
[aajob1]="a set of arguments"
[bbjob2]="another different list"
# ...
)
sorted=($(printf '%s\n' "${!JOB_LIST[@]}" | sort))
for job in "${sorted[@]}"; do
for arg in ${JOB_LIST[$job]}; do # unquoted on purpose: split the argument string
echo "Do something with ${arg} in ${job}"
done
done
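An alternative to prefixing keys and sorting them is to keep the insertion order in a second, plain indexed array next to the associative one. The sketch below assumes bash 4+. JOB_LIST comes from the answer above; JOB_ORDER and add_job are illustrative additions.

```shell
#!/bin/bash
declare -A JOB_LIST=()   # job name -> argument string
JOB_ORDER=()             # job names in the order they were added

# Hypothetical helper: add_job NAME "ARGS..."
add_job() {
    JOB_LIST[$1]=$2
    JOB_ORDER+=( "$1" )
}

add_job job1 "a set of arguments"
add_job job2 "another different list"

for job in "${JOB_ORDER[@]}"; do
    # split the argument string on whitespace into an array
    read -r -a args <<< "${JOB_LIST[$job]}"
    for arg in "${args[@]}"; do
        echo "Do something with ${arg} in ${job}"
    done
done
```

The cost is that adding or removing a job means updating both arrays, which is why the sketch routes additions through one helper.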
I use Associative Arrays and use :: in the key to denote depth.
The :: can also be used to embed attributes, but that is another subject.
declare -A __myArrayOfArray=([Array1::Var1]="Assignment" [Array2::Var1]="Assignment")
An Array under Array1
__myArrayOfArray[Array1::SubArray1::Var1]="Assignment"
The entries in any array can be retrieved in sorted order by:
local __sortedKeys=`echo ${!__myArrayOfArray[@]} | xargs -n1 | sort -u | xargs`
for __key in ${__sortedKeys}; do
#
# show all properties in the Subordinate Profile "Array1::SubArray1::"
if [[ ${__key} =~ ^Array1::SubArray1:: ]]; then
__property=${__key##Array1::SubArray1::}
if [[ ${__property} =~ :: ]]; then
echo "Property ${__property%%:*} is a Subordinate array"
else
echo "Property ${__property} is set to: ${__myArrayOfArray[${__key}]}"
fi
fi
done
The list of subordinate "Profiles" can be derived by:
declare -A __subordinateProfiles=()
local __profile
local __key
for __key in "${!__myArrayOfArray[@]}"; do
if [[ $__key =~ :: ]]; then
local __property=${__key##*:}
__profile=${__key%%:*}
__subordinateProfiles[${__profile}]=1
fi
done
A bash array of arrays is possible, if you convert and store each array as a string using declare -p (see my function stringify). This will properly handle spaces and any other problem characters in your arrays. When you extract an array, use function unstringify to rebuild the array. This script demonstrates an array of arrays:
#!/bin/bash
# BASH array of arrays demo
# Convert an array to a string that can be used to reform
# the array as a new variable. This allows functions to
# return arrays as strings. Works for arrays and associative
# arrays. Spaces and odd characters are all handled by bash
# declare.
# Usage: stringify variableName
# variableName - Name of the array variable e.g. "myArray",
# NOT the array contents.
# Returns (prints) the stringified version of the array.
# Examples. Use declare to make an array:
# declare -a myArray=( "O'Neal, Dan" "Kim, Mary Ann" )
# (Or to make a local variable replace declare with local.)
# Stringify myArray:
# stringifiedArray="$(stringify myArray)"
# Reform the array with any name like reformedArray:
# eval "$(unstringify reformedArray "$stringifiedArray")"
# To stringify an argument list "$@", first create the array
# with a name: declare -a myArgs=( "$@" )
stringify() {
declare -p "$1"
}
# Reform an array from a stringified array. Actually this prints
# the declare command to form the new array. You need to call
# eval with the result to make the array.
# Usage: eval "$(unstringify newArrayName stringifiedArray [local])"
# Adding the optional "local" will create a local variable
# (uses local instead of declare).
# Example to make array variable named reformedArray from
# stringifiedArray:
# eval "$(unstringify reformedArray "$stringifiedArray")"
unstringify() {
local cmd="declare"
[ -n "$3" ] && cmd="$3"
# This RE pattern extracts 2 things:
# 1: the array type, should be "-a" or "-A"
# 2: stringified contents of the array
# and skips "declare" and the original variable name.
local declareRE='^declare ([^ ]+) [^=]+=(.*)$'
if [[ "$2" =~ $declareRE ]]
then
printf '%s %s %s=%s\n' "$cmd" "${BASH_REMATCH[1]}" "$1" "${BASH_REMATCH[2]}"
else
echo "*** unstringify failed, invalid stringified array:" 1>&2
printf '%s\n' "$2" 1>&2
return 1
fi
}
# array of arrays demo
declare -a array # the array holding the arrays
declare -a row1=( "this is" "row 1" )
declare -a row2=( "row 2" "has problem chars" '!@#$%^*(*()-_=+[{]}"|\:;,.<.>?/' )
declare -a row3=( "$@" ) # row3 is the arguments to the script
# Fill the array with each row converted to a string.
# stringify needs the NAME OF THE VARIABLE, not the variable itself
array[0]="$(stringify row1)"
array[1]="$(stringify row2)"
array[2]="$(stringify row3)"
# Print array contents
for row in "${array[@]}"
do
echo "Expanding stringified row: $row"
# Reform the row as the array thisRow
eval "$(unstringify thisRow "$row")"
echo "Row values:"
for val in "${thisRow[@]}"
do
echo " '$val'"
done
done
You could make use of (de)referencing arrays like in this script:
#!/bin/bash
OFS=$IFS # store field separator
IFS="${2:- }" # define field separator (defaults to a space)
file=$1 # input file name
unset a # reference to line array
unset i j # index
unset m n # dimension
### input
i=0
while read line
do
a=A$i
unset $a
declare -a $a='($line)'
i=$((i+1))
done < "$file"
# store number of lines
m=$i
### output
for ((i=0; i < $m; i++))
do
a=A$i
# get line size
# double escape '\\' for sub shell '``' and 'echo'
n=`eval echo \\${#$a[@]}`
for (( j = 0; j < $n; j++))
do
# get field value
f=`eval echo \\${$a[$j]}`
# do something
echo "line $((i+1)) field $((j+1)) = '$f'"
done
done
IFS=$OFS
Credit to https://unix.stackexchange.com/questions/199348/dynamically-create-array-in-bash-with-variables-as-array-name
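On bash 4.3 or later, a name reference (declare -n) can replace the eval and double escaping used above. The sketch keeps the script's A0, A1, ... naming. It assumes the default whitespace IFS, and the here-document stands in for the input file.

```shell
#!/bin/bash
# Requires bash 4.3+ for "declare -n" (name references).
n=0
while read -r -a fields; do
    unset -n row                 # drop any previous reference first
    declare -n row="A$n"         # "row" now refers to A0, A1, ...
    row=( "${fields[@]}" )       # assigns the whole array A$n
    n=$((n + 1))
done <<'EOF'
alpha beta gamma
delta epsilon
EOF
unset -n row

for ((i = 0; i < n; i++)); do
    declare -n row="A$i"
    for j in "${!row[@]}"; do
        echo "line $((i+1)) field $((j+1)) = '${row[j]}'"
    done
    unset -n row
done
```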

parsing var with blank spaces

I have a variable
some_var="a23=some value&p44=another_value&uw=possibly_another one"
and I want to convert it into several substrings, one for each key=value pair (breaking at the &), so that I get
a23=some value
p44=another_value
uw=possibly_another one
If I run this code
for s in ${some_var//&/ };do echo $s;done
I get however
a23=some
value
p44=another_value
uw=possibly_another
one
(it also breaks at the spaces)
How can I run the loop so that it takes space into account?
Read into an array using '&' as the field separator:
#!/bin/bash
some_var='a23=some value&p44=another_value&uw=possibly_another one'
IFS='&' read -r -a arr <<< "$some_var"
for s in "${arr[@]}"; do
printf '%s\n' "$s"
done
A technique similar to yours also would work:
(IFS='&'; for s in $some_var; do printf '%s\n' "$s"; done)
Notice that it runs in a subshell so as not to mess with the IFS of the current shell.
You may consider reading this article for detailed information on IFS.
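Once the pairs are in an array, each one can be split into key and value with parameter expansion, cutting at the first "=". A minimal sketch; pairs, key and value are illustrative names.

```shell
#!/bin/bash
some_var='a23=some value&p44=another_value&uw=possibly_another one'
IFS='&' read -r -a pairs <<< "$some_var"
for pair in "${pairs[@]}"; do
    key=${pair%%=*}     # everything before the first "="
    value=${pair#*=}    # everything after the first "="
    printf '%s -> %s\n' "$key" "$value"
done
```

Cutting at the first "=" means a value may itself contain "=" characters without being truncated.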
Populating an associative array with the argument=value pairs:
#!/usr/bin/env bash
uri_argstring='a23=some value&p44=another_value&uw=possibly_another one'
# Declare an empty associative array
declare -A uri_args=()
# Populate the associative array by reading key values pairs
# delimiting field by &, = or newline,
# delimiting records (argument=value or key=value pairs) with & or End Of File
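# "|| [[ -n $k ]]" also handles the last pair, which ends at end of input
# rather than at "&" (read returns non-zero there but still fills k and v)
while IFS=$'=&\n' read -r -d '&' k v || [[ -n $k ]]; do
# Add entry to associative array
uri_args[$k]=$v
done <<<"$uri_argstring" # Feed here-string to the while loop reading
# Fancy printing
printf 'Argument=Value\n--------------\n'
# Iterate the keys from the uri_args associative array
for arg in "${!uri_args[@]}"; do
value=${uri_args[$arg]}
# Printout
printf '%s=%s\n' "$arg" "$value"
done
Output (associative arrays are unordered, so the lines may come out in a different order):
Argument=Value
--------------
a23=some value
p44=another_value
uw=possibly_another one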

Bash to split string and numbering it just like Python enumerate function

I found an interesting way to split a string using tr or IFS:
https://linuxhandbook.com/bash-split-string/
#!/bin/bash
#
# Script to split a string based on the delimiter
my_string="One;Two;Three"
my_array=($(echo $my_string | tr ";" "\n"))
#Print the split string
for i in "${my_array[@]}"
do
echo $i
done
Output
One
Two
Three
Based on this code, would it be possible to put a number in front of each string using Bash?
In Python, there is enumerate function to accomplish this.
number = ['One', 'Two', 'Three']
for i,j in enumerate(number, 1):
print(f'{i} - {j}')
Output
1 - One
2 - Two
3 - Three
I believe a similar trick can be done in the Bash shell, probably with awk or sed, but I just can't think of the solution right now.
I think you can just add something like count=$(($count+1))
#!/bin/bash
#
# Script to split a string based on the delimiter
my_string="One;Two;Three"
my_array=($(echo $my_string | tr ";" "\n"))
#Print the split string
count=0
for i in "${my_array[@]}"
do
count=$(($count+1))
echo "$count - $i"
done
This is a slightly modified version of @anubhava's answer.
my_string="One;Two;Three"
IFS=';' read -ra my_array <<< "$my_string"
# ${!array_name[@]} returns the indices/keys of the array
for i in "${!my_array[@]}"
do
echo "$((i+1)) - ${my_array[i]}"
done
From the bash manual,
It is possible to obtain the keys (indices) of an array as well as the values. ${!name[@]} and ${!name[*]} expand to the indices assigned in array variable name.
I saw you posted a question earlier today; sorry I failed to upload the code then, but I still hope this helps:
my_string="AA-BBB"
IFS='-' read -ra my_array <<< "$my_string"
len=${#my_array[@]}
for (( i=0; i<$len; i++ )); do
up=$(($i % 2))
#echo $up
if [ $up -eq 0 ]
then
echo ${my_array[i]} = '"Country name"'
elif [ $up -eq 1 ]
then
echo ${my_array[i]} = '"City name"'
fi
done
Here is a standard bash way of doing this:
my_string="One;Two;Three"
IFS=';' read -ra my_array <<< "$my_string"
# builds my_array='([0]="One" [1]="Two" [2]="Three")'
# loop through array and print index+1 with element
# ${#my_array[@]} is length of the array
for ((i=0; i<${#my_array[@]}; i++)); do
printf '%d: %s\n' $((i+1)) "${my_array[i]}"
done
1: One
2: Two
3: Three
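If this is needed in several places, the same idea can be wrapped in a small function that takes the delimiter, the string and an optional start number. It defaults to 1 to match the question; Python's enumerate defaults to 0. enumerate is an illustrative name, and the delimiter is used as IFS, so each character in it acts as a separator.

```shell
#!/bin/bash
# Hypothetical helper. Usage: enumerate DELIMITER STRING [START]
enumerate() {
    local -a items
    local i start=${3:-1}
    IFS=$1 read -r -a items <<< "$2"    # split STRING on DELIMITER
    for i in "${!items[@]}"; do
        printf '%d - %s\n' "$((i + start))" "${items[i]}"
    done
}

enumerate ';' "One;Two;Three"
# 1 - One
# 2 - Two
# 3 - Three
```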

Find characters with exactly two occurrences?

For example, say the string "test this" was passed to my application; I only want the s, since it is the only character that occurs exactly twice.
I'm thinking along the lines of grep wildcards, but I've never really used them.
You could write a script.
Iterate over each character.
Increment a per-character counter each time that character is seen.
At the end, check your counters for the one which is equal to 2.
Here's a pure bash implementation of alex' suggestion doing what steve did in awk:
#!/bin/bash
# your string
string="test this"
# First, make a character array out of it
for ((i=0; i<"${#string}"; i++)); do # (quotes just for SO highlighting)
chars[$i]="${string:$i:1}" # (could be space, so quoted)
done
# associative array will keep track of the count for each character
declare -A counts
# loop through each character and keep track of its count
for ((i=0; i<"${#chars[@]}"; i++)); do # (quotes just for SO highlighting)
key="${chars[$i]}" # current character
# (could be space, so quoted)
# ${counts["$key"]:-0} treats a character not seen yet as 0, then we add 1
counts["$key"]=$(( ${counts["$key"]:-0} + 1 ))
done
done
# loop through each key/value and print all with count 2
for key in "${!counts[#]}"; do
if [ ${counts["$key"]} -eq 2 ]; then
echo "$key"
fi
done
Note that it uses an associative array, which was introduced in Bash 4.0, so this'll only work on that or newer.
One way using GNU awk:
echo "$string" | awk -F '' '{ for (i=1; i<=NF; i++) array[$i]++; for (j in array) if (array[j]==2) print j }'
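The associative-array approach above prints matches in whatever order bash happens to iterate the keys. The sketch below also remembers the order in which characters first appear, so the output is deterministic. It assumes bash 4+, and chars_seen_twice is an illustrative name.

```shell
#!/bin/bash
# Print each character of $1 that occurs exactly twice,
# in order of first appearance.
chars_seen_twice() {
    local s=$1 c i
    local -A count=()      # character -> number of occurrences
    local -a order=()      # characters in first-appearance order
    for ((i = 0; i < ${#s}; i++)); do
        c=${s:i:1}
        if [[ -z ${count[$c]} ]]; then
            order+=( "$c" )
        fi
        count[$c]=$(( ${count[$c]:-0} + 1 ))
    done
    for c in "${order[@]}"; do
        if [[ ${count[$c]} -eq 2 ]]; then
            printf '%s\n' "$c"
        fi
    done
}

chars_seen_twice "test this"   # → s
```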

How to split one string into multiple strings separated by at least one space in bash shell?

I have a string containing many words with at least one space between each two. How can I split the string into individual words so I can loop through them?
The string is passed as an argument. E.g. ${2} == "cat cat file". How can I loop through it?
Also, how can I check if a string contains spaces?
I like the conversion to an array, to be able to access individual elements:
sentence="this is a story"
stringarray=($sentence)
now you can access individual elements directly (it starts with 0):
echo ${stringarray[0]}
or convert back to string in order to loop:
for i in "${stringarray[@]}"
do
:
# do whatever on $i
done
Of course, looping through the string directly was answered before, but that answer had the disadvantage of not keeping track of the individual elements for later use:
for i in $sentence
do
:
# do whatever on $i
done
See also Bash Array Reference.
Did you try just passing the string variable to a for loop? Bash, for one, will split on whitespace automatically.
sentence="This is a sentence."
for word in $sentence
do
echo $word
done
This
is
a
sentence.
Probably the easiest and most secure way in BASH 3 and above is:
var="string to split"
read -ra arr <<<"$var"
(where arr is the array which takes the split parts of the string) or, if there might be newlines in the input and you want more than just the first line:
var="string to split"
read -ra arr -d '' <<<"$var"
(please note the space in -d ''; it cannot be omitted), but this might give you an unexpected newline from <<<"$var" (as this implicitly adds an LF at the end).
Example:
touch NOPE
var="* a *"
read -ra arr <<<"$var"
for a in "${arr[@]}"; do echo "[$a]"; done
Outputs the expected
[*]
[a]
[*]
as this solution (in contrast to all previous solutions here) is not prone to unexpected and often uncontrollable shell globbing.
Also this gives you the full power of IFS as you probably want:
Example:
IFS=: read -ra arr < <(grep "^$USER:" /etc/passwd)
for a in "${arr[@]}"; do echo "[$a]"; done
Outputs something like:
[tino]
[x]
[1000]
[1000]
[Valentin Hilbig]
[/home/tino]
[/bin/bash]
As you can see, spaces can be preserved this way, too:
IFS=: read -ra arr <<<' split : this '
for a in "${arr[@]}"; do echo "[$a]"; done
outputs
[ split ]
[ this ]
Please note that the handling of IFS in BASH is a subject on its own, so do your tests; some interesting topics on this:
unset IFS: ignores runs of SPC, TAB and NL, including any at the start and end of the line
IFS='': No field separation, just reads everything
IFS=' ': Runs of SPC (and SPC only)
Some last examples:
var=$'\n\nthis is\n\n\na test\n\n'
IFS=$'\n' read -ra arr -d '' <<<"$var"
i=0; for a in "${arr[@]}"; do let i++; echo "$i [$a]"; done
outputs
1 [this is]
2 [a test]
while
unset IFS
var=$'\n\nthis is\n\n\na test\n\n'
read -ra arr -d '' <<<"$var"
i=0; for a in "${arr[@]}"; do let i++; echo "$i [$a]"; done
outputs
1 [this]
2 [is]
3 [a]
4 [test]
BTW:
If you are not used to $'ANSI-ESCAPED-STRING' get used to it; it's a timesaver.
If you do not include -r (like in read -a arr <<<"$var"), then read interprets backslash escapes. This is left as an exercise for the reader.
For the second question:
To test for something in a string I usually stick to case, as this can check for multiple cases at once (note: case only executes the first match, if you need fallthrough use multiple case statements), and this need is quite often the case (pun intended):
case "$var" in
'') empty_var;; # variable is empty
*' '*) have_space "$var";; # have SPC
*[[:space:]]*) have_whitespace "$var";; # have whitespaces like TAB
*[^-+.,A-Za-z0-9]*) have_nonalnum "$var";; # non-alphanum-chars found
*[-+.,]*) have_punctuation "$var";; # some punctuation chars found
*) default_case "$var";; # if all above does not match
esac
So you can set the return value to check for SPC like this:
case "$var" in (*' '*) true;; (*) false;; esac
Why case? Because it usually is a bit more readable than regex sequences, and thanks to Shell metacharacters it handles 99% of all needs very well.
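To use these tests in an if or with && and ||, the case statement can be wrapped in a function that returns the status. has_space and has_whitespace are illustrative names; the patterns are the ones from the case example above.

```shell
#!/bin/bash
# Hypothetical helpers: succeed (return 0) if $1 contains a space / any whitespace.
has_space() {
    case $1 in
        (*' '*) return 0 ;;
        (*) return 1 ;;
    esac
}

has_whitespace() {
    case $1 in
        (*[[:space:]]*) return 0 ;;
        (*) return 1 ;;
    esac
}

if has_space "cat cat file"; then
    echo "has spaces"
fi
```

Only shell pattern matching is involved, so the same functions also work in a POSIX sh.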
Just use the shell's "set" built-in. For example,
set $text
After that, individual words in $text will be in $1, $2, $3, etc. For robustness, one usually does
set -- junk $text
shift
to handle the case where $text is empty or starts with a dash. For example:
text="This is a test"
set -- junk $text
shift
for word; do
echo "[$word]"
done
This prints
[This]
[is]
[a]
[test]
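One caveat with set -- junk $text is that the unquoted $text is also subject to globbing. A word such as * would be replaced by the file names in the current directory. A sketch that turns globbing off around the split with set -f:

```shell
#!/bin/bash
text="This * is a test"
set -f              # keep "*" literal instead of matching file names
set -- junk $text   # "junk" protects against an empty $text or a leading "-"
set +f
shift
for word; do
    echo "[$word]"
done
```

This prints [This], [*], [is], [a] and [test] on separate lines, whatever files happen to be in the directory.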
$ echo "This is a sentence." | tr -s " " "\012"
This
is
a
sentence.
For checking for spaces, use grep:
$ echo "This is a sentence." | grep " " > /dev/null
$ echo $?
0
$ echo "Thisisasentence." | grep " " > /dev/null
$ echo $?
1
echo $WORDS | xargs -n1 echo
This outputs every word, you can process that list as you see fit afterwards.
(A) To split a sentence into its words (space separated) you can simply use the default IFS by using
array=( $string )
Example running the following snippet
#!/bin/bash
sentence="this is the \"sentence\" 'you' want to split"
words=( $sentence )
len="${#words[@]}"
echo "words counted: $len"
printf "%s\n" "${words[@]}" ## print array
will output
words counted: 8
this
is
the
"sentence"
'you'
want
to
split
As you can see, single or double quotes in the string cause no problems; they are kept as literal characters.
Notes:
-- this is basically the same as mob's answer, but this way you store the array for later use. If you only need a single loop, you can use his answer, which is one line shorter :)
-- please refer to this question for alternate methods to split a string based on delimiter.
(B) To check for a character in a string you can also use a regular expression match.
Example to check for the presence of a space character you can use:
regex='[[:space:]]+'
if [[ "$sentence" =~ $regex ]]
then
echo "Space here!";
fi
For checking spaces just with bash:
[[ "$str" = "${str% *}" ]] && echo "no spaces" || echo "has spaces"
$ echo foo bar baz | sed 's/ /\n/g'
foo
bar
baz
For my use case, the best option was:
grep -oP '\w+' file
Basically this is a regular expression that matches runs of word characters (letters, digits and underscore). As a result, whitespace of any type and amount never matches; note that punctuation is dropped as well. The -o option prints each match on its own line.
Another take on this (using Perl):
$ echo foo bar baz | perl -nE 'say for split /\s/'
foo
bar
baz
