How to perform a for loop on each character in a string in Bash? - bash

I have a variable like this:
words="这是一条狗。"
I want to make a for loop on each of the characters, one at a time, e.g. first character="这", then character="是", character="一", etc.
The only way I know is to output each character to a separate line in a file, then use while read line, but this seems very inefficient.
How can I process each character in a string through a for loop?

You can use a C-style for loop:
foo=string
for (( i=0; i<${#foo}; i++ )); do
echo "${foo:$i:1}"
done
${#foo} expands to the length of foo. ${foo:$i:1} expands to the substring starting at position $i of length 1.
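As a minimal sketch of the same idea, the loop can collect the characters into an array instead of printing them. The helper name split_chars and the array name chars are illustrative and not part of the answer above; for strings with multibyte characters the sketch assumes Bash is running in a UTF-8 locale.

```bash
# split_chars (hypothetical helper): store each character of $1 in the
# global array `chars`, using the ${#var} and ${var:pos:len} expansions.
split_chars() {
  local s=$1 i
  chars=()
  for (( i = 0; i < ${#s}; i++ )); do
    chars+=("${s:i:1}")   # one character starting at offset i
  done
}

split_chars "a b!"
printf '[%s]\n' "${chars[@]}"
```

Quoting "${s:i:1}" keeps whitespace characters as array elements instead of losing them to word splitting.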

With sed in the dash shell and LANG=en_US.UTF-8, I got the following to work correctly:
$ echo "你好嗎 新年好。全型句號" | sed -e 's/\(.\)/\1\n/g'
你
好
嗎
新
年
好
。
全
型
句
號
and
$ echo "Hello world" | sed -e 's/\(.\)/\1\n/g'
H
e
l
l
o
w
o
r
l
d
Thus, the output can be looped over with while read ... ; do ... ; done
Edit: an English translation of the sample text.
"你好嗎 新年好。全型句號" is Traditional Chinese (zh_TW.UTF-8) text made up of:
"你好嗎" = How are you[ doing]
" " = a normal space character
"新年好" = Happy new year
"。全型句號" = a double-byte-sized (full-width) full stop followed by a text description of it

${#var} returns the length of var
${var:pos:N} returns N characters from pos onwards
Examples:
$ words="abc"
$ echo ${words:0:1}
a
$ echo ${words:1:1}
b
$ echo ${words:2:1}
c
so it is easy to iterate.
another way:
$ grep -o . <<< "abc"
a
b
c
or
$ grep -o . <<< "abc" | while read letter; do echo "my letter is $letter" ; done
my letter is a
my letter is b
my letter is c

I'm surprised no one has mentioned the obvious bash solution utilizing only while and read.
while read -n1 character; do
echo "$character"
done < <(echo -n "$words")
Note the use of echo -n to avoid the extraneous newline at the end. printf is another good option and may be more suitable for your particular needs. If you want to ignore whitespace then replace "$words" with "${words// /}".
Another option is fold. Please note however that it should never be fed into a for loop. Rather, use a while loop as follows:
while read char; do
echo "$char"
done < <(fold -w1 <<<"$words")
The primary benefit to using the external fold command (of the coreutils package) would be brevity. You can feed its output to another command such as xargs (part of the findutils package) as follows:
fold -w1 <<<"$words" | xargs -I% -- echo %
You'll want to replace the echo command used in the example above with the command you'd like to run against each character. Note that xargs will discard whitespace by default. You can use -d '\n' to disable that behavior.
Internationalization
I just tested fold with some of the Asian characters and realized it doesn't have Unicode support. So while it is fine for ASCII needs, it won't work for everyone. In that case there are some alternatives.
I'd probably replace fold -w1 with an awk array:
awk 'BEGIN{FS=""} {for (i=1;i<=NF;i++) print $i}'
Or the grep command mentioned in another answer:
grep -o .
Performance
FYI, I benchmarked the aforementioned options. The while, fold, awk and grep loops were all fast and nearly tied, with the while loop slightly ahead in the results below. Unsurprisingly xargs was the slowest... about 75x slower.
Here is the (abbreviated) test code:
words=$(python -c 'from string import ascii_letters as l; print(l * 100)')
testrunner(){
for test in test_while_loop test_fold_loop test_fold_xargs test_awk_loop test_grep_loop; do
echo "$test"
(time for (( i=1; i<$((${1:-100} + 1)); i++ )); do "$test"; done >/dev/null) 2>&1 | sed '/^$/d'
echo
done
}
testrunner 100
Here are the results:
test_while_loop
real 0m5.821s
user 0m5.322s
sys 0m0.526s
test_fold_loop
real 0m6.051s
user 0m5.260s
sys 0m0.822s
test_fold_xargs
real 7m13.444s
user 0m24.531s
sys 6m44.704s
test_awk_loop
real 0m6.507s
user 0m5.858s
sys 0m0.788s
test_grep_loop
real 0m6.179s
user 0m5.409s
sys 0m0.921s

I believe there is still no ideal solution that would correctly preserve all whitespace characters and is fast enough, so I'll post my answer. Using ${foo:$i:1} works, but is very slow, which is especially noticeable with large strings, as I will show below.
My idea is an expansion of a method proposed by Six, which involves read -n1, with some changes to keep all characters and work correctly for any string:
while IFS='' read -r -d '' -n 1 char; do
# do something with $char
done < <(printf %s "$string")
How it works:
IFS='' - Redefining the internal field separator to an empty string prevents stripping of spaces and tabs. Doing it on the same line as read means that it will not affect other shell commands.
-r - Means "raw", which prevents read from treating \ as an escape character (including a \ at the end of a line as a line-continuation marker).
-d '' - Passing empty string as a delimiter prevents read from stripping newline characters. Actually means that null byte is used as a delimiter. -d '' is equal to -d $'\0'.
-n 1 - Means that one character at a time will be read.
printf %s "$string" - Using printf instead of echo -n is safer, because echo treats -n and -e as options. If you pass "-e" as a string, echo will not print anything.
< <(...) - Passing string to the loop using process substitution. If you use here-strings instead (done <<< "$string"), an extra newline character is appended at the end. Also, passing string through a pipe (printf %s "$string" | while ...) would make the loop run in a subshell, which means all variable operations are local within the loop.
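A small sketch of the loop described above, run on a sample string that contains a space, a tab and a trailing newline. The variable names string and chars and the sample value are assumptions chosen for illustration; the sketch only checks that every character, whitespace included, reaches the loop body.

```bash
string=$'a b\tc\n'   # sample input: letters, a space, a tab, a newline
chars=()

# IFS='' keeps spaces and tabs, -r keeps backslashes, -d '' keeps newlines,
# and -n 1 reads exactly one character per iteration.
while IFS='' read -r -d '' -n 1 char; do
  chars+=("$char")
done < <(printf %s "$string")

printf 'read %d characters\n' "${#chars[@]}"
```

Because the loop is fed by process substitution rather than a pipe, the chars array is still populated after the loop ends.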
Now, let's test the performance with a huge string.
I used the following file as a source:
https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt
The following script was called through time command:
#!/bin/bash
# Saving contents of the file into a variable named `string'.
# This is for test purposes only. In real code, you should use
# `done < "filename"' construct if you wish to read from a file.
# Using `string="$(cat makefiles.txt)"' would strip trailing newlines.
IFS='' read -r -d '' string < makefiles.txt
while IFS='' read -r -d '' -n 1 char; do
# remake the string by adding one character at a time
new_string+="$char"
done < <(printf %s "$string")
# confirm that new string is identical to the original
diff -u makefiles.txt <(printf %s "$new_string")
And the result is:
$ time ./test.sh
real 0m1.161s
user 0m1.036s
sys 0m0.116s
As we can see, it is quite fast.
Next, I replaced the loop with one that uses parameter expansion:
for (( i=0 ; i<${#string}; i++ )); do
new_string+="${string:$i:1}"
done
The output shows exactly how bad the performance loss is:
$ time ./test.sh
real 2m38.540s
user 2m34.916s
sys 0m3.576s
The exact numbers may vary on different systems, but the overall picture should be similar.

I've only tested this with ascii strings, but you could do something like:
while test -n "$words"; do
c=${words:0:1} # Get the first character
echo character is "'$c'"
words=${words:1} # trim the first character
done

It is also possible to split the string into a character array using fold and then iterate over this array:
for char in `echo "这是一条狗。" | fold -w1`; do
echo $char
done

The C-style loop in @chepner's answer is in the shell function update_terminal_cwd, and the grep -o . solution is clever, but I was surprised not to see a solution using seq. Here's mine:
read word
for i in $(seq 1 ${#word}); do
echo "${word:i-1:1}"
done

#!/bin/bash
word=$(echo 'Your Message' |fold -w 1)
for letter in ${word} ; do echo "${letter} is a letter"; done
Here is the output:
Y is a letter
o is a letter
u is a letter
r is a letter
M is a letter
e is a letter
s is a letter
s is a letter
a is a letter
g is a letter
e is a letter

To iterate ASCII characters on a POSIX-compliant shell, you can avoid external tools by using the Parameter Expansions:
#!/bin/sh
str="Hello World!"
while [ ${#str} -gt 0 ]; do
next=${str#?}
echo "${str%"$next"}"
str=$next
done
or
str="Hello World!"
while [ -n "$str" ]; do
next=${str#?}
echo "${str%"$next"}"
str=$next
done
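As a hedged illustration of the same parameter-expansion technique, the sketch below wraps the loop in a function that reverses a string. The function name reverse is hypothetical. The remainder is quoted inside the expansion ("$rest") so that characters such as * in the input are matched literally rather than as a pattern; like the answer above, it is only meant for single-byte (ASCII) input.

```bash
# reverse (hypothetical helper): print $1 with its characters reversed,
# using only POSIX parameter expansion (no arrays, no external tools).
reverse() {
  str=$1
  out=
  while [ -n "$str" ]; do
    rest=${str#?}            # everything after the first character
    first=${str%"$rest"}     # the first character itself
    out=$first$out           # prepend it to the result
    str=$rest
  done
  printf '%s\n' "$out"
}

reverse 'Hello World!'
```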

sed works with unicode
IFS=$'\n'
for z in $(sed 's/./&\n/g' <(printf '你好嗎')); do
echo hello: "$z"
done
outputs
hello: 你
hello: 好
hello: 嗎

Another approach, if you don't care about whitespace being ignored:
for char in $(sed -E s/'(.)'/'\1 '/g <<<"$your_string"); do
# Handle $char here
done

Another way is:
Characters="TESTING"
index=1
while [ $index -le ${#Characters} ]
do
echo ${Characters} | cut -c${index}-${index}
index=$(expr $index + 1)
done

fold and while read are great for the job as shown in some answers here. Contrary to those answers, I think it's much more intuitive to pipe in the order of execution:
echo "asdfg" | fold -w 1 | while read c; do
echo -n "$c "
done
Outputs: a s d f g

I share my solution:
read word
for char in $(grep -o . <<<"$word") ; do
echo $char
done

TEXT="hello world"
for i in {1..${#TEXT}}; do
echo ${TEXT[i]}
done
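where {1..N} is an inclusive range,
${#TEXT} is the number of characters in the string, and
${TEXT[i]} gets a character from the string like an item from an array.
Note that this is zsh syntax: Bash does not expand variables inside a brace range, and it does not index a string with [i].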

Related

How do I select each information in one line with delimiters [duplicate]

I have this string stored in a variable:
IN="bla#some.com;john#home.com"
Now I would like to split the strings by ; delimiter so that I have:
ADDR1="bla#some.com"
ADDR2="john#home.com"
I don't necessarily need the ADDR1 and ADDR2 variables. If they are elements of an array that's even better.
After suggestions from the answers below, I ended up with the following which is what I was after:
#!/usr/bin/env bash
IN="bla#some.com;john#home.com"
mails=$(echo $IN | tr ";" "\n")
for addr in $mails
do
echo "> [$addr]"
done
Output:
> [bla#some.com]
> [john#home.com]
There was a solution involving setting the internal field separator (IFS) to ;. I am not sure what happened with that answer. How do you reset IFS back to default?
RE: IFS solution, I tried this and it works, I keep the old IFS and then restore it:
IN="bla#some.com;john#home.com"
OIFS=$IFS
IFS=';'
mails2=$IN
for x in $mails2
do
echo "> [$x]"
done
IFS=$OIFS
BTW, when I tried
mails2=($IN)
I only got the first string when printing it in a loop. Without the parentheses around $IN it works.
You can set the internal field separator (IFS) variable, and then let it parse into an array. When this happens in a command, the assignment to IFS only applies to that single command's environment (here, read). It then parses the input according to the IFS variable value into an array, which we can then iterate over.
This example will parse one line of items separated by ;, pushing it into an array:
IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
# process "$i"
done
This other example is for processing the whole content of $IN, each time one line of input separated by ;:
while IFS=';' read -ra ADDR; do
for i in "${ADDR[@]}"; do
# process "$i"
done
done <<< "$IN"
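The difference between the two forms can be sketched as follows. The sample value of IN and the array names first_line and all_fields are assumptions made for illustration. The single read only consumes the first line of its input, while the while loop visits every line; in both cases the IFS assignment is scoped to read, so the shell's own IFS is left untouched.

```bash
IN=$'a;b c;d\nsecond;line'   # two lines of ';'-separated items
old_ifs=$IFS

# Form 1: one read, so only the first line is split.
IFS=';' read -ra first_line <<< "$IN"

# Form 2: one read per line, collecting every field.
all_fields=()
while IFS=';' read -ra ADDR; do
  all_fields+=("${ADDR[@]}")
done <<< "$IN"

printf 'first line: %d fields, all lines: %d fields\n' \
  "${#first_line[@]}" "${#all_fields[@]}"
```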
Taken from Bash shell script split array:
IN="bla#some.com;john#home.com"
arrIN=(${IN//;/ })
echo ${arrIN[1]} # Output: john#home.com
Explanation:
This construction replaces all occurrences of ';' (the initial // means global replace) in the string IN with ' ' (a single space), then interprets the space-delimited string as an array (that's what the surrounding parentheses do).
The syntax used inside of the curly braces to replace each ';' character with a ' ' character is called Parameter Expansion.
There are some common gotchas:
If the original string has spaces, you will need to use IFS:
IFS=':'; arrIN=($IN); unset IFS;
If the original string has spaces and the delimiter is a new line, you can set IFS with:
IFS=$'\n'; arrIN=($IN); unset IFS;
I've seen a couple of answers referencing the cut command, but they've all been deleted. It's a little odd that nobody has elaborated on that, because I think it's one of the more useful commands for doing this type of thing, especially for parsing delimited log files.
In the case of splitting this specific example into a bash script array, tr is probably more efficient, but cut can be used, and is more effective if you want to pull specific fields from the middle.
Example:
$ echo "bla#some.com;john#home.com" | cut -d ";" -f 1
bla#some.com
$ echo "bla#some.com;john#home.com" | cut -d ";" -f 2
john#home.com
You can obviously put that into a loop, and iterate the -f parameter to pull each field independently.
This gets more useful when you have a delimited log file with rows like this:
2015-04-27|12345|some action|an attribute|meta data
cut is very handy to be able to cat this file and select a particular field for further processing.
If you don't mind processing them immediately, I like to do this:
for i in $(echo $IN | tr ";" "\n")
do
# process
done
You could use this kind of loop to initialize an array, but there's probably an easier way to do it.
Compatible answer
There are a lot of different ways to do this in bash.
However, it's important to first note that bash has many special features (so-called bashisms) that won't work in any other shell.
In particular, arrays, associative arrays, and pattern substitution, which are used in the solutions in this post as well as others in the thread, are bashisms and may not work under other shells that many people use.
For instance: on my Debian GNU/Linux, there is a standard shell called dash; I know many people who like to use another shell called ksh; and there is also a special tool called busybox with its own shell interpreter (ash).
For a POSIX-shell-compatible answer, go to the last part of this answer!
Requested string
The string to be split in the above question is:
IN="bla#some.com;john#home.com"
I will use a modified version of this string to ensure that my solution is robust to strings containing whitespace, which could break other solutions:
IN="bla#some.com;john#home.com;Full Name <fulnam#other.org>"
Split string based on delimiter in bash (version >=4.2)
In pure bash, we can create an array with elements split by a temporary value for IFS (the internal field separator). The IFS, among other things, tells bash which character(s) it should treat as a delimiter between elements when defining an array:
IN="bla#some.com;john#home.com;Full Name <fulnam#other.org>"
# save original IFS value so we can restore it later
oIFS="$IFS"
IFS=";"
declare -a fields=($IN)
IFS="$oIFS"
unset oIFS
In newer versions of bash, prefixing a command with an IFS definition changes the IFS for that command only and resets it to the previous value immediately afterwards. This means we can do the above in just one line:
IFS=\; read -a fields <<<"$IN"
# after this command, the IFS resets back to its previous value (here, the default):
set | grep ^IFS=
# IFS=$' \t\n'
We can see that the string IN has been stored into an array named fields, split on the semicolons:
set | grep ^fields=\\\|^IN=
# fields=([0]="bla#some.com" [1]="john#home.com" [2]="Full Name <fulnam#other.org>")
# IN='bla#some.com;john#home.com;Full Name <fulnam#other.org>'
(We can also display the contents of these variables using declare -p:)
declare -p IN fields
# declare -- IN="bla#some.com;john#home.com;Full Name <fulnam#other.org>"
# declare -a fields=([0]="bla#some.com" [1]="john#home.com" [2]="Full Name <fulnam#other.org>")
Note that read is the quickest way to do the split because there are no forks or external resources called.
Once the array is defined, you can use a simple loop to process each field (or, rather, each element in the array you've now defined):
# `"${fields[@]}"` expands to return every element of `fields` array as a separate argument
for x in "${fields[@]}" ;do
echo "> [$x]"
done
# > [bla#some.com]
# > [john#home.com]
# > [Full Name <fulnam#other.org>]
Or you could drop each field from the array after processing using a shifting approach, which I like:
while [ "$fields" ] ;do
echo "> [$fields]"
# slice the array
fields=("${fields[@]:1}")
done
# > [bla#some.com]
# > [john#home.com]
# > [Full Name <fulnam#other.org>]
And if you just want a simple printout of the array, you don't even need to loop over it:
printf "> [%s]\n" "${fields[@]}"
# > [bla#some.com]
# > [john#home.com]
# > [Full Name <fulnam#other.org>]
Update: recent bash >= 4.4
In newer versions of bash, you can also play with the command mapfile:
mapfile -td \; fields < <(printf "%s\0" "$IN")
This syntax preserves special chars, newlines and empty fields!
If you don't want to include empty fields, you could do the following:
mapfile -td \; fields <<<"$IN"
fields=("${fields[@]%$'\n'}") # drop '\n' added by '<<<'
With mapfile, you can also skip declaring an array and implicitly "loop" over the delimited elements, calling a function on each:
myPubliMail() {
printf "Seq: %6d: Sending mail to '%s'..." $1 "$2"
# mail -s "This is not a spam..." "$2" </path/to/body
printf "\e[3D, done.\n"
}
mapfile < <(printf "%s\0" "$IN") -td \; -c 1 -C myPubliMail
(Note: the \0 at end of the format string is useless if you don't care about empty fields at end of the string or they're not present.)
mapfile < <(echo -n "$IN") -td \; -c 1 -C myPubliMail
# Seq: 0: Sending mail to 'bla#some.com', done.
# Seq: 1: Sending mail to 'john#home.com', done.
# Seq: 2: Sending mail to 'Full Name <fulnam#other.org>', done.
Or you could use <<<, and in the function body include some processing to drop the newline it adds:
myPubliMail() {
local seq=$1 dest="${2%$'\n'}"
printf "Seq: %6d: Sending mail to '%s'..." $seq "$dest"
# mail -s "This is not a spam..." "$dest" </path/to/body
printf "\e[3D, done.\n"
}
mapfile <<<"$IN" -td \; -c 1 -C myPubliMail
# Renders the same output:
# Seq: 0: Sending mail to 'bla#some.com', done.
# Seq: 1: Sending mail to 'john#home.com', done.
# Seq: 2: Sending mail to 'Full Name <fulnam#other.org>', done.
Split string based on delimiter in shell
If you can't use bash, or if you want to write something that can be used in many different shells, you often can't use bashisms -- and this includes the arrays we've been using in the solutions above.
However, we don't need to use arrays to loop over "elements" of a string. There is a syntax used in many shells for deleting substrings of a string from the first or last occurrence of a pattern. Note that * is a wildcard that stands for zero or more characters:
(The lack of this approach in any solution posted so far is the main reason I'm writing this answer ;)
${var#*SubStr} # drops substring from start of string up to first occurrence of `SubStr`
${var##*SubStr} # drops substring from start of string up to last occurrence of `SubStr`
${var%SubStr*} # drops substring from last occurrence of `SubStr` to end of string
${var%%SubStr*} # drops substring from first occurrence of `SubStr` to end of string
As explained by Score_Under:
# and % delete the shortest possible matching substring from the start and end of the string respectively, and
## and %% delete the longest possible matching substring.
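A short sketch of the four operators on a sample value. The variable names and the file name archive.tar.gz are illustrative assumptions; the pattern here uses a dot as the delimiter instead of the semicolon from the question.

```bash
file="archive.tar.gz"

after_first_dot=${file#*.}     # shortest prefix match removed -> tar.gz
after_last_dot=${file##*.}     # longest prefix match removed  -> gz
before_last_dot=${file%.*}     # shortest suffix match removed -> archive.tar
before_first_dot=${file%%.*}   # longest suffix match removed  -> archive

printf '%s\n' "$after_first_dot" "$after_last_dot" \
  "$before_last_dot" "$before_first_dot"
```

These expansions are specified by POSIX, so the sketch should behave the same in dash, ksh and busybox ash as in bash.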
Using the above syntax, we can create an approach where we extract substring "elements" from the string by deleting the substrings up to or after the delimiter.
The codeblock below works well in bash (including Mac OS's bash), dash, ksh, lksh, yash, zsh, and busybox's ash:
(Thanks to Adam Katz's comment for making this loop a lot simpler!)
IN="bla#some.com;john#home.com;Full Name <fulnam#other.org>"
while [ "$IN" != "$iter" ] ;do
# extract the substring from start of string up to delimiter.
iter=${IN%%;*}
# delete this first "element" AND next separator, from $IN.
IN="${IN#$iter;}"
# Print (or doing anything with) the first "element".
printf '> [%s]\n' "$iter"
done
# > [bla#some.com]
# > [john#home.com]
# > [Full Name <fulnam#other.org>]
Why not cut?
cut is useful for extracting columns in big files, but forking repeatedly (var=$(echo ... | cut ...)) quickly becomes overkill!
Here is a correct syntax using cut, tested under many POSIX shells, as suggested by this other answer from DougW:
IN="bla#some.com;john#home.com;Full Name <fulnam#other.org>"
i=1
while iter=$(echo "$IN"|cut -d\; -f$i) ; [ -n "$iter" ] ;do
printf '> [%s]\n' "$iter"
i=$((i+1))
done
I wrote this in order to compare execution time.
On my Raspberry Pi, this looks like:
$ export TIMEFORMAT=$'(%U + %S) / \e[1m%R\e[0m : %P '
$ time sh splitDemo.sh >/dev/null
(0.000 + 0.019) / 0.019 : 99.63
$ time sh splitDemo_cut.sh >/dev/null
(0.051 + 0.041) / 0.188 : 48.98
Here the overall execution time is roughly 10x longer when forking one cut process per field!
This worked for me:
string="1;2"
echo $string | cut -d';' -f1 # output is 1
echo $string | cut -d';' -f2 # output is 2
I think AWK is the best and most efficient command to solve your problem. AWK is included by default in almost every Linux distribution.
echo "bla#some.com;john#home.com" | awk -F';' '{print $1,$2}'
will give
bla#some.com john#home.com
Of course you can store each email address by redefining the awk print field.
How about this approach:
IN="bla#some.com;john#home.com"
set -- "$IN"
IFS=";"; declare -a Array=($*)
echo "${Array[@]}"
echo "${Array[0]}"
echo "${Array[1]}"
Source
echo "bla#some.com;john#home.com" | sed -e 's/;/\n/g'
bla#some.com
john#home.com
This also works:
IN="bla#some.com;john#home.com"
echo ADD1=`echo $IN | cut -d \; -f 1`
echo ADD2=`echo $IN | cut -d \; -f 2`
Be careful, this solution is not always correct. In case you pass "bla#some.com" only, it will assign it to both ADD1 and ADD2.
A different take on Darron's answer, this is how I do it:
IN="bla#some.com;john#home.com"
read ADDR1 ADDR2 <<<$(IFS=";"; echo $IN)
How about this one liner, if you're not using arrays:
IFS=';' read ADDR1 ADDR2 <<<$IN
In Bash, a bullet proof way, that will work even if your variable contains newlines:
IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
Look:
$ in=$'one;two three;*;there is\na newline\nin this field'
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two three" [2]="*" [3]="there is
a newline
in this field")'
The trick for this to work is to use the -d option of read (delimiter) with an empty delimiter, so that read is forced to read everything it's fed. And we feed read with exactly the content of the variable in, with no trailing newline thanks to printf. Note that we're also putting the delimiter in printf to ensure that the string passed to read has a trailing delimiter. Without it, read would trim potential trailing empty fields:
$ in='one;two;three;' # there's an empty field
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two" [2]="three" [3]="")'
the trailing empty field is preserved.
Update for Bash≥4.4
Since Bash 4.4, the builtin mapfile (aka readarray) supports the -d option to specify a delimiter. Hence another canonical way is:
mapfile -d ';' -t array < <(printf '%s;' "$in")
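To illustrate what the mapfile form produces, here is a minimal sketch on a sample string with an empty field in the middle, a field containing a newline, and a trailing empty field. The sample value is an assumption, and the sketch requires Bash 4.4 or later for mapfile -d.

```bash
in=$'one;two three;;multi\nline;'

# printf appends one extra ';' so that the trailing empty field is kept;
# -t strips the ';' delimiter from each stored element.
mapfile -d ';' -t array < <(printf '%s;' "$in")

declare -p array
```

The array ends up with five elements: "one", "two three", an empty string, the two-line field, and a final empty string.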
Without setting the IFS
If you just have one colon you can do that:
a="foo:bar"
b=${a%:*}
c=${a##*:}
you will get:
b = foo
c = bar
Here is a clean 3-liner:
in="foo#bar;bizz#buzz;fizz#buzz;buzz#woof"
IFS=';' list=($in)
for item in "${list[@]}"; do echo $item; done
where IFS delimits words based on the separator and () is used to create an array. Then [@] is used to return each item as a separate word.
If you've any code after that, you also need to restore $IFS, e.g. unset IFS.
The following Bash/zsh function splits its first argument on the delimiter given by the second argument:
split() {
local string="$1"
local delimiter="$2"
if [ -n "$string" ]; then
local part
while read -d "$delimiter" part; do
echo $part
done <<< "$string"
echo $part
fi
}
For instance, the command
$ split 'a;b;c' ';'
yields
a
b
c
This output may, for instance, be piped to other commands. Example:
$ split 'a;b;c' ';' | cat -n
1 a
2 b
3 c
Compared to the other solutions given, this one has the following advantages:
IFS is not overridden: Due to dynamic scoping of even local variables, overriding IFS over a loop causes the new value to leak into function calls performed from within the loop.
Arrays are not used: Reading a string into an array using read requires the flag -a in Bash and -A in zsh.
If desired, the function may be put into a script as follows:
#!/usr/bin/env bash
split() {
# ...
}
split "$@"
you can apply awk to many situations
echo "bla#some.com;john#home.com"|awk -F';' '{printf "%s\n%s\n", $1, $2}'
also you can use this
echo "bla#some.com;john#home.com"|awk -F';' '{print $1,$2}' OFS="\n"
There is a simple and smart way like this:
echo "add:sfff" | xargs -d: -i echo {}
But you must use GNU xargs; BSD xargs does not support -d. If you use an Apple Mac like me, you can install GNU xargs:
brew install findutils
then
echo "add:sfff" | gxargs -d: -i echo {}
So many answers and so many complexities. Try out a simpler solution:
echo "string1, string2" | tr , "\n"
tr (read: translate) replaces the characters in the first argument with those in the second argument in the input.
So tr , "\n" replaces each comma with a newline character in the input, and it becomes:
string1
string2
There are some cool answers here (errator esp.), but for something analogous to split in other languages -- which is what I took the original question to mean -- I settled on this:
IN="bla#some.com;john#home.com"
declare -a a="(${IN//;/ })";
Now ${a[0]}, ${a[1]}, etc, are as you would expect. Use ${#a[*]} for number of terms. Or to iterate, of course:
for i in ${a[*]}; do echo $i; done
IMPORTANT NOTE:
This works in cases where there are no spaces to worry about, which solved my problem, but may not solve yours. Go with the $IFS solution(s) in that case.
If no space, Why not this?
IN="bla#some.com;john#home.com"
arr=(`echo $IN | tr ';' ' '`)
echo ${arr[0]}
echo ${arr[1]}
This is the simplest way to do it.
spo='one;two;three'
OIFS=$IFS
IFS=';'
spo_array=($spo)
IFS=$OIFS
echo ${spo_array[*]}
Apart from the fantastic answers that were already provided, if it is just a matter of printing out the data you may consider using awk:
awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
This sets the field separator to ;, so that it can loop through the fields with a for loop and print accordingly.
Test
$ IN="bla#some.com;john#home.com"
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
> [bla#some.com]
> [john#home.com]
With another input:
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "a;b;c d;e_;f"
> [a]
> [b]
> [c d]
> [e_]
> [f]
IN="bla#some.com;john#home.com"
IFS=';'
read -a IN_arr <<< "${IN}"
for entry in "${IN_arr[@]}"
do
echo $entry
done
Output
bla#some.com
john#home.com
System : Ubuntu 12.04.1
Use the set built-in to load up the $@ array:
IN="bla#some.com;john#home.com"
IFS=';'; set $IN; IFS=$' \t\n'
Then, let the party begin:
echo $#
for a; do echo $a; done
ADDR1=$1 ADDR2=$2
Two bourne-ish alternatives where neither require bash arrays:
Case 1: Keep it nice and simple: Use a NewLine as the Record-Separator... eg.
IN="bla#some.com
john#home.com"
while read i; do
# process "$i" ... eg.
echo "[email:$i]"
done <<< "$IN"
Note: in this first case no sub-process is forked to assist with list manipulation.
Idea: Maybe it is worth using NL extensively internally, and only converting to a different RS when generating the final result externally.
Case 2: Using a ";" as a record separator... eg.
NL="
" IRS=";" ORS=";"
conv_IRS() {
exec tr "$1" "$NL"
}
conv_ORS() {
exec tr "$NL" "$1"
}
IN="bla#some.com;john#home.com"
IN="$(conv_IRS ";" <<< "$IN")"
while read i; do
# process "$i" ... eg.
echo -n "[email:$i]$ORS"
done <<< "$IN"
In both cases, a sub-list composed within the loop persists after the loop has completed. This is useful when manipulating lists in memory instead of storing lists in files. {p.s. keep calm and carry on B-) }
In Android shell, most of the proposed methods just do not work:
$ IFS=':' read -ra ADDR <<<"$PATH"
/system/bin/sh: can't create temporary file /sqlite_stmt_journals/mksh.EbNoR10629: No such file or directory
What does work is:
$ for i in ${PATH//:/ }; do echo $i; done
/sbin
/vendor/bin
/system/sbin
/system/bin
/system/xbin
where // means global replacement.
IN='bla#some.com;john#home.com;Charlie Brown <cbrown#acme.com;!"#$%&/()[]{}*? are no problem;simple is beautiful :-)'
set -f
oldifs="$IFS"
IFS=';'; arrayIN=($IN)
IFS="$oldifs"
for i in "${arrayIN[@]}"; do
echo "$i"
done
set +f
Output:
bla#some.com
john#home.com
Charlie Brown <cbrown#acme.com
!"#$%&/()[]{}*? are no problem
simple is beautiful :-)
Explanation: Simple assignment using parenthesis () converts semicolon separated list into an array provided you have correct IFS while doing that. Standard FOR loop handles individual items in that array as usual.
Notice that the list given for IN variable must be "hard" quoted, that is, with single ticks.
IFS must be saved and restored since Bash does not treat an assignment the same way as a command. An alternate workaround is to wrap the assignment inside a function and call that function with a modified IFS. In that case separate saving/restoring of IFS is not needed. Thanks for "Bize" for pointing that out.
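The function-based workaround mentioned above could look roughly like this. The function name split_into_array and the sample input are hypothetical, and the sketch assumes Bash in its default (non-POSIX) mode, where an assignment placed before a function call only lasts for the duration of that call.

```bash
# split_into_array (hypothetical helper): split $1 on the current IFS
# into the global array `arrayIN`, with globbing disabled meanwhile.
split_into_array() {
  set -f              # keep characters like * from expanding to file names
  arrayIN=($1)        # unquoted on purpose: word splitting uses IFS
  set +f              # note: this re-enables globbing unconditionally
}

IN='a;b c;*;d'
old_ifs=$IFS
IFS=';' split_into_array "$IN"   # IFS=';' applies only during the call

printf '[%s]\n' "${arrayIN[@]}"
```

After the call returns, IFS has its previous value again, so no explicit save and restore is needed.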
Here's my answer!
DELIMITER_VAL='='
read -d '' F_ABOUT_DISTRO_R <<"EOF"
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"
NAME="Ubuntu"
VERSION="14.04.4 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.4 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
EOF
SPLIT_NOW=$(awk -F$DELIMITER_VAL '{for(i=1;i<=NF;i++){printf "%s\n", $i}}' <<<"${F_ABOUT_DISTRO_R}")
while read -r line; do
SPLIT+=("$line")
done <<< "$SPLIT_NOW"
for i in "${SPLIT[@]}"; do
echo "$i"
done
Why is this approach "the best" for me?
For two reasons:
You do not need to escape the delimiter;
You will not have problems with blank spaces. The values will be properly separated in the array.
A one-liner to split a string separated by ';' into an array is:
IN="bla#some.com;john#home.com"
ADDRS=( $(IFS=";"; echo $IN) )
echo ${ADDRS[0]}
echo ${ADDRS[1]}
This only sets IFS in a subshell, so you don't have to worry about saving and restoring its value.

Bash script to filter value from key value [duplicate]

I have this string stored in a variable:
IN="bla#some.com;john#home.com"
Now I would like to split the strings by ; delimiter so that I have:
ADDR1="bla#some.com"
ADDR2="john#home.com"
I don't necessarily need the ADDR1 and ADDR2 variables. If they are elements of an array that's even better.
After suggestions from the answers below, I ended up with the following which is what I was after:
#!/usr/bin/env bash
IN="bla#some.com;john#home.com"
mails=$(echo $IN | tr ";" "\n")
for addr in $mails
do
echo "> [$addr]"
done
Output:
> [bla#some.com]
> [john#home.com]
There was a solution involving setting Internal_field_separator (IFS) to ;. I am not sure what happened with that answer, how do you reset IFS back to default?
RE: IFS solution, I tried this and it works, I keep the old IFS and then restore it:
IN="bla#some.com;john#home.com"
OIFS=$IFS
IFS=';'
mails2=$IN
for x in $mails2
do
echo "> [$x]"
done
IFS=$OIFS
BTW, when I tried
mails2=($IN)
I only got the first string when printing it in loop, without brackets around $IN it works.
You can set the internal field separator (IFS) variable, and then let it parse into an array. When this happens in a command, then the assignment to IFS only takes place to that single command's environment (to read ). It then parses the input according to the IFS variable value into an array, which we can then iterate over.
This example will parse one line of items separated by ;, pushing it into an array:
IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[#]}"; do
# process "$i"
done
This other example is for processing the whole content of $IN, each time one line of input separated by ;:
while IFS=';' read -ra ADDR; do
for i in "${ADDR[#]}"; do
# process "$i"
done
done <<< "$IN"
Taken from Bash shell script split array:
IN="bla#some.com;john#home.com"
arrIN=(${IN//;/ })
echo ${arrIN[1]} # Output: john#home.com
Explanation:
This construction replaces all occurrences of ';' (the initial // means global replace) in the string IN with ' ' (a single space), then interprets the space-delimited string as an array (that's what the surrounding parentheses do).
The syntax used inside of the curly braces to replace each ';' character with a ' ' character is called Parameter Expansion.
There are some common gotchas:
If the original string has spaces, you will need to use IFS:
IFS=':'; arrIN=($IN); unset IFS;
If the original string has spaces and the delimiter is a new line, you can set IFS with:
IFS=$'\n'; arrIN=($IN); unset IFS;
I've seen a couple of answers referencing the cut command, but they've all been deleted. It's a little odd that nobody has elaborated on that, because I think it's one of the more useful commands for doing this type of thing, especially for parsing delimited log files.
In the case of splitting this specific example into a bash script array, tr is probably more efficient, but cut can be used, and is more effective if you want to pull specific fields from the middle.
Example:
$ echo "bla#some.com;john#home.com" | cut -d ";" -f 1
bla#some.com
$ echo "bla#some.com;john#home.com" | cut -d ";" -f 2
john#home.com
You can obviously put that into a loop, and iterate the -f parameter to pull each field independently.
This gets more useful when you have a delimited log file with rows like this:
2015-04-27|12345|some action|an attribute|meta data
cut is very handy to be able to cat this file and select a particular field for further processing.
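As a small sketch of that use case, assuming a hypothetical temporary file with rows in the format above (the file and the variable names are made up for illustration), cut can pull one field from every row, or several fields at once:

```bash
#!/usr/bin/env bash
# Hypothetical sample file in the pipe-delimited format shown above.
logfile=$(mktemp)
printf '%s\n' \
    '2015-04-27|12345|some action|an attribute|meta data' \
    '2015-04-28|12346|other action|an attribute|meta data' > "$logfile"

# Field 3 of every row, one per line.
actions=$(cut -d '|' -f 3 "$logfile")
echo "$actions"

# Several fields at once: -f accepts a comma-separated list.
first_pair=$(head -n 1 "$logfile" | cut -d '|' -f 1,3)
echo "$first_pair"

rm -f "$logfile"
```

When several fields are selected, cut joins them with the same delimiter in its output.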
If you don't mind processing them immediately, I like to do this:
for i in $(echo $IN | tr ";" "\n")
do
# process
done
You could use this kind of loop to initialize an array, but there's probably an easier way to do it.
Compatible answer
There are a lot of different ways to do this in bash.
However, it's important to first note that bash has many special features (so-called bashisms) that won't work in any other shell.
In particular, arrays, associative arrays, and pattern substitution, which are used in the solutions in this post as well as others in the thread, are bashisms and may not work under other shells that many people use.
For instance: on my Debian GNU/Linux, there is a standard shell called dash; I know many people who like to use another shell called ksh; and there is also a special tool called busybox with its own shell interpreter (ash).
For a POSIX-shell-compatible answer, go to the last part of this answer!
Requested string
The string to be split in the above question is:
IN="bla#some.com;john#home.com"
I will use a modified version of this string to ensure that my solution is robust to strings containing whitespace, which could break other solutions:
IN="bla#some.com;john#home.com;Full Name <fulnam#other.org>"
Split string based on delimiter in bash (version >=4.2)
In pure bash, we can create an array with elements split by a temporary value for IFS (the internal field separator). The IFS, among other things, tells bash which character(s) it should treat as a delimiter between elements when defining an array:
IN="bla#some.com;john#home.com;Full Name <fulnam#other.org>"
# save original IFS value so we can restore it later
oIFS="$IFS"
IFS=";"
declare -a fields=($IN)
IFS="$oIFS"
unset oIFS
In newer versions of bash, prefixing a command with an IFS definition changes the IFS for that command only and resets it to the previous value immediately afterwards. This means we can do the above in just one line:
IFS=\; read -a fields <<<"$IN"
# after this command, the IFS resets back to its previous value (here, the default):
set | grep ^IFS=
# IFS=$' \t\n'
We can see that the string IN has been stored into an array named fields, split on the semicolons:
set | grep ^fields=\\\|^IN=
# fields=([0]="bla#some.com" [1]="john#home.com" [2]="Full Name <fulnam#other.org>")
# IN='bla#some.com;john#home.com;Full Name <fulnam#other.org>'
(We can also display the contents of these variables using declare -p:)
declare -p IN fields
# declare -- IN="bla#some.com;john#home.com;Full Name <fulnam#other.org>"
# declare -a fields=([0]="bla#some.com" [1]="john#home.com" [2]="Full Name <fulnam#other.org>")
Note that read is the quickest way to do the split because there are no forks or external resources called.
Once the array is defined, you can use a simple loop to process each field (or, rather, each element in the array you've now defined):
# `"${fields[@]}"` expands to return every element of `fields` array as a separate argument
for x in "${fields[@]}" ;do
echo "> [$x]"
done
# > [bla#some.com]
# > [john#home.com]
# > [Full Name <fulnam#other.org>]
Or you could drop each field from the array after processing using a shifting approach, which I like:
while [ "$fields" ] ;do
echo "> [$fields]"
# slice the array
fields=("${fields[@]:1}")
done
# > [bla#some.com]
# > [john#home.com]
# > [Full Name <fulnam#other.org>]
And if you just want a simple printout of the array, you don't even need to loop over it:
printf "> [%s]\n" "${fields[@]}"
# > [bla#some.com]
# > [john#home.com]
# > [Full Name <fulnam#other.org>]
Update: recent bash >= 4.4
In newer versions of bash, you can also play with the command mapfile:
mapfile -td \; fields < <(printf "%s\0" "$IN")
This syntax preserves special characters, newlines and empty fields!
If you don't want to include empty fields, you could do the following:
mapfile -td \; fields <<<"$IN"
fields=("${fields[@]%$'\n'}") # drop '\n' added by '<<<'
With mapfile, you can also skip declaring an array and implicitly "loop" over the delimited elements, calling a function on each:
myPubliMail() {
printf "Seq: %6d: Sending mail to '%s'..." $1 "$2"
# mail -s "This is not a spam..." "$2" </path/to/body
printf "\e[3D, done.\n"
}
mapfile < <(printf "%s\0" "$IN") -td \; -c 1 -C myPubliMail
(Note: the \0 at end of the format string is useless if you don't care about empty fields at end of the string or they're not present.)
mapfile < <(echo -n "$IN") -td \; -c 1 -C myPubliMail
# Seq: 0: Sending mail to 'bla#some.com', done.
# Seq: 1: Sending mail to 'john#home.com', done.
# Seq: 2: Sending mail to 'Full Name <fulnam#other.org>', done.
Or you could use <<<, and in the function body include some processing to drop the newline it adds:
myPubliMail() {
local seq=$1 dest="${2%$'\n'}"
printf "Seq: %6d: Sending mail to '%s'..." $seq "$dest"
# mail -s "This is not a spam..." "$dest" </path/to/body
printf "\e[3D, done.\n"
}
mapfile <<<"$IN" -td \; -c 1 -C myPubliMail
# Renders the same output:
# Seq: 0: Sending mail to 'bla#some.com', done.
# Seq: 1: Sending mail to 'john#home.com', done.
# Seq: 2: Sending mail to 'Full Name <fulnam#other.org>', done.
Split string based on delimiter in shell
If you can't use bash, or if you want to write something that can be used in many different shells, you often can't use bashisms -- and this includes the arrays we've been using in the solutions above.
However, we don't need to use arrays to loop over "elements" of a string. There is a syntax used in many shells for deleting substrings of a string from the first or last occurrence of a pattern. Note that * is a wildcard that stands for zero or more characters:
(The lack of this approach in any solution posted so far is the main reason I'm writing this answer ;)
${var#*SubStr} # drops substring from start of string up to first occurrence of `SubStr`
${var##*SubStr} # drops substring from start of string up to last occurrence of `SubStr`
${var%SubStr*} # drops substring from last occurrence of `SubStr` to end of string
${var%%SubStr*} # drops substring from first occurrence of `SubStr` to end of string
As explained by Score_Under:
# and % delete the shortest possible matching substring from the start and end of the string respectively, and
## and %% delete the longest possible matching substring.
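To make the four forms concrete, here is a minimal sketch on a hypothetical value with two ; separators (the variable names are chosen for illustration only):

```bash
#!/bin/sh
# Hypothetical sample value with two ';' separators.
var="first;middle;last"

after_first=${var#*;}     # shortest match from the start removed -> "middle;last"
after_last=${var##*;}     # longest match from the start removed  -> "last"
before_last=${var%;*}     # shortest match from the end removed   -> "first;middle"
before_first=${var%%;*}   # longest match from the end removed    -> "first"

printf '%s\n' "$after_first" "$after_last" "$before_last" "$before_first"
```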
Using the above syntax, we can create an approach where we extract substring "elements" from the string by deleting the substrings up to or after the delimiter.
The code block below works well in bash (including Mac OS's bash), dash, ksh, lksh, yash, zsh, and busybox's ash:
(Thanks to Adam Katz's comment, which made this loop a lot simpler!)
IN="bla#some.com;john#home.com;Full Name <fulnam#other.org>"
while [ "$IN" != "$iter" ] ;do
# extract the substring from start of string up to delimiter.
iter=${IN%%;*}
# delete this first "element" AND next separator, from $IN.
IN="${IN#$iter;}"
# Print (or do anything with) the first "element".
printf '> [%s]\n' "$iter"
done
# > [bla#some.com]
# > [john#home.com]
# > [Full Name <fulnam#other.org>]
Why not cut?
cut is useful for extracting columns from big files, but forking repeatedly (var=$(echo ... | cut ...)) quickly becomes overkill!
Here is a correct syntax using cut, tested under many POSIX shells, as suggested by this other answer from DougW:
IN="bla#some.com;john#home.com;Full Name <fulnam#other.org>"
i=1
while iter=$(echo "$IN"|cut -d\; -f$i) ; [ -n "$iter" ] ;do
printf '> [%s]\n' "$iter"
i=$((i+1))
done
I wrote this in order to compare execution time.
On my Raspberry Pi, this looks like:
$ export TIMEFORMAT=$'(%U + %S) / \e[1m%R\e[0m : %P '
$ time sh splitDemo.sh >/dev/null
(0.000 + 0.019) / 0.019 : 99.63
$ time sh splitDemo_cut.sh >/dev/null
(0.051 + 0.041) / 0.188 : 48.98
The overall execution time is something like 10x longer, because cut is forked once per field!
This worked for me:
string="1;2"
echo $string | cut -d';' -f1 # output is 1
echo $string | cut -d';' -f2 # output is 2
I think AWK is the best and most efficient command to solve your problem. AWK is included by default in almost every Linux distribution.
echo "bla#some.com;john#home.com" | awk -F';' '{print $1,$2}'
will give
bla#some.com john#home.com
Of course, you can store each email address by changing which field awk prints.
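A minimal sketch of what storing each address could look like, assuming the variable names ADDR1 and ADDR2 from the question and using printf in place of echo (my own substitution):

```bash
#!/bin/sh
IN="bla#some.com;john#home.com"

# Each call prints a single field, which command substitution captures.
ADDR1=$(printf '%s\n' "$IN" | awk -F';' '{print $1}')
ADDR2=$(printf '%s\n' "$IN" | awk -F';' '{print $2}')

echo "ADDR1=$ADDR1"
echo "ADDR2=$ADDR2"
```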
How about this approach:
IN="bla#some.com;john#home.com"
set -- "$IN"
IFS=";"; declare -a Array=($*)
echo "${Array[@]}"
echo "${Array[0]}"
echo "${Array[1]}"
Source
echo "bla#some.com;john#home.com" | sed -e 's/;/\n/g'
bla#some.com
john#home.com
This also works:
IN="bla#some.com;john#home.com"
echo ADD1=`echo $IN | cut -d \; -f 1`
echo ADD2=`echo $IN | cut -d \; -f 2`
Be careful, this solution is not always correct. In case you pass "bla#some.com" only, it will assign it to both ADD1 and ADD2.
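One way around this, sketched with the POSIX -s option of cut (which suppresses lines that contain no delimiter at all), is to use -s for every field after the first, so that a missing second address yields an empty string rather than a copy of the first. The use of printf and $(...) instead of echo and backticks is my own substitution.

```bash
#!/bin/sh
IN="bla#some.com"   # only one address, no ';' present

ADD1=$(printf '%s\n' "$IN" | cut -d ';' -f 1)
# -s: print nothing when the line has no delimiter.
ADD2=$(printf '%s\n' "$IN" | cut -s -d ';' -f 2)

echo "ADD1=$ADD1"
echo "ADD2=$ADD2"
```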
A different take on Darron's answer, this is how I do it:
IN="bla#some.com;john#home.com"
read ADDR1 ADDR2 <<<$(IFS=";"; echo $IN)
How about this one liner, if you're not using arrays:
IFS=';' read ADDR1 ADDR2 <<<$IN
In Bash, a bullet proof way, that will work even if your variable contains newlines:
IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
Look:
$ in=$'one;two three;*;there is\na newline\nin this field'
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two three" [2]="*" [3]="there is
a newline
in this field")'
The trick for this to work is to use the -d option of read (delimiter) with an empty delimiter, so that read is forced to read everything it's fed. And we feed read with exactly the content of the variable in, with no trailing newline thanks to printf. Note that we're also putting the delimiter in printf to ensure that the string passed to read has a trailing delimiter. Without it, read would trim potential trailing empty fields:
$ in='one;two;three;' # there's an empty field
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two" [2]="three" [3]="")'
the trailing empty field is preserved.
Update for Bash≥4.4
Since Bash 4.4, the builtin mapfile (aka readarray) supports the -d option to specify a delimiter. Hence another canonical way is:
mapfile -d ';' -t array < <(printf '%s;' "$in")
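As a quick check of that form, here is a sketch with a hypothetical input containing an empty field in the middle, an embedded newline, and a trailing empty field. It assumes Bash 4.4 or later.

```bash
#!/usr/bin/env bash
# Requires Bash >= 4.4 for mapfile -d.
in=$'one;;two three;line1\nline2;'   # hypothetical input: empty fields and a newline

# printf appends one more ';' so the trailing empty field is terminated too.
mapfile -d ';' -t array < <(printf '%s;' "$in")

declare -p array
echo "fields: ${#array[@]}"
```

The array ends up with five elements: the empty second field, the field with the embedded newline, and the trailing empty field are all kept.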
Without setting the IFS
If you just have one colon you can do that:
a="foo:bar"
b=${a%:*}
c=${a##*:}
you will get:
b = foo
c = bar
Here is a clean 3-liner:
in="foo#bar;bizz#buzz;fizz#buzz;buzz#woof"
IFS=';' list=($in)
for item in "${list[@]}"; do echo $item; done
where IFS delimits words based on the separator and () is used to create an array. Then [@] is used to return each item as a separate word.
If you've any code after that, you also need to restore $IFS, e.g. unset IFS.
The following Bash/zsh function splits its first argument on the delimiter given by the second argument:
split() {
local string="$1"
local delimiter="$2"
if [ -n "$string" ]; then
local part
while read -d "$delimiter" part; do
echo $part
done <<< "$string"
echo $part
fi
}
For instance, the command
$ split 'a;b;c' ';'
yields
a
b
c
This output may, for instance, be piped to other commands. Example:
$ split 'a;b;c' ';' | cat -n
1 a
2 b
3 c
Compared to the other solutions given, this one has the following advantages:
IFS is not overridden: Due to dynamic scoping of even local variables, overriding IFS over a loop causes the new value to leak into function calls performed from within the loop.
Arrays are not used: Reading a string into an array using read requires the flag -a in Bash and -A in zsh.
If desired, the function may be put into a script as follows:
#!/usr/bin/env bash
split() {
# ...
}
split "$@"
you can apply awk to many situations
echo "bla#some.com;john#home.com"|awk -F';' '{printf "%s\n%s\n", $1, $2}'
also you can use this
echo "bla#some.com;john#home.com"|awk -F';' '{print $1,$2}' OFS="\n"
There is a simple and smart way like this:
echo "add:sfff" | xargs -d: -i echo {}
But you must use GNU xargs; BSD xargs does not support -d delim. If you use an Apple Mac like me, you can install GNU xargs:
brew install findutils
then
echo "add:sfff" | gxargs -d: -i echo {}
There are some cool answers here (errator esp.), but for something analogous to split in other languages -- which is what I took the original question to mean -- I settled on this:
IN="bla#some.com;john#home.com"
declare -a a="(${IN//;/ })";
Now ${a[0]}, ${a[1]}, etc, are as you would expect. Use ${#a[*]} for number of terms. Or to iterate, of course:
for i in ${a[*]}; do echo $i; done
IMPORTANT NOTE:
This works in cases where there are no spaces to worry about, which solved my problem, but may not solve yours. Go with the $IFS solution(s) in that case.
So many answers and so many complexities. Try out a simpler solution:
echo "string1, string2" | tr , "\n"
tr (read: translate) replaces each character of the first argument with the corresponding character of the second argument in the input.
So tr , "\n" replaces each comma with a newline character in the input, and it becomes:
string1
string2
If there are no spaces, why not this?
IN="bla#some.com;john#home.com"
arr=(`echo $IN | tr ';' ' '`)
echo ${arr[0]}
echo ${arr[1]}
This is the simplest way to do it.
spo='one;two;three'
OIFS=$IFS
IFS=';'
spo_array=($spo)
IFS=$OIFS
echo ${spo_array[*]}
Apart from the fantastic answers that were already provided, if it is just a matter of printing out the data you may consider using awk:
awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
This sets the field separator to ;, so that it can loop through the fields with a for loop and print accordingly.
Test
$ IN="bla#some.com;john#home.com"
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
> [bla#some.com]
> [john#home.com]
With another input:
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "a;b;c d;e_;f"
> [a]
> [b]
> [c d]
> [e_]
> [f]
IN="bla#some.com;john#home.com"
IFS=';'
read -a IN_arr <<< "${IN}"
for entry in "${IN_arr[@]}"
do
echo $entry
done
Output
bla#some.com
john#home.com
System : Ubuntu 12.04.1
Use the set built-in to load up the $@ array:
IN="bla#some.com;john#home.com"
IFS=';'; set $IN; IFS=$' \t\n'
Then, let the party begin:
echo $@
for a; do echo $a; done
ADDR1=$1 ADDR2=$2
Two bourne-ish alternatives where neither require bash arrays:
Case 1: Keep it nice and simple: Use a NewLine as the Record-Separator... eg.
IN="bla#some.com
john#home.com"
while read i; do
# process "$i" ... eg.
echo "[email:$i]"
done <<< "$IN"
Note: in this first case no sub-process is forked to assist with list manipulation.
Idea: Maybe it is worth using NL extensively internally, and only converting to a different RS when generating the final result externally.
Case 2: Using a ";" as a record separator... eg.
NL="
" IRS=";" ORS=";"
conv_IRS() {
exec tr "$1" "$NL"
}
conv_ORS() {
exec tr "$NL" "$1"
}
IN="bla#some.com;john#home.com"
IN="$(conv_IRS ";" <<< "$IN")"
while read i; do
# process "$i" ... eg.
echo -n "[email:$i]$ORS"
done <<< "$IN"
In both cases, a sub-list composed within the loop persists after the loop has completed. This is useful when manipulating lists in memory instead of storing lists in files. {p.s. keep calm and carry on B-) }
In Android shell, most of the proposed methods just do not work:
$ IFS=':' read -ra ADDR <<<"$PATH"
/system/bin/sh: can't create temporary file /sqlite_stmt_journals/mksh.EbNoR10629: No such file or directory
What does work is:
$ for i in ${PATH//:/ }; do echo $i; done
/sbin
/vendor/bin
/system/sbin
/system/bin
/system/xbin
where // means global replacement.
IN='bla#some.com;john#home.com;Charlie Brown <cbrown#acme.com;!"#$%&/()[]{}*? are no problem;simple is beautiful :-)'
set -f
oldifs="$IFS"
IFS=';'; arrayIN=($IN)
IFS="$oldifs"
for i in "${arrayIN[@]}"; do
echo "$i"
done
set +f
Output:
bla#some.com
john#home.com
Charlie Brown <cbrown#acme.com
!"#$%&/()[]{}*? are no problem
simple is beautiful :-)
Explanation: Simple assignment using parenthesis () converts semicolon separated list into an array provided you have correct IFS while doing that. Standard FOR loop handles individual items in that array as usual.
Notice that the list given for IN variable must be "hard" quoted, that is, with single ticks.
IFS must be saved and restored since Bash does not treat an assignment the same way as a command. An alternate workaround is to wrap the assignment inside a function and call that function with a modified IFS. In that case separate saving/restoring of IFS is not needed. Thanks for "Bize" for pointing that out.
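The function-based workaround mentioned above could look roughly like this. The helper name split_into_arrayIN is hypothetical, and the sketch assumes Bash in its default (non-POSIX) mode, where an assignment placed in front of a function call only lasts for the duration of that call.

```bash
#!/usr/bin/env bash
# Hypothetical helper: splits its first argument into the global array arrayIN
# using whatever IFS is in effect while the function runs.
split_into_arrayIN() {
    set -f              # no pathname expansion on the unquoted $1
    arrayIN=($1)
    set +f
}

IN='bla#some.com;Charlie Brown <cbrown#acme.com;* is no problem'

# The IFS=';' prefix applies only while the function executes.
IFS=';' split_into_arrayIN "$IN"

printf '[%s]\n' "${arrayIN[@]}"
```

Note that the helper unconditionally re-enables globbing with set +f, so it is only a fit for scripts that do not otherwise rely on noglob.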
Here's my answer!
DELIMITER_VAL='='
read -d '' F_ABOUT_DISTRO_R <<"EOF"
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"
NAME="Ubuntu"
VERSION="14.04.4 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.4 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
EOF
SPLIT_NOW=$(awk -F$DELIMITER_VAL '{for(i=1;i<=NF;i++){printf "%s\n", $i}}' <<<"${F_ABOUT_DISTRO_R}")
while read -r line; do
SPLIT+=("$line")
done <<< "$SPLIT_NOW"
for i in "${SPLIT[@]}"; do
echo "$i"
done
Why is this approach "the best" for me?
For two reasons:
You do not need to escape the delimiter;
You will not have problem with blank spaces. The value will be properly separated in the array.
A one-liner to split a string separated by ';' into an array is:
IN="bla#some.com;john#home.com"
ADDRS=( $(IFS=";"; echo $IN) )
echo ${ADDRS[0]}
echo ${ADDRS[1]}
This only sets IFS in a subshell, so you don't have to worry about saving and restoring its value.

Remove leading zeros from MAC address

I have a MAC address that looks like this.
01:AA:BB:0C:D0:E1
I want to convert it to lowercase and strip the leading zeros.
1:aa:bb:c:d0:e1
What's the simplest way to do that in a Bash script?
$ echo 01:AA:BB:0C:D0:E1 | sed 's/\(^\|:\)0/\1/g;s/.*/\L\0/'
1:aa:bb:c:d0:e1
\(^\|:\)0 represents either the line start (^) or a :, followed by a 0.
We replace this with the capture (either line start or :), which removes the 0.
Then a second substitution (s/.*/\L\0/) puts the whole line in lowercase.
$ sed --version | head -1
sed (GNU sed) 4.2.2
EDIT: Alternatively:
echo 01:AA:BB:0C:D0:E1 | sed 's/0\([0-9A-Fa-f]\)/\1/g;s/.*/\L\0/'
This replaces 0x (where x is any hex digit) with x.
EDIT: if your sed does not support \L, use tr:
echo 01:AA:BB:0C:D0:E1 | sed 's/0\([0-9A-Fa-f]\)/\1/g' | tr '[:upper:]' '[:lower:]'
Here's a pure Bash≥4 possibility:
mac=01:AA:BB:0C:D0:E1
IFS=: read -r -d '' -a macary < <(printf '%s:\0' "$mac")
macary=( "${macary[@]#0}" )
macary=( "${macary[@],,}" )
IFS=: eval 'newmac="${macary[*]}"'
The line IFS=: read -r -d '' -a macary < <(printf '%s:\0' "$mac") is the canonical way to split a string into an array,
the expansion "${macary[@]#0}" is that of the array macary with leading 0 (if any) removed,
the expansion "${macary[@],,}" is that of the array macary in lowercase,
IFS=: eval 'newmac="${macary[*]}"' is a standard way to join the fields of an array (note that the use of eval is perfectly safe).
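If you would rather avoid eval altogether, the join step can be sketched with a small function that localizes IFS. The name join_by is a hypothetical helper, not part of the answer above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join all remaining arguments with the first one.
join_by() {
    local IFS=$1    # "$*" joins its words with the first character of IFS
    shift
    printf '%s\n' "$*"
}

macary=(1 aa bb c d0 e1)
newmac=$(join_by : "${macary[@]}")
echo "$newmac"
```

Because IFS is declared local, the caller's IFS is left untouched, at the cost of a command substitution to capture the result.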
After that:
declare -p newmac
yields
declare -- newmac="1:aa:bb:c:d0:e1"
as required.
A more robust way is to validate the MAC address first:
mac=01:AA:BB:0C:D0:E1
a='([[:xdigit:]]{2})' ; regex="^$a:$a:$a:$a:$a:$a$"
[[ $mac =~ $regex ]] || { echo "Invalid MAC address" >&2; exit 1; }
And then, using the valid result of the regex match (BASH_REMATCH):
set -- $(printf '%x ' $(printf '0x%s ' "${BASH_REMATCH[@]:1}" ))
IFS=: eval 'printf "%s\n" "$*"'
Which will print:
1:aa:bb:c:d0:e1
Hex values without leading zeros and in lowercase.
If Uppercase is needed, change the printf '%x ' to printf '%X '.
If Leading zeros are needed change the same to printf '%02x '.
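The validation and formatting steps can be combined into one function, sketched below. The name format_mac and its optional second argument (the printf format applied to each byte) are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the steps above; the second argument is the
# printf format used for each byte (defaults to %x).
format_mac() {
    local mac=$1 fmt=${2:-%x}
    local a='([[:xdigit:]]{2})'
    local regex="^$a:$a:$a:$a:$a:$a$"
    [[ $mac =~ $regex ]] || { echo "Invalid MAC address" >&2; return 1; }

    local byte out=()
    for byte in "${BASH_REMATCH[@]:1}"; do
        out+=("$(printf "$fmt" "0x$byte")")   # reinterpret each pair as a hex number
    done
    local IFS=:
    printf '%s\n' "${out[*]}"                 # join the bytes with ':'
}

format_mac 01:AA:BB:0C:D0:E1          # 1:aa:bb:c:d0:e1
format_mac 01:AA:BB:0C:D0:E1 %02X     # 01:AA:BB:0C:D0:E1
```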

Loop through a comma-separated shell variable

Suppose I have a Unix shell variable as below
variable=abc,def,ghij
I want to extract all the values (abc, def and ghij) using a for loop and pass each value into a procedure.
The script should allow extracting arbitrary number of comma-separated values from $variable.
Not messing with IFS
Not calling external command
variable=abc,def,ghij
for i in ${variable//,/ }
do
# call your procedure/other scripts here below
echo "$i"
done
Using bash string manipulation http://www.tldp.org/LDP/abs/html/string-manipulation.html
You can use the following script to dynamically traverse through your variable, no matter how many fields it has as long as it is only comma separated.
variable=abc,def,ghij
for i in $(echo $variable | sed "s/,/ /g")
do
# call your procedure/other scripts here below
echo "$i"
done
Instead of the echo "$i" call above, between the do and done inside the for loop, you can invoke your procedure proc "$i".
Update: The above snippet works if the value of variable does not contain spaces. If you have such a requirement, please use one of the solutions that can change IFS and then parse your variable.
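To illustrate the caveat in the update, here is a short sketch with a hypothetical value that contains a space inside one field. The counter n_subst and the array values are names chosen for this example.

```bash
#!/usr/bin/env bash
variable="abc,d ef,ghij"   # hypothetical value with a space inside one field

# Substitution approach: the space inside "d ef" also splits.
n_subst=0
for i in ${variable//,/ }; do
    n_subst=$((n_subst + 1))
done

# IFS approach: only the comma splits.
IFS=, read -ra values <<< "$variable"

echo "substitution: $n_subst words, IFS: ${#values[@]} fields"
```

The substitution loop sees four words, while the IFS-based read keeps the three intended fields.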
If you set a different field separator, you can directly use a for loop:
IFS=","
for v in $variable
do
# things with "$v" ...
done
You can also store the values in an array and then loop through it as indicated in How do I split a string on a delimiter in Bash?:
IFS=, read -ra values <<< "$variable"
for v in "${values[@]}"
do
# things with "$v"
done
Test
$ variable="abc,def,ghij"
$ IFS=","
$ for v in $variable
> do
> echo "var is $v"
> done
var is abc
var is def
var is ghij
You can find a broader approach in this solution to How to iterate through a comma-separated list and execute a command for each entry.
Examples on the second approach:
$ IFS=, read -ra vals <<< "abc,def,ghij"
$ printf "%s\n" "${vals[@]}"
abc
def
ghij
$ for v in "${vals[@]}"; do echo "$v --"; done
abc --
def --
ghij --
I think syntactically this is cleaner and also passes shell-check linting
variable=abc,def,ghij
for i in ${variable//,/ }
do
# call your procedure/other scripts here below
echo "$i"
done
#!/bin/bash
TESTSTR="abc,def,ghij"
for i in $(echo $TESTSTR | tr ',' '\n')
do
echo $i
done
I prefer to use tr instead of sed, because sed has problems with special characters like \r and \n in some cases.
Another solution is to set IFS to a certain separator.
Another solution not using IFS and still preserving the spaces:
$ var="a bc,def,ghij"
$ while read line; do echo line="$line"; done < <(echo "$var" | tr ',' '\n')
line=a bc
line=def
line=ghij
Here is an alternative tr based solution that doesn't use echo, expressed as a one-liner.
for v in $(tr ',' '\n' <<< "$var") ; do something_with "$v" ; done
It feels tidier without echo but that is just my personal preference.
The following solution:
doesn't need to mess with IFS
doesn't need helper variables (like i in a for-loop)
should be easily extensible to work for multiple separators (with a bracket expression like [:,] in the patterns)
really splits only on the specified separator(s) and not - like some other solutions presented here on e.g. spaces too.
is POSIX compatible
doesn't suffer from any subtle issues that might arise when bash’s nocasematch is on and a separator that has lower/upper case versions is used in a match like with ${parameter/pattern/string} or case
beware that:
it does however work on the variable itself and pop each element from it - if that is not desired, a helper variable is needed
it assumes var to be set and would fail if it's not and set -u is in effect
while true; do
x="${var%%,*}"
echo $x
#x is not really needed here, one can of course directly use "${var%%,*}"
if [ -z "${var##*,*}" ] && [ -n "${var}" ]; then
var="${var#*,}"
else
break
fi
done
Beware that separators that would be special characters in patterns (e.g. a literal *) would need to be quoted accordingly.
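As a sketch of that quoting, here is the same loop with a literal * as the separator. The input value and the result variable are hypothetical, and a case statement stands in for the [ -z ... ] test used above.

```bash
#!/bin/sh
var='a*b c*d'    # hypothetical input using a literal * as separator

result=
while true; do
    x=${var%%"*"*}            # quoted "*" is literal, the trailing * is a wildcard
    result="$result[$x]"
    case $var in
        *"*"*) var=${var#*"*"} ;;   # drop the first element and its separator
        *)     break ;;
    esac
done
echo "$result"
```

This prints [a][b c][d]; without the quotes around the separator, the pattern would match everything and the split would not happen where intended.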
Here's my pure bash solution that doesn't change IFS, and can take in a custom regex delimiter.
loop_custom_delimited() {
local list=$1
local delimiter=$2
local item
if [[ $delimiter != ' ' ]]; then
list=$(echo $list | sed 's/ /'`echo -e "\010"`'/g' | sed -E "s/$delimiter/ /g")
fi
for item in $list; do
item=$(echo $item | sed 's/'`echo -e "\010"`'/ /g')
echo "$item"
done
}
Try this one.
#!/bin/bash
testpid="abc,def,ghij"
count=`echo $testpid | grep -o ',' | wc -l` # this is not a good way
count=`expr $count + 1`
i=1 # index of the field to print
while [ $i -le $count ] ; do
echo $testpid | cut -d ',' -f $i
i=`expr $i + 1`
done

How to shift each letter of the string by a given number of letters?

How can I shift each letter of a string by a given number of letters down or up in bash, without using a hardcoded dictionary?
Do you mean something like ROT13:
pax$ echo 'hello there' | tr '[a-z]' '[n-za-m]'
uryyb gurer
pax$ echo 'hello there' | tr '[a-z]' '[n-za-m]' | tr '[a-z]' '[n-za-m]'
hello there
For a more general solution where you want to provide an arbitrary rotation (0 through 26), you can use:
#!/usr/bin/bash
dual=abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
phrase='hello there'
rotat=13
newphrase=$(echo $phrase | tr "${dual:0:26}" "${dual:${rotat}:26}")
echo ${newphrase}
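The same idea can be wrapped in a function that also handles capitals. The name rotate and its argument order are hypothetical; the doubled alphabets play the role of dual above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rotate the letters in $2 by $1 positions (0 through 26).
rotate() {
    local n=$1 text=$2
    local lower=abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
    local upper=ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
    # Map the plain alphabets onto the 26-letter windows starting at offset n.
    printf '%s\n' "$text" |
        tr "${lower:0:26}${upper:0:26}" "${lower:n:26}${upper:n:26}"
}

secret=$(rotate 3 'Hello There')
echo "$secret"
rotate $((26 - 3)) "$secret"   # rotating by 26 - n undoes a rotation by n
```

Characters outside a-z and A-Z (spaces, digits, punctuation) pass through unchanged, since tr only translates characters listed in its first set.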
If you want to rotate also the capitals you could use something like this:
cat data.txt | tr '[a-z]' '[n-za-m]' | tr '[A-Z]' '[N-ZA-M]'
where data.txt has whatever you want to rotate.
$ alpha=abcdefghijklmnopqrstuvwxyz
$ rot=3
$ sed "y/${alpha}/${alpha:$rot}${alpha::$rot}/" <<< 'foobar'
irredu
Shift by 12 characters (A becomes M when encrypting, and M becomes A when decrypting)
Encryption
----------
$> echo ABCDE | tr '[A-Z]' '[M-ZA-L]' # prints MNOPQ
Decryption
----------
$> echo MNOPQ | tr '[M-ZA-L]' '[A-Z]' # prints ABCDE
In the encryption example, we pipe ABCDE to the tr command, which is given two arguments. The first one is the set of characters to match in the input (in our case, any uppercase letter). The second one is the set of replacement characters: each matched character is replaced by the character at the same position in the second set, which here is the letter 12 places further on. This part is important and might confuse some people: conceptually, the second set runs from M around to L. Since tr does not accept a wrapping range like [M-L] directly, we break it into two chunks, M-Z followed by A-L. It is basically a search-and-replace mechanism: you search with the first argument and replace with the second.
For the second example, I've just swapped the first argument with the second one in the tr command. Which acts perfectly as a decryptor. You could write it the same way as the first example, but I find it less time consuming when I have the encryption algorithm and I can just swap the arguments to have a decryption algorithm as well.
Or
cat data.txt | tr 'a-zA-Z' 'n-za-mN-ZA-M'
It will also work
Without using tr, shift 1 to 25 characters
and can be decrypted using 26 - original key
#!/bin/bash
#set -x
i=0
for letters in {A..Z}
do
abc_cap[$i]="$letters"
((i++))
done
i=0
for letters in {a..z}
do
abc_small[$i]="$letters"
((i++))
done
read -r -p "Enter message to be encrypted/decrypted: " -a message
read -r -p "Enter shift amount (26 - orig key for decrypt): " shift_amount
echo -n "Encrypted message: "
if [ "$shift_amount" -gt 25 ] || [ "$shift_amount" -lt 1 ]
then
echo "Shift amount out of range"
exit
fi
for word in "${message[@]}"
do
while read -r -n 1 letter
do
if [[ "$letter" = [a-z] ]]
then
for a in "${!abc_small[@]}"
do
if [ "${abc_small[$a]}" = "$letter" ]
then
a=$(echo "($a + $shift_amount) % 26" | bc)
echo -n "${abc_small[$a]}"
fi
done
elif [[ "$letter" = [A-Z] ]]
then
for a in "${!abc_cap[@]}"
do
if [ "${abc_cap[$a]}" = "$letter" ]
then
a=$(echo "($a + $shift_amount) % 26" | bc)
echo -n "${abc_cap[$a]}"
fi
done
elif [[ "$letter" = "" ]]
then echo -n " "
else echo -n "$letter"
fi
done < <(echo "$word")
done
echo
exit
Problem statement and how this command can help you:
For example, the password is stored in the file data.txt, where all lowercase (a-z) and uppercase (A-Z) letters have been rotated by 13 positions.
The data.txt file contains one line encrypted with the ROT13 (rotation by 13) algorithm. In order to decrypt it, I have to replace every letter with the letter 13 positions ahead.
The file contains the data shown below:
cat data.txt
Gur cnffjbeq vf WIAOOSFzMjXXBC0KoSKBbJ8puQm5lIEi
After rotating by 13 characters, the password looks like this:
The password is JVNBBFSmZwKKOP0XbFXOoW8chDz5yVRv
The command to do that is given below.
cat data.txt | tr '[A-Za-z]' '[N-ZA-Mn-za-m]'
Explanation of the Command
cat data.txt reads all the characters in the data.txt file and passes them to the tr command. tr takes two arguments: the first argument, [A-Za-z], selects only the characters A-Z and a-z, and the second argument is the rotated character set (not a regular expression) that they are mapped to.
[13th character after A - Z, then A - 12th character after A, and the same expression for the small letters]
[N-ZA-Mn-za-m]
N : the character 13 positions after A.
Z : to the end.
A : first character.
M : the character just before N, to complete the circle.
Repeat the same expression for the small letters.
We rotated by 13; you can replace N and M with the character at any other position x and the one just before it to rotate the string by x characters.

Resources