I have a string in a Bash shell script that I want to split into an array of characters, not based on a delimiter but just one character per array index. How can I do this? Ideally it would not use any external programs. Let me rephrase that. My goal is portability, so things like sed that are likely to be on any POSIX compatible system are fine.
Try
echo "abcdefg" | fold -w1
Edit: Added a more elegant solution suggested in comments.
echo "abcdefg" | grep -o .
You can access each letter individually already without an array conversion:
$ foo="bar"
$ echo ${foo:0:1}
b
$ echo ${foo:1:1}
a
$ echo ${foo:2:1}
r
If that's not enough, you could use something like this:
$ bar=($(echo $foo|sed 's/\(.\)/\1 /g'))
$ echo ${bar[1]}
a
If you can't even use sed or something like that, you can use the first technique above combined with a while loop using the original string's length (${#foo}) to build the array.
Warning: the code below does not work if the string contains whitespace. I think Vaughn Cato's answer has a better chance at surviving with special chars.
thing=($(i=0; while [ $i -lt ${#foo} ] ; do echo ${foo:$i:1} ; i=$((i+1)) ; done))
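A minimal sketch that avoids the command substitution entirely (and therefore survives whitespace) by appending each character directly; this assumes bash 3.1+ for the += array operator:
foo='a b c'
thing=()
i=0
while [ "$i" -lt "${#foo}" ]; do
  thing+=("${foo:i:1}")   # substring offsets are arithmetic, so i needs no $
  i=$((i + 1))
done
declare -p thing          # each element is one character, spaces included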
As an alternative to iterating over 0 .. ${#string}-1 with a for/while loop, there are two other ways I can think of to do this with only bash: using =~ and using printf. (There's a third possibility using eval and a {..} sequence expression, but this lacks clarity.)
With the correct environment and NLS enabled in bash these will work with non-ASCII as hoped, removing potential sources of failure with older system tools such as sed, if that's a concern. These will work from bash-3.0 (released 2005).
Using =~ and regular expressions, converting a string to an array in a single expression:
string="wonkabars"
[[ "$string" =~ ${string//?/(.)} ]] # splits into array
printf "%s\n" "${BASH_REMATCH[#]:1}" # loop free: reuse fmtstr
declare -a arr=( "${BASH_REMATCH[#]:1}" ) # copy array for later
The way this works is to perform an expansion of string which substitutes each single character with (.), then match this generated regular expression with grouping to capture each individual character into BASH_REMATCH[]. Index 0 is set to the entire string; since that special array is read-only you cannot remove it, hence the :1 when the array is expanded, to skip over index 0 if needed.
Some quick testing for non-trivial strings (>64 chars) shows this method is substantially faster than one using bash string and array operations.
The above will work with strings containing newlines, =~ supports POSIX ERE where . matches anything except NUL by default, i.e. the regex is compiled without REG_NEWLINE. (The behaviour of POSIX text processing utilities is allowed to be different by default in this respect, and usually is.)
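A quick sanity check of that claim, as a sketch with a two-character string containing an embedded newline:
s=$'a\nb'
[[ $s =~ ${s//?/(.)} ]] &&
  printf '%q\n' "${BASH_REMATCH[@]:1}"   # prints a, $'\n' and b, so the newline is captured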
Second option, using printf:
string="wonkabars"
ii=0
while printf "%s%n" "${string:ii++:1}" xx; do
((xx)) && printf "\n" || break
done
This loop increments index ii to print one character at a time, and breaks out when there are no characters left. This would be even simpler if the bash printf returned the number of characters printed (as in C) rather than an error status; instead, the number of characters printed is captured in xx using %n. (This works at least as far back as bash-2.05b.)
With bash-3.1 and printf -v var you have slightly more flexibility, and can avoid falling off the end of the string should you be doing something other than printing the characters, e.g. to create an array:
declare -a arr
ii=0
while printf -v cc "%s%n" "${string:(ii++):1}" xx; do
((xx)) && arr+=("$cc") || break
done
If your string is stored in variable x, this produces an array y with the individual characters:
i=0
while [ $i -lt ${#x} ]; do y[$i]=${x:$i:1}; i=$((i+1));done
The simplest, most complete and elegant solution:
$ read -a ARRAY <<< $(echo "abcdefg" | sed 's/./& /g')
and test
$ echo ${ARRAY[0]}
a
$ echo ${ARRAY[1]}
b
Explanation: read -a reads stdin as an array and assigns it to the variable ARRAY, treating spaces as the delimiter between array items.
The command substitution that echoes the string through sed just adds the needed spaces between each character.
We are using Here String (<<<) to feed the stdin of the read command.
I have found that the following works the best:
array=( `echo string | grep -o . ` )
(note the backticks)
then if you do: echo ${array[@]} ,
you get: s t r i n g
or: echo ${array[2]} ,
you get: r
Pure Bash solution with no loop:
#!/usr/bin/env bash
str='The quick brown fox jumps over a lazy dog.'
# Need extglob for the replacement pattern
shopt -s extglob
# Split string characters into array (skip first record)
# Character 037 is the octal representation of ASCII Record Separator
# so it can capture all other characters in the string, including spaces.
IFS= mapfile -s1 -t -d $'\37' array <<<"${str//?()/$'\37'}"
# Strip out captured trailing newline of here-string in last record
array[-1]="${array[-1]%?}"
# Debug print array
declare -p array
string=hello123
for i in $(seq 0 ${#string})
do array[$i]=${string:$i:1}
done
echo "zero element of array is [${array[0]}]"
echo "entire array is [${array[#]}]"
The zero element of array is [h]. The entire array is [h e l l o 1 2 3 ].
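The trailing space appears because seq 0 ${#string} runs one index past the last character and stores an empty final element; a sketch of the off-by-one fix:
string=hello123
array=()
for i in $(seq 0 $((${#string} - 1)))
do array[$i]=${string:$i:1}
done
echo "entire array is [${array[@]}]"   # no empty trailing element now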
Yet another one :). The question as stated simply says 'Split string into character array' and doesn't say much about the state of the receiving array, nor about special characters such as spaces and control chars.
My assumption is that if I want to split a string into an array of chars I want the receiving array to contain just that string and no leftovers from previous runs, yet preserve any special chars.
For instance the proposed solution family like
for (( i=0 ; i < ${#x} ; i++ )); do y[i]=${x:i:1}; done
leaves leftovers in the target array:
$ y=(1 2 3 4 5 6 7 8)
$ x=abc
$ for (( i=0 ; i < ${#x} ; i++ )); do y[i]=${x:i:1}; done
$ printf '%s ' "${y[@]}"
a b c 4 5 6 7 8
Besides, writing that long line each time we want to split a string is a problem, so why not hide all this in a function we can keep in a package/source file, with an API like
s2a "Long string" ArrayName
I got this one that seems to do the job.
$ s2a()
> { [ "$2" ] && typeset -n __=$2 && unset $2;
> [ "$1" ] && __+=("${1:0:1}") && s2a "${1:1}"
> }
$ a=(1 2 3 4 5 6 7 8 9 0) ; printf '%s ' "${a[@]}"
1 2 3 4 5 6 7 8 9 0
$ s2a "Split It" a ; printf '%s ' "${a[#]}"
S p l i t I t
If the text can contain spaces:
eval a=( $(echo "this is a test" | sed "s/\(.\)/'\1' /g") )
$ echo hello | awk NF=NF FS=
h e l l o
Or
$ echo hello | awk '$0=RT' RS=[[:alnum:]]
h
e
l
l
o
I know this is a "bash" question, but please let me show you the perfect solution in zsh, a shell very popular these days:
string='this is a string'
string_array=(${(s::)string}) #Parameter expansion. And that's it!
print ${(t)string_array} -> type array
print $#string_array -> 16 items
This is an old post/thread, but bash v5.2+ has a new feature, the shell option patsub_replacement, which combines with the =~ operator for regex. More or less the same as @mr.spuratic's post/answer.
str='There can be only one, the Highlander.'
regexp="${str//?/(&)}"
[[ "$str" =~ $regexp ]] &&
printf '%s\n' "${BASH_REMATCH[#]:1}"
Or just dump the whole array (which includes the whole string at index 0), instead of using printf or echo to print the value of BASH_REMATCH:
declare -p BASH_REMATCH
If that is not desired, one can remove the value of the first index (index 0) with
unset -v 'BASH_REMATCH[0]'
One can check/see the value of the variable "$regexp" with either
declare -p regexp
Output
declare -- regexp="(T)(h)(e)(r)(e)( )(c)(a)(n)( )(b)(e)( )(o)(n)(l)(y)( )(o)(n)(e)(,)( )(t)(h)(e)( )(H)(i)(g)(h)(l)(a)(n)(d)(e)(r)(.)"
or
echo "$regexp"
Using it in a script, one might want to test if the shopt is enabled or not, although the manual says it is on/enabled by default.
Something like:
if ! shopt -q patsub_replacement; then
shopt -s patsub_replacement
fi
But yeah, check the bash version too, if you're not sure which version of bash is in use:
if ! (( BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 2) )); then
printf 'No dice! bash version 5.2+ is required!\n' >&2
exit 1
fi
Space can be excluded from the regexp variable; change it from
regexp="${str//?/(&)}"
To
regexp="${str//[! ]/(&)}"
and the output is:
declare -- regexp="(T)(h)(e)(r)(e) (c)(a)(n) (b)(e) (o)(n)(l)(y) (o)(n)(e) (t)(h)(e) (H)(i)(g)(h)(l)(a)(n)(d)(e)(r)(.)"
Maybe not as efficient as the other post/answer but it is still a solution/option.
If you want to store this in an array, you can do this:
string=foo
unset chars
declare -a chars
while read -N 1
do
chars[${#chars[@]}]="$REPLY"
done <<<"$string"x
unset chars[$((${#chars[@]} - 1))]
unset chars[$((${#chars[@]} - 1))]
echo "Array: ${chars[@]}"
Array: f o o
echo "Array length: ${#chars[@]}"
Array length: 3
The final x is necessary to handle the fact that a newline is appended after $string if it doesn't contain one.
If you want to use NUL-separated characters, you can try this:
echo -n "$string" | while read -N 1
do
printf %s "$REPLY"
printf '\0'
done
AWK is quite convenient:
a='123'; echo $a | awk 'BEGIN{FS="";OFS=" "} {print $1,$2,$3}'
where FS and OFS are the delimiters for reading in and printing out
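For a string of arbitrary length, a loop over the fields avoids hard-coding $1,$2,$3; a sketch along the same lines (note that FS="" splitting one character per field is a common extension in gawk/mawk, not strictly POSIX):
a='hello123'
echo "$a" | awk 'BEGIN { FS=""; OFS=" " }
  { for (i = 1; i <= NF; i++) printf "%s%s", $i, (i < NF ? OFS : "\n") }'
# prints: h e l l o 1 2 3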
For those who landed here searching how to do this in fish:
We can use the builtin string command (since v2.3.0) for string manipulation.
↪ string split '' abc
a
b
c
The output is a list, so array operations will work.
↪ for c in (string split '' abc)
echo char is $c
end
char is a
char is b
char is c
Here's a more complex example iterating over the string with an index.
↪ set --local chars (string split '' abc)
for i in (seq (count $chars))
echo $i: $chars[$i]
end
1: a
2: b
3: c
zsh solution: To put the scalar string variable into arr, which will be an array:
arr=(${(ps::)string})
If you also need support for strings with newlines, you can do:
str2arr(){ local string="$1"; mapfile -d $'\0' Chars < <(for i in $(seq 0 $((${#string}-1))); do printf '%s\u0000' "${string:$i:1}"; done); printf '%s' "(${Chars[*]#Q})" ;}
string=$(printf '%b' "apa\nbepa")
declare -a MyString=$(str2arr "$string")
declare -p MyString
# prints declare -a MyString=([0]="a" [1]="p" [2]="a" [3]=$'\n' [4]="b" [5]="e" [6]="p" [7]="a")
As a response to Alexandro de Oliveira, I think the following is more elegant or at least more intuitive:
while read -r -n1 c ; do arr+=("$c") ; done <<<"hejsan"
declare -r some_string='abcdefghijklmnopqrstuvwxyz'
declare -a some_array
declare -i idx
for ((idx = 0; idx < ${#some_string}; ++idx)); do
some_array+=("${some_string:idx:1}")
done
for idx in "${!some_array[#]}"; do
echo "$((idx)): ${some_array[idx]}"
done
Pure bash, no loop.
Another solution, similar to/adapted from Léa Gris' solution, but using read -a instead of readarray/mapfile :
#!/usr/bin/env bash
str='azerty'
# Need extglob for the replacement pattern
shopt -s extglob
# Split string characters into array
# ${str//?()/$'\x1F'} replaces each character "c" with "^_c".
# ^_ (Control-_, 0x1f) is Unit Separator (US), you can choose another
# character.
IFS=$'\x1F' read -ra array <<< "${str//?()/$'\x1F'}"
# now, array[0] contains an empty string and the rest of array (starting
# from index 1) contains the original string characters :
declare -p array
# Or, if you prefer to keep the array "clean", you can delete
# the first element and pack the array :
unset array[0]
array=("${array[#]}")
declare -p array
However, I prefer the shorter version (and easier for me to understand), where we remove the initial 0x1f before assigning the array:
#!/usr/bin/env bash
str='azerty'
shopt -s extglob
tmp="${str//?()/$'\x1F'}" # same as code above
tmp=${tmp#$'\x1F'} # remove initial 0x1f
IFS=$'\x1F' read -ra array <<< "$tmp" # assign array
declare -p array # verification
I am trying to capture the longest match of a repeating pattern
do_run() {
local regex='.*((abc)+).*'
local str='_abcabcabc123_'
echo "regex=${regex}"$'\n'
echo "str=${str}"$'\n'
if [[ "${str}" =~ ${regex} ]]
then
for i in ${!BASH_REMATCH[@]}
do
echo "$i=${BASH_REMATCH[i]}"
done
else
echo "no match"
fi
}
I get the following output :
regex=.*((abc)+).*
str=_abcabcabc123_
0=_abcabcabc123_
1=abc
2=abc
I am trying to get something like :
regex=.*((abc)+).*
str=_abcabcabc123_
0=_abcabcabc123_
x=abcabcabc
(Update : x is just here to indicate that the index of the matching group does not matter but I need to know what number to use to retrieve the matching group ...)
Update:
After reading the comment, the following regex will work: ((abc)+)
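For reference, a minimal check of that simpler regex (a sketch; the whole match lands in index 0, and index 1 holds only the last repetition of the group):
str='_abcabcabc123_'
[[ $str =~ (abc)+ ]] && echo "${BASH_REMATCH[0]}"   # abcabcabc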
However, I also need to capture what precedes and what follows ((abc)+).
I had not mentioned it earlier because I thought the same solution would apply.
So the new code would be :
do_run() {
local regex='(.*)((abc)+)(.*)'
local str='_abcabcabc123_'
echo "regex=${regex}"$'\n'
echo "str=${str}"$'\n'
if [[ "${str}" =~ ${regex} ]]
then
for i in ${!BASH_REMATCH[@]}
do
echo "$i=${BASH_REMATCH[i]}"
done
else
echo "no match"
fi
}
I get then the following output :
regex=(.*)((abc)+)(.*)
str=_abcabcabc123_
0=_abcabcabc123_
1=_abcabc
2=abc
3=abc
4=123_
I want to be able to retrieve abcabcabc from a matching group but also what precedes it and what follows it
As a workaround you can do it like this:
[STEP 101] $ cat foo.sh
v=_abcabcabc123_
if [[ $v =~ (abc)+ ]]; then
middle=${BASH_REMATCH[0]}
[[ $v =~ (.*)"$middle" ]]
before=${BASH_REMATCH[1]}
[[ $v =~ "$middle"(.*) ]]
after=${BASH_REMATCH[1]}
echo "before: $before"
echo "middle: $middle"
echo "after : $after"
fi
[STEP 102] $ bash foo.sh
before: _
middle: abcabcabc
after : 123_
[STEP 103] $
I also need to capture what precedes and what follows ((abc)+).
For that, typically you'll need a negative lookbehind with perl regex, something along the lines of (?<!abc)((abc)+)(.*).
I am bad at perl regex, but with perl-enabled grep I was able to do this:
$ grep -oxP '(.*)(?<!abc)((abc)+)\K(.*)' <<<'_abcabcabc123_'
123_
$ grep -oP '((abc)+)' <<<'_abcabcabc123_'
abcabcabc
$ rev <<<'_abcabcabc123_' | grep -oP '(.*)(?<!cba)((cba)+)\K(.*)' | rev
_
Bash has no lookarounds and no perl regex. Consider using python or perl.
But you may use sed, splitting the input on the regex and then reading the lines, which may be simpler:
$ readarray -t lines < <(<<<'_abcabcabc123_' sed -E 's/((abc)+)/\n&\n/'); declare -p lines
declare -a lines=([0]="_" [1]="abcabcabc" [2]="123_")
Another idea: you may use bash expansion to replace the abc parts by something unique, then split it on that separator:
$ IFS=' ' read -r before post < <(printf "%s\n" "${str//abc/ }") ; declare -p before post
declare -- before="_"
declare -- post="123_"
# or
$ IFS='#' read -r before post < <(<<<"${str//abc/#}" tr -s '#') ; declare -p before post
declare -- before="_"
declare -- post="123_"
For your given input this regex would work:
re='^([^a]|a[^b]*|ab[^c]*)((abc)+)(.*)'
str='_abcabcabc123_'
[[ $str =~ $re ]] && declare -p BASH_REMATCH
Output:
declare -ar BASH_REMATCH=([0]="_abcabcabc123_" [1]="_" [2]="abcabcabc" [3]="abc" [4]="123_")
So you can use:
"${BASH_REMATCH[1]}" # string before
"${BASH_REMATCH[2]}" # string containing all "abc"s
"${BASH_REMATCH[4]}" # string after
Is there an inline command which can be used to get the first extension of a file?
I use the following command to get the last one:
FILE="filename.tar.bz2"
EXT="${FILE##*.}"
echo "EXT = ${EXT}"
which returns
EXT = bz2
Is there a similar command to isolate "tar" only?
var="config/filename.tar.bz2"
ext=$(basename "$var") # extract filename only
ext=${ext#*.} # remove everything up to and including the first dot
ext=${ext%%.*} # remove everything from the first remaining dot onward
echo "$ext"
Note: uppercase variables by convention are used for exported variables like COLUMNS, LINES, UID, PWD, TERM, etc. Prefer using lowercase variables in your scripts.
Expanding on the good answer from @KamilCuk, using only POSIX shell grammar and no external command or sub-shell call:
#!/usr/bin/env sh
filepath="/path/to/config/filename.tar.bz2.bak"
echo 'filepath:' "$filepath"
# remove everything upto and including last /
filename="${filepath##*/}"
echo 'filename:' "$filename"
# remove everything until and including first dot
all_extensions="${filename#*.}"
echo 'all_extensions:' "$all_extensions"
# remove everything from and including first dot
first_extension="${all_extensions%%.*}"
echo 'First extension:' "$first_extension"
last_extension="${all_extensions##*.}"
echo 'Last extension:' "$last_extension"
# Fill argument array with extensions
IFS='.'; set -- $all_extensions
if [ $# -gt 0 ]
then
ext_num=1
printf '\nIterating all %d extensions:\n' $#
printf '%s\t%s\n' 'ext#' 'extension'
for extension
do
printf '%4d\t%s\n' "$ext_num" "$extension"
ext_num="$((ext_num+1))"
done
fi
Output:
filepath: /path/to/config/filename.tar.bz2.bak
filename: filename.tar.bz2.bak
all_extensions: tar.bz2.bak
First extension: tar
Last extension: bak
Iterating all 3 extensions:
ext# extension
1 tar
2 bz2
3 bak
$ filename="filename.tar.bz2"
$ echo ${filename##*.}
ext
$ file_ext=${filename##*.} #put to variable
$ echo ${file_ext}
ext
echo "filename.tar.gz" | awk -F "." '{print $2}'
This prints the second period-delimited field, i.e. the text between the first and second periods. In this case, 'tar'.
I have the file below, which contains some data:
name:Mark
age:23
salary:100
I want to read only the name and age and assign them to variables in a shell script.
How can I achieve this?
I am able to read all the file data using the script below, but not a particular field:
#!/bin/bash
file="/home/to/person.txt"
val=$(cat "$file")
echo $val
Please suggest.
Rather than running multiple greps or bash loops, you could just run a single read that reads the output of a single invocation of awk:
read age salary name <<< $(awk -F: '/^age/{a=$2} /^salary/{s=$2} /^name/{n=$2} END{print a,s,n}' file)
Results
echo $age
23
echo $salary
100
echo $name
Mark
If the awk script sees an age, it sets a to the age. If it sees a salary , it sets s to the salary. If it sees a name, it sets n to the name. At the end of the input file, it outputs what it has seen for the read command to read.
Using grep: \K is part of Perl regex. It acts as an assertion and checks whether the text to its left is present. If present, it prints the match as per the regex, ignoring the text to the left of \K.
name=$(grep -oP 'name:\K.*' person.txt)
age=$(grep -oP 'age:\K.*' person.txt)
salary=$(grep -oP 'salary:\K.*' person.txt)
Or using an awk one-liner; this may break if a line contains an extra ':'.
declare $(awk '{sub(/:/,"=")}1' person.txt )
This will result in the following:
sh-4.1$ echo $name
Mark
sh-4.1$ echo $age
23
sh-4.1$ echo $salary
100
You could try this
if your data is in a file: data.txt
name:vijay
age:23
salary:100
then you could use a script like this
#!/bin/bash
# read will read a line until it hits a record separator i.e. newline, at which
# point it will return true, and store the line in variable $REPLY
while read
do
if [[ $REPLY =~ ^name:.* || $REPLY =~ ^age:.* ]]
then
eval ${REPLY%:*}=${REPLY#*:} # strip suffix and prefix
fi
done < data.txt # read data.txt from STDIN into the while loop
echo $name
echo $age
output
vijay
23
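If you would rather avoid eval, a sketch of the same assignment using declare instead (my variation, not part of the original script):
while read -r
do
    if [[ $REPLY =~ ^name:.* || $REPLY =~ ^age:.* ]]
    then
        declare "${REPLY%:*}=${REPLY#*:}"   # no eval needed; safer if the value holds shell metacharacters
    fi
done < data.txt
echo $name
echo $age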
Well, if you can store the data in JSON or another similar format, it will be very easy to access complex data.
data.json
{
"name":"vijay",
"salary":"100",
"age": 23
}
then you can use jq to parse the JSON and get the data easily:
jq -r '.name' data.json
vijay
I'm trying to parse a .csv file and I have some problems with IFS.
The file contains lines like this:
"Hello","World","this","is, a boring","line"
The columns are separated with a comma, so I tried to explode the line with this code:
IFS=, read -r -a tempArr <<< "$line"
But I get this output:
"Hello"
"World"
"this"
"is
a boring"
"line"
I understand why, so I tried some other commands but I don't get my expected output.
IFS=\",\"
IFS=\",
IFS=',\"'
IFS=,\"
Every time, the third element is separated into 2 parts.
How can I use IFS to separate the string into 5 parts like this?
"Hello"
"World"
"this"
"is, a boring"
"line"
give this a try:
sed 's/","/"\n"/g' <<<"${line}"
sed has a search-and-replace command, s, which uses a regex to find the pattern.
The regex replaces the , in "," with a newline character.
As a consequence each element is on a separate line.
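If you need the pieces in an actual array rather than printed lines, one way (a sketch, assuming GNU sed for \n in the replacement and bash 4+ for readarray) is to capture the output:
line='"Hello","World","this","is, a boring","line"'
readarray -t tempArr < <(sed 's/","/"\n"/g' <<<"$line")
printf '%s\n' "${tempArr[@]}"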
You may wish to use gawk with FPAT to define what constitutes a valid field:
Input :
"hello","world","this,is"
Script :
gawk -n 'BEGIN{FS=",";OFS="\n";FPAT="([^,]+)|(\"[^\"]+\")"}{$1=$1;print $0}' somefile.csv
Output :
"hello"
"world"
"this,is"
bashlib provides a csvline function. Assuming you've installed it somewhere in your PATH:
line='"Hello","World","this","is, a boring","line"'
source bashlib
csvline <<<"$line"
printf '%s\n' "${CSVLINE[#]}"
...output from the above being:
Hello
World
this
is, a boring
line
To quote the implementation (which is copyright lhunath, the below text being taken from this specific revision of the relevant git repo):
# _______________________________________________________________________
# |__ csvline ____________________________________________________________|
#
# csvline [-d delimiter] [-D line-delimiter]
#
# Parse a CSV record from standard input, storing the fields in the CSVLINE array.
#
# By default, a single line of input is read and parsed into comma-delimited fields.
# Fields can optionally contain double-quoted data, including field delimiters.
#
# A different field delimiter can be specified using -d. You can use -D
# to change the definition of a "record" (eg. to support NULL-delimited records).
#
csvline() {
CSVLINE=()
local line field quoted=0 delimiter=, lineDelimiter=$'\n' c
local OPTIND=1 arg
while getopts :d: arg; do
case $arg in
d) delimiter=$OPTARG ;;
esac
done
IFS= read -d "$lineDelimiter" -r line || return
while IFS= read -rn1 c; do
case $c in
\")
(( quoted = !quoted ))
continue ;;
$delimiter)
if (( ! quoted )); then
CSVLINE+=( "$field" ) field=
continue
fi ;;
esac
field+=$c
done <<< "$line"
[[ $field ]] && CSVLINE+=( "$field" ) ||:
} # _____________________________________________________________________