Storing a variable string with special characters into an array in bash

I need to store a string that may include special characters (to be exact: *) in an array, one word per element. The string is returned by a function, so at the point of the array declaration I do not know its contents:
foo(){
in="my * string"
echo "$in"
}
arr=($(foo))
What I've already tried was:
arr=("$(foo)")
where * doesn't get expanded but the array consists of 1 string, and:
arr=($(foo | sed -r "s/[\*]/'*'/g"))
that replaces each occurrence of * with the string '*', which is not what I want to achieve. What I am aiming for is simply to store each * from the returned string as a literal *.

Storing an array this way does not expand the "*"
ins="my * string"
read -r -a array <<< "$ins"
echo "${array[*]}"

Short answer:
read -a arr <<< "$(foo)"
To elaborate -
Your function is correctly returning the single string "my * string".
Your assignment to an array executes the function in unquoted context, so the asterisk is evaluated and parsed to the names of everything in the directory.
Putting quotes around the command substitution, as in arr=("$(foo)"), suppresses both the glob expansion and the word splitting, so the array ends up with the single element "my * string" - also not what you want. You need something that keeps the asterisk from expanding into directory contents but still splits the string into separate array items, yes?
read -a arr <<< "$(foo)"
This passes back the string properly quoted, and then reads it into the array after splitting with $IFS, so each item becomes an unexpanded string in the array.
$: echo "${#arr[@]}"
3
$: printf "%s\n" "${arr[@]}"
my
*
string
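The difference between the two assignments can be checked side by side. This is a minimal sketch, assuming a scratch directory created with mktemp that holds two made-up files (file_a and file_b) so that the glob has something to match; the array name globbed is hypothetical.

```shell
#!/usr/bin/env bash
# Work in an empty scratch directory with two known files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch file_a file_b

foo() {
  local in="my * string"
  echo "$in"
}

# Unquoted command substitution: word splitting, then pathname expansion of '*'.
globbed=($(foo))

# read splits on IFS but never performs pathname expansion.
read -r -a arr <<< "$(foo)"

printf 'unquoted: %s elements\n' "${#globbed[@]}"   # my file_a file_b string
printf 'read -a:  %s elements\n' "${#arr[@]}"       # my * string
printf '%s\n' "${arr[@]}"

cd / && rm -rf "$tmpdir"
```

In the unquoted case the asterisk is replaced by the two file names, so the array grows to four elements, while the read version keeps exactly three.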

Related

How to concatenate string to comma-separated element in bash

I am new to Bash coding. I would like to concatenate a string to each element of a comma-separated strings "array".
This is an example of what I have in mind:
s=a,b,c
# Here a function to concatenate the string "_string" to each of them.
# Expected result:
a_string,b_string,c_string
One way:
$ s=a,b,c
$ echo "${s//,/_string,}_string"
a_string,b_string,c_string
Using a proper array is generally a much more robust solution. It allows the values to contain literal commas, whitespace, etc.
s=(a b c)
printf '%s\n' "${s[@]/%/_string}"
As suggested by chepner, you can use IFS="," to merge the result with commas.
(IFS=","; echo "${s[*]/%/_string}")
(The subshell is useful to keep the scope of the IFS reassignment from leaking to the current shell.)
Simply, you could use a for loop
main() {
local input='a,b,c'
local append='_string'
# create an 'output' variable that is empty
local output=
# convert the input into an array called 'items' (without the commas)
IFS=',' read -ra items <<< "$input"
# loop over each item in the array, and append whatever string we want, in this case, '_string'
for item in "${items[@]}"; do
output+="${item}${append},"
done
# in the loop, a comma was appended after every element. now we must remove the trailing one so there are only commas _in between_ elements
output=${output%,}
echo "$output"
}
main
I've split it up in three steps:
Make it into an actual array.
Append _string to each element in the array using Parameter expansion.
Turn it back into a scalar (for which I've made a function called turn_array_into_scalar).
#!/bin/bash
function turn_array_into_scalar() {
local -n arr=$1 # -n makes `arr` a reference to the array `s`
local IFS=$2 # set the field separator to ,
arr="${arr[*]}" # "join" over IFS and assign it back to `arr`
}
s=a,b,c
# make it into an array by turning , into newline and reading into `s`
readarray -t s < <(tr , '\n' <<< "$s")
# append _string to each string in the array by using parameter expansion
s=( "${s[@]/%/_string}" )
# use the function to make it into a scalar again and join over ,
turn_array_into_scalar s ,
echo "$s"

Basic string manipulation from filenames in bash

I have some file names in bash that I have acquired with
$ ones=$(find SRR*pass*1*.fq)
$ echo $ones
SRR6301033_pass_1_trimmed.fq
SRR6301034_pass_1_trimmed.fq
SRR6301037_pass_1_trimmed.fq
...
I then converted into an array so I can iterate over this list and perform some operations with filenames:
# convert to array
$ ones=(${ones// / })
and the iteration:
for i in $ones;
do
fle=$(basename $i)
out=$(echo $fle | grep -Po '(SRR\d*)')
echo "quants/$out.quant"
done
which produces:
quants/SRR6301033
SRR6301034
...
...
SRR6301220
SRR6301221.quant
However I want this:
quants/SRR6301033.quant
quants/SRR6301034.quant
...
...
quants/SRR6301220.quant
quants/SRR6301221.quant
Could somebody explain why what I'm doing doesn't work and how to correct it?
Why do it in such a complicated way? You can get rid of all the unnecessary roundabouts and just use a for loop and built-in parameter expansion to get this done.
# Initialize an empty indexed array
array=()
# Loop over files ending with '.fq'. If there are no such files, the
# pattern *.fq stays unexpanded, the '-f' check fails, and 'continue'
# skips that iteration so the loop body never runs on a bogus name
for file in *.fq; do
[ -f "$file" ] || continue
# Get the part of filename after the last '/' ( same as basename )
bName="${file##*/}"
# Remove the part after '.' (removing extension)
woExt="${bName%%.*}"
# In the resulting string, remove the part after first '_'
onlyFir="${woExt%%_*}"
# Append the result to the array, prefixing/suffixing strings 'quant'
array+=( quants/"$onlyFir".quant )
done
Now print the array to see the result
for entry in "${array[@]}"; do
printf '%s\n' "$entry"
done
Ways your attempt could fail
With ones=$(find SRR*pass*1*.fq) you are storing the results in a variable and not in an array. A scalar variable has no way to distinguish whether its contents are a list or a single string that happens to contain spaces.
With echo $ones, i.e. an unquoted expansion, the string content is subject to word splitting. You might not see a difference as long as your filenames contain no spaces; a filename with a space would be interpreted as several different files.
The part ${ones// / } makes no sense as a way of converting the string to an array: it replaces each space with a space, and the result is again an unquoted expansion subject to the same word splitting.
for i in $ones; is error-prone for the reasons above: filenames with spaces would be interpreted as separate files instead of one.
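To make the points above concrete, the glob can be assigned straight to an array, which yields one element per file regardless of spaces. This is only a sketch: the scratch directory and the two sample file names are made up for illustration, and the array name outputs is hypothetical.

```shell
#!/usr/bin/env bash
# Scratch directory with two sample files (names invented for this example).
tmpdir=$(mktemp -d)
touch "$tmpdir/SRR0001_pass_1_trimmed.fq" "$tmpdir/SRR0002_pass_1_trimmed.fq"

shopt -s nullglob                    # an unmatched glob expands to nothing
ones=("$tmpdir"/SRR*pass*1*.fq)      # one array element per matching file

outputs=()
for path in "${ones[@]}"; do         # quoted [@]: elements are never re-split
  name=${path##*/}                   # strip the directory, like basename
  outputs+=("quants/${name%%_*}.quant")
done

printf '%s\n' "${outputs[@]}"
rm -rf "$tmpdir"
```

With nullglob set, an empty match simply produces an empty array, so no -f check is needed inside the loop.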

return array from perl to bash

I'm trying to get back an array from perl to bash.
My perl script has an array and then I use return(@arr)
from my bash script I use
VAR = `perl....
when I echo VAR
I get the array as 1 long string with all the array vars connected with no spaces.
Thanks
In the shell (and in Perl), backticks (``) capture the output of a command. However, Perl's return is normally for returning variables from subroutines - it does not produce output, so you probably want print instead. Also, in bash, array variables are declared with parentheses. So this works for me:
$ ARRAY=(`perl -wMstrict -le 'my @array = qw/foo bar baz/; print "@array"'`); \
echo "<${ARRAY[*]}> 0=${ARRAY[0]} 1=${ARRAY[1]} 2=${ARRAY[2]}"
<foo bar baz> 0=foo 1=bar 2=baz
In Perl, interpolating an array into a string (like "@array") will join the array with the special variable $" in between elements; that variable defaults to a single space. If you simply print @array, then the array elements will be joined by the variable $,, which is undef by default, meaning no space between the elements. This probably explains the behavior you mentioned ("the array vars connected with no spaces").
Note that the above will not work the way you expect if the elements of the array contain whitespace, because bash will split them into separate array elements. If your array does contain whitespace, then please provide an MCVE with sample data so we can perhaps make an alternative suggestion of how to return that back to bash. For example:
( # subshell so IFS is only affected locally
IFS=$'\n'
ARRAY=(`perl -wMstrict -e 'my @array = ("foo","bar","quz baz"); print join "\n", @array'`)
echo "0=<${ARRAY[0]}> 1=<${ARRAY[1]}> 2=<${ARRAY[2]}>"
)
Outputs: 0=<foo> 1=<bar> 2=<quz baz>
Here is one way using Bash word splitting, it will split the string on white space into the new array array:
array_str=$(perl -E '@a = 1..5; say "@a"')
array=( $array_str )
for item in "${array[@]}" ; do
echo ": $item"
done
Output:
: 1
: 2
: 3
: 4
: 5
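If the producer prints one element per line, mapfile (also spelled readarray, available in bash 4 and later) can read the output without touching IFS and without any pathname expansion. In this sketch the function produce is a hypothetical stand-in for the Perl script, and its sample values are made up.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the Perl script: prints one element per line.
produce() {
  printf '%s\n' "foo" "bar" "quz baz" "*"
}

# mapfile stores each input line as one array element; -t strips the
# trailing newline. No word splitting or globbing is applied to the lines.
mapfile -t items < <(produce)

echo "count=${#items[@]}"
for item in "${items[@]}"; do
  echo ": $item"
done
```

Elements containing spaces or a bare asterisk survive intact this way; elements that themselves contain newlines would still need a different delimiter.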

what does the ! mean in this expression: ${!mylist[@]}

I'm trying to understand a shell script written by a previous group member. there is this for loop. I can understand it's looping through a list ${!mylist[@]} but I've only seen ${mylist[@]} before, not ${!mylist[@]}.
What does the exclamation mark do here?
for i in ${!mylist[@]};
do
echo ${mylist[i]}
....
done
${!mylist[@]} returns the keys (or indices) of an array. This differs from ${mylist[@]}, which returns the values in the array.
As an example, let's consider this array:
$ arr=(abc def ghi)
In order to get its keys (or indices in this case):
$ echo "${!arr[@]}"
0 1 2
In order to get its values:
$ echo "${arr[@]}"
abc def ghi
From man bash:
It is possible to obtain the keys (indices) of an array as well
as the values. ${!name[@]} and ${!name[*]} expand to the indices
assigned in array variable name. The treatment when in double quotes
is similar to the expansion of the special parameters @ and * within
double quotes.
Example using associative arrays
To show that the same applies to associative arrays:
$ declare -A Arr=([a]=one [b]=two)
$ echo "${!Arr[@]}"
a b
$ echo "${Arr[@]}"
one two
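Looping over the indices matters most when an indexed array has gaps, because counting from 0 up to the length would then visit the wrong slots. A small sketch with a made-up array named sparse:

```shell
#!/usr/bin/env bash
# Indexed arrays may be sparse: only indices 2, 5 and 9 are assigned here.
sparse=()
sparse[2]=two
sparse[5]=five
sparse[9]=nine

echo "count: ${#sparse[@]}"      # number of assigned elements
echo "indices: ${!sparse[*]}"    # the assigned indices, joined by a space

# Iterate over the indices that actually exist.
for i in "${!sparse[@]}"; do
  echo "$i -> ${sparse[i]}"
done
```

The count is 3 while the highest index is 9, which is why the loop in the question iterates over ${!mylist[@]} rather than over a numeric range.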

Why does IFS not affect the length of an array in bash?

I have two specific questions about the IFS. I'm aware that changing the internal field separator, IFS, changes what the bash script iterates over.
So, why is it that the length of the array doesn't change?
Here's my example:
delimiter=$1
strings_to_find=$2
OIFS=$IFS
IFS=$delimiter
echo "the internal field separator is $IFS"
echo "length of strings_to_find is ${#strings_to_find[@]}"
for string in ${strings_to_find[@]}
do
echo "string is separated correctly and is $string"
done
IFS=$OIFS
But why does the length not get affected by the new IFS?
The second thing that I don't understand is how to make the IFS affect the input arguments.
Let's say I'm expecting my input arguments to look like this:
./executable_shell_script.sh first_arg:second_arg:third_arg
And I want to parse the input arguments by setting the IFS to :. How do I do this? Setting the IFS doesn't seem to do anything. I must be doing this wrong....?
Thank you.
Bash arrays are, in fact, arrays. They are not strings which are parsed on demand. Once you create an array, the elements are whatever they are, and they won't change retroactively.
However, nothing in your example creates an array. If you wanted to create an array out of argument 2, you would need to use a different syntax:
strings_to_find=($2)
Although your strings_to_find is not an array, bash allows you to refer to it as though it were an array of one element. So ${#strings_to_find[@]} will always be one, regardless of the contents of strings_to_find. Also, your line:
for string in ${strings_to_find[@]}
is really no different from
for string in $strings_to_find
Since that expansion is not quoted, it will be word-split, using the current value of IFS.
If you use an array, most of the time you will not want to write for string in ${strings_to_find[@]}, because that just reassembles the elements of an array into a string and then word-splits them again, which loses the original array structure. Normally you will avoid the word-splitting by using double quotes:
strings_to_find=(...)
for string in "${strings_to_find[@]}"
As for your second question, the value of IFS does not alter the shell grammar. Regardless of the value of IFS, words in a command are separated by unquoted whitespace. After the line is parsed, the shell performs parameter and other expansions on each word. As mentioned above, if the expansion is not quoted, the expanded text is then word-split using the value of IFS.
If the word does not contain any expansions, no word-splitting is performed. And even if the word does contain expansions, word-splitting is only performed on the expansion itself. So, if you write:
IFS=:
my_function a:b:c
my_function will be called with a single argument; no expansion takes place, so no word-splitting occurs. However, if you use $1 unquoted inside the function, the expansion of $1 will be word-split (if it is expanded in a context in which word-splitting occurs).
On the other hand,
IFS=:
args=a:b:c
my_function $args
will cause my_function to be invoked with three arguments.
And finally,
IFS=:
args=c
my_function a:b:$args
is exactly the same as the first invocation, because there is no : in the expansion.
This is an example script based on @rici's answer:
#!/bin/bash
fun()
{
echo "Total Params : " $#
}
fun2()
{
array1=($1) # Word splitting occurs here based on the IFS ':'
echo "Total elements in array1 : "${#array1[@]}
# Here '#' before array counts the length of the array
array2=("$1") # No word splitting because we have enclosed $1 in double quotes
echo "Total elements in array2 : "${#array2[@]}
}
IFS_OLD="$IFS"
IFS=$':' #Changing the IFS
fun a:b:c #Nothing to expand here, so no use of IFS at all. See fun2 at last
fun a b c
fun abc
args="a:b:c"
fun $args # Expansion! Word splitting occurs with the current IFS ':' here
fun "$args" # preventing word splitting by enclosing the string in double quotes
fun2 a:b:c
IFS="$IFS_OLD"
Output
Total Params : 1
Total Params : 3
Total Params : 1
Total Params : 3
Total Params : 1
Total elements in array1 : 3
Total elements in array2 : 1
The Bash manpage says:
The shell treats each character of IFS as a delimiter, and splits the
results of the other expansions into words on these characters.
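For the second question, one way to split a single colon-separated argument without changing IFS for the rest of the script is to prefix the assignment to read. This is a sketch, not the only approach; the variable arg stands in for "$1", and the array name parts is made up.

```shell
#!/usr/bin/env bash
# Stand-in for "$1" when the script is called as
# ./executable_shell_script.sh first_arg:second_arg:third_arg
arg="first_arg:second_arg:third_arg"

# The IFS assignment applies only to this one read command,
# so there is nothing to save and restore afterwards.
IFS=: read -r -a parts <<< "$arg"

echo "length is ${#parts[@]}"
for part in "${parts[@]}"; do
  echo "part: $part"
done
```

Because parts is a real array, its length reflects the number of fields, and the quoted "${parts[@]}" expansion keeps each field intact whatever the current IFS is.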

Resources