The semantics of arrays in bash - bash

Check out the following transcript. With all possible rigor and formality, what is going on at each step?
$> ls -1 #This command prints 3 items. no explanation required.
a
b
c
$> X=$(ls -1) #Capture the output (as what? a string?)
$> Y=($(ls -1)) #Capture it again (as an array now?)
$> echo ${#X[@]} #Why is the length 1?
1
$> echo ${#Y[@]} #This works because Y is an array of the 3 items?
3
$> echo $X #Why are the linefeeds now spaces?
a b c
$> echo $Y #Why does the array echo as its first element
a
$> for x in $X;do echo $x; done #iterate over $X
a
b
c
$> for y in $Y;do echo $y; done #iterating over y doesn't work
a
$> echo ${X[2]} #I can loop over $X but not index into it?
$> echo ${Y[2]} #Why does this work if I can't loop over $Y?
c
I assume bash has well established semantics about how arrays and text variables (if that's even what they're called) work, but the user manual is not organized in an optimal fashion for someone who wants to reason about scripts based on whatever small set of underlying principles the language designer intended.

Let me preface the following with the very strong suggestion that you never use ls to populate an array. The correct code would be
Z=( * )
to create an array with each (non-hidden) file in the current directory as a distinct array element.
$> ls -1 #This command prints 3 items. no explanation required.
a
b
c
Correct. Each file name is printed on a separate line (although, beware of file names containing newlines; the parts before and after each newline would appear as separate file names.)
$> X=$(ls -1) #Capture the output (as what? a string?)
Yes. The command substitution captures the output of ls as a single string; the newlines between the lines are preserved, and only trailing newlines are stripped. (The command substitution would be subject to word-splitting if it weren't the right-hand side of an assignment; word-splitting will come up below.)
$> Y=($(ls -1)) #Capture it again (as an array now?)
Same as with X, but now each of the words in the result of the command substitution is treated as a separate array element. As long as none of the output lines contain any characters in the value of IFS, each file name is one word and will be treated as a separate array element.
$> echo ${#X[@]} #Why is the length 1?
1
X, not being a real array, is treated as an array with a single element, namely the value of $X.
$> echo ${#Y[@]} #This works because Y is an array of the 3 items?
3
Correct.
$> echo $X #Why are the linefeeds now spaces?
a b c
When $X is unquoted, the resulting expansion is subject to word-splitting. In this case, the newlines are simply treated the same as any other whitespace, separating the result into a sequence of words that are passed to echo as distinct arguments, which are then displayed separated by a single space each.
$> echo $Y #Why does the array echo as its first element
a
For a true array, $Y is equivalent to ${Y[0]}.
$> for x in $X;do echo $x; done #iterate over $X
a
b
c
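This works because the unquoted $X is word-split, but it has caveats: a file name containing whitespace is split into several words, and any word containing glob characters is subject to pathname expansion.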
$> for y in $Y;do echo $y; done #iterating over y doesn't work
a
See above; $Y only expands to the first element. You want for y in "${Y[@]}"; do to iterate over all the elements.
$> echo ${X[2]} #I can loop over $X but not index into it?
Correct. X is not an array, so ${X[2]} is empty. The loop worked only because the unquoted $X was word-split into a list of words which the for loop could iterate over.
$> echo ${Y[2]} #Why does this work if I can't loop over $Y?
c
Indexing and iteration are two completely different things in shell. You don't actually iterate over an array; you iterate over the resulting sequence of words of a properly expanded array.
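The points above can be checked without touching the file system. This is a minimal sketch, assuming bash with the default IFS; the literal string stands in for the output of ls -1, and the helper variable names (x_count, joined, and so on) are made up for the illustration:

```shell
# Stand-in for X=$(ls -1): a single string with embedded newlines.
X=$'a\nb\nc'
# Unquoted expansion inside ( ... ) is word-split into separate elements.
Y=( $X )

x_count=${#X[@]}      # a scalar behaves like a one-element array
y_count=${#Y[@]}      # three elements
first=$Y              # $Y is the same as ${Y[0]}
joined=$(echo $X)     # unquoted: echo receives three arguments
quoted=$(echo "$X")   # quoted: echo receives one argument, newlines intact

# Iterate over the properly expanded array, one word per element.
collected=()
for y in "${Y[@]}"; do
  collected+=("$y")
done

echo "X has $x_count element(s), Y has $y_count"
echo "first=$first joined=$joined"
```

Quoting "${Y[@]}" matters as soon as an element contains whitespace; with the three single-letter names here the unquoted form would happen to give the same result.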

Related

Parse filename string and extract parent at specific level using shell

I have a filename as a string, say filename="a/b/c/d.png".
Is there a general method to extract the parent directory at a given level using ONLY shell parameter expansion?
I.e. I would like to extract "level 1" and return c or "level 2" and return b.
Explicitly, I DO NOT want to get the entire parent path (i.e. a/b/c/, which is the result of ${filename%/*}).
Using just shell parameter expansion, assuming bash, you can first transform the path into an array (splitting on /) and then ask for specific array indexes:
filename=a/b/c/d.png
IFS=/
filename_array=( $filename )
unset IFS
echo "0 = ${filename_array[0]}"
echo "1 = ${filename_array[1]}"
echo "2 = ${filename_array[2]}"
echo "3 = ${filename_array[3]}"
Running the above produces:
0 = a
1 = b
2 = c
3 = d.png
These indexes are the reverse of what you want, but a little
arithmetic should fix that.
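One way to do that arithmetic is sketched below, assuming bash. The function name parent_at_level is hypothetical, and it uses read -a rather than a global IFS change, so it is not strictly "parameter expansion only":

```shell
# Hypothetical helper: print the parent directory "level" steps above the file.
parent_at_level() {
  local path=$1 level=$2 parts idx
  # Split the path on "/" into an array; IFS is changed only for this read.
  IFS=/ read -r -a parts <<< "$path"
  # The last element is the file itself, so count back from there.
  idx=$(( ${#parts[@]} - 1 - level ))
  # Refuse levels that go past the first component.
  (( idx >= 0 )) || return 1
  printf '%s\n' "${parts[idx]}"
}

parent_at_level a/b/c/d.png 1
parent_at_level a/b/c/d.png 2
```

With four components, level 1 maps to index 2 and level 2 to index 1, which is the reversal mentioned above.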
Using zsh, the :h modifier trims the final component off a path in variable expansion.
The (s:...:) parameter expansion flag can be used to split the contents of a variable. Combine those with normal array indexing where a negative index goes from the end of the array, and...
$ filename=a/b/c/d.png
$ print $filename:h
a/b/c
$ level=1
$ print ${${(s:/:)filename:h}[-level]}
c
$ level=2
$ print ${${(s:/:)filename:h}[-level]}
b
You could also use array subscript flags instead to avoid the nested expansion:
$ level=1
$ print ${filename[(ws:/:)-level-1]}
c
$ level=2
$ print ${filename[(ws:/:)-level-1]}
b
The w flag makes the subscript index a scalar by words instead of by characters, and s:...: again specifies the separator to split on. One more has to be subtracted from the index to skip over the trailing d.png, since it hasn't been stripped off as in the first approach.
The :h (head) and :t (tail) expansion modifiers in zsh accept digits to specify a level; they can be combined to get a subset of the path:
> filname="a/b/c/d.png"
> print ${filname:t2}
c/d.png
> print ${filname:t2:h1}
c
> print ${filname:t3:h1}
b
If the level is in a variable, then the F modifier can be used to repeat the h modifier a specific number of times:
> for i in 1 2 3; printf '%s: %s\n' $i ${filname:F(i)h:t}
1: c
2: b
3: a
If using printf (a shell builtin) is allowed then this will do the trick in bash:
filename='a/b/c/d.png'
level=2
printf -v spaces '%*s' $level
pattern=${spaces//?/'/*'}
component=${filename%$pattern}
component=${component##*/}
echo $component
prints out
b
You can assign different values to the variable level.
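If a loop is acceptable, the same result can be had with nothing but the % and ## expansions. This is a sketch assuming bash (for the arithmetic for loop); the function name parent_by_stripping is made up for the example:

```shell
# Hypothetical helper: strip one trailing "/component" per level,
# then keep only the last remaining component.
parent_by_stripping() {
  local path=$1 level=$2 i
  for (( i = 0; i < level; i++ )); do
    path=${path%/*}          # drop the shortest suffix matching /*
  done
  printf '%s\n' "${path##*/}"  # drop everything up to the last /
}

parent_by_stripping a/b/c/d.png 1
parent_by_stripping a/b/c/d.png 2
```

Note that a level larger than the number of parent directories is not detected here; the function simply prints the first component.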

How to get output values in bash array by calling other program from bash?

I am stuck in a peculiar situation: my Python code prints two strings, one per line, and the bash script that calls it reads them.
I am expecting the array size to be 2, but bash also treats spaces as an element separator and gives me a size of 3.
Example scripts
multi_line_return.py file has following content
print("foo bar")
print(5)
multi_line_call.sh has following content
#!/bin/bash
PYTHON_EXE="ABSOLUTE_PATH TO PYTHON EXECUTABLE IN LINUX"
CURR_DIR=$(cd $(dirname ${BASH_SOURCE[0]}) && pwd)/
array=()
while read line ; do
array+=($line)
done < <(${PYTHON_EXE} ${CURR_DIR}multi_line_return.py)
echo "array length --> ${#array[@]}"
echo "each variable in new line"
for i in "${array[@]}"
do
printf $i
printf "\n"
done
Now keep both of the above file in same directory and make following call to see result.
bash multi_line_call.sh
As you can see in result,
I am getting
array length = 3
1.foo, 2.bar & 3. 5
The expectation is
One complete line of python output (stdout) as one element of bash array
array length = 2
1. foo bar & 2. 5
Put quotes around $line to prevent it from being split:
array+=("$line")
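You can also do it without a loop using readarray (the -t option strips the trailing newline from each line; without it every element would end in a newline):
readarray -t array < <(${PYTHON_EXE} ${CURR_DIR}multi_line_return.py)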
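Both fixes can be tried without Python. In this sketch the function produce is a hypothetical stand-in that prints the same two lines as multi_line_return.py, and readarray assumes bash 4 or later:

```shell
# Hypothetical stand-in for the Python script: prints "foo bar" and "5".
produce() {
  printf '%s\n' "foo bar" 5
}

# Fix 1: quote "$line" so each input line becomes exactly one element.
# IFS= and -r keep leading whitespace and backslashes intact.
loop_array=()
while IFS= read -r line; do
  loop_array+=("$line")
done < <(produce)

# Fix 2: let readarray do the loop; -t drops each trailing newline.
readarray -t ra_array < <(produce)

echo "loop: ${#loop_array[@]} elements, readarray: ${#ra_array[@]} elements"
printf '[%s]\n' "${ra_array[@]}"
```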

what does the ! mean in this expression: ${!mylist[@]}

I'm trying to understand a shell script written by a previous group member. There is this for loop. I can understand it's looping through a list ${!mylist[@]}, but I've only seen ${mylist[@]} before, not ${!mylist[@]}.
What does the exclamation mark do here?
for i in ${!mylist[@]};
do
echo ${mylist[i]}
....
done
${!mylist[@]} returns the keys (or indices) of an array. This differs from ${mylist[@]}, which returns the values in the array.
As an example, let's consider this array:
$ arr=(abc def ghi)
In order to get its keys (or indices in this case):
$ echo "${!arr[@]}"
0 1 2
In order to get its values:
$ echo "${arr[@]}"
abc def ghi
From man bash:
It is possible to obtain the keys (indices) of an array as well
as the values. ${!name[@]} and ${!name[*]} expand to the indices
assigned in array variable name. The treatment when in double quotes
is similar to the expansion of the special parameters @ and * within
double quotes.
Example using associative arrays
To show that the same applies to associative arrays:
$ declare -A Arr=([a]=one [b]=two)
$ echo "${!Arr[@]}"
a b
$ echo "${Arr[@]}"
one two
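The distinction matters most when the indices are not simply 0, 1, 2. A small sketch, assuming bash; the array name sparse and its contents are invented for the illustration:

```shell
# An indexed array with gaps: only indices 2, 5 and 9 are assigned.
sparse=([2]=two [5]=five [9]=nine)

indices="${!sparse[*]}"   # the assigned indices, joined with spaces
values="${sparse[*]}"     # the values, joined with spaces

# Looping over the indices gives access to both key and value.
pairs=()
for i in "${!sparse[@]}"; do
  pairs+=("$i=${sparse[i]}")
done

echo "indices: $indices"
echo "values:  $values"
printf '%s\n' "${pairs[@]}"
```

A C-style loop from 0 to ${#sparse[@]} would visit indices 0, 1 and 2 here and miss two of the three elements, which is why scripts loop over ${!name[@]} instead.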

How to loop through the first n letters of the alphabet in bash

I know that to loop through the alphabet, one can do
for c in {a..z}; do something; done
My question is, how can I loop through the first n letters (e.g. to build a string) where n is a variable/parameter given in the command line.
I searched SO, and only found answers doing this for numbers, e.g. using C-style for loop or seq (see e.g. How do I iterate over a range of numbers defined by variables in Bash?). And I don't have seq in my environment.
Thanks.
The straightforward way is sticking them in an array and looping over that by index:
#!/bin/bash
chars=( {a..z} )
n=3
for ((i=0; i<n; i++))
do
echo "${chars[i]}"
done
Alternatively, if you just want them dash-separated:
printf "%s-" "${chars[@]:0:n}"
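The printf form leaves a trailing dash after the last letter. If that is unwanted, one option is to join the slice using IFS instead. This is a sketch assuming bash; the function name first_n_letters is hypothetical:

```shell
# Hypothetical helper: print the first n letters joined by dashes.
first_n_letters() {
  local n=$1
  local chars
  chars=( {a..z} )
  # Inside double quotes, [*] joins the elements with the first
  # character of IFS; the change is local to this function.
  local IFS=-
  printf '%s\n' "${chars[*]:0:n}"
}

first_n_letters 3
first_n_letters 5
```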
that other guy's answer is probably the way to go, but here's an alternative that doesn't require an array variable:
n=3 # sample value
i=0 # var. for counting iterations
for c in {a..z}; do
echo $c # do something with "$c"
(( ++i == n )) && break # exit loop, once desired count has been reached
done
@rici points out in a comment that you could make do without aux. variable $i by using the conditional (( n-- )) || break to exit the loop, but note that this modifies $n.
Here's another array-free, but less efficient approach that uses substring extraction (parameter expansion):
n=3 # sample value
# Create a space-separated list of letters a-z.
# Note that chars={a..z} does NOT work.
chars=$(echo {a..z})
# Extract the substring containing the specified number
# of letters using parameter expansion with an arithmetic expression,
# and loop over them.
# Note:
# - The variable reference must be _unquoted_ for this to work.
# - Since the list is space-separated, each entry spans 2
# chars., hence `2*n` (you could subtract 1 after, but it'll work either way).
for c in ${chars:0:2*n}; do
echo $c # do something with "$c"
done
Finally, you can combine the array and list approaches for concision, although the pure array approach is more efficient:
n=3 # sample value
chars=( {a..z} ) # create array of letters
# `${chars[@]:0:n}` returns the first n array elements as a space-separated list
# Again, the variable reference must be _unquoted_.
for c in ${chars[@]:0:n}; do
echo $c # do something with "$c"
done
Are you only iterating over the alphabet to create a subset? If that's the case, just make it simple:
$ alpha=abcdefghijklmnopqrstuvwxyz
$ n=4
$ echo ${alpha:0:$n}
abcd
Edit. Based on your comment below, do you have sed?
% sed -e 's/./&-/g' <<< ${alpha:0:$n}
a-b-c-d-
You can loop through the character code of the letters of the alphabet and convert back and forth:
# suppose $INPUT is your input
INPUT='x'
# get the decimal character code and increment it by one
INPUT_CHARCODE=$(printf '%d' "'$INPUT")
(( INPUT_CHARCODE++ ))
# start from character code 97 = 'a'
I=97
while [ "$I" -ne "$INPUT_CHARCODE" ]; do
# convert the code to a letter (the \x escape needs the code in hex)
CURRENT_CHAR=$(printf "\\x$(printf '%x' "$I")")
echo "current character is: $CURRENT_CHAR"
(( I++ ))
done
This question and its answers partially helped me with my problem.
I needed to loop over a part of the alphabet, starting from a given letter, in bash.
Brace expansion is strictly textual and happens before variables are expanded, so {$START..$STOP} does not work directly,
but eval provides a workaround:
START=A
STOP=D
for letter in $(eval echo {$START..$STOP}); do
echo $letter
done
Which results in:
A
B
C
D
Hope it's helpful for someone who is trying to solve the same problem I had
and ends up here as well.
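And the complete answer to the original question is:
START=A
n=4
# character code of the last letter, computed in decimal (n letters, so add n-1)
OFFSET=$(( $(printf '%d' "'$START") + n - 1 ))
# convert the code back to a letter (the \x escape needs the code in hex)
STOP=$(printf "\\x$(printf '%x' "$OFFSET")")
for letter in $(eval echo {$START..$STOP}); do
echo $letter
done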
Which results in the same:
A
B
C
D

Unable to set second to last command line argument to variable

Regardless of the number of arguments passed to my script, I would like for the second to the last argument to always represent a specific variable in my code.
Executing the program I'd type something like this:
sh myprogram.sh -a arg_a -b arg_b special specific
test=("${3}")
echo $test
The results will show 'special'. So using that same idea if I try this (since I won't know that number of arguments):
secondToLastArg=$(($#-1))
echo $secondToLastArg
The results will show '3'. How do I dynamically assign the second to last argument?
You need a bit of math to get the number you want ($(($#-1))), then use indirection (${!n}) to get the actual argument.
$ set -- a b c
$ echo $@
a b c
$ n=$(($#-1))
$ echo $n
2
$ echo ${!n}
b
$
Indirection (${!n}) tells bash to use the value of n as the name of the variable to use ($2, in this case).
You can treat $@ as an array and use array slicing:
echo ${@:$(($#-1)):1}
It means: take 1 element starting from position $(($#-1)).
If some old versions of shells do not support the ${array:start:length} syntax but only the ${array:start} syntax, use the hack below:
echo ${@:$(($#-1))} | { read x y ; echo $x; } # OR
read x unused <<< `echo ${@:$(($#-1))}`
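Both approaches can be wrapped in functions to compare them side by side. This is a sketch assuming bash; the function names are made up for the example, and the guard against fewer than two arguments is an addition not present in the answers above:

```shell
# Hypothetical helper using indirection: n holds the position,
# ${!n} expands to the positional parameter with that number.
second_to_last_indirect() {
  local n=$(( $# - 1 ))
  (( n >= 1 )) || return 1     # fewer than two arguments
  printf '%s\n' "${!n}"
}

# Hypothetical helper using slicing of the positional parameters:
# one element, starting at position $# - 1.
second_to_last_slice() {
  (( $# >= 2 )) || return 1
  printf '%s\n' "${@:$#-1:1}"
}

second_to_last_indirect -a arg_a -b arg_b special specific
second_to_last_slice -a arg_a -b arg_b special specific
```

Quoting the expansion keeps an argument that contains spaces in one piece, which the unquoted echo forms would not.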
