How to combine Bash array offset with default value?

What I'm after is the most compact expression that expands the special parameter @ with an offset of 2, or else to a default value of foobar if the subscript expands to the empty string or null. I tried the following notations without luck:
"$@:2:-foobar"
"${@:2:-foobar}"
"${@:2: -foobar}"
Is there such a compact notation? Alternatively what would be a similar solution; ideally without temporary variables?

You can combine the expansion of the first parameter, the expansion of the second parameter with its default value, and the expansion from the next offset onward.
Assuming your array is the program's or function's argument array $@, then:
#!/bin/bash
echo A "${@:2}"
echo B "${@:2:}" # your attempt #1
echo C "${@:2-foobar}" # your attempt #2
echo D "${@:2: -foobar}" # your attempt #3
echo E "${2:-foobar}"
echo F "$1" "${2:-foobar}" "${@:3}"
G=("$1" "${2:-foobar}" "${@:3}")
echo G "${G[@]}"
Lines F and G yield the desired result (G uses a temporary variable, though).
Example:
$ bash expand.sh 1
A
B
C
D
E foobar
F 1 foobar
G 1 foobar
$ bash expand.sh 1 2 3 4
A 2 3 4
B
C 2 3 4
D
E 2
F 1 2 3 4
G 1 2 3 4
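A small sketch of why the F pattern is written with "${@:3}" in quotes. Wrapped in a function (the name with_default is hypothetical), it keeps an argument that contains spaces as one word, and the :- form substitutes foobar for an empty second argument as well as a missing one.

```bash
#!/bin/bash
# Hypothetical helper: print each resulting word in brackets so word
# boundaries are visible. $2 falls back to "foobar" if it is unset or empty.
with_default() {
  printf '[%s]' "$1" "${2:-foobar}" "${@:3}"
  echo
}

with_default 1                 # [1][foobar]
with_default 1 '' 'x y' z      # [1][foobar][x y][z]
with_default 1 2 3 4           # [1][2][3][4]
```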
If you're trying to do this with an array other than "$@", say H=(1 2 3), you can't attach a default to a slice expansion such as "${H[@]:1}" (a default like "${H[1]:-foobar}" applies only to a single element). Your best bet in this case, assuming you don't want to introduce temporary variables, is to use a function or eval. But at that point you might be better off just adding a conditional, e.g.,
# assuming that H isn't sparse, redefine H based on its values
H=(
"${H[0]}"
"$([[ -n "${H[1]}" ]] && echo "${H[1]}" || echo "foobar")"
"${H[@]:2}"
)
But readability will suffer.
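For comparison, a more compact sketch (not part of the original answer): bash accepts a default value on a single element expansion, so H can be rebuilt with "${H[1]:-foobar}" in place of the conditional. Like the code above, it assumes H is not sparse.

```bash
#!/bin/bash
H=(1 '' 3 4)
# Keep element 0, default element 1 if it is unset or empty, then append
# every element from index 2 on. Assumes H is not sparse.
H=("${H[0]}" "${H[1]:-foobar}" "${H[@]:2}")
printf '[%s]' "${H[@]}"; echo   # [1][foobar][3][4]
```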

Related

What does ${@:3} mean in bash?

In dist_train.sh from mmdetection3d, what does ${@:3} on the last line do?
I can't understand its bash grammar.
#!/usr/bin/env bash
CONFIG=$1
GPUS=$2
NNODES=${NNODES:-1}
NODE_RANK=${NODE_RANK:-0}
PORT=${PORT:-29500}
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python -m torch.distributed.launch \
--nnodes=$NNODES \
--node_rank=$NODE_RANK \
--master_addr=$MASTER_ADDR \
--nproc_per_node=$GPUS \
--master_port=$PORT \
$(dirname "$0")/train.py \
$CONFIG \
--seed 0 \
--launcher pytorch ${@:3}
It is standard parameter expansion:
${parameter:offset}
${parameter:offset:length}
This is referred to as Substring Expansion. It expands to up to length characters of the value of parameter starting at the character specified by offset. If parameter is @, an indexed array subscripted by @ or *, or an associative array name, the results differ as described below. If length is omitted, it expands to the substring of the value of parameter starting at the character specified by offset and extending to the end of the value. length and offset are arithmetic expressions (see Shell Arithmetic).
[...]
If parameter is @, the result is length positional parameters beginning at offset. A negative offset is taken relative to one greater than the greatest positional parameter, so an offset of -1 evaluates to the last positional parameter. It is an expansion error if length evaluates to a number less than zero.
The following examples illustrate substring expansion using positional parameters:
$ set -- 1 2 3 4 5 6 7 8 9 0 a b c d e f g h
$ echo ${@:7}
7 8 9 0 a b c d e f g h
$ echo ${@:7:0}
$ echo ${@:7:2}
7 8
$ echo ${@:7:-2}
bash: -2: substring expression < 0
$ echo ${@: -7:2}
b c
$ echo ${@:0}
./bash 1 2 3 4 5 6 7 8 9 0 a b c d e f g h
$ echo ${@:0:2}
./bash 1
$ echo ${@: -7:0}
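The manual text above also covers indexed arrays subscripted by @. A short sketch with a hypothetical array arr: the same offset and length rules apply, except that offset 0 is the first element (with "$@", offset 0 is $0, as the example above shows).

```bash
#!/bin/bash
arr=(a b c d e)
echo "${arr[@]:1:2}"   # b c  (2 elements starting at index 1)
echo "${arr[@]:3}"     # d e  (from index 3 to the end)
echo "${arr[@]: -2}"   # d e  (negative offset counts from the end; note the space)
echo "${arr[@]:0:1}"   # a    (offset 0 is the first element, not $0)
```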
Per the Bash Hackers wiki on the positional parameters syntax, ${@:3} expands to all script arguments starting at the third one.
In other words, ${@:3} means "all arguments EXCEPT the first and second". A similar SO question exists from which you can infer the same conclusion.
A contrived example:
foo() {
echo "${@:3}"
}
foo a b c d e f g h i
# prints c d e f g h i
Great question.
In bash this is a form of parameter expansion. Here the parameter is $@, which represents all the parameters received by the program (or function).
The colon : means that you want to expand $@ to a subset of its values. For an ordinary variable this gives a substring, but for $@ it selects a range of positional parameters.
So in this instance you're saying: give me all the incoming parameters, starting from the 3rd one.
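The dist_train.sh pattern can be sketched like this: the first two arguments are consumed by name and the rest are forwarded. The names launch, launch_unquoted and show_args and the --gpus flag are hypothetical. The original script uses ${@:3} unquoted, which works for simple arguments; quoting it as "${@:3}" also keeps an argument that contains spaces intact.

```bash
#!/bin/bash
# Hypothetical stand-in for train.py: show each received argument in brackets.
show_args() { printf '[%s]' "$@"; echo; }

launch() {
  local config=$1 gpus=$2
  # Forward every argument from the third one on, each as a separate word.
  show_args "$config" --gpus="$gpus" "${@:3}"
}

launch_unquoted() {
  # Same, but unquoted: the forwarded arguments are re-split on whitespace.
  show_args "$1" --gpus="$2" ${@:3}
}

launch cfg.py 4 --seed 0 --work-dir 'my dir'
launch_unquoted cfg.py 4 --seed 0 --work-dir 'my dir'
```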

Bash: reshape a dataset of many rows to dataset of many columns

Suppose I have the following data:
# all the numbers are their own number. I want to reshape exactly as below
0 a
1 b
2 c
0 d
1 e
2 f
0 g
1 h
2 i
...
And I would like to reshape the data such that it is:
0 a d g ...
1 b e h ...
2 c f i ...
Without writing a complex composition. Is this possible using the unix/bash toolkit?
Yes, I can trivially do this in a programming language. The idea is NOT TO "just" do that. So if some cat X.csv | rs [magic options] sort of solution exists (rs, the reshape utility, would be great, except it isn't working here on Debian stretch), that is what I am looking for.
Otherwise, an equivalent answer that involves a composition of commands or script is out of scope: already got that, but would rather not have it.
Using GNU datamash:
$ datamash -s -W -g 1 collapse 2 < file
0 a,d,g
1 b,e,h
2 c,f,i
Options:
-s sort
-W use whitespace (spaces or tabs) as delimiters
-g 1 group on the first field
collapse 2 print comma-separated list of values of the second field
To convert the tabs and commas to space characters, pipe the output to tr:
$ datamash -s -W -g 1 collapse 2 < file | tr '\t,' ' '
0 a d g
1 b e h
2 c f i
bash version:
function reshape {
local index number key
declare -A result
while read -r index number; do
result[$index]+=" $number"
done
for key in "${!result[@]}"; do
echo "$key${result[$key]}"
done
}
reshape < input
We just need to make sure the input is in Unix format (LF line endings); with CRLF input, every value would keep a trailing carriage return.
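Bash does not define the iteration order of "${!result[@]}" for an associative array, so the rows above may come out in any order. Assuming, as in the question, that the first column holds non-negative integers, a sketch using an indexed array lists the keys in ascending order (reshape_indexed is a hypothetical name):

```bash
#!/bin/bash
# Sketch: same idea as reshape, but assumes the keys in column 1 are
# non-negative integers, so an indexed array can be used. Its indices
# are iterated in ascending numeric order.
reshape_indexed() {
  local index value key
  local -a result=()
  while read -r index value; do
    result[index]+=" $value"   # subscript is evaluated arithmetically
  done
  for key in "${!result[@]}"; do
    echo "$key${result[key]}"
  done
}

reshape_indexed <<'EOF'
0 a
1 b
2 c
0 d
1 e
2 f
0 g
1 h
2 i
EOF
```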

Bash shell iterations over letters and numbers

Say I want to iterate over two lists of letters and numbers.
A B C D and seq 1 100.
How can I iterate over letters along with numbers but not as in nested for-loop? So it would be A1B2C3D4 A5B6C7D8 ...
What I've tried so far: nested for-loop and & done don't seem to be of any help, since they produce either A1 B1 C1 D1 A2 B2... or inconsistent results of parallel execution.
Also it feels like a very basic parallel loop, so no need for a detailed explanation or actual code: ANY ANSWER mentioning link to docs or the conventional name of such sequence would be immediately accepted.
The following script generates your expected output with a leading space:
Script
for i in {1..100}; do
IFS= read -r c
printf %s "$c$i"
done < <(yes $' A\nB\nC\nD')
Output
A1B2C3D4 A5B6C7D8 A9B10C11D12 A13B14C15D16 A17B18C19D20 A21B22C23D24 A25B26C27D28 A29B30C31D32 A33B34C35D36 A37B38C39D40 A41B42C43D44 A45B46C47D48 A49B50C51D52 A53B54C55D56 A57B58C59D60 A61B62C63D64 A65B66C67D68 A69B70C71D72 A73B74C75D76 A77B78C79D80 A81B82C83D84 A85B86C87D88 A89B90C91D92 A93B94C95D96 A97B98C99D100
Explanation
To read the sequence 1 2 3 ... 100 in its full length, we need to repeat the sequence A B C D over and over again. yes is a command that repeats its argument ad infinitum. yes x prints
x
x
x
...
To make yes print something different on every line, we use a trick. $' A\nB\nC\nD' is a string that contains line breaks ($'...' is bash's so-called ANSI-C quoting). yes $' A\nB\nC\nD' will print
A
B
C
D
A
B
...
Instead of printing to the console, we want to consume the text later. To this end, we could write yes ... | someCommand or someCommand < <(yes ...), which has some advantages over a pipe: for example, the loop then runs in the current shell, so variables it sets remain visible afterwards. The latter is called process substitution. Note that for ...; done is also just one command, so the redirected stdin can be read from anywhere inside the for loop.
#!/bin/bash
# ASCII code for A
A=65
# Loop from 1 to 100
for ii in $( seq 1 100 )
do
# Compute ASCII code with using modulo
code=$(( (ii-1) % 4 + A ))
# Print letter
printf "\x$(printf %x $code)"
# Print number
echo $ii
done
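The script above prints each letter-number pair on its own line. A sketch of the same modulo idea that reproduces the exact format from the question (groups of four pairs separated by spaces, no leading space); the function name interleave is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: pair the letters A-D (cycling) with the numbers 1..n.
interleave() {
  local -a letters=(A B C D)
  local n=$1 i out='' k=${#letters[@]}
  for (( i = 1; i <= n; i++ )); do
    # Start a new space-separated group after every k pairs
    if (( i > 1 && (i - 1) % k == 0 )); then
      out+=' '
    fi
    # (i - 1) % k cycles through the array indices 0, 1, 2, 3, 0, ...
    out+="${letters[(i - 1) % k]}$i"
  done
  printf '%s\n' "$out"
}

interleave 100
```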

Bash variable not decrementing in a pipeline

The variable x in the first example doesn't get decremented, while in the second example it works. Why?
Non working example:
#!/bin/bash
x=100
f() {
echo $((x--)) | tr 0-9 A-J
# this also wouldn't work: tr 0-9 A-J <<< $((x--))
f
}
f
Working example:
#!/bin/bash
x=100
f() {
echo $x | tr 0-9 A-J
((x--))
# this also works: a=$((x--))
f
}
f
I think it's related to subshells since I think that the individual commands in the pipeline are running in subshells.
It does decrement if you don't use a pipeline (and thereby avoid forking a subshell):
x=10
f() {
if ((x)); then
echo $((x--))
f
fi
}
Then call it as:
f
it will print:
10
9
8
7
6
5
4
3
2
1
In your version, the decrement happens inside a subshell, so the current shell never sees the decremented value of x and the recursion never ends.
EDIT: You can try this workaround:
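A minimal sketch of the subshell effect described above, with cat standing in for tr: the decrement inside the pipeline is lost, while the same expansion outside a pipeline sticks.

```bash
#!/bin/bash
x=3
# Each command of a multi-command pipeline runs in a subshell,
# so this decrement happens in a copy of the shell and is discarded.
echo $((x--)) | cat > /dev/null
echo "after pipeline: x=$x"          # x=3

# Without a pipeline, the expansion happens in the current shell.
: $((x--))
echo "after plain expansion: x=$x"   # x=2
```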
x=10
f() {
if ((x)); then
x=$(tr 0-9 A-J <<< $x >&2; echo $((--x)))
f
fi
}
f
To get this output:
BA
J
I
H
G
F
E
D
C
B
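If you want to keep the pipeline, another sketch (not from the original answer) is to save the current value and decrement in the current shell before piping, with a stop condition like the one in the answer:

```bash
#!/bin/bash
x=10
f() {
  if (( x > 0 )); then
    local cur=$x
    (( x-- ))                  # decrement in the current shell, outside any pipeline
    echo "$cur" | tr 0-9 A-J   # only the printing runs in a subshell
    f
  fi
}
f
```

This prints the same BA, J, I, ..., B sequence as the workaround above.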

reset row number count in awk

I have a file like this
file.txt
0 1 a
1 1 b
2 1 d
3 1 d
4 2 g
5 2 a
6 3 b
7 3 d
8 4 d
9 5 g
10 5 g
.
.
.
I want to reset the row number count in the first column ($1) to 0 whenever the value in the second column ($2) changes, using awk or a bash script.
result
0 1 a
1 1 b
2 1 d
3 1 d
0 2 g
1 2 a
0 3 b
1 3 d
0 4 d
0 5 g
1 5 g
.
.
.
As long as you don't mind a bit of excess memory usage, and the second column is sorted, I think this is the most fun:
awk '{$1=a[$2]+++0;print}' input.txt
This awk one-liner seems to work for me:
[ghoti@pc ~]$ awk 'prev!=$2{first=0;prev=$2} {$1=first;first++} 1' input.txt
0 1 a
1 1 b
2 1 d
3 1 d
0 2 g
1 2 a
0 3 b
1 3 d
0 4 d
0 5 g
1 5 g
Let's break apart the script and see what it does.
prev!=$2 {first=0;prev=$2} -- This is what resets your counter. Since the initial state of prev is empty, we reset on the first line of input, which is fine.
{$1=first;first++} -- For every line, set the first field, then increment the variable we're using to set the first field.
1 -- this is awk short-hand for "print the line". It's really a condition that always evaluates to "true", and when a condition/statement pair is missing a statement, the statement defaults to "print".
Pretty basic, really.
The one catch, of course, is that when you change the value of any field in awk, it rebuilds the line using the output field separator (OFS), which by default is a single space. If you want to adjust this, you can set the OFS variable:
[ghoti@pc ~]$ awk -vOFS=" " 'p!=$2{f=0;p=$2}{$1=f;f++}1' input.txt | head -2
0 1 a
1 1 b
Salt to taste.
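On the question's data both one-liners above print the same thing, because column 2 is sorted. A sketch with hypothetical unsorted input shows where they differ: the a[$2] version keeps a separate counter per value, while the prev!=$2 version resets whenever the value changes from one line to the next.

```bash
#!/bin/bash
# Hypothetical sample in which column 2 is NOT sorted (value 1 comes back after 2)
sample=$'0 1 a\n1 2 b\n2 1 c'

# Counter kept per distinct value of $2 (never reset)
per_key()   { awk '{$1=a[$2]+++0;print}'; }
# Counter reset whenever $2 differs from the previous line's $2
on_change() { awk 'prev!=$2{first=0;prev=$2} {$1=first;first++} 1'; }

echo "per key:";   per_key   <<< "$sample"   # last line: 1 1 c
echo "on change:"; on_change <<< "$sample"   # last line: 0 1 c
```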
A pure bash solution:
file="/PATH/TO/YOUR/OWN/INPUT/FILE"
count=0
old_trigger=0
while read -r a b c; do
if ((b == old_trigger)); then
echo "$((count++)) $b $c"
else
count=0
echo "$((count++)) $b $c"
old_trigger=$b
fi
done < "$file"
This solution (IMHO) has the advantage of using a readable algorithm. I like the other answers, but they are not as easy for beginners to follow.
NOTE:
((...)) is an arithmetic command, which returns an exit status of 0 if the expression is nonzero, or 1 if the expression is zero. Also used as a synonym for let, if side effects (assignments) are needed. See http://mywiki.wooledge.org/ArithmeticExpression
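A quick sketch of the exit-status rule in the note. The last line shows a common gotcha with post-increment: the expression's value is the old value, 0, so the status is 1 even though n was incremented.

```bash
#!/bin/bash
(( 5 - 5 )); echo "status: $?"   # expression is 0 -> status 1
(( 2 + 2 )); echo "status: $?"   # expression is 4 -> status 0

n=0
(( n++ )); echo "status: $?, n=$n"   # value of n++ is 0 -> status 1, but n is now 1
```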
Perl solution:
perl -naE '
$dec = $F[0] if defined $old and $F[1] != $old;
$F[0] -= $dec;
$old = $F[1];
say join "\t", @F[0,1,2];'
$dec is subtracted from the first column each time. When the second column changes (its previous value is stored in $old), $dec increases to set the first column to zero again. The defined condition is needed for the first line to work.

Resources