the logic behind bash power set function - algorithm

The function (output the power set of a given input)
p() { [ $# -eq 0 ] && echo || (shift; p "$@") |
while read r ; do echo -e "$1 $r\n$r"; done }
Test Input
p $(echo -e "1 2 3")
Test Output
1 2 3
2 3
1 3
3
1 2
2
1
I have difficulty grasping the recursion in the code above. I tried to understand it by adding variables to the code to track the recursion level and the order of execution, but I am still puzzled.
Here are the things I can tell so far:
The subshell's output does not appear directly in the final output, because it is piped to the read command
The echo command appends a newline to everything it outputs
The order of execution I see is:
p (1 2 3) -> 1 followed by all combination of output below\n
all combination of output below
p (2 3) -> 2 3\n3\n
p (3) -> 3
p () ->
So I think I should have p(2) instead of p(3) at step 3, but how does that happen, since shift only goes in one direction?
If I use "p(1 2 3 4)" as the input, it is the part of the output that shows "1 2 3" that confuses me.

The use of -e in the echo command seems to me pure obfuscation, since it could have been written:
p() { [ $# -eq 0 ] && echo || (shift; p "$@") |
while read r ; do
echo $1 $r
echo $r
done
}
In other words, "for every set in the power set of all but the first argument (shift; p "$@"), output that set both with and without the first argument."
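The same recursion can be sketched with the first argument saved in a variable, which may make it easier to see why each level still knows its own first element: in the original, shift runs inside the ( ... ) subshell, so it never changes the arguments of the calling level. This is a rewrite for illustration only, not the original one-liner; the variable names first and rest are assumptions.

```shell
# Power set of the arguments, one subset per line.
p() {
  if [ $# -eq 0 ]; then
    echo                              # the empty set: a single blank line
  else
    local first=$1
    shift                             # "$@" is now everything but the first element
    p "$@" | while read -r rest; do
      printf '%s\n' "$first $rest"    # the subset with the first element
      printf '%s\n' "$rest"           # the same subset without it
    done
  fi
}

p 1 2 3
```

Each level reads every line produced by the level below it and emits two lines per line read. So p 2 3 reads two lines from p 3 (3 and an empty line) and emits four, and it is the empty line that produces the 2 on its own; no call to p 2 is needed.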
The bash function works by setting up a chain of subshells, each one reading from the next one, something like this, where each box is a subshell and, below it, I've shown its output as it reads each line of input. (I used "" to make "nothing" visible. ==> means "call"; <- means "read".)
+---------+      +-------+      +-------+      +-------+
| p 1 2 3 | ==>  | p 2 3 | ==>  |  p 3  | ==>  |   p   |
+---------+      +-------+      +-------+      +-------+
 1 2 3 "" <--+-- 2 3 "" <---+-- 3 "" <-----+-- ""
   2 3 "" <-/              /              /
   1 3 "" <--+--   3 "" <-/              /
     3 "" <-/                           /
   1 2 "" <--+--   2 "" <---+--   "" <-/
     2 "" <-/              /
     1 "" <--+--     "" <-/
       "" <-/

Related

select multiple patterns with grep

I have a file that looks like this:
t # 3-7, 1
v 0 104
v 1 92
v 2 95
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-8, 1
v 0 94
v 1 13
v 2 19
v 3 5
u 0 1 2
u 0 2 2
u 0 3 2
t # 3-9, 1
v 0 94
v 1 13
v 2 19
v 3 7
u 0 1 2
u 0 2 2
u 0 3 2
t corresponds to header of each block.
I would like to extract multiple patterns from the file and output only the transactions that contain all of the required patterns.
I tried the following code:
ps | grep -e 't\|u 0 1 2' file.txt
and it works well to extract the header and the pattern 'u 0 1 2'. However, when I add one more pattern, the output lists only the headers that start with t #. My modified code looks like this:
ps | grep -e 't\|u 0 1 2 && u 0 2 2' file.txt
I tried sed and awk solutions, but they do not work for me either.
Thank you for your help!
Olha
Use | as the separator before the third alternative, just like the second alternative.
grep -E 't|u 0 1 2|u 0 2 2' file.txt
Also, it doesn't make sense to specify a filename and also pipe ps to grep. If you provide filename arguments, it doesn't read from the pipe (unless you use - as a filename).
You can use grep with multiple -e expressions to grep for more than one thing at a time:
$ printf '%d\n' {0..10} | grep -e '0' -e '5'
0
5
10
Expanding on @kojiro's answer, you'll want to use an array to collect arguments:
mapfile -t lines < file.txt
for line in "${lines[@]}"
do
arguments+=(-e "$line")
done
grep "${arguments[@]}"
You'll probably need a condition within the loop to check whether the line is one you want to search for, but that's it.
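The alternation answers above print every line that matches any of the patterns. If the goal is instead to print only the blocks that contain all of the patterns, one possible approach is to collect each block and test it with grep once per pattern. The sketch below is only an illustration: the function names print_blocks and block_has_all are made up here, and it assumes that every block starts with a line beginning with "t " and that each pattern must match a whole line exactly.

```shell
# block_has_all BLOCK PATTERN...: succeed if every PATTERN is a full line of BLOCK.
block_has_all() {
  local block=$1 pat
  shift
  for pat in "$@"; do
    grep -qxF -- "$pat" <<< "$block" || return 1
  done
}

# print_blocks FILE PATTERN...: print each block that contains all PATTERNs.
print_blocks() {
  local file=$1 block="" line
  shift
  while IFS= read -r line || [ -n "$line" ]; do
    # A new header closes the block collected so far.
    if [[ $line == "t "* && -n $block ]]; then
      block_has_all "$block" "$@" && printf '%s\n' "$block"
      block=""
    fi
    block+=${block:+$'\n'}$line
  done < "$file"
  # Do not forget the last block in the file.
  [ -n "$block" ] && block_has_all "$block" "$@" && printf '%s\n' "$block"
  return 0
}

# Usage: print_blocks file.txt 'u 0 1 2' 'u 0 2 2'
```

With the sample file from the question, the two patterns 'u 0 1 2' and 'u 0 2 2' occur in all three blocks, so all three would be printed.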

How would I loop over pairs of values without repetition in bash?

I'm using a particular program that would require me to examine pairs of variables in a text file by specifying the pairs using indices.
For example:
gcta --reml-bivar 1 2 --grm test --pheno test.phen --out test
Where 1 and 2 would correspond to values from the first two columns in a text file. If I had 50 columns and wanted to examine each pair without repetition (1&2, 2&3, 1&3 ... 50), what would be the best way to automate this by looping through this? So essentially the script would be executing the same command but taking in pairs of indices like:
gcta --reml-bivar 1 3 --grm test --pheno test.phen --out test
gcta --reml-bivar 1 4 --grm test --pheno test.phen --out test
... so on and so forth. Thanks!
Since you haven't shown us any sample input we're just guessing, but if your input is a list of numbers (extracted from a file or otherwise) then here's an approach:
$ cat combinations.awk
###################
# Calculate all combinations of a set of strings, see
# https://rosettacode.org/wiki/Combinations#AWK
###################
function get_combs(A,B, i,n,comb) {
## Default value for r is to choose 2 from pool of all elements in A.
## Can alternatively be set on the command line:-
## awk -v r=<number of items being chosen> -f <scriptname>
n = length(A)
if (r=="") r = 2
comb = ""
for (i=1; i <= r; i++) { ## First combination of items:
indices[i] = i
comb = (i>1 ? comb OFS : "") A[indices[i]]
}
B[comb]
## While 1st item is less than its maximum permitted value...
while (indices[1] < n - r + 1) {
## loop backwards through all items in the previous
## combination of items until an item is found that is
## less than its maximum permitted value:
for (i = r; i >= 1; i--) {
## If the equivalently positioned item in the
## previous combination of items is less than its
## maximum permitted value...
if (indices[i] < n - r + i) {
## increment the current item by 1:
indices[i]++
## Save the current position-index for use
## outside this "for" loop:
p = i
break
}
}
## Put consecutive numbers in the remainder of the array,
## counting up from position-index p.
for (i = p + 1; i <= r; i++) indices[i] = indices[i - 1] + 1
## Print the current combination of items:
comb = ""
for (i=1; i <= r; i++) {
comb = (i>1 ? comb OFS : "") A[indices[i]]
}
B[comb]
}
}
# Input should be a list of strings
{
split($0,A)
delete B
get_combs(A,B)
PROCINFO["sorted_in"] = "@ind_str_asc"
for (comb in B) {
print comb
}
}
.
$ awk -f combinations.awk <<< '1 2 3 4'
1 2
1 3
1 4
2 3
2 4
3 4
.
$ while read -r a b; do
echo gcta --reml-bivar "$a" "$b" --grm test --pheno test.phen --out test
done < <(awk -f combinations.awk <<< '1 2 3 4')
gcta --reml-bivar 1 2 --grm test --pheno test.phen --out test
gcta --reml-bivar 1 3 --grm test --pheno test.phen --out test
gcta --reml-bivar 1 4 --grm test --pheno test.phen --out test
gcta --reml-bivar 2 3 --grm test --pheno test.phen --out test
gcta --reml-bivar 2 4 --grm test --pheno test.phen --out test
gcta --reml-bivar 3 4 --grm test --pheno test.phen --out test
Remove the echo when you're done testing and happy with the output.
In case anyone's reading this and wants permutations instead of combinations:
$ cat permutations.awk
###################
# Calculate all permutations of a set of strings, see
# https://en.wikipedia.org/wiki/Heap%27s_algorithm
function get_perm(A, i, lgth, sep, str) {
lgth = length(A)
for (i=1; i<=lgth; i++) {
str = str sep A[i]
sep = " "
}
return str
}
function swap(A, x, y, tmp) {
tmp = A[x]
A[x] = A[y]
A[y] = tmp
}
function generate(n, A, B, i) {
if (n == 1) {
B[get_perm(A)]
}
else {
for (i=1; i <= n; i++) {
generate(n - 1, A, B)
if ((n%2) == 0) {
swap(A, 1, n)
}
else {
swap(A, i, n)
}
}
}
}
function get_perms(A,B) {
generate(length(A), A, B)
}
###################
# Input should be a list of strings
{
split($0,A)
delete B
get_perms(A,B)
PROCINFO["sorted_in"] = "@ind_str_asc"
for (perm in B) {
print perm
}
}
.
$ awk -f permutations.awk <<< '1 2 3 4'
1 2 3 4
1 2 4 3
1 3 2 4
1 3 4 2
1 4 2 3
1 4 3 2
2 1 3 4
2 1 4 3
2 3 1 4
2 3 4 1
2 4 1 3
2 4 3 1
3 1 2 4
3 1 4 2
3 2 1 4
3 2 4 1
3 4 1 2
3 4 2 1
4 1 2 3
4 1 3 2
4 2 1 3
4 2 3 1
4 3 1 2
4 3 2 1
Both of the above use GNU awk for sorted_in to sort the output. If you don't have GNU awk you can still use the scripts as-is and if you need to sort the output then pipe it to sort.
If I understand you correctly and you don't need pairs like '1 1', '2 2', ... or both '1 2' and '2 1', try this script:
#!/bin/bash
for i in $(seq 1 49);
do
for j in $(seq $(($i + 1)) 50);
do gcta --reml-bivar "$i" "$j" --grm test --pheno test.phen --out test
done;
done;
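The nested loop above can also be wrapped in a small function so that the pairs can be inspected before any command is run. This is a sketch under the assumption that the indices are simply 1..N; the function name pairs is made up for the example.

```shell
# pairs N: print every pair "i j" with 1 <= i < j <= N, one pair per line.
pairs() {
  local n=$1 i j
  for ((i = 1; i < n; i++)); do
    for ((j = i + 1; j <= n; j++)); do
      printf '%s %s\n' "$i" "$j"
    done
  done
}

pairs 4
# N columns give N*(N-1)/2 pairs, so 50 columns give 1225:
pairs 50 | wc -l
```

The output can then be fed to the real command, for example with pairs 50 | while read -r i j; do echo gcta --reml-bivar "$i" "$j" --grm test --pheno test.phen --out test; done, dropping the echo once the printed commands look right.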
1 and 2 would correspond to values from the first two columns in a text file.
each pair without repetition
So let's walk through this process:
We repeat the whole first column of the file as many times as the file has lines
We repeat each value (each line) of the second column as many times as the file has lines
We join the repeated columns -> we have all combinations
We need to filter out the "repetitions": we can compare the result with the original file and drop the lines that appear in both
So we get each pair without repetitions.
Then we just read the file line by line.
The script:
# create an input file cause you didn't provide any
cat << EOF > in.txt
1 a
2 b
3 c
4 d
EOF
# get file length
inlen=$(<in.txt wc -l)
# join the columns
paste -d' ' <(
# repeat the first column inlen times
# https://askubuntu.com/questions/521465/how-can-i-repeat-the-content-of-a-file-n-times
seq "$inlen" |
xargs -I{} cut -d' ' -f1 in.txt
) <(
# repeat each line inlen times
# https://unix.stackexchange.com/questions/81904/repeat-each-line-multiple-times
awk -v IFS=' ' -v v="$inlen" '{for(i=0;i<v;i++)print $2}' in.txt
) |
# filter out repetitions - ie. filter original lines from the file
sort |
comm --output-delimiter='' -3 <(sort in.txt) - |
# read the file line by line
while read -r one two; do
echo "$one" "$two"
done
will output:
1 b
1 c
1 d
2 a
2 c
2 d
3 a
3 b
3 d
4 a
4 b
4 c
#!/bin/bash
#set the length of the combination depending on the
#user's choice
eval rg+=({1..$2})
#the code builds the script and runs it (eval)
eval `
#Character range depending on user selection
for i in ${rg[@]} ; do
echo "for c$i in {1..$1} ;do "
done ;
#The script starts from code that generates every
#possible combination, including ones that repeat
#an element, so this part writes the condition
#that filters those out (the script generates
#the conditioning code itself)
op1=$2
op2=$(( $2 - 1 ))
echo -n "if [ 1 == 1 ] "
while [ $op1 -gt 1 ] ; do
echo -n \&\& [ '$c'$op1 != '$c'$op2 ]' '
op2=$(( op2 - 1 ))
if [ $op2 == 0 ] ; then
op1=$(( op1 - 1 ))
op2=$(( op1 - 1 ))
fi
done ;
echo ' ; then'
echo -n "echo "
for i in ${rg[@]} ;
do
echo -n '$c'$i
done ;
echo \;
echo fi\;
for i in ${rg[@]} ; do
echo 'done ;'
done;`
example: range length
$ ./combs.bash '{1..2} {a..c} \$ \#' 4
12ab$
12ab#
12acb
12ac$
12ac#
12a$b
12a$c
12a$#
12a#b
12a#c
12a#$
..........
#!/bin/bash
len=$2
eval c=($1)
per()
{
((`grep -Poi '[^" ".]'<<<$2|sort|uniq|wc -l` < $((len - ${1}))))&&{ return;}
(($1 == 0))&&{ echo $2;return;}
for i in ${c[@]} ; do
per "$((${1} - 1 ))" "$2 $i"
done
}
per "$2" ""
#example
$ ./neto '{0..3} {a..d} \# \!' 7
0 1 2 3 a b c
0 1 2 3 a b d
0 1 2 3 a b #
0 1 2 3 a b !
0 1 2 3 a c b
0 1 2 3 a c d
0 1 2 3 a c #
0 1 2 3 a c !
0 1 2 3 a d b
...

Grep variable in for loop

I want to grep a specific line on each iteration of a for loop. I've already searched the internet for an answer to my problem and tried the suggestions, but they don't seem to work for me, and I can't find what I'm doing wrong.
Here is the code :
for n in 2 4 6 8 10 12 14 ; do
for U in 1 10 100 ; do
for L in 2 4 6 8 ; do
i=0
cat results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat
for k in $(seq 1 1 $L) ; do
${'var'.$k}=`grep " $k " results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat | tail -n 1`
done
which gives me :
%
%
% site density double occupancy
1 0.49791021 0.03866179
2 0.49891438 0.06077808
3 0.50426102 0.05718336
4 0.49891438 0.06077808
./run_deviation_functionL.sh: line 109: ${'var'.$k}=`grep " $k " results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat | tail -n 1`: bad substitution
Then, I would like to take only the density number, with something like:
${'density'.$k}=`echo "${'var'.$k:10:10}" | bc -l`
Does anyone know why it fails?
Use declare to create variable names from variables:
declare density$k="`...`"
Use the variable indirection to retrieve them:
var=var$k
echo ${!var:10:10}
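Put together, the two steps might look like the sketch below. It is only an illustration: the rows array is a hypothetical stand-in for the grep output in the question (the two values are copied from the sample table), and the density is taken as the second whitespace-separated field rather than by character offset.

```shell
# Hypothetical sample rows: site, density, double occupancy.
rows=("1 0.49791021 0.03866179" "2 0.49891438 0.06077808")

k=0
for row in "${rows[@]}"; do
  k=$((k + 1))
  read -r _ density _ <<< "$row"    # keep only the second field
  declare "density$k=$density"      # creates density1, density2, ...
done

name=density2
echo "${!name}"                     # indirect expansion: value of density2
```

If the names do not have to be separate variables, a regular array such as density[k]=... avoids the indirection altogether.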

Repeat an element n number of times in an array

Basically, I am trying to repeat each element in the following array [1 2 3] 4 times such that I will get something like this:
[1 1 1 1 2 2 2 2 3 3 3 3]
I tried a very stupid line of code i.e. abc=('1%.0s' {1..4}). But it failed miserably.
I am looking for an efficient one line solution to this problem and preferably, without using loops. If it is not possible to achieve this with just one line, then use loops.
Unless you're trying to avoid loops you can do:
arr=(1 2 3)
for i in ${arr[@]}; do for ((n=1; n<=4; n++)) do echo -n "$i ";done; done; echo
1 1 1 1 2 2 2 2 3 3 3 3
To store the results in an array:
aarr=($(for i in ${arr[@]}; do for ((n=1; n<=4; n++)) do echo -n "$i ";done; done;))
declare -p aarr
declare -a aarr='([0]="1" [1]="1" [2]="1" [3]="1" [4]="2" [5]="2" [6]="2" [7]="2" [8]="3" [9]="3" [10]="3" [11]="3")'
This does what you need and stores it in an array:
declare -a res=($(for v in 1 2 3; do for i in {1..4}; do echo $v; done; done))
Taking your idea to the next step:
$ a=(1 2 3)
$ b=($(for x in "${a[@]}"; do printf "$x%.0s " {1..4}; done))
$ echo ${b[@]}
1 1 1 1 2 2 2 2 3 3 3 3
Alternatively, using sed:
$ echo ${a[*]} | sed -r 's/[[:alnum:]]+/& & & &/g'
1 1 1 1 2 2 2 2 3 3 3 3
Or, using awk:
$ echo ${a[*]} | awk -v RS='[ \n]' '{for (i=1;i<=4;i++)printf "%s ", $0;} END{print""}'
1 1 1 1 2 2 2 2 3 3 3 3
Simple one liner:
for x in 1 2 3 ; do array+="$(printf "%1.0s$x" {1..4})" ;done
Similar to what you wanted.
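For elements that may contain spaces or glob characters, the unquoted $(...) forms above would split them. A loop-based sketch that keeps each element intact is shown below; the function name repeat_each is made up for the example, and mapfile requires bash 4 or later.

```shell
# repeat_each N ELEMENT...: print each ELEMENT N times, one per line.
repeat_each() {
  local n=$1 x i
  shift
  for x in "$@"; do
    for ((i = 0; i < n; i++)); do
      printf '%s\n' "$x"
    done
  done
}

arr=(1 2 3)
mapfile -t out < <(repeat_each 4 "${arr[@]}")   # one array element per line
echo "${out[*]}"
```

This trades the one-liner for a helper function, so it only pays off when the elements are not simple words.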

How to produce cartesian product in bash?

I want to produce such file (cartesian product of [1-3]X[1-5]):
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5
I can do this using nested loop like:
for i in $(seq 3)
do
for j in $(seq 5)
do
echo $i $j
done
done
Is there any solution without loops?
Combine two brace expansions!
$ printf "%s\n" {1..3}" "{1..5}
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5
This works by using a single brace expansion:
$ echo {1..5}
1 2 3 4 5
and then combining with another one:
$ echo {1..5}+{a,b,c}
1+a 1+b 1+c 2+a 2+b 2+c 3+a 3+b 3+c 4+a 4+b 4+c 5+a 5+b 5+c
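To check what the combined expansion produces, the words can be captured in an array, as in the short sketch below (the array name product is made up for the example). Note that brace expansion happens before variable expansion, so a range such as {1..$n} with a variable bound is not expanded into a sequence.

```shell
# Each word of the combined expansion becomes one array element, e.g. "1 1".
product=( {1..3}" "{1..5} )

echo "${#product[@]}"                        # 3 * 5 elements
printf '%s\n' "${product[@]}" | head -n 3    # the first few pairs
```

Because the whole product is built in memory as a list of words, this is convenient for small ranges only.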
A shorter (but hacky) version of Rubens's answer:
join -j 999999 -o 1.1,2.1 file1 file2
Since field 999999 most likely does not exist, the join field is considered equal for all lines of both files, and therefore join has to produce the Cartesian product. It uses O(N+M) memory and produces output at 100..200 Mb/sec on my machine.
I don't like the "shell brace expansion" method like echo {1..100}x{1..100} for large datasets because it uses O(N*M) memory and, when used carelessly, can bring your machine to its knees. It is hard to stop because Ctrl+C does not interrupt brace expansion, which is done by the shell itself.
The best alternative for a cartesian product in bash is surely, as pointed out by @fedorqui, to use brace expansion. However, in case your input is not easily producible that way (i.e., if {1..3} and {1..5} do not suffice), you could simply use join.
For example, if you want to perform the cartesian product of two regular files, say "a.txt" and "b.txt", you could do the following. First, the two files:
$ echo -en {a..c}"\tx\n" | sed 's/^/1\t/' > a.txt
$ cat a.txt
1 a x
1 b x
1 c x
$ echo -en "foo\nbar\n" | sed 's/^/1\t/' > b.txt
$ cat b.txt
1 foo
1 bar
Notice the sed command is used to prepend each line with an identifier. The identifier must be the same for all lines, and for all files, so the join will give you the cartesian product -- instead of putting aside some of the resultant lines. So, the join goes as follows:
$ join -j 1 -t $'\t' a.txt b.txt | cut -d $'\t' -f 2-
a x foo
a x bar
b x foo
b x bar
c x foo
c x bar
After both files are joined, cut is used as an alternative to remove the column of "1"s formerly prepended.
