Repeat an element n number of times in an array - bash

Basically, I am trying to repeat each element in the following array [1 2 3] 4 times such that I will get something like this:
[1 1 1 1 2 2 2 2 3 3 3 3]
I tried a very stupid line of code, i.e. abc=('1%.0s' {1..4}), but it failed miserably.
I am looking for an efficient one-line solution to this problem, preferably without using loops. If it is not possible to achieve this in just one line, then loops are fine.

Unless you're trying to avoid loops you can do:
arr=(1 2 3)
for i in ${arr[@]}; do for ((n=1; n<=4; n++)); do echo -n "$i "; done; done; echo
1 1 1 1 2 2 2 2 3 3 3 3
To store the results in an array:
aarr=($(for i in ${arr[@]}; do for ((n=1; n<=4; n++)); do echo -n "$i "; done; done))
declare -p aarr
declare -a aarr='([0]="1" [1]="1" [2]="1" [3]="1" [4]="2" [5]="2" [6]="2" [7]="2" [8]="3" [9]="3" [10]="3" [11]="3")'
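If you would rather avoid the command substitution, readarray (bash 4+) with process substitution is another way to capture the output; a minimal sketch of the same loop:
readarray -t aarr < <(for i in "${arr[@]}"; do for ((n=1; n<=4; n++)); do echo "$i"; done; done)
declare -p aarr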

This does what you need and stores it in an array:
declare -a res=($(for v in 1 2 3; do for i in {1..4}; do echo $v; done; done))

Taking your idea to the next step:
$ a=(1 2 3)
$ b=($(for x in "${a[@]}"; do printf "$x%.0s " {1..4}; done))
$ echo ${b[@]}
1 1 1 1 2 2 2 2 3 3 3 3
Alternatively, using sed:
$ echo ${a[*]} | sed -r 's/[[:alnum:]]+/& & & &/g'
1 1 1 1 2 2 2 2 3 3 3 3
Or, using awk:
$ echo ${a[*]} | awk -v RS='[ \n]' '{for (i=1;i<=4;i++)printf "%s ", $0;} END{print""}'
1 1 1 1 2 2 2 2 3 3 3 3
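When the elements are literal, as they are in the question, brace expansion alone can also do the repetition without any loop; a small sketch in the same spirit:
$ b=( {1,2,3}{,,,} )
$ echo ${b[@]}
1 1 1 1 2 2 2 2 3 3 3 3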

Simple one liner:
for x in 1 2 3 ; do array+="$(printf "%1.0s$x" {1..4})" ;done
Similar to what you wanted.
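Note that array+="..." appends to a plain string; to build an actual array, the parenthesised form of += should do it, roughly:
for x in 1 2 3; do array+=( $(printf "$x%.0s " {1..4}) ); done
declare -p array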

Related

Loop through a file and paste columns next to one another

Given I have a python script as follows:
#!/usr/bin/python
for i in range(1,4):
    print i
I want to run it in a bash loop for 3 times but I want to add the output as columns rather than concatenating. Is there a way to achieve this?
Output:
1 1 1
2 2 2
3 3 3
Like this?:
$ for i in {1..3} ; do echo $i $i $i ; done
1 1 1
2 2 2
3 3 3
You are looking for the pr command:
for i in 1 2 3 ; do
    python a.py
done | pr -t -3
Output:
1 1 1
2 2 2
3 3 3
Btw, to get the numbers from 1 to 3 you need to use:
range(1,4) # <-- 4, not 3!
in Python
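Since the question's title mentions paste: process substitution gives you the columns directly as well (a sketch, assuming the script is saved as a.py as above):
paste -d ' ' <(python a.py) <(python a.py) <(python a.py)
Output:
1 1 1
2 2 2
3 3 3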

awk: print first column, then some values, and then all other columns

I want to print the first column, then a couple of columns with fixed values, like this command would do:
awk '{print $1,"1","2","1"}'
and then print all columns except the first after that...
I know this command prints all but the first column:
awk '{$1=""; print $0}'
But that gets rid of the first column.
In other words, this:
3 5 2 2
3 5 2 2
3 5 2 2
3 5 2 2
Needs to become this:
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
Any ideas?
Use a loop to iterate through the rest of the columns, like this:
awk '{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
As an example:
$echo "3 5 2 2" | awk 'BEGIN{ORS=""}{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
3 1 2 1 5 2 2
$
Edit1 :
$ echo "3 5 2 2" | awk 'BEGIN{ORS="\n";OFS="\n"}{print $1,"1","2","1 ";for(i=2;i<=NF;i++) print $i" "}'
3
1
2
1
5
2
2
$
Edit2:
$ echo "3 5 2 2" | awk '{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
3 1 2 1
5
2
2
$
Edit3:
$ echo "3 5 2 2
3 5 2 2
3 5 2 2
3 5 2 2" | awk '{printf("%s %s ", $1,"1 2 1");for(i=2;i<=NF;i++) printf("%s ", $i); printf "\n"}'
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
You are almost there; you just need to store the first column in a temporary variable:
{
    head=$1;   # Store $1 in head, used later in printf
    $1="";     # Empty $1, so that $0 will not contain the first column
    printf "%s 1 2 1%s\n", head, $0
}
And a full script:
echo "3 5 2 2" | awk '{head=$1;$1="";printf "%s 1 2 1%s\n", head, $0}'
Another solution with awk:
awk '{sub(/.*/, "1 2 1 "$2, $2)}1' File
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
Substitute the 2nd field with "1 2 1" followed by the 2nd field itself.
You can do this using sed by replacing the first space by the string you want.
sed 's/ / 1 2 1 /' file
(OR)
With awk by replacing the first field($1):
awk '{$1=$1 " 1 2 1"}1' file
(I prefer the sed solution since it has fewer characters.)
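A quick check of either one-liner against the sample input (the file name is just an example):
$ yes '3 5 2 2' | head -4 > file
$ sed 's/ / 1 2 1 /' file
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2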

How to produce cartesian product in bash?

I want to produce the following file (the Cartesian product of [1-3]X[1-5]):
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5
I can do this using a nested loop like:
for i in $(seq 3)
do
    for j in $(seq 5)
    do
        echo $i $j
    done
done
Is there any solution without loops?
Combine two brace expansions!
$ printf "%s\n" {1..3}" "{1..5}
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5
This works by using a single brace expansion:
$ echo {1..5}
1 2 3 4 5
and then combining with another one:
$ echo {1..5}+{a,b,c}
1+a 1+b 1+c 2+a 2+b 2+c 3+a 3+b 3+c 4+a 4+b 4+c 5+a 5+b 5+c
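And since the goal is a file, the expansion can be redirected straight into one (the file name is just an example):
$ printf "%s\n" {1..3}" "{1..5} > pairs.txt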
A shorter (but hacky) version of Rubens's answer:
join -j 999999 -o 1.1,2.1 file1 file2
Since the field 999999 most likely does not exist, it is considered equal for both sets, and therefore join has to produce the Cartesian product. It uses O(N+M) memory and produces output at 100-200 MB/s on my machine.
I don't like the "shell brace expansion" method like echo {1..100}x{1..100} for large datasets because it uses O(N*M) memory and can, when used carelessly, bring your machine to its knees. It is hard to stop because Ctrl+C does not interrupt brace expansion, which is done by the shell itself.
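Applied to the [1-3]X[1-5] product from the question, the same trick would look roughly like this (GNU join assumed):
join -j 999999 -o 1.1,2.1 <(seq 3) <(seq 5)
which should print the 15 pairs shown above.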
The best alternative for the Cartesian product in bash is surely -- as pointed out by fedorqui -- to use brace expansion. However, in case your input is not easily producible (i.e., if {1..3} and {1..5} do not suffice), you can simply use join.
For example, if you want to perform the Cartesian product of two regular files, say "a.txt" and "b.txt", you could do the following. First, the two files:
$ echo -en {a..c}"\tx\n" | sed 's/^/1\t/' > a.txt
$ cat a.txt
1 a x
1 b x
1 c x
$ echo -en "foo\nbar\n" | sed 's/^/1\t/' > b.txt
$ cat b.txt
1 foo
1 bar
Notice the sed command is used to prepend each line with an identifier. The identifier must be the same for all lines and for all files, so the join will give you the Cartesian product instead of setting aside some of the resulting lines. So, the join goes as follows:
$ join -j 1 -t $'\t' a.txt b.txt | cut -d $'\t' -f 2-
a x foo
a x bar
b x foo
b x bar
c x foo
c x bar
After both files are joined, cut is used as an alternative to remove the column of "1"s formerly prepended.
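The same idea also works without intermediate files, prepending the identifier on the fly with process substitution (list1 and list2 are illustrative names for the raw input files; GNU sed assumed for \t):
join -j 1 -t $'\t' <(sed 's/^/1\t/' list1) <(sed 's/^/1\t/' list2) | cut -d $'\t' -f 2-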

How to subtract fields pairwise in bash?

I have a large dataset that looks like this:
5 6 5 6 3 5
2 5 3 7 1 6
4 8 1 8 6 9
1 5 2 9 4 5
For every line, I want to subtract the first field from the second, the third from the fourth, and so on, depending on the number of fields (always even). Then, I want to report those lines for which the difference for all the pairs exceeds a certain limit (say 2). I should also be able to report the next best lines, i.e., lines in which one pairwise comparison fails to meet the limit, but all other pairs meet it.
from the above example, if I set a limit to 2 then, my output file should contain
best lines:
2 5 3 7 1 6 # because (5-2), (7-3), (6-1) are all > 2
4 8 1 8 6 9 # because (8-4), (8-1), (9-6) are all > 2
next best line(s)
1 5 2 9 4 5 # because except (5-4), both (5-1) and (9-2) are > 2
My current approach is to read every line, save each field as a variable, do subtraction.
But I don't know how to proceed further.
Thanks,
Prints "best" lines to the file "best", and prints "next best" lines to the file "nextbest"
awk '
{
    fail_count=0
    for (i=1; i<NF; i+=2){
        if ( ($(i+1) - $i) <= threshold )
            fail_count++
    }
    if (fail_count == 0)
        print $0 > "best"
    else if (fail_count == 1)
        print $0 > "nextbest"
}
' threshold=2 inputfile
Pretty straightforward stuff.
Loop through fields 2 at a time.
If (next field - current field) does not exceed threshold, increment fail_count
If that line's fail_count is zero, that means it belongs to "best" lines.
Else if that line's fail_count is one, it belongs to "next best" lines.
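Run on the sample data from the question, the two output files then contain exactly the lines asked for:
$ cat best
2 5 3 7 1 6
4 8 1 8 6 9
$ cat nextbest
1 5 2 9 4 5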
Here's a bash-way to do it:
#!/bin/bash
threshold=$1
shift
file="$@"
a=($(cat "$file"))
b=$(( ${#a[@]} / $(cat "$file" | wc -l) ))
for ((r=0; r<${#a[@]}/b; r++)); do
    br=$((b*r))
    for ((c=0; c<b; c+=2)); do
        if [[ $(( ${a[br + c+1]} - ${a[br + c]} )) -lt $threshold ]]; then
            break
        fi
        if [[ $((c+2)) == $b ]]; then
            echo ${a[@]:$br:$b}
        fi
    done
done
Usage:
$ ./script.sh 2 yourFile.txt
2 5 3 7 1 6
4 8 1 8 6 9
This output can then easily be redirected:
$ ./script.sh 2 yourFile.txt > output.txt
NOTE: this does not work properly if there are empty lines between the data lines... But I'm sure the above will get you well on your way.
I probably wouldn't do that in bash. Personally, I'd do it in Python, which is generally good for those small quick-and-dirty scripts.
If you have your data in a text file, you can read here about how to get that data into Python as a list of lines. Then you can use a for-loop to process each line:
threshold = 2
results = []
for line in content:
    numbers = [int(n) for n in line.split()]   # Split the line into a list of numbers
    pairs = zip(numbers[::2], numbers[1::2])   # Pair up the numbers two and two
    result = [abs(y - x) for (x, y) in pairs]  # Subtract the first number in each pair from the second
    if all(d > threshold for d in result):     # Keep lines where every pairwise difference exceeds the limit
        results.append(numbers)
Yet another bash version:
First, a check function that returns nothing but a result code:
function getLimit() {
    local pairs=0 count=0 limit=$1 wantdiff=$2
    shift 2
    while [ "$1" ]; do
        [ $(( $2 - $1 )) -ge $limit ] && : $((count++))
        : $((pairs++))
        shift 2
    done
    test $((pairs-count)) -eq $wantdiff
}
Then:
while read line ;do getLimit 2 0 $line && echo $line;done <file
2 5 3 7 1 6
4 8 1 8 6 9
and
while read line ;do getLimit 2 1 $line && echo $line;done <file
1 5 2 9 4 5
If you can use awk
$ cat del1
5 6 5 6 3 5
2 5 3 7 1 6
4 8 1 8 6 9
1 5 2 9 4 5
1 5 2 9 4 5 3 9
$ cat del1 | awk '{
> printf "%s _ ",$0;
> for(i=1; i<=NF; i+=2){
> printf "%d ",($(i+1)-$i)};
> print NF
> }' | awk '{
> upper=0;
> for(i=1; i<=($NF/2); i++){
> if($(NF-i)>threshold) upper++
> };
> printf "%d _ %s\n", upper, $0}' threshold=2 | sort -nr
3 _ 4 8 1 8 6 9 _ 4 7 3 6
3 _ 2 5 3 7 1 6 _ 3 4 5 6
3 _ 1 5 2 9 4 5 3 9 _ 4 7 1 6 8
2 _ 1 5 2 9 4 5 _ 4 7 1 6
0 _ 5 6 5 6 3 5 _ 1 1 2 6
You can process the result further according to your needs. The result is sorted in order of 'goodness'.

Split specific column(s)

I have this kind of records:
1 2 12345
2 4 98231
...
I need to split the third column into sub-columns to get this (separated by single-space for example):
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Can anybody offer me a nice solution in sed, awk, etc.? Thanks!
EDIT: the size of the original third column may vary record by record.
Awk
% echo '1 2 12345
2 4 98231
...' | awk '{
gsub(/./, "& ", $3)
print
}
'
1 2 1 2 3 4 5
2 4 9 8 2 3 1
...
[Tested with GNU Awk 3.1.7]
This takes every character (/./) in the third column ($3) and replaces (gsub()) it with itself followed by a space ("& ") before printing the entire line.
Sed solution:
sed -e 's/\([0-9]\)/\1 /g' -e 's/ \+/ /g'
The first sed expression replaces every digit with the same digit followed by a space. The second expression replaces every block of spaces with a single space, thus handling the double spaces introduced by the previous expression. Note that \+ is a GNU extension, so with non-GNU seds you may need to write the second expression differently.
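A more portable spelling of the same idea might be (it avoids the GNU-only \+ by using a space followed by zero or more spaces):
sed -e 's/\([0-9]\)/\1 /g' -e 's/  */ /g'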
Using awk substr and printf:
[srikanth@myhost ~]$ cat records.log
1 2 12345 6 7
2 4 98231 8 0
[srikanth@myhost ~]$ awk '{ len=length($3); for(i=1; i<=NF; i++) { if(i==3) { for(j = 1; j <= len; j++){ printf substr($3,j,1) " "; } } else { printf $i " "; } } printf("\n"); }' records.log
1 2 1 2 3 4 5 6 7
2 4 9 8 2 3 1 8 0
You can use this for more than three column records as well.
Using perl:
perl -pe 's/([0-9])(?! )/\1 /g' INPUT_FILE
Test:
[jaypal:~/Temp] cat tmp
1 2 12345
2 4 98231
[jaypal:~/Temp] perl -pe 's/([0-9])(?! )/\1 /g' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Using gnu sed:
sed 's/[0-9]/& /3g' INPUT_FILE
Test:
[jaypal:~/Temp] sed 's/[0-9]/& /3g' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Using gnu awk:
gawk '{print $1,$2,gensub(/./,"& ","G", $NF)}' INPUT_FILE
Test:
[jaypal:~/Temp] gawk '{print $1,$2,gensub(/./,"& ","G", $NF)}' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
If you don't care about spaces, this is a succinct version:
sed 's/[0-9]/& /g'
but if you need to squeeze the extra spaces, we just chain another expression:
sed 's/[0-9]/& /g;s/  */ /g'
Note this is compatible with the original sed, and thus will run on any UNIX-like system.
$ awk -F '' '$1=$1' data.txt | tr -s ' '
1 2 1 2 3 4 5
2 4 9 8 2 3 1
An empty field separator makes every character its own field (a GNU awk feature); $1=$1 forces the line to be rebuilt with single-space separators, and tr -s squeezes the leftover runs of spaces.
This might work for you:
echo -e "1 2 12345\n2 4 98231" | sed 's/\B\s*/ /g'
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Most probably GNU sed only.
