Relationship between columns - awk - bash

I have a file with a structure more or less like this:
test:
1 2 3 4 5
2 4 5 0 0
6 4 5 0 0
7 8 9 10 11
8 10 11 0 0
12 10 11 0 0
13 10 11 0 0
14 2 3 4 5
15 10 11 0 0
16 2 3 4 5
17 2 3 4 5
What I want is to print pairs of first-column values: for each line, find the other lines whose 2nd and 3rd columns equal this line's 4th and 5th columns, skipping any match whose 1st column equals this line's 2nd column. It's a bit confusing, but the result should look like this:
1 6
7 12
7 13
7 15
14 6
16 6
17 6
I believe I'm almost there using this code:
cat test | awk 'NR==FNR {{a[$4" "$5]=a[$4" "$5]" "$1};next} $2" "$3 in a {print a[$2" "$3],$1}' - test
But the output that I get is:
1 14 16 17 2
1 14 16 17 6
7 8
7 12
7 13
7 15
Any help?
Thanks!

(elaborating on my comment)
This awk procedure uses the main action block to build a 2-d array representing the input table. The END block then makes pair-wise comparisons for each row against all others. The logic looks for rows where the 4th and 5th entry in one row match the 2nd and 3rd entry of the other but excludes rows if the second entry holds the first entry of the row it's being compared to:
(input data is from file named data.txt)
awk '
{
    for (col = 1; col <= NF; col++) {
        table[NR, col] = $col
    }
}
END {
    # Compare every row i against every row j.
    for (i = 1; i <= NR; i++) {
        for (j = 1; j <= NR; j++) {
            if (table[i,4] == table[j,2] && table[i,5] == table[j,3] && table[i,2] != table[j,1]) {
                print table[i,1] " " table[j,1]
            }
        }
    }
}
' data.txt
Output:
1 6
7 12
7 13
7 15
14 6
16 6
17 6
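For a large file the pairwise loop can get slow, since it compares every row with every other row. One possible alternative, sketched here on the assumption that reading the file twice is acceptable, indexes the rows by their 2nd and 3rd columns so that each row only visits its actual matches. The array names n and first are made up for this illustration, and data.txt is recreated from the sample in the question.

```shell
# Recreate the sample input from the question.
cat > data.txt <<'EOF'
1 2 3 4 5
2 4 5 0 0
6 4 5 0 0
7 8 9 10 11
8 10 11 0 0
12 10 11 0 0
13 10 11 0 0
14 2 3 4 5
15 10 11 0 0
16 2 3 4 5
17 2 3 4 5
EOF

result=$(awk '
NR == FNR {
    # First pass: remember, in input order, the 1st column of each row per (col2, col3) key.
    key = $2 SUBSEP $3
    n[key]++
    first[key, n[key]] = $1
    next
}
{
    # Second pass: look up the rows whose (col2, col3) equals this row (col4, col5).
    key = $4 SUBSEP $5
    for (k = 1; k <= n[key]; k++)
        if (first[key, k] != $2)      # skip the match named in this row 2nd column
            print $1, first[key, k]
}
' data.txt data.txt)

printf '%s\n' "$result"
```

On the sample data this should print the same seven pairs as the nested-loop version, in the same order.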

Related

How to subtract every nth from (n+3)th line in awk?

I have 4-column data files with approximately 100 lines each. I'd like to subtract the 4th-column value of every nth line from that of the (n+3)th line and print the result in a new column ($5). The data in each column follows no regular pattern.
My sample file:
cat input
1 2 3 20
1 2 3 10
1 2 3 5
1 2 3 20
1 2 3 30
1 2 3 40
1 2 3 .
1 2 3 .
1 2 3 . (and so on)
Output should be:
1 2 3 20 0 #(20-20)
1 2 3 10 20 #(30-10)
1 2 3 5 35 #(40-5)
1 2 3 20 ? #(. - 20)
1 2 3 30 ? #(. - 30)
1 2 3 40 ? #(. - 40)
1 2 3 .
1 2 3 .
1 2 3 . (and so on)
How can I do this in awk?
Thank you
For this I think the easiest thing is to read through the file twice. The first time (the NR==FNR block) we save all the 4th column values in an array indexed by the line number. The next block is executed on the second pass and creates a 5th column with the desired calculation (checking first to make sure that we wouldn't go past the end of the file).
$ cat input
1 2 3 20
1 2 3 10
1 2 3 5
1 2 3 20
1 2 3 30
1 2 3 40
$ awk 'NR==FNR{a[NR]=$4; last=NR; next} {$5 = (FNR+3 <= last ? a[FNR+3] - $4 : "")}1' input input
1 2 3 20 0
1 2 3 10 20
1 2 3 5 35
1 2 3 20
1 2 3 30
1 2 3 40
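If the data arrives on a pipe and cannot be read twice, a single pass with a three-line buffer is one way to get the same result. This is only a sketch: the arrays line and val are names invented here, and unlike the two-pass version it leaves the last three lines without a trailing separator.

```shell
# Recreate the six numeric sample lines from the question.
printf '%s\n' '1 2 3 20' '1 2 3 10' '1 2 3 5' '1 2 3 20' '1 2 3 30' '1 2 3 40' > input

result=$(awk '
NR > 3 {
    # The line read three records ago can now be completed and printed.
    slot = NR % 3
    print line[slot], $4 - val[slot]
}
{
    # Store the current line in the slot that was just freed.
    line[NR % 3] = $0
    val[NR % 3]  = $4
}
END {
    # The last three lines have no partner; print them unchanged.
    for (i = NR - 2; i <= NR; i++)
        if (i >= 1) print line[i % 3]
}
' input)

printf '%s\n' "$result"
```

Because only three lines are held at any time, memory use stays constant regardless of file size.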
You can do this using tac + awk + tac:
tac input |
awk '{a[NR]=$4} NR>3 { $5 = (a[NR-3] ~ /^[0-9]+$/ ? a[NR-3] - $4 : "?") } 1' |
tac | column -t
1 2 3 20 0
1 2 3 10 20
1 2 3 5 35
1 2 3 20 ?
1 2 3 30 ?
1 2 3 40 ?
1 2 3 .
1 2 3 .
1 2 3 .

AWK: Add number to the column for specific line

I have a data file of:
1 2 3
1 5 7
2 5 9
11 21 110
6 17 -2
10 2 8
6 4 3
5 1 8
6 1 5
7 3 1
I want to add 1 to the third column, but only on lines 1, 3, 6, 8, 9 and 10, and add 2 to the second column on lines 6 through 9.
I know how to add 2 to the entire second column and 1 to the entire third column using awk:
awk '{print $1, $2+2, $3+1}' data > data2
But how can I modify this code so that it applies only to specific lines of the second and third columns?
Thanks
Best,
awk to the rescue! You can test NR directly in the condition, but that gets tedious for six values. Alternatively, keep the line numbers in a comma-delimited string and match NR against it with a delimiter on each side, so that 1 does not match inside 10.
$ awk 'BEGIN{lines=",1,3,6,8,9,10,"}
match(lines,","NR","){$3++}
NR>=6 && NR<=9{$2+=2}1' nums
1 2 4
1 5 7
2 5 10
11 21 110
6 17 -2
10 4 9
6 6 3
5 3 9
6 3 6
7 3 2
$ cat tst.awk
BEGIN {
for (i=6;i<=9;i++) {
d[2,i] = 2
}
split("1 3 6 8 9 10",t);
for (i in t) {
d[3,t[i]] = 1
}
}
{ $2 += d[2,NR]; $3 += d[3,NR]; print }
$ awk -f tst.awk file
1 2 4
1 5 7
2 5 10
11 21 110
6 17 -2
10 4 9
6 6 3
5 3 9
6 3 6
7 3 2
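The delimiters on both sides of NR in the first answer matter. A small sketch of why, using a made-up list in which line 1 is deliberately absent:

```shell
out=$(awk 'BEGIN {
    lines = "3,6,10"
    # Unanchored: line 1 falsely matches because "1" is a substring of "10".
    print "unanchored:", (index(lines, "1") > 0)
    # Anchored with a comma on each side: line 1 no longer matches.
    print "anchored:", (index("," lines ",", ",1,") > 0)
}')

printf '%s\n' "$out"
```

Here index() stands in for match(); either works because the pattern is a fixed string.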

Insert next number of sequence + default values for missing lines in awk

I have the following table (the real file is much larger - 2gb):
mwe.txt
X 7 1 3
X 8 1 4
X 9 1 6
X 13 2 8
X 14 2 8
X 15 3 8
X 19 6 10
X 20 6 11
Y 13 2 8
Y 14 2 8
Y 15 3 8
Y 19 6 10
Y 20 6 11
Unfortunately if columns 3 and 4 were zero, no lines were printed for this table. I would like these missing lines inserted, with:
"0" in columns 3 and 4
the next sequential number after the previous row for column 2
the name from the previous row for column 1
a header printed, and
line numbers added as an additional column.
I'd like to be able to pipe this, so I'd like to make it as fast as possible. I've made a start with awk, for which I found code for a similar problem:
awk 'BEGIN { prev_chr="";prev_pos=0;} { if($1==prev_chr && prev_pos+1!=int($2)) {for(i=prev_pos+1;i<int($2);++i) {printf("%s\t%d\t0\n",$1,i);}} print; prev_chr=$1;prev_pos=int($2);}' mwe.txt > output.txt
which outputs the following:
output.txt
X 7 1 3
X 8 1 4
X 9 1 6
X 10 0
X 11 0
X 12 0
X 13 2 8
X 14 2 8
X 15 3 8
X 16 0
X 17 0
X 18 0
X 19 6 10
X 20 6 11
Y 13 2 8
Y 14 2 8
Y 15 3 8
Y 16 0
Y 17 0
Y 18 0
Y 19 6 10
Y 20 6 11
As you can see, it does not put zeros into column 4 for the missing lines.
In short, the desired output:
mCoord chr coord samp1 samp2
1 X 7 1 3
2 X 8 1 4
3 X 9 1 6
4 X 10 0 0
5 X 11 0 0
6 X 12 0 0
7 X 13 2 8
8 X 14 2 8
9 X 15 3 8
10 X 16 0 0
11 X 17 0 0
12 X 18 0 0
13 X 19 6 10
14 X 20 6 11
15 Y 13 2 8
16 Y 14 2 8
17 Y 15 3 8
18 Y 16 0 0
19 Y 17 0 0
20 Y 18 0 0
21 Y 19 6 10
22 Y 20 6 11
An awk solution:
awk 'NR>1 && $2!=exp_idx{
    for (i=exp_idx;i<$2;i++){
        printf("%d %s %d 0 0\n",++cont,exp_coord,i)
    }
}
{print ++cont" "$0;exp_coord=$1;exp_idx=$2+1}
' input
Results
1 X 7 1 3
2 X 8 1 4
3 X 9 1 6
4 X 10 0 0
5 X 11 0 0
6 X 12 0 0
7 X 13 2 8
8 X 14 2 8
9 X 15 3 8
10 X 16 0 0
11 X 17 0 0
12 X 18 0 0
13 X 19 6 10
14 X 20 6 11
15 Y 13 2 8
16 Y 14 2 8
17 Y 15 3 8
18 Y 16 0 0
19 Y 17 0 0
20 Y 18 0 0
21 Y 19 6 10
22 Y 20 6 11
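The awk answer above does not print the requested header, and it would also fill a "gap" across a change of name in column 1 if the new block started at a higher coordinate than the old one ended. A possible variant that covers both points is sketched below. It assumes whitespace-separated input sorted by name and coordinate; prev_chr, prev_pos and n are names chosen for this sketch, and the sample file is a shortened, made-up version of mwe.txt.

```shell
# Shortened, hypothetical sample in the same layout as mwe.txt.
cat > mwe.txt <<'EOF'
X 7 1 3
X 8 1 4
X 10 2 8
Y 13 2 8
Y 15 3 8
EOF

result=$(awk '
BEGIN { print "mCoord chr coord samp1 samp2" }
# Fill a gap only while still on the same name as the previous row.
NR > 1 && $1 == prev_chr {
    for (i = prev_pos + 1; i < $2; i++)
        print ++n, prev_chr, i, 0, 0
}
{
    print ++n, $0
    prev_chr = $1
    prev_pos = $2
}
' mwe.txt)

printf '%s\n' "$result"
```

It reads from a file here, but the same program works at the end of a pipe since it makes a single pass and keeps only the previous row in memory.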
Perl solution:
perl -lpae '@p = @F, next if 1 == $.;
print "$p[0] $_ 0 0" for $p[1] + 1 .. $F[1] - 1;
@p = @F
' input > output
It just remembers the previous line's columns in @p.

Mysterious: Elif and Counter Increment Bug

Hi, I am trying to run a for loop with an increment counter that switches into an elif statement. The for loop builds a string of syllables to synthesize with Macintalk. I would like to add a short silence every 20ms, but I can't seem to get it to work. I've tried a bunch of debugging steps, but none seem to work. Can anyone spot the bug that prevents the elif from being accessed?
EDIT
OK, so I followed the suggestion below and used -eq instead of =, but I noticed that the counter only resets once and the conditional statement is never reached a second time. The revised code is posted below:
counter=0;
for k in $indx
do
    counter=$(($counter + 1));
    echo 'increment counter'
    echo $counter
    if [ $k -eq 0 ]
    then
        stream=$stream'#_'${syllarray[k]}
    elif [ $counter -eq 20 ]
    then
        echo Adding Silence after syllable: ${syllarray[k]}
        stream=$stream'_'${syllarray[k]}'[[ slnc 20 ]]'
        counter=0;
        echo 'reset counter'
        echo $counter
    else
        stream=$stream'_'${syllarray[k]}
    fi
done
Sample output:
Synthesize A Syllable Stream with Predetermined Lexicon, Word Order and Phonology
Parameters:
Voice -- Victoria
Rate (words/min) -- 120
Pitch Modulation Interval -- 0
Baseline pitch -- 55
Directory Exists :)
Opening Syllable Transcription for Victoria
0
bIY
1
bUW
2
dAE
3
dOW
4
gOW
5
kUW
6
lAE
7
pAE
8
pIY
9
rOW
10
tIY
11
tUW
Counter Balanced Stimulus Order (Indexed by Syllables in Alphabetical Order)
11 2 9 8 4 6 0 5 10 11 2 9 8 4 6 1 3 7 8 4 6 0 5 10 11 2 9 1 3 7 0 5 10 8 4 6 1 3 7 0 5 10 8 4 6 11 2 9 0 5 10 1 3 7 8 4 6 11 2 9 1 3 7 11 2 9 0 5 10 1 3 7 8 4 6 0 5 10 11 2 9 8 4 6 1 3 7 0 5 10 8 4 6 11 2 9 0 5 10 8 4 6 11 2 9 0 5 10 1 3 7 8 4 6 1 3 7 0 5 10 11 2 9 1 3 7 8 4 6 0 5 10 1 3 7 11 2 9 1 3 7 11 2 9 1 3 7 0 5 10 8 4 6 1 3 7 0 5 10 11 2 9 0 5 10 11 2 9 8 4 6 11 2 9 1 3 7 8 4 6 0 5 10 1 3 7 11 2 9 8 4 6 0 5 10 8 4 6 1 3 7 11 2 9 0 5 10 1 3 7 8 4 6 11 2 9 8 4 6 1 3 7 11 2 9 0 5 10 8 4 6 11 2 9 0 5 10 8 4 6 11 2 9 1 3 7 0 5 10 1 3 7 8 4 6 0 5 10 1 3 7 0 5 10 11 2 9 1 3 7 11 2 9 8 4 6 0 5 10 11 2 9 8 4 6 1 3 7
Creating counterbalanced stimulus stream string with proper Macintalk formatting
increment counter
1
increment counter
2
increment counter
3
increment counter
4
increment counter
5
increment counter
6
increment counter
7
increment counter
8
increment counter
9
increment counter
10
increment counter
11
increment counter
12
increment counter
13
increment counter
14
increment counter
15
increment counter
16
increment counter
17
increment counter
18
increment counter
19
increment counter
20
Adding Silence after syllable: gOW
reset counter
0
increment counter
1
..
[truncated for clarity]
..
increment counter
268
Printing Stream to Screen
_tUW_dAE_rOW_pIY_gOW_lAE#_bIY_kUW_tIY_tUW_dAE_rOW_pIY_gOW_lAE_bUW_dOW_pAE_pIY_gOW[[ slnc 20 ]]_lAE#_bIY_kUW_tIY_tUW_dAE_rOW_bUW_dOW_pAE#_bIY_kUW_tIY_pIY_gOW_lAE_bUW_dOW_pAE#_bIY_kUW_tIY_pIY_gOW_lAE_tUW_dAE_rOW#_bIY_kUW_tIY_bUW_dOW_pAE_pIY_gOW_lAE_tUW_dAE_rOW_bUW_dOW_pAE_tUW_dAE_rOW#_bIY_kUW_tIY_bUW_dOW_pAE_pIY_gOW_lAE#_bIY_kUW_tIY_tUW_dAE_rOW_pIY_gOW_lAE_bUW_dOW_pAE#_bIY_kUW_tIY_pIY_gOW_lAE_tUW_dAE_rOW#_bIY_kUW_tIY_pIY_gOW_lAE_tUW_dAE_rOW#_bIY_kUW_tIY_bUW_dOW_pAE_pIY_gOW_lAE_bUW_dOW_pAE#_bIY_kUW_tIY_tUW_dAE_rOW_bUW_dOW_pAE_pIY_gOW_lAE#_bIY_kUW_tIY_bUW_dOW_pAE_tUW_dAE_rOW_bUW_dOW_pAE_tUW_dAE_rOW_bUW_dOW_pAE#_bIY_kUW_tIY_pIY_gOW_lAE_bUW_dOW_pAE#_bIY_kUW_tIY_tUW_dAE_rOW#_bIY_kUW_tIY_tUW_dAE_rOW_pIY_gOW_lAE_tUW_dAE_rOW_bUW_dOW_pAE_pIY_gOW_lAE#_bIY_kUW_tIY_bUW_dOW_pAE_tUW_dAE_rOW_pIY_gOW_lAE#_bIY_kUW_tIY_pIY_gOW_lAE_bUW_dOW_pAE_tUW_dAE_rOW#_bIY_kUW_tIY_bUW_dOW_pAE_pIY_gOW_lAE_tUW_dAE_rOW_pIY_gOW_lAE_bUW_dOW_pAE_tUW_dAE_rOW#_bIY_kUW_tIY_pIY_gOW_lAE_tUW_dAE_rOW#_bIY_kUW_tIY_pIY_gOW_lAE_tUW_dAE_rOW_bUW_dOW_pAE#_bIY_kUW_tIY_bUW_dOW_pAE_pIY_gOW_lAE#_bIY_kUW_tIY_bUW_dOW_pAE#_bIY_kUW_tIY_tUW_dAE_rOW_bUW_dOW_pAE_tUW_dAE_rOW_pIY_gOW_lAE#_bIY_kUW_tIY_tUW_dAE_rOW_pIY_gOW_lAE_bUW_dOW_pAE
Saving Synthesized Stream
Writing to: ./synthesis/stream-Victoria/stream.flac
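Judging from the posted output, one likely explanation (an inference from the log, not something confirmed in the thread) is that the silence is only added in the elif branch, and that branch is skipped whenever k is 0. The 40th index in the stimulus order is 0, so when the counter reaches 20 for the second time the first branch wins, the counter moves on to 21, and it never equals 20 again. That would be consistent with the counter climbing to 268. A minimal sketch of a loop that checks the counter separately from the syllable prefix follows. The index list and the period of 3 are made-up stand-ins, and the text s$k is used in place of ${syllarray[k]} so the sketch runs without the original data.

```shell
# Hypothetical stand-ins: a short index list and a period of 3 instead of 20.
indx="11 2 0 8 0 6"
every=3

stream=""
counter=0
for k in $indx
do
    counter=$((counter + 1))
    # Choose the prefix independently of the counter check.
    if [ "$k" -eq 0 ]
    then
        stream="$stream#_s$k"
    else
        stream="${stream}_s$k"
    fi
    # Separate test, so a syllable with k equal to 0 can no longer mask the reset.
    if [ "$counter" -eq "$every" ]
    then
        stream="$stream[[ slnc 20 ]]"
        counter=0
    fi
done

printf '%s\n' "$stream"
```

With these inputs the third syllable has k equal to 0 and still receives the silence marker, which is the case the if/elif chain misses.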

Linux: GNU sort does not sort seq

Title sums it up.
$ echo `seq 0 10` `seq 5 15` | sort -n
0 1 2 3 4 5 6 7 8 9 10 5 6 7 8 9 10 11 12 13 14 15
Why doesn't this work?
Even if I don't use seq:
echo '0 1 2 3 4 5 6 7 8 9 10 5 6 7 8 9 10 11 12 13 14 15' | sort -n
0 1 2 3 4 5 6 7 8 9 10 5 6 7 8 9 10 11 12 13 14 15
And even ditching echo directly:
$ echo '0 1 2 3 4 5 6 7 8 9 10 5 6 7 8 9 10 11 12 13 14 15' > numbers
$ sort -n numbers
0 1 2 3 4 5 6 7 8 9 10 5 6 7 8 9 10 11 12 13 14 15
sort(1) sorts lines. You have to parse whitespace delimited data yourself:
echo `seq 0 10` `seq 5 15` | tr " " "\n" | sort -n
Because you need newlines for sort:
$ echo `seq 0 10` `seq 5 15` | tr " " "\\n" | sort -n | tr "\\n" " "; echo ""
0 1 2 3 4 5 5 6 6 7 7 8 8 9 9 10 10 11 12 13 14 15
$
You have a single line of input. There is nothing to sort.
The command as you typed it results in the sequence of numbers being all passed to sort in one line. That's not what you want. Just pass the output of seq directly to sort:
(seq 0 10; seq 5 15) | sort -n
By the way, as you just found out, the construct
echo `command`
doesn't usually do what you expect, and it is redundant even when it does: it tells the shell to capture the output of command and pass it to echo, which just prints it again, with the line breaks collapsed into spaces along the way. Just let the output of the command go through directly (unless you really mean to have it processed by echo, maybe to expand escape sequences, or to collapse everything to one line).
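A quick way to see the difference is to count the lines each form produces. This is only a small sketch; the variable names are arbitrary, and awk is used for counting simply to get a bare number.

```shell
# seq emits one number per line, which is what sort needs.
direct=$(seq 1 3 | awk 'END { print NR }')

# Unquoted command substitution is word-split, and echo joins the words with spaces.
collapsed=$(echo $(seq 1 3) | awk 'END { print NR }')

# Sorting line-oriented input, then joining onto one line only for display.
sorted=$( (seq 0 3; seq 2 5) | sort -n | paste -s -d ' ' - )

echo "direct: $direct lines, collapsed: $collapsed line"
echo "sorted: $sorted"
```

The first count is 3 and the second is 1, which is why sort -n had nothing to reorder in the original command.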
