Awk to multiply consecutive lines - bash

I have a file with a single column with N numbers:
a
b
c
d
e
And I would like to use awk to multiply the first with the second, the second with the third, and so on, and then add all these up, i.e.:
(a*b)+(b*c)+(c*d)+...
Any suggestions?

I would use the following command:
awk 'NR>1{t+=l*$0}{l=$0}END{print t}' input.txt
Having this input:
1
2
3
4
5
it will output:
40
which equals 1*2+2*3+3*4+4*5
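For comparison, the same running sum can be sketched in pure bash (a sketch only; bash arithmetic is integer-only, so this assumes whole numbers):

```shell
total=0
prev=""
while read -r n; do
  # Multiply each value by the previous one and accumulate.
  [ -n "$prev" ] && total=$(( total + prev * n ))
  prev=$n
done <<'EOF'
1
2
3
4
5
EOF
echo "$total"   # 40
```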

Related

How to delete the last 3 lines of every 10 lines in a 100 line text file?

Suppose my file is
a
b
c
d
e
f
g
h
i
j
...
I want my output to be
a
b
c
d
e
f
g
My file is 100 lines long, so the code has to essentially remove the last 3 lines in every 10 line chunk. I tried using the code below, but I couldn't get it to delete a range.
awk 'NR % 7 !=0' file.txt
Awk's NR variable contains the line number, starting with 1; NR - 1 is the line number starting with 0.
The remainder of NR - 1 divided by 10 is 0..6 for the first 7 lines of each 10-line block and 7..9 for the last three.
The command is as simple as:
awk '(NR - 1) % 10 < 7' file.txt
awk '{if(NR % 10 < 8 && NR % 10 != 0) {print}}' file.txt
You need to mod 10 to skip some lines every 10, not mod 7.
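As a quick sanity check (assuming seq is available), running the filter over the numbers 1..20 keeps the first 7 lines of each 10-line block:

```shell
# Lines 1-7 and 11-17 survive; 8-10 and 18-20 are dropped.
seq 20 | awk '(NR - 1) % 10 < 7'
```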

Delete matching lines in two tab delimited files

I have 2 tab delimited files
A 2
A 5
B 4
B 5
C 10
and
A 2
A 5
B 5
I want to delete the lines in file1 that are in file2 so that the output is:
B 4
C 10
I have tried:
awk 'NR==FNR{c[$1$2]++;next};!c[$1$2] > 0' file2 file1 > file3
but it deletes more lines than expected.
1026997259 file1
1787919 file2
1023608359 file3
How can I modify this code, so that:
I have 2 tab delimited files
A 2 3
A 5 4
B 4 5
B 5 5
C 10 12
and
A 2 5
A 5 4
B 5 3
F 6 7
Based only on the 1st and 2nd columns, I want to grab the lines in file1 that are also in file2 so that the output is:
B 5 5
C 10 12
Why not use the grep command?
grep -vf file2 file1
Think about it: if you concatenate the fields "ab c" and "a cb", they both become "abc", so $1$2 cannot tell them apart. Use SUBSEP as intended ($1,$2) and change !c[$1$2] > 0 to !(($1,$2) in c). Also consider whether !c[$1$2] > 0 means !(c[$1$2] > 0) or (!c[$1$2]) > 0. I'd never write the former code so idk for sure; I'd always write it with parens as I intended it to be parsed. So do:
awk 'NR==FNR{c[$1,$2];next} !(($1,$2) in c)' file2 file1
Or just use $0 instead of $1,$2:
awk 'NR==FNR{c[$0];next} !($0 in c)' file2 file1
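A self-contained run with the sample data (temporary file names are just for illustration):

```shell
f1=$(mktemp); f2=$(mktemp)
printf 'A\t2\nA\t5\nB\t4\nB\t5\nC\t10\n' > "$f1"   # file1
printf 'A\t2\nA\t5\nB\t5\n' > "$f2"                # file2
# Print the lines of file1 that do not appear verbatim in file2.
awk 'NR==FNR{c[$0];next} !($0 in c)' "$f2" "$f1"
# B	4
# C	10
rm -f "$f1" "$f2"
```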
If the matching lines in the two files are identical, and the two files are sorted in the same order, then comm(1) can do the trick:
comm -23 file1 file2
It prints out lines that are only in the first file (unless -1 is given), lines that are only in the second file (unless -2), and lines that are in both files (unless -3). If you leave more than one option enabled then they will be printed in multiple (tab-separated) columns.
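For example, using bash process substitution to sort both inputs on the fly (with the sample data from this question):

```shell
# -23 suppresses lines unique to file2 and lines common to both,
# leaving only the lines unique to file1.
comm -23 <(printf 'A 2\nA 5\nB 4\nB 5\nC 10\n' | sort) \
         <(printf 'A 2\nA 5\nB 5\n' | sort)
# B 4
# C 10
```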

count patterns in a csv file from another csv file in bash

I have two csv files
File A
ID
1
2
3
File B
ID
1
1
1
1
3
2
3
What I want to do is count how many times an ID in File A shows up in File B, and save the result in a new file C (also in csv format). For example, 1 in File A shows up 4 times in File B. So in the new file C, I should have something like
File C
ID,Count
1,4
2,1
3,2
Originally I was thinking of using "grep -f", but it seems like it only works with .txt format. Unfortunately, files A and B are both in csv format. So now I am thinking maybe I could use a for loop to get each ID from File A individually and use grep -c to count each one of them. Any idea will be helpful.
Thanks in advance!
You can use this awk command:
awk -v OFS=, 'FNR==1{next} FNR==NR{a[$1]; next} $1 in a{freq[$1]++}
END{print "ID", "Count"; for (i in freq) print i, freq[i]}' fileA fileB
ID,Count
1,4
2,1
3,2
You could use join, sort, uniq and process substitution <(command) creatively:
$ join -2 2 <(sort A) <(sort B | uniq -c) | sort -n > C
$ cat C
ID 1
1 4
2 1
3 2
And if you really really want the header to be ID Count, before writing to file C you could replace that 1 with Count using sed by adding:
... | sed 's/\(ID \)1/\1Count/' > C
to get
ID Count
1 4
2 1
3 2
and if you really really want commas as separators instead of spaces, replace the spaces with commas using tr by adding:
... | tr \ , > C
to get
ID,Count
1,4
2,1
3,2
You could of course ditch the tr and use the sed like this instead:
... | sed 's/\(ID \)1/\1Count/;s/ /,/' > C
And the output would be like above.
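Putting the pieces together (a sketch, assuming bash for the process substitution; the sample inputs are recreated first under the same names A, B and C as above):

```shell
# Recreate the sample inputs A and B.
printf 'ID\n1\n2\n3\n' > A
printf 'ID\n1\n1\n1\n1\n3\n2\n3\n' > B
# Count occurrences, join on the ID, fix the header, switch to commas.
join -2 2 <(sort A) <(sort B | uniq -c) | sort -n \
  | sed 's/\(ID \)1/\1Count/;s/ /,/' > C
cat C
# ID,Count
# 1,4
# 2,1
# 3,2
```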

AWK, filter lines which have two columns swapped

I have a file with distances, these are 'A to B' and 'B to A'. I would like to filter out the 'B to A' line(s). I have tried many options, but am confused a bit about the awk syntax.
One thing I tried is a variant on this code
awk '!x[$1,$3]++'
I wanted to find a way to print the line and store the 'A' and 'B' in an array with the columns reversed, using
awk '{if (a[$1,$3]=!0) print $0} a[$3,$1]++'
or
awk 'a[$1,$3]==0 a[$3,$1]++'
The first one duplicates every line except the first one. The second prints no lines, is there maybe a delimiter needed between the arrays?
awk to the rescue
change field nums according to your data
$ cat dist
1 5
2 4
3 3
4 2
5 1
$ awk '($2,$1) in a{next} {a[$1,$2]; print}' dist
1 5
2 4
3 3
this can also be written as
$ awk '($2,$1) in a{next} ++a[$1,$2]' dist
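With three-column 'A distance B' data (endpoints in fields 1 and 3, as the question's $1,$3 suggests; the sample values here are made up), the same idea becomes:

```shell
# Drop each line whose endpoints are the reverse of a pair already seen.
printf 'x 7 y\ny 7 x\nx 3 z\nz 3 x\n' |
awk '($3,$1) in a{next} {a[$1,$3]; print}'
# x 7 y
# x 3 z
```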

Count how many occurrences in a line are greater than or equal to a defined value

I have a file (F1) with N=10000 lines, each line containing M=20000 numbers, and another file (F2) with N=10000 lines and only 1 column. How can I count the occurrences in line i of file F1 that are greater than or equal to the number found at line i of file F2? I tried using a bash loop with awk / sed but my output is empty.
Edit >
For now I've only succeeded in printing the number of occurrences that are higher than a defined value. Here is an example with a file of 3 lines and a defined value of 15 (sorry, it's very dirty code):
for i in {1..3};do sed -n "$i"p tmp.txt | sed 's/\t/\n/g' | awk '{if($1 > 15){print $1}}' | wc -l; done;
Thanks in advance,
awk 'FNR==NR{a[FNR]=$1;next}
{count=0;for(i=1;i<=NF;i++)
{if($i >= a[FNR])
{count++}
};
print count
}' file2 file1
While processing file2 (when FNR is equal to NR), store each value in array a, indexed by the current line number.
For each line of file1, initialize count to 0,
loop through the fields, and increment the counter whenever a field's value is greater than or equal to a[FNR], the threshold for the current line.
Then print the count value.
$ cat file1
1 3 5 7 3 6
2 5 6 8 7 7
4 6 7 8 9 4
$ cat file2
6
3
1
$ awk -f file.awk file2 file1
2
5
6
You could do it in a single awk command:
awk 'NR==FNR{a[FNR]=$1;next}{c=0;for(i=1;i<=NF;i++)c+=($i>=a[FNR]);print c}' file2 file1
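A quick end-to-end check with the sample data from above (temporary file names are just for illustration):

```shell
f1=$(mktemp); f2=$(mktemp)
printf '1 3 5 7 3 6\n2 5 6 8 7 7\n4 6 7 8 9 4\n' > "$f1"   # file1
printf '6\n3\n1\n' > "$f2"                                  # file2
# Per line of file1, count the fields >= the matching threshold in file2.
awk 'NR==FNR{a[FNR]=$1;next}{c=0;for(i=1;i<=NF;i++)c+=($i>=a[FNR]);print c}' "$f2" "$f1"
# 2
# 5
# 6
rm -f "$f1" "$f2"
```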
