Combine if and NR in awk - bash

I've been trying to figure out this silly thing with awk for the last few hours, but no luck so far.
I understand how to print every second line, for example:
awk 'NR%2' file
and I also understand how to print the lines of a column-based file when one column is within a specific range, for example:
awk '{if ($1 > 'yourvalue') print}' file
What I don't quite get is how to combine the two.
In practice, if I have a file organized as:
1 3 6 8
2 8 4 5
3 9 8 7
4 7 3 5
5 7 3 6
6 2 4 6
7 1 4 7
8 3 2 1
9 7 5 3
10 4 5 6
11 8 2 5
how can I get, for example:
1 3 6 8
3 9 8 7
5 7 3 6
7 1 4 7
8 3 2 1
9 7 5 3
10 4 5 6
11 8 2 5
That is: print every other line while column 1 is smaller than 7, and print all the remaining lines normally.
I tried to combine everything into a single line, but I always get errors.

You can negate the second condition and combine the two with OR (||):
awk 'NR%2 || $1>=7' file
1 3 6 8
3 9 8 7
5 7 3 6
7 1 4 7
8 3 2 1
9 7 5 3
10 4 5 6
11 8 2 5
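A minimal sketch for checking the one-liner, assuming bash and any POSIX awk. It wraps the filter in a function whose cut-off is a parameter (the names `thin_below` and `sample_data` are hypothetical) and feeds it the question's data. `NR % 2` is non-zero on odd-numbered lines, and `$1 >= cutoff` is true from the row where column 1 reaches the cut-off onward, so a line is printed when either holds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep odd-numbered lines while column 1 is below the
# cut-off, and keep every line once column 1 reaches it.
thin_below() {
    local cutoff=$1
    shift
    # "cutoff + 0" forces a numeric comparison against column 1.
    awk -v cutoff="$cutoff" 'NR % 2 || $1 >= cutoff + 0' "$@"
}

# The data from the question.
sample_data() {
    cat <<'EOF'
1 3 6 8
2 8 4 5
3 9 8 7
4 7 3 5
5 7 3 6
6 2 4 6
7 1 4 7
8 3 2 1
9 7 5 3
10 4 5 6
11 8 2 5
EOF
}

sample_data | thin_below 7
```

With no file arguments the function reads standard input; `thin_below 7 file` works the same way on a file.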

You can combine conditions using && (and) and || (or).
You can use parentheses for nesting conditions.
For example:
awk 'cond1 && (cond2 || cond3)' file
This:
awk '{if ($1 > 7) print}' file
... is equivalent to this:
awk '$1 > 7 { print }' file
... because you can write conditions outside of the {...} to use as filters.
... which is equivalent to:
awk '$1 > 7' file
... because the default action is to print.
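To see that the three spellings are interchangeable, and how parentheses group a combined condition, here is a small sketch assuming bash and a POSIX awk; the function names and the sample numbers are made up for illustration.

```shell
#!/usr/bin/env bash
# Three equivalent ways of printing lines whose first field exceeds 7.
with_if()      { awk '{ if ($1 > 7) print }'; }
with_pattern() { awk '$1 > 7 { print }'; }
with_default() { awk '$1 > 7'; }

# cond1 && (cond2 || cond3): odd-numbered lines whose first field is
# below 4 or above 10.
nested() { awk 'NR % 2 && ($1 < 4 || $1 > 10)'; }

numbers() { printf '%s\n' 2 8 12 3 15 1; }

numbers | with_if        # prints 8, 12, 15 (one per line)
numbers | with_pattern   # same
numbers | with_default   # same
numbers | nested         # prints 2, 12, 15
```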

Related

extract specific columns from dataset using AWK

I am trying to apply a simple awk script to a dataset file.
The file has 150 columns; I only need columns 20 through 30.
Below is the script I used to extract fields 20 through 30:
BEGIN{}
{
for(f=20;f<=30;f++){
print $f;
}
}
I don't know why each field value ends up on its own line.
For example, with this sample dataset:
1 2 3 4 5 6 7
2 2 3 4 5 6 7
3 3 3 4 5 6 7
4 4 4 4 5 6 7
5 5 5 5 5 6 7
6 6 6 6 6 6 7
7 7 7 7 7 7 7
I get this output:
1
2
3
4
5
6
7
2
2
3
4
5
6
7
... and so on
Here is another way to do the same thing:
awk -v f=20 -v t=30 '{for(i=f;i<=t;i++) \
printf("%s%s",$i,(i==t)?"\n":OFS)}' file
Notes
f and t are the starting and the ending columns respectively.
We used the ternary operator to control the field separator between the needed columns.
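As a quick check of the `f`/`t` loop and the ternary separator, here is the same program wrapped in a function (the name `cols` is hypothetical), assuming bash and a POSIX awk. It runs on the first two rows of the sample dataset with fields 3 through 5 instead of 20 through 30, since the sample only has 7 columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print fields $1..$2 of every input line.
cols() {
    awk -v f="$1" -v t="$2" '{
        # newline after the last field, OFS (a space) after the others
        for (i = f + 0; i <= t + 0; i++)
            printf("%s%s", $i, (i == t + 0) ? "\n" : OFS)
    }'
}

printf '%s\n' '1 2 3 4 5 6 7' '2 2 3 4 5 6 7' | cols 3 5
```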
Edit
If you need columns 20 through 30 plus the last column, the following suffices:
awk -v f=20 -v t=30 '{for(i=f;i<=t;i++) \
printf("%s%s",$i,(i==t)?OFS""$NF"\n":OFS)}' file
Solution
BEGIN { FS = " " }
{
    for (f = 20; f <= 30; f++) {
        printf("%s ", $f)
    }
    print ""
}
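Why the original script prints one value per line: every `print` statement ends its output with ORS, the output record separator, which is a newline by default, so calling `print $f` once per field yields one field per line. The printf-based fixes avoid that, although the "Solution" variant leaves a trailing space on every line. A minimal sketch of one more option, assuming bash and a POSIX awk, builds the row in a string and prints it once (`cols_joined` is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: join fields $1..$2 with OFS and print the row once,
# so ORS (a newline) is written exactly once per input line.
cols_joined() {
    awk -v f="$1" -v t="$2" '{
        line = $(f + 0)
        for (i = f + 1; i <= t + 0; i++)
            line = line OFS $i
        print line
    }'
}

printf '%s\n' '1 2 3 4 5 6 7' '2 2 3 4 5 6 7' | cols_joined 2 4
```

For the real file, `cols_joined 20 30 < file` would give columns 20 through 30 with no trailing space.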

Way to grab a line based on lines value

I have an example like so:
1 2 3 4 5 6 7 8 9 10 2.2
1 3 2 3 2 3 2 3 2 33 1.1
11 values per line, all single spaced.
There's the occasional random character thrown in, but that's it. I'm trying to find a way to copy the lines in which the last value is less than some user-supplied or predetermined value. Something akin to a 'grep if $last <= 2', but I can't think of one, nor can I find one.
Thanks for any help!
Simple awk use case:
awk -v val=2 '$NF < val' file
Output:
1 3 2 3 2 3 2 3 2 33 1.1
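One caveat, given the occasional random character the question mentions: awk compares two values numerically only when both look like numbers. If the last field is something like `!`, the comparison falls back to string order, and since `!` sorts before `2`, the plain one-liner would print that line. A minimal sketch of a guarded version, assuming bash and a POSIX awk; the function name `below_last` and the decimal-number regex are assumptions, so widen the regex if you need signs or exponents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines whose last field is a plain decimal
# number (e.g. 7 or 1.1) strictly below the threshold given as $1.
below_last() {
    awk -v val="$1" '$NF ~ /^[0-9]+([.][0-9]+)?$/ && $NF + 0 < val + 0'
}

printf '%s\n' '1 2 3 4 5 6 7 8 9 10 2.2' \
              '1 3 2 3 2 3 2 3 2 33 1.1' \
              '1 2 3 !' | below_last 2
```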

find non-matching lines of two files bash

I'm still new to bash. I've found questions similar to mine, but I still can't solve my problem. I have two files with 2 columns each, separated by a space.
file 1:
1 AGCATTTTTCAAACGAAAGATTTACTACCGATGTGT
2 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA
3 GATCGAACCGGCTGCCTACTGCGTGTAAAGCCGCCC
4 CCGACACAGAGAACATTAGAATACTCAGAGCCATNN
5 TAAGCCTGAGCCTAAACCTAAGCCTAAACATAAGAA
6 AGCAGAGAAGAGATGAGTTGTCGAGTGAGGCGTAAG
7 AACGTTGAAAAATTATCCCGTCAACAGTCTCCAGAA
8 GCCAGAGAGTAAAATATTGGGTGAAGCCAGAGAGTA
9 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA
file 2:
1 AGCATTTTTCAAACGAAAGATTTACTACCGATGTGT
2 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA
3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
4 CCGACACAGAGAACATTAGAATACTCAGAGCCATNN
5 TAAGCCTGAGCCTAAACCTAAGCCTAAACATAAGAA
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
8 GCCAGAGAGTAAAATATTGGGTGAAGCCAGAGAGTA
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
I'd like to compare only the second columns of each file, line by line, and output a third file with only the non-matching lines.
output:
3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
You can use awk:
awk 'NR==FNR{a[$2];next} !($2 in a)' file1 file2
3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
Explanation:
NR == FNR { # While processing the first file
a[$2] # just push the second field in an array
next # move to next record of first file
}
!($2 in a) # print lines of file2 whose second field is not in array a
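Note that this answer tests set membership: a line of file2 is printed when its second field appears nowhere in file1, not when it differs from the same line of file1. For this data the two readings agree, but if "line by line" is meant strictly, a variant can key the array on FNR (the line number within the current file) instead. A minimal sketch, assuming bash, a POSIX awk and a non-empty file1 (with an empty file1, `NR == FNR` would also hold while reading file2); `diff_by_line` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines of file2 whose second field differs
# from the second field on the same line number of file1.
diff_by_line() {
    awk 'NR == FNR { a[FNR] = $2; next } $2 != a[FNR]' "$1" "$2"
}

# Small demo files: the same sequences, but in swapped order.
tmp1=$(mktemp) tmp2=$(mktemp)
printf '%s\n' '1 AGCA' '2 TGCT' > "$tmp1"
printf '%s\n' '1 TGCT' '2 AGCA' > "$tmp2"

diff_by_line "$tmp1" "$tmp2"                           # prints both lines of file2
awk 'NR==FNR{a[$2];next} !($2 in a)' "$tmp1" "$tmp2"   # prints nothing
rm -f "$tmp1" "$tmp2"
```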
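Alternatively, with grep (-f reads the patterns from file1, one per line, and -v prints the lines of file2 that match none of them; each pattern is a regular expression that may match anywhere in a line, so add -F -x if you want exact whole-line matches):
grep -vf file1 file2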
Output:
3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
You could also use diff, which prints the differences between two files:
/test>diff file1 file2
3c3
< 3 GATCGAACCGGCTGCCTACTGCGTGTAAAGCCGCCC
---
> 3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6,7c6,7
< 6 AGCAGAGAAGAGATGAGTTGTCGAGTGAGGCGTAAG
< 7 AACGTTGAAAAATTATCCCGTCAACAGTCTCCAGAA
---
> 6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9c9
< 9 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA
---
> 9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
Grepping for just differences from the second file:
/test>diff file1 file2 | grep ">"
> 3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
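If you want those lines without diff's leading `> ` marker, sed can keep only the added lines and strip the marker in one step. A minimal sketch, assuming bash and POSIX diff and sed; the function name `added_lines` and the demo files are made up. Like the grep above, this relies on how diff aligns the two files, so it suits files whose lines correspond one to one.

```shell
#!/usr/bin/env bash
# Print only the lines diff marks with "> " (lines from the second file),
# with the marker removed. -n suppresses default output; the p flag prints
# a line only when the substitution succeeded.
added_lines() {
    diff "$1" "$2" | sed -n 's/^> //p'
}

tmp1=$(mktemp) tmp2=$(mktemp)
printf '%s\n' '1 AGCA' '2 TGCT' '3 GATC' > "$tmp1"
printf '%s\n' '1 AGCA' '2 NNNN' '3 GATC' > "$tmp2"
added_lines "$tmp1" "$tmp2"
rm -f "$tmp1" "$tmp2"
```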

Paste every two lines in a file together as one line BASH

My colleague has given me a file in which half of the lines contain 8 columns of info and the other half contain the 9th column. Each 9th-column line always directly follows its 8-column line, e.g.
1 2 3 4 5 6 7 8
1.1
2 3 4 5 6 7 8 9
1.2
...
a b c d e f g h
abcd
I know how to paste every two lines together and print them out in Python, but is it possible to do that even more conveniently in BASH?
Thanks guys!
You could use sed or awk, as other answers have mentioned. Those answers are all good.
You could also do this easily in pure shell.
$ while read line1; do read line2; echo "$line1 $line2"; done < input.txt
1 2 3 4 5 6 7 8 1.1
2 3 4 5 6 7 8 9 1.2
Note that whitespace is not preserved.
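If whitespace does matter, a variant of the loop, assuming bash, preserves it: `IFS=` stops `read` from trimming leading and trailing blanks, `-r` keeps backslashes literal, and `printf '%s'` writes the text exactly as read. The function name `pair_lines` is hypothetical; because both reads must succeed, a final unpaired line is dropped in this sketch.

```shell
#!/usr/bin/env bash
pair_lines() {
    # Read two lines per iteration without trimming or backslash processing.
    while IFS= read -r line1 && IFS= read -r line2; do
        printf '%s %s\n' "$line1" "$line2"
    done
}

printf '%s\n' '1 2 3 4 5 6 7 8' '1.1' '2 3 4 5 6 7 8 9' '1.2' | pair_lines
```

Usage on the real file: `pair_lines < input.txt`.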
There's another tool available on most unix-like systems called paste:
$ paste - - < input.txt
1 2 3 4 5 6 7 8 1.1
2 3 4 5 6 7 8 9 1.2
By default paste joins the columns with a tab. If a line of input.txt ends with a trailing space, the tab can appear to push the second column out to the next tab stop, leaving a wide gap. Use -d to choose a different delimiter, e.g. paste -d ' ' - - < input.txt; see paste's man page for the other options.
Another awk approach:
awk '{f=$0;getline;print f,$0}' file
1 2 3 4 5 6 7 8 1.1
2 3 4 5 6 7 8 9 1.2
And, just for fun, a GNU awk version:
awk -v RS="[0-9][.][0-9]" '{$1=$1;print $0,RT}' file
1 2 3 4 5 6 7 8 1.1
2 3 4 5 6 7 8 9 1.2
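Here the record separator (RS) is set to a regular expression matching the value on every second line (a digit, a dot, a digit), so this assumes that pattern never appears in the 8-column lines. $1=$1 rebuilds the record, turning the embedded newline into a single space.
GNU awk stores the text that actually matched RS in RT, which is printed after the rebuilt record.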
try:
awk '{printf "%s%s",$0,(NR%2?FS:RS)}' file
or:
awk 'NR%2{printf "%s ",$0;next}7' file
test:
kent$ echo "1 2 3 4 5 6 7 8
1.1
2 3 4 5 6 7 8 9
1.2"|awk '{printf "%s%s",$0,(NR%2?FS:RS)}'
1 2 3 4 5 6 7 8 1.1
2 3 4 5 6 7 8 9 1.2
kent$ echo "1 2 3 4 5 6 7 8
1.1
2 3 4 5 6 7 8 9
1.2"|awk 'NR%2{printf "%s ",$0;next}7'
1 2 3 4 5 6 7 8 1.1
2 3 4 5 6 7 8 9 1.2
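In the second one-liner, the trailing `7` is a pattern with no action: any non-zero number is true, and a pattern without an action runs the default action, `print`. A small sketch, assuming bash and a POSIX awk, confirming it behaves like an explicit `{ print }` (the function names are hypothetical):

```shell
#!/usr/bin/env bash
# Odd lines: print with a trailing space and no newline, then skip the rest.
# Even lines: the pattern "7" is always true, so the default print runs.
join_with_7()     { awk 'NR % 2 { printf "%s ", $0; next } 7'; }
join_with_print() { awk 'NR % 2 { printf "%s ", $0; next } { print }'; }

pairs() { printf '%s\n' '1 2 3 4 5 6 7 8' '1.1' '2 3 4 5 6 7 8 9' '1.2'; }

pairs | join_with_7
pairs | join_with_print
```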
You can use sed:
sed 'N;s/\n/ /' file
or awk:
awk 'NF==1{print; next}{printf "%s ",$0}' file

Combine multiple columns of different lengths into one column in BASH

I need to combine columns of different lengths into one column using BASH. Here is an example input file:
1 2   3 4   5 6   7 8
1 2   3 4   5 6   7 8
1 2   3 4   5 6   7 8
1 2         5 6   7 8
1 2               7 8
And my desired output:
1
1
1
1
1
3
3
3
5
5
5
5
7
7
7
7
7
The input data is pairs of columns as shown. Each pair is separated from another by a fixed number of spaces. Values within a pair of columns are separated by one space. Thanks in advance!
Using GNU awk for fixed width field handling:
$ cat file
1 2   3 4   5 6   7 8
1 2   3 4   5 6   7 8
1 2   3 4   5 6   7 8
1 2         5 6   7 8
1 2               7 8
$ cat tst.awk
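# Each pair is: value (1 char), space (1), value (1), then a 3-space gap.
BEGIN{ FIELDWIDTHS="1 1 1 3 1 1 1 3 1 1 1 3 1 1 1" }
{
    # Store every fixed-width field, indexed by row and field position.
    for (i=1;i<=NF;i++) {
        a[NR,i] = $i
    }
}
END {
    # Fields 1, 5, 9 and 13 hold the first value of each pair; NF is the
    # field count of the last record read.
    for (i=1;i<=NF;i+=4)
        for (j=1;j<=NR;j++)
            if ( a[j,i] != " " )    # a blank slot means the pair is missing on that row
                print a[j,i]
}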
$ gawk -f tst.awk file
1
1
1
1
1
3
3
3
5
5
5
5
7
7
7
7
7
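FIELDWIDTHS is a GNU awk extension. If only a POSIX awk is available, the same layout can be read with `substr()` at fixed offsets instead. A minimal sketch, assuming exactly the layout the FIELDWIDTHS string describes: single-character values, each pair written as value, space, value, and three spaces between pairs, so pair k starts at character 6k - 5. The function name `first_of_pairs` and its pair-count argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first value of each pair, column by column,
# skipping rows where that pair is blank. $1 is the number of pairs per row.
first_of_pairs() {
    awk -v npairs="$1" '
    {
        for (k = 1; k <= npairs + 0; k++) {
            v = substr($0, 6 * k - 5, 1)     # first value of pair k
            if (v != "" && v != " ")
                col[k] = col[k] v "\n"       # collect per column, in row order
        }
    }
    END {
        for (k = 1; k <= npairs + 0; k++)
            printf "%s", col[k]
    }'
}

printf '%s\n' '1 2   3 4   5 6   7 8' \
              '1 2   3 4   5 6   7 8' \
              '1 2   3 4   5 6   7 8' \
              '1 2         5 6   7 8' \
              '1 2               7 8' | first_of_pairs 4
```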
You may try the following (the three-argument form of match() used in ext.awk is a GNU awk extension, so it needs gawk):
gawk -f ext.awk input.txt
where input.txt is your input data file and ext.awk is:
BEGIN {
ncols=4 # number of columns
nspc=3 # number of spaces that separates the columns
}
{
str=$0;
for (i=1; i<=ncols; i++) {
pos=match(str,/^([0-9]+) ([0-9]+)/,a)
if (pos>0) {
b[NR,i]=a[1]
if (NR==1) colw[i]=RLENGTH; # assume the column widths are the same as in the first row
}
str=substr(str,colw[i]+1+nspc);
}
}
END {
for (i=1;i<=ncols;i++)
for (j=1;j<=NR;j++) {
if (b[j,i]) print b[j,i];
}
}
