Combine multiple columns of different lengths into one column in BASH

I need to combine columns of different lengths into one column using BASH. Here is an example input file:
1 2   3 4   5 6   7 8
1 2   3 4   5 6   7 8
1 2   3 4   5 6   7 8
1 2         5 6   7 8
1 2               7 8
And my desired output:
1
1
1
1
1
3
3
3
5
5
5
5
7
7
7
7
7
The input data is pairs of columns as shown. Each pair is separated from the next by a fixed number of spaces, and the two values within a pair are separated by a single space. Thanks in advance!

Using GNU awk for fixed width field handling:
$ cat file
1 2   3 4   5 6   7 8
1 2   3 4   5 6   7 8
1 2   3 4   5 6   7 8
1 2         5 6   7 8
1 2               7 8
$ cat tst.awk
BEGIN{ FIELDWIDTHS="1 1 1 3 1 1 1 3 1 1 1 3 1 1 1" }   # value, space, value, 3-space gap, ...
{
    for (i=1; i<=NF; i++) {
        a[NR,i] = $i
    }
}
END {
    for (i=1; i<=NF; i+=4)          # fields 1, 5, 9, 13: the first value of each pair
        for (j=1; j<=NR; j++)
            if ( a[j,i] != " " )    # skip the blank placeholders of missing pairs
                print a[j,i]
}
$ gawk -f tst.awk file
1
1
1
1
1
3
3
3
5
5
5
5
7
7
7
7
7
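Since every value in this sample is a single character at a fixed position, the same flattening can also be sketched in plain bash with cut. This is only an illustration: the character positions 1, 7, 13 and 19 are assumptions taken from the sample above, and it would break with multi-digit values.
for pos in 1 7 13 19; do                 # first character of each pair
    cut -c"$pos" file | grep -v '^ *$'   # drop the blank placeholders
done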

You may try the following:
awk -f ext.awk input.txt
where input.txt is your input data file and ext.awk is:
BEGIN {
    ncols=4   # number of column pairs
    nspc=3    # number of spaces separating the pairs
}
{
    str=$0
    for (i=1; i<=ncols; i++) {
        # the three-argument match() is a GNU awk extension
        pos=match(str,/^([0-9]+) ([0-9]+)/,a)
        if (pos>0) {
            b[NR,i]=a[1]
            if (NR==1) colw[i]=RLENGTH   # assume column widths as given in the first row
        }
        str=substr(str,colw[i]+1+nspc)   # advance to the next pair
    }
}
END {
    for (i=1; i<=ncols; i++)
        for (j=1; j<=NR; j++) {
            if (b[j,i]) print b[j,i]
        }
}
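Note that the three-argument match() used in ext.awk is a GNU awk extension, so if your default awk is not gawk, run it explicitly as:
gawk -f ext.awk input.txt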

Related

How to compare two columns from the same file?

I have a long data file, file.txt:
1 3
3 2
2 3
5 5
8 9
The output file, out.txt, should be:
1 3
1 2
1 5
1 9
3 3
3 2
3 5
Could you please try the following.
awk '
FNR==NR{              # true only while reading the first copy of the file
    a[++count]=$2     # save every value from column 2
    next
}
{                     # second pass: pair $1 with every saved value
    for(i=1;i<=count;i++){
        print $1,a[i]
    }
}
' Input_file Input_file
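If you would rather not read the file twice, a rough single-pass variant of the same cross-product idea is to buffer both columns and print the combinations at the end; the output order is the same as above:
awk '
{
    c1[NR] = $1      # remember column 1 of every row
    c2[NR] = $2      # remember column 2 of every row
}
END {
    for (i = 1; i <= NR; i++)
        for (j = 1; j <= NR; j++)
            print c1[i], c2[j]
}
' Input_file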

How to merge three lines at a time

I have a .txt file with 9 lines:
1 2 3 4
1 2 3 5
1 2 3 6
1 2 3 4
1 2 3 5
1 2 3 6
1 2 3 4
1 2 3 5
1 2 3 6
I want to join the first 3 lines into one line, then the next three lines, and then the last three:
1 2 3 4 1 2 3 5 1 2 3 6
1 2 3 4 1 2 3 5 1 2 3 6
1 2 3 4 1 2 3 5 1 2 3 6
I tried
cat old.txt | tr -d '\n' > new.txt
but it only gives me one single long line.
You can use paste to merge together lines.
paste -d " " - - - < input.txt
The -d " " uses a space as the delimiter between the lines being joined. Each - reads a line from stdin (and we're redirecting your input file to stdin). If you want to join more lines at a time, just increase the number of - arguments, as in the sketch below.
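For example, joining four lines at a time would be:
paste -d " " - - - - < input.txt
If paste is not an option, a rough awk equivalent for the three-at-a-time case is to switch the output record separator on every third line (note the last output line will lack a trailing newline if the line count is not a multiple of three):
awk 'ORS = (NR % 3 ? " " : "\n")' input.txt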

Extract specific columns from a dataset using AWK

I am trying to apply a simple awk script to a dataset file.
The file has 150 columns and I need only columns 20 to 30.
Below is the script I used to get the fields between 20 and 30:
BEGIN{}
{
    for (f=20; f<=30; f++) {
        print $f;
    }
}
I don't know why each of the selected fields is printed on its own line. That is, for this sample dataset:
1 2 3 4 5 6 7
2 2 3 4 5 6 7
3 3 3 4 5 6 7
4 4 4 4 5 6 7
5 5 5 5 5 6 7
6 6 6 6 6 6 7
7 7 7 7 7 7 7
I get output as
1
2
3
4
5
6
7
2
2
3
4
5
6
7
... and so on
print writes its arguments followed by the output record separator (a newline by default), which is why each field ended up on its own line. Below is another way of doing the same:
awk -v f=20 -v t=30 '{for(i=f;i<=t;i++) \
printf("%s%s",$i,(i==t)?"\n":OFS)}' file
Notes
f and t are the starting and the ending columns respectively.
We used the ternary operator to control the field separator between the needed columns.
Edit
If you need columns 20 through 30 plus the last column, the following would suffice:
awk -v f=20 -v t=30 '{for(i=f;i<=t;i++) \
printf("%s%s",$i,(i==t)?OFS""$NF"\n":OFS)}' file
Solution
BEGIN{FS=" ";}
{
    for (f=20; f<=30; f++) {
        printf("%s ", $f);
    }
    print "";
}

Combine if and NR in awk

I've been trying to figure out this silly thing with awk for the last few hours, but no luck so far.
I understand how to print every second line, for example:
awk 'NR%2' file
and I also understand how to print only the lines where one column is within a specific range, for example:
awk '{if ($1 > 'yourvalue') print}' file
What I don't quite get is how to combine the two.
In practice, if I have a file organized as:
1 3 6 8
2 8 4 5
3 9 8 7
4 7 3 5
5 7 3 6
6 2 4 6
7 1 4 7
8 3 2 1
9 7 5 3
10 4 5 6
11 8 2 5
how can I get, for example:
1 3 6 8
3 9 8 7
5 7 3 6
7 1 4 7
8 3 2 1
9 7 5 3
10 4 5 6
11 8 2 5
i.e. print only every second line where column 1 is smaller than 7, and print the rest (column 1 of 7 or more) normally.
I tried to combine everything in one single line but I always get errors.
You can reverse the 2nd condition and combine the two with an OR:
awk 'NR%2 || $1>=7' file
1 3 6 8
3 9 8 7
5 7 3 6
7 1 4 7
8 3 2 1
9 7 5 3
10 4 5 6
11 8 2 5
You can combine conditions using && (and) and || (or).
You can use parentheses for nesting conditions.
For example:
awk 'cond1 && (cond2 || cond3)' file
This:
awk '{if ($1 > 7) print}' file
... is equivalent to this:
awk '$1 > 7 { print }' file
... because you can write conditions outside of the {...} to use as filters.
... which is equivalent to:
awk '$1 > 7' file
... because the default action is to print.
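So the combined filter shown above can also be written with an explicit if, which behaves exactly the same:
awk '{ if (NR % 2 == 1 || $1 >= 7) print }' file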

Another split file in bash - based on difference between rows of column x

Hello stackoverflow users!
Generally, I would like to tune up a script I am using, to make it more robust against missing data.
My example data looks like this (tab-delimited CSV file with headers):
ColA ColB ColC
6 0 0
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
4 0 0
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
3 0 0
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
I use an awk script found elsewhere, as follows:
awk 'BEGIN { fn=0 }
     NR==1 { next }
     NR==2 { delim=$2 }
     $2 == delim {
         f=sprintf("file_no%02d.txt",fn++);
         print "Creating " f
     }
     { print $0 > f }'
This gives me the output I want: it skips the first (header) line, takes the 2nd column of the first data row as the delimiter value (in this example '0'), and starts a new file whenever that value reappears:
file_no00.txt
6 0 0
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
file_no01.txt
4 0 0
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
file_no02.txt
3 0 0
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
To make the script more robust (imagine that the rows with 0's are deleted), I would need to split the file according to the difference between the value in row n+1 and the value in row n: if value(row n+1) - value(row n) < 0, start a new file. Of course I would also need to keep the same file naming. The preferred way is bash with awk. Any advice? Thanks in advance!
Cheers!
Here is an awk command that you can use:
cat file
ColA ColB ColC
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
awk 'NR == 1 {
         next
     }
     !p || $2 < p {
         f = sprintf("file_no%02d.txt",fn++);
         print "Creating " f
     }
     {
         p = $2;
         print $0 > f
     }' file
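One possible refinement, sketched on the same logic: close each chunk before opening the next, since some non-GNU awks limit the number of simultaneously open files.
awk 'NR == 1 { next }
     !p || $2 < p {
         if (f) close(f)                  # close the previous chunk, if any
         f = sprintf("file_no%02d.txt", fn++)
         print "Creating " f
     }
     { p = $2; print $0 > f }' file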
I suggest small modifications to your current script:
awk 'BEGIN { fn=0; f=sprintf("file_no%02d.txt",fn++); print "Creating " f }
     NR==1 { next }
     NR==2 { delim=$2 }
     $2 - delim < 0 {
         f=sprintf("file_no%02d.txt",fn++);
         print "Creating " f
     }
     { print $0 > f; delim = $2 }' infile
First, create the first file name just before processing starts.
Second, in the last block, save the value of the current line so it can be compared with the value on the next line.
Third, instead of comparing with zero, subtract the previous value from the current one and check whether the result is less than zero.
It yields:
==> file_no00.txt <==
6 0 0
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
==> file_no01.txt <==
4 0 0
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
==> file_no02.txt <==
3 0 0
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
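The ==> file <== banners above are simply what head prints when it is given several files, so the result of the split can be spot-checked with (assuming the default file naming used above):
head file_no0*.txt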
