How to print columns one after the other in bash?

Are there any better methods to print two or more columns as one column? For example, given
input.file
AAA 111
BBB 222
CCC 333
output:
AAA
BBB
CCC
111
222
333
I can only think of:
cut -f1 input.file >output.file;cut -f2 input.file >>output.file
But it's not good if there are many columns, or when I want to pipe the output to other commands like sort.
Any other suggestions? Thank you very much!

With awk
awk '{if(maxc<NF)maxc=NF;
for(i=1;i<=NF;i++) a[i]=(a[i]=="" ? $i : a[i] RS $i)
}
END{
for(i=1;i<=maxc;i++)print a[i]
}' input.file
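Since this writes to stdout, it also covers the pipe concern from the question; for example, feeding the flattened values straight into sort (restated here with the ternary on the right-hand side of the assignment, which is the more portable form):

```shell
printf 'AAA 111\nBBB 222\nCCC 333\n' > input.file
awk '{if(maxc<NF)maxc=NF
      for(i=1;i<=NF;i++) a[i]=(a[i]=="" ? $i : a[i] RS $i)}
     END{for(i=1;i<=maxc;i++) print a[i]}' input.file | sort
```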

You can use a GNU awk array of arrays to store all the data and print it later on.
If the number of columns is constant, this works for any number of columns:
gawk '{for (i=1; i<=NF; i++) # loop over columns
data[i][NR]=$i # store in data[column][line]
}
END {for (i=1;i<=NF;i++) # loop over columns
for (j=1;j<=NR;j++) # loop over lines
print data[i][j] # print the given field
}' file
Note NR stands for number of records (that is, number of lines here) and NF stands for number of fields (that is, the number of fields in a given line).
If the number of columns changes over rows, then we should use yet another array, in this case to store the number of columns for each row. But in the question I don't see a request for this, so I am leaving it for now.
See a sample with three columns:
$ cat a
AAA 111 123
BBB 222 234
CCC 333 345
$ gawk '{for (i=1; i<=NF; i++) data[i][NR]=$i} END {for (i=1;i<=NF;i++) for (j=1;j<=NR;j++) print data[i][j]}' a
AAA
BBB
CCC
111
222
333
123
234
345
If the number of columns is not constant, using an array to store the number of columns for each row helps to keep track of it:
$ cat sc.wk
{for (i=1; i<=NF; i++)
data[i][NR]=$i
if (maxc<NF) maxc=NF
columns[NR]=NF
}
END {for (i=1;i<=maxc;i++)
for (j=1;j<=NR;j++)
print (i<=columns[j] ? data[i][j] : "-")
}
$ cat a
AAA 111 123
BBB 222
CCC 333 345
$ gawk -f sc.wk a
AAA
BBB
CCC
111
222
333
123
-
345

awk '{print $1;list[i++]=$2}END{for(j=0;j<i;j++){print list[j];}}' input.file
Output
AAA
BBB
CCC
111
222
333
A simpler solution, if the order of the values does not matter, is to treat every whitespace run as a record separator (GNU awk, since a regex RS is a gawk extension):
awk -v RS="[[:blank:]\n]+" '1' input.file
Note this emits the values in row order (AAA, 111, BBB, ...), not column by column.

This expects tab as the delimiter (the input file here is named asd):
$ cat <(cut -f 1 asd) <(cut -f 2 asd)
AAA
BBB
CCC
111
222
333
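The same idea generalizes to any number of tab-separated columns with a loop (a sketch; n is the column count, and asd is the input file as above):

```shell
printf 'AAA\t111\nBBB\t222\nCCC\t333\n' > asd
n=2
for i in $(seq 1 "$n"); do cut -f "$i" asd; done
```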

Since the order is of no importance:
$ awk 'BEGIN {RS="[ \t\n]+"} 1' file
AAA
111
BBB
222
CCC
333

Ugly, but it works:
for i in {1..2} ; do awk -v p="$i" '{print $p}' input.file ; done
Change the {1..2} to {1..n}, where n is the number of columns in the input file.
Explanation:
We define an awk variable p from the shell variable i; i runs from 1 to n, and on each pass we print the i-th column of the file.
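If you don't know the column count up front, a small sketch can read it from the first line (assuming every row has the same, whitespace-separated layout):

```shell
printf 'AAA 111\nBBB 222\nCCC 333\n' > input.file
# detect the number of columns from the first line
n=$(awk '{print NF; exit}' input.file)
for i in $(seq 1 "$n"); do
    awk -v p="$i" '{print $p}' input.file
done
```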

This will work for an arbitrary number of space-separated columns:
awk '{for (A=1;A<=NF;A++) printf("%s\n",$A);}' input.file | sort -u > output.file
Note that sort -u sorts the values and removes duplicates, so the original row/column order is not preserved. If space is not the separator, say ":" is the separator instead:
awk -F: '{for (A=1;A<=NF;A++) printf("%s\n",$A);}' input.file | sort -u > output.file

Related

Replace number in one file with the number in other file

I have a problem. I have two files (file1 and file2). Both files contain a number (with different values) that characterizes the same variable from different estimations. In file1 this number1 is in the row beginning with the name var1, in field $3; in file2 this number2 is in the row beginning with the name var2, in field $2. I want to take number1 from file1 and replace number2 in file2 with it. I tried the following script, but it is not working; in the output nothing is changed compared to the original file2:
#! /bin/bash
Var1=$(cat file1 | grep 'var1' | awk '{printf "%s", $3}' )
Var2=$(cat file2 | grep 'var2' | awk '{printf "%s", $2}' )
cat file2 | awk '{gsub(/'$Var2'/,'$Var1'); print}'
Thanks in advance!
Addition: For example, in file1 I have:
Tomato 2.154 3.789
Apple 1.458 3.578
Orange 2.487 4.045
In file2:
Banana 2.892
Apple 1.687
Mango 2.083
I want to change file2 so, that it would be:
Banana 2.892
Apple 3.578
Mango 2.083
Using this as file1:
var1 junk 101
var2 junk 102
var3 junk 103
And this as file2:
var1 201
var2 202
var3 203
This will extract field 3 from file1 where field 1 is var1:
awk '$1=="var1"{print $3}' file1
101
This will replace field 2 in file2 with x (101) where the first field is var2:
awk -v x=101 '$1=="var2"{$2=x}1' file2
var1 201
var2 101
var3 203
And combining them, you get:
awk -v x="$(awk '$1=="var1"{print $3}' file1)" '$1=="var2"{$2=x}1' file2
var1 201
var2 101
var3 203
Assuming you want to overwrite file2, you can do a conditional mv that runs only when the awk command succeeded:
awk -v x="$(awk '$1=="var1"{print $3}' file1)" '$1=="var2"{$2=x}1' file2 > /tmp/a && mv /tmp/a file2

Shell - Concatenate rows in Column1 If Column 2 has duplicates

I am a newbie for shell programming and currently facing a roadblock in arriving a solution,
I want to concatenate the column A values iff column B is same.
Here is the sample input,
Col A Col B
AAA www.google.com
BBB www.google.com
CCC www.gmail.com
DDD www.yahoo.com
Expected Output
Col A Col B
AAA,BBB www.google.com
CCC www.gmail.com
DDD www.yahoo.com
I am using the below Awk command to segregate the duplicate entries,
awk 'NR == 1 {p=$2; next} p == $2 { printf "%s,%s\n",$1,$2} {p=$2}' FS="," Input.csv
But I am not able to get the duplicates segregated.
Any suggestions or pointers will be highly appreciated.
If you are not worried about the output preserving the input sequence (i.e. matching the order shown in Input_file), the following may help:
awk 'FNR==1{print;next} {a[$2]=a[$2]?a[$2] "," $1:$1} END{for(i in a){print a[i],i}}' OFS="\t" Input_file
Output will be as follows:
Col A Col B
CCC www.gmail.com
DDD www.yahoo.com
AAA,BBB www.google.com
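If the input order does matter, a sketch that remembers the first occurrence of each Column B value (same field layout as above) keeps the rows in their original sequence:

```shell
printf 'Col A Col B\nAAA www.google.com\nBBB www.google.com\nCCC www.gmail.com\nDDD www.yahoo.com\n' > Input_file
awk 'FNR==1{print; next}
     !($2 in a){order[++n]=$2}           # remember first-seen order of Column B
     {a[$2]=a[$2] ? a[$2] "," $1 : $1}   # concatenate Column A values per key
     END{for(i=1;i<=n;i++) print a[order[i]], order[i]}' OFS="\t" Input_file
```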

how to extract lines between two patterns only with awk?

$ awk '/abc/{flag=1; next} /edf/{flag=0} flag' file
The flag rule prints $0 for every abc...edf block, but I only need the lines between the first pair of the two strings.
input:
abc
111
222
edf
333
444
abc
555
666
edf
output:
111
222
So I'm assuming you want to print the matching lines only for the 1st occurrence.
For that you can just use an additional variable and set it once flag goes to 0:
$ cat file
abc
111
222
edf
333
444
abc
555
666
edf
$ awk '/abc/{flag=1; next} /edf/{if(flag) got1stoccurence=1; flag=0} flag && !got1stoccurence' file
111
222
If you only want the first set of output, then:
awk '/abc/{flag=1; next} /edf/{if (flag == 1) exit} flag' file
Or:
awk '/abc/{flag++; next} /edf/{if (flag == 1) flag++} flag == 1' file
There are other ways to do it too, no doubt. The first is simple and to the point. The second is more flexible if you also want to process the first group of lines appearing between another pair of patterns.
Note that if the input file contains:
xyz
edf
pqr
abc
111
222
edf
It is important not to do anything about the first edf; it is an uninteresting line because no abc line has been read yet.
Using getline with while:
$ awk '/abc/ { while(getline==1 && $0!="edf") print; exit }' file
111
222
Look for /abc/; once it is found, records are printed by the while loop until edf is reached.
$ awk '/edf/{exit} f; /abc/{f=1}' file
111
222
If it was possible for edf to appear before abc in your input then it'd be:
$ awk 'f{if (/edf/) exit; print} /abc/{f=1}' file
111
222
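For comparison, the first-block extraction can also be sketched with GNU sed, assuming the abc and edf markers occupy whole lines:

```shell
printf 'abc\n111\n222\nedf\n333\n444\nabc\n555\n666\nedf\n' > file
# inside the range, // repeats the last matched address, so the
# marker lines themselves are skipped; /edf/q quits after the first block
sed -n '/abc/,/edf/{//!p}; /edf/q' file
```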

gawk use to replace a line containing a pattern with multiple lines using variable

I am trying to replace a line containing the Pattern using gawk, with a set of lines. Let's say, file aa contains
aaaa
ccxyzcc
aaaa
ddxyzdd
I'm using gawk to replace all lines containing xyz with a set of lines 111\n222, my changed contents would contain:
aaaa
111
222
aaaa
111
222
But, if I use:
gawk -v nm2="111\n222" -v nm1="xyz" '{ if (/nm1/) print nm2;else print $0}' "aa"
The changed content shows:
aaaa
ccxyzcc
aaaa
ddxyzdd
I need the entire lines that contain xyz, i.e. the lines ccxyzcc and ddxyzdd, to be replaced with 111 followed by 222. Please help.
The problem with your code is that /nm1/ matches the literal string nm1 as a regex, not the value stored in the nm1 variable:
$ gawk -v nm2="111\n222" -v nm1="xyz" '$0 ~ nm1{print nm2; next} 1' aa
aaaa
111
222
aaaa
111
222
Thanks @fedorqui for the suggestion; next can be avoided by simply overwriting the content of any input line matching the pattern with the required text:
gawk -v nm2="111\n222" -v nm1="xyz" '$0 ~ nm1{$0=nm2} 1' aa
Solution with GNU sed
$ nm1='xyz'
$ nm2='111\n222'
$ sed "/$nm1/c $nm2" aa
aaaa
111
222
aaaa
111
222
The c command deletes the line matching the pattern and appends the given text in its place.
When using awk's ~ operator, you don't need to provide a literal regex on the right-hand side.
Your command, with the syntax corrected, would be something like
gawk -v nm2="111\n222" -v nm1="xyz" '{ if ( $0 ~ nm1 ) print nm2;else print $0}' input-file
which produces the output.
aaaa
111
222
aaaa
111
222
This is how I'd do it:
$ cat aa
aaaa
ccxyzcc
aaaa
ddxyzdd
$ awk '{gsub(/.*xyz.*/, "111\n222")}1' aa
aaaa
111
222
aaaa
111
222
$
Passing variables as patterns to awk is always a bit tricky.
awk -v nm2='111\n222' '{if ($1 ~ /xyz/){ print nm2 } else {print}}'
will give you the output, but the 'xyz' pattern is now fixed.
Passing nm1 as a shell variable will also work:
nm1=xyz
awk -v nm2='111\n222' '{if ($1 ~ /'$nm1'/){ print nm2 } else {print}}' aa
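If the pattern should be matched as a literal substring rather than a regex (so metacharacters such as . or * are not special), awk's index() sidesteps the quoting problem entirely (a sketch; note that -v still expands escapes like \n in nm2):

```shell
printf 'aaaa\nccxyzcc\naaaa\nddxyzdd\n' > aa
# index() does a plain substring search, no regex interpretation
awk -v nm2='111\n222' -v nm1='xyz' 'index($0, nm1){print nm2; next} 1' aa
```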

Compare all but last N Columns across two files in bash

I have 2 files: one with 18 columns; another with many more. I need to find the rows that mismatch on ONLY the first 18 columns while ignoring the rest in the other file. However, I need to preserve and print the entire row (cut will not work).
File 1:
F1 F2 F3....F18
A B C.... Y
AA BB CC... YY
File 2:
F1 F2 F3... F18... F32
AA BB CC... YY... 123
AAA BBB CCC... YYY...321
Output Not In File 1:
AAA BBB CCC YYY...321
Output Not In File 2:
A B C...Y
If possible, I would like to use diff or awk with as few loops as possible.
You can use awk:
awk '{k=""; for(i=1; i<=18; i++) k=k SUBSEP $i} FNR==NR{a[k]; next} !(k in a)' file1 file2
For each row in both files we are first creating a key by concatenating first 18 fields
We are then storing this key in an associative array while iterating first file
Finally we print each row from 2nd file when this new key value is not found in our associative array.
You can use grep:
grep -vf file1 file2
grep -vf <(cut -d" " -f1-18 file2) file1
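Note the plain grep calls above do substring, regex matching, so field values containing characters like . can cross-match. A tighter sketch uses -F (fixed strings) and -x (whole line); shown here with 3 key columns standing in for 18:

```shell
printf 'A B C\nAA BB CC\n' > file1
printf 'AA BB CC 123\nAAA BBB CCC 321\n' > file2
# file1 rows whose first 3 columns do not appear in file2;
# -x works because file1 has exactly the key columns and nothing more
grep -vxFf <(cut -d" " -f1-3 file2) file1
```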
To get the full set differences between the two files, you'll need a little more, similar to @anubhava's answer:
$ awk 'NR==FNR{f1[$0]; next}
{k=$1; for(i=2;i<=18;i++) k=k FS $i;
if(k in f1) delete f1[k];
else f2[$0]}
END{print "not in f1";
for(k in f2) print k;
print "\nnot in f2";
for(k in f1) print k}' file1 file2
can be re-written to preserve order in file2
$ awk 'NR==FNR{f1[$0]; next}
{k=$1; for(i=2;i<=18;i++) k=k FS $i;
if(k in f1) delete f1[k];
else {if(!p) print "not in f1";
f2[$0]; print; p=1}}
END{print "\nnot in f2";
for(k in f1) print k}' file1 file2
