Print awk output into new column - bash

I have a lot of files that were modified (after filtering), and I need to print NR and the character count of each new file into columns. Here is an example:
input files: x1, x2, x3, y1, y2, y3, z1, z2, z3 ...
script:
for i in x* y* z*
do awk -v h="$i" '{c+=length+1} END{print h "\t" NR "\t" c}' "$i" >> stats.txt
done;
my output looks like:
x1 NR c
x2 NR c
x3 NR c
y1 NR c
y2 NR c
y3 NR c
z1 NR c
z2 NR c
z3 NR c
And I need to save each loop iteration to a new column, not a new line:
x1 NR c y1 NR c z1 NR c
x2 NR c y2 NR c z2 NR c
x3 NR c y3 NR c z3 NR c
so that corresponding files (after filtering) stay on the same line. I hope that is clear. I need to do this in bash and awk. Thank you for any help!
EDITED:
The real output looks like:
x 0.457143 872484
y 0.527778 445759
z 0.416667 382712
x 0.457143 502528
y 0.5 575972
z 0.444444 590294
x 0.371429 463939
y 0.694444 398033
z 0.56565 656565
.
.
.
and I need:
x 0.457143 872484 0.457143 502528 0.371429 463939
y 0.527778 445759 0.5 575972 0.694444 398033
.
.
.
I hope it is clear..

Try this:
cat data | tr -d , | awk '{for (i = 1; i <= NF; i += 3) print $i " NR c " $(i+1) " NR c " $(i+2) " NR c"}'
Output:
x1 NR c x2 NR c x3 NR c
y1 NR c y2 NR c y3 NR c
z1 NR c z2 NR c z3 NR c
Same table but transposed (for your task variant):
cat data | tr -d , | awk '{for (i = 1; i <= NF/3; i += 1) print $i " NR c " $(i+3) " NR c " $(i+6) " NR c"}'
Output:
x1 NR c y1 NR c z1 NR c
x2 NR c y2 NR c z2 NR c
x3 NR c y3 NR c z3 NR c
For your updated task, try the following solution (using bash):
cat data | sort | while read L;
do
y=`echo $L | cut -f1 -d' '`;
{
test "$x" = "$y" && echo -n " `echo $L | cut -f2- -d' '`";
} ||
{
x="$y";echo -en "\n$L";
};
done
(from my solution to a similar problem)
Updated script after comment:
sort data | while read L
do
y="`echo \"$L\" | cut -f1 -d' '`"
if [ "$x" = "$y" ]
then
echo -n " `echo \"$L\" | cut -f2- -d' '`"
else
x="$y"
echo -en "\n$L"
fi
done
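For the updated task, the same grouping can also be sketched in awk alone, without the sort, so the keys come out in the order they are first seen. Treat this as a sketch only: the function name group_by_key is made up for illustration, and it assumes whitespace-separated input with the key in the first field.

```shell
# Hypothetical helper: collect all values that share a first field onto one line.
group_by_key() {
  awk '
    {
      key = $1
      # Remember each key the first time it appears, to keep input order.
      if (!(key in row)) { order[++n] = key; row[key] = key }
      # Append the remaining fields of this line to the line for that key.
      for (i = 2; i <= NF; i++) row[key] = row[key] OFS $i
    }
    END { for (k = 1; k <= n; k++) print row[order[k]] }
  '
}

printf '%s\n' 'x 0.457143 872484' 'y 0.527778 445759' \
              'x 0.457143 502528' 'y 0.5 575972' | group_by_key
# x 0.457143 872484 0.457143 502528
# y 0.527778 445759 0.5 575972
```

Because everything is held in memory until END, this suits stats files of modest size; for very large inputs the sort-based loop above may be the better fit.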

Related

Combining 2 lines together but "interlaced"

I have 2 lines from an output as follow:
a b c
x y z
I would like to pipe both lines from the last command into a script that would combine them "interlaced", like this:
a x b y c z
The solution should work for a random number of columns from the output, such as:
a b c d e
x y z x y
Should result in:
a x b y c z d x e y
So far, I have tried using awk, perl, sed, etc... but without success. All I can do, is to put the output into one line, but it won't be "interlaced":
$ echo -e 'a b c\nx y z' | tr '\n' ' ' | sed 's/$/\n/'
a b c x y z
Keep fields of odd numbered records in an array, and update the fields of even numbered records using it. This will interlace each pair of successive lines in input.
prog | awk 'NR%2{split($0,a);next} {for(i in a)$i=(a[i] OFS $i)} 1'
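The one-liner above, spelled out with comments. It is a sketch that assumes both lines of each pair have the same number of fields; the wrapper name interlace is only for illustration.

```shell
# Hypothetical wrapper around the same idea as the one-liner.
interlace() {
  awk '
    # Odd-numbered line: remember its fields and move on.
    NR % 2 { n = split($0, a); next }
    # Even-numbered line: prefix each field with the saved field, then print.
    {
      for (i = 1; i <= n; i++) $i = a[i] OFS $i
      print
    }
  '
}

printf 'a b c\nx y z\n' | interlace
# a x b y c z
```

Assigning to $i makes awk rebuild the line with OFS, which is why no explicit join is needed.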
Here's a 3 step solution:
$ # get one argument per line
$ printf 'a b c\nx y z' | xargs -n1
a
b
c
x
y
z
$ # split numbers of lines by 2 and combine them side by side
$ printf 'a b c\nx y z' | xargs -n1 | pr -2ts' '
a x
b y
c z
$ # combine all input lines into single line
$ printf 'a b c\nx y z' | xargs -n1 | pr -2ts' ' | paste -sd' '
a x b y c z
$ printf 'a b c d e\nx y z 1 2' | xargs -n1 | pr -2ts' ' | paste -sd' '
a x b y c z d 1 e 2
Could you please try the following? It joins every 2 lines in "interlaced" fashion.
awk '
FNR%2!=0 && FNR>1{
  # a new pair starts: print the finished pair, then clear the array
  for(j=1;j<=n;j++){
    printf("%s%s",a[j],j==n?ORS:OFS)
  }
  delete a
}
{
  n=NF
  for(i=1;i<=NF;i++){
    a[i]=((i in a)?a[i] OFS:"")$i
  }
}
END{
  # print the last pair
  for(j=1;j<=n;j++){
    printf("%s%s",a[j],j==n?ORS:OFS)
  }
}' Input_file
Here is a simple awk script
script.awk
NR == 1 {split($0,inArr1)} # read fields from 1st line into inArr1
NR == 2 {split($0,inArr2); # read fields from 2nd line into inArr2
  # output interlaced fields from inArr1 and inArr2; end the line after the last pair
  for (i = 1; i <= NF; i++) printf("%s%s%s%s", inArr1[i], OFS, inArr2[i], (i < NF ? OFS : ORS));
}
input.txt
a b c d e
x y z x y
running:
awk -f script.awk input.txt
output:
a x b y c z d x e y
Multiline awk solution:
interlaced.awk
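{
  a[NR] = $0
}
END {
  n = split(a[1], b)
  split(a[2], c)
  # loop by index: "for (i in b)" does not guarantee field order
  for (i = 1; i <= n; i++) {
    printf "%s%s%s%s", (i == 1 ? "" : OFS), b[i], OFS, c[i]
  }
  printf "%s", ORS
}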
Run it like this:
foo_program | awk -f interlaced.awk
Perl will do the job. It was invented for this type of task.
echo -e 'a b c\nx y z' | \
perl -MList::MoreUtils=mesh -e \
'@f=mesh @{[split " ", <>]}, @{[split " ", <>]}; print "@f"'
 
a x b y c z
You can of course print out the meshed output any way you want.
Check out http://metacpan.org/pod/List::MoreUtils#mesh
You could even make it into a shell function for easy use:
function meshy {
perl -MList::MoreUtils=mesh -e \
'@f=mesh @{[split " ", <>]}, @{[split " ", <>]}; print "@f"'
}
$ echo -e 'X Y Z W\nx y z w' |meshy
X x Y y Z z W w
$
Ain't Perl grand?
This might work for you (GNU sed):
sed -E 'N;H;x;:a;s/\n(\S+\s+)(.*\n)(\S+\s+)/\1\3\n\2/;ta;s/\n//;s// /;h;z;x' file
Process two lines at a time. Append the two lines in the pattern space to the hold space, which introduces a newline in front of them. Using pattern matching and back references, nibble away at the front of each of the two lines and place the pairs at the front. Eventually the pattern match fails; then remove the first newline and replace the second with a space. Copy the amended line to the hold space, clear the pattern space ready for the next pair of lines (if any), and print.

bash command for splitting cell content by delimiter into multiple rows in the cell column

To illustrate the task: I have this data frame:
x y1;y2;y3 z1;z2;z3
a b1;b2 c1;c2
I need:
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2
Column 1 always holds a single value. A cell in columns 2 and 3 can hold one or more values, but both columns always hold the same number of values. Thanks.
In awk:
$ awk -F"(\t|;)" '{
for(i=2;i<=4;i++)
if($i!="")
print $1, $i, $(i+3)
}' file
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2
Edit: Another version:
$ awk -F"(\t+|;)" '{ # FS tabs or semicolon
for(i=2;i<=int(NF/2)+1;i++)
print $1,$i,$(i+int(NF/2))
}' file
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2
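A variant that avoids the NF arithmetic by calling split() on the two cells directly. This is a sketch assuming whitespace-separated columns, ';' inside the cells, and equal counts in columns 2 and 3; the function name explode_cells is hypothetical.

```shell
# Hypothetical helper: one output row per ';'-separated item in columns 2 and 3.
explode_cells() {
  awk '{
    n = split($2, left, ";")     # items of column 2
    split($3, right, ";")        # items of column 3 (assumed same count)
    for (i = 1; i <= n; i++) print $1, left[i], right[i]
  }'
}

printf 'x y1;y2;y3 z1;z2;z3\na b1;b2 c1;c2\n' | explode_cells
# x y1 z1
# x y2 z2
# x y3 z3
# a b1 c1
# a b2 c2
```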
Something like this should make it:
declare -a cols=() # array for individual columns (line fields)
IFS=' ;' # fields separators
while read -a cols; do
n=${#cols[@]} # number of fields in current line
if (( n < 3 || n % 2 != 1 )); then # skip invalid lines
printf "skipping invalid line: %s\n" "${cols[*]}"
continue
fi
for (( i = 1; i <= n / 2; i += 1 )); do # loop over pairs of fields
# printf line
printf "%s %s %s\n" "${cols[0]}" "${cols[i]}" "${cols[n/2+i]}"
done
done < data.txt
Explanations:
IFS is the list of characters used by read to split a line in fields. In your case spaces and ; seem to be the separators.
read -a cols assigns the fields of the read line to the cols array, starting at cell 0.
Example of run:
$ cat data.txt
x y1;y2;y3 z1;z2;z3
a b1;b2 c1;c2
$ ./foo.sh
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2
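To see in isolation what IFS and read -a do in the loop above, here is a small bash sketch. The function name split_fields is made up for illustration, and IFS is set with local so the change does not leak into the rest of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split one line on spaces and ';' and show the result.
split_fields() {
  local IFS=' ;'                 # separators used by read, for this function only
  local -a fields
  read -r -a fields <<< "$1"     # fill the array, starting at index 0
  # Print the number of fields, then one field per line.
  printf '%s\n' "${#fields[@]}" "${fields[@]}"
}

split_fields 'x y1;y2;y3 z1;z2;z3'
# 7
# x
# y1
# y2
# y3
# z1
# z2
# z3
```

With 7 fields, index 0 is the key and n/2 = 3 is the number of pairs, which is where the cols[i] and cols[n/2+i] indexing in the loop comes from.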

UPDATED: Bash + Awk : Print first X(dynamic) columns and always last column

#file test.txt
a b c 5
d e f g h 7
gg jj 2
Say X = 3; I need the output to look like this:
#file out.txt
a b c 5
d e f 7
gg jj 2
NOT this:
a b c 5
d e f 7
gg jj 2 2 <--- WRONG
I've gotten to this stage:
cat test.txt | awk ' { print $1" "$2" "$3" "NF } '
If you're unsure of the total number of fields, then one option would be to use a loop:
awk '{ for (i = 1; i <= 3 && i < NF; ++i) printf "%s ", $i; print $NF }' file
The loop can be avoided by using a ternary:
awk '{ print $1, $2, (NF > 3 ? $3 OFS $NF : $3) }' file
This is slightly more verbose than the approach suggested by 123 but means that you aren't left with trailing white space on the lines with three fields. OFS is the Output Field Separator, a space by default, which is what print inserts between fields when you use a ,.
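Since X is meant to be dynamic, the loop version can take it from the shell with -v. A sketch only: the function name first_x_and_last and the awk variable x are names chosen here, not anything from the question.

```shell
# Hypothetical helper: print the first X fields plus the last field, without
# repeating the last field on short lines.
first_x_and_last() {
  awk -v x="$1" '{
    out = ""
    # Stop at x fields, or just before the last field, whichever comes first.
    for (i = 1; i <= x && i < NF; i++) out = out $i OFS
    print out $NF
  }'
}

printf 'a b c 5\nd e f g h 7\ngg jj 2\n' | first_x_and_last 3
# a b c 5
# d e f 7
# gg jj 2
```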
Use a $ combined with NF :
cat test.txt | awk ' { print $1" "$2" "$3" "$NF } '

complex line copying&modifying on-the-fly with grep or sed

Is there a way to do the following with either grep or sed: read each line of a file, copy it twice, and modify each copy?
Original line:
X Y Z
A B C
New lines:
Y M X
Y M Z
B M A
B M C
where X, Y, Z, M are all integers, and M is a fixed integer (i.e. 2) we inject while copying! I suppose a solution (if any) will be so complex that people (including me) will start bleeding after seeing it!
$ awk -v M=2 '{print $2,M,$1; print $2,M,$3;}' file
Y 2 X
Y 2 Z
B 2 A
B 2 C
How it works
-v M=2
This defines the variable M to have value 2.
print $2,M,$1
This prints the second column, followed by M, followed by the first column.
print $2,M,$3
This prints the second column, followed by M, followed by the third column.
Extended Version
Suppose that we want to handle an arbitrary number of columns in which we print all columns between first and last, followed by M, followed by the first, and then print all columns between first and last, followed by M, followed by the last. In this case, use:
awk -v M=2 '{for (i=2;i<NF;i++)printf "%s ",$i; print M,$1; for (i=2;i<NF;i++)printf "%s ",$i; print M,$NF;}' file
As an example, consider this input file:
$ cat file2
X Y1 Y2 Z
A B1 B2 C
The above produces:
$ awk -v M=2 '{for (i=2;i<NF;i++)printf "%s ",$i; print M,$1; for (i=2;i<NF;i++)printf "%s ",$i; print M,$NF;}' file2
Y1 Y2 2 X
Y1 Y2 2 Z
B1 B2 2 A
B1 B2 2 C
The key change to the code is the addition of the following command:
for (i=2;i<NF;i++)printf "%s ",$i
This loop prints every column from i=2, the column after the first, to i=NF-1, the column before the last. The code is otherwise the same as before.
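Sure; with GNU sed (which accepts \n in the replacement), and with M written out as the actual integer (2 here), you can write:
sed 's/\(.*\) \(.*\) \(.*\)/\2 2 \1\n\2 2 \3/' file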
With bash builtin commands:
m=2; while read a b c; do echo "$b $m $a"; echo "$b $m $c"; done < file
Output:
Y 2 X
Y 2 Z
B 2 A
B 2 C

convert data matrix using awk

Is it possible to transpose the following data matrix input to the desired output?
f1 x1 1.2
f1 x2 2.2
f1 x3 0
f2 x1 1.1
f2 x2 1.2
f2 x3 3.3
f3 x1 2.3
f3 x2 4.4
f3 x3 0.1
output
f1 f2 f3
x1 1.2 1.1 2.3
x2 2.2 1.2 4.4
x3 0 3.3 0.1
This can be a way:
awk '{a[$1,$2]=$3; col[$1]; row[$2]}
END {printf "%s", FS
for (c in col) printf "%s%s", c, FS; print "";
for (r in row) {
printf "%s%s", r, FS
for (c in col) printf "%s%s", a[c,r], FS
print ""
}
}' file
The code is fairly self-explanatory, but in short:
Store the data in an array a[col, row].
Store the possible names of cols and rows.
Once the file has been read, loop through the results and print.
For the given input it returns:
$ awk '{a[$1,$2]=$3; col[$1]; row[$2]} END {printf "%s", FS; for (c in col) printf "%s%s", c, FS; print ""; for (r in row) { printf "%s%s", r, FS; for (c in col) printf "%s%s", a[c,r], FS; print ""}}' a
f1 f2 f3
x1 1.2 1.1 2.3
x2 2.2 1.2 4.4
x3 0 3.3 0.1
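One caveat about the script above: awk does not specify the order in which "for (c in col)" visits the keys, so the columns and rows may come out in a different order on another awk. A sketch that records the names in first-seen order instead (the function name pivot and the array names are chosen here for illustration):

```shell
# Hypothetical helper: pivot "col row value" triples into a matrix,
# keeping rows and columns in the order they first appear.
pivot() {
  awk '
    {
      if (!($1 in cseen)) { cseen[$1]; cols[++nc] = $1 }
      if (!($2 in rseen)) { rseen[$2]; rows[++nr] = $2 }
      val[$1, $2] = $3
    }
    END {
      # Header line: empty corner cell, then the column names.
      line = ""
      for (c = 1; c <= nc; c++) line = line OFS cols[c]
      print line
      # One line per row name, followed by its values.
      for (r = 1; r <= nr; r++) {
        line = rows[r]
        for (c = 1; c <= nc; c++) line = line OFS val[cols[c], rows[r]]
        print line
      }
    }
  '
}

printf 'f1 x1 1.2\nf1 x2 2.2\nf2 x1 1.1\nf2 x2 1.2\n' | pivot
```

The header starts with OFS so that the column names line up over the values rather than over the row names.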
% cat mat.rix
f1 x1 1.2
f1 x2 2.2
f1 x3 0
f2 x1 1.1
f2 x2 1.2
f2 x3 3.3
f3 x1 2.3
f3 x2 4.4
f3 x3 0.1
% cat a.wk
{
if(! row[$1]) { i=i+1; rowname[i]=$1; row[$1]=1 }
if(! col[$2]) { j=j+1; colname[j]=$2; col[$2]=1 }
if(c[$2])
c[$2] = sprintf("%s%s%10.4f", c[$2], OFS, $3)
else
c[$2] = sprintf("%10.4f", $3)
}
END {
printf " "
for(n=1;n<i+1;n++){ printf "%10s%s", rowname[n], OFS } ; print ""
for(n=1;n<j+1;n++){ print colname[n], c[colname[n]] }}
% awk -f a.wk mat.rix
f1 f2 f3
x1 1.2000 1.1000 2.3000
x2 2.2000 1.2000 4.4000
x3 0.0000 3.3000 0.1000
%
Addendum
What if the column names are of different length?
% cat aw.k
{
if(! row[$1]) { i=i+1; rowname[i]=$1; row[$1]=1 }
if(! col[$2]) { j=j+1; colname[j]=$2; col[$2]=1 }
if(c[$2])
c[$2] = sprintf("%s%s%10.4f", c[$2], OFS, $3)
else
c[$2] = sprintf("%10.4f", $3)
}
END {
for(n=1;n<j+1;n++){l = length(colname[n]) ; if(l>lmax) lmax=l}
format = sprintf("%%-%ds%%s%%s\n", lmax)
for(n=1;n<lmax+1;n++) printf "."; printf OFS
for(n=1;n<i+1;n++){ printf "%10s%s", rowname[n], OFS } ; print ""
for(n=1;n<j+1;n++){ printf(format, colname[n], OFS, c[colname[n]])}
}
% awk -f aw.k mat.rix
.. f1 f2 f3
x1 1.2000 1.1000 2.3000
x2 2.2000 1.2000 4.4000
x3 0.0000 3.3000 0.1000
%
Note the use of %% when the format string is built with sprintf: each %% yields a literal % in the resulting format.
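A tiny illustration of that %% trick on its own, as a sketch: the function name padded and the trailing "|" marker are only there to make the padding visible.

```shell
# Hypothetical helper: left-justify a string in a field whose width is computed
# at run time.
padded() {
  awk -v w="$1" -v s="$2" 'BEGIN {
    # sprintf turns "%%-%ds|\n" into e.g. "%-5s|\n": %% becomes a literal %,
    # and %d is replaced by the width.
    fmt = sprintf("%%-%ds|\n", w)
    printf fmt, s
  }'
}

padded 5 ab
# ab   |
```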
