I want to use this table:
a 16 moe max us
b 11 tom mic us
d 14 roe fox au
t 29 ann teo au
n 28 joe joe ca
and make this matrix by using awk (or any other simple option in bash):
a_16; b_11; d_14; t_29; n_28
us; moe_max; tom_mic; ; ;
au; ; ; roe_fox; ann_teo;
ca; ; ; ; ; joe_joe
I tried this but it didn't work:
awk '{a[$5]=a[$5]?a[$5] FS $1"_"$2:$1"_"$2; b[$5]=b[$5]?b[$5] FS $3"_"$4:$3"_"$4;} END{for (i in a){print i"\t" a[i] "\t" b[i];}}' fis.txt
Using any awk
$ cat tst.awk
{
row = $NF
col = $1 "_" $2
vals[row,col] = $3 "_" $4
}
!seenRow[row]++ { rows[++numRows] = row }
!seenCol[col]++ { cols[++numCols] = col }
END {
OFS = "; "
printf " "
for ( colNr=1; colNr<=numCols; colNr++ ) {
col = cols[colNr]
printf "%s%s", col, (colNr<numCols ? OFS : ORS)
}
for ( rowNr=1; rowNr<=numRows; rowNr++ ) {
row = rows[rowNr]
printf "%s%s", row, OFS
for ( colNr=1; colNr<=numCols; colNr++ ) {
col = cols[colNr]
#val = ((row,col) in vals ? vals[row,col] : " ")
val = vals[row,col]
printf "%s%s", val, (colNr<numCols ? OFS : ORS)
}
}
}
$ awk -f tst.awk file
a_16; b_11; d_14; t_29; n_28
us; moe_max; tom_mic; ; ;
au; ; ; roe_fox; ann_teo;
ca; ; ; ; ; joe_joe
I can't see the pattern in the expected output in your question of when there should be 1, 2, 3, or 4 spaces after each ; so I just used a consistent 2 in the above. Massage it to suit.
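The commented-out line in the script above hints at printing a placeholder for cells that have no value. A minimal sketch of that idea follows, assuming the same five-column input; the function name `pivot_with_placeholder` and the `fill` variable are illustrative names, not part of the original answer.

```shell
# Sketch only: pivot rows (last field) against columns ($1_$2) and
# print a placeholder where a row/column pair has no value.
pivot_with_placeholder() {
  awk -v fill="-" '
    {
      row = $NF
      col = $1 "_" $2
      vals[row, col] = $3 "_" $4
      if (!seenRow[row]++) rows[++numRows] = row   # keep first-seen row order
      if (!seenCol[col]++) cols[++numCols] = col   # keep first-seen column order
    }
    END {
      line = ""                                    # empty top-left corner cell
      for (c = 1; c <= numCols; c++) line = line "; " cols[c]
      print line
      for (r = 1; r <= numRows; r++) {
        line = rows[r]
        for (c = 1; c <= numCols; c++) {
          # test membership first so missing cells are not created as a side effect
          cell = ((rows[r], cols[c]) in vals) ? vals[rows[r], cols[c]] : fill
          line = line "; " cell
        }
        print line
      }
    }' "$@"
}

pivot_with_placeholder <<'EOF'
a 16 moe max us
b 11 tom mic us
d 14 roe fox au
t 29 ann teo au
n 28 joe joe ca
EOF
```

Testing with `in` before reading the cell keeps awk from creating empty array elements, which matters if the array is inspected again later.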
Using gawk multidimensional arrays for collecting header columns and row indices:
awk '{
head[NR] = $1"_"$2;
idx[$5][NR] = $3"_"$4
}
END {
h = ""; col_size = length(head);
for (i = 1; i <= col_size; i++) {
h = sprintf("%s %s", h, head[i])
}
print h;
for (lab in idx) {
printf("%s", lab);
for (i = 1; i <= col_size; i++) {
v = sprintf("%s; %s", v, idx[lab][i])
}
print v;
v = "";
}
}' test.txt
a_16 b_11 d_14 t_29 n_28
ca; ; ; ; ; joe_joe
au; ; ; roe_fox; ann_teo;
us; moe_max; tom_mic; ; ;
Here is a Ruby script to do that:
ruby -e 'd=$<.read.
split(/\R/).
map(&:split).
map{|sa| sa.each_slice(2).map{|ss| ss.join("_") } }.
group_by{|sa| sa[-1] }
# {"us"=>[["a_16", "moe_max", "us"], ["b_11", "tom_mic", "us"]], "au"=>[["d_14", "roe_fox", "au"], ["t_29", "ann_teo", "au"]], "ca"=>[["n_28", "joe_joe", "ca"]]}
heads=d.values.flatten(1).map{|sa| sa[0]}
# ["a_16", "b_11", "d_14", "t_29", "n_28"]
hsh=Hash.new {|h,k| h[k] = ["\t"]*heads.length}
d.each{|k,v|
v.each{|sa|
hsh[k][heads.index(sa[0])]="\t#{sa[1]}"
}
}
puts heads.map{|e| "\t#{e}" }.join(";")
hsh.each{|k,v| puts "#{k};\t#{v.join(";")}"}
' file
Prints:
a_16; b_11; d_14; t_29; n_28
us; moe_max; tom_mic; ; ;
au; ; ; roe_fox; ann_teo;
ca; ; ; ; ; joe_joe
I have a file organized like this:
a b c d
x1
x2
x3
e f g h
x4
x5
x6
and so on. I would like to use awk to write another file as follows:
x1 x2 x3
x4 x5 x6
and so on. I am struggling since I'm still beginning to learn awk and sed. Any suggestions?
I would use GNU AWK for this task in the following way. Let file.txt content be
a b c d
x1
x2
x3
e f g h
x4
x5
x6
then
awk 'BEGIN{ORS=" "}NR==1{next}NF==1{print $1}NF>1{printf "\n"}' file.txt
gives output
x1 x2 x3
x4 x5 x6
Explanation: I tell GNU AWK to use a space as the output record separator (ORS). The first row is skipped with next. If a row has exactly one field, I print that field ($1), which gets a trailing space rather than a newline because ORS is set to a space. If a row has more than one field, I just printf a newline. Note that printf, unlike print, does not append ORS. If you want to know more about ORS, NR, or NF, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
(tested in gawk 4.2.1)
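The ORS approach leaves a trailing space on each output line and no newline at the very end. If that matters, one alternative is to build each output line in a variable and print it when the next header row (or the end of input) is reached. This is only a sketch under the same assumption as above, namely that header rows have more than one field and data rows exactly one; `join_groups` is a made-up name.

```shell
# Sketch only: join single-field rows into one line per group,
# where a group starts at every row with more than one field.
join_groups() {
  awk '
    NF > 1  { if (line != "") print line; line = ""; next }   # header row: flush previous group
    NF == 1 { line = (line == "" ? $1 : line " " $1) }        # data row: append to current group
    END     { if (line != "") print line }                    # flush the last group
  ' "$@"
}

printf 'a b c d\nx1\nx2\nx3\ne f g h\nx4\nx5\nx6\n' | join_groups
```

This variant uses nothing specific to GNU AWK.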
I have a CSV file as below:
C1, C2, C3,Cv1,Cv2,Cv3,Cv4 ... (there can be more columns)
x1, x2 ,x3.1, 1.1, 1.2, 1.3, 1.4
x1, x2, x3.2, 2.1, 2.2, 2.3, 2.4
x1, x2, x3.3, 3.1, 3.2, 3.3, 3.4
I would like to transform this CSV file as below:
C1,C2, C3,CTEXT,XVALUE
x1, x2, x3.1, Cv1 , 1.1
x1, x2, x3.1, Cv2 , 1.2
x1, x2, x3.1, Cv3 , 1.3
x1, x2, x3.1, Cv4 , 1.4
x1, x2, x3.2, Cv1 , 2.1
x1, x2, x3.2, Cv2 , 2.2
x1, x2, x3.2, Cv3 , 2.3
x1, x2, x3.2, Cv4 , 2.4
x1, x2, x3.3, Cv1 , 3.1
x1,x2,x3.3, Cv2 , 3.2
x1,x2,x3.3, Cv3 , 3.3
x1,x2,x3.3, Cv4 , 3.4
Below is my code:
#!/bin/bash
awk -F, -v OFS=, '{ if (NR==1)
{ print $1,$2,$3, "CTEXT","XVALUE"
i=4; while (i < NF) {
a[i]=$i; i=i+1
}
am=NF; next
}
i=4 ; while (i < am) {
if (i > NF) {print "record "NR" insufficient value" >/dev/stderr
break}
print $1,$2,$3,a[i],$i
i=i+1
}
if (am <NF) print "record "NR" too many values for text" >/dev/stderr
}' input.csv
When I run the script, it shows this error:
awk: syntax error near line 2
awk: bailing out near line 2
Edit by Ed Morton - I just ran the script through a beautifier (gawk -o- '...') so it's much easier to read/understand:
{
if (NR == 1) {
print $1, $2, $3, "CTEXT", "XVALUE"
i = 4
while (i < NF) {
a[i] = $i
i = i + 1
}
am = NF
next
}
i = 4
while (i < am) {
if (i > NF) {
print("record " NR " insufficient value") > (/dev/) stderr
break
}
print $1, $2, $3, a[i], $i
i = i + 1
}
if (am < NF) {
print("record " NR " too many values for text") > (/dev/) stderr
}
}
Even if you switch your Solaris awk to gawk or nawk, there still
remain some problems. Would you please try the following:
awk -F, -v OFS=, '
NR==1 {
print $1,$2,$3, "CTEXT","XVALUE"
for (i = 4; i <= NF; i++) a[i]=$i
am=NF; next
}
{
if (am < NF) {
print "record "NR" too many values for text" > "/dev/stderr"
next
}
for (i = 4; i <= am; i++) {
if (i > NF) {
print "record "NR" insufficient value" > "/dev/stderr"
break
}
print $1,$2,$3,a[i],$i
}
}' input.csv
You need to iterate i up to and including NF or am (use <=, not <).
Enclose /dev/stderr with quotes.
Better to use for loop rather than while.
Hope this helps.
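To see why quoting "/dev/stderr" matters in practice, here is a small sketch that keeps data on stdout and diagnostics on stderr, so the two can be redirected separately. It assumes an awk that accepts "/dev/stderr" as an output target (gawk and mawk do; elsewhere it depends on the operating system providing that file). The function name `unpivot` and the exact message text are illustrative only.

```shell
# Sketch only: unpivot columns 4..NF into (name, value) rows and
# report rows whose field count differs from the header on stderr.
unpivot() {
  awk -F, -v OFS=, '
    NR == 1 {
      ncols = NF
      for (i = 4; i <= NF; i++) name[i] = $i       # remember value-column names
      print $1, $2, $3, "CTEXT", "XVALUE"
      next
    }
    NF != ncols {
      print "record " NR ": expected " ncols " fields, got " NF > "/dev/stderr"
      next
    }
    { for (i = 4; i <= NF; i++) print $1, $2, $3, name[i], $i }
  ' "$@"
}

# The short third record is reported on stderr, which is discarded here.
printf 'C1,C2,C3,Cv1,Cv2\nx1,x2,x3.1,1.1,1.2\nx1,x2,x3.2,2.1\n' | unpivot 2>/dev/null
```

Without the quotes, awk parses /dev/ as a regular expression followed by a variable named stderr, which is what the beautified listing in the question shows.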
Something like this:
$ awk -F, 'BEGIN {OFS=FS}
NR==1 {n=split($0,h);
print $1,$2,$3,"CTEXT","XVALUE";
next}
n!=NF {print n<NF?"too many":"not enough";
exit}
{for(i=4;i<=NF;i++) print $1,$2,$3,h[i],$i}' file
C1,C2,C3,CTEXT,XVALUE
x1,x2,x3.1,Cv1,1.1
x1,x2,x3.1,Cv2,1.2
x1,x2,x3.1,Cv3,1.3
x1,x2,x3.1,Cv4,1.4
x1,x2,x3.2,Cv1,2.1
x1,x2,x3.2,Cv2,2.2
x1,x2,x3.2,Cv3,2.3
x1,x2,x3.2,Cv4,2.4
x1,x2,x3.3,Cv1,3.1
x1,x2,x3.3,Cv2,3.2
x1,x2,x3.3,Cv3,3.3
x1,x2,x3.3,Cv4,3.4
To illustrate the task: I have a dataframe:
x y1;y2;y3 z1;z2;z3
a b1;b2 c1;c2
I need:
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2
Column 1 always holds a single value. The number of values in a cell can range from one to many, but it is always the same in columns 2 and 3. Thanks
In awk:
$ awk -F"(\t|;)" '{
for(i=2;i<=4;i++)
if($i!="")
print $1, $i, $(i+3)
}' file
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2
Edit: Another version:
$ awk -F"(\t+|;)" '{ # FS tabs or semicolon
for(i=2;i<=int(NF/2)+1;i++)
print $1,$i,$(i+int(NF/2))
}' file
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2
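Another way to express the same idea is to keep tab as the only field separator and split columns 2 and 3 on semicolons explicitly. The sketch below assumes tab-separated input with exactly three columns and, as stated in the question, the same number of values in columns 2 and 3; `split_cells` is a hypothetical name.

```shell
# Sketch only: expand semicolon-separated cells in columns 2 and 3
# into one output row per value pair.
split_cells() {
  awk -F'\t' '{
    n = split($2, left, ";")       # values from column 2
    split($3, right, ";")          # values from column 3 (assumed same count)
    for (i = 1; i <= n; i++) print $1, left[i], right[i]
  }' "$@"
}

printf 'x\ty1;y2;y3\tz1;z2;z3\na\tb1;b2\tc1;c2\n' | split_cells
```

Because the cells are split separately, the loop bound comes from column 2 itself rather than from NF arithmetic.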
Something like this should make it:
declare -a cols=() # array for individual columns (line fields)
IFS=' ;' # fields separators
while read -a cols; do
n=${#cols[@]} # number of fields in current line
if (( n < 3 || n % 2 != 1 )); then # skip invalid lines
printf "skipping invalid line: %s\n" "${cols[*]}"
continue
fi
for (( i = 1; i <= n / 2; i += 1 )); do # loop over pairs of fields
# printf line
printf "%s %s %s\n" "${cols[0]}" "${cols[i]}" "${cols[n/2+i]}"
done
done < data.txt
Explanations:
IFS is the list of characters used by read to split a line in fields. In your case spaces and ; seem to be the separators.
read -a cols assigns the fields of the read line to the cols array, starting at cell 0.
Example of run:
$ cat data.txt
x y1;y2;y3 z1;z2;z3
a b1;b2 c1;c2
$ ./foo.sh
x y1 z1
x y2 z2
x y3 z3
a b1 c1
a b2 c2
I have a lot of modified files (after filtration) and I need to print NR and the character count of each new file into columns. Here is an example:
input files: x1, x2, x3, y1, y2, y3, z1, z2, z3 ...
script:
for i in x* y* z*
do awk -v h="$i" 'END{c+=length+1 ;print h "\t" NR "\t" c}' "$i" >> stats.txt
done;
my output looks like:
x1 NR c
x2 NR c
x3 NR c
y1 NR c
y2 NR c
y3 NR c
z1 NR c
z2 NR c
z3 NR c
And I need each loop iteration to go into a new column, not a new line:
x1 NR c y1 NR c z1 NR c
x2 NR c y2 NR c z2 NR c
x3 NR c y3 NR c z3 NR c
to keep corresponding files (after filtration) on the same line. I hope I am clear. I need to do this in BASH and awk. Thank you for any help!!
EDITED:
the real output looks like:
x 0.457143 872484
y 0.527778 445759
z 0.416667 382712
x 0.457143 502528
y 0.5 575972
z 0.444444 590294
x 0.371429 463939
y 0.694444 398033
z 0.56565 656565
.
.
.
and I need:
x 0.457143 872484 0.457143 502528 0.371429 463939
y 0.527778 445759 0.5 575972 0.694444 398033
.
.
.
I hope it is clear.
Try this:
cat data | tr -d , | awk '{for (i = 1; i <= NF; i += 3) print $i " NR c " $(i+1) " NR c " $(i+2) " NR c"}'
Output:
x1 NR c x2 NR c x3 NR c
y1 NR c y2 NR c y3 NR c
z1 NR c z2 NR c z3 NR c
Same table but transposed (for your task variant):
cat data | tr -d , | awk '{for (i = 1; i <= NF/3; i += 1) print $i " NR c " $(i+3) " NR c " $(i+6) " NR c"}'
Output:
x1 NR c y1 NR c z1 NR c
x2 NR c y2 NR c z2 NR c
x3 NR c y3 NR c z3 NR c
For your updated task, check the following solution (using bash):
cat data | sort | while read L;
do
y=`echo $L | cut -f1 -d' '`;
{
test "$x" = "$y" && echo -n " `echo $L | cut -f2- -d' '`";
} ||
{
x="$y";echo -en "\n$L";
};
done
(from my solution for a similar problem)
Updated script after comment:
sort data | while read L
do
y="`echo \"$L\" | cut -f1 -d' '`"
if [ "$x" = "$y" ]
then
echo -n " `echo \"$L\" | cut -f2- -d' '`"
else
x="$y"
echo -en "\n$L"
fi
done
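Since the question also allows awk, the same grouping can be sketched in a single awk pass that keeps keys in first-seen order instead of sorting. This assumes whitespace-separated lines with the key in the first field and no leading whitespace; the function name `merge_by_key` is made up for illustration.

```shell
# Sketch only: append the remaining fields of every line to the
# output row of its first field, keeping keys in first-seen order.
merge_by_key() {
  awk '
    {
      key = $1
      rest = $0
      sub(/^[^ \t]+[ \t]+/, "", rest)     # drop the key and the separator after it
      if (key in acc) {
        acc[key] = acc[key] " " rest
      } else {
        order[++n] = key                  # remember first-seen order
        acc[key] = rest
      }
    }
    END { for (i = 1; i <= n; i++) print order[i], acc[order[i]] }
  ' "$@"
}

merge_by_key <<'EOF'
x 0.457143 872484
y 0.527778 445759
z 0.416667 382712
x 0.457143 502528
y 0.5 575972
z 0.444444 590294
x 0.371429 463939
y 0.694444 398033
z 0.56565 656565
EOF
```

Unlike the sort-based loop, this holds all rows in memory until the end, which is usually fine for files of this kind.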