awk: print first column, then some values, and then all other columns

I want to print the first column, then a couple of columns with fixed values, like this command would do:
awk '{print $1,"1","2","1"}'
and then print all columns except the first after that...
I know this command prints all but the first column:
awk '{$1=""; print $0}'
But that gets rid of the first column.
In other words, this:
3 5 2 2
3 5 2 2
3 5 2 2
3 5 2 2
Needs to become this:
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
Any ideas?

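Use a loop to iterate through the rest of the columns. Use printf rather than print inside the loop, because print appends a newline (ORS) after every call:
awk '{printf "%s 1 2 1", $1; for(i=2;i<=NF;i++) printf " %s", $i; print ""}'
As an example:
$ echo "3 5 2 2" | awk '{printf "%s 1 2 1", $1; for(i=2;i<=NF;i++) printf " %s", $i; print ""}'
3 1 2 1 5 2 2
$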
Edit 1 (one value per line, with ORS and OFS set to a newline):
$ echo "3 5 2 2" | awk 'BEGIN{ORS="\n";OFS="\n"}{print $1,"1","2","1 ";for(i=2;i<=NF;i++) print $i" "}'
3
1
2
1
5
2
2
$
Edit 2 (with plain print, each remaining column lands on its own line):
$ echo "3 5 2 2" | awk '{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
3 1 2 1
5
2
2
$
Edit 3 (multi-line input, using printf to keep each record on one line):
$ echo "3 5 2 2
3 5 2 2
3 5 2 2
3 5 2 2" | awk '{printf("%s %s ", $1,"1 2 1");for(i=2;i<=NF;i++) printf("%s ", $i); printf "\n"}'
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2

You are almost there, you just need to store the first column in a temporary variable:
{
    head=$1;   # Store $1 in head, used later in printf
    $1="";     # Empty $1, so that $0 will not contain first column
    printf "%s 1 2 1%s\n", head, $0
}
And as a full command:
echo "3 5 2 2" | awk '{head=$1;$1="";printf "%s 1 2 1%s\n", head, $0}'

Another solution with awk:
awk '{sub(/.*/, "1 2 1 "$2, $2)}1' File
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
Substitute the 2nd field with "1 2 1" followed by 2nd field itself.

You can do this using sed by replacing the first space by the string you want.
sed 's/ / 1 2 1 /' file
(OR)
With awk by replacing the first field($1):
awk '{$1=$1 " 1 2 1"}1' file
(I prefer the sed solution since it has fewer characters.)
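
If the inserted values change from run to run, the awk variant above can take them as a parameter. This is only a minimal sketch: insert_after_first is a hypothetical helper name (not from the answers), and it assumes whitespace-separated input. Because assigning to $1 makes awk rebuild the record, runs of spaces or tabs in the input come out as single spaces.

```shell
# Hypothetical helper: insert a list of fixed values after the first column.
# $1 = values to insert (space-separated); the data is read from stdin.
insert_after_first() {
    # Appending to $1 forces awk to rebuild $0 with OFS (a single space).
    awk -v ins="$1" '{ $1 = $1 OFS ins } 1'
}

printf '3 5 2 2\n3 5 2 2\n' | insert_after_first "1 2 1"
```

With the sample line 3 5 2 2 this prints 3 1 2 1 5 2 2, the same result as the one-liners above.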

Related

AWK assign upper value for rank assignment during tie

I'm working on rank assignment to a list of values that is sorted in a file.
A miniature example is
Input:
1
2
2
2
3
4
Instead of normal ranking when there is a tie, I need to assign the upper value. So the required output is
1 1
2 4 #Note that it is not 2, since we have three 2's the upper bound is 4
2 4
2 4
3 5
4 6
I tried something like below, but it is not consistent.
$ awk ' BEGIN{t=0} NR==FNR { a[$1]++; next } { print $1,a[$1]+t; t=a[$1] } ' rank_in.txt rank_in.txt
1 1
2 4
2 6
2 6
3 4
4 2
This answer does normal ranking, so this question is not a duplicate.

Instead of making two passes or holding the counts in an awk array, let uniq -c count each run of equal values and rebuild the lines from those counts:
uniq -c file | awk '{n=n+$1;for(i=1;i<=$1;++i) print $2,n}' -
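
To see the uniq -c approach on the question's sample data, here is a small sketch wrapped in a function. The name upper_rank is hypothetical, and the sketch assumes sorted, single-column input on stdin (values containing spaces would not survive the $2 lookup).

```shell
# upper_rank is a hypothetical name; input on stdin must already be sorted.
upper_rank() {
    uniq -c |
    awk '{
        n += $1                      # running total = upper rank of this group
        for (i = 1; i <= $1; i++)    # emit the value once per occurrence
            print $2, n
    }'
}

printf '1\n2\n2\n2\n3\n4\n' | upper_rank
```

On the sample input this prints the six lines of the required output (1 1, then 2 4 three times, then 3 5 and 4 6).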

Two passes with just awk:
$ awk 'NR==FNR{rank[$1]=NR; next} {print $1, rank[$1]}' file file
1 1
2 4
2 4
2 4
3 5
4 6
or one pass with a pipe:
$ nl file | sort -k2,2 -k1,1nr | awk '$2!=prev{rank=$1; prev=$2} {print $2, rank}'
1 1
2 4
2 4
2 4
3 5
4 6
If you don't have nl on your system you could use cat -n or awk '{print NR, $0}' to generate the line numbers.

Try this awk:
awk 'FNR==NR {++fq[$1]; next} p != $1{s+=fq[$1]} {print p=$1, s}' file file
1 1
2 4
2 4
2 4
3 5
4 6

Assumptions:
input data is already sorted
Sample data:
$ cat rank.dat
1
2
2
2
3
4
One awk idea requiring a single pass through the file:
awk '
function print_rank() {
    for ( i=1 ; i<=cnt ; i++ )
        print id,rank
}
$1 != id { print_rank()   # if we have a new id, print last id
           cnt=0          # reset counter
         }
         { id=$1          # keep track of current id
           rank++         # increment rank by 1 for each new row processed
           cnt++          # keep track of number of times we see this id
         }
END      { print_rank() } # flush last id to stdout
' rank.dat
This generates:
1 1
2 4
2 4
2 4
3 5
4 6

Another awk
$ awk ' NR==FNR { a[$1]++; next } { print $1, FNR + --a[$1] } ' rank_in.txt rank_in.txt
1 1
2 4
2 4
2 4
3 5
4 6
$

How to compare two columns from the same file?

I have a long data file, file.txt:
1 3
3 2
2 3
5 5
8 9
The output file, out.txt, should be:
1 3
1 2
1 5
1 9
3 3
3 2
3 5

Could you please try the following?
awk '
FNR==NR{
  a[++count]=$2
  next
}
{
  for(i=1;i<=count;i++){
    print $1,a[i]
  }
}
' Input_file Input_file
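
The expected output in the question lists each second-column value only once (there is no second "1 3"), while the script above repeats duplicates. The question's output is truncated, so the following is only a guess at the intent: a sketch that pairs every first-column value with the distinct second-column values, in order of first appearance. The function name cross_unique and the temporary file are assumptions for illustration.

```shell
# Hypothetical helper: $1 = input file with two whitespace-separated columns.
cross_unique() {
    awk '
    FNR == NR {                      # first pass: collect distinct column-2 values
        if (!seen[$2]++) a[++count] = $2
        next
    }
    {                                # second pass: pair each column-1 value with them
        for (i = 1; i <= count; i++) print $1, a[i]
    }' "$1" "$1"
}

# Demo on the sample data from the question.
demo=$(mktemp)
printf '1 3\n3 2\n2 3\n5 5\n8 9\n' > "$demo"
cross_unique "$demo"
rm -f "$demo"
```

The first lines of the result are 1 3, 1 2, 1 5, 1 9, which matches the start of out.txt in the question.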

awk command to merge the content of the same file

I have an input file with the following content
1 1
2 1
3 289
4 1
5 2
0 Clear
1 Warning
2 Indeterminate
3 Minor
4 Major
5 Critical
I want to merge the first type of lines with the messages by the first column and obtain
1 1 Warning
2 1 Indeterminate
3 289 Minor
4 1 Major
5 2 Critical

Just use awk:
awk '$1 in a { print $1, a[$1], $2; next } { a[$1] = $2 }' file
Output:
1 1 Warning
2 1 Indeterminate
3 289 Minor
4 1 Major
5 2 Critical

Using join/sed, sed creates different views of the file for each part and join joins on the common field:
join <(sed '/^[0-9]* [0-9]* *$/!d' input) <(sed '/^[0-9]* [0-9]* *$/d' input)
Gives:
1 1 Warning
2 1 Indeterminate
3 289 Minor
4 1 Major
5 2 Critical

You can do this with Awk:
awk 'BEGIN{n=0}NR>6{n=1}n==0{a[$1]=$2}n==1{print $1,a[$1],$2}' file
or another way:
awk 'NR<=5{a[$1]=$2}$2~/[a-zA-Z]+/ && $1>0 && $1<=5{print $1,a[$1],$2}' file
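
Both commands above hard-code where the numeric block ends (NR>6, NR<=5). A sketch that avoids line numbers is to classify each line by whether its second field is purely numeric. This assumes that messages are never purely numeric, which holds for the sample but is not stated in the question; merge_by_key is a hypothetical name.

```shell
# Hypothetical helper: reads the mixed file on stdin.
merge_by_key() {
    awk '
    $2 ~ /^[0-9]+$/ { val[$1] = $2; order[++n] = $1; next }   # "key count" lines
                    { msg[$1] = $2 }                          # "key message" lines
    END {
        for (i = 1; i <= n; i++) {       # keep the order of the count lines
            k = order[i]
            if (k in msg) print k, val[k], msg[k]
        }
    }'
}

printf '1 1\n2 1\n3 289\n4 1\n5 2\n0 Clear\n1 Warning\n2 Indeterminate\n3 Minor\n4 Major\n5 Critical\n' | merge_by_key
```

Because the join happens in the END block, the two kinds of lines may appear in any order in the input.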

Comparing a few columns of a file with columns of another file

I have two data files 1.txt and 2.txt
1.txt contains valid lines.
For example.
1 2 1 2
1 3 1 3
In 2.txt I have an extra column, but if you ignore that, I have a few valid lines and a few invalid lines. There could be multiple occurrences of the same line in 2.txt.
For example:
1 2 1 2 1.9
1 3 1 3 3.4
1 3 1 3 3.4
2 3 2 3 5.6
2 3 2 3 5.6
The second and third lines are the same and valid.
The fourth and fifth lines are also the same but invalid.
I want to write a shell script which compares these two files and outputs two files, valid.txt and invalid.txt which look like these...
valid.txt :
1 2 1 2 1
1 3 1 3 2
and invalid.txt :
2 3 2 3 2
The last extra column of valid.txt and invalid.txt contains the number of times the line has been repeated in 2.txt.

This awk script works for the example data (note that it writes the invalid lines to inv.txt rather than invalid.txt):
awk 'NR==FNR{sub(/ *$/,"");a[$0]++;next}
{sub(/ [^ ]*$/,"")
if($0 in a)
v[$0]++
else
n[$0]++
}
END{
for(x in v)print x,v[x] > "valid.txt"
for(x in n) print x,n[x] >"inv.txt"
}' file1 file2
output:
kent$ head inv.txt valid.txt
==> inv.txt <==
2 3 2 3 2
==> valid.txt <==
1 3 1 3 2
1 2 1 2 1
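
A variation on the same idea builds the lookup key from the first four fields instead of trimming $0 with sub(), which makes it insensitive to stray spacing. This is a sketch under assumptions not stated in the question: the key is always exactly four columns, and the function name classify is hypothetical. It writes valid.txt and invalid.txt into the current directory, and the order of lines within each file is unspecified because awk's for-in order is.

```shell
# Hypothetical helper: $1 = reference file, $2 = data file with one extra column.
classify() {
    awk '
    NR == FNR { ok[$1 FS $2 FS $3 FS $4] = 1; next }    # remember valid keys
    {
        key = $1 FS $2 FS $3 FS $4                      # ignore the extra column
        if (key in ok) v[key]++; else n[key]++
    }
    END {
        for (k in v) print k, v[k] > "valid.txt"
        for (k in n) print k, n[k] > "invalid.txt"
    }' "$1" "$2"
}

# Demo in a scratch directory so no existing files are overwritten.
(
    cd "$(mktemp -d)" || exit 1
    printf '1 2 1 2\n1 3 1 3\n' > 1.txt
    printf '1 2 1 2 1.9\n1 3 1 3 3.4\n1 3 1 3 3.4\n2 3 2 3 5.6\n2 3 2 3 5.6\n' > 2.txt
    classify 1.txt 2.txt
    sort valid.txt
    cat invalid.txt
)
```

If 2.txt contains no invalid lines, invalid.txt is simply not created.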

Split specific column(s)

I have this kind of records:
1 2 12345
2 4 98231
...
I need to split the third column into sub-columns to get this (separated by single-space for example):
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Can anybody offer me a nice solution in sed, awk, ... etc ? Thanks!
EDIT: the size of the original third column may vary record by record.

Awk
% echo '1 2 12345
2 4 98231
...' | awk '{
gsub(/./, "& ", $3)
print
}
'
1 2 1 2 3 4 5
2 4 9 8 2 3 1
...
[Tested with GNU Awk 3.1.7]
This takes every character (/./) in the third column ($3) and replaces (gsub()) it with itself followed by a space ("& ") before printing the entire line.
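
One side effect of "& " is that the rewritten third column ends with a space. If that matters, a second substitution can trim it. The sketch below assumes whitespace-separated input and uses a hypothetical function name, split_col3; it also works when more columns follow the third one.

```shell
# Hypothetical helper: split the characters of column 3, reading from stdin.
split_col3() {
    awk '{
        gsub(/./, "& ", $3)      # put a space after every character of field 3
        sub(/ $/, "", $3)        # drop the trailing space that gsub leaves behind
        print                    # $0 was rebuilt with single-space separators
    }'
}

printf '1 2 12345\n2 4 98231 8 0\n' | split_col3
```

For these two lines it prints 1 2 1 2 3 4 5 and 2 4 9 8 2 3 1 8 0, with no trailing space.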

Sed solution:
sed -e 's/\([0-9]\)/\1 /g' -e 's/ \+/ /g'
The first sed expression replaces every digit with the same digit followed by a space. The second expression replaces every block of spaces with a single space, thus handling the double spaces introduced by the previous expression. With non-GNU seds you may need to use two sed invocations (one for each -e).

Using awk substr and printf:
[srikanth@myhost ~]$ cat records.log
1 2 12345 6 7
2 4 98231 8 0
[srikanth@myhost ~]$ awk '{ len=length($3); for(i=1; i<=NF; i++) { if(i==3) { for(j = 1; j <= len; j++){ printf "%s ", substr($3,j,1); } } else { printf "%s ", $i; } } printf("\n"); }' records.log
1 2 1 2 3 4 5 6 7
2 4 9 8 2 3 1 8 0
You can use this for more than three column records as well.

Using perl:
perl -pe 's/([0-9])(?! )/\1 /g' INPUT_FILE
Test:
[jaypal:~/Temp] cat tmp
1 2 12345
2 4 98231
[jaypal:~/Temp] perl -pe 's/([0-9])(?! )/\1 /g' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Using GNU sed:
sed 's/[0-9]/& /3g' INPUT_FILE
Test:
[jaypal:~/Temp] sed 's/[0-9]/& /3g' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Using GNU awk:
gawk '{print $1,$2,gensub(/./,"& ","G", $NF)}' INPUT_FILE
Test:
[jaypal:~/Temp] gawk '{print $1,$2,gensub(/./,"& ","G", $NF)}' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1

If you don't care about extra spaces, this is a succinct version:
sed 's/[0-9]/& /g'
but if you need to collapse the doubled spaces, just chain another expression:
sed 's/[0-9]/& /g;s/  */ /g'
Note that this uses only basic sed syntax, so it should run on any UNIX-like system.

$ awk -F '' '{$1=$1} 1' data.txt | tr -s ' '
1 2 1 2 3 4 5
2 4 9 8 2 3 1

This might work for you:
echo -e "1 2 12345\n2 4 98231" | sed 's/\B\s*/ /g'
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Most probably GNU sed only.
