awk command to select exact word in any field - shell

I have an input file like this:
ab,1,3,qqq,bbc
b,445,jj,abc
abcqwe,234,23,123
abc,12,bb,88
uirabc,33,99,66
I have to select the rows that contain 'abc' as a complete field, not as part of a longer value such as abcqwe. Note that abc can appear in any column. Please help me achieve this using awk.
Output:
b,445,jj,abc
abc,12,bb,88

You could also use grep with -E, since plain grep uses basic regular expressions, in which ( and | are literal characters:
grep -E '(^|,)abc(,|$)' file
Or, if you have to use awk:
awk '/(^|,)abc(,|$)/' file

Using awk
awk 'gsub(/(^|,)abc(,|$)/,"&")' file
b,445,jj,abc
abc,12,bb,88
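Based on beny23's regex.
It looks for abc preceded by either the start of the line (^) or a comma, and followed by either a comma or the end of the line ($). gsub() replaces each match with itself (&), so the line is left unchanged. Its return value, the number of substitutions, acts as the pattern, so the line is printed whenever that value is nonzero.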
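Here is a minimal self-contained sketch that runs the grep and awk variants of the same regex against the sample input from the question. The helper names exact_grep and exact_awk are hypothetical. It assumes a grep that supports -E, which GNU and BSD grep both do.

```shell
#!/bin/sh
# Sample input from the question.
sample='ab,1,3,qqq,bbc
b,445,jj,abc
abcqwe,234,23,123
abc,12,bb,88
uirabc,33,99,66'

# (^|,)  abc must start the line or follow a comma
# (,|$)  abc must end the line or be followed by a comma
# grep needs -E here: in basic regexes ( | ) are literal characters.
exact_grep() { grep -E '(^|,)abc(,|$)'; }
exact_awk()  { awk '/(^|,)abc(,|$)/'; }

printf '%s\n' "$sample" | exact_grep
printf '%s\n' "$sample" | exact_awk
```

Both helpers should print b,445,jj,abc and abc,12,bb,88. Lines like abcqwe,... and uirabc,... are skipped because abc is only part of those fields.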

Another one using beny23 regex:
awk 'NF>1' FS="(^|,)abc(,|$)" infile
Not asked, but if you feel the need to filter just the lines with exactly one occurrence:
$ cat infile
ab,1,3,qqq,bbc
b,445,jj,abc
abcqwe,234,23,123
abc,12,bb,88
abc,12,bb,abc
uirabc,33,99,66
This will be handy:
$ awk 'NF==2' FS="(^|,)abc(,|$)" infile
b,445,jj,abc
abc,12,bb,88
Also possible using Jotne's solution:
$ awk 'gsub(/(^|,)abc(,|$)/,"&")==1' infile
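An alternative to counting regex matches is to count fields one by one with an exact == comparison. The sketch below uses a hypothetical helper named once_only. This approach may be more robust when two abc fields are adjacent: in a line such as abc,abc, the regex match for the first abc can consume the comma the two fields share, so the gsub() count may come out lower than the real number of abc fields.

```shell
#!/bin/sh
# Hypothetical helper: print lines where the field "abc" occurs exactly once.
once_only() {
  awk -F, -v word="abc" '{
    n = 0
    for (i = 1; i <= NF; i++)   # visit every comma-separated field
      if ($i == word) n++       # exact string comparison, not a regex
    if (n == 1) print           # keep lines with a single occurrence
  }'
}

printf '%s\n' 'b,445,jj,abc' 'abc,12,bb,abc' 'abc,abc' 'uirabc,33,99,66' | once_only
```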

Through awk,
$ awk -F, '{for(i=1;i<=NF;i++){if($i=="abc") print $0;}}' file | uniq
b,445,jj,abc
abc,12,bb,88
OR
$ awk -F, '{for(i=1;i<=NF;i++){if($i=="abc") {print; next}}}' file
b,445,jj,abc
abc,12,bb,88
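In the awk commands above, the field separator (FS) is set to a comma with -F,. awk reads the input file line by line, and the for loop visits every field on the line. If a field is exactly abc, the whole line is printed. The first version prints a line once per matching field, which is why its output goes through | uniq. In the second version, next moves straight on to the following line after the first match, so no uniq is needed.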
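The same loop can take the word as a parameter instead of hard-coding abc. Below is a sketch with a hypothetical helper named select_exact. It passes the word in with -v and compares it literally with ==, so it is not treated as a regex. This assumes the word contains no backslashes, because awk processes escape sequences in -v values.

```shell
#!/bin/sh
# Hypothetical helper: print each line that has a field exactly equal to $1.
select_exact() {
  awk -F, -v word="$1" '{
    for (i = 1; i <= NF; i++)
      if ($i == word) { print; next }   # print once, then go to the next line
  }'
}

printf '%s\n' 'ab,1,3,qqq,bbc' 'b,445,jj,abc' 'abcqwe,234,23,123' \
              'abc,12,bb,88' 'uirabc,33,99,66' | select_exact abc
```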

Related

Replacing new line with comma separator

I have a text file that has records in the following format. Please note that there are no empty fields within the Name, ID and Rank sections.
"NAME","STUDENT1"
"ID","123"
"RANK","10"
"NAME","STUDENT2"
"ID","124"
"RANK","11"
I have to convert the above file to the below format
"STUDENT1","123","10"
"STUDENT2","124","11"
I understand that this can be achieved with a shell script by reading the records and writing them to another output file. But can this be done using awk or sed?
$ awk -F, '{ORS=(NR%3?FS:RS); print $2}' file
"STUDENT1","123","10"
"STUDENT2","124","11"
With awk:
awk -F, '$1=="\"RANK\""{print $2;next}{printf "%s,",$2}' file
With awk, printing newline each 3 lines:
awk -F, '{printf "%s",$2;if (NR%3){printf ","}else{print""};}' file
The following awk command may also help:
awk -F, '{ORS=$0~/^"RANK/?"\n":FS;print $NF}' Input_file
With sed
sed -E 'N;N;y/\n/ /;s/([^,]*)(,[^ ]*)/\2/g;s/,//' infile
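The ORS trick from the first answer generalizes to any group size. Here is a sketch with a hypothetical helper named join_every that takes the number of lines per record as an argument. Inside a group, ORS is a comma (FS). After every n-th line it is a newline (RS). If the line count is not a multiple of n, the last output line will have no trailing newline.

```shell
#!/bin/sh
# Hypothetical helper: join field 2 of every $1 input lines with commas.
join_every() {
  awk -F, -v n="$1" '{ ORS = (NR % n ? FS : RS); print $2 }'
}

printf '%s\n' '"NAME","STUDENT1"' '"ID","123"' '"RANK","10"' \
              '"NAME","STUDENT2"' '"ID","124"' '"RANK","11"' | join_every 3
```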

awk command: adding prefix to a csv file

I am trying to add a prefix to my CSV file. Below is the source CSV:
A,B
121ABC,London
2212ABC,Paris
312ABC,Tokyo
I am using the following awk command
$ awk -F=',' -vOFS=',' '{$2="AC_"$2; print}' t.csv >t1.csv
But the output adds another column to the CSV file instead:
A,B,AC_
121ABC,London,AC_
2212ABC,Paris,AC_
312ABC,Tokyo,AC_
Any pointers as to where the error is?
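You're setting FS to =, instead of ,. No line contains =, so each whole line becomes a single field, $2 starts out empty, and the assignment creates a new second field containing just AC_. Use -F',' or -v FS=',' but not -F=','.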
Since you require , for both input and output field separators you should be setting them together to that value in one place rather than setting them both separately to the same value:
awk 'BEGIN{FS=OFS=","} {$2="AC_"$2; print}' t.csv >t1.csv
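A quick way to see the FS mistake is to print NF, the number of fields awk found, for the same line with each option. This is a small sketch using one line of the sample CSV.

```shell
#!/bin/sh
line='121ABC,London'
# -F=',' : the shell strips the quotes, so awk sees -F=, and FS becomes "=,".
# No line contains "=,", so the whole line is one field and $2 is empty.
printf '%s\n' "$line" | awk -F=',' '{ print NF }'   # prints 1
# -F',' : FS is a single comma, so the line splits into two fields.
printf '%s\n' "$line" | awk -F',' '{ print NF }'    # prints 2
```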
You can use this awk:
awk 'BEGIN{FS=OFS=","} {$2 = "AC_" $2} 1' file
A,AC_B
121ABC,AC_London
2212ABC,AC_Paris
312ABC,AC_Tokyo
Perhaps simpler, with sed:
$ sed 's/,/&AC_/' file
A,AC_B
121ABC,AC_London
2212ABC,AC_Paris
312ABC,AC_Tokyo

Extract string between two patterns (inclusive) while conserving the format

I have a file in the following format
cat test.txt
id1,PPLLTOMaaaaaaaaaaaJACK
id2,PPLRTOMbbbbbbbbbbbJACK
id3,PPLRTOMcccccccccccJACK
I am trying to identify and print the string between TOM and JACK, including these two strings, while keeping the first column and the comma field separator (FS=,).
Desired output:
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
So far I have tried gsub:
awk -F"," 'gsub(/.*TOM|JACK.*/,"",$2) && !_[$0]++' test.txt > out.txt
and have the following output
id1 aaaaaaaaaaa
id2 bbbbbbbbbbb
id3 ccccccccccc
As you can see I am getting close but not able to include TOM and JACK patterns in my output. Plus I am also losing the original FS. What am I doing wrong?
Any help will be appreciated.
You are changing a field ($2), which causes awk to reconstruct the record using the value of OFS (a single space by default) as the field separator, so in this case the commas are changed to spaces.
Never use _ as a variable name: a name with no meaning is only slightly better than a name with the wrong meaning. Pick a name that means something, which in this case would be seen, although I don't know what you are trying to do by using it in this context.
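This small sketch shows the record rebuild on one line of the sample data. The replacement value x is arbitrary. Assigning to a field makes awk rebuild $0 with OFS, so the output separator depends entirely on whether OFS was set.

```shell
#!/bin/sh
line='id1,PPLLTOMaaaaaaaaaaaJACK'
# Only FS is set, so OFS keeps its default (a single space):
# assigning to $2 rebuilds $0 with a space between the fields.
printf '%s\n' "$line" | awk -F, '{ $2 = "x" } 1'                        # id1 x
# With OFS set to a comma too, the rebuilt record keeps its comma.
printf '%s\n' "$line" | awk 'BEGIN { FS = OFS = "," } { $2 = "x" } 1'   # id1,x
```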
gsub() and sub() do not support capture groups so you either need to use match()+substr():
$ awk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/){$2=substr($2,RSTART,RLENGTH)} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
or use GNU awk for the third argument to match():
$ gawk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/,a){$2=a[0]} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
or use GNU awk for gensub():
$ gawk 'BEGIN{FS=OFS=","} {$2=gensub(/.*(TOM.*JACK).*/,"\\1","",$2)} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
The main difference between the match() and gensub() solutions is how they would behave if TOM appeared twice on the line:
$ cat file
id1,PPLLfooTOMbarTOMaaaaaaaaaaaJACK
id2,PPLRTOMbbbbbbbbbbbJACKfooJACKbar
id3,PPLRfooTOMbarTOMcccccccccccJACKfooJACKbar
$
$ gawk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/,a){$2=a[0]} 1' file
id1,TOMbarTOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACKfooJACK
id3,TOMbarTOMcccccccccccJACKfooJACK
$
$ gawk 'BEGIN{FS=OFS=","} {$2=gensub(/.*(TOM.*JACK).*/,"\\1","",$2)} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACKfooJACK
id3,TOMcccccccccccJACKfooJACK
and just to show one way of stopping at the first instead of the last JACK on the line:
$ gawk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/,a){$2=gensub(/(JACK).*/,"\\1","",a[0])} 1' file
id1,TOMbarTOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMbarTOMcccccccccccJACK
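If GNU awk is not available, a similar "first TOM to first JACK" result can probably be obtained with POSIX-only functions. Here is a sketch with a hypothetical helper named first_tom_to_first_jack. match() with substr() takes the span from the first TOM to the last JACK, and index() then cuts that span off right after its first JACK.

```shell
#!/bin/sh
# Hypothetical helper: keep field 2 from the first TOM through the first JACK.
first_tom_to_first_jack() {
  awk 'BEGIN { FS = OFS = "," }
  match($2, /TOM.*JACK/) {
    s = substr($2, RSTART, RLENGTH)                          # first TOM ... last JACK
    $2 = substr(s, 1, index(s, "JACK") + length("JACK") - 1) # stop at first JACK
  } 1'
}

printf '%s\n' 'id1,PPLLfooTOMbarTOMaaaaaaaaaaaJACK' \
              'id2,PPLRTOMbbbbbbbbbbbJACKfooJACKbar' \
              'id3,PPLRfooTOMbarTOMcccccccccccJACKfooJACKbar' | first_tom_to_first_jack
```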
Use capture groups to save the parts of the line you want to keep. Here's how to do it with sed
sed 's/^\([^,]*,\).*\(TOM.*JACK\).*/\1\2/' <test.txt > out.txt
Do you mean to do the following?
$ cat test.txt
id1,PPLLTOMaaaaaaaaaaaJACKABCD
id2,PPLRTOMbbbbbbbbbbbJACKDFCC
id3,PPLRTOMcccccccccccJACKSDER
$ sed -e 's/,.*TOM/,TOM/' -e 's/JACK.*/JACK/' test.txt
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
$
This should work as long as TOM and JACK each appear only once on the line:
sed 's/\(.*,\).*\(TOM.*JACK\).*/\1\2/' <oldfile >newfile
Output:
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK

Add blank column using awk or sed

I have a file with the following structure (comma delimited)
116,1,89458180,17,FFFF,0403254F98
I want to insert a blank column as the 4th field so that it becomes
116,1,89458180,,17,FFFF,0403254F98
Any suggestions on how to do this using awk or sed, if possible?
Thank you.
Assuming that none of the fields contain embedded commas, you can restate the task as replacing the third comma with two commas. This is just:
sed 's/,/,,/3'
With the example line from the file:
$ echo "116,1,89458180,17,FFFF,0403254F98" | sed 's/,/,,/3'
116,1,89458180,,17,FFFF,0403254F98
You can use this awk,
awk -F, '$4="," $4' OFS=, yourfile
(OR)
awk -F, '$4=FS$4' OFS=, yourfile
To insert several blank columns at once (here, before the original fields 1, 4 and 6):
awk -F, '{$4=FS$4; $1=FS$1; $6=FS$6}1' OFS=, yourfile
Through awk
$ echo '116,1,89458180,17,FFFF,0403254F98' | awk -F, -v OFS="," '{print $1,$2,$3,","$4,$5,$6}'
116,1,89458180,,17,FFFF,0403254F98
It prints an extra comma after the third comma-delimited field.
Through GNU sed
$ echo 116,1,89458180,17,FFFF,0403254F98| sed -r 's/^([^,]*,[^,]*,[^,]*)(.*)$/\1,\2/'
116,1,89458180,,17,FFFF,0403254F98
It captures all the characters up to the third comma into one group, and the characters from the third comma to the end of the line into another group. The replacement simply adds a comma between the two captured groups.
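The capture-group approach can be parameterized with an interval expression instead of spelling out each [^,]* field. Below is a sketch with a hypothetical helper named insert_blank_field. It assumes a sed that supports -E (GNU and BSD sed do), a target position of at least 2, and no embedded commas in the fields.

```shell
#!/bin/sh
# Hypothetical helper: insert an empty field so that it becomes field number $1.
# ([^,]*,){k} matches k leading "field," pairs and [^,]* one more field,
# so group 1 holds the first $1 - 1 fields; a comma is appended after them.
insert_blank_field() {
  k=$(( $1 - 2 ))
  sed -E "s/^(([^,]*,){$k}[^,]*)/\1,/"
}

echo '116,1,89458180,17,FFFF,0403254F98' | insert_blank_field 4
```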
Through Basic sed
$ echo 116,1,89458180,17,FFFF,0403254F98| sed 's/^\([^,]*,[^,]*,[^,]*\)\(.*\)$/\1,\2/'
116,1,89458180,,17,FFFF,0403254F98
echo 116,1,89458180,17,FFFF,0403254F98|awk -F',' '{print $1","$2","$3",,"$4","$5","$6}'
Non-awk
t="116,1,89458180,17,FFFF,0403254F98"
echo $(echo $t|cut -d, -f1-3),,$(echo $t|cut -d, -f4-)
You can use the awk command below. Prefixing field N with FS ($N = "" FS $N) turns position N into a new empty column, so change $4 to whichever position you want to be blank.
awk -F, '{$4="" FS $4}1' OFS=, filename
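The column number can also be passed in as a variable rather than edited into the script. Here is a sketch with a hypothetical helper named blank_col. It assumes the requested position is between 1 and the current number of fields.

```shell
#!/bin/sh
# Hypothetical helper: make position $1 a new empty field by prefixing the
# field currently at that position with FS; OFS keeps the commas on rebuild.
blank_col() {
  awk -v col="$1" 'BEGIN { FS = OFS = "," } { $col = "" FS $col } 1'
}

echo '116,1,89458180,17,FFFF,0403254F98' | blank_col 4
```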
sed -e 's/\([^,]*,\)\{3\}/&,/' YourFile
This replaces the first three repetitions of [non-comma content followed by a comma] with themselves followed by an extra comma, so the new empty field becomes the 4th one.

Awk adding constant values

I have a text file with multiple lines of data like val1,val2
and I want to change each line to 1,val1,val2,0,0,1
I tried adding the constants with a print statement in awk (Solaris), but it didn't work.
What is the correct way to do it ?
(From the comments) This is what I tried
awk -F, '{print "%s","1,"$1","$2"0,0,1"}' test.txt
Based on the command you posted, a little change makes it:
$ awk -F, 'BEGIN{OFS=FS} {print 1,$1,$2,0,0,1}' file
1,val1,val2,0,0,1
OR using printf (I prefer print):
$ awk -F, '{printf "1,%s,%s,0,0,1\n", $1, $2}' file
1,val1,val2,0,0,1
To prepend every line with the constant 1 and append with 0,0,1 simply do:
$ awk '{print 1,$0,0,0,1}' OFS=, file
1,val1,val2,0,0,1
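If the constants change from run to run, they can be passed in with -v instead of being hard-coded in the program. Below is a sketch with a hypothetical helper named wrap_line. Note that on Solaris, /usr/bin/awk is the old awk and may not support -v; nawk or /usr/xpg4/bin/awk should.

```shell
#!/bin/sh
# Hypothetical helper: prepend $1 and append $2 to every line, comma-separated.
wrap_line() {
  awk -v pre="$1" -v post="$2" 'BEGIN { OFS = "," } { print pre, $0, post }'
}

printf '%s\n' 'val1,val2' 'val3,val4' | wrap_line 1 '0,0,1'
```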
An idiomatic way would be:
$ awk '$0="1,"$0",0,0,1"' file
1,val1,val2,0,0,1
Using sed:
sed 's/.*/1,&,0,0,1/' inputfile
Example:
$ echo val1,val2 | sed 's/.*/1,&,0,0,1/'
1,val1,val2,0,0,1
