Awk: match a TSV column and replace all rows with a prefix in bash

I have a TSV file with the following format:
HAPPY today I feel good
SAD this is a bad day
UPSET Hey please leave me alone!
I have to replace the first column value with a prefix like __label__ plus the value in lower case, so that the output is:
__label__happy today I feel good
__label__sad this is a bad day
__label__upset Hey please leave me alone!
I want to do this in the shell (using awk, sed, etc.).

awk 'BEGIN{FS=OFS="\t"}{ $1 = "__label__" tolower($1) }1' infile
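The trailing 1 is an always-true pattern whose default action is to print the (now modified) line, and setting FS and OFS to a tab keeps the remaining columns intact. A quick run on the sample (assuming it is saved as infile with tab separators):
$ awk 'BEGIN{FS=OFS="\t"}{ $1 = "__label__" tolower($1) }1' infile
__label__happy today I feel good
__label__sad this is a bad day
__label__upset Hey please leave me alone!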

The following awk may also help:
awk -F"\t" '{$1=tolower($1);printf("__label__%s\n",$0)}' OFS="\t" Input_file

another awk
$ awk 'sub($1,"__label__"tolower($1))' file
with GNU sed
$ sed -r 's/[^\t]+/__label__\L&/' file
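Here \L is a GNU sed extension that lowercases the remainder of the replacement (the matched text &), and -r enables extended regular expressions. A quick check on one sample line:
$ printf 'HAPPY\ttoday I feel good\n' | sed -r 's/[^\t]+/__label__\L&/'
__label__happy today I feel good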

Related

Find Everything between 2 strings -- Sed

I have a file which has data in the below format.
{"default":true,"groupADG":["ABC","XYZ:mno"],"groupAPR":true}
{"default":true,"groupADG":["PQR"],"groupAPR":true}
I am trying to get output as
"ABC","XYZ:mno"
"PQR"
I tried doing it using sed but I am going wrong somewhere.
sed -e 's/groupADG":[\(.*\)],"groupAPR"/\1/ file.txt
Regards.
Note: if anyone is voting the question down, I would request that you also give a reason. I tried to fix it myself, and since I was unable to do so, I posted it here. I also gave my attempt as an example.
Here is one potential solution:
sed -n 's/.*\([[].*[]]\).*/\1/p' file.txt
To exclude the brackets:
sed -n 's/.*\([[]\)\(.*\)\([]]\).*/\2/p'
Also, this would work using AWK:
awk -F'[][]' '{print $2}' file.txt
Just watch out for edge cases (e.g. if there are multiple fields with square brackets in the same line you may need a different strategy)
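For instance, with a hypothetical line that contains two bracketed groups, $2 only returns the first one:
$ echo '{"a":["X"],"b":["Y"]}' | awk -F'[][]' '{print $2}'
"X"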
With your shown samples, the following may also help:
awk 'match($0,/\[[^]]*/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Or, following OP's attempt, matching the "groupADG" key explicitly:
awk 'match($0,/"groupADG":\[[^]]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
With awk, setting FS to [][] and adding the condition /groupADG/:
awk -F'[][]' '/groupADG/ {print $2}' file
"ABC","XYZ:mno"
"PQR"

Grabbing only text/substring between 4th and 7th underscores in all lines of a file

I have a list.txt which contains the following lines.
Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta
Primer_Adapt_clean_KL01_BOLD1_500_KL01_BOLD1_500_N704_S507_L001_merged.fasta
Primer_Adapt_clean_LD03_BOLD2_Sessile_LD03_BOLD2_Sessile_N710_S506_L001_merged.fasta
Now I would like to grab only the substring between the 4th underscore and the 7th underscore, so that it appears as below:
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
I tried the below awk command but I guess I've got it wrong. Any help here would be appreciated. If this can be achieved via sed, I would be interested in that solution too.
awk -v FPAT="[^__]*" '$4=$7' list.txt
I feel like awk is overkill for this. You can just use cut to select the fields you want:
$ cut -d_ -f5-7 list.txt
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
awk 'BEGIN{FS=OFS="_"} {print $5,$6,$7}' file
Output:
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03

Replacing newline with comma separator

I have a text file with records in the following format. Please note that there are no empty fields within the NAME, ID and RANK sections.
"NAME","STUDENT1"
"ID","123"
"RANK","10"
"NAME","STUDENT2"
"ID","124"
"RANK","11"
I have to convert the above file to the below format
"STUDENT1","123","10"
"STUDENT2","124","11"
I understand that this can be achieved with a shell script by reading the records and writing them to another output file. But can this be done using awk or sed?
$ awk -F, '{ORS=(NR%3?FS:RS); print $2}' file
"STUDENT1","123","10"
"STUDENT2","124","11"
With awk:
awk -F, '$1=="\"RANK\""{print $2;next}{printf "%s,",$2}' file
With awk, printing a newline after every 3 lines:
awk -F, '{printf "%s",$2;if (NR%3){printf ","}else{print""};}'
The following awk may also help:
awk -F, '{ORS=$0~/^"RANK/?"\n":FS;print $NF}' Input_file
With sed:
sed -E 'N;N;y/\n/ /;s/([^,]*)(,[^ ]*)/\2/g;s/,//' infile
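A step-by-step reading of the same GNU sed command, split into separate expressions:
# N;N                        - read two more lines, so the pattern space holds one full record
# y/\n/ /                    - turn the embedded newlines into spaces
# s/([^,]*)(,[^ ]*)/\2/g     - drop the quoted label before each comma, keeping ,"value"
# s/,//                      - remove the leading comma
sed -E -e 'N;N' -e 'y/\n/ /' -e 's/([^,]*)(,[^ ]*)/\2/g' -e 's/,//' infile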

Delete every other row in CSV file using AWK or grep

I have a file like this:
1000_Tv178.tif,34.88552709
1000_Tv178.tif,
1000_Tv178.tif,34.66987165
1000_Tv178.tif,
1001_Tv180.tif,65.51335742
1001_Tv180.tif,
1002_Tv184.tif,33.83784863
1002_Tv184.tif,
1002_Tv184.tif,22.82542442
1002_Tv184.tif,
How can I make it like this using a simple Bash command?
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
In other words, I need to delete every other row, starting with the second.
Thanks!
hek2mgl's (deleted) answer was on the right track, given the output you actually desire.
awk -F, '$2'
This says, print every row where the second field has a value.
If the second field might be non-empty but contain only whitespace, and you want to exclude those rows too, try this:
awk -F, '$2~/.*[^[:space:]].*/'
You could also do this with sed:
sed '/,$/d'
Which says, delete every line that ends with a comma. I'm sure there's a better way, I avoid sed.
If you really want to explicitly delete every other row:
awk 'NR%2'
This says, print every row where the row number modulo 2 is not zero. If you really want to delete every even row it doesn't actually matter that it's a comma-delimited file.
awk provides a simple way
awk 'NR % 2' file.txt
This might work for you (GNU sed):
sed '2~2d' file
or:
sed 'n;d' file
Here's the gnu sed equivalent of the awk answers provided. Now you can safely use sed's -i flag, by specifying a backup extension:
sed -n -i.bak 'N;P' file.txt
Note that gawk4 can do this too:
gawk -i inplace -v INPLACE_SUFFIX=".bak" 'NR%2==1' file.txt
Results:
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
If OP's input does not contain a space after the last number or the trailing comma, this awk can be used:
awk '!/,$/'
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
But it's not robust at all; any space after the comma breaks it.
This should fix a trailing space:
awk '!/,[ ]*$/'
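If the trailing whitespace could also be a tab, the POSIX character class covers that as well:
awk '!/,[[:space:]]*$/'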
Thanks for your help guys, but I also made a workaround:
I read it into R and then wrote it out again. Then I installed the GNU version of awk and used gawk '{if ((FNR % 2) != 0) {print $0}}'. So if anyone else has the same problem, try it!

Replace everything between two characters

All.
I am a newbie to sed.
I want something like
Input:
ABC,DEF,GHI,JKL,MNO
Output:
ABC,,,,MNO
That is, I want to remove all the content between the ',' separators.
This might work for you (GNU sed):
sed 's/[^,]*,/,/2g' file
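The numeric flag combined with g (replace from the second match onward) is a GNU sed feature. With the sample line:
$ echo 'ABC,DEF,GHI,JKL,MNO' | sed 's/[^,]*,/,/2g'
ABC,,,,MNO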
you could set all fields between 1 and last to empty with awk:
awk -F, -v OFS="," '{for(i=2;i<NF;i++)$i=""}7'
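The trailing 7 is just an always-true pattern (like the usual 1) that triggers awk's default print. With the sample line:
$ echo 'ABC,DEF,GHI,JKL,MNO' | awk -F, -v OFS="," '{for(i=2;i<NF;i++)$i=""}7'
ABC,,,,MNO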
