Replace everything between two characters - bash

Hi all,
I am a newbie to sed.
I want something like this:
Input:
ABC,DEF,GHI,JKL,MNO
Output:
ABC,,,,MNO
In other words, I want to remove everything between each pair of ','.

This might work for you (GNU sed):
sed 's/[^,]*,/,/2g' file

You could set every field between the first and the last to empty with awk (the trailing 7 is just a true condition that triggers the default print):
awk -F, -v OFS="," '{for(i=2;i<NF;i++)$i=""}7'
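As a quick check, both commands can be run against the sample line from the question. This is a minimal sketch: the function names `blank_middle_sed` and `blank_middle_awk` are made up for illustration, and the sed variant assumes GNU sed, because POSIX leaves the combination of a number with `g` in the `s` flags unspecified.

```shell
#!/usr/bin/env bash
# Hypothetical helper names, for illustration only.

# GNU sed: the "2g" flag leaves the first match alone and replaces
# the second and every later "field plus comma" with a bare comma.
blank_middle_sed() {
  sed 's/[^,]*,/,/2g'
}

# awk: empty fields 2 .. NF-1; assigning to a field rebuilds the
# record with OFS, and the trailing 1 prints it.
blank_middle_awk() {
  awk -F, -v OFS="," '{ for (i = 2; i < NF; i++) $i = "" } 1'
}

line='ABC,DEF,GHI,JKL,MNO'
sed_out=$(printf '%s\n' "$line" | blank_middle_sed)
awk_out=$(printf '%s\n' "$line" | blank_middle_awk)
printf '%s\n%s\n' "$sed_out" "$awk_out"
```

Both pipelines should print ABC,,,,MNO for this input. The awk version is probably the safer choice if the script has to run on a non-GNU sed.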

Grabbing only text/substring between 4th and 7th underscores in all lines of a file

I have a list.txt which contains the following lines.
Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta
Primer_Adapt_clean_KL01_BOLD1_500_KL01_BOLD1_500_N704_S507_L001_merged.fasta
Primer_Adapt_clean_LD03_BOLD2_Sessile_LD03_BOLD2_Sessile_N710_S506_L001_merged.fasta
Now I would like to grab only the substring between the 4th underscore and 7th underscore such that it will appear as below
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
I tried the below awk command but I guess I've got it wrong. Any help here would be appreciated. If this can be achieved via sed, I would be interested in that solution too.
awk -v FPAT="[^__]*" '$4=$7' list.txt
I feel like awk is overkill for this. You can just use cut to select just the fields you want:
$ cut -d_ -f5-7 list.txt
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
awk 'BEGIN{FS=OFS="_"} {print $5,$6,$7}' file
Output:
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
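Since the question also asks about sed, here is one possible sketch. It assumes a sed that supports `-E` (extended regular expressions, available in GNU and BSD sed); the function name `fields_5_to_7` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper name, for illustration only.
# Skip the first four "field_" groups, capture fields 5-7 as group 2,
# and discard the rest of the line. -E enables extended regexes.
fields_5_to_7() {
  sed -E 's/^([^_]*_){4}(([^_]*_){2}[^_]*).*/\2/'
}

result=$(printf '%s\n' \
  'Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta' \
  'Primer_Adapt_clean_KL01_BOLD1_500_KL01_BOLD1_500_N704_S507_L001_merged.fasta' \
  'Primer_Adapt_clean_LD03_BOLD2_Sessile_LD03_BOLD2_Sessile_N710_S506_L001_merged.fasta' \
  | fields_5_to_7)
printf '%s\n' "$result"
```

Unlike cut, this leaves a line unchanged when it has fewer than six underscores, because the substitution simply does not match. For this task cut is still the shorter tool.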

Awk Match a TSV column and replace all rows with a prefix in bash

I have a TSV file with the following format:
HAPPY today I feel good
SAD this is a bad day
UPSET Hey please leave me alone!
I have to replace the first column value with a prefix like __label__ followed by my value in lowercase, so that the output is
__label__happy today I feel good
__label__sad this is a bad day
__label__upset Hey please leave me alone!
in the shell (using awk, sed, etc.).
awk 'BEGIN{FS=OFS="\t"}{ $1 = "__label__" tolower($1) }1' infile
The following awk may also help:
awk -F"\t" '{$1=tolower($1);printf("__label__%s\n",$0)}' OFS="\t" Input_file
another awk
$ awk 'sub($1,"__label__"tolower($1))' file
With GNU sed:
$ sed -r 's/[^\t]+/__label__\L&/' file
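To try the awk approach on input that really contains tabs, the sample rows can be generated with printf. This is a small sketch; the function name `add_label` is made up, and only two of the question's rows are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper name, for illustration only.
# Lowercase the first tab-separated column and prefix it with
# "__label__"; the other columns pass through unchanged.
add_label() {
  awk 'BEGIN { FS = OFS = "\t" } { $1 = "__label__" tolower($1) } 1'
}

# printf expands \t and \n, so the sample input has real tabs.
labelled=$(printf 'HAPPY\ttoday I feel good\nUPSET\tHey please leave me alone!\n' | add_label)
printf '%s\n' "$labelled"
```

Setting both FS and OFS to a tab matters here: with the default separators awk would split on spaces and rejoin the record with single spaces after the assignment to $1.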

Remove multiple lines and following based on string in bash

I have a (fasta) file input.fa that looks like this
>coucou
GAGAGATAGTATAGATATATAGGATATATA
>hello_world
GATATATTCTCTCTGAFAGACGACGACFGACTACTACGAC
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
I would like to get rid of both
>coucou
GAGAGATAGTATAGATATATAGGATATATA
and
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
What I am doing is (based on this solution by @Hai Vu)
$ awk '/hello/{getline;next} 1' input.fa | awk '/coucou/{getline;next} 1'
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
Is there a way of doing this (using awk, sed or a perl script) without piping the first awk result into a second awk command? (Something like awk '/hello&coucou/{getline;next} 1' input.fa)
Thanks for your answer!
One simple way:
$ awk '/hello/{getline;next} /coucou/{getline;next} 1' input.fa
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
Or if you prefer:
$ awk '/(hello)|(coucou)/{getline;next} 1' input.fa
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
A simple sed command can also handle this:
sed -nr '/>(hello|coucou)/{N;d};p' file
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
This might work for you (GNU sed):
sed -r '/>(coucou|ziva_wesh)/,+1d' file
This deletes the ranges of 2 lines (the match of the line containing >coucou or >ziva_wesh and the following line).
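The answers above all assume that each sequence occupies exactly one line after its header. If a record might span several lines, a flag-based awk filter is one option. The sketch below uses shortened placeholder sequences and a hypothetical function name `drop_records`, and removes the >coucou and >ziva_wesh records named in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper name, for illustration only.
# On every header line, set skip to 1 if the header is one of the
# unwanted records and to 0 otherwise; print lines while skip is 0.
drop_records() {
  awk '/^>/ { skip = ($0 ~ /^>(coucou|ziva_wesh)$/) } !skip'
}

# Shortened placeholder sequences; >coucou spans two lines on purpose.
kept=$(printf '%s\n' \
  '>coucou' 'GAGAGATAGTAT' 'AGATATATAGGA' \
  '>hello_world' 'GATATATTCTCT' \
  '>ziva_wesh' 'HAHTAHTAHTAH' \
  | drop_records)
printf '%s\n' "$kept"
```

Only the >hello_world record should remain. The header pattern is anchored with ^ and $ so that a name like >coucou2 would not be removed by accident.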

Delete every other row in CSV file using AWK or grep

I have a file like this:
1000_Tv178.tif,34.88552709
1000_Tv178.tif,
1000_Tv178.tif,34.66987165
1000_Tv178.tif,
1001_Tv180.tif,65.51335742
1001_Tv180.tif,
1002_Tv184.tif,33.83784863
1002_Tv184.tif,
1002_Tv184.tif,22.82542442
1002_Tv184.tif,
How can I make it like this using a simple Bash command? :
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
In other words, I need to delete every other row, starting with the second.
Thanks!
hek2mgl's (deleted) answer was on the right track, given the output you actually desire.
awk -F, '$2'
This says, print every row where the second field has a value.
If the second field may contain nothing but whitespace and you want to exclude those rows too, try this:
awk -F, '$2~/[^[:space:]]/'
You could also do this with sed:
sed '/,$/d'
This deletes every line that ends with a comma. I'm sure there's a better way; I tend to avoid sed.
If you really want to explicitly delete every other row:
awk 'NR%2'
This says, print every row where the row number modulo 2 is not zero. If you really want to delete every even row it doesn't actually matter that it's a comma-delimited file.
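One caveat about the `$2` filter earlier in this answer: awk treats a field that looks like the number zero as false, so `awk -F, '$2'` would also drop a row such as `b.tif,0`. A minimal sketch of the difference, using made-up rows and hypothetical helper names; comparing against the empty string keeps zero values:

```shell
#!/usr/bin/env bash
# Hypothetical helper names and sample rows, for illustration only.

# Truthiness test: drops rows whose second field is empty, but also
# rows whose second field is a numeric zero.
keep_truthy() { awk -F, '$2'; }

# Explicit string comparison: drops only rows with an empty field.
keep_nonempty() { awk -F, '$2 != ""'; }

rows='a.tif,34.8
a.tif,
b.tif,0
b.tif,'

truthy=$(printf '%s\n' "$rows" | keep_truthy)
nonempty=$(printf '%s\n' "$rows" | keep_nonempty)
printf 'truthy:\n%s\nnon-empty:\n%s\n' "$truthy" "$nonempty"
```

For the data in the question the two filters behave the same, since no value is zero.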
awk provides a simple way
awk 'NR % 2' file.txt
This might work for you (GNU sed):
sed '2~2d' file
or:
sed 'n;d' file
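Here's the GNU sed equivalent of the awk answers provided. You can safely use sed's -i flag by specifying a backup extension:
sed -n -i.bak 'N;P' file.txt
Be aware that with -n this drops the last line when the file has an odd number of lines, because N finds no next line and sed exits without printing; sed -n -i.bak 'p;n' file.txt avoids that.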
Note that gawk 4.1 and later can do this too:
gawk -i inplace -v INPLACE_SUFFIX=".bak" 'NR%2==1' file.txt
Results:
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
If the OP's input does not contain a space after the last number or after the comma, this awk can be used:
awk '!/,$/'
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
But it's not robust at all; any space after the , breaks it.
This should fix the last space:
awk '!/,[ ]*$/'
Thanks for your help, guys, but I also had to use a workaround:
I read the file into R and wrote it out again. Then I installed the GNU version of awk and used gawk '{if ((FNR % 2) != 0) {print $0}}'. So if anyone else has the same problem, try it!

Insert a blank line between every two lines in a file using shell, sed or awk

I have a file with many lines. I want to insert a blank line between every two lines.
For example, the original file is:
xfdljflsad
fjdiaopqqq
dioapfdja;
I want to make it:
xfdljflsad

fjdiaopqqq

dioapfdja;
How can I achieve this?
I would like to use a shell script, awk or sed for this.
Thanks!
With sed, use
sed G input-file
If pilcrow is correct and you do not want an additional newline at the end of the file,
then do:
sed '$!G' input-file
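The difference between the two sed commands is easiest to see by counting output lines. A small sketch with a three-line sample; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper names, for illustration only.

# G appends a newline plus the (empty) hold space to every line,
# so each line, including the last, is followed by a blank line.
double_space() { sed G; }

# $!G does the same on every line except the last one.
double_space_inner() { sed '$!G'; }

all_lines=$(printf 'a\nb\nc\n' | double_space | wc -l)
inner_lines=$(printf 'a\nb\nc\n' | double_space_inner | wc -l)
printf 'sed G: %s lines, sed $!G: %s lines\n' "$all_lines" "$inner_lines"
```

With three input lines, sed G should produce six lines and sed '$!G' five, the missing one being the trailing blank line.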
Another alternative is to use pr:
pr -dt input-file
awk '{print nl $0; nl="\n"}' file
My approach when I want to quickly run a regex over a file, in vim (note that a line break in the replacement is written \r, not \n):
vim file.txt
:%s/\n/\r\r/
Idiomatic awk:
awk 1 ORS='\n\n' file
Similar thing with perl:
perl -nE 'say' file
Append | head -n -1 (GNU head) if the final blank line is unwanted.
