Delete every other row in CSV file using AWK or grep - bash

I have a file like this:
1000_Tv178.tif,34.88552709
1000_Tv178.tif,
1000_Tv178.tif,34.66987165
1000_Tv178.tif,
1001_Tv180.tif,65.51335742
1001_Tv180.tif,
1002_Tv184.tif,33.83784863
1002_Tv184.tif,
1002_Tv184.tif,22.82542442
1002_Tv184.tif,
How can I make it look like this using a simple Bash command?
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
In other words, I need to delete every other row, starting with the second.
Thanks!

hek2mgl's (deleted) answer was on the right track, given the output you actually desire.
awk -F, '$2'
This says: print every row where the second field has a value.
If the second field may contain nothing but whitespace and you want to exclude those rows as well, try this:
awk -F, '$2~/[^[:space:]]/'
You could also do this with sed:
sed '/,$/d'
This says: delete every line that ends with a comma. I'm sure there's a better way; I tend to avoid sed.
If you really want to explicitly delete every other row:
awk 'NR%2'
This says: print every row where the row number modulo 2 is not zero. If you really want to delete every even row, it doesn't matter that the file is comma-delimited.
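One caveat with the value-based filter, sketched below on made-up data (the file names and numbers are hypothetical, not from the question): awk treats a numeric-looking field equal to zero as false, so awk -F, '$2' would also drop a row such as b.tif,0. Comparing against the empty string avoids that, and the positional NR%2 filter is shown for contrast.

```shell
# Hypothetical sample: the third row has a legitimate value of 0.
tmp=$(mktemp)
printf '%s\n' 'a.tif,34.5' 'a.tif,' 'b.tif,0' 'b.tif,' > "$tmp"

# Truthiness test: an empty field is false, but so is a numeric 0.
by_value=$(awk -F, '$2' "$tmp")

# String comparison: keeps every row whose second field is non-empty.
by_nonempty=$(awk -F, '$2 != ""' "$tmp")

# Positional filter: keeps rows 1, 3, 5, ... regardless of content.
by_position=$(awk 'NR % 2' "$tmp")

rm -f "$tmp"
printf '%s\n---\n%s\n---\n%s\n' "$by_value" "$by_nonempty" "$by_position"
```

If your data can never contain a zero, the shorter '$2' form is fine.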

awk provides a simple way:
awk 'NR % 2' file.txt

This might work for you (GNU sed):
sed '2~2d' file
or:
sed 'n;d' file

Here's the GNU sed equivalent of the awk answers provided. With GNU sed you can safely use the -i flag by specifying a backup extension:
sed -n -i.bak 'N;P' file.txt
Note that gawk4 can do this too:
gawk -i inplace -v INPLACE_SUFFIX=".bak" 'NR%2==1' file.txt
Results:
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
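A small aside on odd line counts: as far as I know, N ends the script when there is no next line, and with -n the pending last line is then not printed, so 'N;P' can lose the final odd row. The sketch below uses 'p;n' instead, which should keep it; the three-row input is made up for illustration.

```shell
# Three rows: an odd count, so the last row has no partner line.
odd_input=$(printf 'row1\nrow2\nrow3')

# p prints the current line; n then loads the next line, which is
# never printed because of -n. A missing next line simply ends sed.
kept=$(printf '%s\n' "$odd_input" | sed -n 'p;n')

printf '%s\n' "$kept"
```

The sample file in the question has an even number of rows, so either form gives the same result there.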

If the OP's input does not contain a space after the last number or after the comma, this awk can be used:
awk '!/,$/'
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
But it's not robust at all; any space after the comma breaks it.
This should handle trailing spaces:
awk '!/,[ ]*$/'
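If the file came from Windows or a spreadsheet export, the trailing characters may be tabs or carriage returns rather than plain spaces. That is only a guess about the OP's file, but a slightly wider character class covers it. A sketch on made-up input:

```shell
# Hypothetical input with CRLF endings and stray blanks after the comma.
kept=$(printf 'a.tif,1.5\r\na.tif,\r\nb.tif,2.5\r\nb.tif, \t\r\n' |
  awk '!/,[ \t\r]*$/' |   # drop rows with only blanks, tabs or CR after the comma
  tr -d '\r')             # strip the carriage returns from the rows that remain

printf '%s\n' "$kept"
```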

Thanks for your help, guys, but I also had to use a workaround:
I read the file into R and then wrote it out again. Then I installed the GNU version of awk and used gawk '{if ((FNR % 2) != 0) {print $0}}'. So if anyone else has the same problem, try it!

Related

Grabbing only text/substring between 4th and 7th underscores in all lines of a file

I have a list.txt which contains the following lines.
Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta
Primer_Adapt_clean_KL01_BOLD1_500_KL01_BOLD1_500_N704_S507_L001_merged.fasta
Primer_Adapt_clean_LD03_BOLD2_Sessile_LD03_BOLD2_Sessile_N710_S506_L001_merged.fasta
Now I would like to grab only the substring between the 4th underscore and 7th underscore such that it will appear as below
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
I tried the below awk command but I guess I've got it wrong. Any help here would be appreciated. If this can be achieved via sed, I would be interested in that solution too.
awk -v FPAT="[^__]*" '$4=$7' list.txt
I feel like awk is overkill for this. You can just use cut to select just the fields you want:
$ cut -d_ -f5-7 list.txt
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
awk 'BEGIN{FS=OFS="_"} {print $5,$6,$7}' file
Output:
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
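Since the question also asked about sed, here is one possible sketch. It assumes a sed that accepts -E (GNU and BSD sed both do) and that every line has at least seven underscore-separated fields; shorter lines are left unchanged.

```shell
# Sample line copied from the question.
line='Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta'

# Skip four "field_" groups, capture the next three fields, drop the rest.
middle=$(printf '%s\n' "$line" |
  sed -E 's/^([^_]*_){4}(([^_]*_){2}[^_]*).*/\2/')

printf '%s\n' "$middle"
```

For a plain delimiter like this, cut is still the simpler tool.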

Awk Match a TSV column and replace all rows with a prefix in bash

I have a TSV file with the following format:
HAPPY today I feel good
SAD this is a bad day
UPSET Hey please leave me alone!
I have to replace the first column value with a prefix like __label__ followed by the value in lowercase, so that the output is
__label__happy today I feel good
__label__sad this is a bad day
__label__upset Hey please leave me alone!
in the shell (using awk, sed, etc.).
awk 'BEGIN{FS=OFS="\t"}{ $1 = "__label__" tolower($1) }1' infile
The following awk may also help:
awk -F"\t" '{$1=tolower($1);printf("__label__%s\n",$0)}' OFS="\t" Input_file
Another awk:
$ awk 'sub($1,"__label__"tolower($1))' file
With GNU sed:
$ sed -r 's/[^\t]+/__label__\L&/' file
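To check that only the first column is touched, the explicit-field awk version can be run on two of the sample rows. This sketch assumes the columns really are separated by a single tab character:

```shell
# Two rows from the question, with a real tab after the label.
labelled=$(printf 'HAPPY\ttoday I feel good\nUPSET\tHey please leave me alone!\n' |
  awk 'BEGIN { FS = OFS = "\t" } { $1 = "__label__" tolower($1) } 1')

printf '%s\n' "$labelled"
```

Setting both FS and OFS to a tab keeps the spaces inside the text column intact, including the capital H in "Hey".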

sed command in Linux: how to use sed to replace only the first n occurrences

I want to replace only the first four occurrences of LC-COUNT=1. How can I do that?
sed -i "s/LC-COUNT=1/LC-COUNT=$LC_COUNT/1,4" file.txt
Try this (the 0,/re/ address form needs GNU sed):
sed -e '0,/LC-COUNT=1/s//LC-COUNT=\$LC_COUNT/' file.txt > output.txt
Running it once replaces the first occurrence of LC-COUNT=1 with LC-COUNT=$LC_COUNT and writes the result to output.txt. Note: you have to escape the $ character.
You will have to run it four times. After the first run, treat output.txt as the input file, that is, do the next replacement on output.txt.
I don't think sed can find and replace the first N occurrences in a single command.
In vim you can do a similar thing:
:%s/LC-COUNT=1/LC-COUNT=\$LC_COUNT/gc
The gc flags make vim ask for confirmation on each replacement. You can
This is better suited to awk.
Consider this awk command, which assumes LC-COUNT=1 is the whole line:
awk -F= '$1=="LC-COUNT" && $2==1 && c<4 {$2="=$LC_COUNT";c++}1' OFS= file
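If the goal is to substitute the value of the shell variable rather than the literal text $LC_COUNT (my reading of the double quotes in the question), one way is to pass it in with -v and count matches. A sketch, with a made-up value of 7 and a hypothetical max variable; it replaces at most one match per line and would also match LC-COUNT=10:

```shell
LC_COUNT=7   # hypothetical value, for illustration only

out=$(printf 'LC-COUNT=1\nother\nLC-COUNT=1\nLC-COUNT=1\nLC-COUNT=1\nLC-COUNT=1\n' |
  awk -v new="LC-COUNT=$LC_COUNT" -v max=4 '
    # sub() runs only while fewer than max replacements have been made
    n < max && sub(/LC-COUNT=1/, new) { n++ }
    { print }
  ')

printf '%s\n' "$out"
```

Note that a replacement value containing & or backslashes would need escaping before being handed to sub().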

insert a blank line between every two lines in a file using shell, sed or awk

I have a file with many lines. I want to insert a blank line between each two lines
for example
original file
xfdljflsad
fjdiaopqqq
dioapfdja;
I want to make it as:
xfdljflsad

fjdiaopqqq

dioapfdja;
How can I achieve this?
I want to use a shell script, awk, or sed for this.
Thanks!
With sed, use
sed G input-file
If pilcrow is correct and you do not want an additional newline at the end of the file,
then do:
sed '$!G' input-file
Another alternative is to use pr:
pr -dt input-file
awk '{print nl $0; nl="\n"}' file
My approach when I want to quickly regex a file (in the replacement part, vim needs \r for a line break; \n there inserts a NUL character):
vim file.txt
%s/\n/\r\r/g
Idiomatic awk:
awk 1 ORS='\n\n' file
Similar thing with perl:
perl -nE 'say' file
Append | head -n -1 if final newline is unwanted.
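The difference between the two sed variants is easiest to see by counting output lines. A quick sketch using the three sample lines from the question:

```shell
sample=$(printf 'xfdljflsad\nfjdiaopqqq\ndioapfdja;')

# G appends a newline plus the (empty) hold space to every line.
every_line=$(printf '%s\n' "$sample" | sed G | wc -l | tr -d ' ')

# $!G skips the last line, so no blank line is added at the end.
between_only=$(printf '%s\n' "$sample" | sed '$!G' | wc -l | tr -d ' ')

printf '%s %s\n' "$every_line" "$between_only"   # prints "6 5"
```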

Appending to line with sed, adding separator if necessary

I have a properties file, which, when unmodified has the following line:
worker.list=
I would like to use sed to append to that line a value so that after sed has run, the line in the file reads:
worker.list=test
But, when I run the script a second time, I want sed to pick up that a value has already been added, and thus adds a separator:
worker.list=test,test
That's the bit that stumps me (frankly sed scares me with its power, but that's my problem!)
Rich
That's easy! If you're running GNU sed, you can write it rather compactly:
sed -e '/worker.list=/{s/$/,myValue/;s/=,/=/}'
That'll add ',myValue' to the line, and then remove the comma (if any) after the equal sign.
If you're stuck on some other platform you need to break it apart like so
sed -e '/worker.list=/{' -e 's/$/,myValue/' -e 's/=,/=/' -e '}'
It's a pretty stupid script in that it doesn't know about the existence of values etc. (I suppose you CAN do more elaborate parsing, but why should you?), but I guess that's the beauty of it. Oh, and it will mangle a line like this
worker.list=,myval
which will turn into
worker.list=myval,test
If that's a problem let me know, and I'll fix that for you.
HTH.
You can also use awk. Set the field delimiter to "=", then what you want to append to is always field number 2. Example:
$ more file
worker.list=
$ awk -F"=" '/worker\.list/{$2=($2=="")? $2="test" : $2",test"}1' OFS="=" file
worker.list=test
$ awk -F"=" '/worker\.list/{$2=($2=="")? $2="test" : $2",test"}1' OFS="=" file >temp
$ mv temp file
$ awk -F"=" '/worker\.list/{$2=($2=="")? $2="test1" : $2",test1"}1' OFS="=" file
worker.list=test,test1
or the equivalent of the sed answer (append to $2 first, then strip a comma that directly follows the equals sign):
$ awk -F"=" '/worker\.list/{$2=$2",test1";sub("=,","=")}1' OFS="=" file
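To see the append-then-strip idea behave the same on the first and second run, here is a sketch that wraps the portable sed form in a small function. The name append_worker is made up for this example, and it assumes the value contains no /, & or backslash, since it is pasted straight into the sed replacement.

```shell
# append_worker FILE VALUE  (hypothetical helper, not from the answers above)
append_worker() {
  file=$1 value=$2
  tmp=$(mktemp) || return 1
  # Append ",VALUE" to the worker.list line, then drop a comma right after "=".
  sed -e '/^worker\.list=/{' -e "s/\$/,$value/" -e 's/=,/=/' -e '}' "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # write back in place so the file keeps its permissions
  rm -f "$tmp"
}

props=$(mktemp)
printf 'worker.list=\n' > "$props"

append_worker "$props" test
first=$(cat "$props")     # after the first run

append_worker "$props" test
second=$(cat "$props")    # after the second run

rm -f "$props"
printf '%s\n%s\n' "$first" "$second"
```

Writing to a temporary file avoids relying on sed -i, whose syntax differs between GNU and BSD sed.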
