Find Everything between 2 strings -- Sed - shell

I have a file which has data in the below format:
{"default":true,"groupADG":["ABC","XYZ:mno"],"groupAPR":true}
{"default":true,"groupADG":["PQR"],"groupAPR":true}
I am trying to get the output as:
"ABC","XYZ:mno"
"PQR"
I tried doing it using sed, but somewhere I am going wrong.
sed -e 's/groupADG":[\(.*\)],"groupAPR"/\1/ file.txt
Regards.
Note: if anyone is voting the question down, I would request a reason as well. I tried to fix this myself, and since I was unable to, I posted it here. I also included my attempt.

Here is one potential solution:
sed -n 's/.*\([[].*[]]\).*/\1/p' file.txt
To exclude the brackets:
sed -n 's/.*\([[]\)\(.*\)\([]]\).*/\2/p'
Also, this would work using AWK:
awk -F'[][]' '{print $2}' file.txt
Just watch out for edge cases (e.g. if there are multiple fields with square brackets in the same line, you may need a different strategy).
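For instance, if a line ever carried a second bracketed field, anchoring the match on the key name keeps it unambiguous. A sketch, where the extra "groupOther" field is hypothetical:
$ echo '{"groupOther":["zzz"],"groupADG":["ABC","XYZ:mno"],"groupAPR":true}' | sed -n 's/.*"groupADG":\[\([^]]*\)\].*/\1/p'
"ABC","XYZ:mno"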

With your shown samples, the following may also help:
awk 'match($0,/\[[^]]*/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Or, anchoring on "groupADG" as in the OP's attempt:
awk 'match($0,/"groupADG":\[[^]]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file

With awk, setting FS to [][] and adding the condition /groupADG/:
awk -F'[][]' '/groupADG/ {print $2}' file
"ABC","XYZ:mno"
"PQR"

Related

Grabbing only text/substring between 4th and 7th underscores in all lines of a file

I have a list.txt which contains the following lines:
Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta
Primer_Adapt_clean_KL01_BOLD1_500_KL01_BOLD1_500_N704_S507_L001_merged.fasta
Primer_Adapt_clean_LD03_BOLD2_Sessile_LD03_BOLD2_Sessile_N710_S506_L001_merged.fasta
Now I would like to grab only the substring between the 4th underscore and the 7th underscore, so that it appears as below:
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
I tried the awk command below, but I guess I've got it wrong. Any help here would be appreciated. If this can be achieved via sed, I would be interested in that solution too.
awk -v FPAT="[^__]*" '$4=$7' list.txt
I feel like awk is overkill for this. You can just use cut to select the fields you want:
$ cut -d_ -f5-7 list.txt
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
Or with awk:
awk 'BEGIN{FS=OFS="_"} {print $5,$6,$7}' file
Output:
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
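Since the question also asked about sed, here is one possible equivalent. It assumes every line has at least seven underscore-separated fields: skip the first four fields, capture the next three, and drop the rest.
$ sed -E 's/^([^_]*_){4}(([^_]*_){2}[^_]*).*/\2/' list.txt
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03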

Awk Match a TSV column and replace all rows with a prefix in bash

I have a TSV file with the following format:
HAPPY today I feel good
SAD this is a bad day
UPSET Hey please leave me alone!
I have to replace the first column value with a prefix like __label__ plus the value lowercased, so as to have as output:
__label__happy today I feel good
__label__sad this is a bad day
__label__upset Hey please leave me alone!
in the shell (using awk, sed, etc.).
awk 'BEGIN{FS=OFS="\t"}{ $1 = "__label__" tolower($1) }1' infile
The following awk may also help:
awk -F"\t" '{$1=tolower($1);printf("_label_%s\n",$0)}' OFS="\t" Input_file
Another awk:
$ awk 'sub($1,"__label__"tolower($1))' file
With GNU sed:
$ sed -r 's/[^\t]+/__label__\L&/' file
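\L is a GNU extension that lowercases the rest of the replacement, which is why this requires GNU sed. A quick check against the first sample line (assuming the columns are tab-separated):
$ printf 'HAPPY\ttoday I feel good\n' | sed -r 's/[^\t]+/__label__\L&/'
__label__happy	today I feel good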

awk unable to ignore "++"

check="a1++"
awk -F":" -v name="$check" 'tolower($2)~ tolower(name)' file.txt
It seems there is some issue with awk when working with a string containing "++": it is unable to retrieve the value from the file. However, if I change to check="44b" it seems to work perfectly fine.
file.txt:
apple:44b:Vietnam
orange:A1++ approved:China
jelly:-34:Malaysia
pear:98:Malaysia
As Glenn Jackman mentioned in comments, you should be using index vs ~ since + is a regex metacharacter.
If you use ++ then orange:A123 approved:China would also match...
You can do:
$ awk -F: -v name="$check" 'index(tolower($2), tolower(name))' file
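With the sample file above this prints:
orange:A1++ approved:China
index returns the (non-zero, hence true) position of the substring, with no regex interpretation of the + characters.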
This works on my awk, but don't use it, since the right-hand side is treated as a regex and special characters will take over. If a regex match is not intended, a substring match with index is better, as in the other answer.
$ check="a1++"; awk -F: -v name="$check" 'tolower($2)~tolower(name)' file
orange:A1++:China
Check your awk version; perhaps it's broken.
Or with the other input:
$ check="a1++"; awk -F: -v name="$check" 'tolower($2)~tolower(name)' file
orange:A1++ approved:China
will match as expected. Note that name is on the right-hand side of the ~ regex match and $2 is on the left.

remove first text with shell script

Please can someone help me with this bash script?
Let's say I have lots of files with URLs like below:
https://example.com/x/c-ark4TxjU8/mybook.zip
https://example.com/x/y9kZvVp1k_Q/myfilename.zip
My question is: how do I remove all the other text and leave only the file name?
I've tried to use the command described in this URL: How to delete first two lines and last four lines from a text file with bash?
But since the text is random, which means there are no exact numbers to count on, that code does not work.
You can use the sed utility to parse out just the filenames:
sed 's_.*\/__'
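For example, applied to the first sample URL:
$ echo 'https://example.com/x/c-ark4TxjU8/mybook.zip' | sed 's_.*\/__'
mybook.zip
Using _ as the delimiter keeps the pattern readable; the greedy .*\/ matches everything up to and including the last slash, and substituting it with nothing leaves just the filename.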
You can use awk. The easiest way that I find:
awk -F/ '{print $NF}' file.txt
or
awk -F/ '{print $6}' file.txt
You can also use sed:
sed 's;.*/;;' file.txt
You can use cut:
cut -d'/' -f6 file.txt

Delete every other row in CSV file using AWK or grep

I have a file like this:
1000_Tv178.tif,34.88552709
1000_Tv178.tif,
1000_Tv178.tif,34.66987165
1000_Tv178.tif,
1001_Tv180.tif,65.51335742
1001_Tv180.tif,
1002_Tv184.tif,33.83784863
1002_Tv184.tif,
1002_Tv184.tif,22.82542442
1002_Tv184.tif,
How can I make it like this using a simple Bash command?
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
In other words, I need to delete every other row, starting with the second.
Thanks!
hek2mgl's (deleted) answer was on the right track, given the output you actually desire.
awk -F, '$2'
This says, print every row where the second field has a value.
If the second field has a value but it is nothing but whitespace you want to exclude, try this:
awk -F, '$2~/.*[^[:space:]].*/'
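For example, given a hypothetical line whose second field is just a space:
$ printf '1000_Tv178.tif, \n1000_Tv178.tif,34.66987165\n' | awk -F, '$2~/.*[^[:space:]].*/'
1000_Tv178.tif,34.66987165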
You could also do this with sed:
sed '/,$/d'
Which says: delete every line that ends with a comma. I'm sure there's a better way; I avoid sed.
If you really want to explicitly delete every other row:
awk 'NR%2'
This says, print every row where the row number modulo 2 is not zero. If you really want to delete every even row it doesn't actually matter that it's a comma-delimited file.
awk provides a simple way:
awk 'NR % 2' file.txt
This might work for you (GNU sed):
sed '2~2d' file
or:
sed 'n;d' file
Here n auto-prints the current line and reads the next one into the pattern space, which d then deletes, so every second line is dropped.
Here's the GNU sed equivalent of the awk answers provided. Now you can safely use sed's -i flag, by specifying a backup extension:
sed -n -i.bak 'N;P' file.txt
Note that gawk4 can do this too:
gawk -i inplace -v INPLACE_SUFFIX=".bak" 'NR%2==1' file.txt
Results:
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
If the OP's input does not contain a space after the last number or the comma, this awk can be used:
awk '!/,$/'
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
But it's not robust at all; any space after the comma breaks it.
This handles a trailing space:
awk '!/,[ ]*$/'
Thanks for your help guys, but I also had to make a workaround: I read the file into R and then wrote it out again. Then I installed the GNU version of awk and used gawk '{if ((FNR % 2) != 0) {print $0}}'. So if anyone else has the same problem, try it!
