I have a file full of IDs which I need to use to build a list of URLs as part of a bash file.
ids.txt is as follows:
s_Foo
p_Bar
s1_Blah
e_Yah
The URLs will always end in a filename that contains the ID, in its own path.
I've looked around for how to prepend and append using sed, but cannot figure out how to do the duplicating copy/paste part (\1) using that tool. The ID can be anything, so pattern matching seems hard. Would duplicating everything before the line break be more sensible? I don't know.
How do I create something like this as urls.txt using sed or awk? Is it possible?
https://link.domain.com/list/s_Foo/s_Foo_meta.xml
https://link.domain.com/list/p_Bar/p_Bar_meta.xml
https://link.domain.com/list/s1_Blah/s1_Blah_meta.xml
https://link.domain.com/list/e_Yah/e_Yah_meta.xml
$ sed 's#.*#https://link.domain.com/list/&/&_meta.xml#' ids.txt
https://link.domain.com/list/s_Foo/s_Foo_meta.xml
https://link.domain.com/list/p_Bar/p_Bar_meta.xml
https://link.domain.com/list/s1_Blah/s1_Blah_meta.xml
https://link.domain.com/list/e_Yah/e_Yah_meta.xml
$ awk '{sub(/.*/,"https://link.domain.com/list/&/&_meta.xml")}1' ids.txt
https://link.domain.com/list/s_Foo/s_Foo_meta.xml
https://link.domain.com/list/p_Bar/p_Bar_meta.xml
https://link.domain.com/list/s1_Blah/s1_Blah_meta.xml
https://link.domain.com/list/e_Yah/e_Yah_meta.xml
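Try GNU sed (\S is a GNU extension, and the delimiter must be something other than / because the URL contains slashes):
sed -E 's#\S+#https://link.domain.com/list/&/&_meta.xml#' ids.txt > urls.txt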
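If the base URL lives in a shell variable, an awk printf avoids having to pick a delimiter or worry about & in the replacement text. The following is only a sketch: the temporary directory and the blank line in the sample input are assumptions added for illustration, and the base URL is the one from the question.

```shell
# Hypothetical sample input; in practice ids.txt already exists.
tmpdir=$(mktemp -d)
printf '%s\n' s_Foo p_Bar '' s1_Blah e_Yah > "$tmpdir/ids.txt"

base='https://link.domain.com/list'   # base URL taken from the question

# NF is zero on blank lines, so they are skipped; $1 is the ID, printed twice.
awk -v base="$base" 'NF { printf "%s/%s/%s_meta.xml\n", base, $1, $1 }' \
    "$tmpdir/ids.txt" > "$tmpdir/urls.txt"

cat "$tmpdir/urls.txt"
```

This assumes each ID is a single whitespace-free token on its own line, as in the sample ids.txt.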
I have a fasta with headers like this:
tr|Q7MX99|Q7MX99_PORGI_BACT
I would like them to say:
tr|Q7MX99|Q7MX99_PORGI_BACT_ORALMICROBIOME
So basically, whenever I have PORGI_BACT I want to append _ORALMICROBIOME to each instance.
I'm sure there is an easy fix through the terminal, but I can't seem to find it.
My first idea is to do something like:
sed 's/>.*/&_ORALMICROBIOME/' file.fa > outfile.fa
BUT I only want to add to specific header endings, and that is where I'm stuck.
Using sed:
sed -r 's/(^.*)(PORGI_BACT|HUMAN_MAM|TESTA_BACT)(.*$)/\1\2_ORALMICROBIOME\3/' file.fa > outfile.fa
Enable extended regular expressions with -r or -E. The pattern splits the line into three groups: everything before the tag, the tag itself (PORGI_BACT, HUMAN_MAM or TESTA_BACT), and everything after it. The replacement writes the first and second groups, then "_ORALMICROBIOME", and finally the third group.
You are close. Would you please try the following:
sed 's/^>.*PORGI_BACT/&_ORALMICROBIOME/' file.fa > outfile.fa
[Edit]
According to the OP's requirement, how about:
sed -E 's/^>.*(PORGI_BACT|HUMAN_MAM|TESTA_BACT)/&_ORALMICROBIOME/' file.fa > outfile.fa
Sample input as file.fa:
>SEQ0|tr|Q7MX99|Q7MX99_PORGI_BACT
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>SEQ1|tr|Q7MX88|Q7MX88_HUMAN_MAM
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>SEQ2|tr|Q7MX77|Q7MX77_TESTA_BACT
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>SEQ3|tr|Q7MX66|Q7MX66_DUMMY
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
Output:
>SEQ0|tr|Q7MX99|Q7MX99_PORGI_BACT_ORALMICROBIOME
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>SEQ1|tr|Q7MX88|Q7MX88_HUMAN_MAM_ORALMICROBIOME
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>SEQ2|tr|Q7MX77|Q7MX77_TESTA_BACT_ORALMICROBIOME
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>SEQ3|tr|Q7MX66|Q7MX66_DUMMY
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
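If the suffix should be added only when the header actually ends in one of the tags, the match can be anchored with $ and restricted to header lines with an address. This is a sketch under that assumption; the shortened sequences and the temporary directory are made up for the example.

```shell
# Hypothetical, shortened sample input.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file.fa" <<'EOF'
>SEQ0|tr|Q7MX99|Q7MX99_PORGI_BACT
FQTWEEF
>SEQ3|tr|Q7MX66|Q7MX66_DUMMY
MYQVWEEF
EOF

# /^>/ limits the command to header lines; $ requires the tag at the end of the line.
sed -E '/^>/ s/(PORGI_BACT|HUMAN_MAM|TESTA_BACT)$/&_ORALMICROBIOME/' \
    "$tmpdir/file.fa" > "$tmpdir/outfile.fa"

cat "$tmpdir/outfile.fa"
```

Sequence lines and headers with other endings pass through unchanged.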
I am doing a find and replace using sed in a bash script. I want to search each file for the words files and no. If both words are present on the same line, replace red with green; otherwise do nothing.
sed -i -e '/files|no s/red/green' $file
But this doesn't work. I am not receiving any error and the file doesn't get updated.
What am I doing wrong here, or what is the correct way of achieving my result?
/files|no/ means to match lines with either files or no, it doesn't require both words on the same line.
To match the words in either order, use /files.*no|no.*files/.
sed -i -r -e '/files.*no|no.*files/s/red/green/' "$file"
Notice that you need another / at the end of the pattern, before s, and the s operation requires / at the end of the replacement.
And you need the -r option to make sed use extended regexp; otherwise you have to use \| instead of just |.
This might work for you (GNU sed):
sed '/files/{/no/s/red/green/}' file
or:
sed '/files/!b;/no/s/red/green/' file
This method allows for easy extension e.g. foo, bar and baz:
sed '/foo/!b;/bar/!b;/baz/!b;s/red/green/' file
or fee, fie, foe and fix:
sed '/fee/!b;/fi/!b;/foe/!b;/fix/!b;s/bacon/cereal/' file
An awk version:
awk '/files/ && /no/ {sub(/red/,"green")} 1' file
/files/ && /no/: files and no have to be on the same line, in any order.
sub(/red/,"green"): replace red with green. Use gsub(/red/,"green") if there are multiple occurrences of red.
1: always true, so the default action runs and the line is printed.
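To see which lines are affected, the sed and awk approaches can be run side by side on a small sample. The sample file below is invented for illustration.

```shell
# Hypothetical sample: only the first two lines contain both words.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.txt" <<'EOF'
files red no
no red files
files red
no red
EOF

# sed: skip lines without "files", then substitute only where "no" also matches.
out_sed=$(sed -e '/files/!b' -e '/no/s/red/green/' "$tmpdir/sample.txt")
# awk: the same condition expressed with the && operator.
out_awk=$(awk '/files/ && /no/ { sub(/red/, "green") } 1' "$tmpdir/sample.txt")

printf '%s\n' "$out_sed"
```

Both commands change only the first two lines. Note that these patterns match substrings, so a line containing "note" also counts as containing "no".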
I've never used sed apart from the few hours trying to solve this. I have a config file with parameters like:
test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value
I need to parse these and output this if REGIONID is US:
test.param=value
prod.param=value
Any help on how to do this (with sed or otherwise) would be great.
This works for me:
sed -n 's/\.us\././p'
i.e. if the ".us." can be replaced by a dot, print the result.
If there are hundreds and hundreds of lines it might be more efficient to first search for lines containing .us. and then do the string replacement. AWK is another good choice, or pipe grep into sed:
grep '\.us\.' INPUT_FILE | sed 's/\.us\././'
Of course, if '.us.' can appear in the value, this isn't sufficient.
You could also do this with the address syntax (technically you can embed the second sed into the first statement as well, I just can't remember the syntax):
sed -n '/\(prod\|test\)\.us\.[^=]*=/p' FILE | sed 's/\.us\././'
We should probably do something cleaner. If the format is always environment.region.param, we can restrict the substitution to the text before the equals sign:
sed -n 's/^\([^.=]*\)\.us\.\([^=]*\)=/\1.\2=/p' FILE
This only matches lines that start with any number of characters other than '.' or '=', followed by '.us.', and then any number of characters before the '=' sign. This way '.us.' is left alone if it appears within a value.
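Since the question mentions a REGIONID, the region can be taken from a shell variable instead of being hard-coded. A minimal sketch, assuming REGIONID holds an upper-case code such as US and the keys use the lower-case form; the config file contents below are invented, including one value that contains ".us." on purpose.

```shell
# Hypothetical config file for illustration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/app.conf" <<'EOF'
test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=has.us.inside
EOF

REGIONID=US
# Lower-case the region so it matches the key format (assumption).
region=$(printf '%s' "$REGIONID" | tr '[:upper:]' '[:lower:]')

# Double quotes let the shell expand $region; the sed backslashes pass through unchanged.
out=$(sed -n "s/^\([^.=]*\)\.$region\.\([^=]*=\)/\1.\2/p" "$tmpdir/app.conf")
printf '%s\n' "$out"
```

Only the two .us. keys are printed, with the region removed; the last line is skipped because its key region is eu, even though its value contains ".us.".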
I am attempting to write a bash script that will use sed to replace an entire line in a text file beginning with a given string, and I only want it to perform this replacement for the first match.
For example, in my text file I may have:
hair=brown
age=25
eyes=blue
age=35
weight=177
And I may want to simply replace the first occurrence of a line beginning with "age" with a different number without affecting the 2nd instance of age:
hair=brown
age=55
eyes=blue
age=35
weight=177
So far, I've come up with
sed -i "0,/^PATTERN/s/^PATTERN/PATTERN=XY/" test.txt
but this will only replace the string "age" itself rather than the entire line. I've been trying to throw a "\c" in there somewhere to change the entire line but nothing is working so far. Does anyone have any ideas as to how this can be resolved? Thanks.
As @ruakh suggests, you can use
sed -i "0,/^PATTERN/ s/^PATTERN=.*$/PATTERN=XY/" test.txt
A shorter and less repetitive way of doing the same would be
sed -i '0,/^\(PATTERN=\).*/s//\1XY/' test.txt
which takes advantage of backreferences and the fact that not specifying a pattern in an s-expression will use the previously matched pattern.
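0,... ranges only work in GNU sed. An alternative might be to let sed quit after the first replacement and have cat print the rest of the redirected file (this relies on sed leaving the file offset just after the line where it quits):
{ sed -e '/^\(PATTERN=\).*/!b' -e 's//\1VAL/' -e q; cat; } < file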
or use awk:
awk 'BEGIN { FS = OFS = "=" } $1 == "LABEL" && !n++ { $2 = "VALUE" } 1' file
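For the age example from the question, a portable way to replace the whole first matching line is to track a flag in awk and write the result through a temporary file. This is a sketch only; the variable names key, val and seen are made up for the example.

```shell
# Sample data from the question, written to a temporary file.
tmpdir=$(mktemp -d)
printf '%s\n' hair=brown age=25 eyes=blue age=35 weight=177 > "$tmpdir/test.txt"

key=age
val=55

# index(...) == 1 means the line starts with "age="; seen ensures only the first hit changes.
awk -v key="$key" -v val="$val" '
    !seen && index($0, key "=") == 1 { $0 = key "=" val; seen = 1 }
    { print }
' "$tmpdir/test.txt" > "$tmpdir/test.txt.new" && mv "$tmpdir/test.txt.new" "$tmpdir/test.txt"

cat "$tmpdir/test.txt"
```

Using index() instead of a regex means the key is compared literally, so characters such as . in the key need no escaping.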
I'm looking for a way to remove lines within multiple csv files, in bash using sed, awk or anything appropriate, where the line ends in 0.
So there are multiple csv files, their format is:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLElong,60,0
EXAMPLEcon,120,6
EXAMPLEdev,60,0
EXAMPLErandom,30,6
So the file will be amended to:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
A problem which I can see arising is distinguishing between double digits that end in zero and 0 itself.
So any ideas?
Using your file, something like this?
$ sed '/,0$/d' test.txt
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
For this particular problem, sed is perfect, as the others have pointed out. However, awk is more flexible, i.e. you can filter on an arbitrary column:
awk -F, '$3!=0' test.csv
This will print the entire line if column 3 is not 0.
Use sed to remove only lines ending with ",0":
sed '/,0$/d'
You can also use awk:
$ awk -F"," '$NF!=0' file
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
This checks the last field and doesn't print the line if it is 0.
sed '/,[[:blank:]]*0$/d' file
I would tend towards sed, but there is an egrep (or grep -E) solution too:
egrep -v ",0$" example.csv
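Since the question involves multiple csv files that should be amended, the filter can be wrapped in a loop that rewrites each file through a temporary copy. A minimal sketch; the file names a.csv and b.csv and the temporary directory are assumptions for the example.

```shell
# Two hypothetical CSV files built from the sample rows in the question.
tmpdir=$(mktemp -d)
printf '%s\n' 'EXAMPLEfoo,60,6' 'EXAMPLEbar,30,10' 'EXAMPLElong,60,0' > "$tmpdir/a.csv"
printf '%s\n' 'EXAMPLEdev,60,0' 'EXAMPLErandom,30,6' > "$tmpdir/b.csv"

for f in "$tmpdir"/*.csv; do
    # Keep rows whose last field is not 0; replace the file only if awk succeeded.
    awk -F, '$NF != 0' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat "$tmpdir"/*.csv
```

Because the comparison is numeric, a last field of 10 is kept while 0 is dropped. Writing to a temporary file and renaming avoids depending on sed -i, whose syntax differs between implementations.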