how to edit url string with sed - bash

My Linux repository file contain a link that until now was using http with a port number to point to it repository.
baseurl=http://host.domain.com:123/folder1/folder2
I now need a way to replace that URL to use https with no port or a different port .
I need also the possibility to change the server name for example from host.domain.com to host2.domain.com
So my idea was to use sed to search for the start of the http until the first / that come after the 2 // thus catching whatever in between and will give me the ability to change both server name port or http\s usage.
Im now using this code (im using echo just for the example):
the example shows how in 2 cases where one time i have a link with http and port 123 converted to https and the second time the other way around
and both code i was using the same sed for generic reasons.
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
OR
WANTED_URL="http://host.domain.com:123"
echo 'https://host.domain.com/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
is that the correct way doing so?

sed regexes are greedy by default. You can tell sed to consume only non-slashes, like this:
echo 'http://host.domain.com:123/folder1/folder2' | sed -e 's|http://[^/]*|https://host.domain.com|'
result:
https://host.domain.com/folder1/folder2
(BTW you don't have to escape slashes because you are using an alternate separating character)
the key is using [^/]* which will match anything but slashes so it stops matching at the first slash (non-greedy).
You used /.*/ and .* can contain slashes, not that you wanted (greedy by default).
Anyway my approach is different because expression does not include the trailing slash so it is not removed from final output.

Assuming it doesn't really matter if you have 1 sed script or 2 and there isn't a good reason to hard-code the URLs:
$ echo 'http://host.domain.com:123/folder1/folder2' |
sed 's|\(:[^:]*\)[^/]*|s\1|'
https://host.domain.com/folder1/folder2
$ port='123'; echo 'https://host.domain.com/folder1/folder2' |
sed 's|s\(://[^/]*\)|\1:'"$port"'|'
http://host.domain.com:123/folder1/folder2
If that isn't what you need then edit your question to clarify your requirements and in particular explain why:
You want to use hard-coded URLs, and
You need 1 script to do both transformations.
and provide concise, testable sample input and expected output that demonstrates those needs (i.e. cases where the above doesn't work).
wrt what you had:
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
The main issues are:
Don't use all-upper-case for non-exported shell variable names to avoid clashes with exported variables and to avoid obfuscating your code (this convention has been around for 40 years so people expect all upper case variables to be exported).
Never enclose any script in double quotes as it exposes the whole script to the shell for interpretation before the command you want to execute even sees it. Instead just open up the single quotes around the smallest script segment possible when necessary, i.e. to expand $y in a script use cmd 'x'"$y"'z' not cmd "x${y}z" because the latter will fail cryptically and dangerously given various input, script text, environment settings and/or the contents of the directory you run it from.
The -i option for sed is to edit a file in-place so you can't use it on an incoming pipe because you can't edit a pipe in-place.
When you let a shell variable expand to become part of a script, you have to take care about the possible characters it contains and how they'll be interpreted by the command given the context the variable expands into. If you let a whole URL expand into the replacement section of a sed script then you have to be careful to first escape any potential backreference characters or script delimiters. See Is it possible to escape regex metacharacters reliably with sed. If you just let the port number expand then you don't have to deal with any of that.

Related

Combine two expression in Bash

I did check the ABS, but it was hard to find a reference to my problem/question there.
Here it is. Consider the following code (Which extracts the first character of OtherVar and then converts MyVar to uppercase):
OtherVar=foobar
MyChar=${OtherVar:0:1} # get first character of OtherVar string variable
MyChar=${MyChar^} # first character to upper case
Could I somehow condense the second and third line into one statement?
P.S.: As was pointed out below, not needs to have a named variable. I should add, I would like to not add any sub-shells or so and would also accept a somehow hacky way to achieve the desired result.
P.P.S.: The question is purely educational.
You could do it all-in-one without forking sub-shell or running external command:
printf -v MyChar %1s "${OtherVar^}"
Or:
read -n1 MyChar <<<"${OtherVar^}"
Another option:
declare -u MyChar=${OtherVar:0:1}
But I can't see the point in such optimization in a bash script.
There are more suitable text processing interpreters, like awk, sed, even perl or python if performance matters.
You could use the cut command and put it in a complex expression to get it on one line, but I'm not sure it makes the code too much clearer:
OtherVar=foobar
MyChar=$(echo ${OtherVar^} | cut -c1-1) # uppercase first character and cut string

How do I locally source environment variables that I have defined in a Docker-format env-file?

I've written a bunch of environment variables in Docker format, but now I want to use them outside of that context. How can I source them with one line of bash?
Details
Docker run and compose have a convenient facility for importing a set of environment variables from a file. That file has a very literal format.
The value is used as is and not modified at all. For example if the value is surrounded by quotes (as is often the case of shell variables), the quotes are included in the value passed
Lines beginning with # are treated as comments and are ignored
Blank lines are also ignored.
"If no = is provided and that variable is…exported in your local environment," docker "passes it to the container"
Thankfully, whitespace before the = will cause the run to fail
so, for example, this env-file:
# This is a comment, with an = sign, just to mess with us
VAR1=value1
VAR2=value2
USER
VAR3=is going to = trouble
VAR4=this $sign will mess with things
VAR5=var # with what looks like a comment
#VAR7 =would fail
VAR8= but what about this?
VAR9="and this?"
results in these env variables in the container:
user=ubuntu
VAR1=value1
VAR2=value2
VAR3=is going to = trouble
VAR4=this $sign will mess with things
VAR5=var # with what looks like a comment
VAR8= but what about this?
VAR9="and this?"
The bright side is that once I know what I'm working with, it's pretty easy to predict the effect. What I see is what I get. But I don't think bash would be able to interpret this in the same way without a lot of changes. How can I put this square Docker peg into a round Bash hole?
tl;dr:
source <(sed -E -e "s/^([^#])/export \1/" -e "s/=/='/" -e "s/(=.*)$/\1'/" env.list)
You're probably going to want to source a file, whose contents
are executed as if they were printed at the command line.
But what file? The raw docker env-file is inappropriate, because it won't export the assigned variables such that they can be used by child processes, and any of the input lines with spaces, quotes, and other special characters will have undesirable results.
Since you don't want to hand edit the file, you can use a stream editor to transform the lines to something more bash-friendly. I started out trying to solve this with one or two complex Perl 5 regular expressions, or some combination of tools, but I eventually settled on one sed command with one simple and two extended regular expressions:
sed -E -e "s/^([^#])/export \1/" -e "s/=/='/" -e "s/(=.*)$/\1'/" env.list
This does a lot.
The first expression prepends export to any line whose first character is anything but #.
As discussed, this makes the variables available to anything else you run in this session, your whole point of being here.
The second expression simply inserts a single-quote after the first = in a line, if applicable.
This will always enclose the whole value, whereas a greedy match could lop off some of (e.g.) VAR3, for example
The third expression appends a second quote to any line that has at least one =.
it's important here to match on the = again so we don't create an unmatched quotation mark
Results:
# This is a comment, with an =' sign, just to mess with us'
export VAR1='value1'
export VAR2='value2'
export USER
export VAR3='is going to = trouble'
export VAR4='this $sign will mess with things'
export VAR5='var # with what looks like a comment'
#VAR7 ='would fail'
export VAR8=' but what about this?'
export VAR9='"and this?"'
Some more details:
By wrapping the values in single-quotes, you've
prevented bash from assuming that the words after the space are a command
appropriately brought the # and all succeeding characters into the VAR5
prevented the evaluation of $sign, which, if wrapped in double-quotes, bash would have interpreted as a variable
Finally, we'll take advantage of process substitution to pass this stream as a file to source, bring all of this down to one line of bash.
source <(sed -E -e "s/^([^#])/export \1/" -e "s/=/='/" -e "s/(=.*)$/\1'/" env.list)
Et voilĂ !

Using sed to replace text within a java properties file

I have a java properties file that looks like the following:
SiteUrlEndpoint=google.com/mySite
I want to use sed -i to inline replace the url but keep the context path that comes out of it. So for example if I wanted to change the properties file above to use amazon.com then the result would look like:
SiteUrlEndpoint=amazon.com/mySite
I am having trouble with sed to only replace the url and keeping the context path when replacing it inline.
My attempt:
sed -i 's:^[ \t]*siteUrlEndpoint[ \t]*=\([ \t]*.*\)[/]*$:siteUrlEndpoint = 'amazon.com':' file
You can do it with two backreferences, e.g.
sed -i.bak 's|^\(SiteUrlEndpoint=\).*/\(.*\)|\1amazon.com/\2|' file
note: the match of text up to / is greedy. If you have multiple parts of the path following the domain, you probably want to preserve all path components. To make it non-greedy, you could use the following instead
sed -i.bak 's|^\(SiteUrlEndpoint=\)[^/]*/\(.*\)|\1amazon.com/\2|' file
(you can add i.bak to create a backup of the original in file.bak)
To accomplish the same thing, you can match SiteUrlEndpoint= at the beginning of the line first, and then use a single backreference for the change, e.g.
sed -i.bak '/^SiteUrlEndpoint=/s|=[^/]*\(/.*\)|=amazon.com\1|' file
For example, given a file sites containing:
$ cat sites
SiteUrlEndpoint=google.com/path/to/mySite
SiteUrlSomeOther=google.com/mySite
You can change google.com to amazon.com with (using non-greedy form of first example):
$ sed -i 's|^\(SiteUrlEndpoint=\)[^/]*/\(.*\)|\1amazon.com/\2|' sites
Confirming:
$ cat sites
SiteUrlEndpoint=amazon.com/path/to/mySite
SiteUrlSomeOther=google.com/mySite
and
$ cat sites.bak
SiteUrlEndpoint=google.com/path/to/mySite
SiteUrlSomeOther=google.com/mySite
Explanation (first form)
sed -i.bak 's|^\(SiteUrlEndpoint=\) - locate & save
SiteUrlEndpoint=
[^/]*/ - match any folowing characters up to first / (non-greedy -
adjust as needed)
\(.*\) - match and save anything following /
|\1amazon.com/\2|' - full replacement (explanation below)
\1 - first back-reference containing SiteUrlEndpoint=
amazon.com - self-explanatory
/\2 - the '/' second back-reference of everything that followed.
Look over all the solutions and let me know if you have questions.
Regular expressions are hard, especially with complex regular expressions and/or large input files where unexpected changes are to be avoided.
Therefore I strongly recommend using sed -i.bak to keep a backup of the original file to then run a diff on both of them to see what changed.
Assuming that
You only want to change things after the tag siteUrlEndpoint (case insensitive)
You want to change the URL to amazon.com while leaving the path intact
I came up with this solution:
sed -i.bak 's;^\([ \t]*siteurlendpoint[ \t]*=[ \t]*\)[^/]*\(.*\);\1amazon.com\2;Ig' infile
I used a semicolon instead of your colon, that's just my preference when I don't want to use / ;)
Then I wrapped both the leading white spaces and siteurlendpoint as well as everything from the first / onwards into brackets \( \) so that I can take them again in the replacement with \1 and \2. That way I keep the indentation and the capitalisation of SiteUrlEndpoint intact.
For the search options I added an I to the g to make the search case insensitive. I am not sure how standard this option is, you might have to see whether your sed understands it.
The actual part that I want to replace I have just any character not including the next /: [^/]*
As for your line:
Your search term only searches for siteUrlEndpoint with lower case s. Since in your examples you wrote it with capital S, it wouldn't have triggered.
The final [/]*$ doesn't make any sense at all. "This line can end in zero or more of any of these caracters: /."
You precede this [/]*$ with .* which means: zero or more of any character at all.
The single quotes around 'amazon.com' might interfere with the single quotes around the whole search/replace term. It seems to work, but it is sloppy, and will fail if there are ever any spaces in there. It doesn't seem to serve any purpose anyway (except if you want to replace amazon.com with some environment variable like $NEWSITE) so I don't know why you're doing that.
Keep a backreference to the part just before the domain - then match and replace the domain - you can add the -i option after verifying the output of the sed command
url=amazon.com
sed -r 's/\b(SiteUrlEndpoint\s*=\s*)[^/]+/\1'$url'/'
Keep it simple:
$ sed -E 's/(SiteUrlEndpoint=)[^.]+/\1amazon/' file
SiteUrlEndpoint=amazon.com/mySite

Understanding 'sed' command

I am currently trying to install GCC-4.1.2 on my machine: Fedora 20.
In the instruction, the first three commands involve using 'sed' commands, for Makefile modification. However, I am having difficulty in using those commands properly for my case. The website link for GCC-4.1.2.
The commands are:
sed -i 's/install_to_$(INSTALL_DEST) //' libiberty/Makefile.in &&
sed -i 's#\./fixinc\.sh#-c true#' gcc/Makefile.in &&
sed -i 's/#have_mktemp_command#/yes/' gcc/gccbug.in &&
I am trying to understand them by reading the 'sed' man page, but it is not so easy to do so. Any help/tip would be appreciated!
First, the shell part: &&. That just chains the commands together, so each subsequent line will only be run if the prior one is run successfully.
sed -i means "run these commands inline on the file", that is, modify the file directly instead of printing the changed contents to STDOUT. Each sed command here (the string) is a substitute command, which we can tell because the command starts with s.
Substitute looks for a piece of text in the file, and then replaces it. So the order is always s/needle/replacement/. See how the first and last lines have those same forward-slashes? That's the traditional delimiter between the command (substitute), the needle to find in the haystack (install_to_$(INSTALL_DEST), and the text to replace it with ().
So, the first one looks for the string and deletes it (the empty replacement). The last one looks for #have_mktemp_command# and replaces it with yes.
The middle one is a bit weird. See how it starts with s# instead of s/? Well, sed will let you use any delimiter you like to separate the needle from the replacement. Since this needle had a / in it (\./fixinc\.sh), it made sense to use a different delimiter than /. It will replace the text ./fixinc.sh with -c true.
Last note: Why does the second needle have \. instead of .? Well, in a Regular Expression like the needle is (but not used in your example), some characters are magical and do magical fairy dust operations. One of those magic characters is .. To avoid the magic, we put a \ in front of it, escaping away from the magic. (The magic is "match any character", and we want a literal period. That's why.)

Search and replace in Shell

I am writing a shell (bash) script and I'm trying to figure out an easy way to accomplish a simple task.
I have some string in a variable.
I don't know if this is relevant, but it can contain spaces, newlines, because actually this string is the content of a whole text file.
I want to replace the last occurence of a certain substring with something else.
Perhaps I could use a regexp for that, but there are two moments that confuse me:
I need to match from the end, not from the start
the substring that I want to scan for is fixed, not variable.
for truncating at the start: ${var#pattern}
truncating at the end ${var%pattern}
${var/pattern/repl} for general replacement
the patterns are 'filename' style expansion, and the last one can be prefixed with # or % to match only at the start or end (respectively)
it's all in the (long) bash manpage. check the "Parameter Expansion" chapter.
amn expression like this
s/match string here$/new string/
should do the trick - s is for sustitute, / break up the command, and the $ is the end of line marker. You can try this in vi to see if it does what you need.
I would look up the man pages for awk or sed.
Javier's answer is shell specific and won't work in all shells.
The sed answers that MrTelly and epochwolf alluded to are incomplete and should look something like this:
MyString="stuff ttto be edittted"
NewString=`echo $MyString | sed -e 's/\(.*\)ttt\(.*\)/\1xxx\2/'`
The reason this works without having to use the $ to mark the end is that the first '.*' is greedy and will attempt to gather up as much as possible while allowing the rest of the regular expression to be true.
This sed command should work fine in any shell context used.
Usually when I get stuck with Sed I use this page,
http://sed.sourceforge.net/sed1line.txt

Resources