Using sed to replace text within a java properties file - bash

I have a java properties file that looks like the following:
SiteUrlEndpoint=google.com/mySite
I want to use sed -i to inline replace the url but keep the context path that comes out of it. So for example if I wanted to change the properties file above to use amazon.com then the result would look like:
SiteUrlEndpoint=amazon.com/mySite
I am having trouble with sed to only replace the url and keeping the context path when replacing it inline.
My attempt:
sed -i 's:^[ \t]*siteUrlEndpoint[ \t]*=\([ \t]*.*\)[/]*$:siteUrlEndpoint = 'amazon.com':' file

You can do it with two backreferences, e.g.
sed -i.bak 's|^\(SiteUrlEndpoint=\).*/\(.*\)|\1amazon.com/\2|' file
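Note: the match of text up to / is greedy, so .*/ runs to the last / on the line and only the final path component is kept. If you have multiple parts of the path following the domain, you probably want to preserve all path components. To stop at the first / instead (the non-greedy behaviour), you could use the following: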
sed -i.bak 's|^\(SiteUrlEndpoint=\)[^/]*/\(.*\)|\1amazon.com/\2|' file
(adding .bak to -i, i.e. -i.bak, creates a backup of the original in file.bak)
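To see the difference between the two forms without touching a file, here is a minimal sketch that runs both substitutions on one line. The sample line with a multi-part path is made up for illustration.

```shell
# Sample line (an assumption for illustration): a domain followed by several path components.
line='SiteUrlEndpoint=google.com/path/to/mySite'

# Greedy form: .*/ consumes everything up to the LAST slash.
greedy=$(printf '%s\n' "$line" | sed 's|^\(SiteUrlEndpoint=\).*/\(.*\)|\1amazon.com/\2|')

# Bounded form: [^/]*/ stops at the FIRST slash, so the whole path survives.
bounded=$(printf '%s\n' "$line" | sed 's|^\(SiteUrlEndpoint=\)[^/]*/\(.*\)|\1amazon.com/\2|')

printf '%s\n' "$greedy" "$bounded"
```

The first line printed loses path/to/, the second keeps it.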
To accomplish the same thing, you can match SiteUrlEndpoint= at the beginning of the line first, and then use a single backreference for the change, e.g.
sed -i.bak '/^SiteUrlEndpoint=/s|=[^/]*\(/.*\)|=amazon.com\1|' file
For example, given a file sites containing:
$ cat sites
SiteUrlEndpoint=google.com/path/to/mySite
SiteUrlSomeOther=google.com/mySite
You can change google.com to amazon.com with (using non-greedy form of first example):
$ sed -i.bak 's|^\(SiteUrlEndpoint=\)[^/]*/\(.*\)|\1amazon.com/\2|' sites
Confirming:
$ cat sites
SiteUrlEndpoint=amazon.com/path/to/mySite
SiteUrlSomeOther=google.com/mySite
and
$ cat sites.bak
SiteUrlEndpoint=google.com/path/to/mySite
SiteUrlSomeOther=google.com/mySite
Explanation (first form)
sed -i.bak 's|^\(SiteUrlEndpoint=\) - locate and save SiteUrlEndpoint= at the start of the line
[^/]*/ - match any following characters up to and including the first / (adjust as needed)
\(.*\) - match and save everything following that /
|\1amazon.com/\2|' - full replacement (explained below)
\1 - first back-reference, containing SiteUrlEndpoint=
amazon.com - self-explanatory
/\2 - the / followed by the second back-reference, containing everything that followed
Look over all the solutions and let me know if you have questions.

Regular expressions are hard, especially with complex regular expressions and/or large input files where unexpected changes are to be avoided.
Therefore I strongly recommend using sed -i.bak to keep a backup of the original file to then run a diff on both of them to see what changed.
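As a rough sketch of that backup-then-diff workflow: the temporary directory and the file name app.properties below are hypothetical, chosen only so the example does not touch a real file.

```shell
# Work on a throwaway copy; the path and file name are placeholders.
tmpdir=$(mktemp -d)
props="$tmpdir/app.properties"
printf '%s\n' 'SiteUrlEndpoint=google.com/path/to/mySite' \
              'SiteUrlSomeOther=google.com/mySite' > "$props"

# Edit in place, keeping the original as app.properties.bak.
sed -i.bak 's|^\(SiteUrlEndpoint=\)[^/]*|\1amazon.com|' "$props"

# diff exits non-zero when the files differ, hence the "|| true".
changes=$(diff "$props.bak" "$props" || true)
printf '%s\n' "$changes"

# Keep the results for inspection, then clean up.
after=$(cat "$props")
backup=$(cat "$props.bak")
rm -rf "$tmpdir"
```

If the diff shows more changed lines than expected, the backup can simply be copied back over the edited file.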
Assuming that
You only want to change things after the tag siteUrlEndpoint (case insensitive)
You want to change the URL to amazon.com while leaving the path intact
I came up with this solution:
sed -i.bak 's;^\([ \t]*siteurlendpoint[ \t]*=[ \t]*\)[^/]*\(.*\);\1amazon.com\2;Ig' infile
I used a semicolon instead of your colon, that's just my preference when I don't want to use / ;)
Then I wrapped both the leading white spaces and siteurlendpoint as well as everything from the first / onwards into brackets \( \) so that I can take them again in the replacement with \1 and \2. That way I keep the indentation and the capitalisation of SiteUrlEndpoint intact.
For the search options I added an I to the g to make the search case insensitive. The I flag is a GNU sed extension rather than standard, so you might have to check whether your sed understands it.
The actual part that I want to replace I have just any character not including the next /: [^/]*
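If your sed does not understand the I flag, one possible workaround is to spell out the case variants with bracket expressions. This is only a sketch: it assumes that just the first letter of each word in the key varies in case, and it swaps [ \t] for the POSIX class [[:space:]].

```shell
# Sample input (made up): all-lowercase key with leading indentation and spaces around "=".
line='  siteurlendpoint = google.com/path/to/mySite'

# [Ss], [Uu] and [Ee] accept either case for those letters without needing the I flag.
result=$(printf '%s\n' "$line" |
    sed 's;^\([[:space:]]*[Ss]ite[Uu]rl[Ee]ndpoint[[:space:]]*=[[:space:]]*\)[^/]*\(.*\);\1amazon.com\2;')

printf '%s\n' "$result"
```

Indentation, the spelling of the key, and the path are all carried over through \1 and \2.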
As for your line:
Your search term only searches for siteUrlEndpoint with lower case s. Since in your examples you wrote it with capital S, it wouldn't have triggered.
The final [/]*$ doesn't make any sense at all. "This line can end in zero or more of any of these characters: /."
You precede this [/]*$ with .* which means: zero or more of any character at all.
The single quotes around 'amazon.com' might interfere with the single quotes around the whole search/replace term. It seems to work, but it is sloppy, and will fail if there are ever any spaces in there. It doesn't seem to serve any purpose anyway (except if you want to replace amazon.com with some environment variable like $NEWSITE) so I don't know why you're doing that.

Keep a backreference to the part just before the domain, then match and replace the domain. You can add the -i option after verifying the output of the sed command:
url=amazon.com
sed -r 's/\b(SiteUrlEndpoint\s*=\s*)[^/]+/\1'"$url"'/' file

Keep it simple:
$ sed -E 's/(SiteUrlEndpoint=)[^.]+/\1amazon/' file
SiteUrlEndpoint=amazon.com/mySite

Related

sed command for inserting text inside single quote

Suppose there's a text file with the following line:
export MYSQL_ADMIN=''
I want to insert text inside that single quote using the sed command, so that it changes to something like this for example:
export MYSQL_ADMIN='abc1'
What is the appropriate sed command for that in Linux?
I tried
sed -i -e ''/MYSQL_ADMIN/s/''/'abc1'/g"
but it didn't work.
Something like sed -i "s;export MYSQL_ADMIN=.*;export MYSQL_ADMIN='abc1';" /path/to/file.ext
-i modify file in place
s means substitute,
The first block is what you are matching, as a regular expression. The .* matches everything to the end of the line, which ensures you don't keep any text on that line after the substitution. The second block is what replaces that match.
Always check the file after each run of sed if there is no error and check what changed.
To get the single quotes to print you may have to do "'" like "'"abc1"'"
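Two common ways of getting literal single quotes into the replacement, shown as a small sketch on a single sample line rather than on a real file:

```shell
line="export MYSQL_ADMIN=''"

# Double-quoted script: single quotes inside it are ordinary characters.
dq=$(printf '%s\n' "$line" | sed "s/^export MYSQL_ADMIN=.*/export MYSQL_ADMIN='abc1'/")

# Single-quoted script: close the quote, add an escaped ', and reopen it ('\'').
sq=$(printf '%s\n' "$line" | sed 's/^export MYSQL_ADMIN=.*/export MYSQL_ADMIN='\''abc1'\''/')

printf '%s\n' "$dq" "$sq"
```

Both commands hand sed the same script, so both lines printed are identical.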
It is important to understand that although
I want to insert text inside that single quote using the sed command
is a perfectly good characterization of the effect you want to achieve, it does not map directly onto operations from sed's repertoire. With sed, the appropriate tool for most line modifications is the s command, which substitutes specified text for one or more matches to a specified regular expression. That would be the most natural thing to use for your case.
Additionally, it is important with sed to understand how and when to bind commands to specific lines. If you don't do that for a given command then it is applied to all lines. Sometimes that's fine, but other times it will produce unwanted results.
I tried
sed -i -e ''/MYSQL_ADMIN/s/''/'abc1'/g"
but it didn't work.
The two leading single quotes in that sed expression match each other, leaving the trailing double quote unmatched. Also, you do not specify the name of the file to modify. This variation would at least be valid shell syntax, and it would have the desired effect on the specified line appearing in file my_script:
sed -i -e "/MYSQL_ADMIN/s/''/'abc1'/g" my_script
That might also make other, unwanted changes, however.
You need to make some assumptions about the content of the file in order to do such a thing at all. The above depends on the text MYSQL_ADMIN and '' to appear on the same line only in the line(s) you want to modify. That may turn out to hold, but it seems unnecessarily risky. An assumption more likely to hold in general would be that there will be only one assignment to variable MYSQL_ADMIN, or that it is acceptable to modify all such assignments that assign a single-quote-delimited empty value.
Going with the latter, one might end up with this:
sed -i -e "s/\<MYSQL_ADMIN=''\(\s\|$\)/MYSQL_ADMIN='abc1'\1/g" my_script
The pattern \<MYSQL_ADMIN=''\(\s\|$\) improves on your plain MYSQL_ADMIN in these significant ways:
the \< causes it to match only immediately after a word boundary -- start of line, whitespace, or punctuation. This prevents substitutions for other variables whose names happen to end with MYSQL_ADMIN. If you prefer, it would be even stronger to instead anchor the match to the beginning of the line with ^.
including the ='' in the pattern distinguishes between MYSQL_ADMIN and variables whose names contain that as an initial substring. It also ensures that the '' that gets replaced, if any, goes with the variable and does not merely appear somewhere else on the line.
the \(\s\|$\) both matches and captures either a whitespace character or the empty string at the end of a line. This distinguishes between assignments of an empty value and assignments of values that are merely prefixed by '' (which is valid if the file is a shell script). Having included it in the match, the capture allows the matched text, if any, to be preserved in the output (via the \1 in the replacement).
Because that matches the whole assignment, a complete assignment must appear in the replacement, too. On the other hand, this means that (probably) you can apply the command to every line, as shown, with no particular loss of efficiency relative to the previous command.
Even that might produce changes you didn't want, however, such as in comment lines or quoted text.
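To make the difference concrete, here is a sketch that runs the looser address-based command and the stricter pattern over the same four sample lines. The sample lines are invented for illustration, and the stricter command relies on GNU sed extensions (\<, \s and \|).

```shell
# Invented sample: the target assignment, two look-alike variables, and a comment.
sample=$(printf '%s\n' \
    "export MYSQL_ADMIN=''" \
    "export OLD_MYSQL_ADMIN=''" \
    "export MYSQL_ADMIN_PW=''" \
    "# MYSQL_ADMIN='' is unset by default")

# Loose: any line mentioning MYSQL_ADMIN has every '' replaced.
loose=$(printf '%s\n' "$sample" | sed "/MYSQL_ADMIN/s/''/'abc1'/g")

# Strict (GNU sed): word boundary, ='' attached to the name, then whitespace or end of line.
strict=$(printf '%s\n' "$sample" | sed "s/\<MYSQL_ADMIN=''\(\s\|$\)/MYSQL_ADMIN='abc1'\1/g")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The loose form rewrites all four lines. The strict form leaves OLD_MYSQL_ADMIN and MYSQL_ADMIN_PW alone but still rewrites the comment line, which is the kind of unwanted change mentioned above.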

sed to remove section of text from a variable

So I think I've cracked the regex but just can't crack how to get sed to make the changes. I have a variable which is this:
MAKEVAR = EPICS_BASE=$CI_PROJECT_DIR/3.16/base IPAC=$CI_PROJECT_DIR/3.16/support/ipac SNCSEQ=$CI_PROJECT_DIR/3.16/support/seq
(All one line). But I want to delete the particular section defining IPAC so my regex looks like this:
(IPAC.+\s)
I know from using this tool that that should be correct:
https://www.regextester.com/98103
However when I run different iterations of trying out sed like:
sed 's/(IPAC.+\s)/\&/g' <<< "$MAKEVAR"
And then echo out MAKEVAR, the IPAC section still exists.
How can I update a particular section of text in a shell variable to remove a section beginning with IPAC up until the next space?
Thanks in advance
regextester (or any other online tool) is a great way to verify that a regexp works in that online tool. Unfortunately, that doesn't mean it'll work in any given command-line tool. In particular, your regexp includes \s, which is specific to PCREs and some GNU tools. It also uses (...) to delineate capture groups, but that's only valid in EREs and PCREs, not in the BREs that sed supports by default, where you'd have to use \(...\). Finally, your replacement text is \&, which tells sed to replace the string that matches the regexp with a literal &, when in fact you just want to remove it.
This is how to do what I think you're trying to do using any sed:
$ sed 's/IPAC[^ ]* //' <<< "$MAKEVAR"
EPICS_BASE=$CI_PROJECT_DIR/3.16/base SNCSEQ=$CI_PROJECT_DIR/3.16/support/seq
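Note that sed only prints the modified text; it never changes the shell variable itself, so the result has to be assigned back. A small sketch, using a lowercase variable name and printf instead of a here-string so that it also runs in plain sh (both are choices made for this example only):

```shell
# Single quotes keep $CI_PROJECT_DIR literal, as in the output shown above.
makevar='EPICS_BASE=$CI_PROJECT_DIR/3.16/base IPAC=$CI_PROJECT_DIR/3.16/support/ipac SNCSEQ=$CI_PROJECT_DIR/3.16/support/seq'

# BRE (default): no grouping needed just to delete the match.
bre=$(printf '%s\n' "$makevar" | sed 's/IPAC[^ ]* //')

# ERE (-E): here (...) and + work unescaped.
ere=$(printf '%s\n' "$makevar" | sed -E 's/(IPAC[^ ]+ )//')

# Assign the result back, otherwise the variable still holds the old value.
makevar=$bre
printf '%s\n' "$makevar"
```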
Nevermind, found a workaround:
MAKEVAR=$(sed -E 's/(IPAC.+ipac)//' <<<"$MAKEVAR")
Use a shorter form:
MAKEVAR=$(sed 's/IPAC.*ipac//' <<< "$MAKEVAR")
IPAC.*ipac matches all the way from first IPAC to last ipac. The matched text is removed from the text.

how to edit url string with sed

My Linux repository file contains a link that until now used http with a port number to point to its repository:
baseurl=http://host.domain.com:123/folder1/folder2
I now need a way to change that URL to use https with no port or a different port.
I also need to be able to change the server name, for example from host.domain.com to host2.domain.com.
So my idea was to use sed to match from the start of http up to the first / that comes after the //, capturing everything in between, which would let me change the server name, the port, and whether http or https is used.
I'm now using this code (echo is only there for the example).
The example shows two cases: in the first, a link with http and port 123 is converted to https; in the second, the other way around.
In both cases I use the same sed expression so that it stays generic.
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
OR
WANTED_URL="http://host.domain.com:123"
echo 'https://host.domain.com/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
Is that the correct way to do it?
sed regexes are greedy by default. You can tell sed to consume only non-slashes, like this:
echo 'http://host.domain.com:123/folder1/folder2' | sed -e 's|http://[^/]*|https://host.domain.com|'
result:
https://host.domain.com/folder1/folder2
(BTW you don't have to escape slashes because you are using an alternate separating character)
The key is using [^/]*, which matches anything but slashes, so it stops matching at the first slash (effectively non-greedy).
You used /.*/, and .* can match slashes, which is not what you wanted (it is greedy by default).
My approach also differs in that the expression does not include the trailing slash, so it is not removed from the final output.
Assuming it doesn't really matter if you have 1 sed script or 2 and there isn't a good reason to hard-code the URLs:
$ echo 'http://host.domain.com:123/folder1/folder2' |
sed 's|\(:[^:]*\)[^/]*|s\1|'
https://host.domain.com/folder1/folder2
$ port='123'; echo 'https://host.domain.com/folder1/folder2' |
sed 's|s\(://[^/]*\)|\1:'"$port"'|'
http://host.domain.com:123/folder1/folder2
If that isn't what you need then edit your question to clarify your requirements and in particular explain why:
You want to use hard-coded URLs, and
You need 1 script to do both transformations.
and provide concise, testable sample input and expected output that demonstrates those needs (i.e. cases where the above doesn't work).
wrt what you had:
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
The main issues are:
Don't use all-upper-case for non-exported shell variable names to avoid clashes with exported variables and to avoid obfuscating your code (this convention has been around for 40 years so people expect all upper case variables to be exported).
Never enclose any script in double quotes as it exposes the whole script to the shell for interpretation before the command you want to execute even sees it. Instead just open up the single quotes around the smallest script segment possible when necessary, i.e. to expand $y in a script use cmd 'x'"$y"'z' not cmd "x${y}z" because the latter will fail cryptically and dangerously given various input, script text, environment settings and/or the contents of the directory you run it from.
The -i option for sed is to edit a file in-place so you can't use it on an incoming pipe because you can't edit a pipe in-place.
When you let a shell variable expand to become part of a script, you have to take care about the possible characters it contains and how they'll be interpreted by the command given the context the variable expands into. If you let a whole URL expand into the replacement section of a sed script then you have to be careful to first escape any potential backreference characters or script delimiters. See Is it possible to escape regex metacharacters reliably with sed. If you just let the port number expand then you don't have to deal with any of that.
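If you do need to expand a whole URL into the replacement, one way to handle the last point is to escape it first. The following is only a sketch: escape_repl is a hypothetical helper name, the URL value is an example, and it assumes a single-line value with | as the s delimiter.

```shell
# Hypothetical helper: put a backslash before \, & and the | delimiter so that
# sed treats them literally in the replacement part of s|...|...|.
escape_repl() {
    printf '%s\n' "$1" | sed 's/[\&|]/\\&/g'
}

wanted_url='https://host2.domain.com:8443'   # example value, not from the question
escaped=$(escape_repl "$wanted_url")

# Single quotes are opened only around the expansion, as recommended above.
result=$(printf '%s\n' 'http://host.domain.com:123/folder1/folder2' |
    sed 's|https\{0,1\}://[^/]*|'"$escaped"'|')

printf '%s\n' "$result"
```

The pattern https\{0,1\}://[^/]* accepts both http and https, so the same command covers either direction of the conversion.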

Unable to remove a value from a text file using -sed

I'm trying to remove an ID number from a text file using a series of commands (using terminal), but they don't seem to be working. I need to remove the number and the associated "ID" text
Text in File:
{"id":"098765432"}
Commands I've been using (but don't seem to be working):
sed -i.bak 's/"id":[0-9]\{1,\},//g' ./Filename.txt
sed -i.bak 's/"id":"[0-9]\{1,\}",//g' ./Filename.txt
sed -i.bak 's/"id":"[0-9]\{9,\}",//g' ./Filename.txt
sed -i.bak 's/"id":[0-9]\{9,\},//g' ./Filename.txt
sed -i.bak 's/"[0-9]\{1,\}",//g' ./Filename.txt
Thanks for the help :)
As @Wintermute already noted in the comments, the problem is the comma before //. However, I am going to explain the whole line so that anyone who comes across this question later can understand it completely.
So, the proper command that will satisfy your requirement is:
sed -i.bak 's/"id":"[0-9]\{1,\}"//g' ./Filename.txt
sed is the command that calls stream editor.
Flag -i is the flag used to edit files in place (it makes a backup if an extension is supplied). In this case the extension is .bak, so the backup file (containing the initial contents of our file) is created with the original name plus that extension.
Argument 's/"id":"[0-9]\{1,\}"//g' is the argument given to the sed command.
Since this argument (regular expression in it) was the cause of the problem, I am going to explain it in detail.
First part we should notice is that its structure is s/Regex/Replacement/g where
Regex = "id":"[0-9]\{1,\}"
Replacement = nothing (literally nothing, not even blank space)
So basically, as described by Bruce Barnett, s stands for substitution. Regex is the part we will replace with the Replacement. At the end, letter g means that we will change more than just one occurrence of this regex per line (without g, it would replace just the first occurrence in every line, no matter how many are there).
And at the end we have ./Filename.txt, which is the source file we are applying this command on (./ means that the file is in the same directory from where we are running this command).
About the regex used ("id":"[0-9]\{1,\}"):
It starts with the literal text "id":" and this part matches any place in the file that is exactly the same. Next we have [0-9]\{1,\}, which means that, in addition to the first part, we look for at least one digit (there can be more, as the example from the question shows). The closing " is again matched literally.
Now you may understand why the comma caused this problem. There is no comma in the original text in the file. Thus, none of the commands tried worked, since all of them contain a comma. Some of them fail for additional reasons as well, such as leaving out the quotes around the number.
EDIT: As @ghoti pointed out, the replacement is not a regex. It is the string put in the place(s) matched by our regex. So in this case, our replacement is the empty string (since we want to delete the matched part).
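The effect of the stray comma can be checked on the sample text directly, without editing a file. A minimal sketch:

```shell
json='{"id":"098765432"}'

# With the trailing comma in the pattern nothing matches, so the line is unchanged.
with_comma=$(printf '%s\n' "$json" | sed 's/"id":"[0-9]\{1,\}",//g')

# Without the comma the key and its value are removed.
without_comma=$(printf '%s\n' "$json" | sed 's/"id":"[0-9]\{1,\}"//g')

printf '%s\n' "$with_comma" "$without_comma"
```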

search a pattern in each line and append it at the end of that line

I have a file with the following entries:
folder1/a_b.csv folder1/generated/
folder2/folder3/a_b1.csv folder12/generated/
folder4/b_c.csv folder123/generated/
folder5/d.csv folder1/new_folder/generated/
folder6/12.csv folder/anotherfolder/morefolder/evenmorefolder/generated/
I want to copy the csv file name from each line, paste them at the end of that line and append it with ".org". Hence, the changed file would look like
folder1/a_b.csv folder1/generated/a_b.csv.org
folder2/folder3/a_b1.csv folder12/generated/a_b1.csv.org
folder4/b_c.csv folder123/generated/b_c.csv.org
folder5/d.csv folder1/new_folder/generated/d.csv.org
folder6/12.csv folder/anotherfolder/morefolder/evenmorefolder/generated/12.csv.org
Basically, I am looking for a command in vim or sed using which I can search a pattern in each line and append it at the end of that line. Is it possible?
Thanks in advance.
Vim
Here's how to do this in Vim:
:%s/\([^/]*\.csv\)\( .*\)/&\1.org/
This global (:%) substitution matches the filename (characters that don't contain /, ending in .csv), and captures \(...\) it. It then matches the rest of the line, and captures that, too.
As a replacement, first keep the original match & (or \0), then append the first capture (\1) with the additional suffix.
sed
Though the regular expression syntax is somewhat different than in Vim, the identical expression can be used with sed:
sed -e 's/\([^/]*\.csv\)\( .*\)/&\1.org/' input
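As a quick check of that substitution, here is a sketch that feeds three of the sample lines through sed instead of reading a file. It uses | as the delimiter, which is a choice made for this example so that the / inside the bracket expression cannot be mistaken for a delimiter.

```shell
# Three of the lines from the question, including one with a nested source folder.
input=$(printf '%s\n' \
    'folder1/a_b.csv folder1/generated/' \
    'folder2/folder3/a_b1.csv folder12/generated/' \
    'folder5/d.csv folder1/new_folder/generated/')

# & re-emits the whole match; \1 appends the captured file name, followed by .org.
output=$(printf '%s\n' "$input" | sed 's|\([^/]*\.csv\)\( .*\)|&\1.org|')

printf '%s\n' "$output"
```

Because [^/]* cannot cross a slash, only the last component before .csv is captured, even for folder2/folder3/a_b1.csv.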
Alternatives
It looks like you want to do file renaming in batches. On Linux, the mmv command-line tool is well suited for that; you'll probably find many similar tools on the web, too.
This might work for you (GNU sed):
sed -r 's|([^/ ]*) .*|&\1.org|' file

Resources