Changing the prefix of a file with sed - bash

I would like some advice on this script.
I'm trying to use sed (I didn't manage it with rename) to change a file that contains lines of the format (my test file name is sedtest):
COPY W:\Interfaces\Payments\Tameia\Unprocessed\X151008\E*.*
(that's not the only content of the file).
My goal is to replace the 151008 date part with a different date. I've tried to come up with a solution in sed using this:
sed -i -e "s/Unprocessed\X.*/Unprocessed\X'BLABLA'/" sedtest
but it doesn't seem to work: the line remains unchanged, as if sed doesn't recognize the pattern because of the \. I've tried some alternative delimiters like #, but to no avail.
Thanks in advance for any advice.

There are a couple of issues with your sed command. I would suggest changing it to this:
sed -r 's/(Unprocessed\\X)[0-9]+/\1BLABLA/' file
Since your version of sed supports -i without requiring that you add a suffix to create a backup file, I assume you're using the GNU version, which also supports extended regular expressions with the -r switch. The command captures the part within the () and uses it in the replacement \1. Don't forget that backslashes must be escaped.
If you're going to use -i, I would recommend doing so like -i.bak, so a backup of your file is made to file.bak before it is overwritten.
You haven't shown the exact output you were looking for but I assumed that you wanted the line to become:
COPY W:\Interfaces\Payments\Tameia\Unprocessed\XBLABLA\E*.*
Remember that * is greedy, so .* would match everything up to the end of the line. That's why I changed it to [0-9]+, so that only the digits were replaced, leaving the rest of the line intact.
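To make the greedy-match point concrete, here is a small sketch that runs both patterns against the sample line from the question. It uses -E, which GNU sed accepts as a synonym for -r; the variable names are only illustrative.

```shell
# Sample line from the question; single quotes keep the backslashes literal.
line='COPY W:\Interfaces\Payments\Tameia\Unprocessed\X151008\E*.*'

# Greedy .* swallows everything up to the end of the line.
greedy=$(printf '%s\n' "$line" | sed -E 's/(Unprocessed\\X).*/\1BLABLA/')

# [0-9]+ stops at the first non-digit, so the trailing \E*.* survives.
digits=$(printf '%s\n' "$line" | sed -E 's/(Unprocessed\\X)[0-9]+/\1BLABLA/')

printf '%s\n' "$greedy" "$digits"
```

The first result loses the \E*.* tail, while the second keeps it.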
As you've mentioned using a variable in the replacement, you should use something like this:
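sed -r -i.bak "s/(Unprocessed\\\\X)[0-9]+/\1$var/" file
Note the four backslashes: inside double quotes the shell reduces \\ to \, so they have to be doubled again for sed to receive \\. This assumes that $var is safe to use, i.e. doesn't contain characters that will be interpreted by sed, like \, / or &. See this question for details on handling such cases reliably.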
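If the value may contain such characters, one possible approach (a sketch, not the only way) is to backslash-escape them before interpolating. The value of var below is made up purely to exercise the escaping.

```shell
line='COPY W:\Interfaces\Payments\Tameia\Unprocessed\X151008\E*.*'
var='Q4/2015&final'   # hypothetical value containing / and &

# Escape \, & and / in the value. The helper command uses ! as its own
# delimiter so that / can appear inside the bracket expression.
safe=$(printf '%s\n' "$var" | sed -e 's![\&/]!\\&!g')

# In double quotes, \\\\ reaches sed as \\ (one literal backslash) and \\1 as \1.
result=$(printf '%s\n' "$line" | sed -E "s/(Unprocessed\\\\X)[0-9]+/\\1$safe/")

printf '%s\n' "$result"
```

This sketch does not handle newlines inside the value.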

Related

Regex to match characters between two specific characters in shell script

I want to clean my file before/after saving, so I have to delete unnecessary characters from it. Sadly, even though my regex works in Regex101, it does not work in the shell script I wrote.
I am getting my list from Kubernetes via
kubectl get pods -n $1 -o jsonpath='{range .items[*]}{#.spec.containers[*].image}{","}{#.status.containerStatuses[*].imageID}{"\n"}{end}'
Then I save it to a temp file and use sed to clean it. The regex should match any characters between , and # so that sed deletes them (the # should be deleted as well). I am escaping them since they are special characters.
sed -i 's/(?<=\,)(.*?)(?<=\#)//g' temp
The problem is that this regex works fine (for example in Regex101) but does not work with the sed command. I even tried awk but got the same output.
awk '!/(?<=\,)(.*?)(?<=\#)/' temp
Am I missing something or is the regex acting differently somehow in Unix/shell?
Thanks for any input.
Example content of the file (for test):
docker.elastic.co/elasticsearch/elasticsearch:7.17.5,docker-pullable://docker.elastic.co/elasticsearch/elasticsearch#sha256:76344d5f89b13147743db0487eb76b03a7f9f0cd55abe8ab887069711f2ee27d
docker.io/bitnami/kafka:3.3.1-debian-11-r11,docker-pullable://bitnami/kafka#sha256:be29db0e37b6ab13df5fc14988a4aa64ee772c7f28b4b57898015cf7435ff662
docker.io/bitnami/mongodb:6.0.3-debian-11-r0,docker-pullable://bitnami/mongodb#sha256:e7438d7964481c0bcfcc8f31bca2d73022c0b7ba883143091a71ae01be6d9edb
docker.io/bitnami/postgresql:14.1.0-debian-10-r80,docker-pullable://bitnami/postgresql#sha256:6eb9c4ab3444e395df159e2cad21f283e4bf30802958467590c886f376dc9959
docker.io/bitnami/zookeeper:3.8.0-debian-11-r47,docker-pullable://bitnami/zookeeper#sha256:0f3169499c5ee02386c3cb262b2a0d3728998d9f0a94130a8161e389f61d1462
Expected output:
docker.elastic.co/elasticsearch/elasticsearch:7.17.5,sha256:76344d5f89b13147743db0487eb76b03a7f9f0cd55abe8ab887069711f2ee27d
docker.io/bitnami/kafka:3.3.1-debian-11-r11,sha256:be29db0e37b6ab13df5fc14988a4aa64ee772c7f28b4b57898015cf7435ff662
docker.io/bitnami/mongodb:6.0.3-debian-11-r0,sha256:e7438d7964481c0bcfcc8f31bca2d73022c0b7ba883143091a71ae01be6d9edb
docker.io/bitnami/postgresql:14.1.0-debian-10-r80,sha256:6eb9c4ab3444e395df159e2cad21f283e4bf30802958467590c886f376dc9959
docker.io/bitnami/zookeeper:3.8.0-debian-11-r47,sha256:0f3169499c5ee02386c3cb262b2a0d3728998d9f0a94130a8161e389f61d1462
You are trying to use Perl extensions which are not supported by more traditional regex tools like sed and Awk.
Perhaps see also Why are there so many different regular expression dialects? and the Stack Overflow regex tag info page.
If I can guess what you are trying to do, you want simply
sed -i 's/,[^#]*#/,/g' temp
The /g flag is unnecessary if you only expect one match per line.
Neither , nor # is a regex metacharacter; they do not require escaping.
Usually you would want to avoid using a temporary file or sed -i; perhaps simply
kubectl blah blah | sed 's/,[^#]*#/,/' > temp
to create the file, or remove the redirection if you want to pipe the results further.
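As a quick check, the substitution can be tried on one of the sample lines from the question without touching any file. The variable names are only for illustration.

```shell
# One sample line from the question.
line='docker.io/bitnami/kafka:3.3.1-debian-11-r11,docker-pullable://bitnami/kafka#sha256:be29db0e37b6ab13df5fc14988a4aa64ee772c7f28b4b57898015cf7435ff662'

# Replace the comma, everything that is not a #, and the # itself with a comma.
cleaned=$(printf '%s\n' "$line" | sed 's/,[^#]*#/,/')

printf '%s\n' "$cleaned"
```

The negated bracket expression [^#]* does the job of the non-greedy .*? in the Perl-style attempt: it cannot run past the first #.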

How to use sed to remove ./ between two characters in Unix shell

I am trying to remove ./ between two characters using sed but not getting the desired output.
Sample:
e2b66a3d84ee448c33d7f2a2f7e51c58 ./2017_06_10_0400.txt
I tried the below but it is not working as expected, even the . in the ".txt" is getting removed.
sed -i 's/[./,]//g'
Beware: don't even think of using the -i option until you know the code is working. You can screw things up big time!
Use:
sed -e 's%[.]/%%g'
You can choose the delimiter in a s/// command, and when the regular expressions involve /, it is sensible to choose something else — I often use % when it doesn't figure in the text. The -e is optional. Using [.] to detect an actual dot is one way; you can write \. if you prefer, but I'm allergic to avoidable backslashes (if you've never had to write 16 backslashes in a row to get troff to do what you want, you haven't suffered enough).
Be aware that the -i option behaves differently in GNU sed and BSD (macOS) sed. Using -i.bak works in both (for an arbitrary, non-empty string such as .bak). Otherwise, your code isn't portable (which may or may not matter to you now, but might well do later on).
You have:
sed -i 's/[./,]//g'
The trouble with this is that it looks for any of the characters ., / or , in isolation — so it removes the . in .txt as well as the . and / in ./. You need to look for consecutive characters — as in my suggested solution.
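The difference between "any of these characters" and "these characters in sequence" shows up when both are run on the sample line. In this sketch the bracket-expression version also uses % as the delimiter, to avoid any question about how a given sed treats a / inside brackets.

```shell
line='e2b66a3d84ee448c33d7f2a2f7e51c58 ./2017_06_10_0400.txt'

# Bracket expression: deletes every ., / or , wherever it occurs.
any_char=$(printf '%s\n' "$line" | sed 's%[./,]%%g')

# Sequence: deletes only a dot immediately followed by a slash.
sequence=$(printf '%s\n' "$line" | sed 's%[.]/%%g')

printf '%s\n' "$any_char" "$sequence"
```

Only the second form keeps the dot in .txt.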
Try this:
echo "e2b66a3d84ee448c33d7f2a2f7e51c58 ./2017_06_10_0400.txt" | sed -e 's|\./||'
You need to escape the dot with a backslash (the slash needs no escaping once # is the delimiter, although escaping it is harmless):
's#\.\/##g'
:=>echo "e2b66a3d84ee448c33d7f2a2f7e51c58 ./2017_06_10_0400.txt" | sed 's#\.\/##g'
e2b66a3d84ee448c33d7f2a2f7e51c58 2017_06_10_0400.txt
:=>

Remove prefix of each line in a file and output to another file using sed

I have a source code file in which comments are prefixed with "// " (i.e. a double slash and a space). I want to convert the source code into a document, so I tried to cat file.c and pipe it to sed. The idea is to replace the "double slash and a space" with an empty string when a line starts with it, but it looks like the slash has some special meaning in sed. What's the best way of constructing the sed arguments?
Thanks!
If you want to remove the special meaning of / in sed, the following may help.
sed 's/^\/\/ //g' Input_file
Here I escape each / with a \ before it, so it is taken as a literal character rather than as the delimiter. If you are happy with the command's result, use -i to save the changes in Input_file itself. Hope this helps.
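The slash only has a special meaning if you use it as the delimiter, so pick a different one. Note that + needs -E (extended regular expressions); in a basic regular expression it would be a literal plus sign.
sed -E 's#^// +##' < file.c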
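A small sketch of the prefix removal on made-up input (the two C lines are invented for the example). It uses # as the delimiter and -E so that + means "one or more"; without -E a basic regular expression would treat + as a literal character.

```shell
# Two hypothetical source lines: a comment line and code with a trailing comment.
src=$(printf '%s\n' '// Add two numbers.' 'int add(int a, int b) { return a + b; } // inline')

# Strip "// " only where it starts the line; the ^ anchor leaves the inline comment alone.
doc=$(printf '%s\n' "$src" | sed -E 's#^// +##')

printf '%s\n' "$doc"
```

Redirecting the output (for example > file.txt) writes the result to another file instead of the terminal.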

remove absolute path using sed command

I have a file which contains the following content:
abc...
include /home/user/file.txt'
some text
I need to remove include and also the complete path after it.
I used the following command, which removes include but not the path.
sed -i -r 's#include##g' 'filename'
I am also trying to understand the above command (copied from somewhere), but I did not understand the following parts:
i - modify file change
r - read file
s- Need input
g - Need input
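Try this:
$ sed '/^include /s/.*//g' file.txt
abc...

some text
This removes all the text on any line that starts with include, leaving an empty line in its place. s means substitute, so s/.*//g means replace all the text with nothing. g means global: the substitution is applied to every match on the line (it makes no difference here, since .* already matches the whole line).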
OR
$ sed '/^include /d' file.txt
abc...
some text
d means delete.
It deletes the line which starts with include. To save the changes to the file (in-place edit), your commands should be
sed -i '/^include /s/.*//g' file.txt
sed -i '/^include /d' file.txt
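The two commands are not quite equivalent, which a quick run on the sample input shows. This sketch works on a string rather than a file, so nothing is modified.

```shell
input=$(printf '%s\n' 'abc...' 'include /home/user/file.txt' 'some text')

# s/.*// empties the matching line but keeps the line itself.
blanked=$(printf '%s\n' "$input" | sed '/^include /s/.*//')

# d removes the matching line entirely.
deleted=$(printf '%s\n' "$input" | sed '/^include /d')

printf '%s\n' "$blanked" | wc -l   # three lines, one of them empty
printf '%s\n' "$deleted" | wc -l   # two lines
```

If an empty line left behind is not wanted, the d form is the one to use.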
In your case, if you just want to delete the second line, you can use:
sed -i '2d' file
If you want to learn more about Linux commands, the man pages are there for you.
Just go to a terminal and type:
man sed
As for your question: without -i, the above command prints the file content to the terminal with the second line deleted, but the input file remains unchanged. To make the change permanent in the source file, use the -i option.
-i[SUFFIX], --in-place[=SUFFIX] :
edit files in place (makes backup if extension supplied)
-r or --regexp-extended :
use extended regular expressions in the script.
s/regexp/replacement/ :
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
g (as a flag after s/regexp/replacement/) :
replace every match on the line, not just the first. This is not the same as the standalone g and G commands, which copy/append hold space to pattern space.
grep -v
This is not about learning sed, but as an alternative (and short) solution, there is:
grep -v '^include' filename_in
Or with output redirection:
grep -v '^include' filename_in > filename_out
-v option for grep inverts matching (hence printing non-matching lines).
For simple deletion that's what I'd use; if you have to modify your path after the include, stick with sed instead.
You can use awk to just delete the line:
awk '/^include/ {next}1' file
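For plain line deletion the grep, awk and sed variants should give the same result. A minimal comparison on the sample input:

```shell
input=$(printf '%s\n' 'abc...' 'include /home/user/file.txt' 'some text')

via_grep=$(printf '%s\n' "$input" | grep -v '^include')         # print non-matching lines
via_awk=$(printf '%s\n' "$input" | awk '/^include/ {next} 1')   # skip matches, print the rest
via_sed=$(printf '%s\n' "$input" | sed '/^include/d')           # delete matching lines

printf '%s\n' "$via_grep"
```

All three variables end up holding the two remaining lines.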
sed -i -r 's#include##g' 'filename'
-i: edit the file in place. By default sed reads the file and writes the modified content to stdout, leaving the original file unchanged.
-r: use extended regular expressions instead of basic ones. It is not needed here, because the s### command only uses a plain string.
s#pattern#NewValue#: substitute, in the current line, the text matched by pattern (a regular expression) with NewValue (which may refer back to the match with & or \1). The traditional form is s///, but when the pattern or the new value contains / (as paths do), another delimiter avoids having to escape every /.
g: a flag of s### that changes every occurrence on the line rather than only the first (the default).
So here it removes every occurrence of include directly in your file.
The solution from Avinash Raj gives you what you want, but you also asked for an explanation of some of the parameters used in the sed command.
First one is
command: s for substitution
With the sed command the substitute command s changes occurrences of the regular expression into a new value (only the first one on each line unless the g flag is given). A simple example is changing "my" in the "file1" to "yours" in the "file2" file:
sed s/my/yours/ file1 >file2
The character after the s is the delimiter. It is conventionally a slash, because this is what ed, more, and vi use. It can be anything you want, however. If you want to change a pathname that contains a slash - say /usr/local/bin to /common/bin - you could use the backslash to quote the slash:
sed 's/\/usr\/local\/bin/\/common\/bin/' <old >new
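For comparison, here is the same path substitution written once with escaped slashes and once with # as the delimiter. The input string is made up for the example.

```shell
input='PATH=/usr/local/bin'

# Default delimiter: every / in the pattern and replacement needs a backslash.
escaped=$(printf '%s\n' "$input" | sed 's/\/usr\/local\/bin/\/common\/bin/')

# Alternative delimiter: no escaping needed.
hashed=$(printf '%s\n' "$input" | sed 's#/usr/local/bin#/common/bin#')

printf '%s\n' "$escaped" "$hashed"
```

Both forms produce the same output; the second is simply easier to read.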
/g - Global replacement
Replace all matches, not just the first match.
If you tell it to change a word, it will only change the first occurrence of the word on a line. If you want to make the change on every occurrence on the line instead of just the first, add a g after the last delimiter.
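A one-line illustration of the g flag, using an invented sentence:

```shell
text='my cat and my dog'

first_only=$(printf '%s\n' "$text" | sed 's/my/your/')    # first match on the line only
every_one=$(printf '%s\n' "$text" | sed 's/my/your/g')    # all matches on the line

printf '%s\n' "$first_only" "$every_one"
```

Without g the second "my" is left as it is.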
Delete with d
Delete the pattern space; immediately start next cycle.
You can delete a line by specifying its line number, for example:
sed '$d' filename.txt
This removes the last line of the file.
sed '2 d' file.txt
This deletes the second line of the file.
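Both address forms can be tried on a three-line input without involving a file:

```shell
input=$(printf '%s\n' one two three)

without_last=$(printf '%s\n' "$input" | sed '$d')     # $ addresses the last line
without_second=$(printf '%s\n' "$input" | sed '2d')   # 2 addresses the second line

printf '%s\n' "$without_last"
printf '%s\n' "$without_second"
```

The single quotes matter for '$d': in double quotes the shell would try to expand $d as a variable.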
-i option
This option specifies that files are to be edited in-place. GNU sed does this by creating a temporary file and sending output to this file rather than to the standard output.
To actually modify the file, use the -i option; without it, sed writes the changes to stdout and leaves the file untouched. You can keep a backup of the original file by using -i.bak.
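A minimal sketch of in-place editing with a backup, run inside a throwaway directory created with mktemp so that no real file is at risk. The file name and contents are taken from the question; the workdir variable is only illustrative.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'abc...' 'include /home/user/file.txt' 'some text' > "$workdir/file.txt"

# Edit in place; the original content is kept in file.txt.bak.
sed -i.bak '/^include /d' "$workdir/file.txt"

edited=$(cat "$workdir/file.txt")
backup=$(cat "$workdir/file.txt.bak")
rm -rf "$workdir"

printf '%s\n' "$edited"
```

Writing the suffix directly after -i (no space) is the form accepted by both GNU and BSD sed.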
-r option
--regexp-extended
Use extended regular expressions rather than basic regular expressions. Extended regexps are those that egrep accepts; they can be clearer because they usually have less backslashes, but are a GNU extension and hence scripts that use them are not portable.

Find and replace html code for multiple files within multiple directories

I have a very basic understanding of shell scripting, but what I need to do requires more complex commands.
For one task, I need to find and replace html code within the index.html files on my server. These files are in multiple directories with a consistent naming convention. ([letter][3-digit number]) See the example below.
files: index.html
path: /www/mysite/board/today/[rsh][0-9]/
string to find: (div id="id")[code](/div)<--#include="(path)"-->(div id="id")[more code](/div)
string to replace with: (div id="id")<--include="(path)"-->(/div)
I hope you don't mind the pseudo-regex. The folders containing my target index.html files look similar to r099, s017, h123. Suffice it to say, the HTML code I'm trying to replace is relatively long, but it's still just a string.
The second task is similar to the first, only the filename changes as well.
files: [rsh][0-9].html
path: www/mysite/person/[0-9]/[0-9]/[0-9]/card/2011/
string: (div id="id")[code](/div)<--include="(path)"-->(div id="id")[more code](/div)
string to replace with: (div id="id")<--include="(path)"-->(/div)
I've seen other examples on SO and elsewhere on the net that simply show scripts modifying files under a single directory to find & replace a string without any special characters, but I haven't seen an example similar to what I'm trying to do just yet.
Any assistance would be greatly appreciated.
Thank You.
You have three separate sub-problems:
replacing text in a file
coping with special characters
selecting files to apply the transformation to
1. The canonical text replacement tool is sed:
sed -e 's/PATTERN/REPLACEMENT/g' <INPUT_FILE >OUTPUT_FILE
If you have GNU sed (e.g. on Linux or Cygwin), pass -i to transform the file in place. You can act on more than one file in the same command line.
sed -i -e 's/PATTERN/REPLACEMENT/g' FILE OTHER_FILE…
If your sed doesn't have the -i option, you need to write to a different file and move that into place afterwards. (This is what GNU sed does behind the scenes.)
sed -e 's/PATTERN/REPLACEMENT/g' <FILE >FILE.tmp
mv FILE.tmp FILE
2. If you want to replace a literal string by a literal string, you need to prefix all special characters by a backslash. For sed patterns, the special characters are .\[^$* plus the separator for the s command (usually /). For sed replacement text, the special characters are \& and newlines, plus again the separator. You can use sed to turn a string into a suitable pattern or replacement text.
pattern=$(printf %s "$string_to_replace" | sed -e 's![.\[^$*/]!\\&!g')
replacement=$(printf %s "$replacement_string" | sed -e 's![\&/]!\\&!g')
3. To act on multiple files directly in one or more directories, use shell wildcards. Your requirements don't seem completely consistent; I think these are the patterns you're looking for, but be sure to review them.
/www/mysite/board/today/[rsh][0-9][0-9][0-9]/index.html
/www/mysite/person/[0-9]/[0-9]/[0-9]/card/2011/[rsh][0-9].html
This will match files like /www/mysite/board/today/r012/index.html and /www/mysite/person/4/5/6/card/2011/h7.html, but not /www/mysite/board/today/subdir/s012/index.html or /www/mysite/board/today/r1234/index.html.
If you need to act on files in subdirectories recursively, use find. It doesn't seem to be in your requirements and this answer is long enough already, so I'll stop here.
4. Putting it all together:
string_to_replace='(div id="id")[code](/div)<--#include="(path)"-->(div id="id")[more code](/div)'
replacement_string='(div id="id")<--include="(path)"-->(/div)'
pattern=$(printf %s "$string_to_replace" | sed -e 's![.\[^$*/]!\\&!g')
replacement=$(printf %s "$replacement_string" | sed -e 's![\&/]!\\&!g')
sed -i -e "s/$pattern/$replacement/g" \
/www/mysite/board/today/[rsh][0-9][0-9][0-9]/index.html \
/www/mysite/person/[0-9]/[0-9]/[0-9]/card/2011/[rsh][0-9].html
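The escaping step can be tried on a single line before running it against real files. In this sketch the strings are shortened, and the include path a&b/c.html is invented to exercise the & and / handling. Because the final command uses / as its separator, / is escaped in the replacement as well as in the pattern.

```shell
string_to_replace='(div id="id")[code](/div)'
replacement_string='(div id="id")<--include="a&b/c.html"-->(/div)'   # hypothetical path

# Escape regex specials and the / separator in the pattern.
pattern=$(printf %s "$string_to_replace" | sed -e 's![.\[^$*/]!\\&!g')
# Escape \, & and the / separator in the replacement.
replacement=$(printf %s "$replacement_string" | sed -e 's![\&/]!\\&!g')

result=$(printf '%s\n' 'before (div id="id")[code](/div) after' |
  sed -e "s/$pattern/$replacement/g")

printf '%s\n' "$result"
```

Once the output looks right on a sample line, the same pattern and replacement variables can be passed to sed -i with the file wildcards.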
Final note: you seem to be working on HTML with regular expressions. That's often not a good idea.
Finding the files can easily be done using find -regex:
find www/mysite/board/today -regex ".*[rsh][0-9][0-9][0-9]/index.html"
find www/mysite/person -regex ".*[0-9]/[0-9]/[0-9]/card/2011/[rsh][0-9][0-9][0-9].html"
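To see which paths such a regex selects, here is a sketch that builds a throwaway directory tree and runs find on it. The directory names are invented, and the pattern is slightly stricter than the one above: it requires a / before the directory name and escapes the dot. It assumes a find that supports -regex (GNU and BSD find do).

```shell
root=$(mktemp -d)
mkdir -p "$root/today/r012" "$root/today/s345" "$root/today/r1234" "$root/today/sub/h001"
for d in r012 s345 r1234 sub/h001; do
  : > "$root/today/$d/index.html"   # create empty test files
done

# -regex must match the whole path, hence the leading .*/
found=$(cd "$root" && find today -regex '.*/[rsh][0-9][0-9][0-9]/index\.html' | LC_ALL=C sort)
rm -rf "$root"

printf '%s\n' "$found"
```

Unlike the shell wildcards, find descends into subdirectories, so today/sub/h001/index.html is matched too, while r1234 is not.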
Due to the nature of HTML, replacing the content might not be very easy with sed, so I would suggest using an HTML or XML parsing library in a Perl script. Can you provide a short sample of an actual HTML file and the result of the replacements?

Resources