Sed - Replace immediate next string/word coming after a particular pattern - shell

I'm a newbie to shell scripting and any help is much appreciated.
I have a pattern like this rmd_ver=1.0.10
I want to search the pattern rmd_ver= and replace the numeric part 1.0.10 with a new value in all the matches. Hope my question is clear.

To replace any value till the end of the line:
sed -i 's/\(rmd_ver=\)\(.*\)/\1R/' file
sed -i 's/p/r/' file replace p with r in file
\( start first group
rmd_ver= search pattern
\) end first group
\( start second group
.* any characters
\) end second group
\1 back reference to the first group
R replacement text
To replace the exact pattern in any place of the line and possibly several times in one line:
sed -i 's/\(rmd_ver=\)\(1\.0\.10\)/\1R/g' file
\. escape special . into literal .
g to replace multiple occurrences in one line
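To see the difference between the two commands above, here is a minimal sketch that runs them on a made-up sample line (no -i, so nothing is edited in place). The sample text and the replacement value 2.0.0 are assumptions for illustration only.

```shell
# Sample input invented for this illustration.
input='rmd_ver=1.0.10 rmd_ver=1.0.10 # trailing text'

# Replace everything after the first rmd_ver= up to the end of the line.
to_eol=$(printf '%s\n' "$input" | sed 's/\(rmd_ver=\).*/\12.0.0/')

# Replace only the exact version 1.0.10, for every occurrence on the line.
exact=$(printf '%s\n' "$input" | sed 's/\(rmd_ver=\)1\.0\.10/\12.0.0/g')

printf '%s\n' "$to_eol" "$exact"
```

The first form discards the rest of the line, including the second occurrence and the trailing text; the second form leaves everything around each match untouched.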

If you are too lazy to repeat the pattern in the replacement (s/rmd_ver=1\.0\.10/rmd_ver=2.0.0/), store it in a group:
sed -e 's/\(rmd_ver=\)1\.0\.10/\12.0.0/'

From your description I think you just need the substitute command, with syntax s/from_regex/to_result/. To match a number like 1.0.10 you can match a repeated run of digits or dots, e.g. [0-9.]. That is a rather simple regex in that it will also allow a dot at the start and at the end, but let's start with that. Then your sed command becomes
sed 's/rmd_ver=[0-9.]\+/rmd_ver=42/' filename
The + is a repeat operator (one or more). Since sed uses BRE (basic regular expression) syntax by default, it has to be written as \+, which is a GNU extension; with sed -E it is written as a plain +.
If you want to avoid matching dots on the ends, as in 1.2.3., change the regex to [0-9][0-9.]*[0-9] so that the first and last characters are digits. That form needs at least two characters, so to match a single digit as well you have to add an alternative (written \| in GNU BRE, so a\|b matches a or b):
sed 's/rmd_ver=\([0-9][0-9.]*[0-9]\|[0-9]\)/rmd_ver=42/' filename
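A quick way to check a version regex like this is to feed it a few edge cases. The sketch below uses sed -E (extended syntax, accepted by both GNU and BSD sed) so the alternation needs no backslashes, and it uses * for the middle part so that a two-digit value such as 10 is matched whole. The helper name bump_ver and the sample inputs are made up for illustration.

```shell
# bump_ver is a hypothetical helper: it rewrites the version after rmd_ver= to 42.
bump_ver() {
  sed -E 's/rmd_ver=([0-9][0-9.]*[0-9]|[0-9])/rmd_ver=42/'
}

r1=$(echo 'rmd_ver=1.0.10'     | bump_ver)  # dotted version
r2=$(echo 'rmd_ver=7'          | bump_ver)  # single digit
r3=$(echo 'rmd_ver=10'         | bump_ver)  # two digits, no dot
r4=$(echo 'rmd_ver=1.2.3. end' | bump_ver)  # trailing dot is left in place

printf '%s\n' "$r1" "$r2" "$r3" "$r4"
```

The last case shows the point of anchoring on digits: the trailing dot is not part of the match, so it survives the substitution.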

sed 's/\(rmd_ver=\).*[[:digit:]]$/\1NEW_VAL/'
Replace NEW_VAL with the value you want. Note that this only matches when the line ends in a digit.
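If the new value lives in a shell variable, the sed script can be put in double quotes so the shell expands the variable first. This is only a sketch: the variable name new_ver and its value are assumptions, and it presumes the value contains no /, & or backslash, which would be special in the replacement.

```shell
new_ver='3.1.4'          # hypothetical new value
line='rmd_ver=1.0.10'    # made-up input line

# Double quotes let the shell expand ${new_ver} before sed parses the script.
# \1 is a single-digit back reference, so "\13.1.4" means group 1 followed by 3.1.4.
result=$(printf '%s\n' "$line" | sed "s/\(rmd_ver=\)[0-9.]*/\1${new_ver}/")

printf '%s\n' "$result"
```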

Related

Regular expressions, capture groups, and the dollar sign

I'm reading a book about bash, and it introduced regular expressions (I'm pretty new to them) with an example:
rename -n 's/(.*)(.*)/new$1$2/' *
'file1' would be renamed to 'newfile1'
'file2' would be renamed to 'newfile2'
'file3' would be renamed to 'newfile3'
There wasn't really a breakdown provided with this example, unfortunately. I kind of get what capture groups are and that .* is greedy and will match all characters but I'm uncertain as to why two capture groups are needed. Also, I get that $ represents the end of the line but am unsure of what $1$2 is actually doing here. Appreciate any insight provided.
Attempted to research capture groups and the $ for some similar examples with explanations but came up short.
You are correct. (.*)(.*) makes no sense. The second .* will always match the empty string.
For example, matching against file,
the first .* will match the 4 character string starting at position 0 (file), and
the second .* will match the 0 character string starting at position 4 (empty string).
You could simplify the pattern to
rename -n 's/(.*)/new$1/' *
rename -n 's/.*/new$&/' *
rename -n 's/^/new/' *
rename -n '$_ = "new$_"' *
rename -n '$_ = "new" . $_' *
I don't know that rename command. The regular expression looks like sed syntax. If that is the case (as in many other regex forms), it has 3 parts:
s for substitute
everything between the first two slashes (.*)(.*) to specify what to match
everything between the 2nd and 3rd slash new$1$2 is the replacement
$ only means end of the line in the first part (the pattern). In the second part (the replacement), $ followed by a number refers to a capture group: $1 is the first group, $2 the second, and so on. Some tools use $0 for the whole matched text; Perl uses $& for that.
You are right that .* is greedy and it's pointless to have that repeated. Maybe there was a \. in between and that was an attempt to capture file name and extension. There are better ways to parse file names, like basename. So you could simplify the command to rename -n 's/(.*)/new$1/' *
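To see what each group actually captures, the same pattern can be tried without renaming anything. The sketch below uses sed -E as a stand-in for the Perl-based rename; note that sed writes back references as \1 and \2 rather than $1 and $2. The brackets in the second command are only there to make the empty group visible.

```shell
# Same pattern as the book's example, applied to the string file1.
both=$(echo 'file1' | sed -E 's/(.*)(.*)/new\1\2/')

# Wrap each group in brackets to show what it captured.
groups=$(echo 'file1' | sed -E 's/(.*)(.*)/[\1][\2]/')

# The simplest equivalent: insert "new" at the start of the string.
anchor=$(echo 'file1' | sed 's/^/new/')

printf '%s\n' "$both" "$groups" "$anchor"
```

The second line of output shows the first group holding the whole name and the second group holding nothing, which is why the second group can be dropped.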

Extract text between two special characters

Trying to extract the text between the special characters "\ and \" through sed
Ex: "\hell##$\"},
expected output : hell##$
You can do it quite easily with using a capture-group and backreference with basic regular-expressions:
sed 's/^["][\]\([^\]*\).*$/\1/'
Explanation
Normal substitution sed 's/find/replace/', where
find is ^["][\] a double-quote and \ before beginning the capture \(...\) which contains [^\]* (zero or more characters not a \), the closing of the capture \) and then .*$ the remainder of the string;
replace is \1 (the first backreference) containing the text captured between \(...\).
(note: if your "\ doesn't begin the string, remove the first '^' anchor)
Example
$ echo '"\hell##$\"},' | sed 's/^["][\]\([^\]*\).*$/\1/'
hell##$
Look things over and let me know if you have questions.
This might work for you (GNU sed):
sed -nE '/"\\[^\\]*\\+([^\\"][^\\]*\\+)*"/{s/"\\/\n/;s/.*\n//;s/\\"/\n/;P;D}' file
The solution comes in two parts:
Firstly, a regexp to determine whether a pair of two characters exists. This can be tricky as a negated class is insufficient because edge cases can easily defeat a simplistic approach.
Secondly, once a pair of characters does exist the text between them must be extracted piecemeal.

Replace All first 4 spaces with a tab

I am doing some documentation work, and I have a tree structure like this:
A
    BB
    C C
        DD
How can I replace just all the occurrences of 2 spaces in the head of the line with '-', like:
A
--BB
--C C
----DD
I have tried sed 's/  /-/g', but this replaces all occurrences of 2 spaces; I also tried sed 's/^  /-/g', but this replaces only the first occurrence of 2 spaces. How can I do this?
The regular expression for four spaces at beginning of line is /^    / where I put the slashes just to demarcate the expression (they are not part of the actual regular expression, but they are used as delimiters by sed).
sed 's/^    /\t/' file
In recent sed versions, you can add an -i option to modify file in-place (that is, sed will replace the file with the modified file); on *BSD (including OSX), you need -i '' with an empty option argument.
The \t escape code for tab is also not universally supported; if that is a problem, your shell probably allows you to type a literal tab by prefixing it with ctrl-V.
(Your question title says "tab" but your question asks about dashes. To replace with two dashes, replace \t in the replacement part of the script with --, obviously.)
If you are trying to generalize to "any groups of two spaces at beginning of line should be replaced by a dash", this is not impossible to do in sed, but I would recommend Perl instead:
perl -pe 's%^((?:  )+)% "-" x (length($1) / 2)%e' file
This captures the match into $1; the inner parenthesized expression matches two spaces and the + quantifier says to match that as many times as possible. The /e flag allows us to use Perl code in the replacement; this piece of code repeats the character "-" as many times as the captured expression was repeated, which is conveniently equal to half its length.
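For completeness, here is one way the same generalization could be sketched in sed alone, using a loop: each pass turns one more leading pair of spaces into a dash until none are left. The sample tree is made up (4 and 8 spaces of indentation), and the sketch assumes no input line already begins with dashes.

```shell
# Made-up tree: A at column 0, BB and "C C" indented 4 spaces, DD indented 8.
tree=$(printf '%s\n' 'A' '    BB' '    C C' '        DD')

# :a  defines a label; the s command converts the next leading pair of spaces
# (after any dashes already produced) into one dash; ta jumps back to :a
# for as long as a substitution was made on the current line.
dashed=$(printf '%s\n' "$tree" | sed -e ':a' -e 's/^\(-*\)  /\1-/' -e 'ta')

printf '%s\n' "$dashed"
```

The single space inside "C C" is untouched because the pattern is anchored at the start of the line and only allows dashes before the pair of spaces.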

Unix Shell Script to extract the number from the String

How to extract the bold number from the string below using unix shell script?
17: H.0(-2073):File ID (40008)in xyz file not equal to the file ID(**40004**)in file header.
Thanks :)
echo '17: H.0(-2073):File ID (40008)in xyz file not equal to the file ID(40004)in file header.' | sed -e 's/.*(\([0-9]*\)).*/\1/'
The second part of this line runs sed with command s (substitution). Part between first two slashes (/) is regular expression which matches the following:
Everything (.*) in greedy manner, i.e. until the last occurrence of any number of digits in brackets ( ([0-9]*) ) and then everything again (.*) until the end of line. Expression between \( and \) (i.e. 40004 in this case) is memorized to be used in the second part of s command.
The part between the second / and third / is what we want to place instead of the line matched with regular expression. Here it is \1, meaning reference to the substring between 1st occurrence of \( and \) which is 40004 in our case.
So the part after | replaces the whole input string with the string 40004 extracted from it. Regular expressions are powerful but often a write-only technique, so I hope this explanation brings a bit more clarity.
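One caveat with the plain s command is that a line with no parenthesized number is printed unchanged. A small variation, sketched below, uses -n together with the p flag so that only lines where the substitution succeeded are printed, and requires at least one digit inside the parentheses. The variable names and the second input line are made up for illustration.

```shell
msg='17: H.0(-2073):File ID (40008)in xyz file not equal to the file ID(40004)in file header.'

# -n suppresses automatic printing; the p flag prints only after a successful substitution.
# The greedy leading .* pushes the match to the last "(digits)" group on the line.
last_id=$(printf '%s\n' "$msg" | sed -n 's/.*(\([0-9][0-9]*\)).*/\1/p')

# A line without any "(digits)" produces no output at all.
no_match=$(printf '%s\n' 'no ids here' | sed -n 's/.*(\([0-9][0-9]*\)).*/\1/p')

printf 'last_id=%s no_match=%s\n' "$last_id" "$no_match"
```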

Print all characters upto a matching pattern from a file

Maybe a silly question, but I have a text file and need to display everything up to the first pattern match, which is a '/'. (No line contains blank spaces.)
Example.txt:
somename/for/example/
something/as/another/example
thisfile/dir/dir/example
Preferred output:
somename
something
thisfile
I know this grep code will display everything after a matching pattern:
grep -o '/[^\n]*' '/my/file.txt'
So is there any way to do the complete opposite, maybe rm everything after matching pattern or invert to display my preferred output?
Thanks.
If you're calling an external command like grep, you can get the same results you require with the sed command, i.e.
echo "something/as/another/example" | sed 's:/.*::'
something
Instead of focusing on what you want to keep, think about what you want to remove, in this case everything after the first '/' char. This is what this sed command does.
The leading s means substitute, the :/.*: is the pattern to match, with /.* meaning match the first '/' char and all characters after that. The 2nd half of the sed command is the replacement. With ::, this means replace with nothing.
The traditional idiom for sed is to use s/str/rep/, using / chars to delimit the search from the replacement, but you can use any character you want (other than a backslash or newline) after the initial s (substitute) command.
A backslash before the delimiter is only needed when a custom delimiter is used in an address, as in \:regex:; the s command takes the alternate delimiter directly, so s:/.*:: needs no special indication.
IHTH.
You can use a much simpler regexp:
/[^/]*/
The forward slash after the caret is what you're matching up to.
Assuming filename as "file.txt"
cat file.txt | cut -d "/" -f 1
Here, we are cutting the input line with "/" as the delimiter (-d "/"). Then we select the first field (-f 1).
You just need to include the starting anchor ^ and put the / in a negated character class.
grep -o '^[^/]*' file
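The sed, cut, and grep approaches from the answers above can be compared side by side. This is a small sketch that feeds the question's three sample lines through each one via a pipe instead of a file; the variable names are made up for illustration, and grep -o is assumed to be available (it is not required by POSIX but is supported by GNU and BSD grep).

```shell
# The three sample lines from the question.
sample=$(printf '%s\n' 'somename/for/example/' \
                       'something/as/another/example' \
                       'thisfile/dir/dir/example')

# sed: delete the first / and everything after it.
via_sed=$(printf '%s\n' "$sample" | sed 's:/.*::')

# cut: split on / and keep the first field.
via_cut=$(printf '%s\n' "$sample" | cut -d '/' -f 1)

# grep -o: print only the leading run of non-slash characters.
via_grep=$(printf '%s\n' "$sample" | grep -o '^[^/]*')

printf '%s\n' "$via_sed"
```

All three give the same result on this input; they differ mainly in which part of the line they describe (what to remove, which field to keep, or what to match).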

Resources