bash parameter expansion search-and-replace capture groups


When using bash-style search-and-replace parameter expansion is there a way to refer to captured substrings in the pattern?
For example, I want to insert a leading 0 in filenames that end in "(some digit).mp3". The files have other parens in the name, so I need to look for the close paren closest to the end:
${x/\(([[:digit:]]\).mp3)/\(0}
This doesn't quite work, because the replacement drops the rest of the matched text (the digit and the trailing ").mp3") instead of putting it back.
Is there a way in bash or zsh to refer to the captured string? $BASH_REMATCH doesn't seem to work.

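You could use sed:
sed 's/\(.*(\)\([0-9])\.mp3\)$/\10\2/'
I think that gets the desired functionality: the greedy .* pushes the match to the last open paren, the two \(...\) groups capture the text on either side of the insertion point, and \10\2 puts them back with a 0 in between. I'm sure there is a slicker way to do it, I'm just starting to learn sed myself.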

One way to do this is to first strip the extension from the filename, leaving only the base filename in name. Next, use the length of name minus 1 as the string index to get the last character of name. Lastly, compare that last character against ).
name="${filename%.*}"                         # remove extension from filename
if [[ ${name:((${#name}-1))} == ")" ]]; then  # isolate the last char and test it against ")"
    # modify as needed to insert the '0'
    newfilename="0${filename}"                # prepend the '0' to the filename
fi
Note: there is no need for a new variable newfilename; that was just for illustration. It is perfectly OK to add the 0 directly with filename="0${filename}".
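Plain ${x/pattern/replacement} has no back-references, but bash's [[ string =~ regex ]] does fill the BASH_REMATCH array, so the single-digit case from the question can be handled without sed. A minimal sketch, assuming bash; the function name pad_track and the sample filename are made up for illustration:

```shell
#!/usr/bin/env bash
# pad_track is a hypothetical helper, not part of the original question.
pad_track() {
  local x=$1
  # Greedy (.*) pushes the match to the LAST "(" in the name;
  # group 2 is the single digit that needs a leading 0.
  local re='^(.*\()([0-9])\)\.mp3$'
  if [[ $x =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}0${BASH_REMATCH[2]}).mp3"
  else
    printf '%s\n' "$x"    # no trailing "(digit).mp3": leave the name alone
  fi
}

pad_track 'Song (live) (7).mp3'    # prints: Song (live) (07).mp3
```

Names that already have two digits, such as "Song (12).mp3", do not match the regex and come back unchanged.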

Related

Replacing variable section in string

I have a long list of strings (actually file names) in $var, looking like this:
p1035sEthinylestradiol913
p1035sTAbs872
p946sCarbaryl1182
Now I wish to replace the string, which occurs between the first s and the first integer [1-9], with R. Hence the output should look like:
p1035sR913
p1035sR872
p946sR1182
I was trying something like this:
echo ${var/s*[1-9]/R}
But this of course also removes the first integer after the s match, and that is not what I want. Can someone help me out here? Thanks a lot in advance!

To keep the matched digit you could switch from a parameter expansion like ${var/s*[1-9]/R} to regex matching with [[ string =~ pattern ]]. The matched digit can then be retrieved from BASH_REMATCH. However, you would still have to do this for every entry in your list.
With sed you automatically process every line, and keeping the digit is easy. Use [^0-9]* rather than .* so that the match stops at the first digit instead of running greedily to the last one:
sed -E 's/s[^0-9]*([0-9])/sR\1/' file
or
someCommand | sed -E 's/s[^0-9]*([0-9])/sR\1/'
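The [[ string =~ pattern ]] route mentioned above can be sketched as follows. This is only an illustration, assuming bash; the function name replace_middle is hypothetical, and the sample values are the three strings from the question:

```shell
#!/usr/bin/env bash
# replace_middle is a hypothetical helper name.
replace_middle() {
  local var=$1
  # Group 1: everything up to and including the first "s".
  # Then the non-digits to drop. Group 2: the first digit and the rest.
  local re='^([^s]*s)[^0-9]*([0-9].*)$'
  if [[ $var =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}R${BASH_REMATCH[2]}"
  else
    printf '%s\n' "$var"
  fi
}

for var in p1035sEthinylestradiol913 p1035sTAbs872 p946sCarbaryl1182; do
  replace_middle "$var"
done
# prints:
# p1035sR913
# p1035sR872
# p946sR1182
```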

How to make sed avoid replacement after specific symbol

I am writing a script for formatting a Fortran source code.
Simple formatting, like having all keywords in capitals or in small letters, etc.
Here is the main command
sed -i -e "/^\!/! s/$small\s/$cap /gI" $filein
It replaces every keyword $small (followed by a space) with the keyword $cap. The replacement happens only if the line does not start with "!".
It does what it should. Question:
How to avoid replacement if "!" is encountered in the middle of a line.
Or more generally, how to replace patterns everywhere, but not after a specific symbol, which can be either in the beginning of the line or somewhere else.
Example:
Program test ! It should not change the next program to caps
! Hi, is anything changing here? like program ?
This line does not have any key words
This line has Program and not exclamation mark.
"program" is a keyword. After running the script the result is:
PROGRAM test ! It should not change the next PROGRAM to caps
! Hi, is anything changing here? like program ?
This line does not have any key words
This line has PROGRAM and not exclamation mark.
I want:
PROGRAM test ! It should not change the next program to caps
! Hi, is anything changing here? like program ?
This line does not have any key words
This line has PROGRAM and not exclamation mark.
So far, I've failed to find a nice solution, which does the trick, hopefully with the sed command.

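The typical way in sed is to:
split the string into two parts, saving one part in the hold space;
do the operations on the pattern space;
get the hold space back and reassemble the output.
It would be something along these lines (GNU sed, because of the I flag):
sed 'h;s/[^!]*//;x;s/!.*//;s/program/PROGRAM/gI;G;s/\n//'
h;s/[^!]*//;x - copy the line to the hold space, strip everything before the first ! so that only the comment (if any) is left, then swap: the comment now sits in the hold space and the full line is back in the pattern space.
s/!.*// - drop the comment from the pattern space, leaving only the part before the first !.
s/program/PROGRAM/gI; - do the substitution on that part of the string.
G;s/\n// - append the comment from the hold space and remove the newline that G inserts. Lines without ! simply get an empty comment appended.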
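The hold-space idea can be tried on the sample lines from the question. The sketch below is one possible arrangement of it rather than the only one: it assumes GNU sed (for the I flag), avoids branching by always splitting at the first "!", and wraps the command in a function whose name, upcase_outside_comment, is made up here:

```shell
#!/usr/bin/env bash
# Assumes GNU sed; upcase_outside_comment is a hypothetical name.
upcase_outside_comment() {
  # h            copy the line to the hold space
  # s/[^!]*//    keep only "!comment" (or nothing) in the pattern space
  # x            swap: full line in pattern space, comment in hold space
  # s/!.*//      drop the comment, leaving the code part
  # s/.../gI     upper-case the keyword, ignoring case
  # G;s/\n//     glue the comment back on
  sed 'h;s/[^!]*//;x;s/!.*//;s/program/PROGRAM/gI;G;s/\n//'
}

printf '%s\n' \
  'Program test ! It should not change the next program to caps' \
  '! Hi, is anything changing here? like program ?' \
  'This line has Program and not exclamation mark.' |
  upcase_outside_comment
# prints:
# PROGRAM test ! It should not change the next program to caps
# ! Hi, is anything changing here? like program ?
# This line has PROGRAM and not exclamation mark.
```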

Assumptions:
OP needs to convert multiple keywords to uppercase
keywords to be capitalized do not include white space (eg, program name will need to be processed as two separate strings program and name)
input delimiter is white space
keywords with 'attached' non-alphanums will be ignored (eg, Program, will be ignored since , will be picked up as part of string) unless OP specifically includes the non-alphanum as part of the keyword definition (eg, keywords includes Program,)
all keywords to be converted to uppercase (ie, not going to worry about any flags to switch between lowercase, uppercase, camelcase, etc)
Sample input data:
$ cat source.txt
Program test ! It should not change the next program to caps # change first 'Program'
! Hi, is anything changing here? like program or MarK? # change nothing
This line does not have any key words ! except here - pRoGraM Mark # change nothing
This line has Program and not exclamation mARk plus MarKer. # change 'Program' and 'mARk' but not MarKer
Hi, hi, hI # change 'Hi,' and 'hi,' but not 'hI'
List of keywords provided in a separate file (whitespace delimited);
$ cat keywords.dat
program
mark hi, # 2 separate keywords: 'mark' and 'hi,' (comma included)
One awk idea:
awk -v comment="!" '                            # define character after which conversions are to be ignored
FNR==NR { for ( i=1; i<=NF; i++ )               # first file contains keywords; process each field as a separate keyword
              keywords[toupper($i)]             # convert to uppercase and use as index in associative array keywords[]
          next
        }
        { for ( i=1; i<=NF; i++ )               # second file, process each field separately
              { if ( $i == comment )            # if field is our comment character then stop processing rest of line else ...
                   break
                if ( toupper($i) in keywords )  # if current field is a keyword then convert to uppercase
                   $i=toupper($i)
              }
          print                                 # print the current line
        }
' keywords.dat source.txt
This generates:
PROGRAM test ! It should not change the next program to caps
! Hi, is anything changing here? like program or MarK?
This line does not have any key words ! except here - pRoGraM Mark
This line has PROGRAM and not exclamation MARK plus MarKer.
HI, HI, hI
NOTES:
while GNU awk can be told to overwrite the input file (eg, awk -i inplace == sed -i), this will require a different approach for processing the keywords.dat file (to keep from overwriting with nothing)
(quite a bit) of additional logic could be added to support uppercase vs lowercase vs camelcase vs whatever ... ignore or include non-alphanums in comparisons ... using multiple/different 'comment' characters ... standardizing other portions of (Fortran) code (eg, indentation) ... etc

This might work for you (GNU sed):
small='Program ' caps='PROGRAM '
sed -E ':a;s/^([^!]*)('"$small"')/\1\n/;ta;s/\n/'"$caps"'/g' file
Replace any occurrence of the variable $small before the symbol ! with a newline, then replace all newlines by the variable $caps.
N.B. The newline is chosen because it cannot normally exist in any line presented by sed, as it is the delimiter sed uses to present lines in the pattern space. Secondly, the words matching $small are iteratively replaced by a newline, and only then are all newlines globally replaced by $caps. This allows the replacement to be a superset of the match. If this were not the order of operations, the iterative process might become an endless loop.
If $small is to represent a case insensitive match, add the i flag to the first substitution.
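A runnable variant of the placeholder trick, as a sketch only: it assumes GNU sed (for \n in the replacement, \b and the I flag), uses keyword values without the trailing space, and wraps the command in a function named upcase_with_placeholder, which is not from the answer:

```shell
#!/usr/bin/env bash
# Assumes GNU sed. small/caps hold one keyword pair for illustration.
small='program' caps='PROGRAM'

upcase_with_placeholder() {
  # Loop: replace one keyword located before the first "!" with a newline,
  # repeat until none is left, then turn every newline into $caps.
  sed -E ':a;s/^([^!]*)\b('"$small"')\b/\1\n/I;ta;s/\n/'"$caps"'/g'
}

printf '%s\n' 'Program test, program ! program' | upcase_with_placeholder
# prints: PROGRAM test, PROGRAM ! program
```

Because [^!]* cannot cross a "!", anything after the first "!" is never touched, and because the final text is only written once the loop has finished, the case-insensitive match cannot retrigger on its own output.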

I've tried the suggested options, but none of them worked as expected for the whole file.
I have ended up with multiple sed commands; I am sure that it is not the best solution, but it works for me and does what I need.
My main problem was to avoid replacement after "!" if it appears somewhere in the middle of the line.
So I turned this problem into one I could handle.
sed -i -e "/^\!/! s/!/!c7u!!c7u!/" "$filein"            # 1. If a line does NOT start with !, search next "!" and replace it with "!c7u!!c7u!"
sed -i "s/!c7u!/\n/" "$filein"                          # 2. Move that comment to a new line
for ((i=0; i<$nwords; i++ )); do                        # Loop through all keywords
    word=${words[$i]}                                   # Take a keyword from the list
    small=${word,,}                                     # Write it in small letters
    cap=${word^^}                                       # Write it in capitals
    sed -i -e "/^\!/! s/$small\b/$cap/gI" "$filein"     # 3. Actual replacement in lines not starting with "!"
done
sed -i -e :a -e '$!N;s/\n!c7u//;ta' -e 'P;D' "$filein"  # 4. Undo step 1-2, moving inline comments back

Extract a section in a config file line using sed

I'm trying to continue to extract and isolate sections of text within my WordPress config file via a bash script. Can someone help me figure out my syntax?
The line of code in the wp-config.php file is:
$table_prefix = 'xyz_';
This is what I'm trying to use to extract the xyz_ portion.
prefix=$(sed -n "s/$table_prefix = *'[^']*'/p" wp-config.php)
echo -n "$prefix"
There's something wrong with my characters obviously. Any help would be much appreciated!

Your sed command is malformed. The form s/regex/replacement/p prints the lines on which a substitution was made; yours, as written, has no replacement part and will fail with an unterminated 's' command error. If you want to print the whole matching line unchanged, you can use & (or, in GNU sed, \0) as the replacement, as in s/<our_pattern>/&/p.
Bash interprets $table_prefix as a variable and, because it is in double quotes, tries to expand it. Unless you have set this variable to something, it expands to nothing. This would cause your sed command to match much more liberally, and we can fix it by escaping the $ as \$table_prefix.
Next, this won't actually match. Your line has multiple spaces before the =, so we need another wildcard there, as in ...prefix *= *...
Lastly, to extract the xyz_ portion alone, we'll need to do a few things. First, we have to make sure our pattern matches the whole line, so that when we substitute, the rest of the line isn't kept. We can do this by wrapping our pattern in ^.* ... .*\$. Next, we want to wrap the target section in a capture group. In sed, this is done with \(<stuff>\). The zeroth capture group is the whole match, and the remaining capture groups are numbered in the order their opening parentheses appear. This means we can use \([^']*\) to grab that section and \1 to output it.
All that gives us:
prefix=$(sed -n "s/^.*\$table_prefix *= *'\([^']*\)'.*\$/\1/p" wp-config.php)
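To try this end to end without touching a real config, the sketch below writes a two-line stand-in file and runs a slightly stricter variant of the command on it. The function name extract_prefix and the temporary file are assumptions for the demo; [[:space:]]* is used in place of " *" so that tabs are tolerated too:

```shell
#!/usr/bin/env bash
# extract_prefix is a hypothetical helper; $1 is the path to a wp-config.php.
extract_prefix() {
  # In double quotes, \\\$ reaches sed as \$ (a literal dollar sign),
  # so the shell never tries to expand $table_prefix.
  sed -n "s/^[[:space:]]*\\\$table_prefix[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1"
}

tmp=$(mktemp)                                   # stand-in for wp-config.php
printf '%s\n' '<?php' "\$table_prefix  = 'xyz_';" > "$tmp"

prefix=$(extract_prefix "$tmp")
echo "$prefix"                                  # prints: xyz_
rm -f "$tmp"
```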

The only issue with the regex is that the '$' character tells bash that you are using a variable, and since the pattern is wrapped in double quotes ("), bash will attempt to expand it. You can mitigate this by either escaping the $ or wrapping the pattern in single quotes and escaping the single quotes in the pattern.
Lastly, you are using the sed command s, which stands for substitute. It takes a pattern and replaces the matches with text, in the form s/<pattern>/<replace>/. If you only want to print the matching line, you can omit the 's' and leave the 'p' (print) command at the end. After all that, your command should look something like:
sed -n "/\$table_prefix *= *'[^']*'/p" wp-config.php

bash pattern substitution to remove an arbitrary long sequence of letters

My script deals with filenames which are padded with the letter x to a certain length, so a file may be abcdxxxxxx or fooxxxxxxx. I have the filename stored in a variable fn, and I want to extract just the "stem", i.e. abcd or foo.
I obviously can do this by forking a sed or tr process and feeding the file name into it, but bash also has a feature called pattern substitution for variables, and I was wondering whether this could be used.
From the bash man page:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string. Normally only the first match is replaced.... If pattern begins with %, it must match at the end of the expanded value of parameter.
Now, a pattern denoting the letter x is just x, and since the pattern should match at the end, I need %x
echo ${fn/%x/}
indeed returns the filename with the last x removed. But I want to have all trailing x removed, i.e. all occurrences of the pattern, which according to the man page requires that the pattern start with a slash. I understand this to turn %x into either /%x or %/x. However, neither echo ${fn//%x/} nor echo ${fn/%/x/} produces the expected result.
Did I misunderstand something in the description of pattern substitution?

Regarding the substring replacements (/, //, /#, /%), the forms are:
${var/Pattern/Replacement}
First match of Pattern, within var replaced with Replacement.
${var//Pattern/Replacement}
Global replacement. All matches of Pattern, within var replaced with Replacement.
${var/#Pattern/Replacement}
If prefix of var matches Pattern, then substitute Replacement for Pattern.
${var/%Pattern/Replacement}
If suffix of var matches Pattern, then substitute Replacement for Pattern.
So it's first match, all matches, prefix, or suffix. As with globbing, you can't use x* in the regular-expression sense (zero or more x), so you are left with the options described in the other answers.

Try:
echo "${fn%${fn##*[^x]}}"
Examples
$ fn=abcdxxxxxx; echo "${fn%${fn##*[^x]}}"
abcd
$ fn=fooxxxxxxx; echo "${fn%${fn##*[^x]}}"
foo
How it works
For starters, ${parameter##word} is prefix removal: it removes the longest match of word from the beginning of parameter. In our case, ${fn##*[^x]} is the filename with everything removed from the front up to and including the last character that is not x. This leaves only the trailing x's. For example:
$ fn=abcdxxxxxx; echo "${fn##*[^x]}"
xxxxxx
${parameter%word} is suffix removal: it removes word from the end of $parameter. In our case, we want to remove the trailing x's (as found above) from $fn. Thus we want ${fn%${fn##*[^x]}}.
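The two steps can also be written out separately, which makes the intermediate value visible. A small sketch, assuming bash; the function name strip_padding and the variable pad are illustrative only:

```shell
#!/usr/bin/env bash
# strip_padding is a hypothetical helper name.
strip_padding() {
  local fn=$1
  local pad=${fn##*[^x]}          # step 1: the trailing run of x's (may be empty)
  printf '%s\n' "${fn%"$pad"}"    # step 2: remove exactly that run from the end
}

strip_padding abcdxxxxxx    # prints: abcd
strip_padding fooxxxxxxx    # prints: foo
strip_padding axbxx         # prints: axb
```

The last call shows that an x inside the stem is left alone, since only the run after the last non-x character is removed.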

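Doubling the percent sign will do what you want:
echo "${fn%%x*}"
"Remove, from the end of the string, x and all the characters that follow it." Because %% takes the longest match, this cuts at the first x in the name, so it only yields the stem when the stem itself contains no x.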
Or you can use extended globs:
shopt -s extglob
echo "${fn/%+(x)/}"
"Replace, at the end of the string, a sequence of one or more x's with nothing"

Assuming you have the filename in the shell variable fn, then in bash you can do:
if [[ $fn =~ x+$ ]]; then
    echo "${fn%$BASH_REMATCH}"
fi
This will print the filename with the matched part removed. If you want it to work also when there are no x's at the end of the filename, replace x+$ with x*$ above, in which case it will always match.
As for the pattern substitution, my guess is that it will only attempt to replace a match once at a given location, even if you add the / to replace all matches. So when it matches the last x at the end of the string, it will not go back to an earlier location in the string to see if it matches again. Basically this means you cannot combine % and /. If my guess is correct, that is :)
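The variants from this thread can be compared side by side on a name whose stem contains an x. This is a sketch assuming bash with extglob available; the sample values taxesxxxx and 50%x100%x are made up for the comparison. The last line shows what ${fn//%x/} actually does: after //, the % is an ordinary character, so the pattern is the literal two-character string "%x".

```shell
#!/usr/bin/env bash
shopt -s extglob          # needed for +(x) to mean "one or more x"

fn=taxesxxxx              # stem "taxes" padded with four x's
echo "${fn%%x*}"          # prints: ta     (cuts at the FIRST x)
echo "${fn%%+(x)}"        # prints: taxes  (removes only the trailing run)
echo "${fn/%+(x)/}"       # prints: taxes  (same, as an anchored substitution)

pct='50%x100%x'
echo "${pct//%x/}"        # prints: 50100  (every literal "%x" removed)
```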

bash script on specific URL string manipulation

I need to manipulate a string (a URL) whose length I don't know.
The string is something like
https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring
I basically need a regular expression which returns this:
https://x.xx.xxx.xxx/keyword/restofstring
where x.xx.xxx.xxx is the current IP, which can vary every time, and I don't know the number of dontcares.
I actually have no idea how to do it; I've spent 2 hours on the problem but didn't find a solution.
Thanks!

You can use sed as follows:
sed -E 's=(https://[^/]*).*(/keyword/.*)=\1\2='
s stands for substitute and has the form s=search pattern=replacement pattern=.
The search pattern is a regex in which we grouped (...) the parts you want to extract.
The replacement pattern accesses these groups with \1 and \2.
You can feed a file or stdin to sed and it will process the input line by line.
If you have a string variable and use bash, zsh, or something similar you also can feed that variable directly into stdin using <<<.
Example usage for bash:
input='https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring'
output="$(sed -E 's=(https://[^/]*).*(/keyword/.*)=\1\2=' <<< "$input")"
echo "$output" # prints https://x.xx.xxx.xxx/keyword/restofstring

echo "https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring" | sed "s/dontcare[0-9]\+\///g"
sed is used to manipulate text. dontcare[0-9]\+\///g is an escaped form of the regular expression dontcare[0-9]+/, which matches the word "dontcare" followed by 1 or more digits, followed by the / character.
sed's substitution works like this: s/find/replace/g, where g is a flag that makes it replace every match on the line rather than only the first.
Note that this assumes there are no dontcareNs in the rest of the string. If that's the case, Socowi's answer works better.

You could also use read with a / value for $IFS to parse out the trash.
$: IFS=/ read proto trash url trash trash trash keyword rest <<< "https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring"
$: echo "$proto//$url/$keyword/$rest"
https://x.xx.xxx.xxx/keyword/restofstring
This does not depend on what the dontcare... values are, but it does hard-code how many of them there are (three trash fields here), so it only fits when that number is known.
This one is pure bash, though I like Socowi's answer better.
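If the number of dontcare components is not known, prefix and suffix removal can do the same job in pure bash. A minimal sketch, assuming the URL really contains "/keyword/" (if it does not, the output is not meaningful); the function name shorten_url and its optional second argument are invented for this example:

```shell
#!/usr/bin/env bash
# shorten_url URL [KEY]  (hypothetical helper; KEY defaults to "keyword")
shorten_url() {
  local url=$1 key=${2:-keyword}
  local scheme rest host tail
  scheme=${url%%://*}        # "https"
  rest=${url#*://}           # host plus full path
  host=${rest%%/*}           # everything before the first "/"
  tail=${rest#*/"$key"/}     # everything after the first "/keyword/"
  printf '%s://%s/%s/%s\n' "$scheme" "$host" "$key" "$tail"
}

shorten_url 'https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring'
# prints: https://x.xx.xxx.xxx/keyword/restofstring
```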

Here's a sed variation which picks out the host part and the last two components from the path.
url='http://example.com:1234/ick/poo/bar/quux/fnord'
newurl=$(echo "$url" | sed 's%\(https*://[^/?]*[^?/]\)[^ <>'"'"'"]*/\([^/ <>'"'"'"]*/[^/ <>'"'"'"]*\)%\1/\2%')
The general form is sed 's%pattern%replacement%', where the pattern matches through the end of the host name part (captured into one set of backslashed parentheses), then skips through the penultimate slash, then captures the remainder of the URL including the last slash. The replacement simply recalls the two captured groups, joined by a slash, without the skipped part between them. Each '"'"' sequence is just a way to put a literal single quote inside a single-quoted shell string.
