Suppose your PATH was set to:
c:\sw\python\3.9.0;c:\sw\python\3.9.1;c:\sw\python\3.10.0;etc
I'd like a clean way to remove all that match:
c:\sw\python\3.9.*
And therefore end up with a PATH of:
c:\sw\python\3.10.0;etc
My ideal solution would involve an Ansible module, but I'm fine with a pure PowerShell solution, provided it's only a few lines and readable.
Divide et impera works fine here. First split the path on semicolons to get the individual path elements, then filter out the unwanted ones, and finally re-join the results. Like so:
$pp = "c:\data;c:\sw\python\3.9.0;c:\sw\python\3.9.1;c:\sw\python\3.10.0;etc"
# Split the string on semicolons,
# keep the elements that do not match python\3.9.
# (the original character-class trick [^1][^0] would also have dropped 3.8.x),
# and join the results together with semicolons as separators
$np = ($pp.split(';') | ? {$_ -notmatch 'python\\3\.9\.'} ) -join ';'
# Output
$np
c:\data;c:\sw\python\3.10.0;etc
I have a log file, and I want to get rid of the column that starts with "external". This column is not always in the third place, so I need to find the word "external" and then delete it together with the string that follows the colon.
I was thinking of using -replace for that, but does -replace accept a regex that deletes the rest of the string (after the colon), which is always changing?
Or maybe there is a better way to do this?
02/02/2020 name:VAL_NATURE external:af2045b2-5992-432e-b790-c1ad4743038 status:good
cat mylog.log | %{$_ -replace "external???",""}
With any delimited file, the first thought I have is to break it at the delimiters (in your case, the white space) and treat it like an object. Deleting a column is trivial if you do that, and it lets you have easy access to the data for other purposes.
If, however, your only task is to remove that column ('external' + colon + all text up to the next bit of white space), that is an easy thing to do with a regex replace.
$line = '02/02/2020 name:VAL_NATURE external:af2045b2-5992-432e-b790-c1ad4743038 status:good'
$line -replace 'external:\S*\s*',''
For the sample line, this gives:
02/02/2020 name:VAL_NATURE status:good
The \S is any non-whitespace character, and \S* says "non-whitespace zero or more times", so the match stops at the first whitespace after the value; the trailing \s* then consumes that whitespace (space/tab/etc). So this regex matches the word 'external' followed by a ':' followed by the value and the white space after it. A greedy 'external:.*\s' gives the same result on this sample line, but . matches spaces too, so on a line with more columns it would run on to the last whitespace and delete the columns in between.
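The first suggestion above (break the line at the delimiters and drop the unwanted field) was not shown. A minimal sketch of that idea in Ruby follows; PowerShell's -split and Where-Object would take the same shape. The helper name drop_field and its key parameter are made up for illustration, and the sketch assumes fields never contain white space themselves.

```ruby
# Hypothetical helper: remove every whitespace-delimited field that starts
# with "<key>:", wherever it sits on the line.
def drop_field(line, key = 'external')
  fields = line.split                                    # split on runs of whitespace
  kept = fields.reject { |f| f.start_with?("#{key}:") }  # drop the unwanted column
  kept.join(' ')                                         # re-join with single spaces
end

line = '02/02/2020 name:VAL_NATURE external:af2045b2-5992-432e-b790-c1ad4743038 status:good'
puts drop_field(line)
# prints: 02/02/2020 name:VAL_NATURE status:good
```

Note that re-joining with a single space normalises the spacing, so this is only suitable when the exact original white space does not matter.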
I have an input file named test which looks like this
leonid sergeevich vinogradov
ilya alexandrovich svintsov
and when I use grep like this grep 'leonid*vinogradov' test it says nothing, but when I type grep 'leonid.*vinogradov' test it gives me the first string. What's the difference between * and .*? Because I see no difference between any number of any characters and any character followed by any number of any characters.
I use ubuntu 14.04.3.
* doesn't match any number of characters, like in a file glob. It is an operator, which indicates 0 or more matches of the previous character. The regular expression leonid*vinogradov would require a v to appear immediately after 0 or more ds. The . is the regular expression metacharacter representing any single character, so .* matches 0 or more arbitrary characters.
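The difference is easy to check in any regex engine. Below is a small sketch in Ruby rather than grep; the constant names are made up for illustration, and it assumes only that * and . behave the same way in Ruby's regex flavour as in grep's for these two patterns.

```ruby
NAME = 'leonid sergeevich vinogradov'

# "d*" means zero or more "d", so "vinogradov" must follow "leoni" plus any number of "d".
STAR_ONLY = /leonid*vinogradov/
# ".*" means zero or more of any character, so anything may sit in between.
DOT_STAR = /leonid.*vinogradov/

puts STAR_ONLY.match?(NAME)                  # false
puts DOT_STAR.match?(NAME)                   # true
puts STAR_ONLY.match?('leonivinogradov')     # true: zero "d" characters
puts STAR_ONLY.match?('leonidddvinogradov')  # true: three "d" characters
```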
grep uses regular expressions, and .* matches 0 or more of any character.
Whereas 'leonid*vinogradov' is also evaluated as a regex, and it means leoni followed by 0 or more of the letter d, hence your match fails.
What grep uses is regular expressions (regexp for short), not the wildcards you had in mind. In this case, "." means any character and "*" means any number (including zero) of the previous character, so ".*" means anything here.
Check the link, or google it; it's a powerful tool that you'll find worth knowing.
I'm trying to make a regex that matches anything except an exact ending string, in this case, the extension '.exe'.
Examples for a file named:
'foo' (no extension) I want to get 'foo'
'foo.bar' I want to get 'foo.bar'
'foo.exe.bar' I want to get 'foo.exe.bar'
'foo.exe1' I want to get 'foo.exe1'
'foo.bar.exe' I want to get 'foo.bar'
'foo.exe' I want to get 'foo'
So far I created the regex /.*\.(?!exe$)[^.]*/
but it doesn't work for cases 1 and 6.
You can use a positive lookahead.
^.+?(?=\.exe$|$)
^ start of string
.+? non greedily match one or more characters...
(?=\.exe$|$) until literal .exe occurs at end. If not, match end.
See demo at Rubular.com
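For a quick check without the online demo, here is a sketch that runs the pattern above against the six names from the question. The constant names STRIP_EXE and CASES are made up for illustration; the pattern itself is unchanged.

```ruby
# The lookahead pattern from the answer above, unchanged.
STRIP_EXE = /^.+?(?=\.exe$|$)/

# Input name => result the question asks for.
CASES = {
  'foo'         => 'foo',
  'foo.bar'     => 'foo.bar',
  'foo.exe.bar' => 'foo.exe.bar',
  'foo.exe1'    => 'foo.exe1',
  'foo.bar.exe' => 'foo.bar',
  'foo.exe'     => 'foo'
}

CASES.each do |name, expected|
  # String#[] with a regex returns the matched text (or nil when nothing matches).
  puts "#{name} -> #{name[STRIP_EXE]} (wanted #{expected})"
end
```

Because ^ and $ are line anchors in Ruby, this assumes single-line file names; \A and \z would be the stricter choice for arbitrary input.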
Wouldn't a simple replacement work?
string.sub(/\.exe\z/, "")
Do you mean regex matching or capturing?
There may be a regex only answer, but it currently eludes me. Based on your test data and what you want to match, doing something like the following would cover both what you want to match and capture:
name = 'foo.bar.exe'
# the dot must be escaped, otherwise it matches any character (e.g. 'fooXexe')
match = /(.*)\.exe$/.match(name)
if match.nil?
  # no .exe extension: this filename already matches your conditions
  print name
else
  # otherwise match[1] is the capture: the filename without the .exe extension
  print match[1]
end
string pattern = @"(?x) (.* (?= \.exe$ )) | ((?=.*\.exe).*)";
The first alternative uses a positive look-ahead that checks whether your string
ends with .exe. The condition is not included in the match.
The second alternative uses a positive look-ahead with the condition included in the
match. It only checks whether you have something followed by .exe.
(?x) means that white space inside the pattern string is ignored.
Or don't use (?x) and just delete all the white space.
It works for all the 6 scenarios provided.
I have a regex that gives me one result in sed but another in Perl (and Ruby).
I have the string one;two;;three and I want to highlight the substrings delimited by the ;. So I do the following in Perl:
$a = "one;two;;three";
$a =~ s/([^;]*)/[\1]/g;
print $a;
(Or, in Ruby: print "one;two;;three".gsub(/([^;]*)/, "[\\1]").)
The result is:
[one][];[two][];[];[three][]
(I know the reason for the spurious empty substrings.)
Curiously, when I run the same regexp in sed I get a different result. I run:
echo "one;two;;three" | sed -e 's/[^;]*/[\0]/g'
and I get:
[one];[two];[];[three]
What is the reason for this different result?
EDIT:
Somebody replied "because sed is not perl". I know that. The reason I'm asking my question is because I don't understand how sed copes so well with zero-length matches.
This is an interesting and surprising edge case.
Your [^;]* pattern may match the empty string, so it becomes a philosophy question, viz., how many empty strings are between two characters: zero, one, or many?
sed
The sed match clearly follows the philosophy described in the “Advancing After a Zero–Length Regex Match” section of “Zero–Length Regex Matches.”
Now the regex engine is in a tricky situation. We’re asking it to go through the entire string to find all non–overlapping regex matches. The first match ended at the start of the string, where the first match attempt began. The regex engine needs a way to avoid getting stuck in an infinite loop that forever finds the same zero-length match at the start of the string.
The simplest solution, which is used by most regex engines, is to start the next match attempt one character after the end of the previous match, if the previous match was zero–length.
That is, zero empty strings are between characters.
The above passage is not an authoritative standard, and quoting such a document instead would make this a better answer.
Inspecting the source of GNU sed, we see
/* Start after the match. last_end is the real end of the matched
substring, excluding characters that were skipped in case the RE
matched the empty string. */
start = offset + matched;
last_end = regs.end[0];
Perl and Ruby
Perl’s philosophy with s///, which Ruby seems to share—so the documentation and examples below use Perl to represent both—is there is exactly one empty string after each character.
The “Regexp Quote–Like Operators” section of the perlop documentation reads
The /g modifier specifies global pattern matching—that is, matching as many times as possible within the string.
Tracing execution of s/([^;]*)/[\1]/g gives
Start. The “match position,” denoted by ^, is at the beginning of the target string.
 o n e ; t w o ; ; t h r e e
^
Attempt to match [^;]*.
 o n e ; t w o ; ; t h r e e
      ^
Note that the result captured in $1 is one.
Attempt to match [^;]*.
 o n e ; t w o ; ; t h r e e
      ^
Important Lesson: The * regex quantifier always succeeds because it means “zero or more.” In this case, the substring in $1 is the empty string.
The rest of the match proceeds as in the above.
Being a perceptive reader, you now ask yourself, “Self, if * always succeeds, how does the match terminate at the end of the target string, or for that matter, how does it get past even the first zero–length match?”
We find the answer to this incisive question in the “Repeated Patterns Matching a Zero–length Substring” section of the perlre documentation.
However, long experience has shown that many programming tasks may be significantly simplified by using repeated subexpressions that may match zero–length substrings. Here’s a simple example being:
@chars = split //, $string; # // is not magic in split
($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
Thus Perl allows such constructs, by forcefully breaking the infinite loop. The rules for this are different for lower–level loops given by the greedy quantifiers *+{}, and for higher-level ones like the /g modifier or split operator.
…
The higher–level loops preserve an additional state between iterations: whether the last match was zero–length. To break the loop, the following match after a zero–length match is prohibited to have a length of zero. This prohibition interacts with backtracking … and so the second best match is chosen if the best match is of zero length.
Other Perl approaches
With the addition of a negative lookbehind assertion, you can filter the spurious empty matches.
$ perl -le '$a = "one;two;;three";
$a =~ s/(?<![^;])([^;]*)/[\1]/g;
print $a;'
[one];[two];[];[three]
Apply what Mark Dominus dubbed Randal’s Rule, “Use capturing when you know what you want to keep. Use split when you know what you want to throw away.” You want to throw away the semicolons, so your code becomes more direct with
$ perl -le '$a = "one;two;;three";
$a = join ";", map "[$_]", split /;/, $a;
print $a;'
[one];[two];[];[three]
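Since the question notes that Ruby's gsub produces the same result as Perl's s///g, the two workarounds above can be sketched in Ruby as well. The method names below are made up for illustration; this is a sketch of the same ideas, not a claim about how either language implements them.

```ruby
# Plain global substitution: an empty match is also reported right after
# each non-empty one, which produces the spurious "[]" entries.
def bracket_plain(str)
  str.gsub(/([^;]*)/, '[\1]')
end

# Negative lookbehind: a match may only start where the preceding
# character is not a non-semicolon (start of string or right after ";").
def bracket_lookbehind(str)
  str.gsub(/(?<![^;])([^;]*)/, '[\1]')
end

# Split on the delimiter, wrap each field, and join again.
# The -1 limit keeps trailing empty fields, which split drops by default.
def bracket_split(str)
  str.split(';', -1).map { |field| "[#{field}]" }.join(';')
end

input = 'one;two;;three'
puts bracket_plain(input)       # [one][];[two][];[];[three][]
puts bracket_lookbehind(input)  # [one];[two];[];[three]
puts bracket_split(input)       # [one];[two];[];[three]
```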
From the source code for sed-4.2 for the substitute function:
/sed/execute.c
/* If we're counting up to the Nth match, are we there yet?
And even if we are there, there is another case we have to
skip: are we matching an empty string immediately following
another match?
This latter case avoids that baaaac, when passed through
s,a*,x,g, gives `xbxxcx' instead of xbxcx. This behavior is
unacceptable because it is not consistently applied (for
example, `baaaa' gives `xbx', not `xbxx'). */
This indicates that the behavior we see in Ruby and Perl was consciously avoided in sed. This is not due to any fundamental difference between the languages but a result of special handling in sed.
There's something else going on in the perl (and presumably ruby) scripts as that output makes no sense for simply handling the regexp as a BRE or ERE.
awk (EREs) and sed (BREs) behave as they should for just doing an RE replacement:
$ echo "one;two;;three" | sed -e 's/[^;]*/[&]/g'
[one];[two];[];[three]
$ echo "one;two;;three" | awk 'gsub(/[^;]*/,"[&]")'
[one];[two];[];[three]
You said "I know the reason for the spurious empty substrings." Care to clue us in?
Hey, I'm trying to use a regex to count the number of quotes in a string that are not preceded by a backslash.
For example, the following strings:
"\"Some text
"\"Some \"text
The code I have was previously using String#count('"').
Obviously this is not good enough.
When I count the quotes in both of these examples, I need the result to be only 1.
I have been searching here for similar questions and I've tried using lookbehinds, but cannot get them to work in Ruby.
I have tried the following regexes on Rubular from this previous question:
/[^\\]"/
^"((?<!\\)[^"]+)"
^"([^"]|(?<!\)\\")"
None of them give me the results I'm after.
Maybe a regex is not the way to do that. Maybe a programmatic approach is the solution.
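How about string.count('"') - string.scan('\\"').size? (String#count takes a set of characters rather than a substring, so the escaped \" pairs have to be counted with scan.)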
result = subject.scan(
  /(?:         # match either
     ^         # start-of-string\/line
    |          # or
     \G        # the position where the previous match ended
    |          # or
     [^\\]     # one non-backslash character
   )           # then
   (?:\\\\)*   # match an even number of backslashes (0 is even, too)
   "           # match a quote
  /x)
gives you an array of all unescaped quote characters (each possibly with a preceding non-quote character and backslash pairs); escaped quotes are left out, so result.size is the count you want. The backslash group is non-capturing because scan would otherwise return the group's contents instead of the matched text.
The \G anchor is needed to match successive quotes, and the (?:\\\\)* makes sure that backslashes are only counted as escaping characters if they occur in odd numbers before the quote (to take Amarghosh's correct caveat into account).
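If the regex above feels heavy, a different and simpler technique gives the same count: first delete every backslash together with the character it escapes, then count the quotes that are left. This is not the approach from the answer above, just a sketch of an alternative, and the method name count_unescaped_quotes is made up for illustration.

```ruby
# Hypothetical helper: count double quotes that are not escaped by a backslash.
def count_unescaped_quotes(str)
  # Remove each backslash plus the character after it (this also consumes
  # escaped backslashes pairwise), then count the remaining double quotes.
  str.gsub(/\\./m, '').count('"')
end

# Single-quoted Ruby literals keep the backslash before " as a real character.
puts count_unescaped_quotes('"\"Some text')    # 1
puts count_unescaped_quotes('"\"Some \"text')  # 1
```

Because the backslashes are consumed in pairs from left to right, a quote after an even number of backslashes still counts as unescaped, which is the caveat the regex answer handles with its even-number group.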