I need help extracting some values from the output of a command:
PS C:\Users\cs> c:\windows\system32\inetsrv\appcmd list sites
SITE "A" (id:1,bindings:http//:csdev.do.com,state:Stopped)
SITE "B" (id:2,bindings:tsd-gr2,state:Stopped)
SITE "C" (id:3,bindings:http/1028:8091:,http/19.28:80:ddprem.do.com,state:Stopped)
SITE "D" (id:4,bindings:http/109.149.232:80,state:Stopped)
I tried to extract the first value as follows:
PS C:\Users\cs> c:\windows\system32\inetsrv\appcmd list sites | %{ $_.Split('\"')[1]; }
A
B
C
D
I need two more fields: the ID and the URL (only if the bindings contain do.com). There may be several URLs in the bindings; I need only the first one that contains do.com, and sites without one should be marked as null or blank, like this:
A,1,csdev.do.com
B,2,null
C,3,ddprem.do.com
D,4,null
While using the WebAdministration module seems like the best approach, you could also use a regex: loop over the lines that c:\windows\system32\inetsrv\appcmd list sites returns and parse the values you need from each one.
Since I cannot test this for real myself, I'm using your example output from c:\windows\system32\inetsrv\appcmd list sites as a string array:
$siteList = 'SITE "A" (id:1,bindings:http//:csdev.do.com,state:Stopped)',
'SITE "B" (id:2,bindings:tsd-gr2,state:Stopped)',
'SITE "C" (id:3,bindings:http/1028:8091:,http/19.28:80:ddprem.do.com,state:Stopped)',
'SITE "D" (id:4,bindings:http/109.149.232:80,state:Stopped)'
$regex = [regex] '^SITE "(?<site>\w+)".+id:(?<id>\d+),bindings:(?:.+:(?<url>\w+\.do\.com))?'
$siteList | ForEach-Object {
    $match = $regex.Match($_)
    # the pattern is anchored with ^, so each line yields at most one match
    if ($match.Success) {
        # fall back to the literal string 'null' when no do.com binding was captured
        $url = if ($match.Groups['url'].Value) { $match.Groups['url'].Value } else { 'null' }
        '{0},{1},{2}' -f $match.Groups['site'].Value, $match.Groups['id'].Value, $url
    }
}
Result:
A,1,csdev.do.com
B,2,null
C,3,ddprem.do.com
D,4,null
Regex details:
^ Assert position at the beginning of the string
SITE\ " Match the characters “SITE "” literally
(?<site> Match the regular expression below and capture its match into backreference with name “site”
\w Match a single character that is a “word character” (letters, digits, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
" Match the character “"” literally
. Match any single character that is not a line break character
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
id: Match the characters “id:” literally
(?<id> Match the regular expression below and capture its match into backreference with name “id”
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,bindings: Match the characters “,bindings:” literally
(?: Match the regular expression below
. Match any single character that is not a line break character
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
: Match the character “:” literally
(?<url> Match the regular expression below and capture its match into backreference with name “url”
\w Match a single character that is a “word character” (letters, digits, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\. Match the character “.” literally
do Match the characters “do” literally
\. Match the character “.” literally
com Match the characters “com” literally
)
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
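The same pattern can be tried out in Ruby, whose named-group syntax matches .NET's. This is a minimal sketch for checking the regex, not a replacement for the PowerShell code: SITE_RE and parse_site are hypothetical names, and the sketch swaps .+ for the lazy .+? inside the bindings group so that, when a site has several do.com bindings, the first one is taken (the greedy .+ above would take the last). Site "E" is a made-up line that shows the difference.

```ruby
site_list = [
  'SITE "A" (id:1,bindings:http//:csdev.do.com,state:Stopped)',
  'SITE "B" (id:2,bindings:tsd-gr2,state:Stopped)',
  'SITE "C" (id:3,bindings:http/1028:8091:,http/19.28:80:ddprem.do.com,state:Stopped)',
  'SITE "D" (id:4,bindings:http/109.149.232:80,state:Stopped)'
]

# Lazy .+? stops at the FIRST ":<host>.do.com"; a greedy .+ would stop at the last one.
SITE_RE = /\ASITE "(?<site>\w+)".+id:(?<id>\d+),bindings:(?:.+?:(?<url>\w+\.do\.com))?/

# Hypothetical helper: returns "site,id,url" or nil for lines that are not SITE entries
def parse_site(line)
  m = SITE_RE.match(line) or return nil
  [m[:site], m[:id], m[:url] || "null"].join(",")   # unmatched optional group => nil
end

site_list.map { |line| parse_site(line) }
# => ["A,1,csdev.do.com", "B,2,null", "C,3,ddprem.do.com", "D,4,null"]

# Made-up line with two do.com bindings: the lazy version returns the first one
parse_site('SITE "E" (id:5,bindings:http/*:80:first.do.com,http/*:80:second.do.com,state:Started)')
# => "E,5,first.do.com"
```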
I have a string:
string1 = "my name is fname.lname and i live in xyz. my lname is not common"
I want to extract the substring of string1 that lies between the first space " " and ".lname". In the case above, the answer should be "fname.lname".
string1[/(?<= ).*?(?=\.lname\b)/]
#=> "name is fname"
(?<= ) is a positive lookbehind that requires the first character matched be immediately preceded by a space, but that space is not part of the match.
(?=\.lname\b) is a positive lookahead that requires the last character matched to be immediately followed by the string ".lname" (see footnote 1), which is itself followed by a word boundary (\b), but that string is not part of the match. That ensures, for example, that ".lnamespace" is not matched. If that should be matched, remove \b.
.*? matches zero or more characters (.*), non-greedily (?). (Matches are greedy by default.) The non-greedy qualifier has the following effect:
"my name is fname.lname and fname.lname"[/(?<= ).*(?=\.lname\b)/]
#=> "name is fname.lname and fname"
"my name is fname.lname and fname.lname"[/(?<= ).*?(?=\.lname\b)/]
#=> "name is fname"
In other words, the non-greedy (greedy) match matches the first (last) occurrence of ".lname" in the string.
This could alternatively be written with a capture group and no lookarounds:
string1[/ (.*?)\.lname\b/, 1]
#=> "name is fname"
This regular expression reads, "match a space, followed by zero or more characters saved in capture group 1, followed by the string ".lname", followed by a word boundary." This uses the form of String#[] that takes two arguments, the second being a capture-group reference.
Yet another way follows.
string1[(string1 =~ / /)+1..(string1 =~ /\.lname\b/)-1]
#=> "name is fname"
1 The period in ".lname" must be escaped because an unescaped period in a regular expression (outside a character class) matches any character (except a newline, unless the m modifier is set).
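The two claims above (the effect of \b and footnote 1) can be checked by running the patterns on strings built to break them. The sample strings here are made up for illustration.

```ruby
lookahead_re = /(?<= ).*?(?=\.lname\b)/

# \b: ".lname" must end at a word boundary, so ".lnamespace" does not count
"my name is fname.lnamespace"[lookahead_re]               # => nil
"my name is fname.lnamespace"[/(?<= ).*?(?=\.lname)/]     # => "name is fname"

# Footnote 1: an unescaped period matches any character, so "X" stands in for "."
"my name is fnameXlname"[/(?<= ).*?(?=.lname\b)/]         # => "name is fname"
"my name is fnameXlname"[lookahead_re]                    # => nil
```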
I want to match characters across multiple lines, so I enabled the m flag. However, at one particular position I do not want to match \n; I want to match only a space (\s) there. But it seems the newline is being matched as a space too:
" 41\n6332 Hardin Rd, Bensalem, PA\n 19020" =~ /\s(\d+\s.+,.+,.+\d+)/m
=> 0
" 41\n6332 Hardin Rd, Bensalem, PA\n 19020" =~ /\s(\d+[ ].+,.+,.+\d+)/m
=> 3
Even when I try to explicitly exclude the newline, it still matches:
" 41\n6332 Hardin Rd, Bensalem, PA\n 19020" =~ /\s(\d+[^\n].+,.+,.+\d+)/m
=> 0
Why is the newline matching a space character? And what can I do to ensure that it does not and still matches characters across multiple lines everywhere else?
\s matches any whitespace character, including \n, which is why the newline is consumed in your first attempt. The /\s(\d+[^\n].+,.+,.+\d+)/m pattern still matches " 41\n6332 Hardin Rd, Bensalem, PA\n 19020" because of backtracking: when the regex engine reaches [^\n] after \d+ has matched 41, the next character is \n, so [^\n] fails. The engine then steps back and lets \d+ match only 4; now [^\n] matches 1, which is not a newline, so matching continues.
You may anchor the search at the start of the string and prevent backtracking with a possessive quantifier, also implementing the negative check with a lookahead:
/\A\s*(\d++(?!\n).+,.+,.+\d)/m
Details
\A - start of string
\s* - 0+ whitespaces
(\d++(?!\n).+,.+,.+\d) - Capturing group 1:
\d++(?!\n) - 1+ digits (matched possessively with the ++ quantifier) not followed by a newline ((?!\n) is a negative lookahead that fails the match if there is a newline immediately to the right of the current location)
.+,.+, - 2 occurrences of any 1+ chars (including newlines, thanks to the m modifier), as many as possible, followed by ,
.+\d - any 1+ chars, as many as possible, followed by a digit.
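The Ruby sketch below replays the explanation: \s matches a newline, the [^\n] pattern succeeds only because \d+ gives back a digit, and the possessive version rejects the original string while still matching across lines when the number is followed by a space. The second address string is a made-up variant of the one in the question.

```ruby
addr = " 41\n6332 Hardin Rd, Bensalem, PA\n 19020"

# \s is a whitespace class that includes "\n"
"\n".match?(/\s/)                                  # => true

# [^\n] fails on "\n", so \d+ backtracks from "41" to "4" and [^\n] matches "1"
m = addr.match(/\s(\d+[^\n].+,.+,.+\d+)/m)
m[1].start_with?("41\n")                           # => true

# Possessive \d++ cannot give "1" back, so the (?!\n) check really fails
addr.match(/\A\s*(\d++(?!\n).+,.+,.+\d)/m)         # => nil

# A number followed by a space still matches, and .+ crosses the line break
"6332 Hardin Rd, Bensalem, PA\n 19020".match(/\A\s*(\d++(?!\n).+,.+,.+\d)/m)[1]
# => "6332 Hardin Rd, Bensalem, PA\n 19020"
```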
I am currently working on a Ruby program to calculate terms. It works perfectly fine except for one thing: brackets. I need to filter out their content, or at least put it into an array, but I have spent an hour trying to come up with a solution. Here is my code:
splitted = term.split(/\(+|\)+/)
I need an array instead of the brackets, for example:
"1-(2+3)" #=>["1", "-", ["2", "+", "3"]]
I already tried this:
/(\((?<=.*)\))/
but it returned:
Invalid pattern in look-behind.
Can someone help me with this?
UPDATE
I forgot to mention that my program already splits the term; I only need the content of the brackets to be an array.
If you need to keep track of the hierarchy of parentheses with arrays, you won't manage it just with regular expressions. You'll need to parse the string word by word, and keep a stack of expressions.
Pseudocode:
Expressions = new stack
Push a new array onto the stack
while word in string:
    if word is "(": push a new array onto the stack
    else if word is ")": pop the last array off the stack and append it to the (new) last array on the stack
    else: append the word to the last array on the stack
When exiting the loop, there should be only one array in the stack (if not, you have inconsistent opening/closing parentheses).
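Here is one way the stack approach could look in Ruby. It is a sketch: nest is a hypothetical method name, and the scan call is only a stand-in tokenizer for the splitting your program already does.

```ruby
# Hypothetical helper: turns a flat token list into nested arrays,
# one array per parenthesized group, using an explicit stack.
def nest(tokens)
  stack = [[]]                      # bottom array collects the top-level expression
  tokens.each do |tok|
    case tok
    when "("
      stack.push([])                # open a new group
    when ")"
      raise ArgumentError, "unmatched )" if stack.size == 1
      group = stack.pop             # close the current group...
      stack.last << group           # ...and append it to the enclosing one
    else
      stack.last << tok
    end
  end
  raise ArgumentError, "unmatched (" unless stack.size == 1
  stack.first
end

# Stand-in tokenizer: integers, the four operators and parentheses
tokens = "1-(2+3)".scan(%r{\d+|[-+*/()]})
nest(tokens)   # => ["1", "-", ["2", "+", "3"]]
```

Nested groups come out as nested arrays, e.g. "(1+(2*3))" gives [["1", "+", ["2", "*", "3"]]].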
Note: If your ultimate goal is to evaluate the expression, you could save time by converting the string to postfix notation (also known as Reverse Polish Notation) and evaluating that.
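For the evaluation route, the usual way to get from infix to postfix is Dijkstra's shunting-yard algorithm. A minimal sketch, assuming integer tokens, parentheses and the four left-associative binary operators only (no unary minus, no error recovery beyond unbalanced parentheses); to_rpn and eval_rpn are hypothetical names. Rational is used so that 10/4 is not truncated by integer division.

```ruby
PRECEDENCE = { "+" => 1, "-" => 1, "*" => 2, "/" => 2 }.freeze

# Shunting-yard: infix tokens -> postfix (RPN) tokens
def to_rpn(tokens)
  output = []
  ops = []
  tokens.each do |tok|
    case tok
    when /\A\d+\z/ then output << tok
    when "(" then ops.push(tok)
    when ")"
      output << ops.pop while ops.any? && ops.last != "("
      raise ArgumentError, "unmatched )" if ops.empty?
      ops.pop # discard the matching "("
    else
      # pop operators of higher or equal precedence (left associativity)
      output << ops.pop while ops.any? && ops.last != "(" && PRECEDENCE[ops.last] >= PRECEDENCE[tok]
      ops.push(tok)
    end
  end
  raise ArgumentError, "unmatched (" if ops.include?("(")
  output + ops.reverse
end

# Evaluates postfix tokens with a value stack
def eval_rpn(rpn)
  stack = []
  rpn.each do |tok|
    if tok.match?(/\A\d+\z/)
      stack.push(Rational(tok.to_i))
    else
      right = stack.pop
      left = stack.pop
      stack.push(left.public_send(tok, right))
    end
  end
  stack.first
end

tokens = "2*(3+4)-10/4".scan(%r{\d+|[-+*/()]})
to_rpn(tokens)             # => ["2", "3", "4", "+", "*", "10", "4", "/", "-"]
eval_rpn(to_rpn(tokens))   # => (23/2)
```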
Also consider using off-the-shelf libraries.
A solution depends on the pattern you expect between the parentheses, which you have not specified. (For example, for "(st12uv)" you might want ["st", "12", "uv"], ["st12", "uv"], ["st1", "2uv"] and so on). If, as in your example, it is a natural number followed by a +, followed by another natural number, you could do this:
str = "1-( 2+ 3)"
r = /
\(\s* # match a left parenthesis followed by >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
(\+) # match a plus sign in a capture group
\s* # match >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
\) # match a right parenthesis
/x
str.scan(r).first
=> ["2", "+", "3"]
Suppose instead + could be +, -, * or /. Then you could change:
(\+)
to:
([-+*\/])
Note that, in a character class, + needn't be escaped and - needn't be escaped if it is the first or last character of the class (as in those cases it would not signify a range).
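Putting the character-class change into the full pattern gives the following. The sample strings are made up, and scan returns one array of captures for each parenthesized group it finds. (Inside a /.../x literal, avoid a bare slash even in a comment, because it would end the literal.)

```ruby
r = /
  \(\s*       # match a left parenthesis followed by >= 0 whitespace chars
  (\d+)       # match one or more digits in a capture group
  \s*
  ([-+*\/])   # match any one of the four operators in a capture group
  \s*
  (\d+)       # match one or more digits in a capture group
  \s*
  \)          # match a right parenthesis
/x

"1-( 6 * 7)".scan(r).first   # => ["6", "*", "7"]
"(1+2)*(8/ 4)".scan(r)       # => [["1", "+", "2"], ["8", "/", "4"]]
```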
Incidentally, you received the error message "Invalid pattern in look-behind" because Ruby's lookbehinds cannot contain variable-length patterns (such as .*); lookaheads do not have this restriction. For positive lookbehinds you can get around that by using \K instead. For example,
r = /
\d+ # match one or more digits
\K # forget everything previously matched
[a-z]+ # match one or more lowercase letters
/x
"123abc"[r] #=> "abc"
I need the username to be two or more characters from a-z and 0-9, all lowercase. This is the regex I am currently using:
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/i
With this regex, users are able to use uppercase characters in their username. How do I modify the current regex to prevent that?
The regular expression to filter for two to twenty lower-case characters or digits is
/^[a-z0-9]{2,20}$/
which means:
^ at the front of input
a-z accept lower-case 'a' through 'z'
0-9 accept '0' through '9'
{2,20} accept 2 to 20 elements from preceding [] block
$ until the end of input
You can make a regular expression case-insensitive with a trailing i, as in your example; that appears to be the root of the problem. That said, I don't know Ruby's peculiarities with respect to regular expressions.
If you want to keep your regex, just remove the "i" from the end:
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/i
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/
The "i" makes the regex case-insensitive, but you want it to be case-sensitive and to match only lowercase letters.
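A small check of the fix, using the original pattern without i; the sample usernames are made up. The last two lines illustrate a Ruby peculiarity that matters for the ^...$ version in the other answer: ^ and $ match at line boundaries, so \A and \z (as in the original regex) are the safer anchors for validating a whole string.

```ruby
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/   # original pattern, "i" removed

USER_REGEX.match?("bob-42")   # => true
USER_REGEX.match?("Bob")      # => false (uppercase rejected)
USER_REGEX.match?("-bob")     # => false (must start with a letter or digit)
USER_REGEX.match?("b")        # => false (too short)

# ^ and $ are line anchors in Ruby, so a multi-line string can slip through
/^[a-z0-9]{2,20}$/.match?("EVIL\nok")     # => true
/\A[a-z0-9]{2,20}\z/.match?("EVIL\nok")   # => false
```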
I have the following string:
"h3. My Title Goes Here"
I basically want to remove the first four characters from the string so that I just get back:
"My Title Goes Here".
The thing is I am iterating over an array of strings and not all have the h3. part in front so I can't just ditch the first four characters blindly.
I checked the docs and the closest thing I could find was chomp, but that only works for the end of a string.
Right now I am doing this:
"h3. My Title Goes Here".reverse.chomp(" .3h").reverse
This gives me my desired output, but there has to be a better way. I don't want to reverse a string twice for no reason. Is there another method that will work?
To alter the original string, use sub!, e.g.:
my_strings = [ "h3. My Title Goes Here", "No h3. at the start of this line" ]
my_strings.each { |s| s.sub!(/^h3\. /, '') }
To leave the original string untouched and just return the result, drop the exclamation point, i.e. use sub. If a regular expression can match more than once and you want to replace every match, use gsub! or gsub; without the g, only the first match is replaced (which is what you want here). Note that in Ruby ^ matches at the start of every line, not only at the start of the string, so use \A instead if your strings may contain newlines.
You can use sub with a regular expression:
s = 'h3. foo'
s.sub!(/^h[0-9]+\. /, '')
puts s
Output:
foo
The regular expression should be understood as follows:
^ Match from the start of the string.
h A literal "h".
[0-9] A digit from 0-9.
+ One or more of the previous (i.e. one or more digits)
\. A literal period.
(space) A literal space (yes, spaces are significant by default in regular expressions!)
You can modify the regular expression to suit your needs; see a regular expression tutorial or syntax guide for details.
A standard approach would be to use regular expressions:
"h3. My Title Goes Here".gsub /^h3\. /, '' #=> "My Title Goes Here"
gsub means "global substitute": it replaces every match of a pattern with a string, in this case the empty string.
The regular expression is enclosed in slashes and consists of:
^ means the beginning of a line (in Ruby, not only the beginning of the string)
h3 is matched literally, so it means h3
\. - a dot normally means any character, so we escape it with a backslash
(space) is matched literally
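On Ruby 2.5 or later, String#delete_prefix handles the fixed-prefix case without a regex; to handle any heading level, sub with \A anchors to the start of the string rather than the start of each line. The sample titles below are made up.

```ruby
titles = ["h3. My Title Goes Here", "No h3. at the start of this line", "h12. Deep heading"]

# delete_prefix removes the prefix only when the string actually starts with it
titles.map { |t| t.delete_prefix("h3. ") }
# => ["My Title Goes Here", "No h3. at the start of this line", "h12. Deep heading"]

# \A anchors to the start of the string; \d+ accepts any heading level
titles.map { |t| t.sub(/\Ah\d+\. /, "") }
# => ["My Title Goes Here", "No h3. at the start of this line", "Deep heading"]

# ^ also matches after a newline, \A does not
"Intro\nh3. Title".sub(/^h3\. /, "")    # => "Intro\nTitle"
"Intro\nh3. Title".sub(/\Ah3\. /, "")   # => "Intro\nh3. Title"
```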