Using Ruby + regex, given:
starting-middle+31313131313#mysite.com
I want to obtain just: 31313131313
ie, what is between starting-middle+ and mysite.com
Here's what I have so far:
to = 'starting-middle+31313131313#mysite.com'
to.split(/\+/#mysite.com.*/).first.strip
Between 1st + and 1st #:
to[/\+(.*?)#/,1]
Between 1st + and last #:
to[/\+(.*)#/,1]
Between last + and last #:
to[/.*\+(.*)#/,1]
Between last + and 1st #:
to[/.*\+(.*?)#/,1]
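The four variants only differ when the string contains more than one + or #. Here is a minimal sketch that runs them side by side; the helper name between_variants and the sample string are made up for illustration and are not from the question.

```ruby
# Hypothetical helper: applies the four patterns above to one string
# and returns what each capture group picks up.
def between_variants(str)
  {
    first_plus_first_hash: str[/\+(.*?)#/, 1],   # lazy: stops at the first "#"
    first_plus_last_hash:  str[/\+(.*)#/, 1],    # greedy: runs to the last "#"
    last_plus_last_hash:   str[/.*\+(.*)#/, 1],  # leading .* skips to the last "+"
    last_plus_first_hash:  str[/.*\+(.*?)#/, 1]  # last "+", then the next "#"
  }
end

# Made-up sample with two "+" and two "#" so the results differ.
p between_variants('a+b+c#d#e')
# {first_plus_first_hash: "b+c", first_plus_last_hash: "b+c#d",
#  last_plus_last_hash: "c#d", last_plus_first_hash: "c"}
```

For the string in the question, which has a single + and a single #, all four return "31313131313".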
Here is a solution without regex (much easier for me to read):
i = to.index("+")
j = to.index("#")
to[i+1..j-1]
If you care about readability, I suggest just using split, like so:
string.split("from").last.split("to").first or, in your case:
to.split("+").last.split("#").first
Use a limit of 2 if there are more occurrences of '+' or '#' and you only care about the first occurrence:
to.split("+",2).last.split("#",2).first
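To make the effect of the limit visible, here is a small sketch that compares the two forms on a made-up string with repeated delimiters. The method names are hypothetical.

```ruby
# Without a limit: text after the LAST `from`, up to the next `to`.
def between_last_from(str, from, to)
  str.split(from).last.split(to).first
end

# With limit 2: text after the FIRST `from`, up to the first `to` after it.
def between_first_from(str, from, to)
  str.split(from, 2).last.split(to, 2).first
end

sample = 'a+b+c#d#e'  # made-up sample, not from the question
puts between_last_from(sample, '+', '#')   # c
puts between_first_from(sample, '+', '#')  # b+c
```

One thing to keep in mind: if a delimiter is missing, split returns the whole string as a single element, so these helpers return text instead of nil.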
Here is a solution based on regex lookbehind and lookahead.
email = "starting-middle+31313131313#mysite.com"
regex = /(?<=\+).*(?=#)/
regex.match(email)
=> #<MatchData "31313131313">
Explanation
Lookahead is indispensable if you want to match something followed by something else. In your case, it's a position followed by #, which is expressed as (?=#).
Lookbehind has the same effect, but works backwards. It tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. In your case, it's a position after +, which is expressed as (?<=\+).
We can combine those two conditions:
lookbehind   (what you want)   lookahead
    ↓               ↓              ↓
 (?<=\+)            .*           (?=#)
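Note that .* in the middle is greedy, so with more than one # it runs to the last one. A minimal sketch of the difference, assuming a lazy variant (.*?) is what you want in that case; the constant and method names are made up:

```ruby
GREEDY = /(?<=\+).*(?=#)/   # the pattern from the answer
LAZY   = /(?<=\+).*?(?=#)/  # assumed variant: stop at the first "#"

# String#[] with a regex returns the matched text, or nil if nothing matches.
def extract(str, regex)
  str[regex]
end

puts extract('starting-middle+31313131313#mysite.com', GREEDY)  # 31313131313
puts extract('a+123#x#y', GREEDY)  # 123#x
puts extract('a+123#x#y', LAZY)    # 123
```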
Reference
Regex: Lookahead and Lookbehind Zero-Length Assertions
Related
I'm saving a number with params[:number].gsub(/\D/,''), but I don't want to strip the plus symbol: +
For example if a user saves number +1 (516) 949-9508 it saves as 15169499508 but how can we preserve the + as +15169499508?
In Ruby \D is just an alias for [^0-9]. You may explicitly set [^0-9+]:
params[:number].gsub(/[^0-9+]/,'')
If you only want to keep a plus at the start of the string, use:
.gsub(/\A(\+)|\D+/, '\1')
Here, \A(\+) branch matches a literal plus at the start of the string. The second branch is your \D that matches all chars but digits, just with a + quantifier that matches 1 or more occurrences. The \1 backreference restores that initial plus symbol in the resulting string.
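The two patterns behave the same on the number from the question and differ only when a + appears somewhere other than the start. A short sketch, with hypothetical method names and a made-up second input:

```ruby
# Keeps every "+" wherever it appears.
def keep_every_plus(number)
  number.gsub(/[^0-9+]/, '')
end

# Keeps a "+" only at the very start; \1 is empty when the first branch did not match.
def keep_leading_plus(number)
  number.gsub(/\A(\+)|\D+/, '\1')
end

puts keep_every_plus('+1 (516) 949-9508')    # +15169499508
puts keep_leading_plus('+1 (516) 949-9508')  # +15169499508
puts keep_every_plus('1+516 949')            # 1+516949
puts keep_leading_plus('1+516 949')          # 1516949
```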
If you don't have any syntactic rules, delete would work just fine:
'+1 (516) 949-9508'.delete('^0-9+') #=> "+15169499508"
What regex can I use in place of regex in the code:
"<tr><td>Total</td><td class=\"bar\">561 of 931</td><td class=\"ctr2\">40%</td><td class=\"bar\">38 of 58</td><td class=\"ctr2\">34%</td><td class=\"ctr1\">58</td><td class=\"ctr2\">94</td>"
.scan(regex).last
to get "40%" (the first percentage figure) without modifying any other part of the code above?
I would do something like this:
regexp = /\A.*?(\d+%)/
matches = "<tr><td>Total</td><td class=\"bar\">561 of 931</td><td class=\"ctr2\">40%</td><td class=\"bar\">38 of 58</td><td class=\"ctr2\">34%</td><td class=\"ctr1\">58</td><td class=\"ctr2\">94</td>".scan(regexp).last
puts matches
#=> 40%
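Explanation: \A matches the beginning of the string, .*? lazily matches as few characters as possible, and (\d+%) captures the first run of digits followed by a percent sign. Because the pattern is anchored with \A, scan can match only once, so .last returns that single match.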
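Since the pattern contains a capture group, scan returns an array of arrays, and .last is a one-element array rather than a plain string. A small sketch on a shortened, made-up version of the HTML shows this and what happens without the anchor; the method names are hypothetical.

```ruby
# Anchored: scan can succeed only once, so .last is the first percentage.
def first_percentage(html)
  html.scan(/\A.*?(\d+%)/).last
end

# Unanchored: scan finds every percentage, so .last is the final one.
def last_percentage(html)
  html.scan(/(\d+%)/).last
end

html = '<td class="bar">561 of 931</td><td class="ctr2">40%</td><td class="ctr2">34%</td>'
p first_percentage(html)  # ["40%"]
p last_percentage(html)   # ["34%"]
```

puts prints the single element of that array on its own line, which is why the answer's output shows 40%.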
I am currently working on a ruby program to calculate terms. It works perfectly fine except for one thing: brackets. I need to filter the content or at least, to put the content into an array, but I have tried for an hour to come up with a solution. Here is my code:
splitted = term.split(/\(+|\)+/)
I need an array instead of the brackets, for example:
"1-(2+3)" #=>["1", "-", ["2", "+", "3"]]
I already tried this:
/(\((?<=.*)\))/
but it returned:
Invalid pattern in look-behind.
Can someone help me with this?
UPDATE
I forgot to mention, that my program will split the term, I only need the content of the brackets to be an array.
If you need to keep track of the hierarchy of parentheses with arrays, you won't manage it just with regular expressions. You'll need to parse the string word by word, and keep a stack of expressions.
Pseudocode:
Expressions = new stack
Add new array on stack
For each word in string:
    If word is "(": Add new array on stack
    Else if word is ")": Remove the last array from the stack and add it to the (new) last array of the stack
    Else: Add the word to the last array of the stack
When exiting the loop, there should be only one array in the stack (if not, you have inconsistent opening/closing parentheses).
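The pseudocode above could be sketched in Ruby roughly as follows. The method name nest_parens and the tokenizer regex are assumptions for illustration: it only recognizes non-negative integers, the four basic operators and parentheses, and silently skips anything else.

```ruby
# Hypothetical helper: turns "1-(2+3)" into ["1", "-", ["2", "+", "3"]].
def nest_parens(term)
  stack = [[]]  # the bottom array collects the top-level tokens

  # Assumed token set: integers, + - * /, and parentheses.
  term.scan(/\d+|[-+*\/()]/) do |token|
    case token
    when '('
      stack.push([])                 # open a new nesting level
    when ')'
      raise ArgumentError, "unbalanced ')'" if stack.size < 2
      finished = stack.pop           # close the current level...
      stack.last << finished         # ...and attach it to its parent
    else
      stack.last << token
    end
  end

  raise ArgumentError, "unbalanced '('" unless stack.size == 1
  stack.first
end

p nest_parens('1-(2+3)')      # ["1", "-", ["2", "+", "3"]]
p nest_parens('(1+(2*3))-4')  # [["1", "+", ["2", "*", "3"]], "-", "4"]
```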
Note: If your ultimate goal is to evaluate the expression, you could save time and parse the string in Postfix aka Reverse-Polish Notation.
Also consider using off-the-shelf libraries.
A solution depends on the pattern you expect between the parentheses, which you have not specified. (For example, for "(st12uv)" you might want ["st", "12", "uv"], ["st12", "uv"], ["st1", "2uv"] and so on). If, as in your example, it is a natural number followed by a +, followed by another natural number, you could do this:
str = "1-( 2+ 3)"
r = /
\(\s* # match a left parenthesis followed by >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
(\+) # match a plus sign in a capture group
\s* # match >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
\) # match a right parenthesis
/x
str.scan(r).first
=> ["2", "+", "3"]
Suppose instead + could be +, -, * or /. Then you could change:
(\+)
to:
([-+*\/])
Note that, in a character class, + needn't be escaped and - needn't be escaped if it is the first or last character of the class (as in those cases it would not signify a range).
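Putting the widened operator class into the pattern, a compact sketch might look like this. The constant and method names are made up, and the input is a made-up term with two parenthesized groups.

```ruby
# Same pattern as above on one line, with the operator class widened.
BINARY_IN_PARENS = /\(\s*(\d+)\s*([-+*\/])\s*(\d+)\s*\)/

# Returns one [left, operator, right] triple per parenthesized group.
def parenthesized_terms(str)
  str.scan(BINARY_IN_PARENS)
end

p parenthesized_terms('1-( 2+ 3)*(4 / 5)')
# [["2", "+", "3"], ["4", "/", "5"]]
```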
Incidentally, you received the error message "Invalid pattern in look-behind" because Ruby's lookbehinds cannot contain variable-length quantifiers such as .* (lookaheads have no such restriction). With positive lookbehinds you can get around that by using \K instead. For example,
r = /
\d+ # match one or more digits
\K # forget everything previously matched
[a-z]+ # match one or more lowercase letters
/x
"123abc"[r] #=> "abc"
I'm trying to remove a period prior to the "#" symbol from an email. I got:
array[0][2].gsub(/\./, '').strip
which removes both periods; "an.email#test.com" becomes "anemail#testcom", while I'm looking for it to become "anemail#test.com". I can't remove just the single period by itself. What am I doing wrong?
This regex works whether there are no periods, one period, or several periods before #:
email = "my.very.long.email#me.com"
email.gsub(/\.(?=[^#]*\#)/, '')
# => "myverylongemail#me.com"
Regex explanation: a period followed by zero or more occurrences of any character other than #, followed by a #.
If only the first occurrence of a period before # has to be removed, you can use the same regex with sub instead of gsub.
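A short sketch of the difference between gsub and sub with this pattern; the constant and method names are hypothetical:

```ruby
# A period that still has a "#" somewhere after it, i.e. a period in the part before "#".
DOT_BEFORE_HASH = /\.(?=[^#]*\#)/

def strip_all_local_dots(email)
  email.gsub(DOT_BEFORE_HASH, '')
end

def strip_first_local_dot(email)
  email.sub(DOT_BEFORE_HASH, '')
end

puts strip_all_local_dots('my.very.long.email#me.com')   # myverylongemail#me.com
puts strip_first_local_dot('my.very.long.email#me.com')  # myvery.long.email#me.com
puts strip_all_local_dots('noperiods#me.com')            # noperiods#me.com
```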
result = subject.gsub(/\.(?=\S+#)/, '')
Explanation
\. matches a period
the (?=\S+#) lookahead asserts that what follows is one or more non-whitespace characters followed by a #
we replace with the empty string
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
Don't make this more complicated by trying to make it short. Just write it the way you mean it:
a, b = address.split('#')
cleaned = [a.delete('.'), b].join('#')
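If the input might lack a # entirely, the split version ends up deleting periods from the whole string and appending a trailing #. One possible variation, sketched here with String#rpartition, leaves such input untouched; that behaviour and the method name are assumptions, not part of the answer above.

```ruby
# Hypothetical variant: remove periods only from the part before the last "#".
def strip_local_dots(address)
  # rpartition returns ["", "", address] when there is no "#",
  # so an address without a separator comes back unchanged.
  local, separator, domain = address.rpartition('#')
  local.delete('.') + separator + domain
end

puts strip_local_dots('an.email#test.com')  # anemail#test.com
puts strip_local_dots('no.separator.here')  # no.separator.here
```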
I have a string like this:
http://www.example.com/value/1234/different-value
How can I extract the 1234?
Note: There may be a slash at the end:
http://www.example.com/value/1234/different-value
http://www.example.com/value/1234/different-value/
/([^/]+)(?=/[^/]+/?$)
should work. You might need to format it differently according to the language you're using. For example, in Ruby, it's
if subject =~ /\/([^\/]+)(?=\/[^\/]+\/?\Z)/
match = $~[1]
else
match = ""
end
Use Slice for Positional Extraction
If you always want to extract the element at index 4 of the split URI (counting the scheme and the empty string between the two slashes after it), and are confident that your data is regular, you can use Array#slice as follows.
'http://www.example.com/value/1234/different-value'.split('/').slice 4
#=> "1234"
'http://www.example.com/value/1234/different-value/'.split('/').slice 4
#=> "1234"
This will work reliably whether there's a trailing slash or not, whether or not there are more elements after index 4, and whether or not that element is strictly numeric. It works because it's based on the element's position within the path, rather than on the contents of the element. However, you will end up with nil if you attempt to parse a URI with fewer elements, such as http://www.example.com/1234/.
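If counting the scheme and host as elements feels fragile, one alternative is to let Ruby's standard URI library isolate the path first and then index into its segments. This is only a sketch; the method name path_segment and the zero-based index convention are assumptions.

```ruby
require 'uri'

# Hypothetical helper: returns the path segment at a zero-based index, or nil.
def path_segment(url, index)
  # URI#path is e.g. "/value/1234/different-value"; drop the empty leading piece.
  URI.parse(url).path.split('/').reject(&:empty?)[index]
end

puts path_segment('http://www.example.com/value/1234/different-value', 1)   # 1234
puts path_segment('http://www.example.com/value/1234/different-value/', 1)  # 1234
p path_segment('http://www.example.com/1234/', 1)                           # nil
```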
Use Scan/Match for Pattern Extraction
Alternatively, if you know that the element you're looking for is always the only one composed entirely of digits, you can use String#match with look-arounds to extract just the numeric portion of the string.
'http://www.example.com/value/1234/different-value'.match %r{(?<=/)\d+(?=/)}
#=> #<MatchData "1234">
$&
#=> "1234"
The look-behind and look-ahead assertions are needed to anchor the expression to a path. Without them, you'll match things like w3.example.com too. This solution is a better approach if the position of the target element may change, and if you can guarantee that your element of interest will be the only one that matches the anchored regex.
If there will be more than one match (e.g. http://www.example.com/1234/5678/) then you might want to use String#scan instead to select the first or last match. This is one of those "know your data" things; if you have irregular data, then regular expressions aren't always the best choice.
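A minimal sketch of the String#scan variant, using the same anchored pattern. The constant and method names are made up.

```ruby
# An all-digit path element with a slash on both sides.
NUMERIC_SEGMENT = %r{(?<=/)\d+(?=/)}

# Without capture groups, scan returns a flat array of matched strings.
def numeric_segments(url)
  url.scan(NUMERIC_SEGMENT)
end

segments = numeric_segments('http://www.example.com/1234/5678/')
p segments        # ["1234", "5678"]
p segments.first  # "1234"
p segments.last   # "5678"
```

Because the lookahead requires a following slash, a numeric element at the very end of a URL without a trailing slash is not matched by this pattern.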
Javascript:
var myregexp = /:\/\/.*?\/.*?\/(\d+)/;
var match = myregexp.exec(subject);
if (match != null) {
result = match[1];
}
Works with your examples... But I am sure it will fail in general...
Ruby edit:
if subject =~ /:\/\/.*?\/.*?\/(.+?)\//
  match = $~[1]
end
It does work.
I think this is a little simpler than the accepted answer, because it doesn't use any positive lookahead (?=), but rather simply makes the last slash optional via the ? character:
^.+\/(.+)\/.+\/?$
In Ruby:
STDIN.read.split("\n").each do |nextline|
if nextline =~ /^.+\/(.+)\/.+\/?$/
printf("matched %s in %s\n", $~[1], nextline);
else
puts "no match"
end
end
Let's break down what's happening:
^: start of the line
.+\/: match anything (greedily) up to a slash
Since we're going to later match at least 1, at most 2 more slashes, this slash will be either the second-to-last slash (as in http://www.example.com/value/1234/different-value) or the third-to-last slash (as in http://www.example.com/value/1234/different-value/)
Up to this point we've matched http://www.example.com/value/ (due to greediness)
(.+)\/: Our capturing group for 1234, indicated by the parentheses. It's anything followed by another slash.
Since the previous match matched up to the second or third last slash, this will match up to the last slash or second last slash, respectively
.+: match anything. This would be after our 1234, so we're assuming there are characters after 1234/ (different-value)
\/?: optionally match another slash (the slash after different-value)
$: match the end of the line
Note that in a url, you probably won't have spaces. I used the . character because it's easily distinguished, but perhaps you might use \S instead to match non-spaces.
Also, you might use \A instead of ^ to match the start of the string (rather than the start of any line) and \Z instead of $ to match the end of the string (rather than the end of any line).
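Combining those two notes, a stricter version of the pattern might look like the sketch below. The names are hypothetical, and using [^/\s] for the path elements and \z for the end anchor are my own choices rather than part of the answer above.

```ruby
# \A and \z anchor to the whole string; [^/\s]+ keeps each element free of
# slashes and whitespace. (\Z would also tolerate a single trailing newline.)
SECOND_TO_LAST = %r{\A\S+/([^/\s]+)/[^/\s]+/?\z}

# Returns the captured second-to-last path element, or nil if there is no match.
def second_to_last_segment(url)
  url[SECOND_TO_LAST, 1]
end

puts second_to_last_segment('http://www.example.com/value/1234/different-value')   # 1234
puts second_to_last_segment('http://www.example.com/value/1234/different-value/')  # 1234
```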