Characters at the end do not match - ruby

I need to match all the letters and numbers in a string str.
This is my code.
str.match(/^(AB)(\d+)([A-Za-z][0-9])?/)
When str = AB57933A, it matches only AB57933, and not the characters appended after the numbers.
If I try with str = AB57933AbC, it matches only AB57933; it only matches up to the last number, and not the characters after that.

In the way you have written it:
/^(AB)(\d+)([A-Za-z][0-9])/
you require the last character to be a digit between 0 and 9 (the final group is exactly one letter followed by one digit). Depending on your needs, you can replace it with the following if you do not expect digits after the last letter:
/^(AB)(\d+)([A-Za-z]+)/
or by
/^(AB)(\d+)([A-Za-z0-9]+)/
if AB57933AbC12 are also accepted as valid input.
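A quick sketch comparing the original pattern with the two replacements on the inputs from the question (plus AB57933AbC12). It uses String#[] with a regex, which returns the matched substring or nil; the variable names are only for illustration.

```ruby
original     = /^(AB)(\d+)([A-Za-z][0-9])?/ # optional group needs letter + digit
letters_only = /^(AB)(\d+)([A-Za-z]+)/      # one or more trailing letters
alnum_suffix = /^(AB)(\d+)([A-Za-z0-9]+)/   # trailing letters and/or digits

%w[AB57933A AB57933AbC AB57933AbC12].each do |str|
  # Print the input followed by what each pattern matches.
  p [str, str[original], str[letters_only], str[alnum_suffix]]
end
# ["AB57933A", "AB57933", "AB57933A", "AB57933A"]
# ["AB57933AbC", "AB57933", "AB57933AbC", "AB57933AbC"]
# ["AB57933AbC12", "AB57933", "AB57933AbC", "AB57933AbC12"]
```

The original pattern always stops at AB57933 here because its optional group cannot match a lone letter, so the regex engine simply skips it.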
Last but not least, if you do not use backreferences you can omit the parentheses, as you do not need capturing groups.
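A minimal sketch of the capture-free version. As an assumption beyond the answer above, it also swaps ^ for \A so the pattern can only match at the start of the whole string rather than at the start of any line.

```ruby
# No parentheses, so no capturing groups; \A anchors to the start of the string.
pattern = /\AAB\d+[A-Za-z0-9]+/

md = "AB57933AbC12".match(pattern)
puts md[0]     # the whole match: AB57933AbC12
p md.captures  # no groups, so this is []
```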

Related

Using Regexp to check whether a string starts with a consonant

Is there a better way to write the following regular expressions in Ruby? The first regex matches a string that begins with a (lower case) consonant, the second with a vowel.
I'm trying to figure out if there's a way to write a regular expression that matches the negative of the second expression, versus writing the first expression with several ranges.
string =~ /\A[b-df-hj-np-tv-z]/
string =~ /\A[aeiou]/
The statement
string =~ /\A[^aeiou]/
will test whether the string starts with a non-vowel character, which includes digits, punctuation, whitespace and control characters. That is fine if you know beforehand that the string begins with a letter, but to check that it starts with a consonant you can use lookaheads to test that it starts with both a letter and a non-vowel, like this:
string =~ /\A(?=[^aeiou])(?=[a-z])/i
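A small sketch of the double-lookahead test on a few sample words (the word list is made up for illustration). Because both lookaheads consume no characters, a successful =~ returns 0 (truthy), and a failure returns nil.

```ruby
starts_with_consonant = /\A(?=[^aeiou])(?=[a-z])/i

words = %w[1word _word ball area Ball programming]
# Keep words whose first character is both a non-vowel and a letter.
consonant_words = words.select { |w| w =~ starts_with_consonant }
p consonant_words # => ["ball", "Ball", "programming"]
```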
To match an arbitrary number of consonants, you can use the sub-expression (?i:(?![aeiou])[a-z]) to match a single consonant. Because it is a single group, you can put a repetition count like {3} right after it. For example, this program finds all the strings in a list that begin with three consonants in a row:
list = %w/ aab bybt xeix axei AAsE SAEE eAAs xxsa Xxsr /
puts list.select { |word| word =~ /\A(?i:(?![aeiou])[a-z]){3}/ }
output
bybt
xxsa
Xxsr
I modified the answer provided by Alexander Cherednichenko in order to get rid of the if statements.
/^[^aeiou\W]/i.match(s) != nil
If you want to catch a string that doesn't start with a vowel but starts with a consonant, you can use the code below. It returns true if the string starts with any letter other than A, E, I, O, U; s is the string passed to the function.
def starts_with_consonant?(s)
  if /^[^aeiou\W]/i.match(s) == nil
    return false
  else
    return true
  end
end
The i at the end makes the regular expression case-insensitive.
\W matches any non-word character (anything other than a letter, digit or underscore), so adding it to the negated class rejects strings that start with, for example, a space or punctuation.
[^aeiou] is a character class that matches any character except a, e, i, o, u.
The ^ before the [ anchors the class [^aeiou\W] to the first character (in Ruby, the first character of any line).
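A quick check of how /^[^aeiou\W]/i behaves on a few illustrative inputs, including the digit and underscore cases discussed in the next answer. It also contrasts ^ with \A on a two-line string; the \A variant is an added comparison, not part of the answer above.

```ruby
line_start   = /^[^aeiou\W]/i  # the pattern from the answer above
string_start = /\A[^aeiou\W]/i # same class, anchored to the start of the string

["ball", "apple", "1something", "_word", " word"].each do |s|
  puts "#{s.inspect} -> #{line_start.match?(s)}"
end
# Prints true for "ball", "1something" and "_word" (digits and _ are word
# characters), and false for "apple" and " word".

# ^ matches after any newline, so the second line's "b" satisfies it:
p line_start.match?("apple\nball")   # => true
p string_start.match?("apple\nball") # => false
```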
Note that the ^[^aeiou\W] pattern is not correct because it also matches a line that starts with a digit or an underscore. Borodin's solution works well, but there is one more possible solution without lookaheads, based on character class subtraction and using the more contemporary Regexp#match?:
/\A[a-z&&[^aeiou]]/i.match?(word)
Details
\A - start of a string (^ in Ruby is start of any line)
[a-z&&[^aeiou]] - an a-z character range matching any ASCII letter (/i flag makes it case insensitive) except for the aeiou chars.
See the Ruby demo:
test = %w/ 1word _word ball area programming /
p test.select { |w| /\A[a-z&&[^aeiou]]/i.match?(w) }
# => ["ball", "programming"]

How do I match repeated characters?

How do I find repeated characters using a regular expression?
If I have aaabbab, I would like to match only characters which have three repetitions:
aaa
Try string.scan(/((.)\2{2,})/).map(&:first), where string is your string of characters.
The way this works is that it matches any character and captures it (the dot in the inner group), then matches repeats of that same character (the \2 backreference) 2 or more times (the {2,} quantifier means "anywhere between 2 and infinity times"), so each match is a run of at least three identical characters. Because the pattern has capture groups, scan returns an array of arrays (one element per group), so we map the first element of each out to get the desired results.
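A sketch wrapping the scan call in a helper (the method name is hypothetical) and running it on the question's input plus a second made-up string. Since {2,} allows more repeats, a run of four identical characters comes back whole.

```ruby
# Returns every run of three or more identical consecutive characters.
def runs_of_three_or_more(string)
  # Outer group = whole run, inner group = the repeated character.
  string.scan(/((.)\2{2,})/).map(&:first)
end

p runs_of_three_or_more("aaabbab")   # => ["aaa"]
p runs_of_three_or_more("aaaabbbcc") # => ["aaaa", "bbb"]
```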

Regexp for specific matching of character string

I need a regex to match something like
"4f0f30500be4443126002034"
and
"4f0f30500be4443126002034>4f0f31310be4443126005578"
but not like
"4f0f30500be4443126002034>4f0f31310be4443126005578>4f0f31310be4443126005579"
Try:
^[\da-f]{24}(>[\da-f]{24})?$
[\da-f]{24} is exactly 24 characters consisting only of 0-9, a-f. The whole pattern is one such number optionally followed by a > and a second such number.
I think you want something like:
/^[0-9a-f]{24}(>[0-9a-f]{24})?$/
That matches 24 characters in the 0-9a-f range (which matches your first string), optionally followed by a > and another 24 characters in the 0-9a-f range (which matches your second string).
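A sketch of the same pattern in Ruby, checked against the three strings from the question. One assumption beyond the answers above: in Ruby, ^ and $ are line anchors, so for whole-string validation this version uses \A and \z; the made-up "sneaky" input shows the difference.

```ruby
hex_id        = /\A[\da-f]{24}(>[\da-f]{24})?\z/ # whole-string anchors
line_anchored = /^[\da-f]{24}(>[\da-f]{24})?$/   # pattern as given above

one    = "4f0f30500be4443126002034"
two    = "4f0f30500be4443126002034>4f0f31310be4443126005578"
three  = "4f0f30500be4443126002034>4f0f31310be4443126005578>4f0f31310be4443126005579"
sneaky = "4f0f30500be4443126002034\nnot hex at all"

[one, two, three].each { |s| puts hex_id.match?(s) } # true, true, false
p line_anchored.match?(sneaky) # => true ($ matches before the newline)
p hex_id.match?(sneaky)        # => false
```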
You don't need a regex.
str = "4f0f30500be4443126002034>4f0f31310be4443126005578"
match = str.count('>') < 2
match is true when the string contains zero or one '>' character, and false otherwise. Note that this only counts the separators; it does not check that each part is a 24-character hex string.

Regex: Substring the second last value between two slashes of a url string

I have a string like this:
http://www.example.com/value/1234/different-value
How can I extract the 1234?
Note: There may be a slash at the end:
http://www.example.com/value/1234/different-value
http://www.example.com/value/1234/different-value/
/([^/]+)(?=/[^/]+/?$)
should work. You might need to format it differently according to the language you're using. For example, in Ruby, it's
if subject =~ /\/([^\/]+)(?=\/[^\/]+\/?\Z)/
match = $~[1]
else
match = ""
end
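A compact variant of the same lookahead regex, assuming you only need the captured value: %r{} avoids escaping every slash, String#[] with a group index returns the capture (or nil), and \z (absolute end of string) is swapped in for \Z.

```ruby
# Capture the segment that is followed by exactly one more segment
# and an optional trailing slash.
pattern = %r{/([^/]+)(?=/[^/]+/?\z)}

urls = [
  'http://www.example.com/value/1234/different-value',
  'http://www.example.com/value/1234/different-value/'
]
second_last = urls.map { |u| u[pattern, 1] }
p second_last # => ["1234", "1234"]
```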
Use Slice for Positional Extraction
If you always want to extract the element at index 4 of the split (counting the scheme "http:" and the empty string between the two slashes), and are confident that your data is regular, you can use Array#slice as follows.
'http://www.example.com/value/1234/different-value'.split('/').slice 4
#=> "1234"
'http://www.example.com/value/1234/different-value/'.split('/').slice 4
#=> "1234"
This will work reliably whether there's a trailing slash or not, whether or not you have more elements after the split, and whether or not the element at index 4 is always strictly numeric. It works because it's based on the element's position within the path, rather than on the contents of the element. However, you will end up with nil if you attempt to parse a URI with fewer elements, such as http://www.example.com/1234/.
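To see why index 4 is the right position, here is the intermediate array. String#split drops trailing empty strings (so a trailing slash makes no difference) but keeps the empty string produced by "//".

```ruby
url = 'http://www.example.com/value/1234/different-value/'
parts = url.split('/')
p parts
# => ["http:", "", "www.example.com", "value", "1234", "different-value"]
p parts.slice(4) # => "1234"

# Fewer path segments: index 4 does not exist, so slice returns nil.
p 'http://www.example.com/1234/'.split('/').slice(4) # => nil
```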
Use Scan/Match for Pattern Extraction
Alternatively, if you know that the element you're looking for is always the only one composed entirely of digits, you can use String#match with look-arounds to extract just the numeric portion of the string.
'http://www.example.com/value/1234/different-value'.match %r{(?<=/)\d+(?=/)}
#=> #<MatchData "1234">
$&
#=> "1234"
The look-behind and look-ahead assertions are needed to anchor the expression to a path. Without them, you'll match things like w3.example.com too. This solution is a better approach if the position of the target element may change, and if you can guarantee that your element of interest will be the only one that matches the anchored regex.
If there will be more than one match (e.g. http://www.example.com/1234/5678/) then you might want to use String#scan instead to select the first or last match. This is one of those "know your data" things; if you have irregular data, then regular expressions aren't always the best choice.
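A minimal sketch of the String#scan approach on the two-number example URL, using the same look-around pattern and then picking the first or last hit.

```ruby
url = 'http://www.example.com/1234/5678/'
# Every all-digit path segment bounded by slashes on both sides.
numbers = url.scan(%r{(?<=/)\d+(?=/)})
p numbers       # => ["1234", "5678"]
p numbers.first # => "1234"
p numbers.last  # => "5678"
```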
Javascript:
var myregexp = /:\/\/.*?\/.*?\/(\d+)/;
var match = myregexp.exec(subject);
if (match != null) {
result = match[1];
}
Works with your examples... But I am sure it will fail in general...
Ruby edit:
if subject =~ /:\/\/.*?\/.*?\/(.+?)\//
  match = $~[1]
end
It does work.
I think this is a little simpler than the accepted answer, because it doesn't use any positive lookahead (?=), but rather simply makes the last slash optional via the ? character:
^.+\/(.+)\/.+\/?$
In Ruby:
STDIN.read.split("\n").each do |nextline|
if nextline =~ /^.+\/(.+)\/.+\/?$/
printf("matched %s in %s\n", $~[1], nextline);
else
puts "no match"
end
end
Let's break down what's happening:
^: start of the line
.+\/: match anything (greedily) up to a slash
Since we're going to match at least one and at most two more slashes later, this slash will be either the second-to-last slash (as in http://www.example.com/value/1234/different-value) or the third-to-last slash (as in http://www.example.com/value/1234/different-value/)
Up to this point we've matched http://www.example.com/value/ (due to greediness)
(.+)\/: our capturing group for 1234, indicated by the parentheses. It's anything followed by another slash.
Since the previous match matched up to the second or third last slash, this will match up to the last slash or second last slash, respectively
.+: match anything. This would be after our 1234, so we're assuming there are characters after 1234/ (different-value)
\/?: optionally match another slash (the slash after different-value)
$: match the end of the line
Note that in a url, you probably won't have spaces. I used the . character because it's easily distinguished, but perhaps you might use \S instead to match non-spaces.
Also, you might use \A instead of ^ to match start of string (instead of after line break) and \Z instead of $ to match end of string (instead of at line break)
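A sketch applying both suggestions (\S instead of ., \A and \Z instead of ^ and $) to the two URLs from the question. It assumes the URLs contain no whitespace, as noted above.

```ruby
# \S+ instead of .+; \A and \Z anchor to the whole string rather than a line.
pattern = /\A\S+\/(\S+)\/\S+\/?\Z/

urls = [
  'http://www.example.com/value/1234/different-value',
  'http://www.example.com/value/1234/different-value/'
]
ids = urls.map { |u| u[pattern, 1] }
p ids # => ["1234", "1234"]
```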

ruby parametrized regular expression

I have a string like "{some|words|are|here}" or "{another|set|of|words}"
So in general the string consists of an opening curly bracket, words delimited by pipes, and a closing curly bracket.
What is the most efficient way to get a selected word from that string?
I would like do something like this:
my_string = "{this|is|a|test|case}"
my_string.get_column(0) # => "this"
my_string.get_column(1) # => "is"
my_string.get_column(4) # => "case"
What should the method get_column contain ?
So this is the solution I like right now:
class String
def get_column(n)
self =~ /\A\{(?:\w*\|){#{n}}(\w*)(?:\|\w*)*\}\Z/ && $1
end
end
We use a regular expression to make sure that the string is of the correct format, while simultaneously grabbing the correct column.
Explanation of regex:
\A is the beginning of the string and \Z is the end, so this regex matches the entire string.
Since curly braces have a special meaning we escape them as \{ and \} to match the curly braces at the beginning and end of the string.
Next, we want to skip the first n columns; we don't care about them.
A previous column is some number of letters followed by a vertical bar, so we use the standard \w to match a word-like character (includes numbers and underscore, but why not) and * to match any number of them. Vertical bar has a special meaning, so we have to escape it as \|. Since we want to group this, we enclose it all inside non-capturing parens (?:\w*\|) (the ?: makes it non-capturing).
Now we have n of the previous columns, so we tell the regex to match the column pattern n times using a counted repetition: just put a number in curly braces after a pattern. We use standard string interpolation, so we just put in {#{n}} to mean "match the previous pattern exactly n times".
The first non-skipped column after that is the one we care about, so we put it in capturing parentheses: (\w*).
Then we skip the rest of the columns, if any exist: (?:\|\w*)*.
Capturing the column puts it into $1, so we return that value if the regex matched. If not, we return nil, since this String has no nth column.
In general, if you wanted to have more than just words in your columns (like "{a phrase or two|don't forget about punctuation!|maybe some longer strings that have\na newline or two?}"), then just replace all the \w in the regex with [^|{}] so you can have each column contain anything except a curly-brace or a vertical bar.
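A sketch of the regex approach as standalone helpers (the names column_at and loose_column_at are hypothetical; they avoid reopening String). The second helper applies the [^|{}] substitution described above.

```ruby
# Same regex as String#get_column: \w-only columns.
def column_at(str, n)
  str =~ /\A\{(?:\w*\|){#{n}}(\w*)(?:\|\w*)*\}\Z/ && $1
end

# Variant: a column may contain anything except a pipe or a curly brace.
def loose_column_at(str, n)
  str =~ /\A\{(?:[^|{}]*\|){#{n}}([^|{}]*)(?:\|[^|{}]*)*\}\Z/ && $1
end

s = "{this|is|a|test|case}"
p column_at(s, 0) # => "this"
p column_at(s, 1) # => "is"
p column_at(s, 4) # => "case"
p column_at(s, 5) # => nil (there is no sixth column)
p loose_column_at("{a phrase or two|don't forget about punctuation!}", 1)
# => "don't forget about punctuation!"
```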
Here's my previous solution:
class String
def get_column(n)
raise "not a column string" unless self =~ /\A\{\w*(?:\|\w*)*\}\Z/
self[1 .. -2].split('|')[n]
end
end
We use a similar regex to make sure the String contains a set of columns or raise an error. Then we strip the curly braces from the front and back (using self[1 .. -2] to limit to the substring starting at the second character and ending at the next to last), split the columns using the pipe character (using .split('|') to create an array of columns), and then find the nth column (using standard Array lookup with [n]).
I just figured as long as I was using the regex to verify the string, I might as well use it to capture the column.
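One behavioural difference between the two versions is easy to miss: String#split drops trailing empty fields, so an empty last column comes back as nil from the split version but as "" from the regex version. A sketch with hypothetical standalone copies of both methods:

```ruby
# Standalone copies of the two get_column versions (names are illustrative).
def regex_column_at(str, n)
  str =~ /\A\{(?:\w*\|){#{n}}(\w*)(?:\|\w*)*\}\Z/ && $1
end

def split_column_at(str, n)
  raise "not a column string" unless str =~ /\A\{\w*(?:\|\w*)*\}\Z/
  str[1..-2].split('|')[n]
end

s = "{this|is|a|test|case}"
p regex_column_at(s, 2) # => "a"
p split_column_at(s, 2) # => "a"

# Empty last column: "a|b|".split('|') is ["a", "b"], so index 2 is missing.
p regex_column_at("{a|b|}", 2) # => ""
p split_column_at("{a|b|}", 2) # => nil
```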

Resources