How do I match repeated characters? - ruby

How do I find repeated characters using a regular expression?
If I have aaabbab, I would like to match only characters which have three repetitions:
aaa

Try string.scan(/((.)\2{2,})/).map(&:first), where string is your string of characters.
This works by capturing any single character (the dot, which is group 2) and then matching that same character again (the \2 backreference) two or more times (the {2,} quantifier means "at least 2, with no upper limit"), so the whole match is a run of three or more identical characters. scan returns an array of arrays, one per match, so we map the first element out of each to get the desired results.
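As a quick check of the pattern above, here is a minimal sketch run against the string from the question. The constant RUN_OF_THREE and the method repeated_runs are names made up for this illustration; they are not part of the original answer.

```ruby
# Hypothetical names for this sketch: RUN_OF_THREE, repeated_runs.
# Group 1 is the whole run; group 2 is the single repeated character.
RUN_OF_THREE = /((.)\2{2,})/

# Return every run of three or more identical consecutive characters.
def repeated_runs(string)
  # scan yields [whole_run, single_char] pairs; keep only the whole run.
  string.scan(RUN_OF_THREE).map(&:first)
end

p repeated_runs('aaabbab')    # => ["aaa"]
p repeated_runs('aaaabbbcc')  # => ["aaaa", "bbb"]
```

Note that {2,} has no upper bound, so a run of four or more is returned as a single match rather than being cut at three.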

Related

Characters at the end do not match

I need to match all the letters and digits in a string str.
This is my code.
str.match(/^(AB)(\d+)([A-Za-z][0-9])?/)
When str = AB57933A, it matches only AB57933, and not the letter appended after the digits.
If I try with str = AB57933AbC, it also matches only AB57933; it matches up to the last digit, and not the characters after that.
In the way you have written it:
/^(AB)(\d+)([A-Za-z][0-9])/
you require the last character to be a digit between 0 and 9. Depending on your needs, you can replace it with the following if you do not expect digits after the last letter:
/^(AB)(\d+)([A-Za-z]+)/
or by
/^(AB)(\d+)([A-Za-z0-9]+)/
if inputs such as AB57933AbC12 are also accepted as valid.
Last but not least, if you do not use backreferences, you can omit the parentheses, since you do not need capturing groups.
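To see the difference between the two suggested patterns, here is a small sketch using the sample strings from the question. The constant names LETTERS_ONLY and LETTERS_AND_DIGITS are made up for this illustration.

```ruby
# Hypothetical constant names for this sketch.
# Digits followed by letters only.
LETTERS_ONLY = /^(AB)(\d+)([A-Za-z]+)/
# Digits followed by any mix of letters and digits.
LETTERS_AND_DIGITS = /^(AB)(\d+)([A-Za-z0-9]+)/

p 'AB57933A'.match(LETTERS_ONLY)[0]            # => "AB57933A"
p 'AB57933AbC'.match(LETTERS_ONLY).captures    # => ["AB", "57933", "AbC"]
p 'AB57933AbC12'.match(LETTERS_ONLY)[0]        # => "AB57933AbC"
p 'AB57933AbC12'.match(LETTERS_AND_DIGITS)[0]  # => "AB57933AbC12"

# Without capturing groups, String#[] returns just the matched text.
p 'AB57933AbC'[/^AB\d+[A-Za-z]+/]              # => "AB57933AbC"
```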

Match consecutive list of exactly one character in set with regular expressions

I don't think I'll even try to explain this, I don't know the words to, but I'd like to achieve the following:
Given a string like this:
+++>><<<--
I'd like a match to give me: +++, but also match if any of the other characters were in the string consecutively like they are. So if the +++ wasn't there, I'd like to match >>.
I tried using the following regular expression:
([><\-\+]+)
However, given the string above, it would match the entire string, and not the first list of consecutive characters.
If it makes a difference, this is in Ruby (1.9.3).
Not sure about the ruby bit, but you can do this with backreferences in the pattern:
(.)\1+
What this does is use a capturing group () to capture any character . followed by one or more + of the same character \1. The \1 is a backreference to the first captured group; in a pattern with more capturing groups, \2 would be the second captured group, and so on.
Java Example
Pattern p = Pattern.compile("(.)\\1+");
Matcher m = p.matcher("aaabbccaa");
m.find();
System.out.println(m.group(0)); // prints "aaa"
Ruby Example
# Return an array of matched patterns.
string = '+++>><<<--'
string.scan( /((.)\2+)/ ).collect { |match| match.first }
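If only the first run is wanted, and only for the four characters from the question, a sketch along these lines may be enough. The names FIRST_RUN and first_run are made up for this illustration, and restricting the captured character to the set + - < > is an assumption about what the asker needs.

```ruby
# Hypothetical names for this sketch: FIRST_RUN, first_run.
# One character from the set, then one or more repeats of that same character.
FIRST_RUN = /([><\-+])\1+/

def first_run(string)
  # String#[] with a Regexp returns the first match, or nil if there is none.
  string[FIRST_RUN]
end

p first_run('+++>><<<--')  # => "+++"
p first_run('>><<<--')     # => ">>"
p first_run('+-<>')        # => nil
```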

Regexp for specific matching of character string

I need a regex to match something like
"4f0f30500be4443126002034"
and
"4f0f30500be4443126002034>4f0f31310be4443126005578"
but not like
"4f0f30500be4443126002034>4f0f31310be4443126005578>4f0f31310be4443126005579"
Try:
^[\da-f]{24}(>[\da-f]{24})?$
[\da-f]{24} is exactly 24 characters consisting only of 0-9, a-f. The whole pattern is one such number optionally followed by a > and a second such number.
I think you want something like:
/^[0-9a-f]{24}(>[0-9a-f]{24})?$/
That matches 24 characters in the 0-9a-f range (which matches your first string) followed by zero or one strings starting with a >, followed by 24 characters in the 0-9a-f range (which matches your second string).
Don't need a regex.
str = "4f0f30500be4443126002034>4f0f31310be4443126005578"
match = str.count('>') < 2
match will be set to true for matches where there are 1 or 0 '>' in the string. Otherwise match is set to false.
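For comparison, here is a minimal sketch of the regex approach in Ruby, tried against the three strings from the question. It swaps ^ and $ for \A and \z, because in Ruby ^ and $ match at line boundaries rather than only at the start and end of the string; that substitution is a variation on the answers above, not what they wrote. The names ID_CHAIN and valid_id_chain? are made up for this illustration.

```ruby
# Hypothetical names for this sketch: ID_CHAIN, valid_id_chain?.
# One 24-character lowercase hex id, optionally followed by ">" and a second id.
ID_CHAIN = /\A[\da-f]{24}(?:>[\da-f]{24})?\z/

def valid_id_chain?(str)
  str.match?(ID_CHAIN)
end

one   = '4f0f30500be4443126002034'
two   = '4f0f30500be4443126002034>4f0f31310be4443126005578'
three = '4f0f30500be4443126002034>4f0f31310be4443126005578>4f0f31310be4443126005579'

p valid_id_chain?(one)    # => true
p valid_id_chain?(two)    # => true
p valid_id_chain?(three)  # => false
p valid_id_chain?('zzz')  # => false, although 'zzz'.count('>') < 2 is true
```

The last line shows the trade-off: counting > characters is simpler, but it only checks the number of separators and not the shape of the ids.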

Regex in xpath?

I want to find a table cell that contains the link (\d{0,3} )?pieces.
How would I need to write this xpath?
Can I simply insert the xpath directly into the Capybara search? Or do I need to do something special to indicate it is a regex? Or can I not do it at all?
XPath 1.0
XPath 1.0 does not include regular expression support. You should be able to achieve the desired match with the following expression:
//td/a['pieces'=substring(@href, string-length(@href) -
string-length('pieces') + 1) and
'pieces'=translate(@href, '0123456789', '') and
string-length(@href) > 5 and
string-length(@href) < 10]
The first test in the predicate checks that the string ends with pieces. The second test ensures that the entire string equals pieces when all of the digits are removed (i.e. there are no other characters). The final two tests ensure that the entire length of the string is between 6 and 9, which is the length of pieces plus zero to three digits.
Test it on the following document:
<table>
<tr>
<td>test0</td>
<td>no match</td>
<td>no match</td>
<td>test1</td>
<td>test2</td>
<td>no match</td>
<td>test3</td>
</tr>
</table>
It should match only the test0, test1, test2, and test3 links.
(Note: The expression may be further complicated by the possibility of other characters preceding the portion you're attempting to match.)
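To make the tests in the XPath 1.0 predicate easier to follow, here is a rough Ruby equivalent applied to plain strings. It is only an illustration of the logic, not something XPath or Capybara runs; the method name pieces_href? and the sample values are made up.

```ruby
# Hypothetical helper mirroring the XPath 1.0 predicate on a plain string.
def pieces_href?(href)
  suffix = 'pieces'
  href.end_with?(suffix) &&              # the value ends with "pieces"
    href.delete('0-9') == suffix &&      # removing all digits leaves exactly "pieces"
    href.length.between?(suffix.length, suffix.length + 3)  # zero to three digits
end

p pieces_href?('pieces')      # => true
p pieces_href?('12pieces')    # => true
p pieces_href?('1234pieces')  # => false (four digits)
p pieces_href?('mypieces')    # => false (non-digit prefix)
p pieces_href?('pieces1')     # => false (does not end with "pieces")
```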
XPath 2.0
Achieving this in XPath 2.0 is trivial with the matches function.
//td/a[
substring-after(concat(@href ,'x') ,'pieces')='x'
and
111>=concat(0 ,translate( substring-before(@href ,'pieces') ,'0123456789 -.' ,'1111111111xxx'))
]
This is another solution, not necessarily better, but, perhaps, interesting.
The first conjunct is true just when @href contains exactly one occurrence
of 'pieces', and it is at the end.
The second conjunct is true just when the part of @href before 'pieces' is empty
or is a numeral made entirely of digits (no ., -, or whitespace), with at most 3 digits.
The number of 1's in the '111>=' is the maximum number of digits that will match.
Reference: http://www.w3.org/TR/xpath
The substring-after function returns the substring of the first argument string that follows the first occurrence of the second argument string in the first argument string, or the empty string if the first argument string does not contain the second argument string.
The substring-before function returns the substring of the first argument string that precedes the first occurrence of the second argument string in the first argument string, or the empty string if the first argument string does not contain the second argument string.
... a string that consists of optional whitespace followed by an optional minus sign followed by a Number followed by whitespace is converted to the IEEE 754 number ... any other string is converted to NaN
Number ::= Digits ('.' Digits?)? | '.' Digits
An attribute node has a string-value. The string-value is the normalized value as specified by the XML Recommendation [XML]
The normalize-space function returns the argument string with whitespace normalized by stripping leading and trailing whitespace and replacing sequences of whitespace characters by a single space.

How to insert tag every 5 characters in a Ruby String?

I would like to insert a <wbr> tag every 5 characters.
Input: s = 'HelloWorld-Hello guys'
Expected outcome: Hello<wbr>World<wbr>-Hell<wbr>o guys
s = 'HelloWorld-Hello guys'
s.scan(/.{5}|.+/).join("<wbr>")
Explanation:
scan collects all matches of the regexp into an array. The .{5} matches any 5 characters. If there are characters left at the end of the string, they will be matched by the .+. Then join the array with your separator string.
There are several options to do this. If you just want to insert a delimiter string you can use scan followed by join as follows:
s = '12345678901234567'
puts s.scan(/.{1,5}/).join(":")
# 12345:67890:12345:67
.{1,5} matches between 1 and 5 of "any" character, but since it's greedy, it will take 5 if it can. The allowance for taking fewer is to accommodate the last match, where there may not be enough characters left over.
Another option is to use gsub, which allows for more flexible substitutions:
puts s.gsub(/.{1,5}/, '<\0>')
# <12345><67890><12345><67>
\0 is a backreference to what group 0 matched, i.e. the whole match. So substituting with <\0> effectively puts whatever the regex matched in literal brackets.
If whitespace is not to be counted, then instead of ., you want to match \s*\S (i.e. a non-whitespace character, possibly preceded by whitespace).
s = '123 4 567 890 1 2 3 456 7 '
puts s.gsub(/(\s*\S){1,5}/, '[\0]')
# [123 4 5][67 890][ 1 2 3 45][6 7]
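The same gsub idea can insert the <wbr> tag from the question directly. This is a minimal sketch, assuming every character (including spaces) counts toward the group of 5; the method name insert_wbr and its every parameter are made up for this illustration. The lookahead (?=.) is an addition here so that no tag is appended after the final group.

```ruby
# Hypothetical helper: insert "<wbr>" after every `every` characters,
# but only when at least one more character follows.
def insert_wbr(string, every = 5)
  # \0 in the replacement is the whole match, as explained above.
  string.gsub(/.{#{every}}(?=.)/, '\0<wbr>')
end

p insert_wbr('HelloWorld-Hello guys')
# => "Hello<wbr>World<wbr>-Hell<wbr>o guy<wbr>s"
p insert_wbr('HelloWorld')
# => "Hello<wbr>World"
```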
Here is a solution that is adapted from the answer to a recent question:
class String
def in_groups_of(n, sep = ' ')
chars.each_slice(n).map(&:join).join(sep)
end
end
p 'HelloWorld-Hello guys'.in_groups_of(5,'<wbr>')
# "Hello<wbr>World<wbr>-Hell<wbr>o guy<wbr>s"
The result differs from your example in that the space counts as a character, leaving the final s in a group of its own. Was your example flawed, or do you mean to exclude spaces (whitespace in general?) from the character count?
To only count non-whitespace (“sticking” trailing whitespace to the last non-whitespace, leaving whitespace-only strings alone):
# count "hard coded" into regexp
s.scan(/(?:\s*\S(?:\s+\z)?){1,5}|\s+\z/).join('<wbr>')
# parametric count
s.scan(/\s*\S(?:\s+\z)?|\s+\z/).each_slice(5).map(&:join).join('<wbr>')
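Wrapped in a method, the parametric version above can be checked against the expected outcome from the question. The method name wbr_by_visible_chars and its parameter n are made up for this sketch.

```ruby
# Hypothetical helper: group by non-whitespace characters only.
def wbr_by_visible_chars(string, n = 5)
  # Each unit is one non-whitespace character plus any whitespace before it;
  # trailing whitespace sticks to the last unit, and a whitespace-only
  # string becomes a single unit.
  units = string.scan(/\s*\S(?:\s+\z)?|\s+\z/)
  units.each_slice(n).map(&:join).join('<wbr>')
end

p wbr_by_visible_chars('HelloWorld-Hello guys')
# => "Hello<wbr>World<wbr>-Hell<wbr>o guys"
p wbr_by_visible_chars('ab cd ef', 2)
# => "ab<wbr> cd<wbr> ef"
```

Because spaces are not counted, the final group is "o guys", which matches the expected outcome stated in the question.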
