Finding the first duplicate character in the string Ruby - ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now where my output is coming out to me No duplicates 26 times. I think my if statement is wrongly written.
string "abcade"
puts string
for i in ('a'..'z')
if string =~ /(.)\1/
puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
else
puts "no duplicates"
end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.

The following returns the index of the first duplicate character:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end

s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland :
s.each_char.find { |c| s.count(c) > 1 }

Below method might be useful to find the first word in a string
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end

I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by #DaveNewton:
require 'set'
def first_repeat_char(str)
str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero of more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).

You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}

I'll use positive lookahead with String#[] method :
"abcccddde"[/(.)(?=\1)/] #=> c

As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"

Related

ruby: Grab numbers only within quotes

I would like the following sub-string
1100110011110000
from
foo = "bar9-9 '11001100 11110000 A'A\n"
I have so far used the below, which yields
puts foo.split(',').map!(&:strip)[0].gsub(/\D/, '')
>> 991100110011110000
Getting rid of the 2 leading 9's is not too difficult in this scenario, but I would like a general solution which grabs numbers only within the ' ' single quotes
You can find the quoted part first with scan and then remove non-digits:
> results = "bar9-9 '11001100 11110000 A'A\n".scan(/'[^']*'/).map{|m| m.gsub(/\D/, '')}
# => ["1100110011110000"]
> results[0]
# => "1100110011110000"
The zeros and ones within the quoted string can be extracted using String#gsub with a regular expression, as opposed to methods that convert the string to an array of strings, modify the array and converted it back to a string. Here are three ways of doing that.
str ="bar9-9 '11001100 11110000 A'A\n"
#1: Extract the substring of interest and then remove characters other than zero and one
def extract(str)
str[str.index("'")+1, str.rindex("'")-1].gsub(/[^01]/,'')
end
extract str
#=> "1100110011110000"
#2 Use a flag to indicate when zeroes and ones are to be kept
def extract(str)
found = false
str.gsub(/./m) do |c|
found = !found if c == "'"
(found && (c =~ /[01]/)) ? c : ''
end
end
extract str
#=> "1100110011110000"
Here the regular expression requires the m modifier (to enable multiline mode) in order to convert the newline character to an empty string. (One could alternatively write str.chomp.gsub(/./)....)
Notice that this second method works when there are multiple single-quoted substrings.
extract "bar9-9 '11001100 11110000 A'A'10x1y'\n"
#=> "1100110011110000101"
#3 Use the flip-flop operator (variant of #2)
def extract(str)
str.gsub(/./m) do |c|
next '' if (c=="'") .. (c=="'")
c =~ /[01]/ ? c : ''
end
end
extract str
#=> "1100110011110000"
extract "bar9-9 '11001100 11110000 A'A'10x1y'\n"
#=> "1100110011110000101"
foo.slice(/'.*?'/).scan(/\d+/).join
#=> "1100110011110000"

Simple regex - ignoring certain characters

I'm trying to use the match method with an argument of a regex to select a valid phone number, by definition, any string with nine digits.
For example:
9347584987 is valid,
(456)322-3456 is valid,
(324)5688890 is valid.
But
(340)HelloWorld is NOT valid and
456748 is NOT valid.
So far, I'm able to use \d{9} to select the example string of 9 digit characters in a row, but I'm not sure how to specifically ignore any character, such as '-' or '(' or ')' in the middle of the sequence.
What kind of Regex could I use here?
Given:
nums=['9347584987','(456)322-3456','(324)5688890','(340)HelloWorld', '456748 is NOT valid']
You can split on a NON digit and rejoin to remove non digits:
> nums.map {|s| s.split(/\D/).join}
["9347584987", "4563223456", "3245688890", "340", "456748"]
Then filter on the length:
> nums.map {|s| s.split(/\D/).join}.select {|s| s.length==10}
["9347584987", "4563223456", "3245688890"]
Or, you can grab a group of numbers that look 'phony numbery' by using a regex to grab digits and common delimiters:
> nums.map {|s| s[/[\d\-()]+/]}
["9347584987", "(456)322-3456", "(324)5688890", "(340)", "456748"]
And then process that list as above.
That would delineate:
> '123 is NOT a valid area code for 456-7890'[/[\d\-()]+/]
=> "123" # no match
vs
> '123 is NOT a valid area code for 456-7890'.split(/\D/).join
=> "1234567890" # match
I suggest using one regular expression for each valid pattern rather than constructing a single regex. It would be easier to test and debug, and easier to maintain the code. If, for example, "123-456-7890" or 123-456-7890 x231" were in future deemed valid numbers, one need only add a single, simple regex for each to the array VALID_PATTERS below.
VALID_PATTERS = [/\A\d{10}\z/, /\A\(\d{3}\)\d{3}-\d{4}\z/, /\A\(\d{3}\)\d{7}\z/]
def valid?(str)
VALID_PATTERS.any? { |r| str.match?(r) }
end
ph_nbrs = %w| 9347584987 (456)322-3456 (324)5688890 (340)HelloWorld 456748 |
ph_nbrs.each { |s| puts "#{s.ljust(15)} \#=> #{valid?(s)}" }
9347584987 #=> true
(456)322-3456 #=> true
(324)5688890 #=> true
(340)HelloWorld #=> false
456748 #=> false
String#match? made its debut in Ruby v2.4. There are many alternatives, including str.match(r) and str =~ r.
"9347584987" =~ /(?:\d.*){9}/ #=> 0
"(456)322-3456" =~ /(?:\d.*){9}/ #=> 1
"(324)5688890" =~ /(?:\d.*){9}/ #=> 1
"(340)HelloWorld" =~ /(?:\d.*){9}/ #=> nil
"456748" =~ /(?:\d.*){9}/ #=> nil
Pattern: (Rubular Demo)
^\(?\d{3}\)?\d{3}-?\d{4}$ # this makes the expected symbols optional
This pattern will ensure that an opening ( at the start of the string is followed by 3 numbers the a closing ).
^(\(\d{3}\)|\d{3})\d{3}-?\d{4}$
On principle, though, I agree with melpomene in advising that you remove all non-digital characters, test for 9 character length, then store/handle the phone numbers in a single/reliable/basic format.

Regex too Capture certain words at start of string Ruby

Looking for help in writing a regex for capturing whether a particular string starts with certain strings and capture the start and remaining string. E.g
Let's say the possible starts of strings are 'P', 'RO', 'RPX' and the sample string is 'PIXR' or 'ROXP' or 'RPX'.
I am looking to write a regex which captures the start and trailing part of string if it starts with the given possible strings e.g
'PIXRT' =~ // outputs 'P' and 'IXRT'
Not very conversant with regexes so any help is really appreciated.
You may use a regex with 2 capturing groups, one capturing the known values at the start and the rest will capture the rest of the string:
rx = /\A(RPX|RO|P)(.*)/m
"PIXRT".scan(rx)
# => [P, IXRT]
See the Ruby demo
Details:
\A - start of string
(RPX|RO|P) - one of the values that must be at the start of the string (mind the order of these alternatives: the longer ones come first!)
(.*) - any 0+ chars up to the end of the string (m modifier will make . match line breaks, too).
def split_after_start_string(str, *start_strings)
a = str.split(/(?<=\A#{start_strings.join('|')})/)
if a.size == 2
a
elsif start_strings.include?(str)
a << ''
else
nil
end
end
start_strings = %w| P RO RPX | #=> ["P", "RO", "RPX"]
split_after_start_string('PIXR', *start_strings) #=> ["P", "IXR"]
split_after_start_string('IPXR', *start_strings) #=> nil
split_after_start_string('ROXP', *start_strings) #=> ["RO", "XP"]
split_after_start_string('RPX', *start_strings) #=> ["RPX", ""]
The regex reads, "match one element of start_stringx at the beginning of the string in a positive lookbehind". For smart_strings in the examples, the regex is:
/(?<=\A#{start_strings.join('|')})/ #=> /(?<=\AP|RO|RPX)/

Stuck in Abbreviation implementation to ruby string

I want to convert all the words(alphabetic) in the string to their abbreviations like i18n does. In other words I want to change "extraordinary" into "e11y" because there are 11 characters between the first and the last letter in "extraordinary". It works with a single word in the string. But how can I do the same for a multi-word string? And of course if a word is <= 4 there is no point to make an abbreviation from it.
class Abbreviator
def self.abbreviate(x)
x.gsub(/\w+/, "#{x[0]}#{(x.length-2)}#{x[-1]}")
end
end
Test.assert_equals( Abbreviator.abbreviate("banana"), "b4a", Abbreviator.abbreviate("banana") )
Test.assert_equals( Abbreviator.abbreviate("double-barrel"), "d4e-b4l", Abbreviator.abbreviate("double-barrel") )
Test.assert_equals( Abbreviator.abbreviate("You, and I, should speak."), "You, and I, s4d s3k.", Abbreviator.abbreviate("You, and I, should speak.") )
Your mistake is that your second parameter is a substitution string operating on x (the original entire string) as a whole.
Instead of using the form of gsub where the second parameter is a substitution string, use the form of gsub where the second parameter is a block (listed, for example, third on this page). Now you are receiving each substring into your block and can operate on that substring individually.
def short_form(str)
str.gsub(/[[:alpha:]]{4,}/) { |s| "%s%d%s" % [s[0], s.size-2, s[-1]] }
end
The regex reads, "match four or more alphabetic characters".
short_form "abc" # => "abc"
short_form "a-b-c" #=> "a-b-c"
short_form "cats" #=> "c2s"
short_form "two-ponies-c" #=> "two-p4s-c"
short_form "Humpty-Dumpty, who sat on a wall, fell over"
#=> "H4y-D4y, who sat on a w2l, f2l o2r"
I would recommend something along the lines of this:
class Abbreviator
def self.abbreviate(x)
x.gsub(/\w+/) do |word|
# Skip the word unless it's long enough
next word unless word.length > 4
# Do the same I18n conversion you do before
"#{word[0]}#{(word.length-2)}#{word[-1]}"
end
end
end
The accepted answer isn't bad, but it can be made a lot simpler by not matching words that are too short in the first place:
def abbreviate(str)
str.gsub(/([[:alpha:]])([[:alpha:]]{3,})([[:alpha:]])/i) { "#{$1}#{$2.size}#{$3}" }
end
abbreviate("You, and I, should speak.")
# => "You, and I, s4d s3k."
Alternatively, we can use lookbehind and lookahead, which makes the Regexp more complex but the substitution simpler:
def abbreviate(str)
str.gsub(/(?<=[[:alpha:]])[[:alpha:]]{3,}(?=[[:alpha:]])/i, &:size)
end

Splitting string based on word

I have a string composed by words divided by'#'. For instance 'this#is#an#example' and I need to extract the last word or the last two words according to the second to last word.
If the second to last is 'myword' I need the last two words otherwise just the last one.
'this#is#an#example' => 'example'
'this#is#an#example#using#myword#also' => 'myword#also'
Is there a better way than splitting and checking the second to last? perhaps using regular expression?
Thanks.
You can use the end-of-line anchor $ and make the myword# prefix optional:
str = 'this#is#an#example'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "example"
str = 'this#is#an#example#using#myword#also'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "myword#also"
However, I don't think using a regular expression is "better" in this case. I would use something like Santosh's (deleted) answer: split the line by # and use an if clause.
def foo(str)
*, a, b = str.split('#')
if a == 'myword'
"#{a}##{b}"
else
b
end
end
str = 'this#is#an#example#using#myword#also'
array = str.split('#')
array[-2] == 'myword' ? array[-2..-1].join('#') : array[-1]
With regex:
'this#is#an#example'[/(myword\#)*\w+$/]
# => "example"
'this#is#an#example#using#myword#also'[/(myword\#)*\w+$/]
# => "myword#also"

Resources