Ruby regex match starting at specific position - ruby

In Python I can do this:
import re
regex = re.compile('a')
regex.match('xay',1) # match because string starts with 'a' at 1
regex.match('xhay',1) # no match because character at 1 is 'h'
However in Ruby, the match method seems to match everything past the positional argument. For instance, /a/.match('xhay',1) will return a match, even though the match actually starts at 2. However, I want to only consider matches that start at a specific position.
How do I get a similar mechanism in Ruby? I would like to match patterns that start at a specific position in the string as I can in Python.

/^.{1}a/
for matching a at location x+1 in the string
/^.{x}a/
--> DEMO

How about below using StringScanner ?
require 'strscan'
scanner = StringScanner.new 'xay'
scanner.pos = 1
!!scanner.scan(/a/) # => true
scanner = StringScanner.new 'xnnay'
scanner.pos = 1
!!scanner.scan(/a/) # => false

Regexp#match has an optional second parameter pos, but it works like Python's search method. You could however check if the returned MatchData begins at the specified position:
re = /a/
match_data = re.match('xay', 1)
match_data.begin(0) == 1
#=> true
match_data = re.match('xhay', 1)
match_data.begin(0) == 1
#=> false
match_data = re.match('áay', 1)
match_data.begin(0) == 1
#=> true
match_data = re.match('aay', 1)
match_data.begin(0) == 1
#=> true

Extending a little bit on what #sunbabaphu answered:
def matching_at_pos(x=0, regex)
/\A.{#{x-1}}#{regex}/
end # note the position is 1 indexed
'xxa' =~ matching_at_pos(2, /a/)
=> nil
'xxa' =~ matching_at_pos(3, /a/)
=> 0
'xxa' =~ matching_at_pos(4, /a/)
=> nil

The answer to this question is \G.
\G matches the starting point of the regex match, when calling the two-argument version of String#match that takes a starting position.
'xay'.match(/\Ga/, 1) # match because /a/ starts at 1
'xhay'match(/\Ga/, 1) # no match because character at 1 is 'h'

Related

Ruby, looping through a string deleting groups of characters until a desired output is achieved

I have a coding problem I solved and want to refactor. I know there has to be a cleaner way of doing what I did.
The goal is to write a method that takes a string of "!" and "?" and reduces the string by eliminating all odd groupings of each symbol.
Example - a string "????!!!" would have an odd grouping of "!!!" because there are three in a row. These would be deleted from the string.
If there is only one "!" or "?" its left because it is not in a group.
Ex -
remove("!????!!!?") answer == "!"
# => ("!????!!!?" --> "!?????" --> "!")
In the first string, the only odd grouping is "!!!", once removed, it leaves a new string with an odd grouping "?????". You remove the next odd grouping so you're left with "!". This fits the desired output.
Another example
remove("!???!!") == ""
# => ("!???!!" --> "!!!" --> "")
Current code:
def remove(s)
arr = [s]
i = 0
until i == arr[0].length
s = s.chars.chunk{|c|c}.map{ |n,a| a.join }.select{|x| x if x.length.even? || x.length <= 1}.join
arr << s
i += 1
end
return arr[-1]
end
My code solves this problem and all test cases. I have a suspicion that my until loop can be removed/refactored so that I could solve this problem in one line and have spent hours trying to figure it out with no luck.
Suppose
str = "???!!!???!"
If we first remove the two groups "???" we are left with "!!!!", which cannot be reduced further.
If we first remove the group "!!!" we are left with "??????!", which cannot be reduced further.
If we are permitted to remove all odd groups of either character without reference to the effect that either has on the other, we obtain !, which cannot be reduced further.
It's not clear what rule is to be used. Here are three possibilities and code to implement each.
I will use the following two regular expressions, and in the first two cases a helper method.
Rq = /
(?<!\?) # do not match a question mark, negative lookbehind
\? # match a question mark
(\?{2})+ # match two question marks one or more times
(?!\?) # do not match a question mark, negative lookahead
/x # free-spacing regex definition mode
which is commonly written /(?<!\?)\?(\?{2})+(?!\?)/.
Similarly,
Rx = /(?<!!)!(!{2})+(?!!)/
def sequential(str, first_regex, second_regex)
s = str.dup
loop do
size = s.size
s = s.gsub(first_regex,'').gsub(second_regex,'')
return s if s.size == size
end
end
I apply each of the three methods below to two example strings:
str1 = "???!!!???!"
str2 = 50.times.map { ['?', '!'].sample }.join
#=> "?!!!?!!!?!??????!!!?!!??!!???!?!????!?!!!?!?!???!?"
Replace all odd groups of "?" then odd groups of "!" then repeat until no further removals are possible
def question_before_exclamation(str)
sequential(str, Rq, Rx)
end
question_before_exclamation str1 #=> "!!!!"
question_before_exclamation str2 #=> "??!??!?!!?!?!!?"
Replace all odd groups of "!" then odd groups of "?" then repeat until no further removals are possible
def exclamation_before_question(str)
sequential(str, Rx, Rq)
end
exclamation_before_question str1 #=> "??????!"
exclamation_before_question str2 #=> "??!????!!?!?!!?!?!!?"
Replace all odd groups of both "?" and "!" then repeat until no further removals are possible
Rqx = /#{Rq}|#{Rx}/
#=> /(?-mix:(?<!\?)\?(\?{2})+(?!\?))|(?-mix:(?<!!)!(!{2})+(?!!))/
def question_and_explanation(str)
s = str.dup
loop do
size = s.size
s = s.gsub(Rqx,'')
return s if s.size == size
end
end
question_and_explanation str1 #=> "!"
question_and_explanation str2 #=> "??!?!!?!?!!?!?!!?"
I don't know the exact Ruby syntax for this, but you could simplify your solution by using regular expressions:
Gather all matches of consecutive characters
if all matches are of even length or 1 exit
Test if matches are an odd length
if an odd length, replace with the empty string
else do nothing
Goto step 1
A solution in Perl would be:
#!perl
use strict;
use warnings;
use feature qw(say);
my $string = '!????!!!?';
sub reduce {
my ($s) = #_;
while ( my #matches = $s =~ m/((.)\2+)/g ) {
last if ! grep { length($_) > 1 && length($_) % 2 == 1 } #matches;
foreach my $match ( #matches ) {
$s =~ s/\Q$match// if length($match) > 1 && length($match) % 2 == 1;
}
}
return $s;
}
say reduce($string);
I could be wrong (this is ruby, after all) but I don't think you'll find a one-liner for this because ruby's utility functions generally aren't recursive. But you can use regex to simplify your logic, at the very least:
def remove(s)
while s =~ /(?<!\!)\!([\!]{2})+(?!\!)/ || s =~ /(?<!\?)\?([\?]{2})+(?!\?)/
s.gsub! /(?<!\!)\!([\!]{2})+(?!\!)/, "" # remove odd !
s.gsub! /(?<!\?)\?([\?]{2})+(?!\?)/, "" # remove odd ?
end
return s
end
To make the regex less mind-boggling, it helps to look at them with 'a' instead of '?' and '!':
/(?<!a)a([a]{2})+(?!a)/ #regex for 'a'
(?<!a) #negative lookbehind: the match cannot start with an 'a'
a([a]{2})+ #the match should be an 'a' followed by 1 or more pairs
(?!a) #negative lookahead: the match cannot end with an 'a'
It should be simple enough with a regular expression replacement
def remove(string)
begin
original = string
string.gsub!(/(\!{3,})|(\?{3,})/) { |s| s.length.even? ? s : '' }
end until original == string
string
end
puts remove("!????!!!?").inspect # answer == "!"
puts remove("!???!!").inspect # answer == ""
puts remove("!????!!").inspect # answer == "!????!!"

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now where my output is coming out to me No duplicates 26 times. I think my if statement is wrongly written.
string "abcade"
puts string
for i in ('a'..'z')
if string =~ /(.)\1/
puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
else
puts "no duplicates"
end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.
The following returns the index of the first duplicate character:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland :
s.each_char.find { |c| s.count(c) > 1 }
Below method might be useful to find the first word in a string
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by #DaveNewton:
require 'set'
def first_repeat_char(str)
str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero of more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
I'll use positive lookahead with String#[] method :
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"

Ruby regex checks string for variations of pattern of same length

I was wondering how you construct the regular expression to check if the string has a variation of a pattern with the same length. Say the string is "door boor robo omanyte" how do I return the words that have the variation of [door]?
You can easily get all the possible words using Array#permutation. Then you can scan for them in provided string. Here:
possible_words = %w[d o o r].permutation.map &:join
# => ["door", "doro", "door", "doro", "droo", "droo", "odor", "odro", "oodr", "oord", "ordo", "orod", "odor", "odro", "oodr", "oord", "ordo", "orod", "rdoo", "rdoo", "rodo", "rood", "rodo", "rood"]
string = "door boor robo omanyte"
string.scan(possible_words.join("|"))
# => ["door"]
string = "door close rood example ordo"
string.scan(possible_words.join("|"))
# => ["door", "rood", "ordo"]
UPDATE
You can improve scan further by looking for word boundary. Here:
string = "doorrood example ordo"
string.scan(/"\b#{possible_words.join('\b|\b')}\b"/)
# => ["ordo"]
NOTE
As Cary correctly pointed out in comments below, this process is quite inefficient if you intend to find permutation for a fairly large string. However it should work fine for OP's example.
If the comment I left on your question correctly interprets the question, you could do this:
str = "door sit its odor to"
str.split
.group_by { |w| w.chars.sort.join }
.values
.select { |a| a.size > 1 }
#=> [["door", "odor"], ["sit", "its"]]
This assumes all the letters are the same case.
If case is not important, just make a small change:
str = "dooR sIt itS Odor to"
str.split
.group_by { |w| w.downcase.chars.sort.join }
.values
.select { |a| a.size > 1 }
#=> [["dooR", "Odor"], ["sIt", "itS"]]
In my opinion the fastest way to find this will be
word_a.chars.sort == word_b.chars.sort
since we are using the same characters inside the words
IMO, some kind of iteration is definitely necessary to build a regular expression to match this one. Not using a regular expression is better too.
def variations_of_substr(str, sub)
# Creates regexp to match words with same length, and
# with same characters of str.
patt = "\\b" + ( [ "[#{sub}]{1}" ] * sub.size ).join + "\\b"
# Above alone won't be enough, characters in both words should
# match exactly.
str.scan( Regexp.new(patt) ).select do |m|
m.chars.sort == sub.chars.sort
end
end
variations_of_substr("door boor robo omanyte", "door")
# => ["door"]

Get last character in string

I want to get the last character in a string MY WAY - 1) Get last index 2) Get character at last index, as a STRING. After that I will compare the string with another, but I won't include that part of code here. I tried the code below and I get a strange number instead. I am using ruby 1.8.7.
Why is this happening and how do I do it ?
line = "abc;"
last_index = line.length-1
puts "last index = #{last_index}"
last_char = line[last_index]
puts last_char
Output-
last index = 3
59
Ruby docs told me that array slicing works this way -
a = "hello there"
a[1] #=> "e"
But, in my code it does not.
UPDATE:
I keep getting constant up votes on this, hence the edit. Using [-1, 1] is correct, however a better looking solution would be using just [-1]. Check Oleg Pischicov's answer.
line[-1]
# => "c"
Original Answer
In ruby you can use [-1, 1] to get last char of a string. Here:
line = "abc;"
# => "abc;"
line[-1, 1]
# => ";"
teststr = "some text"
# => "some text"
teststr[-1, 1]
# => "t"
Explanation:
Strings can take a negative index, which count backwards from the end
of the String, and an length of how many characters you want (one in
this example).
Using String#slice as in OP's example: (will work only on ruby 1.9 onwards as explained in Yu Hau's answer)
line.slice(line.length - 1)
# => ";"
teststr.slice(teststr.length - 1)
# => "t"
Let's go nuts!!!
teststr.split('').last
# => "t"
teststr.split(//)[-1]
# => "t"
teststr.chars.last
# => "t"
teststr.scan(/.$/)[0]
# => "t"
teststr[/.$/]
# => "t"
teststr[teststr.length-1]
# => "t"
Just use "-1" index:
a = "hello there"
a[-1] #=> "e"
It's the simplest solution.
If you are using Rails, then apply the method #last to your string, like this:
"abc".last
# => c
You can use a[-1, 1] to get the last character.
You get unexpected result because the return value of String#[] changed. You are using Ruby 1.8.7 while referring the the document of Ruby 2.0
Prior to Ruby 1.9, it returns an integer character code. Since Ruby 1.9, it returns the character itself.
String#[] in Ruby 1.8.7:
str[fixnum] => fixnum or nil
String#[] in Ruby 2.0:
str[index] → new_str or nil
In ruby you can use something like this:
ending = str[-n..-1] || str
this return last n characters
Using Rails library, I would call the method #last as the string is an array. Mostly because it's more verbose..
To get the last character.
"hello there".last() #=> "e"
To get the last 3 characters you can pass a number to #last.
"hello there".last(3) #=> "ere"
Slice() method will do for you.
For Ex
"hello".slice(-1)
# => "o"
Thanks
Your code kinda works, the 'strange number' you are seeing is ; ASCII code. Every characters has a corresponding ascii code ( https://www.asciitable.com/). You can use for conversationputs last_char.chr, it should output ;.

Splitting string based on word

I have a string composed by words divided by'#'. For instance 'this#is#an#example' and I need to extract the last word or the last two words according to the second to last word.
If the second to last is 'myword' I need the last two words otherwise just the last one.
'this#is#an#example' => 'example'
'this#is#an#example#using#myword#also' => 'myword#also'
Is there a better way than splitting and checking the second to last? perhaps using regular expression?
Thanks.
You can use the end-of-line anchor $ and make the myword# prefix optional:
str = 'this#is#an#example'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "example"
str = 'this#is#an#example#using#myword#also'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "myword#also"
However, I don't think using a regular expression is "better" in this case. I would use something like Santosh's (deleted) answer: split the line by # and use an if clause.
def foo(str)
*, a, b = str.split('#')
if a == 'myword'
"#{a}##{b}"
else
b
end
end
str = 'this#is#an#example#using#myword#also'
array = str.split('#')
array[-2] == 'myword' ? array[-2..-1].join('#') : array[-1]
With regex:
'this#is#an#example'[/(myword\#)*\w+$/]
# => "example"
'this#is#an#example#using#myword#also'[/(myword\#)*\w+$/]
# => "myword#also"

Resources