Regular expression to match exact substring at the beginning of string - ruby

I have one substring to be matched exactly at the beginning of source string.
source_string = "This is mat. This is cat."
substring1 = "This is"
substring2 = "That is"
source_string.match(/^(#{substring1}|#{substring2})$/)
This is what I tried. It should work like this: if exactly 'This is' or 'That is' is at the beginning of the string, it should match, no matter what comes after those substrings in source_string. My code returns nil even though 'This is' is present.

I would not use a regex:
[substring1, substring2].any? { |sub| source_string.start_with?(sub) }
#=> true
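String#start_with? also accepts several prefixes in one call, so the array and block are optional. A minimal sketch using the question's variables (matched and not_matched are names made up for this illustration):

```ruby
source_string = "This is mat. This is cat."
substring1 = "This is"
substring2 = "That is"

# start_with? returns true if the string begins with any of the given prefixes.
matched = source_string.start_with?(substring1, substring2)
not_matched = "Neither of them".start_with?(substring1, substring2)
```

Here matched is true and not_matched is false.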

Remove $ at the end of the regular expression pattern.
source_string.match(/^(#{substring1}|#{substring2})$/)
↑
With the trailing $, the line must end right after This is or That is, so nothing may follow the substring. You only need ^ at the beginning.
source_string = "This is mat. This is cat."
substring1 = "This is"
substring2 = "That is"
source_string.match(/^(#{substring1}|#{substring2})/)
# => #<MatchData "This is" 1:"This is">

While #falsetru is right about the core problem, the regexp is still not quite right. Since the goal is to match the pattern at the beginning of the source string, not at the beginning of each line, the \A anchor should be used (see Regexp::Anchors for details):
source_string = <<-STR
Not to be matched.
This is cat.
STR
source_string.match(/^This is/) # this should not be matched!
#⇒ #<MatchData "This is">
source_string.match(/\AThis is/)
#⇒ nil
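If the substrings may contain regex metacharacters, interpolating them directly can change the meaning of the pattern. One possible sketch, assuming the prefixes are collected in an array (prefixes, pattern, first and second are illustrative names), combines \A with Regexp.union, which escapes each string:

```ruby
source_string = "This is mat. This is cat."
prefixes = ["This is", "That is"]

# Regexp.union escapes each string, so characters such as "." or "$"
# inside a prefix are matched literally.
pattern = /\A#{Regexp.union(prefixes)}/

first  = source_string.match(pattern)               # MatchData for "This is"
second = "Not this.\nThis is cat.".match(pattern)   # nil: the prefix is only on line 2
```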

Related

How to get a substring from the beginning till the end of a word that starts with a given substring - Ruby

In Ruby, I'm trying to get a substring from the beginning till the end of a word that begins with some string.
For example:
a = "Metrics testSomeMetrics gets initial metrics data"
I also have a string that is a substring of a.
For example:
b = "test".
"test" appears in the second word of the string a.
I need to return a substring from the beginning of a till the end of the word with test
in it.
In this example I need to return: "Metrics testSomeMetrics"
Use Regexp:
a = 'Metrics testSomeMetrics gets initial metrics data'
b = 'test'
a.match(/^.*#{b}\w*/).to_s
Where:
^ — start of the string.
.* — zero or more of any single character.
#{b} — your string.
\w* — zero or more of any word character.
UPDATE
Add \b to get /^.*\b#{b}\w*/ so that b matches only at the start of a word.
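To illustrate what the \b changes, here is a small sketch. The string "Metrics contestEntry gets data" is a made-up input in which "test" occurs only in the middle of a word, and Regexp.escape is added as a precaution in case b contains metacharacters:

```ruby
b = 'test'
word = Regexp.escape(b)

# Without \b, "test" inside "contestEntry" is accepted.
without_boundary = "Metrics contestEntry gets data"[/^.*#{word}\w*/]

# With \b, "test" must begin a word, so this input yields nil.
with_boundary = "Metrics contestEntry gets data"[/^.*\b#{word}\w*/]

# The question's string still matches through the end of "testSomeMetrics".
original = "Metrics testSomeMetrics gets initial metrics data"[/^.*\b#{word}\w*/]
```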
I have not been able to find a regular expression that works here. One could do the following, however:
def get_juicy_bit(str, word)
str.match(/\b#{word}\S+/) { |md| str[0...md.end(0)] }
end
word = "test"
get_juicy_bit("Metrics testSomeMetrics gets data", word)
#=> "Metrics testSomeMetrics"
get_juicy_bit("Metrics donottestMetrics gets data", word)
#=> nil
get_juicy_bit("testMetrics gets data", word)
#=> "testMetrics"
get_juicy_bit(" testMetrics gets data", word)
#=> " testMetrics"
See MatchData#end. The regular expression /\b#{word}\S+/ reads, "match a word break (\b) followed by the value of the variable word followed by one or more characters other than whitespace". Here a word break is a character other than a word character (a letter, digit or underscore) or the beginning of the string.
One way to go is:
First, split your string into an array of words. Then find the index of the first word that includes the pattern. Finally, build a substring of the original from the words at index 0 up to the found index.
a = "Metrics testSomeMetrics gets initial metrics data"
b = "test"
words = a.split(" ")
index = words.find_index { |word| word.include?(b) }
return "" unless index
words.slice(0..index).join(" ")
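Because the snippet uses return, it is meant to live inside a method. A minimal sketch of such a wrapper follows; the name prefix_through_word is hypothetical. Note that this approach matches the fragment anywhere inside a word, and that split/join collapses runs of whitespace into single spaces:

```ruby
def prefix_through_word(text, fragment)
  words = text.split(" ")
  # Index of the first word containing the fragment, or nil if there is none.
  index = words.find_index { |word| word.include?(fragment) }
  return "" unless index

  words[0..index].join(" ")
end

result = prefix_through_word("Metrics testSomeMetrics gets initial metrics data", "test")
```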

Delete all the whitespaces that occur after a word in ruby

I have a string " hello world! How is it going?"
The output I need is " helloworld!Howisitgoing?"
So all the whitespaces after hello should be removed. I am trying to do this in ruby using regex.
I tried strip and delete(' ') methods but I didn't get what I wanted.
some_string = " hello world! How is it going?"
some_string.delete(' ') #deletes all spaces
some_string.strip #removes trailing and leading spaces only
Please help. Thanks in advance!
There are numerous ways this could be accomplished without a regular expression, but using one could be the "cleanest" looking approach, without taking substrings, etc. The regular expression I believe you are looking for is /(?!^)(\s)/.
" hello world! How is it going?".gsub(/(?!^)(\s)/, '')
#=> " helloworld!Howisitgoing?"
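The \s matches any whitespace character (including tabs, etc.), and the ^ is an "anchor" meaning the beginning of a line (which here is also the beginning of the string). The (?! ... ) is a negative lookahead: it rejects a match at any position where the enclosed pattern would match. Used together, they match every whitespace character except one at the very beginning.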
If you are not familiar with gsub, it is very similar to replace, but takes a regular expression. It additionally has a gsub! counter-part to mutate the string in place without creating a new altered copy.
Note that strictly speaking, this isn't all whitespace "after a word" to quote the exact question, but I gathered from your examples that your intentions were "all whitespace except beginning of string", which this will do.
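One caveat: in Ruby, ^ matches at the start of every line, so on multi-line input the pattern also preserves whitespace that begins a later line. A small sketch of the difference, using a made-up two-line string and \A as the start-of-string anchor:

```ruby
text = " a b\n c d"

# ^ anchors at every line start: the space that begins line 2 is kept.
per_line = text.gsub(/(?!^)\s/, '')

# \A anchors only at the start of the string: only the first space is kept.
per_string = text.gsub(/(?!\A)\s/, '')
```

For single-line input such as the string in the question, both patterns behave the same.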
def remove_spaces_after_word(str, word)
i = str.index(/\b#{word}\b/i)
return str if i.nil?
i += word.size
str.gsub(/ /) { Regexp.last_match.begin(0) >= i ? '' : ' ' }
end
remove_spaces_after_word("Hey hello world! How is it going?", "hello")
#=> "Hey helloworld!Howisitgoing?"

Extract all words with # symbol from a string

I need to extract all #usernames from a string(for twitter) using rails/ruby:
String Examples:
"#tom #john how are you?"
"how are you #john?"
"#tom hi"
The function should extract all usernames from a string, leaving out special characters that are not allowed in usernames (such as the "?" in the second example).
From "Why can't I register certain usernames?":
A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. Check to make sure your desired username doesn't contain any symbols, dashes, or spaces.
The \w metacharacter is equivalent to [a-zA-Z0-9_]:
/\w/ - A word character ([a-zA-Z0-9_])
Simply scanning for #\w+ will succeed according to that:
strings = [
"#tom #john how are you?",
"how are you #john?",
"#tom hi",
"#foo #_foo #foo_ #foo_bar #f123bar #f_123_bar"
]
strings.map { |s| s.scan(/#\w+/) }
# => [["#tom", "#john"],
# ["#john"],
# ["#tom"],
# ["#foo", "#_foo", "#foo_", "#foo_bar", "#f123bar", "#f_123_bar"]]
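If only the names are wanted, without the leading "#", one option is to put \w+ in a capture group. When the pattern contains a group, scan returns an array of captures per match, hence the flatten. A brief sketch (names is an illustrative variable):

```ruby
# Each match contributes a one-element array holding the captured name.
names = "#tom #john how are you?".scan(/#(\w+)/).flatten
```

This yields ["tom", "john"].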
There are multiple ways to do it - here's one way:
string = "#tom #john how are you?"
words = string.split " "
twitter_handles = words.select do |word|
word.start_with?('#') && word[1..-1].chars.all? do |char|
char =~ /[a-zA-Z0-9_]/
end && word.length > 1
end
The char =~ regex will only accept alphanumerics and the underscore.
r = /
\# # match a literal "#" (escaped; a bare # would start a comment in free-spacing mode)
[[[:alpha:]]]+ # match one or more letters
\b # match word break
/x # free-spacing regex definition mode
"#tom #john how are you? And you, #andré?".scan(r)
#=> ["#tom", "#john", "#andré"]
If you wish to instead return
["tom", "john", "andré"]
change the first line of the regex (the one that matches the "#" character) to
(?<=\#)
which is a positive lookbehind. It requires that the character "#" be present but it will not be part of the match.

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now; it prints "no duplicates" 26 times. I think my if statement is written incorrectly.
string = "abcade"
puts string
for i in ('a'..'z')
if string =~ /(.)\1/
puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
else
puts "no duplicates"
end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.
The following returns the index of the first pair of adjacent duplicate characters:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end
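To avoid the $1 global, String#[] can take the regex together with a capture-group index. A minimal sketch; the method name first_adjacent_duplicate is made up for illustration:

```ruby
def first_adjacent_duplicate(str)
  # Returns the contents of capture group 1, or nil when nothing matches.
  str[/(.)\1/, 1]
end

found = first_adjacent_duplicate('1234556')
missing = first_adjacent_duplicate('abcade')
```

Note that /(.)\1/ only matches adjacent repeats, which is why a string like "abcade" (where the two a's are separated) produces no match.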
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland :
s.each_char.find { |c| s.count(c) > 1 }
The method below might be useful for finding a repeated word in a string (it returns the first word with the highest count):
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by #DaveNewton:
require 'set'
def first_repeat_char(str)
str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero or more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
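Calling count inside the block rescans the whole string for every character. As a sketch of an alternative, assuming Ruby 2.7 or later for Enumerable#tally (the method name first_char_appearing_twice is hypothetical), the counts can be computed once up front:

```ruby
def first_char_appearing_twice(str)
  # Build a hash of character => number of occurrences in one pass.
  counts = str.each_char.tally
  # Return the first character, in string order, that occurs more than once.
  str.each_char.find { |c| counts[c] > 1 }
end

repeated = first_char_appearing_twice("abcdcbe")
```

For "abcdcbe" this gives "b", the first character that appears more than once, rather than "c", the first character that repeats an earlier one.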
I'll use positive lookahead with String#[] method :
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"

Regex to match exact word in string

I've looked around but haven't been able to find a working solution to my problem.
I have an array of two strings input and want to test which element of the array contains an exact substring Test.
One thing I have tried (among numerous other attempts):
input = ["Test's string", "Test string"]
# Alternative input array that it needs to work on:
# ["Testing string", "some Test string"]
substring = "Test"
if (input[0].match(/\b#{substring}\b/))
puts "Test 0 "
# Do something...
elsif (input[1].match(/\b#{substring}\b/))
puts "Test 1"
# Do something different...
end
The desired result is a print of "Test 1". The input can be more complex but overall I am looking for a way to find an exact match of a substring in a longer string.
I feel like this should be a rather trivial regex but I haven't been able to come up with the correct pattern. Any help would be greatly appreciated!
Following code may be what you are looking for.
input = ["Testing string", "Test string"]
substring = "Test"
if input[0].match(/(?:^|\s)#{substring}(?:\s|$)/)
puts "Test 0 "
elsif input[1].match(/(?:^|\s)#{substring}(?:\s|$)/)
puts "Test 1"
end
The meaning of the pattern /(?:^|\s)#{substring}(?:\s|$)/ is
(?:^|\s) : the left side of the substring is the beginning of a line (^) or whitespace,
#{substring} : the substring is matched exactly,
(?:\s|$) : the right side of the substring is whitespace or the end of a line ($).
One way to do that is as follows:
input = ["Testing string", "Test"]
"Test #{ input.index { |s| s[/\bTest\b/] } }"
#=> "Test 1"
input = ["Test", "Testing string"]
"Test #{ input.index { |s| s[/\bTest\b/] } }"
#=> "Test 0"
\b in the regex denotes a word boundary.
Maybe you want a method to return the index of the first element of input that contains the word? That could be:
def matching_index(input, word)
input.index { |s| s[/\b#{word}\b/i] }
end
input = ["Testing string", "Test"]
matching_index(input, "Test") #=> 1
matching_index(input, "test") #=> 1
matching_index(input, "Testing") #=> 0
matching_index(input, "Testy") #=> nil
Then you could use it like this, for example:
word = 'Test'
puts "The matching element for '#{word}' is at index #{ matching_index(input, word) }"
#=> The matching element for 'Test' is at index 1
word = "Testing"
puts "The matching element for '#{word}' is '#{ input[matching_index(input, word)] }'"
#The matching element for 'Testing' is 'Testing string'
The problem is with your boundaries. In your original question, the word Test matches the first string because the position between Test and the apostrophe (') is a \b word boundary. It's a valid match, so the code correctly responds with "Test 0". You need to decide what should terminate the match. Also, if your substring contains regex special characters, the interpolated regex will not work as intended: /\bTest my $money.*/ will never match because of the $ in the substring.
What happens if you have multiple matches in your input array? Do you want to do something to all of them or just the first one?
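One possible way to address both points is to escape the substring with Regexp.escape and to require whitespace (or the string's edge) on both sides instead of \b. This is only a sketch under that assumption about what "exact" means; the method name contains_exact? is hypothetical, and String#match? needs Ruby 2.4 or later:

```ruby
def contains_exact?(text, substring)
  # (?<!\S) : not preceded by a non-whitespace character
  # (?!\S)  : not followed by a non-whitespace character
  pattern = /(?<!\S)#{Regexp.escape(substring)}(?!\S)/
  text.match?(pattern)
end

input = ["Test's string", "Test string"]
index = input.index { |s| contains_exact?(s, "Test") }
```

With the question's input, index is 1, because the apostrophe after Test in the first string is a non-whitespace character.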
