Match returns another output - ruby

After entering the input value "https://youtu.be/KMBBjzp5hdc", the code returns the output value "https://youtu.be/":
str = gets.chomp.to_s
puts /.*[=\/]/.match(str)
I do not understand why, as I would expect https:/
Thanks for any advice!

[...] the code returns output value "https://youtu.be/" [...] I do not understand why as i would expect https:/
Your regexp /.*[=\/]/ matches:
.* zero or more characters
[=\/] followed by a = or / character
In your example string, there are 3 candidates that end with a / character: (and none ending with =)
https:/
https://
https://youtu.be/
Repetition like * is greedy by default, i.e. it matches as many characters as it can. From the 3 options above, it matches the longest one which is https://youtu.be/.
You can append a ? to make the repetition lazy, which results in the shortest match:
"https://youtu.be/KMBBjzp5hdc".match(/.*?[=\/]/)
#=> #<MatchData "https:/">
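As a quick side-by-side check of the greedy and lazy behaviour described above, here is a minimal sketch that runs both patterns against the same URL. The variable names are only illustrative.

```ruby
url = "https://youtu.be/KMBBjzp5hdc"

# Greedy: .* grabs as much as it can, then backtracks to the last "/".
greedy = url[/.*[=\/]/]

# Lazy: .*? grabs as little as it can, so it stops at the first "/".
lazy = url[/.*?[=\/]/]

puts greedy  # https://youtu.be/
puts lazy    # https:/
```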

str = "https://youtu.be/KMBBjzp5hdc"
matches = str.match(/(https:\/\/youtu\.be\/)(.+)/)
matches[1]
Outputs:
"https://youtu.be/"

Related

How to split a string in half, into two variables, in one statement?

I want to split str in half and assign each half to first and second
Like this pseudo code example:
first,second = str.split( middle )
class String
def halves
chars.each_slice(size / 2).map(&:join)
end
end
Will work, but you will need to adjust to how you want to handle odd-sized strings.
Or in-line:
first, second = str.chars.each_slice(str.length / 2).map(&:join)
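On the odd-length caveat mentioned above: with an odd-sized string, slicing by size / 2 produces three pieces (for example "ab", "cd", "e" for "abcde"), and the destructuring assignment silently drops the third. One possible adjustment, sketched here under the assumption that the first half should receive the extra character, is to round the slice size up. The helper name halves_rounding_up is hypothetical.

```ruby
# Hypothetical helper: the first half receives the extra character.
def halves_rounding_up(str)
  return ["", ""] if str.empty?        # each_slice(0) would raise ArgumentError
  slice_size = (str.size + 1) / 2      # ceiling of size / 2
  first, second = str.chars.each_slice(slice_size).map(&:join)
  [first, second.to_s]                 # second is nil for a 1-character string
end

p halves_rounding_up("abcdef")  # ["abc", "def"]
p halves_rounding_up("abcde")   # ["abc", "de"]
```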
first,second = str.partition(/.{#{str.size/2}}/)[1,2]
Explanation
You can use partition. Using a regex pattern to look for X amount of characters (in this case str.size / 2).
Partition returns three elements; head, match, and tail. Because we are matching on any character, the head will always be a blank string. So we only care about the match and tail hence [1,2]
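To make the three return values concrete, here is a small sketch of what partition hands back before the [1,2] slice is applied. It assumes the string contains no newlines, since . does not match a newline by default.

```ruby
str = "abcdef"
half = str.size / 2

# partition returns [text before the match, the match, text after the match].
head, match, tail = str.partition(/.{#{half}}/)

p [head, match, tail]  # ["", "abc", "def"]
```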
Here are two ways to do that
def halves(str)
  # Build the regex inside the method so that str is in scope for the interpolation.
  rgx = /
    (?<=               # begin a positive lookbehind
    \A                 # match the beginning of the string
    .{#{str.size/2}}   # match any character #{str.size/2} times
    )                  # end positive lookbehind
  /x                   # invoke free-spacing regex definition mode
  str.split(rgx)
end
first, second = halves('abcdef')
#=> ["abc", "def"]
first, second = halves('abcde')
#=> ["ab", "cde"]
The regular expression is conventionally written
/(?<=\A.{#{str.size/2}})/
Note that the regular expression matches a location between two successive characters.
def halves(str)
[str[0, str.size/2], str[str.size/2..-1]]
end
first, second = halves('abcdef')
#=> ["abc", "def"]
first, second = halves('abcde')
#=> ["ab", "cde"]
Note: This only works with even length strings.
Along the line of your pseudocode,
first, second = string[0...string.length/2], string[string.length/2...string.length]
If string is the original string.

Regex to grab full firstname and first letter of last name

I have a list of users grabbed by the Etc Ruby library:
Thomas_J_Perkins
Jennifer_Scanner
Amanda_K_Loso
Aaron_Cole
Mark_L_Lamb
What I need to do is grab the full first name, skip the middle name (if given), and grab the first character of the last name. The output should look like this:
Thomas P
Jennifer S
Amanda L
Aaron C
Mark L
I'm not sure how to do this, I've tried grabbing all of the characters: /\w+/ but that will grab everything.
You don't always need regular expressions.
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. Jamie Zawinski
You can do it with some simple Ruby code
string = "Mark_L_Lamb"
string.split('_').first + ' ' + string.split('_').last[0]
=> "Mark L"
I think it's simpler without regex:
array = "Thomas_J_Perkins".split("_") # split at _
array.first + " " + array.last[0] # .first prints first name .last[0] prints first char of last name
#=> "Thomas P"
You can use
^([^\W_]+)(?:_[^\W_]+)*_([^\W_])[^\W_]*$
And replace with \1 \2 (a space between the groups, as in the requested output). See the regex demo
The [^\W_] matches a letter or a digit. If you want to only match letters, replace [^\W_] with \p{L}.
^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$
The point is to match and capture the first chunk of letters up to the first _ (with (\p{L}+)), then match 0+ sequences of _ + letters inside (with (?:_\p{L}+)*_) and then match and capture the last word first letter (with (\p{L})) and then match the rest of the string (with \p{L}*).
NOTE: replace ^ with \A and $ with \z if you have independent strings (as in Ruby ^ matches the start of a line and $ matches the end of the line).
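The difference between the line anchors and the string anchors can be seen with a two-line string. This is only an illustrative sketch; the sample text is made up from two of the names in the question.

```ruby
text = "Aaron_Cole\nMark_L_Lamb"

# ^ and $ match at line boundaries, so the second line matches on its own.
line_anchored = text.match?(/^Mark_L_Lamb$/)

# \A and \z match only at the very start and end of the whole string.
string_anchored = text.match?(/\AMark_L_Lamb\z/)

p line_anchored    # true
p string_anchored  # false
```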
Ruby code:
s.sub(/^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$/, "\\1 \\2")
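Applied to the whole list from the question, the substitution could look roughly like this. The sketch assumes each name is an independent string (hence \A and \z) and uses a space in the replacement to match the output the question asks for; the names array and pattern variable are introduced here for illustration.

```ruby
names = %w[Thomas_J_Perkins Jennifer_Scanner Amanda_K_Loso Aaron_Cole Mark_L_Lamb]

# Group 1: first name. Group 2: first letter of the last name.
# Any middle parts are consumed by the non-capturing (?:_\p{L}+)* group.
pattern = /\A(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*\z/

short_names = names.map { |name| name.sub(pattern, '\1 \2') }
puts short_names
```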
I'm in the don't-use-a-regex-for-this camp.
str1 = "Alexander_Graham_Bell"
str2 = "Sylvester_Grisby"
"#{str1[0...str1.index('_')]} #{str1[str1.rindex('_')+1]}"
#=> "Alexander B"
"#{str2[0...str2.index('_')]} #{str2[str2.rindex('_')+1]}"
#=> "Sylvester G"
or
first, last = str1.split(/_.+_|_/)
#=> ["Alexander", "Bell"]
first+' '+last[0]
#=> "Alexander B"
first, last = str2.split(/_.+_|_/)
#=> ["Sylvester", "Grisby"]
first+' '+last[0]
#=> "Sylvester G"
but if you insist...
r = /
(.+?) # match any characters non-greedily in capture group 1
(?=_) # match an underscore in a positive lookahead
(?:.*) # match any characters greedily in a non-capture group
(?:_) # match an underscore in a non-capture group
(.) # match any character in capture group 2
/x # free-spacing regex definition mode
str1 =~ r
$1+' '+$2
#=> "Alexander B"
str2 =~ r
$1+' '+$2
#=> "Sylvester G"
You can of course write
r = /(.+?)(?=_)(?:.*)(?:_)(.)/
This is my attempt:
/([a-zA-Z]+)_([a-zA-Z]+_)?([a-zA-Z])/
Let's see if this works:
/^([^_]+)(?:_\w)?_(\w)/
And then you'll have to combine the first and second matches into the format you want. I don't know Ruby, so I can't help you there.
And another attempt using a replacement method:
result = subject.gsub(/^([^_]+)(?:_[^_])?_([^_])[^_]+$/, '\1 \2')
We match the entire string, with the relevant parts in capturing groups, and then return just the two captured groups.
Using the split method is much better:
full_names.map do |full_name|
parts = full_name.split('_').values_at(0,-1)
parts.last.slice!(1..-1)
parts.join(' ')
end
/^[A-Za-z]{5,15}\s[A-Za-z]{1}$/i
This will have the following criteria:
5-15 characters for first name then a whitespace and finally a single character for last name.

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now; it prints "no duplicates" 26 times. I think my if statement is written incorrectly.
string = "abcade"
puts string
for i in ('a'..'z')
if string =~ /(.)\1/
puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
else
puts "no duplicates"
end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.
The following returns the index of the first pair of adjacent duplicate characters:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end
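Note that /(.)\1/ only finds a character that is immediately repeated, which is why a string such as "abcade" from the question reports no match even though "a" occurs twice. A minimal sketch that wraps the check in a method and avoids the $1 global follows; the method name is hypothetical.

```ruby
# Hypothetical helper: returns the first character that is immediately
# repeated, or nil when no two adjacent characters are equal.
def first_adjacent_duplicate(str)
  match = str.match(/(.)\1/)
  match && match[1]
end

p first_adjacent_duplicate("1234556")  # "5"
p first_adjacent_duplicate("abcade")   # nil
```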
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland :
s.each_char.find { |c| s.count(c) > 1 }
The method below might be useful for finding a repeated word in a string; it returns the first word that has the highest count:
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by @DaveNewton:
require 'set'
def first_repeat_char(str)
str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero or more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
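The two interpretations can give different answers for the same input, which a regex-free sketch makes easy to see. Both method names below are hypothetical, and the second one uses String#index rather than the regex shown above.

```ruby
require 'set'

# Interpretation 1: the first character that has already been seen earlier.
def first_char_seen_again(str)
  seen = Set.new
  str.each_char.find { |c| !seen.add?(c) }  # add? returns nil if c was present
end

# Interpretation 2: the earliest character that occurs again anywhere later.
def first_char_with_repeat(str)
  str.each_char.with_index { |c, i| return c if str.index(c, i + 1) }
  nil
end

p first_char_seen_again("abcdcbe")   # "c"
p first_char_with_repeat("abcdcbe")  # "b"
```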
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
I'll use positive lookahead with String#[] method :
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"

Use regular expression to fetch 3 groups from string

This is my expected result.
Input a string and get three strings returned.
I have no idea how to do this with a regex in Ruby.
This is my rough idea.
match(/(.*?)(_)(.*?)(\d+)/)
Input and expected output
# "R224_OO2003" => R224, OO, 2003
# "R2241_OOP2003" => R2241, OOP, 2003
If the example description I gave in my comment on the question is correct, you need a very straightforward regex:
r = /(.+)_(.+)(\d{4})/
Then:
"R224_OO2003".scan(r).flatten #=> ["R224", "OO", "2003"]
"R2241_OOP2003".scan(r).flatten #=> ["R2241", "OOP", "2003"]
Assuming that your three parts consist of (R and one or more digits), then an underbar, then (one or more non-whitespace characters), before finally (a 4-digit numeric date), then your regex could be something like this:
^(R\d+)_(\S+)(\d{4})$
The ^ indicates start of string, and the $ indicates end of string. \d+ indicates one or more digits, while \S+ says one or more non-whitespace characters. The \d{4} says exactly four digits.
To recover data from the matches, you could either use the pre-defined globals that line up with your groups, or you could use named captures.
To use the match globals just use $1, $2, and $3. In general, you can figure out the number to use by counting the left parentheses of the specific group.
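A short sketch of the globals approach, using the pattern given above; the parts variable is introduced only for illustration.

```ruby
if "R2241_OOP2003" =~ /^(R\d+)_(\S+)(\d{4})$/
  # Groups are numbered by the position of their opening parenthesis.
  parts = [$1, $2, $3]
end

p parts  # ["R2241", "OOP", "2003"]
```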
To use the named captures, include ?<name> right after the left paren of a particular group. For example:
x = "R2241_OOP2003"
match_data = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/.match(x)
puts match_data['first'], match_data['second'], match_data['third']
yields
R2241
OOP
2003
as expected.
As long as your pattern covers all possibilities, then you just need to use the match object to return the 3 strings:
my_match = "R224_OO2003".match(/(.*?)(_)(.*?)(\d+)/)
#=> #<MatchData "R224_OO2003" 1:"R224" 2:"_" 3:"OO" 4:"2003">
puts my_match[0] #=> "R224_OO2003"
puts my_match[1] #=> "R224"
puts my_match[2] #=> "_"
puts my_match[3] #=> "OO"
puts my_match[4] #=> "2003"
A MatchData object holds the capture groups starting at index [1]. As you can see, index [0] returns the entire match. If you don't want to capture the "_", you can leave its parentheses out.
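With the parentheses around the underscore dropped, MatchData#captures returns exactly the three wanted strings. A minimal sketch using the two inputs from the question:

```ruby
pattern = /(.*?)_(.*?)(\d+)/

first_result = "R224_OO2003".match(pattern).captures
second_result = "R2241_OOP2003".match(pattern).captures

p first_result   # ["R224", "OO", "2003"]
p second_result  # ["R2241", "OOP", "2003"]
```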
Also, I'm not sure you are getting what you want with the part:
(.*?)
This says zero or more of any character, matched lazily (as few characters as possible). The trailing ? makes the * non-greedy; it does not mean "zero or one" here.

regex scan only returning first value

I have two strings that should both return matches according to the regex, but only str1 returns the expected match. str1 is an exact match for the regex (created by Avinash Raj) below. str2 contains str1 and more data. I expected str2 to return str1 and more values that matched, but it returns nothing. Can someone explain why?
str1="3,15,14,31,40,5,5,4,5,3,4,4,5,2,2,2,1,2,1,1,3,3,3,2,4,3,false,false,false,false,false,true,false,true,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,3,3,3,2,3"
str2="3,15,14,31,40,5,5,4,5,3,4,4,5,2,2,2,1,2,1,1,3,3,3,2,4,3,false,false,false,false,false,true,false,true,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,3,3,3,2,3,3,15,14,35,27,4,5,3,5,3,2,4,4,2,1,1,2,2,2,1,3,3,3,2,5,9,true,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,true,true,false,false,false,false,2,2,3,2,3,3,15,16,34,53,4,4,4,3,1,3,4,3,1,1,1,1,1,1,1,2,3,2,3,5,1,true,false,false,false,false,false,true,false,false,false,false,false,false,false,true,true,false,false,false,false,false,false,false,false,false,false,false,3,2,3,2,3,3,15,18,37,29,4,4,4,3,2,3,3,4,1,1,1,1,1,1,1,1,3,1,2,4,1,true,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,3,2,3,2,3,3,15,20,34,37,4,4,4,3,1,3,3,4,1,1,1,1,1,1,1,1,1,1,2,4,1,false,false,false,true,false,false,false,false,false,false,false,false,false,true,false,true,false,false,false,false,false,false,false,false,true,false,false,3,1,3,1,3,3,16,10,18,30,4,3,3,3,1,3,3,3,1,1,1,1,1,1,1,1,2,1,4,4,3,false,false,false,false,false,true,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,3,2,3,2,3,3,16,12,39,5,5,5,4,5,3,5,5,5,1,1,1,1,1,1,1,2,1,1,1,5,10,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,3,2,3,2,3,3,16,14,18,27,4,4,4,4,2,3,3,4,1,1,1,1,1,1,1,1,1,1,2,5,1,true,false,false,false,false,false,false,false,false,false,false,false,false,true,false,true,false,false,false,false,true,false,false,false,false,false,false,3,2,3,2,3,3,16,16,18,32,5,5,5,5,4,5,5,5,1,1,1,1,1,1,1,2,1,1,1,5,3,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true
,false,true,false,3,2,3,2,3,3,16,18,20,7,5,5,5,5,3,3,3,4,1,1,1,1,1,1,1,1,1,1,2,5,1,false,false,false,true,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,true,false,false,false,false,false,false,3,2,3,2,3,3,16,20,18,59,4,4,4,3,1,1,1,2,1,1,1,1,1,1,1,1,2,2,4,5,9,false,false,false,true,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,3,2,3,2,3,3,17,10,16,9,3,3,3,3,1,2,3,3,1,1,1,1,1,1,1,1,2,1,3,5,1,true,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,3,2,3,2,3,3,17,12,16,17,4,3,4,2,1,4,3,2,1,1,1,1,1,1,1,1,1,1,4,5,3,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,3,2,3,2,3,3,17,14,16,21,4,4,4,4,1,3,4,4,1,1,1,1,1,1,1,1,1,1,2,5,1,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,true,false,false,3,2,3,2,3,3,17,16,16,20,5,5,4,5,3,4,4,5,1,1,1,1,1,1,1,1,1,1,1,5,8,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,true,false,false,false,3,2,3,2,3,3,17,18,16,31,4,4,4,4,1,4,3,3,1,1,1,1,1,1,1,1,1,1,3,5,1,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,true,false,true,false,false,false,false,3,2,3,2,3,3,17,20,18,8,5,5,4,5,4,4,4,5,1,1,1,1,1,1,1,2,1,1,1,5,1,false,false,false,true,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,true,false,false,false,3,2,3,2,3,3,18,10,31,33,3,2,3,2,2,2,2,3,1,1,1,1,1,1,1,1,1,1,1,5,7,true,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,f
alse,false,false,false,false,false,false,3,2,3,2,3,3,18,12,36,11,4,4,4,5,3,4,3,3,1,1,2,1,2,1,2,2,1,1,1,5,1,false,false,false,true,false,false,true,false,false,false,false,false,false,true,false,true,false,false,false,false,false,false,false,false,true,false,false,3,2,3,2,3,3,18,14,49,6,3,3,2,2,1,2,2,2,2,1,1,1,2,1,2,3,3,4,4,5,9,true,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,3,2,3,2,3,3,18,16,32,53,3,4,4,3,3,3,3,3,1,1,1,1,1,1,2,2,1,1,3,5,7,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,true,false,false,false,false,3,2,3,2,3,3,18,18,37,59,5,4,4,4,4,4,4,4,1,1,1,1,1,1,1,2,1,1,2,5,7,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,true,false,false,false,false,3,2,3,2,3,3,19,10,5,25,4,4,4,2,2,4,3,3,1,1,1,1,1,1,1,1,2,2,2,5,1,true,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,2,1,3,2,3,3,19,13,0,5,5,5,4,5,3,3,5,5,1,1,1,1,1,1,1,1,1,1,3,5,7,false,false,true,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,true,false,false,false,false,false,false,3,2,3,2,3,3,19,14,5,23,4,4,4,4,3,4,3,3,1,1,1,1,1,1,1,1,1,2,2,5,9,false,false,true,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,3,2,3,2,3,3,19,16,7,19,5,4,4,4,3,4,3,3,1,1,1,1,1,1,1,2,2,2,3,5,9,false,false,true,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,3,2,3,2,3,3,19,18,6,30,4,4,4,4,3,4,4,4,1,1,1,1,1,1,1,1,1,1,1,5,8,false,false,true,true,false,false,false,false,false,false,false,false,false,false,false,false,false,
false,true,false,true,false,false,false,true,false,false,3,2,3,2,3,3,19,20,8,25,4,4,5,4,3,4,3,4,1,1,1,1,1,1,1,1,1,1,3,5,1,false,false,true,false,false,false,true,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,true,false,false,3,2,3,2,3,3,19,21,18,2,4,4,4,3,3,4,3,4,1,1,1,1,1,1,1,1,1,1,1,5,1,false,false,true,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,true,false,true,false,false,false,true,false,false,3,2,3,2,3,"
str1.scan(/^,?(?:[1-5]\d|[1-9])(?:,(?:[1-5]\d|[1-9])){4}(?:,[1-5]){21}(?:,(?:true|false)){27}(?:,[1-5]){5}$/).each{|x|
puts x
puts "---1---"
}
str2.scan(/^,?(?:[1-5]\d|[1-9])(?:,(?:[1-5]\d|[1-9])){4}(?:,[1-5]){21}(?:,(?:true|false)){27}(?:,[1-5]){5}$/).each{|x|
puts x
puts "---2---"
}
Kind of by definition, you can't have more than one pattern match in a string when your pattern specifically says "start of string, then [stuff], then end of string". Look at regexp anchors ^ and $.
A simpler example might make it clearer: ^a$ "start of string, then letter a, then end of string" will match in "a" once, but will match in "aaa" zero times, even though there are three letters a.
$ asserts the position at the end of a line.
With the $ removed, you no longer have to match up to the end of the line:
^,?(?:[1-5]\d|[1-9])(?:,(?:[1-5]\d|[1-9])){4}(?:,[1-5]){21}(?:,(?:true|false)){27}(?:,[1-5]){5}
Just remove the $ from the end. See demo.
https://regex101.com/r/sJ9gM7/22
Because your regular expression starts with the ^ metacharacter and ends with the $ metacharacter, it expects the full string to match.
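The effect of the anchors on scan can be reproduced with a much smaller record format. The sketch below assumes a made-up record of two numbers followed by one boolean, not the 58-field record from the question. It suggests that a leading ^ on its own still limits scan to a single match when the data sits on one line.

```ruby
data = "1,2,true,3,4,false"
record = /\d+,\d+,(?:true|false)/   # hypothetical, scaled-down record

both_anchors = data.scan(/^#{record}$/)  # the whole line must be one record
start_only   = data.scan(/^#{record}/)   # only a record at the line start
unanchored   = data.scan(record)         # every record in the string

p both_anchors  # []
p start_only    # ["1,2,true"]
p unanchored    # ["1,2,true", "3,4,false"]
```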

Resources