Replacing hyphens in words with the next letter capitalized - ruby

I have a symbol like the following. Whenever the symbol contains the "-" hyphen mark, I want to remove it and upcase the subsequent letter.
I am able to do it like so:
sym = :'new-york'
str = sym.to_s.capitalize
/-(.)/.match(str)
str = str.gsub(/-(.)/,$1.capitalize)
=> "NewYork"
This required four lines. Is there a more elegant way to create CamelCase (upper CamelCase, e.g. NewYork, NewJersey, BucksCounty) from hyphenated words in Ruby?

Here's one way:
sym.to_s.split('-').map(&:capitalize).join #=> "NewYork"

sym.to_s.gsub(/(-|\A)./) { $&[-1].upcase }
or
sym.to_s.gsub(/(-|\A)./) { |m| m[-1].upcase }

r = /
([[:alpha:]]+) # match 1 or more letters in capture group 1
- # match a hyphen
([[:alpha:]]+) # match 1 or more letters in capture group 2
/x # free-spacing regex definition mode
sym = :'new-york'
sym.to_s.sub(r) { $1.capitalize + $2.capitalize }
#=> "NewYork"
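Note that the two-group sub above handles exactly one hyphen, while the split and gsub versions handle any number. A minimal sketch wrapping the split approach in a method (the name upper_camel is hypothetical, not part of Ruby or Rails):

```ruby
# Hypothetical helper: converts a hyphenated symbol or string to upper CamelCase.
# capitalize also downcases the rest of each part, so :'new-YORK' gives "NewYork".
def upper_camel(sym)
  sym.to_s.split('-').map(&:capitalize).join
end

upper_camel(:'new-york')            #=> "NewYork"
upper_camel(:'stratford-upon-avon') #=> "StratfordUponAvon"
```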

Related

Extract all words with # symbol from a string

I need to extract all #usernames from a string(for twitter) using rails/ruby:
String Examples:
"#tom #john how are you?"
"how are you #john?"
"#tom hi"
The function should extract all usernames from a string, leaving out any special characters that are not allowed in usernames, such as the "?" in the second example.
From "Why can't I register certain usernames?":
A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. Check to make sure your desired username doesn't contain any symbols, dashes, or spaces.
The \w metacharacter is equivalent to [a-zA-Z0-9_]:
/\w/ - A word character ([a-zA-Z0-9_])
Simply scanning for #\w+ will succeed according to that:
strings = [
"#tom #john how are you?",
"how are you #john?",
"#tom hi",
"#foo #_foo #foo_ #foo_bar #f123bar #f_123_bar"
]
strings.map { |s| s.scan(/#\w+/) }
# => [["#tom", "#john"],
# ["#john"],
# ["#tom"],
# ["#foo", "#_foo", "#foo_", "#foo_bar", "#f123bar", "#f_123_bar"]]
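If the names are wanted without the leading "#", one option is a capture group. This is a small sketch using the first three example strings; the variable name names is only illustrative:

```ruby
strings = [
  "#tom #john how are you?",
  "how are you #john?",
  "#tom hi"
]

# With a capture group, scan returns one array per match; flatten collapses them.
names = strings.map { |s| s.scan(/#(\w+)/).flatten }
# => [["tom", "john"], ["john"], ["tom"]]
```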
There are multiple ways to do it - here's one way:
string = "#tom #john how are you?"
words = string.split " "
twitter_handles = words.select do |word|
word.start_with?('#') && word[1..-1].chars.all? do |char|
char =~ /[a-zA-Z0-9_]/
end && word.length > 1
end
The char =~ regex accepts only alphanumerics and the underscore.
r = /
\# # match a literal '#' (escaped: a bare # starts a comment in free-spacing mode)
[[:alpha:]]+ # match one or more letters
\b # match word break
/x # free-spacing regex definition mode
"#tom #john how are you? And you, #andré?".scan(r)
#=> ["#tom", "#john", "#andré"]
If you wish to instead return
["tom", "john", "andré"]
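change the first line of the regex (the one that matches #) to
(?<=\#)
which is a positive lookbehind. It requires that the character "#" be present, but the "#" will not be part of the match. The backslash is needed because an unescaped # begins a comment in free-spacing mode.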
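For comparison, here is a compact sketch of both variants written without free-spacing mode, so the # needs no escaping. The trailing \b is left out because the greedy + already consumes every consecutive letter; the variable names are illustrative only:

```ruby
str = "#tom #john how are you? And you, #andré?"

# The match includes the leading '#'.
with_marker = str.scan(/#[[:alpha:]]+/)
#=> ["#tom", "#john", "#andré"]

# The lookbehind requires a preceding '#' but keeps it out of the match.
without_marker = str.scan(/(?<=#)[[:alpha:]]+/)
#=> ["tom", "john", "andré"]
```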

Regex matching except when pattern is after another pattern

I am looking to find method names for python functions. I only want to find method names if they aren't after "def ". E.g.:
"def method_name(a, b):" # (should not match)
"y = method_name(1,2)" # (should find `method_name`)
My current regex is /\W(.*?)\(/.
str = "def no_match(a, b):\ny = match(1,2)"
str.scan(/(?<!def)\s+\w+(?=\()/).map(&:strip)
#⇒ ["match"]
The regex comments:
negative lookbehind for def,
followed by spaces (will be stripped later),
followed by one or more word symbols \w,
followed by positive lookahead for parenthesis.
Sidenote: one should never use regexps to parse long strings for any purpose.
I have assumed that lines that do not contain "def" are of the form "[something]=[zero or more spaces][method name]".
R1 = /
\bdef\b # match 'def' surrounded by word breaks
/x # free-spacing regex definition mode
R2 = /
[^=]+ # match any characters other than '='
= # match '='
\s* # match >= 0 whitespace chars
\K # forget everything matched so far
[a-z_] # match a lowercase letter or underscore
[a-z0-9_]* # match >= 0 lowercase letters, digits or underscores
[!?]? # possibly match '!' or '?'
/x
def match?(str)
(str !~ R1) && str[R2]
end
match?("def method_name1(a, b):") #=> false
match?("y = method_name2(1,2)") #=> "method_name2"
match?("y = method_name") #=> "method_name"
match?("y = method_name?") #=> "method_name?"
match?("y = def method_name") #=> false
match?("y << method_name") #=> nil
I chose to use two regexes to be able to deal with both my first and penultimate examples. Note that the method returns either a method name or a falsy value, but the latter may be either false or nil.

Regex strings in Ruby

Input strings:
str1 = "$13.90 Price as Shown"
str2 = "$590.50 $490.00 Price as Selected"
str3 = "$9.90 or 5/$27.50 Price as Selected"
Output strings:
str1 = "13.90"
str2 = "490.00"
str3 = "9.90"
My code to get output:
str = str.strip.gsub(/\s\w{2}\s\d\/\W\d+.\d+/, "") # remove or 5/$27.50 from string
str = /\W\d+.\d+\s\w+/.match(str).to_s.gsub("$", "").gsub(" Price", "")
This code works fine for all 3 different types of strings. But how can I improve my code? Are there any better solutions?
Also guys can you give link to good regex guide/book?
A regex I suggested first is just a sum total of your regexps:
(?<=(?<!\/)\$)\d+.\d+(?=\s\w+)
Since it is next to impossible to compare numbers with regex, I suggest
Extracting all float numbers
Parse them as float values
Get the minimum one
Here is a working snippet:
def lowest_number_from_string(input)
  # scan returns strings; convert them to floats before comparing,
  # otherwise min compares lexicographically ("13.90" sorts before "9.90")
  input.scan(/(?<=(?<!\/)\$)\d+(?:\.\d+)?/).map(&:to_f).min
end
puts lowest_number_from_string("$13.90 Price as Shown")
puts lowest_number_from_string("$590.50 $490.00 Price as Selected")
puts lowest_number_from_string("$9.90 or 5/$27.50 Price as Selected")
The regex breakdown:
(?<=(?<!\/)\$) - assert that there is a $ symbol not preceded with / right before...
\d+ - 1 or more digits
(?:\.\d+)? - optionally followed with a . followed by 1 or more digits
Note that if you only need to match floats with a decimal part, remove the ? and the non-capturing group from the last subpattern: /(?<=(?<!\/)\$)\d+\.\d+/. The looser /(?<=(?<!\/)\$)\d*\.?\d+/ also accepts values such as .50 and plain integers.
Supposing input can be relied upon to look like one of your three examples, how about this?
expr = /\$(\d+\.\d\d)\s+(?:or\s+\d+\/\$\d+\.\d\d\s+)?Price/
str = "$9.90 or 5/$27.50 Price as Selected"
str[expr, 1] # => "9.90"
Here it is on Rubular: http://rubular.com/r/CakoUt5Lo3
Explained:
expr = %r{
\$ # literal dollar sign
(\d+\.\d\d) # capture a price with two decimal places (assume no thousands separator)
\s+ # whitespace
(?: # non-capturing group
or\s+ # literal "or" followed by whitespace
\d+\/ # one or more digits followed by literal "/"
\$\d+\.\d\d # dollar sign and price
\s+ # whitespace
)? # preceding group is optional
Price # the literal word "Price"
}x
You might use it like this:
MATCH_PRICE_EXPR = /\$(\d+\.\d\d)\s+(?:or\s+\d+\/\$\d+\.\d\d\s+)?Price/
def match_price(input)
return unless input =~ MATCH_PRICE_EXPR
$1.to_f
end
puts match_price("$13.90 Price as Shown")
# => 13.9
puts match_price("$590.50 $490.00 Price as Selected")
# => 490.0
puts match_price("$9.90 or 5/$27.50 Price as Selected")
# => 9.9
My code works fine for all 3 types of strings. Just wondering how can I improve that code
str = str.gsub(/ or \d\/[\$\d.]+/i, '')
str = /(\$[\d.]+) P/.match(str)
Ruby Live Demo
http://ideone.com/18XMjr
A better regex is probably: /\B\$(\d+\.\d{2})\b/
str = "$590.50 $490.00 Price as Selected"
str.scan(/\B\$(\d+\.\d{2})\b/).flatten.min_by(&:to_f)
#=> "490.00"
Assuming you simply want the smallest dollar value in each line:
r = /
\$ # match a dollar sign
\d+ # match one or more digits
\. # match a decimal point
\d{2} # match two digits
/x # extended mode
[str1, str2, str3].map { |s| s.scan(r).min_by { |s| s[1..-1].to_f } }
#=> ["$13.90", "$490.00", "$9.90"]
Actually, you don't have to use a regex. You could do it like this:
def smallest(str)
val = str.each_char.with_index(1).
select { |c,_| c == ?$ }.
map { |_,i| str[i..-1].to_f }.
min
"$%.2f" % val
end
smallest(str1) #=> "$13.90"
smallest(str2) #=> "$490.00"
smallest(str3) #=> "$9.90"

Count capitalized of each sentence in a paragraph Ruby

I answered my own question. Forgot to initialize count = 0
I have a bunch of sentences in a paragraph.
a = "Hello there. this is the best class. but does not offer anything." as an example.
To figure out if the first letter is capitalized, my thought is to .split the string, so that a_sentence = a.split(".")
I know I can call "hello world".capitalize!, and if it returns nil, that tells me the string was already capitalized.
EDIT
Now I can use an array method to go through each value and call .capitalize!
And I know I can check whether something is capitalized with .strip.capitalize!.nil?
But I can't seem to output how many were capitalized.
EDIT
count = 0 # must be initialized before the loop
a_sentence.each do |sentence|
  if sentence.strip.capitalize!.nil?
    count += 1
    puts "#{count} capitalized"
  end
end
It outputs:
1 capitalized
Thanks for all your help. I'll stick with the above code I can understand within the framework I only know in Ruby. :)
Try this:
b = []
a.split(".").each do |sentence|
b << sentence.strip.capitalize
end
b = b.join(". ") + "."
# => "Hello there. This is the best class. But does not offer anything."
Your post's title is misleading because from your code, it seems that you want to get the count of capitalized letters at the beginning of a sentence.
Assuming that every sentence ends with a period (a full stop) followed by a space, the following should work for you:
split_str = ". "
regex = /^[A-Z]/
paragraph_text.split(split_str).count do |sentence|
regex.match(sentence)
end
And if you want to simply ensure that each starting letter is capitalized, you could try the following:
paragraph_text.split(split_str).map(&:capitalize).join(split_str) # the last sentence keeps its own final period
There's no need to split the string into sentences:
str = "It was the best of times. sound familiar? Out, damn spot! oh, my."
str.scan(/(?:^|[.!?]\s)\s*\K[A-Z]/).length
#=> 2
The regex could be written with documentation by adding x after the closing /:
r = /
(?: # start a non-capture group
^|[.!?]\s # match ^ or (|) any of ([]) ., ! or ?, then one whitespace char
) # end non-capture group
\s* # match any number of whitespace chars
\K # forget the preceding match
[A-Z] # match one capital letter
/x
a = str.scan(r)
#=> ["I", "O"]
a.length
#=> 2
Instead of Array#length, you could use its alias, size, or Array#count.
You can count how many were capitalized, like this:
a = "Hello there. this is the best class. but does not offer anything."
a_sentence = a.split(".")
a_sentence.inject(0) { |sum, s| s.strip!; s.capitalize!.nil? ? sum += 1 : sum }
# => 1
a_sentence
# => ["Hello there", "This is the best class", "But does not offer anything"]
And then put it back together, like this:
"#{a_sentence.join('. ')}."
# => "Hello there. This is the best class. But does not offer anything."
EDIT
As #Humza suggested, you could use count:
a_sentence.count { |s| s.strip!; s.capitalize!.nil? }
# => 1
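One caveat with using capitalize!.nil? as the test: capitalize also downcases the rest of the string, so a sentence such as "Hello World" is modified and therefore reported as not capitalized. A minimal sketch that inspects only the first character instead (the method name count_capitalized is hypothetical, and it assumes sentences are separated by periods):

```ruby
# Hypothetical helper: counts sentences whose first non-space character
# is an uppercase letter. Assumes sentences are separated by periods.
def count_capitalized(paragraph)
  paragraph.split('.').count { |s| s.strip.match?(/\A[[:upper:]]/) }
end

a = "Hello there. this is the best class. but does not offer anything."
count_capitalized(a)                        #=> 1
count_capitalized("Hello World. Next one.") #=> 2
```

String#match? requires Ruby 2.4 or later; on older versions, s.strip =~ /\A[[:upper:]]/ gives the same truthiness.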

How can I improve this small Ruby Regex snippet?

How can I improve this?
The purpose of this code is to be used in a method that captures a string of hashtags (#twittertype) from a form, parses through the list of words, and makes sure all the words are separated out.
WORD_TEST = "123 sunset #2d2-apple,#home,#star #Babyclub, #apple_surprise #apple,cats mustard#dog , #basic_cable safety #222 #dog-D#DOG#2D "
SECOND_TEST = 'orion#Orion#oRion,Mike'
This is my problem area: regexps...
_string_rgx = /([a-zA-Z0-9]+(-|_)?\w+|#?[a-zA-Z0-9]+(-|_)?\w+)/
add_pound_sign = lambda { |a| a[0].chr == '#' ? a : a='#' + a; a}
I don't know regular expressions that well, hence the need to collect the first element from each result of the scan. It yielded weird stuff, but the first element was always what I wanted.
t_word = WORD_TEST.scan(_string_rgx).collect {|i| i[0] }
s_word = SECOND_TEST.scan(_string_rgx).collect {|i| i[0] }
t_word.map! { |a| a = add_pound_sign.call(a); a }
s_word.map! { |a| a = add_pound_sign.call(a); a }
The results are what I want. I just want insight from the Ruby and regex gurus out there.
puts t_word.inspect
[
"#123", "#sunset", "#2d2-apple", "#home", "#star", "#Babyclub",
"#apple_surprise", "#apple", "#cats", "#mustard", "#dog",
"#basic_cable", "#safety", "#222", "#dog-D", "#DOG", "#2D"
]
puts s_word.inspect
[
"#orion", "#Orion", "#oRion", "#Mike"
]
Thanks in advance.
Let's unfold the regex:
(
[a-zA-Z0-9]+ (-|_)? \w+
| #? [a-zA-Z0-9]+ (-|_)? \w+
)
( begin capture group
[a-zA-Z0-9]+ match one or more alphanumeric characters
(-|_)? match a hyphen or an underscore and save. This group may fail
\w+ match one or more "word" characters (alphanumeric + underscore)
| OR match this:
#? match optional # character
[a-zA-Z0-9]+ match one or more alphanumeric characters
(-|_)? match hyphen or underscore and capture. may fail.
\w+ match one or more word characters
) end capture
I'd rather write this regex like this:
(#? [a-zA-Z0-9]+ (-|_)? \w+)
or
( #? [a-zA-Z0-9]+ (-?\w+)? )
or
( #? [a-zA-Z0-9]+ -? \w+ )
(all are reasonably equivalent)
You should note that this regex will fail on hashtags with Unicode characters, e.g. #Ü-Umlaut, #façade, etc. You are also limited to a two-character minimum length (#a fails, #ab matches), and a tag may have only one hyphen (#a-b-c fails, or rather returns #a-b).
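As a rough sketch of one way around those three limitations, the pattern below uses POSIX bracket classes (Unicode-aware in Ruby for UTF-8 strings), allows single-character tags, and accepts any number of hyphen-joined segments. The constant HASHTAG_RGX and the method hashtags are hypothetical names introduced for illustration:

```ruby
# Hypothetical pattern: an optional '#', then letters, digits or underscores,
# followed by any number of hyphen-joined segments of the same characters.
HASHTAG_RGX = /#?[[:alnum:]_]+(?:-[[:alnum:]_]+)*/

# Hypothetical helper: extracts the tags and prefixes '#' where it is missing.
def hashtags(str)
  str.scan(HASHTAG_RGX).map { |w| w.start_with?('#') ? w : '#' + w }
end

hashtags('orion#Orion#oRion,Mike') #=> ["#orion", "#Orion", "#oRion", "#Mike"]
hashtags('#a #a-b-c #façade')      #=> ["#a", "#a-b-c", "#façade"]
```

Because there is no capture group, scan returns plain strings, so no collect { |i| i[0] } step is needed.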
I would reduce your regex pattern to something like this:
WORD_TEST = "123 sunset #2d2-apple,#home,#star #Babyclub, #apple_surprise #apple,cats mustard#dog , #basic_cable safety #222 #dog-D#DOG#2D "
foo = []
WORD_TEST.scan(/#?[-\w]+\b/) do |s|
foo.push( s[0] != '#' ? '#' + s : s )
end

Resources