Convert string to camel case in Ruby - ruby

Working on a Ruby challenge to convert dash/underscore delimited words into camel casing. The first word within the output should be capitalized only if the original word was capitalized (known as Upper Camel Case).
My solution so far:
def to_camel_case(str)
str.split('_,-').collect.camelize(:lower).join
end
However, I believe .camelize(:lower) is a Rails method and isn't available in plain Ruby. Is there an equally simple alternative? I can't seem to find one. Or do I need to approach the challenge from a completely different angle?
main.rb:4:in `to_camel_case': undefined method `camelize' for #<Enumerator: []:collect> (NoMethodError)
from main.rb:7:in `<main>'

I assume that:
Each "word" is made up of one or more "parts".
Each part is made up of characters other than spaces, hyphens and underscores.
The first character of each part is a letter.
Each successive pair of parts is separated by a hyphen or underscore.
It is desired to return a string obtained by modifying each part and removing the hyphen or underscore that separates each successive pair of parts.
For each part all letters but the first are to be converted to lowercase.
All characters in each part of a word that are not letters are to remain unchanged.
The first letter of the first part is to remain unchanged.
The first letter of each part other than the first is to be capitalized (if not already capitalized).
Words are separated by spaces.
If this describes the problem correctly, the following method could be used.
R = /(?:(?<=^| )|[_-])[A-Za-z][^ _-]*/
def to_camel_case(str)
str.gsub(R) do |s|
c1 = s[0]
case c1
when /[A-Za-z]/
c1 + s[1..-1].downcase
else
s[1].upcase + s[2..-1].downcase
end
end
end
to_camel_case "Little Miss-muffet sat_on_HE$R Tuffett eating-her_cURDS And_whey"
# => "Little MissMuffet satOnHe$r Tuffett eatingHerCurds AndWhey"
The regular expression can be written in free-spacing mode to make it self-documenting.
R = /
(?: # begin non-capture group
(?<=^| ) # use a positive lookbehind to assert that the next character
# is preceded by the beginning of the string or a space
| # or
[_-] # match '_' or '-'
) # end non-capture group
[A-Za-z] # match a letter
[^ _-]* # match 0+ characters other than ' ', '_' and '-'
/x # free-spacing regex definition mode

Most Rails methods can be added into basic Ruby projects without having to pull in the whole Rails source.
The trick is to figure out the minimum number of files to require in order to define the method you need. If we go to APIDock, we can see that camelize is defined in active_support/inflector/methods.rb.
Therefore active_support/inflector seems like a good candidate to try. Let's test it:
irb(main)> require 'active_support/inflector'
=> true
irb(main)> 'foo_bar'.camelize
=> "FooBar"
Seems to work. Note that this assumes you already ran gem install activesupport earlier. If not, then do it first (or add it to your Gemfile).

In pure Ruby, no Rails, given str = 'my-var_name' you could do:
delimiters = Regexp.union(['-', '_'])
str.split(delimiters).then { |first, *rest| [first, rest.map(&:capitalize)].join }
#=> "myVarName"
Where str = 'My-var_name' the result is "MyVarName", since the first element of the splitting result is untouched, while the rest is mapped to be capitalized.
This works only for dash/underscore-delimited words without spaces; for input with spaces, split on spaces first and map each word through the method above.
The approach splits the string on several delimiters at once (see "Split string by multiple delimiters") and chains the result with Object#then (available since Ruby 2.6).
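For input that also contains spaces, the split-then-map idea mentioned above could be sketched roughly as follows. The method names camelize_word and camelize_phrase are made up for this illustration, and the sketch assumes words are separated by whitespace that may be normalised to single spaces.

```ruby
DELIMITERS = Regexp.union('-', '_')

# Camel-case a single dash/underscore-delimited word.
# The first part is left untouched; later parts are capitalized.
def camelize_word(word)
  first, *rest = word.split(DELIMITERS)
  [first, *rest.map(&:capitalize)].join
end

# Apply camelize_word to every whitespace-separated word.
# Assumption: runs of whitespace may be collapsed to one space.
def camelize_phrase(phrase)
  phrase.split(' ').map { |word| camelize_word(word) }.join(' ')
end

puts camelize_phrase('the_stealth-warrior My-var_name')
```

Note that String#capitalize also downcases the remaining letters of each part, which may or may not be what the challenge expects.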

Related

Positive Lookahead and Non-capturing group difference

When you want to match either of two patterns but not capture it, you would use a non-capturing group (?:...):
%r{(?:https?|ftp)://(.+)}
But what if I want to capture '_1' in the string 'john_1'. It could be '2' or '' followed by anything else. First I tried a non-capturing group:
'john_1'.gsub(/(?:.+)(_.+)/, "")
=> ""
It does not work. I am telling it to not capture one or more characters but to capture _ and all characters after it.
Instead the following works:
'john_1'.gsub(/(?=.+)(_.+)/, "")
=> "john"
I used a positive lookahead. The definition I found for positive lookahead was as follows:
q(?=u) matches a q that is
followed by a u, without making the u part of the match. The positive
lookahead construct is a pair of parentheses, with the opening
parenthesis followed by a question mark and an equals sign.
But that definition doesn't really fit my example. What makes the Positive Lookahead work but not the Non-capturing group work in the example I provide?
Capturing and matching are two different things. (?:expr) doesn't capture expr, but it's still included in the matched string. Zero-width assertions, e.g. (?=expr), don't capture or include expr in the matched string.
Perhaps some examples will help illustrate the difference:
> "abcdef"[/abc(def)/] # => abcdef
> $1 # => def
> "abcdef"[/abc(?:def)/] # => abcdef
> $1 # => nil
> "abcdef"[/abc(?=def)/] # => abc
> $1 # => nil
When you use a non-capturing group in your String#gsub call, it's still part of the match, and gets replaced by the replacement string.
Your first example doesn't work because a non-capturing group is still part of the overall match, whereas a lookahead only asserts what follows and isn't part of the overall match.
This is easier to understand if you get the actual match data:
# Non-capturing group
/(?:.+)(_.+)/.match 'john_1'
=> #<MatchData "john_1" 1:"_1">
# Positive lookahead
/(?=.+)(_.+)/.match 'john_1'
=> #<MatchData "_1" 1:"_1">
EDIT: I should also mention that sub and gsub work on the entire match, not individual capture groups (although those can be used in the replacement).
'john_1'.gsub(/(?:.+)(_.+)/, 'phil\1')
=> "phil_1"
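To drop only the suffix, one option is to capture the prefix and put it back in the replacement; another is to match the suffix alone so nothing else gets replaced. A small sketch (the variable names are illustrative only):

```ruby
name = 'john_1'

# Capture the prefix and keep only it in the replacement.
without_suffix = name.sub(/(.+)(_.+)/, '\1')

# Or match just the suffix, so the prefix is never part of the match.
also_without_suffix = name.sub(/_.+/, '')

# String#[] with a capture index returns only that group.
suffix = name[/.+(_.+)/, 1]

p without_suffix      # prints "john"
p also_without_suffix # prints "john"
p suffix              # prints "_1"
```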
Let's consider a couple of situations.
The string preceding the underscore must be "john" and the underscore is followed by one or more characters
str = "john_1"
You have two choices.
Use a positive lookbehind
str[/(?<=john)_.+/]
#=> "_1"
The positive lookbehind requires that "john" must appear immediately before the underscore, but it is not part of the match that is returned.
Use a capture group:
str[/john(_.+)/, 1]
#=> "_1"
This regular expression matches "john_1", but "_.+" is captured in capture group 1. By examining the doc for the method String#[] you will see that one form of the method is str[regexp, capture], which returns the contents of the capture group capture. Here capture equals 1, meaning the first capture group.
Note that the string following the underscore may contain underscores: "john_1_a"[/(?<=john)_.+/] #=> "_1_a".
If the underscore can be at the end of the string replace + with * in the above regular expressions (meaning match zero or more characters after the underscore).
The string preceding the underscore can be anything and the underscore is followed by one or more characters
str = "john_mary_tom_julie"
We may consider two cases.
The string returned is to begin with the first underscore
In this case we could write:
str[/_.+/]
#=> "_mary_tom_julie"
This works because the regex engine returns the leftmost match, so the match begins at the first underscore encountered, and because .+ is greedy by default, so it extends to the end of the string.
The string returned is to begin with the last underscore
Here we could write:
str[/_[^_]+\z/]
#=> "_julie"
This regex matches an underscore followed by one or more characters that are not underscores, followed by the end-of-string anchor (\z).
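If a regex is not required, the same two cases could also be approximated with String#partition and String#rpartition. This is only a sketch: the helper names are invented here, and unlike the regexes above it also accepts an underscore with nothing after it.

```ruby
# Substring starting at the first underscore, or nil if there is none.
def from_first_underscore(str)
  _head, sep, tail = str.partition('_')
  sep.empty? ? nil : sep + tail
end

# Substring starting at the last underscore, or nil if there is none.
def from_last_underscore(str)
  _head, sep, tail = str.rpartition('_')
  sep.empty? ? nil : sep + tail
end

str = "john_mary_tom_julie"
p from_first_underscore(str) # prints "_mary_tom_julie"
p from_last_underscore(str)  # prints "_julie"
```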
Aside: the method String#[]
[] may seem an odd name for a method but it is a method nevertheless, so it can be invoked in the conventional way:
str.[](/john(_.+)/, 1)
#=> "_1"
The expression str[/john(_.+)/, 1] is an example (of which there are many in Ruby) of syntactic sugar. When written str[...] Ruby converts it to the conventional expression for methods before evaluating it.

How do I specify in Ruby that I want to match a character provided that a sequence following that character does not match a pattern?

I'm using Ruby on Rails 5.1. In Ruby, how do I say that I want to match a string if the first character matches something but the sequence that follows does NOT match a pattern? That is, I want to match a number provided that the sequence that follows is not a character from an array I have followed by two other numbers. Here's my character array ...
2.4.0 :010 > TOKENS
=> [":", ".", "'"]
So this string would NOT match
3:00
since ":00" matches the pattern of a character from my array followed by two numbers. But this string
3400
would match. This string would also match
3:0
and this would match
3
since nothing follows the above. How do I write the appropriate regex in Ruby?
string =~ /\A\d+(?!:\d{2})/
This regular expression means:
\A anchors the match to the start of the string.
\d+ means "one or more digits".
(?!...) is a negative lookahead. It checks that the pattern inside the parentheses does not match, looking ahead from the current position.
:\d{2} means : followed by two digits.
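The regex above only covers the ":" token. A possible extension that builds the lookahead from the TOKENS array in the question might look like the sketch below. The constant and method names are assumptions for illustration, and the extra \d inside the lookahead is an added guard so the engine cannot backtrack to a shorter run of digits (for example matching just "3" in "34:00").

```ruby
TOKENS = [":", ".", "'"].freeze

# Regexp.union escapes each token, so "." is matched literally.
TOKEN = Regexp.union(TOKENS)

# Leading digits must not be followed by another digit
# or by a token plus two digits.
NUMBER = /\A\d+(?!\d|#{TOKEN}\d{2})/

# Hypothetical helper: true when the string starts with a number
# that is not followed by "<token><digit><digit>".
def number_without_suffix?(string)
  NUMBER.match?(string)
end

%w[3:00 3400 3:0 3].each do |s|
  puts "#{s} -> #{number_without_suffix?(s)}"
end
```

With the strings from the question, "3:00" is rejected while "3400", "3:0" and "3" are accepted.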
Consideration should be given to testing the first character and the remaining characters separately.
def match_it?(str, first_char_regex, no_match_regex)
str[0].match?(first_char_regex) && !str[1..-1].match?(no_match_regex)
end
match_it?("0:00", /0/, /\A[:. ]cat\z/) #=> true
match_it?("0:00", /\d/, /\A[:. ]\d+\z/) #=> false
match_it?("0:00", /[[:alpha:]]/, /\A[:. ]\d+\z/) #=> false
I believe this reads well and it simplifies testing when compared to methods that employ a single regular expression.

How to remove strings that end with a particular character in Ruby

Based on "How to Delete Strings that Start with Certain Characters in Ruby", I know that the way to remove a string that starts with the character "#" is:
email = email.gsub( /(?:\s|^)#.*/ , "") #removes strings that start with "#"
I want to also remove strings that end in ".". Inspired by "Difference between \A \z and ^ $ in Ruby regular expressions" I came up with:
email = email.gsub( /(?:\s|$).*\./ , "")
Basically, I replaced the caret with the dollar sign and reversed the order of the part after the closing parenthesis (making sure to escape the period). However, it is not doing the trick.
An example I'd like to match and remove is:
"a8&23q2aas."
You were so close.
email = email.gsub( /.*\.\s*$/ , "")
The difference is in what the pattern is anchored to. Here, you are looking for a period (\.) that is followed only by optional whitespace (\s*) and then the end of the line ($). I would read the regex above as "any number of characters, followed by a period, followed by any amount of whitespace, followed by the end of the line."
As commenters pointed out, though, there's a simpler way: String#end_with?.
I'd use:
words = %w[#a day in the life.]
# => ["#a", "day", "in", "the", "life."]
words.reject { |w| w.start_with?('#') || w.end_with?('.') }
# => ["day", "in", "the"]
Using a regex is overkill for this if you're only concerned with the starting or ending character, and, in fact, regular expressions will slow your code in comparison with using the built-in methods.
I would really like to stick to using gsub....
gsub is the wrong way to remove an element from an array. It could be used to turn the string into an empty string, but that won't remove that element from the array.
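If the goal is to drop such words from a single string rather than from an array, one way to combine the pieces above is to split, reject and join. This is a sketch only: the method name is invented, "strings" is assumed to mean whitespace-separated words, and whitespace is collapsed to single spaces.

```ruby
# Remove words that start with "#" or end with "." from a text.
def strip_marked_words(text)
  text.split
      .reject { |word| word.start_with?('#') || word.end_with?('.') }
      .join(' ')
end

puts strip_marked_words('#a day in the life.')
puts strip_marked_words('keep a8&23q2aas. this')
```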
# Returns str without the trailing suffix, or str unchanged if it does not end with suffix.
def replace_suffix(str, suffix)
  str.end_with?(suffix) ? str[0, str.length - suffix.length] : str
end

Ruby search a string for matching character pairs

I want to match character pairs in a string. Let's say the string is:
"zttabcgqztwdegqf". Both "zt" and "gq" are matching pairs of characters in the string.
The following code finds the "zt" matching pair, but not the "gq" pair:
#!/usr/bin/env ruby
string = "zttabcgqztwdegqf"
puts string.scan(/.{1,2}/).detect{ |c| string.count(c) > 1 }
The code provides matching pairs where the indices of the pairs are 0&1,2&3,4&5... but not 1&2,3&4,5&6, etc:
zt
ta
bc
gq
zt
wd
eg
qf
I'm not sure regex in Ruby is the best way to go. But I want to use Ruby for the solution.
You can do your search with a single regex:
puts string.scan(/(?=(.{2}).*\1)/)
Output
zt
gq
Regex Breakout
(?= # Start a lookahead
(.{2}) # Search any couple of char and group it in \1
.*\1 # Search ahead in the string for another \1 to validate
) # Close lookahead
Note
Putting all the checks inside the lookahead ensures that the regex engine does not consume the pair while validating it.
So it also works with overlapping pairs, as in the string abcabc: the output will correctly be ab, bc.
Oddity
If the regex engine does not consume any characters, how can it reach the end of the string?
Internally, after a zero-length match, Onigmo (the Ruby regex engine) automatically advances one position. Most regex flavours behave this way, but in JavaScript, for example, the programmer has to increment the last match index manually.
str = "ztcabcgqzttwtcdegqf"
r = /
(.) # match any character in capture group 1
(?= # begin a positive lookahead
(.) # match any character in capture group 2
.+ # match >= 1 characters
\1 # match capture group 1
\2 # match capture group 2
) # close positive lookahead
/x # extended/free-spacing regex definition mode
str.scan(r).map(&:join)
#=> ["zt", "tc", "gq"]
Here is one way to do this without using regex:
string = "zttabcgqztwdegqf"
p string.split('').each_cons(2).map(&:join).select {|i| string.scan(i).size > 1 }.uniq
#=> ["zt", "gq"]
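Another regex-free variant counts every adjacent pair in a single pass instead of rescanning the string for each pair. The following is a sketch under that idea; the method name repeated_pairs is made up here.

```ruby
# Return each two-character pair that occurs more than once,
# in order of first appearance.
def repeated_pairs(string)
  counts = Hash.new(0)
  string.each_char.each_cons(2) { |a, b| counts[a + b] += 1 }
  counts.select { |_pair, n| n > 1 }.keys
end

p repeated_pairs("zttabcgqztwdegqf") # prints ["zt", "gq"]
```

One difference to be aware of: overlapping occurrences are counted here, so "aaa" reports "aa" as repeated, whereas string.scan("aa") finds it only once.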

Use regular expression to fetch 3 groups from string

This is my expected result.
Input a string and get three strings back.
I have no idea how to do it with a regex in Ruby.
This is my rough idea:
match(/(.*?)(_)(.*?)(\d+)/)
Input and expected output
# "R224_OO2003" => R224, OO, 2003
# "R2241_OOP2003" => R2241, OOP, 2003
If the example description I gave in my comment on the question is correct, you need a very straightforward regex:
r = /(.+)_(.+)(\d{4})/
Then:
"R224_OO2003".scan(r).flatten #=> ["R224", "OO", "2003"]
"R2241_OOP2003".scan(r).flatten #=> ["R2241", "OOP", "2003"]
Assuming that your three parts consist of (R and one or more digits), then an underbar, then (one or more non-whitespace characters), before finally (a 4-digit numeric date), then your regex could be something like this:
^(R\d+)_(\S+)(\d{4})$
The ^ indicates start of string, and the $ indicates end of string. \d+ indicates one or more digits, while \S+ says one or more non-whitespace characters. The \d{4} says exactly four digits.
To recover data from the matches, you could either use the pre-defined globals that line up with your groups, or you could use named captures.
To use the match globals just use $1, $2, and $3. In general, you can figure out the number to use by counting the left parentheses of the specific group.
To use the named captures, include ?<name> right after the left paren of a particular group. For example:
x = "R2241_OOP2003"
match_data = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/.match(x)
puts match_data['first'], match_data['second'], match_data['third']
yields
R2241
OOP
2003
as expected.
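The same pattern could be wrapped in a small helper that returns the three parts as an array, or nil when the input does not fit. This is a sketch: the names PATTERN, split_code and the group names are invented for the example, and it uses \A and \z (whole-string anchors) with a lazy \S+? as a variation on the regex above.

```ruby
PATTERN = /\A(?<code>R\d+)_(?<label>\S+?)(?<year>\d{4})\z/

# Returns [code, label, year] or nil if the input does not match.
def split_code(input)
  match = PATTERN.match(input)
  match && match.captures
end

p split_code("R224_OO2003")   # prints ["R224", "OO", "2003"]
p split_code("R2241_OOP2003") # prints ["R2241", "OOP", "2003"]
p split_code("no match here") # prints nil
```

MatchData#named_captures would give the same values as a hash keyed by group name, if that is more convenient.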
As long as your pattern covers all possibilities, then you just need to use the match object to return the 3 strings:
my_match = "R224_OO2003".match(/(.*?)(_)(.*?)(\d+)/)
#=> #<MatchData "R224_OO2003" 1:"R224" 2:"_" 3:"OO" 4:"2003">
puts my_match[0] #=> "R224_OO2003"
puts my_match[1] #=> "R224"
puts my_match[2] #=> "_"
puts my_match[3] #=> "OO"
puts my_match[4] #=> "2003"
A MatchData object holds the capture groups starting at index [1]. As you can see, index [0] returns the entire match. If you don't want to capture the "_", you can leave its parentheses out.
Also, I'm not sure you are getting what you want with the part:
(.*?)
This says: zero or more of any character, matched lazily (as few as possible). The ? after * makes the quantifier non-greedy; it does not mean "zero or one" here.

Resources