Find exact word in string and not partial - ruby

I have the following string
str = "feminino blue"
I need to know if there is a string called "mini" inside this string.
When I use the include? method, it returns true because "feMINIno" contains "mini".
Is there a way to search for the exact word that is passed as param?
Thanks

Sounds like a use case for regular expressions, which can match all kinds of more complex string patterns. You can read through that page for all the specifics (and it's very valuable to learn, not just as a Ruby concept; Regexes are used in almost every modern language), but this should cover your use case.
/\bmini\b/ =~ str
\b means "match a word boundary": exactly one of the characters to its left or right must be a word character (a letter, digit, or underscore), and the other side must not be (i.e. it is whitespace, punctuation, or the beginning/end of the string).
This will return nil if there's no match or the index of the match if there is one. Since nil is falsy and all numbers are truthy, this return value is safe to use in an if statement if all you need is a yes/no answer.
If the word you're searching for is not a constant and is instead in a variable called, say, my_word, you can interpolate it. Regexp.quote escapes any regex metacharacters the word may contain.
/\b#{Regexp.quote(my_word)}\b/ =~ str
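As a minimal sketch, the two snippets above can be wrapped in a small helper. The method name contains_word? is hypothetical (it is not part of Ruby), and Regexp#match? assumes Ruby 2.4 or later; it returns a plain true/false rather than an index.

```ruby
# Hypothetical helper: true if `word` occurs in `text` as a whole word.
def contains_word?(text, word)
  # Regexp.escape (alias: Regexp.quote) neutralizes metacharacters in `word`.
  /\b#{Regexp.escape(word)}\b/.match?(text)
end

str = "feminino blue"
puts str.include?("mini")         # substring check: true
puts contains_word?(str, "mini")  # whole-word check: false
puts contains_word?(str, "blue")  # true
```

If you need the position of the match rather than a yes/no answer, =~ as shown above is still the tool to use.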

Related

Working with Ruby class: Capitalizing a string

I'm trying to get my head around how to work with Classes in Ruby and would really appreciate some insight on this area. Currently, I've got a rather simple task to convert a string with the start of each word capitalized. For example:
Not Jaden-Cased: "How can mirrors be real if our eyes aren't real"
Jaden-Cased: "How Can Mirrors Be Real If Our Eyes Aren't Real"
This is my code currently:
class String
  def toJadenCase
    split
    capitalize
  end
end
#=> usual case: split.map(&:capitalize).join(' ')
Output:
Expected: "The Moment That Truth Is Organized It Becomes A Lie.",
instead got: "The moment that truth is organized it becomes a lie."
I suggest you not pollute the core String class with the addition of an instance method. Instead, just add an argument to the method to hold the string. You can do that as follows, by downcasing the string then using gsub with a regular expression.
def to_jaden_case(str)
  str.downcase.gsub(/(?<=\A| )[a-z]/) { |c| c.upcase }
end
to_jaden_case "The moMent That trUth is organized, it becomes a lie."
#=> "The Moment That Truth Is Organized, It Becomes A Lie."
Ruby's regex engine performs the following operations.
(?<=\A| ) : use a positive lookbehind to assert that the following match
is immediately preceded by the start of the string or a space
[a-z] : match a lowercase letter
(?<=\A| ) can be replaced with the negative lookbehind (?<![^ ]), which asserts that the match is not preceded by a character other than a space.
Notice that by using String#gsub with a regular expression (unlike the split-process-join dance), extra spaces are preserved.
When spaces are to be matched by a regular expression one often sees whitespaces (\s) matched instead. Here, for example, /(?<=\A|\s)[a-z]/ works fine, but sometimes matching whitespaces leads to problems, mainly because they also match newlines (\n) (as well as spaces, tabs and a few other characters). My advice is to match space characters if spaces are to be matched. If tabs are to be matched as well, use a character class ([ \t]).
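The difference between the gsub approach and split-process-join, and between matching a space and matching \s, can be seen in a short sketch. The method names jaden_with_gsub and jaden_with_split are hypothetical labels for the two approaches, and the sample strings are made up for illustration.

```ruby
# Hypothetical names for the two approaches discussed above.
def jaden_with_gsub(str)
  str.downcase.gsub(/(?<=\A| )[a-z]/) { |c| c.upcase }
end

def jaden_with_split(str)
  str.split.map(&:capitalize).join(' ')
end

text = "how  can   mirrors be real"
p jaden_with_gsub(text)   # "How  Can   Mirrors Be Real" (extra spaces kept)
p jaden_with_split(text)  # "How Can Mirrors Be Real"    (extra spaces collapsed)

# \s also treats a newline as a separator; a literal space does not.
p "one\ntwo".gsub(/(?<=\A|\s)[a-z]/) { |c| c.upcase }  # "One\nTwo"
p "one\ntwo".gsub(/(?<=\A| )[a-z]/) { |c| c.upcase }   # "One\ntwo"
```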
Try:
def toJadenCase
  self.split.map(&:capitalize).join(' ')
end

Case-sensitive substitutions with gsub

As an exercise, I'm working on an accent translation dictionary. My dictionary is contained in a hash, and I'm thinking of using #gsub! to run inputted strings through the translator.
I'm wondering if there's any way to make the substitutions case-sensitive. For example, I want "didja" to translate to "did you" and "Didja" to translate to "Did you", but I don't want to have to create multiple dictionary entries to deal with case.
I know I can use regex syntax to find strings to replace case-insensitively, with str.gsub!(/#{x}/i,dictionary[x]) where x is a variable. The problem is that this replaces "Didja" with "did you", rather than matching the original case.
Is there any way to make it match the original case?
Suppose we have:
a method to_key that converts a string str to a key in a hash DICTIONARY; and
a method transform that converts the pair [str, DICTIONARY[to_key(str)]] to the replacement for str.
Then str is to be replaced with:
transform(str, DICTIONARY[to_key(str)])
Without loss of generality, I think we can assume that DICTIONARY's keys and values are all of the same case (say, lower case) and that to_key is simply:
def to_key(str)
  str.downcase
end
So all that is necessary is to define the method transform. However, the specification provided does not imply a unique mapping. We therefore must decide what transform should do.
For example, suppose the rule is simply that, if the first character of str and the first character of the dictionary value are both letters, the latter is to be converted to upper case if the former is upper case. Then:
def transform(str, dict_value)
  (str[0] =~ /[A-Z]/) ? dict_value.capitalize : dict_value
end
(I originally had dict_value[0] = dict_value[0].upcase if..., but came to my senses after reading @sawa's answer.)
Note that if DICTIONARY['cat'] => 'dog', 'Cat' will be converted to 'Dog'.
One might think that another possibility is that all characters of str that are letters should maintain their case. This is problematic, however, as the dictionary mapping may (without further specification) remove letters, and it may not be clear from DICTIONARY[str] which letters of str were removed, some of which may be lower case and others upper case.
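To show how to_key and transform might be wired into gsub, here is a minimal sketch under the first-letter rule described above. The translate method and the sample dictionary are assumptions made for illustration; the dictionary is passed as an argument here instead of living in a DICTIONARY constant, purely to keep the example self-contained.

```ruby
def to_key(str)
  str.downcase
end

def transform(str, dict_value)
  (str[0] =~ /[A-Z]/) ? dict_value.capitalize : dict_value
end

# Hypothetical driver: `dictionary` maps lower-case keys to lower-case values.
def translate(text, dictionary)
  alternatives = dictionary.keys.map { |k| Regexp.escape(k) }.join('|')
  pattern = /\b(?:#{alternatives})\b/i   # whole words, any case
  text.gsub(pattern) { |word| transform(word, dictionary[to_key(word)]) }
end

dictionary = { 'didja' => 'did you', 'cat' => 'dog' }
puts translate("Didja see the Cat? didja?", dictionary)
# prints: Did you see the Dog? did you?
```

The block form of gsub is what makes this work: each matched word is handed to the block, so the replacement can depend on the case of the original.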
It is not clear what capitalization patterns you have in mind. I assume that you only need to deal with words that are all lower case or all lower case except the first letter.
str.gsub!(/#{x}/i) { |m| m.downcase! ? dictionary[m].capitalize : dictionary[m] }
I don't think this is possible since in this scenario you need to specify the exact string that must take place of the replaced string.
With that in mind, this is the best I can suggest:
subs = {'didja' => 'did you'}
subs.clone.each{ |k, v| subs[k.capitalize] = v.capitalize }
# if you want to replace all occurrences i.e. even substrings:
regex = /#{subs.keys.join('|')}/
# if you want to replace complete words only: (as the Tin Man points out)
regex = /\b(?:#{subs.keys.join('|')})\b/ # \b checks for word-boundaries
"didja Didja".gsub(regex, subs)
Update:
Because in your example, the case-sensitive character isn't to be replaced by another value, you could use this:
regex = /(?<=(d))idja/i # again, keep in mind the substrings
"didja Didja".gsub(regex, "id you")

Precedence of Ruby regular expressions?

I am reviewing regular expressions and cannot understand why a regular expression won't match a given string, specifically:
regex = /(ab*)+(bc)?/
mystring = "abbc"
The match matches "abb" but leaves the c off. I tested this using Rubular and in IRB and don't understand why the regex doesn't match the entire string. I thought that (ab*)+ would match "ab" and then (bc)? would match "bc".
Am I missing something in terms of precedence for regular expression operations?
Regular expressions try to match the first part of the regular expression as much as possible by default, and they do not backtrack to try to make larger sections match if they don't have to. Since you make (bc) optional, the (ab*) can match as much as it wants (the non-zero repetition after it doesn't have much to do) and doesn't try backtracking to try other matching alternatives.
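If you want the whole string to be matched (which will force some backtracking in this case), make sure you anchor both ends. In Ruby, ^ and $ match at line boundaries, so \A and \z are the stricter choice when the input may contain newlines; for a single-line string like this one, either form works:
regex = /^(ab*)+(bc)?$/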
The regex with parentheses defines two groups to match in your string.
The first one is abb because (ab*) means a and zero or more b. You have two b, so the match is abb. Then you have only c in your string, so it doesn't match the second condition which is bc.
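The behavior described in the answers above can be checked in IRB. This sketch uses \A and \z as the anchors, which is an assumption on my part (the answer above uses ^ and $); for this single-line input both behave the same.

```ruby
str = "abbc"

# Unanchored: (ab*)+ greedily takes "abb", and the optional (bc)? matches nothing.
unanchored = /(ab*)+(bc)?/.match(str)
p unanchored[0]  # "abb"
p unanchored[2]  # nil

# Anchored: the engine must backtrack so that the whole string is consumed.
anchored = /\A(ab*)+(bc)?\z/.match(str)
p anchored[0]    # "abbc"
p anchored[1]    # "ab"
p anchored[2]    # "bc"
```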

Regex can this be achieved

Am I too ambitious, or is there a way to do this:
add a string if it is not present?
and
remove the same string if it is present?
Do all of this using regex and avoid the if/else statement.
Here's an example.
I have the string
"admin,artist,location_manager,event_manager"
so can the substring location_manager be added or removed with regards to above conditions
basically I'm looking to avoid the if else statement and do all of this plainly in regex
"admin,artist,location_manager,event_manager".test(/some_regex/)
The some_regex will remove location_manager from the string if present else it will add it
Am I being over-ambitious?
You will need to use some sort of logic.
str += ',location_manager' unless str.gsub!(/location_manager,/,'')
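I'm assuming that if it's not present you append it to the end of the string. Note that the pattern requires a trailing comma, so it will not find location_manager when it is the last item in the list.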
Regex will not actually add or remove anything in any language that I am aware of. It is simply used to match. You must use some other language construct (a regex based replacement function for example) to achieve this functionality. It would probably help to mention your specific language so as to get help from those users.
Here's one kinda off-the-wall solution. It doesn't use regexes, but it also doesn't use any if/else statements either. It's more academic than production-worthy.
Assumptions: Your string is a comma-separated list of titles, and that these are a unique set (no duplicates), and that order doesn't matter:
require 'set' # harmless if Set is already loaded
titles = Set.new(str.split(','))
#=> #<Set: {"admin", "artist", "location_manager", "event_manager"}>
titles_to_toggle = ["location_manager"]
#=> ["location_manager"]
titles ^= titles_to_toggle
#=> #<Set: {"admin", "artist", "event_manager"}>
titles ^= titles_to_toggle
#=> #<Set: {"location_manager", "admin", "artist", "event_manager"}>
titles.to_a.join(",")
#=> "location_manager,admin,artist,event_manager"
All this assumes that you're using a string as a kind of set. If so, you should probably just use a set. If not, and you actually need string-manipulation functions to operate on it, there's probably no way around except for using if-else, or a variant, such as the ternary operator, or unless, or Bergi's answer
Also worth noting regarding regex as a solution: Make sure you consider the edge cases. If 'location_manager' is in the middle of the string, will you remove the extraneous comma? Will you handle removing commas correctly if it's at the beginning or the end of the string? Will you correctly add commas when it's added? For these reasons treating a set as a set or array instead of a string makes more sense.
No. Regex can only match/test whether "a string" is present (or not). Then, the function you've used can do something based on that result, for example replace can remove a match.
Yet, you want to do two actions (each can be done with regex), remove if present and add if not. You can't execute them sequentially, because they overlap - you need to execute either the one or the other. This is where if-else structures (or ternary operators) come into play, and they are required if there is no library/native function that contains them to do exactly this job. I doubt there is one in Ruby.
If you want to avoid the if-else statement (for one-liners or expressions), you can use the ternary operator. Or, you can use a lambda expression returning the correct value:
# kind of pseudo code
string.replace(/location,?|$/, function($0) return $0 ? "" : ",location" )
This matches the string "location" (with optional comma) or the string end, and replaces that with nothing if a match was found or the string ",location" otherwise. I'm sure you can adapt this to Ruby.
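One possible Ruby adaptation of the pseudo code above is sketched here. The method name toggle_role is hypothetical, and the final comma clean-up is my own addition to cover the case where the role is the first or last item. It still relies on a ternary inside the block, so it does not avoid branching entirely.

```ruby
# Hypothetical helper: removes `role` from a comma-separated list if present,
# otherwise appends it.
def toggle_role(list, role)
  # First alternative: the role as a whole word, plus an optional trailing comma.
  # Second alternative: the end of the string, reached only if the role is absent.
  pattern = /\b#{Regexp.escape(role)}\b,?|\z/
  toggled = list.sub(pattern) { |m| m.empty? ? ",#{role}" : "" }
  toggled.gsub(/\A,|,\z/, "")  # tidy a stray leading or trailing comma
end

roles = "admin,artist,location_manager,event_manager"
roles = toggle_role(roles, "location_manager")
puts roles  # admin,artist,event_manager
roles = toggle_role(roles, "location_manager")
puts roles  # admin,artist,event_manager,location_manager
```

String#sub replaces only the first match, which is what makes the alternation safe: when the role is present, the end-of-string alternative is never reached.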
To remove something matching a pattern is really easy:
(admin,?|artist,?|location_manager,?|event_manager,?)
then choose the string to replace the match -in your case an empty string- and pass everything to the replace method.
The other operation you suggested was more difficult to achieve with regex only. Maybe someone knows a better answer

Ruby String: how to match a Regexp from a defined position

I want to match a regexp from a ruby string only from a defined position. Matches before that position do not interest me. Moreover, I'd like \A to match this position.
I found this solution:
code[index..-1][/\A[a-z_][a-zA-Z0-9_]*/]
This matches the regexp at position index in the string code. If the match is not exactly at position index, it returns nil.
Is there a more elegant way to do this (I want to avoid creating the temporary string with the first slice)?
Thanks
You could use ^.{#{index}} inside the regular expression. Don't know if that's what you want, because I don't understand your question completely. Can you maybe add an example with the tested String? And have you heard of Rubular? Great way to test your regular expressions.
This is how you could do it if I understand your question correctly:
code.match(/^.{#{index}}your_regex_here/)
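The index variable will be interpolated into your regular expression. When index = 4, it first consumes 4 characters from the beginning of the line, then applies your own regular expression; match returns a MatchData object only if both parts match, and nil otherwise. Note that . does not match a newline by default, so this assumes the skipped characters contain none. I hope it helps. Good luck.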
EDIT
And if you want to get the matched value for your regular expression:
code.scan(/^.{#{index}}([a-z_][a-zA-Z0-9_]*)/).join
It puts the matched result (inside the brackets) in an Array and joins it into a String.
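Another option, not mentioned in the answer above, is to pass a start position to String#match and anchor the pattern with \G, which matches at the position where the search begins. This avoids the temporary slice. It is only a sketch: the helper name identifier_at and the sample string are hypothetical, and \A itself still refers to the real start of the string.

```ruby
# Hypothetical helper: the identifier starting exactly at `index`, or nil.
def identifier_at(code, index)
  # \G pins the match to the offset given as the second argument to match.
  m = code.match(/\G[a-z_][a-zA-Z0-9_]*/, index)
  m && m[0]
end

code = "12 foo_bar baz"
p identifier_at(code, 3)  # "foo_bar"
p identifier_at(code, 2)  # nil (index 2 is a space)
p identifier_at(code, 4)  # "oo_bar"
```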
