Capitalize first letter in Ruby with UTF-8 strings with exceptions - ruby

I would like to capitalize each word of a UTF-8 string. However, I need the function to ignore some special characters in the beginning of words, like "(-.,". The function will be used to capitalize song titles which can look like this:
marko, gabriel boni, simple jack - recall (original mix)
...would output:
Marko, Gabriel Boni, Simple Jack - Recall (Original Mix)
It should also be able to capitalize UTF-8 chars like "å" > "Å". "é" > "É".

Is there a reason why the Unicode::capitalize method from the unicode library does not suit your needs?
irb(main):013:0> require 'unicode'
=> true
irb(main):014:0> begin Unicode::capitalize 'åäöéèí' rescue $stderr.print "unicode error\n" end
=> "Åäöéèí"
irb(main):015:0> begin Unicode::capitalize '-åäöéèí' rescue $stderr.print "unicode error\n" end
=> "-åäöéèí"

"åbc".mb_chars.capitalize
#=> "Åbc"
"ébc".mb_chars.capitalize.to_s
#=> "Ébc"
Update: to ignore non-word characters at the start of the string:
string = "-åbc"
str = string.match(/^(\W*)(.*)/)
str[1] + str[2].mb_chars.capitalize.to_s
#=> "-Åbc"
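Note that mb_chars comes from ActiveSupport (Rails). On Ruby 2.4 or later, String#upcase handles non-ASCII letters on its own, so the whole title from the question can be processed with plain Ruby. The following is only a minimal sketch under that assumption; capitalize_title is a hypothetical helper name, not part of any library.

```ruby
# Hypothetical helper; assumes Ruby >= 2.4 for Unicode-aware case mapping.
def capitalize_title(title)
  # A "word" starts with a letter and continues through letters, combining
  # marks and apostrophes. Leading punctuation such as "(" or "-" is never
  # part of the match, so it is left untouched.
  title.gsub(/\p{L}[\p{L}\p{M}']*/) { |word| word[0].upcase + word[1..-1] }
end

puts capitalize_title("marko, gabriel boni, simple jack - recall (original mix)")
# prints: Marko, Gabriel Boni, Simple Jack - Recall (Original Mix)
puts capitalize_title("-åbc (édit)")
# prints: -Åbc (Édit)
```

Only the first letter of each word is changed, so words that are already upper case (for example "DJ") stay as they are.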

I did this and wanted to filter a lot of things.
I created a constants file initializers/constants.rb
letters = ("a".."z").to_a
numbers = ("1".."9").to_a
symbols = %w[! @ # $ % ^ & * ( ) _ - + = | \] { } : ; ' " ? / > . < , ]
FILTER = letters + numbers + symbols
And then just did a check to see if it was in my filter:
if !FILTER.include?(c)
#no
else
#yes
end
You can also check the character's Unicode code point, but you need to know the range or the specific values. I did this with Chinese characters, so that is where my values come from. Here is some code to give you an idea:
def check(char)
char = char.unpack('U*').first
if char >= 0x4E00 && char <= 0x9FFF
return true
end
if char >= 0x3400 && char <= 0x4DBF
return true
end
if char >= 0x20000 && char <= 0x2A6DF
return true
end
if char >= 0x2A700 && char <= 0x2B73F
return true
end
return false
end
You need to know the specific values here of course.
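The same check can be written more compactly by keeping the ranges in a list. This is a sketch only: the ranges are copied from the method above, and cjk_ideograph? is a hypothetical name.

```ruby
# Code point ranges copied from the check method above.
CJK_RANGES = [
  0x4E00..0x9FFF,
  0x3400..0x4DBF,
  0x20000..0x2A6DF,
  0x2A700..0x2B73F
].freeze

# Hypothetical helper: true when the first character of `char`
# falls inside one of the ranges.
def cjk_ideograph?(char)
  code_point = char.ord
  CJK_RANGES.any? { |range| range.cover?(code_point) }
end

p cjk_ideograph?("中") # => true (U+4E2D)
p cjk_ideograph?("a")  # => false
```

Ruby's regex engine also offers the script property \p{Han}, which covers a broader set of characters than these four ranges, so it is not a drop-in replacement.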

Related

Ruby, looping through a string deleting groups of characters until a desired output is achieved

I have a coding problem I solved and want to refactor. I know there has to be a cleaner way of doing what I did.
The goal is to write a method that takes a string of "!" and "?" and reduces the string by eliminating all odd groupings of each symbol.
Example - a string "????!!!" would have an odd grouping of "!!!" because there are three in a row. These would be deleted from the string.
If there is only one "!" or "?", it's left alone because it is not in a group.
Ex -
remove("!????!!!?") answer == "!"
# => ("!????!!!?" --> "!?????" --> "!")
In the first string, the only odd grouping is "!!!", once removed, it leaves a new string with an odd grouping "?????". You remove the next odd grouping so you're left with "!". This fits the desired output.
Another example
remove("!???!!") == ""
# => ("!???!!" --> "!!!" --> "")
Current code:
def remove(s)
arr = [s]
i = 0
until i == arr[0].length
s = s.chars.chunk{|c|c}.map{ |n,a| a.join }.select{|x| x if x.length.even? || x.length <= 1}.join
arr << s
i += 1
end
return arr[-1]
end
My code solves this problem and all test cases. I have a suspicion that my until loop can be removed/refactored so that I could solve this problem in one line and have spent hours trying to figure it out with no luck.
Suppose
str = "???!!!???!"
If we first remove the two groups "???" we are left with "!!!!", which cannot be reduced further.
If we first remove the group "!!!" we are left with "??????!", which cannot be reduced further.
If we are permitted to remove all odd groups of either character without reference to the effect that either has on the other, we obtain !, which cannot be reduced further.
It's not clear what rule is to be used. Here are three possibilities and code to implement each.
I will use the following two regular expressions, and in the first two cases a helper method.
Rq = /
(?<!\?) # do not match a question mark, negative lookbehind
\? # match a question mark
(\?{2})+ # match two question marks one or more times
(?!\?) # do not match a question mark, negative lookahead
/x # free-spacing regex definition mode
which is commonly written /(?<!\?)\?(\?{2})+(?!\?)/.
Similarly,
Rx = /(?<!!)!(!{2})+(?!!)/
def sequential(str, first_regex, second_regex)
s = str.dup
loop do
size = s.size
s = s.gsub(first_regex,'').gsub(second_regex,'')
return s if s.size == size
end
end
I apply each of the three methods below to two example strings:
str1 = "???!!!???!"
str2 = 50.times.map { ['?', '!'].sample }.join
#=> "?!!!?!!!?!??????!!!?!!??!!???!?!????!?!!!?!?!???!?"
Replace all odd groups of "?" then odd groups of "!" then repeat until no further removals are possible
def question_before_exclamation(str)
sequential(str, Rq, Rx)
end
question_before_exclamation str1 #=> "!!!!"
question_before_exclamation str2 #=> "??!??!?!!?!?!!?"
Replace all odd groups of "!" then odd groups of "?" then repeat until no further removals are possible
def exclamation_before_question(str)
sequential(str, Rx, Rq)
end
exclamation_before_question str1 #=> "??????!"
exclamation_before_question str2 #=> "??!????!!?!?!!?!?!!?"
Replace all odd groups of both "?" and "!" then repeat until no further removals are possible
Rqx = /#{Rq}|#{Rx}/
#=> /(?-mix:(?<!\?)\?(\?{2})+(?!\?))|(?-mix:(?<!!)!(!{2})+(?!!))/
def question_and_exclamation(str)
  s = str.dup
  loop do
    size = s.size
    s = s.gsub(Rqx,'')
    return s if s.size == size
  end
end
question_and_exclamation str1 #=> "!"
question_and_exclamation str2 #=> "??!?!!?!?!!?!?!!?"
I don't know the exact Ruby syntax for this, but you could simplify your solution by using regular expressions:
1. Gather all matches of consecutive characters.
2. If all matches are of even length or of length 1, exit.
3. Test whether each match has an odd length:
   - if it has, replace it with the empty string;
   - otherwise do nothing.
4. Go to step 1.
A solution in Perl would be:
#!perl
use strict;
use warnings;
use feature qw(say);
my $string = '!????!!!?';
sub reduce {
my ($s) = @_;
while ( my @matches = $s =~ m/((.)\2+)/g ) {
last if ! grep { length($_) > 1 && length($_) % 2 == 1 } @matches;
foreach my $match ( @matches ) {
$s =~ s/\Q$match// if length($match) > 1 && length($match) % 2 == 1;
}
}
return $s;
}
say reduce($string);
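Since the question is about Ruby, here is a rough Ruby sketch of the steps listed above. It is not a line-by-line port of the Perl code: it removes every odd run of three or more identical characters in one pass and repeats until nothing changes. The name remove_odd_groups is hypothetical.

```ruby
# Hypothetical helper: repeatedly delete runs of identical characters
# whose length is odd and at least 3, until the string stops changing.
def remove_odd_groups(str)
  loop do
    # (.)\1+ matches a run of two or more identical characters.
    reduced = str.gsub(/(.)\1+/) { |run| run.length.odd? ? "" : run }
    return reduced if reduced == str
    str = reduced
  end
end

p remove_odd_groups("!????!!!?") # => "!"
p remove_odd_groups("!???!!")    # => ""
p remove_odd_groups("!????!!")   # => "!????!!"
```

The input string is never modified in place, because gsub returns a new string on every pass.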
I could be wrong (this is ruby, after all) but I don't think you'll find a one-liner for this because ruby's utility functions generally aren't recursive. But you can use regex to simplify your logic, at the very least:
def remove(s)
while s =~ /(?<!\!)\!([\!]{2})+(?!\!)/ || s =~ /(?<!\?)\?([\?]{2})+(?!\?)/
s.gsub! /(?<!\!)\!([\!]{2})+(?!\!)/, "" # remove odd !
s.gsub! /(?<!\?)\?([\?]{2})+(?!\?)/, "" # remove odd ?
end
return s
end
To make the regex less mind-boggling, it helps to look at them with 'a' instead of '?' and '!':
/(?<!a)a([a]{2})+(?!a)/ #regex for 'a'
(?<!a) #negative lookbehind: the match cannot start with an 'a'
a([a]{2})+ #the match should be an 'a' followed by 1 or more pairs
(?!a) #negative lookahead: the match cannot end with an 'a'
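To see the breakdown above in action, the 'a' version of the pattern can be tried on runs of different lengths. This is a small sketch; ODD_RUN_OF_A is a hypothetical constant name, and (?:aa) is used as a non-capturing equivalent of ([a]{2}). Regexp#match? needs Ruby 2.4 or later.

```ruby
# Matches a run of 'a' whose length is odd and at least 3.
ODD_RUN_OF_A = /(?<!a)a(?:aa)+(?!a)/

%w[a aa aaa aaaa aaaaa].each do |sample|
  puts "#{sample.ljust(5)} -> #{ODD_RUN_OF_A.match?(sample)}"
end
# prints:
# a     -> false
# aa    -> false
# aaa   -> true
# aaaa  -> false
# aaaaa -> true

puts "baaab".gsub(ODD_RUN_OF_A, "") # prints: bb
```

The lookbehind and lookahead are what stop the pattern from matching three characters in the middle of a run of four.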
It should be simple enough with a regular expression replacement
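def remove(string)
  begin
    # Take a copy: gsub! changes string in place, so comparing against the
    # same object would always be equal and the loop would stop after one pass.
    original = string.dup
    string.gsub!(/(\!{3,})|(\?{3,})/) { |s| s.length.even? ? s : '' }
  end until original == string
  string
end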
puts remove("!????!!!?").inspect # answer == "!"
puts remove("!???!!").inspect # answer == ""
puts remove("!????!!").inspect # answer == "!????!!"

Taking a string and returning it with vowels removed

I'm attempting to write a function that takes a string and returns it with all vowels removed. Below is my code.
def vowel(str)
result = ""
new = str.split(" ")
i = 0
while i < new.length
if new[i] == "a"
i = i + 1
elsif new[i] != "a"
result = new[i] + result
end
i = i + 1
end
return result
end
When I run the code, it returns the exact string that I entered for (str). For example, if I enter "apple", it returns "apple".
This was my original code. It had the same result.
def vowel(str)
result = ""
new = str.split(" ")
i = 0
while i < new.length
if new[i] != "a"
result = new[i] + result
end
i = i + 1
end
return result
end
What am I doing wrong with this approach?
Finding the bug
Let's see what's wrong with your original code by executing your method's code in IRB:
$ irb
irb(main):001:0> str = "apple"
#=> "apple"
irb(main):002:0> new = str.split(" ")
#=> ["apple"]
Bingo! ["apple"] is not the expected result. What does the documentation for String#split say?
split(pattern=$;, [limit]) → anArray
Divides str into substrings based on a delimiter, returning an array of these substrings.
If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.
Our pattern is a single space, so split returns an array of words. This is definitely not what we want. To get the desired result, i.e. an array of characters, we could pass an empty string as the pattern:
irb(main):003:0> new = str.split("")
#=> ["a", "p", "p", "l", "e"]
"split on empty string" feels a bit hacky and indeed there's another method that does exactly what we want: String#chars
chars → an_array
Returns an array of characters in str. This is a shorthand for str.each_char.to_a.
Let's give it a try:
irb(main):004:0> new = str.chars
#=> ["a", "p", "p", "l", "e"]
Perfect, just as advertised.
Another bug
With the new method in place, your code still doesn't return the expected result (I'm going to omit the IRB prompt from now on):
vowel("apple") #=> "elpp"
This is because
result = new[i] + result
prepends the character to the result string. To append it, we have to write
result = result + new[i]
Or even better, use the append method String#<<:
result << new[i]
Let's try it:
def vowel(str)
result = ""
new = str.chars
i = 0
while i < new.length
if new[i] != "a"
result << new[i]
end
i = i + 1
end
return result
end
vowel("apple") #=> "pple"
That looks good, "a" has been removed ("e" is still there, because you only check for "a").
Now for some refactoring.
Removing the explicit loop counter
Instead of a while loop with an explicit loop counter, it's more idiomatic to use something like Integer#times:
new.length.times do |i|
# ...
end
or Range#each:
(0...new.length).each do |i|
# ...
end
or Array#each_index:
new.each_index do |i|
# ...
end
Let's apply the latter:
def vowel(str)
result = ""
new = str.chars
new.each_index do |i|
if new[i] != "a"
result << new[i]
end
end
return result
end
Much better. We don't have to worry about initializing the loop counter (i = 0) or incrementing it (i = i + 1) any more.
Avoiding character indices
Instead of iterating over the character indices via each_index:
new.each_index do |i|
if new[i] != "a"
result << new[i]
end
end
we can iterate over the characters themselves using Array#each:
new.each do |char|
if char != "a"
result << char
end
end
Removing the character array
We don't even have to create the new character array. Remember the documentation for chars?
This is a shorthand for str.each_char.to_a.
String#each_char passes each character to the given block:
def vowel(str)
result = ""
str.each_char do |char|
if char != "a"
result << char
end
end
return result
end
The return keyword is optional. We could just write result instead of return result, because a method's return value is the last expression that was evaluated.
Removing the explicit string
Ruby even allows you to pass an object into the loop using Enumerator#with_object, thus eliminating the explicit result string:
def vowel(str)
str.each_char.with_object("") do |char, result|
if char != "a"
result << char
end
end
end
with_object passes "" into the block as result and returns it (after the characters have been appended within the block). It is also the last expression in the method, i.e. its return value.
You could also use if as a modifier, i.e.:
result << char if char != "a"
Alternatives
There are many different ways to remove characters from a string.
Another approach is to filter out the vowel characters using Enumerable#reject (it returns a new array containing the remaining characters) and then join the characters (see Nathan's answer for a version to remove all vowels):
def vowel(str)
str.each_char.reject { |char| char == "a" }.join
end
For basic operations like string manipulation however, Ruby usually already provides a method. Check out the other answers for built-in alternatives:
str.delete('aeiouAEIOU') as shown in Gagan Gami's answer
str.tr('aeiouAEIOU', '') as shown in Cary Swoveland's answer
str.gsub(/[aeiou]/i, '') as shown in Avinash Raj's answer
Naming things
Cary Swoveland pointed out that vowel is not the best name for your method. Choose the names for your methods, variables and classes carefully. It's desirable to have a short and succinct method name, but it should also communicate its intent.
vowel(str) obviously has something to do with vowels, but it's not clear what it is. Does it return a vowel or all vowels from str? Does it check whether str is a vowel or contains a vowel?
remove_vowels or delete_vowels would probably be a better choice.
Same for variables: new is an array of characters. Why not call it characters (or chars if space is an issue)?
Bottom line: read the fine manual and get to know your tools. Most of the time, an IRB session is all you need to debug your code.
I would use a regex.
str.gsub(/[aeiou]/i, "")
> string= "This Is my sAmple tExt to removE vowels"
#=> "This Is my sAmple tExt to removE vowels"
> string.delete 'aeiouAEIOU'
#=> "Ths s my smpl txt t rmv vwls"
You can create a method like this:
def remove_vowel(str)
result = str.delete 'aeiouAEIOU'
return result
end
remove_vowel("Hello World, This is my sample text")
# output : "Hll Wrld, Ths s my smpl txt"
Assuming you're trying to learn about the basics of programming, rather than finding the quickest one-liner to do this (which would be to use a regular expression as Avinash has said), you have a number of problems with your code you need to change.
new = str.split(" ")
This line is likely the culprit, because it splits the string based on spaces. So your input string would have to be "a p p l e" to have the effect you're looking for.
new = str.split("")
You should also remove the duplicate i = i+1 once you've changed that.
As others have already identified the problems with the OP's code, I will merely suggest an alternative; namely, you could use String#tr:
"Now is the time for all good people...".tr('aeiouAEIOU', '')
#=> "Nw s th tm fr ll gd ppl..."
If regex is not allowed, you can do it this way:
def remove_vowels(string)
string.split("").delete_if { |letter| %w[a e i o u].include? letter }.join
end
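The method above only removes lowercase vowels. A possible variant that also handles uppercase ones, still without a regex, is sketched below; VOWELS and remove_all_vowels are hypothetical names chosen for this example.

```ruby
VOWELS = "aeiouAEIOU".freeze

# Hypothetical variant of the method above that also drops uppercase vowels.
def remove_all_vowels(string)
  string.each_char.reject { |letter| VOWELS.include?(letter) }.join
end

puts remove_all_vowels("Apple pie") # prints: ppl p
```

Unlike delete_if on a split array, each_char.reject does not build an intermediate array that is then modified in place, and the original string is left unchanged.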

Converting even indexed characters in a string to uppercase ruby

I need to convert all the even indexed characters in a string to become uppercase, while the odd indexed characters stay lowercase. I've tried this, but it keeps failing and I'm not sure why. I'd appreciate some help!
for i in 0..string.length
if (i % 2) == 0
string[i].upcase
else
string[i].downcase
end
end
"foobar".gsub(/(.)(.?)/){$1.upcase + $2.downcase} # => "FoObAr"
"fooba".gsub(/(.)(.?)/){$1.upcase + $2.downcase} # => "FoObA"
There you go:
string = "asfewfgv"
(0...string.size).each do |i|
string[i] = i.even? ? string[i].upcase : string[i].downcase
end
string # => "AsFeWfGv"
Rubyists don't usually use for loops, which is why I wrote the code above. But here is a corrected version of yours:
string = "asfewfgv"
for i in 0...string.length # here ... instead of ..
string[i] = if (i % 2) == 0
string[i].upcase
else
string[i].downcase
end
end
string # => "AsFeWfGv"
You were on the right track; you just forgot to assign the result back to string[i] after upcasing or downcasing.
You have two problems with your code:
for i in 0..string.length should be for i in 0...string.length to make the last character evaluated string[string.length-1], rather than going past the end of the string (string[string.length]); and
string[i] must be an L-value; that is, you must have, for example, string[i] = string[i].upcase.
You can correct your code and make it more idiomatic as follows:
string = "The cat in the hat"
string.length.times do |i|
string[i] = i.even? ? string[i].upcase : string[i].downcase
end
string
#=> "ThE CaT In tHe hAt"
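If you would rather not modify the original string, the same result can be built as a new string. This is a sketch of one way to do it; alternate_case is a hypothetical method name.

```ruby
# Hypothetical helper: returns a new string with even-indexed characters
# upcased and odd-indexed characters downcased; the argument is not changed.
def alternate_case(str)
  str.each_char.with_index.map { |char, i| i.even? ? char.upcase : char.downcase }.join
end

p alternate_case("The cat in the hat") # => "ThE CaT In tHe hAt"
p alternate_case("foobar")             # => "FoObAr"
```

Because nothing is assigned to string[i], this version also works on frozen strings.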

How do I use gets when running a file from the terminal?

I'm trying to execute this program from the command line, and I'm not able to use gets.chomp, instead, it returns the key value.
I am entering: ruby name_of_file.rb name_of_file.txt
def caesar_cipher(key)
s = gets.chomp
encoded = ""
s.each_byte do |l|
if ((l >= 65 && l <= 90) || (l >= 97 && l <= 122))
encoded += (l+key).chr
else
encoded += l.chr
end
end
encoded
end
File.readlines(ARGV[0]).map(&:to_i).each {|key| puts caesar_cipher(key)}
I know the program does not execute the caesar cipher completely, I am just trying to figure out how to run it from the command line without having to use pry or irb.
You want to manually enter the cipher key?
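Use STDIN.gets. Kernel#gets reads from the files named in ARGV when there are any and only falls back to standard input otherwise, so with name_of_file.txt on the command line it returns a line of your key file.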
@vgoff has the answer, but here's how I'd rewrite the code to be more readable:
def caesar_cipher(key)
encoded = ""
s = STDIN.gets.chomp
s.each_char do |l|
case l
when 'A' .. 'Z', 'a' .. 'z'
encoded += (l.ord + key).chr
else
encoded += l
end
end
encoded
end
# File.readlines(ARGV[0]).map(&:to_i).each {|key| puts caesar_cipher(key)}
puts caesar_cipher(0)
puts caesar_cipher(1)
Instead of splitting characters into bytes, I'd probably use each_char to maintain the character-encoding. I'd use a case statement to let me use two ranges to define upper and lower-case characters cleanly, and use ord to get the actual ordinal value for a character, instead of the byte.
It's more readable, but might not fully satisfy your needs.
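The question mentions that the cipher itself is incomplete: adding the key to a letter can run past "Z" or "z". Below is a minimal sketch of the wrap-around, kept separate from the input handling. caesar_shift is a hypothetical name, and the text is passed in as an argument instead of being read with STDIN.gets.

```ruby
# Hypothetical helper: shifts ASCII letters by `key` positions and wraps
# around the alphabet; every other character is passed through unchanged.
def caesar_shift(text, key)
  text.each_char.map do |char|
    case char
    when "A".."Z" then ((char.ord - "A".ord + key) % 26 + "A".ord).chr
    when "a".."z" then ((char.ord - "a".ord + key) % 26 + "a".ord).chr
    else char
    end
  end.join
end

puts caesar_shift("Hello, World!", 3) # prints: Khoor, Zruog!
puts caesar_shift("xyz", 3)           # prints: abc
```

In the script from the question, this could be called as caesar_shift(STDIN.gets.chomp, key) for each key read from the file.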

Ruby: Strings do not match

I am testing two strings to see if they are equal.
One string is just a simple string: "\17"
The other is parsed to: "\17"
num = 7
num2 = "\17"
parsed_num = "\1#{num}"
puts parsed_num.class
puts num2.class
if parsed_num == num2
puts 'Equal'
else
puts 'Not equal'
end
It returns:
String
String
Not equal
My goal is to have parsed_num exactly the same as the literal num2
I am going to take the opposite answer and assume that "\17" is correct, then consider this code:
num = 7
num2 = "\17"
puts "ni #{num2.inspect}"
# extra \ to fix error, for demo
parsed_num = "\\1#{num}"
puts "pi #{parsed_num.inspect}"
# for note, but ICK!!!
p2 = eval('"' + parsed_num + '"')
puts "p2i #{p2.inspect}"
puts "p2= #{p2 == num2}"
dec = (10 + num).to_s.oct
p3 = dec.chr
puts "p3i #{p3.inspect}"
puts "p3= #{p3 == num2}"
Result:
ni "\017"
pi "\\17"
p2i "\017"
p2= true
p3i "\017"
p3= true
The reason why "\1#{num}" didn't work is that string literals -- and the embedded escape sequences -- are handled during parsing while the string interpolation itself (#{}) happens later, at run-time. (This is required, because who knows what may happen to be in num?)
In the case of p2 I used eval, which parses and then executes the supplied code. The code there is equivalent to eval('"\17"'), because parsed_num contained the 3-letter string: \17. (Please note, this approach is generally considered bad!)
In the case of p3 I manually did what the parser does for string interpolation of \octal: took the value of octal, in, well, octal, and then converted it into the "character" with the corresponding value.
Happy coding.
If you use the "\17" backslash escape, it is interpreted as the octal escape "\017": 17 in octal equals F in hex (15 in decimal):
"\17" # => "\u000F"
because your string uses double quotes.
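Looking at the bytes of each string makes the difference visible. This is a small illustrative sketch; the variable names are made up for the example.

```ruby
num = 7

escaped      = "\17"      # octal escape, resolved when the literal is parsed
literal      = '\17'      # single quotes: backslash, "1", "7"
interpolated = "\1#{num}" # "\1" is resolved first, then "7" is appended

p escaped.bytes      # => [15]
p literal.bytes      # => [92, 49, 55]
p interpolated.bytes # => [1, 55]
```

The interpolated string from the question therefore holds the character with code 1 followed by the digit "7", which is why it does not compare equal to "\17".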
You can achieve what you want with a snippet like this, for example:
num = 7
num2 = "\\17"
parsed_num = "\\1#{num}"
if parsed_num == num2
puts 'Equal'
else
puts 'Not equal'
end
# => Equal
As you can see, this works because one backslash is used to escape the other :)
Use single quotes so that the strings involved are the literal things you are setting:
num = 7
num2 = '\17'
parsed_num = '\1' + String(num)
if parsed_num == num2
puts 'Equal'
else
puts 'Not equal'
end
This prints "Equal", the desired result. Inside single quotes the backslash in \1 is kept as a literal character, whereas double quotes turn \17 into an octal escape.

Resources