Modify string class to use only uppercase - ruby

I have a small application that processes some basic data (Names, birthdate, etc.). It will be interfacing with a management system that only accepts uppercase strings. Thinking of ways to go about this, I know I could just use .upcase for all the variables. I figured the most DRY way would be to modify the String class itself and make a conversion, but could not find any documentation as to the method within String that actually takes in the value of said string. The more I think about it, I also do not know what the implications of doing it this way would be (if it's even possible).
I tried monkey patching the String class
class String
  def initialize
    self = self.upcase
  end
end
Or
class String
  def new(str="")
    new_str = str.upcase
  end
end
But I haven't found any info on how a string is actually initialized.
Tl;Dr
How can I convert a lowercase string to uppercase on said string's initialization?
Are there any implications I should be aware of if it is possible?
Thank you for your time.

The solution here is not to boil the ocean by forcing every string in Ruby to uppercase, but to uppercase the values that system needs, if and when you provide them to that system.
Changing fundamental Ruby classes in this dramatic a way is bound to cause your entire code-base to implode. Many internals depend on being able to store arbitrary data in strings, and if those strings are arbitrarily uppercased you're in big trouble. It's like redefining what Integer#+ does. You can, but you really, really shouldn't. This would be akin to redefining the electrical charge of a proton. The universe would literally explode.
It's better to write some kind of adapter method that can operate on arbitrary strings or values and make sure they conform to whatever quirks or encoding your other system uses:
def to_archaic(string)
  string.upcase
end
If, for example, they don't allow accented characters or emoji, you'll need to strip those out and/or convert them to something else. Maybe "é" becomes "E" or maybe you just delete it.
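As a rough sketch of such an adapter, assuming the other system accepts only uppercase ASCII: the method name `to_legacy_format` and the exact stripping rules are hypothetical, and the real rules depend on what that system tolerates. It relies on `String#unicode_normalize`, available in Ruby 2.2 and later.

```ruby
# Hypothetical adapter: the name and the exact rules are assumptions.
def to_legacy_format(value)
  value.to_s
       .unicode_normalize(:nfd)   # split "é" into "e" plus a combining accent
       .gsub(/\p{Mn}/, "")        # drop the combining marks
       .upcase
       .encode("US-ASCII", undef: :replace, invalid: :replace, replace: "") # delete anything still non-ASCII
end

puts to_legacy_format("café")   # CAFE
```

Calling the adapter only at the boundary with the other system keeps every other string in the application untouched.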

Related

Evaluating a frozen string

My vague understanding is that, with Ruby 2.2's freeze method on strings or Ruby 2.3's frozen-string-literal: true pragma, a relevant frozen string literal is evaluated only once throughout program execution if and only if the string does not have interpolation. The following seems to illustrate this:
Not interpolated
#frozen-string-literal: true
5.times{p "".object_id}
Outputs (same object IDs):
70108065381260
70108065381260
70108065381260
70108065381260
70108065381260
Interpolated
#frozen-string-literal: true
5.times{p "#{}".object_id}
Outputs (different object IDs):
70108066220720
70108066220600
70108066220420
70108066220300
70108066220180
1. What is this property (i.e., being evaluated only once) called? It should be distinct from immutability.
2. Is my understanding of the condition when strings come to have such property correct? Where is the official documentation mentioning this?
3. Is there a way to make an interpolated string be evaluated only once?
1. Interning. The strings are said to be interned.
2. Not completely. It is more like if the interpreter can decide what the value of the string would be before evaluating it. For example, consider:
5.times { puts "#{'foo'}".object_id }
The id is the same even though there is interpolation involved.
3. No. This is an internal optimization. The main point of Object#freeze is immutability.
UPDATE: Only literal strings get interned. This is evident here.
I couldn't find the part of the code responsible for interpolation. So I'm not sure why "#{'foo'}" is considered a literal string. Note that wherever this translation occurs, it is on a lower parser level and happens way before any actual processing. This is evident by the fact that String#freeze is mapped to rb_str_freeze, which doesn't call opt_str_freeze.
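Interning can also be observed directly, without relying on how the parser treats literals. The sketch below assumes Ruby 2.5 or later, where the unary minus operator (`String#-@`) returns a frozen, deduplicated copy of a string; the variable names are only for illustration.

```ruby
# Two strings built at run time, so the parser cannot pre-compute them.
first  = "wait" + "ing"
second = "wa" + "iting"

puts first == second        # true: same content
puts first.equal?(second)   # false: two separate objects

# String#-@ returns the interned (frozen, deduplicated) version.
interned_first  = -first
interned_second = -second

puts interned_first.equal?(interned_second)   # true: one shared object
puts interned_first.frozen?                   # true
```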
"Frozen" is not about whether the string is evaluated more than once. It is, you are right, about mutability.
A string literal will be evaluated every time the line containing it is encountered.
The (only) way to make it be evaluated only once is to put it in a line of source code that is only executed once, instead of in a loop. A string literal in a loop (or any other part of source code) will always be evaluated every time that line of source code is executed in program flow.
This is indeed a separate matter from whether it is frozen/immutable or not, once evaluated.
The accepted answer is kind of misleading. "It is more like if the interpreter can decide what the value of the string would be before evaluating it." Nope. Not at all. It needs to be evaluated. If the string is frozen, then once it IS evaluated, it will use the same location in memory and the same object/object_id (which are two ways of saying the same thing) as all other equivalent strings. But it's still being evaluated, with or without interpolation.
(Without interpolation, 'evaluation' of a string literal is very very quick. With simple interpolation it's usually pretty quick too. You can of course use interpolation to call out to an expensive method though, hypothetically).
Without interpolation, I wouldn't worry about it at all. With interpolation, if you think your interpolation is expensive enough you don't want to do it in a loop -- the only way to avoid it is not to do it in a loop, but create the string once outside the loop.
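A minimal sketch of that advice, using a hypothetical `expensive_lookup` lambda as a stand-in for a costly call and a counter to show how often the interpolation runs:

```ruby
calls = 0
# Hypothetical stand-in for an expensive method used in interpolation.
expensive_lookup = -> { calls += 1; "report" }

# Interpolation inside the loop: the expression runs on every iteration.
3.times { "Generating #{expensive_lookup.call}" }
calls_in_loop = calls

# Interpolation hoisted out of the loop: the expression runs once.
calls = 0
title = "Generating #{expensive_lookup.call}"
3.times { title.length }
calls_hoisted = calls

puts calls_in_loop   # 3
puts calls_hoisted   # 1
```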
Ruby docs probably talk about "String literals" rather than "literal Strings". A "String literal" is any String created by bytes in source code (using '', "", %Q[], or any of the other ways of creating string literals in source code in Ruby), with or without interpolation.
So what kinds of Strings aren't created by String literals? Well, a string created by reading in bytes from a file or network for instance. Or a String created by taking an existing string and calling a method on it that returns a copy, like some_string.dup. "String literal" means a string created literally in source code, rather than by reading from external input. http://ruby-doc.org/core-2.1.1/doc/syntax/literals_rdoc.html

Special character uppercase

I have strings with a bunch of special characters. This works:
myString.upcase.tr('æ-ý','Æ-Ý')
However, it is not really cross-platform. My Ruby implementation on Windows does not handle this (on my Mac and Linux machines it works like a charm). Any pointers, workarounds, or solutions would be really appreciated!
Try mb_chars method if you are using Rails >= 3. For example,
'æ-ý'.mb_chars.upcase
=> "Æ-Ý"
If you're not using Rails, please try the unicode gem.
Unicode::upcase('æ-ý')
Or you can override String class methods as well:
require "unicode"
class String
  def downcase
    Unicode::downcase(self)
  end
  def downcase!
    self.replace downcase
  end
  def upcase
    Unicode::upcase(self)
  end
  def upcase!
    self.replace upcase
  end
  def capitalize
    Unicode::capitalize(self)
  end
  def capitalize!
    self.replace capitalize
  end
end
Unfortunately, it is impossible to correctly upcase/downcase a string without knowing the language and, in some cases, even the contents of the string.
For example, in English the uppercase variant of i is I and the lowercase variant of I is i, but in Turkish the uppercase variant of i is İ and the lowercase variant of I is ı. In German, the uppercase variant of ß is SS, but so is the uppercase variant of ss, so to downcase, you need to understand the text, because e.g. MASSE could be downcased to either masse (mass) or maße (measurements).
Before version 2.4, Ruby took the easy way out and only uppercased/downcased within the ASCII alphabet; Ruby 2.4 added Unicode case mapping to String#upcase and String#downcase.
However, that only explains why your workaround is needed, not why it sometimes works and sometimes doesn't. Provided that you use the same Ruby version and the same Ruby implementation and the same version of the implementation on all platforms, it should work. YARV doesn't use the underlying platform's string manipulation routines much (the same is true for most Ruby implementations, actually, even JRuby doesn't use Java's powerful string libraries but rolls its own for maximum compatibility), and it also doesn't use any third-party libraries (like e.g. ICU) except Onigmo, so it's unlikely that platform differences are to blame. Different versions of Ruby use different versions of the Unicode Character Database, though (e.g. I believe it was updated somewhere between 1.9 and 2.2 at least once), so if you have a version mismatch, that might explain it.
Or, it might be a genuine bug in YARV on Windows. Maybe try JRuby? It tends to be more consistent between platforms, in fact, on Windows, it is more compatible with Ruby than Ruby (i.e. YARV) itself!
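The language-dependent mappings mentioned in this answer can be tried out directly, assuming Ruby 2.4 or later, where `String#upcase` and `String#downcase` handle non-ASCII characters and accept a `:turkic` option. On older versions these calls leave non-ASCII characters unchanged or reject the option.

```ruby
# Default mapping versus the Turkic mapping for dotted and dotless i.
puts "i".upcase              # I
puts "i".upcase(:turkic)     # İ
puts "I".downcase(:turkic)   # ı

# German sharp s expands to two letters, and the round trip is lossy.
puts "ß".upcase              # SS
puts "MASSE".downcase        # masse (never "maße")

# The string from the question, without the tr workaround.
puts "æ-ý".upcase            # Æ-Ý
```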

In Ruby can data interpolated into a string cause the string to terminate?

In Ruby is there any way that data added to a string with interpolation can terminate the string? For example something like:
"This remains#{\somekindofmagic} and this does not" # => "This remains"
I'm assuming not but I want to be sure that doing something like
something.send("#{untrusted_input}=", more_untrusted_input)
doesn't actually leave some way that the interpolated string could be terminated and used to send eval.
Not possible with input string data AFAIK. Ruby Strings can contain arbitrary binary data, there should be no magic combination of bytes that terminates a String early.
If you are worried about "injection" style attacks on Ruby strings, then this is generally not easy to achieve if input is in the form of external data that has been converted to a string (and your specific concern about having an eval triggered cannot occur). This style of attack relies on code that passes an input string into some other interpreter (e.g. SQL or JavaScript) without properly escaping language constructs.
However, if String parameters are coming in the form of Ruby objects from untrusted Ruby code in the same process, it is possible to add side-effects to them:
class BadString
  def to_s
    puts "Payload"
    "I am innocent"
  end
end
b = BadString.new
c = "Hello #{b}"
Payload
=> "Hello I am innocent"
Edit: Your example
something.send("#{untrusted_input}=", more_untrusted_input)
would still worry me slightly. If untrusted_input really is untrusted, you are relying heavily on the fact that there are no methods ending in = that you would be unhappy to have called. Sometimes new methods can be defined on core classes due to use of a framework or gem, and you may not know about them, or they may appear in later versions of a gem. Personally I would whitelist allowed method names for that reason, or use some other validation scheme on the incoming data, irrespective of how secure you feel against open-ended evals.
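A minimal sketch of such a whitelist check, placed in front of the dynamic call. The constant `ALLOWED_ATTRIBUTES`, the helper `safe_assign`, and the `Person` struct are all hypothetical names chosen for illustration:

```ruby
# Hypothetical list of attribute names that untrusted input may set.
ALLOWED_ATTRIBUTES = %w[name email].freeze

# Reject any attribute that is not explicitly allowed before sending.
def safe_assign(target, attribute, value)
  unless ALLOWED_ATTRIBUTES.include?(attribute.to_s)
    raise ArgumentError, "attribute not allowed: #{attribute.inspect}"
  end
  target.public_send("#{attribute}=", value)
end

# Hypothetical target object with one setter we do not want exposed.
Person = Struct.new(:name, :email, :admin)
person = Person.new

safe_assign(person, "name", "Alice")
puts person.name   # Alice
```

Using `public_send` rather than `send` additionally keeps private setters out of reach, though the whitelist is what actually bounds the set of callable methods.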
Strings in Ruby are internally handled as an array of bytes on the heap and an integer that holds the length of the string. So while in C a NUL byte (\0) terminates a string, this cannot happen in Ruby.
More info on ruby string internals here: http://patshaughnessy.net/2012/1/4/never-create-ruby-strings-longer-than-23-characters (also includes why ruby strings longer than 23 bytes were slower in ruby 1.9).
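A short sketch illustrating both points: a NUL byte does not cut an interpolated string short, and code-like input simply becomes part of one (nonexistent) method name. The variable names and the sample inputs are made up for the example.

```ruby
# A NUL byte is just another byte in a Ruby string.
untrusted = "abc\0def"
message = "before #{untrusted} after"

puts message.length                # 20
puts message.end_with?(" after")   # true

# Code-like input is not evaluated; the whole text becomes the method name.
sneaky_name = "system('ls'); x"
result = begin
  Object.new.public_send("#{sneaky_name}=", 1)
  :called
rescue NoMethodError
  :no_such_method
end

puts result   # no_such_method
```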

How do you check for a changing value within a string

I am doing some localization testing and I have to test for strings in both English and Japanese. The English string might be 'Waiting time is {0} minutes.' while the Japanese string might be '待ち時間は{0}分です。' where {0} is a number that can change over the course of a test. Both of these strings come from their respective property files. How would I be able to check for the presence of the string as well as the number, which can change depending on the test that's running?
I should have added that I'm checking these strings on a web page which will display in the relevant language depending on the location from which it is being viewed. I'm using Watir to verify the text.
You can read elsewhere about various theories of the best way to do testing for proper language conversion.
One typical approach is to replace all hard-coded text matches in your code with constants, and then have a file that sets the constants, which can be updated based on the language in use. (I've seen that done by wrapping the require of that file in a case statement based on the language being tested.) Another approach is an array or hash for each value, indexed by a variable with a name like 'language', which lets the tests change the language on the fly. So validations would look something like this:
b.div(:id => "wait-time-message").text.should == WAIT_TIME_MESSAGE[language]
To match text where part is expected to change but fall within a predictable pattern, use a regular expression. I'd recommend a little reading about regular expressions in Ruby, especially Unicode regular expressions, as well as some experimenting with a tool like Rubular to test regexes.
In the case above a regex such as:
/Waiting time is \d+ minutes\./ or /待ち時間は\d+分です。/
would match the messages above and expect one or more digits in the middle (note that it would fail if no digits appear; if you want zero or more digits, then you would need a * in place of the +).
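Combining the per-language hash with the regexes, a sketch might look like the following. The names `WAIT_TIME_PATTERNS` and `wait_minutes` are hypothetical, and the page text would come from the browser (for example from the Watir element above) rather than from string literals. A capture group is added so the test can also check the number itself.

```ruby
# Hypothetical per-language patterns; the group captures the changing number.
WAIT_TIME_PATTERNS = {
  en: /Waiting time is (\d+) minutes\./,
  ja: /待ち時間は(\d+)分です。/
}.freeze

# Returns the number of minutes found in the text, or nil if it does not match.
def wait_minutes(text, language)
  match = WAIT_TIME_PATTERNS.fetch(language).match(text)
  match && match[1].to_i
end

puts wait_minutes("Waiting time is 12 minutes.", :en)   # 12
puts wait_minutes("待ち時間は7分です。", :ja)              # 7
```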
Don't check for the literal string. Check for some kind of intermediate form that can be used to render the final string.
Sometimes this is done by specifying a message and any placeholder data, like:
[ :waiting_time_in_minutes, 10 ]
Where that would render out as the appropriate localized text.
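One possible shape for that intermediate form, as a sketch only: the `TEMPLATES` hash and `render` helper are hypothetical, and `%d` stands in for the `{0}` placeholder used in the property files.

```ruby
# Hypothetical localized templates keyed by language and message name.
TEMPLATES = {
  en: { waiting_time_in_minutes: "Waiting time is %d minutes." },
  ja: { waiting_time_in_minutes: "待ち時間は%d分です。" }
}.freeze

# Renders a [message_key, *placeholder_values] pair for the given language.
def render(message, language)
  key, *values = message
  format(TEMPLATES.fetch(language).fetch(key), *values)
end

puts render([:waiting_time_in_minutes, 10], :en)   # Waiting time is 10 minutes.
puts render([:waiting_time_in_minutes, 10], :ja)   # 待ち時間は10分です。
```

A test can then assert on the pair `[:waiting_time_in_minutes, 10]` and leave the final wording to the templates.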
An alternative is to treat one of the languages as a template, something that's more limited in flexibility but works most of the time. In that case you could use the English version as the string that's returned and use a helper to render it to the final page.

Most concise way to test string equality (not object equality) for Ruby strings or symbols?

I always do this to test string equality in Ruby:
if mystring.eql?(yourstring)
  puts "same"
else
  puts "different"
end
Is this the correct way to do this without testing object equality?
I'm looking for the most concise way to test strings based on their content.
With the parentheses and question mark, this seems a little clunky.
According to http://www.techotopia.com/index.php/Ruby_String_Concatenation_and_Comparison
both
mystring == yourstring
and
mystring.eql? yourstring
are equivalent.
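A quick sketch of the difference between content equality and object equality, using two separate string objects with the same content (the variable names follow the question):

```ruby
mystring   = "hello"
yourstring = "hello".dup   # a distinct object with the same content

puts mystring == yourstring        # true: same content
puts mystring.eql?(yourstring)     # true: same content
puts mystring.equal?(yourstring)   # false: different objects
```

Only `equal?` compares object identity, so `==` is the shortest content comparison.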
Your code sample didn't expand on part of your topic, namely symbols, and so that part of the question went unanswered.
If you have two strings, foo and bar, and both can be either a string or a symbol, you can test equality with
foo.to_s == bar.to_s
It's a little more efficient to skip the string conversions on operands with known type. So if foo is always a string
foo == bar.to_s
But the efficiency gain is almost certainly not worth demanding any extra work on behalf of the caller.
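Wrapped in a small helper, that comparison might look like this sketch; `same_text?` is a hypothetical name, not a built-in method:

```ruby
# Compares two values by their text, whether each is a String or a Symbol.
def same_text?(foo, bar)
  foo.to_s == bar.to_s
end

puts same_text?(:name, "name")    # true
puts same_text?("name", :name)    # true
puts same_text?(:name, "other")   # false
puts :name == "name"              # false: without to_s, types must match
```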
Prior to Ruby 2.2, avoid interning uncontrolled input strings for the purpose of comparison (with strings or symbols), because symbols are not garbage collected, and so you can open yourself to denial of service through resource exhaustion. Limit your use of symbols to values you control, i.e. literals in your code, and trusted configuration properties.
Ruby 2.2 introduced garbage collection of symbols.

Resources