Evaluating a frozen string - ruby

My vague understanding is that, with Ruby 2.2's freeze method on strings or Ruby 2.3's frozen-string-literal: true pragma, a frozen string literal is evaluated only once throughout program execution if and only if the string does not contain interpolation. The following seems to illustrate this:
Not interpolated
#frozen-string-literal: true
5.times{p "".object_id}
Outputs (same object IDs):
70108065381260
70108065381260
70108065381260
70108065381260
70108065381260
Interpolated
#frozen-string-literal: true
5.times{p "#{}".object_id}
Outputs (different object IDs):
70108066220720
70108066220600
70108066220420
70108066220300
70108066220180
1. What is this property (i.e., being evaluated only once) called? It should be distinct from immutability.
2. Is my understanding of the conditions under which strings acquire this property correct? Where is the official documentation that mentions this?
3. Is there a way to make an interpolated string be evaluated only once?

1. Interning. The strings are said to be interned.
2. Not completely. It is more like this: if the interpreter can determine what the value of the string will be before evaluating it, the string is interned. For example, consider:
5.times { puts "#{'foo'}".object_id }
The id is the same even though there is interpolation involved.
3. No. This is an internal optimization. The main point of Object#freeze is immutability.
UPDATE: Only literal strings get interned. This is evident here.
I couldn't find the part of the code responsible for interpolation, so I'm not sure why "#{'foo'}" is considered a literal string. Note that wherever this translation occurs, it happens at a lower, parser level, well before any actual processing. This is evident from the fact that String#freeze is mapped to rb_str_freeze, which doesn't call opt_str_freeze.
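The snippet below is a quick way to observe the difference without relying on the magic comment. It assumes CRuby (MRI), where calling .freeze directly on a bare string literal is compiled to return one shared frozen object. A string built by interpolation is allocated anew on every iteration, even when its contents are identical. Other Ruby implementations may behave differently.

```ruby
# A bare literal followed by .freeze: CRuby hands back the same shared
# frozen object on every iteration.
frozen_literals = 3.times.map { "abc".freeze }

# Interpolation builds the string at run time, so each iteration allocates
# a new object even though the contents are always "abc".
x = "bc"
interpolated = 3.times.map { "a#{x}".freeze }

# The objects are kept in arrays, so their object_ids cannot be reused.
puts frozen_literals.map(&:object_id).uniq.size  # 1
puts interpolated.map(&:object_id).uniq.size     # 3
```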

"Frozen" is not about whether the string is evaluated more than once. It is, you are right, about mutability.
A string literal will be evaluated every time the line containing it is encountered.
The (only) way to make it be evaluated only once is to put it in a line of source code that is executed only once, instead of in a loop. A string literal in a loop (or in any other part of the source code) will be evaluated every time that line is executed.
This is indeed a separate matter from whether the string is frozen/immutable once evaluated.
The accepted answer is kind of misleading. "It is more like if the interpreter can decide what the value of the string would be before evaluating it." Nope. Not at all. It needs to be evaluated. If the string is frozen, then once it IS evaluated, it will use the same location in memory and the same object/object_id (which are two ways of saying the same thing) as all other equivalent strings. But it's still being evaluated, with or without interpolation.
(Without interpolation, 'evaluation' of a string literal is very very quick. With simple interpolation it's usually pretty quick too. You can of course use interpolation to call out to an expensive method though, hypothetically).
Without interpolation, I wouldn't worry about it at all. With interpolation, if you think your interpolation is expensive enough that you don't want to do it in a loop, the only way to avoid it is not to do it in the loop: create the string once, outside the loop.
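Here is a minimal sketch of that hoisting advice. The lambda expensive_label is a hypothetical stand-in for a costly computation, and it counts how often it runs. Interpolating it inside the loop evaluates it on every iteration. Building the interpolated part once before the loop evaluates it a single time and produces the same strings.

```ruby
# Hypothetical expensive computation that records how many times it runs.
calls = 0
expensive_label = -> { calls += 1; "report" }

# Interpolation inside the loop: the expression runs on every iteration.
inside = 5.times.map { |i| "#{expensive_label.call}-#{i}" }
calls_inside = calls

# Hoisted: the expensive part is interpolated once, before the loop.
calls = 0
prefix = "#{expensive_label.call}-"
outside = 5.times.map { |i| "#{prefix}#{i}" }
calls_outside = calls

puts calls_inside       # 5
puts calls_outside      # 1
puts inside == outside  # true
```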
Ruby docs probably talk about "String literals" rather than "literal Strings". A "String literal" is any String created by bytes in source code (using '', "", %Q[], or any of the other ways of creating strings literals in source code in ruby). With or without interpolation.
So what kinds of Strings aren't created by String literals? Well, a string created by reading in bytes from a file or network for instance. Or a String created by taking an existing string and calling a method on it that returns a copy, like some_string.dup. "String literal" means a string created literally in source code, rather than by reading from external input. http://ruby-doc.org/core-2.1.1/doc/syntax/literals_rdoc.html

Related

Get the same results from string.start_with? and string[ ]

Basically, I want to check if a string (main) starts with another string (sub), using both of the above methods. For example, following is my code:
main = gets.chomp
sub = gets.chomp
p main.start_with? sub
p main[/^#{sub}/]
And, here is an example with I/O - Try it online!
If I enter simple strings, both of them work exactly the same, but when I enter strings like "1\2" on stdin, I get errors in the Regexp variant, as seen in the TIO example.
I guess this is because the string passed into the second one isn't raw. So I tried passing sub.dump into the second one - Try it online!
which gives me nil result. How to do this correctly?
As a general rule, you should never ever blindly execute inputs from untrusted sources.
Interpolating untrusted input into a Regexp is not quite as bad as interpolating it into, say, Kernel#eval, because the worst thing an attacker can do with a Regexp is to construct an Evil Regex to conduct a Regular expression Denial of Service (ReDoS) attack (see also the section on Performance in the Regexp documentation), whereas with eval, they could execute arbitrary code, including but not limited to, deleting the entire file system, scanning memory for unencrypted passwords / credit card information / PII and exfiltrate that via the network, etc.
However, it is still a bad idea. For example, when I say "the worst thing that can happen is a ReDoS", that assumes that there are no bugs in the Regexp implementation (Onigmo in the case of YARV, Joni in the case of JRuby and TruffleRuby, etc.). Ruby's Regexps are quite powerful, and so Onigmo, Joni and co. are large and complex pieces of code that may very well have their own security holes, which a specially crafted Regexp could exploit.
You should properly sanitize and escape the user input before constructing the Regexp. Thankfully, the Ruby core library already contains a method which does exactly that: Regexp::escape. So, you could do something like this:
p main[/^#{Regexp.escape(sub)}/]
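Below is a small sketch that checks the escaped Regexp against start_with? on a few inputs. The helper name regexp_starts_with? is hypothetical. It uses \A rather than ^: in Ruby, ^ matches at the start of any line, so /^.../ can also match after a newline inside main, while \A only matches at the start of the whole string, which is what start_with? checks. Regexp#match? needs Ruby 2.4 or later, and it returns true/false like start_with? does, where main[...] returns the matched text or nil.

```ruby
# Hypothetical helper: does main begin with sub, treating sub as plain text?
def regexp_starts_with?(main, sub)
  # Regexp.escape neutralizes metacharacters such as \ . * ? [ ].
  # \A anchors at the start of the whole string (^ would also match
  # right after any newline inside main).
  main.match?(/\A#{Regexp.escape(sub)}/)
end

[
  ['1\2xyz', '1\2'],          # the characters 1, \, 2 followed by xyz
  ['abc', 'a.c'],             # "." must not act as a wildcard
  ["first\nsecond", 'second'] # text after a newline is not the start
].each do |main, sub|
  p [main.start_with?(sub), regexp_starts_with?(main, sub)]
end
```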
The reason why your attempt at using String#dump didn't work, is that String#dump is for representing a String the same way you would have to write it as a String literal, i.e. it is escaping String metacharacters, not Regexp metacharacters and it is including the quote characters around the String that you need to have it recognized as a String literal. You can easily see that when you simply try it out:
sub.dump
#=> "\"1\\\\2\""
# equivalent to '"1\\2"'
So, that means that String#dump:
- includes the quotes (which you don't want),
- escapes characters that don't need escaping in Regexp just because they need escaping in Strings (e.g. # or "), and
- doesn't escape characters that need escaping in Regexp but not in Strings (e.g. [, ., ?, *, +, ^, -).
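The difference is easy to see side by side. In this sketch, sub holds the three characters 1, \ and 2, as if read from stdin. p prints each result's inspect form, so backslashes appear doubled.

```ruby
sub = '1\2'   # the three characters 1, \, 2

p sub.dump            # => "\"1\\\\2\""  quotes added, \ doubled
p Regexp.escape(sub)  # => "1\\\\2"      no quotes, \ doubled

p 'a.b*'.dump            # => "\"a.b*\""    . and * left alone
p Regexp.escape('a.b*')  # => "a\\.b\\*"   . and * escaped
```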

In Ruby can data interpolated into a string cause the string to terminate?

In Ruby is there any way that data added to a string with interpolation can terminate the string? For example something like:
"This remains#{\somekindofmagic} and this does not" # => "This remains"
I'm assuming not but I want to be sure that doing something like
something.send("#{untrusted_input}=", more_untrusted_input)
doesn't actually leave some way that the interpolated string could be terminated and used to send eval.
Not possible with input string data AFAIK. Ruby Strings can contain arbitrary binary data, there should be no magic combination of bytes that terminates a String early.
If you are worried about "injection" style attacks on Ruby strings, then this is generally not easy to achieve if input is in the form of external data that has been converted to a string (and your specific concern about having an eval triggered cannot occur). This style of attack relies on code that passes an input string into some other interpreter (e.g. SQL or JavaScript) without properly escaping language constructs.
However, if String parameters are coming in the form of Ruby objects from untrusted Ruby code in the same process, it is possible to add side-effects to them:
class BadString
  def to_s
    puts "Payload"
    "I am innocent"
  end
end
b = BadString.new
c = "Hello #{b}"
# Payload
# => "Hello I am innocent"
Edit: Your example
something.send("#{untrusted_input}=", more_untrusted_input)
would still worry me slightly: if untrusted_input really is untrusted, you are relying heavily on the fact that there are no methods ending in = that you would be unhappy to have called. Sometimes new methods are defined on core classes by a framework or gem, and you may not know about them, or they may appear in later versions of a gem. Personally, I would whitelist the allowed method names for that reason, or use some other validation scheme on the incoming data, irrespective of how secure you feel against open-ended evals.
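Here is a minimal sketch of the whitelisting idea. Profile, ALLOWED_ATTRIBUTES and assign_attribute are hypothetical names standing in for whatever `something` and its attributes are. Input that is not on the list is rejected before any method is called. public_send is used instead of send so that private methods can never be reached.

```ruby
# Hypothetical model standing in for `something`.
Profile = Struct.new(:name, :email, :admin)

# Hypothetical allow-list: only these attributes may be set from input.
ALLOWED_ATTRIBUTES = %w[name email].freeze

def assign_attribute(target, attribute, value)
  unless ALLOWED_ATTRIBUTES.include?(attribute)
    raise ArgumentError, "attribute not allowed: #{attribute.inspect}"
  end
  # public_send refuses private methods, unlike send.
  target.public_send("#{attribute}=", value)
end

profile = Profile.new
assign_attribute(profile, "name", "Alice")

begin
  assign_attribute(profile, "admin", true)
rescue ArgumentError => e
  puts e.message  # attribute not allowed: "admin"
end
```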
Strings in Ruby are internally handled as an array of bytes on the heap plus an integer that holds the length of the string. So while in C a NUL byte (\0) terminates a string, this cannot happen in Ruby.
More info on ruby string internals here: http://patshaughnessy.net/2012/1/4/never-create-ruby-strings-longer-than-23-characters (also includes why ruby strings longer than 23 bytes were slower in ruby 1.9).
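A quick check of that claim, using the question's own example with a NUL byte interpolated into the middle. The byte simply becomes part of the string, and the text after it is kept.

```ruby
# A NUL byte interpolated into the middle of a string is just another byte.
s = "This remains#{"\0"} and this does not"

p s.include?("\0")                   # => true
p s.end_with?(" and this does not")  # => true
p s.bytesize == "This remains".bytesize + 1 + " and this does not".bytesize  # => true
```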

How do you check for a changing value within a string

I am doing some localization testing and I have to test for strings in both English and Japanese. The English string might be 'Waiting time is {0} minutes.' while the Japanese string might be '待ち時間は{0}分です。', where {0} is a number that can change over the course of a test. Both of these strings come from their respective property files. How can I check for the presence of the string as well as the number, which can change depending on the test that's running?
I should have added that I'm checking these strings on a web page, which displays in the relevant language depending on where it is being viewed from. I'm using Watir to verify the text.
You can read elsewhere about various theories of the best way to do testing for proper language conversion.
One typical approach is to replace all hard-coded text matches in your code with constants, and then have a file that sets the constants, which can be updated based on the language in use. (I've seen that done by wrapping the require of that file in a case statement based on the language being tested.) Another approach is an array or hash for each value, indexed by a variable with a name like 'language', which lets the tests change the language on the fly. So validations would look something like this:
b.div(:id => "wait-time-message").text.should == WAIT_TIME_MESSAGE[language]
To match text where part is expected to change but falls within a predictable pattern, use a regular expression. I'd recommend a little reading about regular expressions in Ruby, especially using Unicode regular expressions in Ruby, as well as some experimenting with a tool like Rubular to test regexes.
In the case above a regex such as:
/Waiting time is \d+ minutes\./ or /待ち時間は\d+分です。/
would match the messages above and expect one or more digits in the middle (note that it would fail if no digits appear; if you want zero or more digits, you would need a * in place of the +).
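The hand-written regexes above could also be generated from the property-file templates themselves, so each language needs no separate pattern. The sketch below assumes that placeholders always look like {N} and that the substituted values are integers. template_to_regexp is a hypothetical helper. Regexp.escape first makes every other character literal (including the period and the braces), and then each escaped placeholder is swapped for \d+.

```ruby
# Hypothetical helper: turn a property-file template such as
# "Waiting time is {0} minutes." into a Regexp that accepts a number
# in place of each {N} placeholder.
def template_to_regexp(template)
  # Escape everything so ".", "(", etc. match literally; this also
  # turns "{0}" into "\{0\}".
  escaped = Regexp.escape(template)
  # Replace each escaped placeholder with \d+ (one or more digits).
  # The block form keeps the replacement text literal.
  Regexp.new(escaped.gsub(/\\\{\d+\\\}/) { '\d+' })
end

en = template_to_regexp('Waiting time is {0} minutes.')
ja = template_to_regexp('待ち時間は{0}分です。')

p en.match?('Waiting time is 15 minutes.')    # => true
p ja.match?('待ち時間は15分です。')              # => true
p en.match?('Waiting time is soon minutes.')  # => false
```

In a Watir check, the generated pattern would be matched against the element's text instead of comparing for equality.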
Don't check for the literal string. Check for some kind of intermediate form that can be used to render the final string.
Sometimes this is done by specifying a message and any placeholder data, like:
[ :waiting_time_in_minutes, 10 ]
Where that would render out as the appropriate localized text.
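Here is a minimal sketch of rendering such an intermediate form. MESSAGES and render are hypothetical. In practice the catalogue would be loaded from the English and Japanese property files, and the test would compare the page text against the rendered string for the current locale.

```ruby
# Hypothetical message catalogue; real code would load the property files.
MESSAGES = {
  en: { waiting_time_in_minutes: 'Waiting time is {0} minutes.' },
  ja: { waiting_time_in_minutes: '待ち時間は{0}分です。' }
}.freeze

# Render a [key, *args] message for a locale by replacing each {N}
# placeholder with the N-th argument.
def render(locale, key, *args)
  MESSAGES.fetch(locale).fetch(key).gsub(/\{(\d+)\}/) do
    args.fetch(Regexp.last_match(1).to_i).to_s
  end
end

message = [:waiting_time_in_minutes, 10]
puts render(:en, *message)  # prints Waiting time is 10 minutes.
puts render(:ja, *message)  # prints 待ち時間は10分です。
```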
An alternative is to treat one of the languages as a template, something that's more limited in flexibility but works most of the time. In that case you could use the English version as the string that's returned and use a helper to render it to the final page.

Is it possible to use Column Properties in Expressions in Powerbuilder?

Say I have a field on a DataWindow that shows the value of a database column (Insert > Column). It has conditions under which it needs to be protected (Properties > General > Protect).
I want the field background to be grey when it's protected. At the moment, the only way I can work out how to do this is to copy the protect condition, no matter how complex, substituting colour values for the 1 (protect) and 0 (not protect).
Is there some syntax I can use in the Expression field for the column's background colour that references the protect value of the column? I tried
if (column.protect=1, Grey, White)
but it returns an error saying it expects a TRUE/FALSE condition.
Is what I'm after impossible, or is it just a matter of getting the right syntax?
Cheers.
Wow. You like complex, layered questions.
The first problem is accessing the value, which isn't done as directly as you described. As a matter of fact, you use Describe() to get the value. The only problem with that is that it comes back as a string in the following format, surrounded by quotes (note that we're using standard PowerScript string notation, where ~t is a tab):
"<DefaultValue>~t<Expression>"
You want the expression, so you'll have to parse it out, dropping the quotes as well.
Once you've got the expression, you'll need to evaluate it for the given row. That can be done with another Describe () call, particularly:
Describe ("Evaluate('<expression>', <rownum>)")
The row number that an expression is being evaluated on can be had with the GetRow() function.
This may sound like it needs PowerScript and some interim value storage, but as long as you're willing to make redundant function calls to get a given value more than once, you can do this in an expression, something like (for an example column b):
if (Describe ("Evaluate (~"" + Mid (Describe ("b.protect"),
Pos (Describe ("b.protect"), "~t")+1,
Len (Describe ("b.protect")) - Pos (Describe ("b.protect"), "~t") - 1)
+ "~", " + String (GetRow()) + ")")='1',
rgb(128, 128, 128),
rgb(255,255,255))
This looks complex, but if you put the Mid() expression in a compute field so you can see the result, you'll see that it simply parses out the Protect expression and puts it into the Describe (Evaluate()) syntax described above.
I have put one cheat into my code for simplicity. I used the knowledge that I only had single quotes in my Protect expression, and chose to put the Evaluate() expression string in double quotes. If I were trying to do this generically for any column, and couldn't assume an absence of double quotes in my Protect expression, I'd have to use a global function to replace any double quotes in the Protect expression with escaped quotes (~"), which I believe in your code would look like a triple tilde and a quote. Then again, if I had to make a global function call (note that global function calls in expressions can have a significant performance impact if there are a lot of rows), I'd just pass it the Describe ("column.protect") and GetRow() and build the entire expression in PowerScript, which would be easier to understand and maintain.
Good luck,
Terry.

Most concise way to test string equality (not object equality) for Ruby strings or symbols?

I always do this to test string equality in Ruby:
if mystring.eql?(yourstring)
  puts "same"
else
  puts "different"
end
Is this the correct way to do this without testing object equality?
I'm looking for the most concise way to test strings based on their content.
With the parentheses and question mark, this seems a little clunky.
According to http://www.techotopia.com/index.php/Ruby_String_Concatenation_and_Comparison
these two forms are equivalent:
mystring == yourstring
mystring.eql? yourstring
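A short check of the three comparisons. == and eql? compare contents, while equal? is the object-identity test that the question wants to avoid. String.new is used here only to guarantee two distinct objects with the same contents.

```ruby
a = String.new("same")
b = String.new("same")

p a == b       # => true   (same contents)
p a.eql?(b)    # => true   (same contents, both Strings)
p a.equal?(b)  # => false  (two different objects)
```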
Your code sample didn't expand on part of your topic, namely symbols, and so that part of the question went unanswered.
If you have two values, foo and bar, each of which can be either a string or a symbol, you can test equality with
foo.to_s == bar.to_s
It's a little more efficient to skip the string conversions on operands with known type. So if foo is always a string
foo == bar.to_s
But the efficiency gain is almost certainly not worth demanding any extra work on behalf of the caller.
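A small illustration of why the to_s conversion is needed. A Symbol never compares equal to a String with ==, even when the text is the same. The helper name same_name? is hypothetical.

```ruby
# Hypothetical helper that accepts a String or a Symbol on either side.
def same_name?(foo, bar)
  foo.to_s == bar.to_s
end

p :ready == "ready"            # => false (Symbol vs String)
p same_name?(:ready, "ready")  # => true
p same_name?("ready", :ready)  # => true
p same_name?(:ready, :done)    # => false
```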
Prior to Ruby 2.2, avoid interning uncontrolled input strings for the purpose of comparison (with strings or symbols), because symbols are not garbage collected, and so you can open yourself to denial of service through resource exhaustion. Limit your use of symbols to values you control, i.e. literals in your code, and trusted configuration properties.
Ruby 2.2 introduced garbage collection of symbols.
