Ruby: What does the comment "frozen_string_literal: true" do?

This is the rspec binstub in my project directory.
#!/usr/bin/env ruby
begin
load File.expand_path("../spring", __FILE__)
rescue LoadError
end
# frozen_string_literal: true
#
# This file was generated by Bundler.
#
# The application 'rspec' is installed as part of a gem, and
# this file is here to facilitate running it.
#
require "pathname"
ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../Gemfile",
Pathname.new(__FILE__).realpath)
require "rubygems"
require "bundler/setup"
load Gem.bin_path("rspec-core", "rspec")
What is this intended to do?
# frozen_string_literal: true

# frozen_string_literal: true is a magic comment, supported for the first time in Ruby 2.3, that tells Ruby that all string literals in the file are implicitly frozen, as if #freeze had been called on each of them. That is, if a string literal is defined in a file with this comment, and you call a method on that string which modifies it, such as <<, you'll get RuntimeError: can't modify frozen String (since Ruby 2.5 the error class is FrozenError, a subclass of RuntimeError).
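The comment must appear in the first comment section of the file, before any code; a shebang line may come before it. A magic comment placed after code is ignored (with warnings enabled, Ruby reports this). In the binstub above, the comment comes after the begin/rescue block that loads Spring, so it has no effect there.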
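In Ruby 2.3, you could use this magic comment to prepare for frozen string literals becoming the default in Ruby 3, which was the plan at the time; that plan was later dropped.
When Ruby 2.3 or later is run with the --enable=frozen-string-literal flag, string literals are frozen in all files. You can override the global setting per file with # frozen_string_literal: false. Ruby 3.4 instead makes literals in files without the comment "chilled": they can still be modified, but doing so emits a deprecation warning when deprecation warnings are enabled.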
If you want a string literal to be mutable regardless of the global or per-file setting, you can prefix it with the unary + operator (being careful with operator precedence) or call .dup on it:
# frozen_string_literal: true
"".frozen?
=> true
(+"").frozen?
=> false
"".dup.frozen?
=> false
You can also freeze a mutable (unfrozen) string with unary -.
Source: magic_comment defined in ruby/ruby
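A minimal sketch of these escape hatches, assuming Ruby 2.5 or later and that the magic comment really is the first line of the file (the variable names are illustrative):

```ruby
# frozen_string_literal: true

# Under the magic comment, every plain literal below is frozen.
literal = "hello"
# Unary + returns an unfrozen copy when the receiver is frozen.
plus = +"hello"
# dup always returns an unfrozen copy.
duped = "hello".dup
# Unary - returns a frozen string (possibly a deduplicated, shared one);
# `plus` itself stays mutable.
minus = -plus

puts literal.frozen? # => true
puts plus.frozen?    # => false
puts duped.frozen?   # => false
puts minus.frozen?   # => true
```

Unary + is handy inside expressions, while .dup reads more clearly on its own line.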

It improves application performance by not allocating new space for the same string each time the literal is evaluated, which also saves garbage-collection work. How? When you freeze a string literal (string object), you're telling Ruby not to let any part of your program modify it, so Ruby can safely hand back the same object every time.
Some obvious observations to keep in mind.
1. By freezing string literals, you're not allocating new memory space for them on every evaluation.
Example:
Without the magic comment, Ruby allocates new space for the same string on every call (observe the different object IDs printed):
def hello_id
a = 'hello'
a.object_id
end
puts hello_id #=> 70244568358640
puts hello_id #=> 70244568358500
With the magic comment, Ruby allocates the space only once:
# frozen_string_literal: true
def hello_id
a = 'hello'
a.object_id
end
puts hello_id #=> 70244568358640
puts hello_id #=> 70244568358640
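A rough way to observe the allocation difference yourself, sketched under the assumption that GC.stat(:total_allocated_objects) is a good enough proxy: count allocations around a loop of calls. The method name greeting and the loop size are arbitrary, and exact counts vary between Ruby versions, so the sketch only compares the delta with the number of calls.

```ruby
# frozen_string_literal: true

# Returns the same frozen 'hello' object on every call.
def greeting
  'hello'
end

calls = 10_000
before = GC.stat(:total_allocated_objects)
calls.times { greeting }
allocated = GC.stat(:total_allocated_objects) - before

# With the magic comment, far fewer objects than calls are allocated,
# because the literal is not copied on each evaluation.
puts allocated < calls          # => true
puts greeting.equal?(greeting)  # => true (same object both times)
```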
2. By freezing string literals, your program will raise an exception when trying to modify the string literal.
Example:
Without the magic comment, you can modify string literals:
name = 'Johny'
name << ' Cash'
puts name #=> Johny Cash
With the magic comment, an exception is raised when you modify a string literal:
# frozen_string_literal: true
name = 'john'
name << ' cash' #=> `<main>': can't modify frozen String (FrozenError)
puts name
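A sketch of handling the error and of building a mutable string instead, assuming Ruby 2.5 or later where the error class is FrozenError (on 2.3 and 2.4, rescue RuntimeError instead):

```ruby
# frozen_string_literal: true

name = 'john'
error_class = nil
begin
  name << ' cash' # raises: the literal is frozen
rescue FrozenError => e
  error_class = e.class
  warn "could not append: #{e.message}"
end

# Build the string from a mutable copy instead.
full_name = +'john'
full_name << ' cash'
puts full_name # => john cash

# Or avoid mutation entirely: String#+ returns a new, unfrozen string.
other = name + ' cash'
puts other # => john cash
```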
Further reading:
https://bugs.ruby-lang.org/issues/8976
https://www.mikeperham.com/2018/02/28/ruby-optimization-with-one-magic-comment/

For Ruby 3.0, Matz (Ruby’s creator) decided to make all String literals frozen by default.
EDIT 2019: he decided to abandon the idea of making frozen-string-literals default for Ruby 3.0 (source: https://bugs.ruby-lang.org/issues/11473#note-53)
You can already use it in Ruby 2.x (2.3 and later). Just add this comment at the top of your files:
# frozen_string_literal: true
The above comment at top of a file changes semantics of static string
literals in the file. The static string literals will be frozen and
always returns same object. (The semantics of dynamic string literals
is not changed.)
This way has following benefits:
No ugly f-suffix.
No syntax error on older Ruby.
We need only a line for each file.
Please read this topic for more information:
https://bugs.ruby-lang.org/issues/8976
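The static/dynamic distinction in the quote can be checked directly. A sketch: a plain literal is frozen under the comment, while an interpolated one is not on Ruby 3.0 and later. On 2.3 to 2.7, interpolated literals were frozen too, so that second result depends on the version.

```ruby
# frozen_string_literal: true

version = RUBY_VERSION
static  = "static"           # static literal: frozen under the comment
dynamic = "ruby #{version}"  # dynamic (interpolated) literal

puts static.frozen?  # => true
puts dynamic.frozen? # false on Ruby 3.0+, true on 2.3-2.7
```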

Related

How can I consistently prefix IRB return values with a custom comment string in Ruby >= 3.0.0?

I have the following in my ~/.irbrc file:
IRB.conf[:PROMPT][:DEFAULT][:RETURN].prepend ?#
In earlier Ruby versions, this would ensure that each return value was prefixed with a comment symbol rather than just the association token (e.g. #=> rather than =>), which allowed cutting-and-pasting into a REPL without the return values being evaluated. However, after upgrading to Ruby 3.0.0, it seems that newer versions of IRB occasionally wrap the output of long return values, and I'm not sure how to ensure all return values are properly commented out. For example, consider this now-typical output from an unrelated post:
s1 = Suggestion.new :foo, %w[Alice Bob]
#=> #<Suggestion:0x00007f9671154578 #participants=["Alice", "Bob"], #type=:foo>
s2 = Suggestion.new :bar, %w[Charlie Dana]
#=> #<Suggestion:0x00007faed7113900 #participants=:bar, #type=:foo>
Suggestion.all
#=>
[#<Suggestion:0x00007f9671154578 #participants=["Alice", "Bob"], #type=:foo>,
#<Suggestion:0x00007f9671089058
#participants=["Charlie", "Dana"],
#type=:bar>]
Here, the first two lines of code show the return values correctly preceded by a comment character, but the Array returned by the third line is printed starting on the line after the comment characters, so the value itself is not commented out. The Ruby 3.0.0 IRB module doesn't say anything about this wrapping behavior, or provide any obvious clues about how I can format multi-line return values consistently as comments.
How can I ensure that all lines of the return value in IRB are prefixed with a comment character?
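This does not configure IRB itself; it is only a sketch of the formatting step, assuming the goal is for every line of a pretty-printed value to start with a comment marker. The helper name commented_inspect is hypothetical, and hooking it into IRB's output is left open.

```ruby
require 'pp'

# Hypothetical helper: pretty-print obj (wrapping at `width` columns, the way
# long values get wrapped) and prefix every resulting line with "# ".
def commented_inspect(obj, width = 80)
  printed = PP.pp(obj, ''.dup, width) # PP.pp appends to the buffer and returns it
  printed.each_line.map { |line| "# #{line}" }.join
end

long = Array.new(6) { |i| { id: i, names: %w[Alice Bob] } }
puts commented_inspect(long, 40)
```

Because the prefix is applied after pretty-printing, it works no matter how many lines the wrapped value spans.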

Substring syntaxes in Ruby

Python has the following elegant syntax for checking whether one string is a substring of another one:
'ab' in 'abc' # True
Is there an equivalent elegant syntax in Ruby?
I'm aware of the "abc".include? "ab" Ruby syntax, but I'm wondering whether the inverse syntax exists too (where the first operand is the substring and the second is the string).
There isn't such a method in the Ruby standard library, but Rails' ActiveSupport provides the #in? method:
1.9.3-p484 :004 > "ab".in? "abc"
=> true
Here is the source code: https://github.com/rails/rails/blob/e20dd73df42d63b206d221e2258cc6dc7b1e6068/activesupport/lib/active_support/core_ext/object/inclusion.rb
Define "elegant".
This does a sub-string search and returns the "hit" if found:
'abc'['ab'] # => "ab"
Using !! converts the value returned to a true/false, so "ab" becomes true:
!!'abc'['ab'] # => true
Knowing that, it's trivial to add it in if you want something closer:
class String
def in?(other)
!!other[self]
end
end
'ab'.in?('abc') # => true
'ab'.in? 'abc' # => true
Or use require 'active_support/core_ext/object/inclusion' to cherry-pick the Active Support definition that extends all objects to allow in?. See http://edgeguides.rubyonrails.org/active_support_core_extensions.html#in-questionmark. The upside/downside to that is that it modifies all objects.
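If patching String globally is the concern, a refinement is one alternative (a different technique from the monkey patch above): the method is only visible in files that call using. A minimal sketch; the module name StringInclusion is made up, and in? is chosen only to mirror Active Support's name, so don't combine it with Active Support.

```ruby
# Hypothetical module name; the refinement is only active after `using`.
module StringInclusion
  refine String do
    # Same idea as the monkey patch above: String#[] returns the matched
    # substring or nil, and !! turns that into true or false.
    def in?(other)
      !!other[self]
    end
  end
end

using StringInclusion

puts 'abc'['ab'].inspect # => "ab"
puts 'abc'['xy'].inspect # => nil
puts 'ab'.in?('abc')     # => true
puts 'xy'.in?('abc')     # => false
```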

How to create a string with a "bad encoding" in ruby?

I have a file somewhere out in production that I do not have access to. When a Ruby script loads it, a regular expression against the contents fails with an ArgumentError: invalid byte sequence in UTF-8.
I believe I have a fix based on the answer with all the points here: ruby 1.9: invalid byte sequence in UTF-8
# Remove all invalid and undefined characters in the given string
# (ruby 1.9.3)
def safe_str str
# edited based on matt's comment (thanks matt)
s = str.encode('utf-16', 'utf-8', invalid: :replace, undef: :replace, replace: '')
s.encode!('utf-8', 'utf-16')
end
However, I now want to build my rspec to verify that the code works. I don't have access to the file that caused the problem, so I want to create a string with the bad encoding programmatically.
I've tried variations on things like:
bad_str = (100..1000).to_a.inject('') {|s,c| s << c; s}
bad_str.length.should > safe_str(bad_str).length
or,
bad_str = (100..1000).to_a.pack('c*')
bad_str.length.should > safe_str(bad_str).length
but the length is always the same. I have also tried different character ranges; not always 100 to 1000.
Any suggestions on how to build a string with an invalid encoding within a ruby 1.9.3 script?
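Lots of one-byte strings will make an invalid UTF-8 string, starting with 0x80. Note that 128.chr on its own returns an ASCII-8BIT (binary) string, which is valid in that encoding, so relabel it: 128.chr.force_encoding('UTF-8') should work.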
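A sketch of three ways to build an invalid UTF-8 string for a spec, plus the regex failure from the question. The variable names are illustrative; this assumes the script itself is saved as UTF-8 (the default source encoding since Ruby 2.0).

```ruby
# 0x80 is a UTF-8 continuation byte, so it cannot start a character.
from_chr    = 128.chr.force_encoding('UTF-8') # 128.chr alone is ASCII-8BIT
from_escape = "hi \255"                      # octal escape for byte 0xAD
from_pack   = [0x80, 0x41].pack('C*').force_encoding('UTF-8')

[from_chr, from_escape, from_pack].each do |s|
  puts s.valid_encoding? # => false
end

# Matching a regexp against a broken UTF-8 string raises ArgumentError.
begin
  from_escape =~ /hi/
rescue ArgumentError => e
  puts e.message # => invalid byte sequence in UTF-8
end
```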
Your safe_str method will (currently) never actually do anything to the string, it is a no-op. The docs for String#encode on Ruby 1.9.3 say:
Please note that conversion from an encoding enc to the same encoding enc is a no-op, i.e. the receiver is returned without any changes, and no exceptions are raised, even if there are invalid bytes.
This is true for the current release of 2.0.0 (patch level 247); however, a recent commit to Ruby trunk changes this and also introduces a scrub method (released in Ruby 2.1) that pretty much does what you want.
Until a new version of Ruby is released you will need to round trip your text string to another encoding and back to clean it, as in the second example in this answer to the question you linked to, something like:
def safe_str str
s = str.encode('utf-16', 'utf-8', invalid: :replace, undef: :replace, replace: '')
s.encode!('utf-8', 'utf-16')
end
Note that your first example of an attempt to create an invalid string won’t work:
bad_str = (100..1000).to_a.inject('') {|s,c| s << c; s}
bad_str.valid_encoding? # => true
From the << docs:
If the object is a Integer, it is considered as a codepoint, and is converted to a character before concatenation.
So you’ll always get a valid string.
Your second method, using pack, will create a string with the encoding ASCII-8BIT. If you then change this using force_encoding, you can create a UTF-8 string with an invalid encoding:
bad_str = (100..1000).to_a.pack('c*').force_encoding('utf-8')
bad_str.valid_encoding? # => false
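On Ruby 2.1 and later, the scrub method mentioned above is part of the core String class, so the round trip through UTF-16 is no longer needed. A minimal sketch:

```ruby
dirty = "abc\xFFdef" # byte 0xFF can never appear in valid UTF-8

puts dirty.valid_encoding?        # => false
clean = dirty.scrub('')           # drop invalid bytes
puts clean                        # => abcdef
puts dirty.scrub.valid_encoding?  # => true (invalid bytes become U+FFFD)

# With a block, the invalid bytes are passed in and replaced by its result.
puts dirty.scrub { |bytes| "<#{bytes.unpack1('H*')}>" } # => abc<ff>def
```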
Try with:
s = "hi \255"
s.valid_encoding?
# => false
The following example can be used for testing purposes:
describe TestClass do
let(:non_utf8_text) { "something\255 english." }
it 'does not raise an error on an invalid byte sequence string' do
expect(non_utf8_text).not_to be_valid_encoding
expect { subject.call(non_utf8_text) }.not_to raise_error
end
end
Thanks to Iwan B. for the "\255" advice.
In spec tests I’ve written, I haven’t found a way to fix this bad encoding:
Period%Basics
The %B string consistently produces ArgumentError: invalid byte sequence in UTF-8.

Clarifications on YAML syntax and Ruby parsing

I am new to YAML and Ruby. I am using the following Ruby code to parse a YAML file:
obj = YAML::load_file('test.yml')
Are the following YAML file contents for 'test.yml' valid?
Case 1:
test
In this case, I don't specify the value of test (something like test : true) but my Ruby parsing code does not throw an error. I thought this was an invalid YAML syntax.
Case 2:
:test : true
In this case, the Ruby code treats test as a symbol instead of a string, and when I do puts obj[:test], it returns "true". Is this a Ruby thing? Would other languages interpret it as the string ":test"?
Case 3:
:test : true
:test : false
In this case, instead of raising an error for the redefinition of :test, my Ruby code takes the latest value for :test (which is false). Why is this? Does YAML syntax allow elements to be redefined, with only the latest value taken?
Case 1: YAML allows unquoted scalars, or "bare" strings not enclosed in quotes. Compared to quoted strings they are less flexible, since you can't use certain characters without creating ambiguous syntax, but the Ruby parser does support them.
1.9.3-p448 > YAML::parse('test').to_ruby
=> "test"
Case 2: As you've guessed, this is Ruby-specific since YAML has no concept of "symbols". When converting a YAML mapping to a Ruby hash, scalar keys that start with a colon are interpreted as symbols instead of strings.
Case 3: Under YAML's definition of a mapping, keys must be unique, so a strict parser should throw an error when given your example. It seems the Ruby parser is more lenient, and allows the same key to be defined multiple times with a last-value-wins rule. This is also allowed in native Ruby hashes.
1.9.3-p448 > YAML::parse("test: true\ntest: false").to_ruby
=> {"test"=>false}
1.9.3-p448 > { 'test' => true, 'test' => false }
=> {"test"=>false}
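A sketch reproducing the three cases with Psych, the library behind YAML in Ruby. One caveat not covered above: since Psych 4 (bundled with Ruby 3.1), YAML.load and YAML.load_file behave like safe_load and reject Symbol keys unless they are permitted, so case 2 raises Psych::DisallowedClass there. The sketch uses safe_load with permitted_classes (available since Psych 3.1) so it behaves the same on older and newer versions.

```ruby
require 'yaml'

# Case 1: a bare scalar is a valid document; it loads as a String.
case1 = YAML.safe_load('test')
# Case 2: a leading colon makes the key a Symbol (Ruby-specific);
# safe_load only allows Symbols when they are explicitly permitted.
case2 = YAML.safe_load(':test: true', permitted_classes: [Symbol])
# Case 3: a duplicate key is accepted and the last value wins.
case3 = YAML.safe_load("test: true\ntest: false")

puts case1 == 'test'               # => true
puts case2 == { test: true }       # => true
puts case3 == { 'test' => false }  # => true
```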
A great way to learn how the YAML parser converts to and from Ruby structures is to write Ruby code that outputs YAML and look at what it's doing:
Here's a basic hash:
require 'yaml'
foo = {'test' => true} # => {"test"=>true}
foo.to_yaml # => "---\ntest: true\n"
A hash using a symbol as a key:
foo = {test: true}
foo.to_yaml # => "---\n:test: true\n"
A hash with conflicting keys, causing the first to be stomped on by the last:
foo = {test: true, test: false}
foo # => {:test=>false}
foo.to_yaml # => "---\n:test: false\n"
Ruby creates the hash before YAML sees it, and hashes can't have duplicate keys; if a key repeats, the later value replaces the earlier one.
The "Yaml Cookbook" at the YamlForRuby site is also a great resource.

What does the question mark at the end of a method name mean in Ruby?

What is the purpose of the question mark operator in Ruby?
Sometimes it appears like this:
assert !product.valid?
sometimes it's in an if construct.
It is a code style convention; it indicates that a method returns a boolean value (true or false) or an object to indicate a true value (or “truthy” value).
The question mark is a valid character at the end of a method name.
https://docs.ruby-lang.org/en/2.0.0/syntax/methods_rdoc.html#label-Method+Names
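A minimal sketch of the convention, using a hypothetical Product class that only stands in for the model in the question:

```ruby
# Hypothetical model class, used only to illustrate the naming convention.
class Product
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # The trailing ? is just part of the method name; by convention it marks
  # a predicate that answers a yes/no question.
  def valid?
    !name.nil? && !name.strip.empty?
  end
end

product = Product.new('')
puts product.valid?                # => false
puts !product.valid?               # => true, as in `assert !product.valid?`
puts Product.new('Widget').valid?  # => true
```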
Also note ? along with a character acts as shorthand for a single-character string literal since Ruby 1.9.
For example:
?F # => is the same as "F"
This is referenced near the bottom of the string literals section of the ruby docs:
There is also a character literal notation to represent single
character strings, which syntax is a question mark (?) followed by a
single character or escape sequence that corresponds to a single
codepoint in the script encoding:
?a #=> "a"
?abc #=> SyntaxError
?\n #=> "\n"
?\s #=> " "
?\\ #=> "\\"
?\u{41} #=> "A"
?\C-a #=> "\x01"
?\M-a #=> "\xE1"
?\M-\C-a #=> "\x81"
?\C-\M-a #=> "\x81", same as above
?あ #=> "あ"
Prior to Ruby 1.9, this returned the ASCII character code of the character. To get the old behavior in modern Ruby, you can use the #ord method:
?F.ord # => will return 70
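A quick sketch of the character literal next to #ord and its inverse, Integer#chr:

```ruby
f = ?F          # character literal: a one-character String
newline = ?\n   # escape sequences work too

puts f == "F"          # => true
puts f.ord             # => 70 (what ?F meant before Ruby 1.9)
puts newline == "\n"   # => true
puts 70.chr            # => F
```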
It's a convention in Ruby that methods that return boolean values end in a question mark. There's no more significance to it than that.
In your example it's just part of the method name. In Ruby you can also use exclamation points in method names!
Another example of question marks in Ruby would be the ternary operator.
customerName == "Fred" ? "Hello Fred" : "Who are you?"
It may be worth pointing out that ?s are only allowed in method names, not variables. In the process of learning Ruby, I assumed that ? designated a boolean return type so I tried adding them to flag variables, leading to errors. This led to me erroneously believing for a while that there was some special syntax involving ?s.
Relevant: Why can't a variable name end with `?` while a method name can?
In your example
product.valid?
is actually a method call: it calls a method named valid?. Certain types of "test for condition"/boolean methods have a question mark as part of the method name by convention.
I believe it's just a convention for things that are boolean. A bit like saying "IsValid".
It's also used in regular expressions, meaning "zero or one occurrence of the preceding character".
For example, the regular expression /hey?/ matches the strings "he" and "hey".
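A small sketch of the optional quantifier, using Regexp#match? (Ruby 2.4+):

```ruby
pattern = /hey?/ # "y?" means zero or one "y"

puts pattern.match?('he')   # => true
puts pattern.match?('hey')  # => true
puts 'heyyy'[pattern]       # => hey (? consumes at most one y)
puts pattern.match?('hi')   # => false
```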
It's also common to pass a character literal as the first argument of Kernel#test:
test ?d, "/dev" # directory exists?
# => true
test ?-, "/etc/hosts", "/etc/hosts" # are the files identical
# => true
as seen in this question here
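A self-contained sketch of Kernel#test with character literals, using a temporary directory instead of system paths so it runs anywhere (the file name example.txt is arbitrary):

```ruby
require 'tmpdir'

results = Dir.mktmpdir do |dir|
  path = File.join(dir, 'example.txt')
  File.write(path, 'hi')

  [
    test(?d, dir),         # directory exists?
    test(?e, path),        # file exists?
    test(?-, path, path),  # same file?
    test(?d, path)         # a regular file is not a directory
  ]
end

p results # => [true, true, true, false]
```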
