Clarifications on YAML syntax and Ruby parsing - ruby

I am new to YAML and Ruby. I am using the following Ruby code to parse a YAML file:
obj = YAML::load_file('test.yml')
Are the following YAML file contents for 'test.yml' valid?
Case 1:
test
In this case, I don't specify the value of test (something like test : true) but my Ruby parsing code does not throw an error. I thought this was an invalid YAML syntax.
Case 2:
:test : true
In this case, the Ruby code treats test as a symbol instead of a string, and when I do puts obj[:test], it prints "true". Is this a Ruby thing? Would other languages interpret it as the string ":test"?
Case 3:
:test : true
:test : false
In this case, instead of throwing up an error for redefinition of :test, my Ruby code takes the latest value for :test (which is false). Why is this? Does YAML syntax allow for re-definition of elements and in which case only the latest value gets taken?

Case 1: YAML allows unquoted scalars, or "bare" strings not enclosed in quotes. Compared to quoted strings they are less flexible, since you can't use certain characters without creating ambiguous syntax, but the Ruby parser does support them.
1.9.3-p448 > YAML::parse('test').to_ruby
=> "test"
Case 2: As you've guessed, this is Ruby-specific since YAML has no concept of "symbols". When converting a YAML mapping to a Ruby hash, scalar keys that start with a colon are interpreted as symbols instead of strings.
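As a rough illustration of the symbol-key behaviour, the sketch below loads a few one-line documents. It assumes a Psych version whose safe_load accepts the permitted_classes and symbolize_names keywords (Psych 3.1 or later); the variable names are only illustrative.

```ruby
require 'yaml'

# Plain keys load as Strings.
string_keys = YAML.safe_load("test: true")

# A plain scalar that starts with a colon is turned into a Symbol.
# safe_load only allows this when Symbol is explicitly permitted.
symbol_keys = YAML.safe_load(":test: true", permitted_classes: [Symbol])

# Quoting the key keeps it a String, leading colon included.
quoted_keys = YAML.safe_load("':test': true")

# symbolize_names converts String keys to Symbols after loading.
symbolized = YAML.safe_load("test: true", symbolize_names: true)

puts string_keys.keys.first.class  # String
puts symbol_keys.keys.first.class  # Symbol
puts quoted_keys.keys.first        # :test (a String)
puts symbolized.keys.first.class   # Symbol
```

A parser for another language would normally see the key in ":test: true" as the plain string ":test", since the conversion to a Symbol happens on the Ruby side.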
Case 3: Under YAML's definition of a mapping, keys must be unique, so a strict parser should throw an error when given your example. It seems the Ruby parser is more lenient, and allows the same key to be defined multiple times with a last-value-wins rule. This is also allowed in native Ruby hashes.
1.9.3-p448 > YAML::parse("test: true\ntest: false").to_ruby
=> {"test"=>false}
1.9.3-p448 > { 'test' => true, 'test' => false }
=> {"test"=>false}
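If you want stricter behaviour than last-value-wins, one possible approach is to inspect the parsed node tree before converting it to Ruby. The following is only a sketch: duplicate_keys is a hypothetical helper, and it checks just the top-level mapping and only scalar keys.

```ruby
require 'yaml'

# Hypothetical helper: returns the scalar keys that appear more than once
# in the top-level mapping of a YAML document.
def duplicate_keys(yaml_text)
  doc = YAML.parse(yaml_text)            # Psych::Nodes::Document, or false for empty input
  return [] unless doc

  mapping = doc.root
  return [] unless mapping.is_a?(Psych::Nodes::Mapping)

  # A mapping node stores its children as key, value, key, value, ...
  key_nodes = mapping.children.each_slice(2).map(&:first)
  names = key_nodes.select { |node| node.is_a?(Psych::Nodes::Scalar) }.map(&:value)

  names.group_by(&:itself).select { |_, group| group.size > 1 }.keys
end

p duplicate_keys("test: true\ntest: false")  # ["test"]
p duplicate_keys("a: 1\nb: 2")               # []
```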

A great way to learn how the YAML parser converts to and from Ruby structures is to write Ruby code that outputs YAML and look at what it produces.
Here's a basic hash:
require 'yaml'
foo = {'test' => true} # => {"test"=>true}
foo.to_yaml # => "---\ntest: true\n"
A hash using a symbol as a key:
foo = {test: true}
foo.to_yaml # => "---\n:test: true\n"
A hash with conflicting keys, causing the first to be stomped-on by the last:
foo = {test: true, test: false}
foo # => {:test=>false}
foo.to_yaml # => "---\n:test: false\n"
YAML is creating the hash, but hashes can't have duplicate keys. If a key is repeated, the later value replaces the earlier one.
"Yaml Cookbook" at the YamlForRuby site is also a great resource.

Related

YAML: error parsing a string containing a square bracket as its first character

I'm parsing a YAML file in Ruby and some of the input is causing a Psych syntax error:
require 'yaml'
example = "my_key: [string] string"
YAML.load(example)
Resulting in:
Psych::SyntaxError: (<unknown>): did not find expected key
while parsing a block mapping at line 1 column 1
from [...]/psych.rb:456:in `parse'
I received this YAML from an external API that I do not have control over. I can see that editing the input to force parsing as a string, using my_key: '[string] string', as noted in "Do I need quotes for strings in YAML?", fixes the issue. However, I don't control how the input is received.
Is there a way to force the input to be parsed as a string for some keys such as my_key? Is there a workaround to successfully parse this YAML?
One approach would be to process the response before reading it as YAML. Assuming it's a string, you could use a regex to replace the problematic pattern with something valid. I.e.
resp_str = "---\nmy_key: [string] string\n"
re = /(\: )(\[[a-z]*?\] [a-z]*?)(\n)/
# Use backreferences in the replacement; "#{$1}" would be interpolated before gsub! runs.
resp_str.gsub!(re, "\\1'\\2'\\3")
#=> "---\nmy_key: '[string] string'\n"
Then you can do
YAML.load(resp_str)
#=> {"my_key"=>"[string] string"}
It does not work because square brackets have a special meaning in YAML, denoting arrays:
YAML.load "my_key: [string]"
#⇒ {"my_key"=>["string"]}
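and [foo] bar is not valid, because a flow sequence cannot be followed by more text. One can escape the square brackets explicitly, although the backslashes then remain part of the parsed string: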
YAML.load "my_key: \\[string\\] string"
#⇒ {"my_key"=>"\\[string\\] string"}
Also, one might implement a custom Psych parser.
There is a very simple, native solution. If you want the value to be treated as a string, you can always put quotes around it:
YAML.load "my_key: '[string]'"
=> {"my_key"=>"[string]"}
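When the input cannot be changed at the source, the quoting can be applied in a pre-processing step. The sketch below is one way to do that under narrow assumptions: quote_bracket_values is a hypothetical helper, it only handles simple "key: value" lines, and it would also quote a genuine list that is followed by a trailing comment.

```ruby
require 'yaml'

# Hypothetical helper: single-quotes values that start with "[...]" but
# continue with more text on the same line, which YAML would otherwise reject.
def quote_bracket_values(text)
  text.gsub(/^([ \t]*[\w-]+:[ \t]+)(\[[^\]\n]*\][^\n]+)$/) do
    prefix, value = $1, $2
    # Inside a single-quoted YAML scalar, a single quote is written as ''.
    "#{prefix}'#{value.gsub("'", "''")}'"
  end
end

raw   = "---\nmy_key: [string] string\nlist: [a, b]\n"
fixed = quote_bracket_values(raw)
data  = YAML.safe_load(fixed)

puts fixed
puts data["my_key"]  # [string] string
```

A real list such as "list: [a, b]" is left alone because nothing follows the closing bracket on that line.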

Ruby: What does the comment "frozen_string_literal: true" do?

This is the rspec binstub in my project directory.
#!/usr/bin/env ruby
begin
load File.expand_path("../spring", __FILE__)
rescue LoadError
end
# frozen_string_literal: true
#
# This file was generated by Bundler.
#
# The application 'rspec' is installed as part of a gem, and
# this file is here to facilitate running it.
#
require "pathname"
ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../Gemfile",
Pathname.new(__FILE__).realpath)
require "rubygems"
require "bundler/setup"
load Gem.bin_path("rspec-core", "rspec")
What is this intended to do?
# frozen_string_literal: true
# frozen_string_literal: true is a magic comment, supported for the first time in Ruby 2.3, that tells Ruby that all string literals in the file are implicitly frozen, as if #freeze had been called on each of them. That is, if a string literal is defined in a file with this comment, and you call a method on that string which modifies it, such as <<, you'll get RuntimeError: can't modify frozen String.
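The comment must appear in the first comment section of the file, before any code; it may follow a shebang line or an encoding comment. In the binstub above it comes after executable code, so Ruby ignores it.
In Ruby 2.3, the magic comment was introduced as a way to prepare for frozen string literals becoming the default in Ruby 3, a plan that was later abandoned.
When Ruby 2.3 or later is run with the --enable=frozen-string-literal flag, string literals are frozen in all files. You can override the global setting with # frozen_string_literal: false.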
If you want a string literal to be mutable regardless of the global or per-file setting, you can prefix it with the unary + operator (being careful with operator precedence) or call .dup on it:
# frozen_string_literal: true
"".frozen?
=> true
(+"").frozen?
=> false
"".dup.frozen?
=> false
You can also freeze a mutable (unfrozen) string with unary -.
Source: magic_comment defined in ruby/ruby
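To experiment with the same operations without depending on the magic comment or a global flag, a string can be frozen explicitly. This is a small sketch using only core String methods; the variable names are illustrative.

```ruby
frozen = "abc".freeze

begin
  frozen << "d"
rescue RuntimeError => e   # FrozenError on Ruby 2.5+, which is a RuntimeError subclass
  error_class = e.class
end

mutable_copy = +frozen     # unary plus returns an unfrozen duplicate of a frozen string
mutable_copy << "d"

dup_copy = frozen.dup      # dup also returns an unfrozen copy
refrozen = -mutable_copy   # unary minus returns a frozen version of the string

puts mutable_copy          # abcd
puts frozen                # abc (the original is untouched)
puts dup_copy.frozen?      # false
puts refrozen.frozen?      # true
```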
It improves application performance by not allocating new space for the same string, which also saves garbage collection time. How? When you freeze a string literal (a String object), you're telling Ruby not to let any part of your program modify it.
Some obvious observations to keep in mind.
1. By freezing string literals, you avoid allocating new memory for the same literal each time it is evaluated.
Example:
Without magic comment allocates new space for the same string
(Observe the different object IDs printed)
def hello_id
  a = 'hello'
  a.object_id
end
puts hello_id #=> 70244568358640
puts hello_id #=> 70244568358500
With magic comment, ruby allocates space only once
# frozen_string_literal: true
def hello_id
  a = 'hello'
  a.object_id
end
puts hello_id #=> 70244568358640
puts hello_id #=> 70244568358640
2. By freezing string literals, your program will raise an exception when trying to modify the string literal.
Example:
Without magic comment, you can modify the string literals.
name = 'Johny'
name << ' Cash'
puts name #=> Johny Cash
With magic comment, an exception will be raised when you modify string literals
# frozen_string_literal: true
name = 'john'
name << ' cash' #=> `<main>': can't modify frozen String (FrozenError)
puts name
There's always more to learn:
https://bugs.ruby-lang.org/issues/8976
https://www.mikeperham.com/2018/02/28/ruby-optimization-with-one-magic-comment/
For Ruby 3.0, Matz (Ruby's creator) originally decided to make all String literals frozen by default.
EDIT 2019: he decided to abandon the idea of making frozen-string-literals default for Ruby 3.0 (source: https://bugs.ruby-lang.org/issues/11473#note-53)
You can use it in Ruby 2.x. Just add this comment at the top of your files, before any code.
# frozen_string_literal: true
The above comment at top of a file changes semantics of static string
literals in the file. The static string literals will be frozen and
always returns same object. (The semantics of dynamic string literals
is not changed.)
This way has following benefits:
No ugly f-suffix.
No syntax error on older Ruby.
We need only a line for each file.
Please read this topic for more information.
https://bugs.ruby-lang.org/issues/8976

Check the string with hash key

I am using Ruby 1.9.
I have a hash:
Hash_List={"ruby"=>"fun to learn","the rails"=>"It is a framework"}
I have a string like this:
test_string="I am learning the ruby by myself and also the rails."
I need to check if test_string contains words that match the keys of Hash_List. And if it does, replace the words with the matching hash value.
I used this code to check, but it is returning them empty:
another_hash=Hash_List.select{|key,value| key.include? test_string}
OK, hold onto your hat:
HASH_LIST = {
"ruby" => "fun to learn",
"the rails" => "It is a framework"
}
test_string = "I am learning the ruby by myself and also the rails."
keys_regex = /\b (?:#{Regexp.union(HASH_LIST.keys).source}) \b/x # => /\b (?:ruby|the\ rails) \b/x
test_string.gsub(keys_regex, HASH_LIST) # => "I am learning the fun to learn by myself and also It is a framework."
Ruby's got some great tricks up its sleeve, one of which is how we can throw a regular expression and a hash at gsub, and it'll search for every match of the regular expression, look up the matching "hits" as keys in the hash, and substitute the values back into the string:
gsub(pattern, hash) → new_str
...If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string....
Regexp.union(HASH_LIST.keys) # => /ruby|the\ rails/
Regexp.union(HASH_LIST.keys).source # => "ruby|the\\ rails"
Note that the first returns a regular expression and the second returns a string. This is important when we embed them into another regular expression:
/#{Regexp.union(HASH_LIST.keys)}/ # => /(?-mix:ruby|the\ rails)/
/#{Regexp.union(HASH_LIST.keys).source}/ # => /ruby|the\ rails/
The first can quietly break what you think is a simple search, because the embedded (?-mix:...) group carries its own flags, which override the flags of the surrounding pattern inside that group.
The Regexp documentation covers all this well.
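A short sketch of how the embedded flags can bite: both patterns below are built with the /i flag, but only the one built from .source matches case-insensitively. It assumes Ruby 2.4 or later for match?; the variable names are illustrative.

```ruby
keys  = ["ruby", "the rails"]
union = Regexp.union(keys)

# Interpolating the Regexp embeds "(?-mix:...)", which switches /i off inside the group.
embedded_regexp = /\b(?:#{union})\b/i

# Interpolating the source embeds only the pattern text, so /i applies to it.
embedded_source = /\b(?:#{union.source})\b/i

puts embedded_regexp.match?("Learning RUBY")  # false
puts embedded_source.match?("Learning RUBY")  # true
```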
This capability is the core to making an extremely high-speed templating routine in Ruby.
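As a minimal sketch of that templating idea, the placeholders can be the hash keys themselves. The "{{...}}" placeholder syntax and the names below are assumptions made for the example, not part of any library.

```ruby
# Placeholder text maps directly to its replacement.
values  = { "{{name}}" => "Alice", "{{lang}}" => "Ruby" }

# Regexp.union escapes the braces, so the keys are matched literally.
pattern = Regexp.union(values.keys)

rendered = "Hello {{name}}, welcome to {{lang}}!".gsub(pattern, values)
puts rendered  # Hello Alice, welcome to Ruby!
```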
You could do that as follows:
Hash_List.each_with_object(test_string.dup) { |(k,v),s| s.sub!(/#{k}/, v) }
#=> "I am learning the fun to learn by myself and also It is a framework."
First, follow naming conventions. Variables are snake_case, and names of classes are CamelCase.
hash = {"ruby" => "fun to learn", "rails" => "It is a framework"}
words = test_string.split(/\W+/) # => ["I", "am", "learning", ...]
another_hash = hash.select{|key,value| words.include?(key)}
Answering your question: split your test string into words with #split (on non-word characters, so that "rails." becomes "rails") and then check whether the words include a key.
To check whether a string is a substring of another string, use the String#[] method with a String argument:
another_hash = hash.select{|key, value| test_string[key]}

Substring syntaxes in Ruby

Python has the following elegant syntax for checking whether one string is a substring of another one:
'ab' in 'abc' # True
Is there an equivalent elegant syntax in Ruby?
I'm aware of the "abc".include? "ab" Ruby syntax, but I'm wondering whether the inverse syntax exists too (where the first parameter is the substring and the second is the string).
There is no such method in the Ruby standard library, but Rails' ActiveSupport provides the #in? method:
1.9.3-p484 :004 > "ab".in? "abc"
=> true
Here is the source code: https://github.com/rails/rails/blob/e20dd73df42d63b206d221e2258cc6dc7b1e6068/activesupport/lib/active_support/core_ext/object/inclusion.rb
Define "elegant".
This does a sub-string search and returns the "hit" if found:
'abc'['ab'] # => "ab"
Using !! converts the value returned to a true/false, so "ab" becomes true:
!!'abc'['ab'] # => true
Knowing that, it's trivial to add it in if you want something closer:
class String
  def in?(other)
    !!other[self]
  end
end
'ab'.in?('abc') # => true
'ab'.in? 'abc' # => true
Or, use require 'active_support/core_ext/object/inclusion' to cherry-pick the Active Support definition that extends all objects with in?. See http://edgeguides.rubyonrails.org/active_support_core_extensions.html#in-questionmark. The upside, or downside, is that it modifies all objects.
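If modifying String or every object is a concern, a plain helper method gives the same argument order without any patching. This is just a sketch; substring_of? is a hypothetical name.

```ruby
# Hypothetical helper: the substring comes first, as in Python's `'ab' in 'abc'`.
def substring_of?(needle, haystack)
  haystack.include?(needle)
end

puts substring_of?('ab', 'abc')  # true
puts substring_of?('xy', 'abc')  # false
```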

Generate string for Regex pattern in Ruby

In Python I found rstr, which can generate a string for a regex pattern.
Python also has this method, which can return the range of characters:
re.sre_parse.parse(pattern)
#..... ('range', (97, 122)) ....
But in Ruby I didn't find anything.
So how can I generate a string for a regex pattern in Ruby (reverse regex)?
I want to do something like this:
"/[a-z0-9]+/".example
#tvvd
"/[a-z0-9]+/".example
#yt
"/[a-z0-9]+/".example
#bgdf6
"/[a-z0-9]+/".example
#564fb
"/[a-z0-9]+/" is my input.
The outputs must be valid strings that match my regex pattern.
Here the outputs were tvvd, yt, bgdf6 and 564fb, generated by the "example" method.
I need that method.
Thanks for your advice.
You can also use the Faker gem https://github.com/stympy/faker and then use this call:
Faker::Base.regexify(/[a-z0-9]{10}/)
In Ruby:
/qweqwe/.to_s
# => "(?-mix:qweqwe)"
When you declare a Regexp, you get an object of the Regexp class. To convert it to a String, you can use Regexp's #to_s method. During conversion the options are expanded, as you can see in the example. According to the documentation, #to_s returns a string containing the regular expression and its options:
(using the (?opts:source) notation. This string can be fed back in to Regexp::new to a regular expression with the same semantics as the original.
Also, you can use Regexp's method #inspect, which:
produces a generally more readable version of rxp.
/ab+c/ix.inspect #=> "/ab+c/ix"
Note that the above methods only convert a Regexp into a String. To match or select strings, use other methods. For example, if you have a source array (or a string that you split with #split), you can grep it and get an array of results:
array = "test,ab,yr,OO".split( ',' )
# => ['test', 'ab', 'yr', 'OO']
array = array.grep /[a-z]/
# => ["test", "ab", "yr"]
And then convert the array into string as:
array.join(',')
# => "test,ab,yr"
Or just use #scan method, with slightly changed regexp:
"test,ab,yr,OO".scan( /[a-z]+/ )
# => ["test", "ab", "yr"]
However, if you really need a random string that matches the regexp, you have to write your own method (please refer to the post) or use the ruby-string-random library. The library:
generates a random string based on Regexp syntax or Patterns.
And the code will look like the following:
pattern = '[aw-zX][123]'
result = StringRandom.random_regex(pattern)
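For the simple case in the question, a single character class repeated one or more times, writing your own method is not much code. The sketch below does not parse regular expressions at all: random_example is a hypothetical helper that takes the character ranges directly, and the length limits are arbitrary assumptions.

```ruby
# Hypothetical helper: builds a random string from the given character ranges,
# mimicking what /[a-z0-9]+/ would accept.
def random_example(ranges, min_len: 1, max_len: 8, rng: Random.new)
  alphabet = ranges.flat_map(&:to_a)       # e.g. "a".."z" plus "0".."9"
  length   = rng.rand(min_len..max_len)    # random length within the limits
  Array.new(length) { alphabet[rng.rand(alphabet.size)] }.join
end

sample = random_example([('a'..'z'), ('0'..'9')])
puts sample                          # different on every run
puts sample.match?(/\A[a-z0-9]+\z/)  # true
```

Anything beyond a single character class (alternation, groups, nested quantifiers) needs a real pattern parser, which is what the gems mentioned in this thread provide.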
A bit late to the party, but - originally inspired by this stackoverflow thread - I have created a powerful ruby gem which solves the original problem:
https://github.com/tom-lord/regexp-examples
/this|is|awesome/.examples #=> ['this', 'is', 'awesome']
/https?:\/\/(www\.)?github\.com/.examples #=> ['http://github.com', 'http://www.github.com', 'https://github.com', 'https://www.github.com']
UPDATE: Regular expressions are now supported in the string_pattern gem, and it is 30 times faster than other gems.
require 'string_pattern'
/[a-z0-9]+/.generate
To see a speed comparison: https://repl.it/#tcblues/Comparison-generating-random-string-from-regular-expression
I created a simple way to generate strings using a pattern without the mess of regular expressions, take a look at the string_pattern gem project: https://github.com/MarioRuiz/string_pattern
To install it: gem install string_pattern
This is an example of use:
# four characters. optional: capitals and numbers, required: lower
"4:XN/x/".gen # aaaa, FF9b, j4em, asdf, ADFt
Maybe you can find what you are looking for over here.

Resources