How do I check if a string is valid YAML? - ruby

I'd like to check if a string is valid YAML. I'd like to do this from within my Ruby code with a gem or library. I only have this begin/rescue clause, but it doesn't get rescued properly:
def valid_yaml_string?(config_text)
require 'open-uri'
file = open("https://github.com/TheNotary/the_notarys_linux_mint_postinstall_configuration")
hard_failing_bad_yaml = file.read
config_text = hard_failing_bad_yaml
begin
YAML.load config_text
return true
rescue
return false
end
end
I am unfortunately getting the terrible error of:
irb(main):089:0> valid_yaml_string?("b")
Psych::SyntaxError: (<unknown>): mapping values are not allowed in this context at line 6 column 19
from /home/kentos/.rvm/rubies/ruby-1.9.3-p374/lib/ruby/1.9.1/psych.rb:203:in `parse'
from /home/kentos/.rvm/rubies/ruby-1.9.3-p374/lib/ruby/1.9.1/psych.rb:203:in `parse_stream'
from /home/kentos/.rvm/rubies/ruby-1.9.3-p374/lib/ruby/1.9.1/psych.rb:151:in `parse'
from /home/kentos/.rvm/rubies/ruby-1.9.3-p374/lib/ruby/1.9.1/psych.rb:127:in `load'
from (irb):83:in `valid_yaml_string?'
from (irb):89
from /home/kentos/.rvm/rubies/ruby-1.9.3-p374/bin/irb:12:in `<main>'

Using a cleaned-up version of your code:
require 'yaml'
require 'open-uri'
URL = "https://github.com/TheNotary/the_notarys_linux_mint_postinstall_configuration"
def valid_yaml_string?(yaml)
!!YAML.load(yaml)
rescue Exception => e
STDERR.puts e.message
return false
end
puts valid_yaml_string?(open(URL).read)
I get:
(<unknown>): mapping values are not allowed in this context at line 6 column 19
false
when I run it.
The reason is that the data you are getting from that URL isn't YAML at all; it's HTML:
open('https://github.com/TheNotary/the_notarys_linux_mint_postinstall_configuration').read[0, 100]
=> " \n\n\n<!DOCTYPE html>\n<html>\n <head prefix=\"og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# githubog:"
If you only want a true/false response whether it's parsable YAML, remove this line:
STDERR.puts e.message
Unfortunately, going beyond that and determining if the string is a YAML string gets harder. You can do some sniffing, looking for some hints:
yaml[/^---/m]
will search for the YAML "document" marker, but a YAML file doesn't have to use those, nor do they have to be at the start of the file. We can add that in to tighten up the test:
!!YAML.load(yaml) && !!yaml[/^---/m]
But even that leaves some holes, so adding a test to see what the parser returns can help even more. YAML could return a Fixnum (Integer on Ruby 2.4 and later), a String, an Array or a Hash, but if you already know what to expect, you can check what YAML actually returns. For instance:
YAML.load(({}).to_yaml).class
=> Hash
YAML.load(({}).to_yaml).instance_of?(Hash)
=> true
So, you could look for a Hash:
parsed_yaml = YAML.load(yaml)
!!yaml[/^---/m] && parsed_yaml.instance_of?(Hash)
Replace Hash with whatever type you think you should get.
There might be even better ways to sniff it out, but those are what I'd try first.
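A minimal sketch that combines the ideas above: it rescues the parser's own exception classes rather than Exception, and optionally checks the class of the parsed result. The helper name parsable_yaml? and its expected_class parameter are made up for illustration, and the sketch assumes a Ruby recent enough to ship YAML.safe_load.

```ruby
require 'yaml'

# Hypothetical helper (not part of the YAML library): returns true when the
# text parses as YAML and, if expected_class is given, when the top-level
# parsed value is an instance of that class.
def parsable_yaml?(text, expected_class = nil)
  parsed = YAML.safe_load(text)
  expected_class.nil? || parsed.is_a?(expected_class)
rescue Psych::SyntaxError, Psych::DisallowedClass, Psych::BadAlias
  # Syntax errors, disallowed tags and rejected aliases all count as "not valid".
  false
end

parsable_yaml?("a: 1")                # => true
parsable_yaml?("a: b: c")             # => false
parsable_yaml?("just a string", Hash) # => false
```

Naming Psych::SyntaxError explicitly matters for the original question: the backtrace there shows the error escaping a bare rescue, which only catches StandardError and its subclasses.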

Related

How to safe load a YAML file that includes multiple documents?

The regular way to safe load a typical single document YAML file is done by using YAML.safe_load(content).
YAML files can contain multiple documents:
---
key: value
---
key: !ruby/struct
foo: bar
Loading a YAML file such as this using YAML.safe_load(content) will only return the first document:
{ 'key' => 'value' }
If you split the file and try to safe_load the second document, you will get an exception as expected:
Psych::DisallowedClass (Tried to load unspecified class: Struct)
To load multiple documents you can use YAML.load_stream(content) which returns an array:
[
{ 'key' => 'value' },
{ 'key' => #<struct foo="bar"> }
]
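As a small illustration of the difference, here is a sketch using a made-up two-document string that contains no tags, so both loaders accept it. Note that load_stream gives no safe-loading guarantees, so this should only be fed trusted input.

```ruby
require 'yaml'

# Made-up sample input: two plain documents, no custom tags.
content = <<~YAML
  ---
  key: value
  ---
  key: other
YAML

first_only = YAML.safe_load(content)   # only the first document is returned
all_docs   = YAML.load_stream(content) # one array entry per document

p first_only # => {"key"=>"value"}
p all_docs   # => [{"key"=>"value"}, {"key"=>"other"}]
```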
The problem is that there is no YAML.safe_load_stream that would raise exceptions for non-whitelisted data types.
I wrote a workaround that utilizes the YAML.parse_stream interface:
Edit: Now as gem yaml-safe_load_stream. Also, the maintainers of Psych (the YAML in ruby stdlib) are looking into adding this feature to the library.
require 'yaml'
module YAML
def safe_load_stream(yaml, filename = nil, &block)
parse_stream(yaml, filename) do |stream|
raise_if_tags(stream, filename)
if block_given?
yield stream.to_ruby
else
stream.to_ruby
end
end
end
module_function :safe_load_stream
def raise_if_tags(obj, filename = nil, doc_num = 1)
doc_num += 1 if obj.is_a?(Psych::Nodes::Document)
if obj.respond_to?(:tag)
if tag = obj.tag
message = "tag #{tag} encountered on line #{obj.start_line} column #{obj.start_column} of document #{doc_num}"
message << " in file #{filename}" if filename
raise Psych::DisallowedClass, message
end
end
if obj.respond_to?(:children)
Array(obj.children).each do |child|
raise_if_tags(child, filename, doc_num)
end
end
end
module_function :raise_if_tags
private_class_method :raise_if_tags
end
With this you can do:
YAML.safe_load_stream(content, 'file.txt')
And get an exception:
Psych::DisallowedClass (Tried to load unspecified class: tag !ruby/struct
encountered on line 1 column 7 of document 2 in file file.txt)
The line numbers returned from .start_line are relative to the document start. I didn't find a way to get the line number where the document starts, so I added the document number to the error message.
It does not have the class and symbol whitelists or the toggling of anchors/aliases that YAML.safe_load has.
Also, there are ways to use tags that will probably give a false positive with such a simplistic unless tag.nil? detection.

How to use YAML.load with handlers

irb(main):001:0> a="run: yes"
=> "run: yes"
irb(main):002:0> require 'yaml'
=> true
irb(main):003:0> YAML.load a
=> {"run"=>true}
irb(main):004:0> YAML.load(a, handlers => {'bool#yes' = identity})
SyntaxError: (irb):4: syntax error, unexpected '=', expecting =>
YAML.load(a, handlers => {'bool#yes' = identity})
^
from /usr/bin/irb:11:in `<main>'
I want the YAML value to stay the string "yes", and from googling it seems a handler would help.
But it seems I'm not using the correct syntax.
I searched for related docs but found nothing.
The problems with the listed code are:
that handlers isn't defined anywhere; you likely wanted :handlers
that identity isn't defined anywhere; you maybe wanted :identity
that you are missing a > on your hash rocket (=>).
So to get this code to run it should (likely) look like
YAML.load("run: yes", :handlers => {'bool#yes' => :identity})
However, so far as I know the second parameter to YAML.load is a filename.
If you are able to change the input YAML, simply quoting the value "yes" will cause it to come through as a string:
YAML.load("a: 'yes'")
# => {"a"=>"yes"}
If you need the unquoted yes in the YAML to come through as the string 'yes' rather than true after parsing, I cobbled this together (with help from this question) using Psych::Handler and Psych::Parser. I'm not sure whether there's an easier or better way to do this without hacking it all together like this.
require 'yaml'
class MyHandler < Psych::Handlers::DocumentStream
def scalar(value, anchor, tag, plain, quoted, style)
if value == 'yes'
super(value, anchor, tag, plain, true, style)
else
super(value, anchor, tag, plain, quoted, style)
end
end
end
def my_parse(yaml)
parser = Psych::Parser.new(MyHandler.new{|node| return node})
parser.parse yaml
false
end
my_parse("a: yes").to_ruby
# => {"a"=>"yes"}
my_parse("a: 'yes'").to_ruby
# => {"a"=>"yes"}
my_parse("a: no").to_ruby
# => {"a"=>false}
Sidenote in the console (and the source):
YAML
# => Psych
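One possible alternative, sketched under the assumption that Psych's node tree can be modified before conversion: parse to an AST with YAML.parse, mark selected plain scalars as quoted, and then call to_ruby. The helper name load_keeping_strings and its keep list are hypothetical, and this relies on the same behaviour as the handler above, namely that quoted scalars are not converted to booleans.

```ruby
require 'yaml'

# Hypothetical helper: keeps the listed scalar values as strings instead of
# letting the YAML 1.1 rules turn them into booleans.
def load_keeping_strings(yaml, keep = %w[yes no])
  doc = YAML.parse(yaml) # Psych::Nodes::Document, or false for empty input
  return nil unless doc

  # Walk every node in the tree and flag matching scalars as quoted.
  doc.each do |node|
    next unless node.is_a?(Psych::Nodes::Scalar)
    node.quoted = true if keep.include?(node.value)
  end

  doc.to_ruby
end

load_keeping_strings("run: yes")
# => {"run"=>"yes"}
```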

JSON to CSV File Ruby

I am trying to convert the following JSON to CSV via Ruby, but am having trouble with my code. I am learning as I go, so any help is appreciated.
require 'json'
require 'net/http'
require 'uri'
require 'csv'
uri = 'https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C'
response = Net::HTTP.get_response(URI.parse(uri))
struct = JSON.parse(response.body.scan(/processPOIs\((.*)\);/).first.first)
CSV.open("output.csv", "w") do |csv|
JSON.parse(struct).read.each do |hash|
csv << hash.values
end
end
The error I receive is:
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/gems/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `new'
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/gems/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `parse'
from test.rb:14:in `block in <main>'
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/2.2.0/csv.rb:1273:in `open'
from test.rb:13:in `<main>'
I am trying to get all the data off of the following link and put it into a CSV file that I can analyse later. https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C
You have several problems here, the most significant of which is that you're calling JSON.parse twice. The second time you call it on struct, which was the result of calling JSON.parse the first time. You're basically doing JSON.parse(JSON.parse(string)). Oops.
There's another problem on the line where you call JSON.parse a second time: You call read on the value it returns. As far as I know JSON.parse does not ordinarily return anything that responds to read.
Fixing those two errors, your code looks something like this:
struct = JSON.parse(response.body.scan(/processPOIs\((.*)\);/).first.first)
CSV.open("output.csv", "w") do |csv|
struct.each do |hash|
csv << hash.values
end
end
This ought to work if struct is an object that responds to each (like an array) and the values yielded by each all respond to values (like a hash). In other words, this code assumes that JSON.parse will return an array of hashes, or something similar. If it doesn't, well, that's beyond the scope of this question.
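To make the "array of hashes" assumption concrete, here is a small sketch with made-up sample rows standing in for the parsed response (the real API's structure may differ). It uses CSV.generate so the result is a string, and values_at so every row follows the same column order as the header.

```ruby
require 'csv'

# Made-up sample data; the real response will have different keys.
rows = [
  { 'name' => 'Store A', 'city' => 'New York' },
  { 'name' => 'Store B', 'city' => 'Newark' }
]

headers = rows.first.keys

csv_text = CSV.generate do |csv|
  csv << headers                                      # header row
  rows.each { |row| csv << row.values_at(*headers) }  # one line per hash
end

puts csv_text
```

Swapping CSV.generate for CSV.open("output.csv", "w") writes the same rows to a file instead.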
As an aside, this is not great:
response.body.scan(/processPOIs\((.*)\);/).first.first
The purpose of String#scan is to find every substring in a string that matches a regular expression. But you're only concerned with the first match, so scan is the wrong choice.
An alternative is to use String#match:
matches = response.body.match(/processPOIs\((.*)\)/)
json = matches[1]
struct = JSON.parse(json)
However, that's overkill. Since this is a JSONP response, we know that it will look like this:
processPOIs(...);
...give or take a trailing semicolon or newline. We don't need a regular expression to find the part inside the parentheses, because we already know where it is: it starts 13 characters from the start (i.e. index 12) and ends two characters before the end ("index" -3). That makes it easy work for String#slice, a.k.a. String#[]:
json = response.body[12..-3]
struct = JSON.parse(json)
Like I said, "give or take a trailing semicolon or newline," so you might need to tweak that ending index depending on what the API returns. And with that, no more ugly .first.first, and it's faster, too.
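If hard-coded indexes feel too fragile, a sketch like the following tolerates the optional trailing semicolon and newline. The helper name unwrap_jsonp is made up, and the callback name is assumed to be known in advance (processPOIs here, as in the question).

```ruby
require 'json'

# Hypothetical helper: strips a JSONP wrapper such as `callback(...);`
# and parses what is inside the parentheses.
def unwrap_jsonp(body, callback = 'processPOIs')
  prefix = "#{callback}("
  text = body.strip.chomp(';') # drop surrounding whitespace and a trailing ';'

  unless text.start_with?(prefix) && text.end_with?(')')
    raise ArgumentError, "response is not wrapped in #{callback}(...)"
  end

  JSON.parse(text[prefix.length..-2]) # everything between "(" and the final ")"
end

unwrap_jsonp(%(processPOIs({"a":[1,2]});\n))
# => {"a"=>[1, 2]}
```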
Thank you everybody for the help. I was able to get everything into a CSV and then just used some VBA to organize it the way I wanted.
require 'json'
require 'net/http'
require 'uri'
require 'csv'
uri = 'https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C'
response = Net::HTTP.get_response(URI.parse(uri))
matches = response.body.match(/processPOIs\((.*)\)/)
json = response.body[12..-3]
struct = JSON.parse(json)
CSV.open("output.csv", "w") do |csv|
csv << struct['searchResults'].map { |result| result['fields']}
end

Ruby, writing to a YAML file, with arrays

I'm trying to save a few variables in a YAML config file.
Cool!!
However, when I try to save them, I get an error in Ruby:
undefined method `[]=' for false:FalseClass (NoMethodError)
My function should (In my head at least) be:
Does the config file exist, if not, just create a blank one.
Now that we know it exists, YAML.open it
set the new/overwriting key/value pairs
Rewrite the file
But, I'm getting the error above.
I'm new to Ruby (PHP bloke here), tell me where I'm being stupid please :)
def write_to_file( path_to_file, key, value, overwrite = true )
if !File.exist?(path_to_file)
File.open(path_to_file, 'a+')
end
config_file = YAML.load_file( path_to_file)
config_file[key] = value
File.open(path_to_file, 'w') { |f| YAML.dump(config_file, f) }
# I tried this commented code below too, same error..
# {|f| f.write config_file.to_yaml }
end
The problem is that you created an empty file. And the YAML parser returns false for an empty string:
YAML.load('') #=> false
Just set config_file to an empty hash when the YAML loader returned false:
config_file = YAML.load_file(path_to_file) || {}
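Putting that together, a minimal sketch of the whole function might look like this. The name write_setting is made up, and it assumes the config file, when it has content, holds a plain hash with string keys. Anything that does not load as a Hash (a missing file, an empty file) is treated as an empty config.

```ruby
require 'yaml'

# Hypothetical rewrite of write_to_file: load the existing config if there
# is one, set the key, and write the whole hash back out.
def write_setting(path, key, value)
  config = File.exist?(path) ? YAML.load_file(path) : nil
  config = {} unless config.is_a?(Hash) # empty or missing file: start fresh

  config[key] = value
  File.write(path, config.to_yaml)
  config
end
```

Checking File.exist? before loading also avoids the File.open call without a block, which leaves a file handle open.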

Deserialize JSON primitives with the built-in Ruby JSON library

Why can Ruby's built-in JSON not deserialize simple JSON primitives, and how do I work around it?
irb(main):001:0> require 'json'
#=> true
irb(main):002:0> objects = [ {}, [], 42, "", true, nil ]
#=> [{}, [], 42, "", true, nil]
irb(main):012:0> objects.each do |o|
irb(main):013:1* json = o.to_json
irb(main):014:1> begin
irb(main):015:2* p JSON.parse(json)
irb(main):016:2> rescue Exception => e
irb(main):017:2> puts "Error parsing #{json.inspect}: #{e}"
irb(main):018:2> end
irb(main):019:1> end
{}
[]
Error parsing "42": 706: unexpected token at '42'
Error parsing "\"\"": 706: unexpected token at '""'
Error parsing "true": 706: unexpected token at 'true'
Error parsing "null": 706: unexpected token at 'null'
#=> [{}, [], 42, "", true, nil]
irb(main):020:0> RUBY_DESCRIPTION
#=> "ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.7.0]"
irb(main):022:0> JSON::VERSION
#=> "1.4.2"
RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON) has this to say:
2. JSON Grammar
A JSON text is a sequence of tokens. The set of tokens includes six
structural characters, strings, numbers, and three literal names.
A JSON text is a serialized object or array.
JSON-text = object / array
[...]
2.1. Values
A JSON value MUST be an object, array, number, or string, or one of
the following three literal names:
false null true
If you call to_json on your six sample objects, you get this:
>> objects = [ {}, [], 42, "", true, nil ]
>> objects.map { |o| puts o.to_json }
{}
[]
42
""
true
null
So the first and second are valid JSON texts whereas the last four are not valid JSON texts even though they are valid JSON values.
JSON.parse wants what it calls a JSON document:
Parse the JSON document source into a Ruby data structure and return it.
Perhaps JSON document is the library's term for what RFC 4627 calls a JSON text. If so, then raising an exception is a reasonable response to an invalid input.
If you forcibly wrap and unwrap everything:
objects.each do |o|
json = o.to_json
begin
json_text = '[' + json + ']'
p JSON.parse(json_text)[0]
rescue Exception => e
puts "Error parsing #{json.inspect}: #{e}"
end
end
And as you note in your comment, using an array as the wrapper is better than an object in case the caller wants to use the :symbolize_names option. Wrapping like this means that you'll always be feeding JSON.parse a JSON text and everything should be fine.
This is quite an old question, but I think it is worth having a proper answer to prevent hair loss for those who have just run into the problem and are still searching for a solution :)
To parse "JSON primitives" with a JSON gem version below 2, you can pass the quirks_mode: true option like so:
JSON::VERSION # => 1.8.6
json_text = "This is a json primitive".to_json
JSON.parse(json_text, quirks_mode: true)
With JSON gem version 2 or later, quirks_mode is no longer necessary.
JSON::VERSION # => 2.0.0
json_text = "This is a json primitive".to_json
JSON.parse(json_text)
Before parsing the JSON, you can check the version of the JSON gem that you are using in your project with bundle show json or gem list | grep json and then use the corresponding one.
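If the same code has to run against both old and new gem versions, the check can also be done at runtime. This is only a sketch: the helper name parse_json_value is made up, and it assumes JSON::VERSION starts with the major version number.

```ruby
require 'json'

# Hypothetical helper: parses any JSON value (object, array or primitive),
# passing quirks_mode only to JSON gem versions older than 2.
def parse_json_value(text)
  if JSON::VERSION.to_i >= 2
    JSON.parse(text)
  else
    JSON.parse(text, quirks_mode: true)
  end
end

p parse_json_value('42')
p parse_json_value('"This is a json primitive"')
p parse_json_value('null')
```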
Happy JSON parsing!
It appears that the built-in JSON parser intentionally fails on anything but objects and arrays. My current workaround is the following:
# Work around a flaw in Ruby's built-in JSON parser
# not accepting anything but an object or array at the root level.
module JSON
def self.parse_any(str,opts={})
parse("[#{str}]",opts).first
end
end
Use JSON.load instead of JSON.parse to handle primitives:
e.g.
JSON.load('true') # => true
JSON.load('false') # => false
JSON.load('5150') # => 5150
JSON.load('null') # => nil
I think you are right...whether it is a bug or not, there is some wonky logic going on with the implementation. If it can parse arrays, and hashes it should be able to parse everything else.
Because JSON.parse seems geared for objects and arrays, I would try to pass your data one of those ways if you can, and if you can't, stick with the workaround you have.

Resources