How to dump strings in YAML using literal scalar style? - ruby

I have a large string of formatted data (e.g. JSON) that I want to dump to YAML using Psych in Ruby while preserving its formatting.
Basically, I want the JSON to appear in the YAML using literal style:
---
json: |
{
"page": 1,
"results": [
"item", "another"
],
"total_pages": 0
}
However, when I use YAML.dump it doesn't use literal style. I get something like this:
---
json: ! "{\n \"page\": 1,\n \"results\": [\n \"item\", \"another\"\n ],\n \"total_pages\":
0\n}\n"
How can I tell Psych to dump scalars in the desired style?
Solution:
Big thanks to Aaron Patterson for his solution that I'm expanding on here: https://gist.github.com/2023978
Although a bit verbose, that gist is a working way of tagging certain strings in ruby to be output using literal style in YAML.

require 'psych'

# Construct an AST. Recent Psych versions build the visitor with
# YAMLTree.create; the original gist called YAMLTree.new({}).
visitor = Psych::Visitors::YAMLTree.create
visitor << DATA.read
ast = visitor.tree

# Find all scalars and modify their formatting
ast.grep(Psych::Nodes::Scalar).each do |node|
  node.plain  = false
  node.quoted = true
  node.style  = Psych::Nodes::Scalar::LITERAL
end

begin
  # Call the `yaml` method on the ast to convert to yaml
  puts ast.yaml
rescue NoMethodError
  # The `yaml` method was introduced in later versions, so fall back to
  # constructing a visitor
  Psych::Visitors::Emitter.new($stdout).accept ast
end
__END__
{
"page": 1,
"results": [
"item", "another"
],
"total_pages": 0
}
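To get the json: | layout from the question rather than a bare top-level scalar, the same AST approach can be applied to a hash. The following is a minimal sketch, assuming a Psych version that provides Psych::Visitors::YAMLTree.create; the sample JSON and the rule "only restyle scalars that contain a newline" are illustrative choices, not part of the gist.

```ruby
require 'psych'

# Sample multi-line JSON string (illustrative data)
json = <<~JSON
  {
    "page": 1,
    "total_pages": 0
  }
JSON

# Build the AST for a hash instead of a lone string
visitor = Psych::Visitors::YAMLTree.create
visitor << { 'json' => json }
ast = visitor.tree

# Restyle only the scalars that span several lines
ast.grep(Psych::Nodes::Scalar).each do |node|
  next unless node.value.include?("\n")

  node.plain  = false
  node.quoted = true
  node.style  = Psych::Nodes::Scalar::LITERAL
end

yaml = ast.yaml
puts yaml
```

Restricting the change to multi-line values leaves the keys and any short strings in their default style.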

Related

Ruby JSON extractor failing, possibly due to overly large JSON

I was in the process of creating a script to extract all of the comments from a Reddit Thread as a JSON:
require "rubygems"
require "json"
require "net/http"
require "uri"
require 'open-uri'
require 'neatjson'
#The URL.
url = ("https://www.reddit.com/r/AskReddit/comments/46n0zc.json")
#Sets up the JSON reader.
result = JSON.parse(open(url).read)
children = result["data"]["children"]
#Prints the jsons.
children.each do |child|
puts "Author: " + child["data"]["author"]
puts "Body: " + child["data"]["body"]
puts "ID: " + child["data"]["id"]
puts "Upvotes: " + child["data"]["ups"].to_s
puts ""
end
And for some reason it gives me an error. However, the error is not in the actual JSON printer, but in the reader:
005----extractallredditpostcomments.rb:17:in `[]': no implicit conversion of String into Integer (TypeError)
from 005----extractallredditpostcomments.rb:17:in `<main>'
For some reason,
children = result["data"]["children"]
isn't working, which is strange because it worked fine yesterday.
What I'm wondering is: could this be caused by the size of the JSON? If you actually go to the link (https://www.reddit.com/r/AskReddit/comments/46n0zc.json) you can see that the file is huge. I had so much trouble finding the keys I need in a page of that size that it took me hours, and I'm still not sure I have the correct ones, which could be causing the error as well. I'm not sure what's failing here.
Oh, and one last thing: I tried simplifying the program by removing the printer:
#Sets up the JSON reader.
result = JSON.parse(open(url).read)
children = result["data"]["children"]
puts children
#Prints the jsons.
#children.each do |child|
# puts "Author: " + child["data"]["author"]
# puts "Body: " + child["data"]["body"]
# puts "ID: " + child["data"]["id"]
# puts "Upvotes: " + child["data"]["ups"].to_s
# puts ""
#end
And it still fails:
005----extractallredditpostcomments.rb:13:in `[]': no implicit conversion of String into Integer (TypeError)
from 005----extractallredditpostcomments.rb:13:in `<main>'
A quick look at the returned JSON value shows that it is a JSON array of two JSON objects and not a JSON object. It looks somewhat like this:
[
{
"data": {
"after": null,
"before": null,
"children": [
{
"data": {
"approved_by": null,
"archived": false,
...
},
"kind": "Listing"
},
{
"data": {
"after": null,
"before": null,
"children": [
{
"data": {
"approved_by": null,
"archived": false,
"author": "finkledinkle7",
"author_flair_css_class": null,
"author_flair_text": null,
"banned_by": null,
"body": "My mother was really sick in 2008. I was turning 25 with a younger brother and sister.\n\nLost both of my grandparents on mom's side to cancer a few years prior. Mom had to watch as her parents slowly passed away. It destroyed her not having her mother around as t ...
}
]
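This means that the line children = result["data"]["children"] in your program won't work, because it treats result as a JSON object (a Ruby Hash) when it is actually an Array. Indexing an Array with a String is what raises "no implicit conversion of String into Integer". Since the comments are in the second listing, it looks like you should do children = result[1]["data"]["children"].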
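The behaviour can be reproduced offline with a heavily trimmed stand-in for the response. This is only a sketch: the sample below keeps the two-listing shape shown above, but the "title", "id" and "ups" values are made up for illustration.

```ruby
require 'json'

# Hypothetical, trimmed stand-in for the Reddit response:
# an array of two listings, the second one holding the comments.
sample = <<~JSON
  [
    {"kind": "Listing", "data": {"children": [{"data": {"title": "post"}}]}},
    {"kind": "Listing", "data": {"children": [
      {"data": {"author": "finkledinkle7", "body": "My mother was really sick in 2008.", "id": "abc123", "ups": 5}}
    ]}}
  ]
JSON

result = JSON.parse(sample)

# Indexing the top-level Array with a String reproduces the error
begin
  result["data"]
rescue TypeError => e
  error_message = e.message
end
puts error_message

# Pick the second listing first, then descend as before
children = result[1]["data"]["children"]
authors = children.map { |child| child["data"]["author"] }
puts authors
```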

What's an efficient way (without parsing and re-encoding) to put a string representing JSON into a Ruby hash?

I have a JSON string which has been generated by Jbuilder:
json = "{name: 'Peter', email: 'peter#stackoverflow.com'}"
This is currently a string. However I want to combine it into a new hash (ideally in Ruby) before finally outputting it as JSON.
i.e.
output = {result: :success, data: json}
However if I convert this to JSON the json value gets double-encoded such that it's sent as a string:
output.to_json
#=> "{\"result\":\"success\",\"data\":\"{name: 'Peter', email: 'peter#stackoverflow.com'}\"}"
Now I could parse the JSON into a Ruby hash and then re-output it but that seems like a big fat waste of parsing when what I'd really like to do is to say "hey, this node is already JSON, don't re-encode it already!".
Is there any equivalent to the raw() method Rails has in views? i.e.
output = {result: :success, data: raw(json)}
so that the json evaluation of this then becomes:
output.to_json
#=> "{\"result\":\"success\",\"data\": {\"name\":\"Peter\",\"email\":\"peter#stackoverflow.com\"}}"
Here’s a way you can do this; it’s a bit of a hack, but you might find it useful.
First restating the problem:
# Note the quotes, your example isn't actually valid
json = "{\"name\": \"Peter\", \"email\": \"peter#stackoverflow.com\"}"
output = {result: :success, data: json}
# Without changing anything
puts JSON.generate(output)
This results in the following, where the value of data is a single string:
{"result":"success","data":"{\"name\": \"Peter\", \"email\": \"peter#stackoverflow.com\"}"}
The json gem uses a to_json method that is added to all objects to convert them to json, so the simplest fix would be to replace that method on objects you want to behave differently:
# As before
json = "{\"name\": \"Peter\", \"email\": \"peter#stackoverflow.com\"}"
# Replace to_json on the singleton object
def json.to_json *args
self
end
output = {result: :success, data: json}
# Generate the output (output.to_json gives the same result)
puts JSON.generate(output)
This creates the following, where the data value is now itself a JSON object, as desired:
{"result":"success","data":{"name": "Peter", "email": "peter#stackoverflow.com"}}
A cleaner way to do this, to avoid manipulating singletons in your code could be to create a subclass of string that has this behaviour:
class JsonSafeString < String
def to_json *args
self
end
end
You can now create a JsonSafeString when you want the contents included directly in a JSON object:
json = "{\"name\": \"Peter\", \"email\": \"peter#stackoverflow.com\"}"
output = {result: :success, data: JsonSafeString.new(json)}
puts JSON.generate(output)
The result is the same as above:
{"result":"success","data":{"name": "Peter", "email": "peter#stackoverflow.com"}}
You could wrap the call to JsonSafeString.new in a method like raw_json if you wanted.
Obviously this leaves the task of ensuring your string is valid to you – the main point of using a library for this is the user doesn’t have to concern themselves with things like whether to use single or double quotes, so you could be vulnerable to generating invalid JSON if you’re not careful. Also this is just a quick hack, there are probably a load of things I haven’t considered. In particular I haven’t taken character encodings into account, so watch out.
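The raw_json idea mentioned above could look roughly like this. It is a sketch using a plain wrapper object rather than a String subclass, so it does not rely on how the json gem treats strings; the RawJson class and raw_json method are hypothetical names, and the wrapped string is still not validated.

```ruby
require 'json'

# Hypothetical wrapper: marks a string as already-encoded JSON.
class RawJson
  def initialize(json)
    @json = json
  end

  # The json gem calls to_json on objects it does not handle natively,
  # so returning the string unchanged embeds it as-is.
  def to_json(*_args)
    @json
  end
end

# Hypothetical helper, named after Rails' raw() view helper
def raw_json(json)
  RawJson.new(json)
end

json = '{"name": "Peter", "email": "peter#stackoverflow.com"}'
output = { result: :success, data: raw_json(json) }

encoded = JSON.generate(output)
puts encoded
```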
This doesn't address your question, but may help you avoid it altogether...
Do you really need to generate your json variable into JSON before adding it to the hash? Jbuilder can generate a hash just as easily as a JSON string, e.g.:
hash = Jbuilder.new do |json|
json.name 'Peter'
json.email 'peter#stackoverflow.com'
end.attributes!
# => {"name"=>"Peter", "email"=>"peter#stackoverflow.com"}
output = {result: :success, data: hash}
eval will evaluate the string as Ruby code and return the resulting hash (so only use it on strings you trust).
eval "{name: 'Peter', email: 'peter#stackoverflow.com'}"
=> {:name=>"Peter", :email=>"peter#stackoverflow.com"}
And the results.
output = {result: :success, data: eval("{name: 'Peter', email: 'peter#stackoverflow.com'}") }
=> {:result=>:success, :data=>{:name=>"Peter", :email=>"peter#stackoverflow.com"}}
And to string
output.to_s
=> "{:result=>:success, :data=>{:name=>\"Peter\", :email=>\"peter#stackoverflow.com\"}}"
And JSON
require 'json'
=> true
output.to_json
=> "{\"result\":\"success\",\"data\":{\"name\":\"Peter\",\"email\":\"peter#stackoverflow.com\"}}"

For loop inside <<-eos Ruby

I'm new to Ruby. I'm trying to write a JSON file with Ruby so that I can later import it into a MongoDB collection. I need the document to keep proper indentation so that I can fill it in comfortably afterwards.
At the moment I'm doing it this way, but I'm sure it isn't the recommended way:
out_file = File.new('file.json', "w+")
str = <<-eos
{
"key1": #{@value1},
"key2" : #{@value2},
"key3" : {
"subkey_3_1" : {
"key" : #{@value},
"questions" : #{@invalid_questions}
},
"subkey_3_2" : {
"key" : #{value},
"array_key" : [
for i in 1..50
# Here, must be create 50 hash pair-value like this.
{},
{},
{},
...
end
]
}
}
}
eos
out_file.puts(str)
out_file.close
This is the final structure that I want. Thanks, and sorry for not explaining right from the start.
How can I define it in Ruby?
str = <<-eos
"key" : [
#{(1..50).map { |i|
...something content...
}.join("\n") }
]
eos
However, why do you want a string here? I don't know what you are trying to do, but there must be a better way of doing it.
UPDATE:
Yep, as mentioned by @ArupRakshit, you need to create the hash first and call to_json on it. If you don't have this method, require 'json' from the standard library, or install the activesupport gem and require 'active_support/core_ext' (no need to do this in a Rails app). Do not build the JSON response manually.
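As a rough sketch of that advice, the structure from the question can be built as a Ruby hash and written out with JSON.pretty_generate, which keeps the indentation readable. The local variables below are placeholders standing in for the question's @value1, @value2 and so on, and the file is written to the system temp directory only so the sketch runs anywhere.

```ruby
require 'json'
require 'tmpdir'

# Placeholder values standing in for the instance variables in the question
value1 = 1
value2 = 2
value = "some value"
invalid_questions = []

document = {
  "key1" => value1,
  "key2" => value2,
  "key3" => {
    "subkey_3_1" => { "key" => value, "questions" => invalid_questions },
    "subkey_3_2" => {
      "key" => value,
      # 50 separate empty hashes, to be filled in later
      "array_key" => Array.new(50) { {} }
    }
  }
}

# pretty_generate emits indented, human-editable JSON
path = File.join(Dir.tmpdir, 'file.json')
File.write(path, JSON.pretty_generate(document))
```

The block form of Array.new is used so that each of the 50 entries is its own hash rather than 50 references to the same one.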

Deserialize JSON primitives with the built-in Ruby JSON library

Why can Ruby's built-in JSON not deserialize simple JSON primitives, and how do I work around it?
irb(main):001:0> require 'json'
#=> true
irb(main):002:0> objects = [ {}, [], 42, "", true, nil ]
#=> [{}, [], 42, "", true, nil]
irb(main):012:0> objects.each do |o|
irb(main):013:1* json = o.to_json
irb(main):014:1> begin
irb(main):015:2* p JSON.parse(json)
irb(main):016:2> rescue Exception => e
irb(main):017:2> puts "Error parsing #{json.inspect}: #{e}"
irb(main):018:2> end
irb(main):019:1> end
{}
[]
Error parsing "42": 706: unexpected token at '42'
Error parsing "\"\"": 706: unexpected token at '""'
Error parsing "true": 706: unexpected token at 'true'
Error parsing "null": 706: unexpected token at 'null'
#=> [{}, [], 42, "", true, nil]
irb(main):020:0> RUBY_DESCRIPTION
#=> "ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.7.0]"
irb(main):022:0> JSON::VERSION
#=> "1.4.2"
RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON) has this to say:
2. JSON Grammar
A JSON text is a sequence of tokens. The set of tokens includes six
structural characters, strings, numbers, and three literal names.
A JSON text is a serialized object or array.
JSON-text = object / array
[...]
2.1. Values
A JSON value MUST be an object, array, number, or string, or one of
the following three literal names:
false null true
If you call to_json on your six sample objects, you get this:
>> objects = [ {}, [], 42, "", true, nil ]
>> objects.map { |o| puts o.to_json }
{}
[]
42
""
true
null
So the first and second are valid JSON texts whereas the last four are not valid JSON texts even though they are valid JSON values.
JSON.parse wants what it calls a JSON document:
Parse the JSON document source into a Ruby data structure and return it.
Perhaps JSON document is the library's term for what RFC 4627 calls a JSON text. If so, then raising an exception is a reasonable response to an invalid input.
You can work around this by forcibly wrapping and unwrapping everything:
objects.each do |o|
json = o.to_json
begin
json_text = '[' + json + ']'
p JSON.parse(json_text)[0]
rescue Exception => e
puts "Error parsing #{json.inspect}: #{e}"
end
end
And as you note in your comment, using an array as the wrapper is better than an object in case the caller wants to use the :symbolize_names option. Wrapping like this means that you'll always be feeding JSON.parse a JSON text and everything should be fine.
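The point about :symbolize_names can be seen in a small sketch. The parse_value helper below is a hypothetical name; it wraps the input in an array, forwards any options to JSON.parse, and unwraps the single element.

```ruby
require 'json'

# Hypothetical helper: wrap the value in an array so the parser always
# receives a JSON text, then unwrap the single element.
def parse_value(json, **opts)
  JSON.parse("[#{json}]", **opts).first
end

number = parse_value('42')
string = parse_value('"hello"')
null   = parse_value('null')

# Options pass straight through; the array wrapper has no key of its own
symbolized = parse_value('{"a": 1}', symbolize_names: true)

p number, string, null, symbolized
```

With an object wrapper such as {"v": ...}, the unwrapping code would have to look up "v" or :v depending on whether the caller passed symbolize_names, whereas the array wrapper is always unwrapped with first.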
This is quite an old question, but I think it is worth having a proper answer to prevent hair loss for those who have just run into the problem and are still searching for a solution :)
To parse "JSON primitives" with a JSON gem version below 2, you can pass the quirks_mode: true option like so:
JSON::VERSION # => 1.8.6
json_text = "This is a json primitive".to_json
JSON.parse(json_text, quirks_mode: true)
With JSON gem version 2 or later, quirks_mode is no longer necessary.
JSON::VERSION # => 2.0.0
json_text = "This is a json primitive".to_json
JSON.parse(json_text)
Before parsing the JSON, you can check the version of the JSON gem that you are using in your project with bundle show json or gem list | grep json and then use the corresponding one.
Happy JSON parsing!
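If the same code has to run against both old and new versions of the gem, the two cases above could be combined along these lines. This is only a sketch; parse_primitive is a hypothetical helper name, and the version check simply reads the major number from JSON::VERSION.

```ruby
require 'json'

# Hypothetical helper: pass quirks_mode only to JSON gem versions below 2
def parse_primitive(json_text)
  major = JSON::VERSION.split('.').first.to_i
  if major < 2
    JSON.parse(json_text, quirks_mode: true)
  else
    JSON.parse(json_text)
  end
end

json_text = "This is a json primitive".to_json
parsed = parse_primitive(json_text)
puts parsed
```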
It appears that the built-in JSON parser intentionally fails on anything but objects and arrays. My current workaround is the following:
# Work around a flaw in Ruby's built-in JSON parser
# not accepting anything but an object or array at the root level.
module JSON
def self.parse_any(str,opts={})
parse("[#{str}]",opts).first
end
end
Use JSON.load instead of JSON.parse to handle primitives:
e.g.
JSON.load('true') # => true
JSON.load('false') # => false
JSON.load('5150') # => 5150
JSON.load('null') # => nil
I think you are right...whether it is a bug or not, there is some wonky logic going on with the implementation. If it can parse arrays, and hashes it should be able to parse everything else.
Because JSON.parse seems geared for objects and arrays, I would try to pass your data one of those ways if you can, and if you can't, stick with the workaround you have.

What are good examples of mapping YAML data to Ruby objects?

I am looking for basic examples of YAML syntax and how to work with it in Ruby.
Basically, by looking at the examples, I hope to better understand how to map YAML scalars to object attributes, and whether to use different YAML files or having one YAML file containing multiple objects.
There is a YAML module in Ruby's standard library whose documentation has a short tutorial and a few links.
YAML in Five Minutes
Serializing and Deserializing objects with Ruby
require "yaml"
test_obj = ["dogs", "cats", "badgers"]
yaml_obj = YAML::dump( test_obj )
# -> ---
# - dogs
# - cats
# - badgers
ruby_obj = YAML::load( yaml_obj )
# => ["dogs", "cats", "badgers"]
ruby_obj == test_obj
# => true
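For mapping YAML scalars to object attributes, one possible approach is to load plain hashes and build the objects yourself. The sketch below keeps several objects in one YAML document; the Animal struct and the sample data are hypothetical and only for illustration.

```ruby
require 'yaml'

# Hypothetical value object with two attributes
Animal = Struct.new(:name, :legs)

# One YAML document holding a list of mappings, one per object
yaml = <<~YAML
  - name: dog
    legs: 4
  - name: badger
    legs: 4
YAML

# safe_load returns plain arrays and hashes with string keys
animals = YAML.safe_load(yaml).map do |attrs|
  Animal.new(attrs["name"], attrs["legs"])
end

animals.each { |animal| puts "#{animal.name}: #{animal.legs} legs" }
```

Each scalar in the YAML mapping ends up as one attribute, and the integer values arrive as Ruby Integers rather than strings.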
