How to parse CSON to Ruby object? - ruby

I am trying to read CSON (CoffeeScript Object Notation) into Ruby.
I am looking for something similar to data = JSON.parse(file) that one would use for JSON files.
file = File.read(filename)
data = CSON.parse(file) # does not exist - would like to have
I looked into invoking CoffeeScript and JavaScript from Ruby, but it feels overly complicated and like reinventing the wheel. Also, code in the data file should not be executed.
How can I read CSON into Ruby objects in a simple way?

This is what I came up with. It is sufficient for the data I am processing. The main work is done by the YAML parser Psych (https://github.com/ruby/psych). Arrays, hashes, and some of the multi-line text require special treatment.
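require 'yaml'
module CSON
  module_function # makes CSON.load_file and CSON.load_string callable
  def load_file(fname)
    load_string(File.read(fname))
  end
  # Strip the two-character indent from every line of a heredoc body
  def remove_indent(data)
    data.each_line.map { |line| line.sub(/^\s\s/, '') }.join
  end
  # Turn a multi-line CSON array into a flow sequence that YAML can read
  def parse_array(data)
    flow = data.gsub(/\n/, ',')
    flow = flow.gsub(/([\[\{]),/, '\1')
    flow = flow.gsub(/,([\]\}])/, '\1')
    YAML.load(flow)
  end
  def load_string(data)
    data = data.dup # do not modify the caller's string
    hashed = {}
    data.gsub!(/^(\w+):\s+(\[.*?\])/mu) do # find arrays
      key = Regexp.last_match[1]
      value = parse_array(Regexp.last_match[2])
      hashed[key] = value
      ''
    end
    data.gsub!(/(\w+):\s+'''\s*\n(.*?)'''/mu) do # find heredocs
      hashed[Regexp.last_match[1]] = remove_indent(Regexp.last_match[2])
      ''
    end
    # YAML.load returns nil/false when nothing is left to parse
    hashed.merge(YAML.load(data) || {})
  end
end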
This solution is likely to fail when applied to more complicated .cson files. I would be happy to see if someone has a more elegant answer!
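To illustrate why Psych can carry most of the load: a CSON file that only uses indentation-nested keys and quoted scalars is already valid YAML. A minimal sketch, assuming such a restricted file (the sample content below is made up for illustration). YAML.safe_load is used here so that nothing beyond plain data types gets instantiated:

```ruby
require 'yaml'

# Hypothetical CSON content, restricted to the YAML-compatible subset
cson = <<~CSON
  name: 'demo'
  count: 3
  nested:
    enabled: true
CSON

data = YAML.safe_load(cson)
puts data['nested']['enabled']
```

Anything outside that subset (multi-line arrays without commas, ''' heredocs) still needs the preprocessing shown in the module above.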

Related

How to pass method arguments use as Hash path?

E.G.
def do_the_thing(file_to_load, hash_path)
file = File.read(file)
data = JSON.parse(file, { symbolize_names: true })
data[sections.to_sym]
end
do_the_thing(file_I_want, '[:foo][:bar][0]')
Tried a few methods but failed so far.
Thanks for any help in advance :)
Assuming you missed the parameter names...
Let's assume our file is:
// test.json
{
"foo": {
"bar": ["foobar"]
}
}
Recommended solution
Does your param really need to be a string?
If your code can be more flexible and pass the arguments as plain Ruby values, you can use the Hash#dig method:
require 'json'
def do_the_thing(file, *hash_path)
file = File.read(file)
data = JSON.parse(file, symbolize_names: true)
data.dig(*hash_path)
end
do_the_thing('test.json', :foo, :bar, 0)
You should get
"foobar"
It should work fine !!
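One property of dig that matters here: it returns nil as soon as an intermediate key is missing, instead of raising NoMethodError the way chained [] calls would. A small sketch, with an inline hash standing in for the parsed file:

```ruby
# Inline stand-in for the data parsed from test.json
data = { foo: { bar: ['foobar'] } }

found   = data.dig(:foo, :bar, 0)   # walks hash -> hash -> array
missing = data.dig(:foo, :nope, 0)  # stops at the missing key

p found
p missing
```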
Read the rest of the answer if that doesn't satisfy your question
Alternative solution (using the same argument)
If you REALLY need to pass that argument as a string, you can.
Convert your param into the arguments the first solution expects. It won't be short or fancy code, but it will work:
require 'json'
BRACKET_REGEX = /(\[[^\[]*\])/.freeze
# Converts the literal string to its corresponding value
def treat_type(param)
# Remove the remaining brackets from the string
# You could do this step directly on the regex if you want to
param = param[1..-2]
case param[0]
# Checks if it is a string
when '\''
param[1..-2]
# Checks if it is a symbol
when ':'
param[1..-1].to_sym
else
begin
Integer(param)
rescue ArgumentError
param
end
end
end
# Converts your param to the accepted pattern of 'dig' method
def string_to_args(param)
# Scan method will break the match results of the regex into an array
param.scan(BRACKET_REGEX).flatten.map { |match| treat_type(match) }
end
def do_the_thing(file, hash_path)
hash_path = string_to_args(hash_path)
file = File.read(file)
data = JSON.parse(file, symbolize_names: true)
data.dig(*hash_path)
end
so:
do_the_thing('test.json', '[:foo][:bar][0]')
returns
"foobar"
This solution, though, is open to bugs when hash_path does not follow the expected pattern, and handling those cases might make the code even longer.
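One way to keep the string form but fail loudly on malformed input is to validate the whole path against a single pattern before splitting it. A rough sketch: parse_hash_path and both constants are names made up for this example, and it only accepts symbol, single-quoted string, and integer segments:

```ruby
# Whole-path check: one or more [:sym], ['str'] or [int] segments, nothing else
PATH_PATTERN = /\A(?:\[(?::\w+|'[^']*'|-?\d+)\])+\z/
# Per-segment capture: group 1 = symbol, group 2 = string, group 3 = integer
SEGMENT = /\[(?::(\w+)|'([^']*)'|(-?\d+))\]/

def parse_hash_path(path)
  raise ArgumentError, "unsupported path: #{path.inspect}" unless path.match?(PATH_PATTERN)

  path.scan(SEGMENT).map do |sym, str, int|
    if sym
      sym.to_sym
    elsif str
      str
    else
      Integer(int)
    end
  end
end

p parse_hash_path("[:foo][:bar][0]")
```

The result can be passed straight to dig, e.g. data.dig(*parse_hash_path(hash_path)).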
Shortest solution (Not safe)
You can use the Kernel#eval method, which I EXTREMELY discourage for security reasons. Read the documentation and understand its dangers before using it.
require 'json'
def do_the_thing(file, hash_path)
file = File.read(file)
data = JSON.parse(file, symbolize_names: true)
eval("data#{hash_path}")
end
do_the_thing('test.json', '[:foo][:bar][0]')
If all you were trying to do was load the JSON data into an object, you could use either of the following approaches:
def do_the_thing(file_to_load)
  file = File.read(file_to_load)
  # return the parsed data so the caller can index into it
  JSON.parse(file, symbolize_names: true)
end
do_the_thing(file_I_want)[:foo][:bar][0]
or use the dig method of Hash:
def do_the_thing(file_to_load, sections)
  file = File.read(file_to_load)
  data = JSON.parse(file, symbolize_names: true)
  data.dig(*sections)
end
do_the_thing(file_I_want, [:foo, :bar, 0])

How to read multiple XML files then output to multiple CSV files with the same XML filenames

I am trying to parse multiple XML files then output them into CSV files to list out the proper rows and columns.
I was able to do so by processing one file at a time by defining the filename, and specifically output them into a defined output file name:
File.open('H:/output/xmloutput.csv','w')
I would like to write into multiple files and make their name the same as the XML filenames without hard coding it. I tried doing it multiple ways but have had no luck so far.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<record:root>
<record:Dataload_Request>
<record:name>Bob Chuck</record:name>
<record:Address_Data>
<record:Street_Address>123 Main St</record:Street_Address>
<record:Postal_Code>12345</record:Postal_Code>
</record:Address_Data>
<record:Age>45</record:Age>
</record:Dataload_Request>
</record:root>
Here is what I've tried:
require 'nokogiri'
require 'set'
files = ''
input_folder = "H:/input"
output_folder = "H:/output"
if input_folder[input_folder.length-1,1] == '/'
input_folder = input_folder[0,input_folder.length-1]
end
if output_folder[output_folder.length-1,1] != '/'
output_folder = output_folder + '/'
end
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
doc = Nokogiri::XML(file)
record = {} # hashes
keys = Set.new
records = [] # array
csv = ""
doc.traverse do |node|
value = node.text.gsub(/\n +/, '')
if node.name != "text" # skip these nodes: if class isnt text then skip
if value.length > 0 # skip empty nodes
key = node.name.gsub(/wd:/,'').to_sym
if key == :Dataload_Request && !record.empty?
records << record
record = {}
elsif key[/^root$|^document$/]
# neglect these keys
else
key = node.name.gsub(/wd:/,'').to_sym
# in case our value is html instead of text
record[key] = Nokogiri::HTML.parse(value).text
# add to our key set only if not already in the set
keys << key
end
end
end
end
# build our csv
File.open('H:/output/.*csv', 'w') do |file|
file.puts %Q{"#{keys.to_a.join('","')}"}
records.each do |record|
keys.each do |key|
file.write %Q{"#{record[key]}",}
end
file.write "\n"
end
print ''
print 'output files ready!'
print ''
end
I have been getting 'read memory': no implicit conversion of Array into String (TypeError) and other errors.
Here's a quick peer-review of your code, something like you'd get in a corporate environment...
Instead of writing:
input_folder = "H:/input"
input_folder[input_folder.length-1,1] == '/' # => false
Consider doing it using the -1 offset from the end of the string to access the character:
input_folder[-1] # => "t"
That simplifies your logic making it more readable because it's lacking unnecessary visual noise:
input_folder[-1] == '/' # => false
See [] and []= in the String documentation.
This looks like a bug to me:
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
files is an array of filenames. input_folder + '/' + files is appending an array to a string:
foo = ['1', '2'] # => ["1", "2"]
'/parent/' + foo # =>
# ~> -:9:in `+': no implicit conversion of Array into String (TypeError)
# ~> from -:9:in `<main>'
How you want to deal with that is left as an exercise for the programmer.
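Since files is an array, one option is to loop over it and derive each output name from the corresponding input name. A minimal sketch of just the naming step; csv_path_for is a hypothetical helper and orders.xml is a made-up file name, while the folders are the ones from the question:

```ruby
# Build the CSV path that corresponds to one XML input file
def csv_path_for(xml_path, output_folder)
  File.join(output_folder, File.basename(xml_path, '.xml') + '.csv')
end

puts csv_path_for('H:/input/orders.xml', 'H:/output')
```

File.join also copes with a trailing slash on the folder, so the manual trailing-slash checks at the top of the script would no longer be needed.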
doc.traverse do |node|
is icky because it sidesteps the power of Nokogiri being able to search for a particular tag using accessors. Very rarely do we need to iterate over a document tag by tag, usually only when we're peeking at its structure and layout. traverse is slower so use it as a very last resort.
length is nice but isn't needed when checking whether a string has content:
value = 'foo'
value.length > 0 # => true
value > '' # => true
value = ''
value.length > 0 # => false
value > '' # => false
Programmers coming from Java like to use the accessors but I like being lazy, probably because of my C and Perl backgrounds.
Be careful with sub and gsub as they don't do what you're thinking they do. Both expect a regular expression, but will also take a string, which they escape before beginning their scan.
You're passing in a regular expression, which is OK in this case, but it could cause unexpected problems if you don't remember all the rules for pattern matching and that gsub scans until the end of the string:
foo = 'wd:barwd:' # => "wd:barwd:"
key = foo.gsub(/wd:/,'') # => "bar"
In general I recommend people think a couple times before using regular expressions. I've seen some gaping holes opened up in logic written by fairly advanced programmers because they didn't know what the engine was going to do. They're wonderfully powerful, but need to be used surgically, not as a universal solution.
The same thing happens with a string, because gsub doesn't know when to quit:
key = foo.gsub('wd:','') # => "bar"
So, if you're looking to change just the first instance use sub:
key = foo.sub('wd:','') # => "barwd:"
I'd do it a little differently though.
foo = 'wd:bar'
I can check to see what the first three characters are:
foo[0,3] # => "wd:"
Or I can replace them with something else using string indexing:
foo[0,3] = ''
foo # => "bar"
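On Ruby 2.5 or newer, String#delete_prefix expresses the same intent without index arithmetic. It removes the prefix only when the string actually starts with it, and it returns a new string rather than modifying the receiver:

```ruby
foo = 'wd:barwd:'

key = foo.delete_prefix('wd:') # only the leading "wd:" is removed
p key

untouched = 'bar'.delete_prefix('wd:') # no prefix, so the string comes back unchanged
p untouched
```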
There's more but I think that's enough for now.
You should use Ruby's CSV class. Also, you don't need to do any string matching or regex stuff. Use Nokogiri to target elements. If you know the node names in the XML will be consistent it should be pretty simple. I'm not exactly sure if this is the output you want, but this should get you in the right direction:
require 'nokogiri'
require 'csv'
def xml_to_csv(filename)
xml_str = File.read(filename)
xml_str.gsub!('record:','') # remove the record: namespace
doc = Nokogiri::XML xml_str
csv_filename = filename.gsub('.xml', '.csv')
CSV.open(csv_filename, 'wb' ) do |row|
row << ['name', 'street_address', 'postal_code', 'age']
row << [
doc.xpath('//name').text,
doc.xpath('//Street_Address').text,
doc.xpath('//Postal_Code').text,
doc.xpath('//Age').text,
]
end
end
# iterate over all xml files
Dir.glob('*.xml').each { |filename| xml_to_csv(filename) }

Using Ruby to parse and write Puppet node definitions

I am writing a helper API in Ruby to automatically create and manipulate node definitions. My code is working; it can read and write the node defs successfully, however, it is a bit clunky.
Ruby is not my main language, so I'm sure there is a cleaner, and more rubyesque solution. I would appreciate some advice or suggestions.
Each host has its own file in manifests/nodes containing just the node definition. e.g.
node 'testnode' {
class {'firstclass': }
class {'secondclass': enabled => false }
}
The classes all are either enabled (default) or disabled elements. In the Ruby code, I store these as an instance variable hash @elements.
The read method looks like this:
def read()
  data = File.readlines(@filepath)
  for line in data do
    if line.include? 'class'
      element = line[/.*\{'([^\']*)':/, 1]
      if @elements.include? element.to_sym
        if not line.include? 'enabled => false'
          @elements[element.to_sym] = true
        else
          @elements[element.to_sym] = false
        end
      end
    end
  end
end
And the write method looks like this:
def write()
  data = "node #{@hostname} {\n"
  for element in @elements do
    if element[1]
      line = " class {'#{element[0]}': }\n"
    else
      line = " class {'#{element[0]}': enabled => false}\n"
    end
    data += line
  end
  data += "}\n"
  file = File.open(@filepath, 'w')
  file.write(data)
  file.close()
end
One thing to add is that these systems will be isolated from the internet, so I'd prefer to avoid a large number of dependencies, as I'll need to install and maintain them manually.
If your goal is to define your nodes programmatically, there is a much more straightforward way than reading and writing manifests. One of the built-in features of Puppet is "External Node Classifiers" (ENC). The basic idea is that something external to Puppet defines what a node should look like.
In the simplest form, the ENC can be a Ruby/Python/whatever script that writes out YAML with the list of classes and enabled parameters. Reading and writing YAML from Ruby is as simple as it gets.
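A very rough sketch of the core of such a script, assuming the ENC output is a YAML document with a top-level classes hash that maps class names to their parameters (please verify the exact format against the Puppet documentation for your version). enc_yaml is a hypothetical helper, and elements mirrors the @elements hash from the question. Only the yaml library that ships with Ruby is needed:

```ruby
require 'yaml'

# elements: { class_name => enabled? }, as in the question's @elements hash.
# Assumption: enabled classes need no parameters, disabled ones get enabled: false.
def enc_yaml(elements)
  classes = elements.each_with_object({}) do |(name, enabled), acc|
    acc[name.to_s] = enabled ? nil : { 'enabled' => false }
  end
  { 'classes' => classes }.to_yaml
end

puts enc_yaml({ firstclass: true, secondclass: false })
```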
Ruby has some pretty good methods to iterate over data structures. See below for an example of how to rubify your code a little bit. I am by no means an expert on the subject, and have not tested the code. :)
def read
  File.foreach(@filepath) do |line|
    element = line[/.*\{'([^\']*)':/, 1]
    next unless element # skip lines that do not declare a class
    key = element.to_sym
    @elements[key] = !line.include?('enabled => false') if @elements.include?(key)
  end
end
def write
  File.open(@filepath, 'w') do |file|
    file.puts "node #{@hostname} {"
    @elements.each do |name, enabled|
      if enabled
        file.puts " class {'#{name}': }"
      else
        file.puts " class {'#{name}': enabled => false }"
      end
    end
    file.puts '}'
  end
end
Hope this points you in the right direction.

How do I test reading a file?

I'm writing a test for one of my classes which has the following constructor:
def initialize(filepath)
  @transactions = []
  File.open(filepath).each do |line|
    next if $. == 1
    elements = line.split(/\t/).map { |e| e.strip }
    transaction = Transaction.new(elements[0], Integer(1))
    @transactions << transaction
  end
end
I'd like to test this by using a fake file, not a fixture. So I wrote the following spec:
it "should read a file and create transactions" do
filepath = "path/to/file"
mock_file = double(File)
expect(File).to receive(:open).with(filepath).and_return(mock_file)
expect(mock_file).to receive(:each).with(no_args()).and_yield("phrase\tvalue\n").and_yield("yo\t2\n")
filereader = FileReader.new(filepath)
filereader.transactions.should_not be_nil
end
Unfortunately this fails because I'm relying on $. to equal 1 and increment on every line and for some reason that doesn't happen during the test. How can I ensure that it does?
Global variables make code hard to test. You could use each_with_index:
File.open(filepath) do |file|
file.each_with_index do |line, index|
next if index == 0 # zero based
# ...
end
end
But it looks like you're parsing a CSV file with a header line. Therefore I'd use Ruby's CSV library:
require 'csv'
CSV.foreach(filepath, col_sep: "\t", headers: true, converters: :numeric) do |row|
  @transactions << Transaction.new(row['phrase'], row['value'])
end
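The same options can be tried on an in-memory string with CSV.parse, which is a convenient way to see what headers: true and converters: :numeric do to each row. The sample text matches the two lines used in the spec from the question:

```ruby
require 'csv'

text = "phrase\tvalue\nyo\t2\n"
rows = CSV.parse(text, col_sep: "\t", headers: true, converters: :numeric)

# The header line is consumed as column names; numeric-looking fields are converted
pairs = rows.map { |row| [row['phrase'], row['value']] }
p pairs
```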
You can (and should) use IO#each_line together with Enumerable#each_with_index, which looks like:
File.open(filepath).each_line.each_with_index do |line, i|
  next if i == 0 # each_with_index is zero-based, so 0 is the header line
  # …
end
Or you can drop the first line, and work with others:
File.open(filepath).each_line.drop(1).each do |line|
# …
end
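Both variants work on any IO-like object, which also makes them easy to exercise without a real file or a mock: StringIO from the standard library responds to each_line too. A small sketch; parse_rows is a hypothetical stand-in for the body of the constructor:

```ruby
require 'stringio'

# Skip the header line, then split every remaining line on tabs
def parse_rows(io)
  io.each_line.drop(1).map { |line| line.split("\t").map(&:strip) }
end

rows = parse_rows(StringIO.new("phrase\tvalue\nyo\t2\n"))
p rows
```

If the constructor accepted an IO object instead of a path, a spec could pass a StringIO like this directly.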
If you don't want to mess around with mocking File for each test, you can try FakeFS, which implements an in-memory file system based on StringIO that cleans up automatically after your tests.
This way your tests don't need to change if your implementation changes.
require 'fakefs/spec_helpers'
describe "FileReader" do
include FakeFS::SpecHelpers
def stub_file file, content
FileUtils.mkdir_p File.dirname(file)
File.open( file, 'w' ){|f| f.write( content ); }
end
it "should read a file and create transactions" do
file_path = "path/to/file"
stub_file file_path, "phrase\tvalue\nyo\t2\n"
filereader = FileReader.new(file_path)
expect( filereader.transactions ).to_not be_nil
end
end
Be warned: this is an implementation of most of the file access in Ruby, passing it back onto the original method where possible. If you are doing anything advanced with files you may start running into bugs in the FakeFS implementation. I got stuck with some binary file byte read/write operations which weren't implemented in FakeFS quite how Ruby implemented them.

Manipulating XML files in ruby with XmlSimple

I've got a complex XML file, and I want to extract the content of a specific tag from it.
I use a Ruby script with the XmlSimple gem. I retrieve an XML file with an HTTP request, then strip all the unnecessary tags and pull out the necessary info. This is the script itself:
data = XmlSimple.xml_in(response.body)
hash_1 = Hash[*data['results']]
def find_value(hash, value)
hash.each do |key, val|
if val[0].kind_of? Hash then
find_value(val[0], value)
else
if key.to_s.eql? value
puts val
end
end
end
end
hash_1['book'].each do |arg|
find_value(arg, "title")
puts("\n")
end
The problem is that when I replace puts val with return val and then call the method with puts find_value(arg, "title"), I get the whole contents of hash_1['book'] on the screen.
How can I correct the find_value method?
A "complex XML file" and XmlSimple don't mix. Your task would be solved a lot easier with Nokogiri, and be faster as well:
require 'nokogiri'
doc = Nokogiri::XML(response.body)
puts doc.xpath('//book/title/text()')
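For anyone who wants to stay with the XmlSimple data structure: the whole hash gets printed because the result of the recursive call is thrown away, and Hash#each then returns the hash it was called on. A minimal sketch of a version that passes the found value back up; the sample book hash is made up and only imitates the array-wrapped values that XmlSimple produces:

```ruby
# Return the first value stored under the wanted key, searching nested hashes
def find_value(hash, wanted)
  hash.each do |key, val|
    return val if key.to_s == wanted

    first = val.is_a?(Array) ? val.first : val
    next unless first.is_a?(Hash)

    found = find_value(first, wanted)
    return found if found # hand the nested result back to the caller
  end
  nil # nothing matched at this level or below
end

# Hypothetical data in XmlSimple's shape
book = { 'info' => [{ 'title' => ['Dune'], 'year' => ['1965'] }] }
p find_value(book, 'title')
```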
