Using Ruby to parse and write Puppet node definitions

I am writing a helper API in Ruby to automatically create and manipulate Puppet node definitions. My code works; it can read and write the node definitions successfully. However, it is a bit clunky.
Ruby is not my main language, so I'm sure there is a cleaner, more Ruby-like solution. I would appreciate some advice or suggestions.
Each host has its own file in manifests/nodes containing just the node definition, e.g.
node 'testnode' {
  class {'firstclass': }
  class {'secondclass': enabled => false }
}
The classes are all elements that are either enabled (the default) or disabled. In the Ruby code, I store these in an instance variable hash, @elements.
The read method looks like this:
def read()
  data = File.readlines(@filepath)
  for line in data do
    if line.include? 'class'
      element = line[/.*\{'([^\']*)':/, 1]
      if @elements.include? element.to_sym
        if not line.include? 'enabled => false'
          @elements[element.to_sym] = true
        else
          @elements[element.to_sym] = false
        end
      end
    end
  end
end
And the write method looks like this:
def write()
  data = "node #{@hostname} {\n"
  for element in @elements do
    if element[1]
      line = "  class {'#{element[0]}': }\n"
    else
      line = "  class {'#{element[0]}': enabled => false}\n"
    end
    data += line
  end
  data += "}\n"
  file = File.open(@filepath, 'w')
  file.write(data)
  file.close()
end
One thing to add: these systems will be isolated from the internet, so I'd prefer to avoid a large number of dependency libraries, as I'll need to install and maintain them manually.

If your goal is to define your nodes programmatically, there is a much more straightforward way than reading and writing manifests. One of the built-in features of Puppet is the "External Node Classifier" (ENC). The basic idea is that something external to Puppet defines what a node should look like.
In its simplest form, the ENC can be a Ruby/Python/whatever script that writes out YAML with the list of classes and their enabled parameters. Reading and writing YAML from Ruby is as simple as it gets.
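For example, a minimal ENC is just an executable that prints class data as YAML for whichever node name it is handed. The sketch below is an illustration only; the class list is a placeholder and would come from whatever data source you maintain:
#!/usr/bin/env ruby
# Minimal ENC sketch: Puppet invokes this script with the node name as ARGV[0]
# and expects YAML with a "classes" key on standard output.
require 'yaml'

node_name = ARGV[0]

# Placeholder: a real ENC would look up node_name in a flat file, database, etc.
classes =
  case node_name
  when 'testnode' then { 'firstclass' => {}, 'secondclass' => { 'enabled' => false } }
  else {}
  end

puts YAML.dump('classes' => classes)
The master is then pointed at the script with node_terminus = exec and external_nodes = /path/to/script in puppet.conf.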

Ruby has some pretty good methods to iterate over data structures. See below for an example of how to rubify your code a little bit. I am by no means an expert on the subject, and have not tested the code. :)
def read
  data = File.readlines(@filepath)
  data.each do |line|
    element = line[/.*\{'([^\']*)':/, 1]
    next unless element # skip lines that are not class declarations
    if @elements.include?(element.to_sym)
      @elements[element.to_sym] = !line.include?('enabled => false')
    end
  end
end
def write
  File.open(@filepath, 'w') do |file|
    file.puts "node #{@hostname} {"
    @elements.each do |name, enabled|
      if enabled
        file.puts "  class {'#{name}': }"
      else
        file.puts "  class {'#{name}': enabled => false }"
      end
    end
    file.puts '}'
  end
end
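For completeness, here is a hypothetical sketch of how these methods might be wired into a class; the question doesn't show the surrounding class, so the class name and constructor are assumptions:
class NodeDefinition
  def initialize(hostname, filepath, elements = {})
    @hostname = hostname
    @filepath = filepath
    @elements = elements # e.g. { firstclass: true, secondclass: false }
  end

  # read and write as defined above
end

node = NodeDefinition.new('testnode', 'manifests/nodes/testnode.pp',
                          firstclass: true, secondclass: false)
node.read   # sync @elements with what is currently on disk
node.write  # rewrite the manifest from @elements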
Hope this points you in the right direction.

Related

How to "observe" a stream in Ruby's CSV module?

I am writing a class that takes a CSV file, transforms it, and then writes the new data out.
module Transformer
  class Base
    def initialize(file)
      @file = file
    end

    def original_data(&block)
      opts = { headers: true }
      CSV.open(@file, 'rb', opts, &block)
    end

    def transformer
      # complex manipulations here like modifying columns, picking only certain
      # columns to put into new_data, etc but simplified to `+10` to keep
      # example concise
      ->(row) { new_data << row['some_header'] + 10 }
    end

    def transformed_data
      self.original_data(self.transformer)
    end

    def write_new_data
      CSV.open('new_file.csv', 'wb', opts) do |new_data|
        transformed_data
      end
    end
  end
end
What I'd like to be able to do is:
Look at the transformed data without writing it out (so I can test that it transforms the data correctly, and I don't need to write it to file right away: maybe I want to do more manipulation before writing it out)
Don't slurp all the file at once, so it works no matter the size of the original data
Have this as a base class with an empty transformer so that instances only need to implement their own transformers but the behavior for reading and writing is given by the base class.
But obviously the above doesn't work because I don't really have a reference to new_data in transformer.
How could I achieve this elegantly?
I can recommend one of two approaches, depending on your needs and personal taste.
I have intentionally distilled the code to just its bare minimum (without your wrapping class), for clarity.
1. Simple read-modify-write loop
Since you do not want to slurp the file, use CSV.foreach. For example, for a quick debugging session, do:
CSV.foreach "source.csv", headers: true do |row|
row["name"] = row["name"].upcase
row["new column"] = "new value"
p row
end
And if you wish to write to file during that same iteration:
require 'csv'

csv_options = { headers: true }

# Open the target file for writing
CSV.open("target.csv", "wb") do |target|
  # Add a header
  target << %w[new header column names]
  # Iterate over the source CSV rows
  CSV.foreach "source.csv", **csv_options do |row|
    # Mutate and add columns
    row["name"] = row["name"].upcase
    row["new column"] = "new value"
    # Push the new row to the target file
    target << row
  end
end
2. Using CSV::Converters
There is built-in functionality that might be helpful: CSV::Converters (see the :converters option in the CSV.new documentation).
require 'csv'

# Register a converter in the options hash
csv_options = { headers: true, converters: [:stripper] }

# Define a converter
CSV::Converters[:stripper] = lambda do |value, field|
  value ? value.to_s.strip : value
end

CSV.open("target.csv", "wb") do |target|
  # same as above
  CSV.foreach "source.csv", **csv_options do |row|
    # same as above - input data will already be converted
    # you can do additional things here if needed
  end
end
3. Separate input and output from your converter classes
Based on your comment, and since you want to minimize I/O and iterations, perhaps extracting the read/write operations from the responsibility of the transformers might be of interest. Something like this.
require 'csv'

class NameCapitalizer
  def self.call(row)
    row["name"] = row["name"].upcase
  end
end

class EmailRemover
  def self.call(row)
    row.delete 'email'
  end
end

csv_options = { headers: true }
converters  = [NameCapitalizer, EmailRemover]

CSV.open("target.csv", "wb") do |target|
  CSV.foreach "source.csv", **csv_options do |row|
    converters.each { |c| c.call row }
    target << row
  end
end
Note that the above code still does not handle the header in case it was changed. You will probably have to keep the last transformed row and prepend its headers to the output CSV, or write the header explicitly, as in the sketch below.
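For example, one way to handle the header without a second pass is to write it out from the first transformed row, since that row's headers already reflect removed columns. A sketch, reusing the converter classes above:
require 'csv'

csv_options = { headers: true }
converters  = [NameCapitalizer, EmailRemover] # classes from the snippet above

CSV.open("target.csv", "wb") do |target|
  wrote_header = false
  CSV.foreach "source.csv", **csv_options do |row|
    converters.each { |c| c.call row }
    # The first transformed row's headers no longer include deleted columns,
    # so they can serve as the output header line.
    unless wrote_header
      target << row.headers
      wrote_header = true
    end
    target << row
  end
end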
There are probably plenty of other ways to do it, but the CSV class in Ruby does not have the cleanest interface, so I try to keep code that deals with it as simple as I can.

How to parse CSON to Ruby object?

I am trying to read CSON (CoffeeScript Object Notation) into Ruby.
I am looking for something similar to data = JSON.parse(file) that one would use for JSON files.
file = File.read(filename)
data = CSON.parse(file) # does not exist - would like to have
I looked into invoking CoffeeScript and JavaScript from Ruby, but it feels overly complicated and like reinventing the wheel. Also, code in the data file should not be executed.
How can I read CSON into Ruby objects in a simple way?
This is what I came up with. It is sufficient for the data I am processing. The main work is done with the YAML parser Psych (https://github.com/ruby/psych). Arrays, hashes, and some of the multi-line text require special treatment.
require 'yaml'

module CSON
  def load_file(fname)
    load_string File.read(fname)
  end

  def remove_indent(data)
    out = ""
    data.each_line do |line|
      out += line.sub /^\s\s/, ""
    end
    out
  end

  def parse_array(data)
    data.gsub! /\n/, ","
    data.gsub! /([\[\{]),/, '\1'
    data.gsub! /,([\]\}])/, '\1'
    YAML.load data
  end

  def load_string(data)
    hashed = {}
    data.gsub! /^(\w+):\s+(\[.*?\])/mu do # find arrays
      key = Regexp.last_match[1]
      value = parse_array Regexp.last_match[2]
      hashed[key] = value
      ""
    end
    data.gsub! /(\w+):\s+\'\'\'\s*\n(.*?)\'\'\'/mu do # find heredocs
      hashed[Regexp.last_match[1]] = remove_indent Regexp.last_match[2]
      ""
    end
    hashed.merge YAML.load data
  end
end
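A minimal usage sketch (hypothetical input, plain key/value pairs only, so neither the array nor the heredoc branch is exercised):
require 'yaml'

class ConfigLoader
  include CSON
end

cson = <<~DATA
  title: "Example"
  count: 3
DATA

p ConfigLoader.new.load_string(cson)
# => {"title"=>"Example", "count"=>3}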
This solution is likely to fail when applied to more complicated .cson files. I would be happy to see if someone has a more elegant answer!

How can I process huge JSON files as streams in Ruby, without consuming all memory?

I'm having trouble processing a huge JSON file in Ruby. What I'm looking for is a way to process it entry-by-entry without keeping too much data in memory.
I thought the yajl-ruby gem would do the work, but it consumes all my memory. I've also looked at the Yajl::FFI and JSON::Stream gems, but there it is clearly stated:
For larger documents we can use an IO object to stream it into the parser. We still need room for the parsed object, but the document itself is never fully read into memory.
Here's what I've done with Yajl:
file_stream = File.open(file, "r")
json = Yajl::Parser.parse(file_stream)
json.each do |entry|
  entry.do_something
end
file_stream.close
The memory usage keeps getting higher until the process is killed.
I don't see why Yajl keeps processed entries in memory. Can I somehow free them, or did I just misunderstand the capabilities of the Yajl parser?
If it cannot be done using Yajl: is there a way to do this in Ruby via any library?
Problem
json = Yajl::Parser.parse(file_stream)
When you invoke Yajl::Parser like this, the entire stream is loaded into memory to create your data structure. Don't do that.
Solution
Yajl provides Parser#parse_chunk, Parser#on_parse_complete, and other related methods that enable you to trigger parsing events on a stream without requiring that the whole IO stream be parsed at once. The README contains an example of how to use chunking instead.
The example given in the README is:
Or lets say you didn't have access to the IO object that contained JSON data, but instead only had access to chunks of it at a time. No problem!
(Assume we're in an EventMachine::Connection instance)
def post_init
  @parser = Yajl::Parser.new(:symbolize_keys => true)
end

def object_parsed(obj)
  puts "Sometimes one pays most for the things one gets for nothing. - Albert Einstein"
  puts obj.inspect
end

def connection_completed
  # once a full JSON object has been parsed from the stream
  # object_parsed will be called, and passed the constructed object
  @parser.on_parse_complete = method(:object_parsed)
end

def receive_data(data)
  # continue passing chunks
  @parser << data
end
Or if you don't need to stream it, it'll just return the built object from the parse when it's done. NOTE: if there are going to be multiple JSON strings in the input, you must specify a block or callback as this is how yajl-ruby will hand you (the caller) each object as it's parsed off the input.
obj = Yajl::Parser.parse(str_or_io)
One way or another, you have to parse only a subset of your JSON data at a time. Otherwise, you are simply instantiating a giant Hash in memory, which is exactly the behavior you describe.
Without knowing what your data looks like and how your JSON objects are composed, it isn't possible to give a more detailed explanation than that; as a result, your mileage may vary. However, this should at least get you pointed in the right direction.
Both @CodeGnome's and @A. Rager's answers helped me understand the solution.
I ended up creating the gem json-streamer that offers a generic approach and spares the need to manually define callbacks for every scenario.
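If I remember the gem's README correctly, basic usage looks roughly like the following; treat the exact method names as an approximation and check the project page for the current API:
require 'json/streamer'

file = File.open('dudes.json', 'r')

# Yields each object found at nesting level 1 (i.e. each top-level entry)
# without building the whole document in memory.
streamer = Json::Streamer.parser(file_io: file, chunk_size: 1024)
streamer.get(nesting_level: 1) do |object|
  p object
end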
Your solutions seem to be json-stream and yajl-ffi. There's an example for both that's pretty similar (they're from the same author):
def post_init
  @parser = Yajl::FFI::Parser.new
  @parser.start_document { puts "start document" }
  @parser.end_document   { puts "end document" }
  @parser.start_object   { puts "start object" }
  @parser.end_object     { puts "end object" }
  @parser.start_array    { puts "start array" }
  @parser.end_array      { puts "end array" }
  @parser.key   { |k| puts "key: #{k}" }
  @parser.value { |v| puts "value: #{v}" }
end

def receive_data(data)
  begin
    @parser << data
  rescue Yajl::FFI::ParserError => e
    close_connection
  end
end
There, he sets up the callbacks for possible data events that the stream parser can experience.
Given a json document that looks like:
{
  1: {
    name: "fred",
    color: "red",
    dead: true,
  },
  2: {
    name: "tony",
    color: "six",
    dead: true,
  },
  ...
  n: {
    name: "erik",
    color: "black",
    dead: false,
  },
}
One could stream parse it with yajl-ffi something like this:
def parse_dudes file_io, chunk_size
  parser = Yajl::FFI::Parser.new
  object_nesting_level = 0
  current_row = {}
  current_key = nil

  parser.start_object { object_nesting_level += 1 }
  parser.end_object do
    if object_nesting_level.eql? 2
      yield current_row # here, we yield the fully collected record to the passed block
      current_row = {}
    end
    object_nesting_level -= 1
  end

  parser.key do |k|
    if object_nesting_level.eql? 2
      current_key = k
    elsif object_nesting_level.eql? 1
      current_row["id"] = k
    end
  end

  parser.value { |v| current_row[current_key] = v }

  file_io.each(chunk_size) { |chunk| parser << chunk }
end

File.open('dudes.json') do |f|
  parse_dudes f, 1024 do |dude|
    pp dude
  end
end

How to read data from a different file without using YAML or JSON

I'm experimenting with a Ruby script that will add data to a Neo4j database using REST API. (Here's the tutorial with all the code if interested.)
The script works if I include the hash data structure in the initialize method but I would like to move the data into a different file so I can make changes to it separately using a different script.
I'm relatively new to Ruby. If I copy the following data structure into a separate file, is there a simple way to read it from my existing script when I call @data? I've heard one could do something with YAML or JSON (I'm not familiar with how either works). What's the easiest way to read a file, and how could I go about coding that?
# I want to copy this data into a different file and read it with my script when I call @data.
{
  nodes: [
    {:label=>"Person", :title=>"title_here", :name=>"name_here"}
  ]
}
And here is part of my code, it should be enough for the purposes of this question.
class RGraph
  def initialize
    @url = 'http://localhost:7474/db/data/cypher'

    # If I put this hash structure into a different file, how do I make @data read that file?
    @data = {
      nodes: [
        {:label=>"Person", :title=>"title_here", :name=>"name_here"}
      ]
    }
  end

  # more code here... not relevant to question

  def create_nodes
    # Scan file, find each node and create it in Neo4j
    @data.each do |key, value|
      if key == :nodes
        @data[key].each do |node| # Cycle through each node
          next unless node.has_key?(:label) # Make sure this node has a label
          # We have sufficient data to create a node
          label = node[:label]
          attr = Hash.new
          node.each do |k, v| # Hunt for additional attributes
            next if k == :label # Don't create an attribute for "label"
            attr[k] = v
          end
          create_node(label, attr)
        end
      end
    end
  end

  rGraph = RGraph.new
  rGraph.create_nodes
end
Given that OP said in comments "I'm not against using either of those", let's do it in YAML (which preserves the Ruby object structure best). Save it:
@data = {
  nodes: [
    {:label=>"Person", :title=>"title_here", :name=>"name_here"}
  ]
}

require 'yaml'
File.write('config.yaml', YAML.dump(@data))
This will create config.yaml:
---
:nodes:
- :label: Person
  :title: title_here
  :name: name_here
If you read it in, you get exactly what you saved:
require 'yaml'
@data = YAML.load(File.read('config.yaml'))
puts @data.inspect
# => {:nodes=>[{:label=>"Person", :title=>"title_here", :name=>"name_here"}]}
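One caveat: on newer Psych versions (Psych 4, the default in Ruby 3.1+), YAML.load is safe by default, so the symbol keys in this file have to be permitted explicitly, for example:
require 'yaml'

# Psych 4 refuses to deserialize Symbol objects unless they are explicitly allowed.
@data = YAML.safe_load(File.read('config.yaml'), permitted_classes: [Symbol])
puts @data.inspect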

How do I test reading a file?

I'm writing a test for one of my classes which has the following constructor:
def initialize(filepath)
  @transactions = []
  File.open(filepath).each do |line|
    next if $. == 1
    elements = line.split(/\t/).map { |e| e.strip }
    transaction = Transaction.new(elements[0], Integer(1))
    @transactions << transaction
  end
end
I'd like to test this by using a fake file, not a fixture. So I wrote the following spec:
it "should read a file and create transactions" do
filepath = "path/to/file"
mock_file = double(File)
expect(File).to receive(:open).with(filepath).and_return(mock_file)
expect(mock_file).to receive(:each).with(no_args()).and_yield("phrase\tvalue\n").and_yield("yo\t2\n")
filereader = FileReader.new(filepath)
filereader.transactions.should_not be_nil
end
Unfortunately this fails because I'm relying on $. to equal 1 and increment on every line and for some reason that doesn't happen during the test. How can I ensure that it does?
Global variables make code hard to test. You could use each_with_index:
File.open(filepath) do |file|
  file.each_with_index do |line, index|
    next if index == 0 # zero based
    # ...
  end
end
But it looks like you're parsing a CSV file with a header line. Therefore I'd use Ruby's CSV library:
require 'csv'

CSV.foreach(filepath, col_sep: "\t", headers: true, converters: :numeric) do |row|
  @transactions << Transaction.new(row['phrase'], row['value'])
end
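Folded back into the original constructor, that might look something like this (assuming the header names phrase and value, matching the fake file used in the spec):
require 'csv'

class FileReader
  attr_reader :transactions

  def initialize(filepath)
    @transactions = []
    # headers: true consumes the first line as the header row,
    # so there is no need to track line numbers at all.
    CSV.foreach(filepath, col_sep: "\t", headers: true, converters: :numeric) do |row|
      @transactions << Transaction.new(row['phrase'], row['value'])
    end
  end
end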
You can (and should) use IO#each_line together with Enumerable#each_with_index, which will look like:
File.open(filepath).each_line.each_with_index do |line, i|
  next if i == 0 # each_with_index is zero-based, so this skips the header line
  # …
end
Or you can drop the first line, and work with others:
File.open(filepath).each_line.drop(1).each do |line|
  # …
end
If you don't want to mess around with mocking File for each test, you can try FakeFS, which implements an in-memory file system based on StringIO and cleans up automatically after your tests.
This way your tests don't need to change if your implementation changes.
require 'fakefs/spec_helpers'

describe "FileReader" do
  include FakeFS::SpecHelpers

  def stub_file(file, content)
    FileUtils.mkdir_p File.dirname(file)
    File.open(file, 'w') { |f| f.write(content) }
  end

  it "should read a file and create transactions" do
    file_path = "path/to/file"
    stub_file file_path, "phrase\tvalue\nyo\t2\n"
    filereader = FileReader.new(file_path)
    expect( filereader.transactions ).to_not be_nil
  end
end
Be warned: FakeFS is an implementation of most of the file access in Ruby, passing calls back to the original methods where possible. If you are doing anything advanced with files you may start running into bugs in the FakeFS implementation. I got stuck with some binary byte read/write operations which weren't implemented in FakeFS quite the way Ruby implements them.
