Check file header and then loop through - ruby

Suppose I have a file in the following format.
date|time|account
2010-01-01|07:00:00|A1
2010-01-01|07:00:01|A2
....
Suppose I have the following function.
require 'csv'
def ReadLongFile(longFile)
  CSV.foreach(longFile, :headers => true, :col_sep => '|') do |row|
    p row.to_hash
  end
end
I like this function because it lets me store each line as a hash where the header entries are the keys and the line entries are the corresponding values. However, what is the most efficient way to modify it so that I can verify the header contains the correct entries? I was considering two options. First, I could open the file in another function and check the first line. Second, I could check within this function, but then it would perform the check on every iteration.

I would suggest using CSV::Row#header_row? to perform the check, and raising an error if the header is not what you expect. Something like:
def ReadLongFile(longFile)
  CSV.foreach(longFile, :headers => true, :return_headers => true, :col_sep => '|') do |row|
    if row.header_row?
      # header_sane? is a method you write yourself
      raise ArgumentError, "Bad headers" unless header_sane?(row)
      next # don't process the header line as data
    end
    # Otherwise do the processing
  end
end
Your implementation of header_sane? performs whatever validation you need to ensure the file is what you expect it to be. Your calling code can rescue the ArgumentError if it can recover from it, or just let it fail :-)
Note: be sure to set the :return_headers option when calling CSV::foreach. Without it, the header row is consumed silently and header_row? is never true for any yielded row.
If you are worried about the minimal overhead of calling header_row? for each of the row entries, you can construct a CSV instance and use shift to manually check the first row before continuing. For instance:
def ReadLongFile(longFile)
  File.open(longFile) do |file|
    # Options are passed as keywords; a braced hash argument fails on Ruby 3
    reader = CSV.new(file, :col_sep => '|', :headers => true, :return_headers => true)
    header_row = reader.shift
    raise ArgumentError, "Bad file headers" unless header_sane?(header_row)
    reader.each do |row|
      p row
    end
  end
end
Implemented as above, the following behavior holds true:
[4] pry(main)> def header_sane? row
[4] pry(main)* true
[4] pry(main)* end
=> nil
[5] pry(main)> ReadLongFile("file.csv")
#<CSV::Row "date":"2010-01-01" "time":"07:00:00" "account":"A1">
#<CSV::Row "date":"2010-01-01" "time":"07:00:01" "account":"A2">
=> nil
[6] pry(main)> def header_sane? row
[6] pry(main)* false
[6] pry(main)* end
=> nil
[7] pry(main)> ReadLongFile("file.csv")
ArgumentError: Bad file headers
from (pry):7:in `block in ReadLongFile'
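The answer leaves header_sane? open. One minimal way to fill it in, assuming the file must contain exactly the columns date, time and account in that order, is to compare the header row's fields against a fixed list. EXPECTED_HEADERS and read_rows are hypothetical names, and StringIO stands in for the real file so the sketch runs on its own.

```ruby
require 'csv'
require 'stringio'

# Assumption: these are the only acceptable columns, in this order.
EXPECTED_HEADERS = %w[date time account].freeze

# Hypothetical header_sane?: with :return_headers => true, the first row's
# fields are the raw header strings.
def header_sane?(header_row)
  header_row.fields == EXPECTED_HEADERS
end

# Hypothetical reader: check the header once via shift, then read the rest.
def read_rows(io)
  reader = CSV.new(io, col_sep: '|', headers: true, return_headers: true)
  header_row = reader.shift
  raise ArgumentError, "Bad file headers" unless header_row && header_sane?(header_row)
  reader.map(&:to_hash) # remaining rows only; the header was already consumed
end

good = "date|time|account\n2010-01-01|07:00:00|A1\n2010-01-01|07:00:01|A2\n"
rows = read_rows(StringIO.new(good))
rows.each { |row| p row }
```

Because the header is checked once before the loop, no per-row header_row? test is needed, and an empty file is also rejected since shift returns nil.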


Why is ruby acting like passing by reference when using gsub function in Ruby? [duplicate]

Given the following two methods:
[53] pry(main)> def my_method
[53] pry(main)* leti = 'leti'
[53] pry(main)* edit(leti)
[53] pry(main)* leti
[53] pry(main)* end
=> :my_method
[54] pry(main)> def edit(a_leti)
[54] pry(main)* a_leti.gsub!('e', '3')
[54] pry(main)* a_leti
[54] pry(main)* end
=> :edit
[55] pry(main)> my_method
=> "l3ti"
Can someone explain why I am getting the value edited inside the edit method and not the original value ('leti')? I thought Ruby was pass-by-value. In fact, if I use a simple assignment instead of gsub!, I get the original value. Does gsub! make it pass-by-reference?
Thank you!
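In Ruby, variables hold references to objects such as strings. Arguments are passed by value, but the value that gets copied is the reference, so inside edit, a_leti and the caller's leti point at the same String object. A mutating method like gsub! changes that shared object, and the caller sees the change. A plain assignment (a_leti = 'x') only rebinds the method's local variable to a different object, which is why the original value survives in that case.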
So here is the classic example:
irb(main):004:0* a = "abcd"
=> "abcd"
irb(main):005:0> b = a
=> "abcd"
irb(main):006:0> b << "def"
=> "abcddef"
irb(main):007:0> a
=> "abcddef"
irb(main):008:0> b
=> "abcddef"
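The difference the asker noticed between gsub! and plain assignment can be reproduced in one script. This is a sketch; reassign and mutate are hypothetical method names chosen for illustration.

```ruby
# Rebinding the parameter: gsub (no bang) builds a new String, and the
# assignment only changes what the local variable str points at.
def reassign(str)
  str = str.gsub('e', '3')
  str
end

# Mutating the object: gsub! edits the String that the caller's variable
# also references, so the caller sees the change.
def mutate(str)
  str.gsub!('e', '3')
  str
end

original = 'leti'
reassign(original)
puts original # leti: the caller's object was not touched
mutate(original)
puts original # l3ti: gsub! changed the shared object
```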
If you do not wish to modify the original string, you need to make a copy of it:
Three ways (of many) to do this are:
b = a.dup
b = a.clone
b = String.new a
Using dup
irb(main):009:0> a = "abcd"
=> "abcd"
irb(main):010:0> b = a.dup
=> "abcd"
irb(main):011:0> b << "def"
=> "abcddef"
irb(main):012:0> a
=> "abcd"
irb(main):013:0> b
=> "abcddef"
BTW: For myself, this effect is the number one cause of defects in my own code.

Round-trip JSON serialization in Ruby

Suppose I have a simple class
class Person
  attr_accessor :name
  def say
    puts name
  end
end
Is there a way to serialize it to JSON and back and get an instance of the same class?
For example, I would like to have code like
p = Person.new
p.name = 'bob'
json = JSON.serialize p
# json should be something containing { 'name' : 'bob' }
# and maybe some additional information required for later deserialization
p2 = JSON.deserialize json
p2.say
# should output 'bob'
I tried as_json (from ActiveSupport, I guess), but the result is {'name': 'bob'}; the type information is lost, and after deserialization I just have a hash, not a Person instance.
Ruby's JSON library has its own round-trip hooks, similar in spirit to Marshal. Short answer: you need to define #to_json and self.json_create in your class.
The trick is that you need to store the name of the class you want to round-trip back to in the json itself; the default place to do this is as the value of the key json_class and there's likely no reason to change it.
Here's a ridiculously simple example:
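require 'json'
class A
  attr_accessor :a, :b
  def initialize(a, b)
    @a = a
    @b = b
  end
  # Store the class name under "json_class" so the parser knows what to build
  def to_json(*args)
    {
      "json_class" => self.class.name,
      "data" => {:a => @a, :b => @b}
    }.to_json(*args)
  end
  # Called with the parsed hash when additions are enabled
  def self.json_create(h)
    self.new(h["data"]["a"], h["data"]["b"])
  end
end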
Then you can round-trip it with JSON.generate and JSON.load. Note that a plain JSON.parse will not work; it just gives you back a Hash, unless you pass create_additions: true.
[29] pry(main)> x = A.new(1,2)
=> #<A:0x007fbda457efe0 @a=1, @b=2>
[30] pry(main)> y = A.new(3,4)
=> #<A:0x007fbda456ea78 @a=3, @b=4>
[31] pry(main)> str = JSON.generate(x)
=> "{\"json_class\":\"A\",\"data\":{\"a\":1,\"b\":2}}"
[32] pry(main)> z = JSON.load(str)
=> #<A:0x007fbda43fc050 @a=1, @b=2>
[33] pry(main)> arr = [x,y,z]
=> [#<A:0x007fbda457efe0 @a=1, @b=2>, #<A:0x007fbda456ea78 @a=3, @b=4>, #<A:0x007fbda43fc050 @a=1, @b=2>]
[34] pry(main)> str = JSON.generate(arr)
=> "[{\"json_class\":\"A\",\"data\":{\"a\":1,\"b\":2}},{\"json_class\":\"A\",\"data\":{\"a\":3,\"b\":4}},{\"json_class\":\"A\",\"data\":{\"a\":1,\"b\":2}}]"
[35] pry(main)> arr2 = JSON.load(str)
=> [#<A:0x007fbda4120a48 @a=1, @b=2>, #<A:0x007fbda4120700 @a=3, @b=4>, #<A:0x007fbda4120340 @a=1, @b=2>]
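Applied to the Person class from the question, the same two hooks might look like the sketch below. It keeps name at the top level instead of under a "data" key (a choice, not a requirement) and uses JSON.parse with create_additions: true, which triggers json_create explicitly.

```ruby
require 'json'

class Person
  attr_accessor :name

  def say
    puts name
  end

  # Emit the class name under "json_class" so the parser knows what to build.
  def to_json(*args)
    { 'json_class' => self.class.name, 'name' => name }.to_json(*args)
  end

  # Called by the parser when create_additions is enabled.
  def self.json_create(hash)
    person = new
    person.name = hash['name']
    person
  end
end

bob = Person.new
bob.name = 'bob'
json = JSON.generate(bob) # {"json_class":"Person","name":"bob"}

# Without create_additions: true you would get a plain Hash back.
copy = JSON.parse(json, create_additions: true)
copy.say # prints bob
```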

Ruby: calculate average() while excluding nil values from data

I'm very new to Ruby and I'm having some difficulties with a seemingly simple problem.
Code is here...
https://github.com/sensu/sensu-community-plugins/blob/master/plugins/graphite/check-stats.rb
...but I've included a full copy of the current source at the end, because it may change as new versions are submitted to GitHub.
It's a Sensu plugin. It collects data from Graphite via an HTTP request, stores the reply in body, and then parses it with JSON.parse() into data.
For each metric in data, it collects datapoints, and performs an average on the datapoints. If average is higher than certain thresholds (options -w or -c), it throws a warning or a critical.
Sometimes the Graphite store is a bit behind the times. The most recent data point may be missing from some metrics; when that happens, the data point's value is nil.
The problem is that nil is counted as zero when computing average(datapoints), because nil.to_f returns 0.0. This artificially lowers the average, sometimes to the point that the plugin doesn't trigger when it should.
What's the best way to eliminate the nil values from the calculation of average?
Ideally, the elimination of the nils should happen in such a way that, if all data points are nil, then it should trigger the datapoints.empty condition. Basically, kill all the nils before they reach "unless datapoints.empty?" because if all are nil then we don't actually have any data points.
Or somehow metric.collect{} should skip the nil values.
I've tried to use .compact but that didn't seem to make a difference (probably I've used it wrong).
This is the current version of the code:
#!/usr/bin/env ruby
#
# Checks metrics in graphite, averaged over a period of time.
#
# The fired sensu event will only be critical if a stat is
# above the critical threshold. Otherwise, the event will be warning,
# if a stat is above the warning threshold.
#
# Multiple stats will be checked if * are used
# in the "target" query.
#
# Author: Alan Smith (alan#asmith.me)
# Date: 08/28/2014
#
require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'json'
require 'net/http'
require 'sensu-plugin/check/cli'
class CheckGraphiteStat < Sensu::Plugin::Check::CLI
  option :host,
    :short => "-h HOST",
    :long => "--host HOST",
    :description => "graphite hostname",
    :proc => proc {|p| p.to_s },
    :default => "graphite"
  option :period,
    :short => "-p PERIOD",
    :long => "--period PERIOD",
    :description => "The period back in time to extract from Graphite. Use -24hours, -2days, -15mins, etc, same format as in Graphite",
    :proc => proc {|p| p.to_s },
    :required => true
  option :target,
    :short => "-t TARGET",
    :long => "--target TARGET",
    :description => "The graphite metric name. Can include * to query multiple metrics",
    :proc => proc {|p| p.to_s },
    :required => true
  option :warn,
    :short => "-w WARN",
    :long => "--warn WARN",
    :description => "Warning level",
    :proc => proc {|p| p.to_f },
    :required => false
  option :crit,
    :short => "-c Crit",
    :long => "--crit CRIT",
    :description => "Critical level",
    :proc => proc {|p| p.to_f },
    :required => false
  def average(a)
    total = 0
    a.to_a.each {|i| total += i.to_f}
    total / a.length
  end
  def danger(metric)
    datapoints = metric['datapoints'].collect {|p| p[0].to_f}
    unless datapoints.empty?
      avg = average(datapoints)
      if !config[:crit].nil? && avg > config[:crit]
        return [2, "#{metric['target']} is #{avg}"]
      elsif !config[:warn].nil? && avg > config[:warn]
        return [1, "#{metric['target']} is #{avg}"]
      end
    end
    [0, nil]
  end
  def run
    body =
      begin
        uri = URI("http://#{config[:host]}/render?format=json&target=#{config[:target]}&from=#{config[:period]}")
        res = Net::HTTP.get_response(uri)
        res.body
      rescue Exception => e
        warning "Failed to query graphite: #{e.inspect}"
      end
    status = 0
    message = ''
    data =
      begin
        JSON.parse(body)
      rescue
        []
      end
    unknown "No data from graphite" if data.empty?
    data.each do |metric|
      s, msg = danger(metric)
      message += "#{msg} " unless s == 0
      status = s unless s < status
    end
    if status == 2
      critical message
    elsif status == 1
      warning message
    end
    ok
  end
end
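Well, if you want to eliminate the nils before doing collect, note that each Graphite datapoint is a [value, timestamp] pair, so it is the value p[0] that is nil, not the pair itself. You can do
metric['datapoints'].reject { |p| p[0].nil? }.collect { |p| p[0].to_f }
instead of
metric['datapoints'].collect { |p| p[0].to_f }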
BTW, your average can also be rewritten as
def average(a)
  a.reduce(0, :+) / a.size
end
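You can use Array#compact, which removes the nils from an array. Apply it to the extracted values before calling to_f, because nil.to_f is 0.0 and compact does not remove zeros: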
["a", nil, "b", nil, "c", nil].compact
#=> [ "a", "b", "c" ]
http://ruby-doc.org/core-2.1.3/Array.html#method-i-compact
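Putting both answers together, a minimal sketch of nil-aware extraction might look like this. It assumes each datapoint is a [value, timestamp] pair as in Graphite's JSON output; values_from is a hypothetical helper name, and the sample metrics are made up to show both the lagging and the all-nil case.

```ruby
# Pull out the values, drop the nils, and only then convert to Float.
def values_from(metric)
  metric['datapoints'].map { |value, _timestamp| value }.compact.map(&:to_f)
end

def average(values)
  values.sum / values.size
end

# Hypothetical sample data in the shape check-stats.rb receives.
lagging = { 'target' => 'servers.web1.load', 'datapoints' => [[10, 1000], [20, 1060], [nil, 1120]] }
empty   = { 'target' => 'servers.web2.load', 'datapoints' => [[nil, 1000], [nil, 1060]] }

values = values_from(lagging)
puts average(values) unless values.empty? # 15.0; counting nil as 0.0 would give 10.0
puts values_from(empty).empty?            # true, so "unless datapoints.empty?" skips it
```

In check-stats.rb, the body of values_from would replace the collect call in danger, so a metric whose datapoints are all nil reaches the datapoints.empty? check as an empty array.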

using a string or key-val pair as a method argument

Is there a better way to write this? Basically, I want to add an argument to a hash. If the argument is a key-value pair, then I'd like to add it as is. If the argument is a string, I'd like to add it as a key with a nil value. The code below works, but is there a more appropriate (simpler) way?
Second question: does calling each on an array with two block parameters |key, val| automatically convert the array to a hash, as it appears to?
@some_hash = {}
def some_method(input)
  if input.is_a? Hash
    input.each {|key, val| @some_hash[key] = val}
  else
    input.split(" ").each {|key, val| @some_hash[key] = val}
  end
end
some_method("key" => "val")
This gives the result described in the question, but it behaves differently from the OP's code (which means the OP's code does not do exactly what the question says, since it splits a string on spaces into several keys):
@some_hash = {}
def some_method(input)
  case input
  when Hash then @some_hash.merge!(input)
  when String then @some_hash[input] = nil
  end
end
some_method("foo" => "bar")
some_method("baz")
@some_hash # => {"foo" => "bar", "baz" => nil}
Second question
An array is never automatically converted to a hash. What you are probably referring to is that the elements of an inner array, as in [[:foo, :bar]], can be bound to separate block parameters:
[[:foo, :bar]].each { |f, b| puts f; puts b }
# prints:
# foo
# bar
That is due to destructuring (multiple) assignment. When a block with several parameters is yielded a single array, Ruby spreads the array's elements across the parameters, filling missing ones with nil and dropping extras. It is the same as:
f, b = [:foo, :bar]
f # => :foo
b # => :bar
Here, you don't get f # => [:foo, :bar] and b # => nil.
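The padding and dropping rules are easier to see with uneven arrays. This sketch collects what the block parameters receive; the variable names are arbitrary.

```ruby
# Each yielded element is an array; |f, b| destructures it. No Hash is built.
pairs = [[:foo, :bar], [:baz], [:a, :b, :c]]
seen = []
pairs.each { |f, b| seen << [f, b] }
p seen # [[:foo, :bar], [:baz, nil], [:a, :b]]

# Hash#each yields [key, value] arrays, which is why |key, val| looks
# the same for hashes and for arrays of pairs.
from_hash = []
{ foo: :bar }.each { |k, v| from_hash << [k, v] }
p from_hash # [[:foo, :bar]]

# Plain multiple assignment follows the same rules.
x, y, z = [1, 2]  # z is nil
m, n = [1, 2, 3]  # 3 is dropped
```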

How do I skip headers while writing CSV?

I am writing a CSV file, and CSV.dump outputs two header lines that I don't want.
I tried setting :write_headers => false, but it still outputs a header:
irb> A = Struct.new(:a, :b)
=> A
irb> a = A.new(1,2)
=> #<struct A a=1, b=2>
irb> require 'csv'
=> true
irb> puts CSV.dump [a], '', :write_headers => false, :headers=>false
class,A
a=,b=
1,2
I don't think you can do it with option parameters, but you can easily accomplish what you want by using CSV.generate instead of CSV.dump:
irb> arr = [a, a]
=> [#<struct A a=1, b=2>, #<struct A a=1, b=2>]
irb> csv_string = CSV.generate do |csv|
irb* arr.each {|a| csv << a}
irb> end
irb> puts csv_string
1,2
1,2
=> nil
I think the problem is two-fold:
CSV.dump [a]
wraps an instance of the struct a in an array, which CSV then tries to marshal. While that might be useful sometimes, when generating a CSV file for some other non-Ruby app that reads CSV, you're going to end up with values it can't use. Looking at the output, it isn't a plain CSV table:
class,A
a=,b=
1,2
Looking at it in IRB shows:
=> "class,A\na=,b=\n1,2\n"
which, again, isn't going to be accepted by something like a spreadsheet or database. So, another tactic is needed.
Removing the array around a doesn't help:
CSV.dump a
=> "class,Fixnum\n\n\n\n"
Taking a different approach, I looked at the standard way of generating CSV from an array:
puts a.to_a.to_csv
=> 1,2
An alternate way to create it is:
CSV.generate do |csv|
csv << a.to_a
end
=> "1,2\n"
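For several structs at once, the to_a approach extends naturally. This is a sketch with a hypothetical Pair struct standing in for the question's A; the optional header line shows that, when headers are wanted after all, Struct#members can supply them instead of CSV.dump's "class"/"a=,b=" lines.

```ruby
require 'csv'

# Hypothetical struct standing in for the question's A.
Pair = Struct.new(:a, :b)
rows = [Pair.new(1, 2), Pair.new(3, 4)]

# Struct#to_a returns the values in member order: one plain CSV line per struct.
csv_string = CSV.generate do |csv|
  rows.each { |row| csv << row.to_a }
end

# Only if a header line is actually wanted: take the names from the struct.
with_header = CSV.generate do |csv|
  csv << Pair.members.map(&:to_s)
  rows.each { |row| csv << row.to_a }
end

puts csv_string
```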
