E.g.:
def do_the_thing(file_to_load, hash_path)
file = File.read(file)
data = JSON.parse(file, { symbolize_names: true })
data[sections.to_sym]
end
do_the_thing(file_I_want, '[:foo][:bar][0]')
Tried a few methods but failed so far.
Thanks for any help in advance :)
Assuming you just missed the parameter names...
Let's assume our file is:
// test.json
{
"foo": {
"bar": ["foobar"]
}
}
Recommended solution
Does your param really need to be a string?
If your code can be more flexible and you can pass the arguments as plain Ruby objects, you can use the Hash#dig method:
require 'json'
def do_the_thing(file, *hash_path)
file = File.read(file)
data = JSON.parse(file, symbolize_names: true)
data.dig(*hash_path)
end
do_the_thing('test.json', :foo, :bar, 0)
You should get
"foobar"
It should work fine!
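As a side note, Hash#dig returns nil as soon as any key along the path is missing, so a wrong or partial path fails gracefully instead of raising:
do_the_thing('test.json', :foo, :missing, 0) # => nil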
Read the rest of the answer if that doesn't satisfy your question
Alternative solution (using the same argument)
If you REALLY need to pass that argument as a string, you can:
Parse the param so it adapts to the first solution. It won't be small or fancy code, but it will work:
require 'json'
BRACKET_REGEX = /(\[[^\[]*\])/.freeze
# Converts each literal string to its corresponding Ruby value
def treat_type(param)
# Remove the surrounding brackets from the string
# You could do this step directly on the regex if you want to
param = param[1..-2]
case param[0]
# Checks if it is a string
when '\''
param[1..-2]
# Checks if it is a symbol
when ':'
param[1..-1].to_sym
else
begin
Integer(param)
rescue ArgumentError
param
end
end
end
# Converts the param string into the argument list accepted by the 'dig' method
def string_to_args(param)
# Scan method will break the match results of the regex into an array
param.scan(BRACKET_REGEX).flatten.map { |match| treat_type(match) }
end
def do_the_thing(file, hash_path)
hash_path = string_to_args(hash_path)
file = File.read(file)
data = JSON.parse(file, symbolize_names: true)
data.dig(*hash_path)
end
so:
do_the_thing('test.json', '[:foo][:bar][0]')
returns
"foobar"
This solution, though, is open to bugs when the "hash_path" is not in the expected pattern, and handling those edge cases might make the code even longer.
Shortest solution (Not safe)
You can use the Kernel#eval method, which I EXTREMELY discourage using for security reasons; read the documentation and understand its dangers before using it.
require 'json'
def do_the_thing(file, hash_path)
file = File.read(file)
data = JSON.parse(file, symbolize_names: true)
eval("data#{hash_path}")
end
do_the_thing('test.json', '[:foo][:bar][0]')
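To illustrate the danger: the string is interpolated straight into Ruby code, so a malicious path would simply be executed (a contrived example, assuming hash_path can come from untrusted input):
do_the_thing('test.json', '[:foo]; system("echo arbitrary code executed")')
# eval ends up running: data[:foo]; system("echo arbitrary code executed")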
If all you were trying to do was extract the JSON data into an object, you might find yourself using either of the following approaches:
def do_the_thing(file_to_load)
file = File.read(file_to_load)
data = JSON.parse(file, { symbolize_names: true })
data
end
do_the_thing(file_I_want)[:foo][:bar][0]
or use the dig method of Hash:
def do_the_thing(file_to_load, sections)
file = File.read(file_to_load)
data = JSON.parse(file, { symbolize_names: true })
data.dig(*sections)
end
do_the_thing(file_I_want, [:foo, :bar, 0])
Related
I am writing a class that takes a CSV file, transforms it, and then writes the new data out.
require 'csv'
module Transformer
class Base
def initialize(file)
@file = file
end
def original_data(&block)
opts = { headers: true }
CSV.open(@file, 'rb', opts, &block)
end
def transformer
# complex manipulations here like modifying columns, picking only certain
# columns to put into new_data, etc but simplified to `+10` to keep
# example concise
->(row) { new_data << row['some_header'] + 10 }
end
def transformed_data
self.original_data(self.transformer)
end
def write_new_data
CSV.open('new_file.csv', 'wb', opts) do |new_data|
transformed_data
end
end
end
end
What I'd like to be able to do is:
Look at the transformed data without writing it out (so I can test that it transforms the data correctly, and I don't need to write it to file right away: maybe I want to do more manipulation before writing it out)
Don't slurp all the file at once, so it works no matter the size of the original data
Have this as a base class with an empty transformer so that instances only need to implement their own transformers but the behavior for reading and writing is given by the base class.
But obviously the above doesn't work because I don't really have a reference to new_data in transformer.
How could I achieve this elegantly?
I can recommend one of two approaches, depending on your needs and personal taste.
I have intentionally distilled the code to just its bare minimum (without your wrapping class), for clarity.
1. Simple read-modify-write loop
Since you do not want to slurp the file, use CSV.foreach. For example, for a quick debugging session, do:
CSV.foreach "source.csv", headers: true do |row|
row["name"] = row["name"].upcase
row["new column"] = "new value"
p row
end
And if you wish to write to file during that same iteration:
require 'csv'
csv_options = { headers: true }
# Open the target file for writing
CSV.open("target.csv", "wb") do |target|
# Add a header
target << %w[new header column names]
# Iterate over the source CSV rows
CSV.foreach "source.csv", **csv_options do |row|
# Mutate and add columns
row["name"] = row["name"].upcase
row["new column"] = "new value"
# Push the new row to the target file
target << row
end
end
2. Using CSV::Converters
There is built-in functionality that might be helpful - CSV::Converters - (see the :converters option in the CSV.new documentation)
require 'csv'
# Register a converter in the options hash
csv_options = { headers: true, converters: [:stripper] }
# Define a converter
CSV::Converters[:stripper] = lambda do |value, field|
value ? value.to_s.strip : value
end
CSV.open("target.csv", "wb") do |target|
# same as above
CSV.foreach "source.csv", **csv_options do |row|
# same as above - input data will already be converted
# you can do additional things here if needed
end
end
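As a side note, CSV also ships with predefined converters (:integer, :float, :numeric, :date, :date_time, :all) that can be mixed with your own in the same array:
csv_options = { headers: true, converters: [:numeric, :stripper] }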
3. Separate input and output from your converter classes
Based on your comment, and since you want to minimize I/O and iterations, perhaps extracting the read/write operations from the responsibility of the transformers might be of interest. Something like this.
require 'csv'
class NameCapitalizer
def self.call(row)
row["name"] = row["name"].upcase
end
end
class EmailRemover
def self.call(row)
row.delete 'email'
end
end
csv_options = { headers: true }
converters = [NameCapitalizer, EmailRemover]
CSV.open("target.csv", "wb") do |target|
CSV.foreach "source.csv", **csv_options do |row|
converters.each { |c| c.call row }
target << row
end
end
Note that the above code still does not handle the header, in case it was changed. You will probably have to keep a transformed row (after all transformations) and prepend its #headers to the output CSV.
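A minimal sketch of one way to handle it (reusing csv_options and converters from the snippet above, and assuming all transformed rows end up sharing the same headers, so the first row's #headers can serve as the output header):
CSV.open("target.csv", "wb") do |target|
  header_written = false
  CSV.foreach "source.csv", **csv_options do |row|
    converters.each { |c| c.call row }
    unless header_written
      target << row.headers # reflects any renamed or removed columns
      header_written = true
    end
    target << row
  end
end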
There are probably plenty other ways to do it, but the CSV class in Ruby does not have the cleanest interface, so I try to keep code that deals with it as simple as I can.
I am trying to read CSON (CoffeeScript Object Notation) into Ruby.
I am looking for something similar to data = JSON.parse(file) that one would use for JSON files.
file = File.read(filename)
data = CSON.parse(file) # does not exist - would like to have
I looked into invoking CoffeeScript and JavaScript from Ruby, but it feels overly complicated and like reinventing the wheel. Also, code in the data file should not be executed.
How can I read CSON into Ruby objects in a simple way?
This is what I came up with. It is sufficient for the data I am processing. The main work is done with the YAML parser Psych (https://github.com/ruby/psych). Arrays, hashes, and some of the multi-line text require special treatment.
require 'yaml'
module CSON
def load_file(fname)
load_string File.read fname
end
def remove_indent(data)
out = ""
data.each_line do |line|
out += line.sub /^\s\s/,""
end
out
end
def parse_array(data)
data.gsub! /\n/, ","
data.gsub! /([\[\{]),/, '\1'
data.gsub! /,([\]\}])/, '\1'
YAML.load data
end
def load_string(data)
hashed = {}
data.gsub! /^(\w+):\s+(\[.*?\])/mu do # find arrays
key = Regexp.last_match[1]
value = parse_array Regexp.last_match[2]
hashed[key] = value
""
end
data.gsub! /(\w+):\s+\'\'\'\s*\n(.*?)\'\'\'/mu do # find heredocs
hashed[Regexp.last_match[1]] = remove_indent Regexp.last_match[2]
""
end
hashed.merge YAML.load data
end
end
This solution is likely to fail when applied to more complicated .cson files. I would be happy to see if someone has a more elegant answer!
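For reference, a minimal usage sketch with made-up data, assuming the module above has been loaded (the methods are plain instance methods, so I include the module; adding module_function inside it would also let you call CSON.load_string directly):
include CSON

cson = <<-DATA
title: 'hello'
tags: ['ruby', 'cson']
body: '''
  first line
  second line
  '''
DATA

p load_string(cson)
# => {"tags"=>["ruby", "cson"], "body"=>"first line\nsecond line\n", "title"=>"hello"}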
I have been trying to use Minitest to test my code (full repo) but am having trouble with one method which downloads a SHA1 hash from a .txt file on a website and returns the value.
Method:
def download_remote_sha1
@log.info('Downloading Elasticsearch SHA1.')
@remote_sha1 = ''
Kernel.open(@verify_url) do |file|
@remote_sha1 = file.read
end
@remote_sha1 = @remote_sha1.split(/\s\s/)[0]
@remote_sha1
end
You can see that I log what is occurring to the command line, create an object to hold my SHA1 value, open the url (e.g. https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.deb.sha1.txt)
I then split the string so that I only have the SHA1 value.
The problem is that during a test, I want to stub the Kernel.open which uses OpenURI to open the URL. I would like to ensure that I'm not actually reaching out to download any file, but rather I'm just passing the block my own mock IO object testing just that it correctly splits stuff.
I attempted it like the block below, but when @remote_sha1 = file.read occurs, the file item is nil.
@mock_file = Minitest::Mock.new
@mock_file.expect(:read, 'd377e39343e5cc277104beee349e1578dc50f7f8 elasticsearch-1.4.2.deb')
Kernel.stub :open, @mock_file do
@downloader = ElasticsearchUpdate::Downloader.new(hash, true)
@downloader.download_remote_sha1.must_equal 'd377e39343e5cc277104beee349e1578dc50f7f8'
end
I was working on this question too, but matt figured it out first. To add to what matt posted:
When you write:
Kernel.stub(:open, @mock_file) do
# block code
end
...that means when Kernel.open() is called--in any code, anywhere before the stub() block ends--the return value of Kernel.open() will be @mock_file. However, you never use the return value of Kernel.open() in your code:
Kernel.open(@verify_url) do |f|
@remote_sha1 = f.read
end
If you wanted to use the return value of Kernel.open(), you would have to write:
return_val = Kernel.open(@verify_url) do |f|
@remote_sha1 = f.read
end
# do something with return_val
Therefore, the return value of Kernel.open() is irrelevant in your code--which means the second argument of stub() is irrelevant.
A careful examination of the source code for stub() reveals that stub() takes a third argument--an argument which will be passed to a block specified after the stubbed method call. You, in fact, have specified a block after your stubbed Kernel.open() method call:
stubbed method call      start of block
|                        |
V                        V
Kernel.open(@verify_url) do |f|
@remote_sha1 = f.read
end
^
|
end of block
So, in order to pass @mock_file to the block, you need to specify it as the third argument to Kernel.stub():
Kernel.stub(:open, 'irrelevant', @mock_file) do
end
Here is a full example for future searchers:
require 'minitest/autorun'
class Dog
def initialize
@verify_url = 'http://www.google.com'
end
def download_remote_sha1
@remote_sha1 = ''
Kernel.open(@verify_url) do |f|
@remote_sha1 = f.read
end
# puts @remote_sha1[0..300]
@remote_sha1 = @remote_sha1.split(" ")[0] # Using a single space for the split() pattern will split on contiguous whitespace.
end
end
# Dog.new.download_remote_sha1
describe 'downloaded file' do
it 'should be an sha1 code' do
@mock_file = Minitest::Mock.new
@mock_file.expect(:read, 'd377e39343e5cc277104beee349e1578dc50f7f8 elasticsearch-1.4.2.deb')
Kernel.stub(:open, 'irrelevant', @mock_file) do
@downloader = Dog.new
@downloader.download_remote_sha1.must_equal 'd377e39343e5cc277104beee349e1578dc50f7f8'
end
end
end
The second argument to stub is what you want the return value to be for the duration of your test, but the way Kernel.open is used here requires the value it yields to the block to be changed instead.
You can achieve this by providing a third argument. Try changing the call to Kernel.stub to
Kernel.stub :open, true, @mock_file do
# ...
Note the extra argument true, so that @mock_file is now the third argument and will be yielded to the block. The actual value of the second argument doesn't really matter in this case; you might want to use @mock_file there too, to more closely correspond to how open behaves.
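For example, this is equivalent for the test, just with the mock also used as the return value:
Kernel.stub :open, @mock_file, @mock_file do
  # ...
end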
Here's an extract of the code that I am using:
def retrieve(user_token, quote_id, check="quotes")
end_time = Time.now + 15
match = false
until Time.now > end_time || match
@response = http_request.get(quote_get_url(quote_id, user_token))
eval("match = !JSON.parse(@response.body)#{field(check)}.nil?")
end
match.eql?(false) ? nil : @response
end
private
def field (check)
hash = {"quotes" => '["quotes"][0]',
"transaction-items" => '["quotes"][0]["links"]["transactionItems"]'
}
hash[check]
end
I was informed that using eval in this manner is not good practice. Could anyone suggest a better way of dynamically checking the existence of a JSON node (field)? I want it to do:
pseudo: match = !JSON.parse(@response.body) + dynamic-path + .nil?
Store paths as arrays of path elements (['quotes', 0]). With a little helper function you'll be able to avoid eval. It is, indeed, completely inappropriate here.
Something along these lines:
class Hash
def deep_get(path)
path.reduce(self) do |memo, path_element|
return unless memo
memo[path_element]
end
end
end
path = ['quotes', 0]
hash = JSON.parse(response.body)
match = !hash.deep_get(path).nil?
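Plugged back into the original method, it might look like this (a sketch that keeps the field helper but has it return path arrays; http_request and quote_get_url are from the original code):
def retrieve(user_token, quote_id, check = "quotes")
  end_time = Time.now + 15
  match = false
  until Time.now > end_time || match
    @response = http_request.get(quote_get_url(quote_id, user_token))
    match = !JSON.parse(@response.body).deep_get(field(check)).nil?
  end
  match ? @response : nil
end

private

def field(check)
  {
    "quotes" => ["quotes", 0],
    "transaction-items" => ["quotes", 0, "links", "transactionItems"]
  }[check]
end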
I've tried and tried, but I can't make this less ugly/more ruby-like. It seems like there just must be a better way. Help me learn.
class Df
attr_accessor :thresh
attr_reader :dfo
def initialize
@dfo = []
@df = '/opt/TWWfsw/bin/gdf'
case RUBY_PLATFORM
when /hpux/i
@fstyp = 'vxfs'
when /solaris/i
# fix: need /tmp too
@fstyp = 'ufs'
when /linux/i
@df = '/bin/df'
@fstyp = 'ext3'
end
@dfo = parsedf
end
def parsedf
ldf = []
[" "," -i"] .each do |arg|
fields = %w{device size used avail capp mount}
fields = %w{device inodes inodesused inodesavail iusep mount} if arg == ' -i'
ldf.push %x{#{@df} -P -t #{@fstyp}#{arg}}.split(/\n/)[1..-1].collect{|line| Hash[*fields.zip(line.split).flatten]}
end
out = []
# surely there must be an easier way
ldf[0].each do |x|
ldf[1].select { |y|
if y['device'] == x['device']
out.push x.merge(y)
end
}
end
out
end
end
On my machine, your ldf array after the df calls yields the following:
irb(main):011:0> ldf
=> [[{"device"=>"/dev/sda5", "size"=>"49399372", "mount"=>"/", "avail"=>"22728988", "used"=>"24161036", "capp"=>"52%"}], [{"device"=>"/dev/sda5", "inodes"=>"3137536", "mount"=>"/", "iusep"=>"13%", "inodesavail"=>"2752040", "inodesused"=>"385496"}]]
The most flexible approach to merging such a structure is probably something along these lines:
irb(main):013:0> ldf.flatten.inject {|a,b| a.merge(b)}
=> {"device"=>"/dev/sda5", "inodes"=>"3137536", "size"=>"49399372", "mount"=>"/", "avail"=>"22728988", "inodesavail"=>"2752040", "iusep"=>"13%", "used"=>"24161036", "capp"=>"52%", "inodesused"=>"385496"}
Some ruby programmers frown on this use of inject, but I like it, so your mileage may vary.
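The same merge can also be written with the shorthand symbol form, if you prefer (identical behavior):
ldf.flatten.reduce(:merge)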
As for making your code more Ruby-like, I suggest you go over it with an experienced Rubyist you know, who can help you rewrite it in a way that follows good style and best practices. That would probably be preferable to just having someone rewrite it for you here.
Best of Luck!
Didn't test the code, but here goes:
ARGUMENTS = {
" " => %w{size used avail capp mount},
" -i" => %w{inodes inodesused inodesavail iusep mount}
}
def parsedf
# Store resulting info in a hash:
device_info = Hash.new do |h, dev|
h[dev] = {} # Each value will be an empty hash by default
end
ARGUMENTS.each do |arg, fields|
%x{#{@df} -P -t #{@fstyp}#{arg}}.split(/\n/)[1..-1].each do |line|
device, *data = line.split
device_info[device].merge! Hash[fields.zip(data)]
end
end
device_info
end
Notes: returns something a bit different than what you had:
{ "/dev/sda5" => {"inodes" => "...", ...},
"other device" => {...}
}
Also, I'm assuming Ruby 1.8.7 or better for Hash[key_value_pairs], otherwise you can resort to the Hash[*key_value_pairs.flatten] form you had
Depending on your needs, you should consider switching the fields from strings to symbols; they are the best type of keys.
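For example, a hypothetical variation of the ARGUMENTS hash above with symbol field names:
ARGUMENTS = {
  " "   => [:size, :used, :avail, :capp, :mount],
  " -i" => [:inodes, :inodesused, :inodesavail, :iusep, :mount]
}
# device_info entries would then look like:
# { "/dev/sda5" => { :size => "49399372", :used => "24161036", ... } }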