Objectify Ruby Hashes from/to JSON API

I just released a Ruby gem to consume a JSON-over-HTTP API:
https://github.com/solyaris/blomming_api
My naive Ruby code just converts the complex/nested JSON data structures returned by the API endpoints (json_data) into Ruby hashes (hash_data), in a flat one-to-one translation (JSON to Ruby hash and vice versa). That's fine, but...
I would like a more high-level programming interface.
Maybe instantiating a Resource class for every endpoint, but I'm confused about what a smart implementation would look like.
Let me explain with some abstract code.
Say I receive a complex/nested JSON document from an API,
usually an array of hashes, recursively nested as below (an imaginary example):
json_data = '[{
  "commute": {
    "minutes": 0,
    "startTime": "Wed May 06 22:14:12 EDT 2014",
    "locations": [
      {
        "latitude": "40.4220061",
        "longitude": "40.4220061"
      },
      {
        "latitude": "40.4989909",
        "longitude": "40.48989805"
      },
      {
        "latitude": "40.4111169",
        "longitude": "40.42222869"
      }
    ]
  }
},
{
  "commute": {
    "minutes": 2,
    "startTime": "Wed May 28 20:14:12 EDT 2014",
    "locations": [
      {
        "latitude": "43.4220063",
        "longitude": "43.4220063"
      }
    ]
  }
}]'
At the moment, when I receive a similar JSON document from an API, I just do:
# from JSON to hash
hash_data = JSON.load json_data
# and to assign values:
coords = hash_data.first["commute"]["locations"].last
coords["longitude"] = "40.00" # was "40.4111169"
coords["latitude"] = "41.00" # was "40.42222869"
That's OK, but the syntax is awful and confusing.
Instead, I would probably enjoy something like:
# create object Resource from hash
res = Resource.create( hash_data )

# ... some processing

# assign "nested" variables (longitude, latitude) of object res
coords = res.first.commute.locations.last
coords.longitude = "40.00" # was "40.42222869"
coords.latitude = "41.00" # was "40.4111169"

# ... some processing

# convert the modified object res into a hash again:
modified_hash = res.save

# and at last, probably, convert back to JSON:
modified_json = JSON.dump modified_hash
I read some interesting posts:
http://pullmonkey.com/2008/01/06/convert-a-ruby-hash-into-a-class-object/
http://www.goodercode.com/wp/convert-your-hash-keys-to-object-properties-in-ruby/
and, copying Kerry Wilson's code, I sketched the implementation below:
class Resource
  def self.create(hash)
    new(hash)
  end

  def initialize(hash)
    hash.to_obj
  end

  def save
    # or to_hash()
    # todo! HELP! (see later)
  end
end

class ::Hash
  # add keys to hash
  def to_obj
    self.each do |k, v|
      v.to_obj if v.kind_of? Hash
      v.to_obj if v.kind_of? Array
      k = k.gsub(/\.|\s|-|\/|\'/, '_').downcase.to_sym
      ## create and initialize an instance variable for this key/value pair
      self.instance_variable_set("@#{k}", v)
      ## create the getter that returns the instance variable
      self.class.send(:define_method, k, proc { self.instance_variable_get("@#{k}") })
      ## create the setter that sets the instance variable
      self.class.send(:define_method, "#{k}=", proc { |v| self.instance_variable_set("@#{k}", v) })
    end
    return self
  end
end

class ::Array
  def to_obj
    self.map { |v| v.to_obj }
  end
end
BTW, I studied the ActiveResource project a bit (part of Rails, if I understand correctly).
ARes could be great for my scope, but the problem is that ARes makes rather strict assumptions about fully RESTful APIs,
and in my case the server API is not completely RESTful in the way ARes expects.
All in all, I would have to do a lot of work to subclass/modify ARes behaviours,
so for the moment I have discarded the idea of using ActiveResource.
QUESTIONS:
1. Could someone help me realize the save() method in the above code (I'm really bad with recursive methods... :-( )?
2. Does a gem exist that does the hash_to_object() and object_to_hash() translation sketched above?
3. What do you think about this "automatic" objectifying of an "arbitrary" hash coming from a JSON-over-HTTP API?
I mean: I see the great pro that I do not need to hard-wire data structures client-side, which keeps me flexible to possible server-side variations.
But on the other hand, such automatic objectifying could have the side effect of opening security issues, like malicious JSON injection (over a possibly untrusted network...).
What do you think about all this? Any suggestion is welcome!
Sorry for my long post and my Ruby metaprogramming hazards :-)
giorgio
UPDATE 2: I'm still interested in reading opinions about question point 3:
Pros/cons of creating a Resource class for every received JSON document
Pros/cons of static (preemptive attributes) vs. automatic/dynamic nested objects
UPDATE 1: a long reply to Simone:
Thanks, you are right: Mash has a sweet .to_hash() method:
require 'json'
require 'hashie'

json_data = '{
  "commute": {
    "minutes": 0,
    "startTime": "Wed May 06 22:14:12 EDT 2014",
    "locations": [
      {
        "latitude": "40.4220061",
        "longitude": "40.4220061"
      },
      {
        "latitude": "40.4989909",
        "longitude": "40.48989805"
      },
      {
        "latitude": "40.4111169",
        "longitude": "40.42222869"
      }
    ]
  }
}'

# convert JSON to hash
hash = JSON.load json_data
puts hash

res = Hashie::Mash.new hash

# assign "nested" variables (longitude, latitude) of object res
coords = res.commute.locations.last
coords.longitude = "40.00" # was "40.42222869"
coords.latitude = "41.00" # was "40.4111169"

puts; puts "longitude: #{res.commute.locations.last.longitude}"
puts "latitude: #{res.commute.locations.last.latitude}"

modified_hash = res.to_hash
puts; puts modified_hash

This feature is provided by a few gems. One of the best known is Hashie, specifically the Hashie::Mash class.
Mash is an extended Hash that gives simple pseudo-object functionality that can be built from hashes and easily extended. It is designed to be used in RESTful API libraries to provide easy object-like access to JSON and XML parsed hashes.
Mash also supports multi-level objects.

Depending on your needs and level of nesting, you may get away with an OpenStruct.
I was working with a simple test stub; Hashie would have worked well, but it was a bigger tool than I needed (and added a dependency).
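For example, a minimal sketch using the json_data from the question above: the stdlib JSON parser can build nested OpenStructs directly via its object_class option.

require 'json'
require 'ostruct'

# Parse straight into nested OpenStructs instead of Hashes.
res = JSON.parse(json_data, object_class: OpenStruct)

res.first.commute.locations.last.longitude # => "40.42222869"
res.first.commute.minutes                  # => 0

One caveat: OpenStruct#to_h is shallow, so converting deeply nested data back into plain hashes (the save step above) takes extra work, whereas Hashie::Mash#to_hash handles that for you.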

Related

Parsing JSON with multiple pages in Ruby

I understand how to parse JSON, but I don’t understand how to parse it if it contains links to other pages.
I would be grateful for your help!
api.example.com/v0/accounts
On the first request for a JSON file, we get:
{
  "response": "OK",
  "nfts": [
    {
      "token_id": "35507806371588763669298464310896317145981867843055556101069010709538683224114"
    }
  ],
  "total": null,
  "continuation": "1634866413000"
}
There is a field, continuation, which is a link to the next request, and so it repeats many more times.
On the next request, the URL changes to api.example.com/v0/accounts&continuation=1634866413000
My code now looks like this:
class Source
  include Mongoid::Document
  include Mongoid::Timestamps
  require 'json'

  after_save :add_items

  def add_items
    json = HTTParty.get("https://api.example.com/v0/accounts")
    json.dig('nfts')
    load_items_ethereum.each do |item|
      Item.create!(
        :token_id => item['token_id'],
      )
    end
  end
end
Low-level HTTP clients like HTTParty typically don't handle iteration. You'll need to do it yourself, using a loop until there's no continuation field, e.g.:
begin
  continuation_param = "?continuation=#{continuation_id}" if continuation_id
  json = HTTParty.get("https://api.example.com/v0/accounts#{continuation_param}")
  continuation_id = json.dig('continuation')
  # process latest payload, append it to a running list, etc.
end while continuation_id
(And for production, best practice would be to keep a counter so you can bail after N iterations, to avoid an infinite loop.)
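A slightly fuller sketch along those lines (MAX_PAGES, all_nfts, and the URL handling are illustrative, not from the original code):

require 'httparty'

MAX_PAGES = 100 # safety cap so a misbehaving API can't loop us forever
all_nfts = []
continuation_id = nil

MAX_PAGES.times do
  url = "https://api.example.com/v0/accounts"
  url += "?continuation=#{continuation_id}" if continuation_id
  json = HTTParty.get(url)
  all_nfts.concat(json['nfts'] || []) # collect this page's records
  continuation_id = json['continuation']
  break unless continuation_id        # no continuation => last page
end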

How can I process huge JSON files as streams in Ruby, without consuming all memory?

I'm having trouble processing a huge JSON file in Ruby. What I'm looking for is a way to process it entry-by-entry without keeping too much data in memory.
I thought the yajl-ruby gem would do the work, but it consumes all my memory. I've also looked at the Yajl::FFI and JSON::Stream gems, but they clearly state:
For larger documents we can use an IO object to stream it into the
parser. We still need room for the parsed object, but the document
itself is never fully read into memory.
Here's what I've done with Yajl:
file_stream = File.open(file, "r")
json = Yajl::Parser.parse(file_stream)
json.each do |entry|
  entry.do_something
end
file_stream.close
The memory usage keeps getting higher until the process is killed.
I don't see why Yajl keeps processed entries in memory. Can I somehow free them, or did I just misunderstand the capabilities of the Yajl parser?
If it cannot be done using Yajl: is there a way to do this in Ruby via any library?
Problem
json = Yajl::Parser.parse(file_stream)
When you invoke Yajl::Parser like this, the entire stream is loaded into memory to create your data structure. Don't do that.
Solution
Yajl provides Parser#parse_chunk, Parser#on_parse_complete, and other related methods that enable you to trigger parsing events on a stream without requiring that the whole IO stream be parsed at once. The README contains an example of how to use chunking instead.
The example given in the README is:
Or lets say you didn't have access to the IO object that contained JSON data, but instead only had access to chunks of it at a time. No problem!
(Assume we're in an EventMachine::Connection instance)
def post_init
  @parser = Yajl::Parser.new(:symbolize_keys => true)
end

def object_parsed(obj)
  puts "Sometimes one pays most for the things one gets for nothing. - Albert Einstein"
  puts obj.inspect
end

def connection_completed
  # once a full JSON object has been parsed from the stream
  # object_parsed will be called, and passed the constructed object
  @parser.on_parse_complete = method(:object_parsed)
end

def receive_data(data)
  # continue passing chunks
  @parser << data
end
Or if you don't need to stream it, it'll just return the built object from the parse when it's done. NOTE: if there are going to be multiple JSON strings in the input, you must specify a block or callback as this is how yajl-ruby will hand you (the caller) each object as it's parsed off the input.
obj = Yajl::Parser.parse(str_or_io)
One way or another, you have to parse only a subset of your JSON data at a time. Otherwise, you are simply instantiating a giant Hash in memory, which is exactly the behavior you describe.
Without knowing what your data looks like and how your JSON objects are composed, it isn't possible to give a more detailed explanation than that; as a result, your mileage may vary. However, this should at least get you pointed in the right direction.
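For instance, outside EventMachine, a minimal sketch of the same callback approach (the file name and chunk size are illustrative) looks roughly like this. Note that on_parse_complete fires once per complete top-level JSON document, so this helps most when the input is a stream of records rather than one giant array:

require 'yajl'

parser = Yajl::Parser.new
parser.on_parse_complete = lambda do |obj|
  # handle one parsed document at a time instead of holding them all
  puts obj.inspect
end

File.open('huge.json', 'r') do |f|
  while (chunk = f.read(8192)) # feed the parser fixed-size chunks
    parser << chunk
  end
end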
Both @CodeGnome's and @A. Rager's answers helped me understand the solution.
I ended up creating the gem json-streamer, which offers a generic approach and spares the need to manually define callbacks for every scenario.
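For reference, a minimal sketch of how json-streamer is typically used, going by its README (the file name, chunk size, and nesting level are illustrative): nesting_level: 1 yields each object one level below the root, so a huge top-level array or object is never materialized as a whole.

require 'json/streamer'

File.open('huge.json', 'r') do |file|
  streamer = Json::Streamer.parser(file_io: file, chunk_size: 1024)
  streamer.get(nesting_level: 1) do |entry|
    p entry # one entry at a time; no callbacks to wire up manually
  end
end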
Your solutions seem to be json-stream and yajl-ffi. There's an example in both that's pretty similar (they're from the same guy):
def post_init
  @parser = Yajl::FFI::Parser.new
  @parser.start_document { puts "start document" }
  @parser.end_document   { puts "end document" }
  @parser.start_object   { puts "start object" }
  @parser.end_object     { puts "end object" }
  @parser.start_array    { puts "start array" }
  @parser.end_array      { puts "end array" }
  @parser.key   { |k| puts "key: #{k}" }
  @parser.value { |v| puts "value: #{v}" }
end

def receive_data(data)
  begin
    @parser << data
  rescue Yajl::FFI::ParserError => e
    close_connection
  end
end
There, he sets up callbacks for the possible data events that the stream parser can encounter.
Given a JSON document that looks like:
{
  1: {
    name: "fred",
    color: "red",
    dead: true,
  },
  2: {
    name: "tony",
    color: "six",
    dead: true,
  },
  ...
  n: {
    name: "erik",
    color: "black",
    dead: false,
  },
}
One could stream parse it with yajl-ffi something like this:
def parse_dudes file_io, chunk_size
  parser = Yajl::FFI::Parser.new
  object_nesting_level = 0
  current_row = {}
  current_key = nil

  parser.start_object { object_nesting_level += 1 }
  parser.end_object do
    if object_nesting_level.eql? 2
      yield current_row # here, we yield the fully collected record to the passed block
      current_row = {}
    end
    object_nesting_level -= 1
  end

  parser.key do |k|
    if object_nesting_level.eql? 2
      current_key = k
    elsif object_nesting_level.eql? 1
      current_row["id"] = k
    end
  end

  parser.value { |v| current_row[current_key] = v }

  file_io.each(chunk_size) { |chunk| parser << chunk }
end

File.open('dudes.json') do |f|
  parse_dudes f, 1024 do |dude|
    pp dude
  end
end

Ruby Best Practice For Storing Collection of Procs

I am writing regression tests for my app using the class Page. Each page has a nav_to method that needs to be set with a proc when the instance is initialized.
I currently have a list of some 40 procs in the global scope, which seems sloppy to me. What would be the best practice for storing these procs? Should I store them in a module? Hash? Class? Please help!
Consider storing them in a module (or class) constant so that they can be grouped and named clearly. The data structure you choose (array vs. hash) depends mostly on your desired interface (are they associated with some key, or simply ordered?) and on performance concerns, if relevant:
module MyTests # ...or "class"
  NAV_TO_PROCS = [
    Proc.new { ... },
    Proc.new { ... },
  ]

  # ... or ...

  NAV_TO_BY_PAGE_NAME = {
    "page1" => Proc.new { ... },
    "page2" => Proc.new { ... },
  }
end
As an aside, when using module constants as such I like to "freeze" them to avoid accidental mutation during use (e.g. NAV_TO_PROCS = [...].freeze).
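A hypothetical usage sketch (the keyword-argument Page initializer is assumed here, since the question doesn't show one):

page = Page.new(nav_to: MyTests::NAV_TO_BY_PAGE_NAME.fetch("page1"))
page.nav_to.call # navigate using the stored proc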

Rails' strong_parameters not marking an Array's Hashes as Permitted

I've got a bit of a puzzler here for strong_parameters.
I'm posting a large array of JSON to be processed and added as relational models to a central model. It looks something like this:
{
  "buncha_data": {
    "foo_data": [
      { "bar": 1, "baz": 3 },
      ...
    ]
  },
  ...
}
And I've got a require/permit flow that looks like it should work:
class TheController < ApplicationController
  def create
    mymodel = MyModel.create import_params
  end

  def import_params
    params.require(:different_property)
    params.require(:buncha_data).permit(foo_data: [:bar, :baz])
    params
  end
end
Yet in the create method, when I iterate through this data to create the related models:
self.relatables = posted_data['buncha_data']['foo_data'].map do |raw|
  RelatedModel.new raw
end
I get an ActiveModel::ForbiddenAttributesError. What I've ended up having to do is iterate through the array on my own and call permit on each hash in the array, like so:
params.required(:buncha_data).each do |_, list|
  list.each { |row| row.permit [:bar, :baz] }
end
What gives?
As MikeJ pointed out, require and permit do not update the params object in place; they return new objects.
I rewrote my controller to be:
def import_params
params[:different_property] = params.require(:different_property)
params[:buncha_data] = params.require(:buncha_data).permit(foo_data: [:bar, :baz])
params
end
And everything worked great. This is somewhat apparent if you read the source code.
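A quick way to see the non-mutating behavior (a sketch; permitted? is the standard ActionController::Parameters predicate):

permitted = params.require(:buncha_data).permit(foo_data: [:bar, :baz])
permitted.permitted?            # => true (the returned object is permitted)
params[:buncha_data].permitted? # => false until you assign it back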

How do I deserialize YAML documents from external sources and have full access on class members?

In Ruby, any object can be transferred, i.e. serialized, to a YAML document by saving the output of its to_yaml method to a file. Afterwards, this YAML file can be read again, i.e. deserialized, using the YAML::load method, and one has full access to all members of the underlying class/object.
All of this holds as long as I'm using Ruby as the single platform. Once I serialize objects in Java and deserialize them under Ruby, I cannot access the object any more because of a NoMethodError exception. This is due to the way objects/local data types are named under different systems.
Given a Ruby class "Car":
# A simple class describing a car
#
class Car
  attr :brand, :horsepower, :color, :extra_equipment

  def initialize(brand, horsepower, color, extra_equipment)
    @brand = brand
    @horsepower = horsepower
    @color = color
    @extra_equipment = extra_equipment
  end
end
Creating a simple instance:
# creating new instance of class 'Car' ...
porsche = Car.new("Porsche", 180, "red", ["sun roof", "air conditioning"])
Calling porsche.to_yaml results in the following output:
--- !ruby/object:Car
brand: Porsche
color: red
extra_equipment:
- sun roof
- air conditioning
horsepower: 180
I test deserialization by loading the YAML output:
# reading existing yaml file from file system
sample_car = YAML::load(File.open("sample.yaml"))
puts sample_car.brand # returns "Porsche"
This works as expected. But now let's assume the YAML document was produced by a different system and lacks any reference to Ruby, while still having a YAML-conform object description: "!Car" instead of "!ruby/object:Car":
--- !Car
brand: Porsche
color: red
extra_equipment:
- sun roof
- air conditioning
horsepower: 180
This code:
# reading existing yaml file from file system
sample_car = YAML::load(File.open("sample.yaml"))
puts sample_car.brand # returns "Porsche"
returns this exception:
/path/yaml_to_object_converter.rb.rb:27:in `<main>':
undefined method `brand' for #<YAML::DomainType:0x9752bec> (NoMethodError)
Is there a way to deal with objects defined in "external" YAML documents?
For me, sample_car in the IRB shell evaluates to:
=> #<Syck::DomainType:0x234df80 @domain="yaml.org,2002", @type_id="Car", @value={"brand"=>"Porsche", "color"=>"red", "extra_equipment"=>["sun roof", "air conditioning"], "horsepower"=>180}>
Then I issued sample_car.value:
=> {"brand"=>"Porsche", "color"=>"red", "extra_equipment"=>["sun roof", "air conditioning"], "horsepower"=>180}
Which is a Hash. This means you can construct your Car object by adding a class method to Car, like so:
def self.from_hash(h)
  Car.new(h["brand"], h["horsepower"], h["color"], h["extra_equipment"])
end
Then I tried it:
porsche_clone = Car.from_hash(sample_car.value)
Which returned:
=> #<Car:0x236eef0 @brand="Porsche", @horsepower=180, @color="red", @extra_equipment=["sun roof", "air conditioning"]>
That's the ugliest way of doing it. There might be others. =)
EDIT (19-May-2011): BTW, I just figured out a much easier way:
def from_hash(o, h)
  # note: assumes every key has a public setter (e.g. via attr_accessor)
  h.each { |k, v| o.send((k + "=").to_sym, v) }
  o
end
For this to work in your case, your constructor must not require parameters, and the attributes need public setters (the Car class above only declares readers via attr). Then you can simply do:
foreign_car = from_hash(Car.new, YAML::load(File.open("foreign_car.yaml")).value)
puts foreign_car.inspect
...which gives you:
#<Car:0x2394b70 @brand="Porsche", @color="red", @extra_equipment=["sun roof", "air conditioning"], @horsepower=180>
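As a further aside, on modern Rubies (where YAML is backed by Psych rather than Syck) you may be able to skip the hash juggling entirely by registering the foreign tag; a sketch, assuming Psych.add_tag behaves as documented:

require 'yaml'

# Map the foreign "!Car" tag onto the Car class; Psych then revives the
# mapping's keys as instance variables (@brand, @color, ...).
Psych.add_tag('!Car', Car)

car = YAML.unsafe_load(File.read('sample.yaml')) # plain YAML.load on Psych < 4
puts car.brand # => "Porsche"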
