Parsing JSON with multiple pages in Ruby

I understand how to parse JSON, but I don’t understand how to parse it if it contains links to other pages.
I would be grateful for your help!
The first request, to api.example.com/v0/accounts, returns this JSON:
{
"response": "OK",
"nfts": [
{
"token_id": "35507806371588763669298464310896317145981867843055556101069010709538683224114"
}
],
"total": null,
"continuation": "1634866413000"
}
The response contains a continuation field, which identifies the next request; this repeats for many more pages.
On the next request, the link changes to api.example.com/v0/accounts&continuation=1634866413000
My code now looks like this:
class Source
include Mongoid::Document
include Mongoid::Timestamps
require 'json'
after_save :add_items
def add_items
json= HTTParty.get("https://api.example.com/v0/accounts")
json.dig('nfts')
load_items_ethereum.each do |item|
Item.create!(
:token_id => item['token_id'],
)
end
end
end

Low-level HTTP clients like HTTParty typically don't handle iteration. You'll need to do it yourself, using a loop until there's no continuation field, e.g.:
continuation_id = nil # must exist before the loop; no continuation on the first request
loop do
  query = continuation_id ? { continuation: continuation_id } : {}
  json = HTTParty.get("https://api.example.com/v0/accounts", query: query)
  continuation_id = json['continuation']
  # process latest payload, append it to a running list, etc.
  break unless continuation_id
end
(And for production, best practice would be to keep a counter so you can bail after N iterations, to avoid an infinite loop.)
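The loop with a page cap could be sketched roughly as follows. This is only an illustration: fetch_all_nfts, fetch_page, and max_pages are hypothetical names, and the stubbed pages stand in for the real API so the snippet runs without a network. In real code, fetch_page would wrap the HTTParty.get call.

```ruby
# Hypothetical helper: collects the "nfts" entries from every page.
# fetch_page is any callable that takes the continuation token (nil for the
# first request) and returns the parsed response as a Hash.
def fetch_all_nfts(fetch_page, max_pages: 100)
  nfts = []
  continuation = nil
  max_pages.times do
    page = fetch_page.call(continuation)
    nfts.concat(page['nfts'] || [])
    continuation = page['continuation']
    return nfts if continuation.nil? # no more pages
  end
  # Safety valve: the API kept returning a continuation token.
  raise "gave up after #{max_pages} pages"
end

# Stubbed responses (assumed shape, modelled on the JSON in the question).
pages = {
  nil => { 'nfts' => [{ 'token_id' => '1' }], 'continuation' => '1634866413000' },
  '1634866413000' => { 'nfts' => [{ 'token_id' => '2' }], 'continuation' => nil }
}
fake_fetch = ->(continuation) { pages.fetch(continuation) }

all_nfts = fetch_all_nfts(fake_fetch)
all_nfts.each { |item| puts item['token_id'] }
```

Passing the fetcher in as a callable keeps the paging logic separate from the HTTP client, which also makes it easy to test.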

Related

Savon - shortcut for prepending every element with namespace? (SOAP)

I'm currently building a SOAP client in Ruby, using Savon, but when I write the code to generate the XML, I'm having to write:
builder = Builder::XmlMarkup.new
builder.ns :request do
builder.ns :Foo do
builder.ns :FooBar do
builder.ns :Bar, "Foo"
end
end
end
to generate
<ns:request>
<ns:Foo>
<ns:FooBar>
<ns:Bar>Foo</ns:Bar>
</ns:FooBar>
</ns:Foo>
</ns:request>
Which is obviously quite repetitive, and I'd like to cut out the NS repetitions if possible. I've also noticed that without the ns, I'm allowed to use curly brackets, rather than dos/ends.
Is there any way around this? I don't like not having the ability to use curly brackets, and when I add dynamic input of element names later, it could make things complicated.
I tried
def send_builder(requestsym, data=nil)
@@builder requestsym, data
end
But my knowledge of blocks/procs/lambdas isn't good enough to make that work with nested elements.
If I understand correctly, you want every element in your request XML to be prefixed with a namespace.
With Savon you can set the namespace prefix for the request when you create the client:
client = Savon.client(
  wsdl: "http://www.webserviceurl.net/service.asmx?WSDL",
  namespace_identifier: :ns
)
response = client.call(:yourOperationHere, message: { request: { foo: { foo_bar: { bar: "Foo" } } } })
The result would be (the body of your request):
<ns:request>
<ns:foo>
<ns:fooBar>
<ns:bar>
Foo
</ns:bar>
</ns:fooBar>
</ns:foo>
</ns:request>
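One note: Savon builds the XML with Gyoku, which by default converts Symbol keys to lowerCamelCase (foo_bar becomes fooBar). If your service is case-sensitive and expects different casing, such as Foo or FooBar, use String keys instead of Symbols; String keys are passed through unchanged.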
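To make the key-conversion rule concrete, here is a small stand-alone sketch that imitates it. It is not Savon's or Gyoku's code: lower_camelcase and ns_xml are hypothetical helpers written for illustration, and the sketch does no XML escaping.

```ruby
# Hypothetical helper: :foo_bar -> "fooBar" (imitates the default Symbol conversion).
def lower_camelcase(sym)
  head, *tail = sym.to_s.split('_')
  head + tail.map(&:capitalize).join
end

# Hypothetical helper: turns a nested Hash into prefixed XML.
# Symbol keys are converted, String keys are used exactly as written.
def ns_xml(data, prefix)
  data.map do |key, value|
    name = key.is_a?(Symbol) ? lower_camelcase(key) : key
    body = value.is_a?(Hash) ? ns_xml(value, prefix) : value.to_s
    "<#{prefix}:#{name}>#{body}</#{prefix}:#{name}>"
  end.join
end

xml = ns_xml({ request: { 'Foo' => { foo_bar: { 'Bar' => 'Foo' } } } }, 'ns')
puts xml
```

Here the String keys 'Foo' and 'Bar' keep their capital letters, while the Symbol key :foo_bar is rendered as fooBar.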

How can I process huge JSON files as streams in Ruby, without consuming all memory?

I'm having trouble processing a huge JSON file in Ruby. What I'm looking for is a way to process it entry-by-entry without keeping too much data in memory.
I thought that the yajl-ruby gem would do the work, but it consumes all my memory. I've also looked at the Yajl::FFI and JSON::Stream gems, but their documentation clearly states:
For larger documents we can use an IO object to stream it into the
parser. We still need room for the parsed object, but the document
itself is never fully read into memory.
Here's what I've done with Yajl:
file_stream = File.open(file, "r")
json = Yajl::Parser.parse(file_stream)
json.each do |entry|
entry.do_something
end
file_stream.close
The memory usage keeps getting higher until the process is killed.
I don't see why Yajl keeps processed entries in memory. Can I somehow free them, or did I just misunderstand the capabilities of the Yajl parser?
If it cannot be done using Yajl: is there a way to do this in Ruby via any library?
Problem
json = Yajl::Parser.parse(file_stream)
When you invoke Yajl::Parser like this, the entire stream is loaded into memory to create your data structure. Don't do that.
Solution
Yajl provides Parser#parse_chunk, Parser#on_parse_complete, and other related methods that enable you to trigger parsing events on a stream without requiring that the whole IO stream be parsed at once. The README contains an example of how to use chunking instead.
The example given in the README is:
Or lets say you didn't have access to the IO object that contained JSON data, but instead only had access to chunks of it at a time. No problem!
(Assume we're in an EventMachine::Connection instance)
def post_init
  @parser = Yajl::Parser.new(:symbolize_keys => true)
end

def object_parsed(obj)
  puts "Sometimes one pays most for the things one gets for nothing. - Albert Einstein"
  puts obj.inspect
end

def connection_completed
  # once a full JSON object has been parsed from the stream
  # object_parsed will be called, and passed the constructed object
  @parser.on_parse_complete = method(:object_parsed)
end

def receive_data(data)
  # continue passing chunks
  @parser << data
end
Or if you don't need to stream it, it'll just return the built object from the parse when it's done. NOTE: if there are going to be multiple JSON strings in the input, you must specify a block or callback as this is how yajl-ruby will hand you (the caller) each object as it's parsed off the input.
obj = Yajl::Parser.parse(str_or_io)
One way or another, you have to parse only a subset of your JSON data at a time. Otherwise, you are simply instantiating a giant Hash in memory, which is exactly the behavior you describe.
Without knowing what your data looks like and how your JSON objects are composed, it isn't possible to give a more detailed explanation than that; as a result, your mileage may vary. However, this should at least get you pointed in the right direction.
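If you control how the file is produced, one way to guarantee that only a subset is parsed at a time is to store one JSON document per line (newline-delimited JSON) and parse line by line with the standard library. This is a different technique from Yajl's chunked parsing, shown here only as a sketch; each_json_line is a hypothetical helper and the StringIO stands in for a File.

```ruby
require 'json'
require 'stringio'

# Hypothetical helper: yields one parsed entry at a time, so only the
# current line's object is held in memory.
def each_json_line(io)
  io.each_line do |line|
    line = line.strip
    next if line.empty? # tolerate blank lines
    yield JSON.parse(line)
  end
end

# Stand-in for File.open('entries.ndjson'); the data is made up.
io = StringIO.new(%({"id":1,"name":"fred"}\n\n{"id":2,"name":"tony"}\n))

names = []
each_json_line(io) { |entry| names << entry['name'] }
puts names.inspect
```

With a real file, File.open(path) { |f| each_json_line(f) { |entry| ... } } would read it one line at a time.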
Both @CodeGnome's and @A. Rager's answers helped me understand the solution.
I ended up creating the gem json-streamer, which offers a generic approach and spares the need to manually define callbacks for every scenario.
Your solutions seem to be json-stream and yajl-ffi. Both have pretty similar examples (they're from the same author):
def post_init
  @parser = Yajl::FFI::Parser.new
  @parser.start_document { puts "start document" }
  @parser.end_document { puts "end document" }
  @parser.start_object { puts "start object" }
  @parser.end_object { puts "end object" }
  @parser.start_array { puts "start array" }
  @parser.end_array { puts "end array" }
  @parser.key {|k| puts "key: #{k}" }
  @parser.value {|v| puts "value: #{v}" }
end

def receive_data(data)
  begin
    @parser << data
  rescue Yajl::FFI::ParserError => e
    close_connection
  end
end
There, he sets up the callbacks for possible data events that the stream parser can experience.
Given a JSON document that looks like:
{
  "1": {
    "name": "fred",
    "color": "red",
    "dead": true
  },
  "2": {
    "name": "tony",
    "color": "six",
    "dead": true
  },
  ...
  "n": {
    "name": "erik",
    "color": "black",
    "dead": false
  }
}
One could stream parse it with yajl-ffi something like this:
def parse_dudes(file_io, chunk_size)
  parser = Yajl::FFI::Parser.new
  object_nesting_level = 0
  current_row = {}
  current_key = nil

  parser.start_object { object_nesting_level += 1 }
  parser.end_object do
    if object_nesting_level == 2
      yield current_row # here, we yield the fully collected record to the passed block
      current_row = {}
    end
    object_nesting_level -= 1
  end
  parser.key do |k|
    if object_nesting_level == 2
      current_key = k
    elsif object_nesting_level == 1
      current_row["id"] = k
    end
  end
  parser.value { |v| current_row[current_key] = v }

  file_io.each(chunk_size) { |chunk| parser << chunk }
end

File.open('dudes.json') do |f|
  parse_dudes f, 1024 do |dude|
    pp dude
  end
end

Savon returning XML as string, not hash

I am trying to parse a SOAP response using Savon. The response is XML but is being returned as one long string. If I use #to_hash the entire XML object is still a string, now stored in
hash[:response][:return]
which means it is still a huge unusable mess.
My code looks like
response = soapClient.request(:get_sites_user_can_access) do
  soap.body = { :sessionid => session[:login_response][:login_return],
                :eid => user }
end
rep = response.to_hash
pp rep[:get_sites_user_can_access_response][:get_sites_user_can_access_return]
What step am I missing to get useful information out of the response? Note: Unfortunately I can't post the XML response because of the info it contains, but it looks like an entire XML document stored as a string. Its class is Nori::StringWithAttributes.
I was able to get the desired results by parsing the Nori string(?) with Nori.parse. This seems like a less than ideal method, but I realized the last element is an array of hashes. So it's a hash of hashes, with an array of hashes. Anyway, here is what worked for me. Advice on how to make this less ugly and clunky would be appreciated.
response = soapClient.request(:get_sites_user_can_access) do
  soap.body = { :sessionid => session[:login_response][:login_return],
                :eid => user }
end
rep = response.to_hash[:get_sites_user_can_access_response][:get_sites_user_can_access_return]
hrep = Nori.parse(rep)
hrep[:list][:item].each { |item| pp item[:site_id] }
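As an alternative sketch, an XML document that arrives as a plain string can also be parsed directly with REXML and queried with XPath. The sample XML below is invented: the real response was not posted, so the element names list, item, and site_id are assumptions taken from the hash keys above and may differ in the actual document.

```ruby
require 'rexml/document'

# Made-up stand-in for the string found under
# [:get_sites_user_can_access_response][:get_sites_user_can_access_return].
xml_string = <<~XML
  <list>
    <item><site_id>alpha</site_id></item>
    <item><site_id>beta</site_id></item>
  </list>
XML

doc = REXML::Document.new(xml_string)
# Collect the text of every site_id element below an item.
site_ids = REXML::XPath.match(doc, '//item/site_id').map(&:text)
puts site_ids.inspect
```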

How to fetch multiple JSONs in parallel with Eventmachine in Ruby

I'm new to EM and am following this example:
EventMachine.run {
http = EventMachine::HttpRequest.new('http://google.com/').get :query => {'keyname' => 'value'}
http.errback { p 'Uh oh'; EM.stop }
http.callback {
p http.response_header.status
p http.response_header
p http.response
EventMachine.stop
}
}
I want to do something similar.
I want to fetch "JavaScript Object Notation" (JSON) files from several different web servers, in parallel.
I can't find a way to store all of these JSON documents in a common variable so that I can run calculations on them afterwards, for example by appending each response to a global array.
You want the requests to be in parallel and to process them after all have been completed?
You can use EventMachine::MultiRequest from em-http-request. The wiki has documentation on issuing parallel requests, see "Synchronizing with Multi interface".
Add your code to multi.callback, where you will have access to all of the completed requests.
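The general pattern (start every request, wait until all have finished, then work on the collected results) can be sketched without EventMachine using plain threads. This is a different technique from MultiRequest, shown only to illustrate the idea; fetch_all_json and the stubbed bodies are hypothetical, and in real code the fetch callable might be ->(url) { Net::HTTP.get(URI(url)) }.

```ruby
require 'json'

# Hypothetical helper: fetches every URL in its own thread and returns a
# Hash of url => parsed JSON once all threads have finished.
def fetch_all_json(urls, fetch)
  threads = urls.map do |url|
    Thread.new { [url, JSON.parse(fetch.call(url))] }
  end
  threads.map(&:value).to_h # Thread#value waits for the thread to finish
end

# Stubbed responses so the sketch runs without a network.
fake_bodies = {
  'http://a.example/data.json' => '{"count": 1}',
  'http://b.example/data.json' => '{"count": 2}'
}
fake_fetch = ->(url) { fake_bodies.fetch(url) }

results = fetch_all_json(fake_bodies.keys, fake_fetch)
total = results.values.sum { |doc| doc['count'] }
puts total
```

Because the results come back as a single Hash, no global array is needed for the calculations that follow.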

How to parse SOAP response from ruby client?

I am learning Ruby and I have written the following code to find out how to consume SOAP services:
require 'soap/wsdlDriver'
wsdl="http://www.abundanttech.com/webservices/deadoralive/deadoralive.wsdl"
service=SOAP::WSDLDriverFactory.new(wsdl).create_rpc_driver
weather=service.getTodaysBirthdays('1/26/2010')
The response that I get back is:
#<SOAP::Mapping::Object:0x80ac3714
{http://www.abundanttech.com/webservices/deadoralive} getTodaysBirthdaysResult=#<SOAP::Mapping::Object:0x80ac34a8
{http://www.w3.org/2001/XMLSchema}schema=#<SOAP::Mapping::Object:0x80ac3214
{http://www.w3.org/2001/XMLSchema}element=#<SOAP::Mapping::Object:0x80ac2f6c
{http://www.w3.org/2001/XMLSchema}complexType=#<SOAP::Mapping::Object:0x80ac2cc4
{http://www.w3.org/2001/XMLSchema}choice=#<SOAP::Mapping::Object:0x80ac2a1c
{http://www.w3.org/2001/XMLSchema}element=#<SOAP::Mapping::Object:0x80ac2774
{http://www.w3.org/2001/XMLSchema}complexType=#<SOAP::Mapping::Object:0x80ac24cc
{http://www.w3.org/2001/XMLSchema}sequence=#<SOAP::Mapping::Object:0x80ac2224
{http://www.w3.org/2001/XMLSchema}element=[#<SOAP::Mapping::Object:0x80ac1f7c>,
#<SOAP::Mapping::Object:0x80ac13ec>,
#<SOAP::Mapping::Object:0x80ac0a28>,
#<SOAP::Mapping::Object:0x80ac0078>,
#<SOAP::Mapping::Object:0x80abf6c8>,
#<SOAP::Mapping::Object:0x80abed18>]
>>>>>>> {urn:schemas-microsoft-com:xml-diffgram-v1}diffgram=#<SOAP::Mapping::Object:0x80abe6c4
{}NewDataSet=#<SOAP::Mapping::Object:0x80ac1220
{}Table=[#<SOAP::Mapping::Object:0x80ac75e4
{}FullName="Cully, Zara"
{}BirthDate="01/26/1892"
{}DeathDate="02/28/1979"
{}Age="(87)"
{}KnownFor="The Jeffersons"
{}DeadOrAlive="Dead">,
#<SOAP::Mapping::Object:0x80b778f4
{}FullName="Feiffer, Jules"
{}BirthDate="01/26/1929"
{}DeathDate=#<SOAP::Mapping::Object:0x80c7eaf4>
{}Age="81"
{}KnownFor="Cartoonists"
{}DeadOrAlive="Alive">]>>>>
I am having a great deal of difficulty figuring out how to parse and show the returned information in a nice table, or even just how to loop through the records and have access to each element (ie. FullName,Age,etc). I went through the whole "getTodaysBirthdaysResult.methods - Object.new.methods" and kept working down to try and work out how to access the elements, but then I get to the array and I got lost.
Any help that can be offered would be appreciated.
If you're going to parse the XML anyway, you might as well skip SOAP4r and go with Handsoap. Disclaimer: I'm one of the authors of Handsoap.
An example implementation:
# wsdl: http://www.abundanttech.com/webservices/deadoralive/deadoralive.wsdl
DEADORALIVE_SERVICE_ENDPOINT = {
  :uri => 'http://www.abundanttech.com/WebServices/DeadOrAlive/DeadOrAlive.asmx',
  :version => 1
}

class DeadoraliveService < Handsoap::Service
  endpoint DEADORALIVE_SERVICE_ENDPOINT

  def on_create_document(doc)
    # register namespaces for the request
    doc.alias 'tns', 'http://www.abundanttech.com/webservices/deadoralive'
  end

  def on_response_document(doc)
    # register namespaces for the response
    doc.add_namespace 'ns', 'http://www.abundanttech.com/webservices/deadoralive'
  end

  # public methods
  def get_todays_birthdays
    soap_action = 'http://www.abundanttech.com/webservices/deadoralive/getTodaysBirthdays'
    response = invoke('tns:getTodaysBirthdays', soap_action)
    (response/"//NewDataSet/Table").map do |table|
      {
        :full_name => (table/"FullName").to_s,
        :birth_date => Date.strptime((table/"BirthDate").to_s, "%m/%d/%Y"),
        :death_date => Date.strptime((table/"DeathDate").to_s, "%m/%d/%Y"),
        :age => (table/"Age").to_s.gsub(/^\(([\d]+)\)$/, '\1').to_i,
        :known_for => (table/"KnownFor").to_s,
        :alive? => (table/"DeadOrAlive").to_s == "Alive"
      }
    end
  end
end
Usage:
DeadoraliveService.get_todays_birthdays
SOAP4R always returns a SOAP::Mapping::Object, which can be awkward to work with unless you only need values that you can access with hash notation, like so:
weather['fullName']
However, that does not work when you have an array of hashes. A workaround is to get the result as XML instead of a SOAP::Mapping::Object. To do that, I would modify your code as follows:
require 'soap/wsdlDriver'
wsdl="http://www.abundanttech.com/webservices/deadoralive/deadoralive.wsdl"
service=SOAP::WSDLDriverFactory.new(wsdl).create_rpc_driver
service.return_response_as_xml = true
weather=service.getTodaysBirthdays('1/26/2010')
Now the above gives you an XML response, which you can parse using Nokogiri or REXML. Here is an example using REXML:
require 'rexml/document'
rexml = REXML::Document.new(weather)
birthdays = nil
rexml.each_recursive {|element| birthdays = element if element.name == 'getTodaysBirthdaysResult'}
birthdays.each_recursive{|element| puts "#{element.name} = #{element.text}" if element.text}
This prints every element that has text.
Once you have an XML document, you can do pretty much anything the library you choose (REXML or Nokogiri) supports.
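For example, to loop over the records and reach each field by name, each Table element can be turned into a Hash. The sketch below uses a trimmed, hand-written sample modelled on the output in the question; the real response wraps this in SOAP and diffgram elements with namespaces, so the XPath may need adjusting.

```ruby
require 'rexml/document'

# Hand-written sample (assumed shape, based on the dump in the question).
sample_xml = <<~XML
  <NewDataSet>
    <Table>
      <FullName>Cully, Zara</FullName>
      <Age>(87)</Age>
      <DeadOrAlive>Dead</DeadOrAlive>
    </Table>
    <Table>
      <FullName>Feiffer, Jules</FullName>
      <Age>81</Age>
      <DeadOrAlive>Alive</DeadOrAlive>
    </Table>
  </NewDataSet>
XML

doc = REXML::Document.new(sample_xml)

# One Hash per record, keyed by element name.
people = REXML::XPath.match(doc, '//Table').map do |table|
  row = {}
  table.elements.each { |field| row[field.name] = field.text }
  row
end

people.each { |person| puts "#{person['FullName']}: #{person['DeadOrAlive']}" }
```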
Well, here's my suggestion.
The issue is that you have to grab the right part of the result, one that you can actually iterate over. Unfortunately, all the inspecting in the world won't help you, because the output is a huge blob of unreadable text.
What I do is this:
File.open('myresult.yaml', 'w') {|f| f.write(result.to_yaml) }
This will be a much more human readable format. What you are probably looking for is something like this:
--- !ruby/object:SOAP::Mapping::Object
__xmlattr: {}
__xmlele:
- - &id024 !ruby/object:XSD::QName
name: ListAddressBooksResult <-- Hash name, so it's result["ListAddressBooksResult"]
namespace: http://apiconnector.com
source:
- !ruby/object:SOAP::Mapping::Object
__xmlattr: {}
__xmlele:
- - &id023 !ruby/object:XSD::QName
name: APIAddressBook <-- this bastard is enumerable :) YAY! so it's result["ListAddressBooksResult"]["APIAddressBook"].each
namespace: http://apiconnector.com
source:
- - !ruby/object:SOAP::Mapping::Object
The above is a result from DotMailer's API; I spent the last hour trying to figure out how to enumerate over its results, and this is the technique I used to work out what was going on. I think it beats using REXML etc., because this way I could do something like this:
result['ListAddressBooksResult']['APIAddressBook'].each {|book| puts book["Name"]}
Well, I hope this helps anyone else who is looking.
/jason
