Storing a MessagePacked hash in Redis - ruby

I'm having a problem storing a MessagePacked hash in Redis. I've pasted a test case below. When pulling out the packed data from Redis and unpacking it, the hash is slightly corrupted. This appears to happen when the hash values are beyond a certain length, although I can't say that for sure.
I'm using Redis 2.4.17 (default config), Ruby 1.9.3p194, MessagePack 0.4.7, and the Redis gem 3.0.2. The same problem happens using Node, so I'm assuming the problem is within MessagePack or Redis. Any ideas?
require 'redis'
require 'msgpack'

class Test
  def self.run(url)
    redis = Redis.new
    data = {'number' => 13498935756, 'hash' => {'url' => url}}
    redis.set('my_key', MessagePack.pack(data))
    result = MessagePack.unpack(redis.get('my_key'))
    puts result
    puts result['hash']['url'] == data['hash']['url']
  end
end
Test.run('http://fake.example.com') # works
=> {"number"=>13498935756, "hash"=>{"url"=>"http://fake.example.com"}}
=> true
Test.run('http://fakeurl.example.com') # does not work
=> {"number"=>13498935756, "hash"=>{"url"=>"ttp://fakeurl.example.com"}}
=> false

MessagePack deals in raw bytes, which Ruby tags with the 'ASCII-8BIT' (binary) encoding. However, your packed data is coming back from Redis tagged as UTF-8. For MessagePack to unpack it successfully, you need to force it back to being interpreted as raw bytes.
Therefore, change this line...
result = MessagePack.unpack(redis.get('my_key'))
to something like this...
redis_val = redis.get('my_key').force_encoding('ASCII-8BIT')
result = MessagePack.unpack(redis_val)
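For example, a tiny helper (the name msgpack_get is just for illustration) keeps the fix in one place:

require 'redis'
require 'msgpack'

# Fetch a key and unpack it as MessagePack, forcing the value back to
# raw bytes first so multi-byte sequences are not misinterpreted.
def msgpack_get(redis, key)
  raw = redis.get(key)
  return nil if raw.nil?
  MessagePack.unpack(raw.force_encoding(Encoding::ASCII_8BIT))
end

result = msgpack_get(Redis.new, 'my_key')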

Related

Ruby ZIP file encoding for sending to Sidekiq / Redis

I build a ZIP file using the following code:
def compress_batch(directory_path)
  zip_file_path = File.join(File.expand_path("..", directory_path), SecureRandom.hex(10))
  Zip::File.open(zip_file_path, Zip::File::CREATE) do |zip_file|
    (Dir.entries(directory_path) - %w(. ..)).each do |file_name|
      zip_file.add file_name, File.join(directory_path, file_name)
    end
  end
  result = File.open(zip_file_path, 'rb').read
  File.unlink(zip_file_path)
  result
end
I store that ZIP file in memory:
@result = Payoff::DataFeed::Compress::ZipCompress.new.compress_batch(source_path)
I put it into a hash:
options = {
  data: @result
}
Then I submit it to my Sidekiq worker using perform_async:
DeliveryWorker.perform_async(options)
and get the following error:
[DEBUG] Starting store to: { "destination" => "sftp", "path" => "INBOUND/20191009.zip" }
Encoding::UndefinedConversionError: "\xBA" from ASCII-8BIT to UTF-8
from ruby/2.3.0/gems/activesupport-4.2.10/lib/active_support/core_ext/object/json.rb:34:in `encode'
However, if I use .new.perform instead of .perform_async, bypassing Sidekiq, it works fine!
DeliveryWorker.new.perform(options)
My best guess is that there is something wrong with my encoding, such that when the job goes to Sidekiq / Redis, it blows up. How should I have encoded it? Do I need to change the creation of my ZIP file? Maybe I can convert the encoding upon submission to Sidekiq?
Sidekiq serializes arguments as JSON. You are trying to stuff binary data into JSON, which only supports UTF-8 strings. You will need to Base64 encode the data if you wish to pass it through Redis.
require 'base64'
encoded = Base64.encode64(filedata)
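As a sketch of how this could look end to end (the worker body is illustrative, not your actual delivery code, and result is assumed to hold the binary ZIP string from compress_batch; string keys are used because Sidekiq's JSON round-trip turns symbol keys into strings anyway):

require 'sidekiq'
require 'base64'

class DeliveryWorker
  include Sidekiq::Worker

  def perform(options)
    # Decode back to raw binary inside the worker
    filedata = Base64.decode64(options['data'])
    # ... deliver filedata via SFTP etc. ...
  end
end

# Encode the binary ZIP before it gets serialized to JSON
options = { 'data' => Base64.encode64(result) }
DeliveryWorker.perform_async(options)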

Cannot deserialize object from a JSON string (but only to Hash)?

I wrote the dictation gem on my Mac, and deserialization works fine there. When I installed it on another Mac, however, it "fails" to deserialize the object: it only ever deserializes to a Hash.
Private Mac Ruby version: ruby-1.9.3-p0, json v1.8.0
Another Mac Ruby version: ruby-1.9.3-p448, json v1.8.0
I also tried different Ruby versions and Gem versions on both, but none of them works, only the initial one where I first wrote it.
When I try this code in the working environment:
require 'json'

class Word
  attr_accessor :value, :translation

  def initialize(value, translation)
    @value = value
    @translation = translation
  end

  def to_json(*args)
    {
      'json_class' => self.class.name,
      'data' => [ @value, @translation ]
    }.to_json(*args)
  end

  class << self
    def json_create(object)
      new(*object['data'])
    end
  end
end

str = '{"json_class":"Word","data":["Morgen","Tomorrow"]}'
p JSON.parse(str)
It prints a Word object, which is expected:
#<Word:0x007fcce22c9c58 @translation="Tomorrow", @value="Morgen">
With the other environment, it always prints a Hash:
{"json_class"=>"Word", "data"=>["Morgen", "Tomorrow"]}
I also tried passing the :object_class key, but it throws another exception:
p JSON.parse(str, :object_class => Word)
# => ArgumentError: wrong number of arguments (0 for 2)
I could not figure out the require 'json' version at runtime using:
puts Gem.loaded_specs['json'].version
because Gem.loaded_specs.keys doesn't contain it.
Thanks for any hint.
The author of the JSON lib replied: on newer versions, for security reasons, to deserialize a custom object you can either do:
JSON.parse(str, :create_additions => true)
or you can:
JSON.load(str)
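For example, with the Word class from the question loaded, the first option restores the object (the pointer address will differ):

require 'json'

str = '{"json_class":"Word","data":["Morgen","Tomorrow"]}'
p JSON.parse(str, :create_additions => true)
# => #<Word:0x007f... @value="Morgen", @translation="Tomorrow">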
So, I overlooked the JSON#load part in ruby-doc:
load(source, proc = nil, options = {})
Load a ruby data structure from a JSON source and return it. A source
can either be a string-like object, an IO-like object, or an object
responding to the read method. If proc was given, it will be called
with any nested Ruby object as an argument recursively in depth first
order. To modify the default options pass in the optional options
argument as well.
BEWARE: This method is meant to serialise data from trusted user
input, like from your own database server or clients under your
control, it could be dangerous to allow untrusted users to pass JSON
sources into it. The default options for the parser can be changed via
the ::load_default_options method.
This method is part of the implementation of the load/dump interface
of Marshal and YAML.
Deserializing directly into a rich object (especially if your JSON comes from an unknown source) can be a pretty serious attack vector (recent Rails vulnerabilities are related to that).
I would guess that this ability was disabled between Ruby versions, or, at least changed to a whitelist-based approach. I wasn't able to find any links to support this claim though, so I might be wrong.
Anyway, you might find it simpler and more compatible to initialize your class from the deserialized hash instead:
class Word
  def self.from_json(json)
    args = JSON.parse(json)["data"]
    new(*args)
  end
end
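Usage is then explicit, with no global parser options involved:

word = Word.from_json('{"json_class":"Word","data":["Morgen","Tomorrow"]}')
word.value       # => "Morgen"
word.translation # => "Tomorrow"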
Here is another workaround. Because my code is not used in web communication, the vulnerability is not a problem here.
Before I was doing:
JSON.parse(str)
Now I just need to add a few lines:
obj = JSON.parse(str)
if obj.is_a?(Hash)
  class_name = obj['json_class'].split('::').inject(Kernel) { |namespace, const_name| namespace.const_get(const_name) }
  args = obj['data']
  word = class_name.new(*args)
else
  word = obj
end

Stream based parsing and writing of JSON

I fetch about 20,000 datasets from a server, in batches of 1,000. Each dataset is a JSON object. Persisted, this comes to around 350 MB of uncompressed plaintext.
I have a memory limit of 1 GB. Hence, I write each batch of 1,000 JSON objects as an array into a raw JSON file in append mode.
The result is a file with 20 JSON arrays which need to be aggregated. I need to touch them anyway, because I want to add metadata. Generally, the Ruby Yajl parser makes this possible, like so:
require 'yajl'

raw_file = File.new(path_to_raw_file, 'r')
json_file = File.new(path_to_json_file, 'w')

datasets = []
parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new { |o| datasets += o }
parser.parse(raw_file)

hash = { date: Time.now, datasets: datasets }
Yajl::Encoder.encode(hash, json_file)
What is the problem with this solution? The problem is that the whole JSON is still parsed into memory, which is exactly what I must avoid.
Basically, what I need is a solution which parses the JSON from an IO object and encodes it to another IO object at the same time.
I assumed Yajl offered this, but I haven't found a way, nor did its API give any hints, so I guess not. Is there a JSON parser library which supports this? Are there other solutions?
The only solution I can think of is to use IO#seek: write all the dataset arrays one after another ([...][...][...]), and after every array seek back and overwrite the ][ with a comma, effectively connecting the arrays manually.
Why can't you retrieve a single record at a time from the database, process it as necessary, convert it to JSON, then emit it with a trailing/delimiting comma?
If you started with a file that only contained [, then appended all your JSON strings, then, on the final entry didn't append a comma, and instead used a closing ], you'd have a JSON array of hashes, and would only have to process one row's worth at a time.
It'd be a tiny bit slower (maybe) but wouldn't impact your system. And DB I/O can be very fast if you use blocking/paging to retrieve a reasonable number of records at a time.
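As a rough sketch of that approach applied to your batch loop (fetch_batch is a hypothetical stand-in for your server-fetching code, and aggregated.json for your target file):

require 'json'

File.open('aggregated.json', 'w') do |out|
  out.write('[')
  first = true
  20.times do |i|
    # each batch is an array of dataset hashes
    fetch_batch(i).each do |dataset|
      out.write(',') unless first
      first = false
      # only one dataset is held in memory at a time
      out.write(JSON.generate(dataset))
    end
  end
  out.write(']')
end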
For instance, here's a combination of some Sequel example code, and code to extract the rows as JSON and build a larger JSON structure:
require 'json'
require 'sequel'

DB = Sequel.sqlite # memory database

DB.create_table :items do
  primary_key :id
  String :name
  Float :price
end

items = DB[:items] # Create a dataset

# Populate the table
items.insert(:name => 'abc', :price => rand * 100)
items.insert(:name => 'def', :price => rand * 100)
items.insert(:name => 'ghi', :price => rand * 100)

add_comma = false
puts '['
items.order(:price).each do |item|
  puts ',' if add_comma
  add_comma ||= true
  print JSON[item]
end
puts "\n]"
Which outputs:
[
{"id":2,"name":"def","price":3.714714089426208},
{"id":3,"name":"ghi","price":27.0179624376119},
{"id":1,"name":"abc","price":52.51248221170203}
]
Notice the order is now by "price".
Validation is easy:
require 'json'
require 'pp'
pp JSON[<<EOT]
[
{"id":2,"name":"def","price":3.714714089426208},
{"id":3,"name":"ghi","price":27.0179624376119},
{"id":1,"name":"abc","price":52.51248221170203}
]
EOT
Which results in:
[{"id"=>2, "name"=>"def", "price"=>3.714714089426208},
{"id"=>3, "name"=>"ghi", "price"=>27.0179624376119},
{"id"=>1, "name"=>"abc", "price"=>52.51248221170203}]
This validates the JSON and demonstrates that the original data is recoverable. Each row retrieved from the database should be a minimal "bitesized" piece of the overall JSON structure you want to build.
Building upon that, here's how to read incoming JSON in the database, manipulate it, then emit it as a JSON file:
require 'json'
require 'sequel'

DB = Sequel.sqlite # memory database

DB.create_table :items do
  primary_key :id
  String :json
end

items = DB[:items] # Create a dataset

# Populate the table
items.insert(:json => JSON[:name => 'abc', :price => rand * 100])
items.insert(:json => JSON[:name => 'def', :price => rand * 100])
items.insert(:json => JSON[:name => 'ghi', :price => rand * 100])
items.insert(:json => JSON[:name => 'jkl', :price => rand * 100])
items.insert(:json => JSON[:name => 'mno', :price => rand * 100])
items.insert(:json => JSON[:name => 'pqr', :price => rand * 100])
items.insert(:json => JSON[:name => 'stu', :price => rand * 100])
items.insert(:json => JSON[:name => 'vwx', :price => rand * 100])
items.insert(:json => JSON[:name => 'yz_', :price => rand * 100])

add_comma = false
puts '['
items.each do |item|
  puts ',' if add_comma
  add_comma ||= true
  print JSON[
    JSON[
      item[:json]
    ].merge('foo' => 'bar', 'time' => Time.now.to_f)
  ]
end
puts "\n]"
Which generates:
[
{"name":"abc","price":3.268814929005337,"foo":"bar","time":1379688093.124606},
{"name":"def","price":13.871147312377719,"foo":"bar","time":1379688093.124664},
{"name":"ghi","price":52.720984131655676,"foo":"bar","time":1379688093.124702},
{"name":"jkl","price":53.21477190840114,"foo":"bar","time":1379688093.124732},
{"name":"mno","price":40.99364022416619,"foo":"bar","time":1379688093.124758},
{"name":"pqr","price":5.918738444452265,"foo":"bar","time":1379688093.124803},
{"name":"stu","price":45.09391752439902,"foo":"bar","time":1379688093.124831},
{"name":"vwx","price":63.08947792357426,"foo":"bar","time":1379688093.124862},
{"name":"yz_","price":94.04921035056373,"foo":"bar","time":1379688093.124894}
]
I added the timestamp so you can see that each row is processed individually, AND to give you an idea how fast the rows are being processed. Granted, this is a tiny, in-memory database, which has no network I/O to contend with, but a normal network connection through a switch to a database on a reasonable DB host should be pretty fast too. Telling the ORM to read the DB in chunks can speed up the processing, because the DBM will be able to return larger blocks to more efficiently fill the packets. You'll have to experiment to determine what size chunks you need, because it will vary based on your network, your hosts, and the size of your records.
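With Sequel specifically, paged_each is one way to do that chunked retrieval; a minimal sketch (the rows_per_fetch value is something you'd tune as described above):

# Stream rows in fixed-size pages instead of loading the whole table;
# paged_each requires an ordered dataset.
items.order(:id).paged_each(:rows_per_fetch => 1000) do |item|
  # process one row at a time
end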
Your original design isn't good when dealing with enterprise-sized databases, especially when your hardware resources are limited. Over the years we've learned how to parse BIG databases, which make 20,000-row tables appear minuscule. VM slices are common these days and we use them for crunching, so they're often the PCs of yesteryear: a single CPU with a small memory footprint and dinky drives. We can't beat them up or they'll become bottlenecks, so we have to break the data into the smallest atomic pieces we can.
Harping about DB design: Storing JSON in a database is a questionable practice. DBMs these days can spew JSON, YAML and XML representations of rows, but forcing the DBM to search inside stored JSON, YAML or XML strings is a major hit in processing speed, so avoid it at all costs unless you also have the equivalent lookup data indexed in separate fields so your searches are at the highest possible speed. If the data is available in separate fields, then doing good ol' database queries, tweaking in the DBM or your scripting language of choice, and emitting the massaged data becomes a lot easier.
It is possible via the JSON::Stream or Yajl::FFI gems. You will have to write your own callbacks though. Some hints on how to do that can be found here and here.
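For instance, a minimal sketch of the callback style JSON::Stream uses (event names taken from the json-stream README; verify against your installed version):

require 'json/stream'

parser = JSON::Stream::Parser.new
parser.start_object { puts 'object start' }
parser.end_object   { puts 'object end' }
parser.key   { |k| puts "key:   #{k}" }
parser.value { |v| puts "value: #{v}" }

# Feed the document in chunks so the full file never sits in memory
File.open(path_to_raw_file) do |io|
  while (chunk = io.read(4096))
    parser << chunk
  end
end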
Facing a similar problem, I created the json-streamer gem, which spares you the need to write your own callbacks. It yields each object one by one and removes it from memory afterwards. You can then pass these to another IO object as intended.
There is a library called oj that does exactly that. It can do parsing and generation. For example, for parsing you can use Oj::Doc:
require 'oj'

Oj::Doc.open('[3,[2,1]]') do |doc|
  result = {}
  doc.each_leaf() do |d|
    result[d.where?] = d.fetch()
  end
  result
end #=> {"/1" => 3, "/2/1" => 2, "/2/2" => 1}
You can even backtrack in the file using doc.move(path). It seems very flexible.
For writing documents, you can use Oj::StreamWriter:
require 'oj'

doc = Oj::StreamWriter.new($stdout)

def write_item(doc, item)
  doc.push_object
  doc.push_key "type"
  doc.push_value "item"
  doc.push_key "value"
  doc.push_value item
  doc.pop
end

def write_array(doc, array)
  doc.push_object
  doc.push_key "type"
  doc.push_value "array"
  doc.push_key "value"
  doc.push_array
  array.each do |item|
    write_item(doc, item)
  end
  doc.pop
  doc.pop
end

write_array(doc, [{a: 1}, {a: 2}])
#=> {"type":"array","value":[{"type":"item","value":{":a":1}},{"type":"item","value":{":a":2}}]}

How to parse SOAP response from ruby client?

I am learning Ruby and I have written the following code to find out how to consume SOAP services:
require 'soap/wsdlDriver'
wsdl="http://www.abundanttech.com/webservices/deadoralive/deadoralive.wsdl"
service=SOAP::WSDLDriverFactory.new(wsdl).create_rpc_driver
weather=service.getTodaysBirthdays('1/26/2010')
The response that I get back is:
#<SOAP::Mapping::Object:0x80ac3714
{http://www.abundanttech.com/webservices/deadoralive} getTodaysBirthdaysResult=#<SOAP::Mapping::Object:0x80ac34a8
{http://www.w3.org/2001/XMLSchema}schema=#<SOAP::Mapping::Object:0x80ac3214
{http://www.w3.org/2001/XMLSchema}element=#<SOAP::Mapping::Object:0x80ac2f6c
{http://www.w3.org/2001/XMLSchema}complexType=#<SOAP::Mapping::Object:0x80ac2cc4
{http://www.w3.org/2001/XMLSchema}choice=#<SOAP::Mapping::Object:0x80ac2a1c
{http://www.w3.org/2001/XMLSchema}element=#<SOAP::Mapping::Object:0x80ac2774
{http://www.w3.org/2001/XMLSchema}complexType=#<SOAP::Mapping::Object:0x80ac24cc
{http://www.w3.org/2001/XMLSchema}sequence=#<SOAP::Mapping::Object:0x80ac2224
{http://www.w3.org/2001/XMLSchema}element=[#<SOAP::Mapping::Object:0x80ac1f7c>,
#<SOAP::Mapping::Object:0x80ac13ec>,
#<SOAP::Mapping::Object:0x80ac0a28>,
#<SOAP::Mapping::Object:0x80ac0078>,
#<SOAP::Mapping::Object:0x80abf6c8>,
#<SOAP::Mapping::Object:0x80abed18>]
>>>>>>> {urn:schemas-microsoft-com:xml-diffgram-v1}diffgram=#<SOAP::Mapping::Object:0x80abe6c4
{}NewDataSet=#<SOAP::Mapping::Object:0x80ac1220
{}Table=[#<SOAP::Mapping::Object:0x80ac75e4
{}FullName="Cully, Zara"
{}BirthDate="01/26/1892"
{}DeathDate="02/28/1979"
{}Age="(87)"
{}KnownFor="The Jeffersons"
{}DeadOrAlive="Dead">,
#<SOAP::Mapping::Object:0x80b778f4
{}FullName="Feiffer, Jules"
{}BirthDate="01/26/1929"
{}DeathDate=#<SOAP::Mapping::Object:0x80c7eaf4>
{}Age="81"
{}KnownFor="Cartoonists"
{}DeadOrAlive="Alive">]>>>>
I am having a great deal of difficulty figuring out how to parse and show the returned information in a nice table, or even just how to loop through the records and have access to each element (i.e. FullName, Age, etc.). I went through the whole "getTodaysBirthdaysResult.methods - Object.new.methods" exercise and kept working down to try to figure out how to access the elements, but then I got to the array and I got lost.
Any help that can be offered would be appreciated.
If you're going to parse the XML anyway, you might as well skip SOAP4r and go with Handsoap. Disclaimer: I'm one of the authors of Handsoap.
An example implementation:
# wsdl: http://www.abundanttech.com/webservices/deadoralive/deadoralive.wsdl
DEADORALIVE_SERVICE_ENDPOINT = {
  :uri => 'http://www.abundanttech.com/WebServices/DeadOrAlive/DeadOrAlive.asmx',
  :version => 1
}

class DeadoraliveService < Handsoap::Service
  endpoint DEADORALIVE_SERVICE_ENDPOINT

  def on_create_document(doc)
    # register namespaces for the request
    doc.alias 'tns', 'http://www.abundanttech.com/webservices/deadoralive'
  end

  def on_response_document(doc)
    # register namespaces for the response
    doc.add_namespace 'ns', 'http://www.abundanttech.com/webservices/deadoralive'
  end

  # public methods
  def get_todays_birthdays
    soap_action = 'http://www.abundanttech.com/webservices/deadoralive/getTodaysBirthdays'
    response = invoke('tns:getTodaysBirthdays', soap_action)
    (response/"//NewDataSet/Table").map do |table|
      {
        :full_name => (table/"FullName").to_s,
        :birth_date => Date.strptime((table/"BirthDate").to_s, "%m/%d/%Y"),
        :death_date => Date.strptime((table/"DeathDate").to_s, "%m/%d/%Y"),
        :age => (table/"Age").to_s.gsub(/^\(([\d]+)\)$/, '\1').to_i,
        :known_for => (table/"KnownFor").to_s,
        :alive? => (table/"DeadOrAlive").to_s == "Alive"
      }
    end
  end
end
Usage:
DeadoraliveService.get_todays_birthdays
SOAP4R always returns a SOAP::Mapping::Object, which is sometimes a bit difficult to work with unless you are just getting hash values that you can access using hash notation, like so:
weather['fullName']
However, that does not work when you have an array of hashes. A workaround is to get the result in XML format instead of a SOAP::Mapping::Object. To do that, I would modify your code as follows:
require 'soap/wsdlDriver'
wsdl="http://www.abundanttech.com/webservices/deadoralive/deadoralive.wsdl"
service=SOAP::WSDLDriverFactory.new(wsdl).create_rpc_driver
service.return_response_as_xml = true
weather=service.getTodaysBirthdays('1/26/2010')
Now the above will give you an XML response, which you can parse using Nokogiri or REXML. Here is an example using REXML:
require 'rexml/document'
rexml = REXML::Document.new(weather)
birthdays = nil
rexml.each_recursive {|element| birthdays = element if element.name == 'getTodaysBirthdaysResult'}
birthdays.each_recursive{|element| puts "#{element.name} = #{element.text}" if element.text}
This will print out all elements that have any text.
So once you have created an XML document, you can do pretty much anything with it, depending upon the methods of the library you choose, i.e. REXML or Nokogiri.
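If you prefer Nokogiri, a roughly equivalent sketch (stripping namespaces for brevity, since the response mixes several):

require 'nokogiri'

doc = Nokogiri::XML(weather)
doc.remove_namespaces!
doc.xpath('//Table').each do |table|
  puts "#{table.at('FullName').text}: #{table.at('DeadOrAlive').text}"
end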
Well, here's my suggestion.
The issue is, you have to snag the right part of the result, one that is something you can actually iterate over. Unfortunately, all the inspecting in the world won't help you, because it's a huge blob of unreadable text.
What I do is this:
File.open('myresult.yaml', 'w') {|f| f.write(result.to_yaml) }
This will be a much more human readable format. What you are probably looking for is something like this:
--- !ruby/object:SOAP::Mapping::Object
__xmlattr: {}
__xmlele:
- - &id024 !ruby/object:XSD::QName
name: ListAddressBooksResult <-- Hash name, so it's result["ListAddressBooksResult"]
namespace: http://apiconnector.com
source:
- !ruby/object:SOAP::Mapping::Object
__xmlattr: {}
__xmlele:
- - &id023 !ruby/object:XSD::QName
name: APIAddressBook <-- this bastard is enumerable :) YAY! so it's result["ListAddressBooksResult"]["APIAddressBook"].each
namespace: http://apiconnector.com
source:
- - !ruby/object:SOAP::Mapping::Object
The above is a result from DotMailer's API, which I spent the last hour trying to figure out how to enumerate over. The above is the technique I used to figure out what the heck was going on. I think it beats using REXML etc.; this way, I could do something like this:
result['ListAddressBooksResult']['APIAddressBook'].each {|book| puts book["Name"]}
Well, I hope this helps anyone else who is looking.
/jason

JSON object for just an integer

Silly question, but I'm unable to figure this out.
I tried the following in Ruby:
irb(main):020:0> JSON.load('[1,2,3]').class
=> Array
This seems to work, while neither
JSON.load('1').class
nor this
JSON.load('{1}').class
works. Any ideas?
I'd ask the guys who programmed the library. AFAIK, 1 isn't a valid JSON object, and neither is {1} but 1 is what the library itself generates for the fixnum 1.
You'd need to do: {"number" : 1} to be valid JSON. The bug is that
a != JSON.parse(JSON.generate(a))
I'd say it's a bug:
>> JSON.parse(1.to_json)
JSON::ParserError: A JSON text must at least contain two octets!
from /opt/local/lib/ruby/gems/1.8/gems/json-1.1.3/lib/json/common.rb:122:in `initialize'
from /opt/local/lib/ruby/gems/1.8/gems/json-1.1.3/lib/json/common.rb:122:in `new'
from /opt/local/lib/ruby/gems/1.8/gems/json-1.1.3/lib/json/common.rb:122:in `parse'
from (irb):7
I assume you're using this: (http://json.rubyforge.org/)
JSON only supporting objects is simply not true -- json.org also does not suggest this, IMO. It was derived from JavaScript, and thus strings and numbers in particular are also valid JSON:
var json_string = "1";
var p = eval('(' + json_string + ')');
console.log(p);
// => 1
typeof p
// => "number"
ActiveSupport::JSON properly understands raw value JSON:
require 'active_support/json'
p = ActiveSupport::JSON.decode '1'
# => 1
p.class
# => Fixnum
and so does MultiJson:
require 'multi_json'
p = MultiJson.load '1'
# => 1
p.class
# => Fixnum
So, as a2800276 mentioned, this must be a bug.
But as of this writing, Ruby 2's JSON has quirks_mode enabled by default when using the load method:
require 'json'
p = JSON.load '1'
# => 1
p.class
# => Fixnum
The first example is valid. The second two are not valid JSON data. See json.org for details.
As said, only arrays and objects are allowed at the top level of JSON.
Maybe wrapping your values in an array will solve your problem.
def set(value); @data = [value].to_json; end
def get; JSON.parse(@data)[0]; end
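Used like this:

set(42)
get # => 42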
From the very basics of what JSON is:
Data types in JSON can be:
Number
String
JSON Object ... (and some more)
See the reference for the complete list of JSON data types.
Now, any JSON data has to be encapsulated in a 'JSON Object' at the top level.
To understand why this is so, consider that without a JSON Object at the top level everything would be loose, and you could have only one of the data types in the whole JSON, i.e. either a number, a string, an array, a null value, etc., but only one.
The 'JSON Object' type has a fixed format of 'key' : 'value' pairs.
You cannot store just the value, so you cannot have something like {1}.
You need to use the correct format, i.e. a 'key' : 'value' pair.
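Sticking to that rule, a minimal sketch of wrapping a bare integer so it round-trips cleanly:

require 'json'

# Wrap the bare value in an object with a key, then read it back
json = JSON.generate('number' => 1) # => "{\"number\":1}"
JSON.parse(json)['number']          # => 1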
