parsing a json string with duplicate keys in ruby - ruby

i have a json with duplicate keys as below:
{"a":{
"stateId":"b",
"countyId":["c"]
},"a":{
"stateId":"d",
"countyId":["e"]
}}
When i use JSON.parse or JSON(stirng), it parses and gives me the key with values d, e. I need to parse the json such that it it avoids parsing the same key twice and has b, c values for the key 'a' instead of 'd', 'e'.

There is a way. Instead of using the ususal Hash class for parsing JSON objects, use a slightly modified class, which can check if a key already exists:
class DuplicateCheckingHash < Hash
attr_accessor :duplicate_check_off
def []=(key, value)
if !duplicate_check_off && has_key?(key) then
fail "Failed: Found duplicate key \"#{key}\" while parsing json! Please cleanup your JSON input!"
end
super
end
end
json = '{"a": 1, "a": 2}' # duplicate!
hash = JSON.parse(json, { object_class:DuplicateCheckingHash }) # will raise
json = '{"a": 1, "b": 2}'
hash = JSON.parse(json, { object_class:DuplicateCheckingHash })
hash.duplicate_check_off = true # make updateable again
hash["a"] = 42 # won't raise

It all depends on the format of your string. If it's as simple as you posted:
require 'json'
my_json =<<END_OF_JSON
{"a":{
"stateId":"b",
"countyId":["c"]
},"a":{
"stateId":"d",
"countyId":["e"]
},"b":{
"stateId":"x",
"countyId":["y"]
},"b":{
"stateId":"no",
"countyId":["no"]
}}
END_OF_JSON
results = {}
hash_strings = my_json.split("},")
hash_strings.each do |hash_str|
hash_str.strip!
hash_str = "{" + hash_str if not hash_str.start_with? "{"
hash_str += "}}" if not hash_str.end_with? "}}"
hash = JSON.parse(hash_str)
hash.each do |key, val|
results[key] = val if not results.keys.include? key
end
end
p results
--output:--
{"a"=>{"stateId"=>"b", "countyId"=>["c"]}, "b"=>{"stateId"=>"x", "countyId"=>["y"]}}

Related

Converting Ruby Hash into string with escapes

I have a Hash which needs to be converted in a String with escaped characters.
{name: "fakename"}
and should end up like this:
'name:\'fakename\'
I don't know how this type of string is called. Maybe there is an already existing method, which I simply don't know...
At the end I would do something like this:
name = {name: "fakename"}
metadata = {}
metadata['foo'] = 'bar'
"#{name} AND #{metadata}"
which ends up in that:
'name:\'fakename\' AND metadata[\'foo\']:\'bar\''
Context: This query a requirement to search Stripe API: https://stripe.com/docs/api/customers/search
If possible I would use Stripe's gem.
In case you can't use it, this piece of code extracted from the gem should help you encode the query parameters.
require 'cgi'
# Copied from here: https://github.com/stripe/stripe-ruby/blob/a06b1477e7c28f299222de454fa387e53bfd2c66/lib/stripe/util.rb
class Util
def self.flatten_params(params, parent_key = nil)
result = []
# do not sort the final output because arrays (and arrays of hashes
# especially) can be order sensitive, but do sort incoming parameters
params.each do |key, value|
calculated_key = parent_key ? "#{parent_key}[#{key}]" : key.to_s
if value.is_a?(Hash)
result += flatten_params(value, calculated_key)
elsif value.is_a?(Array)
result += flatten_params_array(value, calculated_key)
else
result << [calculated_key, value]
end
end
result
end
def self.flatten_params_array(value, calculated_key)
result = []
value.each_with_index do |elem, i|
if elem.is_a?(Hash)
result += flatten_params(elem, "#{calculated_key}[#{i}]")
elsif elem.is_a?(Array)
result += flatten_params_array(elem, calculated_key)
else
result << ["#{calculated_key}[#{i}]", elem]
end
end
result
end
def self.url_encode(key)
CGI.escape(key.to_s).
# Don't use strict form encoding by changing the square bracket control
# characters back to their literals. This is fine by the server, and
# makes these parameter strings easier to read.
gsub("%5B", "[").gsub("%5D", "]")
end
end
params = { name: 'fakename', metadata: { foo: 'bar' } }
Util.flatten_params(params).map { |k, v| "#{Util.url_encode(k)}=#{Util.url_encode(v)}" }.join("&")
I use it now with that string, which works... Quite straigt forward:
"email:\'#{email}\'"
email = "test#test.com"
key = "foo"
value = "bar"
["email:\'#{email}\'", "metadata[\'#{key}\']:\'#{value}\'"].join(" AND ")
=> "email:'test#test.com' AND metadata['foo']:'bar'"
which is accepted by Stripe API

Ruby - Safely navigated hash when a key could be a string or further hash

I'm stuck trying to safely navigate a hash from json.
The json could have a string, eg:
or it could be further nested:
h1 = { location: { formatted: 'Australia', code: 'AU' } }
h2 = { location: 'Australia' }
h2.dig('location', 'formatted')
Then String does not have #dig method
Basically I'm trying to load the JSON then populate the rails model with the data available which may be optional. It seems backwards to check every nested step with an if.
Hash#dig has no magic. It reduces the arguments recursively calling Hash#[] on what was returned from the previous call.
h1 = { location: { formatted: 'Australia', code: 'AU' } }
h1.dig :location, :code
#β‡’ "AU"
It works, because h1[:location] had returned a hash.
h2 = { location: 'Australia' }
h2.dig :location, :code
It raises, because h2[:location] had returned a String.
That said, the solution would be to reimplement Hash#dig, as usually :)
Explicitly taking into account that it’s extremely trivial. Just take a list of keys to dig and (surprise) reduce, returning either the value, or nil.
%i|location code|.reduce(h2) do |acc, e|
acc.is_a?(Hash) ? acc[e] : nil
end
#β‡’ nil
%i|location code|.reduce(h1) do |acc, e|
acc.is_a?(Hash) ? acc[e] : nil
end
#β‡’ "AU"
Shameless plug. You might find the gem iteraptor I had created for this exact purpose useful.
You can use a simple piece of code like that:
def nested_value(hash, key)
return hash if key == ''
keys = key.split('.')
value = hash[keys.first] || hash[keys.first.to_sym]
return value unless value.is_a?(Hash)
nested_value(value, keys[1..-1].join('.'))
end
h1 = { location: { formatted: 'Australia', code: 'AU' } }
h2 = { 'location' => 'Australia' }
p nested_value(h1, 'location.formatted') # => Australia
p nested_value(h2, 'location.formatted') # => Australia
You can also use that method for getting any nested value of a hash by providing key in format foo.bar.baz.qux. Also the method doesn't worry whether a hash has string keys or symbol keys.
I don't know if this lead to the expected behaviour (see examples below) but you can define a patch for the Hash class as follow:
module MyHashPatch
def safe_dig(params) # ok, call as you like..
tmp = self
res = nil
params.each do |param|
if (tmp.is_a? Hash) && (tmp.has_key? param)
tmp = tmp[param]
res = tmp
else
break
end
end
res
end
end
Hash.include MyHashPatch
Then test on your hashes:
h1 = { location: { formatted: 'Australia', code: 'AU' } }
h2 = { location: 'Australia' }
h1.safe_dig([:location, :formatted]) #=> "Australia"
h2.safe_dig([:location, :formatted]) #=> "Australia"
h1.safe_dig([:location, :code]) #=> "AU"
h2.safe_dig([:location, :code]) #=> "Australia"

How to create a Hash from a nested CSV in Ruby?

I have a CSV in the following format:
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,codes.1
YK,1234,4567,AB001,AK002
As you can see, this is a nested structure. The CSV may contain multiple rows. I would like to convert this into an array of hashes like this:
[
{
name: 'YK',
contacts: [
{
phone_no: '1234'
},
{
phone_no: '4567'
}
],
codes: ['AB001', 'AK002']
}
]
The structure uses numbers in the given format to represent arrays. There can be hashes inside arrays. Is there a simple way to do that in Ruby?
The CSV headers are dynamic. It can change. I will have to create the hash on the fly based on the CSV file.
There is a similar node library called csvtojson to do that for JavaScript.
Just read and parse it line-by-line. The arr variable in the code below will hold an array of Hash that you need
arr = []
File.readlines('README.md').drop(1).each do |line|
fields = line.split(',').map(&:strip)
hash = { name: fields[0], contacts: [fields[1], fields[2]], address: [fields[3], fields[4]] }
arr.push(hash)
end
Let's first construct a CSV file.
str = <<~END
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,IQ,codes.1
YK,1234,4567,AB001,173,AK002
ER,4321,7654,BA001,81,KA002
END
FName = 't.csv'
File.write(FName, str)
#=> 121
I have constructed a helper method to construct a pattern that will be used to convert each row of the CSV file (following the first, containing the headers) to an element (hash) of the desired array.
require 'csv'
def construct_pattern(csv)
csv.headers.group_by { |col| col[/[^.]+/] }.
transform_values do |arr|
case arr.first.count('.')
when 0
arr.first
when 1
arr
else
key = arr.first[/(?<=\d\.).*/]
arr.map { |v| { key=>v } }
end
end
end
In the code below, for the example being considered:
construct_pattern(csv)
#=> {"name"=>"name",
# "contacts"=>[{"phone_no"=>"contacts.0.phone_no"},
# {"phone_no"=>"contacts.1.phone_no"}],
# "codes"=>["codes.0", "codes.1"],
# "IQ"=>"IQ"}
By tacking if pattern.empty? onto the above expression we ensure the pattern is constructed only once.
We may now construct the desired array.
pattern = {}
CSV.foreach(FName, headers: true).map do |csv|
pattern = construct_pattern(csv) if pattern.empty?
pattern.each_with_object({}) do |(k,v),h|
h[k] =
case v
when Array
case v.first
when Hash
v.map { |g| g.transform_values { |s| csv[s] } }
else
v.map { |s| csv[s] }
end
else
csv[v]
end
end
end
#=> [{"name"=>"YK",
# "contacts"=>[{"phone_no"=>"1234"}, {"phone_no"=>"4567"}],
# "codes"=>["AB001", "AK002"],
# "IQ"=>"173"},
# {"name"=>"ER",
# "contacts"=>[{"phone_no"=>"4321"}, {"phone_no"=>"7654"}],
# "codes"=>["BA001", "KA002"],
# "IQ"=>"81"}]
The CSV methods I've used are documented in CSV. See also Enumerable#group_by and Hash#transform_values.

How to flatten a hash, making each key a unique value?

I want to take a hash with nested hashes and arrays and flatten it out into a single hash with unique values. I keep trying to approach this from different angles, but then I make it way more complex than it needs to be and get myself lost in what's happening.
Example Source Hash:
{
"Name" => "Kim Kones",
"License Number" => "54321",
"Details" => {
"Name" => "Kones, Kim",
"Licenses" => [
{
"License Type" => "PT",
"License Number" => "54321"
},
{
"License Type" => "Temp",
"License Number" => "T123"
},
{
"License Type" => "AP",
"License Number" => "A666",
"Expiration Date" => "12/31/2020"
}
]
}
}
Example Desired Hash:
{
"Name" => "Kim Kones",
"License Number" => "54321",
"Details_Name" => "Kones, Kim",
"Details_Licenses_1_License Type" => "PT",
"Details_Licenses_1_License Number" => "54321",
"Details_Licenses_2_License Type" => "Temp",
"Details_Licenses_2_License Number" => "T123",
"Details_Licenses_3_License Type" => "AP",
"Details_Licenses_3_License Number" => "A666",
"Details_Licenses_3_Expiration Date" => "12/31/2020"
}
For what it's worth, here's my most recent attempt before giving up.
def flattify(hashy)
temp = {}
hashy.each do |key, val|
if val.is_a? String
temp["#{key}"] = val
elsif val.is_a? Hash
temp.merge(rename val, key, "")
elsif val.is_a? Array
temp["#{key}"] = enumerate val, key
else
end
print "=> #{temp}\n"
end
return temp
end
def rename (hashy, str, n)
temp = {}
hashy.each do |key, val|
if val.is_a? String
temp["#{key}#{n}"] = val
elsif val.is_a? Hash
val.each do |k, v|
temp["#{key}_#{k}#{n}"] = v
end
elsif val.is_a? Array
temp["#{key}"] = enumerate val, key
else
end
end
return flattify temp
end
def enumerate (ary, str)
temp = {}
i = 1
ary.each do |x|
temp["#{str}#{i}"] = x
i += 1
end
return flattify temp
end
Interesting question!
Theory
Here's a recursive method to parse your data.
It keeps track of which keys and indices it has found.
It appends them in a tmp array.
Once a leaf object has been found, it gets written in a hash as value, with a joined tmp as key.
This small hash then gets recursively merged back to the main hash.
Code
def recursive_parsing(object, tmp = [])
case object
when Array
object.each.with_index(1).with_object({}) do |(element, i), result|
result.merge! recursive_parsing(element, tmp + [i])
end
when Hash
object.each_with_object({}) do |(key, value), result|
result.merge! recursive_parsing(value, tmp + [key])
end
else
{ tmp.join('_') => object }
end
end
As an example:
require 'pp'
pp recursive_parsing(data)
# {"Name"=>"Kim Kones",
# "License Number"=>"54321",
# "Details_Name"=>"Kones, Kim",
# "Details_Licenses_1_License Type"=>"PT",
# "Details_Licenses_1_License Number"=>"54321",
# "Details_Licenses_2_License Type"=>"Temp",
# "Details_Licenses_2_License Number"=>"T123",
# "Details_Licenses_3_License Type"=>"AP",
# "Details_Licenses_3_License Number"=>"A666",
# "Details_Licenses_3_Expiration Date"=>"12/31/2020"}
Debugging
Here's a modified version with old-school debugging. It might help you understand what's going on:
def recursive_parsing(object, tmp = [], indent="")
puts "#{indent}Parsing #{object.inspect}, with tmp=#{tmp.inspect}"
result = case object
when Array
puts "#{indent} It's an array! Let's parse every element:"
object.each_with_object({}).with_index(1) do |(element, result), i|
result.merge! recursive_parsing(element, tmp + [i], indent + " ")
end
when Hash
puts "#{indent} It's a hash! Let's parse every key,value pair:"
object.each_with_object({}) do |(key, value), result|
result.merge! recursive_parsing(value, tmp + [key], indent + " ")
end
else
puts "#{indent} It's a leaf! Let's return a hash"
{ tmp.join('_') => object }
end
puts "#{indent} Returning #{result.inspect}\n"
result
end
When called with recursive_parsing([{a: 'foo', b: 'bar'}, {c: 'baz'}]), it displays:
Parsing [{:a=>"foo", :b=>"bar"}, {:c=>"baz"}], with tmp=[]
It's an array! Let's parse every element:
Parsing {:a=>"foo", :b=>"bar"}, with tmp=[1]
It's a hash! Let's parse every key,value pair:
Parsing "foo", with tmp=[1, :a]
It's a leaf! Let's return a hash
Returning {"1_a"=>"foo"}
Parsing "bar", with tmp=[1, :b]
It's a leaf! Let's return a hash
Returning {"1_b"=>"bar"}
Returning {"1_a"=>"foo", "1_b"=>"bar"}
Parsing {:c=>"baz"}, with tmp=[2]
It's a hash! Let's parse every key,value pair:
Parsing "baz", with tmp=[2, :c]
It's a leaf! Let's return a hash
Returning {"2_c"=>"baz"}
Returning {"2_c"=>"baz"}
Returning {"1_a"=>"foo", "1_b"=>"bar", "2_c"=>"baz"}
Unlike the others, I have no love for each_with_object :-). But I do like passing a single result hash around so I don't have to merge and remerge hashes over and over again.
def flattify(value, result = {}, path = [])
case value
when Array
value.each.with_index(1) do |v, i|
flattify(v, result, path + [i])
end
when Hash
value.each do |k, v|
flattify(v, result, path + [k])
end
else
result[path.join("_")] = value
end
result
end
(Some details adopted from Eric, see comments)
Non-recursive approach, using BFS with an array as a queue. I keep the key-value pairs where the value isn't an array/hash, and push array/hash contents to the queue (with combined keys). Turning arrays into hashes (["a", "b"] &mapsto; {1=>"a", 2=>"b"}) as that felt neat.
def flattify(hash)
(q = hash.to_a).select { |key, value|
value = (1..value.size).zip(value).to_h if value.is_a? Array
!value.is_a?(Hash) || !value.each { |k, v| q << ["#{key}_#{k}", v] }
}.to_h
end
One thing I like about it is the nice combination of keys as "#{key}_#{k}". In my other solution, I could've also used a string path = '' and extended that with path + "_" + k, but that would've caused a leading underscore that I'd have to avoid or trim with extra code.

Can't convert symbol to integer from hash table

Edit: The issue is being unable to get the quantity of arrays within the hash, so it can be, x = amount of arrays. so it can be used as function.each_index{|x| code }
Trying to use the index of the amount of rows as a way of repeating an action X amount of times depending on how much data is pulled from a CSV file.
Terminal issued
=> Can't convert symbol to integer (TypeError)
Complete error:
=> ~/home/tests/Product.rb:30:in '[]' can't convert symbol into integer (TypeError) from ~home/tests/Product.rub:30:in 'getNumbRel'
from test.rb:36:in '<main>'
the function is that is performing the action is:
def getNumRel
if defined? #releaseHashTable
return #releaseHashTable[:releasename].length
else
#releaseHashTable = readReleaseCSV()
return #releaseHashTable[:releasename].length
end
end
The csv data pull is just a hash of arrays, nothing snazzy.
def readReleaseCSV()
$log.info("Method "+"#{self.class.name}"+"."+"#{__method__}"+" has started")
$log.debug("reading product csv file")
# Create a Hash where the default is an empty Array
result = Array.new
csvPath = "#{File.dirname(__FILE__)}"+"/../../data/addingProdRelProjIterTestSuite/releaseCSVdata.csv"
CSV.foreach(csvPath, :headers => true, :header_converters => :symbol) do |row|
row.each do |column, value|
if "#{column}" == "prodid"
proHash = Hash.new { |h, k| h[k] = [ ] }
proHash['relid'] << row[:relid]
proHash['releasename'] << row[:releasename]
proHash['inheritcomponents'] << row[:inheritcomponents]
productId = Integer(value)
if result[productId] == nil
result[productId] = Array.new
end
result[productId][result[productId].length] = proHash
end
end
end
$log.info("Method "+"#{self.class.name}"+"."+"#{__method__}"+" has finished")
#productReleaseArr = result
end
Sorry, couldn't resist, cleaned up your method.
# empty brackets unnecessary, no uppercase in method names
def read_release_csv
# you don't need + here
$log.info("Method #{self.class.name}.#{__method__} has started")
$log.debug("reading product csv file")
# you're returning this array. It is not a hash. [] is preferred over Array.new
result = []
csvPath = "#{File.dirname(__FILE__)}/../../data/addingProdRelProjIterTestSuite/releaseCSVdata.csv"
CSV.foreach(csvPath, :headers => true, :header_converters => :symbol) do |row|
row.each do |column, value|
# to_s is preferred
if column.to_s == "prodid"
proHash = Hash.new { |h, k| h[k] = [ ] }
proHash['relid'] << row[:relid]
proHash['releasename'] << row[:releasename]
proHash['inheritcomponents'] << row[:inheritcomponents]
# to_i is preferred
productId = value.to_i
# this notation is preferred
result[productId] ||= []
# this is identical to what you did and more readable
result[productId] << proHash
end
end
end
$log.info("Method #{self.class.name}.#{__method__} has finished")
#productReleaseArr = result
end
You haven't given much to go on, but it appears that #releaseHashTable contains an Array, not a Hash.
Update: Based on the implementation you posted, you can see that productId is an integer and that the return value of readReleaseCSV() is an array.
In order to get the releasename you want, you have to do this:
#releaseHashTable[productId][n][:releasename]
where productId and n are integers. Either you'll have to specify them specifically, or (if you don't know n) you'll have to introduce a loop to collect all the releasenames for all the products of a particular productId.
This is what Mark Thomas meant:
> a = [1,2,3] # => [1, 2, 3]
> a[:sym]
TypeError: can't convert Symbol into Integer
# here starts the backstrace
from (irb):2:in `[]'
from (irb):2
An Array is only accessible by an index like so a[1] this fetches the second element from the array
Your return a an array and thats why your code fails:
#....
result = Array.new
#....
#productReleaseArr = result
# and then later on you call
#releaseHashTable = readReleaseCSV()
#releaseHashTable[:releasename] # which gives you TypeError: can't convert Symbol into Integer

Resources