Ruby: Yaml::dump outputs integers in quotes using Postgres - ruby

I am writing a Ruby gem that converts a sql executable to yaml, using Yaml::dump. However, when testing it in Postgresql I am finding that the integers are output with single quotes around them (as strings) unless they start with a zero. Below is the code snippet of the call to Yaml::dump and some resulting data.
db_object = {}
db_output = {}
full_table = ActiveRecord::Base.connection.execute("SELECT * FROM #{model};")
keys = full_table[0].keys
db_object["columns"] = keys
model_arr=[]
full_table.each do |row|
model_arr << row.values_at(*keys)
end
db_object["records"] = model_arr
db_output[model] = db_object
YAML::dump(db_output, file)
And here are the first couple rows of results:
schema_migrations:
columns:
- version
records:
- - '20121225230020'
- - '20121225230129'
---
students:
columns:
- id
- first_name
- last_name
- date_of_birth
- rank
- phone
records:
- - '1'
- Celestino
- Towne
- '2007-09-20'
- '2'
- '6417358360'
Any insight would be much appreciated.

Related

Keep indentation for array value in parsed YAML frontmatter

I am using https://github.com/waiting-for-dev/front_matter_parser to parse and update values in the frontmatter of my markdown files.
The following code removes the original indentation of two spaces for array values:
require 'front_matter_parser'
class FrontMatterUpdater
def self.run(path)
parsed = FrontMatterParser::Parser.parse_file(path)
front_matter = parsed.front_matter
front_matter['redirect_from'] = Array(front_matter['redirect_from'])
front_matter['redirect_from'] << 'new_entry2'
front_matter['redirect_from'].uniq!
new_content = [YAML.dump(front_matter), '---', "\n\n", parsed.content].join
File.write(path, new_content)
end
end
FrontMatterUpdater.run('test.md')
Content of my test.md file (same path):.
---
redirect_from:
- new_entry0
- new_entry1
---
Script result (indentation has been removed by the parser):
---
redirect_from:
- new_entry0
- new_entry1
- new_entry2
---
YAML indentation for array in hash confirms that both versions (indented and not) is valid YAML syntax.
But I'd love to keep the indentation for readability.
Do you see any option to keep the indentation of 2 spaces for the array values?

using construct_undefined in ruamel from_yaml

I'm creating a custom yaml tag MyTag. It can contain any given valid yaml - map, scalar, anchor, sequence etc.
How do I implement class MyTag to model this tag so that ruamel parses the contents of a !mytag in exactly the same way as it would parse any given yaml? The MyTag instance just stores whatever the parsed result of the yaml contents is.
The following code works, and the asserts should should demonstrate exactly what it should do and they all pass.
But I'm not sure if it's working for the right reasons. . . Specifically in the from_yaml class method, is using commented_obj = constructor.construct_undefined(node) a recommended way of achieving this, and is consuming 1 and only 1 from the yielded generator correct? It's not just working by accident?
Should I instead be using something like construct_object, or construct_map or. . .? The examples I've been able to find tend to know what type it is constructing, so would either use construct_map or construct_sequence to pick which type of object to construct. In this case I effectively want to piggy-back of the usual/standard ruamel parsing for whatever unknown type there might be in there, and just store it in its own type.
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedSeq, TaggedScalar
class MyTag():
yaml_tag = '!mytag'
def __init__(self, value):
self.value = value
#classmethod
def from_yaml(cls, constructor, node):
commented_obj = constructor.construct_undefined(node)
flag = False
for data in commented_obj:
if flag:
raise AssertionError('should only be 1 thing in generator??')
flag = True
return cls(data)
with open('mytag-sample.yaml') as yaml_file:
yaml_parser = ruamel.yaml.YAML()
yaml_parser.register_class(MyTag)
yaml = yaml_parser.load(yaml_file)
custom_tag_with_list = yaml['root'][0]['arb']['k2']
assert type(custom_tag_with_list) is MyTag
assert type(custom_tag_with_list.value) is CommentedSeq
print(custom_tag_with_list.value)
standard_list = yaml['root'][0]['arb']['k3']
assert type(standard_list) is CommentedSeq
assert standard_list == custom_tag_with_list.value
custom_tag_with_map = yaml['root'][1]['arb']
assert type(custom_tag_with_map) is MyTag
assert type(custom_tag_with_map.value) is CommentedMap
print(custom_tag_with_map.value)
standard_map = yaml['root'][1]['arb_no_tag']
assert type(standard_map) is CommentedMap
assert standard_map == custom_tag_with_map.value
custom_tag_scalar = yaml['root'][2]
assert type(custom_tag_scalar) is MyTag
assert type(custom_tag_scalar.value) is TaggedScalar
standard_tag_scalar = yaml['root'][3]
assert type(standard_tag_scalar) is str
assert standard_tag_scalar == str(custom_tag_scalar.value)
And some sample yaml:
root:
- item: blah
arb:
k1: v1
k2: !mytag
- one
- two
- three-k1: three-v1
three-k2: three-v2
three-k3: 123 # arb comment
three-k4:
- a
- b
- True
k3:
- one
- two
- three-k1: three-v1
three-k2: three-v2
three-k3: 123 # arb comment
three-k4:
- a
- b
- True
- item: argh
arb: !mytag
k1: v1
k2: 123
# blah line 1
# blah line 2
k3:
k31: v31
k32:
- False
- string here
- 321
arb_no_tag:
k1: v1
k2: 123
# blah line 1
# blah line 2
k3:
k31: v31
k32:
- False
- string here
- 321
- !mytag plain scalar
- plain scalar
- item: no comment
arb:
- one1
- two2
In YAML you can have anchors and aliases, and it is perfectly fine to have an object be a child of itself (using an alias). If you want to dump the Python data structure data:
data = [1, 2, 4, dict(a=42)]
data[3]['b'] = data
it dumps to:
&id001
- 1
- 2
- 4
- a: 42
b: *id001
and for that anchors and aliases are necessary.
When loading such a construct, ruamel.yaml recurses into the nested data structures, but if the toplevel node has not caused a real object to be constructed to which the anchor can be made a reference, the recursive leaf cannot resolve the alias.
To solve that, a generator is used, except for scalar values. It first creates an empty object, then recurses and updates it values. In code calling the constructor a check is made to see if a generator is returned, and in that case next() is done on the data, and potential self-recursion "resolved".
Because you call construct_undefined(), you always get a generator. Practically that method could return a value if it detects a scalar node (which of course cannot recurse), but it doesn't. If it would, your code could then not load the following YAML document:
!mytag 1
without modifications that test if you get a generator or not, as is done in the code in ruamel.yaml calling the various constructors so it can handle both construct_undefined and e.g. construct_yaml_int (which is not a generator).

How do you decode GraphQL::Schema::UniqueWithinType in Postgres?

Say someone evil stored an encoded id in your database and you needed to use it. Example:
Ruby:
GraphQL::Schema::UniqueWithinType.default_id_separator = '|'
relay_id = GraphQL::Schema::UniqueWithinType.encode('User', '123')
# "VXNlcnwxMjM="
How do I get 123 out of VXNlcnwxMjM= in Postgres?
Ruby:
GraphQL::Schema::UniqueWithinType.default_id_separator = '|'
relay_id = GraphQL::Schema::UniqueWithinType.encode('User', '123')
# "VXNlcnwxMjM="
Base64.decode64(relay_id)
# "User|123"
To get "123" out of "VXNlcnwxMjM=" in Postgres SQL you can do this horror show
select
substring(
(decode('VXNlcnwxMjM=', 'base64')::text)
from (char_length('User|') + 1)
for (char_length(decode('VXNlcnwxMjM=', 'base64')::text) - char_length('User|'))
)::int
Edit: Playing around with this ... on Postgres 9.6.5 the above works... but our staging server is 10.5 and I had to do this instead (which also works on 9.6.5)
select
substring(
convert_from(decode('VXNlcnwxMjM=', 'base64'), 'UTF-8')
from (char_length('User|') + 1)
for (char_length(convert_from(decode('VXNlcnwxMjM=', 'base64'), 'UTF-8')) - char_length('User|'))
)::int

Extract data from URL with Ruby

I'm new to ruby and I'm trying to return a list of ASINs and corresponding prices using Ruby. I was able to get pretty close to what I need but would need help to answer 2 questions:
How can I get rid of the [[' and >\n"]] around the ASIN (see result below)
Is there a simpler way to extract the ASIN from the URL than using this regex?
Thanks so much for your help!
Here is what I get in the Terminal from the current code:
[["B00EJDIG8M\n"]] - $7.00
[["B00KJ07SEM\n"]] - $26.99
[["B000FAR33M\n"]] - $119.00
[["B00LLMKPVK\n"]] - $22.99
[["B007NXPAQG\n"]] - $9.47
[["B004W5WAMU\n"]] - $22.43
[["B00LFUNGU0\n"]] - $17.99
[["B0052G14E8\n"]] - $54.99
[["B002MPLYEW\n"]] - $212.99
[["B00009W3G7\n"]] - $6.61
[["B000NCTOUM\n"]] - $3.04
[["B009SANIDO\n"]] - $12.29
[["B0052G51AQ\n"]] - $67.99
[["B003XEUEPQ\n"]] - $26.74
[["B00CYH9HRO\n"]] - $25.75
[["B00KV0SKQK\n"]] - $21.99
[["B009PCI2JU\n"]] - $56.66
[["B00LLM6ZFK\n"]] - $24.99
[["B004RQDY60\n"]] - $18.40
[["B000JLNBW4\n"]] - $49.14
Here is the code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
PAGE_URL = "http://www.amazon.com/Best-Sellers-Appliances/zgbs/appliances/ref=zg_bs_nav_0"
page = Nokogiri::HTML(open(PAGE_URL))
page.css(".zg_itemWrapper").each do |item|
price = item.at_css(".zg_price .price").text
asin = item.at_css(".zg_title a")[:href].scan(/http:\/\/(?:www\.|)amazon\.com\/(?:gp\/product|[^\/]+\/dp|dp)\/([^\/]+)/)
puts "#{asin} - #{price}"
end
Rather than cleaning up your Nokogiri search, the easiest thing to do at this point is just clean up your current asin values during interpolation. For example:
puts "#{asin.flatten.pop.chomp} - #{price}"
Regarding question 2., I realized I don't really need regex and found a way to get the same result with a much shorter line of code
replacing
asin = item.at_css(".zg_title a")[:href].scan(/http:\/\/(?:www\.|)amazon\.com\/(?:gp\/product|[^\/]+\/dp|dp)\/([^\/]+)/)
with
asin = item.at_css(".zg_title a")[:href].split("/")[5].chomp

YAML/Ruby: Get the first item whose <field> is <value>?

I have this YAML:
- company:
- id: toyota
- fullname: トヨタ自動車株式会社
- company:
- id: konami
- fullname: Konami Corporation
And I want to get the fullname of the company whose id is konami.
Using Ruby 1.9.2, what is the simplest/usual way to get it?
Note: In the rest of my code, I have been using require "yaml" so I would prefer to use the same library.
This works too and does not use iteration:
y = YAML.load_file('japanese_companies.yml')
result = y.select{ |x| x['company'].first['id'] == 'konami' }
result.first['company'].last['fullname'] # => "Konami Corporation"
Or if you have other attributes and you can't be sure fullname is the last one:
result.first['company'].select{ |x| x['fullname'] }.first['fullname']
I agree with Ray Toal, if you change your yml it becomes much easier. E.g.:
toyota:
fullname: トヨタ自動車株式会社
konami:
fullname: Konami Corporation
With the above yaml, fetching the fullname of konami becomes much easier:
y = YAML.load_file('test.yml')
y.fetch('konami')['fullname']
Your YAML is a little unconventional but we can compensate.
A brute force approach is (I'm not sure if this can be done without parsing the YAML):
require 'yaml'
YAML.parse_file(ARGV[0]).transform.each do |company|
properties = {}
company['company'].each {|h| properties = properties.merge(h)}
puts properties['fullname'] if properties['id'] == 'konami'
end
Pass your YAML file in as the first argument to this script.
Feel free to adapt into a method that takes the YAML as a string and returns the desired fullname. (A return is useful because it directly answers the OP's question of obtaining the first such company.)

Resources