YAML/Ruby: Get the first item whose <field> is <value>? - ruby

I have this YAML:
- company:
    - id: toyota
    - fullname: トヨタ自動車株式会社
- company:
    - id: konami
    - fullname: Konami Corporation
And I want to get the fullname of the company whose id is konami.
Using Ruby 1.9.2, what is the simplest/usual way to get it?
Note: In the rest of my code, I have been using require "yaml" so I would prefer to use the same library.

This works too and does not need an explicit loop:
y = YAML.load_file('japanese_companies.yml')
result = y.select{ |x| x['company'].first['id'] == 'konami' }
result.first['company'].last['fullname'] # => "Konami Corporation"
Or if you have other attributes and you can't be sure fullname is the last one:
result.first['company'].select{ |x| x['fullname'] }.first['fullname']
I agree with Ray Toal, if you change your yml it becomes much easier. E.g.:
fullname: トヨタ自動車株式会社
fullname: Konami Corporation
With the above yaml, fetching the fullname of konami becomes much easier:
y = YAML.load_file('test.yml')
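As a minimal sketch of that idea, assume the file is restructured so that each company is keyed by its id. This layout is an assumption, since the restructured YAML is not shown in full above. The lookup then becomes a plain hash access:

```ruby
require 'yaml'

# Hypothetical restructured document: each company is keyed by its id.
companies = YAML.load(<<~DOC)
  toyota:
    fullname: トヨタ自動車株式会社
  konami:
    fullname: Konami Corporation
DOC

puts companies['konami']['fullname']  # prints "Konami Corporation"
```

The heredoc stands in for the file only to keep the sketch self-contained; YAML.load_file would return the same hash.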

Your YAML is a little unconventional but we can compensate.
A brute force approach is (I'm not sure if this can be done without parsing the YAML):
require 'yaml'
YAML.parse_file(ARGV[0]).transform.each do |company|
  properties = {}
  # Merge the list of single-key hashes into one hash per company.
  company['company'].each { |h| properties = properties.merge(h) }
  puts properties['fullname'] if properties['id'] == 'konami'
end
Pass your YAML file in as the first argument to this script.
Feel free to adapt into a method that takes the YAML as a string and returns the desired fullname. (A return is useful because it directly answers the OP's question of obtaining the first such company.)
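One way that adaptation might look, as a sketch only: the method name fullname_for is made up for illustration, and it uses YAML.load on a string rather than parse_file.

```ruby
require 'yaml'

# Hypothetical helper: returns the fullname of the first company whose id
# matches, or nil when no company has that id.
def fullname_for(yaml_text, id)
  YAML.load(yaml_text).each do |entry|
    # Merge the list of single-key hashes into one hash per company.
    properties = entry['company'].reduce({}) { |acc, h| acc.merge(h) }
    return properties['fullname'] if properties['id'] == id
  end
  nil
end

doc = <<~DOC
  - company:
      - id: toyota
      - fullname: トヨタ自動車株式会社
  - company:
      - id: konami
      - fullname: Konami Corporation
DOC

puts fullname_for(doc, 'konami')  # prints "Konami Corporation"
```

The early return stops at the first match, which is what the question asks for.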


using construct_undefined in ruamel from_yaml

I'm creating a custom yaml tag MyTag. It can contain any given valid yaml - map, scalar, anchor, sequence etc.
How do I implement class MyTag to model this tag so that ruamel parses the contents of a !mytag in exactly the same way as it would parse any given yaml? The MyTag instance just stores whatever the parsed result of the yaml contents is.
The following code works, and the asserts should demonstrate exactly what it should do; they all pass.
But I'm not sure if it's working for the right reasons. Specifically, in the from_yaml class method, is using commented_obj = constructor.construct_undefined(node) a recommended way of achieving this, and is consuming one and only one item from the yielded generator correct? It's not just working by accident?
Should I instead be using something like construct_object or construct_map? The examples I've been able to find tend to know what type they are constructing, so they use either construct_map or construct_sequence to pick which type of object to construct. In this case I effectively want to piggy-back off the standard ruamel parsing for whatever unknown type there might be in there, and just store it in its own type.
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedSeq, TaggedScalar


class MyTag():
    yaml_tag = '!mytag'

    def __init__(self, value):
        self.value = value

    @classmethod
    def from_yaml(cls, constructor, node):
        commented_obj = constructor.construct_undefined(node)
        flag = False
        for data in commented_obj:
            if flag:
                raise AssertionError('should only be 1 thing in generator??')
            flag = True
        return cls(data)


with open('mytag-sample.yaml') as yaml_file:
    yaml_parser = ruamel.yaml.YAML()
    # Register the class so that !mytag nodes are routed to MyTag.from_yaml.
    yaml_parser.register_class(MyTag)
    yaml = yaml_parser.load(yaml_file)

custom_tag_with_list = yaml['root'][0]['arb']['k2']
assert type(custom_tag_with_list) is MyTag
assert type(custom_tag_with_list.value) is CommentedSeq

standard_list = yaml['root'][0]['arb']['k3']
assert type(standard_list) is CommentedSeq
assert standard_list == custom_tag_with_list.value

custom_tag_with_map = yaml['root'][1]['arb']
assert type(custom_tag_with_map) is MyTag
assert type(custom_tag_with_map.value) is CommentedMap

standard_map = yaml['root'][1]['arb_no_tag']
assert type(standard_map) is CommentedMap
assert standard_map == custom_tag_with_map.value

custom_tag_scalar = yaml['root'][2]
assert type(custom_tag_scalar) is MyTag
assert type(custom_tag_scalar.value) is TaggedScalar

standard_tag_scalar = yaml['root'][3]
assert type(standard_tag_scalar) is str
assert standard_tag_scalar == str(custom_tag_scalar.value)
And some sample yaml:
- item: blah
k1: v1
k2: !mytag
- one
- two
- three-k1: three-v1
three-k2: three-v2
three-k3: 123 # arb comment
- a
- b
- True
- one
- two
- three-k1: three-v1
three-k2: three-v2
three-k3: 123 # arb comment
- a
- b
- True
- item: argh
arb: !mytag
k1: v1
k2: 123
# blah line 1
# blah line 2
k31: v31
- False
- string here
- 321
k1: v1
k2: 123
# blah line 1
# blah line 2
k31: v31
- False
- string here
- 321
- !mytag plain scalar
- plain scalar
- item: no comment
- one1
- two2
In YAML you can have anchors and aliases, and it is perfectly fine to have an object be a child of itself (using an alias). If you want to dump the Python data structure data:
data = [1, 2, 4, dict(a=42)]
data[3]['b'] = data
it dumps to:
&id001
- 1
- 2
- 4
- a: 42
  b: *id001
and for that anchors and aliases are necessary.
When loading such a construct, ruamel.yaml recurses into the nested data structures, but if the toplevel node has not caused a real object to be constructed to which the anchor can be made a reference, the recursive leaf cannot resolve the alias.
To solve that, a generator is used, except for scalar values. It first creates an empty object, then recurses and updates its values. In code calling the constructor a check is made to see if a generator is returned, and in that case next() is done on the data, and potential self-recursion "resolved".
Because you call construct_undefined(), you always get a generator. Practically that method could return a value if it detects a scalar node (which of course cannot recurse), but it doesn't. If it did, your code could then not load the following YAML document:
!mytag 1
without modifications that test if you get a generator or not, as is done in the code in ruamel.yaml calling the various constructors so it can handle both construct_undefined and e.g. construct_yaml_int (which is not a generator).

Using a CSV file to insert values using Ruby

I have some sample code I can execute for our Nexpose server and I need to do some mass asset tagging. Here is an example of the code.
nsc = Nexpose::Connection.new('your_nexpose_instance', 'username', 'password', 3780)
criterion = Nexpose::Tag::Criterion.new('IP_RANGE', 'IN', ['ip1', 'ip2'])
criteria = Nexpose::Tag::Criteria.new(criterion)
tag = Nexpose::Tag.new("tagname", Nexpose::Tag::Type::Generic::CUSTOM)
tag.search_criteria = criteria
I have a file called with the following data.
How would I go about running a for loop and using the CSV to quickly process the above code? I have no experience with Ruby and tried to follow some examples, but I'm confused at this point.
There's a CSV library in Ruby's standard lib collection that you can use.
Basic example based on your code example and data, not tested:
require 'csv'
nsc = Nexpose::Connection.new('your_nexpose_instance', 'username', 'password', 3780)
CSV.foreach("path/to/file.csv", headers: true) do |row|
  criterion = Nexpose::Tag::Criterion.new('IP_RANGE', 'IN', [row['ip1'], row['ip2']])
  criteria = Nexpose::Tag::Criteria.new(criterion)
  tag = Nexpose::Tag.new(row['tagname'], Nexpose::Tag::Type::Generic::CUSTOM)
  tag.search_criteria = criteria
end
I made a directory with input.csv and main.rb
require "csv"
CSV.foreach("input.csv", headers: true) do |row|
  puts "ip1: #{row['ip1']}"
  puts "ip2: #{row['ip2']}"
  puts "tagname: #{row['tagname']}"
end
the output is
tagname: Workstations
I hope this can help. If you have questions I'm here :)
If you just need to loop through each line of the file and fire that chunk of code for each line, you could do something like this:
require 'net/http'
file = Net::HTTP.get(URI(<whatever_your_file_name_is>))
# Connect once, outside the loop.
nsc = Nexpose::Connection.new('your_nexpose_instance', 'username', 'password', 3780)
file.each_line.with_index do |line, index|
  next if index == 0
  # chomp drops the trailing newline so it does not end up in the tag name.
  split_line = line.chomp.split(',')
  ip1 = split_line[0]
  ip2 = split_line[1]
  tagname = split_line[2]
  criterion = Nexpose::Tag::Criterion.new('IP_RANGE', 'IN', [ip1, ip2])
  criteria = Nexpose::Tag::Criteria.new(criterion)
  tag = Nexpose::Tag.new(tagname, Nexpose::Tag::Type::Generic::CUSTOM)
  tag.search_criteria = criteria
end
NOTE: This code example is assuming that the CSV file is stored remotely, not locally.
ALSO: In case you're wondering, the next if index == 0 is there to skip your header record.
To use this approach for a local file, you can use File.open() instead of Net::HTTP.get(), like so:
file = File.open(<whatever_your_file_name_is>).read
Two things to note:
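Make sure the path resolves from wherever the script runs, or use an absolute path. Note that Ruby does not expand ~ on its own, so write File.expand_path('~/folder/folder/filename.csv') rather than passing ~/folder/folder/filename.csv directly.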
If the files you're going to be loading are enormous, this might not be an ideal approach because it's actually reading the whole file into memory. But considering your file only has 3 columns, you'd have to have an extreme number of rows in the file for this to be an issue.
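If memory does become a concern, reading row by row avoids loading the whole file. The sketch below uses CSV.foreach from the standard library. The sample rows and the column names ip1, ip2 and tagname are assumptions made for illustration, and the Nexpose calls are left out so the snippet runs on its own:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data; the column names ip1, ip2 and tagname are assumed.
sample = Tempfile.new(['assets', '.csv'])
sample.write("ip1,ip2,tagname\n10.0.0.1,10.0.0.50,Workstations\n10.0.1.1,10.0.1.20,Servers\n")
sample.close

# CSV.foreach yields one parsed row at a time instead of reading the whole file.
rows = []
CSV.foreach(sample.path, headers: true) do |row|
  rows << { range: [row['ip1'], row['ip2']], tag: row['tagname'] }
end

rows.each { |r| puts "#{r[:tag]}: #{r[:range].join(' - ')}" }
```

In the real script, the body of the block is where the criterion, criteria and tag objects would be built for each row.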

YAML deserializer with position information?

Does anyone know of a YAML deserializer that can provide position information for the constructed objects?
I know how to deserialize a YAML file into a Java object. Simple instructions on http://yamlbeans.sourceforge.net/.
However, I want to do some algorithmic validation on the deserialized object and report errors back to the user, pointing to the position in the YAML that caused the error.
=========YAML file==========
name: Nathan Sweet
age: 28
address: 4011 16th Ave S
=======JAVA class======
public class Contact {
    public String name;
    public int age;
    public String address;
}
Imagine I want to first load the YAML into the Contact class, then validate the address against some repository and report an error if it's invalid. Something like:
'Line 3 Column 9: The address does not match valid entry in the database'
The problem is, currently there is no way to get the position inside a deserialized object from YAML.
Anyone know a solution to this issue?
Most YAML parsers, if they keep any position information at all, drop it while constructing the language-native objects.
In ruamel.yaml ¹, I keep more information around because I want to be able to round-trip with minimal loss of original layout (e.g. keeping comments and key order in mappings).
I don't keep information on individual key-value pairs, but I do on the "upper-left" position of a mapping². Because of the kept order of the mapping items you can give some rather nice feedback. Given an input file:
- name: anthon
  age: 53
  adres: Rijn en Schiekade 105
- name: Nathan Sweet
  age: 28
  address: 4011 16th Ave S
And a program that you call with the input file as argument:
#! /usr/bin/env python2.7
# coding: utf-8
# http://stackoverflow.com/questions/30677517/yaml-deserializer-with-position-information?noredirect=1#comment49491314_30677517
import sys
import ruamel.yaml

up_arrow = '↑'


def key_error(key, value, line, col, error, e='E'):
    print('E[{}:{}]: {}'.format(line, col, error))
    print('{}{}: {}'.format(' '*col, key, value))
    print('{}{}'.format(' '*(col), up_arrow))


def value_error(key, value, line, col, error, e='E'):
    val_col = col + len(key) + 2
    print('{}[{}:{}]: {}'.format(e, line, val_col, error))
    print('{}{}: {}'.format(' '*col, key, value))
    print('{}{}'.format(' '*(val_col), up_arrow))


def value_warning(key, value, line, col, error):
    value_error(key, value, line, col, error, e='W')


class Contact(object):
    def __init__(self, vals):
        for offset, k in enumerate(vals):
            self.check(k, vals[k], vals.lc.line+offset, vals.lc.col)
        for k in ['name', 'address', 'age']:
            if k not in vals:
                print('K[{}:{}]: {}'.format(
                    vals.lc.line+offset, vals.lc.col, "missing key: "+k
                ))

    def check(self, key, value, line, col):
        if key == 'name':
            if value[0].lower() == value[0]:
                value_error(key, value, line, col,
                            'value should start with uppercase')
        elif key == 'age':
            if value < 50:
                value_warning(key, value, line, col,
                              'probably too young for knowing ALGOL 60')
        elif key == 'address':
            pass
        else:
            # Any other key is unexpected, matching the output shown below.
            key_error(key, value, line, col,
                      "unexpected key")


data = ruamel.yaml.load(open(sys.argv[1]), Loader=ruamel.yaml.RoundTripLoader)
for x in data:
    contact = Contact(x)
giving you E(rrors), W(arnings) and K(eys missing):
E[0:8]: value should start with uppercase
  name: anthon
        ↑
E[2:2]: unexpected key
  adres: Rijn en Schiekade 105
  ↑
K[2:2]: missing key: address
W[4:7]: probably too young for knowing ALGOL 60
  age: 28
       ↑
You should be able to parse this output in a calling program in any language to give feedback. The check method of course needs adjusting to your requirements. This is not as nice as being able to do that in the language the rest of your application is in, but it might be better than nothing.
In my experience handling the above format is certainly simpler than extending an existing (open source) YAML parser.
¹ Disclaimer: I am the author of that package
² I want to use that kind of information at some point to preserve spurious newlines, inserted for readability
In python, you can readily write custom Dumper/Loader objects and use them to load (or dump) your yaml code. You can have these objects track the file/line info:
import yaml
from collections import OrderedDict


class YamlOrderedDict(OrderedDict):
    """
    An OrderedDict that was loaded from a yaml file, and is annotated
    with file/line info for reporting about errors in the source file
    """
    def _annotate(self, node):
        self._key_locs = {}
        self._value_locs = {}
        nodeiter = iter(node.value)
        for key in self:
            subnode = next(nodeiter)
            # The continuation lines were lost in the source; appending the
            # 1-based line number of the mark is an assumed reconstruction.
            self._key_locs[key] = subnode[0].start_mark.name + ':' + \
                str(subnode[0].start_mark.line + 1)
            self._value_locs[key] = subnode[1].start_mark.name + ':' + \
                str(subnode[1].start_mark.line + 1)

    def key_loc(self, key):
        try:
            return self._key_locs[key]
        except (AttributeError, KeyError):
            return ''

    def value_loc(self, key):
        try:
            return self._value_locs[key]
        except (AttributeError, KeyError):
            return ''


# Use YamlOrderedDict objects for yaml maps instead of normal dict
# (the lambda bodies were lost in the source; represent_dict is an assumed stand-in)
yaml.add_representer(OrderedDict, lambda dumper, data:
                     dumper.represent_dict(data.items()))
yaml.add_representer(YamlOrderedDict, lambda dumper, data:
                     dumper.represent_dict(data.items()))


def _load_YamlOrderedDict(loader, node):
    rv = YamlOrderedDict(loader.construct_pairs(node))
    rv._annotate(node)  # assumed: record key/value positions before returning
    return rv


yaml.add_constructor(yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, _load_YamlOrderedDict)
Now when you read a yaml file, any mapping objects will be read as a YamlOrderedDict, which allows looking up the file location of keys in the mapping object. You can also add an iterator method like:
    def iter_with_lines(self):
        for key, val in self.items():
            yield (key, val, self.key_loc(key))
...and now you can write a loop like:
for key, value, location in obj.iter_with_lines():
    # iterate through the key/value pairs in a YamlOrderedDict, with
    # the source file location
    print(location, key, value)

Extract data from URL with Ruby

I'm new to ruby and I'm trying to return a list of ASINs and corresponding prices using Ruby. I was able to get pretty close to what I need but would need help to answer 2 questions:
How can I get rid of the [[" and \n"]] around the ASIN (see result below)?
Is there a simpler way to extract the ASIN from the URL than using this regex?
Thanks so much for your help!
Here is what I get in the Terminal from the current code:
[["B00EJDIG8M\n"]] - $7.00
[["B00KJ07SEM\n"]] - $26.99
[["B000FAR33M\n"]] - $119.00
[["B00LLMKPVK\n"]] - $22.99
[["B007NXPAQG\n"]] - $9.47
[["B004W5WAMU\n"]] - $22.43
[["B00LFUNGU0\n"]] - $17.99
[["B0052G14E8\n"]] - $54.99
[["B002MPLYEW\n"]] - $212.99
[["B00009W3G7\n"]] - $6.61
[["B000NCTOUM\n"]] - $3.04
[["B009SANIDO\n"]] - $12.29
[["B0052G51AQ\n"]] - $67.99
[["B003XEUEPQ\n"]] - $26.74
[["B00CYH9HRO\n"]] - $25.75
[["B00KV0SKQK\n"]] - $21.99
[["B009PCI2JU\n"]] - $56.66
[["B00LLM6ZFK\n"]] - $24.99
[["B004RQDY60\n"]] - $18.40
[["B000JLNBW4\n"]] - $49.14
Here is the code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
PAGE_URL = "http://www.amazon.com/Best-Sellers-Appliances/zgbs/appliances/ref=zg_bs_nav_0"
page = Nokogiri::HTML(open(PAGE_URL))
page.css(".zg_itemWrapper").each do |item|
  price = item.at_css(".zg_price .price").text
  asin = item.at_css(".zg_title a")[:href].scan(/http:\/\/(?:www\.|)amazon\.com\/(?:gp\/product|[^\/]+\/dp|dp)\/([^\/]+)/)
  puts "#{asin} - #{price}"
end
Rather than cleaning up your Nokogiri search, the easiest thing to do at this point is just clean up your current asin values during interpolation. For example:
puts "#{asin.flatten.pop.chomp} - #{price}"
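The nested brackets come from String#scan, which returns one array of captures per match. As a sketch of an alternative, String#[] with a regex and a capture index returns the captured text directly. The href below is a made-up value shaped like the links in the question, and the shortened pattern is an assumption about what those links look like:

```ruby
# Hypothetical href, shaped like the links in the question (trailing newline included).
href = "http://www.amazon.com/Some-Product-Name/dp/B00EJDIG8M\n"

# Shortened pattern (an assumption): capture the path segment after /dp/ or /gp/product/.
pattern = %r{/(?:gp/product|dp)/([^/\s]+)}

# scan returns one array of captures per match, hence the nested brackets.
p href.scan(pattern)

# String#[] with a capture index returns the captured text itself, or nil if nothing matches.
asin = href[pattern, 1]
puts asin  # prints "B00EJDIG8M"
```

Excluding whitespace in the capture group also drops the trailing newline, so no chomp is needed.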
Regarding question 2, I realized I don't really need a regex and found a way to get the same result with a much shorter line of code. Instead of:
asin = item.at_css(".zg_title a")[:href].scan(/http:\/\/(?:www\.|)amazon\.com\/(?:gp\/product|[^\/]+\/dp|dp)\/([^\/]+)/)
I now use:
asin = item.at_css(".zg_title a")[:href].split("/")[5].chomp

Working with nested hashes in Rails 3

I'm working with the Koala gem and the Facebook Graph API, and I want to break down the results I get for a user's feed into separate variables for inserting into a MySQL database, probably using Active Record. Here is the code I have so far:
@token = Service.where(:provider => 'facebook', :user_id => session[:user_id]).first.token
@graph = Koala::Facebook::GraphAPI.new(@token)
@feeds = params[:page] ? @graph.get_page(params[:page]) : @graph.get_connections("me", "home")
And here is what @feeds looks like:
[{"id"=>"1519989351_1799856285747", "from"=>{"name"=>"April Daggett Swayne", "id"=>"1519989351"},
"link"=>"http://www.facebook.com/photo.php?fbid=1799856805760&set=a.1493877356465.2064294.1519989351&type=1", "name"=>"Mobile Uploads",
"icon"=>"http://static.ak.fbcdn.net/rsrc.php/v1/yx/r/og8V99JVf8G.gif", "type"=>"photo", "object_id"=>"1799856805760", "application"=>{"name"=>"Facebook for Android",
"id"=>"350685531728"}, "created_time"=>"2011-07-03T03:14:04+0000", "updated_time"=>"2011-07-03T03:14:04+0000"}, {"id"=>"2733058_10100271380562998", "from"=>{"name"=>"Joshua Ramirez",
"id"=>"2733058"}, "message"=>"Just posted a photo",
"link"=>"http://instagr.am/p/G1tp8/", "name"=>"jtrainexpress's photo", "caption"=>"instagr.am",
"icon"=>"http://photos-e.ak.fbcdn.net/photos-ak-snc1/v27562/10/124024574287414/app_2_124024574287414_6936.gif", "actions"=>[{"name"=>"Comment",
"link"=>"http://www.facebook.com/2733058/posts/10100271380562998"}, {"name"=>"Like", "link"=>"http://www.facebook.com/2733058/posts/10100271380562998"}], "type"=>"link",
"application"=>{"name"=>"Instagram", "id"=>"124024574287414"}, "created_time"=>"2011-07-03T02:07:37+0000", "updated_time"=>"2011-07-03T02:07:37+0000"},
{"id"=>"588368718_10150230423643719", "from"=>{"name"=>"Eric Bailey", "id"=>"588368718"}, "link"=>"http://www.facebook.com/pages/Martis-Camp/105474549513998", "name"=>"Martis Camp",
"caption"=>"Eric checked in at Martis Camp.", "description"=>"Rockin the pool", "icon"=>"http://www.facebook.com/images/icons/place.png", "actions"=>[{"name"=>"Comment",
"link"=>"http://www.facebook.com/588368718/posts/10150230423643719"}, {"name"=>"Like", "link"=>"http://www.facebook.com/588368718/posts/10150230423643719"}],
"place"=>{"id"=>"105474549513998", "name"=>"Martis Camp", "location"=>{"city"=>"Truckee", "state"=>"CA", "country"=>"United States", "latitude"=>39.282813917575,
"longitude"=>-120.16736760768}}, "type"=>"checkin", "application"=>{"name"=>"Facebook for iPhone", "id"=>"6628568379"}, "created_time"=>"2011-07-03T01:58:32+0000",
"updated_time"=>"2011-07-03T01:58:32+0000", "likes"=>{"data"=>[{"name"=>"Mike Janes", "id"=>"725535294"}], "count"=>1}}]
I have looked around for clues on this, and haven't found it yet (but I'm still working on my stackoverflow-foo). Any help would be greatly appreciated.
That isn't a Ruby Hash, that's a fragment of a JSON string. First you need to decode into a Ruby data structure:
# If your JSON string is in json...
h = ActiveSupport::JSON.decode(json) # Or your favorite JSON decoder.
Now you'll have a Hash in h so you can access it like any other Hash:
array = h['data']
puts array[0]['id']
# prints out 1111111111_0000000000000
puts array[0]['from']['name']
# prints Jane Done
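Once the data is an array of hashes, each post can be flattened into a row of plain values. The sketch below uses two trimmed entries from the feed in the question. The row keys (post_id, author, type, city) are hypothetical column names, and Hash#dig requires Ruby 2.3 or later:

```ruby
# Two trimmed entries taken from the feed shown in the question.
feeds = [
  { "id" => "1519989351_1799856285747",
    "from" => { "name" => "April Daggett Swayne", "id" => "1519989351" },
    "type" => "photo" },
  { "id" => "588368718_10150230423643719",
    "from" => { "name" => "Eric Bailey", "id" => "588368718" },
    "type" => "checkin",
    "place" => { "name" => "Martis Camp", "location" => { "city" => "Truckee" } } }
]

# dig walks nested keys and returns nil when any level is missing,
# so posts without a "place" do not raise an error.
rows = feeds.map do |post|
  {
    post_id: post["id"],
    author:  post.dig("from", "name"),
    type:    post["type"],
    city:    post.dig("place", "location", "city")
  }
end

rows.each { |row| p row }
```

Each resulting hash could then be handed to an Active Record model whose columns match these keys.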
