Convert dot notation keys to tree-structured YAML in Ruby

I've sent my I18n files to be translated by a third party. Since my translator is not computer savvy, we made a spreadsheet with the keys: they were sent in dot notation, and the values were translated.
For example:
es.models.parent: "Pariente"
es.models.teacher: "Profesor"
es.models.school: "Colegio"
How can I move that into a YAML file?
UPDATE: Just like @tadman said, this already is YAML. So if you are fine with that format, you are done.
The rest of this question focuses on getting the tree structure in YAML.

The first thing to do is transform this into a Hash.
So the previous data becomes:
tr = {}
tr["es.models.parent"] = "Pariente"
tr["es.models.teacher"] = "Profesor"
tr["es.models.school"] = "Colegio"
Then we walk each dotted key, segment by segment, building a deeper hash as we go.
result = {} # The resulting hash
tr.each do |k, value|
  h = result
  keys = k.split(".") # This key is a concatenation of keys
  keys.each_with_index do |key, index|
    h[key] = {} unless h.has_key? key
    if index == keys.length - 1 # If it's the last element
      h[key] = value # then we only need to set the value
    else
      h = h[key] # otherwise descend one level
    end
  end
end
require 'yaml'
puts result.to_yaml # Here it is for your YAMLing pleasure
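The loop above can also be wrapped in a small reusable method. The following is only a sketch: the name nest_dotted_keys is hypothetical, and it assumes every key is a string whose segments are separated by single dots and that no key is a prefix of another.

```ruby
require 'yaml'

# Hypothetical helper (not part of the original answer): turns a flat hash
# with dot-separated keys into a nested hash.
def nest_dotted_keys(flat)
  flat.each_with_object({}) do |(dotted_key, value), tree|
    *parents, leaf = dotted_key.split('.')
    # Walk down the tree, creating intermediate hashes as needed.
    node = parents.reduce(tree) { |h, key| h[key] ||= {} }
    node[leaf] = value
  end
end

flat = {
  'es.models.parent'  => 'Pariente',
  'es.models.teacher' => 'Profesor',
  'es.models.school'  => 'Colegio'
}

nested = nest_dotted_keys(flat)
puts nested.to_yaml
```

Splitting each key into its parent segments and its final segment removes the need for the index check inside the loop.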

Related

Ruby hashing problems

I have a large .txt file, which contains some data, mapping crashing inputs to programs to their crash sites. The data is formatted as
, and each line is another crash.
I tried to run the ruby script below to automatically sort them, but it gave no output. Any and all suggestions would be appreciated.
#!/usr/bin/ruby
fn = ARGV[0]
$result = Hash.new([])
File.open(fn, "r") do |f|
  f.readlines do |l|
    ar = l.split
    puts(ar)
    $result[ar[1]].push[ar[0]]
  end
end
$result.each do |k, v|
  puts(k)
  puts(v)
end
I think the problem is that $result = Hash.new([]) doesn't do what you want or think it does.
It returns the same array object for every key that is missing from the hash, and it never stores that array under the key, so later lookups of the same key still hit the shared default and the hash itself stays empty.
Instead you can use the block version of Hash.new:
result = Hash.new {|hash, key| hash[key] = [] }
The version of Hash.new with a default value is more useful for avoiding nil checks when you're using a hash to maintain counts, for example:
counts = Hash.new(0)
counts['foo'] += 1
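To make the difference concrete, here is a minimal sketch. It first demonstrates the shared default array, then groups lines with the block form. The helper name group_by_second_field and the sample lines are made up for illustration; like the question's script, it assumes each line holds two whitespace-separated fields and uses the second as the key.

```ruby
# Pitfall: every missing key returns the same array, and nothing is stored.
shared = Hash.new([])
shared['a'] << 1
shared['b'] << 2
shared_keys = shared.keys      # no key was ever assigned
shared_default = shared['zzz'] # the single shared default array

# Hypothetical helper: groups the first field of each line under the second.
def group_by_second_field(lines)
  result = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    first, second = line.split
    next if second.nil? # skip blank or malformed lines
    result[second] << first
  end
  result
end

# Made-up sample data standing in for the real file.
sample = ["input_a site_1\n", "input_b site_2\n", "input_c site_1\n"]
grouped = group_by_second_field(sample)
grouped.each do |site, inputs|
  puts site
  puts inputs
end
```

For a real file, File.foreach(ARGV[0]) could be passed in place of sample. Two further details in the question's script also matter: readlines ignores the block it is given, which is why nothing was printed, and push[ar[0]] would need to be push(ar[0]).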

Ruby's optimized implementation of Histogram/Aggregator

I'm about to write my own, but I was wondering whether there are any gems or libraries that I can use as an aggregator/histogram.
My goal is to sum up values based on a matching key:
["fish","2"]
["fish","40"]
["meat","56"]
["meat","1"]
This should sum up the values per unique key and return ["fish","42"] and ["meat","57"].
The files I have to aggregate are relatively large: about 4 GB text files made of TSV key/value pairs.
I would like to avoid temporary files so as not to take too much space on the machine, so I was wondering whether something similar and already optimized exists. I found a gem on GitHub named 'histogram', but it does not really contain the functionality I need.
Thanks
You can use a Hash with a default value of 0 to do the counting, then in the end you could convert it to Array to yield the format you want, though I think you might just want to keep using the Hash instead.
data = [
  ["fish","2"],
  ["fish","40"],
  ["meat","56"],
  ["meat","1"]
]
hist = data.each_with_object(Hash.new(0)) do |(k,v), h|
  h[k] += v.to_i
end
hist # => {"fish"=>42, "meat"=>57}
hist.to_a # => [["fish", 42], ["meat", 57]]
# To get String values, "42" instead of 42, etc:
hist.map { |k,v| [k, v.to_s] } # => [["fish", "42"], ["meat", "57"]]
Since you stated you had to read the data from a file, here is the above when applied to a file. The input.txt file contents are as follows for this example:
fish,2
fish,40
meat,56
meat,1
Then, to create the same output as before by reading it line by line:
# File.foreach yields one line at a time and closes the file when done
hist = File.foreach('input.txt').each_with_object(Hash.new(0)) do |line, h|
  key, value = line.chomp.split(',')
  h[key] += value.to_i
end
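Since the question describes tab-separated input, the same idea can be sketched for TSV data. The helper name sum_by_key is hypothetical, and a StringIO stands in for the real file so the example is self-contained. Lines are read one at a time, so memory use grows with the number of distinct keys rather than with the file size.

```ruby
require 'stringio'

# Hypothetical helper: sums integer values per key from any IO-like object
# that yields "key<TAB>value" lines.
def sum_by_key(io)
  io.each_line.each_with_object(Hash.new(0)) do |line, totals|
    key, value = line.chomp.split("\t", 2)
    next if key.nil? || value.nil? # skip blank or malformed lines
    totals[key] += value.to_i
  end
end

# Made-up sample data in place of the 4 GB file.
sample = StringIO.new("fish\t2\nfish\t40\nmeat\t56\nmeat\t1\n")
totals = sum_by_key(sample)
pairs = totals.map { |key, sum| [key, sum.to_s] }
```

With a real file, something like File.open(path) { |f| sum_by_key(f) } would apply, where path is whatever file you need to aggregate.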

How do I subgroup this hash that has already been grouped?

I have a set of word strings which I am turning into a hash, grouped by the size of the string. I am doing this by:
hash = set.group_by(&:size)
resulting in
hash = {5=>[apple, andys, throw, balls], 7=>[bananas, oranges]}
I want to further group the hash values by first letter, so that the end result looks like:
hash = {5=>{a=>[apple, andys],b=>[balls],t=>[throw]}, 7=>{b=>[bananas], o=>[oranges]}}
I tried putting
hash.each_value do |value|
  value = value.group_by(&:chr)
end
after the first group_by, but that only seems to return the original hash. I am admittedly a Ruby beginner, so I'm not sure whether I could do this in one fell swoop, or exactly how the (&:size) notation works if I were asked to write it out. Thoughts?
To update your hash, you need to assign the regrouped value back to its key, like this:
hash.each do |key, value|
  hash[key] = value.group_by(&:chr)
end
I'd keep the whole computation functional:
>> Hash[set.group_by(&:size).map { |k, vs| [k, vs.group_by(&:chr)] }]
=> {5=>{"a"=>["apple", "andys"], "t"=>["throw"], "b"=>["balls"]},
7=>{"b"=>["bananas"], "o"=>["oranges"]}}
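The original attempt left the hash unchanged because assigning to the block variable only rebinds a local name. It never writes back into the hash. As a sketch of an alternative, assuming Ruby 2.4 or later, Hash#transform_values builds the nested grouping without mutation. The blocks are written out in full to show what &:size and &:chr stand for, and the variable names are illustrative.

```ruby
require 'set'

words = Set.new(%w[apple andys throw balls bananas oranges])

# &:size is shorthand for a block that calls size on its argument.
by_size = words.group_by { |word| word.size }

# transform_values returns a new hash with the same keys and mapped values.
nested = by_size.transform_values do |group|
  group.group_by { |word| word.chr } # chr returns the first character
end

p nested
```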

Reading strings from one file and adding to another file with suffix to make unique

I am processing documents in Ruby.
I have a document from which I extract specific strings using a regexp and then add them to another file. When added to the destination file they must be made unique, so if a string already exists in the destination file I add a simple suffix, e.g. <word>_1. Eventually I want to reference the strings by name, so random number generation or a string built from the date is no good.
At present I store each added word in an array, and every time I add a word I check that the string doesn't already exist in the array. That is fine if there is only one duplicate, but there might be two or more, so I need to check for the initial string and then loop, incrementing the suffix until the result doesn't exist. (I have simplified my code, so there may be bugs.)
def add_word(word)
  if @added_words.include? word
    suffix = 1
    suffixed_word = word
    while @added_words.include? suffixed_word
      suffixed_word = word + "_" + suffix.to_s
      suffix += 1
    end
    word = suffixed_word
  end
  @added_words << word
end
It looks messy, is there a better algorithm or ruby way of doing this?
Make @added_words a Set (don't forget to require 'set'). This makes for faster lookup as sets are implemented with hashes, while still using include? to check for set membership. It's also easy to extract the highest used suffix:
>> s = Set.new
#=> #<Set: {}>
>> s << 'foo'
#=> #<Set: {"foo"}>
>> s << 'foo_1'
#=> #<Set: {"foo", "foo_1"}>
>> word = 'foo'
#=> "foo"
>> s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1.to_i }
#=> "foo_1"
>> s << 'foo_12'
#=> #<Set: {"foo", "foo_1", "foo_12"}>
>> s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1.to_i }
#=> "foo_12"
Now to get the next value you can insert, you could just do the following (imagine you already had 12 foos, so the next should be a foo_13):
>> s << s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1.to_i }.next
#=> #<Set: {"foo", "foo_1", "foo_12", "foo_13"}>
Sorry if the examples are a bit confused, I had anesthesia earlier today. It should be enough to give you an idea of how sets could potentially help you though (most of it would work with array too, but sets have faster lookup).
Change @added_words to a Hash with a default of zero. Then you can do:
@added_words = Hash.new(0)
def add_word(word)
  @added_words[word] += 1
end
# put it to work:
list = %w(test foo bar test bar bar)
names = list.map do |w|
  "#{w}_#{add_word(w)}"
end
p @added_words
#=> {"test"=>2, "foo"=>1, "bar"=>3}
p names
#=> ["test_1", "foo_1", "bar_1", "test_2", "bar_2", "bar_3"]
In that case, I'd probably use a set or hash:
# in your class:
require 'set'
require 'forwardable'
extend Forwardable # I'm just including this to keep your previous API
# wherever you set up your instance variable (it's probably [] at the moment):
def initialize
  @added_words = Set.new
end
# then instead of `def add_word(word); @added_words.add(word); end`:
def_delegator :@added_words, :add, :add_word
# or just change whatever loop to use @added_words.add('word') rather than self.add_word('word')
# @added_words.add('word') does nothing if 'word' already exists in the set.
If you've got some attributes that you're grouping via these sections, then a hash might be better:
# wherever you set up your instance variable (it's probably [] at the moment):
def initialize
  @added_words = {}
end
def add_word(word, attrs={})
  @added_words[word] ||= []
  @added_words[word].push(attrs)
end
Doing it the "wrong way", but in slightly nicer code:
def add_word(word)
  if @added_words.include? word
    # Try word_1, word_2, ... until a free name is found
    suffixed_word = 1.upto(Float::INFINITY) do |suffix|
      candidate = [word, suffix].join("_")
      break candidate unless @added_words.include?(candidate)
    end
    word = suffixed_word
  end
  @added_words << word
end
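The ideas from the answers above (a Set for membership, a per-word counter, and a retry loop) can be combined. The class below is only a sketch with a hypothetical name, UniqueNamer. It assumes the first occurrence of a word keeps its plain name and that later duplicates get _1, _2 and so on, as in the question.

```ruby
require 'set'

# Hypothetical class, not taken from any of the answers above.
class UniqueNamer
  def initialize
    @taken = Set.new           # every name handed out so far
    @next_suffix = Hash.new(1) # next suffix to try for each base word
  end

  # Returns a name that has not been handed out before.
  def add_word(word)
    name = word
    # Set#add? returns nil when the element is already present.
    until @taken.add?(name)
      name = "#{word}_#{@next_suffix[word]}"
      @next_suffix[word] += 1
    end
    name
  end
end

namer = UniqueNamer.new
names = %w[foo foo foo_1 foo bar].map { |w| namer.add_word(w) }
p names
```

Remembering the next suffix per word avoids rescanning from _1 on every duplicate, and the loop still guards against a suffixed name that was added literally, such as foo_1 in the sample input.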

Remove the '-' dividers in JSON keys in Ruby

I'm trying to read some JSON data from the Tumblr API.
I'm using the Hashie gem to read the values as object properties. This should make reading easier/cleaner.
It turns something like this:
data['post']['title']
into this:
data.post.title
Unfortunately, some keys show up with a '-' as a divider, like this:
regular-title: Mijn eerste post
format: html
regular-body: <p>post</p>
Therefore I cannot use post.regular-title. Is there a way to replace all the minus (-) symbols with underscores (_)?
This will do it:
def convert_object(data)
  case data
  when Hash
    data.inject({}) do |h,(k,v)|
      h[(k.respond_to?(:tr) ? k.tr('-', '_') : k)] = convert_object(v)
      h
    end
  when Array
    data.map { |i| convert_object(i) }
  else
    data
  end
end
You can use it like this:
require 'json'
convert_object(JSON.parse('{"something-here":"value","otherkey":{"other-key":"value-value"}}'))
Karaszi Istvan helped me a lot with the solution. I added the check for an array in the hash. This way hashes in arrays in the hash will get underscored too.
def convert_hash(hash)
  case hash
  when Hash
    hash.inject({}) do |h,(k,v)|
      h[k.tr('-', '_')] = convert_hash(v)
      h
    end
  when Array
    array = hash
    number = 0
    array.each do
      array[number] = convert_hash(array[number])
      number += 1
    end
    array
  else
    hash
  end
end
I don't know why I added 'number' as an iterator. Somehow hash.each didn't work.
