Parsing text using Ruby

I have the following text which will always follow the same format:
1
"13"
"241"
"Rabun"
"06"
"County"
2
"13"
"281"
"Towns"
"06"
"County"
I would like to assign each section to a hash like:
locality= {:id => "", :fips1 => "", :fips2 => "", :county => "", :stateid => "", :type => ""}
How would I go about doing this in Ruby? Any help is greatly appreciated.

fields = [:fips1, :fips2, :county, :stateid, :type]
arraywithhashes = yourtextdata.split("\n\n").map { |loc|
  Hash[
    [[:id, loc[/\d+/]]] +
    fields.zip(loc.scan(/"([^"]+)"/).map(&:first))
  ]
}
If you add new fields to your file, the only thing you'll need to change is the fields array.

For each section, use a regular expression with one group for each entry in the section, then build the hash you described from those groups.
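A minimal sketch of that approach, assuming every record looks exactly like the sample in the question (an unquoted numeric id followed by five quoted lines). The RECORD pattern, the named groups, and the sample text are illustrative choices, not part of the answer:

```ruby
# Illustrative sample: two records in the format shown in the question.
text = <<~TEXT
  1
  "13"
  "241"
  "Rabun"
  "06"
  "County"

  2
  "13"
  "281"
  "Towns"
  "06"
  "County"
TEXT

# One named group per field; the /x flag lets the pattern span several lines
# (literal whitespace in the pattern is ignored, \n still matches a newline).
RECORD = /
  ^(?<id>\d+)\n
  "(?<fips1>[^"]*)"\n
  "(?<fips2>[^"]*)"\n
  "(?<county>[^"]*)"\n
  "(?<stateid>[^"]*)"\n
  "(?<type>[^"]*)"
/x

# With named groups, String#scan returns the captures in group order,
# so they can be zipped with the group names to build one hash per record.
localities = text.scan(RECORD).map do |captures|
  RECORD.names.map(&:to_sym).zip(captures).to_h
end

p localities.first
```

Because the pattern anchors on a line that starts with digits, it works whether or not the records are separated by blank lines.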

locality.each_key { |k| locality.store(k, "foo") }
Another newbie-ish person here, but that might be a start for you.

You might want to consider using a Struct instead of a Hash.
Locality = Struct.new(:id, :fips1, :fips2, :county, :stateid, :type)
localities = []
DATA.each_slice(7) do |chunk|
  chunk.pop if chunk.size == 7 # drop the blank line that separates records
  localities << Locality.new(*chunk.map { |line| line[/\w+/] })
end
p localities # => [#<struct Locality id="1", fips1="13", fips2="241", county="Rabun", stateid="06", type="County">, ...]
puts localities[1].fips2 # => 281
__END__
1
"13"
"241"
"Rabun"
"06"
"County"

2
"13"
"281"
"Towns"
"06"
"County"
each_slice(7) takes seven lines at a time from DATA (the text after __END__). The last line of each chunk, the blank line separating records, is removed unless the chunk has only six lines (the last record). A cleaned-up copy of the remaining lines is made, and with these values a new Locality is created and added to the array.

Related

How to merge values of a single hash?

Is there any way to merge values of a single hash?
Example:
address = {
"apartment" => "1",
"building" => "Lido House",
"house_number" => "20",
"street_name" => "Mount Park Road",
"city" => "Greenfield",
"county" => nil,
"post_code" => "WD1 8DC"
}
Could we get an outcome which looks like this?
1 Lido House,
20 Mount Park Road,
Greenfield,
WD1 8DC
address.compact removes the values that are nil, but what if, in a method that uses string interpolation, you want to exclude the nil value for some addresses and include it for others, without leaving a trailing comma?
def address(hash)
  hash.compact
  puts "#{hash["apartment"]} #{hash["building"]}, \n#{hash["house_number"]} #{hash["street_name"]}, \n#{hash["city"]}, \n#{hash["county"]}, \n#{hash["post_code"]}"
end
You need to join the values in a string:
"#{address['house_number']} #{address['street_name']},\n#{address['city']},\n#{address['post_code']}"
You could also improve the formatting by making this a helper method, and using a HEREDOC:
def formatted_address(address)
  <<~ADDRESS
    #{address['house_number']} #{address['street_name']},
    #{address['city']},
    #{address['post_code']}
  ADDRESS
end
Usage:
address = {
  "house_number" => 20,
  "street_name" => "Mount Park Road",
  "city" => "Greenfield",
  "post_code" => "WD1 8DC"
}
puts formatted_address(address)
# => 20 Mount Park Road,
# Greenfield,
# WD1 8DC
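To skip nil parts (such as the "county" in the question) without leaving a stray comma, one option is to group the parts into lines, drop nil or empty parts, and join only the lines that remain. A sketch, using a hypothetical format_address helper and the address hash from the question:

```ruby
# Hypothetical helper: each inner array holds the parts of one output line.
def format_address(address)
  lines = [
    [address["apartment"], address["building"]],
    [address["house_number"], address["street_name"]],
    [address["city"]],
    [address["county"]],
    [address["post_code"]]
  ]
  # Drop nil/blank parts within each line, drop lines that end up empty,
  # then put ",\n" only between the surviving lines (so no trailing comma).
  lines
    .map { |parts| parts.compact.map(&:to_s).reject(&:empty?).join(" ") }
    .reject(&:empty?)
    .join(",\n")
end

address = {
  "apartment" => "1",
  "building" => "Lido House",
  "house_number" => "20",
  "street_name" => "Mount Park Road",
  "city" => "Greenfield",
  "county" => nil,
  "post_code" => "WD1 8DC"
}

puts format_address(address)
```

Because every part goes through to_s, an integer house number (as in the usage example above) also works.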
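Use a format string. Named references such as %{city} look up symbol keys, so string keys have to be converted first:
"%{house_number} %{street_name},\n%{city},\n%{post_code}" % address.transform_keys(&:to_sym)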

How to transform a user input in rails 4?

I am creating an app where users can enter their name, which will be returned as chemical symbols (where they match).
I managed to do it in the console like this:
symbols = {
  "ac" => "Ac",
  "al" => "Al",
  "am" => "Am",
  "br" => "Br",
  "ba" => "Ba",
  "cr" => "Cr"
}
puts "Get your chemical name!"
name = gets.chomp
name.gsub!(/#{symbols.keys.join('|')}/, symbols)
puts name
Now I'd like to make it work in the app, but I don't know how to create the method.
I want it to be displayed only in views/show:
= @convertor.name
= link_to 'Edit', edit_convertor_path(@convertor)
= link_to 'Back', convertors_path
Should I create the method in my model or elsewhere?
class Convertor < ActiveRecord::Base
  def get_chemical_name(name)
    symbols = {
      "ac" => "Ac",
      "al" => "Al",
      "am" => "Am",
      "br" => "Br",
      "ba" => "Ba",
      "cr" => "Cr"
    }
    name.gsub!(/#{symbols.keys.join('|')}/, symbols)
    puts name
  end
end
So in my show view I tried something like = @convertor.get_chemical(name), but it didn't work.
I need your help, please.
Yes, the method can stay in the model.
Short one:
@convertor.get_chemical_name(@convertor.name)
would work, but this is not the right way to do it.
The correct way would be to change the method in the Convertor class so that it doesn't accept any arguments: it is an instance method, so it already has access to the name attribute. So, after changing the method signature
def get_chemical_name
  symbols = {
    "ac" => "Ac",
    "al" => "Al",
    "am" => "Am",
    "br" => "Br",
    "ba" => "Ba",
    "cr" => "Cr"
  }
  # gsub (not gsub!) always returns the converted string; gsub! returns nil when nothing matches
  name.gsub(/#{symbols.keys.join('|')}/, symbols)
end
you will be able to use
= @convertor.get_chemical_name
Also, I removed the unnecessary puts name from the method definition: in Ruby, the value of the last evaluated expression is already the method's return value (unless the method returns earlier).
Also, if you are using the symbols hash anywhere else, you can move it to a constant.
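Moved to a constant, the lookup could look something like the sketch below. It is written as a plain Ruby module (the ChemicalName name is hypothetical) so it runs outside Rails; Regexp.union builds the alternation from the keys and escapes them:

```ruby
# Hypothetical module holding the lookup table as a frozen constant.
module ChemicalName
  SYMBOLS = {
    "ac" => "Ac",
    "al" => "Al",
    "am" => "Am",
    "br" => "Br",
    "ba" => "Ba",
    "cr" => "Cr"
  }.freeze

  # Matches any of the keys; built once when the module is loaded.
  PATTERN = Regexp.union(SYMBOLS.keys)

  # Non-destructive: always returns a new string, even when nothing matches.
  def self.convert(name)
    name.gsub(PATTERN, SYMBOLS)
  end
end

puts ChemicalName.convert("alba") # prints AlBa
```

Inside the model, get_chemical_name could then simply return ChemicalName.convert(name).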

Ruby: Scanning strings for matching adjacent vowel groups

I am building a script to randomly generate words that sound like English. I have broken down a large number of English words into VCV groups.
...where the V's represent ALL the adjacent vowels in a word and the C represents ALL the adjacent consonants. For example, the English word "miniature" would become "-mi", "inia", "iatu", and "ure", and "school" would become "-schoo" and "ool".
These groups will be assembled with groups from other words, the rule being that the complete set of adjacent ending vowels must match the complete set of starting vowels of the attached group.
I have constructed a hash in the following structure:
pieces = {
:starters => { "-sma" => 243, "-roa" => 77, "-si" => 984, ...},
:middles => { "iatu" => 109, "inia" => 863, "aci" => 229, ...},
:enders => { "ar-" => 19, "ouid-" => 6, "ude" => 443, ...}
}
In order to construct generated words, a "starter" string would need to end with the same vowel grouping as the "middle" string. The same applies when connecting the "middle" string with the "ender" string. One possible result using the examples above would be "-sma" + "aba" + "ar-" to give "smabar". Another would be "-si" + "inia" + "iatu" + "ude" to give "siniatude".
My problem is that when I sample any two pieces, I don't know how to ensure that the ending V group of the first piece exactly matches the beginning V group of the second piece. For example, "utua" + "uailo" won't work together because "ua" is not the same as "uai". However, a successful pair would be "utua" + "uado" because "ua" = "ua".
def match(first, second)
  end_of_first = first[/[aeiou]+$|[^aeiou]+$/]
  start_of_second = second[/^[aeiou]+|^[^aeiou]+/]
  end_of_first == start_of_second
end
match("utua", "uailo")
# => false
match("inia", "iatu")
# => true
EDIT: I apparently can't read; I thought you just wanted to match the group (whether vowel or consonant). If you restrict it to vowel groups, it's simpler:
end_of_first = first[/[aeiou]+$/]
start_of_second = second[/^[aeiou]+/]
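Wrapping the vowel-only version in a method, here is a sketch of how it could filter the candidates for the next piece. The vowel_match? and compatible names are hypothetical, and the middles hash is a small subset of the question's data:

```ruby
# Vowel-only check, using \A / \z so only the very start/end of the piece counts.
def vowel_match?(first, second)
  end_of_first = first[/[aeiou]+\z/]
  start_of_second = second[/\A[aeiou]+/]
  # Guard against nil == nil when neither side has a vowel group at the join.
  !end_of_first.nil? && end_of_first == start_of_second
end

# All candidate pieces whose leading vowel group equals the piece's trailing one.
def compatible(piece, candidates)
  candidates.keys.select { |candidate| vowel_match?(piece, candidate) }
end

middles = { "iatu" => 109, "inia" => 863, "aci" => 229 }

p vowel_match?("utua", "uado") # => true
p compatible("-si", middles)   # => ["inia"]
```

Sampling from compatible(...) at each step then guarantees that every join satisfies the vowel rule.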
Since you're already pre-processing the dictionary, I suggest doing a little more preprocessing to make generation simpler. I have two suggestions. First, for the starters and middles, separate each into a tuple (for which, in Ruby, we just use a two-element array) of the form (VC, V), so e.g. "inia" becomes ["in", "ia"]:
starters = [
  [ "-sm", "a" ],
  [ "-r", "oa" ],
  [ "-s", "i" ],
  # ...
]
We store the starters in an array since we just need to choose one at random, which we can do with Array#sample:
starter, middle1_key = starters.sample
puts starter # => "-sm"
puts middle1_key # => "a"
We want to be able to look up middles by their initial V groups, so we put those tuples in a Hash instead, with their initial V groups as keys:
middles = {
  "ia" => [
    [ "iat", "u" ],
    [ "iabl", "e" ],
  ],
  "i" => [
    [ "in", "ia" ],
    # ...
  ],
  "a" => [
    [ "ac", "i" ],
    # ...
  ],
  # ...
}
Since we stored the starter's final V group in middle1_key above, we can now use that as a key to get the array of middle tuples whose initial V group matches, and choose one at random as we did above:
possible_middles1 = middles[middle1_key]
middle1, middle2_key = possible_middles1.sample
puts middle1 # => "ac"
puts middle2_key # => "i"
Just for kicks, let's pick a second middle:
middle2, ender_key = middles[middle2_key].sample
puts middle2 # => "in"
puts ender_key # => "ia"
Our enders we don't need to store in tuples, since we won't be using any part of them to look anything up like we did with middles. We can just put them in a hash whose keys are the initial V groups and whose values are arrays of all of the enders with that initial V group:
enders = {
  "a" => [ "ar-", ... ],
  "oui" => [ "ouid-", ... ],
  "u" => [ "ude-", ... ],
  "ia" => [ "ial-", "iar-", ... ]
  # ...
}
We stored the second middle's final V group in ender_key above, which we can use to get the array of matching enders:
possible_enders = enders[ender_key]
ender = possible_enders.sample
puts ender # => "iar-"
Now that we have four parts, we just put them together to form our word:
puts starter + middle1 + middle2 + ender
# => -smaciniar-
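The preprocessing itself can be sketched as follows, assuming starters and middles always end in a vowel group (as in the question's examples). The split_tail and leading_vowels helpers are hypothetical, and the pieces hash is a subset of the question's data, with the frequencies ignored as in the structures above:

```ruby
pieces = {
  starters: { "-sma" => 243, "-roa" => 77, "-si" => 984 },
  middles:  { "iatu" => 109, "inia" => 863, "aci" => 229 },
  enders:   { "ar-" => 19, "ouid-" => 6, "ude" => 443 }
}

# Split a piece into [everything before the final vowel group, final vowel group],
# e.g. "inia" => ["in", "ia"].
def split_tail(piece)
  tail = piece[/[aeiou]+\z/]
  raise ArgumentError, "#{piece} does not end in a vowel group" unless tail
  [piece[0...-tail.length], tail]
end

# Leading vowel group, e.g. "ouid-" => "oui".
def leading_vowels(piece)
  piece[/\A[aeiou]+/]
end

# Array of [VC, V] tuples.
starters = pieces[:starters].keys.map { |s| split_tail(s) }

# Initial V group => array of [VC, V] tuples.
middles = pieces[:middles].keys
  .group_by { |m| leading_vowels(m) }
  .transform_values { |ms| ms.map { |m| split_tail(m) } }

# Initial V group => array of whole enders.
enders = pieces[:enders].keys.group_by { |e| leading_vowels(e) }

p starters
p middles
p enders
```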
Edit
The data structures above omit the relative frequencies (I wrote the above before I had a chance to read your answer to my question about the numbers). Obviously it's trivial to also store the relative frequencies alongside the parts, but I don't know off the top of my head a fast way to then choose parts in a weighted fashion. Hopefully my answer is of some use to you regardless.
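One simple way to choose a part in proportion to its stored frequency is cumulative-weight (roulette-wheel) selection. A sketch, assuming the frequencies are kept as a Hash of part => count like the question's pieces hash; weighted_sample is a hypothetical name. Each draw is O(n); for large tables you could precompute cumulative sums once and search them with Array#bsearch instead:

```ruby
# Pick a key with probability proportional to its (non-negative) weight.
def weighted_sample(weights, rng: Random.new)
  total = weights.values.sum
  raise ArgumentError, "weights must sum to a positive number" unless total.positive?

  target = rng.rand * total # uniform in [0, total)
  weights.each do |item, weight|
    target -= weight
    return item if target < 0
  end
  # Only reachable through floating-point rounding: fall back to the last positive weight.
  weights.reject { |_, w| w.zero? }.keys.last
end

starters = { "-sma" => 243, "-roa" => 77, "-si" => 984 }
p weighted_sample(starters)
```

Passing a seeded rng (for example Random.new(1)) makes the generated words reproducible.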
You can do that using the methods Enumerable#flat_map, String#partition, Enumerable#chunk and a few more familiar ones:
def combine(arr)
  arr.flat_map { |s| s.partition(/[^aeiou-]+/) }.
    chunk { |s| s }.
    map { |_, a| a.first }.
    join.delete('-')
end
combine(["-sma", "aba", "ar-"])             #=> "smabar"
combine(["-si", "inia", "iatu", "ude"])     #=> "siniatude"
combine(["utua", "uailo", "orsua", "uav-"]) #=> "utuauailorsuav"
To see how this works, let's look at the last example:
arr = ["utua", "uailo", "orsua", "uav-"]
a = arr.flat_map { |s| s.partition(/[^aeiou-]+/) }
#=> ["u", "t", "ua", "uai", "l", "o", "o", "rs", "ua", "ua", "v", "-"]
enum = a.chunk { |s| s }
#=> #<Enumerator: #<Enumerator::Generator:0x007fdd14963888>:each>
We can see the elements of this enumerator by converting it to an array:
enum.to_a
#=> [["u", ["u"]], ["t", ["t"]], ["ua", ["ua"]], ["uai", ["uai"]],
# ["l", ["l"]], ["o", ["o", "o"]], ["rs", ["rs"]], ["ua", ["ua", "ua"]],
# ["v", ["v"]], ["-", ["-"]]]
b = enum.map { |_, a| a.first }
#=> ["u", "t", "ua", "uai", "l", "o", "rs", "ua", "v", "-"]
s = b.join
#=> "utuauailorsuav-"
s.delete('-')
#=> "utuauailorsuav"

Including empty k/v pairs when merging hashes

There are three hashes. Each hash results in a single key/value pair.
When the hashes are merged and output to a JSON file, the only k/v pairs visible are the ones with data.
For example:
employee_hours[name] = {"Hours" => hours}
employee_revenue[name] = {"Revenue" => revenue}
employee_activations[name] = {"Activations" => activations}
If any of the k/v pairs don't exist, I need them to be included in the output with a value of 0.00.
I tried simply including empty k/v pairs from the other hashes in each hash, but when the hashes are merged, the empty values overwrite the existing ones.
employee_hours[name] = {"Hours" => hours, "Revenue" => "", "Activations" => ""}
employee_revenue[name] = {"Hours" => "", "Revenue" => revenue, "Activations" => ""}
employee_activations[name] = {"Hours" => "", "Revenue" => "", "Activations" => activations}
Edit
My current code is listed here: https://gist.github.com/hnanon/766a0d6b2b0f9d9d03fd
You need to define a hash for the default values and merge into it. Assuming that employee_final is the hash where you merged all the employee information,
employee_defaults = { "Hours" => 0.0, "Revenue" => 0.0, "Activations" => 0.0 }
employee_final.each_key do |name|
  employee_final[name] = employee_defaults.merge(employee_final[name])
end
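For completeness, a sketch of how the three per-employee hashes might be merged into employee_final before the defaults are applied. The employee names and numbers are invented for illustration, and the nested hashes are assumed to have the shape shown in the question:

```ruby
require "json"

# Invented sample data in the shape name => { "Field" => value }.
employee_hours       = { "Alice" => { "Hours" => 40.0 }, "Bob" => { "Hours" => 12.5 } }
employee_revenue     = { "Alice" => { "Revenue" => 1200.0 } }
employee_activations = { "Bob" => { "Activations" => 3 } }

employee_defaults = { "Hours" => 0.0, "Revenue" => 0.0, "Activations" => 0.0 }

# Merge the three hashes; when a name appears in more than one,
# combine its inner hashes instead of letting the later one win.
employee_final = [employee_hours, employee_revenue, employee_activations]
  .reduce({}) { |acc, h| acc.merge(h) { |_name, mine, theirs| mine.merge(theirs) } }

# Fill in any missing fields with the defaults (existing values win).
employee_final.each_key do |name|
  employee_final[name] = employee_defaults.merge(employee_final[name])
end

puts JSON.pretty_generate(employee_final)
```

The block passed to Hash#merge is only called for keys present in both hashes, which is what keeps one employee's fields from overwriting another source's fields.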
It sounds as if you need to define a REQUIRED_KEYS array and check for the existence of those keys in your hashes. Here's one way to achieve that:
REQUIRED_KEYS = [ "A", "B", "C" ]
DEFAULT_VALUE = 0.0
REQUIRED_KEYS.each { |key| your_hash[key] = DEFAULT_VALUE unless your_hash.key?(key) }
Use Hash Defaults
You can use an argument to Hash#new to set a default value for a hash. For example:
require 'json'
employee_hours = Hash.new(0.0)
employee_revenue = Hash.new(0.0)
employee_activations = Hash.new(0.0)
name = 'Bob'
{
'Hours' => employee_hours[name],
'Revenue' => employee_revenue[name],
'Activations' => employee_activations[name],
}.to_json
# => "{\"Hours\":0.0,\"Revenue\":0.0,\"Activations\":0.0}"

Ruby - Array of Hashes, Trying to Select Multiple Keys and Group By Key Value

I have a set of data that is an array of hashes, with each hash representing one record of data:
data = [
{
:id => "12345",
:bucket_1_rank => "2",
:bucket_1_count => "12",
:bucket_2_rank => "7",
:bucket_2_count => "25"
},
{
:id => "45678",
:bucket_1_rank => "2",
:bucket_1_count => "15",
:bucket_2_rank => "9",
:bucket_2_count => "68"
},
{
:id => "78901",
:bucket_1_rank => "5",
:bucket_1_count => "36"
}
]
The rank values are always between 1 and 10.
What I am trying to do is select each of the possible values for the rank fields (the :bucket_1_rank and :bucket_2_rank fields) as keys in my final resultset, and the values for each key will be an array of all the values in its associated :bucket_count field. So, for the data above, the final resulting structure I have in mind is something like:
bucket 1:
{"2" => ["12", "15"], "5" => ["36"]}
bucket 2:
{"7" => ["25"], "9" => ["68"]}
I can do this working under the assumption that the field names stay the same, or through hard coding the field/key names, or just using group_by for the fields I need, but my problem is that I work with a different data set each month where the rank fields are named slightly differently depending on the project specs, and I want to identify the names for the count and rank fields dynamically as opposed to hard coding the field names.
I wrote two quick helpers, get_ranks and get_counts, that use a regex to return an array of field names that are either rank or count fields, since these fields will always have the literal string "_rank" or "_count" in their names:
ranks = get_ranks
counts = get_counts
results = Hash.new { |h, k| h[k] = [] }
data.each do |i|
  ranks.each do |r|
    unless i[r].nil?
      counts.each do |c|
        results[i[r]] << i[c]
      end
    end
  end
end
p results
This seems close, but it feels awkward, and it seems to me there has to be a better way to iterate through this data set. Since I haven't used Ruby on this project before, I'd like to use this as an opportunity to improve my understanding of iterating through arrays of hashes, populating a hash with arrays as values, etc. Any resources or suggestions would be much appreciated.
You could shorten it to:
result = Hash.new { |h, k| h[k] = Hash.new { |h2, k2| h2[k2] = [] } }
data.each do |hsh|
  hsh.each do |key, value|
    result[$1][value] << hsh["#{$1}_count".to_sym] if key =~ /(.*)_rank$/
  end
end
puts result
#=> {"bucket_1"=>{"2"=>["12", "15"], "5"=>["36"]}, "bucket_2"=>{"7"=>["25"], "9"=>["68"]}}
Though this assumes that every rank field has a count field with the same prefix (for example, that :bucket_2_item_count is actually supposed to be :bucket_2_count).
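A variant of the same idea, sketched as a hypothetical group_counts_by_rank method: it reads the prefix with String#[] instead of the $1 global, and skips a rank whose matching _count field is missing rather than appending nil. The data is the sample from the question:

```ruby
def group_counts_by_rank(data)
  # bucket prefix => { rank => [counts] }, with both levels created on demand
  result = Hash.new { |h, bucket| h[bucket] = Hash.new { |h2, rank| h2[rank] = [] } }
  data.each do |record|
    record.each do |key, rank|
      prefix = key.to_s[/\A(.+)_rank\z/, 1] # e.g. "bucket_1" for :bucket_1_rank
      next unless prefix

      count = record[:"#{prefix}_count"]
      result[prefix][rank] << count if count
    end
  end
  result
end

data = [
  { id: "12345", bucket_1_rank: "2", bucket_1_count: "12", bucket_2_rank: "7", bucket_2_count: "25" },
  { id: "45678", bucket_1_rank: "2", bucket_1_count: "15", bucket_2_rank: "9", bucket_2_count: "68" },
  { id: "78901", bucket_1_rank: "5", bucket_1_count: "36" }
]

p group_counts_by_rank(data)
```

Since the field names are discovered from the "_rank" suffix, a differently named bucket (say :region_rank with :region_count) is picked up without code changes.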
