Grouping a hash based on parameters in Ruby

So I have the following:
[{:item=>"x"}, {:item2=>"x"}, {:item3=>"x"}, {:item=>"x"},{:item3=>"x"}]
I want to split this into groups, so that each group starts at item and ends at item3; item2 may be missing.
Ideally I want:
{:item=>"x",:item2=>"x",:item3=>"x"} & {:item=>"x",:item3=>"x"}
So, in a real example: three items need to be posted, but I get an array from an Excel spreadsheet:
name: blah
id: blah
color: blah
name: blah
date: blah
size: blah
name: blah
id: blah
date: blah
color: blah
size: blah
I need to post each record. Given an array like the one above, is there some way to group/split the array on a given hash key?

If your data really is delimited by a double line break, you should take advantage of that by splitting first by paragraph, then by line, then by the colon. Then you don't have to worry about missing data and can blindly fill in key/value pairs.
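As a rough sketch of that idea, assuming the raw text really does separate records with blank lines (the sample text below is made up for illustration and is not the asker's actual data):

```ruby
# Hypothetical input: records separated by a blank line, one "key: value" per line.
text = <<~DATA
  name: blah
  id: blah
  color: blah

  name: blah
  date: blah
  size: blah
DATA

records = text.split(/\n{2,}/).map do |paragraph|
  # Split each line on the first colon only, so values may contain colons.
  paragraph.lines.map { |line| line.strip.split(/:\s*/, 2) }.to_h
end

p records
```

Each paragraph becomes one hash, so a missing field simply means a missing key.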

A functional approach: get the indexes where the :item keys are and split there:
hs = [{:item=>"x"}, {:item2=>"x"}, {:item3=>"x"}, {:item=>"x"},{:item3=>"x"}]
indexes = hs.map.with_index { |h, i| i if h.first[0] == :item }.compact
(indexes + [hs.size]).each_cons(2).map { |from, to| hs[from...to].reduce(:merge) }
#=> [{:item=>"x", :item2=>"x", :item3=>"x"}, {:item=>"x", :item3=>"x"}]
If you prefer a more declarative approach (I do), add some abstractions to your extensions library so you can write:
indexes = hashes.find_indexes { |h| h.first[0] == :item }
hashes.split_at(*indexes.drop(1)).map { |hs| hs.reduce(:merge) }
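The find_indexes and split_at methods used above are not part of core Ruby. One possible way to define them is sketched below; the names come from the answer, but these implementations are assumptions, not the answerer's own code:

```ruby
# Hypothetical extensions; core Ruby does not ship these methods.
module Enumerable
  # Indexes of all elements for which the block returns a truthy value.
  def find_indexes
    each_with_index.select { |element, _| yield element }.map(&:last)
  end
end

class Array
  # Cut the array into consecutive slices, starting a new slice at each given index.
  def split_at(*indexes)
    [0, *indexes, size].each_cons(2).map { |from, to| self[from...to] }
  end
end

hashes = [{:item=>"x"}, {:item2=>"x"}, {:item3=>"x"}, {:item=>"x"}, {:item3=>"x"}]
indexes = hashes.find_indexes { |h| h.first[0] == :item }
groups = hashes.split_at(*indexes.drop(1)).map { |hs| hs.reduce(:merge) }

p groups
```

Dropping the first index avoids an empty leading slice, because split_at already starts its first slice at 0.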

try this
input = [{:item=>"x"}, {:item2=>"x"}, {:item3=>"x"}, {:item=>"x"},{:item3=>"x"}]
res = []
input.each do |element|
if element.keys.first == :item
res << element
else
res.last.merge! element
end
end
puts res.inspect # => [{:item=>"x", :item2=>"x", :item3=>"x"}, {:item=>"x", :item3=>"x"}]

Pure awesomeness of Ruby:
arr = [{:item=>"x"}, {:item2=>"x"}, {:item3=>"x"}, {:item=>"x"}, {:item3=>"x"}]
arr.each_slice(3).map { |a| a.inject(&:merge) }
=> [{:item=>"x", :item2=>"x", :item3=>"x"}, {:item=>"x", :item3=>"x"}]
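Note that each_slice(3) only produces the desired grouping when every group before the last has exactly three elements. A sketch that groups on the :item key instead, using Enumerable#slice_before (the sample array, with :item2 missing from the first record, is made up for illustration):

```ruby
arr = [{:item=>"x"}, {:item3=>"x"}, {:item=>"x"}, {:item2=>"x"}, {:item3=>"x"}]

# Fixed-size slices cut through the second record here.
by_slice = arr.each_slice(3).map { |a| a.inject(&:merge) }

# slice_before starts a new group at every hash that has the :item key.
by_key = arr.slice_before { |h| h.key?(:item) }.map { |group| group.inject(&:merge) }

p by_slice
p by_key
```

With this input, only the slice_before version returns one hash per record.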

Related

How to create a Hash from a nested CSV in Ruby?

I have a CSV in the following format:
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,codes.1
YK,1234,4567,AB001,AK002
As you can see, this is a nested structure. The CSV may contain multiple rows. I would like to convert this into an array of hashes like this:
[
{
name: 'YK',
contacts: [
{
phone_no: '1234'
},
{
phone_no: '4567'
}
],
codes: ['AB001', 'AK002']
}
]
The structure uses numbers in the given format to represent arrays. There can be hashes inside arrays. Is there a simple way to do that in Ruby?
The CSV headers are dynamic. It can change. I will have to create the hash on the fly based on the CSV file.
There is a similar node library called csvtojson to do that for JavaScript.
Just read and parse it line by line. The arr variable in the code below will hold the array of hashes you need:
arr = []
# drop(1) skips the header row
File.readlines('README.md').drop(1).each do |line|
fields = line.split(',').map(&:strip)
hash = { name: fields[0], contacts: [fields[1], fields[2]], codes: [fields[3], fields[4]] }
arr.push(hash)
end
Let's first construct a CSV file.
str = <<~END
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,IQ,codes.1
YK,1234,4567,AB001,173,AK002
ER,4321,7654,BA001,81,KA002
END
FName = 't.csv'
File.write(FName, str)
#=> 121
I have constructed a helper method to construct a pattern that will be used to convert each row of the CSV file (following the first, containing the headers) to an element (hash) of the desired array.
require 'csv'
def construct_pattern(csv)
csv.headers.group_by { |col| col[/[^.]+/] }.
transform_values do |arr|
case arr.first.count('.')
when 0
arr.first
when 1
arr
else
key = arr.first[/(?<=\d\.).*/]
arr.map { |v| { key=>v } }
end
end
end
In the code below, for the example being considered:
construct_pattern(csv)
#=> {"name"=>"name",
# "contacts"=>[{"phone_no"=>"contacts.0.phone_no"},
# {"phone_no"=>"contacts.1.phone_no"}],
# "codes"=>["codes.0", "codes.1"],
# "IQ"=>"IQ"}
By tacking if pattern.empty? onto the assignment pattern = construct_pattern(csv) in the code below, we ensure the pattern is constructed only once.
We may now construct the desired array.
pattern = {}
CSV.foreach(FName, headers: true).map do |csv|
pattern = construct_pattern(csv) if pattern.empty?
pattern.each_with_object({}) do |(k,v),h|
h[k] =
case v
when Array
case v.first
when Hash
v.map { |g| g.transform_values { |s| csv[s] } }
else
v.map { |s| csv[s] }
end
else
csv[v]
end
end
end
#=> [{"name"=>"YK",
# "contacts"=>[{"phone_no"=>"1234"}, {"phone_no"=>"4567"}],
# "codes"=>["AB001", "AK002"],
# "IQ"=>"173"},
# {"name"=>"ER",
# "contacts"=>[{"phone_no"=>"4321"}, {"phone_no"=>"7654"}],
# "codes"=>["BA001", "KA002"],
# "IQ"=>"81"}]
The CSV methods I've used are documented in CSV. See also Enumerable#group_by and Hash#transform_values.
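For headers nested to arbitrary depth, a more generic alternative is to walk each dotted header path and create arrays or hashes as needed. This is only a sketch: assign_path is a hypothetical helper, and the headers and rows are given as plain arrays (as they might come out of the CSV library) to keep the example self-contained:

```ruby
# Hypothetical helper: store value in container at the dotted path,
# creating arrays for numeric segments and hashes for named ones.
def assign_path(container, path, value)
  key, *rest = path
  key = key.to_i if key.match?(/\A\d+\z/)
  if rest.empty?
    container[key] = value
  else
    container[key] ||= (rest.first.match?(/\A\d+\z/) ? [] : {})
    assign_path(container[key], rest, value)
  end
end

headers = %w[name contacts.0.phone_no contacts.1.phone_no codes.0 codes.1]
rows    = [%w[YK 1234 4567 AB001 AK002]]

result = rows.map do |row|
  headers.zip(row).each_with_object({}) do |(header, value), record|
    assign_path(record, header.split('.'), value)
  end
end

p result
```

Keys are left as strings here; converting them to symbols would be a small extra step.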

Ruby 2.5 efficient way to delete ruby key if it contains a hash with only one key/val pair

Assuming a data structure that looks like the following:
foo = {
'first': {
'bar': 'foo'
},
'second': {
'bar': 'foobar',
'foo': 'barfoo'
},
'third': {
'test': 'example'
}
}
I want to remove all keys from the Hash foo that contain an entry that has only one key/val pair. In this particular case, after the operation is done, foo should only have left:
foo = {
'second': {
'bar': 'foobar',
'foo': 'barfoo'
}
}
as foo['first'] and foo['third'] only contain one key/val pair.
Option 1 - delete_if
foo.delete_if { |_, inner| inner.one? }
delete_if is destructive so it mutates the original hash
This will let through empty hashes
Option 2 - reject
This doesn't mutate any more:
foo = foo.reject { |_, inner| inner.one? }
This will let through empty hashes
Option 3 - select
No mutation plus different operator:
foo = foo.select { |_, inner| inner.size > 1 }
Option 4 - many? - Rails only
foo = foo.select { |_, inner| inner.many? }
If you're using Rails, ActiveSupport defines Enumerable#many? for you, which returns true for any collection with more than one element
Other Notes
Used _ for unused variables as that's a way of showing "this is irrelevant"
Named the variable inner - convinced there's a better name but value could be confusing
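The difference in how the options above treat empty inner hashes can be seen in a small sketch (the sample hash, with an empty :third entry, is made up for illustration):

```ruby
foo = {
  first:  { bar: 'foo' },
  second: { bar: 'foobar', foo: 'barfoo' },
  third:  {}
}

# one? is false for an empty hash, so :third is kept.
by_reject = foo.reject { |_, inner| inner.one? }

# size > 1 is false for an empty hash, so :third is dropped.
by_select = foo.select { |_, inner| inner.size > 1 }

p by_reject.keys
p by_select.keys
```

Neither call mutates foo, so both results can be compared against the original.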
Just a couple more options, leaving aside how the condition is checked.
Using Hash#keep_if:
foo.keep_if{ |_, v| v.size > 1 }
And a more complicated one, using Enumerable#each_with_object:
foo.each_with_object({}){ |(k,v), h| h[k] = v if v.size > 1 }

Ruby - iterating through database results and storing them in a hash

Consider the following:
details = Hash.new
# Example of this Hash's final look
details["team1"] = "Example"
details["team2"] = "Another team"
details["foo"] = "Bar"
The way I get the names of the two teams is through:
teams = Match.find(1).teams
=> [#<Team id: 1, name: "England">, #<Team id: 2, name: "Australia">]
Now I would like to save the names of the teams into the Hash under team1 and team2. If I were using arrays I could do:
teams.each do |team|
details << team.name
end
However, I need to do this with the Hash I have shown above. How would one go about accomplishing this?
Hash[teams.map { |x| ["team_#{x.id}", x.name] }]
# => {"team_1"=>"England", "team_2"=>"Australia"}
If you want the keys numbered 1 and 2 regardless of the actual ids:
Hash[teams.map.with_index { |x,i| ["team_#{i.succ}", x.name] }]
# => {"team_1"=>"England", "team_2"=>"Australia"}
What about this?
teams.each_with_index do |team, idx|
id = "team#{idx + 1}"
details[id] = team.name
end
Here you build a hash key from the team's position in the list, and then use that key to set the value.
How about using an inject for a one liner?
teams.inject({}){ |details, team| details["team#{team.id}"] = team.name; details }
The return value will be a Hash.
{}.tap do |h|
Match.find(1).teams.each_with_index {|t, i| h["team#{i+1}"] = t.name}
end
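To try these approaches without a database, here is a minimal sketch in which a Struct stands in for the ActiveRecord Team model (the Struct and the team_names variable are assumptions for illustration):

```ruby
# Stand-in for the ActiveRecord model; only id and name matter here.
Team = Struct.new(:id, :name)
teams = [Team.new(1, "England"), Team.new(2, "Australia")]

details = { "foo" => "Bar" }

# Number the keys by position (team1, team2, ...) rather than by database id.
team_names = teams.each_with_index.map { |team, idx| ["team#{idx + 1}", team.name] }.to_h
details.merge!(team_names)

p details
```

Building the pairs first and merging them afterwards keeps any keys already present in details.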

Convert named matches in MatchData to Hash

I have a rather simple regexp, but I wanted to use named regular expressions to make it cleaner and then iterate over results.
Testing string:
testing_string = "111x222b333"
My regexp:
regexp = %r{
(?<width> [0-9]{3} ) {0}
(?<height> [0-9]{3} ) {0}
(?<depth> [0-9]+ ) {0}
\g<width>x\g<height>b\g<depth>
}x
dimensions = regexp.match(testing_string)
This works like a charm, but here's where the problem comes:
dimensions.each { |k, v| dimensions[k] = my_operation(v) }
# ERROR !
undefined method `each' for #<MatchData "111x222b333" width:"111" height:"222" depth:"333">.
There is no .each method in MatchData object, and I really don't want to monkey patch it.
How can I fix this problem?
I wasn't as clear as I thought: the point is to keep names and hash-like structure.
If you need a full Hash:
captures = Hash[ dimensions.names.zip( dimensions.captures ) ]
p captures
#=> {"width"=>"111", "height"=>"222", "depth"=>"333"}
If you just want to iterate over the name/value pairs:
dimensions.names.each do |name|
value = dimensions[name]
puts "%6s -> %s" % [ name, value ]
end
#=> width -> 111
#=> height -> 222
#=> depth -> 333
Alternatives:
dimensions.names.zip( dimensions.captures ).each do |name,value|
# ...
end
[ dimensions.names, dimensions.captures ].transpose.each do |name,value|
# ...
end
dimensions.names.each.with_index do |name,i|
value = dimensions.captures[i]
# ...
end
So today a new Ruby version (2.4.0) was released which includes many new features, amongst them feature #11999, aka MatchData#named_captures. This means you can now do this:
h = '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/).named_captures
#=> {"a"=>"1", "b"=>"2", "c"=>nil}
h.class
#=> Hash
So in your code change
dimensions = regexp.match(testing_string)
to
dimensions = regexp.match(testing_string).named_captures
And you can use the each method on your regex match result just like on any other Hash, too.
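Because named_captures returns a plain Hash, the per-value operation from the question can be applied with Hash#transform_values (Ruby 2.4+). A small sketch, with a simplified regexp and a made-up my_operation that converts each capture to an integer:

```ruby
# Hypothetical stand-in for the question's my_operation.
def my_operation(value)
  Integer(value, 10)
end

regexp = /(?<width>\d{3})x(?<height>\d{3})b(?<depth>\d+)/
dimensions = regexp.match("111x222b333").named_captures

# Returns a new hash with the same names and converted values.
converted = dimensions.transform_values { |v| my_operation(v) }

p converted
```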
I'd attack the whole problem of creating the hash a bit differently:
irb(main):052:0> testing_string = "111x222b333"
"111x222b333"
irb(main):053:0> hash = Hash[%w[width height depth].zip(testing_string.scan(/\d+/))]
{
"width" => "111",
"height" => "222",
"depth" => "333"
}
While regexes are powerful, their siren call can be too alluring, and we get sucked into using them when there are simpler or more straightforward ways of accomplishing something. It's just something to think about.
To keep track of the number of elements scanned, per the OP's comment:
hash = Hash[%w[width height depth].zip(scan_result = testing_string.scan(/\d+/))]
=> {"width"=>"111", "height"=>"222", "depth"=>"333"}
scan_result.size
=> 3
Also hash.size will return that, as would the size of the array containing the keys, etc.
@Phrogz's answer is correct if all of your captures have unique names, but you're allowed to give multiple captures the same name. Here's an example from the Regexp documentation.
This code supports captures with duplicate names:
captures = Hash[
dimensions.regexp.named_captures.map do |name, indexes|
[
name,
indexes.map { |i| dimensions.captures[i - 1] }
]
end
]
# Iterate over the captures
captures.each do |name, values|
# name is a String
# values is an Array of Strings
end
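A minimal made-up example of the duplicate-name case, using the same mapping as above on a regexp where two groups share one name:

```ruby
# Both groups share the name "digit".
match = /(?<digit>\d)-(?<digit>\d)/.match("1-2")

# Regexp#named_captures maps each name to the list of its group numbers.
captures = match.regexp.named_captures.map do |name, indexes|
  [name, indexes.map { |i| match.captures[i - 1] }]
end.to_h

p captures
```

Each name maps to an array holding every value captured under that name, in group order.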
If you want to keep the names, you can do
new_dimensions = {}
dimensions.names.each { |k| new_dimensions[k] = my_operation(dimensions[k]) }

Nicely formatting output to console, specifying number of tabs

I am generating a script that is outputting information to the console. The information is some kind of statistic with a value. So much like a hash.
So one value's name may be 8 characters long and another only 3. When I loop through and output the information with two \t, some of the columns aren't aligned correctly.
So for example the output might be as such:
long value name 14
short 12
little 13
tiny 123421
long name again 912421
I want all the values lined up correctly. Right now I am doing this:
puts "#{value_name} - \t\t #{value}"
How could I say for long names, to only use one tab? Or is there another solution?
Provided you know the maximum length to be no more than 20 characters:
printf "%-20s %s\n", value_name, value
If you want to make it more dynamic, something like this should work nicely:
longest_key = data_hash.keys.max_by(&:length)
data_hash.each do |key, value|
printf "%-#{longest_key.length}s %s\n", key, value
end
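A self-contained version of the dynamic-width idea, collecting the formatted lines so they can be inspected (the sample data_hash is made up from the names in the question):

```ruby
data_hash = { "long value name" => 14, "short" => 12, "tiny" => 123421 }

# Width of the widest key; every key is left-justified to this width.
width = data_hash.keys.map(&:length).max

lines = data_hash.map { |key, value| format("%-#{width}s %s", key, value) }

puts lines
```

The keys are padded with spaces rather than tabs, so the value column does not depend on the terminal's tab width.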
There is usually a %10s kind of printf scheme that formats nicely.
However, I have not used ruby at all, so you need to check that.
Yes, there is printf with formatting.
The above example should right align in a space of 10 chars.
You can format based on your widest field in the column.
printf ([port, ]format, arg...)
Prints arguments formatted according to the format, like sprintf. If the first argument is an instance of IO or one of its subclasses, output is redirected to that object. The default is the value of $stdout.
String has a built-in ljust for exactly this:
x = {"foo"=>37, "something long"=>42, "between"=>99}
x.each { |k, v| puts "#{k.ljust(20)} #{v}" }
# Outputs:
# foo                  37
# something long       42
# between              99
Or, if you want tabs, you can do a little math (assuming tab display width of 8) and write a short display function:
def tab_pad(label, tab_stop = 4)
label_tabs = label.length / 8
label.ljust(label.length + tab_stop - label_tabs, "\t")
end
x.each { |k, v| puts "#{tab_pad(k)}#{v}" }
# Outputs:
# foo 37
# something long 42
# between 99
There were a few bugs in it before, but now you can use most of the printf syntax with the % operator:
1.9.3-p194 :025 > " %-20s %05d" % ['hello', 12]
=> " hello                00012"
Of course you can use precalculated width too:
1.9.3-p194 :030 > "%-#{width}s %05x" % ['hello', 12]
=> "hello 0000c"
I wrote a thing
Automatically detects column widths
Spaces with spaces
Array of arrays [[],[],...] or array of hashes [{},{},...]
Does not detect columns too wide for console window
lists = [
[ 123, "SDLKFJSLDKFJSLDKFJLSDKJF" ],
[ 123456, "ffff" ],
]
array_maxes
def array_maxes(lists)
lists.reduce([]) do |maxes, list|
list.each_with_index do |value, index|
maxes[index] = [(maxes[index] || 0), value.to_s.length].max
end
maxes
end
end
array_maxes(lists)
# => [6, 24]
puts_arrays_columns
def puts_arrays_columns(lists)
maxes = array_maxes(lists)
lists.each do |list|
list.each_with_index do |value, index|
print " #{value.to_s.rjust(maxes[index])},"
end
puts
end
end
puts_arrays_columns(lists)
# Output:
#     123, SDLKFJSLDKFJSLDKFJLSDKJF,
#  123456,                     ffff,
and another thing
hashes = [
{ "id" => 123, "name" => "SDLKFJSLDKFJSLDKFJLSDKJF" },
{ "id" => 123456, "name" => "ffff" },
]
hash_maxes
def hash_maxes(hashes)
hashes.reduce({}) do |maxes, hash|
hash.keys.each do |key|
maxes[key] = [(maxes[key] || 0), key.to_s.length].max
maxes[key] = [(maxes[key] || 0), hash[key].to_s.length].max
end
maxes
end
end
hash_maxes(hashes)
# => {"id"=>6, "name"=>24}
puts_hashes_columns
def puts_hashes_columns(hashes)
maxes = hash_maxes(hashes)
return if hashes.empty?
# Headers
hashes.first.each do |key, value|
print " #{key.to_s.rjust(maxes[key])},"
end
puts
hashes.each do |hash|
hash.each do |key, value|
print " #{value.to_s.rjust(maxes[key])},"
end
puts
end
end
puts_hashes_columns(hashes)
# Output:
#      id,                     name,
#     123, SDLKFJSLDKFJSLDKFJLSDKJF,
#  123456,                     ffff,
Edit: Fixes hash keys considered in the length.
hashes = [
{ id: 123, name: "DLKFJSDLKFJSLDKFJSDF", asdfasdf: :a },
{ id: 123456, name: "ffff", asdfasdf: :ab },
]
hash_maxes(hashes)
# => {:id=>6, :name=>20, :asdfasdf=>8}
Want to whitelist columns?
hashes.map{ |h| h.slice(:id, :name) }
# => [
# { id: 123, name: "DLKFJSDLKFJSLDKFJSDF" },
# { id: 123456, name: "ffff" },
#]
For future reference and people who look at this or find it... Use a gem. I suggest https://github.com/wbailey/command_line_reporter
You typically don't want to use tabs, you want to use spaces and essentially setup your "columns" your self or else you run into these types of problems.
