How do I map one CSV to another with Ruby

I have two CSVs with different headers.
Let's say CSV 1 has headers one, two, three, four and I want to create a CSV with headers five, six, seven, eight.
I'm having a hard time writing the code to open the first CSV and then create the second CSV.
Here is the code I have so far.
require 'csv'
wmj_headers = [
"Project Number",
"Task ID",
"Task Name",
"Status Comment",
"Act Complete",
"Plan Complete",
"Description"]
jir_headers_hash = {
"Summary" => "Task Name",
"Issue key" => "Status Comment",
"Resolved" => "Act Complete",
"Due date" => "Plan Complete",
"Description" => "Description"
}
puts "Enter path to a directory of .csv files"
dir_path = gets.chomp
csv_file_names = Dir["#{dir_path}*.csv"]
csv_file_names.each do |f_path|
base_name = File.basename(f_path, '.csv')
wmj_name = "#{base_name}_wmj.csv"
arr = []
mycount = 0
CSV.open(wmj_name, "wb") do |row|
row << wmj_headers
CSV.foreach(f_path, :headers => true) do |r|
r.headers.each do |value|
if jir_headers_hash[value].nil? == false
arr << r[value]
end
end
end
row << arr
end
end

People tend to overcomplicate things. You don’t need any CSV processing at all to substitute headers.
$ cat /tmp/src.csv
one,two,three
1,2,3
4,5,6
Let’s substitute the headers and stream everything else untouched.
subst = {"one" => "ONE", "two" => "TWO", "three" => "THREE"}
src, dest = %w[/tmp/src.csv /tmp/dest.csv].map { |f| File.new f, "a+" }
headers = src.readline() # read just headers
dest.write(headers.gsub(/\b(#{Regexp.union(subst.keys)})\b/, subst)) # write headers, looking each match up in subst
IO.copy_stream(src, dest, -1, headers.length) # stream the rest
[src, dest].each(&:close)
Check it:
$ cat /tmp/dest.csv
ONE,TWO,THREE
1,2,3
4,5,6
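If the mapping has to be CSV-aware (quoted header names, columns to drop or reorder), a hash-driven version built on the csv library may be easier to adapt. The following is only a sketch: the HEADER_MAP constant, the remap_csv helper and the sample data are hypothetical names made up for illustration, and the example works on strings rather than files to stay self-contained.

```ruby
require 'csv'

# Hypothetical mapping: source header => destination header.
HEADER_MAP = {
  "one" => "five", "two" => "six", "three" => "seven", "four" => "eight"
}.freeze

# Returns the remapped CSV text; columns missing from the mapping are dropped.
def remap_csv(text, mapping)
  CSV.generate do |out|
    out << mapping.values                          # new header row
    CSV.parse(text, headers: true).each do |row|
      out << mapping.keys.map { |old| row[old] }   # values in mapped order
    end
  end
end

source = "one,two,three,four\n1,2,3,4\n5,6,7,8\n"
result = remap_csv(source, HEADER_MAP)
puts result
```

For real files, File.read and File.write (or CSV.foreach and CSV.open) could replace the string handling.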

If you want to substitute CSV column names, here it is:
require 'csv'
# [["one", "two", "three"], ["1", "2", "3"], ["4", "5", "6"]]
csv = CSV.read('data.csv')
# new keys
ks = ['k1', 'k2', 'k3']
# [["k1", "k2", "k3"], ["1", "2", "3"], ["4", "5", "6"]]
k = csv.transpose.each_with_index.map do |x,i|
x[0] = ks[i]
x
end.transpose
# write new file
CSV.open("myfile.csv", "w") do |csv|
k.each do |row|
csv << row
end
end

Related

Searching through two multidimensional arrays and grouping together similar subarrays

I am trying to search through two multidimensional arrays to find any elements in common in a given subarray and then put the results in a third array where the entire subarrays with similar elements are grouped together (not just the similar elements).
The data is imported from two CSVs:
require 'csv'
array = CSV.read('primary_csv.csv')
#=> [["account_num", "account_name", "primary_phone", "second_phone", "status"],
#=> ["11111", "John Smith", "8675309", " ", "active"],
#=> ["11112", "Tina F.", "5551234", "5555678" , "disconnected"],
#=> ["11113", "Troy P.", "9874321", " ", "active"]]
# and so on...
second_array = CSV.read('customer_service.csv')
#=> [["date", "name", "agent", "call_length", "phone", "second_phone", "complaint"],
#=> ["3/1/15", "Mary ?", "Bob X", "5:00", "5551234", " ", "rude"],
#=> ["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309" , "says shes not a customer"]]
# and so on...
If any number is present as an element in a subarray of both primary_csv.csv and customer_service.csv, I want that entire subarray (as opposed to just the common elements) put into a third array, results_array. The desired output based upon the above sample is:
results_array = [["11111", "John Smith", "8675309", " ", "active"],
["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309" , "says shes not a customer"]] # and so on...
I then want to export the array into a new CSV, where each subarray is its own row of the CSV. I intend to iterate over each subarray by joining it with a , to make it comma delimited and then put the results into a new CSV:
results_array.map! { |j| j.join(",") }
File.open("results.csv", "w") {|f| f.puts results_array}
#=> 11111,John Smith,8675309, ,active
#=> 3/2/15,Mrs. Smith,Stew,1:45,9995678,8675309,says shes not a customer
# and so on...
How can I achieve the desired output? I am aware that the final product will look messy because similar data (for example, phone number) will be in different columns. But I need to find a way to generally group the data together.
Suppose a1 and a2 are the two arrays (excluding header rows).
Code
def combine(a1, a2)
h2 = a2.each_with_index
.with_object(Hash.new { |h,k| h[k] = [] }) { |(arr,i),h|
arr.each { |e| es = e.strip; h[es] << i if number?(es) } }
a1.each_with_object([]) do |arr, b|
d = arr.each_with_object([]) do |str, d|
s = str.strip
d.concat(a2.values_at(*h2[s])) if number?(s) && h2.key?(s)
end
b << d.uniq.unshift(arr) if d.any?
end
end
def number?(str)
str =~ /^\d+$/
end
Example
Here is your example, modified somewhat:
a1 = [
["11111", "John Smith", "8675309", "", "active" ],
["11112", "Tina F.", "5551234", "5555678", "disconnected"],
["11113", "Troy P.", "9874321", "", "active" ]
]
a2 = [
["3/1/15", "Mary ?", "Bob X", "5:00", "5551234", "", "rude"],
["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309", "surly"],
["3/7/15", "Cher", "Sonny", "7:45", "9874321", "8675309", "Hey Jude"]
]
combine(a1, a2)
#=> [[["11111", "John Smith", "8675309", "",
# "active"],
# ["3/2/15", "Mrs. Smith", "Stew", "1:45",
# "9995678", "8675309", "surly"],
# ["3/7/15", "Cher", "Sonny", "7:45",
# "9874321", "8675309", "Hey Jude"]
# ],
# [["11112", "Tina F.", "5551234", "5555678",
# "disconnected"],
# ["3/1/15", "Mary ?", "Bob X", "5:00",
# "5551234", "", "rude"]
# ],
# [["11113", "Troy P.", "9874321", "",
# "active"],
# ["3/7/15", "Cher", "Sonny", "7:45",
# "9874321", "8675309", "Hey Jude"]
# ]
# ]
Explanation
First, we define a helper:
def number?(str)
str =~ /^\d+$/
end
For example:
number?("8675309") #=> 0 (truthy)
number?("3/1/15") #=> nil
Now index a2 on the values that represent numbers:
h2 = a2.each_with_index
.with_object(Hash.new { |h,k| h[k] = [] }) { |(arr,i),h|
arr.each { |e| es = e.strip; h[es] << i if number?(es) } }
#=> {"5551234"=>[0], "9995678"=>[1], "8675309"=>[1, 2], "9874321"=>[2]}
This says, for example, that the "numeric" field "8675309" is contained in elements at offsets 1 and 2 of a2 (i.e., for Mrs. Smith and Cher).
We can now simply run through the elements of a1 looking for matches.
The code:
arr.each_with_object([]) do |str, d|
s = str.strip
d.concat(a2.values_at(*h2[s])) if number?(s) && h2.key?(s)
end
steps through the elements of arr, assigning each to the block variable str. For example, if arr holds the first element of a1, str will in turn equal "11111", "John Smith", and so on. After s = str.strip, this says that if s has a numerical representation and there is a matching key in h2, the (initially empty) array d is concatenated with the elements of a2 given by the value of h2[s].
After completing this loop we see if d contains any elements of a2:
b << d.uniq.unshift(arr) if d.any?
If it does, we remove duplicates, prepend the array with arr and save it to b.
Note that this allows one element of a2 to match multiple elements of a1.
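For the export step mentioned in the question, one possible approach is to let the csv library write the rows, so that quoting and embedded commas are handled for you. This is a minimal sketch, assuming the grouped result has the nested shape returned by combine; the groups literal below is hypothetical sample data, and CSV.generate is used instead of a file so the example stays self-contained.

```ruby
require 'csv'

# Hypothetical grouped result, shaped like the return value of combine:
# each group is the a1 row followed by its matching a2 rows.
groups = [
  [["11112", "Tina F.", "5551234", "5555678", "disconnected"],
   ["3/1/15", "Mary ?", "Bob X", "5:00", "5551234", "5550000", "rude"]]
]

# Write every row of every group as its own CSV line.
csv_text = CSV.generate do |csv|
  groups.each do |group|
    group.each { |row| csv << row }
  end
end
puts csv_text
```

Swapping CSV.generate for CSV.open("results.csv", "w") would write the same rows to disk.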

Convert an Array of Strings to a Hash in Ruby

I have an Array that contains strings:
["First Name", "Last Name", "Location", "Description"]
I need to convert the Array to a Hash, as in the following:
{"A" => "First Name", "B" => "Last Name", "C" => "Location", "D" => "Description"}
Also, this way too:
{"First Name" => "A", "Last Name" => "B", "Location" => "C", "Description" => "D"}
Any thoughts how to handle this the best way?
You could implement as follows
def string_array_to_hash(a=[],keys=false)
headers = ("A".."Z").to_a
Hash[keys ? a.zip(headers.take(a.count)) : headers.take(a.count).zip(a)]
end
Then to get your initial output it would be
a = ["First Name", "Last Name", "Location", "Description"]
string_array_to_hash a
#=> {"A"=>"First Name", "B"=>"Last Name", "C"=>"Location", "D"=>"Description"}
And second output is
a = ["First Name", "Last Name", "Location", "Description"]
string_array_to_hash a, true
#=> {"First Name"=>"A", "Last Name"=>"B", "Location"=>"C", "Description"=>"D"}
Note: this works only while a has at most 26 elements; beyond that you will have to specify a different desired output, because a) the alphabet has only 26 letters and b) Hash keys must be unique.
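If spreadsheet-style labels ("AA", "AB", and so on) are acceptable past "Z", a string range can supply them. The sketch below is an assumption about what a suitable "different desired output" might be, not part of the answer above; label_columns and its names_as_keys keyword are hypothetical names.

```ruby
# Sketch: label up to 702 columns as "A".."Z", then "AA".."ZZ".
def label_columns(names, names_as_keys: false)
  labels = ("A".."ZZ").first(names.size)
  names_as_keys ? names.zip(labels).to_h : labels.zip(names).to_h
end

names = (1..28).map { |i| "Column #{i}" }
labeled = label_columns(names)
p labeled.keys.last(3)   #=> ["Z", "AA", "AB"]
```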
You could do this:
arr = ["First Name", "Last Name", "Location", "Description"]
letter = Enumerator.new do |y|
l = ('A'.ord-1).chr
loop do
y.yield l=l.next
end
end
#=> #<Enumerator: #<Enumerator::Generator:0x007f9a00878fd8>:each>
h = arr.each_with_object({}) { |s,h| h[letter.next] = s }
#=> {"A"=>"First Name", "B"=>"Last Name", "C"=>"Location", "D"=>"Description"}
h.invert
#=> {"First Name"=>"A", "Last Name"=>"B", "Location"=>"C", "Description"=>"D"}
or
letter = ('A'.ord-1).chr
#=> "@"
h = arr.each_with_object({}) { |s,h| h[letter = letter.next] = s }
#=> {"A"=>"First Name", "B"=>"Last Name", "C"=>"Location", "D"=>"Description"}
When using the enumerator letter, we have
27.times { puts letter.next }
#=> "A"
# "B"
# ...
# "Z"
# "AA"
If you don't need specific key names, you could try this:
list = ["First Name", "Last Name", "Location", "Description"]
Hash[list.map.with_index{|*x|x}].invert
Output
{0=>"First Name", 1=>"Last Name", 2=>"Location", 3=>"Description"}
A similar solution is here.
Or you can also try this:
letter = 'A'
arr = ["First Name", "Last Name", "Location", "Description"]
hash = {}
arr.each { |i|
hash[i] = letter
letter = letter.next
}
# => {"First Name"=>"A", "Last Name"=>"B", "Location"=>"C", "Description"=>"D"}
or
letter = 'A'
arr = ["First Name", "Last Name", "Location", "Description"]
hash = {}
arr.each { |i|
hash[letter] = i
letter = letter.next
}
# => {"A"=>"First Name", "B"=>"Last Name", "C"=>"Location", "D"=>"Description"}

How to replace CSV headers

If using the 'csv' library in ruby, how would you replace the headers without re-reading in a file?
foo.csv
'date','foo','bar'
1,2,3
4,5,6
Using a CSV::Table because of this answer
Here is a working solution, however it requires writing and reading from a file twice.
require 'csv'
@csv = CSV.table('foo.csv')
# Perform additional operations, like remove specific pieces of information.
# Save fixed csv to a file (with incorrect headers)
File.open('bar.csv','w') do |f|
f.write(@csv.to_csv)
end
# New headers
new_keywords = ['dur','hur', 'whur']
# Reopen the file, replace the headers, and print it out for debugging
# Not sure how to replace the headers of a CSV::Table object, however I *can* replace the headers of an array of arrays (hence the file.open)
lines = File.readlines('bar.csv')
lines.shift
lines.unshift(new_keywords.join(',') + "\n")
puts lines.join('')
# TODO: re-save file to disk
How could I modify the headers without reading from disk twice?
'dur','hur','whur'
1,x,3
4,5,x
Update
For those curious, here is the unabridged code. In order to use things like delete_if() the CSV must be imported with the CSV.table() function.
Perhaps the headers could be changed by converting the csv table into an array of arrays, however I'm not sure how to do that.
Given a test.csv file whose contents look like this:
id,name,age
1,jack,8
2,jill,9
You can replace the header row using this:
require 'csv'
array_of_arrays = CSV.read('test.csv')
p array_of_arrays # => [["id", "name", "age"],
# => ["1", "jack", "8"],
# => ["2", "jill", "9"]]
new_keywords = ['dur','hur','whur']
array_of_arrays[0] = new_keywords
p array_of_arrays # => [["dur", "hur", "whur"],
# => ["1", "jack", "8"],
# => ["2", "jill", "9"]]
Or if you'd rather preserve your original two-dimensional array:
new_array = Array.new(array_of_arrays)
new_array[0] = new_keywords
p new_array # => [["dur", "hur", "whur"],
# => ["1", "jack", "8"],
# => ["2", "jill", "9"]]
p array_of_arrays # => [["id", "name", "age"],
# => ["1", "jack", "8"],
# => ["2", "jill", "9"]]
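If you would rather stay with a CSV::Table (for example to keep using delete_if), one option is to build a new table whose rows carry the new headers, without touching the disk again. This is a sketch under assumptions: the sample data is parsed from a string instead of foo.csv, and the variable names are illustrative.

```ruby
require 'csv'

# Parse into a CSV::Table (string input here; CSV.table('foo.csv') would read a file).
table = CSV.parse("date,foo,bar\n1,2,3\n4,5,6\n", headers: true)

new_keywords = ['dur', 'hur', 'whur']

# Rebuild each row with the new headers and the original field values.
renamed = CSV::Table.new(
  table.map { |row| CSV::Row.new(new_keywords, row.fields) }
)

puts renamed.to_csv
```

The original table is left unchanged, and renamed.to_csv can be written to disk once at the end.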

How to use a Hash before defining it in Ruby

I have a Ruby script with a very long Hash with more than 300 associations.
The script looks like this:
#!/usr/bin/env ruby
Array_A = []
myHash = {
"x1" => "2",
"x2" => "0",
"x3" => "1",
.
.
.
"X350" => "1"
}
myHash.keys.each do |z|
Array_A << "This is key " + z
end
puts myHash.values.join("|")
puts Array_A.join("|")
But since the Hash is very large, for reading purposes I'd like to put the Hash at the end of the script and the each loop and puts command first, something like this:
Array_A = []
myHash.keys.each do |z|
Array_A << "This is key " + z
end
puts myHash.values.join("|")
puts Array_A.join("|")
myHash = {
"x1" => "2",
"x2" => "0",
"x3" => "1",
.
.
.
"X350" => "1"
}
Is there a way to do this?
It's a little bit weird, but this is basically what DATA is for. The catch is that it's a file containing the contents of the section after __END__, so you'll need to go from that to a hash. So something like:
Array_A = []
myHash = eval DATA.read
myHash.keys.each do |z|
Array_A << "This is key " + z
end
puts myHash.values.join("|")
puts Array_A.join("|")
__END__
{
"x1" => "2",
"x2" => "0",
"x3" => "1",
…
"X350" => "1"
}
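An alternative that avoids eval is to put the logic in a method near the top and call it on the last line, after the hash has been defined. This is only a sketch of that idea; report and MY_HASH are hypothetical names, and the hash is shortened to three sample entries.

```ruby
# Sketch: the logic lives in a method, so it can sit above the data.
def report(hash)
  labels = hash.keys.map { |key| "This is key " + key }
  [hash.values.join("|"), labels.join("|")]
end

# The long hash sits at the bottom of the file (shortened sample here).
MY_HASH = {
  "x1" => "2",
  "x2" => "0",
  "x3" => "1"
}

# The call comes last, once everything above has been defined.
values_line, labels_line = report(MY_HASH)
puts values_line
puts labels_line
```

This works because a method body is only evaluated when the method is called, not when it is defined.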

Sequentially parse array to hash in Ruby

I have an array that looks like this:
array = [
"timestamp 1",
"data 1",
"data 2",
"data 3",
"timestamp 2",
"data 1",
"timestamp 3",
".."
]
etc
I want to loop through my array, and turn it into a hash data structure that looks like:
hash = {
"timestamp 1" => [ "data 1", "data 2", "data 3" ],
"timestamp 2" => [ "data 1" ],
}
I can't figure out a good "rubyish" way of doing it. I'm looping through the array, and I just quite can't seem to figure out how to keep track of where I am at, and assign to the hash as needed.
# Let's comb through the array, and map the time value to the subsequent lines beneath
array.each do |e|
if timestamp?(e)
hash["#{e}"] == nil
else
# last time stamp here => e
end
EDIT: Here is the timestamp? method
def timestamp?(string)
begin
return true if string =~ /[a-zA-Z][a-z][a-z]\s[a-zA-Z][a-z][a-z]\s\d\d\s\d\d:\d\d:\d\d\s\d\d\d\d/
false
rescue => msg
puts "Error in timestamp? => #{msg}"
exit
end
end
array = [
"timestamp 1",
"data 1",
"data 2",
"data 3",
"timestamp 2",
"data 1",
"timestamp 3",
"data 2"
]
hsh = {}
ary = []
array.each do |line|
if line.start_with?("timestamp")
ary = Array.new
hsh[line] = ary
else
ary << line
end
end
puts hsh.inspect
I would do as below:
array = [
"timestamp 1",
"data 1",
"data 2",
"data 3",
"timestamp 2",
"data 1",
]
Hash[array.slice_before{|i| i.include? 'timestamp'}.map{|a| [a.first,a[1..-1]]}]
# => {"timestamp 1"=>["data 1", "data 2", "data 3"], "timestamp 2"=>["data 1"]}
Hash[array.slice_before{|e| e.start_with?("timestamp ")}.map{|k, *v| [k, v]}]
Output
{
"timestamp 1" => [
"data 1",
"data 2",
"data 3"
],
"timestamp 2" => ["data 1"],
"timestamp 3" => [".."]
}
You can keep track of the last hash key using an outside variable. It will be persisted across all iterations:
h = {}
last_group = nil
array.each do |e|
if timestamp?(e)
h[e] = []
last_group = e
else
h[last_group] << e
end
end
last_timestamp = nil
array.reduce(Hash.new(){|hsh,k| hsh[k]=[]}) do |hsh, m|
if m =~ /timestamp/
last_timestamp = m
else
hsh[last_timestamp] << m
end
hsh
end
hash = (Hash.new { |this, key| this[key] = [] } ).tap do |hash|
current_timestamp = nil
array.each do |element|
current_timestamp = element if timestamp? element
hash[current_timestamp] << element unless timestamp? element
end
end
Using an outside variable to keep track of the current timestamp, but wrapping it in a closure to avoid polluting the namespace.
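The tracking-variable approaches above assume the array starts with a timestamp. A variant that tolerates stray data lines before the first timestamp could look like the sketch below; the timestamp? stand-in (a simple prefix check rather than the question's date regex), the group_by_timestamp name and the sample lines are all hypothetical.

```ruby
# Hypothetical stand-in; the question's real timestamp? uses a date regex.
def timestamp?(line)
  line.start_with?("timestamp")
end

def group_by_timestamp(lines)
  current = nil
  lines.each_with_object({}) do |line, groups|
    if timestamp?(line)
      current = line
      groups[current] = []       # start a new group for this timestamp
    elsif current                # skip data lines that precede any timestamp
      groups[current] << line
    end
  end
end

lines = ["orphan", "timestamp 1", "data 1", "data 2", "timestamp 2", "data 1", "timestamp 3"]
grouped = group_by_timestamp(lines)
p grouped
```

Because current is local to the method, nothing leaks into the surrounding namespace, and a timestamp with no data lines still gets an empty array.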
I know this has already been answered, but there are so many ways to do this.
I prefer these two ways; they might not be fast, but I find them readable:
my_hash = Hash.new
array.slice_before(/timestamp/).each do |array|
key, *values = array
my_hash[key] = values
end
or
one_liner = Hash[array.slice_before(/timestamp/).map{|x|[x.shift, x]}]
