I'm trying to split multiple values in a CSV cell. I can do it correctly when the multiple values are found in a single column only, but I'm having difficulty when they are found in multiple columns. Any guidance will be appreciated.
Here's the sample of the data I'm trying to split:
| Column A               | Column B               |
| Value1, Value2, Value3 | Value3, Value4, Value5 |
| Value6                 | Value7, Value8         |
I'm aiming to have a result like this:
| Column A | Column B |
| Value1 | Value3 |
| Value2 | Value4 |
| Value3 | Value5 |
| Value6 | Value7 |
| Value6 | Value8 |
Here's my code:
require 'csv'

split_a = []
split_b = []

def split_values(value)
  value = value.to_s
  value = value.gsub('/', ',').gsub('|', ',').gsub(' ', ',')
  value.split(',').map(&:strip)
end

source_csv = CSV.read('source_file.csv', headers: true, header_converters: :symbol, liberal_parsing: true).map(&:to_h)

# First pass: split the values in Column A, carrying Column B along unchanged.
source_csv.each do |row|
  column_a = row[:column_a]
  column_b = row[:column_b]
  column_a = split_values(column_a)
  column_a.each do |a_value|
    next if a_value.nil? || a_value.empty?
    split_a << [
      column_a: a_value,
      column_b: column_b
    ]
  end
end

# Second pass: split the Column B value attached to each Column A entry.
split_a.each do |key, _|
  column_a = key[:column_a]
  column_b = key[:column_b]
  column_b = split_values(column_b)
  column_b.each do |b_value|
    next if b_value.nil? || b_value.empty?
    split_b << [
      column_a,
      b_value
    ]
  end
end
There is a special option for defining the column separator, col_sep: '|'; it simplifies the code.
require 'csv'
source_csv = CSV.read('source_file.csv', col_sep: '|', headers: true, header_converters: :symbol, liberal_parsing: true)
split_a = []
split_b = []
# I just assign values to separate arrays, because I am not sure what kind of data structure you want to get at the end.
source_csv.each do |row|
  split_a += row[:column_a].split(',').map(&:strip)
  split_b += row[:column_b].split(',').map(&:strip)
end
# The result
split_a
# => ["Value1", "Value2", "Value3", "Value6"]
split_b
# => ["Value3", "Value4", "Value5", "Value7", "Value8"]
Here is the code:
require 'csv'
source_csv = CSV.read('source_file.csv',
  col_sep: '|',
  headers: true,
  header_converters: lambda { |h| h.strip },
  liberal_parsing: true
)
COLUMN_NAMES = ['Column A', 'Column B']
# Column values
columns = COLUMN_NAMES.map do |col_name|
  source_csv&.map do |row|
    row[col_name]&.split(',')&.map(&:strip)
  end&.flatten(1) || []
end
# repeat the last value in case the number of values in the columns differs:
vals_last_id = columns.map { |col| col.count }.max - 1
columns.each do |col|
  # replace `col.last` with `nil` on the next line if you want to leave the value blank
  col.fill(col.last, col.length..vals_last_id) if col.length <= vals_last_id
end
values = columns[0].zip(*columns[1..-1])
# result:
pp values; 1
# [["Value1", "Value3"],
# ["Value2", "Value4"],
# ["Value3", "Value5"],
# ["Value6", "Value7"],
# ["Value6", "Value8"]]
Generate CSV text with a | (pipe) delimiter instead of a comma:
csv = CSV.new('',
  col_sep: '|',
  headers: COLUMN_NAMES,
  write_headers: true
);
values.each { |row| csv << row };
puts csv.string
# Column A|Column B
# Value1|Value3
# Value2|Value4
# Value3|Value5
# Value6|Value7
# Value6|Value8
Formatted output:
col_val = [COLUMN_NAMES] + values
col_widths = (0..(COLUMN_NAMES.count - 1)).map do |col_id|
  col_val.map { |row| row[col_id]&.length || 0 }.max
end
fmt = "|" + col_widths.map { |w| " %-#{w}s " }.join('|') + "|\n"
col_val.each { |row| printf fmt % row }; 1
# | Column A | Column B |
# | Value1 | Value3 |
# | Value2 | Value4 |
# | Value3 | Value5 |
# | Value6 | Value7 |
# | Value6 | Value8 |
As you want the output to be a CSV file I would suggest that it look like:
Column A|Column B
Value1|Value3
Value2|Value4
Value3|Value5
Value6|Value7
Value6|Value8
rather than
| Column A | Column B |
| Value1 | Value3 |
| Value2 | Value4 |
| Value3 | Value5 |
| Value6 | Value7 |
| Value6 | Value8 |
Enclosing each line with column separators and adding unnecessary spaces makes it unnecessarily difficult to extract the text of interest from the file.
Let's begin by creating the file you are given, though I have modified your example to make it easier to follow what is happening.
str=<<~_
| Column A | Column B |
| A1, A2, A3 | B1, B2, B3 |
| A4 | B4, B5 |
_
IN_NAME = 'in.csv'
OUT_NAME = 'out.csv'
File.write(IN_NAME, str)
#=> 84
See IO::write.[1]
As the structure of this file resembles a CSV file only vaguely, I think it's easiest to read it using ordinary file I/O methods.
header, *body = IO.foreach(IN_NAME, chomp: true).with_object([]) do |line, arr|
  arr << line.gsub(/^\| *| *\|$/, '')
             .split(/ *\| */)
             .flat_map { |s| s.split(/, +/) }
end
(I provide an explanation of this calculation later.) This results in the following:
header
#=> ["Column A", "Column B"]
body
#=> [["A1", "A2", "A3", "B1", "B2", "B3"], ["A4", "B4", "B5"]]
See IO::foreach, Enumerator#with_object and Enumerable#flat_map. Note that foreach without a block returns an enumerator that I have chained to with_object.
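For instance, calling foreach without a block only builds the enumerator; nothing is read until it is iterated:
IO.foreach(IN_NAME, chomp: true)
#=> #<Enumerator: ...>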
At this point it is convenient to compute the number of rows to be written to the output file after the header row.
mx = body.map(&:size).max
#=> 6
Next we need to modify body to make it suitable for writing the output CSV file.
mod_body = Array.new(body.size) do |i|
  Array.new(mx) { |j| body[i][j] || body[i].last }
end.transpose
#=> [["A1", "A4"], ["A2", "B4"], ["A3", "B5"], ["B1", "B5"],
# ["B2", "B5"], ["B3", "B5"]]
See Array::new.
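As a small illustration of the block form, Array.new(3) { |i| i * 2 } #=> [0, 2, 4]. Here the outer Array.new builds one array per row of body and the inner one pads each row out to mx elements before the transpose.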
It is now a simple matter to write the output CSV file.
require 'csv'
CSV.open(OUT_NAME, "wb", col_sep: '|', headers: header, write_headers: true) do |csv|
  mod_body.each { |a| csv << a }
end
See CSV::open.
Lastly, let's look at the file that was written.
puts File.read(OUT_NAME)
displays
Column A|Column B
A1|A4
A2|B4
A3|B5
B1|B5
B2|B5
B3|B5
See IO::read.[1]
To explain the calculations made in
header, *body = IO.foreach(IN_NAME, chomp: true).with_object([]) do |line, arr|
  arr << line.gsub(/^\| *| *\|$/, '')
             .split(/ *\| */)
             .flat_map { |s| s.split(/, +/) }
end
it is easiest to run it with some puts statements inserted.
header, *body = IO.foreach(IN_NAME, chomp: true).with_object([]) do |line, arr|
  puts "line = #{line}"
  puts "arr = #{arr}"
  arr << line.gsub(/^\| *| *\|$/, '')
             .tap { |l| puts " line after gsub = #{l}" }
             .split(/ *\| */)
             .tap { |a| puts " array after split = #{a}" }
             .flat_map { |s| s.split(/, +/) }
             .tap { |a| puts " array after flat_map = #{a}" }
end
#=> [["Column A", "Column B"], ["A1", "A2", "A3", "B1", "B2", "B3"],
# ["A4", "B4", "B5"]]
The following is displayed.
line = | Column A | Column B |
arr = []
line after gsub = Column A | Column B
array after split = ["Column A", "Column B"]
array after flat_map = ["Column A", "Column B"]
line = | A1, A2, A3 | B1, B2, B3 |
arr = [["Column A", "Column B"]]
line after gsub = A1, A2, A3 | B1, B2, B3
array after split = ["A1, A2, A3", "B1, B2, B3"]
array after flat_map = ["A1", "A2", "A3", "B1", "B2", "B3"]
line = | A4 | B4, B5 |
arr = [["Column A", "Column B"], ["A1", "A2", "A3", "B1", "B2", "B3"]]
line after gsub = A4 | B4, B5
array after split = ["A4", "B4, B5"]
array after flat_map = ["A4", "B4", "B5"]
[1] IO methods are commonly invoked on File. That is permissible since File.superclass #=> IO.
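For example, File.foreach(IN_NAME, chomp: true) behaves the same as the IO.foreach call used above, because File inherits IO's class methods.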
Related
I would like to look for a string in a file and output the rest of the line after this string - but without newline etc.
[File content]
[Test]
key1 = value
key2 = value2
[Test2]
key1 = value3
key3 = value4 value5
For example, I would like to read "value" when I search for key1 in the [Test] section. When I search for key3 in the [Test2] section I would like to get "value4 value5" as a string.
My current approach is:
section = "Test"
key = "key1"
text = File.read(path)
# read complete section in value
sect_value = text.scan(/^[section]=(.+)$/).flatten.first
# read specific key from section
value = sect_value.scan(/^key =(.+)$/)
Is there a config parser that ships with Ruby itself (without adding further gems)?
Let's first construct a file.
FNAME = 't'
File.write(FNAME, <<~END
cat
[Test]
key1 = value
key2 = value2
[Test2]
key1 = value3
key3 = value4 value5
dog
END
)
#=> 90
The following method could be used to extract the values of interest.
def extract(section, key)
  rkey_match = /\A *#{key} *= /
  rkey_get   = /\A *#{key} *= *\K.*/
  section_found = false
  File.foreach(FNAME, chomp: true) do |line|
    if line.strip == section
      section_found = true
    elsif section_found
      return line[rkey_get] if line.match?(rkey_match)
    end
  end
  nil
end
extract('[Test]', 'key1')
#=> "value"
extract('[Test2]', 'key3')
#=> "value4 value5"
extract('[Test]', 'key7')
#=> nil
extract('[Test3]', 'key1')
#=> nil
For
section = "[Test2]"
key = "key3"
the regular expressions are computed as follows.
rkey_match = /\A *#{key} *= /
#=> /\A *key3 *= /
rkey_get = /\A *#{key} *= *\K.*/
#=> /\A *key3 *= *\K.*/
In rkey_get, \K loosely means to forget everything matched so far.
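A quick way to see \K in action (the string below is just a sample line, not read from the file):
"  key3 = value4 value5"[/\A *key3 *= *\K.*/]
#=> "value4 value5"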
Depending on requirements, some adjustments may be needed in the construction of the regular expressions.
Is this what you are expecting?
section = ARGV[0]
v_key = ARGV[1]
section_found = false
f = File.open("myinputfile.txt", "r")
f.each_line do |line|
  if line.include? section
    section_found = true
  end
  if section_found
    if line.include? v_key
      puts "\nValues for #{section} - #{v_key} :::::::::"
      puts(line.split('=')[1])
      break # Necessary to prevent reading same keys from subsequent sections
    end
  end
end
f.close()
Output
$ ruby line_after_string_match.rb Test key1
Values for Test - key1 :::::::::
value
$ ruby line_after_string_match.rb Test2 key3
Values for Test2 - key3 :::::::::
value4 value5
I have 2 CSV files, one with columns A, B, C and one with columns D, E, F. I want to join these two CSV files into a new file whose rows satisfy File1.B = File2.E and have the columns A, B/E, C, D, F. How can I achieve this JOIN without using tables?
Givens
We are given the following.
The paths for the two input files:
fname1 = 't1.csv'
fname2 = 't2.csv'
The path for the output file:
fname3 = 't3.csv'
The names of the headers to match in each of the two input files:
target1 = 'B'
target2 = 'E'
I do not assume that the two files necessarily contain the same number of lines (though that is the case in the example).
Create test files
Let's first create the two files:
str = [%w|A B C|, %w|1 1 1|, %w|2 2 2|, %w|3 4 5|, %w|6 9 9|].
map { |a| a.join(",") }.join("\n")
#=> "A,B,C\n1,1,1\n2,2,2\n3,4,5\n6,9,9"
File.write(fname1, str)
#=> 29
str = [%w|D E F|, %w|21 1 41|, %w|22 5 42|, %w|23 8 45|, %w|26 9 239|].
map { |a| a.join(",") }.join("\n")
#=> "D,E,F\n21,1,41\n22,5,42\n23,8,45\n26,9,239"
File.write(fname2, str)
#=> 38
Read the input files into CSV::Table objects
When reading fname1 I will use the :header_converters option to convert the header "B" to "B/E". Note that this does not require knowledge of the location of the column with header "B" (or whatever it may be).
require 'csv'
new_target1 = target1 + "/" + target2
#=> "B/E"
csv1 = CSV.read(fname1, headers: true,
  header_converters: lambda { |header| header == target1 ? new_target1 : header })
csv2 = CSV.read(fname2, headers: true)
Construct arrays of headers to be written from each input file
headers1 = csv1.headers
#=> ["A", "B/E", "C"]
headers2 = csv2.headers - [target2]
#=> ["D", "F"]
Create the output file
We will first write the new headers headers1 + headers2 to the output file.
Next, for each row index i (i = 0 corresponding to the first row after the header row in each file) that satisfies a condition, we write as a single row the elements of csv1[i] and csv2[i] that are in the columns having headers in headers1 and headers2. The condition for writing the row at index i is:
csv1[i][new_target1] == csv2[i][target2] #=> true
Now open fname3 for writing, write the headers and then the body.
CSV.open(fname3, 'w') do |csv|
  csv << headers1 + headers2
  [csv1.size, csv2.size].min.times do |i|
    csv << (headers1.map { |h| csv1[i][h] } +
            headers2.map { |h| csv2[i][h] }) if
      csv1[i][new_target1] == csv2[i][target2]
  end
end
#=> 4
Let's confirm that what was written is correct.
puts File.read(fname3)
A,B/E,C,D,F
1,1,1,21,41
6,9,9,26,239
If you have CSV files like these:
first.csv:
A | B | C
1 | 1 | 1
2 | 2 | 2
3 | 4 | 5
6 | 9 | 9
second.csv:
D | E | F
21 | 1 | 41
22 | 5 | 42
23 | 8 | 45
26 | 9 | 239
You can do something like this:
require 'csv'
first = CSV.read('first.csv')
second = CSV.read('second.csv')
CSV.open("result.csv", "w") do |csv|
csv << %w[A B.E C D F]
first.each do |rowF|
second.each do |rowS|
csv << [rowF[0],rowF[1],rowF[2],rowS[0],rowS[2]] if rowF[1] == rowS[1]
end
end
end
To get this:
result.csv:
A | B.E | C | D | F
1 | 1 | 1 | 21 | 41
6 | 9 | 9 | 26 | 239
The answer is to use group_by to create a hash table and then iterate over the keys of the hash table. Assuming the column you're joining on is unique in each table:
require 'csv'
join_column = :whatever  # CSV.table converts headers to lowercase symbols
csv1 = CSV.table("file1.csv").group_by { |r| r[join_column] }
csv2 = CSV.table("file2.csv").group_by { |r| r[join_column] }
joined_data = csv1.keys.sort.map do |key|
  csv1[key].first.to_h.merge(csv2[key].first.to_h)
end
If the column is not unique in each table, then you need to decide how you want to handle those cases, since there will be more than just the first element in the arrays csv1[key] and csv2[key]. You could do an O(m×n) join as suggested in one of the other answers (i.e. nested map calls), or you could filter or combine them based on some criteria. The choice really depends on your use case.
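For the non-unique case, one hedged sketch is a per-key cross product, pairing every row of csv1 with every row of csv2 that shares the key (this builds on the csv1 and csv2 hashes above and only keeps keys present in both files):
joined_data = (csv1.keys & csv2.keys).sort.flat_map do |key|
  csv1[key].flat_map { |r1| csv2[key].map { |r2| r1.to_h.merge(r2.to_h) } }
end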
OK, I have a hash which contains several properties. I want certain properties of this hash to be added to a CSV file.
Here's what I've written:
require 'csv'
require 'curb'
require 'json'

arr = []

CSV.foreach('test.csv') do |row|
  # result comes from an API request made for each row (not shown here)
  details = []
  details << result['results'][0]['formatted_address']
  result['results'][0]['address_components'].each do |w|
    details << w['short_name']
  end
  arr << details
end

CSV.open('test_result.csv', 'w') do |csv|
  arr.each do |e|
    csv << [e]
  end
end
All works fine apart from the fact that I get each row like so:
["something", "300", "something", "something", "something", "something", "something", "GB", "something"]
As an array, which I do not want. I want each element of the array in a new column. The problem is that I do not know how many items I'll have; otherwise I could do something like this:
CSV.open('test_result.csv', 'w') do |csv|
  arr.each do |e|
    csv << [e[0], e[1], ...]
  end
end
Any ideas?
Change csv << [e] to csv << e.
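The reason is that CSV#<< treats each element of the array it receives as one field, so wrapping e in another array yields a single column holding the whole array's string form. A standalone illustration (not the poster's exact data):
require 'csv'
e = ["something", "300", "GB"]
CSV.generate { |csv| csv << e }    #=> "something,300,GB\n"  (three columns)
CSV.generate { |csv| csv << [e] }  # one quoted column containing e.to_s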
I am trying to parse some output from a query using the mysql2 gem.
Previously, I would use:
response = JSON.parse(response.body)
a = response.map{|s| {label: s['Category'], value: s['count'].to_i} }
Now with the mysql2 query:
results = db.query(sql)
results.map do |row|
puts row
end
Output
{"Category"=>"Food", "count"=>22}
{"Category"=>"Drinks", "count"=>12}
{"Category"=>"Alcohol", "count"=>9}
{"Category"=>"Home", "count"=>7}
{"Category"=>"Work", "count"=>2}
I want to map 'Category' to :label and 'count' to :value, like this:
results = db.query(sql)
results.map do |row|
  {label: row['Category'], value: row['count'].to_i} }
end
Desired Output
{:label=>"Food", :value=>22}
{:label=>"Drinks", :value=>12}
{:label=>"Alcohol", :value=>9}
{:label=>"Home", :value=>7}
{:label=>"Work", :value=>2}
You have two mistakes in your code:
1) You have two closing braces at the end of the line inside the block; the second one is extra:
results.map do |row|
  {label: row['Category'], value: row['count'].to_i} }   # <-- extra } here
end
2) map() returns an array, and you don't save the array anywhere, so Ruby discards it.
records = results.map do |row|
  {label: row['Category'], value: row['count'].to_i }
end
p records
Here's the proof:
mysql> select * from party_supplies;
+----+----------+-------+
| id | Category | count |
+----+----------+-------+
| 1 | Food | 22 |
| 2 | Drinks | 12 |
+----+----------+-------+
2 rows in set (0.00 sec)
require 'mysql2'

client = Mysql2::Client.new(
  host: "localhost",
  username: "root",
  database: "my_db",
)

results = client.query("SELECT * FROM party_supplies")
records = results.map do |row|
  { label: row['Category'], value: row['count'] }
end
p records
--output:--
[{:label=>"Food", :value=>22}, {:label=>"Drinks", :value=>12}]
Note that your output indicates the 'count' field is already an int, so calling to_i() is redundant.
Suppose I have the following code:
#!/usr/bin/env ruby -wKU
require 'yaml'

h = {}
h[[1, "a"]] = "first"
h[[2, "b"]] = "second"
puts h.to_yaml

# case 1 - works fine
h.each do |k, v|
  num, char = k
  puts "key = #{[num, char]}; value = #{v}"
end

# case 2 - works fine
h.each_key do |num, char|
  puts "key = #{[num, char]}; value = #{h[[num, char]]}"
end

# case 3 - Doesn't work
# how can I get all three values in one go?
h.each do |[num, char], v|
  puts "key = #{[num, char]}; value = #{v}"
end
How would I create an iterator where I could get all 3 values (key[0], key[1], value) assigned in the block parameters? Is this even possible?
h.each do |(num, char), v|
  puts "key = #{[num, char]}; value = #{v}"
end
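The parentheses destructure the first block argument (the two-element key array) into num and char while v receives the value, so with the hash above this prints:
key = [1, "a"]; value = first
key = [2, "b"]; value = second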