How to extract multiple columns from a csv file with ruby? - ruby

Right now I can extract 1 column (column 6) from the csv file. How could I edit the script below to extract more than 1 column? Let's say I also want to extract columns 9 and 10 as well as 6. I want column 6 to end up in the 1st column of the output file, column 9 in the 2nd column, and column 10 in the 3rd column.
ruby -rcsv -e 'CSV.foreach(ARGV.shift) {|row| puts row [5]}' input.csv &> output.csv

Since row is an array, your question boils down to how to pick certain elements from an array; this is not related to CSV.
You can use values_at:
row.values_at(5,6,9,10)
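returns the fields at indices 5, 6, 9 and 10. Indices are zero-based, so columns 6, 9 and 10 from the question correspond to indices 5, 8 and 9.
If you want to present the picked fields in a different order, it may be easier to assign each index explicitly: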
output_row = Array.new(row.size) # Or row.dup, depending on your needs
output_row[1] = row[6]
# Or, if you have used row.dup and want to swap two fields:
output_row[1],output_row[6] = row[6],row[1]
# and so on
out_csv.puts(output_row)
This assumes that you have previously defined
out_csv=CSV.new(STDOUT)
since you want your new CSV to be written to standard output.
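As a minimal sketch of the values_at approach applied to the question's columns: the sample row below is made up for illustration, and the index lists assume the zero-based mapping of columns 6, 9 and 10 to indices 5, 8 and 9. Note that values_at returns the fields in the order the indices are listed, so simple reordering may not need explicit assignment at all.

```ruby
require 'csv'

# Made-up sample row with ten fields, standing in for one parsed CSV row.
row = %w[c1 c2 c3 c4 c5 c6 c7 c8 c9 c10]

# Columns 6, 9 and 10 sit at zero-based indices 5, 8 and 9.
picked = row.values_at(5, 8, 9)

# Listing the indices in another order reorders the output fields.
reordered = row.values_at(9, 5, 8)

# Write both rows as CSV lines to standard output.
out_csv = CSV.new($stdout)
out_csv << picked
out_csv << reordered
```

Run as a script, this should print the two lines c6,c9,c10 and c10,c6,c9.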

Let's first create a (header-less) CSV file:
require 'csv'
enum = 1.step
FNameIn = 't_in.csv'
CSV.open(FNameIn, "wb") { |csv| 3.times { csv << 5.times.map { enum.next } } }
#=> 3
I've assumed the file contains string representations of integers.
The file contains the three lines:
File.read(FNameIn).each_line { |line| p line }
"1,2,3,4,5\n"
"6,7,8,9,10\n"
"11,12,13,14,15\n"
Now let's extract the columns at indices 1 and 3. These columns are to be written to the output file in that order.
cols = [1, 3]
Now read the input file, keep only those columns, and write them to the CSV output file.
arr = CSV.read(FNameIn, converters: :integer).
map { |row| row.values_at(*cols) }
#=> [[2, 4], [7, 9], [12, 14]]
FNameOut = 't_out.csv'
CSV.open(FNameOut, 'wb') { |csv| arr.each { |row| csv << row } }
We have written three lines:
File.read(FNameOut).each_line { |line| p line }
"2,4\n"
"7,9\n"
"12,14\n"
which we can read back into an array:
CSV.read(FNameOut, converters: :integer)
#=> [[2, 4], [7, 9], [12, 14]]
Only a straightforward adaptation of this code is needed to run these operations from the command line.
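One possible adaptation, sketched under assumptions: the helper name extract_columns, the script name extract_cols.rb and the argument order are all hypothetical, not part of the answer above. Unlike CSV.read, this version streams the input row by row with CSV.foreach, which may matter for large files.

```ruby
require 'csv'

# Hypothetical helper: copies the columns at the given zero-based
# indices from in_path to out_path, one row at a time.
def extract_columns(in_path, out_path, cols)
  CSV.open(out_path, 'w') do |out|
    CSV.foreach(in_path) { |row| out << row.values_at(*cols) }
  end
end

# Command-line entry point (assumed usage):
#   ruby extract_cols.rb t_in.csv t_out.csv 1 3
if __FILE__ == $PROGRAM_NAME && ARGV.size >= 3
  in_path, out_path, *cols = ARGV
  extract_columns(in_path, out_path, cols.map { |c| Integer(c) })
end
```

Here the fields are copied as strings, so no converters are needed.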

Related

ruby - create CSV::Table from 2d array

I have a 2D array... is there any way to create a CSV::Table with the first row treated as headers, assuming all rows have the same number of columns?
You can create a CSV::Table object with headers from a 2D array using CSV.parse.
First convert your 2D array to a string in which the values in each row are joined by a comma and the rows are joined by a newline, then pass that string to CSV.parse along with the headers: true option. Note that this simple join only works if no value contains a comma, a quote or a newline.
require 'csv'
sample_array = [
["column1", "column2", "column3"],
["r1c1", "r1c2", "r1c3"],
["r2c1", "r2c2", "r2c3"],
["r3c1", "r3c2", "r3c3"],
]
csv_data = sample_array.map {_1.join(",")}.join("\n")
table = CSV.parse(csv_data, headers: true)
p table
p table.headers
p table[0]
p table[1]
p table[2]
=>
#<CSV::Table mode:col_or_row row_count:4>
["column1", "column2", "column3"]
#<CSV::Row "column1":"r1c1" "column2":"r1c2" "column3":"r1c3">
#<CSV::Row "column1":"r2c1" "column2":"r2c2" "column3":"r2c3">
#<CSV::Row "column1":"r3c1" "column2":"r3c2" "column3":"r3c3">
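If the values may contain commas or quotes, an alternative sketch is to skip the string round trip and build the table from CSV::Row objects directly. The sample data is adapted from the array above, with a comma added to one value purely for illustration.

```ruby
require 'csv'

sample_array = [
  ["column1", "column2", "column3"],
  ["r1c1", "r1,c2", "r1c3"], # embedded comma, which a plain join(",") would split
  ["r2c1", "r2c2", "r2c3"],
]

# The first row supplies the headers; the remaining rows are data.
headers, *rows = sample_array
table = CSV::Table.new(rows.map { |fields| CSV::Row.new(headers, fields) })

p table.headers        # => ["column1", "column2", "column3"]
p table[0]["column2"]  # => "r1,c2"
```

Because no CSV text is parsed here, the values are stored exactly as given.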
Below is a basic example of creating a CSV file in Ruby:
hash = {a: [1, 2, 3], b: [4, 5, 6]}
require 'csv'
CSV.open("my_file.csv", "wb") do |csv|
csv << %w(header1 header2 header3)
hash.each_value do |array|
csv << array
end
end

How to write a CSV without headers

I have a file that looks like this:
milk 7,dark 0,white 0,sugar free 1
milk 0,dark 3,white 0,sugar free 0
milk 0,dark 3,white 0,sugar free 5
milk 0,dark 1,white 5,sugar free 3
There are no headers in that CSV. However, every time I open this file in a program like Numbers, it looks like this:
Is this a problem with how I wrote data to the CSV or a problem with the CSV program... it probably is because the choice of delimiter is wrong.
My code when writing looks like this:
def self.write(output_path:, data:, write_headers: false)
CSV.open(output_path, "w", write_headers: write_headers) do |csv|
data.each do |row|
csv << row
end
end
end
What does the write_headers part even do?
Is this a problem with how I wrote data to the CSV or a problem with the CSV program... it probably is because the choice of delimiter is wrong.
Your CSV data is perfectly fine. CSV is only loosely standardized and there's no concept of marking a header row as such within the file. It cannot be distinguished from any other row, syntax-wise.
What does the write_headers part even do?
It's used together with headers. From the docs for CSV.new:
:write_headers
When true and :headers is set, a header row will be added to the output.
Example:
require 'csv'
CSV.generate(headers: %w[a b c], write_headers: true) do |csv|
csv << [1, 2, 3]
csv << [4, 5, 6]
end
#=> "a,b,c\n1,2,3\n4,5,6\n"
as opposed to:
CSV.generate(headers: %w[a b c], write_headers: false) do |csv|
csv << [1, 2, 3]
csv << [4, 5, 6]
end
#=> "1,2,3\n4,5,6\n"
write_headers adds a header row to the output when it is set to true. I think setting headers: false in your options hash will fix your issue (when headers is set to true, the first line of the CSV is treated as headers).
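To illustrate the point for the writer in the question, here is a small sketch (the data is a shortened, made-up version of the question's file): when :headers is not given, :write_headers has nothing to write, so the flag should make no difference to the output.

```ruby
require 'csv'

# Shortened sample data in the style of the question.
data = [["milk 7", "dark 0"], ["milk 0", "dark 3"]]

# No :headers option is passed, so :write_headers has no header row to emit.
with_flag    = CSV.generate(write_headers: true)  { |csv| data.each { |row| csv << row } }
without_flag = CSV.generate(write_headers: false) { |csv| data.each { |row| csv << row } }

p with_flag == without_flag  # => true

# Parsing with the default headers: false keeps the first line as ordinary data.
p CSV.parse(with_flag).first # => ["milk 7", "dark 0"]
```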

Exporting unequal number of array output to a file using Ruby

I have keys and data [sic] as follows, which I need to export in a text file.
keys = %w[ID No time]
Data = ["a", ["1", "2", "3", "4"], 20]
My desired output is:
ID No time
a 1 20
a 2 20
a 3 20
a 4 20
I had attempted the following code so far:
File.open('test1.txt', 'w') {|f| f.write Data.join("\t")}
But it doesn't show my desired output.
Any direction regarding this would be highly appreciated.
Update:
Just extending the question:
If the keys are the same and there are several blocks of data (Data1, Data2, Data3, ...), how can I efficiently concatenate them and export the total output to a text file?
Data1 = [a, [1, 2, 3, 4], 20]
Data2 = [b,[5,6,7,8],8]
Data3 =[c,[9,10,11,13],10]
require 'csv'
keys = %w(ID No time)
data = ['a', [1, 2, 3, 4], 20]
id, numbers, time = data
CSV.open('test1.txt', 'w', headers: keys, write_headers: true, col_sep: "\t") do |csv|
numbers.each do |number|
csv << [id, number, time]
end
end
Without using csv library:
keys = %w[ID No time]
data = ["a", ["1", "2", "3", "4"], 20]
File.open('test1.txt', 'w') do |file|
file.write(keys.join("\t")+"\n")
data[1].each { |x| file.write("#{data[0]}\t#{x}\t#{data[2]}\n") }
end
For multiple data blocks, collect them in one array:
data_array = []
data_array << data1
data_array << data2
data_array << data3
.....
which results in data_array being:
data_array = [['a', [1, 2, 3, 4], 20], ['b',[5,6,7,8],8], ['c',[9,10,11,13],10]]
File.open('test1.txt', 'w') do |file|
file.write(keys.join("\t")+"\n")
data_array.each do |data|
data[1].each { |x| file.write("#{data[0]}\t#{x}\t#{data[2]}\n") }
end
end
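The same multi-block export could also be sketched with the csv library, combining the tab-separated options from the first answer with the data_array from this one. The flat_map step and the variable names rows and text are illustrative choices, not part of either answer.

```ruby
require 'csv'

keys = %w[ID No time]
data_array = [
  ['a', [1, 2, 3, 4], 20],
  ['b', [5, 6, 7, 8], 8],
  ['c', [9, 10, 11, 13], 10],
]

# Expand each [id, numbers, time] block into one output row per number.
rows = data_array.flat_map do |id, numbers, time|
  numbers.map { |number| [id, number, time] }
end

# Build tab-separated text with a header row, then write it out in one go.
text = CSV.generate(headers: keys, write_headers: true, col_sep: "\t") do |csv|
  rows.each { |row| csv << row }
end
File.write('test1.txt', text)
```

Letting the library do the joining also takes care of quoting if a value ever contains a tab or a quote.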

How to remove some items from a big array

I want to find the files that don't exist yet, so I wrote the following code:
MAX_ID = 43148178
def extract_ids
done = Dir['res/*.html'].map {|name| name[/\d+/].to_i}
all = (1..MAX_ID).to_a
all.delete_if { |i| done.include?(i) }
all.shuffle
end
ls res | wc -l returns 35854.
I find that this is slow. How do I do this effectively?
If 'done' is an array of items you wish to remove from the 'all' array, you can simply do this:
all = [1,2,3,4,5,6,7,8,9,10]
done = [1,3,5]
all - done
# => [2, 4, 6, 7, 8, 9, 10]
Or, since you want to change the all array itself:
all -= done
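Applied to the question's method, this might look like the sketch below. The name missing_ids and its parameters are hypothetical; the file names are passed in as an argument so the example runs without a res/ directory. As far as I know, Array#- looks elements up by hash rather than scanning the right-hand array for every element, which is what makes it faster than calling done.include? inside delete_if.

```ruby
# Hypothetical variant of extract_ids: takes the file names and the
# maximum id as arguments instead of reading Dir['res/*.html'].
def missing_ids(file_names, max_id)
  done = file_names.map { |name| name[/\d+/].to_i }
  # Array difference removes every id that already has a file.
  ((1..max_id).to_a - done).shuffle
end

names = ['res/2.html', 'res/5.html']
p missing_ids(names, 6).sort # sorted only for display => [1, 3, 4, 6]
```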

How to parse output of array.inspect back into an array

I want to store multidimensional arrays in text files and reload them efficiently. The tricky part is that the array includes strings which could look like " ] , [ \\\"" or anything.
Easiest way of writing the table to file is just as my_array.inspect (right?)
How do I then recreate the array as quickly and painlessly as possible from a string read back from the text file that might look like "[\" ] , [ \\\\\\\"\"]" (as in the above case)?
If your array only includes objects that are literally written such as Numerals, Strings, Arrays, Hashes, you can use eval.
a = [1, 2, 3].inspect
# => "[1, 2, 3]"
eval(a)
# => [1, 2, 3]
In my opinion, this sounds like too much trouble. Use YAML instead.
require 'yaml'
a = [ [ [], [] ], [ [], [] ] ]
File.open("output.yml", "w") do |f|
f.write a.to_yaml
end
b = YAML.load_file('output.yml')
As an alternative, you could use JSON instead.
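A minimal sketch of the JSON route, assuming the array holds only strings, numbers, nil, booleans, arrays and hashes (other objects, such as symbols, would not round-trip unchanged). The file name output.json and the sample data are made up for illustration.

```ruby
require 'json'

# Sample data including an awkward string like the one in the question.
ary = [["plain", " ] , [ \\\""], [1, 2.5, nil]]

path = 'output.json'

# Serialize the array to JSON text and write it out.
File.write(path, JSON.generate(ary))

# Read the text back and parse it into a new array.
restored = JSON.parse(File.read(path))

p restored == ary # => true
```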
Say you have an array ary.
You could write the array to a file (Marshal produces binary data, so open the file in binary mode):
File.open(path, 'wb') { |f| f.write Marshal.dump(ary) }
and then re-create the array by reading the file back in binary mode and saying
ary = Marshal.load(File.binread(path))

Resources