How to convert YAML to a spreadsheet? - ruby

How can I use Ruby to convert a YAML file to a spreadsheet file while keeping the indentation (nesting) laid out across cells?
The YAML file looks like this:
https://github.com/rails/rails/blob/v2.3.10/activesupport/lib/active_support/locale/en.yml

You haven't clearly stated what you want this spreadsheet to look like, so I can't be specific. In general, you can use the YAML library to read the file into a data structure, convert that data structure into a table (an array of arrays of strings), and then use the CSV library to write it to a file.
require 'yaml'
require 'csv'
yaml_txt = File.read 'input.yaml'
yaml_data = YAML.load yaml_txt
# Placeholder table: replace this with something that converts yaml_data into a 2D array
csv_table = [
  [1, 'hello world', true],
  ['a', 'b', 3.14159, 'c', 2, 3e8],
  [nil, 'another row', 'bla']
]
File.open 'output.csv', 'w' do |f|
  # CSV.generate_line already ends each row with "\n", so join with an empty string
  f.write(csv_table.map { |row| CSV.generate_line(row) }.join)
end
The current example will produce:
1,hello world,true
a,b,3.14159,c,2,300000000.0
,another row,bla
in output.csv.
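One possible way to fill in the conversion step is sketched below. It assumes the YAML is a tree of nested hashes (as in a Rails locale file) and that "keeping the indent" means shifting each key one cell to the right per nesting level. The helper name yaml_to_rows and the small inline sample are made up for illustration; they are not part of any library.

```ruby
require 'yaml'
require 'csv'

# Hypothetical helper: walks a nested Hash and emits one row per key,
# shifting each key right by one (empty) cell per nesting level.
def yaml_to_rows(node, depth = 0, rows = [])
  node.each do |key, value|
    indent = Array.new(depth) # depth nil cells, written as empty CSV fields
    if value.is_a?(Hash)
      rows << indent + [key.to_s]
      yaml_to_rows(value, depth + 1, rows)
    else
      # Assumption: arrays are flattened into a single cell
      cell = value.is_a?(Array) ? value.join(', ') : value
      rows << indent + [key.to_s, cell]
    end
  end
  rows
end

# Small stand-in for the locale file linked in the question
yaml_data = YAML.load(<<~DOC)
  en:
    date:
      formats:
        default: "%Y-%m-%d"
    support:
      array:
        words_connector: ", "
DOC

rows = yaml_to_rows(yaml_data)
csv_text = rows.map { |row| CSV.generate_line(row) }.join
puts csv_text
```

Each nesting level becomes one leading empty cell, so the spreadsheet shows the same staircase shape as the YAML indentation. A real locale file may contain symbols or other types that need extra handling when loading.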
You can then open the CSV file in a spreadsheet application.

I think it's better to join the rows with an empty string rather than "\n", since CSV.generate_line already ends each row with a newline.

Related

Ruby CSV converter, remove all converters?

I have some data I was writing from one CSV to another CSV because I need to do some data manipulation.
I noticed the CSV library has some default converters that are taking my values that look like dates and parsing those into new date strings.
I was wondering if I could remove all converters? I tried using my custom converter, but no matter what I do it seems that the dates keep getting parsed.
Here is my code simplified:
require 'csv'
CSV::Converters[:my_converter] = lambda do |value|
  value
end
CSV.open('new-data.csv', 'w') do |csv|
  data = CSV.read('original-data.csv', :converters => [:my_converter]).each do |row|
    csv << row
  end
end
The value 9/30/14 0:00 is getting changed to 9/30/2014 0:00, for example.
Are you sure that your CSV file doesn't actually contain the 4-digit year? Try looking at puts File.read('original-data.csv')
When I tried this on Ruby 2.1.8, it didn't change the value:
require 'csv'
my_csv_data = 'hello,"9/30/14 0:00",world'
CSV.new(my_csv_data).each do |row|
  puts row.inspect # prints ["hello", "9/30/14 0:00", "world"], as expected
end
By default, CSV does not convert fields into objects: the data in each field is returned as a string unless you explicitly ask for converters. This behavior is different from YAML or JSON, which do convert back to their base types.
Consider this:
require 'csv'
require 'date' # needed for Date.new below
CSV.parse("1,10/1/14,foo") # => [["1", "10/1/14", "foo"]]
All values are strings.
csv = ["foo", 'bar', 1, Date.new(2014, 10, 1)].to_csv # => "foo,bar,1,2014-10-01\n"
Converting an array containing native Ruby objects results in a string of comma-delimited values.
CSV.parse(csv) # => [["foo", "bar", "1", "2014-10-01"]]
Reparsing that string returns the string versions but doesn't attempt to return them to their original types as CSV doesn't have a way of knowing what those were. The developer (you) has to know and do that.
The end-result of all that is that CSV won't change a year from '14' to '2014'. It doesn't know that it's a date, and, because it's not CSV's place to convert to objects, it only splits the fields appropriately and passes the information on to be massaged by the developer.
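To make the point concrete, here is a minimal sketch comparing a parse with no converters, a parse with the built-in :numeric converter, and a parse with a custom converter. The us_date lambda is a hypothetical example that assumes US-style m/d/yy dates; it is not something CSV provides.

```ruby
require 'csv'
require 'date'

line = "1,10/1/14,foo"

# Default: no converters, so every field comes back as a String.
plain = CSV.parse(line)

# Opt in to the built-in :numeric converter; only numeric-looking fields change.
numeric = CSV.parse(line, converters: :numeric)

# Hypothetical custom converter, assuming US-style m/d/yy dates.
# Fields that do not match the format are returned unchanged.
us_date = lambda do |field|
  begin
    Date.strptime(field, '%m/%d/%y')
  rescue ArgumentError
    field
  end
end
dated = CSV.parse(line, converters: [us_date])

p plain
p numeric
p dated
```

Conversion only happens when it is requested, which is why a value such as 9/30/14 should pass through a plain read-and-write loop untouched.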

How to read specific columns of a zipped CSV file

I used the code below to read the contents of a zipped CSV file.
Zip::ZipFile.foreach(file) do |entry|
  istream = entry.get_input_stream
  data = istream.read
  #...
end
It gives me the entire content of the text (CSV) file with headers like below:
NAME AGE GENDER
NAME1 29 MALE
NAME2 30 FEMALE
but I need specific data of the column. For example, I want to display only the names (NAME). Please help me proceed with this.
Though your example shows ZipFile, you're really asking a CSV question. First, you should check the docs in http://www.ruby-doc.org/stdlib-2.0/libdoc/csv/rdoc/CSV.html
You'll find that if you parse your data with the :headers => true option, you'll get a CSV::Table object that knows how to extract a column of data as follows. (For obvious reasons, I wouldn't code it this way -- this is for example only.)
require 'zip' # rubyzip 1.0 or later; older versions use require 'zip/zip' and Zip::ZipFile
require 'csv'
csv_table = nil
Zip::File.foreach("x.csv.zip") do |entry|
  istream = entry.get_input_stream
  data = istream.read
  csv_table = CSV.parse(data, :col_sep => " ", :headers => true)
end
With the data you gave, we need :col_sep => " " since you're using spaces as column separators. But now we can do:
>> csv_table["NAME"] # extract the NAME column
=> ["NAME1", "NAME2"]
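The column lookup can be tried without a zip file at all. The sketch below assumes the unzipped content is the space-separated text shown in the question, one record per line, and uses only the csv standard library; the variable names are illustrative.

```ruby
require 'csv'

# Stand-in for the string returned by istream.read
data = "NAME AGE GENDER\nNAME1 29 MALE\nNAME2 30 FEMALE\n"

table = CSV.parse(data, col_sep: ' ', headers: true)

names = table['NAME']                                     # one whole column
pairs = table.map { |row| row.fields('NAME', 'GENDER') }  # several columns per row

p names
p pairs
```

With headers: true the first line is consumed as the header row, so it does not show up among the values.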
First you can use this for reference:
http://www.ruby-doc.org/stdlib-2.0/libdoc/csv/rdoc/CSV.html
If you have the data in a string, you can do
array = CSV.parse(data)
This would give you an array of arrays, one for each line.
Now if you know that the first column of each line is the name, you can just manipulate that array, e.g.
array.map { |line| line[0] }.join(",") # returns NAME,<name>,<name>,<name> ...

How do I write a TSV file scraper, where "if line contains x, then save"?

I want to open a TSV (tab-separated-value) file, and save specific rows to a new CSV (comma-separated-value) file.
If the row contains 'NLD' in a field with the header 'Actor1Code', I want to save the row to a CSV; if not, I want to iterate to the next row. This is what I have so far, but apparently that is not enough:
require 'csv'
CSV.open("path/to.csv", "wb") do |csv| #csv to save to
  CSV.open('data.txt', 'r', '\t').each do |row| #csv to scrape
    if row['Actor1Code'] == 'NLD'
      csv << row
    else
    end
  end
end
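Are you sure that you're calling CSV.open correctly? The documentation shows that options such as the column separator are passed as a hash after the mode:
CSV.open('data.txt', 'r', col_sep: "\t")
The error you're seeing is probably the result of '\t' being treated as the options hash and indexed with []. Note also that '\t' in single quotes is a literal backslash followed by t, not a tab character, so the separator needs double quotes. To look fields up by header name, as in row['Actor1Code'], you also need to pass headers: true; otherwise each row is a plain array.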
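Putting those pieces together, a minimal in-memory sketch of the filter might look like this. The sample rows and the Actor2Code column are invented for illustration; only the Actor1Code header and the 'NLD' value come from the question.

```ruby
require 'csv'

# Made-up sample standing in for the contents of data.txt
tsv = "Actor1Code\tActor2Code\nNLD\tUSA\nFRA\tNLD\nNLD\tDEU\n"

# headers: true lets rows be indexed by column name;
# col_sep: "\t" (double quotes) is a real tab character.
table = CSV.parse(tsv, col_sep: "\t", headers: true)

filtered = CSV.generate do |csv|
  csv << table.headers
  table.each do |row|
    csv << row if row['Actor1Code'] == 'NLD'
  end
end

puts filtered
```

For a large file on disk, CSV.foreach('data.txt', col_sep: "\t", headers: true) accepts the same options and reads one row at a time instead of loading the whole table.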

How do I make an array of arrays out of a CSV?

I have a CSV file that looks like this:
Jenny, jenny#example.com ,
Ricky, ricky#example.com ,
Josefina josefina#example.com ,
I'm trying to get this output:
users_array = [
['Jenny', 'jenny#example.com'], ['Ricky', 'ricky#example.com'], ['Josefina', 'josefina#example.com']
]
I've tried this:
users_array = Array.new
file = File.new('csv_file.csv', 'r')
file.each_line("\n") do |row|
  puts row + "\n"
  columns = row.split(",")
  users_array.push columns
  puts users_array
end
Unfortunately, in Terminal, this returns:
Jenny
jenny#example.com
Ricky
ricky#example.com
Josefina
josefina#example.com
Which I don't think will work for this:
users_array.each_with_index do |user|
  add_page.form_with(:id => 'new_user') do |f|
    f.field_with(:id => "user_email").value = user[0]
    f.field_with(:id => "user_name").value = user[1]
  end.click_button
end
What do I need to change? Or is there a better way to solve this problem?
Ruby's standard library has a CSV class with an API similar to File's, plus a number of useful methods for working with tabular data. To get the output you want, all you need to do is this:
require 'csv'
users_array = CSV.read('csv_file.csv')
PS - I think you are getting the output you expected with your file parsing as well, but maybe you're thrown off by how it is printing to the terminal. puts behaves differently with arrays, printing each member object on a new line instead of as a single array. If you want to view it as an array, use puts my_array.inspect.
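The difference between puts and inspect on a nested array can be seen in a short sketch; the two-user array here is just sample data based on the question.

```ruby
users_array = [
  ['Jenny', 'jenny#example.com'],
  ['Ricky', 'ricky#example.com']
]

# puts flattens nested arrays and prints one element per line
puts users_array

# inspect returns the array-literal form as a single string
as_text = users_array.inspect
puts as_text

# p is shorthand for printing the inspect form
p users_array
```

So the one-value-per-line output in the terminal does not mean the nesting was lost; it is only how puts displays arrays.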
Assuming that your CSV file actually has a comma between the name and email address on the third line:
require 'csv'
users_array = []
CSV.foreach('csv_file.csv') do |row|
  users_array.push row.delete_if(&:nil?).map(&:strip)
end
users_array
# => [["Jenny", "jenny#example.com"],
#     ["Ricky", "ricky#example.com"],
#     ["Josefina", "josefina#example.com"]]
There may be a simpler way, but what I'm doing there is discarding the nil field created by the trailing comma and stripping the spaces around the email addresses.

How to store a Ruby array into a file?

How to store a Ruby array into a file?
I am not sure what exactly you want, but, to serialize an array, write it to a file and read back, you can use this:
fruits = %w{mango banana apple guava}
=> ["mango", "banana", "apple", "guava"]
serialized_array = Marshal.dump(fruits)
=> "\004\b[\t\"\nmango\"\vbanana\"\napple\"\nguava"
# Marshal output is binary, so open the file in binary mode
File.open('/tmp/fruits_file.txt', 'wb') {|f| f.write(serialized_array) }
=> 33
# read the file back
fruits = Marshal.load(File.open('/tmp/fruits_file.txt', 'rb') {|f| f.read })
=> ["mango", "banana", "apple", "guava"]
There are other alternatives you can explore, like json and YAML.
To just dump the array to a file in the standard [a,b,c] format:
require 'pp'
# Write to the file directly instead of reassigning $stdout, and close it when done
File.open('path/to/file.txt', 'w') { |f| PP.pp(myArray, f) }
That might not be so helpful if you want to read it back. In that case you could use json. It ships with Ruby 1.9 and later; on older versions, install it with gem install json.
require 'rubygems'
require 'json'
File.open('path/to/file.txt', 'w') { |f| f.puts myArray.to_json }
Read it back:
require 'rubygems'
require 'json'
buffer = File.read('path/to/file.txt')
myArray = JSON.parse(buffer)
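As a self-contained check of the JSON round trip, the sketch below writes an array to a temporary directory and reads it back. The file name array.json and the sample values are arbitrary choices for the example.

```ruby
require 'json'
require 'tmpdir'

my_array = ['mango', 'banana', 3, 4.5, nil, true]
restored = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'array.json')

  # Serialize to JSON text and write it in one call
  File.write(path, JSON.generate(my_array))

  # Read the text back and parse it into Ruby objects
  restored = JSON.parse(File.read(path))
end

p restored
```

JSON only preserves strings, numbers, booleans, nil, arrays, and hashes, so symbols and custom objects would not come back as their original types.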
There are multiple ways to dump an array to disk. You need to decide if you want to serialize in a binary format or in a text format.
For binary serialization, you can look at Marshal.
For a text format, you can use JSON, YAML, XML (with rexml, builder, ...), and so on.
Some standard options for serializing data in Ruby:
Marshal
YAML
JSON (built-in as of 1.9, various gems available as well)
(There are other, arguably better/faster implementations of YAML and JSON, but I'm linking to built-ins for a start.)
In practice, I seem to see YAML most often, but that may not be indicative of anything real.
Here's a quick YAML example:
require 'yaml'
config = {"rank" => "Admiral", "name" => "Akbar",
          "wallet_value" => 9, "bills" => [5, 1, 1, 2]}
File.open('store.yml', 'w') {|f| YAML.dump(config, f) }
loaded = File.open('store.yml') {|f| YAML.load(f) }
p loaded
# => {"rank"=>"Admiral", "name"=>"Akbar",
#     "wallet_value"=>9, "bills"=>[5, 1, 1, 2]}
Example: write text_area to a file, where text_area is an array of strings.
File.open('output.txt', 'w') { |f| text_area.each { |line| f.puts line } } # puts ends each string with a newline
Don't forget to do error checking on file operations :)
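A minimal sketch of the line-per-element approach with basic error handling follows. It assumes the strings contain no embedded newlines, writes to a temporary directory, and uses File.readlines with chomp: true (available in Ruby 2.4 and later) to read the array back.

```ruby
require 'tmpdir'

text_area = ['first line', 'second line', 'third line']
read_back = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'output.txt')
  begin
    # puts with an array writes each element on its own line
    File.open(path, 'w') { |f| f.puts(text_area) }

    # chomp: true strips the trailing newline from every line
    read_back = File.readlines(path, chomp: true)
  rescue SystemCallError => e
    # Covers errors such as a missing directory or denied permission
    warn "file operation failed: #{e.message}"
  end
end

p read_back
```

This keeps the file readable in a text editor, at the cost of losing any type information: everything comes back as a string.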
AFAIK, files contain lines, not arrays. When you read a file, the data can then be stored in an array or another data structure. I would be curious to know if there is another way.

Resources