I'm trying to create a converter to remove newline characters from CSV output.
I've got:
nonewline = lambda do |s|
  s.gsub(/(\r?\n)+/, ' ')
end
I've verified that this works properly IF I load a variable and then run something like:
csv = CSV(variable, :converters => [nonewline])
However, I'm attempting to use this code to update a bunch of preexisting code using CSV.generate, and it does not appear to work at all.
CSV.generate(:converters => [nonewline]) do |csv|
  csv << ["hello\ngoodbye"]
end
returns:
"\"hello\ngoodbye\"\n"
I've tried quite a few things as well as trying other examples I've found online, and it appears as though :converters has no effect when used with CSV.generate.
Is this correct, or is there something I'm missing?
You need to write your converter as below:
CSV::Converters[:nonewline] = lambda do |s|
  s.gsub(/(\r?\n)+/, ' ')
end
Then do:
CSV.generate(:converters => [:nonewline]) do |csv|
  csv << ["hello\ngoodbye"]
end
Read the documentation for Converters.
I've kept the part above to show you how to write a custom CSV converter; the way you wrote it is incorrect.
Now read the documentation for CSV::generate:
This method wraps a String you provide, or an empty default String, in a CSV object which is passed to the provided block. You can use the block to append CSV rows to the String and when the block exits, the final String will be returned.
After reading the docs, it is quite clear that this method is for writing to a CSV file, not for reading. The converter options (like :converters and :header_converters) are applied when you are reading a CSV file, but not when you are writing one.
Let me show you 2 examples to illustrate this more clearly.
require 'csv'

string = <<_
foo,bar
baz,quack
_
File.write('a', string)

CSV::Converters[:upcase] = lambda do |s|
  s.upcase
end
I am reading from a CSV file, so the :converters option is applied:
CSV.open('a', 'r', :converters => :upcase) do |csv|
  puts csv.read
end
Output:
# >> FOO
# >> BAR
# >> BAZ
# >> QUACK
Now I am writing to the CSV file, so the :converters option is not applied:
CSV.open('a', 'w', :converters => :upcase) do |csv|
  csv << ['dog', 'cat']
end
CSV.read('a') # => [["dog", "cat"]]
Attempting to remove newlines using :converters did not work.
I had to override the << method from csv.rb, adding the following code to it:
# Change all CR/NLs into one space
row.map! { |element|
  if element.is_a?(String)
    element.gsub(/(\r?\n)+/, ' ')
  else
    element
  end
}
Placed right before
output = row.map(&@quote).join(@col_sep) + @row_sep # quote and separate
at line 21.
I would think this would be a good patch to CSV, as newlines will always produce bad CSV output.
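A less invasive alternative (not part of the original answer, just a sketch) is to sanitize each row yourself before appending it, instead of patching csv.rb:
require 'csv'

# Strip embedded CR/LFs from string fields before handing the row to CSV.
strip_newlines = lambda do |row|
  row.map { |f| f.is_a?(String) ? f.gsub(/(\r?\n)+/, ' ') : f }
end

output = CSV.generate do |csv|
  csv << strip_newlines.call(["hello\ngoodbye"])
end
output # => "hello goodbye\n"
This keeps the stock CSV behavior intact and makes the cleanup explicit at each call site.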
Related
I have a CSV file that I want to change the headers only for certain columns (about 20 of them in my actual file). Here's a sample CSV file:
CSV File
"name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
I have been trying this with a File/IO class, but when it reads the file to do the gsub it removes all of the quotation marks around each string separated by commas. Here's the code I'm using:
Ruby Code
file = 'file.csv'
replacements = {
  'blah_01_blah'   => 'newblah1',
  'foo_01_foo'     => 'coolfoo1',
  'bacon_01_bacon' => 'goodpig1',
  'bacon_02_bacon' => 'goodpig2'
}
matcher = /#{replacements.keys.join('|')}/
outdata = File.read(file).gsub(matcher, replacements)
File.open(file, 'w') do |out|
  out << outdata
end
What I end up with is this in the CSV file:
New CSV File
name,blah_01_blah,foo_1_01_foo,bacon_01_bacon,bacon_02_bacon
John,yucky,summer,yum,food
Mary,"","",cool,sundae
It's keeping the quotation marks in fields that are blank, but taking them out around the strings elsewhere. I want to retain those quotation marks in case for some reason a rogue comma ends up in a string somewhere so it doesn't get thrown off. How can I change the headers without losing my quotation marks around the strings?
EDIT - This is what I want the file to look like at the end.
Expected Result CSV File
"name","newblah1","coolfoo1","goodpig1","goodpig2"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
Thanks!
You don’t need to handle CSV at all:
File.write(
  file,
  File.readlines(file).tap do |lines|
    lines.first.gsub!(matcher, replacements)
  end.join
)
See File#readlines.
The trick here is that we deal with the first line only, treating it as plain text.
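For reference, a self-contained version of this approach might look like the sketch below. It uses the question's replacements hash with the duplicate key corrected, and with the foo key adjusted to 'foo_1_01_foo' so it matches the header actually present in the sample file (that adjustment is my assumption):
file = 'file.csv'
replacements = {
  'blah_01_blah'   => 'newblah1',
  'foo_1_01_foo'   => 'coolfoo1',   # adjusted to the header in the sample file
  'bacon_01_bacon' => 'goodpig1',
  'bacon_02_bacon' => 'goodpig2'
}
matcher = /#{replacements.keys.join('|')}/

# Rewrite only the header line; data rows are copied through untouched,
# so their quoting is preserved exactly as written.
File.write(
  file,
  File.readlines(file).tap { |lines| lines.first.gsub!(matcher, replacements) }.join
)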
Let's first create the input CSV file.
text = <<_
"name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
_
file_in = 'file_in.csv'
file_out = 'file_out.csv'
File.write(file_in, text)
#=> 137
Here is the replacements hash, which I simplified slightly.
replacements = {'blah_01_blah'=>'newblah1', 'foo_01_foo'=>'coolfoo1',
'bacon_01_bacon'=>'goodpig1'}
The first task is to modify this hash so that if it has no key k, replacements[k] will return k. For this we use the method Hash#default_proc=.
replacements.default_proc = ->(_,k) { k }
Here are two examples of how this hash is used.
replacements['bacon_01_bacon']
#=> "goodpig1"
replacements['name']
#=> "name"`
The latter follows because replacements has no key 'name'.
The code is as follows.
require 'csv'
f_in = CSV.read(file_in, headers: true)
CSV.open(file_out, 'w') do |csv_out|
  csv_out << replacements.values_at(*f_in.headers)
  f_in.each { |row| csv_out << row }
end
#=> #<CSV::Table mode:col_or_row row_count:3>
Note that
f_in.headers
#=> ["name", "blah_01_blah", "foo_1_01_foo", "bacon_01_bacon", "bacon_02_bacon"]
Let's look at the output file.
puts File.read(file_out)
prints
name,newblah1,foo_1_01_foo,goodpig1,bacon_02_bacon
John,yucky,summer,yum,food
Mary,"","",cool,sundae
So I have a Ruby script that parses a CSV file as a list of rules, does some processing, and returns an array of hashes as processed data.
CSV looks like this:
id_number,rule,notes
1,"dummy_rule","necessary"
2,"sample_rule","optional"
The parsing of CSV looks like this:
def parse_csv(file_name)
  filtered_data = []
  CSV.foreach(file_name, headers: true) do |row|
    filtered_data << row # some processing
  end
  filtered_data
end
Now I am wondering if it is possible to stub/mock an actual CSV file for unit testing in such a way that I can pass a "filename" into this function. I could build a CSV object with the generate method, but then there would be no actual filename to pass.
def test_parse_csv
  expected_result = ["id_number"=>1, "rule"=>"dummy_rule", "notes"=>"optional"}]
  # stub/mock csv with filename: say "rules.csv"
  assert.equal(expected_result, parse_csv(file_name))
end
I use Test::Ruby
I also found a Ruby gem called mocha, but I don't know how it works for CSVs:
https://github.com/freerange/mocha
Any thoughts are welcome!
I would create a support directory inside your test directory, then create a file:
# test/support/rules.csv
id_number,rule,notes
1,"dummy_rule","necessary"
2,"sample_rule","optional"
Your test should look like this; note that it also adds the opening curly bracket you seem to have missed on line 2 of your snippet.
def test_parse_csv
  expected_result = [{"id_number"=>1, "rule"=>"dummy_rule", "notes"=>"optional"}]
  assert_equal(expected_result, parse_csv(csv_file))
end

def csv_file
  Rails.root.join('test', 'support', 'rules.csv')
end
EDIT: Sorry, I should have noticed this wasn't Ruby on Rails... here's a plain Ruby solution:
def test_parse_csv
  expected_result = [{"id_number"=>1, "rule"=>"dummy_rule", "notes"=>"optional"}]
  assert_equal(expected_result, parse_csv(csv_file))
end

def csv_file
  'test/support/rules.csv'
end
Note that csv_file returns the path rather than the file contents, since parse_csv expects a filename.
I did this in my test: let(:raw_csv) { "row_number \n 1 \n 2" }, telling myself that a CSV file is ultimately just a string with line breaks and commas... and it works fine. Be careful with extra whitespace and commas, though; it is easy to make a mistake.
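If you would rather not mock at all, another option (just a sketch, not taken from the answers above) is to write the fixture to a Tempfile and pass its path, so the method under test still receives a real filename:
require 'tempfile'

def test_parse_csv_with_tempfile
  csv = Tempfile.new(['rules', '.csv'])
  csv.write("id_number,rule,notes\n1,\"dummy_rule\",\"necessary\"\n")
  csv.close

  result = parse_csv(csv.path)

  # CSV.foreach with headers: true yields CSV::Row objects whose values are strings.
  assert_equal '1', result.first['id_number']
  assert_equal 'dummy_rule', result.first['rule']
ensure
  csv&.unlink # remove the temporary file
end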
I'd like to be able to change the date and time formats used by CSV when generating csv output. For example, instead of generating '2004-1-30' for a date, I'd like it to generate '1/30/2004'.
How can I do that?
Here is a complete example:
require 'csv'
require 'date'

str = <<_
2004-1-30,foo
2004-11-20,bar
_
File.write('a', str)

CSV::Converters[:cdate] = lambda do |s|
  begin
    Date.strptime(s, "%Y-%m-%d").strftime("%-m/%d/%Y")
  rescue ArgumentError
    s
  end
end

CSV.foreach('a', :converters => :cdate) do |row|
  p row
end
# >> ["1/30/2004", "foo"]
# >> ["11/20/2004", "bar"]
Look at the documentation of Converters.
An Array of names from the Converters Hash and/or lambdas that handle custom conversion. A single converter doesn’t have to be in an Array. All built-in converters try to transcode fields to UTF-8 before converting. The conversion will fail if the data cannot be transcoded, leaving the field unchanged.
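Converters are applied on the reading side, as above. If you specifically need the reformatted dates when generating CSV output (as the question asks), one option, sketched here rather than taken from the answer, is to format Date values yourself before appending each row:
require 'csv'
require 'date'

rows = [[Date.new(2004, 1, 30), 'foo'], [Date.new(2004, 11, 20), 'bar']]

out = CSV.generate do |csv|
  rows.each do |row|
    # Format Date cells as m/dd/yyyy; leave other values untouched.
    csv << row.map { |f| f.is_a?(Date) ? f.strftime("%-m/%d/%Y") : f }
  end
end

puts out
# >> 1/30/2004,foo
# >> 11/20/2004,bar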
I have a CSV file that looks like this:
Jenny, jenny#example.com ,
Ricky, ricky#example.com ,
Josefina josefina#example.com ,
I'm trying to get this output:
users_array = [
['Jenny', 'jenny#example.com'], ['Ricky', 'ricky#example.com'], ['Josefina', 'josefina#example.com']
]
I've tried this:
users_array = Array.new
file = File.new('csv_file.csv', 'r')
file.each_line("\n") do |row|
  puts row + "\n"
  columns = row.split(",")
  users_array.push columns
  puts users_array
end
Unfortunately, in Terminal, this returns:
Jenny
jenny#example.com
Ricky
ricky#example.com
Josefina
josefina#example.com
Which I don't think will work for this:
users_array.each_with_index do |user|
  add_page.form_with(:id => 'new_user') do |f|
    f.field_with(:id => "user_email").value = user[0]
    f.field_with(:id => "user_name").value = user[1]
  end.click_button
end
What do I need to change? Or is there a better way to solve this problem?
Ruby's standard library has a CSV class with a similar API to File, but with a number of useful methods for working with tabular data. To get the output you want, all you need to do is this:
require 'csv'
users_array = CSV.read('csv_file.csv')
PS - I think you are getting the output you expected with your file parsing as well, but maybe you're thrown off by how it is printing to the terminal. puts behaves differently with arrays, printing each member object on a new line instead of as a single array. If you want to view it as an array, use puts my_array.inspect.
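For example, a quick illustration of that difference:
pair = [['Jenny', 'jenny#example.com']]

puts pair         # prints each element on its own line:
# >> Jenny
# >> jenny#example.com

puts pair.inspect # prints the nested array structure:
# >> [["Jenny", "jenny#example.com"]]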
Assuming that your CSV file actually has a comma between the name and email address on the third line:
require 'csv'
users_array = []
CSV.foreach('csv_file.csv') do |row|
  users_array.push row.delete_if(&:nil?).map(&:strip)
end
users_array
# => [["Jenny", "jenny#example.com"],
# ["Ricky", "ricky#example.com"],
# ["Josefina", "josefina#example.com"]]
There may be a simpler way, but what I'm doing there is discarding the nil field created by the trailing comma and stripping the spaces around the email addresses.
The following code is a line in an xml file:
<appId>455360226</appId>
How can I replace the number between the 2 tags with another number using ruby?
There is no way to modify a file's content in place in a single step (at least none I know of, when the file size would change).
You have to read the file and store the modified text in another file.
replace="100"
infile = "xmlfile_in"
outfile = "xmlfile_out"
File.open(outfile, 'w') do |out|
out << File.open(infile).read.gsub(/<appId>\d+<\/appId>/, "<appId>#{replace}</appId>")
end
Or you read the file content into memory and afterwards overwrite the file with the modified content:
replace="100"
filename = "xmlfile_in"
outdata = File.read(filename).gsub(/<appId>\d+<\/appId>/, "<appId>#{replace}</appId>")
File.open(filename, 'w') do |out|
out << outdata
end
(Hope it works, the code is not tested)
You can do it in one line like this:
IO.write(filepath, File.open(filepath) { |f| f.read.gsub(/<appId>\d+<\/appId>/, "<appId>42</appId>") })
IO.write truncates the given file by default, so if you read the text first, perform the String#gsub with the regex, and return the resulting string from the File.open block, it will replace the file's content in one fell swoop.
I like the way this reads, but it can be written in multiple lines too of course:
IO.write(filepath, File.open(filepath) do |f|
  f.read.gsub(/<appId>\d+<\/appId>/, "<appId>42</appId>")
end)
replace="100"
File.open("xmlfile").each do |line|
if line[/<appId>/ ]
line.sub!(/<appId>\d+<\/appId>/, "<appId>#{replace}</appId>")
end
puts line
end
The right way is to use an XML parsing tool, an example of which is XmlSimple.
You did tag your question with regex. If you really must do it with a regex then
s = "Blah blah <appId>455360226</appId> blah"
s.sub(/<appId>\d+<\/appId>/, "<appId>42</appId>")
is an illustration of the kind of thing you can do but shouldn't.
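For completeness, here is a rough sketch of the parser-based approach mentioned above, using REXML from Ruby's standard distribution instead of XmlSimple (the file name is hypothetical):
require 'rexml/document'

doc = REXML::Document.new(File.read('xmlfile_in'))

# Update the text of every <appId> element, wherever it appears in the document.
doc.elements.each('//appId') { |el| el.text = '100' }

File.write('xmlfile_in', doc.to_s)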