RSpec custom diffable matcher - ruby

I have a custom RSpec matcher that ignores whitespace and newlines and compares only the content:
RSpec::Matchers.define :be_matching_content do |expected|
match do |actual|
actual.gsub(/\s/,'').should == expected.gsub(/\s/,'')
end
diffable
end
I can use it like this:
body = " some data \n more data"
body.should be_matching_content("some data\nmore wrong data")
However, when a test fails (like the one above), the diff output is not very helpful:
-some data
-more wrong data
+ some data
+ more data
Is it possible to configure the diffable output? The first line, some data, matches; only the second line, more wrong data, is wrong. It would be very useful to see only that second line as the root cause of the failure.

I believe you should disable default diffable behaviour in RSpec and substitute your own implementation:
RSpec::Matchers.define :be_matching_content do |expected|
match do |actual|
@stripped_actual = actual.gsub(/\s/,'')
@stripped_expected = expected.gsub(/\s/,'')
expect(@stripped_actual).to eq @stripped_expected
end
failure_message do |actual|
message = "expected that #{@stripped_actual} would match #{@stripped_expected}"
message += "\nDiff:" + differ.diff_as_string(@stripped_actual, @stripped_expected)
message
end
def differ
RSpec::Support::Differ.new(
:object_preparer => lambda { |object| RSpec::Matchers::Composable.surface_descriptions_in(object) },
:color => RSpec::Matchers.configuration.color?
)
end
end
RSpec.describe 'something' do
it 'should diff correctly' do
body = " some data \n more data"
expect(body).to be_matching_content("some data\nmore wrong data")
end
end
produces the following:
Failures:
1) something should diff correctly
Failure/Error: expect(body).to be_matching_content("some data\nmore wrong data")
expected that somedatamoredata would match somedatamorewrongdata
Diff:
@@ -1,2 +1,2 @@
-somedatamorewrongdata
+somedatamoredata
You can use a custom differ if you want, or even reimplement the whole matcher as a system call to the diff command, something like this:
♥ diff -uw --label expected --label actual <(echo " some data \n more data") <(echo "some data\nmore wrong data")
--- expected
+++ actual
@@ -1,2 +1,2 @@
some data
- more data
+more wrong data
Cheers!

You can override the expected and actual methods that will then be used when generating the diff. In this example, we store the expected and actual values as instance variables and define methods that return the instance variables:
RSpec::Matchers.define :be_matching_content do |expected_raw|
match do |actual_raw|
@actual = actual_raw.gsub(/\s/,'')
@expected = expected_raw.gsub(/\s/,'')
expect(expected).to eq(@actual)
end
diffable
attr_reader :actual, :expected
end
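Since the diff is built from whatever actual and expected return, one option is to normalize whitespace per line instead of stripping it everywhere, so the line structure survives into the diff. Below is a minimal sketch in plain Ruby, assuming a hypothetical helper named normalize_lines (it is not part of RSpec). Its results could be assigned to @actual and @expected in the matcher above.

```ruby
# Hypothetical helper, not an RSpec API: trims every line and collapses runs
# of spaces/tabs inside it, but keeps the line breaks so that a line-based
# diff can point at the line that actually differs.
def normalize_lines(text)
  text.lines
      .map { |line| line.strip.gsub(/[ \t]+/, ' ') }
      .reject(&:empty?)
      .join("\n")
end

actual   = normalize_lines(" some data \n more data")
expected = normalize_lines("some data\nmore wrong data")

# Lines present in expected but missing from actual
puts expected.lines.map(&:chomp) - actual.lines.map(&:chomp)
```

With the sample strings from the question, the only line left over is the second one, which is the line the diff should highlight.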
Another example is to match for specific attributes in two different types of objects. (The expected object in this case is a Client model.)
RSpec::Matchers.define :have_attributes_of_v1_client do |expected_client|
match do |actual_object|
@expected = client_attributes(expected_client)
@actual = actual_object.attributes
expect(actual_object).to have_attributes(@expected)
end
diffable
attr_reader :actual, :expected
def failure_message
"expected attributes of a V1 Client view row, but they do not match"
end
def client_attributes(client)
{
"id" => client.id,
"client_type" => client.client_type.name,
"username" => client.username,
"active" => client.active?,
}
end
end
Example failure looks like this:
Failure/Error: is_expected.to have_attributes_of_v1_client(client_active_partner)
expected attributes of a V1 Client view row, but they do not match
Diff:
@@ -1,6 +1,6 @@
"active" => true,
-"client_type" => #<ClientType id: 2, name: "ContentPartner">,
+"client_type" => "ContentPartner",
"id" => 11,

There is a gem called diffy which can be used.
It compares strings line by line, so instead of removing all whitespace you could replace each run of whitespace with a newline and diff the resulting entries.
This is an example of something you could do to improve your diffs a little. I am not 100% certain where this would fit into your code.
def compare(str1, str2)
str1 = break_string(str1)
str2 = break_string(str2)
return true if str1 == str2
puts Diffy::Diff.new(str1, str2).to_s
return false
end
def break_string(str)
str.gsub(/\s+/,"\n")
end
The diffy gem can be set to produce color output suitable for the terminal.
Using this code looks like this:
str1 = 'extra some content'
str2 = 'extra more content'
puts compare(str1, str2)
This would print:
extra
-some # red in terminal
+more # green in terminal
content
\ No newline at end of file

Related

How to read multiple XML files then output to multiple CSV files with the same XML filenames

I am trying to parse multiple XML files then output them into CSV files to list out the proper rows and columns.
I was able to do so by processing one file at a time by defining the filename, and specifically output them into a defined output file name:
File.open('H:/output/xmloutput.csv','w')
I would like to write into multiple files and make their name the same as the XML filenames without hard coding it. I tried doing it multiple ways but have had no luck so far.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<record:root>
<record:Dataload_Request>
<record:name>Bob Chuck</record:name>
<record:Address_Data>
<record:Street_Address>123 Main St</record:Street_Address>
<record:Postal_Code>12345</record:Postal_Code>
</record:Address_Data>
<record:Age>45</record:Age>
</record:Dataload_Request>
</record:root>
Here is what I've tried:
require 'nokogiri'
require 'set'
files = ''
input_folder = "H:/input"
output_folder = "H:/output"
if input_folder[input_folder.length-1,1] == '/'
input_folder = input_folder[0,input_folder.length-1]
end
if output_folder[output_folder.length-1,1] != '/'
output_folder = output_folder + '/'
end
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
doc = Nokogiri::XML(file)
record = {} # hashes
keys = Set.new
records = [] # array
csv = ""
doc.traverse do |node|
value = node.text.gsub(/\n +/, '')
if node.name != "text" # skip these nodes: if class isnt text then skip
if value.length > 0 # skip empty nodes
key = node.name.gsub(/wd:/,'').to_sym
if key == :Dataload_Request && !record.empty?
records << record
record = {}
elsif key[/^root$|^document$/]
# neglect these keys
else
key = node.name.gsub(/wd:/,'').to_sym
# in case our value is html instead of text
record[key] = Nokogiri::HTML.parse(value).text
# add to our key set only if not already in the set
keys << key
end
end
end
end
# build our csv
File.open('H:/output/.*csv', 'w') do |file|
file.puts %Q{"#{keys.to_a.join('","')}"}
records.each do |record|
keys.each do |key|
file.write %Q{"#{record[key]}",}
end
file.write "\n"
end
print ''
print 'output files ready!'
print ''
end
I have been getting 'read memory': no implicit conversion of Array into String (TypeError) and other errors.
Here's a quick peer-review of your code, something like you'd get in a corporate environment...
Instead of writing:
input_folder = "H:/input"
input_folder[input_folder.length-1,1] == '/' # => false
Consider doing it using the -1 offset from the end of the string to access the character:
input_folder[-1] # => "t"
That simplifies your logic making it more readable because it's lacking unnecessary visual noise:
input_folder[-1] == '/' # => false
See [] and []= in the String documentation.
This looks like a bug to me:
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
files is an array of filenames. input_folder + '/' + files is appending an array to a string:
foo = ['1', '2'] # => ["1", "2"]
'/parent/' + foo # =>
# ~> -:9:in `+': no implicit conversion of Array into String (TypeError)
# ~> from -:9:in `<main>'
How you want to deal with that is left as an exercise for the programmer.
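One way to deal with it is to loop over the array and derive each output name from the input name. The sketch below assumes two hypothetical helpers, csv_path_for and each_xml_with_csv, which are not from the original code. It relies only on File.basename and File.join from Ruby core.

```ruby
# Hypothetical helper: build the CSV path that matches one XML file name.
# File.basename with a suffix argument drops both the directory and ".xml".
def csv_path_for(xml_path, output_folder)
  File.join(output_folder, File.basename(xml_path, '.xml') + '.csv')
end

# Hypothetical helper: yield the contents of each XML file together with the
# CSV path it should be written to, oldest file first.
def each_xml_with_csv(input_folder, output_folder)
  Dir[File.join(input_folder, '*.xml')].sort_by { |f| File.mtime(f) }.each do |xml_path|
    yield File.read(xml_path), csv_path_for(xml_path, output_folder)
  end
end

puts csv_path_for('H:/input/customers.xml', 'H:/output')
```

File.join also takes care of a trailing slash on either folder, so the manual slash checks at the top of the script would no longer be needed.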
doc.traverse do |node|
is icky because it sidesteps the power of Nokogiri being able to search for a particular tag using accessors. Very rarely do we need to iterate over a document tag by tag, usually only when we're peeking at its structure and layout. traverse is slower so use it as a very last resort.
length is nice but isn't needed when checking whether a string has content:
value = 'foo'
value.length > 0 # => true
value > '' # => true
value = ''
value.length > 0 # => false
value > '' # => false
Programmers coming from Java like to use the accessors but I like being lazy, probably because of my C and Perl backgrounds.
Be careful with sub and gsub, as they may not do what you think they do. Both expect a regular expression but will also take a string, which they escape before beginning their scan.
You're passing in a regular expression, which is OK in this case, but it could cause unexpected problems if you don't remember all the rules for pattern matching and that gsub scans until the end of the string:
foo = 'wd:barwd:' # => "wd:barwd:"
key = foo.gsub(/wd:/,'') # => "bar"
In general I recommend people think a couple times before using regular expressions. I've seen some gaping holes opened up in logic written by fairly advanced programmers because they didn't know what the engine was going to do. They're wonderfully powerful, but need to be used surgically, not as a universal solution.
The same thing happens with a string, because gsub doesn't know when to quit:
key = foo.gsub('wd:','') # => "bar"
So, if you're looking to change just the first instance use sub:
key = foo.sub('wd:','') # => "barwd:"
I'd do it a little differently though.
foo = 'wd:bar'
I can check to see what the first three characters are:
foo[0,3] # => "wd:"
Or I can replace them with something else using string indexing:
foo[0,3] = ''
foo # => "bar"
There's more but I think that's enough for now.
You should use Ruby's CSV class. Also, you don't need to do any string matching or regex stuff. Use Nokogiri to target elements. If you know the node names in the XML will be consistent it should be pretty simple. I'm not exactly sure if this is the output you want, but this should get you in the right direction:
require 'nokogiri'
require 'csv'
def xml_to_csv(filename)
xml_str = File.read(filename)
xml_str.gsub!('record:','') # remove the record: namespace
doc = Nokogiri::XML xml_str
csv_filename = filename.gsub('.xml', '.csv')
CSV.open(csv_filename, 'wb' ) do |row|
row << ['name', 'street_address', 'postal_code', 'age']
row << [
doc.xpath('//name').text,
doc.xpath('//Street_Address').text,
doc.xpath('//Postal_Code').text,
doc.xpath('//Age').text,
]
end
end
# iterate over all xml files
Dir.glob('*.xml').each { |filename| xml_to_csv(filename) }

Minitest reports the wrong line number when an assertion fails inside a block

I have written an assertion that collects new records created while it yields to a block. Here's an example, with a failing assertion inside that block:
product =
assert_latest_record Product do # line 337
post :create,
:product => { ... }
assert false # line 340
end
The source of my assertion is below, but I don't think it's relevant. It does not intercept Minitest exceptions, or even call rescue or ensure.
The problem is when an assertion inside that block fails. The fault diagnostic message reports the line number as 337, the line of the outer assertion, not 340, the line of the inner assertion that failed. This matters if, for example, my colleagues have written a run-on test with way too many lines in it; isolating a failing line becomes more difficult.
Why doesn't Minitest report the correct line number?
The source:
##
# When a test case calls methods that write new ActiveModel records to a database,
# sometimes the test needs to assert those records were created, by fetching them back
# for inspection. +assert_latest_record+ collects every record in the given model or
# models that appear while its block runs, and returns either a single record or a ragged
# array.
#
# ==== Parameters
#
# * +models+ - At least 1 ActiveRecord model or association.
# * +message+ - Optional string or ->{block} to provide more diagnostics at failure time.
# * <code>&block</code> - Required block to call and monitor for new records.
#
# ==== Example
#
# user, email_addresses =
# assert_latest_record User, EmailAddress, ->{ 'Need moar records!' } do
# post :create, ...
# end
# assert_equal 'franklyn', user.login # 1 user, so not an array
# assert_equal 2, email_addresses.size
# assert_equal 'franklyn@gmail.com', email_addresses.first.mail
# assert_equal 'franklyn@hotmail.com', email_addresses.second.mail
#
# ==== Returns
#
# The returned value is a set of one or more created records. The set is normalized,
# so all arrays of one item are replaced with the item itself.
#
# ==== Operations
#
# The last argument to +assert_latest_record+ can be a string or a callable block.
# At failure time the assertion adds this string or this block's return value to
# the diagnostic message.
#
# You may call +assert_latest_record+ with anything that responds to <code>.pluck(:id)</code>
# and <code>.where()</code>, including ActiveRecord associations:
#
# user = User.last
# email_address =
# assert_latest_record user.email_addresses do
# post :add_email_address, user_id: user.id, ...
# end
# assert_equal 'franklyn@philly.com', email_address.mail
# assert_equal email_address.user_id, user.id, 'This assertion is redundant.'
#
def assert_latest_record(*models, &block)
models, message = _get_latest_record_args(models, 'assert')
latests = _get_latest_record(models, block)
latests.include?(nil) and _flunk_latest_record(models, latests, message, true)
pass # Increment the test runner's assertion count
return latests.size > 1 ? latests : latests.first
end
##
# When a test case calls methods that might write new ActiveModel records to a
# database, sometimes the test must check that no records were written.
# +refute_latest_record+ watches for new records in the given class or classes
# that appear while its block runs, and fails if any appear.
#
# ==== Parameters
#
# See +assert_latest_record+.
#
# ==== Operations
#
# refute_latest_record User, EmailAddress, ->{ 'GET should not create records' } do
# get :index
# end
#
# The last argument to +refute_latest_record+ can be a string or a callable block.
# At failure time the assertion adds this string or this block's return value to
# the diagnostic message.
#
# Like +assert_latest_record+, you may call +refute_latest_record+ with anything
# that responds to <code>pluck(:id)</code> and <code>where()</code>, including
# ActiveRecord associations.
#
def refute_latest_record(*models, &block)
models, message = _get_latest_record_args(models, 'refute')
latests = _get_latest_record(models, block)
latests.all?(&:nil?) or _flunk_latest_record(models, latests, message, false)
pass
return
end
##
# Sometimes a test must detect new records without using an assertion that passes
# judgment on whether they should have been written. Call +get_latest_record+ to
# return a ragged array of records created during its block, or +nil+:
#
# user, email_addresses, posts =
# get_latest_record User, EmailAddress, Post do
# post :create, ...
# end
#
# assert_nil posts, "Don't create Post records while creating a User"
#
# Unlike +assert_latest_record+, +get_latest_record+ does not take a +message+ string
# or block, because it has no diagnostic message.
#
# Like +assert_latest_record+, you may call +get_latest_record+ with anything
# that responds to <code>.pluck(:id)</code> and <code>.where()</code>, including
# ActiveRecord associations.
#
def get_latest_record(*models, &block)
assert models.any?, 'Call get_latest_record with one or more ActiveRecord models or associations.'
refute_nil block, 'Call get_latest_record with a block.'
records = _get_latest_record(models, block)
return records.size > 1 ? records : records.first
end # Methods should be easy to use correctly and hard to use incorrectly...
def _get_latest_record_args(models, what) #:nodoc:
message = nil
message = models.pop unless models.last.respond_to?(:pluck)
valid_message = message.nil? || message.kind_of?(String) || message.respond_to?(:call)
models.length > 0 && valid_message and return models, message
raise "call #{what}_latest_record(models..., message) with any number\n" +
'of Model classes or associations, followed by an optional diagnostic message'
end
private :_get_latest_record_args
def _get_latest_record(models, block) #:nodoc:
id_sets = models.map{ |model| model.pluck(*model.primary_key) } # Sorry about your memory!
block.call
record_sets = []
models.each_with_index do |model, index|
pk = model.primary_key
set = id_sets[index]
records =
if set.length == 0
model
elsif pk.is_a?(Array)
pks = pk.map{ |k| "`#{k}` = ?" }.join(' AND ')
pks = [ "(#{pks})" ] * set.length
pks = pks.join(' OR ')
model.where.not(pks, *set.flatten)
else
model.where.not(pk => set)
end
records = records.order(pk).to_a
record_sets.push records.size > 1 ? records : records.first
end
return record_sets
end
private :_get_latest_record
def _flunk_latest_record(models, latests, message, polarity) #:nodoc:
itch_list = []
models.each_with_index do |model, index|
records_found = latests[index] != nil
records_found == polarity or itch_list << model.name
end
itch_list = itch_list.join(', ')
diagnostic = "should#{' not' unless polarity} create new #{itch_list} record(s) in block"
message = nil if message == ''
message = message.call.to_s if message.respond_to?(:call)
message = [ message, diagnostic ].compact.join("\n")
raise Minitest::Assertion, message
end
private :_flunk_latest_record
You could try to configure it to log exceptions in test_helper.rb:
def MiniTest.filter_backtrace(backtrace)
backtrace
end
I'm not sure if this is the default, but depending on your configuration, the backtrace might not be shown.
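With the full backtrace visible, one possible explanation is worth checking. As far as I know, Minitest 5 picks the reported location by scanning the backtrace from the outermost frame inward and stopping at the first frame inside a method whose name starts with assert, refute and a few similar prefixes. The sketch below is a simplified model of that idea, not Minitest's source. The name reported_location, the file paths and the line numbers in the sample backtrace are all made up for illustration.

```ruby
# Simplified model (an assumption, not Minitest's actual code): frames that
# belong to assertion-like methods are treated as internals to skip.
ASSERTION_FRAME = /in .(assert|refute|flunk|pass|fail|raise|must|wont)/

def reported_location(backtrace)
  last_before_assertion = ''
  # Walk from the outermost caller towards the point of failure.
  backtrace.reverse_each do |frame|
    break if frame =~ ASSERTION_FRAME
    last_before_assertion = frame
  end
  last_before_assertion.sub(/:in .*$/, '')
end

# Innermost frame first, as in a Ruby backtrace. Paths and lines are made up.
backtrace = [
  "lib/minitest/assertions.rb:183:in `assert'",
  "test/product_test.rb:340:in `block in test_create'",
  "test/test_helper.rb:402:in `_get_latest_record'",
  "test/test_helper.rb:330:in `assert_latest_record'",
  "test/product_test.rb:337:in `test_create'"
]

puts reported_location(backtrace)
```

Under this model the scan stops at assert_latest_record before it ever reaches the block frame on line 340, so line 337 is what gets reported. If that is what is happening, the helper's name is what hides the inner frame.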

Finding certain ruby word in txt file

I am trying to create a Ruby tool that goes through a file looking for a certain string; if it finds that word, it stores it in a variable. If not, it prints “word not found” on the console. Is this possible? How can I code this?
You can use the File.open method and the readlines method like this.
test.txt
This is a test string.
Lorem imsum.
Nope.
code
def get_string_from_file(string, file_path)
File.open(file_path) do |f|
f.readlines.each { |line| return string if line.include?(string) }
end
nil
end
file_path = './test.txt'
var = get_string_from_file('Lorem', file_path)
puts var || "word not found"
# => "Lorem"
var = get_string_from_file('lorem', file_path)
puts var || "word not found"
# => "word not found"
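If the file is large, readlines loads all of it into memory first. A variant that reads one line at a time and stops at the first match could look like the sketch below. The method name find_word is made up here, and the temporary file only stands in for test.txt.

```ruby
require 'tempfile'

# Returns the word if any line contains it, otherwise nil.
# File.foreach reads the file line by line instead of loading it whole.
def find_word(word, file_path)
  File.foreach(file_path) do |line|
    return word if line.include?(word)
  end
  nil
end

# Temporary stand-in for test.txt so the example runs on its own.
Tempfile.create('sample') do |f|
  f.write("This is a test string.\nLorem ipsum.\nNope.\n")
  f.flush
  puts find_word('Lorem', f.path) || 'word not found'
  puts find_word('lorem', f.path) || 'word not found'
end
```

As in the version above, the match is case-sensitive, so 'lorem' is reported as not found.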
I hope this helps.
Here are a few examples of how you could find a certain word in a text file using IO from the Ruby core: http://ruby-doc.org/core-2.3.1/
In find_word_in_text_file.rb:
# SETUP
#
filename1 = 'file1.txt'
filename2 = 'file2.txt'
body1 = <<~EOS
PHRASES
beside the point
irrelevant.
case in point
an instance or example that illustrates what is being discussed: the “green revolution” in agriculture is a good case in point.
get the point
understand or accept the validity of someone's idea or argument: I get the point about not sending rejections.
make one's point
put across a proposition clearly and convincingly.
make a point of
make a special and noticeable effort to do (a specified thing): she made a point of taking a walk each day.
EOS
body2 = <<~EOS
nothing to see here
or here
or here
EOS
# write body to file
File.open(filename1, 'w+') {|f| f.write(body1)}
# write file without matching word
File.open(filename2, 'w+') {|f| f.write(body2)}
# METHODS
#
# 1) search entire file as one string
def file_as_string_rx(filename, string)
# http://ruby-doc.org/core-2.3.1/Regexp.html#method-c-escape
# http://ruby-doc.org/core-2.3.1/Regexp.html#method-c-new
rx = Regexp.new(Regexp.escape(string), true) # => /whatevs/i
# read entire file to string
# http://ruby-doc.org/core-2.3.1/IO.html#method-i-read
text = IO.read(filename)
# search entire file for string; return first match
found_word = text[rx]
# print word or default string
puts found_word || "word not found"
# —OR—
#STDOUT.write found_word || "word not found"
#STDOUT.write "\n"
end
# 2) search line by line
def line_by_line_rx(filename, string)
# http://ruby-doc.org/core-2.3.1/Regexp.html#method-c-escape
# http://ruby-doc.org/core-2.3.1/Regexp.html#method-c-new
rx = Regexp.new(Regexp.escape(string), true) # => /whatevs/i
# create array to store line numbers of matches
matches_array = []
# search each line for string
# http://ruby-doc.org/core-2.3.1/IO.html#method-c-readlines
#lines = IO.readlines(filename)
#
# http://ruby-doc.org/core-2.3.1/Enumerable.html#method-i-each_with_index
# http://stackoverflow.com/a/5546681/1076207
# "Be wary of "slurping" files. That's when you
# read the entire file into memory at once.
# The problem is that it doesn't scale well.
#lines.each_with_index do |line,i|
#
# —OR—
#
# http://ruby-doc.org/core-2.3.1/IO.html#method-c-foreach
i = 1
IO.foreach(filename) do |line|
# add line number if match found within line
matches_array.push(i) if line[rx]
i += 1
end
# print array or default string
puts matches_array.any? ? matches_array.inspect : "word not found"
# —OR—
#STDOUT.write matches_array.any? ? matches_array.inspect : "word not found"
#STDOUT.write "\n"
end
# RUNNER
#
string = "point"
puts "file_as_string_rx(#{filename1.inspect}, #{string.inspect})"
file_as_string_rx(filename1, string)
puts "\nfile_as_string_rx(#{filename2.inspect}, #{string.inspect})"
file_as_string_rx(filename2, string)
puts "\nline_by_line_rx(#{filename1.inspect}, #{string.inspect})"
line_by_line_rx(filename1, string)
puts "\nline_by_line_rx(#{filename2.inspect}, #{string.inspect})"
line_by_line_rx(filename2, string)
# CLEANUP
#
File.delete(filename1)
File.delete(filename2)
Command line:
$ ruby find_word_in_text_file.rb
file_as_string_rx("file1.txt", "point")
point
file_as_string_rx("file2.txt", "point")
word not found
line_by_line_rx("file1.txt", "point")
[3, 6, 7, 9, 10, 12, 15, 16]
line_by_line_rx("file2.txt", "point")
word not found

Compare REXML elements for name/attribute equality in RSpec

Is there a matcher for comparing REXML elements for logical equality in RSpec? I tried writing a custom matcher that converts them to formatted strings, but it fails if the attribute order is different. (As noted in the XML spec, the order of attributes should not be significant.)
I could grind through writing a custom matcher that compares the name, namespace, child nodes, attributes, etc., etc., but this seems time-consuming and error-prone, and if someone else has already done it I'd rather not reinvent the wheel.
I ended up using the equivalent-xml gem and writing an RSpec custom matcher to convert the REXML to Nokogiri, compare with equivalent-xml, and pretty-print the result if needed.
The test assertion is pretty simple:
expect(actual).to be_xml(expected)
or
expect(actual).to be_xml(expected, path)
if you want to display the file path or some sort of identifier (e.g. if you're comparing a lot of documents).
The match code is a little fancier than it needs to be because it handles REXML, Nokogiri, and strings.
module XMLMatchUtils
def self.to_nokogiri(xml)
return nil unless xml
case xml
when Nokogiri::XML::Element
xml
when Nokogiri::XML::Document
xml.root
when String
to_nokogiri(Nokogiri::XML(xml, &:noblanks))
when REXML::Element
to_nokogiri(xml.to_s)
else
raise "be_xml() expected XML, got #{xml.class}"
end
end
def self.to_pretty(nokogiri)
return nil unless nokogiri
out = StringIO.new
save_options = Nokogiri::XML::Node::SaveOptions::FORMAT | Nokogiri::XML::Node::SaveOptions::NO_DECLARATION
nokogiri.write_xml_to(out, encoding: 'UTF-8', indent: 2, save_with: save_options)
out.string
end
def self.equivalent?(expected, actual, filename = nil)
expected_xml = to_nokogiri(expected) || raise("expected value #{expected || 'nil'} does not appear to be XML#{" in #{filename}" if filename}")
actual_xml = to_nokogiri(actual)
EquivalentXml.equivalent?(expected_xml, actual_xml, element_order: false, normalize_whitespace: true)
end
def self.failure_message(expected, actual, filename = nil)
expected_string = to_pretty(to_nokogiri(expected))
actual_string = to_pretty(to_nokogiri(actual)) || actual
# Uncomment this to dump expected/actual to file for manual diffing
#
# now = Time.now.to_i
# FileUtils.mkdir('tmp') unless File.directory?('tmp')
# File.open("tmp/#{now}-expected.xml", 'w') { |f| f.write(expected_string) }
# File.open("tmp/#{now}-actual.xml", 'w') { |f| f.write(actual_string) }
diff = Diffy::Diff.new(expected_string, actual_string).to_s(:text)
"expected XML differs from actual#{" in #{filename}" if filename}:\n#{diff}"
end
def self.to_xml_string(actual)
to_pretty(to_nokogiri(actual))
end
def self.failure_message_when_negated(actual, filename = nil)
"expected not to get XML#{" in #{filename}" if filename}:\n\t#{to_xml_string(actual) || 'nil'}"
end
end
The actual matcher is fairly straightforward:
RSpec::Matchers.define :be_xml do |expected, filename = nil|
match do |actual|
XMLMatchUtils.equivalent?(expected, actual, filename)
end
failure_message do |actual|
XMLMatchUtils.failure_message(expected, actual, filename)
end
failure_message_when_negated do |actual|
XMLMatchUtils.failure_message_when_negated(actual, filename)
end
end

How do I force one field in Ruby's CSV output to be wrapped with double-quotes?

I'm generating some CSV output using Ruby's built-in CSV. Everything works fine, but the customer wants the name field in the output to have wrapping double-quotes so the output looks like the input file. For instance, the input looks something like this:
1,1.1.1.1,"Firstname Lastname",more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
CSV's output, which is correct, looks like:
1,1.1.1.1,Firstname Lastname,more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
I know CSV is doing the right thing by not double-quoting the third field just because it has embedded blanks, and wrapping the field with double-quotes when it has the embedded comma. What I'd like to do, to help the customer feel warm and fuzzy, is tell CSV to always double-quote the third field.
I tried wrapping the field in double-quotes in my to_a method, which creates a "Firstname Lastname" field being passed to CSV, but CSV laughed at my puny-human attempt and output """Firstname Lastname""". That is the correct thing to do because it's escaping the double-quotes, so that didn't work.
Then I tried setting CSV's :force_quotes => true in the open method, which output double-quotes wrapping all fields as expected, but the customer didn't like that, which I expected also. So, that didn't work either.
I've looked through the Table and Row docs and nothing appeared to give me access to the "generate a String field" method, or a way to set a "for field n always use quoting" flag.
I'm about to dive into the source to see if there's some super-secret tweaks, or if there's a way to monkey-patch CSV and bend it to do my will, but wondered if anyone had some special knowledge or had run into this before.
And, yes, I know I could roll my own CSV output, but I prefer to not reinvent well-tested wheels. And, I'm also aware of FasterCSV; That's now part of Ruby 1.9.2, which I'm using, so explicitly using FasterCSV buys me nothing special. Also, I'm not using Rails and have no intention of rewriting it in Rails, so unless you have a cute way of implementing it using a small subset of Rails, don't bother. I'll downvote any recommendations to use any of those ways just because you didn't bother to read this far.
Well, there's a way to do it but it wasn't as clean as I'd hoped the CSV code could allow.
I had to subclass CSV, then override the CSV#<< method and add another method, forced_quote_fields=, to make it possible to define the fields I want to force quoting on, plus pull two lambdas from other methods. At least it works for what I want:
require 'csv'
class MyCSV < CSV
def <<(row)
# make sure headers have been assigned
if header_row? and [Array, String].include? @use_headers.class
parse_headers # won't read data for Array or String
self << @headers if @write_headers
end
# handle CSV::Row objects and Hashes
row = case row
when self.class::Row then row.fields
when Hash then @headers.map { |header| row[header] }
else row
end
@headers = row if header_row?
@lineno += 1
@do_quote ||= lambda do |field|
field = String(field)
encoded_quote = @quote_char.encode(field.encoding)
encoded_quote +
field.gsub(encoded_quote, encoded_quote * 2) +
encoded_quote
end
@quotable_chars ||= encode_str("\r\n", @col_sep, @quote_char)
@forced_quote_fields ||= []
@my_quote_lambda ||= lambda do |field, index|
if field.nil? # represent +nil+ fields as empty unquoted fields
""
else
field = String(field) # Stringify fields
# represent empty fields as empty quoted fields
if (
field.empty? or
field.count(@quotable_chars).nonzero? or
@forced_quote_fields.include?(index)
)
@do_quote.call(field)
else
field # unquoted field
end
end
end
output = row.map.with_index(&@my_quote_lambda).join(@col_sep) + @row_sep # quote and separate
if (
@io.is_a?(StringIO) and
output.encoding != raw_encoding and
(compatible_encoding = Encoding.compatible?(@io.string, output))
)
@io = StringIO.new(@io.string.force_encoding(compatible_encoding))
@io.seek(0, IO::SEEK_END)
end
@io << output
self # for chaining
end
alias_method :add_row, :<<
alias_method :puts, :<<
def forced_quote_fields=(indexes=[])
@forced_quote_fields = indexes
end
end
That's the code. Calling it:
data = [
%w[1 2 3],
[ 2, 'two too', 3 ],
[ 3, 'two, too', 3 ]
]
quote_fields = [1]
puts "Ruby version: #{ RUBY_VERSION }"
puts "Quoting fields: #{ quote_fields.join(', ') }", "\n"
csv = MyCSV.generate do |_csv|
_csv.forced_quote_fields = quote_fields
data.each do |d|
_csv << d
end
end
puts csv
results in:
# >> Ruby version: 1.9.2
# >> Quoting fields: 1
# >>
# >> 1,"2",3
# >> 2,"two too",3
# >> 3,"two, too",3
This post is old, but I can't believe no one thought of this.
Why not do:
csv = CSV.generate :quote_char => "\0" do |csv|
where \0 is a null character, then just add quotes to each field where they are needed:
csv << [product.upc, "\"" + product.name + "\"" # ...
Then at the end you can do a
csv.gsub!(/\0/, '')
I doubt this will help the customer feel warm and fuzzy after all this time, but this seems to work:
require 'csv'
#prepare a lambda which converts field with index 2
quote_col2 = lambda do |field, fieldinfo|
# fieldinfo has a line- ,header- and index-method
if fieldinfo.index == 2 && !field.start_with?('"') then
'"' + field + '"'
else
field
end
end
# specify above lambda as one of the converters
csv = CSV.read("test1.csv", :converters => [quote_col2])
p csv
# => [["aaa", "bbb", "\"ccc\"", "ddd"], ["fff", "ggg", "\"hhh\"", "iii"]]
File.open("test1.txt","w"){|out| csv.each{|line|out.puts line.join(",")}}
CSV has a force_quotes option that will force it to quote all fields (it may not have been there when you posted this originally). I realize this isn't exactly what you were proposing, but it's less monkey patching.
2.1.0 :008 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields']
1,1.1.1.1,Firstname Lastname,more,fields
2.1.0 :009 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields'], force_quotes: true
"1","1.1.1.1","Firstname Lastname","more","fields"
The drawback is that the first integer value ends up listed as a string, which changes things when you import into Excel.
It's been a long time, but since the CSV library has been patched, this might help someone if they're now facing this issue:
require 'csv'
# puts CSV::VERSION # this should be 3.1.9+
headers = ['id', 'ip', 'name', 'foo', 'bar']
data = [
[1, '1.1.1.1','Firstname Lastname','more','fields'],
[2, '2.2.2.2','Firstname Lastname, Jr.','more','fields']
]
quoter = Proc.new do |field, field_meta|
# the index starts at zero, that's why the third field would be 2:
field = '"' + field + '"' if field_meta.index == 2 && field_meta.line > 1
field = '"' + field + '"' if field.is_a?(String) && field.include?(',')
# ^ CSV format needs to escape fields containing comma(s): ,
field
end
file = CSV.generate(headers: true, quote_char: '', write_converters: quoter) do |csv|
csv << headers
data.each { |row| csv << row }
end
puts file
The output would be:
id,ip,name,foo,bar
1,1.1.1.1,"Firstname Lastname",more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
It doesn't look like there's any way to do this with the existing CSV implementation short of monkey-patching/rewriting it.
However, assuming you have full control over the source data, you could do this:
Append a custom string including a comma (i.e. one that would never be naturally found in the data) to the end of the field in question for each row; maybe something like "FORCE_COMMAS,".
Generate the CSV output.
Now that you have CSV output with quotes on every row for your field, remove the custom string: csv.gsub!(/FORCE_COMMAS,/, "")
Customer feels warm and fuzzy.
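The steps above can be sketched roughly as follows. The sample row and the `MARKER` constant name are assumptions for illustration, and the marker must be a string that never occurs in the real data.

```ruby
require 'csv'

# Marker containing a comma, which makes CSV quote the field it is appended to.
MARKER = 'FORCE_COMMAS,'

# Hypothetical sample data; the third column (index 2) should always be quoted.
rows = [[1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields']]

out = CSV.generate do |csv|
  rows.each do |row|
    marked = row.dup
    marked[2] = marked[2] + MARKER # step 1: append the marker
    csv << marked                  # step 2: generate the CSV output
  end
end

out = out.gsub(MARKER, '')         # step 3: remove the marker again
puts out
# 1,1.1.1.1,"Firstname Lastname",more,fields
```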
CSV has changed a bit in Ruby 2.1, as mentioned by @jwadsack; however, here's a working version of @the-tin-man's MyCSV. It is slightly modified: you set the forced_quote_fields via options.
MyCSV.generate(forced_quote_fields: [1]) do |_csv|...
The modified code
require 'csv'
class MyCSV < CSV
def <<(row)
# make sure headers have been assigned
if header_row? and [Array, String].include? @use_headers.class
parse_headers # won't read data for Array or String
self << @headers if @write_headers
end
# handle CSV::Row objects and Hashes
row = case row
when self.class::Row then row.fields
when Hash then @headers.map { |header| row[header] }
else row
end
@headers = row if header_row?
@lineno += 1
output = row.map.with_index(&@quote).join(@col_sep) + @row_sep # quote and separate
if @io.is_a?(StringIO) and
output.encoding != (encoding = raw_encoding)
if @force_encoding
output = output.encode(encoding)
elsif (compatible_encoding = Encoding.compatible?(@io.string, output))
@io.set_encoding(compatible_encoding)
@io.seek(0, IO::SEEK_END)
end
end
@io << output
self # for chaining
end
def init_separators(options)
# store the selected separators
@col_sep = options.delete(:col_sep).to_s.encode(@encoding)
@row_sep = options.delete(:row_sep) # encode after resolving :auto
@quote_char = options.delete(:quote_char).to_s.encode(@encoding)
@forced_quote_fields = options.delete(:forced_quote_fields) || []
if @quote_char.length != 1
raise ArgumentError, ":quote_char has to be a single character String"
end
#
# automatically discover row separator when requested
# (not fully encoding safe)
#
if @row_sep == :auto
if [ARGF, STDIN, STDOUT, STDERR].include?(@io) or
(defined?(Zlib) and @io.class == Zlib::GzipWriter)
@row_sep = $INPUT_RECORD_SEPARATOR
else
begin
#
# remember where we were (pos() will raise an exception if @io is pipe
# or not opened for reading)
#
saved_pos = @io.pos
while @row_sep == :auto
#
# if we run out of data, it's probably a single line
# (ensure will set default value)
#
break unless sample = @io.gets(nil, 1024)
# extend sample if we're unsure of the line ending
if sample.end_with? encode_str("\r")
sample << (@io.gets(nil, 1) || "")
end
# try to find a standard separator
if sample =~ encode_re("\r\n?|\n")
@row_sep = $&
break
end
end
# tricky seek() clone to work around GzipReader's lack of seek()
@io.rewind
# reset back to the remembered position
while saved_pos > 1024 # avoid loading a lot of data into memory
@io.read(1024)
saved_pos -= 1024
end
@io.read(saved_pos) if saved_pos.nonzero?
rescue IOError # not opened for reading
# do nothing: ensure will set default
rescue NoMethodError # Zlib::GzipWriter doesn't have some IO methods
# do nothing: ensure will set default
rescue SystemCallError # pipe
# do nothing: ensure will set default
ensure
#
# set default if we failed to detect
# (stream not opened for reading, a pipe, or a single line of data)
#
@row_sep = $INPUT_RECORD_SEPARATOR if @row_sep == :auto
end
end
end
@row_sep = @row_sep.to_s.encode(@encoding)
# establish quoting rules
@force_quotes = options.delete(:force_quotes)
do_quote = lambda do |field, _index = nil| # index is ignored; accepted so map.with_index can call this lambda
field = String(field)
encoded_quote = @quote_char.encode(field.encoding)
encoded_quote +
field.gsub(encoded_quote, encoded_quote * 2) +
encoded_quote
end
quotable_chars = encode_str("\r\n", @col_sep, @quote_char)
@quote = if @force_quotes
do_quote
else
lambda do |field, index|
if field.nil? # represent +nil+ fields as empty unquoted fields
""
else
field = String(field) # Stringify fields
# represent empty fields as empty quoted fields
if field.empty? or
field.count(quotable_chars).nonzero? or
@forced_quote_fields.include?(index)
do_quote.call(field)
else
field # unquoted field
end
end
end
end
end
end
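Since the subclass above overrides CSV internals that differ between Ruby versions, the same per-column rule can also be sketched without touching the class at all. This is only an illustration, not part of the answer's code: the helper name `line_with_forced_quotes` is hypothetical, and it relies solely on `CSV.generate_line` with the `force_quotes` option, applied one field at a time.

```ruby
require 'csv'

# Hypothetical helper: builds one CSV line, always quoting the fields whose
# index is listed in forced_indexes and quoting the rest only when needed.
def line_with_forced_quotes(row, forced_indexes)
  row.each_with_index.map do |field, index|
    force = forced_indexes.include?(index)
    # Generate a one-field line, then drop the trailing row separator.
    CSV.generate_line([field], force_quotes: force).chomp
  end.join(',')
end

puts line_with_forced_quotes([1, 'Firstname Lastname', 'Smith, Jr.'], [1])
# 1,"Firstname Lastname","Smith, Jr."
```

Generating each field separately is slower than a single CSV pass, but it keeps the quoting and escaping rules entirely inside the library.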
