How to detect and handle different EOL in Ruby? - ruby

I am trying to process a CSV file that can be generated with either CR or LF as the EOL marker. When I try to read the file with
infile = File.open('my.csv','r')
while line = infile.gets
...
The entire 20MB file is read in as one line.
How can I detect the line ending and handle it properly?
TIA

I would slurp the file, normalize the input, and then feed it to CSV:
require 'csv'
# Read in binary mode, then convert CRLF and lone CR to LF.
raw = File.open('my.csv','rb',&:read).gsub(/\r\n?/,"\n")
CSV.parse(raw) do |row|
  # use row here...
end
The above uses File.open instead of IO.read due to slow file reads on Windows Ruby.
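A minimal sketch of the normalize-then-parse idea, assuming the file may use CR, LF, or CRLF. The helper name normalize_eol and the sample strings are illustrative and not part of the original answer:

```ruby
require "csv"

# Hypothetical helper: turn CRLF and lone CR into LF.
def normalize_eol(text)
  text.gsub(/\r\n?/, "\n")
end

# The same two rows written with three different line endings.
samples = {
  cr:   "a,b\r1,2\r",
  crlf: "a,b\r\n1,2\r\n",
  lf:   "a,b\n1,2\n"
}

parsed = samples.transform_values { |raw| CSV.parse(normalize_eol(raw)) }
parsed.each { |style, rows| puts "#{style}: #{rows.inspect}" }
```

Note that this also rewrites CR characters inside quoted fields, which may or may not be what you want.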

When in doubt, use a regex.
> "how\r\nnow\nbrown\r\ncow\n".split /[\r\n]+/
=> ["how", "now", "brown", "cow"]
So, something like
infile.read.split(/[\r\n]+/).each do |line|
. . .
end
Now, it turns out that the standard library CSV auto-detects the line ending (CR, LF, or CRLF) by default, so you could just do:
CSV.parse(infile.read).each do |line|
. . .
end
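If you want to keep reading line by line with gets or each_line rather than going through CSV, one option is to sniff the terminator from the start of the file and pass it as the separator. This is only a sketch: detect_eol and the 4096-byte sample size are assumptions, and a StringIO stands in for the real file here.

```ruby
require "stringio"

# Hypothetical helper: return the first line terminator found in a sample
# taken from the start of the stream, falling back to "\n" if none is found.
def detect_eol(io, sample_size = 4096)
  sample = io.read(sample_size).to_s
  io.rewind
  sample[/\r\n|\r|\n/] || "\n"
end

io = StringIO.new("a,b\r1,2\r") # stands in for File.open('my.csv', 'rb')
eol = detect_eol(io)
lines = io.each_line(eol).map { |line| line.chomp(eol) }
p eol
p lines
```

The sketch ignores the corner case where the sample ends exactly between the CR and LF of a CRLF pair.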

Related

Ruby. NUL chars after reading simple file

I'm reading simple text files using Ruby for further regex processing, and suddenly I see a NUL after each printable character in the string. I'm totally lost as to where it comes from. I tested by typing simple text in Notepad and saving it as a txt file, and I still get them. I'm on a Windows machine and didn't have this before.
How can I process them, probably by replacing them? I'm not sure how to refer to them.
My regex doesn't work with them. I've tried several ways, using SciTE to run the script.
E.g. use is presented as uNULsNULeNUL and is not equal to use
puts File.read(file_name)
puts '____________________'
File.open(file_name, "r") do |f|
f.each_line do |line|
puts 'Line.....' + line
end
end
This file is probably in UTF-16 format. You'll need to read it in that way:
File.open(file_name, "r:UTF-16LE") do |f|
# ...
end
Windows programs such as Notepad commonly write this format (older versions label it "Unicode").
You can always fix this by re-saving the file as UTF-8.
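To see where the NULs come from, here is a small sketch that builds the UTF-16LE bytes for the word use in memory and then decodes them. The variable names are illustrative only:

```ruby
# "use" encoded as UTF-16LE: every ASCII character is followed by a zero byte.
raw = "use".encode("UTF-16LE").b
p raw.bytes

# Tell Ruby what the bytes really are, then transcode to UTF-8.
decoded = raw.dup.force_encoding("UTF-16LE").encode("UTF-8")
p decoded
p decoded == "use"
```

Read as single-byte text, those zero bytes are what an editor shows as NUL, which is why the string does not compare equal to use until it is decoded.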

Ruby Write txt file DOS/Windows

Maybe this is a beginner question, but I could not find the problem yet.
I need to write a text file with Ruby.
I can create and write the file to export, but when the exported file is read by other software, that software reports it as a UNIX file and requires it to be DOS / Windows.
How can I do this with Ruby?
I use Rails 4 in the project.
Example of how I am writing.
File.open(filePath, "w+"){ |file| file.write("blablabla\n")}
Use \r\n instead:
File.open(filePath, "w+"){ |file| file.write("blablabla\r\n")}
Using \n (0x0a) only is 'unix style'.
Using \r\n (0x0d 0x0a) is 'windows style'.
Although most software should be able to handle both.
It isn't very clearly documented but File.open also accepts these String#encode options:
File.open('a.txt', 'w+', crlf_newline: true){ |file| file.write("blablabla\n")}
and
File.open('a.txt', 'w+', newline: :crlf){ |file| file.write("blablabla\n")}
Either will force Ruby to write CRLF instead of LF (CR is \r and LF is \n).
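If you would rather not depend on those options, a sketch of the explicit approach is below. The helper write_dos_file and the temporary path are made up for illustration. It opens the file in binary mode so that no platform newline translation is applied on top of the explicit \r\n.

```ruby
require "tmpdir"

# Hypothetical helper: write each line with an explicit CRLF terminator.
# Binary mode ("wb") avoids a second newline translation on Windows.
def write_dos_file(path, lines)
  File.open(path, "wb") do |file|
    lines.each { |line| file.write(line.chomp + "\r\n") }
  end
end

bytes = Dir.mktmpdir do |dir|
  path = File.join(dir, "export.txt")
  write_dos_file(path, ["blablabla", "second line\n"])
  File.binread(path) # read back the raw bytes to check the terminators
end
p bytes
```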

Ruby / CSV - Convert Dos to Unix

I currently use: http://emacswiki.org/emacs/DosToUnix to manually convert DOS CSVs to UNIX. Just wondering if there's a ruby function for the CSV library that I'm missing? And / or if it's possible build a quick script / Monkey Patch.
Yes. The CSV docs say:
The String appended to the end of each row. This can be set to the special :auto setting, which requests that CSV automatically discover this from the data. Auto-discovery reads ahead in the data looking for the next "\r\n", "\n", or "\r" sequence.
:auto is the default, so you should be able to feed your DOS CSV to Ruby unmodified.
However, if you want to convert to UNIX line endings:
str.gsub(/\r\n/, "\n")
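A quick way to convince yourself of both points, using made-up sample data:

```ruby
require "csv"

dos_csv  = "name,qty\r\nwidget,2\r\n"
unix_csv = dos_csv.gsub(/\r\n/, "\n")

# CSV parses the DOS data unmodified and yields the same rows.
rows_dos  = CSV.parse(dos_csv)
rows_unix = CSV.parse(unix_csv)
p rows_dos
p rows_dos == rows_unix
```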

CSV.read Illegal quoting in line x

I am using ruby CSV.read with massive data. From time to time the library encounters poorly formatted lines, for instance:
"Illegal quoting in line 53657."
It would be easier to ignore the line and skip it than to go through each CSV and fix the formatting. How can I do this?
I had this problem in a line like 123,456,a"b"c
The problem is that the CSV parser expects " characters, if they appear, to entirely surround the comma-delimited field.
Solution: use a quote character other than ", one that I was sure would not appear in my data:
CSV.read(filename, :quote_char => "|")
The liberal_parsing option is available starting in Ruby 2.4 for cases like this. From the documentation:
When set to a true value, CSV will attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields.
To enable it, pass it as an option to the CSV read/parse/new methods:
CSV.read(filename, liberal_parsing: true)
Don't let CSV both read and parse the file.
Just read the file yourself and hand each line to CSV.parse_line, and then rescue any exceptions it throws.
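The read-it-yourself approach could look roughly like this. The method name parse_lenient and the sample data are illustrative, and the sketch assumes no quoted field spans more than one physical line:

```ruby
require "csv"
require "stringio"

# Hypothetical helper: parse each physical line on its own and
# record the numbers of lines that CSV rejects.
def parse_lenient(io)
  rows = []
  skipped = []
  io.each_line.with_index(1) do |line, number|
    begin
      row = CSV.parse_line(line)
      rows << row unless row.nil?
    rescue CSV::MalformedCSVError
      skipped << number
    end
  end
  [rows, skipped]
end

data = StringIO.new("a,b\n123,456,a\"b\"c\n1,2\n") # stands in for File.open(filename)
rows, skipped = parse_lenient(data)
p rows
p skipped
```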
Try forcing a quote character that cannot appear in the data, such as the NUL character:
require 'csv'
CSV.foreach(file, headers: :first_row, quote_char: "\x00") do |line|
  p line
end
Apparently this error can also be caused by unprintable BOM characters. This thread suggests using a file mode to force a conversion, which is what finally worked for me.
require 'csv'
CSV.open(filename, 'r:bom|utf-8') do |csv|
# do something
end
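A small sketch of what the bom|utf-8 mode changes, using a temporary file with made-up contents. It only shows that the BOM is dropped on read. That a leading BOM in front of a quoted first field is what triggers the quoting error is an assumption here.

```ruby
require "csv"
require "tmpdir"

plain, bom_aware = Dir.mktmpdir do |dir|
  path = File.join(dir, "bom.csv")
  # UTF-8 BOM (EF BB BF) followed by ordinary CSV data.
  File.binwrite(path, "\xEF\xBB\xBFid,name\n1,Ann\n")

  [
    File.open(path, "r:utf-8", &:read),     # BOM stays in the string
    File.open(path, "r:bom|utf-8", &:read)  # BOM is consumed
  ]
end

p plain.start_with?("\uFEFF")
p bom_aware.start_with?("id")
p CSV.parse(bom_aware, headers: true).headers
```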

ruby each_line reads line break too?

I'm trying to read data from a text file and join it with a post string. When there's only one line in the file, it works fine. But with two lines, my request fails. Is each_line reading the line break? How can I correct it?
File.open('sfzh.txt','r'){|f|
  f.each_line{|row|
    send(row)
  }
}
I did bypass this issue with split and extra delimiter. But it just looks ugly.
Yes, each_line includes line breaks. But you can strip them easily using chomp:
File.foreach('test1.rb') do |line|
send line.chomp
end
Another way is to map strip onto each line as it is returned. To read a file line by line, strip the surrounding whitespace, and do something with each line, you can do the following:
File.readlines("path to file").map(&:strip).each do |line|
  # do something with line
end
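The two answers are not quite equivalent: chomp removes only the trailing line terminator, while strip also removes leading and trailing whitespace. A short illustration with a made-up line:

```ruby
line = "  id=42 \r\n"

chomped  = line.chomp # only the trailing "\r\n" is removed
stripped = line.strip # surrounding spaces are removed as well

p chomped
p stripped

# Applied to several lines at once:
p "a\nb\n".each_line.map(&:chomp)
```

If the leading or trailing spaces are part of the data you post, chomp is the safer choice.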
