Opening file in binary mode and saving it adds new line - ruby

Why does this add a new line to every line in the file?
text = File.read('1.txt', mode: 'rb', encoding: 'UTF-8')
File.write('1.txt', text, encoding: 'UTF-8')
If I remove binary mode, it is normal again, but I need it for another encoding (UTF-16LE).
Test it - http://asdfasd.net/ruby/binary_adds_newline.zip

I ran some tests; it depends on how your lines end. When they end with either LF (\n) or CR (\r), the output is what you expect: no new lines are added. However, if they end with CRLF (\r\n), an extra CR is added to each line, so each line effectively ends with CR CR LF (\r\r\n), which shows up as the extra line.
Most programming editors allow you to select an option which makes the line endings visible.
I am not exactly sure why this happens, but it likely has to do with the following snippet from the IO docs on the 'b' mode:
Suppresses EOL <-> CRLF conversion on Windows.
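It appears that when not using binary mode, CRLF (the default end-of-line on Windows) is converted to LF on reading, and LF is converted back to CRLF on writing. Reading in binary mode keeps the \r\n intact, and the text-mode write then expands the \n again, giving \r\r\n. The simple solution thus seems to be to replace every \r\n with \n. You can do that like this: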
File.open('converted.txt', 'wb') do |converted|
  File.open('1.txt', 'rb') do |source| # block form closes the input file too
    source.each_line do |line|
      converted << line.gsub("\r\n", "\n") # Replace CRLF with LF
    end
  end
end
If you run the script multiple times on the same file, you should make sure to replace CRLF with LF before you write it back:
# Note the .gsub at the end here
text = File.read('1.txt', mode: 'rb', encoding: 'UTF-8').gsub("\r\n", "\n")
File.write('1.txt', text, encoding: 'UTF-8')
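The extra CR can be reproduced on any platform with String#encode, whose crlf_newline option expands each LF to CRLF, much as a text-mode write does on Windows. This is only a minimal sketch; the sample string is made up for illustration.

```ruby
# Data as it looks after a binary-mode read of a Windows file: CRLF intact.
raw = "first\r\nsecond\r\n"

# Simulate the text-mode write on Windows: every LF becomes CRLF,
# so an existing CRLF turns into CR CR LF.
doubled = raw.encode(crlf_newline: true)
p doubled

# Normalizing to LF first gives clean CRLF endings after the same expansion.
fixed = raw.gsub("\r\n", "\n").encode(crlf_newline: true)
p fixed
```

With the gsub in place, the round trip leaves the line endings as they were.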

Related

Perl on Windows translates my newlines to CRLF

print FILEHANDLE; - when run from a Windows box - always converts a trailing \n into \r\n - resulting in a DOS formatted file. The difference between a DOS and a UNIX file is that in UNIX, the last character of each line is \n, whereas in Windows it is \r\n. I have tried changing the line termination character $\ = "\n"; but the print command still does the conversion to DOS format. This only occurs on Windows boxes.
If you don't like how Perl decides to output your data, you can change it. In the three-argument open, it looks like this:
open my $fh, '>:raw', $filename;
Or, if you already have the filehandle, you can use binmode:
binmode $fh, ':raw';
binmode $fh; # :raw is the default
The output depends on various I/O "layers", each of which gets to stick its dirty fingers into your data before it is output. The perlio docs have the list. There's a :crlf layer that turns Unix line endings into CRLF, and you are probably getting it by default on Windows. Note that changing the output record separator is something that happens at the print level, but there are deeper layers that can still do their work.

Ruby Write txt file DOS/Windows

Maybe this is a beginner question, but I could not find the problem yet.
I need to write a text file with Ruby.
I can create and write the file to export, but when I export the file and open it in other software, it tells me it is a UNIX file and the program requires it to be DOS / Windows.
How can I do this with Ruby?
I use Rails 4 in the project.
Example of how I am writing.
File.open(filePath, "w+"){ |file| file.write("blablabla\n")}
Use \r\n instead:
File.open(filePath, "w+"){ |file| file.write("blablabla\r\n")}
Using \n (0x0a) only is 'unix style'.
Using \r\n (0x0d 0x0a) is 'windows style'.
Although most software should be able to handle both.
It isn't very clearly documented but File.open also accepts these String#encode options:
File.open('a.txt', 'w+', crlf_newline: true){ |file| file.write("blablabla\n")}
and
File.open('a.txt', 'w+', newline: :crlf){ |file| file.write("blablabla\n")}
Either will force Ruby to write CRLF instead of LF (CR is \r and LF is \n).
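To check what actually lands on disk, one option is to read the bytes back with File.binread, which skips any newline translation. The sketch below assumes the crlf_newline option behaves as described above; the temporary directory and file name are only for illustration.

```ruby
require 'tmpdir'

bytes = Dir.mktmpdir do |dir|
  path = File.join(dir, 'a.txt') # hypothetical file name

  # Ask Ruby to expand each "\n" to "\r\n" while writing.
  File.open(path, 'w', crlf_newline: true) { |file| file.write("blablabla\n") }

  # Read the raw bytes back, bypassing any newline translation.
  File.binread(path)
end

p bytes
```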

std::endl equivalent on Ruby?

I just can't find it.
Found you can remove them with chomp, but not how to create them.
There is a global variable $/ which represents the input record separator (it defaults to a newline, "\n").
>> $/
=> "\n"
Methods like Kernel#gets use this to determine input boundary.
As long as you work with files in text mode (the default), Ruby itself does the translation of the operating system's end-of-line character sequences to "\n" in Ruby:
When reading from a file in text mode, all line endings will appear as "\n".
When writing to a file in text mode, all newline characters "\n" will be written as the operating system's end-of-line character sequence.
So for all practical purposes when dealing with files in text mode, you can use "\n" as a constant to mean the OS-specific line ending, like std::endl.
Source: How to make your Ruby code work on Windows PCs, section "Get your file modes right".
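A small sketch of how these pieces fit together, assuming the default value of $/ has not been changed; the sample strings are made up.

```ruby
# $/ is the input record separator; it defaults to "\n".
separator = $/

# Build a line explicitly terminated with a newline.
line = "hello" + separator

# chomp removes one trailing line ending: "\n", "\r\n" or "\r".
stripped = line.chomp
stripped_crlf = "hello\r\n".chomp

# puts appends a newline on its own when the string lacks one.
puts stripped
```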

Ruby on a Mac -- Regular Expression Spanning Two Lines of Text

On the PC, the following Ruby regular expression matches data. However, when run on the Mac against the same input text file, no matches occur. Am I matching line returns in a way that should work cross-platform?
data = nil
File.open(ARGV[0], "r") do |file|
data = file.readlines.join("").scan(/^Name: (.*?)[\r\n]+Email: (.*?)$/)
end
Versions
PC: ruby 1.9.2p135
Mac: ruby 1.8.6
Thank you,
Ben
The problem was the ^ and $ pattern characters! Ruby doesn't consider \r (a.k.a. ^M) a line boundary. If I modified my pattern, replacing both ^ and $ with "\r", the pattern matched as desired.
data = file.readlines.join.scan(/\rName: (.*?)\rEmail: (.*?)\r/)
Instead of modifying the pattern, I opted to do a gsub on the text, replacing \r with \n before calling scan.
data = file.readlines.join.gsub(/\r/, "\n").scan(/^Name: (.*?)\nEmail: (.*?)$/)
Thank you each for your responses to my question.
When going from Windows to a Unix-based system (Mac), I've had this issue with ^M (the \r of a \r\n pair). The carriage return gets rendered as Control-M, which may or may not be interpreted correctly by your regexp.
On Unix (OS X is a Unix), lines end with \n, not \r\n. Simply matching \n will work on a Mac.
To make the script cross-platform, maybe you could first replace each \r\n sequence with a \n character?
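One way to combine these suggestions is to normalize every CRLF or bare CR to LF before scanning, so that ^ and $ see real line boundaries. This is a sketch with made-up sample data that mixes both line-ending styles:

```ruby
# Sample input: the first record uses bare CR, the second uses CRLF.
text = "Name: Ann\rEmail: ann@example.com\r" \
       "Name: Bob\r\nEmail: bob@example.com\r\n"

# Replace CRLF or a lone CR with LF.
normalized = text.gsub(/\r\n?/, "\n")

# With LF-only line endings, ^ and $ anchor at each line as expected.
pairs = normalized.scan(/^Name: (.*?)\nEmail: (.*?)$/)
p pairs
```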

Ruby: cannot parse Excel file exported as CSV in OS X

I'm using Ruby's CSV library to parse some CSV. I have a seemingly well-formed CSV file that I created by exporting an Excel file as CSV.
However CSV.open(filename, 'r') causes a CSV::IllegalFormatError.
There are no rogue commas or quotation marks in the file, nor anything else that I can see that might cause problems.
I suspect the problem could be to do with line endings. I am able to parse data entered manually via a text editor (Aquamacs). It is just when I try with data exported from Excel (for OS X) that problems occur. When I open up the exported CSV in vim, all the text appears on one line, with ^M appearing between lines.
From the docs, it seems that you can provide open with a row separator; however I am unsure what it should be in this case.
Try: CSV.open('filename', 'r', ?,, ?\r)
As cantlin notes, for Ruby 2 it's:
CSV.open('file.csv', 'r', :col_sep => ?,, :row_sep => ?\r)
I'm pretty sure these will DTRT for you. You can also "fix" the file itself (in which case keep the old open) with the following vim command: :%s/\r/\r/g
Yes, I know that command looks like a total no-op, but it will work.
Stripping \r characters seemed to work for me
CSV.parse(File.read('filename').gsub(/\r/, '')) do |row|
...
end
Another option is to open the CSV file or the original spreadsheet in Excel and save it as "Windows Comma Separated" rather than "Comma Separated Values". This will output the file with line endings that FasterCSV is able to understand.
"""
When I open up the exported CSV in vim, all the text appears on one line, with ^M appearing between lines.
From the docs, it seems that you can provide open with a row separator; however I am unsure what it should be in this case.
"""
Read back a sentence ... ^M means keyboard Ctrl-M aka '\x0D' (M is the 13th letter of the ASCII alphabet; 0x0D == 13) aka ASCII CR (carriage return) aka '\r' ... IOW what Macs used to use as a line terminator before OS X.
It seems newer versions of the CSV parser, or a component it uses, read DOS/Windows line endings without issues. Mac OS X's stock Ruby (not sure of the version) was not cutting it; I installed Ruby 2.0.0 and it parsed the file just fine, without the special arguments.
I had similar problem. I got an error:
"error_message"=>"Illegal quoting in line 1.", "error_class"=>"CSV::MalformedCSVError"
The problem was that the file had Windows line endings, which differ from Unix ones. What helped me was defining row_sep: "\r\n":
CSV.open(path, 'w', headers: :first_row, col_sep: ';', quote_char: '"', row_sep: "\r\n")
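The effect of row_sep can be tried on in-memory strings with CSV.parse before touching a real file. This is a sketch with invented sample data, one string per line-ending style discussed above:

```ruby
require 'csv'

# Classic Mac export: rows separated by a bare CR.
mac_data = "name,email\rAnn,ann@example.com\r"
mac_rows = CSV.parse(mac_data, row_sep: "\r")

# Windows export with semicolons: rows separated by CRLF.
win_data = "name;email\r\nAnn;ann@example.com\r\n"
win_rows = CSV.parse(win_data, col_sep: ';', row_sep: "\r\n")

p mac_rows
p win_rows
```

Both calls should yield the same two rows, since only the separators differ.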
