Maybe this is a beginner question, but I could not find the problem yet.
I need to write a text file with Ruby.
I can create and write the file to export, but when I export the file and it is read by other software, that software tells me it is a UNIX file and that it requires DOS / Windows format.
How can I do this with Ruby?
I use Rails 4 in the project.
Example of how I am writing.
File.open(filePath, "w+"){ |file| file.write("blablabla\n")}
Use \r\n instead:
File.open(filePath, "w+"){ |file| file.write("blablabla\r\n")}
Using \n (0x0a) only is 'unix style'.
Using \r\n (0x0d 0x0a) is 'windows style'.
Although most software should be able to handle both.
It isn't very clearly documented but File.open also accepts these String#encode options:
File.open('a.txt', 'w+', crlf_newline: true){ |file| file.write("blablabla\n")}
and
File.open('a.txt', 'w+', newline: :crlf){ |file| file.write("blablabla\n")}
Either will force Ruby to write CRLF instead of LF (CR is \r and LF is \n).
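If you would rather not depend on the newline options, a more explicit route is to convert the text yourself and write it in binary mode. This is only a minimal sketch: write_dos_text is a hypothetical helper name, not part of Ruby, and it assumes the whole text fits in memory.

```ruby
require 'tmpdir'

# Hypothetical helper (not a Ruby built-in): write text with CRLF line
# endings regardless of the platform the script runs on.
def write_dos_text(path, text)
  # Collapse existing CRLF or bare CR to LF first so nothing gets doubled,
  # then expand every LF to CRLF.
  dos_text = text.gsub(/\r\n?/, "\n").gsub("\n", "\r\n")
  # Binary mode ('wb') stops Ruby from translating newlines a second time.
  File.open(path, 'wb') { |file| file.write(dos_text) }
end

# Usage: write a small file and look at the raw bytes.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'export.txt')
  write_dos_text(path, "blablabla\n")
  p File.binread(path)
end
```

Normalizing before expanding means the helper can be fed text that already contains some \r\n without producing \r\r\n.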
Related
I'm reading simple text files using Ruby for further regex processing, and suddenly I see a NUL after each printable character. I'm totally lost as to where it comes from. I tested by typing simple text in Notepad and saving it as a txt file, and I still get them. I'm on a Windows machine and didn't have this before.
How can I process them, probably by replacing them? I'm not sure how to refer to them.
My regex doesn't work with them. I tried several ways, using SciTE to run the script.
E.g. use is presented as uNULsNULeNUL and is not equal to use.
puts File.read(file_name)
puts '____________________'
File.open(file_name, "r") do |f|
  f.each_line do |line|
    puts 'Line.....' + line
  end
end
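This file is probably in UTF-16 format. You'll need to read it that way. Open it in binary mode, because Ruby refuses to open an ASCII-incompatible encoding in plain text mode, and transcode to UTF-8 so that ordinary regexes match:
File.open(file_name, "rb:UTF-16LE:UTF-8") do |f|
  # ...
end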
Windows tools such as Notepad write UTF-16LE when a file is saved with the "Unicode" encoding option.
You can always fix this by re-saving the file as UTF-8.
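If re-saving is not an option, the conversion can also be done in Ruby after reading the raw bytes. The following is a rough sketch under the assumption that the file really is UTF-16LE; read_utf16le_as_utf8 is a hypothetical helper name, and the whole file is read into memory.

```ruby
# Hypothetical helper: read a UTF-16LE file and return a UTF-8 string.
def read_utf16le_as_utf8(path)
  raw = File.binread(path) # raw bytes, no newline translation
  # Label the bytes as UTF-16LE, then transcode them to UTF-8.
  text = raw.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  # Drop the byte order mark if the file starts with one.
  text.delete_prefix("\uFEFF")
end
```

After this, a string such as use compares equal to "use" again, and regexes written for plain text apply as usual.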
Why does this add a new line to every line in the file?
text = File.read('1.txt', mode: 'rb', encoding: 'UTF-8')
File.write('1.txt', text, encoding: 'UTF-8')
If I remove binary mode, it is normal again, but I need it for another encoding (UTF-16LE).
Test it - http://asdfasd.net/ruby/binary_adds_newline.zip
I ran some tests; it depends on the way your lines end. When they end with either LF (\n) or CR (\r) it will produce output like you expect. That is, no new lines are added. However, if you have CRLF (\r\n) it will add a CR character after each line, thus ending it effectively with CRCR+LF which produces the extra line.
Most programming editors allow you to select an option which makes the line endings visible.
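If no such editor is at hand, a few lines of Ruby can report which line endings a file contains. This is a sketch only: count_line_endings is a hypothetical name, and it assumes a single-byte or UTF-8 encoded file (it will not recognize UTF-16 line endings).

```ruby
# Hypothetical helper: count each line-ending style in a string of raw bytes.
def count_line_endings(bytes)
  {
    crlf: bytes.scan(/\r\n/).size,      # Windows style
    cr:   bytes.scan(/\r(?!\n)/).size,  # CR not followed by LF
    lf:   bytes.scan(/(?<!\r)\n/).size  # LF not preceded by CR
  }
end

# Typical use (reads the file as raw bytes so nothing is translated):
#   p count_line_endings(File.binread('1.txt'))
```

A file damaged by the doubling described above would show up as roughly one extra bare CR per CRLF.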
I am not exactly sure why this happens, but it likely has to do with the following snippet from the IO docs on the 'b' mode:
Suppresses EOL <-> CRLF conversion on Windows.
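It appears that when not using binary mode, CRLFs (the default end-of-line on Windows) are converted to LFs on reading, and each LF is converted back to CRLF on writing. Because the file was read in binary mode, the CR of each CRLF stays in the string, and the text-mode write then turns the remaining LF into CRLF, giving CR CR LF. The simple solution thus seems to be to replace all \r\n with either \n or \r. You can do that like this:
File.open('converted.txt', 'wb') do |converted|
  # Block form so the source file is closed again when we are done
  File.open('1.txt', 'rb') do |source|
    source.each_line do |line|
      converted << line.gsub("\r\n", "\n") # Replace CRLF with LF
    end
  end
end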
If you run the script multiple times on the same file, you should make sure to replace CRLF with LF before you write it back:
# Note the .gsub at the end here
text = File.read('1.txt', mode: 'rb', encoding: 'UTF-8').gsub("\r\n", "\n")
File.write('1.txt', text, encoding: 'UTF-8')
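Another way to avoid the doubling is to use binary mode on both the read and the write, so that Ruby never translates line endings in either direction. A minimal sketch, assuming UTF-8 content that fits in memory; rewrite_file_binary is a hypothetical helper name:

```ruby
# Hypothetical helper: rewrite a file in place without newline translation.
# The optional block receives the text and returns the modified text.
def rewrite_file_binary(path)
  text = File.binread(path).force_encoding(Encoding::UTF_8)
  text = yield(text) if block_given?
  # binwrite writes the bytes exactly as they are in the string.
  File.binwrite(path, text)
end

# Typical use:
#   rewrite_file_binary('1.txt') { |text| text.sub('old', 'new') }
```

With this approach the file keeps whatever line endings it had, so running the script repeatedly does not change them.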
I currently use http://emacswiki.org/emacs/DosToUnix to manually convert DOS CSVs to UNIX. I'm just wondering if there's a Ruby function in the CSV library that I'm missing, and/or if it's possible to build a quick script or monkey patch.
Yes. The CSV docs say:
The String appended to the end of each row. This can be set to the special :auto setting, which requests that CSV automatically discover this from the data. Auto-discovery reads ahead in the data looking for the next "\r\n", "\n", or "\r" sequence.
:auto is the default, so you should be able to feed your DOS CSV to Ruby unmodified.
However, if you want to convert to UNIX line endings:
str.gsub(/\r\n/, "\n")
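As a quick check of the auto-discovery described above, the same two rows can be parsed with each line-ending style. This is just an illustrative sketch; parse_any_line_ending is a hypothetical wrapper and the sample data is made up.

```ruby
require 'csv'

# Hypothetical wrapper: parse CSV text and let CSV discover the row separator.
def parse_any_line_ending(data)
  CSV.parse(data) # row_sep defaults to :auto
end

dos_rows  = parse_any_line_ending("name,qty\r\napple,3\r\n") # DOS / Windows
unix_rows = parse_any_line_ending("name,qty\napple,3\n")     # UNIX
p dos_rows == unix_rows
```

Note that auto-discovery picks one separator from the start of the data, so it is meant for files that use a single style throughout.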
I am trying to process a CSV file that can be generated with either CR or LF as an EOL marker. When I try to read the file with
infile = File.open('my.csv','r')
while line = infile.gets
...
The entire 20MB file is read in as one line.
How can I detect and handle properly?
TIA
I would slurp the file, normalize the input, and then feed it to CSV:
require 'csv'
# Turn CRLF and bare CR into LF so every row ends the same way
raw = File.open('my.csv','rb',&:read).gsub(/\r\n?/,"\n")
CSV.parse(raw) do |row|
  # use row here...
end
The above uses File.open instead of IO.read due to slow file reads on Windows Ruby.
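If slurping the whole file is a concern, one alternative is to sniff the separator from the first few kilobytes and then read line by line with it. This is a sketch under assumptions: detect_row_separator and each_record are hypothetical names, the file is assumed to use one separator throughout, and quoted fields containing embedded newlines are not handled.

```ruby
# Hypothetical helper: guess the row separator from the start of the file.
def detect_row_separator(path, sample_size = 4096)
  File.open(path, 'rb') do |f|
    sample = f.read(sample_size) || ''
    # If the sample stops right after a CR, peek one more byte so a CRLF
    # pair split across the boundary is not mistaken for a bare CR.
    sample << (f.read(1) || '') if sample.end_with?("\r")
    sample[/\r\n|\n|\r/] || "\n"
  end
end

# Hypothetical helper: yield each line without its separator.
def each_record(path)
  return enum_for(:each_record, path) unless block_given?

  sep = detect_row_separator(path)
  File.open(path, 'rb') do |f|
    f.each_line(sep) { |line| yield line.chomp(sep) }
  end
end

# Typical use:
#   each_record('my.csv') { |line| puts line }
```

Each yielded line can then be split or handed to CSV.parse_line, which keeps memory use low compared with reading all 20MB at once.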
When in doubt, use a regex.
> "how\r\nnow\nbrown\r\ncow\n".split /[\r\n]+/
=> ["how", "now", "brown", "cow"]
So, something like
infile.read.split(/[\r\n]+/).each do |line|
  . . .
end
Now, it turns out that the standard library CSV already understands mixed line endings, so you could just do:
CSV.parse(infile.read).each do |line|
  . . .
end
I'm using Ruby's CSV library to parse some CSV. I have a seemingly well-formed CSV file that I created by exporting an Excel file as CSV.
However CSV.open(filename, 'r') causes a CSV::IllegalFormatError.
There are no rogue commas or quotation marks in the file, nor anything else that I can see that might cause problems.
I suspect the problem could be to do with line endings. I am able to parse data entered manually via a text editor (Aquamacs). It is just when I try with data exported from Excel (for OS X) that problems occur. When I open up the exported CSV in vim, all the text appears on one line, with ^M appearing between lines.
From the docs, it seems that you can provide open with a row separator; however I am unsure what it should be in this case.
Try: CSV.open('filename', 'r', ?,, ?\r)
As cantlin notes, for Ruby 2 it's:
CSV.open('file.csv', 'r', :col_sep => ?,, :row_sep => ?\r)
I'm pretty sure these will do the right thing for you. You can also "fix" the file itself (in which case keep the old open call) with the following vim command: :%s/\r/\r/g
Yes, I know that command looks like a total no-op, but it will work.
Stripping \r characters seemed to work for me
CSV.parse(File.read('filename').gsub(/\r/, '')) do |row|
  ...
end
Another option is to open the CSV file or the original spreadsheet in Excel and save it as "Windows Comma Separated" rather than "Comma Separated Values". This will output the file with line endings that FasterCSV is able to understand.
"""
When I open up the exported CSV in vim, all the text appears on one line, with ^M appearing between lines.
From the docs, it seems that you can provide open with a row separator; however I am unsure what it should be in this case.
"""
Read back a sentence ... ^M means keyboard Ctrl-M, aka '\x0D' (M is the 13th letter of the ASCII alphabet; 0x0D == 13), aka ASCII CR (carriage return), aka '\r'. In other words, it is what Macs used as a line terminator before OS X.
It seems newer versions of the CSV parser, and/or any component it uses, read DOS/Windows line endings without issues. The stock Ruby on Mac OS X (I'm not sure which version) was not cutting it; after I installed Ruby 2.0.0, it parsed the file just fine, without the special arguments.
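The ^M arithmetic can be checked in Ruby. The sketch below assumes the usual caret notation, where a control character is displayed as ^ followed by the character whose code is 0x40 higher; caret_notation is a hypothetical helper name.

```ruby
# Hypothetical helper: show a control character the way vim displays it.
def caret_notation(char)
  code = char.ord
  unless code < 0x20 || code == 0x7F
    raise ArgumentError, 'not a control character'
  end
  # Flipping bit 0x40 maps 0x0D (CR) to 0x4D ('M'), 0x0A (LF) to 'J', etc.
  "^#{(code ^ 0x40).chr}"
end

puts caret_notation("\r") # carriage return
puts caret_notation("\n") # line feed
```

So a file that shows ^M between lines and has no real line breaks uses bare "\r" as its row separator.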
I had similar problem. I got an error:
"error_message"=>"Illegal quoting in line 1.", "error_class"=>"CSV::MalformedCSVError"
The problem was that the file had Windows line endings, which differ from Unix ones. What helped me was defining row_sep: "\r\n":
CSV.open(path, 'w', headers: :first_row, col_sep: ';', quote_char: '"', row_sep: "\r\n")
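The same row_sep option applies when producing CSV for a Windows consumer. Here is a small sketch that builds the text in memory with CSV.generate; to_dos_csv is a hypothetical helper name, and the semicolon separator simply mirrors the call above.

```ruby
require 'csv'

# Hypothetical helper: render rows as CSV text with CRLF row endings.
def to_dos_csv(rows)
  CSV.generate(col_sep: ';', row_sep: "\r\n") do |csv|
    rows.each { |row| csv << row }
  end
end

p to_dos_csv([%w[id name], %w[1 Alice]])
```

When writing the result to disk on Windows, binary mode avoids a second LF to CRLF translation of the already converted text.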