Removing trailing commas from a CSV file in Ruby - ruby

In a CSV file, I have trailing commas that I want to get rid of, but the number of commas varies from row to row, so I cannot simply use gsub to remove them. Does anyone know a way to read a CSV file, remove any trailing commas from each row, and write the result back to the same CSV file?

You can read the file line by line and strip the trailing commas from each line with sub. You cannot edit the file in place directly, so the best approach is to write to a Tempfile and replace your CSV file with it when done:
require 'fileutils'
require 'tempfile'
t_file = Tempfile.new('temp.txt')
File.open("/path/to/csv", 'r') do |f|
  # /,+$/ matches the whole run of trailing commas, however many there are
  f.each_line { |line| t_file.puts line.chomp.sub(/,+$/, '') }
end
t_file.close
FileUtils.mv(t_file.path, "/path/to/csv")
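As a rough sketch, the same idea can be wrapped in a reusable method. The name strip_trailing_commas is hypothetical, and the sketch assumes the file has no quoted fields that span several lines. It creates the temp file in the target's directory so that the final move stays on one filesystem.

```ruby
require 'fileutils'
require 'tempfile'

# Hypothetical helper (not part of any library): rewrites the file at +path+
# with every run of trailing commas removed from each line.
def strip_trailing_commas(path)
  # Assumption: keeping the temp file beside the target avoids a cross-device move.
  tmp = Tempfile.new(['stripped', '.csv'], File.dirname(path))
  File.foreach(path) do |line|
    # chomp drops the line break first, so \z anchors right after the last comma
    tmp.puts line.chomp.sub(/,+\z/, '')
  end
  tmp.close
  FileUtils.mv(tmp.path, path)
end
```

Tempfile creates its file with owner-only permissions, so the replaced file ends up with those permissions; restore them with File.chmod if that matters.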

Related

Ruby. NUL chars after reading simple file

I'm reading simple text files using Ruby for further regex processing, and suddenly I see a NUL after each printable character. I'm totally lost as to where it comes from. I tested by typing simple text in Notepad and saving it as a txt file, and I still get them. I'm on a Windows machine and didn't have this before.
How can I process them, probably by replacing them? I'm not sure how to refer to them.
My regex doesn't work with them; I tried several ways, using SciTE to run the script.
e.g. use is presented as uNULsNULeNUL and is not equal to use
puts File.read(file_name)
puts '____________________'
File.open(file_name, "r") do |f|
  f.each_line do |line|
    puts 'Line.....' + line
  end
end
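This file is probably in UTF-16 format. You'll need to read it that way:
File.open(file_name, "r:UTF-16LE:UTF-8") do |f|
  # decodes UTF-16LE on read and hands you UTF-8 strings
  # ...
end
Windows tools often use that format; Notepad, for example, writes UTF-16LE when you save with the "Unicode" encoding.
You can always fix this by re-saving the file as UTF-8.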
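If re-saving by hand is not an option, the conversion can also be done in Ruby. The following is only a sketch: read_utf16le_as_utf8 is a hypothetical name, and it assumes the file really is UTF-16LE (encode raises an error if the bytes are not valid UTF-16LE).

```ruby
# Hypothetical helper: returns the contents of a UTF-16LE file as a UTF-8 string.
def read_utf16le_as_utf8(path)
  # Read the raw bytes, then tell Ruby how to interpret them.
  text = File.binread(path).force_encoding(Encoding::UTF_16LE)
  # Transcode to UTF-8 so ordinary string literals and regexes match.
  utf8 = text.encode(Encoding::UTF_8)
  # Drop a leading byte order mark if the editor wrote one.
  utf8.delete_prefix("\uFEFF")
end
```

Writing the returned string back with File.write would give a UTF-8 copy of the file.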

Ruby remove blank lines from a file which include spaces

I am trying to remove blank lines from a file.
My current code is:
def remove_blank_lines_from_file(file)
  File.write(file, File.read(file).gsub(/^$\n/, ''))
end
The above code removes only the empty lines, but I also want to remove the lines that contain only spaces.
How could I do it?
Since you load the whole file into memory anyway, this might be easier to read:
File.write(file, File.readlines(file).reject { |s| s.strip.empty? }.join)
It simply drops the lines that are empty or contain only whitespace.
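For illustration, here is the same filter applied to an in-memory string, which makes the effect easy to check. The method name drop_blank_lines is hypothetical.

```ruby
# Hypothetical helper: returns +text+ without lines that are empty
# or consist only of whitespace (spaces, tabs, a stray "\r").
def drop_blank_lines(text)
  text.each_line.reject { |line| line.strip.empty? }.join
end

cleaned = drop_blank_lines("first\n\n   \t\nsecond\n")
puts cleaned
```

Lines with real content keep their own leading and trailing spaces, because strip is used only for the emptiness test.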

CSV.read Illegal quoting in line x

I am using ruby CSV.read with massive data. From time to time the library encounters poorly formatted lines, for instance:
"Illegal quoting in line 53657."
It would be easier to ignore the line and skip it than to go through each CSV and fix the formatting. How can I do this?
I had this problem in a line like 123,456,a"b"c
The problem is that the CSV parser expects double quotes, if they appear, to surround the entire field.
The solution was to use a quote character other than " that I was sure would not appear in my data:
CSV.read(filename, :quote_char => "|")
The liberal_parsing option is available starting in Ruby 2.4 for cases like this. From the documentation:
When set to a true value, CSV will attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields.
To enable it, pass it as an option to the CSV read/parse/new methods:
CSV.read(filename, liberal_parsing: true)
Don't let CSV both read and parse the file.
Just read the file yourself and hand each line to CSV.parse_line, and then rescue any exceptions it throws.
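A minimal sketch of that approach follows. The method name parse_skipping_bad_lines is hypothetical, and the sketch assumes every record fits on one physical line; a quoted field containing a line break would be split and reported as malformed.

```ruby
require 'csv'

# Hypothetical helper: parses the file one line at a time and returns
# [rows, skipped], where skipped holds [line_number, error_message] pairs.
def parse_skipping_bad_lines(path)
  rows = []
  skipped = []
  File.foreach(path).with_index(1) do |line, lineno|
    begin
      row = CSV.parse_line(line)
      rows << row unless row.nil?
    rescue CSV::MalformedCSVError => e
      # Remember where the bad line was instead of aborting the whole read.
      skipped << [lineno, e.message]
    end
  end
  [rows, skipped]
end
```

Keeping the skipped line numbers makes it possible to review or repair just those lines later.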
Try setting the quote character to one that cannot occur in your data, such as the NUL byte:
require 'csv'
CSV.foreach(file, headers: :first_row, quote_char: "\x00") do |line|
  p line
end
Apparently this error can also be caused by unprintable BOM characters. This thread suggests using a file mode to force a conversion, which is what finally worked for me.
require 'csv'
CSV.open(filename, 'r:bom|utf-8') do |csv|
  # do something
end

How to detect and handle different EOL in Ruby?

I am trying to process a CSV file that can be generated with either CR or LF as the EOL marker. When I try to read the file with
infile = File.open('my.csv','r')
while line = infile.gets
...
The entire 20MB file is read in as one line.
How can I detect and handle properly?
TIA
I would slurp the file, normalize the input, and then feed it to CSV:
# convert both CRLF and a lone CR to LF
raw = File.open('my.csv','rb',&:read).gsub(/\r\n?/,"\n")
CSV.parse(raw) do |row|
# use row here...
end
The above uses File.open instead of IO.read due to slow file reads on Windows Ruby.
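The normalization step can be sketched as a small method. The name parse_with_normalized_eol is hypothetical, and the sketch assumes the file is UTF-8 (or plain ASCII) and that no quoted field is meant to keep a literal CR.

```ruby
require 'csv'

# Hypothetical helper: reads the file as raw bytes, converts CRLF and lone CR
# line endings to LF, and parses the result into an array of rows.
def parse_with_normalized_eol(path)
  raw = File.binread(path)
  normalized = raw.gsub(/\r\n?/, "\n")
  # Assumption: the data is UTF-8; binread returns a binary-tagged string.
  CSV.parse(normalized.force_encoding(Encoding::UTF_8))
end
```

Matching \r\n? in a single pass avoids turning a CRLF pair into two line breaks.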
When in doubt, use a regex.
> "how\r\nnow\nbrown\r\ncow\n".split /[\r\n]+/
=> ["how", "now", "brown", "cow"]
So, something like
infile.read.split(/[\r\n]+/).each do |line|
. . .
end
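Now, it turns out that the standard library CSV already auto-detects the line ending (its default row_sep is :auto, which looks for the first \r\n, \n, or \r in the data), so for a file that uses one style consistently you could just do:
CSV.parse(infile.read).each do |line|
  . . .
end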

ruby each_line reads line break too?

I'm trying to read data from a text file and join it with a post string. When there's only one line in the file, it works fine, but with two lines my request fails. Is each_line reading the line break too? How can I correct it?
File.open('sfzh.txt','r'){|f|
  f.each_line{|row|
    send(row)
  }
}
I worked around this issue with split and an extra delimiter, but it just looks ugly.
Yes, each_line includes line breaks. But you can strip them easily using chomp:
File.foreach('test1.rb') do |line|
  send line.chomp
end
Another way is to map strip onto each line as it is returned. Note that strip also removes leading and trailing spaces and tabs, not just the line break. To read a file line by line, strip the whitespace, and do something with each line, you can do the following:
File.readlines("path to file").map(&:strip).each do |line|
  # do something with line
end
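The difference between the two approaches can be seen in a short sketch. The method name chomped_lines is hypothetical; it returns the lines with only the line terminator removed.

```ruby
# Hypothetical helper: returns the file's lines without their line breaks,
# leaving any other leading or trailing whitespace untouched.
def chomped_lines(path)
  File.foreach(path).map(&:chomp)
end

sample = "  padded value \n"
p sample.chomp  # only the trailing line break is removed
p sample.strip  # surrounding spaces are removed as well
```

Prefer chomp when leading or trailing spaces are part of the data, and strip when they are noise.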
