Ruby CSV::Row remove new line - ruby

I am opening a CSV file and then converting it to JSON. This is all working fine except the JSON data has \n characters in the string. These are not part of the last element as far as I can tell from printing it and trying to chomp it. When I print the row it does have \n
require 'csv'
require 'json'
def csv_to_json (tmpfile)
JSON_ARRAY = Array.new
CSV.foreach(tmpfile) do |row|
print row[row.length - 1]
if row[row.length - 1].chomp! == nil
print row
end
JSON_ARRAY.push(row)
end
return JSON_ARRAY.to_json
end
The JSON then looks like this when it is returned
["field11,field12\n",
"field21,field22\n"]
How can I remove these new line characters?
EDIT:
These are CSV::Row objects and do not support string operations like chomp or strip
tmpfile is in the format
field11,field12
field21,field22

Set the row_sep to nil.
JSON_ARRAY.push( row.to_s( row_sep: nil ) )
or
JSON_ARRAY.push( row.to_csv( row_sep: nil ) )
As a comment pointed out, CSV::Row#to_s is an alias for CSV::Row#to_csv, which automatically appends a row separator to each row. To get around this, set row_sep to nil so that no \n is added at the end of each row.
Hope that helps.
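A minimal sketch of the behavior described above, assuming a hand-built CSV::Row (the helper name row_to_line is made up for illustration). Instead of the row_sep option it uses String#chomp on the result of to_s, which may be the simpler choice if you are unsure how your csv version treats a nil separator:

```ruby
require 'csv'

# Hypothetical helper: serialize a CSV::Row to a single line of text
# without the trailing row separator that to_s (alias of to_csv) appends.
def row_to_line(row)
  # chomp returns a new string and never returns nil, unlike chomp!
  row.to_s.chomp
end
```

Calling row_to_line(CSV::Row.new(%w[col1 col2], %w[field11 field12])) should give the string without the trailing newline; the header names col1 and col2 are placeholders.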

The simplest way:
File.read(tmpfile).split("\n")
By the way, if you want to remove the newline from a string, you could use the String#strip method.
CSV.foreach(tmpfile) do |row|
# here row should be an array.
p row
end

CSV.foreach(tmpfile) do |row|
print row[row.length - 1]
if row[row.length - 1].chomp! == nil
print row
end
row.map{|cell| cell.strip!}
JSON_ARRAY.push(row)
end
The row doesn't support stripping, but the cells do.
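As a rough sketch of stripping at the cell level, assuming the rows are plain arrays (which is what CSV.parse and CSV.foreach yield when no headers option is given). The method name cells_to_json is hypothetical:

```ruby
require 'csv'
require 'json'

# Hypothetical helper: parse CSV text, strip whitespace from every cell,
# and return the rows as a JSON array of arrays.
def cells_to_json(csv_text)
  rows = CSV.parse(csv_text).map do |row|
    # Empty unquoted fields are parsed as nil, so use safe navigation.
    row.map { |cell| cell&.strip }
  end
  rows.to_json
end
```

Using the non-destructive strip inside map avoids relying on the return value of strip!, which is nil when nothing was removed.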

I was able to get it to work using a map! after the fact:
json_array.map! { |row| row.to_s.chomp }
You could also call to_s.chomp inside the loop. That wasn't an option for me because I needed the original row objects for some calculations before returning the JSON.

Related

Ruby : How to remove duplicate lines from a document text?

I want to remove duplicate lines from a text for example :
1.aabba
2.abaab
3.aabba
4.aabba
After running :
1.aabba
2.abaab
Tried so far :
lines = File.readlines("input.txt")
lines = File.read('/path/to/file')
lines.split("\n").uniq.join("\n")
Let's construct a file.
fname = 't'
IO.write fname, <<~END
dog
cat
dog
pig
cat
END
#=> 20
See IO::write. First let's suppose you simply want to read the unique lines into an array.
If, as here, the file is not excessively large, you can write:
arr = IO.readlines(fname, chomp: true).uniq
#=> ["dog", "cat", "pig"]
See IO::readlines. chomp: true removes the newline character at the end of each line.
If you wish to then write that array to another file:
fname_out = 'tt'
IO.write(fname_out, arr.join("\n") << "\n")
#=> 12
or
File.open(fname_out, 'w') do |f|
arr.each { |line| f.puts line }
end
If you wish to overwrite fname, write to a new file, delete the existing file and then rename the new file fname.
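One way the overwrite step could look, as a sketch that assumes the file fits in memory. The method name dedupe_file_in_place and the ".tmp" suffix are made up for this example:

```ruby
# Rewrites the file at `path` with duplicate lines removed, keeping the
# first occurrence of each line. Assumes the file fits in memory.
def dedupe_file_in_place(path)
  tmp_path = "#{path}.tmp" # hypothetical temporary name next to the original
  unique_lines = File.readlines(path, chomp: true).uniq
  File.open(tmp_path, 'w') do |f|
    unique_lines.each { |line| f.puts(line) }
  end
  # File.rename replaces an existing destination on POSIX systems, so the
  # separate delete step is folded into the rename here.
  File.rename(tmp_path, path)
end
```

Writing to a second file first means the original stays intact if something fails before the rename.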
If the file is so large it cannot be held in memory and there are many duplicate lines, you might be able to do the following.
require 'set'
st = IO.foreach(fname, chomp: true).with_object(Set.new) do |line, st|
st.add(line)
end
#=> #<Set: {"dog", "cat", "pig"}>
See IO::foreach.
If you wish to simply write the contents of this set to file, you can execute:
File.open(fname_out, 'w') do |f|
st.each { |s| f.puts(s) }
end
If instead you need to convert the set to an array:
st.to_a
#=> ["dog", "cat", "pig"]
This assumes you have enough memory to hold both st and st.to_a. If not, you could write:
st.size.times.with_object([]) do |_,a|
s = st.first
a << s
st.delete(s)
end
#=> ["dog", "cat", "pig"]
If you don't have enough memory to even hold st you will need to read your file (line-by-line) into a database and then use database operations.
If you wish to write the file with the duplicates skipped, and the file is very large, you may do the following, albeit with the infinitesimal risk of including one or more duplicates (see the comments).
require 'set'
line_map = IO.foreach(fname, chomp: true).with_object({}) do |line,h|
hsh = line.hash
h[hsh] = $. unless h.key?(hsh)
end
#=> {3393575068349183629=>1, -4358860729541388342=>2,
# -176447925574512206=>4}
$. is the number (base 1) of the line just read. See String#hash. Since the number of distinct values returned by this method is finite and the number of possible strings is infinite, there is the possibility that two distinct strings could have the same hash value.
Then (assuming line_map is not empty):
lines_to_keep = line_map.values
File.open(fname_out, 'w') do |fout|
IO.foreach(fname, chomp: true) do |line|
if lines_to_keep.first == $.
fout.puts(line)
lines_to_keep.shift
end
end
end
Let's see what we've written:
puts File.read(fname_out)
dog
cat
pig
See File::open.
Incidentally, for IO class methods m (including read, write, readlines and foreach), you may see IO.m... written File.m.... That's permissible because File is a subclass of IO and therefore inherits the latter's methods. That does not apply to my use of File::open, as IO::open is a different method.
Set only stores unique elements, so:
require 'set'
s = Set.new
while line = gets
s << line.strip
end
s.each { |unique_elt| puts unique_elt }
You can run this with any input file using < input.txt on the command-line rather than hardwiring the file name into your program.
Note that Set is based on Hash, and the documentation states "Hashes enumerate their values in the order that the corresponding keys were inserted", so this will preserve the order of entry.
You can continue your idea with uniq.
When given a block, uniq compares the block's results and removes the duplicates.
For example you have input.txt with this content:
1.aabba
2.abaab
3.aabba
4.aabba
puts File.readlines('input.txt', chomp: true).
uniq { |line| line.sub(/\A\d+\./, '') }.
join("\n")
# will print
# 1.aabba
# 2.abaab
Here String#sub removes the list numbers, but you can use other methods, for example line[2..-1].

How could I split a hash in Ruby?

I'm trying to split a hash, but I get nothing.
This is my code:
start="1,4,1,0,1,1,1,30,12,;1,4,1,2,1,1,1,30,29,;1,5,1,2,0,1,1,30,29,;1,4,1,2,0,1,1,30,29,;1,4,1,0,1,1,1,30,29,;"
options = {"start" => "1,4,1,0,1,1,1,30,12,;1,4,1,2,1,1,1,30,29,;1,5,1,2,0,1,1,30,29,;1,4,1,2,0,1,1,30,29,;1,4,1,0,1,1,1,30,29,;"}
File.open("mmmm3", "a" )do |f|
f.puts #{options[start]}.split(";")[1]
end
please help me with this
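The problem is that you have mistakenly commented out your code: everything after # on that line is a comment, so f.puts is called with no arguments. Remove #{ and }. Also note that the hash key is the string "start"; the variable start holds the data itself, so options[start] returns nil. What you want to print is
options["start"].split(";")[1]
which will contain the second group (since [0] would return the first).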
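A small sketch of the lookup and split, assuming the hash is keyed by the string "start" as in the question. The method name second_group is made up for illustration:

```ruby
# Hypothetical helper: return the second ";"-separated group stored
# under the "start" key of the options hash.
def second_group(options)
  # fetch raises KeyError if "start" is missing, instead of returning nil
  groups = options.fetch("start").split(";")
  groups[1]
end
```

Inside the File.open block this would be written as f.puts second_group(options).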

How can I further process the line of data that causes the Ruby FasterCSV library to throw a MalformedCSVError?

The incoming data file(s) contain malformed CSV data such as non-escaped quotes, as well as (valid) CSV data such as fields containing new lines. If a CSV format error is detected I would like to use an alternative routine on that data.
With the following sample code (abbreviated for simplicity)
FasterCSV.open( file ){|csv|
row = true
while row
begin
row = csv.shift
break unless row
# Do things with the good rows here...
rescue FasterCSV::MalformedCSVError => e
# Do things with the bad rows here...
next
end
end
}
The MalformedCSVError is caused in the csv.shift method. How can I access the data that caused the error from the rescue clause?
require 'csv' #CSV in ruby 1.9.2 is identical to FasterCSV
# File.open('test.txt','r').each do |line|
DATA.each do |line|
begin
CSV.parse(line) do |row|
p row #handle row
end
rescue CSV::MalformedCSVError => er
puts er.message
puts "This one: #{line}"
# and continue
end
end
# Output:
# Unclosed quoted field on line 1.
# This one: 1,"aaa
# Illegal quoting on line 1.
# This one: aaa",valid
# Unclosed quoted field on line 1.
# This one: 2,"bbb
# ["bbb", "invalid"]
# ["3", "ccc", "valid"]
__END__
1,"aaa
aaa",valid
2,"bbb
bbb,invalid
3,ccc,valid
Just feed the file line by line to FasterCSV and rescue the error.
This is going to be really difficult. Some things that make FasterCSV, well, faster, make this particularly hard. Here's my best suggestion: FasterCSV can wrap an IO object. What you could do, then, is to make your own subclass of File (itself a subclass of IO) that "holds onto" the result of the last gets. Then when FasterCSV raises an exception you can ask your special File object for the last line. Something like this:
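class MyFile < File
  attr_accessor :last_gets

  def gets(*args)
    line = super
    # Remember every raw line read since the buffer was last reset.
    # gets returns nil at end of file, so skip that case.
    (@last_gets ||= +'') << line if line
    line
  end
end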
# then...
file = MyFile.open(filename, 'r')
csv = FasterCSV.new file
row = true
while row
begin
break unless row = csv.shift
# do things with the good row here...
rescue FasterCSV::MalformedCSVError => e
bad_row = file.last_gets
# do something with bad_row here...
next
ensure
file.last_gets = '' # nuke the @last_gets "buffer"
end
end
Kinda neat, right? BUT! there are caveats, of course:
I'm not sure how much of a performance hit you take when you add an extra step to every gets call. It might be an issue if you need to parse multi-million-line files in a timely fashion.
This might or might not fail if your CSV file contains newline characters inside quoted fields. The reason for this is described in the source: basically, if a quoted value contains a newline then shift has to do additional gets calls to get the entire line. There could be a clever way around this limitation but it's not coming to me right now. If you're sure your file doesn't have any newline characters within quoted fields then this shouldn't be a worry for you, though.
Your other option would be to read the file using File.gets and pass each line in turn to FasterCSV#parse_line but I'm pretty sure in so doing you'd squander any performance advantage gained from using FasterCSV.
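A sketch of that line-by-line option, written against the modern CSV class rather than FasterCSV and assuming no field spans more than one physical line. The method name partition_csv_lines is hypothetical:

```ruby
require 'csv'

# Splits raw lines into parsed rows and lines that CSV rejected.
# Assumes no field spans more than one physical line.
def partition_csv_lines(lines)
  good_rows = []
  bad_lines = []
  lines.each do |line|
    begin
      row = CSV.parse_line(line)
      good_rows << row if row # parse_line can return nil (e.g. for an empty string)
    rescue CSV::MalformedCSVError
      bad_lines << line # keep the raw text for the alternative routine
    end
  end
  [good_rows, bad_lines]
end
```

It could be fed with File.foreach(filename), which yields one line at a time without loading the whole file.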
I used Jordan's file subclassing approach to fix the problem with my input data before CSV ever tries to parse it. In my case, I had a file that used \" to escape quotes, instead of the "" that CSV expects. Hence,
class MyFile < File
def gets(*args)
line = super
if line != nil
line.gsub!('\\"','""') # fix the \" that would otherwise cause a parse error
end
line
end
end
infile = MyFile.open(filename)
incsv = CSV.new(infile)
while row = incsv.shift
# process each row here
end
This allowed me to parse the non-standard CSV file. Ruby's CSV implementation is very strict and often has trouble with the many variants of the CSV format.

Swap Words in File with hash

I have a text file and I am trying to replace certain lines with the values in a hash. I am trying to make it loop through the file, and swap out anything that matches the hash. For some reason this isn't working, it only duplicates the file, doesn't swap anything out. Any Ideas?
HASHBROWNS = {
'mustard' => 'dijon',
'ketchup' => 'catsup',
}
File.open('new_hashed_file.txt', 'w') do |file|
File.open('oldfile.txt', 'r').readlines.each do |swaparoo|
if HASHBROWNS.has_key?(swaparoo.downcase)
file.puts HASHBROWNS[swaparoo.downcase]
else
file.puts swaparoo
end
end
end
Thanks
Ryn
Change this line:
File.open('oldfile.txt', 'r').readlines.each do |swaparoo|
to this:
File.open('oldfile.txt', 'r').readlines.map(&:chomp).each do |swaparoo|
The problem is your array of lines contains newlines.
When you read data with readlines there will be a newline present in each string. This is what's making your match miss. The easy way is to just trim it off with chomp. You may want to modify your test slightly:
File.open('new_hashed_file.txt', 'w') do |file|
File.open('oldfile.txt', 'r').readlines.each do |line|
line = line.chomp.downcase
file.puts HASHBROWNS[line] || line
end
end
One thing to pay attention to is not repeatedly calling methods like downcase if you can simply save the result to a temporary variable and recycle it.
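Note that downcasing the line before writing it also lowercases lines that have no match. A sketch of a variant that looks up the lowercase form but writes unmatched lines unchanged; the method name swap_lines is made up, and it takes the lines and the hash as arguments so it can be tried without files:

```ruby
# Hypothetical helper: replace each line that matches a key in `mapping`
# (case-insensitively) and leave every other line exactly as it was,
# minus its trailing newline.
def swap_lines(lines, mapping)
  lines.map do |raw|
    line = raw.chomp
    # Look up the lowercase form, but fall back to the original text.
    mapping.fetch(line.downcase, line)
  end
end
```

With files, this could be used as file.puts swap_lines(File.readlines('oldfile.txt'), HASHBROWNS).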

Ruby: Why is there a `nil` at the end of all the strings in my array?

So I've written some code in Ruby to split a text file up into individual lines, then to group those lines based on a delimiter character. This output is then written to an array, which is passed to a method, which spits out HTML into a text file. I started running into problems when I tried to use gsub in different methods to replace placeholders in a HTML text file with values from the record array - Ruby kept telling me that I was passing in nil values. After trying to debug that part of the program for several hours, I decided to look elsewhere, and I think I'm on to something. A modified version of the program is posted below.
Here is a sample of the input text file:
26188
WHL
1
Delco
B-7101
A-63
208-220/440
3
285 w/o pallet
1495.00
C:/img_converted/26188B.jpg
EDM Machine Part 2 of 3
AC Motor, 3/4 Hp, Frame 182, 1160 RPM
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
Here is a snippet of the code that I've been testing with:
# function to import file as a string
def file_as_string(filename)
data = ''
f = File.open(filename, "r")
f.each_line do |line|
data += line
end
return data
end
Dir.glob("single_listing.jma") do |filename|
content = file_as_string(filename)
content = content.gsub(/\t/, "\n")
database_array = Array.new
database_array = content.split("|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|")
for i in database_array do
record = Array.new
record = i.split("\n")
puts record[0]
puts record[0].class
end
end
When that code is run, I get this output:
john@starfire:~/code/ruby/idealm_db_parser$ ruby putsarray.rb
26188
String
nil
NilClass
... which means that each array position in record apparently has data of type String and of type nil. Why is this?
Your database_array has more dimensions than you think.
Your end-of-stanza marker, |--|--|...|--| has a newline after it. So, file_as_string returns something like this:
"26188\nWHL...|--|--|\n"
and is then split() on end-of-stanza into something like this:
["26188\nWHL...1160 RPM\n", "\n"] # <---- Note the last element here!
You then split each again, but "\n".split("\n") gives an empty array, the first element of which comes back as nil.
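A minimal sketch of one way to avoid the empty trailing record, assuming records are separated by a marker line as in the question. The method name records_from is hypothetical:

```ruby
# Hypothetical helper: split file content into records on `marker`, split
# each record into lines, and drop blank lines and empty records.
def records_from(content, marker)
  content.split(marker)
         .map { |chunk| chunk.split("\n").reject(&:empty?) }
         .reject(&:empty?)
end
```

The final reject(&:empty?) discards the [] produced by the lone "\n" after the last marker, so record[0] is never nil.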
