Read compressed csv file on-the-fly - ruby

I wrote a CSV file and compressed it using this code:
require 'csv'
require 'bzip2'  # bzip2-ruby gem

arr = (0...2**16).to_a
File.open('file.bz2', 'wb') do |f|
  writer = Bzip2::Writer.new f
  CSV(writer) do |csv|
    (2**16).times { csv << arr }
  end
  writer.close
end
I want to read this bzip2ed CSV file back (a CSV file compressed with bzip2). Uncompressed, these files look like:
1,2
4,12
5,2
8,7
1,3
...
So I tried this code:
Bzip2::Reader.open(filename) do |bzip2|
  CSV.foreach(bzip2) do |row|
    puts row.inspect
  end
end
but when executed, it throws:
/Users/foo/.rvm/rubies/ruby-2.1.0/lib/ruby/2.1.0/csv.rb:1256:in `initialize': no implicit conversion of Bzip2::Reader into String (TypeError)
from /Users/foo/.rvm/rubies/ruby-2.1.0/lib/ruby/2.1.0/csv.rb:1256:in `open'
from /Users/foo/.rvm/rubies/ruby-2.1.0/lib/ruby/2.1.0/csv.rb:1256:in `open'
from /Users/foo/.rvm/rubies/ruby-2.1.0/lib/ruby/2.1.0/csv.rb:1121:in `foreach'
from worm_pathfinder_solver.rb:79:in `block in <main>'
from worm_pathfinder_solver.rb:77:in `open'
from worm_pathfinder_solver.rb:77:in `<main>'
Question:
What is wrong?
How should I fix it?

CSV.foreach assumes you're passing a file path for it to open. If you want to pass a stream to CSV, you need to be more explicit and use CSV.new. This code will process a gzipped file:
Zlib::GzipReader.open(filename) do |gzip|
  csv = CSV.new(gzip)
  csv.each do |row|
    puts row.inspect
  end
end

Based on the brief docs, you'll probably need to call read on the bzip2 object and parse the resulting string (not tested):
Bzip2::Reader.open(filename) do |bzip2|
  CSV.parse(bzip2.read) do |row|
    #         ^^^^ parse the decompressed string rather than treating it as a file path
    puts row.inspect
  end
end

My guess would be that CSV tries to convert the Bzip2::Reader to a string, doesn't know how, and simply throws the exception. You can manually read the data into a string and then pass that to CSV.
It is strange, though, since CSV(writer) handled the Bzip2::Writer just fine on the write side.
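If Bzip2::Reader quacks like an IO (CSV.new only needs the stream to respond to gets for reading), the same stream approach used for gzip above should carry over. A minimal, untested sketch, assuming the bzip2-ruby Reader behaves that way:
require 'csv'
require 'bzip2'  # bzip2-ruby gem (assumed)

Bzip2::Reader.open('file.bz2') do |bzip2|
  csv = CSV.new(bzip2)  # wrap the decompressing stream directly
  csv.each do |row|
    puts row.inspect
  end
end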

Related

Ruby/Rake: Why isn't the CSV file open for reading?

I want to drop the top two rows from a CSV file and add my own header. I have wrapped this in a rake task.
task :fix_csv do
  # copy to temp file
  cp ENV['source'], TMP_FILE
  # drop header rows
  table = CSV.table(TMP_FILE)
  File.open(TMP_FILE, 'w') do |f|
    f.write(table.drop(2).to_csv)
  end
  # add new header
  CSV.open(TMP_FILE, 'w', force_quotes: true) do |csv|
    csv << HEADERS if csv.count.eql? 0
  end
  puts 'Done!'
end
However, this fails with an error:
rake aborted!
IOError: not opened for reading
../rakefile.rb:54:in `count'
Line 54 is:
csv << HEADERS if csv.count.eql? 0
Why can't it read the file? Do I need to explicitly close the file after I've removed the first two rows?
The second time you open the file for writing only, but then you try to read from it (by querying the row count):
#                  ⇓⇓⇓
CSV.open(TMP_FILE, 'w', force_quotes: true) do |csv|
  #                     ⇓⇓⇓⇓⇓
  csv << HEADERS if csv.count.eql? 0
end
While it's easy to fix, may I ask what would be wrong with skipping CSV entirely, in favor of something like:
old = File.readlines(FILE_NAME).drop(2)
old.unshift(HEADERS.join(',') + "\n")
File.write(FILE_NAME, old.join)
?
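If you'd rather stay with CSV, a sketch of the rake task that avoids the reopen-and-count problem entirely (assuming TMP_FILE and HEADERS are defined as in the question) is to read the rows once, then write the header and the surviving rows in a single pass:
task :fix_csv do
  # copy to temp file
  cp ENV['source'], TMP_FILE

  # read everything once, drop the two unwanted header rows
  rows = CSV.read(TMP_FILE).drop(2)

  # rewrite the file: new header first, then the remaining rows
  CSV.open(TMP_FILE, 'w', force_quotes: true) do |csv|
    csv << HEADERS
    rows.each { |row| csv << row }
  end

  puts 'Done!'
end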

How to extract a .tar file

I am trying to extract a .tar file using the code below, but I am getting an invalid file format error. What is the way to extract a .tar file in Ruby? FYI, I am using Windows. The file properties show it as a Tar archive (application/x-tar).
file_path = "/home/logan/skype/bill_2015-12-14.txt.tar"
extract = Gem::Package::TarReader.new(Zlib::GzipReader.open(file_path))
ERROR:
Zlib::GzipFile::Error: not in gzip format
from (irb):68:in `initialize'
from (irb):68:in `open'
A solution is given on The Buckblog and looks like this:
require 'rubygems/package'
require 'fileutils'

File.open("chunky_png-1.3.4.gem", "rb") do |file|
  Gem::Package::TarReader.new(file) do |tar|
    tar.each do |entry|
      if entry.file?
        FileUtils.mkdir_p(File.dirname(entry.full_name))
        File.open(entry.full_name, "wb") do |f|
          f.write(entry.read)
        end
        File.chmod(entry.header.mode, entry.full_name)
      end
    end
  end
end
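Since the error in the question ("not in gzip format") suggests the archive is a plain tar rather than a .tar.gz, the same pattern should work on the asker's file without the Zlib::GzipReader wrapper. A minimal, untested sketch:
require 'rubygems/package'

# Plain tar, so open the file directly instead of wrapping it in Zlib::GzipReader.
File.open("/home/logan/skype/bill_2015-12-14.txt.tar", "rb") do |file|
  Gem::Package::TarReader.new(file) do |tar|
    tar.each { |entry| puts entry.full_name }  # list entries; extract as shown above
  end
end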

String can't write to CSV file in Ruby

I don't quite understand why this bit of code is incorrect. I understand from the error that the String class doesn't have a map method, but I'm having a hard time wrapping my head around why it comes up here.
The error
`<<': undefined method `map' for #<String:0x000001020b8940> (NoMethodError)
The bit of code:
require 'nokogiri'
require 'open-uri'
require 'csv'

doc = Nokogiri::HTML(open("dent-file.html"))

new_array = doc.search("p").map do |para|
  para.text
end

CSV.open("dent.csv", "w") do |csv|
  new_array.each do |string|
    csv << string
  end
end
I want to write each element of the new_array array to its own line of the CSV file dent.csv.
CSV#<< accepts an Array or a CSV::Row. Wrap the string in an array:
csv << [string]
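Applied to the loop from the question, the fix is just that one-element array per row (a sketch keeping the asker's variable names):
CSV.open("dent.csv", "w") do |csv|
  new_array.each do |string|
    csv << [string]  # one-element row; CSV#<< needs an Array or CSV::Row
  end
end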

Why does exporting data to CSV give only numbers?

I am trying to export a Mongo structure to CSV with the following code:
file = Tempfile.new(['genreport', '.csv'], file_path)
file_name = file.path
CSV.open(file_name, "w") do |csv|
  result_cursor.each do |eachdoc|
    eachdoc.each do |key, value|
      csv << key.to_s
      csv << value.to_s
    end
    csv << "\n"
  end
end
The CSV file is created as expected, but it is full of numbers only. What am I doing wrong?
Here are the types:
result_cursor is a Mongo cursor, eachdoc will be a Hash, and key and value will be Strings.
I'm not sure how your actual code differs from what's posted, but when I try to run the code as is, I get an exception (undefined method `map' for the value of key). However, when I do this, it works fine:
file = Tempfile.new(['genreport', '.csv'], file_path)
file_name = file.path
CSV.open(file_name, "w") do |csv|
  result_cursor.each do |eachdoc|
    eachdoc.each do |key, value|
      csv << [key, value]
    end
  end
end
That still doesn't explain the numbers you're seeing, though. Perhaps something else is overwriting the contents of the temp file.
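If the goal is one row per document rather than one row per key/value pair, a hedged alternative sketch (assuming every document in result_cursor shares the same keys in the same order) would be:
CSV.open(file_name, "w") do |csv|
  header_written = false
  result_cursor.each do |doc|
    csv << doc.keys unless header_written  # header row from the first document's keys
    header_written = true
    csv << doc.values                      # one row per document
  end
end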

Ruby: How to read maybe gzipped data from file or STDIN?

I would like to read data from an input file or STDIN - the input data may be gzipped.
For files this can be done with Zlib::GzipReader like this:
require 'zlib'

ios = File.open(file, mode='r')
begin
  ios = Zlib::GzipReader.new(ios)
rescue
  ios.rewind
end
ios.each_line { |line| puts line }
However, I fail to get the detection of zipped data from STDIN right:
require 'zlib'

if STDIN.tty?
  # do nothing
else
  ios = STDIN
  begin
    ios = Zlib::GzipReader.new(ios)
  rescue
    ios.rewind
  end
end
ios.each_line { |line| puts line }
The above works with gzipped data in STDIN, but regular data results in this:
./test.rb:14:in `rewind': Illegal seek - <STDIN> (Errno::ESPIPE)
from ./test.rb:14:in `rescue in <main>'
from ./test.rb:11:in `<main>'
So, if I cannot rewind STDIN, how do I test if data in STDIN is zipped or not?
Cheers,
Martin
Load the data from STDIN into a temporary file and only then parse it:
require 'tempfile'

tf = Tempfile.new('tmp')
while $stdin.gets do
  tf.puts $_
end
tf.rewind
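A sketch of the complete flow, combining this with the rewind-on-failure detection from the question (the temp file is seekable, so rewinding works where it failed on STDIN):
require 'tempfile'
require 'zlib'

# Spool STDIN to a seekable temp file first.
tf = Tempfile.new('tmp')
IO.copy_stream($stdin, tf)
tf.rewind

# Now the rewind trick from the question works: try gzip, fall back to plain text.
ios = begin
        Zlib::GzipReader.new(tf)
      rescue Zlib::GzipFile::Error
        tf.rewind
        tf
      end

ios.each_line { |line| puts line }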
