Compressing using LZMA on-the-fly to a file? - ruby

This code compresses CSV data on-the-fly to a file using a Bzip2 writer:
require 'bzip2' # bzip2-ruby gem
require 'csv'

arr = (0..10).to_a # sample row data

File.open('file.bz2', 'wb') do |f|
  writer = Bzip2::Writer.new f
  CSV(writer) do |csv|
    (2**16).times { csv << arr }
  end
  writer.close
end
I want to do the same using the LZMA algorithm. The ruby-lzma gem looked useful, but it exposes only one method: compressed = LZMA.compress('data to compress').
Question:
Is there a way to do a similar streaming compression using LZMA?

Use ruby-xz, which has a much better interface to liblzma (via FFI). The library provides an XZ::StreamWriter class; check the ruby-xz docs for details.
However, the CSV constructor does not accept an XZ::StreamWriter, so you need to change the code to use CSV.generate_line. I was able to run the following, which does generate the file on the fly:
require 'xz'
require 'csv'

arr = ['one', 'two', 'three']

File.open('file.xz', 'wb') do |f|
  XZ::StreamWriter.new(f) do |writer|
    (2**16).times { writer << CSV.generate_line(arr) }
    writer.finish
  end
end
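For completeness, ruby-xz also ships an XZ::StreamReader for reading the data back. A minimal sketch, assuming the reader API in your gem version matches the docs:
require 'xz'

File.open('file.xz', 'rb') do |f|
  reader = XZ::StreamReader.new(f)
  puts reader.read(200) # peek at the first few CSV rows
  reader.close
end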

Related

Compressing using Bzip2 on-the-fly to a file?

There is a program that generates huge CSV files. For example:
require 'csv'

arr = (0..10).to_a
CSV.open("foo.csv", "wb") do |csv|
  (2**16).times { csv << arr }
end
It will generate a big file, so I want it to be compressed on-the-fly: instead of outputting an uncompressed CSV file (foo.csv), output a bzip2-compressed CSV file (foo.csv.bz2).
I have an example from the "ruby-bzip2" gem:
writer = Bzip2::Writer.new File.open('file')
writer << 'data1'
writer.close
I am not sure how to compose the Bzip2 writer with the CSV one.
You can construct a CSV object with an IO, or with something sufficiently IO-like such as a Bzip2::Writer. For example:
File.open('file.bz2', 'wb') do |f|
  writer = Bzip2::Writer.new f
  CSV(writer) do |csv|
    (2**16).times { csv << arr }
  end
  writer.close
end
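Reading it back works the same way in reverse; the bzip2-ruby gem also has a Bzip2::Reader that wraps an IO. A sketch, assuming that gem and its IO-like reader interface:
require 'bzip2' # bzip2-ruby gem
require 'csv'

File.open('file.bz2', 'rb') do |f|
  reader = Bzip2::Reader.new f
  p CSV.parse_line(reader.gets) # first row, decompressed on the fly
  reader.close
end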
Maybe it would be more flexible to write the CSV data to stdout:
# csv.rb
require 'csv'

$stdout.sync = true

arr = (0..10).to_a
(2**16).times do
  puts arr.to_csv
end
... and pipe the output to bzip2:
$ ruby csv.rb | bzip2 > foo.csv.bz2
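If you'd rather keep everything inside Ruby, the same pipe can be set up with IO.popen, spawning bzip2 and pointing its stdout at the target file. A sketch using only standard-library calls:
require 'csv'

arr = (0..10).to_a

# Spawn bzip2 and redirect its stdout to the compressed file
IO.popen(['bzip2', '-c'], 'w', out: 'foo.csv.bz2') do |bzip2|
  (2**16).times { bzip2.puts arr.to_csv }
end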

How to properly automate xml to xls

I have been getting a lot of XML files recently that I want to analyse in Excel. Instead of using the built-in XML conversion in (newer versions of) Excel, I want to use Ruby code that converts a number of files automatically.
I am not very familiar, however, with REXML. After half a day's work I got the code to convert just one(!) XML node. This is how it looks:
require 'rexml/document'

Dir.glob("FILES/archive/*.xml") do |eksemel|
  puts "converting #{eksemel}"
  filename = (/\d+/.match(eksemel)).to_s
  xml_file = File.open("#{eksemel}", "r")
  csv_file = File.new("#{filename}.csv", "w")
  xml = REXML::Document.new( xml_file )
  counter = 0
  xml.elements.each("RESULTS") do |e|
    e.elements.each("component") do |f|
      f.elements.each("paragraph") do |g|
        counter = counter + 1
        csv_file.puts g.text
      end
    end
  end
end
Is there a way to a) let Ruby determine the element names and counts automatically instead of hard-coding them, and b) save all of these as separate columns in a CSV file?
It isn't clear what you are using counter for. It would also help if you clarified what kind of structure the XML file has (for instance, are there many <paragraph> elements within each <component> element?). But here is a cleaner way to write what I think you're shooting for:
require 'rexml/document'
require 'csv'

Dir.glob('FILES/archive/*.xml') do |eksemel|
  puts "converting #{eksemel}"
  # I assume you are creating a .csv file with the same name as your .xml file
  xml_file = File.new(eksemel)
  csv_file = CSV.open(eksemel.sub(/\.xml$/, '.csv'), 'w')
  xml = REXML::Document.new(xml_file)
  counter = xml.elements.to_a('RESULTS//component//paragraph').length
  xml.elements.each('RESULTS//component') do |component|
    # one row per component, one column per paragraph's text
    csv_file << component.elements.to_a('paragraph').map(&:text)
  end
  [xml_file, csv_file].each {|f| f.close}
end
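An equivalent formulation with explicit XPath calls, in case you prefer that style; a sketch assuming the same RESULTS/component/paragraph layout (the file names are hypothetical):
require 'rexml/document'
require 'rexml/xpath'
require 'csv'

xml = REXML::Document.new(File.read('FILES/archive/1.xml')) # hypothetical input file
CSV.open('1.csv', 'w') do |csv|
  REXML::XPath.each(xml, '//component') do |component|
    csv << REXML::XPath.match(component, 'paragraph').map(&:text)
  end
end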

Split output data using CSV in Ruby 1.9

I have a csv file that has 7000+ records that I process/manipulate and export to a new csv file. I have no issues doing that and everything works as expected.
I would like to change the process to where it breaks the output into multiple files. So instead of writing all 7000+ rows to the new csv file it would write the first 1000 rows to newexport1.csv and the next 1000 rows to newexport2.csv until it reaches the end of the data.
Is there an easy way to do this with CSV in Ruby 1.9?
My current write method:
CSV.open("#{PATH_TO_EXPORT_FILE}/newexport.csv", "w+", :col_sep => '|', :headers => true) do |f|
export_rows.each do |row|
f << row
The short answer is "no". You'll want to adjust your current code to split up the set and then dump each subset to a different file. This ought to be pretty close:
export_rows.each_slice(1000).with_index do |rows, idx|
  CSV.open("#{PATH_TO_EXPORT_FILE}/newexport-#{idx}.csv", "w+", :col_sep => '|', :headers => true) do |f|
    rows.each { |row| f << row }
  end
end
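Note that each_slice(1000) yields arrays of at most 1,000 rows and with_index counts from zero, so the first file is newexport-0.csv; pass with_index(1) if you want numbering to start at 1.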
Yes, there is.
It's built into Ruby 1.9; check the CSV documentation in the standard library.
To read:
CSV.foreach("path/to/file.csv") do |row|
  # manipulate the content
end
To write:
CSV.open("path/to/file.csv", "wb") do |csv|
csv << ["row", "of", "CSV", "data"]
csv << ["another", "row"]
# something else
end
I think you'll need to combine the one inside the other, as sketched below.
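Something like this, assuming a source file named export.csv and the same 1,000-row chunks (the file names here are hypothetical):
require 'csv'

# Stream the source file and re-emit it in 1000-row chunks
CSV.foreach('export.csv').each_slice(1000).with_index(1) do |rows, idx|
  CSV.open("newexport#{idx}.csv", 'w') do |csv|
    rows.each { |row| csv << row }
  end
end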
FasterCSV has been the standard CSV library since Ruby 1.9; you can find a lot of example code in its examples folder:
https://github.com/JEG2/faster_csv/tree/master/examples
For the example code to work, you should change:
require "faster_csv"
to
require "csv"

Ruby streaming tar/gz

Basically I want to stream data from memory into tar/gz format (possibly multiple files in the tar, but it should NEVER TOUCH THE HARD DRIVE, only streaming!), then stream it somewhere else (an HTTP request body in my case).
Anyone know of an existing library that can do this? Is there something in Rails?
libarchive-ruby is only a C wrapper and seems like it would be very platform-dependent (the docs want you to compile as an installation step?!).
SOLUTION:
require 'zlib'
require 'stringio'
require 'rubygems/package'

tar = StringIO.new
Gem::Package::TarWriter.new(tar) { |writer|
  writer.add_file("a_file.txt", 0644) { |f|
    (1..1000).each { |i|
      f.write("some text\n")
    }
  }
  writer.add_file("another_file.txt", 0644) { |f|
    f.write("some more text\n")
  }
}
tar.seek(0)

gz = Zlib::GzipWriter.new(File.new('this_is_a_tar_gz.tar.gz', 'wb')) # Make sure you use 'wb' for binary write!
gz.write(tar.read)
tar.close
gz.close
That's it! You can swap out the File in the GzipWriter with any IO to keep it streaming. Cookies for dw11wtq!
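To sanity-check the archive, you can read it back with Zlib::GzipReader and the matching Gem::Package::TarReader; a short sketch:
require 'zlib'
require 'rubygems/package'

Zlib::GzipReader.open('this_is_a_tar_gz.tar.gz') do |gz|
  tar = Gem::Package::TarReader.new(gz)
  tar.each { |entry| puts "#{entry.full_name}: #{entry.read.bytesize} bytes" }
end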
Take a look at the TarWriter class in RubyGems: http://rubygems.rubyforge.org/rubygems-update/Gem/Package/TarWriter.html. It just operates on an IO stream, which may be a StringIO.
tar = StringIO.new
Gem::Package::TarWriter.new(tar) do |writer|
  writer.add_file("hello_world.txt", 0644) { |f| f.write("Hello world!\n") }
end
tar.seek(0)
p tar.read #=> mostly padding, but a tar nonetheless
It also provides methods to add directories if you need a directory layout in the tarball.
For reference, you could achieve the gzipping with IO.popen, just piping the data in/out of the system process:
http://www.ruby-doc.org/core-1.9.2/IO.html#method-c-popen
The gzipping itself would look something like this:
gzipped_data = IO.popen("gzip", "w+") do |gzip|
  gzip.puts "Hello world!"
  gzip.close_write
  gzip.read
end
# => "\u001F\x8B\b\u0000\xFD\u001D\xA2N\u0000\u0003\xF3H\xCD\xC9\xC9W(\xCF/\xCAIQ\xE4\u0002\u0000A䩲\r\u0000\u0000\u0000"
Based on the solution the OP wrote, I wrote a fully in-memory tgz archive function that I want to use to POST to a web server.
# Create a tar.gz archive from files, in memory.
# Parameters:
#   files: Array of hashes with keys "filename" and "body"
#     Ex: [{"filename" => "foo.txt", "body" => "This is foo.txt"}, ...]
#
# Return:: tar_gz archived image as string
def create_tgz_archive_from_files(files)
  tar = StringIO.new
  Gem::Package::TarWriter.new(tar){ |tar_writer|
    files.each{ |file|
      tar_writer.add_file(file['filename'], 0644){ |f|
        f.write(file['body'])
      }
    }
  }
  tar.rewind

  gz = StringIO.new('', 'r+b')
  gz.set_encoding("BINARY")
  gz_writer = Zlib::GzipWriter.new(gz)
  gz_writer.write(tar.read)
  tar.close
  gz_writer.finish

  gz.rewind
  tar_gz_buf = gz.read
  return tar_gz_buf
end
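A quick usage sketch; the endpoint URL is hypothetical, and the hash keys match the string keys the function reads:
require 'net/http'
require 'uri'

files = [
  { 'filename' => 'foo.txt', 'body' => "This is foo.txt\n" },
  { 'filename' => 'bar.txt', 'body' => "This is bar.txt\n" }
]
tgz = create_tgz_archive_from_files(files)

uri = URI('http://example.com/upload') # hypothetical endpoint
Net::HTTP.start(uri.host, uri.port) do |http|
  post = Net::HTTP::Post.new(uri.path)
  post['Content-Type'] = 'application/gzip'
  post.body = tgz
  http.request(post)
end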

Saving output of a query onto a text file in Ruby

I'm trying to query a table, fetch all records, and save the result as a CSV file.
This is what I've done so far:
require 'oci8'

conn = OCI8.new('scott', 'tiger', '020')

File.open('output.csv', 'w') do |f|
  conn.exec('select * from emp') do |e|
    f.write e.join(',')
  end
end
.. And while it does generate a CSV file, the problem is that all records get saved onto a single line. How can I write the data so that each record goes onto a new line?
Well, you could use f.puts instead of f.write there, but I'd recommend you take a look at the CSV module:
http://ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
outfile = File.open('csvout', 'wb')
CSV::Writer.generate(outfile) do |csv|
  csv << ['c1', nil, '', '"', "\r\n", 'c2']
  # ...
end
outfile.close
PS: Actually, there is another CSV library called FasterCSV, which became the standard-library CSV in Ruby 1.9. But in general, any of these should be better than writing it yourself.
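For reference, a sketch of the same loop using the Ruby 1.9 CSV API, which handles quoting and line endings for you (connection details copied from the question):
require 'oci8'
require 'csv'

conn = OCI8.new('scott', 'tiger', '020')
CSV.open('output.csv', 'w') do |csv|
  conn.exec('select * from emp') do |row|
    csv << row # one properly quoted line per record
  end
end
conn.logoff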
