Create a tar.gz with contents of a specific path (without chdir) with Ruby - ruby

I'm working on a method in Ruby that creates a tar.gz file archiving the directories and files under a certain path (cdpath). It should behave like tar -C cdpath -zcf targzfile srcs, but without changing the CWD (to keep it thread safe). I'm using Gem::Package::TarWriter to create the tar stream and wrapping it with Zlib::GzipWriter to compress it.
Here's what I came up with (this is just a simple standalone test):
require 'rubygems/package'
require 'zlib'
require 'pathname'
require 'find'
cdpath="/absolute/path/to/some/place"
targzfile="test.tar.gz"
src=["some-dir-name-at-cdpath"]
BLOCKSIZE_TO_READ = 1024 * 1000
path = Pathname.new(cdpath)
raise "path #{cdpath} should be an absolute path" unless path.absolute?
raise "path #{cdpath} should be a directory" unless File.directory? cdpath
raise "Destination tar.gz file #{targzfile} already exists" if File.exist? targzfile
raise "no file or directory to tar" if !src || src.length == 0
src.each { |p| p.sub! /^/, "#{cdpath}/" }
File.open targzfile, 'wb' do |otargzfile|
Zlib::GzipWriter.wrap otargzfile do |gz|
Gem::Package::TarWriter.new gz do |tar|
Find.find *src do |f|
relative_path = f.sub "#{cdpath}/", ""
mode = File.stat(f).mode
if File.directory? f
tar.mkdir relative_path, mode
else
File.open f, 'rb' do |rio|
tar.add_file relative_path, mode do |tio|
tio.write rio.read
end
end
end
end
end
end
end
However, I'm hitting the following exception and I can't seem to figure out what I'm doing wrong.
/usr/lib/ruby/2.1.0/rubygems/package/tar_writer.rb:108:in `add_file': Gem::Package::NonSeekableIO (Gem::Package::NonSeekableIO)
from tartest2.rb:29:in `block (5 levels) in <main>'
from tartest2.rb:28:in `open'
from tartest2.rb:28:in `block (4 levels) in <main>'
from /usr/lib/ruby/2.1.0/find.rb:48:in `block (2 levels) in find'
from /usr/lib/ruby/2.1.0/find.rb:47:in `catch'
from /usr/lib/ruby/2.1.0/find.rb:47:in `block in find'
from /usr/lib/ruby/2.1.0/find.rb:42:in `each'
from /usr/lib/ruby/2.1.0/find.rb:42:in `find'
from tartest2.rb:22:in `block (3 levels) in <main>'
from /usr/lib/ruby/2.1.0/rubygems/package/tar_writer.rb:85:in `new'
from tartest2.rb:21:in `block (2 levels) in <main>'
from tartest2.rb:20:in `wrap'
from tartest2.rb:20:in `block in <main>'
from tartest2.rb:19:in `open'
from tartest2.rb:19:in `<main>'
EDIT: I was able to resolve this by using TarWriter's add_file_simple instead of add_file. The file size has to be supplied up front, which File.stat provides. Details are in the answer below.

As described in the question, the solution is to use the add_file_simple method instead of add_file. This also requires obtaining the file size in advance using File.stat.
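A likely explanation for the exception, judging from the backtrace: add_file writes a placeholder header, lets the block write the data, and then moves back to fill in the real size, so it needs an IO that supports pos=. Zlib::GzipWriter is a forward-only stream, whereas add_file_simple takes the size up front and never has to go back. The sketch below shows the same idea on an in-memory buffer. The helper name targz_from_hash and its hash argument are made up for illustration and are not part of the original post.

```ruby
require 'rubygems/package'
require 'zlib'
require 'stringio'

# Hypothetical helper: build a gzip-compressed tar archive in memory from a
# { "name" => "content" } hash and return it as a binary string.
def targz_from_hash(entries)
  out = StringIO.new("".b)          # binary buffer that receives the gzip data
  gz = Zlib::GzipWriter.new(out)    # forward-only stream, no pos= support
  Gem::Package::TarWriter.new(gz) do |tar|
    entries.each do |name, content|
      # The size is declared before any data is written, so the header can be
      # emitted immediately and the stream never has to move backwards.
      tar.add_file_simple(name, 0o644, content.bytesize) do |tio|
        tio.write(content)
      end
    end
  end
  gz.finish                         # flush gzip trailer, keep the buffer open
  out.string
end
```

Calling targz_from_hash('hello.txt' => 'hi') should give a string that starts with the gzip magic bytes and can be read back with Zlib::GzipReader and Gem::Package::TarReader.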
Here's a working method:
# Similar to 'tar -C cdpath -zcf targzfile srcs': 'srcs' are resolved relative to
# 'cdpath', but the current working directory is never changed
def self.cdtargz(cdpath, targzfile, *src)
path = Pathname.new(cdpath)
raise "path #{cdpath} should be an absolute path" unless path.absolute?
raise "path #{cdpath} should be a directory" unless File.directory? cdpath
raise "Destination tar.gz file #{targzfile} already exists" if File.exist? targzfile
raise "no file or directory to tar" if !src || src.length == 0
src.each { |p| p.sub! /^/, "#{cdpath}/" }
File.open targzfile, 'wb' do |otargzfile|
Zlib::GzipWriter.wrap otargzfile do |gz|
Gem::Package::TarWriter.new gz do |tar|
Find.find *src do |f|
relative_path = f.sub "#{cdpath}/", ""
mode = File.stat(f).mode
size = File.stat(f).size
if File.directory? f
tar.mkdir relative_path, mode
else
tar.add_file_simple relative_path, mode, size do |tio|
File.open f, 'rb' do |rio| # binary mode, so the bytes read match File.stat(f).size
tio.write rio.read
end
end
end
end
end
end
end
end
EDIT: After reviewing the answer to this question, I revised the above slightly to avoid "slurping" whole files into memory. In my case 95% of the files are quite small, but a few are very big, so this makes a lot of sense. Here's the updated version:
BLOCKSIZE_TO_READ = 1024 * 1000
def self.cdtargz(cdpath, targzfile, *src)
path = Pathname.new(cdpath)
raise "path #{cdpath} should be an absolute path" unless path.absolute?
raise "path #{cdpath} should be a directory" unless File.directory? cdpath
raise "Destination tar.gz file #{targzfile} already exists" if File.exist? targzfile
raise "no file or directory to tar" if !src || src.length == 0
src.each { |p| p.sub! /^/, "#{cdpath}/" }
File.open targzfile, 'wb' do |otargzfile|
Zlib::GzipWriter.wrap otargzfile do |gz|
Gem::Package::TarWriter.new gz do |tar|
Find.find *src do |f|
relative_path = f.sub "#{cdpath}/", ""
mode = File.stat(f).mode
size = File.stat(f).size
if File.directory? f
tar.mkdir relative_path, mode
else
tar.add_file_simple relative_path, mode, size do |tio|
File.open f, 'rb' do |rio|
while buffer = rio.read(BLOCKSIZE_TO_READ)
tio.write buffer
end
end
end
end
end
end
end
end
end
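The read loop above does not depend on tar at all, so it can be pulled into a small helper and tried out on its own. This is only a sketch: copy_in_chunks is a hypothetical name, and src and dst are assumed to be any objects that respond to read(length) and write respectively. IO.copy_stream from the standard library is documented to accept IO-like objects too and may be worth trying as a replacement.

```ruby
# Hypothetical helper: copy everything from src to dst without holding more
# than chunk_size bytes in memory at a time. Returns the number of bytes copied.
def copy_in_chunks(src, dst, chunk_size = 1024 * 1000)
  total = 0
  # read(length) returns nil at end of input, which ends the loop
  while (buffer = src.read(chunk_size))
    dst.write(buffer)
    total += buffer.bytesize
  end
  total
end
```

Inside the add_file_simple block this would be called as copy_in_chunks(rio, tio, BLOCKSIZE_TO_READ).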

Related

Getting "Unknown file type" in ruby

here's my code:
#!/usr/bin/ruby
require 'fileutils'
Dir.chdir "/home/john/Documents"
if (Dir.exist?("Photoshoot") === false) then
Dir.mkdir "Photoshoot"
puts "Directory: 'Photoshoot' created"
end
Dir.chdir "/run/user/1000/gvfs"
camdirs = Dir.glob('*')
numcams = camdirs.length
camnum = 0
campath = []
while camnum < numcams do
campath.push("/run/user/1000/gvfs/#{camdirs[camnum]}/DCIM")
puts campath[camnum]
camnum += 1
end
campath.each do |path|
Dir.chdir (path)
foldnum = 0
foldir = Dir.glob('*')
puts foldir
Dir.entries("#{path}/#{foldir[foldnum]}").each do |filename|
filetype = File.extname(filename)
if filetype == ".JPG"
FileUtils.mv("#{path}/#{foldir[foldnum]}/#{filename}", "/home/john/Documents/Photoshoot")
end
foldnum += 1
end
end
puts "#{numcams} cameras detected"
I'm just trying to go into some cameras I have connected and extract all the images into a file, but it's giving me this error. One of the things that's tripping me up is that the images are stored in sub-folders under DCIM. When I just use .entries, it gives me the folders the images are in as well as the images.
/usr/lib/ruby/2.3.0/fileutils.rb:1387:in `copy': unknown file type: /run/user/1000/gvfs/gphoto2:host=%5Busb%3A002%2C021%5D/DCIM//IMG_0092.JPG (RuntimeError)
from /usr/lib/ruby/2.3.0/fileutils.rb:472:in `block in copy_entry'
from /usr/lib/ruby/2.3.0/fileutils.rb:1498:in `wrap_traverse'
from /usr/lib/ruby/2.3.0/fileutils.rb:469:in `copy_entry'
from /usr/lib/ruby/2.3.0/fileutils.rb:530:in `rescue in block in mv'
from /usr/lib/ruby/2.3.0/fileutils.rb:527:in `block in mv'
from /usr/lib/ruby/2.3.0/fileutils.rb:1571:in `block in fu_each_src_dest'
from /usr/lib/ruby/2.3.0/fileutils.rb:1585:in `fu_each_src_dest0'
from /usr/lib/ruby/2.3.0/fileutils.rb:1569:in `fu_each_src_dest'
from /usr/lib/ruby/2.3.0/fileutils.rb:517:in `mv'
from /home/john/Desktop/TestExtract.rb:34:in `block (2 levels) in <main>'
from /home/john/Desktop/TestExtract.rb:31:in `each'
from /home/john/Desktop/TestExtract.rb:31:in `block in <main>'
from /home/john/Desktop/TestExtract.rb:26:in `each'
from /home/john/Desktop/TestExtract.rb:26:in `<main>'
/run/user/1000/gvfs/gphoto2:host=%5Busb%3A002%2C022%5D/DCIM
/run/user/1000/gvfs/gphoto2:host=%5Busb%3A002%2C021%5D/DCIM
/run/user/1000/gvfs/gphoto2:host=%5Busb%3A002%2C020%5D/DCIM
104___03
105___04
106___05
102___01
[Finished in 0.1s with exit code 1]
[shell_cmd: ruby "/home/john/Desktop/TestExtract.rb"]
[dir: /home/john/Desktop]
[path: /home/john/bin:/home/john/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin]
Any advice? I can't figure out what's wrong.
The path to your files looks strange because your camera storage has been mounted using FUSE. If you look closely, you'll see that it is looking for:
/run/user/1000/gvfs/gphoto2:host=%5Busb%3A002%2C021%5D/DCIM//IMG_0092.JPG
There are two forward slashes before the final filename. Try correcting this on line 34 of your script.
If the problem persists, it is possible that the user running the Ruby script does not have permission to access that filesystem, or that the way FUSE constructs the paths is not compatible with Ruby's FileUtils.
You can try to run:
cat /run/user/1000/gvfs/gphoto2:host=%5Busb%3A002%2C021%5D/DCIM/IMG_0092.JPG
as the same user that is running the Ruby process to ensure you have read permission to the filesystem.
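Looking at the script, the double slash seems to come from foldnum being incremented once per file rather than once per folder: as soon as foldnum runs past the end of foldir, foldir[foldnum] is nil and interpolates as an empty string. A sketch that avoids the index bookkeeping altogether follows. The helper names jpgs_under and move_photos are illustrative, the layout DCIM/<folder>/<file>.JPG is assumed from the question, and the base: keyword of Dir.glob needs Ruby 2.5 or later.

```ruby
require 'fileutils'

# Hypothetical helper: list every regular .JPG file exactly one folder below
# dcim_path. Using base: keeps special characters in dcim_path out of the
# glob pattern itself.
def jpgs_under(dcim_path)
  Dir.glob(File.join('*', '*.JPG'), base: dcim_path)
     .map { |relative| File.join(dcim_path, relative) }
     .select { |path| File.file?(path) }
     .sort
end

# Hypothetical helper: move all photos found under dcim_path into dest_dir
# and return the list of source paths that were moved.
def move_photos(dcim_path, dest_dir)
  FileUtils.mkdir_p(dest_dir)
  jpgs_under(dcim_path).each { |src| FileUtils.mv(src, dest_dir) }
end
```

Because every source path is built with File.join from names that glob actually found, a missing folder name can no longer produce a path such as DCIM//IMG_0092.JPG.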

Passing the contents of a file to FileUtils.cp

file = 'list.txt'
fileArray = []
list_open = File.open(file, "r")
list_open.each_line { |line|
fileArray.push line
}
fileArray.each { |x| puts x }
fileArray.each { |x| FileUtils.cp x, "/home/user/scripts/" }
list.txt contains just a path to a file. I want to read that path from the file, pass it to cp, and copy the file it names to /home/user/scripts/.
When I run this script, here is the error I receive:
/usr/local/lib/ruby/2.1/fileutils.rb:1401:in `initialize': No such file or directory # rb_sysopen - /home/user/test.txt (Errno::ENOENT)
from /usr/local/lib/ruby/2.1/fileutils.rb:1401:in `open'
from /usr/local/lib/ruby/2.1/fileutils.rb:1401:in `copy_file'
from /usr/local/lib/ruby/2.1/fileutils.rb:483:in `copy_file'
from /usr/local/lib/ruby/2.1/fileutils.rb:400:in `block in cp'
from /usr/local/lib/ruby/2.1/fileutils.rb:1579:in `block in fu_each_src_dest'
from /usr/local/lib/ruby/2.1/fileutils.rb:1593:in `fu_each_src_dest0'
from /usr/local/lib/ruby/2.1/fileutils.rb:1577:in `fu_each_src_dest'
from /usr/local/lib/ruby/2.1/fileutils.rb:399:in `cp'
from ./for_Test.rb:12:in `block in <main>'
from ./for_Test.rb:12:in `each'
from ./for_Test.rb:12:in `<main>'
Recall that each line yielded by each_line keeps its trailing newline ("\n"), so the path you pass to cp ends in a newline and no such file exists. You need to remove it, which is easy with String#chomp:
list_path = "list.txt"
filenames = []
File.open(list_path, "r") do |list|
list.each_line do |line|
filenames.push(line.chomp)
end
end
...or more succinctly:
filenames = File.open(list_path, "r") do |list|
list.each_line.map(&:chomp)
end
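Putting the reading and the copying together might look like the sketch below. It assumes Ruby 2.4 or later for the chomp: true option of File.readlines, and copy_listed_files is a hypothetical helper name, not something from the question.

```ruby
require 'fileutils'

# Hypothetical helper: copy every file named in list_path (one path per line)
# into dest_dir and return the list of paths that were copied.
def copy_listed_files(list_path, dest_dir)
  # chomp: true drops the line terminator; blank lines are skipped
  paths = File.readlines(list_path, chomp: true).reject(&:empty?)
  paths.each { |path| FileUtils.cp(path, dest_dir) }
  paths
end
```

With the question's setup this would be called as copy_listed_files('list.txt', '/home/user/scripts/').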

Reading files in a zip archive, without unzipping the archive

I have a directory with 100+ zip files and I need to read the files inside the zip files to do some data processing, without unzipping the archive.
Is there a Ruby library to read the contents of files in zip archives, without unzipping the file?
Using rubyzip gives an error:
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
# Handle entries one by one
zip_file.each do |entry|
# Extract to file/directory/symlink
puts "Extracting #{entry.name}"
entry.extract('here')
# Read into memory
content = entry.get_input_stream.read
end
end
Gives this error:
test.rb:12:in `block (2 levels) in <main>': undefined method `read' for Zip::NullInputStream:Module (NoMethodError)
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:42:in `call'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:42:in `block in each'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:41:in `each'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:41:in `each'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/central_directory.rb:182:in `each'
from test.rb:6:in `block in <main>'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/file.rb:99:in `open'
from test.rb:4:in `<main>'
Zip::NullInputStream is returned if the entry is a directory rather than a file. Could that be the case?
Here's a more robust variation of the code:
#!/usr/bin/env ruby
require 'rubygems'
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
# Handle entries one by one
zip_file.each do |entry|
if entry.directory?
puts "#{entry.name} is a folder!"
elsif entry.symlink?
puts "#{entry.name} is a symlink!"
elsif entry.file?
puts "#{entry.name} is a regular file!"
# Read into memory (declare content first so it is visible outside the block)
content = nil
entry.get_input_stream { |io| content = io.read }
# Output
puts content
else
puts "#{entry.name} is something unknown, oops!"
end
end
end
I came across the same issue. Checking entry.file? before calling entry.get_input_stream.read resolved it.
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
# Handle entries one by one
zip_file.each do |entry|
# Extract to file/directory/symlink
puts "Extracting #{entry.name}"
entry.extract('here')
# Read into memory
if entry.file?
content = entry.get_input_stream.read
end
end
end

Zlib::BufError when using progressbar/ruby-progressbar gem

I use the following Ruby snippet to download an 8.9MB file.
require 'open-uri'
require 'net/http'
require 'uri'
def http_download_no_progress_bar(uri, filename)
uri.open(read_timeout: 500) do |file|
open filename, 'w' do |io|
file.each_line do |line|
io.write line
end
end
end
end
I want to add the progressbar gem to visualize the download process:
require 'open-uri'
require 'progressbar'
require 'net/http'
require 'uri'
def http_download_with_progressbar(uri, filename)
progressbar = nil
uri.open(
read_timeout: 500,
content_length_proc: lambda { |total|
if total && 0 < total.to_i
progressbar = ProgressBar.new("...", total)
progressbar.file_transfer_mode
end
},
progress_proc: lambda { |step|
progressbar.set step if progressbar
}
) do |file|
open filename, 'w' do |io|
file.each_line do |line|
io.write line
end
end
end
end
However, it now fails with the following error:
/home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:357:in `finish':
buffer error (Zlib::BufError)oooooo | 8.0MB 8.6MB/s ETA: 0:00:00
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:357:in `finish'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:262:in `ensure in inflater'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:262:in `inflater'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:274:in `read_body_0'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:201:in `read_body'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:328:in `block (2 levels) in open_http'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:1415:in `block (2 levels) in transport_request'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:162:in `reading_body'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:1414:in `block in transport_request'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:1405:in `catch'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:1405:in `transport_request'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:1378:in `request'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:319:in `block in open_http'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:853:in `start'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:313:in `open_http'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:724:in `buffer_open'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:210:in `block in open_loop'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:208:in `catch'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:208:in `open_loop'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:149:in `open_uri'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:704:in `open'
Meanwhile I also tried the ruby-progressbar gem:
require 'open-uri'
require 'ruby-progressbar'
require 'net/http'
require 'uri'
def http_download_with_ruby_progressbar(uri, filename)
progressbar = nil
uri.open(
read_timeout: 500,
content_length_proc: lambda { |total|
if total && 0 < total.to_i
progressbar = ProgressBar.create(title: filename, total: total)
end
},
progress_proc: lambda { |step|
progressbar.progress = step if progressbar
}
) do |file|
open filename, 'w' do |io|
file.each_line do |line|
io.write line
end
end
end
end
It fails with the same error. Here is the associated issue for the problem.
The problem is specific to the file you are trying to download; every method works with this other file: https://androidnetworktester.googlecode.com/files/1mb.txt.
More precisely, your file is larger than the server says it is. The content_length_proc reports 8549968 bytes (8.15MB), whereas the downloaded file is 101187668 bytes (96.5MB); you can check with ls after downloading. Here is an alternative that does not crash and still shows progress:
def http_download_with_words(uri, filename)
bytes_total = nil
uri.open(
read_timeout: 500,
:content_length_proc => lambda{|content_length|
bytes_total = content_length},
:progress_proc => lambda{|bytes_transferred|
if bytes_total
# Print progress
print("\r#{bytes_transferred}/#{bytes_total}")
else
# We don’t know how much we get, so just print number
# of transferred bytes
print("\r#{bytes_transferred} (total size unknown)")
end
}
) do |file|
open filename, 'w' do |io|
file.each_line do |line|
io.write line
end
end
end
end
http_download_with_words(URI( 'http://data.wien.gv.at/daten/geo?service=WFS&request=GetFeature&version=1.1.0&typeName=ogdwien%3aBAUMOGD&srsName=EPSG:4326' ), 'temp.txt')
This is pretty self-explanatory (seen here).
The part I haven't been able to figure out is how exactly the progressbar gem interferes with Zlib. Most things work fine inside the procs (e.g. having them print arbitrary text), so I assume both of these progress bars do something odd on completion that somehow disrupts the transfer. I'd be very interested if anyone can figure out why that is.
In my testing, when this occurred it was due to the raise in #set. Why that surfaces as an error in Zlib is not clear to me; perhaps there is some unusual exception handling involved. In my case, writing "progbar.set(count) rescue nil" got rid of the issue.
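If the cause really is the combination described in the answers above (a Content-Length that is smaller than the actual body, and a progress bar that raises when set past its total), then clamping the reported value is an alternative to swallowing the exception with rescue nil. This is a sketch under that assumption, and clamped_progress is a hypothetical helper name.

```ruby
# Hypothetical helper: keep a reported byte count within 0..total so that a
# progress bar is never asked to go past its declared total.
# When the total is unknown (nil or not positive), the value is passed through.
def clamped_progress(step, total)
  return step if total.nil? || total <= 0
  [[step, 0].max, total].min
end

# Intended use inside progress_proc, with total captured in content_length_proc:
#   progressbar.progress = clamped_progress(step, total) if progressbar
```

The bar will then simply sit at 100% while the remaining bytes arrive, instead of raising in the middle of the transfer.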

error related to REXML

I'm not sure whether this is a REXML issue or a Ruby issue,
but it happens when I work with REXML.
The program below should access elements of each xml file in the directory.
#!/usr/bin/ruby -w
require 'rexml/document'
include REXML
p "Current directory was: " + Dir.pwd
Dir.chdir("/home/askar/xml_files1") {
p "Now we're in: " + Dir.pwd
if File.exist?(Dir.pwd)
xml_files = Dir.glob("ShipmentRequest*.xml")
Dir.foreach(Dir.pwd) do |file|
xmlfile = File.new(file)
xmldoc = Document.new(xmlfile)
end
else
puts "It's empty"
end
}
When I run:
ruby import_xml.rb
Errors:
"Current directory was: /home/askar/Dropbox/rails_studio/xml_to_mysql"
"Now we're in: /home/askar/xml_files1"
There're 6226 files in the folder...
/home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/source.rb:148:in `read': Is a directory - . (Errno::EISDIR)
from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/source.rb:148:in `initialize'
from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/source.rb:14:in `new'
from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/source.rb:14:in `create_from'
from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/parsers/baseparser.rb:127:in `stream='
from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/parsers/baseparser.rb:116:in `initialize'
from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/parsers/treeparser.rb:9:in `new'
from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/parsers/treeparser.rb:9:in `initialize'
from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/document.rb:245:in `new'
from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/document.rb:245:in `build'
from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/document.rb:43:in `initialize'
from import_xml.rb:20:in `new'
from import_xml.rb:20:in `block (2 levels) in <main>'
from import_xml.rb:17:in `foreach'
from import_xml.rb:17:in `block in <main>'
from import_xml.rb:8:in `chdir'
from import_xml.rb:8:in `<main>'
When I comment out:
#xmldoc = Document.new(xmlfile)
it's not giving errors.
Folder /home/askar/xml_files1 contains only 3 xml files.
I'm using Linux Mint Nadia and
ruby -v
ruby 1.9.3p429 (2013-05-15 revision 40747) [x86_64-linux]
As you may have noticed, the error paths show ruby 1.9.1 for some reason. Is this an issue?
I think @halfelf is correct here. The API docs say that Dir.foreach iterates over every entry in the directory, and on Unix that includes the two special directories . and .. (the current and the parent directory).
A couple lines before your Dir.foreach call, you use glob to build an array of files called xml_files. What happens if you iterate over that in your loop instead?
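The difference between the two listings can be checked with a few lines of plain Ruby. In this sketch, xml_files_in is a hypothetical helper name, and the base: keyword of Dir.glob assumes Ruby 2.5 or later.

```ruby
# Hypothetical helper: return the full paths of the regular files in dir whose
# names match pattern. Unlike Dir.foreach, this never yields "." or "..",
# and directories that happen to match the pattern are filtered out.
def xml_files_in(dir, pattern = '*.xml')
  Dir.glob(pattern, base: dir)
     .map { |name| File.join(dir, name) }
     .select { |path| File.file?(path) }
     .sort
end
```

Iterating over xml_files_in(Dir.pwd, 'ShipmentRequest*.xml') instead of Dir.foreach(Dir.pwd) means Document.new only ever receives readable XML files.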
Just a guess: Not everything returned by Dir.foreach(Dir.pwd) is a file that can be read. Some of them are directories.
Using Nokogiri, here's how I'd write this:
#!/usr/bin/ruby -w
require 'nokogiri'
DIRNAME = "/home/askar/xml_files1"
puts "Current directory is: #{ Dir.pwd }"
Dir.chdir(DIRNAME) do
puts "Now in: #{ DIRNAME }"
xml_files = Dir.glob("ShipmentRequest*.xml")
if xml_files.empty?
puts "#{ DIRNAME } is empty."
else
xml_files.each do |file|
doc = File.open(file) { |f| Nokogiri::XML(f) } # block form closes the file handle
# ... do something with the doc ...
end
end
end

Resources