RubyZip docx issues with write_buffer instead of open - ruby

I'm adapting the RubyZip recursive zipping example (found here) to work with write_buffer instead of open and am running into a host of issues. I'm doing this because the zip archive I'm producing has Word documents in it, and I'm getting errors on opening those Word documents. Therefore, I'm trying the workaround that RubyZip suggests, which is using write_buffer instead of open (example found here).
The first problem is that I'm getting errors because I'm using an absolute path, and I'm not sure how to get around that. The error I'm getting is RubyZip complaining that the entry "name must not start with /".
Second, I'm not sure what to do to mitigate the issue with Word documents. When I used my original code, which worked and created an actual zip file, any Word document in that zip file produced the following error upon opening: "Word found unreadable content in [document name]. Do you want to recover the contents of this document? If you trust the source of this document, click Yes." The unreadable content error is the reason why I went down the road of attempting to use write_buffer.
Any help would be appreciated.
Here is the code that I'm currently using:
require 'zip'
require 'zip/zipfilesystem'

module AdvisoryBoard
  class ZipService
    def initialize(input_dir, output_file)
      @input_dir = input_dir
      @output_file = output_file
    end

    # Zip the input directory.
    def write
      entries = Dir.entries(@input_dir) - %w[. ..]
      path = ""
      buffer = Zip::ZipOutputStream.write_buffer do |zipfile|
        entries.each do |e|
          zipfile_path = path == '' ? e : File.join(path, e)
          disk_file_path = File.join(@input_dir, zipfile_path)
          @file = nil
          @data = nil
          if !File.directory?(disk_file_path)
            @file = File.open(disk_file_path, "r+b")
            @data = @file.read
            unless [@output_file, @input_dir].include?(e)
              zipfile.put_next_entry(e)
              zipfile.write @data
            end
            @file.close
          end
        end
        zipfile.put_next_entry(@output_file)
        zipfile.put_next_entry(@input_dir)
      end
      File.open(@output_file, "wb") { |f| f.write(buffer.string) }
    end
  end
end
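For reference, a stripped-down write_buffer round trip looks roughly like this (an untested sketch, with input_dir and output_file as plain local variables standing in for the instance variables above; the key point is that names handed to put_next_entry must be relative, never absolute paths, which is what triggers the "name must not start with /" error):
buffer = Zip::ZipOutputStream.write_buffer do |out|
  Dir.glob(File.join(input_dir, '**', '*')).each do |f|
    next if File.directory?(f)
    # Strip the input directory prefix so the entry name stays relative.
    out.put_next_entry(f.sub(%r{\A#{Regexp.escape(input_dir)}/?}, ''))
    out.write(File.binread(f))
  end
end
File.open(output_file, 'wb') { |f| f.write(buffer.string) }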

I was able to get Word documents to open without any warnings or corruption! Here's what I ended up doing:
require 'nokogiri'
require 'zip'
require 'zip/zipfilesystem'

class ZipService
  # Initialize with the directory to zip and the location of the output archive.
  def initialize(input_dir, output_file)
    @input_dir = input_dir
    @output_file = output_file
  end

  # Zip the input directory.
  def write
    entries = Dir.entries(@input_dir) - %w[. ..]
    ::Zip::File.open(@output_file, ::Zip::File::CREATE) do |zipfile|
      write_entries entries, '', zipfile
    end
  end

  private

  # A helper method to make the recursion work.
  def write_entries(entries, path, zipfile)
    entries.each do |e|
      zipfile_path = path == '' ? e : File.join(path, e)
      disk_file_path = File.join(@input_dir, zipfile_path)
      if File.directory? disk_file_path
        recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
      else
        put_into_archive(disk_file_path, zipfile, zipfile_path, e)
      end
    end
  end

  def recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
    zipfile.mkdir zipfile_path
    subdir = Dir.entries(disk_file_path) - %w[. ..]
    write_entries subdir, zipfile_path, zipfile
  end

  def put_into_archive(disk_file_path, zipfile, zipfile_path, entry)
    if File.extname(zipfile_path) == ".docx"
      Zip::File.open(disk_file_path) do |zip|
        doc = zip.read("word/document.xml")
        xml = Nokogiri::XML.parse(doc)
        zip.get_output_stream("word/document.xml") { |f| f.write(xml.to_s) }
      end
      zipfile.add(zipfile_path, disk_file_path)
    else
      zipfile.add(zipfile_path, disk_file_path)
    end
  end
end
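For completeness, a hypothetical call to the class above (the directory and archive names here are made up):
ZipService.new('reports', 'reports.zip').write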

Related

How to take the result from another method

I have a directory structure with sub-directories:
../../../../../MY_PROJECT/TEST_A/cats/
../../../../../MY_PROJECT/TEST_B/dogs/
../../../../../MY_PROJECT/TEST_A/tigers/
../../../../../MY_PROJECT/TEST_A/elephants/
each of which has a file that ends with ".sln":
../../../../../MY_PROJECT/TEST_A/cats/cats.sln
../../../../../MY_PROJECT/TEST_B/dogs/dogs.sln
...
These files contain information specific to their directory. I would like to do the following:
Create a file "myfile.txt" within each sub-directory, and write some strings to them:
../../../../../MY_PROJECT/TEST_A/cats/myfile.txt
../../../../../MY_PROJECT/TEST_B/dogs/myfile.txt
../../../../../MY_PROJECT/TEST_A/tigers/myfile.txt
../../../../../MY_PROJECT/TEST_A/elephants/myfile.txt
Copy a specific string in the ".sln" files to the myfile.txt of certain directories using the following method:
def parse_sln_files
  sln_files = Dir["../../../../../MY_PROJECT/TEST_*/**/*.sln"]
  sln_files.each do |file_name|
    File.open(file_name) do |f|
      f.each_line { |line|
        if line =~ /C Source files ="..\\/ #"
          path = line.scan(/".*.c"/)
          puts path
        end
      }
    end
  end
end
I would like to do something like this:
def create_myfile
  Dir['../../../../../MY_PROJECT/TEST_*/*/'].each do |dir|
    File.new File.join(dir, 'myfile.txt'), 'w+'
    Dir['../../../../../TEST/TEST_*/*/myfile.txt'].each do |path|
      File.open(path, 'w+') do |f|
        f.puts "some text...."
        f.puts "some text..."
        f.puts # here I would like to return the result of parse_sln_files
      end
    end
  end
end
Any suggestions on how to express this?
It seems like you want to read a list of C source file names from a Visual C++ solution file and store it in a separate file in the same directory. You may have to merge the two loops shown in your code and do something like this:
def parse_sln_and_store_source_files
  sln_files = Dir["../../../../../MY_PROJECT/TEST_*/**/*.sln"]
  sln_files.each do |file_name|
    #### Let's collect source file names in this array
    source_file_names = []
    File.open(file_name) do |f|
      f.each_line { |line|
        if line =~ /C Source files ="..\\/ #"
          path = line.scan(/".*.c"/)
          ############ Add path to array ############
          source_file_names << path
        end
      }
    end
    #### Let's create `myfile.txt` in the same dir as the .sln
    test_file = File.expand_path(File.dirname(file_name)) + "/myfile.txt"
    File.open(test_file, 'w+') do |f|
      f.puts "some text...."
      f.puts "some text..."
      ##### Iterate over source file names & write to file
      source_file_names.each { |n| f.puts n }
    end
  end
end
This can be done a bit more elegantly with a little more refactoring. Also note that this is untested code; hopefully you get the gist of what I am suggesting.
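For example, one possible refactoring along those lines (again an untested sketch, splitting the scanning from the writing so each piece can be reused on its own):
def source_files_in(sln_path)
  names = []
  File.foreach(sln_path) do |line|
    names.concat(line.scan(/".*.c"/)) if line =~ /C Source files ="..\\/ #"
  end
  names
end

def write_myfile(dir, source_files)
  File.open(File.join(dir, "myfile.txt"), "w") do |f|
    f.puts "some text...."
    f.puts "some text..."
    source_files.each { |n| f.puts n }
  end
end

Dir["../../../../../MY_PROJECT/TEST_*/**/*.sln"].each do |sln|
  write_myfile(File.dirname(sln), source_files_in(sln))
end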

Missing parts after parsing and processing a very large XML file in Ruby

I have to parse and modify a 22.2MB XML file (a wordpress export).
The problem is that after parsing, the last part of the file is always missing, and I can't really figure out why.
I've tried using the saxerator gem, but it does not seem to solve my problem.
Here I'm just trying to get all the <item> elements from the input file and write them to an output file:
require 'saxerator'

class SaxImport
  def initialize(input_file, output_file)
    f = File.read(input_file, File.size(input_file))
    xml_data = Saxerator.parser(f) do |config|
      config.output_type = :xml
    end
    category_fr_list = {}
    items = []
    output = File.open output_file, "w"
    xml_data.for_tag(:item).reverse_each do |item|
      output << item.to_xml
    end
    output.close
  end
end

import_en = SaxImport.new 'weekly.xml', 'weekly.processed.xml'

Script to append files

I am trying to write a script to do the following:
There are two directories A and B. In directory A, there are files called "today" and "today1". In directory B, there are three files called "today", "today1" and "otherfile".
I want to loop over the files in directory A and append the contents of the like-named files in directory B to the files in directory A.
I wrote the method below to handle this, but I am not sure whether it is on track or if there is a more straightforward way to handle such a case.
Please note I am running the script from directory B.
def append_data_to_daily_files
  directory = "B"
  Dir.entries('B').each do |file|
    fileName = file
    next if file == '.' or file == '..'
    File.open(File.join(directory, file), 'a') { |file|
      Dir.entries('.').each do |item|
        next if !(item.match(/fileName/))
        File.open(item, "r")
        file << item
        item.close
      end
      #file.puts "hello"
      file.close
    }
  end
end
In my opinion, your append_data_to_daily_files() method is trying to do too many things -- which makes it difficult to reason about. Break down the logic into very small steps, and write a simple method for each step. Here's a start along that path.
require 'set'

def dir_entries(dir)
  Dir.chdir(dir) {
    return Dir.glob('*').to_set
  }
end

def append_file_content(target, source)
  File.open(target, 'a') { |fh|
    fh.write(IO.read(source))
  }
end

def append_common_files(target_dir, source_dir)
  ts = dir_entries(target_dir)
  ss = dir_entries(source_dir)
  common_files = ts.intersection(ss)
  common_files.each do |file_name|
    t = File.join(target_dir, file_name)
    s = File.join(source_dir, file_name)
    append_file_content(t, s)
  end
end

# Run script like this:
#   ruby my_script.rb A B
append_common_files(*ARGV)
By using a Set, you can easily figure out the common files. By using glob you can avoid the hassle of filtering out the dot-directories. By designing the code to take its directory names from the command line (rather than hard-coding the names in the script), you end up with a potentially re-usable tool.
My solution....
def append_old_logs_to_daily_files
  directory = "B"
  # For each file in the folder "B"
  Dir.entries('B').each do |file|
    fileName = file
    # Skip dot directories
    next if file == '.' or file == '..'
    # Open each file
    File.open(File.join(directory, file), 'a') { |file|
      # Get each log file from the current directory in turn
      Dir.entries('.').each do |item|
        next if item == '.' or item == '..'
        # ...that matches the day we are looking for
        next if !(item.match(fileName))
        # Read the log file
        logFilesToBeCopied = File.open(item, "r")
        contents = logFilesToBeCopied.read
        file << contents
      end
      file.close
    }
  end
end

Compress a complete directory in Ruby with zlib?

This is the code I'm trying.
require 'zlib'

Dir.glob('*.*').each do |file|
  Zlib::GzipWriter.open('output.gz') do |gz|
    gz.mtime = File.mtime(file)
    gz.orig_name = File.basename(file)
    gz.write IO.binread(file)
  end
end
I've tried different variations of this. There doesn't seem to be a how-to for "multiple files" online. I keep ending up with the first file name in output.gz, and I think it may have the content of the last file from the directory (not sure). But that's beside the point. I just want to put each file as a separate entity in a compressed file. The more cross-platform compatible it is, the better.
This answer is taken from http://old.thoughtsincomputation.com/posts/tar-and-a-few-feathers-in-ruby, which in turn took it from the RubyGems library.
require 'rubygems'
require 'rubygems/package'
require 'zlib'
require 'fileutils'

module Util
  module Tar
    # Creates a tar file in memory recursively
    # from the given path.
    #
    # Returns a StringIO whose underlying String
    # is the contents of the tar file.
    def tar(path)
      tarfile = StringIO.new("")
      Gem::Package::TarWriter.new(tarfile) do |tar|
        Dir[File.join(path, "**/*")].each do |file|
          mode = File.stat(file).mode
          relative_file = file.sub /^#{Regexp::escape path}\/?/, ''
          if File.directory?(file)
            tar.mkdir relative_file, mode
          else
            tar.add_file relative_file, mode do |tf|
              File.open(file, "rb") { |f| tf.write f.read }
            end
          end
        end
      end
      tarfile.rewind
      tarfile
    end

    # gzips the underlying string in the given StringIO,
    # returning a new StringIO representing the
    # compressed file.
    def gzip(tarfile)
      gz = StringIO.new("")
      z = Zlib::GzipWriter.new(gz)
      z.write tarfile.string
      z.close # this is necessary!
      # z was closed to write the gzip footer, so
      # now we need a new StringIO
      StringIO.new gz.string
    end

    # un-gzips the given IO, returning the
    # decompressed version as a StringIO
    def ungzip(tarfile)
      z = Zlib::GzipReader.new(tarfile)
      unzipped = StringIO.new(z.read)
      z.close
      unzipped
    end

    # untars the given IO into the specified
    # directory
    def untar(io, destination)
      Gem::Package::TarReader.new io do |tar|
        tar.each do |tarfile|
          destination_file = File.join destination, tarfile.full_name
          if tarfile.directory?
            FileUtils.mkdir_p destination_file
          else
            destination_directory = File.dirname(destination_file)
            FileUtils.mkdir_p destination_directory unless File.directory?(destination_directory)
            File.open destination_file, "wb" do |f|
              f.print tarfile.read
            end
          end
        end
      end
    end
  end
end

### Usage Example: ###
#
# include Util::Tar
#
# io = tar("./Desktop")   # io is a TAR of files
# gz = gzip(io)           # gz is a TGZ
#
# io = ungzip(gz)         # io is a TAR
# untar(io, "./untarred") # files are untarred
#
First off, that will keep overwriting output.gz, leaving it containing only the last file compressed.
Second, the gzip format does not hold multiple files. It only holds one. You need to use the .tar.gz or .zip format. .zip is more "cross platform compatible". Take a look at rubyzip.
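As a rough illustration of the rubyzip route (an untested sketch that reuses the Dir.glob('*.*') file list from the question):
require 'zip'

Zip::File.open('output.zip', Zip::File::CREATE) do |zipfile|
  Dir.glob('*.*').each do |file|
    # Each file becomes its own entry, unlike gzip, which holds a single stream.
    zipfile.add(File.basename(file), file)
  end
end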

Ruby Load multiple xml from a directory in a program to parse them

I want to load a set of XML files from a directory and use REXML to parse each one in a loop.
I can't seem to create a File object after I start reading from a directory:
i = 1
filearray = Array.new
documentarray = Array.new
directory = 'xml'
Dir.foreach(directory).each { |file|
  next if file == '.' or file == '..'
  filearray[i] = File.open(directory + "/" + file)
  i = i + 1
}
Please help
You are opening the file, but not reading it. This is ugly, but will work:
require 'find'

files = []
directory = 'xml'

def get_contents(file)
  contents = ""
  contents = File.open(file).readlines
end

Find.find(directory) do |file|
  next if FileTest.directory?(file)
  files << get_contents(file)
end
Hope it helps
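If the goal is to then parse each document with REXML, something along these lines could follow (a small untested sketch; since readlines returns an array of lines, each one is joined back into a single string first):
require 'rexml/document'

documents = files.map { |lines| REXML::Document.new(lines.join) }
documents.each { |doc| puts doc.root.name }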
