RubyZip: archiving process indication

I am adding tons of files to my archive; it looks like this:
print "Starting ..."
Zip::ZipFile.open(myarchive, 'w') do |zipfile|
my_tons_of_files.each do |file|
print "Adding #{file.filename} to archive ... \r"
# ...
end
print "\n Saving archive"
# !!! -> waiting about 10-15 minutes
# but I want to show the percentage of completed job
end
After all the files are added to the archive, it starts to compress them (about 10-15 minutes).
How can I indicate what the rubyzip gem is actually doing? I want to show a percentage like current_file_number/total_files_count.

You can override Zip::ZipFile#commit:
require 'zip'
require 'zip/zipfilesystem'

module Zip
  class ZipFile
    def commit
      return if !commit_required?
      on_success_replace(name) { |tmpFile|
        ZipOutputStream.open(tmpFile) { |zos|
          total_files = @entrySet.length
          current_files = 1
          @entrySet.each do |e|
            puts "Current file: #{current_files}, Total files: #{total_files}"
            current_files += 1
            e.write_to_zip_output_stream(zos)
          end
          zos.comment = comment
        }
        true
      }
      initialize(name)
    end
  end
end
print "Starting ..."
Zip::ZipFile.open(myarchive, 'w') do |zipfile|
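If you want a percentage instead of a raw counter, the loop inside the overridden commit can be adapted; a minimal sketch (only the percentage arithmetic changes, and the name of the entry-set instance variable may differ between rubyzip versions):

total_files = @entrySet.length
@entrySet.each_with_index do |e, index|
  percent = ((index + 1) * 100.0 / total_files).round(1)
  print "Compressing entry #{index + 1}/#{total_files} (#{percent}%)\r"
  e.write_to_zip_output_stream(zos)
end
puts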


How do I stop the Tempfile created by Creek from being deleted before I'm done with it?

I'm writing a script that uses Creek to read an .xlsx file and update the prices and weights of products in a database. The .xlsx file is located on an AWS server, so Creek copies the file down and stores it in a Tempfile while it is in use.
The issue is, at some point the Tempfile seems to be prematurely deleted, and since Creek continues to call on it whenever it iterates through a sheet, the script fails. Interestingly, my coworker's environment runs the script fine, though I haven't found a difference between what we're running.
Here is the script I've written:
require 'creek'

class PricingUpdateWorker
  include Sidekiq::Worker

  def perform(filename)
    # This points to the file in the root bucket
    file = bucket.files.get(filename)
    # Make public temporarily to open in Creek
    file.public = true
    file.save
    creek_sheets = Creek::Book.new(file.public_url, remote: true).sheets
    # Close file to public
    file.public = false
    file.save

    creek_sheets.each_with_index do |sheet, sheet_index|
      p "---------- #{sheet.name} ----------"
      sheet.simple_rows.each_with_index do |row, index|
        next if index == 0
        product = Product.find_by_id(row['A'].to_i)
        if product
          if row['D']&.match(/N\/A/) || row['E']&.match(/N\/A/)
            product.delete
            p '*** deleted ***'
          else
            product.price = row['D']&.to_f&.round(2)
            product.weight = row['E']&.to_f
            product.request_for_quote = false
            product.save
            p 'product updated'
          end
        else
          p "#{row['A']} | product not found ***"
        end
      end
    end
  end

  private

  def connection
    @connection ||= Fog::Storage.new(
      provider: 'AWS',
      aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
      aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
    )
  end

  def bucket
    # Grab the file from the bucket
    @bucket ||= connection.directories.get 'my-aws-bucket'
  end
end
And the logs:
"---------- Sheet 1 ----------"
"product updated"
"product updated"
... I've cut out a bunch more of these...
"product updated"
"product updated"
"---------- Sheet 2 ----------"
rails aborted!
Errno::ENOENT: No such file or directory @ rb_sysopen - /var/folders/9m/mfcnhxmn1bqbm6h91rx_rd8m0000gn/T/file20190920-19247-c6x4zw
"/var/folders/9m/mfcnhxmn1bqbm6h91rx_rd8m0000gn/T/file20190920-19247-c6x4zw" is the temporary file, and as you can see, it's been collected already, even though I'm still using it, and I believe it is still in scope. Any ideas what could be causing this? It's especially odd that my coworker can run this just fine.
In case it's helpful, here is a little code from Creek:
def initialize path, options = {}
  check_file_extension = options.fetch(:check_file_extension, true)
  if check_file_extension
    extension = File.extname(options[:original_filename] || path).downcase
    raise 'Not a valid file format.' unless (['.xlsx', '.xlsm'].include? extension)
  end
  if options[:remote]
    zipfile = Tempfile.new("file")
    zipfile.binmode
    zipfile.write(HTTP.get(path).to_s)
    # I added the line below, and it fixes the problem by preventing the Tempfile
    # from being finalized, though I shouldn't need to take steps like that.
    # ObjectSpace.undefine_finalizer(zipfile)
    zipfile.close
    path = zipfile.path
  end
  @files = Zip::File.open(path)
  @shared_strings = SharedStrings.new(self)
end
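My reading of what happens (an assumption on my part, not something confirmed by the gem authors): once initialize returns, nothing references the zipfile Tempfile any more, only its path string survives, so the Tempfile object can be garbage collected and its finalizer unlinks the file on disk while Zip::File still points at that path. The behaviour can be reproduced without Creek:

require 'tempfile'

def path_of_short_lived_tempfile
  t = Tempfile.new('demo')
  t.write('hello')
  t.close
  t.path # the Tempfile object itself goes out of scope when this method returns
end

path = path_of_short_lived_tempfile
GC.start # once the Tempfile is collected, its finalizer unlinks the backing file
sleep 1
puts File.exist?(path) # may print false, depending on when the object is collected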
EDIT: Someone wanted to know exactly how I was running my code, so here it is.
I run the following rake task by executing bundle exec rails client:pricing_update[client_updated_prices.xlsx] in the command line.
namespace :client do
  desc 'Imports the initial database structure & base data from uploaded .xlsx file'
  task :pricing_update, [:filename] => :environment do |t, args|
    PricingUpdateWorker.new.perform(args[:filename])
  end
end
I should also mention that I'm running Rails, so the Gemfile.lock keeps the gem versions consistent between me and my coworker. My fog version is 2.0.0 and my rubyzip version is 1.2.2.
Finally, it seems that the bug is not in the Creek gem at all but rather in the rubyzip gem, which has trouble with some xlsx files, as noted in this issue. It seems to depend on how the source file was generated. I created a simple 2-page spreadsheet in Google Sheets and it works fine, but a random xlsx file may not.
require 'creek'

def test_creek(url)
  Creek::Book.new(url, remote: true).sheets.each_with_index do |sheet, index|
    p "----------Name: #{sheet.name} Index: #{index} ----------"
    sheet.simple_rows.each_with_index do |row, i|
      puts "#{row} index: #{i}"
    end
  end
end
test_creek 'https://tc-sandbox.s3.amazonaws.com/creek-test.xlsx'
# works fine; should output:
"----------Name: Sheet1 Index: 0 ----------"
{"A"=>"foo ", "B"=>"sheet", "C"=>"one"} index: 0
"----------Name: Sheet2 Index: 1 ----------"
{"A"=>"bar", "B"=>"sheet", "C"=>"2.0"} index: 0
test_creek 'http://dev-builds.libreoffice.org/tmp/test.xlsx'
# raises error
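If you need a workaround while keeping the files remote, one option (a sketch, using only the Creek::Book constructor shown above) is to download the spreadsheet into a file whose lifetime you control and hand Creek a local path, so no Tempfile can be finalized underneath it:

require 'creek'
require 'open-uri'
require 'tempfile'

def open_remote_workbook(url)
  # Keep our own reference to the Tempfile for as long as the book is in use,
  # so it cannot be garbage collected and unlinked prematurely.
  tmp = Tempfile.new(['workbook', '.xlsx'])
  tmp.binmode
  tmp.write(URI.open(url).read)
  tmp.flush
  [Creek::Book.new(tmp.path), tmp]
end

book, tmp = open_remote_workbook('https://example.com/prices.xlsx') # URL is hypothetical
book.sheets.each { |sheet| p sheet.name }
tmp.close! # unlink explicitly once you are done with the book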

How would I rewrite .sc files into a different format?

I want to export sc files from SpaceEngine, then for each file, create a new file with the same name but with the extension .txt.
Here is my code:
require 'fileutils'
require 'io/console'

puts "Make sure your export folder is clear of everything but the files you want to turn into Object text files."
puts "Starting Process"

i = 0
Dir.foreach('C:\SpaceEngine\export') do |item|
  next if item == '.' or item == '..'
  i = i + 1
  name = File.basename(item, ".*")
  current = File.new("#{name}.txt", "w")
  current.close
end
sleep 2
I have the latter part already, but I can't get it to read the original files one by one, and then only put certain things from the original into the new file.
# test.sc
# assume this is your test data
this has foo
this does not
this also has foo
this has some other stuff
this is the last line which has foo
blah
blah blah 💩
# filejunk.rb
# You need a method that extracts whatever you want to keep from each source
# file (modify, change, replace, etc.); for example, only the lines containing 'foo':
def replace_file_data(filename)
  # assumes you only want lines with 'foo' in them
  File.readlines(filename).select { |line| line.include?('foo') }.join
end

# Note: Dir.glob treats backslash as an escape character, so use forward slashes.
Dir.glob('C:/SpaceEngine/export/*.sc').each do |filename|
  name = File.basename(filename, ".*")
  File.open("#{name}.txt", "w") { |f| f.write replace_file_data(filename) }
end
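Note that both snippets write the .txt files into the current working directory; if you want them next to the originals in the export folder, one possible variant (reusing the replace_file_data method above):

Dir.glob('C:/SpaceEngine/export/*.sc').each do |filename|
  txt_path = filename.sub(/\.sc\z/i, '.txt')
  File.write(txt_path, replace_file_data(filename))
end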

Measuring xcodebuild durations for all targets (including dependent ones)

Is it possible to measure the time that a single xcodebuild command spends building each distinct target?
Let's say I have a target which depends on some cocoapods: pod1 and pod2.
I build my target using xcodebuild. I can measure the overall time.
I need to measure the time that was spent separately on pod1, pod2 and my target.
I tried to find the answer in xcodebuild's output, but failed to do so.
Thanks in advance!
I ended up writing a custom ruby script that modifies every target of my xcodeproj and of Pods.xcodeproj. The script adds two build phases that log the target name and the current timestamp into an output file. One build phase executes first, and the other one executes last. Later on I simply subtract one timestamp from the other in a separate script.
The output file will look like this (after sorting):
Alamofire end: 1510929112.3409
Alamofire start: 1510929110.2161
AlamofireImage end: 1510929113.6925
AlamofireImage start: 1510929112.5205
The path to the output file (e.g. /a/ci_automation/metrics/performance-metrics/a.txt) is not hardcoded at all. Instead, you pass it as a parameter of the ruby script, like this:
$ruby prepare-for-target-build-time-profiling.rb ${PWD}/output.txt
Note that this script requires cocoapods 1.3.1 (maybe 1.3).
Here is the ruby script, prepare-for-target-build-time-profiling.rb:
#!/usr/bin/env ruby
require 'xcodeproj'
require 'cocoapods'
require 'fileutils'

def inject_build_time_profiling_build_phases(project_path)
  project = Xcodeproj::Project.open(project_path)
  log_time_before_build_phase_name = '[Prefix placeholder] Log time before build'.freeze
  log_time_after_build_phase_name = '[Prefix placeholder] Log time after build'.freeze

  puts "Patching project at path: #{project_path}"
  puts
  project.targets.each do |target|
    puts "Target: #{target.name}"
    first_build_phase = create_leading_build_phase(target, log_time_before_build_phase_name)
    last_build_phase = create_trailing_build_phase(target, log_time_after_build_phase_name)
    puts
  end
  project.save
  puts "Finished patching project at path: #{project_path}"
  puts
end

def create_leading_build_phase(target, build_phase_name)
  remove_existing_build_phase(target, build_phase_name)
  build_phase = create_build_phase(target, build_phase_name)
  shift_build_phase_leftwards(target, build_phase)
  is_build_phase_leading = true
  inject_shell_code_into_build_phase(target, build_phase, is_build_phase_leading)
  return build_phase
end

def create_trailing_build_phase(target, build_phase_name)
  remove_existing_build_phase(target, build_phase_name)
  build_phase = create_build_phase(target, build_phase_name)
  is_build_phase_leading = false
  inject_shell_code_into_build_phase(target, build_phase, is_build_phase_leading)
  return build_phase
end

def remove_existing_build_phase(target, build_phase_name)
  existing_build_phase = target.shell_script_build_phases.find do |build_phase|
    build_phase.name.end_with?(build_phase_name)
    # We use `end_with?` instead of `==`, because `cocoapods` adds its `[CP]` prefix to a `build_phase_name`
  end
  if !existing_build_phase.nil?
    puts "deleting build phase #{existing_build_phase.name}"
    target.build_phases.delete(existing_build_phase)
  end
end

def create_build_phase(target, build_phase_name)
  puts "creating build phase: #{build_phase_name}"
  build_phase = Pod::Installer::UserProjectIntegrator::TargetIntegrator
                .create_or_update_shell_script_build_phase(target, build_phase_name)
  return build_phase
end

def shift_build_phase_leftwards(target, build_phase)
  puts "moving build phase leftwards: #{build_phase.name}"
  target.build_phases.unshift(build_phase).uniq! unless target.build_phases.first == build_phase
end

def inject_shell_code_into_build_phase(target, build_phase, is_build_phase_leading)
  start_or_end = is_build_phase_leading ? "start" : "end"
  build_phase.shell_script = <<-SH.strip_heredoc
    timestamp=`echo "scale=4; $(gdate +%s%N/1000000000)" | bc`
    echo "#{target.name} #{start_or_end}: ${timestamp}" >> #{$build_time_logs_output_file}
  SH
end

def parse_arguments
  $build_time_logs_output_file = ARGV[0]
  if $build_time_logs_output_file.to_s.empty? || !$build_time_logs_output_file.start_with?("/")
    puts "Error: you should pass a full path to an output file as the script's argument. Example:"
    puts "$ruby prepare-for-target-build-time-profiling.rb /path/to/script/output.txt"
    puts
    exit 1
  end
end

def print_arguments
  puts "Arguments:"
  puts "Output path: #{$build_time_logs_output_file}"
  puts
end

def clean_up_before_script
  if File.exist?($build_time_logs_output_file)
    FileUtils.rm($build_time_logs_output_file)
  end
  build_time_logs_output_folder = File.dirname($build_time_logs_output_file)
  unless File.directory?(build_time_logs_output_folder)
    FileUtils.mkdir_p(build_time_logs_output_folder)
  end
end

def main
  parse_arguments
  print_arguments
  clean_up_before_script
  inject_build_time_profiling_build_phases("path/to/project.xcodeproj")
  inject_build_time_profiling_build_phases("path/to/pods/project.xcodeproj")
end

# global: $build_time_logs_output_file (set from ARGV[0] in parse_arguments)
main
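The "separate script" that subtracts the timestamps is not shown above; a minimal sketch of what it could look like, assuming the log format shown earlier (one "<target> start: <timestamp>" and one "<target> end: <timestamp>" line per target):

# durations.rb - run as: ruby durations.rb /path/to/output.txt
timestamps = Hash.new { |h, k| h[k] = {} }

File.readlines(ARGV[0]).each do |line|
  # Lines look like: "Alamofire start: 1510929110.2161"
  next unless line =~ /\A(.+)\s+(start|end):\s+([\d.]+)\s*\z/
  timestamps[Regexp.last_match(1)][Regexp.last_match(2)] = Regexp.last_match(3).to_f
end

timestamps.each do |target, times|
  next unless times['start'] && times['end']
  puts format('%-30s %8.2f s', target, times['end'] - times['start'])
end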

Copy files based on date modified

I have files (with different extensions) that are added every 10 minutes to a windows share (A), and I want to copy them to a linux server (B) and do some operations on them with a script.
Using ruby and FileUtils, how can I create a script that copies only the most recently added files (or a watcher that copies the files to folder B whenever they are added to folder A)?
Update: this is what I have so far:
require 'fileutils'
require 'time'

class Copier
  def initialize(from, to)
    puts "copying files... puts #{Time.now} \n"
    my_files = Dir["#{from}/*.*"].sort_by { |a| File.stat(a).mtime }
    my_files.each do |filename|
      name = File.basename(filename)
      orig = "#{filename}"
      dest = "#{to}/#{name}"
      FileUtils.cp(orig, dest)
      puts "cp file : from #{orig} => to #{dest}"
    end
  end
end
Copier.new("/mnt/windows_share", "linux_folder")
But it copies all the files each time it is called...
This is what I ended up doing to get the files modified in the last 10 minutes and then copy them from the windows share folder to the linux folder:
require 'fileutils'
require 'time'

class Copier
  def initialize(from, to)
    puts "copying files... puts #{Time.now} \n"
    my_files = Dir["#{from}/*.*"].select { |fname| File.mtime(fname) > (Time.now - (60 * 10)) }
    my_files.each do |filename|
      name = File.basename(filename)
      orig = "#{filename}"
      dest = "#{to}/#{name}"
      FileUtils.cp(orig, dest)
      puts "cp file : from #{orig} => to #{dest}"
    end
  end
end
Copier.new("/mnt/windows_share", "linux_folder")
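If the script does not run exactly every 10 minutes, a fixed time window can miss or re-copy files; one alternative (a sketch; the marker file name and location are arbitrary) is to remember when the previous run happened:

require 'fileutils'

MARKER = '/tmp/copier_last_run'

def copy_new_files(from, to)
  last_run = File.exist?(MARKER) ? File.mtime(MARKER) : Time.at(0)
  FileUtils.touch(MARKER)
  Dir["#{from}/*.*"].select { |f| File.mtime(f) > last_run }.each do |filename|
    dest = File.join(to, File.basename(filename))
    FileUtils.cp(filename, dest)
    puts "cp file : from #{filename} => to #{dest}"
  end
end

copy_new_files('/mnt/windows_share', 'linux_folder')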

Download a file only if it exists with ruby

I'm writing a scraper to download all the issues of The Exile available at http://exile.ru/archive/list.php?IBLOCK_ID=35&PARAMS=ISSUE.
So far, my code is like this:
require 'rubygems'
require 'open-uri'

DATA_DIR = "exile"
Dir.mkdir(DATA_DIR) unless File.exists?(DATA_DIR)
BASE_exile_URL = "http://exile.ru/docs/pdf/issues/exile"

for number in 120..290
  numero = BASE_exile_URL + number.to_s + ".pdf"
  puts "Downloading issue #{number}"
  open(numero) { |f|
    File.open("#{DATA_DIR}/#{number}.pdf", 'w') do |file|
      file.puts f.read
    end
  }
end
puts "done"
The thing is, a lot of the issue links are down, and the code creates a PDF for every issue, leaving an empty PDF when the link is dead. How can I change the code so that it only creates and writes a file if the link exists?
require 'open-uri'

DATA_DIR = "exile"
Dir.mkdir(DATA_DIR) unless File.exists?(DATA_DIR)

url_template      = "http://exile.ru/docs/pdf/issues/exile%d.pdf"
filename_template = "#{DATA_DIR}/%d.pdf"

(120..290).each do |number|
  pdf_url = url_template % number
  print "Downloading issue #{number}"
  # Opening the URL downloads the remote file.
  open(pdf_url) do |pdf_in|
    if pdf_in.read(4) == '%PDF'
      pdf_in.rewind
      File.open(filename_template % number, 'w') do |pdf_out|
        pdf_out.write(pdf_in.read)
      end
      print " OK\n"
    else
      print " #{pdf_url} is not a PDF\n"
    end
  end
end
puts "done"
open(url) downloads the file and provides a handle to a local temp file. A PDF starts with '%PDF'. After reading the first 4 characters, if the file is a PDF, the file pointer has to be put back to the beginning to capture the whole file when writing a local copy.
You can use this code to check whether the file exists:
require 'net/http'

def exist_the_pdf?(url_pdf)
  url = URI.parse(url_pdf)
  Net::HTTP.start(url.host, url.port) do |http|
    # Net::HTTP.start returns the block's value, so the method returns true/false.
    http.request_head(url.path)['content-type'] == 'application/pdf'
  end
end
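For example, you can combine that check with the download loop from the question (a sketch; it assumes the exile directory already exists, as in the question):

require 'open-uri'

(120..290).each do |number|
  url = "http://exile.ru/docs/pdf/issues/exile#{number}.pdf"
  next unless exist_the_pdf?(url)
  puts "Downloading issue #{number}"
  File.open("exile/#{number}.pdf", 'wb') { |f| f.write(URI.open(url).read) }
end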
Try this:
require 'rubygems'
require 'open-uri'

DATA_DIR = "exile"
Dir.mkdir(DATA_DIR) unless File.exists?(DATA_DIR)
BASE_exile_URL = "http://exile.ru/docs/pdf/issues/exile"

for number in 120..290
  numero = BASE_exile_URL + number.to_s + ".pdf"
  open(numero) { |f|
    content = f.read
    if content.include? "Link is missing"
      puts "Issue #{number} doesn't exist"
    else
      puts "Issue #{number} exists"
      File.open("./#{number}.pdf", 'w') do |file|
        file.write(content)
      end
    end
  }
end
puts "done"
The main thing I added is a check for the string "Link is missing" in the response. I wanted to use HTTP status codes instead, but the site always returns a 200, which is not best practice.
Note that with my code you always download the whole page just to look for that string; I don't have a better way around that at the moment.
