Is there a better way to check file size using SCP before downloading? - ruby

I have this code to download a file from a remote machine, but I want to limit it to files that are less than 5 MB.
So far the code works, but is there a better way to check the file size before downloading?
Net::SCP.start(hname, uname, :password => pw) do |scp|
  fsize = scp.download!("#{remdir}/#{filname}").size
  puts fsize

  scp.download!("#{remdir}/#{filname}", "#{filname}") do |ch, name, sent, total|
    File.open(lfile, 'a') { |f| f.puts "#{name}: #{(sent.to_f * 100 / total.to_f).to_i}% complete" }
    # puts "Sending : #{(sent.to_f * 100 / total.to_f).to_i}% complete"
    # puts "#{name}: #{sent}/#{total}"
    # print "\r#{name}: #{(sent.to_f * 100 / total.to_f).to_i}%"
  end
end
Does this cause any problem if I use it for large files?
fsize = scp.download!("#{remdir}/#{filname}").size; puts fsize
This page says the file will be returned as a string:
http://ruby.about.com/od/ssh/ss/netscp_6.htm
Update:
I tried SFTP as well. First, it did not work with the full path to the file, and second, it did not do what I wanted, so I was using scp.download!().size. I know I am doing the download twice :(
require 'net/sftp'

# did not take full path "/user/myname/filename"
remote_path = "somefile.txt"

Net::SFTP.start(hname, uname, :password => pw) do |sftp|
  attrs = sftp.stat!("rome_desc.txt"); puts attrs # gave # ☼ ↨?% (? '→ ??Q{ ?Qt;?
  sftp.stat!(remote_path) do |response|
    puts response # returned no such file (2)
    # but did not do the upload operation below.
    unless response.ok?
      sftp.upload!("#{filname}", remdir)
    end
  end
end
Update 2: Solution
Found the solution using the comments provided by the users below and after searching the net.
Net::SFTP.start(hname, uname, :password => pw) do |sftp| # , :verbose => Logger::DEBUG
  sftp.dir.glob("./#{remdir}", "#{filname}") do |entry|
    p entry.name
    # keep the size numeric (in MB) so it compares correctly against file_size_param
    file_size = entry.attributes.size.to_f / 2**20
    File.open(lfile, 'a') { |f| f.puts "File size is #{'%.2f' % file_size} MB" }
    if file_size < file_size_param
      sftp.download!("#{remdir}/#{filname}", filname)
    else
      File.open(lfile, 'a') { |f| f.puts "File size is greater than #{file_size_param} MB, so cannot download file" }
    end
  end
end
I used .attributes.size to obtain the file size, and performed the download only after checking that size:
sftp.dir.glob("./#{remdir}", "#{filname}") do |entry|
  p entry.name
  file_size = entry.attributes.size

Does this cause any problem if I use it for large files?
We can't say, because we don't know how fast your internet connection is, how much RAM you have, or how fast the pipe is from the host you're downloading the file from.
Basically though, you are reading the file twice, once into memory to see how big it is, then again if it meets your requirement, which seems really... silly.
You're doubling the traffic to the host you're reading from and on your network connection, and, if the file is larger than RAM on your local machine, it is going to go nuts.
As Darshan says, look at using Net::SFTP. It will give you the ability to query the file's size before you try to load it, without pulling the entire thing down. It's a bit more complicated to use, but that complexity translates into flexibility.
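For example, here is a minimal sketch of that idea, reusing hname, uname, pw, remdir, and filname from the question; sftp.stat! returns the remote file's attributes (including its size in bytes) without transferring the contents:

require 'net/sftp'

Net::SFTP.start(hname, uname, :password => pw) do |sftp|
  # stat! raises Net::SFTP::StatusException if the remote file doesn't exist
  attrs = sftp.stat!("#{remdir}/#{filname}")
  if attrs.size < 5 * 1024 * 1024
    sftp.download!("#{remdir}/#{filname}", filname)
  else
    puts "#{filname} is #{attrs.size} bytes; skipping download"
  end
end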
"/user/myname/filename"
(S)FTP might not necessarily have its base path where it can see that file. To probe the system, ask it, via the SFTP connection, what its current directory is when you first log in, then ask it for the files it can see, using something like this example from the Net::SFTP docs:
sftp.dir.glob("/base/path", "*/**/*.rb") do |entry|
p entry.name
end
That will recursively look through the "/base/path" hierarchy, searching for all "*.rb" files.

Your current code downloads the file, checks the size of the downloaded file (presumably to check if it is less than 5MB, but you don't actually do that), and then downloads it again. Even if you did something with fsize, it's too late to have not downloaded it.
I'd look into the sftp gem rather than scp; it should be pretty straightforward to do what you want with sftp, but not with scp.

Related

How to unit test a "disk full" scenario with Ruby RSpec?

I need to unit test scenarios like the following:
The disk has 1MB free space. I try to copy 2MB of file(s) to the disk.
What's the best way to do this with Ruby RSpec?
For further information, I need to unit test the following file cache method, since it appears to have some issue:
def set first_key, second_key='', files=[]
  # If cache exists already, overwrite it.
  content_dir = get first_key, second_key
  second_key_file = nil
  begin
    if content_dir.nil?
      # Check the size of cache, and evict entries if too large
      check_cache_size if rand(100) < check_size_percent
      # Make sure cache dir doesn't exist already
      first_cache_dir = File.join(dir, first_key)
      if File.exist?(first_cache_dir)
        raise "BuildCache directory #{first_cache_dir} should be a directory" unless File.directory?(first_cache_dir)
      else
        FileUtils.mkpath(first_cache_dir)
      end
      num_second_dirs = Dir[first_cache_dir + '/*'].length
      cache_dir = File.join(first_cache_dir, num_second_dirs.to_s)
      # If cache directory already exists, then a directory must have been evicted here, so we pick another name
      while File.directory?(cache_dir)
        cache_dir = File.join(first_cache_dir, rand(num_second_dirs).to_s)
      end
      content_dir = File.join(cache_dir, '/content')
      FileUtils.mkpath(content_dir)
      # Create 'last_used' file
      last_used_filename = File.join(cache_dir, 'last_used')
      FileUtils.touch last_used_filename
      FileUtils.chmod(permissions, last_used_filename)
      # Copy second key
      second_key_file = File.open(cache_dir + '/second_key', 'w+')
      second_key_file.flock(File::LOCK_EX)
      second_key_file.write(second_key)
    else
      log "overwriting cache #{content_dir}"
      FileUtils.touch content_dir + '/../last_used'
      second_key_file = File.open(content_dir + '/../second_key', 'r')
      second_key_file.flock(File::LOCK_EX)
      # Clear any existing files out of cache directory
      FileUtils.rm_rf(content_dir + '/.')
    end
    # Copy files into content_dir
    files.each do |filename|
      FileUtils.cp(filename, content_dir)
    end
    FileUtils.chmod(permissions, Dir[content_dir + '/*'])
    # Release the lock
    second_key_file.close
    return content_dir
  rescue => e
    # Something went wrong, like a full disk or some other error.
    # Delete any work so we don't leave cache in corrupted state
    unless content_dir.nil?
      # Delete parent of content directory
      FileUtils.rm_rf(File.expand_path('..', content_dir))
    end
    log "ERROR: Could not set cache entry. #{e.to_s}"
    return 'ERROR: !NOT CACHED!'
  end
end
One solution is to stub out methods that write to disk to raise an error. For example, for the specs that test disk space errors, you could try:
before do
  # File.open is a class method, so stub it on File itself;
  # allow_any_instance_of would only intercept instance methods
  allow(File).to receive(:open) { raise Errno::ENOSPC }
  # or maybe # allow(File).to receive(:write) { raise Errno::ENOSPC }
  # or # allow(FileUtils).to receive(:cp) { raise Errno::ENOSPC }
  # or some combination of these 3...
end
it 'handles an out of disk space error' do
  expect { my_disk_cache.set('key1', 'key2', [...]) }.to # your logic for how BuildCache::DiskCache should handle the error here.
end
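A concrete variant, assuming (as in the set method above) that a failed copy is rescued and the 'ERROR: !NOT CACHED!' sentinel is returned; my_disk_cache and the file name are placeholders:

it 'returns the error sentinel when the disk is full' do
  # simulate the disk filling up during the file copy step
  allow(FileUtils).to receive(:cp).and_raise(Errno::ENOSPC)
  expect(my_disk_cache.set('key1', 'key2', ['some_file.txt'])).to eq('ERROR: !NOT CACHED!')
end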
There are two problems with this however:
1) Errno::ENOSPC may not be the error you actually see getting raised. That error fits the description in your question, but depending on the peculiarities of your lib and the systems it runs on, you might not really be getting an Errno::ENOSPC error. Maybe you run out of RAM first and are getting Errno::ENOMEM, or maybe you have too many file descriptors open and are getting Errno::EMFILE. Of course if you want to be rigorous you could handle all of these, but this is time consuming and you'll get diminishing returns for handling the more obscure errors.
See this for more information on Errno errors.
2) This solution involves stubbing a specific method on a specific class (File.open). This isn't ideal because it couples the setup for your test to the implementation in your code. That is to say, if you refactor BuildCache::DiskCache#set to not use File.open, then this test might start failing even though the method might be correct.
That said, File.open is fairly low level. I know that some FileUtils methods use File.open (notably, FileUtils.cp), so I would suggest just using that first allow(File) line. I'd expect that to handle most of your use cases.
Alternatively, there is a tool called fakefs that may be able to help you with this. I am not familiar with it, but it may well have functionality that helps with testing such errors. You may want to look into it.
You could make use of any of the method calls you know are happening inside of the method you need to test, and stub them so they raise an error. E.g. FileUtils.touch is called a number of times, so we could do:
it 'handles file write error gracefully' do
  allow(FileUtils).to receive(:touch).and_raise('oh no')
  # your expectations
  # your test trigger
end

optimising reading id3 tags of mp3 files

I'm trying to read MP3 files with the 'mp3info' gem: I go through each file ending in .mp3 in a directory, descend into sub-directories using Dir.chdir(), repeat the process, and store the tags in a database. But I have 30 GB of music, and the whole scan takes around 6-10 minutes to complete. Is there any way I can optimise this scan?
def self.gen_list(dir)
  prev_pwd = Dir.pwd
  begin
    Dir.chdir(dir)
  rescue Errno::EACCES
  end
  counter = 0
  Dir[Dir.pwd + '/*'].each do |x|
    # puts Dir.pwd
    if File.directory?(x)
      self.gen_list(x) do |y|
        yield y
      end
    elsif File.basename(x).match('.mp3')
      begin
        Mp3Info.open(x) do |y|
          yield [x, y.tag.title, y.tag.album, y.tag.artist]
        end
      rescue Mp3InfoError
      end
    end
  end
  Dir.chdir(prev_pwd)
end
This is the method which generates the list and sends the tags to the &block, where the data is stored in the database.
Have you tried setting the parse_mp3 flag to false? By default it is on, which means you are going to pull in the entire file for each scan, when all you care about is the info. I don't know how much time this will save you. See the GitHub source for more info.
https://github.com/moumar/ruby-mp3info/blob/master/lib/mp3info.rb#L214
# Specify :parse_mp3 => false to disable processing of the mp3
def initialize(filename_or_io, options = {})
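If that flag behaves as the comment describes, the OP's inner loop would become something like this (a sketch; the options hash is the only change):

Mp3Info.open(x, :parse_mp3 => false) do |y|
  # tags are still parsed; only processing of the audio stream is skipped
  yield [x, y.tag.title, y.tag.album, y.tag.artist]
end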
You can:
- Run several processes (one per directory in the base dir, for example)
- Use threads with Rubinius or JRuby (see the sketch below)
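A rough sketch of the threading idea; store_in_db is a hypothetical helper that writes one tag tuple to the database, and '/music' stands in for the base directory:

threads = Dir['/music/*'].select { |d| File.directory?(d) }.map do |d|
  Thread.new(d) do |dir|
    # each thread walks one top-level directory with the OP's gen_list
    gen_list(dir) { |tags| store_in_db(tags) }
  end
end
threads.each(&:join)

On MRI the GIL means this mostly helps while threads are blocked on I/O; JRuby and Rubinius can run them in parallel.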
You can try the taglib-ruby gem, which, unlike mp3info, is a wrapper over a C library and could give you a little more performance. Otherwise you have to stick to JRuby and run multiple threads (4 if you have 4 cores).
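Reading the same three tags with taglib-ruby looks roughly like this, following the gem's README (TagLib::FileRef is its generic entry point):

require 'taglib'

TagLib::FileRef.open(filepath) do |fileref|
  unless fileref.null?
    tag = fileref.tag
    # the same tuple the OP yields: path, title, album, artist
    [filepath, tag.title, tag.album, tag.artist]
  end
end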
You may also benefit from a more direct way of retrieving the mp3 files.
Dir['**/*.mp3'].each do |filepath|
  begin
    Mp3Info.open(filepath) do |mp3|
      ...
    end
  rescue Mp3InfoError
    ...
  end
end
This will find all .mp3 files at any depth from the current directory and yield the relative path to the block. It is approximately equivalent to find . -name '*.mp3' -print

Failure reading PNG files with Ruby on Windows

I am creating a Zip file containing both text files and image files. The code works as expected when running on MacOS, but it fails when running on Windows because the image file contents are not read correctly.
The snippet below always reads PNG image files as '‰PNG', adding a 5-byte file to the Zip for each PNG image.
Is it an issue with the Windows environment?
zip_fs.file.open(destination, 'w') do |f|
  f.write File.read(file_name)
end
from Why are binary files corrupted when zipping them?
io.get_output_stream(zip_file_path) do |out|
  out.write File.binread(disk_file_path)
end
You need to tell Ruby to read/write the files in binary mode. Here are some variations on a theme:

zip_fs.file.open(destination, 'wb') do |f|
  File.open(file_name, 'rb') do |fi|
    f.write fi.read
  end
end

zip_fs.file.open(destination, 'wb') do |f|
  f.write File.read(file_name, mode: 'rb')
end

zip_fs.file.open(destination, 'wb') do |f|
  f.write File.binread(file_name)
end
A potential problem with the code is that the input file is being slurped, which, if it's larger than available memory, would be a bad thing. It'd be better to read the input file in blocks. This is untested but should work:

BLOCK_SIZE = 1024 * 1024

zip_fs.file.open(destination, 'wb') do |f|
  File.open(file_name, 'rb') do |fi|
    while (block_in = fi.read(BLOCK_SIZE))
      f.write block_in
    end
  end
end
The file that was opened will never be closed. Use File.binread(file_name)
My initial code was written to show that binary mode needed to be used, and used open because it's "more traditional", but I forgot to use the block form. I modified my sample code to fix that problem.
That said, the file would be closed implicitly by Ruby as the interpreter shuts down when the script ends, as part of its normal housekeeping; it's still better to explicitly close the file. If the OP is using RubyZip like I think, that will automatically happen if a block is passed to open. Otherwise, read and binread will both read to EOF and close the file. Code using those methods needs to be sensitive to the need to read in blocks if the input file is of unknown size or larger than the available buffer space.
I had a similar problem when I was reading Lib files. Here's my solution:

File.open(path + '\Wall.Lib')

where path corresponded to a JavaScript file that inputs filenames.

Ruby NET::SCP containing Wildcard

I need to download a file daily from a client that I have SCP but not SSH access to.
The file name will always be /outgoing/Extract/visits_[date]-[timestamp].dat.gz
For example, yesterday's file was called visits_20130604-090003.dat.gz
I cannot rely on the timestamp always being the same, but the date should always be yesterday's date:
My setup so far:
My home directory contains two sub-directories named downloads_fullname and downloads_wildcard.
It also contains a simple Ruby script named foo.rb.
The contents of foo.rb are this:
#! /usr/bin/ruby
require 'net/ssh'
require 'net/scp'

yesterday = (Time.now - 86400).strftime('%Y%m%d')

Net::SCP.start('hostname', 'username') do |scp|
  scp.download!('/outgoing/Extract/visits_' + yesterday + '-090003.dat.gz', 'downloads_fullname')
  scp.download!('/outgoing/Extract/visits_' + yesterday + '-*.dat.gz', 'downloads_wildcard')
end
When run, the downloads_fullname directory contains the file, but the downloads_wildcard directory does not.
Is there any way to use wildcarding in Net::SCP? Or does anybody have any sly workarounds? I tried \* to no avail.
Thank you, Tin Man!
To anybody else, here is the code I ended up with following Tin Man's lead (I tried to post it as a comment but had formatting issues):
#! /usr/bin/ruby
require 'net/sftp'

yesterday = (Time.now - 86400).strftime('%Y%m%d')

Net::SFTP.start('hostname', 'username') do |sftp|
  sftp.dir.foreach("/outgoing/Extract") do |file|
    if file.name.include? '_' + yesterday + '-'
      sftp.download!('/outgoing/Extract/' + file.name, 'downloads/' + file.name)
    end
  end
end
I don't think you can get there using scp because it expects you to know exactly which file you want, but sftp will let you get a directory listing.
You can use Net::SFTP to programmatically pick your file and request it. This is the example code:
require 'net/sftp'

Net::SFTP.start('host', 'username', :password => 'password') do |sftp|
  # upload a file or directory to the remote host
  sftp.upload!("/path/to/local", "/path/to/remote")

  # download a file or directory from the remote host
  sftp.download!("/path/to/remote", "/path/to/local")

  # grab data off the remote host directly to a buffer
  data = sftp.download!("/path/to/remote")

  # open and write to a pseudo-IO for a remote file
  sftp.file.open("/path/to/remote", "w") do |f|
    f.puts "Hello, world!\n"
  end

  # open and read from a pseudo-IO for a remote file
  sftp.file.open("/path/to/remote", "r") do |f|
    puts f.gets
  end

  # create a directory
  sftp.mkdir! "/path/to/directory"

  # list the entries in a directory
  sftp.dir.foreach("/path/to/directory") do |entry|
    puts entry.longname
  end
end
Based on that, you can list the directory entries, then use find or select to iterate over the returned list and pick out the one with the desired date. Pass that filename to sftp.download! to download it to a local file.
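A sketch of that approach, reusing the yesterday variable from the question (the \d{6} timestamp pattern is an assumption based on the example file name):

require 'net/sftp'

Net::SFTP.start('hostname', 'username') do |sftp|
  # entries returns the directory listing; find picks out yesterday's file
  entry = sftp.dir.entries('/outgoing/Extract').find { |e|
    e.name =~ /\Avisits_#{yesterday}-\d{6}\.dat\.gz\z/
  }
  sftp.download!("/outgoing/Extract/#{entry.name}", entry.name) if entry
end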

How to FTP in Ruby without first saving the text file

Since Heroku does not allow saving dynamic files to disk, I've run into a dilemma that I am hoping you can help me overcome. I have a text file that I can create in RAM. The problem is that I cannot find a gem or function that would allow me to stream the file to another FTP server. The Net/FTP gem I am using requires that I save the file to disk first. Any suggestions?
ftp = Net::FTP.new(domain)
ftp.passive = true
ftp.login(username, password)
ftp.chdir(path_on_server)
ftp.puttextfile(path_to_web_file)
ftp.close
The ftp.puttextfile function is what is requiring a physical file to exist.
StringIO.new provides an object that acts like an opened file. It's easy to create a method like puttextfile by using a StringIO object instead of a file.
require 'net/ftp'
require 'stringio'
class Net::FTP
  def puttextcontent(content, remotefile, &block)
    f = StringIO.new(content)
    begin
      storlines("STOR " + remotefile, f, &block)
    ensure
      f.close
    end
  end
end
file_content = <<filecontent
<html>
<head><title>Hello!</title></head>
<body>Hello.</body>
</html>
filecontent
ftp = Net::FTP.new(domain)
ftp.passive = true
ftp.login(username, password)
ftp.chdir(path_on_server)
ftp.puttextcontent(file_content, path_to_web_file)
ftp.close
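If you'd rather not monkey-patch Net::FTP, the same trick works inline, since storlines accepts any IO-like object (a sketch using the same variables as above):

require 'net/ftp'
require 'stringio'

ftp = Net::FTP.new(domain)
ftp.passive = true
ftp.login(username, password)
ftp.chdir(path_on_server)
# storlines reads the StringIO line by line, just as it would a file
ftp.storlines("STOR " + path_to_web_file, StringIO.new(file_content))
ftp.close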
David at Heroku gave a prompt response to a support ticket I entered there.
You can use APP_ROOT/tmp for temporary file output. The existence of files created in this dir is not guaranteed outside the life of a single request, but it should work for your purposes.
Hope this helps,
David
