IOError: closed stream in Ruby SFTP - ruby

The following code tries to list the entries of a remote directory over SFTP using Net::SFTP, but it raises a "closed stream" IOError if the directory contains a large number of files (about 6000):
require 'net/ssh'
require 'net/sftp'
Net::SFTP.start('hostname', 'username', :password => 'password') do |sftp|
# list the entries in a directory
sftp.dir.foreach("/") do |entry|
puts entry.longname
end
end
What is the best way to avoid it? Versions are net-sftp Gem: 2.0.5 and net-ssh Gem: 2.2.1, Ruby: 1.8.7. The full error message reads:
IOError: closed stream
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/ruby_compat.rb:33:in `select'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/ruby_compat.rb:33:in `io_select'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/ruby_compat.rb:32:in `synchronize'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/ruby_compat.rb:32:in `io_select'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/transport/packet_stream.rb:73:in `available_for_read?'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/transport/packet_stream.rb:85:in `next_packet'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/transport/session.rb:170:in `poll_message'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/transport/session.rb:165:in `loop'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/transport/session.rb:165:in `poll_message'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:451:in `dispatch_incoming_packets'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:213:in `preprocess'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:197:in `process'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:161:in `loop'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:161:in `loop_forever'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:161:in `loop'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:110:in `close'
from ~/.rvm/gems/ruby-1.8.7-p330/gems/net-sftp-2.0.5/lib/net/sftp.rb:36:in `start'

The behavior could be deliberate. If we look at the dir source code in net-sftp/lib/net/sftp/operations/dir.rb, we see a close operation:
def foreach(path)
..
ensure
sftp.close!(handle) if handle
end
It is possible that this close operation causes the closed stream error. If it does not indicate a bug, you can catch the IOError exception. It also seems to help to run the SSH event loop occasionally:
begin
..
sftp.dir.foreach("/") do |entry|
puts entry.longname
# ...
sftp.loop # Runs the SSH event loop
end
rescue IOError => e
puts "*** We are done: " + e.message
end
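For reference, the listing loop can also be written out with the blocking low-level session calls, which makes it visible where the directory handle gets closed. This is only a sketch: it assumes that opendir!, readdir! and close! behave as described in the Net::SFTP session documentation (with readdir! returning nil once the listing is exhausted), each_remote_entry is a hypothetical helper name, and it has not been checked against the failing 6000-file directory.

```ruby
# Hypothetical helper: yields every entry of a remote directory.
# `sftp` is expected to be a Net::SFTP session, e.g. the block argument
# of Net::SFTP.start (assumption: opendir!/readdir!/close! are available).
def each_remote_entry(sftp, path)
  handle = sftp.opendir!(path)
  # readdir! is assumed to return one batch of entries per call,
  # and nil when there is nothing left to read.
  while (batch = sftp.readdir!(handle))
    batch.each { |entry| yield entry }
  end
  nil
ensure
  # Close the remote handle even if the caller's block raises.
  sftp.close!(handle) if handle
end

# Usage (needs a reachable server, so it is left commented out):
#   Net::SFTP.start('hostname', 'username', :password => 'password') do |sftp|
#     each_remote_entry(sftp, "/") { |entry| puts entry.longname }
#   end
```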

Related

Write to file from stream block

I'm working on a web service that sometimes needs to return large files, and I want it to send something to the client quickly so the client doesn't time out waiting for the start of the data. Sinatra's stream seemed perfect for this, but I ran into a problem.
Dumb example:
get '/path' do
status 200
headers 'Content-Type' => 'text/plain'
stream do |out|
sleep 1
out << "Hello,\n"
sleep 1
out << "World!\n"
end
end
This works fine:
$ curl http://localhost:4567/path
Hello,
World!
But I have a side log that the service writes to, and trying to mix File I/O with the streaming API doesn't work at all:
get '/path' do
status 200
headers 'Content-Type' => 'text/plain'
File.open '/tmp/side-log', 'a' do |lf|
stream do |out|
lf.puts "Woo!"
sleep 1
out << "Hello,\n"
sleep 1
out << "World!\n"
end
end
end
Now I get this:
$ curl http://localhost:4567/path
curl: (18) transfer closed with outstanding read data remaining
Puma doesn't indicate any problems on the server side, but Thin exits entirely:
hello2.rb:13:in `write': closed stream (IOError)
from hello2.rb:13:in `puts'
from hello2.rb:13:in `block (3 levels) in <main>'
from vendor/bundle/gems/sinatra-1.4.6/lib/sinatra/base.rb:437:in `block (2 levels) in stream'
from vendor/bundle/gems/sinatra-1.4.6/lib/sinatra/base.rb:628:in `with_params'
from vendor/bundle/gems/sinatra-1.4.6/lib/sinatra/base.rb:437:in `block in stream'
from vendor/bundle/gems/sinatra-1.4.6/lib/sinatra/base.rb:403:in `call'
from vendor/bundle/gems/sinatra-1.4.6/lib/sinatra/base.rb:403:in `block in each'
from vendor/bundle/gems/eventmachine-1.0.8/lib/eventmachine.rb:1062:in `call'
from vendor/bundle/gems/eventmachine-1.0.8/lib/eventmachine.rb:1062:in `block in spawn_threadpool'
[1]+ Exit 1 ruby hello2.rb
So what should I do if I want to write something out to someplace other than the output stream from inside the stream block?
Not sure if this is the best solution, but using the asynchronous em-files gem worked for me, even in Puma (which I understand is not EventMachine-based):
require 'em-files'
get '/path' do
status 200
headers 'Content-Type' => 'text/plain'
EM::File.open '/tmp/side-log', 'a' do |lf|
stream do |out|
lf.write "Woo!\n"
sleep 1
out << "Hello,\n"
sleep 1
out << "World!\n"
end
end
end
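A likely reason for the original failure, judging from the Thin backtrace (the write happens inside `block in each`, after the route has returned): the stream block runs later, when the server iterates the response body, and by then File.open has already closed the file. The sketch below reproduces that ordering without Sinatra. DeferredBody is a hypothetical stand-in for a streaming body, not Sinatra's actual class, and LOG_PATH is an illustrative temp path.

```ruby
require 'tmpdir'

# Hypothetical stand-in for a streaming response body: it stores the block
# and only runs it when #each is called later.
class DeferredBody
  def initialize(&block)
    @block = block
  end

  def each
    out = []
    @block.call(out)
    out
  end
end

LOG_PATH = File.join(Dir.tmpdir, "side-log-demo-#{Process.pid}")
File.delete(LOG_PATH) if File.exist?(LOG_PATH)

# Broken shape: File.open closes the file when its block returns,
# which happens before the deferred block has run.
broken = File.open(LOG_PATH, 'a') do |lf|
  DeferredBody.new { |out| lf.puts 'Woo!'; out << "Hello,\n" }
end

BROKEN_ERROR = begin
  broken.each
  nil
rescue IOError => e
  e.message
end

# Working shape: open the file inside the deferred block instead.
working = DeferredBody.new do |out|
  File.open(LOG_PATH, 'a') { |lf| lf.puts 'Woo!' }
  out << "Hello,\n"
end
WORKING_OUTPUT = working.each

puts "broken:  #{BROKEN_ERROR.inspect}"
puts "working: #{WORKING_OUTPUT.inspect}"
```

If the same ordering applies in the route above, moving File.open inside the stream block would be the equivalent change there.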

Reading files in a zip archive, without unzipping the archive

I have a directory with 100+ zip files and I need to read the files inside the zip files to do some data processing, without unzipping the archive.
Is there a Ruby library to read the contents of files in zip archives, without unzipping the file?
Using rubyzip gives an error:
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
# Handle entries one by one
zip_file.each do |entry|
# Extract to file/directory/symlink
puts "Extracting #{entry.name}"
entry.extract('here')
# Read into memory
content = entry.get_input_stream.read
end
end
Gives this error:
test.rb:12:in `block (2 levels) in <main>': undefined method `read' for Zip::NullInputStream:Module (NoMethodError)
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:42:in `call'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:42:in `block in each'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:41:in `each'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:41:in `each'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/central_directory.rb:182:in `each'
from test.rb:6:in `block in <main>'
from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/file.rb:99:in `open'
from test.rb:4:in `<main>'
Zip::NullInputStream is returned if the entry is a directory rather than a file. Could that be the case?
Here's a more robust variation of the code:
#!/usr/bin/env ruby
require 'rubygems'
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
# Handle entries one by one
zip_file.each do |entry|
if entry.directory?
puts "#{entry.name} is a folder!"
elsif entry.symlink?
puts "#{entry.name} is a symlink!"
elsif entry.file?
puts "#{entry.name} is a regular file!"
# Read into memory (declare content first so it is visible outside the block)
content = nil
entry.get_input_stream { |io| content = io.read }
# Output
puts content
else
puts "#{entry.name} is something unknown, oops!"
end
end
end
I came across the same issue, and checking entry.file? before calling entry.get_input_stream.read resolved it:
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
# Handle entries one by one
zip_file.each do |entry|
# Extract to file/directory/symlink
puts "Extracting #{entry.name}"
entry.extract('here')
# Read into memory
if entry.file?
content = entry.get_input_stream.read
end
end
end
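Since the original goal was to read the contents without extracting anything to disk, the extract call can be dropped entirely. A minimal sketch, assuming the entries respond to file?, name and get_input_stream as they do in the snippets above; read_file_entries is a hypothetical helper name:

```ruby
# Hypothetical helper: returns a Hash of entry name => contents for
# regular-file entries only, skipping directories and symlinks.
# `entries` is anything enumerable whose items respond to file?, name and
# get_input_stream, e.g. the zip_file yielded by Zip::File.open.
def read_file_entries(entries)
  contents = {}
  entries.each do |entry|
    next unless entry.file?
    contents[entry.name] = entry.get_input_stream.read
  end
  contents
end

# Usage with rubyzip (left commented out because it needs the gem and an archive):
#   require 'zip'
#   Zip::File.open('my_zip.zip') do |zip_file|
#     read_file_entries(zip_file).each { |name, data| puts "#{name}: #{data.bytesize} bytes" }
#   end
```

This holds every file's contents in memory at once, so for large archives it may be better to process each entry inside the loop instead of collecting them.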

efficiently processing large quantity of files in Ruby

I am writing a script and I need to traverse a file system, and return the SHA1 sum of the files.
The code I am using is this:
time ruby -r'digest/sha1' -r'find' -e 'Find.find("/") {|x| next unless File.file?(x) ; Digest::SHA1.hexdigest(File.read(x))}'
The problem is that I get this error message about 5 seconds into execution:
-e:1:in `read': failed to allocate memory (NoMemoryError)
from -e:1:in `open'
from -e:1:in `block in <main>'
from /usr/share/ruby/find.rb:43:in `block in find'
from /usr/share/ruby/find.rb:42:in `catch'
from /usr/share/ruby/find.rb:42:in `find'
from -e:1:in `<main>'
Why am I getting this error, and what is the "best practice" for handling a task like this?
Help appreciated.
It doesn't seem to be well documented (or at least, I'm not looking in the right place), but the Digest library provides a way of hashing files that reads them in chunks while computing the digest, whereas File.read reads the whole file into memory.
The working code would be:
require 'digest/sha1'
require 'find'
begin
Find.find("/") do |file|
next unless File.file?(file)
puts "#{Digest::SHA1.file(file)} #{file}"
end
rescue => e
puts e
end
Why make it difficult by putting this in a one-liner?
If you put your code in a script like this, everything runs smoothly on my system and every file on my HD is read.
On a data disk you had better find a way to handle large files, like the solution at https://www.ruby-forum.com/topic/58563, which I adapted for SHA1.
require 'digest/sha1'
require 'find'
Find.find("/") do |file|
next unless File.file?(file)
begin
sha = File.open(file, 'rb') do |io|
dig = Digest::SHA1.new
buf = ""
dig.update(buf) while io.read(4096, buf)
dig
end
puts "#{sha} #{file}"
rescue => e
puts e.backtrace
end
end
gives
ba4aeced8ab461b75ff87d989ff16ca2464ea787 /$AVG/$VAULT/vault.db
31d8730390451d236b80c4351b6b287d6853570c /$AVG/$VAULT/vvfolder.idx
b4c783e3478e5b6f795e92d3cf5d85837fffd128 /$Recycle.Bin/S-1-5-21-50811273-296787125-2640436092-1000/desktop.ini
b4c783e3478e5b6f795e92d3cf5d85837fffd128 /$Recycle.Bin/S-1-5-21-50811273-296787125-2640436092-1011/desktop.ini
3109805dcc447395f58fec8b5e8a8fca1d20892b /.rnd
61fc34796b7cc67caf9da685e59461c9d13fba29 /4nt500/4NT.INI
...
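To check that the chunked approaches agree with hashing the whole file at once, here is a small self-contained comparison on a temporary file. chunked_sha1 is a hypothetical helper name, and the 4096-byte chunk size and 10,000-byte sample are arbitrary illustrative choices.

```ruby
require 'digest/sha1'
require 'tempfile'

# Hashes a file in fixed-size chunks, so only one chunk is held in memory at a time.
def chunked_sha1(path, chunk_size = 4096)
  digest = Digest::SHA1.new
  File.open(path, 'rb') do |io|
    # IO#read with a length returns nil at end of file, which ends the loop.
    while (chunk = io.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Small sample file; real use would point at files found by Find.find.
SAMPLE = Tempfile.new('sha1-demo')
SAMPLE.binmode
SAMPLE.write('x' * 10_000)
SAMPLE.flush

WHOLE_FILE = Digest::SHA1.hexdigest(File.binread(SAMPLE.path)) # reads everything into memory
VIA_FILE   = Digest::SHA1.file(SAMPLE.path).hexdigest          # library helper
VIA_CHUNKS = chunked_sha1(SAMPLE.path)                         # manual chunked loop

puts VIA_CHUNKS == WHOLE_FILE && VIA_FILE == WHOLE_FILE
```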

Rake task not running

Every time I call my rake task it says:
[2012-07-12 15:50:01] ERROR IOError: An existing connection was forcibly closed by the remote host
C:/jruby-1.3.1/lib/ruby/1.8/webrick/httpresponse.rb:324:in `_write_data'
C:/jruby-1.3.1/lib/ruby/1.8/webrick/httpresponse.rb:180:in `send_header'
C:/jruby-1.3.1/lib/ruby/1.8/webrick/httpresponse.rb:103:in `send_response'
C:/jruby-1.3.1/lib/ruby/1.8/webrick/httpserver.rb:79:in `run'
C:/jruby-1.3.1/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'
C:/jruby-1.3.1/lib/ruby/1.8/webrick/server.rb:162:in `start'
[2012-07-12 15:50:29] ERROR IOError: An existing connection was forcibly closed by the remote host
C:/jruby-1.3.1/lib/ruby/1.8/webrick/httpserver.rb:55:in `run'
C:/jruby-1.3.1/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'
C:/jruby-1.3.1/lib/ruby/1.8/webrick/server.rb:162:in `start'
I have tried different options, thinking the problem could be the call to the rake task, but apparently it isn't. I have also tried both Mongrel and WEBrick in case the server was the cause.
I'm using JRuby 1.3.1.
The task is not being executed.
This is part of my code:
application_controller.rb
def call_rake(task, options = {})
options[:rails_env] ||= Rails.env
args = options.map { |n, v| "#{n.to_s.upcase}='#{v}'" }
system "C:/jruby-1.3.1/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake #{task} #{args.join(' ')} start"
end
forbidden_file.rake
desc "Process CSV file"
task :process_file => :environment do
forbidden_file = ForbiddenFile.find(ENV["csv"])
forbidden_file.upload_file
end
Controller
...
call_rake :process_file, :csv => params[:files]
redirect_to forbidden_files_url
...
It's working now, I just removed the word start from the command.
system "rake #{task} #{args.join(' ')}"

Net::SSH with non unix/linux host?

I am trying to use the Net::SSH library to login and manage a host that supports ssh. It is a piece of telecom equipment and so speaks TL1. I seem to be able to log in successfully, but when I try to ssh.exec something, it aborts saying it could not execute command. Here is my simple code:
require 'net/ssh'
Net::SSH.start('10.204.121.192', 'password', :password => "password") do |ssh|
ssh.exec("INH-MSG-ALL;")
end
If I point the same code at a Linux server and provide a command such as "ls -l /", it works fine. What I am wondering is, can I use this ssh library? Do I need to use another command instead of exec?
This is the error output:
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:322:in `block (2 levels) in exec': could not execute command: "INH-MSG-ALL;" (RuntimeError)
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh/connection/channel.rb:597:in `call'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh/connection/channel.rb:597:in `do_failure'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:586:in `channel_failure'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:456:in `dispatch_incoming_packets'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:213:in `preprocess'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:197:in `process'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:161:in `block in loop'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:161:in `loop'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:161:in `loop'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh/connection/session.rb:110:in `close'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.2.1/lib/net/ssh.rb:194:in `start'
from ssh_test.rb:3:in `<main>'
I assume it works fine when you log in to the shell manually.
To understand what differs when you connect through net/ssh, collect the output of the env command in both cases and compare.
Most probably you'll see a difference that will lead you to a solution, or at least give you a dirty trick.
UPDATE. (Not working)
Net::SSH.start('10.204.121.192', 'password', :password => "password") do |ssh|
ssh.open_channel do |channel|
channel.on_data do |ch, data|
puts "got data: #{data.inspect}"
end
channel.send_data("INH-MSG-ALL;\n")
end
end
UPDATE2. (Working)
Net::SSH.start('10.204.121.192', 'password', :password => "password") do |ssh|
ssh.open_channel do |channel|
channel.send_channel_request "shell"
channel.on_data do |ch, data|
puts "got data: #{data.inspect}"
end
channel.send_data("INH-MSG-ALL;\n")
end
end
Thanks, forker, for your updates.
One more thing: in your code, how can I make this
puts "got data: #{data.inspect}"
output the data for each command sent to the shell?
Does this code wait for each command to complete?
Thanks.
