How do I download a file from a link and also track the download progress so I can display it?
From the little I know, open-uri and the typhoeus gem (I use both) manage the download internally, and I have no idea how to get the download progress out of them.
I suppose what I am looking for is a way of hooking into the download so I can update a value each time a chunk of data is received, but I do not know how to proceed.
I could also use another gem. If yes, which one and why?
You need to do it as a chunked request, which is what libraries like open-uri do behind the scenes when you use their high-level APIs (like open(uri).read).
Untested example based on https://www.ruby-forum.com/topic/170237:
require 'net/http'
require 'uri'

u = URI.parse(uri)
s = ""
progress = 0
Net::HTTP.start(u.host, u.port) do |http|
  http.request_get(u.path) do |response|
    response.read_body do |chunk|
      s << chunk
      progress += chunk.length
      puts "Downloaded #{progress} so far"
    end
  end
end
puts "Received:\n#{s}"
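If the server sends a Content-Length header, you can turn that running byte count into a percentage. A minimal sketch, assuming such a header is present (the ProgressTracker name is my own, not part of any gem; the actual HTTP call is left as a comment so the helper stands on its own):

```ruby
# Hypothetical helper: tracks bytes received against an expected total
# (e.g. the Content-Length header) and formats the progress.
class ProgressTracker
  def initialize(total)
    @total = total   # may be nil for responses without Content-Length
    @done = 0
  end

  def add(chunk)
    @done += chunk.bytesize
    self
  end

  def to_s
    return "#{@done} bytes" unless @total && @total > 0
    "#{(100.0 * @done / @total).round(1)}% (#{@done}/#{@total} bytes)"
  end
end

# Wiring it into the chunked download above would look roughly like:
#
#   Net::HTTP.start(u.host, u.port) do |http|
#     http.request_get(u.path) do |response|
#       tracker = ProgressTracker.new(response['Content-Length'] && response['Content-Length'].to_i)
#       response.read_body do |chunk|
#         tracker.add(chunk)
#         puts "Downloaded #{tracker}"
#       end
#     end
#   end
```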
I am trying to download a file from Watir, but I don't want to use the sleep-in-a-loop approach.
I would rather, at the last moment of the interaction, recreate the session Watir has on the webpage and hand it to another library, for example Typhoeus.
Typhoeus uses curl and can read cookies from a file; however, Watir generates an array of hashes, and if I ask it to save them, it saves them as a YAML file.
Is there a faster way to convert it?
Another answer on Stack Overflow said that curl uses Mozilla-style cookie files.
So, if your Watir instance is browser and the file you are going to write to is file, you can do:
browser.cookies.to_a.each do |ch|
  terms = []
  terms << ch[:domain]
  # parentheses are needed: << binds tighter than the ternary operator
  terms << (ch[:same_site].nil? ? 'FALSE' : 'TRUE')
  terms << ch[:path]
  terms << (ch[:secure] ? 'TRUE' : 'FALSE')
  terms << ch[:expires].to_i.to_s
  terms << ch[:name]
  terms << ch[:value]
  file.puts terms.join("\t")
end
Then you can tell Typhoeus to use the contents of file to keep navigating using the same cookies.
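To make that hand-off concrete, here is a sketch. The mozilla_cookie_line helper is my own, and its second field (the include-subdomains flag) is my reading of the Netscape cookie.txt format rather than anything Watir provides; the cookiefile/cookiejar options are what Typhoeus passes down to libcurl.

```ruby
# Sketch: turn one Watir cookie hash into a Netscape/Mozilla cookie.txt
# line. curl expects 7 tab-separated fields; the second field (include
# subdomains) is inferred from a leading dot on the domain, which is an
# assumption on my part.
def mozilla_cookie_line(ch)
  [
    ch[:domain],
    ch[:domain].start_with?('.') ? 'TRUE' : 'FALSE',  # include-subdomains flag
    ch[:path],
    ch[:secure] ? 'TRUE' : 'FALSE',
    ch[:expires].to_i.to_s,
    ch[:name],
    ch[:value]
  ].join("\t")
end

# Then point Typhoeus (libcurl) at the resulting file:
#
#   require 'typhoeus'
#   response = Typhoeus.get(url, cookiefile: "cookies.txt",
#                                cookiejar:  "cookies.txt")
```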
I'm currently issuing a GET request to the PivotalTracker API to get all of the bugs for a given project, by bug severity. All I really need is a count of the bugs (i.e. 10 critical bugs), but I'm currently getting all of the raw data for each bug in XML format. The XML data has a bug count at the top, but I have to scroll through tons of data to get to that count.
To solve this, I'm trying to parse the XML to display only the bug count, but I'm not sure how to do that. I've experimented with Nokogiri and REXML, but it seems like they can only parse actual XML files, not XML from an HTTP GET response.
Here is my code (the access token has been replaced with *'s for security reasons):
require 'net/http'
require 'rexml/document'

prompt = '> '

puts "What is the id of the Project you want to get data from?"
print prompt
project_id = STDIN.gets.chomp

puts "What type of bugs do you want to get?"
print prompt
type = STDIN.gets.chomp

def bug(project_id, type)
  net = Net::HTTP.new("www.pivotaltracker.com")
  request = Net::HTTP::Get.new("/services/v3/projects/#{project_id}/stories?filter=label%3Aqa-#{type}")
  request.add_field("X-TrackerToken", "*******************")
  net.read_timeout = 10
  net.open_timeout = 10
  response = net.start do |http|
    http.request(request)
  end
  puts response.code
  print response.read_body
end

bug(project_id, type)
Like I said, the GET request is successfully printing a bug count and all of the raw data for each individual bug to my Terminal window, but I only want it to print the bug count.
The API documentation shows the total bug count is an attribute of the XML response's top-level node, stories.
Using Nokogiri as an example, try replacing print response.read_body with
xml = Nokogiri::XML.parse(response.body)
puts "Bug count: #{xml.at_xpath('/stories')['total']}"
Naturally you'll need to add require 'nokogiri' at the top of your code as well.
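For what it's worth, both Nokogiri and REXML parse a String just as happily as a file, so the REXML require already in your code would work too. A small sketch (the bug_count helper and the sample attribute values are illustrative; check the actual response against the API docs):

```ruby
require 'rexml/document'

# Sketch: pull the total attribute off the top-level <stories> node of an
# XML string, e.g. the body of an HTTP response.
def bug_count(xml_string)
  doc = REXML::Document.new(xml_string)
  doc.root.attributes['total']   # assumes <stories total="..."> per the API docs
end
```

Usage would then be `puts "Bug count: #{bug_count(response.body)}"` instead of printing the whole body.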
I'm using EventMachine to process incoming emails, which could at times be very high volume. The code I have so far works for emails that arrive at least about 5 seconds apart, but below that, only one email out of however many arrive gets processed. I've tried adding EM.defer statements in a few different places which I thought would help, but to no avail. I should also note, if it makes any difference, that I'm using the em-imap gem in this example as well.
The relevant section of the code is here:
EM.run do
  client = EM::IMAP.new('imap.gmail.com', 993, true)
  client.connect.bind! do
    client.login('me@email.com', 'password123')
  end.bind! do
    client.select('INBOX')
  end.bind! do
    client.wait_for_new_emails do |response|
      client.fetch(response.data).callback do |fetched|
        currentSubjectLine = fetched.first.attr.values[1].subject
        desiredCommand = parseSubjectLine(currentSubjectLine)
        if desiredCommand == 0
          if fetched.first.attr.values[0].parts.length == 2
            if fetched.first.attr.values[0].parts[1].subtype.downcase != "pdf"
              puts 'Error: Missing attachment, or attachment of the wrong type.'
            else
              file_name = fetched.first.attr.values[0].parts[1].param.values[0]
              client.fetch(response.data, "BODY[2]").callback do |attachments|
                attachment = attachments[0].attr["BODY[2]"]
                File.new(file_name, 'wb+').write(Base64.decode64(attachment))
              end
            end
          end...
Am I somehow blocking the reactor in this code segment? Is it possible that some library that I'm using isn't appropriate here? Could GMail's IMAP server have something to do with it? Do you need any more information about what happens in some given situation before you can answer with confidence? As always, any help is greatly appreciated. Thank you!
Update with Minimized Code
Just in case anything in my organization has anything to do with it, I'm including everything that I think might possibly be relevant.
module Processing
  def self.run
    EM.run do
      client = EM::IMAP.new('imap.gmail.com', 993, true)
      client.connect.bind! do
        client.login('me@email.com', 'password123')
      end.bind! do
        client.select('INBOX')
      end.bind! do
        client.wait_for_new_emails do |response|
          client.fetch(response.data).callback do |fetched|
            puts fetched[0].attr.values[1].subject
          end
        end
      end.errback do |error|
        puts "Something failed: #{error}"
      end
    end...
Processing.run
Don't hate me for saying this, but refactor that pyramid-of-doom spaghetti thingy that makes Demeter twitch into something readable, and the error will reveal itself :)
If it doesn't reveal itself you will be able to boil it down to the simplest possible code that reproduces the problem and submit it as an issue to https://github.com/eventmachine/eventmachine
However, EM isn't really maintained any more (the devs went a bit AWOL), so think about moving to https://github.com/celluloid/celluloid and https://github.com/celluloid/celluloid-io
PS: just saw this:
File.new(file_name,'wb+').write(Base64.decode64(attachment))
is a blocking call AFAIK; try playing with it and you might be able to reproduce the issue. See https://github.com/martinkozak/em-files and http://eventmachine.rubyforge.org/EventMachine.html#defer-class_method for possible ways to work around this.
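To make that last point concrete, here is a sketch. The save_attachment helper name is mine, and the EM.defer wiring is shown only as a comment since it needs a running reactor. Note that the original File.new(...).write also never closes the file handle; the block form of File.open does that for you.

```ruby
require 'base64'

# Sketch: decode and write the attachment with a properly closed file
# handle. On its own this is still blocking; inside EventMachine you would
# push it onto the threadpool with EM.defer (see comment below).
def save_attachment(file_name, attachment_base64)
  File.open(file_name, 'wb') do |f|
    f.write(Base64.decode64(attachment_base64))
  end
end

# Inside the reactor you would call it roughly like this:
#   EM.defer(proc { save_attachment(file_name, attachment) },  # threadpool
#            proc { puts "saved #{file_name}" })               # reactor thread
```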
Is it possible to open every link in a certain div and collect the values of the opened fields altogether in one file, or at least in terminal output?
I am trying to get a list of coordinates from all markers visible on a Google map.
all_links = b.div(:id, "kmlfolders").links
all_links.each do |link|
  b.link.click
  b.link(:text, "Norādījumi").click
  puts b.text_field(:title, "Galapunkta_adrese").value
end
Are there easier or more effective ways how to automatically collect coordinates from all markers?
Unless there is other data already in the HTML that you could pick through (alt tags? elements invoked via onhover?), that does seem like the most practical way to iterate through the links. However, from what I can see, you are not actually making use of the 'link' object inside your loop. You'd need something more like this, I think:
all_links = b.div(:id, "kmlfolders").links
all_links.each do |thelink|
  b.link(:href => thelink.href).click
  b.link(:text, "Norādījumi").click
  puts b.text_field(:title, "Galapunkta_adrese").value
end
Probably using their API is a lot more effective a means to get what you want; it's why folks make APIs, after all, and if one is available, using it is almost always best. Using a test tool as a screen-scraper to gather the info is liable to be a lot harder in the long run than learning how to make some API calls and getting the data that way.
For web-based APIs and Ruby, I find the rest-client gem works great; other folks like HTTParty.
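To illustrate why the API route is usually simpler, here is a hypothetical sketch of what the extraction could look like if the markers came back as JSON from an API call (made with rest-client or HTTParty). The response shape and helper name are invented for illustration; this is not a real Google endpoint's format.

```ruby
require 'json'

# Sketch: with an API returning JSON, pulling all marker coordinates
# becomes a one-liner instead of a click-and-scrape loop.
def coordinates_from(json_string)
  JSON.parse(json_string)['markers'].map { |m| [m['lat'], m['lng']] }
end
```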
As I'm not already familiar with the Google API, I find it hard to dig into the API for one particular need. Therefore I made a short watir-webdriver script for collecting the coordinates of markers on a protected Google map. The resulting file is used in a python script that creates speedcam files for navigation devices.
In this case it's a speedcam map maintained and updated by the Latvian police, but this script can probably be used with any Google map just by replacing the url.
# encoding: utf-8
require "rubygems"
require "watir-webdriver"

@b = Watir::Browser.new :ff
#--------------------------------
@b.goto "http://maps.google.com/maps?source=s_q&f=q&hl=lv&geocode=&q=htt%2F%2Fmaps.google.com%2Fmaps%2Fms%3Fmsid%3D207561992958290099079.0004b731f1c645294488e%26msa%3D0%26output%3Dkml&aq=&sll=56.799934,24.5753&sspn=3.85093,8.64624&ie=UTF8&ll=56.799934,24.5753&spn=3.610137,9.887695&z=7&vpsrc=0&oi=map_misc&ct=api_logo"
@b.div(:id, "kmlfolders").wait_until_present
all_markers = @b.div(:id, "kmlfolders").divs(:class, "fdrlt")
@prev_coordinates = 1
puts "#{all_markers.length} speedcam markers detected"
File.open("list_of_coordinates.txt", "w") do |outfile|
  all_markers.each do |marker|
    sleep 1
    marker.click
    sleep 1
    description = @b.div(:id => "iw_kml").text
    @b.span(:class, "actbar-text").click
    sleep 2
    coordinates = @b.text_field(:name, "daddr").value
    redo if coordinates == @prev_coordinates
    puts coordinates
    outfile.puts coordinates
    @prev_coordinates = coordinates
  end
end
puts "Coordinates saved in file!"
@b.close
Works both on Mac OSX 10.7 and Windows7.
I finally got my code to reference the ruby-alsa library, but I'm stuck again. Eventually, what I'd like to happen is have an audio file play on the server when an action is invoked from the client side. As such, I have the following code in my controller:
File.open('./public/audio/test.wav', 'r') do |f|
  ALSA::PCM::Playback.open do |playback|
    playback.write do |length|
      f.read length
    end
  end
end
By reading the RDoc for the ALSA library, I get the impression I should be using the write_in_background method from the ALSA::PCM::Playback class. Unfortunately, I can't get the syntax right for this method. To verify I have ALSA working properly, I attempted to use the Playback.write method (see the code above). The above code is syntactically correct; however, it plays a sound for only a split second, then stops. My guess is the request ends so quickly that it doesn't have enough time to play anything recognizable.
As previously mentioned, my ultimate goal is to have an end-user invoke an action that plays audio back on the server. The file should not stop playing at the end of the HTTP request; it should continue until another action is invoked that stops playback. Knowing that, could somebody please help me get the syntax and parameters right for calling the write_in_background method? I'm afraid the ruby-alsa documentation I have at this time hasn't been helpful enough for me (as a complete newbie at Ruby).
Update: If I replace the above call to the write method with a call to the write_in_background method, I get the following runtime error: cannot add pcm handler (Function not implemented)
Update 2: I tried this with a different WAV file and the following code and it plays at warp speed.
File.open('./public/audio/sample.wav', 'r') do |f|
  ALSA::PCM::Playback.open do |playback|
    playback.write do |length|
      @temp = length
      f.read length
    end
    sleep 4
  end
end
It appears there may be a combination of things going on here. I believe the first is in reference to the sample rate (length == 44100, which is CD quality). I'll have to look into how to play back audio files at a different rate. Beyond that, however, I'm still stuck on how to get it to play in the background. While the sleep proves ALSA is working, it won't work well in a real-world scenario.
Update 3: Got the sample rate bit working even though it temporarily relies on a magic number:
File.open('./public/audio/sample.wav', 'r') do |f|
  ALSA::PCM::Playback.open "default", {:channels => 1, :sample_rate => 11025} do |playback|
    playback.write do |length|
      @temp = length
      f.read length
    end
    sleep 4
  end
end
At this point, I have my ruby code playing audio through ALSA, but I'm not sure how to make it play continuously in the background without requiring the sleep.
How about kicking it off on its own thread?
Thread.new do
  File.open('myfile.wav', 'rb') do |f|
    ALSA::PCM::Playback.open do |playback|
      playback.write do |length|
        f.read length
      end
    end
  end
end.join
Edit:
OK, if that doesn't work, I'm guessing it kicks off its own thread already. How about sleeping for the duration? First try:
File.open('myfile.wav', 'rb') do |f|
  ALSA::PCM::Playback.open do |playback|
    playback.write do |length|
      f.read length
    end
  end
end

sleep 4 # seconds
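One way to drop the magic number: compute the sleep from the WAV header itself. A sketch assuming a canonical 44-byte RIFF header (real files can carry extra chunks, so treat this as an approximation; wav_duration is my own helper, not part of ruby-alsa):

```ruby
# Sketch: approximate a WAV file's duration in seconds by reading the
# byte rate and data-chunk size out of a canonical 44-byte RIFF header.
def wav_duration(path)
  header = File.binread(path, 44)
  byte_rate = header[28, 4].unpack('V')[0]  # bytes per second, from the fmt chunk
  data_size = header[40, 4].unpack('V')[0]  # size of the data chunk in bytes
  data_size.to_f / byte_rate
end

# Then: sleep wav_duration('myfile.wav') instead of sleep 4
```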