Headless chromedp wait until download finishes - go

I'm using chromedp to navigate through a website and download PDF files that are generated by the system. It takes a while to generate them, so... the code looks like this:
chromedp.Navigate("https://website.com/with/report/to/download"),
// wait for download link
chromedp.WaitReady("a.downloadLink"),
chromedp.Click("a.downloadLink"),
// wait some time to pull the file
chromedp.Sleep(time.Minute),
chromedp.Click("#close-button"),
Right now I wait a minute and then close the browser, but I don't like doing it that way. Is there any way to control this, or to get some kind of "event" when the file download is finished?

This question was asked a long time ago, but since nobody answered and the chromedp tag as a whole is rather thin on answers... here goes.
You should not depend on the passage of time for any asynchronous operation. See the chromedp file download example for the correct way to download
a file; note how it synchronizes on the download finishing.
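A minimal sketch along the lines of that example, reusing the URL and selectors from the question (the download directory here is just the system temp dir): enable download events, listen for the browser's download-progress event, and block until the state is "completed" instead of sleeping.

package main

import (
	"context"
	"log"
	"os"
	"strings"

	"github.com/chromedp/cdproto/browser"
	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	// signalled once Chrome reports the download as completed
	done := make(chan string, 1)
	chromedp.ListenTarget(ctx, func(ev interface{}) {
		if p, ok := ev.(*browser.EventDownloadProgress); ok {
			if p.State == browser.DownloadProgressStateCompleted {
				done <- p.GUID
			}
		}
	})

	err := chromedp.Run(ctx,
		// send downloads to a known directory and enable download events
		browser.SetDownloadBehavior(browser.SetDownloadBehaviorBehaviorAllowAndName).
			WithDownloadPath(os.TempDir()).
			WithEventsEnabled(true),
		chromedp.Navigate("https://website.com/with/report/to/download"),
		chromedp.WaitReady("a.downloadLink"),
		chromedp.Click("a.downloadLink"),
	)
	// clicking a download link aborts the pending navigation, so chromedp
	// may report net::ERR_ABORTED here; that particular error is expected
	if err != nil && !strings.Contains(err.Error(), "net::ERR_ABORTED") {
		log.Fatal(err)
	}

	guid := <-done // blocks until the download has finished
	log.Printf("download %s complete", guid)
}

With the completion event in hand you can click #close-button (or simply cancel the context) without guessing how long the download will take.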

Related

doing restartable downloads in ruby

I've been trying to figure out how to use the Down gem to do restartable downloads in Ruby.
The scenario is downloading a large file over an unreliable link. The script should download as much of the file as it can in the time allotted (say it's a 5 GB file and the script is given 30 seconds). I would like that 30 seconds of progress (the partial file) to be saved, so that the next time the script runs it downloads another 30 seconds' worth. This can repeat until the complete file is downloaded and the partial file is turned into a complete file.
I feel like everything I need to accomplish this is in this gem, but it's unclear to me which features I should be using and how much of it I need to code myself (streaming? or caching?). I'm a Ruby beginner, so I'm guessing I should use the caching, save the progress to a file myself, and enumerate as many times as I have time for.
How would you solve the problem? Would you use a different gem/method?
You probably don't need to build that yourself. Existing tools already do resumable downloads: wget -c continues a partial download, and curl -C - does the same (add a time limit such as curl's --max-time if you only want 30 seconds per run).
If you really want to build it yourself, you could take a look at how curl and wget do it (they're open source, after all) and implement the same thing in Ruby; a rough sketch of the mechanism is below.
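The mechanism behind wget -c and curl -C - is an HTTP Range request: check how much of the file you already have, ask the server only for the rest, and append it. A sketch of that idea (in Go rather than Ruby, with a made-up URL and file name), assuming the server honours Range requests:

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"time"
)

// resumeDownload appends any bytes still missing from the partial file at
// path, using an HTTP Range request. When the 30-second client timeout
// fires mid-transfer, whatever was already written stays on disk, so the
// next run picks up where this one stopped.
func resumeDownload(url, path string) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return err
	}
	if info.Size() > 0 {
		// only ask for the bytes we don't have yet
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-", info.Size()))
	}

	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if info.Size() > 0 && resp.StatusCode != http.StatusPartialContent {
		return fmt.Errorf("unexpected status %s for a Range request", resp.Status)
	}

	// errors out when the timeout hits; partial data already copied is kept
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	// URL and file name are made up for illustration
	if err := resumeDownload("https://example.com/big.iso", "big.iso.part"); err != nil {
		log.Println("stopped early (run again to resume):", err)
	}
}

Each run appends up to 30 seconds' worth of data; rerunning the program resumes from wherever the previous run stopped.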

Capybara + Downloading and using file

I'm using Capybara to navigate through a login on a website and then download a few files (I'm automating a frequent process that I have to do). There are a few things I've tried that aren't working, and I'm hoping someone might know a solution...
I have the two links I'm calling .click on, but while one file will start downloading (this is with the Chrome Selenium driver), Capybara seems to stop functioning after that. Running .click on the other link doesn't do anything... I figured it's because the session isn't technically on the page anymore (since it followed a download link), but I tried revisiting the page to click the second link and that doesn't work either.
Assuming I can get that working, I'd really like to be able to download to my script location rather than my Downloads folder, but I've tried every profile configuration I've found online and nothing seems to change it.
Because of the first two issues, I decided to try wget... but I would need to continue from the Capybara session to authenticate. Is it possible to pull the session data (just the cookies?) from Capybara and insert it into a wget or curl command?
Thanks!
For #3, accessing the cookies is driver-dependent; with Selenium it's
page.driver.browser.manage.all_cookies
or you can use the https://github.com/nruth/show_me_the_cookies gem, which normalizes access across most of Capybara's drivers. With those cookies you can write them out to a file and then use the --load-cookies option of wget (the --cookie option in curl), roughly as sketched below.
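For the wget route, the cookie file needs to be in the Netscape cookies.txt format: one cookie per line with domain, subdomain flag, path, secure flag, expiry (Unix time, 0 for a session cookie), name, and value, separated by tabs. A made-up example, with placeholder domain and cookie:

# Netscape HTTP Cookie File
example.com	FALSE	/	FALSE	0	_session_id	abc123def456

followed by something like

wget --load-cookies cookies.txt https://example.com/reports/report.pdf
curl --cookie cookies.txt -O https://example.com/reports/report.pdf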
For #1, you'd need to provide more info: any errors you get, what current_url is, what "doesn't work" actually means, etc.

Automatically download Cacti Weathermap at regular intervals

I was looking for a way to automatically download the weathermap image from the Cacti Weathermap plugin at regular intervals. There does not seem to be an easy way to do this listed anywhere on the internet using only Windows, so I thought I'd a) ask here and b) post what I've managed so far.
P.S. I've posted where I got up to in one of the answers below.
You can easily right-click - Save As while on the weathermap page. This produces a file called weathermap-cacti-plugin.png.
No such file is available directly from the webpage, however. Right-clicking - View URL gave me this:
http://<mydomain>/plugins/weathermap/weathermap-cacti-plugin.php?action=viewimage&id=df9c40dcab42d1fd6867&time=1448863933
I did a quick check in PowerShell to see if this was downloadable (it was):
$client = new-object System.Net.WebClient
$client.DownloadFile("http://<mydomain>/plugins/weathermap/weathermap-cacti-plugin.php?action=viewimage&id=df9c40dcab42d1fd6867&time=1448864049", "C:\data\test.png")
Following a hunch, I refreshed the page and copied a couple more URLs:
<mydomain>/plugins/weathermap/weathermap-cacti-plugin.php?action=viewimage&id=df9c40dcab42d1fd6867&time=1448863989
<mydomain>/plugins/weathermap/weathermap-cacti-plugin.php?action=viewimage&id=df9c40dcab42d1fd6867&time=1448864049
As I had suspected, the time= parameter changed every time I refreshed the page.
Following another hunch, I checked the changing digits (1448863989, etc.) on epochconverter.com and got a valid system time which matched my own.
I found a PowerShell command (Keith Hill's answer on this question) to get the current Unix time
[int64](([datetime]::UtcNow)-(get-date "1/1/1970")).TotalSeconds
and added it to the PowerShell download script:
$client = new-object System.Net.WebClient
$time=[int64](([datetime]::UtcNow)-(get-date "1/1/1970")).TotalSeconds
$client.DownloadFile("http://<mydomain>/plugins/weathermap/weathermap-cacti-plugin.php?action=viewimage&id=df9c40dcab42d1fd6867&time=$time", "C:\data\test.png")
This seems to work - the modified timestamp of test.png was updated every time I ran the code, and the file opened as a valid picture of the weathermap.
All that's required now is to put this in a proper script that saves to a folder, and schedule it to run every X minutes. Scheduling PowerShell scripts with Task Scheduler is covered elsewhere, so I will not repeat it here beyond the one-liner below.
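For reference, the scheduling itself can be done from the command line with schtasks; something along these lines (task name, script path, and interval are placeholders) creates a task that runs every 5 minutes:

schtasks /create /tn "WeathermapDownload" /tr "powershell.exe -NoProfile -File C:\data\get-weathermap.ps1" /sc minute /mo 5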
If anyone knows an easier way of doing this, please let me know. Otherwise, vote up my answer - I have searched a lot and cannot find any other results on the net that let you do this using only Windows. The Cacti forums have a couple of solutions, but they require you to do stuff on the Linux server which is hard for a Linux noob like me.

Automatically uploading text files to an FTP site

I'm looking to automate an upload of a text file to an FTP site. This upload would need to occur daily, and I have access to a server that would run whatever script needed to do the upload. I've looked around for a solution to this and found some information on howtogeek, but neither idea there seemed to be automatic. I'm looking to do this without third-party software if possible. I would appreciate any pointers.
If you're on Windows, I'd use VBScript (more functionality can be added easily) or a .bat file (if you don't need the extra functionality) to call the built-in Windows ftp client, provided you don't need anything particularly secure. Just build the .bat file to call ftp and pass in the connection information accordingly (a minimal sketch is below); the link here should help you out. To make this automatic, use Task Scheduler to schedule how you want the script to run.
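That sketch, with made-up host, credentials, and paths. First a command file for the ftp client, say C:\scripts\upload.ftp:

open ftp.example.com
user myusername mypassword
put C:\data\report.txt
quit

then a one-line upload.bat that feeds it to ftp (-n suppresses auto-login so the user command works; -s: runs the commands from the file):

ftp -n -s:C:\scripts\upload.ftp

Task Scheduler can then run upload.bat once a day.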

does recursive wget download visited URLs?

I want to use wget to recursively download a complete website. If, for example, pages at depth 2 contain links to pages from level 1 (which have already been downloaded), will wget download them again? If so, is there a way to prevent this from happening?
Would a hand-rolled wget-like script be better, or is wget optimized to avoid downloading things over and over again? (I'm especially worried about menu links that appear on every page.)
Thank you in advance
A single wget run should never try to download the same page twice. It wouldn't be very useful for mirroring if it did. :) It also has some other failsafes, like refusing to recurse to another domain or a higher directory.
If you want to be sure it's doing the right thing, I suggest just trying it out and watching what it does; you can always mash ^C.
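For reference, a typical recursive fetch might look something like this (URL and depth are placeholders):

wget --recursive --level=2 --no-parent --convert-links --page-requisites https://example.com/docs/

Within a single run, wget keeps track of every URL it has already fetched, so menu links that appear on every page are only downloaded once. Across separate runs, --timestamping (-N) only re-fetches files that have changed on the server.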
