doing restartable downloads in ruby - ruby

I've been trying to figure out how to use the Down gem to do restartable downloads in ruby.
So the scenario is downloading a large file over an unreliable link. The script should download as much of the file as it can in the timeout allotted (say it's a 5GB file, and the script is given 30 seconds). I would like that 30 second progress (partial file) to be saved so that next time the script is run, it will download another 30 seconds worth. This can happen until the complete file is downloaded and the partial file is turned into a complete file.
I feel like everything i need to accomplish this is in this gem, but it's unclear to me which features i should be using, and how much of it i need to code myself. (streaming? or caching?) I'm a ruby beginner, so i'm guessing i use the caching and just save the progress to a file myself, and enumerate for as many times as i have time.
How would you solve the problem? Would you use a different gem/method?

You probably don't need to build that yourself. Existing tools like curl and wget already have that functionality.
If you really want to build it yourself, you could perhaps take a look at how curl and wget do it (they're open-source, after all) and implement the same in Ruby.

Related

Headless chromedp wait until download finishes

I'm using chromedp to navigate thru the website, to download PDF files which are generated by the system. It takes a while, to generate them so... Code looks like that:
chromedp.Navigate("https://website.com/with/report/to/download"),
// wait for download link
chromedp.WaitReady("a.downloadLink"),
chromedp.Click("a.downloadLink"),
// wait some time to pull the file
chromedp.Sleep(time.Minute),
chromedp.Click("#close-button"),
Right now I'm waiting a minute, and then close the browser, but I don't like it that way. If there is any way to control or get some kind of "event" when file download is finished?
This question was asked a long time ago, but since nobody answered and the entire chromedp tag bunch of questions is kind of thin with answers... here goes.
You should not depend on the passage of time for any asynchronous operations. See the chromedp file download example for the correct way to download
a file. Note how they synchronize on download end.

Define download size for wget or curl

As part of a bash script I need to download a file with a known file size, but I'm having issues with the download itself. The file only gets partially downloaded every time. The server I'm downloading from doesn't seem particularly well set up - it doesn't report file size so wget (which I'm using currently) doesn't know how much data to expect. However, I know the exact size of the file, so theoretically I could tell wget what to expect. Does anyone know if there is a way to do this? I'm using wget at the moment but I can easily switch to curl if it will work better. I know how to adjust timeouts (which might help too), and retries, but I assume that for retries to work it needs to know the size of the file its downloading.
I have seen a couple of other questions indicating that it might be a cookie problem, but that's not it in my case. The actual size downloaded varies from <1Mb to 50Mb, so it looks more like some sort of lost connection.
Could you share the entire command to check what parameters are you using? however, it's a strange case.
You may use the -c parameter, restore the connection in the same point where it stopped after the retries.
Or you can try using --spider parameter. That checks if the file exists and get the info file in log.

Is there a safe way to modify another package's entry in the RPM database?

I've run into the problem described in this question, where an old package was Obsoleted, and its %preun script is run with $1 = 0, resulting in undesirable behavior. I know this could be worked around by using -e + -i, as suggested in that answer, or the --nopreun flag, but it's difficult to get that information out to users who are accustomed to simply using -U.
I can't modify the existing %preun scripts in the wild. I don't see any way to run additional code from the new package after the old one's preun. I can't find any way to have my new package programmatically prevent the old %preun script from executing.
Is there any safe way to reach into the RPM database and remove a scriptlet for an existing package?
Jeff Johnson is absolutely correct that it should not be done. However it certainly can be done.
I have done this in an RPM at work, for distribution, but note, this was a contained semi-structured environment with no remote hands to all systems.
If you have remote hands access, take the "remove, install" path, and script that.
If you really feel you should be doing this, then these are the pointers. I'm not going to show you exactly how I did it because it was "work" and not mine to share. The concepts are mine :-)
First, back up the /var/lib/rpm/Packages file (cp /var/lib/rpm/Packages /var/tmp/Packages.bkp). Put it somehwere safe. Update your backup if any one else changes the system whilst you are working on your solution. MAke regular checks on the count of RPMs and test every which way from Sunday, after each change or step.
You will need to use the db_unload and db_load commands. For speed, you will need to use "s2p" to convert any shell sed patterns to perl. Then build a pipe which looks like this:
db_unload /var/tmp/Packages.bkp |perl -i -e "s2p converted string" |db_load /var/tmp/Packages.new
You can then attempt to test the Packages.new by copying ot over the original. Always run rpm --rebuilddb after manual changes. If you see any errors, restore the back up and rebuild the db again.
If you need to put it in an RPM, then convert it to Lua, and put it in the pretrans or posttrans scriptlet (%pretrans -p <lua>). The selection depends on the ordering you are trying to achieve. The Lua interpreter is built in to rpm, and so it will run OK during a new system install even if your RPM gets called somehow. I wrapped my "pipe" in a lua long string, and made it only execute if the system already exists. It does nothing otherwise. If you are thinking "that will never happen" then check out "Never say Never".
BTW you can completely stiff your RPM base and thus future administration of the system if you mess this up. If you do that, and have no backup or way out, it would be a hard way to learn that you are responsible for your own actions. Just saying that you have been warned!
No you cannot edit an rpmdb: the headers are protected
from change by a SHA1 or a digital signature.
Instead upgrade to a fixed version of the package using --nopreun
to prevent running the buggy script let.

Find files created or modified by an installer

To track changes in OSX filesystem while an installer runs I'm trying to use the fs_usage.
Can somebody guide me with a simple example on how to interpret the result. There a lot of terms I don't understand when I examine it.
fs_usage's output tends to be full of irrelevant chatter, and hard to interpret. I'd recommend using fseventer, which just shows changed files without all the nonsense. If you're an Apple developer, you can also use PackageMaker's snapshot package feature (which records everything that happens, and offers to make an installer package that does the same things).

Appropriate/best practice way to execute some PHP unrelated to database when a module is first installed?

I'm creating a module that requires a few things to be done (once only) when the module is installed. There may be a number of things that need to be done, but the most basic thing that I need to do is make an API call to a server to let the external server know that the module was installed, and to get a few updated configuration items.
I read this this question on stackoverflow however in my situation I truly am interested in executing code that has nothing to do with the database, fixtures, updating tables, etc. Also, just to be clear this module has no affect (effect?) on the front end. FYI, I've also read this spectacular article by Alan Storm, but this really only drives home the point in my mind that the install/upgrade scripts are not for executing random PHP.
In my mind, I have several possible ways to accomplish this:
I do what I fear is not a best practice and add some PHP to my setup/install script to execute this php
I create some sort of cronjob that will execute the task I need once only (not sure how this would work, but it seems like it might be a "creative" solution - of course if cron is not setup properly then this will fail, which is not good
I create a core_config_data flag ('mynamespace/mymodule/initialized') that I set once my script has run, and I check on every area of the adminhtml that my module touches (CMS/Pages and my own custom adminhtml controller). This seems like a good solution except for all of the extra overhead every time CMS/Pages is hit or my controller is hit, checking this core_config_data setting. The GOOD thing about this solution would be that if something were to fail with my API call, I can set this flag to false and it will run again, display the appropriate message, and continue to run until it succeeds (or have additional logic that will stop the initialization code after XX number of attempts)
Are any of these options the "best" way, and is there any sort of precedent for this somewhere, such as a respected extension or from Magento themselves?
Thanks in advance for your time!
You raise an interesting question.
At the moment, I am not aware of a means to go about executing any arbitrary PHP on module installation, the obvious method (rightly/wrongly) would be to use the installer setup/upgrade script as per 1 of your Q.
2 and 3 seem like a more resource intensive approach, ie. needlessly checking on every page load (cache or not).
There is also the approach of using ./pear to install your module (assuming you packaged it using Magento). I had a very quick look through ./downloader/pearlib/php/pearmage.php but didn't see any lines which execute (vs copying files). I would have imagined this is the best place to execute something on install (other than 1 mentioned above).
But, I've never looked into this, so I'm fairly interested in other possible answers.

Resources