DD-WRT wget returns a cached file - bash

I'm developing an installer for my YAMon script for *WRT routers (see http://www.dd-wrt.com/phpBB2/viewtopic.php?t=289324).
I'm currently testing on a TP-Link TL-WR1043ND with DD-WRT v3.0-r28647 std (01/02/16). Like many others, this firmware variant does not include curl, so I (gracefully) fall back to a wget call. But it appears that DD-WRT ships a cut-down version of wget, so the -C and --no-cache options are not recognized.
Long & short, my wget calls insist on downloading cached versions of the requested files.
BTW - I'm using: wget "$src" -qO "$dst"
where src is the source file on my remote server and dst is the destination on the local router.
So far I've unsuccessfully tried to:
1. append a timestamp to the request URL (see the sketch after this list)
2. reboot the router
3. run stopservice dnsmasq & startservice dnsmasq
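For item 1, the attempt was roughly of this form (a sketch; the query-parameter name is arbitrary, and src/dst are the same variables as in the wget line above):
# tack a throwaway timestamp onto the URL so every request looks unique to any cache
wget "${src}?nocache=$(date +%s)" -qO "$dst"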
None have changed the fact that I'm still getting a cached version of the file.
I'm beating my head against the wall... any suggestions? Thx!
Al

Not really an answer but a seemingly viable workaround...
After a lot of experimentation, I found that wget seems to always return the latest version of the file from the remote server if the extension on the requested file is '.html'; but if it is something else (e.g., '.txt' or '.sh'), it does not.
I have no clue why this happens or where they are cached.
But now that I know this, all of the files required by my installer have an .html extension on the remote server and the script saves them with the proper extension locally. (Sigh... several days of my life that I won't get back.)
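In practice that means calls along these lines (a sketch; the file and variable names are purely illustrative):
# the script is stored as install.sh.html on the remote server to dodge the cache,
# but saved under its real name on the router
wget "${src_base}/install.sh.html" -qO "${dst_dir}/install.sh"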
Al

I had the same problem. While getting images from a camera, the HTTP server on the camera always sent the same image.
wget --no-http-keep-alive ...
solved my problem,
and my full line is:
wget --no-check-certificate --no-cache --no-cookies --no-http-keep-alive $URL -O img.jpg -o wget_last.log

Related

`wget -i files.txt` gives Scheme Missing error on VPS

I have gathered a few .min.map files and stored them in mapfiles.txt,
as shown below:
https://example.com/app.bundle.min.map
https://example.com/app.bundle.min.map
https://example.com/app.bundle.min.map
I read the wget man page and found that to download the files listed in a text file we need to use:
wget -i filename.txt
So I tried the same thing and observed weird behavior.
On local system,
wget -i mapfiles.txt
This command started downloading .map files
On VPS,
wget -i mapfiles.txt
Got this error
https://example/app.bundle.min.map: Scheme missing
No URLs found in mapfiles.txt.
Could you guys help me to figure out where I am going wrong?
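One thing worth ruling out first: wget prints "Scheme missing" for lines it cannot parse as URLs, and a common culprit when the same list works on one machine but not another is invisible carriage returns or a BOM picked up while copying the file. A quick check (a sketch, assuming GNU coreutils on the VPS):
# CRLF line endings show up as ^M before the end-of-line marker
cat -A mapfiles.txt | head -3
# strip carriage returns if they are present, then retry
tr -d '\r' < mapfiles.txt > mapfiles_clean.txt
wget -i mapfiles_clean.txt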

Mirror multiple page site with lftp

I need to mirror data hosted on a web site on a regular basis. I am trying to use lftp (version 4.0.9), as it usually does a great job for this task. However, the site I am downloading from has multiple pages (I intend to loop over the most recent n pages in a bash script which will run several times a day), and I can't work out how to get lftp to accept the page parameter. I've had no luck searching for a solution online, and what I have tried so far has failed.
This works perfectly:
lftp -c 'mirror -v -i "S1A" -P 4 https://qc.sentinel1.eo.esa.int/aux_resorb/'
This does not:
lftp -c 'mirror -v -i "S1A" -P 4 https://qc.sentinel1.eo.esa.int/aux_resorb/?page=2'
It gives error:
mirror: Access failed: 404 NOT FOUND (/aux_resorb/?page=2)
I also tried passing the new URL in as a variable but that didn't work either. I'd be grateful for suggestions to solve this issue.
Before it is suggested: I know wget is an option and its pagination handling works (I tested it), but I don't want to use it because it is less appropriate here; it wastes a lot of time fetching all the "index.html?param=value" files and then removing them, and given the number of pages this isn't feasible.
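The per-page loop described above would look roughly like this (a sketch; N is the number of recent pages to fetch, and it fails for the reason explained in the answer below):
N=4
for page in $(seq 1 $N); do
    lftp -c "mirror -v -i 'S1A' -P 4 https://qc.sentinel1.eo.esa.int/aux_resorb/?page=$page"
done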
The problem with lftp's mirror command is that it adds a slash to the given URL when requesting the page (see below). So it boils down to how the remote end handles such URLs and whether it gets upset by the trailing slash. In my tests, Drupal sites, for example, did not like the trailing slash and returned a 404, but some other sites worked fine. Unfortunately, I was not able to figure out a workaround if you insist on using lftp.
Tests
I tried the following requests against a web server:
1. lftp -c 'mirror -v http://example/path'
2. lftp -c 'mirror -v http://example/path/?page=2'
3. lftp -c 'mirror -v http://example/path/file'
4. lftp -c 'mirror -v http://example/path/file?page=2'
These commands resulted in the following HEAD requests seen by the web server:
1. HEAD /path/
2. HEAD /path/%3Fpage=2/
3. HEAD /path/file/
4. HEAD /path/file%3Fpage=2/
Note that there is always a trailing slash in the request; %3F is just the URL-encoded ? character.
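The difference can be reproduced without lftp, since curl sends the request path exactly as written (a sketch):
# what a normal client asks for: HEAD /aux_resorb/?page=2
curl -sI 'https://qc.sentinel1.eo.esa.int/aux_resorb/?page=2' | head -1
# what lftp's mirror effectively asks for: HEAD /aux_resorb/%3Fpage=2/  (the 404 described above)
curl -sI 'https://qc.sentinel1.eo.esa.int/aux_resorb/%3Fpage=2/' | head -1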

Wget mirror site via ftp - timestamps issue

A site I'm working on requires a large number of files to be downloaded daily from an external system via FTP. This is not of my design; it is the only solution offered by the external provider (I cannot use SSH/SFTP/SCP).
I've solved this by using wget, run inside a cron task:
wget -m -v -o log.txt --no-parent -nH -P /exampledirectory/ --user username --password password ftp://www.example.com/
Unfortunately, wget does not seem to see the timestamp differences, so when a file is modified, it still returns:
Remote file no newer than local file
`/xxx/data/data.file'
-- not retrieving.
When I manually connect via FTP, I can see differences in the timestamps, so it should be getting the updated file. I'm not able to access or control the target server via any other means.
Is there anything I can do to get around this? Can I force wget to mirror while ignoring timestamps? (I understand this defeats the point of mirroring)...
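If re-downloading everything each day is acceptable, one possible workaround (a sketch only; paths and credentials are as in the command above) is to skip the timestamp check entirely by fetching into an empty scratch directory and then syncing it over the live one:
# plain -r without -N: nothing exists in the scratch dir, so every file is fetched
TMP=$(mktemp -d)
wget -r -l inf -v -o log.txt --no-parent -nH -P "$TMP" \
    --user username --password password ftp://www.example.com/
# copy the fresh tree over the live directory, then clean up
rsync -a "$TMP"/ /exampledirectory/
rm -rf "$TMP"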

curl returns error (6) occasionally

I have a bash script that downloads some files from an FTP server. The problem is that sometimes curl randomly returns error 6 (couldn't resolve host). I can open the FTP site via a web browser without any problem. I also noticed that most of the errors occur on the first downloads. Any idea?
Also, I wanted to know how I can make curl retry the download when these errors occur.
Code I used:
curl -m 60 --retry 10 --retry-delay 10 --ftp-method multicwd -C - ftp://some_address/some_file --output ./some_file
Note: I also tried the code without --ftp-method multicwd.
OS: CentOS 6.5 64bit
while [ "$ret" != "0" ]; do curl [your options]; ret=$?; sleep 5; done
Assuming those are transient problems with the server and/or DNS, looping might be of some help. This is a particularly good case for the rarely used (?) until loop:
until curl [your options]; do sleep 5; done
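If looping forever is undesirable, a bounded variant of the same idea (a sketch reusing the curl options from the question, minus its built-in --retry since the loop now handles retries) is:
# give up after 10 attempts, waiting 5 seconds between them
for attempt in $(seq 1 10); do
    curl -m 60 --ftp-method multicwd -C - \
        ftp://some_address/some_file --output ./some_file && break
    sleep 5
done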
In addition, if using curl is not mandatory, wget might be better suited for "unreliable" network connections. From the man page:
GNU Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
[...]
Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.
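For the example from the question, a wget equivalent with explicit retry behaviour might look like this (a sketch using standard GNU wget options):
# retry up to 10 times, back off between attempts, resume partial downloads with -c
wget --tries=10 --waitretry=5 --retry-connrefused -c \
    ftp://some_address/some_file -O ./some_file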

How would I construct a terminal command to download a folder with wget from a Media Temple (gs) server?

I'm trying to download a folder using wget in the Terminal (I'm using a Mac, if that matters) because my FTP client sucks and keeps timing out; it doesn't stay connected for long. So I was wondering if I could use wget to connect to the server via the FTP protocol and download the directory in question. I have searched around on the internet for this and have attempted to write the command, but it keeps failing.
ftp username is: serveradmin@mydomain.ca
ftp host is: ftp.s12345.gridserver.com
ftp password is: somepassword
I have tried to write the command in the following ways:
wget -r ftp://serveradmin@mydomain.ca:somepassword@s12345.gridserver.com/path/to/desired/folder/
wget -r ftp://serveradmin:somepassword@s12345.gridserver.com/path/to/desired/folder/
When I try the first way I get this error:
Bad port number.
When I try the second way I get a little further but I get this error:
Resolving s12345.gridserver.com... 71.46.226.79
Connecting to s12345.gridserver.com|71.46.226.79|:21... connected.
Logging in as serveradmin ...
Login incorrect.
What could I be doing wrong?
Use scp on the Mac instead; it will probably work much more nicely.
scp -r user@mediatemplehost.net:/folder/path /local/path
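If FTP via wget is still wanted, one more thing that may be worth trying (untested here) is keeping the e-mail-style username out of the URL entirely, so its embedded @ cannot confuse wget's URL parsing:
# pass the Media Temple username/password as options instead of in the URL
wget -r --ftp-user='serveradmin@mydomain.ca' --ftp-password='somepassword' \
    ftp://s12345.gridserver.com/path/to/desired/folder/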
