wget how to download only newer version of a file

A script regularly downloads some data file from a remote server with wget:
CERTDIR=folder1
SPOOLDIR=folder2
URL="http://..."
FILENAME="$SPOOLDIR/latest.xml.gz"
/usr/bin/wget \
-N \
--quiet \
--private-key=${CERTDIR}/keynopass.pem \
--ca-certificate=${CERTDIR}/ca.pem \
--certificate=${CERTDIR}/client.pem \
"$URL" \
--output-document ${FILENAME}
The -N switch turns on timestamping (possibly redundant; this seems to be the default).
I expected the file to be downloaded only if there is a newer remote version.
But this is not the case. The download happens every time, even when the remote file has the same timestamp as the local file.
The file is fairly large, so my plan was to check for a new version frequently but download only when needed. Unfortunately, this seems impossible with this approach.
Just guessing: the URL does not reference a file but is an API call. Could this be the reason?
However, the timestamp of the local file is set to the timestamp of the remote file, so I know that the timestamp information is available.
Am I missing something?
Notes:
the remote server is not controlled by me
the local server runs ubuntu 16.04
wget --version: GNU Wget 1.17.1 built on linux-gnu.

The documentation mentions that:
Use of -O is not intended to mean simply "use the name file instead of
the one in the URL;" rather, it is analogous to shell redirection:
wget -O file http://foo is intended to work like wget -O - http://foo > file;
file will be truncated immediately, and all downloaded content will be written there.
For this reason, -N (for timestamp-checking) is not supported in
combination with -O: since file is always newly created, it will
always have a very new timestamp. A warning will be issued if this
combination is used.
Thus one option would be to leave out the -O option, let wget download the file (if needed), and create a symlink called latest.xml.gz in your target directory that points to the downloaded file.
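A minimal sketch of that approach, assuming the file name wget derives from the URL is known in advance (with an API-style URL it may not be obvious, so check what wget actually saves). The function name fetch_if_newer and its parameters are made up for illustration, and the certificate options from the question are left out for brevity.

```shell
#!/bin/bash
# fetch_if_newer DIR URL NAME LINK   (hypothetical helper, not part of wget)
#   DIR   directory wget saves into (use an absolute path so the symlink resolves)
#   NAME  file name wget is assumed to derive from URL
#   LINK  stable name that should always point at the latest download
fetch_if_newer() {
    local dir=$1 url=$2 name=$3 link=$4
    # No -O here: -N can only compare timestamps when wget picks the file name itself.
    # -P sets the directory prefix for the saved file.
    wget -N --quiet -P "$dir" "$url" || return 1
    # Create or replace the symlink with the fixed name.
    ln -sfn "$dir/$name" "$link"
}
```

It could be called as fetch_if_newer "$SPOOLDIR/download" "$URL" data.xml.gz "$SPOOLDIR/latest.xml.gz", where data.xml.gz is a placeholder. Whether -N really skips the transfer still depends on the server sending a usable Last-Modified header.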

Related

I get a broken archive when I download to a folder

I get a correct archive when I use this command:
wget https://mysite/published/my_archive.tar.gz
I can open and use this archive.
But then I created a folder for the archive:
drwxr-xr-x folder_for_archive
and used this command to download the archive into the folder:
wget https://mysite/published/my_archive.tar.gz -o "./folder_for_archive/https://mysite/published/my_archive.tar.gz"
Now the archive is broken and I cannot use it.
What can I do?

The -o argument to wget doesn't do what you think it does:
-o logfile
--output-file=logfile
Log all messages to logfile. The messages are normally reported to standard error.
Either replace wget with curl (whose -o option does what you expect), or use -O (capital letter O) instead:
-O file
--output-document file
The documents will not be written to the appropriate files, but all will be concatenated together and written to file.
Notice the use of plural "documents": wget is a recursive downloader, so it might produce more than one file. But in your case, it should be fine.
See the wget manual for more information on both options.
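For the command in the question, a hedged sketch of the -O variant could look like this. download_to is a hypothetical helper; it assumes the last path component of the URL is the file name you want inside the folder.

```shell
#!/bin/bash
# download_to DIR URL: save URL inside DIR under the URL's last path component.
download_to() {
    local dir=$1 url=$2
    mkdir -p "$dir"
    # Capital -O names the output document; lowercase -o would name a log file.
    # ${url##*/} strips everything up to the last slash, leaving the file name.
    wget -O "$dir/${url##*/}" "$url"
}
```

Example call: download_to ./folder_for_archive https://mysite/published/my_archive.tar.gz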

Considering a specific name for the downloaded file

I download a .tar.gz file with wget using this command:
wget hello.tar.gz
This is part of a long script. Sometimes an error occurs while downloading this file, and when the file is downloaded a second time, the name of the downloaded file changes to something like this:
hello.tar.gz.2
the third time:
hello.tar.gz.3
How can I make sure that, whatever name the download would get, the file is saved as hello.tar.gz?
In other words, I don't want the downloaded file to be named anything other than hello.tar.gz.

wget hello.tar.gz -O <fileName>
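Because -O truncates the target file as soon as wget starts, a failed attempt can leave a broken hello.tar.gz behind. One possible way around that, sketched here with a hypothetical helper fetch_as and an assumed ".part" suffix, is to download to a temporary name and rename only on success:

```shell
#!/bin/bash
# fetch_as URL NAME: download URL and always store the result as NAME.
fetch_as() {
    local url=$1 name=$2
    # Download to a temporary name first, so a failed attempt
    # never leaves a truncated file under the final name.
    if wget -q -O "$name.part" "$url"; then
        mv -f "$name.part" "$name"
    else
        rm -f "$name.part"
        return 1
    fi
}
```

In the script from the question this would be called as fetch_as <url> hello.tar.gz, and it can be retried safely because every attempt writes to the same name.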

wget has options such as -r and -p that change its default behavior.
So just try the following:
wget -p <url>
wget -r <url>

Since you have noticed the incrementing suffix, discard any repeated files and rename the latest download back to the original name:
wget hello.tar.gz
mv hello.tar.gz.2 hello.tar.gz

wget hangs after large file download

I'm trying to download a large file (5 GB) over FTP. Here is my script:
read ZipName
wget -c -N -q --show-progress "ftp://Password#ftp.server.com/$ZipName"
unzip $ZipName
The file downloads to 100% but the script never reaches the unzip command. There is no error message and no output in the terminal, just a blank new line. I have to press Ctrl+C and run the script again to unzip; wget then detects that the file is already fully downloaded.
Why does it hang like this? Is it because of the large file, or because of passing an argument in the command?
By the way, I can't use ftp because it's not installed on the VM I'm working on, and it's a temporary VM, so I have no root privileges to install anything.

I ran some tests, and I think the size of the disk was the reason.
I tried curl -O and it worked with the same disk space.
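If disk space is the suspected cause, it can be checked before the download starts. This is only a rough sketch: has_space is a hypothetical helper, and the threshold you pass in is your own estimate (a 5 GB zip also needs room for the extracted files).

```shell
#!/bin/bash
# has_space DIR NEEDED_KB: succeed if DIR's filesystem has at least NEEDED_KB free.
has_space() {
    local dir=$1 needed_kb=$2 free_kb
    # df -P prints one line per filesystem in POSIX format; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$needed_kb" ]
}
```

For example, has_space . 6000000 || exit 1 placed before the wget line would stop the script early when less than roughly 6 GB is free (an assumed figure).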

wget - how to download all files that only include "480p" using wget from http server?

I want to download all files from an HTTP server like:
http://dl.mysite.com/files/
and I also want to go inside each folder inside that folder.
But I do want to download only those files that have "480p" in their name.
What is the easiest solution for that using wget?
edit:
I want that script to run each night from 2am to 6am to sync those files from that server to my PC.

The following wget command should work:
wget -A "*480p*" -r -np -nc --no-check-certificate -e robots=off http://dl.mysite.com/files/
Explanation:
-A "*480p*" accept only files whose names match this pattern
-r, --recursive recursively look through the folders
-np, --no-parent ignore links to a higher directory
-nc, --no-clobber when a file already exists locally, keep it and do not download it again
--no-check-certificate don't check the server certificate against the available certificate authorities (only relevant for https URLs)
-e, --execute command execute command as if it were part of .wgetrc; it runs after the commands in .wgetrc
robots=off ignore the robot exclusion rules (robots.txt)
More information on wget flags can be found at the official GNU manual page: https://www.gnu.org/software/wget/manual/wget.html
As for running it each night, you may want to read up on cron jobs. Taken from the documentation page at: https://help.ubuntu.com/community/CronHowto
A crontab file is a simple text file containing a list of commands meant to be run at specified times. It is edited using the crontab command. The commands in the crontab file (and their run times) are checked by the cron daemon, which executes them in the system background.
So basically you need to put your wget command into a script file and set up cron to run that file at the specified time.
Note: Windows does not have a native implementation of Cron, but you can achieve the same effect using the Windows Task Scheduler.
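A hedged sketch of such a script for a Linux machine follows. The function name sync_480p, the script path and the destination directory are all assumptions for illustration. It uses timeout from GNU coreutils to stop wget after four hours, so a run started at 2am ends by 6am, and it leaves out --no-check-certificate because the URL in the question is plain http.

```shell
#!/bin/bash
# sync_480p URL DEST: fetch every file under URL whose name contains "480p" into DEST.
sync_480p() {
    local url=$1 dest=$2
    # timeout 4h ends the run four hours after it started.
    # -nc keeps files that are already present, so the next night continues with the rest.
    timeout 4h wget -r -np -nc -A '*480p*' -e robots=off -P "$dest" "$url"
}

# Run only when called with both arguments, e.g. from cron.
if [ "$#" -eq 2 ]; then
    sync_480p "$1" "$2"
fi

# Example crontab line (edit with: crontab -e), starting at 02:00 every night.
# The paths are placeholders:
# 0 2 * * * /home/user/sync_480p.sh http://dl.mysite.com/files/ /home/user/videos
```

A file that was cut off by the timeout stays on disk and is not resumed by -nc; adding -c to continue partial files is an option to consider.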

Untar multipart tarball on Windows

I have a series of files named filename.part0.tar, filename.part1.tar, … filename.part8.tar.
I guess tar can create multiple volumes when archiving, but I can't seem to find a way to unarchive them on Windows. I've tried to untar them using 7zip (GUI & commandline), WinRAR, tar114 (which doesn't run on 64-bit Windows), WinZip, and ZenTar (a little utility I found).
All programs run through the part0 file, extracting 3 rar files, then quit reporting an error. None of the other part files are recognized as .tar, .rar, .zip, or .gz.
I've tried concatenating them using the DOS copy command, but that doesn't work, possibly because part0 through part6 and part8 are each 100 MB, while part7 is 53 MB and therefore likely the last part. I've tried several different logical orders for the files in the concatenation, but no joy.
Other than installing Linux, finding a live distro, or tracking down the guy who left these files for me, how can I untar these files?

Install 7-zip. Right click on the first tar. In the context menu, go to "7zip -> Extract Here".
Works like a charm, no command-line kung-fu needed:)
EDIT:
I only just noticed that you mention having already tried 7-Zip. It might have balked if you tried to open the tar via "Open with" -> 7-Zip. Its command line for opening files is a little unorthodox, so you have to associate file types through 7-Zip itself rather than through the file association system built into Windows. If you use right click -> "7-Zip" -> "Extract Here", though, it should work. I tested the solution myself (albeit on a 32-bit Windows box; I don't have a 64-bit one available).

1) download gzip http://www.gzip.org/ for windows and unpack it
2) gzip -c filename.part0.tar > foo.gz
gzip -c filename.part1.tar >> foo.gz
...
gzip -c filename.part8.tar >> foo.gz
3) unpack foo.gz
worked for me

As above, I had the same issue and ran into this old thread. For me it was a severe case of RTFM when installing a Siebel VM. These instructions were straight from the manual:
cat \
OVM_EL5U3_X86_ORACLE11G_SIEBEL811ENU_SIA21111_PVM.tgz.1of3 \
OVM_EL5U3_X86_ORACLE11G_SIEBEL811ENU_SIA21111_PVM.tgz.2of3 \
OVM_EL5U3_X86_ORACLE11G_SIEBEL811ENU_SIA21111_PVM.tgz.3of3 \
| tar xzf -
Worked for me!

The tar -M switch should do it for you on Windows (I'm using tar.exe).
tar --help says:
-M, --multi-volume create/list/extract multi-volume archive

I found this thread because I had the same problem with these files. Yes, the same exact files you have. Here's the correct order: 042358617 (i.e. start with part0, then part4, etc.)
Concatenate in that order and you'll get a tarball you can unarchive. (I'm not on Windows, so I can't advise on what app to use.) Note that of the 19 items contained therein, 3 are zip files that some unarchive utilities will report as being corrupted. Other apps will allow you to extract 99% of their contents. Again, I'm not on Windows, so you'll have to experiment for yourself.
Enjoy! ;)
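Where a Unix-like shell is available (on Windows that would mean something like Git Bash, Cygwin or WSL, which is an assumption here), the concatenation can be sketched as below. join_parts is a hypothetical helper; it joins the parts in exactly the order given and then asks tar to list the result as a sanity check.

```shell
#!/bin/bash
# join_parts OUT PART...: concatenate the PARTs in the given order into OUT
# and succeed only if tar can read the resulting archive.
join_parts() {
    local out=$1
    shift
    # cat writes the parts back to back; tar -tf lists the archive without extracting.
    cat -- "$@" > "$out" && tar -tf "$out" > /dev/null
}
```

With the order given above, the call would be join_parts filename.tar filename.part0.tar filename.part4.tar filename.part2.tar filename.part3.tar filename.part5.tar filename.part8.tar filename.part6.tar filename.part1.tar filename.part7.tar.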

This works well for me with multi-volume tar archives (numbered .tar.1, .tar.2 and so on) and even lets you --list or --get specific folders or files in them:
#!/bin/bash
TAR=/usr/bin/tar
ARCHIVE=bkup-01Jun
RPATH=home/user
RDEST=restore/
EXCLUDE=.*
mkdir -p $RDEST
$TAR vf $ARCHIVE.tar.1 -F 'echo '$ARCHIVE'.tar.${TAR_VOLUME} >&${TAR_FD}' -C $RDEST --get $RPATH --exclude "$EXCLUDE"
Copy to a script file, then just change the parameters:
TAR=location of tar binary
ARCHIVE=Archive base name (without .tar.multivolumenumber)
RPATH=path to restore (leave empty for full restore)
RDEST=restore destination folder (relative or absolute path)
EXCLUDE=files to exclude (with pattern matching)
The interesting thing for me is that you really DON'T use the -M option, as it would only ask you questions (insert next volume, etc.).

Hello, perhaps this will help.
I had the same problem.
A backup of my web site, made automatically on CentOS at 4 am, creates multiple files in multi-volume tar format (saveblabla.tar, saveblabla.tar1.tar, saveblabla.tar2.tar,etc..)
After downloading these files to my PC (Windows), I couldn't extract them with either the Windows cmd or 7zip (unknown error).
I first did a binary copy to reassemble the tar files (as described above in this thread):
copy /b file1+file2+file3 destination
After that, 7zip worked! Thanks for your help.
