How to retrieve a video from an HTML file using wget - bash

I am trying to download some videos from a website using wget. The command is as follows:
wget https://some_link/download.mp4?lecture_id=5
The problem is that it downloads an HTML file instead. However, when I right-click on this link on the website and select "Save target as", I get the video file I want. Similarly, when I click on the link, the browser offers a video file that can be saved or opened.
I also tried the following command, to no avail:
wget -O vid.mp4 https://some_link/download.mp4?lecture_id=5
It created an .mp4 file, but it didn't contain any video. Its size was also equal to the size of the HTML file that was created before.

You can use wget with parameters that download files recursively (for example, an HTML page together with all the resources it needs, such as images).
Alternatively, you can download the HTML page with wget, look for the mp4 URL in it using grep with a regular expression, and then run wget again to download only the mp4 file, as sketched below.
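A minimal sketch of that second approach, assuming the video link appears somewhere in the page markup (the page URL and the grep pattern below are placeholders to adapt):
# download the page that links to the video (placeholder URL)
wget -O page.html 'https://some_link/lecture_page?lecture_id=5'
# extract the first .mp4 URL from the markup (the pattern is a guess at the page structure)
video_url=$(grep -oE 'https?://[^" ]+\.mp4[^" ]*' page.html | head -n 1)
# download the actual video
wget -O vid.mp4 "$video_url"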

Related

Is it possible to get a file's owner url metadata in the macOS terminal?

I can access the metadata property "owner url" through Photoshop, but am hoping that there's a way to access it from the command line without having to open the file.
Does anyone know of a way to do this?
mdls doesn't list this particular metadata field.
There is no built-in command line tool to achieve this.
However, you can utilize exiftool, which is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in a wide variety of files.
Installation:
The guidelines for installing it on macOS can be found here. In summary:
Download the ExifTool OS X Package from the ExifTool home page.
(The file you download should be named ExifTool-11.17.dmg.)
Install as a normal OS X package.
(Open the disk image, double-click on the install package, and follow the instructions.)
You can now run exiftool by typing exiftool in a Terminal window.
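If you use Homebrew, installing from the command line may also work (assuming the exiftool formula is available in your setup):
$ brew install exiftool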
Processing a single file:
Reading the "owner url" via the command line:
Run the following command in a Terminal window:
$ exiftool -b -xmp:WebStatement ~/Desktop/path/to/image.psd
Note: the ~/Desktop/path/to/image.psd part in the command above should be replaced with a real image filepath.
This command will log the URL to the console only if the image metadata contains one. For instance:
https://www.example.com
Writing the "owner url" via the command line:
You can also write the "owner url" to a file by running the following command:
$ exiftool -xmp:WebStatement="https://www.foobar.com" ~/Desktop/path/to/image.psd
Note: As mentioned previously, the ~/Desktop/path/to/image.psd part in the command above should be replaced with a real image filepath, and the https://www.foobar.com part should be replaced with the actual URL you want to apply.
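To verify the write, you can read the tag back with the same read command shown earlier. Note that by default exiftool keeps a backup of the original file (image.psd_original); adding -overwrite_original suppresses that backup if you don't want it (same placeholder path as above):
$ exiftool -b -xmp:WebStatement ~/Desktop/path/to/image.psd
$ exiftool -overwrite_original -xmp:WebStatement="https://www.foobar.com" ~/Desktop/path/to/image.psd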
Processing multiple files:
Reading the "owner url" for multiple files via the command line:
If you want to read the "owner url" for all image files within a given folder (including those in subfolders) and generate a JSON report, you can run the following command:
$ exiftool -j -r -xmp:WebStatement ~/Desktop/path/to/folder/ -ext jpg -ext png -ext psd -ext tif > ~/Desktop/owner-urls.json
Breakdown of command (above):
-j - Use JSON formatting for output.
-r - Recursively process sub directories.
-xmp:WebStatement - Retrieve the WebStatement value, i.e. "owner url".
~/Desktop/path/to/folder/ - The path to the folder containing images (This should be replaced with a real path to a folder).
-ext jpg -ext png -ext psd -ext tif - The file extension(s) to process.
> ~/Desktop/owner-urls.json - Save the JSON output to a file on the Desktop named owner-urls.json.
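To skim the resulting report, something like the following may help (assuming jq is installed and that the tag appears under the key WebStatement in the JSON output):
$ jq -r '.[] | "\(.SourceFile): \(.WebStatement // "none")"' ~/Desktop/owner-urls.json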

How to download a page from GET url using wget

I am trying to download the Google search results page for GOP primaries results using wget, but I am not able to (this page). However, I noticed that the webpage gets its data from this file https://goo.gl/KPGSqS, which it fetches using a GET request.
So I was wondering if there is a way to download that file with wget. The usual way I do it is wget -c url, but that is not working. Any ideas on what I should do?
I tried with the user-agent option, but even that isn't working.
If you want to download a webpage's content (either rendered as plain text or as the HTML source), you could consider using lynx. Install lynx with sudo apt-get install lynx, and then you can save the page content using lynx -dump http://your.url/ > savefile.txt.
You can find out how to use lynx on this page.
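For example, a small sketch with a placeholder URL; -dump writes the rendered text of the page, while -source writes the raw HTML:
# rendered, human-readable text of the page
lynx -dump 'https://www.example.com/results?page=1' > results.txt
# raw HTML source of the same page
lynx -source 'https://www.example.com/results?page=1' > results.html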

Linux Shell Curl Command

I have a problem using the curl command: I have to download a file from a site whose URL has the form "www.example.com:8000/get.php?username=xxxx&password=xxxx".
What happens is this: if I open the link in the browser and the file exists, the download starts automatically, but if the link is not correct, nothing happens and the page displayed is blank.
My problem is that with the command
curl -o file.txt "www.example.com:8000/get.php?username=xxxx&password=xxxx"
the file is created either way: if the link is correct, the file downloads correctly, but if the link is not right, I still end up with a 0-byte .txt file.
How can I make it so that, if the link is not correct (and therefore there is no file to download), no file is created at all?
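A sketch of one way to handle this, assuming the server either returns an HTTP error status for a bad link (where --fail helps) or returns an empty 200 response (where a size check after the download is needed):
# --fail makes curl exit non-zero on HTTP errors (4xx/5xx) instead of saving the error page
curl --fail -o file.txt "www.example.com:8000/get.php?username=xxxx&password=xxxx"
# if the server answered with an empty body anyway, remove the zero-byte file
[ -s file.txt ] || rm -f file.txt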

How to download a file with wget that starts with a word and it has a specific extension?

I'm trying to write a bash script and I need to download certain files with wget,
like libfat-nds-1.0.11.tar.bz2, but over time the version of this file may change, so I would like to download a file that starts with libfat-nds and ends in .tar.bz2. Is this possible with wget?
Using only wget, this can be achieved by specifying a filename pattern with wildcards in the list of accepted files (the URL below is a placeholder for the directory you want to crawl):
wget -r -np -nd --accept='libfat-nds-*.tar.bz2' http://example.com/path/to/files/
The problem is that HTTP doesn't support wildcard downloads. But if directory listing is enabled on the server, or you have an index.html containing the available file names, you can download that, extract the file name you need, and then download the file itself with wget.
Something in this order
Download the index with curl
Use grep and/or sed to extract the exact file name
Download the file with wget (or curl)
If you pipe the commands together, you can do it on one line, as in the sketch below.
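A rough sketch of that sequence, assuming the server exposes a plain directory listing at a placeholder URL:
# 1. fetch the directory listing (placeholder URL)
curl -s 'http://example.com/downloads/' -o listing.html
# 2. extract the newest matching file name from the listing
file=$(grep -oE 'libfat-nds-[0-9.]+\.tar\.bz2' listing.html | sort -V | tail -n 1)
# 3. download that file
wget "http://example.com/downloads/$file"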

Creating a static copy of a web page on UNIX command line or shell script

I need to create a static copy of a web page (all media resources, like CSS, images and JS included) in a shell script. This copy should be openable offline in any browser.
Some browsers have similar functionality (Save As... Web Page, complete) which creates a folder for the page and rewrites external resources as relative, static resources in that folder.
What's a way to accomplish and automate this on the Linux command line for a given URL?
You can use wget like this:
wget --recursive --convert-links --domains=example.org http://www.example.org
This command will recursively download any page reachable by hyperlinks from the page at www.example.org, without following links outside the example.org domain.
Check wget manual page for more options for controlling recursion.
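If the goal is a single page with all of its resources (closer to the browser's "Web Page, complete" behaviour), a sketch along these lines may work: -p pulls in page requisites (CSS, images, scripts), -k rewrites links for local viewing, and -E adds .html extensions where needed (the URL is a placeholder):
wget -E -k -p http://www.example.org/some/page.html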
If you want to use wget to mirror a site, do:
$ wget -mk http://www.example.com/
Options:
-m, --mirror
Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
-k, --convert-links
After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
