How to use curl to get an image with the right filename and extension - bash

My links look like the one below:
https://cdn.sspai.com/2022/06/22/article/a88df95f3401d5b6c9d716bf31eeef33?imageView2/2/w/1120/q/90/interlace/1/ignore-error/1
If I use Chrome to open this link and press Cmd+S,
I get the right filename and the right extension (png).
But if I use the bash command below, the downloaded file has no extension:
curl -J -O https://cdn.sspai.com/2022/06/22/article/a88df95f3401d5b6c9d716bf31eeef33?imageView2/2/w/1120/q/90/interlace/1/ignore-error/1
I just want to download the image with the right filename and extension:
a88df95f3401d5b6c9d716bf31eeef33.png
The same problem occurs with other image links, for example:
https://cdn.sspai.com/article/fa848601-4cdf-38b0-b020-7afd6efc4a7e.jpg?imageMogr2/auto-orient/quality/95/thumbnail/!800x400r/gravity/Center/crop/800x400/interlace/1

You can get the name from the URL itself.
url="YOUR-URL"
file="`echo "${url}" | sed 's|\?.*|.jpg|' | xargs basename`"
curl -o "${file}.tmp" "${url}"
mv "${file}.tmp" "${file}"
Hope it helps

Related

How to download the latest binary release from GitHub?

I want to download the two binaries (.bin and .zip) from the latest release.
I tried using the following command
curl -s https://github.com/Atmosphere-NX/Atmosphere/releases/latest | grep "browser_download_url.*zip" | cut -d : -f 2,3 | tr -d \" | wget -qi -
but nothing happens; the only output is SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
I'm open to using any other commands (wget, curl, etc.).
Are you trying to extract the download link from the HTML page? That's error-prone and may break at any time.
For such operations, check if they offer an API first.
They do: https://docs.github.com/en/rest/reference/releases#get-the-latest-release
You could write something like this (adjust the jq filter to the asset you want):
curl -s \
  -H "Accept: application/vnd.github.v3+json" \
  https://api.github.com/repos/Atmosphere-NX/Atmosphere/releases/latest \
  | jq -r '.assets[0].browser_download_url' \
  | xargs wget -q
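Since the question asks for both the .bin and the .zip asset, one refinement is to select assets by name before downloading. A sketch (the endswith filters are an assumption about how the assets are named):
curl -s \
  -H "Accept: application/vnd.github.v3+json" \
  https://api.github.com/repos/Atmosphere-NX/Atmosphere/releases/latest \
  | jq -r '.assets[] | select(.name | endswith(".bin") or endswith(".zip")) | .browser_download_url' \
  | xargs -n 1 wget -q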
As suggested in the comments, test each pipe-separated command individually.
You can use the GitHub CLI, specifically the release download command:
gh release download --repo Atmosphere-NX/Atmosphere --pattern '*.bin'
gh release download --repo Atmosphere-NX/Atmosphere --archive zip
Without specifying a release tag, the command defaults to the latest release.
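If I remember correctly, --pattern can be repeated, so both assets from the question can be fetched in one call (worth confirming against gh release download --help):
gh release download --repo Atmosphere-NX/Atmosphere --pattern '*.bin' --pattern '*.zip'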
Just running curl on the url gives this:
curl https://github.com/Atmosphere-NX/Atmosphere/releases/latest
<html><body>You are being redirected.</body>
So you can easily see straight off that something is amiss. Checking the curl help, you can pinpoint the option you need:
curl --help | grep redirect
-L, --location Follow redirects
--max-redirs <num> Maximum number of redirects allowed
--proto-redir <protocols> Enable/disable PROTOCOLS on redirect
--stderr Where to redirect stderr
The first clue is the redirect in the response, and the help output shows there is a flag to handle it.
Running the command with the -L flag gives the expected output. Piping it to grep "browser_download_url.*zip", however, gives you nothing. You could then investigate what the right match would be, but let's try matching just the HTML links containing zip, to see what happens.
curl -sL https://github.com/Atmosphere-NX/Atmosphere/releases/latest | grep "href=.*zip"
<a href="/Atmosphere-NX/Atmosphere/releases/download/1.2.6/atmosphere-1.2.6-master-173d5c2d3+hbl-2.4.1+hbmenu-3.5.0.zip" rel="nofollow" data-skip-pjax>
<a href="/Atmosphere-NX/Atmosphere/archive/refs/tags/1.2.6.zip" rel="nofollow" data-skip-pjax>
From there you can probably find what you are after to construct your command. As you can see, the links are relative with this method, so you still have to prepend the base URL for wget (or a curl equivalent) to finally be able to download what you are after.
This is more a reply to get you going on troubleshooting. You already have other answers that actually do what you want. But if you can't install the suggested tools, you could probably do something like this:
curl -sL https://github.com/Atmosphere-NX/Atmosphere/releases/latest |
  awk '/releases\/download/ && done != 1 {
    sub(/.*href="/, "https://github.com")   # make the relative link absolute
    sub(/".*/, "")                          # strip everything after the closing quote
    print
    done = 1                                # only take the first match
  }' |
  xargs curl -LsO
Not suggesting this is a good way, just a way.

return filename after downloading file with curl

I would like to capture, in a variable, the filename of a file downloaded using curl. I am using the --remote-name flag to preserve the remote filename, as below.
My code:
file1=$(curl -O --remote-name 'https://url.com/download_file.tgz')
echo $file1
You can use the -w|--write-out switch of curl:
file1="$(curl -O --remote-name -s \
-w "%{filename_effective}" "https://url.com/download_file.tgz")"
echo "$file1"
file1=download_file.tgz
url="https://url.com/$file1"   # URL-encoding this might be necessary
curl -O "$url"
echo "$file1"
If you need to know the filename in order to construct the URL in the first place, then you don't need anything from curl to identify the file it downloaded, unless there is no 1:1 relationship between the basename of the URL and the file that was downloaded.
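For that non-1:1 case, where the server chooses the name via a Content-Disposition header, you could read the header yourself before downloading. A rough sketch with a hypothetical URL, assuming the server answers HEAD requests, actually sends such a header, and that this simplistic parsing is good enough:
url='https://url.com/download'                      # hypothetical URL
name="$(curl -sIL "$url" | tr -d '\r' |
  sed -n 's/.*[Cc]ontent-[Dd]isposition:.*filename="\{0,1\}\([^";]*\).*/\1/p' | tail -n 1)"
curl -sL -o "${name:-download.out}" "$url"          # fall back to a fixed name if no header was sent
echo "$name"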

wget to parse a webpage in shell

I am trying to extract URLS from a webpage using wget. I tried this
wget -r -l2 --reject=gif -O out.html www.google.com | sed -n 's/.*href="\([^"]*\).*/\1/p'
It displays FINISHED
Downloaded: 18,472 bytes in 1 files
But it is not displaying the web links. If I try to do it separately:
wget -r -l2 --reject=gif -O out.html www.google.com
sed -n 's/.*href="\([^"]*\).*/\1/p' < out.html
Output
http://www.google.com/intl/en/options/
/intl/en/policies/terms/
It is not displaying all the links:
http://www.google.com
http://maps.google.com
https://play.google.com
http://www.youtube.com
http://news.google.com
https://mail.google.com
https://drive.google.com
http://www.google.com
http://www.google.com
http://www.google.com
https://www.google.com
https://plus.google.com
Moreover, I want to get links from the 2nd level and deeper. Can anyone give a solution for this?
Thanks in advance
The -O file option captures the output of wget and writes it to the specified file, so there is no output going through the pipe to sed.
You can say -O - to direct wget output to standard output.
If you don't want to use grep, you can try
sed -n "/href/ s/.*href=['\"]\([^'\"]*\)['\"].*/\1/gp"

How to download a file using curl

I'm on Mac OS X and can't figure out how to download a file from a URL via the command line. It's from a static page, so I thought copying the download link and then using curl would do the trick, but it isn't working.
I referenced this StackOverflow question but that didn't work. I also referenced this article which also didn't work.
What I've tried:
curl -o https://github.com/jdfwarrior/Workflows.git
curl: no URL specified!
curl: try 'curl --help' or 'curl --manual' for more information
wget -r -np -l 1 -A zip https://github.com/jdfwarrior/Workflows.git
zsh: command not found: wget
How can a file be downloaded through the command line?
The -o (--output) option means curl writes the output to the file you specify instead of stdout. Your mistake was putting the url after -o, so curl treated the url as the file to write to and concluded that no url was specified. You need a file name after -o, then the url:
curl -o ./filename https://github.com/jdfwarrior/Workflows.git
And wget is not available by default on OS X.
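If you do want wget on macOS, it can usually be installed with Homebrew (assuming Homebrew is already set up):
brew install wget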
curl -OL https://github.com/jdfwarrior/Workflows.git
-O: write the output to a local file named like the remote file we get. In this case that file would be Workflows.git.
-L: if the server reports that the requested page has moved to a different location (indicated by a Location: header and a 3XX response code), this option makes curl redo the request at the new location.
Ref: curl man page
The easiest solution for your question is to keep the original filename. In that case, you just need to use a capital O ("-O") as the option (not a zero!). It looks like this:
curl -O https://github.com/jdfwarrior/Workflows.git
There are several options to make curl output to a file
# saves it to myfile.txt
curl http://www.example.com/data.txt -o myfile.txt -L
# The #1 gets substituted with whatever the first {} glob in the URL matched,
# so each download gets a filename containing that part of the url
curl "http://www.example.com/{data,extra}.txt" -o "file_#1.txt" -L
# saves to data.txt, the filename extracted from the URL
curl http://www.example.com/data.txt -O -L
# saves to filename determined by the Content-Disposition header sent by the server.
curl http://www.example.com/data.txt -O -J -L
# -O Write output to a local file named like the remote file we get
# -o <file> Write output to <file> instead of stdout (variable replacement performed on <file>)
# -J Use the Content-Disposition filename instead of extracting filename from URL
# -L Follow redirects

get veehd url in bash/python?

Can anybody figure out how to get the .avi URL of a veehd[dot]com video, by providing the page of the video in a script? It can be BASH, or Python, or common programs in Ubuntu.
They make you install an extension, and I've tried looking at the code, but I can't figure it out.
This worked for me:
#!/bin/bash
URL=$1   # page with the video
# pull the player iframe URL out of the page, then the direct link out of the iframe
FRAME="$(wget -q -O - "$URL" | sed -n -e '/playeriframe.*do=d/{s/.*src : "//;s/".*//p;q}')"
STREAM="$(wget -q -O - "http://veehd.com$FRAME" | sed -n -e '/<a/{s/.*href="//;s/".*//p;q}')"
echo "$STREAM"
