I'm attempting to download an image from Google Books with wget (I've tried curl as well), and I continually get a 500 error:
// COMMAND
wget "http://books.google.com/books/content?id=pztHgTT4BGUC&printsec=frontcover&img=1"
// OUTPUT
--2016-07-13 20:58:06-- http://books.google.com/books/content?id=pztHgTT4BGUC&printsec=frontcover&img=1
Resolving books.google.com... 216.58.194.206, 2607:f8b0:4005:801::200e
Connecting to books.google.com|216.58.194.206|:80... connected.
HTTP request sent, awaiting response... 500 Internal Server Error
2016-07-13 20:58:06 ERROR 500: Internal Server Error.
It fails for the same reason the URL will fail in a browser if you're not logged into Google: The server refuses to serve you the content unless you're logged in.
You can probably log in with a browser, copy a session cookie from that session, and use it in wget.
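For example, wget can send the copied cookie either as a raw header or from an exported cookies file; the cookie name SID below is just a placeholder for whatever your browser session actually holds:
# paste the real cookie name/value from a logged-in browser session
wget --header "Cookie: SID=VALUE_FROM_BROWSER" "http://books.google.com/books/content?id=pztHgTT4BGUC&printsec=frontcover&img=1"
# or export the browser cookies to a Netscape-format file and load them:
wget --load-cookies cookies.txt "http://books.google.com/books/content?id=pztHgTT4BGUC&printsec=frontcover&img=1"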
Related
We're getting the following error when trying to update the Jenkins plugins. We have a proxy configured:
Failure -
java.io.IOException: Server returned HTTP response code: 401 for URL: https://updates.jenkins.io/download/plugins/active-directory/2.26/active-directory.hpi
A 401 status code means authorisation failed: the server requires valid credentials for the request.
https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2
Try verifying your username and password.
https://updates.jenkins.io/download/plugins/active-directory/2.26/active-directory.hpi is globally accessible.
A simple wget from the Jenkins machine will download the plugin if the proxy is configured correctly; see the example below.
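Something like this, run on the Jenkins machine, checks both the proxy path and the URL (proxy.example.com:8080 is a placeholder for your actual proxy host and port):
# wget honours the https_proxy environment variable; substitute your real proxy
https_proxy=http://proxy.example.com:8080 wget -O active-directory.hpi https://updates.jenkins.io/download/plugins/active-directory/2.26/active-directory.hpi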
I'm trying to use WebUpd8 team's oracle-java8-installer to install Java 8 on my Ubuntu 14.04 computers. Some of them succeeded but others failed. After some debugging, I realized the failures were caused by the HTTP proxy setting. I'll provide more details below, but basically my question is: why does the use of http_proxy cause the problem? I believe it must be related to how an HTTP proxy works, but since I have little experience in that area, could someone tell me what I should learn to understand this issue?
Here are more details.
Under the hood, the oracle-java8-installer uses wget to download the jdk-8u181 package. So I can reproduce the issue with the steps below:
Install apt-cacher-ng: sudo apt-get install apt-cacher-ng
You don't have to configure anything in the APT configuration to reproduce this problem. apt-cacher-ng uses localhost:3142 by default to cache the packages.
Run http_proxy="http://localhost:3142" wget --continue --no-check-certificate -O jdk-8u181-linux-x64.tar.gz --header "Cookie: oraclelicense=a" http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz
Here are some notes:
The proxy http://localhost:3142 points at apt-cacher-ng. The machines that failed had apt-cacher-ng installed before I tried to install jdk-8u181.
The Cookie: oraclelicense=a header indicates that the user has accepted the license.
If you run the last command, the download of jdk-8u181-linux-x64.tar.gz finishes almost instantly, with a line saying "Proxy request sent, awaiting response... 200 OK". But if you open the received ".tar.gz", you'll see it's merely an HTML page containing error information (a quick check is shown after these notes).
If you remove the http_proxy environment variable and run:
wget --continue --no-check-certificate -O jdk-8u181-linux-x64.tar.gz --header "Cookie: oraclelicense=a" http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz
You will have the full package downloaded correctly.
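A quick way to confirm what the proxied run actually produced (the exact output will vary, but on the failing machines the file should be reported as HTML text rather than gzip data):
file jdk-8u181-linux-x64.tar.gz
head -c 200 jdk-8u181-linux-x64.tar.gz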
My best guess is that an HTTP proxy works with wget if the target URL is the final URL, so the proxy would cache it in its storage. Conceptually, it's like a key-value store:
proxy['URL'] = result
However, in this case, the target URL (http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz) actually returns a "302" code and a "Location" header field for the new URL. This can be seen from the output:
ywen@ubuntu:~$ wget --continue --no-check-certificate -O jdk-8u181-linux-x64.tar.gz --header "Cookie: oraclelicense=a" http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz
--2018-08-01 11:10:04-- http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz
Resolving download.oracle.com (download.oracle.com)... 23.32.72.143
Connecting to download.oracle.com (download.oracle.com)|23.32.72.143|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://edelivery.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz [following]
--2018-08-01 11:10:04-- https://edelivery.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz
Resolving edelivery.oracle.com (edelivery.oracle.com)... 23.216.148.161, 2001:559:19:3081::2d3e, 2001:559:19:3086::2d3e
Connecting to edelivery.oracle.com (edelivery.oracle.com)|23.216.148.161|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz?AuthParam=1533136324_72efc4e6208a5a7fc1cbba0527c741b6 [following]
--2018-08-01 11:10:04-- http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz?AuthParam=1533136324_72efc4e6208a5a7fc1cbba0527c741b6
Connecting to download.oracle.com (download.oracle.com)|23.32.72.143|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 185646832 (177M) [application/x-gzip]
Saving to: ‘jdk-8u181-linux-x64.tar.gz’
Handling the redirection is beyond the capability of the proxy (am I right?), and therefore the machines configured with the HTTP proxy failed.
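One way to see what the proxy itself returns for the first request would be to ask it for just the response headers, for example with curl (a sketch; localhost:3142 as above, -x sets the proxy and -I requests headers only):
curl -sI -x http://localhost:3142 -H "Cookie: oraclelicense=a" http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz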
I was trying to download Magento 2.0.5 on Linux using the command:
wget http://www.magentocommerce.com/downloads/assets/2.0.4/magento-2.0.4.tar.gz
The error returned:
Connecting to magento.com (magento.com)|66.211.190.110|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
Has the download URL changed, or are there some new permission constraints I need to satisfy?
Try here:
https://github.com/magento/magento2/releases
If you are looking for a direct download link, use:
https://github.com/magento/magento2/archive/2.3.0.tar.gz
I tried it, and this link works.
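For example, something like this should fetch it (the -O just gives the archive a clearer local name than the default 2.3.0.tar.gz):
wget -O magento2-2.3.0.tar.gz https://github.com/magento/magento2/archive/2.3.0.tar.gz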
Currently I'm trying to get my script to download a MySQL file from a website, but it seems to hit a 302 redirect. When I use the exact same link in my regular (Windows) browser, it downloads the file.
Here's the output from the wget:
--2013-06-07 09:42:40-- http://6pp.kvdb.net/exports/mysql_sql.txt.gz
Resolving 6pp.kvdb.net... 2a01:7c8:eb:0:95:170:70:116, 212.78.187.48
Connecting to 6pp.kvdb.net|2a01:7c8:eb:0:95:170:70:116|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.d-centralize.nl/exports/mysql_sql.txt.gz [following]
--2013-06-07 09:42:40-- http://www.d-centralize.nl/exports/mysql_sql.txt.gz
Resolving www.d-centralize.nl... 2a00:1450:400c:c03::79, 173.194.66.121
Connecting to www.d-centralize.nl|2a00:1450:400c:c03::79|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2013-06-07 09:42:40 ERROR 404: Not Found.
As mentioned, the URL http://6pp.kvdb.net/exports/mysql_sql.txt.gz downloads the file just fine in a Windows browser.
Looks like 6pp.kvdb.net redirects to www.d-centralize.nl and www.d-centralize.nl has both an IPv4 and an IPv6 address:
$ host www.d-centralize.nl
www.d-centralize.nl is an alias for ghs.google.com.
ghs.google.com is an alias for ghs.l.google.com.
ghs.l.google.com has address 173.194.69.121
ghs.l.google.com has IPv6 address 2a00:1450:4008:c01::79
Their webserver seems to be misconfigured. It is listening on both addresses, but serving files only on the IPv4 address. As your box is IPv6 enabled it will prefer the IPv6 address, which is the broken one.
Try wget -4 ... to force the IPv4 address of the server.
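For this URL that would be something like:
wget -4 http://6pp.kvdb.net/exports/mysql_sql.txt.gz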
I am using a script to pull down some XML data from an authentication-required URL with WGET.
In doing so, my script produces the following output for each url accessed (IPs and hostnames changed to protect the guilty):
> Resolving host.name.com... 127.0.0.1
> Connecting to host.name.com|127.0.0.1|:80... connected.
> HTTP request sent, awaiting response... 401 Access denied
> Connecting to host.name.com|127.0.0.1|:80... connected.
> HTTP request sent, awaiting response... 401 Unauthorized
> Reusing existing connection to host.name.com:80.
> HTTP request sent, awaiting response... 200 OK
Why does WGET complain that accessing the URL fails twice before successfully connecting? Is there a way to shut it up, or get it to connect properly in the first attempt?
For reference, here's the line I am using to call WGET:
wget --http-user=USERNAME --password=PASSWORD -O file.xml http://host.name.com/file.xml
This appears to be by design. Following the advice of @Wayne Conrad, I added the -d switch and was able to observe the first attempt failing because NTLM was required, and the second attempt failing because the first NTLM attempt was only level 1, whereas a level 3 NTLM challenge-response was required. WGET finally provides the needed authentication on the third attempt.
WGET does get a cookie that prevents re-authenticating for the duration of the session, which would avoid this if the connection weren't terminated between files. I would need to pass WGET a list of files for that to happen; however, I am unable to, because I do not know the file names in advance.
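For reference, feeding WGET such a list would look roughly like this, with urls.txt being a hypothetical file containing one URL per line:
wget --http-user=USERNAME --password=PASSWORD -i urls.txt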
You seem to have a newer version of wget. After 1.10.2, wget will not send authentication unless challenged by the server first, which is why the first attempt fails. The second fails because of what you described.
You can eliminate one of the failures by adding the parameter --auth-no-challenge. This sends the first request with "basic" authentication, which will fail, and the second will be sent in "digest" mode, which should work.
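Applied to the command above, that would be something like:
wget --auth-no-challenge --http-user=USERNAME --password=PASSWORD -O file.xml http://host.name.com/file.xml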