I currently have been blocked from http://www.apkmirror.com, so a wget produces an ERROR 403: Forbidden:
kurt@kurt-ThinkPad:~$ wget http://www.apkmirror.com
--2017-04-21 12:51:42-- http://www.apkmirror.com/
Resolving www.apkmirror.com (www.apkmirror.com)... 104.19.135.58, 104.19.132.58, 104.19.133.58, ...
Connecting to www.apkmirror.com (www.apkmirror.com)|104.19.135.58|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2017-04-21 12:51:42 ERROR 403: Forbidden.
I'm trying to use a proxy from https://free-proxy-list.net/ to gain access. For example, following https://www.gnu.org/software/wget/manual/html_node/Proxies.html, I've tried
kurt@kurt-ThinkPad:~$ wget http://www.apkmirror.com@70.32.89.160:3128
--2017-04-21 12:57:56-- http://www.apkmirror.com@70.32.89.160:3128/
Connecting to 70.32.89.160:3128... connected.
HTTP request sent, awaiting response... 400 Bad Request
2017-04-21 12:57:59 ERROR 400: Bad Request.
but I get an ERROR 400: Bad Request. Is there anything wrong with this (attempted) usage of wget?
wget does not take the proxy as part of the URL; it reads the proxy from its configuration. Set it up as follows.
Go to your home directory on your server or computer with cd ~
Create or open the file ~/.wgetrc in an editor, for example vi ~/.wgetrc
Then add your proxy settings to the file:
use_proxy = on
http_proxy = http://70.32.89.160:3128
https_proxy = http://70.32.89.160:3128
ftp_proxy = http://70.32.89.160:3128
Now you can reach the blocked site. Either pass the proxy explicitly on the command line:
wget -e use_proxy=yes -e http_proxy=http://70.32.89.160:3128 http://www.apkmirror.com
or rely on ~/.wgetrc and simply run wget http://www.apkmirror.com. The output looks like this:
root@ubuntu:~# wget www.apkmirror.co
--2017-04-21 08:12:45-- http://www.apkmirror.co/
Connecting to 70.32.89.160:3128... connected.
Proxy request sent, awaiting response... 301 Moved Permanently
Location: http://www.apkmirror.co/ [following]
--2017-04-21 08:12:47-- http://www.apkmirror.co/
Connecting to 70.32.89.160:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 76512 (75K) [text/html]
Saving to: ‘index.html.8’
100%[===================================================================================================================>] 76,512 447KB/s in 0.2s
2017-04-21 08:12:49 (447 KB/s) - ‘index.html.8’ saved [76512/76512]
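The command-line form above can be wrapped in a small shell function so the proxy is stated only once. This is a minimal sketch, not a definitive setup: wget_via_proxy is a hypothetical name, and using the same proxy for both http_proxy and https_proxy is an assumption that mirrors the .wgetrc shown earlier.

```shell
#!/bin/sh
# wget_via_proxy PROXY_URL [wget arguments...]
# Hypothetical helper: routes a single wget call through PROXY_URL
# without modifying ~/.wgetrc.
wget_via_proxy() {
    proxy=$1
    shift
    # -e runs a .wgetrc-style command for this invocation only.
    wget -e use_proxy=yes \
         -e http_proxy="$proxy" \
         -e https_proxy="$proxy" \
         "$@"
}
```

Usage would then be: wget_via_proxy http://70.32.89.160:3128 http://www.apkmirror.com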
You can also pass the proxy, including a username and password, directly on the command line:
wget -e use_proxy=yes -e http_proxy=http://ilanni:123456@10.10.10.128:3128 http://oss.aliyuncs.com/aliyunecs/rds_backup_extract.sh
To run wget through a SOCKS5 proxy, use tsocks:
Install tsocks: sudo apt install tsocks
Configure tsocks by editing /etc/tsocks.conf:
# vi /etc/tsocks.conf
server = 127.0.0.1
server_type = 5
server_port = 1080
Then start wget under tsocks: tsocks wget http://url_to_get
I have a bucket that allows for open files. I have uploaded a test file called test.gsm and have tried to presign the file by doing
root@server2:~# aws s3 presign s3://dovid-ft/test.gsm --expires-in 604800
https://dovid-ft.s3.amazonaws.com/test.gsm?AWSAccessKeyId=AKIAJSDPJKCCGAZ257VQ&Signature=0zbBU2B%2FKVrqgOXFQNTGh3gme%2Fo%3D&Expires=1625658191
root@server2:~#
If I then try to grab that file I get a 403.
root@server2:~# wget 'https://dovid-ft.s3.amazonaws.com/test.gsm?AWSAccessKeyId=AKIAJSDPJKCCGAZ257VQ&Signature=0zbBU2B%2FKVrqgOXFQNTGh3gme%2Fo%3D&Expires=1625658191'
--2021-06-30 07:49:21-- https://dovid-ft.s3.amazonaws.com/test.gsm?AWSAccessKeyId=AKIAJSDPJKCCGAZ257VQ&Signature=0zbBU2B%2FKVrqgOXFQNTGh3gme%2Fo%3D&Expires=1625658191
Resolving dovid-ft.s3.amazonaws.com (dovid-ft.s3.amazonaws.com)... 52.217.88.204
Connecting to dovid-ft.s3.amazonaws.com (dovid-ft.s3.amazonaws.com)|52.217.88.204|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-06-30 07:49:21 ERROR 403: Forbidden.
root@server2:~#
I also tried URL-decoding the signature in the query string to see if that would help, but it did not.
root@server2:~# wget 'https://dovid-ft.s3.amazonaws.com/test.gsm?AWSAccessKeyId=AKIAJSDPJKCCGAZ257VQ&Signature=0zbBU2B/KVrqgOXFQNTGh3gme/o=&Expires=1625658191'
--2021-06-30 07:49:37-- https://dovid-ft.s3.amazonaws.com/test.gsm?AWSAccessKeyId=AKIAJSDPJKCCGAZ257VQ&Signature=0zbBU2B/KVrqgOXFQNTGh3gme/o=&Expires=1625658191
Resolving dovid-ft.s3.amazonaws.com (dovid-ft.s3.amazonaws.com)... 52.217.32.100
Connecting to dovid-ft.s3.amazonaws.com (dovid-ft.s3.amazonaws.com)|52.217.32.100|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-06-30 07:49:37 ERROR 403: Forbidden.
root@server2:~#
Is there any way to get logs or see what the issue is and why my request is being rejected? As of now the only way to be able to get the file is to make it publicly available which I don't want to do.
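It turns out I was using the wrong credentials to presign the file. Why Amazon didn't throw an error when I tried to presign with the wrong credentials is beyond me. (Presigning is computed locally by the CLI without contacting S3, so bad credentials only surface when the URL is actually used.)
The fix is in how wget is invoked. After recreating the scenario, I also was not able to download the file with a plain wget call. Quote the URL and name the output file explicitly:
wget -O test.gsm "https://yourURL" # this will do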
reference: Amazon AWS S3 signed URL via Wget
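When a presigned URL returns 403, two quick checks can narrow down the cause. The sketch below is hedged: all three function names are hypothetical, presign_expires assumes the older query format with an Expires= epoch timestamp as in the URL in the question, and s3_error_body assumes a wget build that supports --content-on-error. S3 normally answers a rejected request with a small XML document whose Code element names the reason, which wget hides by default.

```shell
#!/bin/sh
# Print the Expires= value (epoch seconds) from a presigned URL, if present.
presign_expires() {
    printf '%s\n' "$1" | sed -n 's/.*[?&]Expires=\([0-9][0-9]*\).*/\1/p'
}

# Succeed if the URL has expired. The optional second argument is the
# time to compare against (epoch seconds); it defaults to the current time.
presign_is_expired() {
    expires=$(presign_expires "$1")
    now=${2:-$(date +%s)}
    [ -n "$expires" ] && [ "$now" -ge "$expires" ]
}

# Print the response body even when the server answers with an error status.
s3_error_body() {
    wget -q --content-on-error -O - "$1"
}
```

For example, s3_error_body 'https://yourURL' prints the XML error returned by S3 instead of only "ERROR 403: Forbidden".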
I'm using wget to retrieve the Instagram JSON from the URL https://www.instagram.com/instagram/?__a=1.
Running wget from my local Manjaro setup returns a 200 OK and the proper JSON response, but running it from a Debian server returns a 302 Found.
At first I thought it could be because of the wget version differences, but running curl locally also works while wget doesn't work properly.
Is there anything that I should be setting up on my server to get a proper response? My guess is that the HTTPS connection is refusing my server from connecting properly.
So, this is a weird quirk of the Instagram servers. Nothing you can do about it.
The problem is that Instagram responds differently depending on whether you connect to their server over IPv4 or IPv6. Why they would do that is beyond me, but I can reliably reproduce the result by controlling for only this variable.
IPv4:
$ wget -O/dev/null -4 "https://www.instagram.com/instagram/?__a=1"
--2020-09-03 14:22:15-- https://www.instagram.com/instagram/?__a=1
Resolving www.instagram.com (www.instagram.com)... 157.240.27.174
Connecting to www.instagram.com (www.instagram.com)|157.240.27.174|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 118552 (116K) [application/json]
Saving to: ‘/dev/null’
100%[================================================================================================================================>] 118,552 306KB/s in 0.4s
2020-09-03 14:22:17 (306 KB/s) - ‘/dev/null’ saved [118552/118552]
IPv6:
$ wget -O/dev/null -6 "https://www.instagram.com/instagram/?__a=1"
--2020-09-03 14:22:54-- https://www.instagram.com/instagram/?__a=1
Resolving www.instagram.com (www.instagram.com)... 2a03:2880:f23f:e5:face:b00c:0:4420
Connecting to www.instagram.com (www.instagram.com)|2a03:2880:f23f:e5:face:b00c:0:4420|:443... connected.
HTTP request sent, awaiting response... 302 Found
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Location: https://www.instagram.com/accounts/login/?next=/instagram/%3F__a%3D1 [following]
--2020-09-03 14:22:54-- https://www.instagram.com/accounts/login/?next=/instagram/%3F__a%3D1
Reusing existing connection to [www.instagram.com]:443.
HTTP request sent, awaiting response... 200 OK
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Length: 48094 (47K) [text/html]
Saving to: ‘/dev/null’
100%[================================================================================================================================>] 48,094 --.-K/s in 0.04s
2020-09-03 14:22:54 (1.28 MB/s) - ‘/dev/null’ saved [48094/48094]
This is the same thing you see in your debug logs. On Manjaro, wget makes an IPv4 connection, while on Debian it makes an IPv6 connection, which leads to the difference.
Welcome to the world of crazy webservers :)
In any case, the answer to your question is to force an IPv4 connection, for example with wget -4.
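To check whether a server really answers differently per address family, the two runs above can be condensed into one helper. This is only a sketch: status_by_family is a hypothetical name, and it assumes the host has both IPv4 and IPv6 connectivity (otherwise one of the lines comes back empty).

```shell
#!/bin/sh
# Print the first HTTP status line wget receives over IPv4 and over IPv6.
status_by_family() {
    url=$1
    for family in -4 -6; do
        # -S prints the server response headers on stderr;
        # --max-redirect=0 stops at the first response instead of following a 302.
        status=$(wget "$family" --max-redirect=0 -S -O /dev/null "$url" 2>&1 |
                 sed -n '/^ *HTTP\//{s/^ *//;p;q;}')
        printf '%s %s\n' "$family" "$status"
    done
}
```

Usage: status_by_family "https://www.instagram.com/instagram/?__a=1"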
I have a problem submitting wget jobs to condor.
I can use wget to download a file from a URL on the command line:
$ wget https://wordpress.org/latest.zip
--2019-01-24 16:43:42-- https://wordpress.org/latest.zip
Resolving wordpress.org... 198.143.164.252
Connecting to wordpress.org|198.143.164.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11383968 (11M) [application/zip]
Saving to: “latest.zip”
100%[======================================>] 11,383,968 --.-K/s in 0.09s
2019-01-24 16:43:43 (117 MB/s) - “latest.zip” saved [11383968/11383968]
But if I save the command to a shell script file "test.sh", like this:
#!/bin/sh
wget https://wordpress.org/latest.zip
And then submit it to condor:
#!/usr/bin/env condor_submit
Executable = test.sh
Universe = vanilla
output = tmp.out
error = tmp.error
Log = tmp.log
Queue 1
It fails with "Connection timed out":
--2019-01-24 16:53:50-- https://wordpress.org/latest.zip
Resolving wordpress.org... 198.143.164.252
Connecting to wordpress.org|198.143.164.252|:443... failed: Connection timed out
But test.sh works fine from the command line:
$ ./test.sh
I then changed "test.sh" to:
#!/bin/sh
wget --debug --verbose https://wordpress.org/latest.zip
And the output is:
Setting --verbose (verbose) to 1
DEBUG output created by Wget 1.12 on linux-gnu.
--2019-01-24 17:25:58-- https://wordpress.org/latest.zip
Resolving wordpress.org... 198.143.164.252
Caching wordpress.org => 198.143.164.252
Connecting to wordpress.org|198.143.164.252|:443... Closed fd 3
failed: Connection timed out.
I suspect the problem is with "https": the SSL/TLS handshake is failing.
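If you connect to an HTTPS site with a web browser, you use the browser's trust store (very often, this is one and the same as your desktop's Windows trust store). wget, by contrast, relies on the CA certificates configured for its TLS library on the machine where it runs.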
SUGGESTIONS:
Verify your "test.sh" works from the command line.
Modify your script to run wget --debug --verbose ..., and check the output.
If only for troubleshooting purposes, try --no-check-certificate, too.
Look here for more details:
GNU Wget 1.18 Manual
ADDENDUM:
If you run wget --debug --verbose ..., you should see something like this:
$ wget --debug --verbose https://wordpress.org/latest.zip
Setting --verbose (verbose) to 1
Setting --verbose (verbose) to 1
DEBUG output created by Wget 1.19.4 on linux-gnu.
Reading HSTS entries from /home/xxxxx/.wget-hsts
URI encoding = ‘UTF-8’
Converted file name 'latest.zip' (UTF-8) -> 'latest.zip' (UTF-8)
--2019-01-24 15:49:12-- https://wordpress.org/latest.zip
Resolving wordpress.org (wordpress.org)... 198.143.164.252
Caching wordpress.org => 198.143.164.252
Connecting to wordpress.org (wordpress.org)|198.143.164.252|:443... connected.
Created socket 3.
Releasing 0x000055afcdc64650 (new refcount 1).
Initiating SSL handshake.
Handshake successful; connected socket 3 to SSL handle 0x000055afcdc64750
certificate:
subject: CN=*.wordpress.org,OU=Domain Control Validated
issuer: CN=Go Daddy Secure Certificate Authority - G2,OU=http://certs.godaddy.com/repository/,O=GoDaddy.com\\, Inc.,L=Scottsdale,ST=Arizona,C=US
X509 certificate successfully verified and matches host wordpress.org
---request begin---
GET /latest.zip HTTP/1.1
User-Agent: Wget/1.19.4 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: wordpress.org
Connection: Keep-Alive
...
If you DON'T see ANY of this ... I'd contact your network administrator about a firewall or proxy that might be blocking your Condor app.
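Since the same script behaves differently under Condor, it can help to log where the job actually runs and which proxy settings it sees before wget is called. A minimal sketch: report_env is a hypothetical name, and the assumption is that a proxy, if any, is configured through the usual http_proxy and https_proxy environment variables.

```shell
#!/bin/sh
# Print the node name and the proxy variables visible to this process.
report_env() {
    echo "node=$(uname -n)"
    echo "http_proxy=${http_proxy:-<unset>}"
    echo "https_proxy=${https_proxy:-<unset>}"
}
```

Calling report_env at the top of test.sh and comparing tmp.out from the Condor run with the output of an interactive run shows whether the job lands on a different machine or loses its proxy settings.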
I want to download all the png images in http://media.pldh.net/pokemon/go/ directory with wget. I am sure there are images in the directory: http://media.pldh.net/pokemon/go/bulbasaur.png
This is what I use:
wget -r -nd -np -A.png http://media.pldh.net/pokemon/go/
This is the output:
--2016-08-06 13:46:20-- http://media.pldh.net/pokemon/go/
Resolving media.pldh.net (media.pldh.net)... 192.185.70.249
Connecting to media.pldh.net (media.pldh.net)|192.185.70.249|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-08-06 13:46:21 ERROR 404: Not Found.
Is it possible to use wget to download all the images recursively?
I can open the page in a browser, but I can't download the HTML page with wget.
https://money.benck.tw
When I use wget, it can't even connect to the website:
--2011-10-12 05:30:24-- https://money.benck.tw/
Resolving money.benck.tw... 97.107.135.68
Connecting to money.benck.tw|97.107.135.68|:443... failed: Connection timed out.
Retrying.
--2011-10-12 05:33:35-- (try: 2) https://money.benck.tw/
Connecting to money.benck.tw|97.107.135.68|:443...
However, I can download from other HTTPS websites, such as: https://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js
It's very weird.
For this website you have to use the --no-check-certificate option:
wget --no-check-certificate https://money.benck.tw
I'm experiencing the same issue. I'm trying to download files from an external site such as https://downloads.wordpress.org/plugin/easy-wp-smtp.zip, and wget still does not work even with --no-check-certificate. It freezes at this line:
Connecting to downloads.wordpress.org (downloads.wordpress.org)|198.143.164.250|:443...
Does anyone have the same issue?
No iptables rules are configured. When I do this on another server on the same network, it works fine. This only happens on this particular server.
Regards,
Francisco Yu
This is probably because the page is scraped with wget too often, so the server likely rejects requests that carry wget's default headers. You need to modify the request headers, especially the User-Agent.
An example from another website, where --no-check-certificate does not help:
wget --no-check-certificate "https://www.money.pl/pieniadze/depozyty/walutowearch/1921-02-05,2021-02-05,LIBORCHF3M,strona,1.html"
--2021-02-05 17:05:34-- https://www.money.pl/pieniadze/depozyty/walutowearch/1921-02-05,2021-02-05,LIBORCHF3M,strona,1.html
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving www.money.pl (www.money.pl)... 212.77.101.20
Connecting to www.money.pl (www.money.pl)|212.77.101.20|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-02-05 17:05:34 ERROR 403: Forbidden.
but another download tool that sends different headers works:
http -h "https://www.money.pl/pieniadze/depozyty/walutowearch/1921-02-05,2021-02-05,LIBORCHF3M,strona,1.html"
HTTP/1.1 200 OK
Cache-control: max-age=60, public,stale-while-revalidate=5
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 20756
Content-Security-Policy: upgrade-insecure-requests;
Content-Type: text/html; charset=iso-8859-2
Date: Fri, 05 Feb 2021 16:04:16 GMT
Link: <https://money.wp.pl/dGxwOTV0SyYZFTlneUtGM1pNbSY9EkhlJ1V1dglvOxgnKBALCW87GCcoEAsJbzsYJygQCwlvOxgnKBALCW87GCcoEAsJbzsYJygQCwlvOxgnKBALCW87GCcoEAsJbzsYJygQCwlvOxgnKBALCW87GCcoEAsJbzsYJygQCwlvOxgnKBALCW87GCcobXh0RUZ9WlgoNTAeDjRHBTlpZxYWIhMeKydrAld1TER2ciZYECoUSjgjIR4JKBYSNnomXEF1TUUJJD9VCi4ZEzUxcwJRdT4TKiQ5Sh0zAVJ9YWR2EyYUAjs7IVUFNRsfamZjAiJ2QUV-eWYCSXdNUn1hZHNWd0pGYmRkHVRyXUV6ZhV8LQU3JQwcEAMpYkpCfRclRBYoFhZqZmMCJ3ZWHzs5OhY0EDkoLjA0VFl1XgQ_PTgNKRMbQgIuB0lCIRQEOzUiWQB6XhYrIgVcCzMLSn9lZhYHJBkDKjM5Qh16DxYjISJJRjo=>;rel="preload";as="script";
Server: nginx
Set-Cookie: mny_ver2=v8c;Domain=.money.pl;Path=/;Max-Age=2592000;
Vary: Accept-Encoding
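The header change suggested above can be tried with wget itself through its --user-agent option. This is a hedged sketch: wget_as_browser is a hypothetical name, the User-Agent string is only an illustrative browser-like value, and there is no guarantee that a given site accepts it.

```shell
#!/bin/sh
# wget_as_browser [wget arguments...]
# Hypothetical helper: sends a browser-like User-Agent instead of "Wget/x.y".
wget_as_browser() {
    # Illustrative value; substitute the string of any current browser.
    ua='Mozilla/5.0 (X11; Linux x86_64; rv:85.0) Gecko/20100101 Firefox/85.0'
    wget --user-agent="$ua" "$@"
}
```

Usage: wget_as_browser "https://www.money.pl/pieniadze/depozyty/walutowearch/1921-02-05,2021-02-05,LIBORCHF3M,strona,1.html"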