How to submit wget jobs through a bash script to condor?

I have a problem submitting wget jobs to condor.
I can use wget to download a file from a url using command line.
$ wget https://wordpress.org/latest.zip
--2019-01-24 16:43:42-- https://wordpress.org/latest.zip
Resolving wordpress.org... 198.143.164.252
Connecting to wordpress.org|198.143.164.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11383968 (11M) [application/zip]
Saving to: “latest.zip”
100%[======================================>] 11,383,968 --.-K/s in 0.09s
2019-01-24 16:43:43 (117 MB/s) - “latest.zip” saved [11383968/11383968]
But if I save the command to a bash script file "test.sh", like this:
#!/bin/sh
wget https://wordpress.org/latest.zip
And then submit it to condor:
#!/usr/bin/env condor_submit
Executable = test.sh
Universe = vanilla
output = tmp.out
error = tmp.error
Log = tmp.log
Queue 1
It fails with a "Connection timed out" error:
--2019-01-24 16:53:50-- https://wordpress.org/latest.zip
Resolving wordpress.org... 198.143.164.252
Connecting to wordpress.org|198.143.164.252|:443... failed: Connection timed out
But test.sh works well from command line as follows:
$./test.sh
I changed "tesh.sh" to:
#!/bin/sh
wget --debug --verbose https://wordpress.org/latest.zip
And the output is:
Setting --verbose (verbose) to 1
DEBUG output created by Wget 1.12 on linux-gnu.
--2019-01-24 17:25:58-- https://wordpress.org/latest.zip
Resolving wordpress.org... 198.143.164.252
Caching wordpress.org => 198.143.164.252
Connecting to wordpress.org|198.143.164.252|:443... Closed fd 3
failed: Connection timed out.

I suspect the problem is with "https": the SSL/TLS handshake is failing.
If you connect to an HTTPS site with a web browser, you use the browser's trust store (very often this is one and the same as your desktop's Windows trust store).
SUGGESTIONS:
Verify your "test.sh" works from the command line.
Modify your script to run wget --debug --verbose ..., and check the output.
If only for troubleshooting purposes, try --no-check-certificate, too.
Look here for more details:
GNU Wget 1.18 Manual
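For the troubleshooting run, test.sh could look like this (a diagnostic sketch only; --no-check-certificate turns off certificate verification, so don't leave it in once the problem is found):
#!/bin/sh
# Diagnostic variant of test.sh: full debug output, certificate checks disabled
wget --debug --verbose --no-check-certificate https://wordpress.org/latest.zip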
ADDENDUM:
If you run wget --debug --verbose ..., you should see something like this:
$ wget --debug --verbose https://wordpress.org/latest.zip
Setting --verbose (verbose) to 1
Setting --verbose (verbose) to 1
DEBUG output created by Wget 1.19.4 on linux-gnu.
Reading HSTS entries from /home/xxxxx/.wget-hsts
URI encoding = ‘UTF-8’
Converted file name 'latest.zip' (UTF-8) -> 'latest.zip' (UTF-8)
--2019-01-24 15:49:12-- https://wordpress.org/latest.zip
Resolving wordpress.org (wordpress.org)... 198.143.164.252
Caching wordpress.org => 198.143.164.252
Connecting to wordpress.org (wordpress.org)|198.143.164.252|:443... connected.
Created socket 3.
Releasing 0x000055afcdc64650 (new refcount 1).
Initiating SSL handshake.
Handshake successful; connected socket 3 to SSL handle 0x000055afcdc64750
certificate:
subject: CN=*.wordpress.org,OU=Domain Control Validated
issuer: CN=Go Daddy Secure Certificate Authority - G2,OU=http://certs.godaddy.com/repository/,O=GoDaddy.com\\, Inc.,L=Scottsdale,ST=Arizona,C=US
X509 certificate successfully verified and matches host wordpress.org
---request begin---
GET /latest.zip HTTP/1.1
User-Agent: Wget/1.19.4 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: wordpress.org
Connection: Keep-Alive
...
If you DON'T see ANY of this ... I'd contact your network administrator about a firewall or proxy that might be blocking your Condor app.
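If the execute nodes can only reach the web through such a proxy, one thing to try is forwarding the proxy settings in the submit description. A minimal sketch, assuming a hypothetical proxy at proxy.example.com:3128:
Executable  = test.sh
Universe    = vanilla
# Hypothetical proxy address -- substitute whatever your site actually uses
Environment = "http_proxy=http://proxy.example.com:3128 https_proxy=http://proxy.example.com:3128"
Output      = tmp.out
Error       = tmp.error
Log         = tmp.log
Queue 1
Alternatively, getenv = True copies your entire submit-side environment, proxy variables included, into the job.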

Related

No php output when using Xdebug DBGp proxy

When using Xdebug directly from VSCode with one of two PHP extensions (xdebug.php-debug, devsense.phptools-vscode) for debugging, everything works as expected. But I have a shared environment where a single Apache/PHP/Xdebug setup/instance is to be used by multiple developers.
With both extensions everything works well when they receive the Xdebug connection directly. I have a breakpoint at the last line of code, and although the whole HTML content was already sent by PHP, the browser is still waiting, even though ob_end_flush() (output buffering) was already called. Stepping over the last line of code, the HTML content arrives at the browser. The Xdebug log finishes the same way:
[Step Debug] -> <response xmlns="urn:debugger_protocol_v1" xmlns:xdebug="https://xdebug.org/dbgp/xdebug" command="run" transaction_id="14" status="stopping" reason="ok"></response>
...
[Step Debug] -> <response xmlns="urn:debugger_protocol_v1" xmlns:xdebug="https://xdebug.org/dbgp/xdebug" command="stop" transaction_id="22" status="stopped" reason="ok"></response>
When I now set this up using the DBGp proxy, the two PHP extensions behave differently. One blocks the HTML output forever for every request that is started, and if I kill the Apache processes I get a proxy log like this:
21:01:28.652 [warn] [server] Handler response error: Error reading response: Error reading length: EOF
21:01:28.652 [info] [server] Closing server connection from 127.0.0.1:60104
21:01:28.653 [warn] [server] Handler response error: Error reading response: Error reading length: EOF
21:01:28.653 [info] [server] Closing server connection from 127.0.0.1:35422
21:01:28.653 [warn] [server] Handler response error: Error reading response: Error reading length: EOF
21:01:28.653 [info] [server] Closing server connection from 127.0.0.1:43594
21:01:28.653 [warn] [server] Handler response error: Error reading response: Error reading length: EOF
21:01:28.653 [info] [server] Closing server connection from 127.0.0.1:56398
21:01:28.659 [warn] [server] Handler response error: Error reading response: Error reading length: EOF
21:01:28.659 [info] [server] Closing server connection from 127.0.0.1:40030
while the other delivers the HTML output and, at the same moment, logs:
22:02:02.316 [warn] [server] Handler response error: Error reading response: Error reading length: EOF
22:02:02.317 [info] [server] Closing server connection from 127.0.0.1:33160
In both cases the Xdebug log says the same: status stopped, reason ok.
So I wonder whether the way the request ends in both cases, reading an EOF and closing the server connection, is really a warning (i.e. unexpected), or whether this is the normal behaviour. And then, of course, it would be an issue of the one PHP extension apparently not telling the proxy and server that they can end their connection (I assume this is the client's job, which here is the proxy).
I checked all configuration options of Xdebug, the VSCode PHP extensions and the DBGp proxy to increase logging, but I could not find out more than what I have described above. I tried setting up the connection either via the VSCode extension or by manually issuing echo -e "proxyinit -p 39001 -k jni-vscode -m 1\0" | nc 127.0.0.1 9001, with no success. The EOF warning stays, and with the devsense.phptools-vscode plugin there is no PHP output to the browser. If anyone has a clue why, I would appreciate a hint.
Thanks,
Jürgen
PS: Killing dbgpProxy makes apache deliver all the content with access_log entry showing it took 3064 seconds to fulfill the request.
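For reference, the DBGp protocol also defines the matching de-registration command; with the same (question-specific) IDE key and proxy port as above, it would be:
echo -e "proxystop -k jni-vscode\0" | nc 127.0.0.1 9001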

wget on Debian Server gets 302 Found while wget on Manjaro gets 200 OK

I'm using wget to retrieve the Instagram JSON from the URL https://www.instagram.com/instagram/?__a=1.
Running wget from my local Manjaro setup returns a 200 OK and the proper JSON response, but running it from a Debian server returns a 302 Found.
At first I thought it could be because of wget version differences, but running curl locally also works while wget doesn't work properly.
Is there anything that I should be setting up on my server to get a proper response? My guess is that the HTTPS connection is refusing my server from connecting properly.
So, this is a weird quirk of the Instagram servers. Nothing you can do about it.
The problem is that Instagram responds differently depending on whether you connect to their server over IPv4 or IPv6. Why they would do that is beyond me, but I can reliably reproduce the result by controlling for only this variable.
IPv4:
$ wget -O/dev/null -4 "https://www.instagram.com/instagram/?__a=1"
--2020-09-03 14:22:15-- https://www.instagram.com/instagram/?__a=1
Resolving www.instagram.com (www.instagram.com)... 157.240.27.174
Connecting to www.instagram.com (www.instagram.com)|157.240.27.174|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 118552 (116K) [application/json]
Saving to: ‘/dev/null’
100%[================================================================================================================================>] 118,552 306KB/s in 0.4s
2020-09-03 14:22:17 (306 KB/s) - ‘/dev/null’ saved [118552/118552]
IPv6:
$ wget -O/dev/null -6 "https://www.instagram.com/instagram/?__a=1"
--2020-09-03 14:22:54-- https://www.instagram.com/instagram/?__a=1
Resolving www.instagram.com (www.instagram.com)... 2a03:2880:f23f:e5:face:b00c:0:4420
Connecting to www.instagram.com (www.instagram.com)|2a03:2880:f23f:e5:face:b00c:0:4420|:443... connected.
HTTP request sent, awaiting response... 302 Found
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Location: https://www.instagram.com/accounts/login/?next=/instagram/%3F__a%3D1 [following]
--2020-09-03 14:22:54-- https://www.instagram.com/accounts/login/?next=/instagram/%3F__a%3D1
Reusing existing connection to [www.instagram.com]:443.
HTTP request sent, awaiting response... 200 OK
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Length: 48094 (47K) [text/html]
Saving to: ‘/dev/null’
100%[================================================================================================================================>] 48,094 --.-K/s in 0.04s
2020-09-03 14:22:54 (1.28 MB/s) - ‘/dev/null’ saved [48094/48094]
This is the same thing you see in your debug logs. On Manjaro, wget makes an IPv4 connection, while on Debian it makes an IPv6 connection, leading to the differences.
Welcome to the world of crazy webservers :)
In any case, the answer to your question is to use only an IPv4 connection, e.g. via wget's -4 flag, as sketched below.
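That is the -4 flag used in the transcripts above; to make it permanent you could also put it in ~/.wgetrc (a small sketch; inet4_only is documented in the GNU Wget manual as the wgetrc equivalent of -4):
# one-off: force IPv4 for a single request
wget -4 -O- "https://www.instagram.com/instagram/?__a=1"
# permanent: make every wget run IPv4-only
echo "inet4_only = on" >> ~/.wgetrc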

How to use wget with a proxy

I am currently blocked from http://www.apkmirror.com, so a wget produces an ERROR 403: Forbidden:
kurt@kurt-ThinkPad:~$ wget http://www.apkmirror.com
--2017-04-21 12:51:42-- http://www.apkmirror.com/
Resolving www.apkmirror.com (www.apkmirror.com)... 104.19.135.58, 104.19.132.58, 104.19.133.58, ...
Connecting to www.apkmirror.com (www.apkmirror.com)|104.19.135.58|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2017-04-21 12:51:42 ERROR 403: Forbidden.
I'm trying to use a proxy from https://free-proxy-list.net/ to gain access. For example, following https://www.gnu.org/software/wget/manual/html_node/Proxies.html, I've tried
kurt@kurt-ThinkPad:~$ wget http://www.apkmirror.com@70.32.89.160:3128
--2017-04-21 12:57:56-- http://www.apkmirror.com@70.32.89.160:3128/
Connecting to 70.32.89.160:3128... connected.
HTTP request sent, awaiting response... 400 Bad Request
2017-04-21 12:57:59 ERROR 400: Bad Request.
but I get an ERROR 400: Bad Request. Is there anything wrong with this (attempted) usage of wget?
You are using the wrong method. Let me show you the right way.
Go to your home directory on the server or computer using cd ~,
then create a file with an editor, e.g. vi ~/.wgetrc,
and put your proxy settings inside the file as below.
use_proxy = on
http_proxy = http://70.32.89.160:3128
https_proxy = http://70.32.89.160:3128
ftp_proxy = http://70.32.89.160:3128
Now use the command below to access the blocked site:
wget -e use_proxy=yes -e http_proxy=http://70.32.89.160:3128 http://www.apkmirror.com
Or simply run wget http://www.apkmirror.com; with the ~/.wgetrc in place you will see output like this:
root@ubuntu:~# wget www.apkmirror.co
--2017-04-21 08:12:45-- http://www.apkmirror.co/
Connecting to 70.32.89.160:3128... connected.
Proxy request sent, awaiting response... 301 Moved Permanently
Location: http://www.apkmirror.co/ [following]
--2017-04-21 08:12:47-- http://www.apkmirror.co/
Connecting to 70.32.89.160:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 76512 (75K) [text/html]
Saving to: ‘index.html.8’
100%[===================================================================================================================>] 76,512 447KB/s in 0.2s
2017-04-21 08:12:49 (447 KB/s) - ‘index.html.8’ saved [76512/76512]
Alternatively, you can pass a proxy with credentials directly, like this:
wget -e use_proxy=yes -e http_proxy=http://ilanni:123456@10.10.10.128:3128 http://oss.aliyuncs.com/aliyunecs/rds_backup_extract.sh
To start wget through a SOCKS5 proxy, use tsocks:
install tsocks: sudo apt install tsocks
configure tsocks:
# vi /etc/tsocks.conf
server = 127.0.0.1
server_type = 5
server_port = 1080
start: tsocks wget http://url_to_get
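If you need the proxy for a single command only, the lowercase proxy environment variables that wget honors can also be set inline (reusing the proxy address from the .wgetrc example above):
http_proxy=http://70.32.89.160:3128 https_proxy=http://70.32.89.160:3128 \
    wget http://www.apkmirror.com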

/usr/bin/perl install-module.pl DateTime

While installing Bugzilla on RHEL, the setup script checks for required modules:
./checksetup.pl --check-modules
It showed some unavailable modules.
While trying to install one of them, I encountered the following error:
[root@localhost bugzilla-4.2.3]# /usr/bin/perl install-module.pl DateTime
Checking for CPAN (v1.81) ok: found v1.94
**Checking for YAML (any) not found**
Checking for ExtUtils-MakeMaker (v6.31) ok: found v6.55_02
CPAN: Storable loaded ok (v2.20)
CPAN: LWP::UserAgent loaded ok (v5.833)
CPAN: Time::HiRes loaded ok (v1.9721)
Warning: no success downloading '/root/.cpan/source/authors/01mailrc.txt.gz.tmp19575'.
Giving up on it. at /usr/share/perl5/CPAN/Index.pm line 225
Fetching with LWP:
http://www.perl.org/CPAN/authors/01mailrc.txt.gz
LWP failed with code[500] message[Can't connect to www.perl.org:80 (Bad hostname 'www.perl.org')]
Trying with "/usr/bin/curl -L -f -s -S --netrc-optional" to get "http://www.perl.org/CPAN/authors/01mailrc.txt.gz" :
curl: (6) Couldn't resolve host 'www.perl.org'
Function system("/usr/bin/curl -L -f -s -S --netrc-optional "http://www.perl.org/CPAN/authors/01mailrc.txt.gz" > /root/.cpan/source/authors/01mailrc.txt.tmp19575")returned status 6 (wstat 1536)
Warning: expected file [/root/.cpan/source/authors/01mailrc.txt.gz.tmp19575] doesn't exist
Trying with "/usr/bin/wget -O /root/.cpan/source/authors/01mailrc.txt.tmp19575" to get<some URL>
--2012-09-24 17:29:33-- <some URL>
Resolving www.perl.org... failed: Name or service not known.
wget: unable to resolve host address “www.perl.org”
Function system("/usr/bin/wget -O /root/.cpan/source/authors/01mailrc.txt.tmp19575 "some URL ")
returned status 4 (wstat 1024)
Warning: expected file [/root/.cpan/source/authors/01mailrc.txt.gz.tmp19575] doesn't exist
Warning: no success downloading '/root/.cpan/source/authors/01mailrc.txt.gz.tmp19575'.
Giving up on it. at /usr/share/perl5/CPAN/Index.pm line 225
Can anyone help me out?
thanks a ton!
It looks like it cannot resolve hostnames. Can you ping www.google.co.uk from that machine?
If that's the problem, you can temporarily edit your DNS settings. I don't know how it works in Red Hat, but here is the Ubuntu way:
sudo nano /etc/resolv.conf
Add DNS servers manually:
#Google nameserver 1:
nameserver 8.8.8.8
#Google nameserver 2:
nameserver 8.8.4.4
Save the file and restart the network interface:
sudo /etc/init.d/networking restart
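To confirm that DNS really is the culprit before and after the edit, here is a quick sketch with standard tools (on RHEL, dig is in the bind-utils package):
# does the system resolver know the host?
getent hosts www.perl.org
# bypass /etc/resolv.conf and ask Google's DNS directly
dig +short www.perl.org @8.8.8.8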

HTTP2_Plain in node-http2 module is not working?

I want to create an HTTP/2 server using the node-http2 module, without TLS. My code is as follows:
const http2 = require('http2');   // the node-http2 module
const bunyan = require('bunyan');

var log = bunyan.createLogger({name: "HTTP2 server without TLS!"});
var options = {
    log: log
};

// http2.raw.* creates a plain-TCP (non-TLS) HTTP/2 server
var server = http2.raw.createServer(options, function(request, response) {
    console.log("Receiving HTTP2 request!");
    // response.writeHead(200);
    response.end('Hello world from HTTP2!');
});
server.listen(8000);
However, it does not work. When I connect to this server from Chrome, it just shows a download in progress; when I close the server, the download finishes as a blank file (26 bytes).
Does anyone know what is wrong here? Do I need to configure the browser? Thanks in advance!
Chrome and all other browsers only support HTTP/2 over TLS (h2) and not plain HTTP/2 (h2c). So your browser does not understand what is returned from the server and apparently node-http2 does not send a proper error response when it receives a non-http2 request.
The problem seems to be not just the browser. Using a curl build that supports HTTP/2 over an http:// URL does not work either. The following is the output from curl:
$ curl -I --http2 http://54.208.83.136:8000/ -v -k
* Trying 54.208.83.136...
* Connected to 54.208.83.136 (54.208.83.136) port 8000 (#0)
> HEAD / HTTP/1.1
> Host: 54.208.83.136:8000
> User-Agent: curl/7.47.1
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAAQAAP__
>
As we can see from the curl output, it sends an HTTP/1.1 Upgrade request with the proper headers set, as it is supposed to do according to the HTTP/2 RFC (RFC 7540).
On the server side the logs were very long, so I present here only the content of the msg field of the three relevant entries:
New incoming HTTP/2 connection
Client connection header prelude does not match
PROTOCOL ERROR, Fatal error, closing connection
So basically the server closed the connection because the client connection header prelude did not match. By checking the code, I figured out that the error originated from the readPrelude function of endpoint.js, a function that reads the client connection header, but I could not tell what was wrong with the header :(.
Thus I was tempted to say that the node-http2 module does not support HTTP/2 over plaintext.
Update: it turns out that I was wrong. The node-http2 module does support HTTP/2 over plaintext via a direct connection; what it does not support is upgrading to HTTP/2 from HTTP/1.1. The problem resulted from the client using the Upgrade mechanism against a server that does not support Upgrade. Using the nghttp client to connect to the server with prior knowledge works:
$ nghttp http://127.0.0.1:8000/
Hello world from HTTP2!
The nghttpd server also supports HTTP/2 without TLS, even though it does not support HTTP Upgrade:
$ nghttpd -d /Documents/Proxy 8080 --no-tls -v
So I highly suggest using nghttp when you want to test HTTP/2 without TLS.
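If nghttp is not at hand, newer curl releases (7.49 and later) can also speak plaintext HTTP/2 with prior knowledge, skipping the HTTP/1.1 Upgrade that node-http2 rejects:
# start HTTP/2 immediately over plain TCP -- no Upgrade, no TLS
curl --http2-prior-knowledge http://127.0.0.1:8000/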
