How to get original URL from the cURL output log entry? - bash

I feed cURL multiple URLs at a time, and have difficulty parsing the output log to get the original addresses back. Namely, if a URL resolves, the output is as follows:
$ curl --head --verbose https://www.google.com/
* Trying 64.233.165.106...
* TCP_NODELAY set
* Connected to www.google.com (64.233.165.106) port 443 (#0)
<...>
> HEAD / HTTP/2
> Host: www.google.com
<...>
which can eventually be parsed back to https://www.google.com/.
However, with an unreachable URL it does not:
$ curl --head --verbose --connect-timeout 3 https://imap.gmail.com/
* Trying 74.125.131.109...
* TCP_NODELAY set
* After 1491ms connect time, move on!
* connect to 74.125.131.109 port 443 failed: Operation timed out
<...>
* Failed to connect to imap.gmail.com port 443: Operation timed out
The error message contains the host name in this case, but in other cases it does not, so I can't rely on it.
So I need either to have URL-to-IP resolution disabled in the output, like
* Trying https://imap.gmail.com/...
or to somehow prepend each URL from the list to the corresponding output, like:
$ curl --head --verbose --connect-timeout 3 https://imap.gmail.com/ https://www.google.com/
https://imap.gmail.com/
* Trying 64.233.162.108...
* TCP_NODELAY set
* After 1495ms connect time, move on!
* connect to 64.233.162.108 port 443 failed: Operation timed out
<...>
https://www.google.com/
* Trying 74.125.131.17...
* TCP_NODELAY set
* Connected to www.google.com (74.125.131.17) port 443 (#0)
<...>
Wget and HTTPie are not an option. How can one achieve that with cURL?

Perhaps this is the solution:
while read -r line ; do
    echo "REQUESTED URL: $line" >> output.txt
    curl "$line" >> output.txt 2>&1
done < url-list.txt

Starting with curl 7.75.0, the --write-out '%{url}' option makes curl display the URL that was fetched.
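Combining the loop above with --write-out gives both the URL and the output in one pass. A sketch, assuming curl 7.75.0 or newer and a url-list.txt with one URL per line:

```shell
# Emit a trailer line with the exact URL curl was asked to fetch.
# %{url} requires curl >= 7.75.0; it expands even when the transfer fails.
while read -r url; do
  curl --silent --head --connect-timeout 3 \
       --write-out 'REQUESTED URL: %{url}\n' "$url"
done < url-list.txt >> output.txt 2>&1
```

Unlike the hostname in curl's error messages, %{url} reflects the command-line argument verbatim, so it is safe to parse.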

Related

Unable to bind any program to IPv4 TCP port 80 on Mac [duplicate]

I have the following very simple docker-compose.yml, running on a Mac:
version: "3.7"
services:
  apache:
    image: httpd:2.4.41
    ports:
      - 80:80
I run docker-compose up, then I run this curl and Apache returns content:
/tmp/test $ curl -v http://localhost
* Trying ::1:80...
* TCP_NODELAY set
* Connected to localhost (::1) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.66.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Sat, 26 Oct 2019 18:30:03 GMT
< Server: Apache/2.4.41 (Unix)
< Last-Modified: Mon, 11 Jun 2007 18:53:14 GMT
< ETag: "2d-432a5e4a73a80"
< Accept-Ranges: bytes
< Content-Length: 45
< Content-Type: text/html
<
<html><body><h1>It works!</h1></body></html>
* Connection #0 to host localhost left intact
However, if I try to access the container using 127.0.0.1 instead of localhost, I get connection refused:
/tmp/test $ curl -v http://127.0.0.1
* Trying 127.0.0.1:80...
* TCP_NODELAY set
* Connection failed
* connect to 127.0.0.1 port 80 failed: Connection refused
* Failed to connect to 127.0.0.1 port 80: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 127.0.0.1 port 80: Connection refused
Localhost does point to 127.0.0.1:
/tmp/test $ ping localhost
PING localhost (127.0.0.1): 56 data bytes
And netstat shows all local IP addresses port 80 to be forwarded:
/tmp/test $ netstat -tna | grep 80
...
tcp46 0 0 *.80 *.* LISTEN
...
I came to this actually trying to access the container using a custom domain I had in my /etc/hosts file pointing to 127.0.0.1. I thought there was something wrong with that domain name, but then I tried 127.0.0.1 and it didn't work either, so I'm concluding there is something very basic about Docker I'm not doing right.
Why is curl http://localhost working but curl http://127.0.0.1 is not?
UPDATE
It seems localhost is resolving to IPv6 ::1, so port forwarding seems to be working on IPv6 but not IPv4 addresses. Does that make any sense?
UPDATE 2
I wasn't able to fix it, but pointing my domain name to ::1 instead of 127.0.0.1 in my /etc/hosts serves as a workaround for the time being.
UPDATE 3
8 months later I bumped into the same issue and found my own question here, still unanswered. But this time I can't apply the same workaround, because I need to bind the port forwarding to my IPv4 address so it can be accessed from other hosts.
Found the culprit: pfctl
AFAIK, pfctl is not supposed to run automatically but my /System/Library/LaunchDaemons/com.apple.pfctl.plist said otherwise.
The packet filter was configured to redirect all incoming traffic on port 80 to 8080, and 443 to 8443. This is done without any process actually listening on ports 80 and 443, which is why lsof and netstat wouldn't show anything.
/Library/LaunchDaemons/it.winged.httpdfwd.plist has the following
<key>ProgramArguments</key>
<array>
    <string>sh</string>
    <string>-c</string>
    <string>echo "rdr pass proto tcp from any to any port {80,8080} -> 127.0.0.1 port 8080" | pfctl -a "com.apple/260.HttpFwdFirewall" -Ef - && echo "rdr pass proto tcp from any to any port {443,8443} -> 127.0.0.1 port 8443" | pfctl -a "com.apple/261.HttpFwdFirewall" -Ef - && sysctl -w net.inet.ip.forwarding=1</string>
</array>
<key>RunAtLoad</key>
The solution was simply to listen on ports 8080 and 8443. All requests to ports 80 and 443 are now being redirected transparently.
While debugging this I found countless open questions about similar problems without answers. I hope this helps somebody.

How to Connect from cURL to Rest API USPS portal?

I am trying to connect to the USPS portal using the curl command.
In Hadoop it works fine:
curl -i "https://secure.shippingapis.com/ShippingAPI.dll?API=Verify&XML=<AddressValidateRequest USERID="XXXXXXXXX">"
* About to connect() to secure.shippingapis.com port 443 (#0)
* Trying 56.0.34.44...
* Connected to secure.shippingapis.com (56.0.34.44) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL connection using TLS_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=secure.shippingapis.com,OU=Unified Communications,OU=Hosted by United States Postal Service,OU=WebTools,O=United States Postal Service,STREET=2825 Lone Oak Parkway,L=Eagan,ST=MN,postalCode=55121,C=US
* start date: May 24 00:00:00 2019 GMT
* expire date: May 23 23:59:59 2020 GMT
* common name: secure.shippingapis.com
* issuer: CN=COMODO SHA-256 Organization Validation Secure Server CA,O=COMODO CA Limited,L=Salford,ST=Greater Manchester,C=GB
> GET /ShippingAPI.dll?API=Verify&XML=<AddressValidateRequest USERID=XXXXXXXXX> HTTP/1.1
> User-Agent: curl/7.29.0
> Host: secure.shippingapis.com
> Accept: */*
However, when I try to send a request and get the response back from the server, it is unable to do so.
curl -i -X GET AddressValidateRequest.txt "https://secure.shippingapis.com/ShippingAPI.dll?API=Verify&XML=<AddressValidateRequest USERID="XXXXXXXXXXXX">
<Address>
<Address1>6406 IVY LANE</Address1>
<Address2></Address2>
<City>GREENBELT</City>
<State>TX</State>
<Zip5>20770</Zip5>
<Zip4></Zip4>
</Address>
</AddressValidateRequest>"
USERID is the USPS shipping authentication ID.
curl -v -H "Content-Type: application/xml" -d 'API=RateV4&XML=<RateV4Request USERID="USERID"><Revision>2</Revision><Package ID="1ST"><Service>Priority Mail Express</Service><FirstClassMailType>PACKAGE SERVICE RETAIL</FirstClassMailType><ZipOrigination>33016</ZipOrigination><ZipDestination>35004</ZipDestination><Pounds>1</Pounds><Ounces>0</Ounces><Container/><Width>1</Width><Length>1</Length><Height>1</Height><Machinable>true</Machinable></Package></RateV4Request>' -X POST 'https://secure.shippingapis.com/ShippingAPI.dll'
XML is typically not appended to the end of a URL; in any case your quotes are off, since you have double quotes nested within double quotes.
Also try removing all the whitespace; it is also unclear what AddressValidateRequest.txt is supposed to do.
You can also remove the -i -v -X GET if you only want to see the response:
curl 'https://secure.shippingapis.com/ShippingAPI.dll?API=Verify&XML=<AddressValidateRequest USERID="XXXXXXXXXXXX"><Address>...</Address></AddressValidateRequest>'
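That one-liner still leaves literal spaces, quotes, and angle brackets in the URL. An alternative sketch is to let curl do the percent-encoding with -G and --data-urlencode (the USERID and address values below are the placeholders from the question):

```shell
# -G turns the --data parameters into a GET query string;
# --data-urlencode percent-encodes the XML payload, so the
# quotes and whitespace inside it never reach the raw URL.
curl -G 'https://secure.shippingapis.com/ShippingAPI.dll' \
  --data 'API=Verify' \
  --data-urlencode 'XML=<AddressValidateRequest USERID="XXXXXXXXX">
  <Address>
    <Address1>6406 IVY LANE</Address1>
    <Address2></Address2>
    <City>GREENBELT</City>
    <State>TX</State>
    <Zip5>20770</Zip5>
    <Zip4></Zip4>
  </Address>
</AddressValidateRequest>'
```

Because the XML sits inside single quotes, the nested double quotes around USERID no longer terminate the shell string.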

Send an HTTPS request to TLS1.0-only server in Alpine linux

I'm writing a simple web crawler inside a Docker Alpine image. However, I cannot send HTTPS requests to servers that support only TLS 1.0. How can I configure Alpine Linux to allow obsolete TLS versions?
I tried adding MinProtocol to /etc/ssl/openssl.cnf with no luck.
Example Dockerfile:
FROM node:12.0-alpine
RUN printf "[system_default_sect]\nMinProtocol = TLSv1.0\nCipherString = DEFAULT@SECLEVEL=1" >> /etc/ssl/openssl.cnf
CMD ["/usr/bin/wget", "https://www.restauracesalanda.cz/"]
When I build and run this container, I get
Connecting to www.restauracesalanda.cz (93.185.102.124:443)
ssl_client: www.restauracesalanda.cz: handshake failed: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol
wget: error getting response: Connection reset by peer
I can reproduce your issue using the builtin-busybox-wget. However, using the "regular" wget works:
root@a:~# docker run --rm -it node:12.0-alpine /bin/ash
/ # wget -q https://www.restauracesalanda.cz/; echo $?
ssl_client: www.restauracesalanda.cz: handshake failed: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol
wget: error getting response: Connection reset by peer
1
/ # apk add wget
fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/community/x86_64/APKINDEX.tar.gz
(1/1) Installing wget (1.20.3-r0)
Executing busybox-1.29.3-r10.trigger
OK: 7 MiB in 17 packages
/ # wget -q https://www.restauracesalanda.cz/; echo $?
0
/ #
I'm not sure, but maybe you should post an issue at https://bugs.alpinelinux.org
Putting this magic one-liner into my Dockerfile solved my issue, and I was able to use TLS 1.0:
RUN sed -i 's/MinProtocol = TLSv1.2/MinProtocol = TLSv1/' /etc/ssl/openssl.cnf \
    && sed -i 's/CipherString = DEFAULT@SECLEVEL=2/CipherString = DEFAULT@SECLEVEL=1/' /etc/ssl/openssl.cnf
Credit goes to this dude: http://blog.travisgosselin.com/tls-1-0-1-1-docker-container-support/
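One possible reason the printf in the question's Dockerfile had no effect: with OpenSSL 1.1.1, a [system_default_sect] section is only consulted if it is wired up from the top of the file. A minimal excerpt of what /etc/ssl/openssl.cnf needs to contain (the section names other than system_default_sect are conventional choices, not mandated):

```ini
# Top of /etc/ssl/openssl.cnf: must appear before any [section] header
openssl_conf = default_conf

[default_conf]
ssl_conf = ssl_sect

[ssl_sect]
system_default = system_default_sect

[system_default_sect]
MinProtocol = TLSv1
CipherString = DEFAULT@SECLEVEL=1
```

Appending only the last section to the end of the file leaves it unreferenced, which would explain the "no luck" result.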

cURL including garbage when redirecting stderr to stdout

I'm using curl 7.54.1 (x86_64-apple-darwin15.6.0) to download a track from the soundcloud API in a bash script
The code looks like this -
# -vsLOJ = --verbose --silent --location --remote-name --remote-header-name
# redirect stderr to stdout to capture the headers
curl_output=$(curl -vsLOJ $track_download_url?client_id=$client_id 2>&1);
This is supposed to:
verbosely print out the request/response (to capture HTTP headers)
silence the download bar
follow the location (the API provides a pretty link that 302's to the actual file)
Create a file using "Content-Disposition" header as the file name (this becomes the output file, not stdout)
redirect stderr (where the verbose output is sent) to stdout
What's happening:
The download is OK: it saves the file to the working directory with the correct name from "Content-Disposition", but $curl_output is filled with garbage data that looks like a mix of an ls of the working directory and partial verbose data.
Example output cURL-ing https://www.google.com in a test directory with files:
curl_output=$(curl --verbose -o /dev/null "https://www.google.com" 2>&1)
echo $curl_output
fakefile.1 fakefile.2 hello.txt song.mp3 vacation.png Rebuilt URL to:
https://www.google.com/ fakefile.1 fakefile.2 hello.txt song.mp3
vacation.png Trying 172.217.10.100... fakefile.1 fakefile.2 hello.txt
song.mp3 vacation.png TCP_NODELAY set fakefile.1 fakefile.2 hello.txt
song.mp3 vacation.png Connected to www.google.com (172.217.10.100)
port 443 (#0) fakefile.1 fakefile.2 hello.txt song.mp3 vacation.png
TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
fakefile.1 fakefile.2 hello.txt song.mp3 vacation.png Server
certificate: www.google.com fakefile.1 fakefile.2 hello.txt song.mp3
vacation.png Server certificate: Google Internet Authority G2
fakefile.1 fakefile.2 hello.txt song.mp3 vacation.png Server
certificate: GeoTrust Gl < Set-Cookie:
REDACTED=REDACTED######################################################################## 100.0%* Connection #0 to host www.google.com left intact
Completely confusing to me. I've tested this in a bash script and from Terminal. It only seems to happen when I store the result in a variable; running the same cURL (including the stderr redirect) without storing it in $curl_output writes the verbose output correctly.
And this happens for any URL I test with.
My .curlrc:
user-agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_19_4) AppleWebKit/603.4.1 (KHTML, like Gecko) Chrome/59.0.3050.56 Safari/603.4.1"
referer = ";auto"
connect-timeout = 10
progress-bar
max-time = 90
remote-time
Put quotes around your $curl_output variable, because it contains * characters, which the shell glob-expands.
% echo "$curl_output"
* Rebuilt URL to: https://www.google.com/
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 2a00:1450:4009:806::2004...
* Connected to www.google.com (2a00:1450:4009:806::2004) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
Whereas:
echo $curl_output
would resolve the * into whatever file names are lying in your current directory.
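The effect is easy to reproduce without curl at all; the only ingredients are an unquoted variable containing a bare * and a non-empty directory:

```shell
# Scratch directory so the glob expansion is predictable.
demo=$(mktemp -d)
cd "$demo"
touch fakefile.1 fakefile.2

out='* Trying 172.217.10.100...'

echo $out      # unquoted: word-split, then * glob-expands to fakefile.1 fakefile.2
echo "$out"    # quoted: the asterisk is printed literally
```

This is why the garbage only appears when the output is stored in a variable and later expanded unquoted; the verbose text itself is fine.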

Curl: Bypass proxy for localhost

I'm behind a proxy, and if I try curl http://localhost/mysite or curl http://127.0.0.1/mysite, curl tries to route the request through the proxy. So I tried the --noproxy option, but it doesn't work. Everything works fine for external servers through the proxy, e.g. curl http://mysite.com.
My configuration:
Cygwin (bash) under Windows 8 with curl extension.
Proxy: proxy.domain.xx:1080 without authentication
http_proxy=http://proxy.domain.xx:1080
Local Server: XAMP Version 1.8.0
Apache ports: 80,443
Browser: Chrome with proxy, but configured to access to localhost and *.dev
From the curl --help
--noproxy : Comma-separated list of hosts which do not use proxy
What I tried:
I have deactivated the firewall and nothing
$ curl -v http://localhost/mysite -> Debug:
Response
Connected to proxy.domain.xx (200.55.xxx.xx) port 1080 (#0)
GET http://localhost/mysite HTTP/1.1
User-Agent: curl/7.21.1 (i686-pc-mingw32) libcurl/7.21.1 OpenSSL/0.9.8r zlib/1.2.3
Host: localhost
Accept: */*
Proxy-Connection: Keep-Alive
The system returned: <PRE><I>(111) Connection refused</I></PRE>
curl -v --noproxy localhost, http://localhost/muestra
Response
About to connect() to localhost port 80 (#0)
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET /mysite HTTP/1.1
> User-Agent: curl/7.21.1 (i686-pc-mingw32) libcurl/7.21.1 OpenSSL/0.9.8r zlib/1.2.3
> Host: localhost
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Server: Apache/2.4.2 (Win32) OpenSSL/1.0.1c PHP/5.4.4
< Location: http://localhost/mysite
< Content-Length: 331
< Content-Type: text/html; charset=iso-8859-1
Any idea how to fix this?
After
curl -v --noproxy localhost, http://localhost/muestra
curl responded with
About to connect() to localhost port 80 (#0)
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 80 (#0)
So it clearly stated that it connected to localhost.
Try the following to bypass the proxy for local IPs:
export no_proxy=localhost,127.0.0.1
use
curl -v --noproxy '*' http://abc.com
to completely disable the proxy.
Or, if you want to disable the proxy only for the abc.com destination:
curl -v --noproxy "abc.com" http://abc.com
where abc.com is the URL you want to reach.
As some others said, the --noproxy option is what you are looking for.
https://curl.haxx.se/docs/manpage.html#--noproxy
Apparently, the second request you tried was receiving a HTTP 301 response, so you probably also want to use the -L option to follow redirects:
https://curl.haxx.se/docs/manpage.html#-L
You can alias curl to always ignore proxies for localhost requests.
alias curl='curl --noproxy localhost,127.0.0.1'
Add it to your .bashrc file for convenience:
echo "alias curl='curl --noproxy localhost,127.0.0.1'" >> ~/.bashrc
On Windows, the following option worked for me for connecting to localhost:
curl --proxy "" --location http://127.0.0.1:8983
curl expects the port to be specified in the proxy URL; this solution worked for me:
export http_proxy="http://myproxy:80"
As a workaround, you can just unset the variable http_proxy before running your curl without the option --noproxy.
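The unsetting can also be scoped to a single invocation with env -u, so your shell's proxy settings stay intact. A sketch (env -u is supported by GNU and BSD env; the localhost URL is a placeholder):

```shell
# Clear the proxy variables for this one curl run only; the parent
# shell keeps its http_proxy/https_proxy settings untouched.
env -u http_proxy -u HTTP_PROXY -u https_proxy -u HTTPS_PROXY \
    curl -v http://localhost/mysite
```

This avoids both editing the environment and remembering --noproxy on every call.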