cURL including garbage when redirecting stderr to stdout - bash

I'm using curl 7.54.1 (x86_64-apple-darwin15.6.0) to download a track from the SoundCloud API in a bash script.
The code looks like this:
# -vsLOJ = --verbose --silent --location --remote-name --remote-header-name
# redirect stderr to stdout to capture the headers
curl_output=$(curl -vsLOJ $track_download_url?client_id=$client_id 2>&1);
This is supposed to:
verbosely print the request/response (to capture HTTP headers)
silence the progress bar
follow the location (the API provides a pretty link that 302s to the actual file)
create a file using the "Content-Disposition" header as the file name (this becomes the output file, not stdout)
redirect stderr (where the verbose output is sent) to stdout
What's happening:
The download is fine: it saves the file to the working directory with the correct name from "Content-Disposition", but $curl_output is filled with garbage that looks like a mix of an ls of the working directory and partial verbose output.
Example output cURL-ing https://www.google.com in a test directory with files:
curl_output=$(curl --verbose -o /dev/null "https://www.google.com" 2>&1)
echo $curl_output
fakefile.1 fakefile.2 hello.txt song.mp3 vacation.png Rebuilt URL to:
https://www.google.com/ fakefile.1 fakefile.2 hello.txt song.mp3
vacation.png Trying 172.217.10.100... fakefile.1 fakefile.2 hello.txt
song.mp3 vacation.png TCP_NODELAY set fakefile.1 fakefile.2 hello.txt
song.mp3 vacation.png Connected to www.google.com (172.217.10.100)
port 443 (#0) fakefile.1 fakefile.2 hello.txt song.mp3 vacation.png
TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
fakefile.1 fakefile.2 hello.txt song.mp3 vacation.png Server
certificate: www.google.com fakefile.1 fakefile.2 hello.txt song.mp3
vacation.png Server certificate: Google Internet Authority G2
fakefile.1 fakefile.2 hello.txt song.mp3 vacation.png Server
certificate: GeoTrust Gl < Set-Cookie:
REDACTED=REDACTED######################################################################## 100.0%* Connection #0 to host www.google.com left intact
Completely confusing to me. I've tested this in a bash script and from Terminal. It only seems to happen when I store the result in a variable; running that cURL command (including the stderr redirect) without storing it in $curl_output prints the verbose output correctly.
And this is happening for any URL I test with.
My .curlrc:
user-agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_19_4) AppleWebKit/603.4.1 (KHTML, like Gecko) Chrome/59.0.3050.56 Safari/603.4.1"
referer = ";auto"
connect-timeout = 10
progress-bar
max-time = 90
remote-time

Put quotes around your $curl_output variable when you echo it, because it contains * characters, which the shell expands into file names when the variable is unquoted.
% echo "$curl_output"
* Rebuilt URL to: https://www.google.com/
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 2a00:1450:4009:806::2004...
* Connected to www.google.com (2a00:1450:4009:806::2004) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
Whereas:
echo $curl_output
would expand each * into whatever file names are lying in your current directory.
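A minimal sketch of the same effect in isolation (the directory and the sample line are made up for illustration):
mkdir -p /tmp/glob-demo && cd /tmp/glob-demo
touch fakefile.1 fakefile.2 hello.txt
line='* Trying 172.217.10.100...'
echo $line     # unquoted: the shell expands * into fakefile.1 fakefile.2 hello.txt
echo "$line"   # quoted: the line is printed verbatim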

Related

How to get original URL from the cURL output log entry?

I feed cURL multiple URLs at a time and have difficulty parsing the output log to get the original addresses back. Namely, if a URL resolves, the output is as follows:
$ curl --head --verbose https://www.google.com/
* Trying 64.233.165.106...
* TCP_NODELAY set
* Connected to www.google.com (64.233.165.106) port 443 (#0)
<...>
> HEAD / HTTP/2
> Host: www.google.com
<...>
which can eventually be parsed back to https://www.google.com/.
However, with a URL that fails to connect, it does not:
$ curl --head --verbose --connect-timeout 3 https://imap.gmail.com/
* Trying 74.125.131.109...
* TCP_NODELAY set
* After 1491ms connect time, move on!
* connect to 74.125.131.109 port 443 failed: Operation timed out
<...>
* Failed to connect to imap.gmail.com port 443: Operation timed out
The error message contains the URL in this case, but in other cases it does not. I can't rely on it.
So I need to either have URL-to-IP resolving disabled in the output, like
* Trying https://imap.gmail.com/...
or somehow append each URL from the list to the corresponding output, like:
$ curl --head --verbose --connect-timeout 3 https://imap.gmail.com/ https://www.google.com/
https://imap.gmail.com/
* Trying 64.233.162.108...
* TCP_NODELAY set
* After 1495ms connect time, move on!
* connect to 64.233.162.108 port 443 failed: Operation timed out
<...>
https://www.google.com/
* Trying 74.125.131.17...
* TCP_NODELAY set
* Connected to www.gmail.com (74.125.131.17) port 443 (#0)
<...>
Wget and HTTPie are not an option. How can one achieve that with cURL?
Perhaps this is the solution:
while read -r LINE; do
  echo "REQUESTED URL: $LINE" >> output.txt
  curl "$LINE" >> output.txt 2>&1
done < url-list.txt
Starting with curl 7.75.0, you can use the --write-out '%{url}' option to make curl display the URL that was fetched.
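For example, a sketch of the loop above rewritten with --write-out (assumes curl 7.75.0 or newer; note the URL is printed after each transfer's output, not before it):
while read -r url; do
  curl --head --verbose --connect-timeout 3 \
       --write-out 'REQUESTED URL: %{url}\n' "$url" >> output.txt 2>&1
done < url-list.txt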

Send an HTTPS request to a TLS 1.0-only server in Alpine Linux

I'm writing a simple web crawler inside a Docker Alpine image. However, I cannot send HTTPS requests to servers that only support TLS 1.0. How can I configure Alpine Linux to allow obsolete TLS versions?
I tried adding MinProtocol to /etc/ssl/openssl.cnf with no luck.
Example Dockerfile:
FROM node:12.0-alpine
RUN printf "[system_default_sect]\nMinProtocol = TLSv1.0\nCipherString = DEFAULT@SECLEVEL=1" >> /etc/ssl/openssl.cnf
CMD ["/usr/bin/wget", "https://www.restauracesalanda.cz/"]
When I build and run this container, I get
Connecting to www.restauracesalanda.cz (93.185.102.124:443)
ssl_client: www.restauracesalanda.cz: handshake failed: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol
wget: error getting response: Connection reset by peer
I can reproduce your issue using the built-in BusyBox wget. However, using the "regular" wget works:
root@a:~# docker run --rm -it node:12.0-alpine /bin/ash
/ # wget -q https://www.restauracesalanda.cz/; echo $?
ssl_client: www.restauracesalanda.cz: handshake failed: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol
wget: error getting response: Connection reset by peer
1
/ # apk add wget
fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/community/x86_64/APKINDEX.tar.gz
(1/1) Installing wget (1.20.3-r0)
Executing busybox-1.29.3-r10.trigger
OK: 7 MiB in 17 packages
/ # wget -q https://www.restauracesalanda.cz/; echo $?
0
/ #
I'm not sure, but maybe you should post an issue at https://bugs.alpinelinux.org
Putting this one-liner into my Dockerfile solved my issue, and I was able to use TLS 1.0:
RUN sed -i 's/MinProtocol = TLSv1.2/MinProtocol = TLSv1/' /etc/ssl/openssl.cnf && \
    sed -i 's/CipherString = DEFAULT@SECLEVEL=2/CipherString = DEFAULT@SECLEVEL=1/' /etc/ssl/openssl.cnf
Credit goes to this dude: http://blog.travisgosselin.com/tls-1-0-1-1-docker-container-support/
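Note that those sed substitutions assume /etc/ssl/openssl.cnf already contains MinProtocol and CipherString lines with the Debian-style defaults; a quick hedged check on the image from the question:
docker run --rm node:12.0-alpine grep -E 'MinProtocol|CipherString' /etc/ssl/openssl.cnf \
  || echo "defaults not present; append a [system_default_sect] as in the question's printf instead"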

HTTPS file transfer stopping when it encounters 0x1a (substitute character)

I'm trying to send binary files using an OpenSSL server, but the transfer stops just before it should send a 0x1A character.
I was able to send 8 MB+ text files, but with both binaries I tried sending, the transfer stopped after about 20 kB, right before a 0x1A character, so the size of the file is not the problem. I'm on Windows 10 and use OpenSSL to run the server:
winpty openssl s_server -WWW -key key.pem -cert cert.pem -port 8075
Using the following command, I am able to download text files without any issue, but it "fails" before the whole binary file is sent.
winpty curl -v -k https://192.168.1.100:8075/hello-world.bin --output hello-world-recv.bin
The verbose output from the curl command is as follows (I removed the first part about the certificates and handshake):
> GET /hello-world.bin HTTP/1.1
> Host: 192.168.1.100:8075
> User-Agent: curl/7.60.0
> Accept: */*
>
{ [5 bytes data]
* HTTP 1.0, assume close after body
< HTTP/1.0 200 ok
< Content-type: text/plain
<
{ [16339 bytes data]
100 19270 0 19270 0 0 43995 0 --:--:-- --:--:-- --:--:-- 45663
* Closing connection 0
} [5 bytes data]
* TLSv1.2 (OUT), TLS alert, Client hello (1):
} [2 bytes data]
As we can see, only 19kB of data was sent, but the original file is 1.2MB.
Any help would be appreciated, thanks.
Check the function www_body in the OpenSSL source (apps/s_server.c, line 3254 in 1.1.1b): it opens the file in the default "r" mode, which on Windows means text mode. That means CRLF is converted to LF and a SUB character (0x1A) is interpreted as end of file.
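A quick sanity check of that explanation from the client side (a sketch, assuming a Unix-style shell such as Git Bash; the 19270-byte figure comes from the transfer log above, and the CRLF-to-LF conversion means the interesting byte may sit slightly past that offset in the original file):
ls -l hello-world-recv.bin          # how many bytes actually arrived
xxd -s 19270 -l 64 hello-world.bin  # look for a 1a byte near the cut-off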

Mac Terminal Curl for textfile

I am trying to download a CSV file from a local webserver (the webserver runs on a signal conditioning instrument) with:
curl -O http://10.0.0.139/crv01.csv
The output from curl is only weird symbols. If I open the same URL in Safari, the CSV file is displayed correctly. Is there an encoding problem with curl?
I tried the verbose option, which gives:
Gunthers-MacBook-Pro:Documents guenther$ curl -v -O http://10.0.0.141/crv01.csv
* About to connect() to 10.0.0.141 port 80 (#0)
* Trying 10.0.0.141...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* connected
* Connected to 10.0.0.141 (10.0.0.141) port 80 (#0)
> GET /crv01.csv HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8x zlib/1.2.5
> Host: 10.0.0.141
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon Aug 26 21:03:12 2013
< Server: VEGA-HTTP-SERVER
< Cache-Control: public, max-age=3600
< Content-type: text/plain
< Connection: Close
< Content-Encoding: gzip
< Content-Length: 226
<
{ [data not shown]
100 226 100 226 0 0 1883 0 --:--:-- --:--:-- --:--:-- 1965
* Closing connection #0
Gunthers-MacBook-Pro:Documents guenther$ vi crv01.csv
^_<8b>^H^#^#^#^#^#^#^C}<92>Á
<83>0^L<86>ï<82>ï xÙ^N+I;kkO²<95>!l^N6õ-ÆÞÿ¶zФ¬4Ç~$ÿ<9f>?­_~^YÞÃs¬P#YÔ<97>ùµø
ÐMýí4~E<85>áñÚOÞMÃû¥¿Ïþð9<96><85>l^D^X!^A<95>CÛ)Õ¡uR´^RBETB<96>b<96>J¢^X^F^G<87>LWª^?ªwWÀt·^F<99>n<82>&tY/Ó]M­®^X=g]5D^S½:KÛ,5;Õv^]^]®À\Ù^\EÈRÌRÊ*¡<8b><94>U<8a>RV)JY¥(e¥^M<84><8a>öEÊ*E^Mmd TÜk<89>¶^AÆ÷eÿy<9b>ü^C»üß{E^C^#^#
The page source (viewed in Google Chrome) is a plain CSV file. The CSV file is created by
http://www.vega.com/en/Signal-conditioning-instrument-VEGAMET391.htm
The --trace-ascii option also did not help!
It seems the page is sent back compressed (see the header "Content-Encoding: gzip") even though curl didn't ask for it. If you ask for it, curl will decompress it for you automatically when receiving it. Try:
curl --compressed -O http://10.0.0.139/crv01.csv
That command should work; it works correctly on my system (10.6) when serving a CSV file locally.
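If you want to see what the flag changes, a small sketch (the instrument's address is the one from the question): --compressed makes curl send an Accept-Encoding request header and decode the gzip body before writing the file.
curl -v --compressed -O http://10.0.0.139/crv01.csv 2>&1 | grep -iE 'accept-encoding|content-encoding'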
You could try the command with verbose on to see if there is any issue:
curl -v -O http://10.0.0.139/crv01.csv
How was the CSV created? By Excel, or was it always native text? I suspect Safari is rendering it while ignoring extraneous binary data.
You could View Source in Safari and make sure it is plain.
Also try curl --trace-ascii (it writes a full ASCII trace of the exchange to a file).
Edit:
From your verbose output, it looks like the file is gzipped.
Try saving it as a .gz file instead and then gunzip crv01.gz
curl http://10.0.0.139/crv01.csv -o crv01.gz
gunzip crv01.gz
If there are more crv files, you can also download a range of them at once:
curl "http://10.0.0.139/crv[01-50].csv" -o crv#1.gz

Recursive wget won't work

I'm trying to crawl a local site with wget -r, but I'm unsuccessful: it just downloads the first page and doesn't go any deeper. By the way, I'm so unsuccessful that it doesn't work for whatever site I try... :)
I've tried various options but nothing better happens. Here's the command I thought would do it:
wget -r -e robots=off --user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/537.4" --follow-tags=a,ref --debug http://rocky:8081/obix
Really, I have no clue. Every site and every piece of documentation I've read about wget tells me that it should simply work with wget -r, so I'm starting to think my wget is buggy (I'm on Fedora 16).
Any idea?
EDIT: Here's the output I'm getting for wget -r --follow-tags=ref,a http://rocky:8081/obix/ :
wget -r --follow-tags=ref,a http://rocky:8081/obix/
--2012-10-19 09:29:51--  http://rocky:8081/obix/
Resolving rocky... 127.0.0.1
Connecting to rocky|127.0.0.1|:8081... connected.
HTTP request sent, awaiting response... 200 OK
Length: 792 [text/xml]
Saving to: “rocky:8081/obix/index.html”

100%[==============================================================================>] 792 --.-K/s in 0s

2012-10-19 09:29:51 (86,0 MB/s) - “rocky:8081/obix/index.html” saved [792/792]

FINISHED --2012-10-19 09:29:51--
Downloaded: 1 files, 792 in 0s (86,0 MB/s)
Usually there's no need to give the user-agent.
It should be sufficient to give:
wget -r http://stackoverflow.com/questions/12955253/recursive-wget-wont-work
To see why wget doesn't do what you want, look at the output it gives you and post it here.
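One detail worth checking in the output above: the first page is served as text/xml, and the wget manual says recursive retrieval follows links in HTML, XHTML, and CSS pages, so a plain XML document may give wget nothing to recurse into. A debugging sketch to confirm how the first page was handled (the log file name is arbitrary):
wget -r --debug -o wget-debug.log http://rocky:8081/obix/
less wget-debug.log    # check whether any links were extracted from index.html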
