RemoteException when creating file with WebHDFS REST API - hadoop

I have not been able to create a file using Hadoop's WebHDFS REST API.
Following the docs, this is what I'm running:
curl -i -X PUT "http://hadoop-primarynamenode:50070/webhdfs/v1/tmp/test1234?op=CREATE&overwrite=false"
Response:
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Fri, 15 Jul 2016 04:10:13 GMT
Date: Fri, 15 Jul 2016 04:10:13 GMT
Pragma: no-cache
Expires: Fri, 15 Jul 2016 04:10:13 GMT
Date: Fri, 15 Jul 2016 04:10:13 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Location: http://hadoop-datanode1:50075/webhdfs/v1/tmp/test1234?op=CREATE&namenoderpcaddress=hadoop-primarynamenode:8020&overwrite=false
Content-Length: 0
Server: Jetty(6.1.26)
Following the redirect:
curl -i -X PUT -T MYFILE "http://hadoop-datanode1:50075/webhdfs/v1/tmp/test1234?op=CREATE&namenoderpcaddress=hadoop-primarynamenode:8020"
Response:
HTTP/1.1 100 Continue
HTTP/1.1 400 Bad Request
Content-Type: application/json; charset=utf-8
Content-Length: 162
Connection: close
{"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"Failed to parse \"null\" to Boolean."}}
I could not find any leads for that error message. Has anyone experienced this before?
I'm running a Hadoop cluster installed using Ambari.

It seems the second PUT command requires a "createparent" parameter; in fact, both "overwrite" and "createparent" have to be passed explicitly, because WebHDFS does not apply its default values on this datanode request. Definitely a bug...
curl -i -X PUT -T MYFILE "http://hadoop-datanode1:50075/webhdfs/v1/tmp/test1234?op=CREATE&namenoderpcaddress=hadoop-primarynamenode:8020&overwrite=false&createparent=false"
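For completeness, here's the whole two-step dance scripted out (just a sketch of what worked for me; the host names and MYFILE path are from my cluster, and the grep/awk parsing of the Location header is only one way to do it):
# Step 1: ask the namenode where to write and capture the redirect target (no data is sent yet)
LOCATION=$(curl -s -i -X PUT "http://hadoop-primarynamenode:50070/webhdfs/v1/tmp/test1234?op=CREATE&overwrite=false" | grep -i '^Location:' | tr -d '\r' | awk '{print $2}')
# Step 2: send the file to the datanode; the redirect already carries overwrite=false,
# so only createparent has to be appended explicitly
curl -i -X PUT -T MYFILE "${LOCATION}&createparent=false"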

Related

CURL Download of TIFF Content Not Working

I'm trying to download images from an API and my curl command is not working for TIFF content types. The API relies on serial numbers and does not include image format. My plan is to pull the header first, then run CURL again based on the content type. Here's the header I'm getting:
HTTP/1.1 200 OK
Last-Modified: Fri, 01 Feb 2002 01:02:00 GMT
Vary: Accept-Encoding
Strict-Transport-Security: max-age=31536000
Content-Type: image/tiff;charset=UTF-8
Content-Length: 2876
Date: Fri, 23 Apr 2021 23:53:11 GMT
Connection: close
Content-Security-Policy: frame-ancestors https://*.an.agency.gov:*
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
I'm using the below CURL command:
curl -X GET https://URL/STUFF/12345678 > /storage/12345678.tiff -H API-KEY
I can get JPG content to save correctly as JPG.
Here's what the JPG content headers look like:
HTTP/1.1 200 OK
Last-Modified: Wed, 01 Oct 2008 12:55:33 GMT
Vary: Accept-Encoding
Strict-Transport-Security: max-age=31536000
Content-Type: image/jpeg;charset=UTF-8
Transfer-Encoding: chunked
Date: Fri, 23 Apr 2021 23:53:05 GMT
Connection: close
Content-Security-Policy: frame-ancestors https://*.an.agency.gov:*
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Kinda stumped here.
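For reference, this is roughly the header-first approach I had in mind (a sketch; $AUTH_HEADER stands in for whatever real "Name: value" header my redacted -H API-KEY represents, and the URL is the same placeholder as above):
# pull just the headers and grab the Content-Type
ctype=$(curl -sI -H "$AUTH_HEADER" "https://URL/STUFF/12345678" | tr -d '\r' | grep -i '^Content-Type:' | cut -d' ' -f2)
# pick an extension based on the reported type, then download for real
case "$ctype" in
  image/tiff*) ext=tiff ;;
  image/jpeg*) ext=jpg ;;
  *) ext=bin ;;
esac
curl -s -H "$AUTH_HEADER" -o "/storage/12345678.$ext" "https://URL/STUFF/12345678"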
UPDATE:
I ran a GM Convert command on a downloaded TIF and got the following error:
Improper image header
Any ideas how to fix this? The API seems to be doing something to make the TIFF displayable in the browser, and that same processing appears to be messing up the file download.
After a ton of head scratching and investigation, it turns out the issue is with the API itself. The TIFF files are exported with a really important tag missing, which causes most programs to say the file is broken (I suppose it technically is). I figured out what was happening when I gave GIMP a shot at reading the file and, to my surprise, it could read and convert it successfully. GraphicsMagick and ImageMagick can't deal with TIFFs missing the 'photometric interpretation' tag, it seems, but GIMP apparently can. Found a nice little command line solution here:
Setting the photometric interpretation tag for a multi-page tiff
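In case it helps anyone else, here's the sort of command-line fix I ended up with, using libtiff's tiffset to stamp the missing tag onto every page (a sketch; it assumes tiffinfo/tiffset are installed, that 1, i.e. min-is-black, is the right value for these scans, and broken.tif is a placeholder name):
# count the pages (directories) in the TIFF, then set tag 262 (PhotometricInterpretation) on each
pages=$(tiffinfo broken.tif | grep -c 'TIFF Directory')
for ((i=0; i<pages; i++)); do
  tiffset -d "$i" -s 262 1 broken.tif
done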

Shell script - Force HTTP 1.1 pipelining in curl, nc or telnet

I need to force HTTP/1.1 pipelining on several GET requests with curl, telnet or netcat in a bash script. I've already tried to do so with curl, but as far as I know the tool dropped HTTP pipelining support in version 7.65.0, and I wasn't able to find much information about how to do it otherwise. Still, if it isn't possible with telnet or netcat, I have access to curl version 7.29.0 on another computer.
From Wikipedia:
HTTP pipelining is a technique in which multiple HTTP requests are sent on a single TCP (transmission control protocol) connection without waiting for the corresponding responses.
To send multiple GET requests with netcat, something like this should do the trick:
echo -en "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\nGET /other.html HTTP/1.1\r\nHost: example.com\r\n\r\n" | nc example.com 80
This will send two HTTP GET requests to example.com, one for http://example.com/index.html and another for http://example.com/other.html.
The -e flag means interpret escape sequences (the carriage returns and line feeds, or \r and \n). The -n means don't print a newline at the end (it would probably work without the -n).
I just ran the above command and got two responses: one was a 200 OK, the other a 404 Not Found.
It might be easier to see the multiple requests and responses if you do a HEAD request instead of a GET request. That way, example.com's server will only respond with the headers.
echo -en "HEAD /index.html HTTP/1.1\r\nHost: example.com\r\n\r\nHEAD /other.html HTTP/1.1\r\nHost: example.com\r\n\r\n" | nc example.com 80
This is the output I get from the above command:
$ echo -en "HEAD /index.html HTTP/1.1\r\nHost: example.com\r\n\r\nHEAD /other.html HTTP/1.1\r\nHost: example.com\r\n\r\n" | nc example.com 80
HTTP/1.1 200 OK
Content-Encoding: gzip
Accept-Ranges: bytes
Age: 355429
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Mon, 24 Feb 2020 14:48:39 GMT
Etag: "3147526947"
Expires: Mon, 02 Mar 2020 14:48:39 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (dna/63B3)
X-Cache: HIT
Content-Length: 648
HTTP/1.1 404 Not Found
Accept-Ranges: bytes
Age: 162256
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Mon, 24 Feb 2020 14:48:39 GMT
Expires: Mon, 02 Mar 2020 14:48:39 GMT
Last-Modified: Sat, 22 Feb 2020 17:44:23 GMT
Server: ECS (dna/63AD)
X-Cache: 404-HIT
Content-Length: 1256
If you want to see more details, run one of these commands while Wireshark is capturing: you'll see both requests go out in a single packet, followed by the two responses arriving in separate packets on the same connection.
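If you need to pipeline more than a couple of requests from the bash script, you can build the request string in a loop, along these lines (a sketch; the host and paths are placeholders, and the Connection: close header on the last request makes the server close the connection so nc exits once the final response has arrived):
host=example.com
paths=(/index.html /other.html /a/third.html)
req=""
last=$(( ${#paths[@]} - 1 ))
for i in "${!paths[@]}"; do
  # one request per path, all written before any response is read
  req+="HEAD ${paths[$i]} HTTP/1.1\r\nHost: $host\r\n"
  [ "$i" -eq "$last" ] && req+="Connection: close\r\n"
  req+="\r\n"
done
echo -en "$req" | nc "$host" 80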

Is there a way to improve copying to the clipboard format from Bash on Ubuntu on Windows?

When using Bash on Windows, copying from the terminal results in terribly formatted content when I paste. Here is a curl request I made and copied directly from Bash on Ubuntu on Windows:
$ curl -IL aol.com HTTP/1.1 301 Moved Permanently Date: Fri, 27 Apr 2018 15:58:31 GMT Server: Apache Location: https://www.aol.com/ Content-Type: text/html; charset=iso-8859-1
It's basically all on one line, almost like the carriage returns are not respected. I would expect the paste output to look like this:
$ curl -IL aol.com
HTTP/1.1 301 Moved Permanently
Date: Fri, 27 Apr 2018 15:19:49 GMT
Server: Apache
Location: https://www.aol.com/
Content-Type: text/html; charset=iso-8859-1
Is there a way to fix this? Right now I have to go into a text editor to format every copy/paste I need to make.
Edit: Here is a screenshot of the above code block copied to the clipboard directly from Bash on Win10: https://i.imgur.com/VUKRhLX.png

How to ask a server for the date from the Windows terminal?

For example, if I want to know whether I am connected, I can use:
ping 8.8.8.8
to send a ping to Google's DNS server. Can I ask a server for its date in a similar way? Something like:
give_me_your_date 8.8.8.8
If you have curl, you could do:
curl -I http://example.com
HTTP/1.1 200 OK
Date: Sun, 16 Oct 2016 23:37:15 GMT
Server: Apache/2.4.23 (Unix)
X-Powered-By: PHP/5.6.24
Connection: close
Content-Type: text/html; charset=UTF-8
and if that server is a webserver, you would get an HTTP response header from which you can retrieve the date.
You can get curl for Windows here: https://curl.haxx.se/download.html
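To grab just the Date header on Windows, you can filter curl's output with findstr, e.g. (a sketch; /B anchors the match at the start of a line and /I makes it case-insensitive):
curl -sI http://example.com | findstr /B /I "Date:"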

Using Wget with buggy URL

I've got the following link, which downloads a CSV file when opened in a web browser.
http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre=
However, when using Wget under Cygwin with the command below, it retrieves a file that is not a CSV file but a file without an extension, and the file is empty, that is, it has no data at all.
wget 'http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre='
So, as I hate to be stuck, I tried the following as well. I put the URL in a text file and used Wget with the -i file option:
inside fic.txt
'http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre='
I used Wget in the following way:
wget -i fic.txt
I got the following errors:
Scheme missing
No URLs found in toto.txt
I think I can suggest some other options that will make your underlying problem clearer, which is that the response is supposed to be HTML, but there is no content (Content-Length: 0).
More concretely, this
wget -S -O export_classement.html 'http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre='
produces this
Resolving pro.allocine.fr... 62.39.143.50
Connecting to pro.allocine.fr|62.39.143.50|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx
Date: Fri, 28 Mar 2014 09:54:44 GMT
Content-Type: text/html; Charset=iso-8859-1
Connection: close
X-ServerName: WEBNX2
akamainocache: no-store
Content-Length: 0
Cache-control: private
X-KompressorName: kompressor7
Length: 0 [text/html]
2014-03-28 05:54:52 (0.00 B/s) - ‘export_classement.html’ saved [0/0]
Additionally, the server is tailoring its output based on how the browser identifies itself. Wget has an option to include an arbitrary User-Agent in the request headers. Here's an example of what happens when you make wget identify itself as Chrome. Here's a list of other possibilities.
wget -S --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36" 'http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre='
Now the output changes to export.csv, with type "application/octet-stream" instead of "text/html":
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx
Date: Fri, 28 Mar 2014 10:34:09 GMT
Content-Type: application/octet-stream; Charset=iso-8859-1
Transfer-Encoding: chunked
Connection: close
X-ServerName: WEBNX2
Edge-Control: no-store
Last-Modified: Fri, 28 Mar 2014 10:34:17 GMT
Content-Disposition: attachment; filename=export.csv
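And if you'd rather have wget save the file under the name the server suggests (export.csv, from the Content-Disposition header) instead of the long query-string filename, adding wget's --content-disposition option should do it (a sketch, reusing the same user-agent trick as above):
wget -S --content-disposition --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36" 'http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre='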
