webhdfs is redirecting to localhost:50075 - hadoop

I am trying to create a file from a non-hadoop environment to a remote hdfs.
For this purpose, I am using pywebhdfs api and I'm running command using curl.
https://pythonhosted.org/pywebhdfs/
I used this documentation as a reference, I am able to execute all other methods except of create_file().
While using create_file(), I am getting error like: 'Couldn't connect to host'
Command: curl -i -X PUT -L "http://xxx.xxx.xxx.xxx:50070/webhdfs/v1/test1/?op=CREATE" -T sample.txt
Response:HTTP/1.1 100 Continue
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Tue, 30 Oct 2018 12:04:04 GMT
Date: Tue, 30 Oct 2018 12:04:04 GMT
Pragma: no-cache
Expires: Tue, 30 Oct 2018 12:04:04 GMT
Date: Tue, 30 Oct 2018 12:04:04 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Location: http://localhost:50075/webhdfs/v1/test1/?op=CREATE&namenoderpcaddress=xxx.xxx.xxx.xxx:9000&overwrite=false
Content-Length: 0
Server: Jetty(6.1.26)
curl: (7) couldn't connect to host
This Location is displaying as localhost here. I took reference from the past post.
webhdfs always redirect to localhost:50075
but I didn't get success.
I tried changing IP in hdfs-site.xml and /etc/hosts file but no success at all.
Can anyone tell me, how to fix this?
Thanks in advance..

Related

WebHDFS FileNotFoundException rest api

I am posting this question as a continuation of post webhdfs rest api throwing file not found exception
I have an image file I would like to OPEN through the WebHDFS rest api.
the file exists in hdfs and has appropriate permissions
I can
LISTSTATUS that file and get an answer:
curl -i "http://namenode:50070/webhdfs/v1/tmp/file.png?op=LISTSTATUS"
HTTP/1.1 200 OK
Date: Fri, 17 Jul 2020 22:47:29 GMT
Cache-Control: no-cache
Expires: Fri, 17 Jul 2020 22:47:29 GMT
Date: Fri, 17 Jul 2020 22:47:29 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
Content-Type: application/json
Transfer-Encoding: chunked
{"FileStatuses":{"FileStatus":[
{"accessTime":1594828591740,"blockSize":134217728,"childrenNum":0,"fileId":11393739,"group":"hdfs","length":104811,"modificationTime":1594828592000,"owner":"XXXX","pathSuffix":"XXXX","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}
]}}
Content-Type: application/octet-stream
Content-Length: 0
So the api can properly read the metadata, but I cannot get that file to OPEN:
curl -i "http://namenode:50070/webhdfs/v1/tmp/file.png?op=OPEN"
HTTP/1.1 307 Temporary Redirect
Date: Fri, 17 Jul 2020 22:23:17 GMT
Cache-Control: no-cache
Expires: Fri, 17 Jul 2020 22:23:17 GMT
Date: Fri, 17 Jul 2020 22:23:17 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
Location: http://datanode1:50075/webhdfs/v1/tmp/file.png?op=OPEN&namenoderpcaddress=namenode:8020&offset=0
Content-Type: application/octet-stream
Content-Length: 0
{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path is not a file: /tmp/file.png......
So, according to webhdfs rest api throwing file not found exception, I can see that the request is passed off from the namenode to the datanode1.
Datanode1 is in my hosts file, I can connect to it an check the status of webhdfs from there:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
<final>true</final>
</property>
It is allowed, same on the namenode.
I also went to look at the hdfs logs on /var/log/hadoop/hdfs/*.{log,out} to see if I could find errors triggered when I curled, but nothing seems to happen. I see no entry pertaining to my file or webhdfs query. I tried that on the namenode and datanode1.
as a last ditch effort I tried to increase permissions (not ideal) from 644 (seen in point 2/) to 666
hdfs dfs -chmod 666 /tmp/file.png
curl -i "http://namenode:50070/webhdfs/v1/tmp/file.png?op=LISTSTATUS"
HTTP/1.1 403 Forbidden
Date: Fri, 17 Jul 2020 23:06:18 GMT
Cache-Control: no-cache
Expires: Fri, 17 Jul 2020 23:06:18 GMT
Date: Fri, 17 Jul 2020 23:06:18 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
Content-Type: application/json
Transfer-Encoding: chunked
{"RemoteException":{"exception":"AccessControlException","javaClassName":"org.apache.hadoop.security.AccessControlException","message":"Permission denied: user=XXXX, access=READ_EXECUTE, inode=\"/tmp/file.png\":XXXX:hdfs:drw-rw-rw-"}}
So it seems it did the switch but somehow I got a permission issue when relaxing the current permissions that I didnt get before? It is not like I removed the X flag, it wasn't there to begin with. Does access=READ_EXECUTE require both R and X?
Now I am at a loss as to why I can see but not read this file with HDFS. Can someone please help me troubleshoot this?
Looking closer at your last error,
... inode=\"/tmp/file.png\":XXXX:hdfs:drw-rw-rw-"}
, it seems to indicate that file.png is actually a directory (leading d symbol) and not a file. This is consistent with the error you're getting in step #3 *..."message":"Path is not a file: /tmp/file.png....
You can double check that simply by doing $ hdfs dfs -ls /tmp/file.png/.
Getting back to your access error, you do need an "execute" (x) permission to list the files in a directory.

webhdfs rest api throwing file not found exception

I am trying to open a hdfs file that is present on cdh4 cluster from cdh5 machine using webhdfs from the command line as below:
curl -i -L "http://namenodeIpofCDH4:50070/webhdfs/v1/user/quad/source/JSONML.java?user.name=quad&op=OPEN"
I am getting "File Not Found Exception" even if the file JSONML.java is present in the mentioned path in namenode as well as datanode and its trace is as follows:
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Thu, 01-Jan-1970 00:00:00 GMT
Date: Mon, 22 Feb 2016 13:25:35 GMT
Pragma: no-cache
Date: Mon, 22 Feb 2016 13:25:35 GMT
Pragma: no-cache
Set-Cookie: hadoop.auth="u=quad&p=quad&t=simple&e=1456183535737&s=KdZYcA5iwJeIU2F9ZJfLSaT4qMY=";Path=/
Location: http://n3.quadratics.com:50075/webhdfs/v1/user/quad/source/JSONML.java?op=OPEN&user.name=quad&namenoderpcaddress=n1.quadratics.com:8020&offset=0
Content-Type: application/octet-stream
Content-Length: 0
Server: Jetty(6.1.26.cloudera.4)
HTTP/1.1 404 Not Found
Cache-Control: no-cache
Expires: Mon, 22 Feb 2016 13:26:28 GMT
Date: Mon, 22 Feb 2016 13:26:28 GMT
Pragma: no-cache
Expires: Mon, 22 Feb 2016 13:26:28 GMT
Date: Mon, 22 Feb 2016 13:26:28 GMT
Pragma: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.cloudera.4)
{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File does not exist: /user/quad/source/JSONML.java\n\tat org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)\n\tat org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:559)\n\tat org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:415)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)\n"}}
But I don't get any error and get the status of the above file when I use the below command:
curl -i -L http://namenodeIpofCDH4:50070/webhdfs/v1/user/quad/source/JSONML.java?user.name=quad&op=GETFILESTATUS"
I get the output response as below:
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Thu, 01-Jan-1970 00:00:00 GMT
Date: Mon, 22 Feb 2016 13:38:48 GMT
Pragma: no-cache
Date: Mon, 22 Feb 2016 13:38:48 GMT
Pragma: no-cache
Set-Cookie: hadoop.auth="u=quad&p=quad&t=simple&e=1456184328134&s=sE6esO8J39O+itl+ggNzX4/WzjQ=";Path=/
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.cloudera.4)
{"FileStatus":{"accessTime":1456147448567,"blockSize":134217728,"group":"quad","length":14849,"modificationTime":1456143798039,"owner":"quad","pathSuffix":"","permission":"644","replication":3,"type":"FILE"}}
Any ideas of the reason of why opening a file is failing and fixing that would be greatly appreciated.
I saw a similar error when I had misconfigured my /etc/hosts
The OPEN command above returns a redirect which provides a hostname. The localhost will try and resolve this hostname based on the local DNS setup.
It will then look for the file at the IP address that the hostname resolves to. Not necessarily the one you issued the command to.

webhdfs always redirect to localhost:50075

I have a hdfs cluster (hadoop 2.7.1), with one namenode, one secondary namenode, 3 datanodes.
When I enable webhdfs and test, I found it always redirect to "localhost:50075" which is not configured as datanodes.
csrd#secondarynamenode:~/lybica-hdfs-viewer$ curl -i -L "http://10.56.219.30:50070/webhdfs/v1/demo.zip?op=OPEN"
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Tue, 01 Dec 2015 03:29:21 GMT
Date: Tue, 01 Dec 2015 03:29:21 GMT
Pragma: no-cache
Expires: Tue, 01 Dec 2015 03:29:21 GMT
Date: Tue, 01 Dec 2015 03:29:21 GMT
Pragma: no-cache
Location: http://localhost:50075/webhdfs/v1/demo.zip?op=OPEN&namenoderpcaddress=10.56.219.30:9000&offset=0
Content-Type: application/octet-stream
Content-Length: 0
Server: Jetty(6.1.26)
curl: (7) Failed to connect to localhost port 50075: Connection refused
The etc/hadoop/slaves is configured as:
10.56.219.32
10.56.219.33
10.56.219.34
Is there any configurations on this?
Thanks!
Well, it's a /etc/hosts mistake.
The /etc/hosts on datanodes was:
127.0.0.1 localhost datanode-1
change it to:
127.0.0.1 datanode-1 localhost
fix this problem.
You need to have this entry in hdfs-site.xml
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
Value should be 0.0.0.0 on a cluster. You need to restart the cluster after updating hdfs-site.xml file and deploying it on all nodes in the cluster.

WebHDFS OPEN command returns empty results

I created a simple file in HDFS at the path /user/admin/foo.txt
I can see the contents of this file in Hue.
How I issue the command
curl -i http://namenode:50070/webhdfs/v1/user/admin/foo.txt?op=OPEN
I get the response
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Tue, 24 Nov 2015 16:20:15 GMT
Date: Tue, 24 Nov 2015 16:20:15 GMT
Pragma: no-cache
Expires: Tue, 24 Nov 2015 16:20:15 GMT
Date: Tue, 24 Nov 2015 16:20:15 GMT
Pragma: no-cache
Location: http://datanode:50075/webhdfs/v1/user/admin/foo.txt?op=OPEN&namenoderpcaddress=nameservice1&offset=0
Content-Type: application/octet-stream
Content-Length: 0
Server: Jetty(6.1.26.cloudera.4)
why is the content-length: 0?? I was hoping that this would list the contents of the file.
Execute:
curl -i http://datanode:50075/webhdfs/v1/user/admin/foo.txt?op=OPEN&namenoderpcaddress=nameservice1&offset=0
As for the explanation - when using WebHDFS to open a file you have to do the following:
You don't know which node the file resides on, so you ask the namenode.
The namenode returns you a datanode containing the file.
You can then open the file itself by talking directly to the datanode.
So this activity is expected. See https://hadoop.apache.org/docs/r1.0.4/webhdfs.html for more information.

NGINX + HHVM + Magento

I've been working on a development box deploying an Nginx, Magento, HHVM model. The site loads nicely and I don't get any real clear errors that I can see. However, a good number of my images are not loading properly. All I get are the image dimensions. The version of HHVM I'm using is:
HipHop VM 3.3.0-dev (rel)
Compiler: heads/master-0-g39d07abf36f7cad883d6be2a384a77c0c8aac040
Repo schema: 248cb6ce2d336c890fb26fd16690bf1b53b46994
Extension API: 20140829
The version of Nginx is:
nginx version: nginx/1.7.4
Some images load, and others don't. We're using timthumb for images, and I do see one error:
\nWarning: Is a directory in /home/user/shared/timthumb/index.php on line 90
If I run a curl from the command-line, I get the following result for an image, which means it's being seen as far as I can tell:
HTTP/1.1 200 OK
Content-Type: image/gif
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: HHVM/3.3.0-dev
CF-Cache-Status: HIT
CF-RAY: 16513733df33085c-IAD
Server: cloudflare-nginx
Date: Fri, 05 Sep 2014 08:56:47 GMT
Expires: Sun, 05 Oct 2014 08:56:47 GMT
Cache-Control: public, max-age=2592000
Set-Cookie: __cfduid=d6c188df6595a443200995301f0b707981409907407975; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.placehold.it; HttpOnly
Last-Modified: Fri, 19 Nov 2010 07:00:00 GMT
Cache-Control: max-age=31536000, public, must-revalidate, proxy-revalidate
I have a separate test server with the same content/database running NGINX/PHP-FPM, and the only variable I can see that's different in the curl is that when I run the curl against the dev server running php-fpm, the result is Content-Type: image/jpeg. When I run it against HHVM I get Content-Type: image/gif.
I would think this is related to rewrite rules, but I'm not sure. I also just discovered this error here as well in exception.log:
2014-09-05T12:04:06+00:00 ERR (3): Warning: open(tcp://127.0.0.1:11211?persistent=1&weight=2&timeout=10&retry_interval=10/sess_dd885307c2ba602e5faa7020c9c7c038, O_RDWR) failed: No such file or directory (2) in on line 0
Any advice would be greatly appreciated.

Resources