Using kerberized webhdfs/hdfs with JAVA API - hadoop

I have a Hadoop cluster with Kerberos enabled I want to put files on HDFS using a windows/linux machine outside the cluster.
Hadoop admin team have provided me with username to access hadoop and keytab file, how should I use them in my java code?
I went through many resources on internet but none of them give any guide for accessing kerberized hadoop from outside the cluster.
Also, Is it necessary to run the code using hadoop jar? if yes how will I run it fromo outside the cluster
Reference
http://blog.rajeevsharma.in/2009/06/using-hdfs-in-java-0200.html
http://princetonits.com/technology/using-filesystem-api-to-read-and-write-data-to-hdfs/
I got kerberos working ,able to generate ticket now
But curl is not working(windows)
curl -i --negotiate u:qjdht93 "http://server:50070/webhdfs/v1/user/qjdht93/?op=LISTSTATUS"
Gives error as
HTTP/1.1 401 Authentication required
Cache-Control: must-revalidate,no-cache,no-store
Date: Mon, 01 Jun 2015 15:26:37 GMT
Pragma: no-cache
Date: Mon, 01 Jun 2015 15:26:37 GMT
Pragma: no-cache
Content-Type: text/html; charset=iso-8859-1
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Version=1; Path=/; Expires=Thu, 01-Jan-1970 00:00:00 G
MT; HttpOnly
Content-Length: 1416
Server: Jetty(6.1.26)
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 401 Authentication required</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /webhdfs/v1/user/qjdht93. Reason:
<pre> Authentication required</pre></p><hr /><i><small>Powered by Jetty://</s
mall></i><br/>
<br/>
<br/>
Please suggest

This can be achieve through hdfs command. All you need is hadoop distribution and configuration files which present on namenode.
Copy the hadoop distribution on the client node. It means you have to copy the complete hadoop package to the client machine.
Refer this
Get a user ticket from keytab using kinit command which is a command line tool for java.
a. Install jdk in you client machine.
b. Set JAVA_HOME, see here
c. Create a krb5.ini file in the location C:\windows\krb5.ini. This file should contain below information,
[libdefaults]
default_realm = REALM
[realms]
REALM = {
kdc = kdcvalue
admin_server = kdcvalue
default_domain = kdcvalue
}
[domain_realm]
.kdcvalue = REALM
kdcvalue = REALM
REALM - Server Realm name
kdcvalue - Server host name or ip address
d. Make sure java bin path set in the PATH variable in windows machine. Open command prompt, Type the below command to get user ticket.
kinit -k -t keytabfile username
Now you can able to put the file into HDFS using "hadoop fs -put src dest" or using java.

Related

Laravel Scheduler creating thousands of log files in tmp folder

I have a Digital Ocean Ubuntu 16.04 server running Laravel.
I have a couple crons running every minute in the scheduler that trigger the Quickbooks API library. Everytime it runs it logs a text file similar to below. It creates a request and response txt file:
RESPONSE URI FOR SEQUENCE ID 04745
==================================
https://quickbooks.api.intuit.com/v3/company/123456/query?minorversion=54
RESPONSE HEADERS
================
date: Tue, 15 Dec 2020 18:15:51 GMT
content-type: application/xml;charset=UTF-8
content-length: 1193
connection: close
server: nginx
strict-transport-security: max-age=15552000
intuit_tid: 1-5fd8fd57-123456
x-spanid: d59bb673-e981-4d61-9bdf-123456
x-amzn-trace-id: Root=1-5fd8fd57-123456
set-cookie: JSESSIONID=123456.c21-pprdc21uw2apv019661-stack-b; Domain=qbo.intuit.com; Path=/; Secure; Ht$
qbo-version: 1949.239
service-time: total=8, db=3
expires: 0
cache-control: max-age=0, no-cache, no-store, must-revalidate, private
vary: Accept-Encoding
x-xss-protection: 1; mode=block
RESPONSE BODY
=============
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<IntuitResponse xmlns="http://schema.intuit.com/finance/v3" time="2020-12-15T10:15:51.199-08:00">
<QueryResponse startPosition="1" maxResults="1">
<Vendor domain="QBO" sparse="false">
<Id>6213</Id>.....
I don't need these logs created. How can I stop these from being created? I am not sure if they are being created from Ubuntu, Laravel or the Quickbooks API.
The bandaid is that I have a cron to remove these files from a shell script but I am trying to stop them from being generated in the first place. Thanks
It looks like a response from QuickBooks API. You can check here or here to find out how to disable or modify response log.

webhdfs is redirecting to localhost:50075

I am trying to create a file from a non-hadoop environment to a remote hdfs.
For this purpose, I am using pywebhdfs api and I'm running command using curl.
https://pythonhosted.org/pywebhdfs/
I used this documentation as a reference, I am able to execute all other methods except of create_file().
While using create_file(), I am getting error like: 'Couldn't connect to host'
Command: curl -i -X PUT -L "http://xxx.xxx.xxx.xxx:50070/webhdfs/v1/test1/?op=CREATE" -T sample.txt
Response:HTTP/1.1 100 Continue
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Tue, 30 Oct 2018 12:04:04 GMT
Date: Tue, 30 Oct 2018 12:04:04 GMT
Pragma: no-cache
Expires: Tue, 30 Oct 2018 12:04:04 GMT
Date: Tue, 30 Oct 2018 12:04:04 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Location: http://localhost:50075/webhdfs/v1/test1/?op=CREATE&namenoderpcaddress=xxx.xxx.xxx.xxx:9000&overwrite=false
Content-Length: 0
Server: Jetty(6.1.26)
curl: (7) couldn't connect to host
This Location is displaying as localhost here. I took reference from the past post.
webhdfs always redirect to localhost:50075
but I didn't get success.
I tried changing IP in hdfs-site.xml and /etc/hosts file but no success at all.
Can anyone tell me, how to fix this?
Thanks in advance..

webhdfs always redirect to localhost:50075

I have a hdfs cluster (hadoop 2.7.1), with one namenode, one secondary namenode, 3 datanodes.
When I enable webhdfs and test, I found it always redirect to "localhost:50075" which is not configured as datanodes.
csrd#secondarynamenode:~/lybica-hdfs-viewer$ curl -i -L "http://10.56.219.30:50070/webhdfs/v1/demo.zip?op=OPEN"
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Tue, 01 Dec 2015 03:29:21 GMT
Date: Tue, 01 Dec 2015 03:29:21 GMT
Pragma: no-cache
Expires: Tue, 01 Dec 2015 03:29:21 GMT
Date: Tue, 01 Dec 2015 03:29:21 GMT
Pragma: no-cache
Location: http://localhost:50075/webhdfs/v1/demo.zip?op=OPEN&namenoderpcaddress=10.56.219.30:9000&offset=0
Content-Type: application/octet-stream
Content-Length: 0
Server: Jetty(6.1.26)
curl: (7) Failed to connect to localhost port 50075: Connection refused
The etc/hadoop/slaves is configured as:
10.56.219.32
10.56.219.33
10.56.219.34
Is there any configurations on this?
Thanks!
Well, it's a /etc/hosts mistake.
The /etc/hosts on datanodes was:
127.0.0.1 localhost datanode-1
change it to:
127.0.0.1 datanode-1 localhost
fix this problem.
You need to have this entry in hdfs-site.xml
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
Value should be 0.0.0.0 on a cluster. You need to restart the cluster after updating hdfs-site.xml file and deploying it on all nodes in the cluster.

HTTP/1.1 401 Access to the HttpFS server is restricted

bivm:/home/biadmin/Desktop # curl -i "http://bivm.ibm.com:14000/webhdfs/v1/tmp/newfile?op=OPEN"
HTTP/1.1 401 Access to the HttpFS server is restricted. Please obtain proper credential from the BigInsights Console first.
Content-Type: text/html; charset=iso-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 1589
Server: Jetty(6.1.x)
I am getting the above mentioned error, when I am trying to access a file in my cluster using the WebHDFS REST API call through the HttpFS server (running on port 14000 by default). Please advice.

Magento Soap API working locally but not remotely

I have created a custom API for Magento Enterprise 1.11. Calling the API through Soap v1 works fine on my local dev environment, however I am unable to make calls from my local environment to the remote environment.
Using PHP interactive shell on my localdev:
php > $client = new SoapClient(WSDL_URI,array('trace'=>1));
php > $client->login(API_USER,API_KEY);
php > var_dump($client->__getLastResponse());
string(538) "<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="urn:Magento" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><SOAP-ENV:Body><ns1:loginResponse><loginReturn xsi:type="xsd:string">f0eec73e49665aaf9cc4a6644fba5dc6</loginReturn></ns1:loginResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>
I have been able to do this successfully from the localhost, as well as between two local VMs running on my dev machine. I can also access the methods of my custom API without issue.
However, when I try to make a soap client to my remote test environment, I am able to create the client, but the call to $client->login(), or any subsequent call results in the following:
php > $client = new SoapClient(REMOTE_WSDL_URI,array('trace'=>1));
php > $client->login(API_USER,API_KEY);
PHP Warning: Uncaught SoapFault exception: [WSDL] SOAP-ERROR: Parsing WSDL: Couldn't load from 'http://REMOTE_HOST/index.php/api/index/index/wsdl/1/' : failed to load external entity "http://REMOTE_HOST/index.php/api/index/index/wsdl/1/" in php shell code:1
Stack trace:
#0 php shell code(1): SoapClient->__call('login', Array)
#1 php shell code(1): SoapClient->login(API_USER, API_KEY)
#2 {main}
php > var_dump($client->__getLastRequestHeaders());
string(255) "POST /index.php/api/index/index/ HTTP/1.1
Host: REMOTE_HOST
Connection: Keep-Alive
User-Agent: PHP-SOAP/5.3.18-1~dotdeb.0
Content-Type: text/xml; charset=utf-8
SOAPAction: "urn:Mage_Api_Model_Server_HandlerAction"
Content-Length: 550
php > var_dump($client->__getLastResponseHeaders());
string(840) "HTTP/1.1 500 Internal Service Error
Date: Mon, 11 Feb 2013 19:06:56 GMT
Server: Apache/2.2.16 (Debian)
X-Powered-By: PHP/5.3.19-1~dotdeb.0
Set-Cookie: PHPSESSID=7uqrcmiv96hroubnb1uu7c7cm6; expires=Wed, 13-Feb-2013 01:06:56 GMT; path=/; domain=.REMOTE_HOST; HttpOnly
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: CUSTOMER=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.REMOTE_HOST; httponly
Set-Cookie: CUSTOMER_INFO=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.REMOTE_HOST; httponly
Set-Cookie: CUSTOMER_AUTH=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.REMOTE_HOST; httponly
Content-Length: 468
Vary: Accept-Encoding
Connection: close
Content-Type: text/xml; charset=utf-8
php > var_dump($client->__getLastResponse());
string(468) "<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body><SOAP-ENV:Fault><faultcode>WSDL</faultcode><faultstring>SOAP-ERROR: Parsing WSDL: Couldn't load from 'http://REMOTE_HOST/index.php/api/index/index/wsdl/1/' : failed to load external entity "http://REMOTE_HOST/index.php/api/index/index/wsdl/1/"
</faultstring></SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>
When I hit //REMOTE_HOST/index.php/api/?wsdl I get the standard Magento WSDL.
The two environments are 99.99% identical:
Server version: Apache/2.2.16 (Debian) (both local dev and remote)
PHP 5.3.18 (local dev) 5.3.19 (remote host)
Apache/PHP configurations are the same.
Code base is identical
I have scoured the intewebs for clues, including:
http://www.magentocommerce.com/boards/viewthread/56528/
http://www.magentocommerce.com/wiki/5_-_modules_and_development/web_services/overriding_an_existing_api_class_with_additional_functionality#wsdl
Unable to connect to Magento SOAP API v2 due to "failed to load external entity"
Magento API SOAP-ERROR: Parsing WSDL: Couldn't load from '[url]/index.php/api/index/index/?wsdl=1' : Couldn't find end of Start Tag part line 56
http://www.magentocommerce.com/api/soap/introduction.html
I've tried the "Content-Length" header fix mentioned in the sedond-to-last link, and just about everything else I could think of... Stumped.
While you can load the WSDL URL (http://REMOTE_HOST/index.php/api/index/index/wsdl/1/) from your computer, your remote server can't contact itself via its REMOTE_HOST.
PHP's SoapServer object (used by Magento's implementation) needs to contact the WSDL to know which methods are exposed.
For reasons I've never been able to figure out, it's a common network configuration for a server to not have access to it's own DNS entries. Connect to your server via SSH and try running the following
curl http://REMOTE_HOST/index.php/api/index/index/wsdl/1/
My guess is you'll get a network timeout or a REMOTE_HOST unknown error. Fix your configuration so your server can access itself, and everything should start working.
You could try changing host DNS nameservers perhaps.
vim /etc/resolv.conf to add Google's 8.8.8.8, 8.8.4.4

Resources