WebHDFS not working on a secure hadoop cluster - hadoop

I am trying to secure my HDP2 Hadoop cluster using Kerberos.
So far Hdfs, Hive, Hbase, Hue Beeswax and Hue Job/task browsers are working properly ; however Hue's File Browser is not working, it answers :
WebHdfsException at /filebrowser/
AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS] (error 500)
Request Method: GET
Request URL: http://bt1svlmy:8000/filebrowser/
Django Version: 1.2.3
Exception Type: WebHdfsException
Exception Value:
AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS] (error 500)
Exception Location: /usr/lib/hue/desktop/libs/hadoop/src/hadoop/fs/webhdfs.py in _stats, line 208
Python Executable: /usr/bin/python2.6
Python Version: 2.6.6
(...)
My hue.inifile is configured with all security_enabled=true and other related parameters set.
I believe the problem is with WebHDFS.
I tried the curl commands given at http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#Authentication
curl -i --negotiate -L -u : "http://172.19.115.50:14000/webhdfs/v1/filetoread?op=OPEN"
answers :
HTTP/1.1 403 Forbidden
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth=; Path=/; Expires=Thu, 01-Jan-1970 00:00:00 GMT; HttpOnly
Content-Type: text/html;charset=utf-8
Content-Length: 1027
Date: Wed, 08 Oct 2014 06:55:51 GMT
<html><head><title>Apache Tomcat/6.0.37 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 403 - Anonymous requests are disallowed</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>Anonymous requests are disallowed</u></p><p><b>description</b> <u>Access to the specified resource has been forbidden.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.37</h3></body></html>
And I could reproduce Hue's error message by adding a user with the following curl request:
curl --negotiate -i -L -u: "http://172.19.115.50:14000/webhdfs/v1/filetoread?op=OPEN&user.name=theuser"
it answers :
HTTP/1.1 500 Internal Server Error
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth=u=theuser&p=theuser&t=simple&e=1412735529027&s=rQAfgMdExsQjx6N8cQ10JKWb2kM=; Path=/; Expires=Wed, 08-Oct-2014 02:32:09 GMT; HttpOnly
Content-Type: application/json
Transfer-Encoding: chunked
Date: Tue, 07 Oct 2014 16:32:09 GMT
Connection: close
{"RemoteException":{"message":"SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]","exception":"AccessControlException","javaClassName":"org.apache.hadoop.security.AccessControlException"}}
It seems that there is no Kerberos negotiation between WebHDFS and curl.
I was expecting something like :
HTTP/1.1 401 UnauthorizedContent-Type: text/html; charset=utf-8
WWW-Authenticate: Negotiate
Content-Length: 0
Server: Jetty(6.1.26)
HTTP/1.1 307 TEMPORARY_REDIRECT
Content-Type: application/octet-stream
Expires: Thu, 01-Jan-1970 00:00:00 GMT
Set-Cookie: hadoop.auth="u=exampleuser&p=exampleuser#MYCOMPANY.COM&t=kerberos&e=1375144834763&s=iY52iRvjuuoZ5iYG8G5g12O2Vwo=";Path=/
Location: http://hadoopnamenode.mycompany.com:1006/webhdfs/v1/user/release/docexample/test.txt?op=OPEN&delegation=JAAHcmVsZWFzZQdyZWxlYXNlAIoBQCrfpdGKAUBO7CnRju3TbBSlID_osB658jfGfRpEt8-u9WHymRJXRUJIREZTIGRlbGVnYXRpb24SMTAuMjAuMTAwLjkxOjUwMDcw&offset=0
Content-Length: 0
Server: Jetty(6.1.26)
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 16
Server: Jetty(6.1.26)
A|1|2|3
B|4|5|6
Any idea what could have gone wrong ?
I do have in my hdfs-site.xml on every node :
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/_HOST#MY-REALM.COM</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.keytab</name>
<value>/etc/hadoop/conf/HTTP.keytab</value> <!-- path to the HTTP keytab -->
</property>

Looks like you do not access WebHDFS (default port = 50070) but HttpFS (default port = 14000), which is a "plain" webapp that is not secured the same way.
A WebHDFS url is often something like http://namenode:50070/webhdfs/v1 ; try to modify hue.ini with that parameter (WebHDFS is recommended over HttpFS)

Related

How to get https value from response headers in jmeter

I have to get the sid value from response header and use this value in next request. but my regular expression is not picking up the value
Response:
HTTP/1.1 301 Moved Permanently
Cache-Control: no-cache
Pragma: no-cache
Expires: -1
Location: https://abc.be.com/tsg/de.aspx?sid=s44mNTM3MkRCRUMtRkEfrgtfTEwRDMyQUZFJDI1MDYxMTE5m12s
Server: Microsoft-IIS/8.5
X-AspNet-Version: 4.0.30319
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000; includeSubDomains
Date: Mon, 07 Jun 2021 10:16:04 GMT
Content-Length: 0
enter image description here
enter image description here
I have tried my option and found solution my using the following regex:
Location: .+/de.aspx?sid=(.*?)\n
Just sid=(.*) should do the trick for you:
Where:
() - grouping
. - matches any character
* - zero or more occurrences
So it will start from sid= and capture everything till the line break
More information:
JMeter Regular Expressions
Using RegEx (Regular Expression Extractor) with JMeter
Perl 5 Regex Cheat sheet

Firefox web push "Invalid URL endpoint"

I try to send webpush to firefox
curl -i -X PUT https://updates.push.services.mozilla.com/push/gAAAAABW5EzHyop8VZSH2jm9LJ7W8ybH3ISlbZHDGnd4RwW7h2Jb0IGTuSsP2BCoBxl0kJp-kXXL164xNzhxkTEztP1-IqVf9040VOEuy_htb1nnp-24W-RGgWgjtGK1kZYAb1k3xmAS
HTTP/1.1 400 Bad Request
Access-Control-Allow-Headers: content-encoding,encryption,crypto-key,ttl,encryption-key,content-type,authorization
Access-Control-Allow-Methods: POST,PUT
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: location,www-authenticate
Content-Type: application/json
Date: Tue, 15 Mar 2016 13:04:44 GMT
Server: cyclone/1.1
Content-Length: 51
Connection: keep-alive
{"errno": 102, "code": 400, "error": "Bad Request"}
Does it mean that I have invalid registration id stored in my database and I should remove it?
The endpoint URL doesn't seem valid, it's usually something like https://updates.push.services.mozilla.com/push/v1/SOME_LONG_ID (note the v1 that your URL doesn't contain).
Indeed, this works:
curl -i -X PUT https://updates.push.services.mozilla.com/push/v1/gAAAAABW5EzHyop8VZSH2jm9LJ7W8ybH3ISlbZHDGnd4RwW7h2Jb0IGTuSsP2BCoBxl0kJp-kXXL164xNzhxkTEztP1-IqVf9040VOEuy_htb1nnp-24W-RGgWgjtGK1kZYAb1k3xmAS
Note that you might want to add the TTL header, otherwise your request might fail (you just need -H "TTL: 60"): https://blog.mozilla.org/services/2016/02/20/webpushs-new-requirement-ttl-header/.

Ruby http, net/http, httpclient: can't parse www.victoriassecret.com

I am using httpclient gem, it works fine on Windows, just moved to AWS EC2, tried it on https://victoriassecret.com and it gets this response:
= Response
HTTP/1.1 920 Unknown
Content-Type: text/html
Date: Wed, 21 Oct 2015 21:42:51 GMT
Connection: Keep-Alive
Content-Length: 23
<h1>File not found</h1>#<HTTP::Message:0x000000023f5168
#http_body=
#<HTTP::Message::Body:0x000000023f50a0
#body="<h1>File not found</h1>",
#chunk_size=nil,
#positions=nil,
#size=0>,
#http_header=
#<HTTP::Message::Headers:0x000000023f5140
#body_charset=nil,
#body_date=nil,
#body_encoding=#<Encoding:ASCII-8BIT>,
#body_size=0,
#body_type=nil,
#chunked=false,
#dumped=false,
#header_item=
[["Content-Type", "text/html"],
["Date", "Wed, 21 Oct 2015 21:42:51 GMT"],
["Connection", "Keep-Alive"],
["Content-Length", "23"]],
#http_version="1.1",
#is_request=false,
#reason_phrase="Unknown",
#request_absolute_uri=nil,
#request_method="GET",
#request_query=nil,
#request_uri=
#<URI::HTTPS:0x000000023f58c0 URL:https://www.victoriassecret.com/pink/new-and-now>,
#status_code=920>,
#peer_cert=
#<OpenSSL::X509::Certificate: subject=#<OpenSSL::X509::Name:0x000000024ebe00>, issuer=#<OpenSSL::X509::Name:0x000000024ebec8>, serial=#<OpenSSL::BN:0x000000024de110>, not_before=2015-05-27 00:00:00 UTC, not_after=2017-05-26 23:59:59 UTC>,
#previous=nil>
It does not work only with this website, httpclient get https://google.com for example works fine. But on Windows I get normal response from httpclient get https://www.victoriassecret.com. Butt when using standard NET/HTTP library I get the same 920 response on Windows.
This isn't ec2 related. It's most likely related to the User Agent header sent by the various http library implementations.
For example, they clearly don't like 'wget':
curl -A "Wget/1.13.4 (linux-gnu)" -v https://www.victoriassecret.com
* Rebuilt URL to: https://www.victoriassecret.com/
* Trying 98.158.54.100...
* Connected to www.victoriassecret.com (98.158.54.100) port 443 (#0)
* TLS 1.2 # truncated
> GET / HTTP/1.1
> Host: www.victoriassecret.com
> User-Agent: Wget/1.13.4 (linux-gnu)
> Accept: */*
>
< HTTP/1.1 910 Unknown
< Content-Type: text/html
< Date: Thu, 22 Oct 2015 01:16:31 GMT
< Connection: Keep-Alive
< Content-Length: 23
<
* Connection #0 to host www.victoriassecret.com left intact
<h1>File not found</h1>%

Django-rest-framework token auth doesn't work

I'm trying to POST json data to url, decorated with login_required, but django returns redirect to login page
DRF setup:
'DEFAULT_AUTHENTICATION_CLASSES': (
'rest_framework.authentication.SessionAuthentication',
'rest_framework.authentication.TokenAuthentication',
),
and rest_framework.authtoken in INSTALLED_APPS
I can obtain auth token via curl
$ curl -X POST -d "{\"username\" : 7, \"password\" : 1}" -H "Content-Type: application/json" http://127.0.0.1:9000/extapi/get-auth-token/
{"token":"bc61497d98bed02bd3a84af2235365d0b2b549ff"}
But when i POST to the view, decorated with login_required, django returns http 302 with Location header pointing to the login page.
$ curl -v -X POST -d '{"event":"14","user":"7","action":"1868","unit":"","value":"-1"}' -H "Content-Type: application/json" -H "Authorization: Token bc61497d98bed02bd3a84af2235365d0b2b549ff" http://127.0.0.1:9000/zk2015/events/actions/api/uservotejournal/7/
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 9000 (#0)
> POST /zk2015/events/actions/api/uservotejournal/7/ HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 127.0.0.1:9000
> Accept: */*
> Content-Type: application/json
> Authorization: Token bc61497d98bed02bd3a84af2235365d0b2b549ff
> Content-Length: 64
>
* upload completely sent off: 64 out of 64 bytes
< HTTP/1.1 302 FOUND
* Server nginx/1.4.6 (Ubuntu) is not blacklisted
< Server: nginx/1.4.6 (Ubuntu)
< Date: Fri, 18 Sep 2015 11:14:31 GMT
< Content-Type: text/html; charset=utf-8
< Location: http://127.0.0.1:9000/accounts/login/?next=/zk2015/events/actions/api/uservotejournal/7/
< Transfer-Encoding: chunked
< Connection: keep-alive
< Vary: Cookie
< X-Frame-Options: SAMEORIGIN
< ETag: "d41d8cd98f00b204e9800998ecf8427e"
< Set-Cookie: csrftoken=G85fWrKKsIA5a2uGPIn9fS4pqKrS51jK; expires=Fri, 16-Sep-2016 11:14:31 GMT; Max-Age=31449600; Path=/
<
* Connection #0 to host 127.0.0.1 left intact
I've tried to set breakpoints in rest_framework.authentication.SessionAuthentication and rest_framework.authentication.TokenAuthentication, but they were never fired
What is wrong in my setup? Help, please.
You are not passing the Authorization in Header in the curl
curl -X POST -d "{\"username\" : 7, \"password\" : 1}" -H "Authorization: Token bc61497d98bed02bd3a84af2235365d0b2b549ff" http://127.0.0.1:9000/extapi/get-auth-token/
The point is that request.user is AnonymousUser in drf.APIView.dispatch(), but is defined as authorized user in drf.APIView.post() and other similar methods.
This differs from django: request.user is defined as authorized user in django.views.View.dispatch()
Also that is the cause, why django.contrib.auth.decorators.login_required is not compatible whith drf views.

Gitlab public repo clone fails with "401 Unauthorized"

I'm hoping someone can help me diagnose this issue. I'm running Gitlab 5.2 on a default Ubuntu 12.04 install with the latest ruby and git. It's mostly vanilla with the exception of some LDAP mapping modifications (username, display name).
I'm running into an error with Gitlab that I'm having trouble diagnosing. Whenever I attempt to clone a 'public' repo, instead of the expected (and working on CentOS with the same LDAP mapping modifications):
Started GET "/dd/lol.git/info/refs?service=git-upload-pack" for 127.0.0.1 at 2013-06-17 10:21:55 -0400
Started POST "/dd/lol.git/git-upload-pack" for 127.0.0.1 at 2013-06-17 10:21:55 -0400
I get (on Ubuntu):
Started GET "/dd/lol.git/info/refs?service=git-upload-pack" for 127.0.0.1 at 2013-06-17 10:26:13 -0400
Started GET "/dd/lol.git/HEAD" for 127.0.0.1 at 2013-06-17 10:26:13 -0400
Started GET "/dd/lol.git/HEAD" for 127.0.0.1 at 2013-06-17 10:26:15 -0400
Started GET "/dd/lol.git/HEAD" for 127.0.0.1 at 2013-06-17 10:26:15 -0400
Started GET "/dd/lol.git/objects/8c/4e72acdc72843492f55d5918f53dd12e5f1e43" for 127.0.0.1 at 2013-06-17 10:26:15 -0400
Started GET "/dd/lol.git/objects/info/packs" for 127.0.0.1 at 2013-06-17 10:26:15 -0400
On the client side I get consistent "401 Unauthorized" messages, then I'm prompted for a password. It doesn't seem to be related to Apache or Nginx proxying.
Client-side log:
git clone http://127.0.0.1:9292/dd/lol.git
Cloning into 'lol'...
* Couldn't find host 127.0.0.1 in the .netrc file; using defaults
* About to connect() to 127.0.0.1 port 9292 (#0)
* Trying 127.0.0.1...
* Adding handle: conn: 0x7fc610803000
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fc610803000) send_pipe: 1, recv_pipe: 0
* Connected to 127.0.0.1 (127.0.0.1) port 9292 (#0)
> GET /dd/lol.git/info/refs?service=git-upload-pack HTTP/1.1
User-Agent: git/1.7.12.4 (Apple Git-37)
Host: 127.0.0.1:9292
Accept: */*
Accept-Encoding: gzip
Pragma: no-cache
< HTTP/1.1 200 OK
< Content-Type: text/plain; charset=utf-8
< Last-Modified: Mon, 17 Jun 2013 14:33:31 GMT
< Expires: Fri, 01 Jan 1980 00:00:00 GMT
< Pragma: no-cache
< Cache-Control: no-cache, max-age=0, must-revalidate
< X-UA-Compatible: IE=Edge,chrome=1
< X-Request-Id: 0a9ec65cffb7888fb6fbc136171fa80a
< X-Runtime: 0.079635
< Date: Mon, 17 Jun 2013 14:33:31 GMT
< X-Content-Digest: 198141e92e2cf9bb83d1aa1022fdea885993f02e
< Age: 0
< X-Rack-Cache: stale, invalid, store
< Content-Length: 59
<
* Connection #0 to host 127.0.0.1 left intact
* Couldn't find host 127.0.0.1 in the .netrc file; using defaults
* Found bundle for host 127.0.0.1: 0x7fc6104155f0
* Re-using existing connection! (#0) with host 127.0.0.1
* Connected to 127.0.0.1 (127.0.0.1) port 9292 (#0)
* Adding handle: conn: 0x7fc610803000
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fc610803000) send_pipe: 1, recv_pipe: 0
> GET /dd/lol.git/HEAD HTTP/1.1
User-Agent: git/1.7.12.4 (Apple Git-37)
Host: 127.0.0.1:9292
Accept: */*
Accept-Encoding: gzip
Pragma: no-cache
* The requested URL returned error: 401 Unauthorized
* Closing connection 0
Any suggestions at all are very welcome, I'm not familiar with Gitlab and I'm currently a bit stumped.
Dmitry
Cloning with LDAP activated seems to be a recurring problem, especially over https:
issue 4288
issue 3890
issue 4129
A workaround is proposed here, and is related to file lib/gitlab/backend/grack_auth.rb, but a final fix is still in progress.
Update: from 5.3+ and 6.x, this should have been fixed.

Resources