Ruby http, net/http, httpclient: can't parse www.victoriassecret.com - ruby

I am using the httpclient gem. It works fine on Windows, but after moving to AWS EC2 I tried it on https://victoriassecret.com and got this response:
= Response
HTTP/1.1 920 Unknown
Content-Type: text/html
Date: Wed, 21 Oct 2015 21:42:51 GMT
Connection: Keep-Alive
Content-Length: 23
<h1>File not found</h1>
#<HTTP::Message:0x000000023f5168
@http_body=
#<HTTP::Message::Body:0x000000023f50a0
@body="<h1>File not found</h1>",
@chunk_size=nil,
@positions=nil,
@size=0>,
@http_header=
#<HTTP::Message::Headers:0x000000023f5140
@body_charset=nil,
@body_date=nil,
@body_encoding=#<Encoding:ASCII-8BIT>,
@body_size=0,
@body_type=nil,
@chunked=false,
@dumped=false,
@header_item=
[["Content-Type", "text/html"],
["Date", "Wed, 21 Oct 2015 21:42:51 GMT"],
["Connection", "Keep-Alive"],
["Content-Length", "23"]],
@http_version="1.1",
@is_request=false,
@reason_phrase="Unknown",
@request_absolute_uri=nil,
@request_method="GET",
@request_query=nil,
@request_uri=
#<URI::HTTPS:0x000000023f58c0 URL:https://www.victoriassecret.com/pink/new-and-now>,
@status_code=920>,
@peer_cert=
#<OpenSSL::X509::Certificate: subject=#<OpenSSL::X509::Name:0x000000024ebe00>, issuer=#<OpenSSL::X509::Name:0x000000024ebec8>, serial=#<OpenSSL::BN:0x000000024de110>, not_before=2015-05-27 00:00:00 UTC, not_after=2017-05-26 23:59:59 UTC>,
@previous=nil>
Only this website fails; an httpclient GET of https://google.com, for example, works fine. On Windows, httpclient gets a normal response from https://www.victoriassecret.com, but the standard Net::HTTP library gets the same 920 response there as well.

This isn't EC2 related. It's most likely related to the User-Agent header sent by the various HTTP library implementations.
For example, they clearly don't like 'wget':
curl -A "Wget/1.13.4 (linux-gnu)" -v https://www.victoriassecret.com
* Rebuilt URL to: https://www.victoriassecret.com/
* Trying 98.158.54.100...
* Connected to www.victoriassecret.com (98.158.54.100) port 443 (#0)
* TLS 1.2 # truncated
> GET / HTTP/1.1
> Host: www.victoriassecret.com
> User-Agent: Wget/1.13.4 (linux-gnu)
> Accept: */*
>
< HTTP/1.1 910 Unknown
< Content-Type: text/html
< Date: Thu, 22 Oct 2015 01:16:31 GMT
< Connection: Keep-Alive
< Content-Length: 23
<
* Connection #0 to host www.victoriassecret.com left intact
<h1>File not found</h1>%
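If the User-Agent header really is what the server objects to, one thing to try from Ruby is sending an explicit, browser-like value. The following is only a sketch with the standard net/http library; the BROWSER_UA string and the helper names build_get and fetch are made up for illustration, and which User-Agent values the site actually accepts is not known.

```ruby
require 'net/http'
require 'uri'

# Assumed value: any browser-like string; not confirmed to be accepted by the site.
BROWSER_UA = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0 Safari/537.36'

# Build a GET request that carries an explicit User-Agent header.
# Returns the parsed URI together with the request object.
def build_get(url, user_agent = BROWSER_UA)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['User-Agent'] = user_agent
  [uri, request]
end

# Send the request (over TLS for https URLs) and return the response object.
def fetch(url, user_agent = BROWSER_UA)
  uri, request = build_get(url, user_agent)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

# Usage (performs a real network call, so it is left commented out):
#   response = fetch('https://www.victoriassecret.com/pink/new-and-now')
#   puts response.code
```

Comparing the status code with and without the custom header should show quickly whether the User-Agent is the deciding factor.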

Related

Using PDF Reactor as Web Service

I am discovering PDF reactor and I'd like to use it as a web service. To test a file, I use cURL
curl -v -X POST --header "Content-Type:application/xml" http://localhost:9423/service/rest/convert/async -d @test.html
Is that correct ?
test.html :
<html>
<body>
Coucou, je suis terrien.
</body>
</html>
Thank you for your help,
Cédrik
edit #1:
response from the command above:
* About to connect() to localhost port 9423 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9423 (#0)
> POST /service/rest/convert/async HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.3.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: localhost:9423
> Accept: */*
> Content-Type:application/xml
> Content-Length: 50
>
< HTTP/1.1 400 Bad Request
< Content-Type: text/plain
< Date: Tue, 15 Dec 2015 11:47:29 GMT
< Content-Length: 307
< Server: Jetty(9.3.2.v20150730)
<
* Connection #0 to host localhost left intact
* Closing connection #0
JAXBException occurred : élément inattendu (URI : "", local : "html"). Les éléments attendus sont <{http://webservice.pdfreactor.realobjects.com/}configuration>. élément inattendu (URI : "", local : "html"). Les éléments attendus sont <{http://webservice.pdfreactor.realobjects.com/}configuration>.
When using the REST API of PDFreactor via cURL you have to send a configuration XML or JSON to the server which includes configuration for PDFreactor and your document, as described here: http://www.pdfreactor.com/product/doc_html/index.html#d0e688
A sample configuration for XML could look like this:
config.xml:
<tns:configuration xmlns:tns="http://webservice.pdfreactor.realobjects.com/">
<document><html> <body> Coucou, je suis terrien. </body> </html></document>
</tns:configuration>
You can then call the following:
curl -v -X POST --header "Content-Type:application/xml" http://localhost:9423/service/rest/convert/async.xml -d @config.xml
The output will look like the following:
* About to connect() to localhost port 9423
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9423
> POST /service/rest/convert/async.xml HTTP/1.1
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: localhost:9423
> Accept: */*
> Content-Type:application/xml
> Content-Length: 195
>
> <tns:configuration xmlns:tns="http://webservice.pdfreactor.realobjects.com/"> <document><html><body>Coucou, je suis terrien.</body></html></document></tns:configuration>
< HTTP/1.1 202 Accepted
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Headers: Accept, Content-Length, content-type, Host, User-Agent
< Access-Control-Allow-Methods: GET, PUT, POST, DELETE
< Access-Control-Expose-Headers: Location
< Cache-Control: no-cache
< Date: Wed, 16 Dec 2015 16:34:19 GMT
< Location: http://localhost:9423/service/rest/progress/c2a58dbd-ef9d-4b79-87d9-079c139fe9ed
< Content-Length: 0
< Server: Jetty(9.3.2.v20150730)
* Connection #0 to host localhost left intact
* Closing connection #0
The "Location" response header contains the URL which can be used to retrieve the progress of the conversion, so you can retrieve the progress with (the ID will of course vary):
curl -v http://localhost:9423/service/rest/progress/c2a58dbd-ef9d-4b79-87d9-079c139fe9ed
This will return the conversion progress. Once the conversion has finished, the "Location" response header will contain a new URL for retrieving the document. Append ".pdf" to get the PDF binary data, or ".xml" to get XML data containing the PDF as a base64-encoded string, the number of pages of the document, etc.
curl -v http://localhost:9423/service/rest/document/c2a58dbd-ef9d-4b79-87d9-079c139fe9ed.pdf
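The same submit, poll, download sequence can be scripted. Below is a rough Ruby sketch using only net/http. It assumes the service runs at the URL used in the curl commands above, and that the progress response carries a "Location" header only once the conversion has finished (the description above implies this but does not spell it out). The helper names, the polling interval and the poll limit are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Assumed base URL, taken from the curl commands above.
BASE = 'http://localhost:9423/service/rest'

# Append the desired format to the document URL:
# "pdf" for binary data, "xml" for base64 data plus metadata.
def document_url(location, format = 'pdf')
  "#{location}.#{format}"
end

# Submit the configuration XML and return the progress URL
# found in the "Location" response header.
def submit(config_xml)
  response = Net::HTTP.post(URI("#{BASE}/convert/async.xml"), config_xml,
                            'Content-Type' => 'application/xml')
  response['Location']
end

# Poll the progress URL until a "Location" header shows up, then return it.
# Assumption: the header is absent while the conversion is still running.
def wait_for_document(progress_url, interval = 1, max_polls = 60)
  max_polls.times do
    location = Net::HTTP.get_response(URI(progress_url))['Location']
    return location if location
    sleep interval
  end
  raise 'conversion did not finish in time'
end

# Usage (needs a running PDFreactor service, so it is left commented out):
#   progress = submit(File.read('config.xml'))
#   pdf = Net::HTTP.get(URI(document_url(wait_for_document(progress))))
#   File.binwrite('out.pdf', pdf)
```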

WebHDFS not working on a secure hadoop cluster

I am trying to secure my HDP2 Hadoop cluster using Kerberos.
So far Hdfs, Hive, Hbase, Hue Beeswax and Hue Job/task browsers are working properly ; however Hue's File Browser is not working, it answers :
WebHdfsException at /filebrowser/
AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS] (error 500)
Request Method: GET
Request URL: http://bt1svlmy:8000/filebrowser/
Django Version: 1.2.3
Exception Type: WebHdfsException
Exception Value:
AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS] (error 500)
Exception Location: /usr/lib/hue/desktop/libs/hadoop/src/hadoop/fs/webhdfs.py in _stats, line 208
Python Executable: /usr/bin/python2.6
Python Version: 2.6.6
(...)
My hue.ini file is configured with security_enabled=true everywhere and the other related parameters set.
I believe the problem is with WebHDFS.
I tried the curl commands given at http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#Authentication
curl -i --negotiate -L -u : "http://172.19.115.50:14000/webhdfs/v1/filetoread?op=OPEN"
answers :
HTTP/1.1 403 Forbidden
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth=; Path=/; Expires=Thu, 01-Jan-1970 00:00:00 GMT; HttpOnly
Content-Type: text/html;charset=utf-8
Content-Length: 1027
Date: Wed, 08 Oct 2014 06:55:51 GMT
<html><head><title>Apache Tomcat/6.0.37 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 403 - Anonymous requests are disallowed</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>Anonymous requests are disallowed</u></p><p><b>description</b> <u>Access to the specified resource has been forbidden.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.37</h3></body></html>
And I could reproduce Hue's error message by adding a user with the following curl request:
curl --negotiate -i -L -u: "http://172.19.115.50:14000/webhdfs/v1/filetoread?op=OPEN&user.name=theuser"
it answers :
HTTP/1.1 500 Internal Server Error
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth=u=theuser&p=theuser&t=simple&e=1412735529027&s=rQAfgMdExsQjx6N8cQ10JKWb2kM=; Path=/; Expires=Wed, 08-Oct-2014 02:32:09 GMT; HttpOnly
Content-Type: application/json
Transfer-Encoding: chunked
Date: Tue, 07 Oct 2014 16:32:09 GMT
Connection: close
{"RemoteException":{"message":"SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]","exception":"AccessControlException","javaClassName":"org.apache.hadoop.security.AccessControlException"}}
It seems that there is no Kerberos negotiation between WebHDFS and curl.
I was expecting something like :
HTTP/1.1 401 Unauthorized
Content-Type: text/html; charset=utf-8
WWW-Authenticate: Negotiate
Content-Length: 0
Server: Jetty(6.1.26)
HTTP/1.1 307 TEMPORARY_REDIRECT
Content-Type: application/octet-stream
Expires: Thu, 01-Jan-1970 00:00:00 GMT
Set-Cookie: hadoop.auth="u=exampleuser&p=exampleuser@MYCOMPANY.COM&t=kerberos&e=1375144834763&s=iY52iRvjuuoZ5iYG8G5g12O2Vwo=";Path=/
Location: http://hadoopnamenode.mycompany.com:1006/webhdfs/v1/user/release/docexample/test.txt?op=OPEN&delegation=JAAHcmVsZWFzZQdyZWxlYXNlAIoBQCrfpdGKAUBO7CnRju3TbBSlID_osB658jfGfRpEt8-u9WHymRJXRUJIREZTIGRlbGVnYXRpb24SMTAuMjAuMTAwLjkxOjUwMDcw&offset=0
Content-Length: 0
Server: Jetty(6.1.26)
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 16
Server: Jetty(6.1.26)
A|1|2|3
B|4|5|6
Any idea what could have gone wrong ?
I do have in my hdfs-site.xml on every node :
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/_HOST@MY-REALM.COM</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.keytab</name>
<value>/etc/hadoop/conf/HTTP.keytab</value> <!-- path to the HTTP keytab -->
</property>
It looks like you are accessing HttpFS (default port 14000) rather than WebHDFS (default port 50070). HttpFS is a "plain" webapp that is not secured the same way.
A WebHDFS URL usually looks like http://namenode:50070/webhdfs/v1; try pointing hue.ini at that URL instead (WebHDFS is recommended over HttpFS).

GZIP encoding in Jersey 2 / Grizzly

I can't activate gzip-encoding in my Jersey service. This is what I've tried:
Started out with the jersey-quickstart-grizzly2 archetype from the Getting Started Guide.
Added rc.register(org.glassfish.grizzly.http.GZipContentEncoding.class);
(have also tried rc.register(org.glassfish.jersey.message.GZipEncoder.class);)
Started with mvn exec:java
Tested with curl --compressed -v -o - http://localhost:8080/myapp/myresource
The result is the following:
> GET /myapp/myresource HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 zlib/1.2.3.4 ...
> Host: localhost:8080
> Accept: */*
> Accept-Encoding: deflate, gzip
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Date: Sun, 03 Nov 2013 08:07:10 GMT
< Content-Length: 7
<
* Connection #0 to host localhost left intact
* Closing connection #0
Got it!
That is, despite Accept-Encoding: deflate, gzip in the request, there is no Content-Encoding: gzip in the response.
What am I missing here??
You have to register the org.glassfish.jersey.server.filter.EncodingFilter as well. This example enables deflate and gzip compression:
import org.glassfish.jersey.message.DeflateEncoder;
import org.glassfish.jersey.message.GZipEncoder;
import org.glassfish.jersey.server.ResourceConfig;
import org.glassfish.jersey.server.filter.EncodingFilter;
...
private void enableCompression(ResourceConfig rc) {
    rc.registerClasses(
        EncodingFilter.class,
        GZipEncoder.class,
        DeflateEncoder.class);
}
This solution is Jersey-specific and works not only with Grizzly, but with the JDK HTTP server as well.
Alternatively, compression can be enabled on the Grizzly listener itself:
HttpServer httpServer = GrizzlyHttpServerFactory.createHttpServer(
    BASE_URI, rc, false);
CompressionConfig compressionConfig =
    httpServer.getListener("grizzly").getCompressionConfig();
compressionConfig.setCompressionMode(CompressionConfig.CompressionMode.ON); // the mode
compressionConfig.setCompressionMinSize(1); // the min amount of bytes to compress
compressionConfig.setCompressableMimeTypes("text/plain", "text/html"); // the mime types to compress
httpServer.start();

Gitlab public repo clone fails with "401 Unauthorized"

I'm hoping someone can help me diagnose this issue. I'm running Gitlab 5.2 on a default Ubuntu 12.04 install with the latest ruby and git. It's mostly vanilla with the exception of some LDAP mapping modifications (username, display name).
I'm running into an error with Gitlab that I'm having trouble diagnosing. Whenever I attempt to clone a 'public' repo, instead of the expected (and working on CentOS with the same LDAP mapping modifications):
Started GET "/dd/lol.git/info/refs?service=git-upload-pack" for 127.0.0.1 at 2013-06-17 10:21:55 -0400
Started POST "/dd/lol.git/git-upload-pack" for 127.0.0.1 at 2013-06-17 10:21:55 -0400
I get (on Ubuntu):
Started GET "/dd/lol.git/info/refs?service=git-upload-pack" for 127.0.0.1 at 2013-06-17 10:26:13 -0400
Started GET "/dd/lol.git/HEAD" for 127.0.0.1 at 2013-06-17 10:26:13 -0400
Started GET "/dd/lol.git/HEAD" for 127.0.0.1 at 2013-06-17 10:26:15 -0400
Started GET "/dd/lol.git/HEAD" for 127.0.0.1 at 2013-06-17 10:26:15 -0400
Started GET "/dd/lol.git/objects/8c/4e72acdc72843492f55d5918f53dd12e5f1e43" for 127.0.0.1 at 2013-06-17 10:26:15 -0400
Started GET "/dd/lol.git/objects/info/packs" for 127.0.0.1 at 2013-06-17 10:26:15 -0400
On the client side I get consistent "401 Unauthorized" messages, then I'm prompted for a password. It doesn't seem to be related to Apache or Nginx proxying.
Client-side log:
git clone http://127.0.0.1:9292/dd/lol.git
Cloning into 'lol'...
* Couldn't find host 127.0.0.1 in the .netrc file; using defaults
* About to connect() to 127.0.0.1 port 9292 (#0)
* Trying 127.0.0.1...
* Adding handle: conn: 0x7fc610803000
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fc610803000) send_pipe: 1, recv_pipe: 0
* Connected to 127.0.0.1 (127.0.0.1) port 9292 (#0)
> GET /dd/lol.git/info/refs?service=git-upload-pack HTTP/1.1
User-Agent: git/1.7.12.4 (Apple Git-37)
Host: 127.0.0.1:9292
Accept: */*
Accept-Encoding: gzip
Pragma: no-cache
< HTTP/1.1 200 OK
< Content-Type: text/plain; charset=utf-8
< Last-Modified: Mon, 17 Jun 2013 14:33:31 GMT
< Expires: Fri, 01 Jan 1980 00:00:00 GMT
< Pragma: no-cache
< Cache-Control: no-cache, max-age=0, must-revalidate
< X-UA-Compatible: IE=Edge,chrome=1
< X-Request-Id: 0a9ec65cffb7888fb6fbc136171fa80a
< X-Runtime: 0.079635
< Date: Mon, 17 Jun 2013 14:33:31 GMT
< X-Content-Digest: 198141e92e2cf9bb83d1aa1022fdea885993f02e
< Age: 0
< X-Rack-Cache: stale, invalid, store
< Content-Length: 59
<
* Connection #0 to host 127.0.0.1 left intact
* Couldn't find host 127.0.0.1 in the .netrc file; using defaults
* Found bundle for host 127.0.0.1: 0x7fc6104155f0
* Re-using existing connection! (#0) with host 127.0.0.1
* Connected to 127.0.0.1 (127.0.0.1) port 9292 (#0)
* Adding handle: conn: 0x7fc610803000
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fc610803000) send_pipe: 1, recv_pipe: 0
> GET /dd/lol.git/HEAD HTTP/1.1
User-Agent: git/1.7.12.4 (Apple Git-37)
Host: 127.0.0.1:9292
Accept: */*
Accept-Encoding: gzip
Pragma: no-cache
* The requested URL returned error: 401 Unauthorized
* Closing connection 0
Any suggestions at all are very welcome, I'm not familiar with Gitlab and I'm currently a bit stumped.
Dmitry
Cloning with LDAP activated seems to be a recurring problem, especially over https:
issue 4288
issue 3890
issue 4129
A workaround is proposed here, and is related to file lib/gitlab/backend/grack_auth.rb, but a final fix is still in progress.
Update: from 5.3+ and 6.x, this should have been fixed.

Apache2 is changing my content type for a Ruby cgi script

I have a Ruby CGI script which writes its output like this:
cgi.out("Cache-Control" => "no-cache, must-revalidate",
"type" => "text/html",
"charset" => "UTF-8") {
template.result(binding)
}
Unfortunately, when I view the headers from cURL, I see the following:
< HTTP/1.1 200 OK
< Date: Sun, 23 Aug 2009 09:48:03 GMT
< Server: Apache/2.2.11 (Ubuntu) DAV/2 SVN/1.5.4 PHP/5.2.6-3ubuntu4.1 with Suhosin-Patch mod_ssl/2.2.11 OpenSSL/0.9.8g
< 5541-Content-Type: text/html; charset=UTF-8
< Cache-Control: no-cache, must-revalidate
< Content-Length: 2495
< Cache-Control: max-age=86400
< Expires: Mon, 24 Aug 2009 09:48:03 GMT
< Content-Type: application/x-ruby
It's renaming my Content-Type, and adding a second cache control header. Clearly I have something misconfigured.
Turns out I had a debugging 'print' statement which was executing before the cgi.out() line. This caused a bit of text to prefix the headers.
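One way to keep debugging output from leaking into the header block again is to send it to stderr instead of stdout. A small sketch follows; the debug helper and its "[debug]" prefix are made up for illustration, and where stderr ends up depends on the server setup (with Apache it is typically the error log).

```ruby
# Write diagnostics to stderr (or any other IO passed in), so that stdout
# carries nothing but the CGI headers and body.
def debug(message, io = $stderr)
  io.puts("[debug] #{message}")
end

# Usage inside the CGI script, before cgi.out is called:
#   debug("rendering template")
```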

Resources