parsing from webpage yields 405 Not Allowed - ruby

I've been looking around for solutions before asking this questions but unfortunately none of them yielded good results.
I get a OpenURI::HTTPError: 405 Not Allowed when accessing this specific url:
require 'open-uri'
doc = Nokogiri::HTML(open("http://streeteasy.com"))
#=> OpenURI::HTTPError: 405 Not Allowed
from /Users/cyrusghazanfar/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/open-uri.rb:358:in `open_http'
also tried:
$ curl -I http://streeteasy.com
which returned:
HTTP/1.1 405 Not Allowed
Date: Fri, 22 Sep 2017 20:03:59 GMT
Content-Type: text/html
Connection: keep-alive
Server: nginx
X-DZ: 24.193.31.96
Vary: Accept-Encoding
X-DZ: 127.0.0.1
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Cache-Control: private, no-cache, no-store, must-revalidate
Edge-Control: no-store, bypass-cache
Surrogate-Control: no-store, bypass-cache

the problem is that the server needs an User-Agent header to work, so in curl it would be like:
curl --header "User-Agent: Mozilla/5.0" http://streeteasy.com

Related

{"message":"no Route matched with those values"}

I am new to Kong so bear with me:)
I am hosting my APIs on a windows server as http://supermarket.xxxx.com:5000
added service as follows on an Ubuntu box (http://supermarket.xxxx.com is added to hosts file)
curl -i -X POST
--url http://localhost:8001/services/
--data 'name=SupermarketService'
--data 'url=http://supermarket.xxx.com:5000'
HTTP/1.1 201 Created
Date: Thu, 17 Dec 2020 07:11:50 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Access-Control-Allow-Origin: *
Server: kong/2.1.3
Content-Length: 379
X-Kong-Admin-Latency: 204
2 Added the routes
curl -i -X POST
--url http://localhost:8001/services/SupermarketService/routes
--data 'hosts[]=supermarket.xxx.com'
--data 'paths[]=/api/categories'
--data 'strip_path=false'
--data 'methods[]=GET'
HTTP/1.1 201 Created
Date: Thu, 17 Dec 2020 09:01:17 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Access-Control-Allow-Origin: *
Server: kong/2.1.3
Content-Length: 463
X-Kong-Admin-Latency: 11
testing the setting on the Ubuntu box
curl -i -X GET
--url http://localhost:8000/api/categories
--header 'Host: supermarket.xxx.com'
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Server: Microsoft-IIS/8.5
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-UA-Compatible: IE=Edge,chrome=1
X-Xss-Protection: 1; mode=block
Content-Security-Policy: default-src https: data: 'unsafe-inline' 'unsafe-eval' 'self' ; connect-src 'self' data: *;
Date: Thu, 17 Dec 2020 23:10:21 GMT
X-Kong-Upstream-Latency: 1586
X-Kong-Proxy-Latency: 2
Via: kong/2.1.3
[{"id":100,"name":"Fruits and Vegetables"},{"id":101,"name":"Dairy"}]
when I try to access the same API in another box using a web browser
http://192.168.44.67:8000/api/categories //where 192.168.44.67 is the IP address of my Ubuntu box
iI get this
{"message":"no Route matched with those values"}
Please let me know what is wrong.
when accessing the API from the other box using a webbrowser you didn't set the Host header. Since when using a webbrowser the browser will automatically set that header based on the url you type in the browser url bar.
remove the following element from the curl request to create the route:
--data 'hosts[]=supermarket.xxx.com'
Then it will only match on the path segment you provided and it should work

How Can I Detect something from HTTP Response

I'am Developing bash script can detect web application firewall from header tags but i can find example like my idea ?
To find the request headers from bash you can simply use curl. If you're on windows you'll want the new windows bash shell or cygwin to run it.
There are dozens more tricks you can play with curl to get whatever you want in whatever format you want, lots of SO questions on it to answer any questions you come up with.
curl --head www.google.com
HTTP/1.1 200 OK
Date: Thu, 06 Apr 2017 02:07:00 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See https://www.google.com/support/accounts/answer/151657?hl=en for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Set-Cookie: NID=100=IoNzfnVsz_oaEwIQE182ysgVSHoZYRVKjTqSQ5GqKrz1ewxwav2ae5GPo_bx0apr39Pnn4yvM5RfsmQnJ_QFmllVwS34ts-bNrvkzDFIfaokkDTo1BXHDDI69duBn1f9kx4sXJ_rcCK28og6; expires=Fri, 06-Oct-2017 02:07:00 GMT; path=/; domain=.google.com; HttpOnly
Transfer-Encoding: chunked
Accept-Ranges: none
Vary: Accept-Encoding
Here's an example of getting response headers using curl:
curl -D - www.google.com
HTTP/1.1 200 OK
Date: Thu, 06 Apr 2017 02:11:26 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See https://www.google.com/support/accounts/answer/151657?hl=en for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Set-Cookie: NID=100=DrUalBDiHKiZkX0yETtowdWhEfjJy7ioPU0Fe7Wch9pbbYI8MeSbg8M42dHmwu-hKZmYUlnE7VIgLhJ_Zi6byG_PYpTu5s2KYUv9XjPeH-GfSOTSq22I2GnEqXZwhJv-Bdn0aYzCUugF9FHb3Q; expires=Fri, 06-Oct-2017 02:11:26 GMT; path=/; domain=.google.com; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop=<cut the rest of the HTTP request>

Http-request not through proxy cache (tor + privoxy + squid)

I create one proxy cache using (tor + privoxy + squid) follow
How to create an advanced proxy server using Tor, Privoxy, and Squid
I try call to Proxy cache
curl -x 'my-ip:3400' -I 'https://github.global.ssl.fastly.net/images/modules/dashboard/bootcamp/octocat_setup.png'
curl -x 'my-ip:3400' -I http://hadoop.apache.org/images/hadoop-logo.jpg
https request is ok (through Pivoxy)
HTTP/1.1 200 Connection established
Proxy-Agent: Privoxy/3.0.21
HTTP/1.1 200 OK
Date: Sat, 08 Mar 2014 04:38:37 GMT
Server: GitHub.com
Content-Type: image/png
Last-Modified: Fri, 16 Aug 2013 09:42:18 GMT
Expires: Sun, 08 Mar 2015 04:38:37 GMT
Cache-Control: max-age=31536000, public
Timing-Allow-Origin: https://github.com
Content-Length: 10722
Accept-Ranges: bytes
Via: 1.1 varnish
Age: 0
X-Served-By: cache-am72-AMS
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1394253517.449177980,VS0,VE97
Vary: Accept-Encoding
but http requets not working
HTTP/1.0 200 OK
Date: Sat, 08 Mar 2014 04:39:19 GMT
Server: Apache/2.4.8-dev (Unix) mod_wsgi/3.4 Python/2.7.5 OpenSSL/1.0.1e
Last-Modified: Tue, 12 Feb 2008 17:38:57 GMT
ETag: "24e3-445f987fe69d8"
Accept-Ranges: bytes
Content-Length: 9443
Content-Type: image/jpeg
X-Cache: MISS from SVR226L-696.localdomain
X-Cache-Lookup: MISS from SVR226L-696.localdomain:3128
Via: 1.0 SVR226L-696.localdomain (squid)
Connection: keep-alive
Please suggest a specific solution

Display HTTP headers using Open::URI?

with Open::URI, I can do the following:
require 'open-uri'
#check status
open('http://google.com').status
#get entire html
open('http://google.com').read
Is it possible to get the HTTP headers of a request so things can be debugged, something like Curls' curl -I http://google.com?
$ curl -I google.com
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Mon, 17 Dec 2012 14:28:17 GMT
Expires: Wed, 16 Jan 2013 14:28:17 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Is this possible?
Use the meta method of the virtual filehandle:
open('http://google.com'){|f| pp f.meta }
{"x-frame-options"=>"SAMEORIGIN",
"expires"=>"-1",
"p3p"=>
"CP=\"This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info.\"",
"content-type"=>"text/html; charset=ISO-8859-1",
"date"=>"Mon, 17 Dec 2012 14:37:29 GMT",
"server"=>"gws",
"x-xss-protection"=>"1; mode=block",
"set-cookie"=>
"PREF=ID=d2fb8a93d369bcd2:FF=0:TM=1355755049:LM=1355755049:S=ONVSP6n2jtluFgll; expires=Wed, 17-Dec-2014 14:37:29 GMT; path=/; domain=.google.com, NID=67=OFEvvHCOa3C6wScQCUIKfu_89oL9MSmnFjwN-u5LX_foP8NLsX7G9dq48NLVrf4WUXhqOA1jb38s0e9qeRp1Iwx_LT_N8IuF0Qi6dXVtR2zdvA86INqtfg5uNrKvxJfJ; expires=Tue, 18-Jun-2013 14:37:29 GMT; path=/; domain=.google.com; HttpOnly",
"cache-control"=>"private, max-age=0",
"transfer-encoding"=>"chunked"}
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/open-uri/rdoc/OpenURI/Meta.html

"301 Moved", Google APIs with Opera

In my web app I'm making a CURL call to Google's (unofficial) weather API at:
http://www.google.com/ig/api?weather=,,, ...
This works fine in all cases except when the page involved is accessed in Opera. When accessed in Opera, instead of the weather I get the following in the xml response:
301 Moved</H1>
The document has moved
here.
How can I fix this? I've seen some suggestions around the web that clearing the cache is a solution, but that hasn't worked for me. Note I'm particularly concerned with Opera Mini.
Thanks a lot.
Update 2012-06-20: Tested with Opera 12 and Google has fixed the sniffing issue it seems.
I do not think you can fix it. It would be interesting to know why Google does server-side user agent sniffing and redirects Opera to another URI. Could you paste the full weather URI, so we can test it at Opera ourselves? If do a get of the one you have given into Opera I get
<xml_api_reply version="1">
<weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0">
<problem_cause data=""/>
</weather>
</xml_api_reply>
I also do not get a redirection, but I guess it is because the URI is not the one you have used.
% curl -sI "http://www.google.com/ig/api?weather=,,,"
HTTP/1.1 200 OK
Content-Type: text/xml; charset=ISO-8859-1
Date: Fri, 02 Mar 2012 12:04:44 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate
Set-Cookie: PREF=ID=fe13590e95ceb98e:TM=1330689884:LM=1330689884:S=h1eocRzhNcZ_Kwoa; expires=Sun, 02-Mar-2014 12:04:44 GMT; path=/; domain=.google.com
X-Content-Type-Options: nosniff
Server: igfe
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked
More details will help, and I will complete here.
Update 2012-03-12 First I tried with a simple curl.
→ curl -sI "http://www.google.com/ig/api?weather=,,,40735500,-73986500"
HTTP/1.1 200 OK
Content-Type: text/xml; charset=ISO-8859-1
Date: Mon, 12 Mar 2012 13:16:43 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate
Set-Cookie: PREF=ID=9bc71bbf2edb7ebb:TM=1331558203:LM=1331558203:S=K5Ew69E5IsYhA0s8; expires=Wed, 12-Mar-2014 13:16:43 GMT; path=/; domain=.google.com
X-Content-Type-Options: nosniff
Server: igfe
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked
Then with Opera User agent.
→ curl -sI -A "Opera/9.80 (Macintosh; Intel Mac OS X 10.7.3; U; fr) Presto/2.10.229 Version/11.61" "http://www.google.com/ig/api?weather=,,,40735500,-73986500"
HTTP/1.1 200 OK
Content-Type: text/xml; charset=UTF-8
Date: Mon, 12 Mar 2012 13:17:47 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate
Set-Cookie: PREF=ID=54cc62619394059e:TM=1331558267:LM=1331558267:S=JRCO-WNJMUNMMHsO; expires=Wed, 12-Mar-2014 13:17:47 GMT; path=/; domain=.google.com
X-Content-Type-Options: nosniff
Server: igfe
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked
And finally with Firefox User Agent
→ curl -sI -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0) Gecko/20100101 Firefox/9.0" "http://www.google.com/ig/api?weather=,,,40735500,-73986500"
HTTP/1.1 200 OK
Content-Type: text/xml; charset=UTF-8
Date: Mon, 12 Mar 2012 13:20:09 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate
Set-Cookie: PREF=ID=ab709995945767a8:TM=1331558409:LM=1331558409:S=bom-8pa-x9gGY5Sb; expires=Wed, 12-Mar-2014 13:20:09 GMT; path=/; domain=.google.com
X-Content-Type-Options: nosniff
Server: igfe
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked
There is a no X-Content-Type-Options: nosniff in all cases and no redirection. What is the user agent you are using. You can type about:opera in the addressbar and the user agent string will appear.

Resources