Studio 3 error message re W3C validation - user-agent

Just a quick question from a newbie. Has anyone else using Aptana Studio 3 been getting any error messages when using W3C validation?
Status: 403 Forbidden Vary: Referer Content-type: text/html
Markup Validation Service
Sorry! This document can not be checked.
No User-Agent header found!
A quick Google search suggests that users of other editors/IDEs, e.g. HTML-Kit, are seeing the same thing. It looks as if the W3C validation service now expects a User-Agent string, which a browser supplies automatically but an editor/IDE presumably doesn't.
I know that there are ways around the issue by using a different validation service or checking the code via a browser. Just thought I would flag it up.

I've submitted a fix to the Aptana Studio project for this.
The fix adds an HTTP User-Agent header to the POST that is sent to the W3C.
w3c_validation.rb
Replace the text in the w3c_validation.rb file with the text below.
example path:
C:\Users\user\AppData\Local\Aptana Studio 3\configuration\org.eclipse.osgi\bundles\101\1.cp\bundles\html.ruble\commands\w3c_validation.rb
require 'ruble'

command t(:validate_syntax) do |cmd|
  cmd.key_binding = 'CONTROL+M2+V'
  cmd.scope = 'text.html'
  cmd.output = :show_as_html
  cmd.input = :document
  cmd.invoke do |context|
    $KCODE = 'U'
    page = $stdin.read
    page.gsub!(/<\?(php|=).*?\?>|<%.*?%>/m, '')
    w3c_url = 'http://validator.w3.org/check'
    require 'net/http'
    require 'uri'
    # Fix for W3C blocking HTTP requests without a User-Agent:
    # the POST sent to the W3C is now built explicitly so it includes one.
    uri = URI(w3c_url)
    req = Net::HTTP::Post.new(uri.path)
    req.set_form_data({'ss' => "1", 'fragment' => page})
    req['User-Agent'] = 'Aptana'
    response = Net::HTTP.start(uri.host, uri.port) do |http|
      http.request(req)
    end
    status = response['x-w3c-validator-status']
    content = response.body
    content = content.gsub(/<\/title>/, '\&<base href="http://validator.w3.org/">')
    # content.gsub!(/Line (\d+),? Column (\d+)/i) do
    #   # FIXME These links won't work for us!
    #   "<a href='txmt://open?line=#\$1&column=#{\$2.to_i + 1}'>#\$&</a>"
    # end
    content
  end
end

Your IDE's embedded browser isn't sending the headers the W3C validator expects
Additionally, you can't use the online validator to check pages that haven't been published to the web; you'd need to set up a small test server to use it.
First, use a different browser. Every browser sends a User-Agent string, and websites shouldn't depend on anything more than the presence of that string, except perhaps to blacklist known-bad clients.
You really should test your code in a standards-compliant browser such as Chrome or Firefox, using its developer tools, rather than from your IDE.
If you must, you can instead download the client-side version of the validator tools from the W3C website to check your code. It is a command-line tool available for all platforms, and there are extensions for Firefox, Chrome, and derivatives that use the library version of these tools. (Again, you can't use your IDE browser.)

Related

Copied and pasted Ruby code from HubSpot API but I get an HTTPUnsupportedMediaType 415

I am simply trying to do an HTTP PUT request from a Ruby script, copying HubSpot's example verbatim. It works when HubSpot runs it, but not for me.
Here is essentially the full code from the HubSpot API docs (with my API key redacted):
# https://rubygems.org/gems/hubspot-api-client
require 'uri'
require 'net/http'
require 'openssl'
url = URI("https://api.hubapi.com/crm/v3/objects/deals/4104381XXXX/associations/company/530997XXXX/deal_to_company?hapikey=XXXX")
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Put.new(url)
request["accept"] = 'application/json'
response = http.request(request)
puts response.read_body
When initiated by hubspot, the response is an HTTP 201, but in my Ruby script it's giving me the following error:
=> #<Net::HTTPUnsupportedMediaType 415 Unsupported Media Type readbody=true>
I have tried copying and pasting the exact same thing with no luck; what I'm running is identical to the code above except for the redacted API key, deal ID, and company ID. Pasting HubSpot's example directly into my Rails console gives the same unsupported-media-type error.
I have also tried adding a body to the request, such as request.body = "hello" and nothing.
Any suggestion would be greatly appreciated.
After comparing a working cURL request with the Ruby script in BurpSuite, I determined that the following HTTP header in the request was the culprit:
Content-Type: application/x-www-form-urlencoded
Ruby's Net::HTTP adds this Content-Type by default when you don't specify one, even though the original code never sets it, and that default is what the HubSpot API rejects.
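Building on that finding, a minimal sketch of the fix is to set the Content-Type explicitly before sending, so Net::HTTP never falls back to its form-urlencoded default. The URL below is a hypothetical stand-in for the redacted HubSpot endpoint, and the actual send is left commented out since it needs a real key:

```ruby
require 'net/http'
require 'uri'

# Hypothetical endpoint standing in for the redacted HubSpot URL.
url = URI("https://api.example.com/crm/v3/objects/deals/1/associations/company/2/deal_to_company")

request = Net::HTTP::Put.new(url)
request["accept"] = 'application/json'
# Without an explicit Content-Type, Net::HTTP falls back to
# application/x-www-form-urlencoded when the request is sent.
request["content-type"] = 'application/json'

# Sending is unchanged from the question's code (needs a real key):
# http = Net::HTTP.new(url.host, url.port)
# http.use_ssl = true
# puts http.request(request).code
```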

Why does requests library fail on this URL?

I have a url. When I try to access it programmatically, the backend server fails (I don't run the server):
import requests
r = requests.get('http://www.courts.wa.gov/index.cfm?fa=controller.managefiles&filePath=Opinions&fileName=875146.pdf')
r.status_code # 200
print(r.content)
When I look at the content, it's an error page, though the status code is 200. If you click the link, it'll work in your browser -- you'll get a PDF -- which is what I expect in r.content. So it works in my browser, but fails in Requests.
To diagnose, I'm trying to eliminate differences between my browser and Requests library. So far I've:
Disabled Javascript
Disabled (and deleted) cookies
Set the User-Agent to be the same in each
But I can't get the thing to work properly in Requests or fail in my browser due to disabling something. Can somebody with a better idea of browser-magic help me diagnose and solve this?
Does the request work in Chrome? If so, you can open the web inspector and right-click the request to copy it as a curl command. Then you'll have access to all the headers, params, and request body, which you can play around with to see which are triggering the failure you're seeing with the requests library.
You're probably running into a server that discriminates based on User-Agent. This works:
import requests
S = requests.Session()
S.headers.update({'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'})
r = S.get('http://www.courts.wa.gov/index.cfm?fa=controller.managefiles&filePath=Opinions&fileName=875146.pdf')
with open('dl.pdf', 'wb') as f:
    f.write(r.content)

Google Places API server key doesn't work using Curl

I have obtained a valid API key for the Google Places API. I need to use it on the backend, so I generated a server-side key. However, the call works neither with curl nor from the Rails console.
It DOES, however, work through the browser. That said, I have triple-checked that I am using the server-side key I generated, and I'm only using the sample URL from the Google Places documentation, so all params should be correct. Here is my curl:
curl -v https://maps.googleapis.com/maps/api/place/search/xml?location=-33.8670522,151.1957362&radius=500&types=food&name=harbour&sensor=false&key=my_key
Also, in (Ruby) Rails console:
Net::HTTP.get_response(URI.parse("https://maps.googleapis.com/maps/api/place/search/xml?location=-33.8670522,151.1957362&radius=500&types=food&name=harbour&sensor=false&key=my_key"))
Any ideas? It seems like multiple people have had issues, but there is nothing specific out there for server keys not working.
Thanks!
With curl, be sure to put quotes around the URL:
curl -v "https://maps.googleapis.com/maps/api/place/search/xml?location=-33.8670522,151.1957362&radius=500&types=food&name=harbour&sensor=false&key=my_key"
Otherwise the shell will truncate the URL at the first ampersand (an unquoted & backgrounds the command), so most of the parameters are never sent, which causes a REQUEST_DENIED response.
For HTTPS with Ruby, the following should work (ref http://www.rubyinside.com/nethttp-cheat-sheet-2940.html):
require "net/https"
require "uri"
uri = URI.parse("https://maps.googleapis.com/maps/api/place/search/xml?location=-33.8670522,151.1957362&radius=500&types=food&name=harbour&sensor=false&key=...")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
print response.body
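One caveat about the cheat-sheet snippet above: VERIFY_NONE disables certificate checking entirely, which silences SSL errors but leaves the connection open to man-in-the-middle attacks. A safer variant, sketched here against the Google Maps host, keeps verification on with VERIFY_PEER:

```ruby
require 'net/http'
require 'openssl'

# Same HTTPS setup as above, but with certificate verification left on.
# VERIFY_PEER makes Ruby check the server's certificate chain instead of
# accepting anything, as VERIFY_NONE does.
http = Net::HTTP.new("maps.googleapis.com", 443)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
```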

How can I print information about a Net::HTTP request for debug purposes?

I'm new to Ruby, coming from Java. I'm trying to make an HTTP GET request and I'm getting a response code of 400. The service I'm calling is very particular, and I'm pretty sure my request isn't exactly right. It'd be helpful to "look inside" the req object after I do the HEAD request (below) to double-check that the request headers being sent are what I think I'm sending. Is there a way to print out the req object?
req = Net::HTTP.new(url.host, url.port)
req.use_ssl = true
res = req.head(pathWithScope, request_headers)
code = res.code.to_i
puts "Response code: #{code}"
I tried this: puts "Request Debug: #{req.inspect}" but it only prints this: #<Net::HTTP www.blah.com:443 open=false>
Use set_debug_output.
http = Net::HTTP.new(url.host, url.port)
http.set_debug_output($stdout) # Logger.new("foo.log") works too
That and more in http://github.com/augustl/net-http-cheat-sheet :)
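As a sketch of how this fits together: set_debug_output accepts any IO-like object, so you can capture the wire-level dump in a StringIO instead of spamming stdout. The host below is a hypothetical stand-in, and no request is actually sent here:

```ruby
require 'net/http'
require 'stringio'

log = StringIO.new                            # capture instead of printing
http = Net::HTTP.new("www.example.com", 443)  # hypothetical host
http.use_ssl = true
http.set_debug_output(log)                    # Logger.new("foo.log") works too

# After a call such as http.head("/"), `log.string` holds the raw wire
# traffic: outgoing lines prefixed with "<-", incoming with "->".
```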
If you want to see & debug exactly what your app is sending, not just see its log output, I've just released an open-source tool for exactly this: http://httptoolkit.tech/view/ruby/
It supports almost all Ruby HTTP libraries so it'll work perfectly for this case, but also many other tools & languages too (Python, Node, Chrome, Firefox, etc).
As noted in the other answer, you can configure Net::HTTP to print its logs to work out what it's doing, but that only shows you what it's trying to do. It won't help if you use any other HTTP libraries or tools (or modules that do), and it requires changing your actual application code (and remembering to change it back).
With HTTP Toolkit you can just click a button to open a terminal, run your Ruby code from there as normal, and every HTTP request sent gets collected automatically.

How to request for gzipped pages from web servers through ruby scripts?

I have a Ruby script that saves web pages from various sites. How do I make it check whether the server can send gzipped files, and save them if available?
any help would be great!
You can send custom headers as a hash:
custom_request = Net::HTTP::Get.new(url.path, {"Accept-Encoding" => "gzip"})
You can then check the response by defining a response object:
response = Net::HTTP.new(url.host, url.port).start do |http|
  http.request(custom_request)
end
p response['Content-Encoding']
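Having requested gzip, you also need to decompress the body before saving or parsing it, and only when the server actually honored the request. A minimal sketch using the standard Zlib library (the helper name decode_body is my own, not from the answer above):

```ruby
require 'zlib'
require 'stringio'

# Decompress the body only when the server actually gzipped it.
def decode_body(body, content_encoding)
  if content_encoding == 'gzip'
    Zlib::GzipReader.new(StringIO.new(body)).read
  else
    body
  end
end

# With the response object from the snippet above:
#   page = decode_body(response.body, response['Content-Encoding'])
```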
Thanks to those who responded...
You need to send the following header with your request:
Accept-Encoding: gzip,deflate
However, I am still learning Ruby and don't know the header syntax for the net/http library (which I assume you are using to make the request).
Edit:
Actually, according to the Ruby docs it appears that this header is part of the default set sent if you don't specify other Accept-Encoding headers.
Then again, as I said in my original answer, I'm still reading up on the subject, so I could be wrong.
For grabbing web pages and doing stuff with them, ScrubyIt is terrific.
