State object once and retrieve multiple attributes - ruby

Here is a snippet from a Sinatra app where users submit URLs. I must ensure that http:// is prepended to the URL so that it routes outside my application. How can I state site once and access its attributes? (Line 3)
p.params= ""
site = URI(p.params[:url])
site = "http://" + + site.path + site.query

If you need to ensure the url begins with http://, why not use a regex?
p.params = ""
site = p.params.gsub(/^(?!http:\/\/)/, "http://")
# site =
^(?!http:\/\/) matches only when the beginning of the string is not followed by http://
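To state the site once and read its attributes, one option is to normalize the string first and then parse it a single time. This is only a minimal sketch: the helper name with_http_scheme and the example.com URL are assumptions for illustration, not part of the original snippet.

```ruby
require "uri"

# Hypothetical helper: prepend http:// when no scheme is present,
# then parse once so every attribute can be read from one object.
def with_http_scheme(raw)
  raw = raw.strip
  raw = "http://#{raw}" unless raw.match?(%r{\A[a-z][a-z0-9+.-]*://}i)
  URI.parse(raw)
end

site = with_http_scheme("example.com/path?q=1")
site.host   # "example.com"
site.path   # "/path"
site.query  # "q=1"
site.to_s   # "http://example.com/path?q=1"
```

Because the URI object rebuilds the full string with to_s, there is no need to concatenate host, path and query by hand.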


Domain name in ASCII doc

Is there anything in AsciiDoc that lets me fetch the domain of the site? For example, if I am viewing the docs on the website, the doc should display HOST: If I am viewing them on a local instance, the doc should display HOST: localhost:8443
You can use JavaScript to extract the Host-Name, like
var host =
And then you could set this value the JavaScript way, like
var hostDiv = document.getElementById('hostDiv');
hostDiv.innerHTML = host;
I use docinfo-footer for such snippets. See

`open_http': 403 Forbidden (OpenURI::HTTPError) for the string "Steve_Jobs" but not for any other string

I was going through the Ruby tutorials provided at and I encountered the following code:
require "open-uri"
remote_base_url = ""
r1 = "Steve_Wozniak"
r2 = "Steve_Jobs"
f1 = "my_copy_of-" + r1 + ".html"
f2 = "my_copy_of-" + r2 + ".html"
# read the first url
remote_full_url = remote_base_url + "/" + r1
rpage = open(remote_full_url).read
# write the first file to disk
file = open(f1, "w")
# read the second url
remote_full_url = remote_base_url + "/" + r2
rpage = open(remote_full_url).read
# write the second file to disk
file = open(f2, "w")
# open a new file:
compiled_file = open("apple-guys.html", "w")
# reopen the first and second files again
k1 = open(f1, "r")
k2 = open(f2, "r")
The code fails with the following trace:
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:277:in `open_http': 403 Forbidden (OpenURI::HTTPError)
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:162:in `catch'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:518:in `open'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:30:in `open'
from /Users/arkidmitra/tweetfetch/samecode.rb:11
My problem is not that the code fails but that whenever I change r2 to anything other than Steve_Jobs, it works. What is happening here?
Your code runs fine for me (Ruby MRI 1.9.3) when I request a wiki page that exists.
When I request a wiki page that does NOT exist, I get a mediawiki 404 error code.
Steve_Jobs => success
Steve_Austin => success
Steve_Rogers => success
Steve_Foo => error
Wikipedia does a lot of caching, so if you see responses for "Steve_Jobs" that differ from those for other people who do exist, my best guess is that Wikipedia caches the Steve Jobs article because he's famous, and potentially adds extra checks to protect the article from rapid changes, defacement, etc.
The solution for you: always open the url with a User Agent string.
rpage = open(remote_full_url, "User-Agent" => "Whatever you want here").read
Details from the Mediawiki docs: "When you make HTTP requests to the MediaWiki web service API, be sure to specify a User-Agent header that properly identifies your client. Don't use the default User-Agent provided by your client library, but make up a custom header that includes the name and the version number of your client: something like "MyCuteBot/0.1".
On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. See our User-Agent policy."
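On Ruby 3.0 and later, Kernel#open no longer fetches URLs, so the same idea is usually written with URI.open. The following is a rough sketch rather than a definitive implementation: the fetch_page helper and the REQUEST_HEADERS constant are hypothetical names, and the client identifier simply reuses the example from the quoted policy.

```ruby
require "open-uri"

# Made-up client identifier; replace it with one that describes your script.
REQUEST_HEADERS = { "User-Agent" => "MyCuteBot/0.1" }

# Hypothetical helper: returns the page body as a String, or nil when the
# server answers with an HTTP error such as 403 Forbidden or 404 Not Found.
def fetch_page(url, headers = REQUEST_HEADERS)
  URI.open(url, headers).read
rescue OpenURI::HTTPError => e
  warn "#{url}: #{e.message}"
  nil
end
```

Rescuing OpenURI::HTTPError keeps one failed page from aborting the whole script, which makes it easier to see which request was rejected.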
I think this happens for locked down entries like "Steve Jobs", "Al-Gore" etc. This is specified in the same book that you are referring to:
For some pages – such as Al Gore's locked-down entry – Wikipedia will
not respond to a web request if a User-Agent isn't specified. The
"User-Agent" typically refers to your browser, and you can see this by
inspecting the headers you send for any page request in your browser.
By providing a "User-Agent" key-value pair, (I basically use "Ruby"
and it seems to work), we can pass it as a hash (I use the constant
HEADERS_HASH in the example) as the second argument of the method
It is specified later at

htaccess internal and external request distinction

I have a problem with an .htaccess file. I've tried googling but could not find anything helpful.
I have an AJAX request loading pages into the index.php. The link triggering it is getting prepended by "#" via jquery. So if you click on the link (a wordpress permalink) you get in the browser and the content will get loaded via AJAX.
My problem is: Since these are blog posts, external links grab the real link (, so I want them to get redirected to (cause then ajax checks the hash and does its magic).
Example here.
The jquery code for the prepend is:
$allLinks.each(function() {
    $(this).attr('href', '#' + this.pathname);
});
and then the script checks
if (hash) { //we know what we want, the url is not the home page!
hash = hash.substring(1);
URL = 'http://' + + hash;
var $link = $('a[href="' + URL + '"]'), // find the link of the url
Now I am trying to get the redirect to work with htaccess. I need to check if the request is external or internal
RewriteCond %{REMOTE_HOST} !^127\.0\.0\.1 #???
and whether the URI starts with "/#/", which is a problem because # starts a comment in .htaccess, and \%23 does not seem to work either.
RewriteCond %{REQUEST_URI} !^/\%23/(.*)$ #???
How do I get this to work to simply redirect an external request from to without affecting the internal AJAX stuff?
I suppose your $allLinks variable is assigned in a fashion similar to this:
$allLinks = $('a');
Do this instead:
$allLinks = $('a[href^="' + document.location.protocol + '//' + document.location.hostname + '"]');
This will transform internal links to your hash-y style only.
OK, I've done it with PHP. Here is the code:
$path = $_SERVER["REQUEST_URI"];
if (isset($_SERVER['HTTP_X_REQUESTED_WITH']) && strtolower($_SERVER['HTTP_X_REQUESTED_WITH']) == 'xmlhttprequest') {
    echo "It's ajax";
} else {
    if (strpos($path, '/#/') === false) {
        header("Location:".$path); // only works if there is no body tag
    }
}
There is surely a better solution, but this does the trick for now. Since the page /foo/bar does not, in my case, include header.php, there is no <body> tag and the PHP header() function works. If anyone knows the htaccess rules for this, I am keen to learn.

How to retrieve the web site URL that provides the email service from an email string?

I am using Ruby on Rails v3.0.9 and I am looking for the best way to retrieve the "last part" of an email string and the related web site URL (that is, the web site that provides the email service).
For example, if I have
I would like to retrieve
and "transform" that so to have the following:
How can I accomplish that?
You could do something like this:
a = ""
b = a.split("@").last
=> ""
"http://www." + b
=> ""
You could do it all in one line with:
"http://www." + "".split('@').last
There may be better ways, but this is fairly simple.
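The split approach can be wrapped in small helpers that also cope with strings that are not email addresses. This is a sketch under assumptions: email_domain and guessed_site_url are hypothetical names, the address is a placeholder, and it relies on email addresses using "@" as the separator.

```ruby
# Hypothetical helper: returns the part after the last "@", lowercased,
# or nil when the string has no "@" or one side of it is empty.
def email_domain(email)
  local, separator, domain = email.strip.rpartition("@")
  return nil if separator.empty? || local.empty? || domain.empty?
  domain.downcase
end

# Hypothetical helper: guesses a web site URL from the email domain.
def guessed_site_url(email)
  domain = email_domain(email)
  domain && "http://www.#{domain}"
end

guessed_site_url("user@Example.com")  # "http://www.example.com"
guessed_site_url("not-an-email")      # nil
```

Using rpartition instead of split keeps the local part intact if it happens to contain an "@" inside quotes.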
The mail exchange server will often be on a different domain than the email address, so you will have to lookup the MX records using the DNS server to get that information:
require 'resolv'
def mx_host_of_domain(domain)
  mx = nil
  Resolv::DNS.open do |dns|
    servers = dns.getresources(domain, Resolv::DNS::Resource::IN::MX)
    if servers && !servers.empty?
      # the lowest preference value is the preferred mail exchanger
      mx = servers.sort_by(&:preference).first.exchange.to_s
    end
  end
  mx
end
email = 'stackoverflow' + '@' + ''
mxhost = mx_host_of_domain email.split('@').last
# =>
url = "http://www.#{mxhost.split('.').last(2).join('.')}/"
# =>
Note that there is no guarantee for a web server to be located at this address. In my case the real web address to the email service is, but luckily the one generated by using the above method will forward you there.
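The step that turns an MX host name into a URL can be isolated so it is easy to check without a DNS lookup. A minimal sketch, assuming a hypothetical helper name site_url_from_mx_host and made-up host names; like the line above, it keeps only the last two labels, so a host under a two-part suffix such as mx.example.co.uk would be reduced to co.uk.

```ruby
# Hypothetical helper: keeps the last two labels of the MX host name and
# builds a www URL from them. Returns nil for names with fewer than two labels.
def site_url_from_mx_host(mx_host)
  labels = mx_host.chomp(".").split(".")
  return nil if labels.size < 2
  "http://www.#{labels.last(2).join('.')}/"
end

site_url_from_mx_host("mx1.mail.example.com")  # "http://www.example.com/"
```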

Making a URL in a string usable by Ruby's Net::HTTP

Ruby's Net::HTTP needs to be given a full URL in order for it to connect to the server and get the file properly. By "full URL" I mean a URL including the http:// part and the trailing slash if it needs it. For instance, Net::HTTP won't connect to a URL looking like this:, but will connect just fine to Is there any way to make sure a URL is a full URL, and add the required parts if it isn't?
EDIT: Here is the code I am using:
parsed_url = URI.parse(url)
req = Net::HTTP::Get.new(parsed_url.path)
res = Net::HTTP.start(parsed_url.host, parsed_url.port) {|http|
  http.request(req)
}
If this is only doing what the sample code shows, Open-URI would be an easier approach.
require 'open-uri'
res = open(url).read
This would do a simple check for http/https:
if !(url =~ /^https?:/i)
  url = "http://" + url
end
This could be a more general one to handle multiple protocols (ftp, etc.):
if !(url =~ /^\w+:/i)
  url = "http://" + url
end
In order to make sure parsed_url.path gives you a proper value (it should be / when no specific path was provided), you could do something like this:
req = Net::HTTP::Get.new(parsed_url.path.empty? ? '/' : parsed_url.path)
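Both fixes, the missing scheme and the empty path, can be combined before the request is built. The following is a sketch under assumptions rather than the only way to do it: normalize_url and build_get are hypothetical helper names and example.com is a placeholder host.

```ruby
require "uri"
require "net/http"

# Hypothetical helper: adds http:// when no scheme is present and
# makes sure the path is at least "/".
def normalize_url(url)
  url = "http://#{url}" unless url.match?(%r{\A[a-z][a-z0-9+.-]*://}i)
  uri = URI.parse(url)
  uri.path = "/" if uri.path.empty?
  uri
end

# Hypothetical helper: builds a GET request for the path plus query string.
def build_get(uri)
  Net::HTTP::Get.new(uri.request_uri)
end

uri = normalize_url("example.com")
uri.to_s             # "http://example.com/"
build_get(uri).path  # "/"

# The request could then be sent with, for example:
#   Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
#     http.request(build_get(uri))
#   end
```

Using request_uri instead of path keeps any query string, which a bare parsed_url.path would drop.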
