State object once and retrieve multiple attributes - ruby

Here is a snippet from a Sinatra app where users will be submitting URLs. I must ensure that http:// is prepended to the URL in order to route outside my application. How can I state site once and access its attributes (line 3 of the snippet below)?
p.params= "www.ruby-lang.org/en/"
site = URI(p.params[:url])
site = "http://" + site.host + site.path + site.query

If you need to ensure the URL begins with http://, why not use a regex?
p.params = "www.ruby-lang.org/en/"
site = p.params.gsub(/^(?!http:\/\/)/, "http://")
# site = http://www.ruby-lang.org/en/
^(?!http:\/\/) matches only when the beginning of the string is not followed by http://.
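A quick sanity check (the sample strings are illustrative):
require 'uri'
["www.ruby-lang.org/en/", "http://www.ruby-lang.org/en/"].each do |raw|
  url = raw.gsub(/^(?!http:\/\/)/, "http://")
  # The negative lookahead leaves already-prefixed URLs untouched
  puts url           # => "http://www.ruby-lang.org/en/" in both cases
  puts URI(url).host # => "www.ruby-lang.org"
end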

Related

Domain name in AsciiDoc

Is there anything in AsciiDoc with which I can fetch the domain of the site? For example: if I am viewing the docs on the example.com website, then the doc should display HOST: example.com. If I am viewing a local instance, then the doc should display HOST: localhost:8443.
You can use JavaScript to extract the host name, like
var host = window.location.host
And then you could set this value the JavaScript way, like
var hostDiv = document.getElementById('hostDiv');
hostDiv.innerHTML = host;
I use docinfo-footer for such snippets. See https://asciidoctor.org/docs/user-manual/#docinfo-file

`open_http': 403 Forbidden (OpenURI::HTTPError) for the string "Steve_Jobs" but not for any other string

I was going through the Ruby tutorials provided at http://ruby.bastardsbook.com/ and I encountered the following code:
require "open-uri"
remote_base_url = "http://en.wikipedia.org/wiki"
r1 = "Steve_Wozniak"
r2 = "Steve_Jobs"
f1 = "my_copy_of-" + r1 + ".html"
f2 = "my_copy_of-" + r2 + ".html"
# read the first url
remote_full_url = remote_base_url + "/" + r1
rpage = open(remote_full_url).read
# write the first file to disk
file = open(f1, "w")
file.write(rpage)
file.close
# read the second url
remote_full_url = remote_base_url + "/" + r2
rpage = open(remote_full_url).read
# write the second file to disk
file = open(f2, "w")
file.write(rpage)
file.close
# open a new file:
compiled_file = open("apple-guys.html", "w")
# reopen the first and second files again
k1 = open(f1, "r")
k2 = open(f2, "r")
compiled_file.write(k1.read)
compiled_file.write(k2.read)
k1.close
k2.close
compiled_file.close
The code fails with the following trace:
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:277:in `open_http': 403 Forbidden (OpenURI::HTTPError)
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:162:in `catch'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:518:in `open'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:30:in `open'
from /Users/arkidmitra/tweetfetch/samecode.rb:11
My problem is not that the code fails but that whenever I change r2 to anything other than Steve_Jobs, it works. What is happening here?
Your code runs fine for me (Ruby MRI 1.9.3) when I request a wiki page that exists.
When I request a wiki page that does NOT exist, I get a MediaWiki 404 error code.
Steve_Jobs => success
Steve_Austin => success
Steve_Rogers => success
Steve_Foo => error
Wikipedia does a ton of caching, so if you see responses for "Steve_Jobs" that differ from those for other people who do exist, the best guess is that Wikipedia is caching the Steve Jobs article because he's famous, and potentially adding extra checks/verifications to protect the article from rapid changes, defacings, etc.
The solution for you: always open the URL with a User-Agent string.
rpage = open(remote_full_url, "User-Agent" => "Whatever you want here").read
Details from the MediaWiki docs: "When you make HTTP requests to the MediaWiki web service API, be sure to specify a User-Agent header that properly identifies your client. Don't use the default User-Agent provided by your client library, but make up a custom header that includes the name and the version number of your client: something like "MyCuteBot/0.1".
On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. See our User-Agent policy."
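For instance, a minimal sketch (the User-Agent value is illustrative, and the rescue clause only shows where the 403 would surface):
require 'open-uri'
url = "http://en.wikipedia.org/wiki/Steve_Jobs"
begin
  # Any descriptive client string satisfies the policy quoted above
  rpage = open(url, "User-Agent" => "MyCuteBot/0.1").read
  puts rpage.length
rescue OpenURI::HTTPError => e
  # Without the header, Wikimedia wikis answer with "403 Forbidden"
  puts "Request failed: #{e.message}"
end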
I think this happens for locked-down entries like "Steve Jobs", "Al Gore", etc. This is specified in the same book that you are referring to:
For some pages – such as Al Gore's locked-down entry – Wikipedia will
not respond to a web request if a User-Agent isn't specified. The
"User-Agent" typically refers to your browser, and you can see this by
inspecting the headers you send for any page request in your browser.
By providing a "User-Agent" key-value pair, (I basically use "Ruby"
and it seems to work), we can pass it as a hash (I use the constant
HEADERS_HASH in the example) as the second argument of the method
call.
It is specified later at http://ruby.bastardsbook.com/chapters/web-crawling/
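In the book's style, that would look something like this (the constant name and the "Ruby" value follow its convention; neither is required by open-uri):
require 'open-uri'
HEADERS_HASH = { "User-Agent" => "Ruby" }
rpage = open("http://en.wikipedia.org/wiki/Al_Gore", HEADERS_HASH).read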

htaccess internal and external request distinction

I have a problem with an .htaccess file. I've tried googling but could not find anything helpful.
I have an AJAX request loading pages into index.php. The link triggering it has "#" prepended via jQuery, so if you click on the link domain.com/foo/bar (a WordPress permalink) you get domain.com/#/foo/bar in the browser and the content gets loaded via AJAX.
My problem is: since these are blog posts, external links grab the real link (domain.com/foo/bar), so I want them to be redirected to domain.com/#/foo/bar (because then the AJAX code checks the hash and does its magic).
The jquery code for the prepend is:
$allLinks.each(function() {
  $(this).attr('href', '#' + this.pathname);
  ...
and then the script checks
if (hash) { // we know what we want; the URL is not the home page!
  hash = hash.substring(1);
  URL = 'http://' + top.location.host + hash;
  var $link = $('a[href="' + URL + '"]'), // find the link of the URL
  ...
Now I am trying to get the redirect to work with htaccess. I need to check if the request is external or internal
RewriteCond %{REMOTE_HOST} !^127\.0\.0\.1 #???
and if the URI starts with "/#/", which is a problem since "#" starts a comment there; escaping it as \%23 does not really work somehow.
RewriteCond %{REQUEST_URI} !^/\%23/(.*)$ #???
How do I get this to work to simply redirect an external request from domain.com/foo/bar to domain.com/#/foo/bar without affecting the internal AJAX stuff?
I suppose your $allLinks variable is assigned in a fashion similar to this:
$allLinks = $('a');
Do this instead:
$allLinks = $('a[href^="' + document.location.protocol + '//' + document.location.hostname + '"]');
This will transform internal links to your hash-y style only.
OK, I've done it with PHP. Here is the code:
$path = $_SERVER["REQUEST_URI"];
if (isset($_SERVER['HTTP_X_REQUESTED_WITH']) && strtolower($_SERVER['HTTP_X_REQUESTED_WITH']) == 'xmlhttprequest') {
    echo "It's ajax";
} else {
    if (strpos($path, '/#/') === false) {
        header("Location: http://schnellebuntebilder.de/#".$path); // ONLY WORKS IF THERE IS NO BODY TAG
        exit; // stop executing once the redirect header has been sent
    }
}
There surely is a better solution, but this does the trick for now, and since the page /foo/bar does, in my case, not include header.php, there is no <body> tag and the PHP header() function works. If anyone knows the .htaccess script for this, I am keen to know and learn.

How to retrieve the web site URL that provides the email service from an email string?

I am using Ruby on Rails v3.0.9 and I am trying to find the best way to retrieve the "last part" of an email string and the related web site URL (that is, the web site that provides the email service).
For example, if I have
sample_email_title@gmail.com
I would like to retrieve
gmail.com
and "transform" that so to have the following:
http://www.gmail.com
How can I accomplish that?
You could do something like this:
a = "my_email#gmail.com"
b = a.split("#").last
=> "gmail.com"
"http://www." + b
=> "http://www.gmail.com"
You could do it all in one line with:
"http://www." + "my_email#gmail.com".split('#').last
There may be better ways, but this is fairly simple.
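If the mail gem happens to be available, its address parser is a slightly more robust alternative (this assumes the gem is installed; String#split is fine for well-formed addresses):
require 'mail' # the mail gem, assumed to be installed
address = Mail::Address.new("my_email@gmail.com")
"http://www." + address.domain
# => "http://www.gmail.com"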
The mail exchange server will often be on a different domain than the email address, so you will have to look up the MX records using the DNS server to get that information:
require 'resolv'
def mx_host_of_domain(domain)
  mx = nil
  Resolv::DNS.open do |dns|
    servers = dns.getresources(domain, Resolv::DNS::Resource::IN::MX)
    if servers && !servers.empty?
      mx = servers.sort_by(&:preference).first.exchange.to_s
    end
  end
  mx
end
email = 'stackoverflow' + '@' + 'larshaugseth.com'
mxhost = mx_host_of_domain email.split('@').last
# => in1.smtp.messagingengine.com
url = "http://www.#{mxhost.split('.').last(2).join('.')}/"
# => http://www.messagingengine.com/
Note that there is no guarantee that a web server is located at this address. In my case the real web address of the email service is https://www.fastmail.fm/, but luckily the one generated by the above method will forward you there.

Making a URL in a string usable by Ruby's Net::HTTP

Ruby's Net::HTTP needs to be given a full URL in order for it to connect to the server and get the file properly. By "full URL" I mean a URL including the http:// part and the trailing slash if it needs it. For instance, Net::HTTP won't connect to a URL looking like this: example.com, but will connect just fine to http://example.com/. Is there any way to make sure a URL is a full URL, and add the required parts if it isn't?
EDIT: Here is the code I am using:
parsed_url = URI.parse(url)
req = Net::HTTP::Get.new(parsed_url.path)
res = Net::HTTP.start(parsed_url.host, parsed_url.port) { |http|
  http.request(req)
}
If this is only doing what the sample code shows, Open-URI would be an easier approach.
require 'open-uri'
res = open(url).read
This would do a simple check for http/https:
if !(url =~ /^https?:/i)
  url = "http://" + url
end
This could be a more general one to handle multiple protocols (ftp, etc.):
if !(url =~ /^\w+:/i)
  url = "http://" + url
end
In order to make sure parsed_url.path gives you a proper value (it should be / when no specific path was provided), you could do something like this:
req = Net::HTTP::Get.new(parsed_url.path.empty? ? '/' : parsed_url.path)
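Putting the pieces together, a minimal sketch (the normalize_url helper is a made-up name for illustration):
require 'net/http'
require 'uri'

# Hypothetical helper combining the checks above
def normalize_url(url)
  url =~ /^\w+:/i ? url : "http://" + url
end

parsed_url = URI.parse(normalize_url("example.com"))
path = parsed_url.path.empty? ? '/' : parsed_url.path
req = Net::HTTP::Get.new(path)
res = Net::HTTP.start(parsed_url.host, parsed_url.port) { |http|
  http.request(req)
}
puts res.code  # => "200" if the server answers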
