Etags in Rails 5 with Phusion/Apache 2.4 - passenger

When using fresh_when strong_etag: @collection in a Rails view, the etag is generated before being gzipped by Apache, thus it lacks the "-gzip" suffix.
However, if Apache is set to gzip responses, the Etag that the browser sends will contain the "-gzip" suffix.
This may or may not be connected to using Phusion Passenger as the proxy server for Rails.
Suggested solutions include having Apache strip the "-gzip" suffix by adding RequestHeader edit "If-None-Match" "^(.*)-gzip$" "$1" to the site's directives. However, this does not appear to work as intended.
Is there another way to rewrite the header to strip the suffix so that Etags can be successfully compared?

This can be accomplished in Rails by adding a before_action to the Application controller (to affect all requests).
class ApplicationController < ActionController::Base
  before_action :fix_etag_header
  ...
  private

  def fix_etag_header
    if request.headers["HTTP_IF_NONE_MATCH"]
      request.headers["HTTP_IF_NONE_MATCH"].sub! "-gzip", ""
    end
  end
end
Obviously, this circumvents the directive that a compressed resource should have a unique Etag from the same resource uncompressed. That could cause issues if those aspects of an HTTP request are altered or addressed by other code.
However, in most instances, this is a "good enough" solution especially when considering that it requires only 6 lines.
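As a rough illustration of the header rewrite above, here is a plain-Ruby sketch of the string handling on its own, outside Rails. The helper name strip_gzip_suffix is hypothetical, and it assumes the suffix sits either just inside the closing quote of an entity tag or directly after it. It only removes "-gzip" where it ends a tag, so a tag that merely contains that text elsewhere is left alone.

```ruby
# Hypothetical helper (not part of Rails): removes a trailing "-gzip" marker
# from every entity tag in an If-None-Match header value.
def strip_gzip_suffix(if_none_match)
  return if_none_match if if_none_match.nil?

  # Only match "-gzip" when it ends a tag: before the closing quote,
  # before a separator, or at the end of the string.
  if_none_match.gsub(/-gzip(?=["\s,]|\z)/, "")
end

puts strip_gzip_suffix('W/"5d41402abc-gzip"')
puts strip_gzip_suffix('"a1-gzip", "b2-gzip"')
```

Inside the before_action, the result could be assigned back to request.headers["HTTP_IF_NONE_MATCH"] instead of mutating the header string in place.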

Related

How to remove '/' from end of Sinatra routes

I'm using Sinatra and the shotgun server.
When I type in http://localhost:9393/tickets, my page loads as expected. But, with an extra "/" on the end, Sinatra suggests that I add
get '/tickets/' do
How do I get the server to accept the extra "/" without creating the extra route?
The information in Sinatra's "How do I make the trailing slash optional?" section looks useful, but this means I would need to add this code to every single route.
Is there an easier or more standard way to do that?
My route is set up as
get '/tickets' do
It looks like the FAQ doesn't mention an option that was added in 2017 (https://github.com/sinatra/sinatra/pull/1273/commits/2445a4994468aabe627f106341af79bfff24451e)
Put this in the same scope where you are defining your routes:
set :strict_paths, false
With this, Sinatra will treat /tickets/ as if it were /tickets so you don't need to add /? to all your paths
This question is actually bigger than it appears at first glance. Following the advice in "How do I make the trailing slash optional?" does solve the problem, but:
it requires you to modify all existing routes, and
it creates a "duplicate content" problem, where identical content is served from multiple URLs.
Both of these issues are solvable but I believe a cleaner solution is to create a redirect for all non-root URLs that end with a /. This can easily be done by adding Sinatra's before filter into the existing application controller:
before '/*/' do
  redirect request.path_info.chomp('/')
end

get '/tickets' do
  …
end
After that, your existing /tickets route will work as it did before, but now all requests to /tickets/ will be redirected to /tickets before being processed as normal.
Thus, the application will respond on both the /tickets and /tickets/ endpoints without you having to change any of the existing routes.
PS: Redirecting the root URL (eg: http://localhost:9393/ → http://localhost:9393) will create an infinite loop, so you definitely don't want to do that.
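The same idea can be sketched without Sinatra at all, as a small Rack-style middleware. This is only a sketch under assumptions: the class name TrailingSlashRedirect is hypothetical, it relies on nothing but the Rack calling convention (an env hash in, a [status, headers, body] array out), and it answers with a 301 where the filter above uses Sinatra's default redirect status. The root path is left alone, in line with the PS above.

```ruby
# Hypothetical middleware: redirects "/tickets/" to "/tickets" and leaves "/" alone.
class TrailingSlashRedirect
  def initialize(app)
    @app = app
  end

  def call(env)
    path = env["PATH_INFO"].to_s

    # Skip the root URL so it is never redirected to an empty path.
    if path.length > 1 && path.end_with?("/")
      target = path.chomp("/")
      query = env["QUERY_STRING"].to_s
      target += "?#{query}" unless query.empty?
      return [301, { "location" => target }, []]
    end

    @app.call(env)
  end
end

# A stand-in downstream app, only for demonstration.
inner = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
app = TrailingSlashRedirect.new(inner)
p app.call("PATH_INFO" => "/tickets/", "QUERY_STRING" => "")
```

In a Sinatra application, a class like this could be registered with use TrailingSlashRedirect, which keeps the redirect logic out of the route definitions.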

Add a prefix to generated links, but not to incoming routes

Our Rails 4 application needs to be accessible over an archaic portal. This portal works by adding (from the perspective of the browser) a prefix to each URL; this prefix is removed by the portal before forwarding the request to my application.
So the browser calls https://portal.company.com/portal/prefix/xyzzy/myapp/mymodel/new; the portal does its thing and requests https://myserver.company.com/myapp/mymodel/new (passing along the stripped prefix in some irrelevant way). The prefix is dynamic and can change between requests.
The problem is that the portal is not able to rewrite the HTML pages served by my application. That is, it does not put in the prefix. It expects applications to either only emit relative URLs, or to add the portal prefix themselves.
So:
A regular URL /myapp/mymodel/new, for example, must stay as is for when the application is accessed directly (for certain users which do not use the portal).
When accessed over the portal, our application must still understand /myapp/mymodel/new as usual, but when using mymodel_new_path or link_to @mymodel or form_for @my_model or whatever other magic URL generators there are, it has to add the portal prefix. So, any URL emitted by the application must look like /portal/prefix/xyzzy/myapp/mymodel/new where the per-request string /portal/prefix/xyzzy is given by some method defined by us (and the part xyzzy can change between requests).
How can I achieve that? My routes.rb looks like this today:
MyApp::Application.routes.draw do
  scope '/myapp' do
    get ...
This probably has to stay as is, because URLs in incoming requests do not change when coming from the portal. But how do I influence the outgoing URLs?
This suggestion will allow you to prefix the URLs produced by the Rails path helpers as you require. Do note, however, that it will also make these extended paths valid requests for your application. They should route where expected, but you'll get some extra values in the params hash that you can ignore, so I suspect this is acceptable.
First, add all the prefix bits as optional parameters to your routes' base scope:
scope '(:portal/)(:prefixA/)(:prefixB)/myapp' do
  # routes
end
Note that those optional params cannot include the / character without it being escaped by the path helpers, so if you have a few levels in the prefix (which it appears you do in the question), you'll need a few different params, all but the last followed by a slash, as above.
With that done, define default_url_options in your ApplicationController. It should return a hash of the values you need in your routes:
def default_url_options(_options={})
  {
    portal: 'portal',
    prefixA: 'whatevertheprefixis',
    prefixB: 'nextbitoftheprefix'
  }
end
And that should do it: path helpers (along with link_to @object etc.) should now include those values every time you use them.
Note that since the portal bit at the start is also an optional parameter, you can simply add additional logic to default_url_options and have it return an empty hash whenever you do not want this prefixing behaviour.
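The conditional logic mentioned above might look roughly like the following plain-Ruby sketch. The helper name portal_url_options is hypothetical, and so is the assumption that the per-request prefix arrives as a single string such as "/portal/prefix/xyzzy". The keys mirror the three optional route params above, so any segment beyond the third is silently dropped in this sketch.

```ruby
# Hypothetical helper: turns a per-request portal prefix into the hash that
# default_url_options would return. An empty or missing prefix yields {}.
def portal_url_options(portal_prefix)
  segments = portal_prefix.to_s.split("/").reject(&:empty?)
  return {} if segments.empty?

  # Pair each route param with its segment; unmatched keys are removed.
  %i[portal prefixA prefixB].zip(segments).to_h.compact
end

p portal_url_options("/portal/prefix/xyzzy")
p portal_url_options(nil)
```

default_url_options could then delegate to a helper like this, passing in whatever the portal supplied for the current request.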

Skip large pages in Mechanize (Ruby)

I'm trying to skip processing a few large pages (some over 10MB) scattered in a result set, as Mechanize (version 2.7.3) crawls an array of links.
Unfortunately I can't find a 'content-length' property or a similar indicator. The Mechanize::FileResponse class has a content_length method but Mechanize::Page does not.
Current approach
At the moment I'm calling content.length on the page. This is very slow when one of the large pages is crawled:
detail_links.each do |detail_link|
  detail_page = detail_link.click
  # skip long pages (next moves on to the following link; break would stop the loop)
  next if detail_page.content.length > 100_000
  # rest of the processing
end
Content_length during response_read:
In the Mechanize source code I found a reference to content_length when the response is read. Is querying the response properties a possible solution?
# agent.rb extract from the Mechanize project
def response_read response, request, uri
  content_length = response.content_length
  if use_tempfile? content_length then
    body_io = make_tempfile 'mechanize-raw'
  else
    body_io = StringIO.new.set_encoding(Encoding::BINARY)
  end
Mechanize will normally "get" the entire page. Instead you should use a head request first to get the page size, then conditionally get the page. See "How can I perform a Head request using mechanize in Ruby" for an example.
The thing to be careful of is that a dynamically generated resource might not have a known size when you do the head request, so you could get a response without the size entry. Notice that, in the selected answer for the question linked above, Google didn't return the content-length header because it's a dynamically generated page. Static pages and resources should have the header, unless the server doesn't return it for some reason.
The Mechanize documentation mentions this:
Problems with content-length
Some sites return an incorrect content-length value. Unlike a browser, mechanize raises an error when the content-length header does not match the response length since it does not know if there was a connection problem or if the mismatch is a server bug.
The error raised, Mechanize::ResponseReadError, can be converted to a parsed Page, File, etc. depending upon the content-type:
agent = Mechanize.new
uri = URI 'http://example/invalid_content_length'
begin
  page = agent.get uri
rescue Mechanize::ResponseReadError => e
  page = e.force_parse
end
In other words, while head can help, it's not necessarily going to give you enough information to allow you to skip huge pages. You have to investigate the site you're crawling and learn how their server responds.
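The decision step of the head-then-get approach can be sketched in plain Ruby, independent of Mechanize. The helper name too_large? and the 100_000-byte default are assumptions (the limit is borrowed from the question's code). A missing or unparseable content-length is treated as "size unknown" and is not skipped, which is one possible policy rather than the only sensible one.

```ruby
# Hypothetical helper: decides from response headers whether a page exceeds
# the size limit. Header names are compared case-insensitively.
def too_large?(headers, max_bytes: 100_000)
  raw = headers.find { |name, _| name.to_s.casecmp?("content-length") }&.last
  return false if raw.nil? # size unknown, so the caller still fetches the page

  Integer(raw.to_s, 10) > max_bytes
rescue ArgumentError
  false # malformed header value, treated the same as unknown
end

puts too_large?({ "content-length" => "10485760" })
puts too_large?({ "Content-Length" => "512" })
puts too_large?({})
```

In the crawl loop, the headers of a head response would be passed in before deciding whether to click the link at all.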

moving from Joomla to Refinery

I plan to move a site from PHP (Joomla + pure PHP) to Ruby on Rails (Refinery CMS). The old site has links with .html and .php extensions. Is there a way to keep the old URLs the same (I don't want to lose Google PageRank)? I tried setting config.use_custom_slugs = true in Refinery, but it drops the '.' from the URL (store.php becomes storephp, FAQ.html becomes faqhtml, etc.).
Any help appreciated! Thanks.
In plain Rails I can do it this way:
# application_controller.rb
def unknown_path
  # This runs for every URL; here I can load the page with the needed slug.
  render :text => "#{request.fullpath}"
end
# routes.rb
match "*a", :to => "application#unknown_path" # at the end of the file
This makes any URL work, so I could use a custom slug if it exists or raise a 404.
But the CMS doesn't allow creating truly custom slugs.
Why not 301
To explain: suppose an external link points to Page 1, and Page 1 links internally to Page 2. If the link to Page 1 passes 1000 units of PageRank, the link to Page 2 passes 900. In the same way, a URL that answers with a 301 receives 1000, while the page the 301 points to receives 900. So 900 is better than the rank disappearing altogether, but I'm trying to avoid creating this situation. That is how it works.
As per my answer on Refinery's issue tracker, where this was also asked, do you have to do this in Rails? What about rewriting the request before it even gets to Rails using proxy software such as nginx with HttpRewriteModule?
I ran into a similar issue when moving from WordPress to Comfortable Mexican Sofa (another Rails CMS engine).
I ended up doing this in my routes.rb. Here is a sample of one of the redirects (I don't have many, 15 in total). It is a generic Rails solution, so I think it can be used with RefineryCMS as well.
get '/2009/01/28/if-programming-languages-were-cars/', to: redirect('/blog/2009/01/if-programming-languages-were-cars-translated-into-russian')
This generates a proper redirect, like so (impossible to see in the browser, but visible with curl):
curl http://gorbikoff.com/2009/01/28/if-programming-languages-were-cars/
<html>
<body>
You are being redirected.
</body>
</html>
And if we check headers - we see a proper 301 response:
curl -I http://gorbikoff.com/2009/01/28/if-programming-languages-were-cars/
returns
HTTP/1.1 301 Moved Permanently
Content-Type: text/html
Content-Length: 158
Connection: keep-alive
Status: 301 Moved Permanently
Location: http://gorbikoff.com/blog/2009/01/if-programming-languages-were-cars-translated-into-russian
X-UA-Compatible: IE=Edge,chrome=1
Cache-Control: no-cache
X-Request-Id: 9ec0d0b29e94dcc26433c3aad034eab1
X-Runtime: 0.002247
Date: Wed, 10 Jul 2013 15:11:22 GMT
X-Rack-Cache: miss
X-Powered-By: Phusion Passenger 4.0.5
Server: nginx/1.4.1 + Phusion Passenger 4.0.5
Specific to your case
So that's the general approach. However, for what you are trying to do (redirect all URLs that end in .php, like yourdomain.com/store.php, to yourdomain.com/store), you should be able to do something like the following. This assumes that you map your new structure exactly; otherwise you may need to create a bunch of custom redirects like the one I mentioned at the beginning, or do some regex voodoo.
NON-Redirect Solution
Don't redirect user, just render new page at the same address (it's a twist on solution you put in your question):
This is the only way that's more or less robust, if you don't want to do a redirect.
# routes.rb
# You don't need to match .html, as those URLs are handled automatically by Rails.
match '/*\.php', to: "application#php_path"
# application_controller.rb
# This doesn't have to be the application controller;
# you may do this in your home_controller if you wish.
def php_path
  require 'open-uri'
  # Non-bang gsub always returns a string; gsub! returns nil when nothing is replaced.
  file = open "http://#{request.host_with_port}#{request.fullpath.gsub(/\.php/, '')}"
  render :inline => file.read
end
Redirect Solution:
According to Google 301 is the preferred way https://support.google.com/webmasters/answer/93633?hl=en .
match '/*\.php', to: redirect { |params, request|
  "http://#{request.host_with_port}#{request.fullpath.gsub(/\.php/, '')}"
}
The non-redirect solution returns a response with status code 200. As far as the world is concerned, you are serving PHP through Passenger (run curl -I against my site using these sample URLs; the params are arbitrary and only illustrate my point):
curl -I http://gorbikoff.com/about?name=nick
curl -I http://gorbikoff.com/about.php?name=nick
You may need to fiddle with this based on your specifics (https vs. http, etc.), and some virtual routes may need to be addressed separately. Also remember that precedence matters in routes.rb. But I think this might work for you.
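The path rewriting that both solutions rely on can be tried in isolation with a small plain-Ruby sketch. The helper name strip_php_extension is hypothetical. Unlike the route code above, this sketch anchors the pattern to the end of the path and leaves the query string untouched, which is a deliberately stricter variant rather than what the routes above do.

```ruby
# Hypothetical helper: removes a trailing ".php" from the path part of a
# request path, keeping any query string as it was.
def strip_php_extension(fullpath)
  path, query = fullpath.to_s.split("?", 2)
  path = path.to_s.sub(/\.php\z/, "")
  query ? "#{path}?#{query}" : path
end

puts strip_php_extension("/about.php?name=nick")
puts strip_php_extension("/store.php")
```

Anchoring with \z means a path such as /phpinfo or a query value containing ".php" is not altered by accident.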
EDIT
I just realized that this solution works out of the box in Comfortable Mexican Sofa: it just ignores the format (.php is treated the same way .html would be). However, I tried my solutions in a non-CMS Rails 3 project I have (it's not public), and they still hold with a slight change, which I have now fixed (sorry for the confusion).
I also suggest checking the official guide (I'm assuming you are on Rails 3.2.13, since RefineryCMS doesn't officially support Rails 4 yet):
http://guides.rubyonrails.org/v3.2.13/routing.html#redirection
And check out Rails API
http://api.rubyonrails.org/v3.2.13/classes/ActionDispatch/Routing/Redirection.html#method-i-redirect
Hope that helps

Sinatra Url '/' interpretations

I am a Ruby newbie and have been trying Sinatra for quite some time now. One thing I am not able to figure out is why a '/' in the URL makes such a big difference.
I mean, aren't
get 'some_url' do
end
and
get 'some_url/' do
end
supposed to point to the same route? Why does Sinatra consider them different routes? I spent a good hour trying to figure that out.
According to RFC 2616 and RFC 2396 (the RFCs defining resource identity), those URLs do not identify the same resource, so Sinatra treats them differently. This is especially important if you imagine the route returning a page with relative links. This link
<a href="bar">click me</a>
would point to /bar if you're coming from /foo, and to /foo/bar if you're coming from /foo/.
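The relative-link behaviour described above can be checked with Ruby's standard URI library. This is a small sketch; the host example.com is a placeholder, and the relative reference "bar" matches the /bar and /foo/bar targets named in the text.

```ruby
require "uri"

# Resolve the same relative reference against both base URLs.
from_foo       = URI.join("http://example.com/foo", "bar").to_s
from_foo_slash = URI.join("http://example.com/foo/", "bar").to_s

puts from_foo        # http://example.com/bar
puts from_foo_slash  # http://example.com/foo/bar
```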
You can use the following syntax to define a route matching both:
get '/foo/?' do
# ...
end
Or the Regexp version mentioned in the comments above.
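The Regexp version referred to is not shown here, so the following is only a guess at what such a pattern could look like, written as plain Ruby so the matching can be tried on its own. The constant and method names are hypothetical.

```ruby
# Hypothetical pattern: matches "/foo" with or without one trailing slash.
OPTIONAL_SLASH_ROUTE = %r{\A/foo/?\z}

def matches_foo_route?(path)
  OPTIONAL_SLASH_ROUTE.match?(path)
end

puts matches_foo_route?("/foo")
puts matches_foo_route?("/foo/")
puts matches_foo_route?("/foo/bar")
```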
They are different routes. The second is a URL with a directory extension ('/'); the first is a URL with no extension. A lot of frameworks (like Rails) will interpret both as the same route, or append the '/' (e.g., Django; Apache can be configured to do that as well), but technically they are different URLs.

Resources