caching problem - shell

I wrote a script that runs on a Linux machine. It fetches data from a URL and displays the content on a page.
The problem I am facing is that sometimes, if I refresh the page 4-5 times, it displays the old content and not the latest one.
The problem could be a caching proxy that is still serving old content.
Please tell me what to write in the script so that it automatically bypasses the caching proxy.

You should try sending the Cache-Control HTTP header in your request to tell the proxy (if there is one) not to serve a cached result.
See RFC 2616 for an explanation.
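For example, if the script fetches the URL with curl or wget (an assumption - the question doesn't say which tool it uses), the request could look like this:

# Ask intermediate caches to revalidate instead of serving stale content
curl -H 'Cache-Control: no-cache' -H 'Pragma: no-cache' http://example.com/page.html

# wget has a flag that sends a similar no-cache directive for you
wget --no-cache http://example.com/page.html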

Take a look here: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/topic/com.ibm.websphere.express.doc/info/exp/ae/twbs_cookie.html
and set the following HTTP headers:
Expires, with a hard-coded GMT date in the past as the value
Last-Modified, with the current date in GMT, formatted "EEE, d MMM yyyy HH:mm:ss", as the value
Cache-Control, with the value 'no-store, no-cache, must-revalidate'
Cache-Control, with the value 'post-check=0, pre-check=0'
Pragma, with the value 'no-cache'
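Put together as raw response headers, that would look like the following (the Expires date here is just an arbitrary date in the past, and Last-Modified would be whatever the current time is):

Expires: Thu, 01 Jan 1970 00:00:00 GMT
Last-Modified: Wed, 10 Jul 2013 15:11:22 GMT
Cache-Control: no-store, no-cache, must-revalidate
Cache-Control: post-check=0, pre-check=0
Pragma: no-cache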

Related

Telerik Fiddler Web Debugger: is there a way to adjust Content-Length automatically in AutoResponder?

I am using Fiddler's AutoResponder to return a different JS file than the one originally loaded from my server. The adjusted file uses:
HTTP/1.1 200 OK
Cache-Control: private, max-age=31536000
Content-Type: application/javascript
...other headers
Content-Length: 37010
...the javascript code
This Content-Length header at the top of the file is not automatically adjusted to match the edited file, though. So I have to load my changes, watch my app crash because the Content-Length is wrong, check Fiddler's 'Transformer' tab to see how many bytes my response body actually is, update that in my modified file, and refresh again before it works.
I have tried changing the encoding to chunked so that I could leave out the Content-Length header, but for some reason my app doesn't seem to know how to decode chunked responses.
So my question is, is there any way to automatically update the Content-Length in the auto-responder?
You can simply use FiddlerScript in Fiddler Classic to build your AutoResponder. That way the Content-Length is set automatically:
static function OnBeforeRequest(oSession: Session) {
    // ... some other FiddlerScript code
    // host is e.g. "localhost:3000"
    if (oSession.HostnameIs("<host>") && oSession.uriContains("<file name>.js")) {
        oSession.utilCreateResponseAndBypassServer();
        oSession["ui-backcolor"] = "lime"; // Makes it more visible
        if (!oSession.LoadResponseFromFile("<file path>.js")) {
            throw new ApplicationException("LoadResponseFromFile Failed!! ");
        }
        // Just loads forever if Content-Length is not added
        oSession.oResponse["Content-Length"] = oSession.responseBodyBytes.GetLength(0);
    }
}

ASP Cache Control and ETags without access to server

I'm looking to optimize my ASP pages. Google best practices tell me to set expiration dates, modification dates and ETags etc.
I get the logic of it all, but I don't understand the details of the implementation. Since we're using ASP pages it's very hard to find the information, and I don't have access to change anything directly on the server.
Would this be the type of info I would be looking to configure? And with what?
<%
Response.CacheControl = "no-cache"
Response.AddHeader "Pragma", "no-cache"
Response.Expires = -1
%>
And how do ETags work? Do I just decide on a number for the URL, or does there have to be some logic behind it?
Most of the pages on our site have content that only changes once or twice a year, plus image files that sometimes change daily or weekly.
Don't forget to return status 304 Not Modified when you detect a cache hit.
i.e. for ETags, you would first need to generate a unique ID for the piece of content/page, then return it in the ETag response header. The browser will send that value back in the If-None-Match request header the next time it tries to fetch the page; if you don't return the right status when you see that request header, the mechanism isn't implemented completely.
In this case you can do something like this:
eTag = "t153120141610"
If eTag = Request.ServerVariables("HTTP_ETAG") Then
Response.Status = "304 Not Modified"
Response.End
End If
Response.AddHeader "etag",eTag
Response.Write "Cache me next time via eTag: " & eTag
Just make sure you're generating a unique ID, because otherwise it's easy to return the content of something else entirely that's already stored in the client's browser under the same eTag. Good luck.
Edit:
A common way of making a self-signed eTag is to first identify the types of content that will be cached (e.g. forum page, post page, profile page); usually there is an ID for each of these. Then encode it using MD5 or Base64.
So you can do something like this if you want to keep it in the client's browser for one day.
Profile Page:
eTag = Base64Encode("Profile" & ProfileId & Date)
This means the eTag changes every day, so the page is effectively cached for one day: the following day a new eTag is generated and the old one is no longer recognized. You can also keep a last-modified date field in a database or file and use that instead of Date, so the eTag only changes when the content changes.
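Note that classic ASP has no built-in Base64Encode function, so the snippet above assumes a helper. One common sketch uses ADODB.Stream to turn the string into bytes and MSXML to do the actual encoding (component ProgIDs may vary by server):

' Helper assumed by the example above - not built into classic ASP
Function Base64Encode(text)
    Dim stream, xml, node
    ' Convert the string to a byte array via ADODB.Stream
    Set stream = Server.CreateObject("ADODB.Stream")
    stream.Type = 2              ' adTypeText
    stream.Charset = "utf-8"
    stream.Open
    stream.WriteText text
    stream.Position = 0
    stream.Type = 1              ' adTypeBinary
    ' Let MSXML produce the Base64 text
    Set xml = Server.CreateObject("MSXML2.DOMDocument")
    Set node = xml.createElement("b64")
    node.dataType = "bin.base64"
    node.nodeTypedValue = stream.Read
    Base64Encode = node.Text
    stream.Close
End Function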
(NOTE: I don't use eTags; I use Last-Modified and If-Modified-Since headers instead for browser-side caching, for the obvious reasons.)
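For completeness, the Last-Modified/If-Modified-Since variant follows the same pattern as the ETag code above (a sketch - it compares the header as a plain string rather than parsing dates):

' e.g. read from a "last modified" field in the database
lastMod = "Wed, 10 Jul 2013 15:11:22 GMT"
If lastMod = Request.ServerVariables("HTTP_IF_MODIFIED_SINCE") Then
    Response.Status = "304 Not Modified"
    Response.End
End If
Response.AddHeader "Last-Modified", lastMod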

moving from Joomla to Refinery

I have an idea to move a site from PHP (Joomla + pure PHP) to Ruby on Rails (Refinery CMS). The old site has links with .html and .php extensions. Is there a way to keep the old URLs the same (I don't want to lose Google PageRank)? I tried to use the config.use_custom_slugs = true setting with Refinery, but it drops the '.' from the URL (store.php becomes storephp, FAQ.html becomes faqhtml, etc.).
Any help appreciated! Thanks.
In Rails I can do it the following way:
# Application Controller
def unknown_path
  # It works for all unmatched URLs; here I can load the page with the needed slug
  render :text => "#{request.fullpath}"
end
# routes.rb
match "*a", :to => "application#unknown_path" # goes at the very end
And it will make any URL work, so I could use a custom slug if it exists or raise a 404.
But the CMS doesn't allow creating truly custom slugs.
Why not a 301?
Trying to explain: you get an external link coming to Page 1, and Page 1 links internally to Page 2. The link to Page 1 passes, say, 1000 units of PageRank; the link to Page 2 then gets 900. In the same way, a link to a 301 gets 1000 and the page the 301 points to gets 900. So 900 is better than the rank disappearing altogether, but I'm trying to avoid creating this situation. That's how it works.
As per my answer on Refinery's issue tracker, where this was also asked, do you have to do this in Rails? What about rewriting the request before it even gets to Rails using proxy software such as nginx with HttpRewriteModule?
I ran into a similar issue when moving from WordPress to Comfortable Mexican Sofa (another Rails CMS engine).
I ended up doing this in my routes.rb. Here is a sample of one of the redirects (I don't have a lot - 15 redirects like that in total); it's a generic Rails solution that I think can be used with RefineryCMS as well.
get '/2009/01/28/if-programming-languages-were-cars/', to: redirect('/blog/2009/01/if-programming-languages-were-cars-translated-into-russian')
This actually generates a proper redirect (impossible to see in the browser, but if you curl the URL you'll see this):
curl http://gorbikoff.com/2009/01/28/if-programming-languages-were-cars/
<html>
<body>
You are being redirected.
</body>
</html>
And if we check headers - we see a proper 301 response:
curl -I http://gorbikoff.com/2009/01/28/if-programming-languages-were-cars/
returns
HTTP/1.1 301 Moved Permanently
Content-Type: text/html
Content-Length: 158
Connection: keep-alive
Status: 301 Moved Permanently
Location: http://gorbikoff.com/blog/2009/01/if-programming-languages-were-cars-translated-into-russian
X-UA-Compatible: IE=Edge,chrome=1
Cache-Control: no-cache
X-Request-Id: 9ec0d0b29e94dcc26433c3aad034eab1
X-Runtime: 0.002247
Date: Wed, 10 Jul 2013 15:11:22 GMT
X-Rack-Cache: miss
X-Powered-By: Phusion Passenger 4.0.5
Server: nginx/1.4.1 + Phusion Passenger 4.0.5
Specific to your case
So that's the general approach. However, for what you are trying to do (redirect all URLs that end in .php, like yourdomain.com/store.php, to yourdomain.com/store) you should be able to do something like the following. This assumes that you'll map your new structure exactly; otherwise you may need to create a bunch of custom redirects like I mentioned in the beginning, or do some regex voodoo:
NON-Redirect Solution
Don't redirect the user; just render the new page at the same address (it's a twist on the solution you put in your question).
This is the only approach that's more or less robust if you don't want to do a redirect.
# routes.rb
# You don't need a route for .html - those URLs are handled automatically by Rails.
# Note: a bare '/*\.php' glob is not valid; the wildcard segment needs a name,
# and the .php suffix is matched here through a format constraint.
match '*path', to: "application#php_path", constraints: { format: /php/ }
# application_controller.rb
# This doesn't have to be the application controller;
# you may do this in your home_controller if you wish.
def php_path
  require 'open-uri'
  # Fetch the same URL minus the .php extension and render its body
  file = open "http://#{request.host_with_port}#{request.fullpath.sub(/\.php/, '')}"
  render :inline => file.read
end
Redirect Solution:
According to Google, a 301 is the preferred way: https://support.google.com/webmasters/answer/93633?hl=en
match '*path', constraints: { format: /php/ }, to: redirect { |params, request|
  "http://#{request.host_with_port}#{request.fullpath.sub(/\.php/, '')}"
}
The non-redirect solution will respond with a status code of 200, so as far as the world is concerned you are serving PHP through Passenger (run curl -I against my site using these sample URLs - the params are arbitrary and just illustrate the point):
curl -I http://gorbikoff.com/about?name=nick
curl -I http://gorbikoff.com/about.php?name=nick
You may need to fiddle with this based on your specifics (https vs http, etc., and maybe some virtual routes need to be addressed separately; also remember route precedence in routes.rb), but I think this might work for you.
EDIT
I just realized that this solution works out of the box in Comfortable Mexican Sofa (it simply ignores the format: .php is treated the same way .html would be). However, I tried my solutions in a non-CMS Rails 3 project I have (it's not public) and the approach still holds with a slight change - I just fixed it above (sorry for the confusion).
I also suggest checking the official guide (I'm assuming you are on Rails 3.2.13, since RefineryCMS doesn't officially support Rails 4 yet):
http://guides.rubyonrails.org/v3.2.13/routing.html#redirection
And check out Rails API
http://api.rubyonrails.org/v3.2.13/classes/ActionDispatch/Routing/Redirection.html#method-i-redirect
Hope that helps

Change Referrer in header using Varnish

I think there is a possibility with Varnish to change the Referer header of its users and then serve them the content either from cache or from the server. I want to know how that can be made possible.
I tried this with "req.http.referer" and then "set req.http.referer" in Varnish 2.1 on a CentOS 32-bit machine, but it didn't work when I checked the results with the command "varnishtop -i TxHeader -I Referer".
Anyone got any ideas better than this?
At least on Varnish 3.0 the following works as expected. Obviously if the response is served from cache and you are not using the req.http.Referer for hash(), it doesn't matter how you change the referer header.
# Modify Referer header
sub vcl_recv {
    if (req.http.Referer) {
        # Referer was set. Replace foo with bar
        set req.http.Referer = regsub(req.http.Referer, "foo", "bar");
    } else {
        # Referer was not set. Set it to something anyway.
        set req.http.Referer = "http://referer.was.empty/";
    }
}
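As noted above, the rewritten Referer only affects cache lookups if it is part of the hash. A minimal sketch for Varnish 3.0 that extends the default vcl_hash (the URL/Host part mirrors Varnish's built-in behavior):

sub vcl_hash {
    # Default hashing: URL plus Host header (or server IP)
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    # Add the (possibly rewritten) Referer so variants are cached separately
    if (req.http.Referer) {
        hash_data(req.http.Referer);
    }
    return (hash);
}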
Also note that varnishtop -i TxHeader -I Referer is case sensitive. If you set req.http.referer then it will not match -I Referer, even though your HTTP backend will understand the referer: header just as well (per RFC 2616 section 4.2, message headers are case insensitive).

Can I get the date when an HTTP file was modified?

I'm trying to check if a file (on web) was modified since the last time I checked. Is it possible to do this by getting http headers to read the last time the file was modified (or uploaded)?
You can use the built-in Net::HTTP library to do most of this for you:
require 'net/http'

Net::HTTP.start('stackoverflow.com') do |http|
  response = http.request_head('/robots.txt')
  response['Last-Modified']
  # => Sat, 04 Jun 2011 08:51:44 GMT
end
If you want, you can convert that to a proper Time object using Time.parse (it lives in the standard "time" library, so require 'time' first).
As #tadman says in his answer, a HTTP "HEAD" request is the proper way to check the last modification date.
You can also do it using a conditional GET request using the "IF-*" modifier headers.
Which to use depends on whether you intend to immediately download the page. If you just want the date use HEAD. If you want the content if there has been a change use GET with the "IF-*" headers.
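Here is a sketch of the conditional-GET variant with Net::HTTP (the host, path, and timestamp are placeholders carried over from the example above):

require 'net/http'
require 'time'

# Timestamp remembered from the previous check (placeholder value)
last_checked = Time.parse('Sat, 04 Jun 2011 08:51:44 GMT')

Net::HTTP.start('stackoverflow.com') do |http|
  response = http.request_get('/robots.txt',
                              'If-Modified-Since' => last_checked.httpdate)
  case response
  when Net::HTTPNotModified
    puts 'Not modified since the last check'
  when Net::HTTPSuccess
    puts "Changed: #{response.body.bytesize} bytes of content"
  end
end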
