Serve two pages from two servers for one URL - caching

I would like to serve two different pages from two servers for one URL, similar to what Facebook does with its login page (login page vs. profile page at the same URL).
How will the server know which page to serve? I went with a cookie because I couldn't think of another solution.
The cookie also has to be removed on logout. I ended up with a branch in the nginx configuration to route the request to the right server and to remove the cookie there (by setting an expired time).
OK, and now the bug itself. Chrome caches this URL, and when the user clicks a link to the same URL, Chrome skips the request to the server and opens the wrong version from its cache. It works when "disable cache" is checked in the debug panel, and I also confirmed this by watching the traffic with Wireshark.
To recap, the URLs from the browser's point of view:
ex.com/ - Server A
ex.com/login_check - Server A -> redirects to / with cookie
ex.com/ - Server B
ex.com/?logout - Server A, removes the cookie
ex.com/ - Chrome skips the request and serves the cached content from B
How can this be fixed? Moreover, this approach looks like too much magic and many things can go wrong. Could it be done differently?

This can be fixed by adding a Cache-Control header to the response from both servers:
Cache-Control: private, max-age=0, must-revalidate, no-store
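A minimal nginx sketch of the whole setup, assuming the login state is tracked in a cookie named logged_in and the two backends are declared as upstreams server_a and server_b (the names, addresses and cookie are all made up for illustration):

# upstream names, addresses and the cookie name are assumptions, not from the question
upstream server_a { server 10.0.0.1:8080; }
upstream server_b { server 10.0.0.2:8080; }

# route to server_b whenever the (assumed) login cookie is present
map $cookie_logged_in $backend {
    default server_a;
    ~.+     server_b;
}

server {
    listen 80;
    location / {
        proxy_pass http://$backend;
        # keep either variant of / out of the browser cache
        add_header Cache-Control "private, max-age=0, must-revalidate, no-store" always;
    }
}

The map keeps the branching out of if blocks, and the Cache-Control header is what actually stops Chrome from replaying the cached copy of / after login or logout.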

Related

HTTP URL redirects as HTTPS on Selenium test run

When I pass a URL to load a website, say http://yoururl.com, it redirects to https://yoururl.com.
That is, passing a URL with HTTP automatically becomes https://yoururl.com in the browser's address bar.
#driver.get("http://yoururl.com")
Browser used: Chrome
Is there a way to stop the HTTP URL from being redirected to HTTPS?
Chrome 63 and above will no longer accept plain HTTP for .dev domains, which matters since you are in a local/dev environment.
https://iyware.com/dont-use-dev-for-development/
Chrome 63 (out since December 2017) will force all domains ending on .dev (and .foo) to be redirected to HTTPS via a preloaded HTTP Strict Transport Security (HSTS) header
https://ma.ttias.be/chrome-force-dev-domains-https-via-preloaded-hsts/
There are a couple of reasons this could happen.
1. Redirection at the load balancer or reverse proxy level. This can be fixed by altering the web server or LB configuration (a sketch follows below).
2. Browsers are getting smarter every day: once you have opened an https:// URL for a site, the next time you request the http:// URL the browser goes straight to HTTPS, because it already knows the site supports it. It prefers secure communication over plain text whenever it is available.
Here is some help for the second case: https://superuser.com/questions/565409/chrome-how-to-stop-redirect-from-http-to-https
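For the first case, this is roughly the kind of nginx server block to look for; the domain, web root and redirect rule are assumptions for illustration:

server {
    listen 80;
    server_name yoururl.com;

    # a rule like the following is what forces HTTP to HTTPS;
    # remove or comment it out if you want to keep serving plain HTTP
    # return 301 https://$host$request_uri;

    root /var/www/yoururl;
    index index.html;
}

Keep in mind that nothing server-side will help for .dev domains, because the HSTS policy for .dev is preloaded into Chrome itself.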

Varnish 4.0 + Magento 1.9 | Login on the first time does not work

So, I have set up Varnish 4.0 + Magento 1.9 using the Nexcess Turpentine module for Varnish, and I am seeing some weird behavior on the store with HTTPS.
Since Varnish does not support SSL, I am using Pound for SSL termination, and Pound acts as a proxy in front of Varnish.
The module is: https://github.com/nexcess/magento-turpentine
SSL support: https://github.com/nexcess/magento-turpentine/wiki/SSL_Support
Now, the issue in detail:
When the user visits the website, the homepage is served over HTTP. When he clicks the "login" button, he gets redirected to the login page, which is served over HTTPS. If we look at the cookies at this point, both frontend and frontend_cid are set. When the user tries to log in, he gets redirected back to the login page with no error messages (and if we look at the cookies again, they have been regenerated; they are different from the first request). If the user tries to log in a second time, everything works just fine.
Now some behaviors I was able to track:
When I disable HTTPS, everything works just fine.
If I remove all the cookies from my browser, go directly to the HTTPS login page and try to log in, the same thing happens: the first attempt does not work, the second works fine. However, if I go directly to the login page and refresh it once before the first login attempt, the login works with no problem (and the cookies are not regenerated in this case).
I believe it has something to do with a fix Magento made to prevent MITM attacks in the file app/code/core/Mage/Core/Model/Session/Abstract/Varien.php (class Mage_Core_Model_Session_Abstract_Varien). In this file, starting at line 135, Magento makes a few checks in order to prevent this type of attack. If I comment out this part of the code, the problem does not happen on the login page.
I tried swapping Pound for Nginx and for Apache2 to proxy the requests to Varnish, but they all had the same issue, so I don't think the problem lies there.
Here are my configuration files:
pound.cfg
TimeOut 3600

ListenHTTPS
    Address server_ip
    Port port
    Cert "my cert path"
    xHTTP 2
    RewriteLocation 1
    Ciphers "RC4:!SSLv2:!ADH:!aNULL:!eNULL:!NULL:!LOW:!EXP"
    AddHeader "Ssl-Offloaded: 1"
End

Service
    BackEnd
        Address varnish_ip
        Port varnish_port
    End
End
Varnish VCL:
The Nexcess module generates this and pushes it to the server; you can check the template here: github.com/nexcess/magento-turpentine/blob/devel/app/code/community/Nexcessnet/Turpentine/misc/version-4.vcl
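The root cause was never fully pinned down, but one workaround that is sometimes tried (my own assumption, not part of the Turpentine template) is to bypass the cache entirely for the customer/login URLs and for requests marked by Pound's Ssl-Offloaded header, so the HTTPS login flow never touches a cached page. Note that where this lands relative to the generated vcl_recv matters, since Varnish concatenates same-named subroutines in load order:

# hedged sketch, appended to the generated VCL; not part of Turpentine itself
sub vcl_recv {
    if (req.url ~ "^/customer/account" || req.http.Ssl-Offloaded == "1") {
        return (pass);
    }
}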

Selenium IDE: How to detect secure cookies on page loaded with http://?

I am using Firefox 22 and Selenium IDE 2.2.0.
I have loaded a page in Firefox using the HTTP protocol (not HTTPS). I know for sure that the page has set a secure cookie (as a result of an embedded AJAX request). I can verify this using the browser-internal URL chrome://web-developer/content/generated/view-cookie-information.html, because among other cookies that page shows a cookie like this:
Name WC_AUTHENTICATION_5122759
Value 5122759%2cDKppXa7BAqnZ0ERDLb0Wee%2bXqUk%3d
Host .testserver.dk
Path /
Expires At end of session
Secure Yes
HttpOnly No
However, when I run assertCookie in the Selenium IDE I can only see the non-secure cookies, i.e. all cookies except the one above are detected by Selenium IDE:
Executing: |assertCookie | glob:WC_AUTHENTICATION_* | | yields this set of visible cookies:
[error] Actual value 'JSESSIONID=0000uCQdh2FZ0ZA8z-O5zcGoUtD:-1;
WC_PERSISTENT=lT8Z5tbkQrvLhNm%2bGyCj%2bh4yPAU%3d%0d%0a%3b2013%2d07%2d05+13%3a18%3a18%2e807%5f1373023098807%2d3048%5f10201%5f5122827%2c%2d100%2cDKK%5f10201;
WC_SESSION_ESTABLISHED=true;
WC_ACTIVEPOINTER=%2d100%2c10201; WC_USERACTIVITY_5122827=5122827%2c10201%2cnull%2cnull%2cnull%2cnull%2cnull%2cnull%2cnull%2cnull%2cy6bjcrZgvCVe5c52BBKvcItxyF5lLravpDq9rd9I0ZmRfRNxcC2oG13Eyug3kKgbtLOHVLxm9T76%0d%0a%2fGJFLp5bOrkPoNqmc38TIr%2fO7eU%2fbd7Mfny2kQg7v6xGweYoRkXYgAEz91rH0QavFhlOjpd12A%3d%3d;'
did not match 'glob:WC_AUTHENTICATION_*'
So, does anyone know how I can use the Selenium IDE to verify the presence of secure cookies on a page loaded with http:// (not https://)?
Sadly, what you are doing breaks the specification. A secure cookie is supposed to be available only if the connection is secure. Hence, if you are connecting over HTTP, you can't see it.
However, if this is just on your test machine (not for your end users), you can modify the response from the server using Fiddler. With Fiddler you can program something like: if you see this cookie, add another cookie, or strip the Secure flag (a sketch follows at the end of this answer).
EDIT:
Some background information about Selenium and cookies:
Selenium works through the browser, with JavaScript, as part of the page. Because it is essentially a part of the page, it has to follow all the same rules as the page. This means that it still has to abide by the security rules on cookies: a Secure cookie can only be read over a secure connection, so Selenium cannot read it if the connection is not secure.
The place where the HTTP request comes in is that cookies are part of the HTTP headers. Both the request (from the browser) and the response (from the server) have HTTP headers, and cookies are present in both.
You want to verify that the server has set the cookie, so you want to inspect the HTTP response from the server for the presence of the cookie. Because of the security restrictions, however, you cannot do this from Selenium. These restrictions are enforced by the browser; all reputable browsers enforce these policies, since without them the end user's credentials would be easily compromised.
This is where Fiddler comes in. Fiddler inspects the HTTP data at a lower level, before the browser gets to it. Thus, you can use Fiddler to manipulate the data before it gets to the browser to give some kind of indication that the cookie was present.
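A rough sketch of the "strip the Secure flag" idea, to be merged into the existing OnBeforeResponse handler in Fiddler's CustomRules.js; the hostname check is an assumption, and the snippet only rewrites the first Set-Cookie header, so it may need adjusting if the server sends several:

// inside the existing OnBeforeResponse(oSession: Session) handler in CustomRules.js
if (oSession.HostnameIs("testserver.dk") && oSession.oResponse.headers.Exists("Set-Cookie")) {
    // drop the Secure attribute so the cookie is also visible over plain HTTP
    var sCookie = String(oSession.oResponse.headers["Set-Cookie"]);
    oSession.oResponse.headers["Set-Cookie"] = sCookie.replace(/;\s*secure/gi, "");
}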

Stop the browser from making HTTP requests for images that should stay cached - mod_expires

After reading many articles and some questions on here, I finally succeeded in activating the Apache mod_expires module to tell the browser it MUST cache images for 1 year.
<filesMatch "\.(ico|gif|jpg|png)$">
    # ExpiresActive/ExpiresDefault come from mod_expires; Header needs mod_headers
    ExpiresActive On
    ExpiresDefault "access plus 1 year"
    Header append Cache-Control "public"
</filesMatch>
And thankfully server responses seem to be correct:
HTTP/1.1 200 OK
Date: Fri, 06 Apr 2012 19:25:30 GMT
Server: Apache
Last-Modified: Tue, 26 Jul 2011 18:50:14 GMT
Accept-Ranges: bytes
Content-Length: 24884
Cache-Control: max-age=31536000, public
Expires: Sat, 06 Apr 2013 19:25:30 GMT
Connection: close
Content-Type: image/jpeg
Well, I thought this would stop the browser from downloading, or even asking the server about, the images for 1 year. But that is only partially true: if you close and reopen the browser, the browser no longer downloads the images from the server, but it still sends an HTTP request for each image.
How do I force the browser to stop making HTTP requests for each image? Even if these HTTP requests are not followed by an image download, they are still requests to the server that unnecessarily increase latency and slow down page rendering!
I already told the browser it MUST keep the images in cache for 1 year! Why does the browser still ask the server about each image (even if it does not download it)?!
Looking at the network graphs in Firebug (Firebug > Net > Images) I can see different caching behaviours (I obviously started with the browser cache completely empty; I forced a cache delete in the browser using "Clear All History"):
When the page is loaded for the 1st time all images are downloaded (and same thing happens if I force a page reload by clicking on the browser's reload page button). This makes sense!
When I navigate the site and get back to the same page the images are not downloaded at all and the browser does NOT even inquire the server for any of the images. This makes sense, (and I would like to see this behaviour also when browser is closed)!
When I close the browser and open it again on the same page, the silly browser still makes one HTTP request per image: it does NOT download the image, but it still makes an HTTP request, as if it were asking the server about the image (the server replies with 200 OK). This is the one that irritates me!
I also attach the graphs below if you are interested:
EDIT: I have just tested with Firefox 11.0 as well, to make sure it wasn't an issue of my Firefox 3.6 being too old. The same thing happens! I also tested the Google and Stack Overflow sites; they both send Cache-Control: max-age=..., but the browser still makes an HTTP request to the server for each image once the browser is closed and reopened on the same page. After the server responds, the browser does NOT download the image (as I explained above), but it still makes the damn request, which increases the time to see the page.
EDIT2: removing the Last-Modified header, as suggested here, does not solve the problem; it makes no difference.
The behavior you are seeing is the intended, specified behavior (see RFC 7234 for more details):
All modern browsers will send HTTP requests to the server for every page element displayed, regardless of cache status. This was a design decision made at the request of web services (especially advertising networks) to ensure that HTTP servers were able to maintain records of every display of every element.
If the browsers did not make these requests, the server would never be notified that an image had been displayed to the user. For advertising networks, this would be catastrophic. Early on, advertising networks 'hacked' their way around this by serving the same ad image using randomly generated names (ex: 'coke_ad_1_98719283719283.gif'). However, for ISPs this practice caused a huge increase in data transfers, because every one of their users was re-downloading these identical ad images, bypassing any caching/proxy servers their ISP was operating.
So a truce was reached: Browsers would always send HTTP requests, even for un-expired cached elements. Servers would respond with HTTP 304 status codes ("not modified"). This allows the servers to record the fact that the image was displayed to the client. As a result, advertising networks generally stopped using randomized image names to bypass network cache servers.
This gave the ad networks what they wanted - a record of every image displayed - and it gave ISPs what they wanted - cache-able images and static content.
That is why there isn't much you can do to prevent browsers from sending HTTP requests for cached page elements.
But if you look at the other client-side solutions that came along with HTML5, there is scope to prevent this resource loading:
Cache Manifest (in spite of its gotchas; a minimal manifest sketch follows after this list)
IndexedDB (nice asynchronous features, allows blob storage)
Local Storage (not async)
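For the first item, this is roughly what a (now long-deprecated) cache manifest looked like; the file name and image paths are made up, and the page would reference it as <html manifest="offline.appcache">:

CACHE MANIFEST
# hypothetical offline.appcache: resources listed here are served
# from the application cache without any HTTP request
/images/logo.png
/images/header.jpg

NETWORK:
*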
You were using the wrong tool for analysing the requests.
I'd recommend the really useful Firefox add-on Live HTTP headers so you can see what is really happening on the network.
And just to be sure, you can ssh/putty into your server and do something like
tail -f /var/log/apache2/access.log
There's a difference between "reloading" and "refreshing". Just navigating to a page with back and forward buttons usually doesn't initiate new HTTP requests, but specifically hitting F5 to "refresh" the page will cause the browser to double check its cache. This is browser dependent but seems to be the norm for FF and Chrome (i.e. the browsers that have the ability to easily watch their network traffic.) Hitting F6, enter should focus the URL address bar and then "go" to it, which should reload the page but not double check the assets on the page.
Update: clarification of back and forward navigating behavior. It's called "Back Forward Cache" or BFCache in browsers. When you navigate with back/forward buttons the intent is to show you exactly as the page was when you saw it in your own timeline. No server requests are made when using back and forward, even if a server cache header says that a particular item expired.
If you see (200 OK BFCache) in your developer network panel, then the server was never hit - even to ask if-modified-since.
http://www.softwareishard.com/blog/firebug/firebug-tip-what-the-heck-is-bfcache/
If I force a refresh using F5 or Ctrl+F5, a request is sent. However, if I close the browser and enter the URL again, then NO request is sent. The way I tested whether a request was sent was by using breakpoints on begin request on the server. Note that even when a request is not sent it still shows up in Firebug as having taken a 7 ms wait, so beware of this.
What you are describing here does not reflect my experience. If content is served with a no-store directive, or you do an explicit refresh, then yes, I'd expect it to go back to the origin server; otherwise it should be cached across browser restarts (assuming it is allowed to, and can write a cache file).
Looking at your waterfalls in a bit more detail (which is tricky because they are a bit small and blurry), the browser appears to be doing exactly what it should: it has entries for the images, but these are just loading from the local cache, not from the origin server - check the 'Date' header in the response (why do you think it's taking milliseconds instead of seconds?). That's why they are coloured differently.
After myself spending considerable time looking for a reasonable answer, I found the link below most useful, and it does answer the question asked here.
https://webmasters.stackexchange.com/questions/25342/headers-to-prevent-304-if-modified-since-head-requests
If it is a matter of life or death (if you want to optimise page loading this way, or reduce the load on the server as much as possible, no matter what), then there IS a workaround.
Use HTML5 local storage to cache images after they have been requested for the first time (a sketch follows after this list).
[+] You can prevent the browser from sending HTTP requests, which in 99% of cases would return 304 (Not Modified), no matter how hard the user tries (F5, Ctrl+F5, simply revisiting the page, etc.)
[-] You have to put some extra effort into JavaScript support for this.
[-] Images are stored as base64 (we cannot store binary data), so they are decoded each time on the client side. This is usually fast and not a big deal, but it is still some extra CPU usage on the client and should be kept in mind.
[-] Local storage is limited. You can aim at using ~5 MB of data per domain (note: base64 adds ~30% to the original size of an image).
[?] Supported by the majority of browsers: http://caniuse.com/#search=localstorage
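A minimal JavaScript sketch of this approach; the cache key prefix and the use of a canvas to obtain a base64 data URI are my own choices for illustration, and cross-origin images need CORS headers for toDataURL to work:

// hedged sketch: serve an image from localStorage when possible, otherwise
// fetch it once, convert it to a base64 data URI and store it for next time
function loadImageCached(url, imgElement) {
    var key = 'imgcache:' + url;                      // key prefix is made up
    var cached = localStorage.getItem(key);
    if (cached) {
        imgElement.src = cached;                      // no HTTP request at all
        return;
    }
    var img = new Image();
    img.crossOrigin = 'anonymous';                    // needed for cross-origin images
    img.onload = function () {
        var canvas = document.createElement('canvas');
        canvas.width = img.naturalWidth;
        canvas.height = img.naturalHeight;
        canvas.getContext('2d').drawImage(img, 0, 0);
        try {
            localStorage.setItem(key, canvas.toDataURL('image/png'));  // base64, ~30% bigger
        } catch (e) {
            // quota exceeded or tainted canvas: just skip caching
        }
        imgElement.src = url;
    };
    img.src = url;
}

// usage: loadImageCached('/images/logo.png', document.getElementById('logo'));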
What you are seeing in Chrome is not a record of the actual HTTP requests - it's a record of asset requests. Chrome does this to show you that an asset is being requested by the page. However, this view does not actually indicate whether the request is being made. If an asset is cached, Chrome never creates the underlying HTTP request.
You can also confirm this by hovering over the purple segments in the timeline. Cached resources will have a (from cache) in the tooltip.
In order to see the actual HTTP requests, you need to look on a lower level. In some browsers this can be done with a plugin (like Live HTTP Headers).
In reality though, to verify the requests are not actually being made you need to check your server logs or use a debugging proxy like Charles or Fiddler. This will work on an HTTP level to make sure the requests are not actually happening.
Cache Validation and the 304 response
There are a number of situations in which Internet Explorer needs to check whether a cached entry is valid:
The cached entry has no expiration date and the content is being accessed for the first time in a browser session
The cached entry has an expiration date but it has expired
The user has requested a page update by clicking the Refresh button or pressing F5
If the cached entry has a last modification date, IE sends it in the If-Modified-Since header of a GET request message:
GET /images/logo.gif HTTP/1.1
Accept: */*
Referer: http://www.google.com/
Accept-Encoding: gzip, deflate
If-Modified-Since: Thu, 23 Sep 2004 17:42:04 GMT
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
Host: www.google.com
The server checks the If-Modified-Since header and responds accordingly. If the content has not been changed since the date/time specified, it replies with a status code of 304 and a response message that just contains headers:
HTTP/1.1 304 Not Modified
Content-Type: text/html
Server: GWS/2.1
Content-Length: 0
Date: Thu, 04 Oct 2004 12:00:00 GMT
The response can be quickly downloaded because it contains no content and causes IE to read the data it requires from the cache. In effect, it is like a redirection to the local browser cache.
If the requested object has actually changed since the date/time in the If-Modified-Since header, the server responds with a status code of 200 and supplies the modified version of the resource.
This question has a better answer at the Webmasters Stack Exchange site.
More information, which is also cited in the above link, is available on HttpWatch.

Google Chrome same URL cache

I'm testing my servlet using Google Chrome. When I load the same URL twice, say,
localhost/myserver/servlet
Chrome only sends out one request to the server. However, if I modify the second URL to be:
localhost/myserver/servlet?id=2
it sends two different requests.
I've enabled incognito mode, but it seems that Chrome shares the cache and URLs between all its incognito tabs.
Cache control is part of the HTTP specification; read up on it. Using HTTP headers like Cache-Control: no-cache or Expires: ... should help you.
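For example, the servlet's responses could carry headers along these lines (they can be set with HttpServletResponse.setHeader and setDateHeader); the exact combination below is just one reasonable choice:

HTTP/1.1 200 OK
Content-Type: text/html;charset=UTF-8
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0

With these headers Chrome should go back to the server even when the same URL is requested twice.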
