Ruby Mechanize: How to retrieve an attachment from a GET

I'm trying to download transactions from a bank account (mine).
Step 1: a form is filled and submitted (POST).
Step 2: After that, the browser sends a GET:
https://accountinfo.corp.xxxxxxx.com.au/AIWeb/ExportAccounts/DownloadExport?OfficeId=201012249&ScheduleId=&FileFormat=CSV-Tran&IsAccountExport=False
The browser receives the file and saves it (default action).
The http response is:
(Status-Line) HTTP/1.1 200 OK
Content-Length 73
Content-Type application/AIUsers
Date Thu, 24 Dec 2015 03:24:22 GMT
p3p CP="NON CUR OTPi OUR NOR UNI"
x-frame-options SAMEORIGIN
x-aspnetmvc-version 1.0
Cache-Control private
Content-Disposition attachment; filename=Accounts_24-12-2015_91456974_T.CSV
I emulate the form submission (with ruby mechanize), wait a few seconds, and agent.get the URL above, as in:
url = "https://accountinfo.corp.westpac.com.au/AIWeb/ExportAccounts/DownloadExport?OfficeId=201012249&ScheduleId=&FileFormat=CSV-Tran&IsAccountExport=False"
download_page = agent.get(url)
The result is incorrect:
<html><body><script>window.parent.location = '/AIWeb/ExportAccounts/ShowErrorMessage?errorCode=3';</script></body></html>
Would appreciate some guidance on how to get the result of the GET.
Regards

The server probably expects something in the request headers and/or cookies that your script is not sending. You can inspect the browser's request with the Chrome inspector or Firebug and add the missing pieces to your request, if Mechanize supports that (I'm not familiar with it).
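One way to act on this is to copy the request headers from the browser's network panel and compare them with what the script sends. The sketch below is only an illustration in Python: the function name and all header values are hypothetical placeholders, not values from the bank's site.

```python
def missing_or_different(browser_headers, script_headers, ignore=("content-length",)):
    """Return headers the browser sent that the script omitted or sent differently."""
    # HTTP header names are case-insensitive, so compare them lower-cased.
    script = {name.lower(): value for name, value in script_headers.items()}
    report = {}
    for name, value in browser_headers.items():
        key = name.lower()
        if key in ignore:
            continue
        if script.get(key) != value:
            # Record (what the browser sent, what the script sent or None).
            report[name] = (value, script.get(key))
    return report


# Hypothetical values for illustration: copy the real ones from the network panel.
browser = {
    "Cookie": "SESSION=abc123",
    "Referer": "https://example.invalid/AIWeb/ExportAccounts",
    "User-Agent": "Mozilla/5.0",
}
script = {"user-agent": "Mozilla/5.0"}

diff = missing_or_different(browser, script)
for name, (expected, actual) in diff.items():
    print(f"{name}: browser sent {expected!r}, script sent {actual!r}")
```

Any header that shows up in the report (typically a session cookie or a Referer) is a candidate to add to the scripted GET.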

Related

Browser serving an obsolete Authorization header from cache

I'm experiencing my client getting logged out after an innocent request to my server. I control both ends and after a lot of debugging, I've found out that the following happens:
The client sends the request with a correct Authorization header.
The server responds with 304 Not Modified without any Authorization header.
The browser serves the full response including an obsolete Authorization header as found in its cache.
From now on, the client uses the obsolete Authorization and gets kicked out.
From what I know, the browser must not cache any request containing Authorization. Nonetheless,
chrome://view-http-cache/http://localhost:10080/api/SearchHost
shows
HTTP/1.1 200 OK
Date: Thu, 23 Nov 2017 23:50:16 GMT
Vary: origin, accept-encoding, authorization, x-role
Cache-Control: must-revalidate
Server: 171123_073418-d8d7cb0 =
x-delay-seconds: 3
Authorization: Wl6pPirDLQqWqYv
Expires: Thu, 01 Jan 1970 00:00:00 GMT
ETag: "zUxy1pv3CQ3IYTFlBg3Z3vYovg3zSw2L"
Content-Encoding: gzip
Content-Type: application/json;charset=utf-8
Content-Length: 255
The funny server header replaces the Jetty server header (which shouldn't be served for security reasons) by some internal information - ignore that. This is what curl says:
< HTTP/1.1 304 Not Modified
< Date: Thu, 23 Nov 2017 23:58:18 GMT
< Vary: origin, accept-encoding, authorization, x-role
< Cache-Control: must-revalidate
< Server: 171123_073418-d8d7cb0 =
< ETag: "zUxy1pv3CQ3IYTFlBg3Z3vYovg3zSw2L"
< x-delay-seconds: 3
< Content-Encoding: gzip
This happens in Firefox, too, although I can't reproduce it at the moment.
The RFC continues, and it looks like the answer linked above is not exact:
unless a cache directive that allows such responses to be stored is present in the response
It looks like the response is cacheable. That's fine, I do want the content to be cached, but I don't want the Authorization header to be served from cache. Is this possible?
Explanation of my problem
My server used to send the Authorization header only when responding to a login request. This used to work fine; the problems came with new requirements.
Our site allows users to stay logged in arbitrarily long (we do no sensitive business). We're changing the format of the authorization token and we don't want to force all users to log in again because of this. Therefore, I made the server send the updated authorization token whenever it sees an obsolete but valid one. So now any response may contain an authorization token, but most of them do not.
The browser cache, which combines the still-valid response with an obsolete authorization token, gets in the way.
As a workaround, I made the server send no etag when an authorization token is present. It works, but I'd prefer some cleaner solution.
The quote in the linked answer is misleading because it omitted an important part: "if the cache is shared".
Here's the correct quote (RFC7234 Section 3):
A cache MUST NOT store a response to any request, unless: ... the Authorization header field (see Section 4.2 of [RFC7235]) does not appear in the request, if the cache is shared,
That part of the RFC is basically a summary.
This is the complete rule (RFC7234 Section 3.2) that says essentially the same thing:
A shared cache MUST NOT use a cached response to a request with an Authorization header field (Section 4.2 of [RFC7235]) to satisfy any subsequent request unless a cache directive that allows such responses to be stored is present in the response.
Is a browser cache a shared cache?
This is explained in the Introduction section of the RFC:
A private cache, in contrast, is dedicated to a single user; often, they are deployed as a component of a user agent.
That means a browser cache is a private cache.
It is not a shared cache, so the above rule does not apply, which means both Chrome and Firefox do their jobs correctly.
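To make the distinction concrete, here is a minimal Python sketch of the Authorization rule alone. It ignores every other storage condition in RFC 7234, and the function names are my own. The list of allowing directives (must-revalidate, public, s-maxage) is the one given in RFC 7234 Section 3.2.

```python
# Directives that, per RFC 7234 Section 3.2, let a shared cache reuse a response
# to a request that carried an Authorization header.
ALLOWING_DIRECTIVES = {"must-revalidate", "public", "s-maxage"}


def parse_directives(cache_control):
    """Split a Cache-Control value into lower-cased directive names (arguments dropped)."""
    names = set()
    for part in cache_control.split(","):
        name = part.strip().split("=", 1)[0].lower()
        if name:
            names.add(name)
    return names


def authorization_blocks_reuse(request_has_authorization, cache_is_shared, cache_control=""):
    """True if the Authorization rule alone forbids the cache from reusing the response."""
    if not request_has_authorization or not cache_is_shared:
        return False  # the rule only constrains shared caches
    return not (parse_directives(cache_control) & ALLOWING_DIRECTIVES)


# A browser cache is private, so the rule never applies to it.
print(authorization_blocks_reuse(True, cache_is_shared=False))
# A shared proxy is blocked unless the response opts in, e.g. with must-revalidate.
print(authorization_blocks_reuse(True, cache_is_shared=True))
print(authorization_blocks_reuse(True, cache_is_shared=True, cache_control="must-revalidate"))
```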
Now the solution.
The specification mentions the possibility of reusing a cached response that contains Authorization without the Authorization header.
Unfortunately, it also says that the feature is not widely implemented.
So the easiest and most future-proof solution I can think of is to make sure that any response containing an Authorization token isn't cached.
For instance, whenever the server sees an obsolete but valid Authorization token, send a new valid one along with Cache-Control: no-store to disallow caching.
Also, you must never send Cache-Control: must-revalidate together with an Authorization header, because the must-revalidate directive actually allows the response to be cached, including by shared caches, which can cause even more problems in the future.
... unless a cache directive that allows such responses to be stored is present in the response.
In this specification, the following Cache-Control response directives (Section 5.2.2) have such an effect: must-revalidate, public, and s-maxage.
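A rough sketch of that suggestion, in Python rather than the asker's Jetty setup: only responses that carry a refreshed token are marked no-store, and everything else keeps its normal caching headers. The function name and the token value are hypothetical.

```python
def response_headers(base_headers, new_token=None):
    """Build response headers; a response that carries a token must not be stored."""
    headers = dict(base_headers)  # copy so the caller's defaults stay untouched
    if new_token is not None:
        headers["Authorization"] = new_token
        # Replace must-revalidate (which permits storing) with no-store.
        headers["Cache-Control"] = "no-store"
    return headers


defaults = {"Cache-Control": "must-revalidate", "Content-Type": "application/json;charset=utf-8"}

print(response_headers(defaults))                         # ordinary, cacheable response
print(response_headers(defaults, new_token="new-token"))  # token refresh, never stored
```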
My current solution is to send an Authorization header in every response, using a placeholder value of - when no authorization is wanted.
The placeholder value is obviously meaningless and the client knows it and happily ignores it.
This solution is ugly as it adds maybe 20 bytes to every response, but that's still better than occasionally having to resend a whole response content as with the approach mentioned in my question. Moreover, with HTTP/2 it'll be free.
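The placeholder approach could look roughly like this on both ends. This is a Python sketch with made-up function names, not the actual implementation.

```python
PLACEHOLDER = "-"  # meaningless value sent when there is no token to deliver


def authorization_for_response(new_token=None):
    """Server side: always produce an Authorization value, real or placeholder."""
    return new_token if new_token is not None else PLACEHOLDER


def update_client_token(current_token, response_headers):
    """Client side: ignore the placeholder, adopt any real token."""
    value = response_headers.get("Authorization", PLACEHOLDER)
    return current_token if value == PLACEHOLDER else value


token = "old-token"
token = update_client_token(token, {"Authorization": authorization_for_response()})
print(token)  # unchanged, the placeholder was ignored
token = update_client_token(token, {"Authorization": authorization_for_response("new-token")})
print(token)  # replaced by the refreshed token
```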

How can a server identify an AJAX request?

I am trying to send a request from my localhost to eBay's servers, using Postman.
It is the same request that this eBay page sends: http://www.fees.ebay.com/feeweb/feecalculator
when you press the calculate fees button.
The request should return a JSON response, but I can't get that in Postman. Please find my request below.
How can eBay tell that my request is not an AJAX request?
POST /feeweb/calculate HTTP/1.1
Host: www.fees.ebay.com
X-Requested-With: XMLHttpRequest
Referer: http://www.fees.ebay.com/feeweb/feecalculator
Accept: */*
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive
Cache-Control: no-cache
Content-Type: application/x-www-form-urlencoded
locale=en_US_MAIN&catlevel1=2984&dp_cat-level1=Baby&catlevel2=20394&dp_cat-level2=athing+%26+Grooming&catlevel3=113814&dp_cat-level3=Bath+Tubs&saleformat=saleformat&site_id=0&RlogId=t6e%2560cpfg%253C%253Dsm%257Eaf%2560qba%252840%253A702-15753cbd4ec-0xfe&dp_store=Basic&rb-discount=0&freeshipping=1&value_pack_info=Value+Pack+is+Gallery+Plus%2C+Listing+Designer%2C+and+Subtitle+packaged+together+for+a+discount.&store=1&n_reserveprice=0&n_finalsaleprice=100&finalsaleprice=%24100.00
EDIT:
Here are the response headers:
Content-Encoding → gzip
Content-Length → 2312
Content-Type → text/html;charset=UTF-8
Date → Sat, 22 Oct 2016 15:10:25 GMT
ETag → 6faadcbec2626c16a5eb2715c89aa47f
Last-Modified → Sat, 22 Oct 2016 14:31:15 GMT
Server → Apache-Coyote/1.1
Response body:
From collectibles to cars, buy and sell all kinds of items on eBay
Page Not Responding
The eBay page or feature you are attempting to access is not responding.
Please try the options below:
Try to access the feature directly from the eBay Home Page, instead of using a bookmark.
Wait a few minutes and try to access the feature again.
If you have waited ten to fifteen minutes and you still can't access your page:
Check our Announcement Board to see if the feature is currently unavailable.
If what you are looking for is unavailable, you may still be able to access other parts of the site from the eBay Home page

Does If-None-Match need to be set programmatically in an AJAX request if the server sends an ETag?

My question is pretty simple, although I have not found a simple, satisfying answer while searching.
I am using a jQuery AJAX request to get data from a server. The server hosts a REST API that sets the ETag and Cache-Control headers on responses to GET requests. The server also sets CORS headers to allow the ETag.
The client of the API is a browser web app. I am using an AJAX request to call the API. Here are the response headers from the server after a simple GET request:
Status Code: 200 OK
Access-Control-Allow-Origin: *
Cache-Control: no-transform, max-age=86400
Connection: Keep-Alive
Content-Encoding: gzip
Content-Type: application/json
Date: Sun, 30 Aug 2015 13:23:41 GMT
Etag: "-783704964"
Keep-Alive: timeout=15, max=99
Server: Apache-Coyote/1.1
Transfer-Encoding: chunked
Vary: Accept-Encoding
access-control-allow-headers: X-Requested-With, Content-Type, Etag,Authorization
access-control-allow-methods: GET, POST, DELETE, PUT
All I want to know is:
Do I need to manually collect the ETag from the response headers sent by the server and attach an If-None-Match header to the AJAX request? Or does the browser send it by default in a conditional GET request when it has an ETag?
I have debugged this in the browser's network console, and it seems the browser does the conditional GET automatically and sets the If-None-Match header.
If that is right, suppose I create a new resource and then issue the GET request. The first time, it gives me the old cached data, but when I reload the page, it gives me the updated data. So I am confused: if the dataset on the server side has changed and the server sends a different ETag, why doesn't the browser get the updated dataset from the server unless I reload?
Also, in the case of pagination: suppose I have a URL /users?next=0, where next is a query parameter whose value changes for every new request. Each response will get its own ETag. Will the browser store the ETag per request URL, or does it just store the latest ETag from the previous GET request, irrespective of the URL?
Well, I have figured out the solution myself:
The browser sends the If-None-Match header itself when a previous response for that URL carried an ETag header. The browser saves the ETag per URL, so it does not matter how many requests to different URLs happen.
Also, a trick to force the browser to make a conditional GET to check the ETag:
Set max-age in the Cache-Control header to a low value (for me, 60 seconds works great).
Once the cache entry expires, the browser will send a conditional GET to check whether the expired cached resource is still valid. If the If-None-Match header matches the ETag, the server responds with 304 Not Modified. This means the expired cached resource is still valid and can be used.
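The behaviour described above can be imitated with a toy cache. This is a simplified Python model, not how any real browser is implemented. The class, the fake server and the fake clock are all hypothetical, chosen so the sketch runs offline.

```python
import time


class ConditionalCache:
    """Toy private cache: one entry per URL, revalidated with If-None-Match once stale."""

    def __init__(self, fetch, max_age=60, clock=time.monotonic):
        self.fetch = fetch        # callable(url, headers) -> (status, headers, body)
        self.max_age = max_age    # seconds an entry counts as fresh
        self.clock = clock
        self.entries = {}         # url -> (etag, body, stored_at)

    def get(self, url):
        entry = self.entries.get(url)
        request_headers = {}
        if entry is not None:
            etag, body, stored_at = entry
            if self.clock() - stored_at < self.max_age:
                return body       # still fresh: served without contacting the server
            if etag:
                request_headers["If-None-Match"] = etag
        status, headers, new_body = self.fetch(url, request_headers)
        if status == 304 and entry is not None:
            # Validator matched: keep the old body and restart the freshness clock.
            self.entries[url] = (entry[0], entry[1], self.clock())
            return entry[1]
        self.entries[url] = (headers.get("ETag"), new_body, self.clock())
        return new_body


# Fake origin server and fake clock so the example needs no network.
resources = {"/users?next=0": ('"v1"', "page 0"), "/users?next=1": ('"v7"', "page 1")}
calls = []
now = [0.0]


def fake_fetch(url, headers):
    calls.append((url, dict(headers)))
    etag, body = resources[url]
    if headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, None
    return 200, {"ETag": etag}, body


cache = ConditionalCache(fake_fetch, max_age=60, clock=lambda: now[0])
cache.get("/users?next=0")   # first request: plain GET
cache.get("/users?next=1")   # different URL: separate entry with its own ETag
cache.get("/users?next=0")   # fresh: answered from the cache, no request sent
now[0] = 61.0
cache.get("/users?next=0")   # stale: conditional GET carrying If-None-Match
```

In this model, a changed resource goes unnoticed while the entry is still fresh, which matches the stale first read described in the question. A shorter max-age narrows that window.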

Will ETag work without a Cache-Control header set by the web server?

My server returns the following headers for a file:
Accept-Ranges:bytes
Connection:Keep-Alive
Content-Length:155
Content-Type:text/css
Date:Thu, 06 Feb 2014 18:32:44 GMT
ETag:"99000000061b06-9b-4f1c118fdd2f1"
Keep-Alive:timeout=5, max=100
Last-Modified:Thu, 06 Feb 2014 18:32:37 GMT
As you can see, it doesn't return a Cache-Control header; however, it returns ETag and Last-Modified headers.
My question is whether the browser is going to cache the requested file. I can observe that on subsequent requests the browser sends ETag:"99000000061b06-9b-4f1c118fdd2f1" in the headers and the server returns status code 304.
And a second question: will the browser cache the resource and request it with the ETag if Cache-Control is set to no-cache?
For the first part of the question: it is up to your browser (its implementation and configuration) whether the response will be cached and when it will be revalidated. The only (standardized) difference between browser behaviour with validation headers and without them is that the former can reduce traffic to the server by using validation.
Second question: Yes. The browser will cache the resource, but every time you open the page it will ask the origin server whether the resource has been modified. If it has not, the server will respond with 304 and the browser will display the cached content. Otherwise the server will send the new content.
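As an illustration of "up to your browser": when no explicit lifetime is given, RFC 7234 Section 4.2.2 lets a cache estimate one from Last-Modified, and mentions 10% of the interval since that time as a typical fraction. The Python sketch below applies that fraction to the headers from the question. The function name is made up, and real browsers may use a different heuristic.

```python
from email.utils import parsedate_to_datetime


def heuristic_freshness_seconds(date_header, last_modified_header, fraction=0.1):
    """Estimate a freshness lifetime as a fraction of the time since Last-Modified."""
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    interval = (date - last_modified).total_seconds()
    return max(0.0, interval * fraction)  # never negative, even with clock skew


# Date and Last-Modified values taken from the response headers in the question.
lifetime = heuristic_freshness_seconds(
    "Thu, 06 Feb 2014 18:32:44 GMT",
    "Thu, 06 Feb 2014 18:32:37 GMT",
)
print(f"heuristic freshness lifetime: {lifetime:.1f} s")
```

With only 7 seconds between Last-Modified and Date, the estimate is under a second. That is consistent with the browser revalidating on practically every request, as observed in the question.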
My guess would be that ETag can serve as Cache-Control: no-cache.

django: invoking browser's save as from an ajax invoked view

In my django app I want to both render a template and invoke the browser's save as. I’ve implemented this using ajax – I have a view that renders a template. In that template is some javascript that invokes another view with ajax. That view returns a response that should trigger the save as. But when returned from the ajax invoked view it does not. If I invoke the same view by cutting and pasting the URL generated by the ajax call into my browser’s address bar the save as is invoked, but when called from ajax it is not. I have verified from the python side using pdb that the view is invoked and the proper response is being returned. I have verified from the browser side that it received the response.
This is the response I return (cut and pasted from the browser's debug window):
HTTP/1.1 200 OK
Date: Tue, 26 Mar 2013 13:07:47 GMT
Server: Apache/2.2.21 (Unix) mod_ssl/2.2.21 OpenSSL/1.0.0f DAV/2 mod_wsgi/3.3 Python/2.6.7
Vary: Cookie
Content-Disposition: Attachment; filename=SF69.xml
Keep-Alive: timeout=5, max=98
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/xml
Any idea why that could be happening? Why would this behave differently when returned from AJAX vs. the 'normal' way?

Resources