How to influence OkHttp cache usage? - okhttp

You can add a cache to an OkHttpClient, which will then work according to the various cache-related HTTP headers such as Cache-Control:
import java.io.File
import okhttp3.Cache
import okhttp3.OkHttpClient

val okHttpClient =
    OkHttpClient.Builder().apply {
        cache(Cache(File("http_cache"), 50 * 1024 * 1024)) // 50 MiB on-disk cache
    }.build()
There is a resource for which the server (which is not controlled by me) specifies cache-control: no-cache in the response. But I want to cache it anyway, because I know that under certain circumstances it is safe to do so.
I thought I could intercept the response and set headers accordingly:
val okHttpClient =
    OkHttpClient.Builder().apply {
        cache(Cache(File("http_cache"), 50 * 1024 * 1024))
        addInterceptor { chain ->
            val response = chain.proceed(chain.request())
            response
                .newBuilder()
                .header("cache-control", "max-age=1000") // Enable caching
                .removeHeader("pragma") // Remove any headers that might conflict with caching
                .removeHeader("expires") // ...
                .removeHeader("x-cache") // ...
                .build()
        }
    }.build()
Unfortunately, this does not work. Apparently, the caching decisions are made before the interceptor intercepts. Using addNetworkInterceptor() instead of addInterceptor() does not work either.
The opposite (disabling caching when the server allows it, by setting cache-control: no-cache) also does not work.
Edit:
Yuri's answer is correct. addNetworkInterceptor() with .header("Cache-Control", "public, max-age=1000") works, and .header("cache-control", "max-age=1000") also works.
But when running my experiments, I had made some false assumptions. This is what I found out later:
The OkHttp cache does not cache responses for POST requests at all. (Source)
"Note that no-cache does not mean "don't cache". no-cache allows caches to store a response, but requires them to revalidate it before reuse. If the sense of "don't cache" that you want is actually "don't store", then no-store is the directive to use." (Source)

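The distinction in the quote above can be written as two separate checks: whether a response may be stored at all, and whether a stored copy may be reused without asking the server first. The sketch below is in Python (the language used for all examples added here) and is a simplification, not OkHttp's code: the GET-only rule mirrors the POST finding above, and freshness ignores Expires, the Age header and heuristic expiry.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(method: str, cache_control: str) -> bool:
    """no-store forbids storing; no-cache does NOT. Only GET is considered, as in OkHttp."""
    return method.upper() == "GET" and "no-store" not in parse_cache_control(cache_control)


def may_serve_without_revalidation(cache_control: str, age_seconds: int) -> bool:
    """A stored response is reused as-is only while fresh and not marked no-cache."""
    directives = parse_cache_control(cache_control)
    if "no-cache" in directives or "no-store" in directives:
        return False  # must be revalidated with the server first (or was never stored)
    max_age = directives.get("max-age")
    return max_age is not None and max_age.isdigit() and age_seconds < int(max_age)
```

So a no-cache response may sit in the cache, but every reuse costs a round trip to the server for revalidation.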
It needs to be a network interceptor (addNetworkInterceptor()), but it definitely works.
Try .header("Cache-Control", "public, max-age=1000"), which should cache the response for 1000 seconds (about 17 minutes).
See https://stackoverflow.com/a/23503804/1542667
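Why the interceptor type matters can be shown with a toy model of the call chain, sketched in Python rather than Kotlin and not taken from OkHttp: application interceptors wrap the cache, so they only see responses after the caching decision, while network interceptors sit between the cache and the server, so their rewrite is what the cache stores. ToyCachingClient, force_max_age, toy_cacheable and origin are hypothetical names, and the toy cache never expires or revalidates entries.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)
Interceptor = Callable[[Response], Response]


def identity(resp: Response) -> Response:
    return resp


def force_max_age(resp: Response) -> Response:
    """Hypothetical rewrite, like the question's interceptor: drop conflicting headers, set max-age."""
    status, headers, body = resp
    kept = {k: v for k, v in headers.items()
            if k.lower() not in ("cache-control", "pragma", "expires")}
    kept["Cache-Control"] = "public, max-age=1000"
    return status, kept, body


def toy_cacheable(headers: Dict[str, str]) -> bool:
    """Crude rule for this toy only: store when max-age is set and no-cache/no-store are absent."""
    cc = next((v for k, v in headers.items() if k.lower() == "cache-control"), "").lower()
    return "max-age=" in cc and "no-cache" not in cc and "no-store" not in cc


class ToyCachingClient:
    """Order modelled: application interceptor -> cache -> network interceptor -> server."""

    def __init__(self, server: Callable[[str], Response],
                 app_interceptor: Interceptor = identity,
                 network_interceptor: Interceptor = identity):
        self.server = server
        self.app_interceptor = app_interceptor
        self.network_interceptor = network_interceptor
        self.store: Dict[str, Response] = {}
        self.network_calls = 0

    def get(self, url: str) -> Response:
        if url in self.store:  # cache hit: the network interceptor and server are skipped
            return self.app_interceptor(self.store[url])
        self.network_calls += 1
        resp = self.network_interceptor(self.server(url))
        if toy_cacheable(resp[1]):  # the cache only sees what the network interceptor returned
            self.store[url] = resp
        return self.app_interceptor(resp)


def origin(url: str) -> Response:
    """Stand-in server that answers like the one in the question."""
    return 200, {"Cache-Control": "no-cache", "Pragma": "no-cache"}, "payload"
```

With force_max_age installed as the application interceptor, two get("/r") calls both reach origin; installed as the network interceptor, the second call is a cache hit.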

Related

Azure Form Recognizer training not finding data

I'm trying to train a Form Recognizer using the browser API console (https://eastus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api/operations/TrainCustomModel/console). I've uploaded training images to a container and created an SAS. The browser API console generates the following HTTP request:
POST https://eastus.api.cognitive.microsoft.com/formrecognizer/v1.0-preview/custom/train?source=https://pythonimages.blob.core.windows.net/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2020-01-22T00:23:33Z&st=2020-01-21T16:23:33Z&spr=https&sig=••••••••••••••••••••••••••••••••&prefix=images HTTP/1.1
Host: eastus.api.cognitive.microsoft.com
Content-Type: application/json
Ocp-Apim-Subscription-Key: ••••••••••••••••••••••••••••••••
{
"source": "string",
"sourceFilter": {
"prefix": "string",
"includeSubFolders": true
}
}
However, the response I get back is:
Transfer-Encoding: chunked
x-envoy-upstream-service-time: 4
apim-request-id: 5ad37aa2-e251-4b61-98ae-023930b47d27
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
Date: Tue, 21 Jan 2020 16:25:03 GMT
Content-Type: application/json; charset=utf-8
{
"error": {
"code": "1004",
"message": "Dataset path must be relative to local input mount path '/input' if local data is referenced."
}
}
I don't understand why it seems to be looking for data locally. I've experimented with the SAS, e.g. including the container name (images) in the blob http address rather than as a query parameter, but with no success so far.
I've also tried the Python/REST path (described here: https://learn.microsoft.com/en-gb/azure/cognitive-services/form-recognizer/quickstarts/python-train-extract-v1), which results in a different error:
Response status code: 408
Response body: {'error': {'code': '1011', 'innerError': {'requestId': 'e7f9ef9f-97bc-4b6a-86f3-0b29c9591c87'}, 'message': 'The operation exceeded allowed time limit and was canceled. The common reasons are that the data source is too large or contains unsupported content. Please check that your request conforms to service limits and retry with redacted data source.'}}
For completeness, the code I use is as follows (key and signature starred out):
########### Python Form Recognizer Train #############
from requests import post as http_post

# Endpoint URL (no trailing slash after the host, to avoid "//" in the path)
base_url = r"https://markusformsrecognizer.cognitiveservices.azure.com" + "/formrecognizer/v1.0-preview/custom"
source = r"https://pythonimages.blob.core.windows.net/images?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2020-01-22T15:37:26Z&st=2020-01-22T07:37:26Z&spr=https&sig=*********************************"
headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '*********************************'
}
url = base_url + "/train"
body = {"source": source}
try:
    resp = http_post(url=url, json=body, headers=headers)
    print("Response status code: %d" % resp.status_code)
    print("Response body: %s" % resp.json())
except Exception as e:
    print(str(e))
For error code 1004, follow the steps below to get the source path containing the training documents, and pass it as the value of the source key:
{
"source": "string",
"sourceFilter": {
"prefix": "string",
"includeSubFolders": true
}
}
Set source to the Azure Blob storage container's shared access signature (SAS) URL. To retrieve the SAS URL, open Microsoft Azure Storage Explorer, right-click your container, and select Get shared access signature.
Make sure the Read and List permissions are checked, and click Create.
Then copy the value in the URL section. It should have the form:
https://.blob.core.windows.net/container name?SAS value.
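The asker's first request used a SAS URL with no container name before the "?" (https://pythonimages.blob.core.windows.net/?sv=...). The Python sketch below assembles a container SAS URL and flags that kind of problem; in a SAS token, the sp parameter lists the granted permissions (r for read, l for list). The helper names are hypothetical, and the checks are only a rough sanity test, not a validation of the signature.

```python
from urllib.parse import parse_qs, urlsplit


def container_sas_url(account: str, container: str, sas_token: str) -> str:
    """Join a storage account name, container name and SAS token into one URL."""
    return f"https://{account}.blob.core.windows.net/{container}?{sas_token.lstrip('?')}"


def source_url_problems(url: str) -> list:
    """Return human-readable problems found in a candidate "source" URL (empty list if none)."""
    problems = []
    parts = urlsplit(url)
    if not parts.netloc.endswith(".blob.core.windows.net"):
        problems.append("host is not a Blob storage endpoint")
    if not parts.path.strip("/"):
        problems.append("no container name in the path")
    permissions = parse_qs(parts.query).get("sp", [""])[0]
    if "r" not in permissions:
        problems.append("SAS token lacks Read (r) permission")
    if "l" not in permissions:
        problems.append("SAS token lacks List (l) permission")
    return problems
```

For the URL from the question, source_url_problems reports only the missing container name.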
Please use the new Form Recognizer v2.0 release instead: it is an asynchronous API and enables training on large data sets and analyzing large documents. https://aka.ms/form-recognizer/api
Quickstart: https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/python-train-extract
To get started with Form Recognizer, log in to the Azure Portal and create a Form Recognizer resource (for v2.0 (preview), use the West US 2 or West Europe regions).
Try removing the string value from the prefix property:
{
"source": "string",
"sourceFilter": {
"prefix": "",
"includeSubFolders": true
}
}
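Putting the pieces of these answers together, the v1.0-preview train call can be assembled with only the standard library. This Python sketch builds the request without sending it; the endpoint and key used with it are placeholders, the empty prefix follows the answer above, and joining the endpoint with rstrip("/") avoids the double slash that a trailing slash on the endpoint would otherwise produce. build_train_request is a hypothetical helper, not part of any Azure SDK.

```python
import json
import urllib.request

TRAIN_PATH = "/formrecognizer/v1.0-preview/custom/train"


def build_train_request(endpoint: str, key: str, source_url: str,
                        prefix: str = "", include_subfolders: bool = True) -> urllib.request.Request:
    """Build (but do not send) a v1.0-preview train request."""
    body = {
        "source": source_url,
        "sourceFilter": {"prefix": prefix, "includeSubFolders": include_subfolders},
    }
    return urllib.request.Request(
        endpoint.rstrip("/") + TRAIN_PATH,  # no "//" between host and path
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Ocp-Apim-Subscription-Key": key},
        method="POST",
    )
```

Sending it would then be a matter of urllib.request.urlopen(request), with the real endpoint, key and container SAS URL.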
The Python Quickstart code for version 2.0 seems to be working; at least I don’t get any errors anymore. I’m now feeling slightly silly that I didn’t try this earlier. The API (web browser) console, linked from the Quickstart page of the Form Recognizer, seems to automatically assume I want to use version 1.0, and there’s no way to change that (or perhaps I’ve just overlooked something). Hence I assumed I’d been allocated a v1.0 trial, and therefore that’s what I used when I tried the Python Quickstart the first time around.
Instead of using just the SAS URI in the "source" request parameter of the API POST call, use the complete URL of the container followed by the SAS token.
For example:
https://.blob.core.windows.net//

How can I set HTTP request headers when using Go-Github and an http.Transport?

I am writing an app that uses the GitHub API to look at repositories in my GitHub orgs. I am using the github.com/google/go-github library.
I am also using github.com/gregjones/httpcache, together with an oauth2 transport, so that I can do token-based authentication as well as set the conditional headers for the API calls. I have got authentication working like this:
ctx := context.Background()

// Declared as the interface type so that each wrapper below can be assigned to it
var transport http.RoundTripper

// GitHub API authentication
transport = &oauth2.Transport{
	Source: oauth2.StaticTokenSource(
		&oauth2.Token{
			AccessToken: gh.tokens.GitHub.Token,
		},
	),
}

// Configure HTTP memory caching
transport = &httpcache.Transport{
	Transport:           transport,
	Cache:               httpcache.NewMemoryCache(),
	MarkCachedResponses: true,
}

// Create the http client that GitHub will use
httpClient := &http.Client{
	Transport: transport,
}

// Attempt to login to GitHub
client := github.NewClient(httpClient)
However, I am unable to work out how to add the necessary If-Match header when I call, for example, client.Repositories.Get. I need this so I can tell whether the repo has changed in the last 24 hours.
I have searched how to do this, but the examples I come across show how to create an HTTP client, then create a request (so the headers can be added), and then call Do on it. However, as I am using the client directly, I do not have that option.
The documentation for go-github states that for conditional requests:
The GitHub API has good support for conditional requests which will help prevent you from burning through your rate limit, as well as help speed up your application. go-github does not handle conditional requests directly, but is instead designed to work with a caching http.Transport. We recommend using https://github.com/gregjones/httpcache for that.
Learn more about GitHub conditional requests at https://developer.github.com/v3/#conditional-requests.
I do not know how to add it in my code, any help is greatly appreciated.
As tends to be the case with these things, shortly after posting my question I found the answer.
The trick is to set the headers using the Base field of the oauth2 transport, like this:
transport = &oauth2.Transport{
	Source: oauth2.StaticTokenSource(
		&oauth2.Token{
			AccessToken: gh.tokens.GitHub.Token,
		},
	),
	Base: &transportHeaders{
		modifiedSince: modifiedSince,
	},
}
The struct and method look like:
type transportHeaders struct {
	modifiedSince string
}

func (t *transportHeaders) RoundTrip(req *http.Request) (*http.Response, error) {
	// Determine the last modified date based on the transportHeaders options.
	// Do not add any headers if blank or zero.
	if t.modifiedSince != "" {
		// A RoundTripper should not modify the caller's request, so set the
		// header on a copy (Request.Clone needs Go 1.13 or later).
		req = req.Clone(req.Context())
		req.Header.Set("If-Modified-Since", t.modifiedSince)
	}
	return http.DefaultTransport.RoundTrip(req)
}
So by doing this I can intercept the call to RoundTrip and add my own header. This now means I can check the resources and see if they return a 304 HTTP status code. For example:
ERRO[0001] Error retrieving repository error="GET https://api.github.com/repos/chef-partners/camsa-setup: 304 []" name=camsa-setup vcs=github
I worked out how to do this after coming across this page - https://github.com/rmichela/go-reddit/blob/bd882abbb7496c54dbde66d92c35ad95d4db1211/authenticator.go#L117
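The same idea (a layer that adds If-Modified-Since before each request goes out, and a server that answers 304 when nothing has changed) can be sketched with Python's standard library. This is an analogy to the Go transport above, not go-github code: the urllib handler's http_request hook plays the role of RoundTrip, and the local server and its LAST_MODIFIED timestamp are hypothetical stand-ins for the GitHub API.

```python
import threading
import urllib.error
import urllib.request
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "last changed" time of the resource served by the stand-in server.
LAST_MODIFIED = datetime(2020, 1, 1, tzinfo=timezone.utc)


class ConditionalResource(BaseHTTPRequestHandler):
    """Stand-in for the API: 304 if unchanged since If-Modified-Since, else 200."""

    def do_GET(self):
        since = self.headers.get("If-Modified-Since")
        if since and parsedate_to_datetime(since) >= LAST_MODIFIED:
            self.send_response(304)
            self.end_headers()
            return
        body = b"{}"
        self.send_response(200)
        self.send_header("Last-Modified", format_datetime(LAST_MODIFIED, usegmt=True))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


class IfModifiedSinceHandler(urllib.request.BaseHandler):
    """Pre-processes every outgoing HTTP request, similar in spirit to the Go RoundTripper."""

    def __init__(self, modified_since: str):
        self.modified_since = modified_since

    def http_request(self, req):
        if self.modified_since:  # add nothing when blank, as in the Go version
            req.add_header("If-Modified-Since", self.modified_since)
        return req


def fetch_status(url: str, modified_since: str = "") -> int:
    # ProxyHandler({}) ignores proxy environment variables so the local server is reached directly.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}), IfModifiedSinceHandler(modified_since))
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 304, much as go-github surfaced the 304 as an error above.
        return err.code


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), ConditionalResource)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/repos/example"
        unconditional = fetch_status(url)
        unchanged = fetch_status(url, format_datetime(LAST_MODIFIED, usegmt=True))
        older = format_datetime(LAST_MODIFIED - timedelta(days=1), usegmt=True)
        changed = fetch_status(url, older)
        return unconditional, unchanged, changed
    finally:
        server.shutdown()
        server.server_close()
```

run_demo() returns the three statuses: a plain request, a request whose date matches the resource, and one whose date is older.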

ElasticSearch.net/NEST SniffingConnectionPool switches to port 9200 when using custom port behind proxy

When using the SniffingConnectionPool, Elasticsearch.Net seems to switch to port 9200 after the initial _nodes/http,settings request. Why?
I'm setting up the ConnectionPool with an IEnumerable as follows:
var nodes = cfg.Nodes.Select(x => x.Uri);
var pool = new SniffingConnectionPool(nodes);
The URIs passed in use port 92. When debugging the requests, I can see that the first request is correctly made and we get 200 OK. However, the following HEAD request uses port 9200:
11 200 HTTP X.X.X.X:92 /_nodes/http,settings?flat_settings&timeout=500ms 5 121 application/json; charset=UTF-8
12 502 HTTP X.X.X.X:9200 / 512 no-cache, must-revalidate text/html; charset=UTF-8
Am I missing something? It is worth noting that our cluster is reverse-proxied by Nginx and uses ports 9200/9300 to communicate internally.
Edit: The http property of the _nodes/http,settings response looks like this:
"http" : {
"bound_address" : [
"[::]:9200"
],
"publish_address" : "X.X.X.X:9200",
"max_content_length_in_bytes" : 104857600
}
Maybe the SniffingConnectionPool parses that content and starts using 9200?
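The suspicion in the question can be illustrated with a simplified model of sniffing in Python: the client rebuilds its node list from each node's http.publish_address, so a seed URI on the proxy port is replaced by the node's own port. This is an assumption about the mechanism, not Elasticsearch.Net's actual code. If that is the cause, Elasticsearch's http.publish_host and http.publish_port settings, or a non-sniffing pool such as Elasticsearch.Net's StaticConnectionPool, are the usual places to look; neither has been tested against this setup.

```python
import json
from urllib.parse import urlsplit


def sniffed_node_uris(nodes_response: str, scheme: str = "http") -> list:
    """Rebuild node URIs from each node's http.publish_address (simplified model of sniffing)."""
    nodes = json.loads(nodes_response).get("nodes", {})
    return [
        f"{scheme}://{node['http']['publish_address']}/"
        for node in nodes.values()
        if "http" in node
    ]


def ports(uris: list) -> list:
    """Extract the port of each URI, to compare seed URIs with sniffed ones."""
    return [urlsplit(uri).port for uri in uris]
```

Fed a response shaped like the http property above, the sniffed URI carries port 9200 even though the seed used port 92.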

CXF JAX-RS client always sends empty PUT requests in chunking mode regardless of AllowChunking setting

We send a PUT request to a third-party service using the CXF JAX-RS client. The request body is empty.
A simple request invocation results in a server response with code 411:
Response-Code: 411
"Content-Length is missing"
The third party's REST server requires the Content-Length HTTP header to be set.
We switched chunking off following the note about chunking, but this did not solve the problem. The REST server still answers with a 411 error.
Here is our conduit configuration from the cxf.xml file:
<http-conf:conduit name="{http://myhost.com/ChangePassword}WebClient.http-conduit">
<http-conf:client AllowChunking="false"/>
</http-conf:conduit>
This log line confirms that our request is bound to our conduit configuration:
DEBUG o.a.cxf.transport.http.HTTPConduit - Conduit '{http://myhost.com/ChangePassword}WebClient.http-conduit' has been configured for plain http.
Adding the Content-Length header explicitly did not help either.
Invocation.Builder builder = ...
builder = builder.header(HttpHeaders.CONTENT_LENGTH, 0);
The CXF client's log confirms that the header is set; however, when we sniffed the packets, we were surprised to find that the CXF client had completely ignored the header setting: the Content-Length header was not sent.
Here is the log; the Content-Length header is present:
INFO o.a.c.i.LoggingOutInterceptor - Outbound Message
---------------------------
ID: 1
Address: http://myhost.com/ChangePassword?username=abc%40gmail.com&oldPassword=qwerty123&newPassword=321ytrewq
Http-Method: PUT
Content-Type: application/x-www-form-urlencoded
Headers: {Accept=[application/json], client_id=[abcdefg1234567890abcdefg12345678], Content-Length=[0], Content-Type=[application/x-www-form-urlencoded], Cache-Control=[no-cache], Connection=[Keep-Alive]}
--------------------------------------
DEBUG o.apache.cxf.transport.http.Headers - Accept: application/json
DEBUG o.apache.cxf.transport.http.Headers - client_id: abcdefg1234567890abcdefg12345678
DEBUG o.apache.cxf.transport.http.Headers - Content-Length: 0
DEBUG o.apache.cxf.transport.http.Headers - Content-Type: application/x-www-form-urlencoded
DEBUG o.apache.cxf.transport.http.Headers - Cache-Control: no-cache
DEBUG o.apache.cxf.transport.http.Headers - Connection: Keep-Alive
And here is the output of the packet sniffer; the Content-Length header is not present:
PUT http://myhost.com/ChangePassword?username=abc%40gmail.com&oldPassword=qwerty123&newPassword=321ytrewq HTTP/1.1
Content-Type: application/x-www-form-urlencoded
Accept: application/json
client_id: abcdefg1234567890abcdefg12345678
Cache-Control: no-cache
User-Agent: Apache-CXF/3.1.8
Pragma: no-cache
Host: myhost.com
Proxy-Connection: keep-alive
Does anyone know how to actually disable chunking?
Here is our code:
public static void main(String[] args)
{
    String clientId = "abcdefg1234567890abcdefg12345678";
    String uri = "http://myhost.com";
    String user = "abc#gmail.com";
    Client client = ClientBuilder.newClient();
    WebTarget target = client.target(uri);
    target = target.path("ChangePassword")
            .queryParam("username", user)
            .queryParam("oldPassword", "qwerty123")
            .queryParam("newPassword", "321ytrewq");
    Invocation.Builder builder = target.request("application/json")
            .header("client_id", clientId)
            .header(HttpHeaders.CONTENT_LENGTH, 0);
    Response response = builder.put(Entity.form(new Form()));
    String body = response.readEntity(String.class);
    System.out.println(body);
}
Versions:
OS: Windows 7 Enterprise SP1
Arch: x86_64
Java: 1.7.0_80
CXF: 3.1.8
I had a very similar issue, which, like you, I was not able to solve by turning off chunking.
What I ended up doing was setting the Content-Length to 1 and sending a single space (" ") as the body. In my case, it seemed that the proxy servers in front of the server application were rejecting the request; doing this got me past the proxy servers, and the server was able to process the request because it only operated on the URL.
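To see what a client actually puts on the wire (the check the question did with a packet sniffer), one can point it at a one-shot raw socket server and look at the request head it receives. The sketch below uses Python's http.client rather than CXF, so it demonstrates the checking technique, not CXF's behaviour: for a PUT, http.client sends Content-Length: 0 for an empty body and Content-Length: 1 for the single-space workaround. The /ChangePassword path is only illustrative.

```python
import http.client
import socket
import threading


def capture_request_head(send) -> str:
    """Accept one connection, read the request (head and body), reply 204, return the head."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    captured = {}

    def serve():
        conn, _ = listener.accept()
        with conn:
            data = b""
            while b"\r\n\r\n" not in data:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                data += chunk
            head, _, body = data.partition(b"\r\n\r\n")
            length = 0
            for line in head.split(b"\r\n")[1:]:
                name, _, value = line.partition(b":")
                if name.strip().lower() == b"content-length":
                    length = int(value.strip())
            # Drain the declared body so closing the socket does not reset the connection.
            while len(body) < length:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                body += chunk
            captured["head"] = head.decode("latin-1")
            conn.sendall(b"HTTP/1.1 204 No Content\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    try:
        send(port)
    finally:
        thread.join(timeout=5)
        listener.close()
    return captured.get("head", "")


def put(port: int, body) -> None:
    """Send a PUT with the given body (None, b"" or b" ") using http.client."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("PUT", "/ChangePassword", body=body)
        conn.getresponse().read()
    finally:
        conn.close()
```

Printing capture_request_head(lambda port: put(port, None)) shows the request line and every header exactly as sent.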

Varnish4 - change PURGE response headers

I'm trying to change PURGE response headers in Varnish4
HTTP/1.1 200 Purged
Content-Type: text/html; charset=utf-8
Date: Fri, 02 Sep 2016 19:57:56 GMT
Retry-After: 5
Server: Varnish
X-Varnish: 163921
Content-Length: 241
Connection: keep-alive
I have modified "Server: Varnish" in vcl_recv and vcl_deliver, which seems to work for every request except PURGE.
I need to change the Server header, or at least add a custom response header.
I can't find any documentation about it, so I was wondering whether anyone has done this before or whether it is hardcoded.
You need to override the built-in synthetic response generated by Varnish when purging objects. This can be trivially implemented using some extra VCL:
...
sub vcl_purge {
    return (synth(700, "Purged"));
}

sub vcl_synth {
    if (resp.status == 700) {
        set resp.status = 200;
        set resp.http.Server = "ACME";
    }
}
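Once the VCL above is loaded, the result can be checked by sending a PURGE and inspecting the status line and the Server header. The Python sketch below uses http.client, which accepts arbitrary methods such as PURGE; FakeVarnish is a hypothetical local stand-in that answers the way the VCL is intended to, so the example runs without a Varnish instance. Against a real server, call purge_headers with its host and port instead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def purge_headers(host: str, port: int, path: str = "/"):
    """Send a PURGE request and return (status, reason, headers)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("PURGE", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.reason, dict(resp.getheaders())
    finally:
        conn.close()


class FakeVarnish(BaseHTTPRequestHandler):
    """Hypothetical stand-in that answers PURGE the way the VCL above is meant to."""

    def do_PURGE(self):
        self.send_response(200, "Purged")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def version_string(self):
        return "ACME"  # value sent in the Server header by send_response()

    def log_message(self, *args):
        pass


def demo():
    server = HTTPServer(("127.0.0.1", 0), FakeVarnish)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return purge_headers("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()
```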
