HTTP compression with Spring Boot and Nginx

I have a setup with an Nginx reverse proxy, a Spring Boot application, and a (Redis) cache, and would like to ask (1) how to configure Nginx to compress the data only if it is not already compressed, and (2) how to correctly send compressed and cached data from Spring Boot.
Current setup
Nginx acts as a reverse proxy and compresses responses with the specified content types:
gzip on;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
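A note relevant to question 1 below: as far as I know, nginx's gzip module already skips responses that arrive from the upstream with a Content-Encoding header set, so pre-compressed responses would pass through untouched. For proxied responses, eligibility is additionally controlled by gzip_proxied; a sketch:

gzip on;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
# Allow compression of proxied responses (nginx detects these via the Via request header).
# Responses that already carry Content-Encoding are not compressed again either way.
gzip_proxied any;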
The Spring Boot API application has a few compute-intensive endpoints that manage their own cache and many lightweight endpoints that don't need caching. The response sizes are often quite large (up to several hundred MB), so the data is compressed before caching. In pseudo-code:
#RequestMapping("/uncached1")
public MyUncached1Response getUncachedData1(String query) {
return dataservice.getResults1(query);
}
#RequestMapping("/cached1")
public String getCachedData1(String query) {
if (cache.has(query)) {
return uncompress(cache.get(query));
} else {
String results = dataservice.getResults2(query);
cache.set(query, compress(results));
return results;
}
}
As you can see, the setup is compressing and uncompressing a lot. If the value has not been cached, the application compresses the results for the cache. Then, the application returns the uncompressed data and Nginx compresses it again. If the value is already in the cache, the application first uncompresses it and gives Nginx the uncompressed data for compression.
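For context, the compress/uncompress helpers above could be made gzip-compatible (see question 3 below about Deflater vs. gzip) with java.util.zip's GZIP streams; a minimal sketch (Java 9+ for readAllBytes):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class Gzip {

    // Compresses a string into gzip format (RFC 1952), which browsers understand
    // as Content-Encoding: gzip; plain Deflater output (zlib/raw deflate) lacks
    // the gzip header and trailer.
    public static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Inverse operation: reads gzip bytes back into a string.
    public static String uncompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}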
Envisioned setup
I am wondering if the following setup would be possible:
Nginx only compresses the data if it has not been compressed yet
Some endpoints of the Spring Boot application return compressed data if the client accepts it:
#RequestMapping("/uncached1")
public MyUncached1Response getUncachedData1(String query) {
return dataservice.getResults1(query);
}
#RequestMapping("/cached1")
public byte[] getCachedData1(String query) {
if (cache.has(query)) {
byte[] compressed = cache.get(query);
if (client.acceptsGzip()) {
return compressed;
} else {
return uncompressed(compressed).toByteArray();
}
} else {
String results = dataservice.getResults2(query);
byte[] compressed = compress(results);
cache.set(query, compressed);
if (client.acceptsGzip()) {
return compressed;
} else {
return results.toByteArray();
}
}
}
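To make the envisioned endpoint concrete, here is a sketch (not a definitive implementation; Cache and DataService are hypothetical interfaces standing in for the pseudo-code's cache and dataservice, and Gzip is the helper sketched above). Whether the client accepts gzip can be read from the Accept-Encoding request header, and the compressed branch must set Content-Encoding: gzip so that Nginx and the browser know the body is already compressed:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical collaborators matching the pseudo-code above.
interface Cache {
    boolean has(String key);
    byte[] get(String key);
    void set(String key, byte[] value);
}

interface DataService {
    String getResults2(String query);
}

@RestController
public class CachedController {

    private final Cache cache;
    private final DataService dataservice;

    public CachedController(Cache cache, DataService dataservice) {
        this.cache = cache;
        this.dataservice = dataservice;
    }

    @RequestMapping("/cached1")
    public ResponseEntity<byte[]> getCachedData1(
            @RequestParam String query,
            @RequestHeader(value = HttpHeaders.ACCEPT_ENCODING, defaultValue = "") String acceptEncoding)
            throws IOException {
        byte[] compressed;
        if (cache.has(query)) {
            compressed = cache.get(query);
        } else {
            compressed = Gzip.compress(dataservice.getResults2(query));
            cache.set(query, compressed);
        }
        if (acceptEncoding.contains("gzip")) {
            // Content-Encoding marks the body as already gzipped, so Nginx
            // passes it through instead of compressing it a second time.
            return ResponseEntity.ok()
                    .header(HttpHeaders.CONTENT_ENCODING, "gzip")
                    .body(compressed);
        }
        return ResponseEntity.ok()
                .body(Gzip.uncompress(compressed).getBytes(StandardCharsets.UTF_8));
    }
}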
Questions:
Does the envisioned setup make sense, and is it possible? If yes, could you please provide some hints for the implementation?
If it doesn't work this way, what would be a better architecture?
Currently, I use Java's Deflater for compression, but that's not exactly the same as gzip, right? How can I compress the data in a gzip-compatible way for HTTP?
How can I see whether the client accepts gzip?
Thanks so much!

Related

Spring Cloud Gateway not returning correct Response code given by Downstream service (for file upload)

I have a simple downstream service for file upload. Sample code:
@RestController
@RequestMapping("/file")
public class FileController {

    @PostMapping("/upload")
    public ResponseEntity<?> uploadFile(@RequestParam("file") MultipartFile file,
            @RequestParam(value = "delay", required = false, defaultValue = "0") int delay) throws Exception {
        System.out.println(String.join(System.getProperty("line.separator"),
                "File Name => " + file.getOriginalFilename(),
                "File Size => " + file.getSize() + " bytes",
                "File Content Type => " + file.getContentType()));
        TimeUnit.MILLISECONDS.sleep(delay);
        return ResponseEntity.ok(file.getName() + " uploaded");
    }
}
and a CustomExceptionHandler that returns BAD_REQUEST if there is a MultipartException:
@Configuration
@ControllerAdvice
public class CustomExceptionHandler {

    @ExceptionHandler(MultipartException.class)
    public ResponseEntity<String> handleMultipartException(MultipartException ex) {
        return ResponseEntity.status(HttpStatus.BAD_REQUEST).body(ex.getMessage());
    }
}
The size limit is 10MB in application.yml:
spring:
  servlet:
    multipart:
      max-file-size: 10MB
      max-request-size: 10MB
If I upload a large file directly to the service, it gives me a 400 status as expected.
When I try to hit the same endpoint via Spring Cloud Gateway, I get a 500 Internal Server Error instead, and the logs show the following:
2019-11-08 00:36:10.797 ERROR 21904 --- [ctor-http-nio-2] a.w.r.e.AbstractErrorWebExceptionHandler : [86e57f7e] 500 Server Error for HTTP POST "/product-service/file/upload"
reactor.netty.http.client.PrematureCloseException: Connection has been closed BEFORE response, while sending request body
Note that the gateway is configured to accept large files, with the RequestSize filter set globally to allow far more than 10MB.
How can I get the same response code as given by the downstream service?
Also, I checked with traditional Zuul, and I get a 500 error too.
For the gateway, I know that for this particular case we can use the RequestSize filter so that the gateway itself returns the error code, but then we have to identify beforehand all the routes that need this (see the sketch below).
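For reference, setting the RequestSize filter globally might look like this (a sketch; the size value is illustrative):

spring:
  cloud:
    gateway:
      default-filters:
        - name: RequestSize
          args:
            # Maximum allowed request size in bytes; newer versions also accept DataSize strings like 50MB.
            maxSize: 50000000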
Also, other validations in the API, like authorization etc., will have the same issue: the response codes produced by those validations will not propagate up.
Sample code spring-cloud-gateway/product-service/eureka - https://github.com/dhananjay12/spring-cloud/tree/master/spring-routing
Can you try uploading the file directly, with no limit on its size, without going through the gateway? Try the value -1 for the properties:
Properties file of the microservice where you want to upload the file:
spring.servlet.multipart.max-file-size=-1
spring.servlet.multipart.max-request-size=-1
If that works, the problem may be with the Zuul proxy's Ribbon socket size; there are properties for this type of situation, as follows:
Properties file of the gateway:
ribbon.eager-load.enabled=true
hystrix.command.default.execution.timeout.enabled=false
hystrix.command.default.execution.isolation.strategy=THREAD
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=3999996
ribbon.ConnectTimeout=999999
ribbon.ReadTimeout=999999
ribbon.SocketTimeout=999999
zuul.host.socket-timeout-millis=999999
zuul.host.connect-timeout-millis=999999
zuul.sensitiveHeaders=Cookie,Set-Cookie

Geocoding requests to HERE API randomly fails

I am trying to geocode addresses with the HERE API. I am not on a free plan. I tried the following code (Spring Boot, in Kotlin):
override fun geocode(address: Address): Coordinate? {
    val uriString = UriComponentsBuilder
        .fromHttpUrl(endpoint)
        .queryParam("app_id", appId)
        .queryParam("app_code", appCode)
        .queryParam("searchtext", addressToSearchText(address))
        .toUriString()
    logger.info("Geocode requested with url {}", uriString)
    val response = restTemplate.getForEntity(uriString, String::class.java)
    return response.body?.let {
        Klaxon().parse<GeocodeResponse>(it)
    }?.let {
        it.Response.View.firstOrNull()?.Result?.firstOrNull()
    }?.let {
        Coordinate(
            latitude = it.Location.DisplayPosition.Latitude,
            longitude = it.Location.DisplayPosition.Longitude
        )
    }.also {
        if (it == null) {
            logger.warn("Geocode failed: {}", response.body)
        }
    }
}
It turned out that when I call this method many times in a row, some requests return empty responses, like this:
{
  "Response": {
    "MetaInfo": {
      "Timestamp": "2019-04-18T11:33:17.756+0000"
    },
    "View": []
  }
}
I could not figure out any rule for why some requests fail; it seems to be just random.
However, when I try to call the same URLs with curl or in my browser, everything works just fine.
I guess there is some limit on the number of requests per second, but I could not find anything in the HERE documentation.
Does anyone have an idea about the limit? Or might it be something else?
Actually, there was a problem with my code. Requests were failing for addresses containing "special" characters like ü and ö. The problem was with building the request URL:
val uriString = UriComponentsBuilder
    .fromHttpUrl(endpoint)
    .queryParam("app_id", appId)
    .queryParam("app_code", appCode)
    .queryParam("searchtext", addressQueryParam(address))
    .build(false) // <= this was missing
    .toUriString()
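My understanding of why this matters (an assumption on my part, not stated in the original answer): without build(false), toUriString() returns an already percent-encoded URL, and RestTemplate, when given a String URL, encodes it a second time, so non-ASCII characters get mangled:

ü      -> %C3%BC        (encoded once: what the API expects)
%C3%BC -> %25C3%25BC    (encoded twice: what the API actually received)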

Sending .gz file via CURL to RESTful put creating ZipException in GZIPInputStream

The application I am creating takes a gzipped file sent to a RESTful PUT, unzips the file and then does further processing like so:
public class Service {

    @PUT
    @Path("/{filename}")
    public Response doPut(@Context HttpServletRequest request,
            @PathParam("filename") String filename,
            InputStream inputStream) {
        try {
            GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
            // Do Stuff with GZIPInputStream
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
}
I am able to successfully send a gzipped file in a unit test like so:
InputStream inputStream = new FileInputStream("src/main/resources/testFile.gz");
Service service = new Service();
service.doPut(mockHttpServletRequest, "testFile.gz", inputStream);
// Verify processing stuff happens
But when I build the application and attempt to CURL the same file from the src/main/resources dir with the following I get a ZipException:
curl -v -k -X PUT --user USER:Password -H "Content-Type: application/gzip" --data-binary @testFile.gz https://myapp.dev.com/testFile.gz
The exception is:
java.util.zip.ZipException: Not in GZIP format
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
at Service.doPut(Service.java:23)
// etc.
So does anyone have any idea why sending the file via CURL causes the ZipException?
Update:
I ended up taking a look at the actual bytes being sent via the InputStream and figured out where the ZipException: Not in GZIP format error was coming from. The first two bytes of a GZIP file are required to be 1F and 8B respectively in order for GZIPInputStream to recognize the data as being in GZIP format. Instead, the 8B byte, along with every other byte in the stream that doesn't correspond to a valid UTF-8 character, was transformed into the bytes EF, BF, BD, which are the UTF-8 unknown-character replacement bytes. Thus the server is reading the GZIP data as UTF-8 characters rather than as binary and is corrupting the data.
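To illustrate the corruption pattern described above (the magic bytes 1F 8B are from the gzip spec; the rest is illustrative):

1f 8b 08 ...           (start of a gzip file: 1f is valid ASCII/UTF-8, 8b is not)
1f ef bf bd 08 ...     (what the server sees: 8b replaced by the UTF-8 replacement bytes ef bf bd)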
The issue I am having now is that I can't figure out where I need to change the configuration in order to get the server to treat the compressed data as binary rather than UTF-8. The application uses JAX-RS on a Jersey server with Spring Boot, deployed in a Kubernetes pod and run as a service, so something in the setup of one of those technologies needs to be tweaked to prevent the improper encoding from being applied to the data.
I have tried adding -H "Content-Encoding: gzip" to the curl command, registering the EncodingFilter.class and GZipEncoder.class in the Jersey ResourceConfig class, adding application/gzip to server.compression.mime-types in application.properties, adding the @Consumes("application/gzip") annotation to the doPut method above, and several other things I can't remember off the top of my head, but nothing seems to have any effect.
I am seeing the following in the verbose CURL logs:
> PUT /src/main/resources/testFile.gz
> HOST: my.host.com
> Authorization: Basic <authorization stuff>
> User-Agent: curl/7.54.1
> Accept: */*
> Content-Encoding: gzip
> Content-Type: application/gzip
> Content-Length: 31
>
} [31 bytes data]
* upload completely sent off: 31 out of 31 bytes
< HTTP/1.1 500
< X-Application-Context: application
< Content-Type: application/json;charset=UTF-8
< Transfer-Encoding: chunked
< Date: <date stuff>
...etc
Nothing I have done has affected the receiving side's Content-Type: application/json;charset=UTF-8 portion, which I suspect is the issue.
I met the same problem and finally solved it by using -H 'Content-Type: application/json;charset=UTF-8'.
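Applied to the original command, that would look like this (a sketch; user, host, and file are as in the question):

curl -v -k -X PUT --user USER:Password -H 'Content-Type: application/json;charset=UTF-8' --data-binary @testFile.gz https://myapp.dev.com/testFile.gz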
Use Charles to find the difference
I can successfully send the gzipped file using Postman. So I used Charles to capture the two requests sent by curl and Postman respectively. After comparing them, I found that Postman used application/json as the Content-Type while curl used text/plain.
Spring docs: Content Type and Transformation
According to the Spring docs, if the content type is text/plain and the source payload is a byte[], Spring will convert the payload to a String using the charset specified in the content-type header. That's why the ZipException occurred: the original byte data had already been decoded and was no longer in gzip format.
Spring source code
@Override
protected Object convertFromInternal(Message<?> message, Class<?> targetClass, @Nullable Object conversionHint) {
    Charset charset = getContentTypeCharset(getMimeType(message.getHeaders()));
    Object payload = message.getPayload();
    return (payload instanceof String ? payload : new String((byte[]) payload, charset));
}

Spring Cloud Stream w/Kafka + Confluent Schema Registry Client broken?

Curious if anyone has got this working as I'm currently struggling.
I have created simple Source and Sink applications to send and receive an Avro schema based message. The schema for the message is held in a Confluent Schema Registry. Both apps are configured to use the ConfluentSchemaRegistryClient class but I think there might be a bug in here somewhere. Here's what I see that makes me wonder.
If I interact with the Confluent registry's REST API I can see that there is only one version of the schema in question (lightly edited to obscure what I'm working on):
$ curl -i "http://schemaregistry:8081/subjects/somesubject/versions"
HTTP/1.1 200 OK
Date: Fri, 05 May 2017 16:13:37 GMT
Content-Type: application/vnd.schemaregistry.v1+json
Content-Length: 3
Server: Jetty(9.2.12.v20150709)
[1]
When the Source app sends off its message over Kafka I noticed that the version in the header looked a bit funky:
contentType"application/octet-stream"originalContentType/"application/vnd.somesubject.v845+avro"
I'm not 100% clear about why the application/vnd.somesubject.v845+avro content type is wrapped up in application/octet-stream, but ignoring that, note that it says version 845, not version 1.
Looking at the ConfluentSchemaRegistryClient implementation I see that it POSTs to /subjects/(string: subject)/versions and returns the id of the schema not the version. This then gets put into SchemaReference's version field: https://github.com/spring-cloud/spring-cloud-stream/blob/master/spring-cloud-stream-schema/src/main/java/org/springframework/cloud/stream/schema/client/ConfluentSchemaRegistryClient.java#L81
When the Sink app tries to fetch the schema for the message based upon the header it fails because it tries to fetch version 845 that its plucked out of the header: https://github.com/spring-cloud/spring-cloud-stream/blob/master/spring-cloud-stream-schema/src/main/java/org/springframework/cloud/stream/schema/client/ConfluentSchemaRegistryClient.java#L87
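For illustration, the two registry endpoints return different payloads (per the Confluent REST API docs referenced below; the values are illustrative and the schema body is elided):

$ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "..."}' http://schemaregistry:8081/subjects/somesubject/versions
{"id":845}

$ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "..."}' http://schemaregistry:8081/subjects/somesubject
{"subject":"somesubject","version":1,"id":845,"schema":"..."}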
Anyone have thoughts on this? Thanks in advance.
** UPDATE **
OK, I'm pretty convinced this is a bug. I took the ConfluentSchemaRegistryClient and modified the register method slightly to POST to /subjects/(string: subject) (i.e., dropped the trailing /versions), which per the Confluent REST API docs returns a payload with the version in it. Works like a charm:
public SchemaRegistrationResponse register(String subject, String format, String schema) {
    Assert.isTrue("avro".equals(format), "Only Avro is supported");
    String path = String.format("/subjects/%s", subject);
    HttpHeaders headers = new HttpHeaders();
    headers.put("Accept",
            Arrays.asList("application/vnd.schemaregistry.v1+json", "application/vnd.schemaregistry+json",
                    "application/json"));
    headers.add("Content-Type", "application/json");
    Integer version = null;
    try {
        String payload = this.mapper.writeValueAsString(Collections.singletonMap("schema", schema));
        HttpEntity<String> request = new HttpEntity<>(payload, headers);
        ResponseEntity<Map> response = this.template.exchange(this.endpoint + path, HttpMethod.POST, request,
                Map.class);
        version = (Integer) response.getBody().get("version");
    }
    catch (JsonProcessingException e) {
        e.printStackTrace();
    }
    SchemaRegistrationResponse schemaRegistrationResponse = new SchemaRegistrationResponse();
    schemaRegistrationResponse.setId(version);
    schemaRegistrationResponse.setSchemaReference(new SchemaReference(subject, version, "avro"));
    return schemaRegistrationResponse;
}

If Chrome, use WebP

Because currently only Chrome and Opera support WebP, I was wondering if I could target those two particular browsers and redirect them to fetch another version of my website, so I can make my site download faster?
Thanks.
I solved this problem like this:

1. Check if the client advertises "image/webp" in the Accept header
2. If WebP is supported, check if the local WebP file is on disk, and serve it
3. If the server is configured as a proxy, append a "WebP: true" header and forward to the backend
4. Append "Vary: Accept" if a WebP asset is served
in Nginx:
location / {
    if ($http_accept ~* "webp") { set $webp "true"; }
    # Use $webp variable to add correct image.
}
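The "Vary: Accept" step from the list above isn't shown in this snippet; a sketch of how it could be added:

location / {
    if ($http_accept ~* "webp") { set $webp "true"; }
    # Tell caches that the response varies with the Accept request header.
    add_header Vary Accept;
    # Use $webp variable to add correct image.
}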
In my case, I use thumbor software to convert images.
https://github.com/globocom/thumbor
pip install thumbor
My conf:
upstream thumbor {
    server 127.0.0.1:9990;
    server 127.0.0.1:9991;
    server 127.0.0.1:9992;
    server 127.0.0.1:9993;
    server 127.0.0.1:9994;
}

location / {
    if ($http_accept ~* "webp") {
        set $webp "T";
    }
    if ($uri ~* "(jpg|jpeg)$") {
        set $webp "${webp}T";
    }
    proxy_cache_key $host$request_uri$webp;
    if ($webp = "TT") {
        rewrite ^(.*)$ "/unsafe/smart/filters:format(webp)/exemple.com$uri" break;
        proxy_pass http://thumbor;
        add_header Content-Disposition "inline; filename=image.webp";
    }
    if ($webp != "TT") {
        proxy_pass http://exemple.com;
    }
}
For a while now, thumbor has supported automatic WebP conversion:
https://github.com/thumbor/thumbor/wiki/Configuration#auto_webp
You'll still have to configure the load balancer to pass the WebP Accept header, but other than that, thumbor will take care of everything for you.
Hope that helps!
