Spring Cloud Stream w/Kafka + Confluent Schema Registry Client broken? - spring

Curious if anyone has got this working as I'm currently struggling.
I have created simple Source and Sink applications to send and receive an Avro schema-based message. The schema for the message is held in a Confluent Schema Registry. Both apps are configured to use the ConfluentSchemaRegistryClient class, but I think there might be a bug in here somewhere. Here's what I see that makes me wonder.
If I interact with the Confluent registry's REST API I can see that there is only one version of the schema in question (lightly edited to obscure what I'm working on):
$ curl -i "http://schemaregistry:8081/subjects/somesubject/versions"
HTTP/1.1 200 OK
Date: Fri, 05 May 2017 16:13:37 GMT
Content-Type: application/vnd.schemaregistry.v1+json
Content-Length: 3
Server: Jetty(9.2.12.v20150709)
[1]
When the Source app sends off its message over Kafka I noticed that the version in the header looked a bit funky:
contentType"application/octet-stream"originalContentType/"application/vnd.somesubject.v845+avro"
I'm not 100% clear why the application/vnd.somesubject.v845+avro content type is wrapped up in application/octet-stream, but ignoring that, note that it says version 845, not version 1.
Looking at the ConfluentSchemaRegistryClient implementation, I see that it POSTs to /subjects/(string: subject)/versions, which returns the id of the schema, not the version. That id then gets put into SchemaReference's version field: https://github.com/spring-cloud/spring-cloud-stream/blob/master/spring-cloud-stream-schema/src/main/java/org/springframework/cloud/stream/schema/client/ConfluentSchemaRegistryClient.java#L81
When the Sink app tries to fetch the schema for the message based upon the header, it fails because it tries to fetch version 845, which it has plucked out of the header: https://github.com/spring-cloud/spring-cloud-stream/blob/master/spring-cloud-stream-schema/src/main/java/org/springframework/cloud/stream/schema/client/ConfluentSchemaRegistryClient.java#L87
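For what it's worth, the distinction is easy to see against the registry itself: 845 resolves as a global schema id but not as a version of the subject. A minimal sketch (assuming the same registry host and subject placeholder as above, and Spring's RestTemplate), just to illustrate the difference described in the Confluent REST API docs:

import org.springframework.web.client.RestTemplate;

RestTemplate rest = new RestTemplate();

// 845 resolves as a global schema id, so this returns {"schema": "..."}
String byId = rest.getForObject(
        "http://schemaregistry:8081/schemas/ids/845", String.class);

// ...and the subject's only registered version is 1
String v1 = rest.getForObject(
        "http://schemaregistry:8081/subjects/somesubject/versions/1", String.class);

// ...so asking for "version 845" of the subject throws HttpClientErrorException (404)
rest.getForObject(
        "http://schemaregistry:8081/subjects/somesubject/versions/845", String.class);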
Anyone have thoughts on this? Thanks in advance.
** UPDATE **
OK, I'm pretty convinced this is a bug. I took the ConfluentSchemaRegistryClient and modified the register method slightly to POST to /subjects/(string: subject) (i.e. dropped the trailing /versions), which per the Confluent REST API docs returns a payload with the version in it. Works like a charm:
public SchemaRegistrationResponse register(String subject, String format, String schema) {
    Assert.isTrue("avro".equals(format), "Only Avro is supported");
    // POST to /subjects/{subject} (no trailing /versions) so the response includes the version
    String path = String.format("/subjects/%s", subject);
    HttpHeaders headers = new HttpHeaders();
    headers.put("Accept",
            Arrays.asList("application/vnd.schemaregistry.v1+json", "application/vnd.schemaregistry+json",
                    "application/json"));
    headers.add("Content-Type", "application/json");
    Integer version = null;
    try {
        String payload = this.mapper.writeValueAsString(Collections.singletonMap("schema", schema));
        HttpEntity<String> request = new HttpEntity<>(payload, headers);
        ResponseEntity<Map> response = this.template.exchange(this.endpoint + path, HttpMethod.POST, request,
                Map.class);
        // The response body contains subject, id, version and schema; we only need the version
        version = (Integer) response.getBody().get("version");
    }
    catch (JsonProcessingException e) {
        e.printStackTrace();
    }
    SchemaRegistrationResponse schemaRegistrationResponse = new SchemaRegistrationResponse();
    schemaRegistrationResponse.setId(version);
    schemaRegistrationResponse.setSchemaReference(new SchemaReference(subject, version, "avro"));
    return schemaRegistrationResponse;
}

Related

@RequestHeader behaviour not as expected; working without required User-Agent

I have a project set up using Spring Boot with Kotlin to make REST APIs.
I'm trying to use @RequestHeader to recognize the User-Agent. The said header is marked required = true:
@PostMapping("details", produces = ["application/json"])
fun addInfo(@RequestHeader(name = "User-Agent", required = true) userAgent: String,
            @Valid @RequestBody podEntity: PodEntity): ResponseEntity<String> {
    pod.addPod(podEntity)
    return ResponseEntity<String>("{ \"status\":\"Added\" }", HttpStatus.CREATED)
}
Problems -
However, even if I'm not sending the User-Agent header, the API works and adds data. But if I change the header to any other non-default name, like user or request-source, the required = true requirement is enforced and the API does not work. Does this mean that default headers cannot be made mandatory using the required attribute?
The other problem is that, in the case of a custom header, when the required header is missing from the request, the API fails with a 400 error code but does not throw any exception. I was expecting an HttpClientErrorException for my JUnit test case, but on checking the console I see no exception. Adding @Throws is also not helping. How do I enable my function to throw an exception when the required header is missing?
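For the custom-header case, the 400 is produced by Spring MVC itself: assuming Spring 5.1 or later, it raises MissingRequestHeaderException before the controller function is ever invoked, so there is nothing for the function itself to throw. A minimal sketch in Java (the advice class name and message text are made up for illustration) of how that failure could be made visible or customized on the server side:

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.MissingRequestHeaderException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

// Hypothetical advice class, only to make the missing-header failure explicit.
@RestControllerAdvice
public class HeaderErrorAdvice {

    @ExceptionHandler(MissingRequestHeaderException.class)
    public ResponseEntity<String> handleMissingHeader(MissingRequestHeaderException ex) {
        // ex.getHeaderName() is the header declared in @RequestHeader
        return ResponseEntity.status(HttpStatus.BAD_REQUEST)
                .body("{ \"status\":\"Missing request header: " + ex.getHeaderName() + "\" }");
    }
}

On the client side, if restTemplate in the test below is a TestRestTemplate rather than a plain RestTemplate, note that it deliberately does not throw on 4xx responses; it returns the ResponseEntity so the test can assert on the status code instead.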
Unit test -
@Test
fun test_getAll_fail_missingHeaders() {
    val url = getRootUrl() + "/details/all"
    val headers = HttpHeaders()
    val request = HttpEntity(pod, headers)
    try {
        restTemplate!!.postForEntity(url, request, String::class.java)
        fail()
    } catch (ex: HttpClientErrorException) {
        assertEquals(400, ex.rawStatusCode)
        assertEquals(true, ex.responseBodyAsString.contains("Missing request header"))
    }
}

Azure Form Recognizer training not finding data

I'm trying to train a Form Recognizer using the browser API console (https://eastus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api/operations/TrainCustomModel/console). I've uploaded training images to a container and created an SAS. The browser API console generates the following HTTP request:
POST https://eastus.api.cognitive.microsoft.com/formrecognizer/v1.0-preview/custom/train?source=https://pythonimages.blob.core.windows.net/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2020-01-22T00:23:33Z&st=2020-01-21T16:23:33Z&spr=https&sig=••••••••••••••••••••••••••••••••&prefix=images HTTP/1.1
Host: eastus.api.cognitive.microsoft.com
Content-Type: application/json
Ocp-Apim-Subscription-Key: ••••••••••••••••••••••••••••••••
{
  "source": "string",
  "sourceFilter": {
    "prefix": "string",
    "includeSubFolders": true
  }
}
However, the answer I get back is
Transfer-Encoding: chunked
x-envoy-upstream-service-time: 4
apim-request-id: 5ad37aa2-e251-4b61-98ae-023930b47d27
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
Date: Tue, 21 Jan 2020 16:25:03 GMT
Content-Type: application/json; charset=utf-8
{
  "error": {
    "code": "1004",
    "message": "Dataset path must be relative to local input mount path '/input' if local data is referenced."
  }
}
I don't understand why it seems to be looking for data locally. I've experimented with the SAS, e.g. including the container name (images) in the blob http address rather than as a query parameter, but no success so far.
I've also tried the Python/REST path (described here: https://learn.microsoft.com/en-gb/azure/cognitive-services/form-recognizer/quickstarts/python-train-extract-v1), which results in a different error:
Response status code: 408
Response body: {'error': {'code': '1011', 'innerError': {'requestId': 'e7f9ef9f-97bc-4b6a-86f3-0b29c9591c87'}, 'message': 'The operation exceeded allowed time limit and was canceled. The common reasons are that the data source is too large or contains unsupported content. Please check that your request conforms to service limits and retry with redacted data source.'}}
For completeness, the code I use is as follows (key/signature starred out):
########### Python Form Recognizer Train #############
from requests import post as http_post

# Endpoint URL
base_url = r"https://markusformsrecognizer.cognitiveservices.azure.com/" + "/formrecognizer/v1.0-preview/custom"
source = r"https://pythonimages.blob.core.windows.net/images?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2020-01-22T15:37:26Z&st=2020-01-22T07:37:26Z&spr=https&sig=*********************************"
headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '*********************************'
}
url = base_url + "/train"
body = {"source": source}
try:
    resp = http_post(url=url, json=body, headers=headers)
    print("Response status code: %d" % resp.status_code)
    print("Response body: %s" % resp.json())
except Exception as e:
    print(str(e))
For error code 1004: please follow the steps below to get the source path containing the training documents and pass it as the value of the source key.
{
  "source": "string",
  "sourceFilter": {
    "prefix": "string",
    "includeSubFolders": true
  }
}
Replace the source value with the Azure Blob storage container's shared access signature (SAS) URL. To retrieve the SAS URL, open the Microsoft Azure Storage Explorer, right-click your container, and select Get shared access signature.
Make sure the Read and List permissions are checked, and click Create.
Then copy the value in the URL section. It should have the form:
https://<storage account>.blob.core.windows.net/<container name>?<SAS value>
Please use the new Form Recognizer v2.0 release: it is an async API and enables training on large data sets and analyzing large documents. https://aka.ms/form-recognizer/api
Quick start: https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/python-train-extract
To get started with Form Recognizer, please log in to the Azure Portal using this link to create a Form Recognizer resource (for v2.0 (preview), please use the West US 2 or West Europe regions).
Try removing the string value from the prefix property:
{
  "source": "string",
  "sourceFilter": {
    "prefix": "",
    "includeSubFolders": true
  }
}
The Python quick start code for version 2.0 seems to be working; at least I don't get any errors anymore. I'm now feeling slightly silly that I didn't try this earlier. The API (web-browser) console, linked from the quick start page of the Form Recognizer, seems to automatically assume I want to use version 1.0, and there's no way to change that (or perhaps I've just overlooked something). Hence I assumed I'd been allocated a v1.0 trial, and therefore that's what I used when I tried the Python quick start the first time around.
Instead of using just the SAS URI as the "source" request parameter in the API POST call, use the complete container URL followed by the SAS token.
For example:
https://<storage account>.blob.core.windows.net/<container name>?<SAS token>
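Putting the two answers together: the full container URL with the SAS token goes in the JSON body as the value of source, not in the query string. A rough sketch in Java (java.net.http, JDK 11+), using the v1.0-preview path from the question; the endpoint, container URL and key are placeholders to substitute:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TrainFormRecognizer {
    public static void main(String[] args) throws Exception {
        // Placeholders: substitute your own resource endpoint, container SAS URL and key.
        String endpoint = "https://<resource name>.cognitiveservices.azure.com/formrecognizer/v1.0-preview/custom/train";
        String sourceSasUrl = "https://<storage account>.blob.core.windows.net/<container name>?<SAS token>";
        String key = "<subscription key>";

        // The SAS URL is sent in the body as "source", not appended to the query string.
        String body = "{\"source\": \"" + sourceSasUrl + "\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .header("Ocp-Apim-Subscription-Key", key)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}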

Sending a .gz file via curl to a RESTful PUT creates a ZipException in GZIPInputStream

The application I am creating takes a gzipped file sent to a RESTful PUT, unzips the file and then does further processing like so:
public class Service {
    @PUT
    @Path("/{filename}")
    public Response doPut(@Context HttpServletRequest request,
                          @PathParam("filename") String filename,
                          InputStream inputStream) {
        try {
            GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
            // Do Stuff with GZIPInputStream
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
}
I am able to successfully send a gzipped file in a unit test like so:
InputStream inputStream = new FileInputStream("src/main/resources/testFile.gz");
Service service = new Service();
service.doPut(mockHttpServletRequest, "testFile.gz", inputStream);
// Verify processing stuff happens
But when I build the application and attempt to curl the same file from the src/main/resources dir with the following, I get a ZipException:
curl -v -k -X PUT --user USER:Password -H "Content-Type: application/gzip" --data-binary #testFile.gz https://myapp.dev.com/testFile.gz
The exception is:
java.util.zip.ZipException: Not in GZIP format
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
at Service.doPut(Service.java:23)
// etc.
So does anyone have any idea why sending the file via CURL causes the ZipException?
Update:
I ended up taking a look at the actual bytes being sent via the InputStream and figured out where the ZipException: Not in GZIP format error was coming from. The first two bytes of a GZIP file are required to be 1F and 8B respectively in order for GZIPInputStream to recognize the data as being in GZIP format. Instead, the 8B byte, along with every other byte in the stream that doesn't correspond to a valid UTF-8 character, was transformed into the bytes EF, BF, BD, which are the UTF-8 unknown-character replacement bytes. Thus the server is reading the GZIP data as UTF-8 characters rather than as binary and is corrupting the data.
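As a quick way to confirm that diagnosis on the server side, one could peek at the first two bytes of the incoming stream before handing it to GZIPInputStream. A small sketch (a hypothetical helper, not part of the original service):

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

final class GzipCheck {
    // Returns a stream whose first two bytes have been verified against the GZIP
    // magic number (0x1F, 0x8B) without consuming them.
    static InputStream requireGzip(InputStream in) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(in);
        buffered.mark(2);
        int b1 = buffered.read();
        int b2 = buffered.read();
        buffered.reset();
        if (b1 != 0x1F || b2 != 0x8B) {
            // EF BF BD here would be the UTF-8 replacement bytes described above.
            throw new IOException(String.format("Not GZIP: first bytes are %02X %02X", b1, b2));
        }
        return buffered;
    }
}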
The issue I am having now is that I can't figure out where I need to change the configuration to get the server to treat the compressed data as binary rather than UTF-8. The application uses JAX-RS on a Jersey server with Spring Boot, deployed in a Kubernetes pod and run as a service, so something in the setup of one of those technologies needs to be tweaked to prevent the improper encoding from being applied to the data.
I have tried adding -H "Content-Encoding: gzip" to the curl command, registering the EncodingFilter.class and GZipEncoder.class in the Jersey ResourceConfig class, adding application/gzip to server.compression.mime-types in application.properties, adding the @Consumes("application/gzip") annotation to the doPut method above, and several other things I can't remember off the top of my head, but nothing seems to have any effect.
I am seeing the following in the verbose CURL logs:
> PUT /src/main/resources/testFile.gz
> HOST: my.host.com
> Authorization: Basic <authorization stuff>
> User-Agent: curl/7.54.1
> Accept: */*
> Content-Encoding: gzip
> Content-Type: application/gzip
> Content-Length: 31
>
} [31 bytes data]
* upload completely sent off: 31 out of 31 bytes
< HTTP/1.1 500
< X-Application-Context: application
< Content-Type: application/json;charset=UTF-8
< Transfer-Encoding: chunked
< Date: <date stuff>
...etc
Nothing I have done has affected the receiving side's Content-Type: application/json;charset=UTF-8 header, which I suspect is the issue.
I met the same problem and finally solved it by using -H 'Content-Type:application/json;charset=UTF-8'
Use Charles to find the difference
I can successfully send the gzipped file using Postman, so I used Charles to capture the two requests sent by curl and Postman respectively. After comparing them, I found that Postman used application/json as the Content-Type while curl used text/plain.
Spring docs: Content Type and Transformation
According to the Spring docs, if the content type is text/plain and the source payload is a byte[], Spring will convert the payload to a String using the charset specified in the content-type header. That's why the ZipException occurred: the original byte data had already been decoded and was no longer in gzip format.
Spring source code
@Override
protected Object convertFromInternal(Message<?> message, Class<?> targetClass, @Nullable Object conversionHint) {
    Charset charset = getContentTypeCharset(getMimeType(message.getHeaders()));
    Object payload = message.getPayload();
    return (payload instanceof String ? payload : new String((byte[]) payload, charset));
}

Xamarin Android: HTTPS POST request to local intranet Web API causes error: StatusCode: 404, ReasonPhrase: 'Not Found'

This is going to be a long question...
Our company has to follow PCI standards, so a while back we had to ensure all our servers were TLS 1.2 compliant. As a result we implemented TLS in our Xamarin Forms app as explained here. But we noticed issues in Android versions below API 22, so we implemented a dependency service for fetching the HttpClient, and if the API version is less than 22 we use a custom SSL socket factory; here's the example.
Everything was fine until a few weeks ago, when there was a decision to upgrade the servers to Windows 2016 in the dev environment. We've redeployed our Web API to the server, and ever since then the API has been inaccessible from a few devices. The problem shows up on a Samsung Galaxy S4 (Android 4.4) and a Nexus 5 (Android 5.1.1). We've tried the app on a Samsung Galaxy A7 (Android 6) and it works fine. iOS is also fine.
This is the error we receive on the S4 and Nexus 5:
StatusCode: 404, ReasonPhrase: 'Not Found', Version: 1.1, Content: System.Net.Http.StreamContent, Headers:
{
  Date: Wed, 20 Sep 2017 04:00:09 GMT
  Server: Microsoft-IIS/10.0
  X-Android-Received-Millis: 1505880010792
  X-Android-Response-Source: NETWORK 404
  X-Android-Selected-Transport: http/1.1
  X-Android-Sent-Millis: 1505880010781
  X-Powered-By: ASP.NET
  Content-Length: 1245
  Content-Type: text/html
}
Here's the signature of the Web Api
[HttpPost("GetMinimumVersion")]
public GetMinimumVersionResponse GetMinimumVersion([FromBody] GetMinimumVersionRequest value)
And this is the code we use to make a post request
using (_httpclient = _deviceInfo.GetHttpClient())
{
    _httpclient.MaxResponseContentBufferSize = 256000;
    _httpclient.BaseAddress = new Uri(BaseAddress);
    _httpclient.Timeout = timeout > 0 ? TimeSpan.FromSeconds(timeout) : TimeSpan.FromSeconds(60000);
    Insights.Track("ApiUrlCalled", new Dictionary<string, string> { { "Api URL", url } });
    var jsonOut = new StringContent(JsonConvert.SerializeObject(body, new IsoDateTimeConverter()));
    jsonOut.Headers.ContentType = new MediaTypeHeaderValue("application/json");
    HttpResponseMessage response = await _httpclient.PostAsync(url, jsonOut);
    switch (response.StatusCode)
    {
        case HttpStatusCode.OK:
            var content = await response.Content.ReadAsStringAsync();
            var result = JsonConvert.DeserializeObject<T>(content);
            ReceiveNotificationDateTime(result);
            return result;
        default:
            var result1 = new T { StatusID = (int)SystemStatusOutcomes.Failed, StatusMessage = response.ToString() };
            ReceiveNotificationDateTime(result1);
            return result1;
    }
}
It's worth noting that the app works fine on all devices when talking to the production API. We're also able to make POST requests to the dev API via Postman.
After some digging and scratching, I found out that the ciphers used on production and dev were different.
Here's the cipher used on Prod
and here's the one used on dev.
I had a look at the SSL cipher suites Android supports here, and it looks like the cipher suite TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 is supported from Android API level 20+. That explains why it won't work on Android 4.4, but why would we get this error on the Nexus 5? Any pointers?
Also, is there any workaround to get this cipher enabled on Android 4.4?
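I don't know why the Nexus 5 fails, but as a way of checking what a given device actually offers, one can dump the supported and default-enabled cipher suites. A minimal sketch in plain Java (the Xamarin equivalent would go through the Javax.Net.Ssl bindings):

import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

public class CipherSuites {
    public static void main(String[] args) throws Exception {
        SSLContext context = SSLContext.getDefault();

        // Suites the platform is able to negotiate at all.
        SSLParameters supported = context.getSupportedSSLParameters();
        // Suites enabled by default (usually a stricter subset).
        SSLParameters enabled = context.getDefaultSSLParameters();

        System.out.println("Supported:");
        for (String suite : supported.getCipherSuites()) {
            System.out.println("  " + suite);
        }
        System.out.println("Enabled by default:");
        for (String suite : enabled.getCipherSuites()) {
            System.out.println("  " + suite);
        }
    }
}

If a suite shows up as supported but not enabled, a custom SSLSocketFactory (like the one already used for TLS on API < 22) can call setEnabledCipherSuites on each socket it creates; if it isn't supported at all, only the server can help by offering an additional suite.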

How can I read and transfer chunks of a file with Hadoop WebHDFS?

I need to transfer big files (at least 14MB) from the Cosmos instance of the FIWARE Lab to my backend.
I used the Spring RestTemplate as a client interface for the Hadoop WebHDFS REST API described here, but I ran into an IO exception:
Exception in thread "main" org.springframework.web.client.ResourceAccessException: I/O error on GET request for "http://cosmos.lab.fiware.org:14000/webhdfs/v1/user/<user.name>/<path>?op=open&user.name=<user.name>":Truncated chunk ( expected size: 14744230; actual size: 11285103); nested exception is org.apache.http.TruncatedChunkException: Truncated chunk ( expected size: 14744230; actual size: 11285103)
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:580)
at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:545)
at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:466)
This is the actual code that generates the Exception:
RestTemplate restTemplate = new RestTemplate();
restTemplate.setRequestFactory(new HttpComponentsClientHttpRequestFactory());
restTemplate.getMessageConverters().add(new ByteArrayHttpMessageConverter());
// headers carries the X-Auth-Token (its declaration is omitted in this snippet)
HttpEntity<?> entity = new HttpEntity<>(headers);

UriComponentsBuilder builder =
        UriComponentsBuilder.fromHttpUrl(hdfs_path)
                .queryParam("op", "OPEN")
                .queryParam("user.name", user_name);

ResponseEntity<byte[]> response =
        restTemplate
                .exchange(builder.build().encode().toUri(), HttpMethod.GET, entity, byte[].class);

FileOutputStream output = new FileOutputStream(new File(local_path));
IOUtils.write(response.getBody(), output);
output.close();
I think this is due to a transfer timeout on the Cosmos instance, so I tried to
send a curl on the path by specifying offset, buffer and length parameters, but they seem to be ignored: I got the whole file.
Thanks in advance.
OK, I found a solution. I don't understand why, but the transfer succeeds if I use a Jetty HttpClient instead of the RestTemplate (and thus the Apache HttpClient). This works now:
ContentExchange exchange = new ContentExchange(true) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();

    protected void onResponseContent(Buffer content) throws IOException {
        bos.write(content.asArray(), 0, content.length());
    }

    protected void onResponseComplete() throws IOException {
        if (getResponseStatus() == HttpStatus.OK_200) {
            FileOutputStream output = new FileOutputStream(new File(<local_path>));
            IOUtils.write(bos.toByteArray(), output);
            output.close();
        }
    }
};

UriComponentsBuilder builder = UriComponentsBuilder.fromHttpUrl(<hdfs_path>)
        .queryParam("op", "OPEN")
        .queryParam("user.name", <user_name>);

exchange.setURL(builder.build().encode().toUriString());
exchange.setMethod("GET");
exchange.setRequestHeader("X-Auth-Token", <token>);

HttpClient client = new HttpClient();
client.setConnectorType(HttpClient.CONNECTOR_SELECT_CHANNEL);
client.setMaxConnectionsPerAddress(200);
client.setThreadPool(new QueuedThreadPool(250));
client.start();

client.send(exchange);
exchange.waitForDone();
Is there any known bug on the Apache Http Client for chunked files transfer?
Was I doing something wrong in my RestTemplate request?
UPDATE: I still don't have a solution
After a few tests I see that I haven't actually solved my problem.
I found out that the Hadoop version installed on the Cosmos instance is quite old (Hadoop 0.20.2-cdh3u6), and I read that its WebHDFS doesn't support partial file transfer with the length parameter (introduced in v0.23.3).
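For reference, on a Hadoop version whose WebHDFS does honor them, a ranged read would just add offset and length to the OPEN call. A sketch reusing the RestTemplate setup from the question above (parameter names per the WebHDFS REST API; of no use on 0.20.2, as noted):

// Read one 1 MB chunk starting at byte 0 (only works where WebHDFS supports offset/length).
UriComponentsBuilder chunkBuilder = UriComponentsBuilder.fromHttpUrl(hdfs_path)
        .queryParam("op", "OPEN")
        .queryParam("user.name", user_name)
        .queryParam("offset", 0)
        .queryParam("length", 1024 * 1024);

ResponseEntity<byte[]> chunk = restTemplate.exchange(
        chunkBuilder.build().encode().toUri(), HttpMethod.GET, entity, byte[].class);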
These are the headers I received from the Server when I send a GET request using curl:
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: HEAD, POST, GET, OPTIONS, DELETE
Access-Control-Allow-Headers: origin, content-type, X-Auth-Token, Tenant-ID, Authorization
server: Apache-Coyote/1.1
set-cookie: hadoop.auth="u=<user>&p=<user>&t=simple&e=1448999699735&s=rhxMPyR1teP/bIJLfjOLWvW2pIQ="; Version=1; Path=/
Content-Type: application/octet-stream; charset=utf-8
content-length: 172934567
date: Tue, 01 Dec 2015 09:54:59 GMT
connection: close
As you can see, the connection header is set to close. In fact, the connection is usually closed each time the GET request lasts more than 120 seconds, even if the file transfer has not completed.
In conclusion, I can say that Cosmos is totally useless if it doesn't support large file transfer.
Please correct me if I'm wrong, or if you know a workaround.
