Why is my de-chunked request missing the ending CRLF? - asp.net-web-api

I've just spent the past 10 hours trying to figure out why my HTTP request was failing when I did a
request.Content.ReadAsMultipartAsync().Result.Contents
It kept returning the error:
Unexpected end of MIME multipart stream. MIME multipart message is not complete.
After many hours of research, I finally discovered that the request did not have an ending CRLF, which apparently .NET needs to determine the end of the request. When I added my own CRLF, everything worked great.
In Wireshark I looked at one of the requests and saw that the chunked request did have an ending CRLF, but the de-chunked request did not.
So that leaves me with two questions.
Why is my request missing the ending CRLF, and
Is there any way to add it back before it gets to .NET so that .NET will process it correctly? Or, can I tell .NET not to look for the ending CRLF?

The two CRLFs you saw at the end of the chunked message are part of the chunked transfer encoding, which specifies a CRLF after the number of bytes in each chunk (zero in the last chunk of the request/response entity) and an additional CRLF after the whole thing (see RFC 2616, section 3.6.1). MIME multipart (RFC 2046, section 5.1.1) does not require a CRLF after the last boundary, so the service you're receiving the entity from is not wrong when it doesn't add one. OTOH, the old .NET multipart parser was buggy in rejecting it. The ASP.NET team fixed the issue late last year, so you just need to use an up-to-date System.Net.Http.Formatting (ASP.NET Web API client libraries 5.1.0 should be OK). If you absolutely can't use the up-to-date assembly, then, to work around the problem, I'd wrap the underlying stream with a special wrapper stream that supplies the trailing CRLF.
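To illustrate that workaround: below is a minimal sketch of such a wrapper stream, written in Java for concreteness (the class name is mine; a .NET version would wrap the request's content stream the same way). It serves the wrapped bytes and then two extra bytes, CR and LF, before signalling end-of-stream.

import java.io.IOException;
import java.io.InputStream;

// Wraps a stream and appends "\r\n" once the wrapped stream is exhausted,
// so a parser that insists on a trailing CRLF sees one. Note this sketch
// appends unconditionally; use it only when the tail is known to be missing.
class TrailingCrlfInputStream extends InputStream {
    private final InputStream inner;
    private final byte[] tail = { '\r', '\n' };
    private int tailPos = 0;

    TrailingCrlfInputStream(InputStream inner) {
        this.inner = inner;
    }

    @Override
    public int read() throws IOException {
        int b = inner.read();
        if (b != -1) {
            return b;
        }
        // Wrapped stream is done: serve the two tail bytes, then real EOF.
        return tailPos < tail.length ? (tail[tailPos++] & 0xFF) : -1;
    }
}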

Related

Why is expires header response set to future date with max-age=0?

I noticed an odd behaviour with the expires property in the HTTP response header sent by Google Cloud Storage.
Though the Cache-Control for the file is defined with max-age:0 in the metadata, the Expires property is set to a date one year in the future. Why is this date set to the future?
The problematic thing with this behaviour is that the most recent Firefox versions (v77 and v78) seem to interpret the Expires property, even though the documentation states that it will not if max-age is defined (see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Expires). For livestream video playback with HLS, this behaviour leads to buffering after a short time, because the manifest is cached by the browser. There is already a bug report on Mozilla's Bugzilla about this behaviour (see https://bugzilla.mozilla.org/show_bug.cgi?id=1648075).
Update
Further investigation showed that Firefox is not the problem in this case: since v77 it simply sticks to the documented syntax, which defines '=' as the required separator, not ':'. Other browsers (and Firefox up to v76) are lenient and also interpret the ':' form as intended.
Therefore, in our case the issue needs to be solved inside the service that writes the files to our GCS.
It's hard to say why Google Cloud Storage is doing this. Perhaps the Expires is a default, while the Cache-Control is used for custom user settings? More likely, it's just an oversight.
The important point is that this behavior is allowed, and would be harmless with compliant browsers due to the explicit precedence of max-age:
If a response includes a Cache-Control field with the max-age directive (Section 5.2.2.8), a recipient MUST ignore the Expires field.
So the real issue is that Firefox is not conforming to the HTTP specification. Hopefully that bug will be fixed soon.
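In code, the freshness rule a compliant cache applies boils down to this precedence check (a minimal sketch, assuming the headers have already been parsed into typed values; names are mine):

import java.time.Instant;

class Freshness {
    // RFC 7234 precedence: when Cache-Control carries max-age, the
    // Expires header must be ignored entirely (even for max-age=0).
    static Instant freshUntil(Instant responseTime, Long maxAgeSeconds, Instant expires) {
        if (maxAgeSeconds != null) {
            return responseTime.plusSeconds(maxAgeSeconds);
        }
        return expires; // only consulted when max-age is absent
    }
}

With max-age:0 spelled correctly as max-age=0, the response is immediately stale regardless of the one-year Expires.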

How to change default character encoding configuration in Jetty app server from UTF-8 to ISO-8859-1

I want my application to fully support ISO-8859-1 on the Jetty server, but I am unable to change the default character encoding to ISO-8859-1. Where do I need to set the encoding/charsets?
This is for jetty-distribution-9.4.12, running a Struts web application. I have tried modifying webdefault.xml for the encoding mappings, but somehow it fails to take UTF-8 for the encoding.
I am seeing an issue when giving a name with Japanese characters (私のユーザー) to an XML resource; the Jetty server always fails to accept this name for my resource. When I check the request, I see that the content type is UTF-8 and it is HTTP 1.1.
I want my server to accept 私のユーザー as my resource name. In order to make this happen, I wanted to add that capability to the server.
However, with the little knowledge I have, I did some research and tried some configurations on the server, but nothing seems to work.
Trial 1
Changing webdefault.xml with a locale-encoding mapping:
<locale-encoding-mapping>
<locale>en</locale>
<encoding>ISO-8859-1</encoding>
</locale-encoding-mapping>
Trial 2
Adding the encoding property to JAVA_OPTIONS in the jetty.sh file:
JAVA_OPTIONS+=("-Dfile.encoding=UTF-8")
Referenced links:
Jetty Character encoding issue
Jetty 9, character encoding UTF-8
Jetty uses the current HTTP/1.1 specs (yep, all of these specs talk about current HTTP/1.1 specific behavior)
RFC7230: HTTP/1.1: Message Syntax and Routing
RFC7231: HTTP/1.1: Semantics and Content
RFC7232: HTTP/1.1: Conditional Requests
RFC7233: HTTP/1.1: Range Requests
RFC7234: HTTP/1.1: Caching
RFC7235: HTTP/1.1: Authentication
I think the most relevant spec to your question is from RFC7231 - Appendix B: Updates from RFC2616
The default charset of ISO-8859-1 for text media types has been
removed; the default is now whatever the media type definition says.
Likewise, special treatment of ISO-8859-1 has been removed from the
Accept-Charset header field. (Section 3.1.1.3 and Section 5.3.3)
The idea of ISO-8859-1 being the default charset was deprecated long ago; the only place you'll find ISO-8859-1 indicated as a default charset is in old specs that have now been labelled "obsolete" (such as RFC2616).
Timeline:
The older HTTP/1.1 spec, RFC2616, was released in 1999.
The faults in RFC2616 were identified and a revised spec started being discussed in 2006.
The updated specs RFC7230 thru RFC7235 were released in June 2014.
All of the major browser vendors (Chrome, Firefox, Edge, Safari, etc..) updated that year to support RFC7230 and related specs.
Over the years since, the major browsers have started to drop RFC2616 concepts and support, removing behaviors, and even quietly dropping features from other obsolete specs (eg: older Set-Cookie header syntax now results in a no-op on the browser side, with the cookie being dropped).
Today (Sept 2019):
The HTTP 1.1 protocol has a default character encoding of UTF-8.
The HTTP 1.1 document default character encoding is UTF-8.
The HTTP 2 protocol has a default character encoding of UTF-8.
The HTTP 2 document default character encoding is UTF-8.
What all Web Developers today are responsible for:
You MUST limit your HTTP 1.1 protocol usages (header names, header values) to US-ASCII.
Header names should follow HTTP 1.1 token rules. (this is a subset of US-ASCII)
Header values that contain a character outside of US-ASCII [1] MUST first be encoded in UTF-8 and then have those bytes percent-encoded for representation in the header value (see the sketch after this list).
If you intend to send an ISO-8859-1 document as a response body, then you MUST indicate as much in the HTTP response Content-Type header, with the mime-type and charset (eg: Content-Type: text/html; charset=ISO-8859-1).
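As an illustration of that encode-then-escape rule for header values, here is a minimal sketch (class and method names are mine; Java's URLEncoder happens to do both steps at once, percent-encoding the UTF-8 bytes):

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class HeaderValues {
    // UTF-8 encode, then percent-encode: only US-ASCII octets remain.
    static String asciiSafe(String value) throws UnsupportedEncodingException {
        // Caveat: URLEncoder emits '+' for spaces, which the receiving
        // side must be prepared to decode (or pre-replace with %20).
        return URLEncoder.encode(value, "UTF-8");
    }
}

For example, asciiSafe("私のユーザー") yields %E7%A7%81%E3%81%AE%E3%83%A6%E3%83%BC%E3%82%B6%E3%83%BC, which travels safely as a header value.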
But seeing as you didn't indicate where in the HTTP exchange you want to set this default character encoding, it's hard to give a detailed answer/solution to your issue. (eg: it could be a problem with the encoding of your application/x-www-form-urlencoded request body content and its interaction with the Servlet spec, which can be fixed with an additional field in your HTML5 form, btw)
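If it is the form-body case, the usual servlet-side fix is to declare the request charset before the first parameter is read. A minimal filter sketch under that assumption (the filter name is mine, and it presumes the body really is UTF-8):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Declares the request body charset before the first getParameter() call,
// which is the point where the servlet container commits to an encoding.
public class Utf8RequestFilter implements Filter {
    @Override
    public void init(FilterConfig config) {}

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        if (req.getCharacterEncoding() == null) {
            req.setCharacterEncoding("UTF-8"); // only when the client didn't say
        }
        chain.doFilter(req, res);
    }

    @Override
    public void destroy() {}
}

Register it ahead of your Struts filter/servlet so it runs before anything touches the request parameters.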
[1]: This might seem harsh, but if you check RFC 7230: 3.2.4 Field Parsing you'll see that characters outside of US-ASCII in HTTP header fields will at best be dropped, or at worst be interpreted as an obs-fold or obs-text character, rendering the entire request bad and resulting in a 400 Bad Request:
Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

UTF-8 encoding Google Apps Email Settings API

I've been using the Google Apps Email Settings API for a while, but I came across a problem when I tried to insert aliases, signatures or any information with "ñ" or "Ñ". It adds garbage instead of those characters and doesn't seem to respect the charset specified (UTF-8) in the HTTP header or the XML character encoding.
I have tried via my own Python code and also using the OAuth Playground[1], but it's been impossible to properly add the mentioned characters.
Any idea/suggestion?
Thanks in advance.
EDIT: It seems that the problem is not in the request but in the response. I have handled it successfully in my code, but it should also be fixed in the OAuth Playground.
[1] https://developers.google.com/oauthplayground/
I have successfully called Google API client methods using UTF-8-encoded strings, so it is definitely an issue with your Python setup.
I would work around this issue by sending Unicode strings instead of UTF-8 encoded ones:
u'literal string' # This is unicode
'encoded utf-8 string'.decode('utf-8') # This is unicode
EDIT: Re-reading your answer, it seems that you are making raw HTTP calls with hand-made XML documents. I can't understand why. If that's the way you want to go, take a look at the Email Settings API client code to learn how to build the XML documents.

High-performance passive-access-optimised dynamic REST web pages

The following question is about a caching framework, either to be implemented or already existing, for the REST-inspired behaviour described below.
The goal is that GET and HEAD requests should be handled as efficiently as requests to static pages.
In terms of technology, I think of Java Servlets and MySQL to implement the site. (But good reasons, should they emerge, may still impact my choice of technology.)
The web pages should support GET, HEAD and POST; GET and HEAD being much more frequent than POST. The page content will not change with GET/HEAD, only with POST. Therefore, I want to serve GET and HEAD requests directly from the file system and only POST requests from the servlet.
A first (slightly incomplete) idea is that the POST request would pre-compute the HTML for subsequent GET/HEAD requests and store it in the file system. GET/HEAD would then always obtain the file from there. I believe that this could easily be implemented in Apache with conditional URL rewriting.
The more refined approach is that GET would serve the HTML from the file system (and HEAD would use it, too) if a pre-computed file exists, and otherwise would invoke the servlet machinery to generate it on the fly. POST in this case would not generate any HTML, but only update the database appropriately and delete the HTML file from the file system as a flag to have it generated anew on the next GET/HEAD. The advantage of this second approach is that it handles the "initial phase" of the web pages more gracefully, where no POST has been called yet. I believe that this lazy-generate-and-store approach could be implemented in Apache by providing an error handler that invokes the servlet in the case of "file not found, but should be there".
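For illustration, a minimal sketch of such a servlet (htmlCacheDir, renderPage and updateDatabase are placeholders of mine; with the Apache error-handler route, GET/HEAD would never even reach doGet() while the file exists):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class PageServlet extends HttpServlet {
    private final Path htmlCacheDir = Paths.get("/var/www/html-cache"); // assumed location

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        Path cached = htmlCacheDir.resolve(cacheKey(req));
        if (!Files.exists(cached)) {
            // Miss: generate once, store for all later GET/HEAD requests.
            Files.createDirectories(cached.getParent());
            Files.write(cached, renderPage(req).getBytes(StandardCharsets.UTF_8));
        }
        resp.setContentType("text/html; charset=UTF-8");
        Files.copy(cached, resp.getOutputStream());
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        updateDatabase(req); // apply the change
        // Deleting the file is the "regenerate on next GET/HEAD" flag.
        Files.deleteIfExists(htmlCacheDir.resolve(cacheKey(req)));
        resp.sendRedirect(req.getRequestURI()); // let the client re-GET the page
    }

    private String cacheKey(HttpServletRequest req) {
        return req.getRequestURI().replace('/', '_') + ".html"; // placeholder scheme
    }

    private String renderPage(HttpServletRequest req) {
        return "<html>...</html>"; // placeholder for the real templating
    }

    private void updateDatabase(HttpServletRequest req) {
        // placeholder for the MySQL update
    }
}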
In a later round of refinement, to save bandwidth, the cached HTML files should also be available in a gzipped version, which is served when the client indicates it understands that. I believe that the basic mechanisms should be the same as for the uncompressed HTML files.
Since there will be many such REST-like pages, both approaches might occasionally need some mechanism to garbage-collect rarely used HTML files in order to save file space.
To summarise, I am confident that my GET/HEAD-optimised architecture can be cleanly implemented. I would like to have opinions on the idea as such in the first place (I believe it is good, but I may be wrong) and whether somebody has already experience with such an architecture, perhaps even knows a free framework implementing it.
Finally, I'd like to note that client caching is not the solution I am after, because multiple different clients will GET or HEAD the same page. Moreover, I want to absolutely avoid the servlet machinery during GET/HEAD requests in case the pre-computed file exists. It should not even be invoked to provide cache-related HTTP headers in GET/HEAD requests, nor to dump a file to the output.
The questions are:
Are there better (standard) mechanisms available to reach the goal stated at the beginning?
If not, does anybody know about an existing framework like the one I consider?
I think that an HTTP cache does not reach my goal. As far as I understand it, the HTTP cache would still need to invoke the servlet with a HEAD request in order to learn whether a POST has meanwhile changed the page. Since page changes will come at unpredictable points in time, an HTTP header stating an expiration time is not good enough.
Use Expires HTTP Header and/or HTTP conditional requests.
Expires
The Expires entity-header field gives the date/time after which the response is considered stale. A stale cache entry may not normally be returned by a cache (either a proxy cache or a user agent cache) unless it is first validated with the origin server (or with an intermediate cache that has a fresh copy of the entity). See section 13.2 for further discussion of the expiration model.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
Conditional Requests
Decorate cacheable responses with the Expires, Last-Modified and/or ETag headers. Make requests conditional with the If-Modified-Since, If-None-Match, If-*, etc. headers (see the RFC).
e.g.
Last response headers:
...
Expires: Wed, 15 Nov 1995 04:58:08 GMT
...
Don't perform a new request on the resource before the expiration date (the Expires header); after that, perform a conditional request:
...
If-Modified-Since: Wed, 15 Nov 1995 04:58:08 GMT
...
If the resource wasn't modified, a 304 Not Modified response code is returned and the response has no body. Otherwise, a 200 OK response with a body is returned.
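Server-side, that validation boils down to a small comparison. A sketch in servlet terms, since this question is servlet-based (lookupLastModified is a stand-in of mine for however you track change times):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ConditionalGetServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        long lastModified = lookupLastModified(req);                   // millis since epoch
        long ifModifiedSince = req.getDateHeader("If-Modified-Since"); // -1 when absent
        // Compare at one-second granularity, the resolution of HTTP dates.
        if (ifModifiedSince != -1 && lastModified / 1000 <= ifModifiedSince / 1000) {
            resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED); // 304, no body
            return;
        }
        resp.setDateHeader("Last-Modified", lastModified);
        resp.setContentType("text/html; charset=UTF-8");
        resp.getWriter().write("<html>...</html>"); // placeholder entity
    }

    private long lookupLastModified(HttpServletRequest req) {
        return 0L; // placeholder: look up the resource's change time
    }
}

The Servlet API can also do this comparison for you: override HttpServlet.getLastModified() and the default service() implementation answers 304 on its own.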
Note: HTTP RFC also defines Cache-Control header
See Caching in HTTP
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html

Asking Chrome to bypass local cache for XmlHttpRequest like it's possible in Firefox?

As some of you may already know, there are some caching issues in Firefox/Chrome for requests that are initiated by an XmlHttpRequest object. These issues mean that the browser does not strictly follow the rules and does not go to the server for the new XSLT file (for example). The response does not have an Expires header (for performance reasons we can't use it).
Firefox has an additional property on the XHR object, "channel", on which you set the value Components.interfaces.nsIRequest.LOAD_BYPASS_CACHE to go to the server explicitly.
Does something like that exist for Chrome?
Let me immediately stop everyone who would recommend adding a timestamp or a random integer as the value of a GET parameter - I don't want the server to get different URL requests. I want it to get the original URL. The reason is that I want to protect the server from getting too many different requests for simple static files and from sending too much data to clients when it is not needed.
Hitting a static file with a generated GET parameter (like '?forcenew=12314') would render a 200 response the first time and a 304 for every following request with that value of the random integer. I want to make requests that will always return 304 if the target static file is identical to the client's version. This is, BTW, how web browsers should work out of the box, but XHR objects tend not to go to the server at all to ask whether the file has changed.
In my main project at work I had the same exact problem. My solution was not to append random strings or timestamps to GET requests, but to append a specific string to GET requests.
If you have a revision number, e.g. a Subversion revision or the like from Git/Mercurial or whatever you are using, append that. Static files will get 304 responses until the moment a new revision is released. When the new release happens, a single 200 response is granted and it is back to happily generating 304 responses. :-)
This has the added bonus of being browser independent.
Should you be unlucky and not have a revision number, then make one up and increment it each time you make a release.
You should look into ETags. ETags are keys that can be generated from the contents of a file, so once the file on the server changes there will be a new ETag. Obviously this will be a server-side change, which is something you will need to do given that you want a 200 and then subsequent 304s. Chrome and FF should respect these ETags, so you shouldn't need to do any crazy client-side hacks.
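A sketch of what that server-side change could look like, deriving a strong ETag from a hash of the file bytes (class and method names are mine):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileEtags {
    // Same bytes in, same tag out; any edit to the file changes the tag.
    static String etagFor(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
        StringBuilder tag = new StringBuilder("\"");
        for (byte b : digest) {
            tag.append(String.format("%02x", b));
        }
        return tag.append('"').toString(); // ETag values are quoted strings
    }
}

The server then answers 304 whenever the request's If-None-Match matches the current tag, and 200 with a new tag once the file's contents change.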
Chrome now supports the Cache-Control: max-age=0 request HTTP header. You can set it after you open an XMLHttpRequest instance:
xhr.setRequestHeader( "Cache-Control", "max-age=0" );
This will instruct Chrome not to use a cached response without revalidation.
For more information check The State of Browser Caching, Revisited by Mark Nottingham and RFC 7234 Hypertext Transfer Protocol (HTTP/1.1): Caching.
