Pipelines in c# - parallel-processing

I am writing server code. Clients write to the server asynchronously, and the server is hosted with HttpListener from the System.Net namespace. I want to search the incoming stream for a pattern: the message boundary of a MIME part. Because the data arrives asynchronously in chunks, the boundary can be split across two reads or appear only in a later chunk. How do I append each new chunk to what I have already received and carry on searching for the pattern? Sorry for being vague; I am new to this technology.
How can I detect the end of input? Should I treat the last message boundary in the stream as final, or does client-server communication provide an end-of-file indication?

Related

Why does HTTP/2 prioritize streams instead of requests?

The concepts of "stream", "connection", "message", and "frame" make up the core design of HTTP/2. What confuses me is the idea of a stream.
At first, a stream seemed to me to be just a virtual description of a flow of frames. But then I found that HTTP/2 prioritization targets streams rather than messages or requests. Why is that? I would think that applications on both the client and the server side care about, and directly control, requests or messages, not the streams these messages reside in.
Please refer to "stream prioritization":
https://developers.google.com/web/fundamentals/performance/http2#design_and_technical_goals
A stream in HTTP/2 corresponds to all the frames which make up a request and its corresponding response, so is the natural place to handle priority and flow control. The sentences "the response for this request should have high priority" and "the stream for this request and its response should have high priority" are equivalent.
There is a mention in the document you quote of a stream carrying "one or more messages", but I think that's just sloppy language in that document. If you look at section 8.1 of the spec it says "A client sends an HTTP request on a new stream" and "An HTTP request/response exchange fully consumes a single stream."
There can be other frames in that stream, such as PUSH_PROMISE, but those aren't actual requests and responses; the response data for a server push is sent on a new stream, which can then be given a different priority.

Why can't http2 streams be reused?

According to RFC7540:
An HTTP request/response exchange fully consumes a single stream. A request starts with the HEADERS frame that puts the stream into an "open" state. The request ends with a frame bearing END_STREAM, which causes the stream to become "half-closed (local)" for the client and "half-closed (remote)" for the server. A response starts with a HEADERS frame and ends with a frame bearing END_STREAM, which places the stream in the "closed" state.
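To make the quoted lifecycle concrete, here is a minimal Go sketch of the client-side view of one stream. It models only the transitions named in the quote; the type and function names are made up for illustration, and RST_STREAM, PUSH_PROMISE and the server-side view are deliberately left out.

```go
package main

import "fmt"

type state string
type event string

const (
	idle            state = "idle"
	open            state = "open"
	halfClosedLocal state = "half-closed (local)"
	closed          state = "closed"

	sendHeaders   event = "send HEADERS"
	sendEndStream event = "send END_STREAM"
	recvEndStream event = "recv END_STREAM"
)

// next applies one event to a stream as seen by the client.
// Any transition not listed here is rejected, so "closed" is terminal.
func next(s state, e event) (state, error) {
	switch {
	case s == idle && e == sendHeaders:
		return open, nil
	case s == open && e == sendEndStream:
		return halfClosedLocal, nil
	case s == halfClosedLocal && e == recvEndStream:
		return closed, nil
	}
	return s, fmt.Errorf("%s is not allowed in state %s", e, s)
}

func main() {
	s := idle
	// The last event tries to start a second request on the same stream.
	for _, e := range []event{sendHeaders, sendEndStream, recvEndStream, sendHeaders} {
		var err error
		if s, err = next(s, e); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println(e, "->", s)
	}
}
```

The final event in main is rejected because nothing leads out of the closed state, which is the property the question is asking about.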
Knowing that a stream cannot be reopened once it's closed, this means that if I want to implement a long-lived connection where the client sends a stream of requests to the server, I will have to use a new stream for each request. But there is a finite number of streams available, so in theory, I could run out of streams and have to restart the connection.
Why did the writers of the specification design a request/response exchange to completely consume a stream? Wouldn't it have been easy to make a stream like a single thread of exchanges, where you can have multiple exchanges done in serial in one stream?
The point of having many streams multiplexed in a single connection is to interleave them, so that if one cannot proceed, others can.
Reusing a stream for more than one request means just reusing its stream id. I don't see much benefit in reusing 4-byte integers -- on the contrary, the implementation would become considerably more complicated.
For example, the server can inform the client of the last stream that it processed when it's about to close a connection. If stream ids are reused, it would not be possible to report this reliably.
Also, imagine the case where the client sends requestA on stream5; this arrives on the server where its processing takes time; the client times out, sends a RST_STREAM for stream5 (to cancel requestA) and then requestB on stream5. While these are in-flight, the server finishes the processing of requestA and sends the response for requestA on stream5. Now the client reads a response, but it does not know if it is that of requestA or that of requestB.
"But there is a finite number of streams available, so in theory, I could run out of streams and have to restart the connection."
That is correct. At 1 ms per exchange, it will take about 12 days to consume the stream ids for a single connection ((2^31-1)/1000/3600/24/2=12.4 days) -- remember that stream ids are incremented by 2 (clients only send odd stream ids).
While this is possible, I have never encountered this case in all the HTTP/2 deployments that I have seen -- typically the connection goes idle and gets closed well before consuming all stream ids.
The specification preferred simplicity and stable features over the ability to reuse stream ids.
Also, bear in mind that HTTP/2 was designed mostly with the web in mind, where browsers make a number of requests to download a web page and its resources, but then stay idle for a while.
The case where an HTTP/2 connection is bombarded with non-stop requests is definitely possible, but much rarer, and as such it has probably not been deemed important enough in the design -- using 8 bytes for stream ids seems overkill and a cost that is paid for each request even if the 4-byte limit is never, practically, reached.

Golang http write response without waiting to finish

I'm building an application that builds a pdf file and returns it to the client whenever it receives a request.
Since some of these pdf files might take some time to generate, I would like to periodically send some sort of status update back to client while it is running.
When it's finished building the pdf file, it should be returned to the client as well.
Something akin to:
func buildReport(writer http.ResponseWriter, request *http.Request) {
	// Build the PDF file, sending status updates while it is generated.
	// pdf is the document object of the PDF library in use (not shown).
	for { // for example purposes only
		writer.Write([]byte("building. Please wait."))
	}
	pdf.OutputFileAndClose("report.pdf")
	// Set the header to PDF so that the client knows it's a PDF.
	writer.Header().Set("Content-Type", "application/pdf")
	http.ServeFile(writer, request, "report.pdf")
}

func main() {
	http.HandleFunc("/", buildReport)
	http.ListenAndServe(":8081", nil)
}
Setting the header might not work, as the writer can only have one header.
TL;DR: it cannot be implemented that way. You need three things:
An API that requests the PDF creation and queues the job in a task queue (so that too many PDF creation requests won't exhaust the HTTP server worker pool).
An API that reports how far along the PDF rendering is (I am assuming that the job can provide interim stats). The client polls this on a regular basis.
An API to pull the PDF once it is ready.
Hope this helps and best of luck with your project.
This is by no means comprehensive, but a reasonable example of how you might construct your API (which needs to be asynchronous, as the previous respondent pointed out) can be found here: https://www.adayinthelifeof.nl/2011/06/02/asynchronous-operations-in-rest/
The job queue model is a pretty common one. I would recommend you also write a basic API binding library (you'd want this for your own testing purposes in any case) so that your users can understand how you intend them to use the API, and in writing it, you'll get a better sense of how asynchronous REST interactions feel from the end user side.
Contrary to what others have said, what you want is in fact
directly possible, but it requires the fulfillment of two preconditions:
HTTP/1.1 or above.
You'll be sending custom content to the clients, not PDF data
directly, and they're prepared to accept and parse it.
You can then employ the so-called "chunked" payload encoding specifically
invented to handle "streamed" downloads where the server does not know how
many bytes it's about to send.
So you may invent some creative kind of payload where you first periodically
stream a "no op" / "progress" marker and then the actual payload.
Say, while the file is being prepared you periodically send a line of text
reading "PROCESSING" + LF; then, when the result is ready, you send
a line of text "READY" SIZE + LF, where SIZE is the size, in bytes,
of the immediately following PDF document. After the document is streamed,
the server signals the end of data.
Hence the stream would look like
PROCESSING
PROCESSING
…
PROCESSING
READY 8388608
%PDF-1.3
…
%%EOF
The clients have to be able to parse this information from the stream
they're receiving and have a simple FSM in place to switch from state to
state as they fetch your stream.
The server has to make sure it flushes the stream after each "informational" line; otherwise the whole thing would not be "interactive".
If you have a good idea about the overall state of the processing of the
document, each "status update" line could include the percentage of the work done, like in "PROCESSING NN" + LF.

why http/2 stream id must be ascending?

RFC 7540, section 5.1.1 (https://www.rfc-editor.org/rfc/rfc7540#section-5.1.1), specifies the following:
The identifier of a newly established stream MUST be numerically greater than all streams that the initiating endpoint has opened or reserved.
I searched a lot on Google, but found no explanation of why stream IDs must be in ascending order. I don't see what benefit this rule brings to the protocol. From my point of view, out-of-order stream IDs should work just as well if the server simply treats the stream ID as an identifier and uses it to distinguish HTTP/2 requests.
Could anyone explain the exact reason for this requirement?
Thanks a lot!
Strictly ascending stream IDs are an easy way to make them unique (per connection), and it's super-easy to implement.
Choosing - like you say - "out of order" stream IDs is potentially more complicated, as it requires avoiding clashes, and potentially consumes more resources, as you have to remember all the stream IDs that are in use.
I don't think there is any particular reason to specify that stream IDs must be ascending apart from simplicity.
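A small Go sketch of that difference in bookkeeping, with made-up type names: the ascending rule needs a single integer per connection, while arbitrary IDs would need a record of every ID already used.

```go
package main

import "fmt"

// ascendingTracker accepts a new stream ID only if it is greater than every
// ID opened so far. The whole state is one integer.
type ascendingTracker struct{ highest uint32 }

func (t *ascendingTracker) open(id uint32) bool {
	if id <= t.highest {
		return false
	}
	t.highest = id
	return true
}

// setTracker is what out-of-order IDs would require: to guarantee uniqueness
// it has to remember every ID that has been used on the connection.
type setTracker struct{ used map[uint32]bool }

func (t *setTracker) open(id uint32) bool {
	if t.used[id] {
		return false
	}
	t.used[id] = true
	return true
}

func main() {
	asc := &ascendingTracker{}
	set := &setTracker{used: map[uint32]bool{}}
	// The last ID repeats an earlier one and is rejected by both trackers.
	for _, id := range []uint32{1, 3, 5, 3} {
		fmt.Println(id, asc.open(id), set.open(id))
	}
	fmt.Println("entries the set tracker has to keep:", len(set.used))
}
```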
From RFC 7540, section 6.8 (GOAWAY):
The GOAWAY frame (type=0x7) is used to initiate shutdown of a
connection or to signal serious error conditions. GOAWAY allows an
endpoint to gracefully stop accepting new streams while still
finishing processing of previously established streams. This enables
administrative actions, like server maintenance.
There is an inherent race condition between an endpoint starting new
streams and the remote sending a GOAWAY frame. To deal with this
case, the GOAWAY contains the stream identifier of the last peer-
initiated stream that was or might be processed on the sending
endpoint in this connection. For instance, if the server sends a
GOAWAY frame, the identified stream is the highest-numbered stream
initiated by the client.
Once sent, the sender will ignore frames sent on streams initiated by
the receiver if the stream has an identifier higher than the included
last stream identifier. Receivers of a GOAWAY frame MUST NOT open
additional streams on the connection, although a new connection can
be established for new streams.
If the receiver of the GOAWAY has sent data on streams with a higher
stream identifier than what is indicated in the GOAWAY frame, those
streams are not or will not be processed. The receiver of the GOAWAY
frame can treat the streams as though they had never been created at
all, thereby allowing those streams to be retried later on a new
connection.
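The quoted rule is cheap to apply precisely because IDs only grow: a single number in the GOAWAY frame splits the client's streams into two groups. A minimal Go sketch, with a made-up function name:

```go
package main

import "fmt"

// splitOnGoAway partitions the stream IDs a client has opened, given the last
// stream ID carried in a GOAWAY frame from the server. Streams above that ID
// were not and will not be processed, so they can be retried on a new
// connection; the rest may have been processed.
func splitOnGoAway(opened []uint32, lastStreamID uint32) (mayBeProcessed, safeToRetry []uint32) {
	for _, id := range opened {
		if id > lastStreamID {
			safeToRetry = append(safeToRetry, id)
		} else {
			mayBeProcessed = append(mayBeProcessed, id)
		}
	}
	return mayBeProcessed, safeToRetry
}

func main() {
	kept, retry := splitOnGoAway([]uint32{5, 7, 9, 11}, 7)
	fmt.Println(kept, retry) // prints: [5 7] [9 11]
}
```

With IDs in arbitrary order, one number could not describe which streams were processed, and the frame would have to list them individually.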

Equivalent of org.hornetq.api.core.client.ClientMessage.setBodyInputStream in IBM MQSeries

In the following JBoss/HornetQ user manual page you can see how HornetQ provides a mechanism for streaming data to a Message for a Queue using a java.io.InputStream. A JMS version of the same code is given. Has anyone come across an equivalent using IBM MQSeries / WebSphere MQ?
Say I have a large amount of data to place in the JMS Message, which to me is just a stream of bytes. In the HornetQ example, the stream is only read when the message is sent, so if it is, say, a FileInputStream, we only need enough memory to buffer a chunk of the bytes. I can use a javax.jms.BytesMessage to send the data in chunks of bytes and use the BytesMessage to buffer them. The problem with this is that the IBM implementation of BytesMessage (com.ibm.msg.client.jms.internal.JmsBytesMessageImpl) has to cache them until the Message is sent, and if that is a large amount of data it is a problem. Worse, although I am only sending bytes, the IBM implementation appears to keep duplicate copies, one in a ByteArrayOutputStream and the other in a DataOutputStream.
In WebSphere MQ the closest thing to what you describe is a reference message. The method described in the Infocenter requires custom programming of channel exits to grab the filesystem object and put it into a message before it is transmitted over the channel. A complementary exit on the remote side saves the payload to a file and puts a reference to the file in the message that is returned to the app.
We also have programs in WMQ that take STDIN or a pipe at one end and put messages to a queue on the other end. A pair of these can act as a pipe through which line-oriented ASCII data flows between processes on separate machines. However, there's no JMS implementation of this and it doesn't work too well for binary data.
In WMQ, we have the concepts of Group and Segment.
Segmentation is supported on all operating systems except z/OS.
For details, see "Segmentation In WMQ".
Make use of GroupId, MsgSeqNumber, and Offset when putting the message.
When getting the message, if you specify MQGMO_COMPLETE_MSG in the GMO, all segments are joined automatically according to the MsgSeqNumber, and
you will get a single message in the receiving application with a
single GET.

Resources