Reactive approach of Marklogic database - spring-boot

Does Marklogic supports backpressure or allow to send data in chunks that is reactive approach ?

'Reactive' is a fairly new term describing a particular incarnation of old concepts common in server and database technologies, but fairly new to modern client and middle-tier programming.
I am assuming the question is prompted by the need/desire to work within an existing 'Reactive' framework (such as vert.x or Rx/Java). For that question, the answer is 'no' - there is not an 'official' API which integrates directly with these frameworks to my knowledge. There are community APIs which I have not personally used, an example is https://github.com/etourdot/vertx-marklogic (reactive, vert.x marklogic API).
MarkLogic is a 'reactive' design internally in that it implements the functionality the modern 'reactive' term is used to describe -- but does not expose any standard 'reactive APIs' for this (there are very few standards in this area). Code running within MarkLogic server (xquery,javascript) implicitly benefits from this - although there is not an explicit backpressure API, a side effect of single threaded blocking IO (from the app perspective) is that the equivalent of 'back pressure' is implemented by implicit flow control of the IO APIS - you cannot over drive a properly configured ML server on a single thread doing blocking IO. Connections to an overloaded server will take longer and eventually time out ('backpressure' :)
Similarly, (most of) the external APIs (REST, XCC) are also blocking, single threaded.
The server core manages rate control via a variety of methods such as actively managing the TCP connection queue size, keep alive times, numbers of active threads etc.
In general the server does a very good job at this without explicit low level programming needed, balancing the latency across all clients. If this needs improving, the administration guides have good direction on how to tune the various parameters so the system behaves well on its own.
If you want to implement a per-connection client aware 'reactive' API you will need to implement it yourself. This can be done using the same techniques used for other blocking IO APis -- i.e. either use multiple threads or non-blocking IO. Some of the ML SDK's have provision for non-blocking IO or control over timeouts which can be used to implement a 'reactive' API.
Similarly, code running in the server itself (XQuery or JavaScript) can implement 'reactive' type behaviour by making use of the task queue -- as exposed by the xdmp:spawn-xxx apis. This is done in many libraries to manage bulk ingest. Care must be taken to carefully control the amount of concurrency as you can easily overload the server by spawning too many concurrent requests. Managing state is a bit tricky as there is a interaction/opposition between the transaction model and task creation -- the former generally presenting an idempotent view of data that can be incongruous with the concept of 'current' wrt asynchronous tasks.

Related

KDB+/Q: GRPC implementation?

gRPC is a modern open source high performance RPC framework that can
run in any environment. It can efficiently connect services in and
across data centers with pluggable support for load balancing,
tracing, health checking and authentication. It is also applicable in
last mile of distributed computing to connect devices, mobile
applications and browsers to backend services.
I'm finding GRPC is becoming increasingly more pertinent in backend infrastructure, and would've liked to have it in my favorite language/tsdb kdb+/q.
I was surprised to find that kdb+ does not have a grpc implementation. Obviously, the (https://code.kx.com/q/interfaces/protobuf/)
package doesn't support the parsing of rpc's, is there anything quantitatively preventing there being a KDB+ implementation of the rpc requests/services etc. found in grpc?
Why would one not want to implement rpc's (grpc) in kdb+ and would it be a good idea to wrap a c++/c implemetation therin inorder to achieve this functionality.
Thanks for your advice.
Interesting post:
https://zimarev.com/blog/event-sourcing/myth-busting/2020-07-09-overselling-event-sourcing/
outlines event sourcing, which I think might be a better fit for kdb?
What is the main issue with services using RPC calls to exchange information? Well, it’s the high degree of coupling introduced by RPC by its nature. The whole group of services or even the whole system can go down if only one of the services stops working. This approach diminishes the whole idea of independent components.
In my practice I hardly encounter any need to use RPC for inter-service communication. Partially because I often use Event Sourcing, more about it later. But we always use asynchronous communication and exchange information between services using events, even without Event Sourcing.
For example, an order microservice in an e-commerce system needs customer data from the customer microservice. These dependencies between microservices are not ideal. Other microservices can go down and synchronous RESTful requests over https do not scale well due to their blocking nature. If there was a way to completely eliminate dependencies between microservices completely the result would be a more robust architecture with less bottlenecks.
You don’t need Event Sourcing to fix this issue. Event-driven systems are perfectly capable of doing that. Event Sourcing can eliminate some of the associated issues like two-phase commits, but again, not a requirement to remove the temporal coupling from your system.

Should we prefer SSE + REST over websocket when using HTTP/2?

When using websocket, we need a dedicated connection for bidirectionnel communication. If we use http/2 we have a second connection maintained by the server.
In that case, using websocket seems to introduce an unecessary overhead because with SSE and regular http request we can have the advantage of bidirectionnal communication over a single HTTP/2 connection.
What do you think?
Using 2 streams in one multiplexed HTTP/2 TCP connection (one stream for server-to-client communication - Server Sent Events (SSE), and one stream for client-to-server communication and normal HTTP communication) versus using 2 TCP connections (one for normal HTTP communication and one for WebSocket) is not easy to compare.
Probably the mileage will vary depending on applications.
Overhead ? Well, certainly the number of connections doubles up.
However, WebSocket can compress messages, while SSE cannot.
Flexibility ? If the connections are separated, they can use different encryptions. HTTP/2 typically requires very strong encryption, which may limit performance.
On the other hand, WebSocket does not require TLS.
Does clear-text WebSocket work in mobile networks ? In the experience I have, it depends. Antiviruses, application firewalls, mobile operators may limit WebSocket traffic, or make it less reliable, depending on the country you operate.
API availability ? WebSocket is a wider deployed and recognized standard; for example in Java there is an official API (javax.websocket) and another is coming up (java.net.websocket).
I think SSE is a technically inferior solution for bidirectional web communication and as a technology it did not become very popular (no standard APIs, no books, etc - in comparison with WebSocket).
I would not be surprised if it gets dropped from HTML5, and I would not miss it, despite being one of the first to implement it in Jetty.
Depending on what you are interested in, you have to do your benchmarks or evaluate the technology for your particular case.
From the perspective of a web developer, the difference between Websockets and a REST interface is semantics. REST uses a request/response model where every message from the server is the response to a message from the client. WebSockets, on the other hand, allow both the server and the client to push messages at any time without any relation to a previous request.
Which technique to use depends on what makes more sense in the context of your application. Sure, you can use some tricks to simulate the behavior of one technology with the other, but it is usually preferably to use the one which fits your communication model better when used by-the-book.
Server-sent events are a rather new technology which isn't yet supported by all major browsers, so it is not yet an option for a serious web application.
It depends a lot on what kind of application you want to implement. WebSocket is more suitable if you really need a bidirectional communication between server and client, but you will have to implement all the communication protocol and it might not be well supported by all IT infrastructures (some firewall, proxy or load balancers may not support WebSockets). So if you do not need a 100% bidirectional link, I would advise to use SSE with REST requests for additional information from client to server.
But on the other hand, SSE comes with certain caveats, like for instance in Javascript implementation, you can not overwrite headers. The only solution is to pass query parameters, but then you can face an issue with the query string size limit.
So, again, choosing between SSE and WebSockets really depends on the kind of application you need to implement.
A few months ago, I had written a blog post that may give you some information: http://streamdata.io/blog/push-sse-vs-websockets/. Although at that time we didn't consider HTTP2, this can help know what question you need to ask yourself.

What are the advantages of websocket APIs to middleware?

Some pieces of middleware support websockets natively e.g. HiveMQ: http://www.hivemq.com/mqtt-over-websockets-with-hivemq/. What advantages are conferred to a developer using the websockets API as a first class client to the middleware, rather than routing requests through an intermediary server that supports language specific APIs e.g.
Client -> Middleware
vs
Client -> Server -> Middleware
For example, we could argue that skipping an intermediary server will reduce bandwidth costs, not require a developer to write an extra layer, native SSL websockets support?
What other advantages might be provided to not just a developer, but any party through providing websockets support for middleware?
The main advantage you get is simplicity and in case of HiveMQ, scalability.
Let me explain these advantages:
Simplicity
In case of HiveMQ, you just start the server and you are good to go. All web applications which use a MQTT library over websockets can connect to the server without even knowing that websockets as transport is used. For HiveMQ itself, it's just another MQTT client. So it doesn't matter if the clients are connected via websockets or via a classic TCP connection. I think you already mentioned the other arguments in your question. And of course last but not least the operations guys will thank you if they have one system (in your case the "Server") less to maintain.
Scalability
Software like HiveMQ is very scalable and it can handle up to hundreds of thousands of concurrent connected clients. The chance is high, that the additional layer ("Server" in your case) could introduce a bottleneck. Also, things like load balancing with a HW or SW load balancer gets a lot easier if you can throw out unneeded layers. In general, your architecture of your system will get a lot of easier if you don't need these additional layers (which are not services which can be reused for other applications, like microservices are).
Last but not least it's worth noting, that HiveMQ itself is often integrated with classic middleware / ESBs. That means, people write custom plugins for integrating HiveMQ to their existing middleware. JMS or webservice calls (REST, SOAP) are often used for doing that.
Take that answer with a grain of salt, since I'm involved developing HiveMQ :-)

Replace ZeroMQ's select() on windows

It is unbelievable that ZeroMQ uses select() on Windows, I didn't know that until I have completes my code and started performance test. They should present this information on their web site with big red font.
Is there anyway to replace ZeroMQ's select()?
IOCP is proactor model and can't be easily integrated into it, how about WSAEventSelect, this is also a reactor model and have a near performance like poll.
Another choice for me is http://nanomsg.org/, but it is still alpha.
One of the main objectives in Zeromq is to provide a consistent API for communication between threads, processes, nodes, and clusters. Protocol specific optimization is outside of this scope because of the ways that it can effect other areas of communication. For example, shared memory would be a better form of IPC, but UNIX domain sockets make a consistent API easier. It would also be nice to know when an endpoint disconnects, but how would you implement such behavior between threads?
Their main goal is to allow every pattern to work the same way regardless of topology, protocol, system, or language, to the point that any mixture can be used regardless of how odd it may seem (node.js Websockets communicating with C# brokers passing messages to Ruby and PHP workers which share work with java threads, etc.)
Each of it's features would be enhanced greatly if optimised for each specific protocol and system, but that would also make uniform patterns close to impossible.
BTW, they might accept a pactch if you could find a way to implement iocp while still maintaining this versatility and neutrality.
PPS, nanomsg is made by one of the main original developers of Zeromq. Crossroads.IO is a direct fork of Zeromq, by original Zeromq developers as well and including some developers of nanomsg. if I'm not mistaken, Nano will likely become the core of crossroads when complete.

Advice on using ZeroMQ

I'm developing a new client-server app (.Net) and have up until now been using WCF, which suits the app's request-response approach nicely. However I've been asked to replace this with a socket-based solution, partly to support non-.Net clients, and future pub-sub/broadcast requirements (I realise WCF is capable, but there are other drivers behind the decision). Having failed miserably at writing my own async socket solution, I'm now looking at ZeroMQ.
My client app has a couple of background threads that periodically request data from the server. Additionally, certain UI actions (e.g. a button click) can trigger a message to the server. WCF made this easy - the code simply called the relevant method on a singleton WCF service proxy (actually I use the Castle Windsor WCF facility which gives me async calling capabilities, but that's probably irrelevant to my question).
I'm not too sure how this approach would translate to ZeroMQ, particularly with regards to managing the sockets - I'm very new to ZeroMQ and still reading the guide. Am I right in saying that I'll need a separate socket for each thread (i.e. the two b/g threads and the UI)? What about socket lifetime - do I create one each time I want to send/receive (presumably inefficient), or create the socket when the thread starts and reuse it for the entire lifetime of the thread?
One thing has to be very clear. ZMQ sockets can connect and talk to ZMQ sockets only.
This means that if I am building an distributed application whose components communicate to each other, I have liberty to choose any communication approach as external clients are not exposed to it.
Choosing ZMQ Sockets for such means is a good idea. It allows you to instantly build on many communication patterns like req/rep, push/pull, pub/sub etc and also build more complicated topologies using ZMQ devices.
How ever, This constraint is not to be taken lightly when external clients are concerned. This will enforce all external clients to use ZMQ sockets which might not be ideal. If one of the client happens to be a browser consuming your web services then you will need to provide services through regular client.
Is your client app using regular sockets?
Can it be re-written to use ZMQ sockets?
if not then don't use ZMQ sockets for external interface
but only for your internal component communication.
[Edit: Further notes]
ZMQ is a wrapper over sockets but that does a few things which are hard to get done by hand
It manages messaging at higher throughput by batching multiple messages at the same time
Optimizes use of socket at the same time
A socket can send messages to only one another socket, ZMQ socket can connect to multiple ZMQ sockets
ZMQ socket based solution can take immediate advantage of various patterns - REQ/REP, PUSH/PULL, PUB/SUB etc
How ever, it is common to mistake ZMQ to be a messaging queue.
Messaging Queue as available has other properties like message persistence and delivery guarantee etc by implementing a queue for storage.
ZMQ stands for "Zero Messaging Queue"
I have only been learning ZMQ in recent times and have been very happy to use it.
Checkout my mini tutorial on ZMQ and See if it make sense for you to use it:
http://learning-0mq-with-pyzmq.readthedocs.org/en/latest/
Regarding castle integration, check out what Henry's done on his fork:
https://github.com/hconceicao/clrzmq2/tree/master/src/integration

Resources