Is WebSocket single-threaded only, and does golang have low performance with WebSocket? - go

https://matttomasetti.medium.com/websocket-performance-comparison-10dc89367055
This article says that WebSocket uses a single thread only; is that true?
It also shows Go performing very poorly, but is that the case for servers in general?
I'm sorry if this is a really ignorant question. However, even after searching, I could not tell whether this is true.

Related

Interview question: How do you scale or optimize a microservice which receives millions of requests?

I was asked a question during an interview:
How do you optimize a microservice which receives millions of requests?
How do you optimize the latency of a response from a service that is accessed many times?
My answer was:
I would check the DB query which makes the response slow and then configure the cache.
Can anyone let me know what other ways a service can be optimized besides these? Is there anything on the cloud side?
It is a vast and complex question which can have a lot of different (and very long) answers depending on the context and the structure of your environment.
There are a lot of patterns and concepts which fit different scenarios and architectures.
I would suggest you start here: https://microservices.io/patterns/index.html
The person behind the site (Chris Richardson) has been advocating microservices for a long time; you can find numerous talks by him on YouTube. It is a great way to start your journey into the microservice world.
And of course: https://martinfowler.com/articles/microservices.html
Here are my ideas on this question:
Caching, as you mentioned, is a good optimization point around the data access layer. I would check whether there is any opportunity to add a cache without breaking consistency or other hard requirements the application may have; a minimal sketch of this idea follows this list.
I would analyze the CPU and memory usage and adjust them progressively while monitoring the latency closely. The objective here is to find the point where more resources do not decrease the latency significantly.
The above point should also take into account adjusting the number of threads in the application, and making sure the synchronization scheme is optimized.
If the microservice is built in a JVM-based language, the GC is one component that may introduce latency, mostly when it kicks in, so if the latency spikes correlate with GC cycles, I would look for optimizations there.
Making sure the application reuses connections to external services efficiently is another optimization point worth considering.
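As a rough illustration of the caching idea in the first point, here is a minimal C++ sketch, assuming an in-process cache with a fixed TTL is acceptable; the TtlCache name and the TTL policy are invented for the example, not taken from any particular framework:

```cpp
#include <chrono>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical read-through cache placed in front of a slow lookup
// (e.g. the DB query mentioned above). Entries expire after a fixed
// TTL so the cache cannot serve stale data indefinitely.
class TtlCache {
public:
    explicit TtlCache(std::chrono::seconds ttl) : ttl_(ttl) {}

    std::optional<std::string> get(const std::string& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(key);
        if (it == entries_.end()) return std::nullopt;
        if (std::chrono::steady_clock::now() - it->second.stored > ttl_) {
            entries_.erase(it);  // expired: treat as a miss
            return std::nullopt;
        }
        return it->second.value;
    }

    void put(const std::string& key, std::string value) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_[key] = {std::move(value), std::chrono::steady_clock::now()};
    }

private:
    struct Entry {
        std::string value;
        std::chrono::steady_clock::time_point stored;
    };
    std::chrono::seconds ttl_;
    std::mutex mutex_;
    std::unordered_map<std::string, Entry> entries_;
};
```

On each request the handler would call get() first and run the slow query only on a miss, followed by put(); whether any TTL is safe at all depends on the consistency requirements mentioned above.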

How does MPI communication work?

I am trying to play with parallel computing, and I've started to study the MPI standard.
I tried to find information about the low-level implementation, but unfortunately I am still searching for it.
I am able to understand all the great high-level concepts like ranks, communicators and so on. That part is not hard, but when I try to understand something I always look for the low-level details in order to understand how it works under the hood.
So could someone explain to me what low-level protocols are used for communication? Is it done over the LAN, shared memory, domain sockets, or some other communication means?
I would be grateful for any details, especially about the low level.

Using ZMQ for bidirectional inter-thread communication

I am new to ZeroMQ. I have spent the last couple of months reading the documentation and experimenting with the library. I am currently developing a multi-threaded C++ application and want to use ZeroMQ instead of mutexes to exchange data between my main thread and one of its child threads.
The child thread handles the communication with an external application. Therefore, I will need two queues/sockets between the main thread and its child: one for outgoing messages and one for incoming messages.
Which ZMQ socket type should I use in order to achieve this?
Thanks in advance.
By moving from using shared memory and mutexes to using ZeroMQ, you are entering the realm of Actor model programming.
This, in my opinion, is a fairly good thing. However, there are some things to be aware of.
The only reason mutexes are no longer needed is because you are copying data, not sharing it. The 'cost' is that copying a lot of data takes a lot longer than locking a mutex that points to shared data. So you can end up with a nice looking Actor model program that runs like a dog in comparison to an equivalent program that uses shared memory / mutexes.
A caveat is that on complicated architectures like Intel Xeons with multiple CPUs, accessing shared memory can, conceivably, take just as long as copying it. This is because access may (depending on how lucky you've been) involve transactions across the QPI bus. Actor model programming is ideal for NUMA hardware architectures. Modern Intel and AMD architectures are, partially/fundamentally, NUMA, but the protocols they run over QPI / Hypertransport "fake" an SMP environment.
I would avoid ZMQ_PAIR sockets wherever practicable. They don't work across network connections. This means that if, for any reason, your application needs to scale across multiple computers you have to re-write your code. However, if you use different socket types from the very beginning, a scale-up of your application is nothing more than a matter of redeploying your code, not changing it. FYI nanomsg PAIRs do not have this restriction.
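To make the socket-type advice concrete, here is a minimal sketch of the asker's two-pipe layout, assuming two one-way PUSH/PULL flows over the inproc:// transport; the endpoint names to_child and from_child are invented for the example. Because these are not PAIR sockets, moving the child into another process or machine later is mostly a matter of changing inproc:// to tcp://:

```cpp
#include <zmq.h>
#include <cstdio>
#include <thread>

// Sketch: main thread and child thread exchange messages over two
// one-way PUSH/PULL pipes. Both threads must share the same zmq
// context for the inproc:// transport to work.
int main() {
    void* ctx = zmq_ctx_new();

    // Main thread binds both endpoints before the child connects.
    void* to_child   = zmq_socket(ctx, ZMQ_PUSH);
    void* from_child = zmq_socket(ctx, ZMQ_PULL);
    zmq_bind(to_child,   "inproc://to_child");
    zmq_bind(from_child, "inproc://from_child");

    std::thread child([ctx]() {
        void* rx = zmq_socket(ctx, ZMQ_PULL);
        void* tx = zmq_socket(ctx, ZMQ_PUSH);
        zmq_connect(rx, "inproc://to_child");
        zmq_connect(tx, "inproc://from_child");

        char buf[256];
        int n = zmq_recv(rx, buf, sizeof buf - 1, 0);     // blocking receive
        if (n > (int)sizeof buf - 1) n = sizeof buf - 1;  // message was truncated
        buf[n] = '\0';
        std::printf("child got: %s\n", buf);
        zmq_send(tx, "ack", 3, 0);

        zmq_close(rx);
        zmq_close(tx);
    });

    zmq_send(to_child, "hello", 5, 0);
    char buf[256];
    int n = zmq_recv(from_child, buf, sizeof buf - 1, 0);
    if (n > (int)sizeof buf - 1) n = sizeof buf - 1;
    buf[n] = '\0';
    std::printf("main got: %s\n", buf);

    child.join();
    zmq_close(to_child);
    zmq_close(from_child);
    zmq_ctx_term(ctx);
    return 0;
}
```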
Don't for one moment assume that Actor model programming is going to solve all your problems. It brings in a whole suite of problems all of its own. You can still deadlock, livelock, spinlock, etc. The problem with Actor model programmes is that these problems can be lurking in your code for years and never happen, until one day the network is just a little bit busier and -bam- your program stops running...
However, there is a development of Actor model programming called "Communicating Sequential Processes" (CSP). This doesn't remove those problems, but if your program contains them, CSP guarantees they happen every single time. So you discover the problem during development and testing, not five years later. There's also a process calculus for it, i.e. you can algebraically prove that your design is problem-free before you ever write a single line of code. ZeroMQ is not CSP. Interestingly, CSP is making something of a comeback - the Rust and Go languages both do CSP. However, they do not do CSP across network connections - it's all in-process stuff. Erlang does CSP too, and AFAIK does it across network connections.
Assuming you've read all that about CSP and are still going to use ZeroMQ, think carefully about what it is you are planning on sending across the ZeroMQ sockets. If it's all within one program on the same machine, then sending copies of, for example, arrays of integers is fine. They'll still be interpretable as integers at the receiving end. However, if you have aspirations to send data through ZMQ sockets to another computer it's well worth considering some sort of serialisation technology. ZeroMQ delivers messages. Why not make those messages the byte stream from an object serialiser? Then you can guarantee that the received message will, after de-serialisation, mean something appropriate at the receiving end, instead of having to solve problems with endianness, etc.
Favourite serialisers for me include Google Protocol Buffers. It is language / operating system agnostic, giving lots of options for a heterogeneous system. ASN.1 is another really good option: implementations are available for most of the important languages, it has a rich set of wire formats (including XML and, now/soon, JSON, which gives some interesting inter-op options), and it does constraints (something Google PBufs don't do), but it does tend to cost money if one wants really good tools for it. XML can be understood by almost anything, but it is bloated. Basically it's worth picking one that doesn't tie you down to using, say, C# or Python everywhere.
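To illustrate the endianness point without pulling in a full serialiser, here is a hand-rolled sketch, assuming a wire format of 32-bit integers in network byte order; the encode/decode helpers are my own invention, and in practice one of the serialisers above would define the format for you:

```cpp
#include <arpa/inet.h>  // htonl / ntohl (POSIX)
#include <cstdint>
#include <cstring>
#include <vector>

// Encode a vector of 32-bit ints into a byte buffer in network
// (big-endian) order so the receiver's endianness no longer matters.
std::vector<uint8_t> encode(const std::vector<uint32_t>& values) {
    std::vector<uint8_t> buf(values.size() * 4);
    for (size_t i = 0; i < values.size(); ++i) {
        uint32_t be = htonl(values[i]);
        std::memcpy(&buf[i * 4], &be, 4);
    }
    return buf;
}

// Decode the same wire format back into host-order integers.
std::vector<uint32_t> decode(const uint8_t* data, size_t len) {
    std::vector<uint32_t> values(len / 4);
    for (size_t i = 0; i < values.size(); ++i) {
        uint32_t be;
        std::memcpy(&be, data + i * 4, 4);
        values[i] = ntohl(be);
    }
    return values;
}
```

The byte buffer returned by encode() is what would go into zmq_send(); the receiver runs decode() on the message body and gets the same integers back regardless of architecture.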
Good luck!

Detecting when ZMQ_RATE limit or ZMQ_SNDHWM have been reached

Is there a way to programmatically know when a PGM ZeroMQ socket has stopped forwarding information because the ZMQ_RATE limit has been reached, or when it is dropping data because the ZMQ_SNDHWM limit has been reached? There is a zmq_socket_monitor function call that allows the user to see events like client connect and client disconnect. I am thinking that there should be a similar construct for the rate limit.
Q :"Is there a way to programatically know when a ... zeromq socket has stopped forwarding ... because ..." ?
A : to the best of my knowledge and knowing limits thereof, there is no such way implemented so far.
If bets on reasoning for not having a way, I'd put my few cents on
(a)such feature having zero-( if not negative )-effects on ZeroMQ primary goals, that are maximum performance, minimised resources needs to achieve performance and minimum latency
(b)anyone, capable of providing a stable & acceptable implementation of this feature into the core & c-API could've been warm welcome to implement it, yet no one has put one's efforts into developing & testing & accepting it so far, so we stay in 2022-Q1 still in square number one
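For completeness, here is a sketch of the connection-level monitoring that does exist today via zmq_socket_monitor; the endpoint name inproc://monitor and the watch() helper are my own choices for the example. Note there is no ZMQ_EVENT_* for rate-limit stalls or high-water-mark drops, which is exactly the gap described above:

```cpp
#include <zmq.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Attach a monitor to a socket and print its connection-level events.
// Intended to run in its own thread; it blocks on each event.
void watch(void* ctx, void* socket_to_watch) {
    // Publish all events for the watched socket on an inproc endpoint.
    zmq_socket_monitor(socket_to_watch, "inproc://monitor", ZMQ_EVENT_ALL);

    void* mon = zmq_socket(ctx, ZMQ_PAIR);
    zmq_connect(mon, "inproc://monitor");

    while (true) {
        // Frame 1: 16-bit event id followed by a 32-bit event value.
        zmq_msg_t frame;
        zmq_msg_init(&frame);
        if (zmq_msg_recv(&frame, mon, 0) == -1) break;
        uint16_t event;
        std::memcpy(&event, zmq_msg_data(&frame), sizeof event);
        zmq_msg_close(&frame);

        // Frame 2: the endpoint address the event relates to.
        zmq_msg_init(&frame);
        zmq_msg_recv(&frame, mon, 0);
        std::printf("event 0x%04x on %.*s\n", event,
                    (int)zmq_msg_size(&frame), (char*)zmq_msg_data(&frame));
        zmq_msg_close(&frame);
    }
    zmq_close(mon);
}
```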

Gauging a web browser's bandwidth

Is it possible to gauge a web browser's upload and/or download speed by monitoring normal http requests? Ideally a web application would be able to tell the speed of a client without any modifications and without client-side scripting like JavaScript/Java/Flash. So even if a client was accessing the service with a library like curl it would still work. If this is possible, how? If it's not possible, why not? How accurate can this method be?
(If it helps, assume PHP/Apache, but really this is a platform-independent question. Also, being able to gauge the upload speed is more important to me.)
Overview
You're asking for what is commonly called "passive" available bandwidth (ABW) measurement along a path (versus measuring a single link's ABW). There are a number of different techniques [1] that estimate bandwidth using passive observation, or with low-bandwidth "active" ABW probing. However, the most common algorithms used in production services are active ABW techniques; they observe packet streams from two different end-points.
I'm most familiar with yaz, which sends packets from one side and measures variation in delay on the other side. The one-sided passive path ABW measurement techniques are considered more experimental; there aren't solid implementations of the algorithms AFAIK.
Discussion
The problem with the task you've asked for is that all non-intrusive [2] ABW measurement techniques rely on timing. Sadly, timing is a very tricky thing when working with http...
You have to deal with the reality of object caching (for instance, akamai) and http proxies (which terminate your TCP session prematurely and often spoof the web-server's IP address to the client).
You have to deal with web-hosts which may get intermittently slammed
Finally, active ABW techniques rely on a structured packet stream (wrt packet sizes and timing), unlike what you see in a standard http transfer.
Summary
In summary, unless you set up a dedicated client / server / protocol just for ABW measurement, I think you'll be rather frustrated with the results. You can keep your ABW socket connections on TCP/80, but the tools I have seen won't use http [3].
Editorial note: My original answer suggested that ABW with http was possible. On further reflection, I changed my mind.
END-NOTES:
---
1. See Sally Floyd's archive of end-to-end TCP/IP bandwidth estimation tools.
2. The most common intrusive techniques (such as speedtest.net) use a flash or java applet in the browser to send & receive 3-5 parallel TCP streams to each endpoint for 20-30 seconds. Add up the streams' average throughput (not including lost packets requiring retransmission) over time, and you get that path's tx and rx ABW. This is obviously pretty disruptive to VoIP calls, or any downloads in progress. Disruptive measurements are called bulk transfer capacity (BTC); see RFC 3148: A Framework for Defining Empirical Bulk Transfer Capacity Metrics. BTC measurements often use HTTP, but BTC doesn't seem to be what you're after.
3. That is good, since it removes the risk of in-line caching by denying http caches an object to cache; although some tools (like yaz) are udp-only.
Due to the way TCP connections adapt to available bandwidth, no, this is not possible. Requests are small and typically fit within one or two packets. You need at least a dozen full-size packets to get even a coarse bandwidth estimate, since TCP first has to scale up to the available bandwidth ("TCP slow start"), and you need to average out jitter effects. If you want any accuracy, you're probably talking hundreds of packets required. That's why upload rate measurement scripts typically transfer several megabytes of data.
OTOH, you might be able to estimate the round-trip delay from the three-way handshake and the timing of ACKs, but download speed has at least as much impact on those timings as upload speed.
There's no support in JavaScript or any browser component to measure upload performance.
The only way I can think of is this: if you are uploading to a page/http handler that receives the incoming bytes, it can measure how many bytes it receives per second, and store that in some application-wide dictionary keyed by session ID.
Then, from the browser, you can periodically poll the server for the value in the dictionary using the session ID and show it to the user. This way you can tell what the upload speed is.
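A rough sketch of that server-side idea, using C++ for illustration; the UploadMeter name and its wiring are hypothetical, and a real deployment would hold one meter per session ID in the application-wide dictionary described above:

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical per-session rate meter: the handler draining the upload
// calls note() for every chunk it reads, and a status-poll endpoint
// reports bytes_per_second() back to the browser.
class UploadMeter {
public:
    void note(std::size_t bytes) {
        if (total_ == 0) start_ = std::chrono::steady_clock::now();
        total_ += bytes;
    }

    double bytes_per_second() const {
        using namespace std::chrono;
        double elapsed = duration_cast<duration<double>>(
            steady_clock::now() - start_).count();
        return elapsed > 0 ? total_ / elapsed : 0.0;
    }

private:
    std::size_t total_ = 0;
    std::chrono::steady_clock::time_point start_{};
};
```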
You can use AJAXOMeter, a JavaScript library which measures your upload and download speed. You can see a live demo here.
That is not feasible in general, as inbound and outbound bandwidth are frequently not symmetric. Different ISPs have significantly different ratios here, which can vary even with the time of day.
