I am working on a WebSocket application with STOMP and RabbitMQ as the broker. It is a chat-like application. I tried dumping thousands of messages to check whether the socket would crash or the UI would flicker.
JavaScript code:
for (let i = 1; i <= 1000; i++) { // note: i < 1000 would only send 999 messages
    if (stompClient) {
        const chatMessage = {
            sender: name,
            content: "File Processing at " + i + " %",
            type: 'Process'
        };
        stompClient.send("/app/chat.sendMessage", {}, JSON.stringify(chatMessage));
    }
}
I followed this tutorial.
Here is what I observed:
A fresh session sends and receives messages within seconds, but if I send again and again, the time taken increases every run: 1 s, 3 s, 5 s, 20 s, up to 1 min 30 s for the same 1,000 messages.
After sending 4-5 times it takes more than 30 s. Once the messages are finally received, the WebSocket session is terminated, and sometimes flicker is observed.
Reconnecting to the WebSocket after the timeout makes sending and receiving take even longer than before, and flickering is observed. The UI gets stuck for a while.
I am very new to this concept. Can anyone tell me why this is happening and how to overcome it?
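One common mitigation is to pace the sends instead of firing all 1,000 in a tight loop, which floods both the broker and the browser's event loop. Below is a minimal sketch; `stompClient` and the destination come from the question, while `chunk`, `sendPaced`, and the batch/interval values are illustrative names, not part of any STOMP API:

```javascript
// Split an array into batches of `size` (pure helper).
function chunk(items, size) {
    const out = [];
    for (let i = 0; i < items.length; i += size) {
        out.push(items.slice(i, i + size));
    }
    return out;
}

// Send messages in paced batches instead of one tight loop.
// `stompClient` is assumed to be a connected STOMP client as in the question.
function sendPaced(stompClient, messages, batchSize, intervalMs) {
    const batches = chunk(messages, batchSize);
    let next = 0;
    const timer = setInterval(() => {
        if (next >= batches.length) {
            clearInterval(timer);
            return;
        }
        for (const m of batches[next++]) {
            stompClient.send("/app/chat.sendMessage", {}, JSON.stringify(m));
        }
    }, intervalMs);
}
```

With e.g. `sendPaced(stompClient, messages, 50, 100)` the same 1,000 messages go out in 50-message bursts every 100 ms, which gives the broker and the rendering side room to breathe between bursts.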
Related
I am trying to receive messages from a queue and have experimented with different receive modes, but I am facing performance issues. Below are the metrics for each type of run:
Receive mode = peek and lock: 1,000 messages took 2.5 minutes, as I had to complete each message one by one.
Receive mode = receive and delete: 1,000 messages took 1.5 minutes on average.
Receive mode = receive and delete (prefetch count 100): 1,000 messages took 3 seconds, but I lost the 100 messages still sitting in the prefetch buffer when execution ended.
Receive mode = peek and lock (prefetch count 100): 1,000 messages took 2 minutes, as I again had to complete each message individually. It would have solved the problem only if there were a way to complete them in batch.
Below is my code for reference:
ServiceBusSessionReceiverClient sessionReceiverClient = new ServiceBusClientBuilder()
        .connectionString(System.getenv("QueueConnectionString"))
        .sessionReceiver()
        .maxAutoLockRenewDuration(Duration.ofMinutes(2))
        .receiveMode(ServiceBusReceiveMode.PEEK_LOCK)
        .queueName(queueName)
        .buildClient();
ServiceBusReceiverClient receiverClient = sessionReceiverClient.acceptSession(System.getenv("QueueSessionName"));
ObjectMapper objectMapper = new ObjectMapper();
do {
    receiverClient.receiveMessages((int) prefetchCount).stream().forEach(message -> {
        try {
            final T dataDto = objectMapper.readValue(message.getBody().toString(), returnType);
            dataDtoList.add(dataDto);
            receiverClient.complete(message); // settle one message at a time
        } catch (Exception e) {
            AzFaUtil.getLogger().severe("Message processing failed. Error: " + e.getMessage() + e
                    + "\n Payload: " + message);
        }
    });
} while (dataDtoList.size() < numberOfMessages);
receiverClient.close();
sessionReceiverClient.close();
Possible solutions that I can think of:
If there is a way to complete messages in batch instead of completing 1 by 1.
If there is a way to requeue the messages back to the queue which are sitting in the prefetch buffer.
Note: this API needs to be synchronous. I just experimented with 1,000 entries, but I am working with 30,000, so performance matters. Also, the queue is both session-enabled and partition-enabled.
As per this issue, Microsoft has yet to test the performance of their Service Bus. Since FIFO (first in, first out) was a requirement for my message queue, I used JMS instead, and it performed almost 10x faster on average, but there is a drawback: JMS currently doesn't support session-based queues, so I had to disable sessions, and to still guarantee FIFO I also had to disable partitioning on the queue. This is a partial, temporary solution for better performance until Microsoft either improves the performance of their ServiceBusReceiverClient or enables sessions on JMS.
We set the reconnectionDelayMax (docs-ref) to max. 10 seconds.
So when a user is offline for a while, socket.io will retry only every 10 seconds, which is good, but it also means that a user may have to wait up to 10 seconds after coming back online.
Which is basically also okay, but can be improved in some cases:
We want to listen to the browser's online event and then "reset" the reconnection logic, i.e. try to reconnect immediately (with the initial reconnectionDelay).
Is this possible with socket.io (client v2.x)?
You could try adding something like this; it fires when the browser goes online and (re)connects the socket:
const socket = io('http://localhost:3000'); // one socket instance, reused
window.addEventListener('online', function () {
    if (socket.disconnected) {
        socket.connect(); // retry immediately instead of waiting for the backoff
    }
});
When I send a message to the broker, this exception occasionally occurs:
MQBrokerException: CODE: 2 DESC: [TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while
This means the broker is too busy (when TPS > 1,5000) to handle so many send requests.
What would be the most likely reason causing this? Disk, CPU, or something else? How can I fix it?
There are many possible causes.
The root cause is that some messages have been waiting for a long time with no worker thread processing them, so RocketMQ triggers its fast failure.
So the causes are:
1. Too many threads are working, and they process message storing so slowly that cached requests time out.
2. The storing jobs themselves take a long time to process a message.
This may be because:
2.1 Storing messages is busy, especially when SYNC_FLUSH is used.
2.2 Syncing messages to the slave takes long when SYNC_MASTER is used.
In
/broker/src/main/java/org/apache/rocketmq/broker/latency/BrokerFastFailure.java you can see:
final long behind = System.currentTimeMillis() - rt.getCreateTimestamp();
if (behind >= this.brokerController.getBrokerConfig().getWaitTimeMillsInSendQueue()) {
    if (this.brokerController.getSendThreadPoolQueue().remove(runnable)) {
        rt.setStopRun(true);
        rt.returnResponse(RemotingSysResponseCode.SYSTEM_BUSY, String.format("[TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while, period in queue: %sms, size of queue: %d", behind, this.brokerController.getSendThreadPoolQueue().size()));
    }
}
In common/src/main/java/org/apache/rocketmq/common/BrokerConfig.java, the getWaitTimeMillsInSendQueue() method returns:
public long getWaitTimeMillsInSendQueue() {
    return waitTimeMillsInSendQueue;
}
The default value of waitTimeMillsInSendQueue is 200 (ms), so you can simply set it higher to let requests wait longer in the queue. But if you want to solve the problem completely, you should follow Jaskey's advice and check your code.
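For example, in the broker's configuration file (the 400 ms value below is illustrative, not a tested recommendation):

```properties
# broker.conf
# Default is 200 (ms); raise it so queued send requests can wait longer
# before the fast-failure path returns SYSTEM_BUSY.
waitTimeMillsInSendQueue=400
```

Raising it only buys time in the queue; if the broker is persistently overloaded, the flow control will still kick in.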
I have a really weird problem that is driving me crazy.
I have a Ruby server and a Flash client (ActionScript 3). It's a multiplayer game.
The problem is that everything works perfectly, and then, suddenly, a random player stops receiving data. When the server closes the connection because of inactivity, about 20-60 seconds later, the client receives all the buffered data at once.
The client uses XMLSocket to retrieve data, so the way the client receives data is not the problem.
socket.addEventListener(Event.CONNECT, connectHandler);

function connectHandler(event)
{
    sendData(sess);
}

function sendData(dat)
{
    trace("SEND: " + dat);
    addDebugData("SEND: " + dat);
    if (socket.connected) {
        socket.send(dat);
    } else {
        addDebugData("SOCKET NOT CONNECTED");
    }
}

socket.addEventListener(DataEvent.DATA, dataHandler);

function dataHandler(e:DataEvent) {
    var data:String = e.data;
    workData(data);
}
The server flushes data after every write, so it is not a flushing problem:
sock.write(data + DATAEOF)
sock.flush()
DATAEOF is a null char, which the client uses to delimit and parse each string.
When the server accepts a new socket, it sets sync to true (autoflush) and enables TCP_NODELAY:
newsock = serverSocket.accept
newsock.sync = true
newsock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, true)
This is my research (I was dumping netstat data to a file each second):
When the client stops receiving data, netstat shows that the socket status is still ESTABLISHED.
Some seconds after that, the send queue grows in step with the data sent.
tcpflow shows that packets are sent twice.
When the server closes the socket, the socket status changes to FIN_WAIT1, as expected. tcpflow then shows all the buffered data being sent to the client, but the client doesn't receive it. Some seconds later, the connection disappears from netstat, and tcpflow shows the same data being sent again; this time the client receives it, starts sending data to the server, and the server receives it. But it's too late: the server has already closed the connection.
I don't think it's an OS/network problem, because I've moved from a VPS located in Spain to Amazon EC2 in Ireland and the problem remains.
I don't think it's a client network problem either, because this occurs dozens of times per day, with an average of 45-55 users online and about 400 unique users a day, so the ratio is extremely high.
EDIT:
I've done more research and have rewritten the server in C++.
When a client stops sending data, after a while the server gets a "Connection reset by peer" error. At that moment, tcpdump shows that the client sent an RST packet. This could be because the client closed the connection and the server then tried to read, but why would the client close the connection? I think the client is not the one closing the connection; the kernel is. Here is some info: http://scie.nti.st/2008/3/14/amazon-s3-and-connection-reset-by-peer
Basically, as I understand it, Linux kernels 2.6.17+ increased the maximum size of the TCP window/buffer, and this started to cause other gear to wig out if it couldn't handle sufficiently large TCP windows. The gear would reset the connection, and we see this as a "Connection reset by peer" message.
I've followed the steps and now it seems that the server closes connections only when the client loses its internet connection.
I'm going to add this as an answer so people know a bit more about this.
I think the answer is that the kernel is the one closing the connection. Here is some info: http://scie.nti.st/2008/3/14/amazon-s3-and-connection-reset-by-peer
Basically, as I understand it, Linux kernels 2.6.17+ increased the maximum size of the TCP window/buffer, and this started to cause other gear to wig out, if it couldn’t handle sufficiently large TCP windows. The gear would reset the connection, and we see this as a “Connection reset by peer” message.
I've followed the steps and now it seems that the server closes connections only when the client loses its internet connection.
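For reference, the "steps" in the linked article are kernel tweaks that cap the TCP send/receive buffers so the advertised window never grows beyond what broken intermediate gear can handle. It amounts to something along these lines in /etc/sysctl.conf (the exact values are illustrative, a starting point rather than a tested recommendation):

```
# min / default / max buffer sizes in bytes; the max (last) value
# is what caps the TCP window offered to the peer
net.ipv4.tcp_rmem = 4096 87380 512000
net.ipv4.tcp_wmem = 4096 16384 512000
```

Apply with sysctl -p and monitor whether the resets stop before treating this as a fix.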
How long can the browser wait before showing an error if the server doesn't answer a request? Can this time be unlimited?
If you are using a jQuery $.ajax call, you can set the timeout property to control how long a request waits before it returns with a timeout status. The timeout is set in milliseconds, so just set it to a very high value. You can also set it to 0 for "unlimited", but in my opinion you should just set a high value instead.
Note: unlimited is actually the default, but most browsers have their own default timeouts that will be hit first.
When an ajax call is returned due to timeout it will return with an error status of "timeout" that you can handle with a separate case if needed.
So if you want to set a timeout of 3 seconds and handle the timeout, here is an example:
$.ajax({
    url: "/your_ajax_method/",
    type: "GET",
    dataType: "json",
    timeout: 3000, // set your timeout value in milliseconds, or 0 for unlimited
    success: function(response) { alert(response); },
    error: function(jqXHR, textStatus, errorThrown) {
        if (textStatus === "timeout") {
            alert("Call has timed out"); // handle the timeout
        } else {
            alert("Another error was returned"); // handle other error types
        }
    }
});
Yes and no. Yes, the server can do it or be configured to do so; no, browsers (I don't know about version/vendor specifics) may have their own timeouts enabled.
There are 2 solutions for achieving/emulating this over HTTP:
1. If this is simply a long-running script and you're waiting for results, this isn't the way to go. Instead, as the previous poster mentioned, use async processing with the client polling the server for the results; that is a much more surefire solution. For example, a server-side thumbnail script for an image processor: the user uploads an image, the server immediately returns a 200 and a "Job ID", and the client (JavaScript) can then use the job ID to request the job status/result.
2. If your goal is something like a real-time connection between browser and server (HTTP is one-way: once the browser makes a request, no further info can be sent without a new request), this is called long polling / reverse Ajax, and it can be used for real-time communication over HTTP. There are several techniques using two long-polled requests in parallel, so that once one of them times out, the second becomes the active one and the first attempts to reconnect.
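As an illustration of the long-polling idea, here is a minimal client-side sketch. The /poll endpoint, the handler names, and the backoff values are all assumptions made up for the example, not part of any particular API:

```javascript
// Capped exponential backoff between failed reconnect attempts
// (pure helper, so the retry schedule is easy to reason about).
function nextDelay(attempt, baseMs = 500, maxMs = 10000) {
    return Math.min(baseMs * 2 ** attempt, maxMs);
}

// One long-poll loop: the server holds each request open until it has
// data (or hits its own timeout), then the client immediately re-requests.
async function longPoll(url, onMessage, attempt = 0) {
    try {
        const res = await fetch(url);
        if (!res.ok) throw new Error("HTTP " + res.status);
        onMessage(await res.json());
        longPoll(url, onMessage, 0); // success: reconnect with no delay
    } catch (err) {
        // failure: back off before retrying, up to the cap
        setTimeout(() => longPoll(url, onMessage, attempt + 1), nextDelay(attempt));
    }
}
```

Usage would be something like `longPoll('/poll', msg => render(msg))`; the parallel-request variant from point 2 simply runs two of these loops offset in time.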
Can you explain a bit more about what you're trying to achieve - do you have a long running process on a server, do you want to change the settings on just a local machine or are you after a way to manage it for large numbers of users?
How long the browser will wait depends on a number of factors, e.g. where the timeout occurs: at the TCP level, the server, or the local browser?
If you've got a long running process on a server and you want to update a webpage afterwards the typical way to handle it is to run the long process asynchronously and notify the client when it's complete e.g. have an ajax call that polls the server, or use HTTP 1.1 and serve out a notification stream to the client.
In either case it's still possible for the connection to be closed so the client will still need the ability to re-open it.
I found that for a normal (HTML page) request, browsers time out after roughly 30 seconds. That matters because the other participants probably follow suit: proxies, routers (do routers play in this game? I'm not sure). I use a 4-second server-side delay (when there's nothing to send to the client), and my AJAX client issues another HTTP request immediately (I'm on a local network, so there's no internet lag). 4 seconds is long enough not to overload the server and network with frequent polls, and short enough for the case when a poll somehow falls out of the row in a way the client can't detect and handle.
Also, there are other issues with comet (long HTTP requests): the browser's limit on the number of simultaneous HTTP requests, handling of client-side events (which must be sent to the server immediately), server/network outage detection and recovery, multi-user handling, etc.