RocketMQ throws exception "[TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while"

Version: rocketmq-all-4.1.0-incubating
We send messages at 1000 QPS using sync send, but the broker throws this exception:
[TIMEOUT_CLEAN_QUEUE] broker busy, start flow control for a while
Here is the related broker code:
while (true) {
    try {
        if (!this.brokerController.getSendThreadPoolQueue().isEmpty()) {
            final Runnable runnable = this.brokerController.getSendThreadPoolQueue().peek();
            if (null == runnable) {
                break;
            }
            final RequestTask rt = castRunnable(runnable);
            if (rt == null || rt.isStopRun()) {
                break;
            }
            final long behind = System.currentTimeMillis() - rt.getCreateTimestamp();
            if (behind >= this.brokerController.getBrokerConfig().getWaitTimeMillsInSendQueue()) {
                if (this.brokerController.getSendThreadPoolQueue().remove(runnable)) {
                    rt.setStopRun(true);
                    rt.returnResponse(RemotingSysResponseCode.SYSTEM_BUSY, String.format("[TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while, period in queue: %sms, size of queue: %d", behind, this.brokerController.getSendThreadPoolQueue().size()));
                }
            } else {
                break;
            }
        } else {
            break;
        }
    } catch (Throwable ignored) {
    }
}
I found that the broker's default value of sendMessageThreadPoolNums is 1:
/**
 * thread numbers for send message thread pool, since spin lock will be used by default since 4.0.x, the default value is 1.
 */
private int sendMessageThreadPoolNums = 1; //16 + Runtime.getRuntime().availableProcessors() * 4;
private int pullMessageThreadPoolNums = 16 + Runtime.getRuntime().availableProcessors() * 2;
but in previous versions the default wasn't 1. If I configure sendMessageThreadPoolNums = 100, will that resolve this issue? And what difference will it make compared to the default value?
thanks

SHORT ANSWER:
You have two choices:
Set sendMessageThreadPoolNums to a small number, say 1 (the default value since version 4.1.x), and keep the default value of useReentrantLockWhenPutMessage=false (an option introduced in 4.1.x):
sendMessageThreadPoolNums=1
useReentrantLockWhenPutMessage=false
If you need a large number of threads to process message sending, you had better set useReentrantLockWhenPutMessage=true:
sendMessageThreadPoolNums=128  # a large thread count
useReentrantLockWhenPutMessage=true  # use ReentrantLock instead of the spin lock when putting messages
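All of these (including the waitTimeMillsInSendQueue threshold used by the clean-up loop in the question) are ordinary broker settings. A broker.conf sketch for the second approach, assuming the usual properties-file format and 4.1.x property names (the 200 ms figure is the BrokerConfig default in this version):

# broker.conf sketch (property names as in RocketMQ 4.1.x)
sendMessageThreadPoolNums=128
useReentrantLockWhenPutMessage=true
# age threshold checked by the [TIMEOUT_CLEAN_QUEUE] loop; default is 200 ms
waitTimeMillsInSendQueue=400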

Related

Aging values in a queue: Best use of Windows timers?

I have an std::set that contains unique values. I have an std::queue that holds the same values
in order to age the values in std::set.
I'd like to use a timer to determine when to pop a value from the queue and then erase the value from the set.
The timer is created/started every time data is added to an empty set/queue.
If data is added to a non-empty set/queue, no change is made to the timer.
The timer would fire every X milliseconds to execute a function.
The function would pop a value from the queue then erase that value from the set.
If the set/queue is now empty the timer would stop.
If the set/queue is not empty, no change is made to the timer.
This program runs in Windows 10.
Does this approach make sense? Is there a better/more efficient/simpler way to age the data?
I've read the docs on Using Timer Queues so I see how the queue and the timers are created and destroyed. What I don't see is a recommendation for starting/stopping timers.
Should I be creating a new TimerQueueTimer to wait for X milliseconds once, run the func and then create a new TimerQueueTimer if the set/queue is not empty?
Should I instead create a single TimerQueueTimer to run periodically X milliseconds but delete it once the set/queue is empty?
Is there a 3rd technique I should use instead?
Here's my example code.
using unsignedIntSet = std::set<std::uint32_t>;
using unsignedIntQ = std::queue<std::uint32_t>;
unsignedIntQ agingQ;
unsignedIntSet agingSet;
HANDLE gDoneEvent = NULL;
HANDLE hTimer = NULL;
HANDLE hTimerQueue = NULL;
VOID CALLBACK ageTimer(PVOID lpParam, BOOLEAN TimerOrWaitFired)
{
    if (!agingQ.empty())
    {
        auto c = agingQ.front();
        agingSet.erase(c);
        agingQ.pop();
        if (!agingQ.empty())
        {
            // rerun CreateTimerQueueTimer() here?
        }
    }
    SetEvent(gDoneEvent);
}
int createTimerForAgingQ()
{
    // create timer if it doesn't already exist
    if (gDoneEvent == NULL)
    {
        gDoneEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
        if (gDoneEvent == NULL)
        {
            // note: GetLastError(), not WSAGetLastError(); these are not Winsock calls
            std::cerr << "CreateEvent() error: " << GetLastError() << std::endl;
            return -1;
        }
        hTimerQueue = CreateTimerQueue();
        if (hTimerQueue == NULL)
        {
            std::cerr << "CreateTimerQueue() error: " << GetLastError() << std::endl;
            return -1;
        }
        if (!CreateTimerQueueTimer(&hTimer, hTimerQueue, (WAITORTIMERCALLBACK)ageTimer, NULL, 500, 0, WT_EXECUTEONLYONCE))
        {
            std::cerr << "CreateTimerQueueTimer() error: " << GetLastError() << std::endl;
            return -1;
        }
    }
    return 0;
}
void addUnique(unsigned char* buffer, int bufferLen)
{
    // hash value
    auto h = hash(buffer, bufferLen);
    // test insert into set
    auto setResult = agingSet.emplace(h);
    if (setResult.second)
    {
        // enqueue into historyQ
        agingQ.emplace(h);
        if (!gDoneEvent) createTimerForAgingQ();
    }
}
Research shows that CreateTimerQueue/CreateTimerQueueTimer may not be the way to go; the documentation points to the thread pool API instead.
Use of ThreadpoolTimer (CreateThreadpoolTimer) is the answer.
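For illustration, a minimal sketch of the same aging logic on top of CreateThreadpoolTimer. It re-arms a one-shot timer from within the callback while the queue is non-empty, which sidesteps the create/destroy question entirely. Names like gAgingTimer and startAgingTimer are mine, and a real version needs a lock around agingQ/agingSet, since the callback runs on a thread-pool thread:

#include <windows.h>
#include <queue>
#include <set>
#include <cstdint>

std::queue<std::uint32_t> agingQ;
std::set<std::uint32_t> agingSet;
PTP_TIMER gAgingTimer = nullptr;

static FILETIME RelativeMs(LONG ms)
{
    // A negative FILETIME means a relative due time, in 100 ns units.
    ULARGE_INTEGER t;
    t.QuadPart = (ULONGLONG)-((LONGLONG)ms * 10000);
    FILETIME ft = { t.LowPart, t.HighPart };
    return ft;
}

static VOID CALLBACK AgeOne(PTP_CALLBACK_INSTANCE, PVOID, PTP_TIMER timer)
{
    // Age the oldest value (a real version must lock around the containers).
    if (!agingQ.empty())
    {
        agingSet.erase(agingQ.front());
        agingQ.pop();
    }
    if (!agingQ.empty())
    {
        // Re-arm the one-shot timer for another 500 ms; it simply stops
        // firing once the queue drains, with nothing to delete.
        FILETIME due = RelativeMs(500);
        SetThreadpoolTimer(timer, &due, 0, 0);
    }
}

void startAgingTimer()
{
    if (!gAgingTimer)
        gAgingTimer = CreateThreadpoolTimer(AgeOne, nullptr, nullptr); // check for NULL in real code
    FILETIME due = RelativeMs(500);
    SetThreadpoolTimer(gAgingTimer, &due, 0, 0);
}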

Why do I get two events from Particle.publish?

I am using code like this on a Particle Electron to report pulse counts from a flow meter on my kegerator to the Particle Cloud:
void meterInterrupt(void) {
    detachInterrupt(pin);
    ticks++;
    cloudPending = 1;
    attachInterrupt(pin, meterInterrupt, FALLING);
}
void publishStatus() {
    if (!cloudPending) {
        return;
    }
    cloudPending = 0;
    getStatus(&statusMessage);
    // status message contains number of ticks since last publish
    bool published = Particle.publish("Ticks", statusMessage, PRIVATE);
    if (published) {
        resetMeters();
        lastPublish = millis();
    }
}
void loop() {
    if ((millis() - lastPublish) >= 1000) {
        publishStatus();
    }
}
When I curl the event log into my terminal, I see two events for the first publish like so:
event: Ticks
data: {"data":"ticks:1","ttl":60,"published_at":"2018-07-03T22:35:01.008Z","coreid":"420052000351353337353037"}
event: hook-sent/Ticks
data: {"data":"","ttl":60,"published_at":"2018-07-03T22:35:01.130Z","coreid":"particle-internal"}
event: Ticks
data: {"data":"ticks:46","ttl":60,"published_at":"2018-07-03T22:35:01.193Z","coreid":"420052000351353337353037"}
event: hook-sent/Ticks
data: {"data":"","ttl":60,"published_at":"2018-07-03T22:35:01.303Z","coreid":"particle-internal"}
I don't see how this could happen. Why didn't it just report "ticks:47"? What am I missing?
UPDATE:
I did some further testing and noticed that Particle.publish is returning false the first time when it is actually completing successfully. Is this a timeout issue? The time difference between these publishes is only about 200ms.
OK, This is at least a partial answer.
It appears that Particle.publish is asynchronous. It returns the promise of an answer that starts out as false and only eventually becomes true when/if the action actually completes. If I wait an indeterminate amount of time (say delay(10)) after Particle.publish and before checking the return code, the return value will indicate the actual success or failure of the publish. My code cannot work as written because the ticks counted while I wait will be deleted when I reset the meters. WITH_ACK gives me the same behavior.
I will have to modify my code so that no ticks are lost during the long-running Particle.publish. I am thinking that each statusMessage should go onto a list until it is ack'ed by the server.
FINAL ANSWER:
I modified the code to close the window during which I can receive ticks that would then be wiped out when I reset the counters. I do this by capturing the ticks into an array and then resetting the tick counter (meter). I am using a library called PublishQueueAsyncRK (kudos to rickkas7; this library is great!) so I can just fire it and forget it. Check it out on GitHub.
void publishStatus() {
    unsigned int counters[NUM_METERS];
    unsigned int pending = 0; // must be initialized before summing
    for (int i = 0; i < NUM_METERS; i++) {
        meter_t *meter = &meters[i];
        counters[i] = meter->ticks;
        pending += counters[i];
        resetMeter(i);
    }
    if (pending) {
        String statusReport;
        for (int i = 0; i < NUM_METERS; i++) {
            statusReport.concat(String::format("%i:%u|", i+1, counters[i]));
        }
        publishReport(statusReport);
        lastPublished = millis();
    }
}
void publishReport(String report) {
    if (report != "") {
        publishQueue.publish("PourLittleTicks", report, PRIVATE);
    }
}
void loop() {
    if ((millis() - lastPublished) >= PUBLISH_INTERVAL) {
        publishStatus();
    }
}

Correct way of synchronization between a method and a stop functionality

I have a function (let's call it function A) that zero to many threads can access at the same time (no shared resources). At any given time, the user can choose to stop the process. The stop functionality needs to make sure that no threads are still accessing function A, so that a graceful shutdown can be performed. Is there a native procedure to do so?
What I was going to do is InterlockedIncrement an integer every time function A is called (and a corresponding InterlockedDecrement on said integer when function A exits). When an InterlockedDecrement takes place, it checks the value of the integer; if it's zero, an event is set to signalled. If the value is not zero, the event is set to nonsignalled.
This makes sense in my mind, but I'm curious whether there is a more native structure / functionality adapted to do so.
I still have to think about the fact that the "stop" function may get starved (in the sense that said integer may never reach zero). A side note: when the stop event takes place, the InterlockedIncrement process shall be stopped, to reduce said starvation.
What you need and want to implement is called Run-Down Protection. Unfortunately it is supported only in kernel mode, but it is not hard to implement yourself in user mode too.
The simplest implementation is:
HANDLE ghStopEvent;
LONG gLockCount = 1;
BOOLEAN bStop = FALSE;
void unlock()
{
    if (!InterlockedDecrement(&gLockCount)) SetEvent(ghStopEvent);
}
BOOL lock()
{
    LONG Value = gLockCount, NewValue;
    for ( ; !bStop && Value; Value = NewValue)
    {
        NewValue = InterlockedCompareExchange(&gLockCount, Value + 1, Value);
        if (NewValue == Value) return TRUE;
    }
    return FALSE;
}
void funcA();
void UseA()
{
    if (lock())
    {
        funcA();
        unlock();
    }
}
And when you want to begin rundown, call this once:
bStop = TRUE; unlock();
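Putting it together, a minimal sketch of the full stop sequence (assuming ghStopEvent was created earlier with CreateEvent(NULL, TRUE, FALSE, NULL)):

void StopAndWait()
{
    bStop = TRUE;  // new lock() calls now fail
    unlock();      // drop the initial reference (gLockCount started at 1)
    WaitForSingleObject(ghStopEvent, INFINITE); // returns once the last thread has left funcA
}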
As you can see, the lock function interlocked-increments gLockCount by 1, but only if it is not already 0.
In kernel mode you can use the built-in API instead:
EX_RUNDOWN_REF gRunRef;
void UseA()
{
    if (ExAcquireRundownProtection(&gRunRef))
    {
        funcA();
        ExReleaseRundownProtection(&gRunRef);
    }
}
and in place of the final unlock, call ExWaitForRundownProtectionRelease.
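A sketch of that kernel-mode stop sequence (gRunRef must have been initialized once with ExInitializeRundownProtection):

ExInitializeRundownProtection(&gRunRef);     // once, at startup
// later, to stop:
ExWaitForRundownProtectionRelease(&gRunRef); // blocks until every holder has called ExReleaseRundownProtection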
A more complex and scalable implementation of rundown protection:
#define RUNDOWN_INIT_VALUE 0x80000000
#define RUNDOWN_COMPLETE_VALUE 0
class __declspec(novtable) RUNDOWN_REF
{
    LONG _LockCount;
protected:
    virtual void RundownCompleted() = 0;
public:
    BOOL IsRundownBegin()
    {
        return 0 <= _LockCount;
    }
    void Reinit()
    {
        if (InterlockedCompareExchange(&_LockCount, RUNDOWN_INIT_VALUE, RUNDOWN_COMPLETE_VALUE) != RUNDOWN_COMPLETE_VALUE)
        {
            __debugbreak();
        }
    }
    RUNDOWN_REF()
    {
        _LockCount = RUNDOWN_INIT_VALUE;
    }
    BOOL AcquireRundownProtection()
    {
        LONG Value = _LockCount, NewValue;
        for ( ; Value < 0; Value = NewValue)
        {
            NewValue = InterlockedCompareExchange(&_LockCount, Value + 1, Value);
            if (NewValue == Value) return TRUE;
        }
        return FALSE;
    }
    void ReleaseRundownProtection()
    {
        if (RUNDOWN_COMPLETE_VALUE == InterlockedDecrement(&_LockCount))
        {
            RundownCompleted();
        }
    }
    void BeginRundown()
    {
        if (AcquireRundownProtection())
        {
            _interlockedbittestandreset(&_LockCount, 31);
            ReleaseRundownProtection();
        }
    }
};
And use it like:
class MY_RUNDOWN_REF : public RUNDOWN_REF
{
    HANDLE _hEvent;
    virtual void RundownCompleted()
    {
        SetEvent(_hEvent);
    }
    // ...
} gRunRef;
void UseA()
{
    if (gRunRef.AcquireRundownProtection())
    {
        funcA();
        gRunRef.ReleaseRundownProtection();
    }
}
And when you want to stop:
gRunRef.BeginRundown(); // can safely be called multiple times
// wait on gRunRef._hEvent here
Interestingly, the kernel has one more API, Remove Locks (older: from Windows 2000, whereas rundown protection dates from XP). It does almost the same thing, differing only in internal implementation and usage. With remove locks the code looks like this:
IO_REMOVE_LOCK gLock;
void UseA()
{
    if (0 <= IoAcquireRemoveLock(&gLock, 0))
    {
        funcA();
        IoReleaseRemoveLock(&gLock, 0);
    }
}
And when we want to stop, call:
IoAcquireRemoveLock(&gLock, 0);
IoReleaseRemoveLockAndWait(&gLock, 0);
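(Note that the remove lock itself must be initialized once before first use; a sketch, with an arbitrary pool tag:)

IoInitializeRemoveLock(&gLock, 'LRmv', 0, 0); // tag and limits are arbitrary here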
My first code snippet is, in its implementation, closer to the remove-lock implementation, while the second is closer to the rundown-protection implementation; but in essence both do the same thing.

alBufferData() sets AL_INVALID_OPERATION when using buffer ID obtained from alSourceUnqueueBuffers()

I am trying to stream audio data from disk using OpenAL's buffer queueing mechanism. I load and enqueue 4 buffers, start the source playing, and check at regular intervals to refresh the queue. Everything looks like it's going splendidly, up until the first time I try to load data into a recycled buffer I got from alSourceUnqueueBuffers(). In this situation, alBufferData() always sets AL_INVALID_OPERATION, which according to the official v1.1 spec, it doesn't seem like it should be able to do.
I have searched extensively on Google and StackOverflow, and can't seem to find any reason why this would happen. The closest thing I found was someone with a possibly-related issue in an archived forum post, but details are few and responses are null. There was also this SO question with slightly different circumstances, but the only answer's suggestion does not help.
Possibly helpful: I know my context and device are configured correctly, because loading small wav files completely into a single buffer and playing them works fine. Through experimentation, I've also found that queueing 2 buffers, starting the source playing, and immediately loading and enqueueing the other two buffers throws no errors; it's only when I've unqueued a processed buffer that I run into trouble.
The relevant code:
static constexpr int MAX_BUFFER_COUNT = 4;
#define alCall(funcCall) {funcCall; SoundyOutport::CheckError(__FILE__, __LINE__, #funcCall) ? abort() : ((void)0); }
bool SoundyOutport::CheckError(const string &pFile, int pLine, const string &pfunc)
{
    ALenum tErrCode = alGetError();
    if(tErrCode != 0)
    {
        auto tMsg = alGetString(tErrCode);
        Log::e(ro::TAG) << tMsg << " at " << pFile << "(" << pLine << "):\n"
                        << "\tAL call " << pfunc << " failed." << end;
        return true;
    }
    return false;
}
void SoundyOutport::EnqueueBuffer(const float* pData, int pFrames)
{
    static int called = 0;
    ++called;
    ALint tState;
    alCall(alGetSourcei(mSourceId, AL_SOURCE_TYPE, &tState));
    if(tState == AL_STATIC)
    {
        Stop();
        // alCall(alSourcei(mSourceId, AL_BUFFER, NULL));
    }
    ALuint tBufId = AL_NONE;
    int tQueuedBuffers = QueuedUpBuffers();
    int tReady = ProcessedBuffers();
    if(tQueuedBuffers < MAX_BUFFER_COUNT)
    {
        tBufId = mBufferIds[tQueuedBuffers];
    }
    else if(tReady > 0)
    {
        // the fifth time through, this code gets hit
        alCall(alSourceUnqueueBuffers(mSourceId, 1, &tBufId));
        // debug code: make sure these values go down by one
        tQueuedBuffers = QueuedUpBuffers();
        tReady = ProcessedBuffers();
    }
    else
    {
        return; // no update needed yet.
    }
    void* tConverted = convert(pData, pFrames);
    // the fifth time through, we get AL_INVALID_OPERATION, and call abort()
    alCall(alBufferData(tBufId, mFormat, tConverted, pFrames * mBitdepth/8, mSampleRate));
    alCall(alSourceQueueBuffers(mSourceId, 1, &mBufferId));
    if(mBitdepth == BITDEPTH_8)
    {
        delete (uint8_t*)tConverted;
    }
    else // if(mBitdepth == BITDEPTH_16)
    {
        delete (uint16_t*)tConverted;
    }
}
void SoundyOutport::PlayBufferedStream()
{
    if(!StreamingMode() || !QueuedUpBuffers())
    {
        Log::w(ro::TAG) << "Attempted to play an unbuffered stream" << end;
        return;
    }
    alCall(alSourcei(mSourceId, AL_LOOPING, AL_FALSE)); // never loop streams
    alCall(alSourcePlay(mSourceId));
}
int SoundyOutport::QueuedUpBuffers()
{
    int tCount = 0;
    alCall(alGetSourcei(mSourceId, AL_BUFFERS_QUEUED, &tCount));
    return tCount;
}
int SoundyOutport::ProcessedBuffers()
{
    int tCount = 0;
    alCall(alGetSourcei(mSourceId, AL_BUFFERS_PROCESSED, &tCount));
    return tCount;
}
void SoundyOutport::Stop()
{
    if(Playing())
    {
        alCall(alSourceStop(mSourceId));
    }
    int tBuffers;
    alCall(alGetSourcei(mSourceId, AL_BUFFERS_QUEUED, &tBuffers));
    if(tBuffers)
    {
        ALuint tDummy[tBuffers];
        alCall(alSourceUnqueueBuffers(mSourceId, tBuffers, tDummy));
    }
    alCall(alSourcei(mSourceId, AL_BUFFER, AL_NONE));
}
bool SoundyOutport::Playing()
{
    ALint tPlaying;
    alCall(alGetSourcei(mSourceId, AL_SOURCE_STATE, &tPlaying));
    return tPlaying == AL_PLAYING;
}
bool SoundyOutport::StreamingMode()
{
    ALint tState;
    alCall(alGetSourcei(mSourceId, AL_SOURCE_TYPE, &tState));
    return tState == AL_STREAMING;
}
bool SoundyOutport::StaticMode()
{
    ALint tState;
    alCall(alGetSourcei(mSourceId, AL_SOURCE_TYPE, &tState));
    return tState == AL_STATIC;
}
(The question also included an annotated screen cap of the debugger state when hitting the error.)
I've tried a bunch of little tweaks and variations, and the result is always the same. I've wasted too many days trying to fix this. Please help :)
This error occurs when you try to fill a buffer with data while the buffer is still queued to the source.
Also, this code is wrong:
if(tQueuedBuffers < MAX_BUFFER_COUNT)
{
    tBufId = mBufferIds[tQueuedBuffers];
}
else if(tReady > 0)
{
    // the fifth time through, this code gets hit
    alCall(alSourceUnqueueBuffers(mSourceId, 1, &tBufId));
    // debug code: make sure these values go down by one
    tQueuedBuffers = QueuedUpBuffers();
    tReady = ProcessedBuffers();
}
else
{
    return; // no update needed yet.
}
You can fill a buffer with data only if it is unqueued from the source. But your first if block picks a tBufId that is still queued to the source. Rewrite the code like so:
if(tReady > 0)
{
    // the fifth time through, this code gets hit
    alCall(alSourceUnqueueBuffers(mSourceId, 1, &tBufId));
    // debug code: make sure these values go down by one
    tQueuedBuffers = QueuedUpBuffers();
    tReady = ProcessedBuffers();
}
else
{
    return; // no update needed yet.
}
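To make the underlying pattern explicit: fill each buffer before it is first queued, and afterwards refill only buffers that have just been unqueued. A minimal sketch of the refill step (FillNextChunk is a hypothetical data source, not part of the code above):

#include <AL/al.h>
#include <vector>

std::vector<char> FillNextChunk(); // hypothetical: returns the next PCM block, empty at end of stream

void StreamUpdate(ALuint source, ALenum format, ALsizei sampleRate)
{
    ALint processed = 0;
    alGetSourcei(source, AL_BUFFERS_PROCESSED, &processed);
    while (processed-- > 0)
    {
        ALuint buf = 0;
        alSourceUnqueueBuffers(source, 1, &buf);   // buf is no longer attached to the source
        std::vector<char> chunk = FillNextChunk();
        if (chunk.empty()) break;                  // end of stream
        alBufferData(buf, format, chunk.data(), (ALsizei)chunk.size(), sampleRate);
        alSourceQueueBuffers(source, 1, &buf);     // hand the refilled buffer back
    }
}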

Poco c++ Websocket server connection reset by peer

I am writing a kind of chat server app where a message received from one websocket client is sent out to all other websocket clients. To do this, I keep the connected clients in a list. When a client disconnects, I need to remove it from the list (so that future "sends" do not fail).
However, sometimes when a client disconnects, the server just gets a "connection reset by peer" exception, and the code does not get a chance to remove the client from the list. Is there a way to guarantee a "nice" notification that the connection has been reset?
My code is:
void WsRequestHandler::handleRequest(HTTPServerRequest &req, HTTPServerResponse &resp)
{
    int n;
    Poco::Timespan timeOut(5, 0);
    try
    {
        req.set("Connection", "Upgrade"); // knock out any extra tokens firefox may send such as "keep-alive"
        ws = new WebSocket(req, resp);
        ws->setKeepAlive(false);
        connectedSockets->push_back(this);
        do
        {
            flags = 0;
            // note: bitwise |, not logical ||, to combine the select flags
            if (!ws->poll(timeOut, Poco::Net::Socket::SELECT_READ | Poco::Net::Socket::SELECT_ERROR))
            {
                // cout << ".";
            }
            else
            {
                n = ws->receiveFrame(buffer, sizeof(buffer), flags);
                if (n > 0)
                {
                    if ((flags & WebSocket::FRAME_OP_BITMASK) == WebSocket::FRAME_OP_BINARY)
                    {
                        // process and send out to all other clients
                        DoReceived(ws, buffer, n);
                    }
                }
            }
        }
        while ((flags & WebSocket::FRAME_OP_BITMASK) != WebSocket::FRAME_OP_CLOSE);
        // client has closed, so remove from list
        for (vector<WsRequestHandler *>::iterator it = connectedSockets->begin(); it != connectedSockets->end(); ++it)
        {
            if (*it == this)
            {
                connectedSockets->erase(it);
                logger->information("Connection closed %s", ws->peerAddress().toString());
                break;
            }
        }
        delete(ws);
        ws = NULL;
    }
    catch (WebSocketException& exc)
    {
        // never gets called
    }
}
See receiveFrame() documentation:
Returns the number of bytes received. A return value of 0 means that the peer has shut down or closed the connection.
So if the receiveFrame() call returns zero, you can act accordingly.
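Applied to the receive loop in the question, that means checking n as well as the flags; a minimal sketch of the changed branch (reusing the surrounding ws, buffer, and flags variables):

n = ws->receiveFrame(buffer, sizeof(buffer), flags);
if (n == 0)
    break; // peer shut down or closed the connection; fall through to the list cleanup
if (n > 0 && (flags & WebSocket::FRAME_OP_BITMASK) == WebSocket::FRAME_OP_BINARY)
    DoReceived(ws, buffer, n); // process and send out to all other clients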
I do not know if this is an answer to the question, but the implementation you have done does not deal with PING frames. This is currently (as of my POCO version, 1.7.5) not done automatically by the POCO framework. I put up a question about that recently. According to the RFC (6455), the ping and pong frames are used (among other things) as a keep-alive function. This may therefore be critical to get right in order to keep your connection stable over time. Much of this is guess-work from my side as I am experimenting with this now myself.
@Alex, you are a main developer of POCO I believe; a comment on my answer would be much appreciated.
I extended the catch to do some exception handling for "Connection reset by peer":
catch (Poco::Net::WebSocketException& exc)
{
    // Do something
}
catch (Poco::Exception& e)
{
    // This is where the "Connection reset by peer" lands
}
A bit late to the party here... but I am using Poco and Websockets as well - and properly handling disconnects was tricky.
I ended up implementing a simple ping functionality myself, where the client side sends an ACK message for every WS frame it receives. A separate thread on the server side reads the ACK messages, and it can now detect when the client has disconnected by checking flags & WebSocket::FRAME_OP_CLOSE.
// Server side - POCO. Start thread for receiving ACK packages. Needed in order to detect when the websocket is closed!
thread t0([&]() -> void {
    // note: the original had a misplaced parenthesis; the FRAME_OP_BITMASK test must be grouped on its own
    while (!KillFlag && ws != nullptr && (flags & WebSocket::FRAME_OP_BITMASK) != WebSocket::FRAME_OP_CLOSE && machineConnection != nullptr)
    {
        try
        {
            if (ws == nullptr)
            {
                return;
            }
            if (ws->available() > 0)
            {
                int len = ws->receiveFrame(buffer, sizeof(buffer), flags);
            }
            else
            {
                Util::Sleep(10);
            }
        }
        catch (Poco::Exception &pex)
        {
            flags = flags | WebSocket::FRAME_OP_CLOSE;
            return;
        }
        catch (...)
        {
            // log::info(string("Unknown exception in ACK Thread drained"));
            return;
        }
    }
    log::debug("OperatorWebHandler::HttpRequestHandler() Websocket Acking thread DONE");
});
On the client side I just send a dummy "ACK" message back to the server (JS) every time I receive a WS frame from the server (POCO).
websocket.onmessage = (evt) => {
    _this.receivedData = JSON.parse(evt.data);
    websocket.send("ACK");
};
It is not about disconnect handling, but rather about the stability of the connection.
I had some issues with the POCO WebSocket server in StreamSocket mode and a C# client. Sometimes the client sends Pong messages with a zero-length payload and a disconnect occurs, so I added Ping and Pong handling code:
int WebSocketImpl::receiveBytes(void* buffer, int length, int)
{
    char mask[4];
    bool useMask;
    _frameFlags = 0;
    for (;;) {
        int payloadLength = receiveHeader(mask, useMask);
        int frameOp = _frameFlags & WebSocket::FRAME_OP_BITMASK;
        if (frameOp == WebSocket::FRAME_OP_PONG || frameOp == WebSocket::FRAME_OP_PING) {
            std::vector<char> tmp(payloadLength);
            if (payloadLength != 0) {
                receivePayload(tmp.data(), payloadLength, mask, useMask);
            }
            if (frameOp == WebSocket::FRAME_OP_PING) {
                sendBytes(tmp.data(), payloadLength, WebSocket::FRAME_OP_PONG);
            }
            continue;
        }
        if (payloadLength <= 0)
            return payloadLength;
        if (payloadLength > length)
            throw WebSocketException(Poco::format("Insufficient buffer for payload size %d", payloadLength), WebSocket::WS_ERR_PAYLOAD_TOO_BIG);
        return receivePayload(reinterpret_cast<char*>(buffer), payloadLength, mask, useMask);
    }
}
