I have a problem with ch: we call it many times, and unfinished queries keep accumulating until I restart the service.
Is there any way to keep these leftover queries from piling up?
Related
I have two services A and B. A receives a request, does some processing and sends the processed data to B.
What should I do with the data in the following scenario:
A receives data.
Processes it successfully.
Crashes before sending the data to B.
Comes back online.
I would either use some sort of persistent log to handle the communication between the micro-services (e.g. Kafka) or some sort of retry mechanism.
In either case, the data that A received and processed must not disappear until the entire chain of execution completes successfully or, at the very least, until A has successfully completed its work and passed its payload to the next service. And this payload must exist until the next service processes it, and so on.
Generally, the steps should continue as follows:
A comes back online and sees that there is work to be done: the item it processed at step #2 (since its processing is not yet complete as far as the overall system is concerned). Unless there are some weird side effects, it shouldn't matter that it processes it again.
The data is sent to B (although this step should, conceptually, be part of "processing" the data).
If A crashes again, it probably means that the data it is processing triggers a bug in A, and the whole chain of starting up, reprocessing, and crashing will continue forever. This is a denial of service, malicious or not, and you should have a procedure in place to handle it: perhaps you don't reprocess the same data more than a given number of times, and you log such cases to be analyzed with top priority.
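The persistent-log idea above can be sketched in plain Java. This is only an illustrative sketch, not Kafka: the in-memory queue stands in for durable storage, and all of the names (`receive`, `process`, `sendToB`, `recover`) are hypothetical. The essential point is that an item is removed from the pending log only after the hand-off to B succeeds, so a crash between processing and sending just means the item is re-driven on restart.

```java
import java.util.*;
import java.util.concurrent.ConcurrentLinkedQueue;

// Minimal sketch of at-least-once hand-off between services A and B.
// In production the "log" would be durable storage (e.g. a Kafka topic),
// not an in-memory queue; all names here are illustrative.
public class AtLeastOnceSketch {
    // Pending items survive a "crash" of the processing step because
    // an item is only removed AFTER the send to B succeeds.
    static final Queue<String> pendingLog = new ConcurrentLinkedQueue<>();
    static final List<String> deliveredToB = new ArrayList<>();

    static void receive(String data) {
        pendingLog.add(data);          // 1. durably record the work first
    }

    static String process(String data) {
        return data.toUpperCase();     // 2. processing must be idempotent:
    }                                  //    re-running it yields the same result

    static void sendToB(String processed) {
        deliveredToB.add(processed);   // 3. hand off to B
    }

    // Called on startup: re-drive anything that never completed.
    static void recover() {
        String item;
        while ((item = pendingLog.peek()) != null) {
            sendToB(process(item));
            pendingLog.poll();         // remove only after the hand-off succeeded
        }
    }

    public static void main(String[] args) {
        receive("order-1");
        // Simulate: A processed "order-1" but crashed before sendToB,
        // so the item is still in the pending log on restart.
        recover();
        System.out.println(deliveredToB); // [ORDER-1]
    }
}
```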
I recently encountered a thorny problem while using Kafka Streams' TimeWindowedKStream aggregation method. I stopped my program for 5 minutes and then restarted it, and found that a small part of my data was lost, with the following message: "Skipping record for expired window". All of the data is normal data that should be saved; there is no large delay. What can I do to prevent data from being discarded? It seems that Kafka Streams picked up a later time for its observed stream time.
The error message means that a window was already closed, so you would need to increase the grace period, as pointed out by #groo. Data expiration is based on event time, so stopping your program and resuming it later should not change much.
However, if there is a repartition topic before the aggregation and you stop your program for some time, there might be more out-of-order data inside the repartition topic, because the input topic is read much faster than in a "live run". This increased disorder during catch-up could be the issue.
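If increasing the grace period is the fix, it is set on the window definition itself. A sketch with a recent Kafka Streams API (the topic name, serdes, and durations are illustrative; on older Streams versions the equivalent is `TimeWindows.of(size).grace(grace)`):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;

// Sketch: a 5-minute tumbling window that tolerates 10 minutes of
// out-of-order data before the window is closed and late records
// are dropped with "Skipping record for expired window".
StreamsBuilder builder = new StreamsBuilder();

TimeWindows windows =
        TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(10));

KTable<Windowed<String>, Long> counts = builder
        .stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
        .groupByKey()
        .windowedBy(windows)
        .count();
```

A longer grace period trades result latency for tolerance of out-of-order data, so size it to cover the catch-up lag you expect after a restart.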
I've been trying to implement a call centre type system using Taskrouter using this guide as a base:
https://www.twilio.com/docs/tutorials/walkthrough/dynamic-call-center/ruby/rails
Project location is Australia, if that affects call details.
This system dials multiple numbers (workers), and I have run into an issue where phones will continue to ring even after the call has been accepted or cancelled.
i.e. if TaskRouter calls Workers A and B and A picks up first, A is connected to the customer, but B will continue to ring. If B then picks up the phone, they are greeted by a hangup tone. Ringing can continue for minutes until B picks up (I haven't checked whether it ever times out).
Similar occurs if no one picks up and the call simply times out and is redirected to voicemail. As you can imagine, an endlessly ringing phone is pretty annoying, especially when there's no one on the other end.
I was able to replicate this issue using the above guide without modification (other than the minimum changes to set it up locally). Note that it doesn't dial workers simultaneously, rather it dials the first in line for a few seconds before moving to the next.
My interpretation of what is occurring is that TaskRouter is dialling workers but not updating the calls when dialling should end, simply moving on to the next stage of the workflow. It does update Worker status, so it knows if they've timed out, for instance, but that doesn't update the actual call.
I have looked for solutions to this and haven't found much about it, except the following:
How to make Twilio stop dialing numbers when hangup() is fired?
https://www.twilio.com/docs/api/rest/change-call-state
These don't specifically apply to Taskrouter, but suggest that a call that needs to be ended can be updated and completed.
I am not too sure I can implement this, however, as TaskRouter seems to use the same CallSid for all calls being dialled within a Workflow, which makes it hard or impossible to separate each call, and it would end the active call as well.
It also just seems wrong that Taskrouter wouldn't be doing this automatically, so I wanted to ask about this before I tinker too much and break things.
Has anyone run into this issue before, or is able/unable to replicate it using the tutorial code?
When testing I've noticed the problem much more on landline numbers, which may only be because mobiles have their own timeout/redirects. VOIPs seem to immediately answer calls, so they behave a bit differently.
Any help/suggestions appreciated, thanks!
Current suggestion to work around this is to not issue the dequeue instruction immediately, but rather issue a Call instruction on the REST API when the Worker wishes to accept the Inbound Call.
This will create an outbound call to bridge the two calls together, so you won't have multiple outbound calls for the same inbound caller at once.
Your implementation will depend on the behavior that you want to achieve:
1. Do you want to simul-dial both Workers?
2. Do you want to send the task to both Workers, and whoever clicks to Accept the Task first has the call routed to them?
If it's #2, this is a scenario where you're saying that the Worker should accept the Reservation (reservation.accepted) before issuing the Call.
If it's #1, you can issue either a Call Instruction or a Dequeue Instruction. The key is that you provide a DequeueStatusCallbackUrl or CallStatusCallbackUrl to receive call progress events. Once one of the outbound calls is connected, you will need to complete the other associated calls. Unfortunately, this means you will have to track which outbound calls are tied to which Reservation, using AssignmentCallbacks or EventCallbacks, to make that determination within your app.
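The bookkeeping for case #1 can be as simple as a map from Reservation SID to the outbound Call SIDs placed for it; when one leg connects, the remaining SIDs are the ones to complete via the REST API. A plain-Java sketch of just that tracking logic (class and SID names are illustrative; the actual call completion would be a separate REST request per returned SID):

```java
import java.util.*;

// Sketch: track which outbound calls belong to which Reservation so that,
// when one leg is answered, the remaining legs can be completed via the
// Calls REST resource (e.g. by POSTing Status=completed for each SID).
public class ReservationCallTracker {
    private final Map<String, Set<String>> callsByReservation = new HashMap<>();

    // Record an outbound call placed for a reservation
    // (driven by your status callbacks / event callbacks).
    public void trackCall(String reservationSid, String callSid) {
        callsByReservation
                .computeIfAbsent(reservationSid, k -> new HashSet<>())
                .add(callSid);
    }

    // One leg answered: return the sibling CallSids that still need
    // to be completed, and forget the settled reservation.
    public Set<String> callsToComplete(String reservationSid, String answeredCallSid) {
        Set<String> calls = callsByReservation.getOrDefault(reservationSid, Set.of());
        Set<String> toComplete = new HashSet<>(calls);
        toComplete.remove(answeredCallSid);
        callsByReservation.remove(reservationSid);
        return toComplete;
    }
}
```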
We have an issue, more often than I would like, where worker or client sessions crash while in the middle of using a number sequence to create a new record. They end up blocking that number sequence entirely, and anyone else trying to create a record using the same sequence has their client freeze.
When this happens, I usually go into the NUMBERSEQUENCELIST table, find the correct DataAreaId and user, and delete the row whose Status = 1.
But this is really quite annoying. Is there any way I can configure the AOS server to release number sequences when clients or workers crash?
For the worker sessions, I guess we can fine-tune the code that runs in them, but when client sessions crash, there is not much we can do...
Any ideas ?
Thanks!
EDIT: It turns out that in this situation, after restarting the AOS server, you can go to List in the number sequence menu and clean it up there. Prior to the restart, my client would freeze when trying to do that. So there is no need to do it directly through SQL.
Continuous numbers in NumberSequenceList are automatically cleaned up every 24 hours (or as set up on the number sequence). The cleanup process is quite slow if there are many "dead" numbers (hundreds or thousands). This may be perceived as a hang, but it is not one.
Things to consider:
Is a continuous number sequence needed?
Do the cleanup more frequently (say, every half hour instead of the default 24 hours)
Set up the cleanup process as a batch process
Fix the bug in the client code using the number sequence
Also, avoid reserving the number; just use it. Instead of the anti-pattern:
NumberSeq idSequence = NumberSeq::newGetNum(IntrastatParameters::numRefIntrastatArchiveID(), true);
this.IntrastatArchiveID = idSequence.num();
idSequence.used();
Just use the number:
this.IntrastatArchiveID = NumberSeq::newGetNum(IntrastatParameters::numRefIntrastatArchiveID()).num();
The makeDecisionLater parameter should only be used on forms, where the user may decide not to use the number (by deleting or pressing Escape). In that case, the NumberSeqFormHandler class should be used anyway.
I am using MongoDB to store users' events; there is a document for every user, containing an array of events. The system processes thousands of events a minute and inserts each one of them into Mongo.
The problem is that I get poor performance for the update operation. Using a profiler, I noticed that WriteResult.getError is what incurs the performance impact.
That makes sense: the update is asynchronous, but if you want to retrieve the operation's result, you need to wait until the operation has completed.
My question: is there a way to keep the update asynchronous but only get an exception if an error occurs? (99.999% of the time there is no error, so the system waits for nothing.) I understand this means the exception will be raised somewhere further down the process flow, but I can live with that.
Any other suggestions?
The application is written in Java so we're using the Java driver, but I am not sure it's related.
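To illustrate, the fire-and-forget behaviour I'm after looks something like this plain-Java sketch (this is not the MongoDB driver API; `doUpdate` is a stand-in for the real update call):

```java
import java.util.concurrent.CompletableFuture;

// Sketch: run the update asynchronously and only surface failures,
// instead of blocking on every result to check for an error.
public class AsyncUpdateSketch {
    // Stand-in for the real driver update; throws on bad input.
    static void doUpdate(String event) {
        if (event == null) throw new IllegalArgumentException("bad event");
        // ... the real update would go here ...
    }

    static CompletableFuture<Void> updateAsync(String event) {
        return CompletableFuture
                .runAsync(() -> doUpdate(event))
                .exceptionally(ex -> {
                    // The 0.001% case: the error surfaces here, further
                    // down the flow, without blocking the hot path.
                    System.err.println("update failed: " + ex.getMessage());
                    return null;
                });
    }

    public static void main(String[] args) {
        updateAsync("login-event").join(); // succeeds, no waiting on results
        updateAsync(null).join();          // failure handled in the callback only
    }
}
```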
Have you done indexing on your records?
It may be the cause of your performance problem.
If you haven't already, you should create an index on your collection, like:
db.collectionName.ensureIndex({"event.type":1})
For more help, visit http://www.mongodb.org/display/DOCS/Indexes