I'm VERY puzzled and was hoping that someone here could provide some sanity.
I have a fairly elaborate Arduino sketch that is responding to scheduled events
at the time designated in the event. In order to do this I have setup a loop that examines a queue of events and checks the event time to see if it matches the current time. I also have code in the loop that checks the current temp and sends a serial message with a timestamp.
I noticed that it is no longer running for more than 10 - 15 minutes before it crashes (goes silent). I assumed it was some of my code, so I started commenting out pieces until it started working again... I was astounded to find that it only started working (ran for 14+ hours) after I removed the call to setTime()!
I can't understand how a call to the setTime() library function would cause a crash 10 minutes later. I assumed that this was a very stable library...
Any ideas would be welcome!
if(timeStatus() != timeSet) {
if (processSyncMessage()) {;
Serial.println(F("Time sync completed.\n"));
}
else { // Time sync failed for some reason, resend request.
Serial.println(F("Waiting for sync message\n"));
}
}
else { // time is set.
...
I have 'simplified' the following routine to remove the serial message receipt and parsing to further isolate the problem...
int processSyncMessage() {
// if time sync available from serial port, update time and return true
Serial.println(F("Processing time sync message"));
time_t pctime = 1371118147;
setTime(pctime); // Sync Arduino clock to the time received on the serial port
return true;
}
Related
Let me give a brief context first:
I have a scenario where the RSUs will broadcast a fixed message 'RSUmessage' about every TRSU seconds. I have implemented the following code for RSU broadcast (also, these fixed messages have Psid = -100 to differentiate them from others):
void TraCIDemoRSU11p::handleSelfMsg(cMessage* msg) {
if (WaveShortMessage* wsm = dynamic_cast<WaveShortMessage*>(msg)) {
if(wsm->getPsid()==-100){
sendDown(RSUmessage->dup());
scheduleAt(simTime() + trsu +uniform(0.02, 0.05), RSUmessage);
}
}
else {
BaseWaveApplLayer::handleSelfMsg(wsm);
}
}
A car can receive these messages from other cars as well as RSUs. RSUs discard the messages received from cars. The cars will receive multiple such messages, do some comparison stuff and periodically broadcast a similar type of message : 'aggregatedMessage' per interval Tcar. aggregatedMessage also have Psid=-100 ,so that the message can be differentiated from other messages easily.
I am scheduling the car events using self messages. (Though it could have been done inside handlePositionUpdate I believe). The handleSelfMsg of a car is following:
void TraCIDemo11p::handleSelfMsg(cMessage* msg) {
if (WaveShortMessage* wsm = dynamic_cast<WaveShortMessage*>(msg)) {
wsm->setSerial(wsm->getSerial() +1);
if (wsm->getPsid() == -100) {
sendDown(aggregatedMessage->dup());
//sendDelayedDown(aggregatedMessage->dup(), simTime()+uniform(0.1,0.5));
scheduleAt(simTime()+tcar+uniform(0.01, 0.05), aggregatedMessage);
}
//send this message on the service channel until the counter is 3 or higher.
//this code only runs when channel switching is enabled
else if (wsm->getSerial() >= 3) {
//stop service advertisements
stopService();
delete(wsm);
}
else {
scheduleAt(simTime()+1, wsm);
}
}
else {
BaseWaveApplLayer::handleSelfMsg(msg);
}
}
PROBLEM: With this setting, the simulation is very very slow. I get about 50 simulation seconds in 5-6 hours or more in Express mode in OMNET GUI. (No. of RSU: 64, Number of Vehicle: 40, around 1kmx1km map)
Also, I am referring to this post. The OP says that he got faster speed by removing the sending of message after each RSU received a message. In my case I cannot remove that, because I need to send out the broadcast messages after each interval.
Question: I think that this slowness is because every node is trying to sendDown messages at the beginning of each simulated second. Is it the case that when all vehicles and nodes schedules and sends message at the same time OMNET slows down? (Makes sense to slow down , but by what degree) But, there are only around 100 nodes overall in the simulation. Surely it cannot be this slow.
What I tried : I tried using sendDelayedDown(wsm->dup(), simTime()+uniform(0.1,0.5)); to spread the sending of the messages through out 1st half of each simulated second. This seems to stop messages piling up at the beginning of each simulation seconds and sped things up a bit, but not so much overall.
Can anybody please let me know if this is normal behavior or whether I am doing something wrong.
Also I apologize for the long post. I could not explain my problem without giving the context.
It seems you are flooding your network with messages: Every message from an RSU gets duplicated and transmitted again by every Car which has received this message. Hence, the computational time increases quadratically with the number of nodes (sender of messages) in your network (every sent message has to be handled by every node which is in range to receive it). The limit of 3 transmissions per message does not seem to help much and, as the comment in the code indicates, is not used at all, if there is no channel switching.
Therefore, if you can not improve/change your code to simply send less messages, you have to live with that. Your little tweak to send the messages in a delayed manner only distributes the messages over one second but does not solve the problem of flooding.
However, there are still some hints you can follow to improve the performance of your simulation:
Compile in release mode: make MODE=Release
Run your simulation in the terminal environment Cmdenv: ./run -u Cmdenv ...
If you need to use the graphical environment by all means, you should at least speed up the animations by using the slider in the upper part of the interface.
Removing the simtime-resolution parameter from the omnetpp.ini file solves the problem.
It seems the simulation kernel has an issue when the channel delay does not match the simulation-time resolution.
You can verify the solution by cloning the following repository. Note that you need a functional installation of the OMNeT++ framework. Specifically, I test this fix in OMNeT++ 5.6.2.
https://github.com/Ryuuba/flooding
We are using a custom board running version 3.2 of the kernel, but we've come across some issues when testing the robustness of the MMC layer.
First of all, our MMC slot does not have a card detection pin. The issue in question consists of the following:
Load the module (omap_hsmmc). The card is detected on power-up,
and is mounted appropriately.
Read something from the SD card (i.e. cat foo.txt)
While the file is being read, remove the card.
After many failed attempts, the system hangs.
Now, I've tracked the issue to the following code section, in drivers/mmc/card/queue.c:
static int mmc_queue_thread(void *d)
{
struct mmc_queue *mq = d;
struct request_queue *q = mq->queue;
current->flags |= PF_MEMALLOC;
down(&mq->thread_sem);
do {
struct request *req = NULL;
struct mmc_queue_req *tmp;
spin_lock_irq(q->queue_lock);
set_current_state(TASK_INTERRUPTIBLE);
req = blk_fetch_request(q);
mq->mqrq_cur->req = req;
spin_unlock_irq(q->queue_lock);
if (req || mq->mqrq_prev->req) {
set_current_state(TASK_RUNNING);
mq->issue_fn(mq, req);
} else {
if (kthread_should_stop()) {
set_current_state(TASK_RUNNING);
break;
}
up(&mq->thread_sem);
schedule();
down(&mq->thread_sem);
}
/* Current request becomes previous request and vice versa. */
mq->mqrq_prev->brq.mrq.data = NULL;
mq->mqrq_prev->req = NULL;
tmp = mq->mqrq_prev;
mq->mqrq_prev = mq->mqrq_cur;
mq->mqrq_cur = tmp;
} while (1);
up(&mq->thread_sem);
return 0;
}
Investigating this code, I've found that it hangs inside the mq->issue_fn(mq, req) call. This function prepares and issues the appropriate commands to fufill the request passed into it, and it is aware of any errors that happens when it can't communicate with the card. The error is handled in an awkward (in my opinion) manner, and does not 'bubble up' to the mmc_queue_thread. However, I've checked my version's code against that of the most recent kernel release (4.9), and I did not see any difference in the treatment of such errors, apart from a better separation of each error case (the treatment is very similar).
I suspect that the issue is caused by the inability ofthe upper layers to stop making new requests to read from the MMC card.
What I've Tried:
Rewrote the code so that the error is passed up so that I can do ret = mq->issue_fn(mq, req).
Being able to identify the specific errors, I tried to cancel the thread in different ways: calling kthread_stop, calling mmc_queue_suspend, __blk_end_request, and others. With these, the most I've been able to accomplish was keep the thread in a "harmless" state, where it still existed, but did not consume any resources. However, the user space program that triggered the call does not return, locking up in an uninterruptible state.
My questions:
Is the best way to solve this by stopping the upper layers from making new requests? Or should the thread itself be 'killed off'?
In a case such as mine, should it be assumed that because there is no card detection pin, the card should not be removed?
Update: I've found out that you can tell the driver to use polling to detect card insertion/removal. It can be done by adding the MMC_CAP_NEEDS_POLL flag to the mmc's .caps on driver initialization (on newer kernels, you can use the broken-cd property on your DT). However, the problem still happens after this modification.
Figured out the solution!
The issue was, as I suspected, that the block requests kept being issued even after the card was removed. The error was fixed in commit a8ad82cc1b22d04916d9cdb1dc75052e80ac803c from Texas' kernel repository:
commit a8ad82cc1b22d04916d9cdb1dc75052e80ac803c
Author: Sujit Reddy Thumma <sthumma#codeaurora.org>
Date: Thu Dec 8 14:05:50 2011 +0530
mmc: card: Kill block requests if card is removed
Kill block requests when the host realizes that the card is
removed from the slot and is sure that subsequent requests
are bound to fail. Do this silently so that the block
layer doesn't output unnecessary error messages.
Signed-off-by: Sujit Reddy Thumma <sthumma#codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter#intel.com>
Signed-off-by: Chris Ball <cjb#laptop.org>
The key in the commit is that a return BLKPREP_KILL was added to the mmc_prep_request(), besides some minor tweaks that added some 'granularity' to the error treatment, adding a specific code for the cases where the card was removed.
I need to design a rate limiter service for throttling requests.
For every incoming request a method will check if the requests per second has exceeded its limit or not. If it has exceeded then it will return the amount of time it needs to wait for being handled.
Looking for a simple solution which just uses system tick count and rps(request per second). Should not use queue or complex rate limiting algorithms and data structures.
Edit: I will be implementing this in c++. Also, note I don't want to use any data structures to store the request currently getting executed.
API would be like:
if (!RateLimiter.Limit())
{
do work
RateLimiter.Done();
}
else
reject request
The most common algorithm used for this is token bucket. There is no need to invent a new thing, just search for an implementation on your technology/language.
If your app is high avalaible / load balanced you might want to keep the bucket information on some sort of persistent storage. Redis is a good candidate for this.
I wrote Limitd is a different approach, is a daemon for limits. The application ask the daemon using a limitd client if the traffic is conformant. The limit is configured on the limitd server and the app is agnostic to the algorithm.
since you give no hint of language or platform I'll just give out some pseudo code..
things you are gonna need
a list of current executing requests
a wait to get notified where a requests is finished
and the code can be as simple as
var ListOfCurrentRequests; //A list of the start time of current requests
var MaxAmoutOfRequests;// just a limit
var AverageExecutionTime;//if the execution time is non deterministic the best we can do is have a average
//for each request ether execute or return the PROBABLE amount to wait
function OnNewRequest(Identifier)
{
if(count(ListOfCurrentRequests) < MaxAmoutOfRequests)//if we have room
{
Struct Tracker
Tracker.Request = Identifier;
Tracker.StartTime = Now; // save the start time
AddToList(Tracker) //add to list
}
else
{
return CalculateWaitTime()//return the PROBABLE time it will take for a 'slot' to be available
}
}
//when request as ended release a 'slot' and update the average execution time
function OnRequestEnd(Identifier)
{
Tracker = RemoveFromList(Identifier);
UpdateAverageExecutionTime(Now - Tracker.StartTime);
}
function CalculateWaitTime()
{
//the one that started first is PROBABLY the first to finish
Tracker = GetTheOneThatIsRunnigTheLongest(ListOfCurrentRequests);
//assume the it will finish in avg time
ProbableTimeToFinish = AverageExecutionTime - Tracker.StartTime;
return ProbableTimeToFinish
}
but keep in mind that there are several problems with this
assumes that by returning the wait time the client will issue a new request after the time as passed. since the time is a estimation, you can not use it to delay execution, or you can still overflow the system
since you are not keeping a queue and delaying the request, a client can be waiting for more time that what he needs.
and for last, since you do not what to keep a queue, to prioritize and delay the requests, this mean that you can have a live lock, where you tell a client to return later, but when he returns someone already took its spot, and he has to return again.
so the ideal solution should be a actual execution queue, but since you don't want one.. I guess this is the next best thing.
according to your comments you just what a simple (not very precise) requests per second flag. in that case the code can be something like this
var CurrentRequestCount;
var MaxAmoutOfRequests;
var CurrentTimestampWithPrecisionToSeconds
function CanRun()
{
if(Now.AsSeconds > CurrentTimestampWithPrecisionToSeconds)//second as passed reset counter
CurrentRequestCount=0;
if(CurrentRequestCount>=MaxAmoutOfRequests)
return false;
CurrentRequestCount++
return true;
}
doesn't seem like a very reliable method to control whatever.. but.. I believe it's what you asked..
Recently I met a problem that my program seems waiting for an event handle and never return.
I call IcmpSendEcho2 to send ping request and have set the event parameter correctly. In normal situation if the ping response arrived or timeout the event handle should be singled by system. This works fine in most time but one day I found my program (some of the working threads) is halt after my computer wakes up from sleep state. I guess the root cause is that when the ping request is sent and before system signal the corresponding event handle then the system change to sleep state (no operation for a long time then laptop change to sleep state). After system wakes up by me the previous event handle got no chance to be singled.
Add the pseudo code is:
// a working thread
while (isWorking()) {
...
IcmpSendEcho2(hIcmpFile, hEvent, NULL, NULL, ..., 5000/*timeout is 5 sec*/)
...
WaitForsingleObject(hEvent, INFINITE); // hEvent should be signaled after
// response arrived or timeout (5sec)
// elapsed
//
...
}
I don't know if I'm wrong. I'm still investigating on this problem and post this issue here hope somebody who has experience the same problem could help me. Thank you in advance!
I am building latency measurement into a communication middleware I am building. The way I have it working is that I periodically send a probe msg from my publishing apps. Subscribing apps receive this probe, cache it, and send an echo back at a time of their choosing, noting how much time the msg was kept “on hold”. The subscribing app receives these echos and calculates latency as (now() – time_sent – time_on_hold) / 2.
This kinda works, but the numbers are vastly different (3x) when “time on hold” is greater than 0. I.e if I echo the msg back immediately I get around 50us on my dev env and if I wait, then send the msg back the time jumps to 150us (though I discount whatever time I was on hold). I use QueryPerfomanceCounter for all measurements.
This is all inside a single Windows 7 box. What am I missing here?
TIA.
A bit more information. I am using the following to measure time:
static long long timeFreq;
static struct Init
{
Init()
{
QueryPerformanceFrequency((LARGE_INTEGER*) &timeFreq);
}
} init;
long long OS::now()
{
long long result;
QueryPerformanceCounter((LARGE_INTEGER*)&result);
return result;
}
double OS::secondsDiff(long long ts1, long long ts2)
{
return (double) (ts1-ts2)/timeFreq;
}
On the publish side I do something like:
Probe p;
p.sentTimeStamp = OS::now();
send(p);
Response r = recv();
latency=OS::secondsDiff(OS::now()- r.sentTimeStamp) - r.secondsOnHoldOnReceiver;
And on the receiver side:
Probe p = recv();
long long received = OS::now();
sleep();
Response r;
r.sentTimeStamp = p.timeStamp;
r.secondOnHoldOnReceiver = OS::secondsDiff(OS::now(), received);
send(r);
Ok, I have edited my answer to reflect your answer: Sorry for the delay, but I didn't notice that you had elaborated on the question by creating an answer.
It's seems that functionally you are doing nothing wrong.
I think that when you distribute your application outside of localhost conditions, the additional 100us (if it is indeed roughly constant) will pale into insignificance compared to the average latency of a functioning network.
For the purposes of answering your question am thinking that there is a thread/interrupt scheduling issue on the server side that needs to be investigated, as you do not seem to be doing anything on the client that is not accounted for.
Try the following test scenario:
Send two Probes to clients A and B. (all localhost)
Send the Probe to 'Client B' one second (or X/2 seconds) after you send the probe to Client A.
Ensuring that 'Client A' waits for two seconds (or X seconds) and 'Client B' waits 'one second (or X/2 seconds)
The idea being that hopefully, both clients will send back their probe answers at roughly the same time and both after a sleep/wait (performing the action that exposes the problem). The objective is to try to get one of the clients responses to 'wake up' the publisher to see if the next clients answer will be processed immediately.
If one of these returned probes is not showing the anomily (most likely the second response) it could point to the fact that the publisher thread is waking from a sleep cycle (on recv 1st responce) and is immediately available to process the second response.
Again, if it turns out that the 100us delay is roughly constant, it will be +-10% of 1ms which is the timeframe appropriate for realworld network conditions.