Does OMNeT++ / INET consider computation times for e.g. checksum calculations - time

I want to use OMNeT++ and INET for a network simulation. The focus of my simulation lies in the correct representation of the timing behavior. Therefore, the simulation should not only consider the time of the transmission, but also how long the packet is delayed within the stack. Such delays can occur because of necessary checksum calculations for e.g. TCP, UDP or IPv4. As far as I've seen, the checksum calculation is not considered in INET, it's only possible to represent an incorrect checksum by a bit error.
But I wanted to ask here to make sure that I didn't miss an option that allows to consider such effect on the timing behavior.
I'm thankful for your feedback.

You are correct, the time consumption inside the stack or the time spent during the processing of packets is not considered or included in INET a priori.
This is a complicated topic, because these kinds of "delays" strongly depend on the actual real-life system, the situation of the system, the actually used software etc. Even if all kinds of processing delays are modeled and included, one big question would remain (among others): How to set the delays? To which values? How to verify the correct value settings? Etc...
This discussion aside, if you want to include processing delays, you could start by modeling them via self-messages. Whenever a "processing delay" relevant operation starts, a self-message with a delay (the actual processing time) is sent to the module itself. When the message gets processed, the actual code is executed and the simulation time would advance.
This, if course, requires that depending functions are blocked for the duration of the processing... might be a complex work to introduce such things into the INET stack.

Related

What is the proper way to handle time synchronization when generating sequential events?

The program I am working on generates timestamped sequential events with milliseconds precision.
E.G.
event_1 at time 2022-04-27T12:00:00.000000-04:00
event_2 at time 2022-04-27T12:00:00.001000-04:00
event_3 at time 2022-04-27T12:00:00.002000-04:00
etc...
The problem I am facing is that, as the system performs a time synchronization via NTP, it is possible for the system clock to "go back in time".
In rare circumstances, this might create a situation where an event that should logically be generated after another to be timestamped before it.
E.G.
event_1 at time 2022-04-27T12:00:00.000000-04:00
event_2 at time 2022-04-27T12:00:00.002000-04:00
TimeSync at this point
event_3 at time 2022-04-27T12:00:00.001000-04:00
etc...
Later on, when another system consumes these events (most likely to generate a report), if they assert the logical order of the events from their timestamps, it will fail since it'll appear as event_3 happened before event_2, which wouldn't be correct.
I was wondering what would be the correct way to tackle this issue.
Currently, if multiple events happen under a milliseconds of each other (which is possible), we doctor their timestamps to artificially give them that milliseconds difference/precision after their previous one. This has the side effect of protecting us from the above undesired behavior until the system slows down long enough for the real clock to "catch up" to our doctored timestamps.
Another obvious solution would be to give each event a unique sequential number independent from their timestamps. The problem here is that the domain/standard we are implementing doesn't support this sequential id, so most likely, if we need to interface with a vendor, we'll end up with the same issue.
For context, these timestamps are generated using boost library in C++.

Best Route For Input Clocks on Kintex7 FPGA

I'm looking for advice on a less than ideal situation.
I've inherited a project where we have a hardware design issue. We generate a clock to a chip which feeds the clock back in over a none clock-capable input. This works at up to 160MHz but we are looking to increase the clock so I'm researching IO options. This is used to clock 8 parallel data inputs.
Right now the data inputs go through a delay and a IDDR block. The output is fed to a FIFO. Our clock is still routed to a BUFG - so we have:
Data - IDELAY - IDDR - FIFO
Clock - BUFG ----^------^
I read somewhere that routing to a BUFG has a large delay so a BUFR-BUFIO is better. Is this the case? Have I missed a better option?
When you say generating a clock to "a chip", I will assume that you mean the Kintex7 chip.
The delay is not a problem. The issue is for your timing closure to be set up properly so that the static timing analysis can validate whether you violate any setup or hold time in all boundary corners of the board.
If you look at DS182 document, you will find under AC Switching characteristics which will give you a rough idea on how well the chip can perform.
However, the best is to let the timing analyzer inside Vivado calculate for you whether your desired clock frequency will be able to close timing.
You just need to make sure
The data input is synchronous to your final clock.
If it isn't, then clock that data input across two stages of registers with respect to the final clock.
Specify your timing constraints
Run through synthesis and implementation
Check the timing to see that there are no violations.
Or maybe I did not understand something about what you are trying to do.

Is it "worth it" to reuse event variables in CUDA?

When using events in CUDA, I typically create an event and immediately record it on some stream. After synchronizing, I don't bother to hold on to that cudaEvent_t, to use it elsewhere - I just destroy it.
Other than avoiding the overhead of event creation and destruction, is there any other benefit to "recycling" events? If not, why did nVIDIA bother to separate cudaEventCreate() from cudaEventRecord() ?
First I'm trying to answer the question "what the overhead could be". As we don't have the source code of CUDA event. Everything is based on some reasonable guess. You could make totally different design decision to implement the CUDA event with same or similar behavior.
In the timing task we know that at least the time of the event is recorded somewhere. As the event happens on the device side, I think the time is recorded in the device side memory to avoid using PCIe (high overhead) during recording. As eventually you get the time from the host side, the recorded time must be transferred through PCIe at sometime (probably eventSync()).
You see during the whole procedure, you need some space both in host and device side memory to store the time. It looks good to me a perfect place to allocate/release the memory in eventCreate()/eventDestroy(), just like malloc()/free(). It also looks like a perfect overhead that you want to avoid when recording the time repeatedly (reusing the event).
So two types of overhead here, Allocating device and host space, and PCIe transfer. This is my guess. Maybe you could have another way to implement the timing functionality without involving these overheads.
Then finally, avoiding these overheads seems like a good reason that nVidia uses a separate eventCreate().

What is the best way for "Polling"?

This question is related with Microcontroller programming but anyone may suggest a good algorithm to handle this situation.
I have a one central console and set of remote sensors. The central console has a receiver and the each sensor has a transmitter operates on same frequency. So we can only implement Simplex communication.
Since the transmitters work on same frequency we cannot have 2 sensors sending data to central console at the same time.
Now I want to program the sensors to perform some "polling". The central console should get some idea about the existence of sensors (Whether the each sensor is responding or not)
I can imagine several ways.
Using a same interval between the poll messages for each sensor and start the sensors randomly. So they will not transmit at the same time.
Use of some round mechanism. Sensor 1 starts polling at 5 seconds the second at 10 seconds etc. More formal version of method 1.
The maximum data transfer rate is around 4800 bps so we need to consider that as well.
Can some one imagine a good way to resolve this with less usage of communication links. Note that we can use different poll intervals for each sensors if necessary.
I presume what you describe is that the sensors and the central unit are connected to a bus that can deliver only one message at a time.
A normal way to handle this is to have collision detection. This is e.g. how Ethernet operates as far as I know. You try to send a message; then attempt to detect collision. If you detect a collision, wait for a random amount (to break ties) and then re-transmit, of course with collision check again.
If you can't detect collisions, the different sensors could have polling intervals that are all distinct prime numbers. This would guarantee that every sensor would have dedicated slots for successful polling. Of course there would be still collisions, but they wouldn't need to be detected. Here example with primes 5, 7 and 11:
----|----|----|----|----|----|----|----| (5)
------|------|------|------|------|----- (7)
----------|----------|----------|-:----- (11)
* COLLISION
Notable it doesn't matter if the sensor starts "in phase" or "out of phase".
I think you need to look into a collision detection system (a la Ethernet). If you have time-based synchronization, you rely on the clocks on the console and sensors never drifting out of sync. This might be ok if they are connected to an external, reliable time reference, or if you go to the expense of having a battery backed RTC on each one (expensive).
Consider using all or part of an existing protocol, unless protocol design is an end in itself - apart from saving time you reduce the probability that your protocol will have a race condition that causes rare irreproducible bugs.
A lot of protocols for this situation have the sensors keeping quiet until the master specifically asks them for the current value. This makes it easy to avoid collisions, and it makes it easy for the master to request retransmissions if it thinks it has missed a packet, or if it is more interested in keeping up to date with one sensor than with others. This may even give you better performance than a system based on collision detection, especially if commands from the master are much shorter than sensor responses. If you end up with something like Alohanet (see http://en.wikipedia.org/wiki/ALOHAnet#The_ALOHA_protocol) you will find that the tradeoff between not transmitting very often and having too many collisions forces you to use less than 50% of the available bandwidth.
Is it possible to assign a unique address to each sensor?
In that case you can implement a Master/Slave protocol (like Modbus or similar), with all devices sharing the same communication link:
Master is the only device which can initiate communication. It can poll each sensor separately (one by one), by broadcasting its address to all slaves.
Only the slave device which was addressed will reply.
If there is no response after a certain period of time (timeout), device is not available and Master can poll the next device.
See also: List of automation protocols
I worked with some Zigbee systems a few years back. It only had two sensors so we just hard-coded them with different wait times and had them always respond to requests. But since Zigbee has systems However, we considered something along the lines of this:
Start out with an announcement from the console 'Hey everyone, let's make a network!'
Nodes all attempt to respond with something like 'I'm hardware address x, can I join?'
At first it's crazy, but with some random retry times, eventually the console responds to all nodes: 'Yes hardware address x, you can join. You are node #y and you will have a wait time of z milliseconds from the time you receive your request for data'
Then it should be easy. Every time the console asks for data the nodes respond in their turn. Assuming transmission of all of the data takes less time than the polling period you're set. It's best not to acknowledge the messages. If the console fails to respond, then very likely the node will try to retransmit just when another node is trying to send data, messing both of them up. Then it snowballs into complete failure...

Power Efficient Software Coding

In a typical handheld/portable embedded system device Battery life is a major concern in design of H/W, S/W and the features the device can support. From the Software programming perspective, one is aware of MIPS, Memory(Data and Program) optimized code.
I am aware of the H/W Deep sleep mode, Standby mode that are used to clock the hardware at lower Cycles or turn of the clock entirel to some unused circutis to save power, but i am looking for some ideas from that point of view:
Wherein my code is running and it needs to keep executing, given this how can I write the code "power" efficiently so as to consume minimum watts?
Are there any special programming constructs, data structures, control structures which i should look at to achieve minimum power consumption for a given functionality.
Are there any s/w high level design considerations which one should keep in mind at time of code structure design, or during low level design to make the code as power efficient(Least power consuming) as possible?
Like 1800 INFORMATION said, avoid polling; subscribe to events and wait for them to happen
Update window content only when necessary - let the system decide when to redraw it
When updating window content, ensure your code recreates as little of the invalid region as possible
With quick code the CPU goes back to deep sleep mode faster and there's a better chance that such code stays in L1 cache
Operate on small data at one time so data stays in caches as well
Ensure that your application doesn't do any unnecessary action when in background
Make your software not only power efficient, but also power aware - update graphics less often when on battery, disable animations, less hard drive thrashing
And read some other guidelines. ;)
Recently a series of posts called "Optimizing Software Applications for Power", started appearing on Intel Software Blogs. May be of some use for x86 developers.
Zeroith, use a fully static machine that can stop when idle. You can't beat zero Hz.
First up, switch to a tickless operating system scheduler. Waking up every millisecend or so wastes power. If you can't, consider slowing the scheduler interrupt instead.
Secondly, ensure your idle thread is a power save, wait for next interrupt instruction.
You can do this in the sort of under-regulated "userland" most small devices have.
Thirdly, if you have to poll or perform user confidence activities like updating the UI,
sleep, do it, and get back to sleep.
Don't trust GUI frameworks that you haven't checked for "sleep and spin" kind of code.
Especially the event timer you may be tempted to use for #2.
Block a thread on read instead of polling with select()/epoll()/ WaitForMultipleObjects().
Puts stress on the thread scheuler ( and your brain) but the devices generally do okay.
This ends up changing your high-level design a bit; it gets tidier!.
A main loop that polls all the things you Might do ends up slow and wasteful on CPU, but does guarantee performance. ( Guaranteed to be slow)
Cache results, lazily create things. Users expect the device to be slow so don't disappoint them. Less running is better. Run as little as you can get away with.
Separate threads can be killed off when you stop needing them.
Try to get more memory than you need, then you can insert into more than one hashtable and save ever searching. This is a direct tradeoff if the memory is DRAM.
Look at a realtime-ier system than you think you might need. It saves time (sic) later.
They cope better with threading too.
Do not poll. Use events and other OS primitives to wait for notifiable occurrences. Polling ensures that the CPU will stay active and use more battery life.
From my work using smart phones, the best way I have found of preserving battery life is to ensure that everything you do not need for your program to function at that specific point is disabled.
For example, only switch Bluetooth on when you need it, similarly the phone capabilities, turn the screen brightness down when it isn't needed, turn the volume down, etc.
The power used by these functions will generally far outweigh the power used by your code.
To avoid polling is a good suggestion.
A microprocessor's power consumption is roughly proportional to its clock frequency, and to the square of its supply voltage. If you have the possibility to adjust these from software, that could save some power. Also, turning off the parts of the processor that you don't need (e.g. floating-point unit) may help, but this very much depends on your platform. In any case, you need a way to measure the actual power consumption of your processor, so that you can find out what works and what not. Just like speed optimizations, power optimizations need to be carefully profiled.
Consider using the network interfaces the least you can. You might want to gather information and send it out in bursts instead of constantly send it.
Look at what your compiler generates, particularly for hot areas of code.
If you have low priority intermittent operations, don't use specific timers to wake up to deal with them, but deal with when processing other events.
Use logic to avoid stupid scenarios where your app might go to sleep for 10 ms and then have to wake up again for the next event. For the kind of platform mentioned it shouldn't matter if both events are processed at the same time.
Having your own timer & callback mechanism might be appropriate for this kind of decision making. The trade off is in code complexity and maintenance vs. likely power savings.
Simply put, do as little as possible.
Well, to the extent that your code can execute entirely in the processor cache, you'll have less bus activity and save power. To the extent that your program is small enough to fit code+data entirely in the cache, you get that benefit "for free". OTOH, if your program is too big, and you can divide your programs into modules that are more or less independent of the other, you might get some power saving by dividing it into separate programs. (I suppose it's also possible to make a toolchain that spreas out related bundles of code and data into cache-sized chunks...)
I suppose that, theoretically, you can save some amount of unnecessary work by reducing the number of pointer dereferencing, and by refactoring your jumps so that the most likely jumps are taken first -- but that's not realistic to do as a programmer.
Transmeta had the idea of letting the machine do some instruction optimization on-the-fly to save power... But that didn't seem to help enough... And look where that got them.
Set unused memory or flash to 0xFF not 0x00. This is certainly true for flash and eeprom, not sure about s or d ram. For the proms there is an inversion so a 0 is stored as a 1 and takes more energy, a 1 is stored as a zero and takes less. This is why you read 0xFFs after erasing a block.
Rather timely this, article on Hackaday today about measuring power consumption of various commands:
Hackaday: the-effect-of-code-on-power-consumption
Aside from that:
- Interrupts are your friends
- Polling / wait() aren't your friends
- Do as little as possible
- make your code as small/efficient as possible
- Turn off as many modules, pins, peripherals as possible in the micro
- Run as slowly as possible
- If the micro has settings for pin drive strengh, slew rate, etc. check them & configure them, the defaults are often full power / max speed.
- returning to the article above, go back and measure the power & see if you can drop it by altering things.
also something that is not trivial to do is reduce precision of the mathematical operations, go for the smallest dataset available and if available by your development environment pack data and aggregate operations.
knuth books could give you all the variant of specific algorithms you need to save memory or cpu, or going with reduced precision minimizing the rounding errors
also, spent some time checking for all the embedded device api - for example most symbian phones could do audio encoding via a specialized hardware
Do your work as quickly as possible, and then go to some idle state waiting for interrupts (or events) to happen. Try to make the code run out of cache with as little external memory traffic as possible.
On Linux, install powertop to see how often which piece of software wakes up the CPU. And follow the various tips that the powertop site links to, some of which are probably applicable to non-Linux, too.
http://www.lesswatts.org/projects/powertop/
Choose efficient algorithms that are quick and have small basic blocks and minimal memory accesses.
Understand the cache size and functional units of your processor.
Don't access memory. Don't use objects or garbage collection or any other high level constructs if they expands your working code or data set outside the available cache. If you know the cache size and associativity, lay out the entire working data set you will need in low power mode and fit it all into the dcache (forget some of the "proper" coding practices that scatter the data around in separate objects or data structures if that causes cache trashing). Same with all the subroutines. Put your working code set all in one module if necessary to stripe it all in the icache. If the processor has multiple levels of cache, try to fit in the lowest level of instruction or data cache possible. Don't use floating point unit or any other instructions that may power up any other optional functional units unless you can make a good case that use of these instructions significantly shortens the time that the CPU is out of sleep mode.
etc.
Don't poll, sleep
Avoid using power hungry areas of the chip when possible. For example multipliers are power hungry, if you can shift and add you can save some Joules (as long as you don't do so much shifting and adding that actually the multiplier is a win!)
If you are really serious,l get a power-aware debugger, which can correlate power usage with your source code. Like this

Resources