Getting system time in VxWorks

Is there any way to get the system time in VxWorks besides tickGet() and tickAnnounce()? I want to measure the time between the task switches of a specified task, but I think the precision of tickGet() is not good enough: the two tickGet() values at the beginning and the end of my taskSwitchHookAdd hook are always the same!

If you are looking to time task switches, I would assume you need a timer with at least microsecond (us) resolution.
Usually, timers/clocks this fine-grained are only provided by the platform you are running on. If you are working on an embedded system, try reading through the manuals for your board support package (if there is one) to see if there are any functions provided to access the various timers on the board.
A lower-level solution would be to figure out which processor your system runs on and then write some simple assembly code to poll the processor's internal timebase register (TBR). This might require a bit of research on the processor you are using, but it can be done fairly easily.
If you are running on a PPC based processor, you can use the code below to read the TBR:
loop: mftbu rx #load most significant half from TBU
mftbl ry #load least significant half from TBL
mftbu rz #load from TBU again
cmpw rz,rx #see if 'old' = 'new'
bne loop #repeat if two values read from TBU are unequal
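If it helps, the same read-until-stable loop can be wrapped for use from C with GCC inline assembly (just a sketch, assuming a GCC-compatible PowerPC toolchain):
/* Read the 64-bit timebase: re-read TBU until it is stable across the TBL read. */
static inline unsigned long long readTimebase (void)
{
    unsigned int hi, lo, tmp;
    do
    {
        asm volatile ("mftbu %0" : "=r" (hi));   /* upper half (TBU)        */
        asm volatile ("mftb  %0" : "=r" (lo));   /* lower half (TBL)        */
        asm volatile ("mftbu %0" : "=r" (tmp));  /* re-read the upper half  */
    }
    while (hi != tmp);                           /* retry if TBL wrapped    */
    return ((unsigned long long) hi << 32) | lo;
}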
On an x86-based processor, you might consider using the RDTSC assembly instruction to read the Time Stamp Counter (TSC). On VxWorks, pentiumALib has some library functions (pentiumTscGet64() and pentiumTscGet32()) that make reading the TSC easier from C.
source: http://www-inteng.fnal.gov/Integrated_Eng/GoodwinDocs/pdf/Sys%20docs/PowerPC/PowerPC%20Elapsed%20Time.pdf
Good luck!

It depends on what platform you are on, but if it is x86 then you can use:
pentiumTscGet64();
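For example, a task switch hook could timestamp each switch with the TSC along these lines. This is only a sketch: it assumes pentiumALib's pentiumTscGet64(), which returns the 64-bit counter through a pointer, and taskSwitchHookAdd() from taskHookLib; header names may vary with your VxWorks version and BSP.
#include <vxWorks.h>
#include <taskLib.h>
#include <taskHookLib.h>
#include <pentiumLib.h>          /* pentiumTscGet64(); header name may differ per BSP */

static long long lastTsc = 0;

static void tscSwitchHook (WIND_TCB *pOldTcb, WIND_TCB *pNewTcb)
{
    long long now;
    pentiumTscGet64 (&now);
    /* (now - lastTsc) is the time since the previous switch, in CPU clock
       cycles; divide by the CPU frequency to convert to microseconds. */
    lastTsc = now;
}

STATUS tscSwitchHookInstall (void)
{
    return taskSwitchHookAdd ((FUNCPTR) tscSwitchHook);
}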

Related

Why doesn't the Linux kernel see the cache sizes in the gem5 emulator in full system mode?

I want to play around with cache sizes in my gem5 simulator to see how they affect the performance of programs, and possibly tune programs at runtime.
As a sanity check, I tried to verify that the command-line arguments I used were working, and so I tried the various methods proposed at: https://superuser.com/questions/55776/finding-l2-cache-size-in-linux/1298808#1298808
cat /sys/devices/system/cpu/cpu0/cache/index2/size
getconf LEVEL2_CACHE_SIZE
But I observed that:
the file /sys/devices/system/cpu/cpu0/cache/index2/size does not exist
getconf is empty
Why is that?
I am certain, however, that the caches are being used, since I've benchmarked simple programs, and the cycle counts increase when I decrease the cache sizes.
For example, my base command is:
M5_PATH='/data/git/linux-kernel-module-cheat/gem5/gem5-system' '/data/git/linux-kernel-module-cheat/gem5/gem5/build/ARM/gem5.opt' '/data/git/linux-kernel-module-cheat/gem5/gem5/configs/example/fs.py' --command-line='earlyprintk=pl011,0x1c090000 console=ttyAMA0 lpj=19988480 rw loglevel=8 mem=512MB root=/dev/sda nokaslr norandmaps printk.devkmsg=on printk.time=y' --disk-image='/data/git/linux-kernel-module-cheat/buildroot/output.arm-gem5~/images/rootfs.ext2' --dtb-file='/data/git/linux-kernel-module-cheat/gem5/gem5/system/arm/dt/armv7_gem5_v1_1cpu.dtb' --kernel='/data/git/linux-kernel-module-cheat/buildroot/output.arm-gem5~/build/linux-custom/vmlinux' --machine-type=VExpress_GEM5_V1 --num-cpus=1 --caches --l1d_size=1024 --l1i_size=1024 --l2cache --l2_size=1024 --l3_size=1024 --cpu-type=HPI
With those tiny caches, running the following:
m5 resetstats && dhrystone 10000 && m5 dumpstats
takes 175M cycles, and only 16M cycles if I use the exact same command but with huge caches of size 1024MB.
I observe a similar behavior for x86.
I'm using this testing infrastructure: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/05d8a324f74849f03404eb847f8da748e2e4502c#gem5-change-system-parameters which implies:
gem5 commit: fbe63074e3a8128bdbe1a5e8f6509c565a3abbd4
Linux kernel v4.15 with configuration: https://github.com/cirosantilli/linux-kernel-module-cheat/blob/05d8a324f74849f03404eb847f8da748e2e4502c/kernel_config_arm-gem5
Related thread on the mailing list: http://gem5-users.gem5.narkive.com/4xVBlf3c/verify-cache-configuration
For comparison, QEMU v2.11.0 x86 did show the cache sizes, but not the ARM one.
Maybe for ARM we would need to modify the bootloaders to pass that information to the kernel? I don't know how those things work very well, though:
https://github.com/gem5/gem5/blob/fbe63074e3a8128bdbe1a5e8f6509c565a3abbd4/system/arm/simple_bootloader/simple.S
https://github.com/gem5/gem5/blob/fbe63074e3a8128bdbe1a5e8f6509c565a3abbd4/system/arm/aarch64_bootloader/boot.S
I have been told that:
gem5 doesn't implement the cache size discovery registers.
The problem is that it is really hard to configure them in the general case, and they might not even be able to represent the hierarchy in gem5.
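For context, here is a sketch of what such a discovery-register read looks like on ARMv7 (my own illustration of what the kernel does, not anything gem5 models): the kernel selects a cache level in CSSELR and reads its geometry from CCSIDR. These are privileged CP15 accesses, so this only works in kernel-mode code.
/* Compute the size in bytes of one cache level from CCSIDR (level 0 = L1D, 1 = L2, ...). */
static unsigned int cache_size_bytes(unsigned int level)
{
    unsigned int csselr = level << 1;   /* bit 0 = 0 selects the data/unified cache */
    unsigned int ccsidr;

    asm volatile("mcr p15, 2, %0, c0, c0, 0" : : "r"(csselr));  /* write CSSELR */
    asm volatile("isb");
    asm volatile("mrc p15, 1, %0, c0, c0, 0" : "=r"(ccsidr));   /* read CCSIDR  */

    unsigned int line_bytes = 1u << ((ccsidr & 0x7) + 4);
    unsigned int ways       = ((ccsidr >> 3)  & 0x3ff)  + 1;
    unsigned int sets       = ((ccsidr >> 13) & 0x7fff) + 1;
    return sets * ways * line_bytes;
}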

Retrieve RISC-V processor context after execution in FPGA

I'm loading RISC-V into a Zedboard and I'm running a benchmark (provided in riscv-tools) without booting riscv-linux, in this case:
./fesvr-zynq median.riscv
It finishes without errors, reporting the number of cycles and instret as its result.
My problem is that I want more information: I would like to know the processor context after execution (register values and memory contents) as well as the result produced by the algorithm. Is there any way to get this from the FPGA execution? I know it can be done with the simulator, but I need to run it on the FPGA.
Thank you.
Do it the same way it gives you the cycles and instret data. Check out riscv-tests/benchmarks/common/*. The code is running bare metal so you can write whatever code you want and access any of the CSRs, registers or memory, and then you can use a basic version of printf to display the information.
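As a rough sketch (assuming the riscv-tests/benchmarks/common runtime, which provides a basic printf in machine mode; dump_state() and its call site are hypothetical additions to the benchmark source), you could do something like:
#include <stdio.h>

static void dump_state(void)
{
    unsigned long cycles, instret, sp;
    asm volatile ("csrr %0, mcycle"   : "=r" (cycles));   /* machine-mode CSRs           */
    asm volatile ("csrr %0, minstret" : "=r" (instret));
    asm volatile ("mv %0, sp"         : "=r" (sp));        /* any GPR can be read this way */
    printf("cycles=%lu instret=%lu sp=%lx\n", cycles, instret, sp);
}

int main(void)
{
    /* ... run the benchmark here, print its result, then: */
    dump_state();
    return 0;
}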

What is a TRAMPOLINE_ADDR for ARM and ARM64(aarch64)?

I am writing a basic checkpointing mechanism for ARM64 using ptrace. In order to do so, I am using some code from cryopid, and I found a TRAMPOLINE_ADDR macro like the following:
#define TRAMPOLINE_ADDR 0x00800000 /* 8MB mark */ for x86
#define TRAMPOLINE_ADDR 0x00300000 /* 3MB mark */ for x86_64
From what I have read, a trampoline is something related to jump statements. But my question is where the above values come from, and what the corresponding values would be for the ARM and ARM64 platforms.
Thank you
Just read the Wikipedia page.
There is nothing magic about a trampoline, and certainly nothing magic about a particular address; any address where you can place executable code can hold a trampoline. There are many use cases for them. For example:
Say you are booting off of a SPI flash running at some safe rate so that the chip boots for all users, but you want to increase the rate of the SPI flash, and the SPI peripheral does not allow you to change the rate while executing code from the flash. So you copy some code to RAM; that code boosts the SPI flash to a faster rate so you can run the flash faster, then you bounce back to running from the flash. You have bounced, or trampolined, off of that little bit of code in RAM.
Or you have a chip that boots from flash but has the ability to re-map that address space to RAM. You copy some code to some other RAM and branch to it; that little bit of trampoline code remaps the address space, then bounces you back, or bounces you to wherever the flash is now mapped.
You will also see the GNU linker sometimes add a small trampoline: say you compile some modules as Thumb and others as ARM. You no longer have to use that interwork option; the linker takes care of cleaning this up. It may add an instruction or two to trampoline you between modes; sometimes it modifies the code to just go where it needs to, and sometimes it modifies the code to branch-and-link somewhere close, and that somewhere close is a trampoline.
I assume there may be a need to do the same thing for aarch64 if/when switching to that mode.
So there should be no magic. Your specific application might have one or many trampolines, and the one you are interested in might not even be called that; it is probably application specific. There is absolutely no reason why there would be one address for everyone, unless it is some very rigid operating-system-specific (again, "application specific") thing and one specific trampoline for that operating system lives at some DEFINEd address.
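To make the "copy some code to RAM and bounce through it" idea concrete, here is a tiny user-space sketch for an x86-64 Linux host (just an illustration: the machine-code bytes, and any fixed address like TRAMPOLINE_ADDR, would be different on ARM/AArch64 and in a real checkpointer):
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00,   /* mov eax, 42 */
                             0xc3 };                          /* ret         */
    void *ram = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED)
        return 1;

    memcpy(ram, code, sizeof code);            /* "copy some code to ram"        */
    int (*trampoline)(void) = (int (*)(void))ram;
    printf("%d\n", trampoline());              /* bounce through it and return   */
    munmap(ram, 4096);
    return 0;
}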

feature request: an atomicAdd() function included in gwan.h

In the G-WAN KV options, KV_INCR_KEY will use the 1st field as the primary key.
That means there is a function which increments atomically already built in the G-WAN core to make this primary index work.
It would be good to open this function up for use by servlets, i.e. to include it in gwan.h.
By doing so, ANSI C newbies like me could benefit from it.
There was ample discussion about this on the old G-WAN forum, and people were invited to share their experiences with atomic operations in order to build a rich list of documented functions, platform by platform.
Atomic operations are not portable because they address the CPU directly. This means that the code for Intel x86 (32-bit) and AMD64 (64-bit) is different. Each platform (ARM, Power7, Cell, Motorola, etc.) has its own atomic instruction set.
Such a list was not published in the gwan.h file so far because basic operations are easy to find (the GCC compiler offers several atomic intrinsics as C extensions) but more sophisticated operations are less obvious (needs asm skills) and people will build them as they need - for very specific uses in their code.
Software Engineering is always a balance between what can be made available at the lowest possible cost to entry (like the G-WAN KV store, which uses a small number of functions) and how it actually works (which is far less simple to follow).
So, beyond the obvious (incr/decr, set/get), to learn more about atomic operations, use Google, find CPU instruction sets manuals, and arm yourself with courage!
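As a starting point, the obvious operations look like this with the GCC intrinsics mentioned above (a standalone sketch for GCC or a compatible compiler, not a servlet):
#include <stdio.h>

int main(void)
{
    volatile long counter = 0;

    long after = __sync_add_and_fetch(&counter, 1);               /* atomic increment */
    __sync_sub_and_fetch(&counter, 1);                            /* atomic decrement */
    int swapped = __sync_bool_compare_and_swap(&counter, 0, 42);  /* compare-and-swap */

    printf("after=%ld swapped=%d counter=%ld\n", after, swapped, (long)counter);
    return 0;
}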
Thanks for Gil's helpful guidance.
Now, I can do it by myself.
I changed the code in persistence.c as shown below.
First, I changed the definition of val in data to volatile.
//data[0]->val++;
//xbuf_xcat(reply, "Value: %d", data[0]->val);
int new_count, loops=50000000, time1, time2, time;
time1=getus();
for(int i = 0; i < loops; i++){
new_count = __sync_add_and_fetch(&data[0]->val, 1);
}
time2=getus();
time=loops/(time2-time1); // increments per microsecond
time=time*1000;           // increments per millisecond
xbuf_xcat(reply, "Value: %d, time: %d incr_ops/msec", new_count, time);
I got 52,000 incr_operations/msec with my old E2180 CPU.
So, with the GCC compiler I can do it by myself.
Thanks again.

How can I get a pulse in win32 Assembler (specifically nasm)?

I'm planning on making a clock. An actual clock, not something for Windows. However, I would like to be able to write most of the code now. I'll be using a PIC16F628A to drive the clock, and it has a timer I can access (actually, it has 3, in addition to the clock it has built in). Windows, however, does not appear to have this function. Which makes making a clock a bit hard, since I need to know how long it's been so I can update the current time. So I need to know how I can get a pulse (1Hz, 1KHz, doesn't really matter as long as I know how fast it is) in Windows.
There are many timer objects available in Windows. Probably the easiest to use for your purposes would be the Multimedia Timer, but that's been deprecated. It would still work, but Microsoft recommends using one of the new timer types.
I'd recommend using a threadpool timer if you know your application will be running under Windows Vista, Server 2008, or later. If you have to support Windows XP, use a Timer Queue timer.
There's a lot to those APIs, but general use is pretty simple. I showed how to use them (in C#) in my article Using the Windows Timer Queue API. The code is mostly API calls, so I figure you won't have trouble understanding and converting it.
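In C, basic use of a Timer Queue timer looks roughly like this (a sketch assuming Windows XP or later; converting the calls to NASM is mostly a matter of pushing the same arguments):
#include <windows.h>
#include <stdio.h>

static VOID CALLBACK tick(PVOID param, BOOLEAN timerOrWaitFired)
{
    (void)param; (void)timerOrWaitFired;
    printf("tick\n");                          /* runs on a threadpool thread */
}

int main(void)
{
    HANDLE timer = NULL;
    /* first callback after 1000 ms, then every 1000 ms */
    CreateTimerQueueTimer(&timer, NULL, tick, NULL, 1000, 1000, WT_EXECUTEDEFAULT);
    Sleep(10 * 1000);                                    /* let it pulse ten times   */
    DeleteTimerQueueTimer(NULL, timer, INVALID_HANDLE_VALUE);  /* wait for callbacks */
    return 0;
}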
The LARGE_INTEGER is just an 8-byte block of memory that's split into a high part and a low part. In assembly, you can define it as:
MyLargeInt equ $
MyLargeIntLow dd 0
MyLargeIntHigh dd 0
If you're looking to learn ASM, just do a Google search for [x86 assembly language tutorial]. That'll get you a whole lot of good information.
You could use a waitable timer object. Since Windows is not a real-time OS, you'll need to make sure you set the period long enough that you won't miss pulses. A tenth of a second should be safe most of the time.
Additional:
The const LARGE_INTEGER you need to pass to SetWaitableTimer is easy to implement in NASM; it's just an eight-byte constant. Note that it is the due time in 100-nanosecond units (negative for a time relative to now), while the repeat period is a separate argument given in milliseconds:
duetime: dq -1000000 ; first fire after 100 ms (100-ns units, negative = relative)
Pass the address of duetime as the second argument to SetWaitableTimer, and 100 (milliseconds) as the third (period) argument to get a pulse ten times a second.
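For reference, here is the same waitable-timer approach sketched in C (assuming <windows.h>; the NASM version makes exactly these calls with the eight-byte constant above):
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);  /* auto-reset timer */
    LARGE_INTEGER due;
    due.QuadPart = -1000000LL;   /* first pulse after 100 ms (100-ns units, negative = relative) */

    SetWaitableTimer(timer, &due, 100, NULL, NULL, FALSE);  /* then every 100 ms */

    for (int i = 0; i < 10; i++)
    {
        WaitForSingleObject(timer, INFINITE);               /* block until the next pulse */
        printf("pulse %d\n", i);
    }
    CloseHandle(timer);
    return 0;
}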
