NTP Shared Memory Driver Structure - shared-memory

I would like to use the shared memory driver to push time from a GPS
receiver into NTP (Note 1) - host OS is Linux, using NTP 4.2.6p5.
According to documentation, I
need to populate the following structure in shared memory:
struct shmTime {
    int mode;   /* 0 - if valid set
                 *       use values,
                 *       clear valid
                 * 1 - if valid set
                 *       if count before and after read of
                 *       values is equal,
                 *         use values
                 *       clear valid
                 */
    int count;                  /* See documentation for "mode" on the site */
    time_t clockTimeStampSec;   /* external clock */
    int clockTimeStampUSec;     /* external clock */
    time_t receiveTimeStampSec; /* internal clock, when external value was received */
    int receiveTimeStampUSec;   /* internal clock, when external value was received */
    int leap;                   /* ??? */
    int precision;              /* Precision of the timestamp, in 2^precision seconds */
    int nsamples;               /* Set by NTPD - do not populate */
    int valid;                  /* Shared memory is valid */
    int dummy[10];
};
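For reference, this is roughly how I intend to fill the structure in from our own time solution (only a sketch: it assumes the mode-1 count handshake and the conventional SHM key of 0x4E545030 + unit number, and the leap/precision values are placeholders pending an answer):

#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/time.h>
#include <stddef.h>

#define NTPD_SHM_BASE 0x4E545030               /* "NTP0"; unit n uses base + n */

static struct shmTime *attach_shm(int unit)
{
    int id = shmget(NTPD_SHM_BASE + unit, sizeof(struct shmTime), IPC_CREAT | 0700);
    void *p;

    if (id == -1)
        return NULL;
    p = shmat(id, NULL, 0);
    return (p == (void *)-1) ? NULL : (struct shmTime *)p;
}

/* clock_tv: timestamp from our GPS solution; recv_tv: local clock at receipt */
static void shm_put_sample(struct shmTime *shm,
                           const struct timeval *clock_tv,
                           const struct timeval *recv_tv)
{
    shm->mode = 1;
    shm->valid = 0;
    shm->count++;                              /* bump before writing the values... */
    shm->clockTimeStampSec    = clock_tv->tv_sec;
    shm->clockTimeStampUSec   = (int)clock_tv->tv_usec;
    shm->receiveTimeStampSec  = recv_tv->tv_sec;
    shm->receiveTimeStampUSec = (int)recv_tv->tv_usec;
    shm->leap = 0;                             /* placeholder - exact semantics are my question */
    shm->precision = -10;                      /* ~1 ms, i.e. 2^-10 s; placeholder */
    shm->count++;                              /* ...and again after */
    shm->valid = 1;
}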
The major fields missing are a precise definition of "Timestamp" and "leap". I take it that timestamp is Unix time, in UTC, but how is it to be adjusted for leap seconds, and how does this interact with the "leap" field? Canonical examples of using the interface such as shmpps.c don't really answer the question.
(Note 1: In this project, we are processing our own GPS solution and time based on raw observables and hence are not using NMEA or any other GPS protocol. PPS is supplied to us via a GPIO line and we are using the ATOM driver to capture this via linuxpps.)

Related

Best way to maintain an RNG state in multiple devices in openCL

So I'm trying to make use of this custom RNG library for openCL:
http://cas.ee.ic.ac.uk/people/dt10/research/rngs-gpu-mwc64x.html
The library defines a state struct:
//! Represents the state of a particular generator
typedef struct{ uint x; uint c; } mwc64x_state_t;
And in order to generate a random uint, you pass in the state into the following function:
uint MWC64X_NextUint(mwc64x_state_t *s)
which updates the state, so that when you pass it into the function again, the next "random" number in the sequence will be generated.
For the project I am creating I need to be able to generate random numbers not just in different work groups/items but also across multiple devices simultaneously and I'm having trouble figuring out the best way to design this. Like should I create 1 mwc64x_state_t object per device/commandqueue and pass that state in as a global variable? Or is it possible to create 1 state object for all devices at once?
Or do I not even pass it in as a global variable and declare a new state locally within each kernel function?
The library also comes with this function:
void MWC64X_SeedStreams(mwc64x_state_t *s, ulong baseOffset, ulong perStreamOffset)
which is supposed to split up the RNG into multiple "streams", but including this in my kernel makes it incredibly slow. For instance, if I do something very simple like the following:
__kernel void myKernel()
{
    mwc64x_state_t rng;
    MWC64X_SeedStreams(&rng, 0, 10000);
}
Then the kernel call becomes around 40x slower.
The library does come with some source code that serves as example usages but the example code is kind of limited and doesn't seem to be that helpful.
So if anyone is familiar with RNGs in openCL or if you've used this particular library before I'd very much appreciate your advice.
The MWC64X_SeedStreams function is indeed relatively slow, at least in comparison
to the MWC64X_NextUint call, but this is true of most parallel RNGs that try
to split a large global stream into many sub-streams that can be used in
parallel. The assumption is that you'll be calling NextUint many times
within the kernel (e.g. a hundred or more), while SeedStreams is only called once at the top.
This is an annotated version of the EstimatePi example that comes
with the library (mwc64x/test/estimate_pi.cpp and mwc64x/test/test_mwc64x.cl).
__kernel void EstimatePi(ulong n, ulong baseOffset, __global ulong *acc)
{
    // One RNG state per work-item
    mwc64x_state_t rng;
    // This calculates the number of samples that each work-item uses
    ulong samplesPerStream=n/get_global_size(0);
    // And then skip each work-item to its part of the stream, which
    // will run from stream offset:
    //   baseOffset+2*samplesPerStream*get_global_id(0)
    // up to (but not including):
    //   baseOffset+2*samplesPerStream*(get_global_id(0)+1)
    //
    MWC64X_SeedStreams(&rng, baseOffset, 2*samplesPerStream);
    // Now use the numbers
    uint count=0;
    for(uint i=0;i<samplesPerStream;i++){
        ulong x=MWC64X_NextUint(&rng);
        ulong y=MWC64X_NextUint(&rng);
        ulong x2=x*x;
        ulong y2=y*y;
        // x2+y2 >= x2 means the 64-bit addition did not wrap, i.e.
        // x*x+y*y < 2^64, so (x,y) lies inside the quarter circle of radius 2^32
        if(x2+y2 >= x2)
            count++;
    }
    acc[get_global_id(0)] = count;
}
So the intent is that n should be large and grow as the number
of work-items grows, so that samplesPerStream remains around
a hundred or more.
If you want multiple kernels on multiple devices, then you
need to add another level of hierarchy to the stream splitting,
so for example if you have:
K : Number of devices (possibly on parallel machines)
W : Number of work-items per device
C : Number of calls to NextUint per work-item
You end up with N = K*W*C total calls to NextUint across all
work-items. If your devices are identified as k = 0..(K-1),
then within each kernel you would do:
MWC64X_SeedStreams(&rng, W*C*k, C);
Then the indices within the stream would be:
[0 .. N)                           : Parts of the stream used across all devices
[k*(W*C) .. (k+1)*(W*C))           : Used within device k
[k*(W*C)+i*C .. k*(W*C)+(i+1)*C)   : Used by work-item i in device k
It is fine if each kernel uses less than C samples, you can
over-estimate if necessary.
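For example, a kernel parameterised that way might look like the following sketch (deviceIndex, W and C are values you would pass in from the host; they are not part of the library, and the loop body is just a stand-in for real work):

__kernel void MyKernel(ulong deviceIndex, ulong W, ulong C, __global uint *out)
{
    mwc64x_state_t rng;
    // Device k owns sub-stream [k*W*C, (k+1)*W*C); each of its W work-items
    // then consumes its own block of C values within that range.
    MWC64X_SeedStreams(&rng, deviceIndex*W*C, C);
    uint acc = 0;
    for (ulong j = 0; j < C; j++) {
        acc ^= MWC64X_NextUint(&rng);
    }
    out[get_global_id(0)] = acc;
}

The host side then just passes k = 0 .. K-1 as deviceIndex when it enqueues the kernel on device k.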
(I'm the author of the library).

What does the 'h' stand for in the hexadecimal value '4D42h', and why are the bytes reversed?

I'm currently reading about BMP file formats and I've come across the following lines of code which are used to define a bitmap file header.
typedef struct _WinBMPFileHeader
{
    WORD  FileType;      /* File type, always 4D42h ("BM") */
    DWORD FileSize;      /* Size of the file in bytes */
    WORD  Reserved1;     /* Always 0 */
    WORD  Reserved2;     /* Always 0 */
    DWORD BitmapOffset;  /* Starting position of image data in bytes */
} WINBMPFILEHEADER;
The first comment claims that 4D42h is identical to BM.
According to Wikipedia 4D is the hexadecimal ASCII code for M, while 42 is the ASCII code for B.
What does the h stand for, though?
And why isn't 4D42h identical to MB instead?
In this context, the h suffix stands for hexadecimal notation.
For the byte order, you need to consider endianness: BMP files are little-endian, so the two bytes appear on disk as 42h ('B') followed by 4Dh ('M'). When a little-endian processor loads those two bytes as a 16-bit WORD, the low-order byte comes first, so the value reads back as 4D42h rather than 424Dh.
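To make the byte order concrete, here is a small sketch that reads the first two bytes of a .bmp file and assembles them as a little-endian WORD (the file name is just a placeholder):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    FILE *f = fopen("example.bmp", "rb");
    unsigned char b[2];

    if (f == NULL)
        return 1;
    if (fread(b, 1, 2, f) != 2) {
        fclose(f);
        return 1;
    }

    /* On disk: 42h ('B') then 4Dh ('M'); assembled low byte first: 4D42h */
    uint16_t file_type = (uint16_t)(b[0] | (b[1] << 8));
    printf("FileType = %04Xh (%c%c)\n", (unsigned)file_type, b[0], b[1]);

    fclose(f);
    return 0;
}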

How to compute SHA1 of an array in Linux kernel

I'm trying to compute the SHA1 of an integer array in the Linux kernel. I have gone through crypto.c/crypto.h and security/integrity/ima/ima_crypto.c, but I can't figure out how to initialize and then update the SHA1 computation. Can someone point me to a tutorial or guide on how to go about doing this?
There's a pretty good introduction to the Linux cryptography API in Documentation/crypto/api-intro.txt. Also check out fs/ecryptfs/crypto.c for a real-life example of how the functions are used.
Here's a quick summary though to get you started:
Step 1: Declaration
Create some local variables:
struct scatterlist sg;
struct hash_desc desc;
char *plaintext = "plaintext goes here";
size_t len = strlen(plaintext);
u8 hashval[20];
A struct scatterlist is used to hold your plaintext in a format the crypto.h functions can understand, while a struct hash_desc is used to configure the hashing.
The variable plaintext holds our plaintext string, while hashval will hold the hash of our plaintext.
Finally, len holds the length of the plaintext string.
Note that while I'm using ASCII plaintext in this example, you can pass an integer array as well -- just store the total memory size in len and replace every instance of plaintext with your integer array:
int myarr[4] = { 1, 3, 3, 7 };
size_t len = sizeof(myarr);
Be careful though: an int element generally has a size greater than a byte, so storing integer values in an int array won't have the same internal representation as a char array -- you may end up with null bytes as padding in between values.
Furthermore, if your intention is to hash the ASCII representation of your integers, you will have to first convert the values in your array to a string character sequence (perhaps using sprintf).
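A quick sketch of that conversion (the buffer size here is arbitrary and assumed large enough for all the values):

char buf[64];
int n = 0;
size_t i;

for (i = 0; i < ARRAY_SIZE(myarr); i++)
    n += snprintf(buf + n, sizeof(buf) - n, "%d ", myarr[i]);
/* then hash buf with len = n instead of the raw array */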
Step 2: Initialization
Initialize sg and desc:
sg_init_one(&sg, plaintext, len);
desc.tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
Notice that "sha1" is passed to crypto_alloc_hash; this can be set to "md5" for MD5 hashing, or any other supported string in order to use the respective hashing method.
Step 3: Hashing
Now perform the hashing with three function calls:
crypto_hash_init(&desc);
crypto_hash_update(&desc, &sg, len);
crypto_hash_final(&desc, hashval);
crypto_hash_init configures the hashing engine according to the supplied struct hash_desc.
crypto_hash_update performs the actual hashing method on the plaintext.
Finally, crypto_hash_final copies the hash to a character array.
Step 4: Cleanup
Free allocated memory held by desc.tfm:
crypto_free_hash(desc.tfm);
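Putting it all together for hashing an arbitrary buffer (such as an integer array), a minimal sketch using the same legacy crypto_hash_* API as above, with error checking and desc.flags added, might look like this (sha1_of_buffer is just a name I picked):

#include <linux/crypto.h>
#include <linux/scatterlist.h>
#include <linux/err.h>

static int sha1_of_buffer(const void *data, size_t len, u8 hashval[20])
{
    struct scatterlist sg;
    struct hash_desc desc;

    sg_init_one(&sg, data, len);           /* len = total size in bytes */

    desc.tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
    if (IS_ERR(desc.tfm))
        return PTR_ERR(desc.tfm);
    desc.flags = 0;

    crypto_hash_init(&desc);
    crypto_hash_update(&desc, &sg, len);
    crypto_hash_final(&desc, hashval);

    crypto_free_hash(desc.tfm);
    return 0;
}

For the integer array above you would call it as sha1_of_buffer(myarr, sizeof(myarr), hashval).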
See also
how to use CryptoAPI in the linux kernel 2.6

Drive letter for USB device and HDD devices in Linux kernel

How is the drive letter assigned to USB/HDD drives? I mean at the code level. I looked at the code and noticed that the gendisk struct has a disk_name field, which gives sda/sdb/sdc, etc. But if the disk is detected as sda1, sdc1, etc., where can these names be obtained from in the structures/code?
sda, sdb, etc. are the block devices representing the entire drive. When the drive is partitioned, you will see sda1, sdc1, etc.; these block devices are used to access just that partition.
I'm not familiar with the code specifically, but hopefully this will help point you in the right direction.
One useful investigation starting point code-wise would be function disk_name(), defined in block/partition-generic.c:
/*
 * disk_name() is used by partition check code and the genhd driver.
 * It formats the devicename of the indicated disk into
 * the supplied buffer (of size at least 32), and returns
 * a pointer to that same buffer (for convenience).
 */
char *disk_name(struct gendisk *hd, int partno, char *buf)
{
    if (!partno)
        snprintf(buf, BDEVNAME_SIZE, "%s", hd->disk_name);
    else if (isdigit(hd->disk_name[strlen(hd->disk_name)-1]))
        snprintf(buf, BDEVNAME_SIZE, "%sp%d", hd->disk_name, partno);
    else
        snprintf(buf, BDEVNAME_SIZE, "%s%d", hd->disk_name, partno);

    return buf;
}
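So the "sda1"-style name is generated on the fly from disk_name plus the partition number rather than stored in a field of its own. If you hold a struct block_device, one common way to get at the full name is bdevname(), a thin wrapper around disk_name() (a sketch; header locations vary a little between kernel versions):

#include <linux/kernel.h>
#include <linux/fs.h>        /* BDEVNAME_SIZE, bdevname() */
#include <linux/genhd.h>     /* struct gendisk, disk_name() */

static void print_bdev_name(struct block_device *bdev)
{
    char name[BDEVNAME_SIZE];

    bdevname(bdev, name);                        /* e.g. "sda1" */
    printk(KERN_INFO "block device: %s\n", name);
}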

How do you measure the time a function takes to execute?

How can you measure the amount of time a function will take to execute?
This is a relatively short function and the execution time would probably be in the millisecond range.
This particular question relates to an embedded system, programmed in C or C++.
The best way to do that on an embedded system is to set an external hardware pin when you enter the function and clear it when you leave the function. This is done preferably with a little assembly instruction so you don't skew your results too much.
Edit: One of the benefits is that you can do it in your actual application and you don't need any special test code. External debug pins like that are (should be!) standard practice for every embedded system.
There are three potential solutions:
Hardware Solution:
Use a free output pin on the processor and hook an oscilloscope or logic analyzer to the pin. Initialize the pin to a low state; just before calling the function you want to measure, assert the pin to a high state, and just after returning from the function, deassert the pin.
*io_pin = 1;
myfunc();
*io_pin = 0;
Bookworm solution:
If the function is fairly small, and you can manage the disassembled code, you can crack open the processor architecture databook and count the cycles it will take the processor to execute every instruction. This will give you the number of cycles required.
Time = # cycles / processor clock rate (i.e. # cycles * clock period)
This is easier to do for smaller functions, or code written in assembler (for a PIC microcontroller for example)
Timestamp counter solution:
Some processors have a timestamp counter which increments at a rapid rate (every few processor clock ticks). Simply read the timestamp before and after the function.
This will give you the elapsed time, but beware that you might have to deal with the counter rollover.
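For example, on x86 the timestamp counter can be read with the rdtsc instruction through a compiler intrinsic (a sketch for GCC/Clang; other architectures have their own cycle counters and intrinsics):

#include <stdint.h>
#include <x86intrin.h>                  /* __rdtsc() on GCC/Clang */

extern void myfunc(void);               /* the function under test */

uint64_t cycles_for_myfunc(void)
{
    uint64_t t0 = __rdtsc();
    myfunc();
    uint64_t t1 = __rdtsc();
    /* elapsed cycles; divide by the TSC frequency to get seconds */
    return t1 - t0;
}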
Invoke it in a loop with a ton of invocations, then divide by the number of invocations to get the average time.
so:
// begin timing
for (int i = 0; i < 10000; i++) {
    invokeFunction();
}
// end time
// divide by 10000 to get actual time.
If you're using Linux, you can time a program's runtime by typing at the command line:
time [function_name]
if you run only the function in main() (assuming C++), the rest of the app's time should be negligible.
I repeat the function call a lot of times (millions) but also employ the following method to discount the loop overhead:
start = getTicks();
for (i = 0; i < n; i++) {
    myFunction();
    myFunction();
}
lap = getTicks();
for (i = 0; i < n; i++) {
    myFunction();
}
finish = getTicks();

// overhead + function + function
elapsed1 = lap - start;
// overhead + function
elapsed2 = finish - lap;
// (overhead + function + function) - (overhead + function) = function
ntimes = elapsed1 - elapsed2;
once = ntimes / n;   // Average time it took for one function call, sans loop overhead
Instead of calling function() twice in the first loop and once in the second loop, you could call it just once in the first loop and not at all (i.e. an empty loop) in the second; however, the empty loop could be optimized out by the compiler, giving you negative timing results :)
start_time = timer
function()
exec_time = timer - start_time
Windows XP/NT Embedded or Windows CE/Mobile
You can use QueryPerformanceCounter() to get the value of a VERY FAST counter before and after your function. Then you subtract those 64-bit values to get a delta in "ticks". Using QueryPerformanceFrequency() you can convert the "delta ticks" to an actual time unit. You can refer to the MSDN documentation about those WIN32 calls.
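A quick sketch of that sequence in C:

#include <windows.h>

extern void myfunc(void);               /* the function under test */

double seconds_for_myfunc(void)
{
    LARGE_INTEGER freq, t0, t1;

    QueryPerformanceFrequency(&freq);   /* counter ticks per second */
    QueryPerformanceCounter(&t0);
    myfunc();
    QueryPerformanceCounter(&t1);

    return (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
}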
Other embedded systems
Without an operating system, or with only a basic OS, you will have to do the following (a rough sketch follows after the notes below):
program one of the internal CPU timers to run and count freely.
configure it to generate an interrupt when the timer overflows, and in this interrupt routine increment a "carry" variable (this is so you can actually measure time longer than the resolution of the timer chosen).
before your function you save BOTH the "carry" value and the value of the CPU register holding the running ticks for the counting timer you configured.
do the same just after your function.
subtract the two to get a delta in counter ticks.
From there it is just a matter of knowing how long a tick is on your CPU/hardware, given the external clock and the prescaling (clock division) you configured while setting up your timer. You multiply that "tick length" by the "delta ticks" you just got.
VERY IMPORTANT: Do not forget to disable interrupts before, and restore them after, reading those timer values (both the carry and the register value), otherwise you risk saving inconsistent values.
NOTES
This is very fast because it is only a few assembly instructions to disable interrupts, save two integer values and re-enable interrupts. The actual subtraction and conversion to real time units occurs OUTSIDE the zone of time measurement, that is, AFTER your function.
You may wish to put that code into a function so you can reuse it all around, but it may slow things down a bit because of the function call and the pushing of all the registers to the stack, plus the parameters, then popping them again. In an embedded system this may be significant. It may then be better, in C, to use macros instead, or to write your own assembly routine that saves/restores only the relevant registers.
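Here is a rough sketch of those steps; TIMER_COUNT, the interrupt hookup and the interrupt enable/disable calls are hypothetical and entirely hardware/compiler specific:

/* Free-running 16-bit hardware timer with its overflow interrupt enabled. */
extern volatile unsigned short TIMER_COUNT;    /* hypothetical free-running count register */

static volatile unsigned long carry;           /* incremented on each timer overflow */

void timer_overflow_isr(void)                  /* hook this to the overflow interrupt */
{
    carry++;
}

unsigned long long read_timestamp(void)
{
    unsigned long c;
    unsigned short t;

    disable_interrupts();                      /* placeholder for your platform's intrinsic */
    c = carry;
    t = TIMER_COUNT;
    enable_interrupts();

    /* assuming a 16-bit timer: timestamp in timer ticks since start-up */
    return ((unsigned long long)c << 16) | t;
}

Subtract two such timestamps taken around your function and multiply by the tick length to get the elapsed time.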
It depends on your embedded platform and what type of timing you are looking for. For embedded Linux, there are several ways you can accomplish this. If you wish to measure the amount of CPU time used by your function, you can do the following:
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#define SEC_TO_NSEC(s) ((s) * 1000 * 1000 * 1000)
int work_function(int c) {
    // do some work here
    int i, j;
    int foo = 0;
    for (i = 0; i < 1000; i++) {
        for (j = 0; j < 1000; j++) {
            foo ^= i + j;
        }
    }
    return foo;
}

int main(int argc, char *argv[]) {
    struct timespec pre;
    struct timespec post;

    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &pre);
    work_function(0);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &post);

    printf("time %ld\n",
           (SEC_TO_NSEC(post.tv_sec) + post.tv_nsec) -
           (SEC_TO_NSEC(pre.tv_sec) + pre.tv_nsec));
    return 0;
}
You will need to link this with the realtime library, just use the following to compile your code:
gcc -o test test.c -lrt
You may also want to read the man page on clock_gettime: there are some issues with running this code on SMP-based systems that could invalidate your testing. You could use something like sched_setaffinity() or the command-line cpuset to force the code onto only one core.
If you are looking to measure user and system time, then you could use times(NULL), which returns something like jiffies. Or you can change the parameter for clock_gettime() from CLOCK_THREAD_CPUTIME_ID to CLOCK_MONOTONIC... but be careful of wrap-around with CLOCK_MONOTONIC.
For other platforms, you are on your own.
Drew
I always implement an interrupt driven ticker routine. This then updates a counter that counts the number of milliseconds since start up. This counter is then accessed with a GetTickCount() function.
Example:
#define TICK_INTERVAL 1 // milliseconds between ticker interrupts
static unsigned long tickCounter;
interrupt ticker (void)
{
    tickCounter += TICK_INTERVAL;
    ...
}

unsigned long GetTickCount(void)
{
    return tickCounter;
}
In your code you would time the code as follows:
int function(void)
{
    unsigned long start = GetTickCount();

    // do something ...

    printf("Time is %lu", GetTickCount() - start);
}
In OS X terminal (and probably Unix, too), use "time":
time python function.py
If the code is .NET, use the Stopwatch class (.NET 2.0+), NOT DateTime.Now. DateTime.Now isn't updated accurately enough and will give you crazy results.
If you're looking for sub-millisecond resolution, try one of these timing methods. They'll all get you resolution in at least the tens or hundreds of microseconds:
If it's embedded Linux, look at Linux timers:
http://linux.die.net/man/3/clock_gettime
For embedded Java, look at nanoTime(), though I'm not sure this is in the embedded edition:
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#nanoTime()
If you want to get at the hardware counters, try PAPI:
http://icl.cs.utk.edu/papi/
Otherwise you can always go to assembler. You could look at the PAPI source for your architecture if you need some help with this.
