OMNeT++ RNG not converging to mean - omnet++

I am using the OMNeT++ 5.1.1 simulator. I've been seeing some strange behavior from the bernoulli() function, so I built a MWE to see what's happening.
The MWE works by creating a network with a single node, and setting one timer (self message) at t=0. When the timer goes off, the simulation runs some number n Bernoulli trials with success probability p. The values n and p are specified via per-run configuration options that I defined using the Register_PerRunConfigOption() macro.
Here is my code:
#include <math.h>
#include <omnetpp.h>
using namespace omnetpp;
Register_PerRunConfigOption(CFGID_NUM_TRIALS, "num-trials", CFG_INT,
"0", "The number of Bernoulli trials to run");
Register_PerRunConfigOption(CFGID_BERNOULLI_MEAN, "bernoulli-mean",
CFG_DOUBLE, "0.0", "The mean of the Bernoulli experiments");
class Test : public cSimpleModule {
private:
int nTrials, nSuccess;
double p;
cMessage *timer;
protected:
virtual void initialize() override;
virtual void handleMessage(cMessage *msg) override;
};
Define_Module(Test);
void Test::initialize()
{
nTrials = getEnvir()->getConfig()->getAsInt(CFGID_NUM_TRIALS);
p = getEnvir()->getConfig()->getAsDouble(CFGID_BERNOULLI_MEAN);
timer = new cMessage("timer");
scheduleAt(0.0, timer);
}
void Test::handleMessage(cMessage *msg)
{
int trial;
printf("\n\n");
for (int n = 0; n < nTrials; n++) {
trial = bernoulli(p);
if (trial)
nSuccess++;
}
double mean = nTrials * p;
double variance = mean * (1.0 - p);
double stddev = std::sqrt(variance);
printf("nTrials: %12d(%.3e)\n", nTrials, (double) nTrials);
printf("nSuccess: %12d(%.3e)\n", nSuccess, (double) nSuccess);
printf("Pct.: %12.5f\n", 100.0 * (double) nSuccess / nTrials);
printf("nStdDevs: %12.2f\n", (nSuccess - mean) / stddev);
printf("\n\n");
delete msg;
}
This code is as simple as I could think of (I'm new to OMNeT++). Here is the .ned file:
simple Test
{
gates:
}
network Bernoulli
{
submodules:
node: Test;
}
Here is the omnetpp.ini file:
[General]
network = Bernoulli
bernoulli-mean = 0.05
num-trials = 10000000
rng-class = "cMersenneTwister"
seed-0-mt = ${seed=0,1,2,3,4,5,6,7,8,9}
I am running the code with the command: ./exe_file -u Cmdenv -r 3 (I am intentionally picking out the third run). When I do this with the omnetpp.ini file above, I get about 532,006 successes (though THIS NUMBER CHANGES SLIGHTLY ON EACH RUN??). For 10^7 runs, this is about 46 standard deviations from the mean (computed using mean and variance of the binomial distribution).
Furthermore, if I comment out the line rng-class="cMersenneTwister", the number jumps to about 531,793 successes, again changing slightly (but not radically) each time.
Moreover, if I comment out the seed-0-mt=... line, then suddenly the simulation starts producing values within 0.06 std. dev. of the mean! This despite the fact that the OMNeT++ manual assures that using the cMersenneTwister algorithm means you can choose seeds at random, since the period is so large.
Why is this happening?? I would expect that (1) since cMersenneTwister is the default, including it in the omnetpp.ini file shouldn't change anything, and (2) that since I'm choosing the same seed each time (i.e. the seed 3), that I should be getting the same results. But I'm not! This confuses me, because the OMNeT++ manual states:
For the cMersenneTwister random number generator, selecting seeds so that the generated sequences don't overlap is easy, due to the extremely long sequence of the RNG. The RNG is initialized from the 32-bit seed value seed = runNumber*numRngs + rngNumber.
Thanks!

You should initialize nSuccess to zero before using it because in C++ the member of class which is fundamental type (int, float, etc.) is not initialized by default.
Moreover, I strongly encourage you to use the parameter mechanism in OMNeT++ - it is a standard way to control simulation. To use it you should:
Add the definition of parameters in NED file of simple module, for example:
simple Test
{
parameters:
double bernoulli_mean;
int num_trials;
gates:
}
Set values in omnetpp.ini:
**.bernoulli_mean = 0.05
**.num_trials = 10000000
Read the parameters in your class:
void Test::initialize()
{
nTrials = par("num_trials");
p = par("bernoulli_mean").doubleValue();
// ...
Notes:
Using "-" in the name of a parameter is forbidden.
In omnetpp.ini every instance of a simple module has own values of its parameters. However, to assign the same values to all module, one can use wildcard patterns, for example **

Related

Why does clock() returns -1 in C

I'm trying to implement an error handler using the clock() function from the "time.h" library. The code runs inside an embeeded system (Colibri IMX7 - M4 Processor). The function is used to monitor a current value within a specific range, if the value of the current isn't correct the function should return an error message.
The function will see if the error is ocurring and in the first run it will save the first appearance of the error in a clock_t as reference, and then in the next runs if the error is still there, it will compare the current time using clock() with the previous reference and see if it will be longer than a specific time.
The problem is that the function clock() is always returning -1. What should I do to avoid that? Also, why can't I declare a clock_t variable as static (e.g. static clock_t start_t = clock()?
Please see below the function:
bool CrossLink_check_error_LED_UV_current_clock(int current_state, int current_at_LED_UV)
{
bool has_LED_UV_current_deviated = false;
static int current_number_of_errors_Current_LED_CANNON = 0;
clock_t startTimeError = clock();
const int maximum_operational_current_when_on = 2000;
const int minimum_turned_on_LED_UV_current = 45;
if( (current_at_LED_UV > maximum_operational_current_when_on)
||(current_state!=STATE_EMITTING && (current_at_LED_UV > minimum_turned_on_LED_UV_current))
||(current_state==STATE_EMITTING && (current_at_LED_UV < minimum_turned_on_LED_UV_current)) ){
current_number_of_errors_Current_LED_CANNON++;
if(current_number_of_errors_Current_LED_CANNON > 1) {
if (clock() - startTimeError > 50000){ // 50ms
has_LED_UV_current_deviated = true;
PRINTF("current_at_LED_UV: %d", current_at_LED_UV);
if(current_state==STATE_EMITTING){
PRINTF(" at state emitting");
}
PRINTF("\n\r");
}
}else{
if(startTimeError == -1){
startTimeError = clock();
}
}
}else{
startTimeError = 0;
current_number_of_errors_Current_LED_CANNON = 0;
}
return has_LED_UV_current_deviated;
}
Edit: I forgot to mention before, but we are using GCC 9.3.1 arm-none-eabi compiler with CMake to build the executable file. We have an embedeed system (Colibri IMX7 made by Toradex) that consists in 2 A7 Processors that runs our Linux (more visual interface) and the program that is used to control our device runs in a M4 Processor without an OS, just pure bare-metal.
For a lot of provided functions in the c standard library, if you have the documentation installed (usually it gets installed with the compiler), you can view documentation using the man command in the shell. With man clock, it tells me that:
NAME
clock - determine processor time
SYNOPSIS
#include <time.h>
clock_t clock(void);
DESCRIPTION
The clock() function returns an approximation of processor time used by the program.
RETURN VALUE
The value returned is the CPU time used so far as a clock_t; to get the number of seconds used, divide by
CLOCKS_PER_SEC. If the processor time used is not available or its value cannot be represented, the function
returns the value (clock_t) -1.
etc.
This tells us that -1 means that the processor time (CLOCK_PROCESS_CPUTIME_ID) is unavailable. The solution is to use CLOCK_MONOTONIC instead. We can select the clock we want to use with clock_gettime.
timespec clock_time;
if (clock_gettime(CLOCK_MONOTONIC, &clock_time)) {
printf("CLOCK_MONOTONIC is unavailable!\n");
exit(1);
}
printf("Seconds: %d Nanoseconds: %ld\n", clock_time.tv_sec, clock_time.tv_nsec);
To answer the second part of your question:
static clock_t start_time = clock();
is not allowed because the return value of the function clock() is not known until runtime, but in C the initializer of a static variable must be a compile-time constant.
You can write:
static clock_t start_time = 0;
if (start_time == 0)
{
start_time = clock();
}
But this may or may not be suitable to use in this case, depending on whether zero is a legitimate return value of the function. If it could be, you would need something like:
static bool start_time_initialized = false;
static clock_t start_time;
if (!start_time_initialized)
{
start_time_initialized = true;
start_time = clock();
}
The above is reliable only if you cannot have two copies of this function running at once (it is not re-entrant).
If you have a POSIX library available you could use a pthread_once_t to do the same as the above bool but in a re-entrant way. See man pthread_once for details.
Note that C++ allows more complicated options in this area, but you have asked about C.
Note also that abbreviating "start time" as start_t is a very bad idea, because the suffix _t means "type" and should only be used for type names.
in the end the problem was that since we are running our code on bare metal, the clock() function wasn't working. We ended up using an internal timer on the M4 Processor that we found, so now everything is fine. Thanks for the answers.

How do I create monotonic clock on Windows which doesn't tick during suspend?

I'm looking for a way to obtain a guaranteed-monotonic clock which excludes time spent during suspend, just like POSIX CLOCK_MONOTONIC.
Solutions requiring Windows 7 (or later) are acceptable.
Here's an example of something that doesn't work:
LONGLONG suspendTime, uiTime1, uiTime2;
do {
QueryUnbiasedInterruptTime((ULONGLONG*)&uiTime1);
suspendTime = GetTickCount64()*10000 - uiTime1;
QueryUnbiasedInterruptTime((ULONGLONG*)&uiTime2);
} while (uiTime1 != uiTime2);
static LARGE_INTEGER firstSuspend = suspendTime;
static LARGE_INTERER lastSuspend = suspendTime;
assert(suspendTime > lastSuspend);
lastSuspend = suspendTime;
LARGE_INTEGER now;
QueryPerformanceCounter(&now);
static LONGLONG firstQpc = now.QuadPart;
return (now.QuadPart - firstQpc)*qpcFreqNumer/qpcFreqDenom -
(suspendTime - firstSuspend);
The problem with this (my first attempt) is that GetTickCount only ticks every 15ms, wheras QueryUnbiasedInterruptTime seems to tick a little more often, so every now and then my method observes the suspend time go back by a little.
I've also tried using CallNtPowerInformation, but it's not clear how to use those values either to get a nice, race-free measure of suspend time.
The suspend bias time is available in kernel mode (_KUSER_SHARED_DATA.QpcBias in ntddk.h). A read-only copy is available in user mode.
#include <nt.h>
#include <ntrtl.h>
#include <nturtl.h>
LONGLONG suspendTime, uiTime1, uiTime2;
QueryUnbiasedInterruptTime((ULONGLONG*)&uiTime1);
uiTime1 -= USER_SHARED_DATA->QpcBias; // subtract off the suspend bias
The full procedure for calculating monotonic time, which does not tick during suspend, is as follows:
typedef struct _KSYSTEM_TIME {
ULONG LowPart;
LONG High1Time;
LONG High2Time;
} KSYSTEM_TIME;
#define KUSER_SHARED_DATA 0x7ffe0000
#define InterruptTime ((KSYSTEM_TIME volatile*)(KUSER_SHARED_DATA + 0x08))
#define InterruptTimeBias ((ULONGLONG volatile*)(KUSER_SHARED_DATA + 0x3b0))
static LONGLONG readInterruptTime() {
// Reading the InterruptTime from KUSER_SHARED_DATA is much better than
// using GetTickCount() because it doesn't wrap, and is even a little quicker.
// This works on all Windows NT versions (NT4 and up).
LONG timeHigh;
ULONG timeLow;
do {
timeHigh = InterruptTime->High1Time;
timeLow = InterruptTime->LowPart;
} while (timeHigh != InterruptTime->High2Time);
LONGLONG now = ((LONGLONG)timeHigh << 32) + timeLow;
static LONGLONG d = now;
return now - d;
}
static LONGLONG scaleQpc(LONGLONG qpc) {
// We do the actual scaling in fixed-point rather than floating, to make sure
// that we don't violate monotonicity due to rounding errors. There's no
// need to cache QueryPerformanceFrequency().
LARGE_INTEGER frequency;
QueryPerformanceFrequency(&frequency);
double fraction = 10000000/double(frequency.QuadPart);
LONGLONG denom = 1024;
LONGLONG numer = std::max(1LL, (LONGLONG)(fraction*denom + 0.5));
return qpc * numer / denom;
}
static ULONGLONG readUnbiasedQpc() {
// We remove the suspend bias added to QueryPerformanceCounter results by
// subtracting the interrupt time bias, which is not strictly speaking legal,
// but the units are correct and I think it's impossible for the resulting
// "unbiased QPC" value to go backwards.
LONGLONG interruptTimeBias, qpc;
do {
interruptTimeBias = *InterruptTimeBias;
LARGE_INTEGER counter;
QueryPerformanceCounter(&counter);
qpc = counter.QuadPart;
} while (interruptTimeBias != *InterruptTimeBias);
static std::pair<LONGLONG,LONGLONG> d(qpc, interruptTimeBias);
return scaleQpc(qpc - d.first) - (interruptTimeBias - d.second);
}
/// getMonotonicTime() returns the time elapsed since the application's first
/// call to getMonotonicTime(), in 100ns units. The values returned are
/// guaranteed to be monotonic. The time ticks in 15ms resolution and advances
/// during suspend on XP and Vista, but we manage to avoid this on Windows 7
/// and 8, which also use a high-precision timer. The time does not wrap after
/// 49 days.
uint64_t getMonotonicTime()
{
OSVERSIONINFOEX ver = { sizeof(OSVERSIONINFOEX), };
GetVersionEx(&ver);
bool win7OrLater = (ver.dwMajorVersion > 6 ||
(ver.dwMajorVersion == 6 && ver.dwMinorVersion >= 1));
// On Windows XP and earlier, QueryPerformanceCounter is not monotonic so we
// steer well clear of it; on Vista, it's just a bit slow.
return win7OrLater ? readUnbiasedQpc() : readInterruptTime();
}

Synchronized Block takes more time after instrumenting with ASM

I am trying to instrument java synchronized block using ASM. The problem is that after instrumenting, the execution time of the synchronized block takes more time. Here it increases from 2 msecs to 200 msecs on Linux box.
I am implementing this by identifying the MonitorEnter and MonitorExit opcode.
I try to instrument at three level 1. just before the MonitorEnter 2. after MonitorEnter 3. Before MonitorExit.
1 and 3 together works fine, but when i do 2, the execution time increase dramatically.
Even if we instrument another single SOP statement, which is intended to be executed just once, it give higher values.
Here the sample code (prime number, 10 loops):
for(int w=0;w<10;w++){
synchronized(s){
long t1 = System.currentTimeMillis();
long num = 2000;
for (long i = 1; i < num; i++) {
long p = i;
int j;
for (j = 2; j < p; j++) {
long n = p % i;
}
}
long t2 = System.currentTimeMillis();
System.out.println("Time>>>>>>>>>>>> " + (t2-t1) );
}
Here the code for instrumention (here System.currentMilliSeconds() gives the time at which instrumention happened, its no the measure of execution time, the excecution time is from obove SOP statement):
public void visitInsn(int opcode)
{
switch(opcode)
{
// Scenario 1
case 194:
visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", "out", "Ljava/io /PrintStream;");
visitLdcInsn("TIME Arrive: "+System.currentTimeMillis());
visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V");
break;
// scenario 3
case 195:
visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
visitLdcInsn("TIME exit : "+System.currentTimeMillis());
visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V");
break;
}
super.visitInsn(opcode);
// scenario 2
if(opcode==194)
{
visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
visitLdcInsn("TIME enter: "+System.currentTimeMillis());
visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V");
}
}
I am not able to find the reason why it is happening and how t correct it.
Thanks in advance.
The reason lies in the internals of the JVM that you were using for running the code. I assume that this was a HotSpot JVM but the answers below are equally right for most other implementations.
If you trigger the following code:
int result = 0;
for(int i = 0; i < 1000; i++) {
result += i;
}
This will be translated directly into Java byte code by the Java compiler but at run time the JVM will easily see that this code is not doing anything. Executing this code will have no effect on the outside (application) world, so why should the JVM execute it? This consideration is exactly what compiler optimization does for you.
If you however trigger the following code:
int result = 0;
for(int i = 0; i < 1000; i++) {
System.out.println(result);
}
the Java runtime cannot optimize away your code anymore. The whole loop must always run since the System.out.println(int) method is always doing something real such that your code will run slower.
Now let's look at your example. In your first example, you basically write this code:
synchronized(s) {
// do nothing useful
}
This entire code block can easily be removed by the Java run time. This means: There will be no synchronization! In the second example, you are writing this instead:
synchronized(s) {
long t1 = System.currentTimeMillis();
// do nothing useful
long t2 = System.currentTimeMillis();
System.out.println("Time>>>>>>>>>>>> " + (t2-t1));
}
This means that the effective code might be look like this:
synchronized(s) {
long t1 = System.currentTimeMillis();
long t2 = System.currentTimeMillis();
System.out.println("Time>>>>>>>>>>>> " + (t2-t1));
}
What is important here is that this optimized code will be effectively synchronized what is an important difference with respect to execution time. Basically, you are measuring the time it costs to synchronize something (and even that might be optimized away after a couple of runs if the JVM realized that the s is not locked elsewhere in your code (buzzword: temporary optimization with the possibility of deoptimization if loaded code in the future will also synchronize on s).
You should really read this:
http://www.ibm.com/developerworks/java/library/j-jtp02225/
http://www.ibm.com/developerworks/library/j-jtp12214/
Your test for example misses a warm-up, such that you are also measuring how much time the JVM will use for byte code to machine code optimization.
On a side note: Synchronizing on a String is almost always a bad idea. Your strings might be or might not be interned what means that you cannot be absolutely sure about their identity. This means, that synchronization might or might not work and you might even inflict synchronization of other parts of your code.

How to put my structure variable into CPU caches to eliminate main memory page access time? Options

It's clear that there is no explicit way or certain system calls that
help programmers to put a variable into the CPU cache.
But I think that a certain programming style or well designed
algorithm can make it possible to increase the possibilities that the
variable can be cached into the CPU caches.
Here is my example:
I want to append an 8 byte structure at the end of an array consisting
of the same type of structures, declared in the global main memory
region.
This process is continuously repeated for 4 million operations. This process takes 6 seconds, 1.5 us for each operation. I think this result tells that the two memory areas have not been cached.
I got some clues from a cache-oblivious algorithm, so I tried several
ways to enhance this. Until now, no enhancement.
I think some clever codes can reduce the elapsed time, up to 10 to 100
times. Please show me the way.
-------------------------------------------------------------------------
Appended (2011-04-01)
Damon~ thank you for your comment!
After reading your comment, I analyzed my code again, and found several things
that I missed. The following code that I attached is the abbreviated version of my original code.
To accurately measure each operation's execution time (in the original code, there are several different types of operations), I inserted the time measuring code using clock_gettime() function. I thought if I measure each operation's execution time and accumulate them, the additional cost by the main loop can be avoided.
In the original code, the time measuring code was hidden by a macro function, so I totally forgot about it.
The running time of this code is almost 6 seconds. But if I get rid of the time measuring function in the main loop, it becomes 0.1 seconds.
Since the clock_gettime() function supports very high precision (upto 1 nano second), executed on the basis of an independent thread, and also it requires very big structure,
I think the function caused the cache-out of the main memory area where the consecutive insertions are performed.
Thank you again for your comment. For further enhancement, any suggestion will be very helpful for me to optimize my code.
I think the hierachically defined structure variable might cause unnecessary time cost,
but first I want to know how much it would be, before I change it to the more C-style code.
typedef struct t_ptr {
uint32 isleaf :1, isNextLeaf :1, ptr :30;
t_ptr(void) {
isleaf = false;
isNextLeaf = false;
ptr = NIL;
}
} PTR;
typedef struct t_key {
uint32 op :1, key :31;
t_key(void) {
op = OP_INS;
key = 0;
}
} KEY;
typedef struct t_key_pair {
KEY key;
PTR ptr;
t_key_pair() {
}
t_key_pair(KEY k, PTR p) {
key = k;
ptr = p;
}
} KeyPair;
typedef struct t_op {
KeyPair keyPair;
uint seq;
t_op() {
seq = 0;
}
} OP;
#define MAX_OP_LEN 4000000
typedef struct t_opq {
OP ops[MAX_OP_LEN];
int freeOffset;
int globalSeq;
bool queueOp(register KeyPair keyPair);
} OpQueue;
bool OpQueue::queueOp(register KeyPair keyPair) {
bool isFull = false;
if (freeOffset == (int) (MAX_OP_LEN - 1)) {
isFull = true;
}
ops[freeOffset].keyPair = keyPair;
ops[freeOffset].seq = globalSeq++;
freeOffset++;
}
OpQueue opQueue;
#include <sys/time.h>
int main() {
struct timespec startTime, endTime, totalTime;
for(int i = 0; i < 4000000; i++) {
clock_gettime(CLOCK_REALTIME, &startTime);
opQueue.queueOp(KeyPair());
clock_gettime(CLOCK_REALTIME, &endTime);
totalTime.tv_sec += (endTime.tv_sec - startTime.tv_sec);
totalTime.tv_nsec += (endTime.tv_nsec - startTime.tv_nsec);
}
printf("\n elapsed time: %ld", totalTime.tv_sec * 1000000LL + totalTime.tv_nsec / 1000L);
}
YOU don't put the structure into any cache. The CPU does that automatically for you. The CPU is even more clever than that; if you access sequential memory, it will start putting things from memory into the cache before you read them.
And really, it should be common sense that for a simple bit of code like this, the time you spend on measuring is ten times more than the time to perform the code (apparently 60 times in your case).
Since you put so much confidence in clock_gettime (): I suggest you call it five times in a row and store the results, then print the differences. There's resolution, there's precision, and there's how long it takes to return the current time, which is pretty damned long.
I have been unable to force caching, but you can force memory to be uncache-able. If you have large other datastructures you might exclude these so that they will not pollute your caches. This can be done by specifying PAGE_NOCACHE for the Windows VirutalAllocXXX functions.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366786(v=vs.85).aspx

Lua - Current time in milliseconds

Is there a common way to get the current time in or with milliseconds?
There is os.time(), but it only provides full seconds.
I use LuaSocket to get more precision.
require "socket"
print("Milliseconds: " .. socket.gettime()*1000)
This adds a dependency of course, but works fine for personal use (in benchmarking scripts for example).
If you want to benchmark, you can use os.clock as shown by the doc:
local x = os.clock()
local s = 0
for i=1,100000 do s = s + i end
print(string.format("elapsed time: %.2f\n", os.clock() - x))
In standard C lua, no. You will have to settle for seconds, unless you are willing to modify the lua interpreter yourself to have os.time use the resolution you want. That may be unacceptable, however, if you are writing code for other people to run on their own and not something like a web application where you have full control of the environment.
Edit: another option is to write your own small DLL in C that extends lua with a new function that would give you the values you want, and require that dll be distributed with your code to whomever is going to be using it.
Get current time in milliseconds.
os.time()
os.time()
return sec // only
posix.clock_gettime(clk)
https://luaposix.github.io/luaposix/modules/posix.time.html#clock_gettime
require'posix'.clock_gettime(0)
return sec, nsec
linux/time.h // man clock_gettime
/*
* The IDs of the various system clocks (for POSIX.1b interval timers):
*/
#define CLOCK_REALTIME 0
#define CLOCK_MONOTONIC 1
#define CLOCK_PROCESS_CPUTIME_ID 2
#define CLOCK_THREAD_CPUTIME_ID 3
#define CLOCK_MONOTONIC_RAW 4
#define CLOCK_REALTIME_COARSE 5
#define CLOCK_MONOTONIC_COARSE 6
socket.gettime()
http://w3.impa.br/~diego/software/luasocket/socket.html#gettime
require'socket'.gettime()
return sec.xxx
as waqas says
compare & test
get_millisecond.lua
local posix=require'posix'
local socket=require'socket'
for i=1,3 do
print( os.time() )
print( posix.clock_gettime(0) )
print( socket.gettime() )
print''
posix.nanosleep(0, 1) -- sec, nsec
end
output
lua get_millisecond.lua
1490186718
1490186718 268570540
1490186718.2686
1490186718
1490186718 268662191
1490186718.2687
1490186718
1490186718 268782765
1490186718.2688
I made a suitable solution for lua on Windows. I basically did what Kevlar suggested, but with a shared library rather than a DLL. This has been tested using cygwin.
I wrote some lua compatible C code, compiled it to a shared library (.so file via gcc in cygwin), and then loaded it up in lua using package.cpath and require" ". Wrote an adapter script for convenience. Here is all of the source:
first the C code, HighResTimer.c
////////////////////////////////////////////////////////////////
//HighResTimer.c by Cody Duncan
//
//compile with: gcc -o Timer.so -shared HighResTimer.c -llua5.1
//compiled in cygwin after installing lua (cant remember if I
// installed via setup or if I downloaded and compiled lua,
// probably the former)
////////////////////////////////////////////////////////////////
#include <windows.h>
typedef unsigned __int64 u64;
double mNanoSecondsPerCount;
#include "lua.h"
#include "lualib.h"
#include "lauxlib.h"
int prevInit = 0;
int currInit = 0;
u64 prevTime = 0;
u64 currTime = 0;
u64 FrequencyCountPerSec;
LARGE_INTEGER frequencyTemp;
static int readHiResTimerFrequency(lua_State *L)
{
QueryPerformanceFrequency(&frequencyTemp);
FrequencyCountPerSec = frequencyTemp.QuadPart;
lua_pushnumber(L, frequencyTemp.QuadPart);
return 1;
}
LARGE_INTEGER timerTemp;
static int storeTime(lua_State *L)
{
QueryPerformanceCounter(&timerTemp);
if(!prevInit)
{
prevInit = 1;
prevTime = timerTemp.QuadPart;
}
else if (!currInit)
{
currInit = 1;
currTime = timerTemp.QuadPart;
}
else
{
prevTime = currTime;
currTime = timerTemp.QuadPart;
}
lua_pushnumber(L, timerTemp.QuadPart);
return 1;
}
static int getNanoElapsed(lua_State *L)
{
double mNanoSecondsPerCount = 1000000000/(double)FrequencyCountPerSec;
double elapsedNano = (currTime - prevTime)*mNanoSecondsPerCount;
lua_pushnumber(L, elapsedNano);
return 1;
}
int luaopen_HighResolutionTimer (lua_State *L) {
static const luaL_reg mylib [] =
{
{"readHiResTimerFrequency", readHiResTimerFrequency},
{"storeTime", storeTime},
{"getNanoElapsed", getNanoElapsed},
{NULL, NULL} /* sentinel */
};
luaL_register(L,"timer",mylib);
return 1;
}
--
--
Now lets get it loaded up in a lua script, HighResTimer.lua .
Note: I compiled the HighResTimer.c to a shared library, Timer.so
#!/bin/lua
------------------------------------
---HighResTimer.lua by Cody Duncan
---Wraps the High Resolution Timer Functions in
--- Timer.so
------------------------------------
package.cpath = "./Timer.so" --assuming Timer.so is in the same directory
require "HighResolutionTimer" --load up the module
timer.readHiResTimerFrequency(); --stores the tickFrequency
--call this before code that is being measured for execution time
function start()
timer.storeTime();
end
--call this after code that is being measured for execution time
function stop()
timer.storeTime();
end
--once the prior two functions have been called, call this to get the
--time elapsed between them in nanoseconds
function getNanosElapsed()
return timer.getNanoElapsed();
end
--
--
and Finally, utilize the timer, TimerTest.lua .
#!/bin/lua
------------------------------------
---TimerTest.lua by Cody Duncan
---
---HighResTimer.lua and Timer.so must
--- be in the same directory as
--- this script.
------------------------------------
require './HighResTimer'
start();
for i = 0, 3000000 do io.write("") end --do essentially nothing 3million times.
stop();
--divide nanoseconds by 1 million to get milliseconds
executionTime = getNanosElapsed()/1000000;
io.write("execution time: ", executionTime, "ms\n");
Note: Any comments were written after pasting the source code into the post editor, so technically this is untested, but hopefully the comments didn't befuddle anything. I will be sure to come back and provide a fix if it does.
If you're using lua with nginx/openresty you could use ngx.now() which returns a float with millisecond precision
If you're using OpenResty then it provides for in-built millisecond time accuracy through the use of its ngx.now() function. Although if you want fine grained millisecond accuracy then you may need to call ngx.update_time() first. Or if you want to go one step further...
If you are using luajit enabled environment, such as OpenResty, then you can also use ffi to access C based time functions such as gettimeofday() e.g: (Note: The pcall check for the existence of struct timeval is only necessary if you're running it repeatedly e.g. via content_by_lua_file in OpenResty - without it you run into errors such as attempt to redefine 'timeval')
if pcall(ffi.typeof, "struct timeval") then
-- check if already defined.
else
-- undefined! let's define it!
ffi.cdef[[
typedef struct timeval {
long tv_sec;
long tv_usec;
} timeval;
int gettimeofday(struct timeval* t, void* tzp);
]]
end
local gettimeofday_struct = ffi.new("struct timeval")
local function gettimeofday()
ffi.C.gettimeofday(gettimeofday_struct, nil)
return tonumber(gettimeofday_struct.tv_sec) * 1000000 + tonumber(gettimeofday_struct.tv_usec)
end
Then the new lua gettimeofday() function can be called from lua to provide the clock time to microsecond level accuracy.
Indeed, one could take a similar approaching using clock_gettime() to obtain nanosecond accuracy.
Kevlar is correct.
An alternative to a custom DLL is Lua Alien
in openresty there is a function ngx.req.start_time.
From the docs:
Returns a floating-point number representing the timestamp (including milliseconds as the decimal part) when the current request was created.
You can use C function gettimeofday :
http://www.opengroup.org/onlinepubs/000095399/functions/gettimeofday.html
Here C library 'ul_time', function sec_usec resides in 'time' global table and returns seconds, useconds. Copy DLL to Lua folder, open it with require 'ul_time'.
http://depositfiles.com/files/3g2fx7dij
If you're on a system with a GNU-compatible implementation of date that you can execute, here's a one-liner to get the Epoch time in milliseconds:
local function gethammertime()
return tonumber(assert(assert(io.popen'date +%s%3N'):read'a'))
end
Note that the assert calls are necessary to ensure that any failures to read or open date will propagate the errors, respectively. Also note that this relies on garbage collection (or finalizers, in Lua 5.4) to close the process handle: if using a pre-5.4 version of Lua and resource exhaustion is a concern, you may wish to extend this to three lines like Klesun's Windows-based answer and close the handle explicitly.
If your environment is Windows and you have access to system commands, you can get time of centiseconds precision with io.popen(command):
local handle = io.popen("echo %time%")
local result = handle:read("*a")
handle:close()
The result will hold string of hh:mm:ss.cc format: (with trailing line break)
"19:56:53.90\n"
Note, it's in local timezone, so you probably want to extract only the .cc part and combine it with epoch seconds from os.time().

Resources