How to convert Cycles to nanoseconds - linux-kernel

I want to convert cycles, based on CPU counter register values, to nanoseconds in Linux.
I can see that Linux provides the cyc_to_ns()/clocksource_cyc2ns() APIs for this.
Both APIs take three arguments.
For example:
cyc_to_ns(u64 cyc, u32 mult, u32 shift);
Now I do have the cyc value (the CPU counter register), but I don't have the mult and shift values.
Can anyone point out how I should calculate these two values (mult and shift)?

The clocksource code provides a helper to calculate the mult and shift values:
clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 maxsec)
#to and #from are frequency values in HZ. For clock sources #to is
NSEC_PER_SEC == 1GHz and #from is the counter frequency. For clock
event #to is the counter frequency and #from is NSEC_PER_SEC.
The #maxsec conversion range argument controls the time frame in
seconds which must be covered by the runtime conversion with the
calculated mult and shift factors. This guarantees that no 64bit
overflow happens when the input value of the conversion is
multiplied with the calculated mult factor. Larger ranges may
reduce the conversion accuracy by choosing smaller mult and shift
factors.
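
For illustration, here is a minimal user-space C sketch of the same scheme (not kernel code): calc_mult_shift() mirrors the logic of the kernel's clocks_calc_mult_shift(), and cyc_to_ns() applies the usual (cyc * mult) >> shift conversion. The 24 MHz counter frequency and the 600-second range are assumptions made up for the example; substitute your counter's real frequency.

/* Minimal sketch, not kernel code: computes mult/shift for an assumed
 * 24 MHz counter and converts a cycle count to nanoseconds. */
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000u

/* Same idea as the kernel's clocks_calc_mult_shift(): find the largest
 * (mult, shift) pair such that maxsec seconds worth of input, multiplied
 * by mult, still fits in 64 bits. */
static void calc_mult_shift(uint32_t *mult, uint32_t *shift,
                            uint32_t from, uint32_t to, uint32_t maxsec)
{
    uint64_t tmp;
    uint32_t sft, sftacc = 32;

    /* Reduce the available shift headroom by the bits needed for maxsec * from */
    tmp = ((uint64_t)maxsec * from) >> 32;
    while (tmp) {
        tmp >>= 1;
        sftacc--;
    }

    /* Pick the shift/mult pair with the best accuracy that still fits */
    for (sft = 32; sft > 0; sft--) {
        tmp = ((uint64_t)to << sft) + from / 2;
        tmp /= from;
        if ((tmp >> sftacc) == 0)
            break;
    }
    *mult  = (uint32_t)tmp;
    *shift = sft;
}

/* The conversion itself, as in clocksource_cyc2ns(): ns = (cyc * mult) >> shift */
static uint64_t cyc_to_ns(uint64_t cyc, uint32_t mult, uint32_t shift)
{
    return (cyc * mult) >> shift;
}

int main(void)
{
    uint32_t mult, shift;
    uint32_t counter_hz = 24000000u;   /* assumed counter frequency, replace with yours */

    calc_mult_shift(&mult, &shift, counter_hz, NSEC_PER_SEC, 600);
    printf("mult=%u shift=%u\n", mult, shift);

    /* 24,000,000 cycles at 24 MHz should come out close to 1,000,000,000 ns */
    printf("%llu ns\n", (unsigned long long)cyc_to_ns(counter_hz, mult, shift));
    return 0;
}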

Related

Why can I not use the MAX and MIN values of f32 and f64 in a call to rand::rngs::ThreadRng::gen_range() in Rust

I have a utility I've written to generate dummy data. For example, you might write:
giveme 12000 u32
... to get an array of 12000 32-bit unsigned integers.
It is possible to set a maximum and/or minimum allowed value, so you might write:
giveme 100 f32 --max=155.25
If one or the other is not given, the programme uses type::MAX and type::MIN.
With the floating point types, however, one cannot pass those values to rand::rngs::ThreadRng::gen_range().
Here is my code for the f64:
fn generate_gift(gift_type: &GiftType,
                 generator: &mut rand::rngs::ThreadRng,
                 min: Option<f64>,
                 max: Option<f64>) -> Gift
{
    match gift_type
    {
        ...
        GiftType::Float64 =>
        {
            let _min: f64 = min.unwrap_or(f64::MIN);
            let _max: f64 = max.unwrap_or(f64::MAX);
            let x: f64 = generator.gen_range(_min..=_max);
            Gift::Float64(x)
        },
    }
}
If one or both of the limits is missing for a floating-point type (32- or 64-bit), then I get this error:
thread 'main' panicked at 'UniformSampler::sample_single: range overflow', /home/jack/.cargo/registry/src/github.com-1ecc6299db9ec823/rand-0.8.5/src/distributions/uniform.rs:998:1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
It is not obvious to me at all why this error arises. Can you shed any light upon it?
It's a limitation of the method used to generate the float number; see #1090.
gen_range(f64::MIN..f64::MAX) results in high - low overflowing
See the book
f64: we treat this as an approximation of the real numbers, and, by convention, restrict to the range 0 to 1 (if not otherwise specified). We will come back to the conversions used later; for now note that these produce 52-53 bits of precision (depending on which conversion is used, output will be in steps of ε or ε/2, where 1+ε is the smallest representable value greater than 1).
For f32 and f64 the range 0.0 .. 1.0 is used (exclusive of 1.0), for two reasons: (a) this is common practice for random-number generators and (b) because for many purposes having a uniform distribution of samples (along the Real number line) is important, and this is only possible for floating-point representations by restricting the range.
It's a limitation in rand.
In the source code it calculates high - low, which is inf in your case; the implementation checks for this and raises this error.
https://docs.rs/rand/0.8.5/src/rand/distributions/uniform.rs.html#811-814
The algorithm first calculates a random number in [1, 2) and then scales it by the length of the interval.
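
To make those two points concrete, here is a small stand-alone C sketch (an illustration of the same arithmetic, not the rand internals): the full-range subtraction overflows to infinity, while a finite range can be sampled by drawing a value in [1, 2) and scaling it by the interval length.

#include <stdio.h>
#include <stdlib.h>
#include <float.h>
#include <math.h>

int main(void)
{
    /* high - low for the full range f64::MIN .. f64::MAX overflows */
    double low = -DBL_MAX, high = DBL_MAX;
    double range = high - low;
    printf("high - low = %g, isinf = %d\n", range, isinf(range));

    /* with a finite range, draw r in [1, 2) and scale by the interval length */
    double lo = 0.0, hi = 100.0;
    double r = 1.0 + (double)rand() / ((double)RAND_MAX + 1.0);  /* [1, 2) */
    double sample = lo + (r - 1.0) * (hi - lo);
    printf("sample in [%g, %g): %g\n", lo, hi, sample);
    return 0;
}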

How does the play method work in the "NCD.L1.sample--lottery" contract?

Here is the contract repo. https://github.com/Learn-NEAR/NCD.L1.sample--lottery
I don't understand the play method here
https://github.com/Learn-NEAR/NCD.L1.sample--lottery/blob/2bd11bc1092004409e32b75736f78adee821f35b/src/lottery/assembly/lottery.ts#L11-L16
play(): bool {
  const rng = new RNG<u32>(1, u32.MAX_VALUE);
  const roll = rng.next();
  logging.log("roll: " + roll.toString());
  return roll <= <u32>(<f64>u32.MAX_VALUE * this.chance);
}
I don't understand the winning process but I'm sure it is hidden inside this method. So can someone explain how this play method works in detail?
To understand the winning process we should take a look at the play method in the lottery.ts file in the contract.
https://github.com/Learn-NEAR/NCD.L1.sample--lottery/blob/2bd11bc1092004409e32b75736f78adee821f35b/src/lottery/assembly/lottery.ts#L11-L16
play(): bool {
  const rng = new RNG<u32>(1, u32.MAX_VALUE);
  const roll = rng.next();
  logging.log("roll: " + roll.toString());
  return roll <= <u32>(<f64>u32.MAX_VALUE * this.chance);
}
There are a couple of things we should know about before we read this code.
bool
u32
f64
RNG<u32>
bool means that our play method should only return true or false.
u32 is a 32-bit unsigned integer. It is a positive integer stored using 32 bits.
u8 has a max value of 255. u16 has a max value of 65535. u32 has a max value of 4294967295. u64 has a max value of 18446744073709551615. So, these unsigned integers can't be negative values.
f64 is a number that has a decimal place. This type can represent a wide range of decimal numbers, like 3.5, 27, -113.75, 0.0078125, 34359738368, 0, -1. So unlike integer types (such as i32), floating-point types can represent non-integer numbers, too.
RNG stands for Random Number Generator. It basically gives you a random number in the u32 range. It takes two parameters that define the range of values it can produce; in this case the range is between 1 and u32.MAX_VALUE, in other words between 1 and 4294967295.
The next line creates a variable called roll and assigns it the value of rng.next().
So, what does next() do? Think of rng as a big machine with only one big red button on it. Every time you hit that button, it gives you a number the machine is capable of producing, meaning a number between 1 and u32.MAX_VALUE.
The third line just logs the roll to the console. You should see something like this in your console: roll: 3845432649
The last line looks confusing at first, but let's take it piece by piece.
Here, in u32.MAX_VALUE * this.chance, we multiply that max value by a variable called chance, which is defined as 0.2 in the Lottery class.
Then, we put <f64> in front of the calculation because the result will always be a floating-point number due to the 0.2.
Then, we put <u32> in front of the whole expression to convert that floating-point number back into an unsigned integer, because we need to compare it with the roll, which is an unsigned integer. You can't compare floating-point numbers with unsigned integers directly.
Finally, if the roll is less than or equal to <u32>(<f64>u32.MAX_VALUE * this.chance), the player wins.
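
Here is the same win test written as a small stand-alone C sketch, just to make the arithmetic concrete; the 0.2 comes from the Lottery class and the roll value is an arbitrary example.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    double   chance = 0.2;           /* this.chance in the Lottery class */
    uint32_t roll   = 3845432649u;   /* example roll from rng.next() */

    /* <u32>(<f64>u32.MAX_VALUE * this.chance): roughly 20% of the u32 range */
    uint32_t threshold = (uint32_t)((double)UINT32_MAX * chance);

    bool won = roll <= threshold;
    printf("threshold = %u, roll = %u, won = %d\n", threshold, roll, won);
    return 0;
}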

Algorithm for collecting total events within the last certain time

I am facing an algorithm problem.
We have a task that runs every 10ms and during the run an event can happen or not happen. Is there any simple algorithm that allows us to keep track of how many times an event is triggered within the last, say, 1 second?
The only idea that I have is to implement an array and save all the events. As we are programming embedded systems, there is not enough space...
Thanks in advance.
An array of 13 bytes can hold a second's worth of events in 10ms steps.
Consider it an array of 104 bits, one bit per 10ms step.
If the event occurs, mark the bit and advance to the next position; otherwise just advance to the next bit/byte.
If you want, run-length encode after each second to offload the event bits into another value.
Or treat it as a circular buffer and keep the count available for query.
Or both.
You could reduce the array size to match the space available.
It is not clear if an event could occur multiple times while your task was running, or if it is always 10ms between events.
This is more or less what Dtyree and Weeble have suggested, but an example implementation may help (C code for illustration):
#include <stdint.h>
#include <stdbool.h>

#define HISTORY_LENGTH 100 // 1 second when called every 10ms

int rollingcount( bool event )
{
    static uint8_t event_history[(HISTORY_LENGTH+7) / 8] ;
    static int next_history_bit = 0 ;
    static int event_count = 0 ;

    // Get history byte index and bit mask
    int history_index = next_history_bit >> 3 ;             // ">> 3" is same as "/ 8" but often faster
    uint8_t history_mask = 1 << (next_history_bit & 0x7) ;  // "& 0x07" is same as "% 8" but often faster

    // Get current bit value
    bool history_bit = (event_history[history_index] & history_mask) != 0 ;

    // If oldest history event is not the same as new event, adjust count
    if( history_bit != event )
    {
        if( event )
        {
            // Increment count for 0->1
            event_count++ ;

            // Replace oldest bit with 1
            event_history[history_index] |= history_mask ;
        }
        else
        {
            // decrement count for 1->0
            event_count-- ;

            // Replace oldest bit with 0
            event_history[history_index] &= ~history_mask ;
        }
    }

    // increment to oldest history bit
    next_history_bit++ ;
    if( next_history_bit >= HISTORY_LENGTH ) // Could use "next_history_bit %= HISTORY_LENGTH" here, but may be expensive on some processors
    {
        next_history_bit = 0 ;
    }

    return event_count ;
}
For a 100-sample history, it requires 13 bytes plus two integers of statically allocated memory. I have used int for generality, but in this case uint8_t counters would suffice. In addition there are three stack variables, and again the use of int is not necessary if you really need to optimise memory use. So in total it is possible to use as little as 15 bytes plus three bytes of stack. The event argument may or may not be passed on the stack, and then there is the function call return address, but again that depends on the calling convention of your compiler/processor.
You need some kind of list/queue etc., but a ring buffer probably has the best performance.
You need to store 100 counters (1 for each time period of 10 ms during the last second) and a current counter.
Ringbuffer solution:
(I used pseudo code).
Create a counter_array of 100 counters (initially filled with 0's).
int[100] counter_array;
current_counter = 0
During the 10 ms cycle (at the start of each cycle):
current_counter = (current_counter + 1) % 100;
counter_array[current_counter] = 0;
For every event during the cycle:
counter_array[current_counter]++
To check the number of events during the last second, take the sum of counter_array.
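
Here is a small C sketch of that ring buffer; tick_10ms(), record_event() and events_last_second() are made-up names for this illustration, called once per cycle, once per event, and on demand respectively.

#include <stdint.h>

#define SLOTS 100                      /* 100 x 10 ms = 1 second window */

static uint8_t counter_array[SLOTS];   /* events per 10 ms slot */
static uint8_t current_counter;

void tick_10ms(void)                   /* call at the start of every cycle */
{
    current_counter = (current_counter + 1) % SLOTS;
    counter_array[current_counter] = 0;   /* drop the slot from 1 s ago */
}

void record_event(void)                /* call whenever the event happens */
{
    counter_array[current_counter]++;
}

uint16_t events_last_second(void)      /* sum of all slots */
{
    uint16_t sum = 0;
    for (int i = 0; i < SLOTS; i++)
        sum += counter_array[i];
    return sum;
}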
Can you afford an array of 100 booleans? Perhaps as a bit field? As long as you can afford the space cost, you can track the number of events in constant time:
Store:
A counter C, initially 0.
The array of booleans B, of size equal to the number of intervals you want to track, i.e. 100, initially all false.
An index I, initially 0.
Each interval:
read the boolean at B[I], and decrement C if it's true.
set the boolean at B[I] to true if the event occurred in this interval, false otherwise.
Increment C if the event occurred in this interval.
Increment I; when I reaches 100, reset it to 0.
That way you at least avoid scanning the whole array every interval.
EDIT - Okay, so you want to track events over the last 3 minutes (180s, 18000 intervals). Using the above algorithm and cramming the booleans into a bit-field, that requires total storage:
2 byte unsigned integer for C
2 byte unsigned integer for I
2250 byte bit-field for B
That's pretty much unavoidable if you require a precise count of the number of events in the last 180.0 seconds at all times. I don't think it would be hard to prove that you need all of that information to be able to give an accurate answer at all times. However, if you could live with knowing only the number of events in the last 180 +/- 2 seconds, you could instead reduce your time resolution. Here's a detailed example, expanding on my comment below.
The above algorithm generalizes:
Store:
A counter C, initially 0.
The array of counters B, of size equal to the number of intervals you want to track, i.e. 100, initially all 0.
An index I, initially 0.
Each interval:
read B[I], and decrement C by that amount.
write the number of events that occurred this interval into B[I].
Increment C by the number of events that occurred this interval.
Increment I; when I reaches the length of B, reset it to 0.
If you switch your interval to 2s, then in that time 0-200 events might occur. So each counter in the array could be a one-byte unsigned integer. You would have 90 such intervals over 3 minutes, so your array would need 90 elements = 90 bytes.
If you switch your interval to 150ms, then in that time 0-15 events might occur. If you are pressed for space, you could cram this into a half-byte unsigned integer. You would have 1200 such intervals over 3 minutes, so your array would need 1200 elements = 600 bytes.
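
As an illustration of the generalised scheme, here is a small C sketch of the 2-second variant (90 one-byte counters plus a running total, updated in constant time); the function names, and the assumption that the caller tallies events during each interval and passes the tally in, are made up for this example.

#include <stdint.h>

#define SLOTS 90             /* 90 x 2 s = 180 s window */

static uint8_t  slot_counts[SLOTS];  /* events per 2 s slot (B) */
static uint8_t  slot_index;          /* I in the description */
static uint16_t total;               /* C in the description */

/* Call once per 2 s interval with the number of events seen in it. */
void end_of_interval(uint8_t events_this_interval)
{
    total -= slot_counts[slot_index];          /* forget the oldest slot */
    slot_counts[slot_index] = events_this_interval;
    total += events_this_interval;
    slot_index = (slot_index + 1) % SLOTS;
}

uint16_t events_last_3_minutes(void)
{
    return total;
}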
Will the following work for your application?
A rolling event counter that increments every event.
In the routine that runs every 10ms, you compare the current event counter value with the event counter value stored the last time the routine ran.
That tells you how many events occurred during the 10ms window.
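
A tiny C sketch of that approach; the names are illustrative, and the unsigned subtraction gives the correct difference even when the counter wraps around.

#include <stdint.h>

static volatile uint16_t event_counter;       /* incremented on every event */

uint16_t events_since_last_run(void)          /* call from the 10 ms routine */
{
    static uint16_t last_count;
    uint16_t now   = event_counter;
    uint16_t delta = (uint16_t)(now - last_count);   /* wrap-safe difference */
    last_count = now;
    return delta;
}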

hashing a small number to a random looking 64 bit integer

I am looking for a hash-function which operates on a small integer (say in the range 0...1000) and outputs a 64 bit int.
The result-set should look like a random distribution of 64 bit ints: a uniform distribution with no linear correlation between the results.
I was hoping for a function that only takes a few CPU-cycles to execute. (the code will be in C++).
I considered multiplying the input by a big prime number and taking it modulo 2**64 (something like a linear congruential generator), but there are obvious dependencies between the outputs (in the lower bits).
Googling did not show up anything, but I am probably using wrong search terms.
Does such a function exist?
Some Background-info:
I want to avoid using a big persistent table with pseudo random numbers in an algorithm, and calculate random-looking numbers on the fly.
Security is not an issue.
I tested the 64-bit finalizer of MurmurHash3 (suggested by #aix and this SO post). This gives zero if the input is zero, so I increased the input parameter by 1 first:
typedef unsigned long long uint64;

inline uint64 fasthash(uint64 i)
{
    i += 1ULL;
    i ^= i >> 33ULL;
    i *= 0xff51afd7ed558ccdULL;
    i ^= i >> 33ULL;
    i *= 0xc4ceb9fe1a85ec53ULL;
    i ^= i >> 33ULL;
    return i;
}
Here the input argument i is a small integer, for example an element of {0, 1, ..., 1000}. The output looks random:
i    fasthash(i) (decimal)      fasthash(i) (hex)
0    12994781566227106604       0xB456BCFC34C2CB2C
1     4233148493373801447       0x3ABF2A20650683E7
2      815575690806614222       0x0B5181C509F8D8CE
3     5156626420896634997       0x47900468A8F01875
...  ...                        ...
There is no linear correlation between subsequent elements of the series:
[Scatter plot omitted; the range of both axes is 0..2^64-1.]
Why not use an existing hash function, such as MurmurHash3 with a 64-bit finalizer? According to the author, the function takes tens of CPU cycles per key on current Intel hardware.
Given: input i in the range of 0 to 1,000.
const MaxInt, which is the maximum value that can be contained in a 64-bit int (you did not say if it is signed or unsigned; 2^64 = 18446744073709551616),
and a function rand() that returns a value between 0 and 1 (most languages have such a function)
compute hashvalue = i * rand() * ( MaxInt / 1000 )
1,000 * 1,000 = 1,000,000. That fits well within an Int32.
Subtract the low bound of your range from the number.
Square it, and use it as a direct subscript into some sort of bitmap.

linear interpolation on 8bit microcontroller

I need to do a linear interpolation over time between two values on an 8 bit PIC microcontroller (Specifically 16F627A but that shouldn't matter) using PIC assembly language. Although I'm looking for an algorithm here as much as actual code.
I need to take an 8 bit starting value, an 8 bit ending value and a position between the two (Currently represented as an 8 bit number 0-255 where 0 means the output should be the starting value and 255 means it should be the final value but that can change if there is a better way to represent this) and calculate the interpolated value.
Now PIC doesn't have a divide instruction, so I could code up a general-purpose divide routine and effectively calculate (B-A)/(x/255)+A at each step, but I feel there is probably a much better way to do this on a microcontroller than the way I'd do it on a PC in C++.
Has anyone got any suggestions for implementing this efficiently on this hardware?
The value you are looking for is (A*(255-x)+B*x)/255. It requires only 8x8 multiplication, and a final division by 255, which can be approximated by simply taking the high byte of the sum.
Choosing x in range 0..128, no approximation is needed: take the high byte of (A*(128-x)+B*x)<<1.
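
A small C sketch of both variants (the function names are made up; the >> 8 in the first one is the high-byte approximation of dividing by 255 described above).

#include <stdint.h>

/* x in 0..255: approximate (a*(255-x) + b*x) / 255 by taking the high byte */
uint8_t lerp255(uint8_t a, uint8_t b, uint8_t x)
{
    uint16_t sum = (uint16_t)a * (255 - x) + (uint16_t)b * x;
    return (uint8_t)(sum >> 8);
}

/* x in 0..128: take the high byte of (a*(128-x) + b*x) << 1, no approximation */
uint8_t lerp128(uint8_t a, uint8_t b, uint8_t x)
{
    uint16_t sum = (uint16_t)a * (128 - x) + (uint16_t)b * x;
    return (uint8_t)(((uint32_t)sum << 1) >> 8);
}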
Assuming you interpolate a sequence of values where the previous endpoint is the new start point:
(B-A)/(x/255)+A
sounds like a bad idea. If you use base 255 as a fixed-point representation, you get the same interpolant twice: you get B when x=255, and B again as the new A when x=0.
Use 256 as the fixed-point system. Divides become shifts, but you need 16-bit arithmetic and 8x8 multiplication with a 16-bit result. The previous issue can be fixed by simply ignoring any bits in the higher bytes, as x mod 256 becomes 0. This suggestion uses 16-bit multiplication, but can't overflow, and you don't interpolate over the same x twice.
interp = (a*(256 - x) + b*x) >> 8
256 - x becomes just a subtract-with-borrow, as you get 0 - x.
The PIC lacks these operations in its instruction set:
Right and left shift. (both logical and arithmetic)
Any form of multiplication.
You can get right-shifting by using rotate-right instead, followed by masking out the extra bits on the left with bitwise-and. A straightforward way to do an 8x8 multiplication with a 16-bit result:
void mul16(
    unsigned char* hi, /* in: operand1, out: the most significant byte */
    unsigned char* lo  /* in: operand2, out: the least significant byte */
)
{
    unsigned char a, b;

    /* loop over the smallest value */
    a = (*hi <= *lo) ? *hi : *lo;
    b = (*hi <= *lo) ? *lo : *hi;

    *hi = *lo = 0;

    while (a) {
        *lo += b;
        if (*lo < b)   /* unsigned overflow: in assembly, use the carry flag instead */
            (*hi)++;   /* parentheses matter: increment the high byte, not the pointer */
        --a;
    }
}
The techniques described by Eric Bainville and Mads Elvheim will work fine; each one uses two multiplies per interpolation.
Scott Dattalo and Tony Kubek have put together a super-optimized PIC-specific interpolation technique called "twist" that is slightly faster than two multiplies per interpolation.
Is using this difficult-to-understand technique worth running a little faster?
You could do it using 8.8 fixed-point arithmetic. Then a number from range 0..255 would be interpreted as 0.0 ... 0.996 and you would be able to multiply and normalize it.
Tell me if you need any more details or if it's enough for you to start.
You could characterize this instead as:
(B-A)*(256/(x+1))+A
using a value range of x=0..255, precompute the values of 256/(x+1) as fixed-point numbers in a table, and then code a general-purpose multiply, adjusting for the position of the binary point. This might not be small space-wise; I'd expect you to need a 256-entry table of 16-bit values and the multiply code. (If you don't need speed, this suggests your division method is fine.) But it only takes one multiply and an add.
My guess is that you don't need every possible value of X. If there are only a few values of X, you can compute them offline, do a case-select on the specific value of X and then implement the multiply in terms of a fixed sequence of shifts and adds for the specific value of X. That's likely to be pretty efficient in code and very fast for a PIC.
Interpolation
Given two values X & Y, it's basically:
(X+Y)/2
or
X/2 + Y/2 (to prevent the odd case that X+Y might overflow the size of the register)
Hence try the following:
(Pseudo-code)
Initially A=MAX, B=MIN
Loop {
    Right-shift A by 1 bit.
    Right-shift B by 1 bit.
    C = ADD the two results.
    Check the MSB of the 8-bit interpolation value:
        if MSB=0, then B=C
        if MSB=1, then A=C
    Left-shift the 8-bit interpolation value.
} Repeat until the 8-bit interpolation value becomes zero.
The actual code is just as easy; I just don't remember the registers and instructions off-hand.
