Related
Initially I was very surprised to find out Rust's HashMap, even with the FNV hasher, was considerably slower than the equivalents in Java, .NET, PHP. I am talking about optimized Release mode, not Debug mode. I did some calculations and realized the timings in Java/.NET/PHP were suspiciously low. Then it hit me - even though I was testing with a big hash table (millions of entries), I was reading mostly sequential key values (like 14, 15, 16, ...), which apparently resulted in lots of CPU cache hits, due to the way the standard hash tables (and hash-code functions for integers and short strings) in those languages are implementated, so that entries with nearby keys are usually located in nearby memory locations.
Rust's HashMap, on the other hand, uses the so called SwissTable implementation, which apparently distributes values differently. When I tested reading by random keys, everything fell into place - the "competitors" scored behind Rust.
So if we are in a situation where we need to perform lots of gets sequentially, for example iterating some DB IDs that are ordered and mostly sequential (with not too many gaps), is there a good Rust hash map implementation that can compete with Java's HashMap or .NET's Dictionary?
P.S. As requested in the comments, I paste an example here. I ran lots of tests, but here is a simple example that takes 75 ms in Rust (release mode) and 20 ms in Java:
In Rust:
let hm: FnvHashMap<i32, i32> = ...;
// Start timer here
let mut sum: i64 = 0;
for i in 0..1_000_000 {
if let Some(x) = hm.get(&i) {
sum += *x as i64;
}
}
println!("The sum is: {}", sum);
In Java:
Map<Integer, Integer> hm = ...;
// Start timer here
long sum = 0;
for (int i = 0; i < 1_000_000; i++) {
sum += hm.get(i);
}
With HashMap<i32, i32> and its default SipHash hasher it took 190 ms. I know why it's slower than FnvHashMap. I'm just mentioning that for completeness.
First, here is some runnable code to measure the efficiency of the different implementations:
use std::{collections::HashMap, time::Instant};
fn main() {
let hm: HashMap<i32, i32> = (0..1_000_000).map(|i| (i, i)).collect();
let t0 = Instant::now();
let mut sum = 0;
for i in 0..1_000_000 {
if let Some(x) = hm.get(&i) {
sum += x;
}
}
let elapsed = t0.elapsed().as_secs_f64();
println!("{} - The sum is: {}", elapsed, sum);
}
On the old desktop machine I'm writing this on, it reports 76 ms to run. Since the machine is 10+ years old, I find it baffling that your hardware would take 190 ms to run the same code, so I'm wondering how and what you're actually measuring. But let's ignore that and concentrate on the relative numbers.
When you care about hashmap efficiency in Rust, and when the keys don't come from an untrusted source, the first thing to try should always be to switch to a non-DOS-resistant hash function. One possibility is the FNV hash function from the fnv crate, which you can get by switching HashMap to fnv::FnvHashMap. That brings performance to 34 ms, i.e. a 2.2x speedup.
If this is not enough, you can try the hash from the rustc-hash crate (almost the same as fxhash, but allegedly better maintained), which uses the same function as the Rust compiler, adapted from the hash used by Firefox. Not based on any formal analysis, it performs badly on hash function test suites, but is reported to consistently outperform FNV. That's confirmed on the above example, where switching from FnvHashMap to rustc_hash::FxHashMap drops the time to 28 ms, i.e. a 2.7x speedup from the initial timing.
Finally, if you want to just imitate what C# and Java do, and could care less about certain patterns of inserted numbers leading to degraded performance, you can use the aptly-named nohash_hasher crate that gives you an identity hash. Changing HashMap<i32, i32> to HashMap<i32, i32, nohash_hasher::BuildNoHashHasher<i32>> drops the time to just under 4 ms, i.e. a whopping 19x speedup from the initial timing.
Since you report the Java example to be 9.5x faster than Rust, a 19x speedup should make your code approximately twice as fast as Java.
Rust's HashMap by default uses an implementation of SipHash as the hash function. SipHash is designed to avoid denial-of-service attacks based on predicting hash collisions, which is an important security property for a hash function used in a hash map.
If you don't need this guarantee, you can use a simpler hash function. One option is using the fxhash crate, which should speed up reading integers from a HashMap<i32, i32> by about a factor of 3.
Other options are implementing your own trivial hash function (e.g. by simply using the identity function, which is a decent hash function for mostly consecutive keys), or using a vector instead of a hash map.
.NET uses the identity function for hashes of Int32 by default, so it's not resistant to hash flooding attacks. Of course this is faster, but the downside is not even mentioned in the documentation of Dictionary. For what it's worth, I prefer Rust's "safe by default" approach over .NET's any day, since many developers aren't even aware of the problems predictable hash functions can cause. Rust still allows you to use a more performant hash function if you don't need the hash flooding protection, so to me personally this seems to be a strength of Rust compared to at least .NET rather than a weakness.
I decided to run some more tests, based on the suggestions by user4815162342. This time I used another machine with Ubuntu 20.04.
Rust code
println!("----- HashMap (with its default SipHash hasher) -----------");
let hm: HashMap<i32, i32> = (0..1_000_000).map(|i| (i, i)).collect();
for k in 0..6 {
let t0 = Instant::now();
let mut sum: i64 = 0;
for i in 0..1_000_000 {
if let Some(x) = hm.get(&i) {
sum += *x as i64;
}
}
let elapsed = t0.elapsed().as_secs_f64();
println!("The sum is: {}. Time elapsed: {:.3} sec", sum, elapsed);
}
println!("----- FnvHashMap (fnv 1.0.7) ------------------------------");
let hm: FnvHashMap<i32, i32> = (0..1_000_000).map(|i| (i, i)).collect();
for k in 0..6 {
let t0 = Instant::now();
let mut sum: i64 = 0;
for i in 0..1_000_000 {
if let Some(x) = hm.get(&i) {
sum += *x as i64;
}
}
let elapsed = t0.elapsed().as_secs_f64();
println!("The sum is: {}. Time elapsed: {:.3} sec", sum, elapsed);
}
println!("----- FxHashMap (rustc-hash 1.1.0) ------------------------");
let hm: FxHashMap<i32, i32> = (0..1_000_000).map(|i| (i, i)).collect();
for k in 0..6 {
let t0 = Instant::now();
let mut sum: i64 = 0;
for i in 0..1_000_000 {
if let Some(x) = hm.get(&i) {
sum += *x as i64;
}
}
let elapsed = t0.elapsed().as_secs_f64();
println!("The sum is: {}. Time elapsed: {:.3} sec", sum, elapsed);
}
println!("----- HashMap/BuildNoHashHasher (nohash-hasher 0.2.0) -----");
let hm: HashMap<i32, i32, nohash_hasher::BuildNoHashHasher<i32>> = (0..1_000_000).map(|i| (i, i)).collect();
for k in 0..6 {
let t0 = Instant::now();
let mut sum: i64 = 0;
for i in 0..1_000_000 {
if let Some(x) = hm.get(&i) {
sum += *x as i64;
}
}
let elapsed = t0.elapsed().as_secs_f64();
println!("The sum is: {}. Time elapsed: {:.3} sec", sum, elapsed);
}
BTW the last one can be replaced with this shorter type:
let hm: IntMap<i32, i32> = (0..1_000_000).map(|i| (i, i)).collect();
For those interested, this is IntMap's definition:
pub type IntMap<K, V> = std::collections::HashMap<K, V, BuildNoHashHasher<K>>;
Java code
On the same machine I tested a Java example. I don't have a JVM installed on it, so I used a Docker image adoptopenjdk/openjdk14 and directly pasted the code below in jshell> (not sure if that would hurt Java's timings). So this is the Java code:
Map<Integer, Integer> hm = new HashMap<>();
for (int i = 0; i < 1_000_000; i++) {
hm.put(i, i);
}
for (int k = 0; k < 6; k++) {
long t0 = System.currentTimeMillis();
// Start timer here
long sum = 0;
for (int i = 0; i < 1_000_000; i++) {
sum += hm.get(i);
}
System.out.println("The sum is: " + sum + ". Time elapsed: " + (System.currentTimeMillis() - t0) + " ms");
}
Results
Rust (release mode):
----- HashMap (with its default SipHash hasher) -----------
The sum is: 499999500000. Time elapsed: 0.149 sec
The sum is: 499999500000. Time elapsed: 0.140 sec
The sum is: 499999500000. Time elapsed: 0.167 sec
The sum is: 499999500000. Time elapsed: 0.150 sec
The sum is: 499999500000. Time elapsed: 0.261 sec
The sum is: 499999500000. Time elapsed: 0.189 sec
----- FnvHashMap (fnv 1.0.7) ------------------------------
The sum is: 499999500000. Time elapsed: 0.055 sec
The sum is: 499999500000. Time elapsed: 0.052 sec
The sum is: 499999500000. Time elapsed: 0.053 sec
The sum is: 499999500000. Time elapsed: 0.058 sec
The sum is: 499999500000. Time elapsed: 0.051 sec
The sum is: 499999500000. Time elapsed: 0.056 sec
----- FxHashMap (rustc-hash 1.1.0) ------------------------
The sum is: 499999500000. Time elapsed: 0.039 sec
The sum is: 499999500000. Time elapsed: 0.076 sec
The sum is: 499999500000. Time elapsed: 0.064 sec
The sum is: 499999500000. Time elapsed: 0.048 sec
The sum is: 499999500000. Time elapsed: 0.057 sec
The sum is: 499999500000. Time elapsed: 0.061 sec
----- HashMap/BuildNoHashHasher (nohash-hasher 0.2.0) -----
The sum is: 499999500000. Time elapsed: 0.004 sec
The sum is: 499999500000. Time elapsed: 0.003 sec
The sum is: 499999500000. Time elapsed: 0.003 sec
The sum is: 499999500000. Time elapsed: 0.003 sec
The sum is: 499999500000. Time elapsed: 0.003 sec
The sum is: 499999500000. Time elapsed: 0.003 sec
Java:
The sum is: 499999500000. Time elapsed: 49 ms // see notes below
The sum is: 499999500000. Time elapsed: 41 ms // see notes below
The sum is: 499999500000. Time elapsed: 18 ms
The sum is: 499999500000. Time elapsed: 29 ms
The sum is: 499999500000. Time elapsed: 19 ms
The sum is: 499999500000. Time elapsed: 23 ms
(With Java the first 1-2 runs are normally slower, as the JVM HotSpot still hasn't fully optimized the relevant piece of code.)
Try hashbrown
It uses aHash which have full comparison with other HashMap algorithm here
I have a loop where I start by a time.Time and I what to add a minute.
for idx := range keys {
var a = idx * time.Minute
var t = tInit.Add(time.Minute * a)
fmt.Println(t, idx)
}
Here is my error
invalid operation: idx * time.Minute (mismatched types int and time.Duration)
The operands to numeric operations must have the same type. Convert the int value idx to a time.Duration: var a = time.Duration(idx) * time.Minute
As a developer in other programing languages I found this the most counterintuitive and illogical way of doing it. I worked in Scala in the last 10 years, and it could be as simple as this:
val a = idx minutes
compared that, the Go way:
var a = time.Duration(idx) * time.Minute
is more verbose, but that wouldn't be the end of the world.
The problem is that multiplying a Duration with another Duration doesn't make any sense if what you want is to obtain another Duration as a result, because from a physical point of view that would be measured in something like seconds squared.
According to the documentation time.Minute is a constant:
const (
Nanosecond Duration = 1
Microsecond = 1000 * Nanosecond
Millisecond = 1000 * Microsecond
Second = 1000 * Millisecond
Minute = 60 * Second
Hour = 60 * Minute
)
And all those are defined in terms of the Duration type which is an alias for int64:
type Duration int64
From what I see is perfectly fine to multiply an integer literal with each one of these constants, after all that's how each one is defined in relation to the others.
So, to recap why is 60 * time.Second valid syntax (and makes perfect sense), but:
var secondsInAMinute := 60
var oneMinute = secondsInAMinute * time.Second
is invalid. This doesn't make any sense.
All those constants are of type Duration. That means they are measured in units of time (multiples of one nanosecond to be precise).
So, it seems the "correct" way to do it (correct in the sense that it compiles and works) doesn't make any physical sense. Let's look at this again:
var a = time.Duration(idx) * time.Minute
So, we are multiplying time.Duration(idx) with time.Minute.
The type for time.Minute is Duration which should be measured with a time unit. In physics it the accepted unit for time is the second. It seems Go uses integer nanoseconds instead, so time.Minute represents a Duration, represented internally in nanoseconds. That's fine.
The problem is that time.Duration(idx) also "converts" the integer idx to a Duration, so in physics it would also be represented as a unit of time, like seconds. So, accordingly, time.Duration(idx), in my opinion, represents idx nanoseconds in Go.
So, basically, when we write time.Duration(idx) * time.Minute we are muliplying idx nanoseconds (idx * 0.0000000001 seconds) with one minute (60 seconds).
So, from a physical point of view time.Duration(idx) * time.Minute would represent idx * 0.000000001 seconds * 60 seconds. Or, simplified, idx * 0.00000006 seconds squared.
Now, in what world is idx * 0.00000006 seconds squared equal to idx * 1 minute?
So, now I know, in Go, if you want to apply a multiplier to a duration, you have to multiply that Duration to another Duration, and divide that in your mind with one millisecond so that all this mess can still makes any kind of physical sense.
I understand that all these unit inconsistencies are the result of the "The operands to numeric operations must have the same type." constraint. But that doesn't make it more logical or less annoying. In my opinion that restriction of the Go language should be removed.
But, for anyone that was lost in my explanations, let's see how illogical all this is with a concrete code example:
package main
import (
"fmt"
"time"
)
func main() {
var oneMinute = 1 * time.Minute
var oneNanosecond = 1 * time.Nanosecond
var oneMinuteTimesOneNanoSecond = oneMinute * oneNanosecond
fmt.Println(oneMinuteTimesOneNanoSecond)
}
The result is exactly what I expected from this nonsensical way of doing time calculations:
1m0s
I'll learn to live with this, but I will never like it.
I am wondering about the formulas used in perf stat to calculate figures from the raw data.
perf stat -e task-clock,cycles,instructions,cache-references,cache-misses ./myapp
1080267.226401 task-clock (msec) # 19.062 CPUs utilized
1,592,123,216,789 cycles # 1.474 GHz (50.00%)
871,190,006,655 instructions # 0.55 insn per cycle (75.00%)
3,697,548,810 cache-references # 3.423 M/sec (75.00%)
459,457,321 cache-misses # 12.426 % of all cache refs (75.00%)
In this context, how do you calculate M/sec from cache-references?
Formulas are seems not to be implemented in the builtin-stat.c (where default event sets for perf stat are defined), but they are probably calculated (and averaged with stddev) in perf_stat__print_shadow_stats() (and some stats are collected into arrays in perf_stat__update_shadow_stats()):
http://elixir.free-electrons.com/linux/v4.13.4/source/tools/perf/util/stat-shadow.c#L626
When HW_INSTRUCTIONS is counted:
"Instructions per clock" = HW_INSTRUCTIONS / HW_CPU_CYCLES; "stalled cycles per instruction" = HW_STALLED_CYCLES_FRONTEND / HW_INSTRUCTIONS
if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
if (total) {
ratio = avg / total;
print_metric(ctxp, NULL, "%7.2f ",
"insn per cycle", ratio);
} else {
print_metric(ctxp, NULL, NULL, "insn per cycle", 0);
}
Branch misses are from print_branch_misses as HW_BRANCH_MISSES / HW_BRANCH_INSTRUCTIONS
There are several cache miss ratio calculations in perf_stat__print_shadow_stats() too like HW_CACHE_MISSES / HW_CACHE_REFERENCES and some more detailed (perf stat -d mode).
Stalled percents are computed as HW_STALLED_CYCLES_FRONTEND / HW_CPU_CYCLES and HW_STALLED_CYCLES_BACKEND / HW_CPU_CYCLES
GHz is computed as HW_CPU_CYCLES / runtime_nsecs_stats, where runtime_nsecs_stats was updated from any of software events task-clock or cpu-clock (SW_TASK_CLOCK & SW_CPU_CLOCK, We still know no exact difference between them two since 2010 in LKML and 2014 at SO)
if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) ||
perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK))
update_stats(&runtime_nsecs_stats[cpu], count[0]);
There are also several formulas for transactions (perf stat -T mode).
"CPU utilized" is from task-clock or cpu-clock / walltime_nsecs_stats, where walltime is calculated by the perf stat itself (in userspace using clock from the wall (astronomic time, ):
static inline unsigned long long rdclock(void)
{
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}
...
static int __run_perf_stat(int argc, const char **argv)
{
...
/*
* Enable counters and exec the command:
*/
t0 = rdclock();
clock_gettime(CLOCK_MONOTONIC, &ref_time);
if (forks) {
....
}
t1 = rdclock();
update_stats(&walltime_nsecs_stats, t1 - t0);
There are also some estimations from the Top-Down methodology (Tuning Applications Using a Top-down Microarchitecture Analysis Method, Software Optimizations Become Simple with Top-Down Analysis .. Name Skylake, IDF2015, #22 in Gregg's Methodology List. Described in 2016 by Andi Kleen https://lwn.net/Articles/688335/ "Add top down metrics to perf stat" (perf stat --topdown -I 1000 cmd mode).
And finally, if there was no exact formula for the currently printing event, there is universal "%c/sec" (K/sec or M/sec) metric: http://elixir.free-electrons.com/linux/v4.13.4/source/tools/perf/util/stat-shadow.c#L845 Anything divided by runtime nsec (task-clock or cpu-clock events, if they were present in perf stat event set)
} else if (runtime_nsecs_stats[cpu].n != 0) {
char unit = 'M';
char unit_buf[10];
total = avg_stats(&runtime_nsecs_stats[cpu]);
if (total)
ratio = 1000.0 * avg / total;
if (ratio < 0.001) {
ratio *= 1000;
unit = 'K';
}
snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
print_metric(ctxp, NULL, "%8.3f", unit_buf, ratio);
}
Is there any way to compute the time complexity of an algorithm programatically? For example, how could I calculate the complexity of a fibonacci(n) function?
The undecidability of the halting problem says that you can't even tell if an algorithm terminates. I'm pretty sure from this it follows that you can't generally solve the complexity of an algorithm.
While it's impossible to do for all cases (unless you run your own code parser and just look at loops and what impacts on their values and such), it is still possible to do as a black box test with an upper bound time set. That is to say, have some variable that is set to determine that once a program's execution passes this time it's considered to be running forever.
From this your code would look similar to this (quick and dirty code sorry it's a little verbose and the math might be off for larger powers I haven't checked).
It can be improved upon by using a set array of input values rather than randomly generating some, and also by checking a wider range of values, you should be able to check any input versus any other two inputs and determine all the patterns of method duration.
I'm sure there are much better (namely more accurate) ways to calculate the O between a set of given numbers than shown here (which neglects to relate the run time between elements too much).
static void Main(string[] args)
{
var sw = new Stopwatch();
var inputTimes = new Dictionary<int, double>();
List<int> inputValues = new List<int>();
for (int i = 0; i < 25; i++)
{
inputValues.Add(i);
}
var ThreadTimeout = 10000;
for (int i = 0; i < inputValues.Count; i++)
{
int input = inputValues[i];
var WorkerThread = new Thread(t => CallMagicMethod(input)) { Name = "WorkerThread" };
sw.Reset();
Console.WriteLine("Input value '{0}' running...", input);
sw.Start();
WorkerThread.Start();
WorkerThread.Join(ThreadTimeout);
sw.Stop();
if (WorkerThread.IsAlive)
{
Console.WriteLine("Input value '{0}' exceeds timeout", input);
WorkerThread.Abort();
//break;
inputTimes.Add(input, double.MaxValue);
continue;
}
inputTimes.Add(input, sw.Elapsed.TotalMilliseconds);
Console.WriteLine("Input value '{0}' took {1}ms", input, sw.Elapsed.TotalMilliseconds);
}
List<int> indexes = inputTimes.Keys.OrderBy(k => k).ToList();
// calculate the difference between the values:
for (int i = 0; i < indexes.Count - 2; i++)
{
int index0 = indexes[i];
int index1 = indexes[i + 1];
if (!inputTimes.ContainsKey(index1))
{
continue;
}
int index2 = indexes[i + 2];
if (!inputTimes.ContainsKey(index2))
{
continue;
}
double[] runTimes = new double[] { inputTimes[index0], inputTimes[index1], inputTimes[index2] };
if (IsRoughlyEqual(runTimes[2], runTimes[1], runTimes[0]))
{
Console.WriteLine("Execution time for input = {0} to {1} is roughly O(1)", index0, index2);
}
else if (IsRoughlyEqual(runTimes[2] / Math.Log(index2, 2), runTimes[1] / Math.Log(index1, 2), runTimes[0] / Math.Log(index0, 2)))
{
Console.WriteLine("Execution time for input = {0} to {1} is roughly O(log N)", index0, index2);
}
else if (IsRoughlyEqual(runTimes[2] / index2, runTimes[1] / index1, runTimes[0] / index0))
{
Console.WriteLine("Execution time for input = {0} to {1} is roughly O(N)", index0, index2);
}
else if (IsRoughlyEqual(runTimes[2] / (Math.Log(index2, 2) * index2), runTimes[1] / (Math.Log(index1, 2) * index1), runTimes[0] / (Math.Log(index0, 2) * index0)))
{
Console.WriteLine("Execution time for input = {0} to {1} is roughly O(N log N)", index0, index2);
}
else
{
for (int pow = 2; pow <= 10; pow++)
{
if (IsRoughlyEqual(runTimes[2] / Math.Pow(index2, pow), runTimes[1] / Math.Pow(index1, pow), runTimes[0] / Math.Pow(index0, pow)))
{
Console.WriteLine("Execution time for input = {0} to {1} is roughly O(N^{2})", index0, index2, pow);
break;
}
else if (pow == 10)
{
Console.WriteLine("Execution time for input = {0} to {1} is greater than O(N^10)", index0, index2);
}
}
}
}
Console.WriteLine("Fin.");
}
private static double variance = 0.02;
public static bool IsRoughlyEqual(double value, double lower, double upper)
{
//returns if the lower, value and upper are within a variance of the next value;
return IsBetween(lower, value * (1 - variance), value * (1 + variance)) &&
IsBetween(value, upper * (1 - variance), upper * (1 + variance));
}
public static bool IsBetween(double value, double lower, double upper)
{
//returns if the value is between the other 2 values +/- variance
lower = lower * (1 - variance);
upper = upper * (1 + variance);
return value > lower && value < upper;
}
public static void CallMagicMethod(int input)
{
try
{
MagicBox.MagicMethod(input);
}
catch (ThreadAbortException tae)
{
}
catch (Exception ex)
{
Console.WriteLine("Unexpected Exception Occured: {0}", ex.Message);
}
}
And an example output:
Input value '59' running...
Input value '59' took 1711.8416ms
Input value '14' running...
Input value '14' took 90.9222ms
Input value '43' running...
Input value '43' took 902.7444ms
Input value '22' running...
Input value '22' took 231.5498ms
Input value '50' running...
Input value '50' took 1224.761ms
Input value '27' running...
Input value '27' took 351.3938ms
Input value '5' running...
Input value '5' took 9.8048ms
Input value '28' running...
Input value '28' took 377.8156ms
Input value '26' running...
Input value '26' took 325.4898ms
Input value '46' running...
Input value '46' took 1035.6526ms
Execution time for input = 5 to 22 is greater than O(N^10)
Execution time for input = 14 to 26 is roughly O(N^2)
Execution time for input = 22 to 27 is roughly O(N^2)
Execution time for input = 26 to 28 is roughly O(N^2)
Execution time for input = 27 to 43 is roughly O(N^2)
Execution time for input = 28 to 46 is roughly O(N^2)
Execution time for input = 43 to 50 is roughly O(N^2)
Execution time for input = 46 to 59 is roughly O(N^2)
Fin.
Which shows the magic method is likely O(N^2) for the given inputs +/- 2% variance
and another result here:
Input value '0' took 0.7498ms
Input value '1' took 0.3062ms
Input value '2' took 0.5038ms
Input value '3' took 4.9239ms
Input value '4' took 14.2928ms
Input value '5' took 29.9069ms
Input value '6' took 55.4424ms
Input value '7' took 91.6886ms
Input value '8' took 140.5015ms
Input value '9' took 204.5546ms
Input value '10' took 285.4843ms
Input value '11' took 385.7506ms
Input value '12' took 506.8602ms
Input value '13' took 650.7438ms
Input value '14' took 819.8519ms
Input value '15' took 1015.8124ms
Execution time for input = 0 to 2 is greater than O(N^10)
Execution time for input = 1 to 3 is greater than O(N^10)
Execution time for input = 2 to 4 is greater than O(N^10)
Execution time for input = 3 to 5 is greater than O(N^10)
Execution time for input = 4 to 6 is greater than O(N^10)
Execution time for input = 5 to 7 is greater than O(N^10)
Execution time for input = 6 to 8 is greater than O(N^10)
Execution time for input = 7 to 9 is greater than O(N^10)
Execution time for input = 8 to 10 is roughly O(N^3)
Execution time for input = 9 to 11 is roughly O(N^3)
Execution time for input = 10 to 12 is roughly O(N^3)
Execution time for input = 11 to 13 is roughly O(N^3)
Execution time for input = 12 to 14 is roughly O(N^3)
Execution time for input = 13 to 15 is roughly O(N^3)
Which shows the magic method is likely O(N^3) for the given inputs +/- 2% variance
So It is possible to programatically determine the complexity of an algorithm, you need to make sure that you do not introduce some additional work which causes it to be longer than you think (such as building all the input for the function before you start timing it).
Further to this you also need to remember that this is going to take a significant time to try a large series of possible values and return how long it took, a more realistic test is to just call your function at a large realistic upper bound value and determine if it's response time is sufficient for your usage.
You likely would only need to do this if you are performing black box testing without source code (and can't use something like Reflector to view the source), or if you have to prove to a PHB that the coded algorithms are as fast as it can be (ignoring improvements to constants), as you claim it is.
Not in general. If the algorithm consists of nested simple for loops, e.g.
for (int i=a; i<b; ++i)
then you know this will contribute (b-a) steps. Now, if either b or a or both depends on n, then you can get a complexity from that. However, if you have something more exotic, like
for (int i=a; i<b; i=whackyFunction(i))
then you really need to understand what whackyFunction(i) does.
Similarly, break statements may screw this up, and while statements may be a lost cause since it's possible you wouldn't even be able to tell if the loop terminated.
Count arithmetic operations, memory accesses and memory space used inside fibbonacci() or whatever it is, measure its execution time. Do this with different inputs, see the emerging trends, the asymptotic behavior.
General measures like cyclomatic complexity are useful in giving you an idea of the more complex portions of your code, but it is a relatively simple mechanism.
What's a good algorithm for calculating frames per second in a game? I want to show it as a number in the corner of the screen. If I just look at how long it took to render the last frame the number changes too fast.
Bonus points if your answer updates each frame and doesn't converge differently when the frame rate is increasing vs decreasing.
You need a smoothed average, the easiest way is to take the current answer (the time to draw the last frame) and combine it with the previous answer.
// eg.
float smoothing = 0.9; // larger=more smoothing
measurement = (measurement * smoothing) + (current * (1.0-smoothing))
By adjusting the 0.9 / 0.1 ratio you can change the 'time constant' - that is how quickly the number responds to changes. A larger fraction in favour of the old answer gives a slower smoother change, a large fraction in favour of the new answer gives a quicker changing value. Obviously the two factors must add to one!
This is what I have used in many games.
#define MAXSAMPLES 100
int tickindex=0;
int ticksum=0;
int ticklist[MAXSAMPLES];
/* need to zero out the ticklist array before starting */
/* average will ramp up until the buffer is full */
/* returns average ticks per frame over the MAXSAMPLES last frames */
double CalcAverageTick(int newtick)
{
ticksum-=ticklist[tickindex]; /* subtract value falling off */
ticksum+=newtick; /* add new value */
ticklist[tickindex]=newtick; /* save new value so it can be subtracted later */
if(++tickindex==MAXSAMPLES) /* inc buffer index */
tickindex=0;
/* return average */
return((double)ticksum/MAXSAMPLES);
}
Well, certainly
frames / sec = 1 / (sec / frame)
But, as you point out, there's a lot of variation in the time it takes to render a single frame, and from a UI perspective updating the fps value at the frame rate is not usable at all (unless the number is very stable).
What you want is probably a moving average or some sort of binning / resetting counter.
For example, you could maintain a queue data structure which held the rendering times for each of the last 30, 60, 100, or what-have-you frames (you could even design it so the limit was adjustable at run-time). To determine a decent fps approximation you can determine the average fps from all the rendering times in the queue:
fps = # of rendering times in queue / total rendering time
When you finish rendering a new frame you enqueue a new rendering time and dequeue an old rendering time. Alternately, you could dequeue only when the total of the rendering times exceeded some preset value (e.g. 1 sec). You can maintain the "last fps value" and a last updated timestamp so you can trigger when to update the fps figure, if you so desire. Though with a moving average if you have consistent formatting, printing the "instantaneous average" fps on each frame would probably be ok.
Another method would be to have a resetting counter. Maintain a precise (millisecond) timestamp, a frame counter, and an fps value. When you finish rendering a frame, increment the counter. When the counter hits a pre-set limit (e.g. 100 frames) or when the time since the timestamp has passed some pre-set value (e.g. 1 sec), calculate the fps:
fps = # frames / (current time - start time)
Then reset the counter to 0 and set the timestamp to the current time.
Increment a counter every time you render a screen and clear that counter for some time interval over which you want to measure the frame-rate.
Ie. Every 3 seconds, get counter/3 and then clear the counter.
There are at least two ways to do it:
The first is the one others have mentioned here before me.
I think it's the simplest and preferred way. You just to keep track of
cn: counter of how many frames you've rendered
time_start: the time since you've started counting
time_now: the current time
Calculating the fps in this case is as simple as evaluating this formula:
FPS = cn / (time_now - time_start).
Then there is the uber cool way you might like to use some day:
Let's say you have 'i' frames to consider. I'll use this notation: f[0], f[1],..., f[i-1] to describe how long it took to render frame 0, frame 1, ..., frame (i-1) respectively.
Example where i = 3
|f[0] |f[1] |f[2] |
+----------+-------------+-------+------> time
Then, mathematical definition of fps after i frames would be
(1) fps[i] = i / (f[0] + ... + f[i-1])
And the same formula but only considering i-1 frames.
(2) fps[i-1] = (i-1) / (f[0] + ... + f[i-2])
Now the trick here is to modify the right side of formula (1) in such a way that it will contain the right side of formula (2) and substitute it for it's left side.
Like so (you should see it more clearly if you write it on a paper):
fps[i] = i / (f[0] + ... + f[i-1])
= i / ((f[0] + ... + f[i-2]) + f[i-1])
= (i/(i-1)) / ((f[0] + ... + f[i-2])/(i-1) + f[i-1]/(i-1))
= (i/(i-1)) / (1/fps[i-1] + f[i-1]/(i-1))
= ...
= (i*fps[i-1]) / (f[i-1] * fps[i-1] + i - 1)
So according to this formula (my math deriving skill are a bit rusty though), to calculate the new fps you need to know the fps from the previous frame, the duration it took to render the last frame and the number of frames you've rendered.
This might be overkill for most people, that's why I hadn't posted it when I implemented it. But it's very robust and flexible.
It stores a Queue with the last frame times, so it can accurately calculate an average FPS value much better than just taking the last frame into consideration.
It also allows you to ignore one frame, if you are doing something that you know is going to artificially screw up that frame's time.
It also allows you to change the number of frames to store in the Queue as it runs, so you can test it out on the fly what is the best value for you.
// Number of past frames to use for FPS smooth calculation - because
// Unity's smoothedDeltaTime, well - it kinda sucks
private int frameTimesSize = 60;
// A Queue is the perfect data structure for the smoothed FPS task;
// new values in, old values out
private Queue<float> frameTimes;
// Not really needed, but used for faster updating then processing
// the entire queue every frame
private float __frameTimesSum = 0;
// Flag to ignore the next frame when performing a heavy one-time operation
// (like changing resolution)
private bool _fpsIgnoreNextFrame = false;
//=============================================================================
// Call this after doing a heavy operation that will screw up with FPS calculation
void FPSIgnoreNextFrame() {
this._fpsIgnoreNextFrame = true;
}
//=============================================================================
// Smoothed FPS counter updating
void Update()
{
if (this._fpsIgnoreNextFrame) {
this._fpsIgnoreNextFrame = false;
return;
}
// While looping here allows the frameTimesSize member to be changed dinamically
while (this.frameTimes.Count >= this.frameTimesSize) {
this.__frameTimesSum -= this.frameTimes.Dequeue();
}
while (this.frameTimes.Count < this.frameTimesSize) {
this.__frameTimesSum += Time.deltaTime;
this.frameTimes.Enqueue(Time.deltaTime);
}
}
//=============================================================================
// Public function to get smoothed FPS values
public int GetSmoothedFPS() {
return (int)(this.frameTimesSize / this.__frameTimesSum * Time.timeScale);
}
Good answers here. Just how you implement it is dependent on what you need it for. I prefer the running average one myself "time = time * 0.9 + last_frame * 0.1" by the guy above.
however I personally like to weight my average more heavily towards newer data because in a game it is SPIKES that are the hardest to squash and thus of most interest to me. So I would use something more like a .7 \ .3 split will make a spike show up much faster (though it's effect will drop off-screen faster as well.. see below)
If your focus is on RENDERING time, then the .9.1 split works pretty nicely b/c it tend to be more smooth. THough for gameplay/AI/physics spikes are much more of a concern as THAT will usually what makes your game look choppy (which is often worse than a low frame rate assuming we're not dipping below 20 fps)
So, what I would do is also add something like this:
#define ONE_OVER_FPS (1.0f/60.0f)
static float g_SpikeGuardBreakpoint = 3.0f * ONE_OVER_FPS;
if(time > g_SpikeGuardBreakpoint)
DoInternalBreakpoint()
(fill in 3.0f with whatever magnitude you find to be an unacceptable spike)
This will let you find and thus solve FPS issues the end of the frame they happen.
A much better system than using a large array of old framerates is to just do something like this:
new_fps = old_fps * 0.99 + new_fps * 0.01
This method uses far less memory, requires far less code, and places more importance upon recent framerates than old framerates while still smoothing the effects of sudden framerate changes.
You could keep a counter, increment it after each frame is rendered, then reset the counter when you are on a new second (storing the previous value as the last second's # of frames rendered)
JavaScript:
// Set the end and start times
var start = (new Date).getTime(), end, FPS;
/* ...
* the loop/block your want to watch
* ...
*/
end = (new Date).getTime();
// since the times are by millisecond, use 1000 (1000ms = 1s)
// then multiply the result by (MaxFPS / 1000)
// FPS = (1000 - (end - start)) * (MaxFPS / 1000)
FPS = Math.round((1000 - (end - start)) * (60 / 1000));
Here's a complete example, using Python (but easily adapted to any language). It uses the smoothing equation in Martin's answer, so almost no memory overhead, and I chose values that worked for me (feel free to play around with the constants to adapt to your use case).
import time
SMOOTHING_FACTOR = 0.99
MAX_FPS = 10000
avg_fps = -1
last_tick = time.time()
while True:
# <Do your rendering work here...>
current_tick = time.time()
# Ensure we don't get crazy large frame rates, by capping to MAX_FPS
current_fps = 1.0 / max(current_tick - last_tick, 1.0/MAX_FPS)
last_tick = current_tick
if avg_fps < 0:
avg_fps = current_fps
else:
avg_fps = (avg_fps * SMOOTHING_FACTOR) + (current_fps * (1-SMOOTHING_FACTOR))
print(avg_fps)
Set counter to zero. Each time you draw a frame increment the counter. After each second print the counter. lather, rinse, repeat. If yo want extra credit, keep a running counter and divide by the total number of seconds for a running average.
In (c++ like) pseudocode these two are what I used in industrial image processing applications that had to process images from a set of externally triggered camera's. Variations in "frame rate" had a different source (slower or faster production on the belt) but the problem is the same. (I assume that you have a simple timer.peek() call that gives you something like the nr of msec (nsec?) since application start or the last call)
Solution 1: fast but not updated every frame
do while (1)
{
ProcessImage(frame)
if (frame.framenumber%poll_interval==0)
{
new_time=timer.peek()
framerate=poll_interval/(new_time - last_time)
last_time=new_time
}
}
Solution 2: updated every frame, requires more memory and CPU
do while (1)
{
ProcessImage(frame)
new_time=timer.peek()
delta=new_time - last_time
last_time = new_time
total_time += delta
delta_history.push(delta)
framerate= delta_history.length() / total_time
while (delta_history.length() > avg_interval)
{
oldest_delta = delta_history.pop()
total_time -= oldest_delta
}
}
qx.Class.define('FpsCounter', {
extend: qx.core.Object
,properties: {
}
,events: {
}
,construct: function(){
this.base(arguments);
this.restart();
}
,statics: {
}
,members: {
restart: function(){
this.__frames = [];
}
,addFrame: function(){
this.__frames.push(new Date());
}
,getFps: function(averageFrames){
debugger;
if(!averageFrames){
averageFrames = 2;
}
var time = 0;
var l = this.__frames.length;
var i = averageFrames;
while(i > 0){
if(l - i - 1 >= 0){
time += this.__frames[l - i] - this.__frames[l - i - 1];
}
i--;
}
var fps = averageFrames / time * 1000;
return fps;
}
}
});
How i do it!
boolean run = false;
int ticks = 0;
long tickstart;
int fps;
public void loop()
{
if(this.ticks==0)
{
this.tickstart = System.currentTimeMillis();
}
this.ticks++;
this.fps = (int)this.ticks / (System.currentTimeMillis()-this.tickstart);
}
In words, a tick clock tracks ticks. If it is the first time, it takes the current time and puts it in 'tickstart'. After the first tick, it makes the variable 'fps' equal how many ticks of the tick clock divided by the time minus the time of the first tick.
Fps is an integer, hence "(int)".
Here's how I do it (in Java):
private static long ONE_SECOND = 1000000L * 1000L; //1 second is 1000ms which is 1000000ns
LinkedList<Long> frames = new LinkedList<>(); //List of frames within 1 second
public int calcFPS(){
long time = System.nanoTime(); //Current time in nano seconds
frames.add(time); //Add this frame to the list
while(true){
long f = frames.getFirst(); //Look at the first element in frames
if(time - f > ONE_SECOND){ //If it was more than 1 second ago
frames.remove(); //Remove it from the list of frames
} else break;
/*If it was within 1 second we know that all other frames in the list
* are also within 1 second
*/
}
return frames.size(); //Return the size of the list
}
In Typescript, I use this algorithm to calculate framerate and frametime averages:
let getTime = () => {
return new Date().getTime();
}
let frames: any[] = [];
let previousTime = getTime();
let framerate:number = 0;
let frametime:number = 0;
let updateStats = (samples:number=60) => {
samples = Math.max(samples, 1) >> 0;
if (frames.length === samples) {
let currentTime: number = getTime() - previousTime;
frametime = currentTime / samples;
framerate = 1000 * samples / currentTime;
previousTime = getTime();
frames = [];
}
frames.push(1);
}
usage:
statsUpdate();
// Print
stats.innerHTML = Math.round(framerate) + ' FPS ' + frametime.toFixed(2) + ' ms';
Tip: If samples is 1, the result is real-time framerate and frametime.
This is based on KPexEA's answer and gives the Simple Moving Average. Tidied and converted to TypeScript for easy copy and paste:
Variable declaration:
fpsObject = {
maxSamples: 100,
tickIndex: 0,
tickSum: 0,
tickList: []
}
Function:
calculateFps(currentFps: number): number {
this.fpsObject.tickSum -= this.fpsObject.tickList[this.fpsObject.tickIndex] || 0
this.fpsObject.tickSum += currentFps
this.fpsObject.tickList[this.fpsObject.tickIndex] = currentFps
if (++this.fpsObject.tickIndex === this.fpsObject.maxSamples) this.fpsObject.tickIndex = 0
const smoothedFps = this.fpsObject.tickSum / this.fpsObject.maxSamples
return Math.floor(smoothedFps)
}
Usage (may vary in your app):
this.fps = this.calculateFps(this.ticker.FPS)
I adapted #KPexEA's answer to Go, moved the globals into struct fields, allowed the number of samples to be configurable, and used time.Duration instead of plain integers and floats.
type FrameTimeTracker struct {
samples []time.Duration
sum time.Duration
index int
}
func NewFrameTimeTracker(n int) *FrameTimeTracker {
return &FrameTimeTracker{
samples: make([]time.Duration, n),
}
}
func (t *FrameTimeTracker) AddFrameTime(frameTime time.Duration) (average time.Duration) {
// algorithm adapted from https://stackoverflow.com/a/87732/814422
t.sum -= t.samples[t.index]
t.sum += frameTime
t.samples[t.index] = frameTime
t.index++
if t.index == len(t.samples) {
t.index = 0
}
return t.sum / time.Duration(len(t.samples))
}
The use of time.Duration, which has nanosecond precision, eliminates the need for floating-point arithmetic to compute the average frame time, but comes at the expense of needing twice as much memory for the same number of samples.
You'd use it like this:
// track the last 60 frame times
frameTimeTracker := NewFrameTimeTracker(60)
// main game loop
for frame := 0;; frame++ {
// ...
if frame > 0 {
// prevFrameTime is the duration of the last frame
avgFrameTime := frameTimeTracker.AddFrameTime(prevFrameTime)
fps := 1.0 / avgFrameTime.Seconds()
}
// ...
}
Since the context of this question is game programming, I'll add some more notes about performance and optimization. The above approach is idiomatic Go but always involves two heap allocations: one for the struct itself and one for the array backing the slice of samples. If used as indicated above, these are long-lived allocations so they won't really tax the garbage collector. Profile before optimizing, as always.
However, if performance is a major concern, some changes can be made to eliminate the allocations and indirections:
Change samples from a slice of []time.Duration to an array of [N]time.Duration where N is fixed at compile time. This removes the flexibility of changing the number of samples at runtime, but in most cases that flexibility is unnecessary.
Then, eliminate the NewFrameTimeTracker constructor function entirely and use a var frameTimeTracker FrameTimeTracker declaration (at the package level or local to main) instead. Unlike C, Go will pre-zero all relevant memory.
Unfortunately, most of the answers here don't provide either accurate enough or sufficiently "slow responsive" FPS measurements. Here's how I do it in Rust using a measurement queue:
use std::collections::VecDeque;
use std::time::{Duration, Instant};
pub struct FpsCounter {
sample_period: Duration,
max_samples: usize,
creation_time: Instant,
frame_count: usize,
measurements: VecDeque<FrameCountMeasurement>,
}
#[derive(Copy, Clone)]
struct FrameCountMeasurement {
time: Instant,
frame_count: usize,
}
impl FpsCounter {
pub fn new(sample_period: Duration, samples: usize) -> Self {
assert!(samples > 1);
Self {
sample_period,
max_samples: samples,
creation_time: Instant::now(),
frame_count: 0,
measurements: VecDeque::new(),
}
}
pub fn fps(&self) -> f32 {
match (self.measurements.front(), self.measurements.back()) {
(Some(start), Some(end)) => {
let period = (end.time - start.time).as_secs_f32();
if period > 0.0 {
(end.frame_count - start.frame_count) as f32 / period
} else {
0.0
}
}
_ => 0.0,
}
}
pub fn update(&mut self) {
self.frame_count += 1;
let current_measurement = self.measure();
let last_measurement = self
.measurements
.back()
.copied()
.unwrap_or(FrameCountMeasurement {
time: self.creation_time,
frame_count: 0,
});
if (current_measurement.time - last_measurement.time) >= self.sample_period {
self.measurements.push_back(current_measurement);
while self.measurements.len() > self.max_samples {
self.measurements.pop_front();
}
}
}
fn measure(&self) -> FrameCountMeasurement {
FrameCountMeasurement {
time: Instant::now(),
frame_count: self.frame_count,
}
}
}
How to use:
Create the counter:
let mut fps_counter = FpsCounter::new(Duration::from_millis(100), 5);
Call fps_counter.update() on every frame drawn.
Call fps_counter.fps() whenever you like to display current FPS.
Now, the key is in parameters to FpsCounter::new() method: sample_period is how responsive fps() is to changes in framerate, and samples controls how quickly fps() ramps up or down to the actual framerate. So if you choose 10 ms and 100 samples, fps() would react almost instantly to any change in framerate - basically, FPS value on the screen would jitter like crazy, but since it's 100 samples, it would take 1 second to match the actual framerate.
So my choice of 100 ms and 5 samples means that displayed FPS counter doesn't make your eyes bleed by changing crazy fast, and it would match your actual framerate half a second after it changes, which is sensible enough for a game.
Since sample_period * samples is averaging time span, you don't want it to be too short if you want a reasonably accurate FPS counter.
store a start time and increment your framecounter once per loop? every few seconds you could just print framecount/(Now - starttime) and then reinitialize them.
edit: oops. double-ninja'ed