Fastest way to instantiate a pair of integer values in Kotlin to use as a fraction - performance

Motivation
I need to instantiate fractions in my code without the rounding errors of floating-point values. Therefore, I decided to use a pair of integer values, one for the numerator and one for the denominator.
Question
I don't know which to use: Pair&lt;Int, Int&gt;, List&lt;Int&gt;, or IntArray (a list or array of size 2). Which instance would be the fastest to create and dispose of?
Measurements
I wrote this code:
import kotlin.system.measureNanoTime

fun main() {
    var b: Any
    val elapsedPair = measureNanoTime {
        for (i in 0..100000000) {
            b = Pair(-2, 1)
        }
    }
    println(elapsedPair)
    val elapsedList = measureNanoTime {
        for (i in 0..100000000) {
            b = listOf(-2, 1)
        }
    }
    println(elapsedList)
    val elapsedArray = measureNanoTime {
        for (i in 0..100000000) {
            b = intArrayOf(-2, 1)
        }
    }
    println(elapsedArray)
}
And I got results like these every time (not these exact numbers, but always this ordering):
> 16338200
> 1340355300
> 6129200
It seems clear that arrays are the fastest (an IntArray stores unboxed ints in a single flat allocation) and lists are the slowest (listOf boxes every element). But the compiler could have applied optimizations to the array loop, so these results may not be representative. Maybe there are underlying optimizations for pair instantiation that would make pair creation faster than array creation in most cases.

Use a data class:
data class Fraction(val numerator: Int, val denominator: Int)
It's really convenient to use:
val fraction = Fraction(2, 7)
val (numerator, denominator) = fraction
And you can even add your own operators:
data class Fraction(val numerator: Int, val denominator: Int) {
    operator fun div(divisor: Int) = Fraction(numerator, denominator * divisor)
}

val fraction = Fraction(2, 7)
val divided = fraction / 3
As for performance, it's not a problem until you've proven it's a problem. As always with performance problems, you need to measure and be sure what the real underlying issue is, before sacrificing code readability.
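The same value type can be sketched as a plain immutable class in Java (shown purely for illustration; the class and method names below are mine, not from the post):

```java
// Minimal immutable fraction, mirroring the Kotlin data class above.
final class Fraction {
    final int numerator;
    final int denominator;

    Fraction(int numerator, int denominator) {
        this.numerator = numerator;
        this.denominator = denominator;
    }

    // Dividing a fraction by an integer multiplies the denominator:
    // (n/d) / k == n/(d*k).
    Fraction div(int divisor) {
        return new Fraction(numerator, denominator * divisor);
    }
}

public class FractionDemo {
    public static void main(String[] args) {
        Fraction divided = new Fraction(2, 7).div(3);
        System.out.println(divided.numerator + "/" + divided.denominator); // prints 2/21
    }
}
```

Either way, two plain int fields avoid both boxing and the generic Pair wrapper.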

Related

How to generate a random number within a range in substrate?

I want to generate a random number within a certain range. How do I do that in Substrate?
fn draw_juror_for_citizen_profile_function(
    citizen_id: u128,
    length: usize,
) -> DispatchResult {
    let nonce = Self::get_and_increment_nonce();
    let random_seed = T::RandomnessSource::random(&nonce).encode();
    let random_number = u64::decode(&mut random_seed.as_ref())
        .expect("secure hashes should always be bigger than u32; qed");
    Ok(())
}
I can't use the rand crate because it doesn't support no_std, so I can't simply call:
rng.gen_range(0..10);
I think you need to use the Randomness chain extension for this. See the Randomness docs.
This example shows how to call Randomness from a contract.
There is some discussion and another code example here.
EDIT: I'm not sure how random or appropriate this is but you could build on top of your random_seed snippet. In your example you say you need a random number between 0 and 10 so you could do:
fn max_index(array: &[u8]) -> usize {
    let mut i = 0;
    for (j, &value) in array.iter().enumerate() {
        if value > array[i] {
            i = j;
        }
    }
    i
}

// generate your random seed
let arr1 = [0; 2];
let seed = self.env().random(&arr1).0;
// find the maximum index for the slice [0..10]
let rand_index = max_index(&seed.as_ref()[0..10]);
The returned number would be in the range 0-9, since it is an index into a 10-element slice. However, this is obviously limited by the fact that you're starting with a [u8; 32]. For larger ranges, maybe you simply concatenate u8 arrays.
Also note that this code simply takes the first max index if there are duplicates.
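A more common way to reduce a decoded seed into a small range is a plain modulo; a sketch of the arithmetic in Java, purely for illustration (the seed constant stands in for the decoded on-chain u64, and with n = 10, which is tiny next to 2^64, modulo bias is negligible):

```java
// Sketch: map a 64-bit value decoded from a random seed into [0, n).
public class SeedToRange {
    public static void main(String[] args) {
        long seed = 0x9E3779B97F4A7C15L; // hypothetical decoded seed value
        long n = 10;
        // Treat the seed as unsigned so negative longs don't break the range.
        long r = Long.remainderUnsigned(seed, n); // result is in [0, n)
        System.out.println(r);
    }
}
```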

Hashmap slower than string.find?

I am doing exercises from leetcode as a way to learn Rust. One exercise involves finding the longest substring without any character repetition inside a string.
My first idea involved storing substrings in a string and searching the string to see if the character was already in it:
impl Solution {
    pub fn length_of_longest_substring(s: String) -> i32 {
        let mut unique_str = String::from("");
        let schars: Vec<char> = s.chars().collect();
        let mut longest = 0 as i32;
        for x in 0..schars.len() {
            unique_str = schars[x].to_string();
            for y in x + 1..schars.len() {
                if is_new_char(&unique_str, schars[y]) {
                    unique_str.push(schars[y]);
                } else {
                    break;
                }
            }
            let cur_len = unique_str.len() as i32;
            if cur_len > longest {
                longest = cur_len;
            }
        }
        longest
    }
}

fn is_new_char(unique_str: &str, c: char) -> bool {
    unique_str.find(c).is_none()
}
It works fine but the performance was on the low side. Hoping to shave a few ms on the "find" operation, I replaced unique_str with a HashMap:
use std::collections::HashMap;

impl Solution {
    pub fn length_of_longest_substring(s: String) -> i32 {
        let mut hash_str = HashMap::new();
        let schars: Vec<char> = s.chars().collect();
        let mut longest = 0 as i32;
        for x in 0..schars.len() {
            hash_str.insert(schars[x], x);
            for y in x + 1..schars.len() {
                if hash_str.contains_key(&schars[y]) {
                    break;
                } else {
                    hash_str.insert(schars[y], y);
                }
            }
            let cur_len = hash_str.len() as i32;
            if cur_len > longest {
                longest = cur_len;
            }
            hash_str.clear();
        }
        longest
    }
}
Surprisingly, the String.find() version is 3 times faster than the HashMap in the benchmarks, in spite of the fact that I am using the same algorithm (or at least I think so). Intuitively, I would have assumed that doing the lookups in a hashmap should be considerably faster than searching the string's characters, but it turned out to be the opposite.
Can someone explain why the HashMap is so much slower? (or point out what I am doing wrong).
When it comes to performance, one test is always better than ten reasons.
use std::hash::{Hash, Hasher};

fn main() {
    let start = std::time::SystemTime::now();
    let mut hasher = std::collections::hash_map::DefaultHasher::new();
    let s = "a";
    for _ in 0..100000000 {
        s.hash(&mut hasher);
        let _hash = hasher.finish();
    }
    eprintln!("{}", start.elapsed().unwrap().as_millis());
}
I used a debug build so that the compiler would not optimize out most of the code.
On my machine taking 100M hashes above takes 14s. If I replace DefaultHasher with SipHasher as some comments suggested, it takes 17s.
Now, variant with string:
fn main() {
    let start = std::time::SystemTime::now();
    let string = "abcde";
    for _ in 0..100000000 {
        for _c in string.chars() {
            // do nothing
        }
    }
    eprintln!("{}", start.elapsed().unwrap().as_millis());
}
Executing this code with 5 chars in the string takes 24s. If there are 2 chars, it takes 12s.
Now, how does this answer your question?
To insert a value into a HashSet, a hash must be calculated. Then, every time you want to check whether a character is in the HashSet, you need to calculate a hash again. There is also some small overhead for checking membership on top of just calculating the hash.
As the tests show, calculating one hash of a single-character string takes about the same time as iterating over a 3-symbol string. So say your unique_str holds abcde and you check whether the character x is in it. The check alone would be faster with a HashSet, but you also need to add x to the set, which means computing two hashes versus iterating over a 5-symbol string.
So as long as your unique_str is, on average, shorter than about 5 symbols, the string implementation is all but guaranteed to be faster. And for an input string like aaaaaaaaa...., it will be roughly 6 times faster than the HashSet option.
Of course, this analysis is very simplistic and many other factors can come into play (such as compiler optimizations and the specific implementations of hashing and find for strings), but it gives an idea of why a HashSet can sometimes be slower than string.find().
Side note: your code uses a HashMap instead of a HashSet, which adds even more overhead and is not needed in your case.
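The two costs being compared can be seen in miniature with a sketch in Java, purely for illustration (String.indexOf and HashSet play the same roles here as Rust's str::find and HashSet): both checks agree on membership, but the scan touches at most a handful of characters, while every set operation pays for a hash first.

```java
import java.util.HashSet;

public class ScanVsSet {
    public static void main(String[] args) {
        String uniq = "abcde"; // the short "seen so far" window

        // Linear scan: walks at most uniq.length() characters.
        boolean inStringScan = uniq.indexOf('c') >= 0;

        // HashSet: every insert and lookup hashes the character first.
        HashSet<Character> seen = new HashSet<>();
        for (char ch : uniq.toCharArray()) {
            seen.add(ch); // one hash per insert
        }
        boolean inSet = seen.contains('c'); // one more hash

        System.out.println(inStringScan + " " + inSet); // prints true true
    }
}
```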

Fewer lines to count the max int in an array in Kotlin, faster than O(n log n)?

I wonder if there is a better or more idiomatic way to count occurrences of the max int in an array in Kotlin, faster than O(n log n)?
This code is O(n), but I feel like it's too long:
import java.util.Scanner

fun countMax(n: Int, ar: Array<Int>): Int {
    val max = ar.max()
    var countMax = 0
    for (i in ar)
        if (i == max)
            countMax++
    return countMax
}

fun main(args: Array<String>) {
    val scan = Scanner(System.`in`)
    val n = scan.nextLine().trim().toInt()
    val ar = scan.nextLine().split(" ").map { it.trim().toInt() }.toTypedArray()
    val result = countMax(n, ar)
    println(result)
}
Sorting and then counting is O(n log n):
val input: Scanner = if (inputFile.exists()) Scanner(inputFile) else Scanner(System.`in`)

fun main(args: Array<String>) {
    input.nextLine()
    val nums = input.nextLine().split(' ').map { it.toLong() }.sorted()
    val s = nums.takeLastWhile { it == nums.last() }.size
    print(s)
}
Is there shorter code that performs faster than O(n log n)?
You could do it like this:
fun countMax(ar: Array<Int>) =
    ar.max().let { max -> ar.count { it == max } }
Calculate the maximum with max, then use count to get the number of occurrences of that maximum in the array.
Alternatively, group the values, extract the group with max as its key, and map to the size:
fun countMax(ar: Array<Int>) =
    ar.groupBy { it }.maxBy { it.key }?.value?.size
Alternatively, fold the array. Start with a pair holding Int.MIN_VALUE and a count of 0. For each element, compare it to the first component of the pair (the biggest number seen so far): if they are equal, increment the count; if the element is greater, return a pair of that element and a count of 1; otherwise, return the pair unchanged.
This approach only traverses the array once, minimizing the number of comparisons performed.
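The single pass described above can be sketched as follows (written in plain Java for illustration; the Kotlin fold is the same shape, with a Pair as the accumulator):

```java
// One pass over the array: track the running max and how often it was seen.
public class CountMax {
    public static void main(String[] args) {
        int[] ar = {3, 1, 3, 2, 3};
        int max = Integer.MIN_VALUE;
        int count = 0;
        for (int x : ar) {
            if (x > max) {         // new maximum seen: reset the count
                max = x;
                count = 1;
            } else if (x == max) { // another occurrence of the current max
                count++;
            }
        }
        System.out.println(count); // prints 3
    }
}
```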

Fastest weighted random algorithm in Scala?

I'm writing a server-side module for a Scala-based project, and I need the fastest way to perform weighted random number generation over some Int weights. The method should be as fast as possible, since it will be called very often.
Now, this is what I came up with:
import scala.util.Random

trait CumulativeDensity {
  /** Returns the index result of a binary search to find #n in the discrete
    * #cdf array.
    */
  def search(n: Int, cdf: Array[Int]): Int = {
    val i: Int = cdf.indexWhere(_ != 0)
    if (i < 0 | n <= cdf(i))
      i
    else
      search(n - cdf(i), { cdf.update(i, 0); cdf })
  }

  /** Returns the cumulative density function (CDF) of #list (in simple terms,
    * the cumulative sums of the weights).
    */
  def cdf(list: Array[Int]) = list.map {
    var s = 0
    d => { s += d; s }
  }
}
And I define the main method with this piece of code:
def rndWeighted(list: Array[Int]): Int =
  search(Random.nextInt(list.sum + 1), cdf(list))
However, it still isn't fast enough. Is there any kind of black magic that makes it unnecessary to iterate over the list from its start (libraries, built-ins, heuristics)?
EDIT: this is the final version of the code (much faster now):
def search(n: Int, cdf: Array[Int]): Int = {
  if (n > cdf.head)
    1 + search(n - cdf.head, cdf.tail)
  else
    0
}
Instead of cdf.update(i, 0) and passing the entire cdf back to cdf.indexWhere(_ != 0) in the next recursive call, consider cdf.splitAt(i) and pass only the elements to the right of i, so that in the following recursion, indexWhere scans a smaller array. Note that the array size decreasing monotonically at each recursive call ensures termination.
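If the weights are fixed across many draws, a genuinely O(log n)-per-draw approach is to precompute the cumulative sums once and binary-search them. A sketch in Java for illustration (Scala on the JVM can call java.util.Arrays.binarySearch the same way; the weights and the fixed draw value are examples):

```java
import java.util.Arrays;

public class WeightedPick {
    public static void main(String[] args) {
        int[] weights = {1, 3, 6};

        // Cumulative sums: {1, 4, 10}.
        int[] cdf = new int[weights.length];
        int total = 0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i];
            cdf[i] = total;
        }

        // In practice n would be Random.nextInt(total) + 1; fixed here
        // so the example is deterministic. Bucket i covers (cdf[i-1], cdf[i]].
        int n = 5;
        int idx = Arrays.binarySearch(cdf, n);
        if (idx < 0) idx = -idx - 1; // not found: insertion point is the bucket
        System.out.println(idx); // prints 2: n=5 lands in the weight-6 bucket
    }
}
```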

How to get the GCD (Greatest Common Divisor) of doubles

This seems like a simple task, but I can't figure out how to do it.
Here is a sample function structure:
private double GetGCD(double num1, double num2)
{
    // should return the GCD of the two doubles
}
test data
num1 = 6;
num2 = 3;
*return value must be 3*
num1 = 8.8;
num2 = 6.6;
*return value must be 2.2*
num1 = 5.1;
num2 = 8.5;
*return value must be 1.7*
Note: the maximum number of decimal places is 1.
The programming language is not important; I just need the algorithm.
Please help. Thank you!
If you have only one decimal place, multiply the numbers by 10, convert them to integers and run an integer GCD function.
This will also save you floating point precision errors.
Quoting this answer, the base Euclidean algorithm in Python (for integers!) is:
def gcd(a, b):
    """Calculate the Greatest Common Divisor of a and b.

    Unless b==0, the result will have the same sign as b (so that when
    b is divided by it, the result comes out positive).
    """
    while b:
        a, b = b, a % b
    return a
So, your code should be something like:
def gcd_floats(x, y):
    # round, not int(): int(5.1 * 10) truncates to 50 due to float error
    return gcd(round(x * 10), round(y * 10)) / 10
When it's 8.8 and 6.6 then you can find the GCD of 88 and 66 and then divide it by 10.
There are zillions of places on the web to find code for the GCD function. Since, strictly speaking, it is only defined on integers, I suggest you multiply your doubles by 10, work out the GCD and divide the result by 10. This will save you a world of pain arising from using the wrong datatype.
Here is a source with some Java code: http://www.merriampark.com/gcd.htm. It is pretty comprehensive.
There is no such thing as the GCD of a number which is not discrete. However, your case is more specific. If your input is not a Double, but a Decimal, then you can convert it to a Fraction, multiply the denominators, find the GCD of the numerators and divide back down. That is:
8.800 = 8800/1000 = 44/5 (by GCD)
6.600 = 6600/1000 = 33/5 (by GCD)
5.100 = 5100/1000 = 51/10
8.500 = 8500/1000 = 17/2
It's useful to simplify the fractions in this step in order to avoid our numbers getting too large.
Move to a common denominator:
44*5 / (5*5) = 220/25
33*5 / (5*5) = 165/25
51*2 / (10*2) = 102/20
17*10 / (2*10) = 170/20
GCD of numerator:
gcd(165,220) = 55
gcd(102,170) = 34
So the answers are 55/25 = 2.2 and 34/20 = 1.7.
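With one decimal place, the fraction method collapses to "work in tenths"; a small Java sketch checking the worked examples above (the class name and the fixed denominator of 10 are my choices for illustration):

```java
public class FractionGcd {
    // Standard recursive Euclidean algorithm on integers.
    static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    public static void main(String[] args) {
        // 8.8 and 6.6 as tenths: 88/10 and 66/10.
        long g1 = gcd(88, 66); // 22, so the answer is 22/10 = 2.2
        // 5.1 and 8.5 as tenths: 51/10 and 85/10.
        long g2 = gcd(51, 85); // 17, so the answer is 17/10 = 1.7
        System.out.println(g1 / 10.0 + " " + g2 / 10.0); // prints 2.2 1.7
    }
}
```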
Using two methods:
1. Euclid's method
2. The traditional trial-division method
class GCD
{
    public static void main(String[] args)
    {
        int a = (int) Math.round(1.2 * 10);
        int b = (int) Math.round(3.4 * 10);
        System.out.println((float) gcdEuclid(a, b) / 10);
    }

    // 1: Euclid's method
    public static int gcdEuclid(int a, int b)
    {
        if (b == 0)
            return a;
        else
            return gcdEuclid(b, a % b);
    }

    // 2: trial division, counting down from min(a, b)
    public static int gcdDivision(int a, int b)
    {
        int k, i;
        if (a > b)
            k = b;
        else
            k = a;
        for (i = k; i >= 2; i--)
        {
            if (a % i == 0 && b % i == 0)
            {
                break;
            }
        }
        return i;
    }
}
