How to generate a random number within a range in Substrate?

I want to generate a random number within a certain range. How can I do that in Substrate?
fn draw_juror_for_citizen_profile_function(
citizen_id: u128,
length: usize,
) -> DispatchResult {
let nonce = Self::get_and_increment_nonce();
let random_seed = T::RandomnessSource::random(&nonce).encode();
let random_number = u64::decode(&mut random_seed.as_ref())
.expect("secure hashes should always be bigger than u64; qed");
Ok(())
}
I can't use the rand crate because it doesn't support no_std:
rng.gen_range(0..10);

I think you need to use the Randomness chain extension for this. See the Randomness docs.
This example shows how to call Randomness from a contract.
There is some discussion and another code example here.
EDIT: I'm not sure how random or appropriate this is but you could build on top of your random_seed snippet. In your example you say you need a random number between 0 and 10 so you could do:
fn max_index(array: &[u8]) -> usize {
let mut i = 0;
for (j, &value) in array.iter().enumerate() {
if value > array[i] {
i = j;
}
}
i
}
// generate your random seed
let arr1 = [0; 2];
let seed = self.env().random(&arr1).0;
// find the maximum index for the slice [0..10]
let rand_index = max_index(&seed.as_ref()[0..10]);
The returned number is an index into a 10-byte slice, so it lies in the range 0 to 9. However, this is limited by the fact that you start with a [u8; 32], so the range cannot exceed 32 values. For larger ranges you could perhaps concatenate several u8 arrays.
Also note that this code takes the first maximum index if there are duplicates, so lower indices are slightly favored.
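Another way to map seed bytes onto a range is plain modulo reduction. The sketch below is only an illustration: `bounded_from_seed` is a hypothetical helper name, it uses nothing outside `core` (so it should also be usable in a no_std context), and it assumes the seed bytes come from something like the encoded `random_seed` in the question. It says nothing about how unpredictable the on-chain randomness source itself is.

```rust
/// Maps the first 8 bytes of `seed` to a value in `0..upper` (upper exclusive).
/// Returns `None` if `upper` is zero or the seed is shorter than 8 bytes.
/// `bounded_from_seed` is a hypothetical helper, not part of any Substrate API.
fn bounded_from_seed(seed: &[u8], upper: u64) -> Option<u64> {
    if upper == 0 {
        return None;
    }
    // Copy the first 8 bytes and interpret them as a little-endian u64.
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(seed.get(..8)?);
    let raw = u64::from_le_bytes(bytes);
    // Modulo reduction: simple, but slightly biased unless `upper` divides 2^64.
    Some(raw % upper)
}
```

For small ranges such as 0..10 the modulo bias is tiny compared to 2^64, but it is still there; rejection sampling would remove it at the cost of needing fresh seed bytes on rejection.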

Related

Rust - why is my program performing very slowly - over 5 times slower than the same program written in JavaScript using Node

I have finished converting an application that I made in JavaScript to Rust for increased performance. I am learning to program, and all the application does is work out the multiplicative persistence of any number in a range. It multiplies all digits together to form a new number, then repeats until the number becomes less than 10.
My issue is, my program written in JavaScript is over 5 times faster than the same in Rust. I must be doing something wrong with converting Strings to ints somewhere, I even tried swapping i128 to i64 and it made little difference.
If I run "cargo run --release" it is still slower!
Please can somebody look through my code to work out if there is any part of it that is causing the issues? Thank you in advance :)
fn multiplicative_persistence(mut user_input: i128) -> i128 {
let mut steps: i128 = 0;
let mut numbers: Vec<i128> = Vec::new();
while user_input > 10 {
let string_number: String = user_input.to_string();
let digits: Vec<&str> = string_number.split("").collect();
let mut sum: i128 = 1;
let digits_count = digits.len();
for number in 1..digits_count - 1 {
sum *= digits[number].parse::<i128>().unwrap();
}
numbers.push(sum);
steps += 1;
user_input = sum;
}
return steps;
}
fn main() {
// let _user_input: i128 = 277777788888899;
let mut highest_steps_count: i128 = 0;
let mut highest_steps_number: i128 = 0;
let start: i128 = 77551000000;
let finish: i128 = 1000000000000000;
for number in start..=finish {
// println!("{}: {}", number, multiplicative_persistence(number));
if multiplicative_persistence(number) > highest_steps_count {
highest_steps_count = multiplicative_persistence(number);
highest_steps_number = number;
}
if number % 1000000 == 0 {
println!("Upto {} so far: {}", number, highest_steps_number);
}
}
println!("Highest step count: {} at {}", highest_steps_count, highest_steps_number);
}
I do plan to use the numbers variable in the function but I have not learnt enough to know how to properly return it as an associative array.
Maybe the issue is that converting a number to a string, and then re-converting it again into a number is not that fast, and avoidable. You don't need this intermediate step:
fn step(mut x: i128) -> i128 {
let mut result = 1;
while x > 0 {
result *= x % 10;
x /= 10;
}
result
}
fn multiplicative_persistence(mut user_input: i128) -> i128 {
let mut steps = 0;
while user_input >= 10 { // 10 still has two digits, so it needs one more step
user_input = step(user_input);
steps += 1;
}
steps
}
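Since the question mentions wanting to keep the intermediate numbers, here is a small sketch of the same digit arithmetic that returns them as a `Vec`. The names `digit_product` and `persistence_chain` are made up for this example, and it uses `u64` on the assumption that the inputs fit (the range in the question does).

```rust
/// Product of the decimal digits of `x`.
fn digit_product(mut x: u64) -> u64 {
    let mut product = 1;
    while x > 0 {
        product *= x % 10;
        x /= 10;
    }
    product
}

/// Returns every intermediate product.
/// The length of the returned vector is the multiplicative persistence.
fn persistence_chain(mut n: u64) -> Vec<u64> {
    let mut chain = Vec::new();
    // `>= 10` so that 10 itself is still reduced once (1 * 0 = 0).
    while n >= 10 {
        n = digit_product(n);
        chain.push(n);
    }
    chain
}
```

For example, `persistence_chain(39)` yields 27, 14, 4, so the persistence of 39 is 3.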
EDIT: Just out of curiosity, I'd like to know whether the bottleneck is really the string conversion or some other wasteful part of the code. Here is an example that does not call .split(""), does not allocate the intermediate vector, and allocates the string only once rather than at each step.
#![feature(fmt_internals)]
use std::fmt::{Formatter, Display};
fn multiplicative_persistence(user_input: i128) -> i128 {
let mut steps = 0;
let mut digits = user_input.to_string();
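// user_input never changes, so loop on the digit buffer instead
while digits.len() > 1 {
let product = digits
.chars()
// widen to i128: the digit product can overflow u32
.map(|x| x.to_digit(10).unwrap() as i128)
.fold(1, |acc, i| acc * i);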
digits.clear();
let mut formatter = Formatter::new(&mut digits);
Display::fmt(&product, &mut formatter).unwrap();
steps += 1;
}
steps
}
I have basically inlined the string conversion that would be performed by .to_string() in order to re-use the already-allocated buffer, instead of re-allocating one each iteration. You can try it out on the playground. Note that you need a nightly compiler because it makes use of an unstable feature.
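The same buffer reuse also seems achievable on a stable compiler by writing into the existing `String` through `std::fmt::Write`. This is only a sketch of that idea, not a benchmarked replacement; it assumes `u64` inputs, and the function name simply mirrors the one in the question.

```rust
use std::fmt::Write;

fn multiplicative_persistence(n: u64) -> u32 {
    let mut steps = 0;
    // Allocate the digit buffer once, up front.
    let mut digits = n.to_string();
    while digits.len() > 1 {
        // Multiply the ASCII digits together.
        let product: u64 = digits.bytes().map(|b| u64::from(b - b'0')).product();
        digits.clear();
        // write! appends the decimal form to the existing buffer; the product
        // never has more digits than its input, so the capacity should suffice.
        write!(digits, "{}", product).expect("writing to a String cannot fail");
        steps += 1;
    }
    steps
}
```

Whether this beats the pure arithmetic version is something to measure; the arithmetic version avoids formatting entirely.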

Hashmap slower than string.find?

I am doing exercises from leetcode as a way to learn Rust. One exercise involves finding the longest substring without any character repetition inside a string.
My first idea involved storing substrings in a string and searching the string to see if the character was already in it:
impl Solution {
pub fn length_of_longest_substring(s: String) -> i32 {
let mut unique_str = String::from("");
let mut schars: Vec<char> = s.chars().collect();
let mut longest = 0 as i32;
for x in 0..schars.len()
{
unique_str = schars[x].to_string();
for y in x+1..schars.len()
{
if is_new_char(&unique_str, schars[y])
{
unique_str.push(schars[y]);
} else {
break;
}
}
let cur_len = unique_str.len() as i32;
if cur_len > longest {
longest = cur_len;
}
}
longest
}
}
fn is_new_char ( unique_str: &str, c: char ) -> bool {
if unique_str.find(c) == None
{
true
} else {
false
}
}
It works fine but the performance was on the low side. Hoping to shave a few ms on the "find" operation, I replaced unique_str with a HashMap:
use std::collections::HashMap;
impl Solution {
pub fn length_of_longest_substring(s: String) -> i32 {
let mut hash_str = HashMap::new();
let mut schars: Vec<char> = s.chars().collect();
let mut longest = 0 as i32;
for x in 0..schars.len()
{
hash_str.insert(schars[x], x);
for y in x+1..schars.len()
{
if hash_str.contains_key(&schars[y]){
break;
} else {
hash_str.insert(schars[y], y);
}
}
let cur_len = hash_str.len() as i32;
if cur_len > longest {
longest = cur_len;
}
hash_str.clear();
}
longest
}
}
Surprisingly, the String.find() version is 3 times faster than the HashMap in the benchmarks, in spite of the fact that I am using the same algorithm (or at least I think so). Intuitively, I would have assumed that doing the lookups in a hashmap should be considerably faster than searching the string's characters, but it turned out to be the opposite.
Can someone explain why the HashMap is so much slower? (or point out what I am doing wrong).
When it comes to performance, one test is always better than 10 reasons.
use std::hash::{Hash, Hasher};
fn main() {
let start = std::time::SystemTime::now();
let mut hasher = std::collections::hash_map::DefaultHasher::new();
let s = "a";
let string = "ab";
for i in 0..100000000 {
s.hash(&mut hasher);
let hash = hasher.finish();
}
eprintln!("{}", start.elapsed().unwrap().as_millis());
}
I use a debug build so that the compiler does not optimize most of my code away.
On my machine, computing the 100M hashes above takes 14s. If I replace DefaultHasher with SipHasher, as some comments suggested, it takes 17s.
Now, variant with string:
use std::hash::{Hash, Hasher};
fn main() {
let start = std::time::SystemTime::now();
let string = "abcde";
for i in 0..100000000 {
for c in string.chars() {
// do nothing
}
}
eprintln!("{}", start.elapsed().unwrap().as_millis());
}
Executing this code with 5 chars in the string takes 24s. If there are 2 chars, it takes 12s.
Now, how does this answer your question?
To insert a value into a hash set, a hash must be calculated. Then, every time you want to check whether a character is in the hash set, a hash must be calculated again. There is also a small overhead for checking whether the value is in the set, on top of calculating the hash.
As the tests show, calculating one hash of a single-character string takes about as long as iterating over a 3-character string. So let's say unique_str has the value abcde and you check whether it contains the character x. The check alone would be faster with a HashSet, but you then also need to add x to the set, which means 2 hashes versus iterating over a 5-character string.
So as long as unique_str is on average shorter than 5 characters, the string implementation should be faster. For an input string like aaaaaaaaa...., it will be ~6 times faster than the HashSet option.
Of course, this analysis is very simplistic and many other factors can be in play (such as compiler optimizations and the specific implementations of Hash and find for strings), but it gives an idea of why HashSet can be slower than string.find() in some cases.
Side note: your code uses a HashMap instead of a HashSet, which adds even more overhead and is not needed in your case.
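If hashing cost per lookup is the concern, another option is to reduce the number of lookups. The sketch below is a different algorithm from the one in the question: a single-pass sliding window that remembers the last position of each character, so each character is hashed once. It takes `&str` and returns `usize` for simplicity, which is an assumption and not the signature LeetCode expects.

```rust
use std::collections::HashMap;

/// Length (in chars) of the longest substring without repeated characters.
fn length_of_longest_substring(s: &str) -> usize {
    // Last index at which each character was seen.
    let mut last_seen: HashMap<char, usize> = HashMap::new();
    let mut window_start = 0;
    let mut longest = 0;
    for (i, c) in s.chars().enumerate() {
        // If `c` already occurs inside the current window, move the window past it.
        if let Some(prev) = last_seen.insert(c, i) {
            if prev >= window_start {
                window_start = prev + 1;
            }
        }
        longest = longest.max(i + 1 - window_start);
    }
    longest
}
```

This does one map operation per character instead of restarting the scan from every position, so the total work is linear in the input length.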

Can I randomly sample from a HashSet efficiently?

I have a std::collections::HashSet, and I want to sample and remove a uniformly random element.
Currently, I randomly sample an index using rand's gen_range, then iterate over the HashSet to that index to get the element. Then I remove the selected element. This works, but it's not efficient. Is there an efficient way to randomly sample an element?
Here's a stripped down version of what my code looks like:
use std::collections::HashSet;
extern crate rand;
use rand::thread_rng;
use rand::Rng;
let mut hash_set = HashSet::new();
// ... Fill up hash_set ...
let index = thread_rng().gen_range(0, hash_set.len());
let element = hash_set.iter().nth(index).unwrap().clone();
hash_set.remove(&element);
// ... Use element ...
The only data structures allowing uniform sampling in constant time are data structures with constant time index access. HashSet does not provide indexing, so you can’t generate random samples in constant time.
I suggest converting your hash set to a Vec first, and then sampling from the vector. To remove an element, simply move the last element into its place; the order of the elements in the vector is immaterial anyway.
If you want to consume all elements from the set in random order, you can also shuffle the vector once and then iterate over it.
Here is an example implementation for removing a random element from a Vec in constant time:
use rand::{thread_rng, Rng};
pub trait RemoveRandom {
type Item;
fn remove_random<R: Rng>(&mut self, rng: &mut R) -> Option<Self::Item>;
}
impl<T> RemoveRandom for Vec<T> {
type Item = T;
fn remove_random<R: Rng>(&mut self, rng: &mut R) -> Option<Self::Item> {
if self.len() == 0 {
None
} else {
let index = rng.gen_range(0..self.len());
Some(self.swap_remove(index))
}
}
}
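The "shuffle once, then iterate" variant mentioned above can be sketched as follows. To keep the example dependency-free, it uses a tiny xorshift generator as a stand-in for the rand crate; `XorShift64` and `into_shuffled_vec` are hypothetical names for this illustration, and with rand available one would normally use its shuffling support instead.

```rust
use std::collections::HashSet;

/// Tiny xorshift64 generator, used only so the sketch has no dependencies.
/// It is a stand-in for `rand`, must be seeded with a nonzero value,
/// and is not suitable for anything security-sensitive.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Moves the set into a Vec once (O(n)) and shuffles it with Fisher-Yates.
fn into_shuffled_vec<T>(set: HashSet<T>, rng: &mut XorShift64) -> Vec<T> {
    let mut items: Vec<T> = set.into_iter().collect();
    for i in (1..items.len()).rev() {
        // Pick j in 0..=i; the modulo introduces a negligible bias for small i.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
    items
}
```

After the shuffle, popping from the end of the vector hands out the elements in random order at O(1) per element.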
Thinking about Sven Marnach's answer, I want to use a vector, but I also need constant-time insertion without duplication. Then I realized that I can maintain both a vector and a set, and ensure that both contain the same elements at all times. This allows both constant-time insertion with deduplication and constant-time random removal.
Here's the implementation I ended up with:
struct VecSet<T> {
set: HashSet<T>,
vec: Vec<T>,
}
impl<T> VecSet<T>
where
T: Clone + Eq + std::hash::Hash,
{
fn new() -> Self {
Self {
set: HashSet::new(),
vec: Vec::new(),
}
}
fn insert(&mut self, elem: T) {
assert_eq!(self.set.len(), self.vec.len());
let was_new = self.set.insert(elem.clone());
if was_new {
self.vec.push(elem);
}
}
fn remove_random(&mut self) -> T {
assert_eq!(self.set.len(), self.vec.len());
let index = thread_rng().gen_range(0, self.vec.len());
let elem = self.vec.swap_remove(index);
let was_present = self.set.remove(&elem);
assert!(was_present);
elem
}
fn is_empty(&self) -> bool {
assert_eq!(self.set.len(), self.vec.len());
self.vec.is_empty()
}
}
Sven's answer suggests converting the HashSet to a Vec, in order to randomly sample from the Vec in O(1) time. This conversion takes O(n) time and is suitable if the conversion needs to be done only sparingly; e.g., for taking a series of random samples from an otherwise unchanging hashset. It is less suitable if conversions need to be done often, e.g., if, between taking random samples, one wants to intersperse some O(1) removals-by-value from the HashSet, since that would involve converting back and forth between HashSet and Vec, with each conversion taking O(n) time.
isaacg's solution is to keep both a HashSet and a Vec and operate on them in tandem. This allows O(1) lookup by index, O(1) random removal, and O(1) insertion, but not O(1) removal by value or O(1) lookup of a value's index (because the Vec can't do those without a linear scan).
Below, I give a data structure that allows O(1) lookup by index or by value, O(1) insertion, and O(1) removal by index or value:
It is a HashMap<T, usize> together with a Vec<T>, such that the Vec maps indexes (which are usizes) to Ts, while the HashMap maps Ts to usizes. The HashMap and Vec can be thought of as inverse functions of one another, so that you can go from an index to its value, and from a value back to its index. The insertion and deletion operations are defined so that the indexes are precisely the integers from 0 to size()-1, with no gaps allowed. I call this data structure a BijectiveFiniteSequence. (Note the take_random_val method; it works in O(1) time.)
use std::collections::HashMap;
use std::hash::Hash;
use rand::{thread_rng, Rng};
#[derive(Clone, Debug)]
struct BijectiveFiniteSequence<T: Eq + Copy + Hash> {
idx_to_val: Vec<T>,
val_to_idx: HashMap<T, usize>,
}
impl<T: Eq + Copy + Hash> BijectiveFiniteSequence<T> {
fn new () -> BijectiveFiniteSequence<T> {
BijectiveFiniteSequence {
idx_to_val: Vec::new(),
val_to_idx: HashMap::new()
}
}
fn insert(&mut self, val: T) {
// ignore values that are already present, so the Vec and HashMap stay in sync
if self.contains(&val) {
return;
}
let idx = self.len();
self.idx_to_val.push(val);
self.val_to_idx.insert(val, idx);
}
fn take_random_val(&mut self) -> Option<T> {
// gen_range panics on an empty range
if self.len() == 0 {
return None;
}
let mut rng = thread_rng();
let rand_idx: usize = rng.gen_range(0..self.len());
self.remove_by_idx(rand_idx)
}
fn remove_by_idx(&mut self, idx: usize) -> Option<T> {
if idx >= self.len() {
return None;
}
// swap_remove moves the last element into the vacated slot
let val = self.idx_to_val.swap_remove(idx);
self.val_to_idx.remove(&val);
// if an element was moved into idx, point its hashmap entry at the new index
if idx < self.len() {
self.val_to_idx.insert(self.idx_to_val[idx], idx);
}
Some(val)
}
fn remove_val(&mut self, val: T) -> Option<T> {
// look up the index, then reuse remove_by_idx
let idx = *self.val_to_idx.get(&val)?;
self.remove_by_idx(idx)
}
fn get_idx_of(&self, val: &T) -> Option<&usize> {
self.val_to_idx.get(val)
}
fn get_val_at(&self, idx: usize) -> Option<T> {
self.idx_to_val.get(idx).copied()
}
fn contains(&self, val: &T) -> bool {
self.val_to_idx.contains_key(val)
}
fn len(&self) -> usize {
self.idx_to_val.len()
}
// etc. etc. etc.
}
According to the documentation for HashSet::iter it returns "An iterator visiting all elements in arbitrary order."
Arbitrary is perhaps not exactly uniform randomness, but if it's close enough for your use case, this is O(1) and will return different values each time:
// Build a set of integers 0 - 99
let mut set = HashSet::new();
for i in 0..100 {
set.insert(i);
}
// Sample
for _ in 0..10 {
let n = set.iter().next().unwrap().clone();
println!("{}", n);
set.remove(&n);
}
Like the author I wanted to remove the value after sampling from the HashSet. Sampling multiple times this way, without altering the HashSet, seems to yield the same result each time.

How (if possible) to sort a BTreeMap by value in Rust?

I am following a course on Software Security for which one of the assignments is to write some basic programs in Rust. For one of these assignments I need to analyze a text-file and generate several statistics. One of these is a generated list of the ten most used words in the text.
I have written this program that performs all tasks in the assignment except for the word frequency statistic mentioned above, the program compiles and executes the way I expect:
extern crate regex;
use std::error::Error;
use std::fs::File;
use std::io::prelude::*;
use std::path::Path;
use std::io::BufReader;
use std::collections::BTreeMap;
use regex::Regex;
fn main() {
// Create a path to the desired file
let path = Path::new("text.txt");
let display = path.display();
let file = match File::open(&path) {
Err(why) => panic!("couldn't open {}: {}", display,
why.description()),
Ok(file) => file,
};
let mut wordcount = 0;
let mut averagesize = 0;
let mut wordsize = BTreeMap::new();
let mut words = BTreeMap::new();
for line in (BufReader::new(file)).lines() {
let re = Regex::new(r"([A-Za-z]+[-_]*[A-Za-z]+)+").unwrap();
for cap in re.captures_iter(&line.unwrap()) {
let word = cap.at(1).unwrap_or("");
let lower = word.to_lowercase();
let s = lower.len();
wordcount += 1;
averagesize += s;
*words.entry(lower).or_insert(0) += 1;
*wordsize.entry(s).or_insert(0) += 1;
}
}
averagesize = averagesize / wordcount;
println!("This file contains {} words with an average of {} letters per word.", wordcount, averagesize);
println!("\nThe number of times a word of a certain length was found.");
for (size, count) in wordsize.iter() {
println!("There are {} words of size {}.", count, size);
}
println!("\nThe ten most used words.");
let mut popwords = BTreeMap::new();
for (word, count) in words.iter() {
if !popwords.contains_key(count) {
popwords.insert(count, "");
}
let newstring = format!("{} {}", popwords.get(count), word);
let mut e = popwords.get_mut(count);
}
let mut i = 0;
for (count, words) in popwords.iter() {
i += 1;
if i > 10 {
break;
}
println!("{} times: {}", count, words);
}
}
I have a BTreeMap (that I chose with these instructions), words, that stores each word as key and its associated frequency in the text as value. This functionality works as I expect, but this is where I am stuck. I have been trying to find ways to sort the BTreeMap by value or find another data structure in Rust that is natively sorted by value.
I am looking for the correct way to achieve this data structure (a list of words with their frequency, sorted by frequency) in Rust. Any pointers are greatly appreciated!
If you only need to analyze a static dataset, the easiest way is to convert your BTreeMap into a Vec of (key, value) pairs at the end and sort that vector:
use std::iter::FromIterator;
let mut v = Vec::from_iter(map);
v.sort_by(|&(_, a), &(_, b)| b.cmp(&a));
The vector contains the (key, value) pairs as tuples. To sort the vector, we have to use sort_by() or sort_by_key(). To sort the vector in decreasing order, I used b.cmp(&a) (as opposed to a.cmp(&b), which would be the natural order). But there are other possibilities to reverse the order of a sort.
However, if you really need some data structure such that you have a streaming calculation, it's getting more complicated. There are many possibilities in that case, but I guess using some kind of priority queue could work out.
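As one possible reading of the priority-queue idea, the sketch below keeps only the `n` most frequent words in a bounded min-heap while scanning the counts. `top_n` is a hypothetical helper name, and breaking ties by word order is an arbitrary choice made for this example.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, BinaryHeap};

/// Returns the `n` most frequent words, most frequent first.
/// Ties are broken by word order (an arbitrary choice for this sketch).
fn top_n(counts: &BTreeMap<String, usize>, n: usize) -> Vec<(String, usize)> {
    // Wrapping entries in Reverse turns the max-heap into a min-heap,
    // so the least frequent retained entry is always on top.
    let mut heap: BinaryHeap<Reverse<(usize, &str)>> = BinaryHeap::new();
    for (word, &count) in counts {
        heap.push(Reverse((count, word.as_str())));
        if heap.len() > n {
            heap.pop(); // drop the currently least frequent entry
        }
    }
    // into_sorted_vec sorts ascending by Reverse<_>, i.e. descending by count.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse((count, word))| (word.to_string(), count))
        .collect()
}
```

The heap never holds more than `n + 1` entries, so memory stays bounded even if the map of counts is large. For a one-off report on a static text, sorting the whole vector is simpler and likely fast enough.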

How do I turn a circular buffer into a vector in O(n) without an allocation?

I have a Vec that is the allocation for a circular buffer. Let's assume the buffer is full, so there are no elements in the allocation that aren't in the circular buffer. I now want to turn that circular buffer into a Vec where the first element of the circular buffer is also the first element of the Vec. As an example I have this (allocating) function:
fn normalize(tail: usize, buf: Vec<usize>) -> Vec<usize> {
let n = buf.len();
buf[tail..n]
.iter()
.chain(buf[0..tail].iter())
.cloned()
.collect()
}
Obviously this can also be done without allocating anything, since we already have an allocation that is large enough, and we have a swap operation to swap arbitrary elements of the allocation.
fn normalize(tail: usize, mut buf: Vec<usize>) -> Vec<usize> {
for _ in 0..tail {
for i in 0..(buf.len() - 1) {
buf.swap(i, i + 1);
}
}
buf
}
Sadly this requires buf.len() * tail swap operations. I'm fairly sure it can be done in buf.len() + tail swap operations. For concrete values of tail and buf.len() I have been able to figure out solutions, but I'm not sure how to do it in the general case.
My recursive partial solution can be seen in action.
The simplest solution is to use 3 reversals; this is what is recommended in "Algorithm to rotate an array in linear time".
// rotate to the left by "k".
fn rotate<T>(array: &mut [T], k: usize) {
if array.is_empty() { return; }
let k = k % array.len();
array[..k].reverse();
array[k..].reverse();
array.reverse();
}
While this is linear, this requires reading and writing each element at most twice (reversing a range with an odd number of elements does not require touching the middle element). On the other hand, the very predictable access pattern of the reversal plays nice with prefetching, YMMV.
This operation is typically called a "rotation" of the vector, e.g. the C++ standard library has std::rotate to do this. There are known algorithms for doing the operation, although you may have to be quite careful when porting them if you're trying to do it generically, with non-Copy types, where swaps become key, as one can't generally just read something straight out of a vector.
That said, one is likely to be able to use unsafe code with std::ptr::read/std::ptr::write for this, since data is just being moved around, and hence there's no need to execute caller-defined code or very complicated concerns about exception safety.
A port of the C code in the link above (by @ker):
fn rotate(k: usize, a: &mut [i32]) {
let n = a.len();
if n == 0 { return }
// reduce k so that a[v + k] below cannot index out of bounds
let k = k % n;
if k == 0 { return }
let mut c = 0;
let mut v = 0;
while c < n {
let mut t = v;
let mut tp = v + k;
let tmp = a[v];
c += 1;
while tp != v {
a[t] = a[tp];
t = tp;
tp += k;
if tp >= n { tp -= n; }
c += 1;
}
a[t] = tmp;
v += 1;
}
}
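Current versions of the Rust standard library also provide `rotate_left` on slices, which performs this rotation in place without allocating. A minimal sketch of `normalize` built on it, assuming `tail` is the index of the first logical element as in the question:

```rust
/// Rotates the buffer in place so that the element at `tail` comes first.
fn normalize(tail: usize, mut buf: Vec<usize>) -> Vec<usize> {
    if !buf.is_empty() {
        // rotate_left panics if the offset exceeds the length, so reduce it first.
        let mid = tail % buf.len();
        buf.rotate_left(mid);
    }
    buf
}
```

Because `rotate_left` works for any element type, the same approach applies to a `Vec<T>` of non-Copy values without writing unsafe code by hand.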
