I know how to iterate over a HashMap in Rust; however, I am a little confused about how this works in memory. How do we iterate over values that are not stored sequentially in memory? A detailed explanation of the code below at the heap and stack level would be much appreciated.
use std::collections::HashMap;
let name = vec![String::from("Charlie"), String::from("Winston"), String::from("Brian"), String::from("Jack")];
let age = vec![50, 5, 7, 21];
let mut people_ages: HashMap<String, i32> = name.into_iter().zip(age.into_iter()).collect();
for (key, value) in &people_ages {
println!("{}: {}", key, value);
}
At the end of the introduction to the documentation, it is mentioned that the implementation is a Rust port of Google's SwissTable, which was originally written in C++.
This page contains illustrations about two variants: « flat » and « node » based.
The main difference between these two variants is pointer stability.
In the « node » based version, the key-value pairs, once inserted, keep their address in memory even if the hash table is reorganised.
In the « flat » version, some insertions/removals can make the previous key-value pairs be moved in memory.
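To make the notion of pointer stability concrete, here is a minimal sketch that uses plain vectors rather than hash tables. A `Vec<u64>` stands in for the « flat » layout (elements stored inline) and a `Vec<Box<u64>>` for the « node » layout (elements stored behind their own allocation). The function names are made up for this illustration.

```rust
/// Address of the first element stored inline in the buffer ("flat" layout).
fn flat_addr(v: &[u64]) -> usize {
    &v[0] as *const u64 as usize
}

/// Address of the first element stored behind its own Box ("node" layout).
fn node_addr(v: &[Box<u64>]) -> usize {
    &*v[0] as *const u64 as usize
}

/// Grows both containers by `extra` elements and reports, for each layout,
/// whether the first element is still at the address it had before growing.
fn first_element_stays_put(extra: u64) -> (bool, bool) {
    let mut flat: Vec<u64> = vec![1];
    let mut node: Vec<Box<u64>> = vec![Box::new(1)];
    let flat_before = flat_addr(&flat);
    let node_before = node_addr(&node);
    for i in 0..extra {
        // Growing may reallocate the buffer: inline elements move with it,
        // boxed elements stay where they are (only the Box pointers move).
        flat.push(i);
        node.push(Box::new(i));
    }
    (flat_addr(&flat) == flat_before, node_addr(&node) == node_before)
}
```

The second flag is always `true`, because a boxed value never moves when the vector holding the box is reallocated. The first flag is usually `false` after enough pushes, but that depends on the allocator, so it should not be relied upon.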
When it comes to the Rust implementation, I am not experienced enough to be certain of any specific detail, but I tried this simple example based on yours.
use std::collections::HashMap;
fn main() {
let name = vec![
String::from("Charlie"),
String::from("Winston"),
String::from("Brian"),
String::from("Jack"),
];
let age = vec![50, 5, 7, 21];
let mut people_ages: HashMap<String, i32> =
name.into_iter().zip(age.into_iter()).collect();
let mut keys = Vec::new();
let mut values = Vec::new();
for (key, value) in &people_ages {
keys.push(key);
values.push(value);
let key_addr = key as *const String as usize;
let value_addr = value as *const i32 as usize;
println!("{:x} {:x} {}: {}", key_addr, value_addr, key, value);
}
// people_ages.insert("Bob".to_owned(), 4); // mutable and immutable borrow
println!("keys: {:?}", keys);
println!("values: {:?}", values);
}
/*
55e08ff8bd40 55e08ff8bd58 Brian: 7
55e08ff8bd20 55e08ff8bd38 Charlie: 50
55e08ff8bd00 55e08ff8bd18 Winston: 5
55e08ff8bce0 55e08ff8bcf8 Jack: 21
keys: ["Brian", "Charlie", "Winston", "Jack"]
values: [7, 50, 5, 21]
*/
The commented-out line (the insertion) is rejected because we cannot alter the hashmap while keeping references to its content.
Thus, I guess (I'm not certain) that the implementation does not rely on the « node » based variant, since we cannot benefit from the pointer stability it provides (due to the ownership model in Rust), and that it probably relies on the « flat » variant.
This means that we can expect the key-value pairs stored in the same hash table to be tightly packed in memory, and iterating over them should be very similar to iterating over a vector: a regular progression (with some skips, however) that is very friendly to cache prefetching.
Printing the addresses tends to confirm that guess (although the test is not thorough enough to be conclusive), and shows a backward progression.
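To illustrate the guess above, here is a toy « flat » table written for this answer. It is only a sketch of the general idea (one contiguous slot array, iteration as a linear scan that skips empty slots) and is not how the standard library's HashMap is actually written. The name `FlatTable`, the byte-sum hash and the linear probing are all assumptions made for readability.

```rust
/// Toy "flat" table: every slot lives in one contiguous Vec.
/// Illustration only; the real std HashMap differs in many details.
struct FlatTable {
    slots: Vec<Option<(String, i32)>>,
}

impl FlatTable {
    fn with_capacity(capacity: usize) -> Self {
        FlatTable { slots: (0..capacity).map(|_| None).collect() }
    }

    /// Hypothetical toy hash (sum of bytes), chosen only for readability.
    fn toy_hash(key: &str) -> usize {
        key.bytes().map(|b| b as usize).sum()
    }

    /// Linear probing; assumes the table never becomes completely full.
    fn insert(&mut self, key: String, value: i32) {
        let capacity = self.slots.len();
        let mut i = Self::toy_hash(&key) % capacity;
        loop {
            let taken_by_other =
                matches!(&self.slots[i], Some((k, _)) if *k != key);
            if !taken_by_other {
                // Free slot, or same key: store (or overwrite) the pair here.
                self.slots[i] = Some((key, value));
                return;
            }
            i = (i + 1) % capacity;
        }
    }

    /// Iteration is a linear scan over the slot array that skips empty slots.
    fn iter(&self) -> impl Iterator<Item = (&String, &i32)> {
        self.slots
            .iter()
            .filter_map(|slot| slot.as_ref().map(|(k, v)| (k, v)))
    }
}
```

With this layout the order of iteration depends only on where the hash placed each pair in the array, not on insertion order, and the addresses of the visited keys progress monotonically through a single allocation.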
ORIGINAL QUESTION:
I am currently trying to write a library in Rust, to be compiled to WASM, for converting a bip39 mnemonic passphrase into an Arweave JWK. I am currently using tiny-bip39 and RSA.
When generating a private key using RSA as per the example given on RSA I want to seed the rng based on the mnemonic passphrase I have passed into the function. I tried achieving this by simply getting the seed from the mnemonic object generated by tiny-bip39, however this seems to generate a &[u8] with a length of 64. However, Seed is defined as [u8; 32], and without having to write my own rng, I cannot figure out how to use a len 64 seed.
#[wasm_bindgen]
pub fn get_key_from_mnemonic(phrase: &str) {
let mnemonic = Mnemonic::from_phrase(phrase, Language::English).unwrap();
assert_eq!(phrase, mnemonic.phrase());
let seed = Seed::new(&mnemonic, "");
let seed_bytes = seed.as_bytes();
let mut rng = ChaCha12Rng::from_seed(seed_bytes);
[...]
}
Is there a cryptographically secure rng that allows for len 64 seed?
I tried simply using try_into, but that did not seem to work, which makes sense.
let seed_bytes: <ChaCha12Rng as SeedableRng>::Seed = seed.as_bytes().try_into().unwrap();
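As a small std-only sketch of why that conversion fails: converting a slice into a fixed-size array succeeds only when the slice length equals the array length exactly, so a 64-byte slice cannot become a `[u8; 32]`. The helper name `to_seed32` is made up for this illustration.

```rust
use std::convert::TryFrom;

/// Tries to copy a byte slice into a 32-byte array.
/// Returns None unless the slice is exactly 32 bytes long.
fn to_seed32(bytes: &[u8]) -> Option<[u8; 32]> {
    <[u8; 32]>::try_from(bytes).ok()
}
```

Calling `to_seed32` on the 64-byte seed returns `None`, while calling it on `&seed_bytes[..32]` would succeed but silently discard half of the seed material, which is why deriving 32 bytes from the full seed is the better approach.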
EDIT:
I came up with a solution that seems to work in every way except the random number generation.
let mnemonic = Mnemonic::from_phrase(phrase, Language::English).unwrap();
assert_eq!(phrase, mnemonic.phrase());
let seed = Seed::new(&mnemonic, "");
let seed_bytes = seed.as_bytes();
let mut seed_buf: [u8; 32] = Default::default();
let mut hmac_drgb = HmacDRBG::<Sha256>::new(&seed_bytes, &[], &[]);
hmac_drgb.generate_to_slice(&mut seed_buf, None);
let mut chacha = ChaCha20Rng::from_seed(seed_buf);
let modulus_length = 4098;
let rsa_private_key = RsaPrivateKey::new(&mut chacha, modulus_length).unwrap();
let der = rsa_private_key.to_pkcs1_der().unwrap();
let jwk = JWK {
modulus: der.private_key().modulus.as_bytes().to_vec(),
public_exponent: der.private_key().public_exponent.as_bytes().to_vec(),
private_exponent: der.private_key().private_exponent.as_bytes().to_vec(),
prime1: der.private_key().prime1.as_bytes().to_vec(),
prime2: der.private_key().prime2.as_bytes().to_vec(),
exponent1: der.private_key().exponent1.as_bytes().to_vec(),
exponent2: der.private_key().exponent2.as_bytes().to_vec(),
coefficient: der.private_key().coefficient.as_bytes().to_vec(),
};
As I am trying to rewrite some of the functionality provided by arweave-mnemonic-keys, I have tried to go through all of the dependencies, figuring out which rust modules I need, and think I have managed to figure out everything except how to generate the random numbers for the RSA algorithm.
I have tried looking through the node-forge/lib/rsa.js file, and found this snippet:
function generateRandom(bits, rng) {
var num = new BigInteger(bits, rng);
// force MSB set
var bits1 = bits - 1;
if(!num.testBit(bits1)) {
num.bitwiseTo(BigInteger.ONE.shiftLeft(bits1), op_or, num);
}
// align number on 30k+1 boundary
num.dAddOffset(31 - num.mod(THIRTY).byteValue(), 0);
return num;
}
However, I am not sure how to reproduce this in Rust. So far I have tried to use ChaCha8Rng, ChaCha12Rng, ChaCha20Rng, and Pcg64, none of which produces the expected result.
It depends on the CSPRNG. If you were seeding an HMAC DRBG using HMAC-SHA-512, then this would be a perfectly normal amount of input. However, in your case, the CSPRNG is ChaCha, which is configured to have a 256-bit key.
If this mnemonic has been generated from a CSPRNG and has sufficient entropy, then all you need is a simple, straightforward key derivation function like HKDF. You can use HKDF with SHA-256 or SHA-512, with the seed as the input keying material, no salt, and an output keying material which is 32 bytes in size. Then, you can use that to seed your CSPRNG.
You will also need an info string, which is usually some text string for the purpose. I like to use a version number to make things future proof, so you could use something like "v1 PRNG seed".
My recommendation here is that, since you have a 64-byte input seed, HKDF using SHA-512 is best, since it avoids losing entropy if you end up needing to seed other data. Also, while ChaCha12Rng is the default, ChaCha20Rng is more conservative and would probably be appropriate for generating a long-term key.
I have an enum type in my Rust program of which some variants may contain inner data.
enum MyEnum {
A,
B(u64),
C(SmallStruct),
D(Box<LargeStruct>)
}
This enum is going to be stored tens of thousands of times and memory usage is an issue. I would like to avoid accidentally adding a very large variant for the enum. Is there a way that I can tell the compiler to limit the size of an enum instance in memory?
As of Rust 1.57 you can use asserts in a const context, so this kind of check will work:
// assert that MyEnum is no larger than 16 bytes
const _ASSERT_SMALL: () = assert!(mem::size_of::<MyEnum>() <= 16);
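For a version that compiles on its own, here is a sketch using the standard `assert!` macro in a const item (Rust 1.57 or later). The fields of `SmallStruct` and `LargeStruct` are placeholders invented for this example, since the question does not show them.

```rust
use std::mem;

// Placeholder payload types; their fields are assumptions for this sketch.
#[allow(dead_code)]
struct SmallStruct {
    a: u32,
    b: u32,
}

#[allow(dead_code)]
struct LargeStruct {
    data: [u8; 1024],
}

#[allow(dead_code)]
enum MyEnum {
    A,
    B(u64),
    C(SmallStruct),
    D(Box<LargeStruct>), // boxed, so only a pointer is stored inline
}

// Evaluated at compile time: the build fails if MyEnum grows past 16 bytes.
const _: () = assert!(mem::size_of::<MyEnum>() <= 16);
```

If someone later changes `D(Box<LargeStruct>)` to `D(LargeStruct)`, the const item no longer evaluates successfully and compilation stops with an error pointing at the assertion.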
Original answer follows for historical reference.
As noted in the other answer, you can use the const_assert! macro, but it will require an external crate, static_assertions. If you're looking for a std-only solution and can live with the uglier error message when the assertion fails, you can use this:
#[deny(const_err)]
const fn const_assert(ok: bool) {
0 - !ok as usize;
}
// assert that MyEnum is no larger than 16 bytes
const _ASSERT_SMALL: () = const_assert(mem::size_of::<MyEnum>() <= 16);
You can read about this technique, along with ways to improve it, in the article written by the author of the static_assertions crate.
EDIT: Link to original article is non-functional, web archive version
You could use const_assert! and mem::size_of to assert that your enum is less than or equal to a certain size.
I need structures with a fixed maximum size, so the obvious choice seems to be the arrayvec crate. However, I'm stuck when an ArrayVec is a member of a structure that later needs to be partially initialised:
use arrayvec::ArrayVec; // 0.4.7
#[derive(Debug)]
struct Test {
member_one: Option<u32>,
member_two: ArrayVec<[u16; 5]>,
}
pub fn main() {
let mut test = Test {
member_one: Some(45678),
member_two: [1, 2, 3], // <- What to do here to initialise only 3 elements?
};
print!("{:?}", test);
}
I'd like to initialise the first three elements of the ArrayVec as it's perfectly capable of holding any number of elements from zero to 5 (in my example), but I can't figure out how to do it.
You can collect into an ArrayVec from an iterator:
let mut test = Test {
member_one: Some(45678),
member_two: [1, 2, 3].into_iter().collect(),
};
ArrayVec does not offer a one-step method to do this. Instead, create the ArrayVec and then add values to it, in any of the ways you can add values:
let mut member_two = ArrayVec::new();
member_two.extend([1, 2, 3].iter().cloned());
let test = Test {
member_one: Some(45678),
member_two,
};
I would like to know if an update operation on a mutable map is better in performance than reassignment.
Let's assume I have the following Map
val m=Map(1 -> Set("apple", "banana"),
2 -> Set("banana", "cabbage"),
3 -> Set("cabbage", "dumplings"))
which I would like to reverse into this map:
Map("apple" -> Set(1),
"banana" -> Set(1, 2),
"cabbage" -> Set(2, 3),
"dumplings" -> Set(3))
The code to do so is:
def reverse(m:Map[Int,Set[String]])={
var rm = Map[String,Set[Int]]()
m.keySet foreach { k=>
m(k) foreach { e =>
rm = rm + (e -> (rm.getOrElse(e, Set()) + k))
}
}
rm
}
Would it be more efficient to use the update operator on a map if it is very large in size?
The code using the update on map is as follows:
def reverse(m:Map[Int,Set[String]])={
var rm = scala.collection.mutable.Map[String,Set[Int]]()
m.keySet foreach { k=>
m(k) foreach { e =>
rm.update(e,(rm.getOrElse(e, Set()) + k))
}
}
rm
}
I ran some tests using Rex Kerr's Thyme utility.
First I created some test data.
val rndm = new util.Random
val dna = Seq('A','C','G','T')
val m = (1 to 4000).map(_ -> Set(rndm.shuffle(dna).mkString
,rndm.shuffle(dna).mkString)).toMap
Then I timed some runs with both the immutable.Map and mutable.Map versions. Here's an example result:
Time: 2.417 ms 95% CI 2.337 ms - 2.498 ms (n=19) // immutable
Time: 1.618 ms 95% CI 1.579 ms - 1.657 ms (n=19) // mutable
Time: 2.278 ms 95% CI 2.238 ms - 2.319 ms (n=19) // functional version
As you can see, using a mutable Map with update() has a significant performance advantage.
Just for fun I also compared these results with a more functional version of a Map reverse (or what I call a Map inverter). No var or any mutable type involved.
m.flatten{case(k, vs) => vs.map((_, k))}
.groupBy(_._1)
.mapValues(_.map(_._2).toSet)
This version consistently beat your immutable version but still doesn't come close to the mutable timings.
The trade-off between mutable and immutable collections usually comes down to this:
immutable collections are safer to share and allow structural sharing
mutable collections have better performance
Some time ago I did a comparison of performance between mutable and immutable Maps in Scala, and the difference was about 2 to 3 times in favor of mutable ones.
So, when performance is not critical I usually go with immutable collections for safety and readability.
For example, in your case the functional "Scala way" of performing this transformation would be something like this:
m.view
.flatMap(x => x._2.map(_ -> x._1)) // flatten map to lazy view of String->Int pairs
.groupBy(_._1) // group pairs by String part
.mapValues(_.map(_._2).toSet) // extract all Int parts into Set
Although I used a lazy view to avoid creating intermediate collections, groupBy still internally creates a mutable map (you may want to check its sources; the logic is pretty similar to what you wrote), which in turn gets converted to an immutable Map, which then gets discarded by mapValues.
Now, if you want to squeeze out every bit of performance, you want to use mutable collections and do as few updates of immutable collections as possible.
In your case this means having a Map of mutable Sets as your intermediate buffer:
import scala.collection.mutable
def transform(m:Map[Int, Set[String]]):Map[String, Set[Int]] = {
val accum:Map[String, mutable.Set[Int]] =
m.valuesIterator.flatten.map(_ -> mutable.Set[Int]()).toMap
for ((k, vals) <- m; v <- vals) {
accum(v) += k
}
accum.mapValues(_.toSet)
}
Note, I'm not updating accum once it's created: I'm doing exactly one map lookup and one set update for each value, while in both your examples there was an additional map update.
I believe this code is reasonably optimal performance-wise. I didn't perform any tests myself, but I highly encourage you to do that on your real data and post the results here.
Also, if you want to go even further, you might want to try mutable BitSet instead of Set[Int]. If ints in your data are fairly small it might yield some minor performance increase.
Just using @Aivean's method in a functional way:
def transform(mp :Map[Int, Set[String]]) = {
val accum = mp.values.flatten
.toSet.map( (_-> scala.collection.mutable.Set[Int]())).toMap
mp.map {case(k,vals) => vals.map( v => accum(v)+=k)}
accum.mapValues(_.toSet)
}
I'm trying to make a function that defines a vector that varies based on the function's input, and set! works great for this in Scheme. Is there a functional equivalent for this in OCaml?
I agree with sepp2k that you should expand your question, and give more detailed examples.
Maybe what you need are references.
As a rough approximation, you can see them as variables to which you can assign:
let a = ref 5;;
!a;; (* This evaluates to 5 *)
a := 42;;
!a;; (* This evaluates to 42 *)
Here is a more detailed explanation from http://caml.inria.fr/pub/docs/u3-ocaml/ocaml-core.html:
The language we have described so far is purely functional. That is, several evaluations of the same expression will always produce the same answer. This prevents, for instance, the implementation of a counter whose interface is a single function next : unit -> int that increments the counter and returns its new value. Repeated invocation of this function should return a sequence of consecutive integers — a different answer each time.
Indeed, the counter needs to memorize its state in some particular location, with read/write accesses, but before all, some information must be shared between two calls to next. The solution is to use mutable storage and interact with the store by so-called side effects.
In OCaml, the counter could be defined as follows:
let new_count =
let r = ref 0 in
let next () = r := !r+1; !r in
next;;
Another, maybe more concrete, example of mutable storage is a bank account. In OCaml, record fields can be declared mutable, so that new values can be assigned to them later. Hence, a bank account could be a two-field record, its number, and its balance, where the balance is mutable.
type account = { number : int; mutable balance : float }
let retrieve account requested =
let s = min account.balance requested in
account.balance <- account.balance -. s; s;;
In fact, in OCaml, references are not primitive: they are special cases of mutable records. For instance, one could define:
type 'a ref = { mutable content : 'a }
let ref x = { content = x }
let deref r = r.content
let assign r x = r.content <- x; x
set! in Scheme assigns to a variable. You cannot assign to a variable in OCaml, at all. (So "variables" are not really "variable".) So there is no equivalent.
But OCaml is not a pure functional language. It has mutable data structures. The following things can be assigned to:
Array elements
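String elements (in older OCaml versions; since OCaml 4.06 strings are immutable by default and Bytes is the mutable counterpart)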
Mutable fields of records
Mutable fields of objects
In these situations, the <- syntax is used for assignment.
The ref type mentioned by @jrouquie is a simple, built-in mutable record type that acts as a mutable container of one thing. OCaml also provides ! and := operators for working with refs.