I have a set of data that alternates between A and B. These are all valid choices:
A -> B -> A
A -> B -> A -> B
B -> A -> B
B -> A -> B -> A
I want to leverage the type system to make sure the alternating property is checked at compile time while maintaining good performance.
Solution 1: linked list
struct A {
    // data
    next: Option<B>,
}

struct B {
    // data
    next: Option<Box<A>>,
}
The problem is that the performance of this data structure will be poor at best: linked lists cause frequent cache misses, which is especially bad when iterating.
Solution 2: Vec + enum
enum Types {
    A(DataA),
    B(DataB),
}

type Data = Vec<Types>;
With this solution, cache locality is much better, so yay for performance. However, it does not prevent putting two As side by side. One also has to check the variant at every iteration step, even though the alternating invariant should make that check unnecessary.
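For illustration, a minimal sketch (with DataA and DataB reduced to unit structs, which is an assumption) showing that nothing stops two adjacent As:

struct DataA;
struct DataB;

enum Types {
    A(DataA),
    B(DataB),
}

fn main() {
    // Compiles fine even though it violates the alternating rule.
    let _broken: Vec<Types> = vec![Types::A(DataA), Types::A(DataA)];
}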
Solution 3: Combination
struct A {
    // data, default in first link = empty
    b: Option<B>,
}

struct B {
    // data
}

type Data = Vec<A>;
This combines the cache locality of the Vec with the type verification of the linked list. It is quite ugly, though, and one has to inspect the first element to tell whether it really is an A or just an empty container holding the next B.
The question
Is there a data structure that allows compile-time type verification, while maintaining cache locality and avoiding extra allocation?
To store alternating types in a way that the type system enforces, while keeping reasonable efficiency, you can use a Vec of pairs: Vec<(X, Y)>. Your situation also requires:
storing an extra leading value in an Option to handle sequences that start with a Y
storing an extra trailing value in an Option to handle sequences that end with an X
use either::Either; // 1.5.2
use std::iter;
#[derive(Debug, Default)]
struct Data<X, Y> {
    head: Option<Y>,
    pairs: Vec<(X, Y)>,
    tail: Option<X>,
}

impl<X, Y> Data<X, Y> {
    fn iter(&self) -> impl Iterator<Item = Either<&X, &Y>> {
        let head = self.head.iter().map(Either::Right);
        let pairs = self.pairs.iter().flat_map(|(a, b)| {
            let a = iter::once(Either::Left(a));
            let b = iter::once(Either::Right(b));
            a.chain(b)
        });
        let tail = self.tail.iter().map(Either::Left);
        head.chain(pairs).chain(tail)
    }
}
That being said, you are going to have ergonomic issues somewhere. For example, you can't just push an Either<X, Y> because the previously pushed value might be of the same type. Creating the entire structure at once might be the simplest direction:
#[derive(Debug)]
struct A;

#[derive(Debug)]
struct B;

fn main() {
    let data = Data {
        head: Some(B),
        pairs: vec![(A, B)],
        tail: None,
    };

    println!("{:?}", data.iter().collect::<Vec<_>>());
}
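One way to claw back some mutability, sketched here only as an assumption rather than part of the design above, is to allow appending a complete (X, Y) pair; the name push_pair and the Result-based rejection are made up for illustration:

impl<X, Y> Data<X, Y> {
    // Hypothetical helper: appending a whole pair can never break the
    // alternation, as long as there is no trailing X yet.
    fn push_pair(&mut self, x: X, y: Y) -> Result<(), (X, Y)> {
        if self.tail.is_some() {
            // A trailing X followed by another X would break the pattern.
            Err((x, y))
        } else {
            self.pairs.push((x, y));
            Ok(())
        }
    }
}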
Is there a data structure that allows compile-time type verification, while maintaining cache locality and avoiding extra allocation?
You can use Rust's type system to enforce that items of each type are added in alternating order. The general strategy is to capture the type of the first item and also the previous item in the type of the whole structure and make different methods available according to the "current" type. When the previous item was an X, only the methods for adding a Y will be available, and vice versa.
I'm using two Vecs rather than a Vec of tuples. Depending on your data types, this could give better memory adjacency, but whether it helps really depends on how you end up iterating.
use std::marker::PhantomData;
use std::fmt;
struct Left;
struct Right;
struct Empty;

struct AlternatingVec<L, R, P = Empty, S = Empty> {
    lefts: Vec<L>,
    rights: Vec<R>,
    prev: PhantomData<P>,
    start: PhantomData<S>,
}

impl<L, R> AlternatingVec<L, R, Empty, Empty> {
    pub fn new() -> Self {
        AlternatingVec {
            lefts: Vec::new(),
            rights: Vec::new(),
            prev: PhantomData,
            start: PhantomData,
        }
    }
}
The types Left, Right and Empty "tag" whether the previous and start values correspond to the left or right collection, or whether that collection is empty. Initially both collections are empty, so both P (the previously added value) and S (the start value) are Empty.
Next, a utility method for changing the types. It doesn't look like it does much, but combined with type inference it lets us produce a copy of the data structure with the phantom types changed.
impl<L, R, P, S> AlternatingVec<L, R, P, S> {
    fn change_type<P2, S2>(self) -> AlternatingVec<L, R, P2, S2> {
        AlternatingVec {
            lefts: self.lefts,
            rights: self.rights,
            prev: PhantomData,
            start: PhantomData,
        }
    }
}
In practice, the compiler is smart enough that this method compiles down to nothing at runtime.
These two traits define operations on the left and right collections respectively:
trait LeftMethods<L, R, S> {
    fn push_left(self, val: L) -> AlternatingVec<L, R, Left, S>;
}

trait RightMethods<L, R, S> {
    fn push_right(self, val: R) -> AlternatingVec<L, R, Right, S>;
}
We will implement those only for the states where we want them to be callable: RightMethods should only be available if the previous item was a "left" or if no items have been added so far. LeftMethods should be implemented if the previous item was a "right" or if no items have been added so far.
impl<L, R> LeftMethods<L, R, Left> for AlternatingVec<L, R, Empty, Empty> {
    fn push_left(mut self, val: L) -> AlternatingVec<L, R, Left, Left> {
        self.lefts.push(val);
        self.change_type()
    }
}

impl<L, R, S> LeftMethods<L, R, S> for AlternatingVec<L, R, Right, S> {
    fn push_left(mut self, val: L) -> AlternatingVec<L, R, Left, S> {
        self.lefts.push(val);
        self.change_type()
    }
}

impl<L, R> RightMethods<L, R, Right> for AlternatingVec<L, R, Empty, Empty> {
    fn push_right(mut self, val: R) -> AlternatingVec<L, R, Right, Right> {
        self.rights.push(val);
        self.change_type()
    }
}

impl<L, R, S> RightMethods<L, R, S> for AlternatingVec<L, R, Left, S> {
    fn push_right(mut self, val: R) -> AlternatingVec<L, R, Right, S> {
        self.rights.push(val);
        self.change_type()
    }
}
These methods don't do much except call push on the correct inner Vec, and then use change_type to make the type reflect the signature.
The compiler forces you to call push_left and push_right alternately:
fn main() {
    let v = AlternatingVec::new()
        .push_left(true)
        .push_right(7)
        .push_left(false)
        .push_right(0)
        .push_left(false);
}
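For contrast, a hedged sketch of a sequence the compiler rejects; it is kept commented out because it intentionally fails to build, and the exact error wording is only an approximation:

// let invalid = AlternatingVec::<bool, i32>::new()
//     .push_left(true)
//     .push_left(false); // rejected: LeftMethods is not implemented when
//                        // the previous element was already a Left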
This complex structure leads to a lot more work in general. For example, Debug is fiddly to implement in a nice way. I made a version with a Debug impl, but it's getting a bit too long for Stack Overflow. You can see it here:
https://gist.github.com/peterjoel/2ffe8b7f5ad7c649f61c580ac7dabc67
It's quite common to compare data with precedence: for a struct with multiple comparable members, or in a sort_by callback.
// Example of sorting xy_coords: Vec<[f64; 2]>, first by y, then by x
xy_coords.sort_by(|co_a, co_b| {
    let ord = co_a[1].partial_cmp(&co_b[1]).unwrap();
    if ord != std::cmp::Ordering::Equal {
        ord
    } else {
        co_a[0].partial_cmp(&co_b[0]).unwrap()
    }
});
Is there a more straightforward way to perform multiple cmp functions, where only the first non-equal result is returned?
perform multiple cmp functions, where only the first non-equal result is returned
That's basically how Ord is defined for tuples. Create a function that converts your type into a tuple and compare those:
fn main() {
    let mut xy_coords = vec![[1, 0], [-1, -1], [0, 1]];

    fn sort_key(coord: &[i32; 2]) -> (i32, i32) {
        (coord[1], coord[0])
    }

    xy_coords.sort_by(|a, b| sort_key(a).cmp(&sort_key(b)));
}
Since that's common, there's a method just for it:
xy_coords.sort_by_key(sort_key);
It won't help your case, because floating point doesn't implement Ord.
One of many possibilities is to kill the program on NaN:
xy_coords.sort_by(|a, b| {
    sort_key(a).partial_cmp(&sort_key(b)).expect("Don't know how to handle NaN")
});
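Another hedged option, assuming a toolchain recent enough to have f64::total_cmp, is to use the IEEE 754 total ordering so NaN gets a well-defined position instead of causing a panic:

// Sketch only: total_cmp imposes a total order on floats, so no unwrap is needed.
xy_coords.sort_by(|a, b| a[1].total_cmp(&b[1]).then(a[0].total_cmp(&b[0])));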
See also
Using max_by_key on a vector of floats
How to do a binary search on a Vec of floats?
There are times when you may not want to build a large tuple up front, because some of its values would be computed only to be ignored once a higher-priority comparison has already decided the ordering.
Stealing a page from Guava's ComparisonChain, we can make a small builder that allows us to use closures to avoid extra work:
use std::cmp::Ordering;

struct OrdBuilder<T> {
    a: T,
    b: T,
    ordering: Ordering,
}

impl<T> OrdBuilder<T> {
    fn new(a: T, b: T) -> OrdBuilder<T> {
        OrdBuilder {
            a,
            b,
            ordering: Ordering::Equal,
        }
    }

    fn compare_with<F, V>(mut self, mut f: F) -> OrdBuilder<T>
    where
        F: for<'a> FnMut(&'a T) -> V,
        V: Ord,
    {
        if self.ordering == Ordering::Equal {
            self.ordering = f(&self.a).cmp(&f(&self.b));
        }
        self
    }

    fn finish(self) -> Ordering {
        self.ordering
    }
}
This can be used like
struct Thing {
    a: u8,
}

impl Thing {
    fn b(&self) -> u8 {
        println!("I'm slow!");
        42
    }
}

fn main() {
    let a = Thing { a: 0 };
    let b = Thing { a: 1 };

    let res = OrdBuilder::new(&a, &b)
        .compare_with(|x| x.a)
        .compare_with(|x| x.b())
        .finish();

    println!("{:?}", res);
}
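As a side note, and only as a sketch rather than part of the builder above, std's Ordering::then_with gives the same short-circuiting without a custom type; Thing, its field a, and its slow method b are the ones defined above:

fn compare_things(x: &Thing, y: &Thing) -> std::cmp::Ordering {
    x.a.cmp(&y.a)
        // The closure only runs when the first comparison is Equal,
        // so the slow b() call is skipped otherwise.
        .then_with(|| x.b().cmp(&y.b()))
}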
I'm trying to understand nested structs in Go, so I made a little test: (playground)
package main

import "fmt"

type A struct {
    a string
}

type B struct {
    A
    b string
}

func main() {
    b := B{A{"a val"}, "b val"}
    fmt.Printf("%T -> %v\n", b, b) // B has a nested A and some values
    // main.B -> {{a val} b val}
    fmt.Println("b.b ->", b.b) // B's own value
    // b.b -> b val
    fmt.Println("b.A.a ->", b.A.a) // B's nested value
    // b.A.a -> a val
    fmt.Println("b.a ->", b.a) // B's nested value? or own value?
    // b.a -> a val
}
So how and why do the last two lines work? Are they the same? Which should I use?
They are the same. See the Go Spec on selectors:
For a value x of type T or *T where T is not a pointer or interface type, x.f denotes the field or method at the shallowest depth in T where there is such an f. If there is not exactly one f with shallowest depth, the selector expression is illegal.
Note that this means that b.a is illegal if type B embeds two types with the same field at the same depth:
type A1 struct{ a string }
type A2 struct{ a string }

type B struct {
    A1
    A2
}

// ...
b := B{A1{"a1"}, A2{"a2"}}
fmt.Println(b.a) // Error: ambiguous selector b.a
Playground: http://play.golang.org/p/PTqm-HzBDr.
I'm trying to define a recursive data structure in Rust, but there are some pieces missing in my understanding of Rust and memory - the only thing I manage to do is pick a fight with the borrow checker.
I have the following stub of a quad tree and want to project one of the quadrants as follows.
use std::rc::Rc;

use CC::{Node, Leaf};

enum CC {
    Node(i32, bool, i32, Rc<CC>, Rc<CC>, Rc<CC>, Rc<CC>),
    Leaf(bool),
}

impl CC {
    fn nw(&self) -> CC {
        match *self {
            Node(_, _, _, ref nw, _, _, _) => *nw.clone(),
            _ => panic!(),
        }
    }
}
But all I end up with is
src/hashlife.rs:34:47: 34:58 error: cannot move out of borrowed content
src/hashlife.rs:34 Node(_, _, _, ref nw, _, _, _) => *nw.clone(),
^~~~~~~~~~~
You have two options here.
First, you can return a reference to the subtree:
fn nw(&self) -> &CC {
    match *self {
        Node(_, _, _, ref nw, _, _, _) => &**nw,
        _ => panic!(),
    }
}
Second, you can return a reference-counted pointer:
fn nw(&self) -> Rc<CC> {
    match *self {
        Node(_, _, _, ref nw, _, _, _) => nw.clone(),
        _ => panic!(),
    }
}
You can't return just CC, however, unless you are willing to clone the value itself. The reason is that returning it by value would mean moving it out of the Rc, which may be shared with other owners and must be left intact, so the compiler rightly prohibits it.
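If an owned CC is really needed, a hedged sketch (assuming CC is made to implement Clone, which the stub above does not) would clone the node itself:

fn nw_owned(&self) -> CC {
    match *self {
        // Clone the CC behind the Rc instead of moving it out.
        Node(_, _, _, ref nw, _, _, _) => (**nw).clone(),
        _ => panic!(),
    }
}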
Here's a swap function for two-element tuples:
fn swap<A, B>(obj: (A, B)) -> (B, A) {
    let (a, b) = obj;
    (b, a)
}
Example use:
fn main() {
    let obj = (10i32, 20i32);
    println!("{:?}", swap(obj));
}
Is there a way to define swap as a method on two-element tuples? I.e. so that it may be called like:
(10i32, 20i32).swap()
Yes, there is. Just define a new trait and implement it immediately, something like this:
trait Swap<U> {
    fn swap(self) -> U;
}

impl<A, B> Swap<(B, A)> for (A, B) {
    #[inline]
    fn swap(self) -> (B, A) {
        let (a, b) = self;
        (b, a)
    }
}

fn main() {
    let t = (1u32, 2u32);
    println!("{:?}", t.swap());
}
Note that in order to use this method you will have to import the Swap trait into every module where you want to call it.
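For example, a hedged sketch (the module name is made up) of calling it from another module in the same crate:

mod other {
    // Without this import, (1u32, 2u32).swap() would not resolve here.
    use crate::Swap;

    pub fn demo() -> (u32, u32) {
        (1u32, 2u32).swap()
    }
}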