How does Cons save data (Rust Linked List) - data-structures

I have been learning Rust and decided that implementing a basic data type would help me learn the language at a deeper level.
The "Rust by Example" page for linked lists states
Ln 4: // Cons: Tuple struct that wraps an element and a pointer to the next node
Which I think means that it is recursively creating the list by always making an empty node to populate with Cons.
enum linkedList
{
Head(Head), // Front Pointer and list metrics
Cons(Arc<linkedList>, isize), //(Data, Next Value) Cons is apparently a LISP construct
Tail(isize), // Rear Pointer
Nil //Used to drop the stream
}
My real question is what is the underlying mechanism that allows data to be stored in the Arc<linkedList> node? I thought it would take a generic (<T>) to store the data on the list but apparently this is incorrect.
P.S. I am under the impression that the Arc and Box smart pointers are interchangeable but used for different purposes. I was trying to make a thread-safe version of a single-ended, rollover-safe linked list, sort of like a circular queue.

Your implementation deviates slightly from the standard definition of Cons lists. The straightforward definition (similar to how you would do it in Lisp) is
type Data = isize;
enum List {
Nil,
Cons(Box<List>, Data),
}
As you can see, the Cons variant is made up of the nested list and this node's data element. In your case, each node has an isize.
If you have an Arc<List> or Box<List>, this nested List object could be the Cons variant too and carry another isize. Arc and Box don't care about what they are pointing to.
There are some things that are not quite idiomatic. Having both the Tail and Nil variants doesn't make much sense, because now you have two ways to signal the end of the list. Similarly, having Head be a variant of the list is strange: the head of a list exists only at the beginning, yet your definition allows a Head variant in the middle of the list.
It is preferable to not have an extra Nil node to signal the end of the list. Instead, the last node knows that it is the last (like your Tail variant) so that you don't have an extra allocation for the empty node. This is what I would perceive as an idiomatic definition of a singly linked list in Rust:
struct List {
// ...
// extra info for list head
// ...
first_node: Option<Box<ListNode>>,
}
struct ListNode {
value: Data,
next: Option<Box<ListNode>>,
}
To make this generic, we simply have to remove our type alias Data from earlier and make it a generic parameter instead:
struct ListNode<T> {
value: T,
next: Option<Box<ListNode<T>>>,
}
About Box and Arc: Box is an owning pointer to a value on the heap, which means only this one Box owns the memory it points to. Arc is a thread-safe version of Rc, which is a reference-counted pointer to a heap value. Multiple Rc pointers to the same memory can exist and read it, and the number of these pointers is counted. Thus, the ownership may be shared.
You should pick which to use, based on whether or not you want to create more reference-counted pointers to the nodes (note that you probably want either Rc<RefCell<Node>> or Arc<Mutex<Node>> to also mutate the list after creating it). If you only ever have the head of the list and want to iterate over it, pick Box instead.

Related

Java-8 stream expression to 'OR' several enum values together

I am aggregating a bunch of enum values (different from the ordinal values) in a foreach loop.
int output = 0;
for (TestEnum testEnum: setOfEnums) {
output |= testEnum.getValue();
}
Is there a way to do this in streams API?
If I use a lambda like this in a Stream<TestEnum> :
setOfEnums.stream().forEach(testEnum -> output |= testEnum.getValue());
I get a compile time error that says, 'variable used in lambda should be effectively final'.
Predicate represents a boolean-valued function, so it is not the right tool here. You need to use the reduce method of Stream to aggregate the enum values.
Assuming you have a HashSet named SetOfEnums:
//int initialValue = 0; //this is effectively final for next stream pipeline if you wont modify this value in that stream
final int initialValue = 0;//final
int output = SetOfEnums.stream()
.map(TestEnum::getValue)
.reduce(initialValue, (e1,e2)-> e1|e2);
You need to reduce the stream of enums like this:
int output = Arrays.stream(TestEnum.values()).mapToInt(TestEnum::getValue).reduce(0, (acc, value) -> acc | value);
I like the recommendations to use reduction, but perhaps a more complete answer would illustrate why it is a good idea.
In a lambda expression, you can reference variables like output that are in scope where the lambda expression is defined, but you cannot modify the values. The reason for that is that, internally, the compiler must be able to implement your lambda, if it chooses to do so, by creating a new function with your lambda as its body. The compiler may choose to add parameters as needed so that all of the values used in this generated function are available in the parameter list. In your case, such a function would definitely have the lambda's explicit parameter, testEnum, but because you also reference the local variable output in the lambda body, it could add that as a second parameter to the generated function. Effectively, the compiler might generate this function from your lambda:
private void generatedFunction1(TestEnum testEnum, int output) {
output |= testEnum.getValue();
}
As you can see, the output parameter is a copy of the output variable used by the caller, and the OR operation would only be applied to the copy. Since the original output variable wouldn't be modified, the language designers decided to prohibit modification of values passed implicitly to lambdas.
To get around the problem in the most direct way, setting aside for the moment that the use of reduction is a far better approach, you could wrap the output variable in a wrapper (e.g. an int[] array of size 1 or an AtomicInteger). The wrapper's reference would be passed by value to the generated function, and since you would now update the contents of output, not the value of output, output remains effectively final, so the compiler won't complain. For example:
AtomicInteger output = new AtomicInteger();
setOfEnums.stream().forEach(testEnum -> output.set(output.get() | testEnum.getValue()));
or, since we're using AtomicInteger, we may as well make it thread-safe in case you later choose to use a parallel Stream,
AtomicInteger output = new AtomicInteger();
setOfEnums.stream().forEach(testEnum -> (output.getAndUpdate(prev -> prev | testEnum.getValue())));
Now that we've gone over an answer that most resembles what you asked about, we can talk about the superior solution of using reduction, that other answers have already recommended.
There are two kinds of reduction offered by Stream, stateless reduction (reduce()) and stateful reduction (collect()); the Java documentation calls these reduction and mutable reduction. To visualize the difference, consider a conveyor belt delivering hamburgers, and your goal is to collect all of the hamburger patties into one big hamburger. With stateful reduction, you would start with a new hamburger bun, and then collect the patty out of each hamburger as it arrives, and you add it to the stack of patties in the hamburger bun you set up to collect them. In stateless reduction, you start out with an empty hamburger bun (called the "identity", since that empty hamburger bun is what you end up with if the conveyor belt is empty), and as each hamburger arrives on the belt, you make a copy of the previous accumulated burger and add the patty from the new one that just arrived, discarding the previous accumulated burger.
The stateless reduction may seem like a huge waste, but there are cases when copying the accumulated value is very cheap. One such case is when accumulating primitive types -- primitive types are very cheap to copy, so stateless reduction is ideal when crunching primitives in applications such as summing, ORing, etc.
So, using stateless reduction, your example might become:
int output = setOfEnums.stream()
.mapToInt(TestEnum::getValue) // or .mapToInt(testEnum -> testEnum.getValue())
.reduce(0, (resultSoFar, value) -> resultSoFar | value);
Some points to ponder:
Your original for loop is probably faster than using streams, except perhaps if your set is very large and you use parallel streams. Don't use streams for the sake of using streams. Use them if they make sense.
In my first example, I showed the use of Stream.forEach(). If you ever find yourself creating a Stream and just calling forEach(), it is more efficient just to call forEach() on the collection directly.
You didn't mention what kind of Set you are using, but I hope you are using EnumSet<TestEnum>. Because it is implemented as a bit vector, its basic operations run in constant time, and it typically performs much better than any other kind of Set, even for copying. EnumSet.noneOf(TestEnum.class) creates an empty Set, EnumSet.allOf(TestEnum.class) gives you a set of all enum values, etc.

Is c++11 operator[] equivalent to emplace on map insertion?

For C++11, is there still a performance difference between the following?
(for std::map<Foo, std::vector<Bar> > as an example)
map[key] = myVector and map.emplace(key, myVector)
The part I'm not figuring out is the exact internal of operator[]. My understanding so far has been (when key doesn't exist):
Create a new key and the associated empty default vector in place inside the map
Return the reference of the associated empty vector
Assign myVector to the reference???
The point 3 is the part I couldn't understand, how can you assign a new value to a reference in the first place?
Though I cannot sort through point 3 I think somehow there's just a copy/move required. Assuming C++11 will be smart enough to know it's gonna be a move operation, is this whole "[]" assignment then already cheaper than insert()? Is it almost equivalent to emplace()? ---- default construction and move content over, versus construct vector with content directly in place?
There are a lot of differences between the two.
If you use operator[] and the key is absent, the map will default construct the value. operator[] then returns a reference to this default-constructed object, and operator= assigns to it through that reference.
If you use emplace, the map will directly construct the value with the parameters you provide.
So the operator[] method will always use two-stage construction. If the default constructor is slow, or if copy/move construction is faster than copy/move assignment, then it could be problematic.
However, emplace will not replace the value if the provided key already exists. Whereas operator[] followed by operator= will always replace the value, whether there was one there or not.
There are other differences too. If copying/moving throws, emplace guarantees that the map will not be changed. By contrast, operator[] will always insert a default-constructed element when the key is absent. So if the later copy/move assignment fails, then the map has already been changed. That key will exist with a default-constructed mapped_type.
Really, performance is not the first thing you should be thinking about when deciding which one to use. You need to focus first on whether it has the desired behavior.
C++17 provides insert_or_assign, which has the effect of map[] = v;, but with the exception safety of insert/emplace.
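The behavioral difference described above can be sketched in a few lines. This is only an illustration: the helper compare_insertion and the struct InsertionResult are hypothetical names, and the map uses std::string keys instead of the question's Foo to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of what each insertion style leaves behind.
struct InsertionResult {
    bool first_emplace_inserted;
    bool second_emplace_inserted;
    std::size_t size_after_emplace;
    std::size_t size_after_subscript;
};

// Hypothetical helper: inserts under the same key with emplace, then operator[].
InsertionResult compare_insertion() {
    std::map<std::string, std::vector<int>> m;
    InsertionResult r{};

    // Key is absent: emplace constructs the element in place.
    r.first_emplace_inserted = m.emplace("k", std::vector<int>{1, 2, 3}).second;

    // Key is present: emplace leaves the stored value untouched.
    r.second_emplace_inserted = m.emplace("k", std::vector<int>{9}).second;
    r.size_after_emplace = m["k"].size();

    // operator[] followed by operator= always replaces the stored value.
    m["k"] = std::vector<int>{9};
    r.size_after_subscript = m["k"].size();
    return r;
}
```

After the second emplace the vector still holds three elements; only the assignment through operator[] replaces it.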
how can you assign a new value to a reference in the first place?
It's fundamentally no different from assigning to any non-const reference:
int i = 5;
int &j = i;
j = 30;
i == 30; //This is true.

Move assignment operator, move constructor

I've been trying to nail down the rule of 5, but most of the information online is vastly over-complicated, and the example codes differ.
Even my textbook doesn't cover this topic very well.
On move semantics:
Templates, rvalues and lvalues aside, as I understand it, move semantics are simply this:
int other = 0; //Initial value
int number = 3; //Some data
int *pointer1 = &number; //Source pointer
int *pointer2 = &other; //Destination pointer
pointer2 = pointer1; //Both pointers now point to same data
pointer1 = nullptr; //Pointer1 now points to nothing
//The reference to 'data' has been 'moved' from pointer1 to pointer2
As opposed to copying, which would be the equivalent of something like this:
pointer1 = &number; //Reset pointer1
int newnumber = 0; //New address for the data
newnumber = *pointer1; //Address is assigned value
pointer2 = &newnumber; //Assign pointer to new address
//The data from pointer1 has been 'copied' to pointer2, at the address 'newnumber'
No explanation of rvalues, lvalues or templates is necessary, I would go as far as to say those topics are unrelated.
The fact that the first example is faster than the second, should be a given. And I would also point out that any efficient code prior to C++ 11 will do this.
To my understanding, the idea was to bundle all of this behavior in a neat little operator move() in std library.
When writing copy constructors and copy assignment operators, I simply do this:
Text::Text(const Text& copyfrom) {
data = nullptr; //The object is empty
*this = copyfrom;
}
const Text& Text::operator=(const Text& copyfrom) {
if (this != &copyfrom) {
filename = copyfrom.filename;
entries = copyfrom.entries;
delete[] data; //Release the old array; deleting nullptr is a no-op
data = new std::string[entries];
for (int i = 0; i < entries; i++) {
data[i] = copyfrom.data[i];
//std::cout << data[i];
}
std::cout << "Data is assigned" << std::endl;
}
return *this;
}
The equivalent, one would think, would be this:
Text::Text(Text&& movefrom){
*this = movefrom;
}
Text&& Text::operator=(Text&& movefrom) {
if (&movefrom != this) {
filename = movefrom.filename;
entries = movefrom.entries;
data = movefrom.data;
if (data != nullptr) {
delete[] data;
}
movefrom.data = nullptr;
movefrom.entries = 0;
}
return std::move(*this);
}
I'm quite certain this won't work, so my question is: How do you achieve this type of constructor functionality with move semantics?
It's not entirely clear to me what your code examples are supposed to prove, or what the focus of this question is.
Is it conceptually what does the phrase 'move semantics' mean in C++?
Is it "how do I write move ctors and move assignment operators?" ?
Here is my attempt to introduce the concept. If you want to see code examples then look at any of the other SO questions that were linked in comments.
Intuitively, in C and C++ an object is supposed to represent a piece of data residing in memory. For any number of reasons, commonly you want to send that data somewhere else.
Often one can take a direct approach of simply passing a pointer / reference to the object to the place where the data is needed. Then, it can be read using the pointer. Taking the pointer and moving the pointer around is very cheap, so this is often very efficient. The chief drawback is that you have to ensure that the object will live for as long as is needed, or you get a dangling pointer / reference and a crash. Sometimes that's easy to ensure, sometimes it's not.
When it isn't, one obvious alternative is to make a copy and pass it (pass-by-value) rather than passing by reference. When the place where the data is needed has its own personal copy of the data, it can ensure that the copy stays around as long as is needed. The chief drawback here is that you have to make a copy, which may be expensive if the object is big.
A third alternative is to move the object rather than copying it. When an object is moved, it is not duplicated, and instead becomes available exclusively in the new site, and no longer in the old site. You can only do this when you won't need it at the old site anymore, obviously, but in that case this saves you a copy which can be a big savings.
When the objects are simple, all of these concepts are fairly trivial to actually implement and get right. For instance, when you have a trivial object, that is, one with trivial construction / destruction, it is safe to copy it exactly as you do in the C programming language, using memcpy. memcpy produces a byte-for-byte copy of a block of bytes. If a trivial object was properly initialized, since its creation has no possible side-effects, and its later destruction doesn't either, then memcpy copy is also properly initialized and results in a valid object.
However, in modern C++ many of your objects are not trivial -- they may "own" references to heap memory, and manage this memory using RAII, which ties the lifetime of the object to the usage of some resource. For instance, if you have a std::string as a local variable in a function, the string is not totally a "contiguous" object and rather is connected to two different locations in memory. There is a small, fixed-size (sizeof(std::string), in fact) block on the stack, which contains a pointer and some other info, pointing to a dynamically sized buffer on the heap. Formally, only the small "control" part is the std::string object, but intuitively from the programmer's point of view the buffer is also "part" of the string and is the part that you usually think about. You can't copy a std::string object like this using memcpy -- think about what will happen if you have std::string s and you try to copy sizeof(std::string) bytes from address &s to get a second string. Instead of two distinct string objects, you'll end up with two control blocks, each pointing to the same buffer. And when the first one is destroyed, that buffer is deleted, so using the second one will cause a segfault, or when the second one is destroyed, you get a double delete.
Generally, copying nontrivial C++ objects with memcpy is illegal and causes undefined behavior. This is because it conflicts with one of the core ideas of C++ which is that object creation and destruction may have nontrivial consequences defined by the programmer using ctors and dtors. Object lifetimes may be used to create and enforce invariants which you use to reason about your program. memcpy is a "dumb" low-level way to just copy some bytes -- potentially it bypasses the mechanisms that enforce the invariants which make your program work, which is why it can cause undefined behavior if used incorrectly.
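As a rough sketch of where that line falls, the standard trait std::is_trivially_copyable reports whether a type may be copied with memcpy. The struct Point and the function copy_with_memcpy are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical trivial type: no user-defined constructors or destructor.
struct Point {
    int x;
    int y;
};

// Point may be copied byte-for-byte; std::string may not.
static_assert(std::is_trivially_copyable<Point>::value,
              "Point is safe to memcpy");
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string must be copied through its copy constructor");

// Well-defined only because Point is trivially copyable.
Point copy_with_memcpy(const Point& src) {
    Point dst;
    std::memcpy(&dst, &src, sizeof(Point));
    return dst;
}
```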
Instead, in C++ we have copy constructors which you can use to safely make copies of nontrivial objects. You should write these in a way that preserves what invariants you need for your object. The rule of three is a guideline about how to actually do that.
The C++11 "move semantics" idea is a collection of new core language features which were added to extend and refine the traditional copy construction mechanism from C++98. Specifically, it's about, how do we move potentially complex RAII objects, not just trivial objects, which we already were able to move. How do we make the language generate move constructors and such for us automatically when possible, similarly to how it does it for copy constructors. How do we make it use the move options when it can to save us time, without causing bugs in old code, or breaking core assumptions of the language. (This is why I would say that your code example with int's and int *'s has little to do with C++11 move semantics.)
The rule of five, then, is the corresponding extension of the rule of three which describes conditions when you may need to implement a move ctor / move assignment operator also for a given class and not rely on the default behavior of the language.
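For the Text class from the question, the five special members might look roughly like the sketch below. The member names filename, entries and data follow the question. The sized constructor, size(), raw() and operator[] are assumptions added only so the example is self-contained. It is one possible arrangement, not the only correct one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class Text {
public:
    Text() = default;

    // Assumed constructor: allocates n empty entries.
    explicit Text(std::size_t n)
        : entries(n), data(n ? new std::string[n] : nullptr) {}

    ~Text() { delete[] data; }

    // Copy constructor: allocates a separate array and copies each entry.
    Text(const Text& other)
        : filename(other.filename),
          entries(other.entries),
          data(other.entries ? new std::string[other.entries] : nullptr) {
        for (std::size_t i = 0; i < entries; ++i) data[i] = other.data[i];
    }

    // Move constructor: takes over the array and leaves the source empty.
    Text(Text&& other) noexcept
        : filename(std::move(other.filename)),
          entries(other.entries),
          data(other.data) {
        other.entries = 0;
        other.data = nullptr;
    }

    // Copy assignment: copy into a temporary, then move-assign from it.
    Text& operator=(const Text& other) {
        if (this != &other) *this = Text(other);
        return *this;
    }

    // Move assignment: release our array first, then take over the source's.
    Text& operator=(Text&& other) noexcept {
        if (this != &other) {
            delete[] data;
            filename = std::move(other.filename);
            entries = other.entries;
            data = other.data;
            other.entries = 0;
            other.data = nullptr;
        }
        return *this;
    }

    // Assumed accessors, present only to make the sketch observable.
    std::size_t size() const { return entries; }
    const std::string* raw() const { return data; }
    std::string& operator[](std::size_t i) { return data[i]; }

private:
    std::string filename;
    std::size_t entries = 0;
    std::string* data = nullptr;
};
```

Compared with the attempt in the question, the move assignment operator returns Text& rather than Text&&, deletes the old array before taking the new pointer rather than after, and the move constructor initializes its members directly instead of assigning from an lvalue (which would select the copy assignment operator).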

using new vs. { } when initializing a struct in Go

I know that in Go you can initialize a struct in two different ways. One is the built-in new function, which returns a pointer to the struct in memory. The other is a composite literal with { }. My question is: when is it appropriate to use each?
Thanks
I prefer {} when the full value of the type is known and new() when the value is going to be populated incrementally.
In the former case, adding a new parameter may involve adding a new field initializer. In the latter it should probably be added to whatever code is composing the value.
Note that the &T{} syntax is only allowed when T is a struct, array, slice or map type.
Going off of what @Volker said, it's generally preferable to use &A{} for pointers (and this doesn't necessarily have to be zero values: if I have a struct with a single integer in it, I could do &A{1} to initialize the field). This is mostly a stylistic concern. Note that neither form forces a heap allocation: new(A) and &A{} are equivalent for a zero value, and in both cases, if the Go compiler can prove that the pointer never escapes the function, it allocates the struct as a local variable on the stack, which is much cheaper than a heap allocation.
Most people use A{} to create a zero value of type A, and &A{} to create a pointer to a zero value of type A. Using new is only necessary for types such as int, because int{} is not valid.

Iterator nested typedefs

I'm creating a custom iterator type, and the only use case right now is std::for_each. But apparently, it's not enough to mimic the pointer interface (I'm only doing forward iteration), there are like, a bajillion nested typedefs. I managed to figure out what to put for iterator_category, but I'm having real trouble figuring out what value_type and pointer and reference should be, because, y'know, I'm not building a container here, it's an iterator. Why would for_each even want to know or care? All it's going to do is forward said on to another function.
If you want to use a type T as an iterator, you must ensure that std::iterator_traits can be specialized for that type. That means you either need to provide the five nested typedefs that it defers to by default, or you need to specialize std::iterator_traits yourself. The five nested typedefs it requires are
difference_type, which is some type that can represent the distance between two iterators (e.g., as would be returned by std::distance)
value_type, which is the type of the object pointed to by the iterator
pointer, which is the return type of the iterator type's operator->. This doesn't necessarily need to be a pointer type and it doesn't necessarily need to be value_type* or value_type const*. For example, if you have an iterator that generates elements, you may not have an object to which you can return a pointer. In that case, you might return an object that wraps the returned element and overloads operator-> itself.
reference, which is the return type of the iterator type's operator*. This doesn't necessarily need to be a reference type and it doesn't necessarily need to be value_type& (or value_type const&). For example, if you're iterating over an immutable range of integers, you might just return the element by value, for performance reasons.
iterator_category, which must be one of the iterator category tags or a type derived from one of those tags: input_iterator_tag, output_iterator_tag, forward_iterator_tag, bidirectional_iterator_tag, and random_access_iterator_tag (all in namespace std). Algorithms can use these to select an optimal algorithm based on the iterator category.
You can't omit any of these; they all have to be defined. That said, sometimes one or more of the typedefs may not make sense. For example, if you have an iterator that generates char elements on the fly, your iterator may not implement operator-> (because char is not a class type). In this case, you might consider just using void for the pointer type, since it should never be used anyway.
value_type is what your iterator iterates over. If iter is an iterator, it's the type of *iter. pointer is the pointer to that, and reference is the reference to that.
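A minimal sketch of an iterator that supplies all five typedefs and works with std::for_each could look like this. CountingIterator and sum_range are hypothetical names. The iterator generates ints on the fly, so it returns elements by value and uses void for pointer, as suggested above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical iterator that generates consecutive ints on the fly.
class CountingIterator {
public:
    using difference_type = std::ptrdiff_t;
    using value_type = int;
    using pointer = void;    // no operator->: int is not a class type
    using reference = int;   // elements are returned by value
    using iterator_category = std::input_iterator_tag;

    explicit CountingIterator(int v) : value_(v) {}

    reference operator*() const { return value_; }
    CountingIterator& operator++() { ++value_; return *this; }
    CountingIterator operator++(int) {
        CountingIterator old(*this);
        ++value_;
        return old;
    }
    bool operator==(const CountingIterator& o) const { return value_ == o.value_; }
    bool operator!=(const CountingIterator& o) const { return !(*this == o); }

private:
    int value_;
};

// std::iterator_traits picks up the nested typedefs without a specialization.
static_assert(
    std::is_same<std::iterator_traits<CountingIterator>::value_type, int>::value,
    "value_type is taken from the nested typedef");

// Hypothetical helper: sums the half-open range [first, last) via std::for_each.
int sum_range(int first, int last) {
    int total = 0;
    std::for_each(CountingIterator(first), CountingIterator(last),
                  [&total](int v) { total += v; });
    return total;
}
```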

Resources