Why does Vec have trait Sync?

According to the documentation, Vec<T> implements Sync if T implements Sync. It seems to be generated automatically by some magic, but I find this counter-intuitive, since a naive vector implementation is not thread-safe.
Is Vec<T> in Rust really Sync?

Implementing Sync means that a type guarantees that references to its values can be shared between threads without risk of a data race in safe Rust (formally, T is Sync exactly when &T is Send). Sync is also an auto trait: the compiler implements it automatically for any type whose fields are all Sync, which is the "magic" you noticed.
A &Vec<T> only lets you read the vector and hand out &T references to its elements, so sharing it is safe as long as sharing &T is safe, which is exactly what the T: Sync bound requires. The borrow checker already forbids a mutable reference from coexisting with any other reference to the same object, so this follows automatically from Rust's borrowing rules: nothing can mutate a Vec while it is shared, so a data race is impossible. Of course, if unsafe code comes into the picture, these guarantees are gone.
In fact, most types are Sync. The ones that aren't (for example RefCell) tend to have interior mutability, or otherwise manage references outside the control of the compile-time borrow checker.
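To see both halves of this in practice, here is a minimal sketch (the function names are made up for illustration). An empty generic function asserts Sync at compile time, and two scoped threads read the same Vec through a shared reference. The spawned closures compile only because Vec<i32> is Sync; the commented-out RefCell assertion would be a compile error.

```rust
use std::thread;

// Compiles only for types that implement Sync; it does nothing at runtime.
fn assert_sync<T: Sync>() {}

// Two scoped threads read the same Vec through a shared reference.
// The closures capture a shared reference to the Vec, which is Send
// only because Vec<i32>: Sync.
fn count_evens_and_odds(data: &Vec<i32>) -> (usize, usize) {
    thread::scope(|s| {
        let evens = s.spawn(|| data.iter().filter(|&&x| x % 2 == 0).count());
        let odds = s.spawn(|| data.iter().filter(|&&x| x % 2 != 0).count());
        // Both threads are joined before the scope ends, so the borrow stays valid.
        (evens.join().unwrap(), odds.join().unwrap())
    })
}

fn main() {
    assert_sync::<Vec<i32>>();
    // assert_sync::<std::cell::RefCell<i32>>(); // error: RefCell<i32> is not Sync

    let v = vec![1, 2, 3, 4, 5];
    let (evens, odds) = count_evens_and_odds(&v);
    println!("{evens} evens, {odds} odds");
}
```

Neither thread can mutate the Vec here. Mutation would need a &mut Vec<i32>, and the borrow checker refuses to hand one out while the shared borrows are live.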

Related

ARM-SVE: wrapping runtime sized register

In the generic SIMD library eve, we were looking into supporting length-agnostic SVE.
However, we cannot wrap a sizeless register type in a struct to do some meta-programming around it.
struct foo {
svint8_t a;
};
Is there a way to do this in either Clang or GCC?
I found some talk of __sizeless_struct and some patches floating around, but I don't think they went anywhere.
I also found these GCC tests, none of which wrap a register in a struct.
No, unfortunately this isn't possible (at the time of writing). __sizeless_struct was an experimental feature that Arm added as part of the initial downstream implementation of the SVE ACLE in Clang. Its main purpose was to allow tuple types like svfloat32x3_t to be defined directly in <arm_sve.h>. But the feature had complex, counter-intuitive semantics. It broke one of the fundamental rules of C++, namely that all class objects have a constant size, so it would have been an ongoing maintenance burden for upstream compilers.
__sizeless_struct (or something like it) probably wouldn't be acceptable for a portable SIMD framework, since the sizeless struct would inherit all of the restrictions of sizeless vector types: no global variables, no uses in normal structs, etc. Either all SIMD targets would have to live by those restrictions, or the restrictions would vary by target (limiting portability).
Function-based abstraction might be a better starting point than class-based abstraction for SIMD frameworks that want to support variable-length vectors. Google Highway is an example of this and it works well for SVE.

Is Rust-style ownership and lifetimes possible without Rust-style borrow checking?

Would it be possible for a programming language to consistently have Rust-style ownership and lifetimes (for automatic memory management) while dropping the requirement that only one mutable reference to a piece of data can exist at any time (used to suppress data races)?
In other words, are Rust-style ownership and lifetimes and Rust-style borrow checking two separable concepts? Alternatively, are these two ideas inherently entangled at a semantic level?
Would it be possible for a programming language to consistently have Rust-style ownership and lifetimes (for automatic memory management) while dropping the requirement that only one mutable reference to a piece of data can exist at any time (used to suppress data races)?
A language can do anything, so sure.
The problem is that dropping this requirement would turn a language like Rust into a nest of undefined behavior. If mutable references are no longer unique, they have no purpose, so you just have references (always mutable) whose only property is being lexically scoped. This means you can hold a reference to a sub-part of an object while a second reference mutates the object so that the sub-part is invalidated (e.g. a reference to a Vec item while the Vec is cleared[0]), and the first reference now dangles: it points to garbage.
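Rust's borrow checker rejects exactly this pattern. In the sketch below (the function name is made up for illustration), the rejected version is left as a comment because it does not compile. The accepted version copies the element out before the Vec is mutated, so no reference outlives the mutation.

```rust
// Returns the first element of `v`, then clears `v`.
fn take_first_then_clear(v: &mut Vec<i32>) -> i32 {
    // Rejected by the borrow checker (error E0502):
    //     let first = &v[0];   // shared borrow of the Vec's contents
    //     v.clear();           // mutable borrow while `first` is still live
    //     return *first;       // `first` would point at an element that no longer exists
    //
    // Accepted: copy the value out, so no borrow is live during the mutation.
    let first = v[0];
    v.clear();
    first
}

fn main() {
    let mut v = vec![10, 20, 30];
    let first = take_first_then_clear(&mut v);
    println!("first = {first}, len after clear = {}", v.len());
}
```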
The way to solve that would be to… add a GC? And from that point on, the value of "Rust-style ownership and references" becomes limited to nonexistent, because you need non-lexical automated memory management (a GC) and your references can keep objects alive, so having all types be affine by default isn't very useful.
Now what can be useful (and what some languages explore) is for sub-normal (i.e. substructural) types to be opt-in: types would be normal (unrestricted) by default, but could be opted into being affine, linear, or even ordered as needed. This would be solely a type safety measure.
If so, are there any existing languages which achieve this?
Not to my knowledge.
If not, why not?
Because nobody's written one? Affine types by default are useful to Rust, but they're not very useful in general, so most of the research and design has focused on linear types, which provide more guarantees and are therefore more useful if only a small subset of your types are going to be sub-normal.
[0] This shows that "data races" are not solely about concurrency: they are an entire class of problems that also occur commonly in sequential code (e.g. iterator invalidation).

Is a unique_ptr equipped with a function pointer as custom deleter the same size as a shared_ptr?

I know that std::unique_ptr and std::shared_ptr are different classes that address different needs, and therefore asking which one is better is mostly an ill-posed question.
However, regarding their contents and performance, and setting aside the different semantics of the two smart pointers, I have some doubts I want to clarify.
My understanding is that a std::unique_ptr contains the raw pointer as its only member variable and stores the deleter, if a custom one is given, as part of the type, whereas a std::shared_ptr stores in its member variables the raw pointer as well as a pointer to a dynamically allocated control block, which contains the custom deleter and the strong and weak counters.
Scott Meyers, in Effective Modern C++, places a lot of emphasis on this difference in the size the two smart pointers require (and on the difference in performance in general). However, I'd be tempted to say that as soon as a std::unique_ptr is given a function pointer as custom deleter, it becomes as big as a std::shared_ptr, whose size does not increase with the custom deleter.
From there, I would deduce that using a function pointer as a custom deleter for std::unique_ptr basically annihilates the advantage this smart pointer has on the other one, in terms of size/performance.
Is this the case?
It is true on common implementations (though I doubt the standard requires it) that
static_assert(sizeof(std::unique_ptr<int, void(*)(int*)>) == sizeof(std::shared_ptr<int>));
but I don't agree with the conclusion that storing a function pointer as the custom deleter renders the advantages of std::unique_ptr over std::shared_ptr useless. Both types model very different ownership semantics, and choosing one over the other has not so much to do with performance, but rather with how you intend to handle the pointee instances.
Performance-wise, std::unique_ptr will always be more efficient than std::shared_ptr. While this is primarily due to the latter's thread-safe (atomic) reference counting, it is also true for custom deleters:
std::unique_ptr stores the deleter in place, i.e. inside the unique_ptr object itself (on the stack, if that is where the unique_ptr lives). Invoking this deleter is likely to be faster than invoking one that lives in a heap-allocated control block together with the reference counts.
std::unique_ptr also doesn't type-erase the deleter. A lambda that is baked into the type is likely to be more efficient than the indirection std::shared_ptr needs to hide the deleter type.

Is there any technique to implement a cache in a fully immutable programming language?

I checked Haskell but even there they are using mutable data types internally.
Assuming everything is immutable, is there a way to have a cache?
Edit: Assume this is a general-purpose cache which is supposed to keep the result of a computation (e.g. reading from a DB).
I came across this post trying to do something similar to what the OP wanted to do, though I'm using Scala. In Scala, variables and objects can be mutable, so you can always use mutable types if you want.
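But if you don't want mutable types, in Haskell you could use the State monad combined with Data.Map to model a stateful cache using only immutable data: every lookup or insertion takes the current map and yields a new one, and State threads that map through the computation for you. In Scala you could use the State monad in Cats with an immutable map.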
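Rust has no State monad or persistent map in its standard library, so the following is only a rough analogue of the Haskell approach. It uses the same "state in, (result, new state) out" shape as State: each call takes the cache by value and hands back the result together with the updated cache, so there is no shared mutable cell anywhere. The recursive Fibonacci function is a hypothetical stand-in for an expensive computation such as a DB read. Note that insert does mutate the map in place, but only after ownership has been moved into the call, so no other code can observe the change.

```rust
use std::collections::HashMap;

type Cache = HashMap<u64, u64>;

// State-style step: cache in, (result, updated cache) out.
// `fib` is a hypothetical stand-in for an expensive computation.
fn fib(n: u64, cache: Cache) -> (u64, Cache) {
    if n < 2 {
        return (n, cache);
    }
    // Cache hit: return the stored value and the unchanged cache.
    if let Some(hit) = cache.get(&n).copied() {
        return (hit, cache);
    }
    // Cache miss: thread the cache through both sub-computations in order.
    let (a, cache) = fib(n - 1, cache);
    let (b, mut cache) = fib(n - 2, cache);
    cache.insert(n, a + b);
    (a + b, cache)
}

fn main() {
    let (first, cache) = fib(40, Cache::new());
    // The second call reuses the cache returned by the first one.
    let (second, cache) = fib(40, cache);
    println!("{first} {second} ({} cached entries)", cache.len());
}
```

Threading the cache by hand like this is verbose. In Haskell, the State monad hides exactly that plumbing.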

GC using type information

Does anyone know of a GC algorithm which utilises type information to allow incremental collection, optimised collection, parallel collection, or some other nice feature?
By type information, I mean real semantics. Let me give an example: suppose we have an OO style class with methods to maintain a list which hide the representation. When the object becomes unreachable, the collector can just run down the list deleting all the nodes. It knows they're all unreachable now, because of encapsulation. It also knows there's no need to do a general scan of the nodes for pointers, because it knows all the nodes are the same type.
Obviously, this is a special case and easily handled with destructors in C++. The real question is whether there is a way to analyse the types used in a program and direct the collector to use the resulting information to advantage. I guess you'd call this a type-directed garbage collector.
The idea of at least exploiting containers for garbage collection in some way is not new, though in Java, you cannot generally assume that a container holds the only reference to objects within it, so your approach will not work in that context.
Here are a couple of references. One is for leak detection, and the other (from my research group) is about improving cache locality.
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4814126
http://www.cs.umass.edu/~emery/pubs/06-06.pdf
You might want to visit Richard Jones's extensive garbage collection bibliography for more references, or ask the folks on gc-list.
I don't think it has anything to do with a specific algorithm.
When the GC computes the graph of object relationships, the information that a Collection object is solely responsible for the elements of its list is implicitly present in the graph, if the compiler was good enough to extract it.
Whatever GC algorithm is chosen, what matters more is how the compiler/runtime extracts this information.
Also, I would avoid garbage collection in C and C++. Because of pointer arithmetic, aliasing, and the possibility of pointing inside an object (a reference to a data member or into an array), it's incredibly hard to perform accurate garbage collection in these languages. They were not designed for it.
