Is Rust-style ownership and lifetimes possible without Rust-style borrow checking? - memory-management

Would it be possible for a programming language to consistently have Rust-style ownership and lifetimes (for automatic memory management) while dropping the requirement that only one mutable reference to a piece of data can exist at any time (used to suppress data races)?
In other words, are Rust-style ownership and lifetimes and Rust-style borrow checking two separable concepts? Alternatively, are these two ideas inherently entangled at a semantic level?

Would it be possible for a programming language to consistently have Rust-style ownership and lifetimes (for automatic memory management) while dropping the requirement that only one mutable reference to a piece of data can exist at any time (used to suppress data races)?
A language can do anything, so sure.
The problem is that dropping this requirement would turn a language like Rust into a nest of undefined behaviour (UB). If mutable references no longer have to be unique, they have no purpose, so you just have references (always mutable) whose only property is being lexically scoped. That means you can hold a reference to a sub-part of an object while a second reference mutates the object in a way that invalidates the sub-part (e.g. holding a reference to a Vec item, then clearing the Vec[0]); the first reference is now dangling and points to garbage.
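For comparison, Rust rejects the Vec example at compile time. The sketch below (`read_first_and_try_clear` is a hypothetical helper) uses `std::cell::RefCell`, which enforces the same "one writer or many readers" rule at runtime instead, to show that this rule is exactly what keeps the first reference from dangling.

```rust
use std::cell::RefCell;

// The statically checked version of the example is rejected by the compiler:
//     let mut v = vec![1, 2, 3];
//     let first = &v[0];
//     v.clear(); // error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
//     println!("{}", first);

/// Reads the first item while holding a reference to it, and reports whether a
/// mutable borrow (which could clear the Vec and leave that reference dangling)
/// was refused. Assumes `v` is non-empty.
pub fn read_first_and_try_clear(v: &RefCell<Vec<i32>>) -> (i32, bool) {
    let guard = v.borrow(); // shared borrow of the whole Vec
    let first: &i32 = &guard[0]; // reference to a sub-part of it
    // While `guard` is alive, RefCell refuses a second, mutable borrow,
    // so nothing can clear the Vec out from under `first`.
    let refused = v.try_borrow_mut().is_err();
    (*first, refused)
}
```

Once the function returns and its shared borrow ends, `v.borrow_mut().clear()` succeeds, because no reference into the Vec can still exist.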
The way to solve that would be to… add a GC? And from that point on, the value of "rust-style ownership and references" is limited to nonexistent: you need a GC (non-lexical automated memory management), your references can keep objects alive, and so having all types be affine by default isn't very useful.
Now, what can be useful (and what some languages explore) is for substructural types to be opt-in: types would be unrestricted ("normal") by default but could be opted into being affine, linear, or even ordered on an as-needed basis. This would be solely a type safety measure.
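Rust itself takes the opposite default from the one proposed here: types are affine (a value can be moved, i.e. used, at most once) unless they opt into being unrestricted by implementing `Copy`. A small sketch with hypothetical `Token` and `Plain` types:

```rust
/// Affine by default: a `Token` can be moved (used) at most once.
pub struct Token(pub u32);

/// Opted into being unrestricted: `Copy` lets a `Plain` be used any number of times.
#[derive(Clone, Copy)]
pub struct Plain(pub u32);

pub fn consume(t: Token) -> u32 {
    t.0
}

pub fn read(p: Plain) -> u32 {
    p.0
}

pub fn demo() -> (u32, u32) {
    let t = Token(7);
    let a = consume(t);
    // consume(t); // error[E0382]: use of moved value: `t`

    let p = Plain(3);
    let b = read(p) + read(p); // fine: `p` is copied on each call
    (a, b)
}
```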
If so, are there any existing languages which achieve this?
Not to my knowledge.
If not, why not?
Because nobody's written one? Affine types by default are useful to Rust, but they're not especially useful in general, so most of the research and design has focused on linear types, which provide more guarantees and are therefore more useful if only a small subset of your types are going to be substructural.
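Rust cannot express linear types directly: nothing forces a value to be used at least once, and `#[must_use]` is only a lint. A common runtime approximation is a "drop bomb", sketched here with a hypothetical `MustFinish` type whose destructor panics unless the value was consumed through `finish`. This catches the mistake at runtime, not at compile time as a real linear type would.

```rust
/// Hypothetical handle that must be disposed of via `finish`.
pub struct MustFinish {
    label: &'static str,
}

impl MustFinish {
    pub fn new(label: &'static str) -> Self {
        MustFinish { label }
    }

    /// Consuming `self` is the only sanctioned way to get rid of the value.
    pub fn finish(self) -> &'static str {
        let label = self.label;
        std::mem::forget(self); // skip the check in `Drop` below
        label
    }
}

impl Drop for MustFinish {
    fn drop(&mut self) {
        // Don't panic while already unwinding, which would abort the process.
        if !std::thread::panicking() {
            panic!("MustFinish({}) dropped without calling finish()", self.label);
        }
    }
}
```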
[0] Which shows that "data races" are not solely about concurrency; they are an entire class of problems that commonly occur in sequential code too (e.g. iterator invalidation).

Related

What is Dyon's Memory Model?

The Dyon Tutorial says it uses "lifetimes" rather than garbage collection or manual memory management. But how then does that lifetime model differ from Ownership in Rust?
Dyon has a limited memory model because of the lack of a garbage collector. The language is designed to work around this limitation. - The Dyon Programming Language Tutorial
How exactly is this model limited? Is there an example of memory managing code that Dyon could not run because of this limitation?
The linked Dyon book contains an explanation of just that:
Lifetimes are about references
A lifetime is about the references stored inside a variable. All references outlive variables they are stored in. Variables can not store references to themselves, because it can not outlive itself.
In order to put a reference inside a variable, the lifetime checker must know that the reference outlives the variable.
Because of the lifetime checker, all memory in Dyon is an acyclic graph.
Therefore, the main limitation is that references cannot form any cycles. That is, it is not possible to represent circular node lists or to have a child object keep a reference to its parent.
These limitations also apply to Rust, with the exception that Rust also provides workarounds. Reference-counted types (Rc and Arc), in combination with weak references (see std::rc::Weak), can create circular references. Cycles can also be made behind unsafe constructs, namely raw pointers.
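A minimal sketch of the parent-pointer case, along the lines of the tree example in the Rust book: children are owned through `Rc`, while each child refers back to its parent through a `Weak`, so the back link neither forms an ownership cycle nor keeps the parent alive. `TreeNode`, `add_child`, and `parent_name` are hypothetical names.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

pub struct TreeNode {
    pub name: String,
    parent: RefCell<Weak<TreeNode>>,        // non-owning link upwards
    children: RefCell<Vec<Rc<TreeNode>>>,   // owning links downwards
}

pub fn new_node(name: &str) -> Rc<TreeNode> {
    Rc::new(TreeNode {
        name: name.to_string(),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

pub fn add_child(parent: &Rc<TreeNode>, child: Rc<TreeNode>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

/// Returns the parent's name, or `None` if there is no parent or it has been freed.
pub fn parent_name(node: &TreeNode) -> Option<String> {
    node.parent.borrow().upgrade().map(|p| p.name.clone())
}
```

Dropping the last strong reference to the parent frees it even though the child still holds a `Weak` to it; `upgrade` then returns `None`.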
See also (Rust specific, but most principles apply):
Why can't I store a value and a reference to that value in the same struct?

When to use references versus types versus boxes and slices versus vectors as arguments and return types?

I've been working with Rust the past few days to build a new library (related to abstract algebra) and I'm struggling with some of the best practices of the language. For example, I implemented a longest common subsequence function taking &[&T] for the sequences. I figured this was Rust convention, as it avoided copying the data (T, which may not be easily copyable, or may be big). When changing my algorithm to work with simpler &[T]s, which I needed elsewhere in my code, I was forced to put the Copy type constraint in, since it needed to copy the T's and not just copy a reference.
So my higher-level question is: what are the best-practices for passing data between threads and structures in long-running processes, such as a server that responds to queries requiring big data crunching? Any specificity at all would be extremely helpful as I've found very little. Do you generally want to pass parameters by reference? Do you generally want to avoid returning references as I read in the Rust book? Is it better to work with &[&T] or &[T] or Vec<T> or Vec<&T>, and why? Is it better to return a Box<T> or a T? I realize the word "better" here is considerably ill-defined, but hope you'll understand my meaning -- what pitfalls should I consider when defining functions and structures to avoid realizing my stupidity later and having to refactor everything?
Perhaps another way to put it is, what "algorithm" should my brain follow to determine where I should use references vs. boxes vs. plain types, as well as slices vs. arrays vs. vectors? I hesitate to start using references and Box<T> returns everywhere, as I think that'd get me a sort of "Java in Rust" effect, and that's not what I'm going for!
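A minimal sketch for the longest-common-subsequence case in the question, assuming a textbook dynamic-programming LCS (the asker's own code isn't shown): the function borrows `&[T]` and returns references into its first argument, so `T` only needs `PartialEq`, not `Copy`, and no element is ever copied.

```rust
/// Longest common subsequence of `a` and `b`, as references into `a`.
pub fn lcs<'a, T: PartialEq>(a: &'a [T], b: &[T]) -> Vec<&'a T> {
    let (n, m) = (a.len(), b.len());
    // table[i][j] = length of the LCS of the suffixes a[i..] and b[j..]
    let mut table = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            table[i][j] = if a[i] == b[j] {
                table[i + 1][j + 1] + 1
            } else {
                table[i + 1][j].max(table[i][j + 1])
            };
        }
    }
    // Walk the table forwards, collecting borrowed elements of `a`.
    let mut out = Vec::with_capacity(table[0][0]);
    let (mut i, mut j) = (0, 0);
    while i < n && j < m {
        if a[i] == b[j] {
            out.push(&a[i]);
            i += 1;
            j += 1;
        } else if table[i + 1][j] >= table[i][j + 1] {
            i += 1;
        } else {
            j += 1;
        }
    }
    out
}
```

Because the result borrows from `a`, the caller decides whether to clone anything; a `String` (which is not `Copy`) works just as well as a `char`.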

Does newLISP use garbage collection?

This page has been quite confusing for me.
It says:
Memory management in newLISP does not rely on a garbage collection algorithm. Memory is not marked or reference-counted. Instead, a decision whether to delete a newly created memory object is made right after the memory object is created.
newLISP follows a one reference only (ORO) rule. Every memory object not referenced by a symbol is obsolete once newLISP reaches a higher evaluation level during expression evaluation. Objects in newLISP (excluding symbols and contexts) are passed by value copy to other user-defined functions. As a result, each newLISP object only requires one reference.
Further down, I see:
All lists, arrays and strings are passed in and out of built-in functions by reference.
I can't make sense of these two.
How can newLISP "not rely on a garbage collection algorithm", and yet pass things by reference?
For example, what would it do in the case of circular references?!
Is it even possible for a LISP to not use garbage collection, without making performance go down the drain? (I assume you could always pass things by value, or you could always perform a full-heap scan whenever you think it might be necessary, but then it seems to me like that would insanely hurt your performance.)
If so, how would it deal with circular references? If not, what do they mean?
Perhaps reading http://www.newlisp.org/ExpressionEvaluation.html helps understanding the http://www.newlisp.org/MemoryManagement.html paper better. Regarding circular references: they do not exist in newLISP, there is no way to create them. The performance question is addressed in a sub chapter of that memory management paper and here: http://www.newlisp.org/benchmarks/
Maybe working and experimenting with newLISP, i.e. trying to create a circular reference, will clear up most of the questions.

Gc using type information

Does anyone know of a GC algorithm which utilises type information to allow incremental collection, optimised collection, parallel collection, or some other nice feature?
By type information, I mean real semantics. Let me give an example: suppose we have an OO style class with methods to maintain a list which hide the representation. When the object becomes unreachable, the collector can just run down the list deleting all the nodes. It knows they're all unreachable now, because of encapsulation. It also knows there's no need to do a general scan of the nodes for pointers, because it knows all the nodes are the same type.
Obviously, this is a special case and easily handled with destructors in C++. The real question is whether there is way to analyse types used in a program, and direct the collector to use the resulting information to advantage. I guess you'd call this a type directed garbage collector.
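With single ownership, this special case reduces to a destructor, as the question notes for C++. A minimal Rust sketch, assuming a hypothetical `NodeList` that fully encapsulates its nodes: when the list is dropped it walks the chain and frees every node, with no reachability scan. The loop in `drop` also avoids the deep recursion that the compiler-generated drop of nested `Box`es would perform on a long list.

```rust
struct Node {
    value: u64,
    next: Option<Box<Node>>,
}

/// Singly linked list that is the sole owner of its nodes.
pub struct NodeList {
    head: Option<Box<Node>>,
    len: usize,
}

impl NodeList {
    pub fn new() -> Self {
        NodeList { head: None, len: 0 }
    }

    pub fn push(&mut self, value: u64) {
        let old = self.head.take();
        self.head = Some(Box::new(Node { value, next: old }));
        self.len += 1;
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn sum(&self) -> u64 {
        let mut total = 0;
        let mut cur = self.head.as_deref();
        while let Some(node) = cur {
            total += node.value;
            cur = node.next.as_deref();
        }
        total
    }
}

impl Drop for NodeList {
    fn drop(&mut self) {
        // Nothing outside the list can point at its nodes, so all of them are
        // garbage now: free them one by one, iteratively.
        let mut cur = self.head.take();
        while let Some(mut node) = cur {
            cur = node.next.take();
            // `node` is freed here; its `next` is already None, so no recursion.
        }
    }
}
```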
The idea of at least exploiting containers for garbage collection in some way is not new, though in Java, you cannot generally assume that a container holds the only reference to objects within it, so your approach will not work in that context.
Here are a couple of references. One is for leak detection, and the other (from my research group) is about improving cache locality.
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4814126
http://www.cs.umass.edu/~emery/pubs/06-06.pdf
You might want to visit Richard Jones's extensive garbage collection bibliography for more references, or ask the folks on gc-list.
I don't think it has anything to do with a specific algorithm.
When the GC computes the graph of object relationships, the information that a Collection object is solely responsible for the elements of the list is implicitly present in the graph, if the compiler was good enough to extract it.
Whatever GC algorithm is chosen, what matters more is how the compiler/runtime extracts this information.
Also, I would avoid C and C++ with GC. Because of pointer arithmetic, aliasing and the possibility to point within an object (reference on a data member or in an array), it's incredibly hard to perform accurate garbage collection in these languages. They have not been crafted for it.

Reimplementing data structures in the real world

The topic of algorithms class today was reimplementing data structures, specifically ArrayList in Java. The fact that you can customize a structure in various ways definitely got me interested, particularly with variations of the add() and iterator.remove() methods.
But is reimplementing and customizing a data structure something that is of more interest to the academics vs the real-world programmers? Has anyone reimplemented their own version of a data structure in a commercial application/program, and why did you pick that route over your particular language's implementation?
Knowing how data structures are implemented and can be implemented is definitely of interest to everyone, not just academics. While you will most likely not reimplement a data structure if the language already provides an implementation with suitable functions and performance characteristics, it is very possible that you will have to create your own data structure by composing other data structures... or you may need to implement a data structure with slightly different behavior than a well-known data structure. In that case, you certainly will need to know how the original data structure is implemented. Alternatively, you may end up needing a data structure that does not exist, or one that provides similar behavior to an existing data structure but must be optimized for a different set of operations because of the way it is used. Again, such a situation would require you to know how to implement (and alter) the data structure, so yes, it is of interest.
Edit
I am not advocating that you reimplement existing data structures! Don't do that. What I'm saying is that the knowledge does have practical application. For example, you may need to create a bidirectional map data structure (which you can implement by composing two unidirectional map data structures), or you may need to create a stack that keeps track of a variety of statistics (such as min, max, mean) by using an existing stack data structure with an element type that contains the value as well as these various statistics. These are some trivial examples of things that you might need to implement in the real world.
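Both examples can be sketched in Rust, assuming hypothetical `BiMap` and `StatsStack` types built on the standard `HashMap` and `Vec`. The bidirectional map keeps two maps in sync; the stack stores, in each slot, the statistics for everything at or below it, so popping restores the previous statistics without recomputation.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Bidirectional map composed of two ordinary `HashMap`s.
pub struct BiMap<K, V> {
    forward: HashMap<K, V>,
    backward: HashMap<V, K>,
}

impl<K: Hash + Eq + Clone, V: Hash + Eq + Clone> BiMap<K, V> {
    pub fn new() -> Self {
        BiMap { forward: HashMap::new(), backward: HashMap::new() }
    }

    /// Inserts `k <-> v`, first removing any pair that already used `k` or `v`,
    /// so the mapping stays one-to-one in both directions.
    pub fn insert(&mut self, k: K, v: V) {
        if let Some(old_v) = self.forward.remove(&k) {
            self.backward.remove(&old_v);
        }
        if let Some(old_k) = self.backward.remove(&v) {
            self.forward.remove(&old_k);
        }
        self.forward.insert(k.clone(), v.clone());
        self.backward.insert(v, k);
    }

    pub fn get_by_key(&self, k: &K) -> Option<&V> { self.forward.get(k) }
    pub fn get_by_value(&self, v: &V) -> Option<&K> { self.backward.get(v) }
}

/// One stack slot: the pushed value plus statistics for it and everything below.
struct Entry { value: f64, min: f64, max: f64, sum: f64 }

/// Stack built on `Vec` whose min, max, and mean are available in O(1).
pub struct StatsStack {
    entries: Vec<Entry>,
}

impl StatsStack {
    pub fn new() -> Self {
        StatsStack { entries: Vec::new() }
    }

    pub fn push(&mut self, value: f64) {
        let entry = match self.entries.last() {
            Some(top) => Entry {
                value,
                min: top.min.min(value),
                max: top.max.max(value),
                sum: top.sum + value,
            },
            None => Entry { value, min: value, max: value, sum: value },
        };
        self.entries.push(entry);
    }

    pub fn pop(&mut self) -> Option<f64> { self.entries.pop().map(|e| e.value) }
    pub fn min(&self) -> Option<f64> { self.entries.last().map(|e| e.min) }
    pub fn max(&self) -> Option<f64> { self.entries.last().map(|e| e.max) }
    pub fn mean(&self) -> Option<f64> {
        self.entries.last().map(|e| e.sum / self.entries.len() as f64)
    }
}
```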
I have re-implemented some of a language's built-in data structures, functions, and classes on a number of occasions. As an embedded developer, the main reason I would do that is for speed or efficiency. The standard libraries and types were designed to be useful in a variety of situations, but there are many instances where I can create a more specialized version that is custom-tailored to take advantage of the features and limitations of my current platform. If the language doesn't provide a way to open up and modify existing classes (like you can in Ruby, for instance), then re-implementing the class/function/structure can be the only way to go.
For example, one system I worked on used a MIPS CPU that was speedy when working with 32-bit numbers but slower when working with smaller ones. I rewrote several data structures and functions to use 32-bit integers instead of 16-bit integers, and also specified that the fields be aligned to 32-bit boundaries. The result was a noticeable speed boost in a section of code that was bottlenecking other parts of the software.
That being said, it was not a trivial process. I ended up having to modify every function that used that structure and I ended up having to re-write several standard library functions as well. In this particular instance, the benefits outweighed the work. In the general case, however, it's usually not worth the trouble. There's a big potential for hard-to-debug problems, and it's almost always more work than it looks like. Unless you have specific requirements or restrictions that the existing structures/classes don't meet, I would recommend against re-implementing them.
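The field-widening change from the MIPS example can be illustrated with a minimal layout sketch, assuming a hypothetical three-field record (the real structures aren't shown). `#[repr(C)]` fixes field order and padding so the sizes are predictable; the widened version doubles in size, which is the memory cost traded for every field sitting on a 32-bit boundary. Whether that buys speed depends on the CPU, as the answer describes.

```rust
use std::mem::{align_of, size_of};

/// Original layout: 16-bit fields, 2-byte alignment.
#[repr(C)]
pub struct Sample16 {
    pub x: u16,
    pub y: u16,
    pub flags: u16,
}

/// Reworked layout: every field is a 32-bit integer on a 32-bit boundary.
#[repr(C)]
pub struct Sample32 {
    pub x: u32,
    pub y: u32,
    pub flags: u32,
}

/// (size, alignment) in bytes of each layout.
pub fn layouts() -> [(usize, usize); 2] {
    [
        (size_of::<Sample16>(), align_of::<Sample16>()),
        (size_of::<Sample32>(), align_of::<Sample32>()),
    ]
}
```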
As Michael mentions, it is indeed useful to know how to re-implement structures even if you never do so. You may find a problem in the future that can be solved by applying the principles and techniques used in existing data structures.
