Allocator that manages a single block of memory - c++11

Due to system limitations, suppose that I can only allocate memory from a heap once (for example with std::allocator or some other more general C++11 compliant allocator).
This single allocation will take a large memory block.
Then I want to use containers and dynamic memory but all restricted to the previously allocated block of memory.
I managed to write a very simple allocator that incrementally "gives" memory by shifting a pointer.
In this allocator deallocate is a no-op, and memory from the block is not returned to the block.
One can obviously do better than this.
In other words, I want a managed heap.
Reusing this block memory in a sequence is a hard problem because one needs to manage discontinuous free segments, defragmentation, (optional) thread-safety, etc.
What is the name of this pattern? For some time I thought that this was a pool allocator, but it seems that the term refers to something else (reusing small objects).
What features or standard libraries of C++ can I use to either implement and administer such allocation or at least build my own with little effort?
I expected to find something in Boost.
But Boost.Pool is something else. Something like this seems to be implemented for a specific purpose in Boost.Interprocess, but it doesn't seem easy to use and I have a hard time understanding it outside its prototypical use (such as interprocess shared memory).
Otherwise, the closest thing I found is this https://www.boost.org/doc/libs/1_41_0/libs/pool/doc/interfaces/pool_alloc.html , but it seems that ::new can be called several times.
Example code:
int main(){
    UserBlockAllocator<double> a(new double[1000], 1000);
    {
        std::vector<double, UserBlockAllocator<double>> v0(600, a);
    } // v0 returns memory to block managed by a
    std::vector<double, UserBlockAllocator<double>> v1(600, a);
    std::vector<double, UserBlockAllocator<double>> v2(600, a); //out of memory
}

This pattern is referred to as arena allocator or stack allocator. If I understand the std::pmr stuff correctly, a std::pmr::monotonic_buffer_resource is related to that, but I have never tried that.
With those keywords you find something, but I have no experience with the tools.
Note that it is easy to successfully deallocate the most recent allocation.
A powerful pattern is the composition of allocators as described in an entertaining talk by Andrei Alexandrescu at CppCon 2015. If you want to build your own tool, you might consider the combination of a FreeListAllocator (43:18) on top of your StackAllocator (35:42). This way, you may solve the problem of how to manage discontinuous free segments (as you describe it).
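As a rough illustration of the stack-allocator idea, including the cheap "deallocate the most recent allocation" case mentioned above, here is a minimal C++11 sketch. The names Arena and ArenaAllocator are hypothetical (they are not from any library), the block is assumed to be suitably aligned for the stored type, and nothing here is thread-safe. Freed memory that is not at the top of the stack is simply lost until the arena goes away, so this is not the free-list composition from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical arena: hands out memory from one user-supplied block by bumping an offset.
class Arena {
public:
    Arena(void* block, std::size_t bytes)
        : base_(static_cast<char*>(block)), size_(bytes), used_(0) {}

    void* allocate(std::size_t bytes, std::size_t align) {
        // Round the current offset up to the requested alignment (a power of two).
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > size_ || bytes > size_ - start) throw std::bad_alloc();
        used_ = start + bytes;
        return base_ + start;
    }

    void deallocate(void* p, std::size_t bytes) noexcept {
        // Only the most recent allocation can be returned; anything else is a no-op.
        char* c = static_cast<char*>(p);
        if (c + bytes == base_ + used_) used_ = static_cast<std::size_t>(c - base_);
    }

    std::size_t used() const noexcept { return used_; }

private:
    char* base_;
    std::size_t size_;
    std::size_t used_;
};

// Minimal C++11 allocator that forwards to a shared Arena.
template <class T>
class ArenaAllocator {
public:
    using value_type = T;
    explicit ArenaAllocator(Arena& a) noexcept : arena_(&a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& o) noexcept : arena_(o.arena()) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena_->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept { arena_->deallocate(p, n * sizeof(T)); }
    Arena* arena() const noexcept { return arena_; }

private:
    Arena* arena_;
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena() == b.arena();
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}

// Mirrors the example from the question; returns true if v2 ran out of memory.
bool demo_arena() {
    static double block[1000];  // stands in for the single large allocation
    Arena arena(block, sizeof(block));
    ArenaAllocator<double> a(arena);
    {
        std::vector<double, ArenaAllocator<double>> v0(600, 0.0, a);
        assert(arena.used() >= 600 * sizeof(double));
    }  // v0 held the most recent allocation, so its memory returns to the block
    assert(arena.used() == 0);
    std::vector<double, ArenaAllocator<double>> v1(600, 0.0, a);
    bool out_of_memory = false;
    try {
        std::vector<double, ArenaAllocator<double>> v2(600, 0.0, a);  // only 400 doubles left
    } catch (const std::bad_alloc&) {
        out_of_memory = true;
    }
    return out_of_memory;
}
```

Calling demo_arena() from main exercises the scenario in the question. Supporting out-of-order frees would need a free list (or similar bookkeeping) layered on top of this arena.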

Related

How to split one heap into several heaps inside one process?

Which ecosystems allow to create multiple heaps right now?
Is it possible to have multiple heaps in java?
garbage collection and memory management in Erlang
Is there any benefit to use multiple heaps for memory management purposes?
AppDomains don't create new heaps (there is still one heap for all domains). So, what does one need to do to launch several different GCs inside a single process?
Which syntactic primitives does one need to create? How should a runtime support those primitives?
Which ecosystems allow to create multiple heaps right now?
One obvious answer would be "C++" (feel free to fill in surrounding pieces as you see fit, if you don't consider a language to be an "ecosystem" in itself).
C++ allows you to specify heaps along a few different axes. One is by the type of an object--you can specify allocation for a particular type by overloading operator new and operator delete for that type:
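#include <cstddef>

class Foo {
public:
    // Class-specific allocation: used by `new Foo` and by `delete p` for a Foo* p.
    static void *operator new(std::size_t size);
    static void operator delete(void *block, std::size_t size);
};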
It's then up to you to connect these heap management functions to an actual source of memory. You might allocate that via ::operator new, or you might (for example) go directly to the OS, such as with something like GlobalAlloc or VirtualAlloc on Windows, sbrk on UNIX-like systems, or just have pre-specified blocks of memory on a bare-metal embedded system.
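To make "connecting these functions to a source of memory" concrete, here is a small sketch in which the class-specific functions simply forward to ::operator new and ::operator delete while counting live objects. The class name Tracked and its counter are hypothetical and only for illustration; a real per-type heap would replace the forwarding calls with its own memory source.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Tracked {
public:
    // Called for `new Tracked`; here the memory source is the global heap.
    static void* operator new(std::size_t size) {
        void* p = ::operator new(size);
        ++live_;
        return p;
    }
    // Called for `delete p` where p is a Tracked*.
    static void operator delete(void* block, std::size_t) noexcept {
        if (block) --live_;
        ::operator delete(block);
    }
    static int live() { return live_; }

    int value = 0;

private:
    static int live_;  // number of Tracked objects currently allocated
};

int Tracked::live_ = 0;

// Returns how many Tracked objects were live while one was allocated.
int live_during_use() {
    Tracked* t = new Tracked;
    int during = Tracked::live();
    delete t;
    return during;
}
```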
Along a somewhat different axis, all the containers in the C++ standard library allocate and free memory via Allocator classes. The Allocator for any particular collection is specified as a template parameter, so (for example) a declaration for std::vector looks something like this:
template <class T, class Alloc=std::allocator<T>>
class vector {
// ...
};
This lets you specify a heap that will be used to allocate objects in that collection. Much as with operator new and operator delete, this really only specifies the interface by which the collection will allocate and free memory--it's up to you to connect that to code that actually manages the heap.
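As an illustration of that interface, here is a minimal C++11 allocator sketch that connects a std::vector to std::malloc and std::free and keeps a running byte count. MallocAllocator and g_live_bytes are hypothetical names chosen for this example; the point is only to show how little a container needs from an Allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Bytes currently handed out by MallocAllocator (illustrative bookkeeping only).
static std::size_t g_live_bytes = 0;

template <class T>
struct MallocAllocator {
    using value_type = T;

    MallocAllocator() noexcept {}
    template <class U>
    MallocAllocator(const MallocAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        g_live_bytes += n * sizeof(T);
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        g_live_bytes -= n * sizeof(T);
        std::free(p);
    }
};

// All instances share the same heap, so they always compare equal.
template <class T, class U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return false; }

// Returns the number of bytes outstanding while a 100-element vector is alive.
std::size_t live_bytes_while_vector_alive() {
    std::vector<int, MallocAllocator<int>> v(100, 7);
    return g_live_bytes;
}
```

Swapping the bodies of allocate and deallocate is all it takes to point the same container at a different heap.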
Garbage Collection
As far as garbage collection goes: I personally find it annoying, and advise against its use as a general rule. The problem is that while it can (at least from one perspective) fix some types of problems with memory management, it does nothing to help management of other resources--and (unfortunately) I haven't seen anything like a tracing collector for file handles, network sockets, database connections, and so on. RAII provides a uniform method for dealing with resource management in general.
That said, if you really insist on using GC, C++ does support that as well. Prior to C++11, GC was entirely usable on a practical level, but led to what was technically undefined behavior under a few obscure circumstances, such as:
storing a pointer in a file, and reading it back in, or
modifying the bits of a pointer, later un-doing that modification
...and later taking the re-constituted pointer and dereferencing it. Obviously, while the pointer wasn't visible to the collector, the pointed-to block of memory became eligible for GC, so the later dereference caused problems. C++11 defined these circumstances, and added a few library calls (e.g., declare_reachable, undeclare_reachable) to deal with them (e.g., if you call declare_reachable(block);, that block is not eligible for collection, regardless of whether a pointer to it is visible). As such, if you want to use GC with C++ you can, and the bounds of defined behavior are thoroughly specified. The only problem is that essentially no code ever calls declare_reachable and/or undeclare_reachable, so in real use they're likely to be of little or no help (but pointer swizzling and/or storage in a file are sufficiently rare that this is unlikely to pose a real problem).
For a practical example, you might want to look at the Boehm-Demers-Weiser collector (if you haven't already).

Is it possible to deallocate memory manually in go?

After a discussion with a colleague, I wonder if it would be possible (even if it makes no sense at all) to deallocate memory manually in Go (i.e. by using the unsafe package). Is it?
Here is a thread that may interest you: Add runtime.Free() for GOGC=off
Interesting part:
The Go GC does not have the ability to manually deallocate blocks
anymore. And besides, runtime.Free is unsafe (people might free still
in use pointers or double free) and then all sorts of C memory problem
that Go tries hard to get rid of will come back. The other reason is
that runtime sometimes allocates behind your back and there is no way
for the program to explicitly free memory.
If you really want to manually manage memory with Go, implement your
own memory allocator based on syscall.Mmap or cgo malloc/free.
Disabling GC for extended period of time is generally a bad solution
for a concurrent language like Go. And Go's GC will only be better
down the road.
TL;DR: Yes, but don't do it
I am a bit late, but this question is highly ranked on Google, so here is an article by the creator of the Dgraph database that explains how to use jemalloc as an alternative to malloc/calloc. It is worth a look:
https://dgraph.io/blog/post/manual-memory-management-golang-jemalloc/
With these techniques, we get the best of both worlds: We can do manual memory allocation in critical, memory-bound code paths. At the same time, we can get the benefits of automatic garbage collection in non-critical code paths. Even if you are not comfortable using Cgo or jemalloc, you could apply these techniques on bigger chunks of Go memory, with similar impact.
And I haven't tested it yet, but there is a github library called jemalloc-go
https://github.com/spinlock/jemalloc-go
Go 1.20 introduces an experimental concept of arenas for memory management, per the proposal "proposal: arena: new package providing memory arenas". We could manage memory manually through arenas.
We propose the addition of a new arena package to the Go standard library. The arena package will allow the allocation of any number of arenas. Objects of arbitrary type can be allocated from the memory of the arena, and an arena automatically grows in size as needed. When all objects in an arena are no longer in use, the arena can be explicitly freed to reclaim its memory efficiently without general garbage collection. We require that the implementation provide safety checks, such that, if an arena free operation is unsafe, the program will be terminated before any incorrect behavior happens.
Sample code:
a := arena.New()
var ptrT *T
a.New(&ptrT)
ptrT.val = 1
var sliceT []T
a.NewSlice(&sliceT, 100)
sliceT[99].val = 4
a.Free()
Example: Per Go 1.20 Experiment: Memory Arenas vs Traditional Memory Management from Pyroscope.
Arenas are a powerful tool for optimizing Go programs, particularly in scenarios where your programs spend significant amount of time parsing large protobuf or JSON blobs.
Some recommendations:
Only use arenas in critical code paths. Do not use them everywhere
Profile your code before and after using arenas to make sure you're adding arenas in areas where they can provide the most benefit
Pay close attention to the lifecycle of the objects created on the arena. Make sure you don't leak them to other components of your program where objects may outlive the arena
Use defer a.Free() to make sure that you don't forget to free memory
Use arena.Clone() to clone objects back to the heap if you want to use them after an arena was freed
Note: This proposal is on hold indefinitely due to serious API concerns. The GOEXPERIMENT=arena code may be changed incompatibly or removed at any time, and we do not recommend its use in production.

Why is free() not allowed in garbage-collected languages?

I was reading the C# entry on Wikipedia, and came across:
Managed memory cannot be explicitly freed; instead, it is automatically garbage collected.
Why is it that in languages with automatic memory management, manual management isn't even allowed? I can see that in most cases it wouldn't be necessary, but wouldn't it come in handy where you are tight on memory and don't want to rely on the GC being smart?
Languages with automatic memory management are designed to provide substantial memory safety guarantees that can't be offered in the presence of any manual memory management.
Among the problems prevented are
Double free()s
Calling free() on a pointer to memory that you do not own, leading to illegal access in other places
Calling free() on a pointer that was not the return value of an allocation function, such as taking the address of some object on the stack or in the middle of an array or other allocation.
Dereferencing a pointer to memory that has already been free()d
Additionally, automatic management can result in better performance when the GC moves live objects to a consolidated area. This improves locality of reference and hence cache performance.
Garbage collection enforces the type safety of a memory allocator by guaranteeing that memory allocations never alias. That is, if a piece of memory is currently being viewed as a type T, the memory allocator can guarantee (with garbage collection) that while that reference is alive, it will always refer to a T. More specifically, it means that the memory allocator will never return that memory as a different type.
Now, if a memory allocator allows for manual free() and uses garbage collection, it must ensure that the memory you free()'d is not referenced by anyone else; in other words, that the reference you pass in to free() is the only reference to that memory. Most of the time this is prohibitively expensive to do given an arbitrary call to free(), so most memory allocators that use garbage collection do not allow for it.
That isn't to say it is not possible; if you could express a single-referrent type, you could manage it manually. But at that point it would be easier to either stop using a GC language or simply not worry about it.
Calling GC.Collect is almost always better than having an explicit free method. Calling free would make sense only for pointers/object refs that are referenced from nowhere. That is error prone, since there is a chance that you call free on the wrong pointer.
When the runtime environment tracks references for you, it knows which pointers can be freed safely and which cannot, so letting the GC decide which memory can be freed avoids a whole class of ugly bugs. One could think of a runtime implementation with both GC and free where explicitly calling free for a single memory block might be much faster than running a complete GC.Collect (but don't expect freeing every possible memory block "by hand" to be faster than the GC). But I think the designers of C#, CLI (and other languages with garbage collectors like Java) have decided to favor robustness and safety over speed here.
In systems that allow objects to be manually freed, the allocation routines have to search through a list of freed memory areas to find some free memory. In a garbage-collection-based system, any immediately-available free memory is going to be at the end of the heap. It's generally faster and easier for the system to ignore unused areas of memory in the middle of the heap than it would be to try to allocate them.
Interestingly enough, you do have access to the garbage collector through System.GC, though from everything I've read, it's highly recommended that you allow the GC to manage itself.
I was advised once to use the following 2 lines by a 3rd party vendor to deal with a garbage collection issue with a DLL or COM object or some-such:
// Force garbage collection (cleanup event objects from previous run.)
GC.Collect(); // Force an immediate garbage collection of all generations
GC.GetTotalMemory(true);
That said, I wouldn't bother with System.GC unless I knew exactly what was going on under the hood. In this case, the 3rd party vendor's advice "fixed" the problem that I was dealing with regarding their code. But I can't help but wonder if this was actually a workaround for their broken code...
If you are in a situation where you "don't want to rely on the GC being smart", then most probably you picked the wrong framework for your task. In .NET you can manipulate the GC a bit (http://msdn.microsoft.com/library/system.gc.aspx); in Java, I am not sure.
I think you can't call free because you would start doing part of the GC's job. The GC's efficiency can be more or less guaranteed overall when it does things the way it finds best and at the time it chooses. If developers interfere with the GC, that might decrease its overall efficiency.
I can't say that it is the answer, but one that comes to mind is that if you can free, you can accidentally double free a pointer/reference or even worse, use one after free. Which defeats the main point of using languages like c#/java/etc.
Of course, one possible solution to that would be to have your free take its argument by reference and set it to null after freeing. But then, what if they pass an r-value like this: free(whatever())? I suppose you could have an overload for r-value versions, but I don't even know if C# supports such a thing :-P.
In the end, even that would be insufficient because as has been pointed out, you can have multiple references to the same object. Setting one to null would do nothing to prevent the others from accessing the now deallocated object.
Many of the other answers provide good explanations of how the GC works and how you should think when programming against a runtime system which provides a GC.
I would like to add a trick that I try to keep in mind when programming in GC'd languages. The rule is this "It is important to drop pointers as soon as possible." By dropping pointers I mean that I no longer point to objects that I no longer will use. For instance, this can be done in some languages by setting a variable to Null. This can be seen as a hint to the garbage collector that it is fine to collect this object, provided there are no other pointers to it.
Why would you want to use free()? Suppose you have a large chunk of memory you want to deallocate.
One way to do it is to call the garbage collector, or let it run when the system wants. In that case, if the large chunk of memory can't be accessed, it will be deallocated. (Modern garbage collectors are pretty smart.) That means that, if it wasn't deallocated, it could still be accessed.
Therefore, if you can get rid of it with free() but not the garbage collector, something still can access that chunk (and not through a weak pointer if the language has the concept), which means that you're left with the language's equivalent of a dangling pointer.
The language can defend itself against double-frees or trying to free unallocated memory, but the only way it can avoid dangling pointers is by abolishing free(), or modifying its meaning so it no longer has a use.
Why is it that in languages with automatic memory management, manual management isn't even allowed? I can see that in most cases it wouldn't be necessary, but wouldn't it come in handy where you are tight on memory and don't want to rely on the GC being smart?
In the vast majority of garbage collected languages and VMs it does not make sense to offer a free function although you can almost always use the FFI to allocate and free unmanaged memory outside the managed VM if you want to.
There are two main reasons why free is absent from garbage collected languages:
Memory safety.
No pointers.
Regarding memory safety, one of the main motivations behind automatic memory management is eliminating the class of bugs caused by incorrect manual memory management. For example, with manual memory management calling free with the same pointer twice or with an incorrect pointer can corrupt the memory manager's own data structures and cause non-deterministic crashes later in the program (when the memory manager next reaches its corrupted data). This cannot happen with automatic memory management but exposing free would open up this can of worms again.
Regarding pointers, the free function releases a block of allocated memory at a location specified by a pointer back to the memory manager. Garbage collected languages and VMs replace pointers with a more abstract concept called references. Most production GCs are moving which means the high-level code holds a reference to a value or object but the underlying location in memory can change as the VM is capable of moving allocated blocks of memory around without the high-level language knowing. This is used to compact the heap, preventing fragmentation and improving locality.
So there are good reasons not to have free when you have a GC.
Manual management is allowed. For example, in Ruby calling GC.start will free everything that can be freed, though you can't free things individually.

Alternative for Garbage Collector

I'd like to know the best alternative to a garbage collector, with its pros and cons. My priority is speed; memory is less important. If there is a garbage collector that never pauses, let me know.
I'm working on a safe language (i.e. a language with no dangling pointers, checking bounds, etc), and garbage collection or its alternative has to be used.
I suspect you will be best sticking with garbage collection (as per the JVM) unless you have a very good reason otherwise. Modern GCs are extremely fast, general purpose and safe. Unless you can design your language to take advantage of a very specific special case (as in one of the above allocators) then you are unlikely to beat the JVM.
The only really compelling reason I see nowadays as an argument against modern GC is latency issues caused by GC pauses. These are small, rare and not really an issue for most purposes (e.g. I've successfully written 3D engines in Java), but they still can cause problems in very tight realtime situations.
Having said that, there may still be some special cases where a different memory allocation scheme may make sense so I've listed a few interesting options below:
An example of a very fast, specialised memory management approach is the "per frame" allocator used in many games. This works by incrementing a single pointer to allocate memory, and at the end of a time period (typically a visual "frame") all objects are discarded at once by simply setting the pointer back to the base address and overwriting them in the next allocation. This can be "safe", however the constraints of object lifetime would be very strict. Might be a winner if you can guarantee that all memory allocation is bounded in size and only valid for the scope of handling e.g. a single server request.
Another very fast approach is to have dedicated object pools for different classes of object. Released objects can just be recycled in the pool, using something like a linked list of free object slots. Operating systems often used this kind of approach for common data structures. Again however you need to watch object lifetime and explicitly handle disposals by returning objects to the pool.
Reference counting looks superficially good but usually doesn't make sense because you frequently have to dereference and update the count on two objects whenever you change a pointer value. This cost is usually worse than the advantage of having simple and fast memory management, and it also doesn't work in the presence of cyclic references.
Stack allocation is extremely fast and can run safely. Depending on your language, it is possible to make do without a heap and run entirely on a stack based system. However I suspect this will somewhat constrain your language design so that might be a non-starter. Still might be worth considering for certain DSLs.
Classic malloc/free is pretty fast and can be made safe if you have sufficient constraints on object creation and lifetime which you may be able to enforce in your language. An example would be if e.g. you placed significant constraints on the use of pointers.
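The "per frame" allocator from the first option above can be sketched in a few lines of C++. FrameAllocator, Particle and simulate_frames are hypothetical names for this example; the sketch assumes the stored objects are trivially destructible (no destructors are run on reset) and that the buffer is aligned for the largest alignment requested.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical per-frame allocator: bump a pointer, then reset once per frame.
class FrameAllocator {
public:
    FrameAllocator(void* buffer, std::size_t bytes)
        : base_(static_cast<unsigned char*>(buffer)), size_(bytes), used_(0) {}

    // Returns nullptr when the frame budget is exhausted.
    void* allocate(std::size_t bytes, std::size_t align) {
        std::size_t start = (used_ + align - 1) & ~(align - 1);  // align is a power of two
        if (start > size_ || bytes > size_ - start) return nullptr;
        used_ = start + bytes;
        return base_ + start;
    }

    // Discards every object allocated since the last reset in O(1).
    void reset() noexcept { used_ = 0; }

    std::size_t used() const noexcept { return used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_;
};

struct Particle {
    float x, y, z;
};

// Allocates ten particles per frame and returns the peak number of bytes in use.
std::size_t simulate_frames(int frames) {
    alignas(std::max_align_t) static unsigned char buffer[1024];
    FrameAllocator frame(buffer, sizeof(buffer));
    std::size_t peak = 0;
    for (int f = 0; f < frames; ++f) {
        for (int i = 0; i < 10; ++i) {
            void* mem = frame.allocate(sizeof(Particle), alignof(Particle));
            assert(mem != nullptr);
            Particle* p = new (mem) Particle{static_cast<float>(i), 0.0f, 0.0f};
            (void)p;  // a real program would use the particle during the frame
        }
        if (frame.used() > peak) peak = frame.used();
        frame.reset();  // end of frame: everything allocated above is gone
    }
    return peak;
}
```

Because the pointer is reset every frame, memory use stays bounded by one frame's worth of allocations no matter how many frames run.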
Anyway - hope this is useful food for thought!
If speed matters but memory does not, then the fastest and simplest allocation strategy is to never free. Allocation is simply a matter of bumping a pointer up. You cannot get faster than that.
Of course, never releasing anything has a huge potential for overflowing available memory. It is very rare that memory is truly "unimportant". Usually there is a large but finite amount of available memory. One strategy is called "region based allocation". Namely you allocate memory in a few big blocks called "regions", with the pointer-bumping strategy. Release occurs only by whole regions. This strategy can be applied with some success if the problem at hand can be structured into successive "tasks", each having its own region.
For more generic solutions, if you want real-time allocation (i.e. guaranteed limits on the response time from allocation requests) then garbage collection is the way to go. A real-time GC may look like this: objects are allocated with a pointer-bumping strategy. Also, on every allocation, the allocator performs a little bit of garbage collection, in which "live" objects are copied somewhere else. In a way the GC runs "at the same time" as the application. This implies a bit of extra work for accessing objects, because you cannot move an object and update all pointers to point to the new object location while keeping the "real-time" promise. Solutions may involve barriers, e.g. an extra indirection. Generational GCs allow for barrier-free access to most objects while keeping pause times under strict bounds.
This article is a must-read for whoever wants to study memory allocation, in particular garbage collection.
With C++ it's possible to make a heap allocation ONCE for your objects, then reuse that memory for subsequent objects, I've seen it work and it was blindingly fast.
It's only applicable to a certain set of problems, and it's difficult to do it right, but it is possible.
One of the joys of C++ is you have complete control over memory management, you can decide to use classic new/delete, or implement your own reference counting or Garbage Collection.
However - here be dragons - you really, really need to know what you're doing.
If memory doesn't matter, then what @Thomas says applies. Considering the gargantuan memory spaces of modern hardware, this may very well be a viable option -- it really depends on the process.
Manual memory management doesn't necessarily solve your problems directly, but it does give you complete control over WHEN memory events happen. Generic malloc, for example, is not an O(1) operation. It does all sorts of potentially horrible things in there, both within the heap managed by malloc itself as well as the operating system. For example, ya never know when "malloc(10)" may cause the VM to page something out, now your 10 bytes of RAM have an unknown disk I/O component -- oops! Even worse, that page out could be YOUR memory, which you'll need to immediately page back in! Now c = *p is a disk hit. YAY!
But if you are aware of these, then you can safely set up your code so that all of the time critical parts effectively do NO memory management, instead they work off of pre-allocated structures for the task.
With a GC system, you may have a similar option -- it depends on the collector. I don't think the Sun JVM, for example, has the ability to be "turned off" for short periods of time. But if you work with pre-allocated structures, and call all of your own code (or know exactly what's going on in the library routine you call), you probably have a good chance of not hitting the memory manager.
Because the crux of the matter is that memory management is a lot of work. If you want to get rid of memory management, then write old school FORTRAN with ARRAYs and COMMON blocks (one of the reasons FORTRAN can be so fast). Of course, you can write "FORTRAN" in most any language.
With modern languages, modern GCs, etc., memory management has been pushed aside and become a "10%" problem. We are now pretty sloppy with creating garbage, copying memory, etc. etc., because the GCs et al make it easy for us to be sloppy. And for 90% of the programs, this is not an issue, so we don't worry about it. Nowadays, it's a tuning issue, late in the process.
So, your best bet is set it all up at once, use it, then toss it all away. The "use it" part is where you will get consistent, reliable results (assuming enough memory on the system of course).
As an "alternative" to garbage collection, C++ specifically has smart pointers. boost::shared_ptr<> (or std::tr1::shared_ptr<>) works much like Python's reference-counted garbage collection. In my eyes, shared_ptr IS garbage collection (although you may need to use weak_ptr<> in a few places to make sure that circular references don't keep objects alive).
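A short sketch of the circular-reference caveat, using the C++11 std::shared_ptr and std::weak_ptr rather than the Boost or TR1 spellings. Node and g_alive are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <memory>

static int g_alive = 0;  // number of Node objects currently alive

struct Node {
    std::shared_ptr<Node> next;  // owning link
    std::weak_ptr<Node> prev;    // non-owning back link, so no ownership cycle forms
    Node() { ++g_alive; }
    ~Node() { --g_alive; }
};

// Builds a two-node list with a back link and reports how many nodes survive the scope.
int alive_after_scope() {
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->prev = a;  // if prev were a shared_ptr, a and b would keep each other alive
        assert(g_alive == 2);
    }
    return g_alive;
}
```

With the weak back link both nodes are destroyed when the scope ends; reference counting alone cannot reclaim a cycle of owning pointers.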
I would argue that auto_ptr<> (or in C++0x, the unique_ptr<>...) is a viable alternative, with its own set of benefits and tradeoffs. Auto_ptr has a clunky syntax and can't be used in STL containers... but it gets the job done. During compile-time, you "move" the ownership of the pointer from variable to variable. If a variable owns the pointer when it goes out of scope, it will call its destructor and free the memory. Only one auto_ptr<> (or unique_ptr<>) is allowed to own the real pointer. (at least, if you use it correctly).
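The single-owner rule described above can be sketched with std::unique_ptr, where the transfer of ownership is spelled out with std::move. Resource and g_destroyed are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

static int g_destroyed = 0;  // counts Resource destructor calls

struct Resource {
    ~Resource() { ++g_destroyed; }
};

// Checks that a move empties the source and that the object is destroyed exactly once.
bool ownership_moves() {
    int before = g_destroyed;
    bool source_emptied = false;
    {
        std::unique_ptr<Resource> first(new Resource);
        std::unique_ptr<Resource> second = std::move(first);  // first no longer owns anything
        source_emptied = (first == nullptr) && (second != nullptr);
    }  // second goes out of scope here and deletes the Resource
    return source_emptied && (g_destroyed == before + 1);
}
```

Copying a unique_ptr does not compile, which is how the "only one owner" rule is enforced at compile time.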
As another alternative, you can store everything on the stack and just pass references around to all the functions you need.
These alternatives don't really solve the general memory management problem that garbage collection solves. Nonetheless, they are efficient and well tested. An auto_ptr doesn't use any more space than the pointer did originally... and there is no overhead on dereferencing an auto_ptr. "Movement" (or assignment in Auto_ptr) has a tiny amount of overhead to keep track of the owner. I haven't done any benchmarks, but I'm pretty sure they're faster than garbage collection / shared_ptr.
If you truly want no pauses at all, disallow all memory allocation except for stack allocation, region-based buffers, and static allocation. Despite what you may have been told, malloc() can actually cause severe pauses if the free list becomes fragmented, and if you often find yourself building massive object graphs, naive manual free can and will lose to stop-and-copy; the only way to really avoid this is to amortize over preallocated pages, such as the stack or a bump-allocated pool that's freed all at once. I don't know how useful this is, but I know that the proprietary graphical programming language LabVIEW by default allocates a static region of memory for each subroutine-equivalent, requiring programmers to manually enable stack allocation; this is the kind of thing that's useful in a hard-real-time environment where you need absolute guarantees on memory usage.
If what you want is to make it easy to reason about pauses and give your developers control over allocation and placement, then there is already a language called Rust that has the same stated goals as your language; while not a completely safe language, it does have a safe subset, allowing you to create safe abstractions for raw bit-twiddling. It uses pointer type annotations to eliminate use-after-free bugs. It also doesn't have null pointers in safe code, because null pointers cost a billion dollars at least.
If bounded pauses are enough, though, there are a wide variety of algorithms that will work. If you really have a small working set compared to available memory, then I would recommend the MOS collector (aka the Train Algorithm), which collects incrementally and provably always makes progress toward freeing unreferenced objects.
It's a common fallacy that managed languages are not suitable for high performance low latency scenarios. Yes, with limited resources (such as an embedded platform) and sloppy programming you can shoot yourself in the foot just as spectacularly as with C++ (and that can be VERY VERY spectacular).
This problem has come up whilst developing games in Java/C#, and the solution was to utilise a memory pool and not let objects die, hence not needing the garbage collector to run when you don't expect it. This is really the same approach as with low latency unmanaged systems - TO TRY REALLY REALLY HARD NOT TO ALLOCATE MEMORY.
So, considering the fact that implementing such a system in Java/C# is very similar to doing it in C++, the advantage of doing it the managed way is that you have the "niceness" of other language features that free up your mental clock cycles to concentrate on important things.
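The memory-pool approach described above (recycle objects instead of letting them die) looks roughly like this in C++. ObjectPool and Bullet are hypothetical names; the sketch uses a fixed number of slots and an intrusive free list, and it is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical fixed-size pool: free slots are chained into a singly linked list.
template <class T, std::size_t N>
class ObjectPool {
    union Slot {
        Slot* next;                                   // meaningful while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];  // meaningful while the slot holds a T
    };

public:
    ObjectPool() : free_(nullptr) {
        for (std::size_t i = 0; i < N; ++i) {
            slots_[i].next = free_;
            free_ = &slots_[i];
        }
    }
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    // Constructs a T in a free slot, or returns nullptr if the pool is exhausted.
    template <class... Args>
    T* acquire(Args&&... args) {
        if (!free_) return nullptr;
        Slot* s = free_;
        free_ = s->next;
        return new (s->storage) T(std::forward<Args>(args)...);
    }

    // Destroys the object and puts its slot back at the head of the free list.
    void release(T* obj) {
        obj->~T();
        Slot* s = reinterpret_cast<Slot*>(obj);
        s->next = free_;
        free_ = s;
    }

private:
    Slot slots_[N];
    Slot* free_;
};

struct Bullet {
    int x, y;
    Bullet(int x_, int y_) : x(x_), y(y_) {}
};

// Exhausts a two-slot pool, releases one object and checks that its slot is reused.
bool pool_reuses_slots() {
    ObjectPool<Bullet, 2> pool;
    Bullet* a = pool.acquire(1, 2);
    Bullet* b = pool.acquire(3, 4);
    Bullet* none = pool.acquire(5, 6);  // pool is full, so this fails
    void* a_address = a;
    pool.release(a);
    Bullet* c = pool.acquire(7, 8);     // takes the slot that was just released
    bool ok = (none == nullptr) && (static_cast<void*>(c) == a_address) &&
              (c->x == 7) && (b->y == 4);
    pool.release(b);
    pool.release(c);
    return ok;
}
```

After the pool is constructed, acquire and release never touch the general-purpose heap, which is the property the answer is after.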

Region based memory management

I'm designing a high level language, and I want it to have the speed of C++ (it will use LLVM), but be safe and high level like C#. Garbage collection is slow, and new/delete is unsafe. I decided to try to use "region based memory management" (there are a few papers about it on the web, mostly for functional languages). The only "useful" language using it is Cyclone, but that also has GC. Basically, objects are allocated on a lexical stack, and are freed when the block closes. Objects can only refer to other objects in the same region or higher, to prevent dangling references. To make this more flexible, I added parallel regions that can be moved up and down the stack, and retained through loops. The type system would be able to verify assignments in most cases, but low overhead runtime checks would be necessary in some places.
Ex:
region(A) {
    Foo#A x=new Foo(); //x is deleted when this region closes.
    region(B,C) while(x.Y) {
        Bar#B n=new Bar();
        n.D=x; //OK, n is in lower region than x.
        //x.D=n; would cause error: x is in higher region than n.
        n.DoSomething();
        Bar#C m=new Bar();
        //m.D=n; would cause error: m and n are parallel.
        if(m.Y)
            retain(C); //On the next iteration, m is retained.
    }
}
Does this seem practical? Would I need to add non-lexically scoped, reference counted regions? Would I need to add weak variables that can refer to any object, but with a check on region deletion? Can you think of any algorithms that would be hard to use with this system or that would leak?
I would discourage you from trying regions. The problem is that in order to make regions guaranteed to be safe, you need a very sophisticated type system---I'm sure you've looked at the papers by Tofte and Talpin and you have an idea of the complexities involved. Even if you do get regions working successfully, the chances are very high that your program will require a region whose lifetime is the lifetime of the program---and that region at least has to be garbage collected. (This is why Cyclone has regions and GC.)
Since you're just getting started, I'd encourage you to go with garbage collection. Modern garbage collectors can be made pretty fast without a lot of effort. The main issue is to allocate from contiguous free space so that allocation is fast. It helps to be targeting AMD64 or other machine with spare registers so you can use a hardware register as the allocation pointer.
There are lots of good ideas to adapt; one of the easiest to implement is a page-based collector like Joel Bartlett's mostly-copying collector, where the idea is you allocate only from completely empty pages.
If you want to study existing garbage collectors, Lua has a fairly sophisticated incremental garbage collector (so there are no visible pause times) and the implementation is only 700 lines. It is fast enough to be used in a lot of games, where performance matters.
If I were implementing a language with region based memory management, I would probably read A language-independent framework for region inference. That said, it's been a while since I looked into this stuff, and I'm sure the state of the art has moved on, if I ever even knew what the state of the art was.
Well, you should go study Apple's memory management. It has release pools and zones, which sure sound a lot like what you're doing here.
I won't comment on the "GC is slow" remark.
You can start with Tofte and Talpin's papers about region-based memory management.
How would it return a dynamically created object? Who would "own" it and be responsible for freeing the memory?
Refcounting or GC are so common because they are almost always the best choices. Generational garbage collectors can be very efficient.

Resources