Rust provides a few ways to store a collection of elements inside a user-defined struct. The struct can be given a lifetime parameter and hold a reference to a slice:
struct Foo<'a> {
    elements: &'a [i32],
}
impl<'a> Foo<'a> {
    fn new(elements: &'a [i32]) -> Foo<'a> {
        Foo { elements }
    }
}
Or it can be given a Vec object:
struct Bar {
    elements: Vec<i32>,
}
impl Bar {
    fn new(elements: Vec<i32>) -> Bar {
        Bar { elements }
    }
}
What are the major differences between these two approaches?
Will using a Vec force the language to copy memory whenever I call Bar::new(vec![1, 2, 3, 4, 5])?
Will the contents of the Vec be implicitly destroyed when the owner Bar goes out of scope?
Are there any dangers associated with passing a slice in by reference if it's used outside of the struct that it's being passed to?
A Vec is composed of three parts:
A pointer to a chunk of memory
A count of how much memory is allocated (the capacity)
A count of how many items are stored (the length)
A slice is composed of two parts:
A pointer to a chunk of memory
A count of how many items are stored (the length)
Whenever you move either of these, only those fields are copied. As you might guess, that's pretty lightweight. The chunk of memory the pointer refers to (on the heap, in the case of a Vec) is not copied or moved.
A Vec indicates ownership of the memory, and a slice indicates a borrow of memory. When a Vec is itself deallocated (dropped, in Rust-speak), it drops all of its items and frees the chunk of memory. This happens when it goes out of scope, including when the Bar that owns it goes out of scope. Dropping a slice does nothing.
There are no memory-safety dangers in using slices; that is exactly what Rust lifetimes handle. They ensure, at compile time, that you never use a reference after it would be invalidated.
A Vec is a collection that can grow or shrink in size. Its elements live in a buffer on the heap that is allocated and deallocated dynamically at runtime. A Vec can hold any number of elements, and it is typically used when the number of elements is not known at compile time or may change while the program runs.
A slice is a view into a contiguous sequence of elements stored in a Vec, an array, or another contiguous collection. The slice type is written [T], where T is the element type; because [T] is unsized, slices are normally used behind a reference, as &[T] or &mut [T]. A slice does not store any elements itself; it only references elements stored elsewhere. A slice is typically used when you need access to all or part of a collection's elements without taking ownership of them.
One of the main differences between a Vec and a slice is that a Vec can add and remove elements, while a slice has a fixed length: a shared slice (&[T]) gives read-only access, and a mutable slice (&mut [T]) lets you modify elements in place, but neither can grow or shrink. Another difference is that a Vec owns its heap buffer, while a slice only borrows elements owned by something else. This means that a slice cannot be used to store new elements, but it can be used to reference all or part of the elements in a Vec or other collection.
While researching interface values in Go, I found a great (maybe outdated) article by Russ Cox.
According to it:
The itable begins with some metadata about the types involved and then becomes a list of function pointers.
The implementation for this itable should be the one from src/runtime/runtime2.go:
type itab struct {
	inter *interfacetype
	_type *_type
	hash  uint32 // copy of _type.hash. Used for type switches.
	_     [4]byte
	fun   [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter.
}
The first confusing thing: how can an array be variable sized?
Second, assuming that we have a function pointer at index 0 for a method that satisfies the interface, where could we store a second/third/... function pointer?
The compiled code and the runtime access fun as if the field were declared fun [n]uintptr, where n is the number of methods in the interface. The second method is stored at fun[1], the third at fun[2], and so on. The Go language does not have a variable-size array feature like this, but unsafe shenanigans can be used to simulate it.
Here's how itab is allocated:
m = (*itab)(persistentalloc(unsafe.Sizeof(itab{})+uintptr(len(inter.mhdr)-1)*goarch.PtrSize, 0, &memstats.other_sys))
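The function persistentalloc allocates memory. Its first argument is the size to allocate. The expression len(inter.mhdr) is the number of methods in the interface, so the size requested is that of itab (which already includes one fun slot) plus one pointer-sized slot (goarch.PtrSize bytes) for each additional method.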
Here's code that creates a slice on the variable size array:
methods := (*[1 << 16]unsafe.Pointer)(unsafe.Pointer(&m.fun[0]))[:ni:ni]
The expression methods[i] refers to the same element as m.fun[i] in a hypothetical world where m.fun is a variable size array with length > i. Later code uses normal slice syntax with methods to access the variable size array m.fun.
Can I delete the first element in a map? With slices it is possible (slice = append(slice[:0], slice[1:]...)), but can I do something like this with maps?
Maps are hash tables and have no specified order, so there's no way to delete keys in a defined order unless you track the keys in a separate slice, in the order you add them. Something like:
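import (
	"errors"
	"sync"
)

// ErrMapEmpty is returned by Shift when no keys are left.
var ErrMapEmpty = errors.New("map is empty")

type orderedMap struct {
	data map[string]int
	keys []string     // keys in the order they were added
	mu   sync.RWMutex // a value, so it never needs separate initialization
}

// Shift removes the oldest key and returns its value.
func (o *orderedMap) Shift() (int, error) {
	o.mu.Lock()
	defer o.mu.Unlock()
	if len(o.keys) == 0 {
		return 0, ErrMapEmpty
	}
	i := o.data[o.keys[0]]
	delete(o.data, o.keys[0])
	o.keys = o.keys[1:]
	return i, nil
}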
Just to be unequivocal about why you can't really delete the "first" element from a map, let me reference the spec:
A map is an unordered group of elements of one type, called the element type, indexed by a set of unique keys of another type, called the key type. The value of an uninitialized map is nil.
The key word there is "unordered": map items have no defined order.
Using a slice to preserve some notion of the order of keys is, fundamentally, flawed, though. Given operations like this:
foo := map[string]int{
	"foo": 1,
	"bar": 2,
}
// a bit later:
foo["foo"] = 3
Is the key foo now updated, or reassigned? Should it be treated as a new entry, appended to the slice of keys, or is it an in-place update? Things get muddled really quickly. The simple fact of the matter is that the map type doesn't contain an "order" of things; trying to give it one quickly devolves into a labour-intensive task where you'll end up writing your own type.
As I said earlier: it's a hash table. Elements get moved around behind the scenes, for example when the table grows and its contents are redistributed, and Go deliberately randomizes map iteration order. This question has the feel of an X-Y problem: why do you need the values in the map to be ordered? Maybe a map simply isn't the right approach for your particular problem.
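The foo["foo"] = 3 ambiguity can only be settled by picking a rule. The sketch below assumes one such rule: overwriting an existing key is an in-place update that keeps the key's original position (moving it to the end would be just as defensible). newOrderedMap and Set are hypothetical names, and the mutex is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMapEmpty is returned when Shift is called on an empty map.
var ErrMapEmpty = errors.New("map is empty")

// orderedMap keeps insertion order of keys in a slice next to the map.
// Locking is omitted in this sketch.
type orderedMap struct {
	data map[string]int
	keys []string
}

func newOrderedMap() *orderedMap {
	return &orderedMap{data: make(map[string]int)}
}

// Set treats assignment to an existing key as an in-place update:
// the value changes but the key keeps its original position.
// Only brand-new keys are appended to keys.
func (o *orderedMap) Set(k string, v int) {
	if _, ok := o.data[k]; !ok {
		o.keys = append(o.keys, k)
	}
	o.data[k] = v
}

// Shift removes the oldest key and returns its value.
func (o *orderedMap) Shift() (int, error) {
	if len(o.keys) == 0 {
		return 0, ErrMapEmpty
	}
	k := o.keys[0]
	v := o.data[k]
	delete(o.data, k)
	o.keys = o.keys[1:]
	return v, nil
}

func main() {
	m := newOrderedMap()
	m.Set("foo", 1)
	m.Set("bar", 2)
	m.Set("foo", 3) // update: "foo" stays first
	v, _ := m.Shift()
	fmt.Println(v) // 3
}
```

Every operation that touches the map has to keep keys in sync by hand, which is the extra work the answer warns about.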
type S struct {
	e int
}
func main() {
	a := []S{{1}}
	a[0].e = 2
	b := map[int]S{0: {1}}
	b[0].e = 2 // error
}
a[0] is addressable but b[0] is not.
I know the first 0 is an index and the second 0 is a key.
Why does Go implement it like this? Are there any other considerations?
I've read the map source code in github.com/golang/go/src/runtime, and the map implementation already supports indirectkey and indirectvalue, which store a pointer to the key or value instead of the key or value itself when it is larger than maxKeySize or maxValueSize.
type maptype struct {
	...
	keysize       uint8 // size of key slot
	indirectkey   bool  // store ptr to key instead of key itself
	valuesize     uint8 // size of value slot
	indirectvalue bool  // store ptr to value instead of value itself
	...
}
I think that if the Go designers wanted this syntax, it would be easy to support now.
Of course, indirectkey and indirectvalue may cost more resources, and the GC would also need to do more work.
So is performance the only reason for not supporting this?
Or are there other considerations?
In my opinion, supporting syntax like this is valuable.
As far as I know,
a[0] is addressable because it can be replaced with the address of the slice's underlying array.
Similarly, a[1] can be replaced with &a[0] + (elemSize*1).
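The address arithmetic described above can be made visible with package unsafe. This is a minimal sketch assuming Go 1.17+ (for unsafe.Add); elemAddr is a hypothetical helper, and it only illustrates what an index expression computes, it is not something to use in real code.

```go
package main

import (
	"fmt"
	"unsafe"
)

type S struct {
	e int
}

// elemAddr computes &a[i] by hand as base + i*elemSize, which is
// essentially what a slice index expression boils down to.
// Hypothetical helper for illustration; it does no bounds check on i.
func elemAddr(a []S, i int) *S {
	base := unsafe.Pointer(&a[0])
	size := unsafe.Sizeof(a[0])
	return (*S)(unsafe.Add(base, uintptr(i)*size))
}

func main() {
	a := []S{{1}, {2}, {3}}
	p := elemAddr(a, 1)
	fmt.Println(p == &a[1]) // true
	p.e = 20
	fmt.Println(a[1].e) // 20
}
```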
But with a map you cannot do that: where a value is stored depends on the hash of its key and on how many key/value pairs the map holds.
Entries are also moved around from time to time, for example when the map grows, so a pointer to a value could silently become stale.
Specific computation is needed in order to get the address of a value.
Arrays and slices are easily addressable, but with maps it takes several function calls or structure look-ups.
If one tried to replace that with whatever computation is needed, the binary size would grow significantly, and moreover the hashing and layout details can change between Go versions.
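Since a map value has no stable address, the usual workarounds are to copy the value out, modify it and store it back, or to store pointers in the map so that the struct lives outside the map's internal storage. A minimal sketch of both; setViaCopy is a hypothetical helper name.

```go
package main

import "fmt"

type S struct {
	e int
}

// setViaCopy works around the non-addressable map value: copy it out,
// modify the copy, then store the copy back under the same key.
// A missing key starts from the zero value of S.
func setViaCopy(m map[int]S, k, e int) {
	v := m[k]
	v.e = e
	m[k] = v
}

func main() {
	b := map[int]S{0: {1}}
	setViaCopy(b, 0, 2)

	// Alternatively, store pointers: the map holds a *S, and the S it
	// points to is allocated separately, so moving map entries around
	// never moves the S itself.
	c := map[int]*S{0: {1}}
	c[0].e = 2

	fmt.Println(b[0].e, c[0].e) // 2 2
}
```

The pointer version trades an extra allocation per entry for in-place updates; note that c[k].e = ... panics with a nil dereference if k is not in the map.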
I am implementing a bit-vector in Go:
// A bit vector uses a slice of unsigned integer values or “words,”
// each bit of which represents an element of the set.
// The set contains i if the ith bit is set.
// The following program demonstrates a simple bit vector type with these methods.
type IntSet struct {
	words []uint64 // uint64 is important because we need control over number and value of bits
}
I have defined several methods (e.g. membership test, adding or removing elements, set operations like union, intersection etc.) on it which all have a pointer receiver. Here is one such method:
// Has returns true if the given integer is in the set, false otherwise
func (this *IntSet) Has(m int) bool {
	// details omitted for brevity
}
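For readers unfamiliar with the representation described in the comments, here is one possible way to fill in such methods. This is a hedged sketch, not the asker's omitted code: it assumes non-negative elements and stores element i in bit i%64 of words[i/64]; Add is a hypothetical companion method.

```go
package main

import "fmt"

// IntSet is a set of non-negative integers stored as a bit vector:
// element i is present when bit i%64 of words[i/64] is set.
type IntSet struct {
	words []uint64
}

// Has reports whether the non-negative value m is in the set.
func (s *IntSet) Has(m int) bool {
	word, bit := m/64, uint(m%64)
	return word < len(s.words) && s.words[word]&(1<<bit) != 0
}

// Add inserts the non-negative value m, growing words as needed.
func (s *IntSet) Add(m int) {
	word, bit := m/64, uint(m%64)
	for word >= len(s.words) {
		s.words = append(s.words, 0)
	}
	s.words[word] |= 1 << bit
}

func main() {
	var s IntSet
	s.Add(1)
	s.Add(144)
	fmt.Println(s.Has(1), s.Has(2), s.Has(144)) // true false true
}
```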
Now, I need to return an empty set that is a true constant, so that I can use the same constant every time I need to refer to an IntSet that contains no elements. One way is to return something like &IntSet{}, but I see two disadvantages:
Every time an empty set is to be returned, a new value needs to be allocated.
The returned value is not really constant since it can be modified by the callers.
How do you define a null set that does not have these limitations?
If you read https://golang.org/ref/spec#Constants you see that constants are limited to basic types. A struct or a slice or array will not work as a constant.
I think that the best you can do is to make a function that returns a copy of an internal empty set. If callers modify it, that isn't something you can fix.
Actually modifying it would be difficult for them, since the words field inside IntSet starts with a lowercase letter and is therefore unexported (private to its package). If you added a field next to words, such as mut bool, you could add a mut check to every method that changes the IntSet. If it isn't mutable, return an error or panic.
With that, you could keep users from modifying constant, non-mutable IntSet values.
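A minimal sketch of that idea, under a few assumptions: the flag is named readOnly rather than mut so that the zero value IntSet{} stays mutable, and Empty, Add and Len are hypothetical names. A single shared empty set is allocated once, so returning it costs nothing, and mutating methods panic on it.

```go
package main

import (
	"fmt"
	"math/bits"
)

// IntSet is the bit vector from the question plus a read-only flag.
type IntSet struct {
	words    []uint64
	readOnly bool
}

// empty is allocated once and shared by every caller of Empty.
var empty = &IntSet{readOnly: true}

// Empty returns the shared, read-only empty set without allocating.
func Empty() *IntSet { return empty }

// Add panics on a read-only set instead of modifying it.
func (s *IntSet) Add(m int) {
	if s.readOnly {
		panic("IntSet: Add called on read-only set")
	}
	word, bit := m/64, uint(m%64)
	for word >= len(s.words) {
		s.words = append(s.words, 0)
	}
	s.words[word] |= 1 << bit
}

// Len returns the number of elements in the set.
func (s *IntSet) Len() int {
	n := 0
	for _, w := range s.words {
		n += bits.OnesCount64(w)
	}
	return n
}

func main() {
	e := Empty()
	fmt.Println(e == Empty(), e.Len()) // true 0

	defer func() {
		// Add on the shared empty set panicked instead of modifying it.
		fmt.Println("recovered:", recover())
	}()
	e.Add(1)
}
```

This guards against accidental mutation rather than guaranteeing immutability: a caller in another package can still overwrite the whole value with *Empty() = IntSet{}.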
I've seen people say you can just create a new slice by appending the old one:
*slc = append((*slc)[:item], (*slc)[item+1:]...)
but what if you want to remove the last element in the slice?
If you try to replace i (the last element) with i+1, it returns an out of bounds error since there is no i+1.
You can use len() to find the length and re-slice using the index before the last element:
if len(slice) > 0 {
	slice = slice[:len(slice)-1]
}
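For removing an element at an arbitrary index, it may help to know that slicing at len(s) is allowed: s[len(s):] is a valid, empty slice (only indexing s[len(s)] is out of range). So the append idiom also works when i is the last index. A minimal sketch; removeAt is a hypothetical helper name.

```go
package main

import "fmt"

// removeAt deletes the element at index i, preserving order.
// It works for the last index too: when i+1 == len(s), s[i+1:] is a
// valid empty slice, so append simply has nothing to append.
// Like any append-based removal, it reuses s's underlying array.
func removeAt(s []int, i int) []int {
	return append(s[:i], s[i+1:]...)
}

func main() {
	s := []int{10, 20, 30}
	s = removeAt(s, len(s)-1)
	fmt.Println(s) // [10 20]
	s = removeAt(s, 0)
	fmt.Println(s) // [20]
}
```

On Go 1.21 and later, the standard library's slices.Delete(s, i, i+1) does the same job.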
TL;DR:
myslice = myslice[:len(myslice)-1]
This will panic if myslice is empty.
Longer answer:
Slices are data structures that point to an underlying array and operations like slicing a slice use the same underlying array.
That means that if you slice a slice, the new slice will still be pointing to the same data as the original slice.
After doing the above, the last element will still be in the underlying array, but you won't be able to reference it through the new slice.
If you reslice the slice back to its original length, you'll be able to reference the last element again.
If you have a really big slice and also want to shrink the underlying array to save memory, you probably want to use copy to create a new slice with a smaller underlying array and let the old big array get garbage collected.
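A minimal sketch of the copy approach; shrink is a hypothetical helper name. It assumes nothing else still references the large array, otherwise that array cannot be collected.

```go
package main

import "fmt"

// shrink returns a copy of s[:n] backed by a new array of exactly n
// elements, so the original (possibly large) array can be garbage
// collected once nothing else references it.
func shrink(s []int, n int) []int {
	out := make([]int, n)
	copy(out, s[:n])
	return out
}

func main() {
	big := make([]int, 1_000_000)
	big[0], big[1] = 7, 8
	small := shrink(big, 2)
	big = nil // drop the last reference to the large array
	fmt.Println(small, len(small), cap(small)) // [7 8] 2 2
}
```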