Is working past the end of a slice idiomatic? - go

I was reading through Go's compress/flate package, and I found this odd piece of code [1]:
n := int32(len(list))
list = list[0 : n+1]
list[n] = maxNode()
In context, list is guaranteed to be pointing to an array with more data after. This is a private function, so it can't be misused outside the library.
To me, this seems like a scary hack that should be a runtime exception. For example, the following D code generates a RangeError:
auto x = [1, 2, 3];
auto y = x[0 .. 2];
y = y[0 .. 3];
Abusing slices could be done more simply (and also look safer) with the following:
x := []int{1, 2, 3}
y := x[:2]
y = append(y, 4) // x is now [1, 2, 4] because append reuses x's spare capacity
But both solutions seem very hacky and scary and, IMHO, should not work as they do. Is this sort of thing considered idiomatic Go code? If so, which of the above is more idiomatic?
[1] - http://golang.org/src/pkg/compress/flate/huffman_code.go#L136

This is not abusing the slice; it is simply using exactly what a slice is: a window over an array.
I'll take this illustration from another related answer I made:
array : [0 0 0 0 0 0 0 0 0 0 0 0]
array : <---- capacity --->
slice : [0 0 0 0]
slice : <---- capacity --->
When the underlying array is larger than the slice, it's normal and standard to re-slice to a greater length, as long as you know you don't go past the end of the underlying array (which can be verified using cap()).
Regarding the buggy code you give as an example: yes, it might be dangerous, but arrays and slices are among the most basic structures of the language, and you must understand them before using them if you want to avoid such bugs. I personally think that any Go coder should know not only the API but also what slices are.
In the code you link to, a short analysis shows that no overflow is possible, as list is created as
list := make([]literalNode, len(freq)+1)
and is later resized to count, which can't be greater than len(freq):
list = list[0:count]
One might have preferred a few more comments, but since the function containing list = list[0 : n+1] is private and called from only one place, the balance between comment verbosity and code obscurity can be considered about right. Too many comments hiding the code are painful too, and anyone who needs to read this code can easily check that there is no overflow, just as I did.

It cannot be a run time exception because the language specification prescribes that the upper limit of the slice operation is the capacity of the slice, not its length.
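To make the rule concrete, here is a small sketch (trySlice is a hypothetical helper written for this illustration) that mirrors the D example from the question: re-slicing up to cap succeeds, and only going past cap triggers the run-time panic, which the helper recovers from.

```go
package main

import "fmt"

// trySlice reports whether s[0:hi] is allowed at run time, recovering from the
// "slice bounds out of range" panic that the runtime raises otherwise.
func trySlice(s []int, hi int) (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	_ = s[0:hi] // bounds are checked against cap(s), not len(s)
	return true
}

func main() {
	x := []int{1, 2, 3}
	y := x[0:2]                 // len 2, cap 3
	fmt.Println(trySlice(y, 3)) // true: 3 <= cap(y), the D RangeError case is legal in Go
	fmt.Println(trySlice(y, 4)) // false: 4 > cap(y), run-time panic
}
```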

Related

golang slice, slicing a slice with slice[a:b:c]

I read Go slice usage and internals, Slice, and Effective Go#slice, but none of them say anything about slicing a slice with three indices, like this: slice[a:b:c]
For example, this code:
package main
import "fmt"
func main() {
var s = []string{"a", "b", "c", "d", "e", "f", "g"}
fmt.Println(s[1:2:6], len(s[1:2:6]), cap(s[1:2:6]))
fmt.Println(s[1:2:5], len(s[1:2:5]), cap(s[1:2:5]))
fmt.Println(s[1:2], len(s[1:2]), cap(s[1:2]))
}
The Go Playground prints:
[b] 1 5
[b] 1 4
[b] 1 6
I can understand that the third index has something to do with capacity, but what exactly does it mean?
Am I missing something in the documentation?
The syntax has been introduced in Go 1.2, as I mentioned in "Re-slicing slices in Golang".
It is documented in Full slice expressions:
a[low : high : max]
constructs a slice of the same type, and with the same length and elements as the simple slice expression a[low : high].
Additionally, it controls the resulting slice's capacity by setting it to max - low.
Only the first index may be omitted; it defaults to 0.
After slicing the array a:
a := [5]int{1, 2, 3, 4, 5}
t := a[1:3:5]
the slice t has type []int, length 2, capacity 4, and elements
t[0] == 2
t[1] == 3
The design document for that feature had the following justification:
It would occasionally be useful, for example in custom []byte allocation managers, to be able to hand a slice to a caller and know that the caller cannot edit values beyond a given subrange of the true array.
The addition of append to the language made this somewhat more important, because append lets programmers overwrite entries between len and cap without realizing it or even mentioning cap.
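The append problem from the design document can be reproduced with the question's own x and y. This sketch uses two hypothetical helpers to contrast a plain two-index prefix with a full slice expression; both assume len(x) >= 2.

```go
package main

import "fmt"

// appendViaPrefix appends v to the two-element prefix x[:2].
// The prefix keeps x's full capacity, so append writes into x's array.
func appendViaPrefix(x []int, v int) []int {
	return append(x[:2], v)
}

// appendViaCappedPrefix uses the full slice expression x[:2:2], whose capacity
// is 2 - 0 = 2, so append has no spare room and must copy to a new array.
func appendViaCappedPrefix(x []int, v int) []int {
	return append(x[:2:2], v)
}

func main() {
	x := []int{1, 2, 3}
	y := appendViaPrefix(x, 4)
	fmt.Println(x, y) // [1 2 4] [1 2 4]

	x = []int{1, 2, 3}
	z := appendViaCappedPrefix(x, 4)
	fmt.Println(x, z) // [1 2 3] [1 2 4]
}
```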
2022: svitanok adds for Go 1.19+:
While the capacity of a "derivative" slice doesn't exceed the one specified by the third index at its creation, the slice still refers to the same spot in memory as its original ("true") slice, so changes applied to it will affect the original slice.
And if you then, for example, append enough elements to this derivative slice to make its capacity grow, the new slice will occupy a different place in memory, and the changes made to it will no longer affect the slice it originated from.
In a slice expression slice[a:b:c] or aSlice[1:3:5]
a:b or 1:3 -> gives length
a:c or 1:5 -> gives capacity
We can extract both length and capacity from a slice expression with 3 numbers/indices, without looking at the source slice/array.
expression| aSlice[low:high:max] or aSlice[a:b:c] or aSlice[1:3:7]
------------------------------------------------------------------------
Length | len(aSlice[low:high]) or len(aSlice[a:b]) or len(aSlice[1:3])
Capacity | len(aSlice[low:max]) or len(aSlice[a:c]) or len(aSlice[1:7])
------------------------------------------------------------------------
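The table's rule (length is high - low, capacity is max - low) can be checked directly. The helper lenCap below is hypothetical, and maxIdx is used instead of max to avoid shadowing the Go 1.21 builtin.

```go
package main

import "fmt"

// lenCap builds base[low:high:maxIdx] and reports its length and capacity.
// It panics unless 0 <= low <= high <= maxIdx <= cap(base).
func lenCap(base []int, low, high, maxIdx int) (int, int) {
	s := base[low:high:maxIdx]
	return len(s), cap(s)
}

func main() {
	base := make([]int, 10)
	l, c := lenCap(base, 1, 3, 7)
	fmt.Println(l, c) // 2 6, i.e. high-low and maxIdx-low
}
```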
Read more here at Slice Expressions
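A Go slice actually holds a pointer into the underlying array, together with a length and a capacity (the number of elements from that pointer to the end of the array), so we can picture it as
pointer:length:capacity
and append adds elements after the current length, reusing the array while there is spare capacity and allocating a new one otherwise.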
sl1 := make([]int, 6)
fmt.Println(sl1)
sl2 := append(sl1, 1)
fmt.Println(sl2)
[0 0 0 0 0 0]
[0 0 0 0 0 0 1]
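Because the slice itself is just that small pointer:length:capacity header, assigning it copies the header, not the array. The sketch below (sharedHeaders is a hypothetical name) gives sl1 spare capacity so the append stays in place; with the original make([]int, 6) the capacity is 6, so append allocates and sl1 and sl2 stop sharing.

```go
package main

import "fmt"

// sharedHeaders copies a slice header and appends through the copy.
func sharedHeaders() (sl1, sl2 []int) {
	sl1 = make([]int, 6, 10) // pointer to a 10-element array, len 6, cap 10
	sl2 = sl1                // copies pointer, len and cap; the array is shared

	sl2 = append(sl2, 1) // 6 < cap 10: element 6 of the shared array is set
	sl2[0] = 7           // also visible through sl1
	return sl1, sl2
}

func main() {
	sl1, sl2 := sharedHeaders()
	fmt.Println(sl1, len(sl1), cap(sl1)) // [7 0 0 0 0 0] 6 10
	fmt.Println(sl2, len(sl2), cap(sl2)) // [7 0 0 0 0 0 1] 7 10
	fmt.Println(sl1[:7])                 // [7 0 0 0 0 0 1]: sl1 just has a shorter length
}
```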

In a Go slice, why does s[lo:hi] end at element hi-1? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
According to the Tour of Go, in a Go slice s, the expression s[lo:hi] evaluates to a slice of the elements from lo through hi-1, inclusive:
package main
import "fmt"
func main() {
p := []int{0, // slice position 0
10, // slice position 1
20, // slice position 2
30, // slice position 3
40, // slice position 4
50} // slice position 5
fmt.Println(p[0:3]) // => [0 10 20]
}
In my code example above, "p[0:3]" would seem to intuitively "read" as: "the slice from position 0 to position 3", equating to [0, 10, 20, 30]. But of course, it actually equates to [0 10 20].
So my question is: what is the design rationale for the upper value evaluating to hi-1 rather than simply hi? It feels unintuitive, but there must be some reason for it that I'm missing, and I'm curious what that might be.
Thanks in advance.
This is completely a matter of convention, and there are certainly other ways to do it (for example, Matlab uses arrays whose first index is 1). The choice really comes down to what properties you want. As it turns out, using 0-indexed arrays where slicing is inclusive-exclusive (that is, a slice from a to b includes element a and excludes element b) has some really nice properties, and thus it's a very common choice. Here are a few advantages.
Advantages of 0-indexed arrays and inclusive-exclusive slicing
(Note that I'm using non-Go terminology, so I'll talk about arrays the way C or Java would. Arrays are what Go calls slices, and slices are sub-arrays (i.e., "the slice from index 1 to index 4").)
Pointer arithmetic works. If you're in a language like C, arrays are really just pointers to the first element in the array. Thus, if you use 0-indexed arrays, then you can say that the element at index i is just the element pointed at by the array pointer plus i. For example, if we have the array [3 2 1] with the address of the array being 10 (and assuming that each value takes up one byte of memory), then the address of the first element is 10 + 0 = 10, the address of the second is 10 + 1 = 11, and so on. In short, it makes the math simple.
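Go normally hides pointer arithmetic, but the unsafe package (unsafe.Add needs Go 1.17 or later) can show the same property. This is an illustration only, with a hypothetical helper elemPtr: the address of element i is the address of element 0 plus i times the element size.

```go
package main

import (
	"fmt"
	"unsafe"
)

// elemPtr finds &arr[i] by pointer arithmetic from &arr[0]: with 0-based
// indexing the offset is simply i * element size. i must be < len(arr),
// otherwise the resulting pointer is invalid.
func elemPtr(arr []int32, i int) *int32 {
	base := unsafe.Pointer(&arr[0])
	size := int(unsafe.Sizeof(arr[0])) // 4 bytes for int32
	return (*int32)(unsafe.Add(base, i*size))
}

func main() {
	arr := []int32{3, 2, 1}
	for i := range arr {
		p := elemPtr(arr, i)
		fmt.Println(i, *p, p == &arr[i]) // 0 3 true, then 1 2 true, then 2 1 true
	}
}
```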
The length of a slice is also the place to slice it. That is, for an array arr, arr[0:len(arr)] is just arr itself. This comes in handy a lot in practice. For example, if I call n, _ := r.Read(arr) (where n is the number of bytes read into arr), then I can just do arr[:n] to get the slice of arr corresponding to the data that was actually written into arr.
Indices don't overlap. This means that if I have arr[0:i], arr[i:j], arr[j:k], arr[k:len(arr)], these slices fully cover arr itself. You may not often find yourself partitioning an array into sub-slices like this, but it has a number of related advantages. For example, consider the following code to split an array based on non-consecutive integers:
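func consecutiveSlices(ints []int) [][]int {
	ret := make([][]int, 0)
	if len(ints) == 0 {
		return ret // nothing to split
	}
	i, j := 0, 1
	for j < len(ints) {
		// A gap between neighbours ends the current run ints[i:j].
		if ints[j] != ints[j-1]+1 {
			ret = append(ret, ints[i:j])
			i = j
		}
		j++
	}
	// The last run is ints[i:len(ints)].
	ret = append(ret, ints[i:j])
	return ret
}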
(this code obviously doesn't handle some edge cases well, but you get the idea)
If we were to try to write the equivalent function using inclusive-inclusive slicing, it would be significantly more complicated.
If anyone can think of any more, please feel free to edit this answer and add them.
The Go Programming Language Specification
Slice types
Slice expressions
For a string, array, pointer to array, or slice a, the primary expression
a[low : high]
constructs a substring or slice. The indices low and high select which elements of operand a appear in the result. The result has indices starting at 0 and length equal to high - low.
For convenience, any of the indices may be omitted. A missing low index defaults to zero; a missing high index defaults to the length of the sliced operand.
For arrays or strings, the indices are in range if 0 <= low <= high <= len(a), otherwise they are out of range. For slices, the upper index bound is the slice capacity cap(a) rather than the length. A constant index must be non-negative and representable by a value of type int; for arrays or constant strings, constant indices must also be in range. If both indices are constant, they must satisfy low <= high. If the indices are out of range at run time, a run-time panic occurs.
For q := p[m:n], q is a slice of p starting at index m for a length of n-m elements.

Is there way to remove element from list in Mathematica

Is there a function in Wolfram Mathematica for removing an element from the original list?
For example
a={1,2,3};
DeleteFrom[a,1];
a
a={2,3}
If it doesn't exist, can anyone give an example of an efficient implementation of such a function?
(I know there is a Delete function, but it creates a new list, which is not good if the list is big.)
If you want to drop the first element from a list a, the statement
Drop[a,1]
returns a list the same as a without its first element. Note that this does not update a. To do that you could assign the result to a, eg
a = Drop[a,1]
Note that this is probably exactly what Delete is doing behind the scenes; first making a copy of a without its first element, then assigning the name a to that new list, then freeing the memory used by the old list.
Comparing destructive updates and non-destructive updates in Mathematica is quite complicated and can take one deep into the system's internals. You'll find a lot about the subject on the Stack Exchange Mathematica site.
Every time you change the length of a list in Mathematica you cause a reallocation of the list, which takes O(n) rather than O(1) time. Though no "DeleteFrom" function exists, if it did it would be no faster than a = Delete[a, x].
If you can create in advance a list of all the elements you wish to delete and delete them all at once you will get much better performance. If you cannot you will have to find another way to formulate your problem. I suggest you join us on the proper Stack Exchange site if you have additional questions:
Assign the element to an empty sequence and it will be removed from the list. This works for any element.
In[1] := a = {1,2,3}
Out[1]= {1,2,3}
In[2] := a[[1]] = Sequence[]
Out[2] = Sequence[]
In[3] := a
Out[3] = {2,3}
Yes, Mathematica tends towards non-destructive programming, but the programmers at Wolfram are pretty clever folks and the code seems to run pretty fast. It's hard to believe they would always copy a whole list to change one element, i.e. not make any optimizations.
Improving on user3446498's answer, you can do the following:
In[1] := a = {1,2,3};
In[2] := a[[1]] = Nothing;
In[3] := a
Out[3] = {2,3}
In[4] := a == {2,3}
Out[4] = True
The Nothing symbol was introduced in version 10 (2015); see here.
Both solutions from @user3446498 and @pmsoltani won't actually delete the element. Test:
a = {1, 2, 3};
a[[2]] = Sequence[]; (* or Nothing *)
--a[[1]];
++a[[2]];
a
They both output {0, 4, 3}, while {0, 4} is expected.
Replacing 2nd line with a = Delete[a, 2]; would work.

Preallocating arrays of structures in Matlab for efficiency

In Matlab I wish to preallocate a 1x30 array of structures named P with the following structure fields:
imageSize: [128 128]
orientationsPerScale: [8 8 8 8]
numberBlocks: 4
fc_prefilt: 4
boundaryExtension: 32
G: [192x192x32 double]
G might not necessarily be 192x192x32, it could be 128x128x16 for example (though it will have 3 dimensions of type double).
I am doing the preallocation the following way:
P(30) = struct('imageSize', 0, 'orientationsPerScale', [0 0 0 0], ...
'numberBlocks', 0, 'fc_prefilt', 0, 'boundaryExtension', 0, 'G', []);
Is this the correct way of preallocating such a structure, or will there be performance issues relating to G being set to empty []? If there is a better way of allocating this structure please provide an example.
Also, the above approach seems to work (performance issues aside); however, the order of the field name/value pairs seems to be important, since rearranging them leads to an error upon assignment after preallocation. Why is this so, given that the items/values are referenced by name (not position)?
If G is set to empty, the interpreter has no way of knowing what size of data will be assigned to it later, so it will probably pack the array items tightly in memory and have to redo it all when the data doesn't fit.
It's probably more efficient to define upper bounds for the dimensions of G beforehand and set it to that size. The zeros function could help.

On PackedArray, looking for advice for using them

I have not used PackedArray before, but just started looking at using them from reading some discussion on them here today.
What I have is lots of large size 1D and 2D matrices of all reals, and no symbolic (it is a finite difference PDE solver), and so I thought that I should take advantage of using PackedArray.
I have an initialization function where I allocate all the data/grids needed. So I went and used ToPackedArray on them. It seems a bit faster, but I need to do more performance testing to better compare speed before and after and also compare RAM usage.
But while I was looking at this, I noticed that some operations in M automatically return lists in PackedArray already, and some do not.
For example, this does not return packed array
a = Table[RandomReal[], {5}, {5}];
Developer`PackedArrayQ[a]
But this does
a = RandomReal[1, {5, 5}];
Developer`PackedArrayQ[a]
and this does
a = Table[0, {5}, {5}];
b = ListConvolve[ {{0, 1, 0}, {1, 4, 1}, {0, 1, 1}}, a, 1];
Developer`PackedArrayQ[b]
and also matrix multiplication does return result in packed array
a = Table[0, {5}, {5}];
b = a.a;
Developer`PackedArrayQ[b]
But element wise multiplication does not
b = a*a;
Developer`PackedArrayQ[b]
My question: Is there a list somewhere that documents which M commands return a PackedArray and which do not? (assuming the data meets the requirements: Real, not mixed, no symbolic, etc.)
Also, a minor question: do you think it is better to first check whether a list/matrix is already packed before calling ToPackedArray on it? I would think calling ToPackedArray on an already packed list costs nothing, as the call returns right away.
thanks,
update (1)
Just wanted to mention that I just found that PackedArray symbols are not allowed in a demo CDF: I got an error uploading one that used them, so I had to remove all my packing code. Since I mainly write demos, this topic is now just of academic interest to me. But I wanted to thank everyone for their time and good answers.
There isn't a comprehensive list. To point out a few things:
Basic operations with packed arrays will tend to remain packed:
In[66]:= a = RandomReal[1, {5, 5}];
In[67]:= Developer`PackedArrayQ /@ {a, a.a, a*a}
Out[67]= {True, True, True}
Note above that my version (8.0.4) doesn't unpack for element-wise multiplication.
Whether a Table will result in a packed array depends on the number of elements:
In[71]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {10}]]
Out[71]= False
In[72]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {11}]]
Out[72]= True
In[73]:= Developer`PackedArrayQ[Table[RandomReal[], {25}, {10}]]
Out[73]= True
On["Packing"] will turn on messages to let you know when things unpack:
In[77]:= On["Packing"]
In[78]:= a = RandomReal[1, 10];
In[79]:= Developer`PackedArrayQ[a]
Out[79]= True
In[80]:= a[[1]] = 0 (* force unpacking due to type mismatch *)
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {10}. >>
Out[80]= 0
Operations that do per-element inspection will usually unpack the array:
In[81]:= a = RandomReal[1, 10];
In[82]:= Position[a, Max[a]]
Developer`FromPackedArray::unpack: Unpacking array in call to Position. >>
Out[82]= {{4}}
The penalty for calling ToPackedArray on an already packed list is small enough that I wouldn't worry about it too much:
In[90]:= a = RandomReal[1, 10^7];
In[91]:= Timing[Do[Identity[a], {10^5}];]
Out[91]= {0.028089, Null}
In[92]:= Timing[Do[Developer`ToPackedArray[a], {10^5}];]
Out[92]= {0.043788, Null}
The frontend prefers packed to unpacked arrays, which can show up when dealing with Dynamic and Manipulate:
In[97]:= Developer`PackedArrayQ[{1}]
Out[97]= False
In[98]:= Dynamic[Developer`PackedArrayQ[{1}]]
Out[98]= True
When looking into performance, focus on cases where large lists are getting unpacked, rather than the small ones. Unless the small ones are in big loops.
This is just an addendum to Brett's answer:
SystemOptions["CompileOptions"]
will give you the lengths being used for which a function will return a packed array. So if you did need to pack a small list, as an alternative to using Developer`ToPackedArray you could temporarily set a smaller number for one of the compile options. e.g.
SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> 20}]
Note also some differences between functions that, to me at least, don't seem intuitive, so I generally have to test these kinds of things whenever I use them rather than instinctively knowing what will work best:
f = # + 1 &;
g[x_] := x + 1;
data = RandomReal[1, 10^6];
On["Packing"]
Timing[Developer`PackedArrayQ[f /@ data]]
{0.131565, True}
Timing[Developer`PackedArrayQ[g /@ data]]
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {1000000}.
{1.95083, False}
Another addition to Brett's answer: if a list is already a packed array, ToPackedArray is very fast, since this is checked quite early. You might also find this valuable:
http://library.wolfram.com/infocenter/Articles/3141/
In general for numerics stuff look for talks from Rob Knapp and/or Mark Sofroniou.
When I develop numerics codes, I write the function and then use On["Packing"] to make sure that everything is packed that needs to be packed.
Concerning Mike's answer, the threshold was introduced because there is overhead for small inputs. Where the threshold should be is hardware dependent, so it might be an idea to write a function that sets these thresholds based on measurements done on the computer.

Resources