Preallocating arrays of structures in Matlab for efficiency - performance

In Matlab I wish to preallocate a 1x30 array of structures named P with the following structure fields:
imageSize: [128 128]
orientationsPerScale: [8 8 8 8]
numberBlocks: 4
fc_prefilt: 4
boundaryExtension: 32
G: [192x192x32 double]
G might not necessarily be 192x192x32, it could be 128x128x16 for example (though it will have 3 dimensions of type double).
I am doing the preallocation the following way:
P(30) = struct('imageSize', 0, 'orientationsPerScale', [0 0 0 0], ...
'numberBlocks', 0, 'fc_prefilt', 0, 'boundaryExtension', 0, 'G', []);
Is this the correct way of preallocating such a structure, or will there be performance issues relating to G being set to empty []? If there is a better way of allocating this structure please provide an example.
Also, the above approach seems to work (performance issues aside), however, the order of the field name / value pairs seems to be important, since rearranging them leads to error upon assignment after preallocation. Why is this so given that the items/values are referenced by name (not position)?

If G is set to Empty, the interpreter has no way of knowing what size data will be attributed to it later, so it probably will pack the array items tight in memory, and have to redo it all when it doesn't fit.
It's probably more efficient to define upper bounds for the dimensions of G beforehand, and set it to that size. The zeroes function could help.

Related

Preallocate or change size of vector

I have a situation where I have a process which needs to "burn-in". This means that I
Start with p values, p relatively small
For n>p, generate nth value using most recently generated p values (e.g. p+1 generated from values 1 to p, p+2 generated from values 2, p+1, etc.)
Repeat until n=N, where N large
Now, only the most recently generated p values will be useful to me, so there are two ways for me to implement this. I can either
Start with a vector of p initial values. At each iteration, mutate the vector, removing the first element, and replacing the last element with the most recently generated value or,
Preallocate a large array of length N, where first p elements are initial values. At iteration n, mutate nth value with most recently generated value
There are pros and cons to both approaches.
Pros of the first, are that we only store most relevant values. Cons of the first are that we are changing the length of the vector at each iteration.
Pros of the second are that we preallocate all the memory we need. Cons of the second is that we store much more than we need.
What is the best way to proceed? Does it depend on what aspect of performance I most need to care about? What will be the quickest?
Cheers in advance.
edit: approximately, p is usually in the order of low tens, N can be several thousand
The first solution has another huge cons: removing the first item of an array takes O(n) time since elements should be moved in memory. This certainly cause the algorithm to runs in quadratic time which is not reasonable. Shifting the items as proposed by #ForceBru should also cause this quadratic run time (since many items are moved just to add one value every time).
The second solution should be pretty fast compared to the first but, indeed, it can use a lot of memory so it should be sub-optimal (it takes time to write values in the RAM).
A faster solution is to use a data structure called a deque. Such data structure enable you to remove the first item in constant time and append a new value at the end also in constant time. That being said, it also introduces some overhead to be able to do that. Julia provide such data structure (more especially queues).
Since the number of in-flight items appears to be bounded in your algorithm, you can implement a rolling buffer. Fortunately, Julia also implement this: see CircularBuffer. This solution should be quite simple and fast (since the operations you want to do are done in O(1) time on it).
It is probably simplest to use CircularArrays.jl for your use case:
julia> using CircularArrays
julia> c = CircularArray([1,2,3,4])
4-element CircularVector(::Vector{Int64}):
1
2
3
4
julia> for i in 5:10
c[i] = i
#show c
end
c = [5, 2, 3, 4]
c = [5, 6, 3, 4]
c = [5, 6, 7, 4]
c = [5, 6, 7, 8]
c = [9, 6, 7, 8]
c = [9, 10, 7, 8]
In this way - as you can see - you can can continue using an increasing index and array will wrap around internally as needed (discarding old values that are not needed any more).
In this way you always store last p values in the array without having to copy anything or re-allocate memory in each step.
...only the most recently generated p values will be useful to me...
Start with a vector of p initial values. At each iteration, mutate the vector, removing the first element, and replacing the last element with the most recently generated value.
Cons of the first are that we are changing the length of the vector at each iteration.
There's no need to change the length of the vector. Simply shift its elements to the left (overwriting the first element) and write the new data to the_vector[end]:
the_vector = [1,2,3,4,5,6]
function shift_and_add!(vec::AbstractVector, value)
vec[1:end-1] .= #view vec[2:end] # shift
vec[end] = value # replace the last value
vec
end
#assert shift_and_add!(the_vector, 80) == [2,3,4,5,6,80]
# `the_vector` will be mutated
#assert the_vector == [2,3,4,5,6,80]

Is working past the end of a slice idiomatic?

I was reading through Go's compress/flate package, and I found this odd piece of code [1]:
n := int32(len(list))
list = list[0 : n+1]
list[n] = maxNode()
In context, list is guaranteed to be pointing to an array with more data after. This is a private function, so it can't be misused outside the library.
To me, this seems like a scary hack that should be a runtime exception. For example, the following D code generates a RangeError:
auto x = [1, 2, 3];
auto y = x[0 .. 2];
y = y[0 .. 3];
Abusing slices could be done more simply (and also look more safe) with the following:
x := []int{1, 2, 3}
y = x[:2]
y = append(y, 4) // x is now [1, 2, 4] because of how append works
But both solutions seem very hacky and scary and, IMHO, should not work as they do. Is this sort of thing considered idiomatic Go code? If so, which of the the above is more idiomatic?
[1] - http://golang.org/src/pkg/compress/flate/huffman_code.go#L136
This is not abusing the slice, this is just perfectly using what a slice is : a window over an array.
I'll take this illustration from another related answer I made :
array : [0 0 0 0 0 0 0 0 0 0 0 0]
array : <---- capacity --->
slice : [0 0 0 0]
slice : <---- capacity --->
When the array is greater than the slice it's normal and standard to take a greater slice by extending one when you know you don't go out of the underlying array (which can be verified using cap()).
Regarding your buggy code you give as example, yes, it might be dangerous, but arrays and slices are among the most basic structures of the languages and you must understand them before you use them if you want to avoid such bugs. I personally think that any go coder should not only know the API but also what are slices.
In the code you link to, a short analysis shows that there is no possible overflow possible as list is created as
list := make([]literalNode, len(freq)+1)
and is later resized to count which can't be greater than len(freq) :
list = list[0:count]
One might have preferred a few more comments but as the function containing list = list[0 : n+1] is private and called from only one place, it might also be considered the balancing between comment verbosity and code obscurity sounds right. It's painful to have too much comments hiding the code and anybody in need to read this code is able to easily check there is no overflow just like I did.
It cannot be a run time exception because the language specification prescribes that the upper limit of the slice operation is the capacity of the slice, not its length.

Data Structure / Hash Function to link Sets of Ints to Value

Given n integer id's, I wish to link all possible sets of up to k id's to a constant value. What I'm looking for is a way to translate sets (e.g. {1, 5}, {1, 3, 5} and {1, 2, 3, 4, 5, 6, 7}) to unique values.
Guarantees:
n < 100 and k < 10 (again: set sizes will range in [1, k]).
The order of id's doesn't matter: {1, 5} == {5, 1}.
All combinations are possible, but some may be excluded.
All sets and values are constant and made only once. No deletes or inserts, no value updates.
Once generated, the only operations taking place will be look-ups.
Look-ups will be frequent and one-directional (given set, look up value).
There is no need to sort (or otherwise organize) the values.
Additionally, it would be nice (but not obligatory) if "neighboring" sets (drop one id, add one id, swap one id, etc) are easy to reach, as well as "all sets that include at least this set".
Any ideas?
Enumerate using the product of primes.
a -> 2
b -> 3
c -> 5
d -> 7
et cetera
Now hash(ab) := 6, and hash (abc) := 30
And a nice side effect is that, if "ab" is a subset of "abc", then:
hash(abc) % hash(ab) == 0
and
hash(abc) / hash(ab) == hash(c)
The bad news: You might run into overflow, the 100th prime will probably be around 1000, and 64 bits cannot accomodate 1000**10. This will not affect the functioning as a hash function; only the subset thingy will fail to work. the same method applied to anagrams
The other option is Zobrist-hashing. It is equivalent to the the primes method, but instead of primes you use a fixed set of (random) numbers, and instead of multiplying you use XOR.
For a fixed small (it needs << ~70 bits) set like yours, it might be possible to tune the zobrist tables to totally avoid collisions (yielding a perfect hash).
And the final (and simplest) way is to use a (100bit) bitmap, and treat that as a hashvalue (maybe after modulo table size)
And a totally unrelated method is to just build a decision tree on the bits of the bitmap. (the tree would have a maximal depth of k) a related kD tree on bit values
May be not the best solution, but you can do the following:
Sort the set from Lowest to highest with a simple IntegerComparator
Add each item of the set to a String
so if you have {2,5,9,4} first Step->{2,4,5,9}; second->"2459"
This way you will get a unique String from a unique set. If you really need to map them to an integer value, you can hash the string after that.
A second way I can think of is to store them in a java Set and simply map it against a HashMap with set as keys
Calculate a 'diff' from each set {1, 6, 87, 89} = {1,5,81,2,0,0,...}
{1,2,3,4} = { 1,1,1,1,0,0,0,0... };
Then binary encode each number with a variable length encoding and concatenate the bits.
It's hard to compare the sets (except for the first few equal bits), but because there can't be many large intervals in a set, all possible values just might fit into 64 bits. (slack of 16 bits at least...)

Performance of List.permute

I implemented a Fisher-Yates shuffle recently, which used List.permute to shuffle the list, and noted that as the size of the list increased, there was a significant performance decrease. I suspect this is due to the fact that while the algorithm assumes it is operating on an array, permute must be accessing the list elements by index, which is O(n).
To confirm this, I tried out applying a permutation to a list to reverse its element, comparing working directly on the list, and transforming the list into an array and back to a list:
let permute i max = max - i - 1
let test = [ 0 .. 10000 ]
let rev1 list =
let perm i = permute i (List.length list)
List.permute perm list
let rev2 list =
let array = List.toArray list
let perm i = permute i (Array.length array)
Array.permute perm array |> Array.toList
I get the following results, which tend to confirm my assumption:
rev1 test;;
Real: 00:00:00.283, CPU: 00:00:00.265, GC gen0: 0, gen1: 0, gen2: 0
rev2 test;;
Real: 00:00:00.003, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
My question is the following:
1) Should List.permute be avoided for performance reasons? And, relatedly, shouldn't the implementation of List.permute automatically do the transformation into an Array behind the scenes?
2) Besides using an Array, is there a more functional way / data structure suitable for this type of work, i.e. shuffling of elements? Or is this simply a problem for which an Array is the right data structure to use?
List.permute converts the list to an array, calls Array.permute, then converts it back to a list. Based on that, you can probably figure out what you need to do (hint: work with arrays!).
Should List.permute be avoided for performance reasons?
The only performance problem here is in your own code, specifically calling List.length.
Besides using an Array, is there a more functional way / data structure suitable for this type of work, i.e. shuffling of elements? Or is this simply a problem for which an Array is the right data structure to use?
You are assuming that arrays cannot be used functionally when, in fact, they can by not mutating their elements. Consider the permute function:
let permute f (xs: _ []) = Array.init xs.Length (fun i -> xs.[f i])
Although it acts upon an array and produces an array it is not mutating anything so it is using an array as a purely functional data structure.

sorting algorithm where pairwise-comparison can return more information than -1, 0, +1

Most sort algorithms rely on a pairwise-comparison the determines whether A < B, A = B or A > B.
I'm looking for algorithms (and for bonus points, code in Python) that take advantage of a pairwise-comparison function that can distinguish a lot less from a little less or a lot more from a little more. So perhaps instead of returning {-1, 0, 1} the comparison function returns {-2, -1, 0, 1, 2} or {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5} or even a real number on the interval (-1, 1).
For some applications (such as near sorting or approximate sorting) this would enable a reasonable sort to be determined with less comparisons.
The extra information can indeed be used to minimize the total number of comparisons. Calls to the super_comparison function can be used to make deductions equivalent to a great number of calls to a regular comparsion function. For example, a much-less-than b and c little-less-than b implies a < c < b.
The deductions cans be organized into bins or partitions which can each be sorted separately. Effectively, this is equivalent to QuickSort with n-way partition. Here's an implementation in Python:
from collections import defaultdict
from random import choice
def quicksort(seq, compare):
'Stable in-place sort using a 3-or-more-way comparison function'
# Make an n-way partition on a random pivot value
segments = defaultdict(list)
pivot = choice(seq)
for x in seq:
ranking = 0 if x is pivot else compare(x, pivot)
segments[ranking].append(x)
seq.clear()
# Recursively sort each segment and store it in the sequence
for ranking, segment in sorted(segments.items()):
if ranking and len(segment) > 1:
quicksort(segment, compare)
seq += segment
if __name__ == '__main__':
from random import randrange
from math import log10
def super_compare(a, b):
'Compare with extra logarithmic near/far information'
c = -1 if a < b else 1 if a > b else 0
return c * (int(log10(max(abs(a - b), 1.0))) + 1)
n = 10000
data = [randrange(4*n) for i in range(n)]
goal = sorted(data)
quicksort(data, super_compare)
print(data == goal)
By instrumenting this code with the trace module, it is possible to measure the performance gain. In the above code, a regular three-way compare uses 133,000 comparisons while a super comparison function reduces the number of calls to 85,000.
The code also makes it easy to experiment with a variety comparison functions. This will show that naïve n-way comparison functions do very little to help the sort. For example, if the comparison function returns +/-2 for differences greater than four and +/-1 for differences four or less, there is only a modest 5% reduction in the number of comparisons. The root cause is that the course grained partitions used in the beginning only have a handful of "near matches" and everything else falls in "far matches".
An improvement to the super comparison is to covers logarithmic ranges (i.e. +/-1 if within ten, +/-2 if within a hundred, +/- if within a thousand.
An ideal comparison function would be adaptive. For any given sequence size, the comparison function should strive to subdivide the sequence into partitions of roughly equal size. Information theory tells us that this will maximize the number of bits of information per comparison.
The adaptive approach makes good intuitive sense as well. People should first be partitioned into love vs like before making more refined distinctions such as love-a-lot vs love-a-little. Further partitioning passes should each make finer and finer distinctions.
You can use a modified quick sort. Let me explain on an example when you comparison function returns [-2, -1, 0, 1, 2]. Say, you have an array A to sort.
Create 5 empty arrays - Aminus2, Aminus1, A0, Aplus1, Aplus2.
Pick an arbitrary element of A, X.
For each element of the array, compare it with X.
Depending on the result, place the element in one of the Aminus2, Aminus1, A0, Aplus1, Aplus2 arrays.
Apply the same sort recursively to Aminus2, Aminus1, Aplus1, Aplus2 (note: you don't need to sort A0, as all he elements there are equal X).
Concatenate the arrays to get the final result: A = Aminus2 + Aminus1 + A0 + Aplus1 + Aplus2.
It seems like using raindog's modified quicksort would let you stream out results sooner and perhaps page into them faster.
Maybe those features are already available from a carefully-controlled qsort operation? I haven't thought much about it.
This also sounds kind of like radix sort except instead of looking at each digit (or other kind of bucket rule), you're making up buckets from the rich comparisons. I have a hard time thinking of a case where rich comparisons are available but digits (or something like them) aren't.
I can't think of any situation in which this would be really useful. Even if I could, I suspect the added CPU cycles needed to sort fuzzy values would be more than those "extra comparisons" you allude to. But I'll still offer a suggestion.
Consider this possibility (all strings use the 27 characters a-z and _):
11111111112
12345678901234567890
1/ now_is_the_time
2/ now_is_never
3/ now_we_have_to_go
4/ aaa
5/ ___
Obviously strings 1 and 2 are more similar that 1 and 3 and much more similar than 1 and 4.
One approach is to scale the difference value for each identical character position and use the first different character to set the last position.
Putting aside signs for the moment, comparing string 1 with 2, the differ in position 8 by 'n' - 't'. That's a difference of 6. In order to turn that into a single digit 1-9, we use the formula:
digit = ceiling(9 * abs(diff) / 27)
since the maximum difference is 26. The minimum difference of 1 becomes the digit 1. The maximum difference of 26 becomes the digit 9. Our difference of 6 becomes 3.
And because the difference is in position 8, out comparison function will return 3x10-8 (actually it will return the negative of that since string 1 comes after string 2.
Using a similar process for strings 1 and 4, the comparison function returns -5x10-1. The highest possible return (strings 4 and 5) has a difference in position 1 of '-' - 'a' (26) which generates the digit 9 and hence gives us 9x10-1.
Take these suggestions and use them as you see fit. I'd be interested in knowing how your fuzzy comparison code ends up working out.
Considering you are looking to order a number of items based on human comparison you might want to approach this problem like a sports tournament. You might allow each human vote to increase the score of the winner by 3 and decrease the looser by 3, +2 and -2, +1 and -1 or just 0 0 for a draw.
Then you just do a regular sort based on the scores.
Another alternative would be a single or double elimination tournament structure.
You can use two comparisons, to achieve this. Multiply the more important comparison by 2, and add them together.
Here is a example of what I mean in Perl.
It compares two array references by the first element, then by the second element.
use strict;
use warnings;
use 5.010;
my #array = (
[a => 2],
[b => 1],
[a => 1],
[c => 0]
);
say "$_->[0] => $_->[1]" for sort {
($a->[0] cmp $b->[0]) * 2 +
($a->[1] <=> $b->[1]);
} #array;
a => 1
a => 2
b => 1
c => 0
You could extend this to any number of comparisons very easily.
Perhaps there's a good reason to do this but I don't think it beats the alternatives for any given situation and certainly isn't good for general cases. The reason? Unless you know something about the domain of the input data and about the distribution of values you can't really improve over, say, quicksort. And if you do know those things, there are often ways that would be much more effective.
Anti-example: suppose your comparison returns a value of "huge difference" for numbers differing by more than 1000, and that the input is {0, 10000, 20000, 30000, ...}
Anti-example: same as above but with input {0, 10000, 10001, 10002, 20000, 20001, ...}
But, you say, I know my inputs don't look like that! Well, in that case tell us what your inputs really look like, in detail. Then someone might be able to really help.
For instance, once I needed to sort historical data. The data was kept sorted. When new data were added it was appended, then the list was run again. I did not have the information of where the new data was appended. I designed a hybrid sort for this situation that handily beat qsort and others by picking a sort that was quick on already sorted data and tweaking it to be fast (essentially switching to qsort) when it encountered unsorted data.
The only way you're going to improve over the general purpose sorts is to know your data. And if you want answers you're going to have to communicate that here very well.

Resources