I was wondering how one decides the resizing factor by which a dynamic array grows.
On Wikipedia and elsewhere I have always seen the capacity being increased by a factor of 2. Why 2? Why not 3? How does one decide this factor? If it is language dependent, I would like to know this for Java.
Actually in Java's ArrayList the formula to calculate the new capacity after a resize is:
newCapacity = (oldCapacity * 3)/2 + 1;
This means roughly a 1.5 factor.
About the reason for this number I don't know, but I hope someone has done a statistical analysis and found this is a good compromise between space and computational overhead.
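For example, a quick Python loop (my own illustration; ArrayList itself is of course Java) shows how the capacity evolves under that formula, starting from ArrayList's default initial capacity of 10:

capacity = 10                            # ArrayList's default initial capacity
steps = []
for _ in range(10):
    steps.append(capacity)
    capacity = (capacity * 3) // 2 + 1   # newCapacity = (oldCapacity * 3)/2 + 1
print(steps)                             # [10, 16, 25, 38, 58, 88, 133, 200, 301, 452]

The ratio between successive capacities hovers around 1.5.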
Quoting from Wikipedia:
As n elements are inserted, the capacities form a geometric progression. Expanding the array by any constant proportion ensures that inserting n elements takes O(n) time overall, meaning that each insertion takes amortized constant time. The value of this proportion a leads to a time-space tradeoff: the average time per insertion operation is about a/(a−1), while the number of wasted cells is bounded above by (a−1)n. The choice of a depends on the library or application: a = 3/2 and a = 2 are commonly used.
Apparently it is a good compromise between CPU time and wasted memory. I guess the "best" value depends on what your application does.
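To make the a/(a−1) figure concrete, here is a small Python sketch (my own illustration, not from the quoted article) that grows an array by a chosen factor and counts the average work per insertion (one write plus the copies done on each resize):

def average_insert_cost(n, factor):
    """Insert n elements into a dynamic array growing by `factor` and
    return the average work per insertion (1 write + copies on resize)."""
    capacity, size, work = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            capacity = int(capacity * factor) + 1   # grow (+1 avoids stalling at tiny sizes)
            work += size                            # copy the old elements
        work += 1                                   # the insertion itself
        size += 1
    return work / n

# Should roughly match a/(a-1): ~3 for a=1.5, ~2 for a=2, ~1.5 for a=3
for a in (1.5, 2.0, 3.0):
    print(a, round(average_insert_cost(10**6, a), 2))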
Would you waste more space than you actually use?
If not, the factor should be less than or equal to 2.
If you want it to be an integer so it is simple to work with, there is only one choice.
There is another difference between a growth rate of 2X and a growth rate of 1.5X that nobody here has discussed yet.
Each time we allocate a new buffer to increase our dynamic array capacity, we are building up a region of unused memory preceding the array. If the growth rate is too high, then this region cannot ever be reused in the array.
To visualize, let "X" represent memory cells used by our array, and "O" represent memory cells that we can no longer use. A growth rate of 2X looks like so:
[X] -> [OXX] -> [OOOXXXX] -> [OOOOOOOXXXXXXXX]
... notice that the preceding O's keep growing! In fact, with a 2X growth rate, we can never use that memory again in our array.
But, with a 1.5X growth multiplier (rounded down, but at least 1), the usage looks like:
[X] -> [OXX] -> [OOOXXX] -> [OOOOOOXXXX] -> [XXXXXX]
Wait a sec, we were able to reclaim the old space! That's because the size of the unused space caught up with the size of the array.
If you work out the math, the limit growth factor is Phi (or about 1.618). Anything larger than Phi, and you cannot reclaim the old space.
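For the curious, here is where Phi comes from (my own sketch of the calculation behind the claim above). When the buffer of size a^(k+1) is allocated, the freed region in front of it consists of the earlier buffers of sizes 1, a, ..., a^(k-1) (the a^k buffer still holds the live elements), so reuse requires

a^{k+1} \;\le\; \sum_{i=0}^{k-1} a^i \;=\; \frac{a^k - 1}{a - 1}

Multiplying by (a - 1), dividing by a^k and letting k grow without bound gives a^2 - a - 1 \le 0, i.e. a \le (1 + \sqrt{5})/2 = \varphi \approx 1.618.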
I thought it meant it takes a constant amount of time to run. Is that different than one step?
O(1) is a class of functions. Namely, it includes functions bounded by a constant.
We say that an algorithm has the complexity of O(1) iff the number of steps it takes, as a function of the size of the input, is bounded by a(n arbitrary) constant. This function can be a constant, or it can grow, or behave chaotically, or undulate as a sine wave. As long as it never exceeds some predefined constant, it's O(1).
For more information, see Big O notation.
It means that even if you increase the size of whatever the algorithm is operating on, the number of calculations required to run remains the same.
More specifically it means that the number of calculations doesn't get larger than some constant no matter how big the input gets.
In contrast, O(N) means that if the size of the input is N, the number of steps required is at most a constant times N, no matter how big N gets.
So for example (in python code since that's probably easy for most to interpret):
def f(L, index):  # L a list, index an integer
    x = L[index]
    y = 2 * L[index]
    return x + y
then even though f has several calculations within it, the time taken to run is the same regardless of how long the list L is. However,
def g(L):  # L a list
    return sum(L)
This will be O(N) where N is the length of list L. Even though there is only a single calculation written, the system has to add all N entries together. So it has to do at least one step for each entry. So as N increases, the number of steps increases proportional to N.
As everyone has already tried to explain, it simply means:
No matter how many mangoes you've got in a box, it'll always take you the same amount of time to eat 1 mango. How you plan on eating it is irrelevant: there may be a single step, or you might go through multiple steps and slice it nicely to consume it.
I have studied the workings of the Sieve of Eratosthenes for generating the prime numbers up to a given number, by iterating and striking off all the composite numbers. The algorithm only needs to iterate up to sqrt(n), where n is the upper bound up to which we need to find all the primes. We know that the number of primes up to n = 10^9 is very small compared to the number of composites, yet we use all that space just to mark the composites.
My question is: since we deal with a very large range (and the number of primes is comparatively small), can we modify the algorithm to store only the prime numbers?
Can we just store the prime numbers straight away?
Changing the structure from that of a set (sieve) - one bit per candidate - to storing primes (e.g. in a list, vector or tree structure) actually increases storage requirements.
Example: there are 203,280,221 primes below 2^32. An array of uint32_t of that size requires about 775 MiB whereas the corresponding bitmap (a.k.a. set representation) occupies only 512 MiB (2^32 bits / 8 bits/byte = 2^29 bytes).
The most compact number-based representation with fixed cell size would be storing the halved distance between consecutive odd primes, since up to about 2^40 the halved distance fits into a byte. At 193 MiB for the primes up to 2^32 this is slightly smaller than an odds-only bitmap but it is only efficient for sequential processing. For sieving it is not suitable because, as Anatolijs has pointed out, algorithms like the Sieve of Eratosthenes effectively require a set representation.
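As a rough illustration of the halved-gap idea (my own sketch, not code from the answer above): store each odd prime as the byte-sized half-distance from its predecessor.

def encode_halved_gaps(odd_primes):
    """Encode odd primes (3, 5, 7, ...) as halved gaps to the previous prime.
    Each gap fits in one byte as long as halved gaps stay below 256."""
    gaps = bytearray()
    prev = odd_primes[0]
    for p in odd_primes[1:]:
        gaps.append((p - prev) // 2)
        prev = p
    return gaps

def decode_halved_gaps(gaps, first=3):
    """Reconstruct the odd primes from the halved-gap encoding."""
    primes, value = [first], first
    for g in gaps:
        value += 2 * g
        primes.append(value)
    return primes

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
assert decode_halved_gaps(encode_halved_gaps(odd_primes)) == odd_primes

One byte per prime is what gives the 193 MiB figure for the primes up to 2^32.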
The bitmap can be shrunk drastically by leaving out the multiples of small primes. Most famous is the odds-only representation that leaves out the number 2 and its multiples; this halves the space requirement to 256 MiB at virtually no cost in added code complexity. You just need to remember to pull the number 2 out of thin air when needed, since it isn't represented in the sieve.
Even more space can be saved by leaving out multiples of more small primes; this generalisation of the 'odds-only' trick is usually called 'wheeled storage' (see Wheel Factorization in the Wikipedia). However, the gain from adding more small primes to the wheel gets smaller and smaller whereas the wheel modulus ('circumference') increases explosively. Adding 3 removes 1/3rd of the remaining numbers, adding 5 removes a further 1/5th, adding 7 only gets you a further 1/7th and so on.
Here's an overview of what adding another prime to the wheel can get you. 'ratio' is the size of the wheeled/reduced set relative to the full set that represents every number; 'delta' gives the shrinkage compared to the previous step. 'spokes' refers to the number of prime-bearing spokes which need to be represented/stored; the total number of spokes for a wheel is of course equal to its modulus (circumference).
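Those figures are easy to regenerate; here is a small Python sketch (my own, assuming ratio = prime-bearing spokes / wheel modulus, i.e. the count of residues coprime to the modulus divided by the modulus):

from math import gcd

wheel, prev_ratio = 1, 1.0
for p in (2, 3, 5, 7, 11):
    wheel *= p
    spokes = sum(1 for r in range(wheel) if gcd(r, wheel) == 1)   # prime-bearing spokes
    ratio = spokes / wheel            # size of the wheeled set relative to the full set
    delta = prev_ratio - ratio        # shrinkage compared to the previous wheel
    print(f"mod {wheel:5d}: spokes = {spokes:4d}, ratio = {ratio:.4f}, delta = {delta:.4f}")
    prev_ratio = ratio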
The mod 30 wheel (about 136 MiB for the primes up to 2^32) offers an excellent cost/benefit ratio because it has eight prime-bearing spokes, which means that there is a one-to-one correspondence between wheels and 8-bit bytes. This enables many efficient implementation tricks. However, its cost in added code complexity is considerable despite this fortuitous circumstance, and for many purposes the odds-only sieve ('mod 2 wheel') gives the most bang for buck by far.
There are two additional considerations worth keeping in mind. The first is that data sizes like these often exceed the capacity of memory caches by a wide margin, so that programs can often spend a lot of time waiting for the memory system to deliver the data. This is compounded by the typical access patterns of sieving - striding over the whole range, again and again and again. Speedups of several orders of magnitude are possible by working the data in small batches that fit into the level-1 data cache of the processor (typically 32 KiB); lesser speedups are still possible by keeping within the capacity of the L2 and L3 caches (a few hundred KiB and a few MiB, respectively). The keyword here is 'segmented sieving'.
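A minimal sketch of segmented sieving in Python (my own illustration of the idea, not tuned code): sieve the small factor primes up to sqrt(limit) once, then process the rest of the range in cache-sized windows.

def segmented_sieve(limit, segment_size=32 * 1024):
    """Yield all primes below `limit`. One byte per candidate per segment,
    so a segment's flags fit in a 32 KiB level-1 data cache."""
    import math
    root = math.isqrt(limit) + 1

    # 1. Ordinary sieve for the small factor primes up to sqrt(limit).
    small = bytearray([1]) * root
    small[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(root) + 1):
        if small[i]:
            small[i * i::i] = bytearray(len(small[i * i::i]))
    factor_primes = [i for i in range(2, root) if small[i]]
    yield from (p for p in factor_primes if p < limit)

    # 2. Sieve the rest of the range window by window.
    for low in range(root, limit, segment_size):
        high = min(low + segment_size, limit)
        flags = bytearray([1]) * (high - low)
        for p in factor_primes:
            start = max(p * p, (low + p - 1) // p * p)   # first multiple of p in [low, high)
            flags[start - low::p] = bytearray(len(flags[start - low::p]))
        yield from (low + i for i, f in enumerate(flags) if f)

print(sum(1 for _ in segmented_sieve(10**6)))   # 78498 primes below one million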
The second consideration is that many sieving tasks - like the famous SPOJ PRIME1 and its updated version PRINT (with extended bounds and tightened time limit) - require only the small factor primes up to the square root of the upper limit to be permanently available for direct access. That's a comparatively small number: 3512 when sieving up to 2^31 as in the case of PRINT.
Since these primes have already been sieved there's no need for a set representation anymore, and since they are few there are no problems with storage space. This means they are most profitably kept as actual numbers in a vector or list for easy iteration, perhaps with additional, auxiliary data like the current working offset and phase. The actual sieving task is then easily accomplished via a technique called 'windowed sieving'. In the case of PRIME1 and PRINT this can be several orders of magnitude faster than sieving the whole range up to the upper limit, since both tasks only ask for a small number of subranges to be sieved.
You can do that (remove numbers that are detected to be non-prime from your array/linked list), but then the time complexity of the algorithm will degrade to O(N^2/log(N)) or something like that, instead of the original O(N*log(log(N))). This is because you will no longer be able to strike out "the numbers 2X, 3X, 4X, ..." directly; you will have to search through your entire compressed list instead.
You could erase each composite number from the array/vector once you have shown it to be composite. Or, when you fill an array of numbers to put through the sieve, remove all even numbers (other than 2) and all numbers ending in 5 (other than 5 itself).
If you have studied the sieve right, you must know we don't have the primes to begin with. We have an array whose size is equal to the range. Now, if you want the range to be 10^9, that has to be the size of the array. You have mentioned no language, but for each number you need at least a bit to represent whether it is prime or not.
Even that means you need 10^9 bits = 1.25 * 10^8 bytes, which is more than 100 MB of RAM.
Assuming you have all this, the most optimized sieve takes O(n * log(log n)) time, which, for n = 10^9 on a machine that executes 10^8 instructions per second, will still take some minutes.
Now, assuming you have all this, still, the number of primes up to 10^9 is q = 50,847,534; saving these as 4-byte integers takes q * 4 bytes, which is again more than 100 MB (more RAM).
Even if you remove the indexes which are multiples of 2, 3 or 5, this removes 22 numbers in every 30. This is still not good enough, because in total you will still need around 140 MB of space (40 MB ≈ a third of 10^9 bits, plus ~100 MB for storing the prime numbers).
So, since storing the primes will in any case require a similar amount of memory (of the same order as the sieve itself), your question, IMO, has no solution.
You can halve the size of the sieve by only 'storing' the odd numbers. This requires code to explicitly deal with the case of testing even numbers. For odd numbers, bit b of the sieve represents n = 2b + 3. Hence bit 0 represents 3, bit 1 represents 5 and so on. There is a small overhead in converting between the number n and the bit index b.
Whether this technique is any use to you depends on the memory/speed balance you require.
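Here is a minimal odds-only sieve in Python built on exactly that mapping (my own sketch; bit b stands for n = 2b + 3):

def odd_sieve(limit):
    """Return all primes below `limit`, storing one flag per odd number >= 3.
    Index b in the flag array represents the number n = 2*b + 3."""
    if limit <= 2:
        return []
    flags = bytearray([1]) * ((limit - 2) // 2)   # odd n with 3 <= n < limit
    for b in range(len(flags)):
        if flags[b]:
            n = 2 * b + 3
            if n * n >= limit:
                break
            # Strike the odd multiples n*n, n*(n+2), ...; in index terms that is
            # a stride of n starting at (n*n - 3) // 2.
            flags[(n * n - 3) // 2::n] = bytearray(len(flags[(n * n - 3) // 2::n]))
    return [2] + [2 * b + 3 for b, f in enumerate(flags) if f]

assert odd_sieve(30) == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

The number 2 is "pulled out of thin air" in the return statement, as described above.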
I'm learning about hash tables and quadratic probing in particular. I've read that if the load factor is <= 0.5 and the table's size is prime, quadratic probing will always find an empty slot and no key will be accessed multiple times. It then goes on to say that, in order to ensure efficient insertions, I should always maintain a load factor <= 0.5. What does this mean? Surely if we keep adding items, the load factor will increase until it equals 1 whether we want it to or not. So what is implied when my textbook says I should maintain a small load factor?
The implication is that at some point (when you would exceed a load factor of 0.5 in this case), you'll have to allocate a new table (which is bigger by some factor, maybe 1.5 or 2, and then rounded up to the nearest prime number) and copy all the elements from the old table into it (that's not a straight copy, the new position of an item will usually be different than the old position).
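A toy Python sketch of that policy (names and details are my own, not from any particular textbook): insert with quadratic probing, and rehash into a larger prime-sized table as soon as the load factor would exceed 0.5.

def next_prime(n):
    """Smallest prime >= n (trial division is fine for toy table sizes)."""
    def is_prime(k):
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True
    while not is_prime(n):
        n += 1
    return n

class QuadraticHashSet:
    def __init__(self):
        self.slots = [None] * 7
        self.count = 0

    def _probe(self, key, slots):
        """Quadratic probing: try h, h+1, h+4, h+9, ... modulo the table size."""
        h = hash(key) % len(slots)
        for i in range(len(slots)):
            idx = (h + i * i) % len(slots)
            if slots[idx] is None or slots[idx] == key:
                return idx
        raise RuntimeError("probe failed (cannot happen while load factor <= 0.5)")

    def add(self, key):
        # Grow *before* the load factor would exceed 0.5.
        if (self.count + 1) / len(self.slots) > 0.5:
            bigger = [None] * next_prime(2 * len(self.slots))
            for item in self.slots:            # re-insert everything: positions change
                if item is not None:
                    bigger[self._probe(item, bigger)] = item
            self.slots = bigger
        idx = self._probe(key, self.slots)
        if self.slots[idx] is None:
            self.slots[idx] = key
            self.count += 1

Maintaining the load factor is therefore the table's job, done by resizing, not something that magically stays true as you keep inserting.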
The following image visualizes the needed life span for 16 memory blocks of various sizes:
What I'm essentially looking for is: given N blocks, each with a size size_i and a lifetime [begin_i, end_i), return the minimum total memory size needed to contain all of them over the whole time interval, together with N offsets, offset_i, into this total memory block for the input blocks.
A trivial, non-optimal algorithm would be the following:
int offsets[N];
offsets[0] = 0;
int total_size = size[0];
// Lay the blocks out end to end, ignoring lifetimes entirely.
for (int i = 1; i < N; ++i)
{
    offsets[i] = offsets[i - 1] + size[i - 1];
    total_size += size[i];
}
Our current algorithm is to sort the blocks by size and then process them from largest to smallest, finding the first offset where the block doesn't overlap with an already "allocated" block. This is essentially a greedy algorithm, so I have a feeling that it would be possible to do better.
The algorithm only needs to be run once at the start of the application, so it doesn't have to be super fast. The number of allocations is on the order of 10-50, and for our purposes time can be discretized into around 50 fixed-size units.
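For reference, here is a rough Python sketch of the largest-first greedy placement described above (my own reconstruction of it; block and variable names are made up):

def greedy_layout(blocks):
    """blocks: list of (size, begin, end) with lifetime [begin, end).
    Returns (total_size, offsets) using largest-first greedy placement."""
    order = sorted(range(len(blocks)), key=lambda i: -blocks[i][0])
    placed = []                        # (offset, size, begin, end) of placed blocks
    offsets = [0] * len(blocks)
    total_size = 0
    for i in order:
        size, begin, end = blocks[i]
        offset = 0
        while True:
            # Find any placed block that overlaps this one in both time and space.
            clash = next((p for p in placed
                          if p[2] < end and begin < p[3]              # lifetimes overlap
                          and p[0] < offset + size and offset < p[0] + p[1]), None)
            if clash is None:
                break
            offset = clash[0] + clash[1]   # jump past the clashing block and retry
        offsets[i] = offset
        placed.append((offset, size, begin, end))
        total_size = max(total_size, offset + size)
    return total_size, offsets

print(greedy_layout([(4, 0, 10), (8, 10, 20), (2, 5, 15)]))   # (10, [0, 0, 8])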
Find the minimum start time and maximum end time from your list of begin and end time intervals. This is the total interval, (t_min, t_max), of time you are interested in. Next, divide the time interval into some discrete and uniform intervals. Let the length of this interval be u. This is basically the maximum resolution of your memory management (how often you can possibly free and/or claim a block of memory).
For each unit of time, determine which Allocation IDs need memory and what size they each require at that time, call it s(t, id). The sum of s(t, id) over all allocation IDs is a lower bound of how much total memory you require at any given time. You can't do any better than the maximum of this function, which fails to take into account the desire to keep things allocated in the same region without moving them at each time step.
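That lower bound takes only a few lines of Python to compute from the same (size, begin, end) triples (my own sketch):

def peak_memory_lower_bound(blocks):
    """Maximum over time of the total size of simultaneously live blocks.
    blocks: list of (size, begin, end) with lifetime [begin, end)."""
    t_min = min(b[1] for b in blocks)
    t_max = max(b[2] for b in blocks)
    return max(sum(size for size, begin, end in blocks if begin <= t < end)
               for t in range(t_min, t_max))

print(peak_memory_lower_bound([(4, 0, 10), (8, 10, 20), (2, 5, 15)]))   # 8 + 2 = 10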
To find an optimal position for each item you could use a heuristic search. Basically, search the state space of all possible starting addresses for each memory block for the solution that takes up the smallest total amount of memory, which you find by simulating the progression of time from t_min to t_max.
A heuristic that might be worth trying is to prefer allocations where big chunks occupy spaces previously occupied by other big chunks, and small chunks are placed at locations with small contribution to the maximum memory usage of the strategy. You could also prune any strategy found that is worse than the best seen so far since the maximum memory claimed by the strategy is monotonic over time.
The heuristic search method may be slow, but it sounds like you care more about optimal memory usage than runtime of the allocation algorithm.
Clearly Seq asymptotically performs the same as or better than [] for all possible operations. But since its structure is more complicated than that of lists, its constant overhead will probably make it slower for small sizes. I'd like to know how much, in particular:
How much slower is <| compared to :?
How much slower is folding over/traversing Seq compared to folding over/traversing [] (excluding the cost of a folding/traversing function)?
What is the size (approximately) for which \xs x -> xs ++ [x] becomes slower than |>?
What is the size (approximately) for which ++ becomes slower than ><?
What's the cost of calling viewl and pattern matching on the result compared to pattern matching on a list?
How much memory does an n-element Seq occupy compared to an n-element list? (Not counting the memory occupied by the elements, only the structure.)
I know that it's difficult to measure, since with Seq we talk about amortized complexity, but I'd like to know at least some rough numbers.
This should be a start - http://www.haskell.org/haskellwiki/Performance#Data.Sequence_vs._lists
A sequence uses between 5/6 and 4/3 times as much space as the equivalent list (assuming an overhead of one word per node, as in GHC). If only deque operations are used, the space usage will be near the lower end of the range, because all internal nodes will be ternary. Heavy use of split and append will result in sequences using approximately the same space as lists. In detail:
a list of length n consists of n cons nodes, each occupying 3 words.
a sequence of length n has approximately n/(k-1) nodes, where k is the average arity of the internal nodes (each 2 or 3). There is a pointer, a size and overhead for each node, plus a pointer for each element, i.e. n(3/(k-1) + 1) words.
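Plugging the two extreme arities into that formula reproduces the quoted 5/6 to 4/3 range (a quick check of the arithmetic, relative to the 3n words of a list):

k = 3: \quad \frac{n\,(\frac{3}{3-1} + 1)}{3n} = \frac{2.5\,n}{3n} = \frac{5}{6} \qquad\qquad k = 2: \quad \frac{n\,(\frac{3}{2-1} + 1)}{3n} = \frac{4n}{3n} = \frac{4}{3}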
List is faster by a non-trivial constant factor for operations at the head (cons and head), making it a more efficient choice for stack-like and stream-like access patterns. Data.Sequence is faster for every other access pattern, such as queue and random access.
I have one more concrete result to add to the above answer. I am solving a Langevin equation. I used List and Data.Sequence; the solution performs a lot of insertions at the back of the list/sequence.
To sum up, I did not see any improvement in speed; in fact, performance deteriorated with Seq. Moreover, with Data.Sequence I needed to increase the memory available to the Haskell RTS.
Since I am definitely not an authority on optimizing, I post both cases below; I'd be glad to know if this can be improved. Both codes were compiled with the -O2 flag.
Solution with List, takes approx 13.01 sec
Solution with Data.Sequence, takes approx 15.13 sec