I was going through my Data Structures and Algorithms notes, and came across the following examples regarding Time Complexity and Big-O Notation: The columns on the left count the number of operations carried out in each line. I didn't understand why almost all the lines in the first example have a multiple of 2 in front of them, whereas the other two examples don't. Obviously this doesn't affect the resulting O(n), but I would still like to know where the 2 came from.
I can only find one explanation for this: the sloppiness of the author of the slides.
In a proper analysis one has to explain what kind of operations are performed at which time for what input (like for example this book on page 21). Without this you cannot even be sure whether multiplying two numbers counts as 1 operation, as 2, or as something else.
These slides are inconsistent. For example:
In slide 1, currentMax = A[0] takes 2 operations. Kind of makes sense if you count finding the 0-th element of the array as 1 operation and the assignment as another one. But in slide 3, n iterations of s = s + X[i] take n operations, which means that s = s + X[i] takes 1 operation. That also kind of makes sense: we just increase one counter.
But the two are inconsistent with each other, because it doesn't make sense that a = X[0] is 2 operations while a = a + X[0], where you do more work, takes only 1.
I am aware of the fact that the Sieve of Eratosthenes can be implemented so that it finds primes continuously without an upper bound (the segmented sieve).
My question is, could the Sieve of Atkin/Bernstein be implemented in the same way?
Related question: C#: How to make Sieve of Atkin incremental
However, the related question has only 1 answer, which says "It's impossible for all sieves", which is obviously incorrect.
Atkin/Bernstein give a segmented version in Section 5 of their original paper. Presumably Bernstein's primegen program uses that method.
In fact, one can implement an unbounded Sieve of Atkin (SoA) not using segmentation at all, as I have done here in F#. Note that this is a pure functional version that doesn't even use arrays to combine the solutions of the quadratic equations with the squarefree filter, and it is thus considerably slower than a more imperative approach.
Bernstein's optimizations using look-up tables for optimum 32-bit ranges would make the code extremely complex and not suitable for presentation here, but it would be quite easy to adapt my F# code so that the sequences start at a set lower limit and are used only over a range in order to implement a segmented version, and/or to apply the same techniques to a more imperative approach using arrays.
Note that even Bernstein's implementation of the SoA isn't really faster than the Sieve of Eratosthenes with all possible optimizations (as per Kim Walisch's primesieve); it is only faster than an equivalently optimized version of the Sieve of Eratosthenes for the selected range of numbers, as per his implementation.
EDIT_ADD: For those who do not want to wade through Bernstein's pseudo-code and C code, I am adding a pseudo-code method for using the SoA over a range from LOW to HIGH, where the span from LOW to HIGH + 1 might be constrained to an even multiple of 60 in order to use the modulo (and potential bit packing to only the entries on the 2,3,5 wheel) optimizations.
This is based on a possible implementation that expresses the SoA quadratics (4*x^2 + y^2), (3*x^2 + y^2), and (3*x^2 - y^2) as sequences of numbers, with the x value for each sequence fixed to values between one and SQRT((HIGH - 1) / 4), SQRT((HIGH - 1) / 3), and (solving the quadratic 2*x^2 + 2*x - HIGH - 1 = 0) x = (SQRT(1 + 2 * (HIGH + 1)) - 1) / 2, respectively, with the sequences expressed as in my F# code per the top link. The optimizations to the sequences there use the fact that, when sieving for only odd composites, the "4x" sequences need only odd values of y, and the "3x" sequences need only odd values of y when x is even and vice versa. A further optimization reduces the number of solutions to the quadratic equations (= elements in the sequences) by observing that the modulo patterns of the above sequences repeat over very small ranges of x, and also repeat over ranges of y of only 30; this is used in the Bernstein code but not (yet) implemented in my F# code.
I also do not include the well known optimizations that could be applied to the prime "squares free" culling to use wheel factorization and the calculations for the starting segment address as I use in my implementations of a segmented SoE.
So for purposes of calculating the sequence starting segment addresses for the "4x", "3x+", and "3x-" (or with "3x+" and "3x-" combined as I do in the F# code), and having calculated the ranges of x for each as per the above, the pseudo-code is as follows:
Calculate the range LOW - FIRST_ELEMENT, where FIRST_ELEMENT is the element with the lowest applicable value of y for each given value of x, or with y = x - 1 for the case of the "3x-" sequence.
For the job of calculating how many elements are in this range, it boils down to the question of how many terms of (y1)^2 + (y2)^2 + (y3)^2 + ... there are, where each successive y is separated by two so as to produce only even or only odd y's as required. As usual in the analysis of square sequences, we observe that the differences between successive squares grow by a constant increment: delta(9 - 1) is 8, delta(25 - 9) is 16 (an increase of 8), delta(49 - 25) is 24 (a further increase of 8), and so on, so that for n elements the last increment is 8 * n in this example. Expressing the sequence of elements this way, we get the first element (or whatever one chooses as the first element) plus eight times a series like (1 + 2 + 3 + ... + n). Now the standard reduction of such a linear series applies: the sum is (n + 1) * n / 2, or n^2/2 + n/2. We can then solve for how many elements n fit in the range by solving the quadratic equation n^2/2 + n/2 - range = 0, where range is the span from the previous step divided by the increment factor of 8, giving n = (SQRT(8*range + 1) - 1) / 2.
Now, if FIRST_ELEMENT + 4 * (n + 1) * n does not equal LOW as the starting address, add one to n and use FIRST_ELEMENT + 4 * (n + 2) * (n + 1) as the starting address. If one uses further optimizations to apply wheel factorization culling to the sequence pattern, look-up table arrays can be used to find the closest used value of n that satisfies the conditions.
The modulus 12 or 60 of the starting element can be calculated directly or can be produced by use of look up tables based on the repeating nature of the modulo sequences.
Each sequence is then used to toggle the composite states up to the HIGH limit. If the additional logic is added to the sequences to jump values between only the applicable elements per sequence, no further use of modulo conditions is necessary.
The above is done for every "4x" sequence followed by the "3x+" and "3x-" sequences (or combine "3x+" and "3x-" into just one set of "3x" sequences) up to the x limits as calculated earlier or as tested per loop.
And there you have it: given an appropriate method of dividing the sieve range into segments (best done with fixed sizes related to the CPU cache sizes for memory-access efficiency), a method of segmenting the SoA just as used by Bernstein, but somewhat simpler in expression, as it mentions but does not combine the modulo operations and bit packing.
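To make the shape of this concrete, here is a minimal, unoptimised Python sketch of a segmented SoA over a range [LOW, HIGH): it toggles candidates with the three quadratics and then culls the squares of the base primes. The function names are my own, the base primes come from a plain Sieve of Eratosthenes for simplicity, and all of the modulo-pattern, wheel, and bit-packing optimizations discussed above are deliberately omitted:

import math

def primes_upto(n):
    # plain Sieve of Eratosthenes, used here only to obtain the base
    # primes needed for the square-free culling step
    sieve = [False, False] + [True] * (n - 1)
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [p for p, is_p in enumerate(sieve) if is_p]

def atkin_segment(low, high):
    # primes in [low, high), assuming low >= 7
    flags = [False] * (high - low)
    for x in range(1, math.isqrt(high) + 1):
        for y in range(1, math.isqrt(high) + 1):
            n = 4 * x * x + y * y              # "4x" quadratic
            if low <= n < high and n % 12 in (1, 5):
                flags[n - low] = not flags[n - low]
            n = 3 * x * x + y * y              # "3x+" quadratic
            if low <= n < high and n % 12 == 7:
                flags[n - low] = not flags[n - low]
            n = 3 * x * x - y * y              # "3x-" quadratic, x > y
            if x > y and low <= n < high and n % 12 == 11:
                flags[n - low] = not flags[n - low]
    for p in primes_upto(math.isqrt(high)):    # square-free culling
        if p >= 5:
            sq = p * p
            for m in range(((low + sq - 1) // sq) * sq, high, sq):
                flags[m - low] = False
    return [low + i for i, f in enumerate(flags) if f]

print(atkin_segment(100, 200))   # [101, 103, 107, 109, ..., 199]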
I'm pretty sure that this is the right site for this question, but feel free to move it to some other stackexchange site if it fits there better.
Suppose you have a sum of fractions a1/d1 + a2/d2 + … + an/dn. You want to compute a common numerator and denominator, i.e., rewrite it as p/q. We have the formula
p = a1*d2*…*dn + d1*a2*d3*…*dn + … + d1*d2*…*d(n-1)*an
q = d1*d2*…*dn.
What is the most efficient way to compute these things, in particular, p? You can see that if you compute it naïvely, i.e., using the formula I gave above, you compute a lot of redundant things. For example, you will compute d1*d2 n-1 times.
My first thought was to iteratively compute d1*d2, d1*d2*d3, … and dn*d(n-1), dn*d(n-1)*d(n-2), … but even this is inefficient, because you will end up computing multiplications in the "middle" twice (e.g., if n is large enough, you will compute d3*d4 twice).
I'm sure this problem could be expressed somehow using maybe some graph theory or combinatorics, but I haven't studied enough of that stuff to have a good feel for it.
And one note: I don't care about cancellation, just the most efficient way to multiply things.
UPDATE:
I should have known that people on stackoverflow would assume that these were numbers, but I've been so used to my use case that I forgot to mention this.
We cannot just "divide" out an from each term. The use case here is a symbolic system. Actually, I am trying to fix a function called .as_numer_denom() in the SymPy computer algebra system which presently computes this the naïve way. See the corresponding SymPy issue.
Dividing out things has some problems, which I would like to avoid. First, there is no guarantee that things will cancel. This is because mathematically, (a*b)**n != a**n*b**n in general (if a and b are positive it holds, but e.g. if a == b == -1 and n == 1/2, you get (a*b)**n == 1**(1/2) == 1 but (-1)**(1/2)*(-1)**(1/2) == I*I == -1). So I don't think it's a good idea to assume that dividing by an will cancel it in the expression (this may actually be unfounded; I'd need to check what the code does).
Second, I'd like to also apply this algorithm to computing the sum of rational functions. In this case, the terms would automatically be multiplied together into a single polynomial, and "dividing" out each an would involve applying the polynomial division algorithm. You can see that in this case you really do want to compute the most efficient multiplication in the first place.
UPDATE 2:
I think my fears about cancellation of symbolic terms may be unfounded. SymPy does not cancel things like x**n*x**(m - n) automatically, but I think that any exponents that would combine through multiplication would also combine through division, so powers should cancel.
There is an issue with constants automatically distributing across additions, like:
In [13]: 2*(x + y)*z*(S(1)/2)
Out[13]:
z⋅(2⋅x + 2⋅y)
─────────────
2
But this is, first, a bug and, second, could never be a problem (I think), because 1/2 would be split into 1 and 2 by the algorithm that gets the numerator and denominator of each term.
Nonetheless, I still want to know how to do this without "dividing out" di from each term, so that I can have an efficient algorithm for summing rational functions.
Instead of adding up n quotients in one go I would use pairwise addition of quotients.
If things cancel out in partial sums then the numbers or polynomials stay smaller, which makes computation faster.
You avoid the problem of computing the same product multiple times.
You could try to order the additions in a certain way, to make canceling more likely (maybe add quotients with small denominators first?), but I don't know if this would be worthwhile.
If you start from scratch this is simpler to implement, though I'm not sure it fits as a replacement of the problematic routine in SymPy.
Edit: To make it more explicit, I propose to compute a1/d1 + a2/d2 + … + an/dn as (…(a1/d1 + a2/d2) + … ) + an/dn.
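As a minimal sketch of this for the integer case (Python's Fraction does the gcd cancellation at every partial sum; a symbolic or polynomial implementation would substitute its own normalised rational type):

from fractions import Fraction

def sum_quotients(terms):
    # terms is a list of (numerator, denominator) pairs;
    # each += is one pairwise addition, reduced immediately, so the
    # partial sums stay small whenever things cancel
    total = Fraction(0)
    for a, d in terms:
        total += Fraction(a, d)
    return total.numerator, total.denominator

print(sum_quotients([(1, 2), (1, 3), (1, 6)]))   # (1, 1)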
Compute two new arrays:
The first contains partial products from the left (with the denominators indexed d[1], …, d[n]): l[0] = 1, l[i] = l[i-1] * d[i]
The second contains partial products from the right: r[n+1] = 1, r[i] = d[i] * r[i+1]
In both cases, 1 is the multiplicative identity of whatever ring you are working in.
Then each of your terms on the top is t[i] = l[i-1] * a[i] * r[i+1]
This assumes multiplication is associative, but it need not be commutative.
As a first optimization, you don't actually have to create r as an array: you can do a first pass to calculate all the l values, and accumulate the r values during a second (backward) pass to calculate the summands. No need to actually store the r values since you use each one once, in order.
In your question you say that this computes d3*d4 twice, but it doesn't. It does multiply two different values by d4 (one a right-multiplication and the other a left-multiplication), but that's not exactly a repeated operation. Anyway, the total number of multiplications is about 4*n, vs. 2*n multiplications and n divisions for the other approach that doesn't work in non-commutative multiplication or non-field rings.
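Here is a short Python sketch of the idea, with the optimization from the previous paragraph folded in (the l array is built in a forward pass, and the right products are accumulated on the fly in a backward pass; the names are mine):

def numer_denom(a, d):
    # sum of a[i]/d[i] as one fraction p/q, using only multiplication
    n = len(d)
    l = [1] * (n + 1)            # l[i] = d[0] * ... * d[i-1]
    for i in range(n):
        l[i + 1] = l[i] * d[i]
    p, r = 0, 1                  # r accumulates d[i+1] * ... * d[n-1]
    for i in range(n - 1, -1, -1):
        p += l[i] * a[i] * r     # left product * a[i] * right product
        r = d[i] * r
    return p, l[n]               # p and q = d[0] * ... * d[n-1]

print(numer_denom([1, 1, 1], [2, 3, 6]))   # (36, 36), i.e. 36/36 = 1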
If you want to compute p in the above expression, one way to do this would be to multiply together all of the denominators (in O(n), where n is the number of fractions), letting this value be D. Then, iterate across all of the fractions and for each fraction with numerator ai and denominator di, compute ai * D / di. This last term is equal to the product of the numerator of the fraction and all of the denominators other than its own. Each of these terms can be computed in O(1) time (assuming you're using hardware multiplication, otherwise it might take longer), and you can sum them all up in O(n) time.
This gives an O(n)-time algorithm for computing the numerator and denominator of the new fraction.
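In Python, for plain integers, this looks like the following (note this is exactly the division-based approach that the question's update rules out for the symbolic use case; it is shown here only for the numeric one):

from functools import reduce
import operator

def numer_denom_div(a, d):
    D = reduce(operator.mul, d, 1)                   # product of all denominators
    p = sum(ai * (D // di) for ai, di in zip(a, d))  # each term: ai * D / di
    return p, D

print(numer_denom_div([1, 1, 1], [2, 3, 6]))         # (36, 36)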
It was also pointed out to me that you could manually sift out common denominators and combine those trivially without multiplication.
I've noticed recently that there are a great many algorithms out there based in part or in whole on clever uses of numbers in creative bases. For example:
Binomial heaps are based on binary numbers, and the more complex skew binomial heaps are based on skew binary numbers.
Some algorithms for generating lexicographically ordered permutations are based on the factoradic number system.
Tries can be thought of as trees that look at one digit of the string at a time, for an appropriate base.
Huffman encoding trees are designed to have each edge in the tree encode a zero or one in some binary representation.
Fibonacci coding is used in Fibonacci search and to invert certain types of logarithms.
My question is: what other algorithms are out there that use a clever number system as a key step of their intuition or proof? I'm thinking about putting together a talk on the subject, so the more examples I have to draw from, the better.
Chris Okasaki has a very good chapter in his book Purely Functional Data Structures that discusses "Numerical Representations": essentially, take some representation of a number and convert it into a data structure. To give a flavor, here are the sections of that chapter:
Positional Number Systems
Binary Numbers (Binary Random-Access Lists, Zeroless Representations, Lazy Representations, Segmented Representations)
Skew Binary Numbers (Skew Binary Random Access Lists, Skew Binomial Heaps)
Trinary and Quaternary Numbers
Some of the best tricks, distilled:
Distinguish between dense and sparse representations of numbers (usually you see this in matrices or graphs, but it's applicable to numbers too!)
Redundant number systems (systems that have more than one representation of a number) are useful.
If you arrange the first digit to be non-zero or use a zeroless representation, retrieving the head of the data structure can be efficient.
Avoid cascading borrows (from taking the tail of the list) and carries (from consing onto the list) by segmenting the data structure.
Here is also the reference list for that chapter:
Guibas, McCreight, Plass and Roberts: A new representation for linear lists.
Myers: An applicative random-access stack.
Carlsson, Munro, Poblete: An implicit binomial queue with constant insertion time.
Kaplan, Tarjan: Purely functional lists with catenation via recursive slow-down.
"Ternary numbers can be used to convey
self-similar structures like a
Sierpinski Triangle or a Cantor set
conveniently." source
"Quaternary numbers are used in the
representation of 2D Hilbert curves." source
"The quater-imaginary numeral system
was first proposed by Donald Knuth in
1955, in a submission to a high-school
science talent search. It is a
non-standard positional numeral system
which uses the imaginary number 2i as
its base. It is able to represent
every complex number using only the
digits 0, 1, 2, and 3." source
"Roman numerals are a biquinary system." source
"Senary may be considered useful in the
study of prime numbers since all
primes, when expressed in base-six,
other than 2 and 3 have 1 or 5 as the
final digit." source
"Sexagesimal (base 60) is a numeral
system with sixty as its base. It
originated with the ancient Sumerians
in the 3rd millennium BC, it was
passed down to the ancient
Babylonians, and it is still used — in
a modified form — for measuring time,
angles, and the geographic coordinates
that are angles." source
etc...
This list is a good starting point.
I read your question the other day, and today was faced with a problem: How do I generate all partitionings of a set? The solution that occurred to me, and that I used (maybe due to reading your question) was this:
For a set with (n) elements, where I need (p) partitions, count through all (n) digit numbers in base (p).
Each number corresponds to a partitioning. Each digit corresponds to an element in the set, and the value of the digit tells you which partition to put the element in.
It's not amazing, but it's neat. It's complete, causes no redundancy, and uses arbitrary bases. The base you use depends on the specific partitioning problem.
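A small Python sketch of the counting scheme (the names are mine; each counter value is decoded digit by digit in base p, and the digit for position i picks the part for element i):

def partitionings(elements, p):
    n = len(elements)
    for code in range(p ** n):             # every n-digit number in base p
        parts = [[] for _ in range(p)]
        for i in range(n):
            code, digit = divmod(code, p)  # peel off the digit for element i
            parts[digit].append(elements[i])
        yield parts

for parts in partitionings(['a', 'b'], 2):
    print(parts)   # [['a', 'b'], []] ... [[], ['a', 'b']]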
I recently came across a cool algorithm for generating subsets in lexicographical order based on the binary representations of the numbers between 0 and 2^n - 1. It uses the numbers' bits both to determine what elements should be chosen for the set and to locally reorder the generated sets to get them into lexicographical order. If you're curious, I have a writeup posted here.
Also, many algorithms are based on scaling (such as a weakly-polynomial version of the Ford-Fulkerson max-flow algorithm), which uses the binary representation of the numbers in the input problem to progressively refine a rough approximation into a complete solution.
Not exactly a clever base system but a clever use of the base system: Van der Corput sequences are low-discrepancy sequences formed by reversing the base-n representation of numbers. They're used to construct the 2-d Halton sequences which look kind of like this.
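The digit reversal itself is only a few lines, e.g. in Python (my own sketch): the base-b digits of i are mirrored about the radix point.

def van_der_corput(i, base=2):
    result, denom = 0.0, 1
    while i:
        i, digit = divmod(i, base)   # peel off the lowest digit...
        denom *= base
        result += digit / denom      # ...and mirror it past the radix point
    return result

print([van_der_corput(i) for i in range(1, 8)])
# [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]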
I vaguely remember something about double base systems for speeding up some matrix multiplication.
A double base system is a redundant system that uses two bases for one number:
n = Sum(i=1..l) c_i * 2^{a_i} * 3^{b_i}, where c_i in {-1, 1}
Redundant means that one number can be specified in many ways.
You can look for the article "Hybrid Algorithm for the Computation of the Matrix Polynomial" by Vassil Dimitrov, Todor Cooklev.
Trying to give the best short overview I can.
They were trying to compute the matrix polynomial G(N,A) = I + A + ... + A^{N-1}.
Supposing N is composite, G(N,A) = G(J,A) * G(K, A^J); if we apply this with J = 2, we get:
/ (I + A) * G(K, A^2) , if N = 2K
G(N,A) = |
\ I + (A + A^2) * G(K, A^2) , if N = 2K + 1
also,
/ (I + A + A^2) * G(K, A^3) , if N = 3K
G(N,A) = | I + (A + A^2 + A^3) * G(K, A^3) , if N = 3K + 1
\ I + A * (A + A^2 + A^3) * G(K, A^3) , if N = 3K + 2
As it's "obvious" (jokingly) that some of these equations are fast in the first system and some better in the second - so it is a good idea to choose the best of those depending on N. But this would require fast modulo operation for both 2 and 3. Here's why the double base comes in - you can basically do the modulo operation fast for both of them giving you a combined system:
/ (I + A + A^2) * G(K, A^3) , if N = 0 or 3 mod 6
G(N,A) = | I + (A + A^2 + A^3) * G(K, A^3) , if N = 1 or 4 mod 6
| (I + A) * G(3K + 1, A^2) , if N = 2 mod 6
\ I + (A + A^2) * G(3K + 2, A^2) , if N = 5 mod 6
Look at the article for better explanation as I'm not an expert in this area.
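As a rough illustration of the recursion (my own sketch, using only the base-2 branch above; the paper's contribution is choosing between the base-2 and base-3 branches via the double-base representation of N, which this does not do):

import numpy as np

def G(N, A):
    # G(N, A) = I + A + ... + A^(N-1) via the N = 2K / N = 2K + 1 rules
    I = np.eye(A.shape[0], dtype=A.dtype)
    if N == 1:
        return I
    if N % 2 == 0:                               # N = 2K
        return (I + A) @ G(N // 2, A @ A)
    return I + (A + A @ A) @ G(N // 2, A @ A)    # N = 2K + 1

A = np.array([[1, 1], [0, 1]])
print(G(4, A))   # equals I + A + A^2 + A^3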
Radix sort can use various number bases.
http://en.wikipedia.org/wiki/Radix_sort
It's a pretty interesting implementation of a bucket sort.
Here is a good post on using ternary numbers to solve the "counterfeit coin" problem (where you have to detect a single counterfeit coin in a bag of regular ones, using a balance as few times as possible).
Hashing strings (e.g. in the Rabin-Karp algorithm) often involves evaluating the string as a base-b number of n digits (where n is the length of the string, and b is some chosen base that is large enough). For example, the string "ABCD" can be hashed as:
'A'*b^3+'B'*b^2+'C'*b^1+'D'*b^0
Substituting ASCII values for the characters and taking b to be 256, this becomes
65*256^3+66*256^2+67*256^1+68*256^0
Though, in most practical applications, the resulting value is taken modulo some reasonably sized number to keep the result sufficiently small.
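A minimal Python rendering of such a hash (the base and modulus here are arbitrary illustrative choices):

def poly_hash(s, base=256, mod=(1 << 61) - 1):
    # Horner evaluation of the string as a base-`base` number,
    # reduced modulo a large prime to keep the value small
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h

print(poly_hash("ABCD"))   # 65*256**3 + 66*256**2 + 67*256 + 68 = 1094861636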
Exponentiation by squaring is based on binary representation of the exponent.
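For example, a short Python version that walks the exponent's bits from least to most significant:

def power(x, n):
    result = 1
    while n > 0:
        if n & 1:          # this binary digit of the exponent is set
            result *= x
        x *= x             # square once per binary digit
        n >>= 1
    return result

print(power(3, 13))   # 1594323 (13 = 0b1101)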
In Hacker's Delight (a book every programmer should know, in my eyes) there is a complete chapter about unusual bases, like -2 as base (yes, really: negative bases) or -1+i (with i the imaginary unit sqrt(-1)) as base.
There is also a nice calculation of what the best base is in terms of hardware design (for all who don't want to read it: the solution of the equation is e, so you can go with 2 or 3; 3 would be a little bit better, by a factor of 1.056 over 2, but 2 is more practical technically).
Other things that come to mind are Gray-code counters (when you count in this system, only 1 bit changes at a time; this property is often used in hardware design to reduce metastability issues) and the generalisation of the already mentioned Huffman encoding: arithmetic encoding.
Cryptography makes extensive use of integer rings (modular arithmetic) and also finite fields, whose operations are intuitively based on the way polynomials with integer coefficients behave.
I really like this one for converting binary numbers into Gray codes: http://www.matrixlab-examples.com/gray-code.html
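The conversion itself is a one-liner (each Gray bit is the XOR of two adjacent binary bits), e.g. in Python:

def binary_to_gray(n):
    return n ^ (n >> 1)   # XOR each bit with its higher neighbour

print([format(binary_to_gray(i), '03b') for i in range(8)])
# ['000', '001', '011', '010', '110', '111', '101', '100']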
Great question. The list is long indeed.
Telling time is a simple instance of mixed bases (days | hours | minutes | seconds | am/pm)
I've created a meta-base enumeration n-tuple framework if you're interested in hearing about it. It's some very sweet syntactic sugar for base numbering systems. It's not released yet. Email my username (at gmail).
One of my favourites using base 2 is Arithmetic Encoding. It's unusual because the heart of the algorithm uses representations of numbers between 0 and 1 in binary.
Maybe AKS is such a case.
Let's say I have a large set of data.
Then I can divide it into two, find the mean of each half, and then calculate the mean of those two values.
a) Is this the mean of the original big set?
b) Can I use this sort of method for calculating the standard deviation?
a) only if the sets you divide into are always the same size, meaning that the original set size must be a power of 2.
For example, the mean of {6} is 6, and the mean of {3,6} is 4.5, but the mean of {3,6,6} is not 5.25, it's 5.
Certainly you could recursively divide into parts to calculate the sum, though, and divide by the total size at the end. Not sure if that does you any good.
b) no
For example, the s.d. of {2} is 0, and the s.d. of {1} is 0, but the s.d. of {1,2} is not 0.
Once you've calculated the mean of the whole set, you can recursively divide to calculate the sum of squared deviations from the mean, and as with the mean calculation, divide by the total size and take the square root at the end. [Edit: in fact all you need to calculate the s.d. is the sum of squares, the sum, and the count. Forgot about that. So you don't have to calculate the mean first]
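A small Python sketch of that last point: recursively split while carrying (count, sum, sum of squares), which combine exactly no matter how the data is divided (population s.d. shown):

import math

def stats(data):
    # returns (count, sum, sum of squares); these three running totals
    # combine by simple addition under any split of the data
    if len(data) == 1:
        x = data[0]
        return 1, x, x * x
    mid = len(data) // 2
    n1, s1, q1 = stats(data[:mid])
    n2, s2, q2 = stats(data[mid:])
    return n1 + n2, s1 + s2, q1 + q2

n, s, q = stats([1.0, 2.0, 3.0, 4.0, 5.0])
mean = s / n
sd = math.sqrt(q / n - mean * mean)
print(mean, sd)   # 3.0 1.4142...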
It is incorrect as stated, but you can express the mean and standard deviation of a set in terms of the means, standard deviations, and sizes of the sets into which it is divided.
Specifically, if m_x, s_x and n_x are the means, standard deviations, and sizes of x, and X is partitioned into many x's, then
n_X = sum_x(n_x)
m_X = sum_x(n_x m_x)/n_X
s_X^2 = sum_x(n_x(s_x^2 + m_x^2))/n_X - m_X^2
assuming the variance is of the form sum((x - mean(x))^2)/n; if it is the sample unbiased estimator, just adjust the weights accordingly.
Sure you can. No need for equal-sized sets or a power-of-two size. Pseudo code:
N1, mean1, s1;    // size, mean, s.d. of the first subset
N2, mean2, s2;    // size, mean, s.d. of the second subset
N12, mean12, s12; // size, mean, s.d. of the combined set
N12 = N1 + N2;
mean12 = (mean1*N1 + mean2*N2) / N12;
s12 = sqrt( (s1*s1*N1 + s2*s2*N2) / N12 + N1*N2/(N12*N12) * (mean1-mean2)*(mean1-mean2) );
http://en.wikipedia.org/wiki/Weighted_mean
http://en.wikipedia.org/wiki/Standard_deviation#Combining_standard_deviations
On (a) - it's only precisely correct if you precisely divided the set into two. If there were an odd number of items, for instance, there is a slight weighting toward the smaller "half". The larger the set, the less significant the problem. However, the problem recurs for the smaller sets as you subdivide. You get very large error when dividing a set of three items into a single item and a pair - each item in the pair is only half as significant to the final result as the single item.
I don't see the gain, though. You still do as many additions. You even end up doing more divisions. More importantly, you access memory in a non-sequential order, leading to poor cache performance.
The usual approach for a mean and standard deviation is to first calculate the sum of all items, and the sum of the squares - both in the same loop. Old calculators used to handle this with running totals, also keeping count of the number of items as they went. At the end, those three values (n, sum-of-x and sum-of-x-squared) are all you need - the rest is just substitution into the standard formulae for the mean and standard deviation.
EDIT
If you're dead set on using recursion for this, look up "tail recursion". Mathematically, tail recursion and iteration are equivalent - different representations of the same thing. In implementation terms tail recursion might cause a stack overflow where iteration would work, but (1) some languages guarantee this will not happen (e.g. Scheme, Haskell), and (2) many compilers will handle this as an optimisation anyway (e.g. GCC for C or C++).