How to implement in ATS the following function based on tail-recursion? - ats

I have the following recursively defined function:
fun foo(n: int): int =
ifcase
| n = 0 => 0
| n = 1 => 1
| n = 2 => 1
| (* else *) => foo(n-1) + foo(n-3)
// end of [ifcase]
How can I implement foo based on tail-recursion only.

You can do the followings, where the function loop is a tail-recursive function that keeps track of all necessary intermediate states. With tail-call optimization, this one will be optimized into a loop within a single frame, which is much faster than foo.
implement bar (n) = let
fun loop (k:int, n1:int, n2:int, n3:int): int =
if k = n
then n1 + n3
else loop (k+1, n2, n3, n1+n3)
in
ifcase
| n = 0 => 0
| n = 1 => 1
| n = 2 => 1
| _ => loop(3, 0, 1, 2)
end

Related

Using matrices to find the number of different ways to write n as the sum of 1, 3, and 4?

This is a question given in this presentation. Dynamic Programming
now i have implemented the algorithm using recursion and it works fine for small values. But when n is greater than 30 it becomes really slow.The presentation mentions that for large values of n one should consider something similar to
the matrix form of Fibonacci numbers .I am having trouble undestanding how to use the matrix form of Fibonacci numbers to come up with a solution.Can some one give me some hints or pseudocode
Thanks
Yes, you can use the technique from fast Fibonacci implementations to solve this problem in time O(log n)! Here's how to do it.
Let's go with your definition from the problem statement that 1 + 3 is counted the same as 3 + 1. Then you have the following recurrence relation:
A(0) = 1
A(1) = 1
A(2) = 1
A(3) = 2
A(k+4) = A(k) + A(k+1) + A(k+3)
The matrix trick here is to notice that
| 1 0 1 1 | |A( k )| |A(k) + A(k-2) + A(k-3)| |A(k+1)|
| 1 0 0 0 | |A(k-1)| | A( k ) | |A( k )|
| 0 1 0 0 | |A(k-2)| = | A(k-1) | = |A(k-1)|
| 0 0 1 0 | |A(k-3)| | A(k-2) | = |A(k-2)|
In other words, multiplying a vector of the last four values in the series produces a vector with those values shifted forward by one step.
Let's call that matrix there M. Then notice that
|A( k )| |A(k+2)|
|A(k-1)| |A(k+1)|
M^2 |A(k-2)| = |A( k )|
|A(k-3)| |A(k-1)|
In other words, multiplying by the square of this matrix shifts the series down two steps. More generally:
|A( k )| | A(k+n) |
|A(k-1)| |A(k-1 + n)|
M^n |A(k-2)| = |A(k-2 + n)|
|A(k-3)| |A(k-3 + n)|
So multiplying by Mn shifts the series down n steps. Now, if we want to know the value of A(n+3), we can just compute
|A(3)| |A(n+3)|
|A(2)| |A(n+2)|
M^n |A(1)| = |A(n+1)|
|A(0)| |A(n+2)|
and read off the top entry of the vector! This can be done in time O(log n) by using exponentiation by squaring. Here's some code that does just that. This uses a matrix library I cobbled together a while back:
#include "Matrix.hh"
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <algorithm>
using namespace std;
/* Naive implementations of A. */
uint64_t naiveA(int n) {
if (n == 0) return 1;
if (n == 1) return 1;
if (n == 2) return 1;
if (n == 3) return 2;
return naiveA(n-1) + naiveA(n-3) + naiveA(n-4);
}
/* Constructs and returns the giant matrix. */
Matrix<4, 4, uint64_t> M() {
Matrix<4, 4, uint64_t> result;
fill(result.begin(), result.end(), uint64_t(0));
result[0][0] = 1;
result[0][2] = 1;
result[0][3] = 1;
result[1][0] = 1;
result[2][1] = 1;
result[3][2] = 1;
return result;
}
/* Constructs the initial vector that we multiply the matrix by. */
Vector<4, uint64_t> initVec() {
Vector<4, uint64_t> result;
result[0] = 2;
result[1] = 1;
result[2] = 1;
result[3] = 1;
return result;
}
/* O(log n) time for raising a matrix to a power. */
Matrix<4, 4, uint64_t> fastPower(const Matrix<4, 4, uint64_t>& m, int n) {
if (n == 0) return Identity<4, uint64_t>();
auto half = fastPower(m, n / 2);
if (n % 2 == 0) return half * half;
else return half * half * m;
}
/* Fast implementation of A(n) using matrix exponentiation. */
uint64_t fastA(int n) {
if (n == 0) return 1;
if (n == 1) return 1;
if (n == 2) return 1;
if (n == 3) return 2;
auto result = fastPower(M(), n - 3) * initVec();
return result[0];
}
/* Some simple test code showing this in action! */
int main() {
for (int i = 0; i < 25; i++) {
cout << setw(2) << i << ": " << naiveA(i) << ", " << fastA(i) << endl;
}
}
Now, how would this change if 3 + 1 and 1 + 3 were treated as equivalent? This means that we can think about solving this problem in the following way:
Let A(n) be the number of ways to write n as a sum of 1s, 3s, and 4s.
Let B(n) be the number of ways to write n as a sum of 1s and 3s.
Let C(n) be the number of ways to write n as a sum of 1s.
We then have the following:
A(n) = B(n) for all n ≤ 3, since for numbers in that range the only options are to use 1s and 3s.
A(n + 4) = A(n) + B(n + 4), since your options are either (1) use a 4 or (2) not use a 4, leaving the remaining sum to use 1s and 3s.
B(n) = C(n) for all n ≤ 2, since for numbers in that range the only options are to use 1s.
B(n + 3) = B(n) + C(n + 3), sine your options are either (1) use a 3 or (2) not use a 3, leaving the remaining sum to use only 1s.
C(0) = 1, since there's only one way to write 0 as a sum of no numbers.
C(n+1) = C(n), since the only way to write something with 1s is to pull out a 1 and write the remaining number as a sum of 1s.
That's a lot to take in, but do notice the following: we ultimately care about A(n), and to evaluate it, we only need to know the values of A(n), A(n-1), A(n-2), A(n-3), B(n), B(n-1), B(n-2), B(n-3), C(n), C(n-1), C(n-2), and C(n-3).
Let's imagine, for example, that we know these twelve values for some fixed value of n. We can learn those twelve values for the next value of n as follows:
C(n+1) = C(n)
B(n+1) = B(n-2) + C(n+1) = B(n-2) + C(n)
A(n+1) = A(n-3) + B(n+1) = A(n-3) + B(n-2) + C(n)
And the remaining values then shift down.
We can formulate this as a giant matrix equation:
A( n ) A(n-1) A(n-2) A(n-3) B( n ) B(n-1) B(n-2) C( n )
| 0 0 0 1 0 0 1 1 | |A( n )| = |A(n+1)|
| 1 0 0 0 0 0 0 0 | |A(n-1)| = |A( n )|
| 0 1 0 0 0 0 0 0 | |A(n-2)| = |A(n-1)|
| 0 0 1 0 0 0 0 0 | |A(n-3)| = |A(n-2)|
| 0 0 0 0 0 0 1 1 | |B( n )| = |B(n+1)|
| 0 0 0 0 1 0 0 0 | |B(n-1)| = |B( n )|
| 0 0 0 0 0 1 0 0 | |B(n-2)| = |B(n-1)|
| 0 0 0 0 0 0 0 1 | |C( n )| = |C(n+1)|
Let's call this gigantic matrix here M. Then if we compute
|2| // A(3) = 2, since 3 = 3 or 3 = 1 + 1 + 1
|1| // A(2) = 1, since 2 = 1 + 1
|1| // A(1) = 1, since 1 = 1
M^n |1| // A(0) = 1, since 0 = (empty sum)
|2| // B(3) = 2, since 3 = 3 or 3 = 1 + 1 + 1
|1| // B(2) = 1, since 2 = 1 + 1
|1| // B(1) = 1, since 1 = 1
|1| // C(3) = 1, since 3 = 1 + 1 + 1
We'll get back a vector whose first entry is A(n+3), the number of ways to write n+3 as a sum of 1's, 3's, and 4's. (I've actually coded this up to check it - it works!) You can then use the technique for computing Fibonacci numbers using a matrix to a power efficiently that you saw with Fibonacci numbers to solve this in time O(log n).
Here's some code doing that:
#include "Matrix.hh"
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <algorithm>
using namespace std;
/* Naive implementations of A, B, and C. */
uint64_t naiveC(int n) {
return 1;
}
uint64_t naiveB(int n) {
return (n < 3? 0 : naiveB(n-3)) + naiveC(n);
}
uint64_t naiveA(int n) {
return (n < 4? 0 : naiveA(n-4)) + naiveB(n);
}
/* Constructs and returns the giant matrix. */
Matrix<8, 8, uint64_t> M() {
Matrix<8, 8, uint64_t> result;
fill(result.begin(), result.end(), uint64_t(0));
result[0][3] = 1;
result[0][6] = 1;
result[0][7] = 1;
result[1][0] = 1;
result[2][1] = 1;
result[3][2] = 1;
result[4][6] = 1;
result[4][7] = 1;
result[5][4] = 1;
result[6][5] = 1;
result[7][7] = 1;
return result;
}
/* Constructs the initial vector that we multiply the matrix by. */
Vector<8, uint64_t> initVec() {
Vector<8, uint64_t> result;
result[0] = 2;
result[1] = 1;
result[2] = 1;
result[3] = 1;
result[4] = 2;
result[5] = 1;
result[6] = 1;
result[7] = 1;
return result;
}
/* O(log n) time for raising a matrix to a power. */
Matrix<8, 8, uint64_t> fastPower(const Matrix<8, 8, uint64_t>& m, int n) {
if (n == 0) return Identity<8, uint64_t>();
auto half = fastPower(m, n / 2);
if (n % 2 == 0) return half * half;
else return half * half * m;
}
/* Fast implementation of A(n) using matrix exponentiation. */
uint64_t fastA(int n) {
if (n == 0) return 1;
if (n == 1) return 1;
if (n == 2) return 1;
if (n == 3) return 2;
auto result = fastPower(M(), n - 3) * initVec();
return result[0];
}
/* Some simple test code showing this in action! */
int main() {
for (int i = 0; i < 25; i++) {
cout << setw(2) << i << ": " << naiveA(i) << ", " << fastA(i) << endl;
}
}
This is a very interesting sequence. It is almost but not quite the order-4 Fibonacci (a.k.a. Tetranacci) numbers. Having extracted the doubling formulas for Tetranacci from its companion matrix, I could not resist doing it again for this very similar recurrence relation.
Before we get into the actual code, some definitions and a short derivation of the formulas used are in order. Define an integer sequence A such that:
A(n) := A(n-1) + A(n-3) + A(n-4)
with initial values A(0), A(1), A(2), A(3) := 1, 1, 1, 2.
For n >= 0, this is the number of integer compositions of n into parts from the set {1, 3, 4}. This is the sequence that we ultimately wish to compute.
For convenience, define a sequence T such that:
T(n) := T(n-1) + T(n-3) + T(n-4)
with initial values T(0), T(1), T(2), T(3) := 0, 0, 0, 1.
Note that A(n) and T(n) are simply shifts of each other. More precisely, A(n) = T(n+3) for all integers n. Accordingly, as elaborated by another answer, the companion matrix for both sequences is:
[0 1 0 0]
[0 0 1 0]
[0 0 0 1]
[1 1 0 1]
Call this matrix C, and let:
a, b, c, d := T(n), T(n+1), T(n+2), T(n+3)
a', b', c', d' := T(2n), T(2n+1), T(2n+2), T(2n+3)
By induction, it can easily be shown that:
[0 1 0 0]^n = [d-c-a c-b b-a a]
[0 0 1 0] [ a d-c c-b b]
[0 0 0 1] [ b b+a d-c c]
[1 1 0 1] [ c c+b b+a d]
As seen above, for any n, C^n can be fully determined from its rightmost column alone. Furthermore, multiplying C^n with its rightmost column produces the rightmost column of C^(2n):
[d-c-a c-b b-a a][a] = [a'] = [a(2d - 2c - a) + b(2c - b)]
[ a d-c c-b b][b] [b'] [ a^2 + c^2 + 2b(d - c)]
[ b b+a d-c c][c] [c'] [ b(2a + b) + c(2d - c)]
[ c c+b b+a d][d] [d'] [ b^2 + d^2 + 2c(a + b)]
Thus, if we wish to compute C^n for some n by repeated squaring, we need only perform matrix-vector multiplication per step instead of the full matrix-matrix multiplication.
Now, the implementation, in Python:
# O(n) integer additions or subtractions
def A_linearly(n):
a, b, c, d = 0, 0, 0, 1 # T(0), T(1), T(2), T(3)
if n >= 0:
for _ in range(+n):
a, b, c, d = b, c, d, a + b + d
else: # n < 0
for _ in range(-n):
a, b, c, d = d - c - a, a, b, c
return d # because A(n) = T(n+3)
# O(log n) integer multiplications, additions, subtractions.
def A_by_doubling(n):
n += 3 # because A(n) = T(n+3)
if n >= 0:
a, b, c, d = 0, 0, 0, 1 # T(0), T(1), T(2), T(3)
else: # n < 0
a, b, c, d = 1, 0, 0, 0 # T(-1), T(0), T(1), T(2)
# Unroll the final iteration to avoid computing extraneous values
for i in reversed(range(1, abs(n).bit_length())):
w = a*(2*(d - c) - a) + b*(2*c - b)
x = a*a + c*c + 2*b*(d - c)
y = b*(2*a + b) + c*(2*d - c)
z = b*b + d*d + 2*c*(a + b)
if (n >> i) & 1 == 0:
a, b, c, d = w, x, y, z
else: # (n >> i) & 1 == 1
a, b, c, d = x, y, z, w + x + z
if n & 1 == 0:
return a*(2*(d - c) - a) + b*(2*c - b) # w
else: # n & 1 == 1
return a*a + c*c + 2*b*(d - c) # x
print(all(A_linearly(n) == A_by_doubling(n) for n in range(-1000, 1001)))
Because it was rather trivial to code, the sequence is extended to negative n in the usual way. Also provided is a simple linear implementation to serve as a point of reference.
For n large enough, the logarithmic implementation above is 10-20x faster than directly exponentiating the companion matrix with numpy, by a simple (i.e. not rigorous, and likely flawed) timing comparison. And by my estimate, it would still take ~100 years to compute A(10**12)! Even though the algorithm above has room for improvement, that number is simply too large. On the other hand, computing A(10**12) mod M for some M is much more attainable.
A direct relation to Lucas and Fibonacci numbers
It turns out that T(n) is even closer to the Fibonacci and Lucas numbers than it is to Tetranacci. To see this, note that the characteristic polynomial for T(n) is x^4 - x^3 - x - 1 = 0 which factors into (x^2 - x - 1)(x^2 + 1) = 0. The first factor is the characteristic polynomial for Fibonacci & Lucas! The 4 roots of (x^2 - x - 1)(x^2 + 1) = 0 are the two Fibonacci roots, phi and psi = 1 - phi, and i and -i--the two square roots of -1.
The closed-form expression or "Binet" formula for T(n) will have the general form:
T(n) = U(n) + V(n)
U(n) = p*(phi^n) + q*(psi^n)
V(n) = r*(i^n) + s*(-i)^n
for some constant coefficients p, q, r, s.
Using the initial values for T(n), solving for the coefficients, applying some algebra, and noting that the Lucas numbers have the closed-form expression: L(n) = phi^n + psi^n, we can derive the following relations:
L(n+1) - L(n) L(n-1) F(n) + F(n-2)
U(n) = ------------- = -------- = ------------
5 5 5
where L(n) is the n'th Lucas number with L(0), L(1) := 2, 1 and F(n) is the n'th Fibonacci number with F(0), F(1) := 0, 1. And we also have:
V(n) = 1 / 5 if n = 0 (mod 4)
| -2 / 5 if n = 1 (mod 4)
| -1 / 5 if n = 2 (mod 4)
| 2 / 5 if n = 3 (mod 4)
Which is ugly, but trivial to code. Note that the numerator of V(n) can also be succinctly expressed as cos(n*pi/2) - 2sin(n*pi/2) or (3-(-1)^n) / 2 * (-1)^(n(n+1)/2), but we use the piece-wise definition for clarity.
Here's an even nicer, more direct identity:
T(n) + T(n+2) = F(n)
Essentially, we can compute T(n) (and therefore A(n)) by using Fibonacci & Lucas numbers. Theoretically, this should be much more efficient than the Tetranacci-like approach.
It is known that the Lucas numbers can computed more efficiently than Fibonacci, therefore we will compute A(n) from the Lucas numbers. The most efficient, simple Lucas number algorithm I know of is one by L.F. Johnson (see his 2010 paper: Middle and Ripple, fast simple O(lg n) algorithms for Lucas Numbers). Once we have a Lucas algorithm, we use the identity: T(n) = L(n - 1) / 5 + V(n) to compute A(n).
# O(log n) integer multiplications, additions, subtractions
def A_by_lucas(n):
n += 3 # because A(n) = T(n+3)
offset = (+1, -2, -1, +2)[n % 4]
L = lf_johnson_2010_middle(n - 1)
return (L + offset) // 5
def lf_johnson_2010_middle(n):
"-> n'th Lucas number. See [L.F. Johnson 2010a]."
#: The following Lucas identities are used:
#:
#: L(2n) = L(n)^2 - 2*(-1)^n
#: L(2n+1) = L(2n+2) - L(2n)
#: L(2n+2) = L(n+1)^2 - 2*(-1)^(n+1)
#:
#: The first and last identities are equivalent.
#: For the unrolled iteration, the following is also used:
#:
#: L(2n+1) = L(n)*L(n+1) - (-1)^n
#:
#: Since this approach uses only square multiplications per loop,
#: It turns out to be slightly faster than standard Lucas doubling,
#: which uses 1 square and 1 regular multiplication.
if n >= 0:
a, b, sign = 2, 1, +1 # L(0), L(1), (-1)^0
else: # n < 0
a, b, sign = -1, 2, -1 # L(-1), L(0), (-1)^(-1)
# unroll the last iteration to avoid computing unnecessary values
for i in reversed(range(1, abs(n).bit_length())):
a = a*a - 2*sign # L(2k)
c = b*b + 2*sign # L(2k+2)
b = c - a # L(2k+1)
sign = +1
if (n >> i) & 1:
a, b = b, c
sign = -1
if n & 1:
return a*b - sign
else:
return a*a - 2*sign
You may verify that A_by_lucas produces the same results as the previous A_by_doubling function, but is roughly 5x faster. Still not fast enough to compute A(10**12) in any reasonable amount of time!
You can easily improve your current recursion implementation by adding memoization which makes the solution fast again. C# code:
// Dictionary to store computed values
private static Dictionary<int, long> s_Solutions = new Dictionary<int, long>();
private static long Count134(int value) {
if (value == 0)
return 1;
else if (value <= 0)
return 0;
long result;
// Improvement: Do we have the value computed?
if (s_Solutions.TryGetValue(value, out result))
return result;
result = Count134(value - 4) +
Count134(value - 3) +
Count134(value - 1);
// Improvement: Store the value computed for future use
s_Solutions.Add(value, result);
return result;
}
And so you can easily call
Console.Write(Count134(500));
The outcome (which takes about 2 milliseconds) is
3350159379832610737

how to implement sigma in haskell?

i'm learning haskell and i'm trying to solve http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem&problem=1098
i solved this problem using other language so i know the solution
answerTo(n, d) = func(n, d) - func(n, d - 1)
func(n, d) = Σ(func(k - 2, d -1) * func(n - k, d)) | 2 <= k <= n, k is even
func(0, d) = 1
and i need to implement func in haskell. i don't know how to.
func n d
| n < 0 || d < 0 = 0
| n == 0 && d >= 0 = 1
| otherwise = --need to implement Σ(func (k - 2) (d -1)) * (func (n - k) d) | 2 <= k <= n, k is even
i solved in this way
func (n, d)
| n == 0 && d >= 0 = 1
| n < 0 || d < 0 = 0
| otherwise = sum (zipWith (*) (map func arg1) (map func arg2))
where
arg1 = [(k - 2, d - 1) | k <- filter even [2..n]]
arg2 = [(n - k, d) | k <- filter even [2..n]]
there are other solutions more graceful?
The Math.NumberTheory.Primes.Factorization module in the arithmoi package includes a σ function that may be the one you need, along with other related functions. You can click through to the source code for that function if you like.
Edit
Now that you've clarified what you mean, the answer is simple. You need to enumerate the values you need and then sum them. The easiest way to do this is using the sum function to sum a list that you create using a list comprehension:
func n d
| n < 0 || d < 0 = 0
| n == 0 && d >= 0 = 1
| otherwise = sum [func (k - 2) (d - 1) * func (n - k) d | k <- [2,4..n]]

how to match dna sequence pattern

I am getting a trouble finding an approach to solve this problem.
Input-output sequences are as follows :
**input1 :** aaagctgctagag
**output1 :** a3gct2ag2
**input2 :** aaaaaaagctaagctaag
**output2 :** a6agcta2ag
Input nsequence can be of 10^6 characters and largest continuous patterns will be considered.
For example for input2 "agctaagcta" output will not be "agcta2gcta" but it will be "agcta2".
Any help appreciated.
Explanation of the algorithm:
Having a sequence S with symbols s(1), s(2),…, s(N).
Let B(i) be the best compressed sequence with elements s(1), s(2),…,s(i).
So, for example, B(3) will be the best compressed sequence for s(1), s(2), s(3).
What we want to know is B(N).
To find it, we will proceed by induction. We want to calculate B(i+1), knowing B(i), B(i-1), B(i-2), …, B(1), B(0), where B(0) is empty sequence, and and B(1) = s(1). At the same time, this constitutes a proof that the solution is optimal. ;)
To calculate B(i+1), we will pick the best sequence among the candidates:
Candidate sequences where the last block has one element:
B(i )s(i+1)1
B(i-1)s(i+1)2 ; only if s(i) = s(i+1)
B(i-2)s(i+1)3 ; only if s(i-1) = s(i) and s(i) = s(i+1)
…
B(1)s(i+1)[i-1] ; only if s(2)=s(3) and s(3)=s(4) and … and s(i) = s(i+1)
B(0)s(i+1)i = s(i+1)i ; only if s(1)=s(2) and s(2)=s(3) and … and s(i) = s(i+1)
Candidate sequences where the last block has 2 elements:
B(i-1)s(i)s(i+1)1
B(i-3)s(i)s(i+1)2 ; only if s(i-2)s(i-1)=s(i)s(i+1)
B(i-5)s(i)s(i+1)3 ; only if s(i-4)s(i-3)=s(i-2)s(i-1) and s(i-2)s(i-1)=s(i)s(i+1)
…
Candidate sequences where the last block has 3 elements:
…
Candidate sequences where the last block has 4 elements:
…
…
Candidate sequences where last block has n+1 elements:
s(1)s(2)s(3)………s(i+1)
For each possibility, the algorithm stops when the sequence block is no longer repeated. And that’s it.
The algorithm will be some thing like this in psude-c code:
B(0) = “”
for (i=1; i<=N; i++) {
// Calculate all the candidates for B(i)
BestCandidate=null
for (j=1; j<=i; j++) {
Calculate all the candidates of length (i)
r=1;
do {
Candidadte = B([i-j]*r-1) s(i-j+1)…s(i-1)s(i) r
If ( (BestCandidate==null)
|| (Candidate is shorter that BestCandidate))
{
BestCandidate=Candidate.
}
r++;
} while ( ([i-j]*r <= i)
&&(s(i-j*r+1) s(i-j*r+2)…s(i-j*r+j) == s(i-j+1) s(i-j+2)…s(i-j+j))
}
B(i)=BestCandidate
}
Hope that this can help a little more.
The full C program performing the required task is given below. It runs in O(n^2). The central part is only 30 lines of code.
EDIT I have restructured a little bit the code, changed the names of the variables and added some comment in order to be more readable.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
// This struct represents a compressed segment like atg4, g3, agc1
struct Segment {
char *elements;
int nElements;
int count;
};
// As an example, for the segment agagt3 elements would be:
// {
// elements: "agagt",
// nElements: 5,
// count: 3
// }
struct Sequence {
struct Segment lastSegment;
struct Sequence *prev; // Points to a sequence without the last segment or NULL if it is the first segment
int totalLen; // Total length of the compressed sequence.
};
// as an example, for the sequence agt32ta5, the representation will be:
// {
// lastSegment:{"ta" , 2 , 5},
// prev: #A,
// totalLen: 8
// }
// and A will be
// {
// lastSegment{ "agt", 3, 32},
// prev: NULL,
// totalLen: 5
// }
// This function converts a sequence to a string.
// You have to free the string after using it.
// The strategy is to construct the string from right to left.
char *sequence2string(struct Sequence *S) {
char *Res=malloc(S->totalLen + 1);
char *digits="0123456789";
int p= S->totalLen;
Res[p]=0;
while (S!=NULL) {
// first we insert the count of the last element.
// We do digit by digit starting with the units.
int C = S->lastSegment.count;
while (C) {
p--;
Res[p] = digits[ C % 10 ];
C /= 10;
}
p -= S->lastSegment.nElements;
strncpy(Res + p , S->lastSegment.elements, S->lastSegment.nElements);
S = S ->prev;
}
return Res;
}
// Compresses a dna sequence.
// Returns a string with the in sequence compressed.
// The returned string must be freed after using it.
char *dnaCompress(char *in) {
int i,j;
int N = strlen(in);; // Number of elements of a in sequence.
// B is an array of N+1 sequences where B(i) is the best compressed sequence sequence of the first i characters.
// What we want to return is B[N];
struct Sequence *B;
B = malloc((N+1) * sizeof (struct Sequence));
// We first do an initialization for i=0
B[0].lastSegment.elements="";
B[0].lastSegment.nElements=0;
B[0].lastSegment.count=0;
B[0].prev = NULL;
B[0].totalLen=0;
// and set totalLen of all the sequences to a very HIGH VALUE in this case N*2 will be enougth, We will try different sequences and keep the minimum one.
for (i=1; i<=N; i++) B[i].totalLen = INT_MAX; // A very high value
for (i=1; i<=N; i++) {
// at this point we want to calculate B[i] and we know B[i-1], B[i-2], .... ,B[0]
for (j=1; j<=i; j++) {
// Here we will check all the candidates where the last segment has j elements
int r=1; // number of times the last segment is repeated
int rNDigits=1; // Number of digits of r
int rNDigitsBound=10; // We will increment r, so this value is when r will have an extra digit.
// when r = 0,1,...,9 => rNDigitsBound = 10
// when r = 10,11,...,99 => rNDigitsBound = 100
// when r = 100,101,.,999 => rNDigitsBound = 1000 and so on.
do {
// Here we analitze a candidate B(i).
// where the las segment has j elements repeated r times.
int CandidateLen = B[i-j*r].totalLen + j + rNDigits;
if (CandidateLen < B[i].totalLen) {
B[i].lastSegment.elements = in + i - j*r;
B[i].lastSegment.nElements = j;
B[i].lastSegment.count = r;
B[i].prev = &(B[i-j*r]);
B[i].totalLen = CandidateLen;
}
r++;
if (r == rNDigitsBound ) {
rNDigits++;
rNDigitsBound *= 10;
}
} while ( (i - j*r >= 0)
&& (strncmp(in + i -j, in + i - j*r, j)==0));
}
}
char *Res=sequence2string(&(B[N]));
free(B);
return Res;
}
int main(int argc, char** argv) {
char *compressedDNA=dnaCompress(argv[1]);
puts(compressedDNA);
free(compressedDNA);
return 0;
}
Forget Ukonnen. Dynamic programming it is. With 3-dimensional table:
sequence position
subsequence size
number of segments
TERMINOLOGY: For example, having a = "aaagctgctagag", sequence position coordinate would run from 1 to 13. At sequence position 3 (letter 'g'), having subsequence size 4, the subsequence would be "gctg". Understood? And as for the number of segments, then expressing a as "aaagctgctagag1" consists of 1 segment (the sequence itself). Expressing it as "a3gct2ag2" consists of 3 segments. "aaagctgct1ag2" consists of 2 segments. "a2a1ctg2ag2" would consist of 4 segments. Understood? Now, with this, you start filling a 3-dimensional array 13 x 13 x 13, so your time and memory complexity seems to be around n ** 3 for this. Are you sure you can handle it for million-bp sequences? I think that greedy approach would be better, because large DNA sequences are unlikely to repeat exactly. And, I would suggest that you widen your assignment to approximate matches, and you can publish it straight in a journal.
Anyway, you will start filling the table of compressing a subsequence starting at some position (dimension 1) with length equal to dimension 2 coordinate, having at most dimension 3 segments. So you first fill the first row, representing compressions of subsequences of length 1 consisting of at most 1 segment:
a a a g c t g c t a g a g
1(a1) 1(a1) 1(a1) 1(g1) 1(c1) 1(t1) 1(g1) 1(c1) 1(t1) 1(a1) 1(g1) 1(a1) 1(g1)
The number is the character cost (always 1 for these trivial 1-char sequences; number 1 does not count into the character cost), and in the parenthesis, you have the compression (also trivial for this simple case). The second row will be still simple:
2(a2) 2(a2) 2(ag1) 2(gc1) 2(ct1) 2(tg1) 2(gc1) 2(ct1) 2(ta1) 2(ag1) 2(ga1) 2(ag1)
There is only 1 way to decompose a 2-character sequence into 2 subsequences -- 1 character + 1 character. If they are identical, the result is like a + a = a2. If they are different, such as a + g, then, because only 1-segment sequences are admissible, the result cannot be a1g1, but must be ag1. The third row will be finally more interesting:
2(a3) 2(aag1) 3(agc1) 3(gct1) 3(ctg1) 3(tgc1) 3(gct1) 3(cta1) 3(tag1) 3(aga1) 3(gag1)
Here, you can always choose between 2 ways of composing the compressed string. For example, aag can be composed either as aa + g or a + ag. But again, we cannot have 2 segments, as in aa1g1 or a1ag1, so we must be satisfied with aag1, unless both components consist of the same character, as in aa + a => a3, with character cost 2. We can continue onto 4 th line:
4(aaag1) 4(aagc1) 4(agct1) 4(gctg1) 4(ctgc1) 4(tgct1) 4(gcta1) 4(ctag1) 4(taga1) 3(ag2)
Here, on the first position, we cannot use a3g1, because only 1 segment is allowed at this layer. But at the last position, compression to character cost 3 is agchieved by ag1 + ag1 = ag2. This way, one can fill the whole first-level table all the way up to the single subsequence of 13 characters, and each subsequence will have its optimal character cost and its compression under the first-level constraint of at most 1 segment associated with it.
Then you go to the 2nd level, where 2 segments are allowed... And again, from the bottom up, you identify the optimum cost and compression of each table coordinate under the given level's segment count constraint, by comparing all the possible ways to compose the subsequence using already computed positions, until you fill the table completely and thus compute the global optimum. There are some details to solve, but sorry, I'm not gonna code this for you.
After trying my own way for a while, my kudos to jbaylina for his beautiful algorithm and C implementation. Here's my attempted version of jbaylina's algorithm in Haskell, and below it further development of my attempt at a linear-time algorithm that attempts to compress segments that include repeated patterns in a one-by-one fashion:
import Data.Map (fromList, insert, size, (!))
compress s = (foldl f (fromList [(0,([],0)),(1,([s!!0],1))]) [1..n - 1]) ! n
where
n = length s
f b i = insert (size b) bestCandidate b where
add (sequence, sLength) (sequence', sLength') =
(sequence ++ sequence', sLength + sLength')
j' = [1..min 100 i]
bestCandidate = foldr combCandidates (b!i `add` ([s!!i,'1'],2)) j'
combCandidates j candidate' =
let nextCandidate' = comb 2 (b!(i - j + 1)
`add` ((take j . drop (i - j + 1) $ s) ++ "1", j + 1))
in if snd nextCandidate' <= snd candidate'
then nextCandidate'
else candidate' where
comb r candidate
| r > uBound = candidate
| not (strcmp r True) = candidate
| snd nextCandidate <= snd candidate = comb (r + 1) nextCandidate
| otherwise = comb (r + 1) candidate
where
uBound = div (i + 1) j
prev = b!(i - r * j + 1)
nextCandidate = prev `add`
((take j . drop (i - j + 1) $ s) ++ show r, j + length (show r))
strcmp 1 _ = True
strcmp num bool
| (take j . drop (i - num * j + 1) $ s)
== (take j . drop (i - (num - 1) * j + 1) $ s) =
strcmp (num - 1) True
| otherwise = False
Output:
*Main> compress "aaagctgctagag"
("a3gct2ag2",9)
*Main> compress "aaabbbaaabbbaaabbbaaabbb"
("aaabbb4",7)
Linear-time attempt:
import Data.List (sortBy)
group' xxs sAccum (chr, count)
| null xxs = if null chr
then singles
else if count <= 2
then reverse sAccum ++ multiples ++ "1"
else singles ++ if null chr then [] else chr ++ show count
| [x] == chr = group' xs sAccum (chr,count + 1)
| otherwise = if null chr
then group' xs (sAccum) ([x],1)
else if count <= 2
then group' xs (multiples ++ sAccum) ([x],1)
else singles
++ chr ++ show count ++ group' xs [] ([x],1)
where x:xs = xxs
singles = reverse sAccum ++ (if null sAccum then [] else "1")
multiples = concat (replicate count chr)
sequences ws strIndex maxSeqLen = repeated' where
half = if null . drop (2 * maxSeqLen - 1) $ ws
then div (length ws) 2 else maxSeqLen
repeated' = let (sequence,(sequenceStart, sequenceEnd'),notSinglesFlag) = repeated
in (sequence,(sequenceStart, sequenceEnd'))
repeated = foldr divide ([],(strIndex,strIndex),False) [1..half]
equalChunksOf t a = takeWhile(==t) . map (take a) . iterate (drop a)
divide chunkSize b#(sequence,(sequenceStart, sequenceEnd'),notSinglesFlag) =
let t = take (2*chunkSize) ws
t' = take chunkSize t
in if t' == drop chunkSize t
then let ts = equalChunksOf t' chunkSize ws
lenTs = length ts
sequenceEnd = strIndex + lenTs * chunkSize
newEnd = if sequenceEnd > sequenceEnd'
then sequenceEnd else sequenceEnd'
in if chunkSize > 1
then if length (group' (concat (replicate lenTs t')) [] ([],0)) > length (t' ++ show lenTs)
then (((strIndex,sequenceEnd,chunkSize,lenTs),t'):sequence, (sequenceStart,newEnd),True)
else b
else if notSinglesFlag
then b
else (((strIndex,sequenceEnd,chunkSize,lenTs),t'):sequence, (sequenceStart,newEnd),False)
else b
addOne a b
| null (fst b) = a
| null (fst a) = b
| otherwise =
let (((start,end,patLen,lenS),sequence):rest,(sStart,sEnd)) = a
(((start',end',patLen',lenS'),sequence'):rest',(sStart',sEnd')) = b
in if sStart' < sEnd && sEnd < sEnd'
then let c = ((start,end,patLen,lenS),sequence):rest
d = ((start',end',patLen',lenS'),sequence'):rest'
in (c ++ d, (sStart, sEnd'))
else a
segment xs baseIndex maxSeqLen = segment' xs baseIndex baseIndex where
segment' zzs#(z:zs) strIndex farthest
| null zs = initial
| strIndex >= farthest && strIndex > 0 = ([],(0,0))
| otherwise = addOne initial next
where
next#(s',(start',end')) = segment' zs (strIndex + 1) farthest'
farthest' | null s = farthest
| otherwise = if start /= end && end > farthest then end else farthest
initial#(s,(start,end)) = sequences zzs strIndex maxSeqLen
areExclusive ((a,b,_,_),_) ((a',b',_,_),_) = (a' >= b) || (b' <= a)
combs [] r = [r]
combs (x:xs) r
| null r = combs xs (x:r) ++ if null xs then [] else combs xs r
| otherwise = if areExclusive (head r) x
then combs xs (x:r) ++ combs xs r
else if l' > lowerBound
then combs xs (x: reduced : drop 1 r) ++ combs xs r
else combs xs r
where lowerBound = l + 2 * patLen
((l,u,patLen,lenS),s) = head r
((l',u',patLen',lenS'),s') = x
reduce = takeWhile (>=l') . iterate (\x -> x - patLen) $ u
lenReduced = length reduce
reduced = ((l,u - lenReduced * patLen,patLen,lenS - lenReduced),s)
buildString origStr sequences = buildString' origStr sequences 0 (0,"",0)
where
buildString' origStr sequences index accum#(lenC,cStr,lenOrig)
| null sequences = accum
| l /= index =
buildString' (drop l' origStr) sequences l (lenC + l' + 1, cStr ++ take l' origStr ++ "1", lenOrig + l')
| otherwise =
buildString' (drop u' origStr) rest u (lenC + length s', cStr ++ s', lenOrig + u')
where
l' = l - index
u' = u - l
s' = s ++ show lenS
(((l,u,patLen,lenS),s):rest) = sequences
compress [] _ accum = reverse accum ++ (if null accum then [] else "1")
compress zzs#(z:zs) maxSeqLen accum
| null (fst segment') = compress zs maxSeqLen (z:accum)
| (start,end) == (0,2) && not (null accum) = compress zs maxSeqLen (z:accum)
| otherwise =
reverse accum ++ (if null accum || takeWhile' compressedStr 0 /= 0 then [] else "1")
++ compressedStr
++ compress (drop lengthOriginal zzs) maxSeqLen []
where segment'#(s,(start,end)) = segment zzs 0 maxSeqLen
combinations = combs (fst $ segment') []
takeWhile' xxs count
| null xxs = 0
| x == '1' && null (reads (take 1 xs)::[(Int,String)]) = count
| not (null (reads [x]::[(Int,String)])) = 0
| otherwise = takeWhile' xs (count + 1)
where x:xs = xxs
f (lenC,cStr,lenOrig) (lenC',cStr',lenOrig') =
let g = compare ((fromIntegral lenC + if not (null accum) && takeWhile' cStr 0 == 0 then 1 else 0) / fromIntegral lenOrig)
((fromIntegral lenC' + if not (null accum) && takeWhile' cStr' 0 == 0 then 1 else 0) / fromIntegral lenOrig')
in if g == EQ
then compare (takeWhile' cStr' 0) (takeWhile' cStr 0)
else g
(lenCompressed,compressedStr,lengthOriginal) =
head $ sortBy f (map (buildString (take end zzs)) (map reverse combinations))
Output:
*Main> compress "aaaaaaaaabbbbbbbbbaaaaaaaaabbbbbbbbb" 100 []
"a9b9a9b9"
*Main> compress "aaabbbaaabbbaaabbbaaabbb" 100 []
"aaabbb4"

divide list in two parts that their sum closest to each other

This is a hard algorithms problem that :
Divide the list in 2 parts (sum) that their sum closest to (most) each other
list length is 1 <= n <= 100 and their(numbers) weights 1<=w<=250 given in the question.
For example : 23 65 134 32 95 123 34
1.sum = 256
2.sum = 250
1.list = 1 2 3 7
2.list = 4 5 6
I have an algorithm but it didn't work for all inputs.
init. lists list1 = [], list2 = []
Sort elements (given list) [23 32 34 65 95 123 134]
pop last one (max one)
insert to the list which differs less
Implementation :
list1 = [], list2 = []
select 134 insert list1. list1 = [134]
select 123 insert list2. because if you insert to the list1 the difference getting bigger 3. select 95 and insert list2 . because sum(list2) + 95 - sum(list1) is less.
and so on...
You can reformulate this as the knapsack problem.
You have a list of items with total weight M that should be fitted into a bin that can hold maximum weight M/2. The items packed in the bin should weigh as much as possible, but not more than the bin holds.
For the case where all weights are non-negative, this problem is only weakly NP-complete and has polynomial time solutions.
A description of dynamic programming solutions for this problem can be found on Wikipedia.
The problem is NPC, but there is a pseudo polynomial algorithm for it, this is a 2-Partition problem, you can follow the way of pseudo polynomial time algorithm for sub set sum problem to solve this. If the input size is related polynomially to input values, then this can be done in polynomial time.
In your case (weights < 250) it's polynomial (because weight <= 250 n => sums <= 250 n^2).
Let Sum = sum of weights, we have to create two dimensional array A, then construct A, Column by Column
A[i,j] = true if (j == weight[i] or j - weight[i] = weight[k] (k is in list)).
The creation of array with this algorithm takes O(n^2 * sum/2).
At last we should find most valuable column which has true value.
Here is an example:
items:{0,1,2,3}
weights:{4,7,2,8} => sum = 21 sum/2 = 10
items/weights 0| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---------------------------------------------------------
|0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0
|1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0
|2 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1
|3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
So because a[10, 2] == true the partition is 10, 11
This is an algorithm I found here and edited a little bit to solve your problem:
bool partition( vector< int > C ) {
// compute the total sum
int n = C.size();
int N = 0;
for( int i = 0; i < n; i++ ) N += C[i];
// initialize the table
T[0] = true;
for( int i = 1; i <= N; i++ ) T[i] = false;
// process the numbers one by one
for( int i = 0; i < n; i++ )
for( int j = N - C[i]; j >= 0; j--)
if( T[j] ) T[j + C[i]] = true;
for(int i = N/2;i>=0;i--)
if (T[i])
return i;
return 0;
}
I just returned first T[i] which is true instead of returning T[N/2] (in max to min order).
Finding the path which gives this value is not hard.
This problem is at least as hard as the NP-complete problem subset sum. Your algorithm is a greedy algorithm. This type of algorithm is fast, and can generate an approximate solution quickly but it cannot find the exact solution to an NP-complete problem.
A brute force approach is probably the simplest way to solve your problem, although it is will be to slow if there are too many elements.
Try every possible way of partitioning the elements into two sets and calculate the absolute difference in the sums.
Choose the partition for which the absolute difference is minimal.
Generating all the partitions can be done by considering the binary representation of each integer from 0 to 2^n, where each binary digit determines whether the correspending element is in the left or right partition.
Trying to resolve the same problem I have faced into the following idea which seems too much a solution, but it works in a linear time. Could one provide an example which would show that it does not work or explain why it is not a solution?
arr = [20,10,15,6,1,17,3,9,10,2,19] # a list of numbers
g1 = []
g2 = []
for el in reversed(sorted(arr)):
if sum(g1) > sum(g2):
g2.append(el)
else:
g1.append(el)
print(f"{sum(g1)}: {g1}")
print(f"{sum(g2)}: {g2}")
Typescript code:
import * as _ from 'lodash'
function partitionArray(numbers: number[]): {
arr1: number[]
arr2: number[]
difference: number
} {
let sortedArr: number[] = _.chain(numbers).without(0).sortBy((x) => x).value().reverse()
let arr1: number[] = []
let arr2: number[] = []
let median = _.sum(sortedArr) / 2
let sum = 0
_.each(sortedArr, (n) => {
let ns = sum + n
if (ns > median) {
arr1.push(n)
} else {
sum += n
arr2.push(n)
}
})
return {
arr1: arr1,
arr2: arr2,
difference: Math.abs(_.sum(arr1) - _.sum(arr2))
}
}

How to create the most compact mapping n → isprime(n) up to a limit N?

Naturally, for bool isprime(number) there would be a data structure I could query.
I define the best algorithm, to be the algorithm that produces a data structure with lowest memory consumption for the range (1, N], where N is a constant.
Just an example of what I am looking for: I could represent every odd number with one bit e.g. for the given range of numbers (1, 10], starts at 3: 1110
The following dictionary can be squeezed more, right? I could eliminate multiples of five with some work, but numbers that end with 1, 3, 7 or 9 must be there in the array of bits.
How do I solve the problem?
The fastest algorithm for general prime testing is AKS. The Wikipedia article describes it at lengths and links to the original paper.
If you want to find big numbers, look into primes that have special forms like Mersenne primes.
The algorithm I usually implement (easy to understand and code) is as follows (in Python):
def isprime(n):
"""Returns True if n is prime."""
if n == 2:
return True
if n == 3:
return True
if n % 2 == 0:
return False
if n % 3 == 0:
return False
i = 5
w = 2
while i * i <= n:
if n % i == 0:
return False
i += w
w = 6 - w
return True
It's a variant of the classic O(sqrt(N)) algorithm. It uses the fact that a prime (except 2 and 3) is of form 6k - 1 or 6k + 1 and looks only at divisors of this form.
Sometimes, If I really want speed and the range is limited, I implement a pseudo-prime test based on Fermat's little theorem. If I really want more speed (i.e. avoid O(sqrt(N)) algorithm altogether), I precompute the false positives (see Carmichael numbers) and do a binary search. This is by far the fastest test I've ever implemented, the only drawback is that the range is limited.
A pretty simple and concise brute-force solution to check whether a number N is prime: simply check if there is any divisor of N from 2 up to the square root of N (see why here if interested).
The following code is compatible with both Python 2 and Python 3:
from math import sqrt
from itertools import count, islice
def is_prime(n):
return n > 1 and all(n % i for i in islice(count(2), int(sqrt(n) - 1)))
And here's a simpler Python 3 only implementation:
def is_prime(n):
return n > 1 and all(n % i for i in range(2, int(n ** 0.5) + 1))
Here are the extended versions of the above for clarity:
from math import sqrt
from itertools import count, islice
def is_prime(n):
if n < 2:
return False
for divisor in islice(count(2), int(sqrt(n) - 1)):
if n % divisor == 0:
return False
return True
def is_prime(n):
if n < 2:
return False
for divisor in range(2, int(n ** 0.5) + 1):
if n % divisor == 0:
return False
return True
This is not meant to be anything near the fastest nor the most optimal primality check algorithm, it only accomplishes the goal of being simple and concise, which also reduces implementation errors. It has a time complexity of O(sqrt(n)).
If you are looking for faster algorithms to check whether a number is prime, you might be interested in the following:
Finding primes & proving primality: brief overview and explanation of the most famous primality tests and their history.
Probabilistic primality tests (Wikipedia): these can be incorporated in the above code rather easily to skip the brute force if they do not pass, as an example there is this excellent answer to the duplicate of this question.
Fast deterministic primaliry tests (Wikipedia)
This Q&A Fastest way to list all primes below N along with the pyprimesieve library.
Implementation notes
You might have noticed that in the Python 2 compatible implementation I am using itertools.count() in combination with itertools.islice() instead of a simple range() or xrange() (the old Python 2 generator range, which in Python 3 is the default). This is because in CPython 2 xrange(N) for some N such that N > 263 ‒ 1 (or N > 231 ‒ 1 depending on the implementation) raises an OverflowError. This is an unfortunate CPython implementation detail.
We can use itertools to overcome this issue. Since we are counting up from 2 to infinity using itertools.count(2), we'll reach sqrt(n) after sqrt(n) - 1 steps, and we can limit the generator using itertools.islice().
There are many efficient ways to test primality (and this isn't one of them), but the loop you wrote can be concisely rewritten in Python:
def is_prime(a):
return all(a % i for i in xrange(2, a))
That is, a is prime if all numbers between 2 and a (not inclusive) give non-zero remainder when divided into a.
This is the most efficient way to see if a number is prime, if you only have a few queries. If you ask a lot of numbers if they are prime, try Sieve of Eratosthenes.
import math
def is_prime(n):
if n == 2:
return True
if n % 2 == 0 or n <= 1:
return False
sqr = int(math.sqrt(n)) + 1
for divisor in range(3, sqr, 2):
if n % divisor == 0:
return False
return True
If a is a prime then the while x: in your code will run forever, since x will remain True.
So why is that while there?
I think you wanted to end the for loop when you found a factor, but didn't know how, so you added that while since it has a condition. So here is how you do it:
def is_prime(a):
x = True
for i in range(2, a):
if a%i == 0:
x = False
break # ends the for loop
# no else block because it does nothing ...
if x:
print "prime"
else:
print "not prime"
I compared the efficiency of the most popular suggestions to determine if a number is prime. I used python 3.6 on ubuntu 17.10; I tested with numbers up to 100.000 (you can test with bigger numbers using my code below).
This first plot compares the functions (which are explained further down in my answer), showing that the last functions do not grow as fast as the first one when increasing the numbers.
And in the second plot we can see that in case of prime numbers the time grows steadily, but non-prime numbers do not grow so fast in time (because most of them can be eliminated early on).
Here are the functions I used:
this answer and this answer suggested a construct using all():
def is_prime_1(n):
return n > 1 and all(n % i for i in range(2, int(math.sqrt(n)) + 1))
This answer used some kind of while loop:
def is_prime_2(n):
if n <= 1:
return False
if n == 2:
return True
if n == 3:
return True
if n % 2 == 0:
return False
if n % 3 == 0:
return False
i = 5
w = 2
while i * i <= n:
if n % i == 0:
return False
i += w
w = 6 - w
return True
This answer included a version with a for loop:
def is_prime_3(n):
if n <= 1:
return False
if n % 2 == 0 and n > 2:
return False
for i in range(3, int(math.sqrt(n)) + 1, 2):
if n % i == 0:
return False
return True
And I mixed a few ideas from the other answers into a new one:
def is_prime_4(n):
if n <= 1: # negative numbers, 0 or 1
return False
if n <= 3: # 2 and 3
return True
if n % 2 == 0 or n % 3 == 0:
return False
for i in range(5, int(math.sqrt(n)) + 1, 2):
if n % i == 0:
return False
return True
Here is my script to compare the variants:
import math
import pandas as pd
import seaborn as sns
import time
from matplotlib import pyplot as plt
def is_prime_1(n):
...
def is_prime_2(n):
...
def is_prime_3(n):
...
def is_prime_4(n):
...
default_func_list = (is_prime_1, is_prime_2, is_prime_3, is_prime_4)
def assert_equal_results(func_list=default_func_list, n):
for i in range(-2, n):
r_list = [f(i) for f in func_list]
if not all(r == r_list[0] for r in r_list):
print(i, r_list)
raise ValueError
print('all functions return the same results for integers up to {}'.format(n))
def compare_functions(func_list=default_func_list, n):
result_list = []
n_measurements = 3
for f in func_list:
for i in range(1, n + 1):
ret_list = []
t_sum = 0
for _ in range(n_measurements):
t_start = time.perf_counter()
is_prime = f(i)
t_end = time.perf_counter()
ret_list.append(is_prime)
t_sum += (t_end - t_start)
is_prime = ret_list[0]
assert all(ret == is_prime for ret in ret_list)
result_list.append((f.__name__, i, is_prime, t_sum / n_measurements))
df = pd.DataFrame(
data=result_list,
columns=['f', 'number', 'is_prime', 't_seconds'])
df['t_micro_seconds'] = df['t_seconds'].map(lambda x: round(x * 10**6, 2))
print('df.shape:', df.shape)
print()
print('', '-' * 41)
print('| {:11s} | {:11s} | {:11s} |'.format(
'is_prime', 'count', 'percent'))
df_sub1 = df[df['f'] == 'is_prime_1']
print('| {:11s} | {:11,d} | {:9.1f} % |'.format(
'all', df_sub1.shape[0], 100))
for (is_prime, count) in df_sub1['is_prime'].value_counts().iteritems():
print('| {:11s} | {:11,d} | {:9.1f} % |'.format(
str(is_prime), count, count * 100 / df_sub1.shape[0]))
print('', '-' * 41)
print()
print('', '-' * 69)
print('| {:11s} | {:11s} | {:11s} | {:11s} | {:11s} |'.format(
'f', 'is_prime', 't min (us)', 't mean (us)', 't max (us)'))
for f, df_sub1 in df.groupby(['f', ]):
col = df_sub1['t_micro_seconds']
print('|{0}|{0}|{0}|{0}|{0}|'.format('-' * 13))
print('| {:11s} | {:11s} | {:11.2f} | {:11.2f} | {:11.2f} |'.format(
f, 'all', col.min(), col.mean(), col.max()))
for is_prime, df_sub2 in df_sub1.groupby(['is_prime', ]):
col = df_sub2['t_micro_seconds']
print('| {:11s} | {:11s} | {:11.2f} | {:11.2f} | {:11.2f} |'.format(
f, str(is_prime), col.min(), col.mean(), col.max()))
print('', '-' * 69)
return df
Running the function compare_functions(n=10**5) (numbers up to 100.000) I get this output:
df.shape: (400000, 5)
-----------------------------------------
| is_prime | count | percent |
| all | 100,000 | 100.0 % |
| False | 90,408 | 90.4 % |
| True | 9,592 | 9.6 % |
-----------------------------------------
---------------------------------------------------------------------
| f | is_prime | t min (us) | t mean (us) | t max (us) |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_1 | all | 0.57 | 2.50 | 154.35 |
| is_prime_1 | False | 0.57 | 1.52 | 154.35 |
| is_prime_1 | True | 0.89 | 11.66 | 55.54 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_2 | all | 0.24 | 1.14 | 304.82 |
| is_prime_2 | False | 0.24 | 0.56 | 304.82 |
| is_prime_2 | True | 0.25 | 6.67 | 48.49 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_3 | all | 0.20 | 0.95 | 50.99 |
| is_prime_3 | False | 0.20 | 0.60 | 40.62 |
| is_prime_3 | True | 0.58 | 4.22 | 50.99 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_4 | all | 0.20 | 0.89 | 20.09 |
| is_prime_4 | False | 0.21 | 0.53 | 14.63 |
| is_prime_4 | True | 0.20 | 4.27 | 20.09 |
---------------------------------------------------------------------
Then, running the function compare_functions(n=10**6) (numbers up to 1.000.000) I get this output:
df.shape: (4000000, 5)
-----------------------------------------
| is_prime | count | percent |
| all | 1,000,000 | 100.0 % |
| False | 921,502 | 92.2 % |
| True | 78,498 | 7.8 % |
-----------------------------------------
---------------------------------------------------------------------
| f | is_prime | t min (us) | t mean (us) | t max (us) |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_1 | all | 0.51 | 5.39 | 1414.87 |
| is_prime_1 | False | 0.51 | 2.19 | 413.42 |
| is_prime_1 | True | 0.87 | 42.98 | 1414.87 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_2 | all | 0.24 | 2.65 | 612.69 |
| is_prime_2 | False | 0.24 | 0.89 | 322.81 |
| is_prime_2 | True | 0.24 | 23.27 | 612.69 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_3 | all | 0.20 | 1.93 | 67.40 |
| is_prime_3 | False | 0.20 | 0.82 | 61.39 |
| is_prime_3 | True | 0.59 | 14.97 | 67.40 |
|-------------|-------------|-------------|-------------|-------------|
| is_prime_4 | all | 0.18 | 1.88 | 332.13 |
| is_prime_4 | False | 0.20 | 0.74 | 311.94 |
| is_prime_4 | True | 0.18 | 15.23 | 332.13 |
---------------------------------------------------------------------
I used the following script to plot the results:
def plot_1(func_list=default_func_list, n):
df_orig = compare_functions(func_list=func_list, n=n)
df_filtered = df_orig[df_orig['t_micro_seconds'] <= 20]
sns.lmplot(
data=df_filtered, x='number', y='t_micro_seconds',
col='f',
# row='is_prime',
markers='.',
ci=None)
plt.ticklabel_format(style='sci', axis='x', scilimits=(3, 3))
plt.show()
One can use sympy.
import sympy
sympy.ntheory.primetest.isprime(33393939393929292929292911111111)
True
From sympy docs. The first step is looking for trivial factors, which if found enables a quick return. Next, if the sieve is large enough, use bisection search on the sieve. For small numbers, a set of deterministic Miller-Rabin tests are performed with bases that are known to have no counterexamples in their range. Finally if the number is larger than 2^64, a strong BPSW test is performed. While this is a probable prime test and we believe counterexamples exist, there are no known counterexamples
According to wikipedia, the Sieve of Eratosthenes has complexity O(n * (log n) * (log log n)) and requires O(n) memory - so it's a pretty good place to start if you aren't testing for especially large numbers.
bool isPrime(int n)
{
// Corner cases
if (n <= 1) return false;
if (n <= 3) return true;
// This is checked so that we can skip
// middle five numbers in below loop
if (n%2 == 0 || n%3 == 0) return false;
for (int i=5; i*i<=n; i=i+6)
if (n%i == 0 || n%(i+2) == 0)
return false;
return true;
}
This is just c++ implementation of above
For large numbers you cannot simply naively check whether the candidate number N is divisible by none of the numbers less than sqrt(N). There are much more scalable tests available, such as the Miller-Rabin primality test. Below you have implementation in python:
def is_prime(x):
"""Fast implementation fo Miller-Rabin primality test, guaranteed to be correct."""
import math
def get_sd(x):
"""Returns (s: int, d: int) for which x = d*2^s """
if not x: return 0, 0
s = 0
while 1:
if x % 2 == 0:
x /= 2
s += 1
else:
return s, x
if x <= 2:
return x == 2
# x - 1 = d*2^s
s, d = get_sd(x - 1)
if not s:
return False # divisible by 2!
log2x = int(math.log(x) / math.log(2)) + 1
# As long as Riemann hypothesis holds true, it is impossible
# that all the numbers below this threshold are strong liars.
# Hence the number is guaranteed to be a prime if no contradiction is found.
threshold = min(x, 2*log2x*log2x+1)
for a in range(2, threshold):
# From Fermat's little theorem if x is a prime then a^(x-1) % x == 1
# Hence the below must hold true if x is indeed a prime:
if pow(a, d, x) != 1:
for r in range(0, s):
if -pow(a, d*2**r, x) % x == 1:
break
else:
# Contradicts Fermat's little theorem, hence not a prime.
return False
# No contradiction found, hence x must be a prime.
return True
You can use it to find huge prime numbers:
x = 10000000000000000000000000000000000000000000000000000000000000000000000000000
for e in range(1000):
if is_prime(x + e):
print('%d is a prime!' % (x + e))
break
# 10000000000000000000000000000000000000000000000000000000000000000000000000133 is a prime!
If you are testing random integers probably you want to first test whether the candidate number is divisible by any of the primes smaller than, say 1000, before you call Miller-Rabin. This will help you filter out obvious non-primes such as 10444344345.
Python 3:
def is_prime(a):
return a > 1 and all(a % i for i in range(2, int(a**0.5) + 1))
Way too late to the party, but hope this helps. This is relevant if you are looking for big primes:
To test large odd numbers you need to use the Fermat-test and/or Miller-Rabin test.
These tests use modular exponentiation which is quite expensive, for n bits exponentiation you need at least n big int multiplication and n big int divison. Which means the complexity of modular exponentiation is O(n³).
So before using the big guns, you need to do quite a few trial divisions. But don't do it naively, there is a way to do them fast.
First multiply as many primes together as many fits into a the words you use for the big integers. If you use 32 bit words, multiply 3*5*7*11*13*17*19*23*29=3234846615 and compute the greatest common divisor with the number you test using the Euclidean algorithm. After the first step the number is reduced below the word size and continue the algorithm without performing complete big integer divisions. If the GCD != 1, that means one of the primes you multiplied together divides the number, so you have a proof it's not prime. Then continue with 31*37*41*43*47 = 95041567, and so on.
Once you tested several hundred (or thousand) primes this way, you can do 40 rounds of Miller-Rabin test to confirm the number is prime, after 40 rounds you can be certain the number is prime there is only 2^-80 chance it's not (it's more likely your hardware malfunctions...).
I have got a prime function which works until (2^61)-1 Here:
from math import sqrt
def isprime(num): num > 1 and return all(num % x for x in range(2, int(sqrt(num)+1)))
Explanation:
The all() function can be redefined to this:
def all(variables):
for element in variables:
if not element: return False
return True
The all() function just goes through a series of bools / numbers and returns False if it sees 0 or False.
The sqrt() function is just doing the square root of a number.
For example:
>>> from math import sqrt
>>> sqrt(9)
>>> 3
>>> sqrt(100)
>>> 10
The num % x part returns the remainder of num / x.
Finally, range(2, int(sqrt(num))) means that it will create a list that starts at 2 and ends at int(sqrt(num)+1)
For more information about range, have a look at this website!
The num > 1 part is just checking if the variable num is larger than 1, becuase 1 and 0 are not considered prime numbers.
I hope this helped :)
In Python:
def is_prime(n):
return not any(n % p == 0 for p in range(2, int(math.sqrt(n)) + 1))
A more direct conversion from mathematical formalism to Python would use all(n % p != 0... ), but that requires strict evaluation of all values of p. The not any version can terminate early if a True value is found.
best algorithm for Primes number javascript
function isPrime(num) {
if (num <= 1) return false;
else if (num <= 3) return true;
else if (num % 2 == 0 || num % 3 == 0) return false;
var i = 5;
while (i * i <= num) {
if (num % i == 0 || num % (i + 2) == 0) return false;
i += 6;
}
return true
}
A prime number is any number that is only divisible by 1 and itself. All other numbers are called composite.
The simplest way, of finding a prime number, is to check if the input number is a composite number:
function isPrime(number) {
// Check if a number is composite
for (let i = 2; i < number; i++) {
if (number % i === 0) {
return false;
}
}
// Return true for prime numbers
return true;
}
The program has to divide the value of number by all the whole numbers from 1 and up to the its value. If this number can be divided evenly not only by one and itself it is a composite number.
The initial value of the variable i has to be 2 because both prime and composite numbers can be evenly divided by 1.
for (let i = 2; i < number; i++)
Then i is less than number for the same reason. Both prime and composite numbers can be evenly divided by themselves. Therefore there is no reason to check it.
Then we check whether the variable can be divided evenly by using the remainder operator.
if (number % i === 0) {
return false;
}
If the remainder is zero it means that number can be divided evenly, hence being a composite number and returning false.
If the entered number didn't meet the condition, it means it's a prime number and the function returns true.
Smallest memory? This isn't smallest, but is a step in the right direction.
class PrimeDictionary {
BitArray bits;
public PrimeDictionary(int n) {
bits = new BitArray(n + 1);
for (int i = 0; 2 * i + 3 <= n; i++) {
bits.Set(i, CheckPrimality(2 * i + 3));
}
}
public PrimeDictionary(IEnumerable<int> primes) {
bits = new BitArray(primes.Max());
foreach(var prime in primes.Where(p => p != 2)) {
bits.Set((prime - 3) / 2, true);
}
}
public bool IsPrime(int k) {
if (k == 2) {
return true;
}
if (k % 2 == 0) {
return false;
}
return bits[(k - 3) / 2];
}
}
Of course, you have to specify the definition of CheckPrimality.
To find if the number or numbers in a range is/are prime.
#!usr/bin/python3
def prime_check(*args):
for arg in args:
if arg > 1: # prime numbers are greater than 1
for i in range(2,arg): # check for factors
if(arg % i) == 0:
print(arg,"is not Prime")
print(i,"times",arg//i,"is",arg)
break
else:
print(arg,"is Prime")
# if input number is less than
# or equal to 1, it is not prime
else:
print(arg,"is not Prime")
return
# Calling Now
prime_check(*list(range(101))) # This will check all the numbers in range 0 to 100
prime_check(#anynumber) # Put any number while calling it will check.
myInp=int(input("Enter a number: "))
if myInp==1:
print("The number {} is neither a prime not composite no".format(myInp))
elif myInp>1:
for i in range(2,myInp//2+1):
if myInp%i==0:
print("The Number {} is not a prime no".format(myInp))
print("Because",i,"times",myInp//i,"is",myInp)
break
else:
print("The Number {} is a prime no".format(myInp))
else:
print("Alas the no {} is a not a prime no".format(myInp))
public static boolean isPrime(int number) {
if(number < 2)
return false;
else if(number == 2 || number == 3)
return true;
else {
for(int i=2;i<=number/2;i++)
if(number%i == 0)
return false;
else if(i==number/2)
return true;
}
return false;
}
Most of previous answers are correct but here is one more way to test to see a number is prime number. As refresher, prime numbers are whole number greater than 1 whose only factors are 1 and itself.(source)
Solution:
Typically you can build a loop and start testing your number to see if it's divisible by 1,2,3 ...up to the number you are testing ...etc but to reduce the time to check, you can divide your number by half of the value of your number because a number cannot be exactly divisible by anything above half of it's value.
Example if you want to see 100 is a prime number you can loop through up to 50.
Actual code:
def find_prime(number):
if(number ==1):
return False
# we are dividiing and rounding and then adding the remainder to increment !
# to cover not fully divisible value to go up forexample 23 becomes 11
stop=number//2+number%2
#loop through up to the half of the values
for item in range(2,stop):
if number%item==0:
return False
print(number)
return True
if(find_prime(3)):
print("it's a prime number !!")
else:
print("it's not a prime")
We can use java streams to implement this in O(sqrt(n)); Consider that noneMatch is a shortCircuiting method that stops the operation when finds it unnecessary for determining the result:
Scanner in = new Scanner(System.in);
int n = in.nextInt();
System.out.println(n == 2 ? "Prime" : IntStream.rangeClosed(2, ((int)(Math.sqrt(n)) + 1)).noneMatch(a -> n % a == 0) ? "Prime" : "Not Prime");
With help of Java-8 streams and lambdas, it can be implemented like this in just few lines:
public static boolean isPrime(int candidate){
int candidateRoot = (int) Math.sqrt( (double) candidate);
return IntStream.range(2,candidateRoot)
.boxed().noneMatch(x -> candidate % x == 0);
}
Performance should be close to O(sqrt(N)). Maybe someone find it useful.
### is_prime(number) =
### if number % p1 !=0 for all p1(prime numbers) < (sqrt(number) + 1),
### filter numbers that are not prime from divisors
import math
def check_prime(N, prime_numbers_found = [2]):
if N == 2:
return True
if int(math.sqrt(N)) + 1 > prime_numbers_found[-1]:
divisor_range = prime_numbers_found + list(range(prime_numbers_found[-1] + 1, int(math.sqrt(N)) + 1+ 1))
else:
divisor_range = prime_numbers_found
#print(divisor_range, N)
for number in divisor_range:
if number not in prime_numbers_found:
if check_prime(number, prime_numbers_found):
prime_numbers_found.append(number)
if N % number == 0:
return False
else:
if N % number == 0:
return False
return True
bool isPrime(int n) {
if(n <= 3)
return (n > 1)==0? false: true;
else if(n%2 == 0 || n%3 == 0)
return false;
int i = 5;
while(i * i <= n){
if(n%i == 0 || (n%(i+2) == 0))
return false;
i = i + 6;
}
return true;
}
For any number, the minimum iterations to check if the number is prime or not can be from 2 to square root of the number. To reduce the iterations, even more, we can check if the number is divisible by 2 or 3 as maximum numbers can be eliminated by checking if the number is divisible by 2 or 3. Further any prime number greater than 3 can be expressed as 6k+1 or 6k-1. So the iteration can go from 6k+1 to the square root of the number.
Let me suggest you the perfect solution for 64 bit integers. Sorry to use C#. You have not already specified it as python in your first post. I hope you can find a simple modPow function and analyze it easily.
public static bool IsPrime(ulong number)
{
return number == 2
? true
: (BigInterger.ModPow(2, number, number) == 2
? ((number & 1) != 0 && BinarySearchInA001567(number) == false)
: false)
}
public static bool BinarySearchInA001567(ulong number)
{
// Is number in list?
// todo: Binary Search in A001567 (https://oeis.org/A001567) below 2 ^ 64
// Only 2.35 Gigabytes as a text file http://www.cecm.sfu.ca/Pseudoprimes/index-2-to-64.html
}
BEST SOLUTION
I an unsure if I understand the concept of Time complexity: O(sqrt(n)) and Space complexity: O(1) in this context but the
function prime(n) is probably the fastest way (least iterations)
to calculate if a number is prime number of any size.
https://github.com/ganeshkbhat/fastprimenumbers
This probably is the BEST solution in the internet as of today 11th
March 2022. Feedback and usage is welcome.
This same code can be applied in any languages like C, C++, Go Lang,
Java, .NET, Python, Rust, etc with the same logic and have performance
benefits. It is pretty fast. I have not seen this implemented before
and has been indigenously done.
If you are looking at the speed and performance here is the """BEST""" hopeful solution I can give:
Max iterations 16666 for n == 100000 instead of 100000 of conventional
way
The codes can also be found here: https://github.com/ganeshkbhat/fastprimecalculations
If you use it for your project please spend 2 minutes of your time crediting me by letting me know by either sending me an email, or logging an Github issue with subject heading [User], or star my Github project. But let me know here https://github.com/ganeshkbhat/fastprimecalculations. I would love to know the fans and users of the code logic
def prime(n):
if ((n == 2 or n == 3 or n == 5 or n == 7)):
return True
if (n == 1 or ((n > 7) and (n % 5 == 0 or n % 7 == 0 or n % 2 == 0 or n % 3 == 0))):
return False
if ( type((n - 1) / 6) == int or type((n + 1) / 6) == int):
for i in range(1, n):
factorsix = (i * 6)
five = n / (5 + factorsix)
seven = n / (7 + factorsix)
if ( ((five > 1) and type(five) == int) or ((seven > 1) and type(five) == int) ):
return False;
if (factorsix > n):
break;
return True
return False
Here is an analysis of all the ways of calculation:
Conventional way of checking for prime:
def isPrimeConventionalWay(n):
count = 0
if (n <= 1):
return False;
# Check from 2 to n-1
# Max iterations 99998 for n == 100000
for i in range(2,n):
# Counting Iterations
count += 1
if (n % i == 0):
print("count: Prime Conventional way", count)
return False;
print("count: Prime Conventional way", count)
return True;
SQUAREROOT way of checking for prime:
def isPrimeSquarerootWay(num):
count = 0
# if not is_number num return False
if (num < 2):
print("count: Prime Squareroot way", count)
return False
s = math.sqrt(num)
for i in range(2, num):
# Counting Iterations
count += 1
if (num % i == 0):
print("count: Prime Squareroot way", count)
return False
print("count: Prime Squareroot way", count)
return True
OTHER WAYS:
def isprimeAKSWay(n):
"""Returns True if n is prime."""
count = 0
if n == 2:
return True
if n == 3:
return True
if n % 2 == 0:
return False
if n % 3 == 0:
return False
i = 5
w = 2
while i * i <= n:
count += 1
if n % i == 0:
print("count: Prime AKS - Mersenne primes - Fermat's little theorem or whatever way", count)
return False
i += w
w = 6 - w
print("count: Prime AKS - Mersenne primes - Fermat's little theorem or whatever way", count)
return True
SUGGESTED way of checking for prime:
def prime(n):
count = 0
if ((n == 2 or n == 3 or n == 5 or n == 7)):
print("count: Prime Unconventional way", count)
return True
if (n == 1 or ((n > 7) and (n % 5 == 0 or n % 7 == 0 or n % 2 == 0 or n % 3 == 0))):
print("count: Prime Unconventional way", count)
return False
if (((n - 1) / 6).is_integer()) or (((n + 1) / 6).is_integer()):
for i in range(1, n):
# Counting Iterations
count += 1
five = 5 + (i * 6)
seven = 7 + (i * 6)
if ((((n / five) > 1) and (n / five).is_integer()) or (((n / seven) > 1) and ((n / seven).is_integer()))):
print("count: Prime Unconventional way", count)
return False;
if ((i * 6) > n):
# Max iterations 16666 for n == 100000 instead of 100000
break;
print("count: Prime Unconventional way", count)
return True
print("count: Prime Unconventional way", count)
return False
Tests to compare with the traditional way of checking for prime numbers.
def test_primecalculations():
count = 0
iterations = 100000
arr = []
for i in range(1, iterations):
traditional = isPrimeConventionalWay(i)
newer = prime(i)
if (traditional == newer):
count = count + 1
else:
arr.push([traditional, newer, i])
print("[count, iterations, arr] list: ", count, iterations, arr)
if (count == iterations):
return True
return False
# print("Tests Passed: ", test_primecalculations())
You will see the results of count of number of iterations as below for check of prime number: 100007:
print("Is Prime 100007: ", isPrimeConventionalWay(100007))
print("Is Prime 100007: ", isPrimeSquarerootWay(100007))
print("Is Prime 100007: ", prime(100007))
print("Is Prime 100007: ", isprimeAKSWay(100007))
count: Prime Conventional way 96
Is Prime 100007: False
count: Prime Squareroot way 96
Is Prime 100007: False
count: Prime Unconventional way 15
Is Prime 100007: False
count: Prime AKS - Mersenne primes - Fermat's little theorem or whatever way 32
Is Prime 100007: False
Here are some performance tests and results below:
import time
isPrimeConventionalWayArr = []
isPrimeSquarerootWayArr = []
primeArr = []
isprimeAKSWayArr = []
def tests_performance_isPrimeConventionalWayArr():
global isPrimeConventionalWayArr
for i in range(1, 1000000):
start = time.perf_counter_ns()
isPrimeConventionalWay(30000239)
end = time.perf_counter_ns()
isPrimeConventionalWayArr.append(end - start)
tests_performance_isPrimeConventionalWayArr()
def tests_performance_isPrimeSquarerootWayArr():
global isPrimeSquarerootWayArr
for i in range(1, 1000000):
start = time.perf_counter_ns()
isPrimeSquarerootWay(30000239)
end = time.perf_counter_ns()
isPrimeSquarerootWayArr.append(end - start)
tests_performance_isPrimeSquarerootWayArr()
def tests_performance_primeArr():
global primeArr
for i in range(1, 1000000):
start = time.perf_counter_ns()
prime(30000239)
end = time.perf_counter_ns()
primeArr.append(end - start)
tests_performance_primeArr()
def tests_performance_isprimeAKSWayArr():
global isprimeAKSWayArr
for i in range(1, 1000000):
start = time.perf_counter_ns()
isprimeAKSWay(30000239)
end = time.perf_counter_ns()
isprimeAKSWayArr.append(end - start)
tests_performance_isprimeAKSWayArr()
print("isPrimeConventionalWayArr: ", sum(isPrimeConventionalWayArr)/len(isPrimeConventionalWayArr))
print("isPrimeSquarerootWayArr: ", sum(isPrimeSquarerootWayArr)/len(isPrimeSquarerootWayArr))
print("primeArr: ", sum(primeArr)/len(primeArr))
print("isprimeAKSWayArr: ", sum(isprimeAKSWayArr)/len(isprimeAKSWayArr))
Sample 1 Million Iterations
Iteration 1:
isPrimeConventionalWayArr: 1749.97224997225
isPrimeSquarerootWayArr: 1835.6258356258356
primeArr (suggested): 475.2365752365752
isprimeAKSWayArr: 1177.982377982378
Iteration 2:
isPrimeConventionalWayArr: 1803.141403141403
isPrimeSquarerootWayArr: 2184.222484222484
primeArr (suggested): 572.6434726434726
isprimeAKSWayArr: 1403.3838033838033
Iteration 3:
isPrimeConventionalWayArr: 1876.941976941977
isPrimeSquarerootWayArr: 2190.43299043299
primeArr (suggested): 569.7365697365698
isprimeAKSWayArr: 1449.4147494147494
Iteration 4:
isPrimeConventionalWayArr: 1873.2779732779734
isPrimeSquarerootWayArr: 2177.154777154777
primeArr (suggested): 590.4243904243905
isprimeAKSWayArr: 1401.9143019143019
Iteration 5:
isPrimeConventionalWayArr: 1891.1986911986912
isPrimeSquarerootWayArr: 2218.093218093218
primeArr (suggested): 571.6938716938716
isprimeAKSWayArr: 1397.6471976471976
Iteration 6:
isPrimeConventionalWayArr: 1868.8454688454688
isPrimeSquarerootWayArr: 2168.034368034368
primeArr (suggested): 566.3278663278663
isprimeAKSWayArr: 1393.090193090193
Iteration 7:
isPrimeConventionalWayArr: 1879.4764794764794
isPrimeSquarerootWayArr: 2199.030199030199
primeArr (suggested): 574.055874055874
isprimeAKSWayArr: 1397.7587977587978
Iteration 8:
isPrimeConventionalWayArr: 1789.2868892868894
isPrimeSquarerootWayArr: 2182.3258823258825
primeArr (suggested): 569.3206693206694
isprimeAKSWayArr: 1407.1486071486072
Similar idea to the algorithm which has been mentioned
public static boolean isPrime(int n) {
if(n == 2 || n == 3) return true;
if((n & 1 ) == 0 || n % 3 == 0) return false;
int limit = (int)Math.sqrt(n) + 1;
for(int i = 5, w = 2; i <= limit; i += w, w = 6 - w) {
if(n % i == 0) return false;
numChecks++;
}
return true;
}
When I have to do a fast verification, I write this simple code based on the basic division between numbers lower than square root of input.
def isprime(n):
if n%2==0:
return n==2
else:
cota = int(n**0.5)+1
for ind in range(3,2,cota):
if n%ind==0:
print(ind)
return False
is_one = n==1
return True != is_one
isprime(22783)
The last True != n==1 is to avoid the case n=1.

Resources