Is there any faster method of matrix exponentiation to calculate M^n (where M is a matrix and n is an integer) than the simple divide and conquer algorithm?
You could factor the matrix into eigenvalues and eigenvectors. Then you get
M = V * D * V^-1
where V is the eigenvector matrix and D is a diagonal matrix of the eigenvalues. To raise this to the nth power, you get something like:
M^n = (V * D * V^-1) * (V * D * V^-1) * ... * (V * D * V^-1)
= V * D^n * V^-1
Because all the V and V^-1 terms cancel.
Since D is diagonal, you just have to raise a bunch of numbers (the eigenvalues, which may be complex) to the nth power, rather than full matrices. You can do that in logarithmic time in n.
Calculating eigenvalues and eigenvectors is O(r^3) (where r is the number of rows/columns of M). Depending on the relative sizes of r and n, this might be faster or not. Note that this only works if M is diagonalizable, and with floating-point arithmetic the result is approximate.
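As a small illustration of the identity above, here is a sketch in Python for one hand-picked matrix, M = [[2, 1], [1, 2]], whose eigenvalues (3 and 1) and eigenvectors ((1, 1) and (1, -1)) are easy to work out by hand. The matrix and the helper names are assumptions chosen for the example; a general implementation would need a numerical eigensolver and floating-point arithmetic.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def power_by_eigen(n):
    """M^n for M = [[2, 1], [1, 2]] via M = V * D * V^-1."""
    v = [[1, 1], [1, -1]]        # columns are the eigenvectors (1, 1) and (1, -1)
    d_n = [[3 ** n, 0], [0, 1]]  # D^n: each eigenvalue raised to the nth power
    # For this particular V, V^-1 = V / 2, so V * D^n * V equals 2 * M^n.
    doubled = mat_mul(mat_mul(v, d_n), v)
    return [[entry // 2 for entry in row] for row in doubled]


def power_naive(m, n):
    """Reference result: multiply by M n times."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


print(power_by_eigen(5))
print(power_naive([[2, 1], [1, 2]], 5))
```

Both calls should print the same matrix; only the diagonal entries 3 and 1 are ever raised to a power in the first one.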
It's quite simple to use the Euler fast power algorithm. Use the following algorithm.
#define SIZE 10
// Set a to the identity matrix E:
// 1 0 ... 0
// 0 1 ... 0
// ....
// 0 0 ... 1
void one(long a[SIZE][SIZE])
{
for (int i = 0; i < SIZE; i++)
for (int j = 0; j < SIZE; j++)
a[i][j] = (i == j);
}
// Multiply matrix a by matrix b and store the result in a
void mul(long a[SIZE][SIZE], long b[SIZE][SIZE])
{
long res[SIZE][SIZE] = {{0}};
for (int i = 0; i < SIZE; i++)
for (int j = 0; j < SIZE; j++)
for (int k = 0; k < SIZE; k++)
{
res[i][j] += a[i][k] * b[k][j];
}
for (int i = 0; i < SIZE; i++)
for (int j = 0; j < SIZE; j++)
a[i][j] = res[i][j];
}
// Calculate a^n and store the result in matrix res (note: a is overwritten)
void pow(long a[SIZE][SIZE], long n, long res[SIZE][SIZE])
{
one(res);
while (n > 0) {
if (n % 2 == 0)
{
mul(a, a);
n /= 2;
}
else {
mul(res, a);
n--;
}
}
}
Below is the equivalent for numbers:
long power(long num, long pow)
{
if (pow == 0) return 1;
if (pow % 2 == 0)
return power(num*num, pow / 2);
else
return power(num, pow - 1) * num;
}
Exponentiation by squaring is frequently used to get high powers of matrices.
I would recommend the approach used to calculate the Fibonacci sequence in matrix form. AFAIK, its efficiency is O(log(n)).
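For reference, the Fibonacci matrix form mentioned here can be sketched in Python as follows. The function names are assumptions made for this example. It uses the well-known identity that [[1, 1], [1, 0]]^n has F(n) in its off-diagonal entries, and it needs O(log n) matrix multiplications (the numbers themselves grow, so the total running time is more than O(log n) for big n).

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(m, n):
    """Exponentiation by squaring: about log2(n) squarings."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]  # identity
    base = [row[:] for row in m]
    while n > 0:
        if n & 1:                      # current bit of the exponent is set
            result = mat_mul(result, base)
        base = mat_mul(base, base)     # base becomes m^(2^k) for the next bit
        n >>= 1
    return result


def fib(n):
    """F(n) read from [[1, 1], [1, 0]]^n = [[F(n+1), F(n)], [F(n), F(n-1)]]."""
    return mat_pow([[1, 1], [1, 0]], n)[0][1]


print([fib(i) for i in range(11)])
```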
int sum = 0;
for(int i = 1; i < n; i++) {
for(int j = 1; j < i * i; j++) {
if(j % i == 0) {
for(int k = 0; k < j; k++) {
sum++;
}
}
}
}
I don't understand how when j = i, 2i, 3i... the last for loop runs n times. I guess I just don't understand how we came to that conclusion based on the if statement.
Edit: I know how to compute the complexity for all the loops except for why the last loop executes i times based on the mod operator... I just don't see how it's i. Basically, why can't j % i go up to i * i rather than i?
Let's label the loops A, B and C:
int sum = 0;
// loop A
for(int i = 1; i < n; i++) {
// loop B
for(int j = 1; j < i * i; j++) {
if(j % i == 0) {
// loop C
for(int k = 0; k < j; k++) {
sum++;
}
}
}
}
Loop A iterates O(n) times.
Loop B iterates O(i^2) times per iteration of A. For each of these iterations:
j % i == 0 is evaluated, which takes O(1) time.
On 1/i of these iterations, loop C iterates j times, doing O(1) work per iteration. Since j is O(i^2) on average, and this is only done for 1/i iterations of loop B, the average cost is O(i^2 / i) = O(i).
Multiplying all of this together, we get O(n × i^2 × (1 + i)) = O(n × i^3). Since i is on average O(n), this is O(n^4).
The tricky part of this is saying that the if condition is only true 1/i of the time:
Basically, why can't j % i go up to i * i rather than i?
In fact, j does go up to j < i * i, not just up to j < i. But the condition j % i == 0 is true if and only if j is a multiple of i.
The multiples of i within the range are i, 2*i, 3*i, ..., (i-1) * i. There are i - 1 of these, so loop C is reached i - 1 times despite loop B iterating i * i - 1 times.
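To see this concretely, here is a small Python check (the function name is made up for this illustration) that lists the values of j in loop B's range for which the if condition holds:

```python
def multiples_hit(i):
    """Values of j in 1 .. i*i - 1 for which j % i == 0."""
    return [j for j in range(1, i * i) if j % i == 0]


for i in range(1, 6):
    hits = multiples_hit(i)
    print(i, len(hits), hits)
```

For every i the list has i - 1 entries (i, 2*i, ..., (i-1)*i), even though j itself takes i*i - 1 different values.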
The first loop consumes n iterations.
The second loop consumes n*n iterations. Imagine the case when i=n, then j=n*n.
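The third loop contributes a factor of n: it is entered only for the multiples of i, i.e. about i of the i*i values of j, so averaged over the second loop it costs about i iterations, where i is bounded by n in the worst case.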
Thus, the code complexity is O(n×n×n×n).
I hope this helps you understand.
All the other answers are correct, I just want to amend the following.
I wanted to see whether the reduced number of executions of the inner k-loop was sufficient to bring the actual complexity below O(n⁴). So I wrote the following:
for (int n = 1; n < 363; ++n) {
int sum = 0;
for(int i = 1; i < n; ++i) {
for(int j = 1; j < i * i; ++j) {
if(j % i == 0) {
for(int k = 0; k < j; ++k) {
sum++;
}
}
}
}
long cubic = (long) Math.pow(n, 3);
long hypCubic = (long) Math.pow(n, 4);
double relative = (double) (sum / (double) hypCubic);
System.out.println("n = " + n + ": iterations = " + sum +
", n³ = " + cubic + ", n⁴ = " + hypCubic + ", rel = " + relative);
}
After executing this, it becomes obvious that the complexity is in fact n⁴. The last lines of output look like this:
n = 356: iterations = 1989000035, n³ = 45118016, n⁴ = 16062013696, rel = 0.12383254507467704
n = 357: iterations = 2011495675, n³ = 45499293, n⁴ = 16243247601, rel = 0.12383580700180696
n = 358: iterations = 2034181597, n³ = 45882712, n⁴ = 16426010896, rel = 0.12383905075183874
n = 359: iterations = 2057058871, n³ = 46268279, n⁴ = 16610312161, rel = 0.12384227647628734
n = 360: iterations = 2080128570, n³ = 46656000, n⁴ = 16796160000, rel = 0.12384548432498857
n = 361: iterations = 2103391770, n³ = 47045881, n⁴ = 16983563041, rel = 0.12384867444612208
n = 362: iterations = 2126849550, n³ = 47437928, n⁴ = 17172529936, rel = 0.1238518469862343
This shows that the ratio between the actual iteration count and n⁴ approaches a value around 0.124... (actually 0.125). While this does not give us the exact value, we can deduce the following:
Time complexity is n⁴/8 ~ f(n), where f is your function/method.
The Wikipedia page on Big O notation states in the table 'Family of Bachmann–Landau notations' that ~ means the limit of the ratio of the two sides is 1. Or:
f is equal to g asymptotically
(I chose 363 as the exclusive upper bound because n = 362 is the last value for which we get a sensible result. After that, sum exceeds the int range and the relative value becomes negative.)
User kaya3 figured out the following:
The asymptotic constant is exactly 1/8 = 0.125, by the way; here's the exact formula via Wolfram Alpha.
Remove if and modulo without changing the complexity
Here's the original method:
public static long f(int n) {
int sum = 0;
for (int i = 1; i < n; i++) {
for (int j = 1; j < i * i; j++) {
if (j % i == 0) {
for (int k = 0; k < j; k++) {
sum++;
}
}
}
}
return sum;
}
If you're confused by the if and modulo, you can just refactor them away, with j jumping directly from i to 2*i to 3*i ... :
public static long f2(int n) {
int sum = 0;
for (int i = 1; i < n; i++) {
for (int j = i; j < i * i; j = j + i) {
for (int k = 0; k < j; k++) {
sum++;
}
}
}
return sum;
}
To make it even easier to calculate the complexity, you can introduce an intermediary j2 variable, so that every loop variable is incremented by 1 at each iteration:
public static long f3(int n) {
int sum = 0;
for (int i = 1; i < n; i++) {
for (int j2 = 1; j2 < i; j2++) {
int j = j2 * i;
for (int k = 0; k < j; k++) {
sum++;
}
}
}
return sum;
}
You can use debugging or old-school System.out.println to check that the (i, j, k) triplets are the same in each method.
Closed form expression
As mentioned by others, you can use the fact that the sum of the first n integers is equal to n * (n+1) / 2 (see triangular numbers). If you use this simplification for every loop, you get :
public static long f4(int n) {
// Cast to long first: the product overflows int long before the result does.
return (long) (n - 1) * n * (n - 2) * (3 * n - 1) / 24;
}
It is obviously not the same complexity as the original code but it does return the same values.
If you google the first terms, you can notice that 0 0 0 2 11 35 85 175 322 546 870 1320 1925 2717 3731 appear in "Stirling numbers of the first kind: s(n+2, n).", with two 0s added at the beginning. It means that sum is the Stirling number of the first kind s(n, n-2).
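The closed form can be cross-checked against the loops with a short Python sketch (function names are assumptions for this example). The inner k loop is replaced by adding j directly, which is what it contributes to sum:

```python
def count_iterations(n):
    """Same total as the nested loops: add j for each multiple j of i below i*i."""
    total = 0
    for i in range(1, n):
        for j in range(i, i * i, i):
            total += j          # the k loop runs exactly j times
    return total


def closed_form(n):
    return (n - 1) * n * (n - 2) * (3 * n - 1) // 24


for n in (4, 10, 100, 362):
    exact = closed_form(n)
    print(n, count_iterations(n), exact, exact / n ** 4)
```

The last column approaches 1/8 as n grows, which matches the leading coefficient 3/24 of the polynomial.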
Let's have a look at the first two loops.
The first one is simple, it's looping from 1 to n. The second one is more interesting. It goes from 1 to i squared. Let's see some examples:
e.g. n = 4
i = 1
j loops from 1 to 1^2
i = 2
j loops from 1 to 2^2
i = 3
j loops from 1 to 3^2
In total, the i and j loops combined run about 1^2 + 2^2 + 3^2 iterations.
There is a formula for the sum of the first n squares, n * (n+1) * (2n + 1) / 6, which is O(n^3).
You have one last k loop, which loops from 0 to j if and only if j % i == 0. Since j goes from 1 to i^2, j % i == 0 is true about i times, and each time the k loop runs up to i^2 iterations. Averaged over the i^2 iterations of the j loop, that is O(i) extra work per iteration, and since i is bounded by n you get one extra factor of O(n).
So you have O(n^3) from the i and j loops and another O(n) from the k loop, for a grand total of O(n^4).
I want to count the number of ways we can partition the number n, into k distinct parts where each part is not larger than m.
For k := 2 I have the following algorithm:
public int calcIntegerPartition(int n, int k, int m) {
int cnt=0;
for(int i=1; i <= m;i++){
for(int j=i+1; j <= m; j++){
if(i+j == n){
cnt++;
break;
}
}
}
return cnt;
}
But how can I count integer partitions with k > 2? Usually I have n > 100000, k := 40, m < 10000.
Thank you in advance.
Let's start by choosing the k largest legal numbers: m, m-1, m-2, ..., m-(k-1). This adds up to k*m - k(k-1)/2. If m < k, there are no solutions because the smallest partition would be <= 0. Let's assume m >= k.
Let's say p = (km - k(k-1)/2) - n.
If p < 0, there are no solutions because the largest number we can make is less than n. Let's assume p >= 0. Note that if p = 0 there is exactly one solution, so let's assume p > 0.
Now, imagine we start by choosing the k largest distinct legal integers, and we then correct this to get a solution. Our correction involves moving values to the left (on the number line) 1 slot, into empty slots, exactly p times. How many ways can we do this?
The smallest value to start with is m-(k-1), and it can move as far down as 1, so up to m-k times. After this, each successive value can move up to its predecessor's move.
Now the problem is: how many nonincreasing integer sequences with a max value of m-k sum to p? This is the partition problem, i.e., how many ways can we partition p into at most k parts? There is no closed-form solution to this.
Someone has already written up a nice answer of this problem here (which will need slight modification to meet your restrictions):
Is there an efficient algorithm for integer partitioning with restricted number of parts?
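For small inputs, the count can also be obtained with a plain 0/1-knapsack style dynamic program. This is a different and simpler technique than the one in the linked answer, shown here only as a sketch for checking results; the function name is an assumption, and its O(m * k * n) cost is too high for the sizes mentioned in the question.

```python
def count_distinct_partitions(n, k, m):
    """Number of ways to write n as a sum of k distinct integers from 1..m."""
    # ways[c][s] = number of sets of c distinct values (among those seen so far)
    # whose sum is s
    ways = [[0] * (n + 1) for _ in range(k + 1)]
    ways[0][0] = 1
    for v in range(1, m + 1):
        # Go through the part counts in decreasing order so that each value v
        # is used at most once per set.
        for c in range(k, 0, -1):
            for s in range(v, n + 1):
                ways[c][s] += ways[c - 1][s - v]
    return ways[k][n]


print(count_distinct_partitions(10, 2, 9))   # pairs (1,9), (2,8), (3,7), (4,6)
print(count_distinct_partitions(10, 3, 10))
```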
As @Dave alludes to, there is already a really nice answer to the simple restricted integer case (found here (same link as @Dave): Is there an efficient algorithm for integer partitioning with restricted number of parts?).
Below is a variant in C++ which takes into account the maximum value of each restricted part. First, here is the workhorse:
#include <vector>
#include <algorithm>
#include <iostream>
int width;
int blockSize;
static std::vector<double> memoize;
double pStdCap(int n, int m, int myMax) {
if (myMax * m < n || n < m) return 0;
if (myMax * m == n || n <= m + 1) return 1;
if (m < 2) return m;
const int block = myMax * blockSize + (n - m) * width + m - 2;
if (memoize[block]) return memoize[block];
int niter = n / m;
if (m == 2) {
if (myMax * 2 >= n) {
myMax = std::min(myMax, n - 1);
return niter - (n - 1 - myMax);
} else {
return 0;
}
}
double count = 0;
for (; niter--; n -= m, --myMax) {
count += (memoize[myMax * blockSize + (n - m) * width + m - 3] = pStdCap(n - 1, m - 1, myMax));
}
return count;
}
As you can see, pStdCap is very similar to the linked solution. The one noticeable difference is the two additional checks at the top:
if (myMax * m < n || n < m) return 0;
if (myMax * m == n || n <= m + 1) return 1;
And here is the function that sets up the recursion:
double CountPartLenCap(int n, int m, int myMax) {
if (myMax * m < n || n < m) return 0;
if (myMax * m == n || n <= m + 1) return 1;
if (m < 2) return m;
if (m == 2) {
if (myMax * 2 >= n) {
myMax = std::min(myMax, n - 1);
return n / m - (n - 1 - myMax);
} else {
return 0;
}
}
width = m;
blockSize = m * (n - m + 1);
memoize = std::vector<double>((myMax + 1) * blockSize, 0.0);
return pStdCap(n, m, myMax);
}
Explanation of the parameters:
n is the integer that you are partitioning
m is the length of each partition (the number of parts)
myMax is the maximum value that can appear in a given partition. (the OP refers to this as the threshold)
Here is a live demonstration https://ideone.com/c3WohV
And here is a non memoized version of pStdCap which is a bit easier to understand. This is originally found in this answer to Is there an efficient way to generate N random integers in a range that have a given sum or average?
int pNonMemoStdCap(int n, int m, int myMax) {
if (myMax * m < n) return 0;
if (myMax * m == n) return 1;
if (m < 2) return m;
if (n < m) return 0;
if (n <= m + 1) return 1;
int niter = n / m;
int count = 0;
for (; niter--; n -= m, --myMax) {
count += pNonMemoStdCap(n - 1, m - 1, myMax);
}
return count;
}
If you actually intend to calculate the number of partitions for numbers as large as 10000, you are going to need a big int library as CountPartLenCap(10000, 40, 300) > 3.2e37 (Based off the OP's requirement).
I came to this problem in a challenge.
There are two arrays A and B both of size of N and we need to return the count of pairs (A[i],B[j]) where gcd(A[i],B[j])==1 and A[i] != B[j].
I could only think of a brute-force approach, which exceeded the time limit for a few test cases.
for(int i=0; i<n; i++) {
for(int j=0; j<n; j++) {
if(__gcd(a[i],b[j])==1) {
printf("%d %d\n", a[i], b[j]);
}
}
}
Can you advise a time-efficient algorithm to solve this?
Edit: Not able to share question link as this was from a hiring challenge. Adding the constraints and input/output format as I remember.
Input -
First line will contain N, the number of elements present in both arrays.
Second line will contain N space separated integers, elements of array A.
Third line will contain N space separated integers, elements of array B.
Output -
The count of pairs A[i],B[j] as per the conditions.
Constraints -
1 <= N <= 10^5
1 < A[i],B[j] <= 10^9 where i,j < N
The first step is to use Eratosthenes sieve to calculate the prime numbers up to sqrt(10^9). This sieve can then be used to quickly find all prime factors of any number less than 10^9 (see the getPrimeFactors(...) function in the code sample below).
Next, for each A[i] with prime factors p0, p1, ..., pk, we compute all possible sub-products X - p0, p1, p0p1, p2, p0p2, p1p2, p0p1p2, p3, p0p3, ..., p0p1p2...pk and count them in map cntp[X]. Effectively, the map cntp[X] tells us the number of elements A[i] divisible by X, where X is a product of prime numbers to the power of 0 or 1. So for example, for the number A[i] = 12, the prime factors are 2, 3. We will count cntp[2]++, cntp[3]++ and cntp[6]++.
Finally, for each B[j] with prime factors p0, p1, ..., pk, we again compute all possible sub-products X and use the Inclusion-exclusion principle to count all non-coprime pairs C_j (i.e. the number of A[i]s that share at least one prime factor with B[j]). The numbers C_j are then subtracted from the total number of pairs, N*N, to get the final answer.
Note: the Inclusion-exclusion principle looks like this:
C_j = (cntp[p0] + cntp[p1] + ... + cntp[pk])
    - (cntp[p0*p1] + cntp[p0*p2] + ... + cntp[p(k-1)*pk])
    + (cntp[p0*p1*p2] + cntp[p0*p1*p3] + ... + cntp[p(k-2)*p(k-1)*pk])
    - ...
and accounts for the fact that in cntp[X] and cntp[Y] we could have counted the same number A[i] twice, given that it is divisible by both X and Y.
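To make the alternating sum concrete, here is a minimal Python sketch for a single B[j]. It uses plain trial division instead of the sieve and recounts the divisible A[i]s on the fly instead of using the cntp map, so it is meant only as an illustration of the formula; the function names are assumptions.

```python
from itertools import combinations
from math import prod


def prime_factors(x):
    """Distinct prime factors of x by trial division."""
    factors, p = [], 2
    while p * p <= x:
        if x % p == 0:
            factors.append(p)
            while x % p == 0:
                x //= p
        p += 1
    if x > 1:
        factors.append(x)
    return factors


def non_coprime_count(a_values, b):
    """C_j: how many a in a_values share at least one prime factor with b."""
    primes = prime_factors(b)
    total = 0
    for r in range(1, len(primes) + 1):
        sign = 1 if r % 2 == 1 else -1      # add odd-sized products, subtract even
        for subset in combinations(primes, r):
            x = prod(subset)
            total += sign * sum(1 for a in a_values if a % x == 0)
    return total


a_values = [2, 3, 4, 5, 6, 7, 9, 10, 12]
# For b = 6: cntp[2] = 5, cntp[3] = 4, cntp[6] = 2, so C_j = 5 + 4 - 2.
print(non_coprime_count(a_values, 6))
```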
Here is a possible C++ implementation of the algorithm, which produces the same results as the naive O(n^2) algorithm by OP:
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>
// get prime factors of a using pre-generated sieve
std::vector<int> getPrimeFactors(int a, const std::vector<int> & primes) {
std::vector<int> f;
for (auto p : primes) {
if (p > a) break;
if (a % p == 0) {
f.push_back(p);
do {
a /= p;
} while (a % p == 0);
}
}
if (a > 1) f.push_back(a);
return f;
}
// find coprime pairs A_i and B_j
// A_i and B_i <= 1e9
void solution(const std::vector<int> & A, const std::vector<int> & B) {
// generate prime sieve
std::vector<int> primes;
primes.push_back(2);
for (int i = 3; i*i <= 1e9; ++i) {
bool isPrime = true;
for (auto p : primes) {
if (i % p == 0) {
isPrime = false;
break;
}
}
if (isPrime) {
primes.push_back(i);
}
}
int N = A.size();
struct Entry {
int n = 0;
int64_t p = 0;
};
// cntp[X] - number of times the product X can be expressed
// with prime factors of A_i
std::map<int64_t, int64_t> cntp;
for (int i = 0; i < N; i++) {
auto f = getPrimeFactors(A[i], primes);
// count possible products using non-repeating prime factors of A_i
std::vector<Entry> x;
x.push_back({ 0, 1 });
for (auto p : f) {
int k = x.size();
for (int i = 0; i < k; ++i) {
int nn = x[i].n + 1;
int64_t pp = x[i].p*p;
++cntp[pp];
x.push_back({ nn, pp });
}
}
}
// use Inclusion–exclusion principle to count non-coprime pairs
// and subtract them from the total number of pairs N*N
int64_t cnt = N; cnt *= N;
for (int i = 0; i < N; i++) {
auto f = getPrimeFactors(B[i], primes);
std::vector<Entry> x;
x.push_back({ 0, 1 });
for (auto p : f) {
int k = x.size();
for (int i = 0; i < k; ++i) {
int nn = x[i].n + 1;
int64_t pp = x[i].p*p;
x.push_back({ nn, pp });
if (nn % 2 == 1) {
cnt -= cntp[pp];
} else {
cnt += cntp[pp];
}
}
}
}
// cnt can be as large as N*N = 1e10, which does not fit in an int
printf("cnt = %lld\n", (long long) cnt);
}
Live example
I cannot estimate the complexity analytically, but here are some profiling results on my laptop for different N and uniformly random A[i] and B[j]:
For N = 1e2, takes ~0.02 sec
For N = 1e3, takes ~0.05 sec
For N = 1e4, takes ~0.38 sec
For N = 1e5, takes ~3.80 sec
For comparison, the O(n^2) approach takes:
For N = 1e2, takes ~0.00 sec
For N = 1e3, takes ~0.15 sec
For N = 1e4, takes ~15.1 sec
For N = 1e5, takes too long, didn't wait to finish
Python Implementation:
import math
from collections import defaultdict

def sieve(MAXN):
    # spf[x] = smallest prime factor of x
    spf = [0 for i in range(MAXN)]
    spf[1] = 1
    for i in range(2, MAXN):
        spf[i] = i
    for i in range(4, MAXN, 2):
        spf[i] = 2
    for i in range(3, math.ceil(math.sqrt(MAXN))):
        if spf[i] == i:
            for j in range(i * i, MAXN, i):
                if spf[j] == j:
                    spf[j] = i
    return spf

def getFactorization(x, spf):
    # distinct prime factors of x
    ret = list()
    while x != 1:
        ret.append(spf[x])
        x = x // spf[x]
    return list(set(ret))

def coprime_pairs(N, A, B):
    MAXN = max(max(A), max(B)) + 1
    spf = sieve(MAXN)
    # cntp[X] = number of A[i] divisible by the square-free product X
    cntp = defaultdict(int)
    for i in range(N):
        f = getFactorization(A[i], spf)
        x = [[0, 1]]
        for p in f:
            k = len(x)
            for t in range(k):
                nn = x[t][0] + 1
                pp = x[t][1] * p
                cntp[pp] += 1
                x.append([nn, pp])
    # cnt = number of non-coprime pairs, via inclusion-exclusion
    cnt = 0
    for i in range(N):
        f = getFactorization(B[i], spf)
        x = [[0, 1]]
        for p in f:
            k = len(x)
            for t in range(k):
                nn = x[t][0] + 1
                pp = x[t][1] * p
                x.append([nn, pp])
                if nn % 2 == 1:
                    cnt += cntp[pp]
                else:
                    cnt -= cntp[pp]
    return N * N - cnt

import random
N = 10001
A = [random.randint(1, N) for _ in range(N)]
B = [random.randint(1, N) for _ in range(N)]
print(coprime_pairs(N, A, B))
I have to find the order of matrix formed after matrix chain multiplication.
I have the following code to determine the minimum number of multiplications required to multiply all matrices:
ll MatrixChainOrder(ll p[], ll n) {
ll m[n][n], i, j, k, L, q;
for(i = 1; i < n; i++) {
m[i][i] = 0;
}
for(L = 2; L < n; L++) {
for(i = 1; i < n - L + 1; i++) {
j = i + L - 1;
m[i][j] = LLONG_MAX; // INT_MAX would cap costs that exceed the int range
for(k = i; k <= j - 1; k++) {
q = m[i][k] + m[k+1][j] + p[i-1] * p[k] * p[j];
if (q < m[i][j]) {
m[i][j] = q;
}
}
}
}
return m[1][n-1];
}
How can I print the order of the matrix as well? Can anyone explain?
You need to use another auxiliary matrix (s for example), with indices.
if (q < m[i][j]) {
m[i][j] = q;
s[i][j] = k;
}
With the matrices m and s you can recursively print the optimal parenthesization.
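A minimal Python sketch of this idea follows. It mirrors the loop structure of the question's function and adds the split table s; the function names and the "A1 x A2" output format are assumptions for this example.

```python
def matrix_chain_order(p):
    """p[i-1] x p[i] are the dimensions of matrix A_i; returns tables (m, s)."""
    n = len(p)
    m = [[0] * n for _ in range(n)]   # m[i][j] = minimal cost for A_i..A_j
    s = [[0] * n for _ in range(n)]   # s[i][j] = best split point k
    for length in range(2, n):
        for i in range(1, n - length + 1):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k       # remember where the best split was
    return m, s


def parenthesize(s, i, j):
    """Rebuild the optimal order for A_i..A_j from the split table."""
    if i == j:
        return "A" + str(i)
    k = s[i][j]
    return "(" + parenthesize(s, i, k) + " x " + parenthesize(s, k + 1, j) + ")"


p = [10, 30, 5, 60]                   # A1 is 10x30, A2 is 30x5, A3 is 5x60
m, s = matrix_chain_order(p)
print(m[1][len(p) - 1], parenthesize(s, 1, len(p) - 1))
```

If "order" is meant as the dimensions of the final product, that is simply p[0] x p[n-1], regardless of the parenthesization.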
Can someone give me an idea of an efficient algorithm for large n (say 10^10) to find the sum of above series?
My code is getting killed for n = 100000 and m = 200000.
#include<stdio.h>
int main() {
int n,m,i,j,sum,t;
scanf("%d%d",&n,&m);
sum=0;
for(i=1;i<=n;i++) {
t=1;
for(j=1;j<=i;j++)
t=((long long)t*i)%m;
sum=(sum+t)%m;
}
printf("%d\n",sum);
}
Two notes:
(a + b + c) % m
is equivalent to
(a % m + b % m + c % m) % m
and
(a * b * c) % m
is equivalent to
((a % m) * (b % m) * (c % m)) % m
As a result, you can calculate each term using a recursive function in O(log p):
int expmod(int n, int p, int m) {
if (p == 0) return 1;
int nm = n % m;
long long r = expmod(nm, p / 2, m);
r = (r * r) % m;
if (p % 2 == 0) return r;
return (r * nm) % m;
}
And sum elements using a for loop:
long long r = 0;
for (int i = 1; i <= n; ++i)
r = (r + expmod(i, i, m)) % m;
This algorithm is O(n log n).
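The same approach can be sketched in Python, where the built-in three-argument pow already performs modular exponentiation. The function name is an assumption, and the series is taken to be 1^1 + 2^2 + ... + n^n mod m, which is what the question's code computes.

```python
def sum_of_self_powers(n, m):
    """(1^1 + 2^2 + ... + n^n) mod m, one modular power per term."""
    total = 0
    for i in range(1, n + 1):
        total = (total + pow(i, i, m)) % m   # pow(base, exp, mod) is O(log exp)
    return total


print(sum_of_self_powers(3, 1000))        # 1 + 4 + 27 = 32
print(sum_of_self_powers(100000, 200000))
```

This is still linear in n, so it handles the n = 100000 case easily but not n = 10^10.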
I think you can use Euler's theorem to avoid some exponentiation, as phi(200000)=80000. The Chinese remainder theorem might also help, as it reduces the modulus.
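A hedged sketch of the Euler's theorem idea in Python: the exponent may be reduced modulo phi(m) only when the base is coprime to m, so the sketch falls back to the full exponent otherwise. The function names are assumptions for this example.

```python
from math import gcd


def euler_phi(m):
    """Euler's totient of m, computed by trial division."""
    result, x, p = m, m, 2
    while p * p <= x:
        if x % p == 0:
            while x % p == 0:
                x //= p
            result -= result // p
        p += 1
    if x > 1:
        result -= result // x
    return result


def self_power_mod(i, m, phi):
    """i^i mod m, with the exponent reduced mod phi(m) when gcd(i, m) == 1."""
    if gcd(i, m) == 1:
        return pow(i, i % phi, m)    # valid by Euler's theorem
    return pow(i, i, m)              # not coprime: keep the full exponent


m = 200000
phi = euler_phi(m)
print(phi)
print(self_power_mod(123457, m, phi) == pow(123457, 123457, m))
```

Since fast exponentiation is already logarithmic in the exponent, the saving is only a constant factor unless the exponents are much larger than phi(m).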
You may have a look at my answer to this post. The implementation there is slightly buggy, but the idea is there. The key strategy is to find x such that n^(x-1)<m and n^x>m and repeatedly reduce n^n%m to (n^x%m)^(n/x)*n^(n%x)%m. I am sure this strategy works.
I encountered a similar question recently: my 'n' is 1435, 'm' is 10^10. Here is my solution (C#):
ulong n = 1435, s = 0, mod = 0;
mod = ulong.Parse(Math.Pow(10, 10).ToString());
for (ulong i = 1; i <= n; i++)
{
ulong summand = i;
for (ulong j = 2; j <= i; j++)
{
summand *= i;
summand = summand % mod;
}
s += summand;
s = s % mod;
}
At the end, 's' is equal to the required number.
Are you getting killed here:
for(j=1;j<=i;j++)
t=((long long)t*i)%m;
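Exponentials mod m could be implemented using the square-and-multiply method (exponentiation by squaring).
long long n = 10000;
long long m = 20000;
long long sqr = n % m;      // n^(2^k) mod m for the current bit k
long long bit = n;          // remaining bits of the exponent
long long result = 1 % m;   // accumulates n^n mod m
while (bit > 0)
{
    if (bit % 2 == 1)
    {
        // this bit of the exponent is set: multiply the factor in
        result = (result * sqr) % m;
    }
    sqr = (sqr * sqr) % m;
    bit >>= 1;              // move to the next bit
}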
I can't add a comment, but for the Chinese remainder theorem, see http://mathworld.wolfram.com/ChineseRemainderTheorem.html formulas (4)-(6).