Speed up C++ code implementing numerical integration

I have this C++ code that implements a rectangular numerical integration:
#include <iostream>
#include <cmath>
using namespace std;
float pdf(float u){
return (1/(pow(1+u, 2)));
}
float cdf(float u){
return (1 - 1/(u+1));
}
// The main function that implements the numerical integration,
//and it is a recursive function
float integ(float h, int k, float du){
float res = 0;
if (k == 1){
res = cdf(h);
}else{
float u = 0;
while (u < h){
res += integ(h - u, k - 1, du)*pdf(u)*du;
u += du;
}
}
return res;
}
int main(){
float du = 0.0001;
int K = 3;
float gamma[4] = {0.31622777, 0.79432823,
1.99526231, 5.01187234};
int G = 50;
int Q = 2;
for (int i = 0; i < 4; i++){
if ((G-Q*(K-1)) > 0){
float gammath = (gamma[i]/Q)*(G-Q*(K-1));
cout<<1-integ(gammath, K, du)<< endl;
}
}
return 0;
}
I am facing a speed problem, even though I switched to C++ from Python and MATLAB because C++ is faster. The problem is that I need a small step size du to get an accurate evaluation of the integral.
Basically, I want to evaluate the integral at 4 different points defined by gammath, which is a function of the other parameters defined above.
Is there any way I can speed up this program? I already have a 25x+ speedup over the same code in Python, but the code still takes too long (I ran it all night, and it wasn't finished in the morning). And this is only for K = 3 and G = 50. In other cases I want to test K = 10 and G = 100 or 300.
Thanks in advance for any tips.

What is behind your computation is that you take the K-fold convolution power of the pdf function and then integrate that power from 0 to h. As you use Riemann sums for integration, it means that you treat the pdf as a step function with steps of width du. In that case, the values of the convolution power can be computed as the coefficients in the power of a (truncated) power series/generating function
p(z)=pdf(0)+pdf(du)*z+pdf(2*du)*z^2+...+pdf(n*du)*z^n
where n*du>h. You can now compute this power via FFT based algorithms. A more basic variant uses that if q(z)=p(z)^K mod z^(n+1) then
p(z)*q'(z) = K*q(z)*p'(z) mod z^n
so that the coefficients of q can be computed via convolution sums from the coefficients p[j]=pdf(j*du) of p. Comparing the terms for the power z^(m-1) in the above formula gives on the coefficient level
sum p[m-j]*j*q[j] = K * sum q[j]*(m-j)*p[m-j], j=0..m
or solved for the new coefficient q[m] when the previous coefficients q[0],...,q[m-1] are already computed:
q[m] = 1/(m*p[0]) * sum (K*(m-j)-j)*p[m-j]*q[j], j=0..m-1
In code that gives
q[0] = pow(p[0], K);
for(m=1; m<=n; m++) {
    q[m]=0;
    for(j=0; j<m; j++) { q[m] += (K*(m-j)-j)*p[m-j]*q[j]; }
    q[m] /= m*p[0];
}
and then sum up for the result,
res = q[0];
for(j=1; j*du < h; j++) { res += q[j]; }
res *= pow(du, K);
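As a minimal, self-contained sketch of the recurrence above (not the original poster's code): the function name series_power is hypothetical, and double is used instead of float on the assumption that the extra precision is wanted when many small terms are summed. It expects p[0] != 0 and K >= 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the coefficients q[0..n] of p(z)^K mod z^(n+1), where n = p.size() - 1.
// Hypothetical helper; assumes p[0] != 0 and K >= 1.
std::vector<double> series_power(const std::vector<double>& p, int K) {
    assert(!p.empty() && p[0] != 0.0 && K >= 1);
    const std::size_t n = p.size() - 1;
    std::vector<double> q(n + 1, 0.0);
    q[0] = std::pow(p[0], K);
    for (std::size_t m = 1; m <= n; ++m) {
        double acc = 0.0;
        for (std::size_t j = 0; j < m; ++j) {
            // The weight K*(m-j) - j can be negative, so it is formed in double.
            const double w = static_cast<double>(K) * static_cast<double>(m - j)
                           - static_cast<double>(j);
            acc += w * p[m - j] * q[j];
        }
        q[m] = acc / (static_cast<double>(m) * p[0]);
    }
    return q;
}
```

To apply it to the integral, one would fill p[j] = pdf(j*du) for j = 0..n with n*du > h, call series_power(p, K), and then sum and scale the coefficients as in the last snippet. The double loop costs O(n^2) operations for a given h, independent of K.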


Processing 3.3.7 Carmichael function

So I am trying to write a Carmichael function in Processing for some RSA encryption stuff I am playing with, but the modulo operator seems to give many wrong answers.
Here is my code:
int carmichael(int n) {
int checkIndex = 0;
int m = 1;
ArrayList<Integer> coprimes = findCoprimesLessThan(n);
println(coprimes);
for(m = 1; m < 50; m++){
for(checkIndex = 0; checkIndex < coprimes.size(); checkIndex++){
int a = coprimes.get(checkIndex);
float mod = pow(a, m) % n;
println(a, m, n, mod, pow(a, m), pow(a, m) % n);
if (mod == 1) {
continue;
}
if (mod != 1){
break;
}
return m;
}
}
return 1;
}
And for an input of say, 31, it loops forever (I have it stop at 100 just for this reason so it just outputs 1 if it goes through all 100 and doesn't find anything) when it should give 30. I believe I have narrowed it down to the modulo operation not working on large numbers as that seems to be the problem, for example:
when a = 3, m = 30, and n = 31, my println statement gives this:
3 30 31 18.0 2.05891136E14 18.0
and all of that is correct except the modulo: it gives 18.0 when it should be 1.0. I am unsure of any way to get around this, as even doing a "manual modulus" like this:
while(mod >= n){
mod-= n;
}
results in the exact same problem. All the research I have done into the Carmichael function has led me either to confusion or here, which was no help.
My guess is you're hitting a limit of float precision.
Float values can only track a certain amount of precision. Try running this example program:
float one = 123456789;
float two = one + 1;
println(one == two);
You would expect this to print false, but if you run it, you'll see that it prints true instead. This is because we're outside the bounds of precision.
To get around this, you could upgrade to the double type. Double values have the same problem, but at a higher level of precision.
double one = 123456789;
double two = one + 1;
println(one == two);
Getting back to your code, by default Processing treats everything as a float value. This is fine for most cases, but if you need lots of precision then you're better off switching to double values.
int a = 3;
int m = 30;
int n = 31;
double p = Math.pow(a, m);
println(p);
double mod = p % n;
println(mod);
Note that I'm using Math.pow() instead of pow(). The Math.pow() function comes from Java and takes and returns double values instead of float values.
(By the way, this is the type of example program I was talking about in the comments.)
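Note that double only postpones the problem: integers above 2^53 are no longer all representable, so a large enough power loses precision again. A common way around this is to reduce modulo n after every multiplication, so that no large power is ever formed. The sketch below is written in C++ rather than Processing, and mod_pow is a hypothetical helper name; it assumes 0 < m < 2^32 so that the intermediate products fit in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Computes (base^exp) mod m by square-and-multiply.
// Hypothetical helper; assumes 0 < m < 2^32 so that products fit in 64 bits.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    assert(m > 0 && m < (1ULL << 32));
    std::uint64_t result = 1 % m;   // handles m == 1
    base %= m;
    while (exp > 0) {
        if (exp & 1) {
            result = result * base % m;
        }
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}
```

With the values from the question, mod_pow(3, 30, 31) evaluates to 1. The same loop should carry over to Processing using long variables in place of std::uint64_t.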

How to print values in memoization method - Dynamic programming

I know that a problem that can be solved using DP can be solved by either the tabulation (bottom-up) approach or the memoization (top-down) approach. Personally, I find memoization an easy and even efficient approach: the only analysis required is to get the recursive formula, and once it is obtained, a brute-force recursive method can easily be converted to store each sub-problem's result and reuse it. The only problem I am facing with this approach is that I am not able to construct the actual result from the table, which is filled on demand.
For example, in the Matrix Product Parenthesization problem (deciding in which order to perform the multiplications of matrices so that the cost of multiplication is minimal), I am able to calculate the minimum cost but not able to generate the order.
For example, suppose A is a 10 × 30 matrix, B is a 30 × 5 matrix, and C is a 5 × 60 matrix. Then,
(AB)C = (10×30×5) + (10×5×60) = 1500 + 3000 = 4500 operations
A(BC) = (30×5×60) + (10×30×60) = 9000 + 18000 = 27000 operations.
Here I am able to get the minimum cost, 4500, but I am unable to get the corresponding order, which is (AB)C.
I used the following recurrence. Suppose F[i, j] represents the least number of multiplications needed to multiply Ai ... Aj, and an array p[] is given which represents the chain of matrices such that the ith matrix Ai has dimension p[i-1] x p[i]. Then
F[i, j] = 0    if i = j
F[i, j] = min over k in [i, j) of (F[i, k] + F[k+1, j] + p[i-1] * p[k] * p[j])    otherwise
Below is the implementation that i have created.
#include<stdio.h>
#include<limits.h>
#include<string.h>
#define MAX 4
int lookup[MAX][MAX];
int MatrixChainOrder(int p[], int i, int j)
{
if(i==j) return 0;
int min = INT_MAX;
int k, count;
if(lookup[i][j]==0){
// recursively calculate count of multiplcations and return the minimum count
for (k = i; k<j; k++) {
int gmin=0;
if(lookup[i][k]==0)
lookup[i][k]=MatrixChainOrder(p, i, k);
if(lookup[k+1][j]==0)
lookup[k+1][j]=MatrixChainOrder(p, k+1, j);
count = lookup[i][k] + lookup[k+1][j] + p[i-1]*p[k]*p[j];
if (count < min){
min = count;
printf("\n****%d ",k); // i think something has be done here to represent the correct answer ((AB)C)D where first mat is represented by A second by B and so on.
}
}
lookup[i][j] = min;
}
return lookup[i][j];
}
// Driver program to test above function
int main()
{
int arr[] = {2,3,6,4,5};
int n = sizeof(arr)/sizeof(arr[0]);
memset(lookup, 0, sizeof(lookup));
int width =10;
printf("Minimum number of multiplications is %d ", MatrixChainOrder(arr, 1, n-1));
printf("\n ---->");
for(int l=0;l<MAX;++l)
printf(" %*d ",width,l);
printf("\n");
for(int z=0;z<MAX;z++){
printf("\n %d--->",z);
for(int x=0;x<MAX;x++)
printf(" %*d ",width,lookup[z][x]);
}
return 0;
}
I know that printing the solution is much easier with the tabulation approach, but I want to do it with the memoization technique.
Thanks.
Your code correctly computes the minimum number of multiplications, but you're struggling to display the optimal chain of matrix multiplications.
There are two possibilities:
1. When you compute the table, you can store the best index found in another memoization array.
2. You can recompute the optimal splitting points from the results in the memoization array.
The first would involve creating the split points in a separate array:
int lookup_splits[MAX][MAX];
And then updating it inside your MatrixChainOrder function:
...
if (count < min) {
min = count;
lookup_splits[i][j] = k;
}
You can then generate the multiplication chain recursively like this:
void print_mult_chain(int i, int j) {
    if (i == j) {
        putchar('A' + i - 1);
        return;
    }
    putchar('(');
    print_mult_chain(i, lookup_splits[i][j]);
    print_mult_chain(lookup_splits[i][j] + 1, j);
    putchar(')');
}
You can call the function with print_mult_chain(1, n - 1) from main.
The second possibility is that you don't cache lookup_splits and recompute it as necessary.
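int get_lookup_splits(int p[], int i, int j) {
    int best = INT_MAX;
    int k_best = i;  // initialized so the result is defined even if the loop never improves on best
    for (int k = i; k < j; k++) {
        int count = lookup[i][k] + lookup[k+1][j] + p[i-1]*p[k]*p[j];
        if (count < best) {
            best = count;
            k_best = k;
        }
    }
    return k_best;  // the best split point, not the loop variable
}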
This is essentially the same computation you did inside MatrixChainOrder, so if you go with this solution you should factor the code appropriately to avoid having two copies.
With this function, you can adapt print_mult_chain above to use it rather than the lookup_splits array. (You'll need to pass the p array in).
[None of this code is tested, so you may need to edit the answer to fix bugs].
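For reference, here is a minimal self-contained sketch of the first possibility (storing the split points while memoizing). It is written in C++ rather than C, the ChainSolver type and its member names are hypothetical, and it uses -1 rather than 0 as the "not computed yet" marker, which is a design choice of this sketch and not something taken from the question.

```cpp
#include <cassert>
#include <climits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: memoized matrix-chain cost plus recorded split points.
struct ChainSolver {
    std::vector<int> p;                        // matrix i has dimension p[i-1] x p[i]
    std::vector<std::vector<long long>> cost;  // -1 means "not computed yet"
    std::vector<std::vector<int>> split;       // best k for each (i, j)

    explicit ChainSolver(std::vector<int> dims)
        : p(std::move(dims)),
          cost(p.size(), std::vector<long long>(p.size(), -1)),
          split(p.size(), std::vector<int>(p.size(), 0)) {}

    // Minimum number of scalar multiplications for Ai ... Aj (1-based indices).
    long long solve(int i, int j) {
        assert(1 <= i && i <= j && j < static_cast<int>(p.size()));
        if (i == j) return 0;
        if (cost[i][j] >= 0) return cost[i][j];
        long long best = LLONG_MAX;
        for (int k = i; k < j; ++k) {
            long long c = solve(i, k) + solve(k + 1, j)
                        + 1LL * p[i - 1] * p[k] * p[j];
            if (c < best) {
                best = c;
                split[i][j] = k;  // remember where the best split was
            }
        }
        return cost[i][j] = best;
    }

    // Parenthesization for Ai ... Aj; solve(i, j) must have been called first.
    std::string order(int i, int j) const {
        if (i == j) return std::string(1, static_cast<char>('A' + i - 1));
        int k = split[i][j];
        return "(" + order(i, k) + order(k + 1, j) + ")";
    }
};
```

For the dimensions 10 x 30, 30 x 5 and 5 x 60 from the question, constructing ChainSolver with {10, 30, 5, 60} and calling solve(1, 3) gives 4500, after which order(1, 3) gives "((AB)C)".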

how to calculate combination of large numbers

I calculated the permutation of numbers as
nPr = n!/(n-r)!
where n and r are given, with
1 <= n, r <= 100
I find p = (n-r)+1 and then
for(i=n;i>=p;i--)
multiply digit by digit and store the result in an array.
But how can I calculate nCr = n!/[r! * (n-r)!] for the same range?
I did this using recursion as follows:
#include <stdio.h>
typedef unsigned long long i64;
i64 dp[100][100];
i64 nCr(int n, int r)
{
if(n==r) return dp[n][r] = 1;
if(r==0) return dp[n][r] = 1;
if(r==1) return dp[n][r] = (i64)n;
if(dp[n][r]) return dp[n][r];
return dp[n][r] = nCr(n-1,r) + nCr(n-1,r-1);
}
int main()
{
int n, r;
while(scanf("%d %d",&n,&r)==2)
{
r = (r<n-r)? r : n-r;
printf("%llu\n",nCr(n,r));
}
return 0;
}
But the range is n <= 100, and this does not work for n > 60.
Consider using a BigInteger type of class to represent your big numbers. BigInteger is available in Java and C# (version 4+ of the .NET Framework). From your question, it looks like you are using C++ (which you should always add as a tag), so try looking here and here for a usable C++ BigInteger class.
One of the best methods for calculating the binomial coefficient I have seen suggested is by Mark Dominus. It is much less likely to overflow with larger values for N and K than some other methods.
static long GetBinCoeff(long N, long K)
{
// This function gets the total number of unique combinations based upon N and K.
// N is the total number of items.
// K is the size of the group.
// Total number of unique combinations = N! / ( K! (N - K)! ).
// This function is less efficient, but is more likely to not overflow when N and K are large.
// Taken from: http://blog.plover.com/math/choose.html
//
if (K > N) return 0;
long r = 1;
long d;
for (d = 1; d <= K; d++)
{
r *= N--;
r /= d;
}
return r;
}
Just replace all the long definitions with BigInt and you should be good to go.
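If pulling in a full BigInteger class is not an option, the loop above only ever multiplies and divides by a small number, so a very small hand-rolled representation is enough. The following is a sketch under that assumption: the type alias Big and the helpers mul_small, div_small, to_decimal and binomial are hypothetical names, and the number is stored as little-endian limbs in base 10^9.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Big = std::vector<std::uint32_t>;        // little-endian limbs, base 1e9
constexpr std::uint64_t BASE = 1000000000ULL;

// a *= m, for a small multiplier m.
void mul_small(Big& a, std::uint32_t m) {
    std::uint64_t carry = 0;
    for (auto& limb : a) {
        std::uint64_t cur = static_cast<std::uint64_t>(limb) * m + carry;
        limb = static_cast<std::uint32_t>(cur % BASE);
        carry = cur / BASE;
    }
    while (carry > 0) {
        a.push_back(static_cast<std::uint32_t>(carry % BASE));
        carry /= BASE;
    }
}

// a /= d (the remainder is discarded), for a small non-zero divisor d.
void div_small(Big& a, std::uint32_t d) {
    assert(d != 0);
    std::uint64_t rem = 0;
    for (std::size_t i = a.size(); i-- > 0;) {
        std::uint64_t cur = rem * BASE + a[i];
        a[i] = static_cast<std::uint32_t>(cur / d);
        rem = cur % d;
    }
    while (a.size() > 1 && a.back() == 0) a.pop_back();
}

// Decimal string; every limb except the most significant is padded to 9 digits.
std::string to_decimal(const Big& a) {
    std::string s = std::to_string(a.back());
    for (std::size_t i = a.size() - 1; i-- > 0;) {
        std::string part = std::to_string(a[i]);
        s += std::string(9 - part.size(), '0') + part;
    }
    return s;
}

// nCr via r = r * (n - d + 1) / d for d = 1..k; each division is exact
// because r equals C(n, d) after step d.
std::string binomial(unsigned n, unsigned k) {
    if (k > n) return "0";
    if (k > n - k) k = n - k;
    Big r{1};
    for (unsigned d = 1; d <= k; ++d) {
        mul_small(r, n - d + 1);
        div_small(r, d);
    }
    return to_decimal(r);
}
```

For example, binomial(100, 50) returns "100891344545564193334812497256", which is far beyond the range of unsigned long long.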

Repeated Squaring - Matrix Multiplication using NEWMAT

I'm trying to use the repeated squaring algorithm (using recursion) to perform matrix exponentiation. I've included header files from the NEWMAT library instead of using arrays. The original matrix has elements in the range (-5,5), all numbers being of type float.
# include "C:\User\newmat10\newmat.h"
# include "C:\User\newmat10\newmatio.h"
# include "C:\User\newmat10\newmatap.h"
# include <iostream>
# include <time.h>
# include <ctime>
# include <cstdlib>
# include <iomanip>
using namespace std;
Matrix repeated_squaring(Matrix A, int exponent, int n) //Recursive function
{
A(n,n);
IdentityMatrix I(n);
if (exponent == 0) //Matrix raised to zero returns an Identity Matrix
return I;
else
{
if ( exponent%2 == 1 ) // if exponent is odd
return (A * repeated_squaring (A*A, (exponent-1)/2, n));
else //if exponent is even
return (A * repeated_squaring( A*A, exponent/2, n));
}
}
Matrix direct_squaring(Matrix B, int k, int no) //Brute Force Multiplication
{
B(no,no);
Matrix C = B;
for (int i = 1; i <= k; i++)
C = B*C;
return C;
}
//----Creating a matrix with elements b/w (-5,5)----
float unifRandom()
{
int a = -5;
int b = 5;
float temp = (float)((b-a)*( rand()/RAND_MAX) + a);
return temp;
}
Matrix initialize_mat(Matrix H, int ord)
{
H(ord,ord);
for (int y = 1; y <= ord; y++)
for(int z = 1; z<= ord; z++)
H(y,z) = unifRandom();
return(H);
}
//---------------------------------------------------
void main()
{
int exponent, dimension;
cout<<"Insert exponent:"<<endl;
cin>>exponent;
cout<< "Insert dimension:"<<endl;
cin>>dimension;
cout<<"The number of rows/columns in the square matrix is: "<<dimension<<endl;
cout<<"The exponent is: "<<exponent<<endl;
Matrix A(dimension,dimension),B(dimension,dimension);
Matrix C(dimension,dimension),D(dimension,dimension);
B= initialize_mat(A,dimension);
cout<<"Initial Matrix: "<<endl;
cout<<setw(5)<<setprecision(2)<<B<<endl;
//-----------------------------------------------------------------------------
cout<<"Repeated Squaring Result: "<<endl;
clock_t time_before1 = clock();
C = repeated_squaring (B, exponent , dimension);
cout<< setw(5) <<setprecision(2) <<C;
clock_t time_after1 = clock();
float diff1 = ((float) time_after1 - (float) time_before1);
cout << "It took " << diff1/CLOCKS_PER_SEC << " seconds to complete" << endl<<endl;
//---------------------------------------------------------------------------------
cout<<"Direct Squaring Result:"<<endl;
clock_t time_before2 = clock();
D = direct_squaring (B, exponent , dimension);
cout<<setw(5)<<setprecision(2)<<D;
clock_t time_after2 = clock();
float diff2 = ((float) time_after2 - (float) time_before2);
cout << "It took " << diff2/CLOCKS_PER_SEC << " seconds to complete" << endl<<endl;
}
I face the following problems:
The random number generator returns only "-5" as each element in the output.
The matrix multiplication yields different results with brute-force multiplication and with the repeated squaring algorithm.
I'm timing the execution time of my code to compare the times taken by brute force multiplication and by repeated squaring.
Could someone please find out what's wrong with the recursion and with the matrix initialization?
NOTE: While compiling this program, make sure you've imported the NEWMAT library.
Thanks in advance!
rand() returns an int, so rand()/RAND_MAX is an integer division that truncates to 0 (unless rand() happens to return exactly RAND_MAX). Try your repeated squaring algorithm by hand with exponent n = 1, 2 and 3 and you'll find a surplus "A *" and a gross inefficiency.
The final working code has the following improvements:
Matrix repeated_squaring(Matrix A, int exponent, int n) //Recursive function
{
A(n,n);
IdentityMatrix I(n);
if (exponent == 0) //Matrix raised to zero returns an Identity Matrix
return I;
if (exponent == 1)
return A;
{
if (exponent % 2 == 1) // if exponent is odd
return (A*repeated_squaring (A*A, (exponent-1)/2, n));
else //if exponent is even
return (repeated_squaring(A*A, exponent/2, n));
}
}
Matrix direct_squaring(Matrix B, int k, int no) //Brute Force Multiplication
{
B(no,no);
Matrix C(no,no);
C=B;
for (int i = 0; i < k-1; i++)
C = B*C;
return C;
}
//----Creating a matrix with elements b/w (-5,5)----
float unifRandom()
{
int a = -5;
int b = 5;
float temp = (float) ((b-a)*((float) rand()/RAND_MAX) + a);
return temp;
}
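To check the squaring logic independently of NEWMAT, the same idea can be sketched with plain std::vector matrices. This is an assumption-laden illustration, not the code above: Mat, multiply, identity and mat_pow are hypothetical names, the entries are long long so that results can be compared exactly, and the loop is iterative instead of recursive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<long long>>;

// Plain triple-loop matrix product; requires compatible, non-empty operands.
Mat multiply(const Mat& a, const Mat& b) {
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    const std::size_t n = a.size(), k = b.size(), m = b[0].size();
    Mat c(n, std::vector<long long>(m, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t t = 0; t < k; ++t)
            for (std::size_t j = 0; j < m; ++j)
                c[i][j] += a[i][t] * b[t][j];
    return c;
}

Mat identity(std::size_t n) {
    Mat id(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i) id[i][i] = 1;
    return id;
}

// Exponentiation by squaring: O(log exponent) matrix products.
Mat mat_pow(Mat base, unsigned exponent) {
    Mat result = identity(base.size());
    while (exponent > 0) {
        if (exponent & 1u) result = multiply(result, base);  // use this bit of the exponent
        base = multiply(base, base);                         // square for the next bit
        exponent >>= 1;
    }
    return result;
}
```

A quick sanity check is the matrix {{1, 1}, {1, 0}}, whose 10th power is {{89, 55}, {55, 34}} (consecutive Fibonacci numbers). Comparing mat_pow against a simple loop of repeated multiply calls on small inputs is one way to confirm that both methods agree.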

Counting tilings of a rectangle

I am trying to solve this problem but I can't find a solution:
A board consisting of squares arranged into N rows and M columns is given. A tiling of this board is a pattern of tiles that covers it. A tiling is interesting if:
only tiles of size 1x1 and/or 2x2 are used;
each tile of size 1x1 covers exactly one whole square;
each tile of size 2x2 covers exactly four whole squares;
each square of the board is covered by exactly one tile.
For example, the following images show a few interesting tilings of a board of size 4 rows and 3 columns:
http://dabi.altervista.org/images/task.img.4x3_tilings_example.gif
Two interesting tilings of a board are different if there exists at least one square on the board that is covered with a tile of size 1x1 in one tiling and with a tile of size 2x2 in the other. For example, all tilings shown in the images above are different.
Write a function
int count_tilings(int N, int M);
that, given two integers N and M, returns the remainder modulo 10,000,007 of the number of different interesting tilings of a board of size N rows and M columns.
Assume that:
N is an integer within the range [1..1,000,000];
M is an integer within the range [1..7].
For example, given N = 4 and M = 3, the function should return 11, because there are 11 different interesting tilings of a board of size 4 rows and 3 columns:
http://dabi.altervista.org/images/task.img.4x3_tilings_all.gif
for (4,3) the result is 11, for (6,5) the result is 1213.
I tried the following but it doesn't work:
static public int count_tilings ( int N,int M ) {
int result=1;
if ((N==1)||(M==1)) return 1;
result=result+(N-1)*(M-1);
int max_tiling= (int) ((int)(Math.ceil(N/2))*(Math.ceil(M/2)));
System.out.println(max_tiling);
for (int i=2; i<=(max_tiling);i++){
if (N>=2*i){
int n=i+(N-i);
int k=i;
//System.out.println("M-1->"+(M-1) +"i->"+i);
System.out.println("(M-1)^i)->"+(Math.pow((M-1),i)));
System.out.println( "n="+n+ " k="+k);
System.out.println(combinations(n, k));
if (N-i*2>0){
result+= Math.pow((M-1),i)*combinations(n, k);
}else{
result+= Math.pow((M-1),i);
}
}
if (M>=2*i){
int n=i+(M-i);
int k=i;
System.out.println("(N-1)^i)->"+(Math.pow((N-1),i)));
System.out.println( "n="+n+ " k="+k);
System.out.println(combinations(n, k));
if (M-i*2>0){
result+= Math.pow((N-1),i)*combinations(n, k);
}else{
result+= Math.pow((N-1),i);
}
}
}
return result;
}
static long combinations(int n, int k) {
/*binomial coefficient*/
long coeff = 1;
for (int i = n - k + 1; i <= n; i++) {
coeff *= i;
}
for (int i = 1; i <= k; i++) {
coeff /= i;
}
return coeff;
}
Since this is homework I won't give a full solution, but I'll give you some hints.
First here's a recursive solution:
class Program
{
// Important note:
// The value of masks given here is hard-coded for m == 5.
// In a complete solution, you need to calculate the masks for the
// actual value of m given. See explanation in answer for more details.
int[] masks = { 0, 3, 6, 12, 15, 24, 27, 30 };
int CountTilings(int n, int m, int s = 0)
{
if (n == 1) { return 1; }
int result = 0;
foreach (int mask in masks)
{
if ((mask & s) == 0)
{
result += CountTilings(n - 1, m, mask);
}
}
return result;
}
public static void Main()
{
Program p = new Program();
int result = p.CountTilings(6, 5);
Console.WriteLine(result);
}
}
See it working online: ideone
Note that I've added an extra parameter s. This stores the contents of the first column. If the first column is empty, s = 0. If the first column contains some filled squares the corresponding bits in s are set. Initially s = 0, but when a 2 x 2 tile is placed, this fills up some squares in the next column, and that will mean that s will be non-zero in the recursive call.
The masks variable is hard-coded but in a complete solution it needs to be calculated based on the actual value of m. The values stored in masks make more sense if you look at their binary representations:
00000
00011
00110
01100
01111
11000
11011
11110
In other words, these are all the ways of setting disjoint pairs of adjacent bits in a binary number with m bits. You can write some code to generate all these possibilities. Or, since there are only 7 possible values of m, you could also just hard-code all seven possibilities for masks.
There are however two serious problems with the recursive solution.
It will overflow the stack for large values of N.
It requires exponential time to calculate. It is incredibly slow even for small values of N.
Both these problems can be solved by rewriting the algorithm to be iterative. Keep m constant and initialize the result for n = 1 for all possible values of s to be 1. This is because if you only have one column you must use only 1x1 tiles, and there is only one way to do this.
Now you can calculate n = 2 for all possible values of s by using the results from n = 1. This can be repeated until you reach n = N. This algorithm completes in linear time with respect to N, and requires constant space.
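The iterative scheme described above might look roughly like the following in C++ (the answer's own code is C#). This is a sketch only: pair_masks is a hypothetical helper that generates the masks for a given M, and the table is indexed by mask position rather than by the mask value itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All bit patterns of width m made of disjoint pairs of adjacent set bits,
// in increasing order (so the first entry is always 0).
std::vector<int> pair_masks(int m) {
    std::vector<int> masks;
    for (int s = 0; s < (1 << m); ++s) {
        bool ok = true;
        for (int b = 0; b < m && ok;) {
            if ((s >> b) & 1) {
                // A set bit must be the lower half of a pair.
                if (b + 1 < m && ((s >> (b + 1)) & 1)) b += 2;
                else ok = false;
            } else {
                ++b;
            }
        }
        if (ok) masks.push_back(s);
    }
    return masks;
}

int count_tilings(int N, int M) {
    assert(N >= 1 && M >= 1 && M <= 7);
    const std::int64_t MOD = 10000007;
    const std::vector<int> masks = pair_masks(M);
    // ways[i]: tilings of the remaining lines when masks[i] is already
    // occupied in the current line. With one line left, only 1x1 tiles fit.
    std::vector<std::int64_t> ways(masks.size(), 1);
    for (int line = 2; line <= N; ++line) {
        std::vector<std::int64_t> next(masks.size(), 0);
        for (std::size_t i = 0; i < masks.size(); ++i)
            for (std::size_t j = 0; j < masks.size(); ++j)
                if ((masks[i] & masks[j]) == 0)
                    next[i] = (next[i] + ways[j]) % MOD;
        ways.swap(next);
    }
    return static_cast<int>(ways[0]);  // masks[0] == 0: nothing occupied at the start
}
```

With this sketch, count_tilings(4, 3) gives 11 and count_tilings(6, 5) gives 1213, matching the values quoted in the question. For M = 7 there are 21 masks, so each step of the loop over N does at most 21 * 21 mask comparisons.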
Here is a recursive solution:
// time used : 27 min
#include <set>
#include <vector>
#include <iostream>
using namespace std;
void placement(int n, set< vector <int> > & p){
for (int i = 0; i < n -1 ; i ++){
for (set<vector<int> > :: iterator j = p.begin(); j != p.end(); j ++){
vector <int> temp = *j;
if (temp[i] == 1 || temp[i+1] == 1) continue;
temp[i] = 1; temp[i+1] = 1;
p.insert(temp);
}
}
}
vector<vector<int> > placement( int n){
if (n > 7) throw "error";
set <vector <int> > p;
vector <int> temp (n,0);
p.insert (temp);
for (int i = 0; i < 3; i ++) placement(n, p);
vector <vector <int> > s;
s.assign (p.begin(), p.end());
return s;
}
bool tryput(vector <vector <int> > &board, int current, vector<int> & comb){
for (int i = 0; i < comb.size(); i ++){
if ((board[current][i] == 1 || board[current+1][i]) && comb[i] == 1) return false;
}
return true;
}
void put(vector <vector <int> > &board, int current, vector<int> & comb){
for (int i = 0; i < comb.size(); i ++){
if (comb[i] == 1){
board[current][i] = 1;
board[current+1][i] = 1;
}
}
return;
}
void undo(vector <vector <int> > &board, int current, vector<int> & comb){
for (int i = 0; i < comb.size(); i ++){
if (comb[i] == 1){
board[current][i] = 0;
board[current+1][i] = 0;
}
}
return;
}
int place (vector <vector <int> > &board, int current, vector < vector <int> > & all_comb){
int m = board.size();
if (current >= m) throw "error";
if (current == m - 1) return 1;
int count = 0;
for (int i = 0; i < all_comb.size(); i ++){
if (tryput(board, current, all_comb[i])){
put(board, current, all_comb[i]);
count += place(board, current+1, all_comb) % 10000007;
undo(board, current, all_comb[i]);
}
}
return count;
}
int place (int m, int n){
if (m == 0) return 0;
if (m == 1) return 1;
vector < vector <int> > all_comb = placement(n);
vector <vector <int> > board(m, vector<int>(n, 0));
return place (board, 0, all_comb);
}
int main(){
cout << place(3, 4) << endl;
return 0;
}
Time complexity: O(n^3 * exp(m)).
To reduce the space usage, try a bit vector.
To reduce the time complexity to O(m*(n^3)), try dynamic programming.
To reduce the time complexity to O(log(m) * n^3), try divide and conquer + dynamic programming.
Good luck.
