Algorithm to sum a triple? - algorithm

We have an array A with n positive integers. What's an algorithm that will
return true if there's a triple of indices (x,y,z) in A
such that A[x] + A[y] + A[z] = 200,
and false otherwise? The numbers in the array are distinct and the running time must be O(n).
I came up with O(n^3). Any ideas on how to achieve this in O(n)?

Since the elements are unique, this boils down to preprocessing the array in O(n) to filter out the elements that can never be part of a triplet: those larger than 200.
Then you have an array whose size is no larger than 200.
Checking all triplets in this array is O(200^3) = O(1) (it can be done more efficiently in terms of constants, though).
So the total is O(n) + O(200^3) = O(n).
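The filter-then-brute-force idea above can be sketched in Python as follows. This is only an illustration under the question's assumptions (distinct positive integers); the function name `has_triple` and the `target` parameter are hypothetical, not part of the original answer.

```python
from itertools import combinations

def has_triple(values, target=200):
    # Distinct positive integers: fewer than `target` of them can be below target,
    # so the filtered list has bounded size no matter how long the input is.
    small = [v for v in values if v < target]
    # Brute force over the bounded list: O(target^3), i.e. constant for a fixed target.
    return any(a + b + c == target for a, b, c in combinations(small, 3))
```

For example, `has_triple([1, 2, 197, 500])` finds 1 + 2 + 197, while `has_triple([100, 99, 2, 300])` finds nothing.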

I think you can solve this problem with bit operations, for example with std::bitset from the C++ STL.
Use three bitsets: the first caches every value you can get from 1 number, the second every sum of 2 numbers, and the third every sum of 3 numbers. When a new number arrives, you update the bitsets with simple shift and OR operations.
Here is sample C++ code:
#include <bitset>
#include <iostream>
using namespace std;
int main()
{
    bitset<256> bs[4];   // bs[k][s] == 1 means s is a sum of k of the numbers read so far (bs[0] is unused)
    for (int i = 0; i < 4; ++i)
        bs[i].reset();
    int N, number;
    cin >> N;
    while (N--)
    {
        cin >> number;
        // Update the larger counts first so the new number is used at most once per sum.
        bs[3] |= (bs[2] << number);
        bs[2] |= (bs[1] << number);
        if (number <= 200)
            bs[1].set(number);
        //cout << "1: " << bs[1] << endl;
        //cout << "2: " << bs[2] << endl;
        //cout << "3: " << bs[3] << endl;
    }
    cout << bs[3][200] << endl;
    return 0;
}
The algorithm's complexity is O(n), because each update is a constant number of operations on a fixed-size 256-bit bitset. A 64-bit integer holds 64 bits, so if you don't want to use bitset, you can replace each one with four 64-bit integers (64 * 4 = 256).
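The same shift-and-OR bookkeeping can be sketched in Python, where an arbitrary-precision integer plays the role of the bitset. This is a sketch, not the answer's code: the names `has_triple_bitset`, `one`, `two` and `three` are hypothetical, and the inputs are assumed to be distinct positive integers.

```python
def has_triple_bitset(values, target=200):
    mask = (1 << (target + 1)) - 1   # keep only bits 0..target
    # Bit s of one/two/three is set when s is a sum of 1/2/3 distinct elements seen so far.
    one = two = three = 0
    for v in values:
        if v > target:
            continue                 # too large to take part in any triple
        # Update the larger counts first so v is used at most once per sum.
        three |= (two << v) & mask
        two |= (one << v) & mask
        one |= 1 << v
    return bool((three >> target) & 1)
```

Each element costs a constant number of operations on integers of at most `target + 1` bits, which matches the O(n) argument above.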

I agree with @amit's solution, but there is a question: how can we make it better, in our case simply faster?
Here is my solution. It is almost entirely based on amit's idea, but the complexity is O(n + sum*(sum+1)/2), where n is the length of the input array.
First, we need n steps to filter the input array and put each value that is less than the sum into a new array, at the index equal to the value. At the end of this step we have an array whose size is equal to sum, and we can look up any value in O(1).
Finally, to find x, y, z we need at most sum*(sum+1)/2 steps.
#include <stdio.h>
#include <stdlib.h>
typedef struct SumATripleResult
{
    unsigned int x;
    unsigned int y;
    unsigned int z;
} SumATripleResult;
SumATripleResult sumATriple(unsigned int totalSum, unsigned int *inputArray, unsigned int n)
{
    SumATripleResult result = {0, 0, 0};   // all zeros means "nothing found"
    // Zero-initialized lookup table: array[value] == value if value is present, 0 otherwise
    unsigned int *array = calloc(totalSum, sizeof *array);
    if (array == NULL) return result;
    //Filter the input array and put each value below totalSum into 'array' where array[value] = value
    for (size_t i = 0; i<n; i++)
    {
        unsigned int value = inputArray[i];
        if (value < totalSum) array[value] = value;
    }
    for (size_t i = 0; i<totalSum; i++)
    {
        unsigned int x = array[i];
        for (size_t j = i+1; x>0 && j<totalSum; j++)
        {
            unsigned int y = array[j];
            if( y==0 || x + y >= totalSum) continue;
            unsigned int zIdx = totalSum - (x + y);
            if(zIdx == x || zIdx == y) continue;
            unsigned int z = array[zIdx];
            if( z != 0)
            {
                result.x = x;
                result.y = y;
                result.z = z;
                free(array);
                return result;
            }
        }
    }
    //nothing found
    free(array);
    return result;
}
int main(void)
{
    unsigned int array[] = {1, 21, 30, 12, 15, 10, 3, 5, 6, 11, 17, 31};
    SumATripleResult r = sumATriple(52, array, 12);
    printf("result = %u %u %u\n", r.x, r.y, r.z);
    r = sumATriple(49, array, 12);
    printf("result = %u %u %u\n", r.x, r.y, r.z);
    r = sumATriple(32, array, 12);
    printf("result = %u %u %u\n", r.x, r.y, r.z);
    return 0;
}

This is known as the 3SUM problem, and no linear-time solution is known for it in general. Here is pseudocode that runs in O(n^2): it sorts the array first and then scans with two pointers moving toward each other (no binary search is needed):
sumTriple(A[1...n]: array of integers, sum: integer): bool
    sort A in ascending order
    for i ← 1 to n-2
        j ← i+1
        k ← n
        while k > j
            if A[i]+A[j]+A[k] = sum
                print i,j,k
                return true
            else if A[i]+A[j]+A[k] > sum
                k ← k-1
            else // A[i]+A[j]+A[k] < sum
                j ← j+1
    return false
More information and further details for the problem you can find here.
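As a runnable counterpart to the pseudocode, here is a minimal Python sketch of the sort-plus-two-pointers scan. The function name `three_sum_exists` is hypothetical; the sort is the step that makes moving `j` up or `k` down meaningful.

```python
def three_sum_exists(values, target):
    a = sorted(values)               # O(n log n); required by the two-pointer scan
    n = len(a)
    for i in range(n - 2):
        j, k = i + 1, n - 1
        while j < k:
            s = a[i] + a[j] + a[k]
            if s == target:
                return True
            if s > target:
                k -= 1               # sum too large: drop the largest candidate
            else:
                j += 1               # sum too small: raise the smaller candidate
    return False
```

The outer loop runs n times and the inner scan is linear, so the total cost is O(n^2) after sorting.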


Smallest Multiple of given number With digits only 0 and 1

You are given an integer N. You have to find smallest multiple of N which consists of digits 0 and 1 only. Since this multiple could be large, return it in form of a string.
Returned string should not contain leading zeroes.
For example,
For N = 55, 110 is smallest multiple consisting of digits 0 and 1.
For N = 2, 10 is the answer.
I saw several related problems, but I could not find the problem with my code.
Here is my code giving TLE on some cases even after using map instead of set.
#define ll long long
int getMod(string s, int A)
int res=0;
for(int i=0;i<s.length();i++)
return res;
string Solution::multiple(int A) {
return to_string(A);
string s="1";
int mod=getMod(s,A);
return s;
else if(st.find(mod)==st.end())
Here is an implementation in Raku.
my $n = 55;
(1 .. Inf).map( *.base(2) ).first( * %% $n );
(1 .. Inf) is a lazy list from one to infinity. The "whatever star" * establishes a closure and stands for the current element in the map.
base is a method of Raku's numeric types which returns a string representation of a given number in the wanted base, here a binary string.
first returns the current element when the "whatever star" closure holds true for it.
The %% is the divisible by operator, it implicitly casts its left side to Int.
Oh, and to top it off. It's easy to parallelize this, so your code can use multiple cpu cores:
(1 .. Inf).race( :batch(1000), :degree(4) ).map( *.base(2) ).first( * %% $n );
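For readers who do not know Raku, the enumeration idea in the one-liner can be sketched in Python. This is an illustrative brute force only, and the function name `smallest_01_multiple_bruteforce` is hypothetical. It works because counting in binary lists the 0/1 digit strings in increasing decimal order, so the first hit is the smallest multiple.

```python
from itertools import count

def smallest_01_multiple_bruteforce(n):
    for i in count(1):
        candidate = format(i, "b")    # "1", "10", "11", "100", ...
        if int(candidate) % n == 0:   # read the binary string as a decimal number
            return candidate
```

The number of candidates can grow exponentially with the length of the answer, so this is only practical for small inputs.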
As mentioned in the "math" reference, the result is related to the congruence of the power of 10 modulo A.
n = sum_i a[i] 10^i
n modulo A = sum_i a[i] b[i]
Where the a[i] are equal to 0 or 1, and the b[i] = (10^i) modulo A
Then the problem is to find the minimum a[i] sequence, such that the sum is equal to 0 modulo A.
From a graph point of view, we have to find the shortest path to zero modulo A.
A BFS is generally well adapted to find such a path. The issue is the possible exponential increase of the number of nodes to visit. Here, we are sure to get a number of nodes less than A, by rejecting the nodes whose sum (modulo A) has already been obtained (see the vector used in the program). Note that this rejection is needed in order to get the minimum number at the end.
Here is a program in C++. The solution being quite simple, it should be easy to understand even by those not familiar with C++.
#include <iostream>
#include <string>
#include <vector>
struct node {
    int sum = 0;
    std::string s;
};
std::string multiple (int A) {
    if (A == 0) return "0";   // checked first: everything below computes modulo A
    if (A == 1) return "1";
    std::vector<std::vector<node>> nodes (2);
    std::vector<bool> used (A, false);
    int range = 0;
    long long ten = 10 % A;
    long long pow_ten = 1;    // long long so that pow_ten * ten cannot overflow
    nodes[range].push_back (node{0, "0"});
    nodes[range].push_back (node{1, "1"});
    used[1] = true;
    while (1) {
        int range_new = (range + 1) % 2;
        nodes[range_new].clear();
        pow_ten = (pow_ten * ten) % A;
        for (node &x: nodes[range]) {
            // Prepend digit 0: the sum modulo A is unchanged
            node y = x;
            y.s = "0" + y.s;
            nodes[range_new].push_back(y);
            // Prepend digit 1: keep it only if this sum modulo A is new
            y = x;
            y.sum = (y.sum + pow_ten) % A;
            if (used[y.sum]) continue;
            used[y.sum] = true;
            y.s = "1" + y.s;
            if (y.sum == 0) return y.s;
            nodes[range_new].push_back(y);
        }
        range = range_new;
    }
}
int main() {
    std::cout << "input number: ";
    int n;
    std::cin >> n;
    std::cout << "Result = " << multiple(n) << "\n";
    return 0;
}
The above program is using a kind of memoization in order to speed up the process but for large inputs memory becomes too large.
As indicated in a comment for example, it cannot handle the case N = 60000007.
I improved the speed and the range a little bit with the following modifications:
A function (reduction) was created to simplify the search when the input number is divisible by 2 or 5
For the memorization of the nodes (nodes array), only one array is used now instead of two
A kind of meet-in-the middle procedure is used: in a first step, a function mem_gen memorizes all relevant 01 sequences up to N_DIGIT_MEM (=20) digits. Then the main procedure multiple2 generates valid 01 sequences "after the 20 first digits" and then in the memory looks for a "complementary sequence" such that the concatenation of both is a valid sequence
With this new program the case N = 60000007 provides the good result (100101000001001010011110111, 27 digits) in about 600ms on my PC.
Instead of limiting the number of digits for the memorization in the first step, I now use a threshold on the size of the memory, as this size does not depend only on the number of digits but also on the input number. Note that the optimal value of this threshold would depend on the input number. Here, I selected a threshold of 50k as a compromise. With a threshold of 20k, for 60000007, I obtain the good result in 36 ms. Besides, with a threshold of 100k, the worst case 99999999 is solved in 5s.
I made different tests with values less than 10^9. In almost all tested cases, the result is provided in less than 1s. However, I met a corner case N=99999999, for which the result consists of 72 consecutive "1". In this particular case, the program takes about 6.7s. For 60000007, the good result is obtained in 69ms.
Here is the new program:
#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <unordered_map>
#include <chrono>
#include <cmath>
#include <algorithm>
std::string reverse (std::string s) {
std::string res {s.rbegin(), s.rend()};
return res;
struct node {
int sum = 0;
std::string s;
node (int sum_ = 0, std::string s_ = ""): sum(sum_), s(s_) {};
// This function simplifies the search when the input number is divisible by 2 or 5
node reduction (int &X, long long &pow_ten) {
node init {0, ""};
while (1) {
int digit = X % 10;
if (digit == 1 || digit == 3 || digit == 7 || digit == 9) break;
switch (digit) {
X /= 10;
X = (5*X)/10;
X = (2*X)/10;
pow_ten = (pow_ten * 10) % X;
return init;
const int N_DIGIT_MEM = 30; // 20
const int threshold_size_mem = 50000;
// This function memorizes all relevant 01 sequences up to N_DIGIT_MEM digits
bool gene_mem (int X, long long &pow_ten, int index_max, std::map<int, std::string> &mem, node &result) {
std::vector<node> nodes;
std::vector<bool> used (X, false);
bool start = true;
for (int index = 0; index < index_max; ++index){
if (start) {
node x = {int(pow_ten), "1"};
nodes.push_back (x);
} else {
for (node &x: nodes) {
int n = nodes.size();
for (int i = 0; i < n; ++i) {
node y = nodes[i];
y.sum = (y.sum + pow_ten) % X;
y.s.back() = '1';
if (used[y.sum]) continue;
used[y.sum] = true;
if (y.sum == 0) {
result = y;
return true;
pow_ten = (10 * pow_ten) % X;
start = false;
int n_mem = nodes.size();
if (n_mem > threshold_size_mem) {
for (auto &x: nodes) {
mem[x.sum] = x.s;
//std::cout << "size mem = " << mem.size() << "\n";
return false;
// This function generates valid 01 sequences "after the 20 first digits" and then in the memory
// looks for a "complementary sequence" such that the concatenation of both is a valid sequence
std::string multiple2 (int A) {
std::vector<node> nodes;
std::map<int, std::string> mem;
int ten = 10 % A;
long long pow_ten = 1;
int digit;
if (A == 0) return "0";
int X = A;
node init = reduction (X, pow_ten);
if (X != A) ten = ten % X;
if (X == 1) {
return reverse(init.s);
std::vector<bool> used (X, false);
node result;
int index_max = N_DIGIT_MEM;
if (gene_mem (X, pow_ten, index_max, mem, result)) {
return reverse(init.s + result.s);
node init2 {0, ""};
while (1) {
for (node &x: nodes) {
int n = nodes.size();
for (int i = 0; i < n; ++i) {
node y = nodes[i];
y.sum = (y.sum + pow_ten) % X;
if (used[y.sum]) continue;
used[y.sum] = true;
y.s.back() = '1';
if (y.sum != 0) {
int target = X - y.sum;
auto search = mem.find(target);
if (search != mem.end()) {
//std::cout << "mem size 2nd step = " << nodes.size() << "\n";
return reverse(init.s + search->second + y.s);
pow_ten = (pow_ten * ten) % X;
int main() {
std::cout << "input number: ";
int n;
std::cin >> n;
std::string res;
auto t1 = std::chrono::high_resolution_clock::now();
res = multiple2(n);
std::cout << "Result = " << res << " ndigit = " << res.size() << std::endl;
auto t2 = std::chrono::high_resolution_clock::now();
auto duration2 = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
std::cout << "time = " << duration2/1000 << " ms" << std::endl;
return 0;
For people more familiar with Python, here is a converted version of @Damien's code. Damien's important insight is to strongly reduce the search tree, taking advantage of the fact that each partial sum only needs to be investigated once, namely the first time it is encountered.
The problem is also described at Mathpuzzle, but there they mostly fix on the necessary existence of a solution. There's also code mentioned at the online encyclopedia of integer sequences. The sage version seems to be somewhat similar.
I made a few changes:
Starting with an empty list helps to correctly solve A=1 while simplifying the code. The multiplication by 10 is moved to the end of the loop. Doing the same for 0 seems to be hard, as log10(0) is minus infinity.
Instead of alternating between nodes[range] and nodes[new_range], two different lists are used.
As Python supports integers of arbitrary precision, the partial results could be stored as decimal or binary numbers instead of as strings. This is not yet done in the code below.
from collections import namedtuple
node = namedtuple('node', 'sum str')
def find_multiple_ones_zeros(A):
    nodes = [node(0, "")]
    used = set()
    pow_ten = 1
    while True:
        new_nodes = []
        for x in nodes:
            # Prepend digit 0: the partial sum is unchanged
            y = node(x.sum, "0" + x.str)
            new_nodes.append(y)
            # Prepend digit 1: keep it only the first time this partial sum appears
            next_sum = (x.sum + pow_ten) % A
            if next_sum in used:
                continue
            used.add(next_sum)
            y = node(next_sum, "1" + x.str)
            if next_sum == 0:
                return y.str
            new_nodes.append(y)
        pow_ten = (pow_ten * 10) % A
        nodes = new_nodes
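The remark above about storing partial results as numbers instead of strings could look like the following sketch. It is an assumption-laden variant, not the author's code: the name `find_multiple_ones_zeros_int` and the `(remainder, value)` pair layout are hypothetical, and it relies on Python's arbitrary-precision integers.

```python
def find_multiple_ones_zeros_int(A):
    # Each node is (remainder modulo A, decimal value built so far from digits 0 and 1).
    nodes = [(0, 0)]
    used = set()
    pow_ten_mod = 1 % A   # 10^i modulo A
    pow_ten = 1           # exact 10^i, used to build the value itself
    while True:
        new_nodes = []
        for rem, value in nodes:
            new_nodes.append((rem, value))            # digit 0 at position i
            next_rem = (rem + pow_ten_mod) % A
            if next_rem in used:
                continue                              # remainder already reached earlier
            used.add(next_rem)
            if next_rem == 0:
                return str(value + pow_ten)           # digit 1 at position i completes a multiple
            new_nodes.append((next_rem, value + pow_ten))
        pow_ten_mod = (pow_ten_mod * 10) % A
        pow_ten *= 10
```

Adding an integer power of ten replaces the string concatenation, and the result is converted to a string only once, at the end.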

Sum of different elements in tuples nonexponential algorithm

I was working on something, and I was able to reduce a problem to a particular form: given n tuples each of k integers, say: (a1,a2,a3,a4) , (b1,b2,b3,b4) , (c1,c2,c3,c4) , (d1,d2,d3,d4), I wish to choose any number of tuples, that, when added to each other, give a tuple with no positive elements. If I choose tuples a and b, I get tuple (a1+b1,a2+b2,a3+b3,a4+b4). So, if a = (1,-2,2,0) and b=(-1, 1, -3,0) then a+b =(0,-1,-1,0) which includes no positive numbers, hence is a solution of the problem.
Is there a way to obtain a solution (or verify its nonexistence) using a method other than checking the sum of all subset tuples, which takes 2^n steps?
Since this question is from my head, and not from a particular textbook, I do not know the proper way to express it, and research to find an answer has been completely futile. Most of my searches directed me to the subset sum problem, where we choose k elements from a list that sum to a particular value. My problem could be said to be a complication of that: we choose a group of tuples from a list, and we want the sum of each element in these tuples to be <= 0.
Edit: Thanks to the link provided, and due to the comments indicating that a less-than-exponential solution is difficult, solving the question for tuples whose elements are only -1, 0, or 1 will be enough for me. Furthermore, each tuple will have between 10,000 and 20,000 integers, and there will be no more than 1000 tuples. Each tuple has at most 10 1's and 10 -1's, and the rest are zeroes.
If anyone could also prove that it is some sort of NP, that would be great.
I failed to come up with a DP solution, and sorting doesn't seem useful
This can be solved in pseudo-polynomial time with the given constraints using dynamic programming.
This is similar to the pseudo-polynomial dynamic programming solution for the subset sum problem, extended to multiple dimensions (4 in this example).
Time complexity
O(n * sum^4) or, in this case, since sum is bounded by n, O(n^5).
Here is a top-down dynamic programming solution with memoization in C++.
#include <iostream>
#include <unordered_map>
using namespace std;
const int N = 50;
int a[50][4]= {{0, 1, -1, 0},
               {1, -1, 0, 0},
               {-1, -1, 0, -1}};
unordered_map<int, bool> dp[N];
int hashsum(int sum1, int sum2, int sum3, int sum4);   // defined below
bool subset(int n, int sum1, int sum2, int sum3, int sum4)
{
    // Base case: no tuples left and every remaining target sum is zero
    if (n == -1 && !sum1 && !sum2 && !sum3 && !sum4)
        return true;
    // Base case: no tuples left but some target sum is non-zero
    else if(n == -1)
        return false;
    // Result already memoized for this state
    else if(dp[n].find(hashsum(sum1, sum2, sum3, sum4)) != dp[n].end() )
        return dp[n][hashsum(sum1, sum2, sum3, sum4)];
    // Include the current element
    bool include = subset(n - 1,
                          sum1 - a[n][0],
                          sum2 - a[n][1],
                          sum3 - a[n][2],
                          sum4 - a[n][3]);
    // Exclude the current element
    bool exclude = subset(n - 1, sum1, sum2, sum3, sum4);
    return dp[n][hashsum(sum1, sum2, sum3, sum4)] = include || exclude;
}
For memoization, the hashsum is calculated as follows:
int hashsum(int sum1, int sum2, int sum3, int sum4) {
    int offset = N;
    int base = 2 * N;
    int hashSum = 0;
    hashSum += (sum1 + offset) * 1;
    hashSum += (sum2 + offset) * base;
    hashSum += (sum3 + offset) * base * base;
    hashSum += (sum4 + offset) * base * base * base;
    return hashSum;
}
The driver code can then search for any non-positive sum as follows:
int main()
{
    int n = 3;
    bool flag = false;
    int sum1, sum2, sum3, sum4;
    for (sum1 = -n; sum1 <= 0; sum1++) {
        for (sum2 = -n; sum2 <= 0; sum2++) {
            for (sum3 = -n; sum3 <= 0; sum3++) {
                for (sum4 = -n; sum4 <= 0; sum4++) {
                    if (subset(n - 1, sum1, sum2, sum3, sum4)) {
                        flag = true;
                        goto done;
                    }
                }
            }
        }
    }
done:
    if (flag && (sum1 || sum2 || sum3 || sum4))
        cout << "Solution found. " << sum1 << ' ' << sum2 << ' ' << sum3 << ' ' << sum4 << std::endl;
    else
        cout << "No solution found.\n";
    return 0;
}
Note that a trivial solution with sums (0, 0, 0, 0), where no element is ever selected, always exists and is thus left out in the driver code.
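A bottom-up variant of the same idea can be sketched in Python by keeping the set of reachable sum vectors explicitly. This is only an illustration: the function name `find_nonpositive_subset` is hypothetical, tuples are assumed to be hashable Python tuples of equal length, and the number of stored vectors is only pseudo-polynomial when the coordinates are small and bounded, as in the constraints above.

```python
def find_nonpositive_subset(tuples):
    # Map each reachable sum vector to one list of tuple indices that produces it.
    reachable = {}
    for idx, t in enumerate(tuples):
        new = {}
        for vec, chosen in reachable.items():
            s = tuple(a + b for a, b in zip(vec, t))
            if s not in reachable:
                new.setdefault(s, chosen + [idx])
        if t not in reachable:
            new.setdefault(t, [idx])      # the subset consisting of t alone
        reachable.update(new)
    # Only non-empty subsets are stored, so the trivial empty solution never appears.
    for vec, chosen in reachable.items():
        if all(x <= 0 for x in vec):
            return chosen
    return None
```

With the question's example, `find_nonpositive_subset([(1, -2, 2, 0), (-1, 1, -3, 0)])` returns the indices of both tuples, whose sum is (0, -1, -1, 0).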

Dividing an array into K subsets such that sum of all subsets is same using bitmasks+DP

I don't have any clue how to solve this problem. The problem statement is:
Given a set S of N integers the task is decide if it is possible to
divide them into K non-empty subsets such that the sum of elements in
every of the K subsets is equal.
N can be at max 20. K can be at max 8
The problem is to be solved specifically using DP+Bitmasks!
I cannot understand where to start! As there are K sets to be maintained, I cannot use K separate states, one per set.
If I try taking the whole set as one state and K as the other, I have trouble creating a recurrence relation.
Can you help?
The link to original problem Problem
You can solve the problem in O(N * 2^N), so the K is meaningless for the complexity.
First let me warn you about the corner case N < K with all the numbers being zero, in which the answer is "no".
The idea of my algorithm is the following. Assume we have computed the sum of each of the masks (that can be done in O(2^N)). We know that for each of the groups, the sum should be the total sum divided by K.
We can do a DP with masks in which the state is just a binary mask telling which numbers have been used. The key idea in removing the K from the algorithm complexity is noticing that if we know which numbers have been used, we know the sum so far, so we also know which group we are filling now (current sum / group sum). Then just try to select the next number for the group: it will be valid if we do not exceed the group expected sum.
You can check my C++ code:
#include <iostream>
#include <vector>
#include <cstring>
using namespace std;
typedef long long ll;
ll v[21 + 5];
ll sum[(1 << 21) + 5];
ll group_sum;
int n, k;
// Fills sum[mask] with the sum of the elements selected by mask (bit i <-> v[i])
void compute_sums(int position, ll current_sum, int mask)
{
    if (position == -1)
    {
        sum[mask] = current_sum;
        return;
    }
    compute_sums(position - 1, current_sum, mask << 1);
    compute_sums(position - 1, current_sum + v[position], (mask << 1) + 1);
}
void solve_case()
{
    cin >> n >> k;
    for (int i = 0; i < n; ++i)
        cin >> v[i];
    memset(sum, 0, sizeof(sum));
    compute_sums(n - 1, 0, 0);
    group_sum = sum[(1 << n) - 1];
    if (group_sum % k != 0)
    {
        cout << "no" << endl;
        return;
    }
    if (group_sum == 0)
    {
        // Corner case: all numbers are zero
        if (n >= k)
            cout << "yes" << endl;
        else
            cout << "no" << endl;
        return;
    }
    group_sum /= k;
    vector<int> M(1 << n, 0);   // M[mask] == 1 if the elements in mask can be placed validly
    M[0] = 1;
    for (int mask = 0; mask < (1 << n); ++mask)
    {
        if (M[mask])
        {
            int current_group = sum[mask] / group_sum;
            for (int i = 0; i < n; ++i)
            {
                if ((mask >> i) & 1)
                    continue;   // element i is already used
                if (sum[mask | (1 << i)] <= group_sum * (current_group + 1))
                    M[mask | (1 << i)] = 1;
            }
        }
    }
    if (M[(1 << n) - 1])
        cout << "yes" << endl;
    else
        cout << "no" << endl;
}
int main()
{
    int cases;
    cin >> cases;
    for (int z = 1; z <= cases; ++z)
        solve_case();
    return 0;
}
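The mask-only DP described in this answer can also be sketched compactly in Python. This is an illustrative version, not a translation of the code above: the function name `can_partition` and the `fill` table are hypothetical, and the inputs are assumed to be non-negative integers.

```python
def can_partition(values, k):
    n = len(values)
    total = sum(values)
    if k <= 0 or n < k or total % k != 0:
        return False
    group = total // k
    if group == 0:
        return True                      # all values are zero and n >= k
    # fill[mask] = amount already placed in the group being filled, or -1 if unreachable.
    fill = [-1] * (1 << n)
    fill[0] = 0
    for mask in range(1 << n):
        if fill[mask] < 0:
            continue
        for i in range(n):
            if mask & (1 << i):
                continue                 # element i is already used
            if fill[mask] + values[i] <= group:
                # Reaching exactly `group` closes the current group and starts a new one.
                fill[mask | (1 << i)] = (fill[mask] + values[i]) % group
    return fill[(1 << n) - 1] == 0
```

The state is the mask alone, so the work is O(N * 2^N) regardless of K, which is the point the answer makes.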
Here's the working O(K*2^N*N) implementation in JavaScript. From the pseudo code
function equality(set, size, count) {
  if(size < count) { return false; }
  var total = set.reduce(function(p, c) { return p + c; }, 0);
  if((total % count) !== 0) { return false }
  var subsetTotal = total / count;
  var search = {0: true};
  var nextSearch = {};
  for(var i=0; i<count; i++) {
    for(var bits=0; bits < (1 << size); bits++){
      if(search[bits] !== true) { continue; }
      var sum = 0;
      for(var j=0; j < size; j++) {
        if((bits & (1 << j)) !== 0) { sum += set[j]; }
      }
      sum -= i * subsetTotal;
      for(var j=0; j < size; j++) {
        if((bits & (1 << j)) !== 0) { continue; }
        var testBits = bits | (1 << j);
        var tmpTotal = sum + set[j];
        if(tmpTotal == subsetTotal) { nextSearch[testBits] = true; }
        else if(tmpTotal < subsetTotal) { search[testBits] = true; }
      }
    }
    search = nextSearch;
    nextSearch = {};
  }
  if(search[(1 << size) - 1] === true) {
    return true;
  }
  return false;
}
console.log(true, equality([1,2,3,1,2,3], 6, 2));
console.log(true, equality([1, 2, 4, 5, 6], 5, 3));
console.log(true, equality([10,20,10,20,10,20,10,20,10,20], 10, 5));
console.log(false, equality([1,2,4,5,7], 5, 3));
EDIT The algorithm finds all of the bitmasks (which represent subsets bits) that meet the criteria (having a sum tmpTotal less than or equal to the ideal subset sum subsetTotal). Repeating this process by the amount of subsets required count, you either have a bitmask where all size bits are set which means success or the test fails.
set = [1, 2, 1, 2]
size = 4
count = 2, we want to try to partition the set into 2 subsets
subsetTotal = (1+2+1+2) / 2 = 3
Iteration 1:
search = {0b: true, 1b: true, 10b: true, 100b: true, 1000b: true, 101b: true}
nextSearch = {11b: true, 1100b: true, 110b: true, 1001b: true }
Iteration 2:
search = {11b: true, 1100b: true, 110b: true, 1001b: true, 111b: true, 1101b: true }
nextSearch = {1111b: true}
Final Check
(1 << size) == 10000b, (1 << size) - 1 == 1111b
Since nextSearch[ 1111b ] exists we return success.
UPD: I confused N and K with each other; my idea is correct but not efficient. An efficient idea is added at the end.
Assume that so far you've created k-1 subsets, and now you want to create the k-th subset. For creating the k-th subset, you need to be able to answer these two questions:
1- What should be the sum of elements of k-th subset?
2- Which elements have been used so far ?
Answering the first question is easy, the sum should be equal to sum of all elements divided by K, let's name it subSum.
For second question, we need to have the state of each element, used or not. Here we need to use bitmask idea.
Here's the dp recurrence:
dp[i][mask] = means is it possible to create i subsets with sum of each equals to subSum, using the elements which are 1(not used) in mask (in its bit representation), So dp[i][mask] is a boolean type.
dp[i][mask] = OR(dp[i-1][mask2]) for all possible mask2 states. mask2 will be produced by converting some 1's of mask to 0's, i.e. those 1's that we want to be the elements of i-th subset.
For checking all possible mask2, you need to check all 2^n possible subsets of the available 1 bits. Therefore, in total, the time complexity will be O(N*(2^n)*(2^n)). In your problem this is 20*2^8*2^8 = 10*2^17 < 10^7, which can pass the time limit.
Obviously, for the base case you have to handle dp[0][mask] on your own, without using the recurrence. The final answer is whether dp[K][2^N-1] is true or not.
UPD: For better performance, before getting into the DP you could preprocess all subsets whose sum is subSum. Then, for calculating mask2, you just need to iterate over the preprocessed list and check whether the AND of each entry with mask gives back that entry.
For an efficient solution, instead of searching for a proper mask2, we can use the fact that at each step we know the sum of the elements used so far. So we can add elements to the mask one by one, and whenever the sum is divisible by subSum we can move on to creating the next subset.
if (sum of used elements of mask is divisible by subSum)
dp[i][mask]= dp[i+1][mask];
dp[i][mask]|=dp[i][mask ^(1<<i)] provided that i-th item is not used and can not exceed the current sum more than i*subSum.

Repeated Squaring - Matrix Multiplication using NEWMAT

I'm trying to use the repeated squaring algorithm (using recursion) to perform matrix exponentiation. I've included header files from the NEWMAT library instead of using arrays. The original matrix has elements in the range (-5,5), all numbers being of type float.
# include "C:\User\newmat10\newmat.h"
# include "C:\User\newmat10\newmatio.h"
# include "C:\User\newmat10\newmatap.h"
# include <iostream>
# include <time.h>
# include <ctime>
# include <cstdlib>
# include <iomanip>
using namespace std;
Matrix repeated_squaring(Matrix A, int exponent, int n) //Recursive function
IdentityMatrix I(n);
if (exponent == 0) //Matrix raised to zero returns an Identity Matrix
return I;
if ( exponent%2 == 1 ) // if exponent is odd
return (A * repeated_squaring (A*A, (exponent-1)/2, n));
else //if exponent is even
return (A * repeated_squaring( A*A, exponent/2, n));
Matrix direct_squaring(Matrix B, int k, int no) //Brute Force Multiplication
Matrix C = B;
for (int i = 1; i <= k; i++)
C = B*C;
return C;
//----Creating a matrix with elements b/w (-5,5)----
float unifRandom()
int a = -5;
int b = 5;
float temp = (float)((b-a)*( rand()/RAND_MAX) + a);
return temp;
Matrix initialize_mat(Matrix H, int ord)
for (int y = 1; y <= ord; y++)
for(int z = 1; z<= ord; z++)
H(y,z) = unifRandom();
void main()
int exponent, dimension;
cout<<"Insert exponent:"<<endl;
cout<< "Insert dimension:"<<endl;
cout<<"The number of rows/columns in the square matrix is: "<<dimension<<endl;
cout<<"The exponent is: "<<exponent<<endl;
Matrix A(dimension,dimension),B(dimension,dimension);
Matrix C(dimension,dimension),D(dimension,dimension);
B= initialize_mat(A,dimension);
cout<<"Initial Matrix: "<<endl;
cout<<"Repeated Squaring Result: "<<endl;
clock_t time_before1 = clock();
C = repeated_squaring (B, exponent , dimension);
cout<< setw(5) <<setprecision(2) <<C;
clock_t time_after1 = clock();
float diff1 = ((float) time_after1 - (float) time_before1);
cout << "It took " << diff1/CLOCKS_PER_SEC << " seconds to complete" << endl<<endl;
cout<<"Direct Squaring Result:"<<endl;
clock_t time_before2 = clock();
D = direct_squaring (B, exponent , dimension);
clock_t time_after2 = clock();
float diff2 = ((float) time_after2 - (float) time_before2);
cout << "It took " << diff2/CLOCKS_PER_SEC << " seconds to complete" << endl<<endl;
I face the following problems:
The random number generator returns only "-5" as each element in the output.
The Matrix multiplication yield different results with brute force multiplication and using the repeated squaring algorithm.
I'm timing the execution time of my code to compare the times taken by brute force multiplication and by repeated squaring.
Could someone please find out what's wrong with the recursion and with the matrix initialization?
NOTE: While compiling this program, make sure you've imported the NEWMAT library.
Thanks in advance!
rand() returns an int so rand()/RAND_MAX will truncate to an integer = 0. Try your
repeated square algorithm by hand with n = 1, 2 and 3 and you'll find a surplus A *
and a gross inefficiency.
Final Working code has the following improvements:
Matrix repeated_squaring(Matrix A, int exponent, int n) //Recursive function
{
    IdentityMatrix I(n);
    if (exponent == 0) //Matrix raised to zero returns an Identity Matrix
        return I;
    if (exponent == 1)
        return A;
    if (exponent % 2 == 1) // if exponent is odd
        return (A*repeated_squaring (A*A, (exponent-1)/2, n));
    else //if exponent is even
        return (repeated_squaring(A*A, exponent/2, n));
}
Matrix direct_squaring(Matrix B, int k, int no) //Brute Force Multiplication
{
    Matrix C(no,no);
    C = B; // start from B so that k-1 further products give B^k
    for (int i = 0; i < k-1; i++)
        C = B*C;
    return C;
}
//----Creating a matrix with elements b/w (-5,5)----
float unifRandom()
{
    int a = -5;
    int b = 5;
    // Cast before dividing so the division is done in floating point
    float temp = (float) ((b-a)*((float) rand()/RAND_MAX) + a);
    return temp;
}
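To check the corrected recursion independently of NEWMAT, here is a small Python sketch that uses nested lists as matrices. All names (`mat_mul`, `identity`, `power_by_squaring`, `power_direct`) are hypothetical; the point is only that both routines must agree.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def power_by_squaring(a, exponent):
    if exponent == 0:
        return identity(len(a))
    if exponent == 1:
        return a
    half = power_by_squaring(mat_mul(a, a), exponent // 2)
    # Only an odd exponent needs the extra factor of a.
    return mat_mul(a, half) if exponent % 2 == 1 else half

def power_direct(a, exponent):
    result = identity(len(a))
    for _ in range(exponent):
        result = mat_mul(a, result)
    return result
```

For the Fibonacci matrix `[[1, 1], [1, 0]]` raised to the 10th power, both functions return `[[89, 55], [55, 34]]`, using about log2(10) multiplications in one case and 10 in the other.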

find number that does not repeat in O(n) time O(1) space

for starters, I did have a look at these questions:
Given an array of integers where some numbers repeat 1 time, some numbers repeat 2 times and only one number repeats 3 times, how do you find the number that repeat 3 times
Algorithm to find two repeated numbers in an array, without sorting
This one is different:
given an unsorted array of integers in which one number is unique and all the other numbers repeat 3 times, e.g.
{4,5,3, 5,3,4, 1, 4,3,5 }
we need to find this unique number in O(n) time and O(1) space.
NOTE: this is not homework, just a nice question I came across.
What about this one:
Idea: do bitwise addition mod 3
#include <stdio.h>
int main() {
    int a[] = { 1, 9, 9, 556, 556, 9, 556, 87878, 87878, 87878 };
    int n = sizeof(a) / sizeof(int);
    int low = 0, up = 0;   // (up, low) hold, per bit, the count of ones modulo 3
    for(int i = 0; i < n; i++) {
        int x = ~(up & a[i]);
        up &= x;
        x &= a[i];
        up |= (x & low);
        low ^= x;
    }
    printf("single no: %d\n", low);
    return 0;
}
This solution works for all inputs.
The idea is to extract the bits of each integer in the array and add them to the respective entry of a per-bit counter 'b' (implemented as a 32-element array, one counter per bit of a 32-bit number).
#include <stdio.h>
unsigned int a[7] = {5,5,4,10,4,9,9};
unsigned int b[32] = {0}; //Start with zeros for a 32bit no.
int main(void) {
    int i, j;
    unsigned int bit, sum = 0;
    for (i=0;i<7; i++) {
        for (j=0; j<32; j++) { //This loop can be optimized!!!!
            bit = (a[i] >> j) & 1u; //extract bit j
            b[j] += bit; //add to the bitmap array
        }
    }
    for (j=0; j<32; j++) {
        b[j] %= 2; //Here the other numbers repeat exactly 2 times; use % 3 when they repeat 3 times, as in the question.
        if (b[j] == 1) {
            sum += 1u << j; //sum all the digits left as 1 to get no
        }
    }
    printf("no. is %u\n", sum);
    return 0;
}
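The per-bit counting idea generalizes to any repeat count, which the following Python sketch tries to show. The function name `find_unique` and its `repeat` and `bits` parameters are hypothetical, and the values are assumed to be non-negative integers that fit in `bits` bits.

```python
def find_unique(values, repeat=3, bits=32):
    result = 0
    for j in range(bits):
        # Count how many values have bit j set.
        ones = sum((v >> j) & 1 for v in values)
        # Numbers that appear `repeat` times contribute a multiple of `repeat`,
        # so a non-zero remainder means the unique number has bit j set.
        if ones % repeat:
            result |= 1 << j
    return result
```

With `repeat=3` this handles the array from the question, and with `repeat=2` it handles the array used in the C example above. The pass over the data is repeated once per bit, so the time is O(32 * n) = O(n) with O(1) extra space.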
