I'm trying to solve a problem which goes like this:
Problem
Given an array of integers "arr" of size "n", process two types of queries. There are "q" queries you need to answer.
Query type 1
input: l r
result: output number of inversions in [l, r]
Query type 2
input: x y
result: update the value at arr [x] to y
Inversion
For every index j < i, if arr [j] > arr [i], the pair (j, i) is one inversion.
Input
n = 5
q = 3
arr = {1, 4, 3, 5, 2}
queries:
type = 1, l = 1, r = 5
type = 2, x = 1, y = 4
type = 1, l = 1, r = 5
Output
4
6
Constraints
Time: 4 secs
1 <= n, q <= 100000
1 <= arr [i] <= 40
1 <= l, r, x <= n
1 <= y <= 40
I know how to solve a simpler version of this problem without updates, i.e. counting the inversions for each position with a segment tree or Fenwick tree in O(N*log(N)). Apart from the trivial O(q*N^2) algorithm, the only solution I have to this problem is O(q*N*log(N)) (I think), using a segment tree. This, however, does not fit within the time limit. I would like hints towards a better algorithm, ideally O(N*log(N)) (if that's possible) or perhaps O(N*log^2(N)).
I first came across this problem two days ago and have been spending a few hours here and there trying to solve it. However, I'm finding it non-trivial and would like some help or hints. Thanks for your time and patience.
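For reference, the static case mentioned above (no updates) can be sketched as follows. Because every value lies in [1, 40], this hypothetical `countInversions` indexes a Fenwick tree by value instead of by position; the `maxValue` parameter is an assumption matching the constraints. It is only a sketch for checking answers, not the update-capable solution.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts pairs (j, i) with j < i and a[j] > a[i] for a static array whose
// values lie in [1, maxValue] (maxValue = 40 under the stated constraints).
int64_t countInversions(const std::vector<int>& a, int maxValue = 40) {
    std::vector<int64_t> bit(maxValue + 1, 0);  // Fenwick tree indexed by value
    auto add = [&](int v) {
        for (; v <= maxValue; v += v & -v) bit[v] += 1;
    };
    auto prefix = [&](int v) {  // how many values seen so far are <= v
        int64_t s = 0;
        for (; v > 0; v -= v & -v) s += bit[v];
        return s;
    };
    int64_t inversions = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // i elements precede a[i]; those not <= a[i] are strictly greater
        inversions += static_cast<int64_t>(i) - prefix(a[i]);
        add(a[i]);
    }
    return inversions;
}
```

On the sample, {1, 4, 3, 5, 2} gives 4 and the updated array {4, 4, 3, 5, 2} gives 6. Answering each range query this way costs O(n*log(40)), which is exactly the per-query cost the question wants to avoid.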
Updates
Solution
With the suggestion, answer and help by Tanvir Wahid, I've implemented the source code for the problem in C++ and would like to share it here for anyone who might stumble across this problem and not have an intuitive idea on how to solve it. Thank you!
Let's build a segment tree with each node containing information about how many inversions exist and the frequency count of elements present in its segment of authority.
node {
integer inversion_count : 0
array [40] frequency : {0...0}
}
Building the segment tree and handling updates
For each leaf node, initialise the inversion count to 0 and set the frequency of the element it represents to 1. The frequencies of a parent node are the sums of the frequencies of its left and right children. The inversion count of a parent node is the sum of the inversion counts of its left and right children plus the new inversions created by merging their two segments, which can be calculated from the frequencies of elements in each child: for every pair of values, multiply the frequency of the bigger value in the left child by the frequency of the smaller value in the right child, and add up these products.
parent.inversion_count = left.inversion_count + right.inversion_count
for i in [39, 0]
for j in [0, i)
parent.inversion_count += left.frequency [i] * right.frequency [j]
Updates are handled similarly.
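As a worked instance of this merge on the sample input (assuming the midpoint split tm = (tl + tr) / 2 used in the source code below), the root of {1, 4, 3, 5, 2} has the children {1, 4, 3} and {5, 2}:

```latex
\begin{aligned}
\text{left} &= \{1, 4, 3\}, & \text{left.inv} &= 1 \quad (4 > 3) \\
\text{right} &= \{5, 2\}, & \text{right.inv} &= 1 \quad (5 > 2) \\
\text{cross} &= \sum_{i > j} \text{left.freq}[i] \cdot \text{right.freq}[j]
  &&= \underbrace{1}_{4 > 2} + \underbrace{1}_{3 > 2} = 2 \\
\text{root.inv} &= 1 + 1 + 2 &&= 4
\end{aligned}
```

This matches the expected output of 4 for the first query.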
Answering range queries on inversion counts
To answer the query for the number of inversions in the range [l, r], we calculate the inversions using the source code attached below.
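Time Complexity: O(q*log(n)) segment tree operations, each merge costing O(40^2) with the double loop in combine below (building the tree costs O(40^2 * n)).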
Note
The source code attached does break some good programming habits. The sole purpose of the code is to "solve" the given problem and not to accomplish anything else.
Source Code
/**
Lost Arrow (Aryan V S)
Saturday 2020-10-10
**/
#include "bits/stdc++.h"
using namespace std;
struct node {
int64_t inv = 0;
vector <int> freq = vector <int> (40, 0);
void combine (const node& l, const node& r) {
inv = l.inv + r.inv;
for (int i = 39; i >= 0; --i) {
for (int j = 0; j < i; ++j) {
// frequency of bigger numbers in the left * frequency of smaller numbers on the right
inv += 1LL * l.freq [i] * r.freq [j];
}
freq [i] = l.freq [i] + r.freq [i];
}
}
};
void build (vector <node>& tree, vector <int>& a, int v, int tl, int tr) {
if (tl == tr) {
tree [v].inv = 0;
tree [v].freq [a [tl]] = 1;
}
else {
int tm = (tl + tr) / 2;
build(tree, a, 2 * v + 1, tl, tm);
build(tree, a, 2 * v + 2, tm + 1, tr);
tree [v].combine(tree [2 * v + 1], tree [2 * v + 2]);
}
}
void update (vector <node>& tree, int v, int tl, int tr, int pos, int val) {
if (tl == tr) {
tree [v].inv = 0;
tree [v].freq = vector <int> (40, 0);
tree [v].freq [val] = 1;
}
else {
int tm = (tl + tr) / 2;
if (pos <= tm)
update(tree, 2 * v + 1, tl, tm, pos, val);
else
update(tree, 2 * v + 2, tm + 1, tr, pos, val);
tree [v].combine(tree [2 * v + 1], tree [2 * v + 2]);
}
}
node inv_cnt (vector <node>& tree, int v, int tl, int tr, int l, int r) {
if (l > r)
return node();
if (tl == l && tr == r)
return tree [v];
int tm = (tl + tr) / 2;
node result;
result.combine(inv_cnt(tree, 2 * v + 1, tl, tm, l, min(r, tm)), inv_cnt(tree, 2 * v + 2, tm + 1, tr, max(l, tm + 1), r));
return result;
}
void solve () {
int n, q;
cin >> n >> q;
vector <int> a (n);
for (int i = 0; i < n; ++i) {
cin >> a [i];
--a [i];
}
vector <node> tree (4 * n);
build(tree, a, 0, 0, n - 1);
while (q--) {
int type, x, y;
cin >> type >> x >> y;
--x; --y;
if (type == 1) {
node result = inv_cnt(tree, 0, 0, n - 1, x, y);
cout << result.inv << '\n';
}
else if (type == 2) {
update(tree, 0, 0, n - 1, x, y);
}
else
assert(false);
}
}
int main () {
std::ios::sync_with_stdio(false);
std::cin.tie(nullptr);
std::cout.precision(10);
std::cout << std::fixed << std::boolalpha;
int t = 1;
// std::cin >> t;
while (t--)
solve();
return 0;
}
arr[i] is at most 40, and we can use this to our advantage. What we need is a segment tree in which each node holds 41 values: a long long int for the number of inversions in its range, and an array of size 40 with the count of each value (a struct will do). How do we merge the two children of a node? We know the inversion count of each child, and we also know the frequency of every value in both of them. The inversion count of the parent is the sum of the inversion counts of both children plus the number of inversions between the left and right child, which is easy to find from the frequencies. Queries can be answered in the same way. Complexity: O(40*q*log(n)).
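The O(40) merge described above can be sketched like this, assuming 0-based values 0..39 as in the posted code (after `--a[i]`); the `Node` and `combine` names here are illustrative, not taken from the answer. Instead of the pairwise double loop, it walks the values in increasing order while keeping a running count of left-child elements that are strictly greater than the current value.

```cpp
#include <array>
#include <cstdint>

constexpr int kValues = 40;  // values are 0-based (0..39), as in the posted code

struct Node {
    int64_t inv = 0;                  // inversions inside the segment
    std::array<int, kValues> freq{};  // freq[v] = occurrences of value v
};

// Merge two adjacent segments (l to the left of r) in O(kValues).
Node combine(const Node& l, const Node& r) {
    Node out;
    out.inv = l.inv + r.inv;
    int64_t leftGreater = 0;
    for (int v = 0; v < kValues; ++v) leftGreater += l.freq[v];  // every left element
    for (int v = 0; v < kValues; ++v) {
        leftGreater -= l.freq[v];            // now: left elements strictly greater than v
        out.inv += leftGreater * r.freq[v];  // each forms an inversion with every right v
        out.freq[v] = l.freq[v] + r.freq[v];
    }
    return out;
}
```

Replacing the double loop in the posted combine with this single pass brings the per-merge cost from O(40^2) down to O(40), which is where the O(40*q*log(n)) bound comes from.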
I'm looking for an algorithm which computes all permutations of a bitstring of given length (n) with a given number of set bits (k). For example, for n=4 and k=2 the algorithm should output:
1100
1010
1001
0011
0101
0110
I'm aware of Gosper's Hack, which generates the needed permutations in lexicographic order. But I need them to be generated in such a way that two consecutive permutations differ in only two (or at least a constant number of) bit positions (like in the above example).
Another bit hack to do that would be awesome, but an algorithmic description would also help me a lot.
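For comparison, Gosper's Hack itself can be sketched as below; `nextSameBitCount` and `allMasks` are hypothetical names. It enumerates the n-bit masks with k set bits in increasing numeric order, i.e. lexicographic order of the bit strings.

```cpp
#include <cstdint>
#include <vector>

// Gosper's Hack: the next larger integer with the same number of set bits as x (x != 0).
uint64_t nextSameBitCount(uint64_t x) {
    const uint64_t c = x & (~x + 1);  // lowest set bit (x & -x)
    const uint64_t r = x + c;         // carry the lowest run of ones one place left
    return (((r ^ x) >> 2) / c) | r;  // put the remaining ones back at the bottom
}

// All n-bit masks with k set bits (1 <= k <= n <= 63), in increasing numeric order.
std::vector<uint64_t> allMasks(int n, int k) {
    std::vector<uint64_t> out;
    const uint64_t limit = uint64_t{1} << n;
    for (uint64_t x = (uint64_t{1} << k) - 1; x < limit; x = nextSameBitCount(x))
        out.push_back(x);
    return out;
}
```

allMasks(4, 2) yields 3, 5, 6, 9, 10, 12, i.e. 0011, 0101, 0110, 1001, 1010, 1100. Consecutive masks such as 0110 and 1001 differ in all four bits, which is why this order alone does not meet the requirement.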
Walking bit algorithm
To generate permutations of a binary sequence by swapping exactly one set bit with an unset bit in each step (i.e. the Hamming distance between consecutive permutations equals two), you can use this "walking bit" algorithm. It works much like generating the (reverse) lexicographical order, but the set bits walk right and left alternately, and as a result some parts of the sequence are mirrored. This is probably better explained with an example:
Recursive implementation
A recursive algorithm would receive a sequence of n bits, with k bits set, either all on the left or all on the right. It would then keep a 1 at the end, recurse with the rest of the sequence, move the set bit and keep 01 at the end, recurse with the rest of the bits, move the set bit and keep 001 at the end, etc... until the last recursion with only set bits. As you can see, this creates alternating left-to-right and right-to-left recursions.
When the algorithm is called with a sequence with only one bit set, this is the deepest recursion level, and the set bit walks from one end to the other.
Code example 1
Here's a simple recursive JavaScript implementation:
function walkingBits(n, k) {
var seq = [];
for (var i = 0; i < n; i++) seq[i] = 0;
walk (n, k, 1, 0);
function walk(n, k, dir, pos) {
for (var i = 1; i <= n - k + 1; i++, pos += dir) {
seq[pos] = 1;
if (k > 1) walk(n - i, k - 1, i%2 ? dir : -dir, pos + dir * (i%2 ? 1 : n - i))
else document.write(seq + "<BR>");
seq[pos] = 0;
}
}
}
walkingBits(7,3);
Translated into C++ that could be something like this:
#include <iostream>
#include <string>
void walkingBits(int n, int k, int dir = 1, int pos = 0, bool top = true) {
static std::string seq;
if (top) seq.resize(n, '0');
for (int i = 1; i <= n - k + 1; i++, pos += dir) {
seq[pos] = '1';
if (k > 1) walkingBits(n - i, k - 1, i % 2 ? dir : -dir, pos + dir * (i % 2 ? 1 : n - i), false);
else std::cout << seq << '\n';
seq[pos] = '0';
}
if (top) seq.clear();
}
int main() {
walkingBits(7, 3);
}
(See also this C++11 version, written by VolkerK in response to a question about the above code.)
(Rextester seems to have been hacked, so I've pasted Volker's code below.)
#include <iostream>
#include <vector>
#include <functional>
void walkingBits(size_t n, size_t k) {
std::vector<bool> seq(n, false);
std::function<void(const size_t, const size_t, const int, size_t)> walk = [&](const size_t n, const size_t k, const int dir, size_t pos){
for (size_t i = 1; i <= n - k + 1; i++, pos += dir) {
seq[pos] = true;
if (k > 1) {
walk(n - i, k - 1, i % 2 ? dir : -dir, pos + dir * (i % 2 ? 1 : n - i));
}
else {
for (bool v : seq) {
std::cout << v;
}
std::cout << std::endl;
}
seq[pos] = false;
}
};
walk(n, k, 1, 0);
}
int main() {
walkingBits(7, 3);
return 0;
}
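To check the Hamming-distance property mechanically, here is a variant of the C++ translation above that collects the permutations instead of printing them. The names `walk`, `walkingBits` (returning a vector) and `consecutiveDistanceTwo` are illustrative; the recursion itself is unchanged.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same recursion as the C++ translation, collecting each permutation (requires n >= k > 0).
void walk(std::string& seq, int n, int k, int dir, int pos, std::vector<std::string>& out) {
    for (int i = 1; i <= n - k + 1; i++, pos += dir) {
        seq[pos] = '1';
        if (k > 1)
            walk(seq, n - i, k - 1, i % 2 ? dir : -dir, pos + dir * (i % 2 ? 1 : n - i), out);
        else
            out.push_back(seq);
        seq[pos] = '0';
    }
}

std::vector<std::string> walkingBits(int n, int k) {
    std::string seq(n, '0');
    std::vector<std::string> out;
    walk(seq, n, k, 1, 0, out);
    return out;
}

// True if every pair of consecutive strings differs in exactly two positions.
bool consecutiveDistanceTwo(const std::vector<std::string>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        int diff = 0;
        for (std::size_t j = 0; j < v[i].size(); ++j) diff += v[i][j] != v[i - 1][j];
        if (diff != 2) return false;
    }
    return true;
}
```

For n=4, k=2 this produces 1100, 1010, 1001, 0101, 0110, 0011: a different order from the one in the question, but again every step swaps one set bit with one unset bit.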
Code example 2
Or, if you prefer code where elements of an array are actually being swapped:
function walkingBits(n, k) {
var seq = [];
for (var i = 0; i < n; i++) seq[i] = i < k ? 1 : 0;
document.write(seq + "<BR>");
walkRight(n, k, 0);
function walkRight(n, k, pos) {
if (k == 1) for (var p = pos + 1; p < pos + n; p++) swap(p - 1, p)
else for (var i = 1; i <= n - k; i++) {
[walkLeft, walkRight][i % 2](n - i, k - 1, pos + i);
swap(pos + i - 1, pos + i + (i % 2 ? 0 : k - 1));
}
}
function walkLeft(n, k, pos) {
if (k == 1) for (var p = pos + n - 1; p > pos; p--) swap(p - 1, p)
else for (var i = 1; i <= n - k; i++) {
[walkRight, walkLeft][i % 2](n - i, k - 1, pos);
swap(pos + n - i - (i % 2 ? 1 : k), pos + n - i);
}
}
function swap(a, b) {
var c = seq[a]; seq[a] = seq[b]; seq[b] = c;
document.write(seq + "<BR>");
}
}
walkingBits(7,3);
Code example 3
Here the recursion is rolled out into an iterative implementation, with each of the set bits (i.e. each of the recursion levels) represented by an object {o,d,n,p} which holds the offset from the leftmost position, the direction the set bit is moving in, the number of bits (i.e. the length of this part of the sequence), and the current position of the set bit within this part.
function walkingBits(n, k) {
var b = 0, seq = [], bit = [{o: 0, d: 1, n: n, p: 0}];
for (var i = 0; i < n; i++) seq.push(0);
while (bit[0].p <= n - k) {
seq[bit[b].o + bit[b].p * bit[b].d] = 1;
while (++b < k) {
bit[b] = {
o: bit[b-1].o + bit[b-1].d * (bit[b-1].p %2 ? bit[b-1].n-1 : bit[b-1].p+1),
d: bit[b-1].d * (bit[b-1].p %2 ? -1 : 1),
n: bit[b-1].n - bit[b-1].p - 1,
p: 0
}
seq[bit[b].o + bit[b].p * bit[b].d] = 1;
}
document.write(seq + "<BR>");
b = k - 1;
do seq[bit[b].o + bit[b].p * bit[b].d] = 0;
while (++bit[b].p > bit[b].n + b - k && b--);
}
}
walkingBits(7, 3); // n >= k > 0
Transforming lexicographical order into walking bit
Because the walking bit algorithm is a variation of the algorithm to generate the permutations in (reverse) lexicographical order, each permutation in the lexicographical order can be transformed into its corresponding permutation in the walking bit order, by mirroring the appropriate parts of the binary sequence.
So you can use any algorithm (e.g. Gosper's Hack) to create the permutations in lexicographical or reverse lexicographical order, and then transform each one to get the walking bit order.
Practically, this means iterating over the binary sequence from left to right, and if you find a set bit after an odd number of zeros, reversing the rest of the sequence and iterating over it from right to left, and so on...
Code example 4
In the code below the permutations for n,k = 7,3 are generated in reverse lexicographical order, and then transformed one-by-one:
function lexi2walk(lex) {
var seq = [], ofs = 0, pos = 0, dir = 1;
for (var i = 0; i < lex.length; ++i) {
if (seq[ofs + pos * dir] = lex[i]) {
if (pos % 2) ofs -= (dir *= -1) * (pos + lex.length - 1 - i)
else ofs += dir * (pos + 1);
pos = 0;
} else ++pos;
}
return seq;
}
function revLexi(seq) {
var max = true, pos = seq.length, set = 1;
while (pos-- && (max || !seq[pos])) if (seq[pos]) ++set; else max = false;
if (pos < 0) return false;
seq[pos] = 0;
while (++pos < seq.length) seq[pos] = set-- > 0 ? 1 : 0;
return true;
}
var s = [1,1,1,0,0,0,0];
document.write(s + " → " + lexi2walk(s) + "<br>");
while (revLexi(s)) document.write(s + " → " + lexi2walk(s) + "<br>");
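A C++ port of `lexi2walk` from Code example 4 could look like the sketch below. Here the reverse lexicographical order is produced with `std::prev_permutation` instead of `revLexi`; that substitution, and the names `lexiToWalk` and `walkingBitsViaLexi`, are assumptions of this sketch.

```cpp
#include <algorithm>
#include <vector>

// Port of lexi2walk: maps a sequence from (reverse) lexicographic order to its
// counterpart in walking-bit order by mirroring the appropriate parts.
std::vector<int> lexiToWalk(const std::vector<int>& lex) {
    const int len = static_cast<int>(lex.size());
    std::vector<int> seq(len, 0);
    int ofs = 0, pos = 0, dir = 1;
    for (int i = 0; i < len; ++i) {
        seq[ofs + pos * dir] = lex[i];
        if (lex[i]) {
            if (pos % 2) {  // set bit after an odd number of zeros:
                dir = -dir; // fill the rest of the sequence from the far end
                ofs -= dir * (pos + len - 1 - i);
            } else {
                ofs += dir * (pos + 1);
            }
            pos = 0;
        } else {
            ++pos;
        }
    }
    return seq;
}

// All n-bit sequences with k ones, in reverse lexicographic order, mapped to walking-bit order.
std::vector<std::vector<int>> walkingBitsViaLexi(int n, int k) {
    std::vector<int> lex(n, 0);
    std::fill(lex.begin(), lex.begin() + k, 1);  // 1..10..0 comes first in reverse lex order
    std::vector<std::vector<int>> out;
    do {
        out.push_back(lexiToWalk(lex));
    } while (std::prev_permutation(lex.begin(), lex.end()));
    return out;
}
```

For n=4, k=2 this gives 1100, 1010, 1001, 0101, 0110, 0011, the same order as the recursive walking bit algorithm.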
Homogeneous Gray path
The permutation order created by this algorithm is similar, but not identical, to the one created by the "homogeneous Gray path for combinations" algorithm described by D. Knuth in The Art of Computer Programming vol. 4a, sect. 7.2.1.3, formula (31) & fig. 26c.
This is easy to achieve with recursion:
public static void nextPerm(List<Integer> list, int num, int index, int n, int k) {
if(k == 0) {
list.add(num);
return;
}
if(index == n) return;
int mask = 1<<index;
nextPerm(list, num^mask, index+1, n, k-1);
nextPerm(list, num, index+1, n, k);
}
Running this with the client:
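public static void main(String[] args) {
ArrayList<Integer> list = new ArrayList<Integer>();
nextPerm(list, 0, 0, 4, 2);
// print each result as a 4-bit binary string
for (int x : list)
System.out.println(String.format("%4s", Integer.toBinaryString(x)).replace(' ', '0'));
}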
Output:
0011
0101
1001
0110
1010
1100
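The idea is to start with the initial number and decide, one index at a time, whether to flip the bit at that index, keeping track of how many bits have been set. Once k bits have been set (when k == 0), store the number and terminate the branch. Note that this order does not keep consecutive results at a constant distance: 1001 is followed by 0110, which differs in all four positions.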
The following is a question from HackerEarth.
I coded a solution in Java and C but got "time limit exceeded" on some test cases on submission. No participant was able to solve this for all test cases. What is the most efficient solution?
QUESTION:
Bob likes DSD Numbers. A DSD Number is a number which is divisible by its Digit Sum in Decimal Representation.
digitSum(n): Sum of digits of n (in Decimal Representation)
e.g.: n = 1234, then digitSum(n) = 1 + 2 + 3 + 4 = 10
A DSD Number is a number n such that n % digitSum(n) equals 0.
Bob asked Alice to tell the number of DSD Numbers in the range [L,R] inclusive.
Constraints:
1 <= test cases <= 50
1<=L<=R<=10^9
Sample Input
4
2 5
1 10
20 45
1 100
Sample Output
4
10
9
33
Code in Java:
import java.io.*;
import java.util.StringTokenizer;
class DSD {
public static void main(String[] args) throws IOException {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
PrintWriter out=new PrintWriter(System.out);
int t=Integer.parseInt(br.readLine());
while(t-->0){
StringTokenizer st=new StringTokenizer(br.readLine());
int L=Integer.parseInt(st.nextToken());
int R=Integer.parseInt(st.nextToken());
int count=0,sum=0,i=L,j=0;
while(i>0){
sum+=i%10;
i=i/10;
}
if(L%sum==0)
count++;
for(i=L+1;i<=R;i++){
if(i%10!=0){
sum+=1;
}
else
{
j=i;
while(j%10==0){
sum-=9;
j/=10;
}
sum+=1;
}
if(i%sum==0)
count++;
}
out.println(count);
}
out.close();
}
}
We can solve this problem by using dynamic programming.
Observation:
Each number has at most 10 digits, so its digit sum is less than 100 (at most 81 for numbers up to 10^9).
So, fixing the digit sum we are looking for and processing the number digit by digit, we need to track four things:
Whether the current number is larger than the lower bound.
Whether the current number is smaller than the upper bound.
What is the mod of current number with its sum.
What is the current sum of all digits.
We come up with this function int count(int digit, boolean larger, boolean smaller, int left, int mod), and then, the dp state: dp[digit][larger][smaller][left][mod].
For each test case, the time complexity is (number of possible sums)^3 x (number of digits) = 100^3 * 10 = 10^7.
There are 50 test cases, so 50 * 10^7 = 5 * 10^8 operations, which should still fit in the time limit.
Java code:
static int[][][][][] dp;
static int[][][][][] check;
static int cur = 0;
public static void main(String[] args) throws FileNotFoundException {
// PrintWriter out = new PrintWriter(new FileOutputStream(new File(
// "output.txt")));
PrintWriter out = new PrintWriter(System.out);
Scanner in = new Scanner(System.in); // assumes java.util.Scanner
int n = in.nextInt();
dp = new int[11][2][2][82][82];
check = new int[11][2][2][82][82];
for (int i = 0; i < n; i++) {
int l = in.nextInt();
int r = in.nextInt();
String L = "" + l;
String R = "" + r;
while (L.length() < R.length()) {
L = "0" + L;
}
int result = 0;
for (int j = 1; j <= 81; j++) {
cur = cur + 1;
result += count(0, 0, 0, j, 0, j, L, R);
}
out.println(result);
}
out.close();
}
public static int count(int index, int larger, int smaller, int left,
int mod, int sum, String L, String R) {
if (index == L.length()) {
if (left == 0 && mod == 0) {
return 1;
}
return 0;
}
if((L.length() - index) * 9 < left){
return 0;
}
if (check[index][larger][smaller][left][mod] == cur) {
return dp[index][larger][smaller][left][mod];
}
//System.out.println(cur);
check[index][larger][smaller][left][mod] = cur;
int x = L.charAt(index) - '0';
int y = R.charAt(index) - '0';
int result = 0;
for (int i = 0; i < 10 && i <= left; i++) {
if (x > i && larger == 0) {
continue;
}
if (y < i && smaller == 0) {
continue;
}
int nxtLarger = larger;
int nxtSmaller = smaller;
if (x < i) {
nxtLarger = 1;
}
if (y > i) {
nxtSmaller = 1;
}
int nxtMod = (mod * 10 + i) % sum;
result += count(index + 1, nxtLarger, nxtSmaller, left - i, nxtMod,
sum, L, R);
}
return dp[index][larger][smaller][left][mod] = result;
}
Update: I have submitted this and passed all the test cases for this problem (I was the second person to solve it).
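A C++ sketch of the same digit DP idea is below. Unlike the Java code, which tracks both a "larger than L" and a "smaller than R" flag, this hypothetical version counts [1, n] with a single tight flag and answers a query as countUpTo(R) - countUpTo(L - 1); the namespace and function names are illustrative.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Counts x in [1, n] with x % digitSum(x) == 0, one DP pass per target digit sum s.
// State: (position, digit sum still to place, value-so-far mod s, tight), where
// "tight" means the digits chosen so far equal n's prefix.
namespace dsd {
std::string digits;        // decimal digits of the upper bound n
int target = 1;            // digit sum s tested in the current pass
int64_t memo[11][91][91];  // [pos][remaining sum][mod], non-tight states only
bool seen[11][91][91];

int64_t go(int pos, int left, int mod, bool tight) {
    const int len = static_cast<int>(digits.size());
    if (pos == len) return (left == 0 && mod == 0) ? 1 : 0;
    if (left > 9 * (len - pos)) return 0;  // remaining digits cannot reach the sum
    if (!tight && seen[pos][left][mod]) return memo[pos][left][mod];
    const int limit = tight ? digits[pos] - '0' : 9;
    int64_t total = 0;
    for (int d = 0; d <= limit && d <= left; ++d)
        total += go(pos + 1, left - d, (mod * 10 + d) % target, tight && d == limit);
    if (!tight) {
        seen[pos][left][mod] = true;
        memo[pos][left][mod] = total;
    }
    return total;
}

// Count for [1, n] with n < 10^10.
int64_t countUpTo(int64_t n) {
    if (n <= 0) return 0;
    digits = std::to_string(n);
    int64_t result = 0;
    for (target = 1; target <= 9 * static_cast<int>(digits.size()); ++target) {
        std::memset(seen, 0, sizeof seen);
        result += go(0, target, 0, true);
    }
    return result;
}
}  // namespace dsd
```

Leading zeros need no special handling here: they add nothing to the digit sum or to the remainder, and x = 0 is never counted because every target sum is at least 1.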
Let f (L, R) = "number of integers L ≤ x ≤ R where x is divisible by the sum of its digits". We define that x = 0 is not counted.
Let g (M) = "number of integers 1 ≤ x < M where x is divisible by the sum of its digits". We have f (L, R) = g (R + 1) - g (L).
Find the largest k ≥ 0 such that 10^k <= M. Find the largest a ≥ 1 such that a * 10^k <= M. All integers < M have at most 9k + (a-1) as sum of digits.
Let h (M, n) = "number of integers 1 ≤ x < M where x is divisible by n, and the sum of digits is n". g (M) is the sum of h (M, n) for 1 ≤ n ≤ 9*k + (a - 1).
Let r (a, k, n) = "number of integers a*10^k ≤ x < (a+1)*10^k where x is divisible by n, and the sum of digits is n". h (M, n) can be calculated by adding values of r (a, k, n) in an obvious way; for example:
h (1,234,000,000, n) = r (0, 9, n) + r (10, 8, n) + r (11, 8, n) + r (120, 7, n) + r (121, 7, n) + r (122, 7, n) + r (1230, 6, n) + r (1231, 6, n) + r (1232, 6, n) + r (1233, 6, n).
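The "obvious way" of splitting [0, M) into blocks [a*10^k, (a+1)*10^k) can be sketched as follows; `blocksBelow` is a hypothetical helper that walks M's digits from the most significant one, and every digit c smaller than the digit at that position opens one full block.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Returns the (a, k) pairs whose blocks [a*10^k, (a+1)*10^k) exactly tile [0, M).
std::vector<std::pair<int64_t, int>> blocksBelow(int64_t M) {
    std::vector<std::pair<int64_t, int>> blocks;
    const std::string s = std::to_string(M);
    int64_t prefix = 0;  // digits of M to the left of the current position
    for (std::size_t i = 0; i < s.size(); ++i) {
        const int digit = s[i] - '0';
        const int k = static_cast<int>(s.size() - 1 - i);  // free digits after this one
        for (int c = 0; c < digit; ++c) blocks.emplace_back(prefix * 10 + c, k);
        prefix = prefix * 10 + digit;
    }
    return blocks;
}
```

For M = 1,234,000,000 it returns (0, 9), (10, 8), (11, 8), (120, 7), (121, 7), (122, 7), (1230, 6), ..., (1233, 6), matching the terms of the sum above.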
Let f (k, n, d, m) = "number of integers 0 ≤ x < 10^k where the sum of digits is d, and x % n = m". We can calculate r (a, k, n) using this function: The last k digits must have a digit sum of n - digitsum (a). If the whole number is divisible by n, then the last k digits must have a remainder of (- a*10^k) % n. So r (a, k, n) = f (k, n, n - digitsum(a), - (a*10^k) % n).
f (k, n, d, m) is trivial if k = 1: the only number below 10 whose digit sum is d is d itself, so f (1, n, d, m) is 1 if 0 ≤ d ≤ 9 and d % n = m, and 0 otherwise.
To calculate f (k+1, n, d, m) we add f (k, n, d-a, (m - a*10^k)%n) for 0 ≤ a ≤ 9. Obviously all the values f (k, n, d, m) must be stored so they are not recalculated again and again.
And that's it. How many operations? If R < 10^r, then numbers have at most r digits, so their digit sums are at most 9r. We calculate values f (k, n, d, m) for 1 ≤ k ≤ r, for 1 ≤ n ≤ 9r, for 0 ≤ d ≤ 9r, for 0 ≤ m < n. For each of those we add 10 different numbers, so we have fewer than 10,000 r^4 additions. So numbers up to 10^19 are no problem.
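For one fixed divisor n, the table f (k, n, d, m) and its recurrence can be sketched as below. `buildTable` and the `Table` alias are hypothetical; the base case and the leading-digit recurrence follow the two paragraphs above, with 10^k mod n updated incrementally.

```cpp
#include <vector>

using Table = std::vector<std::vector<std::vector<long long>>>;

// f[k][d][m] = how many x in [0, 10^k) have digit sum d and x % n == m (1 <= k <= maxK).
Table buildTable(int maxK, int n) {
    const int maxD = 9 * maxK;
    Table f(maxK + 1, std::vector<std::vector<long long>>(maxD + 1, std::vector<long long>(n, 0)));
    for (int d = 0; d <= 9; ++d) f[1][d][d % n] = 1;  // k = 1: only x = d itself
    int pow10 = 10 % n;                                // 10^k mod n for the current k
    for (int k = 1; k < maxK; ++k) {
        for (int d = 0; d <= 9 * (k + 1); ++d)
            for (int m = 0; m < n; ++m) {
                long long total = 0;
                for (int a = 0; a <= 9 && a <= d; ++a) {  // a = leading digit of x
                    // remainder the last k digits must have: (m - a*10^k) mod n
                    const int rest = ((m - a * pow10) % n + n) % n;
                    total += f[k][d - a][rest];
                }
                f[k + 1][d][m] = total;
            }
        pow10 = pow10 * 10 % n;
    }
    return f;
}
```

With this table, r (a, k, n) is a single lookup at d = n - digitsum(a) and m = (-(a*10^k)) mod n.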
The following approach should take about 10^7 operations per case.
Split numbers into a prefix (n/10000) and a suffix (n%10000). Once you choose a digit sum, only a little data from each of the prefix and suffix are needed to determine if the digit sum divides the number. (This is related to some things gnasher729 said, but I get a much different running time.)
For each possible digit sum d from 1 to 81,
Map prefix p to a pair (p*10000 % d, digit sum(p)).
Tally the counts in a matrix M.
Map each possible suffix s to a pair (s % d, digit sum(s)).
Tally the counts in a matrix N.
For every (a,b),
total += M[a,b] * N[(-a) mod d, d-b]
There are about 81 * (10^5 + 10^4) steps.
The edge cases where a prefix is partially allowed (L/10000, R/10000, and 100000) can be brute-forced in about 20000 steps once.
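A C++ sketch of this prefix/suffix split, assuming limit ≤ 10^9 and a suffix of 4 digits, is below. `countDsdUpTo` and `digitSum` are hypothetical names; full blocks [p*10000, p*10000 + 9999] are combined through the two tally matrices, and only the last, partial block is brute-forced (a query [L, R] becomes countDsdUpTo(R) - countDsdUpTo(L - 1)).

```cpp
#include <cstdint>
#include <vector>

static int digitSum(int64_t x) {
    int s = 0;
    for (; x > 0; x /= 10) s += static_cast<int>(x % 10);
    return s;
}

// Counts x in [1, limit] (limit <= 1e9) with x % digitSum(x) == 0.
int64_t countDsdUpTo(int64_t limit) {
    if (limit <= 0) return 0;
    const int64_t SUF = 10000;
    const int64_t fullPrefixes = (limit + 1) / SUF;  // prefixes whose whole block is <= limit
    int64_t total = 0;
    for (int d = 1; d <= 81; ++d) {
        // M[a][b]: prefixes p with (p*SUF) % d == a and digitSum(p) == b
        std::vector<std::vector<int64_t>> M(d, std::vector<int64_t>(d + 1, 0));
        // N[a][b]: suffixes s in [0, SUF) with s % d == a and digitSum(s) == b (b <= 36)
        std::vector<std::vector<int64_t>> N(d, std::vector<int64_t>(37, 0));
        for (int64_t p = 0; p < fullPrefixes; ++p) {
            const int b = digitSum(p);
            if (b <= d) M[(p * SUF) % d][b]++;
        }
        for (int64_t s = 0; s < SUF; ++s) N[s % d][digitSum(s)]++;
        for (int a = 0; a < d; ++a)
            for (int b = 0; b <= d; ++b) {
                const int need = d - b;  // digit sum the suffix must supply
                if (need <= 36) total += M[a][b] * N[(d - a) % d][need];
            }
    }
    for (int64_t x = fullPrefixes * SUF; x <= limit; ++x)  // partial last block
        if (x > 0 && x % digitSum(x) == 0) ++total;
    return total;
}
```

The index (d - a) % d is the non-negative form of -a mod d; in C and C++, `-a % d` would be negative for a > 0.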
Interesting problem. The straightforward solution would be to iterate through the numbers from L to R, calculate the sum of digits for each, and check whether each number is divisible by its sum of digits.
Calculating the sum of digits can obviously be made faster. The numbers xxx0, xxx1, xxx2, ..., xxx9 have digit sums n, n+1, n+2, ..., n+9. So for ten consecutive numbers almost no effort is needed to calculate the digit sum, just a modulo operation to check for divisibility.
The modulo check can be made faster. Compilers use clever tricks to divide by constants, replacing a slow division with a shift and a multiplication. You can search for how this is done, and since there are only 81 possible divisors, do at runtime what the compiler would do for constants. That should get the time down to few nanoseconds per number.
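One concrete way to do this, swapped in here rather than taken from the answer, is Lemire, Kaser and Kurz's direct divisibility test: it precomputes one 64-bit constant per divisor and replaces `n % d == 0` by a single multiply and compare. This is a close cousin of the compiler's multiply-and-shift trick; the `FastDivisor` name is hypothetical.

```cpp
#include <cstdint>

// Divisibility test with a per-divisor constant, valid for any 32-bit n and divisor d >= 1.
struct FastDivisor {
    // ceil(2^64 / d), computed as floor((2^64 - 1) / d) + 1; wraps to 0 for d == 1,
    // which still gives the right answer below (every n is divisible by 1).
    uint64_t m;
    explicit FastDivisor(uint32_t d) : m(UINT64_MAX / d + 1) {}
    bool divides(uint32_t n) const { return n * m <= m - 1; }  // true iff d divides n
};
```

With digit sums limited to 1..81, a table of 81 such objects built once would replace every division in the inner loop.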
To do better: I'd make a loop checking for numbers with digit sum 1, digit sum 2, etc. As an example, assume I'm checking numbers with digit sum 17. These numbers must have a digit sum of 17, and also be multiples of 17. I take the numbers from 0000 to 9999 and for each I calculate the sum of digits, and the value modulo 17, and divide them into 37 x 17 sets where all the numbers in the set have the same digit sum and the same value modulo 17 and count the elements in each set.
Then to check the numbers from 0 to 9999: I pick the set where the digit sum is 17, and the value modulo 17 is 0 and take the element count of that set. To check numbers from 10,000 to 19,999: I pick the set where the digit sum is 16, and the value modulo 17 is 13 (because 10013 is divisible by 17), and so on.
That's just the idea. I think with a bit of cleverness that can be extended to a method that takes O (log^4 R) steps to handle all the numbers from L to R.
In the C code below, I have focused on the core portion, i.e. finding the DSD count. The code is admittedly ugly, but that's what you get when coding in a hurry.
The basic observation is that the digit sum can be simplified by tracking the digits of the number individually, reducing the digit sum determination to simple increments and decrements in each step. There are probably clever ways to accelerate the modulo computations; I could not come up with any on short notice.
On my machine (Xeon E3 1270 v2, 3.5 GHz) the code below finds the count of DSDs in [1,1e9] in 3.54 seconds. I compiled with MSVC 2010 at optimization level -O2. While you stated a time limit of 1 second in an update to your question, it is not clear that this extreme case is exercised by the framework at the website you mentioned. In any event this will provide a reasonable baseline to compare other proposed solutions against.
#include <stdio.h>
#include <stdlib.h>
/* sum digits in decimal representation of x */
int digitsum (int x)
{
int sum = 0;
while (x) {
sum += x % 10;
x = x / 10;
}
return sum;
}
/* split integer into individual decimal digits. p[0]=ones, p[1]=tens, ... */
void split (int a, int *p)
{
int i = 0;
while (a) {
p[i] = a % 10;
a = a / 10;
i++;
}
}
/* return number of DSDs in [first,last] inclusive. first, last in [1,1e9] */
int count_dsd (int first, int last)
{
int num, ds, count = 0, p[10] = {0};
num = first;
split (num, p);
ds = digitsum (num);
while (p[9] < 10) {
while (p[8] < 10) {
while (p[7] < 10) {
while (p[6] < 10) {
while (p[5] < 10) {
while (p[4] < 10) {
while (p[3] < 10) {
while (p[2] < 10) {
while (p[1] < 10) {
while (p[0] < 10) {
count += ((num % ds) == 0);
if (num == last) {
return count;
}
num++;
p[0]++;
ds++;
}
p[0] = 0;
p[1]++;
ds -= 9;
}
p[1] = 0;
p[2]++;
ds -= 9;
}
p[2] = 0;
p[3]++;
ds -= 9;
}
p[3] = 0;
p[4]++;
ds -= 9;
}
p[4] = 0;
p[5]++;
ds -= 9;
}
p[5] = 0;
p[6]++;
ds -= 9;
}
p[6] = 0;
p[7]++;
ds -= 9;
}
p[7] = 0;
p[8]++;
ds -= 9;
}
p[8] = 0;
p[9]++;
ds -= 9;
}
return count;
}
int main (void)
{
int i, first, last, *count, testcases;
scanf ("%d", &testcases);
count = malloc (testcases * sizeof(count[0]));
if (!count) return EXIT_FAILURE;
for (i = 0; i < testcases; i++) {
scanf ("%d %d", &first, &last);
count[i] = count_dsd (first, last);
}
for (i = 0; i < testcases; i++) {
printf ("%d\n", count[i]);
}
free (count);
return EXIT_SUCCESS;
}
I copied the sample inputs stated in the question into a text file testdata, and when I call the executable like so:
dsd < testdata
the output is as desired:
4
10
9
33
Solution in Java
Implement a program to find out whether a number is divisible by the sum of its digits.
Display appropriate messages.
class DivisibleBySum
{
public static void main(String[] args)
{
// Implement your code here
int num = 123;
int number = num;
int sum=0;
for(;num>0;num /=10)
{
int rem = num % 10;
sum += rem;
}
if(number %sum ==0)
System.out.println(number+" is divisible by sum of its digits");
else
System.out.println(number+" is not divisible by sum of its digits");
}
}
Given a sorted list of numbers, I would like to find the longest subsequence where the differences between successive elements are geometrically increasing. So if the list is
1, 2, 3, 4, 7, 15, 27, 30, 31, 81
then the subsequence is 1, 3, 7, 15, 31. Alternatively consider 1, 2, 5, 6, 11, 15, 23, 41, 47 which has subsequence 5, 11, 23, 47 with a = 3 and k = 2.
Can this be solved in O(n^2) time, where n is the length of the list?
I am interested both in the general case where the progression of differences is a·k, a·k^2, a·k^3, etc., where both a and k are integers, and in the special case where a = 1, so the progression of differences is k, k^2, k^3, etc.
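As a baseline for checking faster algorithms, here is a brute-force reference sketch (not the O(n^2) algorithm asked about). It assumes a sorted list of distinct values up to about 10^9 and ratios k ≥ 2; `longestGeometricDiffs` is a hypothetical name. It tries every pair as the first two terms and every k dividing their difference (so a = d / k), then extends while the next term is present.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

std::vector<int64_t> longestGeometricDiffs(const std::vector<int64_t>& arr) {
    std::unordered_set<int64_t> present(arr.begin(), arr.end());
    std::vector<int64_t> best(arr.begin(), arr.begin() + std::min<std::size_t>(arr.size(), 2));
    for (std::size_t i = 0; i < arr.size(); ++i)
        for (std::size_t j = i + 1; j < arr.size(); ++j) {
            const int64_t d = arr[j] - arr[i];       // first difference a*k
            for (int64_t k = 2; k <= d; ++k) {
                if (d % k != 0) continue;            // a = d / k must be an integer
                std::vector<int64_t> seq{arr[i], arr[j]};
                int64_t diff = d;
                while (true) {
                    diff *= k;                       // next difference a*k^(t+1)
                    if (diff > arr.back() - seq.back()) break;
                    if (!present.count(seq.back() + diff)) break;
                    seq.push_back(seq.back() + diff);
                }
                if (seq.size() > best.size()) best = seq;
            }
        }
    return best;
}
```

On the two examples in the question it returns 1, 3, 7, 15, 31 and 5, 11, 23, 47. The inner loop over k makes it far slower than O(n^2), so it is only suitable for small inputs.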
Update
I have improved the algorithm so that it takes O(M + N^2) time on average and O(M + N) memory. It is mostly the same as the procedure described below, but to calculate the possible factors A, K for each difference D, I precompute a table. This table takes less than a second to construct for M = 10^7.
I have made a C implementation that takes less than 10 minutes to solve N = 10^5 distinct random integer elements.
Here is the source code in C. To compile it, run: gcc -O3 -o findgeo findgeo.c -lm
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <time.h>
struct Factor {
int a;
int k;
struct Factor *next;
};
struct Factor *factors = 0;
int factorsL=0;
void ConstructFactors(int R) {
int a,k,C;
int R2;
struct Factor *f;
float seconds;
clock_t end;
clock_t start = clock();
if (factors) free(factors);
factors = malloc (sizeof(struct Factor) *((R>>1) + 1));
R2 = R>>1 ;
for (a=0;a<=R2;a++) {
factors[a].a= a;
factors[a].k=1;
factors[a].next=NULL;
}
factorsL=R2+1;
R2 = floor(sqrt(R));
for (k=2; k<=R2; k++) {
a=1;
C=a*k*(k+1);
while (C<R) {
C >>= 1;
f=malloc(sizeof(struct Factor));
*f=factors[C];
factors[C].a=a;
factors[C].k=k;
factors[C].next=f;
a++;
C=a*k*(k+1);
}
}
end = clock();
seconds = (float)(end - start) / CLOCKS_PER_SEC;
printf("Construct Table: %f\n",seconds);
}
void DestructFactors() {
int i;
struct Factor *f;
for (i=0;i<factorsL;i++) {
while (factors[i].next) {
f=factors[i].next->next;
free(factors[i].next);
factors[i].next=f;
}
}
free(factors);
factors=NULL;
factorsL=0;
}
int ipow(int base, int exp)
{
int result = 1;
while (exp)
{
if (exp & 1)
result *= base;
exp >>= 1;
base *= base;
}
return result;
}
void findGeo(int **bestSolution, int *bestSolutionL,int *Arr, int L) {
int i,j,D;
int mustExistToBeBetter;
int R=Arr[L-1]-Arr[0];
int *possibleSolution;
int possibleSolutionL=0;
int exp;
int NextVal;
int idx;
int kMax,aMax;
float seconds;
clock_t end;
clock_t start = clock();
kMax = floor(sqrt(R));
aMax = floor(R/2);
ConstructFactors(R);
*bestSolutionL=2;
*bestSolution=malloc(0);
possibleSolution = malloc(sizeof(int)*(R+1));
struct Factor *f;
int *H=malloc(sizeof(int)*(R+1));
memset(H,0, sizeof(int)*(R+1));
for (i=0;i<L;i++) {
H[ Arr[i]-Arr[0] ]=1;
}
for (i=0; i<L-2;i++) {
for (j=i+2; j<L; j++) {
D=Arr[j]-Arr[i];
if (D & 1) continue;
f = factors + (D >>1);
while (f) {
idx=Arr[i] + f->a * f->k - Arr[0];
if ((f->k <= kMax)&& (f->a<aMax)&&(idx<=R)&&H[idx]) {
if (f->k ==1) {
mustExistToBeBetter = Arr[i] + f->a * (*bestSolutionL);
} else {
mustExistToBeBetter = Arr[i] + f->a * f->k * (ipow(f->k,*bestSolutionL) - 1)/(f->k-1);
}
if (mustExistToBeBetter< Arr[L-1]+1) {
idx= floor(mustExistToBeBetter - Arr[0]);
} else {
idx = R+1;
}
if ((idx<=R)&&H[idx]) {
possibleSolution[0]=Arr[i];
possibleSolution[1]=Arr[i] + f->a*f->k;
possibleSolution[2]=Arr[j];
possibleSolutionL=3;
exp = f->k * f->k * f->k;
NextVal = Arr[j] + f->a * exp;
idx=NextVal - Arr[0];
while ( (idx<=R) && H[idx]) {
possibleSolution[possibleSolutionL]=NextVal;
possibleSolutionL++;
exp = exp * f->k;
NextVal = NextVal + f->a * exp;
idx=NextVal - Arr[0];
}
if (possibleSolutionL > *bestSolutionL) {
free(*bestSolution);
*bestSolution = possibleSolution;
possibleSolution = malloc(sizeof(int)*(R+1));
*bestSolutionL=possibleSolutionL;
kMax= floor( pow (R, 1.0 / (*bestSolutionL) )); /* 1.0: avoid integer division, which made the exponent 0 */
aMax= floor(R / (*bestSolutionL));
}
}
}
f=f->next;
}
}
}
if (*bestSolutionL == 2) {
free(*bestSolution);
possibleSolutionL=0;
for (i=0; (i<2)&&(i<L); i++ ) {
possibleSolution[possibleSolutionL]=Arr[i];
possibleSolutionL++;
}
*bestSolution = possibleSolution;
*bestSolutionL=possibleSolutionL;
} else {
free(possibleSolution);
}
DestructFactors();
free(H);
end = clock();
seconds = (float)(end - start) / CLOCKS_PER_SEC;
printf("findGeo: %f\n",seconds);
}
int compareInt (const void * a, const void * b)
{
return *(int *)a - *(int *)b;
}
int main(void) {
int N=100000;
int R=10000000;
int *A = malloc(sizeof(int)*N);
int *Sol;
int SolL;
int i;
int *S=malloc(sizeof(int)*R);
for (i=0;i<R;i++) S[i]=i+1;
for (i=0;i<N;i++) {
int r = rand() % (R-i);
A[i]=S[r];
S[r]=S[R-i-1];
}
free(S);
qsort(A,N,sizeof(int),compareInt);
/*
int step = floor(R/N);
A[0]=1;
for (i=1;i<N;i++) {
A[i]=A[i-1]+step;
}
*/
findGeo(&Sol,&SolL,A,N);
printf("[");
for (i=0;i<SolL;i++) {
if (i>0) printf(",");
printf("%d",Sol[i]);
}
printf("]\n");
printf("Size: %d\n",SolL);
free(Sol);
free(A);
return EXIT_SUCCESS;
}
Demonstration
I will try to demonstrate that the algorithm I proposed runs in O(M + N^2) on average for a uniformly distributed random sequence. I'm not a mathematician and I am not used to doing this kind of proof, so please feel free to correct any errors you see.
There are 4 nested loops; the first two give the N^2 factor. (The M term comes from computing the table of possible factors.)
The third loop is executed only once on average for each pair. You can see this by checking the size of the precomputed factors table: its size tends to M as N → ∞, so the average number of steps per pair is M/M = 1.
So what remains is to show that the fourth loop (the one that traverses the well-formed sequences) is executed at most O(N^2) times over all pairs.
To show that, I will consider two cases: one where M >> N and another where M ≈ N, where M is the maximum difference in the initial array: M = S(n) - S(1).
For the first case, (M>>N) the probability to find a coincidence is p=N/M. To start a sequence, it must coincide the second and the b+1 element where b is the length of the best sequence until now. So the loop will enter times. And the average length of this series (supposing an infinite series) is . So the total number of times that the loop will be executed is . And this is close to 0 when M>>N. The problem here is when M~=N.
Now let's consider the case where M ≈ N, and let b be the best sequence length found so far. For the case A = k = 1, the sequence must start before N - b, so the number of sequences will be N - b, and the loop will run at most (N - b)*b times.
For A > 1 and k = 1 we can extrapolate to where d is M/N (the average distance between numbers). Summing over all A from 1 to dN/b, we get an upper bound of:
For the cases where k >= 2, the sequence must start before , so the loop will be entered on average , and summing over all A from 1 to dN/k^b gives a bound of
Here the worst case is when b is minimal. Since we are looking for an upper bound, let's take a very pessimistic b = 2, so the number of passes through the fourth loop for a given k will be less than:
Summing over all k from 2 to infinity gives:
So, adding all the passes for k = 1 and k >= 2, we get a maximum of:
Note that d=M/N=1/p.
So we have two bounds: one that goes to infinity as d = 1/p = M/N goes to 1, and another that goes to infinity as d goes to infinity. The real bound is the minimum of the two, and the worst case is where the two curves cross. Solving the equation:
we find that the maximum is at d = 1.353.
This shows that the fourth loop is executed fewer than 1.55N^2 times in total.
Of course, this is the average case. For the worst case I have not been able to construct sequences for which the fourth loop runs more than O(N^2) times, and I strongly believe they do not exist, but I am not a mathematician and cannot prove it.
Old Answer
Here is a solution that runs in O((n^2)*cube_root(M)) on average, where M is the difference between the first and last elements of the array, with O(M+N) memory.
1.- Construct an array H of length M+1 such that H[i - S[0]] = true if i exists in the initial array and false otherwise.
2.- For each pair S[j], S[i] (j < i) in the array:
2.1 Check whether they can be the first and third elements of a possible solution. To do so, calculate all A,K pairs that satisfy S(i) = S(j) + AK + AK^2 (check this SO question to see how to solve this), and check that the second element S[j] + A*K exists.
2.2 Also check that the element one position beyond the best solution found so far exists. For example, if the best solution so far is 4 elements long, check that S[j] + AK + AK^2 + AK^3 + AK^4 exists.
2.3 If 2.1 and 2.2 both hold, follow the series as far as it goes and, if it is longer than the current best, store it as bestSolution.
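Step 2.1 boils down to finding every pair (A, K) of positive integers with A*K*(K+1) = S(i) - S(j). Below is a minimal Python sketch of just that step (get_aks is a hypothetical name). Like the getAKs function below, it tries each value up to the cube root of the difference once as K and once as A; the difference is that it uses exact integer arithmetic (math.isqrt) instead of a floating-point square root, which is my own assumption about how to avoid rounding errors on large differences.

```python
from math import isqrt


def get_aks(d):
    """Return every (A, K) with A >= 1, K >= 1 and A * K * (K + 1) == d.

    If both A and K exceeded cbrt(d), A*K*(K+1) would exceed d, so trying
    each i with i**3 <= d once as K and once as A finds every solution.
    """
    if d <= 0 or d % 2:              # K*(K+1) is always even
        return []
    found = set()
    i = 1
    while i * i * i <= d:
        # Case 1: K = i, so A = d / (i * (i + 1)) must be an integer.
        if d % (i * (i + 1)) == 0:
            found.add((d // (i * (i + 1)), i))
        # Case 2: A = i, so K*(K+1) = d / i, i.e. K = (-1 + sqrt(1 + 4*d/i)) / 2.
        if d % i == 0:
            disc = 1 + 4 * (d // i)
            r = isqrt(disc)
            if r * r == disc:        # disc is odd, so r is odd and K is an integer
                found.add((i, (r - 1) // 2))
        i += 1
    return sorted(found)
```

For example, get_aks(12) yields (1, 3), (2, 2) and (6, 1), since 1*3*4 = 2*2*3 = 6*1*2 = 12.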
Here is the code in JavaScript:
function getAKs(A) {
if (A / 2 != Math.floor(A / 2)) return [];
var solution = [];
var i;
var SR3 = Math.pow(A, 1 / 3);
for (i = 1; i <= SR3; i++) {
var B, C;
C = i;
B = A / (C * (C + 1));
if (B == Math.floor(B)) {
solution.push([B, C]);
}
B = i;
C = (-1 + Math.sqrt(1 + 4 * A / B)) / 2;
if (C == Math.floor(C)) {
solution.push([B, C]);
}
}
return solution;
}
function getBestGeometricSequence(S) {
var i, j, k;
var bestSolution = [];
var H = Array(S[S.length-1]-S[0]+1);
for (i = 0; i < S.length; i++) H[S[i] - S[0]] = true;
for (i = 0; i < S.length; i++) {
for (j = 0; j < i; j++) {
var PossibleAKs = getAKs(S[i] - S[j]);
for (k = 0; k < PossibleAKs.length; k++) {
var A = PossibleAKs[k][0];
var K = PossibleAKs[k][1];
var mustExistToBeBetter;
if (K==1) {
mustExistToBeBetter = S[j] + A * bestSolution.length;
} else {
mustExistToBeBetter = S[j] + A * K * (Math.pow(K,bestSolution.length) - 1)/(K-1);
}
if ((H[S[j] + A * K - S[0]]) && (H[mustExistToBeBetter - S[0]])) {
var possibleSolution=[S[j],S[j] + A * K,S[i]];
var exp = K * K * K;
var NextVal = S[i] + A * exp;
while (H[NextVal - S[0]] === true) {
possibleSolution.push(NextVal);
exp = exp * K;
NextVal = NextVal + A * exp;
}
if (possibleSolution.length > bestSolution.length) {
bestSolution = possibleSolution;
}
}
}
}
}
return bestSolution;
}
//var A= [ 1, 2, 3,5,7, 15, 27, 30,31, 81];
var A=[];
for (i=1;i<=3000;i++) {
A.push(i);
}
var sol=getBestGeometricSequence(A);
$("#result").html(JSON.stringify(sol));
You can check the code here: http://jsfiddle.net/6yHyR/1/
I am keeping the other solution because I believe it is still better when M is much larger than N.
Just to start with something, here is a simple solution in JavaScript:
var input = [0.7, 1, 2, 3, 4, 7, 15, 27, 30, 31, 81],
output = [], indexes, values, i, index, value, i_max_length,
i1, i2, i3, j1, j2, j3, difference12a, difference23a, difference12b, difference23b,
scale_factor, common_ratio_1a, common_ratio_2a, common_ratio_1b, common_ratio_2b,
error, EPSILON = 1e-9, common_ratio_is_integer,
resultDiv = $("#result");
for (i1 = 0; i1 < input.length - 2; ++i1) {
for (i2 = i1 + 1; i2 < input.length - 1; ++i2) {
scale_factor = difference12a = input[i2] - input[i1];
for (i3 = i2 + 1; i3 < input.length; ++i3) {
difference23a = input[i3] - input[i2];
common_ratio_1a = difference23a / difference12a;
common_ratio_2a = Math.round(common_ratio_1a);
error = Math.abs((common_ratio_2a - common_ratio_1a) / common_ratio_1a);
common_ratio_is_integer = error < EPSILON;
if (common_ratio_2a > 1 && common_ratio_is_integer) {
indexes = [i1, i2, i3];
j1 = i2;
j2 = i3
difference12b = difference23a;
for (j3 = j2 + 1; j3 < input.length; ++j3) {
difference23b = input[j3] - input[j2];
common_ratio_1b = difference23b / difference12b;
common_ratio_2b = Math.round(common_ratio_1b);
error = Math.abs((common_ratio_2b - common_ratio_1b) / common_ratio_1b);
common_ratio_is_integer = error < EPSILON;
if (common_ratio_is_integer && common_ratio_2a === common_ratio_2b) {
indexes.push(j3);
j1 = j2;
j2 = j3
difference12b = difference23b;
}
}
values = [];
for (i = 0; i < indexes.length; ++i) {
index = indexes[i];
value = input[index];
values.push(value);
}
output.push(values);
}
}
}
}
if (output.length > 0) {
i_max_length = 0;
for (i = 1; i < output.length; ++i) {
if (output[i_max_length].length < output[i].length)
i_max_length = i;
}
for (i = 0; i < output.length; ++i) {
if (output[i_max_length].length == output[i].length)
resultDiv.append("<p>[" + output[i] + "]</p>");
}
}
Output:
[1, 3, 7, 15, 31]
I take the first three items of every candidate subsequence and calculate the scale factor and the common ratio from them. If the common ratio is an integer, I iterate over the remaining elements after the third one and add to the subsequence those that fit the geometric progression defined by the first three items. As a last step, I select the subsequence(s) with the greatest length.
In fact it is exactly the same question as Longest equally-spaced subsequence; you just have to take the logarithm of your data. If the sequence is a, ak, ak^2, ak^3, the logarithms are ln(a), ln(a) + ln(k), ln(a) + 2ln(k), ln(a) + 3ln(k), so they are equally spaced. The converse is of course also true. There is a lot of different code in that question.
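A minimal Python sketch of that reduction, assuming a sorted list of distinct positive numbers: the usual longest-equally-spaced DP keyed by step size becomes a DP keyed by ratio. To sidestep floating-point logarithms it compares exact ratios with fractions.Fraction, which is equivalent to comparing log differences. The function name is hypothetical.

```python
from fractions import Fraction


def longest_geometric_subsequence(s):
    """Longest subsequence x, x*r, x*r**2, ... of a sorted list of distinct positive numbers."""
    n = len(s)
    if n < 2:
        return list(s)
    best = [dict() for _ in range(n)]  # best[i][r]: run length with ratio r ending at s[i]
    prev = [dict() for _ in range(n)]  # prev[i][r]: index of the previous element in that run
    best_len, best_end, best_ratio = 1, 0, None
    for i in range(n):
        for j in range(i):
            r = Fraction(s[i]) / Fraction(s[j])   # plays the role of ln(s[i]) - ln(s[j])
            length = best[j].get(r, 1) + 1
            if length > best[i].get(r, 0):
                best[i][r] = length
                prev[i][r] = j
                if length > best_len:
                    best_len, best_end, best_ratio = length, i, r
    # Walk the chain of predecessors back from the end of the best run.
    out, i = [], best_end
    while True:
        out.append(s[i])
        if best_ratio not in prev[i]:
            break
        i = prev[i][best_ratio]
    return out[::-1]
```

For example, [1, 2, 3, 4, 8, 9, 16] gives [1, 2, 4, 8, 16].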
I don't think the special case a=1 can be resolved more efficiently than an adaptation from an algorithm above.
Here is my solution in JavaScript. It should be close to O(n^2), except perhaps in some pathological cases.
function bsearch(Arr,Val, left,right) {
if (left == right) return left;
var m=Math.floor((left + right) /2);
if (Val <= Arr[m]) {
return bsearch(Arr,Val,left,m);
} else {
return bsearch(Arr,Val,m+1,right);
}
}
function findLongestGeometricSequence(S) {
var bestSolution=[];
var i,j,k;
var H={};
for (i=0;i<S.length;i++) H[S[i]]=true;
for (i=0;i<S.length;i++) {
for (j=0;j<i;j++) {
for (k=j+1;k<i;) {
var possibleSolution=[S[j],S[k],S[i]];
var K = (S[i] - S[k]) / (S[k] - S[j]);
var A = (S[k] - S[j]) * (S[k] - S[j]) / (S[i] - S[k]);
if ((Math.floor(K) == K) && (Math.floor(A)==A)) {
var exp = K*K*K;
var NextVal= S[i] + A * exp;
while (H[NextVal] === true) {
possibleSolution.push(NextVal);
exp = exp * K;
NextVal= NextVal + A * exp;
}
if (possibleSolution.length > bestSolution.length)
bestSolution=possibleSolution;
K--;
} else {
K=Math.floor(K);
}
if (K>0) {
var NextPossibleMidValue= (S[i] + K*S[j]) / (K +1);
k++;
if (S[k]<NextPossibleMidValue) {
k=bsearch(S,NextPossibleMidValue, k+1, i);
}
} else {
k=i;
}
}
}
}
return bestSolution;
}
function Run() {
var MyS= [0.7, 1, 2, 3, 4, 5,6,7, 15, 27, 30,31, 81];
var sol = findLongestGeometricSequence(MyS);
alert(JSON.stringify(sol));
}
Small Explanation
If we take three numbers of the array S(j) < S(k) < S(i), we can calculate a and k such that S(k) = S(j) + a*k and S(i) = S(k) + a*k^2 (two equations, two unknowns). With that, we can check whether the array contains S(next) = S(i) + a*k^3. If it does, continue checking for S(next2) = S(next) + a*k^4, and so on.
This would be an O(n^3) solution, but we can take advantage of the fact that k must be an integer to limit which S(k) points are tried.
If a is known, k follows from S(i) - S(j) = a*k*(k+1), so only one number needs to be checked in the third loop, and this case is clearly O(n^2).
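The two equations above can be solved exactly. This Python sketch (hypothetical helper names solve_ak and extend; integer inputs with S(j) < S(k) < S(i) are assumed) computes k = (S(i) - S(k)) / (S(k) - S(j)) and a = (S(k) - S(j))^2 / (S(i) - S(k)), as the JavaScript code above does, and then keeps adding a*k^3, a*k^4, ... while the next value is present.

```python
from fractions import Fraction


def solve_ak(sj, sk, si):
    """Solve S(k) = S(j) + a*k and S(i) = S(k) + a*k**2; return (a, k) or None.

    Dividing the equations gives k = (si - sk) / (sk - sj); substituting back
    gives a = (sk - sj)**2 / (si - sk). Both must be integers.
    """
    k = Fraction(si - sk, sk - sj)
    a = Fraction((sk - sj) ** 2, si - sk)
    if k.denominator != 1 or a.denominator != 1:
        return None
    return int(a), int(k)


def extend(values, sj, sk, si):
    """Grow the run seeded by (sj, sk, si) while prev + a*k**m is in the set `values`."""
    ak = solve_ak(sj, sk, si)
    if ak is None:
        return None
    a, k = ak
    run, step = [sj, sk, si], a * k ** 3
    while run[-1] + step in values:
        run.append(run[-1] + step)
        step *= k
    return run
```

With values = {1, 3, 7, 15, 31, 81}, the seed (1, 3, 7) gives a = 1, k = 2 and extends to [1, 3, 7, 15, 31].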
I think this task is related to the recently posted Longest equally-spaced subsequence question. I've just modified my Python algorithm for it a little:
from math import isqrt


def add_precalc(precalc, end, step, count, res, N):
    # Record that a sequence with `count` elements ending at `end`, using
    # step = (a, k), can continue at end + a*k**count.
    a, k = step
    # Prune: the sequence cannot reach the current best length without exceeding N.
    if end + a * k ** res[1]["count"] > N: return
    x = end + a * k ** count
    if x > N or x < 0: return
    if precalc[x] is None: return
    if (a, k) not in precalc[x]:
        precalc[x][(a, k)] = count
    return


def factors(n):
    # All ordered pairs (a, k) with a * k == n.
    res = []
    for x in range(1, isqrt(n) + 1):
        if n % x == 0:
            y = n // x
            res.append((x, y))
            res.append((y, x))
    return res


def work(input):
    # Assumes `input` is a sorted list of distinct non-negative integers.
    N = max(input)
    precalc = [None] * (N + 1)
    for x in input: precalc[x] = {}
    res = ((0, 0), {"end": 0, "count": 0})
    for i, x in enumerate(input):
        for y in input[i::-1]:
            for a, k in factors(x - y):
                if (a, k) in precalc[x]: continue
                add_precalc(precalc, x, (a, k), 2, res, N)
        for step, count in precalc[x].items():
            count += 1
            if count > res[1]["count"]: res = (step, {"end": x, "count": count})
            add_precalc(precalc, x, step, count, res, N)
        precalc[x] = None
    # Rebuild the best sequence backwards from its last element.
    d = [res[1]["end"]]
    for x in range(res[1]["count"] - 1, 0, -1):
        d.append(d[-1] - res[0][0] * res[0][1] ** x)
    d.reverse()
    return d
Explanation
Traversing the array:
for each previous element of the array, calculate the factors of the difference between the current element and that previous element, then precalculate the next possible element of the sequence and save it in the precalc array.
So when we arrive at element i, all possible sequences that include element i are already in the precalc array, and we only have to calculate the next possible element of each and save it to precalc.
Currently there is one place in the algorithm that could be slow: factorizing the difference with each previous number. I think it could be made faster with two optimizations:
a more efficient factorization algorithm
a way to avoid looking at every previous element, using the fact that the array is sorted and there are already precalculated sequences
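One possible version of the first optimization, sketched in Python under the assumption that all values are non-negative integers: precompute a smallest-prime-factor table up to max(input) once with a sieve, then build each difference's divisors from its prime factorization instead of trial-dividing up to sqrt(n). build_spf and factor_pairs are hypothetical names; factor_pairs returns the same set of ordered (a, k) pairs as factors() above, without the duplicate entry for perfect squares.

```python
from math import isqrt


def build_spf(limit):
    """spf[x] = smallest prime factor of x for 2 <= x <= limit (sieve of Eratosthenes)."""
    spf = list(range(limit + 1))
    for p in range(2, isqrt(limit) + 1):
        if spf[p] == p:                      # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:              # keep only the smallest prime
                    spf[m] = p
    return spf


def factor_pairs(n, spf):
    """All ordered pairs (a, k) with a * k == n, for 0 <= n < len(spf); [] for n == 0."""
    if n <= 0:
        return []
    divisors = [1]
    rest = n
    while rest > 1:
        p, e = spf[rest], 0
        while rest % p == 0:
            rest //= p
            e += 1
        # Multiply every divisor found so far by p**0 .. p**e.
        divisors = [d * p ** i for d in divisors for i in range(e + 1)]
    return [(d, n // d) for d in sorted(divisors)]
```

After the one-off sieve, each factorization takes O(log n) divisions, so the cost per pair is dominated by the number of divisors rather than by sqrt(n).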
Python:
def subseq(a):
    seq = []
    aset = set(a)
    for i, x in enumerate(a):
        # elements after x
        for j, x2 in enumerate(a[i+1:]):
            j += i + 1  # enumerate starts j at 0, we want a[j] = x2
            bk = x2 - x  # b*k (assuming k and k's exponent start at 1)
            # given b*k, bruteforce values of k
            for k in range(1, bk + 1):
                items = [x, x2]  # our subsequence so far
                nextdist = bk * k  # what x3 - x2 should look like
                while items[-1] + nextdist in aset:
                    items.append(items[-1] + nextdist)
                    nextdist *= k
                if len(items) > len(seq):
                    seq = items
    return seq
Running time is O(dn^3), where d is the (average?) distance between two elements and n is, of course, len(a).
I need GCD(n, i) for every i from 1 up to n, increasing i by 1 in a loop. Is there any algorithm that calculates all of these GCDs faster than the naive loop that computes each one with the Euclidean algorithm?
PS: I've noticed that if n is prime, every number from 1 to n-1 gives 1, because a prime is coprime to all of them. Any ideas for numbers other than primes?
C++ implementation that works in O(n * log log n) (assuming arithmetic on integers is O(1)):
#include <cstdio>
#include <cstring>
using namespace std;
void find_gcd(int n, int *gcd) {
// divisor[x] - any prime divisor of x
// or 0 if x == 1 or x is prime
int *divisor = new int[n + 1];
memset(divisor, 0, (n + 1) * sizeof(int));
// This is almost a copy-paste of the sieve of Eratosthenes, but instead of
// just marking a number as 'non-prime' we remember its divisor.
// O(n * log log n)
for (int x = 2; x * x <= n; ++x) {
if (divisor[x] == 0) {
for (int y = x * x; y <= n; y += x) {
divisor[y] = x;
}
}
}
for (int x = 1; x <= n; ++x) {
if (n % x == 0) gcd[x] = x;
else if (divisor[x] == 0) gcd[x] = 1; // x is prime, and does not divide n (previous line)
else {
int a = x / divisor[x], p = divisor[x]; // x == a * p
// gcd(a * p, n) = gcd(a, n) * gcd(p, n / gcd(a, n))
// gcd(p, n / gcd(a, n)) == 1 or p
gcd[x] = gcd[a];
if ((n / gcd[a]) % p == 0) gcd[x] *= p;
}
}
delete[] divisor;
}
int main() {
int n;
scanf("%d", &n);
int *gcd = new int[n + 1];
find_gcd(n, gcd);
for (int x = 1; x <= n; ++x) {
printf("%d:\t%d\n", x, gcd[x]);
}
delete[] gcd;
return 0;
}
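To check the recurrence used above, gcd(a*p, n) = gcd(a, n) * gcd(p, n / gcd(a, n)), here is a Python sketch of the same idea. It is an illustration rather than a line-by-line port: it uses the smallest prime factor of each x (the identity holds for any prime divisor), and gcd_table is a hypothetical name.

```python
from math import isqrt


def gcd_table(n):
    """Return g with g[x] == gcd(x, n) for 1 <= x <= n (n >= 1)."""
    spf = list(range(n + 1))  # smallest prime factor table
    for p in range(2, isqrt(n) + 1):
        if spf[p] == p:
            for m in range(p * p, n + 1, p):
                if spf[m] == m:
                    spf[m] = p
    g = [0] * (n + 1)
    for x in range(1, n + 1):
        if n % x == 0:
            g[x] = x          # x divides n
        elif spf[x] == x:
            g[x] = 1          # x is a prime that does not divide n
        else:
            p = spf[x]
            a = x // p        # x == a * p, and g[a] is already known
            g[x] = g[a] * p if (n // g[a]) % p == 0 else g[a]
    return g
```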
SUMMARY
The possible answers for the gcd consist of the factors of n.
You can compute these efficiently as follows.
ALGORITHM
First factorise n into a product of prime factors, i.e. n=p1^n1*p2^n2*..*pk^nk.
Then loop over all factors of n, and for each factor f set GCD[x] = f for every multiple x of f.
If the factors are processed in a sensible order (e.g. sorted), entries that are written multiple times end up holding the largest factor of n that divides them, which is the gcd.
CODE
Here is some Python code to do this for the number 1400=2^3*5^2*7:
prime_factors=[2,5,7]
prime_counts=[3,2,1]
N=1
for prime,count in zip(prime_factors,prime_counts):
N *= prime**count
GCD = [0]*(N+1)
GCD[0] = N
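def go(i, n):
    """Try all counts for prime_factors[i]; at the leaf, n is a factor of N."""
    if i == len(prime_factors):
        # The recursion visits every divisor of n before n itself,
        # so the last write to GCD[x] is gcd(x, N).
        for x in range(n, N + 1, n):
            GCD[x] = n
        return
    n2 = n
    for c in range(prime_counts[i] + 1):
        go(i + 1, n2)
        n2 *= prime_factors[i]

go(0, 1)
print(N, GCD)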
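If you would rather not hard-code the factorization, a simpler variant (a Python sketch; the function name is hypothetical) finds the divisors of n by trial division up to sqrt(n) and processes them in ascending order, which is one of the sensible orders mentioned above. The total work is the sum of n/d over the divisors d of n.

```python
from math import isqrt


def gcd_table_by_divisors(n):
    """Return t with t[x] == gcd(x, n) for 0 <= x <= n (n >= 1).

    Each divisor d of n is written into every multiple of d; with the
    divisors in ascending order, the last write to t[x] is the largest
    divisor of n that also divides x, i.e. gcd(x, n).
    """
    small = [d for d in range(1, isqrt(n) + 1) if n % d == 0]
    divisors = sorted(set(small + [n // d for d in small]))
    t = [0] * (n + 1)
    for d in divisors:
        for x in range(d, n + 1, d):
            t[x] = d
    t[0] = n  # gcd(0, n) == n
    return t
```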
The binary GCD algorithm (https://en.wikipedia.org/wiki/Binary_GCD_algorithm) can be faster than the Euclidean algorithm (https://en.wikipedia.org/wiki/Euclidean_algorithm).
I implemented "gcd()" in C for the type "__uint128_t" (gcc on an Intel i7 under Ubuntu), based on the iterative Rust version:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm#Iterative_version_in_Rust
Trailing zeros are counted efficiently with "__builtin_ctzll()". I benchmarked 1 million iterations on the two largest 128-bit Fibonacci numbers (they produce the maximal number of iterations) against GMP's "mpz_gcd()" and saw a 10% slowdown. Exploiting the fact that u and v only decrease, I switched to a 64-bit special case "_gcd()" once the values are <= UINT64_MAX, and now see a 1.31x speedup over GMP. For details see:
https://www.raspberrypi.org/forums/viewtopic.php?f=33&t=311893&p=1873552#p1873552
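#include <stdint.h>
/* Count trailing zero bits of a nonzero 128-bit value. */
static inline int ctz(__uint128_t u)
{
unsigned long long h = u; /* low 64 bits */
return (h!=0) ? __builtin_ctzll( h )
: 64 + __builtin_ctzll( (unsigned long long)(u>>64) );
}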
/* Binary GCD of two odd, nonzero 64-bit values. */
unsigned long long _gcd(unsigned long long u, unsigned long long v)
{
for(;;) {
if (u > v) { unsigned long long a=u; u=v; v=a; }
v -= u;
if (v == 0) return u;
v >>= __builtin_ctzll(v);
}
}
__uint128_t gcd(__uint128_t u, __uint128_t v)
{
if (u == 0) { return v; }
else if (v == 0) { return u; }
int i = ctz(u); u >>= i;
int j = ctz(v); v >>= j;
int k = (i < j) ? i : j;
for(;;) {
if (u > v) { __uint128_t a=u; u=v; v=a; }
if (v <= UINT64_MAX) return (__uint128_t)_gcd(u, v) << k; /* cast first: k can be 64 or more */
v -= u;
if (v == 0) return u << k;
v >>= ctz(v);
}
}
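For comparison, here is a minimal Python sketch of the same binary GCD loop (Stein's algorithm) for arbitrary-size non-negative integers. binary_gcd is a hypothetical name, and (x & -x).bit_length() - 1 stands in for __builtin_ctzll. In practice Python's built-in math.gcd is what you would use; this only shows the algorithm.

```python
def binary_gcd(u, v):
    """Binary GCD (Stein's algorithm) for non-negative integers."""
    if u == 0:
        return v
    if v == 0:
        return u
    i = (u & -u).bit_length() - 1  # trailing zero bits of u
    j = (v & -v).bit_length() - 1  # trailing zero bits of v
    k = min(i, j)                  # common power of two
    u >>= i
    v >>= j
    while True:
        # Both u and v are odd here.
        if u > v:
            u, v = v, u
        v -= u                     # difference of two odds: even or zero
        if v == 0:
            return u << k
        v >>= (v & -v).bit_length() - 1
```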