Summing intervals - algorithm

I have to sum intervals like these:
1..6
2..4
The result is 1..6, so there are 6 numbers in the end.
Here's another example:
4..6
8..10
14..16
4, 5, 6, 8, 9, 10, 14, 15, 16, the size is 9.
Now, I have to do this in O(N). Here's a not-so-good approach I quickly came up with using the STL:
#include <set>
#include <stdio.h>
using namespace std;
int main() {
int n;
scanf("%d", &n);
set<int> numbers;
int a, b;
for (int i = 0; i < n; i++) {
scanf("%d %d", &a, &b);
for (int u = a; u <= b; u++) {
numbers.insert(u);
}
}
printf("%d\n", numbers.size());
return 0;
}
Any idea how this can be done in O(N)? I know I have to sort the intervals first, and I can use this comparator I just wrote:
bool compare(const vector<int>& first, const vector<int>& second) {
if (first[0] == second[0]) return first[1] < second[1];
return first[0] < second[0];
}
sort(intervals.begin(), intervals.end(), compare);
So it'd be O(N log N) for the sort plus O(N) for the rest.
Any ideas? Thank you.

If n is the number of intervals then I don't think that there is a way to do this that is not O(n log(n)).
But if we're willing to face that, the first step is to sort the intervals by their left-hand value. (This takes time O(n log(n)).) Then you try to compute a minimal set of intervals in the union according to the following pseudo-code
answer = 0
while intervals left
(min, max) = next interval
while intervals left and min of next interval < max:
if max < max of next interval:
max = max of next interval
move forward in interval list
# the next interval is [min..max]
answer += max - min + 1
(This code is linear in the number of intervals; the only non-linear piece is the sorting.)
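For concreteness, here is a quick C++ translation of that pseudo-code; a minimal sketch, assuming the input format from the question (n followed by n pairs):

#include <algorithm>
#include <cstdio>
#include <vector>
using namespace std;

int main() {
    int n;
    scanf("%d", &n);
    vector<pair<int, int>> intervals(n);
    for (auto& iv : intervals)
        scanf("%d %d", &iv.first, &iv.second);
    // sort by left endpoint: O(n log n); everything after this is O(n)
    sort(intervals.begin(), intervals.end());
    long long answer = 0;
    size_t i = 0;
    while (i < intervals.size()) {
        long long lo = intervals[i].first;
        long long hi = intervals[i].second;
        // absorb every following interval that starts inside [lo, hi]
        while (i + 1 < intervals.size() && intervals[i + 1].first <= hi) {
            if (intervals[i + 1].second > hi) hi = intervals[i + 1].second;
            ++i;
        }
        answer += hi - lo + 1;   // count the points of the merged interval [lo, hi]
        ++i;
    }
    printf("%lld\n", answer);
    return 0;
}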

I did it time ago in OCaml, here's the code:
let rec calc c l1 l2 =
match c,l1,l2 with
None, (f1,t1) :: y1, ((f2,t2) :: y2 as n2) when f1 < f2 -> calc (Some (f1,t1)) y1 n2
| None, n1, (f2,t2) :: y2 -> calc (Some (f2,t2)) n1 y2
| None, _, _ -> []
| (Some (fc,tc) as cur), (f1,t1) :: y1, ((f2,t2) :: y2 as n2) when t1 <= fc -> calc cur y1 n2
| (Some (fc,tc) as cur), ((f1,t1) :: y1 as n1), (f2,t2) :: y2 when t2 <= fc -> calc cur n1 y2
| Some (fc,tc), (f1,t1) :: y1, ((f2,t2) :: y2 as n2) when f1 <= tc && t1 > fc -> calc (Some (fc,t1)) y1 n2
| Some (fc,tc), ((f1,t1) :: y1 as n1), (f2,t2) :: y2 when f2 <= tc && t2 > fc -> calc (Some (fc,t2)) n1 y2
| Some (fc,tc), (f1,t1) :: y1, ((f2,t2) :: y2 as n2) when f1 < f2 -> [fc,tc] @ calc (Some (f1,t1)) y1 n2
| Some (fc,tc), (t :: e as n1), (f2,t2) :: y2 -> [fc,tc] @ calc (Some (f2,t2)) n1 y2
| Some (fc,tc), [], (f,t) :: tr when f <= tc && t > tc -> calc (Some (fc,t)) [] tr
| Some (fc,tc), [], (f,t) :: tr when f <= tc && t <= tc -> calc (Some (fc,tc)) [] tr
| Some (fc,tc), [], x -> [fc,tc] @ x
| Some (fc,tc), (f,t) :: tr, [] when f <= tc && t > tc -> calc (Some (fc,t)) tr []
| Some (fc,tc), (f,t) :: tr, [] when f <= tc && t <= tc -> calc (Some (fc,tc)) tr []
| Some (fc,tc), x, [] -> [fc,tc] @ x
This computes the union of two range lists (two arbitrary sets of interval pairs) and it's O(N+M), where N and M are the numbers of single intervals in each set. The result is sorted.
After this you can easily compute the list in linear time:
List.fold_left (fun a (f,t) -> for i = f to t do a := !a @ [Int i] done; a) (ref []) range
Ok, this is OCaml, but I had it ready, so maybe it will be useful to you, especially the tricky part that merges intervals by deleting the overlapping parts; I spent some time figuring out that algorithm and couldn't describe it to you in metacode (as you can see from the implementation).

I believe the best complexity you can achieve here is O(N*log(N)), where N is the number of intervals. The solution is not very hard: you need to first sort the intervals by their beginning and then do another linear pass to compute their union. I will try to write some code in C++:
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

struct Interval {
int from, to;
bool operator<(const Interval& other) const {
if(from != other.from) {
return from < other.from;
}
return to < other.to;
}
};
int main() {
int n;
cin >> n;
vector<Interval> intervals(n);
for (int i = 0; i < n; ++i) {
cin >> intervals[i].from >> intervals[i].to;
}
sort(intervals.begin(), intervals.end());
if (intervals.empty()) { cout << 0 << endl; return 0; }
int current_position = intervals[0].from;
int sum = 0;
for (size_t i = 0; i < intervals.size(); ++i) {
if (intervals[i].to < current_position) {
continue;
} else if (intervals[i].from <= current_position) {
sum += intervals[i].to - current_position + 1;
current_position = intervals[i].to + 1;
} else {
sum += intervals[i].to - intervals[i].from + 1;
current_position = intervals[i].to + 1;
}
}
std::cout << sum << std::endl;
return 0;
}

First, let's be clear about what N is - is it the number of segments?
If this is the case, then you can't always do it: simply printing out all the individual numbers within the segments (call that count m) takes O(m) time. The fastest algorithm, then, can't be better than O(m+n).

Related

Counting inversions in a segment with updates

I'm trying to solve a problem which goes like this:
Problem
Given an array of integers "arr" of size "n", process two types of queries. There are "q" queries you need to answer.
Query type 1
input: l r
result: output number of inversions in [l, r]
Query type 2
input: x y
result: update the value at arr [x] to y
Inversion
For every index j < i, if arr [j] > arr [i], the pair (j, i) is one inversion.
Input
n = 5
q = 3
arr = {1, 4, 3, 5, 2}
queries:
type = 1, l = 1, r = 5
type = 2, x = 1, y = 4
type = 1, l = 1, r = 5
Output
4
6
Constraints
Time: 4 secs
1 <= n, q <= 100000
1 <= arr [i] <= 40
1 <= l, r, x <= n
1 <= y <= 40
I know how to solve a simpler version of this problem without updates, i.e. simply counting the number of inversions for each position using a segment tree or Fenwick tree in O(N*log(N)). The only solution I have for this problem is O(q*N*log(N)) (I think) with a segment tree, other than the trivial O(q*N^2) algorithm. This, however, does not fit within the time constraints of the problem. I would like hints towards a better algorithm that solves the problem in O(N*log(N)) (if that's possible) or maybe O(N*log^2(N)).
I first came across this problem two days ago and have been spending a few hours here and there to try and solve it. However, I'm finding it non-trivial to do so and would like to have some help/hints regarding the same. Thanks for your time and patience.
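For reference, the simpler no-updates version mentioned above can be done with a Fenwick tree over the values; a minimal sketch, assuming the values are in 1..40 as in the constraints:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<int> arr(n);
    for (int& x : arr) cin >> x;

    // Fenwick tree over the values 1..40
    vector<int> bit(41, 0);
    auto update = [&](int v) { for (; v <= 40; v += v & -v) ++bit[v]; };
    auto query  = [&](int v) { int s = 0; for (; v > 0; v -= v & -v) s += bit[v]; return s; };

    long long inversions = 0;
    for (int i = 0; i < n; ++i) {
        // elements already seen that are strictly greater than arr[i]
        inversions += i - query(arr[i]);
        update(arr[i]);
    }
    cout << inversions << '\n';
}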
Updates
Solution
With the suggestion, answer and help by Tanvir Wahid, I've implemented the source code for the problem in C++ and would like to share it here for anyone who might stumble across this problem and not have an intuitive idea on how to solve it. Thank you!
Let's build a segment tree with each node containing information about how many inversions exist and the frequency count of elements present in its segment of authority.
node {
integer inversion_count : 0
array [40] frequency : {0...0}
}
Building the segment tree and handling updates
For each leaf node, initialise the inversion count to 0 and set the frequency of the represented element from the input array to 1. The frequencies of a parent node can be calculated by summing up the frequencies of its left and right children. The inversion count of a parent node is the sum of the inversion counts of the left and right children, plus the new inversions created upon merging the two segments of their authority, which can be calculated using the frequencies of elements in each child. This calculation is essentially the sum, over all pairs, of the product of the frequency of a bigger element in the left child and the frequency of a smaller element in the right child.
parent.inversion_count = left.inversion_count + right.inversion_count
for i in [39, 0]
for j in [0, i)
parent.inversion_count += left.frequency [i] * right.frequency [j]
Updates are handled similarly.
Answering range queries on inversion counts
To answer the query for the number of inversions in the range [l, r], we calculate the inversions using the source code attached below.
Time Complexity: O(q*log(n)) segment-tree node merges, each costing O(40^2) work, i.e. O(40^2 * q * log(n)) overall.
Note
The source code attached does break some good programming habits. The sole purpose of the code is to "solve" the given problem and not to accomplish anything else.
Source Code
/**
Lost Arrow (Aryan V S)
Saturday 2020-10-10
**/
#include "bits/stdc++.h"
using namespace std;
struct node {
int64_t inv = 0;
vector <int> freq = vector <int> (40, 0);
void combine (const node& l, const node& r) {
inv = l.inv + r.inv;
for (int i = 39; i >= 0; --i) {
for (int j = 0; j < i; ++j) {
// frequency of bigger numbers in the left * frequency of smaller numbers on the right
inv += 1LL * l.freq [i] * r.freq [j];
}
freq [i] = l.freq [i] + r.freq [i];
}
}
};
void build (vector <node>& tree, vector <int>& a, int v, int tl, int tr) {
if (tl == tr) {
tree [v].inv = 0;
tree [v].freq [a [tl]] = 1;
}
else {
int tm = (tl + tr) / 2;
build(tree, a, 2 * v + 1, tl, tm);
build(tree, a, 2 * v + 2, tm + 1, tr);
tree [v].combine(tree [2 * v + 1], tree [2 * v + 2]);
}
}
void update (vector <node>& tree, int v, int tl, int tr, int pos, int val) {
if (tl == tr) {
tree [v].inv = 0;
tree [v].freq = vector <int> (40, 0);
tree [v].freq [val] = 1;
}
else {
int tm = (tl + tr) / 2;
if (pos <= tm)
update(tree, 2 * v + 1, tl, tm, pos, val);
else
update(tree, 2 * v + 2, tm + 1, tr, pos, val);
tree [v].combine(tree [2 * v + 1], tree [2 * v + 2]);
}
}
node inv_cnt (vector <node>& tree, int v, int tl, int tr, int l, int r) {
if (l > r)
return node();
if (tl == l && tr == r)
return tree [v];
int tm = (tl + tr) / 2;
node result;
result.combine(inv_cnt(tree, 2 * v + 1, tl, tm, l, min(r, tm)), inv_cnt(tree, 2 * v + 2, tm + 1, tr, max(l, tm + 1), r));
return result;
}
void solve () {
int n, q;
cin >> n >> q;
vector <int> a (n);
for (int i = 0; i < n; ++i) {
cin >> a [i];
--a [i];
}
vector <node> tree (4 * n);
build(tree, a, 0, 0, n - 1);
while (q--) {
int type, x, y;
cin >> type >> x >> y;
--x; --y;
if (type == 1) {
node result = inv_cnt(tree, 0, 0, n - 1, x, y);
cout << result.inv << '\n';
}
else if (type == 2) {
update(tree, 0, 0, n - 1, x, y);
}
else
assert(false);
}
}
int main () {
std::ios::sync_with_stdio(false);
std::cin.tie(nullptr);
std::cout.precision(10);
std::cout << std::fixed << std::boolalpha;
int t = 1;
// std::cin >> t;
while (t--)
solve();
return 0;
}
arr[i] can be at most 40. We can use this to our advantage. What we need is a segment tree. Each node will hold 41 values: a long long int that represents the inversions for this range, and an array of size 40 for the count of each number (a struct will do). How do we merge the two children of a node? We know the inversions for the left child and for the right child, and we also know the frequency of each number in both of them. The inversion count of the parent node will be the sum of the inversions of both children, plus the number of inversions between the left and right child, which we can easily find from the frequencies of the numbers. Queries can be done in a similar way. Complexity: roughly O(40^2 * q * log(n)), since each merge looks at pairs of the 40 possible values.

Clarification of Answer... find the max possible two equal sum in a SET

I need a clarification of the answer to this question, but I cannot comment (not enough rep), so I am asking a new question. Hope that is ok.
The problem is this:
Given an array, you have to find the max possible two equal sums; you can exclude elements.
E.g. if 1,2,3,4,6 is the given array, we can have max two equal sums as 6+2 = 4+3+1.
E.g. for 4,10,18,22, we can get two equal sums as 18+4 = 22.
What would be your approach to solve this problem, apart from brute force that tries all combinations and checks for two possible equal sums?
edit 1: the max number of array elements is N <= 50 and each element can be up to 1 <= K <= 1000
edit 2: the total sum of the elements cannot be greater than 1000.
The approved answer says:
I suggest solving this using DP where instead of tracking A,B (the
size of the two sets), you instead track A+B,A-B (the sum and
difference of the two sets).
Then for each element in the array, try adding it to A, or B, or
neither.
The advantage of tracking the sum/difference is that you only need to
keep track of a single value for each difference, namely the largest
value of the sum you have seen for this difference.
What I do not understand is:
If this were the subset sum problem, I could solve it with DP, having a memoization matrix of (N x P), where N is the size of the set and P is the target sum...
But I cannot figure out how I should keep track of A+B and A-B (as the author of the approved answer said). What should the dimensions of the memoization matrix be, and how does that help solve the problem?
The author of the answer was kind enough to provide a code example, but it is hard for me to understand since I do not know Python (I know Java).
I think trying to relate this solution to the single-subset problem might be misleading for you. Here we are concerned with a maximum achievable sum, and what's more, we need to distinguish between two disjoint sets of numbers as we traverse. Clearly tracking specific combinations would be too expensive.
Looking at the difference between sets A and B, we can say:
A - B = d
A = d + B
Clearly, we want the highest sum when d = 0. How do we know that sum? It's (A + B) / 2!
For the transition in the dynamic program, we'd like to know if it's better to place the current element in A, B or neither. This is achieved like this:
e <- current element
d <- difference between A and B
(1) add e to A -> d + e
why?
A = d + B
(A + e) = d + e + B
(2) add e to B -> d - e
why?
A = d + B
A = d - e + (B + e)
(3) don't use e -> that's simply
what we already have stored for d
Let's look at Peter de Rivas' code for the transition:
# update a copy of our map, so
# we can reference previous values,
# while assigning new values
D2=D.copy()
# d is A - B
# s is A + B
for d,s in D.items():
# a new sum that includes element a
# we haven't decided if a
# will be in A or B
s2 = s + a
# d2 will take on each value here
# in turn, once d - a (adding a to B),
# and once d + a (adding a to A)
for d2 in [d-a, d+a]:
# The main transition:
# the two new differences,
# (d-a) and (d+a) as keys in
# our map get the highest sum
# seen so far, either (1) the
# new sum, s2, or (2) what we
# already stored (meaning `a`
# will be excluded here)
# so all three possibilities
# are covered.
D2[abs(d2)] = max(D2[abs(d2)], s2)
In the end we have stored the highest A + B seen for d = 0, where the elements in A and B form disjoint sets. Return (A + B) / 2.
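To see the whole idea in one place, here is a self-contained C++ sketch of the same sum/difference DP, keyed on |A - B| and storing the best A + B seen for each difference; the sample array is the one from the question:

#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<int> arr = {1, 2, 3, 4, 6};        // expected answer: 8 (6+2 = 4+3+1)
    int total = accumulate(arr.begin(), arr.end(), 0);

    const int NEG = INT_MIN;
    // best[d] = largest A + B seen so far with |A - B| == d (NEG means unreachable)
    vector<int> best(total + 1, NEG);
    best[0] = 0;                               // both sets empty

    for (int a : arr) {
        vector<int> next = best;               // copy, so we reference previous values
        for (int d = 0; d <= total; ++d) {
            if (best[d] == NEG) continue;
            int s2 = best[d] + a;              // new sum if a goes into A or into B
            for (int d2 : {d + a, abs(d - a)}) // a into A, or a into B
                if (d2 <= total)
                    next[d2] = max(next[d2], s2);
        }
        best = move(next);
    }
    // for difference 0 we have A == B, so each set sums to best[0] / 2
    cout << best[0] / 2 << '\n';               // prints 8
}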
Try this DP approach; it works fine.
/*
*
i/p ::
1
5
1 2 3 4 6
o/p : 8
1
4
4 10 18 22
o/p : 22
1
4
4 118 22 3
o/p : 0
*/
import java.util.Scanner;
public class TwoPipesOfMaxEqualLength {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
int t = sc.nextInt();
while (t-- > 0) {
int n = sc.nextInt();
int[] arr = new int[n + 1];
for (int i = 1; i <= n; i++) {
arr[i] = sc.nextInt();
}
MaxLength(arr, n);
}
}
private static void MaxLength(int[] arr, int n) {
int dp[][] = new int[1005][1005];
int dp1[][] = new int[1005][1005];
// initialize dp with values as 0.
for (int i = 0; i <= 1000; i++) {
for (int j = 0; j <= 1000; j++)
dp[i][j] = 0;
}
// make (0,0) as 1.
dp[0][0] = 1;
for (int i = 1; i <= n; i++) {
for (int j = 0; j <= 1000; j++) {
for (int k = 0; k <= 1000; k++) {
if (j >= arr[i]) {
if (dp[j - arr[i]][k] == 1) {
dp1[j][k] = 1;
}
}
if (k >= arr[i]) {
if (dp[j][k - arr[i]] == 1) {
dp1[j][k] = 1;
}
}
if (dp[j][k] == 1) {
dp1[j][k] = 1;
}
}
}
for (int j = 0; j <= 1000; j++) {
for (int k = 0; k <= 1000; k++) {
dp[j][k] = dp1[j][k];
dp1[j][k] = 0;
}
}
}
int ans = 0;
for (int i = 1; i <= 1000; i++) {
if (dp[i][i] == 1) {
ans = i;
}
}
System.out.println(ans);
}
}
#include <bits/stdc++.h>
using namespace std;
/*
Brute force recursive solve.
*/
void solve(vector<int>&arr, int &ans, int p1, int p2, int idx, int mx_p){
// if p1 == p2, we have a potential answer
if(p1 == p2){
ans = max(ans, p1);
}
//base case 1:
if((p1>mx_p) || (p2>mx_p) || (idx >= arr.size())){
return;
}
// leave the current element
solve(arr, ans, p1, p2, idx+1, mx_p);
// add the current element to p1
solve(arr, ans, p1+arr[idx], p2, idx+1, mx_p);
// add the current element to p2
solve(arr, ans, p1, p2+arr[idx], idx+1, mx_p);
}
/*
Recursive solve with memoization.
*/
int solve(vector<vector<vector<int>>>&memo, vector<int>&arr,
int p1, int p2, int idx, int mx_p){
// base case 1: a pipe longer than half the total sum can never be matched
if((p1>mx_p) || (p2>mx_p)){
return -1;
}
// base case 2: all elements consumed; only an equal split is an answer
if(idx >= (int)arr.size()){
return (p1 == p2) ? p1 : -1;
}
// memo'ed answer
if(memo[p1][p2][idx]>-1){
return memo[p1][p2][idx];
}
// if p1 == p2, we have a potential answer
if(p1 == p2){
memo[p1][p2][idx] = max(memo[p1][p2][idx], p1);
}
// leave the current element
memo[p1][p2][idx] = max(memo[p1][p2][idx], solve(memo, arr, p1, p2,
idx+1, mx_p));
// add the current element to p1
memo[p1][p2][idx] = max(memo[p1][p2][idx],
solve(memo, arr, p1+arr[idx], p2, idx+1, mx_p));
// add the current element to p2
memo[p1][p2][idx] = max(memo[p1][p2][idx],
solve(memo, arr, p1, p2+arr[idx], idx+1, mx_p));
return memo[p1][p2][idx];
}
int main(){
vector<int>arr = {1, 2, 3, 4, 7};
int ans = 0;
int mx_p = 0;
for(auto i:arr){
mx_p += i;
}
mx_p /= 2;
vector<vector<vector<int>>>memo(mx_p+1, vector<vector<int>>(mx_p+1,
vector<int>(arr.size()+1,-1)));
ans = solve(memo, arr, 0, 0, 0, mx_p);
ans = (ans>=0)?ans:0;
// solve(arr, ans, 0, 0, 0, mx_p);
cout << ans << endl;
return 0;
}

Using extended euclidean algorithm to find number of intersections of a line segment with points on a 2D grid

In the grid formed by the points (M*x, M*y), and given point A(x1,y1) and point B(x2,y2), where all the variables are integers, I need to check how many grid points lie on the line segment from point A to point B. I know that it can be done by using the extended Euclidean algorithm somehow, but I have no clue how to do it. I would appreciate your help.
Rephrased, you want to determine how many numbers t satisfy
(1) M divides (1-t) x1 + t x2
(2) M divides (1-t) y1 + t y2
(3) 0 <= t <= 1.
Let's focus on (1). We introduce an integer variable q to represent the divisibility constraint and solve for t:
exists integer q, M q = (1-t) x1 + t x2
exists integer q, M q - x1 = (x2 - x1) t.
If x1 is not equal to x2, then this gives a periodic set of possibilities of the form t in {a + b q | q integer}, where a and b are fractions. Otherwise, all t or no t are solutions.
The extended Euclidean algorithm is useful for intersecting the solution sets arising from (1) and (2). Suppose that we want to compute the intersection
{a + b q | q integer} intersect {c + d s | s integer}.
By expressing a generic common element in two different ways, we arrive at a linear Diophantine equation:
a + b q = c + d s,
where a, b, c, d are constant and q, s are integer. Let's clear denominators and gather terms into one equation
A q + B s = C,
where A, B, C are integers. This equation has solutions if and only if the greatest common divisor g of A and B also divides C. Use the extended Euclidean algorithm to compute integer coefficients u, v such that
A u + B v = g.
Then we have a solution
q = u (C/g) + k (B/g)
s = v (C/g) - k (A/g)
for each integer k.
Finally, we have to take constraint (3) into consideration. This should boil down to some arithmetic and one floor division, but I'd rather not work out the details (the special cases of what I've written so far already will take quite a lot of your time).
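To make the extended-Euclidean step concrete, here is a small C++ sketch that finds one solution of A q + B s = C when it exists; the function names are just for this illustration:

#include <cstdio>

// extended Euclid: returns g = gcd(a, b) and fills u, v with a*u + b*v = g
long long ext_gcd(long long a, long long b, long long &u, long long &v) {
    if (b == 0) { u = 1; v = 0; return a; }
    long long u1, v1;
    long long g = ext_gcd(b, a % b, u1, v1);
    u = v1;
    v = u1 - (a / b) * v1;
    return g;
}

// one solution (q, s) of A*q + B*s = C, if any; the full family is
// q = q0 + k*(B/g), s = s0 - k*(A/g) for every integer k
bool solve_diophantine(long long A, long long B, long long C,
                       long long &q, long long &s) {
    long long u, v;
    long long g = ext_gcd(A, B, u, v);
    if (C % g != 0) return false;    // no integer solutions
    q = u * (C / g);
    s = v * (C / g);
    return true;
}

int main() {
    long long q, s;
    if (solve_diophantine(6, 10, 8, q, s))   // gcd(6,10) = 2 divides 8
        printf("6*%lld + 10*%lld = 8\n", q, s);
    return 0;
}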
Let dX = Abs(x2-x1) and dY = Abs(y2-y1).
Then the number of lattice points on the segment is
P = GCD(dX, dY) + 1
(including the start and end points),
where GCD is the greatest common divisor (computed with the usual (not extended) Euclidean algorithm).
(See last Properties here)
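A tiny sketch of that formula in C++ (note that it counts points of the unit integer lattice between integer endpoints; for the M-spaced grid of the question you still need the divisibility handling described in the other answer):

#include <cstdio>
#include <cstdlib>
#include <numeric>   // std::gcd (C++17)

// number of lattice points on the segment (x1,y1)-(x2,y2), endpoints included
long long lattice_points(long long x1, long long y1, long long x2, long long y2) {
    long long dX = std::llabs(x2 - x1);
    long long dY = std::llabs(y2 - y1);
    return std::gcd(dX, dY) + 1;
}

int main() {
    // e.g. (0,0) to (6,4): gcd(6,4) + 1 = 3 points: (0,0), (3,2), (6,4)
    printf("%lld\n", lattice_points(0, 0, 6, 4));
    return 0;
}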
Following the instructions of Mr. David Eisenstat, I have managed to write a program in C++ that calculates the answer.
#include <iostream>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
using namespace std;
int gcd (int A, int B, int &u, int &v)
{
int Ad = 1;
int Bd = 1;
if (A < 0) { Ad = -1; A = -A; }
if (B < 0) { Bd = -1; B = -B; }
int x = 1, y = 0;
u = 0, v = 1;
while (A != 0)
{
int q = B/A;
int r = B%A;
int m = u-x*q;
int n = v-y*q;
B = A;
A = r;
u = x;
v = y;
x = m;
y = n;
}
u *= Ad;
v *= Bd;
return B;
}
int main(int argc, const char * argv[])
{
int t;
scanf("%d", &t);
for (int i = 0; i<t; i++) {
int x1, x2, y1, y2, M;
scanf("%d %d %d %d %d", &M, &x1, &y1, &x2, &y2);
if ( x1 == x2 ) // vertical line
{
if (x1%M != 0) { // in between the horizontal lines
cout << 0 << endl;
} else
{
int r = abs((y2-y1)/M); // number of points
if (y2%M == 0 || y1%M == 0) // +1 if the line starts or ends on the point
r++;
cout << r << endl;
}
} else {
if (x2 < x1)
{
int c;
c = x1;
x1 = x2;
x2 = c;
}
int A, B, C;
C = x1*y2-y1*x2;
A = M*(y2-y1);
B = -M*(x2-x1);
int u, v;
int g = gcd(A, B, u, v);
//cout << "A: " << A << " B: " << B << " C: " << C << endl;
//cout << A << "*" << u <<"+"<< B <<"*"<<v<<"="<<g<<endl;
double a = -x1/(double)(x2-x1);
double b = M/(double)(x2-x1);
double Z = (-a-C*b/g*u)*g/(B*b);
double Y = (1-a-C*b/g*u)*g/(B*b);
cout << floor(Z) - ceil(Y) + 1 << endl;
}
}
return 0;
}
Thank you for your help! Please check if all special cases are taken into consideration.

Given an integer z<=10^100, find the smallest row of Pascal's triangle that contains z [closed]

How can I find an algorithm to solve this problem using C++: given an integer z<=10^100, find the smallest row of Pascal's triangle that contains the number z.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
For example if z=6 => result is on the 4th row.
Another way to describe the problem: given an integer z <= 10^100, find the smallest integer n such that there exists an integer k with C(k,n) = z.
C(k,n) is the number of combinations of n things taken k at a time, without repetition.
EDIT: This solution needs logarithmic time; it's O(log z), or maybe O((log z)^2).
Say you are looking for n,k where Binomial(n,k)==z for a given z.
1. Each row has its largest value in the middle, so starting from n=0 you increase the row number n as long as the middle value is smaller than the given number. Actually, 10^100 isn't that big, so before row 340 you find a position n0, k0 = n0/2 where the value from the triangle is larger than or equal to z: Binomial(n0, k0) >= z.
2. You walk to the left, i.e. you decrease the column number k, until eventually you find a value smaller than z. If there were a matching value in that row you would have hit it by now. k is not very large, less than 170, so this step won't be executed more often than that and does not present a performance problem.
3. From here you walk down, increasing n. Here you will find a steadily increasing value of Binomial(n, k). Continue with step 3 until the value gets bigger than or equal to z, then go to step 2.
EDIT: Step 3 can loop for a very long time when the row number n is large, so instead of checking each n linearly you can do a binary search for n with Binomial(n, k) >= z > Binomial(n-1, k); then it only needs log(n) time.
A Python implementation looks like this, C++ is similar but somewhat more cumbersome because you need to use an additional library for arbitrary precision integers:
# Calculate (n-k+1)* ... *n
def getnk( n, k ):
a = n
for u in range( n-k+1, n ):
a = a * u
return a
# Find n such that Binomial(n,k) >= z and Binomial(n-1,k) < z
def find_n( z, k, n0 ):
kfactorial = k
for u in range(2, k):
kfactorial *= u
xk = z * kfactorial
nk0 = getnk( n0, k )
n1=n0*2
nk1 = getnk( n1, k )
# duplicate n while the value is too small
while nk1 < xk:
nk0=nk1
n0=n1
n1*=2
nk1 = getnk( n1, k )
# do a binary search
while n1 > n0 + 1:
n2 = (n0+n1) // 2
nk2 = getnk( n2, k )
if nk2 < xk:
n0 = n2
nk0 = nk2
else:
n1 = n2
nk1 = nk2
return n1, nk1 // kfactorial
def find_pos( z ):
n=0
k=0
nk=1
# start by finding a row where the middle value is bigger than z
while nk < z:
# increase n
n = n + 1
nk = nk * n // (n-k)
if nk >= z:
break
# increase both n and k
n = n + 1
k = k + 1
nk = nk * n // k
# check all subsequent rows for a matching value
while nk != z:
if nk > z:
# decrease k
k = k - 1
nk = nk * (k+1) // (n-k)
else:
# increase n
# either linearly
# n = n + 1
# nk = nk * n // (n-k)
# or using binary search:
n, nk = find_n( z, k, n )
return n, k
z = 56476362530291763837811509925185051642180136064700011445902684545741089307844616509330834616
print( find_pos(z) )
It should print
(5864079763474581, 6)
Stirling's approximation for n! can be used to find the first row in the triangle whose largest binomial coefficient is bigger than or equal to a given x. Using this estimation we can derive a lower and an upper bound for the central binomial coefficient C(2n, n), and then, by the observation that this is the maximum coefficient in the row that expands (a+b)^(2n):
P( 2n, 0), P( 2n, 1), P( 2n, 2), ..., P( 2n, 2n -1), P( 2n, 2n)
we can find the first row whose maximum binomial coefficient is bigger than or equal to the given x. This is the first row in which x can possibly appear; it is not possible for x to be found in any smaller row. Note: this may be the right hint and give an answer immediately in some cases. At the moment I cannot see any other way than to start a brute-force search from this row.
template <class T>
T binomial_coefficient(unsigned long n, unsigned long k) {
unsigned long i;
T b;
if (0 == k || n == k) {
return 1;
}
if (k > n) {
return 0;
}
if (k > (n - k)) {
k = n - k;
}
if (1 == k) {
return n;
}
b = 1;
for (i = 1; i <= k; ++i) {
b *= (n - (k - i));
if (b < 0) return -1; /* Overflow */
b /= i;
}
return b;
}
Stirling:
double stirling_lower_bound( int n) {
double n_ = n / 2.0;
double res = pow( 2.0, 2 * n_);
res /= sqrt( n_ * M_PI);
return res * exp( ( -1.0) / ( 6 * n_));
}
double stirling_upper_bound( int n) {
double n_ = n / 2.0;
double res = pow( 2.0, 2 * n_) ;
res /= sqrt( n_ * M_PI);
return res * exp( 1.0 / ( 24 * n_));
}
int stirling_estimate( double x) {
int n = 1;
while ( stirling_lower_bound( n) <= x) {
if ( stirling_upper_bound( n) > x) return n;
++n;
}
return n;
}
usage:
long int search_coefficient( unsigned long int &n, unsigned long int x) {
unsigned long int k = n / 2;
long long middle_coefficient = binomial_coefficient<long long>( n, k);
if( middle_coefficient == x) return k;
unsigned long int right = binomial_coefficient<unsigned long>( n, ++k);
while ( x != right) {
while( x < right || x < ( right * ( n + 1) / ( k + 1))) {
right = right * ( n + 1) / ( ++k) - right;
}
if ( right == x) return k;
right = right * ( ++n) / ( ++k);
if( right > x) return -1;
}
return k;
}
/*
*
*/
int main(int argc, char** argv) {
long long x2 = 1365;
unsigned long int n = stirling_estimate( x2);
long int k = search_coefficient( n, x2);
std::cout << "row:" << n <<", column: " << k;
return 0;
}
output:
row:15, column: 11

Stretching out an array

I've got a vector of samples that form a curve. Let's imagine there are 1000 points in it. If I want to stretch it to fill 1500 points, what is the simplest algorithm that gives decent results? I'm looking for something that is just a few lines of C/C++.
I'll always want to increase the size of the vector, and the new vector can be anywhere from 1.1x to 50x the size of the current vector.
Thanks!
Here's C++ for linear and quadratic interpolation.
interp1( 5.3, a, n ) is a[5] + .3 * (a[6] - a[5]), .3 of the way from a[5] to a[6];
interp1array( a, 1000, b, 1500 ) would stretch a to b.
interp2( 5.3, a, n ) draws a parabola through the 3 nearest points a[4] a[5] a[6]: smoother than interp1 but still fast.
(Splines use the 4 nearest points, smoother yet; if you read Python, see
basic-spline-interpolation-in-a-few-lines-of-numpy.)
// linear, quadratic interpolation in arrays
// from interpol.py denis 2010-07-23 July
#include <stdio.h>
#include <stdlib.h>
// linear interpolate x in an array
// inline
float interp1( float x, float a[], int n )
{
if( x <= 0 ) return a[0];
if( x >= n - 1 ) return a[n-1];
int j = int(x);
return a[j] + (x - j) * ( a[j+1] - a[j] );
}
// linear interpolate array a[] -> array b[]
void interp1array( float a[], int n, float b[], int m )
{
float step = float( n - 1 ) / (m - 1);
for( int j = 0; j < m; j ++ )
{
b[j] = interp1( j*step, a, n );
}
}
//..................................................................
// parabola through 3 points, -1 < x < 1
float parabola( float x, float f_1, float f0, float f1 )
{
if( x <= -1 ) return f_1;
if( x >= 1 ) return f1;
float l = f0 - x * (f_1 - f0);
float r = f0 + x * (f1 - f0);
return (l + r + x * (r - l)) / 2;
}
// quadratic interpolate x in an array
float interp2( float x, float a[], int n )
{
if( x <= .5 || x >= n - 1.5 )
return interp1( x, a, n );
int j = int( x + .5 );
float t = 2 * (x - j); // -1 .. 1
return parabola( t, (a[j-1] + a[j]) / 2, a[j], (a[j] + a[j+1]) / 2 );
}
// quadratic interpolate array a[] -> array b[]
void interp2array( float a[], int n, float b[], int m )
{
float step = float( n - 1 ) / (m - 1);
for( int j = 0; j < m; j ++ ){
b[j] = interp2( j*step, a, n );
}
}
int main( int argc, char* argv[] )
{
// a.out [n m] --
int n = 10, m = 100;
int *ns[] = { &n, &m, 0 },
**np = ns;
char* arg;
for( argv ++; (arg = *argv) && *np; argv ++, np ++ )
**np = atoi( arg );
printf( "n: %d m: %d\n", n, m );
float a[n], b[m];
for( int j = 0; j < n; j ++ ){
a[j] = j * j;
}
interp2array( a, n, b, m ); // a[] -> b[]
for( int j = 0; j < m; j ++ ){
printf( "%.1f ", b[j] );
}
printf( "\n" );
}
what is the simplest algorithm that gives decent results?
Catmull-Rom splines. (if you want a smooth curve)
http://www.mvps.org/directx/articles/catmull/
http://en.wikipedia.org/wiki/Cubic_Hermite_spline
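In case it helps, a minimal Catmull-Rom sketch in C++ (uniform parameterization; the end neighbours are simply clamped):

#include <cstdio>
#include <vector>

// Catmull-Rom value at parameter t in [0,1] between p1 and p2,
// using p0 and p3 as the neighbouring control points
float catmull_rom(float p0, float p1, float p2, float p3, float t) {
    return 0.5f * ((2.0f * p1) +
                   (-p0 + p2) * t +
                   (2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3) * t * t +
                   (-p0 + 3.0f * p1 - 3.0f * p2 + p3) * t * t * t);
}

// stretch a (size n) into b (size m >= 2), clamping the end neighbours
void catmull_rom_stretch(const std::vector<float>& a, std::vector<float>& b) {
    int n = (int)a.size(), m = (int)b.size();
    float step = float(n - 1) / (m - 1);
    for (int j = 0; j < m; ++j) {
        float x = j * step;
        int i = (int)x;
        if (i >= n - 1) { b[j] = a[n - 1]; continue; }
        float p0 = a[i > 0 ? i - 1 : 0];
        float p3 = a[i + 2 < n ? i + 2 : n - 1];
        b[j] = catmull_rom(p0, a[i], a[i + 1], p3, x - i);
    }
}

int main() {
    std::vector<float> a = {0, 1, 4, 9, 16};
    std::vector<float> b(11);
    catmull_rom_stretch(a, b);
    for (float v : b) printf("%.2f ", v);
    printf("\n");
}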
For each new item, calculate its fractional position in the old array, use the fractional part (f - floor(f)) as the interpolation factor, and the "integer" part (i.e. floor(f)) to find the nearest elements.
That is assuming that you're operating on data that can be mathematically interpolated (floats). If the data cannot be interpolated (strings), then the only solution is to use the nearest available element of the old array.
You'll need some tweaking if the points in the array aren't evenly distributed.
The simplest option I can think of is just a function that expands the array based on mean averages, so:
x, y, z
becomes
x, avg(x,y), y, avg(y,z), z
If you need more data points, just run it multiple times on the vector.
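A minimal sketch of that midpoint idea (assuming a non-empty vector of floats):

#include <cstdio>
#include <vector>

// insert the average of each neighbouring pair: x, y, z -> x, avg(x,y), y, avg(y,z), z
std::vector<float> expand_with_midpoints(const std::vector<float>& a) {
    std::vector<float> out;
    out.reserve(2 * a.size() - 1);
    for (size_t i = 0; i + 1 < a.size(); ++i) {
        out.push_back(a[i]);
        out.push_back(0.5f * (a[i] + a[i + 1]));
    }
    out.push_back(a.back());
    return out;
}

int main() {
    std::vector<float> v = {1.0f, 4.0f, 9.0f};
    v = expand_with_midpoints(v);          // 1, 2.5, 4, 6.5, 9
    for (float x : v) printf("%.1f ", x);
    printf("\n");
}

Note that each pass only produces sizes of the form 2k-1, so to hit an exact target such as 1500 you would still finish with one of the interpolation approaches above (or a final resample).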
