Algorithm to match sets with overlapping members

I'm looking for an efficient algorithm to match sets within a group of sets, ordered by the number of overlapping members. Two identical sets, for example, are the best match, while sets with no overlapping members are the worst.
So, the algorithm takes a list of sets as input and returns matching set pairs, ordered from the most overlapping members to the fewest.
I would be interested in ideas to do this efficiently. The brute-force approach is to try all combinations and sort, which obviously does not perform well when the number of sets is very large.
Edit: Use case - Assume a large number of sets already exist. When a new set arrives, the algorithm is run and the output includes matching sets (with at least one element overlap) sorted by the most matching to least (doesn't matter how many items are in the new/incoming set). Hope that clarifies my question.

If you can afford an approximation algorithm with a chance of error, then you should probably consider MinHash.
This algorithm allows estimating the similarity between two sets in time that depends only on the signature size, not on the sizes of the sets. For each set, a fixed-size signature is computed, and only the signatures are compared when estimating similarity. The similarity measure is the Jaccard index (Jaccard similarity), which ranges from 0 (disjoint sets) to 1 (identical sets). It is defined as the ratio of the size of the intersection to the size of the union of the two sets.
With this approach, any new set has to be compared against all existing ones (in linear time), and then the results can be merged into the top list (you can use a bounded search tree/heap for this purpose).
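To make the signature idea concrete, here is a minimal C++ sketch. The names mix64, minhash_signature and estimate_jaccard, the default of 64 components, and the choice of the SplitMix64 finalizer as the hash family are all my assumptions for illustration, not part of any particular MinHash library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// SplitMix64 finalizer: a cheap, invertible 64-bit mixing function
// (assumed here as the hash family; any good hash would do).
inline std::uint64_t mix64(std::uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Component i of the signature is the minimum of the i-th hash function
// over all members of the set.
std::vector<std::uint64_t> minhash_signature(const std::vector<std::uint64_t>& set,
                                             std::size_t k = 64) {
    std::vector<std::uint64_t> sig(k, std::numeric_limits<std::uint64_t>::max());
    for (std::size_t i = 0; i < k; ++i) {
        const std::uint64_t salt = mix64(i + 1); // derives the i-th hash function
        for (std::uint64_t x : set) sig[i] = std::min(sig[i], mix64(x ^ salt));
    }
    return sig;
}

// The fraction of equal components estimates |A ∩ B| / |A ∪ B|.
double estimate_jaccard(const std::vector<std::uint64_t>& a,
                        const std::vector<std::uint64_t>& b) {
    assert(a.size() == b.size() && "signatures must use the same k");
    if (a.empty()) return 0.0;
    std::size_t same = 0;
    for (std::size_t i = 0; i < a.size(); ++i) same += (a[i] == b[i]) ? 1 : 0;
    return static_cast<double>(same) / static_cast<double>(a.size());
}
```

A new set would get one signature, be compared against every stored signature with estimate_jaccard, and the best scores kept in a bounded heap as described above.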

Since the number of possible different values is not very large, you get a fairly efficient hashing if you simply set the nth bit in a "large integer" when the nth number is present in your set. You can then look for overlap between sets with a simple bitwise AND followed by a "count set bits" operation. On a 64-bit architecture, 1000 possible values fit in 16 words, so you can compute the overlap between two sets in roughly 16 AND-and-count steps, regardless of the number of values in each cluster. As the clusters get sparser, this becomes a less efficient algorithm.
Still - I implemented some of the basic functions you might need in some code that I attach here - not documented but reasonably understandable, I think. In this example I made the numbers small so I can check the result by hand - you might want to change some of the #defines to get larger ranges of values, and obviously you will want some dynamic lists etc to keep up with the growing catalog.
#include <stdio.h>
#include <stdlib.h>   // rand()

// biggest number you will come across: want this to be much bigger
#define MAXINT 25
// use the biggest type you have - not int
#define BITSPER (8*sizeof(int))
// max number in a cluster
#define CSIZE 5
// number of words needed to hold MAXINT bits
#define NWORDS ((MAXINT + BITSPER - 1) / BITSPER)

typedef struct {
    unsigned int num[NWORDS]; // want to use longest type but not for demo
    int newmatch;
    int rank;
} hmap;

int countBits(unsigned int b); // defined below

// convert number to binary sequence:
void hashIt(int* t, int n, hmap* h) {
    int ii;
    for (ii = 0; ii < n; ii++) {
        int a, b;
        a = t[ii] % BITSPER;   // bit position within the word
        b = t[ii] / BITSPER;   // which word
        h->num[b] |= 1u << a;
    }
}

// print binary number:
void printBinary(unsigned int n) {
    unsigned int jj;
    jj = 1u << 31;
    while (jj != 0) {
        printf("%c", (n & jj) ? '1' : '0');
        jj >>= 1;
    }
    printf(" ");
}

// print the array of binary numbers:
void printHash(hmap* h) {
    unsigned int ii;
    for (ii = 0; ii < NWORDS; ii++) {
        printf("0x%08x: ", h->num[ii]);
        printBinary(h->num[ii]);
    }
    printf("\n");
}

// find the maximum overlap for set m of n
int maxOverlap(hmap* h, int m, int n) {
    int ii, jj;
    int overlap, maxOverlap = -1;
    for (ii = 0; ii < n; ii++) {
        if (ii == m) continue; // don't compare with yourself
        overlap = 0;
        for (jj = 0; jj < (int)NWORDS; jj++) {
            // just to see what's going on: take these print statements out
            int bc = countBits(h[ii].num[jj] & h[m].num[jj]);
            printBinary(h[ii].num[jj] & h[m].num[jj]);
            printf("%d bits overlap\n", bc);
            overlap += bc;
        }
        if (overlap > maxOverlap) maxOverlap = overlap;
    }
    return maxOverlap;
}

int countBits(unsigned int b) {
    int count;
    for (count = 0; b != 0; count++) {
        b &= b - 1; // this clears the least significant set bit
    }
    return count;
}

int main(void) {
    int temp[CSIZE];
    int ii, jj;
    static hmap H[20]; // make them all 0 initially
    for (jj = 0; jj < 20; jj++) {
        for (ii = 0; ii < CSIZE; ii++) {
            temp[ii] = rand() % MAXINT;
        }
        hashIt(temp, CSIZE, &H[jj]);
    }
    for (ii = 0; ii < 20; ii++) {
        printf("max overlap: %d\n", maxOverlap(H, ii, 20));
    }
    return 0;
}
Find the missing coordinate of rectangle

Chef has N axis-parallel rectangles in a 2D Cartesian coordinate system. These rectangles may intersect, but it is guaranteed that all their 4N vertices are pairwise distinct.
Unfortunately, Chef lost one vertex, and up until now, none of his fixes have worked (although putting an image of a point on a milk carton might not have been the greatest idea after all…). Therefore, he gave you the task of finding it! You are given the remaining 4N−1 points and you should find the missing one.
The first line of the input contains a single integer T denoting the number of test cases. The description of T test cases follows.
The first line of each test case contains a single integer N.
Then, 4N−1 lines follow. Each of these lines contains two space-separated integers x and y denoting a vertex (x,y) of some rectangle.
For each test case, print a single line containing two space-separated integers X and Y ― the coordinates of the missing point. It can be proved that the missing point can be determined uniquely.
the sum of N over all test cases does not exceed 2⋅10^5
Example Input
1
2
1 1
1 2
4 6
2 1
9 6
9 3
4 3
Example Output
2 2
My approach: I created frequency arrays for the x and y coordinates and then looked for the coordinate values that occur an odd number of times.
#include <iostream>
using namespace std;
int main() {
// your code goes here
int t;
long int n;
long long int a[4*n-1][2];
long long int xm,ym,x,y;
for(int i=0;i<4*n-1;i++)
long long int frqx[xm+1],frqy[ym+1];
for(long long int i=0;i<xm+1;i++)
for(long long int j=0;j<ym+1;j++)
for(long long int i=0;i<4*n-1;i++)
for(long long int i=0;i<xm+1;i++)
if(frqx[i]>0 && frqx[i]%2>0)
for(long long int j=0;j<ym+1;j++)
if(frqy[j]>0 && frqy[j]%2>0)
cout<<x<<" "<<y<<"\n";
return 0;
My code is showing TLE for inputs <10^6
First of all, your solution does not handle negative x/y correctly. long long int frqx[xm+1],frqy[ym+1] allocates barely enough memory to hold positive values, and none for negative ones.
It doesn't even matter though, as with the guarantee that abs(x) <= 10^9, you can just statically allocate a vector of 2^19 elements, and map both positive and negative coordinates in there.
Second, you are not supposed to buffer the input in a. Not only is this going to overflow the stack, it is also entirely unnecessary. Write to the frequency buckets right away; don't buffer.
Same goes for most of these challenges. Don't buffer, always try to process the input directly.
About your buckets, you don't need a long long int. A bool per bucket is enough. You do not care in the least how many coordinates fell into a bucket, only whether the count so far is even or odd. What you implemented as a separate loop can be replaced by simply toggling a flag while processing the input.
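As a rough sketch of the toggling idea: the version below swaps the fixed bucket array for a std::unordered_set, so negative and very large coordinates need no index mapping. That substitution, and the names toggle and ParityTracker, are my own assumptions for illustration. Each vertex is fed in as soon as it is read, so nothing is buffered.

```cpp
#include <cassert>
#include <unordered_set>

// Flip the "seen an odd number of times" state of value v.
inline void toggle(std::unordered_set<long long>& odd, long long v) {
    auto it = odd.find(v);
    if (it == odd.end()) odd.insert(v);
    else odd.erase(it);
}

// After all vertices of a test case have been added, each set holds
// exactly the coordinate values that occurred an odd number of times.
struct ParityTracker {
    std::unordered_set<long long> odd_x, odd_y;

    void add(long long x, long long y) {
        toggle(odd_x, x);
        toggle(odd_y, y);
    }

    // Valid once exactly one x value has odd parity (guaranteed by the problem).
    long long missing_x() const { assert(odd_x.size() == 1); return *odd_x.begin(); }
    long long missing_y() const { assert(odd_y.size() == 1); return *odd_y.begin(); }
};
```

In the read loop this would be one add(x, y) call per input line, followed by missing_x() and missing_y() at the end of the test case.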
I find the answer of @Ext3h adequate with respect to the errors.
Given that you already spotted the odd/even nature of the problem, the solution can be made more straightforward.
You need to find the x and the y that appear an odd number of times.
In Java:
int[] missingPoint(int[][] a) {
    //int n = (a.length + 1) / 4;
    int[] pt = new int[2]; // In C initialize with 0.
    for (int i = 0; i < a.length; ++i) {
        for (int j = 0; j < 2; ++j) {
            pt[j] ^= a[i][j];
        }
    }
    return pt;
}
This uses exclusive-or ^, which is associative and commutative, with 0^x=x and x^x=0. (5^7^4^7^5=4.)
For these "find the odd one out" problems one can use this xor-ing.
Selection Sort in Cuda

So, I'm trying to implement selection sort in CUDA, but so far I haven't been very successful.
__device__ void selection_sort( int *data, int left, int right ){
    for( int i = left ; i <= right ; ++i ){
        int min_val = data[i];
        int min_idx = i;
        // Find the smallest value in the range [left, right].
        for( int j = i+1 ; j <= right ; ++j ){
            int val_j = data[j];
            if( val_j < min_val ){
                min_idx = j;
                min_val = val_j;
            }
        }
        // Swap the values.
        if( i != min_idx ){
            data[min_idx] = data[i];
            data[i] = min_val;
        }
    }
}
My main aim here is to find the minimum in parallel. I realize the code looks very C++-ish, but I'm nowhere near skilled in CUDA.
Is there a way to parallelize the solution? Are there any more additions to be made?
Selection sort algorithm for N numbers can be roughly described as:
for i from N-1 down to 0
find the maximum element among data[0] ~ data[i]
swap that maximum element with data[i] within the data array
The first part (finding the maximum element) falls into a widely known and well documented class of problems called reduction. However, to perform the second part (swapping), you must track the index of the maximum element while comparing the values, and it is not so natural to do that while performing a reduction. This is one of the reasons why selection sort does not port well to parallel architectures.
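One common way around the index-tracking problem is to reduce over (value, index) pairs instead of bare values. The sketch below is plain sequential C++, not CUDA, and the name argmax_reduce is made up for illustration. Within one pass, every iteration of the inner loop touches a different pair of slots, which is the part that would be spread across threads.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Tree reduction that carries the original index along with each value.
// Returns (maximum value, index where it was found). Requires non-empty input.
std::pair<int, std::size_t> argmax_reduce(const std::vector<int>& data) {
    assert(!data.empty());
    std::vector<std::pair<int, std::size_t>> buf(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) buf[i] = {data[i], i};

    // Each pass folds the upper half of the active range onto the lower half.
    for (std::size_t n = buf.size(); n > 1; n = (n + 1) / 2) {
        const std::size_t half = (n + 1) / 2;
        for (std::size_t i = 0; i + half < n; ++i) {
            if (buf[i + half].first > buf[i].first) buf[i] = buf[i + half];
        }
    }
    return buf[0];
}
```

The cost is that every comparison now moves two words instead of one, and the number of active slots still halves on each pass.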
Also, the problem size shrinks by one on each iteration, which is another aspect of selection sort that does not map well to parallel architectures. In CUDA, 32 threads form a warp and execute together. Although you can launch an arbitrary number of threads, leaving a warp only partially filled is generally not recommended because it wastes computing power.
I've tried to build a CUDA version of selection sort myself, but I stopped doing it because it seems there are better algorithms well suited for CUDA. But I'll just show you what I've done so far to illustrate why selection sort is not good for CUDA.
First, start from a small and simple problem: sorting 32 elements. Since 32 threads form a warp, you can use shuffle instructions to find the maximum value.
// Finds the maximum element within a warp and gives the maximum element to
// thread with lane id 0. Note that other elements do not get lost but their
// positions are shuffled.
__inline__ __device__ int warpMax(int data, unsigned int threadId)
{
    for (int mask = 16; mask > 0; mask /= 2) {
        // __shfl_xor_sync replaces the deprecated __shfl_xor on CUDA 9 and later
        int dual_data = __shfl_xor_sync(0xffffffff, data, mask, 32);
        if (threadId & mask)
            data = min(data, dual_data);
        else
            data = max(data, dual_data);
    }
    return data;
}

__global__ void selection32(int* d_data, int* d_data_sorted)
{
    unsigned int threadId = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int laneId = threadIdx.x % 32;
    int n = N; // N is the element count, defined elsewhere in the full code
    while(n-- > 0) {
        // get the maximum element among d_data and put it in d_data_sorted[n]
        int data = d_data[threadId];
        data = warpMax(data, threadId);
        d_data[threadId] = data;
        // now maximum element is in d_data[0]
        if (laneId == 0) {
            d_data_sorted[n] = d_data[0];
            d_data[0] = INT_MIN; // this element is ignored from now on
        }
    }
}

int main()
{
    // ... build data and transfer to d_data ...
    selection32<<<1, 32>>>(d_data, d_data_sorted);
    // ... get the sorted array stored at d_data_sorted ...
}
(Some may argue that this is not exactly a selection sort since 1) the array elements of the unsorted area keep shuffling, and 2) it is not an in-place sort. Please note that I'm just trying to show that selection sort does not fit in for CUDA. Also, note that warpMax has highly divergent branches, making it less optimal for CUDA.)
The case with only one warp of elements may look parallel-ish, but things get worse when the problem size increases to multiple warps. Consider 1024 elements. (I chose 1024 because it is the maximum number of threads in a block.) Now there are 32 warps, and after calling warpMax for each warp, we must compare the per-warp maxima to get the maximum among the 1024 elements. This comparison of 32 warp maxima cannot be done with warpMax, because we need to track which warp the maximum came from in order to swap it with the last element of the data array. One way I can think of is to use a single thread to compare the warp maxima. This is not a good implementation for CUDA because the other 1023 threads in the block sit idle.
Covering segments by points

I searched and looked at the links below, but they didn't help.
Point covering problem
Segments poked (covered) with points - any tricky test cases?
Need effective greedy for covering a line segment
Problem Description:
You are given a set of segments on a line and your goal is to mark as
few points on a line as possible so that each segment contains at least
one marked point
Given a set of n segments {[a0,b0],[a1,b1]....[an-1,bn-1]} with integer
coordinates on a line, find the minimum number 'm' of points such that
each segment contains at least one point .That is, find a set of
integers X of the minimum size such that for any segment [ai,bi] there
is a point x belongs X such that ai <= x <= bi
Output Description:
Output the minimum number m of points on the first line and the integer
coordinates of m points (separated by spaces) on the second line
Sample Input - I
3
1 3
2 5
3 6
Output - I
1
3
Sample Input - II
4
4 7
1 3
2 5
5 6
Output - II
2
3 6
I didn't understand the question itself. I need an explanation of how to solve the above problem, but I don't want the code. Examples would be greatly helpful.
Maybe this formulation of the problem will be easier to understand. You have n people who can each tolerate a different range of temperatures [ai, bi]. You want to find the minimum number of rooms to make them all happy, i.e. you can set each room to a certain temperature so that each person can find a room within his/her temperature range.
As for how to solve the problem, you said you didn't want code, so I'll just roughly describe an approach. Think about the coldest room you have. If making it one degree warmer won't cause anyone to no longer be able to tolerate that room, you might as well make the increase, since that can only allow more people to use that room. So the first temperature you should set is the warmest one that the most cold-loving person can still tolerate. In other words, it should be the smallest of the bi. Now this room will satisfy some subset of your people, so you can remove them from consideration. Then repeat the process on the remaining people.
Now, to implement this efficiently, you might not want to literally do what I said above. I suggest sorting the people according to bi first, and for the ith person, try to use an existing room to satisfy them. If you can't, try to create a new one with the highest temperature possible to satisfy them, which is bi.
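For readers who do want to see it, the sorted version described above could look roughly like this in C++. The function name cover_points and the use of std::pair for a segment are my assumptions, and I have only checked it by hand against the two samples in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Each segment is (a, b) with a <= b. Returns the chosen points in increasing order.
std::vector<long long> cover_points(std::vector<std::pair<long long, long long>> segs) {
    // Sort by right endpoint b ("warmest tolerable temperature").
    std::sort(segs.begin(), segs.end(),
              [](const auto& l, const auto& r) { return l.second < r.second; });

    std::vector<long long> points;
    for (const auto& s : segs) {
        assert(s.first <= s.second);
        // The last point is the largest one placed so far and is <= s.second,
        // so it covers s exactly when it is >= s.first.
        if (points.empty() || points.back() < s.first) {
            points.push_back(s.second); // open a new "room" at b
        }
    }
    return points;
}
```

For the first sample this yields the single point 3, and for the second the points 3 and 6.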
Yes the description is pretty vague and the only meaning that makes sense to me is this:
You got some line
Segment on a line is defined by l,r
Where one parameter is distance from start of line and second is the segments length. Which one is which is hard to tell as the letters are not very usual for such description. My bet is:
l length of segment
r distance of (start?) of segment from start of line
You want to find min set of points
So that each segment has at least one point in it. That mean for 2 overlapped segments you need just one point ...
Surely there are more ways to solve this; the obvious one is generate and test with some heuristics, like generating combinations only for segments that are overlapped more than once. So I would attack this task in this manner (using the assumed terminology from #2):
sort segments by r
add number of overlaps to your segment set data
so the segment will be { r,l,n } and set the n=0 for all segments for now.
scan segments for overlaps
something like
for (i=0;i<segments;i++) // loop all segments
for (j=i+1;j<segments;j++) // loop all latter segments until they are still overlapped
if ( segment[i] and segment [j] are overlapped )
segment[i].n++; // update overlap counters
else break;
Now if the r-sorted segments are overlapped then
segment[i].r <=segment[j].r
scan segments handling non overlapped segments
for each segment such that segment[i].n==0 add to the solution point list its point (middle) defined by distance from start of line.
And after that remove segment from the list (or tag it as used or what ever you do for speed boost...).
scan segments that are overlapped just once
So if segment[i].n==1 then you need to determine whether it is overlapped with i-1 or i+1. Add the midpoint of the overlap to the solution points and remove segment i from the list. Then decrement the n of the overlapped segment (i+1 or i-1) and, if it reaches zero, remove it too.
points.add(0.5*( segment[j].r + min(segment[i].r+segment[i].l , segment[j].r+segment[j].l )));
Loop this whole scanning until there is no new point added to the solution.
now you got only multiple overlaps left
From this point I will be a bit vague for 2 reasons:
I have not tested this and I do not have any test data to validate it, not to mention I am lazy.
This smells like assignment so there is some work/fun left for you.
To start, I would scan all segments and remove every one that already contains a point from the solution. You should repeat this step after any change to the solution.
Now you can experiment with generating combination of points for each overlapped group of segments and remember the minimal number of points covering all segments in group. (simply by brute force).
There are more heuristics possible like handling all twice overlapped segments (in similar manner as the single overlaps) but in the end you will have to do brute force on the rest of data ...
[edit1] as you added new info
The r,l means distance of left and right from the start of line. So if you want to convert between the other formulation { r',l' } and (l<=r) then
and back
Sorry too lazy to rewrite the whole thing ...
Here is the working solution in C, please refer to it partially and try to fix your code before reading the whole. Happy coding :) Spoiler alert
#include <stdio.h>
#include <stdlib.h>
/* Order segments by right endpoint, then by left endpoint. */
int cmp_func(const void *ptr_a, const void *ptr_b)
{
    const long *a = *(long *const *)ptr_a;
    const long *b = *(long *const *)ptr_b;
    if (a[1] == b[1])
        return (a[0] > b[0]) - (a[0] < b[0]);
    return (a[1] > b[1]) - (a[1] < b[1]);
}

int main()
{
    int i, j, n, num_val;
    long **arr;
    scanf("%d", &n);
    long values[n];
    arr = malloc(n * sizeof(long *));
    for (i = 0; i < n; ++i) {
        *(arr + i) = malloc(2 * sizeof(long));
        scanf("%ld %ld", &arr[i][0], &arr[i][1]);
    }
    qsort(arr, n, sizeof(long *), cmp_func);
    i = j = 0;
    num_val = 0;
    while (i < n) {
        int skip = 0;
        /* Mark the right endpoint of the first segment not yet covered. */
        values[num_val++] = arr[i][1];
        for (j = i + 1; j < n; ++j) {
            int condition;
            condition = arr[i][1] <= arr[j][1] ? arr[j][0] <= arr[i][1] : 0;
            if (condition) {
                skip++;   /* segment j contains the marked point */
            } else {
                break;
            }
        }
        i += skip + 1;
    }
    printf("%d\n", num_val);
    for (int k = 0; k < num_val; ++k) {
        printf("%ld ", values[k]);
    }
    printf("\n");
    return 0;
}
Here's the working code in C++ for anyone searching :)
#include <bits/stdc++.h>
#define ll long long
#define double long double
#define vi vector<int>
#define endl "\n"
#define ff first
#define ss second
#define pb push_back
#define all(x) (x).begin(),(x).end()
#define mp make_pair
using namespace std;
bool cmp(const pair<ll,ll> &a, const pair<ll,ll> &b)
return (a.second < b.second);
vector<ll> MinSig(vector<pair<ll,ll>>&vec)
vector<ll> points;
for(int x=0;x<vec.size()-1;)
bool found=false;
for(int y=x+1;y<vec.size();y++)
return points;
int main()
int n;
for(int x=0;x<n;x++)
ll temp1,temp2;
for(auto it:res)
Simple random number generator that can generate nth number in series in O(1) time

I do not intend to use this for security purposes or statistical analysis. I need to create a simple random number generator for use in my computer graphics application. I don't want to use the term "random number generator", since people think in very strict terms about it, but I can't think of any other word to describe it.
it has to be fast.
it must be repeatable, given a particular seed.
Eg: If seed = x, then the series a,b,c,d,e,f..... should happen every time I use the seed x.
Most importantly, I need to be able to compute the nth term in the series in constant time.
It seems that I cannot achieve this with rand_r or srand(), since these are state dependent, and I may need to compute the nth term in some unknown order.
I've looked at Linear Feedback Shift registers, but these are state dependent too.
So far I have this:
int rand = (n * prime1 + seed) % prime2
n = used to indicate the index of the term in the sequence. Eg: For
first term, n ==1
prime1 and prime2 are prime numbers where
prime1 > prime2
seed = some number which allows one to use the same function to
produce a different series depending on the seed, but the same series
for a given seed.
I can't tell how good or bad this is, since I haven't used it enough, but it would be great if people with more experience in this can point out the problems with this, or help me improve it..
EDIT - I don't care if it is predictable. I'm just trying to create some randomness in my computer graphics.
Use a cryptographic block cipher in CTR mode. The Nth output is just encrypt(N). Not only does this give you the desired properties (O(1) computation of the Nth output); it also has strong non-predictability properties.
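If cryptographic strength is not needed, the same counter-mode idea works with a cheap mixing function in place of the cipher. The sketch below is that substitution, not a block cipher: it uses the SplitMix64 finalizer, and the names mix64 and nth_random are mine. It has none of the unpredictability guarantees mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// SplitMix64 finalizer: an invertible 64-bit mixing function (not a cipher).
inline std::uint64_t mix64(std::uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// n-th term of the series for a given seed, computed directly in O(1).
// The counter is spread out with an odd constant before mixing.
inline std::uint64_t nth_random(std::uint64_t seed, std::uint64_t n) {
    return mix64(seed + n * 0x9e3779b97f4a7c15ULL);
}

// Usage check: terms are repeatable and can be requested in any order.
inline void nth_random_self_test() {
    const std::uint64_t late = nth_random(42, 1000000);
    const std::uint64_t early = nth_random(42, 3);
    assert(nth_random(42, 3) == early);
    assert(nth_random(42, 1000000) == late);
}
```

Because both steps are invertible, distinct n for the same seed never produce the same 64-bit output; reduce the result to the range you need with a modulo or a shift.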
I stumbled on this a while back, looking for a solution for the same problem. Recently, I figured out how to do it in low-constant O(log(n)) time. While this doesn't quite match the O(1) requested by the author, It may be fast enough (a sample run, compiled with -O3, achieved performance of 1 billion arbitrary index random numbers, with n varying between 1 and 2^48, in 55.7s -- just shy of 18M numbers/s).
First, the theory behind the solution:
A common type of RNG is the linear congruential generator (LCG). Basically, it works as follows:
random(n) = (m*random(n-1) + b) mod p
Where m, b, and p are constants (see a reference on LCGs for how they are chosen). From this, we can devise the following using a bit of modular arithmetic:
random(0) = seed mod p
random(1) = (m*seed + b) mod p
random(2) = (m^2*seed + m*b + b) mod p
...
random(n) = (m^n*seed + b*Sum_{i = 0 to n - 1} m^i) mod p
          = (m^n*seed + b*(m^n - 1)/(m - 1)) mod p
Computing the above can be a problem, since the numbers will quickly exceed numeric limits. The solution for the generic case is to compute m^n in modulo with p*(m - 1), however, if we take b = 0 (a sub-case of LCGs sometimes called Multiplicative congruential Generators), we have a much simpler solution, and can do our computations in modulo p only.
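For the general case with b != 0, one alternative to the closed form (not the method used in the code that follows) is to treat one LCG step as an affine map and raise it to the nth power by repeated squaring, which avoids the division by m - 1. The sketch below assumes p = 2^64 so that unsigned wrap-around does the reduction; the names Affine, compose, power and lcg_nth are made up, and the constants in the self-test are Knuth's MMIX multiplier and increment, used only as an example.

```cpp
#include <cassert>
#include <cstdint>

// One LCG step x -> a*x + c (mod 2^64 through unsigned wrap-around).
struct Affine { std::uint64_t a, c; };

// Composition "g after f": x -> g.a*(f.a*x + f.c) + g.c
inline Affine compose(Affine g, Affine f) {
    return { g.a * f.a, g.a * f.c + g.c };
}

// n-fold application of `step`, using O(log n) compositions.
inline Affine power(Affine step, std::uint64_t n) {
    Affine result{1, 0}; // identity map
    while (n != 0) {
        if (n & 1) result = compose(step, result);
        step = compose(step, step);
        n >>= 1;
    }
    return result;
}

// State after n steps from `seed`, without visiting the states in between.
inline std::uint64_t lcg_nth(std::uint64_t seed, std::uint64_t n, Affine step) {
    const Affine f = power(step, n);
    return f.a * seed + f.c;
}

// Usage check: the jump agrees with stepping one term at a time.
inline void lcg_self_test() {
    const Affine step{6364136223846793005ULL, 1442695040888963407ULL};
    std::uint64_t x = 42;
    for (int i = 0; i < 1000; ++i) x = step.a * x + step.c;
    assert(lcg_nth(42, 1000, step) == x);
}
```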
In the following, I use the constant parameters used by RANF (developed by CRAY), where p = 2^48 and m = 44485709377909. The fact that p is a power of 2 reduces the number of operations required (as expected):
#include <cassert>
#include <stdint.h>
#include <cstdlib>
class RANF{
    // MCG constants and state data
    static const uint64_t m = 44485709377909ULL;
    static const uint64_t n = 0x0001000000000000ULL; // 2^48
    static const uint64_t randMax = n - 1;
    const uint64_t seed;
    uint64_t state;

public:
    // Constructors, which define the seed
    RANF(uint64_t seed) : seed(seed), state(seed) {
        assert(seed > 0 && "A seed of 0 breaks the LCG!");
    }

    // Gets the next random number in the sequence
    inline uint64_t getNext(){
        state *= m;
        return state & randMax;
    }

    // Sets the MCG to a specific index
    inline void setPosition(size_t index){
        state = seed;
        uint64_t mPower = m;
        for (uint64_t b = 1; index; b <<= 1){
            if (index & b){
                state *= mPower;
                index ^= b;
            }
            mPower *= mPower;
        }
    }
};

#include <cstdio>

void example(){
    RANF R(1);

    // Gets the number through random-access -- O(log(n))
    R.setPosition(12345); // Goes to the nth random number
    printf("fast nth number = %llu\n", (unsigned long long)R.getNext());

    // Gets the number through standard, sequential access -- O(n)
    R.setPosition(0); // rewind to the seed before stepping sequentially
    for(size_t i = 0; i < 12345; i++) R.getNext();
    printf("slow nth number = %llu\n", (unsigned long long)R.getNext());
}
While I presume the author has moved on by now, hopefully this will be of use to someone else.
If you're really concerned about runtime performance, the above can be made about 10x faster with lookup tables, at the cost of compilation time and binary size (it also is O(1) w.r.t the desired random index, as requested by OP)
In the version below, I used c++14 constexpr to generate the lookup tables at compile time, and got to 176M arbitrary index random numbers per second (doing this did however add about 12s of extra compilation time, and a 1.5MB increase in binary size -- the added time may be mitigated if partial recompilation is used).
class RANF{
    // MCG constants and state data
    static const uint64_t m = 44485709377909ULL;
    static const uint64_t n = 0x0001000000000000ULL; // 2^48
    static const uint64_t randMax = n - 1;
    const uint64_t seed;
    uint64_t state;

    // Lookup table
    struct lookup_t{
        uint64_t v[3][65536];
        constexpr lookup_t() : v() {
            uint64_t mi = RANF::m;
            for (size_t i = 0; i < 3; i++){
                v[i][0] = 1;
                uint64_t val = mi;
                for (uint16_t j = 0x0001; j; j++){
                    v[i][j] = val;
                    val *= mi;
                }
                mi = val;
            }
        }
    };
    friend struct lookup_t;

public:
    // Constructors, which define the seed
    RANF(uint64_t seed) : seed(seed), state(seed) {
        assert(seed > 0 && "A seed of 0 breaks the LCG!");
    }

    // Gets the next random number in the sequence
    inline uint64_t getNext(){
        state *= m;
        return state & randMax;
    }

    // Sets the MCG to a specific index
    // Note: idx.u16 indices need to be adapted for big-endian machines!
    inline void setPosition(size_t index){
        static constexpr auto lookup = lookup_t();
        union { uint16_t u16[4]; uint64_t u64; } idx;
        idx.u64 = index;
        state = seed * lookup.v[0][idx.u16[0]] * lookup.v[1][idx.u16[1]] * lookup.v[2][idx.u16[2]];
    }
};
Basically, what this does is split the computation of, for example, m^0xAAAABBBBCCCC mod p into (m^0xAAAA00000000 mod p)*(m^0xBBBB0000 mod p)*(m^0xCCCC mod p) mod p, and then precompute tables for each of the values in the 0x0000 - 0xFFFF range that could fill AAAA, BBBB or CCCC.
RNGs in the usual sense produce a sequence of the form f(n) = S(f(n-1)).
They also lose information at some point (for example through % mod) for computational convenience, so it is generally not possible to expand the sequence into a closed-form function X(n) = f(n) that depends on n only.
This means that at best you have O(n) with that.
To target O(1), you therefore need to abandon the idea of f(n) = S(f(n-1)) and specify a simple formula directly, so that the Nth number can be calculated without knowing the (N-1)th; this also renders the seed meaningless.
So you end up with a simple algebraic function rather than a sequence. For example:
int my_rand(int n) { return 42; } // Don't laugh!
int my_rand(int n) { return 3*n*n + 2*n + 7; }
If you want to put more constraints on the generated pattern (like its distribution), it becomes a complex maths problem.
However, for your original goal, if what you want is constant-time access to pseudo-random numbers, I suggest pre-generating them with a traditional RNG and accessing them through a lookup table.
EDIT: I noticed you are concerned about the table size for a lot of numbers; however, you may introduce a hybrid model, like a table of N entries with f(k) = g(tbl[k%N], k), which at least provides a good distribution across N consecutive values.
This demonstrates a PRNG implemented as a hashed counter. This might appear to duplicate R.'s suggestion (using a block cipher in CTR mode as a stream cipher), but for this I avoided using cryptographically secure primitives, both for speed of execution and because security wasn't a desired feature.
If we were trying to create a secure stream cipher with your requirement that any emitted sequence be trivially repeatable, given knowledge of its index...
...then we could choose a secure hash algorithm (like SHA256) and a counter with a lot of bits (maybe 2048 -> sequence repeats every 2^2048 generated numbers without reseeding).
HOWEVER, the version I present here uses Bob Jenkins' famous hash function (simple and fast, but not secure) along with a 64-bit counter (which is as big as integers can get on my system, without needing custom incrementing code).
Code in main demonstrates that knowledge of the RNG's counter (seed) after initialization allows a PRNG sequence to be repeated, as long as we know how many values were generated leading up to the repetition point.
Actually, if you know the counter's value at any point in the output sequence, you will be able to retrieve all values generated previous to that point, AND all values which will be generated afterward. This only involves adding or subtracting ordinal differences to/from a reference counter value associated with a known point in the output sequence.
It should be pretty easy to adapt this class for use as a testing framework -- you could plug in other hash functions and change the counter's size to see what kind of impact there is on speed as well as the distribution of generated values (the only uniformity analysis I did was to look for patterns in the screenfuls of hexadecimal numbers printed by main()).
#include <iostream>
#include <iomanip>
#include <ctime>
using namespace std;
class CHashedCounterRng {
    static unsigned JenkinsHash(const void *input, unsigned len) {
        unsigned hash = 0;
        for(unsigned i=0; i<len; ++i) {
            hash += static_cast<const unsigned char*>(input)[i];
            hash += hash << 10;
            hash ^= hash >> 6;
        }
        hash += hash << 3;
        hash ^= hash >> 11;
        hash += hash << 15;
        return hash;
    }
    unsigned long long m_counter;
    void IncrementCounter() { ++m_counter; }
public:
    unsigned long long GetSeed() const {
        return m_counter;
    }
    void SetSeed(unsigned long long new_seed) {
        m_counter = new_seed;
    }
    unsigned int operator ()() {
        // the next random number is generated here
        const auto r = JenkinsHash(&m_counter, sizeof(m_counter));
        IncrementCounter();
        return r;
    }
    // the default constructor uses time()
    // to seed the counter
    CHashedCounterRng() : m_counter(time(0)) {}
    // you can supply a predetermined seed here,
    // or after construction with SetSeed(seed)
    CHashedCounterRng(unsigned long long seed) : m_counter(seed) {}
};

int main() {
    CHashedCounterRng rng;
    // time()'s high bits change very slowly, so look at low digits
    // if you want to verify that the seed is different between runs
    const auto stored_counter = rng.GetSeed();
    cout << "initial seed: " << stored_counter << endl;
    for(int i=0; i<20; ++i) {
        for(int j=0; j<8; ++j) {
            const unsigned x = rng();
            cout << setfill('0') << setw(8) << hex << x << ' ';
        }
        cout << endl;
    }
    cout << endl;
    cout << "The last line again:" << endl;
    rng.SetSeed(stored_counter + 19 * 8);
    for(int j=0; j<8; ++j) {
        const unsigned x = rng();
        cout << setfill('0') << setw(8) << hex << x << ' ';
    }
    cout << endl << endl;
    return 0;
}
Find the largest subset of it which form a sequence

I came across this problem in an interview forum:
Given an int array which might contain duplicates, find the largest subset of it which form a sequence.
Eg. {1,6,10,4,7,9,5}
then ans is 4,5,6,7
Sorting is an obvious solution. Can this be done in O(n) time?
My take is that this cannot be done in O(n) time, because if it could, we could also sort in O(n) time (without knowing an upper bound),
since a random array can contain all the elements of a sequence, just in random order.
Does this sound like a plausible explanation? Your thoughts?
I believe it can be solved in O(n) if you assume you have enough memory to allocate an uninitialized array of a size equal to the largest value, and that allocation can be done in constant time. The trick is to use a lazy array, which gives you the ability to create a set of items in linear time with a membership test in constant time.
Phase 1: Go through each item and add it to the lazy array.
Phase 2: Go through each undeleted item, and delete all contiguous items.
In phase 2, you determine the range and remember it if it is the largest so far. Items can be deleted in constant time using a doubly-linked list.
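The same two phases can be sketched with a hash set standing in for the lazy array. That is a substitution on my part: it gives expected rather than worst-case O(n) and needs no memory proportional to the largest value. The name longest_run is hypothetical, and the sketch assumes no value sits at the very edge of the int range.

```cpp
#include <cassert>
#include <unordered_set>
#include <utility>
#include <vector>

// Returns (first value, length) of the longest run of consecutive integers
// present in `values`; duplicates are ignored. Length 0 means empty input.
// Assumes v - 1 and v + len never overflow int.
std::pair<int, int> longest_run(const std::vector<int>& values) {
    // Phase 1: record which values exist.
    std::unordered_set<int> present(values.begin(), values.end());

    // Phase 2: walk each run once, starting only from its smallest member.
    int best_start = 0, best_len = 0;
    for (int v : present) {
        if (present.count(v - 1)) continue; // v is not the start of a run
        int len = 1;
        while (present.count(v + len)) ++len;
        if (len > best_len) { best_len = len; best_start = v; }
    }
    return {best_start, best_len};
}
```

Every value is visited by the inner while loop at most once, from the start of the run it belongs to, which is where the linear bound comes from. For {1,6,10,4,7,9,5} it reports a run of length 4 starting at 4.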
Here is some incredibly kludgy code that demonstrates the idea:
#include <stdio.h>
#include <stdlib.h>

int main(int argc,char **argv)
{
  static const int n = 8;
  int values[n] = {1,6,10,4,7,9,5,5}; // values must be non-negative
  int index[n];         // index[l] = value stored in slot l of the lazy array
  int lists[n];         // lists[l] = most recent item with the value in slot l
  int prev[n];          // prev[k] = previous item with the same value, or -1
  int next_existing[n]; // doubly-linked list of the items not yet deleted
  int prev_existing[n];
  int index_size = 0;
  int n_lists = 0;

  // Find largest value
  int max_value = 0;
  for (int i=0; i!=n; ++i) {
    int v=values[i];
    if (v>max_value) max_value=v;
  }

  // Allocate a lazy array. It is deliberately left uninitialized: an entry
  // is only trusted after it has been cross-checked against index[].
  int *lazy = (int *)malloc((max_value+1)*sizeof(int));
  if (!lazy) return 1;

  // Set items in the lazy array and build the lists of indices for
  // items with a particular value.
  for (int i=0; i!=n; ++i) {
    next_existing[i] = i+1;
    prev_existing[i] = i-1;
    int v = values[i];
    int l = lazy[v];
    if (l>=0 && l<index_size && index[l]==v) {
      // already there, add it to the list
      prev[n_lists] = lists[l];
      lists[l] = n_lists++;
    }
    else {
      // not there -- create a new list
      l = index_size++;
      lazy[v] = l;
      index[l] = v;
      prev[n_lists] = -1;
      lists[l] = n_lists++;
    }
  }

  // Go through each contiguous range of values and delete them, determining
  // what the range is.
  int max_count = 0;
  int max_begin = -1;
  int max_end = -1;
  int i = 0;
  while (i<n) {
    // Start by searching backwards for a value that isn't in the lazy array
    int dir = -1;
    int v_mid = values[i];
    int v = v_mid;
    int begin = -1;
    for (;;) {
      // Only read lazy[v] when v lies inside the allocated range
      int l = (v>=0 && v<=max_value) ? lazy[v] : -1;
      if (l<0 || l>=index_size || index[l]!=v) {
        // Value not in the lazy array
        if (dir==1) {
          // Hit the end
          if (v-begin>max_count) {
            max_count = v-begin;
            max_begin = begin;
            max_end = v;
          }
          break;
        }
        // Hit the beginning
        begin = v+1;
        dir = 1;
        v = v_mid+1;
      }
      else {
        // Remove all the items with value v, except the current item i,
        // whose next link is still needed below
        int k = lists[l];
        while (k>=0) {
          if (k!=i) {
            int p = prev_existing[k];
            int q = next_existing[k];
            if (p>=0) next_existing[p] = q;
            if (q<n) prev_existing[q] = p;
          }
          k = prev[k];
        }
        v += dir;
      }
    }
    // Go to the next existing item
    i = next_existing[i];
  }

  // Print the largest range
  for (int i=max_begin; i!=max_end; ++i) {
    if (i!=max_begin) fprintf(stderr,",");
    fprintf(stderr,"%d",i);
  }
  fprintf(stderr,"\n");
  free(lazy);
  return 0;
}
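The membership test that the lazy array relies on can be looked at in isolation. The following is a minimal Python sketch, not the answer's code: the class name `LazySet` and its fields `sparse` and `dense` are hypothetical. Python cannot allocate uninitialized memory, so the "garbage" contents are simulated with random numbers. That fill costs linear time, so the sketch only shows why the check is correct for any initial contents, not the constant-time allocation.

```python
import random


class LazySet:
    """Set of integers in 0..universe-1 that never trusts unverified entries."""

    def __init__(self, universe, capacity):
        # Stand-in for uninitialized memory: arbitrary garbage values.
        self.sparse = [random.randrange(-10**6, 10**6) for _ in range(universe)]
        self.dense = [0] * capacity  # dense[slot] = value stored in that slot
        self.size = 0                # number of slots in use

    def __contains__(self, v):
        slot = self.sparse[v]
        # An entry counts only if it points at a used slot that points back at v.
        return 0 <= slot < self.size and self.dense[slot] == v

    def add(self, v):
        if v not in self:
            self.sparse[v] = self.size
            self.dense[self.size] = v
            self.size += 1


s = LazySet(universe=11, capacity=8)
for v in [1, 6, 10, 4, 7, 9, 5, 5]:
    s.add(v)
print(4 in s, 8 in s)  # True False
```

A garbage entry can only pass the check if it names a used slot holding exactly `v`, which happens only after `v` has really been added.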
I would say there are ways to do it. The algorithm is the one you already describe, but using an O(n) sorting algorithm. Since such algorithms exist for certain inputs (bucket sort, radix sort), this works, and it is also consistent with your argument for why it should not work in general.
Vaughn Cato's suggested implementation works like this: it is effectively a bucket sort, with the lazy array serving as buckets on demand.
As shown by M. Ben-Or in "Lower bounds for algebraic computation trees", Proc. 15th ACM Sympos. Theory Comput., pp. 80-86, 1983 (cited by J. Erickson in "Finding Longest Arithmetic Progressions"), this problem requires Ω(n log n) time, even if the input is already sorted, when using an algebraic decision tree model of computation.
Earlier, I posted the following example in a comment to illustrate that sorting the numbers does not provide an easy answer to the question. Suppose the array is given already sorted into ascending order. For example, let it be (20 30 35 40 47 60 70 80 85 95 100). The longest sequence found in any subsequence of the input is 20,40,60,80,100 rather than 30,35,40 or 60,70,80.
Regarding whether an O(n) algebraic decision tree solution to this problem would provide an O(n) algebraic decision tree sorting method: as others have pointed out, a solution to this subsequence problem for a given multiset does not provide a solution to a sorting problem for that multiset. As an example, consider the set {2,4,6,x,y,z}. The subsequence solver will give you the result (2,4,6) whenever x,y,z are large numbers not in arithmetic sequence, and it will tell you nothing about the order of x,y,z.
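The sorted example above can be checked mechanically. This is a hedged sketch of a simple quadratic-time dynamic program, not the algorithm from the cited paper; the function name `longest_arithmetic_progression` is hypothetical, and the input is assumed to be sorted ascending with no duplicates.

```python
def longest_arithmetic_progression(sorted_values):
    """Longest arithmetic progression in a sorted, duplicate-free list (O(n^2))."""
    n = len(sorted_values)
    if n <= 2:
        return list(sorted_values)
    # length[j][d] = length of the longest progression with difference d ending at index j
    length = [dict() for _ in range(n)]
    best_len, best_end, best_diff = 2, 1, sorted_values[1] - sorted_values[0]
    for j in range(n):
        for i in range(j):
            d = sorted_values[j] - sorted_values[i]
            # Extend the progression ending at i, or start a new pair (i, j).
            length[j][d] = length[i].get(d, 1) + 1
            if length[j][d] > best_len:
                best_len, best_end, best_diff = length[j][d], j, d
    last = sorted_values[best_end]
    return [last - k * best_diff for k in range(best_len - 1, -1, -1)]


print(longest_arithmetic_progression([20, 30, 35, 40, 47, 60, 70, 80, 85, 95, 100]))
# [20, 40, 60, 80, 100]
```

Even with sorted input, this sketch compares every pair of elements, which illustrates why sorting alone does not settle the question under this reading of "sequence".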
What about this? Populate a hash table so that each value stores the start of the range seen so far for that number, except for the head element, which stores the end of the range. O(n) time, O(n) space. A tentative Python implementation (you could do it with one traversal keeping some state variables, but this way seems clearer):
def longest_subset(xs):
    table = {}
    for x in xs:
        if x in table:
            continue  # skip duplicates; re-processing a value would corrupt its range
        start = table.get(x-1, x)
        end = table.get(x+1, x)
        if x+1 in table:
            table[end] = start
        if x-1 in table:
            table[start] = end
        table[x] = (start if x-1 in table else end)

    start, end = max(table.items(), key=lambda pair: pair[1]-pair[0])
    return list(range(start, end+1))

print(longest_subset([1, 6, 10, 4, 7, 9, 5]))
# [4, 5, 6, 7]
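As a cross-check for the hash-table idea, here is a sketch of a different, commonly used variant that relies on a plain set instead of storing range endpoints. The function name `longest_run_with_set` is made up for illustration, and expected O(n) time assumes constant-time set lookups. Duplicates are harmless because the set discards them.

```python
def longest_run_with_set(xs):
    """Longest run of consecutive integers, using only set membership tests."""
    present = set(xs)
    best_start, best_len = None, 0
    for x in present:
        if x - 1 in present:
            continue  # x is not the first value of its run
        end = x
        while end + 1 in present:  # walk forward from the start of the run
            end += 1
        if end - x + 1 > best_len:
            best_start, best_len = x, end - x + 1
    if best_start is None:
        return []
    return list(range(best_start, best_start + best_len))


print(longest_run_with_set([1, 6, 10, 4, 7, 9, 5, 5]))  # [4, 5, 6, 7]
```

Each run is walked only from its first value, so every element is visited a constant number of times overall.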
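Here is an unoptimized implementation, maybe you will find it useful. Note that the second loop walks the whole value range, so it is O(n + max(A)) rather than strictly O(n):
A = [1, 6, 10, 4, 7, 9, 5]
hash_tb, cur_seq, max_sq = {}, [], []
for i in range(0, len(A)):
    if A[i] not in hash_tb:
        hash_tb[A[i]] = True
for i in range(0, max(A) + 2):  # one step past the maximum flushes the last run
    if i in hash_tb:
        cur_seq.append(i)
    else:
        if len(cur_seq) > len(max_sq):
            max_sq = cur_seq
        cur_seq = []
print(max_sq)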
