I have a structure which has 3 identifier fields and one value field. I have a list of these objects. To give an analogy, the identifier fields are like the primary keys to the object. These 3 fields uniquely identify an object.
Class
{
int a1;
int a2;
int a3;
int value;
};
I would be having a list of say 1000 object of this datatype. I need to check for specific values of these identity key values by passing values of a1, a2 and a3 to a lookup function which would check if any object with those specific values of a1, a2 and a3 is present and returns that value. What is the most effective way to implement this to achieve a best lookup time?
One solution I could think of is to have a 3 dimensional matrix of length say 1000 and populate the value in it. This has a lookup time of O(1). But the disadvantages are.
1. I need to know the length of array.
2. For higher identity fields (say 20), then I will need a 20 dimension matrix which would be an overkill on the memory. For my actual implementation, I have 23 identity fields.
Can you suggest a good way to store this data which would give me the best look up time?
Create a key class that contains all the identity fields, and define an appropriate equals function and hash method, and then use a hash map to map from the key class to its associated value. This will give you a time complexity of O(1) per lookup in the expected case, and it only requires space proportional to the number of actual key combinations observed (typically twice the number, although you can adjust the constant for the time/space tradeoff that you desire), rather than space proportional to all possible key combinations.
Use hash table (map). Construct the key to be "a1-a2-a3", and store data to H(key)=data.
I would simply sort the array by key, and use a binary search.
(untested)
int compare_entry(ENTRY *k1, ENTRY *k2) {
int d = k1->a1 - k2->a1;
if (d == 0) {
d = k1->a2 - k2->a2;
if (d == 0) {
d = k1->a3 - k2->a3;
}
}
return d; // >0 is k1 > k2, 0 if k1 == k2, <0 if k1 < k2
}
// Derived from Wikipedia
int find(ENTRY *list, int size, ENTRY *value) {
int low = 0;
int n = size - 1;
int high = n;
while (low < high) {
int mid = low + (high - low) / 2
int cmp = compare_entry(&list[mid], value);
if (cmp < 0) {
low = mid + 1;
} else {
high = mid;
}
}
if (low < n) {
int cmp = compare_entry(&list[low], value);
if (cmp == 0) {
return low; // found item at 'low' index
}
} else {
return -1; // not found
}
}
Absolutely worst case, you run through this thing, what, 10 times, and end up actually doing all of the comparisons in the key comparison. So that's, what, 85 integer math operations (additions, subtraction, and 1 shift)?
if your a1-a3 are ranging 0-100, then you can make your key a1 * 10000 + a2 * 100 + a3, and do a single compare, and worst case is 63 integer math operations. And your entire array fits within cache on most any modern processor. And it's memory efficient.
You can burn memory with a perfect hash or some other sparse matrix. Even with a perfect hash, I bet the hash calculation itself is competitive with this time, considering multiplication is expensive. This hits the memory bus harder, obviously.
Related
Looking for an efficient algorithm to match sets among a group of sets, ordered by the most overlapping members. 2 identical sets for example are the best match, while no overlapping members are the worst.
So, the algorithm takes input a list of sets and returns matching set pairs ordered by the sets with the most overlapping members.
Would be interested in ideas to do this efficiently. Brute force approach is to try all combinations and sort which obviously is not very performant when the number of sets is very large.
Edit: Use case - Assume a large number of sets already exist. When a new set arrives, the algorithm is run and the output includes matching sets (with at least one element overlap) sorted by the most matching to least (doesn't matter how many items are in the new/incoming set). Hope that clarifies my question.
If you can afford an approximation algorithm with a chance of error, then you should probably consider MinHash.
This algorithm allows estimating the similarity between 2 sets in constant time. For any constructed set, a fixed size signature is computed, and then only the signatures are compared when estimating the similarities. The similarity measure being used is Jaccard distance, which ranges from 0 (disjoint sets) to 1 (identical sets). It is defined as the intersection to union ratio of two given sets.
With this approach, any new set has to be compared against all existing ones (in linear time), and then the results can be merged into the top list (you can use a bounded search tree/heap for this purpose).
Since the number of possible different values is not very large, you get a fairly efficient hashing if you simply set the nth bit in a "large integer" when the nth number is present in your set. You can then look for overlap between sets with a simple bitwise AND followed by a "count set bits" operation. On 64 bit architecture, that means that you can look for the similarity between two numbers (out of 1000 possible values) in about 16 cycles, regardless of the number of values in each cluster. As the cluster gets more sparse, this becomes a less efficient algorithm.
Still - I implemented some of the basic functions you might need in some code that I attach here - not documented but reasonably understandable, I think. In this example I made the numbers small so I can check the result by hand - you might want to change some of the #defines to get larger ranges of values, and obviously you will want some dynamic lists etc to keep up with the growing catalog.
#include <stdio.h>
// biggest number you will come across: want this to be much bigger
#define MAXINT 25
// use the biggest type you have - not int
#define BITSPER (8*sizeof(int))
#define NWORDS (MAXINT/BITSPER + 1)
// max number in a cluster
#define CSIZE 5
typedef struct{
unsigned int num[NWORDS]; // want to use longest type but not for demo
int newmatch;
int rank;
} hmap;
// convert number to binary sequence:
void hashIt(int* t, int n, hmap* h) {
int ii;
for(ii=0;ii<n;ii++) {
int a, b;
a = t[ii]%BITSPER;
b = t[ii]/BITSPER;
h->num[b]|=1<<a;
}
}
// print binary number:
void printBinary(int n) {
unsigned int jj;
jj = 1<<31;
while(jj!=0) {
printf("%c",((n&jj)!=0)?'1':'0');
jj>>=1;
}
printf(" ");
}
// print the array of binary numbers:
void printHash(hmap* h) {
unsigned int ii, jj;
for(ii=0; ii<NWORDS; ii++) {
jj = 1<<31;
printf("0x%08x: ", h->num[ii]);
printBinary(h->num[ii]);
}
//printf("\n");
}
// find the maximum overlap for set m of n
int maxOverlap(hmap* h, int m, int n) {
int ii, jj;
int overlap, maxOverlap = -1;
for(ii = 0; ii<n; ii++) {
if(ii == m) continue; // don't compare with yourself
else {
overlap = 0;
for(jj = 0; jj< NWORDS; jj++) {
// just to see what's going on: take these print statements out
printBinary(h->num[ii]);
printBinary(h->num[m]);
int bc = countBits(h->num[ii] & h->num[m]);
printBinary(h->num[ii] & h->num[m]);
printf("%d bits overlap\n", bc);
overlap += bc;
}
if(overlap > maxOverlap) maxOverlap = overlap;
}
}
return maxOverlap;
}
int countBits (unsigned int b) {
int count;
for (count = 0; b != 0; count++) {
b &= b - 1; // this clears the LSB-most set bit
}
return count;
}
int main(void) {
int cluster[20][CSIZE];
int temp[CSIZE];
int ii,jj;
static hmap H[20]; // make them all 0 initially
for(jj=0; jj<20; jj++){
for(ii=0; ii<CSIZE; ii++) {
temp[ii] = rand()%MAXINT;
}
hashIt(temp, CSIZE, &H[jj]);
}
for(ii=0;ii<20;ii++) {
printHash(&H[ii]);
printf("max overlap: %d\n", maxOverlap(H, ii, 20));
}
}
See if this helps at all...
There is an array (greater than 1000 elements space) with 1000 large numbers (can be 64 bit numbers as well). The numbers in the array may not be necessarily sorted.
We have to generate a unique number at 1001th position that is different from the previous 1000.
Justify the approach used is the best.
My answer (don't know to what extent this was correct):
Sort the numbers, and start from the 0 position. The number that is at 1000th position + 1 is the required number.
Better suggestions for this?
Create an auxiliary array of 1001 elements. Set all these to 1 (or true or Y or whatever you choose). Run through the main array, if you find a number in the range 1..1000 then 0 out (or falsify some other how) the corresponding element in the auxiliary array. At the end the first element in the auxiliary array which is not 0 (or false) corresponds to a number which is not in the main array.
This is simple, and, I think, O(n) in time complexity, where n is the number of elements in the main array.
unsigned ii,slot;
unsigned array [ NNN ];
/* allocate a histogram */
#define XXX (NNN+1);
unsigned histogram [XXX];
memset(histogram, 0, sizeof histogram);
for (ii=0; ii < NNN; ii++) {
slot = array [ii ] % XXX;
histogram[slot] += 1;
}
for (slot=0; slot < NNN; slot++) {
if ( !histogram[slot]) break;
}
/* Now, slot + k * XXX will be a
** number that does not occur in the original array */
Note: this is similar to High performance Mark, but at least I typed in the code ...
If you sort your array, you have three possibilities for a unique number:
array[999]+1, if array[999] is not equal to INT_MAX
array[0]-1, if array[0] is not equal to INT_MIN
a number between array[i] and array[i+1], if array[i+1]-array[i]>1 (0<=i<=998). Notice that if the two previous tries have failed, then it is guaranteed that there is a number between two elements in your array.
Notice that this solution will also work for the 1002th, 1003th, and so on.
An attempt at a clumsy c# implementation
public class Test
{
public List<int> Sequence { get; set; }
public void GenerateFirstSequence()
{
Sequence = new List<int>();
for (var i = 0; i < 1000; i++)
{
var x = new Random().Next(0, int.MaxValue);
while (Sequence.Contains(x))
{
x = new Random().Next(0, int.MaxValue);
}
Sequence.Add(x);
}
}
public int GetNumberNotInSequence()
{
var number = Sequence.OrderBy(x => x).Max();
var mustRedefine = number == int.MaxValue && Sequence.Contains(number);
if (mustRedefine)
{
while (Sequence.Contains(number))
{
number = number - 1;
if (!Sequence.Contains(number))
return number;
}
}
return number + 1;
}
}
I have some thoughts on this problem:
You could create a hash table H, which contain 1000 elements. Suppose your array named A, and for each element, we have the reminder by 1000: m[i] = A[i] % 1000.
If there is a conflict between A[i] and A[j], that A[i] % 1000 = A[j] % 1000. That is to say, there must exist an index k, that no element's reminder by 1000 equals to k, then k is the number you are going to get.
If there is no conflict at all, just pick H[1] + 1000 as your result.
The complexity of this algorithm is O(l), in which l indicates the original list size, in the example, l = 1000
How do you print numbers of form 2^i * 5^j in increasing order.
For eg:
1, 2, 4, 5, 8, 10, 16, 20
This is actually a very interesting question, especially if you don't want this to be N^2 or NlogN complexity.
What I would do is the following:
Define a data structure containing 2 values (i and j) and the result of the formula.
Define a collection (e.g. std::vector) containing this data structures
Initialize the collection with the value (0,0) (the result is 1 in this case)
Now in a loop do the following:
Look in the collection and take the instance with the smallest value
Remove it from the collection
Print this out
Create 2 new instances based on the instance you just processed
In the first instance increment i
In the second instance increment j
Add both instances to the collection (if they aren't in the collection yet)
Loop until you had enough of it
The performance can be easily tweaked by choosing the right data structure and collection.
E.g. in C++, you could use an std::map, where the key is the result of the formula, and the value is the pair (i,j). Taking the smallest value is then just taking the first instance in the map (*map.begin()).
I quickly wrote the following application to illustrate it (it works!, but contains no further comments, sorry):
#include <math.h>
#include <map>
#include <iostream>
typedef __int64 Integer;
typedef std::pair<Integer,Integer> MyPair;
typedef std::map<Integer,MyPair> MyMap;
Integer result(const MyPair &myPair)
{
return pow((double)2,(double)myPair.first) * pow((double)5,(double)myPair.second);
}
int main()
{
MyMap myMap;
MyPair firstValue(0,0);
myMap[result(firstValue)] = firstValue;
while (true)
{
auto it=myMap.begin();
if (it->first < 0) break; // overflow
MyPair myPair = it->second;
std::cout << it->first << "= 2^" << myPair.first << "*5^" << myPair.second << std::endl;
myMap.erase(it);
MyPair pair1 = myPair;
++pair1.first;
myMap[result(pair1)] = pair1;
MyPair pair2 = myPair;
++pair2.second;
myMap[result(pair2)] = pair2;
}
}
This is well suited to a functional programming style. In F#:
let min (a,b)= if(a<b)then a else b;;
type stream (current, next)=
member this.current = current
member this.next():stream = next();;
let rec merge(a:stream,b:stream)=
if(a.current<b.current) then new stream(a.current, fun()->merge(a.next(),b))
else new stream(b.current, fun()->merge(a,b.next()));;
let rec Squares(start) = new stream(start,fun()->Squares(start*2));;
let rec AllPowers(start) = new stream(start,fun()->merge(Squares(start*2),AllPowers(start*5)));;
let Results = AllPowers(1);;
Works well with Results then being a stream type with current value and a next method.
Walking through it:
I define min for completenes.
I define a stream type to have a current value and a method to return a new string, essentially head and tail of a stream of numbers.
I define the function merge, which takes the smaller of the current values of two streams and then increments that stream. It then recurses to provide the rest of the stream. Essentially, given two streams which are in order, it will produce a new stream which is in order.
I define squares to be a stream increasing in powers of 2.
AllPowers takes the start value and merges the stream resulting from all squares at this number of powers of 5. it with the stream resulting from multiplying it by 5, since these are your only two options. You effectively are left with a tree of results
The result is merging more and more streams, so you merge the following streams
1, 2, 4, 8, 16, 32...
5, 10, 20, 40, 80, 160...
25, 50, 100, 200, 400...
.
.
.
Merging all of these turns out to be fairly efficient with tail recursio and compiler optimisations etc.
These could be printed to the console like this:
let rec PrintAll(s:stream)=
if (s.current > 0) then
do System.Console.WriteLine(s.current)
PrintAll(s.next());;
PrintAll(Results);
let v = System.Console.ReadLine();
Similar things could be done in any language which allows for recursion and passing functions as values (it's only a little more complex if you can't pass functions as variables).
For an O(N) solution, you can use a list of numbers found so far and two indexes: one representing the next number to be multiplied by 2, and the other the next number to be multiplied by 5. Then in each iteration you have two candidate values to choose the smaller one from.
In Python:
numbers = [1]
next_2 = 0
next_5 = 0
for i in xrange(100):
mult_2 = numbers[next_2]*2
mult_5 = numbers[next_5]*5
if mult_2 < mult_5:
next = mult_2
next_2 += 1
else:
next = mult_5
next_5 += 1
# The comparison here is to avoid appending duplicates
if next > numbers[-1]:
numbers.append(next)
print numbers
So we have two loops, one incrementing i and second one incrementing j starting both from zero, right? (multiply symbol is confusing in the title of the question)
You can do something very straightforward:
Add all items in an array
Sort the array
Or you need an other solution with more math analysys?
EDIT: More smart solution by leveraging similarity with Merge Sort problem
If we imagine infinite set of numbers of 2^i and 5^j as two independent streams/lists this problem looks very the same as well known Merge Sort problem.
So solution steps are:
Get two numbers one from the each of streams (of 2 and of 5)
Compare
Return smallest
get next number from the stream of the previously returned smallest
and that's it! ;)
PS: Complexity of Merge Sort always is O(n*log(n))
I visualize this problem as a matrix M where M(i,j) = 2^i * 5^j. This means that both the rows and columns are increasing.
Think about drawing a line through the entries in increasing order, clearly beginning at entry (1,1). As you visit entries, the row and column increasing conditions ensure that the shape formed by those cells will always be an integer partition (in English notation). Keep track of this partition (mu = (m1, m2, m3, ...) where mi is the number of smaller entries in row i -- hence m1 >= m2 >= ...). Then the only entries that you need to compare are those entries which can be added to the partition.
Here's a crude example. Suppose you've visited all the xs (mu = (5,3,3,1)), then you need only check the #s:
x x x x x #
x x x #
x x x
x #
#
Therefore the number of checks is the number of addable cells (equivalently the number of ways to go up in Bruhat order if you're of a mind to think in terms of posets).
Given a partition mu, it's easy to determine what the addable states are. Image an infinite string of 0s following the last positive entry. Then you can increase mi by 1 if and only if m(i-1) > mi.
Back to the example, for mu = (5,3,3,1) we can increase m1 (6,3,3,1) or m2 (5,4,3,1) or m4 (5,3,3,2) or m5 (5,3,3,1,1).
The solution to the problem then finds the correct sequence of partitions (saturated chain). In pseudocode:
mu = [1,0,0,...,0];
while (/* some terminate condition or go on forever */) {
minNext = 0;
nextCell = [];
// look through all addable cells
for (int i=0; i<mu.length; ++i) {
if (i==0 or mu[i-1]>mu[i]) {
// check for new minimum value
if (minNext == 0 or 2^i * 5^(mu[i]+1) < minNext) {
nextCell = i;
minNext = 2^i * 5^(mu[i]+1)
}
}
}
// print next largest entry and update mu
print(minNext);
mu[i]++;
}
I wrote this in Maple stopping after 12 iterations:
1, 2, 4, 5, 8, 10, 16, 20, 25, 32, 40, 50
and the outputted sequence of cells added and got this:
1 2 3 5 7 10
4 6 8 11
9 12
corresponding to this matrix representation:
1, 2, 4, 8, 16, 32...
5, 10, 20, 40, 80, 160...
25, 50, 100, 200, 400...
First of all, (as others mentioned already) this question is very vague!!!
Nevertheless, I am going to give a shot based on your vague equation and the pattern as your expected result. So I am not sure the following will be true for what you are trying to do, however it may give you some idea about java collections!
import java.util.List;
import java.util.ArrayList;
import java.util.SortedSet;
import java.util.TreeSet;
public class IncreasingNumbers {
private static List<Integer> findIncreasingNumbers(int maxIteration) {
SortedSet<Integer> numbers = new TreeSet<Integer>();
SortedSet<Integer> numbers2 = new TreeSet<Integer>();
for (int i=0;i < maxIteration;i++) {
int n1 = (int)Math.pow(2, i);
numbers.add(n1);
for (int j=0;j < maxIteration;j++) {
int n2 = (int)Math.pow(5, i);
numbers.add(n2);
for (Integer n: numbers) {
int n3 = n*n1;
numbers2.add(n3);
}
}
}
numbers.addAll(numbers2);
return new ArrayList<Integer>(numbers);
}
/**
* Based on the following fuzzy question # StackOverflow
* http://stackoverflow.com/questions/7571934/printing-numbers-of-the-form-2i-5j-in-increasing-order
*
*
* Result:
* 1 2 4 5 8 10 16 20 25 32 40 64 80 100 125 128 200 256 400 625 1000 2000 10000
*/
public static void main(String[] args) {
List<Integer> numbers = findIncreasingNumbers(5);
for (Integer i: numbers) {
System.out.print(i + " ");
}
}
}
If you can do it in O(nlogn), here's a simple solution:
Get an empty min-heap
Put 1 in the heap
while (you want to continue)
Get num from heap
print num
put num*2 and num*5 in the heap
There you have it. By min-heap, I mean min-heap
As a mathematician the first thing I always think about when looking at something like this is "will logarithms help?".
In this case it might.
If our series A is increasing then the series log(A) is also increasing. Since all terms of A are of the form 2^i.5^j then all members of the series log(A) are of the form i.log(2) + j.log(5)
We can then look at the series log(A)/log(2) which is also increasing and its elements are of the form i+j.(log(5)/log(2))
If we work out the i and j that generates the full ordered list for this last series (call it B) then that i and j will also generate the series A correctly.
This is just changing the nature of the problem but hopefully to one where it becomes easier to solve. At each step you can either increase i and decrease j or vice versa.
Looking at a few of the early changes you can make (which I will possibly refer to as transforms of i,j or just transorms) gives us some clues of where we are going.
Clearly increasing i by 1 will increase B by 1. However, given that log(5)/log(2) is approx 2.3 then increasing j by 1 while decreasing i by 2 will given an increase of just 0.3 . The problem then is at each stage finding the minimum possible increase in B for changes of i and j.
To do this I just kept a record as I increased of the most efficient transforms of i and j (ie what to add and subtract from each) to get the smallest possible increase in the series. Then applied whichever one was valid (ie making sure i and j don't go negative).
Since at each stage you can either decrease i or decrease j there are effectively two classes of transforms that can be checked individually. A new transform doesn't have to have the best overall score to be included in our future checks, just better than any other in its class.
To test my thougths I wrote a sort of program in LinqPad. Key things to note are that the Dump() method just outputs the object to screen and that the syntax/structure isn't valid for a real c# file. Converting it if you want to run it should be easy though.
Hopefully anything not explicitly explained will be understandable from the code.
void Main()
{
double C = Math.Log(5)/Math.Log(2);
int i = 0;
int j = 0;
int maxi = i;
int maxj = j;
List<int> outputList = new List<int>();
List<Transform> transforms = new List<Transform>();
outputList.Add(1);
while (outputList.Count<500)
{
Transform tr;
if (i==maxi)
{
//We haven't considered i this big before. Lets see if we can find an efficient transform by getting this many i and taking away some j.
maxi++;
tr = new Transform(maxi, (int)(-(maxi-maxi%C)/C), maxi%C);
AddIfWorthwhile(transforms, tr);
}
if (j==maxj)
{
//We haven't considered j this big before. Lets see if we can find an efficient transform by getting this many j and taking away some i.
maxj++;
tr = new Transform((int)(-(maxj*C)), maxj, (maxj*C)%1);
AddIfWorthwhile(transforms, tr);
}
//We have a set of transforms. We first find ones that are valid then order them by score and take the first (smallest) one.
Transform bestTransform = transforms.Where(x=>x.I>=-i && x.J >=-j).OrderBy(x=>x.Score).First();
//Apply transform
i+=bestTransform.I;
j+=bestTransform.J;
//output the next number in out list.
int value = GetValue(i,j);
//This line just gets it to stop when it overflows. I would have expected an exception but maybe LinqPad does magic with them?
if (value<0) break;
outputList.Add(value);
}
outputList.Dump();
}
public int GetValue(int i, int j)
{
return (int)(Math.Pow(2,i)*Math.Pow(5,j));
}
public void AddIfWorthwhile(List<Transform> list, Transform tr)
{
if (list.Where(x=>(x.Score<tr.Score && x.IncreaseI == tr.IncreaseI)).Count()==0)
{
list.Add(tr);
}
}
// Define other methods and classes here
public class Transform
{
public int I;
public int J;
public double Score;
public bool IncreaseI
{
get {return I>0;}
}
public Transform(int i, int j, double score)
{
I=i;
J=j;
Score=score;
}
}
I've not bothered looking at the efficiency of this but I strongly suspect its better than some other solutions because at each stage all I need to do is check my set of transforms - working out how many of these there are compared to "n" is non-trivial. It is clearly related since the further you go the more transforms there are but the number of new transforms becomes vanishingly small at higher numbers so maybe its just O(1). This O stuff always confused me though. ;-)
One advantage over other solutions is that it allows you to calculate i,j without needing to calculate the product allowing me to work out what the sequence would be without needing to calculate the actual number itself.
For what its worth after the first 230 nunmbers (when int runs out of space) I had 9 transforms to check each time. And given its only my total that overflowed I ran if for the first million results and got to i=5191 and j=354. The number of transforms was 23. The size of this number in the list is approximately 10^1810. Runtime to get to this level was approx 5 seconds.
P.S. If you like this answer please feel free to tell your friends since I spent ages on this and a few +1s would be nice compensation. Or in fact just comment to tell me what you think. :)
I'm sure everyone one's might have got the answer by now, but just wanted to give a direction to this solution..
It's a Ctrl C + Ctrl V from
http://www.careercup.com/question?id=16378662
void print(int N)
{
int arr[N];
arr[0] = 1;
int i = 0, j = 0, k = 1;
int numJ, numI;
int num;
for(int count = 1; count < N; )
{
numI = arr[i] * 2;
numJ = arr[j] * 5;
if(numI < numJ)
{
num = numI;
i++;
}
else
{
num = numJ;
j++;
}
if(num > arr[k-1])
{
arr[k] = num;
k++;
count++;
}
}
for(int counter = 0; counter < N; counter++)
{
printf("%d ", arr[counter]);
}
}
The question as put to me was to return an infinite set of solutions. I pondered the use of trees, but felt there was a problem with figuring out when to harvest and prune the tree, given an infinite number of values for i & j. I realized that a sieve algorithm could be used. Starting from zero, determine whether each positive integer had values for i and j. This was facilitated by turning answer = (2^i)*(2^j) around and solving for i instead. That gave me i = log2 (answer/ (5^j)). Here is the code:
class Program
{
static void Main(string[] args)
{
var startTime = DateTime.Now;
int potential = 0;
do
{
if (ExistsIandJ(potential))
Console.WriteLine("{0}", potential);
potential++;
} while (potential < 100000);
Console.WriteLine("Took {0} seconds", DateTime.Now.Subtract(startTime).TotalSeconds);
}
private static bool ExistsIandJ(int potential)
{
// potential = (2^i)*(5^j)
// 1 = (2^i)*(5^j)/potential
// 1/(2^1) = (5^j)/potential or (2^i) = potential / (5^j)
// i = log2 (potential / (5^j))
for (var j = 0; Math.Pow(5,j) <= potential; j++)
{
var i = Math.Log(potential / Math.Pow(5, j), 2);
if (i == Math.Truncate(i))
return true;
}
return false;
}
}
I came across this problem during an interview forum.,
Given an int array which might contain duplicates, find the largest subset of it which form a sequence.
Eg. {1,6,10,4,7,9,5}
then ans is 4,5,6,7
Sorting is an obvious solution. Can this be done in O(n) time.
My take on the problem is that this cannot be done O(n) time & the reason is that if we could do this in O(n) time we could do sorting in O(n) time also ( without knowing the upper bound).
As a random array can contain all the elements in sequence but in random order.
Does this sound a plausible explanation ? your thoughts.
I believe it can be solved in O(n) if you assume you have enough memory to allocate an uninitialized array of a size equal to the largest value, and that allocation can be done in constant time. The trick is to use a lazy array, which gives you the ability to create a set of items in linear time with a membership test in constant time.
Phase 1: Go through each item and add it to the lazy array.
Phase 2: Go through each undeleted item, and delete all contiguous items.
In phase 2, you determine the range and remember it if it is the largest so far. Items can be deleted in constant time using a doubly-linked list.
Here is some incredibly kludgy code that demonstrates the idea:
int main(int argc,char **argv)
{
static const int n = 8;
int values[n] = {1,6,10,4,7,9,5,5};
int index[n];
int lists[n];
int prev[n];
int next_existing[n]; //
int prev_existing[n];
int index_size = 0;
int n_lists = 0;
// Find largest value
int max_value = 0;
for (int i=0; i!=n; ++i) {
int v=values[i];
if (v>max_value) max_value=v;
}
// Allocate a lazy array
int *lazy = (int *)malloc((max_value+1)*sizeof(int));
// Set items in the lazy array and build the lists of indices for
// items with a particular value.
for (int i=0; i!=n; ++i) {
next_existing[i] = i+1;
prev_existing[i] = i-1;
int v = values[i];
int l = lazy[v];
if (l>=0 && l<index_size && index[l]==v) {
// already there, add it to the list
prev[n_lists] = lists[l];
lists[l] = n_lists++;
}
else {
// not there -- create a new list
l = index_size;
lazy[v] = l;
index[l] = v;
++index_size;
prev[n_lists] = -1;
lists[l] = n_lists++;
}
}
// Go through each contiguous range of values and delete them, determining
// what the range is.
int max_count = 0;
int max_begin = -1;
int max_end = -1;
int i = 0;
while (i<n) {
// Start by searching backwards for a value that isn't in the lazy array
int dir = -1;
int v_mid = values[i];
int v = v_mid;
int begin = -1;
for (;;) {
int l = lazy[v];
if (l<0 || l>=index_size || index[l]!=v) {
// Value not in the lazy array
if (dir==1) {
// Hit the end
if (v-begin>max_count) {
max_count = v-begin;
max_begin = begin;
max_end = v;
}
break;
}
// Hit the beginning
begin = v+1;
dir = 1;
v = v_mid+1;
}
else {
// Remove all the items with value v
int k = lists[l];
while (k>=0) {
if (k!=i) {
next_existing[prev_existing[l]] = next_existing[l];
prev_existing[next_existing[l]] = prev_existing[l];
}
k = prev[k];
}
v += dir;
}
}
// Go to the next existing item
i = next_existing[i];
}
// Print the largest range
for (int i=max_begin; i!=max_end; ++i) {
if (i!=max_begin) fprintf(stderr,",");
fprintf(stderr,"%d",i);
}
fprintf(stderr,"\n");
free(lazy);
}
I would say there are ways to do it. The algorithm is the one you already describe, but just use a O(n) sorting algorithm. As such exist for certain inputs (Bucket Sort, Radix Sort) this works (this also goes hand in hand with your argumentation why it should not work).
Vaughn Cato suggested implementation is working like this (its working like a bucket sort with the lazy array working as buckets-on-demand).
As shown by M. Ben-Or in Lower bounds for algebraic computation trees, Proc. 15th ACM Sympos. Theory Comput., pp. 80-86. 1983 cited by J. Erickson in pdf Finding Longest Arithmetic Progressions, this problem cannot be solved in less than O(n log n) time (even if the input is already sorted into order) when using an algebraic decision tree model of computation.
Earlier, I posted the following example in a comment to illustrate that sorting the numbers does not provide an easy answer to the question: Suppose the array is given already sorted into ascending order. For example, let it be (20 30 35 40 47 60 70 80 85 95 100). The longest sequence found in any subsequence of the input is 20,40,60,80,100 rather than 30,35,40 or 60,70,80.
Regarding whether an O(n) algebraic decision tree solution to this problem would provide an O(n) algebraic decision tree sorting method: As others have pointed out, a solution to this subsequence problem for a given multiset does not provide a solution to a sorting problem for that multiset. As an example, consider set {2,4,6,x,y,z}. The subsequence solver will give you the result (2,4,6) whenever x,y,z are large numbers not in arithmetic sequence, and it will tell you nothing about the order of x,y,z.
What about this? populate a hash-table so each value stores the start of the range seen so far for that number, except for the head element that stores the end of the range. O(n) time, O(n) space. A tentative Python implementation (you could do it with one traversal keeping some state variables, but this way seems more clear):
def longest_subset(xs):
table = {}
for x in xs:
start = table.get(x-1, x)
end = table.get(x+1, x)
if x+1 in table:
table[end] = start
if x-1 in table:
table[start] = end
table[x] = (start if x-1 in table else end)
start, end = max(table.items(), key=lambda pair: pair[1]-pair[0])
return list(range(start, end+1))
print(longest_subset([1, 6, 10, 4, 7, 9, 5]))
# [4, 5, 6, 7]
here is a un-optimized O(n) implementation, maybe you will find it useful:
hash_tb={}
A=[1,6,10,4,7,9,5]
for i in range(0,len(A)):
if not hash_tb.has_key(A[i]):
hash_tb[A[i]]=A[i]
max_sq=[];cur_seq=[]
for i in range(0,max(A)):
if hash_tb.has_key(i):
cur_seq.append(i)
else:
if len(cur_seq)>len(max_sq):
max_sq=cur_seq
cur_seq=[]
print max_sq
I got answer for the question, counting number of sets bits from here.
How to count the number of set bits in a 32-bit integer?
long count_bits(long n) {
unsigned int c; // c accumulates the total bits set in v
for (c = 0; n; c++)
n &= n - 1; // clear the least significant bit set
return c;
}
It is simple to understand also. And found the best answer as Brian Kernighans method, posted by hoyhoy... and he adds the following at the end.
Note that this is an question used during interviews. The interviewer will add the caveat that you have "infinite memory". In that case, you basically create an array of size 232 and fill in the bit counts for the numbers at each location. Then, this function becomes O(1).
Can somebody explain how to do this ? If i have infinite memory ...
The fastest way I have ever seen to populate such an array is ...
array[0] = 0;
for (i = 1; i < NELEMENTS; i++) {
array[i] = array[i >> 1] + (i & 1);
}
Then to count the number of set bits in a given number (provided the given number is less than NELEMENTS) ...
numSetBits = array[givenNumber];
If your memory is not finite, I often see NELEMENTS set to 256 (for one byte's worth) and add the number of set bits in each byte in your integer.
int counts[MAX_LONG];
void init() {
for (int i= 0; i < MAX_LONG; i++)
{
counts[i] = count_bits[i]; // as given
}
}
int count_bits_o1(long number)
{
return counts[number];
}
You can probably pre-populate the array more wiseley, i.e. fill with zeros, then every second index add one, then every fourth index add 1, then every eighth index add 1 etc, which might be a bit faster, although I doubt it...
Also, you might account for unsigned values.