hash algorithm for variable size boolean array [closed] - algorithm

Closed 9 years ago.
I have some boolean arrays whose sizes are not constant, and I need a strong and fast hash algorithm that gives a minimal chance of hash collision for them.
My own idea was calculating the integer value of each boolean array, but for example these 2 arrays will give the same hash of 3:
[0 , 1, 1] and [1, 1]
I thought of multiplying by the size of the array after calculating the integer value, but this idea is also bad, because there is still a high chance of hash collision.
Does anyone have a good idea?

You can insert a sentinel true element at the start of the array, then interpret the array as a binary number. This is a perfect hash (no collisions) for arrays with fewer than 32 elements. For larger arrays I suggest doing the arithmetic modulo a large prime less than 2^31.
Examples:
Array | Binary | Decimal
------------+--------+---------
[ 0, 1, 1 ] | 1011 | 11
[ 1, 1 ] | 111 | 7
This is the same as interpreting the array as a binary number, and then taking the bitwise OR with 1 << n where n is the size of the array.
Implementation:
int hash(int[] array)
{
    int h = 1;
    for (int i = 0; i < array.length; i++)
    {
        h = (h << 1) | array[i];
    }
    return h;
}
Note: This implementation only works well for arrays with fewer than 32 elements, because for larger arrays the calculation will overflow (assuming int is 32 bits) and the most significant bits will be completely discarded. This can be fixed by inserting h = h % ((1 << 31) - 1); before the end of the for-loop (the expression "(1 << 31) - 1" computes 2^31 - 1, which is prime). To keep the shift itself from overflowing, h should then be held in a 64-bit long.
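As a rough illustration of the sentinel idea, here is a minimal Python sketch. The function name sentinel_hash and the optional modulus parameter are assumptions made for this example, not part of the answer above. Python integers do not overflow, so the modulus is only needed when a bounded hash value is wanted.

```python
MERSENNE_PRIME = (1 << 31) - 1  # 2^31 - 1, which is prime


def sentinel_hash(bits, modulus=None):
    """Hash a sequence of booleans by prefixing a sentinel 1 bit.

    With modulus=None the result is collision-free, because the sentinel
    encodes the length. With a modulus the result is bounded, but
    collisions become possible for long inputs.
    """
    h = 1  # the sentinel bit
    for bit in bits:
        h = (h << 1) | (1 if bit else 0)
        if modulus is not None:
            h %= modulus
    return h


print(sentinel_hash([0, 1, 1]))  # binary 1011
print(sentinel_hash([1, 1]))     # binary 111
print(sentinel_hash([1] * 100, MERSENNE_PRIME))
```

The two short arrays from the question now hash to different values, since the leading zero is no longer lost.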

My ideas:
Approach #1:
Calculate the first 2n prime numbers, where n is the length of the array.
Let hash = 1.
For i = 0 to n - 1: if the bit at position i is 1, multiply hash by the (2i)-th and the (2i + 1)-th prime. If it's 0, multiply it by the (2i)-th one only.
Approach #2:
Treat the binary arrays as ternary. Bit is 0 => ternary digit is 0; bit is 1 => ternary digit is 1; bit is not present => ternary digit is 2 (this works because the array has a maximal possible length).
Calculate the ternary number using this substitution - the result will be unique.
Here's some code showing the implementation of these algorithms in C++ and a test program which generates hashes for every boolean array of length 0...18. I use the C++11 class std::unordered_set so that each hash is stored only once. Thus, if we don't have any duplicates (i.e. if the hash function is perfect), we should get 2^19 - 1 elements in the set, which we do (I had to change the integers to unsigned long long on IDEone, else the hashes weren't perfect - I suspect this has to do with 32 vs. 64 bit architectures):
#include <unordered_set>
#include <iostream>
#define MAX_LEN 18
unsigned long prime_hash(const unsigned int *arr, size_t len)
{
/* first 2 * MAX_LEN primes */
static const unsigned long p[2 * MAX_LEN] = {
2, 3, 5, 7, 11, 13, 17, 19, 23,
29, 31, 37, 41, 43, 47, 53, 59, 61,
67, 71, 73, 79, 83, 89, 97, 101, 103,
107, 109, 113, 127, 131, 137, 139, 149, 151
};
unsigned long h = 1;
for (size_t i = 0; i < len; i++)
h *= p[2 * i] * (arr[i] ? p[2 * i + 1] : 1);
return h;
}
unsigned long ternary_hash(const unsigned int *arr, size_t len)
{
static const unsigned long p3[MAX_LEN] = {
1, 3, 9, 27,
81, 243, 729, 2187,
6561, 19683, 59049, 177147,
531441, 1594323, 4782969, 14348907,
43046721, 129140163
};
unsigned long h = 0;
for (size_t i = 0; i < len; i++)
if (arr[i])
h += p3[i];
for (size_t i = len; i < MAX_LEN; i++)
h += 2 * p3[i];
return h;
}
void int2barr(unsigned int *dst, unsigned long n, size_t len)
{
for (size_t i = 0; i < len; i++) {
dst[i] = n & 1;
n >>= 1;
}
}
int main()
{
std::unordered_set<unsigned long> phashes, thashes;
/* generate all possible bool-arrays from length 0 to length 18 */
/* first, we checksum the only 0-element array */
phashes.insert(prime_hash(NULL, 0));
thashes.insert(ternary_hash(NULL, 0));
/* then we checksum the arrays of length 1...18 */
for (size_t len = 1; len <= MAX_LEN; len++) {
unsigned int bits[len];
for (unsigned long i = 0; i < (1 << len); i++) {
int2barr(bits, i, len);
phashes.insert(prime_hash(bits, len));
thashes.insert(ternary_hash(bits, len));
}
}
std::cout << "prime hashes: " << phashes.size() << std::endl;
std::cout << "ternary hashes: " << thashes.size() << std::endl;
return 0;
}

A simple and efficient hash code replaces 0 and 1 with prime numbers and runs the usual shift-accumulator loop:
hash = 0
for bit in bits:
    hash = hash*31 + 2*bit + 3
return hash
Here 0 is treated as 3 and 1 is treated as 5, so that leading zeros are not ignored. The multiplication by 31 makes sure that order matters. This isn't cryptographically strong though: given a hash code for a short sequence it's simple arithmetic to reverse it.
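A runnable version of that loop might look like the following Python sketch. The name poly_hash and the 32-bit mask are assumptions added here to keep the value bounded, which the pseudocode leaves to the host language's integer overflow.

```python
def poly_hash(bits, mask=0xFFFFFFFF):
    """Multiply-by-31 accumulator where 0 contributes 3 and 1 contributes 5."""
    h = 0
    for bit in bits:
        # 2*bit + 3 maps False/0 to 3 and True/1 to 5
        h = (h * 31 + 2 * (1 if bit else 0) + 3) & mask
    return h


print(poly_hash([0, 1, 1]))
print(poly_hash([1, 1]))
```

Because every element contributes a nonzero term, [0, 1, 1] and [1, 1] no longer collide. Unlike the sentinel scheme, this is an ordinary hash: collisions remain possible once the value is truncated by the mask.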

Related

find number that does not repeat in O(n) time O(1) space

for starters, I did have a look at these questions:
Given an array of integers where some numbers repeat 1 time, some numbers repeat 2 times and only one number repeats 3 times, how do you find the number that repeat 3 times
Algorithm to find two repeated numbers in an array, without sorting
This one is different:
Given an unsorted array of integers with one unique number, where all the other numbers repeat 3 times,
i.e.:
{4,5,3, 5,3,4, 1, 4,3,5 }
we need to find this unique number in O(n) time and O(1) space.
NOTE: this is not homework, just a nice question I came across.
What about this one:
Idea: do bitwise addition mod 3
#include <stdio.h>
int main() {
    int a[] = { 1, 9, 9, 556, 556, 9, 556, 87878, 87878, 87878 };
    int n = sizeof(a) / sizeof(int);
    int low = 0, up = 0;
    for(int i = 0; i < n; i++) {
        int x = ~(up & a[i]);
        up &= x;
        x &= a[i];
        up |= (x & low);
        low ^= x;
    }
    printf("single no: %d\n", low);
}
This solution works for all inputs.
The idea is to extract the bits of each integer in the array and add them to the corresponding slot of a 32-bit
bitmap 'b' (implemented as a 32-element array, one counter per bit of a 32-bit number). The example below handles numbers repeating exactly 2 times; for numbers repeating 3 times, reduce modulo 3 instead.
#include <stdio.h>
unsigned int a[7] = {5,5,4,10,4,9,9};
unsigned int b[32] = {0}; // Start with zeros for a 32-bit number.
int main(void) {
    int i, j;
    unsigned int bit, sum = 0;
    for (i = 0; i < 7; i++) {
        for (j = 0; j < 32; j++) { // This loop can be optimized!
            bit = (a[i] >> j) & 1u; // extract bit j
            b[j] += bit; // add to the bitmap array
        }
    }
    for (j = 0; j < 32; j++) {
        b[j] %= 2; // Numbers repeating exactly 2 times cancel out.
        if (b[j] == 1) {
            sum += 1u << j; // sum all the bits left as 1 to get the number
        }
    }
    printf("no. is %u\n", sum);
    return 0;
}
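The per-bit counting idea generalizes to any repeat count. Here is a small Python sketch of it; the function name find_unique and its repeat and width parameters are assumptions for illustration, and it assumes non-negative integers that fit in width bits.

```python
def find_unique(nums, repeat=3, width=32):
    """Return the value whose occurrence count is not a multiple of `repeat`.

    For each bit position, count how many inputs have that bit set.
    Values that occur `repeat` times contribute a multiple of `repeat`,
    so whatever is left over belongs to the unique value.
    """
    result = 0
    for j in range(width):
        count = sum((x >> j) & 1 for x in nums)
        if count % repeat:
            result |= 1 << j
    return result


print(find_unique([4, 5, 3, 5, 3, 4, 1, 4, 3, 5]))       # array from the question
print(find_unique([5, 5, 4, 10, 4, 9, 9], repeat=2))     # array from the C example
```

This uses a fixed number of counters regardless of input length, so it stays within O(n) time and O(1) extra space for a fixed bit width.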

minimal positive number divisible to N

1<=N<=1000
How can I find the minimal positive number that is divisible by N and whose digit sum is equal to N?
For example:
N:Result
1:1
10:190
And the algorithm shouldn't take more than 2 seconds.
Any ideas (pseudocode, Pascal, C++ or Java)?
Let f(len, sum, mod) be a bool meaning that we can build a number (maybe with leading zeros) that has length len+1, a digit sum equal to sum, and remainder mod when divided by n.
Then f(len, sum, mod) = or(f(len-1, sum-i, mod - i*10^len) for i from 0 to 9). Then you can find the minimal l such that f(l, n, n) is true. After that, just find the first digit, then the second, and so on.
#include <iostream>
#include <string>
#include <cstring>
using namespace std;
#define FOR(i, a, b) for(int i = a; i < b; ++i)
#define REP(i, N) FOR(i, 0, N)
#define FILL(a,v) memset(a,v,sizeof(a))
const int maxlen = 120;
const int maxn = 1000;
int st[maxlen];
int n;
bool can[maxlen][maxn+1][maxn+1];
bool was[maxlen][maxn+1][maxn+1];
bool f(int l, int s, int m)
{
m = m%n;
if(m<0)
m += n;
if(s == 0)
return (m == 0);
if(s<0)
return false;
if(l<0)
return false;
if(was[l][s][m])
return can[l][s][m];
was[l][s][m] = true;
can[l][s][m] = false;
REP(i,10)
if(f(l-1, s-i, m - st[l]*i))
{
can[l][s][m] = true;
return true;
}
return false;
}
string build(int l, int s, int m)
{
if(l<0)
return "";
m = m%n;
if(m<0)
m += n;
REP(i,10)
if(f(l-1, s-i, m - st[l]*i))
{
return char('0'+i) + build(l-1, s-i, m - st[l]*i);
}
return "";
}
int main(int argc, char** argv)
{
ios_base::sync_with_stdio(false);
cin>>n;
FILL(was, false);
st[0] = 1;
FOR(i, 1, maxlen)
st[i] = (st[i-1]*10)%n;
int l = -1;
REP(i, maxlen)
if(f(i, n, n))
{
cout<<build(i,n,n)<<endl;
break;
}
return 0;
}
NOTE that this uses ~250 MB of memory.
EDIT: I found a test where this solution runs a bit too long: 999 takes almost 5s.
Update: I understood that the result was supposed to be between 0 and 1000, but no. With larger inputs the naïve algorithm can take a considerable amount of time. The output for 80 would be 29999998880.
You don't need a fancy algorithm. A loop that checks your condition for 1000 numbers will take less than 2 seconds on any reasonably modern computer, even in interpreted languages.
If you want to make it smart, you only need to check numbers that are multiples of N. To further restrict the search space, the remainders of N and the result have to be equal when divided by 9. This means that now you have to check only one number in every 9N.
Sure, pseudo-code, since it smells like homework :-)
def findNum (n):
    testnum = n
    while testnum <= 1000:
        tempnum = testnum
        sum = 0
        while tempnum > 0:
            sum = sum + (tempnum mod 10)
            tempnum = int (tempnum / 10)
        if sum == n:
            return testnum
        testnum = testnum + n
    return -1
It takes about 15 thousandths of a second when translated to Python so well under your two-second threshold. It works by basically testing every multiple of N less than or equal to 1000.
The test runs through each digit in the number adding it to a sum then, if that sum matches N, it returns the number. If no number qualifies, it returns -1.
As test cases, I used:
n findNum(n) Justification
== ========== =============
1 1 1 = 1 * 1, 1 = 1
10 190 190 = 19 * 10, 10 = 1 + 9 + 0
13 247 247 = 13 * 19, 13 = 2 + 4 + 7
17 476 476 = 17 * 28, 17 = 4 + 7 + 6
99 -1 none needed
Now that only checks multiples up to 1000 as opposed to checking all numbers but checking all numbers starts to take much more than two seconds, no matter what language you use. You may be able to find a faster algorithm but I'd like to suggest something else.
You will probably not find a faster algorithm than what it would take to simply look up the values in a table. So, I would simply run a program once to generate output along the lines of:
int numberDesired[] = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
190, 209, 48, 247, 266, 195, 448, 476, 198, 874,
...
-1, -1};
and then just plug that into a new program so that it can use a blindingly fast lookup.
For example, you could do that with some Python like:
print("int numberDesired[] = {")
for i in range(0, 10):
    s = " /* %4d-%4d */" % (i*10, i*10+9)
    for j in range(0, 10):
        s = "%s %d," % (s, findNum(i*10+j))
    print(s)
print("};")
which generates:
int numberDesired[] = {
/* 0- 9 */ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
/* 10- 19 */ 190, 209, 48, 247, 266, 195, 448, 476, 198, 874,
/* 20- 29 */ 3980, 399, 2398, 1679, 888, 4975, 1898, 999, 7588, 4988,
/* 30- 39 */ 39990, 8959, 17888, 42999, 28798, 57995, 29988, 37999, 59888, 49998,
/* 40- 49 */ 699880, 177899, 88998, 99889, 479996, 499995, 589996, 686999, 699888, 788998,
/* 50- 59 */ 9999950, 889899, 1989988, 2989889, 1999998, 60989995, 7979888, 5899899, 8988898, 8888999,
/* 60- 69 */ 79999980, 9998998, 19999898, 19899999, 59989888, 69999995, 67999998, 58999999, 99899888, 79899999,
:
};
That will take a lot longer than two seconds, but here's the thing: you only have to run it once, then cut and paste the table into your code. Once you have the table, it will most likely blow away any algorithmic solution.
The maximum digit sum you have to worry about is 1000. Since 1000 / 9 ≈ 111, this is actually not a lot, so I think the following should work:
Consider the following data structure:
entry { int r, sum, prev, lastDigit; }
Hold a queue of entry where initially you have r = 1 mod N, sum = 1, prev = -1, lastDigit = 1; r = 2 mod N, sum = 2, prev = -1, lastDigit = 2 etc.
When you extract an entry x from the queue:
for i = 0 to 9 do
    y = new entry
    y.r = (x.r * 10 + i) % N
    y.sum = x.sum + i
    y.prev = <position of x in the queue>
    y.lastDigit = i
    if y.r == 0 and y.sum == N
        // you found your multiple: use the prev and lastDigit entries to rebuild it
    else if y.sum <= N then
        // keep sum == N too, so that trailing zeros can still be appended (e.g. 19 -> 190)
        queue.add(y)
This is basically a BFS on the digits. Since the maximum sum you care about is small, this should be pretty efficient.
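To make the BFS concrete, here is a Python sketch under a few assumptions: the function name is made up for this example, states are (digit sum, remainder) pairs, and each state is visited only once, which is an addition to the queue description above. States whose digit sum already equals N are kept so that trailing zeros can still be appended.

```python
from collections import deque


def smallest_multiple_with_digit_sum(n):
    """Smallest positive multiple of n whose digit sum is n, or None."""
    parent = {}      # state -> (previous state or None, digit appended)
    queue = deque()
    for d in range(1, 10):          # first digit must be nonzero
        if d > n:
            break
        state = (d, d % n)
        if state not in parent:
            parent[state] = (None, d)
            queue.append(state)
    target = (n, 0)
    while queue:
        state = queue.popleft()
        if state == target:
            digits = []
            while state is not None:    # walk back to the first digit
                state, d = parent[state]
                digits.append(str(d))
            return int("".join(reversed(digits)))
        s, r = state
        for d in range(10):             # smaller digits first
            if s + d > n:
                break
            nxt = (s + d, (r * 10 + d) % n)
            if nxt not in parent:
                parent[nxt] = (state, d)
                queue.append(nxt)
    return None


for n in (1, 10, 12, 13):
    print(n, smallest_multiple_with_digit_sum(n))
```

Because the queue is processed level by level and digits are tried in increasing order, the first time a state is reached it is reached by the numerically smallest prefix, so the first hit on (N, 0) is the minimal number. There are at most (N + 1) * N states.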
After thinking about it a bit, I think I have found the expected answer.
Think of it as a graph. For any number, you can make new number by multiplying that number by 10 and adding any of the digits 0-9. You will need to use BFS to reach the smallest number first.
For every node maintain sum and remainder. Using these values you can move to the adjacent nodes, also these values will help you avoid reaching useless states again and again. To print the number, you can use these values to trace your steps.
Complexity is O(n^2), in worst case table is completely filled. (See code)
Note : Code takes number of test cases first. Works under 0.3s for n<=1000.
[Edit]: Accepted on SPOJ in 6.54s. The test files have 50 numbers.
#include<cstdio>
#include<queue>
#include<cstring>
using namespace std;
#define F first
#define S second
#define N 1100
#define mp make_pair
queue<pair<int, int> >Q;
short sumTrace[N][N], mulTrace[N][N];
void print(int sum, int mul) {
if (sumTrace[sum][mul] == 42)return;
print(sum-sumTrace[sum][mul], mulTrace[sum][mul]);
printf("%d",sumTrace[sum][mul]);
}
void solve(int n) {
Q.push(mp(0,0));
sumTrace[0][0]=42; // any number greater than 9
while (1) {
int sum = Q.front().F;
int mul = Q.front().S;
if (sum == n && mul == 0) break;
Q.pop();
for (int i=0; i<10; i++) {
int nsum = sum+i;
if (nsum > n)break;
int nmul = (mul*10+i)%n;
if (sumTrace[nsum][nmul] == -1) {
Q.push(mp(nsum, nmul));
sumTrace[nsum][nmul] = i;
mulTrace[nsum][nmul] = mul;
}
}
}
print(n,0);
while(!Q.empty())Q.pop();
}
int main() {
int t;
scanf("%d", &t);
while (t--) {
int n;
scanf("%d", &n);
memset(sumTrace, -1, sizeof sumTrace);
solve(n);
printf("\n");
}
return 0;
}

Better Solution to Project Euler #36?

Project Euler problem 36 states:
The decimal number, 585 = 1001001001 (binary), is palindromic in both bases.
Find the sum of all numbers, less than one million, which are palindromic in base 10 and base 2.
(Please note that the palindromic number, in either base, may not include leading zeros.)
There is already a solution to this on stack overflow, but I want a more efficient solution.
For example, since the palindrome cannot have leading 0's, no even numbers need to be checked, only odd numbers for which the last bit in binary is a 1. This simple observation already speeds up the brute force "check every number in the range" by a factor of 2.
But I would like to be more clever than that. Ideally, I would like an algorithm with running time proportional to the number of numbers in the sum. I don't think it's possible to do better than that. But maybe that is not possible. Could we for example, generate all palindromic decimal numbers less than one million in time proportional to the number of decimal numbers satisfying that property? (I think the answer is yes).
What is the most efficient algorithm to solve this palindrome sum problem? I would like to consider run-times parameterized by N: the size of the range of numbers (in this case 1 million), D: the set of decimal palindromes in the range, and B: the set of binary palindromes in the range. I hope for a run-time that is o(N) + O( |D intersect B| ), or failing that, O(min(|D|, |B|))
Note: The sequences of binary and decimal palindromes are well known.
e.g. binary palindromes < 100: 0, 1, 3, 5, 7, 9, 15, 17, 21, 27, 31, 33, 45, 51, 63, 65, 73, 85, 93, 99
. . .decimal palindromes < 100:0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 22, 33, 44, 55, 66, 77, 88, 99,
palindromes in both bases: 0, 1, 3, 5, 7, 9, 33, 99
The binary representations of 33 and 99 are 10001 and 1100011 respectively.
The next number which is a palindrome in both is 585 = 1001001001.
The number of palindromes in base b of length 2*k is (b-1)*b^(k-1), as is the number of palindromes of length 2*k-1. So the number of palindromes not exceeding N in any base is O(sqrt(N))¹. So if you generate all palindromes (not exceeding N) in one base and check if they are also palindromes in the other base, you have an O(sqrt(N)*log(N)) algorithm (the log factor comes from the palindrome check). That's o(N), but I don't know yet if it's also O(|D intersect B|).
It's not O(|D intersect B|) :( There are only 32 numbers up to 10^10 which are palindromic in both bases. I don't see any pattern that would allow constructing only those.
¹ If N has d digits (in base b), the number of palindromes not exceeding N is between the number of palindromes having at most d-1 digits and the number of palindromes having at most d digits (both limits inclusive). There are (b-1)*b^(k-1) numbers having exactly k digits (in base b), of which (b-1)*b^(floor((k-1)/2)) are palindromes. Summing gives the number of base-b palindromes with at most k digits as either 2*(b^(k/2)-1) (if k is even) or (b-1)*b^((k-1)/2) + 2*(b^((k-1)/2)-1) (if k is odd). Hence, give or take a factor of 2*b, the number of palindromes with at most d digits is b^(d/2). Thus the number of palindromes not exceeding N is roughly N^0.5, with a factor bounded by a multiple of the base considered.
Consider that there are only about 2000 decimal palindromes between 1 and 1000000. From 1 to 999, you can string the number and its reverse together, both with and without duplicating the "middle" digit (the last digit of the left part). For each of those, you check whether it's also a binary palindrome, and if it is, it's part of the sum.
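A short Python sketch of this "build the palindrome from its left half" approach follows. The helper names decimal_palindromes and double_base_sum are assumptions for this example.

```python
def decimal_palindromes(limit):
    """Yield every positive decimal palindrome below `limit`."""
    half = 1
    while True:
        s = str(half)
        odd = int(s + s[-2::-1])   # mirror without repeating the middle digit
        even = int(s + s[::-1])    # mirror including the last digit
        if odd >= limit:           # odd-length values grow with `half`
            break
        yield odd
        if even < limit:
            yield even
        half += 1


def double_base_sum(limit):
    """Sum of numbers below `limit` that are palindromic in base 10 and base 2."""
    total = 0
    for n in decimal_palindromes(limit):
        b = bin(n)[2:]
        if b == b[::-1]:
            total += n
    return total


print(double_base_sum(1000000))
```

Only about 2000 candidates are generated for a limit of one million, and each needs a single binary palindrome check.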
(not an answer to your question but a cute recursive bit-fiddling solution to Project Euler 36)
This may not be the most efficient algorithm, but I like how it looks. I wrote it after reading Daniel Fischer's answer, which suggests generating all palindromes in one base and then checking whether each one is a palindrome in the other base too.
In 18 lines of code (including the brackets) it generates all the palindromes in base 2 and then checks if they're also palindrome in base 10.
Takes about 6 ms on my system.
This can probably be optimized (too many bit-shifting operations for my taste, and there's probably quite some unnecessary junk here) and there may be a better algorithm, but I "like" (+) the look of my code ; )
@Test
public void testProjectEuler36() {
final int v = rec(1, 1);
assertEquals( 872187, v );
}
public static int rec( final int n, final int z ) {
if ( n > 1000000 )
return 0;
if ( z % 2 == 0 ) {
final int l = n << 1 & -1 << z / 2 + 1;
final int r = n & -1 >>> 32 - z / 2;
return v(n) + rec( l | 1 << z / 2 | r, z + 1 ) + rec( l | r, z + 1 );
} else
return v(n) + rec( n << 1 & -1 << z / 2 + 1 | n & -1 >>> 31 - z / 2, z + 1 );
}
public static int v( final int n ) {
final String s = "" + n;
boolean ok = true;
for ( int j = s.length(), i = j / 2 - 1; i >= 0 && ok; i--)
ok = s.charAt(i) == s.charAt(j-(i+1));
return ok ? n : 0;
}
My first assumptions were entirely wrong, so I've fixed them up. I've provided two algorithms, one iterative and one recursive. They're obviously nowhere near as impressive and efficient as user988052's, but they're definitely easier to read! The first algorithm is iterative and has a runtime of 9ms. The second algorithm is recursive and has a runtime of 16ms. Although the second solution is cleaner, the recursive calls might be slowing it down.
First Algorithm (9ms):
/** Given half a palindrome, construct the rest of the palindrome with
* an optional string inserted in the middle. The returned string is
* only guaranteed to be a palindrome if 'mid' is empty or a palindrome. */
public static String getPalindrome(String bin_part, String mid) {
return bin_part + mid + (new StringBuilder(bin_part)).reverse();
}
/** Check if the specified string is a palindrome. */
public static boolean isPalindrome(String p) {
for (int i=0; i<p.length()/2; i++)
if (p.charAt(i) != p.charAt(p.length()-1-i))
return false;
return true;
}
public static void main(String[] args) {
String[] mids = {"0","1"};
long total = 0;
boolean longDone = false; // have the numbers with extra digits been tested
long start = System.currentTimeMillis();
for (long i=0; i<1000; i++) {
String bin_part = Long.toBinaryString(i);
String bin = getPalindrome(bin_part, "");
long dec = Long.valueOf(bin, 2);
if (dec >= 1000000) break; // totally done
if (isPalindrome(Long.toString(dec)))
total += dec;
if (!longDone) {
for (int m=0; m<mids.length; m++) {
bin = getPalindrome(bin_part, mids[m]);
dec = Long.valueOf(bin, 2);
if (dec >= 1000000) {
longDone = true;
break;
}
if (isPalindrome(Long.toString(dec)))
total += dec;
}
}
}
long end = System.currentTimeMillis();
System.out.println("Total: " + total + " in " + (end-start) + " ms");
}
Second Algorithm (16ms)
public long total = 0;
public long max_value = 1000000;
public long runtime = -1;
public static boolean isPalindrome(String s) {
for (int i=0; i<s.length()/2; i++)
if (s.charAt(i) != s.charAt(s.length()-1-i))
return false;
return true;
}
public void gen(String bin, boolean done) {
if (done) { // generated a valid binary number
// check current value and add to total if possible
long val = Long.valueOf(bin, 2);
if (val >= max_value)
return;
if (isPalindrome(Long.toString(val))) {
total += val;
}
// generate next value
gen('1' + bin + '1', true);
gen('0' + bin + '0', false);
} else { // generated invalid binary number (contains leading and trailing zero)
if (Long.valueOf('1' + bin + '1', 2) < max_value) {
gen('1' + bin + '1', true);
gen('0' + bin + '0', false);
}
}
}
public void start() {
total = 0;
runtime = -1;
long start = System.currentTimeMillis();
gen("",false);
gen("1",true);
gen("0",false);
long end = System.currentTimeMillis();
runtime = end - start;
}
public static void main(String[] args) {
Palindromes2 p = new Palindromes2();
p.start();
System.out.println("Total: " + p.total + " in " + p.runtime + " ms.");
}
Here is a simple brute-force Python implementation for this problem:
total = 0
for i in range(1000000):
    bina = bin(i)[2:]
    # the number must be a palindrome in base 10 AND in base 2
    if str(i) == str(i)[::-1] and bina == bina[::-1]:
        #print("i : "+str(i))
        #print("bina : "+str(bina))
        total += i
print("Sum of numbers : ", total)

Unlucky Numbers

Unlucky Numbers (NOT HOMEWORK)
Some numbers are considered to be unlucky: those that contain only the digits 4 and 7. Our goal is to count such numbers in the range between the positive integers a and b.
For Example:
Input : a = 10 b = 20
Output : 0
Input : a = 30 b = 50
Output : 2 (44, 47)
Below is the Code I tried out using a static array approach, wherein I calculate all possible unlucky numbers for a 32-bit integer initially. This is done in O(n) and later a sequential scan helps obtain the count which is again an O(n) operation. Is there a better approach to solve this without the help of a static array ?
#include <stdio.h>
#define MAX_UNLUCKY 1022
static int unlucky[MAX_UNLUCKY];
int main(int argc, char **argv) {
    int i, k;
    int a, b, factor;
    printf("Enter the numbers : \n");
    scanf("%d",&a);
    scanf("%d",&b);
    /* build every unlucky number with up to 9 digits, in increasing order */
    unlucky[0] = 4;
    unlucky[1] = 7;
    factor = 10;
    k = 1;
    for(i = 2; i < MAX_UNLUCKY; ++i)
        unlucky[i] = unlucky[(i >> 1) - 1]*factor + unlucky[k ^= 1];
    /* find the first unlucky number >= a, so that a itself is counted */
    for (i = 0; i < MAX_UNLUCKY;++i)
        if (unlucky[i] >= a) break;
    /* scan until the first unlucky number > b */
    for (k = i; k < MAX_UNLUCKY;++k) {
        if (unlucky[k] > b) break;
        printf("Unlucky number = %d\n", unlucky[k]);
    }
    printf ("Total Number of Unlucky numbers in this range is %d\n", k-i);
    return (0);
}
Consider the following:
How many numbers are there between
0b100 and 0b111 (in binary)?
100, 101, 110, 111 (4 = 0b111 - 0b100 + 1)
That's exactly how many unlucky numbers there are between 744 and 777 (744, 747, 774, 777): read each 4 as a binary 0 and each 7 as a binary 1.
Now:
700 and 800 have the same number of unlucky numbers between them as 744 and 777.
744 is the smallest unlucky number greater than 700 and 777 is the greatest unlucky number smaller than 800.
No need to generate numbers, just subtraction.
For cases like a = 10, b = 800, first find your number for 10-100 and then for 100-800 (handle each digit count separately, because otherwise you'll be counting some numbers twice):
For 10-100:
a = 44
b = 77
0b11 - 0b00 + 1 = 3 + 1 = 4 (44, 47, 74, 77)
For 100-800:
a = 444
b = 777
0b111 - 0b000 + 1 = 7 + 1 = 8 (444, 447, 474, 477, 744, 747, 774, 777)
So between 10 and 800: 4 + 8 = 12 numbers, which is also correct.
This is also O(1) time & space if you find the auxiliary numbers efficiently, which shouldn't be too hard...
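One way to turn this counting idea into code is to count the 4/7-only numbers up to x and subtract. The Python sketch below does that digit by digit; the function names and the inclusive treatment of both range ends are assumptions for this example, not something the answer above specifies.

```python
def count_upto(x):
    """Count numbers in [1, x] whose decimal digits are all 4 or 7."""
    if x < 4:
        return 0
    s = str(x)
    n = len(s)
    total = 2 ** n - 2            # all such numbers with fewer digits than x
    for i, ch in enumerate(s):
        rest = n - i - 1          # digits remaining after position i
        d = int(ch)
        if d < 4:
            return total          # neither 4 nor 7 fits at this position
        if d == 4:
            continue              # only 4 fits, prefix still equals x's prefix
        total += 2 ** rest        # putting 4 here leaves the rest free
        if d < 7:
            return total
        if d > 7:
            return total + 2 ** rest   # 7 here also leaves the rest free
        # d == 7: keep matching x's prefix with a 7
    return total + 1              # x itself consists only of 4s and 7s


def count_in_range(a, b):
    """Count such numbers in the inclusive range [a, b]."""
    return count_upto(b) - count_upto(a - 1)


print(count_in_range(10, 20))
print(count_in_range(30, 50))
print(count_in_range(10, 800))
```

The work is proportional to the number of digits of b, and no table of precomputed values is needed.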

Find set of numbers in one collection that adds up to a number in another

For a game I'm making I have a situation where I have a list of numbers – say [7, 4, 9, 1, 15, 2] (named A for this) – and another list of numbers – say [11, 18, 14, 8, 3] (named B) – provided to me. The goal is to find all combinations of numbers in A that add up to a number in B. For example:
1 + 2 = 3
1 + 7 = 8
2 + 9 = 11
4 + 7 = 11
1 + 2 + 4 + 7 = 14
1 + 2 + 15 = 18
2 + 7 + 9 = 18
...and so on. (For purposes of this, 1 + 2 is the same as 2 + 1.)
For small lists like this, it's trivial to just brute-force the combinations, but I'm facing the possibility of seeing thousands to tens of thousands of these numbers and will be using this routine repeatedly over the lifespan of the application. Is there any kind of elegant algorithm available to accomplish this in reasonable time with 100% coverage? Failing this, is there any kind of decent heuristics I can find that can give me a "good enough" set of combinations in a reasonable amount of time?
I'm looking for an algorithm in pseudo-code or in any decently popular and readable language (note the "and" there....;) or even just an English description of how one would go about implementing this kind of search.
Edited to add:
Lots of good information provided so far. Thanks, guys! Summarizing for now:
The problem is NP-Complete, so no algorithm is known that is guaranteed to give 100% coverage in reasonable time on large inputs; in the worst case you are down to brute force.
The problem can be viewed as a variant of either the subset sum or knapsack problems. There are well-known heuristics for both which may be adaptable to this problem.
Keep the ideas coming! And thanks again!
This problem is NP-Complete... This is some variation of the sub-set sum problem which is known to be NP-Complete (actually, the sub-set sum problem is easier than yours).
Read here for more information:
http://en.wikipedia.org/wiki/Subset_sum_problem
As said in the comments, with numbers ranging only from 1 to 30 the problem has a fast solution. I tested it in C and for your given example it only needs milliseconds and will scale very well. The complexity is O(n+k) where n is the length of list A and k the length of list B, but with a high constant factor (there are 28,598 possibilities to get a sum from 1 to 30).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define WIDTH 30000
#define MAXNUMBER 30
int create_combination(unsigned char comb[WIDTH][MAXNUMBER+1],
int n,
unsigned char i,
unsigned char len,
unsigned char min,
unsigned char sum) {
unsigned char j;
if (len == 1) {
if (n+1>=WIDTH) {
printf("not enough space!\n");
exit(-1);
}
comb[n][i] = sum;
for (j=0; j<=i; j++)
comb[n+1][j] = comb[n][j];
n++;
return n;
}
for (j=min; j<=sum/len; j++) {
comb[n][i] = j;
n = create_combination(comb, n, i+1, len-1, j, sum-j);
}
return n;
}
int main(void)
{
unsigned char A[6] = { 7, 4, 9, 1, 15, 2 };
unsigned char B[5] = { 11, 18, 14, 8, 3 };
unsigned char combinations[WIDTH][MAXNUMBER+1];
unsigned char needed[WIDTH][MAXNUMBER];
unsigned char numbers[MAXNUMBER];
unsigned char sums[MAXNUMBER];
unsigned char i, j, k;
int n=0, m;
memset(combinations, 0, sizeof combinations);
memset(needed, 0, sizeof needed);
memset(numbers, 0, sizeof numbers);
memset(sums, 0, sizeof sums);
// create array with all possible combinations
// combinations[n][0] stores the sum
for (i=2; i<=MAXNUMBER; i++) {
for (j=2; j<=i; j++) {
for (k=1; k<=MAXNUMBER; k++)
combinations[n][k] = 0;
combinations[n][0] = i;
n = create_combination(combinations, n, 1, j, 1, i);
}
}
// count quantity of any summands in each combination
for (m=0; m<n; m++)
for (i=1; i<=MAXNUMBER && combinations[m][i] != 0; i++)
needed[m][combinations[m][i]-1]++;
// count quantity of any number in A
for (m=0; m<6; m++)
if (numbers[A[m]-1] < MAXNUMBER)
numbers[A[m]-1]++;
// collect possible sums from B
for (m=0; m<5; m++)
sums[B[m]-1] = 1;
for (m=0; m<n; m++) {
// check if sum is in B
if (sums[combinations[m][0]-1] == 0)
continue;
// check if enough summands from current combination are in A
for (i=0; i<MAXNUMBER; i++) {
if (numbers[i] < needed[m][i])
break;
}
if (i<MAXNUMBER)
continue;
// output result
for (j=1; j<=MAXNUMBER && combinations[m][j] != 0; j++) {
printf(" %s %d", j>1 ? "+" : "", combinations[m][j]);
}
printf(" = %d\n", combinations[m][0]);
}
return 0;
}
1 + 2 = 3
1 + 7 = 8
2 + 9 = 11
4 + 7 = 11
1 + 4 + 9 = 14
1 + 2 + 4 + 7 = 14
1 + 2 + 15 = 18
2 + 7 + 9 = 18
Sounds like a Knapsack problem (see http://en.wikipedia.org/wiki/Knapsack_problem). On that page they also explain that the problem is NP-complete in general.
I think this means that if you want to find ALL valid combinations, you just need a lot of time.
This is a small generalization of the subset sum problem. In general, it is NP-complete, but as long as all the numbers are integers and the maximum number in B is relatively small, the pseudo-polynomial solution described in the Wikipedia article I linked should do the trick.
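As a sketch of how the pseudo-polynomial table could be adapted here, the Python below keeps, for every reachable sum up to max(B), the combinations that produce it. This is an adaptation for illustration rather than the exact algorithm from the Wikipedia article, and the function name is an assumption. The table has at most max(B) + 1 keys, but the number of stored combinations can still grow exponentially, which is unavoidable when every combination must be listed.

```python
def combinations_for_targets(numbers, targets):
    """Map each target to the list of sub-multisets of `numbers` summing to it."""
    limit = max(targets)
    sums = {0: [()]}                     # reachable sum -> list of combinations
    for x in sorted(numbers):
        added = {}
        for s, combos in sums.items():   # extend every known combination by x
            t = s + x
            if t <= limit:               # sums above the largest target are useless
                added.setdefault(t, []).extend(c + (x,) for c in combos)
        for t, combos in added.items():
            sums.setdefault(t, []).extend(combos)
    return {t: sums.get(t, []) for t in targets}


A = [7, 4, 9, 1, 15, 2]
B = [11, 18, 14, 8, 3]
for target, combos in sorted(combinations_for_targets(A, B).items()):
    for combo in combos:
        print(" + ".join(map(str, combo)), "=", target)
```

Each number is used at most once per combination, and the numbers inside a combination come out in ascending order, so 1 + 2 and 2 + 1 are never both reported.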
