I found this problem:
Consider sequences of 36 bits. Each such sequence has 32 5-bit
sequences consisting of adjacent bits. For example, the sequence
1101011… contains the 5-bit sequences 11010, 10101, 01011, …. Write a
program that prints all 36-bit sequences with the two properties:
1. The first 5 bits of the sequence are 00000.
2. No two 5-bit subsequences are the same.
So I generalized the problem to finding all n-bit sequences whose k-bit subsequences satisfy the above requirements. However, the only approach I can think of is a brute-force search: generate all n-bit sequences with the first k bits zero, then for each sequence check whether all k-bit subsequences are unique. This is apparently not a very efficient approach. I am wondering whether there is a better way to solve the problem?
Thanks.
The simplest approach seems to be backtracking. You can keep track of which 5-bit sequences you've seen with a flat array (or a bit mask). At each bit, try appending a 0 -- counter = (counter & 0x0f) << 1 -- and check whether you've seen that window before, then do counter = counter | 1 and try that path.
There are probably more efficient algorithms that can prune the search space faster. This seems related to https://en.wikipedia.org/wiki/De_Bruijn_sequence. I am not certain, but I believe that it is actually equivalent; that is, the last five digits of the sequence will have to be 10000, making it cyclic.
EDIT: here's some C code. Less efficient than it could be in terms of space, because of the recursion, but simple. The worst bit is the mask management. It appears I was correct about De Bruijn sequences; this finds all 2048 of them.
#include <stdio.h>
#include <stdlib.h>

char *binprint(int val) {
    static char res[33];
    int i;
    res[32] = 0;
    for (i = 0; i < 32; i++) {
        res[31 - i] = (val & 1) + '0';
        val = val >> 1;
    }
    return res;
}

void checkPoint(int mask, int counter) {
    // Get the appropriate bit in the mask
    int idxmask = 1 << (counter & 0x1f);
    // Abort if we've seen this suffix before
    if (mask & idxmask) {
        return;
    }
    // Update the mask
    mask = mask | idxmask;
    // We're done if we've hit all 32
    if (mask == 0xffffffff) {
        printf("%10u 0000%s\n", counter, binprint(counter));
        return;
    }
    checkPoint(mask, counter << 1);
    checkPoint(mask, (counter << 1) | 1);
}

int main(int argc, char *argv[]) {
    checkPoint(0, 0);
    return 0;
}
I remember seeing this exact problem in a programming interview questions book of mine. Here is their solution:
Hope it helps. Cheers.
I did search and looked at the related links below, but they didn't help:
Point covering problem
Segments poked (covered) with points - any tricky test cases?
Need effective greedy for covering a line segment
Problem Description:
You are given a set of segments on a line and your goal is to mark as
few points on a line as possible so that each segment contains at least
one marked point
Task.
Given a set of n segments {[a0,b0],[a1,b1],...,[an-1,bn-1]} with integer
coordinates on a line, find the minimum number m of points such that
each segment contains at least one point. That is, find a set of
integers X of minimum size such that for every segment [ai,bi] there
is a point x in X with ai <= x <= bi.
Output Description:
Output the minimum number m of points on the first line and the integer
coordinates of m points (separated by spaces) on the second line
Sample Input - I
3
1 3
2 5
3 6
Output - I
1
3
Sample Input - II
4
4 7
1 3
2 5
5 6
Output - II
2
3 6
I didn't understand the question itself. I need an explanation of how to solve this problem, but I don't want the code. Examples would be greatly helpful.
Maybe this formulation of the problem will be easier to understand. You have n people who can each tolerate a different range of temperatures [ai, bi]. You want to find the minimum number of rooms to make them all happy, i.e. you can set each room to a certain temperature so that each person can find a room within his/her temperature range.
As for how to solve the problem, you said you didn't want code, so I'll just roughly describe an approach. Think about the coldest room you have. If making it one degree warmer won't cause anyone to no longer be able to tolerate that room, you might as well make the increase, since that can only allow more people to use that room. So the first temperature you should set is the warmest one that the most cold-loving person can still tolerate. In other words, it should be the smallest of the bi. Now this room will satisfy some subset of your people, so you can remove them from consideration. Then repeat the process on the remaining people.
Now, to implement this efficiently, you might not want to do literally what I said above. I suggest sorting the people according to bi first; then, for the ith person, try to use an existing room to satisfy them. If you can't, create a new room with the highest temperature that still satisfies them, which is bi.
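In case a concrete illustration helps, here is a minimal sketch of that greedy idea (my own code, not a reference solution), back in segment terms: sort segments by right endpoint and place a point at a segment's right endpoint whenever the segment is not already covered. The I/O format follows the problem statement; helper names like cmp_right are just for this sketch.

#include <stdio.h>
#include <stdlib.h>

// One segment [a, b].
typedef struct { long a, b; } seg_t;

// Sort segments by right endpoint b (hypothetical helper name).
static int cmp_right(const void *x, const void *y) {
    const seg_t *s = x, *t = y;
    if (s->b != t->b) return (s->b < t->b) ? -1 : 1;
    return (s->a < t->a) ? -1 : (s->a > t->a);
}

int main(void) {
    int n;
    if (scanf("%d", &n) != 1) return 1;
    seg_t *seg = malloc(n * sizeof *seg);
    long *points = malloc(n * sizeof *points);
    for (int i = 0; i < n; i++)
        scanf("%ld %ld", &seg[i].a, &seg[i].b);

    qsort(seg, n, sizeof *seg, cmp_right);

    int m = 0;
    for (int i = 0; i < n; i++) {
        // If this segment is not covered by the last chosen point,
        // take its right endpoint (the "warmest tolerable temperature").
        if (m == 0 || seg[i].a > points[m - 1])
            points[m++] = seg[i].b;
    }

    printf("%d\n", m);
    for (int i = 0; i < m; i++)
        printf("%ld%c", points[i], i + 1 < m ? ' ' : '\n');
    free(seg);
    free(points);
    return 0;
}

On Sample Input II this picks 3 (right end of [1,3]) and then 6 (right end of [5,6]), matching the expected output.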
Yes, the description is pretty vague and the only meaning that makes sense to me is this:
You have some line
A segment on the line is defined by l,r
Where one parameter is the distance from the start of the line and the second is the segment's length. Which one is which is hard to tell, as the letters are not very usual for such a description. My bet is:
l length of segment
r distance of (start?) of segment from start of line
You want to find a min set of points
So that each segment has at least one point in it. That means for 2 overlapping segments you need just one point ...
Surely there are more options for how to solve this; the obvious one is generate & test with some heuristics, like generating combinations only for segments that overlap more than once. So I would attack this task in this manner (using the assumed terminology from #2):
sort segments by r
add a number-of-overlaps field to your segment data
so each segment will be { r,l,n }; set n=0 for all segments for now.
scan segments for overlaps
something like
for (i=0;i<segments;i++)            // loop all segments
    for (j=i+1;j<segments;j++)      // loop all later segments while they still overlap
        if ( segment[i] and segment[j] are overlapped )
        {
            segment[i].n++;         // update overlap counters
            segment[j].n++;
        }
        else break;
Now if the r-sorted segments are overlapped then
segment[i].r <=segment[j].r
segment[i].r+segment[i].l>=segment[j].r
scan segments handling non-overlapped segments
For each segment with segment[i].n==0, add its middle point (as a distance from the start of the line) to the solution point list:
points.add(segment[i].r+0.5*segment[i].l);
And after that remove the segment from the list (or tag it as used, or whatever you do for a speed boost...).
scan segments that are overlapped just once
So if segment[i].n==1 then you need to determine whether it overlaps with i-1 or i+1. Add the mid point of the overlap to the solution points and remove the i segment from the list. Then decrement the n of the overlapped segment (i+1 or i-1) and if it reaches zero remove it too.
points.add(0.5*( segment[j].r + min(segment[i].r+segment[i].l , segment[j].r+segment[j].l )));
Loop this whole scanning until there is no new point added to the solution.
now you have only multiple overlaps left
From this point I will be a bit vague for 2 reasons:
I have not tested this and I do not have any test data to validate it with, not to mention I am lazy.
This smells like an assignment, so there is some work/fun left for you.
To start, I would scan all segments and remove all those that contain any point from the solution. You should perform this step after any change to the solution.
Now you can experiment with generating combinations of points for each overlapping group of segments and remember the minimal number of points covering all segments in the group (simply by brute force).
There are more heuristics possible, like handling all twice-overlapped segments (in a similar manner to the single overlaps), but in the end you will have to brute-force the rest of the data ...
[edit1] as you added new info
The r,l means the distances of the left and right ends from the start of the line. So if you want to convert between the other formulation { r',l' } and this one (l <= r), then
l = r'
r = r' + l'
and back
r' = l
l' = r - l
Sorry, too lazy to rewrite the whole thing ...
Here is a working solution in C. Please refer to it only partially and try to fix your own code before reading the whole thing. Happy coding :) Spoiler alert.
#include <stdio.h>
#include <stdlib.h>

int cmp_func(const void *ptr_a, const void *ptr_b)
{
    // qsort passes pointers to the long* elements of arr
    const long *a = *(const long **)ptr_a;
    const long *b = *(const long **)ptr_b;
    if (a[1] == b[1])
        return a[0] - b[0];
    return a[1] - b[1];
}

int main()
{
    int i, j, n, num_val;
    long **arr;
    scanf("%d", &n);
    long values[n];
    arr = malloc(n * sizeof(long *));
    for (i = 0; i < n; ++i) {
        *(arr + i) = malloc(2 * sizeof(long));
        scanf("%ld %ld", &arr[i][0], &arr[i][1]);
    }
    qsort(arr, n, sizeof(long *), cmp_func);
    i = j = 0;
    num_val = 0;
    while (i < n) {
        int skip = 0;
        values[num_val] = arr[i][1];
        for (j = i + 1; j < n; ++j) {
            int condition;
            condition = arr[i][1] <= arr[j][1] ? arr[j][0] <= arr[i][1] : 0;
            if (condition) {
                skip++;
            } else {
                break;
            }
        }
        num_val++;
        i += skip + 1;
    }
    printf("%d\n", num_val);
    for (int k = 0; k < num_val; ++k) {
        printf("%ld ", values[k]);
    }
    printf("\n");
    for (i = 0; i < n; ++i)
        free(arr[i]);
    free(arr);
    return 0;
}
Here's the working code in C++ for anyone searching :)
#include <bits/stdc++.h>
#define ll long long
#define double long double
#define vi vector<int>
#define endl "\n"
#define ff first
#define ss second
#define pb push_back
#define all(x) (x).begin(),(x).end()
#define mp make_pair
using namespace std;

bool cmp(const pair<ll,ll> &a, const pair<ll,ll> &b)
{
    return (a.second < b.second);
}

vector<ll> MinSig(vector<pair<ll,ll>> &vec)
{
    vector<ll> points;
    for (int x = 0; x < (int)vec.size();)
    {
        bool found = false;
        points.pb(vec[x].ss);               // place a point at the right endpoint
        for (int y = x + 1; y < (int)vec.size(); y++)
        {
            if (vec[y].ff > vec[x].ss)      // first segment not covered by that point
            {
                x = y;
                found = true;
                break;
            }
        }
        if (!found)
            break;
    }
    return points;
}

int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int n;
    cin >> n;
    vector<pair<ll,ll>> v;
    for (int x = 0; x < n; x++)
    {
        ll temp1, temp2;
        cin >> temp1 >> temp2;
        v.pb(mp(temp1, temp2));
    }
    sort(v.begin(), v.end(), cmp);
    vector<ll> res = MinSig(v);
    cout << res.size() << endl;
    for (auto it : res)
        cout << it << " ";
    cout << endl;
}
I am building a C library for big integer numbers. Basically, I'm seeking a fast algorithm to convert any integer from its binary representation to a decimal one.
I looked at the JDK's BigInteger.toString() implementation, but it looks quite heavy to me, as it was made to convert the number to any radix (it uses a division for each digit, which should be pretty slow when dealing with thousands of digits).
So if you have any documentations / knowledge to share about it, I would be glad to read it.
EDIT: more precisions about my question:
Let P be a memory address
Let N be the number of bytes allocated (and set) at P
How do I convert the integer represented by the N bytes at address P (let's say in little endian to make things simpler) to a C string?
Example:
N = 1
P = some random memory address storing '00101010'
out string = "42"
Thanks for your answers.
The reason the BigInteger.toString method looks heavy is that it does the conversion in chunks.
A trivial algorithm would take the last digits and then divide the whole big integer by the radix until there is nothing left.
One problem with this is that a big integer division is quite expensive, so the number is subdivided into chunks that can be processed with regular integer division (as opposed to BigInt division):
static String toDecimal(BigInteger bigInt) {
    BigInteger chunker = BigInteger.valueOf(1000000000);
    StringBuilder sb = new StringBuilder();
    do {
        int current = bigInt.mod(chunker).intValue();
        bigInt = bigInt.divide(chunker);
        for (int i = 0; i < 9; i++) {
            sb.append((char) ('0' + current % 10));
            current /= 10;
            if (current == 0 && bigInt.signum() == 0) {
                break;
            }
        }
    } while (bigInt.signum() != 0);
    return sb.reverse().toString();
}
That said, for a fixed radix, you are probably even better off porting the "double dabble" algorithm to your needs, as suggested in the comments: https://en.wikipedia.org/wiki/Double_dabble
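For what it's worth, here is a minimal fixed-width illustration of double dabble (my own sketch, not from the thread): it converts one 32-bit value to decimal using only shifts and small additions, no division. For a big integer you would run the same shift-and-add-3 loop over a much longer BCD register.

#include <stdio.h>
#include <stdint.h>

// Convert a 32-bit value to a 10-digit decimal string with double dabble.
static void double_dabble(uint32_t value, char out[11])
{
    uint8_t bcd[10] = {0};                  // one decimal digit per entry, bcd[0] most significant
    for (int bit = 31; bit >= 0; bit--) {
        // Add 3 to every digit that is >= 5 before shifting.
        for (int d = 0; d < 10; d++)
            if (bcd[d] >= 5)
                bcd[d] += 3;
        // Shift the whole BCD register left by one bit, pulling in the next binary bit.
        int carry = (value >> bit) & 1;
        for (int d = 9; d >= 0; d--) {
            bcd[d] = (uint8_t)((bcd[d] << 1) | carry);
            carry = bcd[d] >> 4;            // bit shifted out of this digit
            bcd[d] &= 0x0f;
        }
    }
    for (int d = 0; d < 10; d++)
        out[d] = (char)('0' + bcd[d]);
    out[10] = '\0';
}

int main(void)
{
    char buf[11];
    double_dabble(4294967295u, buf);
    printf("%s\n", buf);                    // prints 4294967295 (smaller values keep leading zeros)
    return 0;
}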
I recently got the challenge to print a big Mersenne prime: 2**82589933-1. On my CPU that takes ~40 minutes with apcalc and ~120 minutes with Python 2.7. It's a number with a bit over 24 million digits.
Here is my own little C code for the conversion:
// print 2**82589933-1
#include <stdio.h>
#include <math.h>
#include <stdint.h>
#include <inttypes.h>
#include <string.h>
const uint32_t exponent = 82589933;
//const uint32_t exponent = 100;
//outputs 1267650600228229401496703205375
const uint32_t blocks = (exponent + 31) / 32;
const uint32_t digits = (int)(exponent * log(2.0) / log(10.0)) + 10;
uint32_t num[2][blocks];
char out[digits + 1];
// blocks : number of uint32_t in num1 and num2
// num1 : number to convert
// num2 : free space
// out : end of output buffer
void conv(uint32_t blocks, uint32_t *num1, uint32_t *num2, char *out) {
if (blocks == 0) return;
const uint32_t div = 1000000000;
uint64_t t = 0;
for (uint32_t i = 0; i < blocks; ++i) {
t = (t << 32) + num1[i];
num2[i] = t / div;
t = t % div;
}
for (int i = 0; i < 9; ++i) {
*out-- = '0' + (t % 10);
t /= 10;
}
if (num2[0] == 0) {
--blocks;
num2++;
}
conv(blocks, num2, num1, out);
}
int main() {
// prepare number
uint32_t t = exponent % 32;
num[0][0] = (1LLU << t) - 1;
memset(&num[0][1], 0xFF, (blocks - 1) * 4);
// prepare output
memset(out, '0', digits);
out[digits] = 0;
// convert to decimal
conv(blocks, num[0], num[1], &out[digits - 1]);
// output number
char *res = out;
while(*res == '0') ++res;
printf("%s\n", res);
return 0;
}
The conversion is destructive and tail recursive. In each step it divides num1 by 1_000_000_000 and stores the result in num2. The remainder is added to out. Then it calls itself with num1 and num2 switched and often shortened by one (blocks is decremented). out is filled from back to front. You have to allocate it large enough and then strip leading zeroes.
Python seems to be using a similar mechanism for converting big integers to decimal.
Want to do better?
For a large number like in my case, each division by 1_000_000_000 takes rather long. At a certain size a divide & conquer algorithm does better. In my case the first division would be by 10 ^ 16777216, splitting the number into a quotient and a remainder. Then convert each part separately. Each part is still big, so split again at 10 ^ 8388608. Keep splitting recursively until the numbers are small enough, say maybe 1024 digits each, and convert those with the simple algorithm above. The right definition of "small enough" would have to be tested; 1024 is just a guess.
While the long division of two big integer numbers is expensive, much more so than a division by 1_000_000_000, the time spent there is saved because each separate chunk requires far fewer divisions by 1_000_000_000 to convert to decimal.
And once you have split the problem into separate and independent chunks, it's only a tiny step to spreading the chunks out among multiple cores. That would really speed up the conversion another step. It looks like apcalc uses divide & conquer but not multi-threading.
I need to solve the 16-Queens Problem in 1 second.
I used a backtracking algorithm, as below.
This code is fast enough to solve the N-Queens Problem in 1 second when N is smaller than 13.
But it takes a long time when N is bigger than 13.
How can I improve it?
#include <stdio.h>
#include <stdlib.h>

int n;
int arr[100] = {0,};
int solution_count = 0;

int check(int i)
{
    int k = 1, ret = 1;
    while (k < i && ret == 1) {
        if (arr[i] == arr[k] ||
            abs(arr[i] - arr[k]) == abs(i - k))
            ret = 0;
        k++;
    }
    return ret;
}

void backtrack(int i)
{
    if (check(i)) {
        if (i == n) {
            solution_count++;
        } else {
            for (int j = 1; j <= n; j++) {
                arr[i + 1] = j;
                backtrack(i + 1);
            }
        }
    }
}

int main()
{
    scanf("%d", &n);
    backtrack(0);
    printf("%d", solution_count);
}
Your algorithm is almost fine. A small change will probably give you enough of an improvement to produce a solution much faster. In addition, there is a data structure change that should let you reduce the time even further.
First, tweak the algorithm a little: rather than recursing first and checking afterwards, check early: every time you are about to place a new queen, check whether another queen is occupying the same column or the same diagonal before making the arr[i+1] = j; assignment. This will save you a lot of CPU cycles.
Now you need to speed up checking of the next queen. In order to do that you have to change your data structure so that you could do all your checks without any loops. Here is how to do it:
You have N rows
You have N columns
You have 2N-1 ascending diagonals
You have 2N-1 descending diagonals
Since no two queens can take the same spot in any of the four "dimensions" above, you need an array of boolean values for the last three things; the rows are guaranteed to be different, because the i parameter of backtrack, which represents the row, is guaranteed to be different.
With N up to 16, 2N-1 goes up to 31, so you can use uint32_t for your bit arrays. Now you can check if a column c is taken by applying bitwise and & to the columns bit mask and 1 << c. Same goes for the diagonal bit masks.
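Here is a minimal sketch of that idea (my own illustration, not the answerer's program), keeping one bit mask each for the columns, the ascending diagonals, and the descending diagonals; the row is implicit in the recursion depth:

#include <stdio.h>
#include <stdint.h>

static int n;
static long long solutions;

// row: current row; cols/diag1/diag2: bit masks of occupied columns and diagonals
static void solve(int row, uint32_t cols, uint32_t diag1, uint32_t diag2)
{
    if (row == n) {
        solutions++;
        return;
    }
    for (int c = 0; c < n; c++) {
        uint32_t colBit = 1u << c;
        uint32_t d1Bit  = 1u << (row + c);          // ascending diagonal index, 0 .. 2N-2
        uint32_t d2Bit  = 1u << (row - c + n - 1);  // descending diagonal index, 0 .. 2N-2
        if ((cols & colBit) || (diag1 & d1Bit) || (diag2 & d2Bit))
            continue;                               // square attacked: prune before recursing
        solve(row + 1, cols | colBit, diag1 | d1Bit, diag2 | d2Bit);
    }
}

int main(void)
{
    if (scanf("%d", &n) != 1 || n < 1 || n > 16)
        return 1;
    solve(0, 0, 0, 0);
    printf("%lld\n", solutions);
    return 0;
}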
Note: Doing a 16-Queens problem in under a second would be rather tricky. A very highly optimized program does it in 23 seconds on an 800 MHz PC. A 3.2 GHz CPU should give you a speed-up of about 4 times, but that would still be about 8 seconds to get a solution.
I would change while (k < i && ret == 1) { to while (k < i) {
and instead of ret = 0; do return 0;.
(this will save a check every iteration. It might be that your compiler does this anyway, or some other performance trick, but this might help a bit).
I got the answer to the question of counting the number of set bits from here:
How to count the number of set bits in a 32-bit integer?
long count_bits(long n) {
    unsigned int c; // c accumulates the total bits set in n
    for (c = 0; n; c++)
        n &= n - 1; // clear the least significant bit set
    return c;
}
It is simple to understand as well. I found the best answer to be Brian Kernighan's method, posted by hoyhoy... and he adds the following at the end.
Note that this is a question used during interviews. The interviewer will add the caveat that you have "infinite memory". In that case, you basically create an array of size 2^32 and fill in the bit counts for the numbers at each location. Then, this function becomes O(1).
Can somebody explain how to do this if I have infinite memory?
The fastest way I have ever seen to populate such an array is ...
array[0] = 0;
for (i = 1; i < NELEMENTS; i++) {
    array[i] = array[i >> 1] + (i & 1);
}
Then to count the number of set bits in a given number (provided the given number is less than NELEMENTS) ...
numSetBits = array[givenNumber];
If your memory is not infinite, I often see NELEMENTS set to 256 (one byte's worth); you then add up the number of set bits in each byte of your integer.
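In case it helps, here is a minimal sketch of that byte-table variant (my own illustration, not the answerer's code), using the same recurrence as above to fill a 256-entry table:

#include <stdint.h>
#include <stdio.h>

static uint8_t byte_counts[256];

// Fill the 256-entry table once.
static void init_byte_counts(void)
{
    byte_counts[0] = 0;
    for (int i = 1; i < 256; i++)
        byte_counts[i] = byte_counts[i >> 1] + (i & 1);
}

// Sum the table lookups for each of the four bytes of a 32-bit value.
static int count_bits32(uint32_t v)
{
    return byte_counts[v & 0xff]
         + byte_counts[(v >> 8) & 0xff]
         + byte_counts[(v >> 16) & 0xff]
         + byte_counts[(v >> 24) & 0xff];
}

int main(void)
{
    init_byte_counts();
    printf("%d\n", count_bits32(0xF0F00003u)); // prints 10
    return 0;
}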
int counts[MAX_LONG];

void init() {
    for (int i = 0; i < MAX_LONG; i++)
    {
        counts[i] = count_bits(i); // as given
    }
}

int count_bits_o1(long number)
{
    return counts[number];
}
You can probably pre-populate the array more wisely, i.e. fill it with zeros, then add one to every second index, then add 1 to every fourth index, then to every eighth index, etc., which might be a bit faster, although I doubt it...
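One way to read that suggestion (my own interpretation, as a rough sketch): for each bit position, walk the array and add 1 to every index that has that bit set.

#include <stdio.h>

#define NELEMENTS 256

static int counts[NELEMENTS];

// For each bit position, add 1 to counts[i] for every i with that bit set.
// Indices with a given bit set come in runs of `step`, every 2*step entries.
static void init_counts(void)
{
    for (int i = 0; i < NELEMENTS; i++)
        counts[i] = 0;
    for (int step = 1; step < NELEMENTS; step <<= 1)
        for (int base = step; base < NELEMENTS; base += 2 * step)
            for (int i = base; i < base + step && i < NELEMENTS; i++)
                counts[i]++;
}

int main(void)
{
    init_counts();
    printf("%d %d %d\n", counts[0], counts[7], counts[255]); // prints 0 3 8
    return 0;
}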
Also, you might account for unsigned values.
Is there any very fast method to find a binary logarithm of an integer number? For example, given a number
x=52656145834278593348959013841835216159447547700274555627155488768, such an algorithm must find y=log(x,2), which is 215. x is always a power of 2.
The problem seems to be really simple. All that is required is to find the position of the most significant 1 bit. There is a well-known method, FloorLog, but it is not very fast, especially for very long multi-word integers.
What is the fastest method?
A quick hack: Most floating-point number representations automatically normalise values, meaning that they effectively perform the loop Christoffer Hammarström mentioned in hardware. So simply converting from an integer to FP and extracting the exponent should do the trick, provided the numbers are within the FP representation's exponent range! (In your case, your integer input requires multiple machine words, so multiple "shifts" will need to be performed in the conversion.)
If the integers are stored in a uint32_t a[], then my obvious solution would be as follows (a small sketch follows the steps):
Run a linear search over a[] to find the highest-valued non-zero uint32_t value a[i] in a[] (test using uint64_t for that search if your machine has native uint64_t support)
Apply the bit twiddling hacks to find the binary log b of the uint32_t value a[i] you found in step 1.
Evaluate 32*i+b.
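A minimal sketch of those three steps, assuming the number is stored least-significant word first in a uint32_t a[] of length len (the names are mine, not from the answer, and a plain loop stands in for the bit-twiddling hacks):

#include <stdint.h>
#include <stdio.h>

// Binary log of a single non-zero 32-bit word (position of its highest set bit).
static int log2_word(uint32_t v)
{
    int b = 0;
    while (v >>= 1)
        b++;
    return b;
}

// Find the most significant non-zero word, take its binary log b, return 32*i + b.
static int log2_multiword(const uint32_t *a, size_t len)
{
    size_t i = len;
    while (i > 0 && a[i - 1] == 0)
        i--;
    if (i == 0)
        return -1;          // the number is zero; no logarithm
    return (int)(32 * (i - 1) + log2_word(a[i - 1]));
}

int main(void)
{
    // 2^215 lives in word 6 (215 / 32) with bit 23 (215 % 32) set.
    uint32_t x[8] = {0};
    x[6] = 1u << 23;
    printf("%d\n", log2_multiword(x, 8)); // prints 215
    return 0;
}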
The answer is implementation or language dependent. Any implementation can store the number of significant bits along with the data, as it is often useful. If it must be calculated, then find the most significant word/limb and the most significant bit in that word.
If you're using fixed-width integers then the other answers already have you pretty-well covered.
If you're using arbitrarily large integers, like int in Python or BigInteger in Java, then you can take advantage of the fact that their variable-size representation uses an underlying array, so the base-2 logarithm can be computed easily and quickly in O(1) time using the length of the underlying array. The base-2 logarithm of a power of 2 is simply one less than the number of bits required to represent the number.
So when n is an integer power of 2:
In Python, you can write n.bit_length() - 1 (docs).
In Java, you can write n.bitLength() - 1 (docs).
You can create an array of logarithms beforehand. This will find logarithmic values up to log(N):
#define N 100000

int naj[N + 1];

naj[2] = 1;
for ( int i = 3; i <= N; i++ )
{
    naj[i] = naj[i-1];
    if ( (1 << (naj[i]+1)) <= i )
        naj[i]++;
}
The array naj holds your logarithm values, where naj[k] = floor(log2(k)).
The log is base two.
This uses binary search for finding the closest power of 2.
public static int binLog(int x,boolean shouldRoundResult){
    // assuming 32-bit integer
    int lo=0;
    int hi=31;
    int rangeDelta=hi-lo;
    int expGuess=0;
    int guess;
    while(rangeDelta>1){
        expGuess=(lo+hi)/2; // or (loGuess+hiGuess)>>1
        guess=1<<expGuess;
        if(guess<x){
            lo=expGuess;
        } else if(guess>x){
            hi=expGuess;
        } else {
            lo=hi=expGuess;
        }
        rangeDelta=hi-lo;
    }
    if(shouldRoundResult && hi>lo){
        int loGuess=1<<lo;
        int hiGuess=1<<hi;
        int loDelta=Math.abs(x-loGuess);
        int hiDelta=Math.abs(hiGuess-x);
        if(loDelta<hiDelta)
            expGuess=lo;
        else
            expGuess=hi;
    } else {
        expGuess=lo;
    }
    int result=expGuess;
    return result;
}
The best option off the top of my head would be an O(log(log n)) approach, using binary search. Here is an example for a 64-bit (<= 2^63 - 1) number (in C++):
int log2(int64_t num) {
    int res = 0;
    for (int i = 32; i > 0; i >>= 1) {
        res += i;
        if (((1ULL << res) - 1) & (uint64_t)num)
            res -= i;
    }
    return res;
}
This algorithm will basically provide the highest number res such that ((2^res - 1) & num) == 0. Of course, for any number, you can work it out in a similar manner:
int log2_better(int64_t num) {
    int res = 0;
    for (int i = 32; i > 0; i >>= 1) {
        if ((1ULL << (res + i)) <= (uint64_t)num)
            res += i;
    }
    return res;
}
Note that this method relies on the fact that the "bitshift" operation is more or less O(1). If that is not the case, you would have to precompute either all the powers of 2, or the numbers of the form 2^2^i (2^1, 2^2, 2^4, 2^8, etc.) and do some multiplications (which in this case aren't O(1) anymore).
The example in the OP is an integer string of 65 characters, which is not representable by an INT64 or even an INT128. It is still very easy to get Log(2,x) from this string by converting it to a double-precision number. This at least gives you easy access to integers up to 2^1023.
Below you find some form of pseudocode
# 1. read the string
string="52656145834278593348959013841835216159447547700274555627155488768"
# 2. extract the length of the string
l=length(string) # l = 65
# 3. read the first min(l,17) digits in a float
float=to_float(string(1: min(17,l) ))
# 4. multiply with the correct power of 10
float = float * 10^(l-min(17,l) ) # float = 5.2656145834278593E64
# 5. Take the log2 of this number and round to the nearest integer
log2 = Round( Log(float,2) ) # 215
Note:
Some computer languages can convert arbitrary strings into a double-precision number, so steps 2, 3 and 4 could be replaced by x=to_float(string).
Step 5 could be done more quickly by just reading the double-precision exponent field (bits 52 up to and including 62) and subtracting 1023 from it.
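For illustration, a minimal C sketch of that exponent-field trick (my own example, assuming an IEEE 754 double):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

// Read the biased exponent field (bits 52..62) of an IEEE 754 double
// and subtract the bias of 1023, giving floor(log2(d)) for normal d > 0.
static int exponent_of(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);     // well-defined way to inspect the bit pattern
    return (int)((bits >> 52) & 0x7ff) - 1023;
}

int main(void)
{
    double x = 5.2656145834278593e64;   // the OP's number, rounded to a double (exactly 2^215)
    printf("%d\n", exponent_of(x));     // prints 215
    return 0;
}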
Quick example code: If you have awk you can quickly test this algorithm.
The following code creates the first 300 powers of two:
awk 'BEGIN{for(n=0;n<300; n++) print 2^n}'
The following reads the input and does the above algorithm:
awk '{ l=length($0); m = (l > 17 ? 17 : l)
x = substr($0,1,m) * 10^(l-m)
print log(x)/log(2)
}'
So the following bash-command is a convoluted way to create a consecutive list of numbers from 0 to 299:
$ awk 'BEGIN{for(n=0;n<300; n++) print 2^n}' | awk '{ l=length($0); m = (l > 17 ? 17 : l); x = substr($0,1,m) * 10^(l-m); print log(x)/log(2) }'
0
1
2
...
299