BitSet bug in cracking the coding interview? - algorithm

The following is the implementation of BitSet in the solution of question 10-4 in cracking the coding interview book. Why is it allocating an array of size/32 not (size/32 + 1). Am I missing something here or this is a bug?
If I pass 33 to the constructor of BitSet then I will allocate only one int and If I try to set or get the bit 32, I will get an AV!
package Question10_4;
class BitSet {
int[] bitset;
public BitSet(int size) {
bitset = new int[size >> 5]; // divide by 32
}
boolean get(int pos) {
int wordNumber = (pos >> 5); // divide by 32
int bitNumber = (pos & 0x1F); // mod 32
return (bitset[wordNumber] & (1 << bitNumber)) != 0;
}
void set(int pos) {
int wordNumber = (pos >> 5); // divide by 32
int bitNumber = (pos & 0x1F); // mod 32
bitset[wordNumber] |= 1 << bitNumber;
}
}

From what I can gather from reading the solution you mention (on page 205), and the little I understand about computer programming, it seems to me that this is a special implementation of a bitset, meant to take the argument of 32,000 in its construction (see the checkDuplicates function. The question is about examining an array with numbers from 1 to N, where N is at most 32,000, with only 4KB of memory).
This way, an array of 1000 elements is created, each one used for 32 bits in the bit set. You can see in the bitset class that to get a bit's position, we (floor) divide by 32 to get the array index, and then mod 32 to get the specific bit position.

Yes, answer in the book is incorrect.
Correct answer:
bitset = new int[(size + 31) >> 5]; // divide by 32

Related

IEEE754 single precision - General algorithm for representing the half of a number

Suppose N is an arbitrary number represented according to IEEE754 single precision standards. I want to find the most precise possible representation of N/2 again in IEEE754.
I want to find a general algorithm (described in words, I just want the necessary steps and cases to take into account) for obtaining the representation.
My approach is :
Say the number is represented: b0_b1_b2_b3...b_34.
Isolate the first bit which determines the sign (-/+) of the number.
Calculate the representation of the power (p) from the unsigned representation b_1...b_11.
If power = 128 we have a special case. If all the bits of the mantissa are equal to 0, we have, depending on b_0, either minus or plus infinity. We don't change anything. If the mantissa has at least one bit equal to 1 then we have NaN value. Again we change nothing.
if e is inside]-126, 127[then we have a normalized mantissam. The new power p can be calculated asp' = p - 1and belongs in the interval]-127, 126]. We then calculatem/2` and we represent it starting from the right and losing any bits that cannot be included in the 23 bits of the mantissa.
If e = -126, then in calculating the half of this number we pass in a denormalized mantissa. We represent p = 127, calculate the half of the mantissa and represent it again starting from the right losing any information that cannot be included.
Finally if e = -127 we have a denormalized mantissa. As long as m/2 can be represented in the number of bits available in the mantissa without losing information we represent that and keep p = -127. In any other case we represent the number as a positive or negative 0 depending on b_0
Any steps I have missed, any improvements ( I am sure there are ) that can be made or anything that seems completely wrong?
I implemented a divide by two algorithm in Java and verified it for all 32-bit inputs. I tried to follow your pseudocode, but there were three places where I diverged. First, the infinity/NaN exponent is 128. Second, in case 4 (normal -> normal), there's no need to operate on the fraction. Third, you didn't describe how round half to even works when you do operate on the fraction. LGTM otherwise.
public final class FloatDivision {
public static float divideFloatByTwo(float value) {
int bits = Float.floatToIntBits(value);
int sign = bits >>> 31;
int biased_exponent = (bits >>> 23) & 0xff;
int exponent = biased_exponent - 127;
int fraction = bits & 0x7fffff;
if (exponent == 128) {
// value is NaN or infinity
} else if (exponent == -126) {
// value is normal, but result is subnormal
biased_exponent = 0;
fraction = divideNonNegativeIntByTwo(0x800000 | fraction);
} else if (exponent == -127) {
// value is subnormal or zero
fraction = divideNonNegativeIntByTwo(fraction);
} else {
// value and result are normal
biased_exponent--;
}
return Float.intBitsToFloat((sign << 31) | (biased_exponent << 23) | fraction);
}
private static int divideNonNegativeIntByTwo(int value) {
// round half to even
return (value >>> 1) + ((value >>> 1) & value & 1);
}
public static void main(String[] args) {
int bits = Integer.MIN_VALUE;
do {
if (bits % 0x800000 == 0) {
System.out.println(bits);
}
float value = Float.intBitsToFloat(bits);
if (Float.floatToIntBits(divideFloatByTwo(value)) != Float.floatToIntBits(value / 2)) {
System.err.println(bits);
break;
}
} while (++bits != Integer.MIN_VALUE);
}
}

Binary to decimal (on huge numbers)

I am building a C library on big integer number. Basically, I'm seeking a fast algorythm to convert any integer in it binary representation to a decimal one
I saw JDK's Biginteger.toString() implementation, but it looks quite heavy to me, as it was made to convert the number to any radix (it uses a division for each digits, which should be pretty slow while dealing with thousands of digits).
So if you have any documentations / knowledge to share about it, I would be glad to read it.
EDIT: more precisions about my question:
Let P a memory address
Let N be the number of bytes allocated (and set) at P
How to convert the integer represented by the N bytes at address P (let's say in little endian to make things simpler), to a C string
Example:
N = 1
P = some random memory address storing '00101010'
out string = "42"
Thank for your answer still
The reason for the BigInteger.toString method looking heavy is doing the conversion in chunks.
A trivial algorithm would take the last digits and then divide the whole big integer by the radix until there is nothing left.
One problem with this is that a big integer division is quite expensive, so the number is subdivided into chunks that can be processed with regular integer division (opposed to BigInt division):
static String toDecimal(BigInteger bigInt) {
BigInteger chunker = new BigInteger(1000000000);
StringBuilder sb = new StringBuilder();
do {
int current = bigInt.mod(chunker).getInt(0);
bigInt = bigInt.div(chunker);
for (int i = 0; i < 9; i ++) {
sb.append((char) ('0' + remainder % 10));
current /= 10;
if (currnet == 0 && bigInt.signum() == 0) {
break;
}
}
} while (bigInt.signum() != 0);
return sb.reverse().toString();
}
That said, for a fixed radix, you are probably even better off with porting the "double dabble" algorithm to your needs, as suggested in the comments: https://en.wikipedia.org/wiki/Double_dabble
I recently got the challenge to print a big mersenne prime: 2**82589933-1. On my CPU that takes ~40 minutes with apcalc and ~120 minutes with python 2.7. It's a number with 24 million digits and a bit.
Here is my own little C code for the conversion:
// print 2**82589933-1
#include <stdio.h>
#include <math.h>
#include <stdint.h>
#include <inttypes.h>
#include <string.h>
const uint32_t exponent = 82589933;
//const uint32_t exponent = 100;
//outputs 1267650600228229401496703205375
const uint32_t blocks = (exponent + 31) / 32;
const uint32_t digits = (int)(exponent * log(2.0) / log(10.0)) + 10;
uint32_t num[2][blocks];
char out[digits + 1];
// blocks : number of uint32_t in num1 and num2
// num1 : number to convert
// num2 : free space
// out : end of output buffer
void conv(uint32_t blocks, uint32_t *num1, uint32_t *num2, char *out) {
if (blocks == 0) return;
const uint32_t div = 1000000000;
uint64_t t = 0;
for (uint32_t i = 0; i < blocks; ++i) {
t = (t << 32) + num1[i];
num2[i] = t / div;
t = t % div;
}
for (int i = 0; i < 9; ++i) {
*out-- = '0' + (t % 10);
t /= 10;
}
if (num2[0] == 0) {
--blocks;
num2++;
}
conv(blocks, num2, num1, out);
}
int main() {
// prepare number
uint32_t t = exponent % 32;
num[0][0] = (1LLU << t) - 1;
memset(&num[0][1], 0xFF, (blocks - 1) * 4);
// prepare output
memset(out, '0', digits);
out[digits] = 0;
// convert to decimal
conv(blocks, num[0], num[1], &out[digits - 1]);
// output number
char *res = out;
while(*res == '0') ++res;
printf("%s\n", res);
return 0;
}
The conversion is destructive and tail recursive. In each step it divides num1 by 1_000_000_000 and stores the result in num2. The remainder is added to out. Then it calls itself with num1 and num2 switched and often shortened by one (blocks is decremented). out is filled from back to front. You have to allocate it large enough and then strip leading zeroes.
Python seems to be using a similar mechanism for converting big integers to decimal.
Want to do better?
For large number like in my case each division by 1_000_000_000 takes rather long. At a certain size a divide&conquer algorithm does better. In my case the first division would be by dividing by 10 ^ 16777216 to split the number into divident and remainder. Then convert each part separately. Now each part is still big so split again at 10 ^ 8388608. Recursively keep splitting till the numbers are small enough. Say maybe 1024 digits each. Those convert with the simple algorithm above. The right definition of "small enough" would have to be tested, 1024 is just a guess.
While the long division of two big integer numbers is expensive, much more so than a division by 1_000_000_000, the time spend there is then saved because each separate chunk requires far fewer divisions by 1_000_000_000 to convert to decimal.
And if you have split the problem into separate and independent chunks it's only a tiny step away from spreading the chunks out among multiple cores. That would really speed up the conversion another step. It looks like apcalc uses divide&conquer but not multi-threading.

Generate all n - bit sequences with all unique k-bit subsequences.

I found this problem:
Consider sequences of 36 bits. Each such sequence has 32 5 - bit
sequences consisting of adjacent bits. For example, the sequence
1101011… contains the 5 - bit sequences 11010,10101,01011,…. Write a
program that prints all 36 - bit sequences with the two properties:
1.The first 5 bits of the sequence are 00000.
2. No two 5 - bit subsequences are the same.
So I generalized to find all n-bit sequences with k - bit unique subsequences satisfy the above requirements. However, the only approach I can think of is using a brutal force search: generate all permutations of n-bit sequence with the first k bits zero, then for each sequence, check if all k-bit subsequences are unique. This apparently is not a very efficient approach. I am wondering is there a better way to solve the problem?
Thanks.
The simplest approach seems to be a backtracking approach. You can keep track of which 5-bit sequences you've seen with a flat array. At each bit, try adding 0 -- counter = (counter & 0x0f) << 1 and check if you've seen that before, then do a counter = counter | 1 and try that path.
There are probably more efficient algorithms that can prune the search space faster. This seems related to https://en.wikipedia.org/wiki/De_Bruijn_sequence. I am not certain, but I believe that it is actually equivalent; that is, the last five digits of the sequence will have to be 10000, making it cyclic.
EDIT: here's some c code. Less efficient than it could be in terms of space, because of the recursion, but simple. The worst bit is the mask management. It appears I was correct about De Bruijn sequences; this finds all 2048 of them.
#include <stdio.h>
#include <stdlib.h>
char *binprint(int val) {
static char res[33];
int i;
res[32] = 0;
for (i = 0; i < 32; i++) {
res[31 - i] = (val & 1) + '0';
val = val >> 1;
}
return res;
}
void checkPoint(int mask, int counter) {
// Get the appropriate bit in the mask
int idxmask = 1 << (counter & 0x1f);
// Abort if we've seen this suffix before
if (mask & idxmask) {
return;
}
// Update the mask
mask = mask | idxmask;
// We're done if we've hit all 32
if (mask == 0xffffffff) {
printf("%10u 0000%s\n", counter, binprint(counter));
return;
}
checkPoint(mask, counter << 1);
checkPoint(mask, (counter << 1) | 1);
}
void main(int argc, char *argv[]) {
checkPoint(0, 0);
}
I remember seeing this exact problem in a programming interview questions book of mine. Here is their solution:
hope it helps. cheers.

Get the last 1000 digits of 5^1234566789893943

I saw the following interview question on some online forum. What is a good solution for this?
Get the last 1000 digits of 5^1234566789893943
Simple algorithm:
1. Maintain a 1000-digits array which will have the answer at the end
2. Implement a multiplication routine like you do in school. It is O(d^2).
3. Use modular exponentiation by squaring.
Iterative exponentiation:
array ans;
int a = 5;
while (p > 0) {
if (p&1) {
ans = multiply(ans, a)
}
p = p>>1;
ans = multiply(ans, ans);
}
multiply: multiplies two large number using the school method and return last 1000 digits.
Time complexity: O(d^2*logp) where d is number of last digits needed and p is power.
A typical solution for this problem would be to use modular arithmetic and exponentiation by squaring to compute the remainder of 5^1234566789893943 when divided by 10^1000. However in your case this will still not be good enough as it would take about 1000*log(1234566789893943) operations and this is not too much, but I will propose a more general approach that would work for greater values of the exponent.
You will have to use a bit more complicated number theory. You can use Euler's theorem to get the remainder of 5^1234566789893943 modulo 2^1000 a lot more efficiently. Denote that r. It is also obvious that 5^1234566789893943 is divisible by 5^1000.
After that you need to find a number d such that 5^1000*d = r(modulo 2^1000). To solve this equation you should compute 5^1000(modulo 2^1000). After that all that is left is to do division modulo 2^1000. Using again Euler's theorem this can be done efficiently. Use that x^(phi(2^1000)-1)*x =1(modulo 2^1000). This approach is way faster and is the only feasible solution.
The key phrase is "modular exponentiation". Python has that built in:
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> help(pow)
Help on built-in function pow in module builtins:
pow(...)
pow(x, y[, z]) -> number
With two arguments, equivalent to x**y. With three arguments,
equivalent to (x**y) % z, but may be more efficient (e.g. for ints).
>>> digits = pow(5, 1234566789893943, 10**1000)
>>> len(str(digits))
1000
>>> digits
4750414775792952522204114184342722049638880929773624902773914715850189808476532716372371599198399541490535712666678457047950561228398126854813955228082149950029586996237166535637925022587538404245894713557782868186911348163750456080173694616157985752707395420982029720018418176528050046735160132510039430638924070731480858515227638960577060664844432475135181968277088315958312427313480771984874517274455070808286089278055166204573155093723933924226458522505574738359787477768274598805619392248788499020057331479403377350096157635924457653815121544961705226996087472416473967901157340721436252325091988301798899201640961322478421979046764449146045325215261829432737214561242087559734390139448919027470137649372264607375942527202021229200886927993079738795532281264345533044058574930108964976191133834748071751521214092905298139886778347051165211279789776682686753139533912795298973229094197221087871530034608077419911440782714084922725088980350599242632517985214513078773279630695469677448272705078125
>>>
The technique we need to know is exponentiation by squaring and modulus. We also need to use BigInteger in Java.
Simple code in Java:
BigInteger m = //BigInteger of 10^1000
BigInteger pow(BigInteger a, long b) {
if (b == 0) {
return BigInteger.ONE;
}
BigInteger val = pow(a, b/2);
if (b % 2 == 0)
return (val.multiply(val)).mod(m);
else
return (val.multiply(val).multiply(a)).mod(m);
}
In Java, the function modPow has done it all for you (thank Java).
Use congruence and apply modular arithmetic.
Square and multiply algorithm.
If you divide any number in base 10 by 10 then the remainder represents
the last digit. i.e. 23422222=2342222*10+2
So we know:
5=5(mod 10)
5^2=25=5(mod 10)
5^4=(5^2)*(5^2)=5*5=5(mod 10)
5^8=(5^4)*(5^4)=5*5=5(mod 10)
... and keep going until you get to that exponent
OR, you can realize that as we keep going you keep getting 5 as your remainder.
Convert the number to a string.
Loop on the string, starting at the last index up to 1000.
Then reverse the result string.
I posted a solution based on some hints here.
#include <vector>
#include <iostream>
using namespace std;
vector<char> multiplyArrays(const vector<char> &data1, const vector<char> &data2, int k) {
int sz1 = data1.size();
int sz2 = data2.size();
vector<char> result(sz1+sz2,0);
for(int i=sz1-1; i>=0; --i) {
char carry = 0;
for(int j=sz2-1; j>=0; --j) {
char value = data1[i] * data2[j]+result[i+j+1]+carry;
carry = value/10;
result[i+j+1] = value % 10;
}
result[i]=carry;
}
if(sz1+sz2>k){
vector<char> lastKElements(result.begin()+(sz1+sz2-k), result.end());
return lastKElements;
}
else
return result;
}
vector<char> calculate(unsigned long m, unsigned long n, int k) {
if(n == 0) {
return vector<char>(1, 1);
} else if(n % 2) { // odd number
vector<char> tmp(1, m);
vector<char> result1 = calculate(m, n-1, k);
return multiplyArrays(result1, tmp, k);
} else {
vector<char> result1 = calculate(m, n/2, k);
return multiplyArrays(result1, result1, k);
}
}
int main(int argc, char const *argv[]){
vector<char> v=calculate(5,8,1000);
for(auto c : v){
cout<<static_cast<unsigned>(c);
}
}
I don't know if Windows can show a big number (Or if my computer is fast enough to show it) But I guess you COULD use this code like and algorithm:
ulong x = 5; //There are a lot of libraries for other languages like C/C++ that support super big numbers. In this case I'm using C#'s default `Uint64` number.
for(ulong i=1; i<1234566789893943; i++)
{
x = x * x; //I will make the multiplication raise power over here
}
string term = x.ToString(); //Store the number to a string. I remember strings can store up to 1 billion characters.
char[] number = term.ToCharArray(); //Array of all the digits
int tmp=0;
while(number[tmp]!='.') //This will search for the period.
tmp++;
tmp++; //After finding the period, I will start storing 1000 digits from this index of the char array
string thousandDigits = ""; //Here I will store the digits.
for (int i = tmp; i <= 1000+tmp; i++)
{
thousandDigits += number[i]; //Storing digits
}
Using this as a reference, I guess if you want to try getting the LAST 1000 characters of this array, change to this in the for of the above code:
string thousandDigits = "";
for (int i = 0; i > 1000; i++)
{
thousandDigits += number[number.Length-i]; //Reverse array... ¿?
}
As I don't work with super super looooong numbers, I don't know if my computer can get those, I tried the code and it works but when I try to show the result in console it just leave the pointer flickering xD Guess it's still working. Don't have a pro Processor. Try it if you want :P

calculating the number of bits using K&R method with infinite memory

I got answer for the question, counting number of sets bits from here.
How to count the number of set bits in a 32-bit integer?
long count_bits(long n) {
unsigned int c; // c accumulates the total bits set in v
for (c = 0; n; c++)
n &= n - 1; // clear the least significant bit set
return c;
}
It is simple to understand also. And found the best answer as Brian Kernighans method, posted by hoyhoy... and he adds the following at the end.
Note that this is an question used during interviews. The interviewer will add the caveat that you have "infinite memory". In that case, you basically create an array of size 232 and fill in the bit counts for the numbers at each location. Then, this function becomes O(1).
Can somebody explain how to do this ? If i have infinite memory ...
The fastest way I have ever seen to populate such an array is ...
array[0] = 0;
for (i = 1; i < NELEMENTS; i++) {
array[i] = array[i >> 1] + (i & 1);
}
Then to count the number of set bits in a given number (provided the given number is less than NELEMENTS) ...
numSetBits = array[givenNumber];
If your memory is not finite, I often see NELEMENTS set to 256 (for one byte's worth) and add the number of set bits in each byte in your integer.
int counts[MAX_LONG];
void init() {
for (int i= 0; i < MAX_LONG; i++)
{
counts[i] = count_bits[i]; // as given
}
}
int count_bits_o1(long number)
{
return counts[number];
}
You can probably pre-populate the array more wiseley, i.e. fill with zeros, then every second index add one, then every fourth index add 1, then every eighth index add 1 etc, which might be a bit faster, although I doubt it...
Also, you might account for unsigned values.

Resources