Right now I have:
ifstream argfs (argv[1], ifstream::binary | ifstream::in);
int length;
argfs.seekg(0, argfs.end);
length = argfs.tellg();
I'm trying to find out how many bits are in a file, but when this runs it gives me the number of bytes. How can I get the number of bits? Isn't that what "::binary" is used for?
Thanks
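There are 8 bits in a byte, so just multiply your result by 8 to get the total number of bits. The ::binary flag does not change the unit: it only disables text-mode translation (such as newline conversion), so tellg() still reports a position in bytes.
ifstream argfs (argv[1], ifstream::binary | ifstream::in);
int length;
argfs.seekg(0, argfs.end);
length = argfs.tellg();
length *= CHAR_BIT; // from <climits>; 8 on virtually every platform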
length = argfs.tellg();
After seeking to the end, this returns the position of the file pointer, which is the number of characters in the file.
Note that 1 character = 1 byte.
Also, one byte consists of 8 bits, so just multiply it by 8:
length = 8*argfs.tellg();
Related
Consider this code snippet from the textbook I am currently studying, Randal E. Bryant and David R. O’Hallaron, Computer Systems: A Programmer’s Perspective, 3rd ed. (Pearson, 2016). It is the global edition, so the book's exercises could be wrong.
for (i = 31; i >= 0; i--) {
for (j = 31; j >= 0; j--) {
total_x += grid[i][j].x;
}
}
for (i = 31; i >= 0; i--) {
for (j = 31; j >= 0; j--) {
total_y += grid[i][j].y;
}
}
And this is the information given:
The heart of the recent hit game SimAquarium is a tight loop that calculates the
average position of 512 algae. You are evaluating its cache performance on a
machine with a 2,048-byte direct-mapped data cache with 32-byte blocks (B = 32).
struct algae_position {
int x;
int y;
};
struct algae_position grid[32][32];
int total_x = 0, total_y = 0;
int i, j;
You should also assume the following:
sizeof(int) = 4.
grid begins at memory address 0.
The cache is initially empty.
The only memory accesses are to the entries of the array grid.
Variables i, j, total_x, and total_y are stored in registers.
The book gives the following questions as practice:
A. What is the total number of reads?
Answer given : 2048
B. What is the total number of reads that miss in the cache?
Answer given : 1024
C. What is the miss rate?
Answer given: 50%
I'm guessing that for A, the answer is derived from 32*32*2: 32*32 for the dimensions of the matrix, and 2 because there are 2 separate loops for the x and y values. Is this correct? How should the total number of reads be counted?
How do we calculate the total number of cache misses and the miss rate? I read that the miss rate is 1 - hit rate.
Question A
You are correct about 32 x 32 x 2 reads.
Question B
The loops count down from 31 to 0, but that doesn't matter for this question: the answer is the same for loops going from 0 to 31. Since that is a bit easier to explain, I'll assume increasing loop counters.
When you read grid[0][0], you'll get a cache miss. This will bring grid[0][0], grid[0][1], grid[0][2] and grid[0][3] into the cache. This is because each element is 2x4 = 8 bytes and the block size is 32. In other words: 32 / 8 = 4 grid elements in one block.
So the next cache miss is for grid[0][4] which again will bring the next 4 grid elements into the cache. And so on... like:
miss
hit
hit
hit
miss
hit
hit
hit
miss
hit
hit
hit
...
So in the first loop you simply have:
"Number of grid elements" divided by 4.
or
32 * 32 / 4 = 256
In general in the first loop:
Misses = NumberOfElements / (BlockSize / ElementSize)
so here:
Misses = 32*32 / (32 / 8) = 256
Since the cache is only 2048 bytes and the whole grid is 32 x 32 x 8 = 8192 bytes, nothing read into the cache in the first loop is still there when the second loop needs it, so the second loop gets no hits from the first loop's work. In other words, both loops have 256 misses.
So the total number of cache misses are 2 x 256 = 512.
Also notice that there seems to be a bug in the book.
Here:
The heart of the recent hit game SimAquarium is a tight loop that calculates the
average position of 512 algae.
^^^
Hmmm... 512 elements...
Here:
for (i = 31; i >= 0; i--) {
for (j = 31; j >= 0; j--) {
^^^^^^
hmmm... 32 x 32 is 1024
So the loops access 1024 elements, but the text says 512. Something is wrong in the book.
Question C
Miss rate = 512 misses / 2048 reads = 25 %
note:
Strictly speaking, we cannot say for sure that the element size is twice the integer size, because the C standard allows structs to contain padding. So in principle there could be 8 bytes of padding in the struct (i.e. an element size of 16), and that would give the results the book states.
Let's say x = 1110 (14 in decimal) and I want to find the 2nd set bit from the right, 0100 (4 in decimal).
As another example, let's say x = 10110010 (178 in decimal) and I want the 3rd set bit from the right, i.e. 00100000 (32 in decimal).
How can I find it? Is there a bit hack?
Subtracting one from a number will clear the least-significant bit which was set, while setting bits below that. ANDing with the original number will then leave a number which was equal to the original except with the original lowest set bit clear. This procedure may be iterated N times to yield a number with the lowest N set bits clear. The bit which is changed by the Nth iteration (if any) will be the Nth lowest bit that was set in the original.
Assuming a two's complement signed 32-bit integer called number is the input (hence only counting bits 0 to 30 in the for loop):
int number = (1 << 3) | 1; // this is the input, it can be whatever you like
int currentLsbCount = 0;
int desiredLsbCount = 2; // this is your n
int foundLsb = 0;
int foundLsbIndex = 0;
for (int i = 0; i < 31; i++)
{
int bit = (number >> i) & 1;
if (bit == 1)
{
++currentLsbCount;
}
if (currentLsbCount == desiredLsbCount)
{
foundLsb = number & (1 << i);
foundLsbIndex = i;
break;
}
}
foundLsb will hold the value of the bit, or zero if the input has fewer than desiredLsbCount set bits; foundLsbIndex will hold the index of the bit.
As far as I know you would have to iterate. There is no quicker method than looping through the bits. You could add some skip logic in, but it would not improve the worst case timing. For instance:
if ((number & ((1 << x) - 1)) == 0)
{
// the bottom x bits are zero, so they can be skipped...
}
This would increase the number of operations for the worst case.
In VB.NET, I'd possibly do the following:
Private Function ReturnBit(input As Long, num As Long) As Long
Dim iResult As Long = 0 'Counts set bits.
Dim work As Long = input 'Working copy of input.
'Looping from the LSB to the MSB of a byte. Adjust for desired
'length, 15 for 2 bytes, 31 for 4 bytes, etc.
For i As Integer = 0 To 7
'If the working variable is 0, the input does not contain as
'many set bits as required. Return -1 if you wish.
If work = 0 Then Return 0
'Add the now LSB if 1, 0 otherwise.
iResult += (work And 1)
'iResult contains the number of set bits now. If this is
'the requested number, return this number. If you're just after
'the position, just return i instead. Instead of 2^i it could be
'more efficient to use 1<<i, but I'd rely on the compiler for
'this.
If iResult = num Then Return CLng(2 ^ i)
'Remove the LSB from the working copy.
work >>= 1
Next
Return 0 'Not enough set bits in input.
End Function
I found this problem:
Consider sequences of 36 bits. Each such sequence has 32 5-bit sequences consisting of adjacent bits. For example, the sequence 1101011… contains the 5-bit sequences 11010, 10101, 01011, …. Write a program that prints all 36-bit sequences with the two properties:
1. The first 5 bits of the sequence are 00000.
2. No two 5-bit subsequences are the same.
So I generalized the problem: find all n-bit sequences whose k-bit subsequences satisfy the above requirements. However, the only approach I can think of is a brute-force search: generate all n-bit sequences whose first k bits are zero, then for each sequence check whether all k-bit subsequences are unique. This is clearly not a very efficient approach. Is there a better way to solve the problem?
Thanks.
The simplest approach seems to be backtracking. You can keep track of which 5-bit sequences you've seen with a flat array. At each bit, try appending a 0 (counter = (counter & 0x0f) << 1) and check whether you've seen that window before; then try counter | 1 and follow that path too.
There are probably more efficient algorithms that can prune the search space faster. This seems related to https://en.wikipedia.org/wiki/De_Bruijn_sequence. I am not certain, but I believe that it is actually equivalent; that is, the last five digits of the sequence will have to be 10000, making it cyclic.
EDIT: here's some C code. It is less efficient than it could be in terms of space, because of the recursion, but it is simple. The worst part is the mask management. It appears I was correct about De Bruijn sequences: this finds all 2048 of them.
#include <stdio.h>
#include <stdlib.h>
/* Returns a static buffer holding the 32-bit binary form of val. */
char *binprint(unsigned val) {
    static char res[33];
    int i;
    res[32] = 0;
    for (i = 0; i < 32; i++) {
        res[31 - i] = (val & 1) + '0';
        val = val >> 1;
    }
    return res;
}
/* mask: bit v is set once the 5-bit window v has been used.
   counter: the sequence so far; its low 5 bits are the current window. */
void checkPoint(unsigned mask, unsigned counter) {
    // Get the appropriate bit in the mask (unsigned, so 1u << 31 is well defined)
    unsigned idxmask = 1u << (counter & 0x1f);
    // Abort if we've seen this suffix before
    if (mask & idxmask) {
        return;
    }
    // Update the mask
    mask = mask | idxmask;
    // We're done if we've hit all 32
    if (mask == 0xffffffffu) {
        // counter holds the last 32 of the 36 bits; the first 4 are always 0
        printf("%10u 0000%s\n", counter, binprint(counter));
        return;
    }
    checkPoint(mask, counter << 1);
    checkPoint(mask, (counter << 1) | 1);
}
int main(void) {
    checkPoint(0, 0);
    return 0;
}
I remember seeing this exact problem in a programming interview questions book of mine. Here is their solution:
Hope it helps. Cheers.
I implemented an insertion sort in C, and someone who was helping me told me to make something a pointer, as shown in the following line near the end. Why?
size_t size = sizeof( array ) / sizeof( *array );
Why is the second one a pointer to the array, and what does size_t do?
sizeof(array) = size, in bytes, of the entire array;
sizeof(*array) = size, in bytes, of the first item in the array;
As items in a C array are of uniform size, dividing the first by the second gives the number of items in the array.
size_t is an unsigned integer type large enough to store the size of any object the computer can store in memory. It is often the same as unsigned int or unsigned long, depending on the platform, but it's not guaranteed to be either, and there's semantic value in it being a distinct type.
Why is the second one a pointer to array
Example 1
char a[5];
sizeof(a)=5
sizeof(*a)=1
So, size = 5/1 = 5 // this is the number of elements in the array
Example 2
int a[5];
sizeof(a)=20 // assuming a 4-byte int
sizeof(*a)=4
So, size = 20/4 = 5 // this is the number of elements in the array
and what does size_t do?
Read: What is size_t in C?
This was asked in my Google interview recently. I offered an answer that used bit shifts and was O(n), but the interviewer said it was not the fastest way to do it. I don't understand: is there a way to count the set bits without iterating over all of the bits provided?
Brute force: 10000 * 16 * 4 = 640,000 ops (shift, compare, increment, and loop iteration for each bit of each 16-bit word).
Faster way:
We can build a table mapping 00-FF to the number of bits set: 256 * 8 * 4 = 8192 ops.
I.e. we build a table where, for each byte, we calculate the number of bits set.
Then for each 16-bit int we split it into upper and lower bytes:
for (uint16_t n : array) {
    uint8_t lo = n & 0xFF;  // lower 8 bits
    uint8_t hi = n >> 8;    // upper 8 bits
    // simply add the number of bits in the upper and lower parts
    // of each 16-bit number, using the pre-calculated table
    k += table[lo] + table[hi];
}
The iteration takes 60,000 ops in total, i.e. 68,192 ops overall. It's still O(n), but with a smaller constant (~9 times smaller).
In other words, we calculate the number of set bits for every 8-bit number, and then split each 16-bit number into two 8-bit halves in order to count its set bits using the pre-built table.
There's (almost) always a faster way. Read up about lookup tables.
I don't know what the correct answer was when this question was asked, but I believe the most sensible way to solve this today is to use the POPCNT instruction. Since we just want the total number of set bits, the boundaries between 16-bit elements are of no interest to us, and since the 32-bit and 64-bit POPCNT instructions are equally fast, you should use the 64-bit version to count four elements' worth of bits per cycle.
I just implemented it in Java:
import java.util.Random;
public class Main {
    static int array_size = 1024;
    static int[] array = new int[array_size];  // each element holds a 16-bit value
    static int[] table = new int[256];         // table[b] = number of set bits in byte b
    static int total_bits_in_the_array = 0;
    private static void create_table() {
        for (int i = 0; i < 256; i++) {
            int bits_set = 0;
            for (int z = 0; z < 8; z++) {
                bits_set += (i >> z) & 0x1;
            }
            table[i] = bits_set;
        }
    }
    public static void main(String[] args) {
        create_table();
        fill_array();
        parse_array();
        System.out.println("The amount of bits in the array is: " + total_bits_in_the_array);
    }
    private static void parse_array() {
        for (int i = 0; i < array.length; i++) {
            int current = array[i];
            int down = current & 0xff;       // low byte
            int up = (current >> 8) & 0xff;  // high byte, shifted down so it indexes the table
            total_bits_in_the_array += table[up] + table[down];
        }
    }
    private static void fill_array() {
        Random ran = new Random();
        for (int i = 0; i < array.length; i++) {
            array[i] = ran.nextInt(1 << 16);  // random 16-bit value in [0, 65535]
        }
    }
}
Also at https://github.com/leitao/bits-in-a-16-bits-integer-array/blob/master/Main.java
You can pre-calculate the bit counts of bytes and then use those for lookup. It is faster under certain assumptions.
Number of operations (counting only computation, not reading the input):
Shift approach:
For each 16-bit word:
2 ops (shift, add) times 16 bits = 32 ops and 0 memory accesses; times 10 000 words = 320 000 ops + 0 memory accesses
Pre-calculation approach:
255 times 2 ops (shift, add) times 8 bits = 4080 ops + 255 mem access (write the result)
For each 16-bit word:
2 ops (compute addresses) + 2 memory accesses + 1 op (add the results); times 10 000 words = 30 000 ops + 20 000 memory accesses
Total of 34 080 ops + 20 255 memory accesses
So there are many more memory accesses, but far fewer operations.
Thus, assuming everything else is equal, pre-calculation for 10 000 words is faster as long as one memory access costs less than (320 000 - 34 080) / 20 255 ≈ 14.1 operations.
This is probably true if you are alone on a dedicated core of a reasonably modern machine, as the 255-byte table should fit in the cache. If you start getting cache misses, the assumption might no longer hold.
Also, this math assumes pointer arithmetic and direct memory access as well as atomic operations and atomic memory access. Depending on your language of choice (and, apparently, based on previous answers, your choice of compiler switches), that assumption might not hold.
Finally, things get more interesting if you consider scalability: shifting can easily be parallelised onto up to 10000 cores, but pre-computation not necessarily. As the number of bytes goes up, however, lookup becomes more and more advantageous.
So, in short: yes, pre-calculation is faster under pretty reasonable assumptions, but no, it is not guaranteed to be faster.