What is the bit?

Everyone says that a bit is the smallest unit of information.
But I don't understand what that unit of information means.
Is a bit the smallest possible amount of information?
Please help me. Thank you.

The "unit of information" is the way we can present data, in this case it's in a digital form.
For example, base 10 (1, 2, 3 ... 9) is a unit of information. It's the way that we can present data with loads of different states. We can present the number 100 using base 10, the number 0xFF using base 16, we can even represent data using physical measurements like kilograms or grams, or megabytes and gigabytes.
These are just examples, but the main point is that a unit of information is just the way we can represent data.
The "smallest" representation we can have is a bit, because you only have 2 states: 1 and 0. And that's pretty much it.

Related

Get all possible valid positions of ships in battleship game

I'm creating a probability assistant for the Battleship game. In essence, for a given game state (field state and available ships), it should produce a field where every free cell gets a probability of a hit.
My current approach is a Monte Carlo-like computation (sketched below): pick a random free cell, a random ship and a random rotation, check whether this placement is valid, and if so continue with the next ship from the available set. When the available set is empty, push that ship arrangement onto an output stack. Repeat this many times and use the outputs to compute the probability of each cell.
Is there a sane algorithm to process all possible ship placements for a given field state?
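
For reference, a minimal sketch of the Monte Carlo approach described in the question, assuming a 10x10 board, an assumed fleet (1x5, 2x4, 3x3, 4x2), and that only missed shots are excluded (known hits are ignored for simplicity):

    import random
    from collections import Counter

    SIZE = 10
    SHIPS = [5, 4, 4, 3, 3, 3, 2, 2, 2, 2]   # assumed fleet: 1x5, 2x4, 3x3, 4x2

    def random_layout(blocked):
        """Place every ship at random, avoiding blocked (missed) cells.
        Returns the set of occupied cells, or None if a ship could not be placed."""
        occupied = set()
        for length in SHIPS:
            for _ in range(100):                       # bounded retries per ship
                horizontal = random.random() < 0.5
                r = random.randrange(SIZE - (0 if horizontal else length - 1))
                c = random.randrange(SIZE - (length - 1 if horizontal else 0))
                cells = {(r, c + i) if horizontal else (r + i, c) for i in range(length)}
                if not (cells & occupied) and not (cells & blocked):
                    occupied |= cells
                    break
            else:
                return None                            # give up on this sample
        return occupied

    def heatmap(blocked, samples=10000):
        """Estimate, for each cell, the probability that it holds a ship."""
        counts, valid = Counter(), 0
        for _ in range(samples):
            layout = random_layout(blocked)
            if layout is not None:
                counts.update(layout)
                valid += 1
        return {cell: counts[cell] / valid for cell in counts} if valid else {}

    # blocked = cells already shot at and reported as misses
    print(heatmap(blocked={(0, 0), (5, 5)}, samples=2000))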
An exact solution is possible, but it does not qualify as sane in my book.
Still, here is the idea.
There are many variants of the game, but let's say that we start with a worst case scenario of 1 ship of size 5, 2 of size 4, 3 of size 3 and 4 of size 2.
The "discovered state" of the board is all spots where shots have been taken, or ships have been discovered, plus the number of remaining ships. The discovered state naively requires 100 bits for the board (10x10, any can be shot) plus 1 bit for the count of remaining ships of size 5, 2 bits for the remaining ships of size 4, 2 bits for remaining ships of size 3 and 3 bits for remaining ships of size 2. This makes 108 bits, which fits in 14 bytes.
Now conceptually the idea is to figure out the map by shooting each square in turn in the first row, the second row, and so on, and recording the game state along with transitions. We can record the forward transitions and counts to find how many ways there are to get to any state.
Then find the end state of everything finished and all ships used and walk the transitions backwards to find how many ways there are to get from any state to the end state.
Now walk the data structure forward, knowing the probability of arriving at any state while on the way to the end, but this time we can figure out the probability of each way of finding a ship on each square as we go forward. Sum those and we have our probability heatmap.
Is this doable? In memory, no. In a distributed system it might be though.
Remember that I said that recording a state took 14 bytes? Adding a count to that takes another 8 bytes which takes us to 22 bytes. Adding the reverse count takes us to 30 bytes. My back of the envelope estimate is that at any point in our path there are on the order of a half-billion states we might be in with various ships left, killed ships sticking out and so on. That's 15 GB of data. Potentially for each of 100 squares. Which is 1.5 terabytes of data. Which we have to process in 3 passes.

If a Bitcoin mining nonce is just 32 bits long, how come it is increasingly difficult to find the winning hash?

I'm learning about mining, and the first thing that surprised me is that the nonce part of the algorithm, which is supposed to be randomly varied until you get a number smaller than the target hash, is just 32 bits long.
Can you explain why it is then so difficult to loop over an unsigned int, and how it becomes increasingly difficult over time? Thank you.
The task is: try different nonce values in your potential block until you reach a block having a hash value below some given threshold.
I can't find the source right now, but I'm quite sure that since the introduction of specialized mining ASICs, the 32-bit nonce is no longer enough to keep the miners busy for the planned 10-minute interval between blocks. They are able to compute 4 billion block hashes in less than 10 minutes.
Increasing the difficulty didn't help anymore, as that reached the point where none of the 4 billion possible nonce values gave a hash below the threshold.
So they found some additional fields in the block that are now used as nonce-extension. The principle is still the same: try different values until you reach a block with a hash below the threshold, only now it's more than 32 bits that can be varied, allowing for the threshold to be lowered beyond the former 32-bit-implied barrier.
Because it's not just the 32-bit nonce that is involved in the calculation. The 1 MB of transaction data is also part of the mining input. There is then a non-trivial amount of arithmetic to arrive at the output, which can then be compared with the target.
Bitcoin mining is looping over all 4 billion uints until you find a "right" one.
The way that difficulty is increased is that only some of the bits of the output matter. E.g. early on, the lowest 11 bits had to be some specific pattern and the remaining 21 bits could be anything. In theory there would be 2 million "right" values for each transaction block, uniformly distributed across the range of a uint. Then the "difficulty" is increased so that 13 bits have to be some pattern, so now there are 4x fewer "right" answers, and it takes (on average) 4x longer to find one.
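
A toy version of that loop, hashing made-up block data with double SHA-256 and treating "difficulty" as the number of leading zero bits required (real Bitcoin hashes an 80-byte block header and compares against a full 256-bit target, but the principle is the same):

    import hashlib

    def double_sha256(data: bytes) -> bytes:
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()

    def mine(block_data: bytes, difficulty_bits: int):
        """Try every 32-bit nonce until the hash falls below the target."""
        target = 1 << (256 - difficulty_bits)
        for nonce in range(2 ** 32):                     # the 32-bit search space
            h = double_sha256(block_data + nonce.to_bytes(4, "little"))
            if int.from_bytes(h, "big") < target:
                return nonce, h.hex()
        return None                                      # nonce space exhausted

    print(mine(b"example transactions", difficulty_bits=16))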

Does the length of numbers affect sorting time?

I have a simple question: does the length of the numbers that need to be sorted affect the sorting time?
Example: suppose we need to sort 10 million 6-digit numbers (like 204134) and 10 million 2/3-digit numbers (like 24 or 143), each set individually. Will the set of 6-digit numbers take more time than the set of 2/3-digit numbers?
I know the hardware uses one logic gate per digit, so 6 logic gates for 6 digits compared to 2/3 gates for the other set, but I don't know whether this affects the sorting time. Can someone explain this to me?
Help will be appreciated.
Thanks
The hardware works with bits, not with decimal digits. Furthermore, the hardware always works with the same fixed number of bits (for a given operation); smaller values are padded. For example, a 32-bit CPU usually has 32-bit comparator units with exactly as much circuitry as is needed for 32-bit comparisons, and it uses those regardless of whether the values currently being compared would fit into fewer bits.
Another issue with your thinking is that the exact number of logic gates doesn't matter much for performance. The propagation time of individual gates is much smaller than a clock cycle; only rather complicated circuits with long dependency chains actually take longer than a single cycle (and even those might be pipelined to still get a throughput of 1 op per cycle). A surprisingly large number of logic gates in sequence (and a virtually unlimited number of logic gates in parallel) can easily finish their work within one clock cycle. Hence, a well-designed 64-bit comparison doesn't take more clock cycles than an 8-bit one.
The short answer: It depends, but probably not
The longer answer:
It's hard to know, because you haven't said much about the hardware or the sorting algorithm. You mentioned later that you're using some MPI variant of Quicksort. So you're asking whether there could be a performance difference between the 6-digit numbers and the 2/3-digit numbers due to the hardware. Well, if you pack the smaller numbers into a narrower datatype, then you're going to get better bandwidth when transferring the dataset from memory to the processor. Since you haven't mentioned anything about compacted arrays, I'll assume you're not doing this. Once a value is in a register, it has the same latency and throughput regardless of how many digits it took to write down.
There are algorithms like radix sort that perform differently depending on the number of bits needed for your range of numbers. Since you're not using one of those, it doesn't apply.
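
A rough way to check the "probably not" empirically (pure Python and the built-in library sort rather than the MPI Quicksort from the question; any measured difference typically comes from duplicate-heavy data or memory effects rather than from the digit count itself):

    import random, time

    def time_sort(values):
        data = values[:]                    # copy so we always sort unsorted data
        start = time.perf_counter()
        data.sort()
        return time.perf_counter() - start

    n = 1_000_000
    six_digit = [random.randrange(100_000, 1_000_000) for _ in range(n)]
    small     = [random.randrange(10, 1_000) for _ in range(n)]   # 2/3-digit numbers

    print("6-digit:  ", time_sort(six_digit))
    print("2/3-digit:", time_sort(small))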

Efficient storage of matrix from interview questions

I had several interviews and failed some of them.
There was one question actually asked by three different companies. Not exactly the same each time, but they shared a common structure.
The question is: you have a matrix of 0s and 1s (for example, a useful user profile represented by "1" and a non-useful one by "0", or an image with pixel values of 1 and 0). You need to store the image in the system efficiently. What method should you use?
In my opinion, they were expecting me to come up with an efficient solution, so I told them to store the 0s and 1s together with a count.
For example, 00000011100011111
can be stored as 06 13 03 15 (a sketch of this encoding follows below).
I know there's an encoding method similar to this in multimedia or information technology.
But I don't think this is what they want.
Any ideas?
Thanks!
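
The scheme described in the question is essentially run-length encoding; a minimal sketch of encoding and decoding it (names are illustrative):

    from itertools import groupby

    def rle_encode(bits: str):
        """Turn a 0/1 string into (value, run length) pairs."""
        return [(value, len(list(run))) for value, run in groupby(bits)]

    def rle_decode(pairs):
        return "".join(value * count for value, count in pairs)

    encoded = rle_encode("00000011100011111")
    print(encoded)               # [('0', 6), ('1', 3), ('0', 3), ('1', 5)]
    print(rle_decode(encoded))   # the original string back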
It depends on what "efficient" means here.
A decent compromise between space and speed is to select a native datatype slightly bigger than the number of questions.
I.e. if you want to store 10 different binary values per item/person, use a short (that's what the 16-bit datatype is called in Java).
If you store these in an array (short[]), you can very quickly find the item/person by its id, used as the position in the array, and then extract a particular value with bitwise operations: shift a 1 to the position where the interesting bit is stored, then combine it with the stored short using the bitwise & (and) operator. If the resulting short != 0, you know that bit was set.
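
The same lookup sketched in Python, with plain integers playing the role of the Java short[] described above (all names are illustrative):

    answers = [0] * 1000    # one integer per person; bit i holds the answer to question i

    def set_answer(person: int, question: int, value: bool):
        if value:
            answers[person] |= (1 << question)      # set the bit
        else:
            answers[person] &= ~(1 << question)     # clear the bit

    def get_answer(person: int, question: int) -> bool:
        return (answers[person] >> question) & 1 == 1

    set_answer(42, 3, True)
    print(get_answer(42, 3), get_answer(42, 4))     # True False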

Algorithm for assigning a unique series of bits for each user?

The problem seems simple at first: just assign an id and represent that in binary.
The issue arises because the user is capable of changing any number of 0 bits to 1 bits. To clarify, the hash could go from 0011 to 0111 or 1111, but never to 1010. Each bit has an equal chance of being changed, and changes are independent of each other.
What would you have to store in order to go from hash -> user, assuming a low percentage of bit tampering by the user? I also assume failure in some cases, so the correct solution should have an acceptable error rate.
I would estimate the maximum number of bits tampered with to be about 30% of the total set.
I guess the acceptable error rate would depend on the number of hashes needed and the number of bits being set per hash.
I'm worried that with enough manipulation the id can no longer be reconstructed from the hash. The question I am asking, I guess, is what safeguards or unique positioning systems I can use to ensure the id can still be recovered.
Your question isn't entirely clear to me.
Are you saying that you want to validate a user based on a hash of the user ID, but are concerned that the user might change some of the bits in the hash?
If that is the question, then as long as you are using a proven hash algorithm (such as MD5), there is very low risk of a user manipulating the bits of their hash to get another user's ID.
If that's not what you are after, could you clarify your question?
EDIT
After reading your clarification, it looks like you might be after Forward Error Correction, a family of algorithms that allow you to reconstruct altered data.
Essentially with FEC, you encode each bit as a series of 3 bits and apply the "majority wins" principle when decoding. When encoding, you represent "1" as "111" and "0" as "000". When decoding, if most of the 3 encoded bits are 0, you decode that as 0; if most of them are 1, you decode that as 1.
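
A minimal sketch of that 3x repetition code with majority-vote decoding (illustrative only; real FEC schemes such as Hamming or Reed-Solomon codes are more space-efficient):

    def fec_encode(bits):
        """Each bit becomes three copies of itself."""
        return [b for bit in bits for b in (bit, bit, bit)]

    def fec_decode(encoded):
        """Majority vote over each group of three bits."""
        return [1 if sum(encoded[i:i + 3]) >= 2 else 0
                for i in range(0, len(encoded), 3)]

    code = fec_encode([1, 0, 1, 1])
    code[3] = 1                     # a user flips one 0 bit to 1
    print(fec_decode(code))         # still [1, 0, 1, 1]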
Assign each user an ID with the same number of bits set.
This way you can detect immediately if any tampering has occurred. If you additionally make the Hamming distance between any two IDs at least 2n, then you'll be able to reconstruct the original ID in cases where less than n bits have been set.
So you're trying to assign a "unique id" that will still remain a unique id even if it's changed to something else?
If the only "tampering" is changing 0's to 1's (but not vice-versa) (which seems fairly contrived), then you could get an effective 'ID' by assigning each user a particular bit position, set that bit to zero in that user's id, and to one in every other user's id.
Thus any fiddling by the user will result in corrupting their own id, but not allow impersonation of anyone else.
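
A tiny sketch of that scheme (the number of users and the bit width are arbitrary):

    NUM_USERS = 8

    def make_id(user_index: int, num_users: int = NUM_USERS) -> int:
        """All bits set except the user's own position."""
        return ((1 << num_users) - 1) & ~(1 << user_index)

    ids = [make_id(u) for u in range(NUM_USERS)]
    print([format(i, "08b") for i in ids])

    # Flipping the lone 0 bit turns the ID into all ones, which matches nobody.
    tampered = ids[3] | (1 << 3)
    print(tampered in ids)          # False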
The distance between two IDs (the number of bits you have to change to get from one word to the other) is called the Hamming distance. Error-correcting codes can correct up to half this distance and still give you the original word. If you assume that 30% of the bits can be tampered with, this means that the distance between two words should be 60% of the bits. This leaves 40% of that space to be used for IDs. As long as you randomly generate no more than 40% of the IDs you could have for a given number of bits (also including the error-correcting part), you should be able to recover the original ID.
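
A toy illustration of recovering an ID by Hamming distance (the IDs here are made up; a real scheme would generate them with a proper error-correcting code so the pairwise distances are guaranteed):

    def hamming(a: int, b: int) -> int:
        """Number of bit positions in which a and b differ."""
        return bin(a ^ b).count("1")

    def recover(tampered: int, ids: list) -> int:
        """Return the assigned ID closest to the tampered value."""
        return min(ids, key=lambda candidate: hamming(candidate, tampered))

    ids = [0b0000111100001111, 0b1111000011110000, 0b1010101010101010]
    tampered = ids[0] | 0b0000000000110000      # two 0 bits flipped to 1
    print(recover(tampered, ids) == ids[0])     # True while IDs stay far apart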
