What is a bit set "high"? Simple explanation - bit

I know a set bit is a 1 (as opposed to a 0), but I can't seem to find a quick, simple explanation of what a bit set high is (in a 32 bit integer).

"Bit set" or "bit set high" means the bit is 1.
Example: suppose I have the 8-bit number 10101010.
To check whether the 5th bit is set (or set high), I look at the 5th bit: if it is 1, it is set (set high); otherwise it is not set (set low). The same applies to a 32-bit integer.
See LINK for a programmatic example as well.
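As a minimal sketch of that check in Python: the helper name `is_bit_set` is hypothetical, and the numbering convention (position 0 is the least significant, rightmost bit) is an assumption, since the answer does not say which end it counts from.

```python
def is_bit_set(value: int, position: int) -> bool:
    """Return True if the bit at `position` is 1 (position 0 = least significant bit)."""
    # Shift the wanted bit down to position 0, then mask off everything else.
    return ((value >> position) & 1) == 1


number = 0b10101010

print(is_bit_set(number, 1))  # True: bit 1 is 1, so it is "set" / "high"
print(is_bit_set(number, 0))  # False: bit 0 is 0, so it is "clear" / "low"
```

The same function works unchanged for a 32-bit value; only the range of valid positions (0 to 31) differs.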

High, Set and 1 are synonymous in binary computing.
High is just the term more likely to be used by someone dealing with the hardware implementation, and refers to a higher voltage (low being the complement) representing the bit.
E.g. from https://www.electronics-tutorials.ws/binary/bin_1.html:
Generally, a logic “1” represents a higher voltage, such as 5 volts,
which is commonly referred to as a HIGH value, while a logic “0”
represents a low voltage, such as 0 volts or ground, and is commonly
referred to as a LOW value. These two discrete voltage levels
representing the digital values of “1’s” (one’s) and “0’s” (zero’s)
are commonly called: BInary digiTS, and in digital and computational
circuits and applications they are normally referred to as binary
BITS.


Is it acceptable to use each byte of a PRNG-generated number separately?

Say you have a non-cryptographically secure PRNG that generates 64-bit output.
Assuming that bytes are 8 bits, is it acceptable to use each byte of the 64-bit output as separate 8-bit random numbers or would that possibly break the randomness guarantees of a good PRNG? Or does it depend on the PRNG?
Because the PRNG is not cryptographically secure, the "randomness guarantee" I am worried about is not security, but whether the byte stream has the same guarantee of randomness, using the same definition of "randomness" that PRNG authors use, that the PRNG has with respect to its 64-bit output.
This should be quite safe with a CSPRNG. For comparison, it is like reading /dev/random byte by byte. With a good CSPRNG it is also perfectly acceptable to generate a 64-bit sample 8 times and pick 8 bits from each sample (throwing away the other 56 bits).
With PRNGs that are not CSPRNG you will have 'security' concerns in terms of the raw output of the PRNG that outweigh whether or not you chop up output into byte sized chunks.
In all cases it is vital to make sure the PRNG is seeded and periodically re-seeded correctly (so as to flush any possibly compromised internal state regularly). Security depends on the unpredictability of your internal state, which is ultimately driven by the quality of your seed input. One thing good CSPRNG implementations will do for you is to pessimistically estimate the amount of captured 'entropy' to safeguard the output from predictable internal state.
Note however that with 8 bits you only have 256 possible outputs in any case, so it becomes more of a question of how you use this. For instance, if you do something like XOR based encryption against the output of a PRNG (i.e. treating it as a one time pad based on some pre shared secret seed), then using a known plain text attack may relatively easily reveal the contents of the internal state of the PRNG. That is another type of attack which good CSPRNG implementations are supposed to guard against by their design (using e.g. a computationally secure hash function).
EDIT to add: if you don't care about 'security' but only need the output to look random, then this should be quite safe. In theory a good PRNG is just as likely to yield a 0 as a 1 in every bit position, and that should not vary from one octet to another, so you expect a uniform distribution of possible output values. One way to verify whether splitting skews the distribution is to run a Monte Carlo simulation of some reasonably large size (e.g. 1M samples) and compare 256-bin histograms for both the raw 64-bit output and the 8 * 8-bit output. You expect a roughly flat histogram in both cases if the uniform distribution is preserved.
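A rough sketch of the byte-histogram half of that check follows. It assumes Python's standard `random.Random` (a Mersenne Twister) as a stand-in for "a good non-cryptographic PRNG", uses 100,000 samples rather than 1M to keep it quick, and the helper name `byte_histogram` is made up for illustration. A flat histogram is only a basic sanity check, not proof of quality.

```python
import random
from collections import Counter


def byte_histogram(rng: random.Random, samples: int) -> Counter:
    """Draw `samples` 64-bit outputs and count every byte value they contain."""
    counts = Counter()
    for _ in range(samples):
        word = rng.getrandbits(64)
        # Split the 64-bit word into its 8 bytes and tally each one.
        counts.update(word.to_bytes(8, "little"))
    return counts


SAMPLES = 100_000
hist = byte_histogram(random.Random(12345), SAMPLES)

expected = 8 * SAMPLES / 256  # count per bin if the bytes are uniform
worst = max(abs(hist[b] - expected) / expected for b in range(256))
print(f"largest relative deviation from a flat histogram: {worst:.2%}")
```

With a generator whose bytes are uniform, the largest deviation should be a few percent at this sample size and shrink as the sample count grows.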
It depends on the generator and its parameterization. Quoting from the Wikipedia page for Linear Congruential Generators: "The low-order bits of LCGs when m is a power of 2 should never be relied on for any degree of randomness whatsoever. [...]any full-cycle LCG when m is a power of 2 will produce alternately odd and even results."
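The quoted behaviour is easy to reproduce. The sketch below assumes one particular full-cycle LCG with m = 2^32 (the multiplier and increment are the widely published "Numerical Recipes" constants); the generator function `lcg` is written here only for illustration.

```python
from itertools import islice


def lcg(seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
    """Yield successive states of the LCG x -> (a*x + c) mod m."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield state


# With a and c both odd and m a power of two, the lowest bit flips on every step.
low_bits = [x & 1 for x in islice(lcg(1), 16)]
print(low_bits)  # alternates 0, 1, 0, 1, ...
```

So if the bytes of such a generator's output were used separately, the lowest byte would be far weaker than the highest one.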

Fixed Point Multiplication for FFT

I’m writing a Radix-2 DIT FFT algorithm in VHDL, which requires some fractional multiplication of input data by a Twiddle Factor (TF). I use fixed-point arithmetic to achieve that, with every word being 16 bits long, where 1 bit is a sign bit and the rest is distributed between integer and fraction. Hence my dilemma:
I have no idea, in what range my input data will be, so if I just decide that 4 bits go to integer and the rest 11 bits to fraction, in case I get integer numbers higher than 4 bits = 15 decimal, I’m screwed. The same applies if I do 50/50, like 7 bits to integer and the rest to fraction. If I get numbers, which are very small, I’m screwed because of truncation or rounding, i.e:
Let’s assume I have an integer "3"(0000 0011) on input and TF of "0.7071" ( 0.10110101 - 8 bit), and let’s assume, for simplicity, my data is 8 bit long, therefore:
3x0.7071 = 2.1213
3x0.7071 = 0000 0010 . 0001 1111 = 2.12109375 (for 16 bits).
Here comes the trick - I need to up/down round or truncate 16 bits to 8 bits, therefore, I get 0000 0010, i.e 2 - the error is way too high.
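The arithmetic in this example can be reproduced with plain integers. This is only a sketch of the numbers above, written in Python rather than VHDL; the names `FRAC_BITS`, `truncated` and `rounded` are illustrative.

```python
FRAC_BITS = 8  # the twiddle factor has 8 fractional bits in this example

x = 3                                   # integer input, 0000 0011
tf = round(0.7071 * (1 << FRAC_BITS))   # 181, i.e. 0.10110101 in binary
product = x * tf                        # 543, i.e. 0000 0010 . 0001 1111

print(product / (1 << FRAC_BITS))       # 2.12109375, the full 16-bit result

# Dropping the 8 fractional bits again, by truncation or by round-to-nearest:
truncated = product >> FRAC_BITS
rounded = (product + (1 << (FRAC_BITS - 1))) >> FRAC_BITS
print(truncated, rounded)               # 2 2
```

Both truncation and rounding give 2 here, so the whole fractional part (about 0.12) is lost, which is the error the question is worried about.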
My questions are:
How would you solve this problem of range vs precision if you don’t know the range of your input data AND you would have numbers represented in fixed point?
Should I make a process, which decides after every multiplication where to put the comma? Wouldn’t it make the multiplication slower?
The Xilinx IP Core has 3 different modes of fixed-point arithmetic: Unscaled (similar to what I want to do, just truncate in case overflow happens), Scaled fixed point (I would assume that in this case it decides after each multiplication where the comma should be and what should be rounded) and Block Floating Point (no idea what it is or how it works; I would appreciate an explanation). So how does this IP Core decide where to put the comma? If the decision is made depending on the highest value in my dataset, then in case I have just 1 high peak and the rest of the data is low, the error will be very high.
I will appreciate any ideas or information on any known methods.
You don't need to know the fixed-point format of your input. You can safely treat it as normalized -1 to 1 range or full integer-range.
The reason is that your output will have the same format as the input or, more likely for an FFT, a known relationship to it, such as an increase of 3 bits, which would mean the output has 3 more integer bits than the input.
It is the core user's burden to know where the decimal point will end up; you do, of course, have to document the change in dynamic range.
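One way to read this point, sketched in Python under assumptions the answer does not spell out: the twiddle factor is stored with 15 fractional bits (often called Q15), the product is rounded back to the input's width, and the helper `q15_mul` is hypothetical. The multiplier never needs to know how the caller interprets the input word.

```python
Q = 15  # number of fractional bits in the twiddle factor (assumed)

TWIDDLE = round(0.7071 * (1 << Q))  # 0.7071 as a Q15 integer


def q15_mul(a: int, b: int) -> int:
    """Multiply a data word by a Q15 factor, rounding back to the data word's format."""
    return (a * b + (1 << (Q - 1))) >> Q


word = 16384
result = q15_mul(word, TWIDDLE)

# The same result word is valid under either interpretation of the input:
print(result)             # as a plain integer: about 16384 * 0.7071
print(result / (1 << Q))  # as a normalized fraction: about 0.5 * 0.7071
```

The format of the result follows the format of the input, so tracking where the point sits is bookkeeping for the user of the core rather than something the hardware decides.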

How to uniquely represent 99,999 bits as a byte, word, or double word

I have 99,999 bit flags that I need to represent uniquely with 32 bits or less. Any of the bits can be set and I need to know if the set bits differ from a comparable set of bits. I am considering using CRC to store a unique hash value but I am not sure if collisions will be a problem. Ideally, fewer than 500 of these bits will be set at any given time, but they will not be known ahead of time.
Is there a suitable hash or other algorithm to uniquely represent these bits?
NO!
Without some other information about those bit flags to identify that certain combinations are impossible, this cannot be done. If all combinations are possible, then you will need to use 99,999 bits to store your 99,999 bit flags.
Edit:
Based on the background information that this is to reduce network usage and the expectation is that only about 500 of the bits are set, there are techniques that can be used, but none are a simple hash, and none are efficient enough to store in 32 bits. I would start by looking at Arithmetic Coding. This uses a probability distribution of the characters that you want to send (0.5% 1, 99.5% 0) to compress data. By my computations, you can "expect" a compression of about 22 times. But, for signals that are considered rare, you will pay the price by needing to transmit a signal larger than your starting 99,999 bits.
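The "about 22 times" figure can be checked with the binary entropy function, which gives the best average compression any coder can reach for independent bits. This is a sketch assuming each flag is set independently with probability 500/99,999; the names `binary_entropy`, `ratio` and `expected_bits` are illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a single flag that is 1 with probability p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


N = 99_999   # number of flags
p = 500 / N  # assumed probability that any one flag is set

h = binary_entropy(p)
ratio = 1 / h          # best achievable average compression ratio
expected_bits = N * h  # average compressed size in bits

print(f"compression ratio ~ {ratio:.1f}x")
print(f"expected size ~ {expected_bits:.0f} bits")
```

The ratio comes out at roughly 22, which still leaves several thousand bits on average, far more than the 32 bits the question hoped for.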

Why use 1 instead of -1?

At the 29-minute mark of http://channel9.msdn.com/Events/GoingNative/2013/Writing-Quick-Code-in-Cpp-Quickly, Andrei Alexandrescu says that when using constants you should prefer 0, and mentions that hardware knows how to handle it. I have done some assembly, so I know what he is talking about, including the zero flag on CPUs.
Then he says to prefer the constant 1 rather than -1. IIRC, -1 is not actually special, but because it is negative the sign flag on CPUs would be set. From my current understanding, 1 is simply a positive number: there is no processor flag bit for it and no way to distinguish it from 0 or other positive numbers.
But Andrei says to prefer 1 over -1. Why? What does hardware do with 1 that is better than with -1?
First, it should be noted that Andrei Alexandrescu emphasized the difference between zero and the other two good constants; the difference between using one and negative one is less significant. He also bundles compiler issues with hardware issues, i.e., the hardware might be able to perform the operation efficiently but the compiler will not generate the appropriate machine code given a reasonably clear expression in the chosen higher-level language.
While I cannot read his mind, there are at least two aspects that may make one better than negative one.
Many ISAs provide comparison operations (or flag to GPR transfers) that return zero or one (e.g., MIPS has Set on Less Than) not zero or negative one. (SIMD instructions are an exception; SIMD comparisons typically generate zero or negative one [all bits set].)
At least one implementation of SPARC made loading smaller signed values more expensive, and I seem to recall that at least one ISA did not provide an instruction for loading signed bytes. Naive implementation of sign extension adds latency because whether to set or clear the more significant bits is not known until the value has been loaded.
Negative one does have some benefits. As you mentioned, testing for negativity is often relatively easy, so if negative one is the only negative value used it may be handled less expensively. Also, conditionally clearing a value based on zero or negative one is simply an and operation. (For conditionally setting or clearing a single bit, one rather than negative one would be preferred since such would involve only a shift and an and.)
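The two idioms mentioned here can be sketched with Python integers, which behave like two's-complement values of unlimited width under bitwise operators. The function names are hypothetical, and the single-bit case is shown for setting a bit with OR, which is one reading of the parenthetical above.

```python
def keep_or_clear(value: int, mask: int) -> int:
    """Keep `value` when mask is -1 (all bits set), clear it when mask is 0."""
    return value & mask


def set_bit_if(value: int, flag: int, position: int) -> int:
    """Set bit `position` of `value` when flag is 1; leave it alone when flag is 0."""
    return value | (flag << position)


print(hex(keep_or_clear(0x1234, -1)))  # 0x1234: negative one keeps every bit
print(hex(keep_or_clear(0x1234, 0)))   # 0x0: zero clears every bit
print(bin(set_bit_if(0b1000, 1, 1)))   # 0b1010: a flag of one shifts into place
print(bin(set_bit_if(0b1000, 0, 1)))   # 0b1000: a flag of zero changes nothing
```

A comparison result of 0 or 1 can be turned into a 0 or -1 mask by negating it, which is the extra step the choice between the two constants trades off.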

How exactly does a PC/Mac generate a random 0 or 1?

This question is NOT about how to use any language to generate a random number between any interval. It is about generating either 0 or 1.
I understand that many random generator algorithms build on a very basic random (0 or 1) function, take a seed from the user, and use an algorithm to generate various random numbers as needed.
The question is: how does the CPU generate either 0 or 1? If I toss a coin, I get heads or tails. That's because I physically toss the coin and let nature decide. But how does a CPU do it? There must be an action that the CPU performs (like tossing a coin) to get either 0 or 1 randomly, right?
Could anyone tell me about it?
Thanks
(This has several facets and thus several algorithms. Keep in mind that there are many different forms of randomness used for different purposes, but I understand your question in the way that you are interested in actual randomness used for cryptography.)
The fundamental problem here is that computers are (mostly) deterministic machines. Given the same input in the same state they always yield the same result. However, there are a few ways of actually gathering entropy:
User input. Since users bring outside input into the system, you can derive some bits from it, similar to how you could use radioactive decay or line noise.
Network activity. Again, an outside source of stuff.
Generally interrupts (which kinda include the first two).
As alluded to in the first item, noise from peripherals, such as audio input or a webcam can be used.
There is dedicated hardware that can generate a few hundred MiB of randomness per second. Usually they give you random numbers directly instead of their internal entropy, though.
How exactly you derive bits from that is up to you but you could use time between events, or actual content from the events, etc. – generally eliminating bias from entropy sources isn't easy or trivial and a lot of thought and algorithmic work goes into that (in the case of the aforementioned special hardware this is all done in hardware and the code using it doesn't need to care about it).
Once you have a pool of actually random bits you can just use them as random numbers (/dev/random on Linux does that). But this has downsides, since there is usually little actual entropy and possibly a higher demand for random numbers. So you can invent algorithms to “stretch” that initial randomness in a manner that makes it still impossible or at least very difficult to predict anything about following numbers (/dev/urandom on Linux or both /dev/random and /dev/urandom on FreeBSD do that). Fortuna and Yarrow are so-called cryptographically secure pseudo-random number generators and designed with that in mind. You still have a very good guarantee about the quality of random numbers you generate, but have many more before your entropy pool runs out.
In any case, the CPU itself cannot give you a random 0 or 1. There's a lot more involved and this usually includes the complete computer system or special hardware built for that purpose.
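From a program's point of view, all of that machinery sits behind the operating system's random source. A minimal sketch in Python, using the standard `os.urandom` call (which reads from the OS-provided CSPRNG); the helper name `random_bit` is made up for illustration:

```python
import os


def random_bit() -> int:
    """Return 0 or 1 taken from the operating system's random source."""
    # Ask the OS for one random byte and keep only its lowest bit.
    return os.urandom(1)[0] & 1


print([random_bit() for _ in range(16)])
```

Requesting a whole byte per bit is wasteful; a real program would usually draw many bytes at once and consume them bit by bit.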
There is also a second class of computational randomness: Plain vanilla pseudo-random number generators (PRNGs). What I said earlier about determinism – this is the embodiment of it. Given the same so-called seed a PRNG will yield the exact same sequence of numbers every time¹. While this sounds idiotic it has practical benefits.
Suppose you run a simulation involving lots of random numbers, maybe to simulate interaction between molecules or atoms that involve certain probabilities and unpredictable behaviour. In science you want results anyone can independently verify, given the same setup and procedure (or, with computing, the same algorithms). If you used actual randomness the only option you have would be to save every single random number used to make sure others can replicate the results independently.
But with a PRNG all you need to save is the seed and remember what algorithm you used. Others can then get the exact same sequence of pseudo-random numbers independently. Very nice property to have :-)
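A small sketch of that reproducibility, assuming Python's standard `random` module as the agreed algorithm; the function `simulate` and the seed value are placeholders.

```python
import random


def simulate(seed: int, n: int = 10) -> list:
    """Return n pseudo-random bits produced from the given seed."""
    rng = random.Random(seed)  # a private generator, independent of global state
    return [rng.randint(0, 1) for _ in range(n)]


first_run = simulate(2024)
second_run = simulate(2024)

print(first_run)
print(first_run == second_run)  # True: the same seed gives the same sequence
```

Publishing the seed together with the name and version of the generator is enough for someone else to regenerate the same stream.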
Footnotes
¹ This even includes the CSPRNGs mentioned above, but they are designed to be used in a special way that includes regular re-seeding with entropy to overcome that problem.
A computer's basic random-number routine generates a uniform random number, U(0,1), which ranges from 0 to 1. Mathematically, it is defined as a random variable U in the range [0,1]. Examples of random draws of U(0,1) would be 0.28100002, 0.34522, 0.7921, etc. Every value between 0 and 1 is equally likely, i.e., they are equiprobable.
You can generate binary random variates that are either 0 or 1 by mapping a random draw of U(0,1) to 0 if U(0,1)<=0.5 and to 1 if U(0,1)>0.5, since in theory there will be an equal number of random draws of U(0,1) below 0.5 and above 0.5.
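That thresholding rule can be sketched as follows, assuming Python's `random.random()` as the U(0,1) source (it returns floats in the half-open range [0.0, 1.0)); the helper name `bit_from_uniform` and the seed are illustrative.

```python
import random


def bit_from_uniform(u: float) -> int:
    """Map a U(0,1) draw to a bit: 0 if u <= 0.5, otherwise 1."""
    return 0 if u <= 0.5 else 1


rng = random.Random(7)  # seeded only so that the run is repeatable
draws = [bit_from_uniform(rng.random()) for _ in range(10_000)]

share_of_ones = sum(draws) / len(draws)
print(f"share of ones: {share_of_ones:.3f}")  # should be close to 0.5
```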
