Session Token Transmission - session

My web application has the following requirements regarding sessions:
Session Ids must be randomly generated.
Session Ids must be unpredictable.
The size of session Id should be large enough to ensure that it is not vulnerable to a brute force attack.
The character set should be complex, i.e. make use of special characters.
A length of 50 random characters is advised.

Essentially the answer to all those points is: use a good random number generator. Your system or platform should have some form of good PRNG to offer; on UNIX that's /dev/random or /dev/urandom, on Windows it's the built-in crypto API. Your language of choice likely offers a wrapper API around those. Simply read a number of random bytes from that PRNG. If it's any good, those bytes will be random and unpredictable. The length is entirely up to you; just read enough data from the PRNG. The character set will be more than complex enough, since you'll receive raw bytes; you'll in fact have to "dumb them down" into the ASCII character set, for example by base64-encoding them.
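As a minimal sketch of that advice in Python: the standard secrets module wraps the operating system's CSPRNG, and token_urlsafe base64-encodes the raw bytes. The function name new_session_id and the default of 37 bytes (which happens to encode to 50 characters) are illustrative choices, not part of the original requirements.

```python
import secrets

def new_session_id(num_bytes: int = 37) -> str:
    """Return a URL-safe base64 token built from num_bytes random bytes."""
    # secrets reads from the OS CSPRNG (e.g. /dev/urandom on UNIX).
    # 37 random bytes encode to 50 base64 characters once padding is stripped.
    return secrets.token_urlsafe(num_bytes)

session_id = new_session_id()
print(session_id, len(session_id))
```

Note that the URL-safe alphabet only uses letters, digits, "-" and "_"; the strength comes from the 296 random bits, not from exotic characters.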

Related

What is the meaning of the UNPREDICTABLE in random function?

Linguistically, I understand the meaning of unpredictable. But I often come across the words "predictable" and "unpredictable" in certain contexts, usually when reading about topics such as:
Math.random vs crypto.getRandomValues in Javascript
Random vs Secure Random numbers
Etc
So what exactly does unpredictable mean in random functions? Then what are the conditions for a random function to be called "unpredictable random function"?
If a value is random, then it means that knowing the previous values in the sequence provides you no information about the next value.
If a value is unpredictable, then there is no "practical" means of determining the next value. It is generally a stronger claim than random.
(The word "practical" here is doing some work. It generally means "within some set of rules about what the attacker may do." If the attacker has full access to the CPU and RAM, then nothing is "unpredictable," but we are generally interested in cases where they do not have this.)
As an example of the difference, the digits of pi are believed to be random (we don't actually know this, but it appears to be true). That means that there is no way to guess, better than chance, the 10,000th digit of pi. It's random. But it's perfectly predictable. Anyone can easily determine its value. So the digits of pi are a perfectly good random sequence, and could even be used effectively to drive a game's behavior where randomness is sufficient, but it won't be a secure random sequence and is useless for cryptographic purposes.
If I went to random.org (which provides very good random numbers), and generated a value, but then used it repeatedly, it would be a random value but also completely predictable.
This predictability can occur when producing the seed of a PRNG. While the PRNG may generate excellent random values, if its seed is predictable then the entire sequence will be known. ("Predictable" here doesn't mean with 100% certainty; any level of certainty better than chance is sufficient.)
As an example of this problem, networking gear has a significant challenge generating an unpredictable seed when first booted, particularly if the networking gear nearby is rebooted at the same time. Whatever process you use to create a random value can easily fall into a small set of likely values ("small" compared to all the possible values; it may still be in the millions, but that's not many values in cryptography). This is a problem that can require significant effort to resolve in high-security systems.
Most cryptographic systems do not define how these initial, unpredictable values are to be generated. They're just an assumed input to the system.
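A seed is predictable when it comes from something an attacker can guess, such as the current time. For example, with Python's random library:
import random, time
seed = int(time.time())  # current time in whole seconds: easy to guess
random.seed(seed)
r1 = random.randrange(10**49, 10**50)
random.seed(seed)  # anyone who guesses the seed can replay the sequence
r2 = random.randrange(10**49, 10**50)
print(r1)
print(r2)
Both printed values are the same, because the same seed always produces the same sequence.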
Unpredictable would be when the seed has really high entropy, so that no one could feasibly find the initial seed and reproduce the output of the random algorithm that was used.
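The contrast can be sketched with Python's standard library: random.Random is a seedable general-purpose PRNG, while secrets draws from the OS CSPRNG and accepts no seed. The helper name replay and the timestamp value are hypothetical, chosen only for illustration.

```python
import random
import secrets

def replay(seed: int, count: int = 3) -> list:
    """Regenerate the first `count` values produced from a given seed."""
    rng = random.Random(seed)
    return [rng.randrange(10**9) for _ in range(count)]

# Hypothetical guessable seed: a Unix timestamp in whole seconds.
victim = replay(1700000000)
attacker = replay(1700000000)   # the attacker simply tries the same timestamp
print(victim == attacker)       # True: the whole sequence is reproducible

# The OS-backed generator takes no user-supplied seed, so there is nothing to replay.
token = secrets.randbelow(10**9)
print(token)
```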

Why do people sometimes convert numbers or strings to bytes?

Sometimes I encounter questions about converting something to bytes. Is there anything for which converting to bytes is vitally important, or what would I convert something to bytes for?
In most languages, the most common string functions come as part of the language or in a pre-made library, often implemented in compiled code that takes advantage of the processor's string instructions. Sometimes, however, you need to do something with a string that the language does not support natively. Since the 8-bit days, people have therefore viewed strings as arrays of 7- or 8-bit characters, each of which fits within a byte, with conventions like ASCII determining which byte value represents which character.
While standard languages often have functions like "string.replaceChar(OFFSET,'a')", this approach can be painfully slow, because each call to the replaceChar method incurs overhead that may exceed the cost of the work actually being done.
There is also the simplicity factor when designing your own string algorithms, but as I said, most of the common algorithms come prebuilt in modern languages (stringCompare, trimString, reverseString, etc.).
Suppose you want to perform an operation on a string that doesn't come as standard.
Suppose you want to add two numbers that are represented as decimal digits in strings, and the numbers are larger than the 64-bit word size of the processor. The RSA encryption/decryption behind the SSL browser padlock uses numbers that don't fit into the word size of a desktop computer; nonetheless, the desktop programs that deal with RSA certificates and keys must be able to process this data, which is effectively stored as strings.
There are many and varied reasons to treat a string as an array of bytes, but each of them is fairly specialised.
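To make the big-number case concrete, here is a rough sketch of schoolbook addition on two decimal strings treated as arrays of ASCII bytes. The function name add_decimal_strings is made up for this example, and Python itself already has arbitrary-precision integers, so this only illustrates the byte-level technique.

```python
def add_decimal_strings(a: str, b: str) -> str:
    """Add two non-negative decimal numbers given as strings, digit by digit."""
    # Work on the raw ASCII bytes, least significant digit first.
    x = a.encode("ascii")[::-1]
    y = b.encode("ascii")[::-1]
    digits = bytearray()
    carry = 0
    for i in range(max(len(x), len(y))):
        da = x[i] - 48 if i < len(x) else 0   # 48 is the byte value of '0'
        db = y[i] - 48 if i < len(y) else 0
        carry, d = divmod(da + db + carry, 10)
        digits.append(48 + d)
    if carry:
        digits.append(48 + carry)
    return digits[::-1].decode("ascii")

print(add_decimal_strings("99999999999999999999999999", "1"))
```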

Is it acceptable to use each byte of a PRNG-generated number separately?

Say you have a non-cryptographically secure PRNG that generates 64-bit output.
Assuming that bytes are 8 bits, is it acceptable to use each byte of the 64-bit output as separate 8-bit random numbers or would that possibly break the randomness guarantees of a good PRNG? Or does it depend on the PRNG?
Because the PRNG is not cryptographically secure, the "randomness guarantee" I am worried about is not security, but whether the byte stream has the same guarantee of randomness, using the same definition of "randomness" that PRNG authors use, that the PRNG has with respect to its 64-bit output.
This should be quite safe with a CSPRNG. For comparison, it's like reading /dev/random byte by byte. With a good CSPRNG it is also perfectly acceptable to generate a 64-bit sample 8 times and pick 8 bits per sample (throwing away the other 56 bits).
With PRNGs that are not CSPRNG you will have 'security' concerns in terms of the raw output of the PRNG that outweigh whether or not you chop up output into byte sized chunks.
In all cases it is vital to make sure the PRNG is seeded and periodically re-seeded correctly (so as to flush any possibly compromised internal state regularly). Security depends on the unpredictability of your internal state, which is ultimately driven by the quality of your seed input. One thing good CSPRNG implementations will do for you is to pessimistically estimate the amount of captured 'entropy' to safeguard the output from predictable internal state.
Note however that with 8 bits you only have 256 possible outputs in any case, so it becomes more of a question of how you use this. For instance, if you do something like XOR based encryption against the output of a PRNG (i.e. treating it as a one time pad based on some pre shared secret seed), then using a known plain text attack may relatively easily reveal the contents of the internal state of the PRNG. That is another type of attack which good CSPRNG implementations are supposed to guard against by their design (using e.g. a computationally secure hash function).
EDIT to add: if you don't care about 'security' but only need the output to look random, then this should be quite safe. In theory, a good PRNG is just as likely to yield a 0 as a 1 in any bit position, and that should not vary between octets, so you expect a uniform distribution of possible output values. One thing you can do to verify whether splitting skews the distribution is to run a Monte Carlo simulation of some reasonably large size (e.g. 1M) and compare the histograms with 256 bins for both the raw 64-bit and the 8 * 8-bit output. You expect a roughly flat histogram in both cases if the uniform distribution is preserved.
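A rough sketch of the byte-splitting half of that check, using Python's random module (Mersenne Twister) as a stand-in for "some non-cryptographic 64-bit PRNG". The seed and sample size are arbitrary assumptions, and a flat histogram is only a necessary condition for good randomness, not a proof of it.

```python
import random
from collections import Counter

rng = random.Random(12345)   # fixed seed so the run is repeatable
samples = 125_000            # 125,000 64-bit outputs -> 1,000,000 bytes

counts = Counter()
for _ in range(samples):
    value = rng.getrandbits(64)
    # Split each 64-bit output into its 8 constituent bytes and bin them.
    counts.update(value.to_bytes(8, "little"))

expected = samples * 8 / 256
worst = max(abs(counts[b] - expected) / expected for b in range(256))
print(f"expected per bin: {expected:.0f}, worst relative deviation: {worst:.2%}")
```

For a uniform source, the worst bin would typically deviate from the expected count by only a few percent at this sample size.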
It depends on the generator and its parameterization. Quoting from the Wikipedia page for Linear Congruential Generators: "The low-order bits of LCGs when m is a power of 2 should never be relied on for any degree of randomness whatsoever. [...]any full-cycle LCG when m is a power of 2 will produce alternately odd and even results."
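The quoted behaviour is easy to reproduce. The sketch below uses a commonly cited full-cycle parameter set with m = 2^32 (a = 1664525, c = 1013904223); these constants and the seed are assumptions chosen for illustration, not taken from the question.

```python
def lcg(seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
    """Yield successive states of the recurrence x -> (a*x + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=42)
# With a and c both odd and m a power of 2, the lowest bit flips on every step.
low_bits = [next(gen) & 1 for _ in range(16)]
print(low_bits)  # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
```

So if the lowest byte of each output from such a generator is used as its own random number, its least significant bit is anything but random.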

Most efficient barcode to store a GUID

I have a system that I'm working on at the moment that requires users to log into the system, and the client wants to use a barcode scanner and cards to keep prices down. (Yes, a username and password would be cheaper, but she wants a card-type solution, so she gets one.)
All my data uses GUIDs as key fields, so I'd like to store the GUID directly on the card in the barcode. While it's simple enough to encode it as 3 of 9, that's not going to be the most efficient use of space.
Is there a best practice or most efficient method for storing GUIDs in a barcode? I'd have assumed that since there's a consistent length, and depth to the data there would be a standard, but I can't find it. Would be easy enough to generate my own - control char either end and then binary data between, but would like something that standard readers will know how to interpret.
Any help gratefully received.
There are no open standards for special-purpose data compaction with generic linear barcodes such as Code 39 and Code 128. Most ISO/IEC-standardised 2D barcodes do support a special-purpose data encoding mechanism called Extended Channel Interpretation (ECI) which allows you to specify that data conforms to a certain application standard or encoding regime, for example ECI 298765 for IPv4 address compaction [*]. Unfortunately GUID compaction isn't amongst those that have been registered and even if it were you would nevertheless need to handle this within your application as reader support would be lacking.
That leaves you with having to pre-encode (and subsequently decode) the GUID into a format that can be handled efficiently by some ubiquitous barcode symbology.
An efficient way to store a GUID would be to convert it to a 40-digit[†] decimal representation and store the result in a Code 128 barcode using double-density numeric compression ("Mode C").
For example, consider the GUID:
cd171f7c-560d-4a62-8d65-16b87419a58c
Expressed as a hexadecimal number:
0xCD171F7C560D4A628D6516B87419A58C
Converted to 40 decimal digits:
0272611800569275698104677545117639878028
Encoded within a Code 128 barcode: [barcode image not reproduced]
Your application would of course need to recognise this input as a decimal-encoded GUID and reverse the above process but I doubt that a significantly more efficient approach exists that doesn't require you to transform the data into an unusual radix and then deal with the complexities of handling ASCII control characters at scan time.
[*] The register of assigned ECI codes is available from the AIM store as "ECI Part 3: Register".
[†] Whilst it is possible to store the entire GUID range within 39 digits a 39-digit Mode C Code 128 symbol is in fact longer than a 40-digit symbol.
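The GUID-to-decimal step described above, and its reverse, can be sketched in Python with the standard uuid module. The function names are hypothetical, and rendering the actual Code 128 symbol is left to whatever barcode library the application uses.

```python
import uuid

def guid_to_decimal(guid: str) -> str:
    """Encode a GUID as a zero-padded 40-digit decimal string."""
    # UUID.int is the GUID as a 128-bit integer; 2**128 needs at most 39 digits,
    # so zfill(40) pads to the even length preferred by Code 128 Mode C.
    return str(uuid.UUID(guid).int).zfill(40)

def decimal_to_guid(digits: str) -> str:
    """Decode the scanned decimal string back into canonical GUID form."""
    return str(uuid.UUID(int=int(digits)))

encoded = guid_to_decimal("cd171f7c-560d-4a62-8d65-16b87419a58c")
print(encoded, len(encoded))
print(decimal_to_guid(encoded))
```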

Encryption algorithm that output byte by byte based on password and offset

Is there a well-known (and well-regarded) algorithm that can encrypt/decrypt any arbitrary byte inside a file based on the password entered and the byte's offset inside the file?
(Databyte, Offset, Password) => EncryptedByte
(EncryptedByte, Offset, Password) => DataByte
And is there some fundamental weakness in this approach, or is it still theoretically possible to build it strong enough?
Update:
More details: any cryptographic algorithm has an input and an output. Many existing ones operate on large blocks. I want to operate on a single byte, but a system based on the byte alone can only remap byte values and is weak by default. If we also take the byte's position in the file, we can, for example, interpret the bits of the position value as operations to apply at each step (0: xor, 1: shift) and produce the encrypted byte that way. But that is too simple; I'm looking for something stronger.
Maybe it's not very efficient but how about this:
for encryption use:
encryptedDataByte = Encrypt(offset,key) ^ dataByte
for decryption use:
dataByte = Encrypt(offset,key) ^ encryptedDataByte
Where Encrypt(offset,key) might be e.g. 3DES or AES (padding the offset if needed, and discarding all but one byte of the result).
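A minimal sketch of this scheme follows. Python's standard library has no AES or 3DES, so HMAC-SHA256 is swapped in here as the keyed function standing in for Encrypt(offset,key); that substitution, the function names and the demo key are all assumptions for illustration, and this is not a vetted construction.

```python
import hashlib
import hmac

def keystream_byte(key: bytes, offset: int) -> int:
    """Stand-in for Encrypt(offset, key): keyed hash of the offset, keep one byte."""
    mac = hmac.new(key, offset.to_bytes(8, "big"), hashlib.sha256).digest()
    return mac[0]  # throw away all but one result byte

def crypt_byte(key: bytes, offset: int, data_byte: int) -> int:
    """XOR is its own inverse, so the same call encrypts and decrypts."""
    return keystream_byte(key, offset) ^ data_byte

key = bytes(range(32))          # hypothetical key, for demonstration only
plain = b"hello"
cipher = bytes(crypt_byte(key, i, b) for i, b in enumerate(plain))
print(bytes(crypt_byte(key, i, b) for i, b in enumerate(cipher)))  # b'hello'
```

Like any XOR keystream, this provides no integrity protection, and rewriting a byte at the same offset under the same key reuses its keystream byte.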
If you can live with block sizes of 16 bytes, you can try the XTS mode described in the Wikipedia article about disk encryption theory (the advantage being that some good cryptologists have already looked at it).
If you really need byte-wise encryption, I doubt that there is an established solution. In the conference Crypto 2009 there was a talk about How to Encipher Messages on a Small Domain: Deterministic Encryption and the Thorp Shuffle. In your case the domain is a byte, and as this is a power of 2, a Thorp Shuffle corresponds to a maximally unbalanced Feistel network. Maybe one can build something using the position and the password as key, but I'd be surprised if a home-made solution will be secure.
You can use AES in Counter Mode, where you divide your input into blocks of 16 bytes (128 bits) and encrypt a counter derived from the block number to get 16 pseudo-random bytes that you XOR with the plaintext. It is critically important never to reuse the same counter start value (and/or initialization vector) with the same key: doing so reuses the keystream, and an attacker can then use a simple XOR of two ciphertexts to recover the XOR of the plaintexts (and, given any known plaintext, the keystream itself).
You mention that you want to only operate on individual bytes, but this approach would give you that flexibility. Output Feedback Mode is another common one, but you have to be careful in its use.
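Why reusing a counter start value is dangerous can be shown without any real cipher. In the sketch below, a random 16-byte string stands in for one encrypted counter block; the messages and helper name are made up for the demonstration.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

keystream = secrets.token_bytes(16)   # stand-in for one AES-CTR keystream block
p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"
c1 = xor_bytes(p1, keystream)         # two messages encrypted with the
c2 = xor_bytes(p2, keystream)         # same key and the same counter

# XORing the ciphertexts cancels the keystream entirely.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(p1, p2))      # True
# An attacker who knows (or guesses) p1 recovers p2 without the key.
print(xor_bytes(leak, p1))            # b'retreat at noon!'
```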
You might consider using the EAX mode for better security. Also, make sure you're using something like PBKDF2 or scrypt to derive your encryption key from the password.
However, as with most cryptography related issues, it's much better to use a rigorously tested and evaluated library rather than rolling your own.
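For the key-derivation step, Python's hashlib exposes PBKDF2 directly. The sketch below assumes SHA-256 as the underlying hash, a 16-byte random salt and 200,000 iterations; those parameters and the function name are illustrative choices, and an appropriate iteration count depends on your hardware and threat model.

```python
import hashlib
import secrets

def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

salt = secrets.token_bytes(16)   # store the salt alongside the encrypted file
key = derive_key("correct horse battery staple", salt)
print(key.hex())
```

The salt is not secret; it only needs to be unique per file so that the same password does not yield the same key everywhere.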
Basically what you need to do is generate some value X (probably 1 byte) based on the offset and password, and use this to encrypt/decrypt the byte at that offset. We'll call it
X = f(offset,password)
The problem is that an attacker that "knows something" about the file contents (e.g. the file is English text, or a JPEG) can come up with an estimate (or sometimes be certain) of what an X could be. So he has a "rough idea" about many X values, and for each of these he knows what the offset is. There is a lot of information available.
Now, it would be nice if all that information were of little use to the attacker. For most purposes, using a cryptographic hash function (like SHA-1) will give you a reasonable assurance of decent security.
But I must stress that if this is something critical, consult an expert.
One possibility is a One Time Pad, possibly using the password to seed some pseudo-random number generator. One time pads theoretically achieve perfect secrecy, but there are some caveats. It should do what you're looking for though.

Resources