How do you use a non-number as a seed in an RNG?

How would you use, say, the string "abc" as the seed of an RNG? Would you change it into 0x616263, or 123, or hash it with sha1 (or some other hash), or something else?

Hashing with a high-quality hashing algorithm seems like the best solution. However, depending on how many bits you have to seed with, you might have to use only a part of the generated hash. This shouldn't be a problem if you use a cryptographically strong algorithm that has well-distributed outputs.
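As a rough illustration of the hashing approach, here is a minimal Python sketch. The helper name seed_from_string and the choice of SHA-256 with a 64-bit seed are assumptions made for the example, not something the question specifies; it simply keeps the leading bits of the digest when the RNG wants fewer bits than the hash produces.

```python
import hashlib
import random


def seed_from_string(text: str, bits: int = 64) -> int:
    """Derive an integer seed of `bits` bits from an arbitrary string.

    Hypothetical helper: hashes the UTF-8 bytes with SHA-256 and keeps
    only the leading `bits` bits of the 256-bit digest.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


# The same string always gives the same seed, hence the same sequence.
rng = random.Random(seed_from_string("abc"))
first_values = [rng.random() for _ in range(3)]
```

Any part of a well-distributed digest would do equally well; taking the leading bits is just the simplest choice.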

I'm assuming that you want a repeatable pseudorandom sequence and that your RNG expects an integer.
In this case, the answer is: it doesn't matter.

Related

Using multiple hash outputs in iterations?

Is there a known or perceived weakness to using the output of other hash algorithms as input for the next hash iteration?
Of course double hashing is not recommended, but this is not the same as double hashing.
Example:
I take a "secret" input and I hash it with SHA256, SHA384, and RIPEMD160 separately. I then combine the output of each into a single long string to use as input for a SHA512 hash. I then repeat this process a number of times.
In my mind, doing this significantly expands the length of the input into the SHA512 and essentially makes brute force even more infeasible.
Additionally, I considered using a 4th hash function merely to generate a value which could then be used to vary the length of the combined input string, by possibly discarding a few bytes in an unpredictable manner, so that the input is not a constant size. I'm not entirely sure that would be of any benefit.
Thoughts?
An answer to this question depends heavily on the attack scenario.
Of course double hashing is not recommended, but this is not the same as double hashing.
I would say: no, that is not generally true. If you are storing passwords using a hash function, an attack on the store becomes harder if you use multiple rounds (feeding the output of round n in as the input of round n+1). Bitcoin, as another example, uses 2 passes (see here and here). For additional info see Why hashing twice?
by possibly discarding a few bytes in an unpredictable manner, so that the input is not a constant size. I'm not entirely sure that would be of any benefit.
That counteracts the way hash functions are designed. You want the function to produce the same output for the same input. Breaking this relationship basically destroys the usefulness of the function. You could use a random number generator instead. See also: Does the MD5 algorithm always generate the same output for the same string? or Is sha-1 hash always the same?
In my mind, doing [...] essentially makes brute force even more infeasible.
The quoted statement is correct, but the reasoning is flawed. It makes brute force harder, because an attacker has to compute 4 functions instead of one. And she cannot use rainbow tables, because they aren't generated for your setup.
Wild guess: If you are using the mentioned setup to store and verify passwords, don't do it. Use PBKDF2 or bcrypt for that. See Password Storage Cheat Sheet
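For reference, a minimal sketch of the PBKDF2 route using Python's standard library. The function names, the 16-byte salt, and the iteration count are assumptions chosen for the example; pick the actual parameters from current guidance such as the cheat sheet mentioned above.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor, tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)
```

The salt and the derived key are stored together; the many iterations are what replaces the hand-rolled "repeat the process a number of times" loop.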

Pseudo random generator <=> hash function?

I've been thinking about this as a thought experiment to try and understand some hashing concepts. Consider the requirement for a say 128 bit hash function (i.e., its output is exactly 128 bits in length).
A. You might look at something like MD5. So you input your data to be hashed, and out pops a 128 bit number.
B. Alternatively, you find a magical pseudo random number generator (PRNG). Some sort of Frankenstein version of the Twister. It seeds itself from all of your input data to be hashed, and has an internal state size >> 128 bits. You then generate 128 pseudo random bits as output.
It seems to me that both A and B effectively produce an output that is determined solely by the input data. Are these two approaches therefore equivalent?
Supplemental:
Some feedback has suggested that there might be a security inequivalence with my scenario. If the pseudo random number generator were to be something like Java's SecureRandom (which uses SHA-1), seeded from the input data, then might A <=> B?
If you seed a PRNG with your input data and then extract 128 bits of random data from it, then you effectively leave the hashing to the PRNG seed function, and the size of the hash that it generates will be the size of the PRNG state buffer.
However, if the state of the PRNG is larger than the 128 bits you extract as a hash, then there's a risk that some of the input data used for the seed won't have any effect on the bits of the PRNG state that you extract. This makes it a really bad hash, so you don't want to do that.
PRNG seed functions are typically very weak hashes, because hashing is not their business. They're almost certainly insecure (which you did not ask about), and separate from that they're usually quite weak at avalanching. A strong hash typically tries to ensure that every bit of input has a fair chance of affecting every bit of output. Insecure hashes typically don't worry that they'll fail at this if the input data is too short, but a PRNG seed will often make no effort at all.
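To make the avalanching point concrete, here is a small Python sketch that flips each input bit in turn and counts how many of the 128 output bits change on average. The function xor_fold_128 is a made-up stand-in for a weak seed routine (it is not taken from any real PRNG); MD5 is used only as the 128-bit hash from scenario A.

```python
import hashlib


def md5_128(data: bytes) -> bytes:
    """Scenario A: a real 128-bit hash."""
    return hashlib.md5(data).digest()


def xor_fold_128(data: bytes) -> bytes:
    """Hypothetical weak 'seed function': XOR the input into a 16-byte state."""
    state = bytearray(16)
    for pos, byte in enumerate(data):
        state[pos % 16] ^= byte
    return bytes(state)


def average_avalanche(fn, data: bytes) -> float:
    """Average number of output bits that change when one input bit is flipped."""
    base = int.from_bytes(fn(data), "big")
    nbits = len(data) * 8
    total = 0
    for bit in range(nbits):
        flipped = bytearray(data)
        flipped[bit // 8] ^= 1 << (bit % 8)
        changed = base ^ int.from_bytes(fn(bytes(flipped)), "big")
        total += bin(changed).count("1")
    return total / nbits
```

With a 16-byte input, the XOR fold changes exactly one output bit per flipped input bit, while a good hash should change about half of the 128 output bits.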
Cryptographic hash functions are designed to make it hard to create an input that generates a specific hash, and/or to make it hard to create two inputs that generate the same hash.
If something is designed as a random number generating algorithm, then this was not one of the requirements for the design. So if something is "just" a random number generator, there is no guarantee that it satisfies these important constraints on a cryptographic hashcode. So in that sense, they are not equivalent.
Of course there may be random number generating algorithms that were also designed as cryptographic hashing algorithms, and in that case (if the implementation did a good job at satisfying the requirements) they may be equivalent.

Is it possible to find out which hash algorithm was used in these strings?

I don't want to reverse it. I just want to be sure what hash algorithm was used on these strings (I'm not sure if it's md5):
d27918bcc2a8562dc4549c2c00111e66
889f071e04755db26579a19f4303654e
47a21a13ee822c1450155bd0033b0f1d
Is there a way to do it?
One of the sources for the strings above is certainly: '9915757678'
They're each 32 characters, so 128 bits. So it could be MD5.
However, there is no way to tell. Any hash function worth its salt will spread the hash values evenly throughout the entire output space, so if you have just a bunch of outputs, there's no way to tell hash functions apart.
Unless you can make some reasonable guesses about the input, and do some brute-forcing, of course.
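If you do have a guess at the input, the brute-forcing can be as simple as the following Python sketch. The candidate list and the function name are assumptions for the example; the prefix comparison is there so that a longer digest truncated to 32 characters would still be noticed.

```python
import hashlib

# Assumed shortlist of algorithms to try; extend as needed.
CANDIDATES = ("md5", "sha1", "sha256", "sha512")


def matching_algorithms(known_input: str, observed_hex: str) -> list:
    """Return the candidate algorithms whose hex digest starts with observed_hex."""
    observed = observed_hex.lower()
    matches = []
    for name in CANDIDATES:
        digest = hashlib.new(name, known_input.encode("utf-8")).hexdigest()
        if digest.startswith(observed):  # also catches truncated storage
            matches.append(name)
    return matches


# Try the suspected source value against one of the strings from the question.
result = matching_algorithms("9915757678", "d27918bcc2a8562dc4549c2c00111e66")
```

An empty result does not prove much: the input may have been salted, encoded differently, or hashed with an algorithm that is not on the list.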
It fits the MD5 hash form (length-wise), but it could just as well be a SHA-1 hash truncated to fit a CHAR(32) field. As others have said, there is no way to tell unless you have an example of an input value. Then you could use a tool like this:
http://www.insidepro.com/hashes.php
to generate hashes using several different algorithms and see whether any of them matches.
You're even more out of luck if a salt was added before hashing.
No certain way, but this looks like MD5.
Based on size, these could be NTLM, MD4, or MD5.
I know I'm late here, but I'm posting this because I didn't see this possibility in the other answers.

Simple integer encryption

Is there a simple algorithm to encrypt integers? That is, a function E(i,k) that accepts an n-bit integer and a key (of any type) and produces another, unrelated n-bit integer that, when fed into a second function D(E(i,k),k) (along with the key) produces the original integer?
Obviously there are some simple reversible operations you can perform, but they all seem to produce clearly related outputs (e.g. consecutive inputs lead to consecutive outputs). Also, of course, there are cryptographically strong standard algorithms, but they don't produce small enough outputs (e.g. 32-bit). I know any 32-bit cryptography can be brute-forced, but I'm not looking for something cryptographically strong, just something that looks random. Theoretically speaking it should be possible; after all, I could just create a dictionary by randomly pairing every integer. But I was hoping for something a little less memory-intensive.
Edit: Thanks for the answers. Simple XOR solutions will not work because similar inputs will produce similar outputs.
Wouldn't this amount to a block cipher with a block size of 32 bits?
Not very popular, because it's easy to break. But theoretically feasible.
Here is one implementation in Perl :
http://metacpan.org/pod/Crypt::Skip32
UPDATE: See also Format preserving encryption
UPDATE 2: RC5 supports block sizes of 32, 64, or 128 bits
I wrote an article some time ago about how to generate a 'cryptographically secure permutation' from a block cipher, which sounds like what you want. It covers using folding to reduce the size of a block cipher, and a trick for dealing with non-power-of-2 ranges.
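To show what a tiny keyed permutation on 32-bit integers can look like, here is a Python sketch of a balanced Feistel network with 16-bit halves. It is neither Skip32 nor the construction from the article mentioned above; the round count and the HMAC-SHA256 round function are assumptions picked for readability, and nothing here has been vetted for security.

```python
import hashlib
import hmac

ROUNDS = 4  # assumed; more rounds cost more time but mix better


def _round(half: int, key: bytes, rnd: int) -> int:
    """Keyed round function: 16 bits in, 16 bits out."""
    msg = bytes([rnd]) + half.to_bytes(2, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "big")


def encrypt(i: int, key: bytes) -> int:
    """Map a 32-bit integer to another 32-bit integer, reversibly."""
    left, right = i >> 16, i & 0xFFFF
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(right, key, rnd)
    return (left << 16) | right


def decrypt(c: int, key: bytes) -> int:
    """Undo encrypt() by running the rounds in reverse order."""
    left, right = c >> 16, c & 0xFFFF
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _round(left, key, rnd), left
    return (left << 16) | right
```

Because every Feistel round is invertible regardless of the round function, distinct inputs always map to distinct outputs, and consecutive inputs no longer produce visibly related outputs.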
A simple one:
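// C#-style sketch: the key k seeds the generator, so the mask is reproducible
var rand = new Random(k);
// XOR is its own inverse, so the same function also decrypts
return i ^ rand.Next();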
(the point of XORing with rand.Next() rather than k is that otherwise, given i and E(i,k), you can get k by k = i xor E(i,k))
Ayden is an algorithm that I developed. It is compact, fast and looks very secure. It is currently available for 32 and 64 bit integers. It is in the public domain and you can get it from http://github.com/msotoodeh/integer-encoder.
You could take an n-bit hash of your key (assuming it's private) and XOR that hash with the original integer to encrypt, and with the encrypted integer to decrypt.
Probably not cryptographically solid, but depending on your requirements, may be sufficient.
If you just want it to look random and don't care about security, how about just swapping bits around? You could simply reverse the bit string, so the high bit becomes the low bit, the second highest becomes the second lowest, etc., or you could do some other random permutation (e.g. 1 to 4, 2 to 7, 3 to 1, etc.).
How about XORing it with a prime or two? Swapping bits around seems very random when trying to analyze it.
Try something along the lines of XORing it with a prime and itself after bit shifting.
How many integers do you want to encrypt? How much key data do you want to have to deal with?
If you have few items to encrypt, and you're willing to deal with key data that's just as long as the data you want to encrypt, then the one-time-pad is super simple (just an XOR operation) and mathematically unbreakable.
The drawback is that the problem of keeping the key secret is about as large as the problem of keeping your data secret.
It also has the flaw (that is run into time and again whenever someone decides to try to use it) that if you take any shortcuts - like using a non-random key or the common one of using a limited length key and recycling it - that it becomes about the weakest cipher in existence. Well, maybe ROT13 is weaker.
But in all seriousness, if you're encrypting an integer, what are you going to do with the key no matter which cipher you decide on? Keeping the key secret will be a problem about as big (or bigger) than keeping the integer secret. And if you're encrypting a bunch of integers, just use a standard, peer reviewed cipher like you'll find in many crypto libraries.
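The one-time pad described in this answer, and the reason recycling the key is fatal, can be sketched in a few lines of Python. The function names are made up for the example, and 32-bit values are assumed.

```python
import secrets


def make_pad(bits: int = 32) -> int:
    """Generate a fresh random pad as long as the value to protect."""
    return secrets.randbits(bits)


def otp(value: int, pad: int) -> int:
    """Encrypt or decrypt: XOR with the pad is its own inverse."""
    return value ^ pad


pad = make_pad()
ciphertext = otp(123456, pad)
recovered = otp(ciphertext, pad)

# Reusing the pad leaks the XOR of the two plaintexts, with no key needed:
leak = otp(1000, pad) ^ otp(1001, pad)  # equals 1000 ^ 1001
```

Each pad must be used for exactly one integer, which is why the key material ends up as large as the data itself.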
RC4 will produce as little output as you want, since it's a stream cipher.
XOR it with /dev/random

Guessing the hash function?

I'd like to know which algorithm is employed. I strongly assume it's something simple and hopefully common. There's no lag in generating the results, for instance.
Input: any string
Output: 5 hex characters (0-F)
I have access to as many keys and results as I wish, but I don't know how exactly I could harness this to attack the function. Is there any method? If I knew any functions that converted to 5-chars to start with then I might be able to brute force for a salt or something.
I know for example that:
a=06a07
b=bfbb5
c=63447
(in case you have something in mind)
In normal use it converts random 32-char strings into 5-char strings.
The only way to derive a hash function from data is through brute force, perhaps combined with some cleverness. There are an infinite number of hash functions, and the good ones perform what is essentially one-way encryption, so it's a question of trial and error.
It's practically irrelevant that your function converts 32-character strings into 5-character hashes; the output is probably truncated. For fun, here are some perfectly legitimate examples, the last 3 of which are cryptographically terrible:
Use the MD5 hashing algorithm, which generates a 16-byte hash (32 hex characters), and use the 10th through the 14th characters.
Use the SHA-1 algorithm and take the last 5 characters.
If the input string is alphabetic, use the simple substitution A=1, B=2, C=3, ... and take the first 5 digits.
Find each character on your keyboard, measure its distance from the left edge in millimeters, and use every other digit, in reverse order, starting with the last one.
Create a stackoverflow user whose name is the 32-character string, divide 113 by the corresponding user ID number, and take the first 5 digits after the decimal. (But don't tell 'em I told you to do it!)
Depending on what you need this for, if you have access to as many keys and results as you wish, you might want to try a rainbow table approach. 5 hex chars is only about 1 million combinations (16^5 = 1,048,576). You should be able to brute-force a map of strings that covers all of the resulting hashes in no time. Then you don't need to know the original string, just an equivalent string that generates the same hash, or you can brute-force entry by iterating over the roughly 1 million input strings.
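A minimal Python sketch of that brute-force idea follows. Since the real function is unknown, black_box is a stand-in (the last 5 hex characters of SHA-1, one of the examples listed earlier); in practice you would replace it with a call to whatever system produces the 5-character results.

```python
import hashlib
from itertools import count


def black_box(text: str) -> str:
    """Stand-in for the unknown function: last 5 hex chars of SHA-1 (assumed)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[-5:]


def find_equivalent_input(target: str, hash_fn=black_box) -> str:
    """Try "0", "1", "2", ... until some string hashes to the target value."""
    for n in count():
        candidate = str(n)
        if hash_fn(candidate) == target:
            return candidate


target = black_box("4242")
equivalent = find_equivalent_input(target)
```

With only 16^5 possible outputs, a search for an arbitrary target should succeed after roughly a million attempts on average, and the string it finds need not be the original one.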
Following on from a comment I just made to Pontus Gagge, suppose the hash algorithm is as follows:
Append some long, constant string to the input
Compute the SHA-256 hash of the result
Output the last 5 chars of the hash.
Then I'm pretty sure there's no computationally feasible way from your chosen-plaintext attack to figure out what the hashing function is. To even prove that SHA-256 is in use (assuming it's a good hash function, which as far as we currently know it is), I think you'd need to know the long string, which is only stored inside the "black box".
That said, if I knew any published 20-bit hash functions, then I'd be checking those first. But I don't know any: all the usual non-crypto string hashing functions are 32 bit, because that's the expected size of an integer type. You should perhaps compare your results to those of CRC, PJW, and BUZ hash on the same strings, as well as some variants of DJB hash with different primes, and any string hash functions built in to well-known programming languages, like java.lang.String.hashCode. It could be that the 5 output chars are selected from the 8 hex chars generated by one of those.
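One way to automate that comparison is sketched below in Python: compute a few well-known digests for each known input and look for the observed 5 characters as a contiguous window inside them. The set of candidate functions is an assumption (it covers CRC-32 and some cryptographic hashes, not PJW, BUZ, or DJB variants), and a non-contiguous selection of characters would not be detected.

```python
import hashlib
import zlib


def candidate_digests(text: str) -> dict:
    """Hex digests of a few common functions (an assumed, incomplete shortlist)."""
    data = text.encode("utf-8")
    return {
        "crc32": format(zlib.crc32(data), "08x"),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def consistent_windows(samples: dict) -> set:
    """Return (algorithm, offset) pairs whose window matches every known sample."""
    surviving = None
    for text, observed in samples.items():
        wanted = observed.lower()
        hits = set()
        for name, digest in candidate_digests(text).items():
            for offset in range(len(digest) - len(wanted) + 1):
                if digest[offset:offset + len(wanted)] == wanted:
                    hits.add((name, offset))
        surviving = hits if surviving is None else surviving & hits
    return surviving or set()


# The three known pairs from the question.
hits = consistent_windows({"a": "06a07", "b": "bfbb5", "c": "63447"})
```

Requiring the same algorithm and offset to match all samples filters out chance hits; an empty set only means that none of these particular candidates fits.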
Beyond that (and any other well-known string hashes you can find), I'm out of ideas. To cryptanalyse a black box hash, you start by looking for correlations between the bits of the input and the bits of the output. This gives you clues what functions might be involved in the hash. But that's a huge subject and not one I'm familiar with.
This sounds mildly illicit.
Not to rain on your parade or anything, but if the implementors have done their work right, you wouldn't notice lags beyond a few tens of milliseconds on modern CPUs even with strong cryptographic hashes, and knowing the algorithm won't help you if they have used salt correctly. If you don't have access to the code or binaries, your only hope is a trivial mistake, whether caused by technical limitations or carelessness.
There is an uncountable infinity of potential (hash) functions for any given set of inputs and outputs, and if you have no clue better than an upper bound on their computational complexity (from the lag you detect), you have a very long search ahead of you...
