Can I rely on Go's `crypto/rand` package to give me unique random string? - go

I want to generate 32 characters long unique unguessable alphanumeric secret keys. The secret key will be an identifier for my system and will be used to look up information.
While searching the web I stumbled upon the crypto/rand package of Go. Which is able to generate random alphanumerics with the help of underline system calls. But I am concerned that the value returned by the crypto/rand package might produce a non-unique string down the line.
Can anyone clarify if I can rely on the crypto/rand package for the job?

Of course with randomly generated tokens, there is always the possibility of generating a duplicate token. There are standards such as UUID (excluding v4) that use other methods to try to "guarantee" uniqueness of each identifier. These methods do not truly obviate the possibility of collisions, they just shift the failure modes. For example, UUID1 relies on uniqueness of MAC addresses, which is a whole issue of its own.
If you are not limited by the size of your tokens, you can easily pick a sufficiently large number of bits that the probability of collisions becomes so small that it is completely dwarfed by countless other failure modes (such as programmer error, cosmic rays, a mass global extinction event, etc.).
Very approximately, if you have a true random key length of N bits, you can generate 2^(N/2) keys before having a 50% chance of seeing collisions. See the Wikipedia page for UUID#Collisions for a more general formula.

With crypto/rand there is no guarantee that individual random numbers will occur more than once. The probability of this to happen is very low, however, and it may be good enough for your use case. In many cases UUID will be good enough. If you are curious about the probability of duplicate UUIDs, see Wikipedia for example.
If you really need true uniqueness you may want to combine random numbers with a map to record them, where the number serves as key and the value is a "don't care". While recording the numbers, duplicates can be detected and a new random can be requested in case. However, this approach may introduce a new challenge depending on your setting as the numbers are now kept in memory which is insecure per se. It will also be challenging in terms of complexity if your use case does not determine the quantity of secrets required during the lifetime of the system.
For me, it really boils down to the question whether the identifiers for your system you use for info lookups are really secrets or you just want unique identifiers which are hard to predict before they occur in the system. Maybe you can elaborate on your use case to clarify your requirements.

I think, for this type of thing, you should use UUID
package main
import (
"fmt"
"github.com/google/uuid"
)
func main() {
id := uuid.New()
fmt.Println(id.String())
}

Related

How can I generate a unique identifier that is apparently not progressive [duplicate]

A few months back I was tasked with implementing a unique and random code for our web application. The code would have to be user friendly and as small as possible, but still be essentially random (so users couldn't easily predict the next code in the sequence).
It ended up generating values that looked something like this:
Af3nT5Xf2
Unfortunately, I was never satisfied with the implementation. Guid's were out of the question, they were simply too big and difficult for users to type in. I was hoping for something more along the lines of 4 or 5 characters/digits, but our particular implementation would generate noticeably patterned sequences if we encoded to less than 9 characters.
Here's what we ended up doing:
We pulled a unique sequential 32bit id from the database. We then inserted it into the center bits of a 64bit RANDOM integer. We created a lookup table of easily typed and recognized characters (A-Z, a-z, 2-9 skipping easily confused characters such as L,l,1,O,0, etc.). Finally, we used that lookup table to base-54 encode the 64-bit integer. The high bits were random, the low bits were random, but the center bits were sequential.
The final result was a code that was much smaller than a guid and looked random, even though it absolutely wasn't.
I was never satisfied with this particular implementation. What would you guys have done?
Here's how I would do it.
I'd obtain a list of common English words with usage frequency and some grammatical information (like is it a noun or a verb?). I think you can look around the intertubes for some copy. Firefox is open-source and it has a spellchecker... so it must be obtainable somehow.
Then I'd run a filter on it so obscure words are removed and that words which are too long are excluded.
Then my generation algorithm would pick 2 words from the list and concatenate them and add a random 3 digits number.
I can also randomize word selection pattern between verb/nouns like
eatCake778
pickBasket524
rideFlyer113
etc..
the case needn't be camel casing, you can randomize that as well. You can also randomize the placement of the number and the verb/noun.
And since that's a lot of randomizing, Jeff's The Danger of Naïveté is a must-read. Also make sure to study dictionary attacks well in advance.
And after I'd implemented it, I'd run a test to make sure that my algorithms should never collide. If the collision rate was high, then I'd play with the parameters (amount of nouns used, amount of verbs used, length of random number, total number of words, different kinds of casings etc.)
In .NET you can use the RNGCryptoServiceProvider method GetBytes() which will "fill an array of bytes with a cryptographically strong sequence of random values" (from ms documentation).
byte[] randomBytes = new byte[4];
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(randomBytes);
You can increase the lengh of the byte array and pluck out the character values you want to allow.
In C#, I have used the 'System.IO.Path.GetRandomFileName() : String' method... but I was generating salt for debug file names. This method returns stuff that looks like your first example, except with a random '.xyz' file extension too.
If you're in .NET and just want a simpler (but not 'nicer' looking) solution, I would say this is it... you could remove the random file extension if you like.
At the time of this writing, this question's title is:
How can I generate a unique, small, random, and user-friendly key?
To that, I should note that it's not possible in general to create a random value that's also unique, at least if each random value is generated independently of any other. In addition, there are many things you should ask yourself if you want to generate unique identifiers (which come from my section on unique random identifiers):
Can the application easily check identifiers for uniqueness within the desired scope and range (e.g., check whether a file or database record with that identifier already exists)?
Can the application tolerate the risk of generating the same identifier for different resources?
Do identifiers have to be hard to guess, be simply "random-looking", or be neither?
Do identifiers have to be typed in or otherwise relayed by end users?
Is the resource an identifier identifies available to anyone who knows that identifier (even without being logged in or authorized in some way)?
Do identifiers have to be memorable?
In your case, you have several conflicting goals: You want identifiers that are—
unique,
easy to type by end users (including small), and
hard to guess (including random).
Important points you don't mention in the question include:
How will the key be used?
Are other users allowed to access the resource identified by the key, whenever they know the key? If not, then additional access control or a longer key length will be necessary.
Can your application tolerate the risk of duplicate keys? If so, then the keys can be completely randomly generated (such as by a cryptographic RNG). If not, then your goal will be harder to achieve, especially for keys intended for security purposes.
Note that I don't go into the issue of formatting a unique value into a "user-friendly key". There are many ways to do so, and they all come down to mapping unique values one-to-one with "user-friendly keys" — if the input value was unique, the "user-friendly key" will likewise be unique.
If by user friendly, you mean that a user could type the answer in then I think you would want to look in a different direction. I've seen and done implementations for initial random passwords that pick random words and numbers as an easier and less error prone string.
If though you're looking for a way to encode a random code in the URL string which is an issue I've dealt with for awhile then I what I have done is use 64-bit encoded GUIDs.
You could load your list of words as chakrit suggested into a data table or xml file with a unique sequential key. When getting your random word, use a random number generator to determine what words to fetch by their key. If you concatenate 2 of them, I don't think you need to include the numbers in the string unless "true randomness" is part of the goal.

Solidity - Generate unpredictable random number that does not depend on input

I know that the "how to generate random number" in solidity is a very common question. However, after reading the great majority of answers I did not find one to fit my case.
A short description of what I want to do is: I have a list of objects that each have a unique id, a number. I need to produce a list that contains 25% of those objects, randomly selected each time the function is called. The person calling the function cannot be depended on to provide input that will somehow influence predictably the resulting list.
The only answer I found that gives a secure random number was Here. However, it depends on input coming from the participants and it is meant to address a gambling scenario. I cannot use it in my implementation.
All other cases mention that the number generated is going to be predictable, and even some of those depend on a singular input to produce a single random number. Once again, does not help me.
Summarising, I need a function that will give me multiple, non-predictable, random numbers.
Thanks for any help.
Here is an option:
function rand()
public
view
returns(uint256)
{
uint256 seed = uint256(keccak256(abi.encodePacked(
block.timestamp + block.difficulty +
((uint256(keccak256(abi.encodePacked(block.coinbase)))) / (now)) +
block.gaslimit +
((uint256(keccak256(abi.encodePacked(msg.sender)))) / (now)) +
block.number
)));
return (seed - ((seed / 1000) * 1000));
}
It generates a random number between 0-999, and basically it's impossible to predict it (It has been used by some famous Dapps like Fomo3D).
Smart Contracts are deterministic, so, basically every functions are predictable - if we know input, we will be and we should be know output. And you cannot get random number without any input - almost every language generates "pseudo random number" using clock. This means, you will not get random number in blockchain using simple method.
There are many interesting methods to generate random number using Smart Contract - using DAO, Oracle, etc. - but they all have some trade-offs.
So in conclusion, There is no method you are looking for. You need to sacrifice something.
:(
100% randomness is definitely impossible on Ethereum. The reason for that is that when distributed nodes are building from the scratch the blockchain they will build the state by running every single transaction ever created on the blockchain, and all of them have to achieve the exact same final status. In order to do that randomness is totally forbidden from the Ethereum Virtual Machine, since otherwise each execution of the exact same code would potentially yield a different result, which would make impossible to reach a common final status among all participants of the network.
That being said, there are projects like RanDAO that pretend to create trustable pseudorandomness on the blockchain.
In any case, there are approaches to achieve pseudandomness, being two of the most important ones commit-reveal techniques and using an oracle (or a combination of both).
As an example that just occurred to me: you could use Oraclize to call from time to time to a trusted external JSON API that returns pseudorandom numbers and verify on the contract that the call has truly been performed.
Of course the downside of these methods is that you and/or your users will have to spend more gas executing the smart contracts, but it's in my opinion a fair price for the huge benefits in security.

Is r.uuid() guaranteed to be unique?

Is r.uuid() guaranteed to be unique?
Return a UUID (universally unique identifier), a string that can be used as a unique ID.
How universal is r.uuid()? Is it scoped to a table/database/instance of RethinkDB? Or is it simply computing the hash of a random byte sequence (e.g. /dev/rand)? Or does it hash nano-unix time?
You can check the answers to a related question in here.
UUIDs are supposed to be uniques because of the very low probability of colisitions. Although in theory they may not be uniques as it's a random algorithm that generates the UUIDs, you will hardly generate a duplicate.
From the Wikipedia they say that for 68,719,476,736 generated UUIDs (Which it's a very huge number for a common application) you have 0.0000000000000004 for an accidental clash. It's almost impossible..
UUID means universally unique identifier. In this context the word unique should be taken to mean "practically unique" rather than "guaranteed unique". Since the identifiers have a finite size, it is possible for two differing items to share the same identifier. This is a form of hash collision.
Anyone can create a UUID and use it to identify something with reasonable confidence that the same identifier will never be unintentionally created by anyone to identify something else.
A UUID is simply a 128-bit value.

When do hashes collide?

I understand that according to pigeonhole principle, if number of items is greater than number of containers, then at least one container will have more than one item. Does it matter which container will it be? How does this apply to MD5, SHA1, SHA2 hashes?
No it doesn't matter which container it is, and in fact this is not that important to cryptographic hashes; much more important is the birthday paradox, which says that you only need to hash sqrt(numberNeededByPigeonHolePrincipal) values, on average, before finding a collision.
Thus, the hash needs to be large enough that the square-root of the search space is too large to brute-force. The square-root-of-search-space for SHA1 is 280, and as of March 2012, no two values have ever been found with the same SHA1-hash (though I predict that will happen within the next year or two..); same with SHA2, a family of hashes which all have an even larger search-space. MD5 has been broken for a while though.
If you have more items to hash than you have slots, then you'll have hash collisions. But if you have a poor hashing algorithm, then you'll see collisions even when the items / slots ratio is very small. A good hashing algorithm (including most of the ones you'll see in the wild) will attempt to spread the resulting hashes over the entire output space as evenly as possible, and thus minimize collisions.
Note that a hash collision is not the end of the world. When used in a hash table, for instance, it just means that more than one item is stored in a slot, and the table code will have to traverse a little bit more to find or add the target item, increasing lookup time slightly.
You'll see people refer to MD5 as a "broken" hashing algorithm, when in reality, it's just a poor one to use as a cryptographic hash. It'll be better than one you build yourself.
The point of a hash function is to randomly distribute items into containers. For any good hash function, it doesn't/shouldn't "matter" which container is which as they must be indistinguishable.
This does not apply to "perfect hash" implementations which attempt to do better than random distribution — unlike the algorithms you mentioned.
As Michael mentioned, collisions happen LONG before there are as many items as slots. You must have graceful collision handling (or a perfect hash) if you want to handle the birthday paradox.
I think which application you're using the hash function for is an important distinction. Frequent collision in hashing containers, for example, can degrade performance. Frequent collision in cryptography will have far more devastating consequences (see: cryptographic hash function on Wikipedia).
Collision happens relatively easily even with "decent" hashing algorithm. For example, in Java,
String s = new String(new char[size]);
always hashes to 0. That is, all strings containing only \0 hash to 0 in Java.
As for "does it matter which container will it be?", again it depends on the application. You can design hash functions that would hash "similar" objects to nearby values. This is useful when you want to search for similar objects, for example. Just hash them all and see where they fall. In this case, collisions or near-collisions are desirable, because it groups objects that are similar.
In other applications, you want even the slightest change in the object to result in an entirely different hash value. This is the case in cryptography, for example, where you want to be as certain as possible that something has not been modified. It is far more difficult to find different objects that hash to the same value in this case.
Depending on your application, cryptographic hashes like MDA, SHA1/2 etc. may not be the ideal choice, precisely because they appear as if entirely random, thus giving you collisions as prediced by the birthday paradox. Traditionally, one reason for using simple hashes based on the remainder operation is that keys were expected to be serial numbers or similar, so that a remainder operation would sustain fewer collisions than expected at random. E.g. if the keys are the integers are 1..1000 you might have no collisions at all in a container of size 1009 if your hash function is the key mod 1009. People would sometimes hand-tune systems by carefully picking container size and hash function to achieve an even split.
Of course, if you have to worry about people maliciously choosing keys that will cause you difficulty, or an upstream system sending you very biassed keys (because e.g. it has its own hash table and decides to process all keys that hash to X at once). you may wish to use a hash based on a keyed cryptographic hash function to defend against this.

Mapping a value to an other value and back

Imagine a value, say '1234'. I want to map that value to an other value, say 'abcd'. The constrains:
The length of the target value is equal to the start value
The mapping should be unique. E.g. 1234 should only map to abcd and viseversa
The mapping process should be (very) difficult to guess. E.g. multiplying by 2 does count
The mapping should be reversible
The start value is an integer
The target value can be of any type
This should be a basic algorithm, eventually I'll write it in Ruby but that is of no concern here.
I was thinking along the following lines:
SECRET = 1234
def to(int)
SECRET + int * 2
end
def fro(int)
(int - SECRET) / 2
end
Obviously this violates constrains 1 and 3.
The eventual goal is to anonymize records in my database. I might be over thinking this.
First off, I rather think your objectives are too ambitious: why constraint 6?
Second, what you need is technically a bijection from the domain of integers.
Third, your constraint 3 goes against Kerkhoff's principle. You'd be better off with a well-known algorithm governed by a secret key, where the secret key is hard to derive even if you know the results for a large set of integers.
Fourth, what are you anonymizing against? If you are dealing with personal information, how will you protect against statistical analysis revealing that Xyzzy is actually John Doe, based on the relations to other data? There's some research on countering such attack vectors (google for e.g. 'k-anonymization').
Fifth, use existing cryptographic primitives rather than trying to invent your own. Encryption algorithms exist (e.g. AES in cipher-block-chaining mode) that are well-tested -- AES is well supported by all modern platforms, presumably Ruby as well. However, encryption still doesn't give records anonymity in any strong sense.
It might be worth you giving a little more detail on what you're trying to acheive. Presumably you're worried about some evil person getting hold of your data, but isn't it equally possible that this evil person will also have access to the code that accessed your database? What's to stop them learning the algorithm by inspecting your code?
If you truely want to anonymize the data then that's generally a one way thing (names are removed, credit card values are removed etc). If you're trying to encrypt the contents of the database then many database engines provide well tested mechanisms to do this. For example:
Best practices for dealing with encrypted data in MSSQL
database encryption
It's always better to use a product's encryption mechanism than roll your own.

Resources