I need to generate a 32 digit random number using lua, please suggest some way - random

I want to generate a 32 digit key containing alphanumeric characters and they should always be UNIQUE. Please suggest a way of doing that.
I've used math.random function but I get the same random number over and over again, i want it to be unique.

If you want a unique number you have to have a list of used numbers. Otherwise you will always have the chance of getting a used number again. Although it is quite unlikely with 32 digits.
You obviously did not read the documentation on math.random:
http://www.lua.org/manual/5.3/manual.html#pdf-math.random
Otherweise you would know that math.random will always give you the same pseudo random numbers unless you change the seed value using math.randomseed...
Please make sure to read the documentation on functions befor you use them.

The trivial program below generates 1000 random 32-bit keys without repetion with no wasted effort:
M=2^32-1
R={}
N=1000
n=0
for i=1,N do
local x=math.random(M)
if R[x]==nil then n=n+1 R[x]=n print(n,i,n==i,x) end
end
The point is that M is so large that repetitions are very unlikely, even if we increase N to one million.

Related

Using a specific seed in the `RANDOM_NUMBER` algorithm

I'm looking to use a specific set of seeds for the intrinsic function RANDOM_NUMBER (a PRNG). What I've read so far is that the seed value can be set via calling RANDOM_SEED, specifically RANDOM_SEED(GET = array). My confusion is how to (if it's possible) set a specific value for the algorithm, for instance in the RAND, RANDU, or RANDM algorithms one can specify their own seed directly. I'm not sure how to set the seed, as the get function seems to take an array. If it takes an array, does it always pull the seed value from a specific index in the array?
Basically, is there a way to set a specific single seed value? If so, would someone be able to write-it out?
As a side note - I'm attempting to set my seed because allegedly one of the other PRNGs I mentioned only works well with "large odd numbers" according to my professor, so I decided that I may as well control this when comparing the PRNG's.
First, the RANDOM_SEED is only used for controlimg the seed of RANDOM_NUMBER(). If you use any other random number generator subroutine or function, it will not affect them at all or if yes then in some compiler specific way which you must find in the manual. But most probably it does not affect them.
Second, you should not care at all whether the seed array contains 1, 4 or 42 integers, it doesn't matter because the PRNG uses the bits from the whole array in some unspecified custom way. For example you may be generating 64 bit real numbers, but the seed array is made of 32 bit integers. You cannot simply say which integer from the seed array does what. You can view the whole seed array as one big integer number that is cut into smaller pieces if you want.
Regarding your professors advice, who knows what he meant, but the seed is probably set by some procedure of that particular generator, and not by the standard RANDOM_SEED, so you must read the documentation for that generator.
And how to use a specific seed in RANDOM_SEED? It was described on this site several times,jkust search for RANDOM_SEED in the top right search field, really. But it is simple, once you know the size of the array, size it to any non-trivial numbers (you need enough non-zero bits) you want and use put=. That's really all, just don't think about individual values in the array, the whole array is one piece of data together.

Generate random number in interval in PostScript

I am struggling to find a way to generate a random number within a given interval in PostScript.
Basically PostScript has three functions to help you generate (pseudo-)random numbers. Those are rand, srand and rrand.
The later two are for passing a seed to the number generator to be able to reproduce specific results. At least that´s what I understood they are for. Anyway they don´t seem suitable for my case.
So rand seems to be the only function I can use to generate a random number, but...
rand returns a random integer in the range 0 to 231 − 1 (From the PostScript Language Reference, page 637 (651 in the PDF))
This is far beyond the the interval I´m looking for. I am more interested in values up to small thousands, maybe 10.000 or something like that and small float values, up to 100, all with the lower limit of 0.
I thought I could just narrow my numbers down by simple divisions and extracting the root but that tends to give me unusable small values in quite a lot cases. I am wondering if there are robust ways to either shrink a large number down to what I need or, I´d prefer that, only generate numbers in the desired interval.
Besides: while-loops are not possible in PostScript, otherwise I´d have written a function to generate numbers until they fit in my interval.
Any hints on what to look for breaking numbers down into my interval?
mod is often good enough and it's fast. But you may get a more uniform distribution by using floating-point ops.
rand 16#7fffffff div 100 mul cvi
This is because mod discards the upper bits of the input. And the PRNG is usually trying to randomize over all the bits. By scaling down then up, they all contribute something in the way of rounding effects.
Just use the modulo operator to get it down to the size you want:
GS>rand 100 mod stack
7

Is there anything wrong with a random seed of 0, 1

Is there anything particularly wrong with a random seed of 0, 1... or any other small integer, when using Mersenne Twister?
(I definitely want a repeatable pseudorandom sequence hence not seeding from time() or such like).
No problem whatsoever.
The point of seeding via a single integer is programmers's convenience:
You don't need to remember the rules for admissible seeds ("need 55 numbers, not all even; need < 256 bytes; ...");
You can (hopefully) easily get either different streams, or sequences which are at least far away from one another (some generators can prove this, not sure about MT); the seed is just (conceptually!) an index into a list of possible sequences.
The actual value of the seed is irrelevant -- 0 isn't more random than 1391202260 :)

Chicken/Egg problem: Hash of file (including hash) inside file! Possible?

Thing is I have a file that has room for metadata. I want to store a hash for integrity verification in it. Problem is, once I store the hash, the file and the hash along with it changes.
I perfectly understand that this is by definition impossible with one way cryptographic hash methods like md5/sha.
I am also aware of the possibility of containers that store verification data separated from the content as zip & co do.
I am also aware of the possibility to calculate the hash separately and send it along with the file or to append it at the end or somewhere where the client, when calculating the hash, ignores it.
This is not what I want.
I want to know whether there is an algorithm where its possible to get the resulting hash from data where the very result of the hash itself is included.
It doesn't need to be cryptographic or fullfill a lot of criterias. It can also be based on some heuristics that after a realistic amount of time deliver the desired result.
I am really not so into mathematics, but couldn't there be some really advanced exponential modulo polynom cyclic back-reference devision stuff that makes this possible?
And if not, whats (if there is) the proof against it?
The reason why i need tis is because i want (ultimately) to store a hash along with MP4 files. Its complicated, but other solutions are not easy to implement as the file walks through a badly desigend production pipeline...
It's possible to do this with a CRC, in a way. What I've done in the past is to set aside 4 bytes in a file as a placeholder for a CRC32, filling them with zeros. Then I calculate the CRC of the file.
It is then possible to fill the placeholder bytes to make the CRC of the file equal to an arbitrary fixed constant, by computing numbers in the Galois field of the CRC polynomial.
(Further details possible but not right at this moment. You basically need to compute (CRC_desired - CRC_initial) * 2-8*byte_offset in the Galois field, where byte_offset is the number of bytes between the placeholder bytes and the end of the file.)
Note: as per #KeithS's comments this solution is not to prevent against intentional tampering. We used it on one project as a means to tie metadata within an embedded system to the executable used to program it -- the embedded system itself does not have direct knowledge of the file(s) used to program it, and therefore cannot calculate a CRC or hash itself -- to detect inadvertent mismatch between an embedded system and the file used to program it. (In later systems I've just used UUIDs.)
Of course this is possible, in a multitude of ways. However, it cannot prevent intentional tampering.
For example, let
hash(X) = sum of all 32-bit (non-overlapping) blocks of X modulo 65521.
Let
Z = X followed by the 32-bit unsigned integer (hash(X) * 65521)
Then
hash(Z) == hash(X) == last 32-bits of Z
The idea here is just that any 32-bit integer congruent to 0 modulo 65521 will have no effect on the hash of X. Then, since 65521 < 2^16, hash has a range less then 2^16, and there are at least 2^16 values less than 2^32 congruent to 0 modulo 65521. And so we can encode the hash into a 32 bit integer that will not affect the hash. You could actually use any number less than 2^16, 65521 just happens to be the largest such prime number.
I remember an old DOS program that was able to embed in a text file the CRC value of that file. However, this is possible only with simple hash functions.
Altough in theory you could create such file for any kind of hash function (given enough time or the right algorithm), the attacker would be able to use exactly the same approach. Even more, he would have a chose: to use exactly your approach to obtain such file, or just to get rid of the check.
It means that now you have two problems instead of one, and both should be implemented with the same complexity. It's up to you to decide if it worth it.
EDIT: you could consider hashing some intermediary results (like RAW decoded output, or something specific to your codec). In this way the decoder would have it anyway, but for another program it would be more difficult to compute.
No, not possible. You either you a separate file for hashs ala md5sum, or the embedded hash is only for the "data" portion of the file.
the way the nix package manager does this is by when calculating the hash you pretend the contents of the hash in the file are some fixed value like 20 x's and not the hash of the file then you write the hash over those 20 x's and when you check the hash you read that and ignore again it pretending the hash was just the fixed value of 20 x's when hashing
they do this because the paths at which a package is installed depend on the hash of the whole package so as the hash is of fixed length they set it as some fixed value and then replace it with the real hash and when verifying they ignore the value they placed and pretend it's that fixed value
but if you don't use such a method is it impossible
It depends on your definition of "hash". As you state, obviously with any pseudo-random hash this would be impossible (in a reasonable amount of time).
Equally obvious, there are of course trivial "hashes" where you can do this. Data with an odd number of bits set to 1 hash to 00 and an even number of 1s hash to 11, for example. The hash doesn't modify the odd/evenness of the 1 bits, so files hash the same when their hash is included.

How to generate a verification code/number?

I'm working on an application where users have to make a call and type a verification number with the keypad of their phone.
I would like to be able to detect if the number they type is correct or not. The phone system does not have access to a list of valid numbers, but instead, it will validate the number against an algorithm (like a credit card number).
Here are some of the requirements :
It must be difficult to type a valid random code
It must be difficult to have a valid code if I make a typo (transposition of digits, wrong digit)
I must have a reasonable number of possible combinations (let's say 1M)
The code must be as short as possible, to avoid errors from the user
Given these requirements, how would you generate such a number?
EDIT :
#Haaked: The code has to be numerical because the user types it with its phone.
#matt b: On the first step, the code is displayed on a Web page, the second step is to call and type in the code. I don't know the user's phone number.
Followup : I've found several algorithms to check the validity of numbers (See this interesting Google Code project : checkDigits).
After some research, I think I'll go with the ISO 7064 Mod 97,10 formula. It seems pretty solid as it is used to validate IBAN (International Bank Account Number).
The formula is very simple:
Take a number : 123456
Apply the following formula to obtain the 2 digits checksum : mod(98 - mod(number * 100, 97), 97) => 76
Concat number and checksum to obtain the code => 12345676
To validate a code, verify that mod(code, 97) == 1
Test :
mod(12345676, 97) = 1 => GOOD
mod(21345676, 97) = 50 => BAD !
mod(12345678, 97) = 10 => BAD !
Apparently, this algorithm catches most of the errors.
Another interesting option was the Verhoeff algorithm. It has only one verification digit and is more difficult to implement (compared to the simple formula above).
For 1M combinations you'll need 6 digits. To make sure that there aren't any accidentally valid codes, I suggest 9 digits with a 1/1000 chance that a random code works. I'd also suggest using another digit (10 total) to perform an integrity check. As far as distribution patterns, random will suffice and the check digit will ensure that a single error will not result in a correct code.
Edit: Apparently I didn't fully read your request. Using a credit card number, you could perform a hash on it (MD5 or SHA1 or something similar). You then truncate at an appropriate spot (for example 9 characters) and convert to base 10. Then you add the check digit(s) and this should more or less work for your purposes.
You want to segment your code. Part of it should be a 16-bit CRC of the rest of the code.
If all you want is a verification number then just use a sequence number (assuming you have a single point of generation). That way you know you are not getting duplicates.
Then you prefix the sequence with a CRC-16 of that sequence number AND some private key. You can use anything for the private key, as long as you keep it private. Make it something big, at least a GUID, but it could be the text to War and Peace from project Gutenberg. Just needs to be secret and constant. Having a private key prevents people from being able to forge a key, but using a 16 bit CR makes it easier to break.
To validate you just split the number into its two parts, and then take a CRC-16 of the sequence number and the private key.
If you want to obscure the sequential portion more, then split the CRC in two parts. Put 3 digits at the front and 2 at the back of the sequence (zero pad so the length of the CRC is consistent).
This method allows you to start with smaller keys too. The first 10 keys will be 6 digits.
Does it have to be only numbers? You could create a random number between 1 and 1M (I'd suggest even higher though) and then Base32 encode it. The next thing you need to do is Hash that value (using a secret salt value) and base32 encode the hash. Then append the two strings together, perhaps separated by the dash.
That way, you can verify the incoming code algorithmically. You just take the left side of the code, hash it using your secret salt, and compare that value to the right side of the code.
I must have a reasonnable number of possible combinations (let's say 1M)
The code must be as short as possible, to avoid errors from the user
Well, if you want it to have at least one million combinations, then you need at least six digits. Is that short enough?
When you are creating the verification code, do you have access to the caller's phone number?
If so I would use the caller's phone number and run it through some sort of hashing function so that you can guarantee that the verification code you gave to the caller in step 1 is the same one that they are entering in step 2 (to make sure they aren't using a friend's validation code or they simply made a very lucky guess).
About the hashing, I'm not sure if it's possible to take a 10 digit number and come out with a hash result that would be < 10 digits (I guess you'd have to live with a certain amount of collision) but I think this would help ensure the user is who they say they are.
Of course this won't work if the phone number used in step 1 is different than the one they are calling from in step 2.
Assuming you already know how to detect which key the user hit, this should be doable reasonably easily. In the security world, there is the notion of a "one time" password. This is sometimes referred to as a "disposable password." Normally these are restricted to the (easily typable) ASCII values. So, [a-zA-z0-9] and a bunch of easily typable symbols. like comma, period, semi colon, and parenthesis. In your case, though, you'd probably want to limit the range to [0-9] and possibly include * and #.
I am unable to explain all the technical details of how these one-time codes are generated (or work) adequately. There is some intermediate math behind it, which I'd butcher without first reviewing it myself. Suffice it to say that you use an algorithm to generate a stream of one time passwords. No matter how mnay previous codes you know, the subsequent one should be impossibel to guess! In your case, you'll simply use each password on the list as the user's random code.
Rather than fail at explaining the details of the implementation myself, I'll direct you to a 9 page article where you can read up on it youself: https://www.grc.com/ppp.htm
It sounds like you have the unspoken requirement that it must be quickly determined, via algorithm, that the code is valid. This would rule out you simply handing out a list of one time pad numbers.
There are several ways people have done this in the past.
Make a public key and private key. Encode the numbers 0-999,999 using the private key, and hand out the results. You'll need to throw in some random numbers to make the result come out to the longer version, and you'll have to convert the result from base 64 to base 10. When you get a number entered, convert it back to base64, apply the private key, and see if the intereting numbers are under 1,000,000 (discard the random numbers).
Use a reversible hash function
Use the first million numbers from a PRN seeded at a specific value. The "checking" function can get the seed, and know that the next million values are good. It can either generate them each time and check one by one when a code is received, or on program startup store them all in a table, sorted, and then use binary search (maximum of compares) since one million integers is not a whole lot of space.
There are a bunch of other options, but these are common and easy to implement.
-Adam
You linked to the check digits project, and using the "encode" function seems like a good solution. It says:
encode may throw an exception if 'bad' data (e.g. non-numeric) is passed to it, while verify only returns true or false. The idea here is that encode normally gets it's data from 'trusted' internal sources (a database key for instance), so it should be pretty usual, in fact, exceptional that bad data is being passed in.
So it sounds like you could pass the encode function a database key (5 digits, for instance) and you could get a number out that would meet your requirements.

Resources