Algorithm to generate unique (possibly auto-incremented) ids

I need to generate unique ids for my application and I am looking for suitable algorithms. I would prefer something like this:
YYYY + MM + DD + HH + MM + SS + <random salt> + <something derived from the preceding values>
For example:
20100128184544ewbhk4h3b45fdg544
I was thinking about using SHA-256 or something similar, but the resulting string should not be too long. I could use UUIDs, but again, they are too long and, as far as I understand, they are guaranteed to be unique on only one machine.
I would welcome suggestions and ideas. My programming language is Java.
Edit: The ids need not be cryptographically secure. I am looking at simpler hashing algos like the one by Dan Bernstein, etc.

You could use that SHA-256 and then only take the first 10 bytes from the result (or however many you like, balancing length and uniqueness however you like).
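A rough way to weigh length against uniqueness is the birthday bound. Assuming the truncated hash behaves like a uniformly random k-bit value (an assumption, though a reasonable one for SHA-256), the chance of at least one collision among n ids is approximately:

```latex
\[
  P(\text{collision}) \approx \frac{n(n-1)}{2 \cdot 2^{k}}
  \qquad \text{(valid while } P \ll 1\text{)}
\]
% Example: the first 10 bytes give k = 80 bits. With n = 10^6 ids,
% P \approx \frac{10^{12}}{2^{81}} \approx 4 \times 10^{-13}.
```

Every byte you drop removes 8 bits from k and so multiplies this estimate by 256.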

So I have finally settled on this:
d = YYYYMMDDHHMMSS
hash = d + sha256(d + random_salt)[:10]
Thank you all for the responses.
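In Java, the scheme above could look roughly like the following. This is only a sketch: the class and method names are made up for illustration, and it assumes that [:10] means the first 10 hex characters of the digest and that the salt is a random long rendered as hex.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; the names are illustrative, not from the original post.
class TimestampId {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final SecureRandom RANDOM = new SecureRandom();

    // Builds d + sha256(d + salt)[:10], using the first 10 hex characters of the digest.
    static String build(String d, String salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((d + salt).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return d + hex.substring(0, 10);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Uses the current local time and a fresh random salt (assumed salt format).
    static String next() {
        String d = LocalDateTime.now().format(STAMP);
        String salt = Long.toHexString(RANDOM.nextLong());
        return build(d, salt);
    }
}
```

TimestampId.next() returns a 24-character id: 14 digits of timestamp followed by 10 hex characters. Two ids created in the same second differ only in those 10 characters (40 bits), so collisions are unlikely but not impossible.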

Try this:
java.security.MessageDigest.getInstance("SHA-256")

I think if you use SHA1(MD5(YYYYMMDDHHMMSS + YourSystemName + ClientName)) you'll be fine with 40 chars. ;)

Related

I cannot understand an lmer error

I tried to solve the problem by reading other answers but did not find a solution.
I am fitting an lmer model:
MODHET <- lmer(PERC ~ SITE + TREAT + HET + TREAT*HET + (1|PINE), data = PRESU)
PERC is the percentage of predation. SITE is a categorical variable that I am using as a blocking factor; it is the identity of the site where I performed the experiment. TREAT is a categorical variable with 2 levels. HET is a continuous variable. There are 56 observations across 7 sites.
Maybe the problem is how I expressed the random factor. In every site I selected 8 pines out of 15 for the experiment, and I included pine identity as a categorical random factor. For instance, in site 1 the pines are called a1, a3, a7, etc., while in site 2 they are called b1, b4, b12, etc.
The output of the model is
Error: number of levels of each grouping factor must be < number of observations
I don't understand where the mistake is. Could it be how I named the pines?
I also tried
MODHET <- lmer(PERC ~ SITE + TREAT + HET + TREAT*HET + (1|SITE:PINE), data = PRESU)
but the output is the same.
I hope I have explained my problem well. I have read similar questions on this forum but still have not found the solution.
Thank you for your help.
Use the argument control = lmerControl(check.nobs.vs.nRE = "ignore") in your lmer call to suppress this error. However, I guess this does not solve the actual problem. With 8 pines in each of 7 sites and 56 observations, every pine appears exactly once, so PINE has as many levels as there are observations and your grouping factor contains no real "groups". Probably SITE is your random intercept?
If you consider PINES nested as "subjects" within SITES, then I would suggest the following formula:
MODHET <- lmer(PERC ~ TREAT*HET + (1|SITE), data = PRESU)
or,
MODHET <- lmer(PERC ~ TREAT*HET + (1 | SITE / PINE), data = PRESU)
But my answer may be wrong, I'm not sure whether I have enough information to fully understand what you're aiming at.
Edit:
Sorry, the nesting was not correctly specified; I fixed it in the formula above. See also this answer.

How can I combine comma format with scientific format in SAS?

I have data that I would like to display as comma10.2 when less than 1,000,000 and as e10. when greater than or equal to 1,000,000. It seems like there might be a way to do this using a picture format, so I thought I might also make missing values show up as --. This is what I've got so far:
proc format;
picture DashMiss . = '--' (noedit)
low - <1000000 = "000,009.99"
1000000 - high = ????;
run;
I'm not sure how to represent scientific notation using picture (hence the question marks). I don't have to just use picture if there's an easier way to do it.
I figured out how to use brackets to add the conditional format:
proc format;
picture DashMiss . = '--' (noedit)
low - <1000000 = "000,009.99"
1000000 - high = [e10.];
run;
I believe you could have simply used the best6. format, or bestd6.2, to achieve the same results. It switches to scientific notation on its own whenever the value does not fit in the width (the first of the two integers).

Ruby on Rails - generating bit.ly style identifiers

I'm trying to generate UUIDs with the same style as bit.ly urls like:
http://bit [dot] ly/aUekJP
or cloudapp ones:
http://cl [dot] ly/1hVU
which are even shorter.
How can I do it?
I'm now using the UUID gem for Ruby, but I'm not sure if it's possible to limit the length and get something like this.
I am currently using this:
UUID.generate.split("-")[0] => b9386070
But I would like something even shorter, while knowing that it will be unique.
Any help would be much appreciated :)
edit note: replaced dot letters with [dot] for workaround of banned short link
You are confusing two different things here. A UUID is a universally unique identifier. It has a very high probability of being unique even if millions of them were being created all over the world at the same time. It is generally displayed as a 36-character string. You cannot keep only the first 8 characters and expect the result to be unique.
Bitly, tinyurl et al. store links and generate a short code to represent each link. They do not reconstruct the URL from the code; they look it up in a data store and return the corresponding URL. These are not UUIDs.
Without knowing your application it is hard to advise on what method you should use. However, you could store whatever you are pointing at in a data store with a numeric key and then rebase the key to base32 using the 10 digits and 22 lowercase letters, perhaps avoiding the obvious typo problems like 'o', 'i', 'l', etc.
EDIT
On further investigation, there is a Ruby base32 gem available that implements Douglas Crockford's Base 32 encoding.
A 5-character Base32 string can represent over 33 million integers and a 6-character string over a billion.
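For readers who want to see the rebasing idea outside Ruby, here is a minimal Java sketch of encoding a numeric key with Crockford's alphabet (digits plus uppercase letters without I, L, O and U). The class name is made up, and the decoder is simplified: it does not implement Crockford's rule of reading I and L as 1 and O as 0.

```java
import java.util.Locale;

// Hypothetical helper, not the Ruby gem mentioned above.
class CrockfordBase32 {
    // 32 symbols: 0-9 and A-Z without I, L, O, U.
    private static final String ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    static String encode(long key) {
        if (key < 0) {
            throw new IllegalArgumentException("key must be non-negative");
        }
        if (key == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (key > 0) {
            sb.append(ALPHABET.charAt((int) (key & 31))); // lowest 5 bits first
            key >>>= 5;
        }
        return sb.reverse().toString();
    }

    static long decode(String code) {
        long key = 0;
        for (char c : code.toUpperCase(Locale.ROOT).toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + c);
            }
            key = key * 32 + digit;
        }
        return key;
    }
}
```

The largest key that fits in 5 characters is 32^5 - 1 = 33,554,431, which matches the "over 33 million" figure.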
If you are working with numbers, you can use the built-in Ruby methods:
6175601989.to_s(30)
=> "8e45ttj"
To go back:
"8e45ttj".to_i(30)
=> 6175601989
So you don't have to store anything, you can always decode an incoming short_code.
This works ok for proof of concept, but you aren't able to avoid ambiguous characters like: 1lji0o. If you are just looking to use the code to obfuscate database record IDs, this will work fine. In general, short codes are supposed to be easy to remember and transfer from one medium to another, like reading it on someone's presentation slide, or hearing it over the phone. If you need to avoid characters that are hard to read or hard to 'hear', you might need to switch to a process where you generate an acceptable code, and store it.
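The same round trip is available in Java through Long.toString and Long.parseLong, which accept a radix from 2 to 36. A small sketch, with a made-up wrapper class:

```java
// Hypothetical wrapper around the JDK radix conversions.
class RadixCodes {
    static String encode(long id) {
        return Long.toString(id, 30); // digits 0-9, then lowercase a-t
    }

    static long decode(String code) {
        return Long.parseLong(code, 30);
    }
}
```

RadixCodes.encode(6175601989L) gives "8e45ttj", the same string as the Ruby call, and decode reverses it. The caveat about ambiguous characters applies here too.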
I found this to be short and reliable:
def create_uuid(prefix=nil)
  time = (Time.now.to_f * 10_000_000).to_i
  jitter = rand(10_000_000)
  key = "#{jitter}#{time}".to_i.to_s(36)
  [prefix, key].compact.join('_')
end
This spits out unique keys that look like this: '3qaishe3gpp07w2m'
Reduce the 'jitter' size to reduce the key size.
Caveat:
This is not guaranteed unique (use SecureRandom.uuid for that), but it is highly reliable:
10_000_000.times.map {create_uuid}.uniq.length == 10_000_000
The only way to guarantee uniqueness is to keep a global count and increment it for each use: 0000, 0001, etc.

Generating confirmation numbers

I need a technique (and a pointer to sample code, if you have one) for generating confirmation numbers for web payments. I don't want the customer to have to write down a long sequence like a GUID, but I don't want it to be easily predictable either.
Using C#
Thanks for all the tips.
I decided on a format like this:
TdddRROOO
T = 2009 (next year will be U = 2010)
ddd = days this year
RR = two random digits
OOO = order number (I'll offset this so folks can't know the order number that day)
So the confirmation number will be something like
P23477098
You could do something with a mixture. Generate the first half of the key as a known, predictable value (e.g. 00001, 00002, 00003, etc.) and then generate the second half as a randomly generated value so it won't be predictable. Then, increment the "known, predictable" value so that you will never get a match.
Your unique code would then become: 00001-53481, 00002-43853, 00003-54511, etc.
Of course, I am sure there are libraries out there that probably do this already. (It might help if you specify what language you are using.)
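A minimal Java sketch of this mixture, assuming an in-memory counter (a real system would more likely take the sequential part from a database sequence so it survives restarts). The class name and the 5+5 digit layout are illustrative only.

```java
import java.security.SecureRandom;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical generator: sequential half for uniqueness, random half for unpredictability.
class MixedCode {
    private final AtomicLong sequence = new AtomicLong(0);
    private final SecureRandom random = new SecureRandom();

    // Returns codes shaped like "00001-53481".
    String next() {
        long seq = sequence.incrementAndGet();  // never repeats within this instance
        int rnd = random.nextInt(100_000);      // 0..99999, not predictable
        return String.format("%05d-%05d", seq, rnd);
    }
}
```

Because the first half never repeats, two codes can never be equal even when the random halves happen to match. Once the counter passes 99999 the first half simply grows wider than 5 digits.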
I recently did the same thing in PHP. We use the random function in this class:
https://github.com/kohana/core/blob/3.3/master/classes/Kohana/Text.php
We use random('distinct', 8) to generate the confirmation number. It generates strings like this:
4CFY24HJ
JH5AYL7J
2TVWTMJ5
As you can see, it has no confusing numbers/letters (such as 1/l or 0/O), which makes it much clearer when customers have to read the codes over the phone.
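A comparable generator in Java might look like this. It is a sketch only: the alphabet below is an assumption chosen to leave out look-alike characters and is not taken from the PHP class.

```java
import java.security.SecureRandom;

// Hypothetical helper; alphabet and names are illustrative.
class ConfirmationCodes {
    // Assumed alphabet: digits and uppercase letters with look-alikes
    // such as 0/O, 1/I and 8/B left out.
    static final String ALPHABET = "2345679ACDEFHJKLMNPRSTUVWXYZ";
    private static final SecureRandom RANDOM = new SecureRandom();

    static String random(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }
}
```

ConfirmationCodes.random(8) returns an 8-character code. With 28 symbols there are 28^8 (about 3.8 * 10^11) possible codes, so uniqueness still has to be checked against the stored codes.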
Decide on the characters (char[] chars) that you want in your confirmation code, decide on the length of confirmation code (n), generate n random numbers (i_1, i_2, ... i_n) in the range [0..chars.Length) and return the string chars[i_1]chars[i_2]...chars[i_n].
In C#:
public string ConfirmationCode(char[] chars, int length, Random rg) {
    // Pick `length` characters at random from the allowed set.
    StringBuilder codeBuilder = new StringBuilder();
    for (int i = 0; i < length; i++) {
        int index = rg.Next(chars.Length);  // index in [0, chars.Length)
        codeBuilder.Append(chars[index]);
    }
    return codeBuilder.ToString();
}
For uniqueness, prepend the current time in yyyyMMddHHmmss format (HH is the 24-hour clock; hh would repeat every 12 hours).
Just generate a random number between 100000 and 999999, for example. Also a good idea is to put some letters in front that identify that it is a confirmation number, such as CONF-843682 so that people will recognize it more easily when you ask for it.
Store the number in the database, together with an ID for the order and an expiry date (say 1 year).
You could do something like get a random number of a specified length, convert to base64 and add a checksum character.
How about something like Amazon's PayPhrase? Use a library like Faker (Ruby) or Data::Faker (Perl) to generate random phrases, or write your own utility. Then just use a simple hash function to convert the "confirmation phrase" into a number you can index.
For C#, there is a port of Ruby's Faker gem at http://github.com/slashdotdash/faker-cs

Creating a unique alphanumeric 10-character string

I'm looking to create a simple short-lived reservation system, and I'd like to generate confirmation numbers that are:
unique
random-looking
alphanumeric
short-ish, at least much shorter than the 40-character hex strings returned by sha1
I'm only looking to have ~500 reservations, so I don't imagine a high likelihood of collisions.
One idea I had is to generate a sha1 hash based on a date-time stamp and username, then truncate it to its first 10 characters. Would something like that be reliably unique enough for the purposes of processing ~500 reservations?
There should be no difference in the randomness of any given bit of a SHA-1 hash, so that's possible. Another way would be to fold the hash into itself using XOR until you have 60 bits worth of data, then encode it using Base 64 to get a mostly alpha-numeric result.
This is only necessary if you want to be able to generate the same id repeatedly for the same input data. Otherwise, if a random id that you generate once and hold onto after that is enough, use Anders' suggestion. If you get a conflict, just generate another one.
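The XOR folding mentioned in this answer could be sketched in Java as follows. The details are assumptions: the 20 digest bytes are folded into 64 bits and the top 4 bits are then dropped to reach 60, and the URL-safe Base 64 alphabet is used so the result stays close to alphanumeric. The class name is made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: SHA-1 folded to 60 bits, written as 10 Base 64 characters.
class FoldedHash {
    private static final String BASE64URL =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    static String code(String input) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-1")
                .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // XOR-fold the 20 digest bytes into 64 bits, then keep the low 60 bits.
        long folded = 0;
        for (int i = 0; i < digest.length; i++) {
            folded ^= (digest[i] & 0xFFL) << (8 * (i % 8));
        }
        folded &= (1L << 60) - 1;
        // Emit ten 6-bit groups, most significant first.
        StringBuilder sb = new StringBuilder(10);
        for (int shift = 54; shift >= 0; shift -= 6) {
            sb.append(BASE64URL.charAt((int) ((folded >>> shift) & 63)));
        }
        return sb.toString();
    }
}
```

The same input always yields the same 10-character code, which is the property this approach is meant to provide.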
You can use whatever, even a plain random number generator; however, you should check that the reservation code isn't already present. If this is the case, add characters ('x') to the string (date+user) until you get a new random/sha1/etc.
I'm only looking to have ~500 reservations, so I don't imagine a high likelihood of collisions.
Another stupid idea: generate 1000 or 2000 unique random numbers with the desired properties, store them somewhere, and assign them to the users as they register :)
Here's one way to do it in Perl:
sub get_random_name()
{
    my @chars = ('a'..'z', 'A'..'Z');
    my $random_string;
    foreach (1..22)
    {
        # rand @chars returns a random number in [0, scalar @chars),
        # which is truncated to an integer when used as an array index
        $random_string .= $chars[rand @chars];
    }
    return $random_string . "-" . time();
}
I don't remember how long the time() part is, so you may have to adjust the numbers to fit your length. You can also remove that part if you don't need it.
If it's really just 500, then pre-generate 20,000 of them, into a table, then get the "next unused one" when you need it.
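A rough Java sketch of the pre-generated pool, kept in memory here instead of a database table. The class name, the alphabet and the code length are assumptions for illustration.

```java
import java.security.SecureRandom;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory pool of unique codes; a table with a "used" flag plays this role in the answer.
class CodePool {
    private static final String ALPHABET = "2345679ACDEFHJKLMNPRSTUVWXYZ"; // assumed alphabet
    private final Deque<String> unused = new ArrayDeque<>();

    CodePool(int size, int length) {
        SecureRandom random = new SecureRandom();
        Set<String> codes = new LinkedHashSet<>();
        // The set drops duplicates, so keep generating until enough distinct codes exist.
        while (codes.size() < size) {
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            codes.add(sb.toString());
        }
        unused.addAll(codes);
    }

    // Hands out the next unused code; throws NoSuchElementException once the pool is empty.
    synchronized String take() {
        return unused.pop();
    }
}
```

Uniqueness is settled once, when the pool is built, so handing out a code later needs no collision check.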
Some good tips on this question: How do I create a random alpha-numeric string in C++?
I'd avoid including look-alike characters such as "1"/"l", "O"/"0", "5"/"S", and "Z"/"2" in your string, to make it easier for customers when they need to read your reservation code over the phone. The algorithm presented at that link should help you do this.
Use a GUID? It is 16 bytes (32 hex characters), though if you really don't care about collisions, you could just take the first n characters.
In C# you can use http://www.dotnetfunda.com/forums/thread1357-how-do-generate-unique-alpha-numeric-random-number-in-aspnet.aspx (the super easy way, they say)

Resources