How to generate an effective order number? (nice pattern with unpredictable gaps) - algorithm

Does anyone here have a good idea for generating nice-looking order IDs?
For example:
832-28-394, which looks quite neat and formal (rather than just using a database auto-increment number like ID=35).
The order ID needs to look random so that a user cannot guess it.
E.g. 832-28-395 shouldn't exist, so there will always be some gap between consecutive IDs,
just like the account number on your bank card.
Cheers

If you are using .NET you can use System.Guid.NewGuid()

The auto-incremented IDs are stored as integer or long integer data. One reason is that this format is compact, which saves space, including in indexes, which typically include the primary key for use with joins and the like.
If you want a nice-looking ID that follows a particular format, you'll need to manage the generation of the IDs yourself and store them in a "regular" column, not an auto-incremented one.
I suggest you keep using "ugly-looking" IDs, auto-incremented or not, and format these values for display purposes only, using whatever format you like, including one that combines the values of several columns. Depending on the database system you are using, you may be able to declare custom functions at the database level, which lets you obtain the formatted value with a simple query such as:
SELECT MakeAFancyId(id_field), some_other_columns, ..
FROM ...
If you cannot use a built-in or custom function at the SQL level, you'll need to format the value returned by SQL (an integer of some sort) into the desired format on the client side, using the language of your UI / presentation framework.
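A minimal client-side sketch of that formatting step, in Python. The function name format_order_id and the 3-2-3 grouping are assumptions chosen to match the 832-28-394 example in the question. Note that this only changes how the number looks; it does not make it any harder to guess.

```python
def format_order_id(raw_id: int, groups=(3, 2, 3)) -> str:
    """Render an integer key as a zero-padded, hyphen-separated string."""
    width = sum(groups)
    digits = str(raw_id).zfill(width)
    if len(digits) > width:
        raise ValueError(f"{raw_id} needs more than {width} digits")
    parts, pos = [], 0
    for size in groups:
        parts.append(digits[pos:pos + size])
        pos += size
    return "-".join(parts)


print(format_order_id(83228394))  # 832-28-394
print(format_order_id(35))        # 000-00-035
```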

I'd create something where the first eight digits loosely follow a pattern, and the third quartet looks random but is really a sort of checksum.
So, for example, the first eight digits increment based on the current seconds on the server clock.
The first three digits of the last four could be something like the sum of the first four digits plus twice the sum of the second four, which gives a two- or three-digit number (padded to three digits). The final digit is calculated so that the sum of all 11 digits plus this last one is a multiple of 9.
This is slightly akin to how barcode numbers are verified. You can format the resulting 12 digits any way you want, although it is the first eight that are unique here.
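One way to read that scheme, as a hedged Python sketch. The function names, the zero-padding of the checksum to three digits, and the 4-4-4 display grouping are assumptions; the answer leaves those details open. The base value would come from the server clock (for example int(time.time()) reduced to eight digits), which is only unique if at most one order is created per second.

```python
def make_order_number(base: int) -> str:
    """Build 12 digits: 8 base digits, 3 checksum digits, 1 balancing digit."""
    first8 = f"{base % 10**8:08d}"
    a = sum(int(c) for c in first8[:4])
    b = sum(int(c) for c in first8[4:])
    check = f"{a + 2 * b:03d}"   # at most 36 + 72 = 108, so 3 digits suffice
    eleven = first8 + check
    total = sum(int(c) for c in eleven)
    last = (-total) % 9          # makes the 12-digit sum a multiple of 9
    digits = eleven + str(last)
    return f"{digits[:4]}-{digits[4:8]}-{digits[8:]}"


def is_valid(order_number: str) -> bool:
    """Recompute the trailing digits from the first eight and compare."""
    digits = order_number.replace("-", "")
    if len(digits) != 12 or not digits.isdigit():
        return False
    return make_order_number(int(digits[:8])) == order_number


print(make_order_number(12345678))  # 1234-5678-0621
```

Anyone who learns the rule can compute valid numbers, so a checksum like this catches typos rather than providing security.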

Hash the clock time.
Mod by 100,000 or something.
Format with hyphens.
Check for duplicates. If found, restart.
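The four steps above can be sketched roughly as follows. SHA-256, the modulus of 10**8 (to get the eight digits of the question's example rather than the 100,000 mentioned above), and the in-memory set standing in for the database's duplicate check are all assumptions made for illustration.

```python
import hashlib
import time


def candidate_id(now_ns: int) -> str:
    """Hash a clock reading, reduce it to 8 digits, and add hyphens."""
    digest = hashlib.sha256(str(now_ns).encode()).digest()
    n = int.from_bytes(digest[:8], "big") % 10**8
    s = f"{n:08d}"
    return f"{s[:3]}-{s[3:5]}-{s[5:]}"


def new_order_id(existing: set) -> str:
    """Keep generating candidates until one is not already taken."""
    while True:
        oid = candidate_id(time.time_ns())
        if oid not in existing:
            existing.add(oid)
            return oid


issued = set()
print(new_order_id(issued))
print(new_order_id(issued))
```

Because the clock is guessable, someone who knows the scheme could reproduce the IDs; hashing here mainly spreads the values out.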

I would suggest using an auto-increment ID in the database as the primary key and to link tables. Integer fields are generally faster than string fields for indexing as well as searching.
You can keep the order number (which is for display) as a separate field in the order table. And whenever you send or display a URL to the user that contains the order ID (the auto-incremented number), you can encrypt it with some algorithm.
That serves both of your purposes.
But I suggest not making a string the primary key, though you can put a unique constraint on the displayed order number.
Hope this helps.
Kalpak Luniya

I would suggest that internally you keep the database-derived primary key, which is auto-incremented.
For the visible order number, you will probably need more than 8 characters if you are relying on it for security.
If you are using Ruby, look at SecureRandom, which will generate sufficiently random strings to accommodate this. For example, SecureRandom.hex(16) returns a 32-character hex string (16 random bytes). It can also produce Base64 strings (SecureRandom.base64), which will look weirder but be shorter.
Make sure this is not your only security on an order, as it may not be that hard to find a valid order number within your 8-digit code, especially if some of the digits are a checksum.

For security reasons I suggest using a cryptographically secure random number generator. Also think about increasing the ID length: if you have 1 million users and 8-digit IDs, the probability of guessing a valid user ID on the first try is 0.01, and it takes about 69 tries to raise that probability above 0.5.
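The arithmetic behind this kind of estimate can be checked with a short sketch. It assumes that guesses are independent and uniformly random and that the ID space is every 8-digit number (10**8 values); neither assumption is stated in the answer, and tries_needed is a hypothetical helper name.

```python
import math


def tries_needed(valid_ids: int, id_space: int, target: float = 0.5) -> int:
    """Smallest number of random guesses whose hit chance exceeds target."""
    p = valid_ids / id_space
    # Solve 1 - (1 - p)**n > target for the smallest integer n.
    x = math.log(1.0 - target) / math.log1p(-p)
    return math.floor(x) + 1


print(tries_needed(1_000_000, 10**8))   # 8-digit IDs: 69
print(tries_needed(1_000_000, 10**16))  # 16-digit IDs: billions of guesses
```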

Related

Laravel schema column

I'm trying to make a numeric column in my table, but I'm not sure which option would be the best fit for my needs.
Logic
numbers start from four digits, e.g. 1000
no upper limit; can go up to something like 999999999999999999999999999999999999999999
increases incrementally, e.g. if the last number was 1000, the next will be 1001 (not random numbers)
Question
Laravel provides several options that can do the job, but I need help deciding which is best for my purpose:
bigIncrements
bigInteger
unsignedBigInteger
Also, there is the option ->autoIncrement(). Should I add that too?

How does RethinkDB generate auto ids?

I'm writing a script which is supposed to merge some data from an SQL-based DB. Each row has a long integer as its (incremental) primary key. I was thinking about hashing these IDs so that they'll somehow 'look' like the other IDs already in my RethinkDB table. What I'm trying to achieve here is to avoid dups in case of an attempt to merge the same data again, but keeping the original integers as IDs alongside the generated IDs of the data saved directly to RethinkDB's table feels weird.
Can I do that?
How does RethinkDB generate auto IDs anyway?
And am I approaching this correctly..?
RethinkDB uses a string-encoding of 128 bit UUIDs (basically hashed integers).
The string format looks like this: "HHHHHHHH-HHHH-HHHH-HHHH-HHHHHHHHHHHH" where every 'H' is a hexadecimal digit of the 128 bit integer. The characters 0-9 and a-f (lower case) are used.
If you want to generate such UUIDs from an existing integer, I recommend hashing the integer first. This will give you an even distribution over the whole key space (this makes sharding easier and avoids hotspots).
As a second step you have to format the hash value in a string of the format shown above. If you don't have enough digits, it's fine to leave some of the last 'H' as constant 0.
If you really want to go into the details of UUID generation, here are two links for further reading:
RFC 4122 "A Universally Unique IDentifier (UUID) URN Namespace" https://www.rfc-editor.org/rfc/rfc4122
RethinkDB's implementation of UUID generation and formatting https://github.com/rethinkdb/rethinkdb/blob/next/src/containers/uuid.cc
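A minimal Python sketch of the two steps described above (hash the integer, then format it as a UUID string). The choice of SHA-256 and the name uuid_from_int are assumptions; any well-distributed hash would do. Because the mapping is deterministic, merging the same row twice yields the same ID, which is what prevents duplicates.

```python
import hashlib
import uuid


def uuid_from_int(legacy_id: int) -> str:
    """Map an integer key to a lower-case 8-4-4-4-12 hex string."""
    digest = hashlib.sha256(str(legacy_id).encode()).digest()
    # Use the first 128 bits of the hash as the UUID value.
    return str(uuid.UUID(bytes=digest[:16]))


print(uuid_from_int(42))
print(uuid_from_int(42) == uuid_from_int(42))  # True: same input, same ID
```

The result has the right shape but does not set the version bits that RFC 4122 defines; if that matters, the standard library's uuid.uuid5 with a namespace of your choosing is an alternative.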

Limiting AutoIncrement to a specific range

I am trying to create an application for work. The app will be used internally and should allow us to assign some barcode numbers to our product SKUs. I am using Visual Studio / Basic 2010 Express to build this as my very limited and beginners experience is with VS 2010 Express.
I'll give a bit of information about how I see this application working and then I'll get on with my actual question:
I see the app allowing us to create a new Product in the database by a user entering the SKU and description of the product and then the app will assign this product the next available base number for the barcode and from there the app will (if required) generate the correct EAN13 and GTIN14 barcodes and store them against that SKU.
As a company we have a large range of barcode numbers we can use and we have split this large range up so that the first 50,000 (for example) are for our EAN13 codes, the next 50K are for our GTIN14 codes for Inner Cartons and the remaining 50K are for Master Cartons.
So in order to achieve this I have my Product table which contains the fields 'SKU', 'Description' and 'BarcodeBase'. I have managed to set the BarcodeBase field as unique and I am attempting to use AutoIncrement(Seed & Step) to make sure that this assigns the product a base barcode (before I calculate the check digit) that falls within the EAN13 range as described above...
So finally my question is: Is there a way I can put an upper limit on AutoIncrement so that on the off chance, way way in the future, the base barcode number will not overflow into the next range?
I've been googling unsuccessfully for an answer and I am only coming across things which talk about the data type of the field having a limit. For example the upper limit of an Int32 type. Through my searches I have become vaguely aware of the 'Expression' property of the field and also the possibility of coding a partial class - but I don't know if that is the right direction to go in or if there is something much simpler that I am overlooking / have not found.
I would really appreciate any help!
Edit: As per GrandMasterFlush's comment - I have added a local database to my VS project. So I think I am using a SQL Server Compact 3.5 db.
Use a CHECK constraint, e.g.:
ALTER TABLE dbo.Product ADD CONSTRAINT ...
CHECK (BarcodeBase BETWEEN 1 AND 50000);
I suggest you do not make BarcodeBase an IDENTITY column in the Product table (IDENTITY is the feature that you are referring to as "autoincrement"). IDENTITY is really designed for surrogate key use only and isn't ideal for meaningful business data. You can't update an IDENTITY column, it isn't necessarily sequential, may have gaps in the number sequence and you also only get to use one IDENTITY column per table. Instead of using IDENTITY in the Product table you can generate the sequence elsewhere, for example by incrementing a single value stored in a single row table.

How to insert into a DB when the number has more digits than m for NUMBER(m,n) in Oracle

This is in a DB which I do not have the privilege to alter.
A column is NUMBER(13,4). How is it possible to insert 999999999999999999, whose length is more than 13? It throws an exception. Is it possible to convert it into 1.23e3 format, and does the DB save this format?
No, it is not possible, because of the rules and limitations you mentioned yourself. The column has that format and you cannot change it, so you cannot make the value fit. Period.
No, it is not possible to insert a number that exceeds the specified precision and scale of the column.
You have to change the database.
If you don't have permissions to alter the table then simply ask someone who does; you have a valid "business" need to do so.
I would highly recommend not working out some way to "hack" around this limitation. Constraints such as this exist to enforce data quality. Though maybe misapplied in this situation, putting data in two different formats in the same column makes it immeasurably more difficult to retrieve data from the database. Hence why you should always store numbers as numbers etc.
No, unfortunately not. There is no way to achieve this.
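To see why the value cannot fit, a small Python sketch can mimic the rule for a column declared with precision p and scale s: the value is rounded to s decimal places and must then have at most p - s digits before the decimal point. This is an approximation of the database's behaviour written for illustration (fits_number is a hypothetical helper), not Oracle's own code.

```python
from decimal import Decimal, ROUND_HALF_UP


def fits_number(value: str, precision: int, scale: int) -> bool:
    """Check whether a numeric literal fits a NUMBER(precision, scale) column."""
    step = Decimal(1).scaleb(-scale)  # e.g. 0.0001 for scale 4
    rounded = Decimal(value).quantize(step, rounding=ROUND_HALF_UP)
    return abs(rounded) < Decimal(10) ** (precision - scale)


print(fits_number("999999999999999999", 13, 4))  # False: 18 integer digits
print(fits_number("999999999.9999", 13, 4))      # True: largest value that fits
print(fits_number("1.23e3", 13, 4))              # True: it is just 1230
```

Scientific notation is only another way of writing the same number, so it does not change whether the value fits.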

What would be the best algorithm to find an ID that is not used from a table that has the capacity to hold a million rows

To elaborate ..
a) A table (BIGTABLE) has a capacity to hold a million rows with a primary Key as the ID. (random and unique)
b) What algorithm can be used to arrive at an ID that has not been used so far? This number will be used to insert another row into table BIGTABLE.
Updated the question with more details:
c) This table already has about 100K rows, and the primary key is not set as an identity.
d) Currently, a random number is generated as the primary key and a row is inserted into this table; if the insert fails, another random number is generated. The problem is that it sometimes goes into a loop: the generated numbers are pretty random, but unfortunately they already exist in the table. If we retry the random number generation after some time, it works.
e) The Sybase rand() function is used to generate the random number.
Hope this addition to the question helps clarify some points.
The question is of course: why do you want a random ID?
One case where I encountered a similar requirement, was for client IDs of a webapp: the client identifies himself with his client ID (stored in a cookie), so it has to be hard to brute force guess another client's ID (because that would allow hijacking his data).
The solution I went with, was to combine a sequential int32 with a random int32 to obtain an int64 that I used as the client ID. In PostgreSQL:
CREATE FUNCTION lift(integer, integer) returns bigint AS $$
SELECT ($1::bigint << 31) + $2
$$ LANGUAGE SQL;
CREATE FUNCTION random_pos_int() RETURNS integer AS $$
select floor((lift(1,0) - 1)*random())::integer
$$ LANGUAGE sql;
ALTER TABLE client ALTER COLUMN id SET DEFAULT
lift((nextval('client_id_seq'::regclass))::integer, random_pos_int());
The generated IDs are 'half' random, while the other 'half' guarantees you cannot obtain the same ID twice:
select lift(1, random_pos_int()); => 3108167398
select lift(2, random_pos_int()); => 4673906795
select lift(3, random_pos_int()); => 7414644984
...
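The same idea can be sketched outside the database in Python. Here itertools.count stands in for the database sequence client_id_seq, and secrets replaces random() as the random source; both substitutions are assumptions for illustration. Swapping in secrets matters if the random half is what protects against guessing.

```python
import secrets
from itertools import count

_sequence = count(1)  # stand-in for the database sequence


def lift(seq: int, rand: int) -> int:
    """Put the sequence number above the 31 low bits reserved for randomness."""
    return (seq << 31) + rand


def random_pos_int() -> int:
    """Random integer in [0, 2**31 - 1), like the SQL function above."""
    return secrets.randbelow(2**31 - 1)


def next_client_id() -> int:
    return lift(next(_sequence), random_pos_int())


first, second = next_client_id(), next_client_id()
print(first, second)
print(first >> 31, second >> 31)  # the sequential halves: 1 2
```

Two IDs can never collide because their upper halves differ, while the lower 31 bits leave roughly two billion possibilities per sequence value for an attacker to guess.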
Why is the unique ID Random? Why not use IDENTITY?
How was the ID chosen for the existing rows?
The simplest thing to do is probably (Select Max(ID) from BIGTABLE) and then make sure your new "Random" ID is larger than that...
EDIT: Based on the added information I'd suggest that you're screwed.
If it's an option: Copy the table, then redefine it and use an Identity Column.
If, as another answer speculated, you do need a truly random Identifier: make your PK two fields. An Identity Field and then a random number.
If you simply can't change the table's structure, checking whether the ID exists before trying the insert is probably your only recourse.
There isn't really a good algorithm for this. You can use this basic construct to find an unused id:
int id;
do {
    id = generateRandomId();
} while (doesIdAlreadyExist(id));
doSomethingWithNewId(id);
Your best bet is to make your key space big enough that the probability of collisions is extremely low, then don't worry about it. As mentioned, GUIDs will do this for you. Or, you can use a pure random number as long as it has enough bits.
This page has the formula for calculating the collision probability.
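The linked page is not reproduced here, but the usual estimate is the birthday-bound approximation p ≈ 1 - exp(-n(n-1) / (2N)) for n random IDs drawn from a space of N values. A short Python sketch, with collision_probability as a hypothetical helper name:

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance that any two of n_ids random keys collide."""
    space = 2.0 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))


for bits in (32, 64, 128):
    print(bits, collision_probability(1_000_000, bits))
```

With a million rows, 32-bit keys collide almost surely, while 64-bit keys collide with a probability on the order of 1e-8.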
A bit outside of the box.
Why not pre-generate your random numbers ahead of time? That way, when you insert a new row into bigtable, the check has already been made. That would make inserts into bigtable a constant time operation.
You will have to perform the checks eventually, but that could be offloaded to a second process that doesn’t involve the sensitive process of inserting into bigtable.
Or go generate a few billion random numbers, and delete the duplicates, then you won't have to worry for quite some time.
Make the key field UNIQUE and IDENTITY and you won't have to worry about it.
If this is something you'll need to do often you will probably want to maintain a live (non-db) data structure to help you quickly answer this question. A 10-way tree would be good. When the app starts it populates the tree by reading the keys from the db, and then keeps it in sync with the various inserts and deletes made in the db. So long as your app is the only one updating the db the tree can be consulted very quickly when verifying that the next large random key is not already in use.
Pick a random number, check if it already exists, if so then keep trying until you hit one that doesn't.
Edit: Or better yet, skip the check and just try to insert the row with different IDs until it works.
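A sketch of the insert-and-retry variant in Python, using SQLite as a stand-in for the real database (the question uses Sybase). The table layout, the payload column, the id_space parameter, and the attempt cap are assumptions. Letting the primary-key constraint reject duplicates avoids the race between a separate existence check and the insert.

```python
import secrets
import sqlite3


def insert_with_random_id(conn, payload, id_space=10**6, max_attempts=100):
    """Try random keys until the primary-key constraint accepts one."""
    for _ in range(max_attempts):
        new_id = secrets.randbelow(id_space)
        try:
            with conn:  # commit on success, roll back on error
                conn.execute(
                    "INSERT INTO bigtable (id, payload) VALUES (?, ?)",
                    (new_id, payload),
                )
            return new_id
        except sqlite3.IntegrityError:
            continue  # key already taken, pick another
    raise RuntimeError("no free id found; the key space may be nearly full")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigtable (id INTEGER PRIMARY KEY, payload TEXT)")
print(insert_with_random_id(conn, "first row"))
```

The expected number of attempts grows as the table fills up, so this works best when the key space is much larger than the row count.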
First question: is this a planned database or an already functional one? If it already has data inside, then the answer by bmdhacks is correct. If it is a planned database, here is the second question:
Does your primary key really need to be random? If the answer is yes, then use a function that creates a random ID from a known seed, plus a counter that tracks how many IDs have been created. Each ID created increments the counter.
If you keep the seed secret (i.e., have the seed called and declared private) then no one else should be able to predict the next ID.
If ID is purely random, there is no algorithm to find an unused ID in a similarly random fashion without brute forcing. However, as long as the bit-depth of your random unique id is reasonably large (say 64 bits), you're pretty safe from collisions with only a million rows. If it collides on insert, just try again.
Depending on your database, you might have the option of using either a sequence (Oracle) or an auto-increment column (MySQL, MS SQL, etc.). Or, as a last resort, do a select max(id) + 1 as the new ID. Just be careful with concurrent requests so you don't end up with the same max ID twice: wrap it in a lock together with the upcoming insert statement.
I've seen this done so many times before via brute force, using random number generators, and it's always a bad idea. Generating a random number outside of the DB and checking whether it exists will put a lot of strain on your app and database. And it could lead to two processes picking the same ID.
Your best option is to use MySQL's autoincrement ability. Other databases have similar functionality. You are guaranteed a unique id and won't have issues with concurrency.
It is probably a bad idea to scan every value in that table every time looking for a unique value. I think the way to do this would be to have a value in another table, lock on that table, read the value, calculate the value of the next id, write the value of the next id, release the lock. You can then use the id you read with the confidence your current process is the only one holding that unique value. Not sure how well it scales.
Alternatively use a GUID for your ids, since each newly generated GUID is supposed to be unique.
Is it a requirement that the new ID also be random? If so, the best answer is just to loop over (randomize, test for existence) until you find one that doesn't exist.
If the data just happens to be random, but that isn't a strong constraint, you can just use SELECT MAX(idcolumn), increment in a way appropriate to the data, and use that as the primary key for your next record.
You need to do this atomically, so either lock the table or use some other concurrency control appropriate to your DB configuration and schema. Stored procs, table locks, row locks, SELECT...FOR UPDATE, whatever.
Note that in either approach you may need to handle failed transactions. You may theoretically get duplicate key issues in the first (though that's unlikely if your key space is sparsely populated), and you are likely to get deadlocks on some DBs with approaches like SELECT...FOR UPDATE. So be sure to check and restart the transaction on error.
First check if Max(ID) + 1 is not taken and use that.
If Max(ID) + 1 exceeds the maximum then select an ordered chunk at the top and start looping backwards looking for a hole. Repeat the chunks until you run out of numbers (in which case throw a big error).
If a "hole" is found, save the ID in another table so you can use it as the starting point next time and avoid repeating the search.
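A simplified Python sketch of this search. For brevity it loads all used IDs into memory instead of reading them from the table in ordered chunks, and it assumes IDs run from 1 to max_id; find_free_id is a hypothetical name.

```python
def find_free_id(used_ids, max_id: int) -> int:
    """Return max+1 if it fits, else the highest unused ID below the top."""
    used = sorted(set(used_ids), reverse=True)
    if not used:
        return 1
    if used[0] < max_id:
        return used[0] + 1
    # Walk downwards from the top looking for the first gap.
    for higher, lower in zip(used, used[1:]):
        if higher - lower > 1:
            return higher - 1
    if used[-1] > 1:
        return used[-1] - 1
    raise RuntimeError("no free id left")


print(find_free_id([1, 2, 3], max_id=10))     # 4
print(find_free_id([1, 2, 4, 5], max_id=5))   # 3
```

As with any read-then-insert approach, the lookup and the insert need to happen under a lock or inside one transaction to be safe with concurrent writers.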
Skipping the reasoning of the task itself, the only algorithm that
will give you an ID not in the table
that will be used to insert a new line in the table
will result in a table still having random unique IDs
is generating a random number and then checking if it's already used
The best algorithm in that case is to generate a random number and do a select to see if it exists, or just try to add it if your database errs out sanely. Depending on the range of your key, vs, how many records there are, this could be a small amount of time. It also has the ability to spike and isn't consistent at all.
Would it be possible to run some queries on the BigTable and see if there are any ranges that could be exploited? ie. between 100,000 and 234,000 there are no ID's yet, so we could add ID's there?
Why not append your random number creator with the current date in seconds. This way the only way to have an identical ID is if two users are created at the same second and are given the same random number by your generator.

Resources