Asking for opinions : Accent marks / diacritics in primary key - oracle

I have an application that uses natural primary keys. The database uses the WE8ISO8859P15 character set. So in my table City we have primary keys like 'MEDELLÍN' and 'MÜNCHEN'. I have a hunch we are going to have a lot of trouble with this.
The problems I see:
Interfacing this data to databases with another character set. I don't want character set conversion on my primary key.
Dumping the data to files and processing these files: we always have to be very aware of the special characters and the client settings.
Should we allow diacritics in the PK? Please feel free to give your opinion.

Trying to ignore diacritics is just delaying the inevitable. Yes, you could save some issues in Eastern Europe. But you still can't deal with Greek city names. You'd need Unicode, and then there's no point anymore in misspelling Munchen/Muenchen; it's München.
That said, the entire notion that there's a single name for a city already breaks in Brussel aka Bruxelles, and that's Western Europe. So, they're fundamentally unsuitable for primary keys, no matter how you'd spell them.

Why not? Your DB model is broken beyond repair already, so why not introduce another source of problems? ;)
More seriously, databases are getting better at supporting Unicode, so there is no problem with storing natural text (with all its oddities). Your issue is "primary key". There are several ways in which the same text can be encoded (for example, an accented character can be stored as a single precomposed character or as a plain character followed by a combining diacritic). This means you can get two different keys for the same text.
There are a lot of wrong reasons to use business keys as PK and no good ones. Don't do it. Bite the bullet and fix it. Fix it now. It will cost you less (even if it costs a lot) than not fixing it.
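To make the "two different keys for the same text" point concrete, here is a minimal Python sketch. It assumes the data has already been moved to Unicode (in a single-byte character set like WE8ISO8859P15 only the precomposed form exists), and the helper name normalize_key is made up for illustration.

```python
import unicodedata

# The same visible city name, stored two different ways.
composed = "M\u00dcNCHEN"      # U+00DC, precomposed capital U with diaeresis
decomposed = "MU\u0308NCHEN"   # plain "U" followed by U+0308 combining diaeresis

def normalize_key(text: str) -> str:
    """Return the NFC (composed) form so equal-looking keys compare equal.

    Hypothetical helper: the point is only that some single normal form
    has to be enforced before the text is used as a key.
    """
    return unicodedata.normalize("NFC", text)

raw_equal = composed == decomposed                                   # byte-wise different
normalized_equal = normalize_key(composed) == normalize_key(decomposed)
```

Without normalization the two spellings are distinct keys; after normalizing both sides they compare equal.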

Like you, I feel it would be really looking for problems to allow them.
In addition to the problems you mention, there could be others:
Imagine switching to another database vendor ...
I don't know if introducing a surrogate primary key is an option for you, but this could be the right time to do so ;-) ...
If not, you could duplicate the column:
the PK column would not be case sensitive, would not have special characters, and so on ...
an additional column would preserve what was entered by the user, to show it nicely in some UI ...
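A rough sketch of the two-column idea in Python, assuming the key column is derived by stripping combining marks and upper-casing. The names fold_key, city_key and city_name are invented for this example, and the folding rule is only one possible choice.

```python
import unicodedata

def fold_key(display_name: str) -> str:
    """Derive a plain, case-insensitive key from what the user typed.

    Assumption: dropping combining marks after NFKD decomposition is
    acceptable for the key. Letters that do not decompose (for example
    "ø") pass through unchanged and would need their own rule.
    """
    decomposed = unicodedata.normalize("NFKD", display_name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.upper()

# The row keeps both values: one for joins, one for the UI.
row = {"city_key": fold_key("M\u00fcnchen"), "city_name": "M\u00fcnchen"}
```

Note that this folds both 'München' and 'Munchen' to the same key, which is the intent here, but it does nothing for 'Muenchen'.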

Yes, you will have problems with those characters. Leaving ASCII always causes problems. But when you do business not only in Britain and the US, you don't have a choice.
I don't see special character set related problems for the Primary Key. If you export, import, interface or migrate you'll have to take these characters into account no matter if they are part of your PK or not.
But they do emphasize the problem of a natural key as primary key. It seems extremely likely that someone will write e.g. Muenchen and later change it to München, which of course will cause the well-known problem of updates on a PK.

Whether your attribute is (part of) a key or not has nothing to do with the issue.
You have issues of character set conversion with ANY data traffic to/from this attribute anyway, regardless of whether it's a key or not.
Yes, in order to encode "correctly", and have the best possible guarantee that your data will never get corrupted because of character set conversion issues, you need the Unicode character set and one of its encodings.
I do have some serious doubts about the table itself, incidentally. What do you do with Heidelberg, Germany and Heidelberg, South Africa? Oxford, UK and Oxford, US, where there's hardly a state without one?
What kind of information depends on that key? If there is none at all, then your table is more of a "variable type" than it is a "genuine table". In that case, you might just as well forget the table and make your city name attributes just plain String.
If you are really required to produce some "canonical spellings" for city names when exporting data from the database, then I'd advise you to try and set up a "phonetic search table" in which "commonly used spellings" are linked to the "canonical spelling" you are required to produce. Expect a serious effort in getting such tables populated, however.
In that case, in addition to the already mentioned München/Muenchen and Western/Greek alphabet issues, don't forget about the Liège/Luik/Lüttich (München/Munich) kind of issues.
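The lookup table of "commonly used spellings" could look something like the following Python sketch. The dictionary contents and the function names are illustrative only; a real table would live in the database and would be far larger.

```python
import unicodedata
from typing import Optional

def _lookup_form(name: str) -> str:
    # Normalize and case-fold so "LUIK", "Luik" and "luik" all hit the same entry.
    return unicodedata.normalize("NFC", name.strip()).casefold()

# Hypothetical sample data: commonly used spelling -> canonical spelling.
_SPELLINGS = {
    "München": "München",
    "Munchen": "München",
    "Muenchen": "München",
    "Munich": "München",
    "Liège": "Liège",
    "Luik": "Liège",
    "Lüttich": "Liège",
}
CANONICAL = {_lookup_form(k): v for k, v in _SPELLINGS.items()}

def canonical_spelling(name: str) -> Optional[str]:
    """Return the canonical spelling, or None if the name is unknown."""
    return CANONICAL.get(_lookup_form(name))
```

An unknown name returns None rather than a guess, which is where the "serious effort" of populating the table shows up.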

Things change their names, or have their names changed for them. Cities, universities, parks, people ... all unsuitable as primary keys. Unique key, maybe? Or part of a unique key?

Related

FoxPro ERP throwing "Numeric Overflow" error. No support

So, a company I work at has an older ERP system that uses FoxPro 4 or 5. There is no support for the system, so I am trying to use skills that I don't possess. I'm good with servers and even networks, but not coding. I have attached links to two similar errors that are occurring for two different users in different departments using different computers. Your help would be appreciated.
FoxPro Error 1
FoxPro Error 2
Well, the problem is exactly what it says on the tin. It looks like the issue is with the field BODY.COST. The field will have a maximum capacity, for example N(12, 2) would allow numbers up to 999999999.99 to be stored in it.
The system is attempting to put a number that is bigger than the defined capacity into this field. You can see it is a GATHER MEMVAR statement in both cases. This statement takes memory variables and updates a database table using them. One of the memory variables has ended up with a bigger number in it than the database field (looks like BODY.COST) that is intended to store it has capacity for.
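As a back-of-the-envelope check of the capacity rule above, here is a small Python sketch. It assumes the width of an N(width, decimals) field counts the decimal point as one position, which is consistent with the N(12, 2) example, and it only handles non-negative values. The function names are invented.

```python
from decimal import Decimal

def max_numeric(width: int, decimals: int) -> Decimal:
    """Largest non-negative value that fits a numeric field N(width, decimals)."""
    if decimals == 0:
        return Decimal("9" * width)
    integer_digits = width - decimals - 1   # one position goes to the decimal point
    return Decimal("9" * integer_digits + "." + "9" * decimals)

def fits(value, width: int, decimals: int) -> bool:
    """True if a non-negative value can be stored without overflow (no rounding applied)."""
    return Decimal(0) <= Decimal(str(value)) <= max_numeric(width, decimals)
```

Running something like fits(cost, 12, 2) against the values users are trying to post may help narrow down which transactions trigger the error, once the real field definition is known.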
Beyond that, with no support and no source code you are really limited to looking at what the user is trying to post and seeing if that gives you any clues. Is that the extent of the error dumps or are those just snippets?
The messages are saying that you are trying to store a larger value than the field will accept. This happens with numeric and float fields in FoxPro. In both of the messages, the table was indirectly aliased as "BODY" and the problematic field is "COST".
As a solution, using VFP5 (do not use a later version; there was no VFP4), you can change all the numeric and float fields to either the Currency or the Double data type.
Currency is exact to four decimal places and is suggested for monetary values (though the values need not be monetary). Its range is -922337203685477.5808 to 922337203685477.5807. That range is actually above what a numeric/float field can support.
If you think that is not enough range, then you can use Double (magnitudes up to roughly 10^308; VFP has a precision of 15 digits, so you lose precision beyond that).
I would go with Currency.
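The Currency limits quoted above are exactly what a signed 64-bit integer with four implied decimal places would give. Assuming that is how the type is stored, the range can be reproduced in a few lines of Python:

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

# Shift the decimal point four places to the left (four implied decimals).
currency_max = Decimal(INT64_MAX).scaleb(-4)
currency_min = Decimal(INT64_MIN).scaleb(-4)
```

This also explains why Currency never loses the fourth decimal: the value is an integer count of ten-thousandths.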

How to insert into a DB when a number has more digits than m for NUMBER(m,n) in Oracle

I have a DB which I do not have the privilege to alter.
A column is defined as NUMBER(13,4). How is it possible to insert 999999999999999999, whose length is more than 13? It throws an exception. Is it possible to convert it into 1.23e3 format, and does the DB save this format?
No, it is not possible, because of the rules and limitations you mentioned yourself. The column has that format and you cannot change it, so you cannot make the value fit. Period.
No, it is not possible to insert a number that exceeds the specified precision and scale of the column.
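To see why no notation helps, here is a hedged Python sketch of the NUMBER(precision, scale) rule: the value is rounded to scale decimals and the result must have at most precision - scale digits before the decimal point. The function name is invented, the sketch assumes 0 <= scale <= precision, and it is an approximation of the database's behaviour, not Oracle code.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

def fits_number(value, precision: int, scale: int) -> bool:
    """True if value fits NUMBER(precision, scale) after rounding to scale decimals."""
    with localcontext() as ctx:
        ctx.prec = 60                                  # headroom for long inputs
        quantum = Decimal(1).scaleb(-scale)            # e.g. 0.0001 for scale 4
        rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
        limit = Decimal(10) ** (precision - scale)     # integer part must stay below this
        return abs(rounded) < limit
```

For NUMBER(13,4) the largest storable value is 999999999.9999. Writing a number in scientific notation such as 1.23e3 is only a different literal for the same value (1230), so it fits or fails exactly as the plain form does.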
You have to change the database.
If you don't have permissions to alter the table then simply ask someone who does; you have a valid "business" need to do so.
I would highly recommend not working out some way to "hack" around this limitation. Constraints such as this exist to enforce data quality. Though maybe misapplied in this situation, putting data in two different formats in the same column makes it immeasurably more difficult to retrieve data from the database. This is why you should always store numbers as numbers, and so on.
No, unfortunately not. There is no way to achieve this.

Creating primary key from two or more columns in Visual FoxPro 9

How do I create a primary key index from two or more columns in Visual FoxPro 9?
The columns may be of different types.
Compound indexes should be strings, so use the appropriate function (STR(), DTOS(), etc.) to convert the field before concatenating it. See the MSDN documentation for more details.
Another word of caution is to make sure you never trim the character representation of any of the columns included in the keys.
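The idea of converting each column to a fixed-width string before concatenating can be sketched in Python as follows. The column names and widths are made up; in VFP itself the conversions would be done with STR(), DTOS() and so on inside the index expression.

```python
from datetime import date

def compound_key(customer_code: str, order_date: date, line_no: int) -> str:
    """Build a fixed-width string key from three columns of different types."""
    code_part = customer_code.ljust(10)          # keep the trailing spaces, never trim
    date_part = order_date.strftime("%Y%m%d")    # same YYYYMMDD shape that DTOS() produces
    line_part = str(line_no).rjust(5)            # left-padded with spaces, like STR(n, 5)
    return code_part + date_part + line_part

key = compound_key("AB", date(2009, 3, 7), 42)

# Why trimming is dangerous: without padding, different column values collide.
trimmed_collision = ("AB" + "C") == ("A" + "BC")
padded_distinct = ("AB".ljust(10) + "C") != ("A".ljust(10) + "BC")
```

Because every part has a fixed width, the key always has the same length and each column starts at a known offset.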
Something else you should be aware of is that referential integrity code generated by VFP is sometimes not clean or designed to work well with concatenated keys. Code is a lot simpler for surrogate keys (single meaningless column, normally integer or GUID). It might be too late in the design for you to consider this, but I will put it out here just in case it is still in the design stage or still a practical change to make.
Rick Schummer VFP MVP

How to generate an effective order number? (nice pattern with unpredictable gaps)

Just wondering, does anyone here have a good idea for generating a nice order id?
For example,
832-28-394, which is a nice, formal-looking order id (rather than just a database auto-increment number like ID=35).
The order id needs to look random so that a user cannot guess it.
E.g. 832-28-395 shouldn't exist, so there will always be some gap between ids.
Just like the account number on your bank card?
Cheers
If you are using .NET you can use System.Guid.NewGuid()
The auto-incremented IDs are stored as integer or long integer data. One of the reasons for this is that this format is compact, saving space, including in indexes, which typically include the primary key for use with joins and such.
If you wish to create a nice looking id following a particular format syntax, you'll need to manage the generation of the IDs yourself, and store these in a "regular" column not one that is auto-incremented.
I suggest you keep using "ugly looking" ids, be they auto-incremented or not, and format these values for display purposes only, using whatever format you may desire, including a format that uses the values from several columns. Depending on the database system you are using, you may be able to declare custom functions at the level of the database itself, allowing you to obtain the readily formatted value with a simple query, as in:
SELECT MakeAFancyId(id_field), some_other_columns, ..
FROM ...
If you cannot use some built-in or custom function at the level of SQL, you'll need to format the value supplied by SQL (an integer of sorts), into the desired format, on the client-side, using the language associated with your UI / presentation framework.
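On the client side, the display formatting can be as small as this Python sketch. The 3-2-3 grouping copies the 832-28-394 example from the question; the function name and the 8-digit limit are assumptions.

```python
def format_order_id(order_id: int) -> str:
    """Format an integer id as NNN-NN-NNN for display only."""
    if not 0 <= order_id < 10**8:
        raise ValueError("order_id must have at most 8 digits")
    digits = f"{order_id:08d}"                      # left-pad with zeros
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
```

This only changes how the id looks. Consecutive ids still produce consecutive display values, so it does nothing for unpredictability.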
I'd create something where the first eight digits are loosely in a pattern, and a third quartet looks random but is really a sort of checksum.
So, for example, the first eight digits increment based on the current seconds on the server clock.
The last four could be something like the sum of the first four, plus twice the sum of the second four, which will give either a two or three digit number. The final digit is calculated so that the sum of all 11 digits plus this last one is a multiple of 9.
This is slightly akin to how barcode numbers are verified. You can format the resulting 12 digits any way you want, although it is the first eight that are unique here.
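One possible reading of that scheme in Python is shown below. The zero-padding of the middle number to three digits is an assumption (the description only says it has two or three digits), and the function names are invented.

```python
def build_order_number(first_eight: str) -> str:
    """Append the three-digit weighted sum and a final check digit to eight digits."""
    if len(first_eight) != 8 or not (first_eight.isascii() and first_eight.isdigit()):
        raise ValueError("expected exactly eight digits")
    d = [int(c) for c in first_eight]
    middle = sum(d[:4]) + 2 * sum(d[4:])            # ranges from 0 to 108
    eleven = first_eight + f"{middle:03d}"          # assumption: pad to three digits
    digit_sum = sum(int(c) for c in eleven)
    check = (9 - digit_sum % 9) % 9                 # makes the total a multiple of 9
    return eleven + str(check)

def is_valid(order_number: str) -> bool:
    """Recompute the last four digits and compare."""
    if len(order_number) != 12 or not (order_number.isascii() and order_number.isdigit()):
        return False
    return build_order_number(order_number[:8]) == order_number
```

For the input 12345678 the weighted sum is 10 + 2 * 26 = 62, the first eleven digits sum to 44, and the check digit is 1, giving 123456780621.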
Hash the clock time.
Mod by 100,000 or something.
Format with hyphens.
Check for duplicates. If found, restart.
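Those four steps could be sketched in Python like this. SHA-256 and the modulus of 10^8 (chosen so the result fits the 3-2-3 format from the question) are assumptions, and the in-memory set stands in for a uniqueness check against the orders table.

```python
import hashlib
import time

def candidate_id(now_ns: int) -> str:
    """Hash a clock reading, reduce it to 8 digits, and format with hyphens."""
    digest = hashlib.sha256(str(now_ns).encode("ascii")).digest()
    number = int.from_bytes(digest[:8], "big") % 10**8
    digits = f"{number:08d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

def new_order_id(existing: set, clock=time.time_ns) -> str:
    """Keep drawing candidates until one is not already in use."""
    while True:
        candidate = candidate_id(clock())
        if candidate not in existing:
            existing.add(candidate)
            return candidate
```

The ids look random, but anyone who can guess the clock reading can recompute them, so this gives a nice appearance rather than real secrecy.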
I would suggest using an auto-increment ID in the database to link tables and as a primary key. Integer fields are generally faster than string fields for indexing as well as searching.
You can have the order number (which is for display) as a separate field in the order table. And whenever you are planning to send or display a URL to a user that contains the order ID (which is an auto-incremented number), you can encrypt it with some algorithm.
Both of your purposes will be served.
But I suggest not making a string the primary key, though you can have a unique constraint on the order number that is going to be displayed.
Hope this helps.
Kalpak Luniya
I would suggest that internally you keep the database-derived primary key, which is auto-incremented.
For the visible order number, you will probably need more than 8 characters if you are using this for security.
If you are using Ruby, look at SecureRandom, which will generate sufficiently random strings to accommodate this. For example, you can use SecureRandom.hex(16), which uses 16 random bytes and so gives you a 32-character hex string. It can also give you base64 strings, which will look weirder but be shorter.
Make sure this is not your only security on an order, as it may not be that hard to find a valid order number within your 8-digit code, especially if some of the digits are some sort of checksum.
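For readers not on Ruby, the Python standard library offers a comparable facility in the secrets module. This is only an analogous sketch, not what the answer above used:

```python
import secrets

# 16 random bytes rendered as hex: two characters per byte.
token_hex = secrets.token_hex(16)

# The same amount of randomness in URL-safe base64: shorter, but looks odder.
token_b64 = secrets.token_urlsafe(16)
```

Either value would go in a separate, uniquely constrained column next to the auto-incremented primary key.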
For security reasons I suggest that you use a cryptographically secure random number generator. Also think about increasing the User ID length: if you have 1 million users and 8-digit IDs (100 million possible values), the probability of guessing a valid User ID on the first try is 0.01, and it takes only about 69 tries to raise the probability above 0.5.
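The guessing odds can be checked with a few lines of Python. The sketch assumes 1 million valid IDs spread over an 8-digit space (100 million values) and treats each guess as independent; the variable names are illustrative.

```python
import math

users = 1_000_000
id_space = 10**8                      # assumption: 8-digit IDs
p_hit = users / id_space              # chance that a single random guess is valid

def p_success(tries: int) -> float:
    """Probability that at least one of `tries` independent guesses is valid."""
    return 1 - (1 - p_hit) ** tries

# Smallest number of guesses that pushes the success probability past one half.
tries_needed = math.ceil(math.log(0.5) / math.log(1 - p_hit))
```

With those assumptions a single guess succeeds 1% of the time and the break-even point comes out at 69 guesses. Each extra digit of ID length multiplies the effort by ten.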

Oracle empty strings

How do you guys treat empty strings with Oracle?
Statement #1: Oracle treats empty string (e.g. '') as NULL in "varchar2" fields.
Statement #2: We have a model that defines an abstract 'table structure', where we have fields that can't be NULL but can be "empty". This model works with various DBMSs; almost everywhere all is just fine, but not with Oracle. You just can't insert an empty string into a "not null" field.
Statement #3: non-empty default value is not allowed in our case.
So, would someone be so kind to tell me - how can we resolve it?
This is why I've never understood why Oracle is so popular. They don't actually follow the SQL standard, based on a silly decision they made many years ago.
The Oracle 9i SQL Reference states (this has been there for at least three major versions):
Oracle currently treats a character value with a length of zero as null. However, this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls.
But they don't say what you should do. The only ways I've ever found to get around this problem are either:
have a sentinel value that cannot occur in your real data to represent NULL (e.g., "deoxyribonucleic" for a surname field, and hope that the movie stars don't start giving their kids weird surnames as well as weird first names :-).
have a separate field to indicate whether the first field is valid or not, basically what a real database does with NULLs.
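The sentinel workaround could be wrapped at the application boundary roughly as in this Python sketch. The sentinel text and the function names are invented; the only requirement is that the sentinel can never appear in real data.

```python
from typing import Optional

EMPTY_SENTINEL = "<EMPTY>"   # hypothetical marker, must never occur in real data

def to_db(value: Optional[str]) -> Optional[str]:
    """Encode an application value before writing it to a VARCHAR2 column."""
    if value == EMPTY_SENTINEL:
        raise ValueError("real data collides with the sentinel")
    if value == "":
        return EMPTY_SENTINEL    # would otherwise be stored as NULL by Oracle
    return value                 # None still means NULL

def from_db(stored: Optional[str]) -> Optional[str]:
    """Decode a value read back from the database."""
    return "" if stored == EMPTY_SENTINEL else stored
```

The cost is that every query, report and export touching the column has to know about the sentinel, which is why the separate flag column is sometimes preferred.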
Are we allowed to say "Don't support Oracle until it supports the standard SQL behaviour"? It seems the least pain-laden way in many respects.
If you can't force (use) a single blank, or maybe a Unicode Zero Width No-Break Space (U+FEFF), then you probably have to go the whole hog and use something implausible such as 32 Z's to indicate that the data should be blank but isn't because the DBMS in use is Orrible.
Empty string and NULL in Oracle are the same thing. You want to allow empty strings but disallow NULLs.
You have put a NOT NULL constraint on your table, which is the same as a not-an-empty-string constraint. If you remove that constraint, what are you losing?

Resources