How do you guys treat empty strings with Oracle?
Statement #1: Oracle treats empty string (e.g. '') as NULL in "varchar2" fields.
Statement #2: We have a model that defines an abstract 'table structure', where we have fields that can't be NULL but can be "empty". This model works with various DBMSs; almost everywhere, all is just fine, but not with Oracle. You just can't insert an empty string into a "not null" field.
Statement #3: non-empty default value is not allowed in our case.
So, would someone be so kind as to tell me how we can resolve this?
This is why I've never understood why Oracle is so popular. They don't actually follow the SQL standard, based on a silly decision they made many years ago.
The Oracle 9i SQL Reference states (this has been there for at least three major versions):
Oracle currently treats a character value with a length of zero as null. However, this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls.
But they don't say what you should do. The only ways I've ever found to get around this problem are either:
have a sentinel value that cannot occur in your real data to represent NULL (e.g., "deoxyribonucleic" for a surname field, and hope that the movie stars don't start giving their kids weird surnames as well as weird first names :-).
have a separate field to indicate whether the first field is valid or not, basically what a real database does with NULLs.
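For what it's worth, a minimal sketch of the second option (table and column names are made up):

create table persons (
  id            number primary key,
  surname       varchar2(100),   -- NULL stands in for both '' and "unknown"
  surname_known char(1) default 'Y' not null
                check (surname_known in ('Y', 'N'))
);
-- surname is null and surname_known = 'Y'  ->  the value is the empty string
-- surname is null and surname_known = 'N'  ->  the value is genuinely NULL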
Are we allowed to say "Don't support Oracle until it supports the standard SQL behaviour"? It seems the least pain-laden way in many respects.
If you can't force (use) a single blank, or maybe a Unicode Zero Width Non-Break Space (U+FEFF), then you probably have to go the whole hog and use something implausible such as 32 Z's to indicate that the data should be blank but isn't because the DBMS in use is Orrible.
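If you do go the U+FEFF route, UNISTR can produce the character reliably regardless of client encoding. A sketch, assuming your database character set can actually store it (T is a hypothetical table):

insert into t (surname) values (unistr('\FEFF'));
select count(*) from t where surname = unistr('\FEFF');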
Empty string and NULL in Oracle are the same thing. You want to allow empty strings but disallow NULLs.
You have put a NOT NULL constraint on your table, which is the same as a not-an-empty-string constraint. If you remove that constraint, what are you losing?
In one scenario we are dynamically creating SQL to create temp tables on the fly. There is no issue with the table name, as it is decided by us; however, the column names are provided by sources not in our control.
Usually we would check the column names using the query below:
select ..
where NOT REGEXP_LIKE (Column_Name_String,'^([a-zA-Z])[a-zA-Z0-9_]*$')
OR Column_Name_String is NULL
OR Length(Column_Name_String) > 30
However, is there any built-in function which can do a more extensive check? Any input on the above query is welcome as well.
Thanks in advance.
Final query, based on the answers below:
select ..
where NOT REGEXP_LIKE (Column_Name_String,'^([a-zA-Z])[a-zA-Z0-9_]{0,29}$')
OR Column_Name_String is NULL
OR Upper(Column_Name_String) in (select Upper(RESERVED_WORDS.Keyword) from V$RESERVED_WORDS RESERVED_WORDS)
I'm particularly not happy with characters like $ in column names either, hence I won't be using
dbms_assert.simple_sql_name('VALID_NAME')
Instead, with a regexp I can decide my own set of characters to allow.
This answer does not necessarily offer either a performance or logical improvement, but you can actually validate the column names using a single regex:
SELECT ...
WHERE NOT
REGEXP_LIKE (COALESCE(Column_Name_String, ' '), '^([a-zA-Z])[a-zA-Z0-9_]{0,29}$')
This works because:
It uses the same pattern to match columns, i.e. starting with a letter and afterwards using only alphanumeric characters and underscore
NULL column names are mapped to a single space (coalescing to '' would be a no-op, since Oracle treats '' as NULL), which fails the regex
We use a length quantifier {0,29} to check the column length directly in the regex
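To illustrate, a throwaway test with made-up candidate names; the bad ones (and the NULL) come back:

with candidates as (
  select 'GOOD_NAME' as column_name_string from dual union all
  select '9STARTS_WITH_DIGIT' from dual union all
  select 'WAY_TOO_LONG_NAME_ABCDEFGHIJKLMNOPQ' from dual union all
  select null from dual
)
select column_name_string
from   candidates
where  not regexp_like(coalesce(column_name_string, ' '),
                       '^([a-zA-Z])[a-zA-Z0-9_]{0,29}$');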
" is there any build in function which can do a more extensive check."
Oracle has the DBMS_ASSERT.SIMPLE_SQL_NAME() function. This returns the passed name if it meets the Oracle naming rules ...
select dbms_assert.simple_sql_name('VALID_NAME') from dual;
... and hurls ORA-44003 if the name is invalid.
Valid names permit any characters if the name is double-quoted (yuck, but then so is creating "temp tables on-fly"). Also the function doesn't check the length of the name, so you will still need to validate that yourself.
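If you'd rather trap the error than let it propagate, something along these lines should work (a sketch; the 30-character limit shown applies before 12.2):

declare
  invalid_sql_name exception;
  pragma exception_init(invalid_sql_name, -44003);
  l_name varchar2(128) := 'VALID_NAME';
begin
  l_name := dbms_assert.simple_sql_name(l_name);
  -- simple_sql_name does not check length, so do it ourselves
  if length(l_name) > 30 then
    raise_application_error(-20001, 'name longer than 30 characters');
  end if;
  dbms_output.put_line(l_name || ' is OK');
exception
  when invalid_sql_name then
    dbms_output.put_line('not a valid SQL name');
end;
/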
Find out more in the docs.
"creating a table with comment column is not possible as its a invalid identifier"
Fair point. DBMS_ASSERT is primarily aimed at preventing SQL injection, so it verifies that a value conforms to Oracle's naming rules, not that the value is a valid Oracle name. To catch things like comment you will also need to check the value against V$RESERVED_WORDS, probably where reserved != 'Y'. As this is a V$ view, SELECT on it is not granted by default; if you don't have access you'll need to ask your friendly DBA to help out.
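For example, something like this (assuming you have been granted SELECT on the view; :candidate_name is a placeholder bind):

select count(*)
from   v$reserved_words
where  keyword  = upper(:candidate_name)
and    reserved = 'Y';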
" For validating column names I believe I should check with the entire list"
Up to you. The distinction is that some keywords can legitimately be used as identifiers. For instance, TYPE only became a keyword in Oracle version 8, when they introduced the object-relational stuff. But there were a lot of tables and views in existing systems which used TYPE as a column name (not least the Oracle data dictionary). If Oracle had made TYPE a properly reserved word it would have broken all those systems. So the list of reserved words which cannot be used as identifiers is a subset of all the Oracle keywords.
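You can see the distinction in the view itself (the exact contents vary by version, so treat this as a sketch):

select keyword, reserved
from   v$reserved_words
where  keyword in ('TYPE', 'SELECT');
-- on the versions I've seen, SELECT comes back with RESERVED = 'Y',
-- while TYPE is a keyword with RESERVED = 'N'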
Opinions on the general task:
"we are getting data from external sources (files) and the job of the program/script is to push that data to oracle tables."
There are two parts to this task.
The first is that you should have agreed a standard format for these files with the third parties. There should be no need for discovery of the files' structure or content. (Or, if there is such a need because the files are randomly sourced from a carousel of third parties, probably you should not be using a relational database but something else: Endeca? The Python Pandas library?)
The second is the creating tables on the fly. If you have an agreed file structure then you should be loading into standard tables, using either SQL*Loader or external tables according to your circumstances. If you're on 12c maybe SQL*Loader Express Mode could be of interest.
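For illustration, a minimal external table over an agreed file layout (the directory object, file name and columns are all hypothetical, and the DIRECTORY must be created first):

create table staging_ext (
  col_a varchar2(100),
  col_b number
)
organization external (
  type oracle_loader
  default directory data_dir
  access parameters (
    records delimited by newline
    fields terminated by ','
  )
  location ('feed.csv')
);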
I have a pesky SSRS report problem wherein the main query of my report has a condition that can have more than 1000 choices, and when the user selects all of them it fails, as my backend database is Oracle. I have done some research and found a solution that would work.
The solution is re-writing the IN clause to something like this:
(1,ColumnName) in ((1,Searchitem1),(1,SearchItem2))
This will work. However, when I do this:
(1,ColumnName) in ((1,:assignedValue))
and pass just one value, it works. But when I pass more than one value, it fails with an ORA-01722: invalid number error.
I have tried multiple combinations of the same IN clause, but nothing is working.
any help is appreciated...
Wild guess: your :assignedValue is a comma-separated list of numbers, and Oracle tries to parse it as a single number.
Passing multiple values as a single value for an IN query is (almost) never a good idea - either you have to use string concatenation (prone to SQL injection and terrible performance), or you have to have a fixed number of arguments to IN (which generally is not what you want).
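If you are stuck with a single comma-separated bind variable, one common Oracle workaround is to split it into rows in SQL. A sketch, assuming numeric search items and 11g or later for REGEXP_COUNT (MY_TABLE and SOME_COLUMN are placeholders):

select t.*
from   my_table t
where  t.some_column in (
         select to_number(regexp_substr(:assignedValue, '[^,]+', 1, level))
         from   dual
         connect by level <= regexp_count(:assignedValue, '[^,]+')
       );

That keeps a single bind variable, but it hides the list from the optimizer.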
Instead, I'd suggest you:
INSERT your search items into a temporary table
use a JOIN with this search table in your SELECT
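A sketch of that approach (all names hypothetical); the temporary table is created once, not per report run:

create global temporary table search_items (
  item_id number primary key
) on commit delete rows;

-- per report run: one insert per search item, then join
insert into search_items (item_id) values (:item);

select t.*
from   my_table t
join   search_items s on s.item_id = t.some_column;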
I need urgent help from you guys. The thing is, I have a column which represents the full name of a user, and now I want to split it into first and last name.
The format of the full name is "World, hello"; the first name here is hello and the last name is world.
I am using a Derived Column (SSIS) with the RIGHT function for the first name and the SUBSTRING function for the last name, but the results of these seem to be blank, which is where even I am blank. :)
It's working for me. In general, you should provide more detail in your questions in places such as this to help others recreate and troubleshoot your issue. You did not specify whether we needed to address NULLs in this field, nor do I know how you'd want them interpreted, so there is room for improvement in this answer.
I started with a simple OLE DB Source and hard coded a query of "SELECT 'World, Hello' AS Name".
I created 2 Derived Column Tasks. The first one adds a column to the Data Flow called FirstCommaPosition. The formula I used is FINDSTRING(Name, ",", 1). If Name is NULLable, then we will need to test for nullability prior to calling the FINDSTRING function. You'll then need to determine how you want to store the split data in the case of NULLs. I would assume both first and last names should be NULLed, but I don't know that.
There are two reasons for doing this in separate steps. The first is performance. As counter-intuitive as it sounds, doing less in a derived column results in better performance because the SSIS engine can better parallelize the operations. The other is simpler: I will need to use this value to make the first and last name split, so it will be easier and lower-maintenance to reference a column than to copy-paste a formula.
The second Derived Column is going to actually perform the split.
My FirstNameUnicode column uses this formula: (FirstCommaPosition > 0) ? RTRIM(LTRIM(RIGHT(Name, LEN(Name) - FirstCommaPosition))) : "" That says: if we found a comma in the preceding step, slice out everything after the comma to the end of the string and apply trim operations; if we didn't find a comma, just return a blank string. The default string type for expressions is Unicode (DT_WSTR), so if that is not what you need, you will need to cast the result into the correct string codepage (DT_STR).
My LastNameUnicode column uses this formula: (FirstCommaPosition > 0) ? SUBSTRING(Name, 1, FirstCommaPosition - 1) : "" Similar logic as above, except now I use the SUBSTRING operation instead of RIGHT. Users of the 2012 release of SSIS and beyond, rejoice, for you can use the LEFT function instead of SUBSTRING. Also note that you need to back off 1 position to exclude the comma.
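If you would rather do the split on the Oracle side before the data ever reaches SSIS, a sketch (PEOPLE and FULL_NAME are hypothetical, and rows without a comma need their own rule):

select trim(substr(full_name, instr(full_name, ',') + 1)) as first_name,
       trim(substr(full_name, 1, instr(full_name, ',') - 1)) as last_name
from   people
where  full_name like '%,%';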
I have this application that uses natural primary keys. The database uses the WE8ISO8859P15 character set. So in my table City we have primary keys like 'MEDELLÍN' and 'MÜNCHEN'. I have a hunch we are going to have a lot of trouble with this.
The problems I see
Interfacing this data to databases with another character set. I don't want character set conversion on my primary key
Dumping the data to files and processing those files, we always have to be very aware of the special characters and the client settings
Should we allow diacritics in the PK? Please feel free to give your opinion.
Trying to ignore diacritics is just delaying the inevitable. Yes, you could save some issues in Eastern Europe. But you still can't deal with Greek city names. You'd need Unicode, and then there's no point anymore in misspelling Munchen/Muenchen; it's München.
That said, the entire notion that there's a single name for a city already breaks in Brussel aka Bruxelles, and that's Western Europe. So, they're fundamentally unsuitable for primary keys, no matter how you'd spell them.
Why not? Your DB model is broken beyond repair already, so why not introduce another source of problems? ;)
More seriously, databases are getting better at supporting Unicode, so there is no problem with storing natural text (with all its oddities). Your issue is the "primary key" part. There are several ways in which the same text can be encoded: for example, an accented character can be stored as a single precomposed code point or as a base character plus a combining diacritic. This means you can get two different keys for the same text.
There are a lot of wrong reasons to use business keys as PK and no good ones. Don't do it. Bite the bullet and fix it. Fix it now. It will cost you less (even if it costs a lot) than not fixing it.
Like you, I feel it would be really looking for problems to allow them.
In addition to the problems you mention, it could be:
Imagine switching to another database vendor ...
I don't know if introducing a surrogate primary key is an option for you, but that could be the correct timing to do so ;-) ...
If not, you could duplicate the column :
the pk column would not be case sensitive, not have special characters and so on ...
an additional column would preserve what was entered by the user, to show it nicely in some UI...
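A sketch of what that could look like (all names hypothetical; how you fold the display name into the key column, by stripping case and diacritics, is up to the application or a trigger):

create table city (
  city_id           number primary key,           -- surrogate key
  city_name_display varchar2(100 char) not null,  -- 'MÜNCHEN', as entered
  city_name_key     varchar2(100) not null,       -- 'MUNCHEN', folded
  constraint city_name_uk unique (city_name_key)
);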
Yes, you will have problems with those characters. Leaving ASCII always causes problems. But when you do business beyond just Britain and the US, you don't have a choice.
I don't see special character set related problems for the Primary Key. If you export, import, interface or migrate you'll have to take these characters into account no matter if they are part of your PK or not.
But they do emphasize the problem of a natural key as primary key. It seems extremely likely that someone will write e.g. Muenchen, only to later change it to München, which of course will cause the well-known problem of updates on the PK.
Whether your attribute is (part of) a key or not has nothing to do with the issue.
You have issues of character set conversion with ANY data traffic to/from this attribute anyway, regardless of whether it's a key or not.
Yes, in order to encode "correctly", and have the best possible guarantee that your data will never get corrupted because of character set conversion issues, you need the Unicode character set and one of its encodings.
I do have some serious doubts about the table itself, incidentally. What do you do with Heidelberg, Germany and Heidelberg, South Africa? Oxford, UK and Oxford, US, where there's hardly a state without one?
What kind of information depends on that key ? If there is none at all, then your table is more of a "variable type" than it is a "genuine table". In that case, you might just as well forget the table and make your cityname attributes just plain String.
If you are really required to produce some "canonical spellings" for citynames when exporting data from the database, then I'd advise to try and set up a "phonetic search table" in which "commonly used spellings" are linked to the "canonical spelling" you are required to produce. Expect a serious effort in getting such tables populated, however.
In that case, then in addition to the already mentioned München/Muenchen and Western/Greek alphabet issues, don't forget about the Liège/Luik/Lüttich (München/Munich) kind of issues.
Things change their names, or have their names changed for them. Cities, Universities, Parks, People .. all unsuitable as Primary Keys. Unique Key, maybe? Or part of a Unique Key?
I am using NHibernate to query an Oracle 8i database. The problem is that all the strings in the returned objects are postfixed with special characters. For example:
CUSTOMER,ONE�������
The NHibernate field type is AnsiString and the Oracle datatype is CHAR(20) with character set CHAR_CS. I am totally new to Oracle, so I don't have a clue what's going on :(
CHAR(20) means the field is padded as necessary to be exactly 20 characters long. The padding character is a blank.
There must be a problem somewhere in your character set settings if padding characters appear as question marks.
What you need here is to trim the returned strings, or better yet, move to VARCHAR2(20).
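A sketch of both options (table and column names hypothetical; check the CHAR-to-VARCHAR2 conversion against your Oracle version before relying on it):

-- trim at query time ...
select rtrim(customer_name) from customers;

-- ... or change the column type; existing values keep their trailing
-- blanks, so trim them as part of the change
alter table customers modify (customer_name varchar2(20));
update customers set customer_name = rtrim(customer_name);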
I couldn't find a proper solution for this issue, but changing the NHibernate driver from 'OracleClientDriver' to 'OleDbDriver' solved it. Still, if anyone knows how to tackle this properly, please let me know, as I don't like using the OleDbDriver for accessing Oracle because of possible compatibility issues.