I know that it does consider ' ' as NULL, but that doesn't do much to tell me why this is the case. As I understand the SQL specifications, ' ' is not the same as NULL -- one is a valid datum, and the other is indicating the absence of that same information.
Feel free to speculate, but please indicate if that's the case. If there's anyone from Oracle who can comment on it, that'd be fantastic!
I believe the answer is that Oracle is very, very old.
Back in the olden days before there was a SQL standard, Oracle made the design decision that empty strings in VARCHAR/VARCHAR2 columns were NULL and that there was only one sense of NULL (there are relational theorists that would differentiate between data that has never been prompted for, data where the answer exists but is not known by the user, data where there is no answer, etc. all of which constitute some sense of NULL).
By the time that the SQL standard came around and agreed that NULL and the empty string were distinct entities, there were already Oracle users that had code that assumed the two were equivalent. So Oracle was basically left with the options of breaking existing code, violating the SQL standard, or introducing some sort of initialization parameter that would change the functionality of potentially large number of queries. Violating the SQL standard (IMHO) was the least disruptive of these three options.
Oracle has left open the possibility that the VARCHAR data type would change in a future release to adhere to the SQL standard (which is why everyone uses VARCHAR2 in Oracle since that data type's behavior is guaranteed to remain the same going forward).
Tom Kyte VP of Oracle:
A ZERO length varchar is treated as
NULL.
'' is not treated as NULL.
'' when assigned to a char(1) becomes
' ' (char types are blank padded
strings).
'' when assigned to a varchar2(1)
becomes '' which is a zero length
string and a zero length string is
NULL in Oracle (it is no long '')
Oracle documentation alerts developers to this problem, going back at least as far as version 7.
Oracle chose to represent NULLS by the "impossible value" technique. For example, a NULL in a numeric location will be stored as "minus zero", an impossible value. Any minus zeroes that result from computations will be converted to positive zero before being stored.
Oracle also chose, erroneously, to consider the VARCHAR string of length zero (the empty string) to be an impossible value, and a suitable choice for representing NULL. It turns out that the empty string is far from an impossible value. It's even the identity under the operation of string concatenation!
Oracle documentation warns database designers and developers that some future version of Oracle might
break this association between the empty string and NULL, and break any code that depends on that association.
There are techniques to flag NULLS other than impossible values, but Oracle didn't use them.
(I'm using the word "location" above to mean the intersection of a row and a column.)
I suspect this makes a lot more sense if you think of Oracle the way earlier developers probably did -- as a glorified backend for a data entry system. Every field in the database corresponded to a field in a form that a data entry operator saw on his screen. If the operator didn't type anything into a field, whether that's "birthdate" or "address" then the data for that field is "unknown". There's no way for an operator to indicate that someone's address is really an empty string, and that doesn't really make much sense anyways.
According to official 11g docs
Oracle Database currently treats a character value with a length of zero as null. However, this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls.
Possible reasons
val IS NOT NULL is more readable than val != ''
No need to check both conditions val != '' and val IS NOT NULL
Empty string is the same as NULL simply because its the "lesser evil" when compared to the situation when the two (empty string and null) are not the same.
In languages where NULL and empty String are not the same, one has to always check both conditions.
Example from book
set serveroutput on;
DECLARE
empty_varchar2 VARCHAR2(10) := '';
empty_char CHAR(10) := '';
BEGIN
IF empty_varchar2 IS NULL THEN
DBMS_OUTPUT.PUT_LINE('empty_varchar2 is NULL');
END IF;
IF '' IS NULL THEN
DBMS_OUTPUT.PUT_LINE(''''' is NULL');
END IF;
IF empty_char IS NULL THEN
DBMS_OUTPUT.PUT_LINE('empty_char is NULL');
ELSIF empty_char IS NOT NULL THEN
DBMS_OUTPUT.PUT_LINE('empty_char is NOT NULL');
END IF;
END;
Because not treating it as NULL isn't particularly helpful, either.
If you make a mistake in this area on Oracle, you usually notice right away. In SQL server, however, it will appear to work, and the problem only appears when someone enters an empty string instead of NULL (perhaps from a .net client library, where null is different from "", but you usually treat them the same).
I'm not saying Oracle is right, but it seems to me that both ways are approximately equally bad.
Indeed, I have had nothing but difficulties in dealing with Oracle, including invalid datetime values (cannot be printed, converted or anything, just looked at with the DUMP() function) which are allowed to be inserted into the database, apparently through some buggy version of the client as a binary column! So much for protecting database integrity!
Oracle handling of NULLs links:
http://digitalbush.com/2007/10/27/oracle-9i-null-behavior/
http://jeffkemponoracle.com/2006/02/empty-string-andor-null.html
First of all, null and null string were not always treated as the same by Oracle. A null string is, by definition, a string containing no characters. This is not at all the same as a null. NULL is, by definition, the absence of data.
Five or six years or so ago, null string was treated differently from null by Oracle. While, like null, null string was equal to everything and different from everything (which I think is fine for null, but totally WRONG for null string), at least length(null string) would return 0, as it should since null string is a string of zero length.
Currently in Oracle, length(null) returns null which I guess is O.K., but length(null string) also returns null which is totally WRONG.
I do not understand why they decided to start treating these 2 distinct "values" the same. They mean different things and the programmer should have the capability of acting on each in different ways. The fact that they have changed their methodology tells me that they really don't have a clue as to how these values should be treated.
Related
My app is built on an Oracle Database.
Would it be possible to overcome the 4000 byte of Text limitation we have for Synchronized Record fields of appian?
I know the VARCHAR2(4000) limitation is considered to be a standard column type by Oracle, then choosing the EXTENDED for param. max_string_size in the DB would make it an "extended data type", as CLOB is. But since CLOB is forbidden to become a Snyc. RT field, would my large VARCHAR2 columns be also forbidden?
Asking for people who tried it. If no one did, I will ask the DBA, but could be easier to ask here ;)
My opinion is, don't bother with the extended string option. It provides no real benefit. A varchar2 greater than 4KB is a CLOB under a different name.
Creating a mySomeTable table with 2 fields
create table mySomeTable (
IDRQ VARCHAR2(32 CHAR),
PROCID VARCHAR2(64 CHAR)
);
Creating an index on the table by the PROCID field
create index idx_PROCID on mySomeTable(trunc(PROCID));
Inserting records:
insert into mySomeTable values ('a', '1'); -- OK
insert into mySomeTable values ('b', 'c'); -- FAIL
As you can see, an error has been made in the index construction script and the script will try to build an index on the field using the trunc() function.
trunct() is a function for working with dates or numbers, and the field has the string type
This index building script successfully works out and creates an index without displaying any warnings and errors.
An index is created on the table using the TRUNC(TO_NUMBER(PROCID)) function
When trying to insert or change an entry in the table, if PROCID cannot be converted to a number, I get the error ORA-01722: invalid number, which is actually logical.
However, the understanding that I am working in a table with rows and adding string values to the table, and the error is about converting to a number, was misleading and not understanding what is happening...
Question: Why does Oracle change the index construction function, instead of giving an error? And how can this be avoided in the future?
Oracle version 19.14
Naturally, there was only one solution - to create the right index with the right script
create index idx_PROCID on mySomeTable(PROCID);
however, this does not explain, to me, this Oracle behavior.
Oracle doesn't know if the index declaration is wrong or the column data type is wrong. Arguably (though some may well disagree!) Oracle shouldn't try to second-guess your intentions or enforce restrictions beyond those documented in the manual - that's what user-defined constraints are for. And, arguably, this index acts as a form of pseudo-constraint. That's a decision for the developer, not Oracle.
It's legal, if usually ill-advised, to store a number in a string column. If you actually intentionally chose to store numbers as strings - against best practice and possibly just to irritate future maintainers of your code - then the index behaviour is reasonable.
A counter-question is to ask where it should draw the line - if you expect it to error on your index expression, what about something like
create index idx_PROCID on mySomeTable(
case when regexp_like(PROCID, '^\d.?\d*$') then trunc(PROCID) end
);
or
create index idx_PROCID on mySomeTable(
trunc(to_number(PROCID default null on conversion error))
);
You might actually have chosen to store both numeric and non-numeric data in the same string column (again, I'm not advocating that) and an index like that might then useful - and you wouldn't want Oracle to prevent you from creating it.
Something that obviously doesn't make sense and shouldn't be allowed to you is much harder for software to evaluate.
Interestingly the documentation says:
Oracle recommends that you specify explicit conversions, rather than rely on implicit or automatic conversions, for these reasons:
...
If implicit data type conversion occurs in an index expression, then Oracle Database might not use the index because it is defined for the pre-conversion data type. This can have a negative impact on performance.
which is presumably why it actually chooses here to apply explicit conversion when it creates the index expression (which you can see in user_ind_expressions - fiddle)
But you'd get the same error if the index expression wasn't modified - there would still be an implicit conversion of 'c' to a number, and that would still throw ORA-01722. As would some strings that look like numbers if your NLS settings are incompatible.
Have existing table called temptable, column largenumber is a NUMBER field, with no precision set:
largenumber NUMBER;
Query:
select largenumber from temptable;
It returns:
-51524845525550100000000000000000000
But If I do
column largenumber format 999999999999999999999999999999999999999
And then
select largenumber from temptable;
It returns:
-51524845525550:100000000000000000000
Why is there a colon?
To test, I took the number, remove the colon, and insert it to another table temptable2, and did the same column largenumber format, the select returns the number without the colon:
select largenumber from temptable2;
It returns:
-51524845525550100000000000000000000
So the colon is not present here.
So what could possibly be in the original number field to cause that colon?
In the original row, If I do a select and try to do any TO_CHAR, REPLACE, CAST, or concatenate to text, it would give me number conversion error.
For example, trying to generate a csv:
select '"' || largenumber || '",'
FROM temptable;
would result in:
ORA-01722 ("invalid number") error occurs when an attempt is made to convert a character string into a number, and the string cannot be converted into a valid number
In a comment (in response to a question from me), you shared that dump(largenumber) on the offending value returns
Typ=2 Len=8: 45,50,56,53,52,48,46,48
From the outset, that means that the data stored on disk is invalid (it is not a valid representation of a value of number data type). Typ=2 is correct, that is for data type number. The length (8 bytes) is correct (we can all count to eight to see that).
What is wrong is the bytes themselves. And, we only need to inspect the first and the last byte to see that.
The first byte is 45. It encodes the sign and the exponent of your number. The first bit (1 or 0) represents the sign: 1 for positive, 0 for negative. 45 is less than 128, so the first bit in the first byte is 0; so the number is negative. (So far this matches what you know about the intended value.)
But, for negative numbers, the last byte is always the magic value 102. Always. In another comment under your original question, Connor McDonald asks about your platform - but this is platform-independent, it is how Oracle encodes numbers for permanent storage on any platform. So, we already know that the dump value you got tells us the value is invalid.
In fact, Connor, in the same comment, gave the correct representation of that number (according to Oracle's scheme for internal representation of numbers). Indeed, just the last byte is wrong: your dump shows 48, but it should be 102.
How can you fix this? If it's a one-off, just use an update statement to replace the value with the correct one and move on. If your table has a primary key, let's call it id, then find the id for this row, and then
update {your_table} set largenumber = -50...... where id = {that_id};
Question is, how many such corrupt values might you have in your table? If it's just one, you can shrug it off; but if it's many (or even "a handful") you may want to figure out how they got there in the first place.
In most cases, the database will reject invalid values; you can't simply insert 'abc' in a number column, for example. But there are ways to get bad data in; even intentionally, and in a repeatable way. So, you would have to investigate how the bad values were inserted (what process was used for insertion).
For a trivial way to insert bad data in a number column, in a repeatable manner, you can see this thread on the Oracle developers forum: https://community.oracle.com/tech/developers/discussion/3903746/detecting-invalid-values-in-the-db
Please be advised that I had just started learning Oracle at that time (I was less than two months in), so I may have said some stupid things in that thread; but the method to insert bad data is described there in full detail, and it was tested. That shows just one possible (and plausible!) way to insert invalid stuff in a table; how it happened in your specific case, you will have to investigate yourself.
I had to make a CHAR(1 CHAR) column wider and I forgot to change the column type to VARCHAR2:
DUPLICADO CHAR(3 CHAR)
I noticed the error when my PHP app would no longer find exact matches, e.g.:
SELECT *
FROM NUMEROS
WHERE DUPLICADO = :foo
... with :foo being #4 didn't find the 3-char padded #4 value. However, I initially hit a red herring while debugging the query in SQL Developer because injecting raw values into the query would find matches!
SELECT *
FROM NUMEROS
WHERE DUPLICADO = '#4'
Why do I get matches with the second query? Why do prepared statements make a difference?
To expand a little on my comments, I found a bit in the documentation that explains difference between blankpadded and nonpadded comparison:
http://docs.oracle.com/database/121/SQLRF/sql_elements002.htm#BABJBDGB
If both values in your comparison (the two sides of the equal sign) have datatype CHAR or NCHAR or are literal strings, then Oracle chooses blankpadded comparison. That means that if the lengths are different, then it pads the short one with blanks until they are the same length.
With the column DUPLICADO being a CHAR(3), the value '#4' is stored in the column as three characters '#4 ' (note the blank as third character.) When you do DUPLICADO = '#4' the rule states Oracle will use blankpadded comparison and therefore blankpad the literal '#4' until it has the same length as the column. So it actually becomes DUPLICADO = '#4 '.
But when you do DUPLICADO = :foo, it will depend on the datatype of the bind variable. If the datatype is CHAR, it will also perform blankpadded comparison. But if the datatype is VARCHAR2, then Oracle will use non-padded comparison and then it will be up to you to ensure to do blankpadding where necessary.
Depending on client or client language you may be able to specify the datatype of the bind variable and thereby get blankpadded or nonpadded comparison as needed.
SQL Developer may be a special case that might not allow you to specify datatype - it just possibly might default to bind variables always being datatype VARCHAR2. I don't know sufficient about SQL Developer to be certain about that ;-)
I have a query that has
... WHERE PRT_STATUS='ONT' ...
The prt_status field is defined as CHAR(5) though. So it's always padded with spaces. The query matches nothing as the result. To make this query work I have to do
... WHERE rtrim(PRT_STATUS)='ONT'
which does work.
That's annoying.
At the same time, a couple of pure-java DBMS clients (Oracle SQLDeveloper and AquaStudio) I have do NOT have a problem with the first query, they return the correct result. TOAD has no problem either.
I presume they simply put the connection into some compatibility mode (e.g. ANSI), so the Oracle knows that CHAR(5) expected to be compared with no respect to trailing characters.
How can I do it with Connection objects I get in my application?
UPDATE I cannot change the database schema.
SOLUTION It was indeed the way Oracle compares fields with passed in parameters.
When bind is done, the string is passed via PreparedStatement.setString(), which sets type to VARCHAR, and thus Oracle uses unpadded comparision -- and fails.
I tried to use setObject(n,str,Types.CHAR). Fails. Decompilation shows that Oracle ignores CHAR and passes it in as a VARCHAR again.
The variant that finally works is
setObject(n,str,OracleTypes.FIXED_CHAR);
It makes the code not portable though.
The UI clients succeed for a different reason -- they use character literals, not binding. When I type PRT_STATUS='ONT', 'ONT' is a literal, and as such compared using padded way.
Note that Oracle compares CHAR values using blank-padded comparison semantics.
From Datatype Comparison Rules,
Oracle uses blank-padded comparison
semantics only when both values in the
comparison are either expressions of
datatype CHAR, NCHAR, text literals,
or values returned by the USER
function.
In your example, is 'ONT' passed as a bind parameter, or is it built into the query textually, as you illustrated? If a bind parameter, then make sure that it is bound as type CHAR. Otherwise, verify the client library version used, as really old versions of Oracle (e.g. v6) will have different comparison semantics for CHAR.
If you cannot change your database table, you can modify your query.
Some alternatives for RTRIM:
.. WHERE PRT_STATUS like 'ONT%' ...
.. WHERE PRT_STATUS = 'ONT ' ... -- 2 white spaces behind T
.. WHERE PRT_STATUS = rpad('ONT',5,' ') ...
I would change CHAR(5) column into varchar2(5) in db.
You can use cast to char operation in your query:
... WHERE PRT_STATUS=cast('ONT' as char(5))
Or in more generic JDBC way:
... WHERE PRT_STATUS=cast(? as char(5))
And then in your JDBC code do use statement.setString(1, "ONT");