When to use RAW datatype column over VARCHAR2 in Oracle?

I've been working on a large-scale project where all the primary keys are stored as RAW type. The ID field is auto-generated as a unique 16-byte UUID. I can't find any particular advantage of using a RAW type column. Can someone help me understand if there is any real advantage to storing primary keys in RAW format instead of VARCHAR2?

A GUID in Oracle is represented as raw(16).
You can get a GUID like this:
select sys_guid() from dual;
That's why you should use raw(16).
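If the GUID arrives as formatted text instead (say, generated client-side), it can be converted for a RAW(16) column. A minimal sketch, assuming a hypothetical table t with a RAW(16) id column:
insert into t (id)
values (hextoraw(replace('cbf7e2e2-a9e9-40fb-badc-18cb9a4fe663', '-', '')));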

Well, in database design size typically matters: a bigger key takes more space in storage and on disk, sorting takes longer, and so on.
From this point of view an integer key is the most compact one (implemented as a NUMBER type with zero scale, typically allocating between 2 and 8 bytes).
For various reasons a UUID is used as a key instead, with motivations that are often independent of database design rules.
Additionally, the UUID is often stored as a formatted string in a VARCHAR2 column.
This is a similar design to storing DATEs as strings (which is considered bad practice).
While a RAW(16) column allocates 16 bytes, the formatted UUID takes 36 bytes.
So in summary, IMO, there are the following recommendations:
Use NUMBER keys
If you can't (and have solid arguments for it), use the UUID in RAW(16) format.
Note that the RAW format is of course a bit more inconvenient to handle than a string (e.g. when setting a bind variable). This often leads to the decision to store the UUID as a string - the vast majority of cases I have encountered.
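For instance, to compare a RAW(16) column against a textual UUID you have to convert explicitly; a sketch with illustrative names:
select *
from some_table
where raw_uuid = hextoraw('8135869AECF44FB280A04033888FD518');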
Below is a small example illustrating the difference in size:
create table tab
(id INT,
RAW_UUID RAW(16)
);
insert into tab(ID,RAW_UUID) values (1,sys_guid());
insert into tab(ID,RAW_UUID) values (1000000001,sys_guid());
select * from tab;
ID RAW_UUID
---------- --------------------------------
1 8135869AECF44FB280A04033888FD518
1000000001 DE04ED07DDD84D1AABE9059F38364C7E
select vsize(id), vsize(raw_uuid) from tab;
VSIZE(ID) VSIZE(RAW_UUID)
---------- ---------------
2 16
6 16
What you can do is define a virtual column (i.e. a column that allocates no space) that presents the formatted UUID:
alter table tab add ( UUID VARCHAR2(36) GENERATED ALWAYS AS
(SUBSTR(LOWER(RAWTOHEX(RAW_UUID)),1,8)||'-'||SUBSTR(LOWER(RAWTOHEX(RAW_UUID)),9,4)||'-'||
SUBSTR(LOWER(RAWTOHEX(RAW_UUID)),13,4)||'-'||SUBSTR(LOWER(RAWTOHEX(RAW_UUID)),17,4)||'-'||
SUBSTR(LOWER(RAWTOHEX(RAW_UUID)),21,12)) VIRTUAL VISIBLE);
Now the table has the text-form UUID as well, and you can use the familiar query:
select * from tab where uuid = 'cbf7e2e2-a9e9-40fb-badc-18cb9a4fe663';
You can even define an index on the virtual column, but before using UUIDs always think about recommendation 1 above (use NUMBER keys).
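A minimal sketch of such an index (the index name is illustrative):
create index tab_uuid_ix on tab (uuid);
The index stores the formatted values, while the virtual column itself still allocates no space in the table.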

Related

How to handle group-level variables in Oracle PL/SQL, so that I can access individual as well as group-level variables?

Currently we are working on a migration of COBOL code to Oracle PL/SQL. In COBOL there is a record/group-level variable concept, e.g.:
01 PARENT-VAR.
05 CHILD-1 PIC 9(2) VALUE 12.
05 CHILD-2 PIC 9(3) VALUE 345.
Basically this means we can access the individual variables CHILD-1 or CHILD-2. Also, if we access PARENT-VAR, we get the values of both child variables auto-concatenated: 12345.
If we try to implement the same concept in PL/SQL we can use IS RECORD (with the fields renamed to CHILD_1 and CHILD_2, since hyphens are not valid in PL/SQL identifiers):
TYPE TYP_PARENT_VAR IS RECORD
( CHILD_1 NUMBER(2) := 12,
CHILD_2 NUMBER(3) := 345);
VAR TYP_PARENT_VAR;
Now I can access the individual child variables as VAR.CHILD_1 or VAR.CHILD_2. But if I have to access both variables at once, I see no way to do it without manual concatenation.
How can we access both the parent and the child items?
PL/SQL uses a compressed binary-coded decimal format to encode numbers.
Those can't be concatenated the way the uncompressed BCD used by COBOL can.
So you will be compelled to use some code to do the trick (i.e. CHILD_1 * 1000 + CHILD_2, or convert them to char and use string concatenation).
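A minimal sketch of the char-conversion variant, using the renamed CHILD_1/CHILD_2 fields from above (the zero-padded formats are an assumption, to mimic COBOL's fixed-width PIC 9(n) semantics):
declare
  type typ_parent_var is record
  ( child_1 number(2) := 12,
    child_2 number(3) := 345);
  var typ_parent_var;
  parent_val varchar2(5);
begin
  -- format each child with leading zeros, then concatenate,
  -- emulating COBOL's group-level access to PARENT-VAR
  parent_val := to_char(var.child_1, 'FM09') || to_char(var.child_2, 'FM099');
  dbms_output.put_line(parent_val);  -- prints 12345
end;
/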
Take a look at computed columns - in Oracle these are called virtual columns:
ALTER TABLE cobol_record ADD
( parent_var VARCHAR2(5) GENERATED ALWAYS AS
  (TO_CHAR(child_1, 'FM09') || TO_CHAR(child_2, 'FM099')) VIRTUAL
);
Or do that during create:
CREATE TABLE cobol_record
(
child_1 NUMBER(2),
child_2 NUMBER(3),
parent_var VARCHAR2(5) GENERATED ALWAYS AS
  (TO_CHAR(child_1, 'FM09') || TO_CHAR(child_2, 'FM099')) VIRTUAL
);
It won't take space in the table, but will let you do whatever concatenation you need. Yes, it's a manual concatenation, but only once, in the table definition. Virtual columns are not computed until needed, so performance shouldn't take much of a hit.
The zero-padded TO_CHAR formats above mimic COBOL's fixed-width PIC 9(n) semantics (e.g. 5 becomes '05').
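For instance, assuming the corrected syntax above:
insert into cobol_record (child_1, child_2) values (12, 345);
select parent_var from cobol_record;  -- returns '12345'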

Oracle: Coercing VARCHAR2 and CLOB to the same type without truncation

In an app that supports MS SQL Server, MySQL, and Oracle, there's a table with the following relevant columns (types shown here are for Oracle):
ShortText VARCHAR2(1700) indexed
LongText CLOB
The app stores values 850 characters or less in ShortText, and longer ones in LongText. I need to create a view that returns that data, whichever column it's in. This works for SQL Server and MySQL:
SELECT
CASE
WHEN ShortText IS NOT NULL THEN ShortText
ELSE LongText
END AS TheValue
FROM MyTable
However, on Oracle, it generates this error:
ORA-00932: inconsistent datatypes: expected CHAR got CLOB
...meaning that Oracle won't implicitly convert the two columns to the same type, so the query has to do it explicitly. I don't want data to get truncated, so the type used has to be able to hold as much data as a CLOB, which as I understand it (not an Oracle expert) means CLOB only; no other choices are available.
This works on Oracle:
SELECT
CASE
WHEN ShortText IS NOT NULL THEN TO_CLOB(ShortText)
ELSE LongText
END AS TheValue
FROM MyTable
However, performance is amazingly awful. A query that returns LongText directly took 70-80 ms for about 9k rows, but the above construct took between 30 and 60 seconds, which is unacceptable.
So:
Are there any other Oracle types I could coerce both columns to that can hold as much data as a CLOB? Ideally something more text-oriented, like MySQL's LONGTEXT or SQL Server's NTEXT (or even better, NVARCHAR(MAX))?
Any other approaches I should be looking at?
Some specifics, in particular ones requested by @Guido Leenders:
Oracle version: Oracle Database 11g 11.2.0.1.0 64bit Production
Not certain if I was the only user, but the relative times are still striking.
Stats for the small table where I saw the performance I posted earlier:
rowcount: 9,237
varchar column total length: 148,516
clob column total length: 227,020
The to_clob() is pretty expensive, so try to avoid it. But I think it should perform reasonably well for 9K rows. The following test case is based upon one of the applications we develop, which has similar data-model behaviour:
create table bubs_projecten_sample
( id number
, toelichting varchar2(1700)
, toelichting_l clob
);
begin
for i in 1..10000
loop
insert into bubs_projecten_sample
( id
, toelichting
, toelichting_l
)
values
( i
, case when mod(i, 2) = 0 then 'short' else null end
, case when mod(i, 2) = 1 then rpad('long', i, '*') else null end -- long text only where the short column is null
)
;
end loop;
commit;
end;
/
Now make sure everything is in the cache and dirty blocks are written out:
select *
from bubs_projecten_sample;
Test performance:
create table bubs_projecten_flat
as
select id
, to_clob(toelichting) toelichting_any
from bubs_projecten_sample
where toelichting is not null
union all
select id
, toelichting_l
from bubs_projecten_sample
where toelichting_l is not null;
The create table takes less than 1 second on a normal entry-level server, including writing out the data: 17K consistent gets, 4K physical reads. Stored on disk (note the rpad) is 25K for toelichting and 16M for toelichting_l.
Can you further elaborate on the problem?
Please check that large CLOBs are not stored inline. Normally large CLOBs are stored out of line in a separate system-maintained segment. Storing large CLOBs inline in the table can make going through the table with a full table scan expensive.
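If needed, out-of-line storage can be forced with a LOB storage clause; a sketch with illustrative names:
create table big_texts
( shorttext varchar2(1700)
, longtext clob
)
lob (longtext) store as (disable storage in row);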
Also, I can imagine populating both columns always. You still get the benefit of indexing on the first so many characters. You just need to record in the table, with an indicator, whether the CLOB or the ShortText column is leading.
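A minimal sketch of such an indicator (names are illustrative):
alter table big_texts add
( text_ind char(1) default 'S'
  check (text_ind in ('S', 'L'))  -- 'S': ShortText is leading, 'L': LongText is
);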
As a side note: I see a difference between the 850 and the 1700. I would recommend making them equal, but remember to check that you are creating the table using character semantics. That can be done at statement level by using "varchar2(850 char)". Please note that Oracle will then actually create a column that fits 850 * 4 bytes (in AL32UTF8 at least, where the "32" stands for "at most 4 bytes per character"). Good luck!
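That character-semantics declaration would look like this (a sketch; the table name is hypothetical):
create table short_texts
( shorttext varchar2(850 char)  -- 850 characters, up to 3400 bytes in AL32UTF8
);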

Should I use Oracle's sys_guid() to generate guids?

I have some inherited code that calls SELECT SYS_GUID() FROM DUAL each time an entity is created. This means that for each insertion there are two calls to Oracle, one to get the Guid, and another to insert the data.
I suppose that there may be a good reason for this, for example - Oracle's Guids may be optimized for high-volume insertions by being sequential and thus they maybe are trying to avoid excessive index tree re-balancing.
Is there a reason to use SYS_GUID as opposed to building your own Guid on the client?
Why roll your own if you already have it provided for you? Also, you don't need to grab it first and then insert; you can just insert:
create table my_tab
(
val1 raw(16),
val2 varchar2(100)
);
insert into my_tab(val1, val2) values (sys_guid(), 'Some data');
commit;
You can also use it as a default value for a primary key:
drop table my_tab;
create table my_tab
(
val1 raw(16) default sys_guid(),
val2 varchar2(100),
primary key(val1)
);
Here there's no need to set up a before-insert trigger to use a sequence (or in most cases to even care about val1 or how it's populated in the code).
Sequences also mean more maintenance, not to mention the portability issues when moving data between systems.
But sequences are more human-friendly IMO (looking at and using a number is better than a 32-character hex rendering of a raw value, by far). There may be other benefits to sequences; I haven't done any extensive comparisons, so you may wish to run some performance tests first.
If your concern is two database calls, you should be able to call SYS_GUID() within your INSERT statement. You could even use a RETURNING clause to get the value that Oracle generated, so that you have it in your application for further use.
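A sketch of that single round trip, reusing the my_tab table from the earlier answer:
declare
  l_guid my_tab.val1%type;
begin
  insert into my_tab (val1, val2)
  values (sys_guid(), 'Some data')
  returning val1 into l_guid;  -- hand the generated GUID back to the caller
  dbms_output.put_line(rawtohex(l_guid));
end;
/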
SYS_GUID can be used as a default value for a primary key column, which is often more convenient than using a sequence, but note that the values will be more or less random and not sequential. On the plus side, that may reduce contention for hot blocks, but on the minus side your index inserts will be all over the place as well. We generally recommend against this practice.
I have found no reason to generate a GUID from Oracle. The round trip between Oracle and the client for every GUID is likely slower than the occasional index rebalancing caused by random-value inserts.

How do I optimize the following SQL query for performance?

How do I optimize the following SQL query for performance?
select * from Employee where CNIC = 'some-CNIC-number'
Will using alias help making it a little faster?
I am using Microsoft SQL Server.
It would be better if you told us which RDBMS you are using, but...
1 - Don't do SELECT *. Specify which columns you need. Less data = faster query
2 - For indexing, make sure you have an index on CNIC. You also want a good clustered index on a primary key (preferably something like an ID number)
3 - You put the number in single quotes ' ', which indicates you may have it as a varchar column. If it will always be numeric, it should be an int/bigint data type. This takes up less space and will be faster to retrieve and index (see the sketch below).
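Putting 1 and 3 together (column names are hypothetical, and this assumes CNICs are purely numeric with no leading zeros):
SELECT EmployeeID, EmployeeName
FROM Employee
WHERE CNIC = 3520212345671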
Create an index on CNIC:
CREATE INDEX ix_employee_cnic ON employee (cnic)
First thing: as I see this column will be used for storing ID card numbers, you can make your column of type int rather than varchar or nvarchar, as searching is faster on an integer type compared to varchar or nvarchar.
Second, use WITH (NOLOCK), like:
select * from Employee with (nolock) where CNIC = 'some-CNIC-number'
This is to minimize the chances of a deadlock (at the cost of dirty reads).

Number VS Varchar(2) Primary Keys

I'm now at the point in my project where I need to design my database (Oracle).
Usually for the status and countries tables I don't use a numeric primary key, for example:
STATUS (max 6)
AC --> Active
DE --> Deleted
COUNTRIES (total 30)
UK --> United Kingdom
IT --> Italy
GR --> Greece
These tables are static, not updated through the application, and not foreseen to change in the future, so there is no chance of update problems in tables that use these values as foreign keys.
The main table of the application will use status and country (more than once, e.g. origin country, destination country), and it is foreseen that 600,000 rows will be added per year.
So my question is: will these VARCHAR(2) keys have an impact on performance when joining these 3 tables?
Will the first query be significantly slower than the second?
SELECT m.*,
s.status_name,
c.country_name
FROM main m, status s, countries c
WHERE m.status_cd = s.status_cd
AND m.country_cd = c.country_cd
AND m.status_cd = 'AC'
AND m.country_cd = 'UK'
SELECT m.*,
s.status_name,
c.country_name
FROM main m, status s, countries c
WHERE m.status_cd = s.status_cd
AND m.country_cd = c.country_cd
AND m.status_cd = 1
AND m.country_cd = 2
Clarification:
Status is not binary (hence the "max 6" next to the table name). The values will probably be:
* active
* deleted
* draft
* send
* replaced
and we need to display the decoded values to the user, so we need the names.
Both the status and country tables are so small that they are going to be memory resident in practice, whether formally stated as such or not. Indeed, except that a foreign key normally requires an index on the referenced primary key field, you might be tempted not to bother with any indexes on the tables.
The performance difference between the joins with different types is going to be negligible, and the numeric code will, if anything, be slower since there's 'more' data to store (but it is all so small that it is negligible, again).
So, go with the natural codes. All else apart, the SQL in the first example is clearer; the 'UK' and 'AC' are much more meaningful than 1 and 2.
In non-Oracle DBMS, you would probably use CHAR(2) for both the status and country code values. Oracle users tend to use VARCHAR2 for everything; I'm not sure whether there is a penalty for using a CHAR(2) column instead, especially since the column values are fixed length. (Under Informix, for instance, a VARCHAR(2) field - a field of up to two characters - would store as 3 bytes, a length (always 2 in your case) and the 2 data bytes. By contrast, a CHAR(2) field would occupy just 2 bytes.)
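A quick way to check in Oracle, in the spirit of the vsize() example earlier (the table name is illustrative):
create table code_size_test (c char(2), v varchar2(2));
insert into code_size_test values ('UK', 'UK');
select vsize(c), vsize(v) from code_size_test;
Both report 2 bytes: in Oracle every column value carries its own length byte in the row format, so a fully populated VARCHAR2(2) has no Informix-style penalty versus CHAR(2).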
Bottom line: there isn't much performance difference between varchar and number keys, so you should go with whichever makes sense for the column. Here, the varchar seems to make more sense.
If 'status' is (and will always be?) a binary active/deleted field, why bother with the table at all? It seems like normalization taken to an impractical extreme.
It would certainly be quicker, not to mention easier, to simply use a single-digit numeric field (tinyint(1), or NUMBER(1) in Oracle) and record the active/deleted state as a 1 or 0.
This eliminates one of your joins entirely, which has got to be a good thing.
It does not matter which method you choose in this case. The important part is to use the same approach throughout the database and be consistent in your ID convention.
