What space is allocated for NULL values of an Oracle column (especially CLOB)?

We have a report (simple text) to store in Oracle; the average case will be less than 4K, but some cases exceed that, so one option is to use a CLOB.
It is for logging purposes only, not used in queries or updates: it is inserted once and retrieved a few times.
Space and the overall performance of the schema (other tables) are the main concerns.
I have read about the CLOB storage allocation format.
We are contemplating using two columns, msgV varchar2(4000) and msgC CLOB. When the text exceeds 4K we store it in the CLOB; otherwise it goes into the usual varchar2 and the CLOB remains NULL.
So my questions are:
Is this scheme better w.r.t. the above performance considerations, or should we simply use a CLOB?
(Apart from the fact that it entails more coding work to maintain this condition everywhere.)
And what is the space consumed by NULL and empty CLOBs (or any datatype)?
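For reference, a minimal sketch of the scheme we are considering (the table name and everything except msgV/msgC are placeholders):
create table report_log
(id    number primary key,
msgV  varchar2(4000),
msgC  clob
);
-- at insert time: if the text fits in 4000 bytes it goes into msgV and msgC stays NULL;
-- otherwise msgV stays NULL and the full text goes into msgC.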

Use a CLOB. If the data in the CLOB is under 4000 bytes, it will actually be stored inline. See the section in the link below comparing LOBs to LONG.
Oracle Lob
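For example, a minimal sketch of a single-CLOB table with inline storage spelled out (table and column names are placeholders; ENABLE STORAGE IN ROW is the default and is shown here only to make the inline behaviour explicit):
create table report_log
(id   number primary key,
msg  clob
) lob (msg) store as securefile (enable storage in row);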

Related

Best way to save large column data in a data warehouse

I have a table that stores the changes to a transaction. All the changes are captured into a table. One of the columns that comes as part of the transaction can have many comma-separated values. The number of occurrences cannot be predicted. Also, this field is not mandatory and can have null values as well.
The total number of transactions that I have in the table is around 100M. Out of those, the number of records for which the value is populated is 1M. Out of the 1M transactions, the number of records for which the length of the value exceeds 4000 is ~37K.
I mentioned the length 4000 since in my Oracle table the column that would store this has been defined as varchar2(4000).
I checked in a few places and found that if I have to save something of unknown length then I should define the column datatype as CLOB. But a CLOB is expensive for me since only a very small amount of the data has length > 4000. If I snowflake my star schema and create another table to store the values, then even the transactions whose length is much smaller than 4000 would be saved in the CLOB column. This would be expensive both in terms of storage and performance.
Can someone suggest an approach to solve this problem?
Thanks
S
You could create a master-detail pair of tables to store the comma-separated values; then you would have rows rather than all the comma-separated values in a single column. This could be managed with a foreign key using a pseudo key between the master and detail tables.
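A minimal sketch of that layout (all names here are made up for illustration):
create table txn_master
(txn_id number primary key
-- ... the existing transaction columns ...
);
create table txn_detail
(txn_id      number not null references txn_master (txn_id),
seq_no      number not null,
item_value  varchar2(4000),
constraint txn_detail_pk primary key (txn_id, seq_no)
);
Each comma-separated value becomes one row in txn_detail, kept in order by seq_no.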
Here's one option.
Create two columns, e.g.
create table storage
(id number primary key,
long_text_1 varchar2(4000),
long_text_2 varchar2(4000)
);
Store values like
insert into storage (id, long_text_1, long_text_2)
values (seq.nextval,
substr(input_value, 1, 4000),
substr(input_value, 4001, 4000)
);
When retrieving them from the table, concatenate them:
select id,
long_text_1 || long_text_2 as long_text
from storage
where ...
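One caveat with this approach: in plain SQL the concatenated result is still a VARCHAR2, so rows that actually fill both columns can exceed the 4000-byte limit and raise ORA-01489 unless MAX_STRING_SIZE is set to EXTENDED. Converting the first operand to a CLOB avoids that, e.g.
select id,
to_clob(long_text_1) || long_text_2 as long_text
from storage
where ...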
You might benefit from using inline SecureFile CLOBs. With inline CLOBs, up to about 4000 bytes of data can be stored in the row like a regular VARCHAR2, and only larger values will be stored in a separate CLOB segment. With SecureFiles, Oracle can significantly improve CLOB performance. (For example, import and export of SecureFiles is much faster than with the old-fashioned BasicFile LOB format.)
Depending on your version, parameters, and table DDL, your database may already store CLOBs as inline SecureFiles. Ensure that your COMPATIBLE setting is 11.2 or higher, and that DB_SECUREFILE is one of "permitted", "always", or "preferred":
select name, value from v$parameter where name in ('compatible', 'db_securefile') order by 1;
Use a query like this to ensure that your tables were set up correctly and nobody overrode the system settings:
select dbms_metadata.get_ddl('TABLE', 'YOUR_TABLE_NAME') from dual;
You should see something like this in the results:
... LOB ("CLOB_NAME") STORE AS SECUREFILE (... ENABLE STORAGE IN ROW ...) ...
One of the main problems with CLOBs is that they are stored in a separate segment, and a LOB index must be traversed to map each row in the table to a value in another segment. The demo below creates two tables to show that LOB segments do not need to be used when the data is small and stored inline.
--drop table clob_test_inline;
--drop table clob_test_not_in;
create table clob_test_inline(a number, b clob) lob(b) store as securefile (enable storage in row);
create table clob_test_not_in(a number, b clob) lob(b) store as (disable storage in row);
insert into clob_test_inline select level, lpad('A', 900, 'A') from dual connect by level <= 10000;
insert into clob_test_not_in select level, lpad('A', 900, 'A') from dual connect by level <= 10000;
commit;
The inline table segment is large, because it holds all the data. The out of line table segment is small, because all of its data is held elsewhere.
select segment_name, bytes/1024/1024 mb_inline
from dba_segments
where segment_name like 'CLOB_TEST%'
order by 1;
SEGMENT_NAME MB_INLINE
---------------- ---------
CLOB_TEST_INLINE 27
CLOB_TEST_NOT_IN 0.625
Looking at the LOB segments, the sizes are reversed. The inline table doesn't store anything in the LOB segment.
select table_name, bytes/1024/1024 mb_out_of_line
from dba_segments
join dba_lobs
on dba_segments.owner = dba_lobs.owner
and dba_segments.segment_name = dba_lobs.segment_name
where dba_lobs.table_name like 'CLOB_TEST%'
order by 1;
TABLE_NAME MB_OUT_OF_LINE
------------ --------------
CLOB_TEST_INLINE 0.125
CLOB_TEST_NOT_IN 88.1875
Despite the above, I can't promise that CLOBs will still work for you. All I can say is that it's worth testing the data using CLOBs. You'll still need to look out for a few things. CLOBs store text slightly differently (UCS2 instead of UTF8), which may take up more space depending on your character sets. So check the segment sizes. But also beware that segment sizes can lie when they are small - there's a lot of auto-allocated overhead for sample data, so you'll want to use realistic sizes when testing.
Finally, as Raul pointed out, storing non-atomic values in a field is usually a terrible mistake. That said, there are rare times when data warehouses need to break the rules for performance, and data needs to be stored as compactly as possible. Before you store the data this way, ensure that you will never need to join based on those values, or query for individual values. If you think dealing with 100M rows is tough, just wait until you try to split 100M values and then join them to another table.

Oracle to PostgreSQL migration - database size reduces

Why does the database size reduce in PostgreSQL after migration from an Oracle schema having LOB, CLOB and BLOB datatypes?
The main reason is that Postgres by default compresses values of variable-length data types that are bigger than (approximately) 2000 bytes - these are mainly the text, varchar and bytea types.
Oracle will only compress the content of LOB columns if you are using the Enterprise Edition and enable the compression when defining the LOB column (the most important part is to use SecureFile instead of BasicFile).
Most probably your LOB columns were defined without compression in Oracle and contain many values bigger than 2000 bytes; that's why you see a reduction in size due to Postgres' automatic compression.
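For comparison, this is roughly what you would have to request explicitly on the Oracle side (a sketch; the table and column names are made up, and SecureFile LOB compression additionally requires the Advanced Compression option):
create table doc_store
(id    number primary key,
body  clob
) lob (body) store as securefile (compress medium);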

Maximum character data that CLOB and NCLOB types can store

Quote from documentation
The LOB datatypes for character data are CLOB and NCLOB. They can store up
to 8 terabytes of character data (CLOB) or national character set data (NCLOB).
and this is another quote from the same page:
The CLOB and NCLOB datatypes store up to 128 terabytes of character data in the database. CLOBs store database character set data, and NCLOBs store Unicode national character set data.
I am a little confused - is there a mistake in the documentation, or am I missing something?
The difference stems from the fact that you can define LOBs with different "chunk" sizes. Plus their maximum size is limited by the number of database blocks used for them. If you create a database (or tablespace) with a larger blocksize this means a LOB can contain more data.
From the manual:
CLOB objects can store up to (4 gigabytes -1) * (the value of the CHUNK parameter of LOB storage) of character data
And the next sentence describes the relation to the blocksize:
If the tablespaces in your database are of standard block size, and if you have used the default value of the CHUNK parameter of LOB storage when creating a LOB column, then this is equivalent to (4 gigabytes - 1) * (database block size).
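If you want to see which limit applies to a particular column, the CHUNK value is visible in the data dictionary; for example (substitute your own table name):
select table_name, column_name, chunk
from user_lobs
where table_name = 'YOUR_TABLE';
Plugging in typical values: (4G - 1) * 2048 bytes is roughly 8 TB, (4G - 1) * 8192 bytes is roughly 32 TB, and (4G - 1) * 32768 bytes is roughly 128 TB, which is where the two different figures in the quotes come from.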

Oracle LOB Not getting stored inline

Setup
I have an Oracle table that has a couple of attributes and a CLOB column. I have created the table below in two ways, each of which should give the same behavior.
CREATE TABLE DEMO(
a number (10, 2),
data CLOB
)
CREATE TABLE DEMO(
a number (10, 2),
data CLOB
) LOB (data) STORE AS (ENABLE STORAGE IN ROW)
Scenario
As per the Oracle documentation, when the CLOB is greater than about 4000 bytes it will be stored out of line, otherwise inline.
When I store data in this table with a small CLOB value, say "Hello", and then look at the segment information for the DEMO table and its LOB segment, it shows that all the data goes to the table and no new blocks are consumed in the LOB segment.
When I store bigger data with fewer than 1500 characters in total, I also get the same behavior as above.
But when I store data with more than 2000 and fewer than 3000 characters, the LOB data goes to the LOB segment even though the total is less than 3000 characters.
Question
Why is data smaller than 3000 characters going to the LOB segment? Is it that each character takes 2 bytes, which would explain why data up to ~1500 characters goes to the table instead of the LOB segment?
Problem
Lots of disk space is getting wasted because of the LOB segment, since the CHUNK size is 8 KB and the data per row will always be around 3-4K characters, in some cases exceeding that. So essentially 4 KB of space is wasted for each row, and in our case of 20 million rows it runs into the 50s of GBs.
This may explain the above behaviour:
"The CLOB and NCLOB datatypes store up to 4 gigabytes of character data in the database. CLOBs store database character set data and NCLOBs store Unicode national character set data. For varying-width database character sets, the CLOB value is stored in the database using the two-byte Unicode character set, which has a fixed width. Oracle translates the stored Unicode value to the character set requested on the client or on the server, which can be fixed-width or varying width. When you insert data into a CLOB column using a varying-width character set, Oracle converts the data into Unicode before storing it in the database."
http://docs.oracle.com/cd/B10500_01/server.920/a96524/c13datyp.htm#3234
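In other words: with a varying-width database character set (e.g. AL32UTF8), the CLOB value is held internally as two bytes per character, so the roughly 4000-byte inline limit is reached at roughly 2000 characters. That matches the behaviour you observed: ~1500 characters is about 3000 bytes and stays in the row, while anything over ~2000 characters is about 4000+ bytes and gets pushed out to the LOB segment.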

How to store unlimited characters in Oracle 11g?

We have a table in Oracle 11g with a varchar2 column. We use a proprietary programming language where this column is defined as a string. At most we can store 2000 characters (4000 bytes) in this column. Now the requirement is that the column needs to store more than 2000 characters (in fact, unlimited characters). The DBAs don't like BLOB or LONG datatypes for maintenance reasons.
The solution that I can think of is to remove this column from the original table, have a separate table for it, and then store the text across multiple rows, in order to get unlimited characters. This table will be joined with the original table for queries.
Is there any better solution to this problem?
UPDATE: The proprietary programming language allows defining variables of type string and blob; there is no option for CLOB. I understand the responses given, but I cannot take on the DBAs. I understand that avoiding BLOB or LONG will be a developers' nightmare, but I still cannot help it.
UPDATE 2: If the maximum I need is 8000 characters, can I just add 3 more columns so that I will have 4 columns with 2000 characters each, to get 8000 characters? When the first column is full, values would spill over to the next column and so on. Will this design have any bad side effects? Please suggest.
If a BLOB is what you need, convince your DBA it's what you need. Those data types are there for a reason, and any roll-your-own implementation will be worse than the built-in type.
Also you might want to look at the CLOB type as it will meet your needs quite well.
You could follow the way Oracle stores its own stored procedures in the data dictionary. Define a table to hold the text in lines:
CREATE TABLE MY_TEXT (
IDENTIFIER INT,
LINE INT,
TEXT VARCHAR2 (4000),
PRIMARY KEY (IDENTIFIER, LINE));
The IDENTIFIER column is the foreign key to the original table. LINE is a simple integer (not a sequence) to keep the text chunks in order. This allows keeping arbitrarily large amounts of text.
Yes, this is not as efficient as a BLOB, CLOB, or LONG (I would avoid LONG fields if at all possible). Yes, this requires more maintenance, but if your DBAs are dead set against managing CLOB fields in the database, this is option two.
EDIT:
My_Table below is where you currently have the VARCHAR column you are looking to expand. I would keep it in the table for the short text fields.
CREATE TABLE MY_TABLE (
IDENTIFIER INT,
OTHER_FIELD VARCHAR2(10),
REQUIRED_TEXT VARCHAR2(4000),
PRIMARY KEY (IDENTIFIER));
Then write the query to pull the data by joining the two tables, ordering by LINE in the MY_TEXT table. Your application will need to split the string into 2000-character chunks and insert them in line order.
I would do this in a PL/SQL procedure, both insert and select. PL/SQL VARCHAR2 strings can be up to 32K characters, which may or may not be large enough for your needs.
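A minimal sketch of the insert side (the procedure name and the 2000-character chunk size are assumptions; it targets the MY_TEXT table defined above, and note that 2000 multi-byte characters can still exceed the 4000-byte column limit):
create or replace procedure save_long_text (p_identifier in int, p_text in varchar2) as
  c_chunk constant pls_integer := 2000;  -- characters stored per MY_TEXT row
begin
  if p_text is not null then
    for i in 0 .. trunc((length(p_text) - 1) / c_chunk) loop
      insert into my_text (identifier, line, text)
      values (p_identifier, i + 1, substr(p_text, i * c_chunk + 1, c_chunk));
    end loop;
  end if;
end;
/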
But like every other person answering this question, I would strongly suggest making a case to the DBA to make the column a CLOB. From the program perspective this will be a BLOB and therefore simple to manage.
You said no BLOB or LONG... but what about CLOB? 4GB character data.
BLOB is the best solution. Anything else will be less convenient and a bigger maintenance annoyance.
Is BFILE a viable alternative datatype for your DBAs?
I don't get it. A CLOB is the appropriate database datatype. If your weird programming language will deal with strings of 8000 (or whatever) characters, what stops it writing those to a CLOB?
More specifically, what error do you get (from Oracle or your programming language) when you try to insert an 8000-character string into a column defined as a CLOB?
