Does anybody know how SecureFile stores LOBs in chunks? The documentation says the CHUNK parameter is only advisory (http://docs.oracle.com/cd/E11882_01/appdev.112/e18294/adlob_smart.htm#CIHIDGJA).
I did some initial tests, and it looks like Oracle uses at least one block per LOB (EDIT: if it is stored in the lobsegment). Is that right?
CREATE TABLE sftest (mylob CLOB)
LOB (mylob) STORE AS SECUREFILE sftest_mylob (DISABLE STORAGE IN ROW);
INSERT INTO sftest SELECT object_name FROM all_objects;
11,825 rows inserted
SELECT blocks FROM user_segments WHERE segment_name='SFTEST_MYLOB';
14336
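As a rough sanity check, the two numbers above can be combined into a blocks-per-LOB ratio (this assumes the table and insert from the test above):

```sql
-- Blocks per LOB; with the figures above this is 14336 / 11825, i.e. ~1.2,
-- consistent with roughly one block per out-of-line LOB plus some overhead.
SELECT s.blocks / t.cnt AS blocks_per_lob
FROM   (SELECT blocks FROM user_segments WHERE segment_name = 'SFTEST_MYLOB') s,
       (SELECT COUNT(*) AS cnt FROM sftest) t;
```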
Actually, a minimum of one block per LOB is incorrect. It can be much smaller, depending on the size of your LOB. From the Oracle docs:
LOB values are stored inline when any of the following conditions
apply:
When the size of the LOB stored in the given row is small, approximately 4000 bytes or less, and you either explicitly specify
ENABLE STORAGE IN ROW or the LOB storage clause when you create the
table, or when you do not specify this parameter (which is the
default).
When the LOB value is NULL (regardless of the LOB storage properties
for the column).
Using the default LOB storage properties (inline storage) can allow
for better database performance; it avoids the overhead of creating
and managing out-of-line storage for smaller LOB values. If LOB values
stored in your database are frequently small in size, then using
inline storage is recommended.
Yes, the minimal number of blocks of a SecureFile LOB is 1 (if it is not stored in row).
We have been using SecureFile LOBs in production for a number of years, and I found out how to inspect the block count of an individual LOB. First, get the number of extents the LOB has by calling DBMS_LOBUTIL.GETINODE(mylob).EXTENTS; then, for each extent, you can find out how many blocks are stored in it with DBMS_LOBUTIL.GETLOBMAP(mylob, myextent).NBLKS, e.g.
SELECT DBMS_LOBUTIL.GETINODE(mylob).LENGTH AS len,
DBMS_LOBUTIL.GETINODE(mylob).EXTENTS AS extents,
DBMS_LOBUTIL.GETLOBMAP(mylob, 0).NBLKS AS nblks
FROM sftest;
LEN EXTENTS NBLKS
34 1 1
24 1 1
42 1 1
Related
I have a table that stores the changes to a transaction. All the changes are captured in a table. One of the columns that comes as part of the transaction can have many comma-separated values. The number of occurrences cannot be predicted. This field is not mandatory and can have null values as well.
The total number of transactions I have in the table is around 100M. Of those, the number of records for which the value is populated is 1M. Of that 1M, the number of records for which the length exceeds 4000 is ~37K.
I mention the length 4000 because in my Oracle table the column that would hold this value is defined as VARCHAR2(4000).
I checked in a few places and found that if I have to store something of unknown length, I should define the column datatype as CLOB. But a CLOB is expensive for me, since only a very small amount of the data has length > 4000. If I snowflake my star schema and create another table to store the values, then even transactions whose length is much smaller than 4000 would be saved in the CLOB column. This would be expensive both in terms of storage and performance.
Can someone suggest an approach to solve this problem?
Thanks
S
You could create a master-detail table pair to store the comma-separated values; then you would have rows rather than all comma-separated values in a single column. This could be managed with a foreign key using a pseudo key between the master and detail tables.
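The suggestion above could look something like this sketch; the table and column names are hypothetical, not from the original schema:

```sql
-- Hypothetical master-detail layout for the comma-separated values.
CREATE TABLE txn_master (
  txn_id NUMBER PRIMARY KEY
);

CREATE TABLE txn_values (
  txn_id NUMBER NOT NULL REFERENCES txn_master (txn_id),
  seq_no NUMBER NOT NULL,      -- position within the original list
  val    VARCHAR2(4000),
  CONSTRAINT txn_values_pk PRIMARY KEY (txn_id, seq_no)
);
```

Splitting the list into rows also makes the individual values indexable and joinable, which a single delimited column never is.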
Here's one option.
Create two columns, e.g.
create table storage
(id number primary key,
long_text_1 varchar2(4000),
long_text_2 varchar2(4000)
);
Store values like
insert into storage (id, long_text_1, long_text_2)
values (seq.nextval,
substr(input_value, 1, 4000),
substr(input_value, 4001, 4000)
);
When retrieving them from the table, concatenate them:
select id,
long_text_1 || long_text_2 as long_text
from storage
where ...
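To hide the split from readers, one option (a sketch, assuming the storage table above) is a view that does the concatenation:

```sql
-- Hypothetical convenience view over the two-column split.
CREATE VIEW storage_v AS
SELECT id,
       long_text_1 || long_text_2 AS long_text
FROM   storage;
```

One caveat: in a SQL context, concatenating two VARCHAR2(4000) values raises ORA-01489 when the result exceeds 4000 bytes, unless MAX_STRING_SIZE=EXTENDED (12c and later). On older versions, do the concatenation in PL/SQL or on the client side instead.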
You might benefit from using inline SecureFile CLOBs. With inline CLOBs, up to about 4000 bytes of data can be stored in the row like a regular VARCHAR2, and only the larger values will be stored in a separate CLOB segment. With SecureFiles, Oracle can significantly improve CLOB performance. (For example, import and export of SecureFiles is much faster than with the old BasicFile LOB format.)
Depending on your version, parameters, and table DDL, your database may already store CLOBs as inline SecureFiles. Ensure that your COMPATIBLE setting is 11.2 or higher, and that DB_SECUREFILE is one of "permitted", "always", or "preferred":
select name, value from v$parameter where name in ('compatible', 'db_securefile') order by 1;
Use a query like this to ensure that your tables were set up correctly and nobody overrode the system settings:
select dbms_metadata.get_ddl('TABLE', 'YOUR_TABLE_NAME') from dual;
You should see something like this in the results:
... LOB ("CLOB_NAME") STORE AS SECUREFILE (... ENABLE STORAGE IN ROW ...) ...
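If an existing table still uses BasicFile or out-of-line storage, it can usually be rebuilt in place. A sketch follows; the table, column, and index names are placeholders, and the MOVE rewrites the segment, so test it on a copy first:

```sql
-- Rebuild the LOB as an inline SecureFile (your_table/your_clob are placeholders).
ALTER TABLE your_table MOVE
  LOB (your_clob) STORE AS SECUREFILE (ENABLE STORAGE IN ROW);

-- Indexes on the table become UNUSABLE after a MOVE and must be rebuilt:
-- ALTER INDEX your_index REBUILD;
```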
One of the main problems with CLOBs is that they are stored in a separate segment, and a LOB index must be traversed to map each row in the table to a value in another segment. The demo below creates two tables to show that LOB segments do not need to be used when the data is small and stored inline.
--drop table clob_test_inline;
--drop table clob_test_not_in;
create table clob_test_inline(a number, b clob) lob(b) store as securefile (enable storage in row);
create table clob_test_not_in(a number, b clob) lob(b) store as (disable storage in row);
insert into clob_test_inline select level, lpad('A', 900, 'A') from dual connect by level <= 10000;
insert into clob_test_not_in select level, lpad('A', 900, 'A') from dual connect by level <= 10000;
commit;
The inline table segment is large, because it holds all the data. The out of line table segment is small, because all of its data is held elsewhere.
select segment_name, bytes/1024/1024 mb_inline
from dba_segments
where segment_name like 'CLOB_TEST%'
order by 1;
SEGMENT_NAME MB_INLINE
---------------- ---------
CLOB_TEST_INLINE 27
CLOB_TEST_NOT_IN 0.625
Looking at the LOB segments, the sizes are reversed. The inline table doesn't store anything in the LOB segment.
select table_name, bytes/1024/1024 mb_out_of_line
from dba_segments
join dba_lobs
on dba_segments.owner = dba_lobs.owner
and dba_segments.segment_name = dba_lobs.segment_name
where dba_lobs.table_name like 'CLOB_TEST%'
order by 1;
TABLE_NAME MB_OUT_OF_LINE
------------ --------------
CLOB_TEST_INLINE 0.125
CLOB_TEST_NOT_IN 88.1875
Despite the above, I can't promise that CLOBs will still work for you. All I can say is that it's worth testing the data using CLOBs. You'll still need to look out for a few things. CLOBs store text slightly differently (UCS2 instead of UTF8), which may take up more space depending on your character sets. So check the segment sizes. But also beware that segment sizes can lie when they are small - there's a lot of auto-allocated overhead for sample data, so you'll want to use realistic sizes when testing.
Finally, as Raul pointed out, storing non-atomic values in a field is usually a terrible mistake. That said, there are rare times when data warehouses need to break the rules for performance, and data needs to be stored as compactly as possible. Before you store the data this way, ensure that you will never need to join based on those values, or query for individual values. If you think dealing with 100M rows is tough, just wait until you try to split 100M values and then join them to another table.
Could you please let me know the maximum data length for the second column (column_id 2) of CLOB data type in the Employee table? I see some blogs saying the maximum data length is (4 GB - 1) * (database block size).
I'm new to this data designing.
Table : Employee
**Column_Name ----- Data_Type ------- Nullable ---- Column_Id**
Emp_ID ----------- NUMBER --------- No ---------- 1
Emp_details ------ CLOB ----------- No ---------- 2
Please help me.
To get the CLOB size for a given column in a given row, use the DBMS_LOB.GETLENGTH function:
select dbms_lob.getlength(emp_details) from employee where emp_id=1;
To get the space allocated in the tablespace for a CLOB column of a given table, you need to identify both segments implementing the LOB (the LOB segment and its LOB index).
You can compare the two sizes with the following query:
select v1.col_size, v2.seg_size from
(select sum(dbms_lob.getlength(emp_details)) as col_size from employee) v1,
(select sum(bytes) as seg_size from user_segments where segment_name in
(
(select segment_name from user_lobs where table_name='EMPLOYEE' and column_name='EMP_DETAILS')
union
(select index_name from user_lobs where table_name='EMPLOYEE' and column_name='EMP_DETAILS')
)
) v2
;
LOBs are not stored in the table itself, but outside of it in a dedicated structure called the LOB segment, accessed via a LOB index. As @pifor explains, you can inspect those structures in the dictionary view user_lobs.
The LOB segment uses blocks of usually 8192 bytes (check the tablespace in user_lobs), so the minimum size allocated for a single LOB is 8K. For 10,000 bytes, you need two 8K blocks, and so on.
Please note that if your database is set to Unicode (as most modern Oracle databases are), the size of a CLOB is roughly twice what you would expect, because CLOBs are stored in a 16-bit encoding.
This gets a bit better if you compress the LOBs, but your Oracle license needs to cover "Advanced Compression".
For very small LOBs (less than about 4000 bytes), you can avoid the 8K overhead and store them in the table together with the other columns (ENABLE STORAGE IN ROW).
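A quick way to see how a LOB column is configured is the standard dictionary view mentioned above (the table name here is a placeholder):

```sql
-- IN_ROW = 'YES' means small values stay in the table block;
-- SECUREFILE distinguishes SecureFile from BasicFile storage.
SELECT column_name, in_row, securefile, chunk
FROM   user_lobs
WHERE  table_name = 'EMPLOYEE';
```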
Quote from the documentation:
The LOB datatypes for character data are CLOB and NCLOB. They can store up
to 8 terabytes of character data (CLOB) or national character set data (NCLOB).
and this is another quote from the same page:
The CLOB and NCLOB datatypes store up to 128 terabytes of character data in the database. CLOBs store database character set data, and NCLOBs store Unicode national character set data.
I am a little confused; is there an inconsistency in the documentation, or am I missing something?
The difference stems from the fact that you can define LOBs with different "chunk" sizes, and their maximum size is limited by the number of database blocks used for them. If you create a database (or tablespace) with a larger block size, a LOB can contain more data.
From the manual:
CLOB objects can store up to (4 gigabytes -1) * (the value of the CHUNK parameter of LOB storage) of character data
And the next sentence describes the relation to the blocksize:
If the tablespaces in your database are of standard block size, and if you have used the default value of the CHUNK parameter of LOB storage when creating a LOB column, then this is equivalent to (4 gigabytes - 1) * (database block size).
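Under that formula, the limit for a given database can be computed like this (a sketch; it assumes the default CHUNK, i.e. one database block):

```sql
-- (4 GB - 1) * block size, expressed in terabytes.
-- With an 8K block size this comes out at roughly 32 TB;
-- 2K blocks give ~8 TB and 32K blocks ~128 TB, matching the quoted range.
SELECT (4294967295 * TO_NUMBER(value)) / POWER(1024, 4) AS max_lob_tb
FROM   v$parameter
WHERE  name = 'db_block_size';
```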
Setup
I have an Oracle table that has a couple of attributes and a CLOB column. I have created the table below in two ways, each of which should give the same behavior.
CREATE TABLE DEMO(
a number (10, 2),
data CLOB
)
CREATE TABLE DEMO(
a number (10, 2),
data CLOB
) LOB (data) STORE AS (ENABLE STORAGE IN ROW)
Scenario
As per the Oracle documentation, when the CLOB is greater than about 4000 bytes it is stored out of line, otherwise inline.
When I store a small CLOB value in this table, say "Hello", and then look at the segment information for the DEMO table and its LOB segment, it shows that all the data goes into the table and no new blocks are consumed in the LOB segment.
When I store bigger data with fewer than about 1500 characters in total, I get the same behavior as above.
But when I store data with more than 2000 and fewer than 3000 characters, the LOB data goes to the LOB segment even though the total is less than 3000 characters.
Question
Why is data smaller than 3000 characters going to the LOB segment? Is it that each character takes 2 bytes, which would explain why data up to about 1500 characters goes into the table instead of the LOB segment?
Problem
Lots of disk space is being wasted because of the LOB segment: the CHUNK size is 8 KB, and the data per row is usually around 3-4K characters, in some cases exceeding that. So essentially, about 4 KB is wasted per row, and in our case of 20 million rows this adds up to some 50 GB.
This may explain the above behaviour:
"The CLOB and NCLOB datatypes store up to 4 gigabytes of character data in the database. CLOBs store database character set data and NCLOBs store Unicode national character set data. For varying-width database character sets, the CLOB value is stored in the database using the two-byte Unicode character set, which has a fixed width. Oracle translates the stored Unicode value to the character set requested on the client or on the server, which can be fixed-width or varying width. When you insert data into a CLOB column using a varying-width character set, Oracle converts the data into Unicode before storing it in the database."
http://docs.oracle.com/cd/B10500_01/server.920/a96524/c13datyp.htm#3234
I created a program to insert large files into a database (around 10 MB each). I chose the BLOB type for the objects column in my table.
Now I have read that BLOB only supports binary objects with a maximum length of 4M.
Could you advise what I can do in this case to upload objects larger than 4M?
I am using Oracle 9i or 10g.
You read something that appears to be incorrect.
Per the Oracle 10g Release 2 documentation:
The BLOB datatype stores unstructured binary large objects. BLOB objects
can be thought of as bitstreams with no character set semantics. BLOB
objects can store binary data up to (4 gigabytes -1) * (the value of the
CHUNK parameter of LOB storage).
If the tablespaces in your database are of standard block size, and if you
have used the default value of the CHUNK parameter of LOB storage when
creating a LOB column, then this is equivalent to (4 gigabytes - 1) *
(database block size).
The maximum size of a LOB supported by
the database is equal to the value of
the db_block_size initialization
parameter times the value 4294967295.
This allows for a maximum LOB size
ranging from 8 terabytes to 128
terabytes.
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_lob.htm#i1016062
According to this site,
http://ss64.com/ora/syntax-datatypes.html
BLOB has a max size of 4GB since Oracle 8, so 10MB should be no problem.