CLOB storage Oracle 11g

I have encountered very strange behavior of Oracle LOBs.
Situation: We have a partitioned IOT that contains a CLOB column. The CLOB has separate LOB storage set up with the LOGGING, RETENTION, and DISABLE STORAGE IN ROW options. The CHUNK size is 8192 bytes. PCTFREE is left at the default (NULL in DBA_TABLES).
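For reference, a minimal sketch of the kind of definition being described (table and column names are invented, partitioning and tablespaces omitted; only the options mentioned above matter):
CREATE TABLE iot_docs (
  id  NUMBER PRIMARY KEY,
  doc CLOB
)
ORGANIZATION INDEX
LOB (doc) STORE AS (
  DISABLE STORAGE IN ROW
  CHUNK 8192
  RETENTION
  NOCACHE LOGGING
);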
Now we need to create a test case with a certain number of CLOBs loaded. We chose a 19.5 KB CLOB. After loading this CLOB 40 million times (it is used for performance testing, so the content does not matter), the size on the file system and in DBA_DATA_FILES is 1230 GB.
Question:
We estimated the size of 40 million CLOBs of 19.5 KB each at ~780 GB. Where did the extra 450 GB come from? I would guess it has something to do with the CHUNK size: 19.5 KB would use 3 CHUNKs, i.e. 24 KB per CLOB, but that still only adds up to 960 GB. The LOB index is around 2 GB.
Does anybody have an idea? (Sorry for the poor explanation. P.S. We are running Oracle 11g.)
Thank you in advance!

Your comment is correct: "Data in CLOB columns is stored in a format that is compatible with UCS-2 when the database character set is multibyte, such as UTF8 or AL32UTF8".
Although I would not say this is just an extrapolation of VARCHAR2: UTF8 is a variable-width character set and does not always require 2 bytes per character.
15,760 characters is 31,520 bytes, which needs 4 blocks of 8192 bytes, i.e. 32,768 bytes. 32768 * 40,000,000 / 1024 / 1024 / 1024 = 1220 GB, which doesn't perfectly match your result but is very close. We'd need to see some more detailed numbers to look for a perfect match.
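A back-of-envelope check of that arithmetic, plus the kind of query I would use to pull the detailed numbers (the table name is a placeholder; assumes the 8 KB chunk size and 2 bytes per character from above):
-- 15,760 chars stored as UCS-2 = 31,520 bytes, rounded up to whole 8 KB chunks
SELECT CEIL(15760 * 2 / 8192)                                    AS chunks_per_lob,
       CEIL(15760 * 2 / 8192) * 8192 * 40000000 / POWER(1024, 3) AS estimated_gb
FROM dual;

-- Actual space used by the LOB segment and the LOB index
SELECT l.column_name, l.chunk, s.segment_name, s.segment_type,
       ROUND(s.bytes / POWER(1024, 3), 1) AS gb
FROM   dba_lobs l
JOIN   dba_segments s ON s.owner = l.owner
                     AND s.segment_name IN (l.segment_name, l.index_name)
WHERE  l.table_name = 'YOUR_IOT';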

Related

Why is the ora-archive-state column a varchar2 4000 chars?

Can someone explain why Oracle made the ora-archive-state column a varchar2 of 4000 chars? When using the in-database archiving feature of Oracle 12c, the record is visible when the column is 0 and hidden when it is anything other than 0.
What's the purpose of having the extra 3999 chars when simply setting the column to 1 accomplishes the goal? I'm doubting Oracle is just wasting the space.
Because it allows you to mark "archived" rows differently: you can update ORA_ARCHIVE_STATE to different values, for example to_char(systimestamp,'yyyy-mm-dd hh24:mi:ssxff') to set it to the date of archiving. Later you can analyze archived records by this column.
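For example (a sketch; the table name is made up, and row archival must be enabled on it first):
ALTER TABLE orders ROW ARCHIVAL;

-- Archive a row and record when it was archived
UPDATE orders
SET    ora_archive_state = TO_CHAR(SYSTIMESTAMP, 'yyyy-mm-dd hh24:mi:ssxff')
WHERE  order_id = 42;

-- Archived rows are hidden by default; make them visible to analyze them
ALTER SESSION SET ROW ARCHIVAL VISIBILITY = ALL;
SELECT order_id, ora_archive_state FROM orders WHERE ora_archive_state != '0';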
I'm doubting Oracle is just wasting the space.
Varchar2 doesn't waste space. It is a variable-length character string, i.e. varchar2(4000 BYTE) doesn't mean it will use 4000 bytes, and likewise varchar2(4000 CHAR) doesn't mean it will use 4000 characters. That is just the maximum allowed column length.

Oracle extended data types in combination with char semantics

There is one aspect of the extended data types introduced with Oracle 12 that I don't quite understand. The documentation (https://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF55623) says:
A VARCHAR2 or NVARCHAR2 data type with a declared size of greater than 4000 bytes, or a RAW data type with a declared size of greater than 2000 bytes, is an extended data type. Extended data type columns are stored out-of-line, leveraging Oracle's LOB technology.
According to this definition, does "VARCHAR2(4000 CHAR)" have "a declared size of greater than 4000 bytes", because with a multi-byte character set (e.g. AL32UTF8) it could contain more than 4000 bytes?
Or more specific: What happens when I create a column of that type? I could think of the following possibilities:
It is created with an extended data type, so that the content of that column is always stored in a CLOB regardless of its size.
The values for that column are stored inline if they need no more than 4000 bytes, and as a CLOB if they are longer.
It will refuse to store any values with more than 4000 bytes. To circumvent that I have to declare the column as VARCHAR2(4001 CHAR) or something similar.
Edit: The question had been marked as a duplicate of Enter large content to oracle database for a few weeks, but I don't think it is. The other question is generally about how you can enter more than 4000 characters in a VARCHAR2 column, but I am asking a very specific question about an edge case.
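No definitive answer is given here, but one way to probe which of the three possibilities applies is a quick test (a sketch; assumes an AL32UTF8 database with MAX_STRING_SIZE = EXTENDED, and 'ä' being a 2-byte character, so 4000 of them exceed 4000 bytes):
CREATE TABLE t_char_sem (c VARCHAR2(4000 CHAR));

-- 4000 characters * 2 bytes each = ~8000 bytes: does this succeed?
INSERT INTO t_char_sem VALUES (RPAD('ä', 4000, 'ä'));

-- If the column were treated as an extended type, a hidden LOB segment would show up here
SELECT column_name, in_row, segment_name
FROM   user_lobs
WHERE  table_name = 'T_CHAR_SEM';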

Clob size in bytes

I have a database with the below NLS settings
NLS_NCHAR_CHARACTERSET - AL16UTF16
NLS_CHARACTERSET - AL32UTF8
There's a table with a clob column storing a base64 encoded data.
Since the characters are mostly English letters, I would assume each character takes up only 1 byte, as a CLOB uses the NLS_CHARACTERSET character set for encoding.
With an inline-enabled CLOB column, the CLOB will be stored inline unless it grows to more than 4096 bytes. However, when I tried to store a value of 2048 characters, I found that it is not stored inline (by checking DBA_TABLES). So does that mean each character is not using only 1 byte? Can anyone elaborate on this?
Another test added:
Create a table with a CLOB column with an 8 KB chunk size, so that the initial segment size is 65,536 bytes.
After inserting a row with 32,768 characters in the CLOB column, the creation of a second extent can be seen by querying DBA_SEGMENTS.
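Roughly, that test could be scripted like this (names illustrative; the 32,768-character value is built in PL/SQL because SQL-level strings are shorter):
CREATE TABLE t_chunk (id NUMBER, doc CLOB)
  LOB (doc) STORE AS (ENABLE STORAGE IN ROW CHUNK 8192);

DECLARE
  v CLOB := RPAD('x', 32767, 'x');   -- PL/SQL strings max out at 32767 characters
BEGIN
  v := v || 'x';                     -- now exactly 32,768 characters
  INSERT INTO t_chunk VALUES (1, v);
  COMMIT;
END;
/

-- Watch the LOB segment grow beyond its initial 65,536 bytes
SELECT s.segment_name, s.bytes, s.extents
FROM   user_lobs l
JOIN   user_segments s ON s.segment_name = l.segment_name
WHERE  l.table_name = 'T_CHUNK';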
http://docs.oracle.com/cd/E11882_01/server.112/e10729/ch6unicode.htm#r2c1-t12
It says:
Data in CLOB columns is stored in a format that is compatible with
UCS-2 when the database character set is multibyte, such as UTF8 or
AL32UTF8. This means that the storage space required for an English
document doubles when the data is converted
So it looks like a CLOB internally stores everything as UCS-2 (Unicode), i.e. a fixed 2 bytes per character. Consequently, it can store only about 4096/2 = 2048 characters inline.
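A rough way to see the doubling (table and column names are examples):
-- DBMS_LOB.GETLENGTH returns the length in characters for a CLOB;
-- stored as UCS-2, each character occupies roughly 2 bytes on disk
SELECT DBMS_LOB.GETLENGTH(doc)      AS char_len,
       DBMS_LOB.GETLENGTH(doc) * 2  AS approx_bytes_stored
FROM   my_table;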

How many bytes can nvarchar store in Oracle

I have an NVARCHAR2 field in Oracle, and I would like to know how many bytes it can hold if it has a length of 178.
It depends on your national character set (NLS_NCHAR_CHARACTERSET).
Check http://docs.oracle.com/cd/B28359_01/server.111/b28298/ch6unicode.htm#g1008281
From the manual:
Width specifications of character data type NVARCHAR2 refer to the number of characters. The maximum column size allowed is 4000 bytes.
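So if the national character set is AL16UTF16 (the default), each character occupies 2 bytes, and a length of 178 means at most 178 * 2 = 356 bytes, well under the 4000-byte cap. A quick check might look like this (table name made up):
CREATE TABLE t_nv (v NVARCHAR2(178));
INSERT INTO t_nv VALUES (RPAD(N'a', 178, N'a'));

-- Expect 178 characters and 356 bytes if the national character set is AL16UTF16
SELECT LENGTH(v) AS chars_used, LENGTHB(v) AS bytes_used FROM t_nv;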

Oracle 10g small Blob or Clob not being stored inline?

According to the documents I've read, the default storage for a CLOB or BLOB is inline, which means that if it is less than approx 4k in size then it will be held in the table.
But when I test this on a dummy table in Oracle (10.2.0.1.0) the performance and response from Oracle Monitor (by Allround Automations) suggest that it is being held outwith the table.
Here's my test scenario ...
create table clobtest ( x int primary key, y clob, z varchar(100) )
;
insert into clobtest
select object_id, object_name, object_name
from all_objects where rownum < 10001
;
select COLUMN_NAME, IN_ROW
from user_lobs
where table_name = 'CLOBTEST'
;
This shows: Y YES (suggesting that Oracle will store the clob in the row)
select x, y from CLOBTEST where ROWNUM < 1001 -- 8.49 seconds
select x, z from CLOBTEST where ROWNUM < 1001 -- 0.298 seconds
So in this case the CLOB values have a maximum length of 30 characters and should always be inline. If I run Oracle Monitor, it shows a LOB.Length followed by a LOB.Read() for each row returned, again suggesting that the CLOB values are held outwith the table.
I also tried creating the table like this
create table clobtest
( x int primary key, y clob, z varchar(100) )
LOB (y) STORE AS (ENABLE STORAGE IN ROW)
but got exactly the same results.
Does anyone have any suggestions how I can force (persuade, encourage) Oracle to store the clob value in-line in the table? (I'm hoping to achieve similar response times to reading the varchar2 column z)
UPDATE: If I run this SQL
select COLUMN_NAME, IN_ROW, l.SEGMENT_NAME, SEGMENT_TYPE, BYTES, BLOCKS, EXTENTS
from user_lobs l
JOIN USER_SEGMENTS s
on (l.segment_name = s.segment_name)
where table_name = 'CLOBTEST'
then I get the following results ...
Y YES SYS_LOB0000398621C00002$$ LOBSEGMENT 65536 8 1
The behavior of Oracle LOBs is the following.
A LOB is stored inline when:
(
Its size is less than or equal to 3964 bytes
AND
ENABLE STORAGE IN ROW has been defined in the LOB storage clause
) OR (
The value is NULL
)
A LOB is stored out-of-row when:
(
The value is not NULL
) AND (
Its size is greater than 3964 bytes
OR
DISABLE STORAGE IN ROW has been defined in the LOB storage clause
)
Now this is not the only issue which may impact performance.
If the LOBs end up not being stored inline, the default behavior of Oracle is to avoid caching them (only inline LOBs are cached in the buffer cache with the other fields of the row). To tell Oracle to also cache non-inline LOBs, the CACHE option should be used when the LOB is defined.
The default behavior is ENABLE STORAGE IN ROW, and NOCACHE, which means small LOBs will be inlined, large LOBs will not (and will not be cached).
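For example, on the table from the question the cache behaviour can be changed after the fact (a sketch; the same option can equally be given in the LOB storage clause at CREATE TABLE time):
-- Cache out-of-line LOB blocks in the buffer cache as well
ALTER TABLE clobtest MODIFY LOB (y) (CACHE);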
Finally, there is also a performance issue at the communication protocol level. Typical Oracle clients perform 2 additional roundtrips per LOB to fetch it:
- one to retrieve the size of the LOB and allocate memory accordingly
- one to fetch the data itself (provided the LOB is small)
These extra roundtrips are performed even if an array interface is used to retrieve the results. If you retrieve 1000 rows and your array size is large enough, you will pay for 1 roundtrip to retrieve the rows, and 2000 roundtrips to retrieve the content of the LOBs.
Please note that this does not depend on whether the LOB is stored inline or not. They are completely different problems.
To optimize at the protocol level, Oracle has provided a new OCI verb to fetch several LOBs in one roundtrip (OCILobArrayRead). I don't know if something similar exists with JDBC.
Another option is to bind the LOB on the client side as if it were a big RAW/VARCHAR2. This only works if a maximum size for the LOB can be defined (since the maximum size must be provided at bind time). This trick avoids the extra roundtrips: the LOBs are just processed like RAW or VARCHAR2. We use it a lot in our LOB-intensive applications.
Once the number of roundtrips has been optimized, the packet size (SDU) can be resized in the Net configuration to better fit the situation (i.e. a limited number of large roundtrips). This tends to reduce the "SQL*Net more data to client" and "SQL*Net more data from client" wait events.
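For instance, the SDU can be raised in the Net configuration on both client and server (the value below is only illustrative):
# sqlnet.ora (or per connect descriptor in tnsnames.ora / listener.ora)
DEFAULT_SDU_SIZE=32767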
If you're "hoping to achieve similar response times to reading the varchar2 column z", then you'll be disappointed in most cases.
If you're using a CLOB I suppose you need to store more than 4,000 bytes, right? Then if you need to read more bytes that's going to take longer.
BUT if you have a case where yes, you use a CLOB, but you're interested (in some instances) only in the first 4,000 bytes of the column (or less), then you have a chance of getting similar performance.
It looks like Oracle can optimize the retrieval if you use something like DBMS_LOB.SUBSTR and ENABLE STORAGE IN ROW CACHE clause with your table. Example:
CREATE TABLE clobtest (x INT PRIMARY KEY, y CLOB)
LOB (y) STORE AS (ENABLE STORAGE IN ROW CACHE);
INSERT INTO clobtest VALUES (0, RPAD('a', 4000, 'a'));
UPDATE clobtest SET y = y || y || y;
INSERT INTO clobtest SELECT rownum, y FROM all_objects, clobtest WHERE rownum < 1000;
CREATE TABLE clobtest2 (x INT PRIMARY KEY, z VARCHAR2(4000));
INSERT INTO clobtest2 VALUES (0, RPAD('a', 4000, 'a'));
INSERT INTO clobtest2 SELECT rownum, z FROM all_objects, clobtest2 WHERE rownum < 1000;
COMMIT;
In my tests on 10.2.0.4 and 8K block, these two queries give very similar performance:
SELECT x, DBMS_LOB.SUBSTR(y, 4000) FROM clobtest;
SELECT x, z FROM clobtest2;
Sample from SQL*Plus (I ran the queries multiple times to remove physical IO's):
SQL> SET AUTOTRACE TRACEONLY STATISTICS
SQL> SET TIMING ON
SQL>
SQL> SELECT x, y FROM clobtest;
1000 rows selected.
Elapsed: 00:00:02.96
Statistics
------------------------------------------------------
0 recursive calls
0 db block gets
3008 consistent gets
0 physical reads
0 redo size
559241 bytes sent via SQL*Net to client
180350 bytes received via SQL*Net from client
2002 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000 rows processed
SQL> SELECT x, DBMS_LOB.SUBSTR(y, 4000) FROM clobtest;
1000 rows selected.
Elapsed: 00:00:00.32
Statistics
------------------------------------------------------
0 recursive calls
0 db block gets
2082 consistent gets
0 physical reads
0 redo size
18993 bytes sent via SQL*Net to client
1076 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000 rows processed
SQL> SELECT x, z FROM clobtest2;
1000 rows selected.
Elapsed: 00:00:00.18
Statistics
------------------------------------------------------
0 recursive calls
0 db block gets
1005 consistent gets
0 physical reads
0 redo size
18971 bytes sent via SQL*Net to client
1076 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000 rows processed
As you can see, consistent gets are quite a bit higher, but SQL*Net roundtrips and bytes are nearly identical in the last two queries, and that apparently makes a big difference in execution time!
One warning though: the difference in consistent gets might become a more likely performance issue if you have large result sets, as you won't be able to keep everything in buffer cache and you'll end up with very expensive physical reads...
Good luck!
Cheers
Indeed, it is stored within the row. You are likely dealing with the simple overhead of using a LOB instead of a varchar. Nothing is free. The DB probably doesn't know ahead of time where to find the row, so it probably still "follows a pointer" and does extra work just in case the LOB is big. If you can get by with a varchar, you should. Even old hacks like 2 varchars to deal with 8000 characters might solve your business case with higher performance.
LOBs are slow, difficult to query, etc. On the positive side, they can hold 4 GB.
What would be interesting to try is to shove something just over 4000 bytes into that clob, and see what the performance looks like. Maybe it is about the same speed? This would tell you that it's overhead slowing you down.
Warning, at some point network traffic to your PC slows you down on these kind of tests.
Minimize this by wrapping in a count, this isolates the work to the server:
select count(*) from (select x,y from clobtest where rownum<1001)
You can achieve similar effects with "set autot trace", but there will be tracing overhead too.
There are two indirections when it comes to CLOBs and BLOBs:
The LOB value might be stored in a different database segment than the rest of the row.
When you query the row, only the non-LOB fields are contained in the result set, and accessing the LOB fields requires one or more additional round trips between the client and the server (per row!).
I don't quite know how you measure the execution time and I've never used Oracle Monitor, but you might primarily be affected by the second indirection. Depending on the client software you use, it is possible to reduce the round trips. E.g. when you use ODP.NET, the parameter is called InitialLobFetchSize.
Update:
One way to tell which of the two indirections is relevant is to run your LOB query with 1000 rows twice. If the time drops significantly from the first to the second run, it's indirection 1: on the second run the caching pays off, and access to the separate database segment isn't very relevant anymore. If the time stays about the same, it's the second indirection, namely the round trips between the client and the server, which cannot improve between two runs.
The time of more than 8 seconds for 1000 rows in a very simple query indicates it's indirection 2, because 8 seconds for 1000 rows can't really be explained by disk access unless your data is very scattered and your disk system is under heavy load.
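In SQL*Plus the two-run test is as simple as the following (same query as in the question):
SET TIMING ON
-- run 1: may include physical reads from the LOB segment
SELECT x, y FROM clobtest WHERE ROWNUM < 1001;
-- run 2: blocks should now be in the buffer cache
SELECT x, y FROM clobtest WHERE ROWNUM < 1001;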
This is the key information (how to read LOBs without extra roundtrips), which I think is not available in Oracle's documentation:
Another option is to bind the LOB on the client side as if it were a big RAW/VARCHAR2. This only works if a maximum size for the LOB can be defined (since the maximum size must be provided at bind time). This trick avoids the extra roundtrips: the LOBs are just processed like RAW or VARCHAR2. We use it a lot in our LOB-intensive applications.
I had a problem loading a simple table (a few GB) with one BLOB column (~14 KB per row, thousands of rows) and I investigated it for a long time. I tried a lot of LOB storage tuning (DB_BLOCK_SIZE for a new tablespace, the LOB storage specification, CHUNK), sqlnet.ora settings, and client prefetching attributes, but this (treating the BLOB as a LONG RAW with OCCI ResultSet->setDataBuffer on the client side) was the most important thing: it persuades Oracle to send the BLOB column immediately instead of sending a LOB locator first and loading each LOB separately based on the locator.
Now I can get ~500 Mb/s throughput (with columns < 3964 bytes).
Our 14 KB blob will be split across multiple columns, so it will be stored in row and we get almost sequential reads from the HDD. With one 14 KB blob in a single column I get ~150 Mbit/s because of non-sequential reads (iostat shows a low number of merged read requests).
NOTE: don't forget to also set the LOB prefetch size/length:
err = OCIAttrSet(session, (ub4) OCI_HTYPE_SESSION, (void *) &default_lobprefetch_size, 0, (ub4) OCI_ATTR_DEFAULT_LOBPREFETCH_SIZE, errhp);
But I don't know how to achieve the same fetch throughput with the ODBC connector. I tried it without any success.
