Oracle CLOB vs BLOB

I want to know what Oracle's CLOB has to offer over the BLOB data type.
Both have data storage limits of (4 GB - 1) * DB_BLOCK_SIZE.
A text string longer than 4000 bytes cannot fit in a VARCHAR2 column. I could use either a CLOB or a BLOB to store this string.
Everyone says CLOB is meant for character data and BLOB is for binary data such as images and unstructured documents.
But I see I can store character data inside a BLOB as well.
What I want to know:
So the question is on the basics: why CLOB, and why not BLOB always? Does it have anything to do with encoding?
Maybe the question title should be: how does a CLOB handle character data differently than a BLOB?

I want to know how BLOB treats the character type data.
It doesn't treat it as character type data; it only sees it as a stream of bytes - it doesn't know or care what it represents.
From the documentation:
The BLOB data type stores unstructured binary large objects. BLOB objects can be thought of as bitstreams with no character set semantics.
Does a CLOB store the encoding information along with it and use it while retrieving the data?
Not explicitly, but the data is stored in the database character set, as with VARCHAR2 data. From the documentation again:
The CLOB data type stores single-byte and multibyte character data. Both fixed-width and variable-width character sets are supported, and both use the database character set.
You might also have noticed that the dbms_lob package has procedures to convert between CLOB and BLOB data types. For both of those you have to specify the character set to use. So if you choose to store character data as a BLOB you have to know the character set when converting it to a BLOB, but perhaps more crucially you have to know the character set to be able to convert it back. You can do it, but that doesn't mean you should. You have no way to validate the BLOB data until you come to try to convert it to a string.
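As a rough sketch of what that looks like (the string literal, the AL32UTF8 character set and the use of UTL_I18N here are assumptions for the example - the BLOB itself records none of this, which is the point), converting character bytes held in a BLOB back into a CLOB with dbms_lob goes something like:

DECLARE
  l_blob         BLOB;
  l_clob         CLOB;
  l_dest_offset  INTEGER := 1;
  l_src_offset   INTEGER := 1;
  l_lang_context INTEGER := DBMS_LOB.DEFAULT_LANG_CTX;
  l_warning      INTEGER;
BEGIN
  -- character data that was stored as raw UTF-8 bytes in a BLOB
  l_blob := TO_BLOB(UTL_I18N.STRING_TO_RAW('some character data', 'AL32UTF8'));
  DBMS_LOB.CREATETEMPORARY(l_clob, TRUE);
  -- the caller has to supply the character set id; the BLOB does not carry it
  DBMS_LOB.CONVERTTOCLOB(
    dest_lob     => l_clob,
    src_blob     => l_blob,
    amount       => DBMS_LOB.LOBMAXSIZE,
    dest_offset  => l_dest_offset,
    src_offset   => l_src_offset,
    blob_csid    => NLS_CHARSET_ID('AL32UTF8'),
    lang_context => l_lang_context,
    warning      => l_warning
  );
  DBMS_OUTPUT.PUT_LINE(DBMS_LOB.SUBSTR(l_clob, 100, 1));
END;
/

If you guess the character set wrong here you get silently mangled text, which is exactly the risk described above.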
As @APC alluded to, this is similar to storing a date as a string - you lose the advantages and type safety the correct data type would give you, and instead add extra pain, uncertainty and overhead for no benefit.
The question isn't really what advantages CLOBs have over BLOBs for storing character data; the question is really the reverse: what advantages do BLOBs have over CLOBs for storing character data? And the answer is usually that there are none.
@Boneist mentions the recommendation to store JSON as BLOBs, and there is more about that here.
(The only other reason I can think of off-hand is that you have to store data from multiple source character sets and want to preserve it exactly as you received it. But then either you are only storing it and will never examine or manipulate the data from within the database itself, and will only return it to some external application untouched - in which case you don't care about the character set, so you're handling purely binary data and shouldn't be thinking of it as character data at all, any more than you'd care whether an image you're storing is PNG vs. JPG or whatever. Or you will need to work with the data, and so will have to record which character set each BLOB object represents, so you can convert it as needed.)

Related

Storing large json strings in Oracle Db

We have a use case to store large JSON strings (about 10 KB and up) in Oracle DB. Which column data type is best suited for this, CLOB or BLOB?
For Oracle 12.1 and higher, as Mathguy mentioned, you should follow Oracle's advice and use BLOBs to store JSON data. Recent versions of Oracle have added many SQL/JSON features that seamlessly deal with JSON regardless of the data type, and BLOBs will avoid some character set issues.
For Oracle 11.2 and lower, you should use CLOBs to store JSON data. Since you don't have access to native JSON functionality, you will probably need to rely on regular string processing. And dealing with character data in CLOBs is much easier than dealing with character data in BLOBs. (However, if you use a library like PL/JSON, then BLOBs might still work OK.)
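As a hedged sketch of the 12c-and-later approach (the table, column and constraint names here are made up, and the example assumes the JSON bytes are UTF-8 encoded), a BLOB column with an IS JSON check constraint can be queried with the same SQL/JSON functions you would use against a CLOB or VARCHAR2 column:

-- hypothetical table: JSON documents stored as UTF-8 bytes in a BLOB
CREATE TABLE orders (
  id       NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  json_doc BLOB CONSTRAINT orders_json_chk CHECK (json_doc IS JSON)
);

INSERT INTO orders (json_doc)
VALUES (TO_BLOB(UTL_I18N.STRING_TO_RAW('{"customer":"Acme","total":42}', 'AL32UTF8')));

SELECT JSON_VALUE(json_doc, '$.customer') AS customer,
       JSON_VALUE(json_doc, '$.total' RETURNING NUMBER) AS total
FROM   orders;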

What is the difference between LONG and LONG RAW data types in Oracle?

My understanding is that the LONG data type can store the actual string (characters), while the LONG RAW data type stores the binary values of those characters. Is that right? Can a table have only one LONG-type column?
The data types are described in the documentation; LONG is explained here (or the 11gR2 version):
LONG columns store variable-length character strings containing up to 2 gigabytes - 1, or 2^31 - 1 bytes. LONG columns have many of the characteristics of VARCHAR2 columns. You can use LONG columns to store long text strings.
And LONG RAW is here:
The RAW and LONG RAW datatypes store data that is not to be interpreted (that is, not explicitly converted when moving data between different systems) by Oracle Database. These datatypes are intended for binary data or byte strings. For example, you can use LONG RAW to store graphics, sound, documents, or arrays of binary data, for which the interpretation is dependent on the use.
So a RAW or LONG RAW can contain the binary representation of characters, but won't be subject to character set conversion etc., so it probably isn't all that useful for that; and it can contain any other binary data - anything that isn't supposed to represent text.
From the same LONG section:
A table can contain only one LONG column.
However, LONG is deprecated in favour of LOB (CLOB or NCLOB for text, BLOB for everything else), so you shouldn't be using them for new work, and should at least be considering replacing any you already have. Again from that same section on LONG:
Do not create tables with LONG columns. Use LOB columns (CLOB, NCLOB, BLOB) instead. LONG columns are supported only for backward compatibility.
Oracle also recommends that you convert existing LONG columns to LOB columns.
This documentation on migrating from LONG to LOB might be of interest.
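As a small sketch of that migration path (the table and column names here are made up), an existing LONG column can be converted in place with a simple ALTER TABLE:

ALTER TABLE legacy_docs MODIFY (doc_text CLOB);

-- a LONG RAW column converts to a BLOB the same way (on its own table,
-- since a table can only hold one LONG or LONG RAW column):
ALTER TABLE legacy_scans MODIFY (scan_image BLOB);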

Is there a big technical difference between VARBINARY(MAX) and IMAGE data types?

I was reading these statements about SQL Server data types on the internet:
VARBINARY(MAX) - Binary strings with a variable length; can store up to 2^31 - 1 bytes.
IMAGE - Binary strings with a variable length up to 2^31 - 1 (2,147,483,647) bytes.
Is there a really big technical difference between VARBINARY(MAX) and IMAGE data types?
If there is a difference: do we have to customize how ADO.NET inserts and updates image data field in SQL Server?
They store the same data: this is as far as it goes.
"image" is deprecated and has a limited set of features and operations that work with it. varbinary(max) can be operated on like shorter varbinary (ditto for text and varchar(max)).
Do not use image for any new project: just search here for the issues folk have with image and text datatypes because of the limited functionality.
Examples from SO: One, Two
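As a small T-SQL illustration of that functional gap (the variable name and literal are just for the example), varbinary(max) behaves like any other variable-length type, while image cannot even be declared as a local variable:

DECLARE @v varbinary(max);
SET @v = 0x48656C6C6F;
SELECT DATALENGTH(@v) AS byte_count, SUBSTRING(@v, 1, 2) AS first_two_bytes;

-- DECLARE @i image;  -- fails: the text, ntext and image types cannot be used for local variables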
I think that technically they are similar, but it is important to notice the following from the documentation:
ntext, text, and image data types will be removed in a future version of Microsoft SQL Server. Avoid using these data types in new development work, and plan to modify applications that currently use them. Use nvarchar(max), varchar(max), and varbinary(max) instead.
Fixed and variable-length data types for storing large non-Unicode and Unicode character and binary data. Unicode data uses the UNICODE UCS-2 character set.
In fact, VARBINARY can store any data that can be converted into a byte array, such as files, and this is the same process the IMAGE data type uses, so from this point of view both data types can store the same data.
But VARBINARY has a size property, while IMAGE accepts any size up to the data type's limits, so when using the IMAGE data type you will spend more resources to store the same data.
In Microsoft SQL Server the IMAGE data type really is deprecated, so you should bet on the VARBINARY data type.
But be careful: Microsoft SQL Server CE (including the latest 4.0 version) still uses the IMAGE data type, and it probably won't disappear any time soon, because in the Compact Edition versions this data type is better than any other for fast file storage.
I inadvertently found one difference between them: you can insert a string into an image column but not into a varbinary one. Maybe that's why MS is deprecating the image type, as it really doesn't make sense to set an image with a string.

Oracle Performance terrible after changing Varchar2 fields to NVarchar2

I've been developing a .NET project on Oracle (version 10.2) for the last couple of months, using Varchar2 for my string data fields. This was fine; when navigating the project, page refreshes were never more than half a second, if that (and it's quite a data-intensive project). The data is referenced from two different schemas: one a centralised store of data, and one my own. The centralised schema will be changing to be Unicode compliant (but hasn't yet), so all its Varchar2 fields will become NVarchar2. In preparation for this I changed all the fields in my schema to NVarchar2, and since then performance has been horrible - up to 30/40 second page refreshes.
Could this be because Varchar2 fields in the centralised schema are being joined against NVarchar2 fields in my schema in some stored procedures? I know NVarchar2 is twice the size of Varchar2, but that wouldn't explain the sudden massive change. Any tips on what to look for would be great; if I haven't explained the scenario well enough, do ask for more information.
Firstly, do a
select * from v$nls_parameters where parameter like '%SET%';
Character sets can be complicated. You can have single-byte character sets, fixed-size multi-byte character sets and variable-sized multi-byte character sets. See the Unicode descriptions here.
Secondly, if you are joining a string in a single-byte character set to a string in a two-byte character set, you have a choice. You can do a binary/byte comparison (which generally won't match anything if you compare between a single-byte character set and a two-byte character set). Or you can do a linguistic comparison, which will generally mean some CPU cost, as one value is converted into another, and often the failure to use an index.
Indexes are ordered, A,B,C etc. But a character like Ä may fall in different places depending on the Linguistic order. Say the index structure puts Ä between A and B. But then you do a linguistic comparison. The language of that comparison may put Ä after Z, in which case the index can't be used. (Remember your condition could be a BETWEEN rather than an = ).
In short, you'll need a lot of preparation, both in your schema and the central store, to enable efficient joins between different charactersets.
It is difficult to say anything based on what you have provided. Did you manage to check if the estimated cardinalities and/or explain plan changed when you changed the datatype to NVARCHAR2? You may want to read the following blog post to see if you can find a lead
http://joze-senegacnik.blogspot.com/2009/12/cbo-oddities-in-determing-selectivity.html
The optimizer is likely no longer able to use indexes that it previously could. As Narendra suggests, check the explain plan to see what changed. It is possible that once the centralised store is changed the indexes will again be usable. I suggest testing that path.
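As a hypothetical illustration (the schema, table and column names are made up), this is the kind of comparison to run before and after the datatype change; when a VARCHAR2 column is joined to an NVARCHAR2 column, Oracle implicitly converts the VARCHAR2 side to the national character set, which can stop an index on that side being used:

EXPLAIN PLAN FOR
SELECT c.item_id, l.description
FROM   central_schema.items c
JOIN   my_schema.item_details l
ON     c.item_code = l.item_code;  -- c.item_code VARCHAR2, l.item_code NVARCHAR2

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);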
Setting the NLS_LANG initialization parameter properly is essential to proper data conversion. The character set that is specified by the NLS_LANG initialization parameter should reflect the setting for the client operating system. Setting NLS_LANG correctly enables proper conversion from the client operating system code page to the database character set. When these settings are the same, Oracle assumes that the data being sent or received is encoded in the same character set as the database character set, so no validation or conversion is performed. This can lead to corrupt data if conversions are necessary.

Is it reasonable to use small blobs in Oracle?

In Oracle, LONG RAW and VARCHAR2 have a max length of 4 KB, but I need to store objects of 8 KB and 16 KB, so I'm wondering what a good solution is. I know I could use a BLOB, but a BLOB has variable length and, if I'm correct, is basically an extra file behind the scenes - a feature and a price I'm not interested in paying for my objects.
Are there any other solutions or datatypes that are more suited to this kind of need?
Thanks
A BLOB is not a file behind the scenes; it is stored in the database. Why does it matter that it has variable length? You can just use a BLOB column (or a CLOB if your data is text) and it gets stored in its own segment.
You should use a BLOB.
A BLOB is not stored as an extra file, it's stored as a block in one of your datafiles (just like other data). If the BLOB becomes too large for a single block (which may not happen in your case) then it will continue in another block.
If your BLOB data is really small, you can get Oracle to store it inline with other data in your row (like a varchar2).
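A minimal sketch of that (the table and column names are made up): ENABLE STORAGE IN ROW, which is the default, lets Oracle keep BLOB values up to roughly 4000 bytes inline in the row and automatically moves larger values out to the LOB segment:

CREATE TABLE small_payloads (
  id      NUMBER PRIMARY KEY,
  payload BLOB
)
LOB (payload) STORE AS (ENABLE STORAGE IN ROW);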
Internally, Oracle is doing something similar to what PAX suggested. The chunks are as big as a DB block minus some overhead. If you try and re-invent Oracle features on top of Oracle it's only going to be slower than the native feature.
You will also have to re-implement a whole heap of functionality that is already provided in DBMS_LOB (length, comparisons, etc).
Why don't you segment the binary data and store it in 4K chunks? You could either have four different columns for these chunks (and a length column for rebuilding them into your big structure) or the more normalized way of another table with the chunks in it tied back to the original table record.
This would provide for expansion should you need it in future.
For example:
Primary table:
CREATE TABLE primary_table (
  -- normal columns --
  chunk_id        INTEGER,
  chunk_last_len  INTEGER
);
Chunk table:
CREATE TABLE chunk_table (
  chunk_id        INTEGER,
  chunk_sequence  INTEGER,
  chunk           VARCHAR2(4000),  -- "whatever" size suits your chunking; 4000 matches the 4K chunks above
  PRIMARY KEY (chunk_id, chunk_sequence)
);
Of course, you may find that your DBMS does exactly that sort of behavior under the covers for BLOBs and it may be more efficient to let Oracle handle it, relieving you of the need to manually reconstruct your data from individual chunks. I'd measure the performance of each to figure out the best approach.
Don't store binary data in varchar2 columns, unless you are willing to encode them (base64 or similar). Character set issues might corrupt your data otherwise!
Try the following statement to see the effect:
select * from (select rownum-1 original, ascii(chr(rownum-1)) data from user_tab_columns where rownum<=256) where original<>data;
VARCHAR2 is of variable length just as well. If you need to store binary data of anything bigger than a small size in your database, you'll have to look in BLOB's direction. Another solution is of course storing the binary data somewhere on the file system, and storing the path to the file as a varchar in the DB.
