Is it reasonable to use small blobs in Oracle?

In Oracle, LONG RAW and VARCHAR2 have a maximum length of 4 KB, but I need to store objects of 8 KB and 16 KB, so I'm wondering what a good solution is. I know I could use a BLOB, but if I understand correctly a BLOB has variable length and is basically an extra file behind the scenes, a feature and a price I'm not interested in paying for these objects.
Are there any other solutions or datatypes that are more suited to this kind of need?
Thanks

A BLOB is not a file behind the scenes; it is stored in the database. Why does it matter that it has variable length? You can just use a BLOB column (or a CLOB if your data is text data) and it gets stored in its own segment.

You should use a BLOB.
A BLOB is not stored as an extra file, it's stored as a block in one of your datafiles (just like other data). If the BLOB becomes too large for a single block (which may not happen in your case) then it will continue in another block.
If your BLOB data is really small, you can get Oracle to store it inline with other data in your row (like a varchar2).
Internally, Oracle is doing something similar to what PAX suggested. The chunks are as big as a DB block minus some overhead. If you try to re-invent Oracle features on top of Oracle, it's only going to be slower than the native feature.
You will also have to re-implement a whole heap of functionality that is already provided in DBMS_LOB (length, comparisons, etc).
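As a concrete illustration of the in-line option, here is a hedged sketch (the table and column names are made up, and the in-line limit is roughly 4000 bytes including overhead):

CREATE TABLE documents (
    doc_id  NUMBER PRIMARY KEY,
    payload BLOB
)
LOB (payload) STORE AS (
    ENABLE STORAGE IN ROW  -- small values stay in the row's own block
    CACHE                  -- keep LOB blocks in the buffer cache
);

And DBMS_LOB already gives you the plumbing, e.g. SELECT DBMS_LOB.GETLENGTH(payload) FROM documents; instead of re-implementing length bookkeeping yourself.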

Why don't you segment the binary data and store it in 4K chunks? You could either have four different columns for these chunks (and a length column for rebuilding them into your big structure) or the more normalized way of another table with the chunks in it tied back to the original table record.
This would provide for expansion should you need it in future.
For example:
Primary table:
CREATE TABLE primary_table (
    -- normal columns --
    chunk_id       INTEGER,
    chunk_last_len INTEGER
);
Chunk table:
CREATE TABLE chunk_table (
    chunk_id       INTEGER,
    chunk_sequence INTEGER,
    chunk          VARCHAR2(4000),  -- sized for one 4K chunk
    PRIMARY KEY (chunk_id, chunk_sequence)
);
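A hypothetical reassembly query against this schema simply fetches the pieces in order so the caller can concatenate them:

SELECT c.chunk
FROM   chunk_table c
WHERE  c.chunk_id = :chunk_id
ORDER  BY c.chunk_sequence;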
Of course, you may find that your DBMS does exactly that sort of thing under the covers for BLOBs, and that it's more efficient to let Oracle handle it, relieving you of the need to manually reconstruct your data from individual chunks. I'd measure the performance of each approach to figure out which is best.

Don't store binary data in varchar2 columns, unless you are willing to encode them (base64 or similar). Character set issues might corrupt your data otherwise!
Try the following statement to see the effect:
select *
from (select rownum-1 original, ascii(chr(rownum-1)) data
      from user_tab_columns
      where rownum <= 256)
where original <> data;
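If you do go the VARCHAR2 route, a minimal sketch of base64 encoding with the built-in packages might look like this (the literal is just a stand-in for your binary payload):

SELECT UTL_RAW.CAST_TO_VARCHAR2(
         UTL_ENCODE.BASE64_ENCODE(UTL_RAW.CAST_TO_RAW('your binary payload'))
       ) AS encoded
FROM   dual;

Decode with UTL_ENCODE.BASE64_DECODE on the way back out.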

VARCHAR2 is of variable length just as well. If you need to store binary data of anything bigger than a small size in your database, you'll have to look in BLOB's direction. Another solution is, of course, storing the binary data somewhere on the file system and storing the path to the file as a VARCHAR2 in the DB.

Related

How does ROWID fetch records so fast?

Suppose I run a SQL query and the DB makes use of index structures to arrive at a ROWID (assuming an INDEX SCAN, like in Oracle), and now the DB wants to use it to fetch the actual records.
So how does the ROWID help in fast access of the record? I assume the ROWID must somehow be mapped to internal record storage.
I understand an index is basically a combination of a B-tree and a doubly linked list. But how are the actual records stored such that a ROWID fetches them fast?
A ROWID is simply a 10-byte physical row address that contains the relative file number, the block number within that file, and the row number within that block. See this succinct explanation:
Oracle FAQs - ROWID
With that information, Oracle can make an I/O read request of a single block by calculating the block offset byte position in the file and the length of the block. Then it can use the block's internal row map to jump directly to the byte offset within the block of the desired row. It doesn't have to scan through anything.
You can pull a human-readable representation of these three components by using this query against any (heap) table, any row:
SELECT ROWIDTOCHAR(rowid)                   row_id,
       dbms_rowid.rowid_relative_fno(rowid) fno,
       dbms_rowid.rowid_block_number(rowid) blockno,
       dbms_rowid.rowid_row_number(rowid)   rowno
FROM   [yourtable]
WHERE  ROWNUM = 1  -- pick any row
The fast retrieval is also often aided by the fact that single-block reads are frequently bypassed altogether because the block is already in the buffer cache. Or if it is not in Oracle's buffer cache, a single-block read from many cooked filesystems, unless disabled by the setting of filesystemio_options, may be cached at the OS level and never go to storage. And if you use a storage appliance, it probably has its own caching mechanism. All of these caching mechanisms likely give preference to small reads over large ones, so single-block reads from Oracle are likely to avoid hitting magnetic disk altogether, more so than the multiblock reads associated with table scans.
But be careful: just because ROWID is the fastest way to retrieve a single row does not mean it is the fastest way to retrieve many rows. Because of the overhead of each read call, many single calls accumulate a lot of wasted overhead. When pulling large numbers of rows it is frequently more efficient to do a full table scan, especially when Oracle uses direct path reads to do so, than to use ROWIDs either manually or via indexes.
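To see this in action, you can capture a row's address once and then fetch it again purely by address (the bracketed table name is a placeholder, as above):

SELECT rowid, t.*
  FROM [yourtable] t
 WHERE ROWNUM = 1;   -- note the ROWID returned

SELECT *
  FROM [yourtable]
 WHERE rowid = :saved_rowid;

The second query's execution plan typically shows TABLE ACCESS BY USER ROWID, with no index access at all.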

Oracle CLOB vs BLOB

I want to know what Oracle's CLOB has to offer over the BLOB data type.
Both have data storage limits of (4 GB - 1) * DB_BLOCK_SIZE.
A text string which is longer than 4000 bytes cannot fit in a VARCHAR2 column. Now, I can use either a CLOB or a BLOB to store this string.
Everyone says CLOB is good and meant for character data, and BLOB is for binary data such as images or unstructured documents.
But I see I can store character data inside a BLOB as well.
What I want to know:
So the question is about the basics: why CLOB, and why not BLOB always? Does it have anything to do with encoding?
Maybe the question title should be: how does a CLOB handle character data differently than a BLOB does?
I want to know how a BLOB treats character data.
It doesn't treat it as character data; it only sees it as a stream of bytes and doesn't know or care what those bytes represent.
From the documentation:
The BLOB data type stores unstructured binary large objects. BLOB objects can be thought of as bitstreams with no character set semantics.
Does a CLOB store the encoding information along with the data and use it when retrieving the data?
Not explicitly, but the data is stored in the database character set, as with VARCHAR2 data. From the documentation again:
The CLOB data type stores single-byte and multibyte character data. Both fixed-width and variable-width character sets are supported, and both use the database character set.
You might also have noticed that the dbms_lob package has procedures to convert between CLOB and BLOB data types. For both of those you have to specify the character set to use. So if you choose to store character data as a BLOB you have to know the character set when converting it to a BLOB, but perhaps more crucially you have to know the character set to be able to convert it back. You can do it, but that doesn't mean you should. You have no way to validate the BLOB data until you come to try to convert it to a string.
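For illustration, here is a minimal sketch of such a conversion using the database's default character set; the literal is just a stand-in for real data:

DECLARE
  l_clob         CLOB := 'Some character data';
  l_blob         BLOB;
  l_dest_offset  INTEGER := 1;
  l_src_offset   INTEGER := 1;
  l_lang_context INTEGER := DBMS_LOB.DEFAULT_LANG_CTX;
  l_warning      INTEGER;
BEGIN
  DBMS_LOB.CREATETEMPORARY(l_blob, TRUE);
  -- blob_csid is where you are forced to commit to a character set
  DBMS_LOB.CONVERTTOBLOB(dest_lob     => l_blob,
                         src_clob     => l_clob,
                         amount       => DBMS_LOB.LOBMAXSIZE,
                         dest_offset  => l_dest_offset,
                         src_offset   => l_src_offset,
                         blob_csid    => DBMS_LOB.DEFAULT_CSID,
                         lang_context => l_lang_context,
                         warning      => l_warning);
END;
/

Converting back with DBMS_LOB.CONVERTTOCLOB needs the same character set id; guess it wrong and you silently get garbage.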
As @APC alluded to, this is similar to storing a date as a string: you lose the advantages and type-safety that using the correct data type would give you, and instead add extra pain, uncertainty and overhead for no benefit.
The question isn't really what advantages CLOBs have over BLOBs for storing character data; the question is really the reverse: what advantages do BLOBs have over CLOBs for storing character data? And the answer is usually that there are none.
@Boneist mentions the recommendation to store JSON as BLOBs, and there is more about that here.
(The only other reason I can think of off-hand is that you have to store data from multiple source character sets and want to preserve it exactly as you received it. But then either you are only storing the data and will never examine or manipulate it from within the database itself, only returning it untouched to some external application, in which case you don't care about the character set; you're handling purely binary data and shouldn't be thinking of it as character data at all, any more than you'd care whether an image you're storing is PNG or JPG or whatever. Or you will need to work with the data, and so will have to record which character set each BLOB represents, so you can convert it as needed.)

Doing String length on SQL Loader input field

I'm reading data from a fixed-length text file and loading it into a table using fixed-length processing.
I want to check the input line length so that I can discard records that don't match the fixed length and log them into an error table.
Example:
Load into the Input_Log table if the line meets the specified length.
Load into the Input_Error_Log table if the input line length is less than or greater than the fixed line length.
I believe you would be better served by bulk loading your data into a staging table, then load into the production table from there via a stored procedure where you can apply rules via normal PL/SQL & DML to your heart's content. This is a typical best practice anyway.
sqlldr isn't really the tool to get too complicated in, even if you could do what you want. Maintainability and restart-ability become more complicated when you add complexity to a tool that's really designed for bulk loading. Add the complexity to a proper program.
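For example, assuming the rows were bulk-loaded into a hypothetical staging table stg_input with a single raw_line column, and the fixed record length is 100, the split could be as simple as this (column names are invented for illustration):

INSERT INTO input_log (raw_line)
SELECT raw_line FROM stg_input WHERE LENGTH(raw_line) = 100;

INSERT INTO input_error_log (raw_line)
SELECT raw_line FROM stg_input WHERE LENGTH(raw_line) <> 100 OR raw_line IS NULL;

COMMIT;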
Let us know what you come up with.

Do I need to set a length for every POCO property in Entity Framework Code First?

Do I need to set a length for every POCO property in Entity Framework Code First? If I don't set StringLength or MaxLength/MinLength for a property, it will be nvarchar(max). How bad is nvarchar(max)? Should I just leave it alone during development and improve it before production?
You should define a max length for each property where you want to restrict the length. Note that the nvarchar(max) data type is different from the nvarchar(n) data type, where n is a number from 1 to 4000. The max version, which you get when you define no max length, is meant for large blocks of text, like paragraphs and the like. It can handle extremely large lengths, and so the data is stored separately from the rest of the fields of the record. nvarchar(n), on the other hand, is stored inline with the rest of the row.
It's probably best to go ahead and set those values as you want now, rather than waiting to do so later. Choose values that are as large as you will ever need, so you never have to increase them. nvarchar(n) stores its data efficiently; for example, an nvarchar(200) does not necessarily take up 200 characters' worth of space; it only uses enough space to store what is actually put into it, plus a couple of extra bytes for its length.
So whenever possible, you should set a limit on your entity's text fields.
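For a feel of the difference, here is an illustrative hand-written T-SQL equivalent of the two cases (not EF's exact output; table and column names are made up):

CREATE TABLE Customers (
    Id    INT IDENTITY(1,1) PRIMARY KEY,
    Name  NVARCHAR(200) NOT NULL,  -- what a length-restricted string property maps to
    Notes NVARCHAR(MAX) NULL       -- what an unannotated string maps to; large values are stored off-row
);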
NVARCHAR is a variable-length field, so it consumes only the space you actually need. NCHAR, on the other hand, always allocates its full declared size rather than allocating on demand as NVARCHAR does.
MSDN advises using nvarchar when the sizes of the column data entries are likely to vary considerably.
For me, that's the way to go in the early stages of a project; you can tune it later when needed.
According to this blog post, nvarchar(max) does not behave like ntext as long as the actual value stays under 4000 characters (since the page limit is 8K and wide characters use two bytes per character). Once it reaches that size it behaves pretty much the same as ntext. So I don't see any good reason to avoid the nvarchar(max) data type.

Sybase TEXT vs Oracle CLOB performance

We're in the process of converting our database from Sybase to Oracle and we've hit a performance problem. In Sybase, we had a TEXT field and replaced it with a CLOB in Oracle.
This is how we accessed the data in our java code:
while(rs.next()) {
String clobValue = rs.getString(1); // This takes 176ms in Oracle!
.
.
}
The database is across the country, but still, we didn't have any performance problems with Sybase and its retrieval of TEXT data.
Is there something we can do to increase this performance?
By default, LOBs are not fetched along with the table data and it takes an extra round-trip to the database to fetch them in getString.
If you are using Oracle's .NET provider, you may set InitialLOBFetchSize in the data reader settings to a value large enough to accommodate your large objects in memory so they could be fetched in all their entirety along with the other data.
Some other options:
Are the LOB columns being stored in-line (in the data row) or out-of-line (in a separate segment)? If the LOB columns tend to be small (under 4k in size), you can use the ENABLE STORAGE IN ROW clause to tell Oracle to store the data in-line where possible.
If your LOBs are larger and frequently used, are they being stored in the buffer cache? The default in 10g is that LOBs are NOCACHE, meaning each I/O operation against them involves a direct read, a synchronous disk event, which can be slow. A database trace would reveal significant waits on direct path read / direct path write events. (See the sketch after these options.)
This chapter of the Oracle Application Developer's Guide - Large Objects would be valuable reading.
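For example (table and column names are placeholders), the LOB segment can be rebuilt so that small values live in-line and LOB blocks go through the buffer cache:

ALTER TABLE orders MOVE
  LOB (order_notes) STORE AS (ENABLE STORAGE IN ROW CACHE);

Note that MOVE rebuilds the table, so plan for the downtime and rebuild any indexes afterwards.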
We decided to take a different approach, which allows us to ignore CLOB performance.
Our current code (I didn't write it!) queries a table in the database and retrieves all of the information in the table, including the CLOBs, even though it isn't really necessary to retrieve them all of the time. Instead, we created another column holding the first 4K characters in a VARCHAR2 and query that instead. Then, when we need the full CLOB, we query it on an individual basis rather than pulling all CLOBs for all records.
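A minimal sketch of that approach (table and column names invented for illustration; watch character vs. byte semantics in multibyte character sets):

ALTER TABLE documents ADD (body_preview VARCHAR2(4000));

UPDATE documents
   SET body_preview = DBMS_LOB.SUBSTR(body_clob, 4000, 1);

Queries then read body_preview, and only the record that actually needs the full text goes back to the CLOB.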
