Calculate database size after converting to Unicode - Oracle

We want to convert our database (Oracle 11g) from the ISO-8859-8 character set
to AL32UTF8.
The new character set needs to support European characters and more,
and those languages can appear in a lot of tables.
I want to get an estimate of the new size of the tables/whole database
based on the current data.
Is there a good way to do that?

I don't think it will change much. I presume your VARCHAR2 columns are currently defined like VARCHAR2(30), which by default (NLS_LENGTH_SEMANTICS = BYTE) means VARCHAR2(30 BYTE), and that will remain unchanged after the conversion. If you defined them as VARCHAR2(30 CHAR), the column can hold up to 4 bytes per character in AL32UTF8 (still capped at 4000 bytes for a VARCHAR2 column in 11g). VARCHAR2 only stores the bytes actually used, though, so no space is reserved up front.
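With BYTE semantics the limit stays in bytes, so a value that fits in VARCHAR2(30 BYTE) today (one byte per Hebrew letter in ISO-8859-8) can exceed it once each Hebrew letter takes two bytes in AL32UTF8. A minimal Python sketch, run outside the database on sample values, that checks this; the helper name is hypothetical, and Python's `iso8859_8` codec is assumed to match Oracle's ISO-8859-8 mapping for the characters involved.

```python
def fits_after_conversion(value: str, max_bytes: int) -> tuple[bool, bool]:
    """Return (fits today in ISO-8859-8, fits after conversion to UTF-8)
    for a column whose limit is max_bytes (BYTE semantics)."""
    return (len(value.encode("iso8859_8")) <= max_bytes,
            len(value.encode("utf-8")) <= max_bytes)


# 20 Hebrew letters: 20 bytes in ISO-8859-8, 40 bytes in UTF-8
print(fits_after_conversion("א" * 20, 30))  # (True, False)
```

Values made only of ASCII characters are unaffected, because those keep their one-byte encoding in UTF-8.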

I don't understand why you are so concerned about the size. Perhaps the DB will be a few(!) percent larger; most likely you will not notice any difference.
Anyway, this PL/SQL should give some idea of the size:
declare
  iso_size      number;
  utf8_size     number;
  iso_size_sum  number := 0;
  utf8_size_sum number := 0;
begin
  -- character columns of real tables only (user_tab_cols also lists view columns)
  for aCol in (select c.table_name, c.column_name
                 from user_tab_columns c
                 join user_tables t on t.table_name = c.table_name
                where c.data_type in ('VARCHAR2', 'CHAR')) loop
    execute immediate
      'select sum(lengthb("'||aCol.column_name||'")),
              sum(lengthb(convert("'||aCol.column_name||'", ''AL32UTF8'')))
         from "'||aCol.table_name||'"'
      into iso_size, utf8_size;
    -- SUM returns NULL for empty tables and all-NULL columns
    iso_size_sum  := iso_size_sum  + nvl(iso_size, 0);
    utf8_size_sum := utf8_size_sum + nvl(utf8_size, 0);
  end loop;
  dbms_output.put_line('Current size: '||to_char(round(iso_size_sum/1024/1024, 2))||' MiByte');
  dbms_output.put_line('Estimated UTF-8 size: '||to_char(round(utf8_size_sum/1024/1024, 2))||' MiByte');
end;
The two numbers should give you an indication(!) of how much the database would grow. Note that Oracle organizes data in blocks (typically 8 KiB), not individual bytes, so the actual growth on disk will not match these byte counts exactly.
For performance reasons, you may want to run the query on one representative table rather than the entire schema.
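To see where the growth comes from, here is a minimal Python sketch that decodes ISO-8859-8 bytes and counts their UTF-8 size. It is meant to run outside the database on a sample of exported data; the function name is hypothetical, and Python's `iso8859_8` codec is assumed to agree with Oracle's mapping. ASCII characters stay at one byte, Hebrew letters take two, and a handful of other characters take three, so the UTF-8 size is at most three times the current size.

```python
def utf8_size_of_iso8859_8(raw: bytes) -> int:
    """Number of bytes that ISO-8859-8 encoded data would occupy in UTF-8."""
    # Raises UnicodeDecodeError on byte values ISO-8859-8 leaves undefined.
    return len(raw.decode("iso8859_8").encode("utf-8"))


sample = "Hello שלום".encode("iso8859_8")
print(len(sample), utf8_size_of_iso8859_8(sample))  # 10 14
```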

Writing a Version Number Function in PL/SQL

I want to write a function that will give me the next version number for a table. The table stores the existing version on each record. For example,
I have the cats table:
seqid | name     | version number
1     | Mr Smith | 1.2b.3.4
How can I write a program that will be able to increment these values based on various conditions?
This is my first attempt:
if v_username is not null then
  v_new_nbr := substr(v_cur_nbr, 1, 7) || to_number(substr(v_cur_nbr, 8, 1)) + 1;  -- should be 1.2b.3.5
end if;
substr(v_cur_nbr, 1,7)||to_number(substr(v_cur_nbr, 8,1))+1
This hurls ORA-01722: invalid number. The reason is a subtle one: in Oracle, || has the same precedence as binary + and -, and operators of equal precedence are evaluated left to right. So the concatenation happens first and you're effectively adding one to the string '1.2b.3.4'.
One solution is to use a TO_CHAR function to bracket the addition with the second substring before concatenating the result with the first substring:
substr(v_cur_nbr, 1,7) || to_char(to_number(substr(v_cur_nbr, 8,1))+1)
Working demo on db<>fiddle.
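The SUBSTR(v_cur_nbr, 8, 1) approach relies on every segment being exactly one character, so it breaks when 1.2b.3.9 should become 1.2b.3.10. A minimal Python sketch of a position-independent alternative: split on the last dot and increment what follows. In Oracle the same idea would use INSTR(v_cur_nbr, '.', -1) to locate the last dot; the function name here is hypothetical.

```python
def bump_last_segment(version: str) -> str:
    """Increment the numeric segment after the last '.', e.g. 1.2b.3.4 -> 1.2b.3.5."""
    head, sep, last = version.rpartition(".")
    # int() raises ValueError if the last segment is not numeric
    return f"{head}{sep}{int(last) + 1}"


print(bump_last_segment("1.2b.3.4"))  # 1.2b.3.5
print(bump_last_segment("1.2b.3.9"))  # 1.2b.3.10
```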
Incidentally, a key like this is a bad piece of data modelling. Smart keys are dumb. They always lead to horrible SQL (as you're finding) and risk data corruption. A proper model would have separate columns for each element of the version number. We can use a virtual column to concatenate the version number for display purposes.
create table cats(
seqid number
,name varchar2(32)
,major_ver_no1 number
,major_ver_no2 number
,variant varchar2(1)
,minor_ver_no1 number
,minor_ver_no2 number
,v_cur_nbr varchar2(16) generated always as (to_char(major_ver_no1,'FM999') ||'.'||
to_char(major_ver_no2,'FM999') ||
variant ||'.'||
to_char(minor_ver_no1,'FM999') ||'.'||
to_char(minor_ver_no2,'FM999') ) );
So the set-up is a bit of a nause but incrementing the version numbers is a piece of cake.
update cats
set major_ver_no1 = major_ver_no1 + 1
, major_ver_no2 = 0
, variant = 'a'
, minor_ver_no1 = 0
, minor_ver_no2 = 0
where seqid = 1;
There's a db<>fiddle for that too.
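The same model can be sketched in Python as a record whose display string is derived from the separate parts, mirroring the v_cur_nbr virtual column. The class and method names are hypothetical, and resetting the minor numbers on a major bump is an assumption about the versioning rules, not something the question specifies.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CatVersion:
    major_ver_no1: int
    major_ver_no2: int
    variant: str
    minor_ver_no1: int
    minor_ver_no2: int

    @property
    def v_cur_nbr(self) -> str:
        # Derived display value, like the virtual column: e.g. 1.2b.3.4
        return (f"{self.major_ver_no1}.{self.major_ver_no2}{self.variant}."
                f"{self.minor_ver_no1}.{self.minor_ver_no2}")

    def bump_major(self) -> "CatVersion":
        # Assumed rule: a major bump resets all lower parts
        return replace(self, major_ver_no1=self.major_ver_no1 + 1,
                       major_ver_no2=0, variant="a",
                       minor_ver_no1=0, minor_ver_no2=0)


v = CatVersion(1, 2, "b", 3, 4)
print(v.v_cur_nbr)               # 1.2b.3.4
print(v.bump_major().v_cur_nbr)  # 2.0a.0.0
```

Because the parts are stored as numbers, incrementing never depends on string positions; only the display value is built by concatenation.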
Try using a format mask with TO_NUMBER to get the decimal number; this small example might help:
CREATE TABLE tmp_table (version varchar2(100));
INSERT INTO tmp_table(version) VALUES ('1.2b.3.4');
DECLARE
mainVersion NUMBER;
subVersion NUMBER;
currentVersion VARCHAR2(100);
BEGIN
SELECT version INTO currentVersion FROM tmp_table;
mainVersion := TO_NUMBER(SUBSTR(currentVersion,1,3),'9.9') + 0.1;
subVersion := TO_NUMBER(SUBSTR(currentVersion,6,3),'9.9') + 1.1;
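-- explicit format masks avoid NLS decimal-separator surprises and keep the trailing .0
UPDATE tmp_table SET version = TO_CHAR(mainVersion, 'FM9.0')||'b.'||TO_CHAR(subVersion, 'FM9.0');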
END;

Performance of using a nested table inside the IN clause - Oracle

I'm trying to use a nested table inside the IN clause in a PL/SQL block.
First, I have defined a TYPE:
CREATE OR REPLACE TYPE VARCHAR_ARRAY AS TABLE OF VARCHAR2(32767);
Here is my PL/SQL block using BULK COLLECT INTO:
DECLARE
COL1 VARCHAR2(50) := '123456789';
N_TBL VARCHAR_ARRAY := VARCHAR_ARRAY();
C NUMBER;
BEGIN
-- Print timestamp
DBMS_OUTPUT.PUT_LINE('START: ' || TO_CHAR(SYSTIMESTAMP ,'dd-mm-yyyy hh24:mi:ss.FF'));
SELECT COLUMN1
BULK COLLECT INTO N_TBL
FROM MY_TABLE
WHERE COLUMN1 = COL1;
SELECT COUNT(COLUMN1)
INTO C
FROM MY_OTHER_TABLE
WHERE COLUMN1 IN (SELECT column_value FROM TABLE(N_TBL));
-- Print timestamp
DBMS_OUTPUT.PUT_LINE('ENDED: ' || TO_CHAR(SYSTIMESTAMP ,'dd-mm-yyyy hh24:mi:ss.FF'));
END;
And the output is:
START: 01-08-2014 12:36:14.997
ENDED: 01-08-2014 12:36:17.554
It takes more than 2.5 seconds (2.557 seconds exactly)
Now, if I replace the nested table with a subquery, like this:
DECLARE
COL1 VARCHAR2(50) := '123456789';
N_TBL VARCHAR_ARRAY := VARCHAR_ARRAY();
C NUMBER;
BEGIN
-- Print timestamp
DBMS_OUTPUT.PUT_LINE('START: ' || TO_CHAR(SYSTIMESTAMP ,'dd-mm-yyyy hh24:mi:ss.FF'));
SELECT COUNT(COLUMN1)
INTO C
FROM MY_OTHER_TABLE
WHERE COLUMN1 IN (
-- Nested table replaced by a subquery
SELECT COLUMN1
FROM MY_TABLE
WHERE COLUMN1 = COL1
);
-- Print timestamp
DBMS_OUTPUT.PUT_LINE('ENDED: ' || TO_CHAR(SYSTIMESTAMP ,'dd-mm-yyyy hh24:mi:ss.FF'));
END;
The output is:
START: 01-08-2014 12:36:08.889
ENDED: 01-08-2014 12:36:08.903
It takes only 14 milliseconds...!!!
What could I do to improve this PL/SQL block?
Is there any database configuration needed?
Are the two query plans different?
Assuming that they are, the difference is likely that the optimizer has reasonable estimates of the number of rows the subquery will return and is thus able to choose the most efficient plan. When your data is in a nested table (I'd hate to use the word array in the type declaration here, since that implies you're using a VARRAY when you're not), Oracle has no information about how many elements the collection will hold. By default, it guesses that the collection has roughly as many elements as a data block has bytes, so with 8 KB blocks Oracle will guess that your collection has roughly 8,000 elements (8168 is the figure usually observed).
Assuming that your actual query returns many more or many fewer rows than that, you can potentially use the cardinality hint to give the optimizer a more accurate guess. For example, if your query generally returns a few dozen rows, you probably want something like
SELECT COUNT(COLUMN1)
INTO C
FROM MY_OTHER_TABLE
WHERE COLUMN1 IN (SELECT /*+ cardinality(t 50) */ column_value
FROM TABLE(N_TBL) t);
The literal you put in the cardinality hint doesn't need to be particularly accurate, just close to general reality. If the number of rows is completely unknown, the dynamic_sampling hint can help.
If you are using Oracle 11g, you may also benefit from cardinality feedback helping the optimizer learn to better estimate the number of elements in a collection.

100 strings in IN operator, oracle pl/sql

I am passing 100 table names to the IN operator as strings, but I am getting a numeric overflow error, apparently due to too many operands.
Is there a way I can use something other than IN?
set serveroutput on
DECLARE
...
BEGIN
FOR r IN
(
SELECT table_name, column_name
FROM all_tab_columns
WHERE table_name IN (...100 strings)
AND data_type = 'NUMBER'
ORDER BY table_name, column_id
)
LOOP
execute immediate 'SELECT COUNT("' || r.column_name || '")
,COUNT(nvl2("' || r.column_name || '", NULL, 1))
FROM "' || r.table_name || '"'
INTO not_null_count, null_count;
DBMS_OUTPUT.PUT_LINE(..)
Note: For variables I am using PLS_Integer.
The suggested action for ORA-01426 is "reduce the operands". This doesn't mean reduce the number of operands. It means you're trying to put too large a number into a variable. So shrink the number, or enlarge the variable.
You say:
"for variables I am using PLS_Integer"
So if you have a large table, and by large I mean more than 2,147,483,647 rows, you will get a numeric overflow, because PLS_INTEGER is a 32-bit data type.
If this is your scenario, then you need to declare your variables with data type INTEGER instead (or NUMBER(38,0)).
As @BobJarvis points out, PLS_INTEGER is optimized for hardware arithmetic, so the general advice would be to use it for counting-type operations. However, your case simply requires a variable to hold the output of a SQL COUNT() operation, so I don't think there will be any difference in efficiency.
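To make the boundary concrete: PLS_INTEGER holds signed 32-bit values, from -2,147,483,648 to 2,147,483,647. A minimal Python check (the helper is hypothetical) of whether a COUNT result would fit:

```python
PLS_INTEGER_MIN = -(2**31)     # -2,147,483,648
PLS_INTEGER_MAX = 2**31 - 1    #  2,147,483,647


def fits_pls_integer(n: int) -> bool:
    """True if n can be assigned to a PLS_INTEGER without a numeric overflow."""
    return PLS_INTEGER_MIN <= n <= PLS_INTEGER_MAX


print(fits_pls_integer(2_147_483_647))  # True
print(fits_pls_integer(2_147_483_648))  # False
```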
I believe the limit on an IN list is 1000 expressions (ORA-01795), not 100. To debug:
a.) Try running your implicit cursor query in SQL.
b.) If it works fine, then run the query in EXECUTE IMMEDIATE after substituting the column name.
Also, try increasing the size of your not_null_count and null_count variables.
Hope it helps
Vishad
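Should a list ever exceed 1000 entries, a common workaround is to split it into several IN lists joined with OR. A minimal Python sketch that builds such a predicate with numbered bind placeholders (:1, :2, ...); the function name and placeholder style are assumptions for illustration, and a temporary table or collection may be the cleaner fix.

```python
def build_in_predicate(column: str, n_values: int, chunk: int = 1000) -> str:
    """Build '(col IN (:1, ..., :1000) OR col IN (:1001, ...))' for n_values binds."""
    if n_values < 1:
        raise ValueError("need at least one value")
    groups = []
    for start in range(1, n_values + 1, chunk):
        end = min(start + chunk - 1, n_values)
        binds = ", ".join(f":{i}" for i in range(start, end + 1))
        groups.append(f"{column} IN ({binds})")
    return "(" + " OR ".join(groups) + ")"


print(build_in_predicate("table_name", 3, chunk=2))
# (table_name IN (:1, :2) OR table_name IN (:3))
```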
Some other possible solutions:
Use a temporary table: populate it with the table names and join to it to filter.
Or create a global collection type:
create type table_of_varchar2 is table of varchar2(30);
then populate the collection and filter using table_name member of arr_tables_list.
Is there a way where I can use something else besides IN?
Consider using a cursor instead.

Why use varchar2 instead of char for package constants?

Just came across a package which defines a large number of package-global constant strings as such:
DESTINATION_1 CONSTANT VARCHAR2(13) := '515 Pine Lane';
DESTINATION_2 CONSTANT VARCHAR2(18) := '670 Woodhaven Lane';
Is there any benefit to using varchar2 as the datatype for these over char?
Using Oracle 11g release 2.
In general I use VARCHAR2 instead of CHAR for all string variable or constant declarations for the simple reason that VARCHAR2 semantics are more consistent with my expectations. The issue here is that if the developer is less than accurate about his counting of the length of the constant, it will be padded on the right with blanks if declared as CHAR, while it will simply be stored without padding if declared as VARCHAR2. Consider:
DECLARE
strFixed CHAR(20) := 'This is a string';
strVariable VARCHAR2(20) := 'This is a string';
BEGIN
IF strFixed = strVariable THEN
DBMS_OUTPUT.PUT_LINE('Equal');
ELSE
DBMS_OUTPUT.PUT_LINE('Not equal');
END IF;
END;
At first look you might expect this to print "Equal", but it will actually print "Not equal". This is because strFixed is not stored as 'This is a string'; instead, it's stored as 'This is a string    ', because CHAR variables are padded on the right with blanks out to the size specified in the variable declaration. Yes, I could have carefully counted the number of characters in the string (there are 17, by the way) and then adjusted the declaration carefully, but that is just SO 1970's (a decade I remember somewhat hazily and don't care to revisit :-). And, oh dear, I miscounted the number of characters in the string, so the fixed string would have been padded on the right to fill it out to the declared length and my comparison still wouldn't have worked.
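To make the two comparison rules concrete, here is a Python emulation (not Oracle code; the function names are hypothetical). Oracle uses blank-padded comparison when both values are CHAR or text literals, and non-padded comparison as soon as one side is VARCHAR2, which is why the PL/SQL block above prints "Not equal".

```python
def as_char(value: str, size: int) -> str:
    """Emulate assigning value to CHAR(size): blank-pad on the right."""
    if len(value) > size:
        raise ValueError(f"value too long for CHAR({size})")
    return value.ljust(size)


def nonpadded_equal(a: str, b: str) -> bool:
    # One side is VARCHAR2: trailing blanks are significant
    return a == b


def blank_padded_equal(a: str, b: str) -> bool:
    # Both sides CHAR or literals: pad the shorter value with blanks first
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


str_fixed = as_char("This is a string", 20)
str_variable = "This is a string"
print(repr(str_fixed))                              # 'This is a string    '
print(nonpadded_equal(str_fixed, str_variable))     # False -> "Not equal"
print(blank_padded_equal(str_fixed, str_variable))  # True
```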
The one case where I'll use CHAR instead of VARCHAR2 is if the variable or constant is only supposed to be a single character long. IMO declaring something as VARCHAR2(1) is just wrong. :-)
Just in passing I'll note that if you look in the package SYS.STANDARD you'll find that CHAR is declared as
subtype CHAR is VARCHAR2;
Thus, a CHAR is a VARCHAR2. I'm not sure how the space-padding is done, but it may well be done at runtime and thus add some time.
Is there a performance advantage to one or the other? At most it won't be much. I suppose that if a CHAR variable has some space-padding it'll take a hair longer to compare than an equivalent unpadded VARCHAR2 value, but in practical terms I don't believe this will matter. Any runtime padding cost is similarly small; I suspect it's a wash, and it will certainly be swamped by SQL effects.
Share and enjoy.
A CHAR datatype and VARCHAR2 datatype are stored identically ... So, in the case you describe, there's no difference.
The difference between a CHAR and a VARCHAR2 is that a CHAR(n) will ALWAYS be N bytes long; it will be blank-padded upon insert to ensure this. A VARCHAR2(n), on the other hand, will be 1 to N bytes long; it will NOT be blank-padded.
Hi, this may be just my assumption, but VARCHAR2 might have been used because of performance.
A CHAR is just a VARCHAR2 blank-padded out to its maximum length:
create table t ( x varchar2(30), y char(30) );
insert into t (x,y) values ( rpad('a',30,' '), 'a' );
The difference between the values stored in X and Y by this insert IS ABSOLUTELY NOTHING, and given that the difference between columns X and Y after the insert below:
insert into t (x,y) values ('a','a');
is that X consumes 3 bytes (null indicator, leading byte length, 1 byte for 'a') and Y consumes 32 bytes (null indicator, leading byte length, 30 bytes for 'a' followed by 29 blanks),
VARCHAR2 is going to be somewhat "at an advantage performance-wise". It helps us NOT AT ALL that char(30) is always 30 bytes; to us, it is simply a VARCHAR2 that is blank-padded out to its maximum length.

OCI: Determine length of text representation of query columns

My goal is to execute a query (SELECT), fetch results and output them as text. Query is given as a parameter and can be e.g. select * from t.
I use OCIStmtPrepare and OCIStmtExecute, then I can describe columns of the query by OCIParamGet and series of OCIAttrGet. Suppose I get OCI_ATTR_DATA_TYPE = 12 (DATE) for one of the columns. Then OCI_ATTR_DATA_SIZE = 7 -- this is size of internal DATE representation.
I want to get this DATE as text, with respect to the current NLS settings. For that I use OCIDefineByPos with dty = SQLT_STR. It works all right, but I also need to supply a buffer for fetching. The question is: what size of buffer do I need?
Evidently it depends on NLS_DATE_FORMAT. I believe that Oracle knows this value:
SQL> create table x as select to_char(sysdate) d from dual;
Table created.
SQL> select value from nls_session_parameters where parameter='NLS_DATE_FORMAT';
VALUE
----------------------------------------
DD.MM.RR
SQL> select data_length from dba_tab_columns where table_name='X';
DATA_LENGTH
-----------
8
This is the exact length. Only when the date format is hidden from Oracle (by a function, for example) does it use the absolute maximum (?) value of 75:
SQL> create or replace function get_date_format return varchar2 is
2 begin
3 return 'DD.MM.RR';
4 end;
5 /
Function created.
SQL> create table x as select to_char(sysdate,get_date_format) d from dual;
Table created.
SQL> select data_length from dba_tab_columns where table_name='X';
DATA_LENGTH
-----------
75
All said above applies to NUMBER as well.
So, is it possible to get length of text representation of a column in OCI?
The maximum buffer size for any date is 75. The maximum buffer size for any number is 42.
I hope that helps.
You can determine the needed buffer size by calling OCIAttrGet for the OCI_ATTR_DISP_SIZE attribute. It returns 40 for NUMBER, 75 for DATE and N for VARCHAR2(N). Add 1 byte for the null terminator and you're good to go.
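In C you would declare or malloc a char array of display size + 1 bytes and pass it as valuep (with value_sz set to that size) to OCIDefineByPos. To keep the examples in one language, here is a Python sketch that uses ctypes to allocate equivalent zeroed C buffers; the helper name is hypothetical and the display sizes are the ones reported in this answer.

```python
import ctypes


def alloc_fetch_buffer(disp_size: int) -> ctypes.Array:
    # SQLT_STR output is NUL-terminated, so reserve one byte beyond the display size
    return ctypes.create_string_buffer(disp_size + 1)


# Display sizes as reported above: NUMBER -> 40, DATE -> 75, VARCHAR2(30) -> 30
buffers = [alloc_fetch_buffer(n) for n in (40, 75, 30)]
print([len(b) for b in buffers])  # [41, 76, 31]
```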
Yes - the trick is that in C, a string is really a pointer to a character array, so you would say char* mystring = OCIStringPtr(envhp, x); where x is a pointer to an OCIString, which you can get back by connecting with OCI_OBJECT set and asking for a SQLT_VST instead of an SQLT_STR. The actual memory for the string is allocated for you in the global env by OCI behind the scenes.
