How would I calculate or estimate the increased storage difference from increasing the precision on a column from number(2,0) to number(6,0)? Or is it the same? No difference? Let's pretend I have 1 million rows in the table. What does the arithmetic look like?
The precision of a number field is basically just a constraint on how much precision Oracle will retain when storing a value. The underlying number format is actually the same (it's a varying-width field, not something like a fixed-size integer or float) - the amount of space required is related to the number of digits in the specific numbers being stored.
So if you don't modify the values in the table, then the size shouldn't change. But if you increase the precision and then update the table with values with more digits, then they will potentially consume more space.
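As a rough sketch of the arithmetic (assuming Oracle's documented internal format of one exponent byte plus one byte per two significant digits, with an extra trailing byte for negative values): a two-digit integer such as 42 takes about 2 bytes, while a six-digit integer such as 123456 takes about 4 bytes. Over 1 million rows the worst-case difference is therefore on the order of 2 MB of data, before block and row overhead. You can check the actual byte counts with VSIZE:
SELECT
    VSIZE(42)     AS two_digit_bytes,   -- 2 bytes for a 2-digit integer
    VSIZE(123456) AS six_digit_bytes    -- 4 bytes for a 6-digit integer
FROM
    dual;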
Related
Assuming I have the following column in my DB:
value NUMBER(12,3)
What will happen if I will try to store a decimal that exceeds the defined scale? 1234.56789
Will I get an error that the scale is exceeded, or will the value be rounded to 3 decimals and stored?
Also, is it good practice to let the database do the rounding? Or should that rather be done in the code?
It will be rounded. In your particular example it will be stored as 1234.568.
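A quick sketch to verify (the table name is hypothetical):
CREATE TABLE scale_demo (value NUMBER(12,3));

INSERT INTO scale_demo (value) VALUES (1234.56789);

-- No error is raised; the value is rounded to the declared scale.
SELECT value FROM scale_demo;   -- returns 1234.568
Exceeding the precision on the integer side (more than 9 digits before the decimal point here), by contrast, raises ORA-01438.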
For reference:
When you define a NUMBER variable, you can specify its precision (p)
and scale (s) so that it is sufficiently, but not unnecessarily,
large. Precision is the number of significant digits. Scale can be
positive or negative. Positive scale identifies the number of digits
to the right of the decimal point; negative scale identifies the
number of digits to the left of the decimal point that can be rounded
up or down.
The NUMBER data type is supported by Oracle Database standard
libraries and operates the same way as it does in SQL. It is used for
dimensions and surrogates when a text or INTEGER data type is not
appropriate. It is typically assigned to variables that are not used
for calculations (like forecasts and aggregations), and it is used for
variables that must match the rounding behavior of the database or
require a high degree of precision.
Oracle DB is storing values as NUMBER, which from my understanding means maximum precision and scale. However, the Hive documentation states that if you declare DECIMAL with no precision or scale you get DECIMAL(10,0). What would the correct Hive datatype be to accommodate Oracle NUMBER?
Since Oracle allows any precision/scale for a NUMBER datatype, the safest option is to store it as a Hive STRING, to avoid any kind of data loss.
Depending on the actual data or its usage, downstream processing can convert it to the needed datatype.
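One hedged sketch of that approach (the table, view, and column names here are made up): land the value losslessly as STRING, then cast it for consumers that need arithmetic, e.g. to Hive's maximum DECIMAL precision of 38:
-- Raw landing table: keeps every digit exactly as exported from Oracle.
CREATE TABLE stg_amounts (
    amount_raw STRING
);

-- Typed view for downstream arithmetic; choose a scale that suits the data.
CREATE VIEW amounts_typed AS
SELECT CAST(amount_raw AS DECIMAL(38,10)) AS amount
FROM stg_amounts;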
According to Oracle docs:
NUMBER:
INT when the scale is 0 and the precision is less than 10
BIGINT when the scale is 0 and the precision is less than 19
DECIMAL when the scale is greater than 0 or the precision is greater than 19
Is there any way to preserve number precision (0.100 vs 0.1) in Oracle? The precision needs to be stored for accountability reasons.
I want to avoid storing these numbers as a string because there is also a set of stored procedures which do some number crunching on these values.
No. The numbers 0.100, 0.1, and .1 are all identical from Oracle's perspective. They'll all have identical internal representations.
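You can see this with DUMP, which prints the internal representation; all three literals produce the same type, length, and bytes:
SELECT
    DUMP(0.100) AS d1,
    DUMP(0.1)   AS d2,
    DUMP(.1)    AS d3
FROM
    dual;
-- All three columns show an identical Typ/Len/byte sequence.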
If you need to store the precision, you'll need a second column to store the precision. Or you'll need to store the number in a VARCHAR2 column and convert it to a number before doing the number crunching. Of course, you'll need to define the rules for handling precision in your number crunching logic. If you add 0.100 and 0.22, for example, is the result 0.32 or 0.320?
I would suggest storing both the numeric value for queries and mathematical operations, and the string version "as entered" for audit purposes.
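A minimal sketch of that layout (table and column names are hypothetical):
CREATE TABLE measurements (
    value_num NUMBER,          -- used by the number-crunching procedures
    value_txt VARCHAR2(40)     -- preserves the value exactly as entered, e.g. '0.100'
);

INSERT INTO measurements (value_num, value_txt)
VALUES (TO_NUMBER('0.100'), '0.100');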
If the use case only requires the additional digits to be displayed, I would suggest converting the value to a string as the last step before sending the data to the report:
SELECT
    TO_CHAR(column_name, '999.999')
FROM
    table_name;
I'm trying to do a growth prediction on some tables I have, and for that I've got to do some calculations on my row sizes, how many rows I generate per day and, well... the maths.
I'm calculating the average size of each row in my table as the sum of the average size of each field. So basically:
SELECT 'COL1', AVG(VSIZE(COL1)) FROM my_table
UNION ALL
SELECT 'COL2', AVG(VSIZE(COL2)) FROM my_table;
Sum that up, multiply by the number of entries of a day and work the predictions from there.
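For example, the arithmetic might look like this (the 50,000 rows per day is just a made-up figure):
-- Estimated new data per day = sum of average column sizes * daily row count.
SELECT SUM(avg_bytes) * 50000 AS est_bytes_per_day
FROM (
    SELECT AVG(VSIZE(COL1)) AS avg_bytes FROM my_table
    UNION ALL
    SELECT AVG(VSIZE(COL2)) FROM my_table
);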
It turns out that for one of the tables I've looked at, the resulting size is a lot smaller than I thought it would be, which got me wondering if my method was right.
Also, I did not consider index sizes in my predictions - and of course I should.
My questions are:
Is this method I'm using reliable?
Tips on how could I work the predictions for the Indexes?
I've done my googling, but the methods I find are all about segments and extents, or else calculations based on the whole table. I need the per-row figure for my table to do the predictions (I have to analyse the data in the table in order to figure out how many records are generated per day).
And finally, this is an approximation. I know I'm missing some bytes here and there with overheads and stuff. I just want to make sure I'm only missing bytes and not gigas :)
1) Your method is sound for calculating the average size of a row. (Though be aware that if your column contains NULLs, you should use avg(nvl(vsize(col1), 0)) instead of avg(vsize(COL1)).) However, it doesn't take into account the physical arrangement of rows.
First of all, it doesn't take into account the header info (from both blocks and rows): you can't fit 8k of data into an 8k block. See the documentation on data block format for more information.
Then, rows are not always stored neatly packed. Oracle leaves some space in each block so that rows can grow when they are updated (governed by the pctfree parameter). Also, when rows are deleted the empty space is not reclaimed right away (if you're not using ASSM with locally managed tablespaces, the amount of free space required for a block to return to the list of available blocks depends on pctused).
If you already have some representative data in your table, you can estimate the amount of extra space you will need by comparing the space physically used (all_tables.blocks*block_size after having gathered statistics) to the average row length.
By the way Oracle can easily give you a good estimate of the average row length: gather statistics on the table and query all_tables.avg_row_len.
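A sketch of that comparison (the table name is a placeholder; here user_segments.bytes stands in for blocks * block_size):
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'MY_TABLE');
END;
/

-- Logical size according to the statistics: rows * average row length.
SELECT num_rows, avg_row_len, num_rows * avg_row_len AS approx_data_bytes
FROM user_tables
WHERE table_name = 'MY_TABLE';

-- Physical space actually allocated to the table segment.
SELECT blocks, bytes
FROM user_segments
WHERE segment_name = 'MY_TABLE';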
2) Most of the time (read: unless there is a bug or you fall into an atypical use of the index), the index will grow proportionally to the number of rows.
If you have representative data, you can get a good estimate of its future size by multiplying its current size by the relative growth in the number of rows.
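For example (the index names are placeholders), read the current index sizes and multiply by the expected growth factor in row count:
SELECT segment_name, bytes
FROM user_segments
WHERE segment_type = 'INDEX'
  AND segment_name IN ('MY_TABLE_PK', 'MY_TABLE_IDX1');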
The last time Oracle published their formulae for estimating the size of schema objects was in Oracle 8.0, which means the linked document is ten years out of date. However, I don't expect very much has changed in how Oracle reserves segment header, block header, or row header information.
How can I work out the maximum row size in a table if I'm only given the datatype lengths of its columns (from the all_tab_cols.data_length column), i.e. no statistics or ANALYZE? There are extra complications in that this is an IOT, so there's the index tree size to consider as well.
The formulas for sizing indexes are involved -- they're generally focused on calculating rows per block to determine the actual disk space required for N rows -- and the last time I can find that Oracle included this in their documentation was in Oracle 8.0. See here.
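As a crude starting point for the row data itself, you can sum the declared column lengths, bearing in mind this ignores row and block header overhead, the IOT index structure, and the fact that columns rarely fill their declared size (owner and table names are placeholders):
SELECT SUM(data_length) AS max_row_data_bytes
FROM all_tab_cols
WHERE owner = 'MY_SCHEMA'
  AND table_name = 'MY_TABLE'
  AND hidden_column = 'NO';   -- skip hidden/virtual columns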
I'd recreate the table, generate some data with a Data Generator and actually measure it.
I generally wouldn't go further than tens of MB of generated data; that's sufficient as a guide.
In some places, schemas tend to have over-generous data lengths (e.g. standardizing on string sizes of 20, 50, or 100 bytes/characters). Even for names you may allow 30 characters when most are in the 5-10 character range. As such, estimates derived from declared field sizes, rather than actual lengths, will be VERY vague and you'll have a very large error margin.