I need to save the before and after values of changes to certain fields of an items table into an items_log table. Changes are saved by an AFTER UPDATE trigger on the items table.
Some of the items table columns are varchar2 type and some are number(*) type.
What is the better approach? Saving into two separate pairs of before/after columns, one number pair and one varchar2 pair? Or conserving space by saving everything into a single pair of before/after varchar2 columns?
The purpose of this log table is to record which user changed a field and the before and after values.
Could saving a float value to a string field lead to an unexpected deviation from the original value?
Thanks in advance
"What is the better approach?"
There is no "better" approach. There is only an approach that's good enough for your application. If your table will have a few thousand rows in it, it doesn't really matter. If your table will have a few million rows, then space may be more of a concern.
If your goal is to display to a user what changes occurred to your item and it's not going to see a lot of activity, storing everything as a varchar may be good enough. You probably don't want to store rows for fields that did not change.
I use APC's approach often: the items_log table mirrors the items table and adds a history id, timestamp, action (I, U, or D), and user alongside all the columns of the item row. Everything is maintained by a trigger. There are also built-in Oracle auditing features that can do this for you.
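To make that concrete, here is a minimal sketch of a column-level audit trigger, assuming a hypothetical items(id, name, price) table and an items_log(item_id, column_name, old_value, new_value, changed_by, changed_at) layout; these names are illustrative, not your schema. Converting the NUMBER column with an explicit TO_CHAR format is what keeps the stored string faithful to the original value if you go with the single varchar2 pair:

-- Sketch only: assumes items(id, name VARCHAR2, price NUMBER)
-- and items_log(item_id, column_name, old_value, new_value, changed_by, changed_at).
CREATE OR REPLACE TRIGGER trg_items_audit
  AFTER UPDATE OF name, price ON items
  FOR EACH ROW
BEGIN
  -- NULL-safe comparison: log only when the value actually changed
  IF :OLD.name <> :NEW.name
     OR (:OLD.name IS NULL AND :NEW.name IS NOT NULL)
     OR (:OLD.name IS NOT NULL AND :NEW.name IS NULL) THEN
    INSERT INTO items_log (item_id, column_name, old_value, new_value, changed_by, changed_at)
    VALUES (:NEW.id, 'NAME', :OLD.name, :NEW.name, USER, SYSTIMESTAMP);
  END IF;

  IF :OLD.price <> :NEW.price
     OR (:OLD.price IS NULL AND :NEW.price IS NOT NULL)
     OR (:OLD.price IS NOT NULL AND :NEW.price IS NULL) THEN
    -- 'TM9' (text minimum) renders the number exactly as stored, so the
    -- string round-trips back to the original value
    INSERT INTO items_log (item_id, column_name, old_value, new_value, changed_by, changed_at)
    VALUES (:NEW.id, 'PRICE',
            TO_CHAR(:OLD.price, 'TM9'), TO_CHAR(:NEW.price, 'TM9'),
            USER, SYSTIMESTAMP);
  END IF;
END;
/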
I am facing a weird problem. I have a table (observation_measurement) in an Oracle DB and it has many fields. One of those fields is observation_name; it stores the names of different measurements, with their values, loaded from a text file.
For example, observation_name stores four measurements a, b, c, d (the names of the measurements) with their corresponding values 1, 2, 3, 4.
Later the same text file is read again. This time the file contains only three measurements a, b, d (c is not there) with values 7, 8, 9, and these are stored in the table. So if I ask for the latest values of all observation_names I should get a=7, b=8, c=null, d=9. But it is giving me
a=7, b=8, c=3, d=9. I don't know why it is keeping the old data for the c measurement.
Any ideas?
NULL has to be handled specially in Oracle, using IS NULL or IS NOT NULL rather than ordinary comparisons.
My guess is that your update logic involves some validation on the column which leaves NULL values untreated.
Since that validation fails (or is skipped) because of NULL, the old value is retained in the table.
Can you please update your question with the query used to UPDATE the table?
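In the meantime, here is a sketch of the kind of statement that produces exactly this symptom (table and bind names are guesses, since the actual query isn't shown): if the value column is only overwritten when the incoming value is not NULL, a measurement missing from the latest file keeps its old value.

-- Old value survives whenever the incoming value is NULL (or the row is never touched):
UPDATE observation_measurement
   SET observation_value = NVL(:new_value, observation_value)
 WHERE observation_name  = :name;

-- An unconditional overwrite clears it instead:
UPDATE observation_measurement
   SET observation_value = :new_value
 WHERE observation_name  = :name;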
I have an account table in which I need to save an amount range.
I have a drop-down with values like $25k-$30k, $30k-$35k, increasing in steps of $5k up to $250k.
I planned to keep all the values in one table (currency range) and map its id to the account, but my colleague suggests
that it is better to save the values directly in the account table.
Which is best practice?
This question may get closed by someone, but all I need to know is which approach is best practice.
First of all, it is the wrong design approach to manage the range in a varchar column.
I am not sure why you want to keep the range as varchar. If it is only for display and no manipulation is required, then it is fine to store it directly in the account table.
But if you need to do further manipulation, there are two approaches to achieve it:
1. Add two separate columns, say "MinValue" and "MaxValue", for the limit range.
2. If you are not supposed to change the account table, keep a separate accountLimit table with two columns for the range. You can then associate its ID with the account table and pick the values from accountLimit, as sketched below.
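A minimal sketch of option 2, with illustrative names (Oracle-flavoured SQL; adjust the types to your database). No range string is stored anywhere; the drop-down label is derived:

-- Lookup table holding each range exactly once
CREATE TABLE account_limit (
  account_limit_id  NUMBER       PRIMARY KEY,
  min_value         NUMBER(12,2) NOT NULL,   -- e.g. 25000
  max_value         NUMBER(12,2) NOT NULL    -- e.g. 30000
);

-- Account rows just reference a range by id
ALTER TABLE account ADD account_limit_id NUMBER
  REFERENCES account_limit (account_limit_id);

-- The display label is derived, so the drop-down text never has to be stored
SELECT a.account_id,
       '$' || (l.min_value / 1000) || 'k-$' || (l.max_value / 1000) || 'k' AS amount_range
  FROM account a
  JOIN account_limit l ON l.account_limit_id = a.account_limit_id;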
I am implementing full text search in postgres.
I would like to search all posts in my system. The posts fulltext index is an amalgamation of the post title and post body.
I have two ways of achieving this:
create a tsvector column in the posts table and keep it updated with a trigger.
create a second table (posts_search) with a post_id and a tsvector column containing the index data.
create a simple GIN index ... (out of the question, because my real-world problem needs data from multiple tables for the index).
Which is going to perform better, considering I sometimes need to filter the search by other attributes in the table (like deleted_at is null and so on)?
Is it a better approach to keep the tsvector column in the same table as the data (side effect: select * now sucks) or in a separate table (side effects: a join is required and index filtering is more complicated)?
In my experiments, the typical size of a tsvector column is about 1% of the size of the text field it was computed from using to_tsvector().
With this in mind, storing the tsvector column in another table should provide a performance benefit. For example, even if you do not use SELECT * (and you shouldn't, really), any seqscan over the original single table will still have to load the pages which contain the original text. If you offload the tsvector field to a separate table, that table is roughly 100x smaller, so scanning it reads far fewer pages.
In other words, I would favor the second solution: offloading the tsvector field to a separate table. Alternatively, you could offload the posts (the original text) deeper into your table hierarchy, but I guess that is almost the same thing.
Note that the original text is not necessary for full text search to work. You may even choose not to store it in the database at all, or to store it in a highly compressed format (not necessarily easily accessible to SQL routines). Search will keep working as long as something can create the tsvector from the original text, and update it when the text changes.
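A minimal sketch of the separate-table option, assuming a posts(id, title, body, deleted_at) table (names are illustrative). The trigger keeps posts_search in step with the source columns, and the final query shows how the extra filter simply becomes part of the join:

CREATE TABLE posts_search (
  post_id  integer  PRIMARY KEY REFERENCES posts(id) ON DELETE CASCADE,
  tsv      tsvector NOT NULL
);

CREATE INDEX posts_search_tsv_idx ON posts_search USING gin (tsv);

-- Rebuild the vector whenever title or body changes
CREATE OR REPLACE FUNCTION posts_search_refresh() RETURNS trigger AS $$
BEGIN
  INSERT INTO posts_search (post_id, tsv)
  VALUES (NEW.id,
          setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
          setweight(to_tsvector('english', coalesce(NEW.body,  '')), 'B'))
  ON CONFLICT (post_id) DO UPDATE SET tsv = EXCLUDED.tsv;
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_search_trg
AFTER INSERT OR UPDATE OF title, body ON posts
FOR EACH ROW EXECUTE FUNCTION posts_search_refresh();

-- Search plus an extra filter on the main table
SELECT p.id, p.title
  FROM posts p
  JOIN posts_search s ON s.post_id = p.id
 WHERE s.tsv @@ plainto_tsquery('english', 'some query')
   AND p.deleted_at IS NULL;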
I am designing a table in Teradata with about 30 columns. These columns are going to need to store several time-interval-style values such as Daily, Monthly, Weekly, etc. It is bad design to store the actual string values in the table, since this would be an atrocious repeat of data. Instead, what I want to do is create a primitive lookup table. This table would hold Daily, Monthly, Weekly and would use Teradata's identity column to derive the primary key. This primary key would then be stored in the table I am creating as foreign keys.
This would work fine for my application since all I need to know is the primitive key value as I populate my web form's dropdown lists. However, other applications we use will need to either run reports or receive this data through feeds. Therefore, a view will need to be created that joins this table out to the primitives table so that it can actually return Daily, Monthly, and Weekly.
My concern is performance. I've never created a table with such a large number of foreign key fields and am fairly new to Teradata. Before I go down the long road of figuring this all out the hard way, I'd like any advice I can get on the best way to achieve my goal.
Edit: I suppose I should add that this lookup table would be a mishmash of unrelated primitives. It would contain groups of values relating to time intervals, as already mentioned above, but also time frames such as 24x7 and 8x5. The table would be designed like this:
ID Type Value
--- ------------ ------------
1 Interval Daily
2 Interval Monthly
3 Interval Weekly
4 TimeFrame 24x7
5 TimeFrame 8x5
What you've done should be fine. Obviously, you'll need to run the actual queries and collect statistics where appropriate.
One thing I can recommend is to have an additional row in the lookup table like so:
ID Type Value
--- ------------ ------------
0 Unknown Unknown
Then in the main table, instead of having fields as null, you would give them a value of 0. This allows you to use inner joins instead of outer joins, which will help with performance.
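Sketched in Teradata terms (table, column, and view names are illustrative; GENERATED BY DEFAULT is used so the 0 row can be inserted explicitly):

-- Lookup table of unrelated primitives, following the layout in the question
CREATE TABLE primitive_lookup (
  id          INTEGER GENERATED BY DEFAULT AS IDENTITY
                      (START WITH 1 INCREMENT BY 1) NOT NULL,
  prim_type   VARCHAR(20) NOT NULL,
  prim_value  VARCHAR(20) NOT NULL
) UNIQUE PRIMARY INDEX (id);

-- The 'Unknown' row, so the wide table can store 0 instead of NULL
INSERT INTO primitive_lookup (id, prim_type, prim_value) VALUES (0, 'Unknown', 'Unknown');
INSERT INTO primitive_lookup (prim_type, prim_value) VALUES ('Interval', 'Daily');
INSERT INTO primitive_lookup (prim_type, prim_value) VALUES ('TimeFrame', '24x7');

-- View for reports and feeds: inner joins resolve the codes back to their labels
REPLACE VIEW wide_table_v AS
SELECT w.some_key,
       i.prim_value AS interval_name,
       f.prim_value AS timeframe_name
  FROM wide_table w
  JOIN primitive_lookup i ON i.id = w.interval_id
  JOIN primitive_lookup f ON f.id = w.timeframe_id;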
I have designed my database in such a way that one of my tables contains 52 columns. All the attributes are tightly associated with the primary key attribute, so there is no scope for further normalization.
Please let me know: if the same kind of situation arises and you don't want to keep so many columns in a single table, what other options are there?
It is not odd in any way to have 50 columns. ERP systems often have 100+ columns in some tables.
One thing you could look into is ensuring that most columns have valid default values (null, today, etc.). That will simplify inserts.
Also ensure your code always specifies the columns (i.e. no "select *"). Any kind of future optimization will likely involve indexes covering a subset of the columns.
One approach we used once is to split your table into two tables. Both of these tables get the primary key of the original table. In the first table you put your most frequently used columns, and in the second table you put the lesser-used columns; generally the first one should be smaller. You can now speed things up in the first table with various indexes. In our design we even ran the first table on a memory engine (RAM), since we only had read queries. If you need a combination of columns from table1 and table2, you join the two tables on the primary key, as sketched below.
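A rough sketch of that vertical split, with made-up column names just to show the shape:

-- "Hot" table: primary key plus the frequently read columns
CREATE TABLE item_core (
  item_id  INT PRIMARY KEY,
  name     VARCHAR(100),
  status   VARCHAR(20)
);

-- "Cold" table: same primary key, the rarely read columns
CREATE TABLE item_detail (
  item_id      INT PRIMARY KEY REFERENCES item_core (item_id),
  long_notes   VARCHAR(4000),
  legacy_code  VARCHAR(50)
  -- ...remaining infrequently used columns
);

-- Recombine only when the full row is actually needed
SELECT c.item_id, c.name, d.long_notes
  FROM item_core c
  JOIN item_detail d ON d.item_id = c.item_id;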
A table with fifty-two columns is not necessarily wrong. As others have pointed out many databases have such beasts. However I would not consider ERP systems as exemplars of good data design: in my experience they tend to be rather the opposite.
Anyway, moving on!
You say this:
"All the attributes are tightly associated with the primary key
attribute"
Which means that your table is in third normal form (or perhaps BCNF). That being the case, it's not true that no further normalisation is possible. Perhaps you can go to fifth normal form?
Fifth normal form is about removing join dependencies. All your columns are dependent on the primary key, but there may also be dependencies between columns: e.g., there are multiple values of COL42 associated with each value of COL23. A join dependency means that when we add a new value of COL23 we end up inserting several records, one for each value of COL42. The Wikipedia article on 5NF has a good worked example.
I admit not many people go as far as 5NF. And it might well be that even with fifty-two columns your table is already in 5NF. But it's worth checking, because if you can break out one or two subsidiary tables you'll have improved your data model and made your main table easier to work with.
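As a toy illustration, suppose the hypothetical COL23/COL42 pairing above really is independent of the rest of the row; then it can be broken out into a subsidiary table and the wide table drops a column. Whether such a decomposition is lossless depends entirely on your actual dependencies:

-- Subsidiary table capturing the COL23 -> COL42 association on its own
CREATE TABLE col23_col42 (
  col23  VARCHAR(30) NOT NULL,
  col42  VARCHAR(30) NOT NULL,
  PRIMARY KEY (col23, col42)
);

-- The wide table keeps COL23 and drops COL42; the pairs are joined back when needed
SELECT w.pk_id, w.col23, p.col42
  FROM wide_table w
  JOIN col23_col42 p ON p.col23 = w.col23;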
Another option is the "item-result pair" (IRP) design over the "multi-column table" (MCT) design, especially if you'll be adding more columns from time to time.
MCT_TABLE
---------
KEY_col(s)
Col1
Col2
Col3
...
IRP_TABLE
---------
KEY_col(s)
ITEM
VALUE
select * from IRP_TABLE;
KEY_COL ITEM VALUE
------- ---- -----
1 NAME Joe
1 AGE 44
1 WGT 202
...
IRP is a bit harder to use, but much more flexible.
I've built very large systems using the IRP design and it can perform well even for massive data. In fact it behaves somewhat like a column-organized DB, because you only pull in the rows you need (i.e. less I/O) rather than an entire wide row when you only need a few columns (i.e. more I/O).
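When a report does need the classic wide shape, the IRP rows can be pivoted back with plain conditional aggregation (using the item names from the example above):

-- One output row per key, one column per item of interest
SELECT key_col,
       MAX(CASE WHEN item = 'NAME' THEN value END) AS name,
       MAX(CASE WHEN item = 'AGE'  THEN value END) AS age,
       MAX(CASE WHEN item = 'WGT'  THEN value END) AS wgt
  FROM irp_table
 GROUP BY key_col;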