I am experiencing a strange problem with an Oracle DB, and would like to ask if anyone has experienced a similar problem:
In ALL of my tables (user_tables) EVERY SINGLE row has been duplicated.
What kind of action could have caused such a thing?
Can I restore the previous state without cleaning every table "by hand"?
I can imagine many actions which could lead to this situation (for example running imp twice), but it doesn't matter. You should simply prevent such duplicates by means of Unique/Primary keys.
As for restoring previous state you may want to read about flashback query feature:
http://docs.oracle.com/cd/B13789_01/appdev.101/b10795/adfns_fl.htm
Related
I am new to Teradata & fortunately got a chance to work on both DDL-DML statements.
One thing I observed is Teradata is very slow when time comes to UPDATE the data in a table having large number of records.
The simplest way I found on the Google to perform this update is to write an INSERT-SELECT statement with a CASE on column holding values to be update with new values.
But what when this situation arrives in Data Warehouse environment, when we need to update multiple columns from a table holding millions of rows ?
Which would be the best approach to follow ?
INSERT-SELECT only OR MERGE-UPDATE OR MLOAD ?
Not sure if any of the above approach is not used for this UPDATE operation.
Thank you in advance!
At enterprise level, we expect volumes to be huge and updates are often part of some scheduled jobs/scripts.
With huge volume of data, Updates comes as a costly operation that involve risk of blocking table for some time in case the update fails (due to fallback journal). Although scripts are tested well, and failures seldom happen in production environments, it's always better to have data that needs to be updated loaded to a temporary table in required form and inserted back to same table after deleting matching records to maintain SCD-1 (Where we don't maintain history).
I have a table with many rows.
For testing purpose my colleagues are also using same table. The problem is that some time he is deleting the row which I was testing and some time I.
So is there any way in oracle so I can make some specific rows to be read only so other should not delete and edit that?
Thanks.
There are a number of differnt ways to tackle this problem.
As Sun Tzu said, the best thing would be if you and your colleagues use data sets which do not collide.
For instance perhaps you could each have your own database instance, on local PCs; whether this will suit depends on a number of factors, not the least of which is your licensing arrangements with Oracle. Alternatively, you could have separate schemas in a shared database; depending on your application you may need to you synonyms or special connectioms.
Another approach: everybody builds their own data sets, known as test fixtures. This is a good policy, because testing is only truly valid when it runs against a known state; if we make assumptions regarding the presence or absence of data how valid are our test results? The point is, the tests should clean up after themselves, removing any data created in fixtures and by the running of tests. With this tactic you need to agree ranges of IDs for each team member: they must only use records within their ranges for testing or development work.
I prefer these sorts of approach because they don't really change the way the application works (arguably except using different schemas and synonyms). More draconian methods are available.
If you have Enterprise Edition you can use Row Level Security to protect your records. This is a extension of the last point: you will need a mechanism for identifying your records, and some infrastructure to identify ownership within the session. But in addition to preventing other users rom deleting your data you can also prevent them inserting, updating or even viewing records which are with your range of IDs. Find out more.
A lighter solution is use a trigger as A B Cade suggests. You will still need to identifying your records and who is connected (because presumably from time-to-time you will still want to delete your records.
One last strategy: take your ball home. Get the table in the state you want it and make a data pump export. For extra vindictiveness you can truncate the table at this point. Then any time you want to use the table you run a data pump import. This will reset the table's state, wiping out any existing data. This is just an extreme version of test scripts creating their own data.
You can create a trigger that prevents deleting some specific rows.
CREATE OR REPLACE TRIGGER trg_dont_delete
BEFORE DELETE
ON <your_table_name>
FOR EACH ROW
BEGIN
IF :OLD.ID in (<IDs of rows you dont want to be deleted>) THEN
raise_application_error (-20001, 'Do not delete my records!!!');
END IF;
END;
Of course you can make it smarter - make the if statement rely on user, or get the records IDs from another table and so on
Oracle supports row level locking. you can prevent the others to delete the row, which one you are using. for knowing better check this link.
I would like to write a MERGE statement in Vertica database.
I know it can't be used directly, and insert/update has to be
combined to get the desired effect.
The merge sentence looks like this:
MERGE INTO table c USING (select b.field1,field2 aeg from table a, table b
where a.field3='Y'
and a.field4=b.field4
group by b.field1) t
on (c.field1=t.field1)
WHEN MATCHED THEN
UPDATE
set c.UUS_NAIT=t.field2;
Would just like to see an example of MERGE being used as insert/update.
You really don't want to do an update in Vertica. Inserting is fine. Selects are fine. But I would highly recommend staying away from anything that updates or deletes.
The system is optimized for reading large amounts of data and for inserting large amounts of data. So since you want to do an operation that does 1 of the 2 I would advise against it.
As you stated, you can break apart the statement into an insert and an update.
What I would recommend, not knowing the details of what you want to do so this is subject to change:
1) Insert data from an outside source into a staging table.
2) Perform and INSERT-SELECT from that table into the table you desire using the criteria you are thinking about. Either using a join or in two statements with subqueries to the table you want to test against.
3) Truncate the staging table.
It seems convoluted I guess, but you really don't want to do UPDATE's. And if you think that is a hassle, please remember that what causes the hassle is what gives you your gains on SELECT statements.
If you want an example of a MERGE statement follow the link. That is the link to the Vertica documentation. Remember to follow the instructions clearly. You cannot write a Merge with WHEN NOT MATCHED followed and WHEN MATCHED. It has to follow the sequence as given in the usage description in the documentation (which is the other way round). But you can choose to omit one completely.
I'm not sure, if you are aware of the fact that in Vertica, data which is updated or deleted is not really removed from the table, but just marked as 'deleted'. This sort of data can be manually removed by running: SELECT PURGE_TABLE('schemaName.tableName');
You might need super user permissions to do that on that schema.
More about this can be read here: Vertica Documentation; Purge Data.
An example of this from Vertica's Website: Update and Insert Simultaneously using MERGE
I agree that Merge is supported in Vertica version 6.0. But if Vertica's AHM or epoch management settings are set to save a lot of history (deleted) data, it will slow down your updates. The update speeds might go from what is bad, to worse, to horrible.
What I generally do to get rid of deleted (old) data is run the purge on the table after updating the table. This has helped maintain the speed of the updates.
Merge is useful where you definitely need to run updates. Especially incremental daily updates which might update millions of rows.
Getting to your answer: I don't think Vertica supportes Subquery in Merge. You would get the following.
ERROR 0: Subquery in MERGE is not supported
When I had a similar use-case, I created a view using the sub-query and merged into the destination table using the newly created view as my source table. That should let you keep using MERGE operations in Vertica and regular PURGEs should let you keep your updates fast.
In fact merge also helps avoid duplicate entries during inserts or updates if you use the correct combination of fields in ON clause, which should ideally be a join on the primary keys.
I like geoff's answer in general. It seems counterintuitive, but you'll have better results creating a new table with the rows you want in it versus modifying an existing one.
That said, doing so would only be worth it once the table gets past a certain size, or past a certain number of UPDATEs. If you're talking about a table <1mil rows, I might chance it and do the updates in place, and then purge to get rid of tombstoned rows.
To be clear, Vertica is not well suited for single row updates but large bulk updates are much less of an issue. I would not recommend re-creating the entire table, I would look into strategies around recreating partitions or bulk updates from staging tables.
I need some help in auditing in Oracle. We have a database with many tables and we want to be able to audit every change made to any table in any field. So the things we want to have in this audit are:
user who modified
time of change occurred
old value and new value
so we started creating the trigger which was supposed to perform the audit for any table but then had issues...
As I mentioned before we have so many tables and we cannot go creating a trigger per each table. So the idea is creating a master trigger that can behaves dynamically for any table that fires the trigger. I was trying to do it but no lucky at all....it seems that Oracle restricts the trigger environment just for a table which is declared by code and not dynamically like we want to do.
Do you have any idea on how to do this or any other advice for solving this issue?
If you have 10g enterprise edition you should look at Oracle's Fine-Grained Auditing. It is definitely better than rolling your own.
But if you have a lesser version or for some reason FGA is not to your taste, here is how to do it. The key thing is: build a separate audit table for each application table.
I know this is not what you want to hear because it doesn't match the table structure you outlined above. But storing a row with OLD and NEW values for each column affected by an update is a really bad idea:
It doesn't scale ( a single update touching ten columns spawns ten inserts)
What about when you insert a record?
It is a complete pain to assemble the state of a record at any given time
So, have an audit table for each application table, with an identical structure. That means including the CHANGED_TIMESTAMP and CHANGED_USER on the application table, but that is not a bad thing.
Finally, and you know where this is leading, have a trigger on each table which inserts a whole record with just the :NEW values into the audit table. The trigger should fire on INSERT and UPDATE. This gives the complete history, it is easy enough to diff two versions of the record. For a DELETE you will insert an audit record with just the primary key populated and all other columns empty.
Your objection will be that you have too many tables and too many columns to implement all these objects. But it is simple enough to generate the table and trigger DDL statements from the data dictionary (user_tables, user_tab_columns).
You don't need write your own triggers.
Oracle ships with flexible and fine grained audit trail services. Have a look at this document (9i) as a starting point.
(Edit: Here's a link for 10g and 11g versions of the same document.)
You can audit so much that it can be like drinking from the firehose - and that can hurt the server performance at some point, or could leave you with so much audit information that you won't be able to extract meaningful information from it quickly, and/or you could end up eating up lots of disk space. Spend some time thinking about how much audit information you really need, and how long you might need to keep it around. To do so might require starting with a basic configuration, and then tailoring it down after you're able to get a sample of the kind of volume of audit trail data you're actually collecting.
Advice/suggestions needed for a bit of application design.
I have an application which uses 2 tables, one is a staging table, which many separate processes write to, once a 'group' of processes has finished, another job comes along a aggregates the results together into a final table, then deletes that 'group' from the staging table.
The problem that I'm having is that when the staging table is being cleared out, lots of redo is generated and I'm seeing a lot of 'log file sync' waits in the database. This is a shared database with many other applications and this is causing some issues.
When applying the aggregate, the rows are reduced to about 1 row in the final table for every 20 rows in the staging table.
I'm thinking of getting around this by rather than having a single 'staging' table, I will create a table for each 'group'. Once done, this table can just be dropped, which should result in much less redo.
I only have SE, so partitioned tables isn't an option. Also faster disks for the redo probably isn't an option in the short term either.
Is this a bad idea? Any better solutions to be offered?
Thanks.
Would it be possible to solve the problem by having your process do a logical delete (i.e. set a DELETE_FLAG column in the table to 'Y') and then having a nightly process that truncates the table (potentially writing any non-deleted rows to a separate table before the truncate and then copy them back after the table is truncated)?
Are you certain that the source of the log file sync waits is that your disks can't keep up with the I/O? That's certainly possible, of course, but there are other possible causes of excessive log file sync waits including excessive commits. There is an excellent article on tuning log file sync events on the Pythian blog.
The most common cause of excessive log file syncs is too frequent commits, which are often deliberately coded in a mistaken attempt to reduce system load due to locking. You should commit only when your business transaction is complete.
Loading each group into a separate table sounds like a fine plan to reduce redo. You can truncate individual group table following each aggregation.
Another (but I think probably worse) option is to create a new staging table with the groups that haven't been aggregated then drop the original and rename the new table to replace the staging table.
I prefer Justin's suggestion ("logical delete"), but another option to consider might be a partitioned table, if you have the EE licence. The aggregation process could drop a partition instead of deleting the rows.