I have two PL/SQL systems, residing in two separate databases. SystemA will need to populate SystemB's tables. This will probably be done over a datalink. Everytime a set of records is inserted in SystemB's tables, a process in SystemB must run. I could wait for SystemA to complete and then run a script to start processing in SystemB, but since SystemA could spend many hours processing and then populating SystemB, I'd rather that SystemB handle each set of records as soon as they become available (each set can be processed indpendently of the others so this should work OK).
What I'm not sure of is how I can do even-driven programming in PL/SQL. I'd need SystemA to notify SystemB that a set is ready for processing. My first idea was to have a special "event" table in SystemB and then when SystemA finishes a set, it inserts into the "event" table and there is a trigger on insert that starts the process (and the process could be a long one, possibly 5-10 minutes per process) in SystemB. I don't have enough experience with triggers in Oracle to know if this is an established way of doing it, OR if there's a better mechanism. Suggestions? Tips? Advice?
Use Oracle Advanced Queuing; it's designed for this. I believe you'll still have to set up a database link between the two systems (from B to A in this case, to consume the queue on A).
Yes, Oracle Advance Queues or even having A submit a venerable Oracle Job to B would be a better idea.
And, if your process is going to be needing complete replication of the data from A to B, then you might want to look something like an Oracle Streams process to copy over the data and then do the processing.
Related
We're supposed to update some columns in a table 'tab1' with some values(which can be picked up from a different table 'tab2'). Now 'tab1' is getting new records inserted almost every few seconds(from MQ by a different system).
We want to design a solution that will update 'tab1' as soon as there is a new record added to 'tab1'. It doesn't have to be done in the same moment as the record is added, but the sooner its updated, the better. We were considering what can be the best way to do it:
1) First we thought of a 'before insert' trigger on tab1, so we can update the record - but that design was vetted out by our Architect, since the organization doesn't allow use of database triggers(don't know why, but that is a restriction, we have been asked to live with)
2) Second we thought, we will create a stored procedure which will perform the updates to records in 'tab1'. This stored procedure will be called within an long-running loop from a shell script. After every iteration there will be a pause of lets say 3 secs and then next loop will kick off, which will again call the stored proc. So this job will run 12 AM to 11:59 PM and then restarted every night.
My question is - is there a database only solution to this? Any other solutions are also welcome, but simplicity of design will be a huge plus. One colleague was wondering if there is a 'trigger-like' solution, which will perform the job within the database itself - so we don't have to write a shell script.
Any pointers will be appreciated!
Triggers The obvious solution.
DBMS_SCHEDULER Another obvious solution.
Continuous Query Notification This would be a "trigger-like" solution. It's meant to call an application when the results of a specific query would be different. But you can call PL/SQL instead of an application, and the query could be a simple select * from tab1; which would fire on any table changes. Normally I'd hope an architect would be to look at this solution and say, "a trigger would be a lot simpler".
DBMS_JOBS This is the old version of DBMS_SCHEDULER and is not as good. But it's different and maybe it won't be caught as an unauthorized feature.
Ignore the Architect The problem isn't that he disapproved of using triggers or jobs; there may be legitimate reasons to ban those technologies. The problem is that he rejected a sound idea without clearly articulating why it wasn't allowed. If he understood databases, or cared about your project, or acted like a professional, he would have said something like, "Oh, I'm sorry, I know that's the typical way to do this, but we don't allow it because of X, Y, Z."
To answer your questions:
Q: Is there a database only solution to this?
Unlikely, given all the limitations on your architecture.
Q: Any other solutions are also welcomed
It seems your likely solution is to have your application handle what would normally be handled by a trigger or stored procedure. Just do it all in one transaction.
I am considering using an Oracle database to synchronize concurrent operations from two or more web applications on separate servers. The database is the single infrastructure element in common for those applications.
There is a good chance that two or more applications will attempt to perform the same operation at the exact same moment (cron invoked). I want to use the database to let one application decide that it will be the one which will do the work, and that the others will not do it at all.
The general idea is to perform a somehow-atomic and visible to all connections select/insert with node's ID. Only node which has the same id as the first inserted node ID returned by select would be do the work.
It was suggested to me that a merge statement can be of use here. However, after doing some research, I found a discussion which states that the merge statement is not designed to be called
Another option is to lock a table. By definition, only one node will be able to lock the server and do the insert, then select. After the lock is removed, other instances will see the inserted value and will not perform work.
What other solutions would you consider? I frown on workarounds with random delays, or even using oracle exceptions to notify a node that it should not do the work. I'd prefer a clean solution.
I ended up going with SELECT FOR UPDATE. It works as intended. It is important to remember to commit the transaction as soon as the needed update is made, so that other nodes don't hang waiting for the value.
I need your opinion in this situation. I’ll try to explain the scenario. I have a Windows service that stores data in an Oracle database periodically. The table where this data is being stored is partitioned by date (Interval-Date Range Partitioning). The database also has a dbms_scheduler job that, among other operations, truncates and drops older partitions.
This approach has been working for some time, but recently I had an ORA-00054 error. After some investigation, the error was reproduced with the following steps:
Open one sqlplus session, disable auto-commit, and insert data in the
partitioned table, without committing the changes;
Open another sqlplus session and truncate/drop an old partition (DDL
operations are automatically committed, if I’m not mistaken). We
will then get the ORA-00054 error.
There are some constraints worthy to be mentioned:
I don’t have DBA access to the database;
This is a legacy application and a complete refactoring isn’t
feasible;
So, in your opinion, is there any way of dropping these old partitions, without the risk of running into an ORA-00054 error and without the intervention of the DBA? I can just delete the data, but the number of empty partitions will grow everyday.
Many thanks in advance.
This error means somebody (or something) is working with the data in the partition you are trying to drop. That is, the lock is granted at the partition level. If nobody was using the partition your job could drop it.
Now you say this is a legacy app and you don't want to, or can't, refactor it. Fair enough. But there is clearly something not right if you have a process which is zapping data that some other process is using. I don't agree with #tbone's suggestion of just looping until the lock is released: you can't just get rid of data which somebody is using with establishing why they are still working with data that they apparently should not be using.
So, the first step is to find out what the locking session is doing. Why are they still amending this data your background job wants to retire? Here's a script which will help you establish which session has the lock.
Except that you "don't have DBA access to the database". Hmmm, that's a curly one. Basically this is not a problem which can be resolved without DBA access.
It seems like you have several issues to deal with. Unfortunately for you, they are political and architectural rather than technical, and there's not much we can do to help you further.
How about wrapping the truncate or drop in pl/sql that tries the operation in a loop, waiting x seconds between tries, for a max num of tries. Then use dbms_scheduler to call that procedure/function.
Maybe this can help. Seems to be the same issue as the one that you discribe.
(ignore the comic sans, if you can) :)
I'm trying to create a Ruby script that spawns several concurrent child processes, each of which needs to access the same data store (a queue of some type) and do something with the data. The problem is that each row of data should be processed only once, and a child process has no way of knowing whether another child process might be operating on the same data at the same instant.
I haven't picked a data store yet, but I'm leaning toward PostgreSQL simply because it's what I'm used to. I've seen the following SQL fragment suggested as a way to avoid race conditions, because the UPDATE clause supposedly locks the table row before the SELECT takes place:
UPDATE jobs
SET status = 'processed'
WHERE id = (
SELECT id FROM jobs WHERE status = 'pending' LIMIT 1
) RETURNING id, data_to_process;
But will this really work? It doesn't seem intuitive the Postgres (or any other database) could lock the table row before performing the SELECT, since the SELECT has to be executed to determine which table row needs to be locked for updating. In other words, I'm concerned that this SQL fragment won't really prevent two separate processes from select and operating on the same table row.
Am I being paranoid? And are there better options than traditional RDBMSs to handle concurrency situations like this?
As you said, use a queue. The standard solution for this in PostgreSQL is PgQ. It has all these concurrency problems worked out for you.
Do you really want many concurrent child processes that must operate serially on a single data store? I suggest that you create one writer process who has sole access to the database (whatever you use) and accepts requests from the other processes to do the database operations you want. Then do the appropriate queue management in that thread rather than making your database do it, and you are assured that only one process accesses the database at any time.
The situation you are describing is called "Non-repeatable read". There are two ways to solve this.
The preferred way would be to set the transaction isolation level to at least REPEATABLE READ. This will mean that any row that concurrent updates of the nature you described will fail. if two processes update the same rows in overlapping transactions one of them will be canceled, its changes ignored, and will return an error. That transaction will have to be retried. This is achieved by calling
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
At the start of the transaction. I can't seem to find documentation that explains an idiomatic way of doing this for ruby; you may have to emit that sql explicitly.
The other option is to manage the locking of tables explicitly, which can cause a transaction to block (and possibly deadlock) until the table is free. Transactions won't fail in the same way as they do above, but contention will be much higher, and so I won't describe the details.
That's pretty close to the approach I took when I wrote pg_message_queue, which is a simple queue implementation for PostgreSQL. Unlike PgQ, it requires no components outside of PostgreSQL to use.
It will work just fine. MVCC will come to the rescue.
Oracle has two seemingly competing technologies. CDC and DCN.
What are the strengths of each?
When would you use one and not the other?
In general, you would use DCN to notify a client application that the client application needs to clear/ update the application's cache. You would use CDC for ETL processing.
DCN would generally be preferable when you have an OLTP application that needs to be notified immediately about data changes in the database. Since the goal here is to minimize the number of network round-trips and the number of database hits, you'd generally want the application to use DCN for queries which either are mostly static. If a large fraction of the query is changing regularly, you may be better off just refreshing the application's cache on a set frequency rather than running queries constantly to get the changed data (DCN does not contain the changed data, just the ROWID of the row(s) that changed). If the application goes down, I believe DCN allows changes to be lost.
CDC would generally be preferable when you have a DSS application that needs to periodically pull over all the data that changed in a number of tables. CDC can guarantee that the subscriber has received every change to the underlying table(s) which can be important if you are trying to replicate changes to a different database . CDC allows the subscriber to pull the changes at its convenience rather than trying to notify the subscriber that there are changes, so you'd definitely want CDC if you wanted the subscriber to process new data every hour or every day rather than in near real time. (note: DCN also has a guaranteed delivery mode, see comments below. --Mark Harrison)
CDC seems to be much more complex to set up than DCN.
I mean to setup DCN I wrap a select in a start and end DCN block and then write a procedure to be called with a collect of changes. That's it.
CDC requires publishers and subscribers and anyways, seems like more work.