Single Schedule, Multiple Entries in ODI - Oracle

I am scheduling a mapping. These are the steps I have taken:
1. Create a scenario of the mapping.
2. Create a schedule for the scenario:
i. In Definition:
a. Context: Global
b. Log Level: 5
c. Status: Active
ii. In Execution Style:
a. Repetition: Many Times, Interval between repetitions: 1 minute
Now, when I go to the Schedule window of the agent, I see two schedules active for the same mapping. Why? Shouldn't there be only one schedule here?
I see only one schedule in the agent only if I make the changes in Definition and leave Execution Style empty.
Tool: ODI 12c
Images: Issue images

You can see in the third part of your picture, in the Schedule window, that the Start Time column shows a different time for the two lines. That view lists all of your executions.
So you don't have two schedules, but one schedule that ran twice on the same day.

Related

Updating the Same Record Multiple Times at the Same Time on Oracle

I have a voucher management system, and I also have a campaign on that system that works on a first-click, first-served basis. In the first minutes after the campaign starts, many people try to get vouchers, and every time they do, the same mechanism shown in the design image runs.
When updating the number of vouchers given out, I noticed a bottleneck. All of these updates try to modify the same row within a short window of time. Because of that, transactions queue up and wait for the current update to finish.
After the campaign, I saw that some updates had waited for 10 seconds. How can I solve this?
First, I tried to minimize the query execution time, but it is already a simple query.
Assuming everyone is doing something like:
SELECT ..
FROM voucher_table
WHERE <criteria indicating the first order>
FOR UPDATE
then obviously everyone (except one person) is going to queue up until the first person commits. Effectively you end up with a single-user system.
You might want to check out the SKIP LOCKED clause, which allows a process to attempt to lock an eligible row but still skip over it to the next eligible row if the first one is locked.
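As a rough sketch of that idea (not from the original answer), assuming a hypothetical voucher_table with voucher_id and claimed_by columns, a session could claim the first unlocked available voucher like this:
DECLARE
  -- Hypothetical columns; the real availability criteria belong in the WHERE clause.
  CURSOR c_voucher IS
    SELECT voucher_id
    FROM   voucher_table
    WHERE  claimed_by IS NULL        -- placeholder for "still available"
    ORDER  BY voucher_id
    FOR UPDATE SKIP LOCKED;          -- skip rows already locked by other sessions
  l_voucher_id voucher_table.voucher_id%TYPE;
BEGIN
  OPEN c_voucher;
  FETCH c_voucher INTO l_voucher_id; -- first eligible row that is not locked, if any
  IF c_voucher%FOUND THEN
    UPDATE voucher_table
    SET    claimed_by = USER
    WHERE  CURRENT OF c_voucher;
  END IF;
  CLOSE c_voucher;
  COMMIT;
END;
/
Each concurrent session then locks a different row instead of queueing behind the same one, which is the point of SKIP LOCKED.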

Plugin to trigger only once on create / update of multiple records in entity / table x

We have an entity which holds 3 types of records: A, B, and C. Record type A is the parent of B (each A can be the parent of multiple B records) and B in turn is the parent of C (each B can be the parent of multiple C records). On creation / update of every C record, the CalculateCommercials plugin runs; it pulls all the sibling C records under a given B record, aggregates / rolls up the totals, and updates that parent B record with the rolled-up sums / totals. The same thing happens on update of the B record totals, which rolls the totals up to the grandparent A record. The problem is that when we create / update multiple C records under a given parent B, this triggers multiple CalculateCommercials plugin instances, which is inefficient and resource intensive. Is there a better approach where we let CalculateCommercials trigger only once, irrespective of the number of C records being created / updated?
For example, if we create / update 10 C records at a given time, we want the CalculateCommercials plugin to run only once to roll up the totals to B, and again only once to update A, instead of 10 times (currently it triggers 10 CalculateCommercials plugin instances to roll up to B, which in turn trigger another 10 instances to update the grandparent record A).
Sometimes this chain of auto-triggers results in exceeding the 2-minute time limit for a plugin instance. Is there a better approach to simplify rolling up the totals to parent B and then on to A?
A lot depends upon how the rollup fields are used, and their consistency requirements.
Simplest approach is to plan for eventual consistency rather than strict consistency. The update-grandchild plugin would merely mark a "dirty" field in the grandchild entity, which requires no additional DB locking. Separately, a scheduled workflow or Power Automate flow runs every 'N' minutes, which finds all the dirty grandchildren, updates their parents' rollup fields, and resets the dirty fields. Because this workflow is scheduled and nothing else takes a write lock on the rollup fields, there is little contention on the parent and grandparent records. However, it means that rollup values are always a few minutes out-of-date if the grandchildren are constantly changing.
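To make the shape of that scheduled sweep concrete, here is a purely illustrative SQL sketch, assuming the data lived in hypothetical relational tables child_c (id, parent_b_id, amount, is_dirty) and parent_b (id, total_amount); in Dynamics you would express the same logic through the SDK or a flow rather than direct SQL:
-- Roll up totals for every B that has at least one dirty C, then clear the flags.
UPDATE parent_b b
SET    b.total_amount = (SELECT NVL(SUM(c.amount), 0)
                         FROM   child_c c
                         WHERE  c.parent_b_id = b.id)
WHERE  b.id IN (SELECT DISTINCT parent_b_id
                FROM   child_c
                WHERE  is_dirty = 1);

-- A real implementation would need to handle rows marked dirty between these two statements.
UPDATE child_c
SET    is_dirty = 0
WHERE  is_dirty = 1;

COMMIT;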
D365 seems to implement a "rollup field" concept that does exactly this: https://www.encorebusiness.com/blog/rollup-fields-in-microsoft-dynamics-365-crm/ The scheduled task runs every hour by default, but can be configured by an admin.
To improve the update latency, you could make the flow on-demand or triggered by updates to the relevant grandchild fields, but have it check & update a "flow already triggered" table to ensure that multiple flow instances aren't trying to do the updates simultaneously. The "winning" instance of the flow sets the "running" flag, loops repeatedly until it finds no more dirty records, and then clears the "running" flag. The "losing" instances terminate immediately when they see that another instance is active.
This pattern requires a lot more care to handle potential race conditions related to records being marked dirty after the running flag is cleared, as well as transient errors that might terminate the flow before it completes. Again, a simple solution is to schedule a "sweeper" instance once per hour, or day.
If you need it to be strictly consistent, then this whitepaper might help: https://download.microsoft.com/download/D/6/6/D66E61BA-3D18-49E8-B042-8434E64FAFCA/ScalableDynamicsCRMCustomizations.docx It discusses the tradeoffs inherent in various locking and registration options for plugins.
If the rollup fields are rarely consumed, then it might be more efficient to recalculate the field only when a client requests that field while reading the parent or grandparent entity, using a separate plugin on those entities. You'd want to ensure that clients don't request that field unless they need it.

How to process a logic or job periodically for all users at a large scale?

I have a large set of users in my project, around 50 million.
I need to create a playlist for each user every day. To do this, I'm currently using this method:
I have a column in my users table that holds the last time a playlist was created for that user; I call it last_playlist_created_at.
I run a query on the users table that selects the top 1000 users whose last_playlist_created_at is more than one day old, sorted in ascending order by last_playlist_created_at (a sketch of this query is shown after these steps).
After that, I loop over the result and publish a message for each user to my message broker.
Behind the message broker, I run around 64 workers that process the messages (create a playlist for the user) and update last_playlist_created_at in the users table.
Once the message broker's queue is empty, I repeat these steps (a while / do-while loop).
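A sketch of that selection step, written in Oracle syntax for consistency with the rest of this page; the table and column names come from the description above, and the user_id column is assumed:
SELECT user_id
FROM   users
WHERE  last_playlist_created_at < SYSDATE - 1   -- more than one day old
ORDER  BY last_playlist_created_at ASC
FETCH  FIRST 1000 ROWS ONLY;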
I think the processing method is good enough and can scale well,
but the method we use to create the message for each user is not scalable!
What should I do to dispatch such a large set of messages, one per user?
OK, so my answer is based entirely on your comment, where you mentioned that you use while(true) to check whether a playlist needs to be updated, which does not seem like a trivial approach.
Although this is a design question and there are multiple solutions, here's how I would solve it.
First up, think of updating the playlist for a user as a job.
Now, in your case this is a scheduled job, i.e. once a day.
So, use a scheduler to schedule the next job time.
Write a scheduled job handler to push the job to a message queue. This part just lets you handle multiple jobs at the same time and control the flow.
Generate the playlist for the user based on the job, then create a schedule event for the next day.
You could persist the scheduled-job data just to avoid race conditions (a sketch of such a table follows).
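As a minimal sketch of that persistence, assuming a hypothetical scheduled_jobs table (all names and the status values are placeholders):
-- One row per user per run, with a status so two workers cannot pick up the same job.
CREATE TABLE scheduled_jobs (
  job_id        NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  user_id       NUMBER       NOT NULL,
  scheduled_at  TIMESTAMP    NOT NULL,
  status        VARCHAR2(20) DEFAULT 'PENDING' NOT NULL,  -- PENDING / RUNNING / DONE
  completed_at  TIMESTAMP
);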

Oracle DBMS_SCHEDULER job to monitor a DBMS_ALERT

Env: Oracle 12c R2
I'm trying to understand the best approach for setting up an Oracle DBMS_SCHEDULER job that monitors a DBMS_ALERT raised by a trigger when a specific column value changes within a table.
The thing is, this column value will sometimes change frequently and sometimes only twice a day, but either way I need to monitor the change via DBMS_ALERT.
The trigger I have is shown below, and I have a procedure called check_signal that checks for the signal, which I wish to use within the DBMS_SCHEDULER job.
The goal I am trying to achieve is that I will have a situation where I need to run, say, three jobs:
Job1
Job2
Job3
The thing is, the payload returned from Job1 is required and passed as parameters into Job2, and likewise the payload returned from Job2 is required and passed as parameters into Job3.
It is this wait/alert behaviour that I am trying to achieve through the use of DBMS_ALERT.
create or replace trigger my_tab_upd
  after update of status on my_tab
  for each row
begin
  dbms_alert.signal('mystatusalert', 'changed from '||:old.status||' to '||:new.status||'.');
end;
/
This will be used via a web-based application which is used by multiple users.
I'm just unsure how to set up this scheduled job so that it continuously checks for the alert and can then be used within the web app (a rough sketch of such a job is shown below).
If there is a better means than DBMS_ALERT, then please let me know.
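Purely as an illustration of the mechanics being asked about, a repeating DBMS_SCHEDULER job that calls the check_signal procedure might be declared like this; the job name, the interval, and the assumption that check_signal takes no parameters are all placeholders:
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'CHECK_SIGNAL_JOB',          -- placeholder name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN check_signal; END;',  -- assumes a no-argument procedure
    repeat_interval => 'FREQ=SECONDLY; INTERVAL=10',
    enabled         => TRUE);
END;
/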
The general answer is simple: when polling for events every N seconds, you get an average delay of N/2 seconds and a maximum delay of N seconds.
In the context of DBMS_ALERT you should rethink this approach, because it would implement polling combined with a wait on the event.
The periodically executed job basically does two things (see the sketch after this list):
registers on an event name with DBMS_ALERT.REGISTER
waits with DBMS_ALERT.WAITONE
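A minimal sketch of such a job body, assuming the 'mystatusalert' event from the trigger above; the 10-second timeout is just an example value:
DECLARE
  l_message VARCHAR2(1800);
  l_status  INTEGER;
BEGIN
  DBMS_ALERT.REGISTER('mystatusalert');
  -- status = 0 means the alert fired, 1 means the timeout expired
  DBMS_ALERT.WAITONE(name    => 'mystatusalert',
                     message => l_message,
                     status  => l_status,
                     timeout => 10);
  IF l_status = 0 THEN
    NULL;  -- process l_message here, e.g. by calling check_signal
  END IF;
  DBMS_ALERT.REMOVE('mystatusalert');
END;
/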
Assume that the DBMS_SCHEDULER job runs every 10 seconds and is started in a phase with frequent signalling, so the first execution returns quickly after receiving an event.
The second execution falls in a quiet period, so the job may wait for hours to get an event.
I think this is not what you expect, because:
1) the waiting job will hold an open session, which is what you want to avoid, as follows from your other question.
You could use timeout => 0 in DBMS_ALERT.WAITONE, but then it will return almost no events, except those that happen to fire between the REGISTER and WAITONE calls.
2) if two events are signalled within the first 10 seconds, the second one will be lost, because at signalling time the subscribing job is not active and no registration exists.

How to order ETL tasks in SQL Server Data Tools (Integration Services)?

I'm a newbie to ETL processing. I am trying to populate a data mart through ETL and have hit a bump. I have 4 ETL tasks (each task filling a particular table in the mart), and the problem is that I need to perform them in a particular order so as to avoid constraint violations such as foreign key violations. How can I achieve this? Any help is really appreciated.
This is a snap of my current ETL:
Create a separate Data Flow Task for each table you're populating in the Control Flow, and then simply connect them together in the order you need them to run in. You should be able to just copy/paste the components from your current Data Flow to the new ones you create.
The connections between Tasks in the Control Flow are called Precedence Constraints, and if you double-click on one you'll see that they give you a number of options on how to control the flow of your ETL package. For now though, you'll probably be fine leaving it on the defaults - this will mean that each Data Flow Task will wait for the previous one to finish successfully. If one fails, the next one won't start and the package will fail.
If you want some tables to load in parallel, but then have some later tables wait for all of those to be finished, I would suggest adding a Sequence Container and putting the ones that need to load in parallel into it. Then connect from the Sequence Container to your next Data Flow Task(s) - or even from one Sequence Container to another. For instance, you might want one Sequence Container holding all of your Dimension loading processes, followed by another Sequence Container holding all of your Fact loading processes.
A common pattern goes a step further than using separate Data Flow Tasks. If you create a separate package for every table you're populating, you can then create a parent package, and use the Execute Package Task to call each of the child packages in the correct order. This is fantastic for reusability, and makes it easy for you to manually populate a single table when needed. It's also really nice when you're testing, as you don't need to keep disabling some Tasks or re-running the entire load when you want to test a single table. I'd suggest adopting this pattern early on so you don't have a lot of re-work to do later.
