Correct SSIS executing tasks in random order - visual-studio

This is a repost of a question from 8 years ago. The solutions provided there didn't work for me, and maybe by now there are more alternatives for me and for the other people who had this problem and couldn't solve it either.
I have six Data Flow Tasks as shown in the following screenshot:
They execute in a different order every time I execute them, and the first one even executes twice. I've recreated the tasks, hoping it was SSIS executing them in creation order.
They run in a random order each time I execute the package despite the Precedence Constraints, so I decided to recreate the WHOLE package. Failed as well.
It simply feels like Microsoft is messing with me, since I can't find another explanation.
Any help provided will be a relief for me if my post is not voted as Redundant.
/Edited in order to add info/
My real problem is SSIS not inserting data in a defined order. It just executes the data inserts as it pleases, and I do need the data to be natively stored in a specified order. I've done it before, I just don't get why this time is different. I could run an ORDER BY to get the data as I want, but I'm not the one who'll be accessing the data, so I can only hope that whoever extracts and prints the data notices that.
The biggest issue, however, is SSIS executing a random task twice, as I can't have duplicate data for any reason, since it will later be used for summarizing as well. (I suspect this is connected to the random execution order, since the guy who posted the original question had the exact same issue as me.)
The real way to notice these issues is not looking at the SSIS processes, but looking at the data stored in the DB. Sorry if I was unclear about my problem.

The SSIS log doesn't show you the tasks in the order that they run in. In your screenshot above it looks like it put them in alphabetical order, in fact.
Just because Abril is above Enero in your execution log doesn't mean that Abril ran first and Enero ran second.
Addendum based on comments below:
You are under the misconception that if you INSERT data into a database in a certain order, that when you SELECT that data without specifying an ORDER BY, you will get the data in the order it was inserted. This turns out not to be the case. The ONLY guaranteed way to get data from a database in a certain order is to use an ORDER BY clause when you SELECT it.
Let me be perfectly clear about this. When you say "I get my data from March being listed first than my data from January, meaning it was inserted first", you are wrong.
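To make the point concrete, here is a minimal T-SQL sketch. The table `dbo.MonthlySales` and its columns are hypothetical names invented for illustration; they are not taken from the package in the question.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.MonthlySales (
    MonthNumber tinyint       NOT NULL,  -- 1 = Enero, 3 = Marzo, 4 = Abril, ...
    MonthName   varchar(20)   NOT NULL,
    Amount      decimal(12,2) NOT NULL
);

-- Rows inserted in an arbitrary order.
INSERT INTO dbo.MonthlySales (MonthNumber, MonthName, Amount) VALUES (3, 'Marzo', 175.00);
INSERT INTO dbo.MonthlySales (MonthNumber, MonthName, Amount) VALUES (1, 'Enero', 100.00);
INSERT INTO dbo.MonthlySales (MonthNumber, MonthName, Amount) VALUES (4, 'Abril', 250.00);

-- Without ORDER BY the engine may return rows in any order,
-- and that order can change between executions.
SELECT MonthName, Amount
FROM dbo.MonthlySales;

-- With ORDER BY the order is guaranteed, whatever the insert order was.
SELECT MonthName, Amount
FROM dbo.MonthlySales
ORDER BY MonthNumber;
```

If someone else will be reading the data, one option is to hand them a view or query that already contains the ORDER BY-friendly sort column (here `MonthNumber`) so they do not have to guess.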
As for why your January data seems to be getting inserted twice, we would need to see the details of all the working parts: the original source data, the destination data before insert, the destination data after insert, and the SSIS package that does the insert. Without enough information to reproduce the issue ourselves, there is no way we can help you understand why it is happening in your package.
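As a starting point for gathering that information, a query along these lines can confirm whether rows really are duplicated in the destination. It is only a sketch: `dbo.MonthlySales`, `CustomerId` and the other column names are hypothetical and would need to be replaced with the real destination table and the columns that identify a row.

```sql
-- Hypothetical destination table and columns.
-- Lists every row that appears more than once, with its number of copies.
SELECT MonthNumber,
       CustomerId,
       Amount,
       COUNT(*) AS Copies
FROM dbo.MonthlySales
GROUP BY MonthNumber, CustomerId, Amount
HAVING COUNT(*) > 1
ORDER BY MonthNumber, CustomerId;
```

Running it before and after the package executes shows whether the duplicates come from this run or were already in the table.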

Related

Pentaho Dimension Lookup/Update deadlock error

I have a Dimension Lookup/update step, and I am trying to update a table with data from JSON files, but it is failing with the following error:
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - ERROR (version 9.1.0.0-324, build 9.1.0.0-324 from 2020-09-07 05.09.05 by buildguy) : Because of an error this step can't continue:
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - Couldn't get row from result set
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - Transaction (Process ID 78) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
This is the configuration of the Dimension Lookup/update step.
and this is part of the transformation
If I use only one copy to start the step, everything works fine, but if I use more than one it gives me the error above. The thing is that the error seems to be random: sometimes it crashes after inserting two rows, other times it inserts everything without any error.
Searching the documentation and the internet didn't help much, and I was not able to fix it. I read that it could be an insertion order problem or a primary key related problem, but the data is fine (the keys are unique) and the configuration of the step seems OK. What I noticed is that it does not insert the Technical key in order. I think that is because it depends on which process finishes first, but I can't find a way to force the order (assuming this is the problem).
Does anyone know what the problem is here, and how I could fix it? Thank you.
Don’t run multiple copies of the Lookup/Update step. It has a commit size of 100, so if you have 2 copies of the step you have two threads concurrently trying to update the same table. Most likely one of them is locking the table (or a block of rows) that the other tries to write, and that lock conflict is what gets one of them chosen as the deadlock victim.
Why it sometimes crashes and sometimes works? It’s actually a bit random: each copy receives a batch of rows to act upon, and it depends on which rows are sent to each copy and how many updates are required.
So I finally managed to solve the problem. It was not strictly related to Pentaho but to SQL Server. In particular, I had to redefine the index on the table into which the rows were being inserted. You will find more details in this answer: Insert/Update deadlock with SQL Server

PL/SQL Developer statements sometimes do not commit or "stick"

I apologize if this is too vague, but it is a random issue that occurs with many types of statements. Google and Stack Overflow searches have failed me. Here is what I am experiencing, I hope that someone out there has seen or at least heard of this happening and possibly knows of a solution.
From time to time, with no apparent rhyme or reason, statements that I run through PL/SQL Developer against our Oracle databases do not "stick". Last week I ran an update on table A, a commit for the update statement, then a truncate on table B and an insert to table B followed by another commit. Everything seemed to work fine, as in I received no errors. I was, of course, able to query the changes and see that they were made. However, upon logging out and then back in, the changes had not been committed. Even the truncate command had not "stuck", and truncates do not need a commit.
Some details that may be helpful: I am logging into the database server through PL/SQL on a shared account that is used by my team only to gain access to the schema (multiple schemas on each server, each schema has one shared login/PW). Of the 12 people on my team, I am the only one experiencing this issue. I have asked our database administration team to investigate my profile setup and have been told that my profile looks the same as my teammates' profiles. We are forced to go through Citrix to connect to our production database servers. I can only have one instance of PL/SQL open at any time through Citrix, so I typically have PL/SQL connected to several schemas, but I have never been running SQL on more than one schema simultaneously. I'm not even sure if that's possible, but I thought I would mention it. I typically have 3-4 windows open within PL/SQL, each connected to a different schema.
My manager was directly involved in a case where something similar to this happened. I ran four update commands, and committed each one in between; then he ran a select statement only to find that my updates had not actually committed.
I hope that one of my fellow Overflowers has seen or heard of this issue, or at least may be able to provide me with a direction to follow to attempt to get to the bottom of this.
"it has begun to reflect poorly on me and damage my reputation in the company."
What would really reflect poorly on you would be you believing that an Oracle RDBMS is a magical or random device, or, even worse, sentient and conducting a personal vendetta against you. Computers may seem vindictive but that is always us projecting onto them ;-)
The way to burnish your reputation would be through an informed investigation of the situation. Databases do not randomly lose transactions. So, what is going on?
Possible culprits:
Triggers: does table A have an UPDATE trigger which suppresses some of your SQL?
Synonyms: are tables A and B really the tables you think they are?
Ownership: are these tables in another schema which has row level security enabled (although that should throw an error message if you violate a policy)?
PL/SQL Developer configuration: is the IDE hiding error messages or are you not spotting them?
Object types: are tables A and B really tables? Could they be views with INSTEAD OF triggers suppressing some of your SQL?
Object types: or could A and B be materialized views and your session has QUERY_REWRITE_INTEGRITY=stale_tolerated?
If that last one seems a bit of a stretch, there are other similarly esoteric explanations, involving data flashback, pipelined functions and other malarkey. This is a category of explanation which indicates a colleague is pranking you.
How to proceed:
Try different tools. SQL*Plus (or the new SQL Command Line) may produce a different outcome. Rule out PL/SQL Developer.
Write some test cases. Strive to establish reproducible test cases: given a certain set-up this SQL statement always leads to a given outcome (SQL always sticks or always does not).
Eliminate bugs or "funnies" in the queries you use to check the results.
Use the data dictionary to understand the characteristics and associated objects of the troublesome tables. You need to understand what causes the different outcomes. What distinguishes a row where the UPDATE holds compared to one where it does not?
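As a rough sketch of that data dictionary step, the queries below check the possible culprits listed earlier. They assume the objects are literally named `A` and `B` (placeholders for the real names) and that your account can see them through the `ALL_` views.

```sql
-- What kind of object is each name really?
-- (TABLE, VIEW, SYNONYM, MATERIALIZED VIEW, ...)
SELECT owner, object_name, object_type
FROM all_objects
WHERE object_name IN ('A', 'B');

-- Do the names resolve through synonyms to objects elsewhere?
SELECT owner, synonym_name, table_owner, table_name, db_link
FROM all_synonyms
WHERE synonym_name IN ('A', 'B');

-- Are there triggers (including INSTEAD OF triggers on views)?
SELECT owner, trigger_name, trigger_type, triggering_event, status
FROM all_triggers
WHERE table_name IN ('A', 'B');

-- Are row level security policies attached?
SELECT object_owner, object_name, policy_name, enable
FROM all_policies
WHERE object_name IN ('A', 'B');
```

More than one row for the same name in the first query (for example a table in one schema and a synonym in another) is a strong hint that the statement and the check are not hitting the same object.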
I have used PL/SQL Developer for over a decade and I have never known it silently undo successful truncate operations. If it can do that, AA should add it as a menu item. It seems more likely that you ran the commands against the wrong database connection.
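One cheap way to rule the wrong-connection theory in or out is to run something like the following in the same window immediately before the DML and again before the verification query. The column aliases are made up for readability; the `SYS_CONTEXT` parameters are standard `USERENV` attributes.

```sql
-- Which user, schema, database and server is THIS window talking to?
SELECT SYS_CONTEXT('USERENV', 'SESSION_USER')   AS logged_in_as,
       SYS_CONTEXT('USERENV', 'CURRENT_SCHEMA') AS current_schema_name,
       SYS_CONTEXT('USERENV', 'DB_NAME')        AS database_name,
       SYS_CONTEXT('USERENV', 'INSTANCE_NAME')  AS instance_name,
       SYS_CONTEXT('USERENV', 'SERVER_HOST')    AS server_host
FROM dual;
```

If the two outputs differ, the update and the check ran against different places.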
I can feel your frustration, sorry you're going through this. I am surprised, however, that at a large company, your change control process is like this. I don't work for a large multi-national company, but any changes done to a production database are first approved by management and run by the DBAs (or in your case, your team). Every script that is run does a few things:
Lists the database instance information it's connecting to. For example:
select host_name, instance_name, version, startup_time from v$instance;
Spools the output to a file (the DBAs typically use sqlplus, but I'm sure PL/SQL Developer can do the same)
Shows the current date and time (in the beginning and end of the script)
The output file is saved to a change control server (the directory structure makes it easy to pull any changes for a given instance and/or given timeframe)
Exits on any errors:
WHENEVER SQLERROR EXIT SQL.SQLCODE
Any additional checks that need to be run post script (select counts, etc)
Shows each command that is being run (set echo on), including the commits!
All of this would allow you to not only verify that the script was run successfully, but would allow you to CYOA. Perhaps you can talk with your team about putting some of this in place in your own environment. Hope that helps.
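Put together, a script following that checklist might look roughly like this in SQL*Plus. It is a sketch only: the file names, `table_a`, and the `UPDATE` itself are hypothetical stand-ins for the real change.

```sql
-- change_script.sql (hypothetical file name); run with: sqlplus user@db @change_script.sql
SET ECHO ON
WHENEVER SQLERROR EXIT SQL.SQLCODE
SPOOL change_script.log

-- Start time and the instance we are actually connected to.
SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS') AS started_at FROM dual;
SELECT host_name, instance_name, version, startup_time FROM v$instance;

-- The change itself (hypothetical table and columns).
UPDATE table_a SET status = 'DONE' WHERE id = 42;
COMMIT;

-- Post-script check, then the end time.
SELECT COUNT(*) AS rows_done FROM table_a WHERE status = 'DONE';
SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS') AS finished_at FROM dual;

SPOOL OFF
EXIT
```

The spooled log then shows every statement, the commit, and which instance they ran against, which is exactly the evidence that was missing in the situation described in the question.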
I have no way of knowing if my issue is fixed or not, but here is what I've done:
1. I contacted our company's Citrix team to request that they give my team the ability to have several instances of PL/SQL open. This has been done and so will eliminate the need for one instance with multiple DB connections.
2. I contacted the DBAs and had them remove my old profile, then create a new one with a new username.
So far, all SQL I've run under these new conditions has been just fine. However, I have no way of recreating the issue I'm experiencing so I am just continuing on about my business and hoping for the best.
Should I find a few months from now that I have not experienced this issue again I will update this post in case anyone else experiences it.
Thank you all for the accusations of operator error (screenshots prove that this is not operator error but why should you believe me when my own co-workers have accused me of faking the screenshots) and for the moral support.

A better approach than Oracle trigger

We're supposed to update some columns in a table 'tab1' with some values (which can be picked up from a different table 'tab2'). Now 'tab1' is getting new records inserted almost every few seconds (from MQ by a different system).
We want to design a solution that will update 'tab1' as soon as there is a new record added to 'tab1'. It doesn't have to be done in the same moment as the record is added, but the sooner it's updated, the better. We were considering what can be the best way to do it:
1) First we thought of a 'before insert' trigger on tab1, so we can update the record, but that design was vetoed by our Architect, since the organization doesn't allow use of database triggers (don't know why, but that is a restriction we have been asked to live with).
2) Second, we thought we would create a stored procedure which will perform the updates to records in 'tab1'. This stored procedure will be called within a long-running loop from a shell script. After every iteration there will be a pause of, let's say, 3 secs and then the next loop will kick off, which will again call the stored proc. So this job will run from 12 AM to 11:59 PM and then be restarted every night.
My question is - is there a database only solution to this? Any other solutions are also welcome, but simplicity of design will be a huge plus. One colleague was wondering if there is a 'trigger-like' solution, which will perform the job within the database itself - so we don't have to write a shell script.
Any pointers will be appreciated!
Triggers: The obvious solution.
DBMS_SCHEDULER: Another obvious solution.
Continuous Query Notification: This would be a "trigger-like" solution. It's meant to call an application when the results of a specific query would be different. But you can call PL/SQL instead of an application, and the query could be a simple select * from tab1; which would fire on any table changes. Normally I'd hope an architect would look at this solution and say, "a trigger would be a lot simpler".
DBMS_JOB: This is the old version of DBMS_SCHEDULER and is not as good. But it's different and maybe it won't be caught as an unauthorized feature.
Ignore the Architect: The problem isn't that he disapproved of using triggers or jobs; there may be legitimate reasons to ban those technologies. The problem is that he rejected a sound idea without clearly articulating why it wasn't allowed. If he understood databases, or cared about your project, or acted like a professional, he would have said something like, "Oh, I'm sorry, I know that's the typical way to do this, but we don't allow it because of X, Y, Z."
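For the DBMS_SCHEDULER option, a sketch of what the job could look like is shown below. Everything named here is an assumption for illustration: the procedure `update_tab1_from_tab2`, the job name, and the columns `key_col`, `some_col` and `some_val` are invented, since the question does not give the real schema. The 3-second interval mirrors the pause proposed for the shell loop.

```sql
-- Hypothetical procedure: fill in tab1 rows that have not been updated yet.
CREATE OR REPLACE PROCEDURE update_tab1_from_tab2 AS
BEGIN
  UPDATE tab1 t1
     SET t1.some_col = (SELECT t2.some_val
                          FROM tab2 t2
                         WHERE t2.key_col = t1.key_col)
   WHERE t1.some_col IS NULL
     AND EXISTS (SELECT 1
                   FROM tab2 t2
                  WHERE t2.key_col = t1.key_col);
  COMMIT;
END;
/

-- Scheduler job that calls the procedure every 3 seconds, inside the database.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'UPDATE_TAB1_JOB',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'UPDATE_TAB1_FROM_TAB2',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=SECONDLY;INTERVAL=3',
    enabled         => TRUE,
    comments        => 'Replaces the shell-script polling loop');
END;
/
```

This assumes the subquery returns at most one tab2 row per key. Compared with the shell loop, there is no external process to restart every night; the scheduler keeps the job running and records each run in its own log views.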
To answer your questions:
Q: Is there a database only solution to this?
Unlikely, given all the limitations on your architecture.
Q: Any other solutions are also welcomed
It seems your likely solution is to have your application handle what would normally be handled by a trigger or stored procedure. Just do it all in one transaction.

DataSet and Insert statements

I'm having some trouble with Visual Studio and the creation of DataSets from a database.
Whenever I create a new TableAdapter, the Insert method's parameters get, let's just say, messed up.
The database is an MS Access 2000 database file. If I create a new TableAdapter, everything works just fine. I select to create DatabaseDirect Methods and it all goes through without errors.
Then, I look at the statements. All perfectly fine. But then, I check the Insert method's parameters and I see this:
Parameter List http://img243.imageshack.us/img243/3175/paramlist.png
All the parameters are set to default Strings with no name. I have to rename and define all of their types over again.
The interesting thing is, this never affects the last parameter (as you see, Comment is not renamed etc.) and it only happens to the Insert method. When I check the Update method (which also uses the exact same parameters), they are all correctly named and the types also match the ones in the database.
Parameter list http://img816.imageshack.us/img816/853/paramlistnormal.png
Is this a known bug? Did I do something wrong when creating the TableAdapter?
You see, it's not that big an issue. I just can't understand why it works with every other method but not the Insert, and it is quite a fuss to rename and retype all of the parameters if you create a TableAdapter for a table that has significantly more fields than the 12 I showed you.
It looks like at least one other person has had a similar problem. Although this post doesn't specifically mention Access, the symptoms seem to be the same as what you've seen.
Unfortunately, there wasn't a clear solution listed there. The OP only says that he was able to call the automatically-generated Insert command, rather than trying to create his own Insert query, and so he did not need to resolve his original issue.
Also, he mentions that everything seems to work fine with all of the other tables in his database, and that this happens with only one table. That may mean that it's not an Access-specific issue, but rather that the tables in your database have something in common with the table in this post, and that common factor is what is preventing the TableAdapter from working as it should.

Fast query runs slow in SSRS

I have an SSRS report that calls out to a stored procedure. If I run the stored procedure directly from a query window, it will return in under 2 seconds. However, the same query run from an SSRS 2005 report takes up to 5 minutes to complete. This is not just happening on the first run, it happens every time. Additionally, I don't see this same problem in other environments.
Any ideas on why the SSRS report would run so slow in this particular environment?
Thanks for the suggestions provided here. We have found a solution and it did turn out to be related to the parameters. SQL Server was producing a convoluted execution plan when executed from the SSRS report due to 'parameter sniffing'. The workaround was to declare variables inside of the stored procedure and assign the incoming parameters to the variables. Then the query used the variables rather than the parameters. This caused the query to perform consistently whether called from SQL Server Management Studio or through the SSRS report.
I will add that I had the same problem with a non-stored procedure query - just a plain select statement. To fix it, I declared a variable within the dataset SQL statement and set it equal to the SSRS parameter.
What an annoying workaround! Still, thank you all for getting me close to the answer!
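For anyone who wants to see the shape of that workaround, here is a minimal T-SQL sketch. The procedure, table and column names (`dbo.GetOrdersByDate`, `dbo.Orders`, and so on) are hypothetical; only the pattern of copying parameters into local variables comes from the answer above.

```sql
-- Hypothetical procedure illustrating the local-variable workaround.
CREATE PROCEDURE dbo.GetOrdersByDate
    @StartDate datetime,
    @EndDate   datetime
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy the incoming parameters into local variables.
    -- The optimizer cannot "sniff" the values of local variables,
    -- so the plan is built from average statistics rather than
    -- from whatever values the first caller happened to pass.
    DECLARE @LocalStartDate datetime;
    DECLARE @LocalEndDate   datetime;
    SET @LocalStartDate = @StartDate;
    SET @LocalEndDate   = @EndDate;

    -- The query uses only the local variables, never the parameters.
    SELECT OrderId, OrderDate, Amount
    FROM dbo.Orders
    WHERE OrderDate >= @LocalStartDate
      AND OrderDate <  @LocalEndDate;
END;
```

The separate DECLARE and SET statements are deliberate, so the sketch also works on SQL Server 2005, which does not allow initializing a variable in its declaration.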
Add this to the end of your proc: option(recompile)
This will make the report run almost as fast as the stored procedure
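As a sketch of where that hint goes: it belongs at the end of the individual query inside the procedure, not after the procedure's final END. The names below (`dbo.GetOrdersForCustomer`, `dbo.Orders`) are hypothetical.

```sql
-- Hypothetical procedure with a statement-level recompile hint.
CREATE PROCEDURE dbo.GetOrdersForCustomer
    @CustomerId int
AS
BEGIN
    SET NOCOUNT ON;

    SELECT OrderId, OrderDate, Amount
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
    OPTION (RECOMPILE);  -- build a fresh plan for this statement on every execution
END;
```

The trade-off is a compile on every call, which is usually negligible for a report query that runs for seconds but can matter for procedures called many times per second.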
I had the same problem, here is my description of the problem
"I created a stored procedure which would generate 2200 rows and would execute in almost 2 seconds. However, after calling the stored procedure from SSRS 2008 and running the report, it actually never finished, and ultimately I had to kill BIDS (Business Intelligence Development Studio) from Task Manager".
What I tried: I ran the SP under the reportuser login, but the SP ran normally for that user as well. I checked Profiler, but nothing came of it.
Solution:
Actually the problem is that even though the SP is generating the result, the SSRS engine is taking time to read that many rows and render them.
So I added the WITH RECOMPILE option to the SP and ran the report. This is when the miracle happened and my problem got resolved.
I had the same scenario occurring. Very basic report, the SP (which only takes in 1 param) was taking 5 seconds to bring back 10K records, yet the report would take 6 minutes to run. According to Profiler and the RS ExecutionLogStorage table, the report was spending all its time on the query. Brian S.'s comment led me to the solution. I simply added WITH RECOMPILE before the AS statement in the SP, and now the report time pretty much matches the SP execution time.
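For reference, the procedure-level form mentioned in these two answers looks roughly like this. The procedure and table names are hypothetical; the placement of WITH RECOMPILE between the parameter list and AS is the part that matters.

```sql
-- Hypothetical procedure with the procedure-level recompile option.
CREATE PROCEDURE dbo.GetOrdersForCustomerRecompiled
    @CustomerId int
WITH RECOMPILE   -- no plan is cached; the whole procedure is compiled on each call
AS
BEGIN
    SET NOCOUNT ON;

    SELECT OrderId, OrderDate, Amount
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
END;
```

Unlike the statement-level OPTION (RECOMPILE) hint, this recompiles every statement in the procedure, so it is the blunter of the two tools.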
I simply deselected 'Repeat header columns on each page' within the Tablix Properties.
If your stored procedure uses linked servers or openquery, they may run quickly by themselves but take a long time to render in SSRS. Some general suggestions:
Retrieve the data directly from the server where the data is stored by using a different data source instead of using the linked server to retrieve the data.
Load the data from the remote server to a local table prior to executing the report, keeping the report query simple.
Use a table variable to first retrieve the data from the remote server and then join with your local tables instead of directly returning a join with a linked server.
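The last suggestion can be sketched as follows. The linked server name `RemoteSrv`, the remote query, and all table and column names are assumptions made up for the example.

```sql
-- Hypothetical linked server (RemoteSrv) and tables, for illustration only.

-- 1. Pull the remote rows into a local table variable in one round trip.
DECLARE @RemoteCustomers TABLE (
    CustomerId   int           NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);

INSERT INTO @RemoteCustomers (CustomerId, CustomerName)
SELECT CustomerId, CustomerName
FROM OPENQUERY(RemoteSrv,
     'SELECT CustomerId, CustomerName FROM Sales.dbo.Customers');

-- 2. Join the local copy with local tables; the report query itself
--    no longer touches the linked server.
SELECT o.OrderId, o.OrderDate, c.CustomerName
FROM dbo.Orders AS o
JOIN @RemoteCustomers AS c
  ON c.CustomerId = o.CustomerId;
```

For large remote result sets a temporary table (`#RemoteCustomers`) may be the better choice, since table variables carry no statistics.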
I see that the question has been answered, I'm just adding this in case someone has this same issue.
I had trouble with the HTML output of a report retrieving 32000 lines. The query ran fast but the output in the web browser was very slow. In my case I had to activate “Interactive Paging” to allow the user to see the first page and be able to generate an Excel file. The pro of this solution is that the first page appears fast and the user can export to Excel or PDF; the con is that the user can scroll only the current page. If the user wants to see more content he/she must use the navigation buttons above the grid. In my case the user accepted this behavior because the export to Excel was more important.
To activate “Interactive Paging” you must click on the free area in the report pane and change the property “InteractiveSize”\ “Height” on the report level in the Properties pane. Set this property to a value other than 0. I set it to 8.5 inches in my case. Also ensure that you uncheck the “Keep together on one page if possible” property on the Tablix level (right click on the Tablix, then “Tablix Properties”, then “General”\ “Page Break Options”).
I came across a similar issue of my stored procedure executing quickly from Management Studio but executing very slow from SSRS. After a long struggle I solved this issue by deleting the stored procedure physically and recreating it. I am not sure of the logic behind it, but I assume it is because of the change in table structure used in the stored procedure.
I faced the same issue. For me the fix was just to uncheck this option in the SSRS report:
Tablix Properties => Page Break Options => Keep together on one page if possible
It was trying to put all records on the same page instead of creating many pages.
Aside from the parameter-sniffing issue, I've found that SSRS is generally slower at client side processing than (in my case) Crystal reports. The SSRS engine just doesn't seem as capable when it has a lot of rows to locally filter or aggregate. Granted, these are result set design problems which can frequently be addressed (though not always if the details are required for drilldown) but the more um...mature...reporting engine is more forgiving.
In my case, I just had to disconnect and connect the SSMS. I profiled the query and the duration of execution was showing 1 minute even though the query itself runs under 2 seconds. Restarted the connection and ran again, this time the duration showed the correct execution time.
I was able to solve this by removing the [&TotalPages] built-in field from the bottom. The time went down from minutes to less than a second.
Something odd that I could not determine was having an impact on the calculation of total pages.
I was using SSRS 2012.
A couple of things you can do: without executing the actual report, just run the sproc from within the data tab of Reporting Services. Does it still take time?
Another option is to use SQL Profiler and determine what is coming in and out of the database system.
Another thing you can do to test it is to recreate a simple report without any parameters. Run the report and see if it makes a difference. It could be that your RS report is corrupted or badly formed, which may cause the rendering to be really slow.
Had the same problem, and fixed it by giving the shared dataset a default parameter and updating that dataset in the reporting server.
Do you use "group by" in the SSRS table?
I had a report with 3 grouped-by fields and I noticed that the report ran very slowly despite having a light query, to the point where I couldn't even type values in the search field.
Then I removed the groupings, and now the report comes up in seconds and everything works in an instant.
In our case, no code was required.
Note from our Help Desk: "Clearing out your Internet Setting will fix this problem."
Maybe that means "clear cache."
