SSIS Sort task (data flow) - sorting

I have a package where I am sorting by the Aged Months column. I am not doing anything else to filter the data and I am not removing duplicates, yet once the data flow passes the Sort task, the output has far fewer rows than the input (1,218,206 rows in, only 53,344 out). Any clue what's happening here?

I just solved it: I removed a derived column with some calculations at the end of the package, and now all the data gets loaded into my destination. It doesn't make much sense, since the derived column sits well below the Sort task, but for some reason that worked.
Thank you #DrHouseofSQL

Related

Persisting data between benchmarks using BenchmarkDotNet

I'm trying to benchmark two databases (different types, different locations).
My select benchmarks are working fine, but I'm having trouble with my inserts, updates and deletes.
I tried saving the key (a GUID) I use for the insert in a class field of type Queue<string>, but by the time my update benchmark runs this field has been reset and is therefore empty; the same happens in my delete benchmark.
I don't want to call the delete statement right after the insert statement in my insert benchmark, or an insert statement in my delete benchmark, because then the timing results are off.
How should I handle this situation?
I thought of creating a list of GUIDs in [GlobalSetup], but whenever I change the number of iterations I have to grow or shrink that list to match.
Any advice will be much appreciated.
I've fixed this myself by saving the keys to a text file in GlobalCleanup and reading that file back in GlobalSetup.
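A minimal sketch of that approach, assuming one benchmark class per operation: the key-file name and the InsertRow/UpdateRow calls are placeholders, while [Benchmark], [GlobalSetup], and [GlobalCleanup] are BenchmarkDotNet attributes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using BenchmarkDotNet.Attributes;

// Writes the keys it inserted to a text file when its run finishes.
public class InsertBenchmarks
{
    private const string KeyFile = "inserted-keys.txt"; // placeholder path
    private readonly Queue<string> _keys = new Queue<string>();

    [Benchmark]
    public void Insert()
    {
        var key = Guid.NewGuid().ToString();
        _keys.Enqueue(key);
        // InsertRow(key);  // placeholder for the actual database insert
    }

    [GlobalCleanup]
    public void PersistKeys() => File.WriteAllLines(KeyFile, _keys);
}

// Reloads those keys before its own run, so updates hit existing rows.
public class UpdateBenchmarks
{
    private const string KeyFile = "inserted-keys.txt";
    private Queue<string> _keys;

    [GlobalSetup]
    public void LoadKeys() => _keys = new Queue<string>(File.ReadAllLines(KeyFile));

    [Benchmark]
    public void Update()
    {
        var key = _keys.Dequeue();
        // UpdateRow(key);  // placeholder for the actual database update
    }
}
```

A delete benchmark class can reload the same file in its own [GlobalSetup] in exactly the same way.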

Correct SSIS executing tasks in random order

This is a repost of a question from 8 years ago, since the solutions provided there didn't work for me; maybe by now there are more alternatives for me and for the other people who had this problem and couldn't solve it either.
I have six Data Flow Tasks, as shown in the following screenshot:
They execute in a different order every time I run them, and the first one even executes twice. I've recreated the tasks, hoping SSIS would execute them in creation order.
They run in a random order each time I execute the package despite the Precedence Constraints, so I decided to recreate the WHOLE package. That failed as well.
It simply feels like Microsoft is messing with me, since I can't find another explanation.
Any help will be a relief, provided my post isn't voted as redundant.
/Edited in order to add info/
My real problem is SSIS not inserting the data in a defined order; it just performs the inserts as it pleases, and I need the data to be stored in a specified order. I've done this before, I just don't get why this time is different. I could run an ORDER BY to get the data in the order I want, except I'm not the one who will be accessing the data, and I can only hope that whoever extracts and prints the data thinks of doing that.
The biggest issue, however, is SSIS executing a random task twice, since I cannot have duplicate data under any circumstances: it will later be used for summarizing as well. (I suspect this is connected to the random execution order, since the person who posted the original question had the exact same issue.)
The real way to notice these issues is not to look at the SSIS processes but at the data stored in the DB. Sorry if I was unclear about my problem.
The SSIS log doesn't show you the tasks in the order in which they ran. In your screenshot above, it looks like it put them in alphabetical order, in fact.
Just because Abril is above Enero in your execution log doesn't mean that Abril ran first and Enero ran second.
Addendum based on comments below:
You are under the misconception that if you INSERT data into a database in a certain order, then when you SELECT that data without specifying an ORDER BY, you will get it back in the order it was inserted. This turns out not to be the case. The ONLY guaranteed way to get data out of a database in a certain order is to use an ORDER BY clause when you SELECT it.
Let me be perfectly clear about this. When you say "I get my data from March listed before my data from January, meaning it was inserted first", you are wrong.
As for why your January data seems to be getting inserted twice, we would need to see the details of all the working parts: the original source data, the destination data before the insert, the destination data after the insert, and the SSIS package that does the insert. Without enough information to reproduce the issue ourselves, there is no way we can help you understand why it is happening in your package.

Power Query 'an evaluation is in progress' when merging

When merging two tables in Power Query, an evaluation is run to determine the possible number of matches. I work with pretty large tables (merging a 10K-row table with a 500K-row table), so this can take a long time.
I know there will be matches because I have done this before and I am not a beginner, yet Power Query insists on running this evaluation.
Is there any way to bypass this step? It feels a lot like needing to turn automatic calculation off in Excel so that you can get on with actually doing something.
Any ideas?
I would add an upstream filter to limit the rows, e.g. Keep Rows / Keep Top Rows / 100. You may need to do this on both queries. Ideally you keep enough rows, or use a specific filter, to get some matches and help your downstream query design work.
Then once the query design is finished, I would remove the filter(s) and let it rip.
This is what PQ should be doing in the Query Editor, but it does seem to go rogue on Merge in particular.

SSIS load validation

I'm importing data from a text file using Bulk Insert in a Script Component in an SSIS package.
The package ran successfully and the data was imported into SQL.
Now, how do I validate the completeness of the data?
1. I can get the row counts from the source and the destination and compare them.
But my manager wants to know how we can verify that all the data has come across as-is, without any issues.
If we were comparing two tables, we could probably join them on all fields and see whether anything is missing.
I'm not sure how to compare a text file with a SQL table.
One way would be to write code that reads the file line by line, queries the database for that record, and compares each and every field. We have millions of records, so this is not going to be a simple task.
Is there any other way to validate all of the data?
Any suggestions would be much appreciated
Thanks
Ned
Well, you could take the same file and do a Lookup to the SQL table, and if any of the columns don't match, route those rows to a Row Count.
Here's a generic example of how you can do this.
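Separately from the Lookup approach, here is a minimal C# sketch of the row-count comparison mentioned in option 1 of the question; the file path, connection string, and table name are hypothetical, and the file is assumed to contain data rows only.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.Data.SqlClient;   // NuGet: Microsoft.Data.SqlClient

class LoadRowCountCheck
{
    static void Main()
    {
        // Hypothetical names: point these at your extract file, server, and destination table.
        const string sourceFile = @"C:\loads\extract.txt";
        const string connectionString =
            "Server=.;Database=Staging;Integrated Security=true;TrustServerCertificate=true";

        // Count data rows in the flat file; subtract any header rows your file contains.
        long fileRows = File.ReadLines(sourceFile).LongCount();

        using var conn = new SqlConnection(connectionString);
        conn.Open();
        using var cmd = new SqlCommand("SELECT COUNT_BIG(*) FROM dbo.ImportedData;", conn);
        long tableRows = (long)cmd.ExecuteScalar();

        Console.WriteLine(fileRows == tableRows
            ? $"Row counts match: {fileRows}"
            : $"Mismatch: file has {fileRows} rows, table has {tableRows} rows");
    }
}
```

A row-count match only confirms completeness of volume, not of content, which is why the Lookup-style column comparison above is the stronger check.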

Informatica Data Quality - Match Analysis

In our duplicate analysis requirement, the input data has 1,418 records, of which 1,380 are duplicates.
Using Match Analysis (Key Generator, Matcher, Associator, Consolidator) in IDQ integrated with PowerCenter, all duplicates except for 8 records were eliminated.
On executing the workflow again with those records excluded, duplicates appear among other records that had no duplicates in the previous run.
Can anyone tell why this mismatch occurs?
It looks like your Consolidator transformation is not getting the correct association IDs and is therefore inserting multiple records, resulting in duplicates.
Please try the steps below:
1) Try creating a workflow in IDQ itself by deploying the mapping you developed in IDQ.
2) Also check the business keys of the records that make up the primary key through which you identify the duplicates in the source.
