I have a large dataset that I validate using SPSS syntax. For each validation, a variable is created and set to 1 if there is a problem with the data that I need to check.
For each validation I then create a subset of the data holding only the relevant variables for the relevant cases. Still using syntax, I save these subsets as Excel files in order to do the checks and correct the data (in a database).
The problem is that not every one of my 50+ validations detects problematic data each time I run the check, yet 50+ files are saved because a file is written for every validation. I'd like to save a file only if there is data in it.
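For context, each validation flag is created with syntax along these lines (the variable names and the rule here are only placeholders, not my actual checks):
COMPUTE var_error1 = 0.
IF (MISSING(var1) OR var1 < 0) var_error1 = 1.
EXECUTE.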
Current syntax for saving the files is:
DATASET ACTIVATE DataSet1.
DATASET COPY error1.
DATASET ACTIVATE error1.
FILTER OFF.
USE ALL.
SELECT IF (var_error1 = 1).
EXECUTE.
SAVE TRANSLATE OUTFILE='path' + '_error1.xlsx'
/TYPE=XLS
/VERSION=12
/MAP
/REPLACE
/FIELDNAMES
/CELLS=VALUES
/KEEP=var1 var2 var3 var4.
This is repeated for each validation. If no case violates the validation for "error1", I still get an output file (which is empty).
Is there any way to alter the syntax so that the data are saved only if there are in fact cases that violate the validation?
The following syntax will write a new syntax file that contains the command to save the data to Excel, but only if there are actual cases in the file. You run the new syntax every time; the Excel file is created only in the relevant cases:
DATASET ACTIVATE DataSet1.
DATASET COPY error1.
DATASET ACTIVATE error1.
FILTER OFF.
USE ALL.
SELECT IF (var_error1 = 1).
EXECUTE.
DO IF $CASENUM = 1.
WRITE OUTFILE='path\tmp\run error1.sps' /"SAVE TRANSLATE OUTFILE='path\var_error1.xlsx'"
 /" /TYPE=XLS /VERSION=12 /MAP /REPLACE /FIELDNAMES /CELLS=VALUES /KEEP=var1 var2 var3 var4.".
END IF.
EXECUTE.
INSERT FILE='path\tmp\run error1.sps'.
Please edit the "path" according to your needs.
Note that the new syntax will be written in all cases, but when there is no data in the file, the syntax will be empty, and so the empty file won't be written to excel.
I am using a piece of software, pc/mrp, which appears to have a built-in Visual FoxPro report editor for FRX files. It also makes external use of an EF file. Based on some Googling, the report designer seems to be the standard one, not custom; the EF file usage may be a custom thing. Now, I need to find a way to get access to a value from a SQL statement inside the report, and the statement needs to run per line of the report.
EF:
This file has sections:
~in~
~out~
In these sections I can run code, but if there is a ~perline~-type section, I don't know how to access it. I can use the ~in~ section to try to create a relationship between the tables, as shown in the following example:
~IN~
THISAREA = SELECT()
USE PARTMAST ORDER BYPARTNO IN 0
SELECT (THISAREA)
SET RELATION TO PARTNO INTO PARTMAST ADDITIVE
GO TOP
~OUT~
USE IN SELECT("SALES")
But for this I don't know how to join the tables. I have two tables (A and B) that I need to connect based on two fields (pono, line): a row in A should be linked to the row in B where A.pono = B.pono and A.line = B.line. Is this possible?
Report Designer:
The other way I see this working is to do the query inside the report designer. Inside the report properties there is a Variables tab, which I can use to assign values to variables using expressions. I need:
SELECT field from B where B.pono = pono and B.line = line; INTO ARRAY varArray;
But it gives me an error, likely because this is trying to create a new variable as opposed to actually assigning to the variable in the report. I tried editing a field inside the designer to use the preceding code as well, but that also failed.
Is there a way using the report designer or the ef file to grab the data I need per line?
The sample code you show is doing something like a join with the SET RELATION command. To use SET RELATION, there has to be an index on the relevant field (or expression) in the child table. So, if your table B has an index on PONO + LINE (or, if those are numeric, STR(PONO, length) + STR(LINE, length)), you can SET RELATION TO PONO + LINE INTO B, again using the more complicated expression if necessary.
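As a rough sketch of what that could look like in the EF file, assuming PONO and LINE are character fields, the current work area is the parent table, and the tag name POLINE is just a placeholder:
~IN~
THISAREA = SELECT()
* Assumes table B already has a compound index built once with:
*   INDEX ON PONO + LINE TAG POLINE
* (if PONO and LINE are numeric, index and relate on STR(PONO, len) + STR(LINE, len) instead)
USE B ORDER POLINE IN 0
SELECT (THISAREA)
SET RELATION TO PONO + LINE INTO B ADDITIVE
GO TOP
~OUT~
USE IN SELECT("B")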
I have an analysis that contains one hidden column. When I export the result to an .xlsx file it works correctly: the hidden column is not printed and the calculation works fine. But when I export it to .csv, with either a ';' delimiter or a tab delimiter, the hidden column appears.
There is no way to exclude this column from the analysis definition, because a field I need to calculate depends strongly on the hidden column. I also can't keep the export as it is and remove the column and add the calculation myself, because after the export this file is automatically imported into a database that does not have enough space to perform such an operation every month indefinitely. Is there any way to not print the hidden column while keeping the prepared calculation when exporting to CSV?
No. CSV exports exactly what's in the analysis; that's its point and task. You can always clone your analysis, prepare the columns as you need, and then just expose it as a download link.
CSV = exact, pure raw data as it is in the analysis definition
Excel = formatted output based on what's rendered visually
How can I read an Excel sheet and put the cell values into different text fields using UiPath?
I have an Excel sheet as follows:
I have read the Excel contents, and in order to iterate over the contents later I have stored them with an Output Data Table activity as follows:
Read Range - Output:
DataTable: CVdatatable
Output Data Table
DataTable: CVdatatable
Text: opCVdatatable
Screenshot:
Finally, I want to read the text opCVdatatable in an iteration and write it into text fields. So in the desired input fields I entered opCVdatatable or opCVdatatable + "[k(enter)]" as required.
Screenshot:
But UiPath seems to start from the beginning of the Output Data Table whenever I call opCVdatatable.
In short, each desired input field gets filled iteratively with all of the data stored in the Output Data Table.
Can someone help me out please?
My first recommendation is to use the Workbook Read Range activity to read data from Excel, because it is quicker, works in the background, and does not require Excel to be installed on the system.
Start your sequence like this (note the add headers property is not checked):
You do not need to use Output Data Table, because that activity outputs a single string containing all row items. What you want to do instead is access the items in the data table and output each one as a string in your Type Into, e.g. CVDatatable.Rows(0).Item(0).ToString, like so:
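As a rough sketch of that access pattern in plain VB.NET (outside UiPath, with made-up sample values; inside the workflow the same expression simply goes into the Type Into activity):
Imports System
Imports System.Data

Module DataTableAccessSketch
    Sub Main()
        ' Stand-in for the DataTable produced by Read Range (assumption: one text column)
        Dim CVdatatable As New DataTable()
        CVdatatable.Columns.Add("Value")
        CVdatatable.Rows.Add("first field text")
        CVdatatable.Rows.Add("second field text")

        ' Same pattern as CVdatatable.Rows(0).Item(0).ToString in the Type Into expression:
        ' pick a row, pick a column, and convert the cell to a string.
        For Each row As DataRow In CVdatatable.Rows
            Console.WriteLine(row.Item(0).ToString())
        Next
    End Sub
End Module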
You mention you want to read the text opCVdatatable in an iteration and write it into text fields. This is a little bit more complex, but I'll give you an example. You can use a For Each Row activity and loop through each row in CVDatatable, setting the Index property if required. See below:
The challenge is to get the selector correct here and make it dynamic, so that it targets a different text field per iteration. The selector for the type into activity will depend on the system you are targeting, but here is an example:
And the selector for this:
Also, here is a working XAML file for you to test.
Hope this helps.
Chris
Here's a different, more general approach. Instead of including the target in the process itself, the Excel file would be modified to include parts of a selector:
Note that column B now contains an identifier, and this ID depends on the application you will be working with. For example, here's how my sample app looks. As you can see, the first text box has an id of 585, the second one is 586, and so on (note that you can work with any kind of identifier, including the control's name if it is exposed to UiPath):
Now, instead of adding multiple Type Into activities to your workflow, you would add just a single one, loop over each of the data table's rows, and then create a dynamic selector:
In my case the selector for the Type Into activity looks as follows:
"<wnd cls='#32770' title='General' /><wnd ctrlid='" + row(1).ToString() + "' />"
This will allow you to maintain the process from the Excel sheet alone - if there's a new field that needs to be mapped, just add it to your sheet. No changes to the Workflow are required.
What I want to do is take data from a DBF file and insert it into a table, which I've already done. Since there are many files, a Foreach Loop Container is being used. However, before inserting into the table, I want to look at the date fields and compare them to a date variable. If the dates match the variable, then move on to the next step of the flow. But if any of the dates don't match the variable, then that file and its contents are discarded and the next file is looked at.
How do I accomplish this in SSIS?
You're looking for the Conditional Split Component within your Data Flow Task.
Assuming your source column is MyDate and you have an SSIS variable called @[User::ReferenceDate], you'd apply an expression like
[MyDate] == @[User::ReferenceDate]
That will evaluate to True when the dates match, false otherwise.
In your Conditional Split, add a row into the component.
OutputName: DatesMatched
Condition: [MyDate] == @[User::ReferenceDate]
Default output name: DatesUnmatched
Now when you connect the output from this to your destination, it'll ask whether you want to route the data using the DatesMatched or DatesUnmatched path. Use the DatesMatched path.
As I re-read this, "if any of the dates don't match the variable, then that file and its contents are discarded", you're looking at processing the file twice: the first pass reads it all in and validates it; the second, optional pass actually loads it into the database.
From your Conditional Split, add a RowCount to the DatesUnmatched path. Use a Variable of type Integer/Int32 named CountDatesUnmatched. In a perfect world, that will be zero when the validation of the file completes.
In the precedence constraint between the validation Data Flow and the actual import Data Flow, double-click the connector line and change the evaluation criteria from Constraint to Expression and Constraint. Leave the value as Success and in the Expression use @[User::CountDatesUnmatched] == 0. That data flow will only light up if both conditions are true: parsing was successful and no rows were sent to the Row Count component.
Finally, you can cheat, and sometimes this approach makes sense. If you're using an OLE DB Destination, you can keep MaximumInsertCommitSize at its default of about 2 billion and use a data access mode of fast load. That translates to "everything commits or none of it does", which can lock up your target table and cause your transaction log to grow heavily depending on how much data you're loading. Use the Conditional Split as described above, but for the DatesUnmatched path induce a failure: a Derived Column with a divide-by-zero, or a Script Component with an explicit FireError event, will cause the transaction to go belly up. You'd need to do some magic in the OnError event handler to not abort the overall file processing, but it's a lazy hack (or one that is useful when double-reading the file is prohibitive but impacting the database is less so).
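As a minimal sketch of that forced failure, here is roughly what the code inside the generated ScriptMain of a Script Component placed on the DatesUnmatched path could look like (C#; the component name and message text are illustrative only):
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Any row reaching this component means a date did not match the reference date.
    // Raising an error fails the data flow, so the fast-load batch never commits.
    bool cancel = false;
    ComponentMetaData.FireError(0, "DatesUnmatched guard",
        "Row date does not match the reference date; discarding this file.",
        string.Empty, 0, out cancel);
}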
Is there any way to use a variable in the FROM part of a query (for example SELECT myColumn1 FROM ?) in a data flow source without having to give the variable a valid default value first?
To be more exact, in my situation I am getting the table names out of a table and then using a control flow to foreach over the list of table names, calling a data flow from within the loop that gets the data from each of these tables. In this data flow I have the aforementioned SELECT statement.
To get it to work properly I had to set the variable to a valid default value (at package level), as otherwise I could not create the data flow itself (the data source could not be created because the SELECT was invalid without the default value).
So my question is: is there any workaround possible in this case where I don't need a valid default value for the variable?
The data tables:
The different tables that are selected in the data flow have exactly the same structure (the same columns, column names, and column data types). Only the data inside them is different (it is data for customer A, customer B, ...).
You're in luck as this is a trivial thing to implement with SSIS.
The base problem for most people is that they come at SSIS like it's still DTS, where you could do whatever you wanted inside a data flow. They threw out the extreme flexibility of DTS in favor of raw processing performance.
You cannot parameterize the table in a SQL statement. It's simply not allowed.
Instead, the approach people take is to use expressions. In your case, assume you have two variables of type String: @[User::QualifiedTableName] and @[User::QuerySource].
Assume that [dbo].[spt_values] is assigned to QualifiedTableName. As you loop through the table names, you will assign the value into this variable.
The "trick" is to apply an expression to the #[User::QuerySource]. Make the expression
"SELECT T.* FROM " + #[User::QualifiedTableName] + " AS T;"
This allows you to change out your table name whenever the value of the other variable changes.
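For instance, if @[User::QualifiedTableName] currently holds [dbo].[spt_values], the expression on @[User::QuerySource] evaluates to
SELECT T.* FROM [dbo].[spt_values] AS T;
and that text is what the OLE DB Source executes on each iteration of the loop.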
In your data flow, you will change your OLE DB Source to be driven by a query contained in a variable instead of the traditional table selection.
If you want an example of where I use QuerySource to drive a data flow, there's an example on mixing an integer and string in an SSIS derived column.
1. Create a second variable. Set its Expression to create the full SELECT statement, using the value of the first variable.
2. In the Data Source, use the "SQL command from variable" option for the Data Access Mode property.
3. If you can, set a default value for the variable you created in step 1. That will make filling out the columns from your data source much easier.
4. If you can't use a default value for the variable, set the Data Source's ValidateExternalMetadata property to False. You may have to open the data source with the Advanced Editor and create the Output columns manually.