Error in azureml "Non numeric value(s) were encountered in the target column." - automl

I am using Automated ML to run a time series forecasting pipeline.
When the AutoMLStep gets triggered, I get this error: Non numeric value(s) were encountered in the target column.
The data to this step is passed through an OutputTabularDatasetConfig, after applying read_delimited_files() to an OutputFileDatasetConfig. I've inspected the prior step, and the data consists of a 'Date' column and a numeric column called 'Place' with 80+ observations at a monthly frequency.
Nothing seems to be wrong with the column type or the data. I've also applied a number of techniques on the data prep side, e.g. pd.to_numeric() and astype(float), to ensure it is numeric.
I've also tried forcing this through FeaturizationConfig().add_column_purpose('Place', 'Numeric'), but in that case I get another error: Expected column(s) Place in featurization config's column purpose not found in X.
Any thoughts on how to solve this?

So, a few learnings from interacting with the stellar Azure Machine Learning engineering team on this.
When calling the read_delimited_files() method, make sure the output folder does not contain many other files. If all intermediate outputs are saved to a common folder, the method may read every file previously written there and, depending on the shape of the data, take the schema from the first file it finds or mix the files together, which leads to inconsistencies and errors. In my case I was dumping many files to the same location, and that was confusing this method. The fix is either to distinctly mark the output folder (e.g. with a UUID) or to give each output a different path.
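A minimal sketch of the distinct-output-folder fix, assuming the default workspace datastore (the folder name and variable names here are hypothetical):
import uuid
from azureml.core import Workspace
from azureml.data import OutputFileDatasetConfig

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Give this step its own output folder so read_delimited_files() only
# picks up files written by this step.
unique_folder = "prepped/" + str(uuid.uuid4())
prepped_output = OutputFileDatasetConfig(name="prepped_data", destination=(datastore, unique_folder))
prepped_tabular = prepped_output.read_delimited_files()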
The dataframe from read_delimited_files() may treat all columns as object type, which can lead to a data type check failure (i.e. the label_column needs to be numeric). To mitigate, explicitly state the type. For example:
from azureml.data import DataType
prepped_data = prepped_data.read_delimited_files(set_column_types={"Place":DataType.to_float()})
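If the 'Date' column also comes through as a string, the same mapping can cast it at read time. A sketch assuming the dates are in ISO format (adjust the format string to the actual file):
from azureml.data import DataType
prepped_data = prepped_data.read_delimited_files(
    set_column_types={
        "Place": DataType.to_float(),                       # target column must be numeric
        "Date": DataType.to_datetime(formats=["%Y-%m-%d"])  # assumed format; change to match the data
    }
)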

Related

Issue with choice action when running transform map

I'm trying to insert records into a table using transform maps. I have a field in the target table which is a choice type, and I have set the choice action on the source table's field to reject if no matching value is found. But when I tried inserting a record through the transform map with a correct value, one that exists in the choice list of the target field, it still got rejected and the record was not inserted.
I have searched for possible reasons why it gets rejected even with a correct value in the source field. Here's a link that I found: https://hi.service-now.com/kb_view.do?sysparm_article=KB0677334
It says that if a choice list value is longer than 40 characters it will be truncated and might not match the choices. But the choices in the target field have only 20 characters or fewer.
I first tried running the transform map in the lower environments before proceeding to production. In the lower environments it works fine and the records get inserted, but when I tried it in production they got rejected.
There is a difference between a choice and a choice list. In a choice list, the values are comma-separated sys_ids. I could imagine that you have multiple values in the import, so the maximum character length is reached or the values do not match, etc.
You could use this approach:
Instead of a direct assignment from source to target field, use a script on the mapping to the target field. Then you gain the full scripting power ;)
There you could add some logic like a switch-case or whatever; I guess you get the point.

Identify if objects are the same

I want to check certain objects against each other and identify if they are the same.
For example, I need to verify that the total cost on one page is the same as on another page. I developed a script that works; however, the total cost changes every day, so I have to update the object properties in maintenance mode every day.
Is there a way for UFT to automatically recognize that this object changes and update it?
Please elaborate on your question. For now, you can use .* in the object properties if certain values of the object keep changing. Alternatively, you can store the values in an Excel sheet and change them every day depending on the requirement.
If this is not helpful, let me know.
It sounds like you actually want to compare the values shown in two different objects and see if those values are the same. (I assume this because you say they are on two different pages.)
Also, you mention maintenance mode, so I assume you are using checkpoints to store their expected values.
I would suggest this: instead of storing the expected values in a checkpoint, read the value of the first object (GetROProperty), store it in a variable (DataTable field, environment variable, etc.), then navigate to the other page, read the same property from the other object, and compare.
i.e.
If {browser,page,object...}.GetROProperty({whateverPropertyYouNeed}) = Environment({storedFirstValue}) Then
    Reporter.ReportEvent micPass, "compare step", "{details here}"
End If
* Replace the parts inside {} with your own code; I don't know what they are in your application.
If you need to actually store the total cost externally, you could use a DataTable field and export the sheets at the end, then import the same sheet at the beginning. That would save the data to an Excel sheet on a drive.

How to generate sequential numbers using UCM Oracle iDOC Script?

I want to create a metadata field to a certain Check-In Profile. This field is Info Only and it looks like this:
IFAP-XXXX.DD.MMM/YY
I already have done this code:
<$dprDefaultValue="IFAP-" & formatDateWithPattern(dateCurrent(),"MMM/yy")$>
And the output is: IFAP-.01Jan/16
What I need is to put a sequential number where "XXXX" is, starting with 0800, every time a user checks in. For example: IFAP-0801.01.Jan/16. How can I do that?
Getting a unique sequence number can be challenging. One way would be to write a custom service that executes a query against the database (which controls the sequence) and responds with the number. You could then call <$executeService("MY_CUSTOM_SEQUENCE_SERVICE")$> to get the value.
One issue with the above approach is what happens if the check-in fails (due to a filter or something else): you have then accidentally used up a value.
Another approach would be to use a database trigger to replace XXXX with the sequence number (using the same database sequence).

Oracle - build dimension from a file-based data source

I'm trying to build a star schema in Oracle 12c. In my case the data source is not a relational database but a single Excel/CSV file populated via a Google Form, which means I don't have any sort of reference from a source system, such as auto-incrementing keys/IDs. What would be the best approach to building a star schema under this condition?
File row sample:
<submitted timestamp>,<submitted by user>,<region>,<country>,<branch>,<branch location>,<branch area>,<branch type>,<branch name>,<branch private? yes/no value>,<the following would be all "fact" values (measurements)>,...,...,...
In case I wanted to build a "branch" dimension, how would I handle updates/inserts after the first load into the dimension table?
Thought solution so far:
I had thought of making a concatenated string "key" from the branch values, which would make each branch unique (an underscore would be the "glue" to concatenate the values), e.g.:
<region>_<country>_<branch>_<branch location> as branch_key
I would insert all the distinct branches into a staging table, including the branch_key column for each one, and then, when loading into the dimension, compare which keys do not exist yet in my dimension table and insert them. As for updates, I'm a bit stuck on how to handle those; I had thought of having another file mapping which branches are active, with an expiration date column. Basically I am trying to simulate what I could do if the data were in a database instead of CSV files.
This is all I can think of so far; do you have any other recommendations/ideas on how to implement this? Take into consideration that the data source cannot change: I have to read these CSV files, since the data is not stored anywhere else.
Thank you.
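For illustration, a minimal sketch of the concatenated-key matching described above, assuming pandas and hypothetical file and column names (a raw form export plus a previously extracted copy of the branch dimension):
import pandas as pd

# Hypothetical inputs: the raw form export and an extract of the current dimension.
src = pd.read_csv("form_responses.csv")
dim = pd.read_csv("dim_branch_extract.csv")   # assumed to already contain a branch_key column

# Build the natural key exactly as described: values glued together with underscores.
key_cols = ["region", "country", "branch", "branch_location"]
src["branch_key"] = src[key_cols].astype(str).agg("_".join, axis=1)

# Distinct branches from the source file (the "staging" set).
staging = src[["branch_key"] + key_cols].drop_duplicates()

# Keys not yet present in the dimension are the rows to insert.
new_branches = staging[~staging["branch_key"].isin(dim["branch_key"])]
print(len(new_branches), "new branch rows to insert")
The same comparison could of course be done in SQL once the staging set is loaded into the database; the sketch only shows the key construction and new-key detection.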

How do I validate data in a file in SSIS before inserting into a database?

What I want to do is take data from a DBF file and insert it into a table, which I've already done. Since there are many files, a Foreach Loop Container is used. However, before inserting into the table, I want to look at the date fields and compare them to a date variable. If the dates match the variable, then move on to the next step of the flow. But if any of the dates don't match the variable, then that file and its contents are discarded and the next file is looked at.
How do I accomplish this in SSIS?
You're looking for the Conditional Split Component within your Data Flow Task.
Assuming your source column is MyDate and you have an SSIS variable called @[User::ReferenceDate], you'd apply an expression like
[MyDate] == @[User::ReferenceDate]
That will evaluate to True when the dates match, and False otherwise.
In your Conditional Split, add a row into the component.
OutputName: DatesMatched
Condition: [MyDate] == @[User::ReferenceDate]
Default output name: DatesUnmatched
Now when you connect the output from this to your destination, it'll ask whether you want to route the data using the DatesMatched or DatesUnmatched path. Use the DatesMatched path.
As I re-read this, "if any of the dates don't match the variable, then that file and its contents are discarded" means you're looking at double-processing each file: a first pass to read it all in and validate it, and an optional second pass to actually load it into the database.
From your Conditional Split, add a Row Count component to the DatesUnmatched path. Use a variable of type Int32 named CountDatesUnmatched. In a perfect world, that will be zero when the validation of the file completes.
In the Precedence Constraint between the validation Data Flow and the actual import Data Flow, double-click the connector line and change the evaluation criteria from Constraint to Expression and Constraint. Leave the value as Success, and in the Expression use @[User::CountDatesUnmatched] == 0. That data flow will only light up if both conditions are true: parsing was successful and no rows were sent to the Row Count component.
Finally, you can cheat, and sometimes this approach makes sense. If you're using an OLE DB Destination, you can keep the default MaximumInsertCommitSize of roughly 2 billion with a data access mode of fast load. This translates to "everything is going to commit or none of it is". That can lock up your target table and cause your transaction log to grow heavily, depending on how much data you're loading. Use the Conditional Split as described above, but on the DatesUnmatched path induce a failure: a Derived Column with a divide-by-zero, or a script component with an explicit FireError event, will cause the transaction to go belly up. You'd need to do some magic in the OnError event handler to not abort the overall file processing, but it's a lazy hack (or one that is useful when double-reading the file is prohibitive and impacting the database is less of a concern).
