SSIS: Variable as "NEW" derived column - visual-studio

I am trying to write the results of a SQL task to a flat file. I have a SQL task, followed by a Foreach loop that parses the object results into variables. Inside the Foreach loop I have a data flow.
Inside the data flow I have a Derived Column transformation, where I am trying to use the variables as columns so that I can write them to the flat file. However, the Derived Column keeps complaining that it has no input columns (and writes 0 rows to the flat file), and I do not know why.
These are the instructions I am trying to follow: Using Variable as expression in Derived column transformation SSIS

The Derived Column transformation is part of a Data Flow. A Data Flow is a set of rows with columns that originate from a Data Flow Source, pass through transformations such as Derived Column, and end at a Data Flow Destination. Every Data Flow transformation needs both an input and an output, so a Derived Column with no upstream source has no rows to add columns to.
In your case, create an OLE DB Source with a dummy query such as `select 0 as dummy` and direct its output to your Derived Column. You can drop the dummy column later.

Related

Oracle - build dimension from a file based data source

I'm trying to build a star schema in Oracle 12c. In my case my data source is not a relational database but a single excel/csv file which is populated via a google form, which means I don't have any sort of reference from a source system such as auto incremental keys/ids. Now what would be the best approach to build a star schema given this condition?
File row sample:
<submitted timestamp>,<submitted by user>,<region>,<country>,<branch>,<branch location>,<branch area>,<branch type>,<branch name>,<branch private? yes/no value>,<the following would be all "fact" values (measurements),...,...,...
If I wanted to build a "branch" dimension, how would I handle updates/inserts after the first load into the dimension table?
Thought solution so far:
I had thought of making a concatenated string "key" with the branch values, which would make it unique (underscore would be the "glue" to concatenate the values), eg:
<region>_<country>_<branch>_<branch location> as branch_key
I would insert all the distinct branches into a staging table, including the branch_key column for each one of them. Then, when loading the dimension, I could check which keys do not exist yet in my dimension table and insert those. As for updates, I'm a bit stuck on how to handle them. I had thought of having another file that maps which branches are active, with an expiration date column. Basically I am trying to simulate what I could do if the data were in a database instead of CSV files.
This is all I can think of so far. Do you have any other recommendations/ideas on how to implement this? Take into consideration that the data source cannot be changed, as in I have to read these CSV files, since the data is not stored anywhere else.
Thank you.
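The staging idea described in the question can be sketched roughly as follows. This is only a minimal illustration in Python with an in-memory SQLite database, not Oracle. The table names `stg_branch` and `dim_branch`, the surrogate `branch_id` column, and the helper names are all assumptions made up for the example. In Oracle 12c the same "insert only the missing keys" step could be written as a `MERGE` or an `INSERT ... WHERE NOT EXISTS`.

```python
import sqlite3

def make_db():
    """Create hypothetical staging and dimension tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stg_branch (
            branch_key TEXT, region TEXT, country TEXT,
            branch TEXT, location TEXT, branch_type TEXT
        );
        CREATE TABLE dim_branch (
            branch_id INTEGER PRIMARY KEY,  -- surrogate key (assumption)
            branch_key TEXT UNIQUE,         -- natural key built from the file
            region TEXT, country TEXT, branch TEXT,
            location TEXT, branch_type TEXT
        );
    """)
    return conn

def branch_key(region, country, branch, location):
    """Underscore-joined key, as proposed in the question."""
    return "_".join([region, country, branch, location])

def load_new_branches(conn, rows):
    """Stage distinct branches, insert keys missing from the dimension,
    and return how many dimension rows were added."""
    count_sql = "SELECT COUNT(*) FROM dim_branch"
    before = conn.execute(count_sql).fetchone()[0]
    conn.execute("DELETE FROM stg_branch")
    staged = {branch_key(*r[:4]): r for r in rows}  # de-duplicate by key
    conn.executemany(
        "INSERT INTO stg_branch VALUES (?, ?, ?, ?, ?, ?)",
        [(k, *r) for k, r in staged.items()],
    )
    conn.execute("""
        INSERT INTO dim_branch
            (branch_key, region, country, branch, location, branch_type)
        SELECT s.branch_key, s.region, s.country,
               s.branch, s.location, s.branch_type
        FROM stg_branch s
        WHERE NOT EXISTS (
            SELECT 1 FROM dim_branch d WHERE d.branch_key = s.branch_key
        )
    """)
    conn.commit()
    return conn.execute(count_sql).fetchone()[0] - before
```

One thing to watch with a glued key: if any of the values can themselves contain an underscore, two different branches could produce the same key, so a separator that cannot occur in the data may be safer.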

VBScript - Reading data from excel list

I have a couple of questions on reading data from an Excel sheet using VBScript code and storing it in a dictionary object.
1) This is more excel specific. Can I do Data filtering on an excel based on what I select in a master column. For Ex: I have a column name TicketType which has a list of values namely Type1, Type2, Type3. If I select Type1, the data in the remaining columns should show data specific to Type 1, when I select Type2, the remaining columns should change to data specific to Type2.
2) If point one is possible in excel, I would want to use VBScript to select the value first in the master column and then read the filtered data in to a dictionary. I would request for some help here. My whole idea is not to create more rows and instead use one single row and keep filtering and reading.
Thanks
Srinivas

Using variables in From part of a task flow source

Is there any way to use a variable in the from part (for example SELECT myColumn1 FROM ?) in a task flow - source without having to give the variable a valid default value first?
To be more exact in my situation it is so that I'm getting the tablenames out of a table and then use a control workflow to foreach over the list of tablenames and then call a workflow from within that then gets data from these tables each. In this workflow I have the before mentioned SELECT statement.
To get it to work properly I had to set the variable to a valid default value (on package level) as else I could not create the workflow itself (as the datasource couldn't be created as the select was invalid without the default value).
So my question here is: Is there any workaround possible in this case where I don't need a valid default value for the variable?
The datatables:
The different tables selected in the data flow all have exactly the same structure (the same columns, column names and column datatypes). Only the data inside them is different (data for customer A, customer B, ...).
You're in luck as this is a trivial thing to implement with SSIS.
The base problem for most people is that they come at SSIS like it's still DTS where you could do whatever you want inside a data flow. They threw out the extreme flexibility with DTS in favor of raw processing performance.
You cannot parameterize the table in a SQL statement. It's simply not allowed.
Instead, the approach that people take is to use Expressions. In your case, assume you had two Variables of type String created, @[User::QualifiedTableName] and @[User::QuerySource].
Assume that [dbo].[spt_values] is assigned to QualifiedTableName. As you loop through the table names, you will assign the value into this variable.
The "trick" is to apply an expression to @[User::QuerySource]. Make the expression
"SELECT T.* FROM " + @[User::QualifiedTableName] + " AS T;"
This allows you to change out your table name whenever the value of the other variable changes.
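To make the effect of that expression concrete, here is a small Python sketch of the same string concatenation. It is not SSIS code, only an imitation of what the expression evaluates to each time the table name changes. The function name `query_source` and the name check are assumptions added for the example. The check is there because the table name is concatenated straight into the SQL text.

```python
import re

# Hypothetical pattern: a bare or [bracketed] name, optionally schema-qualified.
_NAME = r"(\[[^\[\]]+\]|[A-Za-z_][A-Za-z0-9_]*)"
_QUALIFIED = re.compile(rf"^{_NAME}(\.{_NAME})?$")

def query_source(qualified_table_name: str) -> str:
    """Build the same text as the expression
    "SELECT T.* FROM " + QualifiedTableName + " AS T;"
    after rejecting names that do not look like a table identifier."""
    if not _QUALIFIED.match(qualified_table_name):
        raise ValueError(f"unexpected table name: {qualified_table_name!r}")
    return "SELECT T.* FROM " + qualified_table_name + " AS T;"

# Each pass of the loop yields a new query, like the Foreach loop
# assigning a new value to QualifiedTableName.
queries = [query_source(name) for name in ["[dbo].[spt_values]", "dbo.CustomerA"]]
```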
In your data flow, you will change your OLE DB Source to be driven by a query contained in a variable instead of the traditional table selection.
If you want an example of where I use QuerySource to drive a data flow, there's an example on mixing an integer and string in an ssis derived column
1) Create a second variable. Set its Expression to create the full Select statement, using the value of the first variable.
2) In the Data Source, use the "SQL command from variable" option for the Data Access Mode property.
3) If you can, set a default value for the variable you created in step 1. That will make filling out the columns from your data source much easier.
4) If you can't use a default value for the variable, set the Data Source's ValidateExternalMetadata property to False.
5) You may have to open the data source with the Advanced Editor and create Output columns manually.

Hive: How to have a derived column that stores the sentiment value from the sentiment analysis API

Here's the scenario:
Say you have a Hive Table that stores twitter data.
Say it has 5 columns. One column being the Text Data.
Now, how do you add a 6th column that stores the sentiment value from the sentiment analysis of the Twitter text data? I plan to use a sentiment analysis API like Sentiment140 or viralheat.
I would appreciate any tips on how to implement the "derived" column in Hive.
Thanks.
Unfortunately, while the Hive API lets you add a new column to your table (using ALTER TABLE foo ADD COLUMNS (bar binary)), those new columns will be NULL and cannot be populated. The only way to add data to these columns is to clear the table's rows and load data from a new file, this new file having that new column's data.
To answer your question: You can't, in Hive. To do what you propose, you would have to have a file with 6 columns, the 6th already containing the sentiment analysis data. This could then be loaded into your HDFS, and queried using Hive.
EDIT: Just tried an example where I exported the table as a .csv after adding the new column (see above), and popped that into M$ Excel where I was able to perform functions on the table values. After adding functions, I just saved and uploaded the .csv, and rebuilt the table from it. Not sure if this is helpful to you specifically (since it's not likely that sentiment analysis can be done in Excel), but may be of use to anyone else just wanting to have computed columns in Hive.
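The export, compute, and reload route described above does not have to go through a spreadsheet. Below is a minimal Python sketch that reads the exported rows and writes a new file with the extra column appended. The `score_sentiment` function is only a keyword-matching placeholder invented for this example. It stands in for a call to whichever sentiment service you choose and does not represent the Sentiment140 or viralheat API. The column position of the text (`text_index`) and the sample rows are assumptions too.

```python
import csv
import io

def score_sentiment(text: str) -> str:
    """Placeholder scorer; a real job would call a sentiment service here."""
    lowered = text.lower()
    if "love" in lowered or "great" in lowered:
        return "positive"
    if "hate" in lowered or "awful" in lowered:
        return "negative"
    return "neutral"

def add_sentiment_column(src, dst, text_index: int) -> int:
    """Copy delimited rows from src to dst, appending a sentiment value
    to each row. Returns the number of rows written."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow(row + [score_sentiment(row[text_index])])
        count += 1
    return count

# Made-up 5-column rows standing in for the exported table.
sample = io.StringIO(
    "1,alice,2013-01-01,en,I love Hive\n"
    "2,bob,2013-01-02,en,Parsing logs all day\n"
)
out = io.StringIO()
rows_written = add_sentiment_column(sample, out, text_index=4)
```

With real files you would pass file objects opened with `newline=""` instead of the `StringIO` buffers, then load the resulting 6-column file into the table.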
References:
https://cwiki.apache.org/Hive/gettingstarted.html#GettingStarted-DDLOperations
http://comments.gmane.org/gmane.comp.java.hadoop.hive.user/6665
You can do this in two steps without a separate table. Steps:
1) Alter the original table to add the required column.
2) Do an "overwrite table select" of all columns + your computed column from the original table into the original table.
Caveat: This has not been tested on a clustered installation.

Altova Mapforce: Joining XML Input and conditional SQL Join using two tables

I'm trying to get the following done: Using Altova Mapforce, I use an XML file with schema as a source. I want to map it to exactly the same output, but only add data to one field.
The value of the field (it's Tax) is determined using a two table SQL join with a WHERE clause over both tables. The tables are joined using foreign keys, the relation is recognized by Mapforce.
The first field of the WHERE clause comes from the first table (header type table), the second and third field from the second tables (lines type tables).
However, I cannot seem to create the logical and correct equivalent of what I am describing here. I've tried it using complex AND constructions where it then inserts the one field I would need multiple times. I've tried WHERE clauses but they fail as they never supply both tables at the same time and there seems to be no way to use a pre-specified JOINing of two tables as a source. The WHERE clause then recognizes only the fields from the first table, not the second one.
Is there an example for this? Joining two (or more) tables, using WHERE to determine the exact row, then using a value from that row?
Best wishes.
