How can I extract values from a custom flat file header into variables? - etl

I have been stuck on this problem for a while and I have no clue how to proceed. I am trying to upload multiple CSV files that carry dates in their header rows. I want those dates stored as date variables so I can use them to populate a column in a table via a Script Component, but I have no idea how to create the date variables in SSIS.
CSV files look as shown below when opened in Excel.
CSV data 1:
Relative Date: 02/01/2013
Run Date: 15/01/2013
Organisation,AreaCode,ACount
Chadwell,RM6,50
Primrose,RM6,60
CSV data 2:
Relative Date: 14/02/2013
Run Date: 17/02/2013
Organisation,AreaCode,ACount
Second Ave,E12,110
Fourth Avenue, E12,130
I want the Relative Date and Run Date stored as date variables. I hope I made sense.

Your best solution would be to use a Script Task in your control flow. With it you can pre-process your CSV files: parse the first two rows, retrieve the dates you want, and store them in two variables created beforehand. (http://msdn.microsoft.com/en-us/library/ms135941.aspx)
When passing the variables into the Script Task, it is important to add them as ReadWriteVariables; otherwise the script cannot write to them. You can then use these variables in any way you desire afterwards.
Updated Quick Walkthrough:
I presume that the CSV files you will want to import will be located in the same directory:
Add a Foreach Loop Container which will loop through the files in your specified directory. Inside it, place a Script Task which will be responsible for parsing the two dates from each of your files, and a Data Flow Task which you will use for the file import.
Create the variables you will be using: one for the file name/path and two for the dates you want to retrieve. You don't need to give them values; they will be filled in automatically during the process.
Set up your Foreach Loop Container:
Select a Foreach File Enumerator.
Select the directory folder that will contain your files. (Even better, add a variable that takes in a path you specify; this can then be read into the enumerator using its expression builder.)
Set a wildcard for the files to be matched in that directory.
You also need to map each filename the enumerator generates to the variable you created earlier.
Open up your Script Task and add the three variables to the ReadWriteVariables section. This is important; otherwise you won't be able to write to your variables.
This is the script I used for the purpose. It is not necessarily the best, but it works for this example.
// Requires "using System.IO;" and "using System.Text.RegularExpressions;" at the top of ScriptMain.
public void Main()
{
    // Note: RelativeDate and RunDate are String variables here; if you make them DateTime
    // variables instead, parse the extracted text with DateTime.ParseExact(date, "dd/MM/yyyy",
    // CultureInfo.InvariantCulture) before assigning.
    string filePath = this.Dts.Variables["User::FileName"].Value.ToString();

    using (StreamReader reader = new System.IO.StreamReader(filePath))
    {
        string line = "";
        bool getNext = true;
        while (getNext && (line = reader.ReadLine()) != null)
        {
            if (line.Contains("Relative Date"))
            {
                string date = getDate(line);
                this.Dts.Variables["User::RelativeDate"].Value = date;

                // Test Event Information
                bool fireAgain = false;
                this.Dts.Events.FireInformation(1, "Rel Date", date,
                    "", 0, ref fireAgain);
            }
            else if (line.Contains("Run Date"))
            {
                string date = getDate(line);
                this.Dts.Variables["User::RunDate"].Value = date;

                // Test Event Information
                bool fireAgain = false;
                this.Dts.Events.FireInformation(1, "Run Date", date,
                    "", 0, ref fireAgain);
                break;
            }
        }
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}

private string getDate(string line)
{
    Regex r = new Regex(@"\d{2}/\d{2}/\d{4}");
    MatchCollection matches = r.Matches(line);
    return matches[matches.Count - 1].Value;
}
The results from the execution of the Script Task for the two CSV files can be seen in the Information events it fires. The dates can now be used in any way you fancy in your Data Flow Task; for example, a Script Component can append them as columns, as in the sketch below. Make sure you skip the header rows you don't need to import in your source configuration.
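As one possible sketch (not the only approach), a Script Component (Transformation) inside the Data Flow can turn the two variables into row columns. It assumes you list RelativeDate and RunDate as ReadOnlyVariables on the component, add two output columns named RelativeDate and RunDate on its Inputs and Outputs page, and that the variables hold strings in the dd/MM/yyyy format shown in your files:
// Requires "using System;" and "using System.Globalization;".
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // The variables were filled by the Script Task and hold text like "02/01/2013".
    DateTime relativeDate = DateTime.ParseExact(
        Variables.RelativeDate, "dd/MM/yyyy", CultureInfo.InvariantCulture);
    DateTime runDate = DateTime.ParseExact(
        Variables.RunDate, "dd/MM/yyyy", CultureInfo.InvariantCulture);

    // These output columns are assumed to have been added to the component (e.g. DT_DBTIMESTAMP).
    Row.RelativeDate = relativeDate;
    Row.RunDate = runDate;
}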

Related

Google sheets script Exceeded maximum execution time

I wrote a script to import stock data from a CSV file stored in Google Drive into an existing Google Sheet.
In one function I'm doing this for multiple CSV files. Unfortunately I sometimes get "Exceeded maximum execution time", but not every time.
Do you have an idea how I can boost performance on this:
//++++++++++++++ SPY +++++++++++++++++++
var file = DriveApp.getFilesByName("SPY.csv").next();
var csvData = Utilities.parseCsv(file.getBlob().getDataAsString());

// Create new temporary sheet
var activeSpreadsheet = SpreadsheetApp.getActiveSpreadsheet();
var yourNewSheet = activeSpreadsheet.getSheetByName("SPY-Import");
if (yourNewSheet != null) {
  activeSpreadsheet.deleteSheet(yourNewSheet);
}
yourNewSheet = activeSpreadsheet.insertSheet();
yourNewSheet.setName("SPY-Import");

// Import
var sheet = SpreadsheetApp.getActiveSheet();
sheet.getRange(1, 1, csvData.length, csvData[0].length).setValues(csvData);

// Copy from temporary sheet to destination
var spreadsheet = SpreadsheetApp.getActive();
spreadsheet.getRange('A:B').activate();
spreadsheet.setActiveSheet(spreadsheet.getSheetByName('SPY'), true);
spreadsheet.getRange('A2').activate();
spreadsheet.getRange('\'SPY-Import\'!A:B').copyTo(spreadsheet.getActiveRange(),
    SpreadsheetApp.CopyPasteType.PASTE_NORMAL, false);

// Delete temporary sheet
// Get Spreadsheet Object
var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
// Get target sheet object
var sheet = spreadsheet.getSheetByName("SPY-Import");
// Delete
spreadsheet.deleteSheet(sheet);
Thanks in advance!
I believe your situation and goal are as follows.
You have several CSV files like SPY.csv.
Your Spreadsheet has several sheets, one corresponding to each CSV file, like SPY.
You want to put the values from the CSV data into the Spreadsheet.
You want to put the values of columns "A" and "B" of the CSV data.
In your current situation, you copied the script in your question several times and run the copies, changing the CSV filename and sheet name each time.
You want to reduce the processing cost of your script. That is how I understood your goal.
Modification points:
SpreadsheetApp.getActiveSpreadsheet() is used several times, and activate() is used several times.
I think that in your case, SpreadsheetApp.getActiveSpreadsheet() can be declared once, and activate() is not required at all.
In order to copy the CSV data to the Spreadsheet, your script puts the CSV data into a temporary sheet and then copies the required values to the destination sheet.
Instead, the CSV data can be put directly into the destination sheet by processing the parsed array.
I think the points above reduce the processing cost. When they are reflected in your script, it becomes as follows.
Modified script:
Please copy and paste the following script and fill in the obj variable. When you run the script, the CSV data is retrieved and processed, and the values are put into the Spreadsheet.
function myFunction() {
  var obj = [
    {filename: "SPY.csv", sheetname: "SPY"},
    {filename: "###.csv", sheetname: "###"},
    // ...one {filename, sheetname} entry per CSV file.
  ];
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  obj.forEach(({filename, sheetname}) => {
    var file = DriveApp.getFilesByName(filename);
    if (file.hasNext()) {
      var sheet = ss.getSheetByName(sheetname);
      if (sheet) {
        // sheet.clearContents(); // Is this required in your situation?
        var csv = DriveApp.getFileById(file.next().getId()).getBlob().getDataAsString();
        var values = Utilities.parseCsv(csv).map(([a, b]) => [a, b]);
        sheet.getRange(2, 1, values.length, 2).setValues(values);
      }
    }
  });
}
Note:
Please use this script with the V8 runtime enabled.
I'm not sure about your CSV data, so if Utilities.parseCsv(csv) cannot parse it, please pass the delimiter explicitly as Utilities.parseCsv(csv, delimiter).
This modification uses the Spreadsheet service. If the modified script still raises the same "Exceeded maximum execution time" error, please tell me; in that case I would propose a sample script using the Sheets API.
References:
Spreadsheet Service
parseCsv(csv)

Using JMeter, how can extracted values be inserted into different columns of a database table?

Using a JMeter Beanshell sampler, I have extracted and split (with a ',' delimiter) a dynamic string. I am now stuck on how these split values can be inserted into different columns of a database table.
Here is a code snippet which prints all the values after the split. After splitting the string, each value is stored in an array, and you can retrieve it by specifying the array position.
You can use the array elements, e.g. aftersplit[0], aftersplit[1], and so on, in the insert query; a sketch of that is shown after the snippet.
String mystring = "here is my, dynamic, random, and unique string";
String[] aftersplit = mystring.split(",");

System.out.println(aftersplit[0]);
System.out.println(aftersplit[1]);
System.out.println(aftersplit[2]);

// To print all the values after splitting
for (int i = 0; i < aftersplit.length; i++) {
    System.out.println(aftersplit[i]);
}
I would recommend the following approach: store your dynamic values into JMeter Variables with a numeric postfix. Example code:
String source = "foo,bar,baz";
int counter = 1;
for (String token : source.split(",")) {
    vars.put("token_" + counter, token);
    counter++;
}
It produces the following JMeter Variables:
token_1=foo
token_2=bar
token_3=baz
Then add a ForEach Controller to iterate over the generated variables, and a JDBC Request sampler as a child of the ForEach Controller to insert them into the database; a sketch of the relevant settings follows. See The Real Secret to Building a Database Test Plan With JMeter to learn how to establish a database connection and execute arbitrary SQL queries using JMeter.
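A rough sketch of the settings that tie these pieces together (my_table and my_column are placeholders, not names from the question):
ForEach Controller:
    Input variable prefix:  token
    Output variable name:   current_token

JDBC Request (child of the ForEach Controller):
    Query Type: Update Statement
    Query:      INSERT INTO my_table (my_column) VALUES ('${current_token}')
Each iteration exposes the next token_N value as ${current_token}, so one row is inserted per extracted value.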

Selectively loading iis log files into Hive

I am just getting started with Hadoop/Pig/Hive on the Cloudera platform and have questions on how to effectively load data for querying.
I currently have ~50GB of iis logs loaded into hdfs with the following directory structure:
/user/oi/raw_iis/Webserver1/Org/SubOrg/W3SVC1056242793/
/user/oi/raw_iis/Webserver2/Org/SubOrg/W3SVC1888303555/
/user/oi/raw_iis/Webserver3/Org/SubOrg/W3SVC1056245683/
etc
I would like to load all the logs into a Hive table.
I have two issues/questions:
1.
My first issue is that some of the web servers may not have been configured correctly and will have IIS logs without all columns. These incorrect logs need additional processing to map the available columns in the log to the schema that contains all columns.
The data is space delimited; the issue is that when not all columns are enabled, the log only includes the columns that are enabled. Hive can't automatically insert nulls, since the data does not include the columns that are empty. I need to be able to map the available columns in the log to the full schema.
Example good log:
#Fields: date time s-ip cs-method cs-uri-stem useragent
2013-07-16 00:00:00 10.1.15.8 GET /common/viewFile/1232 Mozilla/5.0+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/27.0.1453.116+Safari/537.36
Example log with missing columns (cs-method and useragent):
#Fields: date time s-ip cs-uri-stem
2013-07-16 00:00:00 10.1.15.8 /common/viewFile/1232
The log with missing columns needs to be mapped to the full schema like this:
#Fields: date time s-ip cs-method cs-uri-stem useragent
2013-07-16 00:00:00 10.1.15.8 null /common/viewFile/1232 null
How can I map these enabled fields to a schema that includes all possible columns, inserting a blank/null/- token for fields that are missing? Is this something I could handle with a Pig script?
2.
How can I define my Hive tables to include information from the HDFS path, namely Org and SubOrg in my directory structure example, so that it is queryable in Hive? I am also unsure how to properly import data from the many directories into a single Hive table.
First, please provide sample data for better help.
How can I map these enabled fields to a schema that includes all possible columns, inserting blank/null/- token for fields that were missing?
If you have a delimiter in the file you can use Hive, and Hive automatically inserts nulls wherever data is missing, provided that the delimiter does not appear as part of your data.
Is this something I could handle with a Pig script?
If you have a delimiter between the fields then you can use Hive; otherwise you can go for MapReduce/Pig.
How can I include information from the hdfs path, namely Org and SubOrg in my dir structure example so that it is query-able in Hive?
It seems you are new to Hive; before querying, you have to create a table which includes information like the path, delimiter and schema.
Is this a good candidate for partitioning?
You can partition on date if you wish; a sketch of a partitioned table definition is shown below.
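As a rough sketch only (the table name, column list and partition layout below are assumptions based on the sample logs and directory structure in the question, not a tested definition), an external table partitioned on the path components would make Org and SubOrg queryable columns; each existing directory is then registered as a partition:
-- Hypothetical schema; adjust columns/types to the fields actually enabled in your IIS logs.
CREATE EXTERNAL TABLE iis_logs (
  log_date    STRING,
  log_time    STRING,
  s_ip        STRING,
  cs_method   STRING,
  cs_uri_stem STRING,
  useragent   STRING
)
PARTITIONED BY (webserver STRING, org STRING, suborg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION '/user/oi/raw_iis';

-- Register each existing directory as a partition.
ALTER TABLE iis_logs ADD PARTITION (webserver='Webserver1', org='Org', suborg='SubOrg')
  LOCATION '/user/oi/raw_iis/Webserver1/Org/SubOrg/W3SVC1056242793/';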
I was able to solve both of my issues with a Pig UDF (user-defined function).
Mapping columns to the proper schema: See this answer and this one.
All I really had to do was add some logic to handle the IIS headers that start with #. Below are the snippets from getNext() that I used; everything else is the same as mr2ert's example code.
See the values[0].equals("#Fields:") parts.
@Override
public Tuple getNext() throws IOException {
    ...
    Tuple t = mTupleFactory.newTuple(1);

    // ignore header lines except the field definitions
    if (values[0].startsWith("#") && !values[0].equals("#Fields:")) {
        return t;
    }

    ArrayList<String> tf = new ArrayList<String>();
    int pos = 0;

    for (int i = 0; i < values.length; i++) {
        if (fieldHeaders == null || values[0].equals("#Fields:")) {
            // grab field headers ignoring the #Fields: token at values[0]
            if (i > 0) {
                tf.add(values[i]);
            }
            fieldHeaders = tf;
        } else {
            readField(values[i], pos);
            pos = pos + 1;
        }
    }
    ...
}
To include information from the file path, I added the following to the LoadFunc UDF that I used to solve issue 1. In the prepareToRead override, grab the file path and store it in a member variable.
public class IISLoader extends LoadFunc {
    ...
    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) {
        in = reader;
        filePath = ((FileSplit)split.getWrappedSplit()).getPath().toString();
    }
Then within getNext() I could add the path to the output tuple, roughly as in the sketch below.
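A minimal sketch of that last step (it assumes the t tuple and the filePath member from the snippets above; the split indices depend on your actual path layout, so treat them as illustrative only):
// Inside getNext(), after the data fields have been read into the tuple:
t.append(filePath);                 // full HDFS path of the current split
// For a path like /user/oi/raw_iis/Webserver1/Org/SubOrg/W3SVC1056242793/...
String[] parts = filePath.split("/");
if (parts.length > 6) {
    t.append(parts[5]);             // Org
    t.append(parts[6]);             // SubOrg
}
return t;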

How to export and import BLOB data type in Oracle

How can I export and import the BLOB data type in Oracle using any tool? I want to provide it as part of a release.
Answering since this has a decent view count even though it is a five-year-old question.
Since this question was asked five years ago, there's a new tool named SQLcl (http://www.oracle.com/technetwork/developer-tools/sqlcl/overview/index.html).
We factored the scripting engine out of SQL Developer into the command line. SQL Developer and SQLcl are based on Java, which allows use of the Nashorn/JavaScript engine for client-side scripting. Here's a short example that selects three columns: ID is just the table PK, NAME the name of the file to create, and CONTENT the BLOB to extract from the database.
The script command triggers this scripting. I placed the code below into a file named blob2file.sql.
All this adds up to zero PL/SQL and zero directory objects; instead, just some SQL scripts with JavaScript mixed in.
script
// issue the sql
// bind if needed, but not in this case
var binds = {}
var ret = util.executeReturnList('select id,name,content from images', binds);
// loop the results
for (i = 0; i < ret.length; i++) {
    // debug messages
    ctx.write(ret[i].ID + "\t" + ret[i].NAME + "\n");
    // get the blob stream
    var blobStream = ret[i].CONTENT.getBinaryStream(1);
    // get the path/file handle to write to
    // replace as needed to write the file to another location
    var path = java.nio.file.FileSystems.getDefault().getPath(ret[i].NAME);
    // dump the file stream to the file
    java.nio.file.Files.copy(blobStream, path);
}
/
The result is my table dumped into files (I only had one row). Just run it as any plain SQL script.
SQL> #blob2file.sql
1 eclipse.png
blob2file.sql eclipse.png
SQL>

How to use Crystal Reports without a tightly-linked DB connection?

I'm learning to use Crystal Reports (with VB 2005).
Most of what I've seen so far involves slurping data directly from a database, which is fine if that's all you want to display in the report.
My DB has a lot of foreign keys, so the way I've tried to stay sane with presenting actual information in my app is to add extra members to my objects that contain strings (descriptions) of what the foreign keys represent. Like:
Class AssetIdentifier
    Private ID_AssetIdentifier As Integer
    Private AssetID As Integer
    Private IdentifierTypeID As Integer
    Private IdentifierType As String
    Private IdentifierText As String
    ...
Here, IdentifierTypeID is a foreign key, and I look up the value in a different table and place it in IdentifierType. That way I have the text description right in the object and I can carry it around with the other stuff.
So, on to my Crystal Reports question.
Crystal Reports seems to make it straightforward to hook up to records in a particular table (especially with the Experts), but that's all you get.
Ideally, I'd like to make a list of my classes, like
Dim assetIdentifiers As New List(Of AssetIdentifier)
and pass that to a Crystal Report instead of tightly linking to a particular DB, having most of the work done for me but leaving me to work around the parts it doesn't do. The closest I can see so far is an ADO.NET DataSet, but even that seems far removed. I'm already handling queries myself fine: I have all kinds of functions that return List(Of Whatever) based on queries.
Is there an easy way to do this?
Thanks in advance!
UPDATE: OK, I found something here:
http://msdn.microsoft.com/en-us/library/ms227595(VS.80).aspx
but it only appears to give this capability for web projects or web applications. Am I out of luck if I want to integrate into a standalone application?
Go ahead and create the stock object as described in the link you posted and create the report (StockObjectsReport) as they specify. In this simplified example I simply add a report viewer (crystalReportViewer1) to a form (Form1) and then use the following code in the Form_Load event.
stock s1 = new stock("AWRK", 1200, 28.47);
stock s2 = new stock("CTSO", 800, 128.69);
stock s3 = new stock("LTWR", 1800, 12.95);
ArrayList stockValues = new ArrayList();
stockValues.Add(s1);
stockValues.Add(s2);
stockValues.Add(s3);
ReportDocument StockObjectsReport = new StockObjectsReport();
StockObjectsReport.SetDataSource(stockValues);
crystalReportViewer1.ReportSource = StockObjectsReport;
This should populate your report with the 3 values from the stock object in a Windows Form.
EDIT: Sorry, I just realized that your question was in VB, but my example is in C#. You should get the general idea; a rough VB translation is below. :)
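For reference, an untested VB.NET sketch of the same Form_Load code (it assumes the same stock class, StockObjectsReport report and crystalReportViewer1 control as above, plus the usual CrystalDecisions.CrystalReports.Engine import):
Dim s1 As New stock("AWRK", 1200, 28.47)
Dim s2 As New stock("CTSO", 800, 128.69)
Dim s3 As New stock("LTWR", 1800, 12.95)

Dim stockValues As New ArrayList()
stockValues.Add(s1)
stockValues.Add(s2)
stockValues.Add(s3)

Dim report As ReportDocument = New StockObjectsReport()
report.SetDataSource(stockValues)
crystalReportViewer1.ReportSource = report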
I'm loading the report by filename and it is working perfectly:
//........
ReportDocument StockObjectsReport = new ReportDocument();
string reportPath = Server.MapPath("StockObjectsReport.rpt");
StockObjectsReport.Load(reportPath);
StockObjectsReport.SetDataSource(stockValues);

// Export PDF to disk
string filePath = Server.MapPath("StockObjectsReport.pdf");
StockObjectsReport.ExportToDisk(ExportFormatType.PortableDocFormat, filePath);
@Dusty had it. However, in my case it turned out I had to wrap the object in a list, even though it was a single item, before I could get it to print. See the full code example:
string filePath = null;
string fileName = null;
ReportDocument newDoc = new ReportDocument();

// Set Path to Report File
fileName = "JShippingParcelReport.rpt";
filePath = func.GetReportsDirectory();

// IF FILE EXISTS... THEN
string fileExists = filePath + @"\" + fileName;
if (System.IO.File.Exists(fileExists))
{
    // Must Convert Object to List for some crazy reason?
    // See: https://stackoverflow.com/a/35055093/1819403
    var labelList = new List<ParcelLabelView> { label };

    newDoc.Load(fileExists);
    newDoc.SetDataSource(labelList);

    try
    {
        // Set User Selected Printer Name
        newDoc.PrintOptions.PrinterName = report.Printer;
        newDoc.PrintToPrinter(1, false, 0, 0); // copies, collated, startpage, endpage

        // Save Printing
        report.Printed = true;
        db.Entry(report).State = System.Data.Entity.EntityState.Modified;
        db.SaveChanges();
    }
    catch (Exception e2)
    {
        string err = e2.Message;
    }
}
